1. Introduction

GObject is the C-based object system for GTK+ and GNOME programs. Gnome-class is a Rust crate that lets you write GObject implementations in Rust with a convenient syntax.

Quick overview of GObject

While C does not have objects or classes by itself, GObject makes it possible to write object-oriented C programs. The GObject library defines the GObject type system, which supports features like:

  • Classes and subclasses with single inheritance.

  • A class may implement multiple interfaces.

  • Virtual methods and static methods.

  • Signals, which are events emitted by objects (nothing to do with Unix signals). For example, a Button object may emit a "clicked" signal.

  • Properties, or getters/setters for values on objects, with notification of changes.

  • Introspection — asking the type system about which classes are registered and what features they contain.

Writing GObjects in C normally requires that you write an uncomfortable amount of boilerplate code to do things like register a new class, define its methods, register object signals and properties, etc. Due to the nature of C, many operations are not type-safe and depend on correct pointer casts, or on knowing the types that you should really be passing to varargs functions, which are not checked by the compiler.

Why gnome-class?

Since GObject is a C library, it can be called from Rust through a bunch of extern "C" functions. One could write #[repr(C)] structs in Rust that match the layout that GObject functions expect: for example, those structs could have fields with function pointers to virtual method implementations.

Doing things that way is very verbose and cumbersome: it means using Rust as if it were C, and dealing with GObject's idiosyncrasies in a non-native language.
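To make that verbosity concrete, here is a minimal sketch of the hand-written style. It assumes nothing from the real GObject FFI: Counter, CounterClass, counter_add and counter_new are hypothetical names, and the structs only mimic the idea of a class structure that holds function pointers.

```rust
use std::os::raw::c_uint;

// Hypothetical instance struct; a real one would embed a GObject parent.
#[repr(C)]
pub struct Counter {
    klass: *const CounterClass,
    value: c_uint,
}

// Hypothetical class struct: a vtable with a single virtual method slot.
#[repr(C)]
pub struct CounterClass {
    add: Option<unsafe extern "C" fn(this: *mut Counter, x: c_uint) -> c_uint>,
}

// Default implementation of the virtual method, written as if it were C.
unsafe extern "C" fn counter_add_impl(this: *mut Counter, x: c_uint) -> c_uint {
    unsafe {
        (*this).value += x;
        (*this).value
    }
}

static COUNTER_CLASS: CounterClass = CounterClass {
    add: Some(counter_add_impl),
};

// Public entry point: find the class struct and call through the pointer.
pub unsafe extern "C" fn counter_add(this: *mut Counter, x: c_uint) -> c_uint {
    unsafe {
        let klass = &*(*this).klass;
        let method = klass.add.expect("vtable slot is not set");
        method(this, x)
    }
}

pub fn counter_new() -> Counter {
    Counter { klass: &COUNTER_CLASS, value: 0 }
}
```

Every call site needs unsafe, and nothing stops a caller from passing a pointer to the wrong kind of instance.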

The fundamental goal of this Gnome-class crate is to let you write GObject implementations in Rust with minimal or no boilerplate, and with compile-time type safety all along. The goal is to require no unsafe code on your part.

How is gnome-class different from glib-rs?

Glib-rs is the fundamental building block in the Gtk-rs ecosystem. It provides the basic wrappers to write a Rust-friendly language binding to GObject-based libraries.

However, glib-rs is a language binding to GObject-based libraries. It lets you use GObject libraries from Rust; it does not let you implement new such libraries easily. That is the purpose of gnome-class: from Rust code, we generate GObject-compatible code that has the same kind of Rust API as a "traditional" library would have if wrapped with glib-rs.

Goals of gnome-class

  • Let users write new GObject classes completely in Rust, with no unsafe code, and no boilerplate.

  • Generate GObject implementations that look exactly like C GObjects from the outside. The generated GObjects should be callable from C or other languages in exactly the same way as traditional GTK+/GNOME libraries.

  • Automatically emit GObject Introspection information so that the generated objects can be consumed by language bindings.

  • In the end, we aim to make it compelling for users to not write new GObject libraries in C, but rather to give them an "obvious" way to do it in Rust. This should ensure higher-quality, safer code for GNOME's general-purpose libraries, while maintaining backwards compatibility with all the GObject-based infrastructure we have.

About this document

This is an overview of how gnome-class works. Gnome-class is implemented as a Rust procedural macro that extends the Rust language with GObject-friendly constructs: for example, Rust does not have class or signal keywords, but gnome-class adds them to the language.

  • It will be helpful for you to know a bit of how GObject works. Read the GObject Tutorial in the GObject Reference Guide. You can also read the source code for libraries which implement GObjects, for example, GTK+.

  • Please read this overview of how GObject Introspection works. This will give you a good idea of what we want to generate at some point with gnome-class.

  • While it will be helpful to have some basic understanding of compilers (parsers, analyzers, code generators), this is not necessary. This document will explain what you need to know for gnome-class's internals.

If you find any issues with this document, like missing information, unclear explanations, or anything at all, please file an issue in the gnome-class issue tracker, or even submit a merge request with a correction!

2. Overview

Gnome-class extends the Rust language to support a very particular kind of classes with inheritance, for the GObject type system. Gnome-class is implemented as a procedural macro in Rust; this means that it runs as part of the Rust compiler itself, and parses the user's Gnome-like code and generates Rust code for it.

Stages

Gnome-class operates in various stages, similar to a compiler:

  1. Parsing into an Abstract Syntax Tree. We parse the code that the user put inside the gobject_gen! invocation using a syn-based parser. The parser generates an Abstract Syntax Tree (AST), which closely matches the structure of the user's code. At the end of this process, the code will be fully parsed into an AST (or it will have failed with a syntax error), but the AST may not be semantically valid. The AST is defined in src/ast.rs.

  2. We check the AST for semantic errors. For example, there cannot be two classes defined with the same name.

  3. We create a High-level Internal Representation (HIR) from the AST. The HIR matches GObject concepts more closely. For example, while the AST may contain separate items for class Foo and impl Foo, the HIR has a single Class struct that knows which methods are defined for it, which virtual methods have default implementations, etc. This is also where we ensure that the user's code is semantically valid. For example, we check that the same signal name is not being declared twice for a class. The HIR is defined in src/hir.

  4. Code generation. We generate code based on the HIR. For each class defined in the HIR, we emit the necessary GObject boilerplate to register that class, its methods, signals, properties, etc. We emit the actual code for methods and signal handlers, and the necessary trampolines to call Rust methods and signal handlers from C. The code generator is defined in src/gen. In there, the one-time, per-class GObject boilerplate is in src/gen/boilerplate.rs. The other files in the src/gen directory are used for things that require extra code generation like signals and traits for method trampolines.
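The stages after parsing can be pictured with a toy pipeline. This is only a sketch under simplifying assumptions: Item, HirClass, check, lower, generate and compile are hypothetical names, strings stand in for syn types, and the "generated code" is just one line of text per method.

```rust
use std::collections::BTreeMap;

// Stage 1 output: a toy AST that mirrors the user's code item by item.
#[derive(Debug, Clone)]
pub enum Item {
    Class { name: String },
    Impl { self_path: String, methods: Vec<String> },
}

// Stage 3 output: a toy HIR in which each class owns its methods.
#[derive(Debug, PartialEq)]
pub struct HirClass {
    pub name: String,
    pub methods: Vec<String>,
}

// Stage 2: reject two classes with the same name.
pub fn check(items: &[Item]) -> Result<(), String> {
    let mut seen = Vec::new();
    for item in items {
        if let Item::Class { name } = item {
            if seen.contains(&name) {
                return Err(format!("class `{}` is defined twice", name));
            }
            seen.push(name);
        }
    }
    Ok(())
}

// Stage 3: merge `class Foo` and `impl Foo` items into a single HirClass.
pub fn lower(items: &[Item]) -> Result<Vec<HirClass>, String> {
    let mut classes: BTreeMap<String, Vec<String>> = BTreeMap::new();
    for item in items {
        if let Item::Class { name } = item {
            classes.insert(name.clone(), Vec::new());
        }
    }
    for item in items {
        if let Item::Impl { self_path, methods } = item {
            let slot = classes
                .get_mut(self_path)
                .ok_or_else(|| format!("impl for unknown class `{}`", self_path))?;
            for m in methods {
                if slot.contains(m) {
                    return Err(format!("method `{}` is defined twice", m));
                }
                slot.push(m.clone());
            }
        }
    }
    Ok(classes
        .into_iter()
        .map(|(name, methods)| HirClass { name, methods })
        .collect())
}

// Stage 4: emit one line of pretend output per method.
pub fn generate(classes: &[HirClass]) -> String {
    let mut out = String::new();
    for class in classes {
        for method in &class.methods {
            out.push_str(&format!("{}_{}\n", class.name.to_lowercase(), method));
        }
    }
    out
}

// Run the stages in order; the first failing stage stops the pipeline.
pub fn compile(items: &[Item]) -> Result<String, String> {
    check(items)?;
    let hir = lower(items)?;
    Ok(generate(&hir))
}
```

Note how the duplicate-class check runs on the AST, while the duplicate-method check only becomes natural once the class and impl items have been merged in the HIR.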

Code structure

The entry point for gnome-class is the gobject_gen! procedural macro. It is defined in src/lib.rs.

The AST structures are defined in src/ast.rs.

The parser is in src/parser/mod.rs.

Some of the AST validation code is in src/checking.rs. Other checks happen in the HIR.

The HIR is in src/hir/mod.rs.

Finally, code generation is in src/gen/*.rs.

Parsing

Gnome-class obtains a TokenStream from the Rust compiler in the entry point for the procedural macro, and parses that stream of tokens into an Abstract Syntax Tree (AST). We use the syn crate for the parsing machinery: it is able to parse arbitrary Rust code, and allows creating new parsers for our extensions to the language.

Overview of the Abstract Syntax Tree (AST)

The AST is defined in src/ast.rs. The AST is intended to match the user's code pretty much verbatim. For example, consider a call like this:


gobject_gen! {
    class Counter {
        f: Cell<u32>
    }

    impl Counter {
        pub fn add(&self, x: u32) -> u32 {
            self.get_priv().f.set(self.get() + x);
            self.get()
        }

        pub fn get(&self) -> u32 {
            self.get_priv().f.get()
        }
    }
}

First, there is the actual invocation of the gobject_gen! macro. It has two items, a class and an impl. Even though Rust does not have a class item by itself, we use the same terminology to indicate that this is a top-level item in the user's code.

The contents of the gobject_gen! invocation will be parsed into the following; see src/ast.rs for the actual definitions of these structs/enums:

Program {
    items: [
        Item::Class(
            Class {
                name: Ident("Counter"),
                extends: None,
                fields: FieldsNamed {
                  brace_token: Brace,
                  named: Punctuated { /* the "f: Cell<u32>" field goes here */ }
                }
            }
        ),

        Item::Impl(
            Impl {
                trait_: None,
                self_path: Ident("Counter"),
                items: [
                    ImplItem {
                        attrs: [empty vector],
                        node: ImplItemKind::Method(
                            ImplItemMethod {
                                public:   true,
                                virtual_: false,
                                signal:   false,
                                name:     Ident("add"),
                                inputs:   Punctuated {...},
                                output:   ReturnType, // u32
                                body:     Some(Block {...}),
                            }
                        ),
                    },

                    ImplItem {
                        attrs: [empty vector],
                        node: ImplItemKind::Method(
                            ImplItemMethod {
                                public:   true,
                                virtual_: false,
                                signal:   false,
                                name:     Ident("get"),
                                inputs:   Punctuated {...},
                                output:   ReturnType, // u32
                                body:     Some(Block {...}),
                            }
                        ),
                    },
                ],
            }
        ),
    ],
}

Whew! Fortunately, within the parsing functions we only need to deal with one thing at a time, and not the entire tree of code.

In summary: the macro call that looks like

gobject_gen! {
    class Counter {
        ... field definitions for the per-instance private struct ...
    }

    impl Counter {
        ... two method definitions ...
    }
}

gets parsed into

Program {
    items: [
        Item::Class(
            Class {
                name: Ident("Counter"),
                fields: a syn::FieldsNamed that contains
                        an f member of type Cell<u32> ...
            }
        ),

        Item::Impl(
            Impl {
                self_path: Ident("Counter"),
                items: [ 
                    ... two ImplItemKind::Method ...
                ],
            }
        ),
    ],
}

That is, we parse the invocation above into a Program with two items, an Item::Class and an Item::Impl. In turn, each of these items has a detailed description of the corresponding constructs.

The parsing process

Gnome-class uses the syn crate to parse a TokenStream into our AST structures. To define a parser for SomeStruct, one creates an impl Synom for SomeStruct. The Synom trait has a parse method; Syn provides a set of parser combinators that let one "fill out" the resulting structs by recursively parsing their fields.

Parser combinators are recursive-descent parsers that let one compose big parsers from small parsers. Syn implements parser combinators with macros similar to the nom crate. We won't go into a full description of how syn works here, and just focus on the peculiarities of gnome-class. (FIXME: link to syn/nom docs)
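As a rough illustration of composing a big parser from small ones, here is a sketch that does not use syn at all. It assumes a toy grammar (class Name { } with an optional ": Parent") over a slice of string tokens; token, ident, class and PResult are hypothetical names chosen for this example.

```rust
// A parser takes a token slice and returns the parsed value plus the rest.
type PResult<'a, T> = Option<(T, &'a [&'a str])>;

// Small parser: match one exact keyword or punctuation token.
fn token<'a>(input: &'a [&'a str], expected: &str) -> PResult<'a, ()> {
    match input.split_first() {
        Some((first, rest)) if *first == expected => Some(((), rest)),
        _ => None,
    }
}

// Small parser: match any identifier-like token.
fn ident<'a>(input: &'a [&'a str]) -> PResult<'a, String> {
    let (first, rest) = input.split_first()?;
    let ok = !first.is_empty()
        && first.chars().all(|c| c.is_alphanumeric() || c == '_');
    if ok {
        Some((first.to_string(), rest))
    } else {
        None
    }
}

#[derive(Debug, PartialEq)]
pub struct Class {
    pub name: String,
    pub extends: Option<String>,
}

// Big parser composed from the small ones:
//   class <ident> [ : <ident> ] { }
pub fn class<'a>(input: &'a [&'a str]) -> PResult<'a, Class> {
    let ((), input) = token(input, "class")?;
    let (name, input) = ident(input)?;
    // Optional part: try the sub-parser, fall back to the unchanged input.
    let (extends, input) = match token(input, ":") {
        Some(((), rest)) => {
            let (parent, rest) = ident(rest)?;
            (Some(parent), rest)
        }
        None => (None, input),
    };
    let ((), input) = token(input, "{")?;
    let ((), input) = token(input, "}")?;
    Some((Class { name, extends }, input))
}
```

Each sub-parser consumes what it recognizes and hands the remaining tokens to the next one, which is the same shape that the real parsers have when they fill out an AST struct field by field.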

The parsing code — the bunch of impl Synom and parser combinators that gnome-class uses — is in parser/mod.rs.

We define parsers for the constructs in the gobject_gen! macro that are not normally part of Rust, like the class item and the signal keyword. In the deep part of these structures, we use plain Syn structs like syn::FnArg to represent function arguments, or syn::Ident for identifiers.

High-level Internal Representation

Constraining Rust features to GObject features

GObject's methods may look like normal function definitions, but they do not support all the features that full-fledged Rust functions (or trait methods) have: GObject doesn't support generics or attributes, and it supports a limited set of argument types — specifically, only types that can be represented by GObject Introspection.

So, while the AST directly uses syn::FnArg for function arguments in ast::ImplItemMethod, we "limit" their features by creating a custom hir::FnArg type that only supports the following:

// this is in hir/mod.rs
pub enum FnArg<'ast> {
    SelfRef(Token!(&), Token!(self)),
    Arg {
        mutbl: Option<Token![mut]>,
        name: Ident,
        ty: Ty<'ast>,
    }
}

pub enum Ty<'ast> {
    Unit,
    Char(Ident),
    Bool(Ident),
    Borrowed(Box<Ty<'ast>>),
    Integer(Ident),
    Owned(&'ast syn::Path),
}

That is, a function argument is either &self or a named argument of a limited set of possible types, and no attributes/generics/etc.
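The narrowing step can be sketched as a function that accepts a type and either maps it onto the limited set or rejects it. This is a simplification under stated assumptions: the Ty below owns strings instead of borrowing from syn, lower_ty is a hypothetical name, and the rule that rejects generic types is only an illustrative choice for this sketch.

```rust
// Simplified stand-in for hir::Ty: owns strings instead of borrowing from syn.
#[derive(Debug, PartialEq)]
pub enum Ty {
    Unit,
    Char,
    Bool,
    Integer(String),
    Borrowed(Box<Ty>),
    Owned(String),
}

// Map the textual form of a Rust type onto the limited set, or fail.
pub fn lower_ty(src: &str) -> Result<Ty, String> {
    let src = src.trim();
    // A reference wraps whatever the referenced type lowers to.
    if let Some(inner) = src.strip_prefix('&') {
        return Ok(Ty::Borrowed(Box::new(lower_ty(inner)?)));
    }
    // Illustrative restriction: anything with generic parameters is refused.
    if src.contains('<') {
        return Err(format!("generic type `{}` is not supported", src));
    }
    match src {
        "" => Err("missing type".to_string()),
        "()" => Ok(Ty::Unit),
        "char" => Ok(Ty::Char),
        "bool" => Ok(Ty::Bool),
        "i8" | "i16" | "i32" | "i64" | "u8" | "u16" | "u32" | "u64" | "isize"
        | "usize" => Ok(Ty::Integer(src.to_string())),
        // Everything else is treated as a path to an owned type.
        path => Ok(Ty::Owned(path.to_string())),
    }
}
```

With this sketch, lower_ty("&str") yields a Borrowed wrapping an Owned path, while lower_ty("Vec<u32>") is an error.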

Similarly, hir::FnSig only supports what GObject function signatures support, and not everything that is present in a Rust syn::FnSig.

Type conversions between Rust and Glib

Conversions in methods

Consider a method like this:


class Foo {
}

impl Foo {
    virtual pub fn my_method(&self, an_int: u32, a_string: &str) -> bool;
}

If this were C code, we would be using a prototype like this:

gboolean my_method(Foo *foo, guint an_int, const char *a_string);

Here, the Rust types are not the same as the C-side Glib types:

  • bool / gboolean
  • u32 / guint
  • &str / const char *

These conversions of values can be done with the ToGlib and FromGlib family of traits in glib-rs. However, we need to convert the types as well, so that we can generate trampolines.
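For intuition about what these value conversions amount to, here is a sketch that uses only the standard library rather than glib-rs. The gboolean and guint aliases are local stand-ins, and bool_to_glib, bool_from_glib, uint_to_glib, str_to_glib and str_from_glib are hypothetical helper names, not the glib-rs traits.

```rust
use std::ffi::{CStr, CString};
use std::os::raw::{c_char, c_int, c_uint};

// Local stand-ins for the Glib typedefs: gboolean is a C int and guint is
// a C unsigned int.
#[allow(non_camel_case_types)]
pub type gboolean = c_int;
#[allow(non_camel_case_types)]
pub type guint = c_uint;

pub fn bool_to_glib(b: bool) -> gboolean {
    if b { 1 } else { 0 }
}

pub fn bool_from_glib(b: gboolean) -> bool {
    b != 0
}

pub fn uint_to_glib(x: u32) -> guint {
    x as guint
}

// Rust &str -> owned NUL-terminated buffer. The pointer obtained with
// as_ptr() stays valid only while the returned CString is alive.
pub fn str_to_glib(s: &str) -> CString {
    CString::new(s).expect("string contains an interior NUL byte")
}

// C string -> borrowed Rust &str. The caller must pass a valid,
// NUL-terminated, UTF-8 string that outlives the returned reference.
pub unsafe fn str_from_glib<'a>(p: *const c_char) -> &'a str {
    let c_str = unsafe { CStr::from_ptr(p) };
    c_str.to_str().expect("string is not valid UTF-8")
}
```

The string case shows why conversions are more than casts: going to C needs a temporary buffer with a lifetime, and coming back needs a validity assumption about the pointer.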

Extern functions for methods

What does a virtual method look like in GObject? It is a function pointer inside a class structure. The method above would be something like

struct FooClass {
    GObjectClass parent_class;

    gboolean (* my_method)(Foo *foo, guint an_int, const char *a_string);
};

By convention, C code implements a public function that calls this virtual method by dereferencing the function pointer:

gboolean
foo_my_method (Foo *foo, guint an_int, const char *a_string)
{
    FooClass *klass = FOO_GET_CLASS(foo);

    /* Call through the vtable and pass the result back to the caller. */
    return (* klass->my_method) (foo, an_int, a_string);
}

This function does the following:

  • Given a foo instance, find its class structure.

  • Dereference the klass->my_method function pointer and call into it.

Language bindings expect this public function to be present: they call into it so that the function can do its own argument checking and so on. The gnome-class code generator must generate an ABI-compatible function as a pub unsafe extern "C" fn. We do this in imp::extern_methods:

#[no_mangle]
pub unsafe extern "C" fn #ffi_name(this: *mut #InstanceNameFfi,
                                   #inputs)
    -> #output
{
    #callback_guard

    let klass = (*this).get_class();
    // We unwrap() because klass.method_name is always set to a method_trampoline
    (klass.#name.as_ref().unwrap())(this, #args)
}

Note that this function: a) takes a raw pointer to an FFI struct for the instance on which the method is being called; b) calls a function pointer inside the klass vtable, with C types for arguments. In effect, this is as if we had written a C function that just calls the function pointer inside the vtable.

Trampolines

So far, we have a Rust function callable from C, which calls a function pointer with C types. We need to do a few things to glue this nicely to Rust code:

  • Go from the this (a raw pointer to an FFI instance structure) in the function above, to a Rust &self.

  • Convert C types from arguments into Rust types, with glib-rs.

This is what a trampoline does: it converts the arguments and obtains the &self. We generate trampolines in imp::instance_slot_trampolines.
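A hand-written trampoline for my_method might look roughly like the sketch below. It is not the generated code: FooFfi, its threshold field, and my_method_trampoline are hypothetical, the conversions use only the standard library instead of glib-rs, and the method body is made up so that the result depends on both arguments.

```rust
use std::ffi::CStr;
use std::os::raw::{c_char, c_int, c_uint};

// Hypothetical FFI instance struct, as C code would see it.
#[repr(C)]
pub struct FooFfi {
    pub threshold: c_uint,
}

impl FooFfi {
    // The "real" Rust method, with Rust types. The body is invented for
    // this sketch.
    pub fn my_method(&self, an_int: u32, a_string: &str) -> bool {
        (a_string.len() as u32) + an_int > self.threshold
    }
}

// The trampoline: callable from C, it converts everything and forwards.
// The caller must pass a valid instance pointer and a valid C string.
pub unsafe extern "C" fn my_method_trampoline(
    this: *mut FooFfi,
    an_int: c_uint,
    a_string: *const c_char,
) -> c_int {
    // Raw pointer to the FFI instance -> Rust &self.
    let instance: &FooFfi = unsafe { &*this };

    // C argument types -> Rust argument types.
    let an_int = an_int as u32;
    let c_str = unsafe { CStr::from_ptr(a_string) };
    let a_string: &str = c_str.to_str().unwrap_or("");

    // Call the Rust method, then convert the result back to a C type.
    let result: bool = instance.my_method(an_int, a_string);
    result as c_int
}
```

The function pointer stored in the class vtable would point at such a trampoline, so C callers never see Rust types.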

Argument conversions

Given a function signature like

virtual pub fn my_method(&self, an_int: u32, a_string: &str) -> bool;

we need to generate a few things:

  • The return type as a Glib type, i.e. bool gets translated to gboolean.

  • Input arguments but with Glib types, i.e. an_int: u32, a_string: &str gets translated to an_int: u32, a_string: *const libc::c_char.

  • Just the list of Rust types, i.e. u32, &str for use in a Fn closure declaration: Fn(&Self, u32, &str) that doesn't have argument names.

  • Each argument value converted from a Rust type to a Glib type: <u32 as ToGlib>::to_glib(&an_int), <&str as ToGlibPtr>::to_glib_none(a_string).

  • Each argument value converted from a Glib type to a Rust type: <u32 as FromGlib<_>>::from_glib(an_int), <&str as FromGlibPtrBorrow<_>>::from_glib_borrow(a_string).

  • etc.

Our representation of a method or signal signature is hir::FnSig. It provides methods like output_glib_type() or input_args_to_glib_types() that generate the conversions above. This is done by wrapping the FnSig's fields into helper types, and then those helper types have impl ToTokens; those implementations generate the appropriate code.
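The helper-type pattern can be imitated without syn or quote. In the sketch below, std::fmt::Display plays the role that ToTokens plays in the text, the signature is stored as plain strings, and input_args_with_glib_types and InputArgsWithGlibTypes are hypothetical names invented for this example (only output_glib_type is a name taken from the text).

```rust
use std::fmt;

// Toy signature: argument (name, type) pairs and a return type, as text.
pub struct FnSig {
    pub name: String,
    pub inputs: Vec<(String, String)>,
    pub output: String,
}

// Translate a Rust type to the Glib-side type mentioned in the text.
fn glib_type(rust_ty: &str) -> &str {
    match rust_ty {
        "bool" => "gboolean",
        "&str" => "*const libc::c_char",
        other => other,
    }
}

// Helper wrapper: knows how to print the inputs with Glib types.
pub struct InputArgsWithGlibTypes<'a>(&'a FnSig);

impl FnSig {
    pub fn output_glib_type(&self) -> &str {
        glib_type(&self.output)
    }

    pub fn input_args_with_glib_types<'a>(&'a self) -> InputArgsWithGlibTypes<'a> {
        InputArgsWithGlibTypes(self)
    }
}

// Display stands in for ToTokens: the wrapper generates its own output.
impl<'a> fmt::Display for InputArgsWithGlibTypes<'a> {
    fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
        for (i, (name, ty)) in self.0.inputs.iter().enumerate() {
            if i > 0 {
                write!(f, ", ")?;
            }
            write!(f, "{}: {}", name, glib_type(ty))?;
        }
        Ok(())
    }
}
```

The benefit of the wrapper approach is that a code template can simply interpolate sig.input_args_with_glib_types() and let the wrapper decide what to emit.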

Errors and Diagnostic Handling

As with any procedural macro, gnome-class goes to great lengths to ensure that errors are handled properly throughout the macro. The errors that we're interested in fall into a few categories:

  • Parse errors. For example the tokens inside of a gobject_gen! { ... } invocation are invalid or malformed.

  • AST errors. While it's syntactically valid to define the same function twice, it's semantically invalid to do so. This class of errors is any error which originates in the procedural macro while it's generating the final set of tokens.

  • User code errors. For example, if the user places let x: u32 = "foo" into a function, that will generate an error at compile time.

Each of these error cases is handled slightly differently, so let's go over them in turn.

Parse Errors

The first step of the gobject_gen! macro is to parse the input into an internal AST representation. This parsing operation is fallible, and errors can happen at any time!

All parsing happens in src/parser/* and it generates an ast::Program. The trick for handling errors here is all related to syn, the parsing library that we're using. The syn crate provides a Parse trait which is used to define custom parsers. Each custom parser can be defined in terms of other parsers as well. The implementation of Parse for ast::Program transitively uses many implementations of Parse for items already in syn.

The parsers themselves defined in this macro are each responsible for error handling. Errors can be generated when a sub-parser fails or explicitly generated via syn methods. The syn crate provides many useful opportunities to produce good error messages during parsing, for example pointing directly at an erroneous token and indicating what expected tokens were there.

The tl;dr of handling parse errors is "we use syn and it just works".

AST Errors

Things get a little more interesting with AST errors or other semantic errors that are detected after parsing is completed. Outside of parsing we're not using syn's framework of error handling, but we still use syn::parse::Error for our fundamental error type!

The first thing you'll notice is that almost all functions in the procedural macro are fallible, returning a Result<T>. This is a typedef for Result<T, Errors> where the Errors type is defined in the src/errors.rs module. An instance of Errors represents a list of errors to present to the user. A list is used here so that as many errors about the AST as possible can be collected and presented to the user, rather than forcing them to go through errors one at a time.

The Errors type is a list of syn::parse::Error errors, and it also implements From for the syn error type. Typically the Errors type is only constructed in loops where each iteration is fallible (and errors are collected across iterations). Otherwise it's vastly more common to create only one error and return it via the From impl.
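The collect-in-a-loop pattern can be sketched with toy types. Here ToyError stands in for syn::parse::Error (with a line number instead of a Span), MacroResult stands in for the crate's Result<T> alias, and check_signal and check_signals are hypothetical functions; none of this is the actual contents of src/errors.rs.

```rust
// Toy stand-in for syn::parse::Error: a message plus a line number.
#[derive(Debug, Clone, PartialEq)]
pub struct ToyError {
    pub line: usize,
    pub message: String,
}

// A list of errors to present to the user all at once.
#[derive(Debug, Default, PartialEq)]
pub struct Errors {
    pub list: Vec<ToyError>,
}

// A single error converts into a one-element list.
impl From<ToyError> for Errors {
    fn from(e: ToyError) -> Errors {
        Errors { list: vec![e] }
    }
}

// Stand-in for the alias that the text calls Result<T>.
pub type MacroResult<T> = std::result::Result<T, Errors>;

// Fallible check for one item: creates one error and relies on From.
fn check_signal(line: usize, name: &str, seen: &mut Vec<String>) -> MacroResult<()> {
    if seen.iter().any(|s| s == name) {
        let err = ToyError {
            line,
            message: format!("signal `{}` declared twice", name),
        };
        return Err(err.into());
    }
    seen.push(name.to_string());
    Ok(())
}

// Loop where each iteration is fallible: keep going and collect everything.
pub fn check_signals(names: &[&str]) -> MacroResult<()> {
    let mut seen = Vec::new();
    let mut errors = Errors::default();
    for (i, name) in names.iter().enumerate() {
        if let Err(e) = check_signal(i + 1, name, &mut seen) {
            errors.list.extend(e.list);
        }
    }
    if errors.list.is_empty() {
        Ok(())
    } else {
        Err(errors)
    }
}
```

With the input a, b, a, b, both duplicates are reported in one pass instead of stopping at the first one.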

The primary way to create an Error is via the bail! macro:


bail!(some_item, "my message here");

The some_item argument must implement the ToTokens trait and the error returned will point to the spans of some_item. This is a convenient and lightweight way of creating a custom error message on a specified set of spanned tokens. Internally this uses syn::parse::Error::new_spanned to create an error which actually spans the tokens represented by some_item.
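A macro of this shape can be sketched in a few lines. The version below is an approximation with toy types: the Spanned trait stands in for ToTokens, SpanError::new_spanned stands in for syn::parse::Error::new_spanned, and Method, check_method and the underscore rule are hypothetical and exist only to exercise the macro.

```rust
// Toy stand-in for ToTokens: anything that can report where it came from.
pub trait Spanned {
    fn line(&self) -> usize;
}

// References to spanned items are spanned too.
impl<'a, T: Spanned> Spanned for &'a T {
    fn line(&self) -> usize {
        (**self).line()
    }
}

#[derive(Debug, PartialEq)]
pub struct SpanError {
    pub line: usize,
    pub message: String,
}

impl SpanError {
    // Plays the role of syn::parse::Error::new_spanned in this sketch.
    pub fn new_spanned<T: Spanned>(item: &T, message: String) -> SpanError {
        SpanError { line: item.line(), message }
    }
}

// Build an error that points at `$item`, then return early with it.
macro_rules! bail {
    ($item:expr, $($msg:tt)*) => {
        return Err(SpanError::new_spanned(&$item, format!($($msg)*)).into())
    };
}

pub struct Method {
    pub name: String,
    pub line: usize,
}

impl Spanned for Method {
    fn line(&self) -> usize {
        self.line
    }
}

// Hypothetical semantic check that uses the macro.
pub fn check_method(m: &Method) -> Result<(), SpanError> {
    if m.name.starts_with('_') {
        bail!(m, "method `{}` must not start with an underscore", m.name);
    }
    Ok(())
}
```

The trailing .into() in the macro is what lets the same bail! work in functions that return either a single error or a list type with a From impl.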

With this idiom you'll find bail! used liberally throughout the library. Almost all semantic errors are created and returned through this macro (which is similar to the failure crate's own version of bail!). The first argument is typically whatever token is being examined or construct that's relevant, and is used to provide context for the error to ensure the user sees not only the error message but where in the code it's actually pointing to.

Note that there is also a format_err! macro to create an instance of syn::parse::Error if necessary.

User Errors

The final class of errors has to do with errors in user-written code, such as type errors or borrow-check errors. These errors do not come from gobject_gen! or the macro here, but rather from the compiler. If this happens, though, we want to make sure that the compiler errors are presented in a meaningful fashion.

This class of errors is largely transparently handled by simply using syn. The syn crate preserves all Span information of all tokens, which means that all error messages will be appropriately positioned by rustc. The crucial aspect of this error handling is ensuring that the Span information is not erased or forgotten from the input tokens, as the Span on each token is used to generate compiler diagnostics.

FAQ

The above describes a few high-level classes of errors and how gobject_gen! handles them, but there are also various questions about how other pieces work! Here are some common questions that may arise:

How does this all work?

All of this is fundamentally built on the concept of Span and the ability for a macro to expand to arbitrary tokens, including other macro invocations. A Span represents a pointer to a part of the code, and all of the tokens in the original TokenStream are annotated with a span of where they came from. These Span objects are then used to set spans on the returned TokenStream, or otherwise tokens may be preserved as-is in the output. By ensuring as many tokens as possible have correct Span information we can have the highest quality diagnostics from the compiler.

If an error actually happens then we'll bubble out an Err(error_list) all the way to the entry point of the macro. We still have to return a TokenStream though! To do this we convert the error_list to a TokenStream by iterating over each error and converting it to a TokenStream. The way syn::parse::Error is converted to tokens looks like:


compile_error!("your custom message");

It generates an invocation of the compile_error! macro which is a way to produce custom error messages when compiling Rust code. This macro is defined by the Rust compiler.

By controlling the span information on each of these tokens (the compile_error identifier, the ! punctuation, and the ( ... ) group) we can control where the error message is pointed to. The implementation will adjust the span of each of these tokens generated to the tokens relevant to the error messages, causing rustc to produce a directed error message at the tokens we want.
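The "iterate over each error and convert it" step can be sketched with plain strings in place of token streams. This sketch deliberately leaves out spans, which are the part that makes the real mechanism useful; ToyError, to_compile_error and errors_to_output are hypothetical names.

```rust
// Toy error: only the message. A real error would also carry a Span.
pub struct ToyError {
    pub message: String,
}

// Render one error as the source text of a compile_error! invocation.
// The {:?} formatter quotes and escapes the message like a string literal.
pub fn to_compile_error(e: &ToyError) -> String {
    format!("compile_error!({:?});", e.message)
}

// Render the whole list: one invocation per collected error.
pub fn errors_to_output(errors: &[ToyError]) -> String {
    errors
        .iter()
        .map(to_compile_error)
        .collect::<Vec<_>>()
        .join("\n")
}
```

Since every error becomes its own compile_error! invocation, rustc reports each of them instead of stopping at the first.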

Why are my errors pointing at the macro invocation?

The "default span" is created with Span::call_site() which represents the call-site of the macro, or the macro invocation. This Span is used by default for all tokens generated by quote! (liberally used to create TokenStream). If an error happens on tokens that point to Span::call_site() then the error will look like it comes from the macro invocation.

This typically happens when the macro itself generates invalid code. For example if you were to return quote! { let x: u32 = "foo"; } then that's a type error but the error message will point to the entire macro invocation (of gobject_gen!, not quote!) due to the usage of Span::call_site() on each of these tokens.

One helpful way to investigate these errors is to use the cargo expand subcommand. That subcommand will print out the output of the macro, allowing you to manually inspect the output or otherwise run it through rustc to figure out where the error is happening.

Testing strategy

Unit tests

Normally one writes Rust unit tests with functions marked with the #[test] attribute. This makes cargo test create little programs with the test functions.

However, gnome-class is not a normal library. It defines a procedural macro that runs at compilation time of the user's program. We need to create tests that are run at compilation time.

To do this, we have a second procedural macro called testme!(). It gets called from tests/lib-unit-tests.rs as a normal procedural macro. The macro implementation, in src/lib.rs, calls functions from the various modules in gnome-class. These functions work like normal unit tests.

Adding a new unit test

Create a test function in one of the files under src/, and make sure that it ultimately gets called by testme() in src/lib.rs. Your test will run as part of the tests/lib-unit-tests program. Unfortunately we cannot run individual unit tests with this scheme; all the tests must be run in a single shot.

Integration tests

Our integration tests consist of full invocations of the gobject_gen! macro; they live in the tests/ directory.

Things to test for:

  • Can the resulting GObjects be instantiated?

  • Do internal fields get dropped when an object is finalized?

  • Can one call methods? Do they come with the correct arguments and argument types?

  • Can one override virtual methods?

  • Can signals be connected and emitted? Do they have the correct arguments?

  • Can properties be read/written? Do they have the correct argument types?

  • Can we declare interfaces? Can we implement classes with those interfaces?

  • Can we inherit from an existing class? Can we implement an existing interface?

  • Do class/interface structs have the correct size?

  • Does the generated C-callable API work?
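As an example of the kind of property these tests look at, the sketch below models the "do internal fields get dropped when an object is finalized?" question without any GObject machinery. It is only an analogy: Priv, DropFlag, new_flag, instance_new and instance_finalize are hypothetical, with Box::into_raw and Box::from_raw standing in for handing the private data to C and getting it back at finalization.

```rust
use std::cell::Cell;
use std::rc::Rc;

// A field that records the moment it is dropped.
pub struct DropFlag(pub Rc<Cell<bool>>);

impl Drop for DropFlag {
    fn drop(&mut self) {
        self.0.set(true);
    }
}

// Hypothetical per-instance private data.
pub struct Priv {
    pub flag: DropFlag,
}

pub fn new_flag() -> Rc<Cell<bool>> {
    Rc::new(Cell::new(false))
}

// Stand-in for instance creation: ownership leaves Rust as a raw pointer.
pub fn instance_new(flag: Rc<Cell<bool>>) -> *mut Priv {
    Box::into_raw(Box::new(Priv { flag: DropFlag(flag) }))
}

// Stand-in for finalize: take ownership back so that the fields get dropped.
// The pointer must come from instance_new and must not be used afterwards.
pub unsafe fn instance_finalize(p: *mut Priv) {
    drop(unsafe { Box::from_raw(p) });
}
```

A test then checks that the flag is false after creation and true after finalization.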

Mixed tests

Mixed tests are meant to test the integration between Rust objects and C code. They are a subset of the integration tests and consist of four parts:

  • The Rust code (defined in tests/testname.rs)
  • The C code (defined in mixed_tests/testname.c)
  • The glue crate (mixed_tests/Cargo.toml with crate name gobject_gen_test)
  • The code generated by gobject_gen (mixed_tests/rustgen/testname.rs)

Rust code

The Rust code is where the test begins. It contains Rust-defined GObjects and the code that tests them. It can also contain calls to C functions (not related to GObject); those functions are responsible for executing the C parts of the test.

C-code

This code is responsible for running the C side of the test. It can check whether it can create a type defined in the Rust part, use a type defined in Rust, or subclass a type implemented in Rust. It can also check whether the generated header file fulfills its needs.

Glue-crate

The crate gobject-gen-test, defined in mixed_tests, contains the C functions. Its build.rs builds all the C source files. The compiled crate exposes the entry points of the C code. It is linked only into the tests crate and not into the main crates, which keeps the C test code out of the final shared object.

Generated code

To use a Rust-defined GObject in C, the C code needs a basic definition of it in a header file. That definition would only be available after the procedural macro has run, which is too late for our case. Therefore these header files already exist in mixed_tests/rustgen. The Rust side of the test can compare the pre-existing header file with the header file it would generate, and fail the test if those files do not match.
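The comparison itself can be as simple as the sketch below, which reports the first line where the two texts differ. The function name first_mismatch is hypothetical, and reading the two files from disk is left out; the actual tests may compare the files differently.

```rust
// Return None when both texts match line by line, or the 1-based number
// of the first line that differs (including a missing line) otherwise.
pub fn first_mismatch(expected: &str, generated: &str) -> Option<usize> {
    let mut e = expected.lines();
    let mut g = generated.lines();
    let mut line = 1;
    loop {
        match (e.next(), g.next()) {
            // Both texts ended at the same time: no difference found.
            (None, None) => return None,
            // Same content on this line: move on to the next one.
            (a, b) if a == b => line += 1,
            // Different content, or one text is shorter than the other.
            _ => return Some(line),
        }
    }
}
```

Reporting the line number makes a failing test easier to act on than a bare "files differ" message.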

Compatibility

The mixed tests currently depend on pkg-config, so they probably do not work on Windows.

Glossary

  • Abstract Syntax Tree (AST): Data structures which closely match the syntax of some code.
  • GI: Short for GObject Introspection.
  • Glib: A C library with common utilities like hash tables, linked lists, and portability aids. Also contains the GObject system.
  • Glib-rs: The Rust bindings for Glib and GObject. They include macros and wrappers for the GType system.
  • GObject: An object system for C, used by GTK+ and GNOME programs. It adds classes to C.
  • GObject Introspection (GI): A system which generates machine-readable descriptions of the API in libraries which contain GObjects. These descriptions can be used to generate language bindings automatically. See the Overview of GObject Introspection.
  • GType: A dynamic type system for C, which is the foundation for GObject.
  • procedural macro: User-supplied code that runs in the Rust compiler; it lets one extend the language with a custom parser and code generator.