Using `proc-macro` in Rust

There are a lot of good sources for learning about Rust's proc-macros. I can recommend two freely available ones:

  • The Rust Programming Language - the official Rust book, freely available. Chapter 19 is dedicated to macros and contains a detailed walkthrough of how to write a derive macro. It also explains syn and quote a little.
  • The Rust Reference - explains the different types of macros and has got code samples for each of them. I find it very useful as a general reference.

I have also found this post on LogRocket's blog very helpful.

You can read all day long about proc-macros but you won't get anywhere until you start writing code. Fork Rust Latam: procedural macros workshop by David Tolnay, read the instructions and do the exercises. You will not regret it.

What I wish I knew about proc-macros in advance

As I already mentioned, proc-macros are different and require a bit of a mind shift. It's hard to write an exhaustive list of these shifts, so I will share a few which helped me.

Debugging macros

Macros are expanded during compilation so most of the regular debugging techniques don't work with them. Using a debugger is definitely not an option. I followed the advice from the workshop and it was enough to figure out what is going on with my macros. It suggests two approaches - cargo expand and printing traces.

cargo-expand is another project by David Tolnay. It is a binary crate which you can install on your system. When invoked, it replaces all macro invocations in a given source with the actual code the macros produce and dumps the result to stdout. The command supports the regular target selection used in cargo build and cargo test, so you can expand a single module, test, binary, etc.
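For reference, a typical session looks something like this (the target names are made up for illustration):

cargo install cargo-expand
cargo expand                      # expand the default target of the current crate
cargo expand --test my_test       # expand a single test target
cargo expand some::module         # expand only the given module path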

The other approach mentioned is just to print the tokens which your macro generates on stderr. This is especially useful when you have messed something up and your macro generates invalid code. The example from the repo:

eprintln!("Tokens: {}", tokens);

Note that the code will be printed during compilation, not execution. Look for it in the output from cargo check or cargo build.
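To show where such a trace usually goes, here is a minimal sketch using the same function-like macro shape that appears later in this post; the body just echoes its input, which stands in for whatever your macro really generates:

use proc_macro::TokenStream;

#[proc_macro]
pub fn my_proc_macro(input: TokenStream) -> TokenStream {
    let tokens = input; // stand-in for the code your macro actually generates
    eprintln!("Tokens: {}", tokens); // appears in the `cargo check`/`cargo build` output
    tokens
}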

Another useful compiler option for fixing problems is -Zproc-macro-backtrace. If your macro panics during expansion you can use this option to see a backtrace which helps to figure out what is wrong. A convenient way to run it via cargo is

RUSTFLAGS="-Z proc-macro-backtrace" cargo +nightly <cargo cmd>

proc-macro and proc-macro2

This was very confusing for me. Why two versions? Why is the first version still alive if 2 is superior? There are good answers to these questions, but they are scattered around the internet. I will try to summarize them. First, why are there two versions? In a nutshell, because proc-macro types can't exist outside of proc-macro code.

proc-macro2 is a wrapper around the procedural macro API of the compiler's proc_macro crate. This library serves two purposes:

  • Bring proc-macro-like functionality to other contexts like build.rs and main.rs. Types from proc_macro are entirely specific to procedural macros and cannot exist in code outside of a procedural macro. Meanwhile proc_macro2 types can exist anywhere, including non-macro code (see the sketch after this list). By developing foundational libraries like syn and quote against proc_macro2 rather than proc_macro, the procedural macro ecosystem becomes easily applicable to many other use cases and we avoid reimplementing non-macro equivalents of those libraries.
  • Make procedural macros unit testable. As a consequence of being specific to procedural macros, nothing that uses proc_macro can be executed from a unit test. In order for helper libraries or components of a macro to be testable in isolation, they must be implemented using proc_macro2.
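To make the first of those purposes concrete, here is a minimal sketch of proc_macro2 and quote used in ordinary, non-macro code (say build.rs or a small generator binary); nothing in it is tied to this post's project:

use proc_macro2::TokenStream;
use quote::quote;

// Ordinary code, no #[proc_macro] attribute anywhere: a proc_macro2::TokenStream
// can be built, printed or written to a file like any other value.
fn main() {
    let generated: TokenStream = quote! {
        pub fn answer() -> u32 { 42 }
    };
    println!("{}", generated);
}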

Why do the two coexist? The input of each proc macro is the TokenStream type from the proc-macro crate. You can't escape from this - it has to be in your outer API. But inside your implementation you should use proc_macro2. It is more convenient and more testable.

Another thing that caused a lot of confusion in the beginning was how these two versions work together. syn and quote work with proc-macro2, while the entry point requires proc-macro. The result for me was a bunch of errors like these

expected struct `TokenStream2`, found struct `proc_macro::TokenStream`

or its reversed twin:

expected struct `proc_macro::TokenStream`, found struct `TokenStream2`

and also

the trait `ToTokens` is not implemented for `proc_macro::TokenStream`

This drove me crazy. The solution is very simple: use proc-macro at the API level and proc-macro2 everywhere else. Here is a sample to show what this means. You have got a simple Rust crate with the following structure

Cargo.toml
    src
        lib.rs
        my_proc.rs

Cargo.toml looks like this

[package]
name = "proc-macro-post"
version = "0.1.0"
edition = "2021"

[lib]
proc-macro = true

[dependencies]
syn = {version = "1.0", features = ["full"]}
quote = "1.0"
proc-macro2 = "1.0"

This is a regular Cargo.toml file. In [lib] there is proc-macro = true, indicating that the crate contains a proc-macro. Note that among the dependencies we have got proc-macro2. The proc_macro crate itself is provided by the compiler, so it does not need to be listed.

Now src/lib.rs

use proc_macro::TokenStream;
use syn::parse_macro_input;

mod my_proc;

#[proc_macro]
pub fn my_proc_macro(input: TokenStream) -> TokenStream {
    let input = parse_macro_input!(input);
    my_proc::my_proc_impl(input).into()
}

This is the main entry point I wrote about before. Note that we use TokenStream from proc_macro because this is the API level. Also note that the my_proc module is included here. The into() call on the last line converts the proc-macro2 tokens returned by the implementation into proc-macro ones. We will get to it soon.

And finally src/my_proc.rs

use proc_macro2::TokenStream;
use quote::quote;

pub fn my_proc_impl(input: TokenStream) -> TokenStream {
    quote!(println!("Answer: {}", #input))
}

This is the implementation of the macro and it uses proc_macro2. The my_proc_impl function returns proc_macro2::TokenStream and the into() call in the previous file converts it to proc_macro::TokenStream.
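The conversion works in both directions through the From/Into implementations provided by proc-macro2. A minimal sketch (the helper name is made up; it only compiles inside a proc-macro crate, because proc_macro types cannot exist anywhere else):

use proc_macro::TokenStream;
use proc_macro2::TokenStream as TokenStream2;

// Hypothetical helper showing both directions of the conversion.
fn round_trip(input: TokenStream) -> TokenStream {
    let inner: TokenStream2 = input.into(); // proc_macro -> proc_macro2
    inner.into()                            // proc_macro2 -> proc_macro
}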

To recap:

  1. lib.rs declares the API of the macro and uses proc-macro. It calls an impl function from another module.
  2. my_proc.rs contains the impl function and works with proc-macro2.
  3. into() is used to convert from proc_macro2::TokenStream to proc_macro::TokenStream.

Organizing your code

When I was writing my first macro I did not structure my code very well. I used a single, very long function doing a lot of work. This is a bad idea because proc-macros are just code, and like any other code they need to be easy to read and test. A better way is to encapsulate the logic in functions which return syn structs and at some point stitch them together with quote!. Let's have a look at a macro which prints a message (hardcoded in our case) and then the answer from our initial sample. We split the code into two functions: the first one prints the message and the second one the answer itself.

use proc_macro2::TokenStream;
use quote::quote;
use syn::{parse_quote, ExprMacro};

pub fn my_proc_impl(input: TokenStream) -> TokenStream {
    let progress = progress_message("Thinking about the answer".to_string());
    let answer = answer(input);

    quote!(
        #progress;
        #answer;
    )
}

fn progress_message(msg: String) -> ExprMacro {
    parse_quote!(println!(#msg))
}

fn answer(result: TokenStream) -> ExprMacro {
    parse_quote!(println!("Answer: {}", #result))
}

Another pattern I have seen is for all the functions to return TokenStream and, again, to combine them with quote!. The benefit is that you are not limited to a single type, plus you can handle an unknown number of elements. For example:

use proc_macro2::TokenStream;
use quote::quote;

pub fn my_proc_impl(input: TokenStream) -> TokenStream {
    let mut result = Vec::new();

    result.push(progress_message("Thinking about the answer".to_string()));
    result.push(answer(input));

    quote!(
        #(#result);*
    )
}

fn progress_message(msg: String) -> TokenStream {
    quote!(println!(#msg))
}

fn answer(result: TokenStream) -> TokenStream {
    quote!(println!("Answer: {}", #result))
}

#(#result);* is a repetition feature of quote. It expands all elements of the vector and puts a ; between them.
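One detail to be aware of: the ; in #(#result);* is a separator, so nothing is emitted after the last element. If you want a ; after every element you can move it inside the repetition. A small sketch (the helper name is made up):

use proc_macro2::TokenStream;
use quote::quote;

// `#(#items;)*` emits `item1; item2; ... itemN;` - a semicolon after every
// element, which is usually what you want when the items are statements.
fn join_as_statements(items: Vec<TokenStream>) -> TokenStream {
    quote!( #(#items;)* )
}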

The examples above are of course not universal but they were a good start for me. Do whatever works for you, but do it in a timely manner, before you reach the point where you have got one big unmaintainable function.

A few words about syn and quote

These are mandatory libraries for working with proc-macros.

syn parses the input Rust code (a TokenStream) into structures. With them you can generate new code, modify the existing code or remove parts of it. Have a look at the list of structs in syn. There is one for every piece of Rust syntax you can think of. Take ExprClosure for example. It represents a closure expression like |a, b| a + b. All relevant parts of the expression are exposed as struct fields; output, for instance, is its return type. Each field is in turn another structure representing a part of the syntax. You start from a single struct representing some construct, and it has other structs chained to it which together represent the whole code fragment. This is the AST (Abstract Syntax Tree) pattern mentioned in the documentation.
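To make this concrete, here is a small sketch (assuming syn 1.x with the full feature, as in the Cargo.toml above) which parses a closure and looks at a couple of its fields; the closure gets explicit types so that output is populated:

use syn::{parse_quote, ExprClosure, ReturnType};

fn inspect_closure() {
    // Parse a closure expression into its syn representation.
    let closure: ExprClosure = parse_quote!(|a: i32, b: i32| -> i32 { a + b });
    assert_eq!(closure.inputs.len(), 2); // the parameters `a` and `b`
    // `output` holds the return type, here the explicit `-> i32`.
    assert!(matches!(closure.output, ReturnType::Type(_, _)));
}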

The quote crate provides the quote! macro, which takes Rust source code as input and converts it to a TokenStream. The result can be returned from the macro you are writing or processed further. It also does 'quasi-quoting': you can use variables from the scope where quote! is executed and the macro will embed them in the resulting TokenStream. Let's have a look at a quick example:

use proc_macro2::TokenStream;
use quote::quote;

fn generate_getter() -> TokenStream {
    const ANSWER: u32 = 42;
    quote! {
        fn get_the_answer() -> u32 {
            #ANSWER
        }
    }
}

Note the #ANSWER syntax inside the code block of quote!. It refers to the const ANSWER defined at the beginning of the function. This is called variable interpolation and works with any type implementing the ToTokens trait. There are implementations for the primitive types, strings and all the syn structs.
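Interpolation is not limited to constants. Here is a sketch (the function and field names are made up) which builds an identifier at expansion time with quote's format_ident! macro and splices it into the generated code:

use proc_macro2::TokenStream;
use quote::{format_ident, quote};

// `format_ident!` works like `format!` but produces a proc_macro2::Ident,
// which implements ToTokens and can be interpolated like any other value.
fn generate_getter_for(field: &str) -> TokenStream {
    let fn_name = format_ident!("get_{}", field);
    let field = format_ident!("{}", field);
    quote! {
        pub fn #fn_name(&self) -> u32 {
            self.#field
        }
    }
}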

Some useful functions from syn and quote

Now let's see some functions from both crates which I believe are worth knowing. To avoid dead links, all references to the documentation link to a specific version.

Spans and quote_spanned

syn uses spans to represent the location (line and column) of an expression in the original source. This is used mainly for error reporting. All the AST structs implement the Spanned trait, which contains a single function, span(), returning a Span. You can then pass the span around and attach it to errors so that the compiler renders them on the offending lines. Spans are often used with quote_spanned from the quote crate, which generates a TokenStream and attaches a span to it. Let's see a small example which generates a compilation error.

use proc_macro2::TokenStream;
use quote::quote_spanned;
use syn::spanned::Spanned;

pub fn my_proc_impl(input: TokenStream) -> TokenStream {
    quote_spanned!(input.span() => compile_error!("I don't like this...");)
}

The generated compilation error looks like:

   Compiling proc-macro-post v0.1.0 (/home/ceco/projects/proc-macro-post)
error: I don't like this...
 --> src/runner.rs:3:20
  |
3 |     my_proc_macro!(42);
  |                    ^^

error: could not compile `proc-macro-post` due to previous error

Error reporting with syn

I barely touched error reporting in the previous section by mentioning spans and compile_error!. Let's have a look at a more complete example. syn has got its own Error type and a Result alias. Together they make error handling very elegant. Error has got a method named to_compile_error() which turns the error object into a TokenStream invoking compile_error!. You can create an Error instance somewhere within your code and propagate it with the ? operator up to a place where you can handle it by converting it into an actual compilation error.

Let's see an example. We want to write a proc-macro which accepts an integer and prints Answer: INTEGER. However the only accepted value will be 42. For everything else an error will be generated.

Let's modify the example we have used so far.

use proc_macro2::TokenStream;
use quote::quote;
use syn::{parse2, spanned::Spanned, Error, LitInt, Result};

pub fn my_proc_impl(input: TokenStream) -> Result<TokenStream> {
    let span = input.span();
    let ans = parse2::<LitInt>(input)?.base10_parse::<i32>()?;
    if ans != 42 {
        return Err(Error::new(span, "Answer should be 42"));
    }

    Ok(quote!(println!("Answer: {}", #ans);))
}

We import Error and Result from syn. The my_proc_impl function parses the input into a LitInt and extracts its value. base10_parse returns a syn::Error on failure, so we can use ? to propagate the error. Then, if the input is not 42, we return another instance of syn::Error. Its constructor accepts two parameters - a span and an error message. The entry point in src/lib.rs now has to deal with the Result:

use proc_macro::TokenStream;
use syn::parse_macro_input;

mod my_proc;

#[proc_macro]
pub fn my_proc_macro(input: TokenStream) -> TokenStream {
    let input = parse_macro_input!(input);
    my_proc::my_proc_impl(input)
        .unwrap_or_else(|e| e.to_compile_error())
        .into()
}

Here we call my_proc_impl and convert any syn::Error into a compilation error. If we pass a bad value to the proc macro we get a nice compilation error:

error: expected integer literal
 --> src/runner.rs:3:20
  |
3 |     my_proc_macro!("test");
  |                    ^^^^^^

error: could not compile `proc-macro-post` due to previous error

Testing your code

If you use proc-macro2 you can write all kinds of unit and integration tests for your proc macro, and this should be pretty standard. There is one aspect of testing specific to macros though - UI tests. This was confusing for me, because when I hear UI I usually think about graphical user interfaces or web frontends. For proc macros the 'interface' is the code which invokes your macro and, more specifically, the compilation errors it generates. In this context UI tests make sense: a macro has no way to return error codes or throw exceptions, so the errors the compiler emits on its behalf are all the feedback its users get.
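As an aside, here is what the 'pretty standard' part can look like: a sketch of a plain unit test for the my_proc_impl from the error-reporting section, placed at the bottom of src/my_proc.rs (the test names are made up). Because the function only deals with proc_macro2 types, it can be called like any other function:

#[cfg(test)]
mod tests {
    use super::my_proc_impl;
    use quote::quote;

    #[test]
    fn accepts_the_answer() {
        // The happy path returns the generated tokens.
        assert!(my_proc_impl(quote!(42)).is_ok());
    }

    #[test]
    fn rejects_anything_else() {
        // The error carries the message attached via syn::Error::new.
        let err = my_proc_impl(quote!(43)).unwrap_err();
        assert_eq!(err.to_string(), "Answer should be 42");
    }
}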

trybuild is a crate which helps you create UI tests for macros. You write test code which is supposed to compile or to fail to compile. The library checks that the desired outcome happens and, for the failing case, that the expected compilation error is generated. This might sound a bit complicated but it's actually very simple. Let's add a test to the example from the previous section.

Let's add a test in lib.rs

#[test]
fn ui() {
    let t = trybuild::TestCases::new();
    t.compile_fail("tests/ui/*.rs");
}

The code initializes trybuild and adds all .rs files in tests/ui as tests which are supposed to fail. Now let's add a new test in tests/ui/wrong_answer.rs. It will call the proc macro with 43 which should generate an error.

use proc_macro_post::my_proc_macro;

fn main() {
    my_proc_macro!(43);
}

And now we run cargo test


$ cargo test

    <skipped>

    Finished test [unoptimized + debuginfo] target(s) in 6.44s
     Running unittests src/lib.rs (target/debug/deps/proc_macro_post-faaca3c42c745804)

running 1 test

    <skipped>

    Finished dev [unoptimized + debuginfo] target(s) in 10.38s


test tests/ui/wrong_answer.rs ... wip

NOTE: writing the following output to `wip/wrong_answer.stderr`.
Move this file to `tests/ui/wrong_answer.stderr` to accept it as correct.
┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈
error: Answer should be 42
 --> tests/ui/wrong_answer.rs:4:20
  |
4 |     my_proc_macro!(43);
  |                    ^^
┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈┈



test ui ... FAILED

failures:

<skipped>

What trybuild does is compile the test and compare the compilation error with the one in wrong_answer.stderr. If they don't match, the test fails. If the stderr file does not exist, the output is saved in wip/wrong_answer.stderr. You can either move the file by hand or run cargo test with the TRYBUILD=overwrite environment variable set.
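For example, to accept the current output as the expected one:

TRYBUILD=overwrite cargo test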

Conclusion

Macros in Rust do not look that scary after you spend time with them. Quite the opposite - they enable you to do pretty interesting and useful things. You have to keep things under control though.

One more project worth exploring is expander. It expands a proc-macro in a file and uses an include! directive in its place (paraphrased from the project README). It sounds like a lifesaver for the cases when you want to see what code your proc macro produces. For better or worse, I haven't had to use it so far.