Parsing JSON: Part 2 - Using Macros
Part 2 of 2 in a series on creating a JSON parser in Rust

- Jacob
- 18 min read

Introduction
Welcome back! This is the second and final part in a series about building a JSON parser in Rust. In the first part, we built a basic JSON parser from scratch. It worked, but accessing values was clunky and error-prone.
In this part, we’ll extend our JSON parser to support parsing directly into user-defined Rust structs - automatically and type-safely - using procedural macros. Along the way, we’ll explore Rust’s powerful procedural macro system and use it to generate type-safe, boilerplate-free `Parse` implementations. Finally, to show it working, I will plug it into my Hand-rolled Auth project discussed in this post. Let’s get into it!
The Problem
Let’s start by looking at the issues with the current design. While it works, it is cumbersome - here is a simple example:
fn main() {
    let json = r#"{"name": "John Smith", "contact": {"phone": "01234567890"}}"#;
    let result: JsonValue = Parser::parse(json).unwrap();
}
The `result` here is the following object:
Object(
    {
        "name": String(
            "John Smith",
        ),
        "contact": Object(
            {
                "phone": String(
                    "01234567890",
                ),
            },
        ),
    },
)
This may not look bad, but if we wanted to retrieve the phone number, we would need all this code:
match result {
    JsonValue::Object(outer_obj) => {
        match &outer_obj["contact"] {
            JsonValue::Object(inner_obj) => {
                match &inner_obj["phone"] {
                    JsonValue::String(phone_number) => {
                        println!("{phone_number}");
                    }
                    _ => panic!("Phone number should be string, but was not"),
                }
            }
            _ => panic!("Contact should be an object, but was not"),
        }
    }
    _ => panic!("Expected object, but was not"),
}
This, of course, is very inconvenient. If we know the layout is always the same, it would be much nicer to parse it into a pre-defined struct such as:
struct Person {
    name: String,
    contact: Contact,
}

struct Contact {
    phone: String,
}
Then accessing the phone number is as simple as:
let phone_number = result.contact.phone;
println!("{phone_number}");
We can achieve this using a custom implementation of `Parse` for the `Person` struct. However, we don’t want users of this library to have to implement the `Parse` trait themselves for every type they want to parse. Luckily for us, Rust has a powerful macro system that allows us to write code that generates the `Parse` implementation automatically!
Whenever I mention the “user”, I am referring to a developer using this parser as a library
Macros
First, we should explain what a macro is - in this context, it is a piece of code that runs at compile time to modify existing code or generate new code before compilation.
Rust has two categories of macros - declarative and procedural. Procedural macros have three subtypes: function-like, attribute, and derive macros. We are going to be using derive macros (a type of procedural macro) for our purposes.
Rust Derive Macros
A derive macro is a function that takes in a struct and returns additional code to be included in the compilation. As mentioned above, this macro code is executed at compile time - while creating the executable file. This means that we can procedurally generate an implementation of `Parse` when compiling, so the user never needs to write any parsing code by hand.
All the user will need to do is derive the macro like so:
#[derive(JsonDeserialise)]
struct Person {
    // ...
}
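With that single line, the user can parse JSON straight into their struct. As a sketch of the end goal (assuming `Person` has the `name` and `age` fields used in later examples, and that the generic `Parser::parse` entry point from part 1 works for any type implementing `Parse`):
let json = r#"{"name": "John Smith", "age": 30}"#;
// The annotated type drives the parsing - no JsonValue involved
let person: Person = Parser::parse(json).unwrap();
println!("{} is {}", person.name, person.age);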
Here is our derive macro function signature:
#[proc_macro_derive(JsonDeserialise)]
pub fn derive_json_deserialise(input: TokenStream) -> TokenStream {
    // Macro code here
}
This is just standard Rust code, which takes in a stream of tokens (the tokens for the struct), and returns a new stream of tokens (the code we wish to add - in this case, our `Parse` implementation). There are a couple of useful crates (libraries) for both reading and creating token streams - `syn` and `quote` respectively.
Using `syn`, we can parse the token stream into an Abstract Syntax Tree (AST) node with the following line of code:
let input: DeriveInput = syn::parse(input).unwrap();
Then we can check if the AST node is a struct (we will not be covering derive macros for enums or other types in this post):
match &input.data {
    Data::Struct(data) => derive_json_deserialise_struct(&input.ident, data),
    _ => panic!("Cannot derive JsonDeserialise on this type"),
}
Here, `data` is the information specific to a struct (e.g. its properties), which we pass through to our function `derive_json_deserialise_struct`, along with the identifier (the name of the struct).
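For reference, the derive macro lives in its own proc-macro crate, and the snippets in this post rely on the following imports (a sketch - the exact paths may differ slightly between `syn` versions):
use proc_macro::TokenStream;
use quote::quote;
use syn::{Data, DataStruct, DeriveInput, Fields, Ident};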
Throughout this post I use the terms “field” and “property” interchangeably
Generating the Implementation Code Block
Now, in `derive_json_deserialise_struct`, we can do the heavy lifting. Let’s look at the `quote` crate next to see the core idea of code generation. To generate a very basic `Parse` implementation (one that panics with `unimplemented!` when called), we can do the following:
fn derive_json_deserialise_struct(struct_name: &Ident, data: &DataStruct) -> TokenStream {
    // Generated impl block
    let generated_impl = quote! {
        impl Parse for #struct_name {
            fn parse(parser: &mut Parser) -> Result<Self, ParserErr> {
                unimplemented!();
            }
        }
    };
    generated_impl.into()
}
The code inside `quote!` is treated as a template - it’s not executed during macro expansion, but is instead emitted as output Rust code to be compiled later. This is stored in a variable `generated_impl`, and converted into a `TokenStream` before being returned. Inside the `quote!` code, we can write regular Rust code, as well as interpolate variables into it. For example, `#struct_name` will be substituted with the `struct_name` identifier we take into our derive function.
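For our `Person` struct, this stub expands to the code below - `#struct_name` has been replaced with the identifier `Person`. (If you want to inspect the real output of any derive macro, the `cargo expand` tool is handy.)
impl Parse for Person {
    fn parse(parser: &mut Parser) -> Result<Self, ParserErr> {
        unimplemented!();
    }
}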
As structs represent objects in JSON, the code is pretty similar to the object parsing we covered in the previous part, so let’s have a refresher - the current object parsing code looks like the following:
impl<T: Parse> Parse for HashMap<String, T> {
    fn parse(parser: &mut Parser) -> Result<Self, ParserErr> {
        parser.consume(TokenKind::LCurlyBracket)?;
        let mut props = HashMap::new();
        // Loop through all properties, until reaching closing bracket
        while !parser.check(TokenKind::RCurlyBracket)? {
            let token = parser.advance()?;
            match token.kind {
                TokenKind::String(name) => {
                    parser.consume(TokenKind::Colon)?;
                    let value = T::parse(parser)?;
                    props.insert(name, value);
                    // Break out the loop if no more commas
                    if parser.check(TokenKind::Comma)? {
                        parser.advance()?;
                    } else {
                        break;
                    }
                }
                _ => return Err(parser.make_err_prev(ParserErrKind::UnexpectedToken)),
            }
        }
        parser.consume(TokenKind::RCurlyBracket)?;
        Ok(props)
    }
}
I have removed the trailing comma check for readability.
For an explanation of this code, please refer back to Parsing Objects in my previous post.
The Expected Macro Output
Before writing the macro logic, it helps to understand what we want the generated code to look like. Let’s walk through a manual implementation of `Parse` for a concrete struct without macros. We will generalise this later using macros.
We will be writing a `Parse` implementation for the following example struct:
struct Person {
    name: String,
    age: u32,
}
To transform the generic object parsing code into code specific for this struct, we need to rework how we parse the properties. Here are the goals we aim to achieve by parsing into a struct:
- Easier access of data
- Type checking
- Ensuring all properties exist in the object
Easier access of data comes automatically if we return the data in the user’s struct. When initialising a struct in Rust, we must define all the properties straight away. This means we can’t fill them out as we find them. We also can’t parse them one-by-one, in the order defined by the struct, as the order may be different - JSON objects do not have a specific ordering of properties. So, we could continue to use the `HashMap` and just access the properties at the end, like so:
// Create `Person` struct by unwrapping the variables we have parsed
// If one is missing, report an error with `expect`
Ok(Person {
    name: props.get("name").expect("Missing prop 'name'"),
    age: props.get("age").expect("Missing prop 'age'"),
})
This solves the issue of ensuring that all properties exist on the struct: if any property hasn’t been set, `props.get()` will return `None`, and the `expect` will panic, reporting an error message.
The problem with this is that the `HashMap` can only store values of a single type, which in this case must encompass all values, i.e. our `JsonValue` type. We would then need to do some type checking to ensure that the `JsonValue` is the type we expect. This is not ideal for two reasons:
- Complexity of implementation
- It is slow due to redundant checks
The implementation of this would be difficult, as we would need a check for every type, including user-defined structs. This would likely mean implementing a function, for each type, that both checks the type and turns it from a `JsonValue` into the desired type. Speed is also an issue: parsing `JsonValue`s requires checking which type to parse first, and then we would need to check the `JsonValue`’s type again before converting it into the type we desire. This is unnecessary computation.
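To make the complexity concrete, the rejected approach would need a conversion like this for every supported type - a hypothetical sketch, not code we will actually write:
// Hypothetical helper: check that a JsonValue is a string and extract it.
// Every type (including every user-defined struct) would need one of these,
// and it only runs *after* the value has already been parsed into a JsonValue
fn string_from_json_value(value: JsonValue) -> Result<String, ParserErrKind> {
    match value {
        JsonValue::String(s) => Ok(s),
        _ => Err(ParserErrKind::UnexpectedToken),
    }
}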
Luckily, there is a better option! We already have a `parse` function for each type that implements `Parse`. This does the type checking for us - if the type is wrong, the parsing will fail. Additionally, there is no wasted computation - no additional/redundant checks are needed after parsing. Perfect!
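As a refresher, the `Parse` trait from part 1 has roughly this shape (paraphrased from the `impl` blocks shown in this post):
trait Parse {
    // Parse a value of this type from the parser's token stream,
    // or return an error describing what went wrong
    fn parse(parser: &mut Parser) -> Result<Self, ParserErr>
    where
        Self: Sized;
}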
To implement this, we need to do two things:
- Create variables of the correct type for each property (this replaces our HashMap)
- When parsing a property, call the parse function for that property’s type, and store the result in the variables we defined at the start
As the variables will not have a value until set, we make them all type `Option<T>` where `T` is the type of the property. We initially set them to `None` and then fill them in when parsing. In our example, the code will look something like this:
// Variables to keep track of properties we've found, initialised to `None`
let mut name: Option<String> = None;
let mut age: Option<u32> = None;
// Fill in variables when parsing
When parsing, we can check the key (property name) to know which type to parse:
// Instead of inserting into a HashMap,
// depending on `key`, parse a different type, and store in variable
match key.as_str() {
    "name" => name = Some(String::parse(parser)?),
    "age" => age = Some(u32::parse(parser)?),
    // Extra prop provided that isn't on the `Person` struct
    _ => panic!("Unexpected prop: {key}"),
}
Then, after parsing, we can create the struct with all the properties defined at the same time:
// Create `Person` struct by unwrapping the variables we have parsed
// If one is missing, report an error with `expect`
Person {
    name: name.expect("Missing prop 'name'"),
    age: age.expect("Missing prop 'age'"),
}
I use `panic!` and `expect` in these examples to make the code easier to follow. In the full code, I have proper error handling using `Result`s
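As a preview, in the full code the `expect` calls become something like the following, using `ok_or` and the parser’s error helpers (`l_curly_token` is the opening-brace token, captured so the error can point at the object):
Ok(Person {
    name: name.ok_or(
        parser.make_err_from_token(
            ParserErrKind::MissingProperty("name".to_string()),
            &l_curly_token,
        )
    )?,
    // ...and the same pattern for `age`
})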
Great - we now check that all properties exist on the object, they are type-checked, and we have our struct for easy data access! Let’s look at the whole parsing code:
impl Parse for Person {
    fn parse(parser: &mut Parser) -> Result<Self, ParserErr> {
        parser.consume(TokenKind::LCurlyBracket)?;
        // Variables to keep track of properties we've found, initialised to `None`
        let mut name: Option<String> = None;
        let mut age: Option<u32> = None;
        // Loop through all properties, until reaching closing bracket
        while !parser.check(TokenKind::RCurlyBracket)? {
            let token = parser.advance()?;
            match token.kind {
                TokenKind::String(key) => {
                    parser.consume(TokenKind::Colon)?;
                    // Instead of inserting into a HashMap,
                    // depending on `key`, parse a different type, and store in variable
                    match key.as_str() {
                        "name" => name = Some(String::parse(parser)?),
                        "age" => age = Some(u32::parse(parser)?),
                        // Extra prop provided that isn't on the `Person` struct
                        _ => panic!("Unexpected prop: {key}"),
                    }
                    // Break out the loop if no more commas
                    if parser.check(TokenKind::Comma)? {
                        parser.advance()?;
                    } else {
                        break;
                    }
                }
                _ => return Err(parser.make_err_prev(ParserErrKind::UnexpectedToken)),
            }
        }
        parser.consume(TokenKind::RCurlyBracket)?;
        // Create `Person` struct by unwrapping the variables we have parsed
        // If one is missing, report an error with `expect`
        Ok(Person {
            name: name.expect("Missing prop 'name'"),
            age: age.expect("Missing prop 'age'"),
        })
    }
}
There’s nothing new here - we’ve just replaced some of the object parsing code with the new, type-safe code that we’ve just discussed. This code will work for parsing any `Person` object, ensuring `name` exists and is a string, and `age` exists and is an integer. Also, if any extra properties are in the object, it will produce an error message.
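A quick sanity check of the behaviour (a sketch, using the panicking version above):
fn main() {
    // All properties present and correctly typed - parses straight into the struct
    let person: Person = Parser::parse(r#"{"name": "Jane", "age": 42}"#).unwrap();
    assert_eq!(person.age, 42);

    // `{"name": "Jane"}` would panic with "Missing prop 'age'",
    // and `{"name": "Jane", "age": 42, "email": "x"}` with "Unexpected prop: email"
}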
There’s one small problem with this, though - if a prop on the struct has the same name as a variable in the code, it may break the parsing. For example, calling a prop `key` will not work, as there is already a variable called `key` in the code. To get around this, we can encapsulate the variables in a temporary struct:
// Create a temporary struct definition
struct ParsedFields {
    name: Option<String>,
    age: Option<u32>,
}
// Set up the variables, inside the struct
let mut parsed_fields = ParsedFields {
    name: None,
    age: None,
};
We can replace the variable initialisation with the above, then replace any use of `name` or `age` with `parsed_fields.name` and `parsed_fields.age`. This is how we want the final generated code to look:
impl Parse for Person {
    fn parse(parser: &mut Parser) -> Result<Self, ParserErr> {
        parser.consume(TokenKind::LCurlyBracket)?;
        // Create a temporary struct definition
        struct ParsedFields {
            name: Option<String>,
            age: Option<u32>,
        }
        // Set up the variables, inside the struct
        let mut parsed_fields = ParsedFields {
            name: None,
            age: None,
        };
        // Loop through all properties, until reaching closing bracket
        while !parser.check(TokenKind::RCurlyBracket)? {
            let token = parser.advance()?;
            match token.kind {
                TokenKind::String(key) => {
                    parser.consume(TokenKind::Colon)?;
                    // Instead of inserting into a HashMap,
                    // depending on `key`, parse a different type, and store in variable
                    match key.as_str() {
                        "name" => parsed_fields.name = Some(String::parse(parser)?),
                        "age" => parsed_fields.age = Some(u32::parse(parser)?),
                        // Extra prop provided that isn't on the `Person` struct
                        _ => panic!("Unexpected prop: {key}"),
                    }
                    // Break out the loop if no more commas
                    if parser.check(TokenKind::Comma)? {
                        parser.advance()?;
                    } else {
                        break;
                    }
                }
                _ => return Err(parser.make_err_prev(ParserErrKind::UnexpectedToken)),
            }
        }
        parser.consume(TokenKind::RCurlyBracket)?;
        // Create `Person` struct by unwrapping the variables we have parsed
        // If one is missing, report an error with `expect`
        Ok(Person {
            name: parsed_fields.name.expect("Missing prop 'name'"),
            age: parsed_fields.age.expect("Missing prop 'age'"),
        })
    }
}
Writing the Macro
Okay, we now know the goal code that we want to generate, so let’s write the macro to do this for us (for any struct we throw at it)!
Parsed Fields Struct
Starting with the `ParsedFields` struct definition and its initialisation to `None` - we need to generate a list of lines of code, one for each property on the provided struct. Each line of the definition is in the format `name: Option<type>`, and each line of the initialisation is in the format `name: None`. We do this inside our derive macro function:
// Get the struct fields. We are only supporting named field structs (the most common type)
let fields = match &data.fields {
    Fields::Named(data) => data,
    _ => panic!(
        "JSON deserialising can only be derived for named field structs (no tuple or unit structs)"
    ),
};
// List of each line in the type definition and initialisation of the fields_struct
let mut fields_struct_types = Vec::new();
let mut fields_struct_init = Vec::new();
// Loop through each field on the struct
for field in &fields.named {
    let name = field.ident.as_ref().unwrap();
    let ty = &field.ty;
    // Generated code
    let field_type = quote! { #name: Option<#ty> }; // e.g. `age: Option<u32>`
    let field_init = quote! { #name: None }; // e.g. `age: None`
    // Add to vecs
    fields_struct_types.push(field_type);
    fields_struct_init.push(field_init);
}
Then to generate code such as:
// Create a temporary struct definition
struct ParsedFields {
    name: Option<String>,
    age: Option<u32>,
}
// Set up the variables, inside the struct
let mut parsed_fields = ParsedFields {
    name: None,
    age: None,
};
We can include the following in our `quote!` block:
struct ParsedFields {
    // Insert all items in the fields_struct_types vec, separated by commas
    #( #fields_struct_types, )*
}
let mut parsed_fields = ParsedFields {
    // Insert all items in the fields_struct_init vec, separated by commas
    #( #fields_struct_init, )*
};
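The `#( ... )*` syntax is `quote!`’s repetition operator: everything inside the parentheses is emitted once per item of the interpolated collection. Since `quote!` just builds `proc_macro2::TokenStream`s, this can even be tried outside a macro - a minimal standalone sketch (assuming the `quote` crate as a dependency):
use quote::quote;

fn main() {
    // Two pre-generated lines, built the same way our derive macro builds them
    let fields_struct_types = vec![
        quote! { name: Option<String> },
        quote! { age: Option<u32> },
    ];
    let expanded = quote! {
        struct ParsedFields {
            #( #fields_struct_types, )*
        }
    };
    // Prints (modulo token spacing): struct ParsedFields { name: Option<String>, age: Option<u32>, }
    println!("{expanded}");
}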
Match and Parse
Next, we’ll look at the `match` statement that sets each field in the struct. This is what we’re trying to generate:
match key.as_str() {
    "name" => parsed_fields.name = Some(String::parse(parser)?),
    "age" => parsed_fields.age = Some(u32::parse(parser)?),
    _ => panic!("Unexpected prop: {key}"),
}
We start by generating each line/branch of the match statement, and storing it in a list:
let mut field_setters = Vec::new();
// Loop through each field
for field in &fields.named {
    let name = field.ident.as_ref().unwrap();
    let ty = &field.ty;
    // When we come across a property, set the value in the fields_struct
    // e.g. `"age" => parsed_fields.age = Some(u32::parse(parser)?),`
    let field_setter = quote! {
        stringify!(#name) => parsed_fields.#name = Some(<#ty>::parse(parser)?),
    };
    field_setters.push(field_setter);
}
Then, in our `quote!` block, after matching the field key and the following colon:
// Assign the data to the parsed_fields struct
match key.as_str() {
    // NOTE: here the comma is in each field_setter, as we need a trailing comma for the final `_` case
    #(#field_setters)*
    _ => return Err(parser.make_err_from_token(ParserErrKind::UnknownProperty, &token)),
};
Struct Initialisation
Finally, to put the data into the final struct, we generate each line like so:
// Initialise the user's struct with the data collected
// If there is a field missing, report an error
let mut struct_init_lines = Vec::new();
// Loop through each field
for field in &fields.named {
    let name = field.ident.as_ref().unwrap();
    let ty = &field.ty;
    // `concat!`/`stringify!` so the message names the actual missing field
    let struct_init_line = quote! {
        #name: parsed_fields.#name.expect(concat!("Missing prop '", stringify!(#name), "'"))
    };
    // Add to vec
    struct_init_lines.push(struct_init_line);
}
And use it in the `quote!` block:
return Ok(#struct_name {
    #(#struct_init_lines),*
});
Putting It All Together
That’s everything we need! As we loop through the fields the same way for each of the generated lines, we can combine all the for loops into one big for loop. Here’s the full macro:
fn derive_json_deserialise_struct(struct_name: &Ident, data: &DataStruct) -> TokenStream {
    let fields = match &data.fields {
        Fields::Named(data) => data,
        _ => panic!(
            "JSON deserialising can only be derived for named field structs (no tuple or unit structs)"
        ),
    };
    // Code generation
    // fields_struct is a temporary object to store the field data when it's being parsed
    // Each value is initialised to None, and set once it is found
    let mut fields_struct_types = Vec::new();
    let mut fields_struct_init = Vec::new();
    // When we come across a property, set the value in the fields_struct
    // If the value does not exist in the fields_struct, report an error
    let mut field_setters = Vec::new();
    // Initialise the user's struct with the data collected
    // If there is a field missing, report an error
    let mut struct_init_lines = Vec::new();
    // Loop through each field in the struct
    for field in &fields.named {
        let name = field.ident.as_ref().unwrap();
        let ty = &field.ty;
        // Generated code
        let field_type = quote! { #name: Option<#ty> };
        let field_init = quote! { #name: None };
        let field_setter =
            quote! { stringify!(#name) => parsed_fields.#name = Some(<#ty>::parse(parser)?), };
        let struct_init_line = quote! {
            #name: parsed_fields.#name.ok_or(
                parser.make_err_from_token(ParserErrKind::MissingProperty(stringify!(#name).to_string()), &l_curly_token)
            )?
        };
        // Add to vecs
        fields_struct_types.push(field_type);
        fields_struct_init.push(field_init);
        field_setters.push(field_setter);
        struct_init_lines.push(struct_init_line);
    }
    // Generated impl block
    let generated_impl = quote! {
        impl Parse for #struct_name {
            fn parse(parser: &mut Parser) -> Result<Self, ParserErr> {
                let l_curly_token = parser.consume(TokenKind::LCurlyBracket)?;
                let mut had_comma = false;
                // Temporary object to store field data. Initialise all values to None
                let mut parsed_fields = {
                    struct ParsedFields {
                        #( #fields_struct_types, )*
                    }
                    ParsedFields {
                        #( #fields_struct_init, )*
                    }
                };
                // Loop through all properties, until reaching closing bracket
                while !parser.check(TokenKind::RCurlyBracket)? {
                    let token = parser.advance()?;
                    match token.kind {
                        TokenKind::String(ref key) => {
                            parser.consume(TokenKind::Colon)?;
                            // Assign the data to the parsed_fields struct
                            match key.as_str() {
                                #(#field_setters)*
                                _ => return Err(parser.make_err_from_token(ParserErrKind::UnknownProperty, &token)),
                            };
                            // Once no comma at end, we have reached end of object
                            had_comma = parser.check(TokenKind::Comma)?;
                            if had_comma {
                                parser.advance()?;
                            } else {
                                break;
                            }
                        }
                        _ => return Err(parser.make_err_prev(ParserErrKind::UnexpectedToken)),
                    }
                }
                // No trailing comma
                if had_comma {
                    return Err(parser.make_err_prev(ParserErrKind::UnexpectedToken));
                }
                parser.consume(TokenKind::RCurlyBracket)?;
                // Convert parsed_fields into the user's struct
                // If data is missing, return an error
                return Ok(#struct_name {
                    #(#struct_init_lines),*
                });
            }
        }
    };
    generated_impl.into()
}

#[proc_macro_derive(JsonDeserialise)]
pub fn derive_json_deserialise(input: TokenStream) -> TokenStream {
    let input: DeriveInput = syn::parse(input).unwrap();
    match &input.data {
        Data::Struct(data) => derive_json_deserialise_struct(&input.ident, data),
        _ => panic!("Cannot derive JsonDeserialise on this type"),
    }
}
This generates the exact `Parse` implementation we defined earlier, but it will work for any struct, including nested ones!
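To see the nesting in action, here are the `Person`/`Contact` structs from the start of this post, now parsed with the derive macro (a sketch - both structs need the derive, since `Person`’s generated code calls `<Contact>::parse`):
#[derive(JsonDeserialise)]
struct Person {
    name: String,
    contact: Contact,
}

#[derive(JsonDeserialise)]
struct Contact {
    phone: String,
}

fn main() {
    let json = r#"{"name": "John Smith", "contact": {"phone": "01234567890"}}"#;
    let person: Person = Parser::parse(json).unwrap();
    // The nested match statements from the start of the post are gone
    println!("{}", person.contact.phone);
}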
In Practice
Let’s see it in practice! For my hand-rolled authentication post, I made a working implementation of everything discussed. This included a `/login` endpoint that took a JSON body matching this structure:
#[derive(Debug, Clone, Deserialize)]
struct LoginRequest {
    username: String,
    password: String,
}
This currently uses Serde’s `Deserialize` derive macro. In the endpoint handler, we use Serde’s parser to decode the body:
pub fn login(req: Request, _: &Params, db: &&dyn UserDatabase) -> Response {
    let decoded: LoginRequest = serde_json::from_str(&req.body.unwrap()).unwrap();
    // Login code...
}
The JSON parser we have worked on has a very similar interface. Other than changing the import statement, we only need to change two lines of code:
// Change `Deserialize` to our own `JsonDeserialise`
#[derive(Debug, Clone, JsonDeserialise)]
struct LoginRequest {
    username: String,
    password: String,
}

pub fn login(req: Request, _: &Params, db: &&dyn UserDatabase) -> Response {
    // Change `serde_json::from_str` to `Parser::parse`
    let decoded: LoginRequest = Parser::parse(&req.body.unwrap()).unwrap();
    // Login code...
}
And that’s all - it works exactly the same as before, but with our very own parser.
Conclusion
In this mini-series we have created a fully functioning JSON parser, with support for parsing into user-defined structs. Along the way, we learned how to implement some critical parts of a compiler (the scanner and parser), how to represent JSON data in Rust, and how Rust’s derive macros work. If you’ve understood everything, then give yourself a huge pat on the back - you’ve learned a lot of tricky concepts.
The full working code is on my GitHub here - check it out if you’re interested. If you’d like a challenge, there are many ways you could improve or extend this implementation. For example:
- Support parsing enums (parsing one type or another). I’d recommend looking at how Serde handles this
- Support parsing into other types of structs (tuple and unit structs)
- Allow an `Option<T>` field to be missing - parse this as `None` (currently, it must be defined in the JSON, but set to `null`)
Or maybe you could try writing a similar parser for another data format such as TOML, YAML, or XML.
Whatever path you take, I hope to have demystified the concepts of scanners, parsers, and Rust’s procedural macros. I’ve found it incredibly rewarding to build this from the ground up, looking into how abstraction, performance, and developer ergonomics all intersect. I’ve learned a lot, and I hope you have too.
Thanks for reading, and happy parsing!