HTTP From Scratch

An exploration into writing an HTTP server from scratch

  • Jacob
  • 10 min read

Introduction

Code

Hi, this is my first blog post, and the first in a series of coding things from scratch. In this post, I look into the HTTP protocol and build an HTTP server from scratch. I don’t aim for this to be a full tutorial, but it should hopefully give you enough information to have a go yourself if you wish. All the code for this project is on my GitHub, linked above.

After writing many HTTP servers using libraries, I realised I wanted to understand how these HTTP libraries work under the hood. So, I decided to make a bare-bones HTTP server from the ground up.

To make things easy, I decided to only look at HTTP/1.1, as this is text-based and is built on TCP. Conveniently, this format is also described in the MDN Web Docs, which was my main resource in this project.

HTTP Messages

First, let’s have a look at the HTTP protocol. HTTP is a messaging system - a client sends a message to the server (the request), and the server responds with a message (the response). As mentioned before, HTTP/1.1 uses plain text messages in a fixed format.

Requests and responses have a very similar format, which is as follows:

  • A start line
  • The headers (1 per line)
  • An empty line (to separate metadata from the body)
  • An optional body

Each line is separated by a CRLF (\r\n).

The main difference between requests and responses is the start line:

  • In the request, it includes the method, request target (path) and protocol version
  • In the response, it includes the protocol version, status code and status text

Requests and responses also often use different headers, but this does not change how we parse them.
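To make the layout concrete, here is a small sketch with one raw request and one raw response written out as Rust string literals. The host, path and body are made-up values, and split_message and start_line are hypothetical helpers for illustration, not functions from the project.

```rust
/// A raw HTTP/1.1 request: start line, headers, empty line, body.
/// The host, path and body are made-up values for illustration.
const RAW_REQUEST: &str = "POST /echo HTTP/1.1\r\n\
Host: localhost:8080\r\n\
Content-Type: text/plain\r\n\
Content-Length: 5\r\n\
\r\n\
hello";

/// A raw response has the same shape; only the start line differs.
const RAW_RESPONSE: &str = "HTTP/1.1 200 OK\r\n\
Content-Length: 5\r\n\
\r\n\
hello";

/// Splits a message into its metadata (start line + headers) and its body
/// at the first empty line. Returns `None` if there is no empty line.
fn split_message(raw: &str) -> Option<(&str, &str)> {
    raw.split_once("\r\n\r\n")
}

/// Returns the start line, which is the first line of the metadata.
fn start_line(raw: &str) -> Option<&str> {
    let (head, _body) = split_message(raw)?;
    head.split("\r\n").next()
}
```

Working on a complete string like this is only possible once the whole message is in memory. The rest of this post reads from the stream instead.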

Server Setup

Now let’s get into some code. I am using Rust for this, for no particular reason other than I enjoy writing Rust. It should be a very similar process in other languages.

As HTTP is built on TCP, we need to set up a TCP listener:

let listener = TcpListener::bind("127.0.0.1:8080").unwrap();

This listens for TCP connections on port 8080 of the local machine (127.0.0.1). Then, when a connection is made, we pass the TCP stream to a connection handler:

for stream in listener.incoming() {
    let stream = stream.unwrap();
    handle_connection(stream);
}

In the handle_connection function, we first read the request from the stream (details in Parsing Requests):

let req = Request::from_reader(&mut stream);

Finally, after some processing of the request, we can generate a response (details in Generating Responses) and write that to the TCP stream:

let resp = Response::new(Status::Ok);
stream.write_all(resp.to_string().as_bytes()).unwrap();

Parsing Requests

To parse the request, I have decided to read it directly from the TCP stream (as opposed to reading the whole request into a string and parsing that).

I have always found TCP streams difficult to work with, which is partly why I decided to do this project. A TCP connection is a continuous stream of bytes with no message boundaries, so at the point of reading, not all of the data may have arrived. A single read returns however many bytes are currently available, so we need another way to know when to stop reading. We will get back to how to do this later.

Each line in the metadata of an HTTP request ends in \r\n. This makes parsing the metadata simple - we just read one line at a time until we reach an empty line.
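The sketch below illustrates why reading line by line copes with data arriving in pieces. Trickle is a hypothetical stand-in for a TCP stream that hands out only a few bytes per read, and read_head is an illustrative helper rather than code from the project. BufReader's read_line keeps reading until it has a full line, so the caller never sees the partial reads.

```rust
use std::io::{self, BufRead, BufReader, Read};

/// A reader that hands out at most `chunk` bytes per `read` call,
/// imitating a TCP stream whose data arrives in small pieces.
struct Trickle<'a> {
    data: &'a [u8],
    chunk: usize,
}

impl Read for Trickle<'_> {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        let n = self.chunk.min(buf.len()).min(self.data.len());
        buf[..n].copy_from_slice(&self.data[..n]);
        self.data = &self.data[n..];
        Ok(n)
    }
}

/// Reads lines until the empty line that ends the metadata,
/// returning each line without its trailing CRLF.
fn read_head<R: Read>(reader: R) -> io::Result<Vec<String>> {
    let mut reader = BufReader::new(reader);
    let mut lines = Vec::new();
    loop {
        let mut line = String::new();
        // `read_line` keeps reading until it sees `\n` or the stream ends
        if reader.read_line(&mut line)? == 0 {
            break; // the stream ended before the empty line
        }
        let line = line.trim_end_matches(|c: char| c == '\r' || c == '\n');
        if line.is_empty() {
            break; // empty line: end of the metadata
        }
        lines.push(line.to_string());
    }
    Ok(lines)
}
```

Calling read_head(Trickle { data: b"GET / HTTP/1.1\r\nHost: localhost\r\n\r\n", chunk: 3 }) yields the two metadata lines even though no single read returns a whole line.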

Request Line

First, we parse the request line - we read the line, and split it into its 3 parts (the method, path, and version):

// Set up reader and buffer
let mut buf_read = BufReader::new(stream);
let mut buf = String::new();

// Read the first line
buf_read.read_line(&mut buf).unwrap();

// Parse the first line - split on space and take the 3 individual parts
let mut parts = buf.split(" ");
let method = Method::from_str(parts.next().unwrap().trim()).unwrap();
let path = parts.next().unwrap().trim().to_string();
let version = parts.next().unwrap().trim().to_string();

I decided to only support the official HTTP methods, which I represent with an enum called Method. This means I need to convert from a string to a Method (which I do in Method::from_str using a simple match statement).
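A conversion like this could look roughly as follows. This is a sketch that assumes the standard FromStr trait and a String error type - the variant names and error handling in the project may differ.

```rust
use std::str::FromStr;

/// The standard HTTP request methods.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Method {
    Get,
    Head,
    Post,
    Put,
    Delete,
    Connect,
    Options,
    Trace,
    Patch,
}

impl FromStr for Method {
    // Assumption: a plain String is used as the error type to keep this short
    type Err = String;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        // Method names are case-sensitive, so "get" is not accepted
        match s {
            "GET" => Ok(Method::Get),
            "HEAD" => Ok(Method::Head),
            "POST" => Ok(Method::Post),
            "PUT" => Ok(Method::Put),
            "DELETE" => Ok(Method::Delete),
            "CONNECT" => Ok(Method::Connect),
            "OPTIONS" => Ok(Method::Options),
            "TRACE" => Ok(Method::Trace),
            "PATCH" => Ok(Method::Patch),
            other => Err(format!("unknown HTTP method: {}", other)),
        }
    }
}
```

Implementing FromStr also makes "GET".parse::<Method>() available, which reads nicely at the call site.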

Here, I am using lots of unwraps to keep the code simple. An actual HTTP server should handle an incorrectly formatted request by returning an error response (such as 400 Bad Request) rather than crashing.

I use trim to get rid of extra whitespace. This is necessary at the end of the line, as read_line reads up to and including the \n, so the buffer ends in \r\n. Trimming is more lenient than the HTTP spec, but it works for my purposes.
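For comparison, here is a sketch of a stricter version of the request line parsing that returns a Result instead of unwrapping, and strips exactly one trailing \r\n instead of trimming. ParseError and parse_request_line are hypothetical names for illustration, not part of the project.

```rust
/// Ways the request line can be malformed (hypothetical error type).
#[derive(Debug, PartialEq)]
pub enum ParseError {
    /// The line did not end in `\r\n`
    MissingCrlf,
    /// The line did not have exactly 3 space-separated parts
    WrongPartCount(usize),
}

/// Parses a request line such as `GET /index.html HTTP/1.1\r\n`
/// into its (method, target, version) parts.
pub fn parse_request_line(line: &str) -> Result<(&str, &str, &str), ParseError> {
    // Strip exactly one CRLF rather than trimming all whitespace
    let line = line.strip_suffix("\r\n").ok_or(ParseError::MissingCrlf)?;

    // Split on single spaces and require exactly 3 parts
    let parts: Vec<&str> = line.split(' ').collect();
    match parts.as_slice() {
        [method, target, version] => Ok((*method, *target, *version)),
        other => Err(ParseError::WrongPartCount(other.len())),
    }
}
```

With an error value in hand, the connection handler can reply with an error response and carry on serving other connections.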

Headers

Now to read the headers, we loop until reaching an empty line. Headers are in the format <name>: <value>. This is how I parse the headers:

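let mut headers = Vec::new();

// Read the first header line (the buffer still holds the request line)
buf.clear();
buf_read.read_line(&mut buf).unwrap();

// Loop until empty line
while !buf.trim().is_empty() {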
    let (name, value) = buf.split_once(":").unwrap();
    headers.push(Header {
        name: name.trim().to_owned(),
        value: value.trim().to_owned(),
    });

    // Read the next line
    buf.clear();
    buf_read.read_line(&mut buf).unwrap();
}

I have stored these in a Vec instead of a HashMap or BTreeMap, as sometimes we may want multiple headers with the same name (e.g. Set-Cookie).
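One consequence of using a Vec is that looking up a header means scanning the list. A small helper along these lines keeps that convenient. header_values is a hypothetical function, and the Header struct is redefined here only so the sketch is self-contained.

```rust
/// A single header, mirroring the name/value pair used in the parsing code.
#[derive(Debug, Clone, PartialEq)]
pub struct Header {
    pub name: String,
    pub value: String,
}

/// Returns the values of every header called `name`, in order.
/// Header names are compared case-insensitively.
pub fn header_values<'a>(headers: &'a [Header], name: &str) -> Vec<&'a str> {
    headers
        .iter()
        .filter(|h| h.name.eq_ignore_ascii_case(name))
        .map(|h| h.value.as_str())
        .collect()
}
```

Because it returns every match, repeated headers are all preserved, which a map keyed on the header name would not do without extra work.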

Body

For my purposes, I have assumed that all requests with a body have a Content-Length header. If the header is missing, I do not read a body. This is how we know when to stop reading from the stream.

The Content-Length tells us the number of bytes of the body. This makes it really easy to read the rest of it - we just read a fixed number of bytes:

// Read body if there is a `Content-Length` header that is more than 0
let mut body = None;
if let Some(length) = headers
    .iter()
    .find(|h| h.name.to_lowercase() == "content-length")
{
    let length: usize = length.value.parse().unwrap();

    if length > 0 {
        // Read the entire length of the body,
        // using the length from `Content-Length`
        let mut buf = vec![0; length];
        buf_read.read_exact(&mut buf).unwrap();

        // Convert to a string
        let body_str = String::from_utf8(buf).unwrap();
        body = Some(body_str.trim().to_owned());
    }
}

And just like that, we’ve parsed the entire request! The request line, headers, and body are all nicely stored in variables for later use. I store these in a Request struct to group them nicely together.
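Putting the three steps together, a condensed from_reader could look something like the sketch below. It is not the code from the project: to stay short and self-contained it stores the method as a String and the headers as tuples, does not trim the body, and returns an io::Result instead of unwrapping. The invalid helper is a hypothetical name.

```rust
use std::io::{self, BufRead, BufReader, Read};

#[derive(Debug)]
pub struct Request {
    pub method: String,
    pub path: String,
    pub version: String,
    pub headers: Vec<(String, String)>,
    pub body: Option<String>,
}

/// Builds an "invalid data" error with the given message.
fn invalid(msg: &str) -> io::Error {
    io::Error::new(io::ErrorKind::InvalidData, msg.to_string())
}

impl Request {
    pub fn from_reader<R: Read>(reader: R) -> io::Result<Request> {
        let mut reader = BufReader::new(reader);
        let mut line = String::new();

        // Request line: method, path and version separated by spaces
        reader.read_line(&mut line)?;
        let mut parts = line.trim_end().split(' ');
        let (method, path, version) = match (parts.next(), parts.next(), parts.next()) {
            (Some(m), Some(p), Some(v)) => (m.to_string(), p.to_string(), v.to_string()),
            _ => return Err(invalid("malformed request line")),
        };

        // Headers: one per line until an empty line (or the end of the stream)
        let mut headers = Vec::new();
        loop {
            line.clear();
            reader.read_line(&mut line)?;
            let trimmed = line.trim_end();
            if trimmed.is_empty() {
                break;
            }
            let (name, value) = trimmed
                .split_once(':')
                .ok_or_else(|| invalid("malformed header"))?;
            headers.push((name.trim().to_string(), value.trim().to_string()));
        }

        // Body: read exactly Content-Length bytes, or nothing if it is absent
        let length = headers
            .iter()
            .find(|h| h.0.eq_ignore_ascii_case("content-length"))
            .map(|h| h.1.parse::<usize>())
            .transpose()
            .map_err(|_| invalid("invalid Content-Length"))?
            .unwrap_or(0);
        let body = if length > 0 {
            let mut buf = vec![0u8; length];
            reader.read_exact(&mut buf)?;
            Some(String::from_utf8(buf).map_err(|_| invalid("body is not valid UTF-8"))?)
        } else {
            None
        };

        Ok(Request { method, path, version, headers, body })
    }
}
```

Since it accepts anything that implements Read, the same function works on a TcpStream and on a byte slice, which makes it easy to try out without opening a socket.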

Generating Responses

Now to send a response. I have a simple struct to store the response data:

#[derive(Debug, Clone)]
pub struct Response {
    pub version: String,
    pub status_code: Status,
    pub headers: Vec<Header>,
    pub body: Option<String>,
}

I won’t show it here, to keep things short, but I also have some helper/builder functions to make instantiating this Response struct easier.
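To give an idea of what such helpers might look like, here is a sketch in the builder style. Response::new(Status) matches how it is called elsewhere in this post, but with_header and with_body are hypothetical names, and the Status enum is cut down to two variants with an assumed Display implementation.

```rust
use std::fmt;

#[derive(Debug, Clone, PartialEq)]
pub struct Header {
    pub name: String,
    pub value: String,
}

/// Only two statuses are included to keep the sketch short.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Status {
    Ok,
    NotFound,
}

impl fmt::Display for Status {
    // Formats the status code followed by the status text
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Status::Ok => write!(f, "200 OK"),
            Status::NotFound => write!(f, "404 Not Found"),
        }
    }
}

#[derive(Debug, Clone)]
pub struct Response {
    pub version: String,
    pub status_code: Status,
    pub headers: Vec<Header>,
    pub body: Option<String>,
}

impl Response {
    /// Creates an HTTP/1.1 response with no headers and no body.
    pub fn new(status_code: Status) -> Self {
        Self {
            version: "HTTP/1.1".to_string(),
            status_code,
            headers: Vec::new(),
            body: None,
        }
    }

    /// Adds a header and returns the response, so calls can be chained.
    pub fn with_header(mut self, name: &str, value: &str) -> Self {
        self.headers.push(Header {
            name: name.to_string(),
            value: value.to_string(),
        });
        self
    }

    /// Sets the body and returns the response.
    pub fn with_body(mut self, body: &str) -> Self {
        self.body = Some(body.to_string());
        self
    }
}
```

A response can then be built in one expression, for example Response::new(Status::Ok).with_header("Content-Type", "text/plain").with_body("hello").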

Once we have created a Response, all that needs to be done is converting this struct into a string in the format of an HTTP response:

let mut result = String::new();

// First line of response
result.push_str(&self.version);
result.push_str(" ");
result.push_str(&self.status_code.to_string());
result.push_str("\r\n");

// Headers
for header in &self.headers {
    // Content-Length is calculated and added later, skip if manually defined
    if header.name.to_lowercase() == "content-length" {
        continue;
    }

    // Add the header followed by a CRLF
    result.push_str(&header.name.to_string());
    result.push_str(": ");
    result.push_str(&header.value.to_string());
    result.push_str("\r\n");
}

if let Some(body) = &self.body {
    // If there is a body, add the Content-Length header with its length
    result.push_str("Content-Length: ");
    result.push_str(&body.len().to_string());

    // Add an empty line followed by the body
    result.push_str("\r\n\r\n");
    result.push_str(body);
} else {
    // If there isn't a body, the content length is 0
    // The empty line is still needed to mark the end of the headers
    result.push_str("Content-Length: 0\r\n\r\n");
}

And that’s all there is to it - the HTTP server now has everything we need to work!

For example, to write an echo server, once we have read the request, we just need to make a Response with the same body and write that back to the stream. To see this implemented, you can look at my project on GitHub in src/main.rs.

Nice Extras

We have everything we need for the server to run now, but there are a few extra things we can do to improve the experience of using the HTTP server.

Cookies

Cookies are bits of information stored on the client’s device. The client sends them to the server in the Cookie header, whose value is a list of name-value pairs in a particular format (<cookie1_name>=<cookie1_value>; <cookie2_name>=<cookie2_value> ...). Although not necessary, I decided to make the server more useful by parsing the cookies for easy access.

Here, we read all cookie headers (in case there are multiple), and read all the cookies in each header:

let mut cookies = Vec::new();
// For each `Cookie` header
for header in &headers {
    if header.name.to_lowercase() == "cookie" {
        // Loop through all the cookies in the header
        for cookie in header.value.split(";") {
            // Parse the cookie
            let (name, value) = cookie.split_once("=").expect("Invalid cookie");
            cookies.push(Cookie {
                name: name.trim().to_string(),
                value: value.trim().to_string(),
            })
        }
    }
}
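The expect above will panic on a malformed cookie. A more forgiving variant, sketched here as a standalone function, simply skips pairs it cannot parse. parse_cookies is a hypothetical name, and the Cookie struct is redefined so the example is self-contained.

```rust
#[derive(Debug, Clone, PartialEq)]
pub struct Cookie {
    pub name: String,
    pub value: String,
}

/// Parses the value of a `Cookie` header, e.g. `a=1; b=2`.
/// Pairs without an `=` or with an empty name are skipped instead of panicking.
pub fn parse_cookies(header_value: &str) -> Vec<Cookie> {
    header_value
        .split(';')
        // Split each pair on the first `=` only, so values may contain `=`
        .filter_map(|pair| pair.split_once('='))
        .map(|(name, value)| Cookie {
            name: name.trim().to_string(),
            value: value.trim().to_string(),
        })
        .filter(|cookie| !cookie.name.is_empty())
        .collect()
}
```

Whether to skip bad cookies or reject the whole request is a design choice. Skipping keeps the server tolerant of clients that send slightly malformed headers.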

Routing

Most of the time in an HTTP server, we want to call different functions based on the HTTP method and path (an endpoint). A naive approach would be to use an if statement per endpoint, but this does not scale well. Instead, we can create a router that stores all the possible endpoints and the function they map to. Then we can loop through all the routes to check for a match, and return that route’s handler method if there is a match.

// Utility type alias
pub type Handler = fn(req: Request) -> Response;

// Utility struct to store info about a route and a function to handle the route
#[derive(Clone, Debug)]
struct Route {
    method: Method,
    path: String,
    handler: Handler,
}

// Router just stores a list of routes
#[derive(Clone, Debug)]
pub struct Router {
    routes: Vec<Route>,
}

impl Router {
    // Initialise the router with no routes
    pub fn new() -> Self {
        Self { routes: Vec::new() }
    }

    // Add a route given a method, path, and a function to handle the route
    pub fn add(&mut self, method: Method, path: &str, handler: Handler) {
        self.routes.push(Route {
            path: path.to_string(),
            method,
            handler,
        });
    }

    // Pick the correct route handler and call it
    // If no route was found, it will return None
    pub fn handle(&self, req: Request) -> Option<Response> {
        let handler = self.routes.iter().find_map(|r| {
            if r.method != req.method || r.path != req.path {
                return None;
            }

            return Some(r.handler);
        });

        handler.map(|handler| handler(req))
    }
}

To use this, we first set up the router:

let mut router = Router::new();
router.add(Method::Post, "/echo", |req| Response {
    version: "HTTP/1.1".to_string(),
    status_code: Status::Ok,
    headers: Vec::new(),
    body: req.body,
});

And then when we have an incoming connection, we can handle it like so:

let req = Request::from_reader(&mut stream);
let resp = router
    .handle(req)
    .unwrap_or(Response::new(Status::NotFound));

stream.write_all(resp.to_string().as_bytes()).unwrap();

The code in my project on GitHub is more complicated, but still follows the same principle. I have glob and parameter matching to make pulling variables out of the path easier. I also allow for a global state of type T which is passed to each handler.
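As an illustration of what parameter matching can involve, here is a minimal sketch. It is not the implementation from the project: the :name syntax for parameters and the match_path function are assumptions made for this example, and glob matching is left out.

```rust
use std::collections::HashMap;

/// Matches `path` against `pattern`, where a pattern segment starting with
/// `:` captures the corresponding path segment under that name.
/// Returns the captured parameters, or `None` if the path does not match.
pub fn match_path(pattern: &str, path: &str) -> Option<HashMap<String, String>> {
    // Compare segment by segment, ignoring leading and trailing slashes
    let pattern_parts: Vec<&str> = pattern.trim_matches('/').split('/').collect();
    let path_parts: Vec<&str> = path.trim_matches('/').split('/').collect();
    if pattern_parts.len() != path_parts.len() {
        return None;
    }

    let mut params = HashMap::new();
    for (pat, actual) in pattern_parts.iter().zip(path_parts.iter()) {
        if let Some(name) = pat.strip_prefix(':') {
            // Parameter segment: capture the value, which must not be empty
            if actual.is_empty() {
                return None;
            }
            params.insert(name.to_string(), (*actual).to_string());
        } else if pat != actual {
            // Literal segment: must match exactly
            return None;
        }
    }
    Some(params)
}
```

For example, matching "/users/42/posts/7" against "/users/:id/posts/:post" captures id as "42" and post as "7". A router could run this in place of the plain string comparison and hand the captured parameters to the handler.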

Missing Features

There are still loads of extra features that could be added to this HTTP server to make it easier to use or more useful. For example, supporting different HTTP versions, multi-part requests, and JSON/XML/form parsing based on Content-Type.

The performance could also be improved - I have not focused on performance at all, hence the use of String (instead of &str) in all my structs. Also, requests are currently handled one at a time; handling them in parallel would allow many more connections to be served at the same time. If you have been following along, I challenge you to try and implement this yourself!
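If you would like a starting point, the simplest approach is one thread per connection, sketched below using only the standard library. The serve function, its max_connections parameter (there so that the sketch terminates) and the fixed "ok" response are all assumptions for illustration. A real server would loop forever and would likely use a thread pool or async I/O rather than an unbounded number of threads.

```rust
use std::io::{BufRead, BufReader, Write};
use std::net::{TcpListener, TcpStream};
use std::thread;

/// Reads the request metadata and replies with a fixed response.
fn handle_connection(mut stream: TcpStream) -> std::io::Result<()> {
    // Read lines until the empty line that ends the headers
    let mut reader = BufReader::new(stream.try_clone()?);
    let mut line = String::new();
    loop {
        line.clear();
        let n = reader.read_line(&mut line)?;
        if n == 0 || line == "\r\n" {
            break;
        }
    }

    stream.write_all(b"HTTP/1.1 200 OK\r\nContent-Length: 2\r\nConnection: close\r\n\r\nok")
}

/// Accepts up to `max_connections` connections, handling each on its own
/// thread, then waits for all of the handlers to finish.
pub fn serve(listener: TcpListener, max_connections: usize) {
    let mut workers = Vec::new();
    for stream in listener.incoming().take(max_connections) {
        match stream {
            // Each connection gets its own thread, so a slow client
            // does not hold up the others
            Ok(stream) => workers.push(thread::spawn(move || {
                if let Err(e) = handle_connection(stream) {
                    eprintln!("connection error: {}", e);
                }
            })),
            Err(e) => eprintln!("failed to accept connection: {}", e),
        }
    }
    for worker in workers {
        let _ = worker.join();
    }
}
```

The only change from the sequential loop is that the handler call is wrapped in thread::spawn. Everything the handler needs is moved into the closure, so no state is shared between threads.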

Conclusion

Together, we have explored the HTTP protocol - a top-level overview, the layout of messages, and a bare-bones working implementation. We have briefly touched on many topics including TCP sockets, parsing, and cookies. I hope you have enjoyed this and learned something from it, and maybe you will even have a go at writing your own server!

Written by : Jacob

I am a software engineer who loves exploring tech and figuring out how things work. I am passionate about learning and hope to share some of that passion with you!