The tech blog of a fish

I've been thinking about attempting one of the r/badcode coding challenges for a very long time, but I've always had the problem that I wanted to use Rust (which isn't actually a problem, but I also want to follow best practices all the time when writing Rust, so it's hard to bend my mind in the opposite direction).

I decided to try and combine both of these things: best practices and bad code. Let's see how far I can push that idea.

Requirements

Citing directly from Reddit:

Write a method that accepts an array of letters from the English language that are in alphabetical order, but with one letter missing, and determine which letter is missing.

find_missing(['a', 'b', 'd', 'e', 'f', 'g']) //c is missing

find_missing(['n', 'p']) //o is missing

The input should always be provided in order and with a single correct answer. How to handle invalid input is left undefined.

Well, that certainly sounds doable!

After creating our project (since this is a really useful function, we'll do a library crate to make it reusable), we start doing some...

Test-driven development

This is something I was never told to do at my workplace, but I kinda got into it while learning Rust.

Let's write our basic structure and some tests:

/// Find a missing character in an otherwise valid sequence of the alphabet.
pub fn find_missing(input: &[char]) -> &'static str {
    ""
}

#[cfg(test)]
mod tests {
    use crate::find_missing;
    #[test]
    fn abdefg_works() {
        assert_eq!(
            find_missing(&['a', 'b', 'd', 'e', 'f', 'g']),
            "c is missing"
        );
    }
    #[test]
    fn np_works() {
        assert_eq!(find_missing(&['n', 'p']), "o is missing");
    }
}

We're closely following the specification here, and of course, the tests are failing for now.

Let's make it pass:

pub fn find_missing(input: &[char]) -> &'static str {
    match input {
        ['a', 'b', 'd', 'e', 'f', 'g'] => "c is missing",
        ['n', 'p'] => "o is missing",
        _ => panic!(),
    }
}

Since how to handle invalid input is undefined, we just panic; and yay, the tests pass now!

Let's add some more simple tests to check that we correctly handle invalid input:

#[test]
#[should_panic]
fn abc_panics() {
    find_missing(&['a', 'b', 'c']);
}
#[test]
#[should_panic]
fn cba_panics() {
    find_missing(&['c', 'b', 'a']);
}

And of course, those tests pass as well, so I can be pretty proud of myself.

Let's go on by adding a test which isn't in the requirements:

#[test]
fn ac_works() {
    assert_eq!(find_missing(&['a', 'c']), "b is missing");
}

And since this fails, let's add one more match arm to find_missing:

    ['a', 'c'] => "b is missing",

Now I'm starting to notice that this isn't getting me anywhere; I obviously can't hand-write every valid use case.
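Just to get a feeling for how many valid inputs there actually are: a consecutive run of length n (with n from 3 up to the full 26) can start at 27 - n positions and has n - 2 interior letters that could be the missing one (dropping the first or last letter would leave no detectable gap). A quick back-of-the-envelope sketch, not part of the crate:

fn count_valid_inputs() -> usize {
    // For every run length n in 3..=26: (27 - n) starting positions,
    // (n - 2) removable interior letters.
    (3..=26).map(|n| (27 - n) * (n - 2)).sum()
}

That sums up to 2600 distinct inputs - not something I'd want to type out by hand.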

However, Rust provides us with the necessary tools to get around that...

Meet procedural macros

Rust has a great macro system which is perfect for generating things you want to hardcode, but without writing them by hand.

The normal macro_rules! macros are pretty limited, however, which is why we'll go for the second macro system we have: procedural macros (or proc-macros).
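To illustrate what "limited" means here, consider this little declarative macro (just for show, not part of the crate): it can stamp out the tuples for us, but only from tokens we hand it - it can't compute the inputs itself.

macro_rules! entry {
    // Turn a list of char literals plus a message into one (Vec<char>, &str) tuple.
    ($($c:literal),* => $msg:literal) => {
        (vec![$($c),*], $msg)
    };
}

So entry!('a', 'c' => "b is missing") expands to (vec!['a', 'c'], "b is missing"), but we'd still have to write every single invocation by hand. A proc-macro, on the other hand, is an ordinary Rust function that can run arbitrary code at compile time.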

Before we get into it, I'd like to make a fundamental change to how our code works currently:

use std::collections::HashMap;

fn get_valid_inputs() -> HashMap<Vec<char>, &'static str> {
    [
        (vec!['a', 'b', 'd', 'e', 'f', 'g'], "c is missing"),
        (vec!['n', 'p'], "o is missing"),
        (vec!['a', 'c'], "b is missing"),
    ]
    .iter()
    .cloned()
    .collect()
}

pub fn find_missing(input: &[char]) -> &'static str {
    let valid_inputs = get_valid_inputs();
    valid_inputs[input]
}

All our tests still pass; this just makes it easier to grasp what we're gonna do next (and it helps us by neatly putting EVERYTHING on the heap).
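Two details make this work: looking up a &[char] against Vec<char> keys is fine because Vec<char> implements Borrow<[char]>, and indexing a HashMap with a key it doesn't contain panics, which keeps the should_panic tests green. A throwaway test (not in the repo) to convince yourself:

use std::collections::HashMap;

#[test]
fn slice_lookup_works() {
    // Keys are Vec<char>, but lookups can use a plain &[char].
    let map: HashMap<Vec<char>, &'static str> =
        [(vec!['a', 'c'], "b is missing")].iter().cloned().collect();
    let input: &[char] = &['a', 'c'];
    assert_eq!(map[input], "b is missing");
    // A key that isn't in the map (say, the ['a', 'b', 'c'] from abc_panics)
    // would panic right here, which is exactly what the should_panic tests rely on.
}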

After creating a new project in a subfolder and setting proc-macro = true in the [lib] section of its Cargo.toml, we can add this basic skeleton for a function-like proc macro:

extern crate proc_macro;
use proc_macro::TokenStream;

#[proc_macro]
pub fn generate_valid_inputs(_: TokenStream) -> TokenStream {
    r#""""#.parse().unwrap()
}
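For reference, the relevant bits of the subcrate's Cargo.toml end up looking roughly like this (a sketch - the exact path and versions depend on how you lay things out, and quote only enters the picture a little further down):

# badcodegen/Cargo.toml
[lib]
proc-macro = true

[dependencies]
quote = "1"

# main crate's Cargo.toml
[dependencies]
badcodegen = { path = "badcodegen" }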

After adding the subcrate (I called it badcodegen) to our dependencies, we have to add this to the head of the main lib's lib.rs:

#![feature(proc_macro_hygiene)]
use badcodegen::generate_valid_inputs;

To make stuff like this work:

let valid_inputs = generate_valid_inputs!();

We depend on nightly now because using a function-like proc macro in expression position - which assigning its output to a variable is - requires the proc_macro_hygiene feature, but whatever.

We want to replace the contents of get_valid_inputs() with just this macro:

fn get_valid_inputs() -> HashMap<Vec<char>, &'static str> {
    generate_valid_inputs!()
}

But that won't work right now because it's creating "", an &'static str, and not a HashMap.

Let's fix that by updating the proc macro's lib.rs as well (this adds a new dependency, quote = "1"):

extern crate proc_macro;
use proc_macro::TokenStream;

use quote::quote;

#[proc_macro]
pub fn generate_valid_inputs(_: TokenStream) -> TokenStream {
    TokenStream::from(quote! {
        [
            (vec!['a', 'b', 'd', 'e', 'f', 'g'], "c is missing"),
            (vec!['n', 'p'], "o is missing"),
            (vec!['a', 'c'], "b is missing"),
        ]
        .iter()
        .cloned()
        .collect()
    })
}

Now everything compiles and the tests pass as well; that's because, you know, we haven't changed any code, we've only moved it around so far (and made it a fair bit more complicated).

Let's get deep into the weeds

If it wasn't already obvious, I want to achieve the following: having every possible valid input stored as a key in a gigantic HashMap, with the result as the value, so it's totally fast and usable.

I'll start by defining the alphabet (in badcodegen's lib.rs):

fn gimme_alphabet() -> Vec<char> {
    // Because more `as` is always better!
    ('a' as u8..='z' as u8).map(|v| v as char).collect()
}
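For the record, a better-behaved version would use byte literals and skip the casts entirely, which is exactly why we're not doing that; something like:

fn gimme_alphabet() -> Vec<char> {
    // Byte-literal range, converted without any `as` in sight.
    (b'a'..=b'z').map(char::from).collect()
}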

And let's throw in a test as well, just to be sure:

#[test]
fn correct_alphabet() {
    assert_eq!(
        gimme_alphabet(),
        vec![
            'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p',
            'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z'
        ]
    )
}

Since I had to think hard here, I'll just dump the code:

#[proc_macro]
pub fn generate_valid_inputs(_: TokenStream) -> TokenStream {
    let inputs: Vec<_> = (3..=26)
        .map(|n| {
            let pi = (0..=(26 - n)).map(|i| {
                let abc = gimme_alphabet();
                let wr: Vec<_> = (&abc[i..n + i]).iter().copied().collect();
                let o: Vec<_> = (1..n - 1)
                    .map(|n| {
                        let mut v = wr.clone();
                        let f = format!("{} is missing", v.remove(n));
                        quote! {
                            (vec![#(#v),*], #f)
                        }
                    })
                    .collect();
                quote! {
                    #(#o),*
                }
            });
            quote! {
                #(#pi),*
            }
        })
        .collect();
    TokenStream::from(quote! {
        [
            #(#inputs),*
        ]
        .iter()
        .cloned()
        .collect()
    })
}

In short: Create every possible valid input and return an expression that builds a HashMap (or whatever) mapping each of them directly to its desired output.
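To make that a bit more concrete: for the window ['a', 'b', 'c'] (n = 3, i = 0), the only removable interior letter is 'b', so that window contributes exactly one tuple to the generated array:

(vec!['a', 'c'], "b is missing")

The full-alphabet window (n = 26) contributes 24 of these, one for each letter from 'b' to 'y'.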

Results

For comparison, I've created a version without the proc macro, which you can see at the GitHub repo; I've also added a test binary so I can see how the size is affected there.

The nice tool cargo-expand can show us the real difference between the two by running cargo expand --lib in each crate's root.
The macro-less version with the hardcoded possible inputs expands to just 23 lines, while the proc-macro powered one yields a whopping 17811 lines.

However, the actual size difference of the binaries is just about 500KB; not that much, really.

Conclusion

In the end, I've realized that this could be made much worse by doing it at runtime instead of at compile time - my abuse of heap-allocated memory only really shows up at compile time (where the proc-macro step already costs a fair amount of speed). So it's not really bad code or bad performance, it's just a totally overengineered way of hardcoding every possible value.

While probably not a valid entry for the competition, I hope it at least amused someone. Maybe you could file a PR to make the code worse (I'm sure that could be done).