Solving the New York Times Spelling Bee, Part 2: Striding blocks

Last time, I discussed the New York Times Spelling Bee, and presented three solutions for it. We had a naive solver, a trie solver, and a bitmask solver. The trie was fastest, then the bitmask, then the naive solution.

Here, I'm going to talk a bit about an improvement to the bitmask solver which puts it ahead of the trie. The core idea is to modify the bitmask solver to stride past long sequences of words if none of them could possibly match. We can do this efficiently because bitmasks can be combined easily. We have two tools:

If two words' bitmasks are ANDed together, then we get a bitmask representing letters that are in both words.
If two words' bitmasks are ORed together and then inverted, then we get a bitmask representing letters that are in neither of the words.

These are really useful because we can carry them out on more than just two words' bitmasks. We can combine any number of bitmasks in this way.

This means that we can take a large block of words, like 50 words, and carry out the above two operations. The AND-ed one tells us letters that all words in the block share. If all words share a forbidden letter - that is, a letter which isn't present in the Spelling Bee - then we can skip that entire block. Similarly, the OR-inverse tells us which letters are completely missing from the block; if none of the words have the Spelling Bee's central letter, then we can skip the entire block.

If we sort all the words alphabetically before building these blocks, we will actually get a pretty good efficiency boost from this - particularly from the forbidden letter check. For one, this will let us quickly pass by all words that start with a forbidden letter. But it will also work for chunks in the middle.

Here's an implementation in Rust:

pub struct BitmaskBlockSolver {
    blocks: Vec<BitmaskBlock>,
}

impl BitmaskBlockSolver {
    pub fn new(dictionary: Vec<String>, chunk_size: usize) -> BitmaskBlockSolver {
        let mut blocks = Vec::with_capacity(dictionary.len() / chunk_size + 1);
        let mut sorted: Vec<String> = dictionary
            .iter()
            .filter(|w| w.len() >= MIN_LENGTH)
            .cloned()
            .collect();
        sorted.sort();
        for chunk in sorted.chunks(chunk_size) {
            blocks.push(BitmaskBlock::new(chunk));
        }
        BitmaskBlockSolver { blocks: blocks }
    }
}

impl Solver for BitmaskBlockSolver {
    fn solve(&self, puzzle: &Puzzle) -> Vec<String> {
        let center_letter_mask = BitmaskSolver::bitmask_letter(&puzzle.center_letter);

        // forbidden_letter_mask has 1 for every letter which must *not* be
        // used. We compute it by ORing together all the allowed words, and then
        // inverting.
        let mut forbidden_letter_mask: u32 = center_letter_mask;
        for letter in puzzle.outer_letters.iter() {
            forbidden_letter_mask |= BitmaskSolver::bitmask_letter(letter)
        }
        forbidden_letter_mask = !forbidden_letter_mask;

        let mut result: Vec<String> = Vec::with_capacity(TYPICAL_RESULT_SIZE);

        for block in self.blocks.iter() {
            if let Some(matches) = &mut block.matches(center_letter_mask, forbidden_letter_mask) {
                result.append(matches);
            }
        }
        result
    }
}

struct BitmaskBlock {
    // Mask encoding the characters present in all words in the block.
    common_chars_mask: u32,
    // Mask encoding the characters present in no words in the block.
    missing_chars_mask: u32,
    // The words present in the block.
    words: Vec<BitmaskedWord>,
}

impl BitmaskBlock {
    fn new(words: &[String]) -> BitmaskBlock {
        let mut common_chars_mask: u32 = !0;
        let mut missing_chars_mask: u32 = 0;
        let mut masked_words = Vec::with_capacity(words.len());

        for w in words.iter() {
            let masked_word = BitmaskedWord {
                mask: BitmaskSolver::bitmask_word(&w),
                word: w.to_string(),
            };
            missing_chars_mask |= masked_word.mask;
            common_chars_mask &= masked_word.mask;
            masked_words.push(masked_word);
        }

        BitmaskBlock {
            common_chars_mask: common_chars_mask,
            missing_chars_mask: missing_chars_mask,
            words: masked_words,
        }
    }

    /// Returns the list of all words that match, if there are any matches. If
    /// there aren't any, then returns None.
    fn matches(&self, center_letter_mask: u32, forbidden_letter_mask: u32) -> Option<Vec<String>> {
        if (self.common_chars_mask & forbidden_letter_mask) != 0 {
            return None;
        }
        if (self.missing_chars_mask & center_letter_mask) == 0 {
            return None;
        }
        let mut result: Vec<String> = Vec::with_capacity(self.words.len());
        for w in self.words.iter() {
            if (w.mask & center_letter_mask != 0) && (w.mask & forbidden_letter_mask == 0) {
                result.push(w.word.to_string());
            }
        }
        if result.len() == 0 {
            return None;
        } else {
            return Some(result);
        }
    }
}

How does this thing do? It does very, very well. I get down to around 140μs per puzzle with a block size of 50 words. Compare that to ~70,000μs for the naive solver, 5,000μs for the bitmask solver, and 2,000μs for the trie. This is really good!

Why is it so good? I think it works well because it combines the benefits of the trie and the bitmask solver, making a solution that is much more than the sum of its parts. The blockwise striding of the algorithm gets it into quasi-logarithmic¹ speed in leaping through the irrelevant sections, approximating the behavior of the trie, while the bitmask operations are much faster than the recursive pointer lookups and string concatenation required of the trie solution.

Tuning the blockwise bitmask solver

An obvious question with this thing is "how big should the blocks be?"

I have only done a little probing of this. I wrote a benchmark which steps through different block sizes, from 1 up to 1,000 words. It always solves the same puzzle each time, which is a weakness; surely the performance will depend upon the puzzle. But in my case it looks like the blockwise solver does best around a stride size of 50.

The block size involves tradeoffs. Larger blocks let the solver skip more words at once - when there's an important comman character across them. But larger blocks also reduce the number of common characters in each block, which means that more blocks will need to be entered to check if there are any matches at all.

More improvement ideas

I have a few ideas for more ways to possibly improve this and get a faster solver.

Adaptive block sizing

It might be interesting to try a solver which uses adaptive block sizes: keep adding to the block as long as there are at least 2 or 3 common characters, for example. This would lead to blocks which are spaced much more widely, which could really help.

Deduplicate bitmasks

At least for individual words, there's no need to test the same bitmask twice - we'll get the same answer each time. Instead of storing a vector of pairs, we could store a vector of pairs. This would be very simple to do, and might make quite a large difference.

There might be something analogous for blocks too, here, but I haven't worked it out.

Hierarchical blocks

The block size presents a tradeoff - larger blocks skip faster, but have a worse "hit rate." We could evade this by doing a two-step blocking scheme - have large superblocks, like of 1000 words, which are split up into subblocks of 10 words. This sort of progressive hierarchy might do well.

It is tempting to take this to its logical limit. If two layers of hierarchy are good, why not three, or five, or ten? My hunch is that this would be a mistake - we'd end up doing many bitmask checks per valid word. I doubt it would outcompete the trie solver.

N.B. I made this term up; it probably already exists and has precise meaning, which I'm getting wrong. ↩