Hacking Word Hunt

7 minute read

Update: The code was modified with further optimizations. In particular, instead of checking the trie per every DFS call, we update the trie pointer along the DFS call so that the trie does not have to be queried repeatedly.

Recently, I started playing Game Pidgeon games with my girlfriend. We often play Word Hunt, where the objective is to find as many words as possible in a grid of English letters within 30 seconds.

img

Being a non-native English speaker, I seldom score a win against my girlfriend; she often claims victory with significant margins. In a desparate attempt to level the playing field, and also inspired by a YouTube video on Word Hunt, I decided to resort to computers and algorithms.

Brute Force DFS

The goal of this project is to come up with as many valid word combinations as possible given a grid of letters. Since the game ascribes higher scores to longer sequences, the longer the words, the better. Most importantly, we need to find these solutions within 30 seconds.

A naïve brute-force approach would be to traverse the grid to recover all possible sequences of letters, then check if these letters are in a source-of-truth list of vocabulary. Concretely, we can use any graph traversal algorithm like DFS to explore the grid and use a Python set for all English words to achieve amortized $O(1)$ lookup. Unfortunately, after a few iterations, I realized that this brute force approach is too inefficient given the 30 second time crunch.

DFS with Pruning via Trie Lookup

One glaring inefficiency with the above approach is that we end up wastefully exploring infelicitous paths, i.e., paths which we already know will provide no solution. For instance, if we know ahead of time that there exists no word that starts with the prefix “xyz”, then there is no point in exploring “xyza” or “xyzb.” Instead, we can terminate the search and move onto paths where there is hope.

Unfortunately, the built-in Python set does not provide prefix lookup. Instead, a more suitable data structure is a trie, also known as a prefix tree. A trie not only gives us speedy lookup, but also allows us to efficiently query words that start with a given prefix. If there is no word that starts with the prefix, we exit the search sequence, which effectively amounts to DFS backtracking with pruning.

Trie

Python does not provide a built-in trie implementation. Although third-party packages exist, I decided to implement my own.

class Trie:
    def __init__(self) -> None:
        self.root = {}
        self.delimiter = "*"

    def insert(self, word: str) -> None:
        if self.contains(word):
            return
        pointer = self.root
        word += self.delimiter
        for char in word:
            if char not in pointer:
                pointer[char] = {}
            pointer = pointer[letter]

Internally, this trie implementation uses a nested dictionary to store words as a sequence of letters. We use an asterisk to mark the end of a word. For instance, adding the word “cat” to an empty trie will yield the following result:

>>> from trie import Trie
>>> t = Trie()
>>> t.insert("cat")
>>> t.trie
{'c': {'a': {'t': {'*': {}}}}}

Once we insert “car”, the “ca” prefix will be preserved, and we will see an additional “r” node.

>>> t.insert("car")
>>> t.trie
{'c': {'a': {'t': {'*': {}}, 'r': {'*': {}}}}}

Now that we have a trie, we can store the list of English words in this data structure. Quite simply, we read the text file and store its content in the trie.

def get_dictionary() -> Trie:
    dictionary = Trie()
    with open("dictionary.txt") as f:
        for word in f:
            word = word.strip()
            dictionary.insert(word)
    return dictionary

Solving Word Hunt

Now that the trie dictionary is ready, the next step is to traverse the board and retrieve all valid solutions. I took inspiration from DFS backtracking templates used to solve common problems, such as sudoku. For each cell in the game grid, we want to check for valid words that start with that cell. The solve(grid) function accepts a grid and calls the traverse(...) function to check for words starting at each index.

from typing import Dict, List, Tuple

def solve(grid: List[List[str]]) -> Dict[str, List[Tuple[int, int]]]:
    solutions = {}
    dictionary = get_dictionary()
    # BOARD_SIZE == 4
    for i in range(BOARD_SIZE):
        for j in range(BOARD_SIZE):
            if board[i][j] in dictionary.root:
                traverse(grid, i, j, "", [], solutions, dictionary.root)
    return solutions

Although the function is named solve(...), the actual heavy lifting is performed by the traverse(...) function, which recursively calls itself to perform DFS. Specifically, the traverse(...) function populates the solutions dictionary, which will contain valid words as keys and index sequences as values.

from collections.abc import Generator

def get_neighbors(i: int, j: int) -> Generator[int, int]:
    for delta_i in range(-1, 2, 1):
        for delta_j in range(-1, 2, 1):
            if delta_i == delta_j == 0:
                continue
            next_i = i + delta_i
            next_j = j + delta_j
            if 0 <= next_i < BOARD_SIZE and 0 <= next_j < BOARD_SIZE:
                yield (next_i, next_j)

def traverse(
    grid: List[List[str]],
    i: int,
    j: int,
    word: str,
    order: List[Tuple[int, int]],
    solutions: Dict[str, List[Tuple[int, int]]],
    pointer: Dict[str: Any],
) -> None:
    char = grid[i][j]
    word += char
    order.append((i, j))
    prev = pointer
    pointer = pointer[char]
    if "*" in pointer:
        solutions[word] = order
        del pointer["*"]
        if not pointer:
            del prev[char]
            return
    grid[i][j] = None
    for next_i, next_j in get_neighbors(i, j):
        if (
            grid[next_i][next_j] is not None
            and grid[next_i][next_j] in pointer
        ):
            traverse(grid, next_i, next_j, word, order.copy(), solutions, pointer)
    grid[i][j] = char

To prevent the algorithm from visiting cells it has previously visited (it’s illegal to duplicate a character by revisiting a letter we’ve already used in the current sequence), we mark the visited cell as None and recursively call traverse(...) on neighboring cells, which is obtained via get_neighbors(i, j). Once all paths have been consumed, we unmark the cell back to its original value. This marking and unmarking is at the heart of backtracking. Notice that the implicit base case for this function is if no neighbors exist.

Also worthy of note is the use of the dictionary trie. The return in the middle of the function is where pruning occurs: if there is no word that starts with word as its prefix, there is no need to further venture down this path. Moreover, if word itself is in the vocabulary, we add it to solutions. Note that it is possible that multiple paths exist for the same word, but since we don’t care which path, there is no need to record all of them.

Putting It All Together

Now that we have all the core algorithms ready, all we need is a surface-level API that will allow the user to interact with these functions. Although it would be nice to have a GUI component, for sake of simplicity I decided to make this a Python script. I also decided that the easiet way for a user to input the grid to the script is in raster scan order, which is a fancy way of saying left to right, top to bottom. Therefore, the 2D grid would be flattened to a line of 16 characters. Internally, we still want to parse the board as a grid: hence the make_grid(board) function, where board is the line of 16 characters inputted by the user.

def make_grid(board: str) -> List[List[str]]:
    grid = [[] for _ in range(BOARD_SIZE)]
    for i, char in enumerate(board):
        grid[i // BOARD_SIZE].append(char)
    return grid

Now we are truly done! All we need is to (1) create the grid, (2) call the solve(grid) function, and (3) sort answers by word length and print them in order to the user.

def main(board: str) -> None:
    grid = make_grid(board)
    solutions = solve(grid)
    for i, (word, order) in enumerate(
        sorted(solutions.items(), key=lambda x: len(x[0]), reverse=True)
    ):
        if i == SHOW_TOP_K:
            break
        print(word, order)

if __name__ == "__main__":
    board = input()
    assert len(board) == 16
    main(board)

Here is a sample top-10 result with the example board shown at the very beginning of this blog post.

jaketae:wordhunt $ python main.py
oatrihpshtnrenei
haptene [(1, 1), (0, 1), (1, 2), (2, 1), (3, 2), (3, 1), (3, 0)]
haptens [(1, 1), (0, 1), (1, 2), (2, 1), (3, 2), (2, 2), (1, 3)]
pterins [(1, 2), (2, 1), (3, 2), (2, 3), (3, 3), (2, 2), (1, 3)]
staithe [(1, 3), (0, 2), (0, 1), (1, 0), (2, 1), (2, 0), (3, 0)]
tenners [(2, 1), (3, 0), (3, 1), (2, 2), (3, 2), (2, 3), (1, 3)]
tapnet [(0, 2), (0, 1), (1, 2), (2, 2), (3, 2), (2, 1)]
hapten [(1, 1), (0, 1), (1, 2), (2, 1), (3, 2), (3, 1)]
pterin [(1, 2), (2, 1), (3, 2), (2, 3), (3, 3), (2, 2)]
staith [(1, 3), (0, 2), (0, 1), (1, 0), (2, 1), (2, 0)]
sprent [(1, 3), (1, 2), (2, 3), (3, 2), (3, 1), (2, 1)]

There is no way I would have come up with some of these words.

Conclusion

Today, we have seen one very practical application of algorithms: beating your girlfriend in Word Hunt. While the real test is to use this script in a game against her, preliminary results appear promising.

I hope you enjoyed reading this post. See you in the next one!