Solving the Daily Jumble

Jul 09, 2020

Recently I decided to subscribe to my local newspaper (the Hartford Courant, pronounced current) again. That’s been valuable by itself, and brought some benefits I forgot about (like a comics page), but one of the unexpected side effects has been the Daily Jumble.

Daily Jumble, from the *Hartford Courant*, July 8, 2020

The idea is to unscramble the clues to form words. All of the clues are either five or six characters long.

Most of the time I can figure out the five-letter clues quickly. The six-letter words are harder (there really appears to be something to that notion that the mind can only handle five things at a time), and sometimes I can’t get the solutions at all. I don’t like having to wait for the next day to get the answers, though.

It occurred to me that it wouldn’t be all that hard to write a Jumble Solver. My first thought was to use Groovy, because the Groovy standard library adds a permutations method to the Iterable interface.

The permutations method, which returns a Set.

So if I could convert a clue (of type String) into a List (which implements Iterable), then I could invoke permutations, join the results back into a string, and check each value against a dictionary file.

I run on a Mac, and the operating system for Macs is ultimately based on BSD Unix. That means that if you look in the /usr/share/dict/ directory, you’ll find a file called web2 (and a link called words that points to it). That file contains a list of about 235,000 words, one per line, from Webster’s 2nd International Dictionary, whose 1934 copyright expired long ago. That makes a perfect word list for me to search for each permutation.

I therefore wrote the following Groovy class, called Jumble1, to do the job:

import groovy.transform.CompileStatic

@CompileStatic
class Jumble1 {
    private List<String> wordList =
            new File('/usr/share/dict/words').readLines()
                    .findAll { it.size() == 5 || it.size() == 6 }

    String solve(String clue) {
        List<String> letters = clue.split('').toList()
        letters.permutations()
                .collect { it.join('') }
                .find { wordList.contains(it) }
    }
}

The idea is to:

1. Read all the lines from the dictionary into a list of words, keeping only the five- and six-letter ones.
2. In the solve method, split the clue into individual letters.
3. Find all the permutations of those letters.
4. Join each permutation back into a word.
5. Search the word list for the first match.

For those who are not familiar with Groovy, the collect method is what other languages call map. It creates a new list by applying the provided closure to each element of the original list. Also, Groovy methods return their last evaluated expression automatically, so I didn’t need the return keyword.
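To make the steps concrete outside Groovy, here is a minimal Java sketch of the same brute-force idea. It is only an illustration under a few assumptions: the class name PermutationSolver is hypothetical, the permutation search is hand-written (a simple swap-based recursion), and a small in-memory HashSet stands in for the filtered dictionary file, which also swaps the list lookup for a set lookup.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical class; the word set stands in for the filtered dictionary.
class PermutationSolver {
    private final Set<String> words;

    PermutationSolver(Set<String> words) {
        this.words = new HashSet<>(words);
    }

    // Returns the first permutation of the clue found in the word set,
    // or an empty string when no permutation matches.
    String solve(String clue) {
        String match = search(clue.toCharArray(), 0);
        return match == null ? "" : match;
    }

    // Positions before index are fixed; try each remaining letter at index.
    private String search(char[] letters, int index) {
        if (index == letters.length) {
            String candidate = new String(letters);
            return words.contains(candidate) ? candidate : null;
        }
        for (int i = index; i < letters.length; i++) {
            swap(letters, index, i);
            String match = search(letters, index + 1);
            swap(letters, index, i); // restore before trying the next letter
            if (match != null) {
                return match;
            }
        }
        return null;
    }

    private static void swap(char[] a, int i, int j) {
        char tmp = a[i];
        a[i] = a[j];
        a[j] = tmp;
    }

    public static void main(String[] args) {
        PermutationSolver solver = new PermutationSolver(
                new HashSet<>(Arrays.asList("upper", "tangy", "zodiac", "fiasco")));
        System.out.println(solver.solve("purep")); // prints upper
    }
}
```

A six-letter clue has 720 permutations, so the cost here is in how many candidates get checked, not in any single lookup.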

The code works (I’ll show a test case later), but it’s not terribly fast. I also wanted to implement the same thing in Java or Kotlin for comparison, but neither of those languages has a permutations method in its standard library. I debated writing one, or downloading an open source library with one, but it turned out there was another option.

My friend and awesome developer Tim Yates pointed out that instead of reading the dictionary words into a List, I could create a Map instead. Each key in the map would be composed from a word after sorting its letters alphabetically, and the values in the map would be lists of words that matched those letters.

For example, the string PUREP sorts to EPPRU alphabetically. That means EPPRU is the key in the map, and the dictionary word UPPER sorts the same way, so it would be in the corresponding list of values for that key. If you process the dictionary word by word, you can find which words sort to the same key and add them to each list. This is a standard “group by” operation.

It’s the same process as grouping employees by department. You get a department for each employee and that becomes a key in the map, and the corresponding values are the lists of employees assigned to that department. In this case, the keys are the sorted words, and the values are the dictionary words that sort the same way. Most of the time there’s only one dictionary word for each key, but sometimes there are two or more. I worried about that briefly (what if I picked the wrong word?), but then realized that’s not my problem as much as it’s the Jumble creator’s problem. The person who wrote the puzzle wants the solution to be unique, if possible, so I just returned the first word in the list.
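A tiny in-memory illustration of that grouping, sketched in Java, shows both the single-word case and a key shared by several words. The class and method names (AnagramGroups, key, group) and the sample word list are made up for the example; the real solver builds its map from the dictionary file.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical helper used only to illustrate the "group by sorted letters" idea.
class AnagramGroups {
    // Sort a word's letters to build its key, e.g. "upper" becomes "eppru".
    static String key(String word) {
        char[] letters = word.toCharArray();
        Arrays.sort(letters);
        return new String(letters);
    }

    // Group words by their sorted-letter key.
    static Map<String, List<String>> group(List<String> words) {
        return words.stream()
                .collect(Collectors.groupingBy(AnagramGroups::key));
    }

    public static void main(String[] args) {
        List<String> sample =
                Arrays.asList("upper", "angel", "angle", "glean", "tangy");
        Map<String, List<String>> groups = group(sample);

        System.out.println(groups.get(key("purep"))); // [upper]
        System.out.println(groups.get(key("elgna"))); // [angel, angle, glean]
    }
}
```

With a key like aegln the map holds three candidates, which is exactly the situation where returning the first element of the list is a judgment call.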

Groovy makes the whole process simple. Here’s the revised code:

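import groovy.transform.CompileStatic

@CompileStatic
class Jumble2 {
    // Keys are sorted letters; values are the dictionary words with those letters
    private Map<String, List<String>> wordMap =
            new File('/usr/share/dict/words').readLines()
                .findAll { it.size() == 5 || it.size() == 6 }
                .groupBy { it.toList().sort().join('') }

    String solve(String clue) {
        wordMap[clue.toList().sort().join('')].head()
    }
}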

The groupBy operation does all the work. It takes each word in the dictionary (after the filter), converts it to a list of letters, sorts that list alphabetically (or, to be more accurate, lexicographically), and joins the letters back into a string that serves as the key. Every dictionary word that produces the same key is added to that key’s list. The result is a Map from sorted strings to the lists of words that sort the same way.

The following Spock test shows that both solutions work for at least the three words shown:

import spock.lang.Specification
import spock.lang.Unroll

class JumbleSpec extends Specification {
    @Unroll
    void "unscramble #scrambled to get #word"() {
        given:
        Jumble1 jumble1 = new Jumble1()
        Jumble2 jumble2 = new Jumble2()

        expect:
        jumble1.solve(scrambled) == word
        jumble2.solve(scrambled) == word

        where:
        scrambled || word
        'cautla'  || 'actual'
        'agileo'  || 'goalie'
        'mmlueb'  || 'mumble'
    }
}

All the tests pass. The nice part is that since I no longer needed a permutations method, I could now port this algorithm to Java and to Kotlin.

Java has a group by operation, but it is associated with streams and collectors. Therefore in my JumbleJava class, I read in the file and produce the map this way:

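import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class JumbleJava {
    private final Map<String, List<String>> wordMap;

    public JumbleJava() {
        // try-with-resources closes the underlying file once the map is built
        try (Stream<String> lines =
                     Files.lines(Paths.get("/usr/share/dict/words"))) {
            wordMap = lines
                    .filter(word -> word.length() == 5 || word.length() == 6)
                    .collect(Collectors.groupingBy(this::word2key));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Sort the letters of a word to form its map key
    private String word2key(String word) {
        return Arrays.stream(word.split(""))
                .sorted()
                .collect(Collectors.joining());
    }

// ... more to come ...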

It’s really annoying that reading the file in Java throws a checked exception, but so be it. I just caught it in the constructor and re-threw it as unchecked, which may or may not be a good idea. The other difference is that splitting a word into letters, sorting them, and joining them back into a string is now done in the private method called word2key.

With that code in place, the solve method can now be done this way:

    public String solve(String clue) {
        return wordMap.getOrDefault(word2key(clue),
                Collections.singletonList("")).get(0);
    }

The solve method looks up the processed clue in the map. A plain get would return null for a key the map doesn’t contain, so getOrDefault supplies a list containing only an empty string instead. Either way, the method returns the first element of the resulting list.
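The difference between the two lookups is easy to see in isolation. The sketch below uses a hypothetical MissingKeyDemo class and a one-entry map built by hand; only the getOrDefault call mirrors the solve method.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical demo class; the map contents are made up for illustration.
class MissingKeyDemo {
    // Same lookup shape as solve: fall back to a one-element list holding "".
    static String firstOrEmpty(Map<String, List<String>> map, String key) {
        return map.getOrDefault(key, Collections.singletonList("")).get(0);
    }

    public static void main(String[] args) {
        Map<String, List<String>> map = new HashMap<>();
        map.put("eppru", Collections.singletonList("upper"));

        System.out.println(map.get("zzzzz"));                     // null
        System.out.println(firstOrEmpty(map, "zzzzz").isEmpty()); // true
        System.out.println(firstOrEmpty(map, "eppru"));           // upper
    }
}
```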

Just for fun, I can solve for multiple words using a stream, or even a parallel stream:

    public List<String> parallelSolve(String... clues) {
        return Arrays.stream(clues)
                .parallel()  // totally not necessary
                .map(this::solve)
                .collect(Collectors.toList());
    }

A parallel stream is rather ridiculous in this case, since the overhead needed to fork the solver into individual searches and join the results back together no doubt exceeded the time necessary for all four trivial lookups from the word map. Still, the process works.
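One detail that makes the parallel version safe, if pointless: a stream created from an array is ordered, and collecting to a list preserves that order, so the answers come back in the same sequence as the clues. The following sketch checks that with a stand-in function; ParallelOrderDemo and fakeSolve are hypothetical names, and fakeSolve merely upper-cases its input instead of consulting a dictionary.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical demo class comparing sequential and parallel mapping.
class ParallelOrderDemo {
    // Stand-in for solve: any side-effect-free function of one clue works here.
    static String fakeSolve(String clue) {
        return clue.toUpperCase(Locale.ROOT);
    }

    static List<String> solveAll(boolean parallel, String... clues) {
        Stream<String> stream = Arrays.stream(clues);
        if (parallel) {
            stream = stream.parallel();
        }
        return stream.map(ParallelOrderDemo::fakeSolve)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        String[] clues = {"purep", "gynta", "dizcoa", "sicafo"};
        // Both runs yield the results in clue order.
        System.out.println(solveAll(false, clues).equals(solveAll(true, clues))); // true
    }
}
```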

Now for Kotlin. I wanted to use Kotlin partly just to see the results, and partly because I can then use the GraalVM compiler and the native image tool to generate a really fast solver on my laptop.

(Yes, I could use GraalVM’s native image compiler on the Java solution, and even — with some coaxing — on the Groovy one, but I’m on a roll here so please just go with it.)

Here is the Kotlin implementation, which takes some explaining:

import java.io.File

class JumbleKotlin {
    private val wordMap =
            File("/usr/share/dict/words").useLines { lineSeq ->
                lineSeq.filter { it.length == 5 || it.length == 6 }
                        .groupBy(this::word2key)
            }

    private fun word2key(word: String) =
            word.toList().sorted().joinToString("")

    fun solve(clue: String): String =
            wordMap[word2key(clue)]?.get(0) ?: ""

    fun solveAll(vararg clues: String) =
            clues.map(this::solve)
}

You don’t need the word new to instantiate a Kotlin class, so the File constructor is invoked just by the name. The extension function useLines reads the lines of the file into a sequence and automatically closes the file when it’s done. The sequence is filtered and grouped as before.

This time the word2key function can be implemented as a single expression, which is a nice idiom in Kotlin. The solve function deals with the possibility that accessing a key in a map might return null by using the safe-call operator, ?., which only proceeds if the left side is not null, and returning the first value if you get back a list. An Elvis operator, ?:, (clearly borrowed from Groovy) is used to return an empty string if the clue does not appear in the map.

Also in this case, the map function is already part of the Array class, which is produced by the vararg arguments. Therefore solving a series of clues is also a one-liner.

For completeness, here’s a JUnit 5 test (in Kotlin) to show that the code is working properly:

import org.hamcrest.MatcherAssert.assertThat
import org.hamcrest.Matchers.`is`
import org.junit.jupiter.api.Assertions.assertEquals
import org.junit.jupiter.api.Test
import org.junit.jupiter.api.assertAll
import org.junit.jupiter.params.ParameterizedTest
import org.junit.jupiter.params.provider.CsvSource

class JumbleKotlinTest {
    private val jumbleKotlin: JumbleKotlin = JumbleKotlin()

    @Test
    fun `check solver`() {
        assertAll(
                { assertEquals("actual", 
                        jumbleKotlin.solve("cautla")) },
                { assertEquals("goalie", 
                        jumbleKotlin.solve("agileo")) },
                { assertEquals("mumble", 
                        jumbleKotlin.solve("mmlueb")) }
        )
    }

    @ParameterizedTest(name = "{0} unscrambles to {1}")
    @CsvSource("cautla, actual",
        "agileo, goalie", "mmlueb, mumble")
    fun `unscramble words`(clue: String, answer: String) =
            assertThat(jumbleKotlin.solve(clue), `is`(answer))

    @Test
    fun `check solveAll`() {
        assertEquals(listOf("actual", "goalie", "mumble"),
            jumbleKotlin.solveAll("cautla", "agileo", "mmlueb"))
    }
}

Now that the code is working, it’s time to take advantage of GraalVM’s native image generator. To do that, I used SDKMAN! to install the most recent version of the GraalVM Java implementation. Then I installed the native image tool.

> sdk install java 20.1.0.r11-grl
> sdk use java 20.1.0.r11-grl
> gu install native-image

Then the game is to produce a jar file from the Kotlin code and convert it. I added a main function to my Kotlin file:

fun main(args: Array<String>) {
    println(JumbleKotlin().solveAll(*args))
}

Again, the class is instantiated without the word new, and I invoke solveAll with the spread operator on args, which allows me to supply the command line arguments and pass them to the function.

I saved the code to a file called jumble.kt and executed the following steps:

> kotlinc-jvm jumble.kt -include-runtime -d jumble.jar
> native-image -jar jumble.jar

The result was a file called jumble that I could use on the command line to unscramble words. For example, using the words in the Jumble puzzle listed at the top of this post, here are the results (spoilers if you didn’t do that puzzle):

> jumble purep gynta dizcoa sicafo
[upper, tangy, zodiac, fiasco]

Elapsed time is about 0.267 seconds, which is pretty darn fast. I expect virtually all of that is reading the dictionary file. As a next step I should probably create a smaller dictionary file made up of just the five- and six-letter words, but since the required time is already down to about a quarter of a second, that starts to seem like overkill.
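If the smaller dictionary ever did seem worth the trouble, producing it would be a short one-off job. This is a rough sketch rather than anything from the repository: the DictionaryTrimmer class is hypothetical, and the source and target paths are whatever gets passed on the command line.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical one-off utility; not part of the solver itself.
class DictionaryTrimmer {
    // Copies only the five- and six-letter lines from source to target.
    static void trim(Path source, Path target) throws IOException {
        List<String> kept;
        try (Stream<String> lines = Files.lines(source)) {
            kept = lines
                    .filter(word -> word.length() == 5 || word.length() == 6)
                    .collect(Collectors.toList());
        }
        Files.write(target, kept);
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: DictionaryTrimmer <source> <target>");
            return;
        }
        trim(Paths.get(args[0]), Paths.get(args[1]));
    }
}
```

The solver would then read the trimmed file instead of /usr/share/dict/words, and the length filter in the constructor would become redundant.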

Admittedly, the whole project seems like overkill, but it was fun anyway.

The code for all three implementations, Java, Groovy, and Kotlin, is in this GitHub repository, which holds many other examples as well.

I guess to really go overboard I should add this to a Docker container, or convert it to a microservice and deploy it as a cloud function, or — no, this is already more than enough. The rest is left as an exercise to the seriously bored reader.

I discuss the process of working with Kotlin and GraalVM in my book Kotlin Cookbook. I talk about using Java streams, lambdas, and method references in my book Modern Java Recipes. Finally, I discuss how to work with Groovy in my book Making Java Groovy. Have fun!

(My (free!) weekly newsletter, where I first discussed this problem, can be found here.)
