If the letter is in the word but not in the correct spot, show a π¨
If the letter is not in the word, show a β¬οΈ
For example, WEARY means that W is in the word and in the correct spot. PILLS means that I is in the word but in the wrong spot. VAGUE means that U is not in the word.
Here's a class to hold the result of each letter in the guess.
Here's a representation of the game. Each game has an answer and provides a way to guess that answer.
We can also create a driver program to try out the game:
It won't cover cases like guesses that aren't 5 letters or aren't in the dictionary, but for our purposes, this will get the job done.
Before we start implementing a program to solve this, let's define an interface that each type of solver will implement.
Brute Force Approach
A naive approach is to check every single 5-letter word in the dictionary until the game is solved. In order to implement this, I've pulled in a dictionary from dwyl/english-words.
Let's test it out.
Guessing about took 60 guesses since it was at the start of the dictionary, but guessing zorro required 15,907 guesses because this program has to guess almost every word in the dictionary before finding the solution.
Iterative Approach
To improve, we can leverage the result from each guess in order to inform our future guesses.
Thinking back to the rules of the game, we can improve our solution by doing the following:
Filter out any words that don't have a g (π¨). This eliminates words like built and bulbs but keeps words like egret and ought
Filter out any words that contain letters that aren't in the solution (β¬οΈ). In this example, eliminate any words that contain a r, o, or u
Guess the first word in the set of remaining words
Continue using the result of each guess to filter out future guesses
Let's put this into code.
I use a LinkedHashSet to store future guesses because it provides predictable iteration order; that is, elements are iterated over in the same order in which they were inserted. LinkedHashSet also provides constant-time performance for add(element), contains(element), and remove(element) methods, meaning that those methods will take the same amount of time whether the collection has 10 elements or 10 million elements.
This approach is an improvement from the brute force approach, but there's still an issueβ it will still guess the words in dictionary order If the solution is tight, this will guess the following words in this order after determining the answer ends in ight:
bight (a bend in the coast or a loop in a rope)
dight (archaic, past tense of dress?)
eight
fight
hight (archaic, named)
light
might
night
pight (archaic, past tense of pitch)
right
sight
tight
Here is a test case to demonstrate that worst-case scenario.
I've added some code to print the guess and result to emphasize the problem with this approach.
Some of those are reasonable guesses (light was the answer in Wordle #226), but some words in there are archaic words that are unlikely to be a Wordle answer.
Iterative Approach Using Word Frequency
We can improve on the above solution by guessing more common words first. Peter Norvig, the Director of Research at Google, has compiled a list of the most common ~300,000 words in English. We can use this list in order to guess more common words prior to guessing obscure, archaic words.
Here is what the beginning of the list looks like:
We don't care how often a word occurs, but do care about the order in which each word occurs.
Let's modify our iterative approach to try more common words first.
Let's test it out.
Not badβ taking word frequency into account, we were able to improve a worst-case scenario by almost 50%. It's not enough to win the game in 6 turns, but it's enough to efficiently beat the game for most words.
Elimination Approach
There's another approach that I haven't coded but have thought about. Some people use their second guess to guess a word that has none of the same characters as their first guess in order to get a better sense of what letters are in the answer. If the answer is about and their first guess is tough (β¬οΈπ¨π¨β¬οΈβ¬οΈ), they would ignore the fact that there's an ou in the answer and instead guess something like races in order to see if there are common letters like r or s in the answer.
I'm curious how this approach stacks up against an iterative approach. Reach out to me if you or someone else ends up writing a program that uses this approach.