2010/03/23

4 Letter Word Square

Warning.  Highly geeky content follows:

A while back, my parents were over visiting.  While my mom was playing with Allison, my dad and I started playing with Allison's magnetic letters. We were taking the letters (26 of 'em, no duplicates) and trying to arrange them in a three-by-three grid such that there was a word formed across each row, as well as down each column.  After some minutes of trying, we didn't come up with any arrangements.  Close, but not quite perfect.

 

In my free time at work, I wrote a program to look for such arrangements, and found that there are many, many of them.  In the face of dozens of solutions, I felt a little sheepish.

 

For example:

ASK
PIE
TRY

 

Reading down the columns, you get APT, SIR, and KEY.  Note that no letter appears more than once.

 

When I told this to my dad, he (bless him) said "I wonder if there are any four-letter word squares?"  Sounded like a good challenge to me.

 

In my dictionary file, there are 2000+ four letter words.  I want to lay these words out horizontally into four rows, like this:

 

FOUR

WORD

LIKE

THIS

 

Once I have them in this arrangement, I want to test for letter uniqueness, as well as if the columns are also creating words.  In this case, I'd get FWLT, OOIH, URKI, RDES.  Obviously not words.  I also see that some letters are used more than once.  So this arrangement would fail both tests.

 

If there are 2000 words in my set of four letter words, that means there are 2000*2000*2000*2000 ways to arrange them.  That's 1.6x10^13.  If I wrote a program that analyzed one million arrangements per second(!!!), it would still take 185 days to run!  However, with some cleverness, we be a little smart about the arrangements.

 

One thing I immediately noted is that any word with repeating letters can be discarded from the start.  So, no more "WOOD" and "ICIY" and "BALL".  That left with me 1873 words.  That shaves some weeks off the runtime, but still, 100+ days is far too long.

 

If the top word is "ABEL", and the second word is also "ABEL", then there's no need to try all combinations of filling in the third and fourth slots with words.  I already know that no matter what is in the third and fourth row, the first and second row already break the "unique letter" test.  In fact, if the first word starts with "AB", then there's simply no need to try ANY A or B words in the second row, much less all the meaninglessness of filling in the third and fourth row.  So that gets rid of millions of iterations right there.

 

I've left this running for a few hours, and the top word has advanced to ALUM.  That puts me at about 2% complete after about 4 hours of runtime.  That's a bit more manageable, but is still going to take multiple days to run.  On my work computer, it's doing about 290K iterations per second.

 

*Update*

 

After writing all this, work pulled me away.  When I was done with that and spent some more time on the problem, I found two things.

 

First, the way that I was checking for letter uniqueness was pretty slow.  Just by making a few changes, my iterations per second went from 290K to about 650K.

 

Also, I found another little shortcut.

 

If the first word is ABLE and the second word is SPIT, then I'm eventually going to be looking for words that start with AS, BP, LI, and ET.  Do you think there are any four letter words that start with "BP"?  Me neither.  So I added some code that says this:  Any time the second word in the list is updated, check the word list to make sure that continuing down this path even makes sense.  That is, make sure that there are actually words that start with those letters.

 

This additional logic slows the iterations down a little, but pares down the number of cases that need to be checked by a quite a bit.

 

At the rate it's going now, I suspect it will be finished by the time I get into work tomorrow.  Which is really the point of this whole post.  Something that is really interesting about computer programming is taking a problem that seems overwhelming in it's scope, and thinking about it from different angles and coming up with ways to break it down into a more manageable size.

 

And even if there weren't any way to reduce the problem set down, this would be a great candidate for parallel processing.  I get a bunch of friends, and have them each run my program, but each friend only works on a distinct set of the problem (you get the set where the first letter is between A-D, you get E-M, etc).

 

I have to run.  Nerdy stuff, but fun.



1 comment:

Adman said...

When I got into work this morning, it appears that the job had completed without finding any valid word squares.

The only thing left to do now is to add some fake words to the dictionary to force a positive result and verify that my program actually does what I think it does.