Publishing date: 2023-05-12 10:28 +0200
I'm kind of obsessed with historical cryptography and puzzles.
A week or so ago I had to find anagrams for a given word, and
although you could use your favorite search engine to look
up an existing list for a given language - or, even fancier,
ask ChatGPT - I decided to cook up my own.
First, an anagram isn't just a random permutation; it
must also be a proper word in the language. While
one could simply do something like
```
In [20]: from random import shuffle
In [21]: iword = list("hello")
In [22]: shuffle(iword)
In [23]: ''.join(iword)
Out[23]: 'ollhe'
```
this isn't exactly helpful.
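To put a number on "not helpful": one could generate every permutation and keep only those that appear in a word list, but the number of permutations grows factorially (24 for a 4-letter word, 3,628,800 for a 10-letter word). A rough sketch of that brute-force idea follows; the function name and the tiny word set are made up purely for illustration.

```python
from itertools import permutations

# Tiny made-up word set, only for illustration.
words = {"does", "dose", "odes", "this", "hits"}

def brute_force_anagrams(word, wordset):
    """Try every permutation of word and keep those found in wordset."""
    candidates = {"".join(p) for p in permutations(word.lower())}
    return sorted(candidates & wordset)

print(brute_force_anagrams("does", words))  # -> ['does', 'dose', 'odes']
```

It works for short words, but it gets slow quickly as words grow longer.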
So what I do instead is read a list of words into
a list of strings, sort the characters of the word I want to
find anagrams for by their ASCII values, and then look for
words in the list that match the same pattern. Example:
```
In [25]: [ord(c) for c in 'hello']
Out[25]: [104, 101, 108, 108, 111]
In [29]: o = [ord(c) for c in 'hello']
In [30]: o.sort(); o
Out[30]: [101, 104, 108, 108, 111]
```
If an anagram exists, there should be at least two
words in the list that follow the same pattern: the word
itself and its anagram. To account for upper- and lowercase
characters, I make all characters lowercase first.
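The pattern comparison can be wrapped in a small helper. This is only a sketch; the name `signature` is made up here and isn't used in the rest of the post.

```python
def signature(word):
    """Return the sorted character codes of the lowercased word."""
    return sorted(ord(c) for c in word.lower())

print(signature("Hello"))                      # -> [101, 104, 108, 108, 111]
print(signature("hits") == signature("This"))  # -> True
```

Sorting the characters directly with `sorted(word.lower())` would compare the same way; the `ord()` step just makes the numbers visible.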
So, first, read in a list of words - with one word per
line - and put it into a list:
```
en = []
with open("/home/alex/share/wordlists/english.txt") as f:
    # One word per line; strip the trailing newline.
    for line in f:
        en.append(line.strip('\n'))
en[0:5]
['W', 'w', 'WW', 'WWW', 'WY']
```
Alrighty... Now the fun:
```
def findAnagram(word, wl):
    """Find an anagram for word in wordlist wl.
    wl must be python list of words (as strings).
    A wordlist can be generated by reading a flat
    text file containing words,
    e.g. by using the helper function
    gen_wordlist_list_from_file().
    """
    # The idea is to grab all words of the same
    # length, then sort the characters and get an
    # ascii representation; then find all
    # which have the same representation.
    word = word.lower()
    tmp_wl = [i for i in wl if len(i) == len(word)]
    enc_word = [ord(i) for i in word]
    enc_word.sort()
    out = []
    for i in tmp_wl:
        i = i.lower()
        t = [ord(x) for x in i]
        t.sort()
        if enc_word == t:
            out.append(i)
    return out
```
Let's try this!
```
[findAnagram(word, en) for word in "How does this \
even work".split(" ")]
[['how', 'who', 'who'],
['odes', 'does', 'dose'],
['this', 'hist', 'hits', 'shit'],
['even'],
['work']]
```
Fun!
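The doubled 'who' in the first result presumably comes from the word list containing both a capitalized and a lowercase entry, which become identical once lowercased. A possible variant, not the approach used above, is to build a lookup table once: each sorted-letter key maps to a set of words, which removes such duplicates and avoids rescanning the whole list for every query. The names `build_index` and `find_anagrams_indexed` and the sample list are made up for this sketch.

```python
from collections import defaultdict

def build_index(wl):
    """Map each sorted-letter key to the set of lowercased words sharing it."""
    index = defaultdict(set)
    for w in wl:
        w = w.lower()
        index["".join(sorted(w))].add(w)
    return index

def find_anagrams_indexed(word, index):
    """Return all indexed words with the same letters as word, sorted."""
    key = "".join(sorted(word.lower()))
    return sorted(index.get(key, ()))

# Tiny made-up word list, only for illustration.
sample = ["How", "Who", "who", "does", "dose", "odes", "even"]
idx = build_index(sample)
print(find_anagrams_indexed("How", idx))   # -> ['how', 'who']
print(find_anagrams_indexed("does", idx))  # -> ['does', 'dose', 'odes']
```

Building the index takes one pass over the list; after that, each lookup is a single dictionary access.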