[HN Gopher] An AI system for solving crossword puzzles that outp...
___________________________________________________________________
An AI system for solving crossword puzzles that outperforms the
best humans
Author : DantesKite
Score : 44 points
Date : 2022-05-20 18:47 UTC (4 hours ago)
(HTM) web link (twitter.com)
(TXT) w3m dump (twitter.com)
| zwieback wrote:
| My dad was a big crossword puzzler. I asked him whether, if
| you picked one of two possible answers to the first clue, the
| entire puzzle could end up being solved one way or the other.
| He sat down and created a series of puzzles with "themes",
| e.g. "north"/"south" or "Schiller"/"Goethe", where all the
| major words came from one theme or the other.
|
| Anyway, it would be interesting to see what the AI would do
| with this: would there be two hotspots in the solution space,
| one for each variant?
| mcherm wrote:
| Also famous: the November 5, 1996 NYT puzzle, where a clue
| about the newly elected president could be answered with
| either CLINTON or BOBDOLE, and all the crossing words had two
| valid solutions.
|
| If they trained the AI on the NYT archive then they would have
| the results of testing it on this one.
| thom wrote:
| Note for Brits that this isn't cryptic (dare I say 'real')
| crosswords, but I assume it could be retooled for that.
| tialaramex wrote:
| American Crosswords are different in two key ways as I
| understand it:
|
| Firstly, all "serious" British crosswords are "cryptic", i.e.
| once you figure out what the answer is, it's apparent why it's
| the correct answer, but getting from the clue to the answer
| involves lateral thinking and skills learned from years of
| staring at such clues.
|
| e.g. Private Eye's crossword 726 (back in April), clue 23 down,
|
| "He finally gets to penetrate agreeable person (relatively)
| (5)"
|
| The correct answer is "Niece". "Nice" can mean agreeable, the
| final letter of "He" is E, and so by having the letter E
| "penetrate" the word nice you produce "niece", a person who is
| a relative.
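|
| A toy Python sketch of that insertion mechanic (the function
| and the little wordlist here are made up for illustration, not
| taken from any real solver):
|
|       # Insertion-type cryptic wordplay: drop a letter into a
|       # "container" word, keep results that fit the definition.
|       def insertions(container, letter):
|           """All strings formed by inserting letter into container."""
|           return {container[:i] + letter + container[i:]
|                   for i in range(len(container) + 1)}
|
|       # "He finally" -> the last letter of "He" is "e";
|       # "agreeable"  -> the container word NICE.
|       candidates = insertions("nice", "e")
|       relatives = {"niece", "uncle", "aunt", "cousin"}
|       print(candidates & relatives)  # {'niece'}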
|
| [ and yes, Private Eye is a satirical magazine, the crossword
| clues are, likewise, intended to make you a little
| uncomfortable while you laugh ]
|
| Secondly, British crosswords are arranged with black "dead"
| squares between letters to produce more of a lattice, in which
| many letters take part in only one word; as a result, longer
| answers are common.
|
| e.g. same crossword, clue 26 across is
|
| "Figure on getting your teeth into our statistical revelations
| (6,9)"
|
| The answer was "Number Crunching".
| jen729w wrote:
| Brit here. I woke up one morning - I was 15, so this was in
| the 90s - with the word 'microdot' in my head. The first
| thought, clear as anything, as if it was painted across the
| inside of my eyes. Microdot!
|
| Puzzled, I didn't move and set about figuring out why.
| Eventually I realised that I had solved, in my sleep, a
| crossword clue that I had not even gone to bed thinking
| about. I'd read it at my grandma's house earlier the previous
| day.
|
| Tiny picture makes computer work on time (8)
|
| The brain is amazing. I'm not even any good at the cryptic
| crossword!
| dane-pgp wrote:
| I'm reminded of an article I read about an AI that competed
| in a crossword competition; one particularly difficult clue it
| faced was "Apollo 11 and 12 [180 degrees]". I don't know if it
| would be allowed as part of a cryptic crossword, but the
| answer was two words of 8 and 4 letters.
|
| The answer to that clue is included here:
|
| https://www.uh.edu/engines/epi2783.htm
| nilstycho wrote:
| That would usually be considered an invalid cryptic clue.
| jamespwilliams wrote:
| For cryptic crosswords I've found
| https://www.crosswordgenius.com/ impressive (once you get past
| the kind of clunky UI)
| mnd999 wrote:
| Anyone else getting a bit bored with all these "AI does some
| super-specialised task better than humans after enormous
| amounts of training" stories? It's not very interesting
| anymore.
|
| Sure, it can do crosswords well, but the average human who
| does crosswords well can also do a zillion other things, and
| this type of AI is not getting us any closer to that.
| joshcryer wrote:
| But every specialized model like this is getting us closer to
| "doing a zillion other things." By that logic it is exactly
| one step closer. The general AI agent will be composed of many
| such models.
| DantesKite wrote:
| If you skim the paper, you'll realize what's most interesting
| are the new techniques they developed to accomplish this,
| advancing the field of machine learning in the process.
| cinntaile wrote:
| Now automatically send in the answers to the various weekly
| magazine and newspaper competitions to get a passive prize
| income.
| ericwallace_ucb wrote:
| Hi, I am the first author of this paper and I am happy to
| answer any questions. You can find the technical paper here:
| https://arxiv.org/abs/2205.09665
| mikeryan wrote:
| Hey this is cool, I do the NYT Crossword every day. A few
| questions.
|
| 1. You mention an 82% solve rate. The NYT puzzle gets "harder"
| each day from Monday through Saturday. Do you track the days
| separately? If so, I'd be curious how much of the unsolved 18%
| ends up on Fridays and Saturdays. (For anyone who doesn't
| know: the Sunday puzzle is outside the Mon-Sat range since
| it's a bigger puzzle.)
|
| 2. Related to the above, Thursday puzzles usually have
| "tricks" (skipped letters and whatnot) or require a rebus
| (multiple letters in one square). Do you handle these at all?
|
| 3. Is this building an ongoing model and getting better at
| solving? Or did you have to seed it with a set of solved
| puzzles and clues?
|
| Sorry didn't have time to read the whole paper.
| nickatomlin wrote:
| Hi! I'm another author on this paper. To answer your
| questions:
|
| 1. Monday puzzles are the easiest for our model, and
| Thursdays are the most difficult. You can see a graph of day-
| by-day performance here:
| https://twitter.com/albertxu__/status/1527704535912787968
|
| 2. Our current system doesn't have any handling for rebuses
| or similar tricks, although Dr. Fill does. I think this is
| part of why Thursday is the hardest day for us, even though
| Saturday is usually considered the most difficult.
|
| 3. We trained it with 6.4M clues. As new crosswords get
| published, we could theoretically retrain our model with more
| data, but we aren't currently planning to do that.
| sp332 wrote:
| I don't suppose you gave more weight to more recent
| puzzles? Is there a time period or puzzle setter that was
| harder to solve because they favored an unusual clue type?
| avrionov wrote:
| Do you think your approach can be applied to other problems?
| Imnimo wrote:
| For handling cross-reference clues, do you think it would be
| feasible in the future to feed the QA model a representation of
| the partially-filled puzzle (perhaps only in the refinement
| step - hard to do for the first step before you have any
| answers!), in order to give it a shot at answering clues that
| require looking at other answers?
|
| It feels like the challenges might be that most clues are not
| cross-referential, and even for those that are, most
| information in the puzzle is irrelevant - you only care about
| one answer among many, so it could be difficult to learn to
| find the information you need.
|
| But maybe this sort of thing would also be helpful for theme
| puzzles, where answers might be united by the theme even if
| their clues are not directly cross-referential, and could give
| enough signal to teach the model to look at the puzzle context?
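|
| To make that concrete, here is one very rough sketch of how a
| partial grid might be serialized into the QA model's text
| input; the function name and the input format are purely
| hypothetical, not how the paper's system actually encodes
| anything:
|
|       def clue_with_grid_context(clue, referenced_fills):
|           """Append the current (possibly partial) fills of the
|           answers a cross-reference clue points at.
|
|           referenced_fills: dict mapping a slot name to its fill,
|           with "_" for empty squares, e.g. {"17A": "B_BDOLE"}.
|           """
|           context = " ; ".join(f"{slot}={fill}"
|                                for slot, fill in referenced_fills.items())
|           return f"{clue} [grid: {context}]"
|
|       print(clue_with_grid_context("See 17-Across (4)",
|                                     {"17A": "B_BDOLE"}))
|       # -> See 17-Across (4) [grid: 17A=B_BDOLE]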
| gardenfelder wrote:
| https://github.com/albertkx/berkeley-crossword-solver
___________________________________________________________________
(page generated 2022-05-20 23:00 UTC)