[HN Gopher] Using a Markov chain to generate readable nonsense w...
___________________________________________________________________
Using a Markov chain to generate readable nonsense with 20 lines of
Python
Author : benhoyt
Score : 199 points
Date : 2024-01-01 04:40 UTC (18 hours ago)
(HTM) web link (benhoyt.com)
(TXT) w3m dump (benhoyt.com)
| westurner wrote:
| >> _Making the prefix shorter tends to produce less coherent
| prose; making it longer tends to reproduce the input text
| verbatim. For English text, using two words to select a third is
| a good compromise; it seems to recreate the flavor of the input
| while adding its own whimsical touch._
|
| Next word prediction; vector databases:
|
| Vector database: https://en.wikipedia.org/wiki/Vector_database
|
| https://news.ycombinator.com/item?id=37812219
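|
| For reference, a minimal sketch of the two-word-prefix idea from
| the quoted passage (not the article's exact code; names are
| illustrative):
|
|     import random
|     from collections import defaultdict
|
|     def build_chain(text):
|         # Map each two-word prefix to the words that follow it.
|         words = text.split()
|         chain = defaultdict(list)
|         for w1, w2, w3 in zip(words, words[1:], words[2:]):
|             chain[(w1, w2)].append(w3)
|         return chain
|
|     def generate(chain, n=50):
|         # Start from a random prefix and keep sampling successors.
|         prefix = random.choice(list(chain))
|         out = list(prefix)
|         for _ in range(n):
|             followers = chain.get(tuple(out[-2:]))
|             if not followers:
|                 break
|             out.append(random.choice(followers))
|         return " ".join(out)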
| raister wrote:
| This was posted 50 days ago - how does it reach first page again?
| https://news.ycombinator.com/from?site=benhoyt.com
| j4yav wrote:
| By people being interested and upvoting it. Not everyone reads
| the home page all day every day.
| not2b wrote:
| This is exactly the approach I used back in the late 80s; I
| published a Markov chain text generator on comp.sources.games
| that digested Usenet posts, built tables and used the previous
| two words to generate the next word. It was mildly amusing at the
| time. Someone resurrected it and put it on GitHub, which I'm fine
| with.
| 082349872349872 wrote:
| Mark V. Shaney?
| not2b wrote:
| No, it was called markov3. Mark V. Shaney was similar but
| from Bell labs. See https://github.com/cheako/markov3 (not
| coded very well but I got better).
| lowbloodsugar wrote:
| I'll never forget: a mate implemented a Markov chainer in LambdaMOO
| and named it "Cute Ickle Baby". They left it in the Living Room,
| listening to what people said and randomly saying things.
| Sometimes it said the funniest shit and had us all pissing
| ourselves laughing.
| bugbuddy wrote:
| This works pretty well for languages that use a space as word
| separator. Now do this for those languages that don't and have no
| clear and easy rules for word separation.
| abetusk wrote:
| As many know and point out, this idea is now very old (40+
| years?). The big problem is that it creates "word salad" [0]. In
| the generative art and procgen community, this is known as the
| "10,000 bowls of oatmeal" problem [1].
|
| Taking a step back, this is a perennial problem, even with AI old
| and new. I've heard that the early (good) chess bots, after Deep
| Blue, had a problem of only being locally context sensitive and
| being easily fooled by longer range attack planning. We even see
| it, to a certain extent, in LLMs where they forget or contradict
| themselves with what was just said. I have no doubt that LLMs
| will just keep getting better but, as a snapshot of right now,
| you can see shades of this "word salad" problem, just a few steps
| removed to become a "logic salad" problem.
|
| [0] https://en.wikipedia.org/wiki/Word_salad
|
| [1] https://galaxykate0.tumblr.com/post/139774965871/so-you-
| want...
| tgv wrote:
| Also: it works much better in English than in other languages I
| know. English grammar is simple, and many of the function words
| carry a strong prediction for the next tokens.
| echelon wrote:
| Which languages are harder for LLMs? Is there any writeup or
| analysis of this?
| tahnyall wrote:
| If I had to guess, from my time working with ASR (Automatic
| Speech Recognition) and its difficulty with Indian (the country)
| accents, I'd guess Indian languages.
| justsomehnguy wrote:
| Any with inflections and particularly with grammatical
| gender.
| tgv wrote:
| Idk. First, the obvious candidates: those with small, low-quality
| text corpora. Second, I expect LLMs to perform worse on
| long-distance dependencies. English has a pretty strict word
| order, and the constituents are close to each other. There's also
| a lot of input available, so there are more examples of
| long-range gaps to learn from. But (some) Germanic languages have
| some weird rules which allow constituents to move far away, so
| it's likely LLMs will make more errors there.
| Also, free word order languages might be a problem,
| although I've always suspected that those languages aren't
| that free in reality: speakers probably have strong
| preferences.
| 3abiton wrote:
| Any suggestion on the background of the method?
| tempodox wrote:
| Do a web search for "Markov chain".
| tempodox wrote:
| And the logic salad problem will not be solved by LLMs because
| there is nothing in their construction that makes them capable
| of understanding logic.
| brookst wrote:
| Can you elaborate? I would argue that logic has even stronger
| token prediction characteristics than language.
|
| As a total handwave, I would expect an LLM trained on formal
| logic and a huge corpus of proofs to produce pretty strong
| logic output.
| YeGoblynQueenne wrote:
| The output is "word salad" because the probabilities of the
| model are implicitly uniform: eyeballing the Python, it
| selects the next n-gram uniformly at random. A better n-gram
| model would calculate the probability of an n-gram following
| another n-gram according to their frequency in the training
| corpus. With a larger corpus and better training techniques
| (some smoothing for one thing) you'd get much more coherent
| output.
|
| And of course, if you trained a gigantic model on the entire
| web, then you'd get a very smooth approximation of the text in
| the corpus and very fluent, grammatical output. So basically,
| an LLM.
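|
| A rough sketch of that frequency-weighted version with add-alpha
| smoothing (illustrative names, not the article's code):
|
|     import random
|     from collections import Counter, defaultdict
|
|     def train(words, n=2):
|         # Count how often each word follows each n-word prefix.
|         counts = defaultdict(Counter)
|         for i in range(len(words) - n):
|             counts[tuple(words[i:i+n])][words[i+n]] += 1
|         return counts
|
|     def next_word(counts, prefix, vocab, alpha=0.1):
|         # Add-alpha smoothing: unseen words keep a small probability.
|         c = counts.get(tuple(prefix), Counter())
|         weights = [c[w] + alpha for w in vocab]
|         return random.choices(vocab, weights=weights, k=1)[0]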
|
| >> As many know and point out, this idea is now very old (40+
| years?).
|
| More like 110 years. Markov Chains were proposed in 1913 (by
| Markov) and popularised by Shannon in 1948 (in "A Mathematical
| Theory of Communication", the work that introduced information
| theory):
|
| _The underlying mathematics of the n-gram was first proposed
| by Markov (1913), who used what are now called Markov chains
| (bigrams and trigrams) to predict whether an upcoming letter in
| Pushkin's Eugene Onegin would be a vowel or a consonant.
| Markov classified 20,000 letters as V or C and computed the
| bigram and trigram probability that a given letter would be a
| vowel given the previous one or two letters. Shannon (1948)
| applied n-grams to compute approximations to English word
| sequences. Based on Shannon's work, Markov models were commonly
| used in engineering, linguistic, and psychological work on
| modeling word sequences by the 1950s. In a series of extremely
| influential papers starting with Chomsky (1956) and including
| Chomsky (1957) and Miller and Chomsky (1963), Noam Chomsky
| argued that "finite-state Markov processes", while a possibly
| useful engineering heuristic, were incapable of being a
| complete cognitive model of human grammatical knowledge.
| These arguments led many linguists and computational linguists
| to ignore work in statistical modeling for decades._
|
| https://web.archive.org/web/20220522005827/https://web.stanf...
|
| (Note the usual shaking of angry fists at Chomsky. The NLP
| community continued on its path anyway, and built smooth,
| fluent generators of total bullshit and absolutely no progress
| towards a "cognitive model of human grammatical knowledge").
| shawnz wrote:
| > A better n-gram model would calculate the probability of an
| n-gram following another n-gram according to their frequency
| in the training corpus. With a larger corpus and better
| training techniques (some smoothing for one thing) you'd get
| much more coherent output.
|
| I don't understand what you mean here, isn't that exactly
| what an n-gram model is already designed to achieve where n >
| 1? Since this is a 2-gram model isn't it already doing this?
| danieldk wrote:
| _The output is "word salad" because the probabilities of the
| model are uniform, albeit implicitly- eyballing the python,
| it selects the next n-gram uniformly at random._
|
| From a quick look, it doesn't seem to sample uniformly. w3 is
| added to a list for the context w1, w2, not a set. So, say
| word A occurs twice as often in a particular context as B, it
| will be in the list twice as often. So, even though a uniform
| choice function is used, the probability of A getting sampled
| is twice as high.
|
| You get a word salad because a trigram model has too little
| context to do anything else. This is a well-known issue with
| Markov and hidden Markov models.
|
| (Fun fact: some hidden Markov model taggers switch to a
| different trigram distribution for V2 languages after seeing
| the finite verb, otherwise they often fail catastrophically
| in the verb cluster due to the limited context.)
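|
| A tiny demonstration of that point (made-up values): uniform
| choice over a list that contains duplicates already reproduces
| the frequencies.
|
|     import random
|     from collections import Counter
|
|     followers = ["A", "A", "B"]   # A stored twice, B once
|
|     # Uniform choice over the duplicate-containing list...
|     print(Counter(random.choice(followers) for _ in range(10_000)))
|     # ...comes out roughly 2:1, same as an explicit weighted draw:
|     counts = Counter(followers)
|     print(random.choices(list(counts), weights=counts.values(), k=5))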
| jameshart wrote:
| > absolutely no progress towards a "cognitive model of human
| grammatical knowledge"
|
| The thing is, it's only if you buy fully into Chomskyan
| thinking that you think a "cognitive model of human
| grammatical knowledge" might even be useful. Or that it's a
| particularly special thing compared to any other cognitive
| model of human knowledge.
| doubloon wrote:
| I put Alexis de Tocqueville through this and the results were
| legible but hard to read.
|
| Then I put a demagogue politician speech through this and the
| results were almost like a speech that politician would have
| given. It was hard to tell the difference between the original
| and the generated.
|
| In other words, if the public admires word salad, and someone
| speaks in word salad, then word salad output would not be
| considered problematic by a large number of people.
| chuckadams wrote:
| I recall one site way back that used a Markov chain generator
| to mash up Karl Marx and Ayn Rand. Was fairly plausible
| reading, actually.
| nurettin wrote:
| I imagine that's like mixing different kinds of salad.
| hermitcrab wrote:
| I did a markov reworking of some of Trump's speeches and got:
|
| "Hillary brought death and disaster to Iraq, Syria and Libya,
| she empowered Iran, and she unleashed ISIS. Now she wants to
| raise your taxes very substantially. Highest taxed nation in
| the world is a tenant of mine in Manhattan, so many great
| people. These are people that have been stolen, stolen by
| either very stupid politicians ask me the question, how are
| you going to get rid of all the emails?" "Yes, ma'am, they're
| gonna stay in this country blind. My contract with the
| American voter begins with a plan to end government that will
| not protect its people is a government corruption at the
| State Department of Justice is trying as hard as they can to
| protect religious liberty"
|
| Details at:
| https://successfulsoftware.net/2019/04/02/bloviate/
| vorticalbox wrote:
| When the n-gram length is high (10), how do you pick "what" to
| start from?
| hermitcrab wrote:
| IIRC it starts with the same few characters as the source
| text.
| max_ wrote:
| I wonder how this would behave if trained on 1 trillion words
| like the LLMs.
|
| Also, is training a Markov chain cheaper than training neural nets? It
| would be a great way to cut AI costs if they could be made as
| effective as neural nets.
|
| There is also an interesting post by Orange duck. [0]
|
| [0] https://theorangeduck.com/page/17-line-markov-chain
| gattilorenz wrote:
| The only way to make it sound less like a word salad is to
| increase the n-gram length (e.g. use the last 10 words to
| predict the next), but at that point it starts repeating the
| corpus.
|
| I guess technically you can train on a huge corpus like those
| of NNs to mitigate that, but you'll end up with more refined
| word salads then (edit: refined as in "locally coherent but
| globally still a word salad")
| sgt101 wrote:
| well - the lawyers from the NYT called, and they disagree...
| gattilorenz wrote:
| I didn't get the call, so it's not clear what they disagree
| on :)
|
| I doubt they have anything to say about Markov chains. We're
| talking about technical possibilities, not the legality of
| training corpora.
| sgt101 wrote:
| The NNs trained on huge corpora appear to be prone to
| memorization as well.
| danieldk wrote:
| Yes, the problem with such long context lengths is sparsity -
| you still don't have enough data points to make
| representative many-gram distributions. This was exactly the
| motivation for Bengio and others to propose neural language
| models in 2003:
|
| https://dl.acm.org/doi/10.5555/944919.944966
|
| I recommend everyone interested in neural language models
| and/or wondering why we can't scale up Markov models by just
| using longer ngrams to read this paper. It's pretty
| accessible and explains how we got to where we are now in
| 2023.
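|
| A quick way to see the sparsity problem on any corpus (an
| illustration, not from the paper): the fraction of distinct
| n-grams that occur only once climbs rapidly as n grows.
|
|     from collections import Counter
|
|     def singleton_fraction(words, n):
|         # Fraction of distinct n-grams seen exactly once.
|         grams = Counter(tuple(words[i:i+n])
|                         for i in range(len(words) - n + 1))
|         return sum(c == 1 for c in grams.values()) / len(grams)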
| stabbles wrote:
| Yeah, a century-old approach using frequency tables where the
| next word is generated based just on the previous two words is
| highly competitive with contemporary neural nets. To produce
| coherent sentences, do you ever need to remember more than the
| last two words you said? I doubt it.
| CamperBob2 wrote:
| Now ask it to render a deepfake of Noam Chomsky waving his
| hands saying "Nothing to see here."
| bradrn wrote:
| For a more thorough exploration of Markov text generation, I like
| this article: https://zompist.com/markov.html
| alejoar wrote:
| We had a bot that would randomly say things in our IRC channel 15
| years ago that worked like this. You could also mention it to
| prompt it to reply.
|
| Every message was added to its knowledge base and it would say
| random but hilarious stuff made up from all the nonsense we used
| to talk about.
|
| Good times.
| duke_sam wrote:
| For a hack day project someone took our IRC logs to do
| something similar for employees who had left (one bot per
| engineer, small company so a handful of bots). This made for a
| great afternoon until the bots started talking to each other
| and then started triggering the simple command the ops bot
| would accept. It quickly got shut down.
| Lukkaroinen wrote:
| I wrote a bot like this. It was fun to play around with.
| bhattid wrote:
| Out of curiosity, was the bot based on the bMotion repository?
| https://github.com/jamesoff/bmotion
|
| I remember a friend of mine setting up an IRC bot (named Zeta)
| like that for his sheet music forum many years ago. She was
| involved in a lot of hilarity - probably my favorite antics
| were when she randomly decided to court-martial someone. Good
| times indeed! :)
| seabass-labrax wrote:
| > she randomly decided to court-martial someone
|
| Well, you can't stave off fate - you've got to face the music
| eventually.
| ZetaH wrote:
| I was doing a similar experiment recently to generate random
| names that sound like names from a specific language.
|
| I was breaking the list of names apart into 3 and 2 letter parts,
| marking which fragments are from the start, middle and end.
|
| To generate the words I started from a random one from the start
| fragments, then continued with a random one from the middle
| fragments that starts with the letter that the previous one ended
| with, and similarly ended it with one from the end fragments.
| Some examples:
|
| Spanish Names:
|
| Armusa Vantara Modria
|
| German Names:
|
| Ven Marwar, Ger Naroff, Vort Kraldent, Gorn Henter, Urg Wicher,
| Wan Ehranus, Eck Hayazin, Wert Biewin, Rein Relberid,
|
| Catalan:
|
| Pallava Ecorus Sangana Ginavari Telamita Exorxio
|
| Hungarian cities:
|
| Joszog Alszeny Hernafo Garnaza Ragytur Hidacska Mezecs
|
| (edit formatting)
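|
| A rough sketch of the fragment-chaining idea described above; the
| fragment sizes and the letter-overlap rule are my guesses at the
| details:
|
|     import random
|
|     def fragments(names):
|         # Split each (lowercase) name into start/middle/end pieces.
|         starts  = [n[:3] for n in names if len(n) >= 5]
|         middles = [n[2:-2] for n in names if len(n) >= 6]
|         ends    = [n[-3:] for n in names if len(n) >= 5]
|         return starts, middles, ends
|
|     def make_name(starts, middles, ends, tries=200):
|         # Chain pieces so each fragment begins with the letter the
|         # previous one ended on; the shared letter is written once.
|         for _ in range(tries):
|             s = random.choice(starts)
|             mids = [m for m in middles if m.startswith(s[-1])]
|             if not mids:
|                 continue
|             m = random.choice(mids)
|             fins = [e for e in ends if e.startswith(m[-1])]
|             if fins:
|                 return (s + m[1:] + random.choice(fins)[1:]).capitalize()
|         return None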
| jokoon wrote:
| I remember finding a set of english names generated with a
| markov chain. I wish I knew how to make it; they sounded good.
| pavel_lishin wrote:
| > _I wish I knew how to make it_
|
| You can learn! I know it's easy to say something is easy to a
| beginner, but figuring out Markov chains is truly something
| you can get the basics of over a weekend if you've ever
| written any software at all.
| weare138 wrote:
| https://en.wikipedia.org/wiki/MegaHAL
| forinti wrote:
| I did this once with Perl and a book called Eu by the Brazilian
| poet Augusto dos Anjos. It spat out verses and I joined a few to
| create this little thing:
|
| Andam monstros sombrios pela escuridão dos remorsos
| (Sombre monsters walk through the darkness of remorse)
|
| Pairando acima dos transeuntes
| (Hovering above the passers-by)
|
| Maldito seja o gênero humano
| (Cursed be the human race)
|
| Prostituído talvez em desintegrações maravilhosas
| (Prostituted perhaps in marvelous disintegrations)
|
| Source code: https://alquerubim.blogspot.com/2018/01/gerador-de-
| augusto-d...
| Frummy wrote:
| As a group project we did Markov chains with some custom
| algorithmic adjustments, scraping Reddit via the API and an
| academic torrent (that data required a lot of cleaning, was orders
| of magnitude bigger, and was run separately), to simulate posts
| and comments. In the same project we also implemented a Variable
| Length Markov Chain with a tree-based custom context length,
| although a team member also did a matrix-based custom-context-
| length implementation through some matrix magic.
| Frummy wrote:
| Genuinely curious to hear what's wrong with what I thought was a
| cool, fun time implementing Markov chains. We learned a lot about
| statistics and linear algebra, matrix diagonalisation and
| exponentiation. I find it very relevant and can answer questions
| if you don't understand.
| mebassett wrote:
| This is old tech, but it had me thinking. Markov chains are
| picking the next token from a random set and they are giving
| approximately all possible tokens (from the training set) an
| equal probability. What if it weighted the probability - say
| using the inverse vector distance or cosine similarity of the
| neighbors as a proxy for probability, where the vector embedding
| came from word2vec...how close would the performance be to a
| transformer model or even something like LSTM RNNs? (I suppose
| I'm cheating a bit using word2vec here. I might as well just say
| I'm using the attention mechanism...)
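|
| A back-of-the-envelope sketch of that weighting idea, assuming
| some pre-computed embeddings ("embeddings" here is a hypothetical
| dict from word to vector, e.g. loaded from word2vec):
|
|     import numpy as np
|
|     def cosine(u, v):
|         return float(np.dot(u, v) /
|                      (np.linalg.norm(u) * np.linalg.norm(v)))
|
|     def weighted_next(prev_word, candidates, embeddings):
|         # Score candidates by cosine similarity to the previous
|         # word, shift to non-negative, and sample proportionally.
|         sims = np.array([cosine(embeddings[prev_word], embeddings[c])
|                          for c in candidates])
|         weights = sims - sims.min() + 1e-6
|         return np.random.default_rng().choice(
|             candidates, p=weights / weights.sum())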
| pona-a wrote:
| That sounds interesting, but still fails to capture long-range
| dependencies.
|
| If I understand correctly, what you're proposing is to replace
| co-occurrence frequency with word2vec cosine similarity.
|
| I suppose it may help improve overall performance, but you're
| still just blindly predicting the next word based on the previous
| one, like a first-order Markov chain would.
|
| For example, it won't ever fit "2 plus 2 equals 4," because
| right when we get to equals, we discard all the previous words.
|
| Perhaps if we could get the embedding model to consider the
| full sentence and then produce a set of probability-scored next
| token predictions it may work, but now we've just reinvented a
| transformer.
| csmpltn wrote:
| > "I suppose it may help improve overall performance, you're
| still just blindly predicting the next word based on the
| previous one like a first order Markov chain would."
|
| Instead of only taking in the last "token" as context to the
| function that generates the next token, take the last 15
| tokens (i.e. the last 2-3 sentences) and predict based on
| that. And that's your "attention" mechanism.
| terminous wrote:
| Yeah, that's just attention with extra steps
| kwhitefoot wrote:
| > using Lewis Carroll's Alice in Wonderland as input. The book is
| more or less nonsense to begin with,
|
| Has he ever read Alice?
| ltr_ wrote:
| When I read about Markov chains and LLM hallucinations, the first
| thing that comes to my mind is Reggie Watts's performances [0]. It
| also reminds me that I used to do the same thing when I was bored
| in the classroom, but in my paper notebooks: just nonsense that
| seems plausible. It was like entering a meditative state; it felt
| good. When I discovered Watts I thought to myself, 'What?! I could
| have done that for a living?'
|
| [0] https://www.youtube.com/watch?v=UXyHf_SpUUI
| samsquire wrote:
| I had an idea.
|
| What if you trained a Markov chain and got weights for every pair,
| or sequence of pairs, up to a certain length?
|
| Could you also do rotations with this information in vector
| space? With multiple degrees of freedom?
|
| What if you could do inverse kinematics with markov chains?
|
| Like robot planning for a goal?
| dahart wrote:
| See also Mark V Shaney
| https://en.wikipedia.org/wiki/Mark_V._Shaney
|
| This was a class assignment in college, we had a lot of fun with
| it, and one of my classmates, the brilliant Alyosha Efros,
| decided to apply the exact same technique to images instead of
| text. It turned into a paper that revitalized texture synthesis
| in the Siggraph community. The most interesting part about it (in
| my opinion) is that he ran it on images of text, and it produces
| images of readable text nonsense! With the right window size,
| perhaps there's a nonzero possibility of being able to produce
| the same text either way. This always struck me as very meta and
| makes me wonder if there are ways we could go in reverse with
| image processing (or other types of data); if there's a latent
| underlying representation for the information contained in image
| content that is much smaller and more efficient to work with.
| Neural networks might or might not be providing evidence.
|
| https://people.eecs.berkeley.edu/~efros/research/EfrosLeung....
| fluoridation wrote:
| That's amazing! It's a literal uncrop. I'll have to look into
| this later.
| AndruLuvisi wrote:
| The first edition of Programming Perl had a script that did this
| called travesty. You can download it from
| https://resources.oreilly.com/examples/9780937175644
|
| When I was learning Perl in the '90s, I remember having a lot of
| fun running this on different manual pages.
| chuckadams wrote:
| It's also built into Emacs: I believe it's called
| dissociated-press.
| heikkilevanto wrote:
| I remember as a school boy I heard about Markov Chains, and
| played a bit with them. Instead of building big n-gram tables, I
| found a simpler way. Start with one n-gram, then scan the source
| text until you find the next occurrence of it, and take the next
| letter after it. Output it, and append it to your n-gram,
| dropping its first letter, and repeat until you have a sufficient
| amount of nonsense text.
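|
| A minimal sketch of that table-free approach (parameter names and
| defaults are mine):
|
|     import random
|
|     def scan_generate(source, n=6, length=300):
|         # Start with one n-gram; find its next occurrence in the
|         # source, emit the character that follows it, append that
|         # character to the n-gram and drop the n-gram's first letter.
|         pos = random.randrange(len(source) - n)
|         gram = source[pos:pos + n]
|         out = gram
|         while len(out) < length:
|             hit = source.find(gram, pos + 1)
|             if hit == -1:
|                 hit = source.find(gram)   # wrap back to the start
|             if hit == -1 or hit + n >= len(source):
|                 break
|             nxt = source[hit + n]
|             out += nxt
|             gram = gram[1:] + nxt
|             pos = hit
|         return out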
| mdberkey wrote:
| One of my first personal projects was making 2 discord bots that
| would have an infinite "conversation" using Markov chains.
| Feeding them text from Reddit produced some hilarious results.
| hermitcrab wrote:
| I created a little program in C++/Qt for using Markov chains to
| re-write text. But it works on character sequences, rather than
| words. There are examples and binaries for Windows and Mac:
|
| https://successfulsoftware.net/2019/04/02/bloviate/
| qarl wrote:
| https://archive.org/details/ComputerRecreationsMarkovChainer
| swayvil wrote:
| Plausible nonsense of sufficient complexity and consistency as to
| seem real. It's trivial to generate.
|
| It casts every cherished truth into doubt.
___________________________________________________________________
(page generated 2024-01-01 23:01 UTC)