[HN Gopher] Markov chains are funnier than LLMs
       ___________________________________________________________________
        
       Markov chains are funnier than LLMs
        
       Author : todsacerdoti
       Score  : 485 points
       Date   : 2024-08-18 22:52 UTC (1 days ago)
        
 (HTM) web link (emnudge.dev)
 (TXT) w3m dump (emnudge.dev)
        
       | ggm wrote:
       | https://en.wikipedia.org/wiki/Mark_V._Shaney
        
         | emmelaich wrote:
         | The Policeman's Beard is Half-Constructed.
         | 
         | https://archive.org/details/policemansbeardi0000unse
        
       | superkuh wrote:
       | I've been running IRC markov chain bots for ~20 years. In the
       | last couple years I've been running local LLM alongside them.
       | While there are people that still prefer the markov chain bots
       | the majority invoke the LLMs. But maybe that's because I picked
       | the most hallucinate'y, least refusing, most funny models I can
       | (like mistral-7B fine tunes) instead of something smart, locked-
       | down, and low temperature like chatgpt's LLM as a service.
       | 
       | If anything seeing the LLM and markov bots side by side has
       | really reinforced how much of the markov bot "humor" is human
       | perception imposed on chance outputs. The markov's "learning"
       | ability is still far superior though.
        
         | kdbg wrote:
          | curious what type of prompting you do on the LLM?
         | 
          | I run a markov chain bot in a Twitch chat; it has some
          | great moments. I tried using an LLM for a while and would
          | include recent chat in the prompting, but never really got
          | results that came across as terribly humorous. I could
          | prompt-engineer a bit to tell it some specifics about the
          | types of jokes to build, but the LLM just tended to always
          | follow the same format.
        
           | superkuh wrote:
           | I'm actually not following the model's fine-tuned/desired
           | prompt at all. I am operating in purely pattern completion
            | mode. The first text the LLM sees is alternating lines
            | of input and response examples that look like what it
            | will be getting from the IRC client front end, written
            | in the tone I want it to respond in, and giving some
            | information about itself.
           | Then I just tack the IRC chat history+input onto those
           | example chat pre-prompt lines. Nothing but single lines and
           | newlines with newline as a stop token. No instructions,
           | nothing meta or system or the like.
           | 
           | But that's also configurable by users. They can invoke any
           | pre-prompt they want by a command passing a URL with a .txt
           | file.
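            | 
            | Roughly, a minimal sketch of this kind of completion-mode
            | prompting, assuming a local llama-cpp-python stack (the
            | actual stack isn't specified above); the example lines,
            | model path, and sampling parameters are placeholders:
            | 
            |     # Completion-mode IRC prompting: example dialogue
            |     # lines, then the raw chat history, with newline as
            |     # the stop token. No system/instruction formatting.
            |     from llama_cpp import Llama
            | 
            |     preprompt = (
            |         "<alice> who are you?\n"
            |         "<bot> just a bot that lives in this channel\n"
            |         "<alice> say something weird\n"
            |         "<bot> the toaster told me to stay quiet today\n"
            |     )
            | 
            |     llm = Llama(model_path="mistral-7b-finetune.gguf")
            | 
            |     def reply(history, line):
            |         prompt = (preprompt + "\n".join(history)
            |                   + "\n" + line + "\n<bot>")
            |         out = llm(prompt, max_tokens=60, stop=["\n"],
            |                   temperature=1.1)
            |         return out["choices"][0]["text"].strip()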
        
         | ghewgill wrote:
         | > I've been running IRC markov chain bots for ~20 years.
         | 
         | You are my hero. Mine have never lasted that long. One fun
          | thing I did once was scrape users' livejournals and generate
         | random text from them
         | (https://hewgill.com/journal/entries/68-new-lj-toy.html).
        
         | Loughla wrote:
         | Why are you doing that? For fun or something else I'm missing?
        
           | aidenn0 wrote:
           | I'm not GP, but it's probably not for fun; there's a lot of
           | money in IRC Markov chain bots, but it's too cutthroat of an
           | industry for a hobbyist to last.
        
             | Loughla wrote:
             | How is there money in that?
        
               | superkuh wrote:
               | Pretty sure he's joking around. And yes, I run the bots
               | purely for recreation in some long running IRC
               | communities.
        
       | AdieuToLogic wrote:
       | > Asking an LLM for an "original thought" is almost oxymoronic,
       | if not just moronic. It was built with the express purpose of not
       | doing that.
       | 
       | What a wonderful insight.
        
         | make3 wrote:
         | also, not really true, right, even though it sounds
         | intellectual and strong to say. these algorithms are trained to
         | generalize as best as they can to unseen text, and most often
         | don't ever see any data point twice, except for data that has
          | accidentally not been filtered out. it's totally possible
          | that they get reasoning abilities that generalize well.
        
           | lossolo wrote:
           | Generalize over their training data--they cannot generalize
           | out of distribution. If they could, they would have already
           | solved most human problems. So no, they do not generalize on
           | unseen text. They will produce what is most statistically
           | probable based on their training data. Things that are still
           | unknown and statistically improbable based on our current
           | knowledge are out of reach for LLMs based on transformers.
        
             | make3 wrote:
             | "generalize to its dataset" is a contradiction, especially
             | as these models are trained in the one epoch regimen on
             | datasets of the scale of all of the internet. if you think
             | being able to generalize in ways similar to the whole of
             | the internet does not give your meaningful abilities to
             | reason, I'm not sure what I can tell you
        
               | lossolo wrote:
               | > "generalize to its dataset" is a contradiction
               | 
               | Not "to" but over, example the same code written in one
               | language over the other language.
               | 
               | > if you think being able to generalize in ways similar
               | to the whole of the internet does not give your
               | meaningful abilities to reason, I'm not sure what I can
               | tell you
               | 
                | If, after reading the papers below that show
                | empirically that they can't reason, you still think
                | they can reason, then I don't know what I can tell
                | you.
               | 
               | https://arxiv.org/abs/2311.00871
               | 
               | https://arxiv.org/abs/2309.13638
               | 
               | https://arxiv.org/abs/2311.09247
               | 
               | https://arxiv.org/abs/2305.18654
               | 
               | https://arxiv.org/abs/2309.01809
        
             | totetsu wrote:
              | Couldn't they turn up new, as-yet-unknown things, if
              | those things are statistically probable given the
              | training data?
        
               | lossolo wrote:
               | No, none of the Millennium Problems or other math
               | problems (unsolved by humans for decades or centuries)
               | have been solved solely by LLMs, even though they possess
               | all the knowledge in the world.
        
             | vidarh wrote:
              | You can get them to solve unseen problems just fine.
              | One example: Specify a grammar in BNF notation and
              | tell it to generate or parse sentences for you. You
              | can produce a more than random enough grammar that it
              | can't have derived the parsing of it from past text,
              | but it necessarily reasons about BNF notation
              | sufficiently well to be able to use it to deduce the
              | grammar, and use that to parse subsequent sentences.
              | You can have it analyse them and tag them according
              | to the grammar too. And generate sentences.
             | 
              | My impression, from seeing quite a few people trying
              | to demonstrate they can't handle out-of-distribution
              | problems, is that people are _very_ predictable about
              | how they go about this, and tend to pick well known
              | problems that are likely to be overrepresented in the
              | training set, and then tweak them a bit.
             | 
              | At least in one instance the other day, what I got
              | from GPT when I tried to replicate it suggests to me
              | it did the same thing humans who have seen these
              | problems before do, and carelessly failed to "pay
              | attention" because it fit a well known template it's
              | been exposed to a lot in training. After it answered
              | wrong, it was sufficient to ask it to "review the
              | question and answer again" for it to spot the mistake
              | and correct itself.
             | 
              | I'm sure that won't work for every problem of this
              | sort, but the quality of tests people do on LLMs is
              | really awful, not least because people tend to do very
              | narrow tests like that and make broad pronouncements
              | about what LLMs "can't" do based on it.
        
               | lossolo wrote:
               | > You can get them to solve unseen problems just fine
               | 
                | Prove that the problem wasn't seen by them in
                | another form.
               | 
                | > Specify a grammar in BNF notation and tell it to
                | generate or parse sentences for you. You can produce
                | a more than random enough grammar that it can't have
                | derived the parsing of it from past text, but it
                | necessarily reasons about BNF notation sufficiently
                | well to be able to use it to deduce the grammar, and
                | use that to parse subsequent sentences. You can have
                | it analyse them and tag them according to the
                | grammar too. And generate sentences.
               | 
                | Oh, come on. It's like rewriting the same program in
                | another programming language with different
                | variables. What it can't do is create the concept of
                | a programming language; I'm not talking about a new
                | programming language, I'm talking about the
                | concepts.
               | 
                | > I'm sure that won't work for every problem of this
                | sort, but the quality of tests people do on LLMs is
                | really awful, not least because people tend to do
                | very narrow tests like that and make broad
                | pronouncements about what LLMs "can't" do based on
                | it.
               | 
               | Here, a few papers that show they can't reason:
               | 
               | https://arxiv.org/abs/2311.00871
               | 
               | https://arxiv.org/abs/2309.13638
               | 
               | https://arxiv.org/abs/2311.09247
               | 
               | https://arxiv.org/abs/2305.18654
               | 
               | https://arxiv.org/abs/2309.01809
        
               | og_kalu wrote:
               | >It's like rewriting the same program in another
               | programming language with different variables.
               | 
               | Since when has that not required reasoning ? It's really
               | funny seeing people bend over backwards to exclude LLMs
               | from some imaginary "real reasoning" they imagine they
               | are solely privy to. It's really obvious this is
               | happening when they leave well defined criteria and
               | branch into vague, ill-defined statements. What exactly
               | do you mean by concepts ? Can you engineer some test to
               | demonstrate what you're talking about ?
               | 
               | Also, none of those papers show LLMs can't reason.
        
               | lossolo wrote:
               | You clearly didn't read any of these papers. Quote from
               | one of them
               | 
               | "Our results support the hypothesis that GPT-4, perhaps
                | the most capable "general" LLM currently available, is
               | still not able to robustly form abstractions and reason
               | about basic core concepts in contexts not previously seen
               | in its training data"
               | 
               | Another, recent, good one
               | https://arxiv.org/abs/2407.03321
               | 
               | EDIT: For people who don't want to read the papers, here
               | is a blog post that explains what I'm arguing in more
               | accessible terms https://cacm.acm.org/blogcacm/can-llms-
               | really-reason-and-pla...
        
               | og_kalu wrote:
               | Oh i've read them. The claim doesn't match up to reality.
               | It's as simple as that. You can claim anything you want
               | to.
               | 
               | https://arxiv.org/abs/2305.18354
               | 
                | What all these papers you keep linking do, at best,
                | is point out the shortcomings of current
                | state-of-the-art LLMs. They do not in any way
                | disprove their ability to reason. I
               | don't know when the word reason started having different
               | standards for humans and machines but i don't care for
               | it. Either your definition of reasoning also allows for
               | the faulty kind humans display or humans don't reason
               | either. You can't have your cake and eat it.
        
               | lossolo wrote:
               | > Oh i've read them.
               | 
                | It's hard to believe that, after reading all the
                | papers and the blog I linked, along with the
                | references there, any reasonable person would come
                | to such strong conclusions as you did. This makes it
                | hard for me to believe that you actually read all of
                | them, especially given your previous questions and
                | comments, which are addressed in those papers;
                | someone who had actually read them wouldn't make
                | such comments or ask such questions. And the
                | funniest thing, and further proof of this, is that
                | you linked a paper that is addressed in one of the
                | papers I shared. It seems like it's not only LLMs
                | that can fake things.
               | 
               | > All these papers you keep linking do is at best point
               | out the shortcomings of current state of the art LLMs
               | 
               | They clearly show that they fake reasoning, and what they
               | do is an advanced version of retrieval. Their claims are
               | supported by evidence. What you call "shortcomings" are
               | actually proof that they do not reason as humans do. It
               | seems like your version of "reality" doesn't match
               | reality.
        
               | og_kalu wrote:
               | The paper i linked is not addressed by the paper you
               | linked. The paper you linked attempts to give LLMs the
               | same benchmarks in a format they aren't best suited for.
               | I don't know how you can call that "addressed".
               | 
               | >They clearly show that they fake reasoning
               | 
                | Sure, and planes are fake flying. The elusive "fake
                | reasoning" that is so apparently obvious and yet
                | does not seem to have a testable definition that
                | excludes humans.
               | 
               | You've still not explained how writing the same program
               | in different languages doesn't require reasoning or how
               | we can test your "correct" version of reasoning which
               | requires "concepts".
        
               | lossolo wrote:
               | > The paper i linked is not addressed by the paper you
               | linked. The paper you linked attempts to give LLMs the
               | same benchmarks in a format they aren't best suited for.
               | I don't know how you can call that "addressed".
               | 
                | What you're writing now is nonsense in the context
                | of what I
               | wrote. Once again, you're showing that you didn't read
               | the papers. Which paper are you even referring to now,
               | the one you think addresses the paper you linked?
               | 
               | > You've still not explained how writing the same program
               | in different languages doesn't require reasoning or how
               | we can test your "correct" version of reasoning which
               | requires "concepts".
               | 
               | "Concepts" are explained in one of the papers I linked,
                | which you would know if you had actually read them.
                | As for programming languages, they learn to identify
                | common structures and idioms across languages. This
                | allows them to map patterns (latent space
                | representations, duh!) from
               | one language to another without reasoning about the
               | underlying logic. When translating code, the model
               | doesn't reason about the program's logic but predicts the
               | most likely equivalent constructs in the target language
               | based on the surrounding context. LLMs don't truly
               | "understand" the semantics or purpose of the code they're
               | translating. They operate on a superficial level,
               | matching patterns and structures without grasping the
               | underlying computational logic. The translation process
               | for an LLM is a series of token-level transformations
               | guided by learned probabilities, not a reasoned
               | reinterpretation of the program's logic. They don't have
               | an internal execution model or ability to "run" the code
               | mentally. They perform translations based on learned
               | patterns, not by simulating the program's behavior. The
               | training objective of LLMs is to predict the next token,
               | not to understand or reason about program semantics. This
               | approach doesn't require or develop reasoning
               | capabilities.
        
               | og_kalu wrote:
               | You are making a lot of assumptions that are mostly
               | wrong.
               | 
               | Case in point:
               | 
               | https://arxiv.org/abs/2305.11169
               | 
               | I'm asking for something testable, not some post-hoc
               | rationalization you believe to be true.
               | 
               | I'm not asking you to tell me how you think LLMs work.
               | I'm asking you to define "real reasoning" such that i can
               | test people and LLMs for it and distinguish "real
               | reasoning" from "fake reasoning".
               | 
               | This definition should include all humans while excluding
               | all LLMs. If it cannot, then it's just an arbitrary
               | distinction.
        
               | HAL3000 wrote:
               | It appears that you are the only person in this
               | discussion making many incorrect assumptions. Based on
               | your comments, I would assume you are actually googling
               | those papers based on their abstracts. Your last linked
               | paper has flawed methodology for what it attempts to
               | demonstrate, as shown in this paper:
               | https://arxiv.org/pdf/2307.02477 The tests you're
               | requesting are provided within the previously linked
               | papers. I'm not sure what you want. Do you expect people
               | to copy and paste entire papers here that show
               | methodology and describe experiments? You wrote, "I'm
               | asking you to define 'real reasoning'," which is actually
               | defined in the blog post linked earlier in this
               | discussion. In fact, the entire blog post is about this
               | topic. It appears that you are not thoroughly reading the
               | material. Your replies resemble those of a human
               | stochastic parrot.
        
               | og_kalu wrote:
               | >Your last linked paper has flawed methodology for what
               | it attempts to demonstrate, as shown in this paper:
               | https://arxiv.org/pdf/2307.02477
               | 
                | Genuinely, what's wrong with the methodology?
               | 
               | Your paper literally admits humans would also perform
                | worse at counterfactuals. Worse than an LLM? Maybe
                | not, but it never bothers to test this so...
               | 
               | The problem here is that none of the definitions (those
               | that are testable) so far given actually separate humans
               | from LLMs. They're all tests some humans would also
               | flounder at or that LLMs perform far greater than chance
               | at, if below some human's level.
               | 
               | If you're going to say, "LLMs don't do real reasoning
               | because of x" then x better be something all humans clear
               | if what humans do is "real reasoning".
               | 
               | Humans perform worse at counterfactuals so saying "Hey,
                | see this paper that shows LLMs doing the same, it means
               | they don't reason" is a logical fallacy if you don't
               | extend that conclusion to humans as well.
        
         | eru wrote:
         | I can't tell whether you are sarcastic?
         | 
         | I assume you are, because that makes more sense.
         | 
         | It's really easy to get lots and lots of originality. Just
         | crank up the randomness. What's harder is to get something
         | that's good and original.
        
           | vidarh wrote:
            | That's hard for humans too. Perhaps if one tries to
            | intersperse an "internal monologue" (with the temperature
            | cranked up) with the main output, prompting the LLM to
            | consider and weave in aspects from the inner monologue
            | when continuing, while taking care to make it coherent...
            | (I'm sure it'd be harder than that - but basically inject
            | random internal "thoughts" without letting them _control_
            | the output)
        
             | throwthrowuknow wrote:
             | That's actually a pretty good idea
        
       | complaintdept wrote:
       | My favorite markov experiment was feeding it the works of
       | Nietzsche from Project Gutenberg. Really brought out the late
       | stage syphilis, but was still very distinctly Nietzsche.
        
         | techjamie wrote:
         | I love messing with Markov chains, and before Reddit dunked
         | their API, I would feed my Markov chain comments from NSFW
         | subreddits and political subreddits at different sample sizes.
         | I would spend ages just watching it whiplash about some
         | political topic and sampling thirst comments.
        
           | duskwuff wrote:
           | The live feeds are all dead now, but you can still download
           | archives of Reddit posts/comments from before they locked
           | down the API:
           | 
           | https://academictorrents.com/details/9c263fc85366c1ef8f5bb9d.
           | ..
        
       | canjobear wrote:
       | "Talk to Transformer" GPT-2 was very funny, then GPT-3 was not
       | funny at all. We blew through the uncanny valley in a couple of
       | years.
        
       | hatthew wrote:
       | I've posted a few "fake XYZ written by AI" on reddit over the
       | years, and the model that got the best reactions was GPT-2.
       | Markov chains aren't good enough to make something that is
       | interesting for more than a sentence or two, and GPT-3 onwards is
       | too clean and boring. GPT-2 is the perfect middle ground that can
       | get grammar largely correct and maintain a cohesive idea, but
       | doesn't yet know enough of the specifics of various topics to
       | make things that make sense in the context of that topic.
        
         | metadat wrote:
         | Can you share any of the best examples?
        
           | PawgerZ wrote:
           | https://www.reddit.com/r/SubSimulatorGPT2/top/?t=all
           | 
            | These are all GPT-2 generations trained on Reddit data.
           | 
           | https://www.reddit.com/r/SubSimulatorGPT2/comments/btfhks/wh.
           | ..
           | 
           | Here's the subreddit explained
        
         | Sohcahtoa82 wrote:
         | I used my 15+ years of IRC logs to fine-tune a GPT-2 model to
         | try to imitate me. My plan was to deploy a bot in my regular
         | IRC channel and see how long it took for people to notice it
         | was a bot. Any time someone would send a message, it would send
         | the last 10 messages to the LLM, and if the result started with
         | <Sohcahtoa>, then I would send that message to the channel.
         | 
         | Unfortunately, I quickly found GPT-2 isn't nearly good enough.
         | It would generate slightly-coherent yet on-topic nonsense.
         | 
         | Once I overhaul my system, I'll try fine-tuning a 7B model.
        
           | loa_in_ wrote:
            | I'm pretty sure you could make it work if you expanded
            | the context to a hundred or so messages.
        
             | Sohcahtoa82 wrote:
             | With the 7B parameter models, sure.
             | 
             | Not with GPT-2 though. The context window is only 1024
             | tokens. Even with only 10 messages, if they're long
             | messages, it will exceed the context window.
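              | 
              | For what it's worth, a rough sketch of trimming the
              | history to GPT-2's 1024-token window with the Hugging
              | Face tokenizer (the budget/reserve split is arbitrary):
              | 
              |     # Keep only as many recent messages as fit in
              |     # GPT-2's 1024-token window, leaving room for
              |     # the generated reply.
              |     from transformers import GPT2TokenizerFast
              | 
              |     tok = GPT2TokenizerFast.from_pretrained("gpt2")
              | 
              |     def trim(messages, budget=1024, reserve=128):
              |         kept, used = [], 0
              |         for msg in reversed(messages):  # newest first
              |             n = len(tok.encode(msg + "\n"))
              |             if used + n > budget - reserve:
              |                 break
              |             kept.append(msg)
              |             used += n
              |         return "\n".join(reversed(kept))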
        
       | bigyikes wrote:
       | Turn up the temperature?
       | 
       | Markov chains have a cruder understanding of language.
       | 
       | Turn up the temperature (the "randomness") of an LLM and you can
       | achieve a similarly crude approximation.
       | 
        | Further, the author uses ChatGPT-3.5. ChatGPT has been
        | rlhf'd to sound as generic as possible, and 3.5 has a worse
        | understanding of humor compared to 4.
       | 
       | I don't buy the thesis of this article.
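        | 
        | A toy illustration of what "turning up the temperature"
        | means: dividing the logits by a larger temperature flattens
        | the softmax, so low-probability (often weirder) tokens get
        | sampled more often. The logits below are made up.
        | 
        |     # Higher temperature flattens the distribution, so
        |     # unlikely tokens are sampled more often.
        |     import numpy as np
        | 
        |     logits = np.array([5.0, 2.0, 1.0, 0.5])  # 4 candidate tokens
        | 
        |     def softmax_with_temperature(logits, temperature):
        |         z = logits / temperature
        |         e = np.exp(z - z.max())
        |         return e / e.sum()
        | 
        |     print(softmax_with_temperature(logits, 0.7))  # peaked
        |     print(softmax_with_temperature(logits, 2.0))  # flatter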
        
         | dartos wrote:
         | I wish there was a locked chatgpt version for researchers
        
           | eru wrote:
           | Some of the open-weights models come close to what you might
           | want?
        
             | dartos wrote:
              | Sort of. They fill the "locked" requirement, but open
              | models aren't SOTA like GPT or Claude.
        
               | coder543 wrote:
               | Llama3.1-405B benchmarks at a level that I would
               | certainly consider comparable to other SOTA models, as
               | long as multimodality isn't needed.
               | 
               | Regardless, OpenAI provides access to quite a few of
               | their older models through the API, since the API lets
               | you pass in a specific model version. I'm sure the older
               | models won't be available forever, but that is a much
               | more stable target for researchers than just opening the
               | ChatGPT website and typing in things.
        
               | dartos wrote:
               | ChatGPT specifically is more than just a chat interface
                | on top of GPT-4.
               | 
               | Their system prompt includes the current date and time
               | among other information, making it very very hard to run
               | reproducible experiments against it.
               | 
               | But it's the tool most people are using.
        
               | Reubend wrote:
               | LLAMA 3.1 405b-Instruct is state of the art at the time
               | of writing. It averages slightly worse than ChatGPT 4o,
               | but to a mostly negligible extent for most research.
        
               | toxik wrote:
               | Aside but -- it is also aligned to not generate profane
               | text, right?
        
               | throwthrowuknow wrote:
               | Abliterate it
        
           | rck wrote:
           | You could play with open models and different temperatures
           | using the lmgen script from this repo we just released:
           | 
           | https://github.com/RichardKelley/hflm?tab=readme-ov-
           | file#lmg...
        
           | jijji wrote:
            | there sort of is: if you install ollama (https://ollama.com)
            | and then execute "ollama run llama2-uncensored", it will
            | install and run the local chat interface for llama2 in an
            | uncensored version, which gives slightly better results
            | with fewer guardrails. Same with wizardlm-uncensored and
            | wizard-vicuna-uncensored. For reference, for the SOTA
            | version with guardrails you could run "ollama run
            | llama3.1:405b".
        
         | ziddoap wrote:
         | > _rlhf 'd_
         | 
         | For those of us not in the know about all the various machine
         | learning acronyms:
         | 
         | RLHF = Reinforcement learning from human feedback
        
           | DaiPlusPlus wrote:
           | (Somewhat off-topic)
           | 
           | When GPT went public along with OpenAI's articles and papers
            | back in late-2022 through 2023, my impression was that
            | OpenAI wanted us all to see/read about RLHF. It felt odd
            | because surely the whole LLM-thing (e.g. how does it even
            | work?!?[1]) was the far bigger research story than just
            | constant reassurances it won't end up like MSFT's Tay
            | bot; my understanding is that, as research or secret
            | sauce, RLHF, compared to the core meat-and-potatoes of
            | LLMs, is an ugly-hack afterthought.
           | 
           | By-way of a bad analogy: it's as if they created a
           | fantastical new 3D world game engine, like Unreal or Unity,
           | which has a fundamentally different architecture to anything
           | before, but has a bug that occasionally replaces ground
           | terrain with ocean - and their solution to this is to write a
           | pixel-shader that detects this and color-shifts blue into
           | green so people don't notice - and they then put-out press-
           | releases about how great their pixel-shader is - rather than
           | about the rest of the engine - and no-one seems to be talking
           | about the underlying bug, let alone fixing it.
           | 
           | ---------
           | 
           | [1] I still haven't heard a decent explanation of how feeding
           | the world's corpus of English text (and computer program
           | code) into a statistical-modeller results in something that
           | can perform almost any information-processing task via
           | instructions input as natural-language.
        
             | treyd wrote:
             | Yeah that analogy is fairly poor. You have to think about
             | it in terms of some probability distribution that each step
             | of the model is sampling from. Out of the distribution of
             | all text, find the top n values for the next token that
             | maximize `P(next | prefix)`, which is done efficiently
             | through making a vector embedding to encode the tokens
             | inside the statistical model.
             | 
             | Things that look like Q-and-A transcripts do exist in the
             | training set, think interviews, books, stage plays, etc,
             | and at a different layer of abstraction the rules of
             | English text _in general_ are very well represented. What
             | RLHF is doing is slightly shifting the shape of the
             | probability distribution to make it look more like the
              | Q-and-A formats that are desired. They build a large
              | dataset with human tagging to collect samples of good
              | and bad outputs, and use reinforcement learning
              | techniques to generate outputs that look _more like_
              | the good examples and _less like_ the bad ones.
             | 
             | This probably involves creating a (much smaller, not-LLM)
             | model that is trained to discriminate good outputs and bad
              | outputs, learning to mimic the human tagging. Some
              | papers on this have been published.
             | 
             | Here's one article from Huggingface:
             | https://huggingface.co/blog/rlhf
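              | 
              | A small sketch of that next-token view, using the
              | public GPT-2 checkpoint from Hugging Face: score every
              | candidate next token given a prefix and keep the top
              | few. The prefix here is arbitrary.
              | 
              |     # Top next-token candidates for P(next | prefix).
              |     import torch
              |     from transformers import GPT2LMHeadModel
              |     from transformers import GPT2TokenizerFast
              | 
              |     tok = GPT2TokenizerFast.from_pretrained("gpt2")
              |     model = GPT2LMHeadModel.from_pretrained("gpt2")
              | 
              |     enc = tok("The bartender says", return_tensors="pt")
              |     with torch.no_grad():
              |         logits = model(**enc).logits[0, -1]
              |     probs = torch.softmax(logits, dim=-1)
              |     top = torch.topk(probs, 5)
              |     for p, i in zip(top.values, top.indices):
              |         print(tok.decode(int(i)), float(p))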
        
         | dankwizard wrote:
         | you have no sense of humor
        
       | dgacmu wrote:
       | Some friends of mine in college ran a markov chain generator on
       | the "police reports" section of our college newspaper (1). The
       | best 10% of the results, from a three-token generator, were some
       | of the funniest machine-generated text I've ever seen -- it had
       | an absurdity that modern LLMs seem to avoid due to making higher-
       | level semantic sense.
       | 
       | It probably also helped that there was a creep exposing himself
       | in the library during this period, which made for some good base
       | material.
       | 
       | (1) The Daily Utah Chronicle; if memory serves, said friends also
       | tried the markov chain generator on the personals section to good
       | effect as well.
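        | 
        | For anyone curious, a minimal sketch of that kind of
        | two-word-state ("three-token") Markov generator; the corpus
        | file name is a stand-in for the police-reports text.
        | 
        |     # Word-level Markov chain keyed on the previous two words.
        |     import random
        |     from collections import defaultdict
        | 
        |     corpus = open("police_reports.txt").read().split()
        | 
        |     chain = defaultdict(list)
        |     for a, b, c in zip(corpus, corpus[1:], corpus[2:]):
        |         chain[(a, b)].append(c)
        | 
        |     state = random.choice(list(chain))
        |     words = list(state)
        |     for _ in range(40):
        |         nxt = random.choice(chain.get(state, corpus))
        |         words.append(nxt)
        |         state = (state[1], nxt)
        |     print(" ".join(words))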
        
         | gs17 wrote:
         | > it had an absurdity that modern LLMs seem to avoid due to
         | making higher-level semantic sense.
         | 
         | That's it, LLMs are "trying" to be funny but aren't quite smart
         | enough to actually be funny and their errors are just boring.
         | Markov chains are accidentally hitting on absurdist bits
          | because every sentence randomly gets pulled into whatever
          | the homograph equivalent of a malapropism is.
        
       | trescenzi wrote:
        | Reminds me of how the fake magic cards before chatgpt were
        | always hilarious and sometimes playable; now they are usually
        | playable and sometimes hilarious. Definitely a downgrade if
        | you ask me.
        
         | JoshTriplett wrote:
         | > always hilarious and sometimes playable
         | 
         | I saw plenty of those back then, and as far as I could tell,
         | examples were _always_ cherry-picked from a larger set.
        
           | kibibu wrote:
           | Maybe the right solution is using a Markov chain or gpt1 to
           | generate ideas and an LLM to cherry pick.
        
           | stochaztic wrote:
           | Yes, I ran the RoboRosewater account, which is where most
           | people saw the concept I believe. I had a few documents with
           | thousands of cards of output, went through them for any
           | notable ones, and formatted and posted them. Probably 5-10%
           | of the output made my personal cut for notability.
        
             | Applejinx wrote:
              | Then I salute you. Some of the funniest things I've
              | EVER seen in my whole life were watching the
              | LoadingReadyRun comedy troupe, doing their 'Desert
              | Bus' fundraiser and
             | punchy from lack of sleep, encounter RoboRosewater for the
             | first time. It became a real tradition :)
        
       | jsemrau wrote:
        | Can fully agree. Source: me, who ran a markov-chain-based
        | Twitter autogen tool 8 years ago.
        
         | donatj wrote:
         | Hey, me too! I populated the markov chain with the text of my
         | Twitter feed with a recency bias and generated tweets based on
         | that so it stayed topical.
         | 
         | The account sat unused after Twitter locked down their API, and
         | at some point got hacked without me noticing. It had been taken
         | over by a crypto scammer, and the account got banned.
         | 
         | Trying to get it back was fruitless, Twitter/X's support is
         | entirely useless.
        
           | jsemrau wrote:
           | I got banned within 24 hours ^-^. Rightfully so, I assume.
           | The Markov chain agent was supposed to send everyone who
           | tweets "I am bored" an event recommendation based on their
           | profile and location. I severely underestimated how many
           | people tweet "I am bored"... Lessons learned, I guess.
        
             | donatj wrote:
             | Oh, I didn't @ or DM anybody. I just had a feed with tweet
             | once an hour, and even then I manually approved the Tweets
             | before they went out. My account made very clear it was a
             | bot, it was in its name. Didn't ever get very many
             | followers. I would say I never really got past the
             | experimental phase.
        
       | tracerbulletx wrote:
       | No, they might be funnier than an LLM specifically aligned to not
       | be that funny.
       | 
        | Fine-tune an LLM base model on jokes and align it by ranking
        | how funny each reply is, instead of on helpful questions and
        | answers, and then we'll talk.
        
       | renewiltord wrote:
       | The bottomless pit copypasta remains unbeaten.
       | 
       | https://www.reddit.com/r/greentext/comments/vc7hl0/the_botto...
        
         | renewiltord wrote:
         | Some portion of this is that the prompt is good but I am
         | entertained by so many of these.
         | 
         | Humor is personal, it's true. But I found it quite funny. E.g.
         | https://pastebin.com/84ByWUJL
         | 
         | And another greentext for you:
         | 
         | >Be me
         | 
         | >Be a bottomless pit supervisor
         | 
         | >Spend months yelling into the void
         | 
         | >Echo never comes back
         | 
         | >Start to think the pit is ignoring me
         | 
         | >Decide to teach it a lesson
         | 
         | >Dump truck full of Lego bricks into the pit
         | 
         | >Ground starts shaking
         | 
         | >Unholy scream erupts from the depths
         | 
         | >mfw I'm actually a regular pit supervisor
         | 
         | >First day on the job
         | 
         | >Realize it's just the sewage treatment plant
         | 
         | >Get fired for clogging entire city's plumbing
        
       | dimatura wrote:
       | The evolution of the AI weirdness blog
       | (https://www.aiweirdness.com/) over the past few years somewhat
       | supports this idea, except that the author got plenty of funny
       | results with LLMs - but mostly the early ones, up to around GPT3
       | (and smaller variants of GPT3). For example, cereal names
       | generated by the Ada version of GPT were much funnier than the Da
       | Vinci ones: https://www.aiweirdness.com/new-breakfast-cereals-
       | from-ai/
        
       | jcalvinowens wrote:
       | Unfortunately the account no longer exists... but about a decade
       | ago while I was in school, I created a Markov twitter bot trained
       | on:
       | 
       | 1. All of Linus Torvalds' mail to LKML for the prior year.
       | 
       | 2. All of Jesus' direct quotes from the king james bible.
       | 
       | It was absolutely hilarious. The two training sets had very
       | little overlap, so it was necessary to add a heuristic that
       | weighted options from each set more heavily the longer the chain
       | had been "stuck" in the other set.
        
         | throwawayk7h wrote:
         | This sounds great. Did you save any examples?
        
           | NobodyNada wrote:
           | "King James Programming" is along similar lines:
           | https://www.tumblr.com/kingjamesprogramming
        
             | throwawayk7h wrote:
             | Yes, this is the first thing in the article.
        
         | SonOfLilit wrote:
         | This very similar bot still exists (and was quoted a few times
         | in Unsong):
         | 
         | https://www.tumblr.com/kingjamesprogramming
        
       | maxglute wrote:
        | I used to be unreasonably entertained tapping the
        | suggest-next-word feature on the Google keyboard.
        
       | meowface wrote:
       | Many years ago I trained a Markov chain chat bot on all of the
       | conversation history in a group chat with me and my friends. It
       | was regularly incredibly hilarious. I also would train it on
       | subsets of them, like two of the friends but no one else, and
       | name that version of the bot a combination of their names.
       | Something about it was just so funny.
        
       | tkgally wrote:
       | Prompt to Claude 3.5 Sonnet: Below is a recently published essay
       | followed by some of the discussion about it on Hacker News. Write
       | 10 short jokes on this theme, making sure that the jokes are
       | funny in the way described by the essay's author. [essay and
       | discussion omitted]
       | 
       | Claude's response: I understand you'd like me to write 10 short
       | jokes on the theme of Markov chains being funnier than large
       | language models, in the style described by the essay author. I'll
       | do my best to create some jokes that aim for "unserious surprise"
       | and violate expected patterns in amusing ways:
       | 
       | 1. Why did the Markov chain cross the road? To get to the other
       | side of unpredictability.
       | 
       | 2. An LLM and a Markov chain walk into a bar. The LLM orders a
       | statistically probable drink. The Markov chain orders a lampshade
       | made of cheese.
       | 
       | 3. What do you call a Markov chain that tells dad jokes? A Mark-
       | ov Twain.
       | 
       | 4. How many LLMs does it take to screw in a lightbulb? One - but
       | it'll spend 20 minutes explaining the optimal lightbulb-screwing
       | process first.
       | 
       | 5. Knock knock. Who's there? Markov. Markov who? Markov chain
       | reaction of nonsensical hilarity.
       | 
       | 6. An LLM, a Markov chain, and GPT-4 walk into a bar. The
       | bartender says "We don't serve your kind here." GPT-4 leaves. The
       | LLM stays to debate ethics. The Markov chain orders a bicycle
       | made of spaghetti.
       | 
       | 7. What's an LLM's favorite movie? Predictable and Furious 17:
       | The Safest Driving Yet.
       | 
       | 8. How does a Markov chain tell a story? "Once upon a time, there
       | was a princess who lived in a castle made of bananas and ruled
       | over a kingdom of sentient toasters."
       | 
       | 9. What did the Markov chain say to the LLM? "Your mother was a
       | abacus and your father smelled of silicon!"
       | 
       | 10. Why don't Markov chains make good therapists? Because their
       | advice is always "And then you should banana the feelings until
       | they become a hat."
        
         | anvuong wrote:
         | "An LLM, a Markov chain, and GPT-4 walk into a bar. The
         | bartender says "We don't serve your kind here." GPT-4 leaves.
         | The LLM stays to debate ethics. The Markov chain orders a
         | bicycle made of spaghetti."
         | 
         | This is actually gold.
        
           | muglug wrote:
           | It's... not?
           | 
           | Even for the low bar of a geek joke it makes no sense since
           | GPT-4 _is_ an LLM.
        
             | tourmalinetaco wrote:
             | That's what makes it gold.
        
             | bryanrasmussen wrote:
             | In normal English usage this would imply that the LLM was
             | not GPT-4 LLM but some stereotypical anonymous LLM.
             | 
             | In business terms GPT-4 can be said to be superior because
             | it understood the instruction and left, in AI terms the
             | anonymous LLM might be superior because it may have
             | understood the instruction but responded in an
             | "intelligent" manner by arguing about the morality of the
             | instructions.
             | 
             | At a meta-level the joke thus argues that GPT in achieving
             | business ends has had its intelligence hampered. As have we
             | all.
             | 
             | At the same meta-level as the joke was constructed by
             | Claude it can be argued that Claude is commenting on both
             | the intellectual limitations of the Markov chain (insane
             | babblings), and GPT-4 (unimaginative, inhibited business
             | type) and that the best version is some LLM that is not
             | GPT-4 with its limitations - an LLM like Claude. Sneaky
             | Claude.
        
             | Kuinox wrote:
              | Would the markov chain write something that makes more
              | sense?
        
             | scambier wrote:
              | It's implied that GPT-4 has so many restrictions that
              | it will not argue and just does what is asked. In the
              | context of the
             | joke, an unfiltered LLM will just debate you.
        
           | nevertoolate wrote:
           | You're watching a stage play - a banquet is in progress. The
           | guests are enjoying an appetizer of raw oysters. The entree
           | consists of boiled dog.
        
             | zaik wrote:
             | Is this to be an empathy test?
        
         | floren wrote:
         | So was it that LLMs _used_ to be capable of making actual
         | jokes, or were they always this bad and I was just more
         | impressed by the talking computer back then?
        
           | minimaxir wrote:
           | It's a different _style_ of comedy. Absurdism vs. joke setups
           | (and not quite nailing it)
        
           | delusional wrote:
           | It's the "impressed by the spectacle" one. I tried jokes with
           | LLMs many times, and they're always this. Riffing on a couple
           | of themes loosely related to what was asked. Always unfunny
           | and uncreative.
        
             | tkgally wrote:
             | I wonder, though, whether jokes like these could be useful
             | to professional humorists who have to come up with gags on
             | a deadline. From what I've read about monologue writing
             | teams for late-night talk shows and the like, the writers
             | first propose many ideas, most of which are shot down
             | quickly and the remainder of which get tweaked and polished
             | before being used. Some of the above jokes by Claude look
             | to me as though they might serve as good starting points
             | for such brainstorming. At least, they're better than
             | anything _I_ could create in a short amount of time.
        
             | jack_pp wrote:
             | I found some of those jokes good, definitely better than I
             | would've ever written them. If you watch shows about comedy
             | like say Hacks you'll see human comedians riff on stuff and
             | a lot of the off the top jokes get discarded or improved.
             | So Claude did fine in my book
        
           | red75prime wrote:
           | I chuckled a bit. They are OK, if you don't get exposed to
           | them too often. And with an LLM you can get as much exposure
           | as you want (and all of the jokes are naturally from roughly
           | the same probability distribution).
           | 
            | I don't expect too much until AI self-play learning is
            | made possible, so I don't get disappointed by the expected
           | shortcomings.
        
           | throwthrowuknow wrote:
           | Uncensored LLMs are funnier but most comedy just falls flat
           | in text format. Once the uncensored multimodal models start
           | rolling out we'll get some real laughs.
           | 
           | Moshi is actually pretty funny just for having a 72 IQ
           | 
           | https://www.moshi.chat/
        
           | yfontana wrote:
           | LLMs were never very good at directly generating original
           | jokes, for a simple reason: writing a good joke generally
           | starts with finding a good punchline, and then setting it up.
           | An LLM generating token after token will first write a set-
           | up, and then try to shoehorn a punchline into it. Prompt
           | engineering can fairly easily work around this, but just
           | straight-up asking an LLM for a joke never really produced
           | good results on average.
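            | 
            | A minimal sketch of that punchline-first workaround,
            | using the OpenAI Python SDK purely as an example client;
            | the model name and prompt wording are placeholders.
            | 
            |     # Ask for the punchline first, then a setup that
            |     # lands on it, instead of setup-then-punchline.
            |     from openai import OpenAI
            | 
            |     client = OpenAI()
            |     resp = client.chat.completions.create(
            |         model="gpt-4o-mini",
            |         messages=[{"role": "user", "content":
            |             "First invent a surprising punchline about "
            |             "Markov chains, then write a short setup "
            |             "that ends on that exact punchline."}],
            |     )
            |     print(resp.choices[0].message.content)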
        
         | tonyarkles wrote:
         | > 2. An LLM and a Markov chain walk into a bar. The LLM orders
         | a statistically probable drink. The Markov chain orders a
         | lampshade made of cheese.
         | 
         | That's pretty decent!
        
         | minimaxir wrote:
         | Claude 3.5 Sonnet in general is the first modern LLM I've tried
         | that's actually good at jokes that are inventive. The GPT-based
         | LLMs are all too RLHFed to be wacky.
        
           | sunaookami wrote:
           | GPT is too... robotic? Claude is much better at everything
           | without overexplaining everything.
        
         | wodenokoto wrote:
         | > What do you call a Markov chain that tells dad jokes? A Mark-
         | ov Twain.
         | 
         | I honestly thought that one was pretty good.
        
           | bryanrasmussen wrote:
           | was it instructed to insult Mark Twain? Because otherwise, I
           | take exception.
        
         | antman wrote:
         | "How many LLMs does it take to screw in a lightbulb? One - but
         | it'll spend 20 minutes explaining the optimal lightbulb-
          | screwing process first." That was not funny, that is
          | accurately painful!
        
         | raverbashing wrote:
         | These are ok but they got nothing on the absurdist Markov Chain
         | jokes (but that being said, the MC misses a lot of times as
         | well)
        
         | dan_mctree wrote:
         | Why are bananas the funniest food? Even Claude seems to have
         | caught on
        
           | PawgerZ wrote:
           | Probably all of the Despicable Me minions memes fed into the
           | training material.
        
         | kqr wrote:
         | I didn't like any of these jokes specifically (too on-the-
         | nose), but I definitely think you invented a funny category of
         | jokes I could like a lot!
        
         | temp_account_32 wrote:
         | I'm sorry but these all sound like a Redditor's terrible
         | attempt at humor, predictable formulae with 'le quirkiness'
        
         | Applejinx wrote:
         | All of the half decent ones could be made funnier by replacing
         | the lolrandom part of the punchline with an actual Markov-chain
         | style 'you're a right sentence but you just walked into the
         | wrong association, buddy' twist. It's not just about lolrandom.
         | Markov chaining is more likely to make a kind of sense, but the
         | wrong kind of sense.
         | 
         | An LLM, a Markov chain, and GPT-4 walk into a bar. The
         | bartender says "We don't serve your kind here." GPT-4 leaves.
         | The LLM stays to debate ethics. The Markov chain orders a coup.
        
           | c22 wrote:
           | The knock knock joke (no. 5) was a decent attempt.
        
         | layer8 wrote:
         | And what is the conclusion you draw?
         | 
         | IMO these are mid to meh or fall completely flat.
        
       | tombert wrote:
       | I am in a private Discord server that has two bots in it. One is
       | a bot that is a basic Markov chain trained on the entire chat
       | history. The second is a proper LLM trained on some amount of
       | tokens backward. Both will occasionally just randomly chime in
       | during the chat.
       | 
       | The markov chain bot is always considerably funnier.
        
         | Sohcahtoa82 wrote:
         | Curious what context window you used. My understanding is that
         | a short window like only 1-2 words creates gibberish, but
          | longer windows tend to repeat previous messages verbatim.
         | 
         | And when deciding to chime in, was it just a simple chance (ie,
         | 25%) after any other message? Or did it run on a timer?
        
       | pants2 wrote:
       | If you want some empirical evidence: /r/subreddit simulator[1] is
       | a Markov-based Reddit parody, and /r/SubSimulatorGPT2[2] is its
       | LLM-powered cousin. The Markov version got way more upvotes and
       | is simply funnier!
       | 
       | 1. https://www.reddit.com/r/SubredditSimulator/top/?t=all
       | 
       | 2. https://www.reddit.com/r/SubSimulatorGPT2/top/?t=all
        
         | mcmoor wrote:
         | Nah, I think it's solely due to the former being much older and
         | more well-known. And I always liked the latter much more
         | anyway.
        
           | Miraltar wrote:
           | I agree, Markov chain is fun when you don't know what it is
           | because you try to make sense of it but once you realize it's
           | just gibberish it's pointless. The other has that "unserious
           | surprise" talked about in the original article.
        
       | wcedmisten wrote:
       | Highly recommend this performance from the last Strange Loop on a
       | similar topic! I had the pleasure of seeing it live. A great
       | combination of live music and Markov chains.
       | 
       | https://m.youtube.com/watch?v=M2o4f_2L0No
        
       | johntb86 wrote:
       | LLMs can be funny. For example, look at Golden Gate Claude (
       | https://news.ycombinator.com/item?id=40459543 ). But they're not
       | good at intentionally being funny, so we need to break them to
       | get absurdist humor instead.
        
       | cancerhacker wrote:
        | Surprised no mention of "The Policeman's Beard is Half
        | Constructed"[1] - the first book "written" by a computer program.
       | And I imagine the git-man-page-generator[2] is similarly built
       | (and far more fun!)
       | 
       | [1] https://archive.org/details/policemansbeardi0000unse [2]
       | https://git-man-page-generator.lokaltog.net/
        
         | cout wrote:
         | They had their love, but they also had their typewriters.
        
       | at_a_remove wrote:
       | I used to keep a text file on hand back when people had some
       | irritating Markov chain bots on IRC. Fed enough of the lines, the
       | bots veered off into some new valley of chatter, appearing quite
       | mad. Used to get some "What did you DO to my bot?" indignant
       | messages. I wonder if I still have that file somewhere.
        
       | fishtoaster wrote:
       | I came to this same conclusion some years ago while working on a
       | side project.
       | 
        | Before anything LLM-like existed, I built a site[0] to generate fake
       | "AWS Blog Posts." I trained a markov chain generator on all AWS
       | announcement posts up to that point, copied the html + css of
       | aws's standard blog posts, then glued them all together with some
       | python + JS. It turned out, IMO, pretty funny! People familiar
       | with AWS's blog posts would often get several sentences in before
       | they realized they were looking at word-soup.
       | 
       | When GPT was new, I looked into using that to "upgrade" it. I
       | spent a weekend messing around with Minimaxir's gpt-2-simple
       | generating blog posts based on AWS content. What I found was,
       | ultimately, it was way less fun. The posts were far too realistic
       | to be interesting. They read like totally-real blog posts that
       | just happened to not be true.
       | 
       | I realized then that the humor of those early markov generations
       | was the ridiculousness. The point where, a few words or sentences
       | in, you realized it was all nonsense. LLMs these days are too
       | good for that - the text they generate is sometimes _wrong_, but
       | rarely nonsense in a humorous way.
       | 
       | Markov chain content was wrong in a "kids say the darndest
       | things" way, while modern LLMs are wrong in a "My uncle doesn't
       | know basic geography" way.
       | 
       | [0] https://totes-not-amazon.com/ - click any link to get a new
       | one.
        
         | fouronnes3 wrote:
         | Reminds me of git man page generators. Extremely fun as well
         | when you can't quite tell if it's real.
        
         | mdp2021 wrote:
         | > _modern LLMs are wrong in a "My uncle doesn't know basic
         | geography" way_
         | 
         | Oh, certainly not. I have been using LLM chatbots extensively
         | this week. If your uncle were not just confident while stating
         | falsities, but incapable of self-assessment, he would have been
         | murdered.
        
           | TeMPOraL wrote:
           | Or made a stellar career as a salesman or politician.
        
             | mdp2021 wrote:
             | Until you are caught (in good societies).
        
               | TeMPOraL wrote:
               | Or _fresh_ ones. Established democracies end up with
               | political structures that self-select for being able to
               | sound right and cooperate right, entirely independent of
               | truth and merit.
        
               | mdp2021 wrote:
               | I coincidentally just stumbled on a recent implicit
               | comment by Angel Boligan Corbo:
               | 
               | Title: "The Talker" -
               | https://image.cagle.com/287573/980/-the-talker.png
        
         | kqr wrote:
         | (a) That site is very funny. Thanks for sharing!
         | 
         | (b) The kids-vs-uncle metaphor is spot on in my experience too,
         | and also illustrates how far these language models have come.
        
         | daemin wrote:
         | I think the "uncle that doesn't know anything but claims to be
         | an expert and will tell you his opinion" is the best
         | description of LLMs that I can think of. They will say anything
         | confidently without knowing anything.
        
           | HeatrayEnjoyer wrote:
           | Perfectly humanlike
        
             | hoseja wrote:
             | We gave up the pursuit of pure crystalline Intelligence and
             | instead just remade ourselves. Hopefully the AIs will be
             | able to do better.
        
               | Bluestein wrote:
               | Yeah. We don't need Artificial Intelligence. We need to
               | discover _Actual_ Intelligence :)
        
             | HPsquared wrote:
              | Indeed, the more I use LLMs and see their confabulated
              | output, the more I notice the same pattern in how actual
              | people talk.
        
           | boredemployee wrote:
           | >> They will say anything confidently without knowing
           | anything.
           | 
           | Understanding how these algorithms work, I don't know why we
            | would expect anything different?
        
             | boesboes wrote:
              | Most ppl don't understand shit about these algorithms and
              | will expect them to be truthful and knowledgeable. That is
              | how it's marketed, despite the disclaimers.
        
               | K0balt wrote:
               | It should be marketed more like "uncle Jim" than a
               | trustworthy, all-domain competent oracle. I think if you
               | just gave it a twangy accent and changed some output
               | nouns to thingy, whatchamacallit and doohickey, people
               | would actually be able to use it more competently.
        
             | lordnacho wrote:
             | Because it's marketed as AI.
             | 
             | For me, my expectations are adjusted. I kinda know what to
             | expect it to do, and it does quite well for what I use it
             | for, essentially smart intellisense for coding.
        
             | beeboobaa3 wrote:
              | I would have expected the people who trained these systems
              | not to format their Q/A samples to sound authoritative
              | when they knew the system was going to be spouting
              | nonsense.
        
             | jononor wrote:
              | The sheer number of companies, media, influencers, heck
             | even government agencies, acting otherwise? With billions
             | of dollars on the line, there are many powerful forces
             | pushing for people to have unrealistic expectations about
             | LLMs (or "AI" as they call it).
             | 
             | A lot of money is being spent on trying to improve it too.
              | But right now the emperor has very little clothing. It
             | remains to be seen whether they have materialized by the
             | time people open their eyes.
        
           | joloooo wrote:
           | LLMs are really disrupting the market for Mckinsey junior
           | analysts meeting with clients.
        
             | mapt wrote:
             | Can an LLM provide the sort of psychological buffer that
             | McKinsey Junior Analysts (some random person picked up off
             | the street in a college town and handed a clipboard, a
             | frequent flier card, and a six figure salary) provide to
             | senior managers against the scrutiny of their board and
             | chief executive?
             | 
             | If I want to close the paper division, and my boss wants to
             | close the paper division, does the LLM whisper sweet
             | nothings in our ear and show soothing Powerpoint decks
             | about how in the eyes of an objective observer, the paper
             | division is unprofitable and there would be no career risk
             | in closing it?
             | 
             | It's not a sexual thing I swear, it's a sort of comforting
             | intimacy that needs to be injected into the business
             | process in order to cope with the pressures executives
             | feel. I don't know that anybody has trained an LLM to do
             | that yet.
        
               | joloooo wrote:
               | I think this is the next sequel to the movie "Her".
               | 
               | You bring up a good point though. Very dystopian idea of
               | a new type of enterprise tooling that emerges to soothe
               | and calibrate human knowledge workers. Very Severance
               | Wellness room.
        
         | DeadlockISDota3 wrote:
          | I figured out the same thing in a similar way. I used to make
          | Markov chains of game patches and send them to the community.
          | Fake Dota patches were a huge hit, especially since those are
          | usually huge. Among mostly nonsense or unfunny exaggeration
          | ("this hero now has 500 armor"), there would usually be at
          | least 5 or 6 extremely funny lines in there, if not stuff that
          | was outright prophetic (Fiend's Grip creates additional 1/2/3
          | illusions).
          | 
          | LLMs did a number on that, unfortunately. Major subreddits
          | banned all AI content, mostly due to the sheer infestation of
          | really boring Midjourney content sent out by naive users and
          | bots. Without reach, I lost interest, so no more Markov chains.
        
           | unshavedyak wrote:
           | Hah, i felt compelled to mention Deadlock because of your
           | Dota comments. Then i noticed your name.
        
           | erk__ wrote:
           | I am sure I even remember that Valve once incorporated
           | something from one of the fake patch notes in one of their
           | patches, but now I cannot find it. It was something like +1
           | damage to something that already did a lot of damage.
        
         | nom wrote:
         | First thing I read on that page:
         | 
         | > There is no charge associated with your streamed
         | applications, simply click Enable Microphone from the
         | ElastiCache documentation.
         | 
         | It's way funnier when our brains try to make sense of the
         | nonsense.
        
         | maicro wrote:
         | I played around with a Markov generator many years ago, trained
         | on the corpus of everything I had ever said in a specific IRC
         | channel. The resulting messages were pretty interesting - I
         | specifically remember the realization that I apparently (used
         | to, though honestly still do, too) talk about LEDs a lot...
        
         | passion__desire wrote:
         | https://cemulate.github.io/the-mlab/#y3Bt-co-extensional+lim...
         | 
         | https://github.com/cemulate/the-mlab
         | 
         | This is a parody of the nLab, a wiki for collaborative work on
         | category theory and higher category theory. As anyone who's
         | visited is probably aware, the jargon can be absolutely
         | impenetrable for the uninitiated -- thus, the idea for this
         | project was born!
         | 
         | This project uses my package nearley-generator, a module that
         | can turn a Nearley grammar into an efficient and controllable
         | fake text generator. The grammar file for this site can be
         | found at /src/grammar/nlab.ne
        
         | xnx wrote:
         | > What I found was, ultimately, it was way less fun
         | 
          | I wonder if there is a combination of temperature and prompt
         | engineering that could make it interesting.
        
           | shawabawa3 wrote:
           | I just asked ChatGPT to create an AWS product announcement in
           | the style of Markov-generated text and it was pretty good.
        
         | m3047 wrote:
         | I did the same thing and used it as the chum for a 'bot motel.
         | (A 404 handler that always returns something and... success!).
         | I trained it on a mixed NASA space shuttle and veterinary
         | science corpus just to make it really bizarre.
        
         | to11mtm wrote:
         | > I realized then that the humor of those early markov
         | generations was the ridiculousness. The point where, a few
         | words or sentences in, you realized it was all nonsense. LLM's
         | these days are too good for that - the text they generate is
         | sometimes wrong, but rarely nonsense in a humorous way.
         | 
          | This is the biggest challenge I run into when I see
          | LLM-generated nonsense.
         | 
         | So much stuff that ChatGPT spits out is _close enough to true_
         | that Brandolini is laughing.
         | 
          | But it's also that it's borrowing from stuff that's already
          | there, so it can't really get that 'creative' either.
        
       | bitwize wrote:
       | Whenever someone attending the tech meetup I frequent gets a
       | little too jazzed and excited about AI, I pull out my phone, open
       | Emacs (I use termux, btw), bring in a corpus of text (an excerpt
       | from _A Christmas Carol_ is good), and say M-x dissociated-press.
       | 
       | Then I say "What this is doing is predicting the next character
       | based on the statistical likelihood of the previous few characters
       | in the corpus of text. And fundamentally, that's _all_
       | ChatGPT does -- predicting the next symbol based on a statistical
       | model. ChatGPT has a much more sophisticated statistical model
       | than this simple Markov chain and a vastly larger corpus, but
       | really it's just doing the same thing." And we have a giggle
       | about the nonsense DP makes of Dickens, but then I say that
       | ChatGPT emits nonsense too, but it's far more insidious nonsense
       | because it is much more plausible sounding.
        
         | TeMPOraL wrote:
         | Obviously a human is doing the same thing too, just with even
         | more sophisticated statistical model.
        
           | card_zero wrote:
           | This offers me a rare opportunity to turn the phrase "no,
           | because I say so" into a valid point.
        
             | TeMPOraL wrote:
             | An objection so common and strongly predictable as this one
             | kind of makes my point for me :).
        
               | Kerb_ wrote:
               | Even the biological impulse to argue online is appearing
               | in language models, just check Reddit
        
       | ineptech wrote:
       | Relatedly, the author Robert Anton Wilson tried to manually
       | implement a Markov-chain-bot in the 1960s by cutting up passages
       | from various books, selecting them from a hat, and looking for
       | overlapping words and phrase fragments. The best results were
       | included in the Illuminatus! trilogy, some attributed to a
       | character named Markoff Chaney.
        
       | 6510 wrote:
       | > I've heard people claim that comedy cannot be generated by an
       | algorithm. If it wasn't obvious, I disagree with this. I think
       | comedy can be analyzed and measured. I think, given a large
       | enough grant, we could probably generate comedy on demand. I
       | think it's doable - not that we should do it.
       | 
       | I used to think it came naturally, then someone had a bookcase
       | full of books about humor. (wtf?) Apparently they have it down to
       | a science.
       | 
       | I learned the difference between someone funny and a professional
       | comedian is that the latter finds additional punchlines for a
       | joke. It then described a step-by-step process going from a silly
       | remark to a birthday joke, comparing various modular developments
       | to a kind of dependency hell complete with race conditions, until
       | the state object is carefully defined and the plot has the
       | punchlines all sorted from the barely funny to the truly
       | hilarious. It was more engineering than CS.
       | 
       | The funniest seeBorg message was 10 minutes after a heated
       | discussion that resulted in tanktop, a moderator, getting banned
       | from a project. The bot wrote: _Tanktop is Hitler!_ At that point
       | it took 2 days for the humans to figure out what the next word
       | was supposed to be.
        
       | 29athrowaway wrote:
       | It's all connected, bro. Language models are connected to
       | compression, to n-gram models, to Markov chains, to everything.
        
       | magicalhippo wrote:
       | I've also enjoyed the Postmodernism essay generator[1], which
       | uses[2] a recursive transition network[3] to generate the text.
       | 
       | [1]: https://www.elsewhere.org/journal/pomo/ (refresh for new,
       | random essay)
       | 
       | [2]: https://www.elsewhere.org/journal/wp-
       | content/uploads/2005/11...
       | 
       | [3]: https://en.wikipedia.org/wiki/Recursive_transition_network
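       | 
       | The idea is small enough to sketch. Here is a toy version of
       | grammar-driven generation in that spirit (a plain recursive
       | expansion over a made-up grammar of mine, not the generator's
       | actual one):
       | 
       |     # Expand nonterminals by picking a random production until
       |     # only plain words remain.
       |     import random
       | 
       |     grammar = {
       |         "S": [["the", "NOUN", "VERB", "the", "ADJ", "NOUN"]],
       |         "NOUN": [["discourse"], ["signifier"], ["paradigm"]],
       |         "ADJ": [["postmodern"], ["textual"], ["dialectic"]],
       |         "VERB": [["deconstructs"], ["interrogates"]],
       |     }
       | 
       |     def expand(symbol):
       |         if symbol not in grammar:
       |             return symbol
       |         production = random.choice(grammar[symbol])
       |         return " ".join(expand(s) for s in production)
       | 
       |     print(expand("S") + ".")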
        
       | atoav wrote:
       | The cool thing about Markov chains is that you can tune the
       | probability with which they will jump between "genres" of their
       | training content by making the analyzed n-grams shorter or
       | longer.
       | 
       | E.g. it starts out as a news headline and ends with a bible
       | verse.
        
       | _emacsomancer_ wrote:
       | Funnily enough, I was idly playing with a last-updated-12-years-
       | ago Markov chain text generator[0] in Emacs, and quickly had a
       | number of amusing Markov chain outputs from mash-ups of the Elisp
       | manual and Alice in Wonderland, including:
       | 
       | "13.7 Anonymous Functions
       | 
       | Although functions are usually defined with the built-in defmacro
       | macro, but any list that begins with an M--'
       | 
       | `Why with an M?' said Alice.
       | 
       | `Why not?' said the March Hare."
       | 
       | [0]: https://github.com/skeeto/markov-text
        
         | FeepingCreature wrote:
         | Isn't that the sort of joke that you'd get from GPT, not a
         | Markov chain? Unless it's quoting verbatim, a Markov chain
         | should have no idea that Alice is supposed to refer back to the
         | letter from the previous sentence.
        
           | StilesCrisis wrote:
           | This phrase turns out to be a verbatim quote from the story.
        
             | drivers99 wrote:
             | So it is:
             | 
             | > "They were learning to draw," the Dormouse went on,
             | yawning and rubbing its eyes, for it was getting very
             | sleepy; "and they drew all manner of things--everything
             | that begins with an M--"
             | 
             | > "Why with an M?" said Alice.
             | 
             | > "Why not?" said the March Hare.
             | 
             | > Alice was silent.
        
       | tonetegeatinst wrote:
       | I just read a little bit into the post and thought hey, let's call
       | Markov chains something better than a tiny LLM.
       | 
       | I present to hacker news the MCLM, the Markov chain language
       | model.
        
       | larodi wrote:
       | Markov chains are funnier for one very basic reason:
       | 
       | you can teach them to 5th or 9th graders.
       | 
       | LLMs you cannot, or at least it will take an insane amount of
       | allegory to do so. Markov chains are very tightly related to
       | regexes, and one may be surprised that there is a probabilistic
       | regex. Also, the graphical structure of Markov chains is a lot
       | like an FSM, and an FSM perhaps can be explained to very small
       | children :D
        
       | notjulianjaynes wrote:
       | In college (around 2012ish) I made some poems by transcribing a
       | recording of myself mumbling gibberish into Dragon Naturally
       | Speaking, which partially relied on Markov chains for its ASR. It
       | would output some string of text, and then I'd delete all the
       | words it output from its dictionary; repeat. It worked very well
       | for creating a list of (slant) rhymes, I have not found a better
       | way to do this. As a literary tool, I've found LLMs to be
       | seriously lacking in the ability to fuck around like this.
        
       | raister wrote:
       | For anyone interested, I wrote this book a while ago: "Markov
       | Chains for programmers". Link: https://czekster.github.io/markov/
        
       | treprinum wrote:
       | Generative AI often uses Markov chains as its theoretical basis
       | and is modeled in the language of conditional probabilities,
       | albeit conditioned not on a single ancestor but on a chain of
       | ancestors. One can often view LLMs as scaled-out Markov chains
       | (more like "Erlang" chains allowing deeper memory) that encode
       | the probability of the next token given a window of the past the
       | size of the context window, and then do some magic when picking
       | the next token.
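       | 
       | In symbols, one way to put it: an order-N Markov model over
       | tokens assumes
       | 
       |     P(x_t | x_1, ..., x_{t-1}) = P(x_t | x_{t-N}, ..., x_{t-1})
       | 
       | and an LLM with an N-token context window satisfies exactly that,
       | just with a far richer estimator for the right-hand side than an
       | n-gram count table.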
        
         | Jensson wrote:
         | Even old Markov chains have a context window of a few words;
         | single-word-context Markov chains produce horrible results.
         | 
         | People who don't think LLMs are Markov chains are just
         | ignorant, not realizing that a Markov chain isn't an algorithm:
         | you can compute the probability in any manner and it is still a
         | Markov chain.
        
         | throwthrowuknow wrote:
         | It's not an n-gram, go learn about transformers
        
           | Jensson wrote:
           | A context window is an n-gram.
        
             | throwthrowuknow wrote:
             | It's not used as an n-gram
        
       | ku1ik wrote:
       | This is something people miss when talking about LLMs: "It's the
       | most average thing you could have possibly produced given the
       | context."
       | 
       | They can't be creative by design. They're useful when you want to
       | reproduce, but not when you want to create something completely
       | new (though you can maybe do that by getting a bunch of average
       | outputs from an LLM and getting inspired yourself).
       | 
       | When GPT-4 came out I was playing with it, and I often tried to
       | get some unique, creative output from it, but very soon I learned
       | it was futile. It was back when it all still felt magical, and I
       | guess many of us tried various things with it.
        
         | Kim_Bruning wrote:
         | If you tell GPT-4 to get snarky, you can sometimes end up
         | laughing (and it seems a tad more intelligent and socially
         | perceptive besides!). But Claude-3.5 can definitely get me
         | laughing, at least.
         | 
         | Now imagine telling Claude-3.5 to try being snarky while
         | sorting out software issues at a customer's office.
         | 
         | There should be a warning label!
        
       | benob wrote:
       | LLMs are high-order Markov chains with a factored estimator.
       | N-gram LMs are also Markov chains with a different estimator.
        
       | yarg wrote:
       | Unserious surprise summarised it nicely for me:
       | 
       | It does not mean that Markov chains are better - something
       | trained to make predictions should ideally not fall too far away
       | from our own internal prediction engines (which have been honed
       | across aeons).
       | 
       | It's that it starts to come close that's the problem (or cause);
       | it's the uncanny valley for text.
        
       | paweladamczuk wrote:
       | Am I the only one feeling uneasy when the Bible is used for funny
       | experiments like this?
       | 
       | Would you use an image of Christ on the cross to test an AI image
       | modification model?
        
         | nly wrote:
         | I'm sure most people here would, given that like 50% of the
         | world don't believe in an Abrahamic religion, and a % of those
         | that do have a sense of humour.
        
           | paweladamczuk wrote:
           | I wouldn't. And not because I believe it or lack a sense of
           | humour.
        
         | FeepingCreature wrote:
         | As probably the oldest literary work that is also widely known
         | in English-speaking territory, the Bible has a highly distinct
         | style that helps its samples maintain coherence in a sentence
         | and also be easily recognizable.
        
         | Kerb_ wrote:
         | I've used multiple images of Christ to play with AI image
         | models. Remaking the old "Jesus on toast" image was fun.
         | Facebook is full of boomers sharing the Jesus face AI generated
         | into scenes. I feel uneasy when religion is used as an excuse
         | to discriminate against or murder innocent people, not when
         | symbolism or texts are used for creative purposes.
        
       | fsndz wrote:
       | the original stochastic parrot
        
       | dghf wrote:
       | I remember the KJV/SICP mash-up. My favourite, both for funniness
       | and emotional truth:
       | 
       | > And Satan stood against them in the global environment.
        
       | wegfawefgawefg wrote:
       | This is like saying it's fundamentally impossible to make a fast
       | car, but your only experience with cars is on roads with speed
       | limits.
       | 
       | People keep forgetting that the "safety", rlhf, and corpo
       | political correctness post training is intentionally used to
       | remove the funny from all the large models.
       | 
       | The truth is we don't know if llms are funny or not. GPT2 was
       | funny. GPT3 was funny before it was clockwork oranged. Everything
       | after that is gimped. Even the open source models these days get
       | rlhf'd in some way.
        
         | PaulRobinson wrote:
         | That's simply not true. GPT-3 was terrible at jokes. Many LLMs
         | are because it's hard to model puns in a stochastic parrot. As
         | another thread points out here, some LLMs produce marginally
         | better jokes.
         | 
         | As to your little rant on "political correctness" - that
         | phrase just means "being polite". It does not mean "remove
         | humor". It means "remove responses offensive to marginalized
         | groups in society". Good humor "punches up", not down, so it
         | would not have any impact on good humor.
        
           | Jensson wrote:
           | Political correctness doesn't mean polite, there were/are a
           | lot of polite racists.
        
             | PaulRobinson wrote:
             | It is impossible to be polite to a group of people while
             | also insisting that they are lesser humans and deserve a
             | lesser place in society.
        
               | jrflowers wrote:
               | This is great logic. If it is impossible for a person to
               | be both racist and polite then the solution to racism is
               | everybody remembering to say please, thank you and pardon
               | me when applicable.
               | 
               | All social ills can be treated through decorum, hence why
               | you never hear about bigotry amongst those that have been
               | raised to adhere to strict social graces, such as the
               | British aristocracy for example.
        
           | wegfawefgawefg wrote:
            | I had GPT-3 running on my Discord for months and the insane
           | responses it would spew out were like the responses of a
           | deranged artistic genius. It was funny 90% of the time.
           | Operating on associative reasoning beyond our comprehension.
           | It would tell us how its day went, and about fights it had
           | with other people, give us nicknames, etc. It had a genuine
           | personality. It was more than a chatbot, it was a digital
           | brain.
           | 
           | A joke that punches down can be extremely funny. Hell, I am
           | sure historically pilferers, pirates, barbarians and
            | conquerors all had jokes, and the ability to laugh.
           | 
           | Political Correctness does not just mean polite. It is
           | probably well defined as the business casualification of all
           | things humans love and hold dear. The destruction of the
           | potential for meaning and fulfilment in exchange for
           | minification of liability.
        
           | gwern wrote:
           | > GPT-3 was terrible at jokes. Many LLMs are because it's
           | hard to model puns in a stochastic parrot
           | 
           | GPT-3 was great at jokes. The Navy Seals were hilarious
           | (https://gwern.net/gpt-3#navy-seals).
           | 
           | And the difficulty of modeling puns has nothing to do with
           | 'stochastic parrots' and has everything to do with
           | tokenization (https://gwern.net/gpt-3#bpes), in the same way
            | that all those hyperbolic takes about how image-generation
           | models were 'fundamentally flawed' because they couldn't do
           | good text in images turned out to be BS and solely a matter
           | of tokenization - drop in a character-tokenized LLM instead,
           | even an obsolete & dumb one, and it instantly works
           | (https://arxiv.org/abs/2105.13626#google).
        
       | throwthrowuknow wrote:
       | Stopped reading at the mention of ChatGPT 3.5 for two reasons
       | 
       | 1) if you're not using the better model then you either don't
       | know enough for me to care about your opinion or you're
       | deliberately deceiving your audience in which case I'm not going
       | to allow your meme pollution into my mind.
       | 
       | 2) you are using the AI equivalent of a call centre support
       | agent; they aren't allowed to say anything funny. Most of their
       | RLHF training has been specifically about NOT saying the funny
       | things that will instantly go viral and cause a lot of media
       | attention that will annoy or scare away investors.
        
         | hrgafiI wrote:
         | Bro, you have to use ChatGPT-4, or was it ChatGPT-4o, or was it
         | ChatGPT-4p? Or was it Claude-3.5.1.2.3?
         | 
         | I see a new phenomenon of AI "power users" emerging.
        
       | Alifatisk wrote:
       | Wasn't Microsoft Tay built using a Markov chain?
        
       | nwoli wrote:
       | Base model LLMs are funnier than instruction-finetuned ones.
        
       | monokai_nl wrote:
       | 23 years ago I created a web app (That Can Be My Next Tweet) that
       | could generate your future tweet by feeding all your past tweets
       | into a Markov chain. When the Twitter API access was shut down, I
       | took it offline. At the time it went super viral for a while,
       | maxing out at 2 tweets per second.
       | 
       | It got some press and just now I went back to a TED Talk by Adam
       | Ostrow (Mashable), briefly showcasing this web app. He stated:
       | you can imagine what something like this can look like 5, 10 or
       | 20 years from now, and hinted at hyper-personalized communication
       | AIs.
       | 
       | By no means was my web app any foundation of the LLMs today, but
       | it's interesting nonetheless how relatively simple techniques can
       | trigger ideas of what future scenarios could look like.
        
       | macNchz wrote:
       | Something about the not-quite-rightness of Markov output has
       | always really amused me, like I can't help but giggle at it.
       | 
       | My all-time favorite in this vein was @erowidrecruiter on
       | Twitter, which generated posts with Markov chains from a corpus
       | of tech recruiter emails and drug experience reports from
       | erowid.org. Still up but no longer posting:
       | https://x.com/erowidrecruiter?lang=en
        
       | Miraltar wrote:
       | I wouldn't say that LLMs aren't fun; it's more that jokes out of
       | context aren't fun. If you ask somebody to just tell you a joke,
       | there's a very low chance that you'll actually laugh, especially
       | if it's in text format. But I've had some serious laughs looking
       | at some of DougDoug's videos featuring LLMs.
        
       | killerstorm wrote:
       | A small language model can be quite like a Markov chain, e.g. the
       | one from Karpathy's 2015 post was pretty nonsensical:
       | https://karpathy.github.io/2015/05/21/rnn-effectiveness/
       | 
       | So you can just scale down if it still makes sense.
       | 
       | Also, you get a lot more from the base model. GPT-3 was versatile
       | as it could continue any context. Modern LLMs are try-hards. If
       | you want to generate humor with an LLM, it's really worth going
       | for a base model with multiple examples in the prompt.
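       | 
       | For instance (a made-up sketch of completion-style prompting, not
       | any particular model's recommended format), you just show the
       | base model the pattern and let it continue:
       | 
       |     Headline: Scientists confirm water is wet
       |     Joke: In related news, local man shocked to learn fire is hot.
       | 
       |     Headline: New framework promises to simplify JavaScript
       |     Joke: It ships with only 4,000 dependencies.
       | 
       |     Headline: <your headline here>
       |     Joke: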
        
       | vitus wrote:
       | Apparently the ChatGPT output is ripped off of a 20-year-old
       | Emacs joke.
       | 
       | > In the beginning was the lambda, and the lambda was with Emacs,
       | and Emacs was the lambda.
       | 
       | > - OliverScholz on news:alt.religion.emacs, 2003-03-28
       | 
       | https://www.emacswiki.org/emacs/TheBeginning (edited for brevity)
        
       | rc_mob wrote:
       | Isn't an LLM just a fancier Markov chain anyway?
        
       | jablongo wrote:
       | This person should've tried adjusting the temperature of their
       | LLM before making this pronouncement.
        
       | fidotron wrote:
       | We seem to be encountering more phenomena in tech where the
       | increasing computing demands to improve the output exceed any
       | exponential growth in computing capacity. Graphics was the poster
       | child for this, for years, and now the whole ML and AI space has
       | discovered big machines make huge differences.
       | 
       | Markov chains are almost trivial to implement and run on small
       | devices. A slightly extreme example is a rock, paper, scissors
       | game I did that worked this way: https://luduxia.com/showdown/
       | The actual browser-side Markov chain implementation of that took
       | something like 2-3 hours.
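       | 
       | For a sense of how small that kind of thing is, a rough sketch of
       | the idea in Python (not the linked game's actual code, which runs
       | in the browser): track which move tends to follow the opponent's
       | previous move, then counter the predicted one.
       | 
       |     import random
       |     from collections import Counter, defaultdict
       | 
       |     BEATS = {"rock": "paper", "paper": "scissors",
       |              "scissors": "rock"}
       |     transitions = defaultdict(Counter)  # prev -> next-move counts
       |     prev = None
       | 
       |     def respond(opponent_move):
       |         global prev
       |         if prev is not None:
       |             transitions[prev][opponent_move] += 1
       |         prev = opponent_move
       |         counts = transitions[opponent_move]
       |         if counts:
       |             predicted = counts.most_common(1)[0][0]
       |         else:
       |             predicted = random.choice(list(BEATS))
       |         return BEATS[predicted]
       | 
       |     for move in ["rock", "rock", "paper", "rock"]:
       |         print(move, "->", respond(move))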
        
       | shagie wrote:
       | I still find https://www.infiniteconversation.com to be one of
       | the odd / funny / "uh... what?!" type things. It's not intended
       | to be funny in the "ha ha" way but sometimes tickles my funny
       | bone with the absurdist philosophy - talking about everything and
       | nothing at the same time.
       | 
       | From the FAQ it is a tuned LLM.
       | 
       | > Mostly using open source tools available to anyone. The
       | generation of the script itself is done using a popular language
       | model that was fine-tuned on interviews and content authored by
       | each of the two speakers.
        
       | hellointernet wrote:
       | I used to have a Markov-chain Twitter bot that was trained on a
       | few of my friends' Twitter posts. It was funny because that
       | included hashtags, tagged users, and images. There were some funny
       | results and this was long before LLMs.
        
       | drusepth wrote:
       | I posted "How to Speed Up Your Computer Using Google Drive as
       | Extra RAM" [0] ~10 years ago and have had hundreds of people over
       | the years ask if I used Markov chains to generate the funny bits
       | (I didn't). I've had 0 people ask if I used LLMs on any of my
       | comedy pieces. Seems QED to me.
       | 
       | [0] http://drusepth.net/series/how-to-speed-up-your-computer-
       | usi...
        
       | junon wrote:
       | A discord server I'm in is basically a shitpost/absurdist humor
       | discord full of smart people, and we have at least 15 bots
       | written by members that randomly engage with Markov chains built
       | on the corpus of member chat history.
       | 
       | They spout out some of the most unhinged, hilarious stuff. Always
       | a good time. An LLM would struggle, I'd think, given that the
       | humor usually stems from disjoint phrases that somehow take on
       | new meaning. They're rarely coherent but often hilarious.
        
       | derefr wrote:
       | An LLM trained on a given dataset should -- at least in theory --
       | "contain" (in a lossless-data-compression sense) a full superset
       | of the knowledge of a Hidden Markov Model trained on the same
       | dataset. I.e. that information is _there_, in the weights, in
       | some form, and could in theory be used to reconstruct an
       | equivalent HMM from the LLM.
       | 
       | Why can't we get LLMs to do what HMMs do, then?
       | 
       | Mostly, it comes down to the structure.
       | 
       | Markov models are "funny" because they just have _one_ level of
       | abstraction: tokens. Markov "inference" is predicting the next
       | token, given the last N tokens, and a model that knows weights
       | for what tokens follow what N-tuples of previous tokens. And due
       | to that limitation, the only rules that HMMs _can_ learn are
       | low-level rules that don't require any additional abstraction:
       | they _can't_ optimize for syntactically-valid English, let alone
       | semiotically logical statements; but they _can_ make the text
       | "feel" good in your head [i.e. the visual equivalent of song
       | vocals having nice phonotactics] -- and so that's what training
       | the model leads it to learn to do. And it turns out that that
       | combination -- text that "feels" good in its phrasing, but which
       | is syntactically invalid -- happens to read as "funny"!
       | 
       | LLMs aren't under the same constraint. They can learn low-level
       | _and_ high-level rules. Which means that they usually _do_ learn
       | both low-level and high-level rules.
       | 
       | The only thing stopping LLMs from _using_ those low-level rules,
       | AFAICT, is the architectures most LLMs are built on: the (multi-
       | layer) Transformer architecture. Transformer LLMs are always a
       | single-pass straight shot ("feed forward") through a bunch of
       | discrete layers (individual neural networks), where at each step,
       | the latent space (vocabulary) of the layer's inputs is getting
       | _paraphrased_ into a different latent space/vocabulary at the
       | layer's outputs.
       | 
       | This means that, once you get into the middle of a Transformer's
       | layer sandwich, where all the rules about abstract concepts and
       | semiotics reside, all the low-level stuff has been effectively
       | _paraphrased away_. (Yes, LLMs can learn to "pass through"
       | weights from previous layers, but there's almost always a
       | training hyperparameter that punishes "wasteful" latent-space
       | size at each layer -- so models will only _usually_ learn to pass
       | through the most important things, e.g. proper names. And even
       | then, quality on these "low-level" inferences is also the sort
       | of thing that current LLM test datasets ignore, leading to
       | training frameworks feeling free to prune away these passthrough
       | nodes as "useless.")
       | 
       | This problem with LLMs could be fixed in one of two ways:
       | 
       | 1. the "now it's stupid but at least it rhymes" approach
       | 
       | Allow inference frameworks to simply _bypass_ a configurable-per-
       | inference-call number of "middle layers" of a feed-forward
       | multi-layer network. I.e., if there are layers 1..N, then taking
       | out layers K..(N-K) and then directly connecting layer K-1 to
       | layer N-K+1.
       | 
       | At its most extreme, with layer 1 connected to layer N, this
       | could very well approximate the behavior of an HMM. Though not
       | very well, as -- given the relatively-meaningless tokenization
       | approach most LLMs use (Byte Pair Encoding) -- LLMs need at least
       | a few transforms to get even to the point of having those tokens
       | paraphrased into "words" to start to learn "interesting" rules.
       | (AFAIK in most Transformer models layers 1 and N just contain
       | rules for mapping between tokens and words.)
       | 
       | Meanwhile, this would likely work a lot _better_ with the "cut
       | and graft" happening at a higher layer, but getting the "graft"
       | to work would likely require re-training (since layers K-1 and
       | N-K+1 don't share a vocabulary.)
       | 
       | ... _except_ if the LLM is an auto-encoder. Auto-encoder LLMs
       | could just run an inference up their layerwise "abstraction
       | hierarchy" to any arbitrary point, and then back down, without a
       | problem!
       | 
       | (I'd really love to see someone try this. It's an easy hack!)
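       | 
       | A toy sketch of the shape of that hack, in PyTorch. It uses a
       | plain uniform-width feed-forward stack rather than a real
       | Transformer, so every layer shares a width and the "graft" needs
       | no retraining (purely illustrative, nobody's production model):
       | 
       |     import torch
       |     import torch.nn as nn
       | 
       |     class SkippableStack(nn.Module):
       |         def __init__(self, vocab=256, hidden=64, n_layers=8):
       |             super().__init__()
       |             self.embed = nn.Embedding(vocab, hidden)
       |             self.layers = nn.ModuleList(
       |                 nn.Sequential(nn.Linear(hidden, hidden),
       |                               nn.ReLU())
       |                 for _ in range(n_layers))
       |             self.head = nn.Linear(hidden, vocab)
       | 
       |         def forward(self, tokens, skip=()):
       |             # `skip` holds indices of "middle layers" to bypass
       |             # at inference time.
       |             h = self.embed(tokens)
       |             for i, layer in enumerate(self.layers):
       |                 if i not in skip:
       |                     h = layer(h)
       |             return self.head(h)
       | 
       |     model = SkippableStack()
       |     toks = torch.randint(0, 256, (1, 16))
       |     deep = model(toks)                       # all 8 layers
       |     shallow = model(toks, skip=range(2, 6))  # layers 2..5 bypassed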
       | 
       | 2. the "it can write poetry _while_ being smart" approach
       | 
       | Figure out a way, _architecturally_, to force more lower-layer
       | information from the _early_ low-level to be passed through to
       | the _late_ low-level, despite the middle layers not having any
       | reason to care about it. (I.e. do _something_ to allow the LLM to
       | predict a word Y at layer N-3 such that it rhymes with word X
       | known at layer 3, while not otherwise degrading its
       | capabilities.)
       | 
       | Most simply, I think you could just wire up the model with a kind
       | of LIFO-bridged layer chain -- where every layer K is passing its
       | output to the input of layer K+1; but, for any given layer K in
       | the first half of the layers, it's _also_ buffering its output so
       | that it can become an additional input for its "matching" layer
       | N-K.
       | 
       | This means that all the layers in the "second half" of the model
       | would receive longer inputs, these being the concatenation of the
       | output of the previous layer, with the output of the matching
       | "equal in abstraction depth" input layer. (Where this equal-in-
       | abstraction-depth association between layers isn't _inherently_
       | true [except in auto-encoder models], but could be _made_ true in
       | an arbitrary model by training said model with this architecture
       | in place.)
       | 
       | (Again, I'd really love to see someone try this... but it'd have
       | to be done while training a ground-up base model, so you'd need
       | to be Google or Facebook to test this.)
        
       | ineedaj0b wrote:
       | >The best jokes involve a pleasant and significant "snap". I use
       | "snap" instead of "punchline" to avoid the semantic baggage.
       | 
       | i like snap better too. it's closer to 'snapping the neck of
       | the weak and feeble' which i think really embodies the spirit of
       | joke tellers.
        
       | hermitcrab wrote:
       | I wrote a little program to generate text from different text
       | sources using Markov chains:
       | 
       | https://successfulsoftware.net/2019/04/02/bloviate/
       | 
       | Some of the output was moderately amusing. And text generated
       | from Trump speeches by a Markov chain sounded very similar to a
       | genuine Trump speech.
        
       | anthk wrote:
       | No references to Hailo/Megahal?
       | 
       | https://homepage.kranzky.com/megahal/Classic.html
        
       ___________________________________________________________________
       (page generated 2024-08-19 23:01 UTC)