[HN Gopher] Show HN: Visual intuitive explanations of LLM concep...
       ___________________________________________________________________
        
       Show HN: Visual intuitive explanations of LLM concepts (LLM
       University)
        
       Hi HN,  We've just published a lot of original, visual, and
       intuitive explanations of concepts to introduce people to large
       language models.  It's available for free with no sign-up needed
       and it includes text articles, some video explanations, and code
       examples/notebooks as well. And we're available to answer your
       questions in a dedicated Discord channel.  You can find it here:
       https://llm.university/  Having written
       https://jalammar.github.io/illustrated-transformer/, I've been
       thinking about these topics and how best to communicate them for
       half a decade. But this project is extra special to me because I
       got to collaborate on it with two people I consider among the
       best ML educators out there: Luis Serrano of
       https://www.youtube.com/@SerranoAcademy and Meor Amer, author of "A
       Visual Introduction to Deep Learning"
       https://kdimensions.gumroad.com/l/visualdl  We're planning to roll
       out more content to it (let us know what concepts interest you).
       But as of now, it has the following structure (with some links to
       highlighted articles for you to audit):  ---  Module 1: What are
       Large Language Models  - Text Embeddings
       (https://docs.cohere.com/docs/text-embeddings)  - Similarity
       between words and sentences
       (https://docs.cohere.com/docs/similarity-between-words-and-
       sentences)  - The attention mechanism  - Transformer models
       (https://docs.cohere.com/docs/transformer-models HN Discussion:
       https://news.ycombinator.com/item?id=35576918)  - Semantic search
       ---  Module 2: Text representation  - Classification models
       (https://docs.cohere.com/docs/classification-models)  -
       Classification Evaluation metrics
       (https://docs.cohere.com/docs/evaluation-metrics)  - Classification
       / Embedding API endpoints  - Semantic search  - Text clustering  -
       Topic modeling (goes over clustering Ask HN posts
       https://docs.cohere.com/docs/clustering-hacker-news-posts)  -
       Multilingual semantic search  - Multilingual sentiment analysis
       ---  Module 3: Text generation  - Prompt engineering
       (https://docs.cohere.com/docs/model-prompting)  - Use case ideation
       - Chaining prompts  ---  A lot of the content originates from
       common questions we get from users of the LLMs we serve at Cohere.
       So the focus is more on applying LLMs than on theory or training
       them.  Hope you enjoy it; open to all feedback and suggestions!
        
       Author : jayalammar
       Score  : 207 points
       Date   : 2023-05-25 12:50 UTC (10 hours ago)
        
       | jfarmer wrote:
       | > We've just published a lot of original, visual, and intuitive
       | explanations of concepts to introduce people to large language
       | models.
       | 
       | Kinda frustrating that the main link dumps me onto what reads
       | like a university syllabus, and nothing original, visual, or
       | intuitive.
       | 
       | If I click through the sections in order, there are 5 "preamble"
       | sections describing logistical and other meta-information about
       | the course. All text.
       | 
       | The first pedagogical image I see is this, which tbh doesn't
       | make any sense to me: https://files.readme.io/329efd5-image.png
       | 
       | "Where would you put the word apple?"
       | 
       | The image alone doesn't work without reading the supporting text
       | very closely. I also have to have a pretty sophisticated
       | understanding to get the idea that I can represent words as
       | points in a plane.
       | 
       | Representing the words as icons is fundamentally confusing, too,
       | I think. After all, maybe I say the word "apple" should go in "d"
       | because it has at least two senses: a fruit and a machine.
       | 
       | Oh, sorry, you failed your first quiz!
       | 
       | "You can't fail the quiz, you're not being graded." Then why call
       | it a quiz? Why use classroom metaphors unless you want students
       | to fall back on classroom behaviors?
       | 
       | Of course, you know the #1 student classroom behavior: not
       | reading the syllabus.
       | 
       | But if I have no trouble with that level of abstraction, what's
       | with the cutesy way of describing the problem?
       | 
       | Get rid of all this chocolate-covered broccoli. Just say and show
       | what you mean.
       | 
       | Computers like numbers. Vectors are lists of numbers. Vectors
       | come with concepts like length and distance. We want to transform
       | words into vectors so that words we think of as similar are close
       | together as vectors.
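       | 
       | For instance, a minimal sketch with toy, hand-picked 3-D
       | vectors (not output from any real embedding model), just to
       | show "similar words end up close together":
       | 
       |     import numpy as np
       | 
       |     # made-up vectors; a real model would learn these
       |     words = {
       |         "apple":  np.array([0.9, 0.8, 0.1]),
       |         "banana": np.array([0.8, 0.9, 0.0]),
       |         "laptop": np.array([0.1, 0.0, 0.9]),
       |     }
       | 
       |     def cosine(a, b):
       |         return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
       | 
       |     print(cosine(words["apple"], words["banana"]))  # high
       |     print(cosine(words["apple"], words["laptop"]))  # low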
       | 
       | There are many ways to translate words into vectors. Here are
       | 5-10 examples of how we might do that. What are some pros/cons?
       | What relationship(s) do they make clear or obscure?
       | 
       | Get them thinking about what it means to embed things and why
       | we'd want to embed words one way vs. another. That'll pay
       | dividends. Having them remember "where the apple icon goes" isn't
       | going to be something they'll benefit from reflecting on in any
       | future experience.
        
         | SanderNL wrote:
         | That's just you. I find the apple thing obvious at first
         | sight, and it cements an intuition in a way that talk of
         | vectors does not, or does differently. Why choose?
        
           | TeMPOraL wrote:
           | IDK, with only GP's comment and the image for context, and
           | some earlier familiarity with the concept of embeddings, I'd
           | put the apple... somewhere near the middle of the picture,
           | nowhere near any of the lettered placement points. To me, the
           | apple fits about equally well in all the clusters, except
           | the bottom-left one, where I couldn't figure out the
           | relationship in a few seconds.
           | 
           | EDIT: in some sense, the whole idea and usefulness of
           | embedding comes from it working like the _inverse_ of this
           | kind of "intelligence"/"logic" test - the kind that asks
           | you which of several groups a new symbol belongs to.
           | Usually there are a couple of competing answers, but the
           | test has you guess the one that's the Right One. Embedding
           | is about subverting this - it's about telling the test
           | giver, "you know what, it actually belongs to _all_ of
           | them", and adding enough dimensions to the problem space
           | that you can have all the groups be far away from each
           | other, and the new thing close to all of them - to each
           | along a different dimension.
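           | 
           | A toy numeric version of that, with made-up coordinates:
           | give each group its own axis, and the new point can sit
           | closer to every group than the groups are to each other:
           | 
           |     import numpy as np
           | 
           |     # three groups, each far from the others on its own axis
           |     groups = {
           |         "fruit":   np.array([1.0, 0.0, 0.0]),
           |         "company": np.array([0.0, 1.0, 0.0]),
           |         "colour":  np.array([0.0, 0.0, 1.0]),
           |     }
           |     apple = np.array([0.6, 0.6, 0.6])  # near all of them
           | 
           |     # groups are ~1.41 apart; apple is ~0.94 from each
           |     for name, g in groups.items():
           |         print(name, round(np.linalg.norm(apple - g), 2))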
        
         | jayalammar wrote:
         | The landing page is technically the course overview. I'd love
         | to hear what you think would've made it more engaging for you.
         | We can probably pull some of the visuals up into it as a preview.
         | Let me see what we can do on that front.
        
           | digging wrote:
           | The landing page should probably be more of a marketing page
           | explaining why the course is worth checking out, with low
           | information density and large visuals
        
           | samstave wrote:
           | _start_ with a visual hors d'oeuvre
        
           | jfarmer wrote:
           | I understand that, of course.
           | 
           | Is your goal to make it feel like a typical university course
           | or like something else?
           | 
           | If like a typical university course, start with a syllabus
           | and a course description and all the logistics.
           | 
           | If like something else then the first 10 seconds of the
           | experience should make people go "Oh, this is different."
           | 
           | What's happening in the first second, 30 seconds, 1 minute,
           | 10 minutes, etc. that is reflective of the rest of my
           | experience and that will serve as an advance organizer for
           | what's to follow?
           | 
           | The very first graphic I see is labeled as a "quiz" and
           | requires me to read a bunch of surrounding text to make sense
           | of it.
           | 
           | That's the vibe: a promise of something visual and intuitive,
           | first consummated by a long syllabus and a quiz.
        
             | jayalammar wrote:
             | The goal is to make the materials as accessible as
             | possible. So we're definitely not limited to the structure
             | of a typical university course and are happy to iterate on
             | it.
             | 
             | I appreciate you elaborating on your feedback. Thank you.
        
           | enumjorge wrote:
           | I know this course is meant to be content marketing for your
           | company, and I don't mean this in a derogatory sense--you're
           | doing content marketing right by providing high quality
           | information to an audience who could be interested in your
           | services--but it's a bit odd that going to
           | https://llm.university unceremoniously drops me in the middle
           | of what looks to be your product's documentation. That's not
           | wrong necessarily, but it all adds up to arriving at a site
           | and having a feeling of "what am I looking at?".
           | 
           | Like the grandparent comment mentioned, the pitch is "visual,
           | intuitive explanations", but I don't see that on the landing
           | page. I'm looking for a way to get to the start of your
           | content, but the top and left hand menus don't help and are,
           | if anything, confusing until I realize that I'm now inside of
           | a larger set of documentation unrelated to the course.
           | 
           | Below the fold we see a "Let's get started!", but the link I
           | see, "Structure of the Course", doesn't sound like getting
           | started. It sounds like more front matter. From the nav menu
           | I see that after that I still won't get to the content, but
           | instead a page about the instructors. Do I really need to
           | read blurbs of the instructors before I get to the meat of
           | the course?
           | 
           | It just feels like too much wrapping paper and packaging to
           | get to the good stuff--and it really does seem like good
           | stuff! And I think the way that you've embedded this course
           | into the rest of your documentation prevents you from
           | presenting it in a structure that is more familiar and easy
           | to navigate (e.g. an 'About' link at the top that talks about
           | the instructors and Cohere).
           | 
           | It might be frustrating to put a lot of time and effort into
           | high quality materials, only for people to not want to spend
           | a few minutes looking around, but from the audience
           | perspective, there's a sea of LLM-related content out there.
           | I want to quickly determine if this is worth adding to my
           | already-too-long list of LLM-related bookmarks of things I
           | want to read.
        
         | pumanoir wrote:
         | I read and watched almost all the modules, and for me it,
         | as it is, perfectly accomplishes the intention of the course
         | as stated by the OP.
         | 
         | Your suggestion may work for other intents (like having a
         | Schaum's Outline of LLMs), and I would also love to have
         | that additional material (maybe you could provide it
         | yourself, as it seems you have a clear idea).
        
       | ZeroCool2u wrote:
       | This looks like a pretty great resource and I'm looking forward
       | to checking it out. My only ask: since it's the type of site
       | I'd probably be looking at for quite a while, it'd be nice if
       | it had a dark mode.
        
       | kfarr wrote:
       | This is pretty excellent material; even just spending 10
       | minutes, I have learned more than from most random blog posts
       | in the past few months.
        
       | jwilber wrote:
       | Love these.
       | 
       | I've also made some visual explanations of ML for Amazon,
       | available at https://mlu-explain.github.io/
       | 
       | Big fan of your early work, Jay, a big inspiration for me!
        
       | axpy906 wrote:
       | You sir get an up vote for simply being Jay on HN. Thank you for
       | all you do.
        
       | sva_ wrote:
       | Interesting, just yesterday I was googling something about
       | transformers and had arrived on your page.
        
       | beeburrt wrote:
       | You know what would be helpful? A little tag or something at
       | the beginning of each section that says roughly how long it's
       | going to take.
       | 
       | From what I've seen so far, it looks awesome. I'm excited to dive
       | in. Thanks!
        
       | toppy wrote:
       | Jay, I liked your tutorial on Transformer models. Helped me a lot
       | when I read it in 2020. One of the best resources on a topic
       | then. Thanks for your work! Fingers crossed for your new
       | endeavour.
        
         | jayalammar wrote:
         | Thank you so much (and others for your kind messages). Glad you
         | found them useful! Writing is the best way for me to learn, I
         | find.
        
       | HarHarVeryFunny wrote:
       | I'm not sure how much is actually known to write about, but what
       | I'd like to see explained is how transformer-based LLMs/AI
       | _really_ work - not at the mechanistic level of the architecture,
       | but in terms of what they learn (some type of world model?
       | details, not hand waving!) and how they utilize this when
       | processing various types of input?
       | 
       | What type of representations are being used internally in these
       | models? We've got token embeddings going in, and it seems like
       | some type of semantic embeddings internally perhaps, but exactly
       | what? OTOH it's outputting words (tokens) with only a linear
       | layer between the last transformer block and the softmax, so what
       | does that say about the representations at that last transformer
       | block?
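       | 
       | To make that last point concrete, the output side of a
       | GPT-style decoder is roughly just this (a sketch with random
       | tensors standing in for the real, learned weights):
       | 
       |     import torch
       | 
       |     d_model, vocab = 768, 50257    # GPT-2-ish sizes
       |     hidden = torch.randn(d_model)  # output of the last block
       | 
       |     # the single linear ("unembedding") layer, then softmax
       |     W_out = torch.randn(vocab, d_model)
       |     logits = W_out @ hidden                # shape (vocab,)
       |     probs = torch.softmax(logits, dim=-1)  # next-token dist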
        
         | uoaei wrote:
         | > not at the mechanistic level of the architecture, but in
         | terms of what they learn (some type of world model ? details,
         | not hand waving!)
         | 
         | https://imgs.xkcd.com/comics/tasks.png
        
           | HarHarVeryFunny wrote:
           | Sure - but it's still the interesting part!
           | 
           | I'm sure some of the key players know at least a little,
           | but they don't seem inclined to share. In his Lex Fridman
           | interview Sam Altman said something along the lines of "a
           | LOT of knowledge went into designing GPT-4", and there's a
           | time gap between GPT-3 (2020) and GPT-4 (2023) where it
           | seems they spent a lot of time, probably trying to
           | understand it, among other things.
           | 
           | It seems the way values are looked up via query/key and added
           | must constrain representations quite a bit, and comparing
           | internal activations for closely related types of input might
           | be one way to start to understand what's going on.
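           | 
           | Something like this, say? A rough sketch using the
           | HuggingFace transformers library, with GPT-2 as a small
           | stand-in, comparing per-layer activations for two related
           | inputs:
           | 
           |     import torch
           |     from transformers import AutoModel, AutoTokenizer
           | 
           |     tok = AutoTokenizer.from_pretrained("gpt2")
           |     model = AutoModel.from_pretrained("gpt2")
           | 
           |     def layer_states(text):
           |         out = model(**tok(text, return_tensors="pt"),
           |                     output_hidden_states=True)
           |         # one (1, seq, 768) tensor per layer; keep last token
           |         return [h[0, -1] for h in out.hidden_states]
           | 
           |     a = layer_states("The cat chased the")
           |     b = layer_states("The dog chased the")
           |     for i, (x, y) in enumerate(zip(a, b)):
           |         s = torch.cosine_similarity(x, y, dim=0)
           |         print(f"layer {i}: {s.item():.3f}")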
           | 
           | A high level understanding of what the model has learnt may
           | be the last thing to fall, but understanding the internal
           | representations would go a long way towards that.
        
           | quickthrower2 wrote:
           | Are you saying no one really knows how these things work? I
           | am very curious about whether you can "peer into the
           | weights". I have seen simple examples of that with image
           | recognition but only for early layers.
        
         | jayalammar wrote:
         | This is a field I find fascinating. It's generally the research
         | field of Machine Learning Interpretability. The BlackboxNLP
         | workshop is one of the main places for investigating this and
         | is a very popular academic workshop
         | https://blackboxnlp.github.io/
         | 
         | One of the most interesting presentations from the most
         | recent session of the workshop is this talk by David Bau,
         | titled "Direct Model Editing and Mechanistic
         | Interpretability". David and his team locate exact
         | information in the model and edit it. For example, they
         | edit the location of the Eiffel Tower to be in Rome, so
         | whenever the model generates anything involving location
         | (e.g., the view from the top of the tower), it actually
         | describes Rome.
         | 
         | Talk: https://www.youtube.com/watch?v=I1ELSZNFeHc
         | 
         | Paper: https://rome.baulab.info/
         | 
         | Follow-up work: https://memit.baulab.info/
         | 
         | There is also work on "Probing" the representation vectors
         | inside the model and investigating what information is encoded
         | at the various layers. One early Transformer Explainability
         | paper (BERT Rediscovers the Classical NLP Pipeline
         | https://arxiv.org/abs/1905.05950) found that "the model
         | represents the steps of the traditional NLP pipeline in an
         | interpretable and localizable way: POS tagging, parsing, NER,
         | semantic roles, then coreference". Meaning that the
         | representations in the earlier layers encode things like
         | whether a token is a verb or noun, and later layers encode
         | other, higher-level information. I've made an intro to these
         | probing methods here: https://www.youtube.com/watch?v=HJn-
         | OTNLnoE
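         | 
         | The core recipe in that probing line of work is simple:
         | freeze the model, take one layer's hidden states, and fit
         | a small classifier on top of them. A rough sketch of the
         | idea (GPT-2 as a stand-in model, a tiny made-up noun/verb
         | list, and scikit-learn for the probe):
         | 
         |     from sklearn.linear_model import LogisticRegression
         |     from transformers import AutoModel, AutoTokenizer
         | 
         |     tok = AutoTokenizer.from_pretrained("gpt2")
         |     model = AutoModel.from_pretrained("gpt2")
         |     LAYER = 4  # which layer's representations to probe
         | 
         |     def rep(word):
         |         out = model(**tok(word, return_tensors="pt"),
         |                     output_hidden_states=True)
         |         # last-token representation at the chosen layer
         |         return out.hidden_states[LAYER][0, -1].detach().numpy()
         | 
         |     # tiny made-up dataset: 1 = noun, 0 = verb
         |     words  = ["table", "river", "run", "jump"]
         |     labels = [1, 1, 0, 0]
         | 
         |     probe = LogisticRegression(max_iter=1000)
         |     probe.fit([rep(w) for w in words], labels)
         |     print(probe.predict([rep("house"), rep("dance")]))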
         | 
         | A lot of applied work doesn't require interpretability and
         | explainability at the moment, but I suspect the interest will
         | continue to increase.
        
           | HarHarVeryFunny wrote:
           | Thanks, Jay!
           | 
           | I wasn't aware of that BERT explainability paper - will be
           | reading it, and watching your video.
           | 
           | Are there any more recent Transformer Explainability papers
           | that you would recommend - maybe ones that build on this and
           | look at what's going on in later layers?
        
             | jayalammar wrote:
             | Additional ones that come to mind now are:
             | 
             | Transformer Feed-Forward Layers Are Key-Value Memories
             | https://arxiv.org/abs/2012.14913
             | 
             | The Dual Form of Neural Networks Revisited: Connecting Test
             | Time Predictions to Training Patterns via Spotlights of
             | Attention https://arxiv.org/abs/2202.05798
             | 
             | https://github.com/neelnanda-io/TransformerLens
        
               | HarHarVeryFunny wrote:
               | That's great - thank you!
        
       | senttoschool wrote:
       | Looks great. Thank you.
        
       | abrinz wrote:
       | Nice work!
       | 
       | Minor nitpick: The Intercom button obscures the topic expansion
       | button for the final appendix in the nav menu. Maybe move
       | Intercom to the bottom right instead?
        
       ___________________________________________________________________
       (page generated 2023-05-25 23:00 UTC)