[HN Gopher] Computing Inside an AI
___________________________________________________________________
Computing Inside an AI
Author : pongogogo
Score : 99 points
Date : 2024-12-14 09:52 UTC (1 day ago)
(HTM) web link (willwhitney.com)
(TXT) w3m dump (willwhitney.com)
| t0lo wrote:
| I've gotten to a point where I have a visceral reaction to any
| intersection of AI and psychological thought. As a human, it
| dependably makes me feel sick. We're going to see a lot of
| changes that are good, and not so good.
| FezzikTheGiant wrote:
| I think lots of apps are going to go in the adaptive/generative
| UI direction - even if it starts a lot simpler than generating
| the code.
| Towaway69 wrote:
| Perhaps a UI based on a Salvador Dali painting - perhaps we
| should also be questioning our UI concepts of slider, button,
| window and co.
| ilaksh wrote:
| I think that Cerebras and Groq would be fun to experiment with
| for generating interfaces on the fly using normal LLMs, since
| they are so fast.
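| Something like this would be enough to start playing - just a
| sketch, and the base URL and model id below are assumptions
| (Groq exposes an OpenAI-compatible API), so check both:
|
|   # Ask a fast hosted model for a JSON UI spec, then render
|   # it however you like.
|   import json
|   from openai import OpenAI
|
|   client = OpenAI(
|       base_url="https://api.groq.com/openai/v1",  # assumed
|       api_key="YOUR_GROQ_KEY",
|   )
|
|   def generate_ui(task: str) -> dict:
|       resp = client.chat.completions.create(
|           model="llama-3.3-70b-versatile",  # assumed model id
|           messages=[
|               {"role": "system",
|                "content": "Reply with only a JSON object: "
|                           '{"widgets": [{"type": "slider", '
|                           '"label": "..."}, ...]}'},
|               {"role": "user", "content": task},
|           ],
|       )
|       return json.loads(resp.choices[0].message.content)
|
|   print(generate_ui("controls for cropping a photo"))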
| FezzikTheGiant wrote:
| What's the cost difference between Groq/Cerebras and using
| something else for inference on open-source models? I'm
| guessing the speed comes at a cost?
| ilaksh wrote:
| I don't know off the top of my head; I've only played with it a
| little, not seriously.
| FezzikTheGiant wrote:
| fair enough
| el_isma wrote:
| $0.60/$1 per M tokens on Groq/Cerebras vs $0.30 per M tokens on
| DeepInfra (for Llama 3.3 70B).
|
| But note the free tiers for Groq and Cerebras are _very_
| generous.
| deadbabe wrote:
| "Wherever they use AI as a tool they will, in the end, do the
| same with human beings."
| falcor84 wrote:
| Why is that in quote marks? I couldn't find any matches in TFA
| or elsewhere.
|
| And as to the sentence itself, I'm unclear on what exactly it's
| saying; people have been using other people as tools since
| before recorded history. Leaving aside slavery, what is it that
| you would say that HR departments and capitalism in general do?
| llm_trw wrote:
| >Acting like a computer means producing a graphical interface. In
| place of the charmingly teletype linear stream of text provided
| by ChatGPT, a model-as-computer system will generate something
| which resembles the interface of a modern application: buttons,
| sliders, tabs, images, plots, and all the rest. This addresses
| key limitations of the standard model-as-person chat interface:
|
| Oh boy I can't wait for GPT Electron, so I can wait 60 seconds
| for the reply to come back and then another 60 seconds for it to
| render a sad face because I hit some guard rail.
| Towaway69 wrote:
| Not forgetting the computing power required to generate that
| single sad face.
| doug_durham wrote:
| I appreciated the thought given in this piece. However, in the
| age of LLMs these kinds of "what if we looked at problems this
| way..." pieces seem obsolete. Instead of asking the question,
| just use an LLM to help you build the proof of concept and see
| if it works.
|
| Back in the pre-LLM days these kinds of thought pieces made
| sense as a call to action, because the economics of creating
| sophisticated proofs of concept put them beyond the means of
| any one person. Now you can create implementations and iterate
| at nearly the speed of thought. Instead of telling people about
| your idea, show people your idea.
| ilaksh wrote:
| I'm kind of with you in that you could build something kind of
| like it based on a fast LLM. But what they are actually talking
| about is a new cutting edge ML model that takes a huge amount
| of data and compute to train.
| doug_durham wrote:
| I see your point, but that's not what I took away from the
| article. To me it seems like an alternate way to use existing
| models. In any case I think you could make a PoC that touched
| on the main idea using an existing model.
| ilaksh wrote:
| Yes, you can, and there is at least one example: a web
| application where you enter a URL and the LLM automatically
| generates the page, including links; when you click a link, the
| LLM fills that page in on the fly (rough sketch at the end of
| this comment). I can't remember the name of it.
|
| But they mention things like Oasis in the article that use
| a specialized model to generate games frame-by-frame.
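| That URL idea is roughly this shape, if anyone wants to try it
| - every route, model id and prompt below is an assumption of
| mine, not taken from the app I can't remember:
|
|   # Every request path is handed to an LLM, which invents the
|   # HTML on the fly, links included.
|   from flask import Flask
|   from openai import OpenAI
|
|   app = Flask(__name__)
|   client = OpenAI()  # assumes OPENAI_API_KEY is set
|
|   @app.route("/", defaults={"path": ""})
|   @app.route("/<path:path>")
|   def imagined_page(path):
|       resp = client.chat.completions.create(
|           model="gpt-4o-mini",  # any capable model works
|           messages=[
|               {"role": "system",
|                "content": "You are a web server. Return only "
|                           "HTML. Every <a href> must be a "
|                           "relative path so that clicks come "
|                           "back to this server."},
|               {"role": "user", "content": f"GET /{path}"},
|           ],
|       )
|       return resp.choices[0].message.content
|
|   if __name__ == "__main__":
|       app.run(port=8000)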
| achierius wrote:
| But LLMs are nowhere near being able to do what you suggest,
| for anything that one person wouldn't have been able to do
| beforehand.
| llm_trw wrote:
| If I cared enough about GUIs I could implement what the OP
| described in two months by myself, with unlimited access to a
| good coding model, something like QwQ.
|
| The issue is training a multimodal model that can make use of
| said GUI.
|
| I don't believe there is a better general interface than text,
| however, so I won't bother.
| throw646577 wrote:
| No amount of repeating is ever, unfortunately, going to get
| this across; LLMs are founding a new kind of alchemy.
| bobxmax wrote:
| They absolutely are. I'm somewhat non-technical but I've been
| using Claude to hack MVPs together for months now.
| doug_durham wrote:
| Not in my experience. Used properly, an LLM is an immense
| accelerator. Every time this comes up on HN we get the same
| debate: one side says LLMs are time-wasting toys, the other
| says they are transformative. You need to know how to ask
| critical questions and critique the answers to use a search
| engine effectively. The same is true for an LLM. Once you learn
| how to pose your questions, and at what level to ask them, it
| is a massive accelerator.
| beepbooptheory wrote:
| If we are never going to take the time to write, articulate, or
| even think about things anymore, how can we still feel like we
| have the authority or skills or even context to evaluate what
| we generate?
| dartos wrote:
| > because the economics of creating sophisticated proofs of
| concept put them beyond the means of any one person
|
| What?
|
| Are you trying to say it's too expensive for a single worker to
| make a POC, or that one person can't make a POC?
|
| Either way that's not true at all...
|
| There have been one person software shops for a long long time.
| mirekrusin wrote:
| Why stop there? Let it figure out how to please us without the
| need for sliders, etc. We'll just relax. Now that's a paradigm
| shift.
| Towaway69 wrote:
| That was my thought: is model-as-computer the best we can do?
|
| Isn't that limiting our perspective of AI models to being
| computers, so that whatever computers can't do, the model can't
| do either?
| uxhacker wrote:
| So what are the other models we could use?
| Towaway69 wrote:
| Perhaps "metaphor" would be better terminology than "model".
|
| AI as an animal we are trying to tame - why does it have to be
| a machine metaphor?
|
| Perhaps AI is an ecosystem with which we all interact at the
| same time. The author pointed out that one-on-one interaction
| is too slow for the AI - perhaps a many-to-one metaphor would
| be more appropriate.
|
| I agree with the author that we are using the wrong metaphors
| when interacting with AI, but personally I think we should go
| beyond repeating the mistakes of the past by just extending our
| current state, i.e. going from a physical desktop to a virtual
| "desktop".
| uxhacker wrote:
| How about PowerPoint as a metaphor? The challenge we face is
| how to explain something complex. But don't we also run into
| the issue that the medium is the message - that just by using
| voice rather than an image we change the meaning? And is that
| necessarily bad?
| Towaway69 wrote:
| > And is that necessarily bad?
|
| Selecting a metaphor implies that one's imagination is - at
| least partially - constrained by the metaphor. AI as PowerPoint
| would make using AI for anything other than presentations seem
| unusual, since that's what PowerPoint is used for.
|
| Also, when the original author says "models as computers", what
| does "computer" represent? A mainframe the size of a small
| apartment, a smartphone, a laptop, a Turing machine, or some
| collection of server racks? Even the term "computer" is broad
| enough to include many forms of interaction: I interact with my
| smartphone visually and with my server rack textually, yet both
| are computers.
|
| At least initially, AI seems to be something completely
| different, almost god-like in its ability to provide us with
| insightful answers and creative suggestions. God-like meaning
| that, judged from the outside, AI has the ability to provide
| comforting support in times of need, which is one
| characteristic of a god-like entity.
|
| PowerPoint wasn't built to be a god-like provider of answers to
| the most important questions. It would indeed be surprising if
| a PowerPoint presentation made the same impact as religious
| scripture - on thousands or millions of people, not referring
| to individual experiences.
| TeMPOraL wrote:
| > _That was my thought, is model as a computer the best we
| can do?_
|
| Nah, there's a better option. Instead of a computer, we
| could... go for treating it as a _person_.
|
| Yes, that's inverting the whole point of the
| article/discussion here, but think about it: the main
| limitation of a computer is that we have to tell it step-by-
| step what to do, because it can't figure out what we _mean_.
| Well, LLMs can.
|
| Textual chat interface is annoying, particularly the way it
| works now, but I'd say the models are fundamentally right
| where they need to be - it's just that a human person doesn't
| use a single thin pipe of a text chat to communicate with the
| world; they may converse with others explicitly, but that's
| augmented by orders of magnitude more of contextual inputs -
| sights, sounds, smells, feelings, memory, all combining into
| higher-level memories and observations.
|
| This is what could be the better alternative to "LLM as
| computer": double down on tools and automatic context
| management, so the user inputs are merely the small fraction
| of data that's provided explicitly; everything else, the
| model should watch on its own. Then it might just be able to
| reliably Do What I Mean.
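| A toy sketch of what I mean - the context sources below are
| placeholders I made up, standing in for whatever the model
| would be allowed to watch on its own:
|
|   from openai import OpenAI
|
|   client = OpenAI()
|
|   def gather_ambient_context() -> str:
|       # Placeholders; real versions would hook into the OS.
|       sources = {
|           "clipboard": "(clipboard text)",
|           "open_documents": "(titles of open documents)",
|           "recent_events": "(notifications, calendar, etc.)",
|       }
|       return "\n".join(f"[{k}]\n{v}"
|                        for k, v in sources.items())
|
|   def do_what_i_mean(user_message: str) -> str:
|       resp = client.chat.completions.create(
|           model="gpt-4o",
|           messages=[
|               {"role": "system",
|                "content": "Use the ambient context to infer "
|                           "what the user actually means, then "
|                           "act on that."},
|               {"role": "user",
|                "content": gather_ambient_context()
|                           + "\n\n[user]\n" + user_message},
|           ],
|       )
|       return resp.choices[0].message.content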
| holoduke wrote:
| Amateur question: is there a possible point where an LLM uses
| less compute power to calculate a certain formula than regular
| computation does?
| logicchains wrote:
| When the LLM knows how to simplify/solve the formula and the
| person using it doesn't, it could be much more efficient than
| directly running the brute-force/inefficient version provided
| by the user. A simple example would be summing all numbers from
| 0 to a billion; if you ask o1 to do this, it uses the O(1)
| analytical solution, rather than the naive brute-force O(n)
| approach.
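| Concretely, the two routes (just the toy comparison in Python,
| not a claim about how o1 does it internally):
|
|   n = 1_000_000_000
|
|   brute = sum(range(n + 1))   # O(n): a billion additions
|   closed = n * (n + 1) // 2   # O(1): the closed form
|
|   assert brute == closed      # both give 500000000500000000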
| stevesimmons wrote:
| Though even in this case, it is enormously more efficient to
| simply sum the first billion integers than to find the analytic
| solution via a 405B-parameter LLM...
| piotr93 wrote:
| Yes, an LLM could do it, since it can predict the next token
| for pretty much anything. But what error margin are you ready
| to tolerate?
| JTyQZSnP3cQGa8B wrote:
| I wish we had some kind of _Central Processing Unit_ to do this
| instead of relying on hallucinating remote servers that need a
| subscription.
| piotr93 wrote:
| The only computations an LLM does are backprop and forward
| passes. It cannot run an arbitrary program description. Yes, it
| will hallucinate your program's output if you feed it a good
| enough starting prompt. But that's it.
| FezzikTheGiant wrote:
| Genuinely, what's the point of this comment? Are you allergic
| to cool stuff? Honestly curious as to what you were trying to
| achieve with it.
|
| Nowhere in this post does the author say that it's ready with
| the current state of models, or that he'd use a foundation
| model for this. Why the hate?
| logicchains wrote:
| An LLM with chain of thought and unbounded compute/context can
| run any program in PTIME: https://arxiv.org/abs/2310.07923 ,
| which is a huge class of programs.
| piotr93 wrote:
| Woah super interesting, I didn't know about this. Will def
| read it! Seems like I was wrong?
| csmpltn wrote:
| > "An LLM with unbounded compute/context"
|
| This isn't a thing we have, or will have.
|
| It's like saying that a computer with infinite memory, CPU
| and power can certainly break SHA-256 and bring the world's
| economy down with it.
| stavros wrote:
| No, it's like saying "a computer can't crack SHA hashes, it
| can only add and subtract numbers together" "a computer can
| crack any SHA hash" "yes, given infinite time".
|
| The fact that you need infinite time for some of the stuff
| doesn't mean you can't do any of the stuff.
| hansonkd wrote:
| I mean, it doesn't need to compute _all_ programs in a humanly
| reasonable amount of time.
|
| It just needs to be able to compute enough programs to be
| useful.
|
| Even our current infrastructure of precisely defined programs
| and compilers isn't able to compute all programs.
|
| It seems reasonable that in the future you could give an LLM
| the Python language specification and a Python program, and it
| iteratively returns the answer.
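| Something like this, say - the prompt wording and model choice
| are mine, and there's no guarantee the output matches what a
| real interpreter prints:
|
|   from openai import OpenAI
|
|   client = OpenAI()
|
|   PROGRAM = "xs = [3, 1, 4, 1, 5]\nprint(sorted(xs)[-2])"
|
|   resp = client.chat.completions.create(
|       model="gpt-4o",
|       messages=[
|           {"role": "system",
|            "content": "You are a Python interpreter. Execute "
|                       "the program step by step, then reply "
|                       "with only its stdout."},
|           {"role": "user", "content": PROGRAM},
|       ],
|   )
|   print(resp.choices[0].message.content)  # expected: 4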
| Vetch wrote:
| Note that this is an expressibility (upper) bound on
| transformers granted intermediate decoding steps. It says
| nothing about their learnability, and modern LLMs are not
| near that level of expressive capacity.
|
| The authors also introduce projected pre-norm and layer-norm
| hash to facilitate their proofs, another sense in which it is
| an upper-bound on the current approach to AI, since these
| concepts are not standard. Nonetheless, the paper shows how
| allowing a number of intermediate decoding steps polynomial
| in input size is already enough to run most programs of
| interest (which are in P).
|
| There are additional issues. This work relies on the concept of
| saturated attention; however, as context length grows in real-
| world transformers, self-attention deviates from this model as
| it becomes noisier, with unimportant indices getting undue
| focus (IIUC, due to precision issues and how softmax assigns
| non-zero probability to every token). Finally, it's worth
| noting that the more under-specified your problem is, and the
| more complex the problem representation is, the more quickly
| intractable the induced probabilistic inference problem
| becomes. Unless you're explicitly (and wastefully) programming
| a simulated Turing machine through the LLM, this will be far
| from real-time interactive. Users should expect a Prolog-like
| experience of spending most of their time working out how to
| help the search.
|
| Trivia: Softmax also introduces another problem: the way
| softmax is applied forces attention to always assign importance
| to some tokens, often leading to dumping of focus on typically
| semantically unimportant tokens like whitespace. This can lead
| to an overemphasis on unimportant tokens, possibly inducing
| spurious correlations on whitespace, which then propagate
| through the network with unexpected negative downstream
| effects.
| K0balt wrote:
| Very interesting paradigm shift.
|
| Tangentially, I have considered the possible impact of
| thermodynamic computing in its application to machine learning
| models.
|
| If (big if) we can get thermodynamic compute wells to work at
| room temperature, or with cheap microcryogenics, it's
| foreseeable that we could have flash-scale AI accelerators
| (thermodynamic wells could be very simple in principle, like a
| flash cell).
|
| That could give us the capability to run Tera-parameter models on
| drive-size devices using 5-50 watts of power. In such a case, it
| is foreseeable that it might become more efficient and economical
| to simulate deterministic computing devices when they are
| required for standard computing tasks.
|
| My knee-jerk reaction is "probably not", but still, it's a
| foreseeable possibility.
|
| Hard to say what the ramifications of that might be.
| thrwthsnw wrote:
| This is the wrong direction; it is retrograde to try to
| shoehorn NATURAL LANGUAGE UNDERSTANDING into existing GUI
| metaphors.
|
| Instead of showing a "discoverable" palette of buttons and
| widgets, which is limited by screen space, just ASK the model
| what it can do, and make sure it can answer. People obviously
| don't know to do that yet, so a simple on-screen prompt to the
| user will be necessary.
|
| Yes, we should have access to "sliders" and other controls for
| fine-tuning the output or maintaining a desired setting across
| generations, but those are secondary to the models' ability to
| make sweeping and cohesive changes and to provide many
| alternatives for the user to CHOOSE from before they get to the
| stage of making fine-grained adjustments.
| decasia wrote:
| I think the proof that this is a good article is that people's
| reactions to it are taking them in so many different directions.
| It might or might not be very actionable this year (I for one...
| would like to see a lower level of hallucination and mansplaining
| in LLM output before it starts to hide itself behind a
| dynamically generated UI) but it seems, for sure, good to think
| with.
| someothherguyy wrote:
| This seems like an inevitability. That is, eventually, "AI" will
| be used to create adaptive interfaces for whatever the HCI user
| wants: graphical, immersive, textual, voice, and so on.
| irthomasthomas wrote:
| On a related line of enquiry, both gemini-2-flash and
| sonnet-3.5-original can act like computers, interpreting and
| responding to instructions written in code. These two models are
| the only ones to do it reliably.
|
| Here's a thread
| https://x.com/xundecidability/status/1867044846839431614
|
| And an example function for Gemini, written in shell, where the
| system prompt is the function definition that interacts with
| the model.
| https://github.com/irthomasthomas/shelllm.sh/blob/main/shelp...
| mithametacs wrote:
| We're going to have to move past considering an LLM to just be a
| model.
|
| It's a database. The WYSIWYG example would require different
| object types to have different UI components. So if you change
| what a container represents in the UI, all its children should be
| recomputed.
|
| We need a direct association between labels in the model space
| and labels in the UI space.
| Zr01 wrote:
| Databases don't hallucinate.
| mithametacs wrote:
| Correct, they just don't return anything. Which is the right
| behavior sometimes and the wrong behavior others.
| padolsey wrote:
| > communicating complex ideas in conversation is hard and lossy
|
| True but..
|
| > instead of building the website, the model would generate an
| interface for you to build it, where every user input to that
| interface queries the large model under the hood
|
| This to me seems wildly _more_ lossy though, because it is by its
| nature immediately constraining. Whereas conversation at least
| has the possibility of expansiveness and lateral step-taking. I
| feel like mediating via an interface might become too narrow too
| quickly maybe?
|
| For me, conversation, although linear and lossy, melds well with
| how our brain works. I just wish the conversational UXs we had
| access to were less rubbish, less linear. E.g. I'd love Claude or
| any of the major AI chat interfaces to have a 'forking'
| capability so I can go back to a certain point in time in the
| chat and fork off a new rabbit hole of context.
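| A tree instead of a flat list would be enough for that - just a
| data-structure sketch of the forking I mean, nothing official:
|
|   from dataclasses import dataclass, field
|
|   @dataclass
|   class Node:
|       role: str       # "user" or "assistant"
|       content: str
|       children: list = field(default_factory=list)
|
|       def fork(self, role, content):
|           child = Node(role, content)
|           self.children.append(child)
|           return child
|
|   def path_to(node, root):
|       # Linearize one branch into API-style messages.
|       msg = {"role": root.role, "content": root.content}
|       if root is node:
|           return [msg]
|       for child in root.children:
|           rest = path_to(node, child)
|           if rest:
|               return [msg] + rest
|       return []
|
|   root = Node("user", "Help me plan a website")
|   a = root.fork("assistant", "Sure - what kind of site?")
|   b1 = a.fork("user", "A portfolio")          # one rabbit hole
|   b2 = a.fork("user", "Actually, a web shop") # fork same point
|   print(path_to(b2, root))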
|
| > nobody would want an email app that occasionally sends emails
| to your ex and lies about your inbox. But gradually the models
| will get better.
|
| I think this is a huge impasse though. And we can never make
| models 'better' in this regard. What needs to get 'better' -
| somehow - is how we mediate between models and their levers
| into the computer (what they have permission to do). It's a bad
| idea to even have a highly 'aligned' LLM send emails on our
| behalf without having us in the loop. The surface area for
| problems is just too great.
| gavindean90 wrote:
| Yeah, forked-conversation UX is definitely one of my most
| desired features.
| sheeshkebab wrote:
| Can it run DOOM yet?
| handfuloflight wrote:
| The author seems not to engage with a core problem: humans rely
| on muscle memory and familiar patterns. Dynamic interfaces that
| change every session would force constant relearning. That's
| death by a thousand micro-learning curves, no matter how
| "optimal" each generated UI might be.
| thorum wrote:
| The solution is user interfaces that are stable but infinitely
| customizable by the user for their personal needs and
| preferences, rather than being fixed until a developer updates
| them.
___________________________________________________________________
(page generated 2024-12-15 23:01 UTC)