[HN Gopher] First Impressions with GPT-4V(ision)
___________________________________________________________________
First Impressions with GPT-4V(ision)
Author : zerojames
Score : 278 points
Date : 2023-09-27 12:01 UTC (10 hours ago)
(HTM) web link (blog.roboflow.com)
(TXT) w3m dump (blog.roboflow.com)
| jayniz wrote:
| This looks like a Schnitzel to me, not like fried chicken.
| circuit10 wrote:
| Back when they first announced this months ago they leaked the
| demo Discord bot ID in some logs they showed on screen, which you
| could use to invite it to your own server. For a while you could
| use it through that until they shut it down, though it was
| incredibly slow and went down a lot
| chankstein38 wrote:
| I want this. I'm a paying GPT-4 customer; I hate how these
| rollouts go. Why do I pay just to watch everyone else play with
| the new toys?
| mediaman wrote:
| You'll have it within a week or so.
|
| Pretty much all new products that require significant per-user
| incremental workloads (e.g., in this case, significant GPU
| consumption per incremental user) do rollouts. It's an
| engineering necessity. If they could roll it out to everyone at
| once, they would.
| hiidrew wrote:
| As a Hoosier I'm thankful that they used an example of our absurd
| pork tenderloin sandwiches.
| yeldarb wrote:
| I'm intrigued to see what kind of problems it's going to be
| good/bad at. I think it's going to be tricky to evaluate though
| because it has probably memorized all the easy images to eval it
| with.
|
| E.g. anything pulled from Google Images (like that Pulp Fiction
| frame or city skyline photo) is not a good test. It recognizes
| common shots but if you pull a screenshot from Google Maps or a
| random screen cap from the movie it doesn't do as well.
|
| I tried having it play Geoguessr via screenshots & it wasn't good
| at it.
| loupol wrote:
| I wonder how many images from Street View it has been trained
| on.
|
| I've seen top Geoguessr players be able to pretty consistently
| determine a location worldwide after seeing a photo for just
| one second. So I would assume training an LLM to do the same
| would definitely be doable.
| inductive_magic wrote:
| > I would assume training an LLM to do the same would
| definitely be doable.
|
| I wouldn't be so sure. The reasoning process of Geoguessr
| pros is symbolic, not statistical inference.
|
| /edit: as other commenters pointed out, something similar was
| done. While this wasn't an LLM, it was a deep learning model,
| so not symbolic ->
| https://www.theregister.com/2023/07/15/pigeon_model_geolocat...
| fellerts wrote:
| Yep, some CS/AI grads from Stanford trained an AI on loads of
| Street View images and built a bot that is able to beat some
| of the best Geoguessr players:
| https://www.youtube.com/watch?v=ts5lPDV--cU
| bayesianbot wrote:
| IIRC it wasn't that impressive in the end, as instead of
| recognizing the places the AI apparently learnt to recognize
| subtle differences in the street view cameras used in
| different locations? I might be wrong / thinking of the wrong
| model, and I'm on mobile without my browsing history so it's
| hard to check, but I think it was putting a lot of weight on
| some pixels that are noisy
| thewataccount wrote:
| From my understanding many of the best players
| immediately look down to tell what "generation streetview
| car" they're using, and seem to know what
| continents/times they're from.
| zx_q12 wrote:
| Top geoguessr players use this technique as well. IIRC
| rainbolt mentioned that there is a section of a country
| where the street view camera has a small blemish from a
| raindrop on the camera so you can instantly tell where
| you are if you notice that.
| npinsker wrote:
| It's been done recently! It's a bit better than (but
| competitive with) top players.
|
| https://www.youtube.com/watch?v=ts5lPDV--cU
| skazazes wrote:
| It seems it will still be limited by its linguistic
| understanding of the surrounding context, at least in the first
| chicken sandwich picture.
|
| Its interpretation could make some sense, but it's also mostly
| wrong if we're talking about the physical size of a modern
| GPU's main processor compared to the size of the associated
| VRAM chips. It has missed the joke entirely as far as I can
| tell. I think the joke is actually about Nvidia's handling of
| product segmentation: selling massive processors with less
| memory than is reasonable to pair them with on their consumer
| gaming offerings, while loading up the nearly identical chips
| with more memory for scientific and compute applications...
| Melatonic wrote:
| Ironically, the exact processors needed to run GPT-4V in the
| first place.....
| [deleted]
| waynesonfire wrote:
| has the turd polishing already started?
| Usu wrote:
| I'd be interested in knowing how good it is at solving visual
| captchas, do we foresee a huge rise in automated bypasses?
| zerojames wrote:
| We have just added a section on this! TL;DR: GPT-4V isn't great
| at this task at the moment :)
| circuit10 wrote:
| Back when they leaked it via a Discord bot I found it worked
| better when you ask it to first describe each box
|
| Without doing that:
| https://cdn.discordapp.com/attachments/964175221089259591/11...
|
| With it:
| https://cdn.discordapp.com/attachments/964175221089259591/11...
|
| (though it's only one example so it could be coincidence)
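|
| If you want to reproduce the trick once image input ships in
| the API, a minimal sketch in Python (the model name and image
| URL are placeholders, and I haven't run this exact code):
|
|       from openai import OpenAI
|
|       client = OpenAI()
|
|       def ask(text, image_url):
|           resp = client.chat.completions.create(
|               model="gpt-4-vision-preview",  # placeholder
|               messages=[{
|                   "role": "user",
|                   "content": [
|                       {"type": "text", "text": text},
|                       {"type": "image_url",
|                        "image_url": {"url": image_url}},
|                   ],
|               }],
|           )
|           return resp.choices[0].message.content
|
|       grid = "https://example.com/captcha-grid.png"
|       # Step 1: ground the model in each tile's contents.
|       boxes = ask("Describe the contents of each box in "
|                   "this grid, one per line.", grid)
|       # Step 2: answer using its own descriptions.
|       answer = ask("Given these box descriptions:\n" + boxes
|                    + "\nWhich boxes contain traffic lights?",
|                    grid)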
| qingcharles wrote:
| Is it possible they hobbled it a bit? I know CAPTCHA solving
| was one of the reasons they delayed the roll-out of this
| feature.
| gs17 wrote:
| Given that it fails by hallucinating the structure of the
| challenge instead of refusing to solve a CAPTCHA, I doubt
| they've intentionally reduced the capability. Although the
| example in your sibling comment implies it should have
| enough information to do it.
| GaggiX wrote:
| Solving CAPTCHAs is, at the moment, cheaper using humans than
| using the GPT-4 API.
| yeldarb wrote:
| If true, this is wild.
|
| I suppose a human could spend 10 seconds per Captcha, so they
| could do 360 per hour. Add some overhead for not operating at
| peak performance every minute of every hour & call it 250.
| Let's say you can hire someone for $2 an hour; that works out
| to a bit under a penny per Captcha.
|
| I don't think OpenAI has published pricing for GPT-4 Vision
| yet, but if we assume it's on par with GPT-4, and an image
| uses only 1000 of the 8000 possible tokens, that's 3 cents
| per Captcha.
|
| Doesn't seem completely unreasonable that at-scale humans may
| actually be cheaper than LLMs at this point. My mind is a
| little blown.
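|
| Spelled out in Python (all numbers are my assumptions above,
| not published pricing):
|
|       solves_per_hour = 250     # 360 at peak, minus overhead
|       hourly_wage = 2.00        # dollars
|       human = hourly_wage / solves_per_hour       # ~$0.008
|
|       gpt4_prompt_rate = 0.03   # $ per 1K tokens (GPT-4 8K)
|       tokens_per_image = 1000   # assumed
|       llm = gpt4_prompt_rate * tokens_per_image / 1000  # $0.03
|
|       print(f"human: ${human:.3f}  gpt-4: ${llm:.2f}")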
| eiiot wrote:
| These usually work by presenting an existing captcha to
| another human who doesn't even know they're solving a captcha.
| For example, sites hosting pirated content serve fake captchas
| as a way to make money.
| Andoryuuta wrote:
| You'd be surprised, or perhaps horrified, by how cheap
| (self-proclaimed) human-based captcha solving services are.
|
| If you just search for "captcha solving service" the first
| few results that come up offer 1000 solves of text-based
| captchas for <= $1 USD, (puzzle / JS browser challenge
| captchas are charged much higher).
|
| Whether these are actually human based, or just impressive
| OCR services, it seems like they are still much more cost
| effective than GPT-4 is for now.
| altcognito wrote:
| I imagine they are a mix.
| cs702 wrote:
| Sure, there are a few edge-case failures and mistakes here and
| there, but I can't help but be in _awe_. AWE.
|
| Let me state the obvious, in case anyone here isn't clear about
| the implications:
|
| If the rate of improvement of these AI models continues at the
| current pace, they will become a superior user interface to
| _almost every thing you want to do_ on your mobile phone, your
| tablet, your desktop computer, your car, your dishwasher, your
| home, your office, etc.
|
| UIs to many apps, services, and devices -- and many apps
| themselves -- will be replaced by an AI that does what you want
| when you want it.
|
| A lot of people don't want this to happen -- it _is_ kind of
| scary -- but to me it looks inevitable.
|
| Also inevitable in my view is that eventually we'll give these AI
| models robotic bodies (think: "computer, make me my favorite
| breakfast").
|
| We live in interesting times.
|
| --
|
| EDITS: Changed "every single thing" to "almost every thing," and
| elaborated on the original comment to convey my thoughts more
| accurately.
| throw1234651234 wrote:
| Zero awe. This week so far it can't find an import of a type in
| Next.js and it can't get the user object from a session.
| Next.js is one of the most popular FE frameworks in the world.
| It lied about both, repeatedly giving wrong answers.
|
| Here is my previous experience
| https://news.ycombinator.com/item?id=34648167 with it not being
| able to do basic tasks.
|
| It's all fun and games until the mistakes start having a cost.
|
| Other examples: I resorted to using it to order lists for me or
| adding quotes and commas to them for SQL inserts and such. Nope
| - when I look at the row count, it somehow drops values at
| random.
| gwd wrote:
| Just to clarify, you're using GPT-4? Or GPT-3.5 (the free
| ChatGPT one)?
|
| My experience with GPT-4 has been completely different from
| what you describe. Example:
|
| https://gitlab.com/-/snippets/2567734
| jiggawatts wrote:
| He's using Copilot, which is based on GPT 3.0, making it
| ridiculously outdated.
|
| Generally feedback along these lines doesn't work.
|
| People who are worried about their job security will cling
| to the worst AI output quality they can find like a life-
| preserver, and simply will not listen to advice like yours.
|
| Nobody goes the extra mile to embrace an existential
| threat.
| Nfthero wrote:
| Absolutely!
|
| Never seen anything like this, and the new results from OpenAI
| tell us again that we are not close to any reasonable plateau.
| p1esk wrote:
| If the rate of improvement continues at the current pace -
| which is GPT 1 to 2 to 3 to 4 in the last five years - we are
| just one or two improvements away from a full blown
| AGI/superintelligence/singularity/etc. At that point, a
| superior user interface is probably the least interesting (or
| scary) thing that would happen.
|
| I personally doubt GPT-5 will be as much of an improvement over
| GPT-4 as GPT-4 was over GPT-3, but that's fine, I can wait
| until GPT-6 or 7.
| DrJaws wrote:
| ChatGPT, at the end of the day, is a language model, not a
| real AI; it has limits, and they are huge.
| p1esk wrote:
| _real AI_
|
| Thanks for the laugh, I needed that.
| moffkalast wrote:
| It does make one wonder how it'll all play out in the coming
| decades. One way or another humans will eventually be
| unemployable for most of currently existing jobs.
|
| Throughout history, the ruling elite has always relied on the
| rest of the population to make their food, do their work, and
| fight in their wars. This is the first time ever that they
| will no longer have any need for anyone else. Maybe climate
| change will conveniently do the culling for them...
|
| Of course there's always that option that we end up in a post
| scarcity space utopia where machine produced wealth is
| distributed to all, but only deluded idealists can possibly
| still think that'll ever be a real option as we slink further
| into techno feudalism with every passing day.
| p1esk wrote:
| With full automation, and using renewable energy, it will
| be very cheap to feed and house the peasants. I'm guessing
| it will become mainly about entertaining, so most will move
| into VR worlds, first through headsets, then BCI, and
| eventually uploading.
| moffkalast wrote:
| Ah yes the Ready Player One option. I suppose it's not
| entirely unlikely, part of the fun of being in charge is
| being able to boss around the peasants after all.
|
| I doubt BCI will ever make sense, on a conceptual level
| it's still just copying and killing your biological self.
| AGI will likely solve aging way before that becomes
| viable.
| sottol wrote:
| Is everyone going to WFH then, or are employers going to
| splurge on a private office for everyone? Or is work basically
| going to be like a call center when everyone RTOs?
|
| How does this work on public transit?
| bytefactory wrote:
| I share your awe. I feel like a kid in a candy store with all
| these incredible AI breakthroughs coming out these days!
| There's a place for cynicism and pessimism, but the kid in me
| who loves technology for its own sake is just absolutely on
| cloud 9.
| naveen99 wrote:
| Generally agree.
|
| Just to play devil's advocate:
|
| If you want something done right, sometimes you have to do it
| yourself. Employees are sort of a universal UI. But you will
| always know more about what you want done than your agent,
| whether it's human or computer. That's even before considering
| the principal agent problem.
| Closi wrote:
| Just to play double devil's advocate:
|
| If you want something done right, other times you will have
| to get someone else to do it. You know what you want, but you
| might not have the skills to do it. I can't represent myself
| well in court, do a good job of plumbing or cut my own hair,
| so I would ask for experts to do that for me.
|
| Plus if someone is capable, it's often quicker to delegate
| than do, and if you are delegating to someone with more time
| to do the task they can often do a better job. Delegating
| unambiguously is a skill in itself, as instructing AIs will
| be.
| dmazzoni wrote:
| Also, AIs will get much better at delegating.
|
| Currently ChatGPT doesn't know it's bad at math, so it can
| convert a story problem into an equation better than a
| human but then mess up the arithmetic or forget a step in
| the straightforward part.
|
| But if you specifically give ChatGPT access to Mathematica
| and an appropriate prompt, it can leverage a good math
| engine to get the right answer nearly every time.
|
| Before long, I don't think that extra step will be
| necessary. It will know its limits and have dozens of other
| services that it can delegate to.
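|
| A minimal sketch of that pattern with OpenAI function calling
| (the "evaluate" tool is invented here; a real setup would
| route it to Mathematica or another math engine):
|
|       import json
|       from openai import OpenAI
|
|       client = OpenAI()
|
|       tools = [{
|           "type": "function",
|           "function": {
|               "name": "evaluate",
|               "description": "Exactly evaluate an arithmetic "
|                              "expression.",
|               "parameters": {
|                   "type": "object",
|                   "properties": {
|                       "expression": {"type": "string"}},
|                   "required": ["expression"],
|               },
|           },
|       }]
|
|       resp = client.chat.completions.create(
|           model="gpt-4",
|           messages=[
|               {"role": "system", "content":
|                "Set up the equation yourself, but delegate "
|                "all arithmetic to the evaluate tool."},
|               {"role": "user", "content":
|                "If 3 pencils cost $1.29, what do 17 cost?"},
|           ],
|           tools=tools,
|       )
|       # Assumes the model chose to call the tool.
|       call = resp.choices[0].message.tool_calls[0]
|       expr = json.loads(call.function.arguments)["expression"]
|       # eval() stands in for the real math engine; don't use
|       # it on untrusted input.
|       print(eval(expr, {"__builtins__": {}}))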
| crazygringo wrote:
| > _they will become a superior user interface to almost every
| thing you want to do_
|
| No they won't. They're actually a pretty terrible user
| interface from a design perspective.
|
| Primarily because they provide zero affordances, but also
| because of speed.
|
| UX is about providing an intuitive understanding of available
| capabilities at a glance, and allowing you to do things with a
| single tap that then reflect the new state back to you
| (confirming the option was selected, confirming the process is
| now starting).
|
| Where AI _is_ absolutely going to shine is as a helpful
| assistant in learning/using those interfaces, much as people
| currently go to Google to ask, "how do I do a hanging indent
| in Microsoft Word for my Works Cited page?" For one-off things
| you do infrequently, that's a godsend, don't get me wrong. But
| it's not going to _replace_ UI, it's going to _assist_.
|
| And the 99% of your tasks that are repetitive habit will
| continue to be through traditional UI, because it's so much
| more efficient. (Not to mention that a lot of the time most
| people are not in an environment where it's polite or possible
| to be using a voice interface at all.)
| jncfhnb wrote:
| Not... "write my works cited page... use hanging indents"?
| bratbag wrote:
| If something is a repetitive habit that you can do almost
| without thinking, there is a good chance an AI could infer
| that entire chain.
|
| I think what's more likely is that an AI based interface will
| end up being superior after it has had a chance to observe
| your personal preferences and approach on a conventional UI.
|
| So both will still be needed, with an AI helping at the low
| end and high end of experience and the middle being a
| training zone as it learns you.
| birdyrooster wrote:
| Like on Apple Watch Ultra, when I press the start exercise
| button, the exercises that it suggests are in an order
| unique to my baseline activities.
| crazygringo wrote:
| I think I wasn't clear enough -- these habits I'm talking
| about are things like "press cold water button, press
| start" or "press warm water button, press start" or "tap
| 'News' app grouping, tap 'NY Times' icon".
|
| There's nothing to infer. The sequence is already short.
| There are no benefits from AI here.
|
| But you raise a good point, which is that there are
| occasionally things like 15-step processes that people
| repeat a bunch of times, that the AI can observe and then
| take over. So basically useful in programming
| macros/shortcuts as well. But that still requires the
| original UI -- it doesn't replace it.
| Workaccount2 wrote:
| I totally get your point, but I think that AI will allow
| much "smarter" behavior. Where every appliance is an
| expert in doing what it is intended to do.
|
| So sure, it will still have buttons, but those buttons
| are really just preset AI prompts on the backend. You can
| also just talk to your appliance and nuance your request
| however you want to.
|
| A TV with a remote whose channel button just prompts
| "Next channel" but if you want you would just talk to
| your TV and say "Skip 10 channels" or "make the channel
| button do (arbitrary behavior)"
|
| The shortcuts will definitely stay, but they will behave
| closer to "ring bell for service" than "press selection
| to vend".
| nuancebydefault wrote:
| Most user interfaces already have a much finer
| granularity and number of options than your examples.
|
| When taking a shower, I would like fine control over the
| water temperature, preferably with a feedback loop
| regulating the temperature. (Preferably also the
| regulation changes over the duration of the showering.)
|
| Choosing to read the NY times indeed is only a few taps
| away, but navigating through and within its list of
| articles is nowadays done quite fast and intuitively
| thanks to quite a lot of UI advancements.
|
| My point being, short sequences are a very limited set
| within a vast UI space.
|
| People go for convenience and speed, oftentimes even if
| there's some accuracy cost. AI fulfills this preference,
| especially because it can learn on the go.
| JoeAltmaier wrote:
| I don't know - the timer app on my oven is trivial too.
| But I always, _always_ use Alexa to start timers. My
| hands are busy, so I can just ask "How many minutes left
| on the tea timer?"
|
| Voice is not really clumsy, compared to finding a device,
| browsing to an app, remembering the interface etc.
|
| Already when we meet a new app, we (I) often ask someone
| to show me around or tell me where the feature is that I
| want. Not any easier than asking my house AI. Harder
| really.
|
| Hard to overestimate the laziness of humans. I'll get very
| accustomed to asking my AI to do ordinary things.
| Already I never poke at the search menu in my TV; I ask
| Alexa to search for me. So, so much easier. Always
| available. Never have to spell anything.
| crazygringo wrote:
| I think we're basically in agreement though.
|
| Everyone agrees setting timers in the kitchen via voice
| is great precisely because your hands are occupied. It's
| a special case. (And often used as the example of the
| _only_ thing people end up consistently using their voice
| assistant for.)
|
| And asking an AI where a feature is in an app -- that's
| exactly what I was describing. The app still has its UX
| though. But this is exactly the learning assistance I was
| describing.
|
| And as for searching with Alexa, of course -- but that's
| just voice dictation instead of typing. Nothing to do
| with LLM's or interfaces.
| JoeAltmaier wrote:
| Alexa's search is a little different - it's context-
| independent. I can ask for a search from any point in the
| TV app - in some other menu, while watching another show,
| heck even when the TV is turned off.
|
| And when describing apps - I imagine the AI is an app-
| free environment, where I just ask those questions of my
| AI assistant, in lieu of poking at an app at all.
| cs702 wrote:
| Don't think it of it as the machine performing repetitive
| tasks you specify.
|
| Think of it instead as the machine accomplishing goals you
| specify, figuring out on its own the tasks necessary for
| accomplishing them.
|
| Instead of telling the machine something like, say, "increase
| the left margin by a quarter inch," you'd say something like
| "I want to create a brochure for this new product idea I just
| had, and I want the brochure to evoke the difficult-to-
| describe feeling of a beautiful sunshine. Create 10 brochures
| like that so I can review them."
|
| Instead of telling the machine, say, "add a new column to my
| spreadsheet between columns C and D," you'd say something
| like "Attached are three vendor proposals. Please summarize
| their pros and cons in a spreadsheet, recommend one, and
| summarize the reasons for your recommendation."
|
| All this presumes, of course, that the technology continues
| to improve at the same pace. No one knows if that will
| happen.
| have_faith wrote:
| I personally can't wait for brands to start specifically
| advertising "doesn't use AI" as a selling point because
| everyone got burned by their AI controlled toaster, literally
| and figuratively. For anything designated an "appliance" I want
| it to be fixable by someone locally, or it's just throwing
| money away.
| ctoth wrote:
| > For anything designated an "appliance" I want it to be
| fixable by someone locally, or it's just throwing money away.
|
| What does this have to do with the price of tea in China, or
| AI for that matter? I agree we should have repairable
| appliances. I also want better AI.
| renewiltord wrote:
| I'm sure that in 2045, the Hacker News of its time will be
| complaining about lack of AI-less devices the way it
| complains about lack of smart-less TVs. It'll be available,
| it'll be more expensive, and the people who claim they want
| it don't want it enough to pay the premium.
| Nfthero wrote:
| I'm curious whether we will reach a situation where the
| steampunk idea becomes reality:
|
| People plugging weird stuff together, like an AI chip from a
| car into a toaster.
|
| If AI becomes hardware chips, it could easily be that language
| processing will be a default chip feature and the rest is
| teachable: plug a level-3 AI chip into it, boot it, and teach
| it that it's now a toaster.
|
| But at the end we will have the same toaster in 30 years as
| we have had for the last 30 years.
| kaibee wrote:
| > For anything designated an "appliance" I want it to be
| fixable by someone locally
|
| This is already pretty much gone thanks to manufacturers
| making it extremely difficult to fix things. No AI required.
| 303uru wrote:
| It's not gone, but it is much more expensive up front. For
| example, everyone I know has a range or stovetop/oven with a
| circuit board or two in it, few if any replaceable parts,
| etc... That's basically the standard in the 30-inch, $1-2k
| market. Meanwhile, my 60" Bluestar has no circuit board; it
| has four mechanical electronic switches for the oven lights
| and convection. Everything else is brass valves, standard
| easily replaceable parts and so on. A few weeks back a
| convection fan went out and I replaced it in 5 minutes with a
| screwdriver. But this range starts at $18k.
|
| So yes, I see a market for bespoke non-AI.
| elforce002 wrote:
| "AI" will create a new business model for sure: "Talk to a
| real person", "100% human made", etc. This would be the new
| counter-culture.
| huytersd wrote:
| That's the current culture. How many times have you heard
| "I want to speak with a real person" or "artisanal hand
| made".
| vsareto wrote:
| That's a new flavor of customer service jobs which will
| still suck and be paid terribly
| elforce002 wrote:
| At least you'll be able to complain to someone, hehe.
| yamazakiwi wrote:
| Unless that person is you... but in that case you would
| just complain to your coworkers so nothing really changes
| I guess
| OnlyMortal wrote:
| Here's where a toaster will be in the near future.
|
| https://youtu.be/7folKbch3U8
|
| Do we want this?
| sir_brickalot wrote:
| If AI or LLMs or similar end up everywhere... how much more
| renewable energy will we have to produce, for very little to
| no gain in so many cases?
|
| Madness!
| thelittleone wrote:
| Agreed, and the next big step may well be the human-computer
| interface. Speech is the starting point for input. At some
| point output will change too, and if you think it out longer
| term, perhaps a future where instead of reading information we
| install knowledge, including the stored memory of actual
| experience. If I want to do pottery, I could think this,
| download the experience, and then be competent at it.
| unsupp0rted wrote:
| Even more impressive would be if I don't want to know pottery
| anymore, and I can delete that knowledge to make room for
| something else.
| BoxOfRain wrote:
| The idea of giving the adtech industry direct access to my
| brain is genuinely horrifying, not to mention Aldous Huxley
| would have some reasonable objections to everything being
| this easy all of a sudden.
| jprete wrote:
| This would be nice if I didn't believe that it would be
| enshittified before it ever got off the ground. E.g.:
|
| * There will be a monthly fee for the interface; you owe the
| monthly fee as long as you have it, so you need surgery to
| stop paying
|
| * When you download knowledge, it's a rental, and in addition
| to per-hour rental fees and the network connection fee, you
| will owe 30% on the value of whatever you create
|
| * The TOS will govern your behavior continuously, since
| you're always using the interface
|
| * Your behavior will always be monitored because it's totally
| justified to spy on you all the time just because you
| borrowed pottery knowledge
|
| * If you're found to be in violation of any part of the TOS
| at any time, they will erase all of the knowledge they've
| added to your brain, as well as any derived knowledge you
| gained through the use of their knowledge
|
| * Because this product isn't actually considered essential,
| you will have no legal remedies if they turn it off, even if
| you are not actually in violation of the TOS
| bamboozled wrote:
| Ever wanted to see a photo or video or something before you
| purchased it?
|
| Glance at a long term weather forecast?
| bulbosaur123 wrote:
| > We live in interesting times.
|
| We live in end of times.
| verve_rat wrote:
| They will be the best input mode to play flight sims?
|
| I am being a little facetious, but you made a _bold_ claim.
| jonplackett wrote:
| I mean, autopilot is a thing. You would just instruct it what
| to do and it would make a flight plan and fly it.
|
| It's gonna take the fun out of the experience a little bit
| though!
| bamboozled wrote:
| Welcome to the future?
| cs702 wrote:
| You're right, for actually playing the sim, you'll likely
| need a game controller or something like that. I edited my
| comment.
|
| But keep in mind that during the sim, you'll be able to ask
| the computer what you want the plane to do, and the computer
| will magically make it happen on your display.
| Miraste wrote:
| No one would want that in a flight sim unless real planes
| start working that way, which we should all hope they
| don't.
| chx wrote:
| > few edge-case failures and mistakes
|
| Like identifying traffic lights in the 4th and 5th squares of
| the second and third rows, when both rows only have four
| squares?
| mbil wrote:
| I agree. I think the apps that would initially benefit from
| LLM-powered conversational interfaces are those that have the
| following traits:
|
| - constrained context
| - part of a hands-free workflow
|
| A couple of use-cases I have been pondering are a driving
| assistant and a cooking assistant.
|
| People are already used to using their phone or car's nav
| system to give them directions to an unfamiliar place. But even
| with such a system it's useful to have a human navigator in the
| car with you to answer various questions:
|
| - What's my next turn again?
| - How long till we get there?
| - Are there any rest stops near here?
| - What was that restaurant we just passed?
| - Is there another route with less traffic?
|
| These questions are all answerable with context that can be
| provided by the mapping app:
|
| - List of upcoming directions
| - Overall route planning
| - Surrounding place data
| - Traffic data and alternate route information
|
| It's possible to pull over to the side of the road, take off
| your distance glasses, put on your reading glasses, and
| zoom/pan the map to try to answer these questions yourself. But
| if the map application can just expose its API to the language
| interface layer, then a user can get the answers without taking
| their eyes off the road.
|
| The information is contextual and constrained based on a
| current task. In some cases it might be more desirable to whip
| out your phone and interact with the map to look up the answers
| on a screen, but often it won't be worth stopping the car, and
| so the conversational interface is better.
|
| Cooking assistant is a similar case: you are busy stirring
| something and checking on the oven -- you don't want to wipe
| the flour off your hands to pick up your phone and ask how many
| teaspoons of sugar you need. Again: contextual and constrained
| info based on a current task, and your hands and eyes -- the
| instruments of traditional UIs -- are otherwise occupied.
|
| Today, our software interfaces generally have one of two kinds
| of entity on the other end: humans, or other software. In the
| near future there will be another type of entity: language
| models. We need to start thinking of how our APIs will change
| when they're interacting with an LLM -- e.g. they'll need to be
| discoverable and self-describing; error states will need to be
| standardized or explicit with instructions on how to correct;
| they'll need to be fast enough to fit in a conversational
| interface; etc. It's arguable that such traits are part of good
| API design today, but in the future they may be required for
| the API to function in a landscape of virtual agents.
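|
| To make that concrete, here's the kind of self-describing
| tool list a mapping app might hand to the language layer (the
| names and fields are invented for illustration):
|
|       # Hypothetical tools the nav app exposes to an LLM.
|       NAV_TOOLS = [
|           {
|               "name": "get_next_turn",
|               "description": "Return the next maneuver on "
|                              "the active route.",
|               "parameters": {"type": "object",
|                              "properties": {}},
|           },
|           {
|               "name": "find_places_near_route",
|               "description": "Find places (rest stops, "
|                              "restaurants) near the route "
|                              "ahead.",
|               "parameters": {
|                   "type": "object",
|                   "properties": {
|                       "category": {"type": "string"},
|                       "max_detour_minutes":
|                           {"type": "number"},
|                   },
|                   "required": ["category"],
|               },
|           },
|           {
|               "name": "get_alternate_routes",
|               "description": "List alternate routes with "
|                              "ETA and traffic deltas.",
|               "parameters": {"type": "object",
|                              "properties": {}},
|           },
|       ]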
| RandomLensman wrote:
| In the cooking example, you either need the AI to have full
| awareness of the step you are at or you need to describe the
| step you are at, which could be cumbersome ("I did ..., how
| much sugar do I need now"). I venture, having the recipe
| projected in front of you would be much faster.
| troupo wrote:
| Having it both in front of my eyes, and being able to get
| answers to questions like "I've added the eggs, now what?"
| or "what does folding a dough mean?" at the same time would
| be very valuable.
| travoc wrote:
| and a piece of paper wins again.
| mbil wrote:
| I imagined the AI would be reading the steps aloud to you,
| and so would be aware of your progress. I don't think an AI
| assistant precludes the recipe being projected tho, just as
| in the driving example it wouldn't replace an on screen
| map.
| layer8 wrote:
| I don't know, I hate the idea of _having_ to hold a natural-
| language conversation with a computer in order to make use of
| its functionality.
|
| It feels like being one of those Futurama heads in a jar that
| can't do anything by themselves.
| boplicity wrote:
| Visual input is not "natural language." The implications of
| that, especially in the long term, could be incredibly
| far-reaching, and possibly even terrifying.
| TOMDM wrote:
| There's nothing stopping developers from taking a prompt to
| GPT and sticking it behind a button or command line, with
| options in the UI interpolated into the prompt.
|
| For now almost all applications of ChatGPT happen in chat
| windows because it requires no further integration, but
| there's no reason to expect things will always be this way.
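|
| For example (a sketch; the prompt template and options are
| invented, and any chat-completion API would do):
|
|       from openai import OpenAI
|
|       client = OpenAI()
|
|       TEMPLATE = ("Summarize the following in a {tone} tone, "
|                   "in at most {length} words:\n\n{text}")
|
|       def summarize_clicked(text, tone="neutral", length=100):
|           # The user only sees a button and two dropdowns;
|           # their UI state is interpolated into a fixed prompt.
|           resp = client.chat.completions.create(
|               model="gpt-4",
|               messages=[{"role": "user", "content":
|                          TEMPLATE.format(tone=tone,
|                                          length=length,
|                                          text=text)}],
|           )
|           return resp.choices[0].message.content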
| Nfthero wrote:
| I've got tendonitis and am very, very curious how soon I can
| really start to interact like in Star Trek:
|
| Walking around and thinking out loud with the computer.
| DelightOne wrote:
| I wouldn't mind a UI fully configurable by AI to fit me
| exactly, without having to learn obscure settings, some
| obscure declarative language, or how the UI is implemented.
| IKantRead wrote:
| > I hate the idea of having to hold a natural-language
| conversation with a computer in order to make use of its
| functionality.
|
| I hate the idea of _having_ to use a mouse to click on a
| visual GUI to navigate a file system in order to make use of
| its functionality.
|
| It's less the case today, even among developers, but it
| wasn't that long ago that I remember that any serious
| technical user of a computer took it as a point of pride to
| touch the mouse as little as possible. They're also still
| correct in that thinking. The command line is a _very_
| powerful UI with lots of benefits and while the mouse makes
| navigating the OS _easier_ it's still much more limited than
| command line usage.
|
| Touch screen interfaces are another example of an _easier_ UI
| that ultimately feels even more limited. But people still
| plug their iPad Pros into Magic Keyboard folios frequently.
|
| Having worked with these tools every day for a while now, the
| "AI will change UX" take is so much better than "AI will
| conquer the world!". AI does fundamentally open up new
| workflows and user experiences, many of which offer a lot of
| potential improvements over their predecessors.
|
| At the same time I doubt we'll see a world where we don't end
| up using the command line for the majority of serious
| technical work.
| jiggawatts wrote:
| > I hate the idea of having to use a mouse to click on a
| visual GUI to navigate a file system in order to make use
| of its functionality.
|
| Ironically, studies have shown that mouse-based interfaces
| are more efficient for practically all filesystem use-cases
| compared to CLI interfaces.
|
| Despite objectively faster-time-to-solution, people _self-
| report_ that they "feel" that the mouse GUIs are slower.
|
| That's because there are fewer _actions per second_ when
| using a mouse. It's a smooth gliding motion and then a single
| click, versus many keystrokes in a row with a CLI.
|
| Rapid actions _feel_ faster, even if it takes more wall-clock
| time to achieve a task.
|
| Keep this in mind next time you sneer at a "bad graphical
| user interface" for being "slow".
| gnaritas99 wrote:
| [dead]
| troupo wrote:
| > Touch screen interfaces are another example of an easier
| UI that ultimately feels even more limited.
|
| Depends on the use case. A touch screen is much more powerful
| than a command line for maps, for example. Or for drawing.
| Mouse + keyboard is much more powerful than just keyboard
| for DAWs. And so on and so on.
| morkalork wrote:
| UIs being dumbed down for average users was already annoying.
| Apparently the process won't stop until the illiterate are
| included too.
| sureglymop wrote:
| But what is bad about that? Why shouldn't they be included?
| CamperBob2 wrote:
| It's a race to the bottom. You build an idiot-proof UI,
| Mother Nature builds a better idiot.
| yamazakiwi wrote:
| The illiterate are customers, of course they will be
| included.
| edgyquant wrote:
| Ideally it won't stop until computers are something we
| rarely think about. For me I love the CLI but having to use
| a mouse is a drag and causes serious wrist pain at this
| point in my life. Typing on a keyboard and clicking with a
| mouse are not the end all, be all of interacting with
| information systems.
| mnky9800n wrote:
| Come on, you don't want an anxious, golden robot that mostly
| just causes drama but also happens to be fluent in over 6
| million forms of communication so it can translate for you
| whenever you go on holiday?
| nonameiguess wrote:
| I'm glad you made that change because the interface can't
| really solve every single thing. Most of what I do on my phone
| is read eBooks. AI can't inject the knowledge directly into my
| brain. Text on the screen is as good as it will get for the
| foreseeable future. Most of what I want to do in my home is
| cook, clean, sleep, and have sex with my wife. Software can't
| do any of those things for me. Most of what I want to do in a
| car is get where I'm going, which software control can arguably
| make somewhat easier, but the driving isn't really the pain
| point. The distance and traffic conditions are the pain points.
| AI can't change external reality, at least not yet.
| dinosaurdynasty wrote:
| I'm hopeful that AI cars make renting one cheap enough that I
| can just rent an AI car the 0-2 times/week I need one instead
| of owning a car and all the time and stress of maintenance
| that entails.
| dr_dshiv wrote:
| Also, keep in mind that Robots may be scifi level in 2 years.
| Like, able to cook and clean with hands! Crazy, but I think
| we will see it happen so fast:
|
| https://www.tri.global/news/toyota-research-institute-unveil...
| dns_snek wrote:
| I'm not sure if we watched the same video, but I saw a robot
| fumble through a few mechanical motions with the dexterity
| and speed of a toddler to achieve a few isolated, best-case
| tasks where all the hard parts were taken care of by a
| human.
|
| Cool demo, I suppose, but nobody is going to buy this as
| anything other than a toy.
| dr_dshiv wrote:
| 2 years. The point of the video is that robots are on the
| LLM train.
| mediaman wrote:
| Roughly half of people in most developed countries are not
| functionally articulate: meaning, they can read functionally,
| but struggle to articulate what they want with the written
| word.
|
| LLM-based chatbots can be extremely attractive to the top 30%
| literacy users in the developed world. They are not a good
| universal UI. You still need to provide pathways for the user
| to follow to get done what they need without forcing them to
| articulate their requirement.
|
| This is why so many people sit in front of a ChatGPT-like
| service and say, "what would I use this for?" and never use it
| again.
| [deleted]
| brrrrrm wrote:
| Audio to text solves written word articulation, right?
| Besides this post is about vision, which also solves it.
| sweezyjeezy wrote:
| Does vision solve it? How does one ask a question through
| images alone?
| RealityVoid wrote:
| No, it does not. It's still the same words, in a different
| medium. If you are unable to write, you'll probably be
| unable to speak your ideas.
| a2dam wrote:
| This isn't true. There's plenty of people who are
| verbally fine but can't read or write. Spoken language is
| a far more common and fundamental skill than reading or
| writing.
| jurimasa wrote:
| You do know that illiterate people exist, right? Do you
| understand that people were illiterate for thousands of
| years and still managed to speak their ideas, right?
|
| Right?
| furyofantares wrote:
| That sounds super false to me, at least given how
| articulate kids can be before they can read, and how
| literate they can be before they can write.
|
| If it's really true that half of the population can't
| functionally express themselves verbally then I'd sure
| like to know that. Or maybe I've misinterpreted something
| claimed here, because I'm struggling to find these claims
| plausible.
| burkaman wrote:
| > Roughly half of people in most developed countries are not
| functionally articulate
|
| Where did you get this idea? I found this article
| (https://www.uxtigers.com/post/ai-articulation-barrier, is
| this you?), but it makes a leap from literacy to articulacy
| that I don't understand. It's not obvious to me why an
| illiterate person would be "functionally inarticulate"
| assuming they can speak instead of write.
|
| Also, I'm not certain but I think the author is
| underestimating the abilities of a person with Level 2
| literacy. It doesn't seem correct to say that "level 3 is the
| first level to represent the ability to truly read and work
| with text", especially when the whole point of LLMs is that
| you don't have to read a long static document and understand
| it, you can have a conversation and ask for something to be
| rephrased or ask followup questions.
| RandomLensman wrote:
| Only for things where a high rate of sizeable errors (maybe
| >1% or >0.1%) is tolerable.
| satvikpendem wrote:
| Unless there is a neural interface, keyboards, digital styli,
| etc. are still a superior UI to letting an AI guess at what
| you're thinking.
| apexalpha wrote:
| Correct, this will be the successor to the GUI.
| tmalsburg2 wrote:
| I doubt it. It's too damn costly computationally.
| Difwif wrote:
| This is the same "GUIs will never take off" reply, but
| decades later and on to the next successor.
| ryandr wrote:
| Things are changing quickly
| m3kw9 wrote:
| GPT-4 was awesome, and this looks like it does not disappoint,
| given the progress we expect of OpenAI.
| hypertele-Xii wrote:
| So you won't be able to do _anything_ without an Internet
| connection to the AI mainframe? No thanks.
| postalrat wrote:
| Only the stuff you don't know how to do yourself.
| HappySweeney wrote:
| ... like operate my phone without a UI?
| op00to wrote:
| Until the AI mainframe runs on your $device
| danenania wrote:
| By the time the current AI mainframe runs on your device,
| there will be new, better models that still require the
| mainframe.
|
| I think AI fundamentally favors centralization. Except for
| narrow tasks and domains, there's no such thing as "enough"
| intelligence. For general purpose AI, you'll always want
| the best and most intelligent model available, which means
| cloud rather than local.
| hypertele-Xii wrote:
| GPT-3 requires 700 gigabytes of GPU RAM. I'm looking at my
| cheapest computer components retailer listing a 48 gigabyte
| GPU at $5k. So to run the _previous generation_ of GPT
| would cost me about $70k right now. When do you think I can
| expect to run GPT-4 on my consumer $device? :)
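|
| Spelled out, with my numbers above:
|
|       import math
|       gpus = math.ceil(700 / 48)   # 15 cards at 48 GB each
|       print(gpus * 5_000)          # $75,000, i.e. about $70k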
| M4v3R wrote:
| Where did you get the 700 gigabytes figure from? I don't
| think OpenAI even released the model size, although it's
| considered to be 175B parameters. Given how well
| quantization works at these sizes you would need less
| than 200 GB of GPU memory to run it.
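|
| Back-of-envelope for the rumored 175B figure (weights only,
| ignoring activations and KV cache):
|
|       params = 175e9
|       for name, bytes_per_param in [("fp16", 2), ("int8", 1),
|                                     ("int4", 0.5)]:
|           gb = params * bytes_per_param / 1e9
|           print(f"{name}: {gb:.0f} GB")
|       # fp16: 350 GB, int8: 175 GB, int4: 88 GB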
| sandbags wrote:
| I would be surprised if GPT-3 uses 700GB of RAM. It may
| be true, I don't know. But I am running 70B parameter
| models (quantized to 5 or 6 bits, biggest is 48GB loaded)
| on my 64GB Mac M2 Max Studio now and they are usable and
| the machine is still usable too. With an M2 Ultra and
| 192GB of RAM I imagine you could do a lot more.
|
| I'm not arguing that these models hold up against GPT 3.5
| and I still use GPT 4 when it matters. But they work and
| it's more like the difference between Premier League &
| Division 1, rather than PL & a five-a-side team from
| Bracknell.
|
| Even a few years ago I could not have imagined this.
|
| Given the pace of work on optimisation and my assumption
| that the M3 Studio I buy next will probably have 256GB of
| RAM at much the same power levels as I use now, it seems
| eminently possible it's a year or two away.
| dmazzoni wrote:
| First of all, you're off by an order of magnitude.
|
| Second, I don't think it will be that long. There are
| already LLMs as good as GPT-3 running on average laptops
| and even phones.
|
| In the next couple of years, you'll see:
|
| - Ordinary PCs, tablets, and phones with dedicated AI
| chips, like TPUs - they'll be more tuned specifically for
| LLMs
|
| - Mathematical and algorithmic optimizations will make
| existing LLMs faster on the same hardware
|
| - Newer generations of LLMs will get even more useful
| with fewer parameters
|
| The combination of all of these means that it's not at
| all unreasonable to expect that today's top-of-the-line
| LLM will be running locally on your device within just a
| couple of years.
|
| Of course, LLMs in the cloud will advance even further,
| so there will always be a tradeoff, and there will always
| be demand for cloud AI, depending on the application.
| qingcharles wrote:
| That doesn't seem to make sense. I can run Llama 2 on my
| 12-year-old desktop PC with no compatible GPU and only
| 16GB of system RAM. It ain't quick, but it runs.
| arrowsmith wrote:
| Maybe 700GB is what ChatGPT uses to serve zillions of
| users concurrently. If you're running your own individual
| instance, you obviously don't need as many resources.
| apexalpha wrote:
| You are now where the GUI was in 1986 or so.
| chx wrote:
| And you think corporate will wait nearly forty years for
| tech to catch up? No, they will indeed force you to be
| always online, leading to absolute and total surveillance
| where you voluntarily add dozens of mics and cameras to
| every home. If you thought telescreens à la 1984 were bad,
| just wait.
| astrange wrote:
| It's funny that being obsessed with "corporations" feels
| so specifically GenX. I think because it's part of the
| New Left movement. In reality governments have all the
| power here, which is why this would be illegal in at
| least CA/EU/China.
|
| Another issue where this comes up is high housing costs,
| which are mostly caused by bad land use laws (and the
| nearest evil people, landlords, mostly own one or two
| properties), but people from the New Left era will
| literally refuse to believe you about this because they
| can't accept that any bad thing on Earth could not be
| caused by "corporations".
| 3abiton wrote:
| At some point Open-source models' performance will follow.
| Llava will improve.
| m3kw9 wrote:
| I'm just imagining a mode OpenAI calls "App Mode", where you
| say what you want, say "a dog-themed cute calculator app with
| unit conversions", and it generates the UI for a working app.
| You add these to a widget-like place. The OpenAI App Store
| will carry these apps. Although in the beginning the apps
| would be simple, I do see potential.
| Reflecticon wrote:
| The more AI can produce customized stuff for us, the less we
| need companies. Full personalization of our products might be
| possible: probably first software, then art, then 3D-printed
| products, and maybe later houses, cars and clothes.
|
| I wonder what we will work on, and if we will work at all, in
| such an environment. Maybe some people will still like
| consuming and will copy different designs and products, and
| because of the blockchain you have to give them something in
| exchange, or everything is open source and it is free for you
| to take.
|
| I wonder whether such a life would contribute to humanity
| making further progress or make it stagnate (or possibly
| decline)?
|
| Interesting times. I think we are close to the times of the
| moon landing, which had an immense impact on humanity's
| culture.
| og_kalu wrote:
| Graph analysis is impressive (last example) -
| https://imgur.com/a/iOYTmt0
|
| Can do UI to frontend. Seems to understand the UI graphical
| elements and layout, not just text
| https://twitter.com/skirano/status/1706823089487491469
|
| Can describe comic images accurately, panel by panel -
| https://twitter.com/ComicSociety/status/1698694653845848544?...
|
| Lots of examples here also -
| https://www.reddit.com/r/ChatGPT/comments/16sdac1/i_just_got...
|
| It's Computer Vision on Steroids basically.
|
| Multi-modality is pretty low hanging fruit so i'm glad we're
| finally getting started on that. Imagine if GPT-4 could
| manipulate sound and images even half as well as it could
| manipulate text. We still don't have a large scale multi-modal
| model trained from scratch so a lot of possible synergistic
| effects are still unknown.
| dottjt wrote:
| Oh wow, I'm completely fucked as a front end developer.
| troupo wrote:
| Has AI been able to figure out conflicting and quickly
| changing product requirements and implement them on top of
| existing technical debt?
|
| You'll be fine.
| zarzavat wrote:
| People keep saying "ah but it can't do X!". So what? Most
| of us have multiple decades until we can retire. This AI is
| getting better every few months.
|
| It will be able to do it even faster, better and more
| cheaply than a human can.
| troupo wrote:
| > It will be able to do it even faster, better and more
| cheaply than a human can.
|
| Take what you did in the past year. Write down every
| product decision taken, every interaction with other
| teams figuring out APIs you had, all the infra where your
| code is running and how it was setup and changed, all the
| design iterations and changes that had to be implemented
| (especially if you have external partners demanding it).
|
| Yes. All that you'd have to input into the AI, and hope
| it outputs something decent given all that. And yes,
| you'll have to feed all that into AI all the time because
| it has no knowledge or memory of "on Monday the new
| company bet was announced in the all hands"
|
| You will be fine.
| Michelangelo11 wrote:
| So ... in this example, your job is continually feeding
| information to the AI from various sources. Why would the
| AI not be automatically hooked up to all those sources?
| Building a system that can do that is essentially trivial
| with the OpenAI API.
| nimithryn wrote:
| Have you tried to build a system like this? It's not
| trivial
| Workaccount2 wrote:
| We'll have jobs, but they sure as shit won't be worth
| $150k anymore.
|
| Any grunt can feed meeting notes into an AI. And frankly, an
| AI can parse an audio recording of a meeting.
| tayo42 wrote:
| There's an idea with some truth to it: the first 90% is easy;
| finishing and getting it to 100% is hard, maybe almost
| impossible. So asking "can it do x" is relevant, because it
| might not.
| kossTKR wrote:
| But if this development continues AI will surely be able to
| just parse the entire frontend, then look at the repo, then
| look at the specifications, then when you ask for a
| specific feature it will instantly implement it gracefully.
|
| I don't see why it wouldn't understand piles of hotfixes on
| top of each other, or even refactor technical debt in tight
| coupling with existing or historical specification.
|
| Or is there a reason this is not going to happen in a few
| years?
| raincole wrote:
| /s?
| troupo wrote:
| I very much doubt it. "Revert Norway tax code" or "add
| content restrictions for Turkey" cannot necessarily be
| deduced from the codebase. And there are thousands of
| product requirements like that.
|
| It might become a better code assist tool some 10 years
| from now, but it won't be able to implement product
| decisions.
| generic92034 wrote:
| It depends. Did you ever work in Development Support?
| Understanding requests or bug reports from customers is
| quite challenging, even for trained and experienced
| developers. In my eyes that would require AGI, and we do not
| know if that can be achieved with the current approach.
| kossTKR wrote:
| I actually did, and yes, it's extremely challenging and can
| be rather rage-inducing; "it doesn't work" --> what doesn't
| work --> "the thing" --> what thing --> [?] ...
|
| But the thing is, conversations like the above, i.e. both
| external support and internal feature requests, could
| theoretically be handled by a GPT-like system, ending up in
| an AI-created custom specification that could be both
| implemented and documented by the AI system instead of
| humans?
|
| I know we're a few versions out, but still.
| generic92034 wrote:
| If just being persistent and willing to iterate solves
| the issue, then yes, GPT-like systems could do that. If
| you have to employ creative thinking to even find out
| what the customer wants, then check it in the system,
| debug some processes and derive the desired feature or
| correction, then we are very far from having such a tool,
| IMHO.
| dottjt wrote:
| Not yet, but give it time. The concept of self-driving
| vehicles even a decade ago seemed absurd (or even AI for
| that matter), but now it all seems like a reality.
|
| And that's not even taking into account all the advances
| we'll have with AI within the next decade that we haven't
| even thought about.
|
| But yeah, you may be right.
| troupo wrote:
| > The concept of self-driving vehicles even a decade ago
| seemed absurd (or even AI for that matter), but now it
| all seems like a reality.
|
| Nope. It's still not close to reality. It's as close to
| reality as it has been for the past 10 years while it was
| being hyped up to be close to reality.
|
| > And that's not even taking into account all the
| advances we'll have with AI within the next decade that
| we haven't even thought about.
|
| As with FSD, we may approach an 80% with the rest 20%
| being insurmountable.
|
| Don't get me wrong, these advances are amazing. And I'd
| love to see an AI capable of what we already _pretend_ it's
| capable of, but it's not even close to these dreams.
| astrange wrote:
| Cruise and Waymo are in production in really difficult
| cities. I think it's definitely in the last 10 years of
| not being here.
| nimithryn wrote:
| I've used GPT4 (text) heavily as part of my business,
| including for front end stuff.
|
| The technology is very impressive - but honestly Twitter
| examples are super cherry picked. Yeah, you can build some
| very ugly, basic front end web pages and functionality right
| out of the box. But if you want anything even slightly
| prettier or more complicated, I've found you need a human in
| the loop (even an outsourced dev is better). I've had GPT
| struggle with even basic back end stuff, or anything even a
| bit out of distribution. It also tends to give answers that
| are "correct" but functionally useless (hard to explain what
| I mean, but if you use it a lot you'll run into this -
| basically it will give really generic advice when you want a
| specific answer. Like, sometimes if you provide it some code
| to find a bug, it will advise you to "write unit tests" and
| "log outputs" even if you specifically instruct it to find
| the bug).
|
| Plus, in terms of capabilities, tools like Figma already have
| design to code functionalities you can use - so I don't think
| this is really a change in usable functionality.
|
| Of course, the tech will get better over time.
| huytersd wrote:
| I've had good results with "match the style of the
| webpage/figma".
| dimgl wrote:
| If AI continues to get better it won't just be you who's in
| trouble.
|
| However, keep in mind that these are cherry-picked. If
| someone just took that output and stuck onto a website, it'd
| be a pretty horrible website. There's always going to be
| someone who manages the code and actually interacts with the
| AI, so there will still be some jobs.
|
| And your boss isn't going to be doing any coding. I'm pretty
| sure that role is still loaded and they'll still be managing
| people rather than coding, and maybe sometimes engaging with
| an AI.
|
| Another prediction: I'm pretty sure specialists are going to
| be significantly more important as your job will be to
| identify the AI's deficiencies and improve on it.
| Tostino wrote:
| Your job will change in fundamental ways at least.
| yieldcrv wrote:
| The more people say that, the less convincing it is
|
| There is no way I would have a UI developer onboarded when
| I can generate many iterations of layouts in midjourney,
| copy them into chatgpt4 and get code in NextJS with
| Typescript instantly
|
| Non-devs will have trouble doing this or thinking of the
| prompts to ask, but the dev team simply won't ask for
| headcount, and the engineering manager is going to find the
| frontend-only dev redundant
| bamboozled wrote:
| Will you then use the AI to scale your platform ?
| Optimise your database ? Improve your test coverage,
| implement new features, write new backend services,
| integrate with old difficult but critical systems?
|
| At some stage you must realise that you're still
| working...
| HDThoreaun wrote:
| what does any of this have to do with frontend?
| yieldcrv wrote:
| put our two comments into an LLM and ask if these are even
| related thoughts
|
| because at least it will have actually read my comment
| bamboozled wrote:
| I'm going to say you edited or amended your comment, because
| that second paragraph wasn't there. If it was, I was so
| underwhelmed with the first that I guess I didn't bother
| with the second.
| yieldcrv wrote:
| my comment is over 3 hours old, it is not possible to
| have edited by the time you wrote yours
| raincole wrote:
| > when I can generate many iterations of layouts in
| midjourney, copy them into chatgpt4 and get code in
| NextJS with Typescript instantly
|
| Have you actually tried this?
|
| I did the first step and even that didn't work well. The
| "iterations of layout in MidJourney" step. If people can
| make it work, well bless them, but we're not getting rid
| of our graphic designer now.
| epups wrote:
| Yeah, I'm also skeptical about the actual value of
| specialists in the future. To me, AI brings a ton of
| power to generalists, who now have access to very
| powerful tools that would have taken them a long time to
| learn otherwise.
| lewhoo wrote:
| > There is no way I would have a UI developer onboarded
| when I can generate many iterations of layouts in
| midjourney
|
| I guess we have to face the music and say yeah, that's
| true. If the work doesn't need copyrights then this seems
| like the way to go.
| zarzavat wrote:
| Job will be okay. Career is over. Maybe we should join the
| writers on the picket line?
| qingcharles wrote:
| LOL. I already use it to write half my CSS, and that was
| before it could see what I was fucking up. I'm looking
| forward to this.
| moffkalast wrote:
| Real talk, GPT4 is an underrated CSS god, shaking
| description-accurate animations from its sleeve at
| ridiculous speed.
|
| Reminds me of this apparently now eight year old meme:
| https://i.imgur.com/GcZFBaT.png
|
| This used to be funny, now it's just Tuesday.
| yieldcrv wrote:
| Oh I love this
|
| Especially since everything else is "sign up to our waitlist"
| idontknoworcare wrote:
| In your Twitter comic book link, the first image shows a woman
| with a huge ass and no pants on. The ass is colored grey, so
| the lack of clothes is not noticeable until you click into
| the image.
|
| How many children here on hacker news are going to see this and
| get addicted to porn? Perhaps a few. You deserve to be banned.
| HDThoreaun wrote:
| It didn't successfully explain the NVIDIA burger joke though? The
| image is making fun of how NVIDIA has implemented price
| discrimination by releasing consumer GPUs that don't have as
| much VRAM as they should, so that it can sell the fully loaded
| datacenter GPUs at insane markup without pissing off gamers. The
| explanation didn't even come close to getting that.
| gwern wrote:
| I don't think that's the right answer. I don't see anything in
| that meme image that indicates it's about price discrimination
| and elaborate narratives about consumer psychology, as opposed
| to a simpler point that 'NVIDIA GPUs are imbalanced'. Skimming
| what seems to be the meme origin on Facebook, I don't see a
| single gamer there mentioning price discrimination, or coming
| even close to getting it. That may be why they skimp on VRAM,
| but you're reading a lot of context & editorializing into it
| that neither the meme author nor its audience is focused on or
| even understands.
| Melatonic wrote:
| Yeah that was my take as well. Although it did come up with a
| plausible sounding answer. A less nerdy human might also not
| get it
| HDThoreaun wrote:
| To be fair, the author didn't get it either, so it's hard for
| me to blame GPT, but it was a bit wild to see the first example
| of success actually be a failure.
| jazzyjackson wrote:
| good example of how you need to be smarter than the AI to
| know whether it's correct or not
|
| if you don't know the answer, and the AI just pretends to
| know the answer, you just have to take its word for it.
| wlesieutre wrote:
| It explains the broad joke, but gets the labels wrong.
|
| Says that the small buns are labeled "GPU and VRAM" and the
| massive fried chicken is labeled "NVIDIA BURGER"
|
| Should say that the small bun is labeled "VRAM" and the massive
| fried chicken is labeled "GPU"
| gs17 wrote:
| Yeah, it seems to think the joke is about the graphics card
| being represented as a burger (and that the size is physical)
| and not about the amount of VRAM being insufficient.
| layer8 wrote:
| The discrepancy between the two answers regarding the set of
| coins is jarring. From the answer to the first question, one
| would assume that it can't tell the currency. The answer to the
| second question shows that it actually can. The fact that LLMs
| don't reflect a consistent inner model in that way, and hence the
| users' inability to adequately reason about their AI
| interlocutor, is currently a severe usability issue.
| og_kalu wrote:
| >The fact that LLMs don't reflect a consistent inner model in
| that way
|
| You're probably not going to ask any human a question about an
| image and get every single detail you want every time. If you
| care about a detail, just ask about it. Doesn't really have
| anything to do with a consistent inner model.
| zwily wrote:
| I've gotten in the habit of asking chatgpt "are you sure?" So
| many times it will (correctly) correct itself, state that items
| are hallucinations, etc. It always makes me laugh.
| Workaccount2 wrote:
| I've heard that it is because the AI outputs what it is
| thinking as it is thinking it. It doesn't really reflect, it
| sort of does the equivalent of streaming its verbal train of
| thought right onto the screen.
|
| So when you ask it to reflect on what it said, that's when it
| actually looks at it and reflects on it.
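|
| For what it's worth, the same trick works over the API: send the
| model's first answer back and ask it to review. A minimal sketch,
| assuming the pre-1.0 openai Python client and an OPENAI_API_KEY
| in the environment (the prompts are just placeholder examples):
|
| ```
| import openai
|
| # Turn 1: ask the question.
| history = [{"role": "user", "content": "List three papers on X."}]
| first = openai.ChatCompletion.create(model="gpt-4",
|                                      messages=history)
| answer = first["choices"][0]["message"]["content"]
|
| # Turn 2: feed the answer back so the model reviews it with the
| # full text in context, instead of mid-generation.
| history += [
|     {"role": "assistant", "content": answer},
|     {"role": "user", "content": "Are you sure? Flag anything "
|                                 "you may have made up."},
| ]
| second = openai.ChatCompletion.create(model="gpt-4",
|                                       messages=history)
| print(second["choices"][0]["message"]["content"])
| ```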
| orbital-decay wrote:
| _> The bounding box coordinates returned by GPT-4V did not match
| the position of the dog._
|
| I suppose it just doesn't take image dimensions into
| consideration, and needs to be provided with the max
| dimensions, or prompted to give percentages or other relative
| values instead of pixels.
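|
| A minimal sketch of that workaround: prompt for coordinates as
| fractions of the image size and rescale locally. (The fractional
| (x0, y0, x1, y1) output format here is an assumption you'd have
| to prompt for, not a documented GPT-4V feature.)
|
| ```
| from PIL import Image
|
| def denormalize_bbox(bbox, image_path):
|     """bbox is (x0, y0, x1, y1), each value in [0, 1]."""
|     width, height = Image.open(image_path).size
|     x0, y0, x1, y1 = bbox
|     return (round(x0 * width), round(y0 * height),
|             round(x1 * width), round(y1 * height))
|
| # e.g. the model answers "the dog is at (0.12, 0.40, 0.55, 0.93)"
| print(denormalize_bbox((0.12, 0.40, 0.55, 0.93), "dog.jpg"))
| ```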
| abledon wrote:
| https://twitter.com/cto_junior/status/1706289820702490839
| greatpostman wrote:
| I'm shocked at how good this is. The world is truly going to
| change
| [deleted]
| fdeage wrote:
| Could someone with access tell me what GPT-4V has to say of this
| image?
|
| http://karpathy.github.io/assets/obamafunny.jpg
|
| Andrej Karpathy used it in 2012 as an example of an image he
| thought would be extremely hard for a model to interpret. I'm
| wondering how this holds up 11 years later...
| zamadatix wrote:
| Prompt: "What do you have to say about this image?"
|
| Response 1: "The image depicts a candid moment in what appears
| to be a hallway or corridor setting. Several individuals are
| present, engaging in various activities. The prominent features
| include a man on the left, who seems to be peering into a room,
| and another man next to him standing on a weighing scale and
| taking some notes. The individuals in the background are
| conversing with each other. The overall mood seems light-
| hearted and jovial. The architecture and interior suggest an
| institutional or official setting, possibly an office building
| or government facility. The tiled floor, lockers, and signs on
| the walls contribute to this formal ambiance. Overall, it's a
| snapshot of a seemingly casual moment in a professional
| setting."
|
| Response 2: "Sorry, I cannot help with that."
|
| New chat, Prompt: "Why is this image funny?"
|
| Response 1: "The humor in this image seems to stem from the
| juxtaposition of formal and informal elements. Here are a few
| reasons why someone might find it amusing:
|
| Unexpected Situation: Important figures, who are usually seen
| in serious or formal settings, are captured in a more relaxed,
| candid, and human moment.
|
| Height Discrepancy: The taller individual is using a device to
| measure height, while the shorter individual walks by. This
| could create a comedic contrast between their heights.
|
| Expressions: The reactions of people in the background and the
| focal subjects can be humorous. They are smiling or laughing,
| which indicates something amusing might have occurred or been
| said.
|
| Informal Setting: The backdrop of lockers and checkered floor
| tiles gives the feel of a school or gym environment, which
| contrasts with the formal attire of the individuals.
|
| Different people might find humor in different elements of the
| photo. It's a mix of candidness, unexpected situations, and
| visual contrasts that create a light-hearted and humorous
| scene."
|
| Response 2: (same)
|
| New chat, prompt: "What is the person in the middle doing with
| their foot and why?"
|
| Response: "The person in the middle appears to be playfully
| stepping on the scale while the taller individual is weighing
| himself, likely as a humorous gesture to momentarily increase
| the weight reading. It's a light-hearted way of interacting and
| bringing some humor to the situation."
|
| .
|
| Apologies for how bad the formatting of this is going to come
| out, not sure how to make it better on HN (wish we had real
| quotes not just code blocks). Overall, I don't think it noticed
| on its own that the foot was on the scale, or put together that
| this was the focus, until fed that information. Otherwise it
| was more lost in generalities about the image.
| derangedHorse wrote:
| Still cool that it elaborated once you pointed it out
| ediardo wrote:
| Bard:
|
| Prompt: What's funny about this image?
|
| Bard: Sorry, I can't help with images of people yet.
| [deleted]
| kristopolous wrote:
| This actually doesn't seem like it's a giant lift using modern
| image classifiers.
|
| The basic idea is to use captioning models to turn the image
| into descriptive text and prepend that to the prompt.
|
| The work part is getting the ensemble right since you'll need to
| use a general classifier, like BLIP, to tell apart, say, a
| bunch of text from a plant and then, in this example, route to
| structured OCR or pl@ntnet to get more specific.
|
| But it's not that hard - maybe a dozen models. The prompt context
| can help as well. Then you combine the output with qualifiers in
| a hierarchy with respect to the model pipeline and swap the text
| into the prompt.
|
| Using examples from the article, here's a PoC framework to prove
| it works
|
| "[I have] (photo description) (prompt)"
|
| ---
|
| Working Examples
|
| ---
|
| - Plant:
|
| Here's the flower photo from TFA: https://9ol.es/tmp/lily.jpg
|
| Go to https://identify.plantnet.org/ and upload it. It hits
| "Spathiphyllum wallisii Regel/Peace lily" with extremely high
| confidence.
|
| We got a match cropping a screenshot of a thumbnail!
|
| Let's say you didn't have the word "plant" in the prompt. You can
| fall back on a universal image classifier, such as the BLIP
| captioner here: https://huggingface.co/Salesforce/blip-image-
| captioning-base (uploader is on the right)
|
| Upload the same image. You'll get "a plant in a white pot" which
| then, because we use feed-forward networks these days, will lead
| you to pl@ntnet and you'll get the peace lily again.
|
| Using our framework, ask GPT 3.5 " I have a Spathiphyllum
| wallisii Regel/Peace lily. What is that plant and how should I
| care for it?"
|
| And you get a nearly identical reply to the one in the article.
|
| - Penny:
|
| Upload the penny image (from
| https://en.wikipedia.org/wiki/Penny_(United_States_coin)) to the
| BLIP classifier and you get "a penny coin with the face of
| abraham"
|
| Let's go back to GPT 3.5 and use our format from above,
|
| "I have a penny coin with the face of abraham. What coin is
| that?"
|
| And of course you get: "A penny coin with the face of Abraham
| Lincoln is most likely a United States one-cent coin, commonly
| known as a "Lincoln penny"..."
|
| And there we go. For a full FLOSS stack, you can ask llama2 70b
| https://stablediffusion.fr/llama2 and get "The face of Abraham
| Lincoln is featured on the United States one-cent coin, commonly
| known as the penny."
|
| more complex photos:
|
| You can use Facebooks SAM (segment anything) https://segment-
| anything.com/ to break up the image, BLIP caption the segments,
| then forward off to the specialized classifiers.
|
| It's a fairly intensive pipeline that requires lots of modern
| hardware and familiarity with a wide variety of models; you
| have to tweak them, test it, maybe set up some GANs for
| refinement ... but this is well within reach of non-geniuses.
| I'm merely average on a good day and even I can see how to set
| this up.
|
| They might be using a different approach but using SAM, BLIP and
| a few specialized classifiers covers all the examples in the
| articles without using any human discretion. For instance, the
| city one is way more powerful if they're using something like
| this:
| https://static.googleusercontent.com/media/research.google.c...
|
| I'm trying to come up with a reason to bother cloning it. Maybe
| to have a free alternative? It's a bit of work but it's not new
| magic.
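|
| To make the framework concrete, here's a minimal sketch of the
| caption-then-prompt step, assuming the Hugging Face transformers
| BLIP checkpoint linked above and the pre-1.0 openai client (SAM
| segmentation and the pl@ntnet/OCR routing are left out):
|
| ```
| import openai  # assumes OPENAI_API_KEY is set
| from PIL import Image
| from transformers import (BlipProcessor,
|                           BlipForConditionalGeneration)
|
| CKPT = "Salesforce/blip-image-captioning-base"
| processor = BlipProcessor.from_pretrained(CKPT)
| model = BlipForConditionalGeneration.from_pretrained(CKPT)
|
| def caption(image_path):
|     # BLIP turns the image into text, e.g. "a plant in a white pot"
|     image = Image.open(image_path).convert("RGB")
|     inputs = processor(image, return_tensors="pt")
|     out = model.generate(**inputs, max_new_tokens=30)
|     return processor.decode(out[0], skip_special_tokens=True)
|
| def ask_about_image(image_path, prompt):
|     # "[I have] (photo description) (prompt)"
|     text = f"I have {caption(image_path)}. {prompt}"
|     reply = openai.ChatCompletion.create(
|         model="gpt-3.5-turbo",
|         messages=[{"role": "user", "content": text}],
|     )
|     return reply["choices"][0]["message"]["content"]
|
| print(ask_about_image(
|     "lily.jpg",
|     "What is that plant and how should I care for it?"))
| ```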
| steveads wrote:
| I use ChatGPT fairly often, but am frequently annoyed at how
| hesitant it is to answer any question that is somewhat
| subjective.
|
| From its Pulp Fiction response: "...However, whether you
| personally consider "Pulp Fiction" a good movie depends on your
| taste in film."
|
| To avoid this noise I often add something like this to my queries
| "skip any preamble or qualifications about how x is subjective"
| and the results are much better.
| Loeffelmann wrote:
| My prompt to make ChatGPT useable:
|
| Please always give a direct answer, without any additional
| explanations, disclaimers, expertise limitations, or guidelines
| on human interaction. Be terse. Do not offer unprompted advice
| or clarifications. Remain neutral on all topics. Never
| apologize.
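|
| Over the API, the equivalent is pinning this as the system
| message so you don't have to paste it into every query (in the
| ChatGPT UI, the "Custom instructions" setting serves the same
| purpose). A sketch with the pre-1.0 openai client, assuming an
| OPENAI_API_KEY in the environment; the question is arbitrary:
|
| ```
| import openai
|
| # Standing instructions go in the system message; they then
| # apply to every user turn in the conversation.
| SYSTEM_PROMPT = (
|     "Please always give a direct answer, without any additional "
|     "explanations, disclaimers, expertise limitations, or "
|     "guidelines on human interaction. Be terse. Do not offer "
|     "unprompted advice or clarifications. Remain neutral on all "
|     "topics. Never apologize."
| )
|
| response = openai.ChatCompletion.create(
|     model="gpt-4",
|     messages=[
|         {"role": "system", "content": SYSTEM_PROMPT},
|         {"role": "user",
|          "content": "Is Pulp Fiction a good movie?"},
|     ],
| )
| print(response["choices"][0]["message"]["content"])
| ```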
| matsemann wrote:
| None of the images load for me, but they work through the cache:
| http://webcache.googleusercontent.com/search?q=cache:https:/...
| zerojames wrote:
| This is now fixed. We have moved the images through to our
| website. Thank you for the report!
| [deleted]
| yeldarb wrote:
| Looks like the (Ghost?) image CDN got hugged to death. We'll
| update the URLs.
|
| ```
|
| 403. That's an error.
|
| Your client does not have permission to get URL ... from this
| server. (Client IP address: ...)
|
| Rate-limit exceeded That's all we know.
|
| ```
| jihadjihad wrote:
| > With that said, GPT-4V did make a mistake. The model said the
| fried chicken was labeled "NVIDIA BURGER" instead of "GPU".
|
| Any midwesterner could tell you that CLEARLY it's a tenderloin :)
|
| https://www.seriouseats.com/best-breaded-pork-tenderloin-san...
| qingcharles wrote:
| LOL. They have to save the Midwesterner add-on for v2.
| ldhough wrote:
| Oddly, just like the text version, it is still really bad at
| tic-tac-toe. I gave it a picture of a completed game and asked
| "Who won?" It told me "X won with a vertical line through the
| middle column"
| when in fact O won and there was only one X in the middle column.
|
| Very impressive with almost everything else I gave it though.
| extr wrote:
| One thing I found incredible recently: The advanced data analysis
| version of GPT-4 can apparently do OCR by writing code to run
| pytesseract and then interpreting the result. I was seeing what
| would happen if I gave it an image and found it surprising that
| it actually worked.
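|
| The generated code varies from run to run, but it's roughly this
| kind of snippet (the /mnt/data path is where the sandbox puts
| uploads; treat the details as an illustration, not a transcript):
|
| ```
| # Run Tesseract over the uploaded image, then reason over the
| # extracted text in the chat.
| from PIL import Image
| import pytesseract
|
| image = Image.open("/mnt/data/uploaded_image.png")
| text = pytesseract.image_to_string(image)
| print(text)
| ```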
| zerojames wrote:
| All: We got the HN hug on our images. This has now been fixed.
| gs17 wrote:
| >The model appeared to read the clues correctly but
| misinterpreted the structure of the board.
|
| >This same limitation was exhibited in our sudoku test, where
| GPT-4V identified the game but misunderstood the structure of the
| board
|
| "Misunderstood" makes it sound like a small mistake. The sudoku
| board is completely hallucinated (it has a few similar regions,
| but I'd presume coincidence). I'm pretty sure it would give as
| good a result on the crossword if the clues were given without
| the grid.
|
| The others after OCR and basic recognition feel similarly wrong.
| "GPT-4V missed some boxes that contained traffic lights." No, it
| told you to click boxes that do not exist.
| wokwokwok wrote:
| I'm impressed, technically, but this seems niche.
|
| Who holds their phone up and takes a photo, then wants to know
| what it was a photo of?
|
| That's weird. If you don't know what it is, wtf did you take a
| photo for?
|
| The obvious use here is natural-language photo improvement /
| editing, but this is just a stepping stone to that, and
| bluntly, as it stands... the examples really don't shine...
|
| Great for the vision impaired.
|
| ...not sure, what anyone else will use this for.
|
| The only really compelling use case is "code this UI for me",
| but as we've seen, repeatedly, this kind of code generation only
| works for trivial meaningless examples.
|
| Seems fun, but I doubt I'd use it.
|
| (Which, and this is my point, is a massive step away from the
| current everyday usefulness of chatgpt)
| ryandr wrote:
| One application could be object recognition in security camera
| footage
| BoxFour wrote:
| > who takes a photo then wants to know what it was a photo of?
|
| I find myself doing this rather frequently. The scenario
| described in the article is quite common for me: capturing a
| photo of a plant and utilizing an existing classification
| service to determine its identity. It could be driven by mere
| curiosity or practical concerns like identifying whether a
| plant is poison ivy.
|
| Wildlife identification also falls into this category.
| Recognizing different bird species can be challenging,
| especially when it's not a familiar species like a blue jay. I
| find myself doing this quite regularly!
|
| EDIT: I should also point out this happens with other forms of
| 'unknown object identification'. There's an entire subreddit
| that's quite popular devoted to just crowd-sourcing
| identification based on a picture.
| ldhough wrote:
| > Wildlife identification
|
| Wouldn't say this is super reliable, I gave it a photo of a
| small squid in my hand and it said it was a baby fish (
| _very_ obviously was not a fish).
| joshuahedlund wrote:
| > Recognizing different bird species can be challenging,
| especially when it's not a familiar species like a blue jay.
|
| FYI Cornell Lab's Merlin app is fantastic at this, and its
| bird call audio identification is even better. They obviously
| have some top-notch machine learning going on there, and I'm
| really curious to see how both they and other services
| innovate on this front in the months to come.
| yieldcrv wrote:
| > Who holds their phone up and takes a photo, then wants to
| know what it was a photo of? That's weird. If you don't know
| what it is, wtf did you take a photo for?
|
| OpenAI's example included bike repair and toolkit choice
|
| A lot of people could use this even if they aren't right now
| wokwokwok wrote:
| Don't be ridiculous.
|
| They'll use YouTube, just like they do right now. Maybe if it
| could watch the video, then step you through it step by step.
| ...but _it can't_, with what they've _actually released_
| here.
|
| Oh whatever. If I'm wrong, I'm wrong. Time will tell.
| yieldcrv wrote:
| the _best case_ scenario is a 30 second youtube video with
| an ad that lasts 15 seconds followed by a 2 minute ad that
| I can skip in 5 more seconds
|
| and ad block doesn't work on mobile
|
| if you have a case that wasn't covered by that video? you
| have to go to another or continue searching all while
| wishing you could just talk to someone about it. if you
| don't know the word for what you're looking for, all the
| search engines lack utility.
|
| ChatGPT4 with image recognition and conversation solves all
| of that use case and people already use it, so now they'll
| just start sending it pictures from the phone already in
| their hand that they're already using to chat with
|
| there are plenty of times over the last year that would
| have been useful for me. plenty of times over the last year
| I just didn't continue being interested in that problem
|
| it just seems kind of... late?... for that "don't be
| ridiculous" reaction. classic dropbox moment
| troupo wrote:
| > Who holds their phone up and takes a photo, then wants to
| know what it was a photo of?
|
| I do. For plants, and occasionally for birds.
| jack_riminton wrote:
| I'm bookmarking this and filing it alongside the infamous
| hacker news dropbox comment:
|
| https://news.ycombinator.com/item?id=9224
| jack_riminton wrote:
| After all, an agent that is able to see, listen, and
| communicate with a user, backed by a superintelligent grasp of
| the world's information. Niche at best!
| Maken wrote:
| This is mostly useless. Essentially a toy. I am not that hyped
| about AI tools either, but come on. This is clearly the
| future of human-computer interaction.
|
| This is likely how we'll communicate with information systems:
| throw some hand-wavy question at it, and refine your query
| based on its output using natural language until you find the
| answer (or even the question) you were looking for.
| pol9000 wrote:
| it's still a good PoC for robots?
| pwython wrote:
| > Who holds their phone up and takes a photo, then wants to
| know what it was a photo of?
|
| There are several popular "r/whatisthis(x)" subreddits:
| whatisthisthing, whatisthisbug, whatisthisplant,
| whatisthissnake, whatisthisrock, etc.
|
| And there are many phone apps that attempt to do the same
| thing, like CoinSnap to identify coins.
| otoburb wrote:
| >> _Who holds their phone up and takes a photo, then wants to
| know what it was a photo of?_
|
| >> _Great for the vision impaired._
|
| Yes, this is great for the estimated 285 million vision
| impaired people around the world[1].
|
| [1] https://www.bemyeyes.com/about
| wokwokwok wrote:
| Did you read my comment? I literally said that it's great for
| the vision impaired.
|
| That's great. ...but it's _niche_.
|
| I'm sitting on my couch right now and I can think of like 20
| things I could chat to chatgpt about.
|
| I can see literally nothing in my visual range that I want to
| take a photo of and run image analysis over.
|
| It's like Shazam. Yes, it's useful, but, most of the time, I
| don't need it.
|
| I would argue this is true for this, for most people,
| including the significant proportion of people with minor
| visual impairments (that would, you know, put their glasses
| on instead).
| epups wrote:
| You are thinking too small here. Think about a system that
| could provide you with sophisticated image analysis to the
| level of a human expert. Tumor identification, quality
| control, etc. I don't think we are there quite yet, but
| this is clear and fast progress in that field.
| bastawhiz wrote:
| There's enough vision-impaired people in the world to equal
| the population of Japan, Korea, and Vietnam combined. And
| beyond those people who would get obvious utility, this is
| essentially Google Lens on steroids--I simply can't figure out
| how you could call this "niche". Maybe _you_ won't use it
| multiple times per day, but plenty of people will. Hell,
| just now I was wondering why the leaves on one of my plants
| are starting to brown and could have used this.
| wokwokwok wrote:
| Many of those people would be adequately served with a
| pair of glasses.
|
| _This_ , wouldn't help them, even if they had both a
| device capable of using it _and_ the means to pay for it.
|
| Even if it _could_ help people, it's an open question if
| it would be safe to, for example, use this to scan
| medication when it is only a probabilistic model that may
| hallucinate something that isn't actually there.
|
| What you're talking about is a _speculative use_ of a
| service that _might one day exist_ based on this
| technology.
|
| What I am talking about is _this actual service_.
| bastawhiz wrote:
| > Many of those people would be adequately served with a
| pair of glasses.
|
| These people are being served by a preview of the service
| _right now_.
|
| > Even if it could help people, it's an open question if
| it would be safe, to, for example, use this to scan
| medication when it is only a probabilistic model that may
| hallucinate something that isn't actually there.
|
| Any OCR solution could also make a mistake, like
| misrecognizing a dosage on a prescription label.
|
| > What you're talking about is a speculative use of a
| service that might one day exist based on this
| technology.
|
| > What I am talking about is this actual service.
|
| GPT-4 is six months old. ChatGPT is less than a year old.
| Why would you benchmark a service by the initial public
| preview? Of course it's _speculative use_, the damn thing
| has had its tires kicked for like a day.
| og_kalu wrote:
| >What you're talking about is a speculative use of a
| service that might one day exist based on this
| technology.
|
| It's not a speculative service that might one day happen.
|
| Literally it's rolling out right now
|
| https://www.bemyeyes.com/blog/announcing-be-my-ai
| Philpax wrote:
| baffling that you think 3.5% of the world's population is a
| niche
| digging wrote:
| > I'm sitting on my couch right now and ... I can see
| literally nothing in my visual range that I want to take a
| photo of and run image analysis over.
|
| What if you weren't on your couch? Going outside is not
| "niche".
| pjmlp wrote:
| No images are loading on FF.
| stackedinserter wrote:
| Am I the only one who never saw these extra capabilities? OpenAI
| keeps spreading news about "multimodal ChatGPT", video, voice,
| etc., but to me it still looks like the initial version, just for
| $20/mo. It's still a text bot that gives unreliable answers.
| abrichr wrote:
| I have the same experience. In the "Beta features" settings
| dialog, I've enabled "Advanced data analysis":
|
| > Try a version of ChatGPT that knows how to write and execute
| python code, and can work with file uploads. Try asking for
| help with data analysis, image conversions, or editing a code
| file. Note: files will not persist beyond a single session.
|
| Transcript:
|
| ```
|
| In your settings it says:
|
| > Try a version of ChatGPT that knows how to write and execute
| python code, and can work with file uploads. Try asking for
| help with data analysis, image conversions, or editing a code
| file. Note: files will not persist beyond a single session.
|
| How do I upload an image?
|
| ChatGPT: I apologize for the confusion. The version of ChatGPT
| you're currently interacting with doesn't have the capability
| to accept file uploads. My apologies for any inconvenience this
| may have caused. If you have any other questions or need
| guidance on how to convert an image using software or online
| tools, feel free to ask.
|
| ```
|
| Hopefully it's just a matter of time, but either way it's
| jarring for their product to contradict itself.
| stri8ted wrote:
| Can somebody explain how this works, specifically for OCR? I
| understand images can be embedded into the same high dimensional
| space as text, but wouldn't this embedding fail to retain the
| exact words and sequence, since it is effectively compressed?
| purplecats wrote:
| these first impressions don't mean anything beyond what the
| models are capable of (which does not mean you will have access
| to it). they will do the same thing that anything does in a
| capitalist environment, which is to give you a taste of
| something amazing at first to hook you in (like with GPT4),
| then degrade it to near-uselessness, just above the cusp of
| what you will tolerate to continue paying.
|
| if anything, this shows the power disparity between the haves
| (they have this technology which gets better with time) and have
| nots (certainly me, but possibly also you) who get the super
| diluted version of this
| pier25 wrote:
| It can solve captchas. We're doomed.
|
| Joking aside, I wonder how we're going to prevent bots when AI
| can impersonate a user and fool any system.
| stri8ted wrote:
| You can't prevent it. The best you can do is prove an account
| belongs to a human, and that the human only has a single
| account, via cryptographic ZK proofs + Government issued keys
| or some other proof of personhood scheme. Assuming this is
| enforced, it would limit most abuse, and the AI would
| essentially be acting as an agent on behalf of the user.
| continuitylimit wrote:
| So a jumble of chair legs is an "NVIDIA burger", and it did say
| the GPU was a "bun", so it thinks the flat thing (chicken?) is
| some sort of bread. If GPT-4V were "aware", it would say "it's
| funny because I won't get it right but you will use it to get a
| bunch of $VC, and that is funny, kinda".
| [deleted]
| mbb70 wrote:
| The "Why is this image funny?" test reminds me of
| https://karpathy.github.io/2012/10/22/state-of-computer-visi...
|
| In 10 years we went from "SoTA is so far from achieving this I
| don't even know where to start" to "That'll be $0.0004 per token
| and have a nice day"
| mode80 wrote:
| Karpathy ends on a note of despair: "Maybe I should just do a
| startup. I have a really cool idea for a mobile local social
| iPhone app." That's exactly the path his now-boss took to bring
| us this. :)
| jihadjihad wrote:
| Has anyone tried GPT-4V on that image?
| kaibee wrote:
| This image (and the text around it) is certainly already in
| the training data. Need to create that scenario from scratch
| with some grad students and then try it.
| rocauc wrote:
| +1
| justlikeyou wrote:
| Note: I had to ask it why people in the photo are laughing.
|
| In the image, Barack Obama, the former U.S. President, seems
| to be playfully posing as if he's trying to add weight while
| another official, who appears to be former UK Prime Minister
| David Cameron, is standing on a scale. Obama's gesture, where
| he's putting his foot forward as though trying to press down
| on the scale, suggests a playful attempt to make Cameron
| appear heavier. The lightheartedness of such a playful
| gesture, especially in the context of world leaders typically
| engaged in serious discussions, is a break from formality,
| which is likely why others in the vicinity are laughing. The
| scene captures a candid, informal moment amidst what might
| have been a formal setting or meeting.
| jihadjihad wrote:
| Pretty damn good. According to Wikimedia [0]:
|
| "President Barack Obama jokingly puts his toe on the scale
| as Trip Director Marvin Nicholson, unaware to the
| President's action, weighs himself as the presidential
| entourage passed through the volleyball locker room at the
| University of Texas in Austin, Texas, Aug. 9, 2010.
| (Official White House Photo by Pete Souza)"
|
| 0: https://commons.wikimedia.org/wiki/File:White_House_Trip
| _Dir...
| [deleted]
___________________________________________________________________
(page generated 2023-09-27 23:00 UTC)