[HN Gopher] MM1: Methods, Analysis and Insights from Multimodal ...
___________________________________________________________________
MM1: Methods, Analysis and Insights from Multimodal LLM Pre-training
Author : lord_sudo
Score : 96 points
Date : 2024-03-16 01:27 UTC (21 hours ago)
(HTM) web link (arxiv.org)
(TXT) w3m dump (arxiv.org)
| a_vanderbilt wrote:
| I wonder if this has anything to do with their acquisition of
| DarwinAI. After a decade of mediocrity, I'd love to see Siri get
| smarter. Any improvement would be welcome at this point.
| sroussey wrote:
| I agree. The whole push to have Siri work on device was a noble
| one, but I'd rather have the option of a dumber on-device Siri
| or a smarter in-the-cloud Siri.
| epaga wrote:
| Mediocrity is far too positive a word for the dumpster fire
| that is Siri.
| kstrauser wrote:
| I hear that a lot, and I have no desire to tell you your
| opinion's wrong, but it doesn't match my experience.
| Siri's... _fine_, I guess, for what I ask of it like setting
| timers and reminders and such.
|
| It's not perfect, for sure:
|
| Me: Hey Siri, turn off the kitchen lights.
|
| Siri: I can't process multiple requests.
|
| Me: Hey Siri, turn off the kitchen lights.
|
| Siri: OK.
|
| But it works reliably enough that I use it all the time for
| the reminder and timer actions. Is it vastly worse for other
| people, and in what ways?
| CharlesW wrote:
| The characterization of "mediocre" is fair, but we're
| transitioning a household to Siri from Alexa (because Alexa
| doesn't work locally, and because of Amazon's track record on
| privacy), and it's not noticeably worse.
| MBCook wrote:
| The feeling I've heard from people is Alexa was way better
| than Siri at first.
|
| Over time Siri got better. Not great but better. Alexa has
| mostly stayed the same or perhaps gotten a touch worse
| except for adding ads and other annoyances.
|
| I've never used anything but Siri. It works decently,
| definitely has its moods/dumb-as-a-post moments. But I've
| learned what works well, and for that it's proven very useful.
| Aqua_Geek wrote:
| Honest question: what do you (in the general sense, not
| specifically asking the parent) use Siri for? I think my main
| (only?) use case is setting a timer.
|
| Maybe I find conversational UIs awkward, or maybe I just got
| jaded REALLY quickly from Siri's lacking capabilities early on,
| but I have hardly used it in the decade or whatever that it's
| been around.
| azinman2 wrote:
| I use it almost daily for something that is simple but
| underappreciated, and I don't know why it's not in every marketing
| video: "Siri, remind me tomorrow at 10am to do X"
|
| I outsource so much of my memory to the phone via Siri ALL
| THE TIME. It's so useful. Even for things in 20m. I'll easily
| forget if I don't do this, and it's reliable so it gives me
| confidence. It also keeps the notification present until I
| actually do the thing, so I have a kind of string around my
| finger until the task is accomplished. I can also snooze that
| notification as needed to bring it back up at the right time.
|
| Every time I do this around non-tech people they go "wow I
| didn't know you could do that." I swear it's literally life
| changing, particularly for anyone over 30.
| ribosometronome wrote:
| Especially with Shortcuts, Siri can have some pretty useful
| functionality. The big improvement I'd personally like to see
| is being better able to tap into those actions
| without having to set things up in advance.
| MBCook wrote:
| I'm really hoping for something like that.
|
| A year or so ago I remember someone pointing out in a
| podcast how LLMs are great at taking something like
| general language and turning it into a series of
| predefined commands (the stuff available to shortcuts).
| It would instantly make Siri much more useful.
|
| I think Federico Viticci rigged up something similar or
| at least a powerful demo using Siri + Shortcuts + ChatGPT
| to be able to answer all sorts of questions better than
| native Siri.
| MBCook wrote:
| Yep. Reminders is #1 by far, followed by sending texts,
| turning lights on/off with HomeKit and timers which are
| similar.
|
| I can't imagine reminders w/o Siri because that's how I add
| 90%+ of them. Grocery items, things to do at time X, or
| when I get to (or leave) work/home are the big ones.
| bombcar wrote:
| If Siri could do the following _reliably_ (meaning not having
| to ask again, not having to repeat, having it work 99% of the
| time) it would be golden:
|
| 1. Find my phone via Siri on HomePod
|
| 2. Set a simple timer
|
| 3. Add to a list
|
| 4. Send a text message to one of a few contacts
|
| It _can_ and sometimes _does_ do all of those things, but
| horribly unreliably.
| csnweb wrote:
| For me it really is extremely close to 100% for timers, I
| barely remember it being wrong and I use it several times
| per day. Finding my phone via the HomePod also works pretty
| much every time, maybe 90% for me, but it doesn't recognize
| my wife so for her it basically never works. The others I
| don't use enough. But timers and reminders work really well
| for me and it's also what I need most from an assistant.
| bombcar wrote:
| I've seen similar. They really don't have two-person
| households down pat - timers work great for me (as long as I
| never have to ask how much remaining; I'd die for a
| "count down from 30 seconds") - but for the wife;
| nothing.
| seanmcdirmid wrote:
| Raising blinds, turning on/off lights, and unlocking the
| front door. It is convenient since I can do all those things
| with one command (raise all the blinds and turn off all the
| lights, or raise all the north blinds and lower the south
| ones), it would be a hard problem to create physical buttons
| to do what we needed without running around the room to hit
| various switches.
|
| Google can also do this. Alexa has lots of problems, but it
| can raise a blind in a pinch. We also spent a ton on Lutron
| shades because we discovered that we were just managing them
| too much manually (Siri then is great for controlling that).
|
| You can also ask Siri the weather in the morning, useful in
| figuring out how to dress the kid.
| bionhoward wrote:
| Since they removed "hey" and I got the latest phone, I've
| noticed many little situations where it's faster to speak to
| the device than tap your way around. E.G. when it's locked
| you can say, "Siri, open Spotify" and look at it for face
| unlock, boom. Random stuff. Also Alexa has surprised me
| lately, like a rational response to, "how many sandwiches is
| too many?"
| samatman wrote:
| Personally, I don't want Siri to be 'smarter', if smarter means
| it becomes an open-ended and unpredictable way to have an LLM
| guess what I meant. I'd like Siri to be more powerful, yes.
|
| I like that I can model Siri as a decision tree with voice-
| activated input. Being able to configure it to do more things
| (for example, to put reminders in Things rather than
| Reminders), that would be useful. More discoverability would
| also be great (but this is Apple we're talking about, so good
| luck there). But for me personally, the most important feature
| is that Siri is predictable: once I figure out how to do
| something with it, asking again in mostly the same way will get
| the same result. If I want to talk to an LLM, I have ChatGPT on
| my phone.
| reaperman wrote:
| This looks competitive against CLIP, and surprisingly great at
| VQA style prompts, but it doesn't seem like the paper supports
| comparing it to GPT-4. We don't see any tests for coding
| performance, math homework, legal document review, or any of the
| myriad other things that people use GPT-4 for on a daily basis.
| zshrc wrote:
| Besides homework, all of these things seem to be professional
| uses of GPT-4. If they're trying to bake this into a consumer
| platform like Siri, I don't see why they'd need to focus on
| those use cases. As for MDM/enterprise, it will be interesting
| to see whether they try to attack that market or stick to their
| army of consumer devices.
| reaperman wrote:
| Good insight. My comment was based on the headline that says
| "...Competing with ChatGPT".
| fauigerzigerk wrote:
| They are going to have to focus on the use cases that most of
| their customers use LLMs for, regardless of whether it falls
| in the consumer or professional category or somewhere in
| between.
|
| If all it does is improve Siri a bit without massively
| expanding the range of applications and APIs it will be a big
| disappointment.
|
| I think what Apple presents in June will decide whether on-
| device AI will be seen as a viable alternative to cloud APIs.
| smokel wrote:
| The paper lists "first authors", "core authors", and "senior
| authors".
|
| My dream is to one day be listed on a seminal paper as "secondary
| forum reply author".
| verticalscaler wrote:
| Holy inferiority complex batman!
|
| You can aspire higher and just use one of these LLMs to be a
| "first author" in a published peer reviewed paper.
| peddling-brink wrote:
| Similarly, I'd like the movie credit Second Assistant to the
| Second Second Assistant Director.
| Turing_Machine wrote:
| "Junior Assistant Vice-Dean" (or variants thereof) in
| academia. Those mostly exist to give a pay boost to
| administrators who've otherwise maxed out on pay.
|
| I recall that my undergrad institution once invented a new
| deanship out of whole cloth for a coach who'd maxed out on
| the "professor" pay scale.
|
| Even worse, the bastard didn't even win games!
| smokel wrote:
| In that case, I highly recommend watching the movie
| Synecdoche, New York (2008).
|
| PS Can I be your hairdresser?
| jebarker wrote:
| Speaking as someone working in the field, I find it amusing how
| much researchers working on automating human work care about
| human credit assignment.
| brookst wrote:
| The paper explores different design choices for various parts of
| the model and draws conclusions about the relative importance of
| optimizing each area (image encoder very important, vision-
| language connector less so).
|
| The actual set of models produced (up to 30B parameters) seems
| secondary to the intent of the paper, and is more validation of
| the best design choices in each area.
| lolinder wrote:
| MM1 is a research paper, not a release of a competing product.
| I'm sure the paper is interesting and am looking forward to
| reading an analysis of it by someone who understands these things
| better than I do, but this is not that analysis, it's an
| extremely low-effort puff piece that is more interested in
| getting attention than in accurately describing a research paper.
|
| I don't usually say this, but TFA frankly feels like it was
| written by AI:
|
| > The release of MM1 by Apple contributes significantly to the
| artificial intelligence domain, offering a detailed roadmap for
| the development of future MLLMs. By sharing the insights and
| design principles gleaned from MM1, Apple not only challenges the
| current capabilities of models like ChatGPT but also invites the
| broader AI community to build upon their findings, potentially
| leading to more sophisticated and capable AI systems.
| basicallybones wrote:
| I believe most run-of-the-mill marketing language will sound
| like it was written by AI. The easiest thing to do for
| technology writing is to write the complete, factual article,
| then ask an LLM to dumb it down to whatever level you need for
| communication.
| JimDabell wrote:
| No, I agree this really does seem autogenerated, or at the
| very least written by somebody who doesn't understand the
| topic at all and is going through the motions of padding
| things out to hit a hype / word count. It's got that weird
| summary focusing on the wrong things and wild speculations
| dressed up as serious predictions vibe, like there are words
| saying things in places because there are supposed to be
| words there and not because it's actually imparting useful
| information.
| CharlesW wrote:
| Out of curiosity, where are you seeing this? It's not in the
| abstract or the paper.
| JimDabell wrote:
| Some of these comments were originally made in response to
| this spammy submission:
|
| https://news.ycombinator.com/item?id=39726156
| CharlesW wrote:
| Ah! Makes sense now, thank you.
| lolinder wrote:
| Oh, thank you! I didn't know we'd been moved.
| refibrillator wrote:
| Biggest model is 30B MoE trained on 100B tokens, max sequence
| length 4096. A bit underwhelming compared to recent announcements
| like the open source Large World Model [1].
|
| Absolutely no benchmarks against GPT4 present in the paper.
|
| Notably they used instruction-response pairs generated from GPT4
| for supervised fine-tuning. That has always felt like an
| experimental hack to me, but that's how many folks are
| bootstrapping smaller models these days, and the effectiveness is
| hard to argue with.
|
| Apple's axlearn framework, which leverages JAX and XLA, was used
| [2].
|
| [1] https://news.ycombinator.com/item?id=39367141
|
| [2] https://github.com/apple/axlearn
| AJRF wrote:
| > Absolutely no benchmarks against GPT4 present in the paper.
|
| Table 4 on page 14 shows comparisons to GPT4V
| BryanLegend wrote:
| Trainig
| dang wrote:
| Yes, the submitted title was "Apple announces MM1: Multimodal
| LLM Pre-trainig Report". We've reverted it now. But the greater
| problem wasn't the typo, it was the editorializing (from
| https://news.ycombinator.com/newsguidelines.html: " _Please use
| the original title, unless it is misleading or linkbait; don't
| editorialize._")
| erulabs wrote:
| If it's going to take general artificial intelligence to get a
| voice assistant that can remember not one, but two entirely
| separate cooking timers, then so be it. Imagine the GPUs
| required!
|
| I'm still baffled at Siri and Google assistant. Virtually zero
| innovation in a decade. I just want to be able to turn on BBC
| radio while my hands are wet, is that really so hard?!
| samatman wrote:
| > _that can remember not one, but two entirely separate cooking
| timers_
|
| You're in luck! Siri will do that right now. Just tried it.
| Works.
| Rinzler89 wrote:
| OMG, 2 cooking timers?! Pinnacle tech right there.
|
| Knowing Apple, I was expecting one base timer, with every
| other timer being a $200 upgrade.
| CharlesW wrote:
| https://www.tomsguide.com/how-to/how-to-set-up-and-manage-
| mu...
|
| _"How many timers can you have going at one time? [...]
| ...I had 26 timers going at once, and the only reason I
| didn't have more running was because I got bored."_
| Rinzler89 wrote:
| I wonder if the maximum number of timers is an 8 bit, 16
| bit or 32 bit int.
| MBCook wrote:
| Only one horrifically boring way to find out.
| mortenjorck wrote:
| That's not really Apple's style. More along the lines of
| "HomePod mini 2 features double the RAM, allowing for
| exciting new features like multiple kitchen timers. Pre-
| orders start Friday."
| ksubedi wrote:
| Google Assistant is pretty decent. But as someone who is pretty
| much locked into the Apple ecosystem, Siri needs a reboot from
| scratch.
| astrange wrote:
| It's been reportedly rewritten from scratch like five times,
| during which time people have not stopped posting claims that
| it's exactly the same as it was in 2010.
| bestnameever wrote:
| You should be able to do this with Siri. You can use a shortcut
| if it doesn't work out of the box.
| MBCook wrote:
| It works out of the box as of iOS 16 or 17.
|
| "Hey Siri set an egg timer for 4 minutes"
|
| The interface for switching between multiple timers sucks on
| the watch; the whole app does now. I don't know how it's
| handled on HomePods, though you can see them somewhere in the
| home app (yeah that's discoverable).
|
| But it works fine. And the interface is good on the phone.
| astrange wrote:
| You mostly think there's no innovation because you speak
| English.
___________________________________________________________________
(page generated 2024-03-16 23:00 UTC)