[HN Gopher] MM1: Methods, Analysis and Insights from Multimodal ...
       ___________________________________________________________________
        
       MM1: Methods, Analysis and Insights from Multimodal LLM Pre-
       training
        
       Author : lord_sudo
       Score  : 96 points
       Date   : 2024-03-16 01:27 UTC (21 hours ago)
        
 (HTM) web link (arxiv.org)
 (TXT) w3m dump (arxiv.org)
        
       | a_vanderbilt wrote:
       | I wonder if this has anything to do with their acquisition of
       | DarwinAI. After a decade of mediocrity, I'd love to see Siri get
       | smarter. Any improvement would be welcome at this point.
        
         | sroussey wrote:
         | I agree. The whole push to have Siri work on device was a noble
         | one, but I'd rather have the option for a dumber on device Siri
         | or a smarter in the cloud Siri.
        
         | epaga wrote:
         | Mediocrity is far too positive a word for the dumpster fire
         | that is Siri.
        
           | kstrauser wrote:
           | I hear that a lot, and I have no desire to tell you your
           | opinion's wrong, but it doesn't match my experience.
           | Siri's... _fine_, I guess, for what I ask of it like setting
           | timers and reminders and such.
           | 
           | It's not perfect, for sure:
           | 
           | Me: Hey Siri, turn off the kitchen lights.
           | 
           | Siri: I can't process multiple requests.
           | 
           | Me: Hey Siri, turn off the kitchen lights.
           | 
           | Siri: OK.
           | 
           | But it works reliably enough that I use it all the time for
           | the reminder and timer actions. Is it vastly worse for other
           | people, and in what ways?
        
           | CharlesW wrote:
           | The characterization of "mediocre" is fair, but we're
           | transitioning a household to Siri from Alexa (because Alexa
           | doesn't work locally, and because of Amazon's track record
           | on privacy), and it's not noticeably worse.
        
             | MBCook wrote:
             | The feeling I've heard from people is Alexa was way better
             | than Siri at first.
             | 
             | Over time Siri got better. Not great but better. Alexa had
             | mostly stayed the same or perhaps gotten a touch worse
             | except for adding ads and other annoyances.
             | 
             | I've never used anything but Siri. It works decently,
             | definitely has its moods/dumb-as-a-post moments. But I've
             | learned what works well and for that it's proven very useful.
        
         | Aqua_Geek wrote:
         | Honest question: what do you (in the general sense, not
         | specifically asking the parent) use Siri for? I think my main
         | (only?) use case is setting a timer.
         | 
         | Maybe I find conversational UIs awkward, or maybe I just got
         | jaded REALLY quickly from Siri's lacking capabilities early on,
         | but I have hardly used it in the decade or whatever that it's
         | been around.
        
           | azinman2 wrote:
           | I use it almost daily for something that is simple but
           | underappreciated. I don't know why it's not in every marketing
           | video: "Siri, remind me tomorrow at 10am to do X"
           | 
           | I outsource so much of my memory to the phone via Siri ALL
           | THE TIME. It's so useful. Even for things in 20m. I'll easily
           | forget if I don't do this, and it's reliable so it gives me
           | confidence. It also keeps the notification present until I
           | actually do the thing, so I have a kind of string around my
           | finger until the task is accomplished. I can also snooze that
           | notification as needed to rebring it up at the right time.
           | 
           | Every time I do this around non-tech people they go "wow I
           | didn't know you could do that." I swear it's literally life
           | changing, particularly for anyone over 30.
        
             | ribosometronome wrote:
             | Especially with Shortcuts, Siri can have some pretty useful
             | functionality. The big improvement I'd personally like to
             | see is being better able to tap into those actions without
             | having to set things up in advance.
        
               | MBCook wrote:
               | I'm really hoping for something like that.
               | 
               | A year or so ago I remember someone pointing out in a
               | podcast how LLMs are great at taking something like
               | general language and turning it into a series of
               | predefined commands (the stuff available to shortcuts).
               | It would instantly make Siri much more useful.
               | 
               | I think Federico Viticci rigged up something similar or
               | at least a powerful demo using Siri + Shortcuts + ChatGPT
               | to be able to answer all sorts of questions better than
               | native Siri.
        
             | MBCook wrote:
             | Yep. Reminders is #1 by far, followed by sending texts,
             | turning lights on/off with HomeKit and timers which are
             | similar.
             | 
             | I can't imagine reminders w/o Siri because that's how I add
             | 90%+ of them. Grocery items, things to do at time X, or
             | when I get to (or leave) work/home are the big ones.
        
           | bombcar wrote:
           | If Siri could do the following _reliably_ (meaning not having
           | to ask again, not having to repeat, having it work 99% of the
           | time) it would be golden:
           | 
           | 1. Find my phone via Siri on homepod
           | 
           | 2. Set a simple timer
           | 
           | 3. Add to a list
           | 
           | 4. Send a text message to one of a few contacts
           | 
           | It _can_ and sometimes _does_ do all of those things, but
           | horribly unreliably.
        
             | csnweb wrote:
             | For me it really is extremely close to 100% for timers. I
             | barely remember it being wrong and I use it several times
             | per day. Finding my phone via the HomePod also works pretty
             | much every time, maybe 90% for me, but it doesn't recognize
             | my wife, so for her it basically never works. The others I
             | don't use enough. But timers and reminders work really well
             | for me, and they're also what I need most from an assistant.
        
               | bombcar wrote:
               | I've seen similar. They really don't have two person
               | houses down pat - timers work great for me (as long as I
               | never have to ask how much remaining; I'd die for a
               | "count down from 30 seconds") - but for the wife;
               | nothing.
        
           | seanmcdirmid wrote:
           | Raising blinds, turning on/off lights, and unlocking the
           | front door. It is convenient since I can do all those things
           | with one command (raise all the blinds and turn off all the
           | lights, or raise all the north blinds and lower the south
           | ones). It would be a hard problem to create physical buttons
           | to do what we needed without running around the room to hit
           | various switches.
           | 
           | Google can also do this. Alexa has lots of problems, but it
           | can raise a blind in a pinch. We also spent a ton on Lutron
           | shades because we discovered that we were just managing them
           | too much manually (Siri then is great for controlling that).
           | 
           | You can also ask Siri the weather in the morning, useful in
           | figuring out how to dress the kid.
        
           | bionhoward wrote:
           | Since they removed "hey" and I got the latest phone, I've
           | noticed many little situations where it's faster to speak to
           | the device than tap your way around. E.G. when it's locked
           | you can say, "Siri, open Spotify" and look at it for face
           | unlock, boom. Random stuff. Also Alexa has surprised me
           | lately, like a rational response to, "how many sandwiches is
           | too many?"
        
         | samatman wrote:
         | Personally, I don't want Siri to be 'smarter', if smarter means
         | it becomes an open-ended and unpredictable way to have an LLM
         | guess what I meant. I'd like Siri to be more powerful, yes.
         | 
         | I like that I can model Siri as a decision tree with voice-
         | activated input. Being able to configure it to do more things
         | (for example, to put reminders in Things rather than
         | Reminders), that would be useful. More discoverability would
         | also be great (but this is Apple we're talking about, so good
         | luck there). But for me personally, the most important feature
         | is that Siri is predictable: once I figure out how to do
         | something with it, asking again in mostly the same way will get
         | the same result. If I want to talk to an LLM, I have ChatGPT on
         | my phone.
        
       | reaperman wrote:
       | This looks competitive against CLIP, and surprisingly great at
       | VQA style prompts, but it doesn't seem like the paper supports
       | comparing it to GPT-4. We don't see any tests for coding
       | performance, math homework, legal document review, or any of the
       | myriad other things that people use GPT-4 for on a daily basis.
        
         | zshrc wrote:
         | Besides homework, all of these things seem to be professional
         | uses of GPT-4. If they're trying to bake this into a consumer
         | platform like Siri, I don't see why they'd need to focus on
         | those use cases. The exception is MDM/Enterprise: I'm curious
         | whether they'll go after that market or just their army of
         | consumer devices.
        
           | reaperman wrote:
           | Good insight. My comment was based on the headline that says
           | "...Competing with ChatGPT".
        
           | fauigerzigerk wrote:
           | They are going to have to focus on the use cases that most of
           | their customers use LLMs for, regardless of whether it falls
           | in the consumer or professional category or somewhere in
           | between.
           | 
           | If all it does is improve Siri a bit without massively
           | expanding the range of applications and APIs it will be a big
           | disappointment.
           | 
           | I think what Apple presents in June will decide whether on-
           | device AI will be seen as a viable alternative to cloud APIs.
        
       | smokel wrote:
       | The paper lists "first authors", "core authors", and "senior
       | authors".
       | 
       | My dream is to one day be listed on a seminal paper as "secondary
       | forum reply author".
        
         | verticalscaler wrote:
         | Holy inferiority complex, Batman!
         | 
         | You can aspire higher and just use one of these LLMs to be a
         | "first author" in a published peer reviewed paper.
        
         | peddling-brink wrote:
         | Similarly, I'd like the movie credit Second Assistant to the
         | Second Second Assistant Director.
        
           | Turing_Machine wrote:
           | "Junior Assistant Vice-Dean" (or variants thereof) in
           | academia. Those mostly exist to give a pay boost to
           | administrators who've otherwise maxed out on pay.
           | 
           | I recall that my undergrad institution once invented a new
           | deanship out of whole cloth for a coach who'd maxed out on
           | the "professor" pay scale.
           | 
           | Even worse, the bastard didn't even win games!
        
           | smokel wrote:
           | In that case, I highly recommend watching the movie
           | Synecdoche New York (2008).
           | 
           | PS Can I be your hairdresser?
        
         | jebarker wrote:
         | Speaking as someone working in the field, I find it amusing how
         | much researchers working on automating human work care about
         | human credit assignment.
        
       | brookst wrote:
       | The paper explores different design choices for various parts of
       | the model and draws conclusions about the relative importance of
       | optimizing each area (image encoder very important, vision-
       | language connector less so).
       | 
       | The actual set of models produced (up to 30B parameters) seems
       | secondary to the intent of the paper, and is more validation of
       | the best design choices in each area.
        
       | lolinder wrote:
       | MM1 is a research paper, not a release of a competing product.
       | I'm sure the paper is interesting and am looking forward to
       | reading an analysis of it by someone who understands these things
       | better than I do, but this is not that analysis, it's an
       | extremely low-effort puff piece that is more interested in
       | getting attention than in accurately describing a research paper.
       | 
       | I don't usually say this, but TFA frankly feels like it was
       | written by AI:
       | 
       | > The release of MM1 by Apple contributes significantly to the
       | artificial intelligence domain, offering a detailed roadmap for
       | the development of future MLLMs. By sharing the insights and
       | design principles gleaned from MM1, Apple not only challenges the
       | current capabilities of models like ChatGPT but also invites the
       | broader AI community to build upon their findings, potentially
       | leading to more sophisticated and capable AI systems.
        
         | basicallybones wrote:
         | I believe most run-of-the-mill marketing language will sound
         | like it is written by AI. The easiest thing to do for
         | technology writing is to write the complete, factual article,
         | then ask an LLM to dumb it down to whatever level you need for
         | communication.
        
           | JimDabell wrote:
           | No, I agree this really does seem autogenerated, or at the
           | very least written by somebody who doesn't understand the
           | topic at all and is going through the motions of padding
           | things out to hit a hype / word count. It's got that weird
           | summary focusing on the wrong things and wild speculations
           | dressed up as serious predictions vibe, like there are words
           | saying things in places because there are supposed to be
           | words there and not because it's actually imparting useful
           | information.
        
         | CharlesW wrote:
         | Out of curiosity, where are you seeing this? It's not in the
         | abstract or the paper.
        
           | JimDabell wrote:
           | Some of these comments were originally made in response to
           | this spammy submission:
           | 
           | https://news.ycombinator.com/item?id=39726156
        
             | CharlesW wrote:
             | Ah! Makes sense now, thank you.
        
             | lolinder wrote:
             | Oh, thank you! I didn't know we'd been moved.
        
       | refibrillator wrote:
       | Biggest model is 30b MoE trained on 100b tokens, max sequence
       | length 4096. A bit underwhelming compared to recent announcements
       | like the open source Large World Model [1].
       | 
       | Absolutely no benchmarks against GPT4 present in the paper.
       | 
       | Notably they used instruction response pairs generated from GPT4
       | for supervised fine tuning. Which has always felt like an
       | experimental hack to me, but that's how many folks are
       | bootstrapping smaller models these days, and the effectiveness is
       | hard to argue with.
       | 
       | Apple's axlearn framework was used which leverages JAX and XLA
       | [2].
       | 
       | [1] https://news.ycombinator.com/item?id=39367141
       | 
       | [2] https://github.com/apple/axlearn
        
         | AJRF wrote:
         | > Absolutely no benchmarks against GPT4 present in the paper.
         | 
         | Table 4 on page 14 shows comparisons to GPT4V
        
       | BryanLegend wrote:
       | Trainig
        
         | dang wrote:
         | Yes, the submitted title was "Apple announces MM1: Multimodal
         | LLM Pre-trainig Report". We've reverted it now. But the greater
         | problem wasn't the typo, it was the editorializing (from
         | https://news.ycombinator.com/newsguidelines.html: " _Please use
         | the original title, unless it is misleading or linkbait; don't
         | editorialize._")
        
       | erulabs wrote:
       | If it's going to take general artificial intelligence to get a
       | voice assistant that can remember not one, but two entirely
       | separate cooking timers, then so be it. Imagine the GPUs
       | required!
       | 
       | I'm still baffled at Siri and Google Assistant. Virtually zero
       | innovation in a decade. I just want to be able to turn on BBC
       | radio while my hands are wet, is that really so hard?!
        
         | samatman wrote:
         | > _that can remember not one, but two entirely separate cooking
         | timers_
         | 
         | You're in luck! Siri will do that right now. Just tried it.
         | Works.
        
           | Rinzler89 wrote:
           | OMG, 2 cooking timers?! Pinnacle tech right there.
           | 
           | Knowing Apple, I was expecting one base timer, with every
           | other timer being a $200 upgrade.
        
             | CharlesW wrote:
             | https://www.tomsguide.com/how-to/how-to-set-up-and-manage-
             | mu...
             | 
             |  _"How many timers can you have going at one time? [...]
             | ...I had 26 timers going at once, and the only reason I
             | didn't have more running was because I got bored."_
        
               | Rinzler89 wrote:
               | I wonder if the maximum number of timers is an 8 bit, 16
               | bit or 32 bit int.
        
               | MBCook wrote:
               | Only one horrifically boring way to find out.
        
             | mortenjorck wrote:
             | That's not really Apple's style. More along the lines of
             | "HomePod mini 2 features double the RAM, allowing for
             | exciting new features like multiple kitchen timers. Pre-
             | orders start Friday."
        
         | ksubedi wrote:
         | Google Assistant is pretty decent. But as someone who is pretty
         | much locked into the Apple ecosystem, Siri needs a reboot from
         | scratch.
        
           | astrange wrote:
           | It's been reportedly rewritten from scratch like five times,
           | during which time people have not stopped posting claims that
           | it's exactly the same as it was in 2010.
        
         | bestnameever wrote:
         | You should be able to do this with Siri. You can use a shortcut
         | if it doesn't work out of the box.
        
           | MBCook wrote:
           | It works out of the box as of iOS 16 or 17.
           | 
           | "Hey Siri set an egg timer for 4 minutes"
           | 
           | The interface for switching between multiple timers sucks on
           | the watch, the whole app does now. I don't know how it's
           | handled on HomePods, though you can see them somewhere in the
           | home app (yeah that's discoverable).
           | 
           | But it works fine. And the interface is good on the phone.
        
         | astrange wrote:
         | You mostly think there's no innovation because you speak
         | English.
        
       ___________________________________________________________________
       (page generated 2024-03-16 23:00 UTC)