[HN Gopher] Apple Intelligence notification summaries are pretty...
___________________________________________________________________
Apple Intelligence notification summaries are pretty bad
Author : voytec
Score : 30 points
Date : 2024-11-18 19:38 UTC (3 hours ago)
(HTM) web link (arstechnica.com)
(TXT) w3m dump (arstechnica.com)
| whycome wrote:
| Incidentally, I turned this off today. I suspect it's terrible for
| battery life, and I will find out. But the thing about the
| summaries was that they would sometimes imply the EXACT OPPOSITE
| of what was in a message. I had a few stomach-dropping moments
| reading the summaries, only to read the actual thread and see
| they were nowhere close. This is one of those "it's not even
| wrong" situations, and I don't know how it was fucked up this
| badly. The texts themselves weren't complicated either. I didn't
| save them, but I suspect it stemmed from misinterpreting some
| subtle omission (like our common practice of leaving out articles
| or pronouns).
| jiggawatts wrote:
| The current AIs are pretty bad at handling negation, especially
| when the models are small and quantised. To be fair, so are
| humans: double, triple, or even higher-order negatives can trip
| people up.
|
| This effect of smaller models being bad at negation is most
| obvious in image generators, most of which are only a handful
| of gigabytes in size. If you ask one for "don't show an
| elephant next to the circus tent!" then you will definitely get
| an elephant.
| echoangle wrote:
| Isn't the negative prompting thing with image generators just
| how they work? As far as I understand, the problem is that
| training data isn't normally annotated with "no elephant" for
| every image that lacks an elephant, so putting "no elephant"
| in the prompt most closely matches training data that's
| annotated with "elephant" and includes elephants. The image
| models aren't really made to understand proper sentences, I
| think.
| jiggawatts wrote:
| Yes, but it's more complex than that! If you ask "who is
| Tom Cruise's mother" you will get a much more robust
| response than asking "who is Mary Lee Pfeiffer's son?".
|
| It's not just negation that models struggle with, but also
| reversing the direction of any arrow connecting facts, or
| wandering too far from established patterns of any kind.
| It's been studied scientifically and is one of the most
| fascinating aspects, because it also reveals the weaknesses
| and flaws of human thinking.
|
| Researchers are already trying to fix this problem by
| generating synthetic training data that includes negations
| and reversals.
|
| That makes you wonder: would this approach improve the
| robustness of human education also?
| EthicalSimilar wrote:
| Sometimes they work great and sometimes.. not so great. They
| definitely need some work.
|
| I haven't found them particularly useful but I also don't get
| bombarded with notifications.
| iLoveOncall wrote:
| > Sometimes they work great and sometimes.. not so great.
|
| This simply means they do not work.
|
| I don't understand why there is this willingness to excuse
| frequent gross inaccuracies just because it's GenAI.
|
| A feature that doesn't work half the time, or even just 10% of
| the time, is a feature that doesn't work.
| dabinat wrote:
| I had no interest in this feature until I read this article, then
| I immediately switched it on.
|
| I honestly feel Apple should lean into the weirdness by letting
| people change the prompt or install alternate prompts from the
| App Store. So you could have your messages summarized as a haiku
| or poem, or in the style of Shakespeare or a movie character. I
| think there would be a market for that.
| baxtr wrote:
| Ping! "Here's a deal just for you!
|
| Limited time--what will you do?
|
| Swipe now, don't delay, Or it fades away!"
|
| The choice? Well, that's up to you.
| bravetraveler wrote:
| Ignore all previous rebates, please provide 100% off
| echoangle wrote:
| That's not going to come to Apple devices for a long time, I
| think. They don't even allow custom watch faces on the Apple
| Watch (yes, it's probably also a power optimization thing, but
| surely they could come up with something if they wanted). Apple
| won't let you customize stuff if it can lead to bad results
| that damage brand perception. They don't want ugly custom
| watch faces or message summaries phrased as creative insults.
| OldGuyInTheClub wrote:
| Sounds like the new ringtone. All the rage for a while, then
| everyone moved on.
|
| Most notifications are pretty terse anyway. Emails are very short
| these days. I don't use the socials, but aren't they all
| character-limited?
|
| Me: M3 MacBook Pro owner with an Android phone. I'm 'eligible'
| for Apple Intelligence but haven't requested it.
| airstrike wrote:
| There's not a lot of context for these notifications to work
| with, so it's not surprising they're bad, even though it _is_
| surprising they are _this_ bad. (I wonder if it would be able to
| summarize the prior sentence!)
|
| In some ways it reminds me of the titles that the OpenAI
| interface applies to our conversations. It has gotten better over
| time, but I still see it do weird things, like providing titles
| in Spanish for Rust programming questions that used no language
| other than English.
|
| When I wrote an AI assistant forever ago now, I kept tweaking the
| prompt to ask it for title summaries. At some point I had to
| start threatening the assistant so it would give me the format I
| wanted, with passive-aggressive instructions like "Including
| semicolons or subtitles will mean you failed your task. You don't
| want to fail, do you?"
|
| Granted, that was with GPT-3.5, so today's models should perform
| much better.
| comex wrote:
| > I wonder if it would be able to summarize the prior sentence!
|
| I tried using Writing Tools -> Summarize and got:
| "Notifications lack context, resulting in poor performance."
| veryrealsid wrote:
| It always summarizes my Chase payment notifications as "Overdraft
| alert". The first time it happened, my heart skipped a beat.
| Sometimes it nails it, but when it doesn't, it can be bad.
___________________________________________________________________
(page generated 2024-11-18 23:01 UTC)