[HN Gopher] Why is Rosetta 2 fast?
___________________________________________________________________
Why is Rosetta 2 fast?
Author : pantalaimon
Score : 443 points
Date : 2022-11-09 15:40 UTC (7 hours ago)
(HTM) web link (dougallj.wordpress.com)
(TXT) w3m dump (dougallj.wordpress.com)
| lunixbochs wrote:
| > To see ahead-of-time translated Rosetta code, I believe I had
| to disable SIP, compile a new x86 binary, give it a unique name,
| run it, and then run otool -tv /var/db/oah/*/*/unique-name.aot
| (or use your tool of choice - it's just a Mach-O binary). This
| was done on an old version of macOS, so things may have changed and
| improved since then.
|
| My aotool project uses a trick to extract the AOT binary without
| root or disabling SIP:
| https://github.com/lunixbochs/meta/tree/master/utils/aotool
| karmakaze wrote:
| Vertical integration. My understanding was that it's because the
| Apple silicon ARM has special support to make it fast. Apple has
| had enough experience to know that some hardware support can go a
| long way toward making the binary emulation situation better.
| saagarjha wrote:
| That's not correct; the article goes into detail on why.
| nwallin wrote:
| That _is_ correct; the article goes into detail on why. See
| the "Apple's Secret Extension" section as well as the "Total
| Store Ordering" section.
|
| The "Apple's Secret Extension" section talks about how the M1
| has 4 flag bits and the x86 has 6 flag bits, and how
| emulating those 2 extra flags would make every add/sub/cmp
| instruction significantly slower. Apple has an undocumented
| extension that adds 2 more flag bits so that the M1's flags
| behave the same way x86's do.
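|
| As a rough illustration (a minimal sketch, not Rosetta's actual
| code), here is what a translator would have to compute in
| software for the two flags ARM lacks. x86's PF is set when the
| low byte of the result has even parity; AF is the carry out of
| bit 3, used for BCD arithmetic:
|
|     #include <stdint.h>
|
|     /* PF: 1 if the low result byte has an even number of bits */
|     static inline int x86_pf(uint64_t result) {
|         return !__builtin_parity((unsigned)(result & 0xff));
|     }
|
|     /* AF: carry out of bit 3 when computing a + b */
|     static inline int x86_af_add(uint64_t a, uint64_t b) {
|         uint64_t sum = a + b;
|         return (int)((a ^ b ^ sum) >> 4) & 1;
|     }
|
| Emitting something like that after every add/sub/cmp is the
| overhead the hardware extension avoids.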
|
| The "Total Store Ordering" section talks about how Apple has
| added a non-standard store ordering to the M1 than makes the
| M1 order its stores in the same way x86 guarantees instead of
| the way ARM guarantees. Without this, there's no good way to
| translate instructions in code in and around an x86 memory
| fence; if you see a memory fence in x86 code it's safe to
| assume that it depends on x86 memory store semantics and if
| you don't have that you'll need to emulate it with many
| mostly unnecessary memory fences, which will be devastating
| for performance.
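|
| A minimal C11 sketch of why this matters (an assumed example,
| not Rosetta's output): on x86, every ordinary store already has
| release semantics and every ordinary load has acquire semantics,
| so the code below compiles to plain movs. A faithful translation
| to weakly ordered ARM needs the explicit orderings below on
| every such access, unless the CPU can simply be switched into a
| TSO mode:
|
|     #include <stdatomic.h>
|
|     static atomic_int data, ready;
|
|     void producer(void) {
|         atomic_store_explicit(&data, 42, memory_order_relaxed);
|         /* x86: plain mov; weak ARM: stlr or a fence needed */
|         atomic_store_explicit(&ready, 1, memory_order_release);
|     }
|
|     int consumer(void) {
|         /* x86: plain mov; weak ARM: ldar or a fence needed */
|         while (!atomic_load_explicit(&ready, memory_order_acquire))
|             ;
|         return atomic_load_explicit(&data, memory_order_relaxed);
|     }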
| saagarjha wrote:
| I'm aware of both of these extensions; they're not actually
| necessary for most applications. Yes, you trade fidelity for
| performance, but it's not _that_ big of a deal. The majority
| of Rosetta's performance comes from good software decisions,
| not hardware.
| MikusR wrote:
| The main reason, the M1/M2 being incredibly fast, is listed last.
| dagmx wrote:
| Perhaps if you're comparing against Intel processors, but even
| on an Apple Silicon Mac, apps under Rosetta 2 are no slouch
| compared to their native versions.
|
| 20% overhead for a non-native executable is very commendable.
| Someone wrote:
| I don't think that's the main reason. The article lists a few
| things, but I think the main reason is that they made several
| parts of the CPU behave identically to x86. The M1 and M2 chips:
|
| - can be told to do total store ordering, just as x86 does
|
| - have a few status flags that x86 has, but regular ARM
| doesn't
|
| - can be told to make the FPU behave exactly as the x86 FPU
|
| It also helps that ARM has many more registers than x86.
| Because of that the emulator can map the x86 registers to ARM
| registers, and have registers to spare for use by the emulator.
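|
| A hypothetical illustration of that last point (this is not
| Rosetta's actual register assignment): x86-64 has 16 general-
| purpose registers and AArch64 has 31, so every guest register
| can live permanently in a host register, with plenty left over
| for the emulated flags, the guest instruction pointer, and
| scratch values:
|
|     /* invented mapping, for illustration only */
|     enum x86_reg { RAX, RCX, RDX, RBX, RSP, RBP, RSI, RDI,
|                    R8, R9, R10, R11, R12, R13, R14, R15,
|                    NUM_X86_REGS };
|
|     static const int guest_to_host[NUM_X86_REGS] = {
|         /* rax..r15 -> x0..x15; x16-x30 stay free for the
|            translator's own bookkeeping */
|         0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15
|     };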
| postalrat wrote:
| That isn't the main reason.
|
| If Rosetta ran x86 code at 10% of native speed, nobody would
| be calling it fast.
| superkuh wrote:
| bogeholm wrote:
| Thanks for your thoroughly objective insights. I especially
| appreciate the concrete examples.
| howinteresting wrote:
| Here you go for a concrete example:
| https://news.ycombinator.com/item?id=33493276
| saagarjha wrote:
| This has nothing to do with Rosetta being incomplete (it
| has pretty good fidelity).
| howinteresting wrote:
| It was direct corroboration of:
|
| > Apple users not being able to use the same hardware
| peripherals or same software as other people is not a
| problem, it's a feature. There's no doubt the M1/M2 chips
| are fast. It's just a problem that they're only available
| in crappy computers that can't run a large amount of
| software or hardware.
| spullara wrote:
| The first time I ran into this technology was in the early 90s on
| the DEC Alpha. They had a tool called "MX" that would translate
| MIPS Ultrix binaries to Alpha on DEC Unix:
|
| https://www.linuxjournal.com/article/1044
|
| Crazy stuff. Rosetta 2 is insanely good. Runs FPS video games
| even.
| tomcam wrote:
| > Every one-byte x86 push becomes a four byte ARM instruction
|
| Can someone explain this to me? I don't know ARM but it just
| seems to me a push should not be that expensive.
| jasonwatkinspdx wrote:
| The general principle is that RISC style instruction sets are
| typically fixed length and with only a couple different
| subformats. Like the prototypical RISC design has one format
| with an opcode and 3 register fields, and then a second with an
| opcode and an immediate field. This simplicity and regularity
| makes the fastest possible decoding hardware much simpler and
| more efficient compared to something like x86, which has a
| simply dumbfounding number of possible variable-length formats.
|
| The basic bet of RISC was that larger instruction encodings
| would be worth it for the microarchitectural advantages they
| enabled. This has more or less been proven out, though the
| distinction is less sharp today, with x86 decoding into uOps
| and recent ARM standards being quite complex beasts.
| TazeTSchnitzel wrote:
| x86 has variable-length instructions, so they can be anything
| from 1 to 15 bytes long. AArch64 instructions are always 4
| bytes long.
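|
| To make that concrete: the x86 encoding below is real, and the
| AArch64 line is one plausible (hypothetical) translation that
| assumes the emulated rsp lives in x4; the actual sequence
| Rosetta emits may differ:
|
|     /* push rax -- a single byte on x86-64 */
|     static const unsigned char x86_push_rax[] = { 0x50 };
|
|     /* str x0, [x4, #-8]! -- always 4 bytes on AArch64 */
|     static const unsigned char a64_equiv[] = {
|         0x80, 0x8c, 0x1f, 0xf8
|     };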
| iainmerrick wrote:
| This is a great writeup. What a clever design!
|
| I remember Apple had a totally different but equally clever
| solution back in the days of the 68K-to-PowerPC migration. The
| 68K had 16-bit instruction words, usually with some 16-bit
| arguments. The emulator's core loop would read the next
| instruction and branch directly into a big block of 64K x 8 bytes
| of PPC code. So each 68K instruction got 2 dedicated PPC
| instructions, typically one to set up a register and one to
| branch to common code.
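|
| A minimal sketch of that dispatch scheme in C (illustrative,
| not Apple's code; the PPC original branched straight into an
| 8-byte stub at offset opcode*8, for which a table of function
| pointers is the closest portable analogue):
|
|     #include <stdint.h>
|
|     typedef void (*stub_t)(uint16_t opcode);
|
|     static stub_t stubs[1 << 16]; /* one stub per opcode word */
|     static const uint16_t *pc;    /* emulated program counter */
|
|     static void emulate(void) {
|         for (;;) {
|             uint16_t op = *pc++;  /* fetch 16-bit opcode word */
|             stubs[op](op);        /* one load plus one indirect
|                                      branch per instruction */
|         }
|     }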
|
| What that solution and Rosetta 2 have in common is that they're
| super pragmatic - fast to start up, with fairly regular and
| predictable performance across most workloads, even if the
| theoretical peak speed is much lower than a cutting-edge JIT.
|
| Anyone know how they implemented PPC-to-x86 translation?
| kijiki wrote:
| > Anyone know how they implemented PPC-to-x86 translation?
|
| They licensed Transitive's retargetable binary translator and
| renamed it Rosetta; very Apple.
|
| It was originally a startup, but had been bought by IBM by the
| time Apple was interested.
| GeekyBear wrote:
| > It was originally a startup, but had been bought by IBM by
| the time Apple was interested.
|
| Rosetta shipped in 2005.
|
| IBM bought Transitive in 2008.
|
| The last version of OS X that supported Rosetta shipped in
| 2009.
|
| I always wondered if the issue was that IBM tried to alter
| the terms of the deal too much for Steve's taste.
| savoytruffle wrote:
| I agree it was a bit worryingly short-lived. However, the
| first version of Mac OS X that shipped without Rosetta 1
| support was 10.7 Lion, in summer 2011 (and many people
| avoided it since it was problematic). So nearly-modern Mac
| OS X with Rosetta support was realistic for a while longer.
| GeekyBear wrote:
| > However the first version of Mac OS X that shipped
| without Rosetta 1 support was 10.7 Lion
|
| Yes, but I was pointing out when the last version of OS X
| that did support Rosetta shipped.
|
| I have no concrete evidence that Apple dropped Rosetta
| because IBM wanted to alter the terms of the deal after
| they bought Transitive, but I've always found that timing
| interesting.
|
| In comparison, the emulator used during the 68k to PPC
| transition was never removed from Classic MacOS, so the
| change stood out.
| r00fus wrote:
| Apple is also not tied to backwards compatibility.
|
| Their customers are not enterprise, and consequently they
| are probably the best company in the world at dictating
| well-managed, reasonable shifts in customer behavior at
| scale.
|
| So they likely had no need for Rosetta as of 2009.
| runjake wrote:
| Link: https://en.wikipedia.org/wiki/QuickTransit
| lostgame wrote:
| From what I understand, they purchased a piece of software that
| already existed to translate PPC to x86 in some form or another
| and iterated on it. I believe the software may have already
| even been called 'Rosetta'.
|
| My memory is very hazy, though. While I experienced this
| transition firsthand and was an early Intel adopter, that's
| about all I can remember about Rosetta or where it came from.
|
| I remember before Adobe had released the Universal Binary CS3
| that running Photoshop on my Intel Mac was a total nightmare.
| :( I learned to not be an early adopter from that whole
| debacle.
| saagarjha wrote:
| Transitive.
| runjake wrote:
| Link: https://en.wikipedia.org/wiki/QuickTransit
| Asmod4n wrote:
| I don't know how they did it, but they did it very, very
| slowly. Anything "interactive" was unusable.
| lilyball wrote:
| Assuming you're talking about PPC-to-x86, it was certainly
| usable, though noticeably slower. Heck, I used to play Tron
| 2.0 that way, the frame rate suffered but it was still quite
| playable.
| scarface74 wrote:
| Interactive 68K programs were usually fast. The 68K programs
| would still call native PPC QuickDraw code. It was processor-
| intensive code that was slow, especially with the first-
| generation 68K emulator.
|
| Connectix SpeedDoubler was definitely faster.
| duskwuff wrote:
| Most of the Toolbox was still running emulated 68k code in
| early Power Mac systems. A few bits of performance-critical
| code (like QuickDraw, iirc) were translated, but most
| things weren't.
| klelatti wrote:
| That's really interesting. You might enjoy reading about the VM
| embedded into the Busicom calculator that used the Intel 4004
| [1]
|
| They squeezed a virtual machine with 88 instructions into less
| than 1k of memory!
|
| [1] https://thechipletter.substack.com/p/bytecode-and-the-
| busico...
| wang_li wrote:
| In the mists of history S. Wozniak wrote the SWEET-16
| interpreter for the 6502. A VM with 29 instructions
| implemented in 300 bytes.
|
| https://en.wikipedia.org/wiki/SWEET16
| iainmerrick wrote:
| That is nifty! Sounds very similar to a Forth interpreter.
| vaxman wrote:
| Burn.
|
| (unintentional, which makes it even funnier)
| retskrad wrote:
| Apple Silicon will be Tim Cook's legacy.
| vaxman wrote:
| Rosetta 3 will probably be semantic evaluation of the origin and
| complete source-level reprogramming of the target. If it comes
| from Apple, it will translate everything to ARM and then
| digitally sign it to run in a native-mode sandbox under a version
| of Safari with a supporting runtime.
| hinkley wrote:
| Apple is doing some really interesting but really quiet work in
| the area of VMs. I feel like we don't give them enough credit but
| maybe they've put themselves in that position by not bragging
| enough about what they do.
|
| As a somewhat related aside, I have been watching Bun (low
| startup time Node-like on top of Safari's JavaScript engine) with
| enough interest that I started trying to fix a bug, which is
| somewhat unusual for me. I mostly contribute small fixes to tools
| I use at work. I can't quite grok Zig code yet so I got stuck
| fairly quickly. The "bug" turned out to be default behavior in a
| Zig stdlib, rather than in JavaScript code. The rest is fairly
| tangential but suffice it to say I prefer self hosted languages
| but this probably falls into the startup speed compromise.
|
| Its low startup overhead makes their VM interesting, but the
| fact that it often benchmarks better than Firefox and is
| occasionally faster than V8 shows quite a bit of quiet
| competence.
| jraph wrote:
| > I feel like we don't give them enough credit but maybe they've
| put themselves in that position by not bragging enough about
| what they do.
|
| And maybe also by keeping the technology closed and Apple-
| specific. Many people who could be interested in using it don't
| have access to it.
| freedomben wrote:
| Exactly. As someone who would be very interested in this but
| doesn't use Apple products, it's just not exciting because it's
| not accessible to me (I can't even test it as a user). If
| they wanted to write a whitepaper about it to share
| knowledge, that might be interesting, but given that it's
| Apple I'm not gonna hold my breath.
| saagarjha wrote:
| Apple (mostly WebKit) writes a significant amount about how
| they designed their VMs.
| jolux wrote:
| WebKit B3 is open source: https://webkit.org/docs/b3/
| [deleted]
| Vt71fcAqt7 wrote:
| I hope Rosetta is here to stay and continues development. And I
| hope what is learned from it can be used to make a RISC-V version
| of it. Translating native ARM to RISC-V should be much easier
| than x86 to ARM, as I understand it, so one could conceivably do
| x86 -> ARM -> RISC-V.
| rowanG077 wrote:
| I hope not. Rosetta 2, as cool as it is, is a crutch to allow
| Apple to transition away from x86. If it keeps being needed,
| that's a massive failure for Apple and the ecosystem.
| klelatti wrote:
| More likely to be useful is RISC-V to Arm; then Apple can
| support running virtual machines for another architecture on
| its machines.
| masklinn wrote:
| > I hope Rosetta is here to stay and continues development.
|
| It almost certainly is not. Odds are Apple will eventually
| remove Rosetta 2, as they did the original Rosetta back in the
| day, once they consider the need for that bridge to be over
| (Rosetta was added in 2006 in 10.4, and removed in 2011 from
| 10.7).
|
| > And I hope what is learned from it can be used to make a
| RISC-V version of it. Translating native ARM to RISC-V should
| be much easier than x86 to ARM, as I understand it, so one
| could conceivably do x86 -> ARM -> RISC-V.
|
| That's not going to happen unless Apple decides to switch from
| ARM to RISC-V, and... why would they? They've got 15 years
| experience and essentially full control on ARM.
| Vt71fcAqt7 wrote:
| >That's not going to happen unless Apple decides to switch
| from ARM to RISC-V, and... why would they? They've got 15
| years experience and essentially full control on ARM.
|
| Two points here.
|
| * First off, Apple developers are not bound to Apple. The
| knowledge gained can be used elsewhere. See Rivos and Nuvia
| for example.
|
| * Second, Apple reportedly has already ported many of its
| secondary cores to RISC-V. It's not unreasonable that they
| will switch in 10 years or so.
| jrmg wrote:
| _Apple reportedly has already ported many of its secondary
| cores to RISC-V_
|
| Really? In current hardware or is this speculation?
| Symmetry wrote:
| If you've got some management core somewhere in your
| silicon you can, with RISC-V, give it an MMU but no FPU
| and save die area. You're going to be writing custom
| embedded code anyway, so you get to save silicon by only
| incorporating the features that you need instead of
| having to meet the full ARM spec. And you can add your
| own custom instructions for the job at hand pretty
| easily.
|
| That would all be a terrible idea for a core intended to
| run user applications, but that's not what Apple, Western
| Digital, and NVidia are embracing RISC-V for; they're using
| it for embedded cores. If I were ARM I'd honestly be much
| more worried about RISC-V's threat to my R and M series
| cores than to my A series cores.
| my123 wrote:
| Arm64 allows FPU-less designs. There are some around...
| Symmetry wrote:
| Sure. The FPU is optional on a Cortex M2, for instance.
| But those don't have MMUs. You'd certainly need an
| expensive architectural license to make something with an
| MMU but no FPU if you wanted to, and given all the
| requirements ARM normally imposes for software
| compatibility[1] between cores, I'd tend to doubt that
| they'd let you make something like that.
|
| [1] Explicitly testing that you don't implement total
| store ordering by default is one requirement I've heard
| people talk about to get a custom core licensed.
| masklinn wrote:
| Apple has an architecture license (otherwise they could
| not design their own cores, which they've been doing for
| close to a decade), and already had the ability to take
| liberties beyond what the average architecture licensee
| can, owing to _being one of ARM's founders_.
| saagarjha wrote:
| Don't think any are shipping, but they're hiring RISC-V
| engineers.
| Vt71fcAqt7 wrote:
| >Many dismiss RISC-V for its lack of software ecosystem
| as a significant roadblock for datacenter and client
| adoption, but RISC-V is quickly becoming the standard
| everywhere that isn't exposed to the OS. For example,
| Apple's A15 has more than a dozen Arm-based CPU cores
| distributed across the die for various non-user-facing
| functions. SemiAnalysis can confirm that these cores are
| actively being converted to RISC-V in future generations
| of hardware.[0]
|
| So to answer your question, it is not currently in
| hardware, but it is more than just speculation.
|
| [0]https://www.semianalysis.com/p/sifive-powers-google-
| tpu-nasa...
| klelatti wrote:
| > it's not unreasonable that they will switch in 10 years
| or so.
|
| You've not provided any rationale at all for why they
| should switch their application cores let alone on this
| specific timetable.
|
| Switching is an expensive business and there has to be a
| major business benefit for Apple in return.
| chris_j wrote:
| For me, those two points make it clear that it would be
| _possible_ for Apple to port to RISC-V. But it's still not
| clear what advantages they would gain from doing so, given
| that their ARM license appears to let them do whatever they
| want with CPUs that they design themselves.
| Vt71fcAqt7 wrote:
| The first point precludes Apple's gain from the
| discussion.
| quux wrote:
| It would be funny/not funny if in a few years Apple removes
| Rosetta 2 for Mac apps but keeps the Linux version forever so
| docker can run at reasonable speeds.
| kccqzy wrote:
| > They've got 15 years experience
|
| Did you only start counting from 2007 when the iPhone was
| released? All the iPods prior to that were using ARM
| processors. The Apple Newton was using ARM processors.
| EricE wrote:
| The iPods and Newton used entirely different chips and OSes.
| The first iPods didn't even run an OS that Apple created;
| they licensed it.
| masklinn wrote:
| > All the iPods prior to that were using ARM processors.
|
| Most of the original device was outsourced and contracted
| out (for reasons of time constraint and lack of internal
| expertise). PortalPlayer built the SoC and OS, not Apple.
| Later SoC were sourced from SigmaTel and Samsung, until the
| 3rd gen Touch.
|
| > The Apple Newton was using ARM processors.
|
| The Apple Newton was a completely different Apple, and
| there were several years' gap between Jobs killing the
| Newton and the birth of iPod, not to mention the completely
| different purpose and capabilities. There would be no
| newton-type project until the iPhone.
|
| Which is also when Apple started working with silicon
| themselves: they acquired PA in 2008, Intrinsity in 2010,
| and Passif in 2013, released their first partially in-house
| SoC in 2010 (A4), and their first in-house core in 2013
| (Cyclone, in the A7).
| stu2b50 wrote:
| Rosetta 1 had a ticking time bomb. Apple was licensing it
| from a 3rd party. Rosetta 2 is all in house as far as we
| know.
|
| Different CEO as well. Jobs was more opinionated on
| "principles" - Cook is more than happy to sell what people
| will buy. I think Rosetta 2 will last.
| masklinn wrote:
| > Rosetta 1 had a ticking time bomb. Apple was licensing it
| from a 3rd party.
|
| Yes, I'm sure Apple had no way of extending the license.
|
| > Cook is more than happy to sell what people will buy. I
| think Rosetta 2 will last.
|
| There's no "buy" here.
|
| Rosetta is complexity to maintain, and an easy cut. It's
| not even part of the base system.
|
| And "what people will buy" certainly didn't prevent
| essentially removing support for non-hidpi displays from
| MacOS. Which is a lot more impactful than Rosetta as far as
| I'm concerned.
| NavinF wrote:
| > removing support for non-hidpi displays from MacOS
|
| Did that really reduce sales? Consider that the wide
| availability of crappy low end hardware gave Windows
| laptops a terrible reputation. Eg https://www.reddit.com/
| r/LinusTechTips/comments/yof7va/frien...
| masklinn wrote:
| > Consider that the wide availability of crappy low end
| hardware gave Windows laptops a terrible reputation.
|
| Standard DPI displays are not "crappy low-end hardware"?
|
| I don't think there's a single widescreen display out there
| that qualifies as hiDPI; they more or less don't exist: a 5K
| 34" is around 160 DPI (to say nothing of the downright
| pedestrian 5K 49" panels like the G9 or the AOC Agon).
| fredoralive wrote:
| What do you mean about non-HiDPI display support being removed
| from Mac OS? I've been using a pair of 1920x1080 monitors
| with my Mac Mini M1 just fine? Have they somehow broken
| something in Mac OS 13 / Ventura? (I haven't clicked the
| upgrade button yet, I prefer to let others leap boldly
| first).
| bpye wrote:
| They've also allowed Rosetta 2 in Linux VMs - if they are
| serious about supporting those use cases then I think it'll
| stay.
| kitsunesoba wrote:
| We'll see, but even post-Cook Apple historically hasn't
| liked the idea of third parties leaning on bridge
| technologies for too long. Things like Rosetta are offered
| as temporary affordances to allow time for devs to migrate,
| not as a permanent platform fixture.
| vaxman wrote:
| But that 3rd party was only legally at arm's length.
| TillE wrote:
| What important Intel-only macOS software is going to exist
| in five years?
|
| It's basically only games and weird tiny niches, and Apple
| is pretty happy to abandon both those categories. The
| saving grace is that there are very few interesting Mac-
| exclusive games from the Intel era.
| flomo wrote:
| Yeah, Apple killed all "legacy" 32-bit support, so one
| would think there's not much software which is both
| x86-64 and not being actively developed.
| vxNsr wrote:
| 2006 Apple was very different from 2011 Apple; renewing
| that license in 2011 was probably considered cost-
| prohibitive for the negligible benefit.
| rerx wrote:
| Starting with Ventura, Linux VMs can use Rosetta 2 to run
| x64 executables. I expect x64 Docker containers to remain
| relevant for quite a few years to come. Running those at
| reasonable speeds on Apple Silicon would be huge for
| developers.
| dmitriid wrote:
| > Jobs was more opinionated on "principles" - Cook is more
| than happy to sell what people will buy.
|
| Well, the current "principle" is "iOS is enough, we're
| going to run iOS apps on MacOS, and that's it".
|
| Rosetta isn't needed for that.
| dmitriid wrote:
| It's strange to see people downvoting this when, three
| days ago, the App Store on macOS literally defaulted to
| searching iOS and iPad apps for me
| https://twitter.com/dmitriid/status/1589179351572312066
| CharlesW wrote:
| > _Odds are Apple will eventually remove Rosetta II, as they
| did Rosetta back in the days, once they consider the need for
| that bridge to be over (Rosetta was added in 2006 in 10.4,
| and removed in 2011 from 10.7)._
|
| The difference is that Rosetta 1 was PPC-to-x86, so its
| purpose ended once PPC was a fond memory.
|
| Today's Rosetta is a generalized x86-to-ARM translation
| environment that isn't just for macOS apps. For example, it
| works with Apple's new virtualization framework to support
| running x86_64 Linux apps in ARM Linux VMs.
|
| https://developer.apple.com/documentation/virtualization/run...
| gumby wrote:
| > That's not going to happen unless Apple decides to switch
| from ARM to RISC-V, and... why would they? They've got 15
| years experience and essentially full control on ARM.
|
| 15? More than a quarter century. They were one of the
| original investors in ARM and have produced plenty of ARM
| devices since then beyond the Newton and the iPod.
|
| I'd bet they use a bunch of RISC-V internally too, if they
| just need a little CPU to manage something locally on some
| device and want to avoid paying a tiny fee to ARM, or just
| want some experience with it.
|
| But RISC-V as the main CPU? Yes, that's a long way away, if
| ever. But Apple is good at the long game. I wouldn't be
| surprised to hear that Apple has iOS running on RISC-V, but
| even something like the Lightning-to-HDMI adapter runs iOS
| on ARM.
| masklinn wrote:
| > 15? More than a quarter century. They were one of the
| original investors in ARM and have produced plenty of arm
| devices since then beyond the newton and the ipod.
|
| They didn't design their own chips for most of that time.
| gumby wrote:
| At the same time as the ARM investment they had a Cray
| for...chip design.
| masklinn wrote:
| Yes and?
|
| Apple invested in ARM and worked with ARM/Acorn on what
| would become the ARM6, in the early 90s. The Newton used
| it (specifically the ARM610); it was a commercial failure,
| and later models used updated ARM CPUs to which AFAIK
| Apple didn't contribute (DEC's StrongARM, and ARM's
| ARM710).
|
| <15 years pass>
|
| Apple starts working on bespoke designs again around the
| time they start working on the iPhone, or possibly after
| they realise it's succeeding.
|
| That doesn't mean they stopped _using_ ARM in the
| meantime (they certainly didn 't).
|
| The iPod's SoC was not even designed internally (it was
| contracted out to PortalPlayer; later generations were
| provided by Samsung). 15 years, and the upheaval of
| Jobs' return (and his immediate killing of the Newton),
| is a long time for an internal team of silicon designers.
| preisschild wrote:
| > They've got 15 years experience and essentially full
| control on ARM.
|
| Do they? ARM made it very clear that they consider all ARM
| cores their own[1]
|
| [1]: https://www.theregister.com/2022/11/07/opinion_qualcomm_
| vs_a...
| nicoburns wrote:
| Apple is in a somewhat different position to Qualcomm in
| that they were a founding member of ARM. I've also heard
| rumours that aarch64 was designed by Apple and donated to
| ARM (hence why Apple was so early to release an aarch64
| processor). So I somewhat doubt ARM will be in a position
| to sue them any time soon.
| danaris wrote:
| The Qualcomm situation is based on breaches of a specific
| agreement that ARM had with Nuvia, which Qualcomm has now
| bought. It's not a generalizable "ARM thinks everything
| they license belongs to them fully in perpetuity" deal.
| masklinn wrote:
| > Do they?
|
| They do, yes. They were one of the founding 3 members of
| ARM itself, and the primary monetary contributor.
|
| Through this they acquired privileges which remain extant:
| they can literally add custom instructions to the ISA
| (https://news.ycombinator.com/item?id=29798744), something
| there is no available license for.
|
| > ARM made it very clear that they consider all ARM cores
| their own[1]
|
| The Qualcomm situation is a breach of contract issue wrt
| Nuvia, it's a very different issue, and by an actor with
| very different privileges.
| Vt71fcAqt7 wrote:
| Is there a real source for this claim? It gets parroted a
| lot on HN and elsewhere, but I've also heard it's greatly
| exaggerated. I don't think Apple engineers get to read the
| licences, and even if they did, how do we know they
| understood them correctly and that they got repeated
| correctly? I've never seen a valid source for this claim.
| masklinn wrote:
| For what claim? That they co-founded ARM? That's
| historical record. That they extended the ISA? That's
| literally observed from decompilations. That they can do
| so? They've been doing it for at least 2 years and ARM
| has yet to sue.
|
| > I've never seen a valid source for this claim.
|
| What is "a valid source"? The linked comment is from
| Hector Martin, the founder and lead of Asahi, who worked
| on and assisted with reversing various facets of Apple
| silicon, including the capabilities and extensions of the
| ISA.
| Vt71fcAqt7 wrote:
| >For what claim?
|
| that they have "essentially full control on ARM"
|
| Having an ALA + some extras doesn't mean "full control."
|
| he also says:
|
| >And apparently in Apple's case, they get to be a little
| bit incompatible
|
| So he doesn't seem to actually know the full extent to
| which Apple has more rights, even using the phrase "a
| little bit" -- far from your claim. And he (and certainly
| you) has not read the license. Perhaps they have to pay
| for each core they release on the market that breaks
| compatibility? Do you know? Of course not. A valid source
| would be a statement from someone who read the license or
| one of the companies. There is more to a core than just
| the ISA. If not, why is Apple porting cores to RISC-V if
| they have so much control?
| ksherlock wrote:
| Why does it need a "real source"? ARM sells architecture
| licenses, Apple has a custom ARM architecture. 1 + 1 = 2.
|
| https://www.cnet.com/tech/tech-industry/apple-seen-as-
| likely...
|
| "ARM Chief Executive Warren East revealed on an earnings
| conference call on Wednesday that "a leading handset
| OEM," or original equipment manufacturer, has signed an
| architectural license with the company, forming ARM's
| most far-reaching license for its processor cores. East
| declined to elaborate on ARM's new partner, but EETimes'
| Peter Clarke could think of only one smartphone maker who
| would be that interested in shaping and controlling the
| direction of the silicon inside its phones: Apple."
|
| https://en.wikipedia.org/wiki/Mac_transition_to_Apple_sil
| ico...
|
| "In 2008, Apple bought processor company P.A. Semi for
| US$278 million.[28][29] At the time, it was reported that
| Apple bought P.A. Semi for its intellectual property and
| engineering talent.[30] CEO Steve Jobs later claimed that
| P.A. Semi would develop system-on-chips for Apple's iPods
| and iPhones.[6] _Following the acquisition, Apple signed
| a rare "Architecture license" with ARM, allowing the
| company to design its own core, using the ARM instruction
| set_.[31] The first Apple-designed chip was the A4,
| released in 2010, which debuted in the first-generation
| iPad, then in the iPhone 4. Apple subsequently released a
| number of products with its own processors."
|
| https://www.anandtech.com/show/7112/the-arm-diaries-
| part-1-h...
|
| "Finally at the top of the pyramid is an ARM architecture
| license. Marvell, Apple and Qualcomm are some examples of
| the 15 companies that have this license."
| Vt71fcAqt7 wrote:
| I should have been more explicit. I am questioning the
| claim that Apple has "full control on ARM" with no
| restriction on the cores they make, grandfathered in from
| the 1980s. Nobody has ever substantiated that claim.
| titzer wrote:
| Rosetta 2 is great, except it apparently can't run statically-
| linked (non-PIC) binaries. I am unsure why this limitation
| exists, but it's pretty annoying because Virgil x86-64-binaries
| cannot run under Rosetta 2, which means I resort to running on
| the JVM on my M1...
| randyrand wrote:
| Why are static binaries with PIC so rare? I'm surprised
| position dependent code is _ever_ used anymore in the age of
| ASLR.
|
| But static binaries are still great for portability. So you'd
| think static binaries with PIC would be the default.
| masklinn wrote:
| > But static binaries are still great for portability.
|
| macOS has not officially supported static binaries in...
| ever? You can't statically link libSystem, and it absolutely
| does not care for kernel ABI stability.
| titzer wrote:
| > it absolutely does not care for kernel ABI stability
|
| That may be true on the mach system call side, but the UNIX
| system calls don't appear to change. (Virgil actually does
| call the kernel directly).
| masklinn wrote:
| > That may be true on the mach system call side, but the
| UNIX system calls don't appear to change.
|
| They very much do, without warning, as the Go project
| discovered (after having been warned multiple times)
| during the Sierra betas:
| https://github.com/golang/go/issues/16272
| https://github.com/golang/go/issues/16606
|
| That doesn't mean Apple goes out of its way to break
| syscalls (unlike Microsoft), but there is no support for
| direct syscalls. That is why, again, you can't statically
| link libSystem.
|
| > (Virgil actually does call the kernel directly).
|
| That's completely unsupported ¯\_(ツ)_/¯
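|
| A hedged illustration of the contrast (the function name here
| is mine, not Apple's): the raw-syscall version hardcodes a
| number that Apple does not promise to keep stable across
| releases (0x2000004 is write(2) on x86-64 macOS, syscall
| class BSD=2 in the high bits), while the supported route is
| the libSystem wrapper:
|
|     #include <unistd.h>
|
|     /* unsupported: may break in any macOS release */
|     long raw_write(int fd, const void *buf, unsigned long len) {
|         long ret;
|         __asm__ volatile ("syscall"
|                           : "=a"(ret)
|                           : "a"(0x2000004L), "D"((long)fd),
|                             "S"(buf), "d"(len)
|                           : "rcx", "r11", "memory");
|         return ret;
|     }
|
|     /* supported: write(fd, buf, len) via libSystem */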
| titzer wrote:
| Virgil doesn't use ASLR. I'm not sure what value it adds to a
| memory-safe language.
| saagarjha wrote:
| Rosetta can run statically linked binaries, but I don't think
| anything supports binaries that aren't relocatable.
|         $ file a.out
|         a.out: Mach-O 64-bit executable x86_64
|         $ otool -L a.out
|         a.out:
|         $ ./a.out
|         Hello, world!
| CharlesW wrote:
| > _Rosetta 2 is great, except it apparently can 't run
| statically-linked (non-PIC) binaries._
|
| Interestingly, it supports statically-linked x86 binaries when
| used with Linux.
|
| "Rosetta can run statically linked x86_64 binaries without
| additional configuration. Binaries that are dynamically linked
| and that depend on shared libraries require the installation of
| the shared libraries, or library hierarchies, in the Linux
| guest in paths that are accessible to both the user and to
| Rosetta."
|
| https://developer.apple.com/documentation/virtualization/run...
| mirashii wrote:
| Statically linked binaries are officially unsupported on
| macOS in general, so there's no reason to support them in
| Rosetta either.
|
| They're unsupported in macOS because static linking assumes
| binary compatibility at the kernel system call interface,
| which is not guaranteed.
| saagarjha wrote:
| Rosetta was introduced with the promise that it supports
| binaries that make raw system calls. (And it does indeed
| support these by hooking the syscall instruction.)
| darzu wrote:
| Does anyone know the names of the key people behind Rosetta 2?
|
| In my experience, exceptionally well executed tech like this
| tends to have 1-2 very talented people leading. I'd like to
| follow their blog or Twitter.
| trollied wrote:
| The original Rosetta was written by Transitive, which was
| formed by spinning a Manchester University research group out.
| See https://www.software.ac.uk/blog/2016-09-30-heroes-
| software-e...
|
| I know a few of their devs went to ARM, some to Apple & a few
| to IBM (who bought Transitive). I do know a few of their ex
| staff (and their twitter handles), but I don't feel comfortable
| linking them here.
| scrlk wrote:
| IIRC the current VP of Core OS at Apple is ex-
| Manchester/Transitive.
| cwzwarich wrote:
| I am the creator / main author of Rosetta 2. I don't have a
| blog or a Twitter (beyond lurking).
| darzu wrote:
| Should you feel inspired to share your learnings, insights,
| or future ideas about the computing spaces you know, I and,
| I'm sure, many other people would be interested to listen!
|
| My preferred way to learn about a new (to me) area of tech is
| to hear the insights of the people who have provably advanced
| that field. There's a lot of noise and not much signal in
| tech blogs.
| darzu wrote:
| If you're feeling inclined, here's a slew of questions:
|
| What was the most surprising thing you learned while working
| on Rosetta 2?
|
| Is there anything (that you can share) that you would do
| differently?
|
| Can you recommend any great starting places for someone
| interested in instruction translation?
|
| Looking forward, did your work on Rosetta give you ideas for
| unfilled needs in the virtualization/emulation/translation
| space?
|
| What's the biggest inefficiency you see today in the tech
| stacks you interact most with?
|
| A lot of hard decisions must have been made while building
| Rosetta 2; can you shed light on some of those and how you
| navigated them?
| pcf wrote:
| Thanks for your amazing work!
|
| May I ask - would it be possible to implement support for
| 32-bit VST and AU plugins?
|
| This would be a major bonus, because it could e.g. enable
| producers like me to open up our music projects from earlier
| times, and still have the old plugins work.
| [deleted]
| Klonoar wrote:
| Huh, this is timely. Incredibly random but: do you know if
| there was anything that changed as of Ventura to where trying
| to mmap below the 2/4GB boundary would no longer work in
| Rosetta 2? I've an app where it's worked right up to Monterey
| yet inexplicably just bombs in Ventura.
| keepquestioning wrote:
| Isn't Rosetta 2 "done"? What are you working on now?
| bdash wrote:
| Impressive work, Cameron! Hope you're doing well.
| skrrtww wrote:
| Are you able to speak at all to the known performance
| struggles with x87 translation? Curious to know if we're
| likely to see any updates or improvements there in the
| future.
| peatmoss wrote:
| Not having any particular domain experience here, I've idly
| wondered whether or not there's any role for neural net models in
| translating code for other architectures.
|
| We have giant corpuses of source code, compiled x86_64 binaries,
| and compiled arm64 binaries. I assume the compiled binaries
| represent approximately our best compiler technology. It seems
| predicting an arm binary from an x86_64 binary would not be
| insane?
|
| If someone who actually knows anything here wants to disabuse me
| of my showerthoughts, I'd appreciate being able to put the idea
| out of my head :-)
| Symmetry wrote:
| Many branch predictors have traditionally used perceptrons,
| which are sort of NN-like. And I think there's a lot of
| research into incorporating deep learning models into chip
| routing.
| Someone wrote:
| > It seems predicting an arm binary from an x86_64 binary would
| not be insane?
|
| If you start with a couple of megabytes of x64 code and
| predict a couple of megabytes of ARM code from it, there will
| be errors even if your model is 99.999% accurate: a couple of
| megabytes is on the order of 500,000 instructions, so that
| accuracy still leaves a handful of mistranslated instructions
| per binary.
|
| How do you find the error(s)?
| hinkley wrote:
| I think we are on the cusp of machine-aided rules generation
| via example and counter-example. It could be a very cool era
| of "Moore's Law for software" (I'm told software doubles in
| speed roughly every 18 years).
|
| Property based testing is a bit of a baby step here, possibly
| in the same way that escape analysis in object allocation was
| the precursor to borrow checkers which are the precursor to...?
|
| These are my inputs, these are my expectations, ask me some
| more questions to clarify boundary conditions, and then offer
| me human readable code that the engine thinks satisfies the
| criteria. If I say no, ask more questions and iterate.
|
| If anything will ever allow machines to "replace" coders, it
| will be that, but the scare quotes are because that shifts us
| more toward information architecture from data munging, which I
| see as an improvement on the status quo. Many of my work
| problems can be blamed on structural issues of this sort. A
| filter that removes people who can't think about the big
| picture doesn't seem like a problem to me.
| saagarjha wrote:
| People have tried doing this, but not typically at the
| instruction level. One approach I'm aware of is to use
| machine learning to derive high-level semantics about the
| code, then lower it to the new architecture.
| brookst wrote:
| I'm an ML dilettante and hope someone more knowledgeable chimes
| in, but one thing to consider is the statistics of how many
| instructions you're translating and the accuracy rate. Binary
| execution is very unforgiving to minor mistakes in translation.
| If 0.001% of instructions are translated incorrectly, that
| program just isn't going to work.
| qsort wrote:
| You would need a hybrid architecture with a NN generating
| guesses and a "watchdog" shutting down errors.
|
| Neural models are basically universal approximators. Machine
| code needs to be obscenely precise to work.
|
| Unless you're doing something else in the backend, it's just a
| turbo SIGILL generator.
| throw10920 wrote:
| This is all true - machine code needs to be "basically
| perfect" to work.
|
| However, there are lots of problems in CS where checking a
| solution is easier than finding one in the first place. It
| _may_ turn out that a well-tuned model can quickly produce
| solutions to some code-generation problems, that those
| solutions have a high enough likelihood of being correct,
| that it's fast enough to check (and maybe try again), and
| that this entire process is faster than state-of-the-art
| classical algorithms.
|
| However, if that were the case, I might also expect us to be
| able to extract better algorithms from the model -
| intuitively, machine code generation "feels" like something
| that's just better implemented through classical algorithms.
| Have you met a human that can do register allocation faster
| than LLVM?
| classichasclass wrote:
| > turbo SIGILL generator
|
| This gave me the delightful mental image of a CPU smashing
| headlong into a brick wall, reversing itself, and doing it
| again. Which is pretty much what this would do.
| ericbarrett wrote:
| Anybody know if Docker has plans to move from qemu to Rosetta on
| M1/2 Macs? I've found qemu to be at least 100x slower than the
| native arch.
| jeffbee wrote:
| I wonder how much hand-tuning there is in Rosetta 2 for known,
| critical routines. One of the tricks Transmeta used to get
| reasonable performance on their very slow Crusoe CPU was to
| recognize critical Windows functions and replace them with a
| library of hand-optimized native routines. Of course, that's a
| little different: Rosetta 2 targets an architecture that is,
| generally speaking, at least as fast as the x86 it is trying
| to emulate (which has been true of most cross-architecture
| translators historically, like DEC's VEST, which ran VAX code
| on Alpha), whereas Transmeta CMS was targeting a CPU that was
| slower.
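|
| A minimal sketch of how such a "known routine" substitution
| could work (illustrative only; not Transmeta's or Apple's
| actual mechanism, and the names are invented): fingerprint
| the first bytes of a guest routine and, on a match, dispatch
| to a hand-written native replacement instead of translating:
|
|     #include <stdint.h>
|     #include <string.h>
|
|     typedef struct {
|         uint8_t prefix[16]; /* leading bytes of guest routine */
|         size_t  len;        /* how many of them to compare */
|         void   *native;     /* hand-optimized replacement */
|     } known_routine_t;
|
|     extern known_routine_t known_routines[];
|     extern size_t num_known_routines;
|
|     void *find_native_replacement(const uint8_t *guest_code) {
|         for (size_t i = 0; i < num_known_routines; i++)
|             if (memcmp(guest_code, known_routines[i].prefix,
|                        known_routines[i].len) == 0)
|                 return known_routines[i].native;
|         return NULL; /* fall back to normal translation */
|     }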
| saagarjha wrote:
| Haven't spotted any in particular.
| sedatk wrote:
| TL;DR: one-to-one instruction translation done ahead of time,
| instead of complex JIT translation, betting on the M1's
| performance and instruction-cache handling.
| johnthuss wrote:
| "I believe there's significant room for performance improvement
| in Rosetta 2... However, this would come at the cost of
| significantly increased complexity... Engineering is about making
| the right tradeoffs, and I'd say Rosetta 2 has done exactly
| that."
| Gigachad wrote:
| Would be a waste of effort when the tool is designed to be
| obsolete in a few years as everything gets natively compiled.
| saagarjha wrote:
| One thing that's interesting to note is that the amount of effort
| expended here is not actually all that large. Yes, there are
| smart people working on this, but the performance of Rosetta 2
| for the most part is probably the work of a handful of clever
| people. I wouldn't be surprised if some of them have an interest
| in compilers but the actual implementation is fairly
| straightforward and there isn't much of the stuff you'd typically
| see in an optimizing JIT: no complicated type theory or analysis
| passes. Aside from a handful of hardware bits and some convenient
| (perhaps intentionally selected) choices in where to make
| tradeoffs there's nothing really specifically amazing here. What
| really makes it special is that anyone (well, any company with a
| bit of resources) could've done it but nobody really did. (But,
| again, Apple owning the stack and having past experience probably
| did help them get over the hurdle of actually putting effort into
| this.)
| pjmlp wrote:
| Back in the early days of Windows NT everywhere, the Alpha
| version had a similar JIT emulation.
| agentcooper wrote:
| I am interested in this domain but lack the knowledge to fully
| understand the post. Any recommendations on good
| books/courses/tutorials related to low level programming?
| saagarjha wrote:
| I'd recommend going through a compilers curriculum, then
| reading up on past binary translation efforts.
| pjmlp wrote:
| Back in the early days of Windows NT everywhere, the Alpha
| version had a similar JIT emulation.
|
| https://en.m.wikipedia.org/wiki/FX!32
|
| Or for a more technical deep dive,
|
| https://www.usenix.org/publications/library/proceedings/usen...
| mosburger wrote:
| OMG I forgot about FX!32. My first co-op was as a QA tester for
| the DEC Multia, which they moved from the Alpha processor to
| Intel midway through. I did a skunkworks project for the dev
| team attempting to run the newer versions of Multia's software
| (then Intel-based) on older Alpha Multias using FX!32. IIRC it
| was still internal use only/beta, but it worked quite well!
| hot_gril wrote:
| Rosetta 2 has become the poster child for "innovation without
| deprecation" where I work (not Apple).
| Tijdreiziger wrote:
| Apple is the king of deprecation, just look at what happened to
| Rosetta 1 and 32-bit iOS apps.
| hot_gril wrote:
| Yes they are, and that makes Rosetta 2 even more special.
| Though Rosetta 1 got support for 5 years, which is pretty
| good.
| kccqzy wrote:
| > The instructions from FEAT_FlagM2 are AXFLAG and XAFLAG, which
| convert floating-point condition flags to/from a mysterious
| "external format". By some strange coincidence, this format is
| x86, so these instructions are used when dealing with floating-
| point flags.
|
| This really made me chuckle. They probably don't want to mention
| Intel by name, but this just sounds funny.
|
| https://developer.arm.com/documentation/100076/0100/A64-Inst...
| manv1 wrote:
| Apple's historically been pretty good at making this stuff. Their
| first 68k -> PPC emulator (Davidian's) was so good that for some
| things the PPC Mac was the fastest 68k mac you could buy. The
| next-gen DR emulator (and SpeedDoubler etc) made things even
| faster.
|
| I suspect the ppc->x86 stuff was slower because x86 just doesn't
| have the registers. There's only so much you can do.
| scarface74 wrote:
| > Their first 68k -> PPC emulator (Davidian's) was so good that
| for some things the PPC Mac was the fastest 68k mac you could
| buy.
|
| This is not true. A 6100/60 running 68K code was about the
| speed of my unaccelerated Mac LCII (68030/16). Even when using
| SpeedDoubler, you only got speeds up to my LCII with a
| 68030/40MHz accelerator.
|
| Even the highest end 8100/80 was slower than a high end 68k
| Quadra.
|
| The only time 68K code ran faster is when it made heavy use of
| the Mac APIs that were native.
| dev_tty01 wrote:
| >The only time 68K code ran faster is when it made heavy use
| of the Mac APIS that were native.
|
| Yes, and that just confirms the original point. Mac apps
| often spend much of their time in the OS APIs, so the 68K
| code (the app) often ran faster on PPC than it did on 68K.
| The earlier post said "so good that for some things the PPC
| Mac was the fastest 68k mac." That is true.
|
| In my own experience, I found most 68K apps felt as fast or
| faster. Your app mix might have been different, but many
| folks found the PPC faster.
| classichasclass wrote:
| Part of that was the greater clock speeds on the 601 and
| 603, though. Those _started_ at 60MHz. Clock for clock 68K
| apps were generally poorer on PowerPC until PPC clock
| speeds made them competitive, and then the dynamic
| recompiling emulator knocked it out of the park.
|
| Similarly, Rosetta was clock-for-clock worse than Power
| Macs at running Power Mac applications. The last generation
| G5s would routinely surpass Mac Pros of similar or even
| slightly greater clocks. On native apps, though, it was no
| contest, and by the next generation the sheer processor
| oomph put the problem completely away.
|
| Rosetta 2 is notable in that it is so far Apple's only
| processor transition where the new architecture was
| unambiguously faster than the old one _on the old one's own
| turf_.
| Wowfunhappy wrote:
| > Apple's historically been pretty good at making this stuff.
| Their first 68k -> PPC emulator (Davidian's) was so good that
| for some things the PPC Mac was the fastest 68k mac you could
| buy.
|
| Not arguing the facts here, but I'm curious--are these
| successes related? And if so, how has Apple done that?
|
| I would imagine that very few of the engineers who programmed
| Apple's 68k emulator are still working at Apple today. So, why
| is Apple still so good at this? Strong internal documentation?
| Conducive management practices? Or were they just lucky both
| times?
| joshstrange wrote:
| I mean they are one of very few companies who have done arch
| changes like this and they had already done it twice before
| Rosetta 2. The same engineers might not have been used for
| all 3 but I'm sure there was at least a tiny bit of overlap
| between 68k->PPC and PPC->Intel (and likewise overlap between
| PPC->Intel and Intel->ARM) that coupled with passed down
| knowledge within the company gives them a leg up. They know
| the pitfalls, they've see issues/advantages of using certain
| approaches.
|
| I think of it in same way that I've migrated from old->new
| versions of frameworks/languages in the past with breaking
| changes and each time I've done it I've gotten better at
| knowing what to expect, what to look for, places where it
| makes sense to "just get it working" or "upgrade the code to
| the new paradigm". The first time or two I did it was as a
| junior working under senior developers so I wasn't as
| involved but what did trickle down to me and/or my part in
| the refactor/upgrade taught me things. Later times when I was
| in charge (or on my own) I was able to draw on those past
| experiences.
|
| Obviously my work is nowhere near as complicated as arch
| changes but if you squint and turn your head to the side I
| think you can see the similarities.
|
| > Or were they just lucky to have success both times?
|
| I think 2 times might be explained with "luck" but being
| successful 3 times points to a strong trend IMHO, especially
| since Rosetta 2 seems to have done even better than Rosetta 1
| for the last transition.
| spacedcowboy wrote:
| FWIW, I know several current engineers at Apple who wrote
| ground-breaking stuff before the Mac even existed. Apple
| certainly doesn't have any problem with older engineers, and
| it turns out that transferring that expertise to new chips on
| demand isn't particularly hard for them.
| nordsieck wrote:
| > I suspect the ppc->x86 stuff was slower because x86 just
| doesn't have the registers.
|
| My understanding is that part of the reason the G4/5 was sort
| of able to keep up with x86 at the time was due to the heavy
| use of SIMD in some apps. And I doubt that Rosetta would have
| been able to translate that stuff into SSE (or whatever the x86
| version of SIMD was at the time) on the fly.
| bonzini wrote:
| Apple had a library of SIMD subroutines (IIRC
| Accelerate.framework) and Rosetta was able to use the x86
| implementation when translating PPC applications that called
| it.
| masklinn wrote:
| Rosetta actually did support Altivec. It didn't support G5
| input at all though (but likely because that was considered
| pretty niche, as Apple only released a G5 iMac, a PowerMac,
| and an XServe, due to the out-of-control power and thermals
| of the PowerPC 970).
| menaerus wrote:
| > Rosetta 2 translates the entire text segment of the binary from
| x86 to ARM up-front.
|
| Do I understand correctly that Rosetta is basically a
| transpiler from x86-64 machine code to ARM machine code, run
| prior to the binary's execution? If so, does it affect
| application startup times?
| nilsb wrote:
| Yes, it does. The delay of the first start of an app is quite
| noticeable. But the transpiled binary is apparently cached
| somewhere.
| saagarjha wrote:
| /var/db/oah.
| nicoburns wrote:
| > If so, does it affect the application startup times?
|
| It does, but only the very first time you run the application.
| The result of the transpilation is cached so it doesn't have to
| be computed again until the app is updated.
| arianvanp wrote:
| And deleting the cache is undocumented (it is not in the file
| system) so if you run Mac machines as CI runners they will
| trash and brick themselves running out of disk space over
| time.
| rowanG077 wrote:
| What in the actual fuck. That is such an insane decision.
| Where is it stored then? Some dark corner of the file
| system inaccessible via normal means?
| jonny_eh wrote:
| You mean the cache is ever expanding?
| koala_man wrote:
| Really? This SO question says it's stored in /var/db/oah/
|
| https://apple.stackexchange.com/questions/427695/how-can-
| i-l...
| dylan604 wrote:
| Does that essentially mean each non-native app is doubled in
| disk use? Maybe not doubled but requires more space to be
| sure.
| saagarjha wrote:
| Yes.
| varenc wrote:
| Yes... you can see the cache in /var/db/oah/
|
| Though it's only the actual binary that gets doubled. For
| large apps it's usually not the binary that's taking up
| most of the space.
| kijiki wrote:
| Similar to DEC's FX!32 in that regard. FX!32 allowed running
| x86 Windows NT apps on Alpha Windows NT.
| saltcured wrote:
| There was also an FX!32 for Linux. But I think it may have
| only included the interpreter part and left out the
| transpiler part. My memory is vague on the details.
|
| I do remember that I tried to use it to run the x86
| Netscape binary for Linux on a surplus Alpha with RedHat
| Linux. It worked, but so slowly that a contemporary Python-
| based web browser had similar performance. In practice, I
| settled on running Netscape from a headless 486 based PC
| and displaying remotely on the Alpha's desktop over
| ethernet. That was much more usable.
| esskay wrote:
| The first load is fairly slow, but once it's done, every load
| after that is pretty much identical to what it'd be running on
| an x86 Mac, due to the caching it does.
| EricE wrote:
| For me my M1 was fast enough that the first load didn't seem
| that different - and more importantly subsequent loads were
| lightning fast! It's astonishing how good Rosetta 2 is -
| utterly transparent and faster than my Intel Mac thanks to
| the M1.
| savoytruffle wrote:
| If installed using a packaged installer, or the App Store,
| the translation is done during installation instead of at
| first run. So, slow 1st launch may be uncommon for a lot of
| apps or users.
| hinkley wrote:
| I remember years ago, when Java-adjacent research was all the
| rage, HP had a problem that was "Rosetta lite", if you will.
| They had a need to run old binaries on new hardware that wasn't
| exactly backward compatible. They made a transpiler that worked
| on binaries. It might even have been a JIT, but that part of
| the memory is fuzzy.
|
| What made it interesting here was that as a sanity check they
| made an A->A mode where they took in one architecture and spit
| out machine code for the same architecture. The output was faster
| than the input. Meaning that even native code has some room for
| improvement with JIT technology.
|
| I have been wishing for years that we were in a better place with
| regard to compilers and NP complete problems where the compilers
| had a fast mode for code-build-test cycles and a very slow
| incremental mode for official builds. I recall someone telling me
| the only thing they liked about the Rational IDE (C and C++?) was
| that it cached precompiled headers, one of the Amdahl's Law areas
| for compilers. If you changed a header, you paid the
| recompilation cost and everyone else got a copy. I love whenever
| the person that cares about something gets to pay the consequence
| instead of externalizing it on others.
|
| And having some CI machines or CPUs that just sit around chewing
| on Hard Problems all day for that last 10% seems to me to be a
| really good use case in a world that's seeing 16 core consumer
| hardware. Also caching hints from previous runs is a good thing.
| fuckstick wrote:
| > The output was faster than the input.
|
| So if you ran the input back through the output multiple times
| then that means you could eventually get the runtime down to 0.
| twic wrote:
| But unfortunately, the memory use goes to infinity.
| avidiax wrote:
| Probably the output of the decade-old compiler that produced
| the original binary had no optimizations.
| hinkley wrote:
| That too, but the eternal riddle of optimizer passes is
| which ones reveal structure and which obscure it. Do I
| loop-unroll or strength-reduce first? If there are
| heuristics about max complexity for unrolling or inlining
| then it might be "both".
|
| And then there's processor family versus this exact model.
| zaphirplane wrote:
| Is this for Itanium?
| tomcam wrote:
| I'm likely misunderstanding what you said, but I thought pre-
| compiled headers were pretty much standard these days.
| wmf wrote:
| https://www.hpl.hp.com/techreports/1999/HPL-1999-78.html
| travisgriggs wrote:
| It was particularly poignant at the time because JITed
| languages were looked down on by the "static compilation
| makes us faster" crowd. So it was a sort of "wait a minute
| Watson!" moment in that particular tech debate.
|
| No one cares as much nowadays; we've moved our overrated
| opinion battlegrounds to other portions of what we do.
| pjmlp wrote:
| I eventually came around to JIT being the only way to make
| dynamic languages fast, while strongly typed ones can
| benefit from both AOT and JIT for different kinds of
| deployment scenarios and development workflows.
| titzer wrote:
| Dynamic languages need inline caches, type feedback, and
| fairly heavy inlining to be competitive. Some of that can
| be gotten offline, e.g. by doing PGO. But you can't, in
| general, adapt to a program that suddenly changes phases,
| or rebinds a global that was assumed a constant, etc.
| Speculative optimizations with deopt are what make
| dynamic languages fast.
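|
| As a minimal C sketch of that idea (all names invented for
| illustration), here's roughly what a speculative fast path
| with a deopt-style fallback looks like: a monomorphic
| inline cache guarding a property load:
|
|     #include <stdio.h>
|
|     typedef struct { int field_slot; } Shape; /* hidden class */
|     typedef struct { const Shape *shape;
|                      double slots[4]; } Obj;
|
|     static const Shape *cached_shape; /* inline-cache state */
|     static int cached_slot;
|
|     static double slow_lookup(const Obj *o) {
|         /* Generic path; a real JIT would deoptimize or
|            recompile here. This sketch just re-seeds the
|            cache. */
|         cached_shape = o->shape;
|         cached_slot = o->shape->field_slot;
|         return o->slots[cached_slot];
|     }
|
|     double load_field(const Obj *o) {
|         if (o->shape == cached_shape)     /* guard */
|             return o->slots[cached_slot]; /* compare + load */
|         return slow_lookup(o);            /* bail out */
|     }
|
|     int main(void) {
|         Shape s = { .field_slot = 1 };
|         Obj o = { .shape = &s, .slots = { 0.0, 42.0 } };
|         printf("%f\n", load_field(&o)); /* miss, seeds cache */
|         printf("%f\n", load_field(&o)); /* hit, fast path */
|         return 0;
|     }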
| hinkley wrote:
| Before I talked myself out of writing my own programming
| language, I used to have lunch conversations with my
| mentor who was also speed obsessed about how JIT could
| meet Knuth in the middle by creating a collections API
| with feedback guided optimization, using it for algorithm
| selection and tuning parameters by call site.
|
| For object graphs in Java you can waste exorbitant
| amounts of memory by having a lot of "children" members
| that are sized for a default of 10 entries but the normal
| case is 0-2. I once had to deoptimize code where someone
| tried to do this by hand and the number they picked was 6
| (just over half of the default). So when the average jumped
| to 7, the data structure ended up being 20% larger than the
| default behavior instead of 30% smaller as intended.
|
| For a server workflow, having data structures tuned to
| larger pools of objects with more complex comparison
| operations can also be valuable, but I don't want that
| kitchen-sink stuff on mobile or in an embedded app.
|
| I still think this is viable, but only if you are clever
| about gathering data. For instance the incremental
| increase in runtime for telemetry data is quite high on
| the happy path. But corner cases are already expensive,
| so telemetry adds only a few percent there instead of
| double digits.
|
| The nonstarter for this ended up being that most
| collections APIs violate Liskov, so you almost need to
| write your own language to pick a decomposition that
| doesn't. Variance semantics help a ton but they don't
| quite fix LSP.
| mikepurvis wrote:
| I think I landed in a place where it's basically "the
| compiler has insufficient information to achieve ideal
| optimization because some things can only be known at
| runtime."
|
| Which is not exclusively an argument for runtime JIT-- it
| can also be an argument for instrumenting your runtime
| environment, and feeding that profiling data back to the
| compiler to help it make smarter decisions the next time.
| But that's definitely a more involved process than just
| baking it into the same JavaScript interpreter used by
| everyone-- likely well worth it in the case of things
| like game engines, though.
| masklinn wrote:
| It's also an argument for having much more expressive and
| precise type systems, so the compiler has better
| information.
|
| Once you've managed to debug the codegen anyway (see: The
| Long and Arduous Story of Noalias).
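|
| For a tiny C illustration of the aliasing half of this:
| with restrict (C99), the compiler is promised the pointers
| don't alias, so it can keep values in registers and
| vectorize the loop; without it, each store could force a
| reload. (A sketch; the function name is made up.)
|
|     void scale(float *restrict out, const float *restrict in,
|                float factor, int n) {
|         for (int i = 0; i < n; i++)
|             out[i] = in[i] * factor; /* no-alias: vectorizable */
|     }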
| mikepurvis wrote:
| Is it? I'd love to see a breakdown of what classes of
| information can be gleaned from profile data, and how
| much of an impact each one has in isolation in terms of
| optimization.
|
| Naively, I would have assumed that branch information
| would be most valuable, in terms of being able to guide
| execution toward the hot path and maximize locality for
| the memory accesses occurring on the common branches. And
| that info is not something that would be assisted by more
| expressive types, I don't think.
| titzer wrote:
| Darn it, replied too early. See sibling comment I just
| posted. The problem with dynamic languages is that you
| need to speculate and be ready to undo that speculation.
| notriddle wrote:
| https://tomaszs2.medium.com/how-
| rust-1-64-became-10-20-faste...
|
| https://news.ycombinator.com/item?id=33306945
| bluGill wrote:
| The problem with JIT is that not all information known at
| runtime is the correct information to optimize on.
|
| In finance the performance-critical code path is often
| the one run least often. That is, you have
| if (unlikely_condition) { run_time_sensitive_trade(); }. In
| this case you need to tell the compiler to lay out the code
| so the CPU takes a branch misprediction most of the time,
| ensuring that on the run that counts the pipeline doesn't
| stall.
|
| The above is a rare corner case for sure, but it is one
| of those weird exceptions you always need to keep in mind
| when trying to make any blanket rule.
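|
| A minimal sketch of that inversion using GCC/Clang's
| __builtin_expect (the function names are hypothetical):
|
|     #include <stdbool.h>
|     #include <stdio.h>
|
|     static void run_time_sensitive_trade(void) {
|         puts("trade"); /* rare, but must be fast */
|     }
|     static void idle_bookkeeping(void) {
|         /* common, latency-insensitive */
|     }
|
|     static void on_market_tick(bool fire_now) {
|         /* Mark the RARE branch as the expected one so it is
|            laid out as the straight-line fall-through:
|            mispredict on ordinary ticks, don't stall on the
|            tick that counts. */
|         if (__builtin_expect(fire_now, true)) {
|             run_time_sensitive_trade();
|         } else {
|             idle_bookkeeping();
|         }
|     }
|
|     int main(void) {
|         for (int i = 0; i < 1000; i++)
|             on_market_tick(i == 999); /* fires once */
|         return 0;
|     }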
| dahfizz wrote:
| The other issue with JIT is that it is unreliable. It
| optimizes code by making assumptions. If one of the
| assumptions is wrong, you pay a large latency penalty. In
| my field of finance, having reliably low latency is
| important. Being 15% faster on average, but really slow
| every once in a while, is not something customers will go
| for.
| saagarjha wrote:
| I take it you are not very familiar with the website known
| as Hacker News.
| AussieWog93 wrote:
| Outside of gaming, or hyper-CPU-critical workflows like video
| editing, I'm not really sure if people actually even care about
| that last 10% of performance.
|
| I know that most of the time when I get frustrated by
| everyday software, it's doing something unnecessary in a
| long loop, and possibly forgetting to check for Windows
| messages too.
| koala_man wrote:
| Performance also translates into better battery life and
| cheaper datacenters.
| hamstergene wrote:
| Could it be simply because many binaries were produced by
| much older, outdated optimizers, or optimized for size?
|
| Also, optimizers usually target the "lowest common
| denominator," so native binaries rarely use the full power
| of the current instruction set.
|
| Jumping from that peculiar finding to praising runtime JIT
| feels like a long shot. To me it's more of an argument for
| distributing software in an intermediate form (like Apple
| Bitcode) and compiling on install, tailoring for the current
| processor.
| jasonwatkinspdx wrote:
| All reasonable points, but examples where JIT has an
| advantage are well supported in research literature. The
| typical workload that shows this is something with a very
| large space of conditionals, but where at runtime there's a
| lot of locality, eg matching and classification engines.
| AceJohnny2 wrote:
| > _Or optimized for size._
|
| Note that on gcc (I think) and clang (I'm sure), -Oz is a
| strict superset of -O2 (the "fast+safe" optimizations,
| compared to -O3 that can be a bit too aggressive, given C's
| minefield of Undefined Behavior that compilers can exploit).
|
| I'd guess that, with cache fit considerations, -Oz can even
| be faster than -O2.
| astrange wrote:
| > To me it's more of an argument towards distributing
| software in intermediate form (like Apple Bitcode) and
| compiling on install, tailoring for the current processor.
|
| This turns out to be quite difficult, especially if you're
| using bitcode as a compiler IL. You have to know what the
| right "intermediate" level is; if assumptions change too much
| under you then it's still too specific. And it means you
| can't use things like inline assembly.
|
| That's why bitcode is dead now.
|
| By the way, I don't know why this thread is about how JITs
| can optimize programs when this article is about how Rosetta
| is not a JIT and intentionally chose a design that can't
| optimize programs.
| lmm wrote:
| > This turns out to be quite difficult, especially if
| you're using bitcode as a compiler IL. You have to know
| what the right "intermediate" level is; if assumptions
| change too much under you then it's still too specific. And
| it means you can't use things like inline assembly.
|
| > That's why bitcode is dead now.
|
| Isn't this what Android does today? Applications are
| distributed in bytecode form and then optimized for the
| specific processor at install time.
| chrisseaton wrote:
| I've run Ruby C extensions on a JIT faster than on native, due
| to things like inlining and profiling working more effectively
| at runtime.
| jeffbee wrote:
| Post-build optimization of binaries without changing the target
| CPU is common. See BOLT
| https://github.com/facebookincubator/BOLT
| mark_undoio wrote:
| Something that fascinates me about this kind of A -> A
| translation (which I associate with the original HP Dynamo
| project on HPPA CPUs) is that it was able to effectively
| yield the performance effect of one or two extra levels of
| the -O optimization flag.
|
| Right now it's fairly common in software development to have a
| debug build and a release build with potentially different
| optimisation levels. So that's two builds to manage - if we
| could build with lower optimisation and still effectively run
| at higher levels then that's a whole load of build/test
| simplification.
|
| Moreover, debugging optimised binaries is fiddly due to
| information that's discarded. Having the original, unoptimised,
| version available at all times would give back the fidelity
| when required (e.g. debugging problems in the field).
|
| Java effectively lives in this world already as it can use high
| optimisation and then fall back to interpreted mode when
| debugging is needed. I wish we could have this for C/C++ and
| other native languages.
| foobiekr wrote:
| One of the engineers I was working with on a project, who
| was from Transitive (the company that made QuickTransit,
| which became Rosetta), found that their JIT-based translator
| could not deliver significant performance increases for A->A
| outside of pathological cases, and it was very mature
| technology at the time.
|
| I think it's a hypothetical. The Mill Computing lectures talk
| about a variant of this, which is sort of equivalent to an
| install-time specializer for intermediate code which might
| work, but that has many problems (for one thing, it breaks
| upgrades and is very, very problematic for VMs being run on
| different underlying hosts).
| saagarjha wrote:
| It depends greatly on which optimization levels you're going
| through. -O0 to -O1 can easily be a 2-3x performance
| improvement, which is going to be hard to get otherwise. -O2
| to -O3 might be 15% if you're lucky, in which case LTO+PGO
| can absolutely get you wins that beat that.
| bluGill wrote:
| -O2 to -O3 has in some benchmarks made things worse. In
| others it is a massive win, but in general going above -O2
| should not be done without benchmarking the code. There are
| some optimizations that can make things worse or better for
| reasons the compiler cannot know.
| astrange wrote:
| Over-optimizing your "cold" code can also make things
| worse for the "hot" code, eg by growing code size so much
| that briefly entering the cold space kicks everything out
| of caches.
| hinkley wrote:
| I have often lamented not being able to hint to the JIT
| when I've transitioned from startup code to normal
| operation. I don't need my Config file parsing optimized.
| But the code for interrogating the Config at runtime
| better be.
|
| Everything before listen() is probably run once. Except not
| every program calls listen().
| hinkley wrote:
| And then there's always the outlier where optimizing for
| size makes the working memory fit into cache and thus the
| whole thing substantially faster.
| freedomben wrote:
| If JIT-ing a statically compiled input makes it faster, does
| that mean that JIT-ing itself is superior or does it mean that
| the static compiler isn't outputting optimal code? (real
| question. asked another way, does JIT have optimizations it can
| make that a static compiler can't?)
| vips7L wrote:
| Yes, the JIT has more profile guided data as to what your
| program actually does at runtime, therefore it can optimize
| better.
| gpderetta wrote:
| On the other hand, some optimizations are so expensive that
| a JIT just doesn't have the execution budget to perform
| them.
|
| Probably the optimal system is a hybrid iterative JIT/AOT
| compiler (which, incidentally, was the original objective of
| LLVM).
| mockery wrote:
| In addition to the sibling comments, one simple opportunity
| available to a JIT but not to AOT is 100% confidence about
| the target hardware and its capabilities.
|
| For example AOT compilation often has to account for the
| possibility that the target machine might not have certain
| instructions - like SSE/AVX vector ops, and emit both SSE and
| non-SSE versions of a codepath with, say, a branch to pick
| the appropriate one dynamically.
|
| Whereas a JIT knows what hardware it's running on - it
| doesn't have to worry about any other CPUs.
| duped wrote:
| AOT compilers support this through a technique called
| function multi-versioning. It's not free and only goes so
| far, but it isn't reserved to JITs.
|
| The classical reason to use FMV is for SIMD optimizations,
| fwiw
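|
| A small sketch of FMV via GCC's target_clones attribute
| (assuming GCC 6+ on x86 Linux; the clone strings are real
| target names, the rest is made up):
|
|     #include <stddef.h>
|     #include <stdio.h>
|
|     /* One source function, several machine-level clones;
|        a CPUID-based resolver picks one at load time. */
|     __attribute__((target_clones("avx2", "sse4.2", "default")))
|     double dot(const double *a, const double *b, size_t n) {
|         double sum = 0.0;
|         for (size_t i = 0; i < n; i++)
|             sum += a[i] * b[i]; /* vectorized per clone */
|         return sum;
|     }
|
|     int main(void) {
|         double x[4] = { 1, 2, 3, 4 };
|         double y[4] = { 4, 3, 2, 1 };
|         printf("%f\n", dot(x, y, 4));
|         return 0;
|     }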
| acdha wrote:
| One great example of this was back in the P4 era where
| Intel hit higher clock speeds at the expense of much higher
| latency. If you made a binary for just that processor a
| smart compiler could use the usual tricks to hit very good
| performance, but that came at the expense of other
| processors and/or compatibility (one appeal to the AMD
| Athlon & especially Opteron was that you could just run the
| same binary faster without caring about any of that[1]). A
| smart JIT could smooth that considerably but at the time
| the memory & time constraints were a challenge.
|
| 1. The usual caveats about benchmarking what you care about
| apply, of course. The mix of webish things I worked on and
| scientists I supported followed this pattern, YMMV.
| andrewaylett wrote:
| It depends on what the JIT does exactly, but in general
| _yes_, a JIT _may_ be able to make optimisations that a
| static compiler won't be aware of, because a JIT can
| optimise for the specific data being processed.
|
| That said, a sufficiently advanced CPU could also make those
| optimisations on "static" code. That was one of the things
| Transmeta had been aiming towards, I think.
| kmeisthax wrote:
| It's more the case that the ahead-of-time compilation is
| suboptimal.
|
| Modern compilers have a thing called PGO (Profile Guided
| Optimization) that lets you take a compiled application, run
| it and generate an execution profile for it, and then compile
| the application again using information from the profiling
| step. The reason why this works is that lots of optimization
| involves time-space tradeoffs that only make sense to do if
| the code is frequently called. JIT _only_ runs on frequently-
| called code, so it has the advantage of runtime profiling
| information, while ahead-of-time (AOT) compilers have to make
| educated guesses about what loops are the most hot. PGO
| closes that gap.
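|
| As a concrete sketch of that loop (assuming GCC's
| -fprofile-generate/-fprofile-use flags; Clang's flags
| differ slightly):
|
|     /* pgo_demo.c -- a branchy toy to profile.
|      * 1: gcc -O2 -fprofile-generate pgo_demo.c -o demo
|      * 2: ./demo < representative_input  (writes .gcda data)
|      * 3: gcc -O2 -fprofile-use pgo_demo.c -o demo
|      * In step 3 the compiler knows which branches and loops
|      * were hot and can inline, unroll, and lay out code
|      * accordingly, much like a JIT with a runtime profile.
|      */
|     #include <stdio.h>
|
|     int main(void) {
|         long hits = 0;
|         int c;
|         while ((c = getchar()) != EOF)
|             if (c == 'a') /* PGO learns this branch's bias */
|                 hits++;
|         printf("%ld\n", hits);
|         return 0;
|     }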
|
| Theoretically, a JIT _could_ produce binary code hyper-
| tailored to a particular user's habits and their computer's
| specific hardware. However, I'm not sure if that has that
| much of a benefit versus PGO AOT.
| com2kid wrote:
| > Theoretically, a JIT could produce binary code hyper-
| tailored to a particular user's habits and their computer's
| specific hardware. However, I'm not sure if that has that
| much of a benefit versus PGO AOT.
|
| In theory a JIT can be a _lot_ more efficient, optimizing
| not only for the exact instruction set but also doing
| per-CPU-architecture optimizations for things such as
| instruction length, pipeline depth, cache sizes, etc.
|
| In reality I doubt most compiler or JIT development teams
| have the resources to write and test all those potential
| optimizations, especially as new CPUs are coming out all
| the time, and each set of optimizations is another set of
| tests that has to be maintained.
| bluGill wrote:
| gcc and clang at least have options so you can optimize for
| specific CPUs. I'm not sure how good they are (most people
| want a generic optimization that runs well on all CPUs of
| the family, so there is likely lots of room for improvement
| with CPU-specific optimization), but they can do it. This
| does (or at least can; again, it probably isn't fully
| implemented) account for instruction length, pipeline depth,
| and cache size.
|
| The JavaScript V8 engine and the JVM are both popular and
| supported enough that I expect the teams working on them to
| take advantage of every trick they can for specific CPUs;
| they have a lot of resources for this (at least for the
| major x86 and ARM chips - maybe they don't for MIPS or some
| uncommon variant of ARM...). Of course there are other JIT
| engines; some uncommon ones don't have many resources and
| won't do this.
| titzer wrote:
| > take advantage of every trick they can for specific
| CPUs
|
| Not to the extent clang and gcc do, no. V8 does, e.g. use
| AVX instructions and some others if they are indicated to
| be available by CPUID. TurboFan does global scheduling in
| moving out of the sea of nodes, but that is not machine-
| specific. There was an experimental local instruction
| scheduler for TurboFan but it never really helped big
| cores, while measurements showed it would have helped
| smaller cores. It didn't actually calculate latencies; it
| just used a greedy heuristic. I am not sure if it was
| ever turned on. TurboFan doesn't do software pipelining
| or unroll/jam, though it does loop peeling, which isn't
| CPU-specific.
| astrange wrote:
| > gcc and clang at least have options so you can optimize
| for specific CPUs. I'm not sure how good they are
|
| They are not very good at it, and can't be. You can look
| inside them and see the models are pretty simple; the
| best you can do is optimize for the first step (decoder)
| of the CPU and avoid instructions called out in the
| optimization manual as being especially slow. But on an
| OoO CPU there's not much else you can do ahead of time,
| since branches and memory accesses are unpredictable and
| much slower than in-CPU resource stalls.
| duped wrote:
| Like another commenter said, JIT compilers do this today.
|
| The thing that makes this mostly theoretical is that the
| underlying assumption is only true if you neglect that AOT
| compilation has zero run-time cost, while a JIT compiler
| has to execute the code it's optimizing _and_ the code that
| decides whether it's worth optimizing and generates the new
| code.
|
| So JIT compiler optimizations are a bit different from AOT
| optimizations, since they have to both generate
| faster/smaller code _and_ execute the code that performs
| the optimization. The problem is that most optimizations
| beyond peephole are quite expensive.
|
| There's another thing that AOT compilers don't need to deal
| with, which is being wrong. Production JITs have to
| implement dynamic de-optimization in the case that an
| optimization was built on a bad assumption.
|
| That's why JITs are only faster in theory (today): there
| are performance pitfalls in the JIT itself.
| titzer wrote:
| Nearly all JS engines are doing concurrent JIT
| compilation now, so some of the compilation cost is moved
| off the main thread. Java JITs have had multiple compiler
| threads for more than a decade.
| saagarjha wrote:
| The well-funded production JIT compilers (HotSpot, V8,
| etc.) absolutely do take advantage of these. The vector
| ISA can sometimes be unwieldy to work with, but things like
| replacing atomics, using unaligned loads, or taking
| advantage of differing pointer representations are common.
| com2kid wrote:
| They do some auto-vectorization, but AFAIK they don't do
| micro-optimizations for different CPUs.
| rowanG077 wrote:
| A JIT can definitely make optimizations that a static
| compiler can't. Simply by virtue of it having concrete
| dynamic real-time information.
| ketralnis wrote:
| It means that in this case, the static compiler emitted code
| that could be further optimised, that's all. It doesn't mean
| that that's always the case, or that static compilers
| _can't_ produce optimal code, or that either technique is
| "better" than the other.
|
| An easy example is code compiled for 386 running on a 586.
| The A->A compiler can use CPU features that weren't available
| to the 386. As with PGO you have branch prediction
| information that's not available to the static compiler. You
| can statically compile the dynamically linked dependencies,
| allowing inlining that wasn't previously available.
|
| On the other hand you have to do all of that. That takes
| warmup time just like a JIT.
|
| I think the road to enlightenment is letting go of phrasing
| like "is superior". There are lots of upsides and downsides
| to pretty much every technique.
| sergimas15 wrote:
| nice
| hawflakes wrote:
| People have mentioned the Dynamo project from HP. But I think
| you're actually thinking of the Aries project (I worked in a
| directly adjacent project) that allowed you to run PA-RISC
| binaries on IA-64.
|
| https://nixdoc.net/man-pages/HP-UX/man5/Aries.5.html
| dynjo wrote:
| It is quite astonishing how seamless Apple has managed to make
| the Intel to ARM transition, there are some seriously smart minds
| behind Rosetta. I honestly don't think I had a single software
| issue during the transition!
| wombat-man wrote:
| There's an annoying Dwarf Fortress bug, but other than
| that, same
| xxpor wrote:
| They've almost made it too good. I have to run software that
| ships an x86 version of CPython, and it just deeply offends me
| on a personal level, even though I can't actually detect any
| slowdown (probably because lol python in the first place)
| ChuckNorris89 wrote:
| If that blows your mind, you should see how Microsoft
| emulated the PowerPC-based Xenon chip on x86 so you can play
| Xbox 360 games on the Xbox One.
|
| There's an old PDF from Microsoft researchers with the
| details but I can't seem to find it right now.
| RedShift1 wrote:
| Any good videos on that?
| poulpy123 wrote:
| Having total control over the hardware and the software
| didn't hurt, for sure.
| manv1 wrote:
| Qualcomm (and Broadcom) have total control over the hardware
| and software side of a lot of stuff, and their stuff is shit.
|
| It's not about control, it's about good engineering.
| stevefan1999 wrote:
| It's about both control and engineering in Apple's case.
| porcc wrote:
| So many parts across the stack need to work for this to go
| well. Early support for popular software is a good example.
| This goes from partnerships all the way down to hardware
| designers.
|
| I'd argue it's less about engineering than about good
| organizational structure.
| iamstupidsimple wrote:
| And having execs who design the organizational structure
| around those goals is part of what makes good engineering
| :)
| zeusk wrote:
| That's really not the case: if you're in Microsoft's or
| Linux's position, you can't really change the OS
| architecture or driver models for any particular vendor.
|
| That generality and general knowledge separation between
| different stacks leaves quite a lot of efficiency on the
| table.
| esskay wrote:
| It has been extremely smooth sailing. I moved my own Mac
| over about a year ago, swapping a beefed-up MBP for a
| budget-friendly M1 Air (which has knocked it out of the
| park performance-wise, far better than I was expecting).
| Didn't have a single issue.
|
| My work mac was upgraded to a MBP M1 Pro and again, very
| smooth. I had one minor issue with a docker container not being
| happy (it was an x86 instance) but one minor tweak to the
| docker compose file and I was done.
|
| It does still amaze me how good these new machines are. It's
| almost enough to redeem Apple for the total pile of
| overheating, underperforming crap that came directly before
| the transition (aka any Mac with a Touch Bar).
| js2 wrote:
| I have a single counter-example. Mailplane, a Gmail SSB. It's
| Intel including its JS engine, making the Gmail UI too sluggish
| to use.
|
| I've fallen back to using Fluid, an ancient and also Intel-
| specific SSB, but its web content runs in a separate WebKit ARM
| process so it's plenty fast.
|
| I've emailed the Mailplane author but they won't release a
| Universal version of the app since they've EOL'd Mailplane.
|
| I have yet to find a Gmail SSB that I'm happy with under ARM.
| Fluid is a barely workable solution.
| cmg wrote:
| For what it's worth, I use Mailplane on an M1 MacBook Air
| (8GB) with 2 Gmail tabs and a calendar tab without noticeable
| issues.
|
| Unfortunately the developers weren't able to get Google to
| work with them on a policy change that impacted the app [0]
| [1] and so gave up and have moved on to a new and completely
| different customer support service.
|
| [0] https://developers.googleblog.com/2020/08/guidance-for-
| our-e... [1] https://mailplaneapp.com/blog/entry/mailplane_st
| opped_sellin...
| perardi wrote:
| I think the end of support for 32-bit applications in 2019
| helped, slightly, with the run-up.
|
| Assuming you weren't already shipping 64-bit
| applications...which would be weird...updating the application
| probably required getting everything into a contemporary
| version of Xcode, cleaning out the cruft, and getting it
| compiling nice and cleanly. After that, the ARM transition was
| kind of a "it just works" scenario.
|
| Now, I'm sure Adobe and other high-performance application
| developers had to do some architecture-specific tweaks, but,
| gotta think Apple clued them in ahead of time as to what was
| coming.
| chrchang523 wrote:
| I finally started seriously using a M1 work laptop yesterday,
| and I'm impressed. More than twice as fast on a compute-
| intensive job as my personal 2015 MBP, with a binary compiled
| for x86 and with hand-coded SIMD instructions.
| robohoe wrote:
| Are you me lol? I'm on my third day on M1 Pro. Battery life
| is nuts. I can be on video calls and still do dev work
| without worrying about charging. And the thing runs cool!
| dexterdog wrote:
| It helps that there were almost 2 years between the release
| and your adoption. I had a very early M1 and it was not too
| bad, but there were issues. I knew that going in.
| EricE wrote:
| I had an M1 Air early on and I didn't run into any issues.
| Even the issues with apps like Homebrew were resolved
| within 3-4 months of the M1 debut. It's amazing just how
| seamless such a major architectural transition was and
| continues to be!
| radicaldreamer wrote:
| Since this is the company's third big arch transition, cross-
| compilation and compatibility is probably considered a core
| competency for Apple to maintain internally.
| mixmastamyk wrote:
| And NeXT was multi-platform as well.
| AnIdiotOnTheNet wrote:
| It isn't their first rodeo: 68k->PPC->x86_64->ARM.
| darzu wrote:
| You gotta think there's been a lot of churn and lost
| knowledge at the company between PPC->x86_64 (2006) and now
| though.
| esskay wrote:
| Rosetta 1 and the PPC -> x86 move wasn't anywhere near as
| smooth, I recall countless problems with that switch. Rosetta
| 2 is a totally different experience, and so much better in
| every way.
| kevincox wrote:
| But they've been on x86_64 for a _long_ time. How much of
| that knowledge is still around? Probably some traces of it
| have been institutionalized, but it isn't the same as if
| they just grabbed the same team and made them do it again a
| year after the last transition.
| toast0 wrote:
| nitpick, they did PPC -> x86 (32), the x86_64 bit transition
| was later (no translation layer though). They actually had
| 64-bit PPC systems on the G5 when they switched to Intel
| 32-bit, but Rosetta only does 32-bit PPC -> 32-bit x86; it
| would have been rare to have released 64-bit PPC only
| software.
| EricE wrote:
| They had a 64-bit Carbon translation layer, but spiked it
| to force Adobe and some other large publishers to go native
| on Intel. There was a furious uproar at the time, but it
| turned out to be the right decision.
| rgiacobazzi wrote:
| Great article!
___________________________________________________________________
(page generated 2022-11-09 23:00 UTC)