[HN Gopher] Tracking developer build times to decide if the M3 M...
___________________________________________________________________
Tracking developer build times to decide if the M3 MacBook is worth
upgrading
Author : paprikati
Score : 165 points
Date   : 2023-12-28 19:41 UTC (1 day ago)
(HTM) web link (incident.io)
(TXT) w3m dump (incident.io)
| lawrjone wrote:
| Author here, thanks for posting!
|
| Lots of stuff in this from profiling Go compilations, building a
| hot-reloader, using AI to analyse the build dataset, etc.
|
| We concluded that it was worth upgrading the M1s to an M3 Pro
| (the max didn't make much of a difference in our tests) but the
| M2s are pretty close to the M3s, so not (for us) worth upgrading.
|
| Happy to answer any questions if people have them.
| BlueToth wrote:
| Hi, thanks for the interesting comparison. What I would like
| to see added is a build on an 8GB memory machine (if you have
| one available).
| aranke wrote:
| Hi,
|
| Thanks for the detailed analysis. I'm wondering if you factored
| in the cost of engineering time invested in this analysis, and
| how that affects the payback time (if at all).
|
| Thanks!
| LanzVonL wrote:
| We've found that distributed building has pretty much eliminated
| the need to upgrade developer workstations. Super easy to set
| up, too.
| packetlost wrote:
| Distributed building of _what_? Because for every language the
| answer of whether it's easy or not is probably different.
| LanzVonL wrote:
| We don't use new-fangled meme languages so everything is very
| well supported.
| lawrjone wrote:
| I'm not sure this would work well for our use case.
|
| The distributed build systems only really benefit from
| aggressively caching the modules that are built, right? But the
| majority of our builds are already almost fully cached: we
| change just one module that needs recompiling, then the linker
| sticks everything back together. The machines would then need
| to download the result from the distributed builder, and at
| 300MB a binary that's gonna take a while.
|
| I may have this totally wrong though. Would distributed builds
| actually get us a new binary faster to the local machine?
|
| I suspect we wouldn't want this anyway (lots of our company
| work on the go, train WiFi wouldn't cut it for this!) but
| interested nonetheless.
| dist-epoch wrote:
| > The distributed build systems only really benefit from
| aggressively caching the modules that are built, right
|
| Not really: you have more cores to build on. That's a
| significant difference for slow-to-compile languages like C++.
|
| > I may have this totally wrong though. Would distributed
| builds actually get us a new binary faster to the local
| machine?
|
| Yes, again, for C++.
| closeparen wrote:
| A MacBook-equivalent AWS instance prices out to at least the
| cost of a MacBook per year.
| lawrjone wrote:
| Yes I actually did the maths on this.
|
| If you want a GCP instance that is comparable to an M3 Pro
| 36GB, you're looking at an n2-standard-8 with a 1TB SSD,
| which comes out at $400/month.
|
| Assuming you run it just 8 hours a day (if your developers
| clock in at exact times), you can cut that to a third, making
| it ~$133/month, or $1,600/year.
|
| We expect these MacBooks to have at least a 2-year life,
| which means you're comparing the cost of the MacBook to 2
| years of running the VM for 8 hours a day: $2,800 vs $3,200,
| so the MacBook still comes in $400 cheaper over its
| lifetime.
|
| And the kicker is you still need to buy people laptops so
| they can connect to the build machine, and you can no longer
| work if you have a bad internet connection. So for us the
| trade-off doesn't work whichever way you cut it.
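The arithmetic above can be sketched in a few lines. This is a minimal calculation using only the figures quoted in the comment ($400/month for the VM, a ~$2,800 MacBook, a 2-year life), not live prices:

```python
# Sketch of the cost comparison above; prices are the figures quoted
# in the comment (GCP n2-standard-8 + 1TB SSD), not live quotes.
vm_monthly_24_7 = 400                  # $/month if the VM runs 24/7
vm_monthly_8h = vm_monthly_24_7 / 3    # ~8 hours/day is roughly a third
vm_two_years = vm_monthly_8h * 12 * 2  # assumed 2-year laptop lifetime

macbook = 2800                         # assumed M3 Pro 36GB price

print(f"VM, 2 years at 8h/day: ${vm_two_years:,.0f}")  # ~$3,200
print(f"MacBook:               ${macbook:,.0f}")
print(f"Difference:            ${vm_two_years - macbook:,.0f}")
```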
| throwaway892238 wrote:
| 1. With a savings plan or on-demand?
| 2. Keeping one instance on per developer indefinitely, or
|    only when needed?
| 3. Shared nodes? Node pools?
| 4. Compared to what other instance types/sizes?
| 5. Spot pricing?
|
| Shared nodes brought up on-demand with a savings plan and
| spot pricing is the same cost if not cheaper than dedicated
| high-end laptops. And on top of that, they can actually
| scale their resources much higher than a laptop can, and do
| distributed compute/test/etc, and match production. And
| with a remote dev environment, you can easily fix issues
| with onboarding where different people end up with
| different setups, miss steps, need their tooling re-
| installed or to match versions, etc.
| lawrjone wrote:
| 1. That was assuming 8 hours of regular usage a day that
| has GCP's sustained use discounts applied, though not the
| committed usage discounts you can negotiate (but this is
| hard if you don't want 24/7 usage).
|
| 2. The issue with only-when-needed is the cold-start time
| starts hurting you in ways we're trying to pay to avoid
| (we want <30s feedback loops if possible) as would
| putting several developers on the same machine.
|
| 3. Shared as in cloud multi-tenant? Sure, we wouldn't be
| buying the exclusive rack for this.
|
| 4. n2-standard-8 felt comparable.
|
| 5. Not considered.
|
| If it's interesting, we run a build machine for when
| developers push their code into a PR and we build a
| binary/container as a deployable artifact. We have one
| machine running a c3-highcpu-22 which is 22 CPUs and 44GB
| memory.
|
| Even at the lower frequency of pushes to master the build
| latency spikes a lot on this machine when developers push
| separate builds simultaneously, so I'd expect we'd need a
| fair bit more capacity in a distributed build system to
| make the local builds (probably 5-10x as frequent) behave
| nicely.
| mgaunard wrote:
| Anything cloud is 3 to 10 times the price of just buying
| equivalent hardware.
| boricj wrote:
| At one of my former jobs, some members of our dev team (myself
| included) had manager-spec laptops. They were just good enough
| to develop and run the product on, but fairly anemic overall.
|
| While I had no power over changing the laptops, I was co-
| administrator of the dev datacenter located 20 meters away and
| we had our own budget for it. Long story short, that dev
| datacenter soon had a new, very beefy server dedicated for CI
| jobs "and extras".
|
| One of said extras was providing Docker containers to the team
| for running the product during development, which also happened
| to be perfectly suitable for remote development.
| vessenes wrote:
| The upshot of my experience running local LLMs on my Macs:
| the M3 Pro is slightly better than the M2 and significantly
| better than the M1 Pro. Currently the M3 memory bandwidth
| options are lower than the M2's, and that may be hampering
| total performance.
|
| Performance per watt and rendering performance are both better in
| the M3, but I ultimately decided to wait for an M3 Ultra with
| more memory bandwidth before upgrading my daily driver M1 Max.
| lawrjone wrote:
| This is pretty much aligned with our findings (am the author of
| this post).
|
| I came away feeling that:
|
| - M1 is a solid baseline
|
| - M2 improves performance by about 60%
|
| - M3 Pro is a marginal improvement over the M2, more like 10%
|
| - M3 Max (for our use case) didn't seem much different from
| the M3 Pro, though we had less data on this than other models
|
| I suspect Apple saw the M3 Pro as "maintain performance and
| improve efficiency" which is consistent with the reduction in
| P-cores from the M2.
|
| The bit I'm interested in is that you say the M3 Pro is only
| a bit better than the M2 at LLM work, as I'd assumed there were
| improvements in the AI processing hardware between the M2 and
| M3. Not that we tested it, but I would've guessed so.
| vessenes wrote:
| Yeah, agreed. I'll say I do use the M3 Max for Baldur's gate
| :).
|
| On LLMs, the issue is largely memory bandwidth: the M2 Ultra
| is 800GB/s, the M3 Max 400GB/s. Inference on larger models is
| simple math on what's in memory, so the performance is
| roughly double. Perf/watt probably suffers a little, but
| when you're trying to chew through 128GB of RAM and do math
| on all of it, you're generally maxing your thermal budget.
|
| Also, note that it's absolutely incredible how cheap it is to
| run a model on an M2 Ultra vs an H100 -- Apple's integrated
| system memory makes a lot possible at much lower price
| points.
| lawrjone wrote:
| Ahh right, I'd seen a few comments about the memory
| bandwidth when it was posted on LinkedIn, specifically that
| the M2 was much more powerful.
|
| This makes a load of sense, thanks for explaining.
| Aurornis wrote:
| > - M2 improves performance by about 60%
|
| This is the most shocking part of the article for me since
| the difference between M1 and M2 build times has been more
| marginal in my experience.
|
| Are you sure the people with M1 and M2 machines were really
| doing similar work (and builds)? Is there a possibility that
| the non-random assignment of laptops (employees received M1,
| M2, or M3 based on when they were hired) is showing up in the
| results as different cohorts aren't working on identical
| problems?
| lawrjone wrote:
| The build events track the files that were changed that
| triggered the build, along with a load of other stats such
| as free memory, whether docker was running, etc.
|
| I took a selection of builds that were triggered by the
| same code module (one that frequently changes to provide
| enough data) and compared models on just that, finding the
| same results.
|
| This feels as close as you could get for an apples-to-
| apples comparison, so I'm quite confident these figures are
| (within statistical bounds of the dataset) correct!
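That selection can be sketched in pandas. The column names and numbers below are hypothetical stand-ins, not the post's actual telemetry schema:

```python
import pandas as pd

# Hypothetical build events; "module" is the code module whose change
# triggered the build, "duration_s" the wall-clock build time.
builds = pd.DataFrame({
    "chip":       ["M1", "M1", "M2", "M2", "M3 Pro", "M3 Pro"],
    "module":     ["app/core", "app/core", "app/core",
                   "app/core", "app/core", "pkg/api"],
    "duration_s": [42.0, 45.5, 26.0, 28.0, 24.0, 25.5],
})

# Hold the triggering module constant, then compare chip generations
# on just that slice -- the apples-to-apples cut described above.
same_module = builds[builds["module"] == "app/core"]
print(same_module.groupby("chip")["duration_s"].agg(["count", "mean"]))
```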
| sokoloff wrote:
| > apples-to-apples comparison
|
| No pun intended. :)
| Erratic6576 wrote:
| Importing a couple thousand RAW pictures into a Capture One
| library would take 2 h on my 2017 iMac.
|
| 5 min on my M3 MacBook Pro.
|
| Geekbench score differences were quite remarkable.
|
| I am still wondering if I should return it, though
| lawrjone wrote:
| Go on, I'll bite: why?
| ac2u wrote:
| They'd miss the 2 hours of procrastination time. It's a
| version of "code's compiling" :)
| Erratic6576 wrote:
| Ha ha ha. You can leave it overnight, and importing files is
| a one-time process, so there's not much to gain
| teaearlgraycold wrote:
| The foam swords are collecting dust.
| Erratic6576 wrote:
| 2,356 EUR is way over my budget. The machine is amazing but
| the specs are stingy. Returning it and getting a cheaper one
| would give me a lot of disposable money to spend in
| restaurants
| tomaskafka wrote:
| Get a 10-core M1 Pro then - I got mine for about 1200 eur
| used (basically indistinguishable from new), and the
| difference (except GPU) is very small.
| https://news.ycombinator.com/item?id=38810228
| kingTug wrote:
| Does anyone have any anecdotal evidence around the snappiness
| of VSCode with Apple Silicon? I very begrudgingly switched over
| from SublimeText this year (after using it as my daily driver
| for ~10yrs). I have a beefy 2018 MBP but VSCode just drags.
| This is the only thing pushing me to upgrade my machine right
| now, but I'd be bummed if there's still not a significant
| improvement with an M3 Pro.
| lawrjone wrote:
| If you're using an Intel Mac at this point, you should 100%
| upgrade. The performance of the MX chips blows away the Intel
| chips, and there's almost no friction with the ARM
| architecture at this point.
|
| I don't use VSCode but most of my team do and I frequently pair
| with them. Never noticed it to be anything other than very
| snappy. They all have M1s or up (I am the author of this post,
| so the detail about their hardware is in the link).
| hsbauauvhabzb wrote:
| There can be plenty of friction depending on your use case.
| whalesalad wrote:
| I have two Intel MacBook Pros that are honestly paperweights.
| Apple Silicon is infinitely faster.
|
| It's a bummer because one of them is a fully loaded 2018 and
| I would have a hard time even selling it to someone because
| of how much better the M2/M3 is. It's wild when I see people
| building hackintoshes on something like a ThinkPad T480 ...
| it's like riding a penny-farthing bicycle versus a Ducati.
|
| My M2 Air is my favorite laptop of all time. Keyboard is
| finally back to being epic (esp compared to 2018 era, which I
| had to replace myself and that was NOT fun). It has no fan so
| it never makes noise. I rarely plug it in for AC power. I can
| hack almost all day on it (using remote SSH vscode to my beefy
| workstation) without plugging in. The other night I worked for
| 4 hours straight refactoring a ton of vue components and it
| went from 100% battery to 91% battery.
| ghaff wrote:
| That assumes you only use one laptop. I have a couple 2015
| Macs that are very useful for browser tasks. They're not
| paperweights and I use them daily.
| whalesalad wrote:
| I have a rack in my basement with a combined 96 cores and
| 192gb of ram (proxmox cluster), and a 13900k/64gb desktop
| workstation for most dev work. I usually offload workloads
| to those before leveraging one of these old laptops, which
| usually have dead batteries. If I need something
| for "browser tasks" (I am interpreting this as cross-
| browser testing?) I have dedicated VMs for that. For just
| browsing the web, my M2 is still king as it has zero fan,
| makes no noise, and will last for days without charging if
| you are just browsing the web or writing documentation.
|
| I would rather have a ton of beefy compute that is remotely
| accessible and one single lightweight super portable
| laptop, personally.
|
| I should probably donate these mac laptops to someone who
| is less fortunate. I would love to do that, actually.
| xp84 wrote:
| > should donate
|
| Indeed. I keep around a 2015 MBP with 16GB (asked my old
| job's IT if I could just keep it when I left since it had
| already been replaced and wouldn't ever be redeployed) to
| supplement my Mac Mini which is my personal main
| computer. I sometimes use screen sharing, but mostly when
| I use the 2015 it's just a web browsing task. With
| adblocking enabled, it's 100% up to the task even with a
| bunch of tabs.
|
| Given that probably 80% of people use webapps for
| nearly everything, there's a huge amount of life left in
| a late-stage Intel Mac for people who will never engage
| in the types of tasks I used to find sluggish on my 2015
| (very large Excel sheet calculations and various kinds of
| frontend code transpilation). Heck, even that stuff ran
| amazingly better on my 16" 2019 Intel MBP, so for web
| browsing your old Macs will be amazing for someone in
| need, assuming they don't have bad keyboards.
| fragmede wrote:
| Your 5 year old computer is, well, 5 years old. It was once
| beefy but that's technology for you.
| orenlindsey wrote:
| VSCode works perfectly.
| baq wrote:
| I've got a 12700k desktop with Windows and an M1 MacBook (not
| Pro!) and my pandas notebooks run _noticeably_ faster on the
| Mac unless I'm able to max out all cores on the Intel chip
| (this is after, ahem, _fixing_ the idiotic scheduler, which
| would put the background Python on E-cores.)
|
| I couldn't believe it.
|
| Absolutely get an apple silicon machine, no contest the best
| hardware on the market right now.
| kimixa wrote:
| The 2018 macbook pros weren't even using the best silicon of
| the time - they were in the middle of Intel's "14nm skylake
| again" period, and an AMD GPU from 2016.
|
| I suspect one of the reasons why Apple silicon looks _so_ good
| is the previous generations were at a dip of performance. Maybe
| they took the foot off the gas WRT updates as they _knew_ the M
| series of chips was coming soon?
| doublepg23 wrote:
| My theory is Apple bought Intel's timeline as much as anyone
| and Intel just didn't deliver.
| eyelidlessness wrote:
| On my 2019 MBP, I found VSCode performance poor enough to be
| annoying on a regular basis, enough so that I would frequently
| defer restarting it or my machine to avoid the lengthy
| interruption. Doing basically anything significant would have
| the fans running full blast pretty much constantly.
|
| On my M2 Max, all of that is ~fully resolved. There is still
| some slight lag, and I have to figure it's just the Electron
| tax, but never enough to really bother me, certainly not enough
| to defer restarting anything. And I can count the times I've
| even heard the fans on one hand... and even so, never for more
| than a few seconds (though each time has been a little
| alarming, just because it's now so rare).
| aragonite wrote:
| It depends on what specifically you find slow about VSCode. In
| my experience, some aspects of VSCode feel less responsive than
| Sublime simply due to intentional design choices. For example,
| VSCode's goto-file and project symbol search are definitely not
| as snappy as Sublime's. But this difference is due to VSCode's
| choice to use debouncing (search is triggered after typing has
| stopped) as opposed to throttling (which restricts function
| execution to a set time interval).
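The debounce-vs-throttle distinction above can be sketched with plain event timestamps. This is a simplified model of the two policies, not VSCode's actual implementation:

```python
def debounce_fires(events, quiet):
    """Debounce: each event resets a timer; fire only after `quiet`
    seconds pass with no further events (i.e. wait for typing to
    stop before running the search)."""
    fires = []
    for i, t in enumerate(events):
        nxt = events[i + 1] if i + 1 < len(events) else None
        if nxt is None or nxt - t >= quiet:
            fires.append(round(t + quiet, 3))
    return fires

def throttle_fires(events, interval):
    """Throttle: fire on an event at most once per `interval` seconds,
    even while events are still arriving."""
    fires, last = [], None
    for t in events:
        if last is None or t - last >= interval:
            fires.append(t)
            last = t
    return fires

keystrokes = [0.00, 0.05, 0.10, 0.40, 0.45]  # seconds since first keypress
print(debounce_fires(keystrokes, 0.2))   # fires only once typing pauses
print(throttle_fires(keystrokes, 0.2))   # fires at a steady cadence
```

With the same burst of keystrokes, debouncing runs the search twice (after each pause), while throttling runs it on the first keystroke and again once the interval has elapsed.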
| tmpfile wrote:
| If you find your compiles are slow, I found a bug in vscode
| where builds would compile significantly faster when the status
| bar and panel are hidden. Compiles that took 20s would take 4s
| with those panels hidden.
|
| https://github.com/microsoft/vscode/issues/160118
| mattgreenrocks wrote:
| VSCode is noticeably laggy on my 2019 MBP 16in to the point
| that I dislike using it. Discrete GPU helps, but it still feels
| dog slow.
| throwaway892238 wrote:
| MacBooks are a waste of money. You can be just as productive with
| a machine just as fast for 1/2 the price that doesn't include the
| Apple Tax.
|
| Moreover, if your whole stack (plus your test suite) doesn't fit
| in memory, what's the point of buying an extremely expensive
| laptop? Not to mention constantly replacing them just because a
| newer, shinier model is released? If you're just going to test
| one small service, that shouldn't require the fastest MacBook.
|
| To test an entire product suite - especially one that has high
| demands on CPU and RAM, and a large test suite - it's much more
| efficient and cost effective to have a small set of remote
| servers to run everything on. It's also great for keeping dev and
| prod in parity.
|
| Businesses buy MacBooks not because they're necessary, but
| because developers just want shiny toys. They're status symbols.
| cedws wrote:
| It's OK to just not like Apple. You don't have to justify your
| own feelings with pejoratives towards other peoples' choice of
| laptop.
| boringuser2 wrote:
| You really need to learn what a "pejorative" is before using
| the term publicly.
| swader999 wrote:
| My main metrics are 1) does the fan turn on, and 2) does it
| respond faster than I can think and move? Can't be any happier
| with the M2 at top-end specs. It's an amazing silent beast.
| LispSporks22 wrote:
| I wish I needed a fast computer. It's the CI/CD that's killing
| me. All this cloud stuff we use - can't test anything locally
| anymore. Can't use the debugger. I'm back to glorified
| fmt.Printf statements that hopefully have enough context that
| the 40 min build/deploy time was worth it. At least it's
| differential ¯\_(ツ)_/¯ All I can say is "it compiles... I
| think?" The unit tests are mostly worthless, and the setup for
| sending something to a lambda feels like JCL boilerplate
| masturbation from that z/OS course I took out of curiosity last
| year. I'm only typing this out because I just restarted CI/CD
| to redeploy what I already pushed, because even that's janky.
| Huh, it's an M3 they gave me.
| lawrjone wrote:
| Yeah everything you just said is exactly why we care so much
| about a great local environment. I've not seen remote tools
| approach the speed/ease/flexibility you can get from a fast
| local machine yet, and it makes a huge difference when
| developing.
| LispSporks22 wrote:
| In the back of my mind I'm worried that our competitors have
| a faster software development cycle.
| orenlindsey wrote:
| This is pretty cool; I also love how you can use AI to read
| the data. This would have taken minutes if not hours even just
| a year ago.
| lawrjone wrote:
| Yeah, I thought it was really cool! (am author)
|
| It's pretty cool how it works, too: the OpenAI Assistant uses
| the LLM to take your human instructions like "how many builds
| are in the dataset?" and translate them into Python code which
| is run in a sandbox on OpenAI compute with access to the
| dataset you've uploaded.
|
| Under the hood everything is just numpy, pandas and gnuplot,
| you're just using a human interface to a Python interpreter.
|
| We've been building an AI feature into our product recently
| that behaves like this and it's crazy how good it can get. I've
| done a lot of data analysis in my past and using these tools
| blew me away, it's so much easier to jump into complex analysis
| without tedious setup.
|
| And a tip I figured out halfway through: if you want to, you
| can ask the chat for an iPython notebook of its calculations.
| So you can 'disable autopilot' and jump into manual if you ever
| want finer control over the analysis it runs. Pretty wild.
| guax wrote:
| I was also surprised about using it for this kind of work. I
| don't have access to Copilot and GPT-4 at work, but my first
| instinct is to ask: did you double-check its numbers?
|
| Knowing how it works now, it makes more sense that it would
| make fewer mistakes, but I'm still skeptical :P
| tomaskafka wrote:
| My personal research for iOS development, taking the cost into
| consideration, concluded:
|
| - M2 Pro is nice, but the improvement over 10 core (8 perf cores)
| M1 Pro is not that large (136 vs 120 s in Xcode benchmark:
| https://github.com/devMEremenko/XcodeBenchmark)
|
| - M3 Pro is nerfed (only 6 perf cores) to better distinguish and
| sell M3 Max, basically on par with M2 Pro
|
| So, in the end, I got a slightly used 10-core M1 Pro and am
| very happy, having spent less than half of what the base M3
| Pro would cost for 85% of its power (and also considering that
| a CPU generally needs to be at least 33-50% faster for you to
| even notice the difference :)).
| geniium wrote:
| Basically the Pareto principle in choosing the right CPU vs cost
| mgrandl wrote:
| The M3 Pro being nerfed has been parroted on the Internet since
| the announcement. Practically it's a great choice. It's much
| more efficient than the M2 Pro at slightly better performance.
| That's what I am looking for in a laptop. I don't really have a
| usecase for the memory bandwidth...
| tomaskafka wrote:
| Everyone has different needs - for me, even the M1 Pro has
| more battery life than I use or need, so further efficiency
| differences bring little value.
| dgdosen wrote:
| I picked up an M3Pro/11/14/36GB/1TB to 'test' over the long
| holiday return period to see if I need an M3 Max. For my
| workflow (similar to blog post) - I don't! I'm very happy
| with this machine.
|
| Die shots show the CPU cores take up so little space compared
| to GPUs on both the Pro and Max... I wonder why.
| wlesieutre wrote:
| I don't really have a usecase for even more battery life, so
| I'd rather have it run faster
| lawrjone wrote:
| It's interesting that you saw less of an improvement in the
| M2 than we saw in this article.
|
| I guess not that surprising given the different compilation
| toolchains though, especially as even with the Go toolchain you
| can see how specific specs lend themselves to different parts
| of the build process (such as the additional memory helping
| linker performance).
|
| You're not the only one to comment that the M3 is weirdly
| capped for performance. Hopefully not something they'll
| continue into the M4+ models.
| tomaskafka wrote:
| That's what Xcode benchmarks seem to say.
|
| Yep, there appears to be no reason for getting M3 Pro instead
| of M2 Pro, but my guess is that after this (unfortunate)
| adjustment, they got the separation they wanted (a clear
| hierarchy of Max > Pro > base chip for both CPU and GPU
| power), and can then improve all three chips by a similar
| amount in the future generations.
| Reason077 wrote:
| > _"Yep, there appears to be no reason for getting M3 Pro
| instead of M2 Pro"_
|
| There is if you care about efficiency / battery life.
| Aurornis wrote:
| My experience was similar: in real-world compile times, the
| M1 Pro still hangs quite close to the current laptop M2 and M3
| models. Nothing as significant as the differences in this
| article.
|
| It could depend on the language or project, but in head-to-
| head benchmarks of identical compile commands I didn't see any
| differences this big.
| jim180 wrote:
| I love my M1 MacBook Air for iOS development. The one thing
| I'd like to have from the Pro line is the screen, specifically
| the PPI. While 120Hz is a nice thing to have, it won't happen
| on Air laptops.
| ramijames wrote:
| I also made this calculation recently and ended up getting an
| M1 Pro with maxed out memory and disk. It was a solid deal and
| it is an amazing computer.
| aschla wrote:
| Side note, I like the casual technical writing style used here,
| with the main points summarized along the way. Easily digestible
| and I can go back and get the details in the main text at any
| point if I want.
| lawrjone wrote:
| Thank you, really appreciate this!
| isthisreallife9 wrote:
| Is this what software development is like in late 2023?
|
| Communicating in emojis as much as words? Speaking to an LLM to
| do basic data aggregation because you don't know how to do it
| yourself?
|
| If you don't know how to munge data and produce bar charts
| yourself, then it's just a small step to getting rid of you
| and letting the LLM do everything!
| lawrjone wrote:
| Fwiw I've spent my whole career doing data analysis, but the
| ease with which I was able to use OpenAI to help me with this
| post (am author) blew me away.
|
| The fact that I can do this type of analysis is why I
| appreciate it so much. It's one of the reasons I'm convinced
| AI engineering will find its way into the average software
| engineer's remit (https://blog.lawrencejones.dev/2023/#ai),
| because it makes this analysis far more accessible than it
| was before.
|
| I still don't think it'll make devs redundant, though. Things
| the model can't help you with (yet, I guess):
|
| - Providing it with clean data => I had to figure out what data
| to collect, write software to collect it, ship it to a data
| warehouse, clean it, then upload it into the model.
|
| - Knowing what you want to achieve => it can help suggest
| questions to ask, but people who don't know what they want will
| still struggle to get results even from a very helpful
| assistant.
|
| These tools are great though, and one of the main reasons I
| wrote this article was to convince other developers to start
| experimenting with them like this.
| gray_-_wolf wrote:
| > it makes this analysis far more accessible than it was
| before
|
| How does the average engineer verify that the result is
| correct? You claim (and I believe you) to be able to do this
| "by hand", if required. Great, but that likely means you are
| able to catch when the LLM makes a mistake. Any ideas on how
| an average engineer, without much experience in this area,
| should validate the results?
| lawrjone wrote:
| I mentioned this in a separate comment but it may be worth
| bearing in mind how the AI pipeline works, in that you're
| not pushing all this data into an LLM and asking it to
| produce graphs, which would be prone to some terrible
| errors.
|
| Instead, you're using the LLM to generate Python code that
| runs using normal libraries like Pandas and gnuplot. When
| it makes errors it's usually generating totally the wrong
| graphs rather than inaccurate data, and you can quickly ask
| it "how many X Y Z" and use that to spot check the graphs
| before you proceed.
|
| My initial version of this began in a spreadsheet so it's
| not like you need sophisticated analysis to check this
| stuff. Hope that explains it!
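The spot-check workflow described above amounts to verifying a few cheap aggregates by hand before trusting any generated chart. A sketch with hypothetical data:

```python
import pandas as pd

# Hypothetical slice of the build dataset -- not the post's real data.
df = pd.DataFrame({
    "chip":       ["M1", "M2", "M2", "M3 Pro"],
    "duration_s": [44.0, 27.0, 29.0, 25.0],
})

# Ask the assistant "how many builds per chip?" and "what's the median
# build time?", then compare its answers against these one-liners:
print(df["chip"].value_counts())
print(df["duration_s"].median())
```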
| PaulHoule wrote:
| The medium is the message here, the macbook is just bait.
|
| The pure LLM is not effective on tabular data (witness the
| many transcripts of ChatGPT apologizing for getting a
| calculation wrong). To be working as well as it seems to, they
| must be loading results into something like a pandas data
| frame and having the agent write and run programs on that
| data frame, tap into stats and charting libraries, etc.
|
| I'd trust it more if they showed more of the steps.
| lawrjone wrote:
| Author here!
|
| We're using the new OpenAI assistants with the code
| interpreter feature, which allows you to ask questions of the
| model and have OpenAI turn those into python code that they
| run on their infra and pipe the output back into the model
| chat.
|
| It's really impressive and removes the need for you to ask it
| for code and then run it locally. This is what powers many of
| the data analysis product features that are appearing recently
| (we're building one ourselves for our incident data and it
| works pretty great!)
| gumballindie wrote:
| You need to be a little bit more gentle and understanding. A
| lot of folks have no idea there are alternatives to apple's
| products that are faster, of higher quality, and upgradeable.
| Many seem to be blown away by stuff that has been available
| with other brands for a while - fast RAM speeds being one of
| them. A few years back, when I broke free from Apple, I was
| shocked at how fast and reliable other products were. Not to
| mention the size of my RAM is larger than the entry-level
| storage option on Apple's laptops.
| Aurornis wrote:
| This is a great write-up and I love all the different ways they
| collected and analyzed data.
|
| That said, it would have been much easier and more accurate to
| simply put each laptop side by side and run some timed
| compilations on the exact same scenarios: A full build,
| incremental build of a recent change set, incremental build
| impacting a module that must be rebuilt, and a couple more
| scenarios.
|
| Or write a script that steps through the last 100 git commits,
| applies them incrementally, and does a timed incremental build to
| get a representation of incremental build times for actual code.
| It could be done in a day.
|
| Collecting company-wide stats leaves the door open to significant
| biases. The first that comes to mind is that newer employees will
| have M3 laptops while the oldest employees will be on M1 laptops.
| While not a strict ordering, newer employees (with their new M3
| laptops) are more likely to be working on smaller changes while
| the more tenured employees might be deeper in the code or working
| in more complicated areas, doing things that require longer build
| times.
|
| This is just one example of how the sampling isn't truly as
| random and representative as it may seem.
|
| So cool analysis, and fun to see the ways they've used various
| tools to analyze the data, but due to inherent biases in the
| sample set (older employees have older laptops, notably) I
| think anyone looking to answer these questions should start
| with the simpler method of benchmarking recent commits on each
| laptop before they spend a lot of time architecting company-
| wide data collection.
| lawrjone wrote:
| I totally agree with your suggestion, and we (I am the author
| of this post) did spot-check the performance for a few common
| tasks first.
|
| We ended up collecting all this data partly to compare machine-
| to-machine, but also because we want historical data on
| developer build times and a continual measure of how the builds
| are performing so we can catch regressions. We quite frequently
| tweak the architecture of our codebase to make builds more
| performant when we see the build times go up.
|
| Glad you enjoyed the post, though!
| pjot wrote:
| > newer employees will have M3 laptops while the oldest
| employees will be on M1 laptops
|
| While I read this from my work intel...
| dash2 wrote:
| As a scientist, I'm interested how computer programmers work with
| data.
|
| * They drew beautiful graphs!
|
| * They used chatgpt to automate their analysis super-fast!
|
| * ChatGPT punched out a reasonably sensible t test!
|
| But:
|
| * They had variation across memory and chip type, but they never
| thought of using a linear regression.
|
| * They drew histograms, which are hard to compare. They could
| have supplemented them with simple means and error bars. (Or used
| cumulative distribution functions, where you can see if they
| overlap or one is shifted.)
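| For what it's worth, the means-with-error-bars and ECDF ideas are
| cheap to do even without a plotting library. A stdlib-only Python
| sketch (the build-time samples below are made up for illustration):

```python
from statistics import mean, stdev

def ecdf(samples):
    """Empirical CDF: (x, fraction of samples <= x) pairs. Two ECDFs
    drawn on the same axes make a shift between machines obvious."""
    xs = sorted(samples)
    n = len(xs)
    return [(x, (i + 1) / n) for i, x in enumerate(xs)]

def summarize(label, samples):
    """Mean plus a rough standard error -- easier to compare at a
    glance than two histograms."""
    se = stdev(samples) / len(samples) ** 0.5
    return f"{label}: {mean(samples):.1f}s +/- {se:.1f}s"

# Illustrative (made-up) build times in seconds.
m1_builds = [22, 25, 31, 40, 95, 18, 27]
m3_builds = [15, 17, 21, 28, 60, 12, 19]

print(summarize("M1", m1_builds))
print(summarize("M3", m3_builds))
```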
| mnming wrote:
| I think it's partly because the audience is often not
| familiar with those statistical details either.
|
| Most people hate nuance when reading a data report.
| jxcl wrote:
| Yeah, I was looking at the histograms too, having trouble
| comparing them and thinking they were a strange choice for
| showing differences.
| Herring wrote:
| >They drew histograms, which are hard to compare.
|
| Note that in some places they used boxplots, which offer
| clearer comparisons. It would have been more effective to
| present all the data using boxplots.
| vaxman wrote:
| 1. If, and only if, you are doing ML or multimedia, get a 128GB
| system, and because of the cost of that RAM it would be foolish
| not to go with the M3 Max SoC (notwithstanding the 192GB M2 Ultra
| SoC).
| Full Stop. (Note: This is also a good option for people with more
| money than brains.)
|
| 2. If you are doing traditional heavyweight software development,
| or are concerned with perception in an interview, promotional
| context or just impressing others at a coffee shop, get a 32GB
| 16" MBP system with as large a built-in SSD as you can afford (it
| gets cheaper per GB as you buy more) and go for an M2 Pro SoC,
| which is faster in many respects than an M3 Pro due to core count
| and memory bandwidth. Full Stop. (You could instead go 64GB on an
| M1 Max if you keep several VMs open, which isn't really a thing
| anymore (use VPS), or if you are keeping a 7-15B parameter LLM
| open (locally) for some reason, but again, if you are doing much
| with local LLMs, as opposed to being always connectable to the
| 1.3T+ parameter hosted ChatGPT, then you should have stopped at
| #1.)
|
| 3. If you are nursing mature apps along, maybe even adding ML,
| adjusting UX, creating forks to test new features, etc.. then
| your concern is with INCREMENTAL COMPILATION, and the much bigger
| systems like the M3 Max will be slower (because they need time to
| ramp up multiple cores, which doesn't happen with bursty
| incremental builds), so you might as well go for a 16GB M1 MBA
| (add stickers or
| whatever if you're ashamed of looking like a school kid) and
| maybe invest the savings in a nice monitor like the 28" LG DualUp
| (bearing in mind you can only use a single native-speed external
| monitor on non-Pro/Max SoCs at a time). You can even get by with
| the 8GB M1 MBA because the MacOS memory compressor is really good
| and the SSD is really fast. Do you want an M2 MBA? No: it has
| inferior thermals, is heavier and larger, fingerprints easily,
| lacks respect, and its price/performance doesn't make sense given
| the other options. Same goes for the 13" M1/M2 Pro and all M3 Pro.
|
| Also, make sure you keep hourly (or better) backups on all Apple
| laptops. There is a common failure scenario where the buck
| converter that drops voltage for the SSD fails, sending 13VDC
| into the SSD for long enough to permanently destroy the data on
| it. https://youtu.be/F6d58HIe01A
| whatshisface wrote:
| Good to know I have commercial options for overcoming my laptop
| shame at interviews. /s
| fsckboy wrote:
| I feel like there is a correlation between fast-twitch
| programming muscles and technical debt. Some coding styles that
| are rewarded by fast compile times can be more akin to "throw it
| at the wall, see if it sticks" style development. Have you ever
| been summoned to help a junior colleague who is having a problem,
| and you immediately see some grievous errors, errors that give
| you pause. You point the first couple out, and the young buck is
| ready to send you away and confidently forge ahead, with no sense
| of "those errors hint that this thing is really broken".
|
| But we were all young once; I remember thinking the only thing
| holding me back was 4.77MHz.
| wtetzner wrote:
| There's a lot of value in a short iteration loop when debugging
| unexpected behavior. Often you end up needing to keep trying
| different variations until you understand what's going on.
| lawrjone wrote:
| Yeah there's a large body of research that shows faster
| feedback cycles help developers be more productive.
|
| There's nothing that says you can't have fast feedback loops
| _and_ think carefully about your code and next debugging
| loop, but you frequently need to run and observe code to
| understand the next step.
|
| In those cases even the best programmer can't overcome a much
| slower build.
| LASR wrote:
| Solid analysis.
|
| A word of warning from personal experience:
|
| I am part of a medium-sized software company (2k employees). A
| few years ago, we wanted to improve dev productivity. Instead of
| going with new laptops, we decided to explore offloading the dev
| stack over to AWS boxes.
|
| This turned out to be a multi-year project with a whole team of
| devs (~4) working on it full-time.
|
| In hindsight, the tradeoff wasn't worth it. It's still way too
| difficult to replicate a fully-local dev experience with one
| that's running in the cloud.
|
| So yeah, upgrade your laptops instead.
| jiggawatts wrote:
| https://xkcd.com/1205/
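| For anyone who wants to actually run the xkcd 1205 numbers, a
| back-of-envelope payback sketch (every figure below is a
| placeholder; plug in your own):

```python
def payback_days(laptop_cost, builds_per_day, seconds_saved_per_build,
                 dev_cost_per_hour):
    """Working days until the faster laptop pays for itself, valuing
    saved build time at the developer's loaded hourly cost."""
    dollars_saved_per_day = (builds_per_day * seconds_saved_per_build
                             / 3600 * dev_cost_per_hour)
    return laptop_cost / dollars_saved_per_day

# e.g. a $2,500 laptop, 40 builds/day, 20s faster per build, $100/h:
print(f"{payback_days(2500, 40, 20, 100):.0f} working days to break even")
```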
| mdbauman wrote:
| This xkcd seems relevant also: https://xkcd.com/303/
|
| One thing that jumps out at me is the assumption that compile
| time implies wasted time. The linked Martin Fowler article
| provides justification for this, saying that longer feedback
| loops provide an opportunity to get distracted or leave a
| flow state while ex. checking email or getting coffee. The
| thing is, you don't have to go work on a completely unrelated
| task. The code is still in front of you and you can still be
| thinking about it, realizing there's yet another corner case
| you need to write a test for. Maybe you're not getting
| instant gratification, but surely a 2-minute compile time
| doesn't imply 2 whole minutes of wasted time.
| chiefalchemist wrote:
| Spot on. The mind often needs time and space to breathe,
| especially after it's been focused and bearing down on
| something. We're humans, not machines. Creativity (i.e.,
| problem solving) needs to be nurtured. It can't be force
| fed.
|
| More time working doesn't translate into being more effective
| and more productive. If that were the case, why do a
| disproportionate percentage of my "Oh shit! I know what to
| do to solve that..." moments happen in the shower, on my
| morning run, etc.?
| WaxProlix wrote:
| I suspect things like GitHub's Codespaces offering will be more
| and more popular as time goes on for this kind of thing. Did
| you guys try out some of the AWS Cloud9 or other 'canned' dev
| env offerings?
| hmottestad wrote:
| My experience with GitHub Codespaces is mostly limited to
| when I forgot my laptop and had to work from my iPad. It was
| a horrible experience, mostly because Codespaces didn't
| support touch or Safari very well and I also couldn't use
| IntelliJ which I'm more familiar with.
|
| Can't really say anything for performance, but I don't think
| it'll beat my laptop unless Maven can magically take decent
| advantage of 32 cores (which I unfortunately know it can't).
| boringuser2 wrote:
| I get a bit of a toxic vibe from a couple comments in that
| article.
|
| Chiefly, I think the problem is that the CTO solved the wrong
| problem. The right problem is some combination of assessing why
| company opinion is generating a mass movement of people wanting a
| new MacBook literally every year, whether this is even worth
| responding to at all (it isn't), and keeping employees happy.
|
| Most employees are reasonable enough to not be bothered if they
| don't get a new MacBook every year.
|
| Employers should already be addressing outdated equipment
| concerns.
|
| Wasting developer time on a problem that is easily solvable in
| one minute isn't worthwhile. You upgrade the people 2-3 real
| generations behind. That should already have been in the
| pipeline, resources notwithstanding.
|
| I just dislike this whole exercise because it feels like a
| perfect storm of technocratic performativity, short sighted
| "metric" based management, rash consumerism, etc.
| BlueToth wrote:
| It's really worth the money if it keeps employees happy!
| Besides, the conclusion was to upgrade the M1s to M3s, not to
| upgrade every year.
| lawrjone wrote:
| Sorry you read it like this!
|
| If it's useful: Pete wasn't really being combative with me on
| this. I suggested we should check if the M3 really was faster
| so we could upgrade if it was, we agreed and then I did the
| analysis. The game aspect of this was more for a bit of fun in
| the article than how things actually work.
|
| And in terms of why we didn't have a process for this: the
| company itself is about two years old, so this was the first
| hardware refresh we'd ever needed to schedule. So we don't have a
| formal process in place yet, and probably won't until the next
| one either!
| joshspankit wrote:
| Since RAM was a major metric, there should have been more focus
| on IO Wait to catch cases where OSX was being hindered by
| swapping to disk. (Yes, the drives are fast but you don't know
| until you measure)
| cced wrote:
| This. I've routinely got a 10-15GB page file on an M2 Pro and
| need to justify bumping the memory up a notch or two. I'm
| consistently in the yellow on memory pressure, and in the red
| while building.
|
| How can I tell how much I would benefit from a memory bump?
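| One rough way to answer that on macOS is to watch swap while you
| build: `sysctl vm.swapusage` prints a line like `vm.swapusage:
| total = 4096.00M  used = 1216.75M  free = 2879.25M  (encrypted)`.
| Sustained non-trivial "used" during builds suggests more RAM
| would help. A small Python sketch that assumes that output
| format:

```python
import re
import subprocess

def parse_swapusage(line):
    """Pull the total/used/free megabyte figures out of a
    `sysctl vm.swapusage` output line."""
    fields = dict(re.findall(r"(\w+) = ([\d.]+)M", line))
    return {name: float(mb) for name, mb in fields.items()}

def swap_used_mb():
    """Current swap usage in MB (macOS only)."""
    out = subprocess.run(["sysctl", "vm.swapusage"], check=True,
                         capture_output=True, text=True).stdout
    return parse_swapusage(out)["used"]
```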
| mixmastamyk wrote:
| A lot of the graphs near the end comparing side-to-side had
| different scales on the Y axis. Take results with a grain of
| salt.
|
| https://incident.io/_next/image?url=https%3A%2F%2Fcdn.sanity...
| lawrjone wrote:
| They're normalised histograms so the y axis is deliberately
| adjusted so you can compare the shape of the distribution, as
| the absolute number of builds in each bucket means little when
| there are a different count of builds for each platform.
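| ("Normalised" here just means each bucket shows the fraction of
| that platform's builds rather than a raw count. A stdlib-only
| Python sketch of the idea, for the curious:)

```python
def normalized_hist(samples, edges):
    """Fraction of samples per bucket, so datasets of different
    sizes can share a y axis. edges are sorted bucket boundaries."""
    counts = [0] * (len(edges) - 1)
    for s in samples:
        for i in range(len(counts)):
            if edges[i] <= s < edges[i + 1]:
                counts[i] += 1
                break
    n = len(samples)
    return [c / n for c in counts]

# Four samples, three buckets: fractions sum to 1 regardless of n.
print(normalized_hist([1, 2, 3, 4], [0, 2, 4, 6]))  # [0.25, 0.5, 0.25]
```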
| hk1337 wrote:
| I wonder why they didn't include Linux since the project they're
| building is Go? Most CI tools, I believe, are going to be Linux.
| Sure, you can explicitly select macOS in Github CI but Linux
| seems like it would be the better generic option?
|
| *EDIT* I guess if you needed a macOS-specific build with Go you
| would use macOS, but I would have thought you'd use Linux
| otherwise. Can you build a Go project on Linux and have it run on
| macOS? I suppose architecture would be an issue: a build on Linux
| x86 would not run on macOS Apple Silicon, and the reverse is true
| too; a build on Apple Silicon would not work on Linux x86, maybe
| not even Linux ARM.
| xp84 wrote:
| I know nothing about Go, but if it's like other platforms,
| builds intended for production or staging environments are
| indeed nearly always for x86_64, but those are done somewhere
| besides laptops, as part of the CI process. The builds done on
| the laptops are to run each developer's local instance of their
| server-side application and its front-end components. That
| instance is always being updated to whatever is in progress at
| the time. Then they check that code in, and eventually it gets
| built for prod on an Intel Linux system elsewhere.
| SSLy wrote:
| > Application error: a client-side exception has occurred (see
| the browser console for more information).
|
| When I open the page.
| rendaw wrote:
| > People with the M1 laptops are frequently waiting almost 2m for
| their builds to complete.
|
| I don't see this at all... the peak for all 3 is right under
| 20s. The long tail (i.e. infrequently) goes up to 2m, but for all
| 3. M2 looks slightly better than M1, but it's not clear to me
| there's an improvement from M2 to M3 at all from this data.
___________________________________________________________________
(page generated 2023-12-29 23:00 UTC)