[HN Gopher] Ask HN: Machine learning engineers, what do you do a...
___________________________________________________________________
Ask HN: Machine learning engineers, what do you do at work?
I'm curious about the day-to-day of a Machine Learning engineer. If
you work in this field, could you share what your typical tasks and
projects look like? What are you working on?
Author : Gooblebrai
Score : 283 points
Date : 2024-06-07 17:26 UTC (1 day ago)
| tambourineman88 wrote:
| The opposite of what you'd think when studying machine
| learning...
|
| 95% of the job is data cleaning, joining datasets together and
| feature engineering. 5% is fitting and testing models.
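A minimal sketch of what that 95% looks like in practice (dataset and column names invented for illustration):

```python
# Joining datasets, cleaning missing values, and deriving features --
# the bulk of the work before any model is ever fit.
users = [
    {"user_id": 1, "signup_year": 2019},
    {"user_id": 2, "signup_year": None},  # missing value to clean
]
orders = [
    {"user_id": 1, "amount": 30.0},
    {"user_id": 1, "amount": 12.5},
    {"user_id": 2, "amount": 99.0},
]

# Join: aggregate total order amount per user
totals = {}
for order in orders:
    totals[order["user_id"]] = totals.get(order["user_id"], 0.0) + order["amount"]

# Clean + feature engineering: impute the missing signup year,
# then derive model-ready features
features = []
for user in users:
    year = user["signup_year"] if user["signup_year"] is not None else 2020
    features.append({
        "user_id": user["user_id"],
        "account_age": 2024 - year,
        "total_spend": totals.get(user["user_id"], 0.0),
    })

print(features)
```

Only after several rounds of this does the 5% of fitting and testing models begin.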
| toephu2 wrote:
| Sounds like a Data Scientist job?
| moandcompany wrote:
| This is a large problem in industry: defining away some of
| the most important parts of a job or role as (should be)
| someone else's.
|
| There is a lot of toil and unnecessary toil in the whole data
| field, but if you define away all of the "yucky" parts, you
| might find that all of those "someone elses" will end up
| eating your lunch.
| hiatus wrote:
| > There is a lot of toil and unnecessary toil in the whole
| data field, but if you define away all of the "yucky"
| parts, you might find that all of those "someone elses"
| will end up eating your lunch.
|
| See: the use of "devops" to encapsulate "everything besides
| feature development"
| tedivm wrote:
| It's not about "yucky" so much as specialization and only
| having a limited time in life to learn everything.
|
| Should your researcher have to manage nvidia drivers and
| infiniband networking? Should your operations engineer need
| to understand the math behind transformers? Does your
| researcher really gain any value from understanding the
| intricacies of docker layer caching?
|
| I've seen what it looks like when a company hires mostly
| researchers and ignores other expertise, versus what
| happens when a company hires diverse talent sets to build a
| cross domain team. The second option works way better.
| AndrewKemendo wrote:
| My answer is yes to both of those
|
| If other peoples work is reliant on yours then you should
| know how their part of the system transforms your inputs
|
| Similarly you should fully understand how all the inputs
| to your part of the system are generated
|
| No matter your coupling pattern, if more than one person
| builds the product, knowing at least one level above and
| below your stack is a baseline expectation
|
| This is true with personnel leadership too: I should be
| able to troubleshoot one level above and below me with some
| level of capacity.
| otteromkram wrote:
| The parent comment had three examples...
| mrbombastic wrote:
| 2/3 is close enough in ML world
| moandcompany wrote:
| > I've seen what it looks like when a company hires
| mostly researchers and ignores other expertise, versus
| what happens when a company hires diverse talent sets to
| build a cross domain team. The second option works way
| better.
|
| I've seen these too, and you aren't wrong. Division into
| specializations can work "way better" (i.e. the overall
| potential is higher), but in practice the differentiating
| factors that matter will come down to organizational and
| ultimately human-factors. The anecdotal cases I draw my
| observations from come from organizations operating at the
| scale of 1-10 people as well as 1,000s, working in this
| field.
|
| > Should your researcher have to manage nvidia drivers and
| infiniband networking? Should your operations engineer
| need to understand the math behind transformers? Does
| your researcher really gain any value from understanding
| the intricacies of docker layer caching?
|
| To realize the higher potential mentioned above, what they
| need to do is appreciate the value of those things, and of
| the people who do them, beyond "these are the people who do
| the things I don't want to do or don't want to understand."
| That appreciation usually comes from having done and
| understood that work.
|
| When specializations are used, they tend to also manifest
| as organizational structures and dynamics which are
| ultimately composed of humans. Conway's Law is worth
| mentioning here because the interfaces between these
| specializations become the bottleneck of your system in
| realizing that "higher potential."
|
| As another commenter mentions, the effectiveness of these
| interfaces, corresponding bottlenecking effects, and
| ultimately the entire people-driven system is very much
| driven by how the parties on each side understand each
| other's work/methods/priorities/needs/constraints/etc,
| and having an appreciation for how they affect (i.e.
| complement) each other and the larger system.
| auntienomen wrote:
| A good DS can double as an MLE.
| disgruntledphd2 wrote:
| And sometimes, a good MLE can double as a DS.
|
| Personally I think we calcified the roles around data a
| little too soon but that's probably because there was such
| demand and the space is wide.
| RSZC wrote:
| Used to do this job once upon a time - can't overstate the
| importance of just being knee-deep in the data all day long.
|
| If you outsource that to somebody else, you'll miss out on
| all the pattern-matching eureka moments, and will never know
| the answers to questions you never think to ask.
| huygens6363 wrote:
| "Scientist"? Is this like Software Engineer?
| staunton wrote:
| I guess it means "someone who has or is about to have a
| PhD".
| maxlamb wrote:
| Sounds like a data engineer job to me
| jamil7 wrote:
| My partner is a data engineer, from what I've gathered the
| departments are often very small or one person so the roles
| end up blending together a lot.
| dblohm7 wrote:
| As somebody whose machine learning expertise consists of the
| first cohort of Andrew Ng's MOOC back in 2011, I'm not too
| surprised. One of the big takeaways I took from that experience
| was the importance of getting the features right.
| geoduck14 wrote:
| >was the importance of getting the features right.
|
| Yeah, but also _knowing_ which features to get right. Right?
| Animats wrote:
| I remember that class. Someone from Blackrock taught it at
| Hacker Dojo. The good old days of support vector machines and
| Matlab.
| ismailmaj wrote:
| This was very important with classical machine learning;
| now, with deep learning, feature engineering became useless,
| as the model can learn the relevant features by itself.
|
| However, having a quality and diverse dataset is more
| important now than ever.
| Salgat wrote:
| That depends on the type of data, and regardless, your goal
| is to minimize the input data, since it has a direct impact
| on performance overhead and the duration of inference.
| beckhamc wrote:
| no we just replaced feature engineering with architectural
| engineering
| AndrewKemendo wrote:
| As it was in the beginning and now and ever shall be amen
|
| At the staff/principal level it's all about maintaining "data
| impedance" between the product features that rely on inference
| models and the data capture
|
| This is to ensure that as the product or features change it
| doesn't break the instrumentation and data granularity that
| feed your data stores and training corpus
|
| For RL problems, however, it's about making sure you have
| the right variables captured for the state and action space
| tuple, and then finding how to adjust the interfaces or
| environment models for reward feedback
| llama_person wrote:
| Same here, it's tons of work to collect, clean, validate data,
| followed by a tiny fun portion where you train models, then you
| do the whole loop over again.
| gopher_space wrote:
| > it's tons of work to collect, clean, validate data
|
| That's my fun part. The discovery process is a joy especially
| if it means ingesting a whole new domain and meeting people.
| whiplash451 wrote:
| In a sense, the data _is_ the model (inductive bias) so
| splitting << data work >> and << model work >> like you do is
| arbitrary.
| hirako2000 wrote:
| The number of responses may be self-explanatory.
|
| Not my main work, but spending a lot of time gluing things
| together. Tweaking existing open source. Figuring out how to
| optimize resources, retraining models on different data sets.
| Trying to run poorly put together python code. Adding missing
| requirements files. Cleaning up data. Wondering what could in
| fact really be useful to solve with ML that hasn't been done
| years ago already. Browsing the prices of the newest GPUs and
| calculating whether that would be worth it to get one rather than
| renting overpriced hours off hosting providers. Reading
| papers until my head hurts - and that's just one at a time:
| it hurts by the time I finish the abstract and have glanced
| over a few diagrams in the middle.
| ZenMikey wrote:
| Where do you locate/how do you select papers?
| davedx wrote:
| pip install pytorch
|
| Environment broken
|
| Spend 4 hours fixing python environment
|
| pip install Pillow
|
| Something something incorrect cpu architecture for your Macbook
|
| Spend another 4 hours reinstalling everything from scratch after
| nuking every single mention of python
|
| pip install ... oh time to go home!
| makapuf wrote:
| Maybe pip should not work by default (but python -m venv _then_
| pip install should)
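A minimal sketch of that venv-first workflow (the package name is just an example, and the install line is commented out since it needs network access):

```shell
# create a project-local virtual environment instead of touching
# the system Python
python3 -m venv .venv

# install by invoking the environment's own interpreter, which
# sidesteps "which pip is on my PATH?" entirely
# .venv/bin/python -m pip install pillow

# confirm the interpreter in use is the project-local one
.venv/bin/python -c "import sys; print(sys.prefix)"
```

When the environment breaks, `rm -rf .venv` and recreating it is far cheaper than untangling a system-wide install.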
| avmich wrote:
| Legends say there were times when you'd have a program.c file
| and just run cc program.c, and then could just execute the
| compiled result. Funny that programmer's job is highly
| automatable, yet we invent ourselves tons of intermediate
| layers which we absolutely have to deal with manually.
| EnergyAmy wrote:
| And then you'd have to deal with wrong glibc versions or
| mysterious segfaults or undefined behavior or the code
| assuming the wrong arch or ...
| KeplerBoy wrote:
| python solves none of those issues. It just adds a myriad
| of ways those problems can get to you.
|
| All of a sudden you have people with C problems, who have
| no idea they're even using compiled dependencies.
| EnergyAmy wrote:
| In theory you're right, CPython is written in C and it
| could segfault or display undefined behavior. In
| practice, you're quite wrong.
|
| It's not really much of a counterargument to say that
| Python is good enough that you don't have to care what's
| under the hood, except when it breaks because C sucks so
| badly.
| KeplerBoy wrote:
| I was specifically talking about python packages using C.
| You type "pip install" and god knows what's going to
| happen. It might pull a precompiled wheel, it might just
| compile and link some C or Fortran code, it might need
| external dependencies. It might install flawlessly and
| crash as soon as you try to run it. All bets are off.
|
| I never experienced CPython itself segfault, it's always
| due to some package.
| davedx wrote:
| I actually did a small C project a couple of years ago, the
| spartan simplicity there can have its own pain too, like
| having to maintain a Makefile. LOL. It's swings and
| roundabouts!
| makapuf wrote:
| I agree simplicity is king. But you're comparing making a
| script using dependencies and tooling for those
| dependencies and a C program with no dependencies. You can
| download a simple python script and run it directly if it
| has no dependencies besides stdlib (which is way larger in
| python). That's why I love using bottle.py, for example.
| avmich wrote:
| Agree. But even with dependencies running "make" seems to
| be way simpler than having to install particular version
| of tools for a project, making venv and then picking
| versions of dependencies.
|
| The point is the same - we had it simpler and now, with
| all capabilities for automation, we have it more complex.
|
| Frankly, I suspect most of the efforts now are spent
| fighting non-essential complexities, like
| compatibilities, instead of solving the problem at hand.
| That means we create problems for ourselves faster than
| removing them.
| jononor wrote:
| Some Linux distros are moving that way, particularly for the
| included Python/pip version. My Arch Linux has done so for
| some years now, and I did not set it up myself - so I think
| it is the default.
| sigmoid10 wrote:
| If you're still doing ML locally in 2024 and also use an ARM
| macbook, you're asking for trouble.
| spmurrayzzz wrote:
| Can you expand on this a bit? My recent experiences with MLX
| have been really positive, so I'm curious what footguns
| you're alluding to here.
|
| (I don't do most of my work locally, but for smaller models
| its pretty convenient to work on my mbp).
| sigmoid10 wrote:
| MPS implementations generally lag behind CUDA kernels,
| especially for new and cutting edge stuff. Sure, if you're
| only running CPU inference or only want to use the GPU for
| simple or well established models, then things have gotten
| to the point where you can almost get the plug and play
| experience on Apple silicon. But if you're doing research
| level stuff and training your own models, the hassle is
| just not worth it once you see how convenient ML has become
| in the cloud. Especially since you don't really want to
| store large training datasets locally anyways.
| genevra wrote:
| For real
| nicce wrote:
| > ARM macbook
|
| Funnily, the only real competitor for Nvidia's GPUs is
| Macbooks with 128GB of RAM.
| hu3 wrote:
| And they don't compete in performance.
| hkt wrote:
| I see your contemporary hardware choices and raise you my
| P900 ThinkStation with 256GB of RAM and 48 Xeon cores.
| Eventually it might even acquire modern graphics hardware.
| anArbitraryOne wrote:
| I wish my company would understand this and let us use
| something else. Luckily, they don't really seem to care that
| I use my Linux based gaming machine most of the time
| davedx wrote:
| What can I say, I enjoy pain!?
| blitzar wrote:
| Nahh im l33t - intel macbook and no troubles.
| rwalle wrote:
| can't read? parent clearly says "ARM macbook".
| next_xibalba wrote:
| Do people doing ML/DS not use conda anymore?
| buildbot wrote:
| A lot do, personally, every single time I try to go back to
| conda/mamba whatever, I get some extremely weird C/C++
| related linking bug - just recently, I ran into an issue
| where the environment was _almost_ completely isolated from
| the OS distro's C/C++ build infra, except for LD, which was
| apparently so old it was missing the vpdpbusd instruction
| (https://github.com/google/XNNPACK/issues/6389). Except the
| thing was, that wouldn't happen when building outside of
| the Conda environment. Very confusing. Standard virtualenvs
| are boring but nearly always work as expected in comparison.
|
| I'm an Applied Scientist vs. ML Engineer, if that matters.
| astromaniak wrote:
| It's probably easier to reinstall everything anew from time
| to time. Instead of fixing broken 18.04 just move to 22.04.
| Most tools should work, if you don't have a huge codebase
| which requires an old compiler...
|
| Conda... it interferes with the OS setup and doesn't always
| have the best utils. Like, its ffmpeg is compiled with
| limited options, probably due to licensing.
| buildbot wrote:
| I do all the time, and always have (in fact my first job
| was bare metal OS install automation), this was Rocky
| 9.4. New codebase, new compiler, weird errors. I did
| actually reinstall and switch over to Ubuntu 24.04 after
| that issue lol.
| copperroof wrote:
| If they are they should stop.
|
| It causes so many entirely unnecessary issues. The conda
| developers are directly responsible for maybe a month of my
| wasted debugging time. At my last job one of our questions
| for helping debug client library issues was "are you using
| conda". And if so we just would say we can't help you.
| Luckily it was rare, but if conda was involved it was 100%
| conda's fault somehow, and it was always a stupid decision they
| made that flew in the face of the rest of the python
| packaging community.
|
| Data scientist python issues are often caused by them not
| taking the 1-3 days it takes to fully understand their tool
| chain. It's genuinely quite difficult to fuck up if you take
| the time once to learn how it all works, where your python
| binaries are on your system, etc. Maybe that wasn't the case
| 5 years ago, but today it's pretty simple.
| buildbot wrote:
| Fully agree with this. Understand the basic tools that
| currently exist and you'll be fine. Conda constantly fucks
| things up in weird hard to debug ways...
| blt wrote:
| I can't point at a single reason, but I got sick of it.
|
| The interminable solves were awful. Mamba made it better, but
| can still be slow.
|
| Plenty of more esoteric packages are on PyPI but not Conda.
| (Yes, you can install pip packages in a conda env file.)
|
| Many packages have a default version and a conda-forge
| version; it's not always clear which you should use.
|
| In Github CI, it takes extra time to install.
|
| Upon installation it (by default) wants to mess around with
| your .bashrc and start every shell in a "base" environment.
|
| It's operated by another company instead of the Python
| Software Foundation.
|
| idk, none of these are deal-breakers, but I switched to venv
| and have not considered going back.
| SushiHippie wrote:
| Can recommend using conda, more specifically
| mambaforge/micromamba (no licensing issues when used at work).
|
| This works way better than pip, as it does more dependency
| checking, so it does not break as easily as pip, though this
| definitely makes it way slower when installing
| something. It also supports updating your environment to the
| newest versions of all packages.
|
| It's no silver bullet and mixing it with pip leads to even more
| breakages, but there is pixi [0] which aims to support interop
| between pypi and conda packages
|
| [0] https://prefix.dev/
| tasuki wrote:
| I had a bad experience with Conda:
|
| - If they're so good at dependency management, why is Conda
| installed through a magical shell script?
|
| - It's slow as molasses.
|
| - Choosing between Anaconda/Miniconda...
|
| When forced to use Python, I prefer Poetry, or just pip with
| freezing the dependencies.
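The pip-with-freezing approach, sketched (the restore step is commented out because it needs network access):

```shell
# record the exact versions currently installed in the environment
python3 -m pip freeze > requirements.txt

# later, or on another machine, reproduce the same set of packages:
# python3 -m pip install -r requirements.txt

cat requirements.txt
```

Pinning exact versions this way trades flexibility for reproducibility, which is usually the right trade for ML experiments.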
|
| The Python people probably can't even imagine how great
| dependency management is in all the other languages...
| tedivm wrote:
| I absolutely hate conda. I had to support a bunch of
| researchers who all used it and it was a nightmare.
| akkad33 wrote:
| Mamba/micromamba solves the slowness problem of conda
| epoxia wrote:
| To add: Conda has parallelized downloads now and is faster.
| Not as fast as mamba, but faster than previously. PR merged
| Sep 2022:
| https://github.com/conda/conda/pull/11841
| SushiHippie wrote:
| Yeah, I agree, maybe I should have also mentioned the bad
| things about it, but after trying many different tools
| that's the one that I stuck with, as creating/destroying
| environments is a breeze once you got it working and the
| only time my environment broke was when I used pip in that
| environment.
|
| > The Python people probably can't even imagine how great
| dependency management is in all the other languages...
|
| Yep, I wish I could use another language at work.
|
| > Choosing between Anaconda/Miniconda...
|
| I went straight with mamba/micromamba as anaconda isn't
| open source.
| semi-extrinsic wrote:
| rye and uv, while "experimental", are orders of magnitude
| better than poetry and pip IMHO.
| davedx wrote:
| Yes I started with conda I think and ended up switching to
| venv and I can't even remember why, it's a painful blur now.
| It was almost certainly user error too somewhere along the
| way (probably one of the earlier steps), but recovering from
| it had me seriously considering buying a Linux laptop.
|
| This happened about a week ago
| mysteria wrote:
| I thought most ML engineers use their laptops as dumb terminals
| and just remote into a Linux GPU server.
| daemonologist wrote:
| Yeah, the workday there looks pretty similar though, except
| that installing pytorch and pillow is usually no problem.
| Today it was flash-attn I spent the afternoon on.
| ungamedplayer wrote:
| Isn't this what containers are for? Someone somewhere gets
| it configured right, and then you download and run the
| pre-set-up container and add your job data? Or am I looking
| at the problem wrong?
| rolisz wrote:
| But then how do you test out the latest model that came
| out from who knows where and has the weirdest
| dependencies and a super obscure command to install?
| fshbbdssbbgdd wrote:
| Just email all your data to the author and ask them to
| run it for you.
| davedx wrote:
| Spoiler: my main role isn't ML engineer :) and that doesn't
| sound like a bad idea at all
| jshbmllr wrote:
| I do this... but air-gapped :(
| daemonologist wrote:
| Oof. At our company only CI/CD agents (and laptops) are
| allowed to access the internet, and that's bad enough.
| fragmede wrote:
| that sounds very painful.
| rqtwteye wrote:
| Oh my! This hits home. We have some test scripts written in
| python. Every time I try to run them after a few months I spend
| a day fixing the environment, package dependencies and other
| random stuff. Python is pretty nice once it works, but managing
| the environment can be a pain.
| phaedrus wrote:
| As an amateur game engine developer, I morosely reflect that
| my hobby seems to actually consist of endlessly chasing
| things that were broken by environment updates (OS,
| libraries, compiler, etc.). That is, most of the time I sit
| down to code, I actually spend it nuking and reinstalling
| things that (I thought) were previously working.
|
| Your comment makes me feel a little better that this is not
| merely some personal failing of focus, but happens in a
| professional setting too.
| ClimaxGravely wrote:
| Happens in AAA too but we tend to have teams that shield
| everyone from that before they get to work. I ran a team like
| that for a couple years.
|
| For hobby stuff at home though I don't tend to hit those
| types of issues because my projects are pretty frozen
| dependency-wise. Do you really have OS updates break stuff
| for you often? I'm not sure I recall that happening on a home
| project in quite a while.
| davedx wrote:
| Oh god yes I remember trying to support old Android games
| several OS releases later... Impossible, I gave up!
|
| It's why I still use react, their backcompat is amazing
| SJC_Hacker wrote:
| Docker is your friend
|
| or pyenv at least
| navbaker wrote:
| Yes, I've switched from conda to a combination of dev
| containers and pyenv/pyenv-virtualenv on both my Linux and
| MacBook machines and couldn't be happier
| __rito__ wrote:
| Just use conda.
| SoftTalker wrote:
| Then you get one of my favorites: NVIDIA-<something> has
| failed because it couldn't communicate with the NVIDIA
| driver. Make sure that the latest NVIDIA driver is installed
| and running.
| davedx wrote:
| Iirc I originally used conda because I couldn't get faiss
| to work in venv, lol. That was a while ago though
| wszrfcbhujnikm wrote:
| Docker fixes this!
| el_benhameen wrote:
| > Environment broken
|
| >Something something incorrect cpu architecture for your
| Macbook
|
| I'm glad I have something in common with the smart people
| around here.
| davedx wrote:
| Yeah getting a good python environment setup is a very
| humbling experience
| mc10 wrote:
| uv is a great drop-in replacement for pip:
| https://astral.sh/blog/uv
| vergessenmir wrote:
| Why do ML on a Mac, especially nowadays, when you can do it
| on an Ubuntu-based machine?
|
| Surely work can provide that?
| zxexz wrote:
| Why Ubuntu specifically? Not even being snarky. Calling out a
| specific distro, vs. the operating system itself. I've had
| more pain setting up ML environments with Ubuntu than a
| MacBook, personally - though pure Debian has been the easiest
| to get stable from scratch. Ubuntu usually screws me over one
| way or another after a month or so. I think I've spent a
| cumulative month of my life tracking down things related to
| changes in netplan, cloud-init, etc. Not to mention Ubuntu
| Pro spam being incessant, as official policy of Canonical
| [0]. I first used the distro all the way back at Warty
| Warthog, and it was my daily driver from Feisty until
| ~Xenial. I think it was the Silicon Valley ad in the MotD
| that was the last straw for me.
|
| [0] https://bugs.launchpad.net/ubuntu/+source/ubuntu-
| meta/+bug/1...
| 0x008 wrote:
| I can recommend trying poetry. It is a lot more successful
| at resolving dependencies than pip.
|
| Although I think the UX of poetry is stupid and I do not
| agree with some design decisions, I have not had any
| dependency conflicts since I started using it.
| globular-toast wrote:
| You could learn how to use Python. Just spend one of those 4
| hours actually learning. Imagine just getting into a car and
| pressing controls until something happened. This wouldn't be
| allowed to happen in any other industry.
| sirlunchalot wrote:
| Could you be a bit more specific about what you mean by "You
| could learn how to use python"? What resources would you
| recommend to learn how to work around problems the OP has?
| What basic procedures/resources can you recommend to "learn
| python"? I work as a software developer alongside my studies
| and often face the same problems as OP that I would like to
| avoid. Very grateful for any tips!
| globular-toast wrote:
| Basically just use virtual environments via the venv
| module. The only thing you really need to know is that
| Python doesn't support having multiple versions of a
| package installed in the same environment. That means you
| need to get very familiar with creating (and destroying)
| environments. You don't need to know any of this if you
| just use tools that happen to be written in Python. But if
| you plan to write Python code then you do. It should be in
| Python books really, but they tend to skip over the boring
| stuff.
| mountainriver wrote:
| Oh wow, I've been a Python engineer for over a decade and
| getting dependencies right for machine learning has very
| little to do with Python and everything to do with c++/cuda
| globular-toast wrote:
| I've done it. Isn't it just following instructions? What
| part of that means destroying every mention of Python on
| the system?
| kobalsky wrote:
| I've been programming with python for decades and the problem
| they are describing says more about the disastrous state of
| python's package management and the insane backwards
| compatibility stance python devs have.
|
| Half of the problems I've helped some people solve stem from
| python devs insisting on shuffling std libraries around
| between minor versions.
|
| Some libraries have a compatibility grid with different
| python minor versions, because of how often they break
| things.
| loftyal wrote:
| ...and people criticise node's node_modules. At least you don't
| spend hours doing this
| RamblingCTO wrote:
| But you do because your local node_modules and upstream are
| out of sync and CI is broken. Happens at least once a month
| just before a release of course. I'd rather have my code
| failing locally than trying to debug what's out of sync on
| upstream.
| mountainriver wrote:
| Not nearly as hard of a problem. Python does work just fine
| when it's pure Python. The trouble comes with all the C/Cuda
| dependencies in machine learning
| posix_monad wrote:
| Python's dominance is holding us back. We need a stack with a
| more principled approach to environments and native
| dependencies.
| llm_trw wrote:
| Here's what getting PyTorch built reproducibly looks like:
| https://hpc.guix.info/blog/2021/09/whats-in-a-package/
|
| Since then the whole python ecosystem has gotten worse.
|
| We are building towers on quicksand.
|
| It's not about python, it's about people who don't care about
| dependencies.
| hkt wrote:
| Dependency management is just... hard. It is one of the
| things where everything relies upon it but nobody thinks
| "hey, this is my responsibility to improve" so it is left
| to people who have the most motivation, academic posts, or
| grant funding. This is roughly the same problem that led to
| heartbleed for OpenSSL.
| manusachi wrote:
| Do you know what other ecosystem comes closest to the one
| existing in Python? I've heard good things about Julia.
|
| 13 years ago when I was trying to explore the field R seemed
| to be the most popular, but looks like not anymore. (I didn't
| get into the field, and do just a regular SWE, so I'm not
| aware of the trends).
|
| There is also a lot of development in Elixir ecosystem around
| the subject [1].
|
| [1](https://dashbit.co/blog/elixir-ml-s1-2024-mlir-arrow-
| instruc...)
| ninkendo wrote:
| I count at least a half dozen "just use X" replies to this
| comment, for at least a half dozen values of X, where X is some
| wrapper on top of pip or a replacement for pip or some virtual
| environment or some alternative to a virtual environment etc
| etc etc.
|
| Why is python dependency management so cancerously bad? Why are
| there so many "solutions" to this problem that seem to be out
| of date as soon as they exist?
|
| Are python engineers just bad, or?
|
| (Background: I never used python except for one time when I
| took a coursera ML course and was immediately assaulted with
| conda/miniconda/venv/pip/etc etc and immediately came away with
| a terrible impression of the ecosystem.)
| sjducb wrote:
| Two problems intersect:
|
| - You can't have two versions of the same package in the
| namespace at the same time.
|
| - The Python ecosystem is very bad at backwards
| compatibility.
|
| This means that you might require one package that requires
| foo below version 1.2 and another package that requires foo
| version 2 and above.
|
| There is no good solution to the above problem.
|
| This problem is amplified when lots of the packages were
| written by academics 10 years ago and are no longer
| maintained.
|
| The bad solutions are:
|
| 1) Have 2 venvs - not always possible, and if you keep
| making venvs you'll have loads of them.
|
| 2) Rewrite your code to only use one library.
|
| 3) Update one of the libraries.
|
| 4) Don't care about the mismatch and cross your fingers
| that the old one will work with the newer library.
|
| Most of the tooling follows approach 1 or 4
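The foo example above can be made concrete with a toy resolver check (versions modeled as tuples; the package name and bounds are invented for illustration):

```python
# Package A needs foo < 1.2; package B needs foo >= 2.0. Because only
# one version of foo can live in an environment, resolution must find
# a single version satisfying both bounds -- and there is none.
available = [(1, 0), (1, 1), (1, 2), (2, 0), (2, 1)]

ok_for_a = {v for v in available if v < (1, 2)}    # constraint from A
ok_for_b = {v for v in available if v >= (2, 0)}   # constraint from B

print(ok_for_a & ok_for_b)  # empty set: no version works for both
```

This is why the "bad solutions" above all amount to either splitting the environment or relaxing a constraint.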
| fragmede wrote:
| Disk space is cheap, so where it's possible to have 2 (or
| more) venvs, that seems easiest. The problem with venv is
| that they don't automatically activate. I've been using a
| _very_ simple wrapper around python to automatically
| activate venvs so I can just cd into the directory and do
| _python foo.py_ and have it use the local venv.
|
| I threw it online at https://github.com/fragmede/python-
| wool/
| fbdab103 wrote:
| I think it is worth separating the Python ML ecosystem from
| the rest. While traditional Python environment management has
| many sore points, it is usually not terrible (though there
| are many gotchas still-to-this-day-problems that should have
| been corrected long ago).
|
| The ML ecosystem is a whole other stack of problems. The
| elephant in the room is Nvidia who is not known for playing
| well with others. Aside from that, the state of the art in ML
| is churning rapidly as new improvements are identified.
| globular-toast wrote:
| It's not bad. It works really well. There's always room for
| improvement. That's technology for you. Python probably does
| attract more than its fair share of bad engineers, though.
| llm_trw wrote:
| Just use Linux. Then you only have to fight the nvidia drivers.
| htrp wrote:
| Use standard cloud images
|
| > Something something incorrect cpu architecture for your
| Macbook
| angarg12 wrote:
| My job title is ML Engineer, but my day to day job is almost pure
| software engineering.
|
| I build the systems to support ML systems in production. As
| others have mentioned, this includes mostly data transformation,
| model training, and model serving.
|
| Our job is also to support scientists to do their job, either by
| building tools or modifying existing systems.
|
| However, looking outside, I think my company is an outlier. It
| seems in the industry the expectations for a ML Engineer are more
| aligned to what a data/applied scientist does (e.g. building and
| testing models). That introduces a lot of ambiguity into the
| expectations for each role in each company.
| hnthrowaway0328 wrote:
| That's really the kind of job I'd love. Whatever the data is, I
| don't care. I make sure that the users get the correct data
| quickly.
| tedivm wrote:
| In my experience your company is doing it right, and doing it
| the way that other successful companies do.
|
| I gave a talk at the Open Source Summit on MLOps in April, and
| one of the big points I try to drive home is that it's 80%
| software development and 20% ML.
|
| https://www.youtube.com/watch?v=pyJhQJgO8So
| exegete wrote:
| My company is largely the same. I'm an MLE and partner with
| data scientists. I don't train or validate the models. I
| productionize and instrument the feature engineering pipelines
| and model deployments. More data engineering and MLOps than
| anything. I'm in a highly regulated industry so the data
| scientists have many compliance tasks related to the models and
| we engineers have our own compliance tasks related to the
| deployments. I was an MLE at another company in the very same
| industry before and did everything in the model lifecycle and
| it was just too much.
| trybackprop wrote:
| In a given week, I usually do the following:
|
| * 15% of my time in technical discussion meetings or 1:1's.
| Usually discussing ideas around a model, planning, or ML product
| support
|
| * 40% ML development. In the early phase of the project, I'm
| understanding product requirements. I discuss an ML model or
| algorithm that might be helpful to achieve product/business goals
| with my team. Then I gather existing datasets from analysts and
| data scientists. I use those datasets to create a pipeline that
| results in a training and validation dataset. While I wait for
| the train/validation datasets to populate (could take several
| days or up to two weeks), I'm concurrently working on another
| project that's earlier or further along in its development. I'm
| also working on the new model (written in PyTorch), testing it
| out with small amounts of data to gauge its offline performance,
| to assess whether or not it does what I expect it to do. I sanity
| check it by running some manual tests using the model to populate
| product information. This part is more art than science because
| without a large scale experiment, I can only really go by the gut
| feel of myself and my teammates. Once the train/valid datasets
| have been populated, I train a model on large amounts of data,
| check the offline results, and tune the model or change the
| architecture if something doesn't look right. After offline
| results look decent or good, I then deploy the model to
| production for an experiment. Concurrently, I may be making
| changes to the product/infra code to prepare for the test of the
| new model I've built. I run the experiment and ramp up traffic
| slowly, and once it's at 1-5% allocation, I let it run for weeks
| or a month. Meanwhile, I'm observing the results and have put in
| alerts to monitor all relevant pipelines to ensure that the model
| is being trained appropriately so that my experiment results
| aren't altered by unexpected infra/bug/product factors that
| should be within my control. If the results look as expected and
| match my initial hypothesis, I then discuss with my team whether
| or not we should roll it out and if so, we launch! (Note: model
| development includes feature authoring, dataset preparation,
| analysis, creating the ML model itself, implementing
| product/infra code changes)
|
| * 20% maintenance - Just because I'm developing new models
| doesn't mean I'm ignoring existing ones. I'm checking in on those
| daily to make sure they haven't degraded and resulted in
| unexpected performance in any way. I'm also fixing pipelines and
| making them more efficient.
|
| * 15% research papers and skills - With the world of AI/ML moving
| so fast, I'm continually reading new research papers and testing
| out new technologies at home to keep up to date. It's fun for me
| so I don't mind it. I don't view it as a chore to keep me up-to-
| date.
|
| * 10% internal research - I use this time to learn more about
| other products within the team or the company to see how my team
| can help or what technology/techniques we can borrow from them. I
| also use this time to write down the insights I've gained as I
| look back on my past 6 months/1 year of work.
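The offline loop described above (split the data, train, check offline metrics, only then consider an online experiment) can be sketched in plain Python. The model here is a trivial mean predictor standing in for a real PyTorch model, and the promotion threshold is invented for illustration:

```python
"""Sketch of the offline train/validate/gate flow, with a trivial
mean predictor standing in for a real model. Names and the 5.0
threshold are illustrative assumptions."""
import random


def train_valid_split(rows, valid_frac=0.2, seed=0):
    """Shuffle deterministically, then carve off a validation slice."""
    rows = rows[:]
    random.Random(seed).shuffle(rows)
    cut = int(len(rows) * (1 - valid_frac))
    return rows[:cut], rows[cut:]


def fit_mean_model(train):
    """'Training': memorize the mean label and predict it everywhere."""
    mean = sum(y for _, y in train) / len(train)
    return lambda x: mean


def mae(model, rows):
    """Mean absolute error: the offline metric we gate on."""
    return sum(abs(model(x) - y) for x, y in rows) / len(rows)


# Toy dataset: y is roughly 2*x with deterministic noise.
data = [(x, 2 * x + random.Random(x).uniform(-1, 1)) for x in range(100)]
train, valid = train_valid_split(data)
model = fit_mean_model(train)
offline_error = mae(model, valid)

# Only promote to an online experiment if the offline metric clears
# an (illustrative) bar; otherwise iterate on the model first.
ready_for_experiment = offline_error < 5.0
```

The mean predictor fails the gate here, which is the point: the offline check stops a weak model before any traffic is spent on it.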
| ZenMikey wrote:
| How do you select what papers to read? How often does that
| research become relevant to your job?
| itake wrote:
| not sure if this counts as ML engineering, but I support all the
| infra around the ML models: caching, scaling, queues, decision
| trees, rules engines, etc.
| selimthegrim wrote:
| What do you do with decision trees specifically?
| barrenko wrote:
| MLOps, sure.
| frankPoole wrote:
| Pretty much the same as the others, building tools, data
| cleaning, etc. But something I don't see mentioned: experiment
| design/data collection protocols.
| tenache wrote:
| Although I studied machine learning and was originally hired for
| that role, the company pivoted and is now working with LLMs, so I
| spend most of my day working on figuring out how different LLMs
| work, what parameters work best for them, how to do RAG, how to
| integrate them with other bots.
| KeplerBoy wrote:
| Would you not consider LLMs as a part of machine learning?
| Cyclone_ wrote:
| I'd say deep learning is a subset of machine learning, and
| LLMs are a subset of deep learning.
| chudi wrote:
| Probably because we're not training them anymore, just
| using them with prompts. Seems more like a regular SWE
| type of job.
| aulin wrote:
| except regular swe is way more fun than writing prompts
| layer8 wrote:
| They are the result of machine learning.
| uoaei wrote:
| There is a vanishingly small percentage of people actually
| working on the design and training of LLMs vs all those who
| call themselves "AI engineers" who are just hitting APIs.
| mardifoufs wrote:
| I work on optimizing our inference code, "productizing" our
| trained models and currently I'm working on local training and
| inference since I work in an industry where cloud services just
| aren't very commonly used yet. It's super interesting too since
| it's not LLMs, meaning that there aren't as many pre made tools
| and we have to make tons of stuff by ourselves. That means
| touching anything from assessing data quality (again, the local
| part is the challenge) to using CUDA directly as we already have
| signal processing libs that are built around it and that we can
| leverage.
|
| Sometimes it also involves building internal tooling for our team
| (we are a mixed team of researchers/MLEs), to visualize the data
| and the inferences as again, it's a pretty niche sector and that
| means having to build that ourselves. That allowed me to have a
| lot of impact in my org as we basically have complete freedom
| w.r.t tooling and internal software design, and one of the tools
| that I built basically on a whim is now on its way to being
| shipped in our main products too.
| burnedout_dc4e3 wrote:
| I've been doing machine learning since the mid 2000s. About half
| of my time is spent keeping data pipelines running to get data
| into shape for training and using in models.
|
| The other half is spent doing tech support for the bunch of
| recently hired "AI scientists" who can barely code, and who spend
| their days copy/pasting stuff into various chatbot services.
| Stuff like telling them how to install python packages and use
| git. They have no plan for how their work is going to fit into
| any sort of project we're doing, but assert that transformer
| models will solve all our data handling problems.
|
| I'm considering quitting with nothing new lined up until this
| hype cycle blows over.
| naveen99 wrote:
| You're living the dream. Why quit ?
| bowsamic wrote:
| Is that really your idea of a dream?
| naveen99 wrote:
| My dreams are usually more disturbing, or fun...
|
| But yes. My work is kind of similar... I do some data
| curation / coding, and help 2 engineers who report to me. I
| enjoy it.
| burnedout_dc4e3 wrote:
| I like to feel useful, and like I'm actually contributing to
| things. I probably didn't express it well in my first post,
| but the attitude is very much that my current role is
| obsolete and a relic that's just sticking around until the AI
| can do everything.
|
| It means I'm marginalized in terms of planning. The company
| has long term goals that involve making good use of data.
| Right now, the plan is that "AI" will get us there, with no
| plan B if it doesn't work. When it inevitably fails to live
| up to the hype, we're going to have a bunch of cobbled
| together systems that are expensive to run, rather than
| something that we can keep iterating on.
|
| It means I'm marginalized in terms of getting resources for
| projects. There's a lot of good my team could be doing if we
| had the extra budget for more engineers and computing.
| Instead that budget is being sent off to AI services, and
| expensive engineer time is being spent on tech support for
| people that slapped "LLM" all over their resume.
| bentt wrote:
| I wonder if this is how the OG VR guys felt in 2016.
| Havoc wrote:
| Well Palmer Luckey sold oculus and now makes military gear so
| I guess he chose violence after his VR era
| m_ke wrote:
| I just quit a day ago with nothing lined up for the same
| reason.
| whiplash451 wrote:
| There are companies where applied scientists are required to
| code well. Just ask how they are hired before joining (that
| should be a positive feature).
| burnedout_dc4e3 wrote:
| Yeah, we used to be like that. Then, when this hype cycle
| started ramping up, the company brought in a new exec who got
| rid of that. I brought it up with the CEO, but nothing
| changed, so that's another reason for me to leave.
| giantg2 wrote:
| I've interviewed for a few of the ML positions and turned them
| down because they were just data jockey positions.
| redwood wrote:
| Do people feel like they are more or less in demand with the
| hyper around genai?
| uoaei wrote:
| Demand is higher for flashy things that look good on directors'
| desks, definitely. But there's less attention on less flashy
| applications of machine learning, unless your superiors are so
| clueless that they think what you're doing _is_ GenAI. Which
| sometimes the systems/models being trained are legitimately
| generative, but in the more technical, traditional sense.
| redwood wrote:
| What are the tools people tend to use? Feature platforms like
| Tecton on the list?
| schmookeeg wrote:
| Getting my models dunked on by people who can't open MS Outlook
| more than 3 tries out of 5, however, have a _remarkable_ depth
| and insight into their chosen domain of expertise. It's rather
| humbling.
|
| Collaborating with nontechnical people is oddly my favorite part
| of doing MLE work right now. It wasn't the case when I did basic
| web/db stuff. They see me as a magician. I see them as voodoo
| priests and priestesses. When we get something trained up and
| forecasting that we both like, it's super fulfilling. I think for
| both sides.
|
| Most of my modeling is healthcare related. I tease insights out
| of a monstrous data lake of claims, Rx, doctor notes, vital
| signs, diagnostic imagery, etc. What is also monstrous is how
| accessible this information is. HIPAA my left foot.
|
| Since you seemed to be asking about the temporal realities, it's
| about 3 hours of meetings a week, probably another 3 doing task
| grooming/preparatory stuff, fixing some ETL problem, or doing a
| one-off query for the business, the rest is swimming around in
| the data trying to find a slight edge to forecast something that
| surprised us for a $million or two using our historical
| snapshots. It's like playing Where's Waldo with math. And the
| waldo scene ends up being about 50TB or so in size. :D
| soared wrote:
| 3 hours of meetings a week, that's incredible. Sounds like your
| employer understands and values your time!
| visarga wrote:
| 13 meetings/week, at least one full day of work wasted for me
| schmookeeg wrote:
| They really do. This has been my longest tenure at any
| position _by far_ and Engineer QoL is a massive part of it.
| Our CTO came up through the DBA/Data/Engineering Management
| ranks and the empathy is solidly there.
|
| As we grow, I'm ever watchful for our metamorphosis into a
| big-dumb-company, but no symptoms yet. :)
| saulrh wrote:
| > HIPAA my left foot.
|
| That was my experience as well - training documentation for
| fresh college grads (i.e. me) directed new engineers to just...
| send SQL queries to production to learn. There was a process
| for gaining permissions, there were audit logs, but the only
| sign-off you needed was your manager, permission lasted 12
| months, and the managers just rubber-stamped everyone.
|
| That was ten years ago. Every time I think about it I find
| myself hoping that things have gotten better and knowing they
| haven't.
| bick_nyers wrote:
| ... you didn't have a UAT environment?
| saulrh wrote:
| There were a couple "not prod" environments, but they were
| either replicated directly from prod or so poorly
| maintained that they were unusable (empty tables, wrong
| schemas, off by multiple DB major versions, etc), no middle
| ground. So institutional culture was to just run everything
| against prod (for bare selects that could be copied and
| pasted into the textbox in the prod-access web tool) or a
| prod replica (for anything that needed a db connection).
| The training docs actually did specify Real Production, and
| first-week tasks included gaining Real Production access.
| If I walked in and was handed that training documentation
| today I'd raise hell and/or quit on the spot, but that was
| my first job out of college - it was basically _everyone's_
| first job out of college, they strongly preferred
| hiring new graduates - and I'd just had to give up on my
| PhD so I didn't have the confidence, energy, or pull to do
| anything about it, even bail out.
|
| That was also the company where prod pushes happened once a
| month, over the weekend, and were all hands on deck in case
| of hiccups. It was an extraordinarily strong lesson in how
| not to organize software development.
|
| (edit: if what you're really asking is "did every engineer
| have write access to production", the answer was, I
| believe, that only managers did, and they were at least not
| totally careless with it. Not, like, actually _responsible_,
| no "formal post-mortem for why we had to use break-glass
| access", but it generally only got brought out to unbreak
| prod pushes. Still miserable.)
| roughly wrote:
| There's an old joke that everyone's got a testing
| environment, but some people are lucky enough to have a
| separate production environment.
| jollofricepeas wrote:
| So...
|
| Is it surprising that engineers in healthcare don't read the
| actual HIPAA documentation?
|
| Use of health data is permitted so long as it's for payment,
| treatment or operations. Disclosures and patient consent are
| not required.
|
| There are helpful summaries on the US Department of Health
| and Human Services website of the various rules (Security,
| Privacy & Notification).
|
| Source: https://www.hhs.gov/hipaa/for-
| professionals/privacy/guidance...
|
| This allowance is permitted to covered entities and by
| extension their vendors (business associates) by HIPAA.
|
| If it wasn't then, theoretically the US healthcare industry
| would grind to a halt considering the number of
| intermediaries for a single transaction.
|
| Example:
|
| Doctor writes script -> EHR -> Pharmacy -> Switch ->
| Clearinghouse --> PA Processing -> PBM/Plan makes
| determination
|
| Along this flow there are other possible branches and
| vendors.
|
| It's beyond complex.
| aiforecastthway wrote:
| _> If it wasn't then, theoretically the US healthcare
| industry would grind to a halt considering the number of
| intermediaries for a single transaction._
|
| It just occurred to me that cleaning up our country's data
| privacy / data ownership mess might have extraordinarily
| positive second-order effects on our Kafkaesque and
| criminally expensive healthcare "system".
|
| Maybe making it functionally impossible for there to be
| hundreds of middlemen between me and my doctor would be
| a... good thing?
| outside1234 wrote:
| Define operations, because that sounds like a loophole that
| basically allows you to use it for anything
| connicpu wrote:
| Basically the only thing you can't do with the data is
| disclose it to someone who doesn't also fall under HIPAA
| constantinum wrote:
| > I tease insights out of a monstrous data lake of claims, Rx,
| doctor notes, vital signs
|
| I'm curious to know the tech stack behind converting
| unstructured to structured data(for reporting and analysis)
| dax77 wrote:
| Take a look at AWS Healthlake and AWS Comprehend Medical
| hamasho wrote:
| I worked on a project to analyze endoscope videos to find
| diseases. I examined a lot of images and videos annotated with
| symptoms of various diseases labeled by assistants and doctors.
| Most of them are really obvious, but others are almost
| impossible to detect. In rare cases, despite my best efforts, I
| couldn't see any difference between the spot labeled as a
| symptom of cancer and the surrounding area. There's no a-ha
| moment, like finding an insect mimicking its environment. No
| matter how many times I tried, I just couldn't see any
| difference.
| aswegs8 wrote:
| Mind sharing how to get a foot into the field? I've got a
| good amount of domain knowledge from my studies in life
| science and rather meager experience from learning to code on
| my own for a few years. It seems like I can't compete with CS
| majors and gotta find a way to leverage my domain knowledge.
| nomilk wrote:
| The Dead Internet Theory says _most_ activity on the internet
| is by bots [1]. The Dead Privacy Theory says _approximately
| all_ private data is not private; but rather is accessible on
| whim by any data scientist, SWE, analyst, or db admin with
| access to the database.
|
| [1] https://en.wikipedia.org/wiki/Dead_Internet_theory
| htrp wrote:
| > The Dead Privacy Theory says approximately all private data
| is not private; but rather is accessible on whim by any data
| scientist, SWE, analyst, or db admin with access to the
| database.
|
| I like this so much I'm definitely stealing it!
| bearjaws wrote:
| Damn, I've talked about this many times at my last job
| (startup that went from 100k patients to ~2.5M in 5 years). I
| love the name Dead Privacy Theory
| htrp wrote:
| >Getting my models dunked on by people who can't open MS
| Outlook more than 3 tries out of 5, however, have a remarkable
| depth and insight into their chosen domain of expertise. It's
| rather humbling.
|
| The people who have lasted in those roles have built up a large
| degree of intuition on how their domains work (or they would've
| done something else).
| ProjectArcturis wrote:
| What business are you in that predicting health data can make
| you millions?
| dr_kiszonka wrote:
| Insurance, health benefits.
| conkeisterdoor wrote:
| This sounds almost exactly like my day-to-day as a solo senior
| data engineer -- minus building and training ML models, and I
| don't work in healthcare. My peers are all very non-technical
| business directors who are very knowledgeable about their
| domains, and I'm like a wizard who can conjure up time
| savings/custom reporting/actionable insights for them.
|
| Collaborating with them is great, and has been a great exercise
| in learning how to explain complex ideas to non-technical
| business people. Which has the side effect of helping me get
| better at what I do (because you need a good understanding of a
| topic to be able to explain it both succinctly and accurately
| to others). It has also taught me to appreciate the business
| context and reasoning that can drive decisions about how a
| business uses or develops data/software.
| primaprashant wrote:
| Been working as an MLE for the last 5 years and as another
| comment said most of the work is close to SWE. Depending on the
| stage of the project I'm working on, day-to-day work varies but
| it's along the lines of one of these:
|
| - Collaboration with stakeholders & TPMs and analyzing data to
| develop hypotheses to solve business problems with high priority
|
| - Framing business problems as ML problems and creating suitable
| metrics for ML models and business problems
|
| - Building PoCs and prototypes to validate the technical
| feasibility of the new features and ideas
|
| - Creating design docs for architecture and technical decisions
|
| - Collaborating with the platform teams to set up and maintain
| the data pipelines based on the needs of new and existing ML
| projects
|
| - Building, deploying, and maintaining ML microservices for
| inference
|
| - Writing design docs for running A/B tests and performing post-
| test analyses
|
| - Setting up pipelines for retraining of ML models
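For the post-test analysis step, a common building block is a two-proportion z-test on conversion counts. A stdlib-only sketch (the counts below are made-up illustration data, not from any real experiment):

```python
"""Two-proportion z-test for an A/B conversion comparison, using
only the stdlib. The example counts are invented."""
import math


def two_proportion_z(conv_a, n_a, conv_b, n_b):
    """Return (z, two_sided_p) for H0: equal conversion rates."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal CDF, built on erf.
    p = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return z, p


# Hypothetical test: control converts 10.0%, treatment 12.0%.
z, p = two_proportion_z(1000, 10000, 1200, 10000)
significant = p < 0.05
```

In practice a library routine (e.g. statsmodels' `proportions_ztest`) does the same arithmetic, but the formula is worth knowing when writing the post-test analysis doc.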
| singularity2001 wrote:
| Teaching others python.
| reeboo wrote:
| Underrated comment. At my place of work, I find this to be a
| huge part of the MLE job. Everyone knows R but none of the
| cloud tools have great R support.
| exe34 wrote:
| 90% of the time it's figuring out what data to feed into neural
| networks, 2% of the time it's figuring out stuff about neural
| networks, and the other 8% it's figuring out why on earth the
| recall rate is 100%.
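That last 8% is often a degenerate classifier or label leakage, and a quick sanity check is to look at recall together with precision: a model that predicts positive for everything scores 100% recall. A small self-contained illustration:

```python
"""Why 100% recall alone proves nothing: a classifier that always
predicts positive has perfect recall and poor precision."""


def precision_recall(y_true, y_pred):
    """Compute precision and recall from binary label lists."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


y_true = [1, 0, 0, 0, 1, 0, 0, 0, 0, 0]  # 20% positives
always_positive = [1] * len(y_true)       # degenerate "model"
prec, rec = precision_recall(y_true, always_positive)
# rec == 1.0 looks great; prec == 0.2 gives the game away.
```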
| jackspawn wrote:
| 50%+ of my time is spent on backend engineering because the ML is
| used inside a bigger API.
|
| I take responsibility for the end to end experience of said API,
| so I will do whatever gives the best value per time spent. This
| often has nothing to do with the ML models.
| Xenoamorphous wrote:
| I'm a regular software dev but I've had to do ML stuff by
| necessity.
|
| I wonder how "real" ML people deal with the stochastic/gradient
| results and people's expectations.
|
| If I do ordinary software work the thing either works or it
| doesn't, and if it doesn't I can explain why and hopefully fix
| it.
|
| Now with ML I get asked "why did this text classifier not
| classify this text correctly?" and all I can say is "it was 0.004
| points away from meeting the threshold", and "it didn't meet it
| because of the particular choice of words or even their order"
| which seems to leave everyone dissatisfied.
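The "0.004 points away" situation can be made concrete with a toy bag-of-words scorer: a small change in word choice shifts the score slightly, and a hard threshold turns that into a flipped label. The weights and threshold here are invented purely for illustration:

```python
"""Toy bag-of-words classifier illustrating how a small wording
change can flip a thresholded decision. All weights are invented."""

WEIGHTS = {"refund": 0.40, "angry": 0.35, "cancel": 0.30, "please": -0.05}
THRESHOLD = 0.70


def score(text):
    """Sum per-word weights; unknown words contribute nothing."""
    return sum(WEIGHTS.get(word, 0.0) for word in text.lower().split())


def classify(text):
    """Hard decision: flag the text iff its score clears the bar."""
    return score(text) >= THRESHOLD


a = classify("angry customer wants refund")           # flagged
b = classify("customer politely asks refund please")  # not flagged
```

Both texts are about a refund, but one lands just above the bar and one well below it, which is the only honest explanation the model offers.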
| hkt wrote:
| This seems to be the absolute worst of all worlds: the burden
| of software engineering with the tools of an English Language
| undergrad.
| gopher_space wrote:
| The English degree helps explain _why_ word choice and order
| matter, giving you context and guidelines for software
| design.
| xtagon wrote:
| Not all ML is built on neural nets. Genetic programming and
| symbolic regression is fun because the resulting model is just
| code, and software devs know how to read code.
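A tiny illustration of that "the model is just code" property: even a brute-force symbolic regression over a handful of candidate expressions returns a readable formula rather than a weight matrix. This is a toy sketch, not a real GP system:

```python
"""Toy symbolic regression: pick the candidate expression with the
lowest squared error. The fitted 'model' is readable code."""

CANDIDATES = [
    ("x + 1",     lambda x: x + 1),
    ("2 * x",     lambda x: 2 * x),
    ("x * x",     lambda x: x * x),
    ("3 * x - 2", lambda x: 3 * x - 2),
]


def fit_symbolic(xs, ys):
    """Return the (expression, function) pair minimizing squared error."""
    def sse(f):
        return sum((f(x) - y) ** 2 for x, y in zip(xs, ys))
    return min(CANDIDATES, key=lambda c: sse(c[1]))


xs = [0, 1, 2, 3, 4]
ys = [-2, 1, 4, 7, 10]  # generated by 3*x - 2
expr, model = fit_symbolic(xs, ys)
# expr is the human-readable string "3 * x - 2"
```

Real genetic programming evolves the candidate set instead of enumerating it, but the output has the same appeal: you can read the winner.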
| nchfgsj1 wrote:
| Genetic programming, however, isn't machine learning, but
| instead an AI algorithm. An extremely interesting one as
| well! It was fun to have my eyes opened after being taught
| genetic algorithms, to then be brought into genetic
| programming
| aiforecastthway wrote:
| Symbolic regression has the same failure mode; the reasons
| why the model failed can be explained in a more digestible
| way, but the actual truth of what happened is fundamentally
| similar -- some coefficient was off by some amount and/or
| some monomial beat out another in some optimization process.
|
| At least with symbolic regression you can treat the model as
| an analyzable entity from first principles theories. But
| that's not really particularly relevant to most failure modes
| in practice, which usually boil down to either missing some
| qualitative change such as a bifurcation or else just
| parameters being off by a bit. Or a little bit of A and a
| little bit of B.
| npalli wrote:
| Clean data and try to get people to understand why they need to
| have clean data.
| rurban wrote:
| Highly paid cleaning lady. With dirty data you get no proper
| results. BTW: perl is much better than python on this.
|
| Highly paid motherboard troubleshooter, because all those
| H100's really get hot, even with watercooling, and we have no
| dedicated HW guy.
|
| Fighting misbehaving third-party deps, as everyone else.
| shoggouth wrote:
| Could you talk more about "BTW: perl is much better than python
| on this."?
| eb0la wrote:
| I haven't touched Perl in more than 20 years...
| ... but I (routinely) miss something like:
|
|     $variable = something() if sanity_check();
|
| And:
|
|     do_something() unless $dont_do_that;
| jononor wrote:
| There exists a ternary if expression:
|
|     foo = something() if sanity_check() else None
|
| Can replace None with foo (or any other expression), if
| desired.
___________________________________________________________________
(page generated 2024-06-08 23:01 UTC)