[HN Gopher] Ask HN: Machine learning engineers, what do you do a...
___________________________________________________________________
Ask HN: Machine learning engineers, what do you do at work?
I'm curious about the day-to-day of a Machine Learning engineer. If
you work in this field, could you share what your typical tasks and
projects look like? What are you working on?
Author : Gooblebrai
Score : 283 points
Date : 2024-06-07 17:26 UTC (1 day ago)
| tambourineman88 wrote:
| The opposite of what you'd think when studying machine
| learning...
|
| 95% of the job is data cleaning, joining datasets together and
| feature engineering. 5% is fitting and testing models.
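A minimal sketch of what that 95% looks like in practice (dataset and column names invented for illustration):

```python
# Joining datasets, cleaning missing values, and deriving features --
# the bulk of the work before any model is ever fit.
users = [
    {"user_id": 1, "signup_year": 2019},
    {"user_id": 2, "signup_year": None},  # missing value to clean
]
orders = [
    {"user_id": 1, "amount": 30.0},
    {"user_id": 1, "amount": 12.5},
    {"user_id": 2, "amount": 99.0},
]

# Join: aggregate total order amount per user
totals = {}
for order in orders:
    totals[order["user_id"]] = totals.get(order["user_id"], 0.0) + order["amount"]

# Clean + feature engineering: impute the missing signup year,
# then derive model-ready features
features = []
for user in users:
    year = user["signup_year"] if user["signup_year"] is not None else 2020
    features.append({
        "user_id": user["user_id"],
        "account_age": 2024 - year,
        "total_spend": totals.get(user["user_id"], 0.0),
    })

print(features)
```

Only after several rounds of this does the 5% of fitting and testing models begin.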
| toephu2 wrote:
| Sounds like a Data Scientist job?
| moandcompany wrote:
| This is a large problem in industry: defining away some of
| the most important parts of a job or role as (should be)
| someone else's.
|
| There is a lot of toil and unnecessary toil in the whole data
| field, but if you define away all of the "yucky" parts, you
| might find that all of those "someone elses" will end up
| eating your lunch.
| hiatus wrote:
| > There is a lot of toil and unnecessary toil in the whole
| data field, but if you define away all of the "yucky"
| parts, you might find that all of those "someone elses"
| will end up eating your lunch.
|
| See: the use of "devops" to encapsulate "everything besides
| feature development"
| tedivm wrote:
| It's not about "yucky" so much as specialization and only
| having a limited time in life to learn everything.
|
| Should your researcher have to manage nvidia drivers and
| infiniband networking? Should your operations engineer need
| to understand the math behind transformers? Does your
| researcher really gain any value from understanding the
| intricacies of docker layer caching?
|
| I've seen what it looks like when a company hires mostly
| researchers and ignores other expertise, versus what
| happens when a company hires diverse talent sets to build a
| cross domain team. The second option works way better.
| AndrewKemendo wrote:
| My answer is yes to both of those
|
| If other peoples work is reliant on yours then you should
| know how their part of the system transforms your inputs
|
| Similarly you should fully understand how all the inputs
| to your part of the system are generated
|
| No matter your coupling pattern, if more than one person
| builds the product, knowing at least one level above and
| below your stack is a baseline expectation
|
| This is true with personnel leadership too: I should be
| able to troubleshoot one level above and below me with some
| level of capacity.
| otteromkram wrote:
| The parent comment had three examples...
| mrbombastic wrote:
| 2/3 is close enough in ML world
| moandcompany wrote:
| > I've seen what it looks like when a company hires
| mostly researchers and ignores other expertise, versus
| what happens when a company hires diverse talent sets to
| build a cross domain team. The second option works way
| better.
|
| I've seen these too, and you aren't wrong. Division into
| specializations can work "way better" (i.e. the overall
| potential is higher), but in practice the differentiating
| factors that matter will come down to organizational and
| ultimately human-factors. The anecdotal cases I draw my
| observations from come from organizations operating at the
| scale of 1-10 people as well as 1,000s, working in this
| field.
|
| > Should your researcher have to manage nvidia drivers and
| infiniband networking? Should your operations engineer
| need to understand the math behind transformers? Does
| your researcher really gain any value from understanding
| the intricacies of docker layer caching?
|
| To realize the higher potential mentioned above, what they
| need to do is appreciate the value of those things, and of
| the people who do them, beyond "these are the people who do
| the things I don't want to do or don't want to understand."
| That appreciation usually comes from having done and
| understood that work.
|
| When specializations are used, they tend to also manifest
| as organizational structures and dynamics which are
| ultimately composed of humans. Conway's Law is worth
| mentioning here because the interfaces between these
| specializations become the bottleneck of your system in
| realizing that "higher potential."
|
| As another commenter mentions, the effectiveness of these
| interfaces, corresponding bottlenecking effects, and
| ultimately the entire people-driven system is very much
| driven by how the parties on each side understand each
| other's work/methods/priorities/needs/constraints/etc,
| and having an appreciation for how they affect (i.e.
| complement) each other and the larger system.
| auntienomen wrote:
| A good DS can double as an MLE.
| disgruntledphd2 wrote:
| And sometimes, a good MLE can double as a DS.
|
| Personally I think we calcified the roles around data a
| little too soon but that's probably because there was such
| demand and the space is wide.
| RSZC wrote:
| Used to do this job once upon a time - can't overstate the
| importance of just being knee-deep in the data all day long.
|
| If you outsource that to somebody else, you'll miss out on
| all the pattern-matching eureka moments, and will never know
| the answers to questions you never think to ask.
| huygens6363 wrote:
| "Scientist"? Is this like Software Engineer?
| staunton wrote:
| I guess it means "someone who has or is about to have a
| PhD".
| maxlamb wrote:
| Sounds like a data engineer job to me
| jamil7 wrote:
| My partner is a data engineer, from what I've gathered the
| departments are often very small or one person so the roles
| end up blending together a lot.
| dblohm7 wrote:
| As somebody whose machine learning expertise consists of the
| first cohort of Andrew Ng's MOOC back in 2011, I'm not too
| surprised. One of the big takeaways I took from that experience
| was the importance of getting the features right.
| geoduck14 wrote:
| >was the importance of getting the features right.
|
| Yeah, but also _knowing_ which features to get right. Right?
| Animats wrote:
| I remember that class. Someone from Blackrock taught it at
| Hacker Dojo. The good old days of support vector machines and
| Matlab.
| ismailmaj wrote:
| This was very important with classical machine learning;
| now, with deep learning, feature engineering became useless,
| as the model can learn the relevant features by itself.
|
| However, having a quality and diverse dataset is more
| important now than ever.
| Salgat wrote:
| That depends on the type of data, and regardless, your goal
| is to minimize the input data, since it has a direct impact
| on performance overhead and the duration of inference.
| beckhamc wrote:
| no we just replaced feature engineering with architectural
| engineering
| AndrewKemendo wrote:
| As it was in the beginning and now and ever shall be amen
|
| At the staff/principal level it's all about maintaining "data
| impedance" between the product features that rely on inference
| models and the data capture
|
| This is to ensure that as the product or features change it
| doesn't break the instrumentation and data granularity that
| feed your data stores and training corpus
|
| For RL problems, however, it's about making sure you have
| the right variables captured for the state and action space
| tuple, and then finding how to adjust the interfaces or
| environment models for reward feedback
| llama_person wrote:
| Same here, it's tons of work to collect, clean, validate data,
| followed by a tiny fun portion where you train models, then you
| do the whole loop over again.
| gopher_space wrote:
| > it's tons of work to collect, clean, validate data
|
| That's my fun part. The discovery process is a joy especially
| if it means ingesting a whole new domain and meeting people.
| whiplash451 wrote:
| In a sense, the data _is_ the model (inductive bias) so
| splitting << data work >> and << model work >> like you do is
| arbitrary.
| hirako2000 wrote:
| The number of responses may be self-explanatory.
|
| Not my main work, but spending a lot of time gluing things
| together. Tweaking existing open source. Figuring out how to
| optimize resources, retraining models on different data sets.
| Trying to run poorly put together python code. Adding missing
| requirements files. Cleaning up data. Wondering what could in
| fact really be useful to solve with ML that hasn't been done
| years ago already. Browsing the prices of the newest GPUs and
| calculating whether that would be worth it to get one rather than
| renting overpriced hours off hosting providers. Reading
| papers until my head hurts - and that's just one at a time:
| it hurts by the time I finish the abstract and have glanced
| over a few diagrams in the middle.
| ZenMikey wrote:
| Where do you locate/how do you select papers?
| davedx wrote:
| pip install pytorch
|
| Environment broken
|
| Spend 4 hours fixing python environment
|
| pip install Pillow
|
| Something something incorrect cpu architecture for your Macbook
|
| Spend another 4 hours reinstalling everything from scratch after
| nuking every single mention of python
|
| pip install ... oh time to go home!
| makapuf wrote:
| Maybe pip should not work by default (but python -m venv _then_
| pip install should)
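A minimal sketch of that venv-first workflow (the package name is just an example, and the install line is commented out since it needs network access):

```shell
# create a project-local virtual environment instead of touching
# the system Python
python3 -m venv .venv

# install by invoking the environment's own interpreter, which
# sidesteps "which pip is on my PATH?" entirely
# .venv/bin/python -m pip install pillow

# confirm the interpreter in use is the project-local one
.venv/bin/python -c "import sys; print(sys.prefix)"
```

When the environment breaks, `rm -rf .venv` and recreating it is far cheaper than untangling a system-wide install.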
| avmich wrote:
| Legends say there were times when you'd have a program.c file
| and just run cc program.c, and then could just execute the
| compiled result. Funny that programmer's job is highly
| automatable, yet we invent ourselves tons of intermediate
| layers which we absolutely have to deal with manually.
| EnergyAmy wrote:
| And then you'd have to deal with wrong glibc versions or
| mysterious segfaults or undefined behavior or the code
| assuming the wrong arch or ...
| KeplerBoy wrote:
| python solves none of those issues. It just adds a myriad
| of ways those problems can get to you.
|
| All of a sudden you have people with C problems, who have
| no idea they're even using compiled dependencies.
| EnergyAmy wrote:
| In theory you're right, CPython is written in C and it
| could segfault or display undefined behavior. In
| practice, you're quite wrong.
|
| It's not really much of a counterargument to say that
| Python is good enough that you don't have to care what's
| under the hood, except when it breaks because C sucks so
| badly.
| KeplerBoy wrote:
| I was specifically talking about python packages using C.
| You type "pip install" and god knows what's going to
| happen. It might pull a precompiled wheel, it might just
| compile and link some C or Fortran code, it might need
| external dependencies. It might install flawlessly and
| crash as soon as you try to run it. All bets are off.
|
| I never experienced CPython itself segfault, it's always
| due to some package.
| davedx wrote:
| I actually did a small C project a couple of years ago, the
| spartan simplicity there can have its own pain too, like
| having to maintain a Makefile. LOL. It's swings and
| roundabouts!
| makapuf wrote:
| I agree simplicity is king. But you're comparing making a
| script using dependencies and tooling for those
| dependencies and a C program with no dependencies. You can
| download a simple python script and run it directly if it
| has no dependencies besides stdlib (which is way larger in
| python). That's why I love using bottle.py, for example.
| avmich wrote:
| Agree. But even with dependencies running "make" seems to
| be way simpler than having to install particular version
| of tools for a project, making venv and then picking
| versions of dependencies.
|
| The point is the same - we had it simpler and now, with
| all capabilities for automation, we have it more complex.
|
| Frankly, I suspect most of the efforts now are spent
| fighting non-essential complexities, like
| compatibilities, instead of solving the problem at hand.
| That means we create problems for ourselves faster than
| removing them.
| jononor wrote:
| Some Linux distros are moving that way, particularly for the
| included Python/pip version. My Arch Linux has done so for
| some years now, and I did not set it up myself - so I think
| it is the default.
| sigmoid10 wrote:
| If you're still doing ML locally in 2024 and also use an ARM
| macbook, you're asking for trouble.
| spmurrayzzz wrote:
| Can you expand on this a bit? My recent experiences with MLX
| have been really positive, so I'm curious what footguns
| you're alluding to here.
|
| (I don't do most of my work locally, but for smaller models
| its pretty convenient to work on my mbp).
| sigmoid10 wrote:
| MPS implementations generally lag behind CUDA kernels,
| especially for new and cutting edge stuff. Sure, if you're
| only running CPU inference or only want to use the GPU for
| simple or well established models, then things have gotten
| to the point where you can almost get the plug and play
| experience on Apple silicon. But if you're doing research
| level stuff and training your own models, the hassle is
| just not worth it once you see how convenient ML has become
| in the cloud. Especially since you don't really want to
| store large training datasets locally anyways.
| genevra wrote:
| For real
| nicce wrote:
| > ARM macbook
|
| Funnily, the only real competitor for Nvidia's GPUs is
| Macbooks with 128GB of RAM.
| hu3 wrote:
| And they don't compete in performance.
| hkt wrote:
| I see your contemporary hardware choices and raise you my
| P900 ThinkStation with 256GB of RAM and 48 Xeon cores.
| Eventually it might even acquire modern graphics hardware.
| anArbitraryOne wrote:
| I wish my company would understand this and let us use
| something else. Luckily, they don't really seem to care that
| I use my Linux based gaming machine most of the time
| davedx wrote:
| What can I say, I enjoy pain!?
| blitzar wrote:
| Nahh im l33t - intel macbook and no troubles.
| rwalle wrote:
| can't read? parent clearly says "ARM macbook".
| next_xibalba wrote:
| Do people doing ML/DS not use conda anymore?
| buildbot wrote:
| A lot do, personally, every single time I try to go back to
| conda/mamba whatever, I get some extremely weird C/C++
| related linking bug - just recently, I ran into an issue
| where the environment was _almost_ completely isolated from
| the OS distro's C/C++ build infra, except for LD, which was
| apparently so old it was missing the vpdpbusd instruction
| (https://github.com/google/XNNPACK/issues/6389). Except the
| thing was, that wouldn't happen when building outside of
| the Conda environment. Very confusing. Standard virtualenvs
| are boring but nearly always work as expected in comparison.
|
| I'm an Applied Scientist vs. ML Engineer, if that matters.
| astromaniak wrote:
| It's probably easier to reinstall everything anew from time
| to time. Instead of fixing broken 18.04 just move to 22.04.
| Most tools should work, if you don't have a huge codebase
| which requires an old compiler...
|
| Conda... it interferes with the OS setup and doesn't always
| have the best utils. Like, its ffmpeg is compiled with
| limited options, probably due to licensing.
| buildbot wrote:
| I do all the time, and always have (in fact my first job
| was bare metal OS install automation), this was Rocky
| 9.4. New codebase, new compiler, weird errors. I did
| actually reinstall and switch over to Ubuntu 24.04 after
| that issue lol.
| copperroof wrote:
| If they are they should stop.
|
| It causes so many entirely unnecessary issues. The conda
| developers are directly responsible for maybe a month of my
| wasted debugging time. At my last job one of our questions
| for helping debug client library issues was "are you using
| conda". And if so we just would say we can't help you.
| Luckily it was rare, but if conda was involved it was 100%
| conda's fault somehow, and it was always a stupid decision they
| made that flew in the face of the rest of the python
| packaging community.
|
| Data scientist python issues are often caused by them not
| taking the 1-3 days it takes to fully understand their tool
| chain. It's genuinely quite difficult to fuck up if you take
| the time once to learn how it all works, where your python
| binaries are on your system, etc. Maybe that wasn't the case
| 5 years ago, but today it's pretty simple.
| buildbot wrote:
| Fully agree with this. Understand the basic tools that
| currently exist and you'll be fine. Conda constantly fucks
| things up in weird hard to debug ways...
| blt wrote:
| I can't point at a single reason, but I got sick of it.
|
| The interminable solves were awful. Mamba made it better, but
| can still be slow.
|
| Plenty of more esoteric packages are on PyPI but not Conda.
| (Yes, you can install pip packages in a conda env file.)
|
| Many packages have a default version and a conda-forge
| version; it's not always clear which you should use.
|
| In Github CI, it takes extra time to install.
|
| Upon installation it (by default) wants to mess around with
| your .bashrc and start every shell in a "base" environment.
|
| It's operated by another company instead of the Python
| Software Foundation.
|
| idk, none of these are deal-breakers, but I switched to venv
| and have not considered going back.
| SushiHippie wrote:
| Can recommend using conda, more specifically
| mambaforge/micromamba (no licensing issues when used at work).
|
| This works way better than pip, as it does more dependency
| checking, so it does not break as easily as pip, though this
| definitely makes it way slower when installing
| something. It also supports updating your environment to the
| newest versions of all packages.
|
| It's no silver bullet and mixing it with pip leads to even more
| breakages, but there is pixi [0] which aims to support interop
| between pypi and conda packages
|
| [0] https://prefix.dev/
| tasuki wrote:
| I had a bad experience with Conda:
|
| - If they're so good at dependency management, why is Conda
| installed through a magical shell script?
|
| - It's slow as molasses.
|
| - Choosing between Anaconda/Miniconda...
|
| When forced to use Python, I prefer Poetry, or just pip with
| freezing the dependencies.
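The pip-with-freezing approach, sketched (the restore step is commented out because it needs network access):

```shell
# record the exact versions currently installed in the environment
python3 -m pip freeze > requirements.txt

# later, or on another machine, reproduce the same set of packages:
# python3 -m pip install -r requirements.txt

cat requirements.txt
```

Pinning exact versions this way trades flexibility for reproducibility, which is usually the right trade for ML experiments.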
|
| The Python people probably can't even imagine how great
| dependency management is in all the other languages...
| tedivm wrote:
| I absolutely hate conda. I had to support a bunch of
| researchers who all used it and it was a nightmare.
| akkad33 wrote:
| Mamba/micromamba solves the slowness problem of conda
| epoxia wrote:
| To add: Conda has parallelized downloads now and is faster.
| Not as fast as mamba, but faster than previously. PR merged
| Sep 2022:
| https://github.com/conda/conda/pull/11841
| SushiHippie wrote:
| Yeah, I agree, maybe I should have also mentioned the bad
| things about it, but after trying many different tools
| that's the one that I stuck with, as creating/destroying
| environments is a breeze once you got it working and the
| only time my environment broke was when I used pip in that
| environment.
|
| > The Python people probably can't even imagine how great
| dependency management is in all the other languages...
|
| Yep, I wish I could use another language at work.
|
| > Choosing between Anaconda/Miniconda...
|
| I went straight with mamba/micromamba as anaconda isn't
| open source.
| semi-extrinsic wrote:
| rye and uv, while "experimental", are orders of magnitude
| better than poetry and pip IMHO.
| davedx wrote:
| Yes I started with conda I think and ended up switching to
| venv and I can't even remember why, it's a painful blur now.
| It was almost certainly user error too somewhere along the
| way (probably one of the earlier steps), but recovering from
| it had me seriously considering buying a Linux laptop.
|
| This happened about a week ago
| mysteria wrote:
| I thought most ML engineers use their laptops as dumb terminals
| and just remote into a Linux GPU server.
| daemonologist wrote:
| Yeah, the workday there looks pretty similar though, except
| that installing pytorch and pillow is usually no problem.
| Today it was flash-attn I spent the afternoon on.
| ungamedplayer wrote:
| Isn't this what containers are for? Someone somewhere gets
| it configured right, and then you download and run the
| pre-set-up container and add your job data? Or am I looking
| at the problem wrong?
| rolisz wrote:
| But then how do you test out the latest model that came
| out from who knows where and has the weirdest
| dependencies and a super obscure command to install?
| fshbbdssbbgdd wrote:
| Just email all your data to the author and ask them to
| run it for you.
| davedx wrote:
| Spoiler: my main role isn't ML engineer :) and that doesn't
| sound like a bad idea at all
| jshbmllr wrote:
| I do this... but air-gapped :(
| daemonologist wrote:
| Oof. At our company only CI/CD agents (and laptops) are
| allowed to access the internet, and that's bad enough.
| fragmede wrote:
| that sounds very painful.
| rqtwteye wrote:
| Oh my! This hits home. We have some test scripts written in
| python. Every time I try to run them after a few months I spend
| a day fixing the environment, package dependencies and other
| random stuff. Python is pretty nice once it works, but managing
| the environment can be a pain.
| phaedrus wrote:
| As an amateur game engine developer, I morosely reflect that
| my hobby seems to actually consist of endlessly chasing
| things that were broken by environment updates (OS,
| libraries, compiler, etc.). That is, most of the time I sit
| down to code, I actually spend it nuking and reinstalling
| things that (I thought) were previously working.
|
| Your comment makes me feel a little better that this is not
| merely some personal failing of focus, but happens in a
| professional setting too.
| ClimaxGravely wrote:
| Happens in AAA too but we tend to have teams that shield
| everyone from that before they get to work. I ran a team like
| that for a couple years.
|
| For hobby stuff at home though I don't tend to hit those
| types of issues because my projects are pretty frozen
| dependency-wise. Do you really have OS updates break stuff
| for you often? I'm not sure I recall that happening on a home
| project in quite a while.
| davedx wrote:
| Oh god yes I remember trying to support old Android games
| several OS releases later... Impossible, I gave up!
|
| It's why I still use react, their backcompat is amazing
| SJC_Hacker wrote:
| Docker is your friend
|
| or pyenv at least
| navbaker wrote:
| Yes, I've switched from conda to a combination of dev
| containers and pyenv/pyenv-virtualenv on both my Linux and
| MacBook machines and couldn't be happier
| __rito__ wrote:
| Just use conda.
| SoftTalker wrote:
| Then you get one of my favorites: NVIDIA-<something> has
| failed because it couldn't communicate with the NVIDIA
| driver. Make sure that the latest NVIDIA driver is installed
| and running.
| davedx wrote:
| Iirc I originally used conda because I couldn't get faiss
| to work in venv, lol. That was a while ago though
| wszrfcbhujnikm wrote:
| Docker fixes this!
| el_benhameen wrote:
| > Environment broken
|
| >Something something incorrect cpu architecture for your
| Macbook
|
| I'm glad I have something in common with the smart people
| around here.
| davedx wrote:
| Yeah getting a good python environment setup is a very
| humbling experience
| mc10 wrote:
| uv is a great drop-in replacement for pip:
| https://astral.sh/blog/uv
| vergessenmir wrote:
| Why do ML on a Mac, especially nowadays, when you can do it
| on an Ubuntu-based machine?
|
| Surely work can provide that?
| zxexz wrote:
| Why Ubuntu specifically? Not even being snarky. Calling out a
| specific distro, vs. the operating system itself. I've had
| more pain setting up ML environments with Ubuntu than a
| MacBook, personally - though pure Debian has been the easiest
| to get stable from scratch. Ubuntu usually screws me over one
| way or another after a month or so. I think I've spent a
| cumulative month of my life tracking down things related to
| changes in netplan, cloud-init, etc. Not to mention Ubuntu
| Pro spam being incessant, as official policy of Canonical
| [0]. I first used the distro all the way back at Warty
| Warthog, and it was my daily driver from Feisty until
| ~Xenial. I think it was the Silicon Valley ad in the MotD
| that was the last straw for me.
|
| [0] https://bugs.launchpad.net/ubuntu/+source/ubuntu-
| meta/+bug/1...
| 0x008 wrote:
| I can recommend trying poetry. It is a lot more successful
| at resolving dependencies than pip.
|
| Although I think the UX of poetry is stupid and I do not
| agree with some design decisions, I have not had any
| dependency conflicts since I started using it.
| globular-toast wrote:
| You could learn how to use Python. Just spend one of those 4
| hours actually learning. Imagine just getting into a car and
| pressing controls until something happened. This wouldn't be
| allowed to happen in any other industry.
| sirlunchalot wrote:
| Could you be a bit more specific about what you mean by "You
| could learn how to use python"? What resources would you
| recommend to learn how to work around problems the OP has?
| What basic procedures/resources can you recommend to "learn
| python"? I work as a software developer alongside my studies
| and often face the same problems as OP that I would like to
| avoid. Very grateful for any tips!
| globular-toast wrote:
| Basically just use virtual environments via the venv
| module. The only thing you really need to know is that
| Python doesn't support having multiple versions of a
| package installed in the same environment. That means you
| need to get very familiar with creating (and destroying)
| environments. You don't need to know any of this if you
| just use tools that happen to be written in Python. But if
| you plan to write Python code then you do. It should be in
| Python books really, but they tend to skip over the boring
| stuff.
| mountainriver wrote:
| Oh wow, I've been a Python engineer for over a decade and
| getting dependencies right for machine learning has very
| little to do with Python and everything to do with c++/cuda
| globular-toast wrote:
| I've done it. Isn't it just following instructions? What
| part of that means destroying every mention of Python on
| the system?
| kobalsky wrote:
| I've been programming with python for decades and the problem
| they are describing says more about the disastrous state of
| python's package management and the insane backwards
| compatibility stance python devs have.
|
| Half of the problems I've helped some people solve stem from
| python devs insisting on shuffling std libraries around
| between minor versions.
|
| Some libraries have a compatibility grid with different
| python minor versions, because of how often they break
| things.
| loftyal wrote:
| ...and people criticise node's node_modules. At least you don't
| spend hours doing this
| RamblingCTO wrote:
| But you do because your local node_modules and upstream are
| out of sync and CI is broken. Happens at least once a month
| just before a release of course. I'd rather have my code
| failing locally than trying to debug what's out of sync on
| upstream.
| mountainriver wrote:
| Not nearly as hard of a problem. Python does work just fine
| when it's pure Python. The trouble comes with all the C/Cuda
| dependencies in machine learning
| posix_monad wrote:
| Python's dominance is holding us back. We need a stack with a
| more principled approach to environments and native
| dependencies.
| llm_trw wrote:
| Here's what getting PyTorch built reproducibly looks like:
| https://hpc.guix.info/blog/2021/09/whats-in-a-package/
|
| Since then the whole python ecosystem has gotten worse.
|
| We are building towers on quicksand.
|
| It's not about python, it's about people who don't care about
| dependencies.
| hkt wrote:
| Dependency management is just... hard. It is one of the
| things where everything relies upon it but nobody thinks
| "hey, this is my responsibility to improve" so it is left
| to people who have the most motivation, academic posts, or
| grant funding. This is roughly the same problem that led to
| heartbleed for OpenSSL.
| manusachi wrote:
| Do you know what other ecosystem comes closest to the one
| existing in Python? I've heard good things about Julia.
|
| 13 years ago when I was trying to explore the field R seemed
| to be the most popular, but looks like not anymore. (I didn't
| get into the field, and do just a regular SWE, so I'm not
| aware of the trends).
|
| There is also a lot of development in Elixir ecosystem around
| the subject [1].
|
| [1](https://dashbit.co/blog/elixir-ml-s1-2024-mlir-arrow-
| instruc...)
| ninkendo wrote:
| I count at least a half dozen "just use X" replies to this
| comment, for at least a half dozen values of X, where X is some
| wrapper on top of pip or a replacement for pip or some virtual
| environment or some alternative to a virtual environment etc
| etc etc.
|
| Why is python dependency management so cancerously bad? Why are
| there so many "solutions" to this problem that seem to be out
| of date as soon as they exist?
|
| Are python engineers just bad, or?
|
| (Background: I never used python except for one time when I
| took a coursera ML course and was immediately assaulted with
| conda/miniconda/venv/pip/etc etc and immediately came away with
| a terrible impression of the ecosystem.)
| sjducb wrote:
| Two problems intersect:
|
| - You can't have two versions of the same package in the
| namespace at the same time.
|
| - The Python ecosystem is very bad at backwards
| compatibility.
|
| This means that you might require one package that requires
| foo below version 1.2 and another package that requires foo
| version 2 and above.
|
| There is no good solution to the above problem.
|
| This problem is amplified when lots of the packages were
| written by academics 10 years ago and are no longer
| maintained.
|
| The bad solutions are:
|
| 1) Have 2 venvs - not always possible, and if you keep
| making venvs you'll have loads of them.
|
| 2) Rewrite your code to only use one library.
|
| 3) Update one of the libraries.
|
| 4) Don't care about the mismatch and cross your fingers
| that the old one will work with the newer library.
|
| Most of the tooling follows approach 1 or 4
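The foo example above can be made concrete with a toy resolver check (versions modeled as tuples; the package name and bounds are invented for illustration):

```python
# Package A needs foo < 1.2; package B needs foo >= 2.0. Because only
# one version of foo can live in an environment, resolution must find
# a single version satisfying both bounds -- and there is none.
available = [(1, 0), (1, 1), (1, 2), (2, 0), (2, 1)]

ok_for_a = {v for v in available if v < (1, 2)}    # constraint from A
ok_for_b = {v for v in available if v >= (2, 0)}   # constraint from B

print(ok_for_a & ok_for_b)  # empty set: no version works for both
```

This is why the "bad solutions" above all amount to either splitting the environment or relaxing a constraint.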
| fragmede wrote:
| Disk space is cheap, so where it's possible to have 2 (or
| more) venvs, that seems easiest. The problem with venv is
| that they don't automatically activate. I've been using a
| _very_ simple wrapper around python to automatically
| activate venvs so I can just cd into the directory and do
| _python foo.py_ and have it use the local venv.
|
| I threw it online at https://github.com/fragmede/python-
| wool/
| fbdab103 wrote:
| I think it is worth separating the Python ML ecosystem from
| the rest. While traditional Python environment management has
| many sore points, it is usually not terrible (though there
| are many gotchas still-to-this-day-problems that should have
| been corrected long ago).
|
| The ML ecosystem is a whole other stack of problems. The
| elephant in the room is Nvidia who is not known for playing
| well with others. Aside from that, the state of the art in ML
| is churning rapidly as new improvements are identified.
| globular-toast wrote:
| It's not bad. It works really well. There's always room for
| improvement. That's technology for you. Python probably does
| attract more than its fair share of bad engineers, though.
| llm_trw wrote:
| Just use Linux. Then you only have to fight the nvidia drivers.
| htrp wrote:
| Use standard cloud images
|
| > Something something incorrect cpu architecture for your
| Macbook
| angarg12 wrote:
| My job title is ML Engineer, but my day to day job is almost pure
| software engineering.
|
| I build the systems to support ML systems in production. As
| others have mentioned, this includes mostly data transformation,
| model training, and model serving.
|
| Our job is also to support scientists to do their job, either by
| building tools or modifying existing systems.
|
| However, looking outside, I think my company is an outlier. It
| seems in the industry the expectations for a ML Engineer are more
| aligned to what a data/applied scientist does (e.g. building and
| testing models). That introduces a lot of ambiguity into the
| expectations for each role in each company.
| hnthrowaway0328 wrote:
| That's really the kind of job I'd love. Whatever the data is, I
| don't care. I make sure that the users get the correct data
| quickly.
| tedivm wrote:
| In my experience your company is doing it right, and doing it
| the way that other successful companies do.
|
| I gave a talk at the Open Source Summit on MLOps in April, and
| one of the big points I try to drive home is that it's 80%
| software development and 20% ML.
|
| https://www.youtube.com/watch?v=pyJhQJgO8So
| exegete wrote:
| My company is largely the same. I'm an MLE and partner with
| data scientists. I don't train or validate the models. I
| productionize and instrument the feature engineering pipelines
| and model deployments. More data engineering and MLOps than
| anything. I'm in a highly regulated industry so the data
| scientists have many compliance tasks related to the models and
| we engineers have our own compliance tasks related to the
| deployments. I was an MLE at another company in the very same
| industry before and did everything in the model lifecycle and
| it was just too much.
| trybackprop wrote:
| In a given week, I usually do the following:
|
| * 15% of my time in technical discussion meetings or 1:1's.
| Usually discussing ideas around a model, planning, or ML product
| support
|
| * 40% ML development. In the early phase of the project, I'm
| understanding product requirements. I discuss an ML model or
| algorithm that might be helpful to achieve product/business goals
| with my team. Then I gather existing datasets from analysts and
| data scientists. I use those datasets to create a pipeline that
| results in a training and validation dataset. While I wait for
| the train/validation datasets to populate (could take several
| days or up to two weeks), I'm concurrently working on another
| project that's earlier or further along in its development. I'm
| also working on the new model (written in PyTorch), testing it
| out with small amounts of data to gauge its offline performance,
| to assess whether or not it does what I expect it to do. I sanity
| check it by running some manual tests using the model to populate
| product information. This part is more art than science because
| without a large scale experiment, I can only really go by the gut
| feel of myself and my teammates. Once the train/valid datasets
| have been populated, I train a model on large amounts of data,
| check the offline results, and tune the model or change the
| architecture if something doesn't look right. After offline
| results look decent or good, I then deploy the model to
| production for an experiment. Concurrently, I may be making
| changes to the product/infra code to prepare for the test of the
| new model I've built. I run the experiment and ramp up traffic
| slowly, and once it's at 1-5% allocation, I let it run for weeks
| or a month. Meanwhile, I'm observing the results and have put in
| alerts to monitor all relevant pipelines to ensure that the model
| is being trained appropriately so that my experiment results
| aren't altered by unexpected infra/bug/product factors that
| should be within my control. If the results look as expected and
| match my initial hypothesis, I then discuss with my team whether
| or not we should roll it out and if so, we launch! (Note: model
| development includes feature authoring, dataset preparation,
| analysis, creating the ML model itself, implementing
| product/infra code changes)
|
| * 20% maintenance - Just because I'm developing new models
| doesn't mean I'm ignoring existing ones. I'm checking in on those
| daily to make sure they haven't degraded and resulted in
| unexpected performance in any way. I'm also fixing pipelines and
| making them more efficient.
|
| * 15% research papers and skills - With the world of AI/ML moving
| so fast, I'm continually reading new research papers and testing
| out new technologies at home to keep up to date. It's fun for me
| so I don't mind it. I don't view it as a chore to keep me up-to-
| date.
|
| * 10% internal research - I use this time to learn more about
| other products within the team or the company to see how my team
| can help or what technology/techniques we can borrow from them. I
| also use this time to write down the insights I've gained as I
| look back on my past 6 months/1 year of work.
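The offline loop described above (split the data, train, check offline metrics, only then consider an online experiment) can be sketched in plain Python. The model here is a trivial mean predictor standing in for a real PyTorch model, and the promotion threshold is invented for illustration:

```python
"""Sketch of the offline train/validate/gate flow, with a trivial
mean predictor standing in for a real model. Names and the 5.0
threshold are illustrative assumptions."""
import random


def train_valid_split(rows, valid_frac=0.2, seed=0):
    """Shuffle deterministically, then carve off a validation slice."""
    rows = rows[:]
    random.Random(seed).shuffle(rows)
    cut = int(len(rows) * (1 - valid_frac))
    return rows[:cut], rows[cut:]


def fit_mean_model(train):
    """'Training': memorize the mean label and predict it everywhere."""
    mean = sum(y for _, y in train) / len(train)
    return lambda x: mean


def mae(model, rows):
    """Mean absolute error: the offline metric we gate on."""
    return sum(abs(model(x) - y) for x, y in rows) / len(rows)


# Toy dataset: y is roughly 2*x with deterministic noise.
data = [(x, 2 * x + random.Random(x).uniform(-1, 1)) for x in range(100)]
train, valid = train_valid_split(data)
model = fit_mean_model(train)
offline_error = mae(model, valid)

# Only promote to an online experiment if the offline metric clears
# an (illustrative) bar; otherwise iterate on the model first.
ready_for_experiment = offline_error < 5.0
```

The mean predictor fails the gate here, which is the point: the offline check stops a weak model before any traffic is spent on it.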
| ZenMikey wrote:
| How do you select what papers to read? How often does that
| research become relevant to your job?
| itake wrote:
| not sure if this counts as ML engineering, but I support all the
| infra around the ML models: caching, scaling, queues, decision
| trees, rules engines, etc.
| selimthegrim wrote:
| What do you do with decision trees specifically?
| barrenko wrote:
| MLOps, sure.
| frankPoole wrote:
| Pretty much the same as the others, building tools, data
| cleaning, etc. But something I don't see mentioned: experiment
| design/data collection protocols.
| tenache wrote:
| Although I studied machine learning and was originally hired for
| that role, the company pivoted and is now working with LLMs, so I
| spend most of my day working on figuring out how different LLMs
| work, what parameters work best for them, how to do RAG, how to
| integrate them with other bots.
| KeplerBoy wrote:
| Would you not consider LLMs as a part of machine learning?
| Cyclone_ wrote:
| I'd say deep learning is a subset of machine learning, and
| LLMs are a subset of deep learning.
| chudi wrote:
| Probably because we're not training them anymore, just
| using them with prompts. Seems more like a regular SWE
| type of job.
| aulin wrote:
| except regular swe is way more fun than writing prompts
| layer8 wrote:
| They are the result of machine learning.
| uoaei wrote:
| There is a vanishingly small percentage of people actually
| working on the design and training of LLMs vs all those who
| call themselves "AI engineers" who are just hitting APIs.
| mardifoufs wrote:
| I work on optimizing our inference code, "productizing" our
| trained models and currently I'm working on local training and
| inference since I work in an industry where cloud services just
| aren't very commonly used yet. It's super interesting too since
| it's not LLMs, meaning that there aren't as many pre made tools
| and we have to make tons of stuff by ourselves. That means
| touching anything from assessing data quality (again, the local
| part is the challenge) to using CUDA directly as we already have
| signal processing libs that are built around it and that we can
| leverage.
|
| Sometimes it also involves building internal tooling for our team
| (we are a mixed team of researchers/MLEs), to visualize the data
| and the inferences as again, it's a pretty niche sector and that
| means having to build that ourselves. That allowed me to have a
| lot of impact in my org as we basically have complete freedom
| w.r.t tooling and internal software design, and one of the tools
| that I built basically on a whim is now on its way to being
| shipped in our main products too.
| burnedout_dc4e3 wrote:
| I've been doing machine learning since the mid 2000s. About half
| of my time is spent keeping data pipelines running to get data
| into shape for training and using in models.
|
| The other half is spent doing tech support for the bunch of
| recently hired "AI scientists" who can barely code, and who spend
| their days copy/pasting stuff into various chatbot services.
| Stuff like telling them how to install python packages and use
| git. They have no plan for how their work is going to fit into
| any sort of project we're doing, but assert that transformer
| models will solve all our data handling problems.
|
| I'm considering quitting with nothing new lined up until this
| hype cycle blows over.
| naveen99 wrote:
| You're living the dream. Why quit ?
| bowsamic wrote:
| Is that really your idea of a dream?
| naveen99 wrote:
| My dreams are usually more disturbing, or fun...
|
| But yes. My work is kind of similar... I do some data
| curation / coding, and help 2 engineers who report to me. I
| enjoy it.
| burnedout_dc4e3 wrote:
| I like to feel useful, and like I'm actually contributing to
| things. I probably didn't express it well in my first post,
| but the attitude is very much that my current role is
| obsolete and a relic that's just sticking around until the AI
| can do everything.
|
| It means I'm marginalized in terms of planning. The company
| has long term goals that involve making good use of data.
| Right now, the plan is that "AI" will get us there, with no
| plan B if it doesn't work. When it inevitably fails to live
| up to the hype, we're going to have a bunch of cobbled
| together systems that are expensive to run, rather than
| something that we can keep iterating on.
|
| It means I'm marginalized in terms of getting resources for
| projects. There's a lot of good my team could be doing if we
| had the extra budget for more engineers and computing.
| Instead that budget is being sent off to AI services, and
| expensive engineer time is being spent on tech support for
| people that slapped "LLM" all over their resume.
| bentt wrote:
| I wonder if this is how the OG VR guys felt in 2016.
| Havoc wrote:
| Well Palmer Luckey sold oculus and now makes military gear so
| I guess he chose violence after his VR era
| m_ke wrote:
| I just quit a day ago with nothing lined up for the same
| reason.
| whiplash451 wrote:
| There are companies where applied scientists are required to
| code well. Just ask how they are hired before joining (that
| should be a positive feature).
| burnedout_dc4e3 wrote:
| Yeah, we used to be like that. Then, when this hype cycle
| started ramping up, the company brought in a new exec who got
| rid of that. I brought it up with the CEO, but nothing
| changed, so that's another reason for me to leave.
| giantg2 wrote:
| I've interviewed for a few of the ML positions and turned them
| down because they were just data jockey positions.
| redwood wrote:
| Do people feel like they are more or less in demand with the
| hyper around genai?
| uoaei wrote:
| Demand is higher for flashy things that look good on directors'
| desks, definitely. But there's less attention on less flashy
| applications of machine learning, unless your superiors are so
| clueless that they think what you're doing _is_ GenAI. Which
| sometimes the systems/models being trained are legitimately
| generative, but in the more technical, traditional sense.
| redwood wrote:
| What are the tools people tend to use? Feature platforms like
| Tecton on the list?
| schmookeeg wrote:
| Getting my models dunked on by people who can't open MS Outlook
| more than 3 tries out of 5, however, have a _remarkable_ depth
| and insight into their chosen domain of expertise. It's rather
| humbling.
|
| Collaborating with nontechnical people is oddly my favorite part
| of doing MLE work right now. It wasn't the case when I did basic
| web/db stuff. They see me as a magician. I see them as voodoo
| priests and priestesses. When we get something trained up and
| forecasting that we both like, it's super fulfilling. I think for
| both sides.
|
| Most of my modeling is healthcare related. I tease insights out
| of a monstrous data lake of claims, Rx, doctor notes, vital
| signs, diagnostic imagery, etc. What is also monstrous is how
| accessible this information is. HIPAA my left foot.
|
| Since you seemed to be asking about the temporal realities, it's
| about 3 hours of meetings a week, probably another 3 doing task
| grooming/preparatory stuff, fixing some ETL problem, or doing a
| one-off query for the business, the rest is swimming around in
| the data trying to find a slight edge to forecast something that
| surprised us for a $million or two using our historical
| snapshots. It's like playing Where's Waldo with math. And the
| waldo scene ends up being about 50TB or so in size. :D
| soared wrote:
| 3 hours of meetings a week, that's incredible. Sounds like your
| employer understands and values your time!
| visarga wrote:
| 13 meetings/week, at least one full day of work wasted for me
| schmookeeg wrote:
| They really do. This has been my longest tenure at any
| position _by far_ and Engineer QoL is a massive part of it.
| Our CTO came up through the DBA/Data/Engineering Management
| ranks and the empathy is solidly there.
|
| As we grow, I'm ever watchful for our metamorphosis into a
| big-dumb-company, but no symptoms yet. :)
| saulrh wrote:
| > HIPAA my left foot.
|
| That was my experience as well - training documentation for
| fresh college grads (i.e. me) directed new engineers to just...
| send SQL queries to production to learn. There was a process
| for gaining permissions, there were audit logs, but the only
| sign-off you needed was your manager, permission lasted 12
| months, and the managers just rubber-stamped everyone.
|
| That was ten years ago. Every time I think about it I find
| myself hoping that things have gotten better and knowing they
| haven't.
| bick_nyers wrote:
| ... you didn't have a UAT environment?
| saulrh wrote:
| There were a couple "not prod" environments, but they were
| either replicated directly from prod or so poorly
| maintained that they were unusable (empty tables, wrong
| schemas, off by multiple DB major versions, etc), no middle
| ground. So institutional culture was to just run everything
| against prod (for bare selects that could be copied and
| pasted into the textbox in the prod-access web tool) or a
| prod replica (for anything that needed a db connection).
| The training docs actually did specify Real Production, and
| first-week tasks included gaining Real Production access.
| If I walked in and was handed that training documentation
| today I'd raise hell and/or quit on the spot, but that was
| my first job out of college - it was basically _everyone's_
| first job out of college, they strongly preferred
| hiring new graduates - and I'd just had to give up on my
| PhD so I didn't have the confidence, energy, or pull to do
| anything about it, even bail out.
|
| That was also the company where prod pushes happened once a
| month, over the weekend, and were all hands on deck in case
| of hiccups. It was an extraordinarily strong lesson in how
| not to organize software development.
|
| (edit: if what you're really asking is "did every engineer
| have write access to production", the answer was, I
| believe, that only managers did, and they were at least not
| totally careless with it. Not, like, actually _responsible_,
| no "formal post-mortem for why we had to use break-glass
| access", but it generally only got brought out to unbreak
| prod pushes. Still miserable.)
| roughly wrote:
| There's an old joke that everyone's got a testing
| environment, but some people are lucky enough to have a
| separate production environment.
| jollofricepeas wrote:
| So...
|
| Is it surprising that engineers in healthcare don't read the
| actual HIPAA documentation?
|
| Use of health data is permitted so long as it's for payment,
| treatment or operations. Disclosures and patient consent are
| not required.
|
| There are helpful summaries on the US Department of Health
| and Human Services website of the various rules (Security,
| Privacy & Notification).
|
| Source: https://www.hhs.gov/hipaa/for-
| professionals/privacy/guidance...
|
| This allowance is permitted to covered entities and by
| extension their vendors (business associates) by HIPAA.
|
| If it wasn't then, theoretically the US healthcare industry
| would grind to a halt considering the number of
| intermediaries for a single transaction.
|
| Example:
|
| Doctor writes script -> EHR -> Pharmacy -> Switch ->
| Clearinghouse --> PA Processing -> PBM/Plan makes
| determination
|
| Along this flow there are other possible branches and
| vendors.
|
| It's beyond complex.
| aiforecastthway wrote:
| _> If it wasn't then, theoretically the US healthcare
| industry would grind to a halt considering the number of
| intermediaries for a single transaction._
|
| It just occurred to me that cleaning up our country's data
| privacy / data ownership mess might have extraordinarily
| positive second-order effects on our Kafkaesque and
| criminally expensive healthcare "system".
|
| Maybe making it functionally impossible for there to be
| hundreds of middlemen between me and my doctor would be
| a... good thing?
| outside1234 wrote:
| Define operations, because that sounds like a loophole that
| basically allows you to use it for anything
| connicpu wrote:
| Basically the only thing you can't do with the data is
| disclose it to someone who doesn't also fall under HIPAA
| constantinum wrote:
| > I tease insights out of a monstrous data lake of claims, Rx,
| doctor notes, vital signs
|
| I'm curious to know the tech stack behind converting
| unstructured to structured data(for reporting and analysis)
| dax77 wrote:
| Take a look at AWS Healthlake and AWS Comprehend Medical
| hamasho wrote:
| I worked on a project to analyze endoscope videos to find
| diseases. I examined a lot of images and videos annotated with
| symptoms of various diseases labeled by assistants and doctors.
| Most of them are really obvious, but others are almost
| impossible to detect. In rare cases, despite my best efforts, I
| couldn't see any difference between the spot labeled as a
| symptom of cancer and the surrounding area. There's no a-ha
| moment, like finding an insect mimicking its environment. No
| matter how many times I tried, I just couldn't see any
| difference.
| aswegs8 wrote:
| Mind sharing how to get a foot into the field? I've got a
| good amount of domain knowledge from my studies in life
| science and rather meager experience from learning to code on
| my own for a few years. It seems like I can't compete with CS
| majors and gotta find a way to leverage my domain knowledge.
| nomilk wrote:
| The Dead Internet Theory says _most_ activity on the internet
| is by bots [1]. The Dead Privacy Theory says _approximately
| all_ private data is not private; but rather is accessible on
| whim by any data scientist, SWE, analyst, or db admin with
| access to the database.
|
| [1] https://en.wikipedia.org/wiki/Dead_Internet_theory
| htrp wrote:
| > The Dead Privacy Theory says approximately all private data
| is not private; but rather is accessible on whim by any data
| scientist, SWE, analyst, or db admin with access to the
| database.
|
| I like this so much I'm definitely stealing it!
| bearjaws wrote:
| Damn, I've talked about this many times at my last job
| (startup that went from 100k patients to ~2.5M in 5 years). I
| love the name Dead Privacy Theory
| htrp wrote:
| >Getting my models dunked on by people who can't open MS
| Outlook more than 3 tries out of 5, however, have a remarkable
| depth and insight into their chosen domain of expertise. It's
| rather humbling.
|
| The people who have lasted in those roles have built up a large
| degree of intuition on how their domains work (or they would've
| done something else).
| ProjectArcturis wrote:
| What business are you in that predicting health data can make
| you millions?
| dr_kiszonka wrote:
| Insurance, health benefits.
| conkeisterdoor wrote:
| This sounds almost exactly like my day-to-day as a solo senior
| data engineer -- minus building and training ML models, and I
| don't work in healthcare. My peers are all very non-technical
| business directors who are very knowledgeable about their
| domains, and I'm like a wizard who can conjure up time
| savings/custom reporting/actionable insights for them.
|
| Collaborating with them is great, and has been a great exercise
| in learning how to explain complex ideas to non-technical
| business people. Which has the side effect of helping me get
| better at what I do (because you need a good understanding of a
| topic to be able to explain it both succinctly and accurately
| to others). It has also taught me to appreciate the business
| context and reasoning that can drive decisions about how a
| business uses or develops data/software.
| primaprashant wrote:
| Been working as an MLE for the last 5 years and as another
| comment said most of the work is close to SWE. Depending on the
| stage of the project I'm working on, day-to-day work varies but
| it's along the lines of one of these:
|
| - Collaboration with stakeholders & TPMs and analyzing data to
| develop hypotheses to solve business problems with high priority
|
| - Framing business problems as ML problems and creating suitable
| metrics for ML models and business problems
|
| - Building PoCs and prototypes to validate the technical
| feasibility of the new features and ideas
|
| - Creating design docs for architecture and technical decisions
|
| - Collaborating with the platform teams to set up and maintain
| the data pipelines based on the needs of new and existing ML
| projects
|
| - Building, deploying, and maintaining ML microservices for
| inference
|
| - Writing design docs for running A/B tests and performing post-
| test analyses
|
| - Setting up pipelines for retraining of ML models
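For the post-test analysis step, a common building block is a two-proportion z-test on conversion counts. A stdlib-only sketch (the counts below are made-up illustration data, not from any real experiment):

```python
"""Two-proportion z-test for an A/B conversion comparison, using
only the stdlib. The example counts are invented."""
import math


def two_proportion_z(conv_a, n_a, conv_b, n_b):
    """Return (z, two_sided_p) for H0: equal conversion rates."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal CDF, built on erf.
    p = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return z, p


# Hypothetical test: control converts 10.0%, treatment 12.0%.
z, p = two_proportion_z(1000, 10000, 1200, 10000)
significant = p < 0.05
```

In practice a library routine (e.g. statsmodels' `proportions_ztest`) does the same arithmetic, but the formula is worth knowing when writing the post-test analysis doc.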
| singularity2001 wrote:
| Teaching others python.
| reeboo wrote:
| Underrated comment. At my place of work, I find this to be a
| huge part of the MLE job. Everyone knows R but none of the
| cloud tools have great R support.
| exe34 wrote:
| 90% of the time it's figuring out what data to feed into neural
| networks, 2% of the time it's figuring out stuff about neural
| networks, and the other 8% it's figuring out why on earth the
| recall rate is 100%.
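That last 8% is often a degenerate classifier or label leakage, and a quick sanity check is to look at recall together with precision: a model that predicts positive for everything scores 100% recall. A small self-contained illustration:

```python
"""Why 100% recall alone proves nothing: a classifier that always
predicts positive has perfect recall and poor precision."""


def precision_recall(y_true, y_pred):
    """Compute precision and recall from binary label lists."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


y_true = [1, 0, 0, 0, 1, 0, 0, 0, 0, 0]  # 20% positives
always_positive = [1] * len(y_true)       # degenerate "model"
prec, rec = precision_recall(y_true, always_positive)
# rec == 1.0 looks great; prec == 0.2 gives the game away.
```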
| jackspawn wrote:
| 50%+ of my time is spent on backend engineering because the ML is
| used inside a bigger API.
|
| I take responsibility for the end to end experience of said API,
| so I will do whatever gives the best value per time spent. This
| often has nothing to do with the ML models.
| Xenoamorphous wrote:
| I'm a regular software dev but I've had to do ML stuff by
| necessity.
|
| I wonder how "real" ML people deal with the stochastic/gradient
| results and people's expectations.
|
| If I do ordinary software work the thing either works or it
| doesn't, and if it doesn't I can explain why and hopefully fix
| it.
|
| Now with ML I get asked "why did this text classifier not
| classify this text correctly?" and all I can say is "it was 0.004
| points away from meeting the threshold", and "it didn't meet it
| because of the particular choice of words or even their order"
| which seems to leave everyone dissatisfied.
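The "0.004 points away" situation can be made concrete with a toy bag-of-words scorer: a small change in word choice shifts the score slightly, and a hard threshold turns that into a flipped label. The weights and threshold here are invented purely for illustration:

```python
"""Toy bag-of-words classifier illustrating how a small wording
change can flip a thresholded decision. All weights are invented."""

WEIGHTS = {"refund": 0.40, "angry": 0.35, "cancel": 0.30, "please": -0.05}
THRESHOLD = 0.70


def score(text):
    """Sum per-word weights; unknown words contribute nothing."""
    return sum(WEIGHTS.get(word, 0.0) for word in text.lower().split())


def classify(text):
    """Hard decision: flag the text iff its score clears the bar."""
    return score(text) >= THRESHOLD


a = classify("angry customer wants refund")           # flagged
b = classify("customer politely asks refund please")  # not flagged
```

Both texts are about a refund, but one lands just above the bar and one well below it, which is the only honest explanation the model offers.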
| hkt wrote:
| This seems to be the absolute worst of all worlds: the burden
| of software engineering with the tools of an English Language
| undergrad.
| gopher_space wrote:
| The English degree helps explain _why_ word choice and order
| matter, giving you context and guidelines for software
| design.
| xtagon wrote:
| Not all ML is built on neural nets. Genetic programming and
| symbolic regression is fun because the resulting model is just
| code, and software devs know how to read code.
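A tiny illustration of that "the model is just code" property: even a brute-force symbolic regression over a handful of candidate expressions returns a readable formula rather than a weight matrix. This is a toy sketch, not a real GP system:

```python
"""Toy symbolic regression: pick the candidate expression with the
lowest squared error. The fitted 'model' is readable code."""

CANDIDATES = [
    ("x + 1",     lambda x: x + 1),
    ("2 * x",     lambda x: 2 * x),
    ("x * x",     lambda x: x * x),
    ("3 * x - 2", lambda x: 3 * x - 2),
]


def fit_symbolic(xs, ys):
    """Return the (expression, function) pair minimizing squared error."""
    def sse(f):
        return sum((f(x) - y) ** 2 for x, y in zip(xs, ys))
    return min(CANDIDATES, key=lambda c: sse(c[1]))


xs = [0, 1, 2, 3, 4]
ys = [-2, 1, 4, 7, 10]  # generated by 3*x - 2
expr, model = fit_symbolic(xs, ys)
# expr is the human-readable string "3 * x - 2"
```

Real genetic programming evolves the candidate set instead of enumerating it, but the output has the same appeal: you can read the winner.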
| nchfgsj1 wrote:
| Genetic programming, however, isn't machine learning, but
| instead an AI algorithm. An extremely interesting one as
| well! It was fun to have my eyes opened after being taught
| genetic algorithms, to then be brought into genetic
| programming
| aiforecastthway wrote:
| Symbolic regression has the same failure mode; the reasons
| why the model failed can be explained in a more digestible
| way, but the actual truth of what happened is fundamentally
| similar -- some coefficient was off by some amount and/or
| some monomial beat out another in some optimization process.
|
| At least with symbolic regression you can treat the model as
| an analyzable entity from first principles theories. But
| that's not really particularly relevant to most failure modes
| in practice, which usually boil down to either missing some
| qualitative change such as a bifurcation or else just
| parameters being off by a bit. Or a little bit of A and a
| little bit of B.
| npalli wrote:
| Clean data and try to get people to understand why they need to
| have clean data.
| rurban wrote:
| Highly paid cleaning lady. With dirty data you get no proper
| results. BTW: perl is much better than python on this.
|
| Highly paid motherboard troubleshooter, because all those
| H100's really get hot, even with watercooling, and we have no
| dedicated HW guy.
|
| Fighting misbehaving third-party deps, as everyone else.
| shoggouth wrote:
| Could you talk more about "BTW: perl is much better than python
| on this."?
| eb0la wrote:
| I haven't touched Perl in more than 20 years...
| ... but I (routinely) miss something like:
|
|     $variable = something() if sanity_check();
|
| And:
|
|     do_something() unless $dont_do_that;
| jononor wrote:
| There exists a ternary if expression:
|
|     foo = something() if sanity_check() else None
|
| Can replace None with foo (or any other expression), if
| desired.
___________________________________________________________________
(page generated 2024-06-08 23:01 UTC)