[HN Gopher] Pen and Paper Exercises in Machine Learning (2022)
       ___________________________________________________________________
        
        
       Author : ibobev
       Score  : 137 points
       Date   : 2025-03-21 20:07 UTC (2 hours ago)
        
 (HTM) web link (arxiv.org)
 (TXT) w3m dump (arxiv.org)
        
       | antipaul wrote:
       | So who among current ML practitioners building "useful" ML could
       | solve some of these?
       | 
       | _Should they_ be able to?
        
         | jerrygenser wrote:
          | Nope, I don't think they should or need to be able to.
          | 
          | These exercises build the mathematical maturity that results
          | in the intuition needed to develop novel algorithms or low-
          | level optimizations.
          | 
          | They're not needed to train and deploy existing ML
          | algorithms in general.
        
         | danielmarkbruce wrote:
         | Depends on your definition of "ML practitioner", "building" and
         | "ML". Look at the section on optimization - some people have an
         | extremely good grasp of this and it helps them mentally iterate
         | through possible loss functions and possible ways to update
         | parameters and what can go wrong.
        
         | psyklic wrote:
          | Good news -- if you're not interested in extending the state
          | of the art and simply want to call APIs, you don't have to
          | learn ML deeply.
        
         | biotechbio wrote:
          | I am curious about the same thing. I worked as an ML
          | engineer for several years and have a couple of degrees in
          | the field. Skimming the document, I recognized almost
          | everything, but I would not be able to recall many of these
          | topics if asked without context, although at one time I
          | might have been able to.
         | 
         | What are others' general level of recall for this stuff? Am I a
         | charlatan who never was very good at math or is it just
         | expected that you will forget these things in time if you're
         | not using them regularly?
        
       | kingkongjaffa wrote:
        | Complete with solutions -- beautiful, thank you for sharing!
       | 
       | I'd be interested in more of these pen and paper exercises, if
       | there is such a term, for other topics.
        
         | blackbear_ wrote:
          | Not sure which other topics you mean, but "1000 exercises in
          | probability" should keep you busy for a while (one can find
          | the PDF online). For other math-oriented riddles, check out
          | "The Colossal Book of Short Puzzles and Problems" and "The
          | Art and Craft of Problem Solving".
        
       | axpy906 wrote:
       | Love it.
        
       | simojo wrote:
       | Very neat! Reminds me of Tom Yeh's "AI By Hand" exercises [0].
       | 
       | [0] https://www.byhand.ai/
        
         | Sysreq2 wrote:
         | This is what I was expecting. Very much appreciated. OP's paper
         | is good - but I sort of feel like it's singing to the choir.
         | It's a great resource if you already know the material.
        
       | S4M wrote:
        | Looks neat! My only criticism is that the solutions are given
        | right after the questions, so I couldn't help reading the
        | answer to a question before thinking it through myself.
        
       | plants wrote:
        | This is really neat! I work in machine learning but still feel
        | imposter syndrome about my math foundations (specifically
        | linear algebra and matrix/tensor operations). Does anyone have
        | more good resources for problem sets with an emphasis on deep
        | learning foundational skills? I find I learn best if I do a
        | bit of hands-on work every day (and if I can learn things from
        | multiple teachers' perspectives).
        
       | lucasoshiro wrote:
       | Seems to be cool, but, one of thing that most annoys me on
       | studying machine learning is that I may dive as deep as it is
       | possible in theory, but I can't see how it connects to the
       | practice, i. e. how it makes me choose the correct number of
       | neurons in a layer, how many layers, the activation functions, if
       | I should use a neural network or other techniques, and so on...
       | 
       | If someone have something explaining that I'll be grateful
        
         | joshdavham wrote:
         | I think you're just more interested in the practical side of ML
         | which is totally fine!
         | 
         | I'm a bit skeptical of how much math and theory the average MLE
         | actually needs. Obviously they do need some, but how much? I'm
         | not sure.
         | 
         | But on the other hand, the theoreticians often need much more
         | math. Something like the SVM could only have been invented by a
         | math genius like Vapnik.
        
         | danielmarkbruce wrote:
         | Most things can't be learned via pure theory or pure practice.
         | Almost nothing related to work in the modern day can.
         | 
          | In ML not everything can be derived from theory. If it
          | could, we'd not have been so surprised by the performance of
          | really, really large language models. At the same time, if
          | you can't reason about the math involved, you are going to
          | have a difficult time figuring out why something isn't
          | working or what options you have - whether around
          | architecture, loss functions, choice of activation function,
          | optimizer, hyperparameters, training time/resources, or a
          | dozen other things.
        
         | hansvm wrote:
         | > all of the above
         | 
          | The no-free-lunch (NFL) theorems say it's a wash for
          | arbitrary data. All results are going to be tuned to
          | assumptions we have about our data in particular (not too
          | many discontinuities, sufficiently well sampled, ...).
         | 
         | > neurons in a layer, how many layers, ...
         | 
         | Scaling laws are, currently, empirically derived. From those
         | you can pick your goals (e.g., at most $X and maximize
         | accuracy) and work backward to one or more optimal sets of
         | parameters. Except in very restricted domains or with other
         | strong assumptions I haven't seen anything giving you more than
         | that.
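As a concrete sketch of what "work backward from empirically derived scaling laws" can look like (all numbers here are synthetic, invented for illustration, and not from any real model family): fit a power law to (model size, loss) measurements in log-log space, then invert it to get the smallest model size predicted to hit a target loss.

```python
import numpy as np

# Hypothetical (parameter count, validation loss) measurements,
# generated here from a pure power law purely for illustration.
sizes = np.array([1e6, 3e6, 1e7, 3e7, 1e8])
losses = 50.0 * sizes ** -0.3

# A power law is a straight line in log-log space: log L = log a - b log N.
slope, intercept = np.polyfit(np.log(sizes), np.log(losses), 1)
b_fit, a_fit = -slope, np.exp(intercept)

# Work backward from a goal: the smallest model size predicted to
# reach a validation loss of at most 0.5.
target = 0.5
n_needed = (a_fit / target) ** (1.0 / b_fit)
```

Real scaling-law fits add an irreducible-loss offset and much messier data, but the "fit, then invert for your budget or accuracy goal" step is the same.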
         | 
         | > activation functions
         | 
          | All of the above about how it can't matter for arbitrary
          | data and how parameters need to be empirically derived
          | applies.
         | However: An important inductive bias a lot of practitioners use
         | is that every weight in the model should be roughly equally
         | important. There are other ways you choose activation
         | functions, especially in specialized domains, but when
         | designing a deep network one of the most important things you
         | can do is control the magnitude of information at each level of
         | backpropagation. If your activation function (and surrounding
         | infrastructure) approximately handles that problem then it's
         | probably good enough.
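The "control the magnitude of information" point can be seen in a toy experiment (a hypothetical deep ReLU stack, not anything from the article): a scale that ignores the ReLU shrinks the signal toward zero with depth, while He-style initialization, which compensates for the ReLU halving the variance, keeps it at a usable magnitude.

```python
import numpy as np

rng = np.random.default_rng(0)
width, depth = 256, 20
x = rng.normal(size=(500, width))

def forward_std(scale):
    """Push data through a deep ReLU stack; report the output spread."""
    h = x
    for _ in range(depth):
        w = rng.normal(size=(width, width)) * scale
        h = np.maximum(h @ w, 0.0)  # ReLU
    return h.std()

naive_std = forward_std(1.0 / np.sqrt(width))  # ignores the ReLU halving
he_std = forward_std(np.sqrt(2.0 / width))     # He init, matched to ReLU
```

The same bookkeeping applies to gradients in the backward pass, which is why a mismatched activation/initialization pair can stall training before any other choice matters.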
         | 
         | > neural network or other techniques
         | 
         | For almost every problem you're better off using something
         | other than a neural network (like catboost). I don't have any
         | good intuition for why that's the case. Test them both. That's
         | what the validation dataset is for.
         | 
         | > how it connects to the practice
         | 
         | For this article in particular, it doesn't connect to a ton of
         | what I personally do. I'm sure it resonates with someone. As
         | soon as pytorch or jax or whatever isn't good enough though and
         | you have to go implement stuff from scratch, you need a deep
         | dive in the theory you're implementing. To a lesser degree, if
         | you're interfacing with big frameworks nontrivially or working
         | around their limitations, you still need a deep understanding
         | of the things you're implementing.
         | 
          | Imagine, e.g., that you want all the modern ML tools in a
          | world where dynamic allocation, virtual functions, and all
          | that garbage aren't tractable. You can resoundingly beat
          | every human heuristic for phantom touchpad events in your
          | mouse driver with a tiny neural network, but you can't use
          | pytorch to do it without turning your laptop into a space
          | heater.
          | 
         | Embedded devices aren't the only scenario where you might have
         | to venture off the beaten path. Much like the age-old argument
         | of importing a data structure vs writing your own, as soon as
         | you have requirements beyond what the library author provides
         | it's often worth it to do the whole thing on your own, and it
         | takes a firm theoretical foundation to do so swiftly and
         | correctly.
         | 
         | > how it connects to practice
         | 
         | That's a criticism I have of a lot of educational materials.
         | Connecting the dots is important in writing (competing with all
         | the advantages of brevity).
         | 
          | Pick on the Model-Based Learning section as an example.
          | We're asked, to start, to fit a Gaussian by MLE. (M)aximum
          | (L)ikelihood (E)stimation is an extremely important concept,
          | and a lot of ML practitioners throw it to the side.
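For a sense of what that opening exercise involves (synthetic data, and not the book's notation): the Gaussian MLE has a closed form, the sample mean and the 1/n variance, and one can check numerically that it does maximize the log-likelihood.

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(loc=3.0, scale=2.0, size=10_000)

# Closed-form Gaussian MLE: sample mean, and the 1/n (not 1/(n-1))
# variance estimator.
mu_hat = data.mean()
var_hat = ((data - mu_hat) ** 2).mean()

def log_likelihood(mu, var):
    """Log-likelihood of the data under N(mu, var)."""
    return -0.5 * np.sum(np.log(2 * np.pi * var)
                         + (data - mu) ** 2 / var)

# The MLE should score at least as well as any nearby parameters.
best = log_likelihood(mu_hat, var_hat)
```

The pen-and-paper version derives the same result by setting the derivatives of the log-likelihood with respect to mu and sigma^2 to zero.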
         | 
         | Imagine, e.g., a 2-stage process where for each price bracket
         | you have a model reporting the likelihood of conversion and
         | then a second stage where you synthesize those predictions into
         | an optimal strategy. Common failure modes include (a)
         | mishandling variance, (b) assuming that MLE on each of the
         | models allows you to combine the mean/mode/... results into an
         | MLE composite action, (c) really an extension of [b], but if
         | you have the wrong loss function for your model(s) then they
         | aren't meaningfully combinable, ....
         | 
         | Something that should be obvious (predict conversion rates,
         | combine those rates to determine what you should do) has tons
         | of pitfalls if you don't holistically reason about the
         | composite process. That's perhaps a failure in the primitives
         | we use to construct those composite processes, but in today's
         | day and age it's still something you have to consider.
         | 
         | How does the book connect? I dunno. It looks more like a "kata"
         | (keep your fundamental skills sharp) than anything else. An
         | explicit connection to some real-world problem might make it
         | more tractable.
        
         | incognito124 wrote:
         | Maybe this will help:
         | 
         | https://github.com/google-research/tuning_playbook
        
         | moffkalast wrote:
         | > how it makes me choose the correct number of neurons in a
         | layer, how many layers, the activation function
         | 
         | Seeing massive ablation studies on each one of those in just
         | about every ML paper should be fairly indicative that nobody
         | knows shit about fuck when it comes to that. Just people trying
         | things out randomly and seeing what works, copying ideas from
         | each other. It's the worst field if you want things to be
         | logical and explainable. It's mostly labelling datasets, paying
         | for compute and hoping for the best.
        
       | dang wrote:
       | Discussed at the time:
       | 
       |  _Pen and paper exercises in machine learning (2021)_ -
       | https://news.ycombinator.com/item?id=31913057 - June 2022 (55
       | comments)
        
       | imranq wrote:
       | If someone could turn these into an adaptive Khan Academy style
       | app, that would be incredible
        
       ___________________________________________________________________
       (page generated 2025-03-21 23:00 UTC)