[HN Gopher] Pen and Paper Exercises in Machine Learning (2022)
___________________________________________________________________
Pen and Paper Exercises in Machine Learning (2022)
Author : ibobev
Score : 137 points
Date : 2025-03-21 20:07 UTC (2 hours ago)
(HTM) web link (arxiv.org)
(TXT) w3m dump (arxiv.org)
| antipaul wrote:
| So who among current ML practitioners building "useful" ML could
| solve some of these?
|
| _Should they_ be able to?
| jerrygenser wrote:
| Nope, I don't think they should or need to be able to.
|
| These exercises are useful for mathematical maturity, which
| results in the intuition needed to develop novel algorithms or
| low-level optimizations.
|
| They're not needed to train and deploy existing ML algorithms
| in general.
| danielmarkbruce wrote:
| Depends on your definition of "ML practitioner", "building" and
| "ML". Look at the section on optimization - some people have an
| extremely good grasp of this and it helps them mentally iterate
| through possible loss functions and possible ways to update
| parameters and what can go wrong.
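| As a minimal illustration of that kind of reasoning (a toy
| sketch, not from the paper): deriving the gradient of a
| least-squares loss by hand tells you exactly what the parameter
| update does, and why, e.g., a too-large step size diverges.

```python
import numpy as np

# Least-squares loss L(w) = 0.5 * ||Xw - y||^2 has gradient X^T (Xw - y).
# Knowing this by hand lets you reason about step sizes: gradient descent
# converges only if the learning rate is below 2 / lambda_max(X^T X).
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
w_true = np.array([1.0, -2.0, 0.5])
y = X @ w_true

w = np.zeros(3)
lr = 0.005  # small enough for the spectrum of X^T X here
for _ in range(2000):
    w -= lr * (X.T @ (X @ w - y))

print(np.round(w, 3))  # recovers w_true on this noiseless toy problem
```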
| psyklic wrote:
| Good news -- if you're not interested in extending state-of-
| the-art and simply want to call APIs, you don't have to learn
| ML deeply.
| biotechbio wrote:
| I am curious about the same thing. I worked as a ML engineer
| for several years and have a couple of degrees in the field.
| Skimming over the document, I recognized almost everything but
| I would not be able to recall many of these topics if asked
| without context, although at one time I might have been able
| to.
|
| What are others' general level of recall for this stuff? Am I a
| charlatan who never was very good at math or is it just
| expected that you will forget these things in time if you're
| not using them regularly?
| kingkongjaffa wrote:
| Complete with solutions. Beautiful, thank you for sharing!
|
| I'd be interested in more of these pen and paper exercises, if
| there is such a term, for other topics.
| blackbear_ wrote:
| Not sure which other topics you mean, but "1000 exercises in
| probability" should keep you busy for a while (one can find the
| PDF online). For other math oriented riddles, check out "The
| colossal book of short puzzles and problems" and "The art and
| craft of problem solving"
| axpy906 wrote:
| Love it.
| simojo wrote:
| Very neat! Reminds me of Tom Yeh's "AI By Hand" exercises [0].
|
| [0] https://www.byhand.ai/
| Sysreq2 wrote:
| This is what I was expecting. Very much appreciated. OP's paper
| is good - but I sort of feel like it's preaching to the choir.
| It's a great resource if you already know the material.
| S4M wrote:
| Looks neat! My only criticism would be that the solutions are
| given right after the questions, so I couldn't help reading the
| answer to a question before thinking it through by myself.
| plants wrote:
| This is really neat! I work in machine learning but still feel
| imposter syndrome about my math foundations (specifically
| linear algebra and matrix/tensor operations). Does anyone have
| more good resources for problem sets with an emphasis on
| foundational deep learning skills? I find I learn best if I do
| a bit of hands-on work every day (and if I can learn things
| from multiple teachers' perspectives).
| lucasoshiro wrote:
| Seems cool, but one of the things that most annoys me about
| studying machine learning is that I can dive as deep as
| possible into the theory, but I can't see how it connects to
| practice, i.e. how it helps me choose the correct number of
| neurons in a layer, how many layers, the activation functions,
| whether I should use a neural network or another technique,
| and so on...
|
| If someone has something explaining that, I'll be grateful.
| joshdavham wrote:
| I think you're just more interested in the practical side of ML
| which is totally fine!
|
| I'm a bit skeptical of how much math and theory the average MLE
| actually needs. Obviously they do need some, but how much? I'm
| not sure.
|
| But on the other hand, the theoreticians often need much more
| math. Something like the SVM could only have been invented by a
| math genius like Vapnik.
| danielmarkbruce wrote:
| Most things can't be learned via pure theory or pure practice.
| Almost nothing related to work in the modern day can.
|
| In ML not everything can be derived from theory. If it could,
| we'd not have been so surprised by the performance of really
| really large language models. At the same time, if you can't
| reason about the math involved, you are going to have a
| difficult time figuring out why something isn't working or what
| options you have - could be around architecture or loss
| functions or choice of activation function or optimizer or
| hyperparameters or training time/resources or a dozen other
| things.
| hansvm wrote:
| > all of the above
|
| The no-free-lunch (NFL) theorems say something about it being a
| wash for arbitrary data.
| All results are going to be tuned to assumptions we have about
| our data in particular (not too many discontinuities,
| sufficiently well sampled, ...).
|
| > neurons in a layer, how many layers, ...
|
| Scaling laws are, currently, empirically derived. From those
| you can pick your goals (e.g., at most $X and maximize
| accuracy) and work backward to one or more optimal sets of
| parameters. Except in very restricted domains or with other
| strong assumptions I haven't seen anything giving you more than
| that.
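| For concreteness, here is what "work backward from a scaling
| law" can look like (a sketch with a Chinchilla-style functional
| form; the coefficients are made up for illustration, not fitted
| values).

```python
# Hypothetical scaling law: loss(N, D) = E + A/N^a + B/D^b for a model
# with N parameters trained on D tokens, under a compute budget of
# roughly C = 6*N*D FLOPs. All coefficients below are invented.
E, A, B, a, b = 1.7, 400.0, 400.0, 0.34, 0.28

def loss(N, D):
    return E + A / N**a + B / D**b

def best_N_for_budget(C):
    # Grid-search model size N; the data budget D is then fixed by
    # the compute budget, D = C / (6N).
    candidates = [10 ** (6 + i / 50) for i in range(300)]  # 1e6..~1e12
    return min(candidates, key=lambda N: loss(N, C / (6 * N)))

for C in (1e18, 1e20, 1e22):
    N = best_N_for_budget(C)
    print(f"budget {C:.0e}: ~{N:.2e} params, loss {loss(N, C / (6 * N)):.3f}")
```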
|
| > activation functions
|
| All of the above about how it can't matter for arbitrary data
| and how parameters need to be empirically derived applies here.
| However: An important inductive bias a lot of practitioners use
| is that every weight in the model should be roughly equally
| important. There are other ways you choose activation
| functions, especially in specialized domains, but when
| designing a deep network one of the most important things you
| can do is control the magnitude of information at each level of
| backpropagation. If your activation function (and surrounding
| infrastructure) approximately handles that problem then it's
| probably good enough.
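| That magnitude-control point is easy to check numerically. A toy
| sketch (standard He and 1/sqrt(n) initializations assumed):

```python
import numpy as np

# Track activation magnitude through a deep stack of random layers.
# With He initialization, ReLU keeps the standard deviation roughly
# constant; tanh with a plain 1/sqrt(n) init shrinks it layer by
# layer, starving the lower layers of signal during backprop.
rng = np.random.default_rng(0)
n, depth = 256, 30
x = rng.normal(size=(500, n))

def final_std(init_scale, act):
    h = x
    for _ in range(depth):
        W = rng.normal(size=(n, n)) * init_scale
        h = act(h @ W)
    return h.std()

relu_std = final_std(np.sqrt(2.0 / n), lambda z: np.maximum(z, 0.0))
tanh_std = final_std(np.sqrt(1.0 / n), np.tanh)
print(relu_std, tanh_std)  # ReLU stays O(1); tanh decays toward 0
```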
|
| > neural network or other techniques
|
| For almost every problem you're better off using something
| other than a neural network (like catboost). I don't have any
| good intuition for why that's the case. Test them both. That's
| what the validation dataset is for.
|
| > how it connects to the practice
|
| For this article in particular, it doesn't connect to a ton of
| what I personally do. I'm sure it resonates with someone. As
| soon as pytorch or jax or whatever isn't good enough though and
| you have to go implement stuff from scratch, you need a deep
| dive in the theory you're implementing. To a lesser degree, if
| you're interfacing with big frameworks nontrivially or working
| around their limitations, you still need a deep understanding
| of the things you're implementing.
|
| Imagine, e.g., that you want all the modern ML tools in a world
| where dynamic allocation, virtual functions, and all that
| garbage aren't tractable. You can resoundingly beat every human
| heuristic for phantom touchpad events in your mouse driver with
| a tiny neural network, but you can't use pytorch to do it
| without turning your laptop into a space heater.
|
| Embedded devices aren't the only scenario where you might have
| to venture off the beaten path. Much like the age-old argument
| of importing a data structure vs writing your own, as soon as
| you have requirements beyond what the library author provides
| it's often worth it to do the whole thing on your own, and it
| takes a firm theoretical foundation to do so swiftly and
| correctly.
|
| > how it connects to practice
|
| That's a criticism I have of a lot of educational materials.
| Connecting the dots is important in writing (competing with all
| the advantages of brevity).
|
| Pick on the Model-Based Learning section as an example. We're
| asked, to start, to MLE a Gaussian. (M)aximum (L)ikelihood
| (E)stimation is an extremely important concept, and a lot of ML
| practitioners throw it to the side.
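| That first exercise has a tidy closed form, which is easy to
| verify numerically (a sketch of the standard derivation, not
| the paper's solution):

```python
import numpy as np

# MLE for a Gaussian: maximizing sum_i log N(x_i | mu, sigma^2) over
# (mu, sigma^2) gives mu_hat = sample mean and sigma2_hat = the *biased*
# variance (1/n, not 1/(n-1)) -- a classic pen-and-paper derivation.
rng = np.random.default_rng(0)
x = rng.normal(loc=3.0, scale=2.0, size=10_000)

mu_hat = x.mean()
sigma2_hat = np.mean((x - mu_hat) ** 2)

def loglik(mu, s2):
    # Gaussian log-likelihood of the sample under parameters (mu, s2).
    return -0.5 * np.sum(np.log(2 * np.pi * s2) + (x - mu) ** 2 / s2)

print(mu_hat, sigma2_hat)  # close to the true values 3.0 and 4.0
```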
|
| Imagine, e.g., a 2-stage process where for each price bracket
| you have a model reporting the likelihood of conversion and
| then a second stage where you synthesize those predictions into
| an optimal strategy. Common failure modes include (a)
| mishandling variance, (b) assuming that MLE on each of the
| models allows you to combine the mean/mode/... results into an
| MLE composite action, (c) really an extension of [b], but if
| you have the wrong loss function for your model(s) then they
| aren't meaningfully combinable, ....
|
| Something that should be obvious (predict conversion rates,
| combine those rates to determine what you should do) has tons
| of pitfalls if you don't holistically reason about the
| composite process. That's perhaps a failure in the primitives
| we use to construct those composite processes, but in today's
| day and age it's still something you have to consider.
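| Failure modes (a) and (b) are easy to reproduce in a toy
| simulation (all numbers hypothetical):

```python
import numpy as np

# Toy version of the two-stage pitfall: each price bracket has a
# conversion-rate model fit on limited data. Plugging the per-bracket
# MLE point estimates into the revenue maximization ignores their
# variance, so the "best" bracket is often just the one whose noisy
# estimate got lucky.
rng = np.random.default_rng(0)
prices = np.array([10.0, 20.0, 30.0])
true_rates = np.array([0.30, 0.14, 0.09])  # true expected revenue: 3.0, 2.8, 2.7
n_obs = np.array([500, 20, 20])            # brackets 2 and 3 barely sampled

picks = []
for _ in range(2000):
    rate_hat = rng.binomial(n_obs, true_rates) / n_obs  # per-bracket MLEs
    picks.append(np.argmax(prices * rate_hat))          # plug-in "optimal" price
picks = np.array(picks)

# Bracket 0 truly maximizes expected revenue, yet the plug-in rule
# picks a noisy bracket a large fraction of the time.
print((picks != 0).mean())
```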
|
| How does the book connect? I dunno. It looks more like a "kata"
| (keep your fundamental skills sharp) than anything else. An
| explicit connection to some real-world problem might make it
| more tractable.
| incognito124 wrote:
| Maybe this will help:
|
| https://github.com/google-research/tuning_playbook
| moffkalast wrote:
| > how it makes me choose the correct number of neurons in a
| layer, how many layers, the activation function
|
| Seeing massive ablation studies on each one of those in just
| about every ML paper should be fairly indicative that nobody
| knows shit about fuck when it comes to that. Just people trying
| things out randomly and seeing what works, copying ideas from
| each other. It's the worst field if you want things to be
| logical and explainable. It's mostly labelling datasets, paying
| for compute and hoping for the best.
| dang wrote:
| Discussed at the time:
|
| _Pen and paper exercises in machine learning (2021)_ -
| https://news.ycombinator.com/item?id=31913057 - June 2022 (55
| comments)
| imranq wrote:
| If someone could turn these into an adaptive Khan Academy style
| app, that would be incredible
___________________________________________________________________
(page generated 2025-03-21 23:00 UTC)