[HN Gopher] Neural Networks: Zero to Hero
___________________________________________________________________
Neural Networks: Zero to Hero
Author : whereistimbo
Score : 243 points
Date : 2023-04-05 19:12 UTC (3 hours ago)
(HTM) web link (karpathy.ai)
(TXT) w3m dump (karpathy.ai)
| whiplash451 wrote:
| Andrej's course is brilliant and so nice to follow.
|
| His explanation of attention is the most accessible I have ever
| seen.
| spaceman_2020 wrote:
| A little off-topic, but is this something that someone with
| only webdev experience can get started with?
| nomel wrote:
| Getting started with it takes exactly zero experience. Being
| productive in it does, but that's unrelated to the starting
| point and shouldn't discourage you if you really want to do
| it.
|
| There are several open courses online.
| jfisher4024 wrote:
| It might make sense to get a handle on Python first.
| thealchemistdev wrote:
| [dead]
| mhh__ wrote:
| The tools are basically irrelevant conceptually. It's all
| about the mathematics.
| whiplash451 wrote:
| You need some fluency in Python and a basic knowledge of
| linear algebra (matrix multiplication, etc.).
|
| If you want, you can also start with the first lessons of
| course.fast.ai
| yacine_ wrote:
| What I appreciate about karpathy's videos is that they don't
| make things any more complicated than they need to be. Simple
| engineering language is used. No gatekeeping! It's reassuring,
| and lets everyone know that anyone can do it.
|
| Thanks karpathy!
| bilsbie wrote:
| I just don't know what he means by logits. Everything else
| seems like straightforward language.
| airstrike wrote:
| Having not watched the series, I can only assume he means
| logit as in the log-odds function, which maps a probability
| in (0, 1) onto the real line
|
| https://deepai.org/machine-learning-glossary-and-terms/logit....
| joshvm wrote:
| When people mention logits, they're usually referring to the
| raw output of the model before it gets transformed/normalised
| into a probability distribution (i.e. values in [0, 1] that
| sum to 1). Logits can take any real value. The naming isn't
| mathematically strict (the logit is properly the log-odds
| function, the inverse of the sigmoid); it assumes you're going
| to apply softmax, which treats the model's output as logits.
| But that's how the term is used.
|
| For example, in many classification problems you get a 1D
| vector of logits from the final layer, apply softmax to
| normalise it, then argmax to extract the predicted class. The
| same idea extends to tasks like semantic segmentation
| (predicting pixel classes), where the "logit" output has the
| spatial size of the image with a channel per class, and the
| same process yields a single-channel image with a class per
| pixel.
|
| Here's a nice explanation:
| https://stackoverflow.com/a/66804099/395457
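|
| A minimal sketch of that pipeline in PyTorch (the logit
| values below are made up for illustration):
|
|     import torch
|
|     # Raw, unnormalised scores ("logits") for 3 classes, as
|     # they might come out of a model's final linear layer.
|     logits = torch.tensor([2.0, -1.0, 0.5])
|
|     # softmax -> a probability distribution: every value in
|     # [0, 1], and they sum to 1.
|     probs = torch.softmax(logits, dim=0)
|     print(probs)  # approx. tensor([0.786, 0.039, 0.175])
|
|     # argmax -> index of the predicted class.
|     print(torch.argmax(probs))  # tensor(0)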
| [deleted]
| sourcecodeplz wrote:
| This is really cool, and I am so glad my math teacher was
| tough on us and I still remember some calculus.
|
| edit: Python really was/is made for this
| numbers/calculation/visualization thing. I'm kinda kicking
| myself now for not investing more in it and sticking with
| PHP. PHP has its merits when building other things, but
| Python is a beast with numbers.
| sourcecodeplz wrote:
| I am at graphviz now and it is getting better and better.
|
| I'm also using ChatGPT to ask questions when I don't get
| something.
|
| Wow, what a time we live in for learning things.
| 7373737373 wrote:
| This was the first time I actually _grokked_ backpropagation.
| Just the first video alone is more lucid and valuable than
| any other resource about machine learning I had seen before.
| In fact, it's so well explained that I managed to implement
| the library almost completely from memory after watching it.
| I cannot recommend it highly enough, especially for
| programmers without a math background!
|
| The only aspect I could see being non-ideal for some is that
| it uses some Python-specific cleverness/advanced syntax and
| semantics (__call__(), list comprehensions with two for's,
| **kwargs, __add__, __repr__, subclasses, (nested) functions
| as variables, etc.), but if you are familiar with these it
| might seem all the more compact and elegant.
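|
| To give a flavour of that style, here is a toy autograd
| value (my own sketch, not Karpathy's actual micrograd code)
| using dunder methods and nested functions stored as
| variables:
|
|     class Value:
|         """A scalar that remembers how it was computed."""
|
|         def __init__(self, data, _children=()):
|             self.data = data
|             self.grad = 0.0
|             self._prev = set(_children)
|             self._backward = lambda: None  # function as a variable
|
|         def __add__(self, other):
|             out = Value(self.data + other.data, (self, other))
|             def _backward():  # nested function (closure)
|                 self.grad += out.grad   # d(out)/d(self) = 1
|                 other.grad += out.grad  # d(out)/d(other) = 1
|             out._backward = _backward
|             return out
|
|         def __mul__(self, other):
|             out = Value(self.data * other.data, (self, other))
|             def _backward():
|                 self.grad += other.data * out.grad
|                 other.grad += self.data * out.grad
|             out._backward = _backward
|             return out
|
|         def __repr__(self):
|             return f"Value(data={self.data}, grad={self.grad})"
|
|     a, b = Value(2.0), Value(3.0)
|     c = a * b
|     c.grad = 1.0
|     c._backward()  # one backprop step: a.grad -> 3.0, b.grad -> 2.0
|     print(a, b)    # __repr__ in action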
| whiplash451 wrote:
| To be fair, Andrew Ng's older online course was also
| fantastic at explaining backprop.
|
| But that takes nothing away from Andrej's class.
| agentofoblivion wrote:
| Wonderful! Just went through the GPT video the other day and it
| was great. Andrej has a talent for pedagogy via simplification.
| yuuuuyu wrote:
| And he presents all this stuff with humility. Many people who
| present are just showing off and are pretty much full of
| themselves. I suppose they need the ego boost, who knows. But
| Andrej could be the nice guy next door in the dorm who is
| studying the same course as you, just a lecture or two ahead.
| (Until you figure out he is the former VP of AI at Tesla, or
| whatever his title ended up being before he left.)
|
| I can even recommend his interview with Lex Fridman.
| Yajirobe wrote:
| He is also quite good at teaching and solving the Rubik's
| cube.
| meling wrote:
| Absolutely agree with this.
| 0cf8612b2e1e wrote:
| Only finished the first video, but he even made two minor
| blunders in his code and kept the footage. It really helps
| your confidence to see a pro make a mistake rather than a
| perfectly polished but unattainable ideal.
| frankcort wrote:
| Which GPT video?
| nomel wrote:
| It can be found on the site's home page.
|
| Let's build GPT: from scratch, in code, spelled out:
| https://www.youtube.com/watch?v=kCc8FmEb1nY
| frankcort wrote:
| Thank you!
| abraxas wrote:
| He is a master educator. While at Stanford he developed their
| undergrad machine learning intro course, CS231n, which
| immediately became legendary. It's somewhat out of date on
| some details, but it's still well worth watching, especially
| as delivered by Andrej. You can find all 11 lectures on
| YouTube.
| auggierose wrote:
| This course, together with the new fastai ones [1], seems to
| be exactly what I was looking for. The micrograd video is
| excellent.
|
| [1] https://course.fast.ai
| zakki wrote:
| Has anybody tried the lessons on an Apple M1/M2 MacBook
| Air/Pro? Do they run easily there?
| sirodoht wrote:
| I'm doing an ML apprenticeship [1] these weeks, and
| Karpathy's videos are part of it. We've gone deep into them,
| and I found them excellent. Every concept he illustrates is
| crystal clear in his mind (even though the concepts
| themselves are complicated), and that shows in his
| explanations.
|
| Also, the way he builds everything up is magnificent:
| starting from basic Python classes, to derivatives and
| gradient descent, to micrograd [2], and then from a bigram
| counting model [3] to makemore [4] and nanoGPT [5]. (A toy
| bigram counter is sketched below, after the links.)
|
| [1]: https://www.foundersandcoders.com/ml
|
| [2]: https://github.com/karpathy/micrograd
|
| [3]: https://github.com/karpathy/randomfun/blob/master/lectures/m...
|
| [4]: https://github.com/karpathy/makemore
|
| [5]: https://github.com/karpathy/nanoGPT
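|
| A toy character-bigram counting model, to give a sense of
| the starting point (my own sketch with made-up data, not the
| course's exact code):
|
|     import random
|     from collections import defaultdict
|
|     names = ["emma", "olivia", "ava"]  # made-up toy data
|
|     # Count how often each character follows another, with
|     # <S>/<E> marking the start and end of a name.
|     counts = defaultdict(lambda: defaultdict(int))
|     for name in names:
|         chars = ["<S>"] + list(name) + ["<E>"]
|         for a, b in zip(chars, chars[1:]):
|             counts[a][b] += 1
|
|     def sample_next(ch):
|         # Sample the next character in proportion to its count.
|         nxt = counts[ch]
|         return random.choices(list(nxt), weights=list(nxt.values()))[0]
|
|     # Generate one name, character by character.
|     out, ch = [], "<S>"
|     while (ch := sample_next(ch)) != "<E>":
|         out.append(ch)
|     print("".join(out))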
| bilsbie wrote:
| Do you run the code as you watch?
|
| I've been simply watching them on a palm from a hammock and I'm
| worried I'm not getting the full experience.
| sirodoht wrote:
| I've found that actually running the code helps a lot with
| understanding, along with reasoning through each line of
| code, spending a lot of time with the video paused, and
| discussing and explaining to each other what we understood.
| whiplash451 wrote:
| Same. I also found the exercises to be useful.
| jimsparkman wrote:
| That program sounds quite impressive. I wonder if any
| equivalents exist in the US?
| sirodoht wrote:
| The website doesn't say what, for me, is the best thing about
| it. The course is peer-led, which works like this: once you
| join, you're part of a team whose single objective is to get
| the best score with your ML recommendation system.
|
| There is a simulated environment in which all teams in the
| cohort receive millions of requests per day (and hundreds of
| thousands of users and items), and you have to build out your
| infrastructure on an EC2 instance, build a basic model, and
| then iteratively improve on it. Imagine a simulated
| facebook/youtube/tiktok-style system where you aim for the
| best uptime and the best recommendations!
___________________________________________________________________
(page generated 2023-04-05 23:00 UTC)