[HN Gopher] Show HN: Collection of deep learning implementations...
___________________________________________________________________
Show HN: Collection of deep learning implementations with side-by-
side notes
Author : vpj
Score : 211 points
Date : 2021-01-30 09:27 UTC (13 hours ago)
(HTM) web link (nn.labml.ai)
(TXT) w3m dump (nn.labml.ai)
| sillysaurusx wrote:
| I thought of a change to gradient accumulation, which I call Adam
| accumulation:
|
| https://twitter.com/theshawwn/status/1355343951033602057
|
| https://news.ycombinator.com/item?id=25964420
|
| Unfortunately, no one seems to understand it, which isn't a great
| sign. I'm either not explaining it very well, or the idea doesn't
| make sense.
|
| In short:
|           for example in batch:
|               accum += adam(gradients(example))
|           param += accum
|           accum = 0
|
| That way, Adam's statistics are updated for every training example.
|
| Traditional gradient accumulation looks like this:
|           for example in batch:
|               accum += gradients(example)
|           param += adam(accum)
|           accum = 0
|
| ... which only updates Adam once.
|
| (It's equivalent to a bigger batch size.)
|
| Probably best to just implement Adam accumulation and see if it
| works, I suppose.
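|
| Roughly, as a minimal sketch in PyTorch: the Adam step below is
| hand-rolled so the per-example statistics update is visible, and
| param, examples and grad_fn are just stand-ins, not anyone's
| actual code.
|
|           import torch
|
|           def adam_step(grad, state, lr=1e-3, b1=0.9, b2=0.999,
|                         eps=1e-8):
|               # Update Adam's moment estimates, return the delta.
|               state['t'] += 1
|               state['m'] = b1 * state['m'] + (1 - b1) * grad
|               state['v'] = b2 * state['v'] + (1 - b2) * grad * grad
|               m_hat = state['m'] / (1 - b1 ** state['t'])
|               v_hat = state['v'] / (1 - b2 ** state['t'])
|               return -lr * m_hat / (v_hat.sqrt() + eps)
|
|           def init_state(param):
|               return {'t': 0,
|                       'm': torch.zeros_like(param),
|                       'v': torch.zeros_like(param)}
|
|           # Traditional gradient accumulation: Adam's statistics
|           # are updated once per batch, on the summed gradient.
|           def grad_accumulation(param, examples, grad_fn, state):
|               accum = torch.zeros_like(param)
|               for x in examples:
|                   accum += grad_fn(param, x)
|               param += adam_step(accum, state)
|
|           # "Adam accumulation": the statistics are updated for
|           # every example; the summed deltas are applied once.
|           def adam_accumulation(param, examples, grad_fn, state):
|               accum = torch.zeros_like(param)
|               for x in examples:
|                   accum += adam_step(grad_fn(param, x), state)
|               param += accum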
|
| (Sorry for rambling about this here. I was just hoping to find
| some prior work along these lines, if anyone knew of something.)
| koningrobot wrote:
| We don't compute per-example gradients, so in your second code
| snippet there would not be a loop across examples. We compute
| the batch-averaged gradient in the same time it would take to
| compute a single example's gradient, so it's much more
| efficient than your proposal, which is equivalent to using a
| batch size of 1.
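|
| Concretely (a minimal illustration, not anyone's actual code): a
| single forward/backward pass over the whole batch already yields
| the batch-mean gradient, so there is no per-example loop to hook
| Adam into.
|
|           import torch
|
|           model = torch.nn.Linear(10, 1)
|           x = torch.randn(32, 10)  # a batch of 32 examples
|           y = torch.randn(32, 1)
|
|           # mse_loss averages over the batch, so after backward()
|           # model.weight.grad is the batch-averaged gradient.
|           loss = torch.nn.functional.mse_loss(model(x), y)
|           loss.backward()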
| mpfundstein wrote:
| Maybe you should show, with some well-designed experiments, that
| your idea improves upon current practice?
| axegon_ wrote:
| Something like this could be incredibly helpful with arXiv
| articles: being able to pinpoint where a fragment of text or a
| formula lives in the actual implementation. This could save so
| much time ping-ponging between the article and the code.
| misiti3780 wrote:
| Anyone know if this exists for Autoencoders?
| ktpsns wrote:
| This style of documentation is called _literate programming_,
| and if you've never heard of the term before, it's worth googling
| it and the implementations that exist for the major programming
| languages. It's an eye-opener how clear, transparent and
| well-intertwined good code and comments can be.
|
| I once used such a literate programming style with scientific
| Python in university classes, and it was a breeze to prepare
| and hand in exercise sheets (rendered with LaTeX to PDF). My
| feeling is that today people use Jupyter/IPython notebooks to
| achieve something similar (especially with embedded results),
| but a Jupyter notebook is much more complex than a traditional,
| clean, terminal-readable literate programming source file.
| dynamite-ready wrote:
| Unfashionable, though. I strongly believe in working this way,
| but in a webdev environment such work rarely gets through code
| review.
|
| The usual argument is that code should document itself, or
| something to that effect.
|
| Data scientists uphold the standard admirably, though, as you
| say.
| [deleted]
| vpj wrote:
| We used https://coffeescript.org/#literate for some products at
| https://forestpin.com and it was so much easier to maintain.
|
| One of the problems with notebooks for literate programming is
| that they kind of break down when you define a class or a long
| function: the entire definition has to fit within a single block.
| cinntaile wrote:
| You can define a class and then use it in a different block
| just fine, so it's not entirely clear to me what you're
| referring to here?
| vpj wrote:
| All of the class definition has to be in a single cell. You
| can't have notes within it, only normal comments.
| zelphirkalt wrote:
| Or you could use org-mode with org-babel in Emacs to get a
| great document that lets you use many different programming
| languages. The CoffeeScript URL already has an "org" in it.
| Coincidence? I think not! ;)
| codethief wrote:
| I don't think this qualifies as literate programming, at least
| not in my book. Yes, it looks nice and everything, but from my
| point of view the OP just moved regular code comments to a
| sidebar.
|
| For instance, there are many comments that explain what values
| are stored in opaque-looking variables/parameters. But these
| are simply necessary (especially in data science, where almost
| everything is a tensor) and should be part of every decently
| documented program anyway; they don't become literate
| programming just because they're now in a sidebar.
|
| Besides, there are a lot of comments that 1) just repeat
| verbatim what the code does, e.g.
|           Run through each layer
|           for layer in self.layers:
| (https://nn.labml.ai/gan/cycle_gan.html)
|
| or that 2) explain what the code does because the code, at
| times, is unnecessarily hard to read and/or poorly modularized
| (no offense, OP).
|
| These types of comments, I'd say, belong to the sort of
| comments and documentation that can (and usually should) be
| avoided. They don't tell me more about the code than what the
| code already tells me (or should/could tell me) and thus are
| not what I would expect from literate programming, either.
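|
| (To illustrate the distinction with a made-up snippet, not code
| from the OP's repo: the first comment below merely restates the
| code, while the second records a reason the code alone cannot
| convey.)
|
|           import torch
|
|           class Stack(torch.nn.Module):
|               def __init__(self):
|                   super().__init__()
|                   self.layers = torch.nn.ModuleList(
|                       [torch.nn.Linear(4, 4) for _ in range(3)])
|
|               def forward(self, x):
|                   # Run through each layer   <- restates the code
|                   for layer in self.layers:
|                       x = layer(x)
|                   # Clamp because the (hypothetical) downstream
|                   # loss assumes bounded activations  <- a reason
|                   return x.clamp(-10.0, 10.0)
|
|           out = Stack()(torch.randn(2, 4))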
|
| There are a whole lot of other types of comments that I _would_
| expect from literate programming, though, and these are mostly
| missing. There was an excellent article[0] a few years back by
| Salvatore Sanfilippo aka antirez (of Redis fame) where he
| identified the following useful comment types:
|           - Function comments
|           - Design comments
|           - Why comments
|           - Teacher comments
|           - Checklist comments
|           - Guide comments
|
| Now, the OP's code checks off one or two items on that list but
| only in parts and in a few places. Overall, looking at the many
| code snippets antirez's article presents from the Redis code
| base, I find Redis's style is much closer to my idea of
| literate programming than the OP's code.
|
| (Again, I hope OP is not taking offense to my comment. I am
| aware that they didn't claim they were doing literate
| programming.)
|
| [0]:
| https://web.archive.org/web/20191224044125/http://antirez.co...
|
| EDIT: When I wrote my comment, I hadn't looked at _all_ the
| pages/files and simply assumed the others I hadn't looked at
| would be similar. I've now noticed, though, that some of them
| do follow the literate programming style quite closely. Nice.
| :)
| timohear wrote:
| In their Transformer section they have implementations of:
|           - kNN-LM: Generalization through Memorization
|           - Feedback Transformer
|           - Switch Transformer
|
| Which are all from recent, highly interesting papers.
| sooheon wrote:
| The parent project, LabML, looks interesting. Anyone have any
| experience with how this stacks up against Weights and Biases?
___________________________________________________________________
(page generated 2021-01-30 23:01 UTC)