[HN Gopher] Show HN: Collection of deep learning implementations...
       ___________________________________________________________________
        
       Show HN: Collection of deep learning implementations with side-by-
       side notes
        
       Author : vpj
       Score  : 211 points
       Date   : 2021-01-30 09:27 UTC (13 hours ago)
        
 (HTM) web link (nn.labml.ai)
 (TXT) w3m dump (nn.labml.ai)
        
       | sillysaurusx wrote:
       | I thought of a change to gradient accumulation, which I call Adam
       | accumulation:
       | 
       | https://twitter.com/theshawwn/status/1355343951033602057
       | 
       | https://news.ycombinator.com/item?id=25964420
       | 
       | Unfortunately, no one seems to understand it, which isn't a great
       | sign. I'm either not explaining it very well, or the idea doesn't
       | make sense.
       | 
        | In short:
        | 
        |     for example in batch:
        |         accum += adam(gradients(example))
        |     param += accum
        |     accum = 0
       | 
        | That way, Adam statistics are updated for every training example.
       | 
        | Traditional gradient accumulation looks like this:
        | 
        |     for example in batch:
        |         accum += gradients(example)
        |     param += adam(accum)
        |     accum = 0
       | 
       | ... which only updates Adam once.
       | 
       | (It's equivalent to a bigger batch size.)
       | 
       | Probably best to just implement Adam accumulation and see if it
       | works, I suppose.
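        | 
        | A rough, hypothetical PyTorch sketch of the idea (the
        | record-the-step-then-undo trick is just one way to pull out
        | adam(gradients(example)) per example; names and shapes are
        | made up):
        | 
        |     import torch
        |     
        |     model = torch.nn.Linear(10, 1)
        |     opt = torch.optim.Adam(model.parameters(), lr=1e-3)
        |     params = list(model.parameters())
        |     
        |     def adam_accum_step(batch_x, batch_y):
        |         # accum += adam(gradients(example)) for every example,
        |         # then param += accum once per batch.
        |         accum = [torch.zeros_like(p) for p in params]
        |         for x, y in zip(batch_x, batch_y):
        |             opt.zero_grad()
        |             loss = torch.nn.functional.mse_loss(model(x), y)
        |             loss.backward()
        |             before = [p.detach().clone() for p in params]
        |             opt.step()  # Adam stats see this example
        |             with torch.no_grad():
        |                 for a, p, b in zip(accum, params, before):
        |                     a += p - b  # this example's Adam step
        |                     p.copy_(b)  # undo until the batch is done
        |         with torch.no_grad():   # param += accum
        |             for p, a in zip(params, accum):
        |                 p += a
        |     
        |     adam_accum_step(torch.randn(8, 10), torch.randn(8, 1))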
       | 
       | (Sorry for rambling about this here. I was just hoping to find
       | some prior work along these lines, if anyone knew of something.)
        
         | koningrobot wrote:
         | We don't compute per-example gradients, so in your second code
         | snippet there would not be a loop across examples. We compute
         | the batch-averaged gradient in the same time it would take to
         | compute a single example's gradient, so it's much more
         | efficient than your proposal, which is equivalent to using a
         | batch size of 1.
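          | 
          | I.e. roughly this (hypothetical PyTorch sketch; the mean
          | reduction in the loss is what gives the batch-averaged
          | gradient):
          | 
          |     import torch
          |     
          |     model = torch.nn.Linear(10, 1)
          |     opt = torch.optim.Adam(model.parameters(), lr=1e-3)
          |     x, y = torch.randn(64, 10), torch.randn(64, 1)
          |     
          |     # One forward/backward over the whole batch yields the
          |     # batch-averaged gradient -- no per-example loop.
          |     opt.zero_grad()
          |     loss = torch.nn.functional.mse_loss(model(x), y)
          |     loss.backward()
          |     opt.step()  # Adam sees one averaged gradient per batch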
        
         | mpfundstein wrote:
          | Maybe you should show, with some well-designed experiments,
          | that your idea improves upon current practice?
        
       | axegon_ wrote:
        | Something like this could be incredibly helpful with arXiv
        | articles: being able to link a fragment of text or a formula to
        | its actual implementation. This could save so much time and
        | ping-ponging between the article and the code.
        
       | misiti3780 wrote:
       | Anyone know if this exists for Autoencoders?
        
       | ktpsns wrote:
        | This style of documentation is called _literate programming_,
        | and if you've never heard of the term before, it's worth
        | googling it and the implementations available for various
        | widespread programming languages. It's an eye-opener how clear,
        | transparent and well-intertwined good code and comments can be.
        | 
        | I used a literate programming style with scientific Python once
        | in university classes, and it was a breeze to prepare and hand
        | in exercise sheets (rendered with LaTeX to PDF). My feeling is
        | that today people use Jupyter/IPython notebooks to achieve
        | something similar (especially with embedded results), but a
        | Jupyter notebook is much more complex than a traditional, clean,
        | terminal-readable literate programming source file.
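        | 
        | For a flavour of it, a literate-style Python source might look
        | roughly like this (hypothetical example; docco-style tools
        | render the comment blocks as prose alongside the code):
        | 
        |     # == Softmax ==
        |     # Subtract the per-row maximum before exponentiating so the
        |     # exponentials cannot overflow; the result is unchanged
        |     # because softmax is shift-invariant.
        |     import numpy as np
        |     
        |     def softmax(x):
        |         # Work along the last axis so a whole batch of logits
        |         # can be passed at once.
        |         shifted = x - x.max(axis=-1, keepdims=True)
        |         exps = np.exp(shifted)
        |         return exps / exps.sum(axis=-1, keepdims=True)
        |     
        |     print(softmax(np.array([[1.0, 2.0, 3.0]])))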
        
         | dynamite-ready wrote:
         | Unfashionable though. I strongly believe in working this way,
         | but in a webdev environment, such work rarely gets through code
         | review.
         | 
          | The usual argument is that code should document itself, or
          | something to that effect.
         | 
         | Data scientists uphold the standard admirably though, as you
         | say.
        
         | [deleted]
        
         | vpj wrote:
         | We used https://coffeescript.org/#literate for some products at
         | https://forestpin.com and it was so much easier to maintain.
         | 
          | One of the problems with notebooks for literate programming
          | is that they kind of break down when you define a class or a
          | long function: the entire definition has to live in a single
          | cell.
        
           | cinntaile wrote:
           | You can define a class and then use it in a different block
           | just fine, so it's not entirely clear to me what you're
           | referring to here?
        
             | vpj wrote:
              | All of the class definition has to be in a single cell.
              | You can't have notes within it, only normal comments.
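              | 
              | E.g. (hypothetical illustration) the best you
              | can do inside the cell is plain # comments:
              | 
              |     import torch
              |     
              |     class Encoder(torch.nn.Module):
              |         def __init__(self, dim=64):
              |             super().__init__()
              |             # would like a rendered note here,
              |             # but only a # comment is possible
              |             self.proj = torch.nn.Linear(dim, dim)
              |     
              |         def forward(self, x):
              |             return torch.relu(self.proj(x))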
        
           | zelphirkalt wrote:
            | Or you could use org-mode with org-babel in Emacs to get a
            | great document that lets you mix many different programming
            | languages. The CoffeeScript URL already has an "org" in it.
            | Coincidence? I think not! ;)
        
         | codethief wrote:
          | I don't think this qualifies as literate programming, at least
          | not in my book. Yes, it looks nice and everything, but from my
          | point of view OP just moved regular code comments to a
          | sidebar.
         | 
          | For instance, there are many comments that explain what values
          | are stored in opaque-looking variables/parameters. But these
          | are simply necessary (especially in data science, where almost
          | everything is a tensor), should be part of every decently
          | documented program anyway, and don't become literate
          | programming just because they're now in a sidebar.
         | 
          | Besides, there are a lot of comments that 1) just repeat
          | verbatim what the code does, e.g.
          | 
          |     Run through each layer
          |     for layer in self.layers:
          | 
          | (https://nn.labml.ai/gan/cycle_gan.html)
         | 
          | or that 2) explain what the code does because the code, at
          | times, is unnecessarily hard to read and/or poorly modularized
          | (no offense, OP).
         | 
         | These types of comments, I'd say, belong to the sort of
         | comments and documentation that can (and usually should) be
         | avoided. They don't tell me more about the code than what the
         | code already tells me (or should/could tell me) and thus are
         | not what I would expect from literate programming, either.
         | 
         | There are a whole lot of other types of comments that I _would_
         | expect from literate programming, though, and these are mostly
         | missing. There was an excellent article[0] a few years back by
         | Salvatore Sanfilippo aka antirez (of Redis fame) where he
         | identified the following useful comment types:
          | 
          |     - Function comments
          |     - Design comments
          |     - Why comments
          |     - Teacher comments
          |     - Checklist comments
          |     - Guide comments
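          | 
          | (To make the difference concrete, a hypothetical snippet: the
          | first comment below merely restates the code, the second is
          | the kind of "why"/design comment antirez means.)
          | 
          |     import torch
          |     
          |     class Block(torch.nn.Module):
          |         def __init__(self, dim=64, p=0.1):
          |             super().__init__()
          |             self.ff = torch.nn.Linear(dim, dim)
          |             self.dropout = torch.nn.Dropout(p)
          |     
          |         def forward(self, x):
          |             # restates the code:
          |             # Apply the feed-forward layer
          |             h = self.ff(x)
          |             # a "why"/design comment:
          |             # Dropout goes before the residual add, as in the
          |             # original Transformer; after the add it would
          |             # also perturb the skip connection.
          |             return x + self.dropout(h)
          |     
          |     out = Block()(torch.randn(2, 64))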
         | 
         | Now, the OP's code checks off one or two items on that list but
         | only in parts and in a few places. Overall, looking at the many
         | code snippets antirez's article presents from the Redis code
         | base, I find Redis's style is much closer to my idea of
         | literate programming than the OP's code.
         | 
         | (Again, I hope OP is not taking offense to my comment. I am
         | aware that they didn't claim they were doing literate
         | programming.)
         | 
         | [0]:
         | https://web.archive.org/web/20191224044125/http://antirez.co...
         | 
          | EDIT: When I wrote my comment, I hadn't looked at _all_ the
          | pages/files and simply assumed the others would be similar.
          | I've now noticed, though, that some of them do follow the
          | literate programming style quite closely. Nice. :)
        
       | timohear wrote:
       | In their Transformer section they have implementations of:
        | 
        |     - kNN-LM: Generalization through Memorization
        |     - Feedback Transformer
        |     - Switch Transformer
       | 
       | Which are all from recent, highly interesting papers
        
       | sooheon wrote:
       | The parent project, LabML, looks interesting. Anyone have any
       | experience with how this stacks up against Weights and Biases?
        
       ___________________________________________________________________
       (page generated 2021-01-30 23:01 UTC)