[HN Gopher] Composability in Julia: Implementing Deep Equilibriu...
       ___________________________________________________________________
        
       Composability in Julia: Implementing Deep Equilibrium Models via
       Neural ODEs
        
       Author : ChrisRackauckas
       Score  : 90 points
       Date   : 2021-10-21 14:44 UTC (8 hours ago)
        
 (HTM) web link (julialang.org)
 (TXT) w3m dump (julialang.org)
        
       | Tarrosion wrote:
       | I've been part of and/or following the Julia community since
       | 2015, and Julia is my favorite programming language by a wide
       | margin. Seems like every two months there's a new blog post,
       | usually with some of these folks as authors, that
       | describes...something...to do with ODEs, machine learning, neural
       | ODEs, GPUs, adjoints, scientific machine learning, ...
       | 
       | I have never once been able to follow one of these blog posts.
       | Seems like universally these posts have terrible curse of
       | knowledge [1]. To be fair, I know only a light-to-moderate amount
       | about machine learning and very little about differential
       | equations, but still-- I'm a long-term Julia fan, professional
       | data scientist, mathy PhD, would hope that's at least table
       | stakes. Maybe I just need to...try harder? I wonder how many
       | people would be excited but are totally lost by these posts.
       | 
       | To that point, if anybody has a recommendation for a gentle
       | introduction to these topics (preferably Julia), I'd be most
       | appreciative.
       | 
       | [1] https://en.wikipedia.org/wiki/Curse_of_knowledge
        
         | uoaei wrote:
         | > but still-- I'm a long-term Julia fan, professional data
         | scientist, mathy PhD, would hope that's at least table stakes.
         | 
         | I don't understand this part. Are you saying you're a
         | "professional data scientist" with a "mathy PhD" who has a
         | "light-to-moderate amount" of machine learning knowledge? How
         | did you get the job?
         | 
          | I would expect anyone with a mathy PhD to understand ODEs and
          | PDEs. And neural ODEs are commonly understood (by those who
          | read the papers, which seems a fair assumption for people who
          | want to understand this stuff) to be effectively infinite-depth
          | neural networks where every layer represents the same function.
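          | 
          | (A minimal sketch of that fixed-point reading, in Python; the
          | scalar "layer" f and the weight w are made up for illustration.
          | Real DEQs use a learned network and a Newton/Broyden solver
          | rather than naive iteration.)

```python
import math

# Deep equilibrium view: rather than composing f(f(...f(x))) a fixed
# number of times, solve directly for the fixed point z* = f(z*, x).
# Hypothetical scalar "layer": a contraction for |w| < 1.
def f(z, x, w=0.5):
    return math.tanh(w * z + x)

def deq_forward(x, tol=1e-10, max_iter=1000):
    """Naive fixed-point iteration; converges because f is a contraction."""
    z = 0.0
    for _ in range(max_iter):
        z_new = f(z, x)
        if abs(z_new - z) < tol:
            return z_new
        z = z_new
    return z

z_star = deq_forward(1.0)
# At convergence, z* satisfies z* = f(z*, x) to within tolerance.
assert abs(z_star - f(z_star, 1.0)) < 1e-9
```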
        
           | diskzero wrote:
           | Have you been in interview loops or worked with bread and
            | butter data scientists performing common tasks? I am curious
            | what your view is of what most data scientists do day in and
            | day out.
        
             | uoaei wrote:
             | What about tasks being common makes a light-to-moderate
             | understanding of machine learning sufficient?
             | 
              | Processes initiated by data scientists in the course of
              | their role tend to fail silently. What is meant here is:
              | throwing an inappropriate model at otherwise good data
              | produces unreliable (in certain situations catastrophic)
              | results, but produces results nonetheless. Without proper
              | discernment of the reliability of those results, we have an
              | unequivocal failure to execute the role. This is the oft-
              | unmentioned companion to, and decidedly more insidious
              | than, the "garbage in, garbage out" (i.e., right model,
              | wrong data) aphorism.
             | 
              | It is up to the person performing this operation to judge
              | whether or not the conclusions are trustworthy. I don't see
              | how someone can be confident of this without either relying
              | on a pre-defined workflow verified by someone else
              | qualified to assess the consequences, or having those
              | qualifications themselves.
             | 
             | What follows is a contrived example, but illustrative of
             | the problem:
             | 
              | Consider e.g. user privacy: it is by now well known that
              | embedding vectors (or even merely the relationships between
              | them) can leak a lot of information about the person or
              | object they represent. It is not enough to understand how
              | the forward pass of such a model proceeds; one must also
              | understand what is stored in those representations, which,
              | having gone through a master's program with quite a few
              | people who now call themselves data scientists, I am not
              | confident is commonly understood.
        
           | eigenspace wrote:
           | This is a pretty shitty, non-constructive response. I think
           | neural differential equations are not easy to wrap one's head
            | around even if you have a solid understanding of deep learning
           | and differential equations.
           | 
           | Sure, if they spent a lot of time wading through the
           | literature they'd probably understand fine, but the point
           | they were making was that the post was quite unapproachable
           | without having delved into the specific literature on neural
           | differential equations.
           | 
           | I think this is a reasonably valid complaint, and does not
           | warrant you implying that they don't deserve their job.
        
             | uoaei wrote:
             | Valid complaint how? If I have never studied carcinogenesis
             | why should I believe I should understand the description of
             | a new treatment for bone marrow cancer?
             | 
              | The way any article is written reflects the audience it is
              | intended for. If this article were meant for people
              | unfamiliar with neural ODEs, the authors would have put
              | more effort into writing it accessibly.
        
             | orbifold wrote:
             | I think it would have helped if the people writing that
             | paper had not confused the issue by introducing a new name
              | for something well known in optimal control, invented even
              | before neural networks: adjoint sensitivity analysis.
              | Multilayer networks of switching components even appear in
              | Pontryagin's book on the subject.
        
         | agumonkey wrote:
          | What's funny is how most of them revolve around a similar
          | culture of concepts (ODEs, automatic differentiation, analysis)
          | which is very unlike mainstream computing.
         | 
         | ps: I need to read those adjoints articles.
        
       | adgjlsfhk1 wrote:
       | This is a really cool idea, especially because it is a type of NN
       | that can take more time for harder inputs. This makes it
        | fairly unusual, since most types of NN have O(1) runtime, which
        | is often nice but puts limits on the types of problems they can
        | solve.
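        | 
        | (A toy illustration of that input-adaptive cost, in Python; the
        | contraction map and its weight w are made up. "Harder" inputs,
        | whose fixed point is farther from the initial guess, take more
        | iterations to converge.)

```python
import math

def iterations_to_converge(x, w=0.9, tol=1e-8, max_iter=10000):
    """Count fixed-point iterations of z <- tanh(w*z + x) until converged."""
    z, n = 0.0, 0
    while n < max_iter:
        z_new = math.tanh(w * z + x)
        n += 1
        if abs(z_new - z) < tol:
            break
        z = z_new
    return n

easy = iterations_to_converge(0.0)  # fixed point is the starting guess z = 0
hard = iterations_to_converge(2.0)  # fixed point far from the starting guess
assert easy < hard
```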
        
       | cs702 wrote:
       | _Very_ cool. I have only one question:
       | 
       | Has anyone successfully applied DEQs to larger-scale cognitive
       | tasks or benchmarks, as opposed to MNIST, which is a tiny trivial
       | task by today's standards?
       | 
       | Think ImageNet-1000, COCO, LVIS, WMT language translation, ...,
       | SuperGLUE. There's a long list of datasets and benchmarks that
       | regular boring fixed-depth NNs tackle with remarkable ease these
       | days.
       | 
       | Has anyone anywhere applied DEQs to _any_ of those datasets  /
       | benchmarks?
        
         | avikpal1410 wrote:
         | MDEQ work applies DEQ to some of the large scale benchmarks you
         | mention: https://arxiv.org/pdf/2006.08656.pdf
        
           | cs702 wrote:
           | Thank you! The results don't look that great (e.g.,
           | EfficientNet models achieve greater accuracy on ImageNet-1000
            | with ~5x fewer parameters), but the work looks interesting
           | and worthwhile. I'll take a look.
        
       | jstx1 wrote:
       | What are some good references on neural ODEs that don't come from
       | the Julia community? I'm looking for theory and applications -
       | when are they good and who is using them for what?
       | 
       | I'm asking for sources outside of Julia because I find the
       | coupling of algorithm types to tools kind of strange and the
       | whole SciML trend is kind of opaque to me. (Are people applying
       | ML as a solution to newer problems? Are they using new approaches
       | to solve ML problems? How legit is the whole thing? I just don't
       | know.)
        
         | ssivark wrote:
         | The original Neural ODEs paper is quite readable, and by now
         | there are loads of blog posts and even a few talks on the
         | subject.
         | 
         | The basic idea is inspired by the "adjoint method" for ODE
         | solving (so you don't have to hold in memory all the
         | intermediate layer outputs -- which is otherwise necessary to
         | compute the backpropagated gradient signal).
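          | 
          | (A scalar toy version of that idea, in Python; the ODE
          | dz/dt = theta*z, the loss L = z(T), and the Euler solver are
          | all made up for illustration. The gradient dL/dtheta comes
          | from integrating the adjoint ODE da/dt = -theta*a backwards
          | from a(T) = dL/dz(T) = 1.)

```python
import math

def grad_via_adjoint(theta, z0=1.0, T=1.0, n=10000):
    dt = T / n
    # Forward Euler solve of dz/dt = theta*z. (Stored here for simplicity;
    # the memory saving comes from re-solving or checkpointing instead.)
    zs = [z0]
    for _ in range(n):
        zs.append(zs[-1] + dt * theta * zs[-1])
    # Backward adjoint solve: da/dt = -theta*a, a(T) = 1, accumulating
    # dL/dtheta = integral of a(t) * z(t) dt.
    a, grad = 1.0, 0.0
    for i in range(n, 0, -1):
        grad += dt * a * zs[i - 1]
        a += dt * theta * a  # stepping backwards in t flips the sign
    return grad

g = grad_via_adjoint(0.7)
# Analytic check: d/dtheta [z0 * exp(theta*T)] = T * z0 * exp(theta*T).
assert abs(g - math.exp(0.7)) < 1e-2
```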
        
           | ChrisRackauckas wrote:
            | Yeah, though with the method described in that paper you do
            | have to be very careful since it has exponential error growth
            | with the Lipschitz constant of the ODE. See
            | https://aip.scitation.org/doi/10.1063/5.0060697 for details.
            | But that is generally the case in numerical analysis: there's
            | always a simple way to do things, and then there's the way
            | that prevents error growth. Both have different pros and
            | cons.
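            | 
            | (A toy demonstration of that error growth, in Python; the ODE
            | and Euler solver are made up. For dz/dt = -lam*z the forward
            | solution decays, so re-solving it backwards, as the original
            | neural ODE adjoint does, magnifies any perturbation of z(T)
            | by roughly exp(lam*T).)

```python
def euler(z, rate, T, n=20000):
    """Forward Euler for dz/dt = rate * z on [0, T]."""
    dt = T / n
    for _ in range(n):
        z += dt * rate * z
    return z

lam, T = 10.0, 2.0
zT = euler(1.0, -lam, T)            # forward solve: decays to ~exp(-20)
z0_rec = euler(zT + 1e-9, lam, T)   # re-solve backwards from perturbed z(T)
# The 1e-9 perturbation is magnified by ~exp(lam*T), ruining z(0).
assert abs(z0_rec - 1.0) > 0.1
```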
        
         | UncleOxidant wrote:
          | There's some info from the Python/PyTorch camp:
         | https://towardsdatascience.com/neural-odes-with-pytorch-ligh...
         | 
         | I suspect neural ODE work was done in Julia earlier because it
         | was easier given some language features and libraries. But
         | there does seem to be some work on neural ODEs in
         | Python/Pytorch.
        
         | orbifold wrote:
         | Neural ODEs are essentially a rebranding of adjoint sensitivity
         | analysis, which has been around in various forms in established
         | solver suites, such as Sundials, PETSC, etc. The machine
         | learning community got a hold of it, cited one book and
         | otherwise happily reinvented everything.
        
         | adgjlsfhk1 wrote:
          | What do you mean by "the coupling of algorithm types to
          | tools"? (Not judging, just curious.)
        
           | jstx1 wrote:
            | More bluntly, my question is: if SciML is that good, why
            | aren't more people using it yet? Why is it limited to a small
            | group of Julia developers and packages?
           | 
            | (There are good possible explanations: it could be very new,
            | it could have only niche applications, Julia could be
            | uniquely suited for it, etc. I don't know.)
        
             | krastanov wrote:
             | Some minor clarifications: NeuralODEs are not a Julia
             | invention. I am pretty sure the first papers on the topic
              | were using a Python package implementing a rather crude ODE
              | solver in PyTorch or TensorFlow. Julia just happens to be
             | light years ahead of any other tool when it comes to
             | solving ODEs, while having many high-quality
             | autodifferentiation packages as well, so it feels natural
             | to use it for these problems. But more importantly, SciML
             | is not just for your typical Machine Learning tasks: being
             | able to solve ODEs and have autodiff over them is
             | incredibly empowering for boring old science and
             | engineering, and SciML has become one of the most popular
             | set of libraries when it comes to unwieldy ODEs.
        
             | UncleOxidant wrote:
             | > Why is it limited to a small group of Julia developers
             | and packages?
             | 
              | I don't think there are any gatekeepers limiting its use.
             | Articles like the one highlighted here help to get the word
             | out to more potential users.
        
             | ViralBShah wrote:
             | What do you mean by "more" people? Perhaps you mean people
             | who know? Anyone who solves a differential equation in
             | Julia is using the SciML ecosystem of packages. The Julia
              | ecosystem has about 1M users, and lots of people in it use
              | these tools.
              | 
              | There are over 100 dependent packages:
              | https://juliahub.com/ui/Packages/OrdinaryDiffEq/DlSvy/5.64.1...
        
             | ChrisRackauckas wrote:
             | Lots to say here. First of all, the community growth has
             | been pretty tremendous and I couldn't really ask for more.
             | We're seeing tens of thousands of visitors to the
             | documentation of various packages, and have some high
             | profile users. For example, NASA showing a 15,000x
             | acceleration (https://www.youtube.com/watch?v=tQpqsmwlfY0)
             | and the Head of Clinical Pharmacology at Moderna saying
             | SciML-based Pumas "has emerged as our 'go-to' tool for most
             | of our analyses in recent months" in 2020 (see
             | https://pumas.ai/). We try to keep a showcase
             | (https://sciml.ai/showcase/) but at this point it's hard to
             | stay on top of the growth. I think anyone would be excited
             | to see an open source side project reach that level of use.
             | Since we tend to focus on core numerical issues (stiffness)
             | and performance, we target the more "hardcore" people in
             | scientific disciplines who really need these aspects and
             | those communities are the ones seeing the most adoption
             | (pharmacology, systems biology, combustion modeling, etc.).
             | Indeed the undergrad classes using a non-stiff ODE solver
             | on small ODEs or training a neural ODE on MNIST don't
             | really have those issues so they aren't our major growth
             | areas. That's okay and that's probably the last group that
             | would move.
             | 
             | In terms of the developer team, throughout the SciML
             | organization repositories we have had around 30 people who
             | have had over 100 commits, which is similar in number to
             | NumPy and SciPy. Julia naturally has a much lower barrier
             | to entry in terms of committing to such packages (since the
             | packages are all in Julia rather than C/Fortran), so the
             | percentage of users who become developers is much higher
             | which is probably why you see a lot more developer activity
             | in contrast to "pure" users. With things like the Python
             | community you have a group of people who write blog posts
             | and teach the tools in courses without ever hacking on the
             | packages or its ecosystem. In Julia, that background is
             | sufficient knowledge to also be developing the package, so
             | everyone writing about Julia seems to also be associated
             | with developing Julia packages somehow. I tend to think
             | that's a positive, but it does make the community look
             | insular as everyone you see writing about Julia is also a
             | developer of packages.
             | 
             | Lastly, since we have been focusing on people with big
             | systems and numerically hard problems, we have had the
             | benefit of being able to overlook some "simple user" issues
             | so far. We are starting to do a big push to clean up things
              | like compile times
              | (https://github.com/SciML/DifferentialEquations.jl/issues/786),
              | improve the documentation, throw better errors, support
              | older versions longer, etc. One way to think about SciML is
              | that it's somewhat the Linux to the monolithic Python
              | packages' Windows. We give modular tools
             | in a bunch of different packages that work together, get
             | high performance, and become "more than the sum of the
             | parts", but sometimes people are fine with the simple app
             | made for one purpose. With DEQs, there's a Python package
             | specifically for DEQs (https://github.com/locuslab/deq).
             | Does it have all of the Newton-Krylov choices for the
             | different classes of Jacobians and all of that? No, but it
             | gets something simple and puts an easily Google-able face
             | to it. So while all it takes in Julia with SciML is to
             | stick a nonlinear solver in the right spot in the right way
             | and know how the adjoint codegen will bring it all
             | together, the majority want Visual Studio instead of
             | Awk+Sed or Vim. We understand that, and so the
             | DiffEqFlux.jl package is essentially just a repository of
             | tutorials and prebuilt architectures that people tend to
             | want (https://diffeqflux.sciml.ai/dev/) but we need to
             | continue improving that "simplified experience". The age of
             | Linux is more about making desktop managers that act
             | sufficiently like Windows and less about trying to get
             | everyone building ArchLinux from source. Right now we are
             | currently too much like ArchLinux and need to build more of
             | the Ubuntu-like pieces. We thus have similarly loyal
             | hardcore followers but need to focus a bit on making that
             | installation process easier and the error messages shorter
             | to attract a larger crowd.
        
       | rpmuller wrote:
       | One of the best things about Julia is that people like Chris
       | Rackauckas are developing great packages for it.
        
         | lytefm wrote:
          | Definitely. I've just been working with Stan and Pyro + Python
          | so far for modelling, but posts like this encourage me to
          | finally pick up Julia and get serious with neural ODEs.
        
       | gugagore wrote:
        | > For example, when we apply convolution filters on images the
       | network consists of repetitive blocks of convolutional layers,
       | and one linear output layer at the very end. It's essentially
       | f(f(f(...f(x))...)) where f is the neural network, and we call
       | this "deep" because of the layers of composition. But what if we
       | make this composition go to infinity?
       | 
        | This really does not jibe with my understanding. Each layer of,
        | e.g., VGG-16 [1] does not implement the same function; each layer
        | has its own weights. There are certain architectures that tie
        | weights across layers, but not all of them do.
       | 
       | [1] https://neurohive.io/en/popular-networks/vgg16/
        
         | taeric wrote:
          | I think this works if you consider subscripted fs. That is, it
          | is always the same arity, and the output is the same. So it
          | would be better if they said f_n, where n is the layer of the
          | network.
         | 
         | (I mean this as a question, but don't see an obvious place for
         | a question mark...)
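          | 
          | (The distinction in Python, with made-up scalar "layers": an
          | untied network composes a different f_n per layer, while a
          | weight-tied, DEQ-style network iterates one shared f.)

```python
import math

def untied_forward(x, weights):
    # f_n(z) = tanh(w_n * z + x): a different function at each layer
    z = x
    for w in weights:
        z = math.tanh(w * z + x)
    return z

def tied_forward(x, w, depth):
    # f(f(...f(x))): the same function applied `depth` times
    z = x
    for _ in range(depth):
        z = math.tanh(w * z + x)
    return z

# With all weights equal, the untied network reduces to the tied one.
assert untied_forward(0.3, [0.5] * 4) == tied_forward(0.3, 0.5, 4)
```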
        
       ___________________________________________________________________
       (page generated 2021-10-21 23:01 UTC)