[HN Gopher] Composability in Julia: Implementing Deep Equilibriu...
___________________________________________________________________
Composability in Julia: Implementing Deep Equilibrium Models via
Neural ODEs
Author : ChrisRackauckas
Score : 90 points
Date : 2021-10-21 14:44 UTC (8 hours ago)
(HTM) web link (julialang.org)
(TXT) w3m dump (julialang.org)
| tomtomftomtom wrote:
| Fauci funded COVID-19: https://www.zerohedge.com/covid-19/nih-
| admits-funding-gain-f...
| Tarrosion wrote:
| I've been part of and/or following the Julia community since
| 2015, and Julia is my favorite programming language by a wide
| margin. Seems like every two months there's a new blog post,
| usually with some of these folks as authors, that
| describes...something...to do with ODEs, machine learning, neural
| ODEs, GPUs, adjoints, scientific machine learning, ...
|
| I have never once been able to follow one of these blog posts.
| Seems like these posts universally suffer from the curse of
| knowledge [1]. To be fair, I know only a light-to-moderate amount
| about machine learning and very little about differential
| equations, but still-- I'm a long-term Julia fan, professional
| data scientist, mathy PhD, would hope that's at least table
| stakes. Maybe I just need to...try harder? I wonder how many
| people would be excited but are totally lost by these posts.
|
| To that point, if anybody has a recommendation for a gentle
| introduction to these topics (preferably Julia), I'd be most
| appreciative.
|
| [1] https://en.wikipedia.org/wiki/Curse_of_knowledge
| uoaei wrote:
| > but still-- I'm a long-term Julia fan, professional data
| scientist, mathy PhD, would hope that's at least table stakes.
|
| I don't understand this part. Are you saying you're a
| "professional data scientist" with a "mathy PhD" who has a
| "light-to-moderate amount" of machine learning knowledge? How
| did you get the job?
|
| I would expect anyone with a mathy PhD to understand ODEs and
| PDEs, and neural ODEs are commonly understood (by those who
| read the papers, which I think is a fair thing to expect of
| people who want to understand this stuff) to be effectively
| infinite-depth neural networks where every layer represents
| the same function.
| diskzero wrote:
| Have you been in interview loops or worked with
| bread-and-butter data scientists performing common tasks? I am
| curious what your view is of what most data scientists do day
| in and day out.
| uoaei wrote:
| What about tasks being common makes a light-to-moderate
| understanding of machine learning sufficient?
|
| Processes initiated by data scientists during the execution
| of their role will tend to fail silently. What is meant
| here is that throwing an inappropriate model at otherwise good
| data produces unreliable (catastrophic in certain
| situations) results, but produces results nonetheless.
| Without the proper discernment of the reliability of the
| results, we have an unequivocal failure to execute the
| role. This is the oft-unmentioned companion to, but
| decidedly more insidious than, the "garbage in, garbage
| out" (i.e., right model, wrong data) aphorism.
|
| It is up to the person performing this operation to deduce
| whether or not the conclusions are trustworthy. I don't see
| how someone can be confident of this without either relying
| on a pre-defined workflow verified by someone else
| qualified to assess the consequences, or to have those
| qualifications themselves.
|
| What follows is a contrived example, but illustrative of
| the problem:
|
| Consider e.g. user privacy: it is by now well-known that
| e.g. embedding vectors (or even merely the relationships
| between them) can leak a lot of information about the
| person or object it represents. It is not enough to
| understand how the forward pass of such a model commences,
| but also what is stored in those representations, which,
| having gone through a master's with quite a few people who
| now call themselves data scientists, I am not confident is
| commonly understood.
| eigenspace wrote:
| This is a pretty shitty, non-constructive response. I think
| neural differential equations are not easy to wrap one's head
| around even if you have a solid understanding of deep learning
| and differential equations.
|
| Sure, if they spent a lot of time wading through the
| literature they'd probably understand fine, but the point
| they were making was that the post was quite unapproachable
| without having delved into the specific literature on neural
| differential equations.
|
| I think this is a reasonably valid complaint, and does not
| warrant you implying that they don't deserve their job.
| uoaei wrote:
| Valid complaint how? If I have never studied carcinogenesis
| why should I believe I should understand the description of
| a new treatment for bone marrow cancer?
|
| The way any article is written reflects the audience it is
| suited for. If this article was intended for people
| unfamiliar with neural ODEs they would have put more effort
| into writing it in a suitable way.
| orbifold wrote:
| I think it would have helped if the people writing that
| paper had not confused the issue by introducing a new name
| for something that is a well-known technique in optimal
| control, invented even before neural networks: adjoint
| sensitivity analysis. Multilayer networks of switching
| components even appear in Pontryagin's book on the subject.
| agumonkey wrote:
| What's funny is how most of them revolve around a similar
| culture of concepts (ODEs, automatic differentiation,
| analysis), which is very unlike mainstream computing.
|
| ps: I need to read those adjoints articles.
| adgjlsfhk1 wrote:
| This is a really cool idea, especially because it is a type of NN
| that can take more time for harder inputs. This makes it
| relatively unique since most types of NN have O(1) runtime, which
| is often nice, but puts limits on the types of problems they can
| solve.
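|
| A minimal sketch of that idea in plain Julia (the layer f,
| its dimensions, and the tolerance are all made up for
| illustration; a real DEQ would use a proper nonlinear solver
| rather than naive fixed-point iteration):
|
|     using LinearAlgebra
|
|     # Weight-tied "layer": z <- tanh.(W*z .+ U*x .+ b).
|     # Scaling W keeps the map a contraction so the iteration
|     # converges.
|     function deq_forward(W, U, b, x; tol=1e-6, maxiter=1000)
|         z = zeros(length(b))
|         for k in 1:maxiter
|             z_new = tanh.(W * z .+ U * x .+ b)
|             if norm(z_new - z) < tol
|                 return z_new, k   # k = depth used for this input
|             end
|             z = z_new
|         end
|         return z, maxiter
|     end
|
|     W = 0.1 * randn(8, 8); U = randn(8, 4); b = randn(8)
|     z, depth = deq_forward(W, U, b, randn(4))
|
| The iteration count (the effective "depth") varies with the
| input, which is exactly the property described above.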
| bomz24 wrote:
| NIH Admits Fauci lied to congress and funded covid-19:
|
| https://theintercept.com/2021/10/21/virus-mers-wuhan-experim...
| grzff wrote:
| Fauci funded COVID-19: https://www.zerohedge.com/covid-19/nih-
| admits-funding-gain-f...
|
| Everyone involved should face the firing squad.
| cs702 wrote:
| _Very_ cool. I have only one question:
|
| Has anyone successfully applied DEQs to larger-scale cognitive
| tasks or benchmarks, as opposed to MNIST, which is a tiny trivial
| task by today's standards?
|
| Think ImageNet-1000, COCO, LVIS, WMT language translation, ...,
| SuperGLUE. There's a long list of datasets and benchmarks that
| regular boring fixed-depth NNs tackle with remarkable ease these
| days.
|
| Has anyone anywhere applied DEQs to _any_ of those datasets /
| benchmarks?
| avikpal1410 wrote:
| MDEQ work applies DEQ to some of the large scale benchmarks you
| mention: https://arxiv.org/pdf/2006.08656.pdf
| cs702 wrote:
| Thank you! The results don't look that great (e.g.,
| EfficientNet models achieve greater accuracy on ImageNet-1000
| with ~5x fewer parameters), but the work looks interesting
| and worthwhile. I'll take a look.
| jstx1 wrote:
| What are some good references on neural ODEs that don't come from
| the Julia community? I'm looking for theory and applications -
| when are they good and who is using them for what?
|
| I'm asking for sources outside of Julia because I find the
| coupling of algorithm types to tools kind of strange and the
| whole SciML trend is kind of opaque to me. (Are people applying
| ML as a solution to newer problems? Are they using new approaches
| to solve ML problems? How legit is the whole thing? I just don't
| know.)
| ssivark wrote:
| The original Neural ODEs paper is quite readable, and by now
| there are loads of blog posts and even a few talks on the
| subject.
|
| The basic idea is inspired by the "adjoint method" for ODE
| solving (so you don't have to hold in memory all the
| intermediate layer outputs -- which is otherwise necessary to
| compute the backpropagated gradient signal).
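|
| A toy sketch of that idea (not what the paper or the Julia
| packages actually do internally; here the ODE is linear so
| the Jacobian of f is known by hand, and A, z0, and the time
| span are made up):
|
|     using DifferentialEquations, LinearAlgebra
|
|     A  = [-0.5 1.0; -1.0 -0.5]      # dz/dt = A*z
|     z0 = [1.0, 0.0]
|     tspan = (0.0, 2.0)
|     fwd = solve(ODEProblem((z, p, t) -> A * z, z0, tspan), Tsit5())
|
|     # For the loss L = 0.5*||z(T)||^2, the adjoint a(t) = dL/dz(t)
|     # satisfies da/dt = -(df/dz)' * a = -A' * a, integrated
|     # backwards from a(T) = dL/dz(T) = z(T).  Because f is linear,
|     # this backward pass needs no stored forward states at all; in
|     # general the adjoint method recomputes or interpolates them
|     # instead of storing every "layer" output.
|     aT  = fwd(tspan[2])
|     adj = solve(ODEProblem((a, p, t) -> -A' * a, aT,
|                            (tspan[2], tspan[1])), Tsit5())
|     dLdz0 = adj(tspan[1])           # gradient of L w.r.t. z(0)
|
| In the general nonlinear case that Jacobian-transpose product
| comes from automatic differentiation rather than pen and
| paper, but the backward-ODE structure is the same.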
| ChrisRackauckas wrote:
| Yeah, though with the method described in that paper you do
| have to be very careful since it has exponential error growth
| with the Lipchitz constants of the ODE. See
| https://aip.scitation.org/doi/10.1063/5.0060697 for details.
| But that is generally the case in numerical analysis: there's
| always a simple way to do things, and then there's the way
| that prevents error growth. Both have different pros and
| cons.
| UncleOxidant wrote:
| There's some info from the Python/PyTorch camp:
| https://towardsdatascience.com/neural-odes-with-pytorch-ligh...
|
| I suspect neural ODE work was done in Julia earlier because it
| was easier given some language features and libraries. But
| there does seem to be some work on neural ODEs in
| Python/PyTorch.
| orbifold wrote:
| Neural ODEs are essentially a rebranding of adjoint sensitivity
| analysis, which has been around in various forms in established
| solver suites, such as Sundials, PETSC, etc. The machine
| learning community got a hold of it, cited one book and
| otherwise happily reinvented everything.
| adgjlsfhk1 wrote:
| What do you mean by "the coupling of algorithm types to
| tools"? (Not judging, just curious.)
| jstx1 wrote:
| More bluntly my question is if SciML is that good, why aren't
| more people doing it yet? Why is it limited to a small group
| of Julia developers and packages?
|
| (There are good possible explanations - it could be very new,
| have only niche applications, Julia is somehow uniquely
| suited for it etc. I don't know)
| krastanov wrote:
| Some minor clarifications: NeuralODEs are not a Julia
| invention. I am pretty sure the first papers on the topic
| were using a python package implementing a rather crude ODE
| solver in torch or tensorflow. Julia just happens to be
| light years ahead of any other tool when it comes to
| solving ODEs, while having many high-quality
| autodifferentiation packages as well, so it feels natural
| to use it for these problems. But more importantly, SciML
| is not just for your typical Machine Learning tasks: being
| able to solve ODEs and have autodiff over them is
| incredibly empowering for boring old science and
| engineering, and SciML has become one of the most popular
| set of libraries when it comes to unwieldy ODEs.
| UncleOxidant wrote:
| > Why is it limited to a small group of Julia developers
| and packages?
|
| I don't think there are any gatekeepers limiting its use.
| Articles like the one highlighted here help to get the word
| out to more potential users.
| ViralBShah wrote:
| What do you mean by "more" people? Perhaps you mean people
| who know? Anyone who solves a differential equation in
| Julia is using the SciML ecosystem of packages. The Julia
| ecosystem is about 1M users, and lots of people in that
| ecosystem use these tools.
|
| There's over 100 dependent packages: https://juliahub.com/u
| i/Packages/OrdinaryDiffEq/DlSvy/5.64.1...
| ChrisRackauckas wrote:
| Lots to say here. First of all, the community growth has
| been pretty tremendous and I couldn't really ask for more.
| We're seeing tens of thousands of visitors to the
| documentation of various packages, and have some high
| profile users. For example, NASA showing a 15,000x
| acceleration (https://www.youtube.com/watch?v=tQpqsmwlfY0)
| and the Head of Clinical Pharmacology at Moderna saying
| SciML-based Pumas "has emerged as our 'go-to' tool for most
| of our analyses in recent months" in 2020 (see
| https://pumas.ai/). We try to keep a showcase
| (https://sciml.ai/showcase/) but at this point it's hard to
| stay on top of the growth. I think anyone would be excited
| to see an open source side project reach that level of use.
| Since we tend to focus on core numerical issues (stiffness)
| and performance, we target the more "hardcore" people in
| scientific disciplines who really need these aspects and
| those communities are the ones seeing the most adoption
| (pharmacology, systems biology, combustion modeling, etc.).
| Indeed the undergrad classes using a non-stiff ODE solver
| on small ODEs or training a neural ODE on MNIST don't
| really have those issues so they aren't our major growth
| areas. That's okay and that's probably the last group that
| would move.
|
| In terms of the developer team, throughout the SciML
| organization repositories we have had around 30 people who
| have had over 100 commits, which is similar in number to
| NumPy and SciPy. Julia naturally has a much lower barrier
| to entry in terms of committing to such packages (since the
| packages are all in Julia rather than C/Fortran), so the
| percentage of users who become developers is much higher
| which is probably why you see a lot more developer activity
| in contrast to "pure" users. With things like the Python
| community you have a group of people who write blog posts
| and teach the tools in courses without ever hacking on the
| packages or its ecosystem. In Julia, that background is
| sufficient knowledge to also be developing the package, so
| everyone writing about Julia seems to also be associated
| with developing Julia packages somehow. I tend to think
| that's a positive, but it does make the community look
| insular as everyone you see writing about Julia is also a
| developer of packages.
|
| Lastly, since we have been focusing on people with big
| systems and numerically hard problems, we have had the
| benefit of being able to overlook some "simple user" issues
| so far. We are starting to do a big push to clean up things
| like compile times (https://github.com/SciML/DifferentialEq
| uations.jl/issues/786), improve the documentation, throw
| better errors, support older versions longer, etc. One way
| to think about SciML is that it's somewhat the Linux to the
| monolithic Python packages' Windows. We give modular tools
| in a bunch of different packages that work together, get
| high performance, and become "more than the sum of the
| parts", but sometimes people are fine with the simple app
| made for one purpose. With DEQs, there's a Python package
| specifically for DEQs (https://github.com/locuslab/deq).
| Does it have all of the Newton-Krylov choices for the
| different classes of Jacobians and all of that? No, but it
| gets something simple and puts an easily Google-able face
| to it. So while all it takes in Julia with SciML is to
| stick a nonlinear solver in the right spot in the right way
| and know how the adjoint codegen will bring it all
| together, the majority want Visual Studio instead of
| Awk+Sed or Vim. We understand that, and so the
| DiffEqFlux.jl package is essentially just a repository of
| tutorials and prebuilt architectures that people tend to
| want (https://diffeqflux.sciml.ai/dev/) but we need to
| continue improving that "simplified experience". The age of
| Linux is more about making desktop managers that act
| sufficiently like Windows and less about trying to get
| everyone building ArchLinux from source. Right now we are
| too much like ArchLinux and need to build more of
| the Ubuntu-like pieces. We thus have similarly loyal
| hardcore followers but need to focus a bit on making that
| installation process easier and the error messages shorter
| to attract a larger crowd.
| rpmuller wrote:
| One of the best things about Julia is that people like Chris
| Rackauckas are developing great packages for it.
| lytefm wrote:
| Definitely. I've just been working with Stan and Pyro + Python
| so far for modelling, but posts like this encourage me to
| finally pick up Julia and get serious with neural ODEs.
| gugagore wrote:
| > [...] For example, when we apply convolution filters on images the
| network consists of repetitive blocks of convolutional layers,
| and one linear output layer at the very end. It's essentially
| f(f(f(...f(x))...)) where f is the neural network, and we call
| this "deep" because of the layers of composition. But what if we
| make this composition go to infinity?
|
| This really does not jibe with my understanding. Each layer of,
| e.g. VGG 16 [1] does not implement the same function. Each layer
| has its own weights. There are certain architectures that tie
| weights across layers, but not all of them.
|
| [1] https://neurohive.io/en/popular-networks/vgg16/
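|
| To make the distinction concrete, a tiny sketch (dimensions
| and the use of tanh are arbitrary choices): the post's
| f(f(f(...))) picture really describes the weight-tied case,
| whereas an ordinary deep net composes different functions
| f_1, ..., f_n.
|
|     # Ordinary deep net: every layer has its own weights,
|     # h -> f_n(...f_2(f_1(h))...)
|     layers = [0.5 * randn(8, 8) for _ in 1:4]
|     forward_untied(x) =
|         foldl((h, Wi) -> tanh.(Wi * h), layers; init = x)
|
|     # Weight-tied net: the same f applied n times, f(f(...f(x)...))
|     W = 0.1 * randn(8, 8)
|     f(h) = tanh.(W * h)
|     forward_tied(x, n) = foldl((h, _) -> f(h), 1:n; init = x)
|
|     x = randn(8)
|     forward_untied(x), forward_tied(x, 4)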
| taeric wrote:
| I think this works if you consider subscripted fs. That is, it
| is always the same arity, and the output is the same. So, it
| would be better if they said f_n, where n is the layer of the
| network.
|
| (I mean this as a question, but don't see an obvious place for
| a question mark...)
| fauscist1984 wrote:
| NIH admits Fauci lied about funding Wuhan gain-of-function
| experiments
|
| https://www.washingtonexaminer.com/opinion/nih-admits-fauci-...
___________________________________________________________________
(page generated 2021-10-21 23:01 UTC)