[HN Gopher] Category Theory [?] Machine Learning
       ___________________________________________________________________
        
       Category Theory [?] Machine Learning
        
       Author : bgavran
       Score  : 65 points
       Date   : 2023-03-07 20:04 UTC (2 hours ago)
        
 (HTM) web link (github.com)
 (TXT) w3m dump (github.com)
        
       | AlexCoventry wrote:
       | > _Category Theory has been finding increasing applications in
       | machine learning_
       | 
       | What's the most compelling application so far?
        
       | umutisik wrote:
       | It is tempting to believe that category theory will shed new
       | light on and simplify machine learning, just like it did in
       | algebraic geometry, algebraic topology and other mathematical
       | things. This is wishful thinking. Folks who care about doing
       | something useful should stay away from this content.
        
       | rmdamiao wrote:
       | Is this simply a consequence of exponential growth in CS
       | publications driven by machine learning or is there something
       | really going on here?
        
         | bgavran wrote:
         | OP here.
         | 
         | The exponential growth in CS publication is much faster. This
         | repository is simply a testament that CT is slowly ramping up.
         | 
         | It's meant to show what kind of expressive power and breadth
         | _current_ CT models have, which to my knowledge isn 't
         | something that's well-known outside of our niche community.
        
           | bgavran wrote:
           | It's also meant to suggest where things are going (the kind
           | of a chart I have in mind is this one
           | https://twitter.com/bgavran3/status/1422206118688956420 ),
           | though I understand this is something that deserves a much
           | more substantial proof.
        
         | adamnemecek wrote:
         | The field needs better foundations. CT is pretty good.
        
           | hgsgm wrote:
           | Why? How?
           | 
           | The OP GitHub site doesn't promote any material that
           | introduces the concepts at all. The "survey" paper at the top
           | is nigh-impenetrable. I'm sure the category theorists are
           | having fun modelling machine learning, but it doesn't show
           | how machine learning benefits from the category theory.
        
             | bgavran wrote:
             | There's a well-articulated essay from Christopher Olah that
             | captures the undercurrent, and perhaps even the motivation
             | of many of these papers
             | https://colah.github.io/posts/2015-09-NN-Types-FP/
             | 
             | You're completely right that the repo doesn't promote any
             | introductory material. CT is notoriously difficult to get
             | into, and this repository wasn't meant to be a pedagogical
             | one, but rather a list of all the relevant papers.
             | 
             | Though, I'll see about remedying this. I have a different
             | repository that curates a list of (what I consider to be,
             | as a CS major) relatively approachable CT introductory
             | materials
             | https://github.com/bgavran/Category_Theory_Resources
             | 
             | and it might be a good idea to add a pointer to it.
        
             | adamnemecek wrote:
             | Residual connection serves as a feedback/trace a la trace
             | in traced monoidal categories. That is one insight I have
             | gleaned from CT.
        
               | resource0x wrote:
               | You beat me to this. Indeed, of all categories the
               | monoidal ones are the most potent. Look how nicely they
               | fit in crypto ledgers:
               | https://www.cl.cam.ac.uk/events/syco/3/slides/Nester.pdf
               | 
               | \s
        
             | Yoric wrote:
             | Category Theory (just as all mathematical models for
             | programming or subsets thereof) are building blocks for
             | reasoning on what we build. Past applications of such
             | mathematical models include:
             | 
             | - programming languages with semantics that are better
             | adapted to specific problems (e.g. Rust's ownership);
             | 
             | - better compilers (see e.g. Haskell's supercompiler, which
             | puts to shame `constexpr`-style features);
             | 
             | - better static analyzers (e.g. better type systems,
             | abstract interpretation, model checkers).
             | 
             | In the case of Machine Learning, it _might_ some day help
             | us create Machine Learning that we can understand and trust
             | better. Or it might fail. Or it might help us invent
             | something different entirely, in 30 years.
        
               | anfelor wrote:
               | Intuition from category theory has certainly been useful
               | for some programming language features (eg. algebraic
               | effects, modal types or lenses), but I don't think it has
               | helped with the things you cite. Rusts ownership model
               | was created with little formal, academic understanding -
               | or do you mean more foundational stuff like linear logic?
               | And I don't think that Haskell does supercompilation
               | today? It does optimize a whole lot but that is more due
               | to laziness and other optimizations. But neither these
               | optimizations nor supercompilation have much to do with
               | category theory as far as I am aware.
        
               | Yoric wrote:
               | Yeah, my phrasing is probably unclear. I meant to give
               | examples of applications of formal semantics (huddling
               | everything together). I should probably sleep instead of
               | commenting on HN :)
               | 
               | But it is my understanding that Rust's ownership traces
               | its roots back to both linear logic (well, affine logic)
               | and region-based resource management, both of which have
               | formal semantics.
               | 
               | I haven't followed what got merged into GHC, but I
               | remember seeing demos (and a paper) of a Haskell
               | supercompiler during a conference many years ago, so it
               | is something.
               | 
               | In my sleep-deprived brain, supercompilation is directly
               | related to algebraic effects (although probably not in
               | Haskell itself), which are themselves related to category
               | theory. I could be wrong.
        
               | riku_iki wrote:
               | why those approaches never picked up outside of some
               | academia projects?..
        
               | Yoric wrote:
               | These days, Microsoft requires model-checking proofs
               | before accepting new device drivers. That's their secret
               | weapon that finally got (mostly) rid of the BSOD.
               | 
               | I suspect that Apple is also using model-checking at
               | various layers, but I have no proof :)
        
           | KRAKRISMOTT wrote:
           | No. It won't make a significant (if any at all) difference to
           | effectiveness. Rewriting Pytorch in Haskell won't magically
           | get you AGI.
        
             | adamnemecek wrote:
             | It's not about rewriting things in Haskell but about using
             | CT to reason about architectures.
        
             | solomonb wrote:
             | No one was suggesting rewriting anything in Haskell
             | afaict..
        
       | adamnemecek wrote:
       | I have recently written a paper on understanding machine learning
       | via the lens of Hopf algebra https://arxiv.org/abs/2302.01834.
       | 
       | Hopf algebras (which are really just tensors with recurrence
       | relations built in) subsume convnets, transformers and diffusion
       | model and also provide a theoretically better autodiff that
       | operates within single layers as opposed to across entire graphs.
       | 
       | Furthermore, there is a correspondence between Hopf algebra and
       | cyclical linear logic and Hopf algebras are related to zonotopes,
       | which are polyhedra that have been used in verified numerical
       | computation. I'm strongly convinced the LL connection can provide
       | proofs over zonotopes which paves the way towards interpretable
       | AI and will be central for XAI.
       | 
       | I know this sounds too good to be true but Persi Diaconis has
       | also written a paper that shows how useful Hopf algebras are in
       | the context of Markov chains https://arxiv.org/abs/1206.3620
       | 
       | I'm working on a next gen Hopf algebra based machine learning
       | framework.
       | 
       | Join my discord if you want to discuss this further
       | https://discord.cofunctional.ai.
       | 
       | ====
       | 
       | My account is currently rate limited so I will use this comment
       | to respond to comments below.
       | 
       | red_trumped: What about Hopf algebras do I not understand?
       | 
       | gaze: Haha, it's been a while since I have commented about QC.
       | What do I not understand about it? And what comment are you
       | referring to?
        
         | ianandrich wrote:
         | I just read your Coinductive guide to inductive transformer
         | heads paper.
         | 
         | My mind is blown.
         | 
         | Is the Hopf Algebra based ML framework you are working on on
         | your github? I took a glance, but you have 1500 repositories
         | and it wasn't on the first few of them.
        
         | red_trumpet wrote:
         | Your paper didn't pass my smell test at all, tbh. For example
         | the formula you write about "product" and "coproduct" in
         | section 3 is literally identical (as "=" is symmetric). In
         | section 4.2 you write "the product is the standard tensor
         | product" with a formula that doesn't at all involve the map m:
         | A \otimes A \to A. The formula you write is the _induced_
         | product on A \otimes A, assuming that you already have a
         | product on A. The formula for  "coproduct" is just an
         | example[1] of a coproduct, not every coproduct has to look that
         | way.
         | 
         | [1] https://en.wikipedia.org/wiki/Coalgebra#Examples
        
           | jurynulifcation wrote:
           | adamnemecek has posted too many comments and is in cooldown
           | phase, but he's asked me to post this comment: "It's the
           | programmers equal sign. I think that the surrounding text
           | provides a decent explanation what the deal is.
           | 
           | You are right, there's a missing sentence fragment, "standard
           | tensor product that satisfies the property...".
           | 
           | Read the Diaconis paper. "
           | 
           | --- This isn't a sock puppet and I hope this isn't against
           | site rules. I just wanted to try and help facilitate good
           | discussion. I think trumpet brought up some interesting
           | criticisms and felt Adam had a legitimate interest in
           | responding ASAP.
        
             | LudwigNagasena wrote:
             | > It's the programmers equal sign.
             | 
             | That doesn't seem to make any sense.
        
         | [deleted]
        
         | gexaha wrote:
         | could you advertise your research a bit less often, please? i
         | see your post like literally almost every other day here
        
           | [deleted]
        
           | hgsgm wrote:
           | Or at least explain it in more accessible way. Every time
           | Adam posts about the paper, it gets confused comments and no
           | engagement on the content, because it's pretty deep graduate
           | level pure math, which is occasionally seen but rare on HN.
        
             | red_trumpet wrote:
             | As a maths PhD student that has seen Hopf algebras before
             | (though I'm no expert, and the context was different), I'm
             | not convinced Adam understands things about Hopf algebras.
        
               | gaze wrote:
               | He doesn't understand quantum computing either. I wish he
               | would stop writing authoritatively about things he
               | doesn't understand.
        
       | donnowhy wrote:
       | category theory is 'native 2-dimensional' math. i.e. category
       | theory explains everything in terms of graphs, where a graph is
       | made from two different sorts of 'entities', nodes and vertices
       | i.e. categories and morphisms
       | 
       | this being math, I wonder to which extent can category theory be
       | re-expressed in terms of sets.
       | 
       | perhaps a better question is if category theory can be re-
       | expressed (or founded on) functions?
       | 
       | lastly, I wonder if category theory can be expressed in terms of
       | functions (i think maybe it can, without sets?) why shouldn't it
       | be expressible in terms of sets (for some reason I don't think
       | just sets are sufficient, may have to define functions (which
       | possible in terms of sets) before 'expressing' categories
       | starting with set theory)?
        
         | voxl wrote:
         | Set theory is fine, see the (Stack
         | Project)[https://stacks.math.columbia.edu/browse] which
         | develops a ton of modern Category Theory on ZFC (Zermelo-
         | Fraenkel Set Theory with the Axiom of Choice) alone.
         | 
         | Alternative foundations of mathematics (Set Theory, Category
         | Theory, Type Theory, and all their variations) can all mutually
         | interpret the other by just postulating sufficiently large
         | universes. You don't pick or advocate one based off its ability
         | to encode mathematics, but instead based on its ability to
         | express your intention and ideas.
         | 
         | Really its no different from programming language preference in
         | my book.
        
         | Koshkin wrote:
         | > _i.e. categories and morphisms_
         | 
         | Objects, not categories.
        
         | justatdotin wrote:
         | > I wonder to which extent can category theory be re-expressed
         | in terms of sets...
         | 
         | yoneda
        
         | mydogcanpurr wrote:
         | > for some reason I don't think just sets are sufficient
         | 
         | The reason you're looking for is that the category of sets is
         | not a set.
        
       ___________________________________________________________________
       (page generated 2023-03-07 23:00 UTC)