[HN Gopher] The Illustrated AlphaFold
       ___________________________________________________________________
        
       The Illustrated AlphaFold
        
       Author : dil8
       Score  : 282 points
       Date   : 2024-07-13 15:00 UTC (1 days ago)
        
 (HTM) web link (elanapearl.github.io)
 (TXT) w3m dump (elanapearl.github.io)
        
       | inciampati wrote:
       | It's so, so complex! I confess I had a sense of this but had no
       | idea. We don't even hear which MSA algorithm is used to align the
       | protein sequences.
        
         | flobosg wrote:
         | Input MSAs are generated with jackhmmer and HHblits and further
         | processed, if I recall AlphaFold's paper correctly.
        
         | elanapearl wrote:
         | Hi, I was one of the authors of this! I think we briefly
         | mentioned this in a footnote somewhere (a lot of things got cut
         | or moved to footnotes since it is already so long & we wanted
         | to focus on the ML parts that aren't described elsewhere).
         | 
         | But yes, as @flobosg mentioned, for protein chains they use
         | jackhmmer to search 4 of the databases (the exception is
         | Uniclust30 + BFD, which is searched with HHblits instead), and
         | for RNA chains they use nhmmer to search and then hmmalign to
         | re-align the results to the query chain.
         | 
         | Hope that helps!
        
       | joelS wrote:
       | This is an amazing writeup, thank you. looking forward to going
       | through it in more detail.
        
       | tomohelix wrote:
       | I consider this a glimpse into how neural networks and "AI"-like
       | techs would be implemented in the future. Lots of engineering,
       | lots of clever manipulations of known techniques woven together
       | with a powerful, well-trained model at the center.
       | 
       | Right now I think stuff like chatgpt is only at the first step of
       | making that foundational model that can generalize and process
       | data. There isn't a lot of work going into processing the inputs
       | into something the model can best understand (not at the
       | tokenizer level, even before that). We have a basic field for
       | this, i.e. prompt engineering, but nothing as sophisticated as
       | AlphaFold exists for natural language or images yet.
       | 
       | People are stacking LLMs together and putting system prompts in
       | to assist this input processing. Maybe when we have some more
       | complex systems in place, we can see something resembling a real
       | AGI.
        
         | astroalex wrote:
         | Some[1] think that things are trending in the opposite
         | direction: away from clever manipulations and hard coded domain
         | knowledge, and towards large scale general models.
         | 
         | [1]: http://www.incompleteideas.net/IncIdeas/BitterLesson.html
        
           | PoignardAzur wrote:
           | Yeah, I was surprised to see the architecture diagram is so
           | complex. It's been a while since I saw a design that wasn't
           | just "stack more transformer layers".
        
           | sangnoir wrote:
           | This made me think of the differences between FPGAs and
           | microprocessors - with "more layers" being equivalent to
           | "more gates"
        
       | great_tankard wrote:
       | This is an awesome writeup that really helped me understand
       | what's going on under the hood. I didn't know, for example, that
       | for the limited number of PTMs AF3 can handle it has to treat
       | every single atom, including those of the main and side chain, as
       | an individual token (presumably because PTMs are very
       | underrepresented in the PDB?)
       | 
       | Thank you for translating the paper into something this
       | structural biologist can grasp.
        
       | mk_stjames wrote:
       | I have no prior knowledge of protein folding but nevertheless I
       | enjoyed (attempting to) read through this. It's interesting to
       | see the complexity in techniques used in comparison to a lot of
       | other ML projects today.
        
       ___________________________________________________________________
       (page generated 2024-07-14 23:01 UTC)