[HN Gopher] The Illustrated AlphaFold
       ___________________________________________________________________
        
       The Illustrated AlphaFold
        
       Author : dil8
       Score  : 165 points
       Date   : 2024-07-13 15:00 UTC (7 hours ago)
        
 (HTM) web link (elanapearl.github.io)
        
       | inciampati wrote:
       | It's so, so complex! I confess I had a sense of this but had no
       | idea how far it went. We aren't even told which MSA algorithm is
       | used to align the protein sequences.
        
         | flobosg wrote:
         | Input MSAs are generated with jackhmmer and HHblits and further
         | processed, if I recall the AlphaFold paper correctly.
        
         | elanapearl wrote:
         | Hi, I was one of the authors of this! I think we briefly
         | mentioned this in a footnote somewhere (a lot of things got cut
         | or moved to footnotes since it is already so long and we wanted
         | to focus on the ML parts that aren't described elsewhere).
         | 
         | But yes, as @flobosg mentioned, for protein chains they use
         | jackhmmer to search 4 of the databases (the exception is
         | Uniclust30 + BFD, which is searched with HHblits instead), and
         | for RNA chains they use nhmmer to search and then hmmalign to
         | re-align the hits to the query chain.
         | 
         | Hope that helps!
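
The tool selection described in this comment can be sketched as a small lookup. This is only an illustration of the rule as stated in the thread, not AlphaFold's actual pipeline code: the function name `pick_msa_tools`, the chain-type strings, and the placeholder database name in the usage note are all hypothetical, and only Uniclust30 and BFD are named in the comment.

```python
def pick_msa_tools(chain_type: str, database: str) -> list[str]:
    """Return the tools run, in order, for one chain type and one database.

    Hypothetical helper: it only encodes the rule described in the comment.
    """
    # Databases the comment says are searched with HHblits instead of jackhmmer.
    hhblits_databases = {"uniclust30", "bfd"}

    if chain_type == "protein":
        if database.lower() in hhblits_databases:
            return ["hhblits"]
        return ["jackhmmer"]

    if chain_type == "rna":
        # Search with nhmmer, then re-align the hits to the query with hmmalign.
        return ["nhmmer", "hmmalign"]

    raise ValueError(f"no MSA rule described for chain type {chain_type!r}")
```

For example, `pick_msa_tools("protein", "BFD")` selects HHblits, while any other protein database name (say, a placeholder `"some_other_db"`) falls through to jackhmmer.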
        
       | joelS wrote:
       | This is an amazing writeup, thank you. Looking forward to going
       | through it in more detail.
        
       | tomohelix wrote:
       | I consider this a glimpse into how neural networks and "AI"-like
       | technologies will be implemented in the future: lots of
       | engineering and lots of clever manipulation of known techniques,
       | woven together with a powerful, well-trained model at the center.
       | 
       | Right now I think stuff like ChatGPT is only at the first step:
       | building the foundational model that can generalize and process
       | data. There isn't a lot of work going into processing the inputs
       | into something the model can best understand (not at the
       | tokenizer level, but even before that). We have a basic field
       | for this, i.e. prompt engineering, but nothing as sophisticated
       | as AlphaFold exists for natural language or images yet.
       | 
       | People are stacking LLMs together and putting system prompts in
       | to assist this input processing. Maybe when we have some more
       | complex systems in place, we can see something resembling a real
       | AGI.
        
       | great_tankard wrote:
       | This is an awesome writeup that really helped me understand
       | what's going on under the hood. I didn't know, for example, that
       | for the limited number of PTMs AF3 can handle it has to treat
       | every single atom, including those of the main and side chain, as
       | an individual token (presumably because PTMs are very
       | underrepresented in the PDB?)
       | 
       | Thank you for translating the paper into something this
       | structural biologist can grasp.
        
       ___________________________________________________________________
       (page generated 2024-07-13 23:00 UTC)