[HN Gopher] Creating the largest protein-protein interaction dat...
       ___________________________________________________________________
        
       Creating the largest protein-protein interaction dataset in the
       world
        
       Author : abhishaike
       Score  : 51 points
       Date   : 2024-08-12 14:10 UTC (8 hours ago)
        
 (HTM) web link (www.owlposting.com)
 (TXT) w3m dump (www.owlposting.com)
        
       | hirenj wrote:
       | The PTM situation is actually a bit worse. First, of all the
       | PTMs, high-mannose N-glycans can be recapitulated (with the
       | right knockouts). It's the complex/hybrid N-glycans that are
       | completely missing.
       | 
       | Second, the O-glycans are completely different from those in
       | humans. Unless you're looking at alpha-DG and a handful of other
       | proteins, you're going to get the wrong glycosylation. This is a
       | problem for two reasons: a) alpha-mannose does completely
       | different things to the protein backbone compared to alpha-GalNAc
       | (or, probably, alpha-fucose etc.), and b) those yeast PMT enzymes
       | don't seem to care where they put sugars, so they're probably
       | going to glycosylate something that shouldn't be glycosylated.
       | 
       | And that's to say nothing of the different suite of PC-processing
       | enzymes and zymogen activation in yeast.
       | 
       | So here's my free solution: on all mated cells, do a ConA
       | enrichment and identify the O-glycosylation sites by mass
       | spectrometry. If they're on your target protein, drop the data?
       | 
       | But otherwise, if you're interested in yeast interactions, it
       | looks like a cool technique!
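hirenj's proposed fix amounts to a filtering step: once mass spectrometry has identified which proteins carry O-glycosylation, discard the interaction measurements whose target protein is among them. Below is a minimal Python sketch of that filter, assuming the measurements and the set of O-glycosylated proteins are already available as plain Python objects. The names `PairMeasurement` and `drop_o_glycosylated`, along with the protein identifiers, are hypothetical and do not come from the startup's pipeline.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PairMeasurement:
    """One mated-pair readout (hypothetical schema, for illustration only)."""
    bait: str
    target: str
    signal: float


def drop_o_glycosylated(measurements, o_glycosylated):
    """Keep only measurements whose target protein had no O-glycosylation
    detected by the (assumed) ConA enrichment + mass spectrometry step."""
    return [m for m in measurements if m.target not in o_glycosylated]


if __name__ == "__main__":
    data = [
        PairMeasurement("BAIT1", "PROT_A", 0.92),
        PairMeasurement("BAIT1", "PROT_B", 0.15),
        PairMeasurement("BAIT2", "PROT_C", 0.60),
    ]
    # Hypothetical mass-spec result: O-glycans found on PROT_B only.
    flagged = {"PROT_B"}
    kept = drop_o_glycosylated(data, flagged)
    print([m.target for m in kept])
```

The filter only checks the target protein, as the comment describes. A stricter variant could drop a pair when either partner is flagged.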
        
         | abhishaike wrote:
         | I actually had no idea that the glycan problem extended beyond
         | high-mannose N-glycans! Informative comment, thank you.
        
       | michelb wrote:
       | I've lost count of how many companies are currently building
       | this. I'm not in this field, but are they all doing very
       | different things, or just racing to be first?
        
         | abhishaike wrote:
         | There are many companies in the 'which proteins are in my
         | sample' space (Olink, SomaLogic, etc.), but I actually don't
         | know of any others in the 'what proteins interact with other
         | proteins' space.
        
       | jszymborski wrote:
       | This is my field (I'm a PhD student who writes PPI inference
       | models).
       | 
       | I skimmed the article and the startup's website, and I'm a bit
       | confused.
       | 
       | PPI inference is not binding affinity prediction, which in turn
       | is not binding site prediction, even though they are related
       | tasks.
       | 
       | There are billions of PPI pairs in public datasets, but much less
       | binding affinity data and even less binding site data.
       | 
       | (Side-note: if you're hiring PPI / deep learning / comp. bio
       | people, send me an email at the address in my bio.)
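To make jszymborski's distinction concrete, here is a rough sketch of how the labels differ across the three tasks. PPI inference predicts a yes/no interaction for a protein pair. Binding affinity prediction predicts a continuous quantity such as a dissociation constant (Kd). Binding site prediction labels individual residues. The class and field names are illustrative assumptions and do not reflect any particular dataset's schema.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PPIExample:
    """PPI inference: does protein A interact with protein B? (binary label)"""
    seq_a: str
    seq_b: str
    interacts: bool


@dataclass
class AffinityExample:
    """Binding affinity prediction: how strongly do A and B bind? (continuous label)"""
    seq_a: str
    seq_b: str
    kd_molar: float  # dissociation constant; lower means tighter binding


@dataclass
class BindingSiteExample:
    """Binding site prediction: which residues of A sit at the interface?
    (one label per residue)"""
    seq_a: str
    seq_b: str
    interface_mask_a: List[bool]  # one flag per residue of seq_a

    def __post_init__(self):
        # A per-residue label only makes sense if lengths match.
        if len(self.interface_mask_a) != len(self.seq_a):
            raise ValueError("need exactly one label per residue of seq_a")
```

Each step down asks for a more detailed label per protein pair. This plausibly fits the comment's point that each kind of data is scarcer than the one before it.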
        
         | abhishaike wrote:
         | Sorry, yeah, I muddled the definitions a bit; this is focused
         | on binding affinity data. AFAICT, the primary source for such
         | data is
         | https://en.wikipedia.org/wiki/PDBbind_database, which is quite
         | small.
        
         | bglazer wrote:
         | Genuinely curious, how is binding affinity prediction not PPI
         | prediction? Isn't a PPI just a binarization of affinity?
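For reference, this is what binarizing affinity would mean in practice: thresholding the dissociation constant turns a continuous Kd into a yes/no interaction label. The mapping is one-way, because many different Kd values collapse onto the same label. Below is a minimal Python sketch, assuming Kd is given in molar units. The 1 µM cutoff is an arbitrary choice made purely for illustration. The free-energy conversion uses the standard relation ΔG = RT ln(Kd / 1 M).

```python
import math

R_J_PER_MOL_K = 8.314462618  # gas constant, J/(mol*K)
J_PER_KCAL = 4184.0


def binarize(kd_molar: float, threshold_molar: float = 1e-6) -> bool:
    """Call a pair 'interacting' if Kd is at or below the (arbitrary) cutoff."""
    if kd_molar <= 0:
        raise ValueError("Kd must be positive")
    return kd_molar <= threshold_molar


def delta_g_kcal_per_mol(kd_molar: float, temp_k: float = 298.15) -> float:
    """Binding free energy via dG = RT ln(Kd / 1 M); more negative = tighter."""
    if kd_molar <= 0:
        raise ValueError("Kd must be positive")
    return R_J_PER_MOL_K * temp_k * math.log(kd_molar) / J_PER_KCAL


for kd in (1e-9, 1e-6, 1e-3):  # nanomolar, micromolar, millimolar binders
    print(f"Kd={kd:.0e} M  dG={delta_g_kcal_per_mol(kd):6.2f} kcal/mol  "
          f"interacts={binarize(kd)}")
```

The 1 µM cutoff is only a placeholder. Moving it changes which pairs count as interacting, so any binarized dataset implicitly bakes in some such choice.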
        
       | celltalk wrote:
       | My gut says protein-protein interactions aren't that useful given
       | how the number of possible interactions scales, and on top of
       | that you have post-translational modifications, SNVs, etc. It's
       | a very, very hard problem to solve.
       | 
       | Instead, we can focus on gene-gene interactions and go bottom-up.
       | There, we don't need new wet-lab techniques that have to be
       | validated; we can measure mRNA instead. Plus, an average single
       | cell contains 40 million protein molecules, yet the number of
       | mRNA molecules is orders of magnitude smaller, and they can be
       | sequenced with high precision.
       | 
       | If you open the KEGG database in graph mode, for instance, you
       | will see one of the largest manually curated datasets ever. Yet
       | it is still tiny! If you imagine A, T, G, C as the alphabet and
       | genes as special tokens, all we know as humanity is a couple of
       | words... and it's sad.
       | 
       | I think LLMs might be our best bet here. Given a few words, they
       | might uncover "new words" we have never thought about. I kinda
       | tried this... but the methodology is still shaky.
       | 
       | https://celvox.co/blog/TCC/index.html
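celltalk's bottom-up alternative starts from mRNA measurements. One common and simple way to get candidate gene-gene links from such data is a co-expression network. This technique is swapped in here for illustration and is not necessarily what the linked post does. The idea is to correlate each pair of genes' expression across cells and keep the pairs whose correlation exceeds a cutoff. Below is a minimal sketch with NumPy, assuming a cells-by-genes matrix of already-normalized mRNA counts. The gene names, toy counts, and the 0.8 cutoff are all made up.

```python
import numpy as np


def coexpression_edges(counts, genes, min_abs_r=0.8):
    """Return (gene_i, gene_j, r) for gene pairs whose expression is strongly
    correlated across cells.

    counts: cells x genes matrix of mRNA counts (assumed already normalized).
    """
    # log1p dampens the influence of very highly expressed cells
    x = np.log1p(np.asarray(counts, dtype=float))
    # np.corrcoef treats rows as variables, so transpose to genes x cells
    r = np.corrcoef(x.T)
    edges = []
    for i in range(len(genes)):
        for j in range(i + 1, len(genes)):
            if abs(r[i, j]) >= min_abs_r:
                edges.append((genes[i], genes[j], float(r[i, j])))
    return edges


if __name__ == "__main__":
    genes = ["GENE_A", "GENE_B", "GENE_C"]  # hypothetical genes
    counts = np.array([
        [10, 20, 3],
        [20, 40, 1],
        [30, 60, 4],
        [40, 80, 2],
    ])  # 4 cells x 3 genes, toy numbers
    for a, b, r in coexpression_edges(counts, genes):
        print(a, b, round(r, 3))
```

Correlation only flags genes that move together. It says nothing about direction or mechanism, so edges like these are at best candidates for the kind of interactions the comment has in mind.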
        
       ___________________________________________________________________
       (page generated 2024-08-12 23:01 UTC)