[HN Gopher] Creating the largest protein-protein interaction dat...
___________________________________________________________________
Creating the largest protein-protein interaction dataset in the
world
Author : abhishaike
Score : 51 points
Date : 2024-08-12 14:10 UTC (8 hours ago)
(HTM) web link (www.owlposting.com)
(TXT) w3m dump (www.owlposting.com)
| hirenj wrote:
| The PTM situation is a bit worse actually. First, of all the
| PTMs, high mannose N-glycans can be recapitulated (with the right
| knockouts). It's the complex/hybrid that are completely missing.
|
| Second, the O-glycans are completely different to humans. Unless
| you're looking at alpha-DG, and a handful of other proteins,
| you're going to get the wrong glycosylation. This is a problem
| for two reasons: a) the alpha-Mannose does completely different
| things to the protein backbone compared to alpha-GalNAc, or
| probably alpha-Fucose etc, and b) those yeast PMT enzymes don't
| seem to care where they throw sugars, so they're probably going
| to glycosylate something that shouldn't be glycosylated.
|
| This is to say nothing about the different suite of PC-processing
| enzymes and zymogen activation in yeast too.
|
| So here's my free solution to solve this: On all mated cells, do
| a ConA enrichment and identify where there is O-glycosylation
| (mass spectrometry). If it's on your target protein, drop the
| data?
|
| But otherwise, if you are interested in yeast interaction, looks
| like a cool technique!
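
The filtering step proposed in the comment above could be sketched roughly as follows. This is only an illustration: it assumes the mass spectrometry run has already produced a set of protein IDs with detected O-glycosylation, and the record layout, field names, and function name are all hypothetical, not taken from the article.

```python
def drop_o_glycosylated(records, glycosylated_ids):
    """Keep only records whose target protein has no detected O-glycosylation.

    records: list of dicts with a "target" key (hypothetical layout).
    glycosylated_ids: protein IDs flagged by the ConA + mass spec step (assumed input).
    """
    flagged = set(glycosylated_ids)
    return [r for r in records if r["target"] not in flagged]


# Hypothetical interaction measurements
records = [
    {"target": "P1", "partner": "P2", "signal": 0.8},
    {"target": "P3", "partner": "P2", "signal": 0.4},
]

# Suppose O-glycosylation was detected on P3 only
kept = drop_o_glycosylated(records, {"P3"})
```

Here `kept` retains only the P1 record, since P3 was flagged.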
| abhishaike wrote:
| I actually had no idea that the glycan problem extended beyond
| high mannose N-glycans! Informative comment, thank you
| michelb wrote:
| I think I lost count of how many companies are currently building
| this. I'm not in this field, but are they all very different or
| just trying to be the first to win?
| abhishaike wrote:
| There are many companies in the 'which proteins are in my
| sample' space (Olink, SomaLogic, etc.), but I actually don't know
| of any others in the 'what proteins interact with other proteins'
| space.
| jszymborski wrote:
| This is my field (I'm a PhD student who writes PPI inference
| models).
|
| I skimmed the article and start-up website and I'm a bit
| confused.
|
| PPI inference is not binding affinity prediction is not binding
| site prediction, despite being related tasks.
|
| There are billions of PPI pairs in public datasets, there is much
| less binding affinity data, and even less binding site data.
|
| (Side-note: if you're hiring PPI / deep learning / comp. bio
| people, send me an email at the address in my bio.)
| abhishaike wrote:
| Sorry, yeah, I muddled the definitions a bit; this is focused
| on binding affinity data. Afaict, the primary source for such
| data is https://en.wikipedia.org/wiki/PDBbind_database, which is
| quite small.
| bglazer wrote:
| Genuinely curious, how is binding affinity prediction not PPI
| prediction? Isn't a PPI just a binarization of affinity?
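
The "binarization" idea in the question above can be made concrete with a small sketch. It assumes affinity is reported as a dissociation constant (Kd, in molar, where lower means tighter binding) and that a pair is labeled as interacting when Kd is at or below a cutoff. The 1 µM cutoff, the function name, and the example values are illustrative assumptions, not figures from the thread or the article.

```python
def binarize_affinity(kd_molar, cutoff_molar=1e-6):
    """Label a pair as interacting if its Kd is at or below the cutoff.

    Lower Kd means tighter binding. The default cutoff is an arbitrary
    illustrative choice, not a field standard.
    """
    if kd_molar <= 0:
        raise ValueError("Kd must be positive")
    return kd_molar <= cutoff_molar


# Hypothetical Kd measurements for two protein pairs
measurements = {("A", "B"): 5e-9, ("A", "C"): 2e-4}
labels = {pair: binarize_affinity(kd) for pair, kd in measurements.items()}
```

The label depends entirely on the chosen cutoff, so the same measured pair can switch class when the cutoff moves.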
| celltalk wrote:
| My gut says protein-protein interactions aren't that useful given
| how these interactions scale, and on top of that you have post-
| translational modifications, SNVs etc. It's a very very hard
| problem to solve.
|
| Instead, we can focus on gene-gene interactions and go bottom up.
| There, we don't need new wetlab techniques that need to be
| validated; we can measure mRNA instead. Plus, an average single
| cell contains 40 million proteins, yet mRNA molecules are orders
| of magnitude fewer and can be sequenced with high precision.
|
| If you, for instance, open the KEGG database in graph mode, you
| will see one of the largest manually curated datasets ever. Yet
| it is still tiny! If you imagine A, T, G, C as an alphabet and
| genes as special tokens, all we know as humanity is a couple of
| words... and it's sad.
|
| I think LLMs might be our best bet on these. Given a few words,
| they might uncover "new words" we have never thought about. I
| kinda tried this... but the methodology is still shaky.
|
| https://celvox.co/blog/TCC/index.html
___________________________________________________________________
(page generated 2024-08-12 23:01 UTC)