[HN Gopher] Chai-1: Decoding the molecular interactions of life
___________________________________________________________________
Chai-1: Decoding the molecular interactions of life
Author : glowingvoices
Score : 263 points
Date : 2024-09-10 22:13 UTC (1 days ago)
(HTM) web link (www.chaidiscovery.com)
(TXT) w3m dump (www.chaidiscovery.com)
| throwup238 wrote:
| How hard would it be for a biohacker to use these models to
| develop novel proteins? Let's say I wanted to take GFP and create
| another color fluorescent or something.
| glowingvoices wrote:
| I don't think it'd be too difficult. Train a PLM to generate
| proteins, validate with AF3, and send them off to a lab. You
| might want to read the ESM-3 paper if you're interested in
| stuff like this (not affiliated in any way).
| zan2434 wrote:
| This is both awesome and feels very dangerous to release
| publicly, no? Can't this be used to discover novel bioweapons as
| easily as it can be used to discover new medicines?
|
| Genuinely curious, would love to learn if that isn't true / or is
| generally just not that big of a deal compared to other risks.
| taspeotis wrote:
| This is as unethical as that time JVC released VHS which
| allowed people to record videos but also pirate content!!1
| zan2434 wrote:
| Clear snark aside, content piracy has pretty bounded risks so
| isn't a reasonable comparison
| mmmore wrote:
| You'd have to work at the RIAA to think that piracy and
| bioweapons are comparable.
|
| I don't know how much releasing this model is a delta on
| safety, but we certainly need to do a better job of vetting
| who can order viruses; my understanding is there are very
| few restrictions right now. This will become more
| important as models get more capable.
| IncreasePosts wrote:
| The saving grace of civilization is that, for the most part,
| terrorists are dumb.
| mmmore wrote:
| Unfortunately this is not always true. For example, one of
| the architects of the Tokyo subway sarin attacks[1], Masami
| Tsuchiya[2], had a master's in physical and organic chemistry.
|
| [1] https://en.wikipedia.org/wiki/Tokyo_subway_sarin_attack
|
| [2] https://en.wikipedia.org/wiki/Masami_Tsuchiya_(terrorist)
| pfisherman wrote:
| There is a big gap between a master's and a PhD, and then
| another between a PhD and a seasoned pro. To do something
| like a bioweapon, you would need a reasonably sized team of
| pros w/ a lot of capital intensive infrastructure. It would
| be virtually impossible to do in secret.
| IncreasePosts wrote:
| Yes, a lot of terrorists have engineering degrees also.
|
| But they're also dumb, which is why they think terrorizing
| random people will positively improve the world in some
| direction they care about.
|
| I won't go into details, but I think if I had 19 dudes with
| a death wish in America, and a few million dollars, I could
| do something far worse than 9/11.
| sudosysgen wrote:
| The goal of an attack like 9/11 isn't really to kill the
| maximum number of civilians in order to terrorize random
| people.
|
| The attack had a significant degree of symbolism. The
| intended audience was twofold: the Western public and
| leadership, with a durable message that they weren't
| untouchable (hence the attacks on the Pentagon and
| attempt on the Capitol), hence targeting large landmarks;
| the combination of civilian and military targets was to
| signify that they held the two to be equivalent. Plans
| were actually presented to attack other targets that
| would lead to more casualties, notably a nuclear power
| plant.
|
| The other goal was to incite a religious conflict from
| the Muslim world against the US, and therefore probably
| from the US against as many Muslim countries as possible.
|
| So the primary goal really wasn't to kill as many random
| people as possible (though of course that was a
| consideration), it was actually to target the tallest
| buildings possible as well as the most important
| government institutions.
|
| Unfortunately, it really did move the world in the
| direction they wanted. Despite being extremely evil, they
| actually were remarkably successful at causing the social
| and geopolitical changes they wanted given the resources
| they had, and that caused yet more damage we shouldn't
| ignore. It also bears remembering (especially today) that
| terrorists often and unfortunately aren't as dumb as we
| think, and we underestimate them and simplify their
| motives to our peril.
| matrix2003 wrote:
| We already have some pretty horrific and documented/accessible
| bioweapons.
|
| This gets into the philosophy of restricting access to
| knowledge. The conclusion I keep arriving at is that we're
| lucky that there don't appear to be many Timothy McVeighs
| walking around. I don't think there is a practical defense from
| people like that.
| d_silin wrote:
| ...as difficult as discovering new medicines, you mean?
|
| Chemistry and molecular biology are fiendishly complicated
| fields, far more complex and less predictable than what the general
| public (and most non-biochem STEM majors) imagine them to be.
|
| How do I know? I thought of one brilliant startup idea that
| would solve so many of the world's problems if only we used
| computers to simulate biological systems.
|
| Result: https://xkcd.com/1831/
|
| Reference materials:
|
| https://www.amazon.ca/Molecular-Biology-Cell-Loose-Version/d...
|
| I strongly recommend treating it as an introductory-level text on
| the same level as "K&R - C Programming Language". Yes, all 1464
| pages of it.
|
| https://www.amazon.ca/Fundamentals-Systems-Biology-Synthetic...
|
| On the same level as above text, but with more math.
|
| https://www.amazon.com/Introduction-Computational-Chemistry-...
|
| That or any other book on computational chemistry will give you
| an understanding why it is difficult to design anything of
| value in biological systems. ML can only help so much.
|
| Also check out this page for the scope of the entire field:
|
| https://en.wikipedia.org/wiki/Omics
| dekhn wrote:
| MBoC is more like Knuth's textbooks. It's a towering monument
| to the achievements of humanity over the past 150 years
| (molecular biology proper is less than 100 years old). As
| well as being highly accessible (readable).
|
| It's done in an interesting style, with lots of direct
| references to current literature. I was surprised to see a
| recent edition on IA:
| https://archive.org/details/alberts-molecular-biology-of-the...
| glowingvoices wrote:
| Thank you for the textbooks! I've started studying Molecular
| Biology of the Cell to prepare for undergrad, but this is the
| first time I've heard about the others.
|
| Are there any other books you would recommend?
| d_silin wrote:
| Search for "computational biology" on Amazon, but I'd say
| go first to online courses if you have time and commitment,
| like:
|
| https://www.coursera.org/specializations/bioinformatics
|
| https://www.coursera.org/specializations/systems-biology
|
| Also, check out
|
| https://www.coursera.org/courses?query=computational%20biolo...
|
| Then you will have a better understanding of the subject
| area and the literature to search.
| glowingvoices wrote:
| I'm still in high school, so I don't think I'll have time
| to fit the courses into my schedule. I'll definitely look
| for the books though! Thanks.
| m00x wrote:
| No, it's a very small piece of what you'd need to make
| bioweapons.
| echelon wrote:
| The science to restrict is molecular biology (bacteria) or
| virology, not applied mathematics (AI). These folks can
| _already do_ some wild things with the materials they have on
| hand and don't need fancy AI to help them.
|
| Structure prediction is just one small slice of all of the
| things you'd need to do: choosing a vector, culturing it,
| splicing it into an appropriate location for regulation, making
| sure it's compatible with the environment, making sure your
| payload is conserved, studying the mechanism of infection and
| making sure all of the steps are unimpeded, making sure it
| works with all of the host and vector kinetics, studying the
| pathology, and studying the epidemiology. And that's just for
| starters.
|
| This would require a team and an enormous amount of resources.
| People motivated enough to do this can already do it and don't
| need the AI piece.
| peterldowns wrote:
| If you're implying that the answer is "yes this is too
| dangerous", could you possibly give a few examples of
| technological developments that _aren't_ "very dangerous to
| release publicly" by the same standard?
|
| For instance, would any of the following technologies be
| acceptably "safe"?
|
| - physical locks (makes it possible to keep work secret or
| inaccessible to the government)
|
| - solar power (power is suddenly much cheaper, means bad guys
| can do more with less money)
|
| - general workload computers (run arbitrary code, including bad
| things)
|
| - printing press (ideology spreads much more quickly, erodes
| elite hold over culture)
|
| - Haber-Bosch process (necessary for producing the ammunition
| needed to fight the world wars)
| mmmore wrote:
| You left out the most relevant comparison:
|
| - nuclear fission, which provides an abundant source of
| environmentally friendly energy, but allows people to make
| bombs capable of wiping out whole cities at once (and
| potentially causing nuclear winter)
|
| But even in that case, I believe that it's a good thing that
| we have access to nuclear power, and I certainly want us to
| use more nuclear power. At the same time, I'm very glad that
| a bomb is hard enough to make that ISIS couldn't do it, let
| alone any number of lone wolf terrorists. So I think I would
| apply the same logic to biotechnology; speeding up medical
| progress seems extremely valuable and I'm excited about how
| AF and other AI systems can help with this, but we should
| mitigate the ability for bad actors to use the same tools for
| evil.
|
| An aspect that's unique about biotechnology that's different
| in comparison to the examples you gave is that most of those
| technologies help good and bad people approximately equally,
| and since there are many more reasonable than crazy people
| they're not super dangerous.
|
| There's a concern that technologies that make bioengineering
| easier could make it easier to produce and proliferate novel
| pathogens, much more so than they make it easier to prevent
| pandemics; in other words, it favors "offense" more than
| "defense". The only example you listed that has a similar
| dynamic in my mind is the Haber-Bosch process, but that has
| large positive downstream effects separate from its use for
| ammunition. Again, this is not to say we should stop medical
| progress, but that we should act to mitigate the dangers, and
| keep this concept in mind as the technology progresses.
|
| That said, I'm not certain how much the current tools are
| dangerous in this way. My understanding is that there is
| lower hanging fruit in mitigating these issues right now; for
| example, better controls at labs studying viruses, and better
| vetting of people who order pathogens online.
| dosinga wrote:
| The printing press indeed led to religious wars in Europe.
| The Ottomans banned it and avoided that fate. And the
| progress associated with it.
| dekhn wrote:
| Nobody has really been able to make a convincing argument about
| whether these sorts of tools haven't led to large-scale
| terrorism through bioweapons because the underlying problem is
| hard (for a sufficiently motivated adversary) or because
| terrorists don't have the resources/knowledge/skill. As
| far as we can tell, the sufficiently motivated adversaries who
| have tried either failed, succeeded secretly, or were convinced
| to walk back from the brink due to the potential consequences.
|
| In short there are other ways to negatively affect large
| numbers of people that are easier, and presumably those avenues
| are being explored first. But we don't know what we don't know.
| crackalamoo wrote:
| Not a solution, but maybe if a bad actor tried to create a
| bioweapon, a trusted organization could use this technology as
| an antidote. Unfortunately this still leaves the possibility of
| some kind of insidious, undetectable bioweapon.
| cowsandmilk wrote:
| I think you overestimate the difficulty of discovering
| bioweapons. There is a reason toxicology is the dead end for
| tons of drug molecules. It is very easy already to design
| molecules that will kill someone.
| whymauri wrote:
| As someone who worked in molecular ADMET, this x1000.
| zan2434 wrote:
| This actually makes a lot of sense! Sounds like finding
| dangerous chemicals is easy and is not the actual limitation
| at all.
| emporas wrote:
| Even the word bioweapon is not accurate to describe a deadly
| (or harmful) biological agent. A weapon usually means that
| there is a source of deadly force, and a target. The source
| doesn't want to be hit by the same weapon it uses to hit
| others.
|
| This is vastly difficult to achieve using biology. Any
| organism on the planet has its own agency, and it will hit
| anything to reproduce and eat. In addition this is not
| limited to toxicology and releasing toxins, because the agent
| can just eat tissue.
|
| For example phosphorus has been used in chemical warfare, but
| even that cannot be described 100% as a weapon. The
| phosphorus gas can hit people who released it the same as
| everyone else, it just depends on the wind.
|
| Right now, on everyone's palms, there are thousands of
| organisms which create electricity, eat wood and kill
| animals. Given that the palms are washed, that number is
| reduced to some thousand different species. If the palms have
| not been washed in the last 24 hours, that number shoots up to
| a hundred thousand different species, even millions.
|
| I do not see any difficulty for someone to enhance a harmful
| agent and make it deadly, using just regular computation and
| not even A.I. However, the person who facilitated this will
| be a target too.
| f6v wrote:
| There's still a long way from in-silico prediction to wet-lab
| validation. You need a full-blown molecular biology lab to test
| any of these.
|
| Then again, you can just release existing dangerous pathogens.
| Like, poison a water supply with something deadly. So you don't need a
| new one if you're a terrorist.
| mmmore wrote:
| Does the use of "foundation" and "multi-modal" for describing
| this model mean anything, or are those just used as buzzwords?
| Funnily enough, the only place those terms appear in the paper is
| in the abstract.
|
| Also the paper says they basically copied the methods used for
| AlphaFold, but then included the ability to input language
| embeddings, and input some other side constraints that I don't
| have the biology knowledge to understand. They don't show any
| data that indicate how much these changes improve performance.
| They show a very modest improvement over AF3 (small enough that I
| would think it could be achieved through randomness/small
| variations in the training parameters). So I don't think this is
| very revolutionary, but I suppose it replicates AF3.
| dekhn wrote:
| If by "multi-modal", you mean "it takes several different
| datatypes as input or output", then yes, it's multi-modal. See
| Figure 1 in the Tech Report.
| alexk101 wrote:
| Foundational maybe isn't the best label for this kind of model.
| My understanding of foundational models is that they are made
| to be a baseline which can be further fine tuned for specific
| downstream tasks. This seems more like an already fine tuned
| model, but I haven't looked carefully enough at the methodology
| to say.
| lainga wrote:
| Would you then call it a buzzword, or is there some gentler
| excluded-middle interpretation of that word's application to
| the project?
| brookst wrote:
| It's about like referring to a famous person's red carpet
| attire as "off the shelf [designer name]". It downplays the
| effort that went into it more than anything.
| IanCal wrote:
| I don't think it's a particular buzzword here. They claim
| it's useful across a range of tasks, and that's the key
| part imo.
|
| Now, "predictions for parts of drug discovery" isn't the
| widest range, so perhaps you need to consider "foundation"
| as somewhat context dependent, but I don't think it's a
| wild claim. Neither "foundation" nor "fine tuned" are
| really _better_ than each other, but those are probably the
| two ends of a spectrum here.
|
| My get-out clause here is that someone with a better
| understanding of the field may say these are actually
| extremely narrowly trained things, and the tests are
| equivalent to multiple different coding problem challenges
| rather than programming/translation/poetry/etc.
| ashvardanian wrote:
| There is a pretty noticeable improvement for antibody-antigen
| interactions - looks like double-digit percents. Check out
| figure 4 here:
| https://chaiassets.com/chai-1/paper/technical_report_v1.pdf
| mmmore wrote:
| Figure 4 is comparing the model with itself, unless I'm
| misunderstanding it. The takeaway seems to be the model
| performs better if you give it extra "constraints", i.e.
| extra info already known about the protein.
|
| The table with a comparison to AlphaFold gives a less than
| one percentage point improvement.
| marviel wrote:
| > We are releasing Chai-1 via a web interface for free, including
| for commercial applications such as drug discovery. We are also
| releasing the code for Chai-1 for non-commercial use as a
| software library. We believe that when we build in partnership
| with the research and industrial communities, the entire
| ecosystem benefits.
| dgfitz wrote:
| Is there some sort of betting line I can make money off with all
| this? "-150 a new model isn't released in the next month claiming
| it is currently the best at something" would let me retire years
| early.
|
| If there is another line that said "+500 this model will be
| forgotten and useless in 6 months" could take my retirement from
| years to months.
| anitil wrote:
| I believe Manifold does this sort of thing, though I've never
| used it myself.
| tfehring wrote:
| Manifold [0] has markets on this sort of thing, but it
| primarily uses fake money. (They're working on a real-money
| "sweepstakes" thing, which I'm not super familiar with.) If
| you're outside the US and looking for a real-money market,
| Polymarket [1] is probably your best bet. In the US, real-money
| prediction market contracts are regulated by the CFTC, so
| availability of contracts is pretty limited;
| Kalshi [2] would be the most likely option, but I doubt they
| have anything on this topic.
|
| [0] https://manifold.markets
|
| [1] https://polymarket.com
|
| [2] https://kalshi.com/
| pants2 wrote:
| Your best bet in the US is to use Polymarket with a VPN
| thefourthchime wrote:
| -180 it's a wrapper around alphafold with some pre prompt.
| talldayo wrote:
| > "-150 a new model isn't released in the next month claiming
| it is currently the best at something" would let me retire
| years early.
|
| An optimist and their seed funding are easily parted.
| xianshou wrote:
| In light of last week's fiasco with Reflection
| (https://venturebeat.com/ai/new-open-source-ai-leader-reflect...),
| I hope the community has a newfound enthusiasm for
| independent testing!
|
| This is extremely exciting news if true, so I'm eager to have it
| either confirmed or questioned. The one thing I hope we won't be
| doing is accepting SOTA evals from open-sourced models at face
| value.
| deisteve wrote:
| I don't know how people like Matt Shumer can attempt what
| looks like fraud and deception, have it chalked up to a giant
| oopsie (which isn't really convincing), and not face any
| consequences.
|
| For the rest of us, this is a privilege that we don't have. We
| can't deceive or defraud our investors because it has real
| consequences... but not for people like Matt Shumer. Why is
| that?
| Loughla wrote:
| Having mountains of money, in the US, is equated to being
| smart and better than. This means that failures, unless they
| purposefully exploit other better thans, are always
| forgivable. Even when they're mildly intentional.
| mupuff1234 wrote:
| Pretty sure it's not a US only thing.
| mecsred wrote:
| Just imagine the legal system as a money duel. If you have
| little money you can be crushed at no cost. Trying to fight
| someone with big money, even if you're likely to win, will
| take a lot of time and money. Unless the fraud was black and
| white or you're in for the long haul it's easier just to lick
| the wounds.
| parentheses wrote:
| Does that logic apply to the State, usually the plaintiff?
|
| Doesn't seem so since they have seemingly endless capital
| but have limits in what they can bring to bear. You tell
| me...
| nayroclade wrote:
| "The state" is not a monolith. Anti-fraud enforcement is
| handled by agencies with limited budgets and resources.
| Often they are deliberately underfunded and understaffed
| precisely so they cannot cause too much damage and
| embarrassment by going after really big targets.
| wslh wrote:
| Theranos everywhere? Except you can't afford to mess up when it
| comes to health.
| f6v wrote:
| Oh, pharma messes up all the time.
|
| But it's an interesting question. You can't be too risk-
| averse because there're thousands of patients dying horrible
| deaths every single day. There's simply a need for bold
| approaches in many areas of medicine.
| wslh wrote:
| I'd like to add a perspective that might not resonate with
| everyone, based on the famous quote: "Any sufficiently
| advanced technology is indistinguishable from magic." I
| sometimes adapt this to say: "Any sufficiently advanced
| technology is indistinguishable from a scam."
| pama wrote:
| The title in HN is inaccurate. Having a 1% higher score on one
| metric is not beating a previously published model. This is a
| replication, which is fine enough.
| drob wrote:
| Fwiw, the authors never actually claimed this. From their
| technical report [0]:
|
| > Chai-1 achieves a ligand RMSD success rate of 77%, which is
| comparable to the 76% achieved by AlphaFold3
|
| [0] https://chaiassets.com/chai-1/paper/technical_report_v1.pdf
| dang wrote:
| Ah yes - thanks! We've changed it to the article title now.
|
| Submitters: "_Please submit the original source. If a post
| reports on something found on another site, submit the latter._"
| - https://news.ycombinator.com/newsguidelines.html
|
| (Submitted title was "Chai-1 Defeats AlphaFold 3")
| bbstats wrote:
| the error bars are like 5-10x the size of that 'defeat'
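|
| For a rough sense of scale, here is a minimal Python sketch of the
| sampling error on a benchmark success rate. The benchmark size
| N_ASSUMED is a hypothetical, illustrative number (it is not taken
| from the technical report), and the normal-approximation interval
| is a simplification; only the 77% and 76% figures come from the
| quote above.

```python
import math


def binomial_half_width(success_rate: float, n: int, z: float = 1.96) -> float:
    """Half-width of a normal-approximation confidence interval for a proportion."""
    if n <= 0:
        raise ValueError("n must be positive")
    if not 0.0 <= success_rate <= 1.0:
        raise ValueError("success_rate must be between 0 and 1")
    standard_error = math.sqrt(success_rate * (1.0 - success_rate) / n)
    return z * standard_error


# ASSUMPTION: illustrative benchmark size, not a number from the report.
N_ASSUMED = 400

chai_rate = 0.77  # ligand RMSD success rate quoted for Chai-1
af3_rate = 0.76   # ligand RMSD success rate quoted for AlphaFold3

half_width = binomial_half_width(chai_rate, N_ASSUMED)
gap = chai_rate - af3_rate

print(f"95% half-width at n={N_ASSUMED}: +/- {half_width * 100:.1f} points")
print(f"gap between models: {gap * 100:.1f} points")
print(f"half-width / gap: {half_width / gap:.1f}x")
```

| Under that assumed sample size the interval is several times wider
| than the one-point gap; a larger benchmark would narrow it.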
| trott wrote:
| I'm the author of AutoDock Vina (the most cited docking program,
| and the "runner-up" in the AlphaFold 3 paper)
|
| Docking software is used to scan millions and billions of drug-
| like molecules looking for _new_ potential binders. So it needs
| to be able to generalize, rather than just memorize.
|
| But the evaluation approach used here and in the original paper
| (1) does not test how well the software will perform on _novel_
| molecules, because the test set is related to the training set.
|
| If you understand the basics of ML and physics, you may be
| interested in my detailed critique here:
| https://olegtrott.substack.com/p/are-alphafolds-new-results-...
|
| I'm glad that Chai-1 has been released though, as this will
| probably help people evaluate the method better.
|
| (1) It looks like they are a bit different, as this paper allows
| 40% sequence identity. It's still high. I believe that sequences
| with 40% identity tend to have the same shapes, especially in the
| binding site, where it matters.
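|
| As a minimal sketch of what a sequence-identity cutoff measures,
| assuming the two sequences are already aligned (real pipelines use
| an alignment tool first, and definitions of identity differ in the
| denominator). The function names, the toy sequences, and the way
| the 40% threshold is applied are hypothetical, not the paper's
| actual filtering code.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent of identical residues over aligned columns with no gap in either sequence."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for res_a, res_b in zip(aligned_a, aligned_b):
        if res_a == gap or res_b == gap:
            continue  # skip columns where either sequence has a gap
        compared += 1
        if res_a == res_b:
            matches += 1
    if compared == 0:
        return 0.0
    return 100.0 * matches / compared


def is_allowed_in_test_set(train_seq: str, test_seq: str, threshold: float = 40.0) -> bool:
    """Hypothetical filter: keep a test sequence only if identity is at or below the threshold."""
    return percent_identity(train_seq, test_seq) <= threshold


# Toy, made-up aligned sequences for illustration only.
train = "MKTAYIAK"
test = "MKSAYLAR"
print(percent_identity(train, test))        # 5 of 8 columns match
print(is_allowed_in_test_set(train, test))  # identity is above 40, so rejected
```

| A filter like this only bounds whole-sequence similarity; it says
| nothing directly about how similar the binding sites are.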
| uptownfunk wrote:
| Thanks for your work and also for your comments on AF3 and
| Chai-1. It sounds like you are implying there are potentially
| gross and subtle types of data set leakage taking place
| between the train and test sets, which are resulting in what
| seem to be inflated performance metrics? These are pretty serious
| issues if so. Also I would agree with previous authors that
| marginal improvement over SOTA is proof more that they have
| recreated something than really made significant new progress.
| But this has been an issue with LLMs for some time now. But it
| sounds like they have some bright engineers from good brand
| name companies who are coming together with some VC backing of
| the team to try and do something in this space. I do appreciate
| that the weights are open. I would like to learn more about
| their future direction and their training methods.
| mandoline wrote:
| This is an exciting result - but knowledge of protein structure
| is usually not a limiting factor in drug discovery:
| https://www.chemistryworld.com/opinion/why-alphafold-wont-re...
|
| Would be interesting to try to estimate the impact of results
| like these across the drug development pipeline.
|
| E.g. N% improvement on our most predictive benchmarks X, Y, Z
| could impact clinical success by M% +- E% (where E would likely
| be quite large).
| LarsDu88 wrote:
| I was just working on a small protein diffusion model and felt
| bad when I started copying and pasting quaternion functions from
| pytorch3d to avoid dependency hell.
|
| Lo and behold I see Chai did the same shit in their repo. Lol
___________________________________________________________________
(page generated 2024-09-11 23:01 UTC)