[HN Gopher] Making the collective knowledge of chemistry open an...
___________________________________________________________________
Making the collective knowledge of chemistry open and machine
actionable
Author : bryanrasmussen
Score : 81 points
Date : 2022-06-14 20:53 UTC (3 days ago)
(HTM) web link (www.nature.com)
(TXT) w3m dump (www.nature.com)
| pfisherman wrote:
| Good luck with that...lol. The ontological / informatics space
| for chemicals is a mess.
|
| To make the collective knowledge of chemistry open and available,
| you need to represent, organize, and index it. This problem is
| not as sexy, but it is orders of magnitude more important.
| convolvatron wrote:
| This is a huge problem, and arguably one of the primary technical
| reasons that 'web 2.0' was such a dud.
| gtmitchell wrote:
| Chemist here. Every few years, someone has the novel idea that we
| should have open data for all chemistry laboratories, so then we
| can do Better Science. And like every other proposal I've seen,
| this one will get approximately zero traction because it doesn't
| address any of the core issues behind why laboratory data is
| currently closed.
|
| I try not to be too pessimistic about it, because it really would
| be great if there were more open chemical data. I just really
| doubt anything could accomplish that without remaking the US
| university research system from top to bottom.
| bjelkeman-again wrote:
| What are the core issues?
| mint2 wrote:
| Probably dealing with enough metadata to capture things like
| the fact that a reaction only works because the supplier of one
| of the reagents used by that lab had ppm copper impurities.
| gtmitchell wrote:
| Off the top of my head:
|
| -Academic researchers are already overworked, underpaid, and
| undertrained. Asking them to spend even more of their time to
| meticulously upload all their notes and data to an electronic
| notebook is going to be an uphill battle.
|
| -Academic scientists live or die by their ability to publish.
| Open data, especially if you're sharing in real time, makes
| you vulnerable to being scooped by competing researchers.
| Even disclosures of data after the fact make it easier for
| others to benefit from work you did with no benefit to the
| ones who collected the data. Given how cut-throat academia
| is, you're also not going to get many researchers on board
| with this idea.
|
| -Interoperability of most laboratory software is poor. People
| have been trying to get laboratory instrument manufacturers
| to support open data standards for years with little success.
| They don't have any financial incentive to allow competitors
| to have easy access to their data.
| Hellbanevil wrote:
| If I were in charge of awarding federal grants, I would
| require the recipients to open-source their data and upload
| everything in an orderly manner.
|
| It would simply be: if you want this money, do the above.
| JPLeRouzic wrote:
| > _Open data, especially if you're sharing in real time,
| makes you vulnerable to being scooped by competing
| researchers._
|
| Why didn't something like standards and patents emerge
| in the scientific world?
| airstrike wrote:
| No economic incentive
| barry-cotter wrote:
| The scientific world rewards people in glory and honor
| much more than money. If you want more money go
| corporate. If you want to reward people more with money,
| then they'll pay less attention to the glory, but that's
| really expensive.
| BenoitP wrote:
| There are initiatives in the EU to require, by law, that if
| it's publicly funded research, then it must be released to the public.
| And there are official guidelines on how to do so:
|
| https://hal.archives-ouvertes.fr/hal-03318932
|
| I believe such an initiative for chemistry could very well
| succeed, even if it takes 10 years.
|
| Hopefully this can percolate to other countries and continents
| too, through the EU's normative power.
| elcritch wrote:
| That could be very valuable. In many ways, it's like materials
| science and parts of chemistry are running on the fumes of
| basic science done at national labs from the 1950s through the
| 70s. Good experimentalists made solid careers doing core
| research without chasing endless grants or the latest fads. It
| seems that pretty much all publicly available chemical and
| materials databases come from that era. Some specialty areas
| have progressed way beyond that, but their data is rarely
| systematically collected unless you're willing and able to pay
| lots of money for private databases. Those private databases,
| of course, are largely built from publicly funded research.
|
| I hope this pans out.
| cellis wrote:
| Can someone with more knowledge of chemistry enlighten me as to
| why chemistry experimentation isn't the killer app for the
| Metaverse, at least for low-order reactions? I know that problems
| like protein folding are prohibitively expensive computationally,
| but surely there's some low-hanging fruit?
| photochemsyn wrote:
| If you're talking about computational modeling of chemical
| reactions, for example getting a computer to figure out a novel
| low-cost synthesis route for an important molecule, well...
| This becomes incredibly complicated very quickly. You're
| generally more likely to get a result using traditional
| experimental methods, with some exceptions, perhaps, for very
| small molecules.
|
| The field of physical inorganic/organic chemistry is one of the
| more difficult ones to build accurate models for. A first step
| is to calculate the electronic structure of products,
| reactants, and possible intermediates, and this blows up fast
| for even moderately complex molecules. A lot of work has been
| done with simpler systems like 2 H2O -> 2 H2 + O2, but even
| that's ridiculously complicated, as you have to model the
| catalyst and the surrounding environment as well, and then get
| the kinetic model right. The computational power required is on
| the supercomputer scale, and the level of background knowledge
| required even to start implementing something like that is
| pretty high. For a taste, see:
|
| https://h2awsm.org/capabilities/dft-and-ab-initio-calculatio...
|
| This is an area where quantum computers may have applications
| (2021):
|
| https://www.energy.gov/science/ascr/articles/quantum-computi...
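One rough way to see how fast the electronic-structure problem "blows up" is to count the configurations in full configuration interaction (full CI), the exact answer within a given basis set. This is an illustrative sketch only, not what the linked groups actually run: practical methods such as DFT scale far more gently, so treat these numbers as a worst case. The function name and the orbital counts in the loop are arbitrary choices made for illustration.

```python
from math import comb


def full_ci_determinants(n_orbitals: int, n_alpha: int, n_beta: int) -> int:
    """Number of Slater determinants in a full CI expansion.

    Spin-up (alpha) and spin-down (beta) electrons are placed
    independently into the same n_orbitals spatial orbitals, so the
    count is the product of two binomial coefficients.
    """
    return comb(n_orbitals, n_alpha) * comb(n_orbitals, n_beta)


if __name__ == "__main__":
    # Closed-shell water: 10 electrons, 5 alpha and 5 beta.
    for m in (7, 13, 20, 40):
        print(f"{m:>3} orbitals: {full_ci_determinants(m, 5, 5):,} determinants")
```

For water's 10 electrons, a minimal basis of 7 spatial orbitals gives 441 determinants, while 13 orbitals already gives 1,656,369, and that is before any catalyst or solvent molecules are added to the model.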
| ur-whale wrote:
| This kind of endeavor should be a common theme to all science,
| not just chemistry.
| shpongled wrote:
| It's certainly a goal to work towards. However, it's pretty
| difficult to build One ELN to Rule Them All given how flexible
| many kinds of biological experimental designs are - especially
| when you're working on the bleeding edge.
|
| A good first step is to require that supplemental materials be
| published in a machine-readable format (e.g., not hand-assembled
| Excel files that lack any kind of normalization or rational
| schema).
| ur-whale wrote:
| But then there are things like GPT-3, which means stashing
| everything in a rigid schema isn't as hard-core a requirement
| as it used to be.
|
| OTOH, facilitating:
|
| 1. access to the raw data
| 2. access to the metadata
| 3. access to the source code of whatever software was used or
| created to run the experiment
| 4. making sure everything is computer-readable (i.e., not a
| 256x128 graph as a PNG embedded in a bloody PDF)
|
| should be a requirement for any scientific publication worth
| its salt.
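The four requirements above can be made concrete as a machine-readable record. The Python sketch below is hypothetical: the class and field names are invented for illustration and do not follow any existing standard (the Open Reaction Database, for instance, defines its own schema), and the example.org URLs are placeholders. It also records reagent supplier, lot, and trace impurities, the kind of metadata needed to capture a reaction that only works because of a ppm-level copper contaminant.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class Reagent:
    # Hypothetical fields: enough to trace batch-to-batch variation.
    name: str
    supplier: str
    lot: str
    impurities_ppm: dict[str, float] = field(default_factory=dict)


@dataclass
class ExperimentRecord:
    raw_data_uri: str              # (1) where the raw instrument files live
    reagents: list[Reagent]        # (2) metadata about what went into the flask
    conditions: dict[str, float]   # (2) more metadata, e.g. temperature, time
    code_uri: str                  # (3) software used to run or analyze the run

    def to_json(self) -> str:
        # (4) asdict() recurses into the nested Reagent dataclasses,
        # so the output is plain, computer-readable JSON.
        return json.dumps(asdict(self), indent=2, sort_keys=True)


# Hypothetical example values, not real experimental data.
record = ExperimentRecord(
    raw_data_uri="https://example.org/raw/run-001.zip",
    reagents=[Reagent("aryl bromide", "Vendor A", "A-0042", {"Cu": 3.0})],
    conditions={"temperature_C": 80.0, "time_h": 12.0},
    code_uri="https://example.org/code/analysis-v1",
)
print(record.to_json())
```

Because every field is named and typed, a crawler could answer a question such as "which runs used reagent lot A-0042?" without parsing prose or digitizing a plot.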
| abraxaz wrote:
| > it's pretty difficult to build One ELN to Rule Them All
| given how flexible many kinds of biological experimental
| designs are - especially when you're working on the bleeding
| edge.
|
| RDF is quite flexible, and using a combination of
| domain-specific ontologies like cheminf[1] and top-level
| ontologies like BFO[2] should allow you to capture most of
| the semantics.
|
| [1]: https://www.ebi.ac.uk/ols/ontologies/cheminf
| [2]: https://en.wikipedia.org/wiki/Basic_Formal_Ontology?wprov=sf...
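RDF's flexibility comes from its data model: every fact is a subject-predicate-object triple, so recording a new kind of observation means adding a predicate rather than migrating a schema. The library-free Python sketch below illustrates the idea under stated assumptions: the `ex:` terms are hypothetical placeholders where a real graph would use IRIs from ontologies such as cheminf or BFO, and a production setup would more likely use an RDF library such as rdflib together with SPARQL queries.

```python
# Each fact is a (subject, predicate, object) triple, as in RDF.
# The "ex:" terms are hypothetical placeholders, not real ontology IRIs.
Triple = tuple[str, str, str]

graph: set[Triple] = {
    ("ex:run42", "rdf:type", "ex:SynthesisProcess"),
    ("ex:run42", "ex:hasInput", "ex:reagentLot7"),
    ("ex:run42", "ex:hasInput", "ex:reagentLot8"),
    ("ex:reagentLot7", "ex:hasImpurity", "ex:copper"),
    ("ex:run42", "ex:hasOutput", "ex:compound9"),
    ("ex:run43", "ex:hasInput", "ex:reagentLot8"),
}


def match(g: set[Triple], s=None, p=None, o=None) -> list[Triple]:
    """Return triples matching a pattern; None acts as a wildcard,
    much like a variable in a SPARQL triple pattern."""
    return sorted(
        t for t in g
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )


def runs_with_impurity(g: set[Triple], impurity: str) -> list[str]:
    """Two-hop join: find lots with the impurity, then runs that used them."""
    lots = {s for s, _, _ in match(g, p="ex:hasImpurity", o=impurity)}
    return sorted({s for s, _, o in match(g, p="ex:hasInput") if o in lots})


# A new kind of fact needs no schema change, just another triple.
graph.add(("ex:compound9", "ex:characterizedBy", "ex:nmrSpectrum3"))

print(runs_with_impurity(graph, "ex:copper"))
```

Answering "which runs used a copper-contaminated lot?" here needs no table designed in advance for that question, which is the kind of flexibility that matters for bleeding-edge experimental designs.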
| apienx wrote:
| "Alchemists turned into chemists when they stopped keeping
| secrets." -- Eric S. Raymond
|
| Open Science (in the publishing sense) used to be fringe just a
| decade ago. It's very much mainstream now.
|
| Open Data will be a much tougher (and long-term) battle, but it's
| inevitable.
| photochemsyn wrote:
| The notion of open-source scientific discovery is a good one, but
| some of the suggestions here seem very unlikely to catch much
| traction, and even if they do, problems will remain.
|
| For example, say an academic chemical research group synthesizes
| a series of novel compounds in the lab - they're not going to
| just release the raw data on everything they did immediately. The
| thinking might be, 'we can give this MS student this compound to
| work out a better synthesis route for, or this PhD student can
| try to extend the synthesis and make other compounds'.
|
| A more realistic scenario mentioned in the article would be to
| require publication of the raw data to a database as a condition
| of publication. This is already done to some extent in journals,
| but materials and methods sections are notorious for leaving out
| some key factor or other, meaning repeatability is an issue and
| other labs will generally only try to replicate the more
| interesting results (possible new antibiotic, etc.).
|
| This worked out fairly well with GenBank, the database of
| published gene sequences, and also with the protein
| crystallography databases, but everyone in the molecular biology
| world knows that not all sequence data is of the same quality,
| so cross-referencing against work by the more reputable
| researchers, and reading papers to see whether the methods are
| transparent and robust, is still an important step. A database
| clogged with low-quality data isn't as valuable as a more
| carefully curated one, certainly.
|
| It would be nice, though, to have a database where you could look
| up everything there is to know about something like the
| antibiotic ciprofloxacin, including all the spectral
| identification data, optimal reaction conditions, etc. But this
| is also a molecule that researchers are busy making derivatives
| of, likely in the hope of patenting a novel knockoff and getting
| an exclusive license distribution deal with a major pharma corp,
| and so they won't be releasing any data, or even publishing in a
| timely manner (at least not until the patent application goes
| through, and maybe not even then).
|
| That leads to a controversial question: should research
| universities and academics financed by taxpayers behave like
| for-profit startups pitching to a VC outfit?
| statuslover9000 wrote:
| For chemical reaction prediction, see the Open Reaction Database,
| a collaboration including the Coley lab at MIT (surprisingly not
| cited by OP):
|
| Paper: https://pubs.acs.org/doi/10.1021/jacs.1c09820
|
| Docs: https://docs.open-reaction-database.org/en/latest/overview.h...
|
| It's an incredible effort to collate and clean this data, and
| even then a substantial portion of it will not be reproducible
| due to experimental variability or outright errors.
|
| For computational methods development it's extremely useful,
| maybe even necessary, to have a substantial amount of money and
| one's own lab space to collect new data and experimentally test
| prospective predictions under tightly controlled conditions. The
| historical data is certainly useful but is not a panacea.
| mlinksva wrote:
| Relatedly (and also not citing it), from a couple of weeks ago:
| https://news.ycombinator.com/item?id=31566200 "Call for a Public
| Open Database of All Chemical Reactions"
| RationPhantoms wrote:
| It would be wonderful to see something like the Materials Project
| (https://materialsproject.org/) but for Chemical
| research/knowledge.
| JPLeRouzic wrote:
| Can someone in the field explain how this "machine actionable"
| approach would be different from Galaxy Pipeline [0] or a
| Chemputer [1]?
|
| [0] https://en.wikipedia.org/wiki/Galaxy_(computational_biology)
|
| [1] https://www.chem.gla.ac.uk/cronin/news/cronin-group-builds-c...
___________________________________________________________________
(page generated 2022-06-17 23:01 UTC)