[HN Gopher] The business of extracting knowledge from academic p...
       ___________________________________________________________________
        
       The business of extracting knowledge from academic publications
        
       Author : kevin_hu
       Score  : 253 points
       Date   : 2021-12-08 03:49 UTC (1 days ago)
        
 (HTM) web link (markusstrasser.org)
 (TXT) w3m dump (markusstrasser.org)
        
       | anyfactor wrote:
       | > Why purchase access to a 3rd party AI reading engine or a
       | knowledge graph when you can just hire hundreds of postdocs in
       | Hyderabad to parse papers into JSON? (at a $6,000 yearly salary)
       | 
        | I really like jobs that people think AI can do in theory but
        | that it can't really do effectively IRL. Where do I get a part-
        | time gig like that, if I think I am capable of reviewing and
        | creating summaries of non-STEM papers? Except for homework and
        | assignments, of course.
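        | 
        | For concreteness, "parse papers into JSON" presumably means
        | producing records something like this (an invented schema,
        | purely illustrative):
        | 
        |   # one hypothetical extraction record per finding
        |   record = {
        |       "doi": "10.1000/example",
        |       "claim": "Compound X inhibits kinase Y in vitro",
        |       "evidence": {"assay": "IC50", "value_nM": 42},
        |       "hedging": "suggests",  # authors' confidence language
        |       "annotator": "postdoc-017",
        |   }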
        
         | Mezzie wrote:
          | Yeah, you can't outsource that to Hyderabad. You'd need
          | subject knowledge plus very specific English, and possibly
          | other languages depending on the field (not saying Indians
          | can't do this, but I've studied enough languages to know that
          | doing high-level/academic work in a non-native language is
          | hell, even when the language is pitched to students).
         | 
         | And you'd have to know enough about the process and authors to
         | know what makes papers relevant. The metadata matters as much
         | as the data.
        
           | anyfactor wrote:
            | All good points. But you do have to recognize the tradeoff.
            | Has AI come so far that it could perform better than
            | industry-specific human intelligence? And consider that some
            | Indian researchers could review the papers, since they
            | already do that kind of job part-time.
            | 
            | You have to test out both solutions. And since these jobs
            | are treated as contracts, there is no significant commitment
            | in choosing one over the other. We can't be certain one
            | method is better than the other without trying both of them
            | out without prejudice.
            | 
            | I, for one, am agnostic about either choice: AI is overhyped
            | yet has spillover benefits as a marketing/sales point, while
            | offshore human intelligence has a bad rep but could be
            | effective if you have a proper documentation, supervision,
            | and review framework.
        
             | Mezzie wrote:
              | Oh yeah, I was just thinking of the present. In five to
              | ten years, once AI/ML/etc. trickles out of tech/theory
              | spaces and starts to be combined with subject expertise, I
              | think we'll see really interesting things.
             | 
             | The other matter is that an Indian who could review papers
             | that well would also cost more than 6k/year and would not
             | be easily replaceable, which eliminates the main benefit of
             | outsourcing for a company trying to operate in such a way
             | in 2021.
             | 
             | In 2030? I'd say the odds are if somebody in Hyderabad can
             | do that then they can start their OWN company rather than
             | bother with us at all. Honestly, given India's role in
             | pharmaceutical manufacture, I'd be shocked if things like
             | that don't start popping up.
        
       | PaulHoule wrote:
        | 1. The real value is in operational documents such as clinical
        | notes, maintenance records, soldier and police notebooks, etc.
        | This info is proprietary to an organization and its partners and
        | is directly linked to how it produces and pays for value.
        | 
        | 2. Superhuman accuracy at limited tasks is not good enough. For
        | instance, transcribing audio at 95% word-level accuracy would be
        | good for a human, but it means every other sentence is garbled.
        | People communicate despite this because they ask questions. A
        | useful text-to-structured-information tool has to exert back
        | pressure on bullshit, give feedback about what it understands,
        | and push the author to tell a story that makes sense and has
        | adequate detail.
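        | 
        | A quick back-of-envelope for that 95% claim, assuming ~15 words
        | per sentence and independent errors (a sketch in Python):
        | 
        |   p_clean = 0.95 ** 15  # chance a sentence has no errors
        |   print(f"{p_clean:.2f}")  # ~0.46, so roughly every other
        |                            # sentence contains an error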
        
       | PaulHoule wrote:
       | When you count all the ways to be wrong, the median scientific
       | paper is wrong.
       | 
        | In biomedical fields, they dismiss more than half of papers out
        | of hand when they do a Cochrane meta-analysis. That raises the
        | question of why such papers (which aren't fit to extract
        | knowledge from) are published or funded at all.
       | 
       | I got a PhD in theoretical physics and routinely found that
       | something was wrong on page 23 of a 50 page calculation and
       | nobody published anything about it in 30 years. Possibly the
       | whole body of work on string theory since 1980 [most of hep-th]
        | is a pipe dream at best. Because young physicists have to spend
        | the first third of their career in a squid-game fight for
        | survival, not to fathom the secrets of the universe but to
        | please their elders, we get situations like Stephen Hawking's
        | absurd idea that information gets lost in a black hole. (E.g. if
        | you believe that, you aren't even going to try quantum gravity.)
        
       | throwaway984393 wrote:
       | Science is not about innovation. Science is about tiny little
       | results that by themselves have no immediate benefit, but slowly
       | improve our overall understanding, and eventually lead to an
       | unexpected benefit. Science is not developed in order to solve a
       | business problem - it is purely an advancement in overall
       | knowledge of the world (the traditional aim of natural
       | philosophy). In this sense, science is not compatible with
       | business interests.
        
       | Nalta wrote:
       | For anyone interested, my whole PhD was in biomedical hypothesis
       | generation! I think the most "serious" attempts at building these
       | systems have been focused around providing assistance to
       | scientists, and not just coming up with new ideas on their own.
       | 
        | Here's an actual medical paper that my first system, Moliere, was
       | able to help discover:
       | 
       | https://link.springer.com/article/10.1007/s11481-019-09885-8
        
         | julienchastang wrote:
         | Is your PhD thesis online anywhere?
        
           | Nalta wrote:
           | https://sybrandt.com/documents/dissertation.pdf
        
       | Kydlaw wrote:
        | Interesting insights, particularly on the business aspect. But I
        | am not surprised by the outcome; as the author said, nobody
        | wants to pay for what is proposed in academia. Everybody is
        | already more or less struggling with funding, so nobody wants to
        | add extra fat to their funding requests.
        | 
        | Coming from CS, something I would really like to see, though, is
        | a tool that would summarize a scientific area/domain. Something
        | that would kill literature reviews and/or would provide an
        | overview of the hot topics/open questions in different areas.
       | 
       | Edit: corrections
        
       | PaulHoule wrote:
        | This post touches on one aspect of it, which is that the source
        | material is bad, but it doesn't even start on the fact that the
        | tools aren't good enough and that many of the fashionable ideas
        | (e.g. word embeddings) are dead ends.
        
       | tomlue wrote:
       | Knowledge extraction is weird. Just because I extracted some
       | knowledge doesn't mean that I now 'have' that knowledge.
       | 
       | The better use case for this is teaching, not creating knowledge
       | bases that nobody will use.
        
       | holub008 wrote:
       | > Close to nothing of what makes science actually work is
       | published as text on the web
       | 
       | Unless there's some nuance I missed, I immensely disagree with
       | this statement.
       | 
       | I'm currently in the biomedical literature review space, and I
       | appreciate the detailed insights. I wonder if the author
       | considered that literature review is used in a wide variety of
       | domains outside pharma/drug discovery (where I perceived their
       | efforts were focused). Regulatory monitoring/reporting, hospital
       | guideline generation, etc.
       | 
       | This is a billion dollar industry, and I couldn't agree more that
       | it's technologically underdeveloped. I do not agree that AI-based
        | extraction is the solution, at least in the near term. The
        | formal methodologies used by reviewers/meta-analysts (search
        | strategy generation, lit search, screening, extraction, critical
        | appraisal, synthesis/statistical analysis) are IMO more nuanced
        | than an AI can capture; they require human input or review. My
        | business is betting on this premise :)
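        | 
        | To make the synthesis/statistical-analysis step concrete, here
        | is a minimal inverse-variance fixed-effect pooling sketch in
        | Python (toy numbers, not from any particular review):
        | 
        |   import math
        | 
        |   def fixed_effect_pool(effects, ses):
        |       # weight each study by its inverse variance
        |       w = [1.0 / se ** 2 for se in ses]
        |       pooled = sum(wi * e for wi, e in zip(w, effects)) / sum(w)
        |       return pooled, math.sqrt(1.0 / sum(w))
        | 
        |   # three hypothetical studies: log odds ratios and std errors
        |   pooled, se = fixed_effect_pool([0.30, 0.10, 0.25],
        |                                  [0.12, 0.20, 0.15])
        |   print(f"pooled = {pooled:.3f}, 95% CI +/- {1.96 * se:.3f}")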
        
       | woliveirajr wrote:
       | > This post is about the issues with semantic intelligence
       | platforms that predominantly leverage the published academic
       | literature.
       | 
        | I was happy to see a post that clearly states its purpose.
       | 
       | edit: misspelling
        
       | [deleted]
        
       | cousin_it wrote:
       | I can confirm that in my current area of interest (how to
        | synthesize a cello or saxophone sound), there are hundreds of
        | academic papers published over decades; each of them says "our
        | method sounds more realistic than others", but code and audio
        | samples are never available, and verbal descriptions always skip
        | crucial details. I have no doubt that academics have a ton of
        | expertise, but their output in paper form is basically unusable;
        | I'm not sure it achieves any purpose besides resume padding.
       | Reading a forum of synth hobbyists is a hundred times more
       | useful.
        
         | dekhn wrote:
         | Hey, that unusability of papers is a form of job security.
         | 
         | Seriously though, you're totally right. I got very dissatisfied
         | with science when I realized that many people were effectively
         | publishing unreproducible crap created by terrible code.
         | Fortunately, more and more people are learning how to recognize
         | the crap.
        
         | tchalla wrote:
         | > I have no doubt that academics have a ton of expertise, but
         | their output in paper form is basically unusable, I'm not sure
         | it achieves any purpose besides resume padding.
         | 
         | If you think the entire field of academia doesn't achieve any
          | purpose, you may want to reconsider your position. Most
          | likely, almost everything that you do today on a computer
          | began as an academic paper. Yes, it was without code and data.
          | Yet it was not unusable, and it achieved more than enough
          | purpose.
         | 
         | The average comment on HN on academia comes from a mindset
         | where everyone wants a product. The purpose of a paper is NOT
         | to release a software or a product. But, to test an idea under
         | some assumptions. That's what all research does at its core -
         | formulate a hypothesis, design an experiment to test the
          | hypothesis, and report the results and implications. Are all
          | research papers perfect? No. Are all of them usable? No.
         | 
         | Your use case - sound synthesis for a specific instrument - may
         | not be a scientific challenge. It is however an engineering
         | challenge and hence, you found a better answer amongst
          | hobbyists and tinkerers. Now, try looking for a vaccine for
          | Covid - and guess where you'd find that answer? In decades of
         | research on mRNA with repeated failures, papers that couldn't
         | be replicated, unavailability of "code" and samples with verbal
         | descriptions skipping crucial details.
        
           | tvhahn wrote:
            | Balaji Srinivasan had a good take on this recently in his
            | conversation with Tim Ferriss. I quote:
           | 
           | "The thing is, I don't care if something has a thousand
           | retweets, what I care about is if it has two or three
           | independent confirmations from economically dis-aligned
           | actors. This is the same as academia, by the way, everybody's
           | optimizing citations. What you actually want to optimize is
           | independent replication. That's what true science is. It's
           | not peer review. It is physical tests."
        
             | czzr wrote:
             | Yes and no. Literal replications are less valuable than
             | people think - what you really want are independent tests
             | of different parts of the causal network of the underlying
             | model.
        
         | jakub_g wrote:
         | I'm an outsider but it seems to me the difference between
         | academia and opensource/hobby forums is massive:
         | 
         | In opensource the attitude is "See bug? Send a PR!"
         | 
          | Whereas academic papers are like publishing software into a
          | blockchain (and not source but binaries, i.e. PDFs full of
          | shortcuts): you don't want people to easily find bugs and
          | contribute fixes, so you handwave a lot so that no one can
          | reproduce your exact thing.
        
           | remram wrote:
           | The biggest difference IMHO is when comparing to something
           | like Wikipedia or Stackoverflow. I wish the fabric of
           | scholarly communication similarly allowed for browsing
           | reviews, updating papers, commenting with new references,
           | etc.
        
             | [deleted]
        
             | Mezzie wrote:
             | I think this is a valuable idea. There are online archives
             | that allow for paper updating for academics, like SSRN, but
             | as a CONSUMER of academic literature, the land is pretty
             | barren.
             | 
              | The difficulty in such a thing would be that the journals
              | and database companies are holding on to their exclusivity
              | and profit motives with an iron fist, so unless you want
              | to get sued into oblivion, you'd have to stick with open-
              | source or accessible articles; you'd need to specialize in
              | disciplines that have moved far enough away from closed-
              | source publishing that the tool wouldn't have massive
              | holes in it.
             | 
              | There's also determining which new references and reviews
              | are relevant (if anybody can comment with new references,
              | who goes through to check that they're actually relevant
              | and say what the person claims they say?), preventing
             | academics/administrators from gaming the system if it DOES
             | get popular, etc. In open source, this is crowd-sourced,
             | but for some academic fields the number of people who are
             | qualified to speak on a matter is extremely small.
             | 
             | /academic librarian thoughts
        
             | kwertyoowiyop wrote:
             | Now THAT might be a realistic technical goal & business
             | opportunity.
        
               | Mezzie wrote:
               | The legal costs make this a non-starter unless it's done
               | by a giant company. Who would, in my opinion, ruin it,
               | and the odds of enough academics complying with a big
               | tech company are small imo.
               | 
                | It'd be viable for fields that don't use/rely on for-
                | profit or closed journals, but I don't know if the money
                | to run it would be there, especially since the odds of
                | the big Schol Comm players suing are still there; it'd
                | be worth it to them to ruin the tool/effort before it
                | can challenge them.
               | 
               | Building this would be my dream job, but hahaha no.
        
           | kovvy wrote:
           | Generally, anyone writing a paper about something that could
           | benefit from bugfixes would love to accept them, but doesn't
           | have the time or resources to actually do so - unless there's
           | another paper in it. If they have somehow managed to find
           | enough personal time to have a hobby project, then they
           | probably do accept bugfixes - and you should get them in
           | before that person burns out.
        
             | Fomite wrote:
              | It also doesn't happen enough to design for - I once
              | presented a fairly contributor-friendly open-source
              | project at SciPy that I hoped would be compelling (it was
              | about modeling the zombie epidemic), actively asked for
              | help, and had set up a couple of open requests of varying
              | levels of complexity.
             | 
             | I think there was one pull request total?
             | 
             | The juice just didn't end up being worth the squeeze.
        
           | zozbot234 wrote:
           | > In opensource the attitude is "See bug? Send a PR!"
           | 
           | More like "What works: You tell me!" and "Kindly fix this bug
           | plz sar."
        
         | kkylin wrote:
         | I'm an academic (applied math) and want to respond to this:
         | academic papers are the way they are for lots of reasons, many
         | of which (not so good) have been mentioned on HN. There are a
         | couple that I do not see very often however:
         | 
         | (1) Many academics aren't aware non-academics read their papers
         | at all: we work with other academics, go to conferences with
         | other academics, and on the rare occasions we hear from
         | readers, it's from other academics. Big exception: in some
         | fields academia and industry have much more interaction,
         | biomedical research (the subject of the linked article) being
         | one of them. Extracting knowledge from that literature has a
         | large number of practical and economic implications.
         | 
         | (2) There seems to be a perception that published papers are a
         | repository of established or state-of-the-art knowledge.
         | Perhaps they were meant to be that way, and perhaps more of
         | them should be. But for many journals in many fields,
         | publications are a form of moderated discussion. Reconstructing
         | the state of knowledge from snippets of conversation is always
         | going to be hard.
         | 
          | What can help make the literature more accessible? Some of the
          | forces are structural, some are due to current limitations of
          | technology. But one thing that can help: if you find the
          | results of a paper interesting and are able to track down the
          | authors, write to them. People like hearing that their work is
          | noticed, and they like talking to people about things they're
          | interested in.
         | 
         | Another is to make constructive suggestions (or even pitch in
         | to improve code where it's open source & available). Between
          | teaching, advising, committee work, etc. (not to mention
         | family), most of us have to prioritize, and as much as I'd like
         | to clean up old code for release in the hopes someone finds it
         | useful, it isn't going to get my grad students out the door
         | with a degree or a job -- I'm generally spending more time on
         | their research problems these days than my own. But if I know
         | there's interest / use I might prioritize time a little
         | differently.
        
           | Tomte wrote:
           | > What can help make the literature more accessible?
           | 
           | Review articles, sometimes called surveys.
           | 
            | I've always thought that new PhDs would be excellent authors
            | for those, having digested lots of literature for their
            | dissertations.
        
             | lnwlebjel wrote:
              | Also, I believe there is a hierarchy that goes something
              | like: academic papers -> review articles -> specialized
              | books -> textbooks.
             | 
              | The text changes to fit the audience, and the knowledge
              | becomes more accepted (and/or more fundamental) further
              | down the line.
        
             | wheelinsupial wrote:
             | > Review articles, sometimes called surveys.
             | 
              | Is this field-specific? I have read survey articles in math
             | and biology, and was told by some of my profs that they use
             | these articles as an introduction to a new field.
             | 
             | A quick Google search seems to show these exist in CS
             | (along with tutorial papers), physics, and chemistry but
             | I'm having a little difficulty finding statistics survey
             | papers (survey methods come up instead).
             | 
             | Is the problem that there aren't enough of them or they are
             | behind paywalls?
        
               | tenkabuto wrote:
               | For Stats, check out
               | https://www.annualreviews.org/journal/statistics
               | 
               | Please suggest others if you find them.
               | 
                | Annual Reviews has a bunch of journals for surveys of
                | various fields. Most of them are paywalled, but there
                | are ways around that.
        
             | ska wrote:
             | > I've always thought that new PhDs
             | 
             | A well written PhD or MSc thesis is often the best way into
            | a new field, ime. If the committee is good on this aspect,
            | they'll insist you've put enough detail in for someone to
            | follow along, mostly self-contained.
        
           | markusstrasser wrote:
           | You hit the nail on the head. Will put some of that in the
           | appendix of the post!
        
         | captainmuon wrote:
         | I may be a bit cynical, but at least in my former field
          | (experimental physics), the main purpose of papers seems to be
          | to "lock in" a finished achievement. You do the actual
          | research, pass internal reviews and peer review, and then
          | publishing the paper is just to make it "official".
          | Unfortunately, many papers
         | are never expected to be read. The crucial information exists,
         | but you usually get it from personal communication, internal
         | wikis, or review articles. You just need the paper to copy a
         | formula or graph, and to cite it in the end.
         | 
         | There _are_ papers that are well-written and useful, but there
         | are at least as many that are just drivel (I probably
         | contributed to both kinds).
         | 
         | Unfortunately, the prevailing attitude is that outside people
         | will not understand our stuff anyway, so we often make no
         | effort to make papers understandable, or to publish data.
         | (There is a lot of great outreach and science communication,
         | but not so much for students or researchers from other fields
         | who want to follow the technical details.)
        
           | dsizzle wrote:
           | Counterpoint: citations are a valuable currency in science.
           | Arguably one of the best ways to earn citations is to do good
           | work and write clear papers.
           | 
           | Not saying incentives are perfectly aligned -- many citations
           | are superficial ("this topic was studied before"), and papers
            | count for a lot even if they're never cited, etc.
        
           | temporaryi3 wrote:
            | I did my PhD in experimental physics, and I have to say that
            | my realisation of this larger point, that papers are little
            | more than resume padding to lock in an achievement, was a
            | significant contributor towards destroying, and I use
            | destroying seriously here, any faith or trust that peer
            | review or publishing has anything at all to do with the
            | scientific method.
           | 
           | Your results replicate, or they don't. Your calculations,
           | equations, and models predict experiment. Or they don't.
           | 
           | Writing papers about it and getting the feedback of "peers"
           | is nothing more than an old fashioned circle jerk for padding
           | resumes, CVs, and persuading other people in that academic
           | hierarchy that you deserve funding. It is a game that is
           | divorced from actually learning, researching, understanding,
           | measuring, and predicting the world.
        
             | orbifold wrote:
              | In academia there is always a difference between the way
              | results are advertised and the conclusions that are drawn
              | internally. This is more true in some fields than others;
              | I'm most familiar with it in ML and physics. Part of your
              | skill as a researcher is to understand, based on
              | omissions, the datasets, etc., the quiet part that isn't
              | said out loud. Depending on how you sell things you can
              | get a Nature/Science paper with confusing, inconsistent
              | terminology and a hand-rolled C++ implementation, provided
              | you are the first, while another method that might be
              | 1000x faster will only make it into PRL (yes, I'm thinking
              | of two specific papers, but won't say which).
        
             | cyanydeez wrote:
             | There's probably space for a startup that properly archives
             | the technical nature of findings.
        
             | paufernandez wrote:
             | +1
        
         | [deleted]
        
         | ska wrote:
         | > but their output in paper form is basically unusable
         | 
         | Others have commented as well but I will reinforce: their
         | output is basically unusable for you for the purpose you want
         | to put it to.
         | 
         | Which is fair, but you should also recognize that you are not
         | the audience of the papers and for good or for ill the system
         | is not set up to help you with this.
        
       | javajosh wrote:
       | Don't know much about this industry but yes, it feels like one of
       | those industries that sprang up because one person with money
       | said, "Hmm, sounds like a good idea," and then other people with
       | money and FOMO joined in. When this happens past a certain level
       | you get a miniature innovation bubble (MIB)!
       | 
       | (At least MIBs are rather harmless, at least in the long run, and
       | can actually yield some benefit: innovative people are drawn to
       | these types of industries and inevitably create cool things as a
       | by-product of their work.)
        
       | pezzana wrote:
       | > My biggest mistake was that I didn't have experience as a
       | biotech researcher or postdoc working in a lab.
       | 
        | That is a big problem - good to recognize it as such.
       | 
       | I can tell because the article, though lengthy, never seems to
       | state an explicit problem to be solved. Rather, various ways to
       | apply technology to a field are discussed.
       | 
       | This is a recipe for failure. You need 3 things:
       | 
       | 1. a problem to be solved
       | 
       | 2. a customer who has that problem
       | 
       | 3. money in the customer's pocket waiting to be transferred to
       | yours
       | 
       | The article never even gets to (1).
       | 
       | Regarding (2), if academic groups are the target customer, you're
        | going to have a bad time. They have little money and they tend
        | to be all too happy to build something that sort-of replicates
        | the commercial product you've created for them.
       | 
       | This leaves scientific for-profit companies. They have lots of
       | problems (and these days money), but these problems tend to be
       | quite difficult to discover and solve because of the extensive
       | domain and industry knowledge required.
        
         | tvhahn wrote:
          | Yes. I wonder what would have happened if the author had first
          | worked on the patent side (I'd be interested to hear more
          | about this idea). Perhaps working on patents first would have
          | been a path to gaining experience (and product-market fit).
          | From there, one could branch out into other domains (e.g.
          | bio).
        
       | toss1 wrote:
       | And the bottom line is:
       | 
       | >>... nothing of it will go anywhere.
       | 
       | >>Don't take that as a challenge. Take it as a red flag and run.
       | Run towards better problems.
       | 
       | Wow, speaking of the value of negative results, that is hugely
       | valuable! Could easily save person-decades of work & funds for
       | more productive results.
       | 
        | The insights that the most relevant knowledge is not written
        | into the publications (for a variety of reasons), that the
        | little that is written is of limited use to the target audience,
        | and that even when it is useful it is a small part of the
        | workload (i.e., not a real pain point), are key to seeing that
        | the entire category of projects to extract & encode such
        | knowledge is doomed.
        
       | bryanph_ wrote:
       | One thing that strikes me about most academic knowledge tools is
       | that they seem to focus on parsing the current set of academic
        | literature and producing supposedly interesting insights out of
        | it (which quickly tends to snowball into wanting some kind of
       | generalized model for knowledge as a whole). What I think is much
       | more interesting is creating tools that help people create better
       | academic writing in the first place (thinking tools if you will).
        | This is, however, much more a UX problem than a pure engineering
        | problem. That is why I think we see many more tools in the
        | knowledge-extraction space: most academics thinking about these
        | kinds of things probably have an engineering background. That,
        | combined with the fact that we seemingly all want to throw
        | machine learning at any problem we encounter.
        
         | eurasiantiger wrote:
         | It likely wouldn't take much to craft an "arXiv Copilot" out of
         | GitHub Copilot.
        
           | TOMDM wrote:
           | "It's well understood how to"
           | 
           | And
           | 
           | "It likely wouldn't take much to"
           | 
            | Are worlds apart in this case; training and deploying models
            | at that scale is a huge investment, even if you already had
            | all the code and cleaned training data.
        
             | Mezzie wrote:
             | Can confirm: This is my main tech interest at the moment
             | and if I consider how long it's going to take, I want to
             | die.
        
         | a_bonobo wrote:
         | As an ECR with English as a second language, the paid version
         | of Grammarly has clarified my writing quite a bit. I think
         | there's more unexplored value in this space.
        
         | grlass wrote:
         | I recall seeing a Show HN post a while back about a research
         | focussed web browser that helps as a thinking tool:
         | 
         | https://news.ycombinator.com/item?id=28446147
        
           | totetsu wrote:
           | It's amazing what you miss on HN when you skip a day
        
           | Hard_Space wrote:
           | A sign-in/sign-up necessary just to see the browser in
           | action? Hard pass.
        
             | beauzero wrote:
             | https://www.loom.com/share/93c7c0012f514c37b58a42fa65badc88
        
         | civilized wrote:
         | To your point but even more general: the ML/AI space is far too
         | focused on replacing people rather than helping people. There
         | is a suffocating cultural conceit that we are on the verge of
         | general AI and oh my gosh what will the humans do, we better
         | institute universal basic income right away, etc.
         | 
         | What a joke.
         | 
         | Try to help humans think better first. If you succeed at that,
         | you _might_ be on the right track towards developing cold
         | fusion, er, general AI.
        
           | urthor wrote:
            | Unfortunately, you'll run into the hard fact that in the
            | ML/AI space you get almost zero points for building
            | something.
            | 
            | You get a whole lot of points for discovering something,
            | designing something, or a proof.
            | 
            | But there's a very large number of people focused entirely
            | on aims that are very, very distant from actually making
            | human lives genuinely better.
           | 
           | Mostly because everyone quietly understands all the
           | extraordinarily complicated mathematics is actually
           | extraordinarily complicated.
           | 
           | Hence the ROI isn't worthwhile.
        
             | geoduck14 wrote:
             | >But there's a very large amount of people focused entirely
             | on aims that are very, very distant from actually making
             | human lives genuinely better.
             | 
             | I can't speak to _each and every person_ working on ML, but
             | I thought I would share a fun use case I ran across the
             | other day.
             | 
             | There is a business in some foreign country that is similar
             | to Uber Eats: customer goes to an app, browses for food
             | from various restaurants, orders, it gets delivered.
             | 
              | The business was using ML to help the restaurants: the
              | restaurants upload a pic of the dishes, a title, and a
              | description (usually all from an existing menu). The
              | business would parse the description to guess at what was
              | in the dish, scan the picture to guess at the quantity of
              | food (entree, side, dessert, etc.), and compare
              | ingredients against publicly available nutrition info. Now
              | the end consumer can do things like search for gluten
              | free, vegetarian, pork free, <300 calories, dessert, etc.
             | 
             | Almost all of this was "possible" before, but it would have
             | required enormous effort from the restaurants inputting the
             | data or customers reading each item. Now it is "easy", and
             | it actually helps the end customers - and the restaurants.
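              | 
              | A toy version of the description-parsing step might look
              | like this (hypothetical rules; the real system described
              | above also used image models and nutrition data):
              | 
              |   DIET_RULES = {
              |       "vegetarian": {"chicken", "beef", "pork", "fish"},
              |       "pork free": {"pork", "bacon", "ham"},
              |       "gluten free": {"wheat", "flour", "bread", "pasta"},
              |   }
              | 
              |   def tags_for(description: str) -> list[str]:
              |       text = description.lower()
              |       # a tag applies if none of its triggers appear
              |       return [tag for tag, words in DIET_RULES.items()
              |               if not any(w in text for w in words)]
              | 
              |   print(tags_for("Grilled chicken with rice"))
              |   # -> ['pork free', 'gluten free']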
        
               | oldsecondhand wrote:
               | Guessing allergen content sounds like a disaster waiting
               | to happen.
        
               | auggierose wrote:
                | Sounds like this would be an epic fail. I mean, just add
                | one more spoonful of oil, and your calorie guess is
                | totally off. This is a clear-cut case of SEEMINGLY
                | helping. It certainly does not help the end customer. It
                | might help the restaurant, as they don't care whether
                | the customer is receiving valid information, as long as
                | they are buying.
        
         | Mezzie wrote:
         | I agree.
         | 
         | I'm an academic librarian, and they're completely different
         | ways of working: When I do academic work, I (ideally) have to
         | take my time and I'm not supposed to present my work until it's
         | developed enough that I'm confident it presents a substantial
         | improvement; I have to prove that it's worth a colleague's time
         | to engage with by meeting certain requirements.
         | Coding/developing, on the other hand, requires a lot more back
         | and forth, a lot more "I don't know", and is more immediate in
         | a way I find very satisfying.
         | 
         | I would LOVE to see more back and forth between engineers and
         | academics in terms of ways of working; I think there's a lot of
          | benefit to be gained there: Tech tends to not consider the
          | future as much as it should, but the academics could really
          | benefit from doing what you mentioned and improving the system
          | they work in rather than accepting it.
         | 
          | One of the things I'm trying to do is get better at/learn some
          | ML so I can play around with turning the things I learned in
          | grad school into useful tools, but I'm a single journeyman dev
          | doing this in my spare time, so the odds of anything actually
          | useful coming out of it are small.
        
         | Quanttek wrote:
          | Exactly! Assist me in my process of researching and writing,
          | and use what I have already done, e.g. in filing and
          | classifying papers in EndNote. It's interesting that the
          | author seemed to have a similar idea for a brief second but
          | then tossed it away:
         | 
         | > _similar: an app that pops up serendipitous connections
         | between a corpus (previous writings, saved articles, bookmarks
         | ...) and the active writing session or paragraph. The corpus,
         | preferably your own, could be from folders, text files, blog
         | archive, a Roam Research graph or a Notion /Evernote database._
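          | 
          | A minimal sketch of that "serendipitous connections" idea,
          | with TF-IDF standing in for fancier embeddings (the corpus
          | contents here are made up):
          | 
          |   from sklearn.feature_extraction.text import TfidfVectorizer
          |   from sklearn.metrics.pairwise import cosine_similarity
          | 
          |   corpus = ["notes on mRNA delivery", "old blog post on NLP",
          |             "bookmark: survey of knowledge graphs"]
          |   active = "draft: extracting knowledge from biomedical papers"
          | 
          |   # rank the corpus against the paragraph being written
          |   vec = TfidfVectorizer().fit(corpus + [active])
          |   sims = cosine_similarity(vec.transform([active]),
          |                            vec.transform(corpus))[0]
          |   for score, note in sorted(zip(sims, corpus), reverse=True):
          |       print(f"{score:.2f}  {note}")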
        
       | acomjean wrote:
        | A lot of papers are annotated by hand or with computer
        | assistance.
        | 
        | For medical papers, MeSH terms:
        | https://www.nlm.nih.gov/mesh/meshhome.html
        | 
        | Gene information is extracted by FlyBase/WormBase ...
        | 
        | It's time-consuming, expensive, and probably not perfect, but
        | for certain types of papers it makes searching better.
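        | 
        | For example, the MeSH terms attached to a paper can be pulled
        | from PubMed via the NCBI E-utilities; a sketch (the PMID here is
        | arbitrary):
        | 
        |   import urllib.request
        |   import xml.etree.ElementTree as ET
        | 
        |   pmid = "31098771"  # hypothetical PubMed ID
        |   url = ("https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"
        |          f"efetch.fcgi?db=pubmed&id={pmid}&retmode=xml")
        |   with urllib.request.urlopen(url) as resp:
        |       tree = ET.parse(resp)
        | 
        |   # each MeshHeading element carries a DescriptorName
        |   for heading in tree.iter("MeshHeading"):
        |       print(heading.findtext("DescriptorName"))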
        
       | Throwaway197401 wrote:
        | The problem with academic publications is that they are
        | compliance-based.
        | 
        | The peer-review process is not a scientific process but a
        | publishing process, and it serves as an unfortunate gatekeeper.
        | 
        | This gatekeeping has done so much damage to the scientific field
        | that it's hard to see any way out of it in its current form.
        | 
        | The biggest problem is that peer review gives the paper a stamp
        | as if it's been approved by some higher scientific standard. And
        | that leads us to the very unhealthy ideas of "follow the
        | science" and "the science is settled".
        | 
        | The scientific method is a process of conjecture and criticism
        | and is never-ending. Peer review gives a "blue checkmark" to
        | papers that don't deserve it and is especially problematic in
        | the social sciences, where up to 70% of research isn't
        | reproducible.
        | 
        | Reproducibility should be the gold standard, NOT peer review,
        | which has its own bias and cargo cult built in.
        | 
        | The purpose of science is to create good explanations that are
        | hard to vary. The purpose of scientific publications is to
        | create prestige.
        | 
        | So kill peer review; there are other mechanisms to ensure the
        | quality of research.
        
       | _Wintermute wrote:
        | I think if the author had listened to pretty much any post-
        | doc/technician or senior researcher in the field who has had to
        | review a number of publications, they would have been told these
        | things straight away.
        
       | civilized wrote:
       | Interesting but I don't know how to make sense of it. How can it
       | be that "close to nothing of what makes science actually work is
       | published as text on the web"?
       | 
       | - Is the information that makes science actually work mostly in
       | images that the machines don't yet understand?
       | 
       | - Was the information paywalled or in private databases and
       | inaccessible to this researcher?
       | 
       | - Are the papers mostly just advertisements for researchers to
       | come gab with each other at conferences and doodle on cocktail
       | napkins, and that's where all the "real science" happens?
       | 
       | - (From the comments) is the information needed to make sense of
       | papers communicated privately or orally from PI's to postdocs and
       | grad students, or within industrial research labs?
       | 
       | Something is missing from my mental picture here.
       | 
       | Don't real scientists mostly learn how to think about their
       | fields by reading textbooks and papers? (This is a genuine
       | question.) If so, isn't it likely that our tools just aren't
       | advanced enough to learn like humans do? If not, what do humans
       | use to learn how to think about science that's missing from
       | textbooks and papers?
        
         | mellavora wrote:
         | <disclaimer: former real scientist>
         | 
         | Science is a profession like others. When you are earning your
         | Ph.D. you learn to think about the field by reading papers and
         | discussing with peers and colleagues, yes.
         | 
         | The intro of a well-structured research paper should follow
         | this pattern:
         | 
         | - This is a really important topic and here is why.
         | 
         | - What is the current state of the art in this field? (this
         | comes from reading 100-1000 publications on the topic and
         | selecting the 5-10 most relevant to the next point). HOWEVER,
         | the state of the art leaves this question unanswered.
         | 
         | - Here are some reasons why the idea in this paper can help
         | answer that question (cite another 3-10 papers).
         | 
          | - Our hypothesis is that XXX can answer the important
          | unanswered question (where XXX is derived from the prior
          | section).
         | 
          | So, what I am getting at is that a scientific publication is
          | part of a conversation. When I'm citing the 5-10 papers to
          | summarize the
         | state of the art, I'm assuming the reader has read 50% of the
         | 100-1000 papers which I also read, and knows where the 5-10
         | which I cite fit into that broader context.
         | 
         | So any paper, in isolation, only has a fraction of its meaning
         | in the publication. The real information is the surrounding
         | context.
         | 
         | Pro tip: if I'm reading a paper and want to understand it
         | better, I also read one or two of the papers it cites, and one
         | or two papers which cite it. Also, it can take a few times
         | through before I start to understand what the author is trying
         | to say.
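          | 
          | That pro tip is easy to script against the Semantic Scholar
          | Graph API (a sketch from memory of their public docs; the
          | paper id is arbitrary):
          | 
          |   import json, urllib.request
          | 
          |   base = ("https://api.semanticscholar.org/graph/v1/paper/"
          |           "arXiv:1706.03762")
          | 
          |   def titles(endpoint, key):
          |       url = f"{base}/{endpoint}?fields=title&limit=5"
          |       with urllib.request.urlopen(url) as r:
          |           rows = json.load(r)["data"]
          |       return [row[key]["title"] for row in rows]
          | 
          |   # papers it cites, and papers that cite it
          |   print(titles("references", "citedPaper"))
          |   print(titles("citations", "citingPaper"))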
        
           | stevenbedrick wrote:
           | Exactly! Scientific papers are not meant to stand on their
           | own -- they are pieces of a much larger jigsaw puzzle. In
           | order to make heads or tails out of a paper, one really needs
           | to have a sense of where the paper fits into its larger
            | picture. Building up the necessary base of knowledge to
            | develop that sense, both in terms of explicit knowledge and
            | tacit knowledge, is part of what a PhD student is actually
            | doing while they are working on their PhD, and is part of
            | why the process takes as long as it does.
           | 
           | Also, the mechanical process of effectively reading a paper
           | is highly non-linear, and is a skill in and of itself. In a
           | lot of ways, it is more akin to high-level pattern matching
           | than it is to more "normal" reading. At least at my
           | institution, it is something that we actually teach our
           | students to do in formal ways (the obligatory "How to read a
           | scientific paper" lecture during the first term or two) and
           | then make them practice over and over again for years
           | (journal clubs, etc.). The original author eventually figured
           | this out, which is to their credit.
        
         | hoseja wrote:
         | As the article states, papers are mostly career advancement
         | tools and scientists are incentivized to put the least amount
         | of useful information into them they can get away with. Real
         | scientists mostly learn from their instructors who possess all
         | the jealously guarded institutional knowledge.
         | 
         | Yes, it is very broken.
        
           | mellavora wrote:
           | Hard disagree. With a caveat -- I do acknowledge that for an
           | important number of professional academics your statement may
           | be true, and I have heard a former post-doc at ETH Zurich
           | describe their papers as career points (so also a grain of
           | truth at elite institutes).
           | 
           | But for most of the academics I have known and worked with,
           | publications are taken quite seriously, and institutional
           | knowledge is freely shared. There is an incentive to reduce
           | the content in papers, but it is out of respect for the
           | reader (a paper is not a textbook) and an honest attempt to
           | limit the discussion to the core hypothesis of the work. You
           | have 6 pages to 1) describe the content of 6*100 pages (the
           | 100 other relevant papers on the topic), 2) present your
           | addition to this body of knowledge, 3) discuss the insights
           | your work brings, again referring to the content of 600
           | pages.
           | 
            | And those 600 pages you are summarizing are as information-
            | dense as your work.
        
           | jokteur wrote:
            | It is always difficult to understand and implement the
            | theory explained in papers: they seem fine on the surface,
            | but when you look more closely you find a bunch of mistakes
            | and giant holes in the details, and you end up trying to
            | redo the whole paper.
           | 
           | There should be journals/websites/blogs dedicated to trying
           | to reexplain / implement papers.
        
             | bluGill wrote:
             | Doing a good review is hard.
             | 
             | I've been reviewing a few C++ papers (things proposed to
             | C++23) lately. Many of them are over my head and all I can
             | find are a few spelling errors. The ones I've understood
             | took me 3 readings before I found some giant holes in the
             | details (which I pointed out to the authors, the next
              | revision corrected them). In one case I actually started
              | implementing the feature myself, and only then did I
              | realize there was something ambiguous in the paper (after
              | talking with the author, we decided to document it as an
              | implementor's choice, as it doesn't matter for the 99% use
              | case, and the rest could go either way depending on
              | hardware and data details, so better to allow choice).
             | 
              | The vast majority of papers I'm far too lazy to go into
              | that level of detail on. I just assume the author is a
              | smart person and so I trustingly let them go. It may well
              | be that if I understood the paper I'd be horrified; ask me
              | in 2043 when we have 20 years of hindsight...
             | 
             | I have to believe that peer review is the same - many
             | reviewers are just reading and looking for something
             | obvious but not really understanding details.
        
               | mellavora wrote:
               | Someone once said that they enjoyed one of my papers, and
               | that even though they thought the writing was very clear
               | they still had to read it 3 times to understand it.
               | 
               | I told them that I had to write it 100 times and spend
               | two years before I understood it.
               | 
               | So if they could pick it up in 3 readings over 3 days,
               | they were doing pretty good.
        
               | Mezzie wrote:
               | SO difficult, especially given that just because you're
               | in the 'same' field and technically qualified to do a
               | peer review doesn't mean you actually understand what
               | you're reading.
               | 
               | For example, I'm qualified to review papers on
               | educational programs for children. I should never be
               | asked to do that.
        
           | civilized wrote:
           | It's hard for me to even comprehend how this could be true,
           | but it does sound familiar enough from credible sources that
           | maybe it's right regardless of what makes sense to me.
        
           | anonymousDan wrote:
           | I mean honestly this is just total bullshit. There is plenty
           | of value in academic papers. It's just that there is very
           | little money to be made in developing tools such as those
           | mentioned by the OP as there is very little money in
           | academia.
        
             | viewfromafar wrote:
              | I understood the criticism as directed at the value of
              | papers as instruments of knowledge sharing. The argument
              | is not that papers are completely useless in terms of
              | knowledge sharing, but that this pure purpose of
              | dissemination is largely overshadowed by considerations of
              | career, prestige, funding, or any interest other than
              | knowledge sharing.
             | 
             | This is the world we live in. A scientist is a person that
             | needs to make a living and is subject to various
             | constraints.
             | 
             | The reason that there is little money to be made is that
             | society hasn't found a way to set up the scientific process
             | in such a way that the constraints would value the increase
             | in public domain knowledge higher than the incentives to
             | hold some knowledge back.
             | 
             | Part of this may stem from leaving specialized knowledge to
             | academia while letting only companies reap the monetary
              | rewards of putting the knowledge to use. Society benefits
              | only indirectly (better drugs, machines, etc.), and
              | industry players would rather shield knowledge and adapt
              | its representation to their own needs.
        
         | Al-Khwarizmi wrote:
         | I can't speak for biomedicine, but speaking as an academic in
         | CS the claim that "close to nothing of what makes science
         | actually work is published as text on the web" looks like a
         | huge hyperbole to me.
         | 
         | It's true that the so-called "folk knowledge", knowledge that
         | exists in the community but no one bothers to publish in the
         | form of papers, is a real problem, but at least in my field,
         | it's by no means the majority of knowledge.
         | 
         | As someone from a peripheral university where you can't just
         | drive a few miles and talk to the best in your field, I have
         | successfully started in new subfields of study (getting to the
         | level of writing research papers in top-tier venues) by reading
         | the literature.
         | 
          | While this essay provides a very interesting point of view, I
          | suspect it's heavily colored by the author's failure to
          | monetize the technology (which is related to the fact that the
          | people doing most of the grunt research work, who would
          | benefit the most from this, are PhD students and postdocs who
          | have no money to pay for it - the author hints at this in the
          | text). I wouldn't take it as an accurate description of
          | academia.
        
           | viewfromafar wrote:
           | Also CS, my interpretation of "what makes science work" is a
           | little different and I would argue that - despite a lot of
           | foundations and techniques being shared in research papers -
           | this field more than any other is constraining the free
           | circulation and application of knowledge.
           | 
           | The equivalent to those biomedical industry players are the
           | big tech who develop closed source and push the edge in some
           | area. They will publish but that does not mean you can
           | replicate any of it.
           | 
            | Software is also fragmented, crippled by IP lawsuits, patent
            | trolls, and so on. This does inhibit the ability of society
            | to benefit from software, since it depends on the private
            | sector to sort things out. The PhDs go and build businesses
            | to "make the science work" in that sense.
           | 
           | The ideal of detached pursuit of knowledge is not a complete
           | fiction (despite the hyperbole), but it does remain an ideal
           | that can only be approximated.
        
             | Al-Khwarizmi wrote:
             | As an academic, all my papers from the last 5 or so years
             | have associated github repos where all the code is
             | accessible under free licenses. Most of my peers in
             | academia do the same. Documentation quality is admittedly
             | quite hit-and-miss, because we aren't paid for that and we
             | need to jump to the next paper, but all the code is there
             | and everything can be replicated even if it takes some
             | effort due to rushed code or suboptimal documentation.
             | 
             | Industry is a different world, and indeed there are plenty
             | of opaque industry papers that aren't replicable at all
             | because much of the model is essentially a trade secret,
             | and the paper is more an avenue for bragging than for
             | developing new knowledge together with the rest of the
              | community. To be honest, I would just outright disallow
              | that kind of paper. But that's not a popular opinion, and
              | taking into account that big tech companies sponsor our
              | conferences and provide grants, I can't blame those who
              | think otherwise.
        
         | rm445 wrote:
         | The 'what makes science work' is stored in the scientists.
         | 
         | They learn by reading the literature, but also by
         | communicating, and by an active process of testing their own
         | understanding and resolving gaps and inconsistencies. Even when
         | a self-taught genius like Ramanujan comes along, they benefit
         | from being brought into the community.
         | 
         | The question of how one would determine the state of the art in
         | a field has an answer, but at present that answer looks
         | indistinguishable from training a scientist, not from running a
         | clever software tool that synthesizes the literature.
        
           | civilized wrote:
           | Well, that's an interesting idea, isn't it (even if
           | completely impractical today)? A self-training AI robot
           | scientist that not only reads the literature but actually
           | chats with other scientists and tries to do science to
           | improve its understanding. AlphaZero, but for science.
        
             | Vetch wrote:
             | AlphaZero cannot chat or interact beyond moving pieces.
             | For science, self-training would be too wasteful and
             | intractable - and impossible to boot, given there's no
             | simulator.
             | 
             | An AlphaZero for science would instead be like the recent
             | DeepMind paper where the pattern-matching capabilities and
             | internal features of a neural network were used to navigate
             | some domain's decision space of conjecture formation and
             | testing.
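             | 
             | Loosely, that loop is: train a model to predict one object
             | from another, and where it succeeds, use attribution to
             | see which inputs carry the signal, then hand that to a
             | human as a candidate conjecture. A toy sketch in Python
             | (synthetic data, not the paper's actual setup):
             | 
             |     import numpy as np
             |     from sklearn.ensemble import RandomForestRegressor
             | 
             |     # Hypothetical: features of some mathematical object
             |     # and a target quantity we suspect they determine.
             |     rng = np.random.default_rng(0)
             |     X = rng.normal(size=(1000, 8))
             |     y = 2 * X[:, 3] - X[:, 5] + 0.1 * rng.normal(size=1000)
             | 
             |     model = RandomForestRegressor(n_estimators=200)
             |     model.fit(X, y)
             |     # High importances hint at which features a human
             |     # should build a conjecture around.
             |     print(model.feature_importances_.round(2))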
        
         | amcoastal wrote:
         | Try: Paperswithcode.com
         | 
         | If it's not there I won't use it! If you don't provide code
         | with your paper, it better have a really useful concept in it;
         | otherwise, no citation. Which points to the problem in the
         | article: the most important information in basic research
         | papers is "Hey, this concept works," as opposed to a rigorous
         | test of exactly what makes the concept work and how to use it
         | in other situations.
        
         | aimor wrote:
         | In my experience useful scientific knowledge is accumulated in
         | people actively working. Documents (books, papers, guides,
         | programs, talks, blog posts, etc) are communication tools, but
         | are limited by the medium and the ability of the authors.
         | People can consume documents and create analogies to their
         | specific work, but from there it's the process of working that
         | produces experts, systems, and tools. Sometimes those products
         | are again documented.
        
       | Vetch wrote:
       | The article's core claims are:
       | 
       | > Extracting, structuring or synthesizing "insights" from
       | academic publications (papers) or building knowledge bases from a
       | domain corpus of literature has negligible value in industry.
       | 
       | > Most knowledge necessary to make scientific progress is not
       | online and not encoded.
       | 
       | > Close to nothing of what makes science actually work is
       | published as text on the web
       | 
       | > The tech is not there to make fact checking work reliably, even
       | in constrained domains.
       | 
       | > Accurately and programmatically transforming an entire piece of
       | literature into a computer-interpretable, complete and actionable
       | knowledge artifact remains a pipe dream.
       | 
       | It also notes that old-school "biomedical knowledge bases,
       | databases, ontologies that are updated regularly" already exist,
       | with expert entry cutting through the noise in a way that NLP
       | cannot.
       | 
       | Although I disagree with its conclusions, much of this jibes
       | with my experience. From the perspective of research, modern NLP
       | and transformers are appropriately hyped, but from the
       | perspective of real-world application, they are over-hyped.
       | Transformers have deeper understanding than anything prior; they
       | can figure out patterns in their context with a flexibility that
       | goes way beyond regurgitation.
       | 
       | They are also prone to hallucinating text, quoting misleading
       | snippets, require lots of resources for inference and enjoy being
       | confidently wrong at a rate that makes industrial use nearly
       | unworkable. They're powerful but you should think hard about
       | whether you actually need them. Most of the time their true
       | advantage is not leveraged.
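       | 
       | For a sense of what that looks like in practice, a minimal
       | extraction sketch with the Hugging Face pipeline API (the model
       | name is just a general-purpose public example; real literature
       | mining would swap in a biomedical checkpoint):
       | 
       |     from transformers import pipeline
       | 
       |     ner = pipeline("ner", model="dslim/bert-base-NER",
       |                    aggregation_strategy="simple")
       |     text = "Hugging Face Inc. is based in New York City."
       |     for ent in ner(text):
       |         # Each entity comes with a confidence score - and the
       |         # confidently-wrong ones are exactly the problem.
       |         print(ent["entity_group"], ent["word"], ent["score"])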
       | 
       | -----
       | 
       | My disagreements are with its advice.
       | 
       | > For recommendations, the suggestion is "follow the best
       | institutions and ~50 top individuals".
       | 
       | But this just creates a rich-get-richer effect and retards
       | science, since most are reluctant to go against those with a lot
       | of clout.
       | 
       | > Why purchase access to a 3rd party AI reading engine...when you
       | can just hire hundreds of postdocs in Hyderabad to parse papers
       | into JSON? (at a $6,000 yearly salary). Would you invest in
       | automation if you have billions of disposable income and access
       | to cheap labor? After talking with employees of huge companies
       | like GSK, AZ and Medscape the answer is a clear no.
       | 
       | This reminds me of responses to questions of the sort: "Why
       | didn't X (where X might be the Ottomans or the Chinese) get to
       | the industrial revolution first?"
       | 
       | The article also warns against working on ideas such as
       | _"...semantic search, interoperable protocols and structured
       | data, serendipitous discovery apps, knowledge organization."_
       | 
       | A lot of such apps are solutions in search of a problem, but
       | they could work if designed to solve a specific real-world
       | problem. On the other hand, an outsider trying to start a
       | generalized VC-backed business targeting industry is bound to
       | fail. In fact, this seems to have been a major sticking point in
       | the author's endeavor.
       | 
       | Industry is jaded and set in its ways; startups focus on
       | summarization, recommendation and retrieval, which are low value
       | in the scientific enterprise; and academia is focused on
       | automation, which turns out brittle. Still, this line of
       | research is needed. Knowledge production is growing rapidly
       | while humans are not getting any smarter. Specialization has
       | meant increases in redundant information, loss of context and a
       | stall in theory production (hence "much less logic and deduction
       | happening").
       | 
       | While the published literature is sorely lacking, humans can,
       | with effort, extract and/or triangulate value from it. Tooling
       | needs to augment that process.
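       | 
       | As a hedged sketch of what "augmenting" (not replacing) the
       | human could mean: embedding-based retrieval that ranks abstracts
       | for a reader to triage, e.g. with sentence-transformers (model
       | name is a common public checkpoint, my choice for illustration):
       | 
       |     from sentence_transformers import SentenceTransformer, util
       | 
       |     model = SentenceTransformer("all-MiniLM-L6-v2")
       |     abstracts = ["...abstract one...", "...abstract two..."]
       |     corpus = model.encode(abstracts, convert_to_tensor=True)
       |     query = model.encode("kinase inhibitors in CML",
       |                          convert_to_tensor=True)
       |     # Rank by cosine similarity; the human still does the
       |     # reading and the triangulation.
       |     print(util.cos_sim(query, corpus))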
        
         | markusstrasser wrote:
         | "follow the best institutions and ~50 top individuals" wasn't
         | meant as a suggestion actually, just an observation of what
         | most people do.
         | 
         | You're right that they "could work if designed to solve a
         | specific real world problem", but against what baseline? The
         | baseline could be spending that time on actual deep tech
         | projects and not on NLP meta-science.
        
           | markusstrasser wrote:
           | But you're right; open source projects for extracting info
           | (like PubTator) are valuable, but ontologies/KGs need
           | ongoing expert work (ML, AI, SWEs, information architects,
           | labelers), unlike most of Wikipedia or GH, so it's tough to
           | make something that doesn't suck in a distributed open
           | source fashion.
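           | 
           | (For reference, PubTator serves its pre-computed annotations
           | over a simple REST API; a minimal fetch sketch - endpoint
           | written from memory, so verify against the current NLM docs
           | before relying on it:)
           | 
           |     import requests
           | 
           |     url = ("https://www.ncbi.nlm.nih.gov/research/"
           |            "pubtator-api/publications/export/biocjson")
           |     # "pmids" takes one or more PubMed IDs; placeholder here.
           |     resp = requests.get(url, params={"pmids": "12345678"})
           |     print(resp.json())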
        
       | plaidfuji wrote:
       | Having invested quite a bit of my own time into various aspects
       | of the scientific knowledge extraction morass, I'd say the author
       | is largely on point, but there's a significant, and potentially
       | valuable distinction to be made between extracting research
       | outputs and research inputs.
       | 
       | At least in the field of materials science, papers are by and
       | large a record of research _outputs_. We made material X and it
       | achieved performance Y - here are a bunch of measurements to
       | prove that this is in fact what we made and that it truly
       | achieved performance Y at relevant conditions, etc. In this
       | sense, papers really function as an advertisement: look at what
       | we achieved.
       | 
       | What papers do _not_ do is rigorously document inputs, or provide
       | a step-by-step guide to reproduce said results, for obvious
       | reasons.
       | 
       | My current take on this topic is that it would be both feasible
       | and valuable to build a knowledge extraction system to compile
       | and compare outputs across a specified field. Think the big
       | "chart of all verified solar cell efficiencies over time" [1],
       | but generated automatically. This would at least immediately
       | orient researchers to the distribution of state of the art
       | results, and help ensure that they don't omit relevant references
       | in their reviews.
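       | 
       | A crude version of that idea is just pattern-matching reported
       | figures out of abstracts and tabulating them by year; a toy
       | sketch (the regex is illustrative only - production extraction
       | needs far more care with units, context and negation):
       | 
       |     import re
       | 
       |     abstracts = [
       |         ("2019", "...a cell with 23.3% efficiency..."),
       |         ("2021", "...certified efficiency of 25.5%..."),
       |     ]
       |     pat = re.compile(r"(\d+(?:\.\d+)?)\s*%\s*efficiency"
       |                      r"|efficiency of\s*(\d+(?:\.\d+)?)\s*%")
       |     for year, text in abstracts:
       |         for m in pat.finditer(text):
       |             print(year, (m.group(1) or m.group(2)) + "%")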
       | 
       | But extracting and making sense of inputs (methods), or even
       | "knowledge"? Forget about it.
       | 
       | [1] https://www.nrel.gov/pv/cell-efficiency.html
        
       | i000 wrote:
       | When I was in grad school, I joined a startup incubator and built
       | a prototype which combined two of the tools mentioned in the
       | article: "a query builder (by demonstration)" and "A paper
       | recommender system" - a simple companion to help scientists not
       | miss research relevant to them. This was 10 years ago, before
       | Google Scholar had similar features.
       | 
       | The incubator introduced me to advisors with business experience
       | in this field, and I was told in no uncertain terms what is now
       | the gist of this article: the value lies in the molecular and
       | clinical data. In 2021 I would add digital pathology / imaging
       | data.
        
         | geoduck14 wrote:
         | > I was told in no uncertain terms what is now the gist of
         | this article: the value lies in the molecular and clinical
         | data. In 2021 I would add digital pathology / imaging data.
         | 
         | I feel like you are trying to tell me something REALLY
         | valuable, but I don't quite understand it. Can you please
         | elaborate?
        
           | potatoman22 wrote:
           | My take: answering questions using clinical data > answering
           | questions with papers
        
             | i000 wrote:
             | There is immense value in clinical data (all the
             | information captured and siloed through EHR). Pharma
             | companies pay for access to it to gather real-world
             | evidence (RWE) of how, for example, their drugs perform.
             | Molecular information is increasingly valuable too for
             | research, biomarker development, patient cohort
             | identification etc. The imaging data and pathology data are
             | valuable because they are typically expertly annotated and
             | can be used to train computer-vision algorithms etc. to
             | solve medical problems - like diagnosis.
        
       | JackFr wrote:
       | It's interesting that OP did seemingly little research with
       | respect to existing work in the field.
       | 
       | https://www.nlm.nih.gov/medline/medline_overview.html
       | 
       | Medline, a searchable online directory of medical research
       | papers, has existed for 50 years. The National Library of
       | Medicine was for many years a leader in document search and
       | retrieval before there was a web. In the 80's they were doing
       | vector cosine document similarity, document clustering and
       | automated classification. They were also doing some great stuff
       | like indexing papers based on proteins and gene sequences - so a
       | paper in a field completely different from yours might pop up if
       | a similar protein or sequence was mentioned.
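       | 
       | That 80's-era technique is still a strong baseline. In modern
       | terms (scikit-learn here is my choice for illustration, not what
       | NLM actually used):
       | 
       |     from sklearn.feature_extraction.text import TfidfVectorizer
       |     from sklearn.metrics.pairwise import cosine_similarity
       | 
       |     docs = ["protein kinase inhibition in leukemia",
       |             "kinase inhibitors and chronic myeloid leukemia",
       |             "soil erosion in river deltas"]
       |     X = TfidfVectorizer().fit_transform(docs)
       |     # Pairwise vector cosine similarity between documents;
       |     # the first two overlap, the third does not.
       |     print(cosine_similarity(X).round(2))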
       | 
       | (Disclosure - I worked at the National Library of Medicine in the
       | 90's)
       | 
       | That being said, in the past 30 years search and retrieval have
       | exploded, to say nothing of ML, but it's crazy to ignore the
       | stuff which has come before, AND it's tough to compete with a
       | national lab whose mandate is basically to give the stuff away.
        
         | inlitro wrote:
         | 100%
         | 
         | It also felt like a long apology/explanation for Emergent
         | Ventures rather than a true deep analysis. Pretty strong (and
         | often false) statements for what seems like only half a year
         | of total, somewhat vague work.
        
         | stevenbedrick wrote:
         | The best thing about the NLM's work in this space is how deeply
         | it has been informed by the needs and workflows of the
         | biomedical researchers, which is a perspective that has been
         | sorely lacking in work coming from outsiders.
         | 
         | I did think that the author did a good job of outlining (some
         | of) the basic structural issues that make this a tough field to
         | monetize, but even setting those aside, there's no substitute
         | for actually knowing your users and what they need, and that's
         | something the NLM is amazing at.
         | 
         | (Disclosure, my PhD was funded by an NLM training grant, some
         | of my research is funded extramurally by the NLM, and I have a
         | lot of NLM colleagues, so I'm maybe a little bit biased)
        
       | bigdict wrote:
       | Any article on this topic should mention Tshitoyan et al.
       | 
       | https://www.nature.com/articles/s41586-019-1335-8
        
       | markusstrasser wrote:
       | Hey, author here. Great discussion so far. Will update the post
       | with some of the comments and critiques.
        
       | shusaku wrote:
       | Previous discussion:
       | https://news.ycombinator.com/item?id=29445715
        
       ___________________________________________________________________
       (page generated 2021-12-09 23:00 UTC)