       ___________________________________________________________________
        
       Launch HN: Biodock (YC W21) - Better microscopy image analysis
        
       Hi Hacker News! We're Nurlybek and Michael, the cofounders of
       Biodock (http://www.biodock.ai/). We help scientists expedite
       microscopy image analysis.

       Michael and I built Biodock due to the challenges we
       experienced in microscopy image analysis while we were at
       Stanford. As a Ph.D. student, I spent hours manually counting
       lipid droplets in microscope images of embryonic tissues. The
       incredible frustration I felt led me to try all kinds of
       software. Eventually, I went out to seek help from other
       scientists. Michael, a computer science student, was working
       in a lab just across from mine when he got my email asking for
       help. We got to chatting in a med school cafe and realized
       that we were both tackling the same issues with microscopy
       images.

       Microscopy images are one of the most fundamental forms of
       data in biomedical research, from discovery all the way to
       clinical trials. They can show the expression of genes, the
       progression of disease, and the efficacy of treatments.

       However, images are also very frustrating, and we think a lot
       of that has to do with the current tools available. To analyze
       their images, many scientists at top research institutions use
       software techniques invented 50 years ago, like thresholding
       and filtering. Some even spend their days manually drawing
       outlines around cells or regions of interest. Not only is this
       extremely frustrating, but it slows down the research cycle,
       meaning that it takes a lot more time and money to create
       potentially lifesaving cures. Contrast these tools with the
       recent headway in deep learning, where applications like
       AlphaFold have dramatically expanded what was previously
       possible.

       Our goal is to bring these performance gains to research
       scientists. The current core module in Biodock is AI cell
       segmentation for fluorescent cells, based mostly on Mask R-CNN
       and U-Net architectures and trained on thousands of cell
       images. Essentially, it identifies where each cell is and
       calculates important features like location, size, and
       fluorescent expression for each cell. In our benchmarks, this
       module performs around 40% more accurately than other
       software.
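
       To make that output concrete, here is a rough sketch of the
       per-cell measurement step in Python with scikit-image. This is
       illustrative rather than our actual code, and it assumes the
       model has already produced a labeled instance mask:

           import numpy as np
           from skimage.measure import regionprops

           def measure_cells(label_mask, fluorescence):
               """Return one row of features per segmented cell."""
               rows = []
               for cell in regionprops(label_mask,
                                       intensity_image=fluorescence):
                   rows.append({
                       "cell_id": cell.label,
                       "centroid_yx": cell.centroid,  # location
                       "area_px": cell.area,          # size
                       # fluorescent expression:
                       "mean_intensity": cell.mean_intensity,
                   })
               return rows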

       So how is this different from training deep learning models
       yourself? First, our pretrained modules are trained on a huge
       amount of data, which allows for great performance for all
       scientists without needing to label data or optimize training.
       Second, we've spent time carefully building our cloud
       architecture and algorithms for production, including a large
       cluster of GPUs. We even slice images into crops, process them
       in parallel, and stitch the results back together. We also
       have storage, data integrations, and visualizations built into
       the platform.
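
       For the curious, the tiling step looks conceptually like the
       sketch below. This is illustrative Python, not our production
       code, and `run_model` is a hypothetical stand-in for the GPU
       inference call:

           from concurrent.futures import ThreadPoolExecutor
           import numpy as np

           def predict_tiled(image, run_model, tile=1024):
               # Slice the image into fixed-size crops.
               crops, coords = [], []
               for y in range(0, image.shape[0], tile):
                   for x in range(0, image.shape[1], tile):
                       crops.append(image[y:y + tile, x:x + tile])
                       coords.append((y, x))
               # Process the crops in parallel.
               with ThreadPoolExecutor() as pool:
                   masks = list(pool.map(run_model, crops))
               # Stitch the per-crop masks back together.
               out = np.zeros(image.shape[:2], dtype=np.int32)
               for (y, x), m in zip(coords, masks):
                   out[y:y + m.shape[0], x:x + m.shape[1]] = m
               return out

       In practice, stitching also has to handle objects that
       straddle crop boundaries, which is where most of the subtlety
       lives.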

       We know that AI cell segmentation addresses only a small
       fraction of microscopy analysis in the biomedical space, and
       we are launching several more modules soon, tackling some of
       the most difficult images in the space. So far, we've been
       able to generate custom AI modules for diverse tissues and
       imaging modalities (fluorescence, brightfield, electron
       microscopy, histology). Eventually, we want to link other
       biological data analyses into the cloud, including DNA
       sequencing, proteomics, and flow cytometry, to power the 500K
       scientists and 3K companies in the US biotech and pharma
       space.

       We would love to hear from you and get your feedback --
       especially if you've ever spent hours on image analysis!
        
       Author : nurlybek
       Score  : 27 points
       Date   : 2021-03-03 18:39 UTC (4 hours ago)
        
       | skwb wrote:
       | Bioinformatics PhD student here: any plans to move into the
       | histology field? Or are you focusing more on the R&D space
       | for now?
        
         | mike210 wrote:
         | Hey! What kind of histology? If you mean doing AI analysis
         | of histology images for research/etc., then 100% - that's
         | actually one of the modules we're looking to build, and if
         | you work with those kinds of images I would love for you
         | to reach out at michael at biodock dot ai. Biodock is
         | totally free for academics, and a module like that would
         | be free for academics as well.
         | 
         | If you mean clinical diagnostics histology - we're going to
         | hold off a bit on that, although we're discussing some early
         | partnerships there.
        
       | onychomys wrote:
       | I work in the research arm of a certain extremely famous hospital
       | in a small town in the midwest, and once you guys get your histo
       | package up and running, I'll definitely check it out. It's a
       | crowded space, but we're always on the lookout for the next big
       | thing.
       | 
       | I do wonder, though, about the wisdom of doing that sort of
       | analysis in the cloud. Our projects routinely use several
       | terabytes of images (we have about 150TB stored right now, most
       | of which is full-slide images), and uploading them somewhere
       | isn't just a simple fire-and-forget procedure. Cool analysis
       | algorithms might not be enough to make up for the headache of
       | having to wait for days on end for the uploads to reach the
       | cloud.
        
         | mike210 wrote:
         | Hi! Michael here - one of the cofounders. That sounds great and
         | would love to chat! What kind of tissues are you looking at and
         | what are you trying to achieve?
         | 
         | As for the size of data - it's definitely a tradeoff.
         | However, we're seeing more and more scientists already
         | uploading their data to the cloud, and we already
         | integrate with data stores like S3, so you can hook into
         | data you've already uploaded.
         | 
         | We're also looking to build out a pipeline builder, where
         | you can process images in real time, which would make this
         | less painful, as well as a Dropbox-like mini tool that
         | uploads data as you acquire it.
        
       | andy99 wrote:
       | Very cool.
       | 
       | Do you have lots of unlabelled data, and if so, do you do any
       | self-supervised pre-training?
       | 
       | Have you ever considered releasing the backbone weights for the
       | pre-trained models you have? No idea if this would be possible
       | without giving up core IP, but I know I'm personally dying for an
       | alternative to Imagenet (COCO for you?) trained on a big dataset.
       | 
       | Are the images in your set diverse enough that you'd expect the
       | backbone to be a good general feature extractor?
        
         | mike210 wrote:
         | Hi Andy,
         | 
         | We do have lots of unlabelled data, and we're also
         | labeling a large portion of it. We use transfer learning
         | for all of the models we're training, and the first
         | backbone we use is partially self-supervised. It seems to
         | help overall performance, but it's not a huge effect in
         | our experience.
         | 
         | Maybe once we get a lot of models we can release the backbone
         | weights for nuclear segmentation or at least a competition set
         | of some data we've labeled. Some IP issues here though.
         | 
         | What kind of alternative are you looking for? Specifically
         | one for cells, or for biologics in general? I'm guessing
         | you're trying to get a better base of weights to transfer
         | from so you can train your own model?
         | 
         | I would say we have medium diversity in terms of images -
         | I think unless you have a similar application right now,
         | you'd be better off transferring from ImageNet, just due
         | to the amount of labeled data.
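         | 
         | (If anyone's following along, "transferring from ImageNet"
         | just means starting from pretrained weights and swapping
         | the head - a minimal torchvision sketch, illustrative
         | rather than our training code, with `num_classes` as a
         | made-up placeholder:)
         | 
         |     import torch
         |     import torchvision
         | 
         |     # Start from an ImageNet-pretrained backbone...
         |     model = torchvision.models.resnet50(pretrained=True)
         |     # ...and swap the classifier head for your own task.
         |     num_classes = 10
         |     model.fc = torch.nn.Linear(model.fc.in_features,
         |                                num_classes)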
        
           | andy99 wrote:
           | Thanks for the reply! I'm actually working in a
           | different domain, but it seems to have a lot in common
           | with yours - lots of unlabelled data, and images that
           | have nothing in common with ImageNet, in that they are
           | all essentially of the same thing and we are looking for
           | variations or features. We found that self-supervised
           | pre-training (with various contrastive models)
           | underperformed vs. starting with weights trained on
           | ImageNet.
           | 
           | So a model that has been pretrained on something else, with
           | enough variability to work as a feature extractor, but closer
           | to the problem framing I mention would be of interest.
           | 
           | For state-of-the-art computer vision stuff, most of the
           | benchmarks use ImageNet or similar datasets. But
           | unfortunately I'm coming around to the realisation that
           | those datasets are not representative of most real-world
           | problems (except general-purpose scene and object
           | recognition). So it becomes very challenging to pick out
           | a potential technique to apply and hope it transfers.
        
             | mike210 wrote:
             | Interesting - it would be great to chat and find out more.
             | Maybe there are things we can learn about each other. Can
             | you shoot me an email at michael at biodock dot ai?
        
       | svara wrote:
       | Hi! We've been bootstrapping ariadne.ai [0] in the same space.
       | Similar origin story, too - turning tools we built as grad
       | students into products for the biomedical industry.
       | 
       | Looks like we're thinking along somewhat convergent lines, but
       | with some interesting differences as well. Feel free to reach out
       | to my HN username at ariadne.ai if you'd like to chat.
       | 
       | [0] https://ariadne.ai
        
         | mike210 wrote:
         | Hey! Super cool. Will shoot you a message - would love to
         | chat and see what we can learn from each other. It seems
         | like you've built out some valuable assay analyses that
         | we've heard people asking for, so congrats.
        
       | itamarst wrote:
       | To be fair, you can get very far with non-machine-learning
       | automated techniques (I built good-enough algorithms for an
       | in-situ fluorescent gene sequencing image processing
       | pipeline at one job). I suspect any form of good automated
       | processing, regardless of whether it's AI, would be welcome
       | to biologists.
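       | 
       | (For concreteness, the classical recipe is often just a
       | global threshold plus connected components - a sketch in
       | Python with scikit-image, not the pipeline I actually
       | built:)
       | 
       |     from skimage import filters, measure
       | 
       |     def segment_classical(image):
       |         # Otsu picks a global threshold from the histogram.
       |         mask = image > filters.threshold_otsu(image)
       |         # Each connected blob gets its own integer label.
       |         return measure.label(mask)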
       | 
       | If you'd ever like to chat about the automation parts, I'd be
       | interested in hearing how you're approaching it; the niche of
       | "scientific computing, but repeatable" is quite different than
       | traditional scientific software, and it seems like people are
       | still in early stages of figuring out how to do it.
       | 
       | Would also be interested in hearing how you approach
       | correctness. The best approach I've discovered is
       | metamorphic testing. Basically, you modify real inputs and
       | then ensure the output matches expectations. E.g. you say,
       | "OK, I have this finished algorithm that segments cells,
       | `f(image) -> cells`. Now, if I double the brightness of
       | everything, I would expect the same results, so let's check
       | that `f(brighter_image)` gives the same output." Or, "If I
       | merge nuclei-looking splotches that cross the original
       | segmentation boundary, that should result in fewer cells."
       | Unfortunately, I only discovered this technique after I left
       | the image processing job, so I haven't had a chance to try
       | it.
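       | 
       | (A minimal sketch of the brightness relation, with a
       | hypothetical `segment_cells` standing in for the real
       | model:)
       | 
       |     import numpy as np
       | 
       |     def check_brightness_invariance(segment_cells, image):
       |         # Metamorphic relation: doubling brightness (with
       |         # clipping) shouldn't change which cells are found.
       |         base = segment_cells(image)
       |         brighter = np.clip(image * 2.0, 0, 255)
       |         brighter = brighter.astype(image.dtype)
       |         assert len(segment_cells(brighter)) == len(base)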
        
         | mike210 wrote:
         | Definitely! There are some applications where traditional
         | methods are simply good enough. However, when they aren't,
         | it can be incredibly frustrating. From our conversations
         | with scientists, this kind of data (3D, histology,
         | difficult tissues, new assays) is growing in volume.
         | 
         | As for correctness, so far we've only used mAP scores and
         | traditional accuracy metrics to compare with other
         | algorithms, but we also have our own internal metrics and
         | a test set we're building out in-house to cover many edge
         | cases, including some of the things you're talking about.
         | One thing we're always trying to be sensitive to is
         | fairness: we want to make sure we're not biasing the test
         | toward our algorithm, which would make us look better than
         | we are.
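         | 
         | (For context, mAP-style scoring matches predicted cells to
         | ground-truth cells by mask IoU - roughly like this sketch,
         | not our exact evaluation code:)
         | 
         |     import numpy as np
         | 
         |     def mask_iou(pred, truth):
         |         # Intersection-over-union of two boolean masks.
         |         inter = np.logical_and(pred, truth).sum()
         |         union = np.logical_or(pred, truth).sum()
         |         return inter / union if union else 0.0
         | 
         |     def is_match(pred, truth, thresh=0.5):
         |         # A prediction counts as a true positive above
         |         # the IoU cutoff.
         |         return mask_iou(pred, truth) >= thresh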
        
           | itamarst wrote:
           | I guess when I say correctness, I mean "how do I know it
           | _continues_ to be correct on data we've never seen before".
           | That's where metamorphic testing can be valuable, because it
           | lets you at least find incorrectness on real-world data that
           | hasn't been hand-tagged.
        
             | mike210 wrote:
             | Ah, yes. We're even looking to use some generative
             | models to create variations of real data and then
             | check that we perform similarly well across cases.
             | 
             | I guess the point I was making was that we want to
             | make sure we don't then use this generated or modified
             | data to test other algorithms in the space and claim
             | we're better. Simply put, it would be unfair for us to
             | make changes that perform better on a hurdle and then
             | put other algorithms through those hurdles. But for
             | internal use, it's definitely great!
        
         | mike210 wrote:
         | Also, would love to chat about the automation part if you
         | ping me - I'd be glad to get your feedback there. michael
         | at biodock dot ai
        
       ___________________________________________________________________