[HN Gopher] Open-source could finally get the world's microscope...
       ___________________________________________________________________
        
       Open-source could finally get the world's microscopes speaking the
       same language
        
       Author : sohkamyung
       Score  : 184 points
       Date   : 2023-10-02 12:39 UTC (10 hours ago)
        
 (HTM) web link (www.nature.com)
 (TXT) w3m dump (www.nature.com)
        
       | klysm wrote:
       | Microscope vendors have gone to incredible efforts to obfuscate
       | their file formats. Some inventing custom compression codecs. I
       | worked for a company that spent enormous amounts of time reverse
       | engineering these protocols.
        
         | Ericson2314 wrote:
         | Labs should not be allowed to spend grant money on this
         | bullshit.
        
         | ahns wrote:
         | :/ [tell me about
         | it...](https://paulbourke.net/dataformats/svs/)
        
           | yread wrote:
           | Hah SVS is child's play compared to MRXS:
           | 
           | https://lists.andrew.cmu.edu/pipermail/openslide-
           | users/2012-...
        
             | extraduder_ire wrote:
             | I'd hope imagemagick or some other program can at least get
             | the data out into another format. I did not know about
             | openslide before reading that mail, but I'll try to keep it
             | in mind.
             | 
              | Here's hoping this eventually gets fixed in a similar
              | sort of way to how the .mkv format "fixed" video.
        
         | michaelmior wrote:
         | > Some inventing custom compression codecs.
         | 
         | I'm probably not being cynical enough here, but this doesn't
         | necessarily imply a desire for obfuscation.
        
           | eternityforest wrote:
           | A custom codec seems like it could only be obfuscation or
           | really, really bad design.
           | 
           | Codecs are hard, I doubt there are many which are better than
           | the well known ones.
        
           | strangattractor wrote:
            | You are not being cynical enough - "Vendor lock-in" would
            | be a more appropriate term ;)
        
         | [deleted]
        
         | dekhn wrote:
         | bioformats2raw
         | https://github.com/glencoesoftware/bioformats2raw handles a
         | wide range of obfuscated formats.
        
         | m463 wrote:
          | I'm wondering whether, if decent standards get created,
          | we'll be able to buy an electron microscope on Amazon, just
          | like 3D printers.
        
           | dekhn wrote:
           | No. Also, this is a standard for light microscopy, not EM.
        
             | m463 wrote:
             | I guess a unified format is harder...
             | 
             | https://en.wikipedia.org/wiki/Optical_microscope#Alternativ
             | e...
             | 
             | I was hoping for an easy entry:
             | 
             | https://www.hackster.io/michalin70/3d-printed-laser-
             | scanning...
        
           | jrockway wrote:
           | There are definitely consumer electron microscopes available.
           | I remember a video from a couple years ago:
           | https://www.youtube.com/watch?v=t60I0Z7qCsU
           | 
           | I am not sure if the video actually says the name of the
           | microscope, though, because he was mad at them for not
           | testing it at high altitude where he lived at the time.
        
             | alted wrote:
             | I believe this is the Voxa Mochii [1].
             | 
             | [1] https://www.mymochii.com/
        
               | jrockway wrote:
               | Looks like it. Not quite as consumer grade as I expected;
               | "contact us for a quote" = $$$.
        
               | jimmyswimmy wrote:
               | You'd want a quote if you're spending that kind of money:
               | https://www.voxa.co/solutions/mochii/faqs
               | 
               | How much does one unit cost?
               | 
               | Mochii starts at $48K for the imaging only unit. The
               | spectroscopy-enabled version, which provides full
               | featured x-ray spectroscopy and spectrum imaging, is
               | $65K.
               | 
               | It has an integrated metal coater option available for
               | $5,000, and we offer a variety of optical cartridge
               | exchange programs that can fit your consumables
               | utilization and pricing needs.
        
               | jrockway wrote:
               | Yeah I kind of thought that if it was on YouTube then it
               | was like $1000 or something. Apparently not.
        
       | dhfbshfbu4u3 wrote:
       | Why bother? Someone will come up with a model that will just suck
       | it all up and convert it automatically... and by someone I mean
       | some sufficiently motivated LLM.
        
         | [deleted]
        
       | Joel_Mckay wrote:
       | It should fall into an OpenCV module, and include an API standard
       | for generic controller firmware.
       | 
       | Research labs literally have a thousand other things to do...
        | besides play whack-a-mole with vendor IT shenanigans. =)
        
         | extragood wrote:
         | Absolutely. I spent an entire summer writing translation layers
         | for a bunch of different microscopes so that they could
         | interface with the detector software of the company I was
         | interning for. It was tedious work!
        
       | tomnicholas wrote:
       | This article misses one of the coolest things about the Zarr
       | format - that it's flexible enough that it's also becoming widely
       | used in climate science.
       | 
       | In particular the Pangeo project
       | (https://pangeo.io/architecture.html) uses large Zarr stores as a
       | performant format in the cloud which we can analyse in parallel
       | at scale using distributed computing frameworks like dask.
       | 
       | More and more climate science data is being made publicly
       | available as Zarr in the cloud, often through open data
        | partnerships with cloud providers, e.g. on AWS
        | (https://aws.amazon.com/blogs/publicsector/decrease-geospatia...)
        | and ERA-5 on GCP
        | (https://cloud.google.com/storage/docs/public-datasets/era5).
       | 
       | I personally think that the more that common tooling can be
       | shared between scientific disciplines the better.
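The chunked layout is what makes that parallel analysis possible: each chunk is an independently compressed file, so workers can read and reduce them concurrently. A stdlib-only sketch of the idea (the toy store format below is illustrative, not the actual Zarr spec; dask does the same thing across a whole cluster):

```python
import json, os, struct, tempfile, zlib
from concurrent.futures import ThreadPoolExecutor

def write_store(root, chunks):
    # Toy chunked store: one compressed binary file per chunk, plus a
    # JSON metadata file describing the layout (not spec-compliant Zarr).
    os.makedirs(root, exist_ok=True)
    with open(os.path.join(root, "meta.json"), "w") as f:
        json.dump({"n_chunks": len(chunks)}, f)
    for i, values in enumerate(chunks):
        raw = struct.pack(f"{len(values)}d", *values)
        with open(os.path.join(root, f"chunk.{i}"), "wb") as f:
            f.write(zlib.compress(raw))

def read_chunk_sum(path):
    # Each chunk decompresses independently: no shared state needed.
    with open(path, "rb") as f:
        raw = zlib.decompress(f.read())
    return sum(struct.unpack(f"{len(raw) // 8}d", raw))

root = os.path.join(tempfile.mkdtemp(), "store")
write_store(root, [[1.0, 2.0], [3.0, 4.0], [5.0]])

meta = json.load(open(os.path.join(root, "meta.json")))
paths = [os.path.join(root, f"chunk.{i}") for i in range(meta["n_chunks"])]
with ThreadPoolExecutor() as pool:  # dask generalizes this across machines
    total = sum(pool.map(read_chunk_sum, paths))
print(total)  # 15.0
```

The same independence is why object stores suit Zarr well: each chunk maps to one object fetch.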
        
       | zackmorris wrote:
       | Without knowing anything about this, I wonder if standards are
       | trying to work at the wrong level of abstraction. Like worrying
       | about whether to use GIF or PNG to compress map images, when
       | really it would be better to standardize the underlying data like
       | OpenStreetMap. A better goal might be to decide on a single
       | container format for discrete cosine transform (DCT) data in
       | compressed images like how JPEG works.
       | 
       | We probably need a multidimensional lossy compression scheme for
       | 2D images, 3D volumes/video, and 4+D metadata involving multiple
       | spectrums, exposures, etc. I'd probably start with a
       | multidimensional DCT with a way to specify the
       | coefficients/weights of each layer and metadata (optionally
       | losslessly):
       | 
       | https://en.wikipedia.org/wiki/Discrete_cosine_transform#Mult...
       | 
       | Looks like that's built into MATLAB with _dct(x,n,dim)_ :
       | 
       | https://www.mathworks.com/help/signal/ref/dct.html
       | 
       | https://www.mathworks.com/help/images/ref/blockproc.html
       | (coefficients)
       | 
       | https://www.mathworks.com/help/images/discrete-cosine-transf...
       | (2D example)
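The separable structure that makes a multidimensional DCT tractable is easy to sketch. Below is a naive O(N^2) DCT-II/DCT-III pair in pure Python, purely illustrative (real codecs use fast transforms, normalization, and quantization tables):

```python
import math

def dct1(v):
    # Naive 1-D DCT-II (unnormalized); O(N^2), for illustration only.
    N = len(v)
    return [sum(v[n] * math.cos(math.pi * (n + 0.5) * k / N)
                for n in range(N)) for k in range(N)]

def idct1(c):
    # Matching inverse (DCT-III with the standard 2/N scaling).
    N = len(c)
    return [(c[0] / 2 + sum(c[k] * math.cos(math.pi * (n + 0.5) * k / N)
                            for k in range(1, N))) * 2 / N
            for n in range(N)]

def dct2d(block):
    # Separability: transform all rows, then all columns.
    rows = [dct1(r) for r in block]
    return [list(col) for col in zip(*[dct1(c) for c in zip(*rows)])]

def idct2d(coeffs):
    # Invert columns, transpose back, then invert rows.
    cols = [idct1(c) for c in zip(*coeffs)]
    return [idct1(r) for r in zip(*cols)]

block = [[1.0, 2.0], [3.0, 4.0]]
restored = idct2d(dct2d(block))  # round-trips up to float error
```

Extending to 3D or higher just repeats the same 1-D pass along each extra axis, which is the appeal of the scheme described above.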
       | 
       | Then store the data in zipped JSON or BSON, or maybe TAR then
       | ZIP.
       | 
       | Then I'd write an importer and exporter from/to all of the other
       | formats, with a single 0-1 quality argument or maybe an array of
       | qualities for each layer, with a way to specify coefficients or
       | load them from files.
       | 
       | Once a known-good implementation worked, I'd add additional
       | compression schemes that can compress across dimensions, so that
       | a level of zoom only has to save a diff from a previous layer at
        | a slightly different scale or focus (like a 3+D MPEG format;
        | this is probably the standard that we're currently missing -
        | does anyone know of one?).
       | 
       | This all seems fairly straightforward though, so I wonder if
        | proprietary software patents are the real blocker.
        
         | dekhn wrote:
         | The OME-zarr creators are pretty experienced in this field and
         | have already done the work you're describing. Let's not create
         | _another_ standard after OME-zarr.
        
           | zackmorris wrote:
           | Hey thanks, I tend to agree. Although if I read this right,
           | OME-zarr punted on lossy compression (which is all that
           | really matters) so IMHO it doesn't have much use in the real
           | world yet even though people evidently put a lot of work into
           | it:
           | 
           | https://github.com/ome/ome-zarr-py
           | 
           | https://ngff.openmicroscopy.org/latest/
           | 
           | https://zarr.readthedocs.io/en/v2.1.0/api/codecs.html
           | 
           | https://www.researchgate.net/publication/356622607_OME-
           | NGFF_...
           | 
            | https://www.biorxiv.org/content/10.1101/2023.02.17.528834v1...
            | 
            |     i.munro (Ian Munro) August 15, 2023, 8:54am
            | 
            |     On the original point about size I suspect your answer
            |     is compression. I have found that a lot of image
            |     formats use lossy compression. I'm not sure about CZI
            |     specifically. AFAIK ome-zarr (sensibly IMO) uses only
            |     lossless compression.
            | 
            |     Best
            |     Ian
            | 
            | and:
            | 
            |     sebi06 (Sebi06) August 15, 2023, 1:42pm
            | 
            |     Hi all, for CZI we offer currently two compression
            |     methods: JPG-XR and ZSTD. So we have lossless and
            |     lossy options depending on your needs. All of them are
            |     implemented in our C++, .NET or Python APIs and also
            |     supported from BioFormats.
           | 
           | Looks like CZI is a competitor:
           | 
           | https://www.zeiss.com/microscopy/en/products/software/zeiss-.
           | ..
           | 
           | https://github.com/cgohlke/czifile
           | 
           | http://www.physics.hmc.edu/~gerbode/wppriv/wp-
           | content/upload...
           | 
            | CZI is basically JPEG compression and is probably missing
            | the multidimensional compressor, which is the secret sauce
            | absent from all of these formats. It also mentions
            | licensing and legal terms, so it is maybe lost in the
            | weeds and not open source.
           | 
           | Without more info, it feels like this is still an open
           | problem.
           | 
           | My interest in this is for video games, since so many are
           | multiple GB in size but are also missing this
           | multidimensional compressor. If we had it, I'd guess a 10:1
           | to 100:1 reduction in size over what we have now, with no
           | perceptual loss in quality.
        
             | thfuran wrote:
             | >which is all that really matters
             | 
             | Except for the domains where nobody uses it because it
             | ruins the data.
        
               | anthk wrote:
               | Imagine that in the Medicine World with DICOM images.
        
               | thfuran wrote:
               | DICOM technically supports lossy compression, but I think
               | it's regarded as something of a historical mistake. It's
               | pretty much never used and some viewers will plaster big
               | warnings on lossily compressed series and others will
               | outright refuse to load them.
        
             | _Microft wrote:
             | > lossy compression (which is all that really matters)
             | 
              | No, not in this case. This might be great for game files
              | or whatever you are familiar with, but lossy compression
              | would basically mean mangling the data after going
              | through all the effort of collecting it (I suspect you
              | have only little experience with lab work?).
             | 
             | When doing experiments, everything is documented in full
             | detail. Settings on machines and equipment, preparation
             | steps, version numbers of software used for processing, ...
             | 
             | You really don't want to lose information on your
             | experiment.
        
             | dekhn wrote:
             | Please don't use lossy compression for scientific data.
             | 
             | Disk space is cheap relative to people-time and data-
             | collection-time.
             | 
             | "Perceptual" is misleading- we're not just humans looking
             | at jpegs. Many people are doing hard-code quantitative
             | analysis for machine learning. better to downscale the
             | images (2X, 4X, 8X or more) than to use lossy compression
             | if you are truly low on disk budget.
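The downscaling suggestion is easy to make concrete: block-averaging shrinks the data by a known factor with a well-understood, uniform effect, unlike codec artifacts that bias statistics unevenly. A minimal sketch:

```python
def downscale2x(img):
    # Average non-overlapping 2x2 blocks of a 2-D image (list of rows).
    # Lossy, but in a predictable, documentable way - every output pixel
    # is the plain mean of four input pixels.
    h, w = len(img), len(img[0])
    return [[(img[2 * r][2 * c] + img[2 * r][2 * c + 1] +
              img[2 * r + 1][2 * c] + img[2 * r + 1][2 * c + 1]) / 4
             for c in range(w // 2)]
            for r in range(h // 2)]

img = [[0, 4, 8, 8],
       [4, 8, 8, 8],
       [1, 1, 2, 2],
       [1, 1, 2, 2]]
print(downscale2x(img))  # [[4.0, 8.0], [1.0, 2.0]]
```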
        
             | rrock wrote:
             | Lossy compression is a bad idea for scientific images. For
             | instance, we often need to understand the statistics of
             | photon detection events in background regions. That's one
             | of the first things to get tossed.
        
       | amelius wrote:
       | It's too bad that it's based on XML and that this is actually the
       | best we have.
        
         | dekhn wrote:
          | ome-zarr puts metadata in JSON, not XML. There was an older
          | standard that put the metadata in XML. I've already written
          | converters (converted the XML schema to JSON schema); it
          | wasn't really an issue. The raw data is in blocked,
          | compressed numpy arrays in a directory structure on disk.
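The JSON-beside-chunks layout can be sketched in a few lines. This toy store only loosely mimics Zarr's `.zattrs` convention (it is not spec-compliant), but it shows why metadata can be read without touching a single pixel chunk:

```python
import json, os, tempfile

# Build a toy store: a JSON attribute file next to binary chunk files.
root = tempfile.mkdtemp()
os.makedirs(os.path.join(root, "0"))
with open(os.path.join(root, ".zattrs"), "w") as f:
    # Hypothetical microscopy attributes, for illustration only.
    json.dump({"objective": "40x", "channels": ["DAPI", "GFP"]}, f)
with open(os.path.join(root, "0", "0.0"), "wb") as f:
    f.write(bytes(16))  # stand-in for one compressed chunk

# Inspect metadata without reading any image data:
attrs = json.load(open(os.path.join(root, ".zattrs")))
print(attrs["channels"])  # ['DAPI', 'GFP']
```

Converting the older XML metadata then amounts to mapping schema fields into these JSON sidecars, as described above.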
        
       | Ericson2314 wrote:
       | Just like open access, I think at some point this sort of
       | interoperability just has to be demanded by those actually
        | funding the research. Too many narrow interests with no spare
       | effort to coordinate otherwise.
        
         | bafe wrote:
          | It's almost the same in the field of ELNs/LIMS. Researchers
          | are required to use one, but there's no wider strategy
          | mandating that the systems be interoperable.
        
         | dguest wrote:
         | I'm in a field that would have to change significantly if
         | funding agencies demanded open access. I would love it.
         | 
          | Right now there's basically zero funding for people who work on
         | making our data open. It's kind of a hobby project that you do
         | if you have free time and no one has found a real job for you.
        
           | warkdarrior wrote:
           | I don't think anyone calls for funding _just for_ making the
           | data open. Funding should be for some scientific goal, with
            | the added requirement that all data and code are made open
            | source/open access once the funded project reaches a
            | milestone (e.g., when publishing a paper).
        
             | michaelmior wrote:
             | I think part of the challenge here is that in some fields,
             | proprietary tools and formats are so ingrained that
             | requiring openness could be a massive burden to the point
             | where it would be difficult to put together a reasonable
             | budget that includes _real_ open data access.
             | 
             | I think it's important to acknowledge here that there are
             | degrees of open access. Simply providing data files is
             | relatively easy. Making reproducible workflows as a non-
             | developer can be nearly impossible in some cases.
        
       | tecleandor wrote:
        | Haven't seen the details, but I wonder what's the main
        | difference between this OME-Zarr format and DICOM WSI, which
        | is mostly aimed at the same type of images.
       | 
       | Thinking out loud, this looks like the consolidation of several
       | formats and projects. Bioformats, OME, Zarr...
       | 
       | Back in the day I had the feeling that the OME/Bioformats people
       | were more centered on research, and the DICOM crowd were mostly
       | from the clinical side.
        
       | ramraj07 wrote:
       | Spent a decade managing microscopy data and tbh I don't know how
       | important or useful this is. People can share data in whatever
       | format they have and it wouldn't be hard for me or others to
       | import it one way or another. Not that I have ever felt the need
       | to do such verifications. It'll take weeks to months to do
       | anything like that.
        
         | COGlory wrote:
          | There's real overhead. For instance, one spec that comes to
          | mind: the vendor starts the 0,0 pixel in the lower right,
          | while the spec calls for 0,0 to be the lower left. That
          | leads to an inverted handedness for all images. Not a
          | problem in projection, but as soon as you start working in
          | 3D, it creates a real mess. Some software packages try to
          | detect when the camera was this vendor's, and correct for it
          | (silently). Other software packages refuse to correct for
          | it, because they don't have a reliable way to detect it. A
          | pipeline might have 3-4 of these software packages involved;
          | eventually you have no idea if your handedness is correct
          | and no real way to tell without being at a high enough
          | resolution that you can use biological landmarks of known
          | handedness.
         | 
         | Don't get me started on Euler angles.
         | 
         | Caveat being I'm more used to electron microscopy, maybe these
         | things aren't as important with light microscopy because the
         | resolutions are lower?
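The origin-convention fix itself is a one-liner, which is exactly why it tends to get applied silently and inconsistently. A hypothetical conversion for the case described above (origin at lower-right vs. lower-left; the actual vendor spec isn't given in the comment):

```python
def lower_right_to_lower_left(x, y, width):
    # Mirror the x axis so (0,0) moves from the lower-right corner to
    # the lower-left. A single mirror flips the handedness of the
    # coordinate frame, which is why an undetected double correction
    # (or a missing one) silently wrecks downstream 3D reconstruction.
    return width - 1 - x, y

# A pixel at the lower-right origin of a 100-px-wide image:
print(lower_right_to_lower_left(0, 0, 100))  # (99, 0)
```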
        
         | LeifCarrotson wrote:
         | > People can share data in whatever format they have and it
         | wouldn't be hard for me or others to import it one way or
         | another.
         | 
         | It's one thing for a highly-skilled user with a decade of
         | experience to be able to import it eventually, another for an
         | unskilled user to just have off-the-shelf tooling do it the
          | same way for everyone automatically. This is more about
          | meta-studies and unlocking new use cases. It sucks that the
          | state-of-the-art for sharing academic microscopy data right
          | now is basically looking at raster images embedded in PDFs,
          | or emailing the authors and then writing a custom
          | Scikit-Image script.
         | Imagine if you had to read a PDF catalog and then email someone
         | to order something off Amazon, or if your favorite CRUD app
         | consisted instead of having an expert read a PDF and email a
         | screenshots to you. What if sending those very emails to
         | different recipients required implementing each users custom
         | IMAP-like mail client. That sounds absurd, but it's kind of the
         | way academic data sharing works now, lots of people are re-
         | inventing the wheel and creating custom file formats.
         | 
         | Consider, for example, the work of Dr. Bik (example at [1]) who
          | identifies cloned sections from microscopy data. Or what
          | if, instead of each researcher having to generate their own
          | images, or get lucky and remember a particular image, there
          | was a Getty Images/AP Newsroom-style platform where you
          | could just filter for your particular subject and imaging
          | parameters and share your data? A collection of proprietary
          | RAW files with randomly-formatted Excel documents for
          | metadata would allow individual researchers to get their
          | work done, but would be pretty worthless in comparison.
         | 
         | [1] https://scienceintegritydigest.com/2023/06/27/concerns-
         | about...
        
         | yread wrote:
         | I write a slide management platform that doesn't convert images
         | to a single format and it mostly works. From time to time you
         | do get issues though (cause you see millions of slides).
          | Sometimes vendors' own libraries fail to open the slides
          | (cough Philips cough). Or there is a new version of firmware
         | sets some flag that wasn't used before.
         | 
         | We support DICOM supp 145 too, but it's no panacea. There are
         | still vendor specific quirks. The "surface" is larger (cause
         | you expect all the metadata to be there in the standard format)
         | so you still sometimes see differences.
        
         | ahns wrote:
         | I work with microscopy data and we have to convert all the
         | proprietary-but-still-readable-by-some-random-package image
         | data generated by microscopy companies to ome-tiff/ome-zarr for
         | it to be in a manageable format. I think it's great!
        
       | charcircuit wrote:
       | >The cloud, however, treats data as a single unstructured entity
       | that is either downloaded in its entirety or not
       | 
       | S3 supports the range header. Google Cloud, Azure, and I'm sure
       | other object stores support it too.
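For reference, a ranged object read is a single standard header (RFC 7233), which S3, GCS, and Azure Blob all honor. A sketch with the stdlib client (the bucket URL is a placeholder, and the request is built but not actually sent):

```python
import urllib.request

# Fetch only the first 1 KiB of an object - e.g. one Zarr chunk - rather
# than downloading the whole store. The host and key are hypothetical.
req = urllib.request.Request(
    "https://example-bucket.s3.amazonaws.com/image.ome.zarr/0/0.0.0",
    headers={"Range": "bytes=0-1023"},
)
# urllib.request.urlopen(req) would return HTTP 206 Partial Content
# containing just those bytes.
print(req.get_header("Range"))  # bytes=0-1023
```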
        
       | denton-scratch wrote:
       | > Each pixel must be labelled with metadata, such as illumination
       | level, its 3D position, the scale, the sample type and how the
       | sample was prepared.
       | 
       | Each pixel? Why? All of those except the 3D position apply to
       | _all_ the pixels in a given image, and the (2D) position of a
       | pixel can be inferred from its location in the image.
       | 
       | Wait - are there optical microscopes that can create 3D images? I
       | know you can see a 3D image if you peer into a binocular
       | microscope, but AFAIK cameras for those things are always 2D
       | cameras.
        
         | COGlory wrote:
         | Probably not all these things need per-pixel metadata, but
         | anisotropy exists and that means many of the variables you'd
         | think are per-exposure are actually dependent on where in the
         | exposure the individual pixel is. For instance, illumination
         | level isn't uniform across a field of view for all cameras, and
         | may need to be normalized.
        
           | denton-scratch wrote:
           | > may need to be normalized.
           | 
           | But that's post-processing; isn't the TFA's argument that
           | it's hard for researchers to share _raw_ data?
        
             | COGlory wrote:
             | Unless I'm mistaking the question, because the data need to
             | be normalized later, you need per pixel intensity in the
             | raw data.
        
         | dekhn wrote:
         | They don't really "label every pixel" in the sense that I think
         | about it.
         | 
         | Instead, they have a collection of dense arrays representing
         | the image data itself, then have metadata at the per-array, or
         | overall level.
         | 
         | A typical dataset I work with is multidimensional, it starts
         | as: 1) 2D planes of multiple channel image intensities,
         | typically 5K x 5K pixels, each covering just part of an overall
         | field of view. These are like patches when you do panoramas-
         | take 20 partly overlapping shots. Each plane contains multiple
         | channels- that could be "red green and blue" or more
         | complicated spectral distributions.
         | 
          | 2) 3D information - the microscope takes photos at various
          | depths, keeping only the "in-focus" (within the volume of
          | view) information. These can be stacked (like focus
          | stacking) or turned into a 3D "volume".
         | 
         | 3) Maybe the data was collected over multiple time points, so
         | (1) and (2) repeat every hour. Other parameters- like
         | temperature, etc, could also represent an entire dimension.
         | 
          | 4) Every 2D plane has its own key-value metadata, such as
          | "what color channels were used" or "what objective was used"
          | (magnification), and lots of other high-dimensional
          | attributes. That's what they mean by "each pixel must be
          | labelled with metadata" - the 3D position is the same for
          | every pixel in a 2D plane.
          | 
          | Generally all of this is modelled as structures, arrays, and
          | structures of arrays/arrays of structures. In the case of
          | OME-zarr, it's modelled as an n-dimensional array with
          | dimensions expressed in a filesystem hierarchy (the first
          | dimension is typically the outermost directory; the
          | innermost dimension is usually a flat file containing a
          | block of scalar points using some compressed numpy storage).
          | Then at each level of the directory you have additional
          | .json files which contain attributes at that level of the
          | data.
         | 
          | Those partly overlapping 2D planes are often assembled into
          | panoramas, which can be a lot more convenient to work with.
          | There are various tools for working with this - I've used
          | map-navigation JavaScript libraries, but napari is a desktop
          | app with full support for sectioned viewing of
          | high-dimensional (7D) data.
         | 
          | OME-zarr is nice because it sort of uses the same underlying
          | tech that the machine learning folks use, and it's
          | ostensibly "optimized for object storage". I still have lots
          | of complaints about the implementation details, but it's
          | important for me not to distract the OME-zarr team from
          | making the standard successful.
        
         | tomnicholas wrote:
         | > Each pixel? Why?
         | 
         | I use the Zarr format (for climate science data rather than
         | microscope data), and I think this is just poor wording in the
         | article. In the Zarr specification the metadata is stored
         | separately from the actual chunks of compressed array data. So
         | the metadata applies at the array level, not the pixel level.
         | 
         | > Wait - are there optical microscopes that can create 3D
         | images?
         | 
         | I think so - they do it by scanning lots of images at different
         | focal lengths to create a 3D section (I think?). There are
          | whole projects just for visualizing the multi-terabyte 3D
          | image files produced - Napari is an open-source image viewer
          | which opens OME-Zarr data.
        
           | carreau wrote:
           | Even on classical 2D microscope the illumination can be non-
           | uniform, and you might need to calibrate your image.
           | 
           | Source: PhD in biolab with microscopes, and napari dev.
        
             | dekhn wrote:
              | OME-zarr doesn't really store per-pixel illumination
              | data. Instead, the illumination will typically be stored
              | as per-2D-plane metadata.
             | 
             | Flat fields, dark fields, and light fields can all be
             | stored but would be their own arrays (structure of arrays
             | rather than array of structures).
        
         | dexwiz wrote:
          | There are a lot of specialized microscopes out there. Confocal
         | is pretty widespread.
         | 
         | https://en.wikipedia.org/wiki/Confocal_microscopy
        
         | ahns wrote:
         | As far as I know, not technically (although I haven't kept up
         | in the area), but you can definitely sweep through a volumetric
         | sample; there are microscopes that can for example illuminate a
         | thin z-plane of a transparent sample and collect the image or
         | those that can reject out-of-focus (off-z) light for a
         | particular z-plane, then move to another z-plane, etc. and then
         | generate a volume on the software side.
        
       | Havoc wrote:
       | Academic gang seems quite good at converging on a solution in
       | general though.
       | 
        | Not always a very open solution, but converge they do. See
        | MATLAB etc.
        
       ___________________________________________________________________
       (page generated 2023-10-02 23:00 UTC)