[HN Gopher] Open-source could finally get the world's microscope...
___________________________________________________________________
Open-source could finally get the world's microscopes speaking the
same language
Author : sohkamyung
Score : 184 points
Date : 2023-10-02 12:39 UTC (10 hours ago)
(HTM) web link (www.nature.com)
(TXT) w3m dump (www.nature.com)
| klysm wrote:
| Microscope vendors have gone to incredible efforts to obfuscate
| their file formats. Some inventing custom compression codecs. I
| worked for a company that spent enormous amounts of time reverse
| engineering these protocols.
| Ericson2314 wrote:
| Labs should not be allowed to spend grant money on this
| bullshit.
| ahns wrote:
| :/ [tell me about
| it...](https://paulbourke.net/dataformats/svs/)
| yread wrote:
| Hah SVS is child's play compared to MRXS:
|
| https://lists.andrew.cmu.edu/pipermail/openslide-
| users/2012-...
| extraduder_ire wrote:
| I'd hope imagemagick or some other program can at least get
| the data out into another format. I did not know about
| openslide before reading that mail, but I'll try to keep it
| in mind.
|
| Here's hoping this eventually gets fixed in a similar sort
| of way to how the .mkv format "fixed" video.
| michaelmior wrote:
| > Some inventing custom compression codecs.
|
| I'm probably not being cynical enough here, but this doesn't
| necessarily imply a desire for obfuscation.
| eternityforest wrote:
| A custom codec seems like it could only be obfuscation or
| really, really bad design.
|
| Codecs are hard, I doubt there are many which are better than
| the well known ones.
| strangattractor wrote:
| You are not being cynical enough - "Vendor lock in" would be
| a more appropriate term;)
| [deleted]
| dekhn wrote:
| bioformats2raw
| https://github.com/glencoesoftware/bioformats2raw handles a
| wide range of obfuscated formats.
| m463 wrote:
| I'm wondering whether, if decent standards are created, we'll be
| able to buy an electron microscope on Amazon, just like 3D
| printers.
| dekhn wrote:
| No. Also, this is a standard for light microscopy, not EM.
| m463 wrote:
| I guess a unified format is harder...
|
| https://en.wikipedia.org/wiki/Optical_microscope#Alternativ
| e...
|
| I was hoping for an easy entry:
|
| https://www.hackster.io/michalin70/3d-printed-laser-
| scanning...
| jrockway wrote:
| There are definitely consumer electron microscopes available.
| I remember a video from a couple years ago:
| https://www.youtube.com/watch?v=t60I0Z7qCsU
|
| I am not sure if the video actually says the name of the
| microscope, though, because he was mad at them for not
| testing it at high altitude where he lived at the time.
| alted wrote:
| I believe this is the Voxa Mochii [1].
|
| [1] https://www.mymochii.com/
| jrockway wrote:
| Looks like it. Not quite as consumer grade as I expected;
| "contact us for a quote" = $$$.
| jimmyswimmy wrote:
| You'd want a quote if you're spending that kind of money:
| https://www.voxa.co/solutions/mochii/faqs
|
| How much does one unit cost?
|
| Mochii starts at $48K for the imaging only unit. The
| spectroscopy-enabled version, which provides full
| featured x-ray spectroscopy and spectrum imaging, is
| $65K.
|
| It has an integrated metal coater option available for
| $5,000, and we offer a variety of optical cartridge
| exchange programs that can fit your consumables
| utilization and pricing needs.
| jrockway wrote:
| Yeah I kind of thought that if it was on YouTube then it
| was like $1000 or something. Apparently not.
| dhfbshfbu4u3 wrote:
| Why bother? Someone will come up with a model that will just suck
| it all up and convert it automatically... and by someone I mean
| some sufficiently motivated LLM.
| [deleted]
| Joel_Mckay wrote:
| It should fall into an OpenCV module, and include an API standard
| for generic controller firmware.
|
| Research labs literally have a thousand other things to do...
| besides play whack-a-mole with vendor IT shenanigans. =)
| extragood wrote:
| Absolutely. I spent an entire summer writing translation layers
| for a bunch of different microscopes so that they could
| interface with the detector software of the company I was
| interning for. It was tedious work!
| tomnicholas wrote:
| This article misses one of the coolest things about the Zarr
| format - that it's flexible enough that it's also becoming widely
| used in climate science.
|
| In particular the Pangeo project
| (https://pangeo.io/architecture.html) uses large Zarr stores as a
| performant format in the cloud which we can analyse in parallel
| at scale using distributed computing frameworks like dask.
|
| More and more climate science data is being made publicly
| available as Zarr in the cloud, often through open data
| partnerships with cloud providers (e.g. on AWS
| (https://aws.amazon.com/blogs/publicsector/decrease-geospatia...)
| and ERA-5 on GCP (https://cloud.google.com/storage/docs/public-
| datasets/era5)).
|
| I personally think that the more that common tooling can be
| shared between scientific disciplines, the better.
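|
| For anyone curious, a minimal sketch of what that access pattern
| looks like in Python (the bucket path and variable name here are
| placeholders, not a real dataset):
|
|     import xarray as xr
|
|     # Open a Zarr store straight from object storage; only the
|     # metadata is read up front, the chunked array data is
|     # fetched lazily on demand.
|     ds = xr.open_zarr("gs://some-bucket/era5.zarr", consolidated=True)
|
|     # Computations are built lazily with dask and only pull the
|     # chunks they actually touch, e.g. a time mean:
|     ds["t2m"].mean("time").compute()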
| zackmorris wrote:
| Without knowing anything about this, I wonder if standards are
| trying to work at the wrong level of abstraction. Like worrying
| about whether to use GIF or PNG to compress map images, when
| really it would be better to standardize the underlying data like
| OpenStreetMap. A better goal might be to decide on a single
| container format for discrete cosine transform (DCT) data in
| compressed images like how JPEG works.
|
| We probably need a multidimensional lossy compression scheme for
| 2D images, 3D volumes/video, and 4+D metadata involving multiple
| spectrums, exposures, etc. I'd probably start with a
| multidimensional DCT with a way to specify the
| coefficients/weights of each layer and metadata (optionally
| losslessly):
|
| https://en.wikipedia.org/wiki/Discrete_cosine_transform#Mult...
|
| Looks like that's built into MATLAB with _dct(x,n,dim)_ :
|
| https://www.mathworks.com/help/signal/ref/dct.html
|
| https://www.mathworks.com/help/images/ref/blockproc.html
| (coefficients)
|
| https://www.mathworks.com/help/images/discrete-cosine-transf...
| (2D example)
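|
| In Python/SciPy the analogous call is scipy.fft.dctn; a toy
| sketch of the idea (the test volume and threshold are arbitrary,
| this is just an illustration, not a real codec):
|
|     import numpy as np
|     from scipy.fft import dctn, idctn
|
|     # Toy N-D "compressor": forward DCT over all axes, drop the
|     # smallest coefficients, inverse-transform back.
|     volume = np.random.rand(64, 64, 64)   # stand-in for a 3D stack
|
|     coeffs = dctn(volume, norm="ortho")
|     cutoff = np.quantile(np.abs(coeffs), 0.95)  # keep the top 5%
|     coeffs[np.abs(coeffs) < cutoff] = 0.0
|
|     restored = idctn(coeffs, norm="ortho")
|     print("max abs error:", np.abs(volume - restored).max())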
|
| Then store the data in zipped JSON or BSON, or maybe TAR then
| ZIP.
|
| Then I'd write an importer and exporter from/to all of the other
| formats, with a single 0-1 quality argument or maybe an array of
| qualities for each layer, with a way to specify coefficients or
| load them from files.
|
| Once a known-good implementation worked, I'd add additional
| compression schemes that can compress across dimensions, so that
| a level of zoom only has to save a diff from a previous layer at
| a slightly different scale or focus (like a 3+D MPEG format, this
| is probably the standard that we're currently missing, does
| anyone know of one?).
|
| This all seems fairly straightforward though, so I wonder if
| proprietary software patents are the real blocker.
| dekhn wrote:
| The OME-zarr creators are pretty experienced in this field and
| have already done the work you're describing. Let's not create
| _another_ standard after OME-zarr.
| zackmorris wrote:
| Hey thanks, I tend to agree. Although if I read this right,
| OME-zarr punted on lossy compression (which is all that really
| matters), so IMHO it doesn't have much use in the real world
| yet, even though people evidently put a lot of work into it:
|
| https://github.com/ome/ome-zarr-py
|
| https://ngff.openmicroscopy.org/latest/
|
| https://zarr.readthedocs.io/en/v2.1.0/api/codecs.html
|
| https://www.researchgate.net/publication/356622607_OME-
| NGFF_...
|
| https://www.biorxiv.org/content/10.1101/2023.02.17.528834v1...
|
|     i.munro (Ian Munro) August 15, 2023, 8:54am
|     On the original point about size I suspect your answer is
|     compression. I have found that a lot of image formats use
|     lossy compression. I'm not sure about CZI specifically.
|     AFAIK ome-zarr (sensibly IMO) uses only lossless
|     compression. Best, Ian
|
| and:
|
|     sebi06 (Sebi06) August 15, 2023, 1:42pm
|     Hi all, for CZI we currently offer two compression methods:
|     JPG-XR and ZSTD. So we have lossless and lossy options
|     depending on your needs. All of them are implemented in our
|     C++, .NET or Python APIs and also supported from BioFormats.
|
| Looks like CZI is a competitor:
|
| https://www.zeiss.com/microscopy/en/products/software/zeiss-.
| ..
|
| https://github.com/cgohlke/czifile
|
| http://www.physics.hmc.edu/~gerbode/wppriv/wp-
| content/upload...
|
| CZI is basically JPEG compression and probably lacks the
| multidimensional compressor, which is the secret sauce missing
| from the various formats. It also mentions licensing and
| legal terms, so it is maybe lost in the weeds and not open
| source.
|
| Without more info, it feels like this is still an open
| problem.
|
| My interest in this is for video games, since so many are
| multiple GB in size but are also missing this
| multidimensional compressor. If we had it, I'd guess a 10:1
| to 100:1 reduction in size over what we have now, with no
| perceptual loss in quality.
| thfuran wrote:
| >which is all that really matters
|
| Except for the domains where nobody uses it because it
| ruins the data.
| anthk wrote:
| Imagine that in the Medicine World with DICOM images.
| thfuran wrote:
| DICOM technically supports lossy compression, but I think
| it's regarded as something of a historical mistake. It's
| pretty much never used and some viewers will plaster big
| warnings on lossily compressed series and others will
| outright refuse to load them.
| _Microft wrote:
| > lossy compression (which is all that really matters)
|
| No, not in this case. This might be great for game files or
| whatever you are familiar with, but lossy compression would
| basically mean mangling the data after going through all the
| effort of collecting it (I suspect you have little experience
| with lab work?).
|
| When doing experiments, everything is documented in full
| detail. Settings on machines and equipment, preparation
| steps, version numbers of software used for processing, ...
|
| You really don't want to lose information on your
| experiment.
| dekhn wrote:
| Please don't use lossy compression for scientific data.
|
| Disk space is cheap relative to people-time and data-
| collection-time.
|
| "Perceptual" is misleading - we're not just humans looking
| at JPEGs. Many people are doing hard-core quantitative
| analysis for machine learning. It's better to downscale the
| images (2X, 4X, 8X or more) than to use lossy compression
| if you are truly low on disk budget.
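|
| For reference, 2x downscaling by simple block averaging is
| straightforward in plain numpy (a generic sketch, nothing
| format-specific):
|
|     import numpy as np
|
|     def downscale_2x(img):
|         """Downscale a 2D image by averaging 2x2 blocks."""
|         h, w = img.shape
|         h, w = h - h % 2, w - w % 2   # trim odd edges if needed
|         return (img[:h, :w]
|                 .reshape(h // 2, 2, w // 2, 2)
|                 .mean(axis=(1, 3)))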
| rrock wrote:
| Lossy compression is a bad idea for scientific images. For
| instance, we often need to understand the statistics of
| photon detection events in background regions. That's one
| of the first things to get tossed.
| amelius wrote:
| It's too bad that it's based on XML and that this is actually the
| best we have.
| dekhn wrote:
| ome-zarr puts metadata in JSON, not XML. There was an older
| standard that put the metadata in XML. I've already written
| converters (converted the XML schema to JSON schema); it wasn't
| really an issue. The raw data is in blocked, compressed numpy
| arrays in a directory structure on disk.
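|
| As a rough illustration of that layout (plain zarr-python v2-style
| API, not the full OME-Zarr schema; the names are made up):
|
|     import numpy as np
|     import zarr
|
|     # A chunked, compressed array plus JSON metadata in a
|     # directory store on disk.
|     root = zarr.open_group("example.zarr", mode="w")
|     img = root.create_dataset(
|         "image",
|         data=np.zeros((4, 1024, 1024), dtype="uint16"),
|         chunks=(1, 256, 256),
|     )
|     # Ends up in a .zattrs JSON file next to the chunk files.
|     img.attrs["channels"] = ["DAPI", "GFP", "RFP", "brightfield"]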
| Ericson2314 wrote:
| Just like open access, I think at some point this sort of
| interoperability just has to be demanded by those actually
| funding the research. Too many narrow interests with no spare
| effort to coordinate otherwise.
| bafe wrote:
| It's almost the same in the field of ELNs/LIMS (electronic lab
| notebooks / lab information management systems). Researchers
| are required to use one, but there's no wider strategy
| mandating that the systems be interoperable.
| dguest wrote:
| I'm in a field that would have to change significantly if
| funding agencies demanded open access. I would love it.
|
| Right now there's basically zero funding for people who work on
| making our data open. It's kind of a hobby project that you do
| if you have free time and no one has found a real job for you.
| warkdarrior wrote:
| I don't think anyone calls for funding _just for_ making the
| data open. Funding should be for some scientific goal, with
| the added requirement that all data and code are made open
| source/open access once the funded project reaches a
| milestone (e.g., when publishing a paper).
| michaelmior wrote:
| I think part of the challenge here is that in some fields,
| proprietary tools and formats are so ingrained that
| requiring openness could be a massive burden to the point
| where it would be difficult to put together a reasonable
| budget that includes _real_ open data access.
|
| I think it's important to acknowledge here that there are
| degrees of open access. Simply providing data files is
| relatively easy. Making reproducible workflows as a non-
| developer can be nearly impossible in some cases.
| tecleandor wrote:
| Haven't seen the details, but I wonder what the main difference
| is between this OME-Zarr format and DICOM WSI, which is mostly
| aimed at the same type of images.
|
| Thinking out loud, this looks like the consolidation of several
| formats and projects. Bioformats, OME, Zarr...
|
| Back in the day I had the feeling that the OME/Bioformats people
| were more centered on research, and the DICOM crowd were mostly
| from the clinical side.
| ramraj07 wrote:
| Spent a decade managing microscopy data and tbh I don't know how
| important or useful this is. People can share data in whatever
| format they have and it wouldn't be hard for me or others to
| import it one way or another. Not that I have ever felt the need
| to do such verifications. It'll take weeks to months to do
| anything like that.
| COGlory wrote:
| There's real overhead. For instance, in one spec that comes to
| mind the vendor starts the 0,0 pixel in the lower right, while
| the spec calls for 0,0 to be the lower left. That leads to an
| inverted handedness for all images. Not a problem in
| projection, but as soon as you start working in 3D, it creates
| a real mess. Some software packages try to detect when the
| camera was from this vendor, and correct for it (silently).
| Other software packages refuse to correct for it, because they
| don't have a reliable way to detect it. A pipeline might have
| 3-4 of these software packages involved, and eventually you
| have no idea if your handedness is correct and no real way to
| tell without being at a high enough resolution that you can
| use biological landmarks of known handedness.
|
| Don't get me started on Euler angles.
|
| Caveat being I'm more used to electron microscopy, maybe these
| things aren't as important with light microscopy because the
| resolutions are lower?
| LeifCarrotson wrote:
| > People can share data in whatever format they have and it
| wouldn't be hard for me or others to import it one way or
| another.
|
| It's one thing for a highly-skilled user with a decade of
| experience to be able to import it eventually, another for an
| unskilled user to just have off-the-shelf tooling do it the
| same way for everyone automatically. This is more about meta-
| studies and unlocking new use cases. It sucks that the state
| of the art for sharing academic microscopy data right now is
| basically looking at raster images embedded in PDFs, or
| emailing the authors and then writing a custom scikit-image
| script. Imagine if you had to read a PDF catalog and then
| email someone to order something off Amazon, or if your
| favorite CRUD app consisted instead of having an expert read a
| PDF and email screenshots to you. What if sending those very
| emails to different recipients required implementing each
| user's custom IMAP-like mail client? That sounds absurd, but
| it's kind of the way academic data sharing works now: lots of
| people are re-inventing the wheel and creating custom file
| formats.
|
| Consider, for example, the work of Dr. Bik (example at [1]),
| who identifies cloned sections in microscopy data. Or what if,
| instead of each researcher having to generate their own images
| or get lucky and remember a particular image, there was a
| Getty Images/AP Newsroom-style platform where you could just
| filter for your particular subject and imaging parameters and
| share your data? A collection of proprietary RAW files with
| randomly-formatted Excel documents for metadata would allow
| individual researchers to get their work done, but would be
| pretty worthless in comparison.
|
| [1] https://scienceintegritydigest.com/2023/06/27/concerns-
| about...
| yread wrote:
| I write a slide management platform that doesn't convert images
| to a single format and it mostly works. From time to time you
| do get issues though (cause you see millions of slides).
| Sometimes vendors' own libraries fail to open the slides (cough
| Philips cough). Or there is a new version of firmware that
| sets some flag that wasn't used before.
|
| We support DICOM supp 145 too, but it's no panacea. There are
| still vendor-specific quirks. The "surface" is larger (cause
| you expect all the metadata to be there in the standard format)
| so you still sometimes see differences.
| ahns wrote:
| I work with microscopy data and we have to convert all the
| proprietary-but-still-readable-by-some-random-package image
| data generated by microscopy companies to ome-tiff/ome-zarr for
| it to be in a manageable format. I think it's great!
| charcircuit wrote:
| >The cloud, however, treats data as a single unstructured entity
| that is either downloaded in its entirety or not
|
| S3 supports the range header. Google Cloud, Azure, and I'm sure
| other object stores support it too.
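|
| e.g. with boto3 (the bucket and key names here are just
| placeholders):
|
|     import boto3
|
|     s3 = boto3.client("s3")
|     # Fetch only the first 64 KiB of an object instead of
|     # downloading the whole thing.
|     resp = s3.get_object(
|         Bucket="my-bucket",
|         Key="slide.ome.zarr/0/0.0.0",
|         Range="bytes=0-65535",
|     )
|     chunk = resp["Body"].read()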
| denton-scratch wrote:
| > Each pixel must be labelled with metadata, such as illumination
| level, its 3D position, the scale, the sample type and how the
| sample was prepared.
|
| Each pixel? Why? All of those except the 3D position apply to
| _all_ the pixels in a given image, and the (2D) position of a
| pixel can be inferred from its location in the image.
|
| Wait - are there optical microscopes that can create 3D images? I
| know you can see a 3D image if you peer into a binocular
| microscope, but AFAIK cameras for those things are always 2D
| cameras.
| COGlory wrote:
| Probably not all these things need per-pixel metadata, but
| anisotropy exists and that means many of the variables you'd
| think are per-exposure are actually dependent on where in the
| exposure the individual pixel is. For instance, illumination
| level isn't uniform across a field of view for all cameras, and
| may need to be normalized.
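|
| The usual flat-field correction is simple enough in numpy (a
| generic sketch, not tied to any particular vendor format):
|
|     import numpy as np
|
|     def flat_field_correct(raw, flat, dark):
|         """Correct non-uniform illumination with calibration frames."""
|         # (raw - dark) removes the sensor offset; dividing by the
|         # normalized (flat - dark) image evens out illumination
|         # across the field of view.
|         gain = (flat - dark).astype(float)
|         gain /= gain.mean()
|         return (raw - dark) / np.clip(gain, 1e-6, None)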
| denton-scratch wrote:
| > may need to be normalized.
|
| But that's post-processing; isn't the TFA's argument that
| it's hard for researchers to share _raw_ data?
| COGlory wrote:
| Unless I'm misreading the question: because the data need to
| be normalized later, you need per-pixel intensities in the
| raw data.
| dekhn wrote:
| They don't really "label every pixel" in the sense that I think
| about it.
|
| Instead, they have a collection of dense arrays representing
| the image data itself, then have metadata at the per-array, or
| overall level.
|
| A typical dataset I work with is multidimensional; it starts
| as: 1) 2D planes of multi-channel image intensities,
| typically 5K x 5K pixels, each covering just part of an overall
| field of view. These are like the patches when you do
| panoramas: take 20 partly overlapping shots. Each plane
| contains multiple channels - that could be "red, green and
| blue" or more complicated spectral distributions.
|
| 2) 3D information - the microscope takes photos at various
| depths, keeping only the "in-focus" (within the volume of
| view) information. These can be stacked (like depth stacking)
| or turned into a 3D "volume".
|
| 3) Maybe the data was collected over multiple time points, so
| (1) and (2) repeat every hour. Other parameters - like
| temperature, etc. - could also represent an entire dimension.
|
| 4) Every 2D plane has its own key-value metadata, such as "what
| color channels were used", "what objective was used"
| (magnification), and lots of other high-dimensional attributes
| (that's what they mean by "each pixel must be labelled with
| metadata" - the 3D position is the same for every pixel in a
| 2D plane).
|
| Generally all of this is modelled as structures, arrays, and
| structures of arrays/arrays of structures. In the case of OME-
| zarr, it's modelled as an n-dimensional array with dimensions
| expressed in a filesystem hierarchy (the first dimension is
| typically the outermost directory, and the innermost dimension
| is usually a flat file containing a block of scalar points
| using some compressed numpy storage). Then at each level of
| the directory you have additional .json files which contain
| attributes at that level of the data.
|
| Those partly overlapping 2D planes are often assembled into
| panoramas, which can be a lot more convenient to work with.
| There are various tools for working with this - I've used
| map-navigation JavaScript libraries, but napari is a desktop
| app with full support for sectioned viewing of high-
| dimensional (7D) data.
|
| OME-zarr is nice because it sort of uses the same underlying
| tech that the machine learning folks use, and it's ostensibly
| "optimized for object storage". I still have lots of
| complaints about the implementation details, but it's
| important for me not to distract the OME-zarr team from making
| the standard successful.
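|
| To make the on-disk layout concrete, here is roughly what
| poking at such a store looks like with plain zarr-python (the
| path, group names and slice are illustrative, and assume a 5D
| t/c/z/y/x array at pyramid level "0"):
|
|     import zarr
|
|     root = zarr.open_group("plate.ome.zarr", mode="r")
|     print(root.tree())        # the directory hierarchy
|     print(dict(root.attrs))   # group-level .zattrs JSON metadata
|
|     level0 = root["0"]        # highest-resolution pyramid level
|     # Only the chunks covering this slice get read from storage.
|     tile = level0[0, 0, 0, :512, :512]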
| tomnicholas wrote:
| > Each pixel? Why?
|
| I use the Zarr format (for climate science data rather than
| microscope data), and I think this is just poor wording in the
| article. In the Zarr specification the metadata is stored
| separately from the actual chunks of compressed array data. So
| the metadata applies at the array level, not the pixel level.
|
| > Wait - are there optical microscopes that can create 3D
| images?
|
| I think so - they do it by scanning lots of images at different
| focal lengths to create a 3D section (I think?). There are
| whole projects just for visualizing the multi-terabyte 3D image
| files produced - napari is an open-source image viewer which
| opens OME-Zarr data.
| carreau wrote:
| Even on a classical 2D microscope the illumination can be non-
| uniform, and you might need to calibrate your image.
|
| Source: PhD in biolab with microscopes, and napari dev.
| dekhn wrote:
| OME-zarr doesn't really store per-pixel illumination
| data. Instead, the illumination will typically be stored
| as per-2D-plane metadata.
|
| Flat fields, dark fields, and light fields can all be
| stored but would be their own arrays (structure of arrays
| rather than array of structures).
| dexwiz wrote:
| There are a lot of specialized microscopes out there. Confocal
| is pretty widespread.
|
| https://en.wikipedia.org/wiki/Confocal_microscopy
| ahns wrote:
| As far as I know, not technically (although I haven't kept up
| with the area), but you can definitely sweep through a
| volumetric sample; there are microscopes that can, for example,
| illuminate a thin z-plane of a transparent sample and collect
| the image, or ones that can reject out-of-focus (off-z) light
| for a particular z-plane, then move to another z-plane, etc.,
| and then generate a volume on the software side.
| Havoc wrote:
| The academic gang seems quite good at converging on a solution
| in general, though.
|
| Not always a very open solution, but converge they do. See
| MATLAB etc.
___________________________________________________________________
(page generated 2023-10-02 23:00 UTC)