[HN Gopher] A look at the Mojo language for bioinformatics
___________________________________________________________________
A look at the Mojo language for bioinformatics
Author : blindseer
Score : 82 points
Date : 2024-02-10 12:50 UTC (1 day ago)
(HTM) web link (viralinstruction.com)
(TXT) w3m dump (viralinstruction.com)
| jimbob45 wrote:
| Crystal was never able to find traction as a Ruby clone that
| could compete with C speeds. Why would a Python clone have any
| better luck? I don't think anyone would accuse Python of being
| dramatically more usable than Ruby.
| Alifatisk wrote:
| I think the appeal of Crystal is for users who already know
| Ruby, so the market was already limited there.
|
| Crystal itself is a gem, but comparing it to Mojo and its
| relation to Python is fair yet gives the wrong message: Python
| is by far more popular because of all the packages, so the
| market is way larger there.
| breather wrote:
| Crystal didn't have much use in Ruby's sweet spot - being a DSL
| for some immensely complicated-to-configure framework (e.g.
| Rails, Chef).
| akkad33 wrote:
| Crystal is an entirely different language with a similar
| syntax. Valid Python is valid Mojo
| frou_dh wrote:
| Apparently that is the goal, but not the reality:
|
| > Mojo is still early and not yet a Python superset, so only
| simple programs can be brought over as-is with no code
| changes. We will continue investing in this and build
| migration tools as the language matures.
|
| https://docs.modular.com/mojo/faq.html#how-do-i-convert-
| pyth...
| coldtea wrote:
| Well, for the domains Mojo targets, Python is king, so a
| faster Python-like language would have a larger potential
| audience. A fast Ruby-like language, not so much, as Ruby was
| never that special in those domains, or in most places outside
| web development, and even there it kind of lost steam over the
| past 10 years.
|
| Besides, people opting for closer-to-C speed had Rust, Go, Java,
| Swift, and other options to go to, all with more momentum and
| support, before going for an as-yet-unproven Ruby clone.
| pjmlp wrote:
| I used to be quite sceptical given how Swift for Tensorflow
| went; however, since NVidia decided to partner with Modular,
| alongside their ongoing CUDA JIT bindings for Python, I think
| Mojo might actually work out.
| coldtea wrote:
| "Swift for Tensorflow" never had any real backing apart
| from the announcement though.
| pjmlp wrote:
| Apparently it had Google's money backing, for what it is
| worth.
|
| I never believed in it, because Swift is as relevant as
| Objective-C outside NeXT/Apple's platforms, and not the
| kind of programming language that the research community
| cares about.
| coldtea wrote:
| > _Apparently it had Google's money backing, for what it is
| worth_
|
| You mean they paid to have it created, like they pay for
| thousands of other things.
|
| But it was never really pushed, the way they push things
| they want to promote.
| pests wrote:
| Chris Lattner has made a few comments here about Mojo the
| last few months.
|
| https://news.ycombinator.com/threads?id=chrislattner
|
| Here's his comment on swift for tensorflow:
|
| https://news.ycombinator.com/item?id=37330031
| pjmlp wrote:
| Because of the people and companies behind the project.
| jdiaz97 wrote:
| I think it's less about the language and more about Modular's
| product, their MAX supercomputer thingy.
| f6v wrote:
| As someone who practices bioinformatics, it doesn't seem
| appealing. Bioinformatics is like 0.1% dealing with FASTQ files
| and the rest is using the ecosystem of libraries for statistics
| and plotting. Many of them in R, by the way.
| __MatrixMan__ wrote:
| As someone who is considering a switch from generic software
| engineering towards bioinformatics, what would you say the pain
| points are?
|
| If this is not the way to remove workflow friction, what is?
| life-and-quiet wrote:
| Would like to second this question. I'm very interested in
| getting into this world, but it feels like there isn't a
| clear path (especially for someone self-taught like me).
| Bioinformatics feels pretty inaccessible without a computer
| science or biology degree, even with substantial R and Python
| experience.
| fwip wrote:
| There's a few camps in bioinformatics, from what I've seen.
|
| 1) The fellows writing papers - usually these guys have
| PhDs, usually a science-focused PhD.
|
| 2) Analysts - often have a background in mathematics,
| biology, or big data. Success here can lead to an onramp
| to camp 1. Much of your time here is spent in interactive
| programming environments, like Jupyter notebooks.
|
| 3) Programmers - writing novel or faster bioinformatic
| tools, often in low-level languages like C++ or Rust.
| Sometimes you can get a paper out of these, especially if
| you have a CS background. There's increasingly room for
| higher-level tools here too, though, so it starts to
| overlap with 2.
|
| 4) Pipeline programmers - people gluing analysis workflows
| together out of the tools written in low-level languages,
| often with a liberal helping of Unix command-fu. Often
| sort of an ad-hoc role, containing people from diverse
| backgrounds, from biology to sysadmin. (This is my current
| role.)
|
| 5) Biology/wetlab - people running experiments in the lab
| who want to analyze their own work, especially for QC
| purposes. Wild-west ad-hoc development practices.
| tstactplsignore wrote:
| To disagree: I'm a computational biologist, and it's my firm
| belief that 99% of the scientifically important stuff happens
| before the stats and plotting. That's not to say I dismiss those
| things and haven't done my fair share of stats, but just that
| the difference between real results and incorrect results _most
| often_ happens before that step.
|
| I'm a microbiologist, though; for stuff like human RNA-Seq I
| understand that it's often plug-and-play to get a gene counts
| table at this point.
| bfrankline wrote:
| Sure, but I think that, for example, representation learning
| doesn't involve manipulating an array of strings.
| folli wrote:
| I guess that depends on your exact ecological niche within
| bioinformatics.
|
| I got my start at an NGS facility, where handling FASTQ was
| closer to 80% of my time, so any speedups would have been
| greatly appreciated.
| hkmaxpro wrote:
| Related:
|
| https://news.ycombinator.com/item?id=39290958
|
| https://news.ycombinator.com/item?id=39296559
| zer00eyz wrote:
| >>> As a bioinformatician who is obsessed with high-performance,
| high-level programming, that's right in my wheelhouse!... Mojo
| currently only runs on Ubuntu and MacOS, and I run neither. So, I
| can't run any Mojo code
|
| 1. Back to the Rust vs Mojo article that kicked this off... this
| isn't someone who is going to use Rust.
|
| 2. Availability, portability, ease of use... These are the
| reasons Python is winning.
|
| 3. I am baffled that this person has to write code as part of
| their job and does not know what a VM is! Note: this isn't a
| slight against the author; I doubt they are an isolated case. I
| think this is my own cognitive dissonance showing.
| refulgentis wrote:
| Got the same general impression, TL;DR: wrote a benchmark
| article without...running it? Then you conclude with "the
| language I use is faster!!!" based on a one-off run on your
| machine, which surely isn't the same machine Mojo used to run
| benchmarks for their website copy?
|
| It's odd to read something that's pretty well-versed in some
| relatively complex CS concepts, i.e. it's not just a PhD with a
| blank text editor, but that simultaneously makes egregiously
| obvious mistakes I wouldn't expect any college graduate to
| roll with.
|
| There's a certain type, and I don't know what name to give it,
| especially because I certainly don't want to give it a
| condescending name. I call it "data scientist types" when I'm
| in person with someone who I trust to give me some verbal rope.
|
| Software really feels like it ate everything and everyone. So
| you end up with insanely bright people who do software
| engineering as part of their job, but miss some pieces you
| expect from trad software engineering.
| jakobnissen wrote:
| Author here. I do know about VMs. Is it too lazy for me to
| write that article and not bother to install a VM with Mojo
| (and Rust and Julia, to benchmark in the same environment)?
| Maybe. If this was for my work I certainly would have felt
| compelled to.
|
| On the other hand, the fact that Mojo doesn't run on Windows
| and most Linux distros is a point in itself. And would the blog
| post really be substantially improved if I had gotten the
| number of milliseconds right for the Mojo implementation on my
| computer? Of course not. It should be clear that the
| implementations are incomparable, and that a similar Julia
| implementation is very fast, which implies that the reason the
| original Mojo implementation allegedly beat Rust is not that
| Mojo is faster. It's just a different program.
| zer00eyz wrote:
| >> Is it too lazy for me to write that article and not bother
| to install a VM with Mojo
|
| Yes.
|
| Would you talk about a book you didn't read? Or a movie you
| didn't see? Not on any meaningful level.
| jdiaz97 wrote:
| That's not a very good analogy; you can understand code
| without having to run it.
| mcqueenjordan wrote:
| Another point of clarification that is of great importance to the
| results, and is a common Rust newcomer error: the benchmarks for
| the Rust implementation (in the original post that got all the
| traction) were run with a /debug/ build of Rust, i.e. not an
| optimized binary compiled with --release.
|
| So it was comparing something that a) didn't do meaningful
| parsing against b) the full parsing Rust implementation in a
| non-optimized debug build.
| tehsauce wrote:
| How much does this particular result change when running in
| release mode?
| alpaca128 wrote:
| Depending on the code I've seen performance increases above
| 100x in some cases. While that's not exactly the norm,
| benchmarking Rust in debug mode is absolutely pointless even
| as a rough estimate.
| fwip wrote:
| On my machine, running the debug executable on the medium-
| size dataset takes ~14.5 seconds, and release mode takes ~0.8
| seconds.
| adgjlsfhk1 wrote:
| Do you know why debug mode for Rust is so slow? Is it
| compiling without any optimization by default? Is it that it
| checks for overflow?
| FridgeSeal wrote:
| The optimisation passes are expensive (not the largest
| source of compile time duration though).
|
| Debug mode is designed to build as-fast-as-possible while
| still being correct, so that you can run your binary
| (with debug symbols) ASAP.
|
| Overflow checks can also be enabled in release mode, and
| some write-ups seem to indicate they have less overhead
| than you'd think.
|
| Rust lets you configure your Cargo profiles to apply some
| optimisation passes even in debug, if you wish. There's
| also a config to have your dependencies optimised (even
| in debug) if you want. The Bevy tutorial walks through
| doing this, as a concrete example.
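|
| For example, a minimal sketch of that kind of Cargo.toml
| override (the opt-level values here are illustrative, not a
| recommendation):
|
|     # Cargo.toml
|     [profile.dev]
|     opt-level = 1              # lightly optimise your own code in debug
|
|     [profile.dev.package."*"]
|     opt-level = 3              # fully optimise all dependencies, even in debug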
| fwip wrote:
| Yes, optimization is disabled by default in debug mode,
| which makes your code more debuggable. Overflow checks
| are also present in debug mode, but removed in release
| mode. Bounds checking is present in release mode as well
| as debug mode, but can sometimes be optimized away.
|
| There's also some debug information that is present in
| the file in debug mode, which leads to a larger binary
| size, but shouldn't meaningfully affect performance
| except in very simple/short programs.
| SushiHippie wrote:
| Am I missing something? In the git repository [0] it says:
|
| > needletail_benchmark folder was compiled using the command
| cargo build --release and ran using the following command
| ./target/release/<binary> <path/to/file.fq>.
|
| Or are you talking about something else here?
|
| [0] https://github.com/MoSafi2/MojoFastTrim
| refulgentis wrote:
| I felt like I learned more about the author than Mojo.
|
| - Never actually runs it. Seriously.
|
| - Wants us to know it's definitely not a real parser as compared
| to Needletail...then 1000 words later, "real parser" means
| "handles \r\n...and validates 1st & 3rd lines begin with @ and
| +...seq and qual lines have the same length".
|
| - At the end, "Julia is faster!!!!" off a one-off run on their
| own machine, comparing it to benchmark times on the Mojo website
|
| It reads as an elaborate way to indicate they don't like that the
| Mojo website says it's faster, coupled to an entry-level
| explanation of why it is faster, coupled to disturbingly poor
| attempts to benchmark without running Mojo code.
| jakobnissen wrote:
| I feel like if you believe my conclusion was that "Julia is
| faster" then you are missing the point.
|
| The point is that the original blog's claim of "Mojo is faster"
| isn't right - it's comparing different programs. That
| implementation in Mojo is faster than Needletail - but that
| doesn't say very much, and I prove it by also beating Needletail
| in Julia using the same algorithm as Mojo does. So it's the
| algorithm. Not Mojo. Not Julia.
|
| Also, did you even read my discussion on how much a parser
| ought to validate? Your summary is completely missing the point.
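|
| (For concreteness, the validation being argued about is roughly
| this per four-line record - a throwaway Julia sketch of the
| checks, not the post's actual implementation, and the function
| name is made up:)
|
|     # Read one FASTQ record and apply the checks discussed above.
|     # readline strips the trailing "\n" or "\r\n" by default.
|     function read_record(io::IO)
|         header = readline(io)
|         isempty(header) && return nothing    # end of input
|         startswith(header, "@") || error("record must start with '@'")
|         seq  = readline(io)
|         plus = readline(io)
|         startswith(plus, "+") || error("third line must start with '+'")
|         qual = readline(io)
|         length(seq) == length(qual) ||
|             error("sequence and quality must have the same length")
|         return (header, seq, qual)
|     end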
| refulgentis wrote:
| Yeah, I got the joke, and understood the parser.
|
| It's just, the content length : content ratio is high - all I
| got out of it was you don't like the Mojo speed claim &
| genomics parsing is text parsing*
|
| Don't take that the wrong way, I feel bad. It's just _bad for
| me_ - I'm a mobile developer, so I was way out of my domain:
| I've barely written Python, and Julia is a complete abstraction
| to me outside of HN. An alternative way to think about it is,
| I shouldn't have expected an in-depth analysis of Mojo.
|
| * I mean, everything is bytes parsing, but it always tickles
| me when I find out other domains aren't castles in the sky,
| speaking an alien language
| jakobnissen wrote:
| Yeah, I get that. If you were expecting a review of Mojo,
| then the post falls short. Maybe the title should have
| emphasized that it's the benchmark being questioned, not Mojo
| itself.
| disgruntledphd2 wrote:
| I'm a data scientist, not a bioinformatician and I really
| enjoyed the post. I too am sceptical of Mojo though, so
| maybe it just played to my biases...
| cbkeller wrote:
| It looks like you very dramatically missed the point
| stellalo wrote:
| > If I include the time for Julia to start up and compile the
| script, my implementation takes 354 ms total, on the same level
| as Mojo's.
|
| I don't think the article mentions it explicitly, but I suppose
| the timing is from Julia 1.10: as far as I can remember, this
| kind of execution time would have been impossible in Julia 1.8,
| even for a simple script.
|
| Bravo, Julia devs. Bravo.
| adgjlsfhk1 wrote:
| For a script like this that doesn't have any dependencies,
| Julia 1.10 doesn't make a significant difference. That said,
| for real-world usability, Julia 1.10 is dramatically better
| than previous versions.
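|
| (If you want to see the compile-versus-run split yourself, the
| usual trick is to time the same call twice in one session; the
| first call pays the compilation cost, the second is the
| steady-state time. parse_fastq below is just a hypothetical
| stand-in, not the post's code:)
|
|     # Hypothetical stand-in for the script's parsing function.
|     parse_fastq(path) = count(l -> startswith(l, "@"), eachline(path))
|
|     @time parse_fastq("reads.fastq")   # first call: compilation + run
|     @time parse_fastq("reads.fastq")   # second call: run time only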
| fwip wrote:
| For what it's worth, I couldn't reproduce the benchmarks cited in
| the post, which claimed a 50% speedup over Rust on M1. The rust
| implementation was consistently about two to three times as fast
| as Mojo with the provided test scripts and datasets. It's
| possible I was compiling the Mojo program suboptimally, though.
|     hyperfine -N --warmup 5 test/test_fastq_record \
|       'needletail_test/target/release/rust_parser data/fastq_test.fastq'
|
|     Benchmark 1: test/test_fastq_record
|       Time (mean +- s):     1.936 s +-  0.086 s   [User: 0.171 s, System: 1.386 s]
|       Range (min ... max):  1.836 s ...  2.139 s   10 runs
|
|     Benchmark 2: needletail_test/target/release/rust_parser data/fastq_test.fastq
|       Time (mean +- s):     838.8 ms +-  4.4 ms   [User: 578.2 ms, System: 254.3 ms]
|       Range (min ... max):  833.7 ms ... 848.2 ms   10 runs
|
|     Summary
|       needletail_test/target/release/rust_parser data/fastq_test.fastq ran
|       2.31 +- 0.10 times faster than test/test_fastq_record
|
| (Edit: I built the Rust version with `cargo build --release` on
| Rust 1.74, and Mojo with `mojo build` on Mojo 0.7.0.)
| WeatherBrier wrote:
| The language is far from stable, but I have had a LOT of fun
| writing Mojo code. I was surprised by that! The only promising
| new languages for low-level numerical coding that can dislodge
| C/C++/Fortran somewhat, in my opinion, have been Julia/Rust. I
| feel like I can update that last list to be Julia/Rust/Mojo now.
|
| But, for my work, C++/Fortran reign supreme. I really wish Julia
| had easy AOT compilation and no GC - that would be perfect - but
| beggars can't be choosers. I am just glad that there are
| alternatives to C++/Fortran now.
|
| Rust has been great, but I have noticed something: there isn't
| much of a community of numerical/scientific/ML library writers in
| Rust. That's not a big problem, BUT the new libraries being
| written by the communities in Julia/C++ have made me question the
| free time I have spent writing Rust code for my domain. When it
| comes time to get serious about heterogeneous compute, you have
| to drop Rust and go back to C++/CUDA, and when you try to
| replicate some of the C++/CUDA infrastructure for your own needs
| in Rust, you really feel alone! I don't like that feeling ... of
| constantly being "one of the few" interested in
| scientific/numerical code in Rust community discussions ...
|
| Mojo seems to be betting heavily on a world where deep
| heterogeneous compute abilities are table stakes. It seems the
| language is really a frontend for MLIR, and that is very exciting
| to me, as someone who works at the intersection of systems
| programming and numerical programming.
|
| I don't feel like Mojo will cause any issues for Julia; I think
| that Mojo provides an alternative that complements Julia. After
| toiling away for years with C/C++/Fortran, I feel great about a
| future where I have the option of using Julia, Mojo, or Rust for
| my projects.
| adgjlsfhk1 wrote:
| > I really wish Julia had easy AOT compilation and no GC, that
| would be perfect
|
| I pretty strongly disagree with the no-GC part of this. A well
| written GC has the same throughput as (or higher than) reference
| counting for most applications, and the Rust approach is very
| cool, but it's a significant usability cliff for users who are
| domain first, CS second. A GC is a pretty good compromise for
| 99% of users since it is a minor performance cost for a fairly
| large usability gain.
| celrod wrote:
| Too bad Julia doesn't have this theoretical "well written
| GC". I do not like GCs, so I agree with OP's sentiment. Why
| solve such a hard problem when you don't have to?
|
| I don't find ownership models that difficult; those are things
| one should be thinking about anyway. I think this provides a
| good example of where stricter checking/an ownership model like
| Rust's makes things easier than in languages that do not have
| one (in this case, C++): https://blog.dureuill.net/articles/too-
| dangerous-cpp/
| jdiaz97 wrote:
| Great post. I think claims like Mojo's speedup over Rust, or the
| 65000x speedup over Python, are a problem. How can we
| differentiate between good new tech and Silicon Valley
| shenanigans when they use claims like that? They do nice titles
| and slogans but are shady in substance.
| ubj wrote:
| Great post, but I think the author missed a few advantages of
| Mojo:
|
| * Mojo provides first-class support for AoT compilation of
| standalone binaries [1]. Julia provides second-class support at
| best.
|
| * Mojo aims to provide first-class support for traits and a
| modern Rust-like memory ownership model. Julia has second-class
| support for traits ("Tim Holy trait trick") and uses a garbage
| collector.
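|
| (For anyone unfamiliar, the "Tim Holy trait trick" is roughly
| this pattern - a minimal sketch with made-up names, showing how
| traits get emulated via dispatch rather than being a checked
| language feature:)
|
|     # Singleton "trait" types, plus a function mapping data types to them.
|     struct FastPath end
|     struct SlowPath end
|     pathstyle(::Type) = SlowPath()                    # default
|     pathstyle(::Type{<:AbstractVector}) = FastPath()  # opt-in
|
|     # Dispatch on the trait value rather than on the data type itself.
|     process(x) = process(pathstyle(typeof(x)), x)
|     process(::FastPath, x) = sum(x)                   # specialised path
|     process(::SlowPath, x) = sum(collect(x))          # generic fallback
|
| It works, but it's a convention built on multiple dispatch
| rather than a first-class trait system, which is what makes it
| "second-class".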
|
| To be clear, I really like Julia and have been gravitating back
| to it over time. Julia has a very talented community and a
| massive head start on its package ecosystem. There are plenty of
| other strengths I could list as well.
|
| But I'm still keeping my eye on Mojo. There's nothing wrong with
| having two powerful languages learning from each other's
| innovations.
|
| [1]: https://docs.modular.com/mojo/manual/get-started/hello-
| world...
| WeatherBrier wrote:
| I feel the same way, I love using Julia, but the features that
| Mojo provides are exciting. It's great that we have both of
| them.
| jdiaz97 wrote:
| True, but the title of the blog is about Bioinformatics, and
| like another comment said:
|
| > Bioinformatics is like 0.1% dealing with FASTQ files and the
| rest is using the ecosystem of libraries for statistics and
| plotting. Many of them in R
|
| Considering that, do you need AOT, memory ownership for doing
| plotting and statistics? I'd argue not, and that's why R and
| Python are so popular in Bio.
| beanjuiceII wrote:
| Doesn't it make more sense, then, to have a Python-like language
| for speed, and Python for all that other stuff? So you learn
| one-ish language and get it all?
| math_dandy wrote:
| I'm really excited about Mojo's potential. But I don't think it's
| ready for real use outside its AI niche yet. Being able to call
| Mojo functions from Python is the sentinel capability I'm waiting
| for before considering its use for general-purpose code.
___________________________________________________________________
(page generated 2024-02-11 23:00 UTC)