[HN Gopher] A look at the Mojo language for bioinformatics
___________________________________________________________________
A look at the Mojo language for bioinformatics
Author : blindseer
Score : 82 points
Date : 2024-02-10 12:50 UTC (1 day ago)
(HTM) web link (viralinstruction.com)
(TXT) w3m dump (viralinstruction.com)
| jimbob45 wrote:
| Crystal was never able to find traction as a Ruby clone that
| could compete with C speeds. Why would a Python clone have any
| better luck? I don't think anyone would accuse Python of being
| dramatically more usable than Ruby.
| Alifatisk wrote:
| I think the appeal of Crystal is for users who already know
| Ruby, so the market was already limited there.
|
| Crystal itself is a gem, but comparing it to Mojo and its
| relation to Python is fair yet gives the wrong message: Python
| is by far more popular because of all the packages, so the
| market is way larger there.
| breather wrote:
| Crystal didn't have much use in Ruby's sweet spot - being a DSL
| for some immensely complicated-to-configure framework (e.g.
| Rails, Chef).
| akkad33 wrote:
| Crystal is an entirely different language with a similar
| syntax. Valid Python is valid Mojo
| frou_dh wrote:
| Apparently that is the goal, but not the reality:
|
| > Mojo is still early and not yet a Python superset, so only
| simple programs can be brought over as-is with no code
| changes. We will continue investing in this and build
| migration tools as the language matures.
|
| https://docs.modular.com/mojo/faq.html#how-do-i-convert-
| pyth...
| coldtea wrote:
| Well, for the domains Mojo targets, Python is king, so a
| faster Python-like language would have a larger potential
| audience. A fast Ruby-like language, not so much, as Ruby was
| never that special in those domains, or in most places outside
| web development, and even there it kind of lost steam over the
| past 10 years.
|
| Besides, people opting for closer-to-C speed had Rust, Go, Java,
| Swift, and other options to go to, all with more momentum and
| support, before going for an as-yet-unproven Ruby clone.
| pjmlp wrote:
| I used to be quite sceptical given how Swift for Tensorflow
| went; however, since NVidia decided to partner with Modular,
| alongside their ongoing CUDA JIT bindings for Python, I think
| Mojo might actually work out.
| coldtea wrote:
| "Swift for Tensorflow" never had any real backing apart
| from the announcement though.
| pjmlp wrote:
| Apparently it had Google's money backing, for what it is
| worth.
|
| I never believed in it, because Swift is as relevant as
| Objective-C outside NeXT/Apple's platforms, and not the
| kind of programming language that the research community
| cares about.
| coldtea wrote:
| > _Apparently it had Google's money backing, for what it is
| worth_
|
| You mean they paid to have it created, like they pay for
| thousands of other things.
|
| But it was never really pushed, the way they push things
| they want to promote.
| pests wrote:
| Chris Lattner has made a few comments here about Mojo the
| last few months.
|
| https://news.ycombinator.com/threads?id=chrislattner
|
| Here's his comment on swift for tensorflow:
|
| https://news.ycombinator.com/item?id=37330031
| pjmlp wrote:
| Because of the people and companies behind the project.
| jdiaz97 wrote:
| I think it's less about the language and more about Modular's
| product, their MAX supercomputer thingy.
| f6v wrote:
| As someone who practices bioinformatics, it doesn't seem
| appealing. Bioinformatics is like 0.1% dealing with FASTQ files
| and the rest is using the ecosystem of libraries for statistics
| and plotting. Many of them in R, by the way.
| __MatrixMan__ wrote:
| As someone who is considering a switch from generic software
| engineering towards bioinformatics, what would you say the pain
| points are?
|
| If this is not the way to remove workflow friction, what is?
| life-and-quiet wrote:
| Would like to second this question. I'm very interested in
| getting into this world, but it feels like there isn't a
| clear path (especially for someone self-taught like me).
| Bioinformatics feels pretty inaccessible without a computer
| science or biology degree, even with substantial R and Python
| experience.
| fwip wrote:
| There's a few camps in bioinformatics, from what I've seen.
|
| 1) The fellows writing papers - usually these guys have
| PhDs, usually a science-focused PhD.
|
| 2) Analysts - often have a background in mathematics,
| biology, or big data. Success here can lead to an onramp
| to camp 1. Much of your time here is spent in interactive
| programming environments, like Jupyter notebooks.
|
| 3) Programmers - writing novel or faster bioinformatic
| tools, often in low-level languages like C++ or Rust.
| Sometimes you can get a paper out of these, especially if
| you have a CS background. There's increasingly room for
| higher-level tools here too, though, so it starts to
| overlap with 2.
|
| 4) Pipeline programmers - people gluing analysis workflows
| together out of the tools written in low-level languages,
| often with a liberal helping of Unix command-fu. Often
| sort of an ad-hoc role, containing people from diverse
| backgrounds, from biology to sysadmin. (This is my current
| role.)
|
| 5) Biology/wetlab - people running experiments in the lab
| who want to analyze their own work, especially for QC
| purposes. Wild-west ad-hoc development practices.
| tstactplsignore wrote:
| To disagree: I'm a computational biologist, and it's my firm
| belief that 99% of the scientifically important stuff happens
| before the stats and plotting. That's not to say I dismiss those
| things and haven't done my fair share of stats, but just that
| the difference between real results and incorrect results _most
| often_ happens before that step.
|
| I'm a microbiologist, though; for stuff like human RNA-Seq I
| understand that it's often plug-and-play to get a gene counts
| table at this point.
| bfrankline wrote:
| Sure, but I think that, for example, representation learning
| doesn't involve manipulating an array of strings.
| folli wrote:
| I guess that depends on your exact ecological niche within
| bioinformatics.
|
| I got my start at an NGS facility, where handling FASTQ was
| closer to 80% of my time, so any speedups would have been
| greatly appreciated.
| hkmaxpro wrote:
| Related:
|
| https://news.ycombinator.com/item?id=39290958
|
| https://news.ycombinator.com/item?id=39296559
| zer00eyz wrote:
| >>> As a bioinformatician who is obsessed with high-performance,
| high-level programming, that's right in my wheelhouse!... Mojo
| currently only runs on Ubuntu and MacOS, and I run neither. So, I
| can't run any Mojo code
|
| 1. Back to the Rust vs Mojo article that kicked this off... this
| isn't someone who is going to use Rust.
|
| 2. Availability, portability, ease of use... These are the
| reasons Python is winning.
|
| 3. I am baffled that this person has to write code as part of
| their job and does not know what a VM is! Note: this isn't a
| slight against the author; I doubt they are an isolated case. I
| think this is my own cognitive dissonance showing.
| refulgentis wrote:
| Got the same general impression, TL;DR: wrote a benchmark
| article without...running it? Then you conclude with "the
| language I use is faster!!!" based on a one-off run on your
| machine, which surely isn't the same machine Mojo used to run
| benchmarks for their website copy?
|
| It's odd to read something that's pretty well-versed in some
| relatively complex CS concepts, i.e. it's not just a PhD with a
| blank text editor, but that simultaneously makes egregiously
| obvious mistakes I wouldn't expect any college graduate to
| roll with.
|
| There's a certain type, and I don't know what name to give it,
| especially because I certainly don't want to give it a
| condescending name. I call it "data scientist types" when I'm
| in person with someone who I trust to give me some verbal rope.
|
| Software really feels like it ate everything and everyone. So
| you end up with insanely bright people who do software
| engineering as part of their job, but miss some pieces you
| expect from trad software engineering.
| jakobnissen wrote:
| Author here. I do know about VMs. Is it too lazy for me to
| write that article and not bother to install a VM with Mojo
| (and Rust and Julia, to benchmark in the same environment)?
| Maybe. If this was for my work I certainly would have felt
| compelled to.
|
| On the other hand, the fact that Mojo doesn't run on Windows
| and most Linux distros is a point in itself. And would the blog
| post really be substantially improved if I had gotten the
| number of milliseconds right for the Mojo implementation on my
| computer? Of course not. It should be clear that the
| implementations are incomparable, and that a similar Julia
| implementation is very fast, which implies that the reason the
| original Mojo implementation allegedly beat Rust is not that
| Mojo is faster. It's just a different program.
| zer00eyz wrote:
| >> Is it too lazy for me to write that article and not bother
| to install a VM with Mojo
|
| Yes.
|
| Would you talk about a book you didn't read? Or a movie you
| didn't see? Not on any meaningful level.
| jdiaz97 wrote:
| That's not a very good analogy; you can understand code
| without having to run it.
| mcqueenjordan wrote:
| Another point of clarification that is of great importance to the
| results, and is a common Rust newcomer error: the benchmarks for
| the Rust implementation (in the original post that got all the
| traction) were run with a /debug/ build of Rust, i.e. not an
| optimized binary compiled with --release.
|
| So it was comparing something that a) didn't do meaningful
| parsing against b) the full parsing Rust implementation in a
| non-optimized debug build.
| tehsauce wrote:
| How much does this particular result change when running in
| release mode?
| alpaca128 wrote:
| Depending on the code I've seen performance increases above
| 100x in some cases. While that's not exactly the norm,
| benchmarking Rust in debug mode is absolutely pointless even
| as a rough estimate.
| fwip wrote:
| On my machine, running the debug executable on the medium-
| size dataset takes ~14.5 seconds, and release mode takes ~0.8
| seconds.
| adgjlsfhk1 wrote:
| Do you know why debug mode for Rust is so slow? Is it
| compiling without any optimization by default? Is it that it
| checks for overflow?
| FridgeSeal wrote:
| The optimisation passes are expensive (not the largest
| source of compile time duration though).
|
| Debug mode is designed to build as-fast-as-possible while
| still being correct, so that you can run your binary
| (with debug symbols) ASAP.
|
| Overflow checks can also be enabled in release mode, and
| some write-ups seem to indicate they have less overhead
| than you'd think.
|
| Rust lets you configure your Cargo profiles to apply some
| optimisation passes even in debug, if you wish. There's
| also a config to have your dependencies optimised (even
| in debug) if you want. The Bevy tutorial walks through
| doing this, as a concrete example.
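|
| For example, a minimal sketch of that kind of Cargo.toml
| override (the opt-level values here are illustrative, not a
| recommendation):
|
|     # Cargo.toml
|     [profile.dev]
|     opt-level = 1              # lightly optimise your own code in debug
|
|     [profile.dev.package."*"]
|     opt-level = 3              # fully optimise all dependencies, even in debug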
| fwip wrote:
| Yes, optimization is disabled by default in debug mode,
| which makes your code more debuggable. Overflow checks
| are also present in debug mode, but removed in release
| mode. Bounds checking is present in release mode as well
| as debug mode, but can sometimes be optimized away.
|
| There's also some debug information that is present in
| the file in debug mode, which leads to a larger binary
| size, but shouldn't meaningfully affect performance
| except in very simple/short programs.
| SushiHippie wrote:
| Am I missing something? In the git repository [0] it says:
|
| > needletail_benchmark folder was compiled using the command
| cargo build --release and ran using the following command
| ./target/release/<binary> <path/to/file.fq>.
|
| Or are you talking about something else here?
|
| [0] https://github.com/MoSafi2/MojoFastTrim
| refulgentis wrote:
| I felt like I learned more about the author than Mojo.
|
| - Never actually runs it. Seriously.
|
| - Wants us to know it's definitely not a real parser as compared
| to Needletail...then 1000 words later, "real parser" means
| "handles \r\n...and validates 1st & 3rd lines begin with @ and
| +...seq and qual lines have the same length".
|
| - At the end, "Julia is faster!!!!" off a one-off run on their
| own machine, comparing it to benchmark times on the Mojo website
|
| It reads as an elaborate way to indicate they don't like that the
| Mojo website says it's faster, coupled to an entry-level
| explanation of why it is faster, coupled to disturbingly poor
| attempts to benchmark without running Mojo code.
| jakobnissen wrote:
| I feel like if you believe my conclusion was that "Julia is
| faster" then you are missing the point.
|
| The point is that the original blog's claim of "Mojo is faster"
| isn't right - it's comparing different programs. That
| implementation in Mojo is faster than Needletail - but that
| doesn't say very much, and I prove it by also beating Needletail
| in Julia using the same algorithm as Mojo does. So it's the
| algorithm. Not Mojo. Not Julia.
|
| Also, did you even read my discussion on how much a parser
| ought to validate? Your summary is completely missing the point.
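|
| (For concreteness, the validation being argued about is roughly
| this per four-line record - a throwaway Julia sketch of the
| checks, not the post's actual implementation, and the function
| name is made up:)
|
|     # Read one FASTQ record and apply the checks discussed above.
|     # readline strips the trailing "\n" or "\r\n" by default.
|     function read_record(io::IO)
|         header = readline(io)
|         isempty(header) && return nothing    # end of input
|         startswith(header, "@") || error("record must start with '@'")
|         seq  = readline(io)
|         plus = readline(io)
|         startswith(plus, "+") || error("third line must start with '+'")
|         qual = readline(io)
|         length(seq) == length(qual) ||
|             error("sequence and quality must have the same length")
|         return (header, seq, qual)
|     end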
| refulgentis wrote:
| Yeah, I got the joke, and understood the parser.
|
| It's just, the content length : content ratio is high - all I
| got out of it was you don't like the Mojo speed claim &
| genomics parsing is text parsing*
|
| Don't take that the wrong way, I feel bad. It's just _bad for
| me_ - I'm a mobile developer, so I was way out of my domain:
| I've barely written Python, and Julia is a complete abstraction
| to me outside of HN. An alternative way to think about it is,
| I shouldn't have expected an in-depth analysis of Mojo.
|
| * I mean, everything is bytes parsing, but it always tickles
| me when I find out other domains aren't castles in the sky,
| speaking an alien language
| jakobnissen wrote:
| Yeah, I get that. If you were expecting a review of Mojo,
| then the post falls short. Maybe the title should have
| emphasized that it's the benchmark being questioned, not Mojo
| itself.
| disgruntledphd2 wrote:
| I'm a data scientist, not a bioinformatician and I really
| enjoyed the post. I too am sceptical of Mojo though, so
| maybe it just played to my biases...
| cbkeller wrote:
| It looks like you very dramatically missed the point
| stellalo wrote:
| > If I include the time for Julia to start up and compile the
| script, my implementation takes 354 ms total, on the same level
| as Mojo's.
|
| I don't think the article mentions it explicitly, but I suppose
| the timing is from Julia 1.10: as far as I can remember, this
| kind of execution time would have been impossible in Julia 1.8,
| even for a simple script.
|
| Bravo, Julia devs. Bravo.
| adgjlsfhk1 wrote:
| For a script like this that doesn't have any dependencies,
| Julia 1.10 doesn't make a significant difference. That said,
| for real-world usability, Julia 1.10 is dramatically better
| than previous versions.
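|
| (If you want to see the compile-versus-run split yourself, the
| usual trick is to time the same call twice in one session; the
| first call pays the compilation cost, the second is the
| steady-state time. parse_fastq below is just a hypothetical
| stand-in, not the post's code:)
|
|     # Hypothetical stand-in for the script's parsing function.
|     parse_fastq(path) = count(l -> startswith(l, "@"), eachline(path))
|
|     @time parse_fastq("reads.fastq")   # first call: compilation + run
|     @time parse_fastq("reads.fastq")   # second call: run time only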
| fwip wrote:
| For what it's worth, I couldn't reproduce the benchmarks cited in
| the post, which claimed a 50% speedup over Rust on M1. The rust
| implementation was consistently about two to three times as fast
| as Mojo with the provided test scripts and datasets. It's
| possible I was compiling the Mojo program suboptimally, though.
|     hyperfine -N --warmup 5 test/test_fastq_record \
|       'needletail_test/target/release/rust_parser data/fastq_test.fastq'
|
|     Benchmark 1: test/test_fastq_record
|       Time (mean +- s):     1.936 s +-  0.086 s   [User: 0.171 s, System: 1.386 s]
|       Range (min ... max):  1.836 s ...  2.139 s   10 runs
|
|     Benchmark 2: needletail_test/target/release/rust_parser data/fastq_test.fastq
|       Time (mean +- s):     838.8 ms +-  4.4 ms   [User: 578.2 ms, System: 254.3 ms]
|       Range (min ... max):  833.7 ms ... 848.2 ms   10 runs
|
|     Summary
|       needletail_test/target/release/rust_parser data/fastq_test.fastq ran
|       2.31 +- 0.10 times faster than test/test_fastq_record
|
| (Edit: I built the Rust version with `cargo build --release` on
| Rust 1.74, and Mojo with `mojo build` on Mojo 0.7.0.)
| WeatherBrier wrote:
| The language is far from stable, but I have had a LOT of fun
| writing Mojo code. I was surprised by that! The only promising
| new languages for low-level numerical coding that can dislodge
| C/C++/Fortran somewhat, in my opinion, have been Julia/Rust. I
| feel like I can update that last list to be Julia/Rust/Mojo now.
|
| But, for my work, C++/Fortran reign supreme. I really wish Julia
| had easy AOT compilation and no GC - that would be perfect - but
| beggars can't be choosers. I am just glad that there are
| alternatives to C++/Fortran now.
|
| Rust has been great, but I have noticed something: there isn't
| much of a community of numerical/scientific/ML library writers in
| Rust. That's not a big problem, BUT the new libraries being
| written by the communities in Julia/C++ have made me question the
| free time I have spent writing Rust code for my domain. When it
| comes time to get serious about heterogeneous compute, you have
| to drop Rust and go back to C++/CUDA, and when you try to
| replicate some of the C++/CUDA infrastructure for your own needs
| in Rust, you really feel alone! I don't like that feeling ... of
| constantly being "one of the few" interested in
| scientific/numerical code in Rust community discussions ...
|
| Mojo seems to be betting heavily on a world where deep
| heterogeneous compute abilities are table stakes. It seems the
| language is really a frontend for MLIR, and that is very exciting
| to me, as someone who works at the intersection of systems
| programming and numerical programming.
|
| I don't feel like Mojo will cause any issues for Julia; I think
| that Mojo provides an alternative that complements Julia. After
| toiling away for years with C/C++/Fortran, I feel great about a
| future where I have the option of using Julia, Mojo, or Rust for
| my projects.
| adgjlsfhk1 wrote:
| > I really wish Julia had easy AOT compilation and no GC, that
| would be perfect
|
| I pretty strongly disagree with the no-GC part of this. A well
| written GC has the same throughput as (or higher than) reference
| counting for most applications, and the Rust approach is very
| cool, but it's a significant usability cliff for users who are
| domain first, CS second. A GC is a pretty good compromise for
| 99% of users since it is a minor performance cost for a fairly
| large usability gain.
| celrod wrote:
| Too bad Julia doesn't have this theoretical "well written
| GC". I do not like GCs, so I agree with OP's sentiment. Why
| solve such a hard problem when you don't have to?
|
| I don't find ownership models that difficult; those are things
| one should be thinking about anyway. I think this provides a
| good example of where stricter checking/an ownership model like
| Rust's makes things easier than in languages that do not have
| one (in this case, C++): https://blog.dureuill.net/articles/too-
| dangerous-cpp/
| jdiaz97 wrote:
| Great post. I think claims like Mojo's speedup over Rust, or the
| 65000x speedup over Python, are a problem. How can we
| differentiate between good new tech and Silicon Valley
| shenanigans when they use claims like that? They do nice titles
| and slogans but are shady in substance.
| ubj wrote:
| Great post, but I think the author missed a few advantages of
| Mojo:
|
| * Mojo provides first-class support for AoT compilation of
| standalone binaries [1]. Julia provides second-class support at
| best.
|
| * Mojo aims to provide first-class support for traits and a
| modern Rust-like memory ownership model. Julia has second-class
| support for traits ("Tim Holy trait trick") and uses a garbage
| collector.
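|
| (For anyone unfamiliar, the "Tim Holy trait trick" is roughly
| this pattern - a minimal sketch with made-up names, showing how
| traits get emulated via dispatch rather than being a checked
| language feature:)
|
|     # Singleton "trait" types, plus a function mapping data types to them.
|     struct FastPath end
|     struct SlowPath end
|     pathstyle(::Type) = SlowPath()                    # default
|     pathstyle(::Type{<:AbstractVector}) = FastPath()  # opt-in
|
|     # Dispatch on the trait value rather than on the data type itself.
|     process(x) = process(pathstyle(typeof(x)), x)
|     process(::FastPath, x) = sum(x)                   # specialised path
|     process(::SlowPath, x) = sum(collect(x))          # generic fallback
|
| It works, but it's a convention built on multiple dispatch
| rather than a first-class trait system, which is what makes it
| "second-class".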
|
| To be clear, I really like Julia and have been gravitating back
| to it over time. Julia has a very talented community and a
| massive head start on its package ecosystem. There are plenty of
| other strengths I could list as well.
|
| But I'm still keeping my eye on Mojo. There's nothing wrong with
| having two powerful languages learning from each other's
| innovations.
|
| [1]: https://docs.modular.com/mojo/manual/get-started/hello-
| world...
| WeatherBrier wrote:
| I feel the same way, I love using Julia, but the features that
| Mojo provides are exciting. It's great that we have both of
| them.
| jdiaz97 wrote:
| True, but the title of the blog is about Bioinformatics, and
| like another comment said:
|
| > Bioinformatics is like 0.1% dealing with FASTQ files and the
| rest is using the ecosystem of libraries for statistics and
| plotting. Many of them in R
|
| Considering that, do you need AOT, memory ownership for doing
| plotting and statistics? I'd argue not, and that's why R and
| Python are so popular in Bio.
| beanjuiceII wrote:
| Doesn't it make more sense, then, to have a Python-like language
| for speed, and Python for all that other stuff? So you learn
| one-ish language and get it all?
| math_dandy wrote:
| I'm really excited about Mojo's potential. But I don't think it's
| ready for real use outside its AI niche yet. Being able to call
| Mojo functions from Python is the sentinel capability I'm waiting
| for before considering its use for general-purpose code.
___________________________________________________________________
(page generated 2024-02-11 23:00 UTC)