[HN Gopher] Correctness and composability bugs in the Julia ecos...
___________________________________________________________________
Correctness and composability bugs in the Julia ecosystem
Author : benjojo12
Score : 582 points
Date : 2022-05-16 13:28 UTC (9 hours ago)
(HTM) web link (yuri.is)
(TXT) w3m dump (yuri.is)
| j7ake wrote:
| In terms of saving human time I have found R to be fastest (in
| human time) for iterative prototyping, exploring, and visualising
| data
|
| R still has the best statistical package ecosystem, although
| python is catching up.
| one-more-minute wrote:
| It might be useful to separate the issues that are "just" bugs
| from the problems that come with Julia's unusual level of
| composability. I have no idea if Julia has more bog-standard,
| local bugs - things like data structure problems or compiler
| faults - than other languages of comparable maturity and
| resources, but clearly the OP has bumped into several, which is
| frustrating.
|
| The composition bugs - as in offsetarrays or AD - are a bit of a
| special case. In most languages package A will only work with
| package B if it's specifically designed to, and the combination
| will be explicitly developed and tested. That A and B _can_ work
| together by default in Julia is really cool, but it also means
| that as you add new types and packages, you have a quadratically
| growing set of untested edges.
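As a rough illustration of the "quadratically growing" claim above, here is a minimal Python sketch that counts unordered pairs of packages. It assumes every pair is a potential integration edge, which is a simplification; `untested_pairs` is a hypothetical helper name.

```python
from math import comb


def untested_pairs(n_packages: int) -> int:
    """Number of distinct unordered package pairs, i.e. n * (n - 1) / 2."""
    return comb(n_packages, 2)


for n in (10, 100, 1000):
    # 10 -> 45, 100 -> 4950, 1000 -> 499500
    print(n, untested_pairs(n))
```

A 10x increase in packages gives roughly a 100x increase in pairs, which is why explicit pairwise testing does not scale.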
|
| The canonical solution is strict interfaces. But Julia is laissez
| faire about those too (with some good reasons). Together this
| means that if A doesn't work with B as expected, it's not always
| easy even to assign fault, and both might be reluctant to
| effectively special-case the other. Program transformations
| (autodiff) compound this problem, because the default is that you
| promise to support the universe, and it's not easy to opt out of
| the weird cases.
|
| I think it's absolutely right to celebrate Julia's approach to
| composition. I also hope new research (in Julia or elsewhere)
| will help us figure out how to tame it a bit.
| Sebb767 wrote:
| > That A and B can work together by default in Julia is really
| cool, but it also means that as you add new types and packages,
| you have a quadratically growing set of untested edges.
|
| But as the author's example showed, they clearly can't work
| together - they just fail at runtime instead of at compile
| time.
|
| Other languages have generics and interfaces to make stuff like
| this dynamically exchangeable. Sure, your code needs to be
| designed to support this, but it also means that the author
| explicitly thought about what they expect from their data
| structures. If they don't, you might suddenly find yourself
| violating implicit assumptions like arrays starting at 1.
| SemanticStrengh wrote:
| Is there a tutorial or blog post on what makes Julia's composability
| special compared with other languages? Is there a relation with
| multiple dispatch or delegation?
| rashidrafeek wrote:
| Yes. It's a side effect of multiple dispatch being the core
| paradigm of the language. See Stefan Karpinski's talk about
| it: https://www.youtube.com/watch?v=kc9HwsxE1OY
| FabHK wrote:
| The title of Stefan's talk is great: _The Unreasonable
| Effectiveness of Multiple Dispatch_. He gives a nice
| example of composability: how you can throw a new type into
| an existing algorithm and it just works.
| jpeloquin wrote:
| The "Unreasonable Effectiveness of Multiple Dispatch" talk is
| a good example of how multiple dispatch is special in a good
| way, in that everything (should) work together as new types
| and functions are added to the ecosystem. However, this also
| means the scope of potential integration bugs encompasses the
| entire ecosystem. The Julia manual has a small section about
| special composability pitfalls arising from multiple
| dispatch:
| https://docs.julialang.org/en/v1/manual/methods/#man-
| method-...
|
| As best as I can summarize it: Multiple dispatch is supposed
| to dispatch a function call to the implementation with the
| most "specific" call signature. This means that you must
| design your functions with an eye to what everyone else has
| implemented or might implement so whatever function gets
| called does the "right" thing, and also that your
| implementation doesn't block someone else from writing their
| own implementation specialized to other types. This requires
| some coordination across packages, as shown in one of the
| manual's examples.
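The "most specific signature" rule and the coordination problem described above can be sketched with a toy dispatcher. This is a Python analogue under stated assumptions, not Julia's actual algorithm: `Generic`, `AmbiguousCall`, `Num`, `Int`, and `combine` are all hypothetical names, and specificity is reduced to position-by-position subclass checks.

```python
class AmbiguousCall(TypeError):
    """Raised when no single applicable signature is the most specific."""


class Generic:
    """Toy multiple-dispatch function (hypothetical helper, not a real library)."""

    def __init__(self, name):
        self.name = name
        self.methods = {}  # maps a tuple of argument types to an implementation

    def register(self, *types):
        def decorator(fn):
            self.methods[types] = fn
            return fn
        return decorator

    def __call__(self, *args):
        # Keep every signature whose types accept all the arguments.
        applicable = [
            sig for sig in self.methods
            if len(sig) == len(args)
            and all(isinstance(a, t) for a, t in zip(args, sig))
        ]
        if not applicable:
            raise TypeError(f"no method of {self.name} matches")
        # A winner must be at least as specific as every candidate,
        # position by position.
        winners = [
            s for s in applicable
            if all(all(issubclass(x, y) for x, y in zip(s, t)) for t in applicable)
        ]
        if not winners:
            raise AmbiguousCall(f"{self.name}: candidates {applicable} are ambiguous")
        return self.methods[winners[0]](*args)


class Num: ...
class Int(Num): ...


combine = Generic("combine")


@combine.register(Num, Num)
def _generic(a, b):
    return "fallback"


@combine.register(Int, Num)  # imagine this lives in package A
def _from_package_a(a, b):
    return "package A"


@combine.register(Num, Int)  # imagine this lives in package B
def _from_package_b(a, b):
    return "package B"


print(combine(Num(), Num()))  # fallback
print(combine(Int(), Num()))  # package A
try:
    combine(Int(), Int())
except AmbiguousCall as err:
    print("ambiguous:", err)
```

Neither package is wrong on its own, yet together they leave `(Int, Int)` without a unique best match. In this sketch, registering an explicit `(Int, Int)` method resolves it, which is the kind of cross-package coordination the comment refers to.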
|
| The rules defining type specificity (subtyping) are
| complicated, and I think not in the manual. They have been
| inferred by observation:
| http://janvitek.org/pubs/oopsla18a.pdf. To quote from that
| paper, "In many systems answering the question whether t1 <:
| t2 is an easy part of the development. It was certainly not
| our expectation, approaching Julia, that reverse engineering
| and formalizing the subtype relation would prove to be the
| challenge on which we would spend our time and energy. As we
| kept uncovering layers of complexity, the question whether
| all of this was warranted kept us looking for ways to
| simplify the subtype relation. We did not find any major
| feature that could be dropped." Julia's multiple dispatch
| allows a high degree of composability, but this does create
| new complexity and new problems.
| chalst wrote:
| Julia has a very nice type system, the nicest of any
| dynamically typed language I am familiar with. This is
| something to do with multiple dispatch, but it's more to do
| with trying to have a type system that allows the JIT to
| unbox all the things that have to be unboxed for high
| performance without sacrificing the freedom of dynamic
| typing.
|
| IIUC, Common Lisp is the giant on whose shoulders Julia built
| in this respect.
| NeutralForest wrote:
| I mean these look like good potential targets to improve the
| language moving forward, it's healthy to not be in awe of your
| tools and push to make them better. I don't see this as "bad"
| honestly.
| s_Hogg wrote:
| It seems like the point of the article is that that push is
| insubstantial, if it even exists. Given the language has been
| around this long it's a bit worrying that stuff like that is
| the potential target for moving a language forward.
|
| Julia has always had a reputation in my mind at least of being
| "by academics, for academics" and there's unfortunately a dark
| side to that in terms of reliability and maintainability. The
| concept and goals are great, which is annoying. If this
| language had stayed focussed on the basics, it would be
| extremely handy for someone like me who trains and deploys
| models in an edge computing environment. No way I'm doing that
| with stuff like this going on.
| NeutralForest wrote:
| I suppose we'll see? Honestly this is maybe an opportunity to
| adjust some goals of the language if this is the feeling
| people are having now. Outreach to purely CS and SE people
| will probably be needed, but seeing the presence it has at
| MIT, I don't see it being a problem.
| fluidcruft wrote:
| For what it's worth many people feel similarly about R. R is
| great for people actively working in statistics research (I
| assume because that's what I'm always told). But for a lot of
| us who just want to do some analysis, it's constantly
| breaking and we've learned to default to just starting from
| scratch when we need to revisit something we did a few years
| ago. Or we figure out how to buy a commercial system.
| tylermw wrote:
| R is not constantly breaking. R Core does a remarkable job
| ensuring backwards compatibility. There are only a few
| prominent examples of significant "breaking" behavior
| across decades of the language existing, and those can
| often be reverted by setting an option (e.g.
| `options(stringsAsFactors = TRUE)`). But backwards
| compatibility is the primary concern with any update to the
| R language or the packages maintained by R Core.
|
| Now, if you're thinking about changes introduced by a
| specific user-contributed package breaking your analysis,
| that can indeed be a problem. But that can't be blamed on
| the R language. And the main user-contributed R statistics
| packages that have been around for decades (such as lme4 or
| survival) are mature and stable.
| [deleted]
| CoastalCoder wrote:
| I think the real test will be whether or not Julia's custodians
| / developers start putting a greater focus on semantics and
| correctness.
|
| When a language's raison d'etre is to try out certain ideas, it
| probably makes sense _for a while_ to ignore corner cases and
| rigor. But as the author points out, they eventually become
| gating factors for wider adoption.
| NeutralForest wrote:
| It's still at version 1.x, maybe an explicit roadmap could
| help tackling those issues?
| markkitti wrote:
| The question here is are these merely just bugs or is there
| something about the language that makes Julia error prone?
|
| There is potential in using Julia's type inference engine to
| check for correctness. For example see JET.jl. "JET.jl
| employs Julia's type inference to detect potential bugs."
|
| https://github.com/aviatesk/JET.jl
| https://www.youtube.com/watch?v=7eOiGc8wfE0
|
| The video brings up some potential difficulties with Julia's
| metaprogramming facilities for static or lexical analysis,
| but also shows that these issues are also addressable.
|
| The type inference system could be exploited for further
| effect. For example, the type system could be extended to
| check for shape information within the type as demonstrated
| in this prototype:
| https://twitter.com/KenoFischer/status/1407810981338796035
|
| Julia has guard rails (e.g. default bounds checking), but
| also provides facilities to work outside them
| (`@inbounds`, `unsafe_*` methods, `ccall`, in-place methods
| with a `!` suffix). Typically these provide features that
| trade safety for performance or access to features. Used
| judiciously one can achieve a balance between performance and
| safety. Julia is not a language that restricts its users to a
| sandbox in the name of safety, but it does provide bounds of
| where the sandbox is and is not.
|
| Another take away from the original blog post is that much
| Julia development is happening in the open on Github. These
| issues and their fixes just require a Github account to
| contribute to. Is this a feature?
| freemint wrote:
| JET.jl is far from a solution. Sooner or later,
| JuliaComputing (or someone else) will have to pay people
| full time to develop such tools if it wants to see larger
| adoption. Nobody expects Julia to be a systems language (a
| language you write an OS in). The later those tools come,
| the more code will need to be fixed up.
| rpmuller wrote:
| I've been a part of many language communities, and the Julia
| team is the very best in terms of the professionalism of the
| language and the key modules.
|
| Maybe the best response to this is to view it as a call to action
| for us Julia fanboys/girls to stop cheering and fix some bugs
| ;-).
| CJefferson wrote:
| I've had a couple of conversations on twitter with Viral B Shah
| (co creator of Julia) which I found unprofessional, so I
| stopped learning Julia. Unless he was just having a very bad
| day, in my opinion he takes badly to minor criticism of Julia
| (although others might disagree).
|
| Edit, here is one thread I could find quickly: <EDIT2: edited
| out link which most people seem to think is actually fine, just
| people getting slightly annoyed on Twitter. I deleted the link
| as people were going and interacting with people in the old
| thread>
|
| The comments aren't particularly bad, but they do feel to me
| like making a bad faith interpretation of someone's comment,
| then digging in. I don't feel that's a good way to talk to
| users, and ethos comes from the top.
| Sukera wrote:
| Do you have an example? I'd like to know more about this - it
| must have been quite egregious if it makes you stop learning
| a language.
| CJefferson wrote:
| I posted one in my comment above. It isn't that bad, but to be honest
| nowadays I believe the community of a language is as
| important, if not more important, than the language itself.
| I don't want to get into a community whose leaders just
| start jumping on random minor Twitter users.
| joaogui1 wrote:
| That guy is a famous Kaggler that works for Nvidia, not a
| minor Twitter user
| CJefferson wrote:
| Honestly, it's interesting you say that, I went and
| looked at his follower count and see what you mean. I
| knew him back before any of us had Twitter :)
| chrsig wrote:
| That thread is just rife with bad communication across the
| board. It's pretty clear that none of you understand what
| each other is saying, but are very willing to infer.
|
| Maybe try not communicating on twitter.
| cbkeller wrote:
| I don't see anything problematic in what Viral said here; I
| think it would be fair to say your initial take ("Julia has
| been the future of machine learning for 10 years and will
| stay as the future of machine learning for the next 10
| years") is likely to be perceived as at least somewhat
| inflammatory; a defensive response is natural enough in that
| context.
| CJefferson wrote:
| What part of the conversation justifies "If you truly
| believe that nobody will ever adopt anything new, we would
| all have been programming in Fortran or assembly!"? To me
| that is a stupid escalation -- no one was suggesting not to
| do new things, Python (the discussed AI alternative) is of
| course newer than Fortran and assembly for a start!
|
| That just seemed like a bizarre overreaction to me.
| urschrei wrote:
| With the greatest respect, nothing about his comment is
| inflammatory in the least, and I say this as someone who
| is avowedly skeptical about the ability of the Julia
| creators to accept criticism.
| CJefferson wrote:
| It's nice to hear an independent viewpoint. To me it was
| "oh, so randomo on Twitter is coming in randomly and
| looking angry. Oh, it's not a randomo, it's the co-creator
| of Julia!".
| saghm wrote:
| Yeah, I fully expected based on the description of the
| twitter interaction to see something really terrible, and
| from actually looking at it, it seems pretty mild. If
| anything, it seems like they went out of their way to try
| to bait the Julia creator and he had a fairly reasonable
| response to it. I'm not sure what could be considered
| "inflammatory" about any of that.
| cs702 wrote:
| A more appropriate title for the OP would have been:
|
| "A new language that makes it easy to write and use _generic
| algorithms_ on a growing number of _custom types developed by
| others_ is bound to experience growing pains as difficult-to-
| foresee correctness bugs have to be discovered and fixed over
| time."
|
| In my humble opinion, this kind of _universal composability_,
| which Julia makes easy via multiple dispatch and naming
| conventions, is the underlying root cause of all the correctness
| bugs that have surfaced as the language has evolved. But the bugs
| are being fixed, one at a time, and ultimately the result should
| be both beautiful and powerful. We will all be thankful for
| it!
| mbauman wrote:
| The most tragic thing here to me is that we're losing Yuri --
| who has been an invaluable contributor and bug-reporter for
| issues like these -- and that Yuri got burned out instead of
| feeling empowered.
| cs702 wrote:
| Yeah, good point. Sometimes I wonder if the fact that so many
| of the folks developing and using Julia are both highly
| educated (e.g., in math) and insanely smart (evidently) is a
| _barrier to mass adoption_. That is, I wonder if the broader
| mass of developers out there -- many of whom are less
| knowledgeable -- find it difficult to benefit from and
| contribute to the Julia ecosystem.
| lostmsu wrote:
| I am not sure I know of any statically typed languages with
| generics that experienced the same kind of problems on
| multiple occasions. The only one I am aware of is C# and array
| variance, which is kept for compatibility purposes.
| QuackingTheQ wrote:
| I've spent a lot of time developing large computational codebases
| in Julia, and I think the most insidious of these issues is a
| product of no formal way of enforcing interfaces. Using one of
| the common packages to build a trait system and add some sort of
| guarantee that all the right methods are implemented for a given
| trait simplifies maintenance dramatically.
|
| This doesn't catch mathematical bugs, but those crop up
| everywhere. Instead, knowing which interfaces must be
| satisfied, so that you can trust your implementation, is crucial,
| and being able to know when it is invalidated is invaluable.
|
| I've had a few awful bugs involving some of the larger projects
| in this language, but a proper interface/trait system would
| simplify things exponentially. There are some coding style things
| that need to be changed to address this, like using `eachindex`
| instead of `1:length(A)` for array iteration as the example in
| the article points out. However, these should be one-off lessons
| to learn, and a good code linter should be able to catch
| potential errors like this.
|
| Between a good code linter (or some static analysis, I'm pulling
| for JET.jl) and a formal interface spec, I really think most of
| Julia's development-side issues could be quelled.
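The `eachindex` versus `1:length(A)` point above can be sketched in Python. This is only an analogue, not Julia code: `OffsetList` is a hypothetical toy container with a configurable first index, and `indices()` stands in for what `eachindex` provides.

```python
class OffsetList:
    """Toy array whose first valid index is `first` (hypothetical, for illustration)."""

    def __init__(self, values, first):
        self._values = list(values)
        self.first = first

    def __len__(self):
        return len(self._values)

    def indices(self):
        """The valid indices, whatever the base (analogue of eachindex)."""
        return range(self.first, self.first + len(self._values))

    def __getitem__(self, i):
        if i not in self.indices():
            raise IndexError(f"index {i} outside {self.indices()}")
        return self._values[i - self.first]


def total_assuming_base_one(a):
    # Bug: hard-codes indices 1..len(a), like `for i in 1:length(A)`.
    return sum(a[i] for i in range(1, len(a) + 1))


def total_generic(a):
    # Asks the container for its own indices, like `for i in eachindex(A)`.
    return sum(a[i] for i in a.indices())


a = OffsetList([10, 20, 30], first=-1)  # valid indices are -1, 0, 1
print(total_generic(a))  # 60
try:
    total_assuming_base_one(a)
except IndexError as err:
    print("base-one loop failed:", err)
```

In this sketch the mistake is loud because every access is bounds-checked. If that check were skipped, the same loop would read the wrong elements without any error.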
| fluidcruft wrote:
| Could some of the need for interfaces be addressed by providing
| an extensive test battery for types of object? It seems like if
| something claims to be an implementation of a floating point
| number it should be possible to smash that type into every
| error ever found to uncover implementation errors.
| mcabbott wrote:
| Yes, although that seems like the easy half of this, making
| sure `struct NewNum <: AbstractFloat` defines everything.
| There aren't yet tools for this but they are easy to imagine.
| And missing methods do give errors.
|
| The hard half seems to be correctness of functions which
| accept quite generic objects. For example writing
| `f(x::Number)` in order to allow units, means you also allow
| quaternions, but many functions doing that will incorrectly
| assume numbers commute. (And not caring is, for 99% of these,
| the intention. But it's not encoded anywhere.) Less
| obviously, we can differentiate many things by passing dual
| numbers through `f(x::Real)`, but this tends to find edge
| cases nobody thought of. Right now if your algorithm branches
| on `if det(X) == 0` (or say a check that X is upper
| triangular) then it will sometimes give wrong answers. This
| one should be fixed soon, but I am sure there are other
| subtleties.
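The dual-number edge case mentioned above can be reproduced in a scalar setting. The following Python sketch assumes a minimal forward-mode `Dual` type (hypothetical, supporting only multiplication) whose `==` compares value parts only; the branch on `x == 0` then plays the role of `if det(X) == 0`.

```python
class Dual:
    """Minimal forward-mode dual number: a value plus a derivative part."""

    def __init__(self, val, der=0.0):
        self.val = val
        self.der = der

    def __mul__(self, other):
        other = other if isinstance(other, Dual) else Dual(other)
        # Product rule: (a * b)' = a * b' + a' * b
        return Dual(self.val * other.val,
                    self.val * other.der + self.der * other.val)

    __rmul__ = __mul__

    def __eq__(self, other):
        # Compares value parts only, which is what makes the branch below misfire.
        other_val = other.val if isinstance(other, Dual) else other
        return self.val == other_val


def f(x):
    # Mathematically f(x) = 2x everywhere; the branch is just a shortcut.
    if x == 0:
        return 0  # a plain constant: the derivative information is dropped
    return 2 * x


def derivative(func, at):
    out = func(Dual(at, 1.0))
    return out.der if isinstance(out, Dual) else 0.0


print(derivative(f, 3.0))  # 2.0, correct
print(derivative(f, 0.0))  # 0.0, although the true derivative is 2
```

The function is correct for ordinary numbers and the dual type is correct in isolation; the wrong answer only appears when the two are composed.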
| [deleted]
| QuackingTheQ wrote:
| It's possible to hack interface verification into place at
| test-time, but that has a couple of problems:
|
| 1. Running the whole testing framework to determine if you
| implemented an interface is a high overhead when you're
| developing
|
| 2. You have a lot of tests to write to really check every
| error. Perhaps a package which defines an interface could
| provide a tester for this purpose
|
| 3. Interfaces should be attached to the types, and that
| should be sufficient for verifying the interface
|
| I would settle for something like checking for the
| implementation of methods a la BinaryTraits.jl over what we
| have now, which is nothing. A huge step would be
| documentation and automated testing that proper interface
| methods are implemented, not even verifying if they're
| "correct". This drastically reduces the surface area you need
| to write and check to confirm compatibility with outside
| code.
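A check that only asks "are the required methods implemented?", without judging whether they are correct, might look roughly like the Python sketch below. It is not the BinaryTraits.jl API; `REQUIRED_METHODS`, `missing_methods`, and `NewNum` are hypothetical names chosen for illustration.

```python
# Hypothetical interface: the methods a number-like type is expected to provide.
REQUIRED_METHODS = ("__add__", "__mul__", "__neg__", "__float__")


def missing_methods(cls, required=REQUIRED_METHODS):
    """Return the names of required methods that `cls` does not implement."""
    return [name for name in required
            if not callable(getattr(cls, name, None))]


class NewNum:
    """An incomplete number-like type."""

    def __init__(self, x):
        self.x = x

    def __add__(self, other):
        return NewNum(self.x + other.x)

    def __float__(self):
        return float(self.x)


print(missing_methods(NewNum))  # ['__mul__', '__neg__']
```

A check like this runs in well under a second, so it avoids the overhead of running a full test suite just to learn that a method was forgotten.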
|
| This simple interface specification does produce design
| issues of its own, but correctness is much easier to handle
| if you know what needs to be correct in the first place.
| ThenAsNow wrote:
| I agree with the kernel of your point here, but also with the
| author of the article when he says "But systemic problems like
| this can rarely be solved from the bottom up, and my sense is
| that the project leadership does not agree that there is a
| serious correctness problem. They accept the existence of
| individual isolated issues, but not the pattern that those
| issues imply."
|
| My impression is that the Julia core devs are more focused on
| functionality and being able to construct new, more powerful,
| faster capabilities than on reflecting on how the foundations
| could or should be made more rigorous. For this, I think the
| devs have to philosophically agree that soundness in the large
| should be a first-tier guiding principle, and that the language
| should have mechanisms whereby correctness-by-construction can
| be encouraged, if not enforced. Presently, notions of soundness
| seem to only be considered in the small, such as the behavior
| of specific floating point ops. Basically, I don't think the
| core devs are as concerned with soundness, rigor, and
| consistency as they are with being able to build more
| impressive capabilities.
|
| I don't want this to sound like I'm ungrateful for the
| awesomeness that Julia and its ecosystem do bring to the
| table. For numerical computing, I don't see any alternatives
| whose tradeoffs are more favorable. But it is disappointing
| that it doesn't seem to learn the lessons about rigorous
| language design and the language-level implications for
| engineering vs. craftsmanship appropriate for a twenty-first
| century language.
| FabHK wrote:
| Sounds like Julia needs a Snow Leopard/Mountain Lion/High
| Sierra release - no new features, just cleaning things up...
| dekhn wrote:
| Wait, are those examples real?
|
| I remember complaining about 1-based indexing only to be told
| "julia is great! we have offsetindex". If it's a source of bugs,
| that ... greatly reduces my future interest in adopting the
| language.
| orbifold wrote:
| I was bitten trying to figure out how to combine units of
| measurement with other numerical computations. Ultimately a
| lot of the features look great on paper, but once you start
| using them, I only ever was able to produce an ungodly mess
| instead of what I could accomplish in Python in roughly the
| same time. Everything that goes beyond what Matlab does,
| sometimes looks great on paper but is not very pleasant to use
| / sometimes badly broken unfortunately. That being said I work
| in an area of scientific research where Julia or more
| specifically DifferentialEquations.jl would _seem_ to walk away
| with the win, but I find myself searching for alternatives
| implemented in Jax.
|
| I would still think most of this is my failings, but it is also
| extraordinarily hard to figure out what is going wrong.
| patrickkidger wrote:
| You may already know of it, but if you want differential-
| equations-in-JAX then allow me to quickly advertise Diffrax:
| https://github.com/patrick-kidger/diffrax (of which I am the
| author, disclaimer).
| orbifold wrote:
| Yes I am aware :) it is missing a few things but I might
| end up contributing.
| patrickkidger wrote:
| Excellent! I'm very happy to take contributions
| generalising the tool.
| ChrisRackauckas wrote:
| Anything other than units? I'd be curious to know. Unitful.jl
| is something which I think is completely the wrong
| architecture (it violates many standard assumptions about
| arrays when used in arrays) so that's a somewhat special case
| (and I plan to create a new units library to completely
| remove uses of Unitful).
| karmakaze wrote:
| I was wondering if the 1-based arrays (and option to change
| index base) would factor into this.
|
| > OffsetArrays in particular proved to be a strong source of
| correctness bugs. The package provides an array type that
| leverages Julia's flexible custom indices feature to create
| arrays whose indices don't have to start at zero or one.
|
| Array indexing is such a core thing and I don't understand why
| anything mathematical or scientific would start with 1.
| hprotagonist wrote:
| > Array indexing is such a core thing and I don't understand
| why anything mathematical or scientific would start with 1.
|
| So, no FORTRAN, huh?
| coldtea wrote:
| > _Array indexing is such a core thing and I don't
| understand why anything mathematical or scientific would
| start with 1._
|
| Because starting with 0 is neither math nor array indexing in
| general.
|
| It's just how the base address of an array pointer memory
| block was referenced in C (and it spread from there).
|
| Which is why all math focused languages use 1-based (fortran,
| apl, matlab, r, mathematica, etc.)
| xdavidliu wrote:
| > It's just how the base address of an array pointer
| memory block was referenced in C (and it spread from
| there).
|
| There was also the famous 1-pager by Dijkstra: "Why
| numbering should start at zero"
|
| https://news.ycombinator.com/item?id=777580
| coldtea wrote:
| Where the argument was "That is ugly"...
| SeanLuke wrote:
| > C (and it spread from there).
|
| BCPL.
| jacobolus wrote:
| Math (usually) uses 1-based indexes because those parts of
| math started before the concept of zero as a number, and
| then the convention persisted, even down to Matlab.
|
| There are many similar path-dependent conventions in human
| culture. E.g. percentages originated before the concept of
| decimal fractions, base-sixty time units come from ancient
| Mesopotamia, and conventions about multi-dimensional array
| memory layout are based on the convention for drawing
| matrices on paper.
|
| Most common mathematical sequences and series work better
| (more naturally/clearly) when zero indexing is used
| instead, and off-by-1 errors are a problem in mathematics
| just like computing (but less of a problem, because
| notation errors get silently corrected in readers' heads,
| and don't actually have to be interpreted strictly).
| capitalsigma wrote:
| I am pretty certain that matrix algebra does not predate
| the number 0
| docandrew wrote:
| It makes iteration less error-prone too when the index of
| the last element is equal to the length of the array. In C
| it's pretty easy to iterate past the end of an array if you
| use <= by mistake in a for loop, or forget a "length - 1"
| somewhere.
| karmakaze wrote:
| I don't read much about users of modern languages with 0-based
| index requesting 1-based options/alternatives.
| lapinot wrote:
| Math traditionally has had some bad notation from a formal
| point of view, because humans are good at coping with bad
| notations (or going back and forth between variants),
| unlike machines and formal systems. Computer science being
| a (more) formal science (vs math which is overwhelmingly
| not done formally), it has criticized some traditional math
| notation which are ad-hoc and not nicely formalizable (and
| put forward variants that are actually better behaved in
| terms of mathematical structures).
|
| For indices: indices are about referencing elements of
| finite ordered sets, say of size N. Hence the 'abstract'
| indexing set for N elements is the ordinal N. The most
| canonical way to represent it is to take the length-N
| prefix of the natural numbers (eg 0-based indexing, von
| neumann ordinals), which happen to have all sorts of
| additional structure (eg mod-N arithmetic). This is also
| consistent with the offset view (the i-th element is at
| offset i). The fact that people tend to start ordinal
| numbers at 1 doesn't change the fact that mathematicians
| working with ordinal numbers take them to start at 0, for
| the same reason we start naturals at 0.
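The remark about mod-N arithmetic can be made concrete with a small Python sketch of stepping cyclically through N slots under each convention. The function names are hypothetical.

```python
N = 5  # number of slots in a ring


def next_zero_based(i):
    """Successor of index i when the valid indices are 0 .. N-1."""
    return (i + 1) % N


def next_one_based(i):
    """Successor of index i when the valid indices are 1 .. N."""
    return i % N + 1


print([next_zero_based(i) for i in range(N)])        # [1, 2, 3, 4, 0]
print([next_one_based(i) for i in range(1, N + 1)])  # [2, 3, 4, 5, 1]
```

Both versions work. The 0-based one is the plain modular successor, while the 1-based one needs the extra shift by one.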
|
| See also: notation for higher derivatives
| https://arxiv.org/abs/1801.09553; a bit further but in the
| same vein: notations for free variables in programs as de-
| bruijn indices (or some variant thereof) (it's further
| because it's practical for doing proofs, but not for
| writing concrete terms). There are probably other
| instances.
| Rayhem wrote:
| > Because starting with 0 is neither math nor array
| indexing in general.
|
| It very, very much is. Polynomials all start at a zero
| "index", as does just about every expansion I can think of
| (Fourier, Bessel, Legendre, Chebyshev, Spherical Harmonic,
| etc.) Combinatorics, too, make lots of use of zero indices
| and zero-sized sets. As for arrays, I'll leave it to
| Dijkstra[1] to explain why zero indexing is most natural.
| Zero indexing overwhelmingly makes the most sense in both
| math and computers because indexing is a different
| operation than counting.
|
| [1]: https://www.cs.utexas.edu/users/EWD/transcriptions/EWD
| 08xx/E...
| coldtea wrote:
| > _It very, very much is. Polynomials all start at a zero
| "index"_
|
| Notice how you had to put index in quotes.
|
| Because it's not an index, it's the degree of each
| polynomial term, which is a power.
| Rayhem wrote:
| Notice how you're ad hominem-ing the structure of the
| argument and not the argument itself? I do not at all see
| how putting quotes around that word invalidates the
| argument. I did so because mathematical literature
| doesn't refer to it as an index (rather as a degree as
| you mentioned), but it very much does index each
| monomial. There are an infinite number of index sets for
| each polynomial -- just as i can index the i'th monomial,
| so can (i - 7), or (i - 239842), or (i - pi) -- but one
| of them is obviously the most natural (pun intended).
| FabHK wrote:
| GP was addressing the argument. Using the word "you" in
| the reply does not make it ad hominem.
|
| (For more than you ever wanted to know about this, _The
| Ad Hominem Fallacy Fallacy_ covers it exhaustively and
| entertainingly:
|
| https://laurencetennant.com/bonds/adhominem.html )
| Rayhem wrote:
| This is an interesting point, though, respectfully, I
| _do_ still think it's ad hominem. Internet arguments
| being what they are I don't much care, but I offer my
| reasoning here to better understand your point. OP did
| not engage with any of the points made, merely offering
| another term (without any sort of elaboration or
| definition), and said
|
| > Notice how you had to put index in quotes.
|
| as the thrust of the argument. In saying that, they imply
| that I, the arguer, 'A' in Bond's article, don't actually
| know what an index is (so how could I have a cogent
| argument about 'correct' indexing?). As this is the only
| argument of merit, it seems as though OP _is_ actually
| trying to counter the point by suggesting (attacking)
| something of the arguer (myself).
|
| Now, it may be that this ad hominem is justified -- if I
| truly _don't_ know what an index is then yes, I probably
| should not be making claims about them -- but it's still
| an ad hominem (and, possibly, poor form).
|
| Of course, this is ascribing a lot to 25 words of text
| with little other context. I would be interested to
| understand if you see things differently/think I have
| grossly erred in my analysis.
| ModernMech wrote:
| > In saying that, they imply that I, the arguer, 'A' in
| Bond's article, don't actually know what an index is (so
| how could I have a cogent argument about 'correct'
| indexing?).
|
| That's not what they are saying. They are saying you know
| what an index is so well that you correctly put quotes
| around your usage of the term, because you understood
| it's not in fact a technically correct usage.
|
| Now calling an argument poor form... that's closer to ad
| hominem.
| coldtea wrote:
| > _I do not at all see how putting quotes around that
| word invalidates the argument_
|
| When the argument is: [0] very, very much is
| [natural for indexes]
|
| and as an example for that points to something that's not
| an index -- and the person making the argument knows it
| is not an index, so they have to put index in quotes:
| Polynomials all start at a zero "index"
|
| ...then pointing this out, does invalidate the argument.
| It might not prove that the opposite is true, but it sure
| does invalidate the argument.
|
| Notice also how there's no ad hominem in my response
| (this or the previous one) as you claim. I argue against
| the case and the choice of example, not against who wrote
| it.
| edflsafoiewq wrote:
| The coefficients are indexed. The n on a_n in Σ a_n X^n
| is an index.
| markkitti wrote:
| For some of these polynomials such as Fourier
| polynomials, it is natural to think about negative
| subscripts from a pure mathematical perspective. While
| these can be mapped into non-negative integers, it is often
| intuitive to use the "negative subscripts" as indexes
| necessitating methods such as `fftshift`. For many of
| these polynomials the concept of where they "start" is
| arbitrary.
| lahvak wrote:
| As other posters noted, in mathematics both 0 based and 1
| based indexing is used.
|
| When dealing with matrices and vectors (including data tables
| and data columns), there is a strong preference for 1 based
| indexing: first row, first column, first entry, etc. Most
| matrix and vector based algorithms in literature use 1 based
| indexing. Programming these in a language with 0 based
| indexing is a mess, and a common source of errors.
|
| When dealing with sequences, especially recursively defined
| ones, there is usually an initial value (indexed with 0) and
| then the n-th value is obtained by n applications of the
| recursive step, so 0 based indexing makes more sense, but in
| literature there is no fixed convention, and you can find
| examples with 0 based and with 1-based indexing. Another
| example of 0 based indexing in math are polynomials (and in
| extension, power series) where the index is the degree of the
| term, or in general any functional series where the 0-th term
| is the constant term.
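|
| A minimal Python sketch of that polynomial convention, where the
| list position is the degree of the term. The function name
| eval_poly is made up for illustration:

```python
def eval_poly(coeffs, x):
    """Evaluate sum(coeffs[k] * x**k) with Horner's rule.

    coeffs[k] is the coefficient of x**k, so coeffs[0] is the
    constant term and the index doubles as the degree.
    """
    result = 0
    for c in reversed(coeffs):
        result = result * x + c
    return result


# 2 + 3x + x^2 evaluated at x = 2
value = eval_poly([2, 3, 1], 2)
```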
|
| There are also negative indices.
| CRConrad wrote:
| > Array indexing is such a core thing and I don't understand
| why anything mathematical or scientific would start with 1.
|
| Counting things is such a core thing _to humans_ that when we
| have a bunch of N things we think of them as thing #1 to
| thing #N. We start counting from 1, not 0.
|
| Indexing from 0 in computing is adapting the human mind to
| the computer, purely for performance reasons that may have
| been relevant in the 50s or 60s but were beginning to be
| obsolete by the 70s. It was done so you could access elements
| of an array by the simplest possible calculation of your
| offset into heap memory. When your first element is stored at
| Starting_address, you need i for that first element to be =
| 0, just so you don't need to have the compiler add another
| constant term for each element to "Element is at
| Starting_address + i * sizeof(element)".
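|
| As a rough sketch of that address arithmetic (Python, with the
| hypothetical helper element_address and a first_index parameter
| standing in for the language's indexing base):

```python
def element_address(base, i, elem_size, first_index=0):
    """Byte address of element i in an array stored at `base`.

    With first_index == 0 the subtraction disappears; with any other
    base there is one extra constant term, which a compiler could
    fold into the base address ahead of time.
    """
    return base + (i - first_index) * elem_size


zero_based_first = element_address(1000, 0, 8)
one_based_first = element_address(1000, 1, 8, first_index=1)
```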
|
| Would have been trivial, even then (as Wirth showed) to add
| that constant term calculation to compilers, but it was done
| without in C because that eliminated one whole integer
| operation from each (set of?) array access(es).
|
| Instead, we got the mental gymnastics of
|     for (i = 0; i <= N-1; i++) {...}
|
| and its many variations (instead of just for i := 1 to
| N...), which surely have caused orders of magnitude more
| headaches in off-by-one bugs over the years than it saved on
| performance.
| DNF2 wrote:
| There are good arguments for using either 0- or 1-based
| indices. As you should be aware, there are many languages on
| each side.
|
| While preferring one over the other is perfectly fine, I
| question the intellectual honesty of anyone claiming
| incredulity about the opposite choice.
| davisoneee wrote:
| I'll start by saying that I greatly prefer 0-based, and have
| used both 0- and 1-based indexing, but the choice is largely
| arbitrary.
|
| 0 makes sense as the '0-th offset' when thinking from a
| pointer perspective, but I often find when teaching, that
| 1-based comes more naturally for many students (the 'first'
| item).
|
| You mention mathematical or scientific work...but I
| often/mainly see enumerations (such as weights x_1, x_2, ...
| x_n or SUM 1 to N) start with 1, so for these 1-based can be
| a more natural/direct translation of mathematical notation to
| code.
| jjgreen wrote:
| Not for polynomial coefficient indices :-)
| xscott wrote:
| Or Fourier coefficients :-) Or pretty much anything where
| the index/subscript is related to the math itself.
|
| Math textbooks and papers tend to use 1-based subscripts
| when it _doesn't_ matter. It's hard to come up with
| examples where starting at 1 facilitates the actual math.
| MzxgckZtNqX5i wrote:
| Just for consistency: a_n is the coefficient of x^n, so
| the constant term ends up being a_0. Based on my
| experience, numbering starts from one (like (x_1, x_2,
| x_3) as a point of R^3) and offsets from zero, e.g., when
| dealing with discrete time, t_0 is the first.
| rocqua wrote:
| My experience is that 0-based offsets (and use of < or even
| != for upper bounds) mean that I should almost never have
| to write something like idx - 1 or idx + 1.
|
| I came to 0-based offsets later in my career, having
| started with Matlab. So I have some real experience with
| 1-based offsets. Experience that was 'untainted' by being
| used to a different option. I much prefer 0-based.
|
| Especially because I now sort of have a linter rule in my
| head 'if I am writing i - 1 then I am making a mistake or
| doing something the wrong way'. Which has been quite
| successful.
| sdfhdhjdw3 wrote:
| > Array indexing is such a core thing and I don't understand
| why anything mathematical or scientific would start with 1.
|
| Because that's how maths work? Literally everywhere in maths
| you count from 1, except in software engineering. That's why.
| I hope that clarified your confusion.
| planede wrote:
| My hot take is that 1-based indexing is often a mistake in
| math too. It's also not universal, even within math. And
| linear algebra doesn't need 1-based indexing either, and
| some operations are even more easily expressed with 0-based
| indexing.
| wnoise wrote:
| Starting with 0 is quite common in series, e.g. Taylor,
| Fourier, Chebyshev expansions, etc.
| sdfhdhjdw3 wrote:
| No... in those cases you're starting with 0 because
| that's the lowest exponent of a polynomial.
| wnoise wrote:
| You're saying no while explaining my reasoning.
|
| What index to start with only strongly matters when the
| indexes have semantics. Otherwise you should just treat
| it as an opaque index, i.e. eachindex(), keys(), etc. In
| math when there are semantics, the indices usually
| include 0. When not (vector components, matrix indices,
| etc), they usually (but not uniformly) don't.
|
| The one nice side-benefit of Julia's mistake in adopting
| 1-based indexing is that it provided an extra impetus to
| build machinery to handle arbitrary indexing, though too
| much code still doesn't work correctly, and code still
| gets written to only handle 1-based arrays.
| CRConrad wrote:
| > What index to start with only strongly matters when the
| indexes have semantics.
|
| Which in everyday computing (as opposed to mathematics)
| they often do, and those cases are (most?) often, in
| _human_ terms, much more natural to start from 1: "I
| have an array of N elements. The first of a bunch of
| things is thing number one, and the last of N things is
| thing number N." Hence:
|
|     N_things: Array[1..N] of Thing;
|
|     for i := 1 to N do begin
|       Whatever := Whatever + Whateverize(N_things[i]);
|     end; // for i := 1 to N...
|
| Yeah, that's how old I am: That's Pascal. (With some
| declarations skipped, and I may have misremembered some
| syntax.) The canonical example is of course the original
| Wirth-style max-255-ASCII-characters fixed-length[1]
| String type: In a string of length N, the nth character
| is at position n in the string. Character number N is the
| last one.
|
| > The one nice side-benefit of Julia's mistake in
| adopting 1-based indexing is that it provided an extra
| impetus to build machinery to handle arbitrary indexing
|
| 1) Arguably, as per the above, not a mistake.
|
| 2) Muahaha, "build machinery"? No need to build anything
| new; that's already existed since the early 70s. (Yeah,
| that's how old I am: Not adding 19 in front. There was
| only one "the seventies".) It's not like starting at 1
| was mandatory; you could well declare
|
|     My_fifty_things: Array[19..68] of Thing;
|
| And then "for i := 19 to 68 do ..." whatever with it, if
| those specific numbers happened to be somehow essential
| to your code.
|
| (At least in Turbo, but AFAICR also in original Wirth
| Pascal. Though probably with the max-255-ASCII-elements
| limitation in Wirth, and possibly also in Turbo up to v.
| 2 or 3 or so.)
|
| __
|
| [1]: Though from at least Turbo Pascal 3 (probably
| earlier; also think I saw it on some minicomputer
| implementation) with the backdoor of changing the length
| by directly manipulating -- surprise, surprise, it
| exists! String was a built-in type with its own
| implementation -- the length byte at index [0]. Better
| start out with your string declared as length 255,
| though, so you don't accidentally try to grow it beyond
| what's allocated.
| wnoise wrote:
| > often, in human terms, much more natural to start from
| 1
|
| This meaning of natural is highly culturally dependent. It
| took the Greeks a startlingly long time to accept that
| one was a number (because it's a singleton), much less
| zero. I do not e.g. want arrays that can't have length
| one, because they have to be containing a number of
| things.
|
| > No need to build anything new;
|
| Well, no, not "new". Arrays with arbitrary bounds are a
| well-trod path. But they still had to make it work in
| Julia: CartesianIndices, LinearIndices, and overloading
| of "begin", and "end" keywords, etc. And the radical
| dependence on multimethod dispatch meant they couldn't
| quite just reuse existing work from other languages.
| CodeArtisan wrote:
| Algol had negative indexes. You could declare an array of
| nine elements going from -4 to 4, for example. I couldn't
| find why they wanted such a thing.
| temp8964 wrote:
| > Array indexing is such a core thing and I don't understand
| why anything mathematical or scientific would start with 1.
|
| From a data-analytic point of view, indexing should start with
| 1. When we analyze a data table, we always call the first row
| as the 1st row, or row #1, not row #0. It will be very
| strange to label rows as 0, 1, 2, 3, .... It may be fine for
| people with Computer Science background. But it would create
| so much confusion for almost everyone else...
| throwawaymaths wrote:
| It causes problems for people with a CS background too. I
| once numbered machines in racks with zero-indexing (so that
| they could match up with zero-indexed ip addresses). Even
| though literally everyone who touched those machines had CS
| background: DO NOT DO THIS.
| dekhn wrote:
| It just amuses me that one of the big differences between
| the US and EU is which floor is "first" and which one is
| "zero" or "minus one".
| FabHK wrote:
| Yes. A German friend of mine moved into her student
| dormitory in the US, and when she was told that her room
| was on the first floor, asked whether there was a lift,
| because she had a heavy suitcase...
|
| Having said that, given that there are basements (in
| Europe, at least), it makes sense to call the ground
| floor 0. We are dealing with integers here, not natural
| numbers.
| CRConrad wrote:
| But floors of multi-storey buildings are a pretty unique
| exception in the real world in having a characteristic
| where zero -- the number of stairs you need to climb from
| the ground floor -- has an actual tangible meaning (_on_
| the ground floor).
|
| How many other such examples can you (editorial you;
| anyone) come up with? Not many, I'd bet.
| FabHK wrote:
| I thought you'd be wrong, and immediately came up with:
|
| - Hours after midnight. The (non-anglosaxon) watch goes
| from 0:00 to 24:00 (the latter is useful for deadlines:
| The proposal must be submitted by Friday, 24:00 (which
| coincides with Saturday, 0:00)).
|
| Then I cheated and looked up Wikipedia
| (https://en.wikipedia.org/wiki/Zero-based_numbering#Other_fie... )
| - and whoops, that's basically it!
| CRConrad wrote:
| Yeah, duh, totally forgot that one.
|
| And I should know; the alarm clock on my windowsill says
| 0:54 right now.
| [deleted]
| dklend122 wrote:
| If packages use generic indexing functions like eachindex,
| there would be no correctness issue with that specific example
| forgotpwd16 wrote:
| >If it's a source of bugs, that ... greatly reduces my future
| interest in adopting the language.
|
| It can be a source of bugs because some/many packages
| incorrectly assume that what you pass is 1-based indexed.
| sdfhdhjdw3 wrote:
| The problem isn't that 1-based indexing can be "fixed" in Julia.
| The problem is that you see 1-based indexing as a flaw.
| IshKebab wrote:
| It is a flaw. Computers don't work that way fundamentally,
| and it introduces lots of awkward translation.
| dash2 wrote:
| But humans don't work 0-based. Try explaining to a bunch of
| scientists why for rows 2-5 of the DataFrame they have to
| write df[1:5].
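|
| To make that slice concrete, a small Python sketch with a plain
| list standing in for the DataFrame (an assumption; one_based_rows
| is a made-up helper, not a library function):

```python
rows = ["row1", "row2", "row3", "row4", "row5", "row6"]

# 0-based, half-open slice: positions 1, 2, 3, 4 are the 2nd..5th rows.
selected = rows[1:5]


def one_based_rows(seq, first, last):
    """Return rows first..last, counted from 1 and inclusive at both ends."""
    return seq[first - 1:last]


same_rows = one_based_rows(rows, 2, 5)
```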
| IshKebab wrote:
| Yeah because humans got it wrong. Really the word for
| "first" should correspond to the number 0.
|
| Try doing a block iteration over an array, or any kind of
| interval algorithms in 0-based and 1-based. 0-based with
| right-open intervals just results in way way more
| elegant, easier to understand and (very) slightly more
| efficient code.
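|
| One way to picture the block-iteration claim, as a Python sketch
| with made-up helper names. It shows the same chunking written
| both ways so the +1/-1 terms can be compared; which one reads
| better is left to the reader:

```python
def blocks_zero_based(n, size):
    """Half-open (start, stop) ranges covering positions 0..n-1."""
    for start in range(0, n, size):
        yield start, min(start + size, n)


def blocks_one_based(n, size):
    """Closed (first, last) ranges covering positions 1..n."""
    for first in range(1, n + 1, size):
        yield first, min(first + size - 1, n)


zero_based = list(blocks_zero_based(10, 4))
one_based = list(blocks_one_based(10, 4))
```

| In the half-open version each block's length is stop - start and
| one block's stop is the next block's start; in the closed version
| the length is last - first + 1.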
| ModernMech wrote:
| The reason why this 0 vs 1 based indexing debate is never
| resolved is because all of the arguments are subjective.
| You've claimed definitively that "humans got it wrong",
| but to back up this argument you've pointed to vague
| notions of "elegance" and "understandability". Even
| Dijkstra in his argument relies on a notion of
| "ugliness". All such arguments fall squarely in the realm
| of "preferences".
|
| Just think about what you're saying: that natural
| language usage of the word "first" is wrong, and doing
| things the opposite way from what everyone expects in
| programming languages is somehow more understandable?
| Really? Maybe that makes perfect sense to you, but not to
| people using the word as it's commonly used.
| IshKebab wrote:
| It's not subjective. The fundamental meaning of an index
| is "how far are we from the start of an array". The first
| element is 0 from the start of the array.
|
| If humans had got this right then we wouldn't be even
| having this discussion. Same for similar mistakes like Pi
| vs Tau, negative electrons. But the mistake is
| understandable given that we didn't even think of 0 for a
| long time.
| Rayhem wrote:
| Yes, precisely. Indexing and counting are different
| operations.
| ModernMech wrote:
| It's subjective because you haven't defined or quantified
| elegance or understandability. I could say that zero
| based indexing is not in fact understandable based on the
| amount of confusion I encounter explaining the concept to
| new users. I could say it's inelegant based on the fact
| it makes algorithms I use harder to implement. Others
| argue it's more elegant because it makes algorithms
| _they_ use easier to implement. Same rationale,
| subjective conclusions.
|
| Dijkstra's argument as well hinges on an undefined notion
| of "ugliness", which can mean anything to anyone. That's
| why these conversations never end, because most people
| are talking past one another based on their own
| definitions of "elegance" or "ugliness".
| CRConrad wrote:
| > ...humans got it wrong. [...] 0-based with right-open
| intervals just results in way way more elegant, easier to
| understand...
|
| You must have warped your brain into thinking like a
| computer (well, a C-family-language compiler) for so long
| and/or so thoroughly that you no longer think quite like
| an ordinary human.
|
| Having to sprinkle semi-arbitrary "-1"s all over your
| code is in no reasonable sense of the words "way way more
| elegant, easier to understand" than not having to do so.
| dekhn wrote:
| I didn't say I see 1-based indexing as a flaw. I said I
| complained about it, and then learned they supported multiple
| types of offsets (which ostensibly resolved the issue for
| me), only to learn that the stats library was "written before
| offsetindex" and still has bugs related to it.
| ModernMech wrote:
| > OffsetArrays in particular proved to be a strong source of
| correctness bugs. The package provides an array type that
| leverages Julia's flexible custom indices feature to create
| arrays whose indices don't have to start at zero or one.
|
| I always thought this sounded like a bad idea. I remember one
| time I was working with a C++ guy on a Matlab project, and he
| handed me some Matlab code with 0 based indexing assumed. I said
| "Did you even run this code?", and he assured me he had. But of
| course he had not, because if he did it would have complained
| about the 0-based indices. But the point is that it _did_
| complain when I ran it, and I was able to match it to my code. I
| imagine in Julia he would have used 0-based indices, and I would
| have used 1-based, and our programs would have silently failed.
| cbkeller wrote:
| For it to silently fail of course though, he would have had to
| explicitly use the OffsetArrays package and explicitly
| switch all `Array`s to `OffsetArray`s (which hopefully you
| would notice) -- and then you would have to go ahead and use
| those OffsetArrays in a package which doesn't support them; if
| you just go ahead and use 0 as an index in plain Julia code it
| will error as you would expect.
| jahewson wrote:
| When Julia was first released, I tried it out and decided I'd
| write a syntax highlighter for it, so I asked for a grammar.
| There wasn't one. I was told to refer to the parser source code,
| which was written in a custom dialect of LISP. That was a red
| flag for me and I never returned.
| jrochkind1 wrote:
| > Given Julia's extreme generality it is not obvious to me that
| the correctness problems can be solved. Julia has no formal
| notion of interfaces, generic functions tend to leave their
| semantics unspecified in edge cases, and the nature of many
| common implicit interfaces has not been made precise (for
| example, there is no agreement in the Julia community on what a
| number is).
|
| Does all that apply to Python? I think so? Yet apparently similar
| problems don't exist in python, and even one of the examples in
| OP had the reporter moving to python to have no problems getting
| the same thing to work that was problematic in Julia.
|
| In a language intended for math, I do understand the desire to
| have something with more formal properties suited for guarantees
| and such. But Python seems to be doing just fine in that domain
| without those features, so, I'm not sure what we should conclude
| here.
| adgjlsfhk1 wrote:
| The main difference between Julia and python is that most of
| the "core" python ecosystem has had a lot more dev time put
| into it. Google, Facebook, and Microsoft all have hundreds of
| full time developers on major python packages.
| jrochkind1 wrote:
| Makes sense. I guess the author's contention is that if Julia
| had those formal features the author wants, it would need
| very significantly less dev time to reach python's levels of
| reliability?
|
| It's of course plausible, that's what those sorts of features
| are intended for, but I'm not entirely convinced. At any
| rate, python demonstrates it is not the
| only path, as the author seems to be suggesting ("it is not
| obvious to me the problem can be solved" without these
| features, says the author. But it's not obvious to _me_ that
| those features are necessary to solve the problem, or
| sufficient to solve the problem...)
| chubot wrote:
| Oof, accessing out of bounds memory is pretty surprising to me
| for a dynamic language ... But I guess it's not surprising if
| your goal is to compile to fast native code (e.g. omit bounds
| checks).
|
| I don't know that much about how Julia works, but I feel like
| once you go there, you need to have very high test coverage, and
| also run your tests in a mode that catches all bound errors at
| runtime. (they don't have this?)
|
| Basically it's negligent not to use ASAN/Valgrind with C/C++
| these days. You can shake dozens or hundreds of bugs out of any
| real codebase that doesn't use them, guaranteed.
|
| Similarly if people are just writing "fast" Julia code without
| good tests (which I'm not sure about but this article seems to
| imply), then I'd say that's similarly negligent.
|
| -----
|
| I've also learned the hard way that composability and correctness
| are very difficult aspects of language design. There is an
| interesting tradeoff here between code reuse with multiple
| dispatch / implicit interfaces and correctness. I would say they
| are solving O(M x N) problems, but that is very difficult,
| similar to how the design of the C++ STL is very difficult and
| doesn't compose in certain ways.
|
| (copy of lobste.rs comment)
| mbauman wrote:
| You can also use `julia --check-bounds=yes` -- and our testing
| frameworks automatically do so.
| teddyh wrote:
| Actual title: "Why I no longer recommend Julia".
| mbauman wrote:
| @Dang could we get the title corrected?
| [deleted]
| cbkeller wrote:
| This seems hard to evaluate without a quantitative comparison to
| the abundance of bugs in the package ecosystems of other
| languages at the same age. So, for instance, how many correctness
| bugs existed (or, alternatively, had been found and fixed) in the
| Python ecosystem when Python was ten years old? The author makes
| a subjective claim, but from the few other languages they mention
| it seems they are comparing primarily to older and more stable
| ecosystems.
| snicker7 wrote:
| A lot of these issues can be fixed. Adding robust type
| constraints (e.g. traits) and accompanying "static analysis"
| tooling would help a lot. Julia can learn a lot from ML-family
| languages (e.g. OCaml, Haskell) in that regard. And there are
| efforts in the Julia community to add these features via third-
| party libraries. However, I don't see things improving unless
| such features are baked into the language and used more
| ubiquitously in open source modules.
| Sukera wrote:
| Most of these seem to be about packages in the ecosystem (which,
| after clicking through all links, actually almost all got fixed
| in a very timely manner, sometimes already in a newer version of
| the packages than the author was using), not about the language
| itself. Other than that, the message of this seems to be "newer
| software has bugs", which yes is a thing..?
|
| For example, the majority of issues referenced are specific to a
| single package, StatsBase.jl - which apparently was written
| before OffsetArrays.jl was a thing and thus is known to be
| incompatible:
|
| > Yes, lots of JuliaStats packages have been written before
| offset axes existed. Feel free to make a PR adding checks.
|
| https://github.com/JuliaStats/StatsBase.jl/issues/646#issuec...
|
| EDIT: Since this comment seems to gain some traction - title is
| editorialized, original is "Why I no longer recommend Julia".
| snicker7 wrote:
| "known to be incompatible"
|
| Known to whom? People who regularly participate in the Julia
| forum/chat? Julia's composability relies on people agreeing on
| unwritten rules and standards.
|
| In other languages, such incompatibilities are caught by the
| compiler. Even in other dynamic languages like Python or
| Javascript, it is now considered best practice by many to
| annotate types whenever you can. Like Julia, Haskell is also
| composable. Unlike Julia, it does not need to sacrifice
| correctness.
| DNF2 wrote:
| Agreed, one cannot just expect this to be known.
|
| Do type annotations in Python actually catch type errors? I
| thought they were mainly for documentation.
| snicker7 wrote:
| Yes, if you use tooling (mypy). It definitely helped me a
| few times.
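|
| A small sketch of what that looks like in practice. The function
| mean is made up for illustration; the annotations do nothing at
| runtime, and it is a separate checker such as mypy that reads
| them:

```python
from typing import Sequence


def mean(xs: Sequence[float]) -> float:
    """Arithmetic mean of a non-empty sequence of numbers."""
    return sum(xs) / len(xs)


ok = mean([1.0, 2.0, 3.0])

# A call like mean("abc") is not rejected when the function is defined;
# a static checker flags it as an incompatible argument type before
# the program ever runs.
```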
| cwp wrote:
| I wonder how much of this is just that Julia is more composable
| than most people are used to, and the community hasn't yet
| developed the patterns and culture that are needed to avoid these
| kinds of problems.
|
| I'm thinking, for example, of the way that Smalltalkers often
| create parameters with type-evocative names, such as "aString".
| Or Objective-C with two-letter prefixes to work around lack of
| namespaces. Or even the Java "EntityAdaptorFactoryFactory" design
| aesthetic. (Some of you will shudder, and I'm with you, but it
| did solve real problems that the Java world was facing.)
|
| Julia is still a pretty young language, and it's probably only
| recently that the ecosystem has gotten big enough to hit these
| problems.
|
| Edit: come to think of it, one of the issues that the Java folks
| were dealing with was lack of composability. :-/
| asdfman123 wrote:
| > In my experience, Julia and its packages have the highest rate
| of serious correctness bugs of any programming system I've used,
| and I started programming with Visual Basic 6 in the mid-2000s.
|
| Oh God, is this what qualifies you as "old" now
| CRConrad wrote:
| Kids these days, eh? Lawn, etc.
| ur-whale wrote:
| The examples provided feel more like bugs in various libraries
| than an actual problem intrinsic to Julia the language.
| Q6T46nT668w6i3m wrote:
| @inbounds is a Base feature.
| markkitti wrote:
| Yes, and it is a perfectly fine feature when applied
| correctly. It would be incorrect to assume that an
| `AbstractArray` starts at `1` or `0` which is why the updated
| example now correctly uses `eachindex`:
| https://docs.julialang.org/en/v1/devdocs/boundscheck/#Elidin...
|
| If you want to assume that an array starts at `1` one needs
| to require an `Array` rather than an `AbstractArray`.
| arksingrad wrote:
| @inbounds isn't the problem, it's incorrect usage of it. The
| poor docstring is absolutely a problem though, you should be
| iterating over eachindex(A), not 1:length(A).
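|
| The difference can be sketched outside Julia too. Below is a
| hypothetical Python container, OffsetList, whose first valid index
| is configurable; it is not the OffsetArrays package, and its
| indices() method only plays the role that eachindex(A) plays in
| Julia. Here the bad access raises, whereas with bounds checks
| elided it would go unnoticed:

```python
class OffsetList:
    """Hypothetical list whose first valid index is `first`."""

    def __init__(self, items, first):
        self._items = list(items)
        self.first = first

    def indices(self):
        """Valid indices, analogous to asking the container itself."""
        return range(self.first, self.first + len(self._items))

    def __len__(self):
        return len(self._items)

    def __getitem__(self, i):
        if i not in self.indices():
            raise IndexError(f"index {i} outside {self.indices()}")
        return self._items[i - self.first]


def total_generic(a):
    """Sum using the container's own index set."""
    return sum(a[i] for i in a.indices())


def total_assuming_one_based(a):
    """Sum assuming indices 1..len(a); only valid for 1-based containers."""
    return sum(a[i] for i in range(1, len(a) + 1))
```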
| wodenokoto wrote:
| According to the article the problem is in the ecosystem, and
| partly the standard lib.
|
| Basically it doesn't matter if Julia the language is fine, if
| all the stats packages make wrong calculations. Then what is
| the point of Julia, if you have to rewrite all things? Might as
| well use another language where you trust the result of the
| ecosystem, since it is the ecosystem you need in order to
| produce results.
| trenchgun wrote:
| All bugs mentioned had been quickly fixed:
| https://news.ycombinator.com/item?id=31397425
| wnoise wrote:
| That comment doesn't say all bugs have been fixed, or even
| quickly fixed. When I check on the posted links, many are
| in fact still open, e.g.
|
| https://github.com/JuliaStats/Distributions.jl/issues/1253
|
| https://github.com/JuliaStats/StatsBase.jl/issues/642
|
| https://github.com/JuliaStats/StatsBase.jl/issues/616
|
| https://github.com/JuliaLang/julia/issues/39385
| pdeffebach wrote:
| I don't get these complaints about `sum!(a, a)`. Sure
| it's a bit of a footgun that you can overwrite the array
| you are working with. This doesn't rise to a "major
| problem" of composability.
|
| The histogram errors seem annoying though. Hopefully they
| can get fixed.
| wnoise wrote:
| Sure, it's unsurprising that it produces unexpected
| results, but there are actually semantics that should be
| expected. The problem is that implementing those
| semantics correctly for all cases is hard, because
| aliasing. Same issue that e.g. memcpy() vs memmove()
| have.
| pdeffebach wrote:
| What semantics are expected in other languages? This
| seems solidly in the realm of undefined behavior as far
| as I can tell.
| wnoise wrote:
| The obvious semantics for these functions is that f!(a,
| args...) should do the same thing as a .= f(args...).
|
| It's only undefined behavior because the simple
| implementations don't do that in the presence of
| aliasing.
|
| I brought up memcpy() and memmove() (which in C are
| copying identity functions on bytes) exactly for this
| point. memcpy() has undefined behavior when the source
| and destination ranges overlap (implementable as a simple
| loop), while memmove() does the right thing if they do
| overlap, at the cost of having to check what direction
| they overlap when they do. And in C you can actually
| easily check if they overlap and in what direction,
| because the only interface there is the pointer. Aliasing
| with objects with internal details that are more
| complicated than that to check is difficult, perhaps too
| difficult to expect. But it is possible if you're only
| handling your own objects: witness analogous behavior
| getting specified in numpy:
| https://docs.scipy.org/doc/numpy-1.13.0/release.html#ufunc-b...
| They do note that
| this can require allocation, even in some cases where it
| shouldn't. But not allocating is of course most of the
| point of the in-place versions.
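|
| A rough Python sketch of that memcpy()/memmove() distinction,
| using list slots in place of bytes. The function names are made
| up for illustration, and the naive loop only shows one way an
| overlapping copy can go wrong (in C the overlapping memcpy() case
| is simply undefined):

```python
def copy_naive(buf, dst, src, n):
    """memcpy-style loop: always copies front to back."""
    for k in range(n):
        buf[dst + k] = buf[src + k]


def copy_overlap_safe(buf, dst, src, n):
    """memmove-style: pick the direction so every source slot is
    read before it can be overwritten."""
    if dst <= src:
        for k in range(n):
            buf[dst + k] = buf[src + k]
    else:
        for k in reversed(range(n)):
            buf[dst + k] = buf[src + k]


naive = [1, 2, 3, 4, 5]
copy_naive(naive, 1, 0, 4)        # source and destination overlap

safe = [1, 2, 3, 4, 5]
copy_overlap_safe(safe, 1, 0, 4)  # same request, direction chosen
```

| The naive version smears the first element across the buffer
| because it reads slots it has already overwritten; the safe
| version pays for a direction check instead.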
| exyi wrote:
| Yea, all are just bugs, not some intrinsic flaws in the
| language.
|
| Given Julia's goals (performance, abstractions, accessible to
| science people), it's understandable if they had slightly
| higher bug concentration than other (similarly sized)
| ecosystems.
| kllrnohj wrote:
| The author's argument is that the bugs all share a pattern,
| and thus there is an intrinsic flaw. That doesn't necessarily
| mean the community wants to fix the intrinsic flaw, just like
| nobody is really interested in fixing the intrinsic memory
| safety flaws of C. But they shouldn't be denied as real
| risks, either, or a tradeoff of some kind.
| fgh wrote:
| It would be interesting to know which language the author
| currently uses.
| ninjin wrote:
| Pretty sure it was Go last time I talked to Yuri, he is very
| much a stand-up guy.
| rendall wrote:
| The author mentions that he was stuck on a problem for weeks
| using Julia, but solved it with Python within hours
| tgv wrote:
| That was someone else: Patrick Kidger is mentioned in the
| article. If I look at the author's github, it's go and
| javascript.
| rendall wrote:
| You're right. I misread.
| [deleted]
| [deleted]
| xt00 wrote:
| If you look at the history of lots of packages in matlab they
| fixed tons of bugs that sound similar to this stuff over the
| years. It requires consistent hard work by a core group of people
| who understand the issues to get everything right. I have no idea
| who maintains Julia and these packages but the author of the
| article mentions this as language problems -- aren't these just
| bugs? Like if gcc was incorrectly multiplying some constant by
| the wrong value, that doesn't sound like a bug with C but a bug
| with gcc right?
| dandanua wrote:
| Julia has more than 18k closed issues on its github. No wonder
| such an active user encountered a lot of them. It's not a problem
| with the language, though. Yes, it allows using OffsetArrays
| and @inbounds together, but C can read out-of-memory locations
| too, so what?
|
| Edit: Julia is better than C in this regard, since the usage of
| @inbounds is explicit, i.e. everyone can see that the code is
| potentially unsafe.
| masklinn wrote:
| > but C can read out-of-memory locations too, so what?
|
| So it's widely considered a plague upon the field, suffered
| because of the lack of alternative?
| rob74 wrote:
| I think the point he was trying to make was that the example
| for @inbounds from the official documentation could cause
| out-of-bounds accesses, while it was clearly stated that you
| should only use @inbounds if you are sure that no out-of-
| bounds accesses are possible.
| jakobnissen wrote:
| The issue is that there is no way to verify if OOB access
| is possible given an abstract type, unless you know how
| that type behaves, i.e. how it's indexed.
|
| And Julia provides no way of specifying the behaviour of
| abstract types.
| markkitti wrote:
| > And Julia provides no way of specifying the behaviour
| of abstract types.
|
| I'm not sure if "no way" is accurate. There are
| interfaces and one could use traits. As cited above, we
| could use `eachindex` or `CartesianIndices` to get a list
| of the valid indices. The problem is enforcing and
| testing these interfaces.
| jakobnissen wrote:
| There are no interfaces in Julia. There is simply
| documentation, and the hope that people will read,
| understand, and follow it.
| Gwypaas wrote:
| > but C can read out-of-memory locations too, so what?
|
| Simply decades of exploitable security issues.
| krastanov wrote:
| The Julia example is closer to Rust's `unsafe`. Pretty much
| every language lets you skip bounds checks; in Julia (like
| other modern languages) it is elective. The author was
| complaining about a library that decided to skip the bound
| check in a clumsy way (there happens to be a "correct" way
| to skip the bound check). It is not really about the
| language.
| Gwypaas wrote:
| Then it makes sense. Thanks for the clarification. Was
| worried that skipped bounds checks were something more
| intricate than simply explicitly annotating a statement
| to say "trust me, I know what I'm doing!".
| kazinator wrote:
| The author's point seems to be something like: not only are
| there these bugs, but there are a lot of them that people are
| running into regularly, and the project isn't headed in a
| direction where the situation as such will improve (as in even
| if these are fixed, by the time that happens, there will be
| even more).
|
| Hard to prove or disprove.
| cancandan wrote:
| I wonder why not much is done to bring high performance
| scientific computing to common lisp. There are some interesting
| projects I was able to find like
| https://github.com/clasp-developers/clasp and
| https://github.com/marcoheisig/Petalisp and
| https://github.com/takagi/avm. But I guess it would be good to
| have a coordinated effort in this area.
| sharikous wrote:
| My opinion is that Julia was too ambitious from day one.
| Reimplementing the whole scientific computing stack AND a new
| modern language with an innovative type system and introspection
| AND perfecting tooling is just too big an effort.
|
| The priority for correctness has been drowned out by too many
| other issues and we are here with a 10-year-old language with a
| very perfectionist and ambitious mindset that is still a raw
| fruit in basically everything. It's not some rough edges; it's
| just too many edges, most of them rough.
|
| I cannot help thinking that if the same amount of people focused
| on a much smaller goal we could have something much more usable
| today. As it is now I know Julia won't be production ready for at
| least 10 years. And that's in the lucky case that it doesn't
| become irrelevant in the meantime.
| DNF2 wrote:
| If Julia followed your recommendation, it would be irrelevant
| before it ever started.
|
| There is no way for a new language to be useful or relevant
| unless it brings significant improvements.
| jbezanson wrote:
| I'm not sure what to make of this. Yuri is great and I'll
| certainly miss having him in the Julia community. Yes, of course
| there are bugs. We work on fixing them all the time. If there are
| just too many for you, or we are too slow at fixing them for you,
| then OK I understand you might walk away.
|
| With these kinds of posts (and the reactions to them) lots of
| issues tend to get conflated. For example there are issues with
| OffsetArrays because some people write code assuming indexes
| start at 1. Starting at 0 wouldn't fix that. A static type system
| wouldn't fix that; most static type systems don't check array
| bounds. Are we supposed to un-register the OffsetArrays package?
| Should we disallow overloading indexing? Personally I have told
| people not to use `@inbounds` many times. We could remove it, but
| those who want the last drop of performance would not be too
| happy. The only path I see is to fix the bugs.
|
| > They accept the existence of individual isolated issues, but
| not the pattern that those issues imply.
|
| I admit, I do not see the pattern allegedly formed by these
| issues. Of course, static types do remove a whole category of
| issues, but "switch to static types" is not really a practical
| request. There are other things you can do, like testing, but we
| do a LOT of testing. I really do not mean to downplay Yuri's
| experience here, I am just not sure what to take away other than
| that we should work even harder on bugs and quality.
| ThenAsNow wrote:
| I've worked on large engineering projects in physical
| disciplines. When I am the customer, I often bring in a group
| of independent experts to review the design products. Often
| these experts provide inputs that are not 100% usable in the
| form they're provided. One may have to disentangle their
| conflation of related-but-not-the-same issues, or ignore the
| specific solutions they propose, etc.
|
| That being said, I have learned the hard way not to ignore or
| trivialize these review inputs, even if they are not
| immediately actionable as-provided. Users and reviewers are
| really good at figuring out weak areas or flaws even if they
| can't articulate the solutions, fully unentangle related
| issues, or do all the generalization or abstraction that would
| make those issues easier to address. There is usually some
| truth underlying the negative feedback.
|
| The article looks to potentially be an example of an expert
| review in the above vein. If you are able to take a step back,
| you might find the HN discussion on this submission to provide
| further inputs to help figure out how any of this should be
| channeled into language, practice, and ecosystem improvements.
| Certainly there is more to work with here than little "to take
| away other than that we should work even harder on bugs and
| quality."
| catchclose8919 wrote:
| Only thing "interesting" to me there would be the _automatic
| differentiation bugs_ ...but is there any argument as to them
| being the fault of the language, instead of just poor engineering
| from the library developers' part?
|
| I mean, one can't expect all algorithms to work correctly with
| all datatypes just because the compiler allows that code to run
| ... _you write tests and guarantee numerical stability for a
| small subset of types you can actually do it for, and then it's
| the code's consumers' job to ensure it works with types it's not
| documented to work with, no?_ ...Julia is quite a dynamic
| language, JITed or whatnot; its semantics are closer to Python
| and Lisp than to Rust or Haskell ...maybe don't expect guarantees
| that aren't there and just code more defensively when making
| libraries others depend on?
|
| Probably the Python + C(++) ecosystems work better bc their devs
| know they are working in loose, dynamic and weakly typed
| shoot-your-foot-off type languages and just take action and code
| defensively and test things properly, whereas Julia devs expect
| the language to give them guarantees that aren't there.
| Q6T46nT668w6i3m wrote:
| I think the author addresses this. It's a Catch-22. If you
| restrict use to a small subset of types you're undermining one
| of Julia's best features.
|
| As someone who has been writing a lot of numerical analysis
| code recently, I would absolutely love a type system that could
| describe and enforce numerical stability traits.
| catchclose8919 wrote:
| > a type system that could describe and enforce numerical
| stability traits
|
| Wow, that sounds cool! Have you researched whether anyone has
| done anything in this area? How would you even start to
| approach the problem?
|
| Do you think it has any chance of being done without massive
| sacrifices to performance?
| jstrong wrote:
| in rust code, I like using `debug_assert!` to represent
| numerical expectations/assumptions of the implementation.
| later if I have a problem, I can turn on debug assertions and
| I will get a bunch of additional checks. but I can also turn
| them off and not pay for them all the time.
| one-more-minute wrote:
| Right. It's important to remember that tools like JAX and
| PyTorch have _total_ control over the numerical libraries
| they are differentiating, and have freedom to impose whatever
| semantics, rules and restrictions are convenient
| (immutability and referential transparency in JAX, for
| example). Seemingly small decisions in an existing language
| and library can have a big impact on the feasibility and
| practicality of AD.
| dklend122 wrote:
| That's exactly where Dex might improve over Julia, with
| language level control over mutability and effect handlers
| and array access safety ... time will tell.
|
| So packages just use those features
|
| Maybe it will hit the right trade off, or maybe Julia will
| adopt similar language level tools, but adjusted for
| dynamic semantics. Is that even possible?
| nohat wrote:
| The power of allowing everyone to make foundational types and
| functions that work together is indeed dangerous. I'm not sure
| you are better off in the even more dangerous waters of
| c/c++/fortran, except that they are older and more established
| with many times the man-hours sunk into them. Is there a good way
| to control the interaction of these many different libraries without
| losing the generality and composability of Julia?
|
| I will say that as a matter of language design 1 based indexing
| is perfectly fine, 0 based indexing is perfectly fine. Choose
| your own indexing is a hilarious foot gun, so no surprise it went
| off sometimes. Fortunately using it seems to be quite rare.
| DNF2 wrote:
| But it's not a matter of language design. The 'choose your own
| indexing' is something you do entirely in libraries.
|
| You can create your own indexing in python too, it will just be
| slow. The 'sin' of Julia is that it will be fast...
| dmos62 wrote:
| In extreme composability, it might be hard to determine where the
| origin of a bug is. Worse yet, when libraries start adhering to and
| relying on the brokenness of other libraries, fixing the once
| minor bug isn't enough anymore. How do you address technical debt
| in such situations?
|
| In my mind Julia broke new ground in terms of what happens when
| you create an environment where such composability is possible.
| Author's finishing thought is apt:
|
| > Ten years ago, Julia was introduced to the world with inspiring
| and ambitious set of goals. I still believe that they can, one
| day, be achieved--but not without revisiting and revising the
| patterns that brought the project to the state it is in today.
| ChrisRackauckas wrote:
| Everything has correctness issues somewhere. Julia ships an
| entire patched version of LLVM to fix correctness bugs in
| numerical methods. It has its own implementations of things like
| software-side FMA because the FMA implementation of Windows is
| incorrect: https://github.com/JuliaLang/julia/pull/43530 . Core
| Julia devs are now the maintainers of things like libuv because
| of how much had to be fixed there. So from those three points,
| that clearly points out tons of cases where Python, R, etc. code
| is all incorrect where Julia isn't.
|
| I think what's interesting about Julia is that because the code
| is all Julia, it's really easy to dig in there and find potential
| bugs. The standard library functions can be accessed with @edit
| sum(1:5) and there you go, hack away. The easier it is to look at
| the code, the easier it is to find issues with it. This is why
| Julia has such a high developer-to-user ratio. That has its
| pros and cons of course. It democratizes the development process,
| but it means that people who don't have a ton of development
| experience (plus Fortran or C knowledge) are not excluded from
| contributing. Is that good or bad? Personally I believe it's good
| in the long run, but can have its bumps.
|
| As an aside, the author highlights "for i in 1:length(A)". I
| agree, code should never do that. It should be `eachindex(A)`. In
| general, code should use iterators, which are designed to handle
| arbitrary indexing. This is true in any
| language, though you'll always have some newcomers write code
| (and documentation) with this. Even experienced people who don't
| tend to use arrays beyond Array tend to do this. It's an
| interesting issue because coding style issues perpetuate
| themselves: explicitly using 1-base wasn't an issue before GPUs
| and OffsetArrays, but then loop code like that trains the next
| generation, and so more people use it. In the end the people who
| really know to handle these cases are the people who tend to use
| these cases, just like how people who write in styles that are
| ARM-safe tend to be people who use ARM. Someone should just run a
| bot that opens a PR for every occurrence of this (especially in
| Base), as that would then change the source that everyone learns
| from and completely flip the style.
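|
| The two loop styles discussed above can be compared with a minimal
| sketch that uses only Base. `ZeroBased` is a hypothetical stand-in for
| an offset-indexed array (it is not the OffsetArrays.jl type), and
| `sum_naive` / `sum_generic` are illustrative names, not library
| functions:

```julia
# Hypothetical 0-based vector wrapper, standing in for an offset array type.
struct ZeroBased{T} <: AbstractVector{T}
    data::Vector{T}
end
Base.size(v::ZeroBased) = size(v.data)
Base.axes(v::ZeroBased) = (0:length(v.data)-1,)       # valid indices are 0:n-1
Base.IndexStyle(::Type{<:ZeroBased}) = IndexLinear()
Base.getindex(v::ZeroBased, i::Int) = v.data[i + 1]   # still bounds-checked by the inner Vector

# Assumes indices start at 1: only correct for 1-based arrays.
function sum_naive(v::AbstractVector)
    s = zero(eltype(v))
    for i in 1:length(v)
        s += v[i]
    end
    return s
end

# Asks the array for its own valid indices: correct for any AbstractVector.
function sum_generic(v::AbstractVector)
    s = zero(eltype(v))
    for i in eachindex(v)
        s += v[i]
    end
    return s
end

z = ZeroBased([10, 20, 30])
sum_generic(z)            # visits indices 0, 1, 2
sum_generic([10, 20, 30]) # visits indices 1, 2, 3
# sum_naive(z) skips index 0 and then fails at index 3 with a BoundsError;
# under `@inbounds` that last access would not be checked at all.
```

| Without `@inbounds` the naive loop at least fails loudly on the
| offset array; the silent-corruption cases in the article come from
| combining the 1-based assumption with disabled bounds checks.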
| kazinator wrote:
| > _I agree, code should never do that. It should be
| `eachindex(A)`_
|
| Will that generate the same code as "i in 1:length(A)"?
|
| Maybe whoever wrote that didn't _believe_ so at least, or
| perhaps didn't find it so at the time.
|
| The reason @inbounds would have been used is performance, so
| that's likely why the for loop header was written that way?
| cbkeller wrote:
| I think it should be fine for performance AFAIU to use
| `eachindex` instead; at least I know `eachindex` plays nicely
| with LoopVectorization.jl with no performance costs there.
|
| That said, I think you're exactly right that people may
| wonder just this and use the seemingly "lower-level" form out
| of concern with or without testing it.
| celrod wrote:
| One of my intentions with the rewrite is to let `@turbo`
| change the semantics of "unreachable"s, allowing it to
| hoist them out of loops. This changes the observed behavior
| of code like
|
|     for i = firstindex(x):lastindex(x)+1
|         x[i] += 2
|     end
|
| where now, all the iterations that actually would have
| taken place before the error will not have happened. But,
| hoisting the error check out when valid will encourage
| people to write safer code, while still retaining almost
| all of the performance. There is also a class of examples
| where the bounds checks will provide the compiler with
| information enabling optimizations that would've otherwise
| been impossible -- so there may be cases with the rewrite
| where `@inbounds` results in slower code than leaving
| bounds checking enabled.
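|
| A hand-written sketch of what "hoisting the error check out of the
| loop" means, assuming nothing about how LoopVectorization.jl actually
| implements it. `add2!` is a hypothetical helper; `checkbounds` is the
| Base function. The whole index range is validated once, so a bad range
| throws before any element is modified:

```julia
# Validate the full range up front, then run the loop without per-element checks.
function add2!(x::AbstractVector, r::AbstractUnitRange)
    checkbounds(x, r)        # single check; throws BoundsError if any index in r is invalid
    @inbounds for i in r     # safe only because r was validated on the line above
        x[i] += 2
    end
    return x
end

x = [1, 2, 3]
add2!(x, firstindex(x):lastindex(x))        # every index is valid, x is updated in place

y = [1, 2, 3]
try
    add2!(y, firstindex(y):lastindex(y)+1)  # one index too many
catch err
    # The error is raised before the loop starts, so y is left untouched,
    # unlike a per-iteration check, which would have updated the first elements.
    @assert err isa BoundsError
end
```

| The trade-off described above is visible here: the failing call no
| longer performs the iterations that would have run before the error.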
| cbkeller wrote:
| Oh, nice!
| mbauman wrote:
| `eachindex` is -- in quite a few situations -- _faster_ than
| `1:n`.
|
| We've also been trying to promote a culture of not blindly
| putting `@inbounds` notations on things as the compiler gets
| smarter. `@inbounds` is a hack around a dumb compiler,
| _especially_ when the loop is as simple as many of these
| examples. It's not needed there anymore (but was 5 years
| ago).
| kazinator wrote:
| The question is: is it _at least_ as fast in _all_
| situations? Was it always that way?
|
| The 1 to length loop just has to initialize a local
| variable and step it; it cannot do anything else. It
| doesn't worry about the kinds of array that A may be, with
| its particular configuration of indexing, right?
|
| You may promote a culture of not doing certain things, but
| that by itself won't make those things disappear from
| existing code.
|
| Say you're trying to ship some product and you receive a
| bulletin from the language mailing list encouraging you,
| "try not to use @inbounds, it's a hack around a dumb
| compiler". You know you have that in numerous places; but
| you're not going to stop what you're doing and start
| removing @inbounds from the code base. If you're remarkably
| conscientious, you might open a ticket for that, which
| someone will look into in another season.
| mbauman wrote:
| > The question is: is it at least as fast in all
| situations? Was it always that way?
|
| Yes and yes.
| bjourne wrote:
| Perhaps that is part of the point of the article? If you
| accept things like @inbounds, which is a horrible hack and
| was a horrible hack five years ago, then perhaps the
| culture is a little too tolerant towards horrible hacks.
| Because many of the bugs the author enumerates are of the
| "fixes the problem for now, let's deal with the
| consequences later" type.
| grayclhn wrote:
| Yes and no... Julia's been focused on high performance
| numerical computing from the beginning (and other related
| scientific applications). Using macros to get good
| performance from relatively generic code was (from my
| outside perspective) a really effective way to support
| real applications early on and also give time for the
| compiler to get "sufficiently smart" to make the macros
| less necessary.
| nightpool wrote:
| I'll be honest, based on my experience with Julia, this makes
| me _more_ worried about using e.g. libuv in production systems
| now, not less. I understand your opinion that "The easier it
| is to look at the code, the easier it is to find issues with
| it", but I don't think that has anything to do with the fact
| that `prod((Int8(100), Int8(100)))` and `prod([Int8(100),
| Int8(100)])` disagree, because someone decided to special-case
| tuple multiplication. And to make it even worse, this bug was
| even documented(!) in the comments by whoever committed the
| original code:
|
|     # TODO: this is inconsistent with the regular prod in cases where the arguments
|     # require size promotion to system size.
|
| How did this pass code review? Why would it be okay for a
| standard library function to be "inconsistent" in this way?
|
| (EDIT: Since writing this comment, I've realized that (100 *
| 100) % 256 is in fact 16, so the results are a little less
| inexplicable to me. I think having the types annotated in the
| REPL would have made it clearer what was going on, and it's
| still a _very_ difficult inconsistency to debug, especially as
| an end user)
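|
| The arithmetic in the EDIT above can be checked with a short sketch.
| It deliberately avoids calling `prod` on a tuple, since that result
| depends on the Julia version (the inconsistency was later fixed);
| `reduce(*, ...)` is used here only as a stand-in for a product that
| does not promote its arguments:

```julia
a = Int8(100)

wrapped = a * a               # Int8 arithmetic wraps modulo 2^8
widened = Int(a) * Int(a)     # promote to the system integer first, then multiply

@assert wrapped isa Int8
@assert wrapped == 16                  # 10000 mod 256
@assert widened == 10000
@assert mod(widened, 256) == 16

# Array reductions promote small signed integers to Int before multiplying.
@assert prod([a, a]) == 10000

# A plain fold with `*` keeps the Int8 element type and therefore wraps.
@assert reduce(*, (a, a)) == 16
```

| Both answers are "correct" for their element type; the surprise is
| that two spellings of the same product picked different types.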
|
| I also think your argument that "[...] you'll always have some
| newcomers write code (and documentation)" that is broken is
| completely incorrect, and it shifts the blame from providing a
| safe and easy-to-use system from the language authors onto the
| users. The OP goes to pains to point out that this was not just
| an issue of "some newcomers"--it was a fundamental issue across
| the entire community, including what seem to be some of the
| most heavily-used packages in Julia's ecosystem, including
| Distributions.jl and StatsBase.jl. It's deeply misleading to
| blame issues like that simply on "people who don't have a ton
| of development experience" and "newcomers writing
| documentation", and it indicates a lack of responsibility and
| humility from Julia's proponents.
|
| P.S: You're correct that the documentation about @inbounds was
| written by someone who was new to the language
| (https://github.com/JuliaLang/julia/pull/19726). But in fact
| the example itself was copied over entirely as-is from devdocs,
| where it was written by the author of the boundschecking
| feature(!) https://github.com/JuliaLang/julia/pull/14474. And
| it was only fixed last year. And the entire docs PR was
| reviewed thoroughly by two core team members, with lots of
| changes and suggestions--but nobody noticed the index issue. So
| I don't think you can blame this one on newcomers.
| nalimilan wrote:
| The problem in this case (as with most issues regarding
| `@inbounds`) is that this text was written before arrays with
| non-standard indices existed in Julia. So the example was
| correct at the time it was written, just like the StatsBase
| code was correct. Old code needs careful checking to fix all
| these occurrences.
| nightpool wrote:
| Discussed in a sibling thread:
| https://news.ycombinator.com/item?id=31401155.
| cmcaine wrote:
| Julia released experimental support for arrays whose indexes
| don't start at 1 in Julia 0.5, October 2016.
|
| The boundschecking feature was added in 2015, so at the time
| they wrote their code and examples, they were correct.
|
| The documentation and review happened in December and January
| 2016/2017 when the non 1-based indexing was still
| experimental and very new, so I don't think this is as big a
| fail as you've made out either.
|
| Yes, the documentation should have been updated when non-
| standard indexing was made non-experimental, and the
| reviewers should maybe have noted the new and experimental
| array indexing stuff, but it's only natural to miss some
| things.
| nightpool wrote:
| That's fair enough! I was unaware of that history. But my
| point wasn't that the issue was "a big fail", it's that the
| GP was unfair in assigning the responsibility of that
| failure to "some newcomers [who] write code (and
| documentation) with this" while "the people who really know
| to handle these cases" are fine. The responsibility should
| have been on the people pushing for the experimental array
| indexing code to make it work safely with the existing
| usage of boundschecks that existed in the ecosystem and the
| existing documentation. It's a fundamental disagreement
| between whether the onus of code safety is on the user (who
| is responsible for understanding the totality of the
| libraries they're using and all of the ways they can fail)
| or on the programming language (for ensuring the stability
| and correctness of its code, documentation and ecosystem
| when making changes).
| leephillips wrote:
| Just to clarify, the prod() bug you mention was fixed about a
| year ago.
| kazinator wrote:
| > _Julia ships an entire patched version of LLVM to fix
| correctness bugs in numerical methods_
|
| Sounds like the banana ships with the gorilla which requires
| the entire jungle, and we're too busy fixing the gorilla to
| give the banana our undivided attention.
| grumpyprole wrote:
| > Everything has correctness issues somewhere.
|
| Yes but Julia is (yet another) dynamic language, presumably for
| "ease of use". A language with static types would have made it
| easier to build correct software (scientific code in e.g. OCaml
| and F# can look pretty good). Julia chose a path to maximize
| adoption at the expense of building a reliable ecosystem. Not
| all languages choose to make this trade-off.
| pron wrote:
| > A language with static types would have made it easier to
| build correct software
|
| This claim is repeated often, but numerous attempts have
| failed to demonstrate that this is generally the case in
| practice (there have been a couple of studies showing an
| effect in very specific circumstances). Static types _might_
| indeed assist with correctness, but they are not the only
| thing that does, and in some situations they could come at
| the expense of others. I.e., even if types were shown to
| significantly help with correctness, it does not follow that
| if you want correctness your best course would be to add
| types.
|
| Given empirical studies, the current working hypothesis
| should be that if static types do have a positive effect on
| correctness, it is a small one (if it were big, detecting it
| would have been easy).
|
| Note that Matlab, the workhorse of scientific computing for a
| few decades now, is even less typed than Julia. That's not to
| say that Julia doesn't suffer from too many correctness
| issues (I have no knowledge on the matter), but even if it
| does, there is little support for the claim that typing is
| the most effective solution.
| StefanKarpinski wrote:
| In particular, not a single issue mentioned in this article
| would have been prevented by static type checking.
| grumpyprole wrote:
| This is not true. For example, the issue regarding custom
| index ranges causing silent data corruption (6 examples)
| could be fixed with static types.
|
| Look how many of the other bug reports contain the phrase
| "does not check for" or refer to specific primitive
| types.
| DNF2 wrote:
| How would static types help with that? Whether your
| indexing range starts with zero or one or something else
| isn't necessarily encoded in the type domain.
| `1:length(A)` is just a range of `Int`s.
| grumpyprole wrote:
| Why not encode the starting offset into the type domain?
| Or at least distinguish between normal and unusual. Then
| the function signature can restrict to 1-offset arrays if
| that is what it assumes internally.
| DNF2 wrote:
| That means disallowing indexing with integers, I presume?
| Since an integer can take the values 0 or 1 equally. And
| what about the other end of the array. Must every index
| be restricted by type to be located in the acceptable
| range?
| markkitti wrote:
| If the function signature said `Array` rather than
| `AbstractArray`, then this code would have been fine.
| `Array` indexing starts at `1`.
|
| ```
| julia> function f(A::Array)
|            println(A[1:length(A)])
|        end
| f (generic function with 1 method)
|
| julia> f([1,2,3,4])
| [1, 2, 3, 4]
|
| julia> f(OffsetArray(1:10, -1))
| ERROR: MethodError: no method matching f(::OffsetVector{Int64, UnitRange{Int64}})
| ```
|
| You could prevent this problem using Julia's type system.
| The `AbstractArray` might have been too broad. Based on
| the chronology of the code that might not have been
| apparent. See other threads for details.
|
| Another way would be to treat `firstindex` as a trait and
| dispatch on that.
|
| ```
| julia> f(A::AbstractArray) = f(A, Val(firstindex(A)))
| f (generic function with 1 method)
|
| julia> f(A::AbstractArray, firstindex::Val{1}) = println(A[1:length(A)])
| f (generic function with 2 methods)
|
| julia> f(A::AbstractArray, firstindex::Val{T}) where T = error("Indexing for array does not start at 1")
| f (generic function with 3 methods)
|
| julia> f(A::AbstractArray, firstindex::Val{0}) = println("So you like 0-based indexing?")
| f (generic function with 4 methods)
|
| julia> f([1,2,3,4])
| [1, 2, 3, 4]
|
| julia> using OffsetArrays
|
| julia> f(OffsetArray(1:10, 1))
| ERROR: Indexing for array does not start at 1
| Stacktrace:
|  [1] error(s::String)
|    @ Base .\error.jl:33
|  [2] f(A::OffsetVector{Int64, UnitRange{Int64}}, #unused#::Val{2})
|    @ Main .\REPL[5]:1
|  [3] f(A::OffsetVector{Int64, UnitRange{Int64}})
|    @ Main .\REPL[3]:1
|  [4] top-level scope
|    @ REPL[9]:1
|
| julia> f(OffsetArray(1:10, -1))
| So you like 0-based indexing?
| ```
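|
| A third option is a runtime guard rather than a dispatch-level one.
| The sketch below assumes the unexported helper
| `Base.require_one_based_indexing`, which is internal to Base and not a
| documented public API, so treat its use here as illustrative.
| `ZeroBased` is a hypothetical Base-only stand-in for an offset array
| and `collect_from_one` is a made-up name:

```julia
# Hypothetical 0-based vector wrapper, standing in for an offset array type.
struct ZeroBased{T} <: AbstractVector{T}
    data::Vector{T}
end
Base.size(v::ZeroBased) = size(v.data)
Base.axes(v::ZeroBased) = (0:length(v.data)-1,)
Base.getindex(v::ZeroBased, i::Int) = v.data[i + 1]

# Accepts any AbstractVector, but refuses up front if indexing does not start at 1.
function collect_from_one(A::AbstractVector)
    Base.require_one_based_indexing(A)   # internal helper; throws ArgumentError for offset axes
    return [A[i] for i in 1:length(A)]   # the 1-based assumption is now guarded
end

collect_from_one([1, 2, 3, 4])           # ordinary Vector: accepted

try
    collect_from_one(ZeroBased([1, 2, 3, 4]))
catch err
    @assert err isa ArgumentError        # rejected before any element is read
end
```

| Compared with restricting the signature to `Array`, this keeps the
| method generic over other 1-based array types while still failing
| early on arrays whose indices start elsewhere.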
| DNF2 wrote:
| The starting offset is encoded in the type domain, btw,
| and accessible with the `firstindex` function.
|
| But you will still want to calculate indices at runtime,
| and then out-of-bounds errors will have to be caught at
| runtime anyway.
| adgjlsfhk1 wrote:
| it is only fixed with static typing and no generics (i.e.
| C/Fortran). If you have a generic array supertype, a
| statically typed language would let you write exactly the
| same bug.
| grumpyprole wrote:
| If one has a different type or trait for unusual and
| normal range indices, then the signature for the
| procedure that assumes indexing from 1 can be written to
| disallow other starting indices.
| jolux wrote:
| > Note that Matlab, the workhorse of scientific computing
| for a few decades now, is even less typed than Julia.
|
| You always make this argument when discussing PL features
| and I find it irksome. People get along fine without this
| feature, therefore there's no sense in implementing it. But
| it cuts the other way, or we'd all still be using assembly.
| How many Matlab users know things could be better? Was the
| superiority of structured programming and avoiding GOTO
| ever empirically proven, or did we all just collectively
| realize it was a good idea?
| pron wrote:
| > People get along fine without this feature, therefore
| there's no sense in implementing it.
|
| As someone whose job is to add new features to a
| programming language, that's never been my argument.
|
| > But it cuts the other way, or we'd all still be using
| assembly
|
| High-level languages were satisfactorily shown to be more
| productive than Assembly. I don't claim that no
| innovation works, just that not _all_ do, and certainly
| not to the same degree. That feature X is helpful is
| certainly no evidence that feature Y is helpful, and that
| Python is more productive than Assembly does not support
| the claim that programs in OCaml are more correct than
| programs in Clojure.
|
| Also, my argument isn't "we got by without it" or that no
| idea could ever work. It's that a specific claim was
| tested and unconfirmed.
|
| > Was the superiority of structured programming and
| avoiding GOTO ever empirically proven, or did we all just
| collectively realize it was a good idea?
|
| I don't know about the former, but the latter is
| certainly true, and until we actually reach consensus you
| can't claim we have.
|
| BTW, I certainly don't claim that types aren't useful or
| even that they're not better in some ways (I _believe_
| that they help a lot with tooling and organisation), but
| the particular claim that they universally help with
| correctness, and do so better than other approaches, was
| studied, and simply not confirmed. You can't come up
| with a claim, try and fail to support it time and again,
| and keep asserting it as if it's obviously true, despite
| the evidence.
| jolux wrote:
| > As someone whose job is to add new features to a
| programming language, that's never been my argument.
|
| I've definitely seen you argue along the lines of "it
| hasn't been implemented in Java, therefore nobody uses it
| and we can't tell if it's a good idea or not" before.
| Forgive me for assuming this followed from that.
|
| > You can't come up with a claim, try and fail to support
| it time and again, and keep asserting it as if it's
| obviously true, despite the evidence.
|
| But "correctness" of itself is pretty nebulous. If we
| define it as whether or not the program conforms to one's
| intentions with writing it, I would expect static types
| alone not to show a significant difference in
| correctness. Probably formal methods do but they have
| much higher overhead.
|
| However, in terms of eliminating patterns which are
| literally never correct, like dereferencing null
| pointers, violating resource lifetimes, or calling
| methods that don't exist, static typing can in fact
| eliminate those patterns.
|
| > the claim that programs in OCaml are more correct than
| programs in Clojure
|
| My full-time job is Elixir so I know full well the
| consequences of maintaining large codebases in dynamic
| languages. I would switch to OCaml in a heartbeat if it
| ran on the BEAM! I want to know that I am calling
| functions correctly within a node when the module can be
| resolved at compile time. This is a really basic thing to
| want, and not one that dynamic languages can offer. The
| qualitative difference is similar to that between
| structured and unstructured programming: I can actually
| do local reasoning about a function without having to
| check all the call sites or write a lot of defensive
| tests.
|
| This is an obvious advantage, and on some level I don't
| really care if it contributes to formal correctness or
| not because it would make my job easier.
| pron wrote:
| > I've definitely seen you argue along the lines of "it
| hasn't been implemented in Java, therefore nobody uses it
| and we can't tell if it's a good idea or not" before.
|
| You have not seen me argue anything along those lines. I
| have, however, said the converse, that we try not to
| adopt features in Java until they've proven themselves
| elsewhere.
|
| > However, in terms of eliminating patterns which are
| literally never correct, like dereferencing null
| pointers, violating resource lifetimes, or calling
| methods that don't exist, static typing can in fact
| eliminate those patterns.
|
| But the implication is reversed! From A => B, i.e. types
| prevent certain bad things, you're concluding B => A,
| i.e. if you don't want those bad things then you should
| use types. That simply does not follow.
|
| > This is an obvious advantage, and on some level I don't
| really care if it contributes to formal correctness or
| not because it would make my job easier.
|
| I wouldn't dare imply that types don't have certain
| important advantages, but that doesn't support the
| specific claim that types generally and significantly
| improve correctness -- which many have tried to show and
| failed -- and it certainly doesn't support the much
| stronger claim that if you want to improve correctness,
| the most effective way to do it is to use types.
| ThenAsNow wrote:
| We can trade anecdotes on this topic, but I've written
| numerical code in OCaml and also Julia. The strictness of
| OCaml's type system is painful in a numerical context but
| for virtually all other things it is awesome to pass code
| into the interpreter/compiler and catch structural problems
| at compile-time rather than maybe at runtime.
|
| OCaml's type system is almost certainly not the right model
| for Julia but the ad-hoc typing/interface system Julia
| currently employs is at strong odds with compile-time
| correctness. There's almost certainly some middle ground to
| be discovered which might be unsound in a strict sense but
| pragmatically constrains code statically so there is high
| likelihood of having to go out of your way to pull the
| footgun trigger.
|
| You can see how little type annotations are used in
| practice in major Julia libraries. It should be integral to
| best practice in the language to specify some
| traits/constraints that arguments must satisfy to be
| semantically valid, but what you often see instead is a
| (potentially inscrutable) runtime error.
| grumpyprole wrote:
| > Given empirical studies, the current working hypothesis
| should be that if static types do have a positive effect on
| correctness, it is a small one.
|
| Which use cases, languages and static type systems are you
| referring to? The context is very important, especially
| when seeking to draw general conclusions from empirical
| studies.
|
| As someone who has previously posted extolling the merits
| of static analysis, I'm very surprised at your position
| regarding static types. Static types help to constrain a
| language and enable reasoning, either by additional static
| analysis or otherwise.
|
| It is precisely the flexibility of dynamic languages that
| makes them difficult to reason about and difficult to build
| correct software in. This is why the use of dynamic
| languages is mostly banned in the defense industry.
|
| Static types clearly help with composition (one of the
| complaints with Julia), especially at scale. How many
| academic empirical studies considered multimillion-line
| code bases? I submit for evidence a lot of expensive type-
| retrofitting projects such as Facebook Hack, Microsoft
| Typescript or Python types, which demonstrate that many
| companies have or had real problems with dynamic languages
| at any kind of scale.
| guenthert wrote:
| Julia allows you to specify the type of a datum if you feel
| the need (not unlike Common Lisp). Are any of the bugs the
| author mentioned related to the type system?
| mattkrause wrote:
| I'm surprised at this critique, as I thought Julia's type
| system was often considered to be one of its strongest
| features.
| ThenAsNow wrote:
| So, I really respect what you've done (for those who don't
| know, Chris is the original developer and lead of
| DifferentialEquations.jl) and use your work heavily. However,
| understanding and writing idiomatic Julia, especially with
| these large packages, is severely hampered by the documentation
| culture.
|
| A prior comment I made, all of which seems unaddressed to me
| three years later:
| https://news.ycombinator.com/item?id=20589167
|
| To be fair, I've only submitted a small documentation patch for
| a package and haven't significantly "put my money where my
| mouth is" on this topic. But I hope the next time there are
| thoughts among the core team about what is the next capability
| to add to the language, addressing this deficiency is
| prioritized.
| ChrisRackauckas wrote:
| FWIW, I posted the other month that I'm looking for any devs
| who can help with building a multi-package documentation for
| SciML, since I don't think the "separate docs for all
| packages" ends up helpful when the usage is intertwined.
| SciML is looking for anyone looking to help out there (and
| there's a tiny bit of funding, though "open source sized"
| funding). In the meantime, we're having a big push for more
| comprehensive docstrings, and will be planning a Cambridge
| area hackathon around this (follow
| https://www.meetup.com/julia-cajun/ for anyone who is interested
| in joining in).
|
| As for high level changes, there's a few not too difficult
| things I think that can be done:
| https://github.com/JuliaLang/julia/issues/36517 and
| https://github.com/JuliaLang/julia/issues/45086 are two I
| feel strongly about. I think limiting the type information
| and decreasing the stack size with earlier error checking on
| broadcast would make a lot of error messages a lot more sane.
| p33p wrote:
| Good comments, Chris. I think the author has a little bit of
| nuance in that Julia isn't correct in the specific use cases he
| needs them to be. While your point is also well taken that
| Julia is correct in cases where other languages aren't as well.
|
| I'm a little unfamiliar with the versioning in the package
| ecosystem, but would you say most packages follow or enforce
| SemVer? Would enforcing a stricter dependency graph fix some of
| the foot guns of using packages or would that limit
| composability of packages too much?
| ChrisRackauckas wrote:
| > but would you say most packages follow or enforce SemVer?
|
| The package ecosystem pretty much requires SemVer. If you
| just say `PackageX = "1"` inside of a Project.toml [compat],
| then it will assume SemVer, i.e. any version 1.x is non-
| breaking and thus allowed, but not version 2. Some (but very
| few) packages do `PackageX = ">=1"`, so you could say Julia
| doesn't force SemVer (because a package can say that it
| explicitly believes it's compatible with all future
| versions), but of course that's nonsense and there will
| always be some bad actors around. So then:
|
| > Would enforcing a stricter dependency graph fix some of the
| foot guns of using packages or would that limit composability
| of packages too much?
|
| That's not the issue. As above, the dependency graphs are
| very strict. The issue is always at the periphery (for any
| package ecosystem really). In Julia, one thing that can
| amplify it is the fact that Requires.jl, the hacky
| conditional dependency system that is very not recommended
| for many reasons, cannot specify version requirements on
| conditional dependencies. I find this to be the root cause of
| most issues in the "flow" of the package development
| ecosystem. Most packages are okay, but then oh, I don't want
| to depend on CUDA for this feature, so a little bit of
| Requires.jl here, and oh let me do a small hack for
| OffsetArrays. And now these little hacky features on the edge
| are both less tested and not well versioned.
|
| Thankfully there's a better way to do it by using multi-
| package repositories with subpackages. For example,
| https://github.com/SciML/GalacticOptim.jl is a global
| interface for lots of different optimization libraries, and
| you can see all of the different subpackages here
| https://github.com/SciML/GalacticOptim.jl/tree/master/lib.
| This lets there be a GalacticOptim and then a GalacticBBO
| package, each with versioning, but with tests being different
| while allowing easy co-development of the parts. Very few
| packages in the Julia ecosystem actually use this (I only
| know of one other package in Julia making use of this)
| because the tooling only recently was able to support it, but
| this is how a lot of packages should be going.
|
| The upside too is that Requires.jl optional dependency
| handling is by far and away the main source of loading time
| issues in Julia (because it blocks precompilation in many
| ways). So it's really killing two birds with one stone:
| decreasing package load times by about 99% (that's not even a
| joke, it's the huge majority of the time for most packages
| which are not StaticArrays.jl) while making version
| dependencies stricter. And now you know what I'm doing this
| week and what the next blog post will be on haha. Everyone
| should join in on the fun of eliminating Requires.jl.
| rcthompson wrote:
| Is "for i in 1:length(A)" _ever_ correct? Should Julia just
| emit a warning any time it encounters that pattern? Or maybe
| something slightly more complicated, such as that pattern
| followed by usage of i to index into A inside the loop?
| a1369209993 wrote:
| > Is "for i in 1:length(A)" _ever_ correct?
|
| Yes, actually. While I have approximately zero knowledge of
| Julia specifically, a language-independent example might be:
|     B = OneBasedArray(length(A))
|     A_ = iter(A)
|     for i in 1:length(A) { B[i] = pop(A_) }
|     assert(iter_isdone(A_))
|
| And if that looks contrived... yes; it is contrived.
|
| > that pattern followed by usage of i to index into A inside
| the loop?
|
| I can't think of any legitimate uses for that, but there
| probably are some; make sure to allow:
|     len = length(A)
|     for i in 1:len ...
|
| as a `if( (x = foo()) )`-style workaround.
| cmcaine wrote:
| It's correct if you want to do something `length(A)` times
| and want an iteration counter, but it's never better than
| `for idx in eachindex(A)` if what you actually want are
| indexes into A (which is of course the much more common
| case).
|
| Julia did not initially support arrays that aren't indexed
| from 1 (experimental support added in Julia 0.5, I don't know
| when it was finalised), and at that time I'm not even sure we
| had something like eachindex, certainly there would be no
| reason why someone would use it for an array.
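|
| A minimal sketch of the `eachindex` form, using only Base Julia. The
| function name `sum_by_index` is made up for illustration:

```julia
# Sum a vector by index without assuming where its indices start.
function sum_by_index(A::AbstractVector)
    total = zero(eltype(A))
    for i in eachindex(A)      # yields A's own valid indices, whatever they are
        total += A[i]
    end
    return total
end

result = sum_by_index([1.0, 2.0, 3.0])   # 6.0
```

| For a plain `Vector` this visits the same indices as `1:length(A)`; the
| only difference is that it makes no assumption about the starting index.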
| TimTheTinker wrote:
| > Is "for i in 1:length(A)" ever correct?
|
| In some rare cases, it very well might be _exactly_ what the
| code's author intended and needed.
|
| I tend to lean towards what Martin Fowler calls an "enabling
| attitude"[0] (as opposed to a "directing attitude") -- that
| is, when faced with a choice about how to design the
| primitives of an interface, I lean more often towards
| providing flexibility, and I try to avoid choosing ahead of
| time what users _aren't_ allowed to do. It's better to
| document what's usually the wrong way to do something than to
| enforce it in the design. You can never guess what amazing
| things people will create when they are given flexible,
| unrestricted primitives.
|
| So for cases like this, I think it's better to rely on a
| flexible linting tool (if available) than warnings or errors.
|
| [0] https://martinfowler.com/bliki/SoftwareDevelopmentAttitude.h...
| dan-robertson wrote:
| Why not have a feature to allow you to turn off the
| warning? E.g. have something recognise 1:length(x) and
| complain unless you write e.g. @nowarn eachindex before it.
| rashidrafeek wrote:
| It is correct if `A` is of type `Array` as normal Array in
| julia has 1-based indexing. It is incorrect if `A` is of some
| other type which subtypes `AbstractArray` as these may not
| follow 1-based indexing. But this case errors normally due to
| bounds checking. The OP talks about the case where even
| bounds checking is turned off using `@inbounds` for speed and
| thus silently gives wrong answers without giving an error.
|
| An issue was created sometime ago in StaticLint.jl to fix
| this: https://github.com/julia-vscode/StaticLint.jl/issues/337
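|
| A sketch of two ways to keep `@inbounds` from silently reading out of
| bounds, using only exported Base functions. The names `scale_checked!`
| and `scale_generic!` are hypothetical, and this is one possible
| pattern, not what any particular package does:

```julia
# Option 1: keep the 1-based loop, but reject arrays that are not 1-based
# before bounds checks are switched off.
function scale_checked!(A::AbstractVector, c)
    firstindex(A) == 1 || throw(ArgumentError("expected 1-based indexing"))
    @inbounds for i in 1:length(A)
        A[i] *= c
    end
    return A
end

# Option 2: iterate over the array's own indices, so the @inbounds
# promise holds for any index range.
function scale_generic!(A::AbstractVector, c)
    @inbounds for i in eachindex(A)
        A[i] *= c
    end
    return A
end

scale_checked!([1.0, 2.0], 3)   # [3.0, 6.0]
scale_generic!([1.0, 2.0], 3)   # [3.0, 6.0]
```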
| KKKKkkkk1 wrote:
| FMA can't be broken on Windows because FMA is implemented in
| hardware by Intel. What's broken is the compiler that Julia
| uses on Windows.
| ChrisRackauckas wrote:
| When FMA isn't in the hardware (due to using some chip where
| it doesn't exist) it has a fallback to a software-based
| emulation. That is incorrectly implemented in Windows. Julia
| ends up calling that in this case because that's what LLVM
| ends up calling, and so any LLVM-based language will see this
| issue.
| celrod wrote:
| Even when FMA is implemented in hardware, LLVM will
| generally use the software version when the arguments are
| known at compile time.
| stephencanon wrote:
| FMA is only implemented in hardware on Haswell and later
| uArches. If you're running on (or compiling for) IVB or
| earlier, you'll get a libcall instead, and MSVC's has been
| broken since forever.
| Diggsey wrote:
| Is this actually broken in MSVC, or is it broken because
| Julia is using mingw and linking to an ancient version of
| libc on windows (which is intentionally left as-is for
| back-compat)?
|
| (I genuinely don't know, but the linked issue mentioned
| mingw specifically)
| adgjlsfhk1 wrote:
| the problem is that LLVM will happily miscompile fma
| instructions by turning them into incorrect constants due to
| windows having a broken libm. This is a bug in C/C++, and I'm
| currently unaware of a language that has fma and a good
| compiler which gives correct fma results on Windows.
| poulpy123 wrote:
| Why allow iterating with 1:length(A) if it's not the right way?
| adgjlsfhk1 wrote:
| you can't disallow it at a language level since either way,
| you are just indexing with Ints. That said, we can add better
| linting rules to catch stuff like this.
| cmcaine wrote:
| I don't think there's any clean way to stop that at a
| language level (some languages prevent this by disallowing
| random access to arrays, but that's a non-starter for a
| performance-oriented language), and also it would be a
| massively breaking change.
| Strilanc wrote:
| > _Everything has correctness issues somewhere._
|
| This is the fallacy of gray. The blog post isn't complaining that
| there are non-zero bugs, it's complaining that when you use the
| language you hit a lot of correctness bugs. More bugs than
| you'd hit using e.g. python.
|
| Also, to the extent that Julia uses LLVM, a correctness bug in
| LLVM is also a correctness bug in Julia. So arguing "LLVM has
| lots of correctness bugs" is not helping the case...
|
| > _because the code is all Julia, it's really easy to dig in
| there and find potential bugs._
|
| The blog post is about bugs hit while running code, not bugs
| found while reading code. The fact the issue can be understood
| and pointed at is great, but it's the number of issues being
| hit that's the problem.
| mbauman wrote:
| I do think there's a particularly unique challenge to Julia in
| that so many packages can theoretically coexist and
| interoperate. While it quadratically increases the power of
| Julia, it also quadratically increases the surface area for
| potential issues. That -- to me -- is the most interesting part
| of the blog post. How can we help folks find the "happy" paths
| so they don't get lost in the weeds by trying to differentiate
| a distributed SVD routine of an Offset BlockArray filled with
| Unitful Quaternions? And -- as someone who worked with and
| valued Yuri's reported issues and fixes -- how can I more
| quickly identify that they're not someone who gets joy out of
| making such a thing work?
| RcouF1uZ4gsC wrote:
| > If you pass it an array with an unusual index range, it will
| access out-of-bounds memory: the array access was annotated with
| @inbounds, which removed the bounds check.
|
| I think making indexes configurable is a huge mistake. Even if
| they are not ideal for the situation, having a single way to do
| indexes makes a huge source of confusion and potential bugs just
| go away. And this is orthogonal to whether you pick 0 or 1 as
| your starting point, as long as the whole language embraces that.
|
| For example with C/C++/Rust, you know it is zero based indexing.
| Even if it is not perfectly ideal for your formulas, the mental
| math of translating to zero based is worth not constantly having
| to worry about if a library is one based or zero based and what
| happens if you compose them.
| mattkrause wrote:
| There's a parallel idea, that you should avoid--insofar as is
| possible--numerical indexing. In other words, instead of
| iterating over `0:length(X) - 1` or `1:length(X)`, you use
| something like `for element in array` or
|     indices = CartesianIndices(multidimensional_X)
|     for index in indices
|         X[index] = # whatever
|
| If you do that, you don't need to keep track of whether it's
| zero-based, one-based, or anything else. In fact, you may not
| even need to keep track of the number of dimensions, as in this
| example, https://julialang.org/blog/2016/02/iteration/
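|
| A runnable sketch of that dimension-agnostic style, using only Base
| Julia. The function name `fill_with_index_sum!` is invented for
| illustration:

```julia
# Write into every element of an array of any dimensionality,
# without ever spelling out index ranges or the number of dimensions.
function fill_with_index_sum!(X::AbstractArray)
    for I in CartesianIndices(X)
        X[I] = sum(Tuple(I))   # sum of the element's coordinates
    end
    return X
end

m = fill_with_index_sum!(zeros(Int, 2, 3))   # [2 3 4; 3 4 5]
v = fill_with_index_sum!(zeros(Int, 4))      # [1, 2, 3, 4]
```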
| runevault wrote:
| I'm only skimming this post and I'm not familiar with Julia
| so maybe I'm missing it, but does it have a way to get an
| item AND its index? There's, I think, enumerate() in Rust where
| it gives you a tuple with both the item and its index in
| cases where you need both.
| leephillips wrote:
|     julia> pairs("François") |> collect
|     8-element Vector{Pair{Int64, Char}}:
|      1 => 'F'
|      2 => 'r'
|      3 => 'a'
|      4 => 'n'
|      5 => 'ç'
|      7 => 'o'
|      8 => 'i'
|      9 => 's'
|
| Notice the missing index 6, because ç takes two bytes.
|
| In contrast, enumerate() gets you the iteration number:
|     julia> enumerate("François") |> collect
|     8-element Vector{Tuple{Int64, Char}}:
|      (1, 'F')
|      (2, 'r')
|      (3, 'a')
|      (4, 'n')
|      (5, 'ç')
|      (6, 'o')
|      (7, 'i')
|      (8, 's')
|
| This can trip you up.
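|
| A short sketch of how to stay on valid string indices, using only
| Base functions. It assumes the source file is UTF-8 and that the
| string uses the precomposed character ç (two bytes):

```julia
s = "François"

# pairs() yields (index, character); the index is always valid for s.
for (i, ch) in pairs(s)
    @assert s[i] == ch
end

valid = collect(eachindex(s))   # [1, 2, 3, 4, 5, 7, 8, 9]
ok6   = isvalid(s, 6)           # false: 6 is the second byte of 'ç'
after = nextind(s, 5)           # 7: the next valid index after 'ç'
sizes = (length(s), ncodeunits(s))   # (8, 9): characters vs. bytes
```

| The counter from enumerate() is only an iteration number; to index
| back into the string, use the indices from pairs() or eachindex().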
| runevault wrote:
| Rust has the same problem with dealing with strings where
| if you don't realize how you are supposed to handle it
| with unicode you'll get burned when you don't correctly
| access code points.
|
| Edit: Also thank you for the answer. I have been curious
| about Julia even though I'm not a data science/ML type,
| but never find time. I do like to keep an eye on it
| though.
| leephillips wrote:
| I haven't used Rust, but Julia keeps you from getting
| burned too badly by giving you an informative error
| message if you try to index "inside" a character:
|     julia> "François"[6]
|     ERROR: StringIndexError: invalid index [6], valid nearby
|     indices [5]=>'ç' [7]=>'o'
|
| I'm not a data science type either. I came to Julia
| through physics and general computing. It's the best
| language for science I've ever encountered.
| lmiq wrote:
| for (i,val) in pairs(array)
| mike_hock wrote:
| Works great for trivial cases where there's no
| interdependency between array elements. As soon as you need
| to access, for example, adjacent elements, you want to be
| able to just iterate over 2:length(X) and access X[i-1]
| and X[i]. This is the most direct way and thus easiest to get
| right. Abstractions only make it more error prone.
| mattkrause wrote:
| Is `for i in eachindex(X)` really any worse?
|
| You can still do math on i, it avoids issues with
| OffsetArrays, and it might even be clearer why you're
| iterating. It requires that the array type support linear
| indexing, but so does doing anything sensible with X[i] and
| X[i-1].
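|
| A sketch of adjacent-element access that does index arithmetic
| without assuming a starting index, using only Base functions. The
| name `adjacent_diffs` is hypothetical:

```julia
# Differences between neighbouring elements, for any linearly indexed vector.
function adjacent_diffs(X::AbstractVector)
    # Start one past the first valid index so that X[i - 1] always exists.
    return [X[i] - X[i - 1] for i in (firstindex(X) + 1):lastindex(X)]
end

d = adjacent_diffs([1, 4, 9, 16])   # [3, 5, 7]
```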
| DNF2 wrote:
| What do you mean by 'mistake'? How are the Julia devs going to
| stop someone from defining arrays with configurable indices?
|
| Are you suggesting that the core language should somehow make
| this impossible? How?
| adgjlsfhk1 wrote:
| OffsetArrays can be really nice for things like convolutions.
| For example, it ends up being really natural to have a matrix
| that is indexed on [-2:2, -2:2] to implement a Gaussian blur. It
| definitely is a potential bug source though.
| kllrnohj wrote:
| Indexes being configurable makes a ton of sense. It's why so
| many languages end up with a slice type (or array_view or span
| or whatever you want to call it). Why shouldn't the base array
| type just itself be the slice type?
| p33p wrote:
| Viral frequents HN so I will be curious to see if he engages this
| directly in a productive manner.
|
| There are many great qualities of Julia, and I've wanted to love
| it and use it in production. However, coming from the tooling and
| correctness of Rust leaves me thinking something is just missing
| in Julia. One of the links in the post references "cowboy"
| culture. While I don't think this is the correct nomenclature,
| there is a sense with looking at the package ecosystem and even
| Julia itself that makes me think of the pressure in academia to
| publish constantly. I'm not sure what to make of that, and it's
| simply a feeling.
| ViralBShah wrote:
| I think Keno's comment above pretty much articulates my
| thoughts as well. I have met Yuri on several occasions and have
| been thrilled to see his contributions. I find the post
| constructive and it will certainly help make Julia better, and
| hope Yuri will be back at a later date.
|
| Some of the issues linked are JuliaStats issues, and there's a
| lot happening to improve it, which should become more visible
| over the next few months. Example:
| https://discourse.julialang.org/t/pushing-julia-statistics-d...
|
| Julia really pushes on language and compiler design in ways
| many statically typed languages do not. There is real work to be
| done at the frontiers, and also investment in tooling built on
| top of that. It is all happening. The package ecosystem takes
| time to mature - Julia has a deliberate release process, the
| key packages have adopted a more deliberate release process,
| but stuff out in the long tail naturally tends to move fast -
| as it should.
| derbOac wrote:
| I've been a user of Julia for some time (at least since beta
| versions). I love the language and feel like the author of the
| blog post is maybe exaggerating or generalizing a bit too much.
| On the other hand, based on my personal experiences with Julia,
| I can definitely empathize and feel like there's a lot about
| the blog post that rings true.
|
| I share your sense that "something is just missing in Julia"
| but I maybe disagree with the author in that I see it as
| potentially changeable or something, as not hopeless.
|
| Julia has grown tremendously in a short period of time, both in
| the language, its implementation, and the size of the
| community. So in that sense I see it as inevitable there's
| going to be a lot of bugs and chaos for a bit.
|
| On the other hand, I've always felt a bit of unease that a
| numerical language was being developed from the ground up as
| that, without it being an offshoot of more general purpose
| language. It's not that I think there's something inherently
| wrong with it, but I do think that having a greater variety of
| perspectives looking at it are more likely to catch things
| early.
|
| I don't think in this regard it's a function of academia --
| although it certainly could be -- it's more a function of
| having a very narrow community looking at the language.
| Regardless of how smart they all are, I think having a broader
| range of perspectives might catch things earlier.
|
| In this regard, I might have preferred the Julia fervor and
| effort be put into some numerical Nim libraries, or a numerical
| "abstracted subset of Rust" or something. It's not so much I
| dislike Julia as much as it is I'd feel safer with a more
| generalist perspective on basic language design.
|
| But who knows. To me it's a bit ironic the author focuses on
| Python as an alternative, because it's not like that is free
| from problems, and Python has been around for a lot longer.
| They might be _different_ problems, but they're not absent.
| Python is a bit ironic too in that it has been sort of kludged
| together over time into what it is today, for better or worse.
| I guess it feels like to me all the major numerical programming
| platforms have this kind of kludgy feeling in different ways;
| Julia feels/felt a bit like an opportunity for a clean break,
| if nothing else.
| jbezanson wrote:
| I don't think there is anything "numerical" about the core
| language design of julia; it is just a general generic-
| function-based OO language. In fact I think we made many
| decisions in line with trends in the broader language world,
| e.g. emphasizing immutable objects, having no concrete
| inheritance, using tasks and channels for concurrency, and
| deliberately avoiding "matlab-y" features like implicit array
| resizing. Of course many in the "general purpose" crowd don't
| like 1-based indexing, but surely that is not the source of
| all of our problems :)
| patrec wrote:
| I tend to be a bit wary of dynamic languages with
| sophisticated, performant implementations of complex
| abstractions, especially if they have somewhat niche appeal. In
| my experience this is a combination that makes for running into a
| lot of implementation bugs. For example, I've run into many more
| nasty compiler bugs with lisps (and julia at least qualifies as
| an almost-lisp) than with more simple-minded dynamic languages
| like python or erlang[1] or fairly sophisticated but niche
| statically typed languages.
|
| I think watching Julia over the next few years will be quite
| interesting: it's the only dynamically typed language that has
| both sophisticated abstractions and a sophisticated
| implementation[2] that has enough pull to have a chance to become
| entrenched in certain domains. I wonder to what extent they will
| be able to get this problem under control.
|
| [1] BEAM, unlike cpython, is actually a marvel of engineering and
| making very deliberate trade-offs. But it's not very complex.
|
| [2] Javascript is of course the one pervasive dynamically typed
| programming language that has sophisticated implementations, but
| of mostly ill-conceived constructs.
| KenoFischer wrote:
| So this one is a tough one for me, because Yuri has certainly
| spent significant time with Julia and I think he's a very
| competent programmer, so his criticism is certainly to be taken
| seriously and I'm sad to hear he ended up with a sour opinion.
|
| There's a lot of different issues mentioned in the post, so I'm
| not really sure what angle to best go at it from, but let me give
| it a shot anyway. I think there's a couple of different threads
| of complaints here. There's certainly one category of issues that
| are "just bugs" (I'm thinking of things like the HTTP, JSON, etc.
| issues mentioned). I guess the claim is that this happens more in
| Julia than in other systems. I don't really know how to judge
| this. Not that I think that the julia ecosystem has few bugs,
| just that in my experience, I basically see 2-3 critical issues
| whenever I try a new piece of software independent of what
| language it's written in.
|
| I think the other thread is "It's hard to know what's expected to
| work". I think that's a fair criticism and I agree with Yuri that
| there's some fundamental design decisions that are contributing
| here. Basically, Julia tries very hard to make composability
| work, even if the authors of the packages that you're composing
| don't know anything about each other. That's a critical feature
| that makes Julia as powerful as it is, but of course you can
| easily end up with situations where one or the other package is
| making implicit assumptions that are not documented (because the
| author didn't think the assumptions were important in the context
| of their own package) and you end up with correctness issues.
| This one is a bit of a tricky design problem. Certainly adding
| more language support for interfaces and verification thereof
| could be helpful, but not all implicit assumptions are easily
| capturable in interfaces. Perhaps there needs to be more explicit
| documentation around what combinations of packages are
| "supported". Usually the best way to tell right now is to see
| what downstream tests are done on CI and if there are any
| integration tests for the two packages. If there are, they're
| probably supposed to work together.
|
| To be honest, I'm a bit pained by the list of issues in the blog
| post. I think the bugs linked here will get fixed relatively
| quickly by the broader community (posts like this tend to have
| that effect), but as I said I do agree with Yuri that we should
| be thinking about some more fundamental improvements to the
| language to help out. Unfortunately, I can't really say that that
| is high priority at the moment. The way that most Julia
| development has worked for the last two-ish years is that there are a
| number of "flagship" applications that are really pushing the
| boundary of what Julia can do, but at the same time also need a
| disproportionate amount of attention. I think it's overall a good
| development, because these applications are justifying many
| people's full time attention on improving Julia, but at the same
| time, the issues that these applications face (e.g. - "LLVM is
| too slow", better observability tooling, GC latency issues) are
| quite different from the issues that your average open source
| julia developer encounters. Pre 1.0 (i.e. in 2018) there was a
| good 1-2 year period where all we did was think through and
| overhaul the generic interfaces in the language. I think we could
| use another one of those efforts now, but at least at this
| precise moment, I don't think we have the bandwidth for it.
| Hopefully in the future, once things settle down a bit, we'll be
| able to do that, which would presumably be what becomes Julia
| 2.0.
|
| Lastly, some nitpicking on the HN editorialization of the title.
| Only one of the issues linked
| (https://github.com/JuliaLang/julia/issues/41096) is actually a
| bug in the _language_ - the rest are various ecosystem issues.
| Now, I don't want to disclaim responsibility there, because a
| lot of those packages are also co-maintained by core julia
| developers and we certainly feel responsibility to make those
| work well, but if you're gonna call my baby ugly, at least point
| at the right baby ;)
| StefanKarpinski wrote:
| The big language design problem that I think this post
| highlights is that the flip side of Julia's composability is
| that composing generic code with types that implement
| abstractions can easily expose bugs when the caller and the
| callee don't agree on exactly what the abstraction is.
|
| Several of the bugs that Yuri reported are a very specific case
| of this: there's a lot of generic code that assumes that array
| indexing always starts at one, but that's not always the case
| since OffsetArrays allow indexing to start anywhere. The older
| code in the stats ecosystem is particularly badly hit by this
| because it often predates the existence of OffsetArrays and the
| APIs that were developed to allow writing efficient generic
| code that works with arrays that don't start at the typical
| index (or which might even want to be iterated in a different
| order).
|
| Fixing these specific OffsetArray bugs is a fairly
| straightforward matter of searching for `1:length(a)` and
| replacing it with `eachindex(a)`. But there's a bigger issue
| that this general problem raises: How does one, in general,
| check whether an implementation of an abstraction is correct?
| And how can one test if generic code for an abstraction uses
| the abstraction correctly?
|
| Many people have mentioned interfaces and seem to believe that
| they would solve this problem. I don't believe that they do,
| although they do help. Why not? Consider the OffsetArray
| example: nothing about `for i in 1:length(a)` violates anything
| about a hypothetical interface for AbstractArrays. Yes, an
| interface can tell you what methods you're supposed to
| implement. There's a couple of issues with that: 1) you might
| not actually need to implement all of them--some code doesn't
| actually use all of an interface; 2) you can find out what
| methods you need to implement just by running the code that
| uses the implementation and see what fails. What the interface
| would guarantee is that if you've implemented these methods,
| then no user of your implementation will hit a missing method
| error. But all that tells you is that you've implemented the
| entire surface area of the abstraction, not that you've
| implemented the abstraction at all correctly. And I think that
| covering the entire surface area of an abstraction when
| implementing it is the least hard part.
|
| What you really want is a way to generically express behaviors
| of an abstraction in a way that can be automatically tested. I
| think that Clojure's spec is much closer to what's needed than
| statically checked interfaces. The idea is that when someone
| implements an abstraction, they can automatically get tests
| that their implementation implements the abstraction correctly
| and fully, including the way it behaves. If you've implemented
| an AbstractArray, one of the tests might be that if you index
| the array with each index value returned by `eachindex(a)` that
| it works and doesn't produce a bounds error.
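|
| A minimal sketch of such a reusable behavior check, written in
| Python rather than Julia. Everything here is an assumption made
| for illustration: the `indices()` method stands in for
| `eachindex(a)`, and `behavior_failures`, `Shifted` and `Broken`
| are hypothetical names. The point is that `Broken` has the full
| method surface and still fails.
|

```python
def behavior_failures(a):
    """Return a list of behavioral problems found in array-like `a`."""
    failures = []
    idx = list(a.indices())
    if len(idx) != len(a):
        failures.append("indices() and len() disagree")
    if len(set(idx)) != len(idx):
        failures.append("indices() repeats an index")
    for i in idx:
        try:
            a[i]  # every advertised index must be readable
        except IndexError:
            failures.append(f"index {i} from indices() is out of bounds")
    return failures


class Shifted:
    """Correct implementation: indices run from `first` upward."""

    def __init__(self, values, first):
        self.values = list(values)
        self.first = first

    def __len__(self):
        return len(self.values)

    def indices(self):
        return range(self.first, self.first + len(self.values))

    def __getitem__(self, i):
        pos = i - self.first
        if not 0 <= pos < len(self.values):
            raise IndexError(i)
        return self.values[pos]


class Broken(Shifted):
    """Implements every method, but indices() is off by one."""

    def indices(self):
        return range(self.first + 1, self.first + 1 + len(self.values))
```

|
| An implementer would run `behavior_failures` on an instance of
| their own type in their package tests; an empty list means none
| of these particular checks failed, nothing more.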
|
| On the other end, you also want some way of generating mock
| instances of an abstraction for testing generic code. We do a
| bit of this in Julia's test suite: there are GenericString and
| GenericSet types, which implement the minimal string/set
| abstraction, and use these to test generic code to verify that
| it doesn't assume more than it should about the string and set
| abstractions. For a GenericArray type, you'd want it to start
| at an arbitrary index and do other weird stuff that exotic
| array types are technically allowed to do, so that any generic
| code that makes invalid assumptions will get caught. You could
| call this type AdversarialArray or something like that.
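|
| To make the idea concrete, here is a small Python sketch (not
| Julia, and not an existing library type). `AdversarialArray`
| borrows the name suggested above; `indices()`, the two argmax
| functions and `survives_adversary` are hypothetical. The array is
| valid but deliberately starts at an unusual index, so generic
| code that assumes a start of 0 gets caught.
|

```python
class AdversarialArray:
    """Hypothetical test double: valid, but starts at an unusual index."""

    def __init__(self, values, first=-3):
        self._values = list(values)
        self._first = first

    def __len__(self):
        return len(self._values)

    def indices(self):
        return range(self._first, self._first + len(self._values))

    def __getitem__(self, i):
        pos = i - self._first
        if not 0 <= pos < len(self._values):
            raise IndexError(f"index {i} out of bounds")
        return self._values[pos]


def argmax_assuming_zero_based(a):
    # Generic-looking code with a hidden assumption about the first index.
    best = 0
    for i in range(len(a)):
        if a[i] > a[best]:
            best = i
    return best


def argmax_generic(a):
    # Only uses what the container itself advertises.
    return max(a.indices(), key=lambda i: a[i])


def survives_adversary(fn, values):
    """True if `fn` finds the maximum of `values` on an AdversarialArray."""
    a = AdversarialArray(values)
    try:
        return a[fn(a)] == max(values)
    except IndexError:
        return False
```

|
| Running both functions through `survives_adversary` with the same
| data separates the one that merely works on ordinary lists from
| the one that respects the abstraction.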
|
| I've personally thought quite a bit about these issues, but as
| Keno has said, there hasn't been time to tackle these problems
| in the last couple of years. But they certainly are important
| and worth solving.
|
| On a personal note, Yuri, thanks for all the code and I'm sorry
| to see you go.
| blindseer wrote:
| > 2) you can find out what methods you need to implement just
| by running the code that uses the implementation and see what
| fails.
|
| For large codebases this is SO painful to do. I just don't
| understand how anyone gets anything done when this is how
| they have to develop code.
| StefanKarpinski wrote:
| That's why interfaces are useful--they save you from that.
| But they don't actually solve the problem of checking that
| an abstraction has been implemented correctly, just that
| you've implemented the entire API surface area, possibly
| incorrectly. Note, however, that if you have a way of
| automatically testing the behavioral correctness of an
| implementation, then those tests presumably cover the
| entire API, so automatic testing would subsume the benefit
| that static interface checking provides--just run the
| automatic tests and it tells you what you haven't
| implemented as well as what you may have implemented
| incorrectly.
| sirwhinesalot wrote:
| Indeed, static types don't save you from this issue, at
| least structural ones don't, exactly the same issue would
| occur as in Julia (see: C++ templates). Static structural
| types have the same problem as Julia here, you gain a lot
| of compositional power at the expense of potential
| correctness issues.
|
| However, nominal types do "solve" the problem somewhat,
| as there's a clear statement of intent when you do "X
| implements Y" that the compiler enforces. If that promise
| is missing, the compiler will _not_ let you use an X
| where a Y is expected. And if you do say X implements Y,
| then you probably tested that you did it correctly.
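|
| The structural/nominal distinction can be sketched in Python,
| with the caveat that Python checks these at run time, so this
| only illustrates the "statement of intent" part, not compiler
| enforcement. All class names below are hypothetical.
|

```python
from abc import ABC, abstractmethod
from typing import Protocol, runtime_checkable


@runtime_checkable
class SizedStructural(Protocol):
    """Structural: anything with a size() method matches."""

    def size(self) -> int: ...


class SizedNominal(ABC):
    """Nominal: only classes that explicitly subclass this match."""

    @abstractmethod
    def size(self) -> int: ...


class Bag:
    """Declares no interface; it just happens to have size()."""

    def size(self):
        return 3


class Box(SizedNominal):
    """Explicitly promises to implement SizedNominal."""

    def size(self):
        return 1
```

|
| `Bag` passes the structural check by accident of naming, while
| only `Box` passes the nominal one. Neither check says anything
| about whether `size()` returns a sensible value.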
|
| But this would also fail at the OffsetArray problem. The
| only way I can see of protecting against it (statically
| or dynamically) is to have an "offset-index" type,
| different from an integer, that you need to have to index
| an OffsetArray. That makes a[x] not be compatible between
| regular Arrays and OffsetArrays.
|
| I don't think anyone wants that mess though. So if your
| language has OffsetArrays, and they're supposed to be
| compatible with Arrays, and you can index both the same
| way, no amount of static types will help (save for
| dependent/refinement types but those are their own mess).
|
| EDIT: I seem to have replied to the wrong comment, but
| the right person, so hey, no issue in the end :)
| kaba0 wrote:
| Interfaces provide correctness guarantees because implementing
| them is a conscious decision. If your array
| implements GenericArray, you know about that interface, and
| presumably what it is used for. Its methods can also contain
| documentation.
|
| The point is a common point of... trust may be the word? Two
| developers that don't even know each other can use each
| other's code correctly by programming against a third,
| hypothetical implementation that they both agree on. Here
| OffsetArray would simply not implement the GenericArray
| interface if the latter expects 1-based indexing.
|
| In this specific case the solution would be to move the
| indexing question into the interface itself - it is not only
| an implementation detail. Make the UltraGenericArray
| interface have an offset() method as well and perhaps make []
| do 1-based indexing always (with auto-offsetting for indexed
| arrays), and a separate index-aware get() method, so that
| downstream usage must explicitly opt in to different
| indexing.
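|
| One possible reading of that proposal, as a Python sketch. The
| class name `IndexedArray` is hypothetical, and the methods follow
| the comment's wording: `[]` is always 1-based, `offset()` exposes
| the shift, and `get()` is the explicit, index-aware accessor.
|

```python
class IndexedArray:
    """Array with native indices starting at `first` (hypothetical design)."""

    def __init__(self, values, first=1):
        self._values = list(values)
        self._first = first

    def __len__(self):
        return len(self._values)

    def offset(self):
        """Shift between 1-based positions and native indices."""
        return self._first - 1

    def __getitem__(self, position):
        """Always 1-based, whatever the native indexing is."""
        if not 1 <= position <= len(self._values):
            raise IndexError(position)
        return self._values[position - 1]

    def get(self, index):
        """Index-aware access; callers opt in to native indices."""
        return self[index - self.offset()]
```

|
| Generic code that loops over positions 1 through `len(a)` with
| `a[p]` then works for every instance, and only code that calls
| `get()` needs to know about the offset.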
| mfsch wrote:
| It seems to me that much of the difficulty with interfaces,
| whether they are made explicit or kept implicit, lies in
| defining the semantics that the functions are supposed to
| have.
|
| As we expand the types our generic code can handle, we have
| to refine the semantics it relies on. For a long time,
| Base.length(::AbstractArray) could mean "the largest one-
| based index of the array", but then we started using the same
| code that handles regular Arrays for OffsetArrays and this
| interpretation was no longer valid. I guess the alternative
| would have been to leave length(::OffsetArray) unimplemented
| and block the valid use of OffsetArrays for all generic code
| that understands Base.length as "the number of values".
|
| It can still be difficult to tell what a function like
| Base.length should mean if I implement it for my types. For
| example, should it return the number of local values or the
| global length for an array that is distributed between
| multiple processes (e.g. in an MPI program)? Perhaps some
| generic code will use it to allocate a buffer for
| intermediate values, in which case it should be the local
| length. Or some generic code computes an average by dividing
| the (global) sum by the global length.
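|
| A small simulation of that ambiguity in Python, with no real MPI
| involved. `Shard`, `make_buffer`, `global_mean` and
| `mean_using_local_length` are hypothetical names; the plain
| Python sum over shards stands in for a reduction across
| processes.
|

```python
class Shard:
    """One process's slice of a distributed array (simulated)."""

    def __init__(self, local_values, global_length):
        self.local_values = list(local_values)
        self.global_length = global_length

    def local_length(self):
        return len(self.local_values)


def make_buffer(shard):
    # Allocating scratch space wants the local length.
    return [0.0] * shard.local_length()


def global_mean(shards):
    # Averaging wants the global length.
    total = sum(sum(s.local_values) for s in shards)
    return total / shards[0].global_length


def mean_using_local_length(shard, global_sum):
    # What generic code computes if "length" silently means local length.
    return global_sum / shard.local_length()
```

|
| Splitting the values 1 through 6 into shards of four and two
| elements gives a true mean of 3.5, but dividing the global sum by
| the first shard's local length gives 5.25.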
|
| It seems impossible to come up with a precise definition of
| all the semantics your generic code assumes a priori, so we
| can either restrict our usage of generics to a small number
| of concrete types that were considered when the code was
| written, or we have to accept that we occasionally run into
| these sorts of issues while we refine the semantics.
|
| Anecdotally, it has been my experience that packages that
| have been made to work in many generic contexts (such as the
| ODE packages) are likely to work flawlessly with my custom
| types, while packages that have seen less such effort (e.g.
| iterative solvers) are more likely to cause issues. This
| makes me hopeful that it is possible to converge towards very
| general generic implementations.
|
| It is also worth mentioning that it is very possible to use
| Julia without ambitious use of cross-package generic
| functionality, and use it "merely" as a better Fortran or
| Matlab.
| iamed2 wrote:
| Invenia's approach to interface testing ("Development with
| Interface Packages" on our blog) does some of the things you
| suggest as a standard of practice, by providing tools to
| check correctness that implementers can use as part of
| package tests. ChainRulesTestUtils.jl is a decent example
| (although this one doesn't come with fake test types). I
| think this is typically good enough, and only struggles with
| interface implementations with significant side effects.
|
| One little win could be publishing interface tests like these
| for Base interfaces in the Test stdlib. I appreciate that the
| Generic* types are already exposed in the Test stdlib!
| clhodapp wrote:
| > What you really want is a way to generically express
| behaviors of an abstraction in a way that can be
| automatically tested.
|
| The pure FP ecosystems in Scala often accomplish this in the
| form of "laws", which are essentially bundles of pre-made
| unit tests that they ship alongside their core abstraction
| libraries.
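|
| A hand-rolled Python sketch of what such a bundle of laws can
| look like, using the monoid laws as the example. This is not the
| API of any Scala library; `monoid_law_failures` is a hypothetical
| helper, and it checks a fixed list of samples rather than
| generated inputs.
|

```python
def monoid_law_failures(combine, empty, samples):
    """Check identity and associativity of `combine` over `samples`."""
    failures = []
    for a in samples:
        # Identity law: combining with `empty` changes nothing.
        if combine(empty, a) != a or combine(a, empty) != a:
            failures.append(f"identity fails for {a!r}")
        for b in samples:
            for c in samples:
                # Associativity law: grouping does not matter.
                if combine(combine(a, b), c) != combine(a, combine(b, c)):
                    failures.append(
                        f"associativity fails for {a!r}, {b!r}, {c!r}"
                    )
    return failures
```

|
| Integer addition with 0 and string concatenation with "" pass on
| any samples, while subtraction with 0 fails both laws, even
| though all three have the same call signature.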
| Sukera wrote:
| To expand on the "interfaces are not enough" part: Defining
| an interface on an abstract type only gives you that _an_
| implementation exists, not that it is _correct_, i.e. that
| the specific implementation for a subtype guarantees the same
| properties the interface specifies.
|
| On top of this, you really want to be alerted to when you
| expect more of an interface than the interface guarantees -
| this is what happened in the case of `1:length(A)` being
| assumed to give the indices into `A`, when the
| `AbstractArray` interface really only guarantees that a given
| set of methods exists.
|
| I feel like these sorts of issues more or less require more
| formal models being provided & checked by the compiler.
| Luckily for us, nothing in this space has been implemented or
| attempted in & for julia, while there are a lot of
| experiments with formal methods and proofing systems being
| researched right now (TLA+, coq,..). There are of course a
| lot of footguns[1], but the space is moving fast and I'd love
| to see something that makes use of this integrated into julia
| at some point.
|
| [1]: Why specifications don't compose -
| https://hillelwayne.com/post/spec-composition/
| tialaramex wrote:
| > Defining an interface on an abstract type only gives you
| that an implementation exists, not that it is correct
|
| Pretty far off topic for Julia, but the definition of
| Rust's Traits over _semantics_ rather than _syntax_ (even
| though of course the compiler will only really check your
| syntax) gives me a lot of this.
|
| The fact that this Bunch<Doodad> claims to be
| IntoIterator<Item=Doodad> tells me that the person who
| implemented that explicitly intends that I can iterate over
| the Doodads. They can't _accidentally_ be
| IntoIterator<Item=Doodad>; the author has to literally write the
| implementation naming the Trait to be implemented.
|
| But that comes at a heavy price of course, if the author of
| Bunch never expected me to iterate over it, the best I can
| do is new type MyBunch and implement IntoIterator using
| whatever ingredients are provided on the surface of Bunch.
| This raises the price of composition considerably :/
|
| > you really want to be alerted to when you expect more of
| an interface than the interface guarantees
|
| In the case alluded to (AbstractArray) I feel like the
| correct thing was _not_ to implement the existing
| interface. That might have been disruptive at the time, but
| people adopting a new interface which _explicitly_ warns
| them not to 1:length(A) are not likely to screw this up,
| and by now perhaps everything still popular would have
| upgraded.
|
| Re-purposing existing interfaces is probably always a bad
| idea. Even if you can persuade yourself that the interface never
| _specifically_ said it was OK to use it the way you suspect
| everybody was using it in practice, Hyrum's Law very much
| applies. That interface is frozen in place; make a new one.
| renox wrote:
| I remember reading a long time ago about the 1-based array
| and the offset-array 'kludge'.
|
| My first thought was that they should have replicated Ada's
| design instead. My second thought was that I hope they have a
| good linter, because putting an arbitrary-offset implementation
| in a library is a minefield.
|
| I don't claim to be especially smart: this is/was obvious..
| Unfortunately what isn't obvious is how to fix this issue and
| especially how to fix the culture which produces this kind of
| issue..
| dang wrote:
| Re the title: ok, we've replaced the submitted title ("The
| Julia language has a number of correctness flaws") with a
| representative phrase from the OP which uses the word
| 'ecosystem'.
|
| HN's title rule calls for using the original title unless it is
| misleading or linkbait
| (https://news.ycombinator.com/newsguidelines.html) and "Why I
| no longer recommend Julia" is generic enough to be a sort of
| unintentional linkbait - I think it would lead to a less
| specific and therefore less substantive discussion. In that
| sense the submitter was probably right to change the title, and
| for the same reason I haven't reverted it.
|
| I'm going to autocollapse this comment so we don't get a big
| thread about titles.
| KenoFischer wrote:
| Thanks. Appreciate your thoughtful moderation as always :).
| patrickkidger wrote:
| FWIW my take is not that Yuri is expressing "there are too many
| bugs" so much as he's expressing a problem in the culture
| surrounding Julia itself:
|
| > But systemic problems like this can rarely be solved from the
| bottom up, and my sense is that the project leadership does not
| agree that there is a serious correctness problem.
|
| Concisely:
|
| 1. The ecosystem is poorly put together. (It's been produced by
| academics rather than professional software developers.)
|
| 2. The language provides few tools to guarantee correctness.
| (No static typing; no interfaces.)
|
| Personally, what I'd love to see is one of the big tech
| companies come on board and just write their own ecosystem. The
| Julia language is amazing. The ecosystem needs to be rewritten.
| ChrisRackauckas wrote:
| Lots of things are being rewritten. Remember we just released
| a new neural network library the other day, SimpleChains.jl,
| and showed that it gave about a 10x speed improvement on
| modern CPUs with multithreading enabled vs Jax Equinox (and
| 22x when AVX-512 is enabled) for smaller neural network and
| matrix-vector types of cases
| (https://julialang.org/blog/2022/04/simple-chains/). Then
| there's Lux.jl fixing some major issues of Flux.jl
| (https://github.com/avik-pal/Lux.jl). Pretty much everything
| is switching to Enzyme which improves performance quite a bit
| over Zygote and allows for full mutation support
| (https://github.com/EnzymeAD/Enzyme.jl). So an entire machine
| learning stack is already seeing parts released.
|
| Right now we're in a bit of an uncomfortable spot where we
| have to use Zygote for a few things and then Enzyme for
| everything else, but the custom rules system is rather close
| and that's the piece that's needed to make the full
| transition.
| adgjlsfhk1 wrote:
| I don't think it needs a rewrite as much as careful
| maintenance from people who have time to dedicate to software
| quality. Most of the APIs are good, it's just that a lot of
| the code is under-tested and doesn't receive enough love.
| Having more big companies using Julia would help a lot with
| that.
| amkkma wrote:
| Hi Keno,
|
| Thanks for the honest assessment. Do you have any thoughts
| about correctness/ composability of compiler transforms like
| AD, reliability of GPU acceleration and predictability of
| optimizations? (basically what you've discussed in some of your
| compiler talks).
|
| How is that going to be possible in an imperative language?
| Right now we have lux.jl, which is a pure by convention DL
| framework, but that ends up being jax without the TPUs, kernel
| fusion, branching (Lux relies on generated functions) and copy
| elision (though this last part is being worked on IIUC).
|
| A bunch of folks in the ML, Probprog and fancy array space have
| been grappling with things like generated functions, type level
| programming and such, and were wondering about future
| directions in this space:
| https://julialang.zulipchat.com/#narrow/stream/256674-compil...
| there among other discussions
|
| Edit: re : bandwidth issue Jan Vitek's group is thinking a lot
| about the verification vs flexibility tradeoff and some people
| are working on a trait/ static typing system. Maybe something
| can be done to help them along?
| KenoFischer wrote:
| > Thanks for the honest assessment. What about correctness/
| composability of compiler transforms like AD, reliability of
| GPU acceleration and predictability of optimizations?
| (basically what you've discussed in some of your compiler
| talks).
|
| I don't think we really have a good answer yet, but it's
| actively being worked on. That said, I don't think we can be
| faulted for that one, because I don't think anybody really
| has a good answer to this particular design problem. There's
| a lot of new ground being broken, so some experimentation
| will be required.
|
| > TPUs, kernel fusion, branching (Lux relies on generated
| functions) and copy elision (though this last part is being
| worked on IIUC).
|
| We have demonstrated that we can target TPUs. Kernel fusion
| is a bit of an interesting case, because julia doesn't really
| use "kernels" in the same way that the big C++ packages do.
| If you broadcast something, we'll just compile the "fused"
| kernel on the GPU, no magic required. There is still
| something remaining, which is that when you're working on the
| array level, you want to be able to do array-level
| optimization, which we currently don't really do (though
| again, the TPU work showed that we could), but is broadly
| being planned.
|
| > Edit: re : bandwidth issue Jan Vitek's group is thinking a
| lot about the verification vs flexibility tradeoff and some
| people are working on a trait/ static typing system. Maybe
| something can be done to help them along?
|
| We work closely with them of course, so I think there'll be
| some discussions there, but it's a very tough design problem.
| amkkma wrote:
| Glad to hear it's being worked on!
|
| > That said, I don't think we can be faulted for that one,
| because I don't think anybody really has a good answer to
| this particular design problem.
|
| Agreed! To be clear, If there's any implication of "fault"
| it was certainly not in a moral sense or even anything
| around making poor design decisions. Julia's compiler is
| being asked to do many new things with semantics that
| necessarily predated many advances in PL.
|
| Re Kernel fusion, there's another piece here, which you may
| or may not have included in "array-level optimizations".
| Julia's "just write loops" ethos is awesome, until you get
| to accelerators...now we're back to an "optimizer defined
| sub language" as TKF puts it. People like loops and
| flexibility, Dex, Floops.jl, Tullio, Loopvec and KA.jl show
| that it's possible to retain structure and emit
| accelerator-able loopy code. But none of those, except for
| dex, has a solution for fusing kernels that rely on loops.
| I'm still using the concept of Kernels, because there's
| still a bit of a separation between low level CUDA.jl
| code/these various DSLs and higher level array code, even
| if not as stark as python or C++.
|
| Would be really cool, if like Dex, there's a plan to fuse
| these sorts of structured loops as well. Dex does it by
| having type level indexing and loop effects (they're
| actually moving to a user defined parallel effect handler
| system: https://arxiv.org/abs/2110.07493). The latter can
| tell the compiler when it's safe to parallelize and
| fuse+beta reduce loops. But that relies on structured
| semantics/effects and a higher level IR than exists in
| Julia.
|
| Not sure what a Julian solution would look like, if
| possible. But given the usability wins, it would be great
| to have in Julia as well.
| celrod wrote:
| > But none of those, except for dex, has a solution for
| fusing kernels that rely on loops.
|
| The LV rewrite will. Some day, I'd like to have it target
| accelerators, but unlike fusion, I've not actually put
| any research/engineering into it so can't make any
| promises.
|
| But my long term goal is that simple loops in ->
| optimized anything you want out. Enzyme also deserves a
| shout out for being able to generate reverse mode AD
| loops with mutation.
| amkkma wrote:
| to add, as you know, this is part of a more general problem
| about type level programming vs write your own compiler vs
| the non composability of DSLs, where Julia folks in various
| other non ML domains like PPLs and fancy arrays have been
| wondering about how to do things that get compiled away,
| without relying on compiler heuristics or generated
| function blowups:
| https://julialang.zulipchat.com/#narrow/stream/256674-compil...
|
| Another non ML example I discussed with some Probprog folks
| is that there was an arxiv review of PPLs and Julian ones
| that heavily rely on macros don't compose well within and
| across packages. The same mechanism for composability which
| Dex uses for parallelism and AD (effect handlers) is what
| new gen PPLs in jax and Haskell are using for composable
| transformable semantics, so maybe that's worth looking
| into.
|
| We've been having some discussions about how to bring that
| to Julia, but stalled on engineering time and PL knowledge.
| Eventually wanted to talk to the core team about it with
| proposal in hand, but never got there. Let me know if you'd
| like to talk to some of those folks who have been involved
| in the discussions as you design the new compiler plugin
| infra.
|
| https://julialang.zulipchat.com/#narrow/stream/256674-compil...
| tqaky wrote:
| The tweet to Elon is hilarious. Using Julia in safety critical
| systems is the funniest thing I've heard all year. Same for
| Python: There are reasons apart from performance that you use
| C/C++ in the final product ...
| nomilk wrote:
| > tons of C++/C engineers needed ... educational background is
| irrelevant, but all must pass hardcore coding test.
|
| https://twitter.com/elonmusk/status/1224182478501482497?lang...
| harpiaharpyja wrote:
| I'm guessing you've never used Python for a serious project
| before, because your statement is incorrect. Aside from the
| fact that Python is memory safe, my experience has been that it
| is far easier to avoid logic errors in Python than in C/C++.
|
| Equivalent Python code is shorter and less noisy than C/C++,
| and Python makes it easier to create succinct and simple to use
| abstractions. The result is that code is far easier to audit.
| Furthermore, my experience has been that it is much easier to
| create unit and integration tests in Python, which means that
| people actually do it and/or test more comprehensively.
|
| My experience has been that if the performance cost of Python
| is no object then there is a correctness benefit over C/C++ for
| a disciplined programmer. I suggest you gain some more
| experience with both languages before making an offhand
| comparison.
| Rayhem wrote:
| > ...there is a correctness benefit over C/C++ for a
| disciplined programmer.
|
| Businesses want cheap programmers, not disciplined
| programmers. It's a great utopia if everyone just follows the
| best practices and does it perfectly all the time, but I
| guarantee that as margins become thin people writing, say,
| code for autonomous vehicles are doing the _very_ minimum to
| meet correctness requirements.
| cozzyd wrote:
| In my experience it's much harder to write Python that won't
| crash than C/C++ that won't crash. With Python, rarely taken
| code paths can have very dumb errors in them (e.g.
| accidentally introducing a new variable with a different name
| than the variable you wanted or typoing a method name, or
| passing arguments with the wrong type) that would get caught
| by a C/C++ compiler but won't be caught in Python until the
| crash happens some months later. Typically in "reliable" code
| you're not going to do much dynamic allocation anyway, at
| least past the initialization stage (because memory
| fragmentation!) so memory errors aren't that common anyway.
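|
| A minimal illustration of the first kind of error described
| here. The function `describe` and its typo are invented for this
| example. Python accepts the definition without complaint and only
| fails when the rare branch actually runs.
|

```python
def describe(reading, limit=100):
    """Format a sensor reading; the over-limit branch is rarely taken."""
    if reading <= limit:
        return f"ok: {reading}"
    # Typo: `readng` instead of `reading`. Nothing complains until
    # this line executes.
    return f"over limit: {readng}"
```

|
| `describe(5)` returns normally, and `describe(500)` raises
| NameError. A C or C++ compiler would reject the equivalent
| undeclared identifier before the program ever ran.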
|
| Yes, probably Rust and Ada provide even more guarantees...
| snicker7 wrote:
| C/C++ memory errors are literally responsible for 70% of
| ALL critical security bugs. Most python "dumb errors" can
| be caught by linters. Memory is infinitely harder.
|
| Source --
| https://msrc-blog.microsoft.com/2019/07/16/a-proactive-appro...
| cozzyd wrote:
| The linter can't tell that the method doesn't exist until
| runtime, can it? If so, it wouldn't accept lots of valid
| python... Also, what linter should I use?
|
| Either way, not all software needs to be secure to an
| adversary (since it's running behind several firewalls
| and if the attacker has shell access, then it's already
| game over).
| buescher wrote:
| And Python is implemented as a large C program.
| mirekrusin wrote:
| 66% python, 32% C - for cpython.
|
| 94% python, 5% C - for pypy.
| buescher wrote:
| cpython is the reference implementation and 32% means
| about 350,000 lines of code. My point stands.
| hprotagonist wrote:
| like dropping memory safety! oh, wait...
| clarle wrote:
| Genuinely curious (I don't work with low level systems day-to-
| day), but why would C/C++ be better for safety?
|
| I would imagine that the risk of bugs related to memory access
| and data racing would be pretty high, compared to a higher-
| level language or something like Rust that focuses on safety.
| buescher wrote:
| Tooling and experience. In principle, there are other
| languages that might be better than C or C++ for safety-
| critical software. You know all the hoary jokes about the
| difference between theory and practice?
| CRConrad wrote:
| > You know all the hoary jokes about the difference between
| theory and practice?
|
| Only in theory; do you have a practical example?
| buescher wrote:
| Can't a guy leave anything as an exercise to the reader
| anymore?
| pfortuny wrote:
| AFAIK (and I am not an expert) when they use C/C++ in
| automotive systems (in the automotive part, not elsewhere)
| they might tend to use deterministic programming (f.e. fixed-
| length arrays, no allocations, etc.) which eliminate most of
| the bugs.
|
| But, again, I am not an expert and Toyota's sudden
| acceleration problem was said to be caused by bad C
| programming (said by some, not by others).
| docandrew wrote:
| I thought it was caused by loose floormats.
| mcv wrote:
| Is C/C++ really such a better choice in safety-critical
| systems? It's notorious for having all sorts of buffer
| overflows and memory issues on unexpected input.
| my123 wrote:
| MISRA C or MISRA C++ are used, enforcing much stricter
| guarantees than what the C or C++ specs provide.
| cestith wrote:
| If you really want safety, use Rust or Ada.
| cb321 wrote:
| Or Nim [1]..kind of an Ada with Lisp macros and more
| Pythonesque surface syntax.
|
| All three are ahead-of-time compiled/less REPL friendly than
| Julia, though. Taking more than 100 milliseconds to compile
| can be a deal breaker for some, especially in exploratory
| data analysis/science settings where the mindset is "try out
| this idea..no wait, this one..oops, I forgot a -1" and so on.
| In my experience, it's unfortunately hard to get scientist
| developers onboard with "wait for this compile" workflows.
|
| [1] https://nim-lang.org/
| adgjlsfhk1 wrote:
| Even if it takes 100ms to compile, in practice it takes
| about 2 seconds, since it requires the user to type in
| another command and register that it has finished.
| ninjin wrote:
| I am a long-time member of the Julia community and had a
| discussion with the author about these issues a long time ago -
| but did not give feedback on the post. Let me first state that
| Yuri is a great person and was a valuable member of the
| community. He pushed the boundaries of the language and produced
| some very nice packages in his time. His concerns are genuine and
| should be respected and discussed in that context.
|
| Also, let me say that encountering these kinds of bugs is not
| something I have had experience with. _But_ , I tend to be very
| conservative with my usage of libraries and fancy composition.
|
| If I had more experience with programming language theory and
| implementation, perhaps I would have a better name to describe
| the source of the issues described. My attempt is to call it
| "type anarchy". The way I see it, there is not a clear way to
| assign responsibility for correctness. In the case of the array
| used in the post, is it the fault of the implementer of the `sum`
| function (without a type signature, as it should be) or the
| implementer of the data structure? I am honestly not sure. But as
| Julia breaks new ground with its type system and multiple
| dispatch, this could very much be an open question.
| adolph wrote:
| Oftentimes people describe languages as "Turing complete" but how
| often do they talk about languages being "Godel incomplete?"
| Another way of stating maybe is "Are what some call flaws what
| others call features?"
|
| https://stackoverflow.com/questions/7284/what-is-turing-comp...
|
| https://plato.stanford.edu/entries/goedel-incompleteness/
| mattkrause wrote:
| Even fairly simple arithmetic is incomplete, so unless the
| language is _heavily_ restricted, allowing only multiplication
| of positive integers (x)or addition of natural numbers, they're
| all going to be incomplete.
| randyzwitch wrote:
| Not specific to specific examples in the article, I think some of
| the things people perceive as "bugs" other people see as features
| or an opportunity to correct past mistakes.
|
| I can remember an example where I suggested automatic treatment
| of missing values in a stats library, and the library maintainer
| disagreed. Meaning, my lobbying for Julia to do what R/Python did
| was seen as "Yes, but that's wrong and we shouldn't promote that
| sort of treatment". As a business user, I didn't care that it was
| theoretically wrong, the maintainer as an academic did.
|
| That ends up becoming open-source prerogative. I could do it
| wrong "on my own time" in my own code...doesn't make either a
| bug, but a different choice based on perspective.
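|
| The two positions can be sketched in plain Python. This is a
| hypothetical pair of functions, not the API of the stats library
| under discussion, and NaN stands in for a missing value.
|

```python
import math


def mean_propagate(xs):
    # Strict choice: a missing value (NaN) poisons the result.
    return sum(xs) / len(xs)


def mean_skip_missing(xs):
    # Convenient choice: silently drop missing values first.
    kept = [x for x in xs if not math.isnan(x)]
    return sum(kept) / len(kept)
```

|
| For `[1.0, nan, 3.0]` the first returns NaN and the second
| returns 2.0. Both are defensible; they just answer different
| questions, which is the disagreement described above.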
| hzhou321 wrote:
| Think about programming layers:
| A->B->C->D->...->Compiler->binary->output, where A is the end
| programmer, and B, C, D are the libraries and modules. I think
| what the article describes is not much different from issues in
| any complicated software system, as quite a few comments also
| pointed out. However, as the language becomes more expressive
| and the compiler becomes more clever, more of the issues will be
| rooted in the compiler->binary link. I think this is
| inevitable with the current model of how software works, which I
| can simplify as: A -> [super compiler] -> output
|
| The middle part is the concatenation of all the middle links and
| handles the complexity necessary to translate from language to
| output. As we try to make A less complex, the middle [super
| compiler] will get more complex, and more buggy because of the
| complexity.
|
| I believe the fundamental issue with this model is the lack of
| feedback. A gets feedback on the output and makes changes (in A)
| until the output is correct. With the big, complex and opaque
| middle, for one, we can't get full feedback on the output -- that
| is the correctness issue. The more complex the middle gets, the
| less coverage the testing can achieve. For two, even with clear
| feedback -- a bug -- A cannot easily fix it. The logic from A to
| output is no longer understandable.
|
| I believe the solution is to abandon the pursuit of the magic
| solution of A -> [super compiler] -> output and instead focus on
| how to get feedback from every link in
| A->B->C->D->...->compiler->binary->output
|
| For one, this gives A a path to approach and handle complexity. A
| can choose to check on B or C or ... or directly on the output,
| depending on A's understanding and experience. At the least, A
| can point fingers correctly.
|
| For two, this provides a path to evolve the design. The initial
| decision about which layer handles which complexity, or how much
| of it, is no longer crucial. Each link, from A, to B, to C, ... to
| the compiler, can shift complexity up and down, and the system
| eventually settles into one that fits the problem and the team.
|
| I believe this is how natural language works. Initially A tells B
| to "get an apple", gets direct feedback on the end result (which
| apple B brings back), and may alter layer A by expanding it into
| more detail until it gets the right result. Then, some of the
| details will be handled by B, and A can give feedback on B's
| intermediate responses. As the world gets more complex, the
| complexity at layer A stays finite but we add middle layers.
| Usually, A only needs feedback on its immediate link (B) and the
| final output, but B needs to be able to get feedback on its next
| immediate link, and if A is capable, A may choose to cut out one
| of its middlemen.
| freemint wrote:
| As a huge fan of Julia I have to fully agree. Although I would
| probably not "no longer recommend Julia" but "give huge caveats
| when mentioning Julia". Organisations (including those that
| maintain programming languages) have values. Bryan Cantrill has an
| excellent talk on this, https://youtu.be/2wZ1pCpJUIM, and I have
| to agree with the author that correctness (especially correctness
| under arbitrary composability) is not a value that Julia teaches
| and instills in its users. Some Julia users care about this, and
| some core maintainers do too (as Pkg3 demonstrates). However there
| are many invocations (SafeTestSets vs Test) and stumbling blocks.
| I am aware of no efforts to do formal verification on Julia code.
| There are no good ways to turn certain run-time errors into
| compile-time errors. Correctness is not a value of the Julia
| language. Here is the good thing though: as Bryan demonstrates in
| his talk, you can hire for values.
| isaacimagine wrote:
| So it seems Julia's multiple dispatch (dynamic dispatch for any
| function based on argument types) has a flaw: namely, if the
| types used do not match assumptions present in the implementation
| of the function (e.g. arrays start at 1), the results may be
| silently incorrect. Julia's multiple dispatch is really cool, but
| I'm not sure how this issue can be prevented in practice (without
| a lot of added verbosity). It'd be a pity to have to restrict
| yourself to a small set of types you know work with the functions
| you're using, because multiple dispatch is one of Julia's killer
| features.
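|
| A rough illustration of this failure mode, written in Python
| rather than Julia. `OffsetList`, `diffs_zero_based` and
| `diffs_generic` are hypothetical names invented for this sketch,
| not part of any package. Python's negative-index wraparound stands
| in here for the skipped bounds checks that let a wrong index go
| unnoticed.

```python
class OffsetList:
    """Hypothetical sequence whose valid indices start at `first`, not 0."""

    def __init__(self, items, first):
        self._items = list(items)
        self.first = first

    def __len__(self):
        return len(self._items)

    def __getitem__(self, i):
        # Translate the caller's index into the underlying 0-based list.
        return self._items[i - self.first]

    def indices(self):
        # The only reliable way to learn which indices are valid.
        return range(self.first, self.first + len(self._items))


def diffs_zero_based(xs):
    """Adjacent differences, silently assuming indices run 0..len(xs)-1."""
    return [xs[i + 1] - xs[i] for i in range(len(xs) - 1)]


def indices_of(xs):
    """Ask the container for its indices; fall back to 0-based for plain lists."""
    return xs.indices() if hasattr(xs, "indices") else range(len(xs))


def diffs_generic(xs):
    """Adjacent differences that make no assumption about the first index."""
    idx = list(indices_of(xs))
    return [xs[j] - xs[i] for i, j in zip(idx, idx[1:])]


data = OffsetList([10, 20, 30], first=1)
print(diffs_zero_based([10, 20, 30]))  # [10, 10]   correct for a plain list
print(diffs_zero_based(data))          # [-20, 10]  wrong, and no error raised
print(diffs_generic(data))             # [10, 10]
```

| The generic version only works because both sides agree on an
| `indices()` convention, which is exactly the kind of interface
| the article says is left implicit.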
| hprotagonist wrote:
| Ouch. That sounds all the more damning for the author's studious
| care to calmly describe instead of angrily rant.
|
| I've spent too much time in research working on codebases that
| feel like quicksand -- you never know what changing something
| might do! -- to want to worry about that for stdlib or major
| package ecosystems, too.
| ble wrote:
| This article contains no instances of the word "test", which
| seems surprising but entirely in keeping with the author's
| observations.
|
| > Julia has no formal notion of interfaces, generic functions
| tend to leave their semantics unspecified in edge cases, and the
| nature of many common implicit interfaces has not been made
| precise (for example, there is no agreement in the Julia
| community on what a number is).
|
| > The Julia community is full of capable and talented people who
| are generous with their time, work, and expertise. But systemic
| problems like this can rarely be solved from the bottom up, and
| my sense is that the project leadership does not agree that there
| is a serious correctness problem. They accept the existence of
| individual isolated issues, but not the pattern that those issues
| imply.
|
| It sounds like the cultural standard for writing libraries is,
| "works good enough for users like me" which should be good if you
| are using things the same way as the authors. Writing good tests
| for numerics is hard and grueling; testing numerics or numerics-
| like code is not nearly as fun or productive-feeling as using
| numerics to get shit done, so it all makes sense to me.
| pankgeorg wrote:
| I feel this post is a bit unfair and quite outdated (it seems to
| have been written 9-12 months ago), and I interpret his issue as a
| prioritization issue, not a language one. If your priorities
| mandate a more mature ecosystem, you should use one. The Julia
| ecosystem is much smaller, both in terms of people and
| development invested, than Python, Java or JavaScript, and still
| overperforms in many aspects of computing. If those aspects,
| where Julia is first-of-class, are not your priorities, and your
| fault tolerance is very low, maybe another tool is better for
| you.
|
| Also, as with every ecosystem, the Julia ecosystem will naturally
| see some packages come and go. JSON3 is the third approach to
| reading JSON (and it's terrific). HTTP.jl is the reference HTTP
| implementation - Julia hasn't had its `requests.py` moment. Web
| frameworks have also been immature; Python has had `Django`,
| `pyramid`, `flask` and so many others before `FastAPI` (along
| with new language features) came and dominated. Some people need
| to put effort into attempts that will naturally hit a dead end
| before we have a super polished and neat FastAPI.jl, and the same
| goes for everything.
|
| Also, https://github.com/JuliaLang/julia/issues/41096 is
| referenced under a wrong name that reflects the issue author's
| misunderstanding. Could you please update it and, if possible,
| add a note about the edit?
| vasili111 wrote:
| Does Python have similar issues?
| IshKebab wrote:
| I tried Julia but the compilation time for interactive use was
| just too insane.
|
| I ended up paying £125 for MATLAB. Nothing else really remotely
| compares to MATLAB's plotting facilities.
| forgotpwd16 wrote:
|   Did you try Octave, GNU's numerical package that is
|   compatible with MATLAB?
| IshKebab wrote:
| Of course! The language implementation is decent and the GUI
| is promising, except for the most important feature of the
| GUI - the plot viewer, which is completely awful. Forget
| about the same league, it's not even playing the same game as
| MATLAB.
| [deleted]
| DNF2 wrote:
| I use Matlab daily, and the plotting is indeed excellent.
|
| But the language itself is a horrible kludgy mess. Most of the
| development time is spent on input parsing and contorting your
| code into a vectorized shape.
| Sporktacular wrote:
| This is a pity. It seems like a great language and I'd be keen to
| dive in more, but it seems fair to expect a math/numerical
| analysis-oriented language to be especially dependable wrt
| correctness.
|
| I remember a claim made by Mathworks about MATLAB and wondering
| if it wasn't far fetched, but if true I appreciate it: "A team of
| MathWorks engineers continuously verifies quality by running
| millions of tests on the MATLAB code base every day."
| https://www.mathworks.com/products/matlab/why-matlab.html#re...
| cbkeller wrote:
| I actually wouldn't be surprised if the total number of tests
| run in the Julia ecosystem wasn't too different (thousands of
| packages with typically hundreds to thousands of unit tests,
| run on every commit and PR) -- virtually every Julia package
| has CI set up (at least standalone unit tests, though many
| packages could use more integration tests). Of course, in
| neither Matlab nor Julia do tests guarantee correctness.
| Sporktacular wrote:
| Is that tests for the purpose of verifying correctness or
| tests of applications that will flag problems incidentally?
| I'm not too familiar, but like the idea of dedicating
| resources to that specifically.
|
| Guarantees aside, does MATLAB have an issue with this to the
| same extent as Julia?
| cbkeller wrote:
| Personally I'd probably categorize most unit tests as
| verifying correctness (but only for the scenarios tested);
| integration tests may be more useful for finding incidental
| issues that you wouldn't have thought to test for directly.
|       I'm for sure on board with dedicating more resources to
|       testing -- and in my case as an academic, this is something
|       I have only really been exposed to as a result of
|       interacting with the Julia community.
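|
|       A small Python sketch of that distinction. `Interval` and
|       `scale` are hypothetical stand-ins for two independently
|       written packages, made up for this example. Each passes its
|       own unit test, and only a test of the combination exposes
|       the mismatch.

```python
class Interval:
    """Hypothetical number-like type from 'package B'."""

    def __init__(self, lo, hi):
        self.lo, self.hi = lo, hi

    def __mul__(self, k):
        # Supports `interval * scalar` only; `scalar * interval` was never
        # needed by the author, so __rmul__ is missing.
        return Interval(self.lo * k, self.hi * k)

    def __eq__(self, other):
        return isinstance(other, Interval) and (self.lo, self.hi) == (other.lo, other.hi)


def scale(xs, k):
    """Generic helper from 'package A': multiply every element by k."""
    return [k * x for x in xs]


def unit_test_interval():
    assert Interval(1, 2) * 2 == Interval(2, 4)


def unit_test_scale():
    assert scale([1.0, 2.0], 2) == [2.0, 4.0]


def integration_test():
    # Exercises A and B together, a path neither unit test covers.
    assert scale([Interval(1, 2)], 2) == [Interval(2, 4)]


unit_test_interval()   # passes
unit_test_scale()      # passes
try:
    integration_test()
except TypeError as err:
    print("composition failed:", err)
```

|       Here the combination fails loudly with a `TypeError`; the
|       worse cases discussed in the article are the ones where it
|       returns a wrong answer instead.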
|
| Matlab is pretty mature at this point, but I'm sure it's
| had its share of bugs over the years as well (especially if
| you also counted the file exchange, which is probably the
| closest thing they have to an open source package
| ecosystem); it would be interesting to compare the two at a
| similar level of maturity / development person-hours if
| quantitative data could be found.
| adgjlsfhk1 wrote:
| sample size of 1, but I've run 1 billion tests today in Julia
| (floating point power for Float16, Float32 and Float64)
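|
|   Exhaustive testing is feasible for small float types because
|   there are only 2^16 half-precision bit patterns. The Python
|   sketch below is not the test run described above (that was in
|   Julia and targeted the power function); it only shows the
|   general shape, using the standard `struct` module's `e`
|   (binary16) format. The helper names and the round-trip property
|   are assumptions chosen for illustration.

```python
import math
import struct


def float16_from_bits(bits):
    """Reinterpret a 16-bit pattern as an IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<H", bits))[0]


def bits_from_float16(x):
    """Encode a Python float as half precision and return its bit pattern."""
    return struct.unpack("<H", struct.pack("<e", x))[0]


def failing_inputs(prop):
    """Check `prop(bits, x)` on every non-NaN half-precision value.

    Returns the list of bit patterns for which the property does not hold.
    """
    failures = []
    for bits in range(1 << 16):
        x = float16_from_bits(bits)
        if math.isnan(x):
            continue  # NaN payloads are not expected to round-trip
        if not prop(bits, x):
            failures.append(bits)
    return failures


def round_trips(bits, x):
    """Decoding then re-encoding should give back the same bit pattern."""
    return bits_from_float16(x) == bits


def square_is_non_negative(bits, x):
    """x * x should never be negative, including for infinities and zeros."""
    return x * x >= 0.0


print(len(failing_inputs(round_trips)))
print(len(failing_inputs(square_is_non_negative)))
```

|   The same loop does not scale to Float32 (2^32 inputs) in pure
|   Python, and Float64 cannot be covered exhaustively at all, so
|   larger types need sampling or a much faster language.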
| [deleted]
| blindseer wrote:
| Correctness in Julia feels like it'll never happen, because
| interfaces seem like they'll never happen.
|
| Correctness guarantees / interfaces and slow startup are my two
| biggest pain points in Julia.
|
| I often wonder what would happen if every Julia dev just dropped
| the language and used Rust instead. A scientific ecosystem in
| Rust would be amazing.
| jakobnissen wrote:
| As someone who really likes both Rust and Julia, there is
| absolutely no way Julia's scientific users would switch to a
| static language. Rust is slow to write, verbose, also suffers
| from long compile times, has no REPL or garbage collector... It
| is deeply unsuitable for scientific coding.
___________________________________________________________________
(page generated 2022-05-16 23:00 UTC)