[HN Gopher] JavaScript for Data Science
       ___________________________________________________________________
        
       JavaScript for Data Science
        
       Author : mrmagoo17
       Score  : 126 points
       Date   : 2021-04-25 08:50 UTC (14 hours ago)
        
 (HTM) web link (js4ds.org)
 (TXT) w3m dump (js4ds.org)
        
       | czep wrote:
       | To address some of the skepticism about when and where javascript
       | would be appropriate in data science, would you want to fit a
       | logistic regression model in javascript? Probably not, but to
       | build a solver that takes model outputs and visualizes the
       | changes in predicted probabilities based on different
       | combinations of variables? This is definitely where javascript
       | would make sense. Visualization, dashboards, reporting, and
       | exploratory analysis are all ripe domains for developing rich
       | responsive UIs. Basically, any layer where you have a data-to-
       | human interface can be leveraged with javascript.
       | 
       | There is a lot of great work happening in this space already. In
       | the R world for example, shiny makes heavy use of js to the point
       | that you often can't tell where R code ends and javascript
       | begins. Plotly's Dash provides bindings for R, Python, and Julia.
       | Personally, as a data scientist, I have been excitedly learning
       | React because it really rips the landscape wide open for all the
       | use cases I mentioned above. It then makes sense to have
       | libraries that give JS users a good data model and can do _most_
       | of the same numerical computation that we'd be doing in other
       | languages. Again, you probably don't want to do serious
       | numerical work in js, but remember people said that about Python
       | ten years ago too.
       | 
       | I love the framing of this book, because I want more data
       | scientists to start thinking about the presentation of data and
       | spark some bits of ingenuity to make datasets and model outputs
       | accessible to non-data scientists. Data scientists should be the
       | ones writing the tools that interface data with humans because of
       | their domain knowledge. But this is a different skillset and
       | usually the work of SW engineers. Of course engineers can also
       | have great data intuition too, but I really do encourage data
       | scientists to develop their front end skills, it's well worth it.
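
A minimal sketch of the "solver" idea from the comment above: take coefficients exported from a logistic regression fitted elsewhere and compute predicted probabilities for different combinations of variables, ready for a UI to display. The coefficient values and the variable names (`age`, `smoker`) are hypothetical, chosen only for illustration.

```javascript
// Hypothetical coefficients, as if exported from a model fitted in R or Python.
const model = { intercept: -1.5, coefficients: { age: 0.03, smoker: 0.8 } };

// Logistic (sigmoid) function: maps a linear score to a probability in (0, 1).
const sigmoid = (z) => 1 / (1 + Math.exp(-z));

// Predicted probability for one combination of input variables.
// Variables missing from `inputs` are treated as 0.
function predictProbability(model, inputs) {
  const score = Object.entries(model.coefficients).reduce(
    (sum, [name, beta]) => sum + beta * (inputs[name] ?? 0),
    model.intercept
  );
  return sigmoid(score);
}

// Build a grid of scenarios that a dashboard could render as a table or chart.
const scenarios = [30, 50, 70].flatMap((age) =>
  [0, 1].map((smoker) => ({
    age,
    smoker,
    probability: predictProbability(model, { age, smoker }),
  }))
);

console.log(scenarios);
```

The fitting stays in whatever language did it; only the cheap prediction step and the presentation run in JavaScript.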
        
       | brianzelip wrote:
       | Just putting this out there: stdlib - a standard library for js,
       | https://stdlib.io/.
        
       | m00dy wrote:
       | well, I was expecting training a neural network with web-assembly
       | through gpu support in its last chapter :)
        
       | la_fayette wrote:
       | Data science is not a standardized term, however I don't get what
       | specifically makes this text relevant for the domain of data
       | science... For some data science projects one could surely use
       | javascript, however in many cases one misses important libraries,
       | for purposes such as statistical analysis, data manipulation,
       | machine learning, ...
        
       | mark_l_watson wrote:
       | I thought of writing a Javascript + tensorflow.js + NLP + web
       | scraping + linked data + etc. book about a year ago.
       | tensorflow.js is especially very cool: well documented with great
       | examples. In fact, it was the great tensorflow.js examples and
       | demos that convinced me to not write the book because I didn't
       | feel like I could do much value add on that subject.
        
       | Rainymood wrote:
       | Really cool but no one needs this... as a data scientist learning
       | javascript, teach me how to run data science models using
       | javascript! That's where the real gold is... I'm even thinking of
       | writing articles about this myself... JS is great for making
       | things more tangible and interactive
        
       | splithalf wrote:
       | Data scientists are the new webmasters.
        
         | qntty wrote:
         | Could you elaborate?
        
       | jason0597 wrote:
       | Why on earth would you want to use JavaScript for Data Science?
        
         | nesarkvechnep wrote:
         | Because some people are monoglots :(
        
         | bambam24 wrote:
         | Because nobody wants to use Java?
        
       | talolard wrote:
       | As a data scientist who does more frontend, I think this is a
       | really valuable concept. How users/stakeholders engage with
       | our work is the way to push it forward in the org and a dash of
       | frontend can do wonders for getting that message across. It's
       | wonderful that people are making resources about the frontend for
       | data scientists
        
       | beforeolives wrote:
       | I know that data science is a broad and somewhat vague term but
       | this -
       | 
       | We will cover:
       | 
       | * Core features of modern JavaScript
       | * Programming with callbacks and promises
       | * Creating objects and classes
       | * Writing HTML and CSS
       | * Creating interactive pages with React
       | * Building data services
       | * Testing
       | * Data visualization
       | * Combining everything to create a three-tier web application
       | 
       | - this isn't data science.
        
         | d--b wrote:
         | Well the problem with "data science" is that it costs a shit
         | ton of money but rarely integrates into anything. A book about
         | wiring data science models into real user facing application
         | maybe isn't data science, but sure is useful...
        
         | jhbadger wrote:
         | The book does cover a lot of basic Javascript material, as its
         | target is actual natural scientists who may not have much
         | experience with the language, but towards the end it does cover
         | things like Data-Forge (which is a data science library in
         | Javascript)
        
         | zitterbewegung wrote:
         | It's more like Presenting and Serving Models using Javascript
         | for Data Science.
        
         | bryanrasmussen wrote:
         | nobody ever writes books assuming you know how to use the
         | language, I suppose it decreases customer base.
        
           | HWR_14 wrote:
           | It decreases the amount of boilerplate "how to program in X"
           | text you have to write. Producing text, especially novel
           | text, is expensive in a non-fiction book.
        
           | rapfaria wrote:
           | While understandable, I hate this. "Here's 100 pages of python
           | before we get to the good stuff", which ends up not even
           | being good.
           | 
           | Publishers should just offer a free e-book of said language,
           | and make it a requirement.
        
             | [deleted]
        
         | 11235813213455 wrote:
         | _out of context_ it's not data science
        
       | tharne wrote:
       | I don't see the point of this. You already have a ubiquitous,
       | easy-to-learn, high-level language that's great for data science,
       | it's called python. If you're a JavaScript developer who wants to
       | get into data science but are too lazy to learn python, you
       | probably weren't that interested in data science in the first
       | place.
       | 
       | Python definitely has some problems, but if you were going to
       | have a new lingua franca for data science, it would probably be
       | something like Julia, certainly not JavaScript.
        
       | slt2021 wrote:
       | hard pass.
       | 
       | even python is not used for data science, all heavy lifting is
       | done in C/fortran, and python is just a glue
        
       | bambam24 wrote:
       | We ran an experiment. We hired 4 Java developers, all senior, and
       | 1 full-stack JavaScript developer, and gave them the same task
       | without telling them. The result: within a week the single
       | JavaScript developer had completed the task, delivering a user
       | interface, AWS serverless, and scalable infra. When we asked the 4
       | senior Java developers for a status, they said they were still
       | designing, "thinking how to do it". At the end of the second week,
       | they were still struggling with Gradle and supporting
       | authentication.
       | 
       | And what they designed was to run k8s with EKS etc. Luckily they
       | are no longer working in our company.
        
       | danpalmer wrote:
       | I don't want to repeat the old and tired JavaScript hate, but
       | this just isn't a great idea.
       | 
       | I'd suggest that there are 3 important primitives for data
       | science: flexible numeric types, fast math/algorithm libraries,
       | and data manipulation being easy.
       | 
       | JavaScript doesn't really have any of these. Numbers are 64bit
       | floats only - no integers, no big numbers. There aren't
       | equivalents to Numpy/Pandas/Scikit Learn, and the lack of
       | standard library and expressiveness in data manipulation in the
       | language makes basic tasks harder.
       | 
       | JavaScript has its uses, but there's really no reason to force
       | data science to be one of them.
        
         | bryanrasmussen wrote:
         | the reason to force data science is the same as the reason to
         | develop libraries in languages for tasks which that language
         | might otherwise seem not well suited to, that there is a large
         | userbase of the language who know how to use it and would like
         | to explore using that language for doing other things than it
         | is normally used for. You may of course suggest that they
         | should just learn a new language, but the history of computing
         | shows that solutions for using languages to new purposes they
         | might not seem suited for happens whenever such a purpose
         | arises.
        
         | nosianu wrote:
         | > _Numbers are 64bit floats only - no integers, no big
         | numbers._
         | 
         | That is not true. BigInt has been available for a bit already.
         | 
         | MDN: https://developer.mozilla.org/en-
         | US/docs/Web/JavaScript/Refe...
         | 
         | Availability: https://caniuse.com/bigint
         | 
         | I don't want to argue for or against using JS for "data
         | science" (I myself used R for that but I use JS a lot for other
         | things), just a clarification on this one concrete claim.
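
To make the clarification above concrete, here is a small sketch contrasting the precision limit of `Number` with `BigInt`. It uses only built-in language features; it says nothing about performance, which is discussed in the replies.

```javascript
// Number is an IEEE 754 double: integers are exact only up to 2^53 - 1.
const maxSafe = Number.MAX_SAFE_INTEGER; // 9007199254740991
const numberCollision = maxSafe + 1 === maxSafe + 2; // true: precision is lost

// BigInt literals use the `n` suffix and keep full integer precision.
const big = BigInt(maxSafe);
const bigCollision = big + 1n === big + 2n; // false: values stay distinct

// Mixing the two types requires an explicit conversion.
const total = big + BigInt(10);

console.log(numberCollision, bigCollision, total.toString());
```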
        
           | mschuetz wrote:
           | > That is not true. BigInt has been available for a bit
           | already.
           | 
           | performance-wise, BigInts are terrible. Tried to use them,
           | made things about a hundred times slower. What JS needs are
           | 64 bit integer types, and some form of typing system that
           | allows differentiating between various number types.
        
             | felixfbecker wrote:
             | Genuine question: I imagine most data science things
             | involve arrays of numbers, not just single numbers. JS has
             | Uint8Array, i.e. it does kinda have integers if you want
             | them in an array anyway. Can that speed things up?
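
A short sketch of the typed arrays the question refers to. It only illustrates that fixed-width integer and float storage exists in the language; it does not measure speed, so it leaves the performance question open.

```javascript
// Uint8Array stores unsigned 8-bit integers in one contiguous buffer.
const bytes = new Uint8Array([250, 251, 252]);
bytes[0] += 10; // 260 wraps modulo 256 and is stored as 4

// Float64Array is the typed counterpart of an array of doubles.
const values = Float64Array.from({ length: 5 }, (_, i) => i * 0.5);

// Typed arrays support familiar helpers such as reduce and map.
const sum = values.reduce((acc, x) => acc + x, 0); // 0 + 0.5 + 1 + 1.5 + 2
const doubled = values.map((x) => x * 2); // the result is also a Float64Array

// BigInt64Array holds 64-bit signed integers as BigInt values.
const wide = new BigInt64Array([2n ** 62n]);

console.log(bytes[0], sum, doubled, wide[0]);
```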
        
               | [deleted]
        
             | spion wrote:
             | The JIT understands what number type you want and switches
             | between 31 bit ints and doubles when assumptions are
             | violated, without big performance loss. Something similar is
             | likely possible with bigints and 64bit ints.
        
         | v8dev123 wrote:
         | Well, nowadays you can use WASM with JS to access libraries at
         | near native speed.
        
         | RobinL wrote:
         | You can get a long way nowadays with Arquero[0] and
         | Observable[1]. Arquero allows columnar based data storage and
         | processing, with a grammar of data processing verbs similar to
         | e.g. dplyr. Not as fast as vectorized computations in e.g.
         | Python or R, but faster than has previously been possible.
         | 
         | I'm not suggesting these are the first tools you'd reach for
         | for data science in production, but I've found them extremely
         | useful for prototyping, experimenting with algorithms, and
         | visualization. I think it's got to the stage they should be
         | seriously considered for some types of relatively simple data
         | processing work due to their ease of deployment.
         | 
         | [0]https://github.com/uwdata/arquero
         | [1]https://observablehq.com/
        
         | jwilber wrote:
         | On point 3 - I had to implement a logistic regression model in
         | js recently and implementing all of the required math methods
         | (eg dot product, transpose, vectorized addition, etc.) were
         | actually super easy with js's functional array utilities.
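
The helpers mentioned above can be sketched with the built-in array methods roughly as follows. This is one possible way to write them, not the commenter's actual code; inputs are assumed to be plain arrays of matching length.

```javascript
// Dot product of two equal-length vectors.
const dot = (a, b) => a.reduce((sum, x, i) => sum + x * b[i], 0);

// Element-wise ("vectorized") addition of two equal-length vectors.
const add = (a, b) => a.map((x, i) => x + b[i]);

// Transpose of a non-empty rectangular matrix stored as an array of rows.
const transpose = (m) => m[0].map((_, col) => m.map((row) => row[col]));

// Logistic function, applied row by row to get predicted probabilities.
const sigmoid = (z) => 1 / (1 + Math.exp(-z));
const predict = (X, w) => X.map((row) => sigmoid(dot(row, w)));

console.log(dot([1, 2, 3], [4, 5, 6]));
console.log(add([1, 2], [3, 4]));
console.log(transpose([[1, 2, 3], [4, 5, 6]]));
console.log(predict([[0, 0], [1, 2]], [0.5, -0.25]));
```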
        
           | clircle wrote:
           | js doesn't have a glm library?
        
             | jwilber wrote:
             | js does have a glm library.
        
         | spion wrote:
         | Other than the fact we have BigInts now, we also have
         | 
         | * tensorflowjs, which runs on GPUs
         | https://www.tensorflow.org/js and
         | 
         | * danfo, which aims to be a pandas equivalent for JS:
         | https://danfo.jsdata.org/
         | 
         | Given the powerful interactive visualisation capabilities
         | available in JS, it's only a matter of time until JS becomes a
         | serious contender IMO.
        
           | mschuetz wrote:
           | > Other than the fact we have BigInts now
           | 
           | performance-wise, BigInts are terrible. Tried to use them,
           | made things about a hundred times slower.
        
             | spion wrote:
             | That's typical with most JS features, it takes some time
             | for engine performance optimizations to catch up with them.
             | In this particular case I suppose things are moving slower
             | than expected, but with demand increasing prioritization
             | will take place.
        
         | slver wrote:
         | JavaScript has plenty of libraries covering the basics. Here are
         | a few:
         | 
         | https://github.com/nicolaspanel/numjs
         | 
         | https://www.npmjs.com/package/fast-math
         | 
         | https://smartbear.com/de/blog/2013/four-serious-math-librari...
         | 
         | That's not the problem. The problem is mindshare and network
         | effects. When analyzing why Python is used one way and JS
         | another we're tempted to retroactively rationalize this with
         | something fundamental about the language. There's nothing
         | fundamental about it. It's just happenstance. Python was around
         | longer as a general purpose script, and it filled that niche.
         | JS is relatively new as a script outside the browser.
        
           | brylie wrote:
           | The first repo has one core contributor who hasn't been
           | active since June 2018.
           | 
           | https://github.com/nicolaspanel/numjs/graphs/contributors
           | 
           | I sincerely believe it is possible for JavaScript to be a
           | viable language ecosystem, but there is dire need for
           | cohesion, collaboration, and longevity. As it stands, there
           | are so many potentially viable projects strewn across the NPM
           | landscape like old, discarded toys.
           | 
           | I'm not aware of an initiative, let alone ethos, in the JS
           | community that comes anywhere close to something like
           | NumFocus.
           | 
           | https://numfocus.org/
        
             | brylie wrote:
             | It is worth mentioning the Danfo project from a sibling
             | comment: https://danfo.jsdata.org/
        
         | heresie-dabord wrote:
         | > repeat the old and tired JavaScript hate, but this just isn't
         | a great idea.
         | 
         | There is absolutely nothing wrong with
         | coders/analysts/scientists building solutions in any language.
         | The "hate" that you mention -- and then proceed to echo -- is a
         | narrow way of asserting the superiority of $mylanguage and the
         | inferiority of $yourlanguage.
         | 
         | > flexible numeric types, fast math/algorithm libraries, and
         | data manipulation
         | 
         | Your point b) is usually written in a performant, compiled
         | language, and your point c) can be built from robust primitives
         | in any language. However, I will add a point d) about speed and
         | memory usage.
         | 
         | I do data analysis with the simplest set of performant tools:
         | sqlite, bash-awk-sed-grep, Perl, Python, C++, SVG, and a
         | browser to render. Any kind of glorified REPL beyond a terminal
         | creates fragile complexity and dependency Hell.
         | 
         | My kit doesn't include Node.js or ECMAscript but I'm willing to
         | open my mind enough to think it might, one day. The current
         | tooling for data analysis (or "data science" if we want to be
         | faddish) is a mess and I look forward to better tools in the
         | future.
        
         | nsonha wrote:
         | the only thing that used to be a problem is the number type.
         | Libraries are ecosystem problem, not inherent to the language.
        
         | lhnz wrote:
         | You should absolutely read "JavaScript and the next decade of
         | data programming" by Ben Schmidt [1] before outright saying
         | that it wouldn't be a great idea.
         | 
         | JavaScript does have integers (e.g. `Uint8Array`) and it also
         | has big numbers (e.g. `BigInt`). It's true that there's not yet
         | an equivalent to Numpy/Pandas/Scikit yet, but POCs show that it
         | will be possible to create such a thing and that we will be
         | able to use the WebGPU API to access higher performance than is
         | available using Python [2].
         | 
         | I'm not saying that it will definitely happen, but why not?
         | 
         | [1] http://benschmidt.org/post/2020-01-15/2020-01-15-webgpu/
         | 
         | [2] https://github.com/milhidaka/webgpu-blas
        
         | RedShift1 wrote:
         | There is decimal.js but yes it's not going to be fast.
        
         | 11235813213455 wrote:
         | I don't see JS as less powerful than Python for data science,
         | it's faster than Python, or can use bindings just like Python.
         | JS is maybe less commonly used than Python in data science
         | nowadays, but I wouldn't be surprised if this changes in next
         | years. There are equivalent libs like tensorflow-core, there
         | are native features like BigInt, and there are libs for 64bits
         | floats (decimal.js, big.js). I'd be glad to spend some time
         | converting Scikit-learn into JS and also show you how
         | expressive JS actually is, if you show me some Python code,
         | I'll translate it
        
           | danpalmer wrote:
           | The fact that the number support isn't part of the language
           | is kinda the problem though.
           | 
           | When you're writing data science code, the value is in the
           | answer more than the process of getting to that answer.
           | Anything that complicates that gets in the way. This is why
           | things like Pandas are so popular despite having some
           | questionable engineering. Using a library for big number
           | support, having to get that to play nicely with other
           | libraries, it all goes against the aims.
           | 
           | Now for data engineering it's very different. I wouldn't
           | choose JS myself, but it's a much more reasonable choice. For
           | engineering the process by which you get the answer matters
           | far more - is it scalable, testable, repeatable, etc. Having
           | to use a library for big number support is fine.
           | 
           | It's two very different ways of working and I'm still fairly
           | convinced that JS is not conducive to the former.
        
           | uryga wrote:
           | > libs for 64bits floats (decimal.js, big.js)
           | 
           | both of those libraries are for arbitrary precision decimals,
           | not floats.
        
             | __jem wrote:
             | If it's arbitrary precision, what's the difference, besides
             | slightly more bookkeeping on your end?
        
           | tyingq wrote:
           | >it's faster than Python
           | 
           | Is that generally true for data science type tasks, though,
           | where the "fast" in python is really numpy, pandas, etc?
           | 
           | >or can use bindings just like Python
           | 
           | But there's not really anything like numpy/pandas for it to
           | bind to at the moment, is there? Meaning anything as broad in
           | functionality, fast, mature, etc.
        
       | genrez wrote:
       | I am a noob to Javascript, so if someone knows better, then
       | please correct me about this, but arrow functions aren't meant to
       | replace normal function syntax, right? From [1], it seems like
       | the main point of arrow syntax is to allow you to inherit the
       | "this" parameter if you are inside a method. Meanwhile, you need
       | normal function syntax if you are creating a constructor, making
       | a method function for a prototype, or making generator functions.
       | (I didn't even know javascript had generator functions until just
       | now :))
       | 
       | So it seems a bit weird to me that they advocate using arrow
       | function syntax instead of the regular syntax. They seem to be
       | advocating using the new class syntax instead, so I guess they
       | don't need the constructor or method creation features of the
       | normal syntax, but I still don't see why they would specifically
       | advocate for arrow function syntax. Is it faster? They say it
       | interferes with other features, but which features?
       | 
       | [1] https://developer.mozilla.org/en-
       | US/docs/Web/JavaScript/Refe...
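
The limitations listed in the comment above can be checked directly. A small sketch, with `Point`, `makePoint`, and `countTo` as made-up names for illustration:

```javascript
// A regular function can serve as a constructor.
function Point(x) {
  this.x = x;
}
const p = new Point(3);

// An arrow function cannot be constructed: `new` throws a TypeError.
const makePoint = (x) => ({ x });
let arrowError = null;
try {
  new makePoint(3);
} catch (err) {
  arrowError = err;
}

// Generators need the `function*` form; there is no arrow equivalent.
function* countTo(n) {
  for (let i = 1; i <= n; i++) yield i;
}
const counted = [...countTo(3)];

// Regular functions carry a `prototype` property; arrow functions do not.
const hasOwn = (obj, key) => Object.prototype.hasOwnProperty.call(obj, key);
console.log(p.x, arrowError instanceof TypeError, counted);
console.log(hasOwn(Point, "prototype"), hasOwn(makePoint, "prototype"));
```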
        
         | __jem wrote:
         | Not changing `this` is a huge benefit that shouldn't be
         | ignored. Especially when you're programming in a more
         | functional style, it makes sense to default to arrow functions
         | because you never want to engage in `this` shenanigans anyway.
         | So, yes, I'd say it's a pretty common idiom in the JS community
         | to replace "normal" function declarations.
        
           | genrez wrote:
           | I agree that inheriting the `this` for arrow functions is
           | beneficial. To me it seems like you would want to use the
           | normal syntax for global functions for hoisting and to
           | prevent unintentional re-definitions, the arrow functions
           | where you would use lambda functions in other languages, and
           | the class method syntax for methods.
           | 
           | side-note: Most of my JS experience is writing userscripts
           | for myself, so I definitely do my share of 'this'
           | shenanigans.
        
             | ctidd wrote:
             | As a heads up since you mentioned "class method syntax",
             | methods are one of the most important places to have
             | lexical `this` binding in many scenarios.
             | 
             | Take the following example, which is a normal class method:
             | 
             | > alertSum() { alert(this.a + this.b); }
             | 
             | And here we have an arrow function used to create an
             | instance method (just an arrow function assigned to a
             | property on the instance):
             | 
             | > alertSum = () => { alert(this.a + this.b); }
             | 
             | Then let's say we want to pass the method directly as
             | callback:
             | 
             | > this.button.addEventListener('click', this.alertSum)
             | 
             | The first example (class method syntax) won't have the
             | necessary `this` context unless it has its context bound to
             | the instance through `Function.prototype.bind`. There are
             | other patterns to avoid this (e.g. wrapping all callbacks
             | in arrow functions when passing them), but it's useful to
             | consider that class methods can easily create confusion
             | because that's _exactly where_ someone more used to a
             | different language may assume the `this` context is bound
             | lexically.
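
A runnable sketch of the difference described above, without a DOM. `invokeLater` is a hypothetical stand-in for any API that stores a callback and later calls it without the original receiver. With a real `addEventListener`, `this` would be the button element rather than `undefined`, so the symptom would differ, but the cause is the same.

```javascript
class Adder {
  constructor(a, b) {
    this.a = a;
    this.b = b;
  }

  // Prototype method: `this` depends on how the function is called.
  sumMethod() {
    return this.a + this.b;
  }

  // Class field holding an arrow function: `this` is fixed to the instance.
  sumArrow = () => this.a + this.b;
}

// Stand-in for an API that calls a stored callback with no receiver.
const invokeLater = (callback) => callback();

const adder = new Adder(2, 3);

let methodResult;
try {
  methodResult = invokeLater(adder.sumMethod);
} catch (err) {
  methodResult = err; // class bodies are strict, so `this` is undefined here
}

const arrowResult = invokeLater(adder.sumArrow);
const boundResult = invokeLater(adder.sumMethod.bind(adder));

console.log(methodResult instanceof TypeError, arrowResult, boundResult);
```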
        
               | genrez wrote:
               | Excellent point! I can see that getting confusing
               | quickly.
               | 
               | Edit: I was confused about how this could work, so I dug
               | through [1] for a bit. It appears that for each object of
               | that class created, an arrow function will be created on
               | that object and its this will indeed be bound to the same
               | scope that the constructor function is bound to. This is
               | really clever and I applaud whoever thought it up!
               | 
               | It is interesting to note that this creates a new arrow
               | function on each object as opposed to the normal
               | definitions which create a single function which is
               | stored in the prototype of the class. (it's easier to
               | check this in a browser's dev console than it is to
               | decode the spec)
               | 
               | This would suggest that one should use different
               | approaches for different types of objects: It makes sense
               | to use arrow functions for "resource" or "actor" objects,
               | of which there are few but they may have callback
               | functions. It makes sense to use normal method
               | definitions for "plain old data", of which there may be
               | many, (which would make the arrow functions too
               | expensive) but they should not have callback functions.
               | 
               | [1] https://tc39.es/proposal-class-fields/unified.html
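
The per-instance versus shared-prototype observation above can be verified in a few lines. `Item` is a made-up class for illustration:

```javascript
class Item {
  constructor(value) {
    this.value = value;
  }

  // Defined once, stored on Item.prototype, shared by all instances.
  getValue() {
    return this.value;
  }

  // Class field: a new arrow function is created for every instance.
  getValueArrow = () => this.value;
}

const first = new Item(1);
const second = new Item(2);
const hasOwn = (obj, key) => Object.prototype.hasOwnProperty.call(obj, key);

const sharesMethod = first.getValue === second.getValue;
const sharesArrow = first.getValueArrow === second.getValueArrow;

console.log(sharesMethod, sharesArrow);
console.log(hasOwn(first, "getValueArrow"), hasOwn(first, "getValue"));
console.log(hasOwn(Item.prototype, "getValue"));
```

The extra function per object is the cost the comment weighs against the convenience of a pre-bound callback.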
        
               | pwdisswordfish0 wrote:
               | > This is really clever and I applaud whoever thought it
               | up!
               | 
               | Not really. It's contortionist and wasteful and one of
               | the many reasons why mainstream web apps are one big
               | celebration of bloat on a boat.
               | 
               | The neophyte programmers who have turned into expert
               | Modern JS programmers are always recommending arrow
               | functions like this because they've never actually looked
               | at the event listener interface. What happens is they try
               | to make things more complicated than they need to be and
               | bodge their event registration. So they apply a "fix" by
               | doing what they do with everything else: layering on even
               | more. "What we need," they say, "are arrow functions."
               | 
               | No.
               | 
               | Go the other way. Approach it more sensibly. You'll end
               | up with a fix that is shorter than the answer that the
               | cargo cult NPM/GitHub/Twitter programmers give. It's
               | familiar to anyone coming from a world with interfaces as
               | a language-level construct and therefore knows to _go
               | look at the interface definition of the interface that
               | you're trying to implement_.
               | 
               | Make your line for registering an event listener look
               | like this: `this.button.addEventListener("click", this)`,
               | and change the name of your `alertSum` method to
               | `handleEvent`. (Read it aloud. The object that we're
               | dealing with (`this`) is something that we need to be
               | able to respond to clicks, so we have it listen for them.
               | Gee, what a concept.) In other words, the real fix is to
               | make sure that the thing we're passing in to
               | `addEventListener` is... actually an event listener.
               | 
               | This goes over 90% of frontend developers' heads (and
               | even showing them this leads to them crying foul in some
               | way; I've seen them try to BS their way through the
               | embarrassment before) because most of the codebases they
               | learned from were written by other people who, like
               | themselves, only barely knew what they were doing. Get
               | enough people taking this monkey-see-monkey-do approach,
               | and from there you get "idioms" and "best practices" (no
               | matter whether they were even "good" in the first place,
               | let alone best).
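
A sketch of the `handleEvent` approach described above. It assumes a runtime where `EventTarget` and `Event` are globals (browsers, and recent Node.js releases); a plain `EventTarget` stands in for the button, and `SumButton` and `lastSum` are hypothetical names.

```javascript
class SumButton {
  constructor(target, a, b) {
    this.a = a;
    this.b = b;
    this.lastSum = null;
    // Pass the object itself: the target calls its handleEvent method,
    // with `this` set to the object, so no bind or arrow function is needed.
    target.addEventListener("click", this);
  }

  // Part of the DOM EventListener interface.
  handleEvent(event) {
    if (event.type === "click") {
      this.lastSum = this.a + this.b;
    }
  }
}

// Stand-in for a real button element.
const target = new EventTarget();
const widget = new SumButton(target, 2, 3);

target.dispatchEvent(new Event("click"));
console.log(widget.lastSum);
```

Since `handleEvent` lives on the prototype, every instance shares one function, and the same object can later be passed to `removeEventListener`.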
        
               | [deleted]
        
         | mLuby wrote:
         | I've seen a majority of sources abandon the function keyword
         | entirely in favor of const arrow declarations (and shorthand
         | method syntax).
         | 
         | FWIW I personally like the function keyword, since it's clear
         | what it is to non-JS readers, but primarily because it hoists
         | to the top of its file, so unimportant utility functions can
         | sit unobtrusively at the end of the file, thereby letting
         | readers encounter more important logic earlier in the file.
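
The hoisting behaviour mentioned above, in a minimal sketch (`formatTotal` and `formatTotalArrow` are made-up names): a `function` declaration is usable before the line that defines it, while a `const` arrow function is not.

```javascript
// Important logic first: this call works because the declaration is hoisted.
const report = formatTotal([1, 2, 3]);

// A const binding is in its temporal dead zone until its definition runs.
let earlyArrowError = null;
try {
  formatTotalArrow([1, 2, 3]);
} catch (err) {
  earlyArrowError = err; // ReferenceError
}

console.log(report, earlyArrowError instanceof ReferenceError);

// Utility helpers can sit at the end of the file.
function formatTotal(xs) {
  return `total: ${xs.reduce((a, b) => a + b, 0)}`;
}

const formatTotalArrow = (xs) => `total: ${xs.reduce((a, b) => a + b, 0)}`;
```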
        
           | [deleted]
        
           | genrez wrote:
           | Interesting to know that what the article recommends is
           | indeed the industry standard. I'd forgotten about hoisting
           | until you brought it up!
        
       | temp8964 wrote:
       | They use data-forge.js, which has fewer stars than danfo.js.
       | 
       | I can't find any benchmark of how they compare to data.table or
       | pandas.
       | 
       | Without a dominant and high performance data frame library as a
       | foundation, I wouldn't even try.
        
       ___________________________________________________________________
       (page generated 2021-04-25 23:01 UTC)