[HN Gopher] JavaScript for Data Science
___________________________________________________________________
JavaScript for Data Science
Author : mrmagoo17
Score : 126 points
Date : 2021-04-25 08:50 UTC (14 hours ago)
(HTM) web link (js4ds.org)
(TXT) w3m dump (js4ds.org)
| czep wrote:
| To address some of the skepticism about when and where javascript
| would be appropriate in data science, would you want to fit a
| logistic regression model in javascript? Probably not, but to
| build a solver that takes model outputs and visualizes the
| changes in predicted probabilities based on different
| combinations of variables? This is definitely where javascript
| would make sense. Visualization, dashboards, reporting, and
| exploratory analysis are all ripe domains for developing rich
| responsive UIs. Basically, any layer where you have a data-to-
| human interface can be leveraged with javascript.
|
| There is a lot of great work happening in this space already. In
| the R world for example, shiny makes heavy use of js to the point
| that you often can't tell where R code ends and javascript
| begins. Plotly's Dash provides bindings for R, Python, and Julia.
| Personally, as a data scientist, I have been excitedly learning
| React because it really rips the landscape wide open for all the
| use cases I mentioned above. It then makes sense to have
| libraries that give JS users a good data model and can do _most_
| of the same numerical computation that we'd be doing in other
| languages. Again, you probably don't want to do serious
| numerical work in js, but remember people said that about Python
| ten years ago too.
|
| I love the framing of this book, because I want more data
| scientists to start thinking about the presentation of data and
| spark some bits of ingenuity to make datasets and model outputs
| accessible to non-data scientists. Data scientists should be the
| ones writing the tools that interface data with humans because of
| their domain knowledge. But this is a different skillset and
| usually the work of SW engineers. Of course engineers can also
| have great data intuition too, but I really do encourage data
| scientists to develop their front end skills, it's well worth it.
| brianzelip wrote:
| Just putting this out there: stdlib - a standard library for js,
| https://stdlib.io/.
| m00dy wrote:
| well, I was expecting training a neural network with web-assembly
| through gpu support in its last chapter :)
| la_fayette wrote:
| Data science is not a standardized term, however I don't get what
| specifically makes this text relevant for the domain of data
| science... For some data science projects one could surely use
| javascript, however in many cases one misses important libraries,
| for purposes such as statistical analysis, data manipulation,
| machine learning, ...
| mark_l_watson wrote:
| I thought of writing a Javascript + tensorflow.js + NLP + web
| scraping + linked data + etc. book about a year ago.
| tensorflow.js is especially very cool: well documented with great
| examples. In fact, it was the great tensorflow.js examples and
| demos that convinced me to not write the book because I didn't
| feel like I could do much value add on that subject.
| Rainymood wrote:
| Really cool but no one needs this... as a data scientist learning
| javascript, teach me how to run data science models using
| javascript! That's where the real gold is... I'm even thinking of
| writing articles about this myself... JS is great for making
| things more tangible and interactive
| splithalf wrote:
| Data scientists are the new webmasters.
| qntty wrote:
| Could you elaborate?
| jason0597 wrote:
| Why on earth would you want to use JavaScript for Data Science?
| nesarkvechnep wrote:
| Because some people are monoglots :(
| bambam24 wrote:
| Because nobody wants to use Java?
| talolard wrote:
| As a data scientist who does more frontend, I think this is a
| really valuable concept. How users/stakeholders engage with
| our work is the way to push it forward in the org and a dash of
| frontend can do wonders for getting that message across. It's
| wonderful that people are making resources about the frontend for
| data scientists
| beforeolives wrote:
| I know that data science is a broad and somewhat vague term but
| this -
| We will cover:
| * Core features of modern JavaScript
| * Programming with callbacks and promises
| * Creating objects and classes
| * Writing HTML and CSS
| * Creating interactive pages with React
| * Building data services
| * Testing
| * Data visualization
| * Combining everything to create a three-tier web application
|
| - this isn't data science.
| d--b wrote:
| Well the problem with "data science" is that it costs a shit
| ton of money but rarely integrates into anything. A book about
| wiring data science models into real user facing application
| maybe isn't data science, but sure is useful...
| jhbadger wrote:
| The book does cover a lot of basic Javascript material, as its
| target is actual natural scientists who may not have much
| experience with the language, but towards the end it does cover
| things like Data-Forge (which is a data science library in
| Javascript)
| zitterbewegung wrote:
| It's more like Presenting and Serving Models using Javascript
| for Data Science.
| bryanrasmussen wrote:
| nobody ever writes books assuming you know how to use the
| language, I suppose it decreases customer base.
| HWR_14 wrote:
| It decreases the amount of boilerplate "how to program in X"
| text you have to write. Producing text, especially novel
| text, is expensive in a non-fiction book.
| rapfaria wrote:
| While understandable, I hate this. "Here's 100 pages of python
| before we get to the good stuff", which ends up not even
| being good.
|
| Publishers should just offer a free e-book of said language,
| and make it a requirement.
| [deleted]
| 11235813213455 wrote:
| _out of context_ it's not data science
| tharne wrote:
| I don't see the point of this. You already have a ubiquitous,
| easy-to-learn, high-level language that's great for data science,
| it's called python. If you're a JavaScript developer who wants to
| get into data science but are too lazy to learn python, you
| probably weren't that interested in data science in the first
| place.
|
| Python definitely has some problems, but if you were going to
| have a new lingua franca for data science, it would probably be
| something like Julia, certainly not JavaScript.
| slt2021 wrote:
| hard pass.
|
| even python is not used for data science, all heavy lifting is
| done in C/fortran, and python is just a glue
| bambam24 wrote:
| We ran an experiment. We hired 4 Java developers, all senior, and
| 1 full-stack JavaScript developer, and gave them the same tasks
| without telling them. The result: within a week the single
| JavaScript developer had completed the task, delivering a user
| interface, AWS serverless, and scalable infra. When we asked the
| 4 senior Java developers for their status, they said they were
| still designing, "thinking how to do it". At the end of the
| second week, they were still struggling with Gradle and
| supporting authentication.
|
| And what they designed was to run k8s with EKS etc. Luckily they
| are no longer working in our company.
| danpalmer wrote:
| I don't want to repeat the old and tired JavaScript hate, but
| this just isn't a great idea.
|
| I'd suggest that there are 3 important primitives for data
| science: flexible numeric types, fast math/algorithm libraries,
| and data manipulation being easy.
|
| JavaScript doesn't really have any of these. Numbers are 64bit
| floats only - no integers, no big numbers. There aren't
| equivalents to Numpy/Pandas/Scikit Learn, and the lack of
| standard library and expressiveness in data manipulation in the
| language makes basic tasks harder.
|
| JavaScript has its uses, but there's really no reason to force
| data science to be one of them.
| bryanrasmussen wrote:
| the reason to force data science is the same as the reason to
| develop libraries in languages for tasks which that language
| might otherwise seem not well suited to, that there is a large
| userbase of the language who know how to use it and would like
| to explore using that language for doing other things than it
| is normally used for. You may of course suggest that they
| should just learn a new language, but the history of computing
| shows that solutions for using languages to new purposes they
| might not seem suited for happens whenever such a purpose
| arises.
| nosianu wrote:
| > _Numbers are 64bit floats only - no integers, no big
| numbers._
|
| That is not true. BigInt has been available for a bit already.
|
| MDN: https://developer.mozilla.org/en-US/docs/Web/JavaScript/Refe...
|
| Availability: https://caniuse.com/bigint
|
| I don't want to argue for or against using JS for "data
| science" (I myself used R for that but I use JS a lot for other
| things), just a clarification on this one concrete claim.
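A minimal sketch of the claim being clarified here, assuming an engine with BigInt support (see the caniuse link above). The variable names are illustrative only. It shows where plain Number values stop being exact integers and how BigInt keeps going.

```javascript
// Number is an IEEE 754 double: integers are exact only up to 2^53 - 1.
const maxSafe = Number.MAX_SAFE_INTEGER;
const numberCollides = maxSafe + 1 === maxSafe + 2; // precision is lost here

// BigInt literals end in `n` and keep every digit.
const big = BigInt(maxSafe);
const bigStaysDistinct = big + 1n !== big + 2n;

// Mixing Number and BigInt in arithmetic throws a TypeError,
// so conversions have to be explicit.
const product = 2n ** 64n * BigInt(3);

console.log(numberCollides, bigStaysDistinct, product.toString());
```

This says nothing about speed; it only shows that arbitrary-size integers exist as a language-level type.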
| mschuetz wrote:
| > That is not true. BigInt has been available for a bit
| already.
|
| performance-wise, BigInts are terrible. Tried to use them,
| made things about a hundred times slower. What JS needs are
| 64 bit integer types, and some form of typing system that
| allows differentiating between various number types.
| felixfbecker wrote:
| Genuine question: I imagine most data science things
| involve arrays of numbers, not just single numbers. JS has
| UInt8Array, i.e. it does kinda have integers if you want
| them in an array anyway. Can that speed things up?
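For reference, a small sketch of the typed arrays the question refers to. Whether they actually speed a given workload up depends on the engine and the access pattern, so this only illustrates the fixed-width storage they provide. Variable names are illustrative.

```javascript
// Typed arrays store fixed-width numbers in contiguous memory.
const bytes = new Uint8Array([250, 251, 252]);
bytes[0] += 10; // 260 does not fit in 8 bits, so it wraps modulo 256

const ints = new Int32Array(4).fill(7);
const doubles = Float64Array.from(ints, (x) => x / 2);

// BigInt64Array holds 64-bit signed integers, exposed as BigInt values.
const wide = new BigInt64Array([2n ** 62n]);

// Array helpers such as reduce work on typed arrays too.
const total = doubles.reduce((sum, x) => sum + x, 0);

console.log(bytes[0], total, wide[0]);
```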
| [deleted]
| spion wrote:
| The JIT understands what number type you want and
| switches between 31 bit ints and doubles when assumptions
| are violated without big performance loss. Something
| similar is likely possible with bigints and 64bit ints
| v8dev123 wrote:
| Well, nowadays you can use WASM with JS to access libraries at
| near native speed.
| RobinL wrote:
| You can get a long way nowadays with Arquero[0] and
| Observable[1]. Arquero allows columnar based data storage and
| processing, with a grammar of data processing verbs similar to
| e.g. dplyr. Not as fast as vectorized computations in e.g.
| Python or R, but faster than has previously been possible.
|
| I'm not suggesting these are the first tools you'd reach for
| for data science in production, but I've found them extremely
| useful for prototyping, experimenting with algorithms, and
| visualization. I think it's got to the stage they should be
| seriously considered for some types of relatively simple data
| processing work due to their ease of deployment.
|
| [0]https://github.com/uwdata/arquero
| [1]https://observablehq.com/
| jwilber wrote:
| On point 3 - I had to implement a logistic regression model in
| js recently and implementing all of the required math methods
| (eg dot product, transpose, vectorized addition, etc.) were
| actually super easy with js's functional array utilities.
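A rough sketch of what such helpers might look like with plain arrays and the built-in `map`/`reduce`. This is not the commenter's code; the function names and the `predict` helper are assumptions made for illustration, and nothing here is optimized.

```javascript
// Basic vector and matrix helpers built on Array.prototype methods.
const dot = (a, b) => a.reduce((sum, x, i) => sum + x * b[i], 0);
const add = (a, b) => a.map((x, i) => x + b[i]);
const transpose = (m) => m[0].map((_, j) => m.map((row) => row[j]));
const sigmoid = (z) => 1 / (1 + Math.exp(-z));

// Predicted probability for each row of X, given weights w and bias b.
const predict = (X, w, b) => X.map((row) => sigmoid(dot(row, w) + b));

const X = [[1, 2], [3, 4]];
console.log(transpose(X));
console.log(add([1, 2], [3, 4]));
console.log(predict(X, [0, 0], 0));
```

Fitting the weights (for example by gradient descent) would sit on top of these helpers and is left out here.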
| clircle wrote:
| js doesn't have a glm library?
| jwilber wrote:
| js does have a glm library.
| spion wrote:
| Other than the fact we have BigInts now, we also have
|
| * tensorflowjs, which runs on GPUs
| https://www.tensorflow.org/js and
|
| * danfo, which aims to be a pandas equivalent for JS:
| https://danfo.jsdata.org/
|
| Given the powerful interactive visualisation capabilities
| available in JS, it's only a matter of time until JS becomes a
| serious contender IMO.
| mschuetz wrote:
| > Other than the fact we have BigInts now
|
| performance-wise, BigInts are terrible. Tried to use them,
| made things about a hundred times slower.
| spion wrote:
| That's typical with most JS features, it takes some time
| for engine performance optimizations to catch up with them.
| In this particular case I suppose things are moving slower
| than expected, but with demand increasing prioritization
| will take place.
| slver wrote:
| JavaScript has plenty of libraries covering the basics. Here are a
| few:
|
| https://github.com/nicolaspanel/numjs
|
| https://www.npmjs.com/package/fast-math
|
| https://smartbear.com/de/blog/2013/four-serious-math-librari...
|
| That's not the problem. The problem is mindshare and network
| effects. When analyzing why Python is used one way and JS
| another we're tempted to retroactively rationalize this with
| something fundamental about the language. There's nothing
| fundamental about it. It's just happenstance. Python was around
| longer as a general purpose script, and it filled that niche.
| JS is relatively new as a script outside the browser.
| brylie wrote:
| The first repo has one core contributor who hasn't been
| active since June 2018.
|
| https://github.com/nicolaspanel/numjs/graphs/contributors
|
| I sincerely believe it is possible for JavaScript to be a
| viable language ecosystem, but there is dire need for
| cohesion, collaboration, and longevity. As it stands, there
| are so many potentially viable projects strewn across the NPM
| landscape like old, discarded toys.
|
| I'm not aware of an initiative, let alone ethos, in the JS
| community that comes anywhere close to something like
| NumFocus.
|
| https://numfocus.org/
| brylie wrote:
| It is worth mentioning the Danfo project from a sibling
| comment: https://danfo.jsdata.org/
| heresie-dabord wrote:
| > repeat the old and tired JavaScript hate, but this just isn't
| a great idea.
|
| There is absolutely nothing wrong with
| coders/analysts/scientists building solutions in any language.
| The "hate" that you mention -- and then proceed to echo -- is a
| narrow way of asserting the superiority of $mylanguage and the
| inferiority of $yourlanguage.
|
| > flexible numeric types, fast math/algorithm libraries, and
| data manipulation
|
| Your point b) is usually written in a performant, compiled
| language, and your point c) can be built from robust primitives
| in any language. However, I will add a point d) about speed and
| memory usage.
|
| I do data analysis with the simplest set of performant tools:
| sqlite, bash-awk-sed-grep, Perl, Python, C++, SVG, and a
| browser to render. Any kind of glorified REPL beyond a terminal
| creates fragile complexity and dependency Hell.
|
| My kit doesn't include Node.js or ECMAscript but I'm willing to
| open my mind enough to think it might, one day. The current
| tooling for data analysis (or "data science" if we want to be
| faddish) is a mess and I look forward to better tools in the
| future.
| nsonha wrote:
| the only thing that used to be a problem is the number type.
| Libraries are ecosystem problem, not inherent to the language.
| lhnz wrote:
| You should absolutely read "JavaScript and the next decade of
| data programming" by Ben Schmidt [1] before outright saying
| that it wouldn't be a great idea.
|
| JavaScript does have integers (e.g. `Uint8Array`) and it also
| has big numbers (e.g. `BigInt`). It's true that there's not yet
| an equivalent to Numpy/Pandas/Scikit, but POCs show that it
| will be possible to create such a thing and that we will be
| able to use the WebGPU API to access higher performance than is
| available using Python [2].
|
| I'm not saying that it will definitely happen, but why not?
|
| [1] http://benschmidt.org/post/2020-01-15/2020-01-15-webgpu/
|
| [2] https://github.com/milhidaka/webgpu-blas
| RedShift1 wrote:
| There is decimal.js but yes it's not going to be fast.
| 11235813213455 wrote:
| I don't see JS as less powerful than Python for data science,
| it's faster than Python, or can use bindings just like Python.
| JS is maybe less commonly used than Python in data science
| nowadays, but I wouldn't be surprised if this changes in the next few
| years. There are equivalent libs like tensorflow-core, there
| are native features like BigInt, and there are libs for 64bits
| floats (decimal.js, big.js). I'd be glad to spend some time
| converting Scikit-learn into JS and also show you how
| expressive JS actually is, if you show me some Python code,
| I'll translate it
| danpalmer wrote:
| The fact that the number support isn't part of the language
| is kinda the problem though.
|
| When you're writing data science code, the value is in the
| answer more than the process of getting to that answer.
| Anything that complicates that gets in the way. This is why
| things like Pandas are so popular despite having some
| questionable engineering. Using a library for big number
| support, having to get that to play nicely with other
| libraries, it all goes against the aims.
|
| Now for data engineering it's very different. I wouldn't
| choose JS myself, but it's a much more reasonable choice. For
| engineering the process by which you get the answer matters
| far more - is it scalable, testable, repeatable, etc. Having
| to use a library for big number support is fine.
|
| It's two very different ways of working and I'm still fairly
| convinced that JS is not conducive to the former.
| uryga wrote:
| > libs for 64bits floats (decimal.js, big.js)
|
| both of those libraries are for arbitrary precision decimals,
| not floats.
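To make the distinction concrete: a JavaScript Number is already a 64-bit binary float, so no library is needed for that, and the well-known rounding behaviour below is what decimal libraries exist to avoid. The integer-cents workaround is only an illustrative assumption, not how decimal.js or big.js work internally.

```javascript
// Number is a 64-bit binary float, so some decimal fractions are inexact.
const floatSum = 0.1 + 0.2;
const floatIsExact = floatSum === 0.3;

// Illustrative workaround: represent 0.10 and 0.20 as whole cents.
const centsSum = 10n + 20n;
const centsIsExact = centsSum === 30n;

console.log(floatSum, floatIsExact, centsIsExact);
```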
| __jem wrote:
| If it's arbitrary precision, what's the difference, besides
| slightly more bookkeeping on your end?
| tyingq wrote:
| >it's faster than Python
|
| Is that generally true for data science type tasks, though,
| where the "fast" in python is really numpy, pandas, etc?
|
| >or can use bindings just like Python
|
| But there's not really anything like numpy/pandas for it to
| bind to at the moment, is there? Meaning anything as broad in
| functionality, fast, mature, etc.
| genrez wrote:
| I am a noob to Javascript, so if someone knows better, then
| please correct me about this, but arrow functions aren't meant to
| replace normal function syntax, right? From [1], it seems like
| the main point of arrow syntax is to allow you to inherit the
| "this" parameter if you are inside a method. Meanwhile, you need
| normal function syntax if you are creating a constructor, making
| a method function for a prototype, or making generator functions.
| (I didn't even know javascript had generator functions until just
| now :))
|
| So it seems a bit weird to me that they advocate using arrow
| function syntax instead of the regular syntax. They seem to be
| advocating using the new class syntax instead, so I guess they
| don't need the constructor or method creation features of the
| normal syntax, but I still don't see why they would specifically
| advocate for arrow function syntax. Is it faster? They say it
| interferes with other features, but which features?
|
| [1] https://developer.mozilla.org/en-US/docs/Web/JavaScript/Refe...
| __jem wrote:
| Not changing `this` is a huge benefit that shouldn't be
| ignored. Especially when you're programming in a more
| functional style, it makes sense to default to arrow functions
| because you never want to engage in `this` shenanigans anyway.
| So, yes, I'd say it's a pretty common idiom in the JS community
| to replace "normal" function declarations.
| genrez wrote:
| I agree that inheriting the `this` for arrow functions is
| beneficial. To me it seems like you would want to use the
| normal syntax for global functions for hoisting and to
| prevent unintentional re-definitions, the arrow functions
| where you would use lambda functions in other languages, and
| the class method syntax for methods.
|
| side-note: Most of my JS experience is writing userscripts
| for myself, so I definitely do my share of 'this'
| shenanigans.
| ctidd wrote:
| As a heads up since you mentioned "class method syntax",
| methods are one of the most important places to have
| lexical `this` binding in many scenarios.
|
| Take the following example, which is a normal class method:
|
| > alertSum() { alert(this.a + this.b); }
|
| And here we have an arrow function used to create an
| instance method (just an arrow function assigned to a
| property on the instance):
|
| > alertSum = () => { alert(this.a + this.b); }
|
| Then let's say we want to pass the method directly as
| callback:
|
| > this.button.addEventListener('click', this.alertSum)
|
| The first example (class method syntax) won't have the
| necessary `this` context unless it has its context bound to
| the instance through `Function.prototype.bind`. There are
| other patterns to avoid this (e.g. wrapping all callbacks
| in arrow functions when passing them), but it's useful to
| consider that class methods can easily create confusion
| because that's _exactly where_ someone more used to a
| different language may assume the `this` context is bound
| lexically.
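The behaviour described above can be reproduced without a browser. This is a minimal sketch, assuming an engine with class field support; the `Summer` class and its member names are made up for illustration, and calling a detached function stands in for what `addEventListener` does with a callback.

```javascript
class Summer {
  constructor(a, b) {
    this.a = a;
    this.b = b;
  }

  // Prototype method: `this` depends on how the function is called.
  sumMethod() {
    return this.a + this.b;
  }

  // Class field holding an arrow function: `this` is fixed to the instance.
  sumArrow = () => this.a + this.b;
}

const s = new Summer(1, 2);
const detachedArrow = s.sumArrow;
const detachedMethod = s.sumMethod;
const boundMethod = s.sumMethod.bind(s);

console.log(detachedArrow());
console.log(boundMethod());

let lostThis = false;
try {
  detachedMethod(); // class bodies are strict, so `this` is undefined here
} catch (err) {
  lostThis = err instanceof TypeError;
}
console.log(lostThis);
```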
| genrez wrote:
| Excellent point! I can see that getting confusing
| quickly.
|
| Edit: I was confused about how this could work, so I dug
| through [1] for a bit. It appears that for each object of
| that class created, an arrow function will be created on
| that object and its this will indeed be bound to the same
| scope that the constructor function is bound to. This is
| really clever and I applaud whoever thought it up!
|
| It is interesting to note that this creates a new arrow
| function on each object as opposed to the normal
| definitions which create a single function which is
| stored in the prototype of the class. (it's easier to
| check this in a browser's dev console than it is to
| decode the spec)
|
| This would suggest that one should use different
| approaches for different types of objects: It makes sense
| to use arrow functions for "resource" or "actor" objects,
| of which there are few but they may have callback
| functions. It makes sense to use normal method
| definitions for "plain old data", of which there may be
| many, (which would make the arrow functions too
| expensive) but they should not have callback functions.
|
| [1] https://tc39.es/proposal-class-fields/unified.html
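The dev-console check mentioned above might look like the following sketch. The `Point` class is a made-up example; the comparison simply tests whether two instances share one function object or each hold their own.

```javascript
class Point {
  constructor(x, y) {
    this.x = x;
    this.y = y;
  }

  // Normal method: a single function stored on Point.prototype.
  norm() {
    return Math.hypot(this.x, this.y);
  }

  // Arrow function in a class field: created again for every instance.
  normArrow = () => Math.hypot(this.x, this.y);
}

const p = new Point(3, 4);
const q = new Point(6, 8);

const methodIsShared = p.norm === q.norm;
const arrowIsShared = p.normArrow === q.normArrow;
const arrowIsOwnProperty = Object.prototype.hasOwnProperty.call(p, "normArrow");
const methodIsOwnProperty = Object.prototype.hasOwnProperty.call(p, "norm");

console.log(methodIsShared, arrowIsShared, arrowIsOwnProperty, methodIsOwnProperty);
```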
| pwdisswordfish0 wrote:
| > This is really clever and I applaud whoever thought it
| up!
|
| Not really. It's contortionist and wasteful and one of
| the many reasons why mainstream web apps are one big
| celebration of bloat on a boat.
|
| The neophyte programmers who have turned into expert
| Modern JS programmers are always recommending arrow
| functions like this because they've never actually looked
| at the event listener interface. What happens is they try
| to make things more complicated than they need to be and
| bodge their event registration. So they apply a "fix" by
| doing what they do with everything else: layering on even
| more. "What we need," they say, "are arrow functions."
|
| No.
|
| Go the other way. Approach it more sensibly. You'll end
| up with a fix that is shorter than the answer that the
| cargo cult NPM/GitHub/Twitter programmers give. It's
| familiar to anyone coming from a world with interfaces as
| a language-level construct and therefore knows to _go
| look at the interface definition of the interface that
| you're trying to implement_.
|
| Make your line for registering an event listener look
| like this: `this.button.addEventListener("click", this)`,
| and change the name of your `alertSum` method to
| `handleEvent`. (Read it aloud. The object that we're
| dealing with (`this`) is something that we need to be
| able to respond to clicks, so we have it listen for them.
| Gee, what a concept.) In other words, the real fix is to
| make sure that the thing we're passing in to
| `addEventListener` is... actually an event listener.
|
| This goes over 90% of frontend developers' heads (and
| even showing them this leads to them crying foul in some
| way; I've seen them try to BS their way through the
| embarrassment before) because most of the codebases they
| learned from were written by other people who, like
| themselves, only barely knew what they were doing. Get
| enough people taking this monkey-see-monkey-do approach,
| and from there you get "idioms" and "best practices" (no
| matter whether they were even "good" in the first place,
| let alone best).
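A sketch of the `handleEvent` approach described above, with a plain `EventTarget` standing in for the DOM button so it can run outside a page. The `Counter` class is a hypothetical example, and the sketch assumes a runtime where `EventTarget` and `Event` are globals (browsers and recent Node.js releases).

```javascript
class Counter {
  constructor(target) {
    this.clicks = 0;
    // Pass the object itself; the event target calls its handleEvent method.
    target.addEventListener("click", this);
  }

  handleEvent(event) {
    // `this` is the Counter instance, with no bind or arrow function involved.
    if (event.type === "click") this.clicks += 1;
  }
}

// Stand-in for a DOM element such as a button.
const button = new EventTarget();
const counter = new Counter(button);
button.dispatchEvent(new Event("click"));
button.dispatchEvent(new Event("click"));

console.log(counter.clicks);
```

One trade-off of this pattern is that an object listening for several event types has to branch on `event.type` inside the single `handleEvent` method.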
| [deleted]
| mLuby wrote:
| I've seen a majority of sources abandon the function keyword
| entirely in favor of const arrow declarations (and shorthand
| method syntax).
|
| FWIW I personally like the function keyword, since it's clear
| what it is to non-JS readers, but primarily because it hoists
| to the top of its file, so unimportant utility functions can
| sit unobtrusively at the end of the file, thereby letting
| readers encounter more important logic earlier in the file.
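The hoisting difference can be seen in a few lines. This is an illustrative sketch with made-up function names: the declaration is usable before it appears, while the `const` arrow function is not.

```javascript
// Important logic first; the helper it relies on is declared further down.
const report = summarize([3, 1, 2]);
console.log(report);

let arrowUsableEarly = true;
try {
  median([1, 2, 3]); // a const binding cannot be used before its declaration
} catch (err) {
  arrowUsableEarly = !(err instanceof ReferenceError);
}
console.log(arrowUsableEarly);

// Function declarations are hoisted together with their bodies.
function summarize(values) {
  const sorted = [...values].sort((a, b) => a - b);
  return { min: sorted[0], max: sorted[sorted.length - 1] };
}

// A const arrow function exists only once this line has run.
const median = (values) => {
  const sorted = [...values].sort((a, b) => a - b);
  return sorted[Math.floor(sorted.length / 2)];
};
```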
| [deleted]
| genrez wrote:
| Interesting to know that what the article recommends is
| indeed the industry standard. I'd forgotten about hoisting
| until you brought it up!
| temp8964 wrote:
| They use data-forge.js, which has fewer stars than danfo.js.
|
| I can't find any benchmark of how they compare to data.table or
| pandas.
|
| Without a dominant and high performance data frame library as a
| foundation, I wouldn't even try.
___________________________________________________________________
(page generated 2021-04-25 23:01 UTC)