https://github.com/nomic-ai/deepscatter

nomic-ai/deepscatter: Zoomable, animated scatterplots in the browser that scale to over a billion points.
# Deep Scatterplots for the Web

This is an evolving library for displaying more points than are ordinarily possible over the web.

It's fast for three reasons:

1. All data is sent in the Apache Arrow feather format, in a custom quadtree format that makes it possible to load only the data needed as the user zooms. Feather takes no time to parse in the browser once transferred, compresses pretty well, and can be copied directly to the GPU without transformation in JS. This is the way of the future.
2. Most rendering is done in custom layers using WebGL, with a buffer management strategy handled by REGL. This means that there are no unnecessary abstractions around points or separate draw calls for different objects; a minimum number of buffers are attached for the needed points.
3. Almost all grammar-of-graphics transforms are handled on the GPU, which allows for interpolated transitions with calculations done in parallel.

It also runs in completely static settings, so you can host a million-point scatterplot on something like GitHub Pages.

## Examples

* 1 million+ documents from arXiv rendered inside an Observable notebook. (Ben Schmidt)
* Every person in the 2010 and 2020 US Censuses displayed in an interactive svelte-kit app. (Ben Schmidt)
* Newspaper articles at the Library of Congress from the Reconstruction Era. (By Andromeda Yelton while in residency at the Library of Congress.)

## Get help

GitHub issues, even low-quality ones, are welcome here. There is also a dedicated Deepscatter Slack which you are welcome to join. I came into doing this stuff from a very non-technical background and welcome people to join with naive questions.

## Quick start

### Importing the module.

See the arXiv example above for some basic examples.

### Running locally.

First, install the companion tiling library, which is written in Python, and generate a million points of test data in tiles of 50,000 apiece.

```sh
python3 -V # requires Python 3.9.x or 3.10.x
python3 -m pip install git+https://github.com/bmschmidt/quadfeather
quadfeather-test-data 1_000_000
quadfeather --files tmp.csv --tile_size 50_000 --destination tiles
```

Then set up this library to run. It will start a local dev server.

```sh
npm i
npm run dev
```

If you go to localhost:3344, you should see an interactive scatterplot. To dig into what you're seeing, open index.html.

(As of 2021, this development site works in Chrome but not in Safari or Firefox, because it uses ES6 module syntax inside the web worker.
The distributed version of the module should work in all browsers.)

### Your own data.

1. Create a CSV, Parquet, or feather file that has columns called `x` and `y`. Any other columns (categorical information, etc.) can be included as additional columns.
2. Tile it:

   ```sh
   cd deepscatter # if you're not already there
   quadfeather --files ../some-path-to/your-data.csv --tile_size 50000 --destination tiles
   ```

3. Assuming your dataset has `x` and `y` columns and the `tiles` folder is in the root directory of this project, you can see the data visualized by running `npm run dev` and opening http://localhost:3345/index-simplest-way-to-start.html in your browser.

To edit the visualization, or to troubleshoot, look at the file index-simplest-way-to-start.html, where you should find a bare-bones implementation of deepscatter. For a more advanced example, explore index.html and render it at http://localhost:3345/index.html.

Note: Ideally, a future release will let you create these specs in a way that doesn't require writing JSON directly.

## Build the module

`npm run build` will create an ES module at `dist/deepscatter.es.js`. The mechanics of importing this are very slightly different than in index.html. Note that this is an ES module and so requires you to use `<script type="module">`. See index_prod.html for an example.

This is currently bundled with Vite and Rollup. There is/will be a further interaction layer on top of it, but the core plotting components are separate and should work as a standalone layer that supports plot requests via an API.

## Code strategy

Any interaction logic that changes the API call directly does not belong in this library. The only interaction code here is for zooming and interacting with points.

### Future codebase splits.

The plotting components and the tiling components are logically quite separate; I may break the tiling strategy into a separate JS library called 'quadfeather'.
Apache Arrow would still be a necessary intermediate format, but it could be generated from CSV files using, say, arquero or a WASM port of DuckDB.

## API

This is still subject to change and is not fully documented. The encoding portion of the API mimics Vega-Lite, with some minor distinctions to avoid deeply nested queries and to add animation and jitter parameters.

```js
{
  encoding: {
    "x": {
      "field": "x",
      "transform": "literal"
    },
    "color": {
      "field": "year",
      "range": "viridis",
      "domain": [1970, 2020]
    }
  }
}
```

### Implemented aesthetics.

1. x
2. y
3. size
4. jitter_radius: size of jitter. API subject to change.
5. jitter_speed: speed of jitter. API subject to change.
6. color (categorical or linear: the range can name a color scale explicitly or accept any d3-color name.)
7. x0 (for animations; transitions between x0 and x)
8. y0 (for animations; transitions between y0 and y)
9. filter (Filtering is treated as an aesthetic operation by this library.)

### Planned

1. Symbol (mapping of categorical variables to single Unicode code points in a single font; probably 255 max.)
2. Label (full-text label)
3. Image (like PixPlot)

### Jitter

Jitter is a little overloaded with features right now, but some are quite fun. The jitter method is set on the `method` key of the `jitter_radius` field. Possible values are:

1. circle
2. spiral
3. time
4. normal

## Principles

1. This is a 2D library. No fake 3D.
2. The central zoom state is handled by d3-zoom.
3. Use the zoom state to render other layers on top of Deepscatter by hooking in (note that `on_zoom` is set directly, not passed in via prefs):

   ```js
   // Assumes the package's default export is the Scatterplot class.
   import Scatterplot from 'deepscatter';

   const scatterplot = new Scatterplot('#deepscatter');
   scatterplot.on_zoom = (transform) => {
     // The d3-zoom transform: scale `k` and translation `x`, `y`.
     console.log(transform.k, transform.x, transform.y);
   };
   ```
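To draw another layer in sync with the plot, an `on_zoom` handler has to map data coordinates to screen pixels the same way the zoom does. A d3-zoom transform scales by `k` and then translates by `x` and `y`. The sketch below assumes a hypothetical linear base scale from the data extent to unzoomed pixels; `ZoomTransformLike`, `LinearScale`, `scaleLinear`, `toScreen`, and `invertZoom` are illustrative names, not deepscatter exports, and the transform is a minimal stand-in so the code runs without d3.

```typescript
// Minimal stand-in for d3-zoom's ZoomTransform: scale k, then translate by (x, y).
interface ZoomTransformLike {
  k: number;
  x: number;
  y: number;
}

// Hypothetical linear mapping from a data extent to unzoomed pixel positions.
// An assumption for illustration, not deepscatter's internal scale.
interface LinearScale {
  domain: [number, number];
  range: [number, number];
}

function scaleLinear(s: LinearScale, v: number): number {
  const [d0, d1] = s.domain;
  const [r0, r1] = s.range;
  return r0 + ((v - d0) / (d1 - d0)) * (r1 - r0);
}

// Project a data-space point to screen pixels: base scale first, then the zoom
// transform, following d3-zoom's apply(): [x * k + tx, y * k + ty].
function toScreen(
  t: ZoomTransformLike,
  xScale: LinearScale,
  yScale: LinearScale,
  px: number,
  py: number,
): [number, number] {
  const bx = scaleLinear(xScale, px);
  const by = scaleLinear(yScale, py);
  return [bx * t.k + t.x, by * t.k + t.y];
}

// Inverse of the zoom step: screen pixels back to unzoomed pixels,
// following d3-zoom's invert(): [(x - tx) / k, (y - ty) / k].
function invertZoom(t: ZoomTransformLike, sx: number, sy: number): [number, number] {
  return [(sx - t.x) / t.k, (sy - t.y) / t.k];
}
```

Inside a handler such as `scatterplot.on_zoom = (transform) => { ... }`, an overlay could call `toScreen(transform, xScale, yScale, px, py)` for each annotation it positions. Whether deepscatter's own base scales are linear over the data extent is an assumption of this sketch.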
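The first speed reason in the introduction, loading only the quadtree tiles needed as the user zooms, comes down to asking which tiles overlap the visible region. The sketch below assumes a plain quadtree in which depth `d` splits the full data extent into `2^d x 2^d` equal cells, and it names each tile with a hypothetical `"depth/col/row"` key; quadfeather's actual file layout and tile naming are not described in this README and may differ.

```typescript
// Axis-aligned rectangle in data coordinates.
interface Extent {
  xmin: number;
  xmax: number;
  ymin: number;
  ymax: number;
}

// List tiles at `depth` whose cells overlap `view`, assuming depth d splits
// `full` into 2^d x 2^d equal cells. The "depth/col/row" key is hypothetical.
function visibleTiles(full: Extent, view: Extent, depth: number): string[] {
  // No overlap at all: nothing needs to be loaded.
  if (
    view.xmax < full.xmin || view.xmin > full.xmax ||
    view.ymax < full.ymin || view.ymin > full.ymax
  ) {
    return [];
  }
  const n = 2 ** depth; // tiles per axis at this depth
  const w = (full.xmax - full.xmin) / n;
  const h = (full.ymax - full.ymin) / n;
  // Clamp so coordinates on (or past) the outer boundary map to edge cells.
  const clamp = (i: number) => Math.min(n - 1, Math.max(0, i));
  const c0 = clamp(Math.floor((view.xmin - full.xmin) / w));
  const c1 = clamp(Math.floor((view.xmax - full.xmin) / w));
  const r0 = clamp(Math.floor((view.ymin - full.ymin) / h));
  const r1 = clamp(Math.floor((view.ymax - full.ymin) / h));
  const keys: string[] = [];
  for (let c = c0; c <= c1; c++) {
    for (let r = r0; r <= r1; r++) {
      keys.push(`${depth}/${c}/${r}`);
    }
  }
  return keys;
}
```

A loader built on this idea could request deeper tiles only as the user zooms in, since a small viewport overlaps few cells even at high depth. The rule for when to stop descending (for example, a point budget) is left open here.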
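The encoding object in the API section can be described with TypeScript types, and a linear color channel needs some way to place each value within its `domain` before looking it up in a scale such as viridis. The types below (`ChannelSpec`, `PlotSpec`) are hypothetical, written only to mirror the example, and are not deepscatter's own type definitions; clamping out-of-domain values to the ends of the scale is likewise an assumption of this sketch.

```typescript
// Hypothetical types mirroring the encoding example; not deepscatter's definitions.
interface ChannelSpec {
  field: string;
  transform?: string; // e.g. "literal" in the x channel example
  range?: string; // e.g. a named color scale such as "viridis"
  domain?: [number, number];
}

interface PlotSpec {
  encoding: Record<string, ChannelSpec>;
}

const spec: PlotSpec = {
  encoding: {
    x: { field: "x", transform: "literal" },
    color: { field: "year", range: "viridis", domain: [1970, 2020] },
  },
};

// Position of `value` within a linear domain as a fraction in [0, 1].
// Out-of-domain values are clamped (an assumption of this sketch).
function normalize(value: number, [lo, hi]: [number, number]): number {
  const t = (value - lo) / (hi - lo);
  return Math.min(1, Math.max(0, t));
}
```

With the example domain, `normalize(1995, [1970, 2020])` is 0.5, the midpoint of the color ramp.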
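The `x0`/`y0` aesthetics and the GPU-side transforms mentioned in the introduction both come down to blending each point between two positions as a transition progresses. This CPU sketch shows the per-point arithmetic, a linear interpolation that a vertex shader would perform for every point in parallel. The function name `interpolatePositions` is hypothetical, and whether deepscatter applies an easing curve to `t` is not covered here.

```typescript
// Blend each point from (x0, y0) toward (x, y) as progress t goes from 0 to 1.
// All four arrays are assumed to have the same length (one entry per point).
function interpolatePositions(
  x0: Float32Array,
  y0: Float32Array,
  x: Float32Array,
  y: Float32Array,
  t: number,
): [Float32Array, Float32Array] {
  const n = x.length;
  const outX = new Float32Array(n);
  const outY = new Float32Array(n);
  for (let i = 0; i < n; i++) {
    // Same linear mix a shader would compute per vertex: start + (end - start) * t.
    outX[i] = x0[i] + (x[i] - x0[i]) * t;
    outY[i] = y0[i] + (y[i] - y0[i]) * t;
  }
  return [outX, outY];
}
```

At `t = 0` every point sits at its `x0`/`y0` position and at `t = 1` at its `x`/`y` position; animating `t` over time produces the transition.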