[HN Gopher] Show HN: Squey, an open-source GPU-accelerated data ...
       ___________________________________________________________________
        
       Show HN: Squey, an open-source GPU-accelerated data visualization
       software
        
       While we hope you'll find it quite useful already, there is plenty
       of room for improvement so we greatly appreciate your feedback!
        
       Author : jbleonesio
       Score  : 55 points
       Date   : 2024-10-08 08:38 UTC (14 hours ago)
        
 (HTM) web link (squey.org)
 (TXT) w3m dump (squey.org)
        
       | JacobiX wrote:
       | Impressive project, judging by commits and features, it's clear
       | that significant effort has been poured into this :)
       | Unfortunately, there's no specific macOS installation method
       | provided. Is it buildable from source?
        
         | jbleonesio wrote:
         | Thanks for your feedback. Unfortunately there is currently only
         | a Linux build (which also happens to run under Windows thanks
         | to WSL2) because there are a lot of dependencies[1] to build.
         | Any help implementing a macOS build would of course be warmly
         | welcomed :)
         | 
         | In the meantime, you can deploy the software from AWS
         | Marketplace[2] and use it through your web browser, but note
         | that this is a paid, on-demand product.
         | 
         | [1]:
         | https://gitlab.com/squey/squey/-/tree/main/buildstream/eleme...
         | 
         | [2]:
         | https://aws.amazon.com/marketplace/pp/prodview-l363lrih42bhm
        
           | JacobiX wrote:
           | Thank you, I'll look into it more closely. Fortunately it
           | builds using CMake / Clang and cross-platform libs, so it
           | might be possible to port it to macOS after some tweaks.
        
             | jbleonesio wrote:
             | We are using BuildStream to export a flatpak application
             | but the build system is indeed CMake with both Clang and
             | GCC compiling the project without warnings.
             | 
             | Feel free to open an issue[1] on the project repository to
             | discuss a macOS port further :)
             | 
             | [1]: https://gitlab.com/squey/squey/-/issues/new
        
       | jmakov wrote:
       | Would be interesting to see how this compares to
       | hvplot+datashader.
        
       | bbor wrote:
       | Very cool, and it's already on version _five_! I'm impressed.
       | Only one question for now, since I don't yet have experience
       | with these specific data viz techniques:
       | 
       | Skew-ey? Skoo-ey? Squee?
        
         | jbleonesio wrote:
         | Version five indeed, because it already has quite a bit of
         | history as an ex-proprietary product.
         | 
         | We pronounce it "Skwey" (as in "query") but you can really
         | pronounce it as you wish since it's not even an existing word x)
        
       | macros wrote:
       | Neat tool.
       | 
       | Couldn't find anything in the docs on mapping file sources to
       | resource needs on the host. How much is too much data to dump
       | into the tool on a single workstation?
        
         | jbleonesio wrote:
         | Thanks!
         | 
         | It depends on the number of rows/columns and the types of the
         | values, but the application displays a dialog asking you if you
         | want to stop the import before completion when it feels like
         | resources are being exhausted.
         | 
         | The software was specifically developed to be able to handle as
         | much data as possible while remaining responsive so the
         | workstation resources will likely be the bottleneck here.
         | 
         | On my 32GB development machine, I can easily load tens of
         | millions of rows with tens of columns.
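
As a rough illustration of the sizing question above, here is a back-of-envelope sketch. It assumes fixed-width columns and a 50% headroom factor; both are illustrative assumptions and say nothing about how Squey actually stores data internally. The function names are hypothetical.

```python
# Back-of-envelope estimate of the RAM a table would need if every column
# were stored as fixed-width values. The byte widths and the headroom
# factor are assumptions for illustration, not Squey's storage format.
BYTES_PER_VALUE = {"int32": 4, "int64": 8, "float64": 8}


def estimate_bytes(rows: int, column_types: list[str]) -> int:
    """Return rows multiplied by the summed per-column value widths."""
    return rows * sum(BYTES_PER_VALUE[t] for t in column_types)


def fits_in_ram(rows: int, column_types: list[str], ram_bytes: int,
                headroom: float = 0.5) -> bool:
    """True if the estimate stays under the given fraction of RAM."""
    return estimate_bytes(rows, column_types) <= ram_bytes * headroom


if __name__ == "__main__":
    cols = ["int64"] * 20  # 20 columns of 8-byte values
    needed = estimate_bytes(50_000_000, cols)
    print(f"estimated size: {needed / 2**30:.1f} GiB")
    print("fits in 32 GiB:", fits_in_ram(50_000_000, cols, 32 * 2**30))
```

String-heavy columns and any indexes would change the picture considerably, so treat the result as a lower bound rather than a prediction.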
        
       | jmakov wrote:
       | 10GB, 1TB, 100TB? Memory mapping or does it need to fit into
       | memory (RAM, VRAM?)? Is streaming supported - can I point to a
       | 100TB dataset and cruise through it? One Parquet file or a
       | Parquet dataset? What about Delta Lake? Are outliers drawn or are you
       | doing some sort of sampling/smoothing? Also would be great to
       | have some comparison to similar tools in this space e.g.
       | https://github.com/finos/perspective and HvPlot+Datashader.
        
         | jbleonesio wrote:
         | Data needs to fit in RAM and graphics in VRAM. Let's say 100GB
         | or more if you filter some rows during import. Data is ingested
         | into an in-house database designed to refresh the ever-changing
         | selected rows as quickly as possible to conduct a true
         | investigation. You can load as many Parquet files as you want
         | in one go provided they have the same structure. Any outlier in
         | any visual representation will be drawn, as this is a
         | requirement to detect weak signals and anomalies.
         | 
         | Comparisons with the tools you mentioned would indeed be
         | interesting; writing a blog post would be a good idea I guess!
         | I wrote a comparison with ELK here:
         | https://squey.org/domains/cybersecurity/pentesteracademy-mac...
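
The "same structure" requirement for loading several Parquet files at once can be checked before import. The sketch below is a minimal illustration that compares schemas already extracted as lists of (column name, type) pairs. How the schemas are obtained is left out (pyarrow's `pyarrow.parquet.read_schema` is one option), and the function and file names are hypothetical, not part of Squey.

```python
# Minimal pre-import check: which files do not share the first file's
# schema? Schemas are plain (column name, type name) lists here; reading
# them from real Parquet files is outside this sketch.
Schema = list[tuple[str, str]]


def mismatched_files(schemas: dict[str, Schema]) -> list[str]:
    """Return the names of files whose schema differs from the first one."""
    names = list(schemas)
    if not names:
        return []
    reference = schemas[names[0]]
    return [name for name in names[1:] if schemas[name] != reference]


if __name__ == "__main__":
    # Hypothetical file names and columns, for illustration only.
    schemas = {
        "day1.parquet": [("ts", "int64"), ("src_ip", "string")],
        "day2.parquet": [("ts", "int64"), ("src_ip", "string")],
        "day3.parquet": [("ts", "int64"), ("bytes", "int64")],
    }
    print(mismatched_files(schemas))
```

Comparing against the first file keeps the check simple; column order matters in this version, which may be stricter than what an importer actually requires.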
        
       ___________________________________________________________________
       (page generated 2024-10-08 23:01 UTC)