[HN Gopher] Kart: DVC for geospatial and tabular data. Git for GIS
       ___________________________________________________________________
        
       Author : starkparker
       Score  : 84 points
       Date   : 2023-10-30 18:22 UTC (4 hours ago)
        
 (HTM) web link (kartproject.org)
        
       | ulrischa wrote:
       | Wow, cool. I needed something like this 10 years ago for a
       | project.
        
       | nix0n wrote:
       | Neat!
       | 
       | Does this improve Git's support for large binaries generally, or
       | is it necessary to have introspection into any filetype you want
       | to support?
       | 
       | Is there good interoperability with existing Git repos?
        
         | asabla wrote:
         | Copied from their site:
         | 
         | > Because Kart uses Git for data transfer and storage, you can
         | host a Kart repository anywhere you can host a Git repository -
         | for example, GitHub, Bitbucket...
         | 
         | ref:
         | https://docs.kartproject.org/en/latest/pages/basic_usage_tut...
        
         | UberMouse wrote:
         | >Does this improve Git's support for large binaries generally
         | 
         | No, this still uses LFS for larger binary formats (i.e. raster or
         | point cloud datasets).
        
         | polemic wrote:
         | Kart repositories are also Git repositories - they're
         | 'interoperable' in the sense that there is a lot of tooling that
         | will work, but the storage structure for vector data differs,
         | and using Git on a Kart repository won't work very well.
         | 
         | Kart serializes vector/tabular data into datasets in the
         | repository, and manages the process of writing them out to
         | useful working copies (GeoPackages, or into databases).
         | 
         | For large binaries - rasters and pointclouds - we're using LFS.
         | We include some additional spatial information in pointer
         | files to enable some very useful GIS functionality, like
         | spatially filtered clones (this works for vector data too).
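
A minimal sketch of the pointer-file idea, assuming the standard Git LFS v1 pointer layout (`version`, `oid`, `size`) plus one extra line carrying a bounding box. The `x-extent` key and all helper names are hypothetical; the comment above does not say which fields Kart actually adds, only that the extra spatial information lets a client filter without downloading the large file itself.

```python
import hashlib
from typing import Dict, Optional, Tuple

BBox = Tuple[float, float, float, float]  # (min_x, min_y, max_x, max_y)


def make_pointer(content: bytes, extent: Optional[BBox] = None) -> str:
    """Build a Git LFS v1 pointer, optionally with a hypothetical extent line."""
    lines = [
        "version https://git-lfs.github.com/spec/v1",
        f"oid sha256:{hashlib.sha256(content).hexdigest()}",
        f"size {len(content)}",
    ]
    if extent is not None:
        # HYPOTHETICAL key, not Kart's real format; it sorts after "size",
        # keeping keys in the alphabetical order LFS expects after "version".
        lines.append("x-extent " + ",".join(repr(float(v)) for v in extent))
    return "\n".join(lines) + "\n"


def parse_pointer(text: str) -> Dict[str, str]:
    """Split each 'key value' line of a pointer file into a dict."""
    return dict(line.split(" ", 1) for line in text.splitlines() if line)


def pointer_extent(fields: Dict[str, str]) -> Optional[BBox]:
    """Read the hypothetical extent back out, or None if the pointer has none."""
    raw = fields.get("x-extent")
    if raw is None:
        return None
    min_x, min_y, max_x, max_y = (float(v) for v in raw.split(","))
    return (min_x, min_y, max_x, max_y)


# Usage: a client reads only the small pointer to decide whether to fetch the tile.
pointer = make_pointer(b"fake LAZ bytes", (174.7, -36.9, 174.8, -36.8))
print(pointer_extent(parse_pointer(pointer)))
```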
        
       | mjhay wrote:
       | Great to see stuff like this! Data in general, and not just code,
       | gets updated or corrected constantly. Given that data is used
       | collaboratively in a distributed setting, it should be a first-
       | class citizen wrt diffing and merging, just as line-wise text is.
       | Anybody working in geophysics or other data-heavy scientific
       | fields should see the value in this approach.
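
To make the idea of first-class diffing and merging for data a little more concrete, here is a minimal row-level three-way merge sketch, assuming each table version is a dict keyed by primary key (a missing key means the row was deleted). This is a generic technique, not a description of how Kart resolves merges; a real tool would also compare field by field and surface conflicts to the user rather than provisionally keeping one side.

```python
from typing import Any, Dict, List, Tuple

Row = Dict[str, Any]
Table = Dict[int, Row]  # primary key -> row; a missing key means "deleted"


def merge_rows(base: Table, ours: Table, theirs: Table) -> Tuple[Table, List[int]]:
    """Three-way merge at row granularity; returns (merged, conflicting pks)."""
    merged: Table = {}
    conflicts: List[int] = []
    for pk in sorted(base.keys() | ours.keys() | theirs.keys()):
        b, o, t = base.get(pk), ours.get(pk), theirs.get(pk)
        if o == t:        # both sides agree (including both deleting)
            result = o
        elif o == b:      # only "theirs" changed this row
            result = t
        elif t == b:      # only "ours" changed this row
            result = o
        else:             # both changed it differently: flag it, keep ours for now
            conflicts.append(pk)
            result = o
        if result is not None:
            merged[pk] = result
    return merged, conflicts


base = {1: {"name": "A"}, 2: {"name": "B"}, 3: {"name": "C"}}
ours = {1: {"name": "A2"}, 2: {"name": "B"}, 3: {"name": "C"}}      # edited row 1
theirs = {1: {"name": "A"}, 3: {"name": "C3"}, 4: {"name": "D"}}    # deleted 2, edited 3, added 4
print(merge_rows(base, ours, theirs))
```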
        
         | solardev wrote:
         | I was wondering about that. How DOES diffing work with this,
         | like in a Geopackage?
        
           | cyanydeez wrote:
           | I assume it just displays rows edited.
           | 
           | I don't see how they'd display anything other than points.
           | That leaves XYZM for diffing.
           | 
           | They might be showing summary stats like length, perimeter,
           | area, volume, but that's usually not easy to generalize.
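
For the simple case, summary statistics like these are cheap to compute. A minimal sketch, assuming projected (not latitude/longitude) coordinates and a single polygon ring without holes, using the shoelace formula for area; geodesic measurements and multi-part geometries are exactly where it stops generalizing easily.

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def ring_area(ring: List[Point]) -> float:
    """Shoelace formula; works whether or not the ring repeats its first point."""
    total = 0.0
    for i, (x1, y1) in enumerate(ring):
        x2, y2 = ring[(i + 1) % len(ring)]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def ring_perimeter(ring: List[Point]) -> float:
    """Sum of edge lengths, closing the ring back to its first point."""
    return sum(math.dist(ring[i], ring[(i + 1) % len(ring)]) for i in range(len(ring)))


def summarize_change(before: List[Point], after: List[Point]) -> Dict[str, float]:
    """Describe an edited polygon by how much its area and perimeter moved."""
    return {
        "area_delta": ring_area(after) - ring_area(before),
        "perimeter_delta": ring_perimeter(after) - ring_perimeter(before),
    }


before = [(0, 0), (2, 0), (2, 2), (0, 2)]
after = [(0, 0), (3, 0), (3, 2), (0, 2)]
print(summarize_change(before, after))
```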
        
             | polemic wrote:
             | Hi there,
             | 
             | Kart supports points, lines & polygons, as well as GeoTIFFs
             | for imagery and LAZ for point clouds.
             | 
             | Kart is a CLI tool, but provides fully machine readable
             | outputs. You can use the QGIS plugin to get a visual diff
             | of vector feature changes though.
        
               | solardev wrote:
               | > You can use the QGIS plugin to get a visual diff of
               | vector feature changes though.
               | 
               | This sounds like an amazing opportunity for a screenshot,
               | btw :)
        
       | asabla wrote:
       | This looks super cool! I will for sure be testing this out and
       | keeping an eye out for future additions.
        
       | sccxy wrote:
       | Cool project but homepage needs two things:
       | 
       | * The Docs link should not be hidden in a small font with a
       | disabled-looking link color. Make it a big button in the features
       | list, or make the features clickable through to the relevant docs.
       | 
       | * Add some screenshots
       | 
       | I spent way too much time clicking every heading to figure out
       | what this is all about until I found the Docs link.
        
         | polemic wrote:
         | Hi! It's neat to see Kart making HN. These are great points,
         | we'll get that Docs link much more visible.
        
         | Mertax wrote:
         | Just saw this, which might be a better home page:
         | https://koordinates.com/products/kart/
        
       | peoplenotbots wrote:
       | I wish this project every success. GIS is an undervalued and
       | underserved technical system.
        
       | polemic wrote:
       | Hi everyone, I'm Hamish, PM for KartProject here. If you want to
       | learn more about Kart:
       | 
       | * Our CTO Rob Coup presenting on Kart at FOSS4G 23:
       |   https://www.youtube.com/watch?v=1B-HB2Z3Vlc
       | * Docs are available at https://docs.kartproject.org/en/latest/
       | * We also have a QGIS plugin! This gives you visual diffs of vector
       |   feature changes. https://plugins.qgis.org/plugins/kart/
       | 
       | Happy to answer any questions!
        
       | emj wrote:
       | So how does it work for OpenStreetMap? Git is not very good at
       | handling large repositories; fsck and the like take ages. What
       | performance do you get with a small geographical area?
        
         | polemic wrote:
         | Hi there,
         | 
         | OpenStreetMap has its own versioning mechanisms (and a fairly
         | specific-to-OSM data model) and Kart isn't really designed to
         | work with OSM data as such. Kart adds version control to the
         | GIS data that planners, academics, architects, civil engineers,
         | etc, use day-to-day. There's a lot of data out there!
         | 
         | "Large" is relative, but Kart works well with quite big vector
         | datasets for these typical use cases. For example, we're
         | regularly working with datasets that have over 2 million
         | features, with a decade of weekly data changes.
         | 
         | Kart includes some features specifically for working with small
         | geographic areas. We can spatially filter cloned data so
         | you're working with a small subset of a much larger dataset,
         | but you still retain the ability to commit/merge/push to the
         | source repo.
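
As a rough illustration of a spatially filtered clone, the selection step can be thought of as keeping only the features whose bounding box intersects the area of interest. This sketch assumes features are plain coordinate lists keyed by a hypothetical integer id and uses a linear scan; a real implementation would presumably rely on a spatial index instead of checking every feature, and Kart's actual filtering mechanism is not shown here.

```python
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float]
BBox = Tuple[float, float, float, float]  # (min_x, min_y, max_x, max_y)


def envelope(coords: Sequence[Point]) -> BBox:
    """Smallest axis-aligned box containing every vertex of a feature."""
    xs = [x for x, _ in coords]
    ys = [y for _, y in coords]
    return (min(xs), min(ys), max(xs), max(ys))


def intersects(a: BBox, b: BBox) -> bool:
    """Axis-aligned boxes intersect unless one lies entirely to one side of the other."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def spatial_filter(features: Dict[int, Sequence[Point]], area: BBox) -> List[int]:
    """Return ids of features whose envelope touches the area of interest."""
    return sorted(fid for fid, coords in features.items() if intersects(envelope(coords), area))


features = {
    1: [(0.0, 0.0), (1.0, 1.0)],
    2: [(10.0, 10.0), (11.0, 12.0)],
    3: [(0.5, 5.0), (5.0, 5.0)],
}
print(spatial_filter(features, (0.0, 0.0, 2.0, 2.0)))
```

An envelope test is only a coarse prefilter: a feature whose box touches the area may still not overlap it geometrically, which is usually acceptable when the goal is to clone a working subset.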
        
       | 1attice wrote:
       | Cool product! How does this compare to (e.g.) dolt, which is
       | pitched as 'git for data'?
        
         | polemic wrote:
         | Dolt is a neat project, but it's tightly coupled to MySQL. Kart
         | supports MySQL as a working-copy format but MySQL has some
         | limitations around geometry support that make it unsuitable for
         | most GIS usage - see our docs for more info:
         | https://docs.kartproject.org/en/latest/pages/wc_types/mysql_...
         | 
         | Kart works with GIS working copies that are more familiar to
         | GIS people - e.g. GeoPackage, Postgres/PostGIS & MSSQL
         | databases. Different users can use different working copies,
         | and still collaborate together too.
        
       | everybodyknows wrote:
       | If you're wondering where to find an architecture document, the
       | nearest to such may be:
       | 
       | https://docs.kartproject.org/en/latest/pages/development/tab...
       | 
       | Top takeaway being that it's not just versioned geo feature
       | items, but versioned per-feature formats. Various popular GIS
       | database formats are supported as the "checked-out"
       | representation, analogous to a git local-filesystem tree. It may
       | also handle conversions between standard GIS formats well, but
       | that wasn't obvious.
       | 
       | One question I'm left with is performance:
       | 
       | > Every database table row is stored in its own file. ...
        
         | polemic wrote:
         | One of the benefits of building on Git is that a lot of people
         | have put a lot of time into making it work _really well_ with
         | lots of objects. And even though we say "files", Git abstracts
         | that into packfiles etc. very efficiently.
         | 
         | So, we're seeing pretty good performance. We're maintaining a
         | number of repositories with several million features, with a
         | decade of weekly updates of ~10,000+ rows. It _does_ take some
         | time to push that data around, but it's _vastly_ better than
         | old ways, and once you have your clone, maintaining updates
         | becomes trivial. Keeping data up to date has been a _major_
         | unsolved problem in the GIS/data world.
         | 
         | I'd add that Kart has GIS-specific features that nullify some of
         | these issues. The ability to spatially index the objects, then
         | filtering them on Clone, means I rapidly clone a tiny subset of
         | the data to work with.
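
The point about packfiles and weekly updates follows from Git's content addressing: if each row is stored as its own blob, an unchanged row hashes to the same object id in every commit, so a weekly update only adds blobs for the rows that changed. A minimal sketch, assuming a hypothetical JSON serialization per row (Kart's real on-disk encoding is different and not shown here) and ignoring the tree and commit objects that also change with each update:

```python
import hashlib
import json
from typing import Any, Dict, Set

Rows = Dict[int, Dict[str, Any]]


def git_blob_id(data: bytes) -> str:
    """Object id Git assigns to a blob: SHA-1 over a 'blob <size>' header, a NUL, then the data."""
    return hashlib.sha1(f"blob {len(data)}\0".encode() + data).hexdigest()


def row_blob_ids(rows: Rows) -> Dict[int, str]:
    # HYPOTHETICAL serialization: one JSON document per row, keys sorted for stability.
    return {pk: git_blob_id(json.dumps(row, sort_keys=True).encode()) for pk, row in rows.items()}


def new_objects(old: Rows, new: Rows) -> Set[str]:
    """Blob ids in the new version that the old version did not already store."""
    known = set(row_blob_ids(old).values())
    return {oid for oid in row_blob_ids(new).values() if oid not in known}


old = {pk: {"name": f"parcel {pk}", "area": pk * 10} for pk in range(10_000)}
new = dict(old)
new[42] = {"name": "parcel 42", "area": 425}  # one edited row
print(len(new_objects(old, new)))  # prints 1: only the edited row needs a new blob
```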
        
           | everybodyknows wrote:
           | > The ability to spatially index the objects, then filtering
           | them on Clone, means I rapidly clone a tiny subset of the
           | data to work with.
           | 
           | Okay -- so is the "--depth=N" filtering option to git-clone
           | supported as well? And does it remain useful in the context
           | of Kart applications?
        
             | polemic wrote:
             | Yes, you can do shallow clones with `--depth` as well. This
             | is incredibly useful - it means we can publish massive Kart
             | repositories of spatial data with lots of versioning info,
             | but still allow users to work with small subsets of the
             | most recent changes. Very important for typical GIS use
             | cases.
        
           | Mertax wrote:
           | Is there a public git repo available somewhere that
           | represents a Kart repository?
           | 
           | Are the raw files in the working repository GeoPackages? How
           | is it tracking the changes made inside the geopackages? What
           | happens if it's replaced with an updated copy of the
           | geopackage that was edited via some other application? How
           | does it diff the changes?
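
The thread doesn't answer how Kart detects edits inside a GeoPackage working copy. Since a GeoPackage is an SQLite database, one generic way to diff a replaced copy against the previous one is to compare the two tables by primary key. A minimal sketch of that approach (not necessarily what Kart does), assuming both versions sit in one SQLite connection with identical column layouts and an integer primary key named `fid`; the table names `v1` and `v2` are hypothetical.

```python
import sqlite3
from typing import Dict, List


def diff_tables(conn: sqlite3.Connection, old: str, new: str, pk: str = "fid") -> Dict[str, List[int]]:
    """Classify primary keys as inserted, deleted, or updated between two table versions.

    Table and column names are interpolated into SQL, so pass trusted identifiers only.
    """
    def ids(sql: str) -> List[int]:
        return [row[0] for row in conn.execute(sql)]

    return {
        "inserted": ids(f"SELECT {pk} FROM {new} WHERE {pk} NOT IN (SELECT {pk} FROM {old}) ORDER BY {pk}"),
        "deleted": ids(f"SELECT {pk} FROM {old} WHERE {pk} NOT IN (SELECT {pk} FROM {new}) ORDER BY {pk}"),
        # Rows in new that match no row in old, restricted to keys that already existed.
        "updated": ids(
            f"SELECT {pk} FROM (SELECT * FROM {new} EXCEPT SELECT * FROM {old}) "
            f"WHERE {pk} IN (SELECT {pk} FROM {old}) ORDER BY {pk}"
        ),
    }


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE v1 (fid INTEGER PRIMARY KEY, name TEXT, geom BLOB);
    CREATE TABLE v2 (fid INTEGER PRIMARY KEY, name TEXT, geom BLOB);
    INSERT INTO v1 VALUES (1, 'a', x'01'), (2, 'b', x'02'), (3, 'c', x'03');
    INSERT INTO v2 VALUES (1, 'a', x'01'), (2, 'B', x'02'), (4, 'd', x'04');
    """
)
print(diff_tables(conn, "v1", "v2"))
```

Comparing full snapshots like this costs a scan of both tables on every diff; tracking edits as they happen avoids that, at the price of missing changes made by tools that replace the file wholesale.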
        
       | satuke wrote:
       | Have you tried out https://underhive.in/ for this?
        
         | polemic wrote:
         | I haven't, but it looks like it's a Git repo hosting solution?
         | The issue with using Git with data directly is that you generally
         | lose the per-row/feature change information. With common
         | binary GIS data formats, just putting them into Git loses a
         | lot of the utility and will blow out the size of the repo as
         | you apply changes.
         | 
         | Kart gives you row-level tracking, so you can see who made what
         | change & when, and diffs are small and fast to apply.
        
       | SOLAR_FIELDS wrote:
       | Why is this going to succeed when something like Geogig never
       | took off? I was super interested in the project at the time of
       | its initial release but it's been dead for years. What did that
       | project fundamentally do wrong vs what Kart is doing? Or is it
       | just a super niche thing?
        
       ___________________________________________________________________
       (page generated 2023-10-30 23:00 UTC)