[HN Gopher] Getting to the bottom of web map performance
       ___________________________________________________________________
        
       Getting to the bottom of web map performance
        
       Author : hampelm
       Score  : 80 points
       Date   : 2021-10-12 15:30 UTC (7 hours ago)
        
 (HTM) web link (bdon.org)
 (TXT) w3m dump (bdon.org)
        
       | BbzzbB wrote:
       | Offtopic, but while I may have nerdy GISers gathering here I'll
       | shoot my shot.
       | 
       | Does anyone know a common/standard way of displaying vector
       | information (points in this case) without making that
       | information scrapeable? Is that even possible? I have a project
       | where I want to sell some georeferenced data and be able to
       | show it (without all attributes, but showing all points) to
       | potential customers on a map, but IIRC I've never failed to
       | scrape vector web map data, so I'm not sure it's even possible.
       | I could imprint the points in the raster tiles, but then they
       | wouldn't be interactive. Thanks
       | 
       | (if offtopic ain't permitted on HN just delete this, sorry)
       | 
       | Edit: So many insightful answers, thank you so much for the
       | pointers HN, love y'all... and sorry OP for piggybacking.
        
         | gregsadetsky wrote:
         | You could bake the points into the tiles and then have a web
         | service that answers requests on mouse clicks i.e. when a user
         | clicks one of the points, you send the coordinates to the
         | server and the server determines if there's a point (or
         | multiple) nearby and sends back the related attributes.
         | 
         | You'd also add some rate limiting on the server-side so that
         | someone couldn't easily request from your server all attributes
         | for point 0,0 then 0,1 then 0,2 etc.
         | 
         | If someone was very determined to break this, they could make a
         | screenshot and manually triangulate each point (depending on
         | how many there are -- if you're hiding 10 points, don't bother. If
         | it's 1k or more, that'd be harder to do manually) or even use
         | computer vision/pixel color thresholding to extract points
         | (say, red pixels). Same thing for the attributes, they could
         | always use different IP addresses to break any IP-based rate
         | limit.
         | 
         | In response to that, you could force users to authenticate (and
         | use reCAPTCHA during signup) to minimize the IP-rotation
         | problem.
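
Here is a minimal Python sketch of the click-lookup service described above. The point store, the 50 m search radius, and the limit of 30 requests per 60-second window are all assumptions for illustration. A real service would query a spatial index and would probably scale the radius with the zoom level.

```python
from __future__ import annotations

import math
import time
from collections import defaultdict, deque

# Hypothetical point store: (lon, lat, attributes). A real service would query
# a spatial index in a database instead of scanning a list.
POINTS = [
    (-73.9857, 40.7484, {"name": "A"}),
    (-73.9680, 40.7851, {"name": "B"}),
]

MAX_REQUESTS = 30      # assumed limit per client...
WINDOW_SECONDS = 60.0  # ...within this rolling window
_history: dict[str, deque] = defaultdict(deque)


def allow_request(client_id: str, now: float | None = None) -> bool:
    """Sliding-window rate limiter keyed by client (IP address, account, ...)."""
    now = time.monotonic() if now is None else now
    q = _history[client_id]
    while q and now - q[0] >= WINDOW_SECONDS:
        q.popleft()  # forget requests that fell out of the window
    if len(q) >= MAX_REQUESTS:
        return False
    q.append(now)
    return True


def haversine_m(lon1: float, lat1: float, lon2: float, lat2: float) -> float:
    """Great-circle distance in meters on a spherical Earth."""
    r = 6_371_000.0
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp, dl = p2 - p1, math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(min(1.0, math.sqrt(a)))


def lookup_click(client_id: str, lon: float, lat: float,
                 radius_m: float = 50.0, now: float | None = None):
    """Return attributes of points near a click, or None if rate-limited."""
    if not allow_request(client_id, now):
        return None  # an HTTP handler would answer 429 Too Many Requests
    return [attrs for plon, plat, attrs in POINTS
            if haversine_m(lon, lat, plon, plat) <= radius_m]
```

The browser only ever receives the attributes of points near an actual click. The rate limiter is what makes a systematic grid of fake clicks slow.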
        
           | BbzzbB wrote:
           | Love this idea, great way to have a raster be interactive
           | without keeping point coordinates client-side. I do assume
           | that someone very motivated and knowledgeable will always
           | find a way to scrape it, I just have to make it hard enough
           | that it's not worth the hassle over just buying the
           | (reasonably priced) product. Thank you
        
         | tyingq wrote:
         | You could limit the window by encrypting the data stream with
         | some time-limited "long enough for a demo" key. They can open a
         | debugger, of course.
        
           | BbzzbB wrote:
           | Thanks for that suggestion!
        
         | londons_explore wrote:
         | You could quantize the coordinates, making them still look
         | great for viewing on a screen, but insufficiently accurate for
         | whatever the scrapers want to use them for.
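
Here is a minimal sketch of coordinate quantization on a plain degree grid. Every point is snapped to the nearest multiple of `step_deg` before it is sent to the browser. The 0.01° default (about 1.1 km in latitude) is an arbitrary example. A degree grid also gets narrower in longitude toward the poles.

```python
from __future__ import annotations


def quantize(lon: float, lat: float, step_deg: float = 0.01) -> tuple[float, float]:
    """Snap a coordinate to the nearest point of a step_deg x step_deg grid."""
    # The outer round() trims floating-point noise left by the multiplication.
    qlon = round(round(lon / step_deg) * step_deg, 10)
    qlat = round(round(lat / step_deg) * step_deg, 10)
    return qlon, qlat
```

Nearby points collapse onto the same grid position. They still look right at city-scale zooms but are useless for anything that needs the precise location.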
        
           | BbzzbB wrote:
           | True, or in the same vein limiting free access to clustered
           | data. Thanks!
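
Serving clustered data instead of points could look roughly like the following sketch. Points are binned into square grid cells, and only the cell centers and counts leave the server. The `cell_deg` size is a hypothetical parameter, and grid binning is just one simple clustering method among several.

```python
import math
from collections import Counter


def grid_clusters(points, cell_deg=0.1):
    """Bin (lon, lat) points into square cells; return (center_lon, center_lat, count)."""
    counts = Counter(
        (math.floor(lon / cell_deg), math.floor(lat / cell_deg)) for lon, lat in points
    )
    # Only cell centers and counts leave the server, never the raw points.
    return [((ix + 0.5) * cell_deg, (iy + 0.5) * cell_deg, n)
            for (ix, iy), n in counts.items()]
```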
        
         | 3np wrote:
         | I'd just choose some small subset of your dataset as a sample
         | dataset, expose just that, and accept that some users will
         | scrape and make use of that.
         | 
         | If this really isn't satisfactory, the best you can do is non-
         | interactive video, I think.
         | 
         | For your actual paying customers, you control it via contract
         | rather than DRM.
         | 
         | (Worked with a data business, not specifically GIS)
        
           | BbzzbB wrote:
           | Thanks for your input! I'll be trying to implement
           | hard-to-scrape data as a learning opportunity (as I'm working
           | towards a coding + GIS career), but settling for a subset of
           | the data + a demonstration video would be a satisfactory
           | compromise if the former turns out to be unachievable or
           | overkill.
        
         | arthurcolle wrote:
         | Haha, one of the vessel-tracking sites used .swf (Flash) for
         | the longest time so that you couldn't grab the points
         | corresponding to the ships. You could probably prerender images
         | but that's a bad solution. Good question, I'm sure someone else
         | will have a better idea of the latest state-of-the-art for
         | this.
        
           | BbzzbB wrote:
           | Thanks for the input.
           | 
           | >I'm sure someone else will have a better idea of the latest
           | state-of-the-art for this.
           | 
           | More than I could've hoped for!
        
         | moritonal wrote:
         | Google Maps has the same issue. Their trick is to pass a ton of
         | the data as protobuf, then decrypt it in WASM and load it into
         | WebGL to render and interact with.
         | 
         | Whilst it's all a massive pain, you can still scrape it with raw
         | OBJ dumps from the GPU. So it's always a cat-and-mouse game.
        
         | hampelm wrote:
         | Nope, not that I've seen -- if the browser gets the data,
         | anyone can get the data. You can make it more difficult, and
         | the defenses I've seen vary based on the type of data. Here are
         | a few:
         | 
         | - Limit the geography of the sample
         | 
         | - Use raster tiles at far zooms and switch to vector at close
         | zooms for interactivity. Combine this with a limit on the
         | number of tiles an unauthenticated user can consume to make
         | mass downloading more difficult.
         | 
         | - Have data that changes frequently enough that a one-time
         | scrape decreases in value pretty quickly
         | 
         | - Only share the real value at scale with authenticated
         | customers. The real value might be the geometries or the
         | attributes or the combo pack.
         | 
         | - Trust that most serious customers will prefer to pay to work
         | with you rather than abscond with the data. Those that are
         | willing to put work into scraping it probably won't pay anyways
         | 
         | Really, the last point is the key. You want to have a data
         | product where most consumers want to create a legitimate
         | business relationship with you. My opinion is if your potential
         | customers don't fit in this bucket, there likely are deeper
         | concerns, and if most do, you'll be fine! Aka sweat getting the
         | first customer rather than blocking the first scraper.
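
The list above suggests serving raster tiles when zoomed out, vector tiles when zoomed in, and capping anonymous tile downloads. That could be sketched like this. The zoom threshold of 14, the 500-tile quota, and keying anonymous users by IP are all assumptions. Resetting the counters (for example daily) is left out, and authenticated users are not limited here.

```python
from __future__ import annotations

from collections import defaultdict

VECTOR_MIN_ZOOM = 14   # assumed: below this zoom, serve pre-rendered raster tiles
ANON_TILE_QUOTA = 500  # assumed tile budget per anonymous IP (reset not shown)

_served: defaultdict[str, int] = defaultdict(int)


def choose_tile(user_id: str | None, ip: str, z: int) -> str:
    """Decide whether to serve a 'raster' or 'vector' tile, or 'denied'."""
    if user_id is None:  # unauthenticated: count tiles against the IP's quota
        if _served[ip] >= ANON_TILE_QUOTA:
            return "denied"
        _served[ip] += 1
    return "vector" if z >= VECTOR_MIN_ZOOM else "raster"
```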
        
           | BbzzbB wrote:
           | Thank you for the suggestions, especially that last one - I
           | may be "paranoid" considering the target audience are
           | professionals, I really should focus on the product rather
           | than the fence (besides some basic defense so the points
           | aren't in plain JSON).
        
           | hampelm wrote:
           | If you really want to go hog wild, you could use a system
           | where tokens with a short expiry are used to authenticate
           | requests even when users aren't logged in. You'd combine that
           | with rate limits + IP-level bans for when active or expired
           | tokens are overused. I would say that's total overkill for
           | 99% of services though.
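
Here is a sketch of short-expiry tokens for users who aren't logged in. It assumes HMAC-SHA256 signing with a server-side secret and a 5-minute lifetime, both hypothetical choices. The rate limits and IP bans mentioned above would sit on top of `verify_token` and are not shown.

```python
from __future__ import annotations

import base64
import binascii
import hashlib
import hmac
import time

SECRET = b"replace-with-a-real-secret"  # hypothetical server-side key
TTL_SECONDS = 300                        # assumed token lifetime (5 minutes)


def _sign(payload: str) -> str:
    return hmac.new(SECRET, payload.encode(), hashlib.sha256).hexdigest()


def issue_token(session_id: str, now: float | None = None) -> str:
    """Issue a token binding an anonymous session ID to an expiry time."""
    now = time.time() if now is None else now
    payload = f"{session_id}:{int(now) + TTL_SECONDS}"
    return base64.urlsafe_b64encode(f"{payload}:{_sign(payload)}".encode()).decode()


def verify_token(token: str, now: float | None = None) -> str | None:
    """Return the session ID if the token is authentic and unexpired, else None."""
    now = time.time() if now is None else now
    try:
        raw = base64.urlsafe_b64decode(token.encode()).decode()
        session_id, expires, sig = raw.rsplit(":", 2)
        expires_at = int(expires)
    except (binascii.Error, UnicodeDecodeError, ValueError):
        return None  # malformed token
    # Constant-time comparison so signatures can't be guessed byte by byte.
    if not hmac.compare_digest(sig.encode(), _sign(f"{session_id}:{expires}").encode()):
        return None
    return session_id if now < expires_at else None
```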
        
         | Doctor_Fegg wrote:
         | You could serve them in vector tiles, which are served
         | protobuf-encoded. It's still fairly easily scrapeable (get the
         | URL via the browser's Network tab, run through vt2geojson) but
         | would probably deter the casual scraper.
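
This sketch illustrates why protobuf encoding only deters casual scrapers. Once a tile has been parsed with any protobuf library (or a tool like vt2geojson, as mentioned), each feature's geometry is just a list of integers. The Mapbox Vector Tile spec defines these as command integers plus zigzag-encoded deltas. The sketch covers only that geometry step and the tile-to-lon/lat conversion. It assumes the default extent of 4096 and Web Mercator XYZ tile coordinates.

```python
from __future__ import annotations

import math

MOVE_TO, LINE_TO, CLOSE_PATH = 1, 2, 7  # command IDs from the MVT spec


def zigzag_decode(n: int) -> int:
    """Undo zigzag encoding: 0, 1, 2, 3, ... -> 0, -1, 1, -2, ..."""
    return (n >> 1) ^ -(n & 1)


def decode_geometry(geometry: list[int]) -> list[tuple[int, int]]:
    """Decode an MVT geometry array into tile-local (x, y) vertices.

    Vertices of all parts/rings are returned in one flat list.
    """
    points: list[tuple[int, int]] = []
    x = y = 0  # the cursor starts at (0, 0) for every feature
    i = 0
    while i < len(geometry):
        cmd_id, count = geometry[i] & 0x7, geometry[i] >> 3
        i += 1
        if cmd_id in (MOVE_TO, LINE_TO):
            for _ in range(count):  # each parameter pair is a delta from the cursor
                x += zigzag_decode(geometry[i])
                y += zigzag_decode(geometry[i + 1])
                i += 2
                points.append((x, y))
        elif cmd_id != CLOSE_PATH:  # ClosePath takes no parameters
            raise ValueError(f"unknown command id {cmd_id}")
    return points


def tile_to_lonlat(px: float, py: float, z: int, tx: int, ty: int,
                   extent: int = 4096) -> tuple[float, float]:
    """Convert a tile-local vertex of tile z/tx/ty to WGS84 lon/lat."""
    n = 2 ** z
    u = (tx + px / extent) / n
    v = (ty + py / extent) / n
    lon = u * 360.0 - 180.0
    lat = math.degrees(math.atan(math.sinh(math.pi * (1.0 - 2.0 * v))))
    return lon, lat
```

For example, the spec's point geometry `[9, 50, 34]` decodes to the tile-local vertex (25, 17). A few lines like these are all it takes to turn scraped tiles back into coordinates.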
        
           | BbzzbB wrote:
           | Thanks, I'll look into that. Deterring the casual scraper
           | is basically the goal: make them work hard enough that it's
           | not worth the hassle compared to the price of legitimate
           | access, since a motivated, technical person with a lot of time
           | will always manage to extract data that is shown client-side.
        
         | Chyzwar wrote:
         | You could try to "encrypt" the data and use Mapbox
         | expressions or a frontend transform to decrypt it. Point
         | coordinates would be randomly shifted, and you would send, in a
         | separate request, a Wasm/JS module that repositions features on
         | the map. The Wasm module could call a Mapbox expression to
         | reposition the points.
         | 
         | This would make it very hard to scrape. If someone scrapes the
         | vector data without reverse engineering the decryption module,
         | they will get incorrect data. Just make sure the Wasm module is
         | obfuscated.
         | 
         | I'm not sure whether Mapbox expressions can change coordinates,
         | but there might be other ways to transform vector data on the
         | frontend.
         | 
         | https://docs.mapbox.com/mapbox-gl-js/style-spec/expressions/
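
Here is a rough sketch of the shifting idea. It assumes a per-feature offset derived from HMAC-SHA256 of the feature ID, under a key that ships inside the client module. The key and the 0.05° maximum shift are hypothetical. Because the key reaches the browser, this is obfuscation rather than real encryption, which matches the reverse-engineering risk raised in the reply below.

```python
from __future__ import annotations

import hashlib
import hmac

KEY = b"hypothetical-obfuscation-key"  # would ship inside the obfuscated client module
MAX_SHIFT_DEG = 0.05                   # assumed maximum displacement per axis


def _offset(feature_id: str) -> tuple[float, float]:
    """Deterministic pseudo-random (dlon, dlat) in [-MAX_SHIFT_DEG, MAX_SHIFT_DEG]."""
    d = hmac.new(KEY, feature_id.encode(), hashlib.sha256).digest()
    a = int.from_bytes(d[0:4], "big") / 0xFFFFFFFF  # in [0, 1]
    b = int.from_bytes(d[4:8], "big") / 0xFFFFFFFF
    return (2 * a - 1) * MAX_SHIFT_DEG, (2 * b - 1) * MAX_SHIFT_DEG


def scramble(feature_id: str, lon: float, lat: float) -> tuple[float, float]:
    """Server side: publish shifted coordinates (no wrap-around at +/-180 deg)."""
    dlon, dlat = _offset(feature_id)
    return lon + dlon, lat + dlat


def unscramble(feature_id: str, lon: float, lat: float) -> tuple[float, float]:
    """Client side: undo the shift before rendering."""
    dlon, dlat = _offset(feature_id)
    return lon - dlon, lat - dlat
```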
        
           | BbzzbB wrote:
           | Thank you! I'll dig around; Mapbox encrypting sounds
           | promising.
        
             | Chyzwar wrote:
             | The only risk is that if someone reverse engineers your
             | Wasm/JS module, they will still get the data as of that point
             | in time. The above approach works well if the data changes
             | over time. I think this should stop most people.
        
       | lmeyerov wrote:
       | Rendering 100K-10M lines quickly is interesting!
       | 
       | Now that WebGL2 is (almost) universal, any sense of which
       | techniques make more sense now? Still no geometry shaders afaict,
       | but maybe there is something else in there. We currently
       | tessellate ahead of time - even streaming in from GPUs on our
       | server - but that's not how we'd do it with say raw OpenGL. Maybe
       | there's something else close enough now?
       | 
       | (If you're into that kind of thing, plz email build@graphistry :D
       | )
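
For reference, ahead-of-time line tessellation in its simplest form expands each segment into a quad (two triangles). The quad is offset by half the line width along the segment normal. This Python sketch omits joins and caps. One common alternative to geometry shaders in WebGL2 is instanced drawing, where the same extrusion is done per vertex in the vertex shader.

```python
import math


def tessellate_polyline(points, width):
    """Expand each segment of a 2D polyline into a quad (two triangles).

    Returns a list of triangles, each a tuple of three (x, y) vertices.
    Joins and caps are omitted, so corners show small gaps/overlaps.
    """
    half = width / 2.0
    tris = []
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        dx, dy = x1 - x0, y1 - y0
        length = math.hypot(dx, dy)
        if length == 0:
            continue  # skip degenerate (zero-length) segments
        # Unit normal of the segment, scaled to half the line width
        nx, ny = -dy / length * half, dx / length * half
        a, b = (x0 + nx, y0 + ny), (x0 - nx, y0 - ny)
        c, d = (x1 + nx, y1 + ny), (x1 - nx, y1 - ny)
        tris.append((a, b, c))
        tris.append((c, b, d))
    return tris
```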
        
       | have_faith wrote:
       | Interesting article. It's been a few years since I've had a play
       | with this kind of stuff; what's the best option available for
       | someone to host custom styled vector maps and display them in a
       | performant way? preferably on an open source stack. I haven't
       | used MapBox before as I was worried about costs and being locked
       | in. I'm interested in experimenting with map interaction/UX.
        
         | Rebelgecko wrote:
         | There are some forks of Mapbox from before it went closed source.
        
           | TOMDM wrote:
           | One of these is called MapLibre for the curious.
           | 
           | https://maplibre.org/
           | 
           | https://github.com/maplibre/maplibre-gl-js
           | 
           | I've used it on a smaller project, it worked well for me.
        
         | hampelm wrote:
         | The author has been building an open-source map rendering
         | stack -- here's an intro blog post on that from April this
         | year: https://protomaps.com/blog/new-way-to-make-maps
        
         | bezossucks wrote:
         | Leaflet for simpler use cases or OpenLayers for more power
         | 
         | Recent versions of both are semi-fast, enough for general use
        
           | KronisLV wrote:
           | Correct me if I'm wrong, but isn't OpenLayers
           | (https://openlayers.org/) mostly just a client-side library
           | for displaying maps, much like Leaflet
           | (https://leafletjs.com/)?
           | 
           | To have your own tile server, you'd probably want something
           | like OpenMapTiles (https://openmaptiles.org/) or another
           | alternative like Tilemaker (https://tilemaker.org/).
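
Whichever tile server is used, clients typically request tiles with the XYZ ("slippy map") scheme over Web Mercator. Here is a sketch of mapping a coordinate to its tile index and filling in a URL template. The template and host used below are hypothetical.

```python
from __future__ import annotations

import math


def lonlat_to_tile(lon: float, lat: float, z: int) -> tuple[int, int]:
    """Web Mercator XYZ tile (x, y) containing (lon, lat) at zoom z."""
    n = 2 ** z
    lat_rad = math.radians(lat)
    x = int((lon + 180.0) / 360.0 * n)
    y = int((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    # Clamp edge cases such as lon == 180 or latitudes past the Mercator limit
    return min(max(x, 0), n - 1), min(max(y, 0), n - 1)


def tile_url(template: str, lon: float, lat: float, z: int) -> str:
    """Fill a {z}/{x}/{y} URL template for the tile containing (lon, lat)."""
    x, y = lonlat_to_tile(lon, lat, z)
    return template.format(z=z, x=x, y=y)
```

For example, `tile_url("https://tiles.example.com/{z}/{x}/{y}.pbf", 0.0, 0.0, 1)` yields `https://tiles.example.com/1/1/1.pbf`.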
        
         | pininja wrote:
         | deck.gl is another open source rendering option with a
         | TileLayer, TerrainLayer, and MVTLayer.
         | 
         | Other libraries mentioned have better text label and styling
         | support out of the box compared to deck, so typically people do
         | interleaved WebGL rendering with deck.gl and other basemap
         | libraries to get a beautiful base and a super performant deck
         | overlay.
         | 
         | Tile hosting is still typically a paid service from someone,
         | though COGs (Cloud Optimized GeoTIFFs) on S3 are a self-hosting
         | option.
         | 
         | I primarily work on libraries adjacent to deck.gl, happy to
         | answer questions.
        
       ___________________________________________________________________
       (page generated 2021-10-12 23:01 UTC)