[HN Gopher] Getting to the bottom of web map performance
___________________________________________________________________
Getting to the bottom of web map performance
Author : hampelm
Score : 80 points
Date : 2021-10-12 15:30 UTC (7 hours ago)
(HTM) web link (bdon.org)
(TXT) w3m dump (bdon.org)
| BbzzbB wrote:
| Offtopic, but while I may have nerdy GISers gathering here I'll
| shoot my shot.
|
| Does anyone know a common/standard way of displaying vector
| information (points in this case) without having that information
| scrape-able? Is that even possible? I have a project where I want
| to sell some georeferenced data and be able to show said data
| (without all attributes, but showing all points) to potential
| customers in a map, but IIRC I've never failed to scrape vector
| web map data so I'm not sure it's even possible. I could imprint
| them in the tile rasters but then they wouldn't be interactable.
| Thanks
|
| (if offtopic ain't permitted on HN just delete this, sorry)
|
| Edit: So many insightful answers, thank you so much for the
| pointers HN, love y'all... and sorry OP for piggybacking.
| gregsadetsky wrote:
| You could bake the points into the tiles and then have a web
| service that answers requests on mouse clicks i.e. when a user
| clicks one of the points, you send the coordinates to the
| server and the server determines if there's a point (or
| multiple) nearby and sends back the related attributes.
|
| You'd also add some rate limiting on the server-side so that
| someone couldn't easily request from your server all attributes
| for point 0,0 then 0,1 then 0,2 etc.
|
| If someone was very determined to break this, they could make a
| screenshot and manually triangulate each point (depending how
| many there are -- if you're hiding 10 points, don't bother. If
| it's 1k or more, that'd be harder to do manually) or even use
| computer vision/pixel color thresholding to extract points
| (say, red pixels). Same thing for the attributes, they could
| always use different IP addresses to break any IP-based rate
| limit.
|
| In response to that, you could force users to authenticate (and
| use recaptcha during signup) to minimize the IP-rotation
| problem.
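|
| A minimal sketch of the click-lookup idea above, assuming the
| points stay on the server and the client only sends click
| coordinates. The sample points, quota, window and search radius
| are all hypothetical values chosen for illustration.

```python
import math
import time
from collections import defaultdict, deque

# Hypothetical sample data: (lon, lat, attributes). Only the server holds this.
POINTS = [
    (-73.5673, 45.5017, {"name": "Site A"}),
    (-79.3832, 43.6532, {"name": "Site B"}),
]

MAX_REQUESTS = 5        # assumed quota per client per window
WINDOW_SECONDS = 60.0   # assumed window length
_history = defaultdict(deque)  # client id -> timestamps of recent requests


def allow(client_id, now=None):
    """Sliding-window rate limit: True if this request is within quota."""
    now = time.monotonic() if now is None else now
    recent = _history[client_id]
    # Drop timestamps that have fallen out of the window.
    while recent and now - recent[0] >= WINDOW_SECONDS:
        recent.popleft()
    if len(recent) >= MAX_REQUESTS:
        return False
    recent.append(now)
    return True


def haversine_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in metres between two lon/lat pairs."""
    r = 6371000.0
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def lookup(client_id, lon, lat, radius_m=500.0, now=None):
    """Return attributes of points near the click, or None if rate limited."""
    if not allow(client_id, now):
        return None
    return [attrs for plon, plat, attrs in POINTS
            if haversine_m(lon, lat, plon, plat) <= radius_m]
```

| The client id here could be an IP address or, per the last
| paragraph above, an authenticated user id.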
| BbzzbB wrote:
| Love this idea, great way to have a raster be interactive
| without keeping point coordinates client-side. I do assume
| that someone very motivated and knowledgeable will always
| find a way to scrape it, I just have to make it hard enough
| that it's not worth the hassle over just buying the
| (reasonably priced) product. Thank you
| tyingq wrote:
| You could limit the window by encrypting the data stream with
| some time-limited "long enough for a demo" key. They can open a
| debugger, of course.
| BbzzbB wrote:
| Thanks for that suggestion!
| londons_explore wrote:
| You could quantize the coordinates, making them still look
| great for viewing on a screen, but insufficiently accurate for
| whatever the scrapers want to use them for.
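|
| One way to read "quantize" is snapping each coordinate to a
| coarse grid before it leaves the server. A minimal sketch,
| assuming a 0.01 degree step (roughly 1.1 km of latitude); the
| step size is an illustrative choice, not a recommendation.

```python
def quantize(lon, lat, step_deg=0.01):
    """Snap a lon/lat pair to the nearest multiple of step_deg."""
    # The outer round() trims floating-point noise left by the multiplication.
    qlon = round(round(lon / step_deg) * step_deg, 6)
    qlat = round(round(lat / step_deg) * step_deg, 6)
    return qlon, qlat
```

| Nearby points collapse onto the same grid cell, which is also
| why the result is less useful to a scraper.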
| BbzzbB wrote:
| True, or in the same vein limiting free access to clustered
| data. Thanks!
| 3np wrote:
| I'd just choose some small subset of your dataset as a sample
| dataset, expose just that, and accept that some users will
| scrape and make use of that.
|
| If this really isn't satisfactory, the best you can do is non-
| interactive video, I think.
|
| For your actual paying customers, you control it via contract
| rather than DRM.
|
| (Worked with a data business, not specifically GIS)
| BbzzbB wrote:
| Thanks for your input! I'll be trying to implement hardly-
| scrapeable data as a learning opportunity (as I'm working
| towards coding + GIS career), but settling for a subset of
| data + demonstrative video would be a satisfactory compromise
| if the former turns out to be unachievable or overkill.
| arthurcolle wrote:
| Haha, one of the vessel-tracking sites used .swf for the longest
| time to keep you from grabbing the points corresponding to the
| ships. You could probably prerender images
| but that's a bad solution. Good question, I'm sure someone else
| will have a better idea of the latest state-of-the-art for
| this.
| BbzzbB wrote:
| Thanks for the input.
|
| >I'm sure someone else will have a better idea of the latest
| state-of-the-art for this.
|
| More than I could've hoped for!
| moritonal wrote:
| Google Maps has the same issue. Their trick is to pass a ton of
| the data as protobuf, then decrypt it in WASM and load it into
| WebGL to render and interact with.
|
| Whilst all a massive pain, you can still scrape it with raw obj
| dumps from the GPU. So it's always a cat & mouse game.
| hampelm wrote:
| Nope, not that I've seen -- if the browser gets the data,
| anyone can get the data. You can make it more difficult, and
| the defenses I've seen vary based on the type of data. Here are
| a few:
|
| - Limit the geography of the sample
|
| - Use raster tiles at far zooms and switch to vector at close
| zooms for interactivity. Combine this with a limit on the
| number of tiles an unauthenticated user can consume to make
| mass downloading more difficult.
|
| - Have data that changes frequently enough that a one-time
| scrape decreases in value pretty quickly
|
| - Only share the real value at scale with authenticated
| customers. The real value might be the geometries or the
| attributes or the combo pack.
|
| - Trust that most serious customers will prefer to pay to work
| with you rather than abscond with the data. Those that are
| willing to put work into scraping it probably won't pay anyways
|
| Really, the last point is the key. You want to have a data
| product where most consumers want to create a legitimate
| business relationship with you. My opinion is if your potential
| customers don't fit in this bucket, there likely are deeper
| concerns, and if most do, you'll be fine! Aka sweat getting the
| first customer rather than blocking the first scraper.
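|
| The raster-at-far-zooms bullet, combined with a tile limit for
| unauthenticated users, could be sketched as below. The zoom
| threshold, the quota, and the function name are assumptions
| made up for this example; a real service would also reset or
| expire the counter.

```python
from collections import Counter

VECTOR_MIN_ZOOM = 14    # assumed: serve interactive vector tiles from this zoom up
ANON_TILE_QUOTA = 200   # assumed: tiles an unauthenticated client may fetch

_tiles_served = Counter()  # client id -> tiles served so far


def tile_response(client_id, zoom, authenticated=False):
    """Decide what to serve for one tile request: 'raster', 'vector' or 'denied'."""
    if not authenticated:
        if _tiles_served[client_id] >= ANON_TILE_QUOTA:
            return "denied"
        _tiles_served[client_id] += 1
    # Far zooms get pre-rendered rasters; close zooms get vector tiles.
    return "vector" if zoom >= VECTOR_MIN_ZOOM else "raster"
```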
| BbzzbB wrote:
| Thank you for the suggestions, especially that last one - I
| may be "paranoid" considering the target audience are
| professionals, I really should focus on the product rather
| than the fence (besides some basic defense so the points
| aren't in plain JSON).
| hampelm wrote:
| If you really want to go hog wild, you could use a system
| where tokens with a short expiry are used to authenticate
| requests even when users aren't logged in. You'd combine that
| with rate limits + IP-level bans for when active or expired
| tokens are overused. I would say that's total overkill for
| 99% of services though.
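|
| A minimal sketch of short-expiry tokens, assuming an
| HMAC-signed expiry timestamp. The secret, lifetime and token
| layout are hypothetical; the rate limits and IP-level bans
| mentioned above are left out.

```python
import hashlib
import hmac
import time

SECRET = b"replace-with-a-real-secret"  # hypothetical server-side key
TTL_SECONDS = 300                        # assumed token lifetime


def issue_token(now=None):
    """Return 'expiry.signature', where the signature is an HMAC over the expiry."""
    now = time.time() if now is None else now
    expiry = str(int(now) + TTL_SECONDS)
    sig = hmac.new(SECRET, expiry.encode(), hashlib.sha256).hexdigest()
    return f"{expiry}.{sig}"


def verify_token(token, now=None):
    """True only if the signature matches and the token has not expired."""
    now = time.time() if now is None else now
    expiry, sep, sig = token.partition(".")
    if not sep or not expiry.isdecimal():
        return False
    expected = hmac.new(SECRET, expiry.encode(), hashlib.sha256).hexdigest()
    # compare_digest avoids leaking how many characters matched.
    if not hmac.compare_digest(sig.encode(), expected.encode()):
        return False
    return now < int(expiry)
```

| The page would embed a fresh token on load, and each tile or
| attribute request would have to present it.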
| Doctor_Fegg wrote:
| You could serve them in vector tiles, which are served
| protobuf-encoded. It's still fairly easily scrapeable (get the
| URL via the browser's Network tab, run through vt2geojson) but
| would probably deter the casual scraper.
| BbzzbB wrote:
| Thanks, I'll look into that. Deterring the casual scraper is
| basically the goal: make them work enough that it's not worth
| the hassle relative to the price of legitimate access. A
| motivated, technical person with a lot of time will always be
| able to extract data that is shown client-side.
| Chyzwar wrote:
| You could try to "encrypt" the data and use Mapbox
| expressions or a frontend transform to decrypt it. Point
| coordinates would be randomly shifted, and you would send, in a
| separate request, a Wasm/JS module that repositions features on
| the map. The Wasm module could call a Mapbox expression to
| reposition the points.
|
| This would make it very hard to scrape. If someone scrapes the
| vector data without reverse engineering the decryption module,
| they will get incorrect data. Just make sure the Wasm module is
| obfuscated.
|
| Not sure whether a Mapbox expression can perform a change of
| coordinates. But there might be other ways to transform
| vector data on the frontend.
|
| https://docs.mapbox.com/mapbox-gl-js/style-spec/expressions/
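|
| The shift-and-reposition idea could be sketched as a keyed,
| repeatable offset per feature. This is obfuscation rather than
| real encryption, and Python stands in here for the Wasm/JS
| module; the key handling, the offset range and the function
| names are all assumptions for illustration.

```python
import hashlib
import hmac

MAX_SHIFT_DEG = 0.5  # assumed maximum offset applied to each axis


def _offset(key, feature_id):
    """Derive a repeatable (dlon, dlat) offset from the key and feature id."""
    digest = hmac.new(key, str(feature_id).encode(), hashlib.sha256).digest()
    a = int.from_bytes(digest[:4], "big") / 2**32   # fraction in [0, 1)
    b = int.from_bytes(digest[4:8], "big") / 2**32
    # Map each fraction to [-MAX_SHIFT_DEG, MAX_SHIFT_DEG).
    return (a * 2 - 1) * MAX_SHIFT_DEG, (b * 2 - 1) * MAX_SHIFT_DEG


def shift(key, feature_id, lon, lat):
    """Server side: move the point before it is written into the tile."""
    dlon, dlat = _offset(key, feature_id)
    return lon + dlon, lat + dlat


def unshift(key, feature_id, lon, lat):
    """Client side: undo the offset just before rendering."""
    dlon, dlat = _offset(key, feature_id)
    return lon - dlon, lat - dlat
```

| Anyone who recovers the key from the client module can undo
| the shift, so rotating the key over time limits the damage.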
| BbzzbB wrote:
| Thank you! I'll dig around Mapbox; encrypting sounds
| promising.
| [deleted]
| Chyzwar wrote:
| The only risk is that if someone reverse engineers your Wasm/JS
| module, they will still get the data as of that point in
| time. The above approach is good if the data changes over time.
| I think this should stop most people.
| lmeyerov wrote:
| Rendering 100K-10M lines quickly is interesting!
|
| Now that WebGL2 is (almost) universal, any sense of which
| techniques make more sense now? Still no geometry shaders afaict,
| but maybe there is something else in there. We currently
| tessellate ahead of time - even streaming in from GPUs on our
| server - but that's not how we'd do it with say raw OpenGL. Maybe
| there's something else close enough now?
|
| (If you're into that kind of thing, plz email build@graphistry :D)
| have_faith wrote:
| Interesting article. It's been a few years since I've had a play
| with this kind of stuff; what's the best option available for
| someone to host custom styled vector maps and display them in a
| performant way? preferably on an open source stack. I haven't
| used MapBox before as I was worried about costs and being locked
| in. I'm interested in experimenting with map interaction/UX.
| Rebelgecko wrote:
| There are some forks of pre-closed-source Mapbox
| TOMDM wrote:
| One of these is called MapLibre for the curious.
|
| https://maplibre.org/
|
| https://github.com/maplibre/maplibre-gl-js
|
| I've used it on a smaller project, it worked well for me.
| hampelm wrote:
| The author has been building an open-source map rendering
| stack -- here's an intro blog post on that from April this
| year: https://protomaps.com/blog/new-way-to-make-maps
| bezossucks wrote:
| Leaflet for simpler use cases or OpenLayers for more power
|
| Recent versions of both are semi-fast, enough for general use
| KronisLV wrote:
| Correct me if I'm wrong, but isn't OpenLayers
| (https://openlayers.org/) mostly just a client side library
| for the displaying of maps, much like Leaflet
| (https://leafletjs.com/)?
|
| To have your own tile server, you'd probably want something
| like OpenMapTiles (https://openmaptiles.org/) or another
| alternative like Tilemaker (https://tilemaker.org/).
| pininja wrote:
| deck.gl is another open source rendering option with a
| TileLayer, TerrainLayer, and MVTLayer.
|
| Other libraries mentioned have better text label and styling
| support out of the box compared to deck, so typically people do
| interleaved WebGL rendering with deck.gl and other basemap
| libraries to get a beautiful base and a super performant deck
| overlay.
|
| Tile hosting is still typically a paid service from someone,
| though COGS and S3 are a self-hosting option.
|
| I primarily work on libraries adjacent to deck.gl, happy to
| answer questions.
| [deleted]
___________________________________________________________________
(page generated 2021-10-12 23:01 UTC)