[HN Gopher] Show HN: My demo for vector embeddings for the Earth...
___________________________________________________________________
Show HN: My demo for vector embeddings for the Earth's surface
Author : ckrapu
Score : 74 points
Date : 2023-09-16 10:31 UTC (1 days ago)
(HTM) web link (www.louisquissetlabs.com)
(TXT) w3m dump (www.louisquissetlabs.com)
| aaomidi wrote:
| This is amazing!
| throwaway743 wrote:
| Dude, please provide context on the site. I have no clue what I'm
| looking at or its purpose. Not trying to poo poo on it, just want
| context.
| breckenedge wrote:
| It's highlighting similar areas to the area currently under the
| cursor.
| ckrapu wrote:
| Sorry! The presentation could be better. I'll work on the FAQ.
| dlnovell wrote:
| Chris - just saw your presentation of this at PNNL, awesome
| seeing it pop up on HN too!
| ckrapu wrote:
| Cool! Glad you got to see it working and that presentation was
| a nice reason to make sure everything was cleaned up.
| 1024core wrote:
| Moved the center to SF and I've been sitting, watching the
| spinner.
|
| Some documentation would be helpful.
| watersb wrote:
| Very nice!
| DerSaidin wrote:
| Seems to not handle the ocean well.
| spousty wrote:
| It's due to the fact that they used satellite imagery to create
| the embeddings. The map is just for visualization. They
| probably used 5 or more bands of the satellite data which means
| each pixel is going to be slightly different due to things like
| depth, amount of silt in the water, amount of plankton....
|
| Having worked on these types of problems before, the model is
| doing a pretty great job matching pixels.
| ckrapu wrote:
| Thanks! And you are giving it too much credit here - it's
| just trained on one-hot encoded land cover (24 classes) from
| Copernicus. Using imagery directly would be #2 on my list of
| to-dos after including elevation in the input data.
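For readers unfamiliar with the term, here is a minimal sketch of one-hot encoding a land cover patch. It assumes the class codes have already been remapped to consecutive integers 0..23; the patch values and the `one_hot` helper are hypothetical and not taken from the project.

```python
import numpy as np

NUM_CLASSES = 24  # class count quoted in the thread


def one_hot(labels: np.ndarray, num_classes: int = NUM_CLASSES) -> np.ndarray:
    """Turn an (H, W) integer label patch into an (H, W, num_classes) array."""
    labels = np.asarray(labels)
    if labels.min() < 0 or labels.max() >= num_classes:
        raise ValueError("labels must lie in [0, num_classes)")
    # Row i of the identity matrix is the one-hot vector for class i.
    return np.eye(num_classes, dtype=np.float32)[labels]


# Hypothetical 2x2 patch of remapped class ids (an assumption for illustration).
patch = np.array([[0, 3], [23, 3]])
encoded = one_hot(patch)
print(encoded.shape)
```

Each pixel becomes a 24-element vector with a single 1, which is the kind of input a CNN can consume without implying any ordering between classes.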
| ckrapu wrote:
| I intentionally avoided using lots of ocean areas - this way I
| cut down the number of required sites for inference from ~100
| million (at resolution 7 in the H3 system) to around 25
| million.
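The "~100 million" figure can be checked with a little arithmetic. The H3 grid has 122 base cells, 12 of them pentagons, which gives 2 + 120 * 7^r cells at resolution r. The sketch below uses only that formula (no H3 library) and the approximate 25 million figure from the comment above.

```python
def h3_cell_count(resolution: int) -> int:
    """Total number of H3 cells worldwide at the given resolution (0..15)."""
    if not 0 <= resolution <= 15:
        raise ValueError("H3 resolutions run from 0 to 15")
    return 2 + 120 * 7 ** resolution


total = h3_cell_count(7)
# 25 million is the approximate count of retained cells quoted in the thread.
kept_fraction = 25_000_000 / total
print(total, round(kept_fraction, 3))
```

So skipping most ocean cells leaves roughly a quarter of the global grid to run inference on.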
| ckrapu wrote:
| I've had to build out some version of a geospatial vector
| embedding / latent variable dataset for at least 4 separate
| projects now. Come see the viewer I've built on top of it!
|
| The embeddings come from globally available Copernicus land cover
| data.
| spousty wrote:
| How did you generate the embeddings? The vectors are relatively
| small compared with the embeddings I have seen built from image
| and NLP models.
|
| Which Copernicus bands were you using? Did you augment the data
| with DEM info?
| ckrapu wrote:
| The embeddings were obtained using a CNN triplet loss model
| (~10M parameters) on the Copernicus land cover data. I
| haven't used DEM data yet but I have done generative modeling
| on DEMs in other work and would like to do that too:
|
| https://www.linkedin.com/in/christopher-krapu/overlay/157690...
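As a rough illustration of what a triplet loss optimizes, here is a NumPy sketch of the standard formulation with squared Euclidean distances. The margin value and the toy vectors are assumptions; the thread does not say which distance, margin, or sampling scheme the actual model uses.

```python
import numpy as np


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Mean hinge loss pulling anchors toward positives and away from negatives."""
    anchor, positive, negative = (
        np.asarray(x, dtype=float) for x in (anchor, positive, negative)
    )
    d_pos = np.sum((anchor - positive) ** 2, axis=-1)  # anchor-positive distance
    d_neg = np.sum((anchor - negative) ** 2, axis=-1)  # anchor-negative distance
    return float(np.maximum(d_pos - d_neg + margin, 0.0).mean())


# Toy 2-D embeddings (hypothetical values).
a = np.array([[0.0, 0.0]])
near = np.array([[0.0, 1.0]])
far = np.array([[3.0, 0.0]])

good = triplet_loss(a, near, far)  # positive is closer: no penalty
bad = triplet_loss(a, far, near)   # positive is farther: penalized
print(good, bad)
```

The loss is zero once every positive sits closer to its anchor than the negative by at least the margin, which is what pushes similar locations together in the embedding space.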
| fnordpiglet wrote:
| Can you explain what I'm looking at? I don't know how to
| interpret the hex tiles :-)
| tartakovsky wrote:
| Great question. A legend or brief description of the
| underlying logic / heuristic would be helpful.
| breckenedge wrote:
| The heuristic is likely the result of an ML algorithm, so
| the underlying logic may not make much sense to us.
| wyldfire wrote:
| I'm _pretty_ sure I'm not the intended audience but I also
| have no idea what this is used for. Surveying? Real estate
| tycoons? Oil & gas exploration?
| potatoman22 wrote:
| It's a way to encode land to make predictions about it. E.g.
| is the land arable, is it rural, how similar is it to X,
| etc. Embeddings help encode data in formats more usable by
| ML models.
| lovasoa wrote:
| The question was: in what context do people need to
| answer a question like "which geographical points are
| close to X and similar to X"?
|
| I don't understand who the target audience is and what
| this can be used for.
| ckrapu wrote:
| The original idea came from something I saw at work - we
| needed a way to build generic feature sets representing
| something about real estate, but beyond the data we had
| on prices, floors, and other house-specific details.
| wyldfire wrote:
| Sure, I get that part -- but then how do people use the
| predictions?
| foota wrote:
| The embeddings are used by algorithms, not people,
| generally. You could ask something like "what's the most
| similar place to X within Y", and it would use the
| embeddings (which cover a variety of facts) to calculate the
| answer. An embedding is an N-dimensional vector (where
| the dimensions may or may not be meaningful to us), and
| similarity can be implemented by looking at the
| similarity between vectors.
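One common way to measure "similarity between vectors" is cosine similarity. The sketch below is illustrative only: the thread does not state which metric the demo uses, and the example vectors are made up.

```python
import numpy as np


def cosine_similarity(a, b) -> float:
    """Cosine of the angle between two vectors: 1 = same direction, 0 = orthogonal."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


# Hypothetical 3-D embeddings for three places.
forest_a = [0.9, 0.1, 0.0]
forest_b = [0.8, 0.2, 0.0]
city = [0.0, 0.1, 0.9]

print(cosine_similarity(forest_a, forest_b))
print(cosine_similarity(forest_a, city))
```

The two forest-like vectors score close to 1 while the forest and city vectors score close to 0, even though no single dimension needs to have a human-readable meaning.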
| ckrapu wrote:
| Yup, and while the similarity search is perhaps the most
| visually appealing way to work with it, the real use (in
| my opinion) is in providing generic sets of geospatial
| features which are reusable across applications. I've
| built out versions of H3-referenced feature sets at each
| of the jobs I've had over the last 10 years.
| ckrapu wrote:
| Sure! The basic idea is that each hexagon is a discrete unit
| of space for which I obtain a vector embedding. This vector
| is supposed to represent a sort of data-based summary of that
| location, obtained in this case using deep learning.
|
| When you put the search on a hex, it looks up the vector for
| that hex and then performs a similarity search on all other
| vectors within the circle and shows the ones which are most
| similar in terms of land cover. The dependence on land cover
| / land use data is just because that was easy to get.
|
| As other folks have pointed out here, raw satellite imagery
| is also a potential input source for this. I'm playing around
| with other sources and really want to integrate something
| like GeoVex (https://openreview.net/forum?id=7bvWopYY1H) into
| the embeddings as well.
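The lookup-then-search flow described in this comment can be sketched roughly as follows. Everything here is an assumption for illustration: the brute-force scan, the haversine radius filter, the cosine ranking, and the toy coordinates and embeddings. The real viewer may use a different metric or an index structure.

```python
import numpy as np

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km; accepts scalars or arrays of degrees."""
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(a))


def similar_within_radius(query_idx, lats, lons, embeddings, radius_km, k=3):
    """Indices of the k hexes inside the circle most similar to the query hex."""
    dist = haversine_km(lats[query_idx], lons[query_idx], lats, lons)
    inside = (dist <= radius_km) & (np.arange(len(lats)) != query_idx)
    candidates = np.flatnonzero(inside)
    # Normalize rows so a dot product equals cosine similarity.
    unit = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    scores = unit[candidates] @ unit[query_idx]
    order = np.argsort(-scores)[:k]
    return candidates[order].tolist()


# Four hypothetical hex centers and 2-D embeddings.
lats = np.array([0.0, 0.1, 0.2, 10.0])
lons = np.zeros(4)
emb = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [1.0, 0.0]])

result = similar_within_radius(0, lats, lons, emb, radius_km=50.0, k=2)
print(result)
```

Note that the last hex has an embedding identical to the query but is never returned, because it lies far outside the search circle.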
| skygazer wrote:
| This tool looks very interesting, and seems to work well, but
| being utterly unfamiliar with geospatial vector embeddings, their
| purpose or use, I had no idea what I was looking at, or why.
|
| It seems to show areas of similarity, within a radius of a
| central query location, with regard to (perhaps) vegetation cover
| (e.g., forests, grasslands, wetlands), artificial surfaces (e.g.,
| urban areas, roads), agricultural areas, water bodies, etc.,
| overlaid on Google Maps, and allows exporting of the embeddings
| for lat/lons as CSV. It looks like land features for hexagonal
| grid areas have been turned into points in a 15-dimensional
| space, and some sort of nearest-neighbor search is done to return
| most similar other grid areas within the larger area. It does
| indeed seem accurate in my area!
|
| I'm not sure what this would be useful for, but I'm assuming
| urban planning, real estate, agriculture or conservation? I know
| I'm not the target audience, but more info or ideas would be
| fascinating.
| ckrapu wrote:
| You pretty much hit the nail on the head. The application areas
| you mentioned are the same as the ones that I had in mind when
| developing this.
___________________________________________________________________
(page generated 2023-09-17 23:00 UTC)