[HN Gopher] Show HN: My demo for vector embeddings for the Earth...
       ___________________________________________________________________
        
       Show HN: My demo for vector embeddings for the Earth's surface
        
       Author : ckrapu
       Score  : 74 points
       Date   : 2023-09-16 10:31 UTC (1 days ago)
        
 (HTM) web link (www.louisquissetlabs.com)
 (TXT) w3m dump (www.louisquissetlabs.com)
        
       | aaomidi wrote:
       | This is amazing!
        
       | throwaway743 wrote:
       | Dude, please provide context on the site. I have no clue what I'm
       | looking at or its purpose. Not trying to poo poo on it, just want
       | context.
        
         | breckenedge wrote:
         | It's highlighting similar areas to the area currently under the
         | cursor.
        
         | ckrapu wrote:
         | Sorry! The presentation could be better. I'll work on the FAQ.
        
       | dlnovell wrote:
       | Chris - just saw your presentation of this at PNNL, awesome
       | seeing it pop up on HN too!
        
         | ckrapu wrote:
         | Cool! Glad you got to see it working and that presentation was
         | a nice reason to make sure everything was cleaned up.
        
       | 1024core wrote:
       | Moved the center to SF and I've been sitting, watching the
       | spinner.
       | 
       | Some documentation would be helpful.
        
       | watersb wrote:
       | Very nice!
        
       | DerSaidin wrote:
       | Seems to not handle the ocean well.
        
         | spousty wrote:
         | It's due to the fact that they used satellite imagery to create
         | the embeddings. The map is just for visualization. They
         | probably used 5 or more bands of the satellite data which means
         | each pixel is going to be slightly different due to things like
         | depth, amount of silt in the water, amount of plankton....
         | 
          | Having worked on these types of problems before, I think the
          | model is doing a pretty great job matching pixels.
        
           | ckrapu wrote:
           | Thanks! And you are giving it too much credit here - it's
           | just trained on one-hot encoded land cover (24 classes) from
            | Copernicus. Using imagery directly would be #2 on my list of
           | to-dos after including elevation in the input data.
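
A minimal sketch of what "one-hot encoded land cover (24 classes)" could look like as model input. It assumes the classes have already been mapped to indices 0 to 23; the raw Copernicus class codes, the patch size, and the function names here are hypothetical and not taken from the demo.

```python
NUM_CLASSES = 24  # class count stated in the comment above


def one_hot(class_index, num_classes=NUM_CLASSES):
    """Return a list with 1.0 at class_index and 0.0 everywhere else."""
    if not 0 <= class_index < num_classes:
        raise ValueError(f"class index {class_index} out of range")
    vec = [0.0] * num_classes
    vec[class_index] = 1.0
    return vec


def encode_patch(patch):
    """One-hot encode a 2D grid of class indices.

    The result is a nested list shaped rows x cols x num_classes.
    """
    return [[one_hot(c) for c in row] for row in patch]


# Hypothetical 2x2 patch of land cover class indices (values are made up).
patch = [[0, 3], [3, 23]]
encoded = encode_patch(patch)
```

Each cell becomes a 24-element vector with exactly one nonzero entry, so a CNN sees one input channel per land cover class.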
        
         | ckrapu wrote:
         | I intentionally avoided using lots of ocean areas - this way I
         | cut down the number of required sites for inference from ~100
         | million (at resolution 7 in the H3 system) to around 25
         | million.
        
       | ckrapu wrote:
       | I've had to build out some version of a geospatial vector
       | embedding / latent variable dataset for at least 4 separate
       | projects now. Come see the viewer I've built on top of it!
       | 
       | The embeddings come from globally available Copernicus land cover
       | data.
        
         | spousty wrote:
          | How did you generate the embeddings? The vectors are relatively
          | small compared to the embeddings I have seen built from image
          | and NLP models.
         | 
          | Which Copernicus bands were you using? Did you augment the data
         | with DEM info?
        
           | ckrapu wrote:
           | The embeddings were obtained using a CNN triplet loss model
           | (~10M parameters) on the Copernicus land cover data. I
           | haven't used DEM data yet but I have done generative modeling
           | on DEMs in other work and would like to do that too:
           | 
            | https://www.linkedin.com/in/christopher-krapu/overlay/157690...
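
For readers unfamiliar with triplet loss, here is a minimal sketch of the loss function itself, not of the author's CNN. The thread does not say how triplets were sampled, which distance was used, or what margin was chosen, so the Euclidean distance and `margin=1.0` below are assumptions.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Hinge-style triplet loss: max(0, d(a, p) - d(a, n) + margin).

    The loss is zero once the positive is closer to the anchor than the
    negative by at least `margin`.
    """
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)


# Toy 2-D embeddings (made-up values).
anchor = [0.0, 0.0]
positive = [0.0, 1.0]   # distance 1 from the anchor
negative = [3.0, 4.0]   # distance 5 from the anchor

loss_easy = triplet_loss(anchor, positive, negative)  # 1 - 5 + 1 < 0, clipped to 0
loss_hard = triplet_loss(anchor, negative, positive)  # 5 - 1 + 1 = 5
```

Training a network to minimize this loss pulls embeddings of similar inputs together and pushes dissimilar ones apart.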
        
         | fnordpiglet wrote:
         | Can you explain what I'm looking at? I don't know how to
         | interpret the hex tiles :-)
        
           | tartakovsky wrote:
           | Great question. A legend or brief description of the
           | underlying logic / heuristic would be helpful.
        
             | breckenedge wrote:
             | The heuristic is likely the result of an ML algorithm, so
             | the underlying logic may not make much sense to us.
        
           | wyldfire wrote:
            | I'm _pretty_ sure I'm not the intended audience but I also
           | have no idea what this is used for. Surveying? Real estate
           | tycoons? Oil & gas exploration?
        
             | potatoman22 wrote:
              | It's a way to encode land to make predictions about it. E.g.
             | is the land arable, is it rural, how similar is it to X,
             | etc. Embeddings help encode data in formats more usable by
             | ML models.
        
               | lovasoa wrote:
               | The question was: in what context do people need to
               | answer a question like "which geographical points are
               | close to X and similar to X"?
               | 
               | I don't understand who the target audience is and what
               | this can be used for.
        
               | ckrapu wrote:
               | The original idea came from something I saw at work - we
               | needed a way to build generic feature sets representing
               | something about real estate, but beyond the data we had
               | on prices, floors, and other house-specific details.
        
               | wyldfire wrote:
               | Sure, I get that part -- but then how do people use the
               | predictions?
        
               | foota wrote:
               | The embeddings are used by algorithms, not people,
               | generally. You could ask something like "what's the most
                | similar place to X within Y", and it would use the
                | embeddings (which cover a variety of facts) to calculate the
                | answer. An embedding is an N-dimensional vector (where
               | the dimensions may or may not be meaningful to us), and
               | similarity can be implemented by looking at the
               | similarity between vectors.
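
One common way to measure "similarity between vectors" is cosine similarity. The sketch below is illustrative only: the thread does not state which similarity measure the demo uses, and the place names and vectors are made up.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between u and v (1.0 means same direction)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


# Hypothetical 3-D embeddings for a query place and two candidates.
query = [1.0, 0.0, 1.0]
candidates = {
    "place_a": [2.0, 0.0, 2.0],  # same direction as the query
    "place_b": [0.0, 1.0, 0.0],  # orthogonal to the query
}

# Pick the candidate whose embedding is most similar to the query.
best = max(candidates, key=lambda name: cosine_similarity(query, candidates[name]))
```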
        
               | ckrapu wrote:
               | Yup, and while the similarity search is perhaps the most
               | visually appealing way to work with it, the real use (in
               | my opinion) is in providing generic sets of geospatial
               | features which are reusable across applications. I've
               | built out versions of H3-referenced feature sets at each
               | of the jobs I've had over the last 10 years.
        
           | ckrapu wrote:
           | Sure! The basic idea is that each hexagon is a discrete unit
           | of space for which I obtain a vector embedding. This vector
           | is supposed to represent a sort of data-based summary of that
           | location, obtained in this case using deep learning.
           | 
           | When you put the search on a hex, it looks up the vector for
           | that hex and then performs a similarity search on all other
           | vectors within the circle and shows the ones which are most
           | similar in terms of land cover. The dependence on land cover
           | / land use data is just because that was easy to get.
           | 
           | As other folks have pointed out here, raw satellite imagery
           | is also a potential input source for this. I'm playing around
           | with other sources and really want to integrate something
           | like GeoVex (https://openreview.net/forum?id=7bvWopYY1H) into
           | the embeddings as well.
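
The lookup described above (take the vector for the selected hex, then rank the other hexes inside the circle by similarity) can be sketched roughly as follows. This is a brute-force illustration under assumptions: hex centers are given as lat/lon pairs, the circle is applied with a haversine distance, and similarity is Euclidean distance in embedding space. The demo's actual index, metric, and data layout are not described in the thread, and all IDs and values below are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two lat/lon points."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def similar_hexes(query_id, hexes, radius_km, k):
    """Return up to k hex IDs within radius_km of the query hex,
    ordered from most to least similar embedding.

    `hexes` maps hex ID -> (lat, lon, embedding vector).
    """
    qlat, qlon, qvec = hexes[query_id]
    scored = []
    for hex_id, (lat, lon, vec) in hexes.items():
        if hex_id == query_id:
            continue  # skip the query hex itself
        if haversine_km(qlat, qlon, lat, lon) > radius_km:
            continue  # outside the search circle
        scored.append((math.dist(qvec, vec), hex_id))
    scored.sort()  # smallest embedding distance first
    return [hex_id for _, hex_id in scored[:k]]


# Hypothetical hexes with made-up 2-D embeddings.
hexes = {
    "a": (37.77, -122.42, [1.0, 0.0]),  # query hex
    "b": (37.80, -122.40, [0.9, 0.1]),  # nearby, similar
    "c": (37.75, -122.45, [0.0, 1.0]),  # nearby, dissimilar
    "d": (40.71, -74.00, [1.0, 0.0]),   # identical vector, but far away
}
matches = similar_hexes("a", hexes, radius_km=50, k=2)
```

Hex "d" has the same embedding as the query but falls outside the circle, so it is never ranked. With millions of hexes, a spatial index or an approximate nearest-neighbor library would replace the linear scan.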
        
       | skygazer wrote:
       | This tool looks very interesting, and seems to work well, but
       | being utterly unfamiliar with geospatial vector embeddings, their
       | purpose or use, I had no idea what I was looking at, or why.
       | 
       | It seems to show areas of similarity, within a radius of a
       | central query location, with regard to (perhaps) vegetation cover
       | (e.g., forests, grasslands, wetlands), artificial surfaces (e.g.,
        | urban areas, roads), agricultural areas, water bodies, etc.,
        | overlaid on Google Maps, and allows exporting the embeddings
        | for lat/lons as CSV. It looks like land features for hexagonal
        | grid areas have been turned into points in a 15-dimensional
       | space, and some sort of nearest-neighbor search is done to return
       | most similar other grid areas within the larger area. It does
       | indeed seem accurate in my area!
       | 
       | I'm not sure what this would be useful for, but I'm assuming
       | urban planning, real estate, agriculture or conservation? I know
       | I'm not the target audience, but more info or ideas would be
       | fascinating.
        
         | ckrapu wrote:
         | You pretty much hit the nail on the head. The application areas
         | you mentioned are the same as the ones that I had in mind when
         | developing this.
        
       ___________________________________________________________________
       (page generated 2023-09-17 23:00 UTC)