Post AY9BPeqUMaEdck3UmW by freyfogle@mastodon.social
(DIR) Post #AY7vrWucFlyrFQ5fMm by simon@fedi.simonwillison.net
2023-07-27T17:43:04Z
0 likes, 0 repeats
https://overturemaps.org/ released an astonishing GIS dataset yesterday that includes 60m "place of interest" listings (businesses, attractions etc) under a VERY permissive license. It's 8GB of data and the quality from an initial spot-check seems to be very high. I wrote about how I've been exploring it so far here: https://til.simonwillison.net/overture-maps/overture-maps-parquet
(DIR) Post #AY7w5XAw4Umk7OOxJQ by simon@fedi.simonwillison.net
2023-07-27T17:45:58Z
0 likes, 0 repeats
I used DuckDB to extract data from the released parquet files, then loaded that into SQLite so I could use it with @datasette. Here's a demo I built with just the places data for the city of Half Moon Bay - 931 listings in total: https://hmb-overture-demo.vercel.app/hmb/places
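A minimal sketch of the second half of that pipeline, assuming the rows have already been pulled out of the Parquet files (for example with DuckDB). The column names and sample rows are made up for illustration and are not taken from the Overture schema; only the `places` table name mirrors the demo URL.

```python
import sqlite3

# Hypothetical rows standing in for what a query over the places
# Parquet files might return: (id, name, latitude, longitude).
rows = [
    ("a1", "Example Cafe", 37.4636, -122.4286),
    ("b2", "Example Bookshop", 37.4640, -122.4290),
]


def load_places(db_path, rows):
    """Create a `places` table in SQLite and insert the given rows."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS places ("
        "id TEXT PRIMARY KEY, name TEXT, latitude REAL, longitude REAL)"
    )
    # INSERT OR REPLACE keeps the load idempotent if it is re-run.
    conn.executemany("INSERT OR REPLACE INTO places VALUES (?, ?, ?, ?)", rows)
    conn.commit()
    return conn


# An in-memory database keeps the sketch self-contained.
conn = load_places(":memory:", rows)
count = conn.execute("SELECT COUNT(*) FROM places").fetchone()[0]
print(count)
```

Passing a file path such as `hmb.db` (a hypothetical name) instead of `:memory:` would produce a file that can be opened with `datasette hmb.db`.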
(DIR) Post #AY7wXpACmkOg13Y8gq by mitch@posts.dumb.stuff.donaberger.xyz
2023-07-27T17:49:17Z
0 likes, 0 repeats
@simon oddly, when i read about this elsewhere, my first thought was "I should ask Simon what's up," lol. thanks as always. I am also greatly enjoying the `llm` tool, so ty for that too.
(DIR) Post #AY81lmRiBZYXPdDDyS by ian@social.modest.com
2023-07-27T18:49:43Z
0 likes, 0 repeats
@simon Thanks for sharing out the Parquet querying parts of this!
(DIR) Post #AY82KP86rmaPEZYO00 by seav@en.osm.town
2023-07-27T18:56:01Z
0 likes, 0 repeats
@simon kinda sucks that you need to download the whole thing. Maybe using the Athena or Azure routes would allow faster selects than DuckDB?
(DIR) Post #AY88x2Rtj6htuRcAwC by simon@fedi.simonwillison.net
2023-07-27T20:08:32Z
0 likes, 0 repeats
@seav You don't have to download the whole thing for a bunch of operations - but the "find places within this bounding box" thing does seem to be too much for the remote HTTP mechanism to handle quickly. A problem I have is that I don't have good instincts yet for figuring out if a query is likely to work well over remote Parquet or not.
(DIR) Post #AY8BHkOEGUmHtCtvqS by bradlarsen@infosec.exchange
2023-07-27T20:35:58Z
0 likes, 0 repeats
@simon @datasette I enjoy using duckdb!
(DIR) Post #AY8BWYcqHpvVkMnZ4a by jwass2000@mapstodon.space
2023-07-27T20:37:27Z
0 likes, 0 repeats
@simon @seav This is awesome! There are some tricks we can use to structure the parquet files that allow more efficient bounding box queries using remote predicate pushdown. It’s all pretty new for spatial parquet data. We’ll probably look at this for future releases along with easier country/region partitioning.
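A rough illustration of why file layout matters for pushdown, under simplifying assumptions: Parquet stores min/max statistics per row group, and a reader can skip any group whose range cannot match the filter. The `RowGroupStats` class, the function name and all the numbers below are hypothetical; this is a toy model, not how Overture's files are actually organised.

```python
from dataclasses import dataclass


@dataclass
class RowGroupStats:
    """Min/max longitude and latitude recorded for one row group."""
    min_lon: float
    max_lon: float
    min_lat: float
    max_lat: float


def groups_to_read(stats, bbox):
    """Return indexes of row groups whose extent overlaps the query bbox.

    bbox is (west, south, east, north). Groups that cannot contain a
    match are skipped without fetching their data.
    """
    west, south, east, north = bbox
    keep = []
    for i, s in enumerate(stats):
        overlaps = (
            s.min_lon <= east and s.max_lon >= west
            and s.min_lat <= north and s.max_lat >= south
        )
        if overlaps:
            keep.append(i)
    return keep


# Hypothetical spatially sorted file: each group covers a small area.
sorted_stats = [
    RowGroupStats(-123.0, -122.0, 37.0, 38.0),
    RowGroupStats(-75.0, -73.0, 40.0, 41.0),
    RowGroupStats(120.0, 122.0, 13.0, 15.0),
]
# Hypothetical unsorted file: every group spans most of the globe.
unsorted_stats = [RowGroupStats(-180.0, 180.0, -60.0, 70.0)] * 3

# Approximate, made-up bounding box around Half Moon Bay.
half_moon_bay = (-122.52, 37.42, -122.40, 37.53)
sorted_hits = groups_to_read(sorted_stats, half_moon_bay)
unsorted_hits = groups_to_read(unsorted_stats, half_moon_bay)
print(sorted_hits, unsorted_hits)
```

With the sorted layout only one of the three groups has to be fetched; with the unsorted layout the statistics rule nothing out, so every group is read.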
(DIR) Post #AY8VreSJnrRwzfDBsu by simon@fedi.simonwillison.net
2023-07-28T00:27:05Z
0 likes, 0 repeats
@jwass2000 @seav That would be fantastic - I'm very new to Parquet/DuckDB myself so any extra documentation from Overture illustrating the kinds of queries you can run against it without downloading GBs of data would be fantastic
(DIR) Post #AY8yvuu0tuvpHmkw5o by benhur07b@mastodon.social
2023-07-28T05:52:44Z
0 likes, 0 repeats
@simon Thank you for this! I also downloaded the entire dataset and was able to extract data for the Philippines (bounding box) using the approach you shared. 👍
(DIR) Post #AY9BPeqUMaEdck3UmW by freyfogle@mastodon.social
2023-07-28T08:12:29Z
0 likes, 0 repeats
@simon @datasette great write-up, thanks. What's your impression as a resident of Half Moon Bay? Is the data correct? How fresh is it? How does it compare with OSM?
(DIR) Post #AY9sOZUL5lJykRQJjk by freyfogle@mastodon.social
2023-07-28T08:16:12Z
0 likes, 0 repeats
@simon @datasette umm, is there actually a restaurant / cinema on the water in the middle of the bay?
(DIR) Post #AY9sOa48wdxeXUAuXo by simon@fedi.simonwillison.net
2023-07-28T16:13:43Z
0 likes, 0 repeats
@freyfogle @datasette it looks like that's the Mavericks surf competition, so yeah that's the right spot
(DIR) Post #AY9sOb5x7P0DjO1s0G by freyfogle@mastodon.social
2023-07-28T08:19:38Z
0 likes, 0 repeats
@simon @datasette in fairness I guess they do give it a low confidence score, so you could exclude all that fall below a certain threshold
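A small sketch of that kind of threshold filter, run against SQLite. The table layout, the `confidence` column name, the sample rows and the 0.5 cut-off are all assumptions for illustration; a sensible threshold would need checking against the real data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE places (name TEXT, category TEXT, confidence REAL)")
# Made-up listings; the second one mimics a dubious low-confidence entry.
conn.executemany(
    "INSERT INTO places VALUES (?, ?, ?)",
    [
        ("Example Harbor Grill", "restaurant", 0.92),
        ("Example Offshore Spot", "restaurant", 0.31),
        ("Example Bakery", "bakery", 0.77),
    ],
)


def confident_places(conn, threshold):
    """Return names of places whose confidence is at or above threshold."""
    cur = conn.execute(
        "SELECT name FROM places WHERE confidence >= ? ORDER BY name",
        (threshold,),
    )
    return [row[0] for row in cur]


kept = confident_places(conn, 0.5)
print(kept)
```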
(DIR) Post #AY9saKi1rmQmO0R2mm by simon@fedi.simonwillison.net
2023-07-28T16:14:21Z
0 likes, 0 repeats
@freyfogle @datasette I haven't compared to OSM yet - my initial spot check for my favourite places all looked correct to me
(DIR) Post #AY9unzuCD8VjpKlmCm by freyfogle@mastodon.social
2023-07-28T16:41:03Z
0 likes, 0 repeats
@simon @datasette sure, but why is it labeled as a restaurant?
(DIR) Post #AY9v1QI3QLxHQGKZu4 by simon@fedi.simonwillison.net
2023-07-28T16:41:50Z
0 likes, 0 repeats
@freyfogle @datasette yeah those categories are very clearly off!