Post ASW6gltYK4dA5PSqXY by geraldew@fosstodon.org
(DIR) More posts by geraldew@fosstodon.org
(DIR) Post #ASUc3z9DvLhi71QJv6 by simon@fedi.simonwillison.net
2023-02-09T05:07:23Z
0 likes, 0 repeats
Wrote up a neat trick you can do with SQLite (and I'm sure other databases as well) - you can embed full subqueries in the select expressions of a query, and use json_group_array() and json_object() to do things like return JSON for the three most recent releases related to each row in a repos tableThis means you can fetch multiple related items for each row in a single select query!https://til.simonwillison.net/sqlite/subqueries-in-select
(DIR) Post #ASUcUCBY4c6u4o0ceO by simon@fedi.simonwillison.net
2023-02-09T05:12:25Z
0 likes, 1 repeats
While exploring this I prototyped a new Datasette plugin for running "explain query plan select.." while you type a query: https://github.com/simonw/datasette-explainDemo here: https://latest-with-plugins.datasette.io/github?sql=select+repos.full_name%2C+%28+select+json_group_array%28+json_object%28+%27id%27%2C+id%2C+%27name%27%2C+name+%29+%29+from+%28+select+*+from+releases+where+repo+%3D+repos.id+order+by+created_at+desc+limit+3+%29+%29+as+recent_releases%0D%0Afrom+repos%0D%0Alimit+10
(DIR) Post #ASUidYzNG6oj8FAGvI by ossronny@fosstodon.org
2023-02-09T06:20:52Z
0 likes, 0 repeats
@simon this looks neat, unfortunately orm have to unroll related selects into a join as they can't have more structured output in a reliable manner
(DIR) Post #ASUsTRfyam7j65uCwq by simon@fedi.simonwillison.net
2023-02-09T08:11:42Z
0 likes, 0 repeats
Anton Zhiyanov suggested an alternative to my query using window functions instead of subqueries which I think is a whole lot more elegant: I've added that to the TIL here https://til.simonwillison.net/sqlite/subqueries-in-select#user-content-achieving-the-same-thing-with-window-functions
(DIR) Post #ASUsybOkUmIuE8hjKC by ummjackson@mastodon.social
2023-02-09T08:17:07Z
0 likes, 0 repeats
@simon Window functions are so handy.
(DIR) Post #ASUtzolGOfuluqNY0m by TheSameCat@cyberplace.social
2023-02-09T08:28:28Z
0 likes, 0 repeats
@simon This is very cool. Thanks for the write-up.
(DIR) Post #ASUxs5WSawbtOr1Zey by adamchainz@fosstodon.org
2023-02-09T09:12:07Z
0 likes, 0 repeats
@simon Yeah I've seen this done in PostgreSQL, very useful optimization trick.
(DIR) Post #ASVooFdaLCspBPPdoW by jbeimler@fosstodon.org
2023-02-09T19:04:59Z
0 likes, 0 repeats
@simon I just recently learned it as well from @judell and his mastodon tool for steampipe (sql for the net) https://www.infoworld.com/article/3685532/lists-and-people-on-mastodon.html
(DIR) Post #ASW5ZeVHJ28pBG9SIS by simon@fedi.simonwillison.net
2023-02-09T22:07:44Z
0 likes, 0 repeats
I iterated on the window function version a few times and I've written up the final query I settled on for this: https://til.simonwillison.net/sqlite/subqueries-in-select#user-content-simplified-to-use-just-one-window-function
(DIR) Post #ASW6gltYK4dA5PSqXY by geraldew@fosstodon.org
2023-02-09T22:25:07Z
0 likes, 0 repeats
@simon by habit I always take a close look when I see someone using rank() rather than row_number()That's because I see a lot of mistakes made where people assume or don't know that multiple rows can produce the same rank for a partition.Am glad to see that's not an issue in this case, because the rank is being used for recentness.
(DIR) Post #ASW6retYgkhzYteslM by simon@fedi.simonwillison.net
2023-02-09T22:26:09Z
0 likes, 0 repeats
@geraldew oh that's a good point though, I think that could cause a problem if two releases happened to have the exact same creation time!
(DIR) Post #ASW8WY6YwIOuKTnq9w by akahn@mastodon.social
2023-02-09T22:46:15Z
0 likes, 0 repeats
@simon I've started using a #TIL tag in Logseq but this is inspiring me publish them publicly!
(DIR) Post #ASW8iTzk2udj3cIqGW by benosteen@wandering.shop
2023-02-09T22:48:14Z
0 likes, 0 repeats
@simon thank you for writing this up! I really need to get better at better using SQL for what it's good at.
(DIR) Post #ASWgnHO5kArjqW9R6e by mucio@mastodon.social
2023-02-10T05:10:04Z
0 likes, 0 repeats
@simon my quick comments, can you rename cte to something that explains what that query is doing? Maybe repos_with_ranked_rel. Who will maintain this query will appreciate it. Also, the left join, do you expect repos without releases? Or the repo creation isalready a release and so the join can be an inner?Apologies in advance, if you didn't need these advices
(DIR) Post #ASWgxiGgeoNY6uf9tI by simon@fedi.simonwillison.net
2023-02-10T05:11:49Z
0 likes, 0 repeats
@mucio yeah the CTE would definitely benefit froma better name thereIf you explore the database linked from the article you'll see it has hundreds of repos without any releases, so it's not a theoretical concern for that data
(DIR) Post #ASWi8TEWzq0mcXLdJY by mucio@mastodon.social
2023-02-10T05:25:05Z
0 likes, 0 repeats
@simon sorry, just read the article. I see the repos without releases are mentioned there.If I were to performance tuning this query I would do the release ranking, json object build, and filter only top 3 in a separated cte before joining with repos (assuming all releases belong to a repo). This could also avoid using filter for the json array. I will try yo test it later today
(DIR) Post #ASWiSWtd5USHlx8VRg by simon@fedi.simonwillison.net
2023-02-10T05:28:55Z
0 likes, 0 repeats
@mucio I've not really spent any time on optimization here because all of the example queries so far returned in less than 10ms - but yeah, optimizing this would make for a very interesting educational exercise in order to use these tricks against much larger tables