Post AboPHe1P8l07rupvCi by fpbhb@mastodon.social
(DIR) Post #AboNZZ5a1DP1sHCt60 by simon@fedi.simonwillison.net
2023-11-14T22:48:26Z
0 likes, 0 repeats
Midjourney generated images are hosted on the Discord CDN. I was curious how much space these take up... It looks like the answer is well over 148 TB! Here's how I figured that out, using DuckDB to query remote Parquet files hosted on Hugging Face: https://til.simonwillison.net/duckdb/remote-parquet
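A minimal sketch of the kind of query the post describes, not the exact one from the linked write-up. It assumes the Parquet files have a numeric column named `size` holding each attachment's size in bytes (the column name is an assumption), and that the third-party `duckdb` Python package is installed. The example.com URL in the usage note is a placeholder.

```python
from typing import List


def build_total_size_query(urls: List[str]) -> str:
    """Build a SQL statement that sums a `size` column across remote Parquet files.

    The column name `size` is an assumption about the dataset's schema.
    """
    # Quote each URL as a SQL string literal, doubling any single quotes.
    quoted = ", ".join("'" + url.replace("'", "''") + "'" for url in urls)
    return f"SELECT SUM(size) AS total_bytes FROM read_parquet([{quoted}])"


def total_size(urls: List[str]):
    """Run the query with DuckDB; needs network access and the duckdb package."""
    import duckdb  # imported lazily so the query builder works without it

    con = duckdb.connect()
    con.execute("INSTALL httpfs")  # extension that lets DuckDB read over HTTP(S)
    con.execute("LOAD httpfs")
    return con.execute(build_total_size_query(urls)).fetchone()[0]
```

Calling `build_total_size_query(["https://example.com/a.parquet"])` only produces the SQL text; `total_size(...)` is the part that actually fetches data over the network.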
(DIR) Post #AboNtAtZgCKEdAwTCq by simon@fedi.simonwillison.net
2023-11-14T22:52:02Z
0 likes, 0 repeats
In putting these notes together I learned how to monitor the bandwidth used by a particular process on macOS using the nettop CLI utility: `nettop -p PID`. Running that shows a constantly updated list of connections being made and how much traffic they are responsible for
(DIR) Post #AboO6F3VWVYhIXoq92 by simon@fedi.simonwillison.net
2023-11-14T22:54:11Z
0 likes, 0 repeats
Not a huge surprise then that Discord are changing their policy to serve expiring links instead: "Discord is switching to expiring links for files shared off-platform" https://www.engadget.com/discord-is-switching-to-expiring-links-for-files-shared-off-platform-202533531.html
(DIR) Post #AboOYfrPSOQQXYiZYO by MudMan@mas.to
2023-11-14T22:57:44Z
0 likes, 0 repeats
@simon People keep worrying about Moore's law slowing down for processing power, but storage has been stuck for a while. Speeds go up, capacity does not. It's starting to become a significant, expensive bottleneck.
(DIR) Post #AboOkbUHXdcWS7aYF6 by simon@fedi.simonwillison.net
2023-11-14T22:59:43Z
0 likes, 0 repeats
While I was writing this up I built myself a tiny tool to convert values in bytes into KB/MB/GB/TB - now available at https://til.simonwillison.net/tools/byte-size-converter (I got ChatGPT to write the HTML, CSS and JavaScript for me using a short sequence of prompts: https://chat.openai.com/share/640d0d98-9493-4b62-9b27-e7b2acea36bd )
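The conversion such a tool performs can be sketched in a few lines. This is an illustration in Python rather than the tool's own JavaScript, and it assumes 1024-based steps labelled KB/MB/GB/TB, which is what the later replies about KB vs KiB suggest the tool uses. The function name `human_readable` is made up for this example.

```python
def human_readable(num_bytes: int) -> str:
    """Format a byte count using 1024-based steps, labelled KB/MB/GB/TB."""
    units = ["bytes", "KB", "MB", "GB", "TB"]
    value = float(num_bytes)
    for unit in units:
        # Stop once the value fits in this unit, or we run out of larger units.
        if value < 1024 or unit == units[-1]:
            break
        value /= 1024
    if unit == "bytes":
        return f"{num_bytes} bytes"
    # Thousands separators keep large values readable without scientific notation.
    return f"{value:,.2f} {unit}"
```

For example, `human_readable(148 * 1024**4)` gives `148.00 TB` and `human_readable(500)` gives `500 bytes`.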
(DIR) Post #AboPHe1P8l07rupvCi by fpbhb@mastodon.social
2023-11-14T23:07:50Z
0 likes, 0 repeats
@simon Generating that code probably used more cycles than the planetary demand for shift operations for a decade or so … ;-)
(DIR) Post #AboQYZHu5sXA41Bgjw by etchedpixels@mastodon.social
2023-11-14T23:21:48Z
0 likes, 0 repeats
@simon Would probably be better to label it KiB as per standards since IEC-80000-13 but I'm just a pedant
(DIR) Post #AboQk3Jcn8y6NXT7Gy by simon@fedi.simonwillison.net
2023-11-14T23:23:20Z
0 likes, 0 repeats
@etchedpixels I have to admit when I see KB vs KiB I can never remember which one is which - that's why I included the note at the bottom of https://til.simonwillison.net/tools/byte-size-converter
(DIR) Post #AboQvhGkaLfXThg120 by whynothugo@fosstodon.org
2023-11-14T23:23:58Z
0 likes, 0 repeats
@simon I’m surprised that this wasn’t the case already.
(DIR) Post #AboRUhy1yvfJmKcDlQ by j2kun@mathstodon.xyz
2023-11-14T23:32:18Z
0 likes, 0 repeats
@simon did you know Google search can do this with its calculator? (Search "2.5 TiB in MiB" for example)
(DIR) Post #AboTvLKGTBWLHhqLUu by simon@fedi.simonwillison.net
2023-11-14T23:58:26Z
0 likes, 0 repeats
@j2kun weirdly that frequently doesn't give me what I want when I give it a value in bytes - partly because it often leans into hard-for-me-to-read scientific numeric notation
(DIR) Post #AboU6bqJOttXhECnM8 by alans@social.lol
2023-11-14T23:58:39Z
0 likes, 0 repeats
@simon This writeup is really powerful, thanks! Are Midjourney themselves storing all of their output there in Discord? That seems like a bad practice to have taken up, and no wonder Discord is cutting that off.
(DIR) Post #AboUKGQjoBi7LBkpaS by simon@fedi.simonwillison.net
2023-11-15T00:00:40Z
0 likes, 0 repeats
@alans presumably they download their own copy to help with future model training runs, but maybe? Their Discord instance had over 1 million people last I heard and was by far the biggest on that platform, so I imagine they have a strong behind-the-scenes relationship
(DIR) Post #AboUXNTKwDrMwgDt0i by mattm@infosec.exchange
2023-11-15T00:04:31Z
0 likes, 0 repeats
@simon @etchedpixels I think it's easiest to remember that "kb" is metric, so 1kb = 1000 bytes and ki adds the `i` for the 1024 "binary" prefix used in computers. So your note is helpful that you're using the arguably-wrong units :)
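The distinction between the metric and binary prefixes can be written down as a small lookup table. This is a sketch, not taken from any of the tools mentioned in the thread: the `UNITS` table and the `convert` helper are illustrative names, with decimal prefixes as powers of 1000 and IEC binary prefixes as powers of 1024.

```python
# Decimal (SI) prefixes: powers of 1000.
SI = {"kB": 1000, "MB": 1000**2, "GB": 1000**3, "TB": 1000**4}
# Binary (IEC) prefixes: powers of 1024, marked by the extra "i".
IEC = {"KiB": 1024, "MiB": 1024**2, "GiB": 1024**3, "TiB": 1024**4}
UNITS = {"B": 1, **SI, **IEC}


def convert(value: float, from_unit: str, to_unit: str) -> float:
    """Convert between any two units in UNITS by going through bytes."""
    return value * UNITS[from_unit] / UNITS[to_unit]
```

With this table, `convert(2.5, "TiB", "MiB")` is `2621440.0` (the search example mentioned earlier in the thread), and `convert(1, "TB", "TiB")` is roughly `0.909`, which is the gap between the two conventions at the terabyte scale.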
(DIR) Post #AboVVG9XHRDjcUW08m by alans@social.lol
2023-11-15T00:16:56Z
0 likes, 0 repeats
@simon Oh, yeah, I meant hosting for publication/sharing. Surely they keep internal copies? And -- Holy cow, that's a big discord. For sure, they must have a relationship beyond boosts and super-reacts!
(DIR) Post #AboVhlX4RBXOPuUEvA by j2kun@mathstodon.xyz
2023-11-15T00:19:34Z
0 likes, 0 repeats
@simon that is true, removing the scientific notation would make that feature better
(DIR) Post #AboqzKF1HgBVSRrGOe by pawandubey@tiny.tilde.website
2023-11-15T04:17:44Z
0 likes, 0 repeats
@simon i think the post is missing the nettop invocation.
(DIR) Post #AborYMwNExQbxsqYWO by paulsmith@hachyderm.io
2023-11-15T04:24:18Z
0 likes, 0 repeats
@simon just for fun I couldn't help myself and improved it to not lose precision when numbers go over the JavaScript 53-bit numeric limit: https://jsfiddle.net/khza1crs/ link to chat: https://chat.openai.com/share/a1f6b077-7625-428e-85dd-ba59cd0de29f
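The precision issue is easy to reproduce outside JavaScript, since JavaScript numbers are IEEE 754 doubles and Python's `float` is the same format. The sketch below is an illustration of the problem, not the code from the linked fiddle. Python's `int` plays the role that BigInt plays in JavaScript, and the helper names are made up for this example.

```python
# Largest integer a double can represent without gaps; the same value as
# JavaScript's Number.MAX_SAFE_INTEGER.
MAX_SAFE_INTEGER = 2**53 - 1


def survives_double(n: int) -> bool:
    """Return True if n round-trips through a 64-bit float unchanged."""
    return int(float(n)) == n


def split_tib(num_bytes: int):
    """Exact integer split into whole TiB and leftover bytes (no floats involved)."""
    return divmod(num_bytes, 1024**4)
```

Here `survives_double(2**53 + 1)` is `False` because the double nearest to that value is `2**53`, while `split_tib(2**53 + 1)` still returns the exact pair `(8192, 1)`.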
(DIR) Post #AboyApaflgQ5YBQ4P2 by simon@fedi.simonwillison.net
2023-11-15T05:38:24Z
0 likes, 0 repeats
@pawandubey It was! Thanks, fixed that: https://til.simonwillison.net/duckdb/remote-parquet#user-content-tracking-network-usage-with-nettop
(DIR) Post #AbozFWeXhlSaDc2316 by simon@fedi.simonwillison.net
2023-11-15T05:50:32Z
0 likes, 0 repeats
@paulsmith Oh I like that! Upgraded mine to BigInt inspired by yours, then added some extra logic to only show units relevant to the size of the input: https://github.com/simonw/til/commit/0bdbca2d0ccec180cc40a34720a1aee70104fb47
(DIR) Post #AbqOU6ddOv8jtKcHLs by markallanson@mastodon.org.uk
2023-11-15T22:07:46Z
0 likes, 0 repeats
@simon no need for all this 😂 ‘llm “10mib to KiB”’
(DIR) Post #AbuC8J3g7ZUnDtTbwu by severo@mastodon.social
2023-11-17T18:08:11Z
0 likes, 0 repeats
@simon have you seen the work by DouEnergy to support "range" glob in URLs in #duckdb ([00-55])? https://huggingface.co/datasets/vivym/midjourney-messages/resolve/main/data/0000[00-55].parquet More here: https://twitter.com/douenergy/status/1725423199138279823? Work in progress here: https://github.com/duckdb/duckdb/compare/main...douenergy:duckdb:brace-expansion
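To make the `[00-55]` notation concrete, here is a sketch of what such a pattern stands for: one URL per number in the range, zero-padded to the width of the lower bound. This only illustrates the meaning of the pattern. It is not how the work-in-progress DuckDB branch implements it, and `expand_range` is a hypothetical helper.

```python
import re
from typing import List

# Matches a numeric range such as [00-55].
_RANGE = re.compile(r"\[(\d+)-(\d+)\]")


def expand_range(url: str) -> List[str]:
    """Expand the first [start-end] range in a URL into explicit URLs."""
    match = _RANGE.search(url)
    if match is None:
        return [url]  # nothing to expand
    start_text, end_text = match.groups()
    width = len(start_text)  # keep the zero padding of the lower bound
    prefix, suffix = url[:match.start()], url[match.end():]
    return [
        prefix + str(n).zfill(width) + suffix
        for n in range(int(start_text), int(end_text) + 1)
    ]


PATTERN = (
    "https://huggingface.co/datasets/vivym/midjourney-messages"
    "/resolve/main/data/0000[00-55].parquet"
)
URLS = expand_range(PATTERN)
```

`URLS` then holds 56 entries, running from `.../000000.parquet` to `.../000055.parquet`, which could be passed as an explicit list to a reader that does not understand the range syntax.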