[HN Gopher] The Overflow Offline project
___________________________________________________________________
The Overflow Offline project
Author : donutshop
Score : 209 points
Date : 2022-10-20 13:13 UTC (9 hours ago)
(HTM) web link (stackoverflow.blog)
(TXT) w3m dump (stackoverflow.blog)
| maw wrote:
| To me this basically seems like boat programming made
| respectable.
|
| Of course, if you asked me, it always was. You couldn't assume
| great connectivity then and you often still can't today.
| Spivak wrote:
| My favorite consultant I ever worked with was a boat
| programmer. You hired him for super specialized MySQL magics so
| he, well his company, charged a pretty substantial hourly rate
| and he apparently had enough revenue/leverage to get his
| company to foot the bill for two separate satellite internet
| connections on his boat. I feel like I would get lonely but
| it's definitely a vibe.
| pjmlp wrote:
| To me this is programming during the first 10 years, when the
| "Internet" was local BBSes, magazines spoke about Compuserve and
| Prodigy, and the connection rates were impossible, so we had
| to get by with what came in magazines and the local library.
| jhgkjhlkhjkljk wrote:
| I assume this is so people can train AI on it. It's just hard to
| say that outright because some people don't like the idea.
| speedgoose wrote:
| It has already been possible to download dumps for a long time.
|
| https://archive.org/details/stackexchange
| mkathuri wrote:
| Nice to find Kiwix again. Shameless plug, I made my own Kiwix
| alternative for macOS: https://github.com/technusm1/kiwings
| orblivion wrote:
| So this is a desktop app, but it uses the server as part of it?
| The normal Kiwix desktop client doesn't do that right?
|
| I'll throw in my own shameless plug: Self-host your Stack
| Overflow, Wikipedia etc on Sandstorm:
| https://apps.sandstorm.io/app/5uh349d0kky2zp5whrh2znahn27gwh...
| Obviously uses kiwix-serve as well. 3 years old, I need to make
| a better clip for updating it.
| ComodoHacker wrote:
| They could actually try to build a Copilot competitor off their
| data. /s
| VoidWhisperer wrote:
| It would be interesting to see how many times a copilot
| competitor trained off it gave correct code vs wrong code for a
| given case
| mdaniel wrote:
| I suspect that would differ depending on whether it was trained
| on the question's code versus the accepted (or most upvoted?)
| answer's code
| mdaniel wrote:
| I see the "/s" but I actually do wonder if integrating the
| "prompt" behavior into the _question box_ would help cut down
| on the absolutely staggering number of duplicate questions.
| Regrettably, I'm not enough of a GPT expert to know what
| percentage of the time it would generate gibberish, thus making
| the duplicate question problem _worse_
| cee_el123 wrote:
| This is an amazing dose of humility.
| gragundier wrote:
| I've always wondered if we could force web apps into some sort of
| "default" offline mode with like some offline://url.here . Very
| cool of overflow.
| throwoutway wrote:
| I like this idea; I wonder if there's a way to get Firefox to
| support this via the settings. There's already support for
| file:/// ftp:// etc
| txtai wrote:
| There was a recent HN Post for codequestion which builds an
| offline semantic index (using https://github.com/neuml/txtai) on
| the archive.org Stack Overflow dumps -
| https://news.ycombinator.com/item?id=33110219
|
| GitHub: https://github.com/neuml/codequestion
|
| Article: https://medium.com/neuml/find-answers-with-codequestion-2-0-...
| xd1936 wrote:
| Love this. Reminds me of the other Kiwix projects to make
| MediaWiki services like Wikipedia available offline[1]. The
| entirety of English Wikipedia is ~50GB of text and ~100GB of
| images.
|
| 1. https://wiki.kiwix.org/
| 7373737373 wrote:
| I feel like their homepage could be greatly improved. It
| doesn't really make obvious what great capability it provides
| jokoon wrote:
| I've already downloaded documentation, like the Python API or the
| cppreference website, as a PDF or HTML archive.
|
| I don't know if the same is available for HTML, JS, CSS, or OpenGL.
| they4kman wrote:
| https://devdocs.io/ exposes a huge catalog of indexed and
| searchable collections of documentation for a wide variety of
| languages, libraries, and subjects, including HTML, JS, and CSS
| - though, the only GL I see is WebGL - and _all_ of it can be
| downloaded to an IndexedDB for offline use.
|
| It's been a very handy tool in my toolbelt.
| eternauta3k wrote:
| You should check out Zeal, it's an offline documentation
| browser with existing documentation packages for HTML and a
| whole bunch of things
|
| https://zealdocs.org/
| 7373737373 wrote:
| This is great! Too many services today become completely unusable
| when they encounter technical problems, are hacked or are just
| lost over time. Having an easily accessible offline copy is
| always reassuring, showing that their survivability does not
| depend on just a few people and the projects are fundamentally
| about the information, not an organization.
| sytse wrote:
| You can also run FreeCodeCamp locally
| https://github.com/freeCodeCamp/freeCodeCamp/blob/main/docs/...
|
| And I funded the work to run that on an Android phone
| https://play.google.com/store/apps/details?id=space.atrailin...
| iib wrote:
| I remember already being able to use certain stackexchanges with
| kiwix before, as well as the arch wiki, wikipedia without images,
| and some other great resources. It is nice to see that they
| actually pay attention to this use-case and I look forward to
| updated workflows with kiwix or similar in the future. Latency is
| way better that way, even with good and stable internet.
| OpenZIM[1] is also useful for converting any page for use with kiwix.
|
| I also have great memories from a University exam where we were
| allowed to have laptops that were not connected to the internet.
|
| [1] https://wiki.openzim.org/wiki/OpenZIM
| benpopper1 wrote:
| What was the test score ;)
| throwoutway wrote:
| This is awesome! At first I thought this only supported Stack
| Overflow and not the other 170+ StackExchange forums, but it
| looks like it does (or will?). From the blog:
|
| > "We built the Sotoki (Stack Overflow to Kiwix) scraper in such
| > a way that it can capture each and every one of the 180 Stack
| > Exchange websites."
|
| Unclear to me if "can" means "does" or "will soon" or just
| "could"
| benpopper1 wrote:
| It already does - everything from the technical stack exchanges
| to the sites on cooking and gardening :)
| polarix wrote:
| This has been available for a while but it's great to see some
| acknowledgement especially since the most recent data set was
| stuck in 2019 for a while.
|
| Here are the datasets:
| http://download.kiwix.org/zim/stack_exchange/
|
| It's not clear to me why the data set shrank between 2019/3 and
| 2022/6; was something excluded? Compression improvements?
|
| > stackoverflow.com_en_all_2019-02.zim 2019-03-12 19:53 134G
|
| > stackoverflow.com_en_all_2022-05.zim 2022-06-17 12:36 75G
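
For scale, the two sizes quoted above work out to a reduction of roughly 44%. A quick sketch of the arithmetic, taking the listing's "G" figures at face value and assuming both use the same unit (the listing does not say whether it means GB or GiB); the function name is made up for illustration:

```python
# Sizes as shown in the directory listing quoted above.
# Assumption: both figures use the same unit.
size_2019_02 = 134
size_2022_05 = 75


def percent_reduction(old: float, new: float) -> float:
    """Return how much smaller `new` is than `old`, as a percentage of `old`."""
    return (old - new) / old * 100


reduction = percent_reduction(size_2019_02, size_2022_05)
print(f"{reduction:.1f}% smaller")  # prints "44.0% smaller"
```

This only quantifies the drop; it says nothing about whether content was excluded or compression improved.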
| FinnLeSueur wrote:
| The article states:
|
| > ... to ensure that an up-to-date version of our dataset is
| > easily available for those who need it, and will work to
| > improve its readability and reduce its size so there is less
| > friction for end users...
| gernb wrote:
| The data isn't stuck. The data is available here
|
| https://archive.org/details/stackexchange
|
| It's the "official" place to get the data
|
| I've downloaded it several times and extracted my own
| contributions.
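
A minimal sketch of that kind of extraction, assuming the dump's Posts.xml layout (one `<row>` element per post, with attributes such as `Id`, `PostTypeId`, and `OwnerUserId`). The function name, the sample data, and the user id 42 are hypothetical, and this is not necessarily how the commenter did it:

```python
import io
import xml.etree.ElementTree as ET
from typing import BinaryIO, Dict, Iterator


def posts_by_user(source: BinaryIO, user_id: str) -> Iterator[Dict[str, str]]:
    """Yield the attributes of every <row> whose OwnerUserId equals user_id."""
    # iterparse reads the file incrementally instead of loading it all at once,
    # which matters because the real Posts.xml is very large.
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == "row" and elem.get("OwnerUserId") == user_id:
            yield dict(elem.attrib)  # copy before the element is cleared
        elem.clear()  # drop the element's data once it has been inspected


# Tiny made-up stand-in for Posts.xml (hypothetical data).
SAMPLE = b"""<?xml version="1.0" encoding="utf-8"?>
<posts>
  <row Id="1" PostTypeId="1" OwnerUserId="42" Title="How do I read a ZIM file?" />
  <row Id="2" PostTypeId="2" OwnerUserId="7" ParentId="1" />
  <row Id="3" PostTypeId="2" OwnerUserId="42" ParentId="1" />
</posts>"""

mine = list(posts_by_user(io.BytesIO(SAMPLE), "42"))
print([post["Id"] for post in mine])  # prints ['1', '3']
```

On a real dump, you would pass a file opened in binary mode, e.g. `open("Posts.xml", "rb")`, in place of the in-memory sample.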
___________________________________________________________________
(page generated 2022-10-20 23:00 UTC)