[HN Gopher] How to Download All of Wikipedia onto a USB Flash Drive
___________________________________________________________________
Author : bubblehack3r
Score : 69 points
Date : 2022-10-06 21:06 UTC (1 hour ago)
(HTM) web link (planetofthepaul.com)
(TXT) w3m dump (planetofthepaul.com)
| PaulDavisThe1st wrote:
| Can someone explain what the role of Kiwix is in all this, please?
| londons_explore wrote:
| Note that it's possible to make Wikipedia substantially smaller
| if you're happy to use more aggressive compression algorithms.
|
| Kiwix divides the data into chunks and adds various indexes and
| stuff to allow searching data and fast access, even on slow CPU
| devices. But if you can live with slow loading, you can probably
| halve the storage space required, or maybe more.
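A rough sketch of the trade-off londons_explore describes, using Python's standard `lzma` module on synthetic, made-up article markup (not real Wikipedia data). ZIM compresses groups of entries ("clusters") independently so a reader only has to decompress a small piece; here each fixed-size chunk stands in for a cluster, and the 4 KiB chunk size is an arbitrary assumption. Raw LZMA2 streams are used so the per-chunk container header doesn't skew the comparison.

```python
import lzma

# Raw LZMA2 (no .xz container) so we measure compression, not header overhead.
FILTERS = [{"id": lzma.FILTER_LZMA2, "preset": 6}]


def compress_raw(data: bytes) -> bytes:
    return lzma.compress(data, format=lzma.FORMAT_RAW, filters=FILTERS)


def whole_size(data: bytes) -> int:
    """Compressed size as one stream: best ratio, but no cheap random access."""
    return len(compress_raw(data))


def chunked_size(data: bytes, chunk_bytes: int) -> int:
    """Total compressed size when each chunk is compressed on its own."""
    return sum(
        len(compress_raw(data[i:i + chunk_bytes]))
        for i in range(0, len(data), chunk_bytes)
    )


if __name__ == "__main__":
    # Synthetic markup that repeats across "articles", like wiki boilerplate.
    text = "".join(
        f"<h1>Article {i}</h1><p>The quick brown fox jumps over the lazy dog.</p>\n"
        for i in range(5000)
    ).encode()
    print("original:    ", len(text))
    print("one stream:  ", whole_size(text))
    print("4 KiB chunks:", chunked_size(text, 4096))
```

Larger chunks get closer to the one-stream size, but then every article lookup has to decompress more data, which is the slow-loading cost mentioned above.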
| bombcar wrote:
| Kiwix is great - I have a collection of various things from their
| library https://library.kiwix.org/?lang=eng downloaded for when
| I'm on a plane or the internet is otherwise unavailable.
|
| That and the TeX Live PDF manuals can get me through anything.
| daneel_w wrote:
| I second Kiwix. I found out about it not too long ago while
| reading about portable Wikipedia readers. It really stands out
| as the best software component of such a setup.
| 23B1 wrote:
| I third Kiwix. Immensely useful when I was deployed without
| internet.
| barbs wrote:
| Is there a portable version of Kiwix? Would be cool if you could
| plug the USB into any computer and start reading Wikipedia
| without having to install anything.
| tehnicaorg wrote:
| Yes. You download a zip archive (121MB), unpack it (263MB),
| and run the .exe (assuming you're using Windows).
| orliesaurus wrote:
| Oh wow, I thought this was gonna be a REALLY large file, but it's
| only 95GB. Not bad; some worthless video games are larger, haha.
| bscphil wrote:
| I was curious how they achieve this. It looks like the
| underlying file format uses LZMA, or optionally Zstd,
| compression. Both achieve pretty high compression ratios
| against plain text and markup.
|
| > Its file compression uses LZMA2, as implemented by the xz-
| utils library, and, more recently, Zstandard. The openZIM
| project is sponsored by Wikimedia CH, and supported by the
| Wikimedia Foundation.
|
| https://en.wikipedia.org/wiki/ZIM_(file_format)
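For the curious: a ZIM file starts with a fixed 80-byte little-endian header, and each cluster's first byte says how that cluster is compressed. The sketch below parses the header with Python's `struct` module. The field layout and the compression codes (4 = xz/LZMA2, 5 = Zstandard) follow my reading of the openZIM format documentation, so check them against the spec before relying on this; `parse_zim_header` and `compression_name` are hypothetical helper names, not part of any openZIM library.

```python
import struct
from dataclasses import dataclass

ZIM_MAGIC = 72173914  # 0x044D495A, i.e. the bytes "ZIM\x04" read little-endian

# magic, majorVersion, minorVersion, uuid, entryCount, clusterCount,
# pathPtrPos, titlePtrPos, clusterPtrPos, mimeListPos,
# mainPage, layoutPage, checksumPos  -> 80 bytes in total
HEADER_FMT = "<IHH16sIIQQQQIIQ"
HEADER_SIZE = struct.calcsize(HEADER_FMT)

# Low 4 bits of a cluster's first ("info") byte select its compression.
COMPRESSION = {0: "none", 1: "none", 4: "xz (LZMA2)", 5: "zstd"}


@dataclass
class ZimHeader:
    major_version: int
    minor_version: int
    entry_count: int
    cluster_count: int
    cluster_ptr_pos: int


def parse_zim_header(raw: bytes) -> ZimHeader:
    """Decode the fixed-size header from the first bytes of a ZIM file."""
    if len(raw) < HEADER_SIZE:
        raise ValueError(f"need at least {HEADER_SIZE} bytes")
    fields = struct.unpack_from(HEADER_FMT, raw)
    magic, major, minor, _uuid, entries, clusters, _path, _title, cluster_ptr = fields[:9]
    if magic != ZIM_MAGIC:
        raise ValueError("not a ZIM file (bad magic number)")
    return ZimHeader(major, minor, entries, clusters, cluster_ptr)


def compression_name(cluster_info_byte: int) -> str:
    """Map a cluster info byte to a compression name (high bits are flags)."""
    return COMPRESSION.get(cluster_info_byte & 0x0F, "unknown/deprecated")
```

Passing it the first `HEADER_SIZE` bytes of a downloaded `.zim` file should give you the format version and entry/cluster counts.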
| keepquestioning wrote:
| I remember the era of stupidly large games.
| aendruk wrote:
| Circa 2003 I carried around a pared-down copy on a Pocket PC.
| Dropping a few chosen categories (who needs Sports?) allowed it
| to barely fit on a 1GB SD card.
| FeistySkink wrote:
| People going back in time need sports. An almanac of some
| kind.
| yieldcrv wrote:
| protip: you need to download Wikipedia in other languages as well
|
| they are not translations; they are completely different articles
| under the Wikipedia brand and platform
|
| an entry that may be just a blurb in English may be one of the
| most comprehensive, fully fleshed-out and well-researched entries
| on the site in German, for example
| thakoppno wrote:
| Somewhere around the original iPad era, I believe there was a
| curated subset of Wikipedia articles that may have been called
| something like Educator's Edition.
|
| It worked offline and had images, and I traveled to Peru with it
| and learned so much. Does anyone remember this sort of thing?
|
| I've tried Kiwix-formatted copies and they do work, but the
| experience on an offline iPad was simply better. Thanks in
| advance.
| Rediscover wrote:
| Yes, I remember - I had a copy on an SD card on my OLPC.
|
| I believe it morphed into "Wikipedia for Schools" ^0 -
| possibly this ^1 is a comment about it?
|
| 0:
| https://en.m.wikipedia.org/wiki/Wikipedia:Wikipedia_for_Scho...
|
| 1: https://www.speedofcreativity.org/2008/11/11/wikipedia-to-
| go...
| thehours wrote:
| Tangent - I've noticed a lot more comments like this using
| the "^0" syntax for citations vs the traditional "[0]" one
| I've become accustomed to seeing on HN. Is there a real shift
| happening here and, if so, why?
| ashraful wrote:
| maybe: https://github.blog/changelog/2021-09-30-footnotes-
| now-suppo...
| teh_klev wrote:
| Checking to see if supported on HN [^1]
|
| Edit: nope :)
|
| [^1]: https://github.blog/changelog/2021-09-30-footnotes-
| now-suppo...
| teh_klev wrote:
| It's a bit non-standard, and if it's trying to follow the
| Wikipedia citation style then it's the wrong way round.
| pupppet wrote:
| Can anyone recommend a hardy device for viewing the content? As
| nutty as it sounds, in some post-apocalyptic world it would sure
| be nice to have. I'd keep it under the bed just in case.
| bryanlarsen wrote:
| There used to be one; maybe you can find one somewhere.
|
| https://en.wikipedia.org/wiki/WikiReader
| bombcar wrote:
| Honestly, a generic PC would probably be best. Power may be a
| bit harder to find, but you will have a nearly endless supply
| of replacement parts.
| c7b wrote:
| Have you looked at e-Ink readers?
| IggleSniggle wrote:
| Print it out on paper, small but legible font.
| teh_klev wrote:
| Someone did actually print out and bind Wikipedia in 2015:
|
| https://en.wikipedia.org/wiki/Print_Wikipedia
| SahAssar wrote:
| If you follow the logic that anything is at about half its life,
| that would probably be an older ThinkPad laptop, like an X61 or
| X200. If you are willing to spend the money on something newer,
| perhaps a Toughbook. I have a modded Kobo e-book reader (I
| upgraded mine to 256GB of storage and have Project Gutenberg,
| Wikipedia and a few other things on it) with a good solar
| power bank.
| bscphil wrote:
| > If you follow the logic that anything is at about half its
| life
|
| I don't think that makes any sense. By that logic any
| currently working device should be assumed to last another
| $currentlifetime. My 20-year-old car is not gonna last
| another 20 years. My 10-year-old laptop won't last another
| 10. If my car somehow _did_ last another 20 years, it would
| not then make sense to assume it would still be running in
| another 40.
|
| Makes more sense to look at all objects of the same class. If
| 75% of laptops are dead in 10 years and 95% are dead in 15,
| and your laptop is 10 years old, you can infer that 5 out of
| 25 surviving laptops will make it another 5 years, or 20%.
| (These numbers completely made up, just an example.)
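bscphil's arithmetic is a conditional survival probability: the share of devices still alive at the target age divided by the share alive now. A minimal Python version, plugging in the same made-up 75%/95% figures from the comment:

```python
def conditional_survival(dead_by_now: float, dead_by_target: float) -> float:
    """P(still working at the target age | working at the current age).

    Both arguments are cumulative failure fractions for the whole
    population of devices (e.g. 0.75 means 75% are dead by that age).
    """
    if not 0.0 <= dead_by_now <= dead_by_target <= 1.0:
        raise ValueError("need 0 <= dead_by_now <= dead_by_target <= 1")
    alive_now = 1.0 - dead_by_now
    if alive_now == 0.0:
        raise ValueError("no survivors at the current age")
    return (1.0 - dead_by_target) / alive_now


# 25% of laptops survive to 10 years and 5% to 15: 0.05 / 0.25
print(round(conditional_survival(0.75, 0.95), 3))  # 0.2
```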
| ScottEvtuch wrote:
| I think the idea of "everything is about half its life" is
| to account for survivorship bias in longevity. The only
| units that make it to the 95th percentile lifetimes clearly
| got luckier with parts and can reasonably be expected to
| last longer.
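The "half its life" heuristic (often called the Lindy effect) only holds exactly for certain lifetime distributions. As a worked example, assume lifetimes follow a Pareto distribution with minimum lifetime $t_m$ and shape $\alpha > 1$ (an assumption chosen for illustration, not something measured for laptops):

```latex
S(t) = \Pr(T > t) = \left(\frac{t_m}{t}\right)^{\alpha}, \qquad t \ge t_m

\mathbb{E}[\,T - t \mid T > t\,]
  = \frac{\int_t^{\infty} S(u)\,du}{S(t)}
  = \frac{t_m^{\alpha}\, t^{1-\alpha} / (\alpha - 1)}{t_m^{\alpha}\, t^{-\alpha}}
  = \frac{t}{\alpha - 1}
```

With $\alpha = 2$ the expected remaining life equals the current age, so a surviving item is on average "at half its life". Things with a real wear-out phase, like cars and laptops, don't follow this distribution, which is why the heuristic breaks down for them.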
| sgerenser wrote:
| Reliability of most complicated devices (cars,
| electronics) is usually thought to follow a "bathtub
| curve." Some early mortality due to defective parts or
| manufacturing defects, a long trough of reliability from
| say, 1-10 years, then a rapid rise in failures due to
| aging. "Everything at half life" is a pretty bad
| approximation of this.
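One common way to model a bathtub curve is to add three hazard terms: a Weibull term with shape below 1 (infant mortality), a constant (random failures) and a Weibull term with shape above 1 (wear-out). A minimal Python sketch; the parameter values are invented for illustration and are not fitted to any real device data.

```python
def weibull_hazard(t: float, shape: float, scale: float) -> float:
    """Instantaneous failure rate of a Weibull(shape, scale) lifetime at time t > 0."""
    return (shape / scale) * (t / scale) ** (shape - 1)


def bathtub_hazard(t_years: float) -> float:
    """Illustrative bathtub-shaped hazard in failures per device-year (hypothetical parameters)."""
    infant = weibull_hazard(t_years, shape=0.5, scale=1.0)     # falls over time
    random_failures = 0.02                                      # constant
    wear_out = weibull_hazard(t_years, shape=5.0, scale=12.0)   # rises over time
    return infant + random_failures + wear_out


for t in (0.1, 1, 3, 6, 10, 15):
    print(f"{t:>5} years: {bathtub_hazard(t):.3f}")
```

The printed rates start high, bottom out in the middle years and climb again, which is the shape sgerenser describes; under this model an old device's remaining life shrinks with age instead of growing.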
| seba_dos1 wrote:
| https://en.wikipedia.org/wiki/WikiReader ? ;)
| colordrops wrote:
| Is there a way to keep a mirror that stays in sync?
| doomrobo wrote:
| It looks like Kiwix uses the ZIM file format, which appears to
| have diffing support [0] (see zimdiff and zimpatch). That said,
| it doesn't look like Kiwix actually publishes those diffs.
|
| [0] https://github.com/openzim/zim-tools/tree/master/src
| blue1 wrote:
| Does it include the images, or is it just the text?
| 0x073 wrote:
| Yes, with images, but only in English.
|
| All possible dumps:
| https://dumps.wikimedia.org/other/kiwix/zim/wikipedia/
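For colordrops' sync question, the low-tech option is to check that dump directory periodically and fetch a newer file when one shows up. A sketch in Python; it assumes the directory is a plain HTML index and that files are named `<flavor>_<YYYY-MM>.zim` (e.g. `wikipedia_en_all_maxi_2022-05.zim`), so adjust it if the layout differs. `list_dump_names` and `latest_dump` are hypothetical helpers, and the file names in any test data are made up.

```python
import re
import urllib.request
from typing import Iterable, List, Optional

DUMPS_URL = "https://dumps.wikimedia.org/other/kiwix/zim/wikipedia/"

# Assumed naming pattern: <flavor>_<YYYY-MM>.zim
NAME_RE = re.compile(r"^(?P<flavor>.+)_(?P<date>\d{4}-\d{2})\.zim$")


def list_dump_names(url: str = DUMPS_URL) -> List[str]:
    """Fetch the directory index and return the .zim file names it links to."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        html = resp.read().decode("utf-8", errors="replace")
    # Assumes a plain HTML index with relative <a href="name.zim"> links.
    return sorted(set(re.findall(r'href="([^"/]+\.zim)"', html)))


def latest_dump(filenames: Iterable[str], flavor: str) -> Optional[str]:
    """Return the newest file name for `flavor`, or None if there is none."""
    candidates = []
    for name in filenames:
        m = NAME_RE.match(name)
        if m and m.group("flavor") == flavor:
            # Zero-padded YYYY-MM strings sort chronologically as plain text.
            candidates.append((m.group("date"), name))
    return max(candidates)[1] if candidates else None
```

Something like `latest_dump(list_dump_names(), "wikipedia_en_all_maxi")` should return the newest full English file name, which you can compare with the one already on your drive before downloading.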
| sprash wrote:
| Is there something similar for Stack Overflow?
| Jun8 wrote:
| https://library.kiwix.org/?lang=eng&category=stack_exchange
| ankaAr wrote:
| Kiwix can do that too. You need to specify the ZIM file and
| it works:
|
| https://wiki.kiwix.org/wiki/Content_in_all_languages
|
| Why do I know that? I wanted to travel to an Antarctic base as a
| system administrator with a whole copy of Stack Overflow.
| sqrt_1 wrote:
| The article says to format the drive as exFAT because NTFS has
| a 4GB limit. I don't think that is true.
| Wingman4l7 wrote:
| It's not -- FAT32 is the one with the 4GB limit. NTFS has much
| less native support on Macs than exFAT, though.
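Since the full English dump is around 95GB in a single file, it can't go on FAT32, whose per-file ceiling is 2^32 - 1 bytes (just under 4 GiB). A trivial Python check you could run before formatting the drive; `check_zim` is a hypothetical helper name.

```python
import os

FAT32_MAX_FILE_BYTES = 2**32 - 1  # 4 GiB minus one byte


def fits_on_fat32(size_bytes: int) -> bool:
    """True if a single file of this size can be stored on a FAT32 volume."""
    return size_bytes <= FAT32_MAX_FILE_BYTES


def check_zim(path: str) -> str:
    """Report whether the file at `path` would fit on FAT32."""
    size = os.path.getsize(path)
    if fits_on_fat32(size):
        return f"{path}: {size} bytes, OK on FAT32"
    return f"{path}: {size} bytes, too big for FAT32; use exFAT or NTFS"
```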
| kloch wrote:
| I wonder if there is an offline backup of Wikipedia on the ISS?
| There should be. And on every manned space mission.
| Dig1t wrote:
| Why not just every space mission, period?
| vorpalhex wrote:
| Well, the robots don't read too well.
| Rebelgecko wrote:
| How much would the science capabilities of a telescope like
| JWST be reduced if 1/3 of its SSD was repurposed for storing
| the latest wikipedia dump (that 1/3 number is assuming it's
| only English, compressed, and without images)? To me that
| seems like an easy cost/benefit analysis.
| bagels wrote:
| Why should there be?
| mhh__ wrote:
| The next Apollo 13 will probably be a software problem; it
| doesn't hurt if they can read up about it.
| tablespoon wrote:
| > The next Apollo 13 will probably be a software problem; it
| > doesn't hurt if they can read up about it.
|
| What good would an "offline backup of Wikipedia" do in that
| situation?
|
| Wikipedia is good for one thing, and one thing only:
| getting some cursory knowledge on a topic you're unfamiliar
| with. It's the tourist map to the "sum of all human
| knowledge." If you expect to use it for anything else,
| you're asking too much of it.
| PaulDavisThe1st wrote:
| So, Stack Overflow, not Wikipedia, then?
___________________________________________________________________
(page generated 2022-10-06 23:00 UTC)