http://prize.hutter1.net/

+-------------------------------------------------------------------+
|         500'000EUR Prize for Compressing Human Knowledge          |
|(widely known as the Hutter Prize. Total payout so far: 23'034EUR) |
+-------------------------------------------------------------------+

Compress the 1GB file enwik9 to less than the current record of about 114MB

* The Task
* Motivation
* Detailed Rules for Participation
* Previous Records
* More Information
* Discussion forum on the contest and prize
* History
* Committee
* Frequently Asked Questions
* Contestants
* Links
* Disclaimer

News: Saurabh Kumar is the sixth Winner! Congratulations! ... the contest continues ...

Being able to compress well is closely related to intelligence as explained below. While intelligence is a slippery concept, file sizes are hard numbers. Wikipedia is an extensive snapshot of Human Knowledge. If you can compress the first 1GB of Wikipedia better than your predecessors, your (de)compressor likely has to be smart(er). The intention of this prize is to encourage development of intelligent compressors/programs as a path to AGI.

Interview with Lex Fridman (26.Feb'20) (Video, Audio, Tweet)

The Task

Losslessly compress the 1GB file enwik9 to less than 114MB. More precisely:

* Create a Linux or Windows compressor comp.exe of size S1 that compresses enwik9 to archive.exe of size S2 such that S:=S1+S2 < L := 114'156'155 (previous record).
* If run, archive.exe produces (without input from other sources) a 10^9 byte file that is identical to enwik9.
* If we can verify your claim, you are eligible for a prize of 500'000EUR × (1-S/L). Minimum claim is 5'000EUR (1% improvement).
* Restrictions: Must run in ≲50 hours using a single CPU core and <10GB RAM and <100GB HDD on our test machine.

Remark: You can download the zipped version enwik9.zip of enwik9 here (≈300MB).
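The award rule stated in the task can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `prize_eur`, its parameter names, and the treatment of the 1% minimum as a hard cutoff are hypothetical, and any rounding applied when an award is actually paid is not specified here.

```python
def prize_eur(total_size: int, previous_record: int,
              fund_eur: float = 500_000.0,
              min_improvement: float = 0.01) -> float:
    """Award for a total size S = S1 + S2 measured against the record L.

    Implements fund * (1 - S/L) and returns 0.0 when the relative
    improvement is below the minimum (assumed here to be a hard cutoff).
    """
    if total_size <= 0 or previous_record <= 0:
        raise ValueError("sizes must be positive byte counts")
    improvement = 1.0 - total_size / previous_record
    if improvement < min_improvement:
        return 0.0
    return fund_eur * improvement


if __name__ == "__main__":
    L = 114_156_155  # previous record in bytes, as stated in the task
    for s in (114_000_000, 113_000_000, 110_000_000):
        print(f"S = {s:>11,d} bytes -> {prize_eur(s, L):10.2f} EUR")
```

Applied to the 2023 entry in the records table (114'156'155 bytes against the earlier 115'352'938), the same formula gives roughly 5'187EUR, which agrees with the listed award.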
Please find more details including constraints and relaxations at http://prize.hutter1.net/hrules.htm.

Motivation

This compression contest is motivated by the fact that being able to compress well is closely related to acting intelligently, thus reducing the slippery concept of intelligence to hard file size numbers. In order to compress data, one has to find regularities in them, which is intrinsically difficult (many researchers make a living from analyzing data and finding compact models). So compressors beating the current "dumb" compressors need to be smart(er). Since the prize wants to stimulate developing "universally" smart compressors, we need a "universal" corpus of data. Arguably the online encyclopedia Wikipedia is a good snapshot of the Human World Knowledge. So the ultimate compressor of it should "understand" all human knowledge, i.e. be really smart. enwik9 is a hopefully representative 1GB extract from Wikipedia.

Detailed Rules for Participation

* Rules
* Relaxations
* Participation
* Award
* More Information

Baseline Enwik9 and Previous Records Enwik8

+-----------------------+-------------+---------------+--------------+--------------+--------+------+-------+-----------+---------------+
| Author (enwik9)       | Date        | Decompressor  | Total Size   | Compr.Factor | RAM    | Time | %     | Award     | Sponsor       |
+-----------------------+-------------+---------------+--------------+--------------+--------+------+-------+-----------+---------------+
| You?                  | 202?        | ?             | <113'014'593 | >8.85        | <10GB  | <50h | >1%   | >5'000EUR | Marcus Hutter |
| Saurabh Kumar         | 16.Jul 2023 | fast cmix     | 114'156'155  | 8.76         | 8.4GB  | 43h  | 1.04% | 5'187EUR  | Marcus Hutter |
| Artemiy Margaritov    | 31.May 2021 | starlit ...   | 115'352'938  | 8.67         | 10GB   | ~50h | 1.1%  | 9000EUR   | Marcus Hutter |
| Alexander Rhatushnyak | 4.Jul 2019  | phda9v1.8 ... | 116'673'681  | 8.58         | 6.3GB  | ~23h | --    | pre-prize | -             |
+-----------------------+-------------+---------------+--------------+--------------+--------+------+-------+-----------+---------------+

+-----------------------+-------------+---------------+--------------+--------------+--------+------+-------+-----------+---------------+
| Author (enwik8)       | Date        | Decompressor  | Total Size   | Compr.Factor | RAM    | Time | %     | Award     | Sponsor       |
+-----------------------+-------------+---------------+--------------+--------------+--------+------+-------+-----------+---------------+
| Alexander Rhatushnyak | 4.Nov 2017  | phda9 ...     | 15'284'944   | 6.54         | 1048MB | ~5h  | 4.17% | 2085EUR   | Marcus Hutter |
| Alexander Rhatushnyak | 23.May 2009 | decomp8 ...   | 15'949'688   | 6.27         | 936MB  | ~9h  | 3.2%  | 1614EUR   | Marcus Hutter |
| Alexander Rhatushnyak | 14.May 2007 | paq8hp12 -7   | 16'481'655   | 6.07         | 936MB  | 9h   | 3.5%  | 1732EUR   | Marcus Hutter |
| Alexander Rhatushnyak | 25.Sep.2006 | paq8hp5 -7    | 17'073'018   | 5.86         | 900MB  | 5h   | 6.8%  | 3416EUR   | Marcus Hutter |
| Matt Mahoney          | 24.Mar.2006 | paq8f -7      | 18'324'887   | 5.46         | 854MB  | 5h   | --    | pre-prize | -             |
+-----------------------+-------------+---------------+--------------+--------------+--------+------+-------+-----------+---------------+

More Information

* Discussion forum on the contest and prize
* Compression benchmarks enwik9 and others
* Motivation of compressing the Human Knowledge
* Information about the enwik9 data file
* Wikipedia on the Hutter Prize

History

* 13??: William of Ockham's razor: Entities should not be multiplied beyond necessity.
* 1964: Ray Solomonoff introduced algorithmic probability for universal prediction.
* 1996: Leonid Broukhis introduced the first compression competition with a prize.
* 2000: Marcus Hutter introduced a compression based universal intelligent agent.
* 2005: Jim Bowery proposed a larger scale compression contest based on the Wikipedia corpus.
* 2006: Matt Mahoney compressed Wikipedia with many state of the art compressors.
* 2006: Marcus Hutter launched the 50'000EUR prize.
* 2006-2017: Alexander Rhatushnyak is a 4-time winner of the HKCP.
* 2020: Marcus Hutter launched the 500'000EUR prize.
* 2021: Artemiy Margaritov is the first winner of the 10x HKCP.

Committee

* Jim Bowery: verification of claims, public relations, finding sponsors, newsgroups, etc.
* Matt Mahoney: running the compression competition.
* Marcus Hutter: arbiter, current sponsor, and manager of prize fund.

Frequently Asked Questions (FAQ)

* What is this contest about?
* Is the compression contest still ongoing? (YES)
* Why did you grant a temporary relaxation in 2021 of 5'000 Byte per day?
* Where do I start? How do I develop a competitive compressor?
* What is (artificial) intelligence?
* What does compression have to do with (artificial) intelligence?
* What is/are (developing better) compressors good for?
* The contest encourages developing special purpose compressors
* Why lossless compression?
* I have a really good lossy compressor. (How) can I participate?
* Why aren't cross-validation or train/test-set used for evaluation?
* Why is (sequential) compression superior to other learning paradigms?
* Why is Compressor Length superior to other Regularizations?
* How can I achieve small code length with huge Neural Networks?
* Batch vs incremental/online/sequential compression.
* Why don't you allow using some fixed default background knowledge data base?
* Why is "understanding" of the text or "intelligence" needed to achieve maximal compression?
* Why do you focus on text?
* What is the ultimate compression of enwik9?
* Why recursively compressing compressed files or compressing random files won't work
* Can you prove the claims in the answers to the FAQ above?
* The PAQ8 compressors are hard to beat
* There are lots of non-human language pieces in the file
* Why include the decompressor?
* Why do you require submission of the compressor and include its size and time?
* Why not use Perplexity, as most big language models do?
* Why did you start with 100MB enwik8 back in 2006?
* Why did you go BIG in 2020?
* Why are you limiting (de)compression to less than 100 hours on systems with less than 10GB RAM?
* Why do you restrict to a single CPU core and exclude GPUs?
* The total prize is not exactly 500'000EUR
* Is 1GB 2^30 byte or 10^9 byte?
* The website looks dated
* Why do you require Windows or Linux executables?
* Why do you require submission of documented source code?
* Where can I find the source code of the past winners and baseline phda9?
* Under which license can/shall I submit my code?
* What if I can (significantly) beat the current record?
* How can I produce self-contained or smaller decompressors?
* Is Artificial General Intelligence (AGI) possible?
* Is Ockham's razor and hence compression sufficient for AI?
* The human brain works very differently from (de)compressors
* I have other questions or am not satisfied with the answer

Contestants and Winners for enwik8

So far we have received the submissions below for enwik8. Each is/was open for public comment and verification for 30 days before an award decision will be/was made. Comments should be made to the Hutter Prize Discussion Forum or by email to members of the Prize committee.

Date      | Author                | Decompressor | Compression Options      | Size of archive | Size of decompr. | Total Size | %Improve 1-S/L | Compr. Factor | Bits/Char | Memory | Time  | Note
----------+-----------------------+--------------+--------------------------+-----------------+------------------+------------+----------------+---------------+-----------+--------+-------+------------------------------------------------
4.Nov'17  | Alexander Rhatushnyak | phda9        | compressed_enwik8 enwik8 | 15'242'496      | 42'448           | 15'284'944 | 4.17%          | 6.54          | 1.225     | 1048MB | ~5h   | Meets all prize criteria. Fourth winner!
23.May'09 | Alexander Rhatushnyak | decomp8      | archive8.bin enwik8      | 15'932'968      | 16'720           | 15'949'688 | 3.2%           | 6.27          | 1.278     | 936MB  | ~9h   | Meets all prize criteria. Third winner!
22.Apr'09 | Alexander Rhatushnyak | decomp8      | archive8.bin enwik8      | 15'970'425      | 16'252           | 15'986'677 | 3.0%           | 6.26          | 1.279     | 924MB  | 9h    | 3.0% improvement over new baseline paq8hp12
14.May'07 | Alexander Rhatushnyak | paq8hp12     | -7                       | 16'381'959      | 99'696           | 16'481'655 | 3.5%           | 6.07          | 1.319     | 936MB  | 9h    | Meets all prize criteria. Second winner!
...       | "                     | ...          | ...                      | ...             | ...              | ...        | ...            | ...           | ...       | ...    | ...   | ...
6.Nov'06  | Alexander Rhatushnyak | paq8hp6      | -7                       | 16'731'800      | 170'400          | 16'902'200 | 1%             | 5.92          | 1.352     | 941MB  | 5h    | 1% improvement over new baseline paq8hp5
25.Sep'06 | Alexander Rhatushnyak | paq8hp5      | -7                       | 16'898'402      | 174'616          | 17'073'018 | 6.8%           | 5.86          | 1.366     | 900MB  | 5h    | Meets all prize criteria. First winner!
10.Sep'06 | Alexander Rhatushnyak | paq8hp4      | -7                       | 17'039'173      | 206'336          | 17'245'509 | 5.9%           | 5.80          | 1.380     | 803MB  | 5h    | Superseded by paq8hp5
3.Sep'06  | Alexander Rhatushnyak | paq8hp3      | -7                       | 17'241'280      | 178'468          | 17'419'748 | 4.9%           | 5.74          | 1.394     | 742MB  | 5h    | Superseded by paq8hp4
28.Aug'06 | Alexander Rhatushnyak | paq8hp2      | -7                       | 17'390'460      | 205'276          | 17'595'736 | 4.0%           | 5.68          | 1.408     | 747MB  | 5h    | Superseded by paq8hp3
21.Aug'06 | Alexander Rhatushnyak | paq8hp1      | -7                       | 17'566'769      | 206'764          | 17'773'533 | 3.0%           | 5.63          | 1.422     | 748MB  | 5h    | Superseded by paq8hp2
20.Aug'06 | Alexander Rhatushnyak | paq8hkcc     | -7                       | 17'597'599      | 244'224          | 17'841'823 | 2.6%           | 5.61          | 1.43      | 747MB  | 5h    | Superseded by paq8hp1
16.Aug'06 | Dmitry Shkarin        | durilca0.5h  | -m1650 -o21 -t2          | 17'958'687      | 86'016           | 18'044'703 | 1.5%           | 5.54          | 1.444     | 1650MB | 30min | Fails to meet the reasonable memory limitations
16.Aug'06 | Rudi Cilibrasi        | raq8g        | -7                       | 18'132'399      | 34'816           | 18'167'215 | 0.9%           | 5.50          | 1.453     | 1089MB | 7h    | Fails to meet the 1% hurdle, and others
24.Mar'06 | Matt Mahoney          | paq8f        | -7                       | 18'289'559      | 35'328           | 18'324'887 | 0%             | 5.46          | 1.466     | 854MB  | 5h    | Pre-prize baseline

The time for decompression/compression is estimated for a 2GHz P4 till 2010 and for a 2.7GHz i7 since 2017. The percent (%) improvement is over the baseline (the previous record).
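As a rough cross-check of how the derived columns in the table relate to the raw sizes, the following sketch recomputes Total Size, %Improve and Compr. Factor for the paq8hp5 row. The helper name `derived_columns` is hypothetical, and the sketch assumes that enwik8 is exactly 10^8 bytes and that the compression factor is the original size divided by the total size. The Bits/Char column is left out because its exact definition is not given here.

```python
ENWIK8_BYTES = 10**8  # assumed size of enwik8 in bytes


def derived_columns(archive_size: int, decompressor_size: int,
                    baseline_total: int) -> dict:
    """Recompute the derived table columns from the raw byte counts."""
    total = archive_size + decompressor_size  # S = S1 + S2
    return {
        "total_size": total,
        # relative improvement 1 - S/L over the baseline L, in percent
        "improvement_pct": 100.0 * (1.0 - total / baseline_total),
        # original size divided by archive plus decompressor
        "compression_factor": ENWIK8_BYTES / total,
    }


# paq8hp5 (25.Sep'06) measured against the pre-prize baseline paq8f
row = derived_columns(16_898_402, 174_616, baseline_total=18_324_887)
print(row["total_size"],
      round(row["improvement_pct"], 1),
      round(row["compression_factor"], 2))
```

Note that the baseline changes after each award, so later rows have to be compared against the most recent winning total rather than against paq8f.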
More details on the (de)compressors can be found here.

* Apr-Nov'17: Alexander Rhatushnyak submits another series of ever improving compressors based on phda9, with the final one on 4.Nov'17 improving over his previous record by over 4%!
* Sep'07-...: Alexander Rhatushnyak submits another series of ever improving compressors. Is there nobody else who can keep up with him?
* Nov'06-May'07: Alexander Rhatushnyak submits another improved series of (de)compressors paq8hp6-12 (option -7). On 14.May 2007 he submits paq8hp12. It achieved an improvement of 3.5% over the new baseline paq8hp5 and was finally confirmed as the second winner on 30.June 2007. Congratulations! A detailed description of paq8hp12 can be found here. Most of the time in developing paq8hp6-12 went into planning and performing experiments, and studying and understanding the results of these experiments. Alexander Rhatushnyak's current occupation is in software engineering. For him data compression is science and art and sport all together. This was his motivation for participating in the contest. Dr. Rhatushnyak was born in the Siberian Scientific Center (www.nsc.ru), has studied data compression and related algorithms since 1991, and graduated from the Moscow State University (www.msu.ru) in 1996. After his PhD in 2002 he lived and worked in various places in the world.
* Aug-Sep'06: Alexander Rhatushnyak of the Moscow State University Compression Project submits an improving series of (de)compressors paq8hp? (option -7), modifications of paq8h with a custom dictionary built from enwik8 and other improvements. Przemyslaw Skibinski contributed to earlier versions. On 25.Sep.2006 Alexander Rhatushnyak submits paq8hp5. It achieved an improvement of 6.8% over the baseline paq8f and was finally confirmed as the first winner on 25.Oct.2006. Congratulations! A detailed description of paq8hp5 can be found here.
* 16.Aug'06: Dmitry Shkarin submits a modification of (de)compressor durilca (option -m1650 -o21 -t2), a modification of ppmd/ppmonstr with filters for text, exe, and data with fixed length records.
* 16.Aug'06: Rudi Cilibrasi submits (de)compressor raq8g.cpp (option -7), a modification of paq8f with additional text modeling.

Links (Further Information/Discussion/News)

Core Resources:

* Wikipedia: Hutter Prize
* Large Text Compression Benchmark
* Interview on Intelligence & Compression & Contest (10min, video)
* Presentation by past winner Alex Rhatushnyak
* Kolmogorov complexity = the ultimate compression
* Universal Artificial Intelligence (book, 45min/1.5h/3h lecture)
* Interview on Universal AI with Lex Fridman (1.5h)

Further Recommended Technical Reading relevant to the Compression=AI Paradigm:

* Franz&al. (2021) A theory of incremental compression
* Zenil (2020) Compression is Comprehension, and the Unreasonable Effectiveness of Digital Computation in the Natural World
* Yogatama&al. (2019) Learning and Evaluating General Linguistic Intelligence
* Zenil&al. (2019) Causal deconvolution by algorithmic generative models (3min video)
* Everitt&Hutter (2018) Universal Artificial Intelligence: Practical agents and fundamental challenges
* Mattern (2016) On Statistical Data Compression
* Mahoney (2011) Data Compression Explained
* Rathmanner&Hutter (2011) A Philosophical Treatise of Universal Induction (slides, recordings)
* Salomon&Motta (2010) Handbook of Data Compression
* Janzing&Scholkopf (2010) Causal Inference Using the Algorithmic Markov Condition
* Hernandez-Orallo&Dowe (2010) Measuring Universal Intelligence: Towards an Anytime Intelligence Test
* Mahoney (2009) Rationale for a Large Text Compression Benchmark (and further references)
* Hutter (2007) Universal Algorithmic Intelligence: A Mathematical Top-Down Approach (slides, recordings)
* Schmidhuber (2007) The New AI: General & Sound & Relevant for Physics
* Cilibrasi&Vitanyi (2005) Clustering by Compression
* Wallace (2005) Statistical and Inductive Inference by Minimum Message Length
* Sanghi&Dowe (2003) A Computer Program Capable of Passing I.Q. Tests
* Catoni (2001) Statistical Learning Theory and Stochastic Optimization
* Hutter (201X) Recommended books & Courses for (Under)Graduate Students

Post-2020 Discussion (enwik9,EUR500k):

* Twitter Announcement (2023) Progress on the Human Knowledge Compression front
* HKCP GG Announcement (2023) Saurabh Kumar's fast-cmix wins EUR5187 Hutter Prize Award!
* Slashdot Informal Discussion (2023) Sixth 'Hutter Prize' Awarded for Achieving New Data Compression Milestone
* Research Snipers Article (2023) New Record Set In Data Compression
* Slashdot Informal Discussion (2021) New Hutter Prize Winner Achieves Milestone for Lossless Compression of Human Knowledge
* Mike James Article (2021) New Hutter Prize Milestone For Lossless Compression
* Discussion (2021) on News YCombinator
* Analytics India Magazine Article (2020) Compress Data And Win Hutter Prize Worth Half A Million Euros
* Mike James Article (2020) Hutter Prize Now 500,000 Euros
* Reddit News (2020) 500,000EUR Prize for distilling Wikipedia to its essence

Pre-2020 Discussion (enwik8,EUR50k):

* Language Modelling on Hutter Prize
* Discussion in the AGI mailing list
* Discussion in the Hutter-Prize mailing list
* Technical Discussion in the Data Compression Forum encode.su
* Discussion in Yahoo Group ai-philosophy
* Informal Discussion at Slashdot (13.Aug'06, 29.Oct'06, 10.Jul'07, 21.Feb'20)
* In the Online Heise News
* In the KurzweilAI.net News
* In Mark Nelson's blog (24.Aug'06)
* O'Reilly Radar (29.Sep'06)
* Discussion at the Accelerating Future page
* In the ebiquity news
* In WebPlanet News in Russian
* Wissenschaft-Wirtschaft-Politik, Ausgabe 34/2006 (22.Aug'06)
* Discussion at Newsgroup comp.ai.nat-lang
* Discussion at Newsgroup comp.compression and here
* Discussion at Newsgroup comp.ai
* Prediction market as to when enwik8 will be compressed to Shannon's estimate of 1 bit per character
* many other sites

Warning: The average quality of the posts in the discussion groups and mailing lists is very low. Most participants don't know the underlying scientific concepts and some have not even read the rationale behind the contest. For a cleaned summary consult the frequently asked questions. The competition was also announced or discussed in many blogs.

+-------------------------------------------------------------------+
|Disclaimer: Copying and distribution of this page                  |
|(http://prize.hutter1.net) is permitted, provided the source is    |
|cited. The prize will be paid if the solution reflects the spirit  |
|of the contest. In particular decompressors (secretly) receiving   |
|any kind of "outside" information are forbidden. Also in order to  |
|verify your claim we need to be able to run your executable on our |
|machines within reasonable space and time constraints. This is a   |
|privately run and funded contest. Payment of the prize cannot be   |
|legally enforced. The smallest claimable prize is 5'000EUR. After  |
|an award, the prize formula (L) will be adapted. Rules may change  |
|at any time to meet the goals of fairness, accuracy, maximizing    |
|public participation, and recognizing existing practice. July 2006.|
|Updated Feb.2020.                                                  |
+-------------------------------------------------------------------+