https://web.archive.org/web/20210413060837/http://robertmatthews.org/wp-content/uploads/2016/03/RM-storks-paper.pdf