[HN Gopher] The arXiv of the future will not look like the arXiv
___________________________________________________________________
The arXiv of the future will not look like the arXiv
Author : dginev
Score : 71 points
Date : 2022-03-28 19:14 UTC (3 hours ago)
(HTM) web link (ar5iv.labs.arxiv.org)
(TXT) w3m dump (ar5iv.labs.arxiv.org)
| gsvclass wrote:
| At https://42papers.com/ we want to get more folks reading papers.
| Our focus is on surfacing papers from arXiv that our community
| would appreciate, so we focus on trending papers, improving
| readability, etc.
| wolverine876 wrote:
| > the PDF is not a format fit for sharing, discussing, and
| reading on the web. PDFs are (mostly) static, 2-dimensional and
| non-actionable objects. It is not a stretch to say that a PDF is
| merely a digital photograph of a piece of paper.
|
| It is too far a stretch, murdering the poor subject:
|
| PDFs are the best format available for long-term information,
| such as research papers. They have the advantages of digital
| data: Searchable, copy-able, transmittable, and data is
| extractable. They are also an open format, don't rely on a
| central service to be available, and they preserve presentation
| across platforms. They have metadata, and are annotatable and
| reviewable. And the PDF format is the best for long-term
| preservation, carefully designed to be readable in 50 years -
| partly because they preserve presentation across platforms - and
| that includes the metadata, annotations, and reviews.
|
| PDFs _are_ like paper in that they will look the same 50 years
| from now as they do today, unlike (almost?) any other digital
| format.
|
| Yes, I wish they were a bit more dynamic in layout, and that the
| text was more cleanly extracted.
| chaxor wrote:
| One thing I would love to see from the arxiv sites is a publicly
| available download of an SQLite database*. They have a bunch of
| PDFs, and latex source - but the real killer would be a database
| with just the text for each section, and then the ability to
| _generate_ the pdf, using various different styles. This would
| save an enormous amount of space, and make things far more tidy.
| I suppose the images could be stored in the SQLite as blobs, but
| there's probably a better way with vector dbs or something.
|
| That's what the future will probably look like. With the SQLite
| decentralized on IPFS or torrent, where only queries get stored
| on each computer, making more popular queries faster to load
| (more peers).
|
| *(or maybe an archive of a ton of zstd parquets for each table?
| - Not sure what the best way to organize several tables in
| parquet is yet)
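|
| A minimal sketch of the per-section layout described above,
| assuming hypothetical table and column names (papers, sections)
| and placeholder rows; nothing here reflects a real arXiv export.
| Rendering is reduced to plain text, standing in for "generate
| the pdf" in whatever style is wanted:

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with one placeholder paper.

    The schema (papers, sections) is hypothetical, for illustration only.
    """
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE papers (
            paper_id TEXT PRIMARY KEY,
            title    TEXT NOT NULL
        );
        CREATE TABLE sections (
            paper_id TEXT    NOT NULL REFERENCES papers(paper_id),
            position INTEGER NOT NULL,
            heading  TEXT    NOT NULL,
            body     TEXT    NOT NULL,
            PRIMARY KEY (paper_id, position)
        );
        """
    )
    conn.execute("INSERT INTO papers VALUES (?, ?)", ("demo-0001", "Demo paper"))
    # Inserted out of order on purpose: the renderer sorts by position.
    conn.executemany(
        "INSERT INTO sections VALUES (?, ?, ?, ?)",
        [
            ("demo-0001", 2, "Methods", "Placeholder methods text."),
            ("demo-0001", 1, "Introduction", "Placeholder introduction text."),
        ],
    )
    conn.commit()
    return conn


def render_plain(conn: sqlite3.Connection, paper_id: str) -> str:
    """Rebuild a readable document from the stored sections."""
    row = conn.execute(
        "SELECT title FROM papers WHERE paper_id = ?", (paper_id,)
    ).fetchone()
    if row is None:
        raise KeyError(paper_id)
    title = row[0]
    parts = [title, "=" * len(title)]
    query = (
        "SELECT heading, body FROM sections "
        "WHERE paper_id = ? ORDER BY position"
    )
    for heading, body in conn.execute(query, (paper_id,)):
        parts += ["", heading, "-" * len(heading), body]
    return "\n".join(parts)


if __name__ == "__main__":
    print(render_plain(build_demo_db(), "demo-0001"))
```

| Swapping render_plain for a LaTeX or HTML template would give the
| "various different styles" without touching the stored text.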
| enriquto wrote:
| > This would save an enormous amount of space, and make things
| far more tidy.
|
| Why? The output pdf is typically smaller than the input that
| produces it. Using rendered pdfs seems simple and very natural,
| and at worst can use twice the total amount of space.
| azangru wrote:
| > The arXiv of the future is format-neutral and separates format
| from content.
|
| Didn't this use to be Latex's tagline? Separate format from
| content. Which the authors of the article don't find separate
| enough.
|
| How does the proper separation of format from content even work?
| Don't you need to mark up your content in order for it to become
| formatted?
| ssivark wrote:
| It's fairly well separated from the perspective of being able
| to write content fairly agnostic of a presentation template,
| and then swap in the required publication template in the end
| (with a few cosmetic tweaks very occasionally).
|
| But LaTeX is largely an extension of TeX, and these markup
| languages seem not very amenable to re-implementing parsing /
| automated processing (given numerous attempts that have
| resulted in stalemates).
| lazyjeff wrote:
| I read this article the other day, "There are four schools of
| thought on reforming peer review" [1], about four schools of
| thought on how to reform publishing and peer review. Each of
| them is independently fairly well received and makes sense in
| itself, at least among my academic circles.
| However, there are tensions between them, so it's hard to come up
| with a solution that's universally satisfying to even the
| majority of stakeholders.
|
| This article about ArXiv is clearly in the "Democracy and
| Transparency school" as categorized in that article, but it
| doesn't yet address the other three camps. The arxiv article proposes
| machine-readable semantics, easier sharing and discoverability,
| papers + supplementary materials + reviews all open; this floods
| the world with even more publications with varying quality, so
| it's even harder to identify good quality work; and when things
| can be more easily aggregated by machines and measured with the
| alternative metrics proposed, it often leads to a more powerful
| winner-takes-all system that can be gamed (there's now a subtle
| game of increasing citations that appear on Google Scholar);
| finally, with an increase in submissions and materials that go
| along with submissions, it puts an even greater strain on the
| review system. These problems are not unsolvable, but almost
| every idea I've seen proposed so far has only been in a single
| camp, and there are side effects that harm the goals of the other
| three camps. So I'd love to see more ideas that balance the
| interests of all four camps that want to reform peer review and
| publishing.
|
| [1]:
| https://blogs.lse.ac.uk/impactofsocialsciences/2022/03/24/th...
| curiousgal wrote:
| I don't usually read long articles on my phone but the design of
| that page on my Pixel 6 was just so perfect! I hope this becomes
| the norm!
| periheli0n wrote:
| This is precisely their point. Reading the usual Arxiv-PDF on a
| phone is a pain, even if you just want to glance at some key
| parts of the text. Their version is much, much better. It's
| self-promotion by the Authorea team on the platform they are
| competing with (ArXiv), but they have a point.
|
| Arxiv needs to go HTML.
| stncls wrote:
| But the article link is arXiv's own (admittedly experimental)
| HTML5 viewer!! And your parent comment is praising it.
| bee_rider wrote:
| This seems... ambitious.
|
| I think ArXiv (edit: _Actually this is not by ArXiv, but some
| other group_ ) is drastically over-estimating the desire to
| submit papers to their service. They are popular because they
| host the documents you were going to produce, in the format that
| the journals expect. The production of an Arxiv-appropriate
| document is a side effect of the actual job, which is writing a
| paper to submit to a journal (hey, I'm as unhappy as you are that
| this is the actual job, but everyone hates publish-or-perish, if
| it could be overthrown it would have been).
|
| "Getting academics to act in a way that is not directly in their
| self-interest because they just love sharing information" is
| usually a pretty safe bet, but I think this would be a bit too
| far. Unless ArXiv can somehow get journals to expect their format
| (good luck!) I think this is going to be hard.
| stncls wrote:
| The article is not at all by the arXiv people. This is just a
| paper submitted to arXiv (about arXiv). The confusion is
| understandable, because the link is to arXiv's experimental
| HTML5 viewer, not the usual format (which would be:
| https://arxiv.org/abs/1709.07020).
|
| The authors are from Authorea.com, a for-profit that wants to
| replace arXiv.
|
| Edit: Aside from that, fully agree with you. Good luck to them.
| bee_rider wrote:
| Ah, thanks for the correction, that really changes things!
| 0lmer wrote:
| I'm still wondering about a service that would be to arXiv what
| Github became to Sourceforge. Order of magnitude improvement of
| collaboration and interconnection between published materials.
| tempnow987 wrote:
| "sharing research via PDF must inevitably come to an end."
|
| Maybe instead of using the obsolete toolset arxiv provides, they
| could host their groundbreaking research on their own platform?
| The combination of ground breaking features and insightful
| commentary would draw users?
|
| Actually, many of the negatives they list are positives in my
| book. The latex barrier screens out a ton of garbage in my view -
| I'm on some social science / word based research lists, and the
| quality of stuff is mind bogglingly bad.
|
| Getting stuff to fit into a PDF (instead of the NY times new
| scrollable story stuff) makes grabbing, printing off or even
| reading easy - less dynamic is good in my book.
| kkfx wrote:
| A small proposal: why not a PopcornTime of papers? Which means a
| distributed network (no matter if BitTorrent, ZeroNet, GNUNet,
| I2P or something else) to publish? That's the best freedom
| guarantee and just the mere number of nodes with a paper is a
| good metric about its popularity, to avoid oblivion each
| uni/researcher can easily store and serve their own papers
| forever: files are small, so download is quick, not much
| resources are needed.
| PeterisP wrote:
| What problems does it solve for the authors? The features you
| describe above don't seem a problem in the current solutions;
| freedom and availability is a non-issue for authors, "to avoid
| oblivion each uni/researcher can easily store and serve their
| own papers forever" is a flaw not a feature (there are already
| far too many ways to do that, which only add extra burden to
| the authors if they want to "be everywhere" for the sake of
| availability), it doesn't seem that it would be easier than the
| current way; the resources/effort needed would be small but
| non-zero, so it sounds like just an extra annoyance, not
| something beneficial.
|
| And if it solves some problems for someone else but not the
| authors, then how would a comprehensive majority of papers
| enter the system? Papers are even less interchangeable than
| movies; if you want to have a particular movie and it isn't
| available on PopcornTime, you might watch something else, for
| papers you just have to go elsewhere that actually does have
| everything.
| wcerfgba wrote:
| Readers may find the Octopus project interesting:
|
| > Designed to replace journals and papers as the place to
| establish priority and record your work in full detail, Octopus
| is free to use and publishes all kinds of scientific work,
| whether it is a hypothesis, a method, data, an analysis or a peer
| review.
|
| > Publication is instant. Peer review happens openly. All work
| can be reviewed and rated.
|
| > Your personal page records everything you do and how it is
| rated by your peers.
|
| > Octopus encourages meritocracy, collaboration and a fast and
| effective scientific process.
|
| > Created in partnership with the UK Reproducibility Network.
|
| https://science-octopus.org/
| akvadrako wrote:
| It's fascinating to imagine what the arxiv of the future would
| look like.
|
| I imagine all scientific publications available on a distributed
| block store, including raw emails, data and notes on a voluntary
| basis.
|
| Stuff that could be published would include reviews, corrections
| in version control fashion, and enough metadata to model
| scientific progress.
|
| What this article is describing sounds reasonable but not game
| changing.
| stncls wrote:
| The authors first list some issues with arXiv. Next, they
| describe how to fix those issues. Then the good news arrives:
| this improved arXiv already exists. It's called Authorea.com. All
| three authors are Authorea.com employees. They do disclose it as
| their affiliation. Still, this is essentially an ad written in
| LaTeX.
|
| They correctly point out a few of the limitations of arXiv
| (mostly: static LaTeX and PDFs). But I profoundly dislike the
| other things they propose:
|
| 1. "open comments and reviews". I have no problem with open
| reviews on a third-party website, but arXiv is literally a
| "distribution service". It has one job and does it pretty well. I
| don't want it to turn into Reddit or (worse?) ResearchGate.
|
| 2. "alternative metrics". Enough with the metrics already. We all
| know they're destructive, at least all that have been tried so
| far. I didn't even know that arXiv showed some bibliometrics
| (because they are _thankfully_ hidden behind default-disabled
| switches). Their proposed alternatives? "How many times a paper
| has been downloaded, tweeted, or blogged." I am not joking, this
| is what they propose to include in addition to citations.
| Seriously???
|
| PS: Just a heads-up to anyone who, like me, would be wondering
| about the ar5iv.labs.arxiv.org link. The article is a regular
| paper submitted to arXiv. The authors do not belong to the
| organization maintaining arXiv. The usual link is:
| https://arxiv.org/abs/1709.07020
|
| The ar5iv.labs.arxiv.org thing is an experimental html5 paper
| viewer by the arXiv people.
|
| Edit: typos.
| jimhefferon wrote:
| Thanks. It was not clear to me whether this is a white paper by
| the arXiv people, or a talk by external folks.
|
| I now see that Wikipedia says this.
|
| _Authorea was launched in February 2013 by co-founders Alberto
| Pepe and Nathan Jenkins and scientific adviser Matteo
| Cantiello, who met while working at CERN. They recognized
| common difficulties in the scholarly writing and publishing
| process. To address these problems, Pepe and Jenkins developed
| an online, web-based editor to support real-time collaborative
| writing, and sharing and execution of research data and code.
| Jenkins finished the first prototype site build in less than
| three weeks.
|
| Bootstrapping for almost two years, Pepe and Jenkins grew
| Authorea by reaching out to friends and colleagues, speaking at
| events and conferences, and partnering with early adopter
| institutions.
|
| In September 2014, Authorea announced the successful closure of
| a $610K round of seed funding with the New York Angels and ff
| Venture Capital groups. In January 2016, Authorea closed a
| $1.6M round of funding led by Lux Capital and including the
| Knight Foundation and Bloomberg Beta. It later acquired the VC-
| backed company The Winnower.
|
| In 2018 Authorea was acquired for an undisclosed amount by
| Atypon (part of Wiley)._
| sdenton4 wrote:
| I don't really see how a for-profit preprint service is
| desirable, given the terrible track record of other for-
| profit entities in academic publishing. The extra features
| will be great until the gatekeeping kicks in after the first
| missed funding round...
| einpoklum wrote:
| They lost me at suggesting that a future ArXiv should be
|
| > Web-native and web-first
|
| Absolutely not. It should be "physical paper first". Any long-
| term archiving cannot rely on electrical devices for viewing
| archived material. Electrical grids fail. Technology changes.
| Even if ArXiv is not a print archive, the material in it must be,
| first and foremost, printable in a consistent manner, and with
| the authors targeting the physical printed form. Of course, one
| would need to actually print ArXiv items to physically archive
| them, but still.
|
| Now, of course archiving data is useful and important; and large
| amounts of data are less appropriate for print archiving. But
| that should always be secondary to the archiving of knowledge.
___________________________________________________________________
(page generated 2022-03-28 23:00 UTC)