[HN Gopher] ArXiv declares independence from Cornell
___________________________________________________________________
ArXiv declares independence from Cornell
Author : bookstore-romeo
Score : 790 points
Date : 2026-03-20 04:24 UTC (1 day ago)
(HTM) web link (www.science.org)
(TXT) w3m dump (www.science.org)
| adamnemecek wrote:
| Good call, ArXiv seems like one of the most important
| institutions out there right now.
| p-e-w wrote:
| It's so important, in fact, that there should be more than one
| such institution.
|
| People keep falling into the same trap. They love monopolies,
| then are shocked when those monopolies jerk them around.
| andbberger wrote:
| There is: bioRxiv.
| auggierose wrote:
| I have been using Zenodo for a while now instead. It is more
| user-friendly, as well.
| mastermage wrote:
| Zenodo is more for IT papers and also datasets, isn't it?
| auggierose wrote:
| It can host large datasets as well, yes. It is hosted by
| CERN, so it is not specifically IT in any way. It also
| allows you to restrict access to the files of your
| submission. It has no requirements to submit your LaTeX
| sources, any PDF will be fine. There are also no
| restrictions on who can publish. You'll get a DOI, of
| course.
|
| Everything published on arXiv could also be published on
| Zenodo, but not the other way around.
| mastermage wrote:
| Oh interesting, I didn't know this.
| jruohonen wrote:
| Zenodo is great too, yes, but their meta-data management
| is somewhat problematic; i.e., it can be changed at whim,
| which makes indexing difficult.
| Al-Khwarizmi wrote:
| I like it as well, it works great. But I wonder if it would
| scale if at some point there were a massive exodus from
| arXiv.
| auggierose wrote:
| I think it already hosts much more data than arXiv, given
| that they also host large datasets.
| freehorse wrote:
| It is just a preprint repository. It is pretty open (the
| stories where a preprint was rejected or delayed unreasonably
| are extremely rare). It offers the basic services for a
| math/compsci/physics themed preprint repository.
|
| I don't see much of a monopoly, nor any "moat" apart from it
| being recognised. You can already post preprints on a
| personal website or on github, and there are "alternatives"
| such as researchgate that can also host preprints, or zenodo.
| There are also some lesser known alternatives even. I do not
| see anything special in hosting preprints online apart from
| the convenience of being able to have a centralised place to
| place them and search for them (which you call "monopoly").
| If anything, the recognisability and centrality of arxiv
| helped a lot in the old, darker days to establish open access
| to papers. There was a time when many journals would not let
| you publish a preprint, or had all kinds of weird rules about
| when you can and when you can't. Probably still to some degree.
| koakuma-chan wrote:
| it just hosts pdfs, no?
| aragilar wrote:
| It does do a fair amount of filtering of submissions, and
| it's a long term archive (e.g. for the next 100+ years). I
| suspect both (but with the former dominating) are the issue.
| bonoboTP wrote:
| Just put out a torrent and people of the sort at
| r/DataHoarder will keep it alive for longer than
| bureaucrats.
| pfortuny wrote:
| It also hosts the sources and has a very tame but useful
| pre-acceptance process.
| freehorse wrote:
| Well, technically, it can also compile your tex file if you
| upload the tex file instead of the pdf directly, which helps
| a lot in standardizing the stylistic structure between
| preprints. Most other repositories are wild west and
| inconsistent. I really appreciate the similarity in style
| applied to most preprints there. Moreover, this means you can
| also download not just the pdf, but the source tex file too,
| which can be very useful.
| bonoboTP wrote:
| The similarity in style comes from conference and journal
| templates, not from Arxiv. You can style your paper with
| latex in any style, Arxiv doesn't care. On Arxiv you mostly
| see preprints that people submit to conferences and
| journals and they enforce the style.
| IshKebab wrote:
| Technically yes, socially no.
| kergonath wrote:
| The French government put a bit of money on the table to help
| researchers fulfil their open science requirements for
| government and EU grants, and funded the HAL repository (
| https://hal.science/ ). It's much smaller than arXiv, but it
| exists. In other countries like the UK there are clusters of
| smaller repositories as well, but it's not as well centralised.
| dataflow wrote:
| This sounds terrible. Of _course_ there's a huge risk of it
| becoming made for-profit. It almost makes you wonder if the
| academic publishers are behind this push somehow.
|
| Could they not have made it into some legal structure that puts
| universities at the top? Say, with a bunch of universities owning
| shares that comprise the entirety of the ownership of arXiv, but
| that would allow arXiv to independently raise funds?
| gucci-on-fleek wrote:
| > Of course there's a huge risk of it becoming made for-profit.
|
| The article says that "it will become an independent nonprofit
| corporation", and as OpenAI's failed attempt showed, converting
| a non-profit to a for-profit organization is either _really_
| hard or impossible.
|
| > Could they not have made it into some legal structure that
| puts universities at the top?
|
| As a corporation (even a non-profit one), it will have a board
| of directors. I have no idea what their charter will look like,
| but I would be surprised if at least one seat wasn't reserved
| for a university representative, and more than that seems quite
| likely as well.
| MostlyStable wrote:
| OpenAI didn't get everything that they wanted, but I very
| much disagree with calling it a "failed attempt". The
| non-profit went from owning the entirety of OpenAI to having a
| ~25% stake.
| gucci-on-fleek wrote:
| Ah, thanks for the correction.
| ronsor wrote:
| Sam Altman is a special kind of person; not many could pull
| off the schemes he does.
| gentleman11 wrote:
| I doubt it was him who architected it. A team of lawful
| evil lawyers more likely
| cbolton wrote:
| The non-profit still controls the board, doesn't it?
| weedhopper wrote:
| As shown by Altman, not really.
| mort96 wrote:
| Is your argument _really_ that "OpenAI was an independent
| nonprofit corporation and it worked out great, Arxiv will
| remain just as non-profit as OpenAI"?
| gucci-on-fleek wrote:
| No, my argument is that OpenAI could make billions of
| dollars if they converted from a non-profit to a for-
| profit, and they only succeeded after years of effort and
| because they had already structured the company into
| separate for-profit and non-profit entities. And even after
| all this, the non-profit still controls the majority of the
| for-profit entity.
|
| So if OpenAI with billions of dollars only partially
| succeeded at converting to a for-profit business, then that
| suggests that organizations with fewer resources (like
| arXiv) have much worse odds.
| halperter wrote:
| Statement by arXiv: https://tech.cornell.edu/arxiv/
| reed1234 wrote:
| Should be the main link. The original article is based on the
| CEO job posting.
| tornikeo wrote:
| Now the question is, will arxiv wage a decade-long bloody war
| with Cornell, using heavy infantry (PhD students), archers
| (reviewers) and field artillery (AI slop papers), or will the
| independence be mostly peaceful? Only time can tell.
| alansaber wrote:
| PhD students are levy infantry at best with Postdocs being the
| armoured levies.
| dmos62 wrote:
| Is this Gondor or Mordor?
| psalminen wrote:
| I might be missing something, but I still don't get the why. I
| don't see any "problem" that needs to be solved.
| kolinko wrote:
| The article lists the reasons quite clearly.
| binsquare wrote:
| For everyone else,
|
| The reason is that arxiv is growing significantly, leading to
| a 297,000 deficit in operating costs for 2025 alone. Cornell
| has helped with donations, along with other organizations that
| pay membership fees.
|
| As a result, donors + leaders of arxiv think it's best to
| spin off to increase funding.
| vl wrote:
| What is unclear is why they need a staff of 27 and 6.7 million
| to operate an essentially static hosting website in 2026.
| swiftcoder wrote:
| The "essentially static hosting" isn't the cost centre
| (although with 5 million MAU, it's nothing to sneeze at).
| The real costs are on the input side - they have an
| ingestion pipeline that ensures standardised paper
| formatting and so on, plus at least some degree of human
| review.
| bonoboTP wrote:
| Do you mean that the CPU compute cost of turning latex
| into pdf/HTML is the main cost?
| swiftcoder wrote:
| No, I mean that the pipeline requires software engineers
| to build/maintain, and salaries are (as in basically
| every tech organisation) the dominant cost
| bonoboTP wrote:
| Then drop it and make people upload a pdf and a zip of
| the latex sources.
|
| Most people I talk to hate that pipeline and spend a lot
| of debug hours on it when Arxiv can't compile what
| overleaf and your local latex install can.
| domoritz wrote:
| Arxiv can recompile latex to support accessibility and
| html. Going to pdf submissions would be a major step
| backward.
| bonoboTP wrote:
| Make it an external service then, and leave the thing
| that's already working great to just be.
|
| The reason authors like and use arxiv is that it gives 1)
| a timestamp, 2) a standardized citable ID, and 3) stable
| hosting of the pdf. And readers like the no-nonsense
| single click download of the pdf and a barebones
| consistent website look.
|
| All else is a side show.
| OneDeuxTriSeiGo wrote:
| You have to keep in mind that an increasing portion of
| their time and labor is going towards moderation and
| filtering due to a mass influx of nonsensical AI
| generated papers, non-academic numerology-tier hackery,
| and other useless drivel.
|
| Spinning the service off forces the labor out onto
| other universities rather than leaving it solely to
| Cornell.
| bonoboTP wrote:
| Is the problem the storage cost for hosting them, the
| HDDs? I'm sure they can be offloaded to cold storage
| because most of that slop won't be opened by anyone.
|
| Arxiv doesn't need moderation. Nobody is asking for Arxiv
| moderation. It needs minimal checks to remove overtly
| illegal content.
| swiftcoder wrote:
| > Arxiv doesn't need moderation. Nobody is asking for
| Arxiv moderation
|
| Seems like a lot of people _are_ asking for moderation.
| And moderation is a pretty big part of the existing
| offering[1].
|
| [1]: https://info.arxiv.org/help/moderation/index.html
| OneDeuxTriSeiGo wrote:
| > Is the problem the storage cost for hosting them, the
| HDDs?
|
| No. Around half the cost is infrastructure. The other
| half of the cost is people. i.e. engineers to maintain
| infra and build mod tools for moderators to operate.
|
| > Arxiv doesn't need moderation. Nobody is asking for
| Arxiv moderation.
|
| This is just not true. Tons of people ask for arxiv to
| have moderation. Especially since covid etc., when
| antivaxxers and alternative medicine peddlers started
| trying to pump the medical categories of arxiv with quack
| science preprints and then go on to use the arxiv
| preprint and its DOI to take advantage of non academics
| who don't really understand what arxiv is other than it
| looks vaguely like a journal.
|
| And doubly so now that people keep submitting AI
| generated slop papers to the service trying to flood the
| different categories so they can pad their resumes or
| CVs. And on top of that people who don't actually
| understand the fields they are trying to write papers in
| using AI to generate "innovative papers" that are
| completely nonsensical but vaguely parroting the terms of
| art.
|
| The only reason you don't see more people calling for
| arxiv moderation is because they already spend so much
| time on it. If they were to stop moderating the site it
| would overflow into an absolute nightmare of garbage near
| overnight. And people wouldn't be upset with the users
| uploading this of course, they'd be upset with arxiv for
| failing to take action.
|
| Moderation is inherently unappreciated because in the
| ideal form it should be effectively invisible (which
| arxiv's mostly is).
|
| If you want to see the type of stuff that arxiv keeps
| out, go over to viXra [1] or you can watch k-theory's
| video [2] having fun digging through some of the quality
| posts that live over on that site.
|
| 1. https://en.wikipedia.org/wiki/ViXra
|
| 2. https://www.youtube.com/watch?v=1at9BjQP8CI
| lou1306 wrote:
| The PDF formatting is anything but standardised. They ingest
| LaTeX sources, which are formatted according to the
| authors' whims (most likely, according to whatever
| journal or conference they just submitted the manuscript
| to). I'll concede that the (relatively novel) HTML
| formatter gives papers a more uniform appearance. They
| also integrate a bunch of external services for e.g.,
| citation metrics and cross-references. Still hard to
| justify such a high cost to operate, but eh.
|
| Also, the "human review" is a simple moderation process
| [1]. It usually does not dig into the submission's
| scientific merits.
|
| [1] https://info.arxiv.org/help/moderation/index.html
| OtherShrezzing wrote:
| I don't see it as an especially exorbitant structure or
| budget. I've seen larger teams with bigger budgets
| struggle to maintain smaller applications.
|
| I've contracted into some consultancy teams which you
| could uncharitably describe as "15 people and $4mn/yr to
| create one PDF per month".
| sanex wrote:
| Now they're going to have a deficit of 600,000 in operating
| costs.
| pessimizer wrote:
| > The reason is that arxiv is growing significantly,
| leading to a 297,000 deficit in operating costs for 2025
| alone.
|
| Dollars? So 300 people's cable bill? That's basically
| nothing. They're spending too much, and it's still nothing,
| and the solution is going to be to privatize it and
| eventually loot it.
|
| You can't hand out a collection plate and get $300K for
| Arxiv? Your local neighborhood church can. Civilization is
| obviously collapsing.
| u1hcw9nx wrote:
| I think the problem described in 6th paragraph needs to be
| solved.
| davnicwil wrote:
| Very unrelated to the article, but I think 'arXiv' as a brand is
| bad, and really detrimental to what the institution aims to
| accomplish.
|
| That is, it's not readily parseable, it really gives an insider
| term vibe - like this isn't for you if you don't already know
| what it means or how you should read or say it. It sort of
| reminds me of the overuse of latin and latinate terms generally
| in the old professions and, well, the academy.
|
| Just always struck me as being somewhat at odds with the goal.
| john-titor wrote:
| I wonder what makes you feel that. I've been publishing
| preprints on arxiv for close to a decade now and never had any
| particular feelings about it.
|
| To me it's just a way to get out your work fast, so that there
| is already a trace of it on the Internets - nothing more and
| nothing less.
|
| > That is, it's not readily parseable, it really gives an
| insider term vibe...
|
| Isn't that normal with highly specialized research fields? I
| agree many papers could benefit from clearer wording, but
| working in a niche means you sometimes don't reach a broader
| audience
| davnicwil wrote:
| It's an opinion, and you feeling no particular way about it
| is equally valid.
|
| But I did justify it and, to reword slightly: surely if one
| of the main drivers is opening up research, the brand name
| should be something that's less obscure and more accessible /
| understandable as to what it is on first sight?
|
| Maybe arXiv evoking the word 'archive' with an ancient Greek
| twist does that for some, but it's clearly a bit cryptic for
| many, and if the point is to open up, the brand should
| probably just be something much plainer.
| aragilar wrote:
| No, it's to be a pre-print server. If someone doesn't know
| what that means, then they shouldn't be using arXiv.
| davnicwil wrote:
| everyone has a first time they see a thing and don't yet
| know what it is.
|
| Using a brand as a filter where you have to already know
| what it means to get it is exactly the opposite of what
| it's supposed to achieve.
|
| Consider the most exclusive (successful) brands that
| exist. Even there, where exclusivity is a brand goal,
| none of them have this property of being obscure on first
| contact.
| bonoboTP wrote:
| You usually get introduced to it by your academic
| supervisor or collaborators as a masters or PhD student.
| If you're a solo researcher who has made a significant
| contribution on the frontier of science, I'm sure you'll
| be able to understand how Arxiv works as well. Because I
| assume you have had some conversations with other experts
| in the field. If you're a full on autodidact with no
| contact to any other researchers in the field, well,
| maybe it's better if you chat with some other people in
| that field.
|
| It's reasonable to have a tradeoff here to avoid cranks
| and now AI psychosis slop. You can still post on
| ResearchGate and academia.edu or your own GitHub page or
| webhosting.
| Cordiali wrote:
| I've never even connected the 'X' to the Greek letter chi.
| I just kinda accepted it as one of many groovy web 2.0
| misspellings in search of a domain and trademark.
| matt-noonan wrote:
| This is particularly funny because arXiv doesn't just
| predate Web 2.0, it nearly predates the public web
| entirely (only missing it by about two weeks)
| nixon_why69 wrote:
| > like this isn't for you if you don't already know what it
| means
|
| Isn't that actually kind of a good brand signal for a repo of
| very specialized papers? "Fun with learning" in comic sans
| wouldn't help credibility.
| vasco wrote:
| This is the type of guy that will suggest paper.ly as a better
| name with a straight face and then we wonder why the internet
| is turning to shit
| jltsiren wrote:
| It's a classic story of someone having to pick a name quickly,
| which then gets established long before anyone who cares about
| branding is aware of its existence.
|
| The original service didn't even have a name, only a
| description, and it was amusingly hosted at xxx.lanl.gov. But
| LANL wasn't really interested in it, and the founder eventually
| left for Cornell. At that point, the service needed a domain
| name, but archive.org was already taken.
|
| And besides, the name has Ancient Greek influences. A similar
| Latinate term might be something like "archive".
| davnicwil wrote:
| Interesting, thanks for the context! Makes it more
| understandable as a choice.
| bonoboTP wrote:
| I thought the X was an allusion to LaTeX.
| jltsiren wrote:
| Usually, when you see "ch" in a Latin word, it represents a
| "kh" in the original Greek word. Both TeX and arXiv use "X"
| to represent it instead. TeX because Knuth chose to be
| fancy, and arXiv because "archive" was no longer available.
| vulcan01 wrote:
| By your criterion, Google, Apple, and Amazon are terrible names
| as well.
| davnicwil wrote:
| > if you don't already know what it means or how you should
| read or say it
|
| Google I'll grant you, though it's still pretty phonetic and
| easy to read. The other two not at all; they're incredibly
| well-known, instantly recognisable words.
| spiralcoaster wrote:
| You're right. The name is just classic gatekeeping and elitist,
| clearly. I am 100% certain that's why they chose it. If they
| really cared about inclusion, they would have called it
| research.io
| OutOfHere wrote:
| With 300K for the CEO, its enshittification will commence
| imminently. It will now serve to maximize revenue. Just wait and
| watch while they issue a premium membership, payment requirements
| for authors, and other revenue generators to please their
| investors.
| exe34 wrote:
| they'll just turn into a shitty journal at this point, they
| just need to introduce peer review and they can start competing
| with the real journals on price point.
|
| another will need to rise to take its place.
| OutOfHere wrote:
| > they'll just turn into a shitty journal at this point
|
| To this end, they added an endorsement requirement this year:
| https://blog.arxiv.org/2026/01/21/attention-authors-
| updated-...
| Peteragain wrote:
| .. and soon to be dependent on US military funding? Controlled by
| someone who has run-ins with universities? This'll end in tears.
| Garlef wrote:
| Maybe they should implement a graph based trust system:
|
| You need your favourite academic gatekeeper (= thesis advisor) to
| vouch for you in order to be allowed to upload.
|
| Then AI slop gets flagged and the shame spreads through the
| graph. And flaggings need to have evidence attached that can
| again be flagged.
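A minimal sketch of the vouching idea described above, not a description of anything arXiv actually runs. It assumes a hypothetical `vouched_by` mapping (user to the users who vouched for them) and a hypothetical `decay` factor that halves the "shame" at each step up the vouching chain. Evidence attached to flags, and flagging of the flags themselves, are not modelled here.

```python
from collections import defaultdict, deque


def propagate_shame(vouched_by, flagged_authors, decay=0.5):
    """Spread a flag's 'shame' from each flagged author up to their vouchers.

    vouched_by: dict mapping a user to the list of users who vouched for them
                (hypothetical structure, chosen for illustration).
    flagged_authors: iterable of users whose upload was flagged.
    decay: assumed factor applied at each step up the vouching chain.
    """
    shame = defaultdict(float)
    for author in flagged_authors:
        seen = {author}  # guards against cycles in the vouching graph
        queue = deque([(author, 1.0)])
        while queue:
            user, amount = queue.popleft()
            shame[user] += amount
            for voucher in vouched_by.get(user, []):
                if voucher not in seen:
                    seen.add(voucher)
                    queue.append((voucher, amount * decay))
    return dict(shame)


# Example: an advisor vouched for two people, and both get flagged.
vouched_by = {
    "student": ["advisor"],
    "other": ["advisor"],
    "advisor": ["dean"],
}
scores = propagate_shame(vouched_by, ["student", "other"])
```

In this toy example the advisor accumulates shame from both flagged uploads, while the dean, one step further away, receives a smaller share. Whether shame should sum, saturate, or expire over time is a design choice the comment leaves open.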
| dmos62 wrote:
| I've often thought that similar trust systems would work well
| in social media, web search, etc., but I've never seen it
| implemented in a meaningful way. I wonder what I'm missing.
| IshKebab wrote:
| Lobsters has this I think. But it also means I've never
| posted there.
| pred_ wrote:
| The endorsement system already works along that line:
| https://info.arxiv.org/help/endorsement.html
|
| It's probably not perfect but in practice, it seems to have
| been enough to get rid of the worst crackpotty spam.
| ryangibb wrote:
| You mean like endorsement?
| https://info.arxiv.org/help/endorsement.html
| justinnk wrote:
| They already had a basic form of this for a while [1]
|
| > arXiv requires that users be endorsed before submitting their
| first paper to arXiv or a new category.
|
| [1] https://info.arxiv.org/help/endorsement.html
| ChrisGreenHeur wrote:
| Science reduced to people with a phd?
| budman1 wrote:
| not a bad first order filter.
|
| can you think of a better one?
| awesome_dude wrote:
| The whole point of the scientific method was that we could
| ignore the source of the information, and were instead
| expected to focus on the value of the information based on
| supporting evidence (data).
|
| If we go back to "Only people that have been inducted into
| the community can publish science" we're effectively saying
| that only the high priests can accrue knowledge.
|
| I say this knowing full well that we have a massive problem
| in science on sorting the wheat from the chaff, have had so
| for a VERY long time, and AI is flooding the zone (thank
| you political commentator I despise) with absolute dross.
| frankling_ wrote:
| The recent announcement to reject review articles and position
| papers already smelled like a shift towards a more "opinionated"
| stance, and this move smells worse.
|
| The vacuum that arXiv originally filled was one of a glorified
| PDF hosting service with just enough of a reputation to allow
| some preprints to be cited in a formally published paper, and
| with just enough moderation to not devolve into spam and chaos.
| It has also been instrumental in pushing publishers towards open
| access (i.e., to finally give up).
|
| Unfortunately, over the years, arXiv has become something like a
| "venue" in its own right, particularly in ML, with some decently
| cited papers never formally published and "preprints" being cited
| left and right. Consider the impression you get when seeing a
| reference to an arXiv preprint vs. a link to an author's
| institutional website.
|
| In my view, arXiv fulfills its function better the less power it
| has as an institution, and I thus have exactly zero trust that
| the split from Cornell is driven by that function. We've seen the
| kind of appeasement prose from their statement and FAQ [1]
| countless times before, and it's now time for the usual routine
| of snapshotting the site to watch the inevitable amendments to
| the mission statement.
|
| "What positive changes should users expect to see?" - I guess the
| negative ones we'll have to see for ourselves.
|
| [1] https://tech.cornell.edu/arxiv/
| hijodelsol wrote:
| I came here to say something similar. As someone who works in a
| field that applies machine learning but is not purely focused
| on it, I interact with people who think that arXiv is the only
| relevant platform and that they don't need to submit their work
| to any journal, as well as people who still think that
| preprints don't count at all and that data isn't published
| until it's printed in an academic journal. It can feel like a
| clash of worlds.
|
| I think both sides could learn from the other. In the case of
| ML, I understand the desire to move fast and that average time
| to publication of 250-300 days in some of the top-tier journals
| can feel like an unnecessary burden. But having been on both
| sides of peer review, there is value to the system and it has
| made for better work.
|
| Not doing any of it follows the same spirit as not benchmarking
| your approach against more than maybe one alternative and that
| already as an afterthought. Or benchmaxxing but not exploring
| the actual real-world consequences, time and cost trade offs,
| etc.
|
| Now, is academic publishing perfect? Of course not, very very
| far from it. It desperately needs to be reformed to keep it
| economically accessible, time efficient for authors,
| editors and peer reviewers alike, and to prevent the "hot topic of the
| day" from dominating journals and making sure that peer review
| aligns with the needs of the community and actually improves
| the quality of the work, rather than having "malicious peer
| review" to get some citations or pet peeves in.
|
| Given the power that the ML field holds and the interesting
| experiments with open review, I would wish for the field to
| engage more with the scientific system at large and perhaps try
| to drive reforms and improve it, rather than completely
| abandoning it and treating a PDF hosting service as a journal
| (ofc, preprints would still be desirable and are important, but
| they can not carry the entire field alone).
| bonoboTP wrote:
| Simply anticipating basic pushback from reviewers makes
| sure that you do a somewhat thorough job. Not 100% thorough
| and the reviews are sometimes frivolous and lazy and stupid.
| But just knowing that what you put out there has to pass the
| admittedly noisily gatekept gate of peer review overall
| improves papers in my estimation. There is also a negative
| side because people try to hide limitations and honest
| assessments and cherry pick and curate their tables more in
| anticipation of knee jerk reviewers but overall I think
| without any peer review, author culture would become much
| more lax and bombastic and generally trend toward engagement
| bait and social media attention optimized stuff.
|
| The current balance where people write a paper with reviewers
| in mind, upload it to Arxiv before the review concludes and
| keep it on Arxiv even if rejected is a nice balance. People
| get to form their own opinion on it but there is also enough
| self-imposed quality control on it just due to wanting it to
| pass peer review, that even if it doesn't pass peer review,
| it is still better than if people write it in a way that
| doesn't care about or anticipate peer review. And this works
| because people are somewhat incentivized to get peer reviewed
| official publications too. But being rejected is not the end
| of the world either because people can already read it and
| build on it based on Arxiv.
| bjourne wrote:
| I really am not sure about that:
| https://biologue.plos.org/wp-
| content/uploads/sites/7/2020/05...
|
| The problem is that "optimizing for peer-review" is not the
| same thing as optimizing for quality. E.g., I like to add a
| few tongue-in-cheeks to entertain the reader. But then I
| have to worry endlessly about anal-retentive reviewers who
| refuse to see the big picture.
| bonoboTP wrote:
| Currently a kind of rule of thumb is that a PhD student
| can graduate after approximately 3 papers published in a
| good peer reviewed venue.
|
| If peer review were to go away, this whole academic
| system would get into a crisis. It's dysfunctional and
| has many problems but it's kinda load bearing for the
| system to chug along.
| DANmode wrote:
| No hard rule, no crisis.
|
| Maybe we can go back to very opinionated "true" academia,
|
| where there are institutional gatekeepers,
|
| but they _mostly_ get it right on who to award (and not),
|
| vs the current game of
|
| "whoever plays ball with funding sources the best = the
| best academic",
|
| which is obviously bullshit.
| vkou wrote:
| You'll still need to convince the purseholders to pay
| you, and they'll want some objective metric to measure
| your output, and whatever metric they pick will be gamed.
| DANmode wrote:
| The point of my comment was,
|
| in much earlier institutions of knowledge and excellence,
|
| the only transparent metric was whether or not they
| approved you.
| vkou wrote:
| That ossifies intellectual monocultures, though. (Or,
| heaven forbid, if someone has a financial conflict of
| interest in the private sphere...)
| DANmode wrote:
| The current solution _doesn't_ resist capture by capital
| either,
|
| and indeed we're _already left_ with all of the things
| claimed - the worst of both worlds, really.
| fc417fc802 wrote:
| But this is already how the purse holders operate. A big
| group of experts get together and vote on which grant
| proposals within a given category to fund.
|
| I think it comes down to how the system is structured and
| how many players there are. The more difficult it is for
| a small cult to capture control of the funding (or access
| to instrumentation or awarding of degrees or whatever)
| for a given area the less likely you are to end up with a
| monoculture.
|
| Assuming the majority of the funding continues to come
| from governments then you have a centralized point of
| leverage that can shape the system. So it should be
| possible to impose constraints that result in a system
| that actively prevents monocultures from developing.
| mitthrowaway2 wrote:
| Maybe their institution should evaluate whether their
| papers pass muster? It's the one conferring the degree.
| StableAlkyne wrote:
| I've noticed it's field dependent. Some fields don't really
| feel much need to publish in a real journal.
|
| Others (at least in chemistry) will accept it, but it raises
| concern if a paper is _only_ available as a preprint.
| pie_flavor wrote:
| You may have delivered value in peer review, but on the
| whole, peer review delivers negative value.
| https://www.experimental-history.com/p/the-rise-and-fall-
| of-...
|
| The arXiv vs journal debate seems a lot like 'should the work
| get done, or should the work get certified' that you see all
| over 'institutions', and if the certification does not
| actually catch frauds or errors, it's not making the
| foundations stronger, which is usually the only justification
| for the latter side.
| fc417fc802 wrote:
| Can't say I agree with that position.
|
| Responding largely to the linked article, you can't just
| ignore the massive increase in funding and associated
| output that occurred. Scaling almost any system up will be
| expected to result in creative new failure modes. It's easy
| to observe that a system isn't great and suppose that
| removing it would improve things but this very often isn't
| the case. Democracy is one such example.
|
| There's also the publishing ecosystem that developed around
| the increased funding. It isn't clear to me why any blame
| (if it's even valid, see preceding paragraph) should be
| laid at the feet of the practice of peer reviewing
| publications rather than such an obviously dysfunctional
| institution.
|
| Even if we accept the way in which publications have been
| undergoing peer review to somehow be the root of all evil
| (as opposed to the for profit publication of taxpayer
| funded work) - there's more than one way to go about it! A
| glaringly obvious problem, mentioned in the linked article
| yet not meaningfully addressed that I saw, is that peer
| reviewers aren't paid. If this was a compensated task
| presumably it would be performed much more rigorously.
| Building inspectors aren't volunteers and they seem to do a
| good enough job.
| observationist wrote:
| What's the value of academic publishing over the arxiv model
| of freely publishing, free access, and a global, vigorous
| discussion across a wide range of platforms, with experts,
| researchers, amateurs, institutions, and the peanut gallery
| all having the opportunity to participate?
|
| What possible value does a journal like Nature, for example,
| bring to the table by claiming a paper for themselves and
| charging people for it, given the alternative?
|
| I don't see any value there. Maintaining an exclusive clique
| by using artificial scarcity while coasting on the dregs of
| reputation remaining to a once prestigious institution is
| what a lot of these journals are doing.
|
| The world has changed. There's no need for that sort of pay
| to play gatekeeping, and in fact, the model does tremendous
| damage to academic and intellectual integrity. It allows
| people to get away with fraud and it makes the institutions
| motivated to hide and cover it up so as to not damage their
| own reputations by admitting anything slipped by them.
|
| If you contrast the damage done by journals, with regards to
| suppressed research, gatekept access, money taken from
| researchers and readers alike, against the value they might
| plausibly provide, the answer is clear.
|
| They're not needed anymore. The AI era, since 2017, has
| thoroughly demonstrated that journals are materially
| incapable of keeping up, that they're unable to meaningfully
| contribute to the field, and that their curation or other
| involvement has no effective practical value. The same is
| true for other fields, but everyone involved wants to keep
| their piece of the grift going as long as possible.
|
| We don't need them, anymore. I suspect we never did.
| jltsiren wrote:
| The value is the ability to do science as a career without
| being independently wealthy.
|
| Politicians, administrators, donors, and taxpayers don't
| want scientists deciding on their own how to spend the
| money. They want control over what gets funded. They want
| funding decisions with justifications they can understand.
| But they don't understand the science itself, so they need
| "objective" metrics to support the decisions. And because
| those metrics matter, people will inevitably game them.
| ph4rsikal wrote:
| My observation is that research, especially in AI, has left
| universities, which are now focusing their research to a lesser
| degree on STEM. It appears research is now done by companies
| like Meta, OpenAI, Anthropic, Tencent, Alibaba, among many
| others.
| bonoboTP wrote:
| Universities (outside a few) just have much weaker PR
| machines so you never hear what they do. Also their work is
| not user-facing products, so regular people, even tech power
| users, won't see them.
| 0x3f wrote:
| Not sure about that. How would a university test scaling
| hypotheses in AI, for example? The level of funding
| required is just not there, as far as I know.
| rsfern wrote:
| This issue of accessibility is widely acknowledged in the
| academic literature, but it doesn't mean that only large
| companies are doing good research.
|
| Personally I think this resource mismatch can help drive
| creative choice of research problems that don't require
| massive resources. To misquote Feynman, there's plenty of
| room at the bottom
| oscaracso wrote:
| Universities are also not suited to test which race car
| is the fastest, but that does not obviate the need for
| academic research in mechanical engineering.
| 0x3f wrote:
| Perhaps, but the fastest race car is not possibly
| ushering in the end of human involvement in science,
| so you might consider these to merit considerably different
| levels of funding.
| oscaracso wrote:
| >ushering in the end of human involvement in science
|
| Good riddance! But not relevant in the least.
| 0x3f wrote:
| Impact size is not relevant to funding allocation?
| oscaracso wrote:
| Your attempts to smuggle your conclusions into the
| conversation are becoming tiresome. Profiling a private
| company's computer program is not impactful research. The
| best-fit parameters AI people call scaling exponents are
| not properties like the proton lifetime or electron
| electric dipole moment. Rest assured, there remain
| scientists at universities producing important work on
| machine learning.
| bonoboTP wrote:
| There are a million other research things to do besides
| running huge pretraining runs and hyperparam grid search
| on giant clusters. To see what, you can start with
| checking out the best paper and similar awards at
| neurips, cvpr, iccv, iclr, icml etc.
| tzs wrote:
| I came across a good example of that a few years ago.
| Caltech had a page on their site listing Caltech startups.
|
| There were quite a few of them--by number of starts per
| year per person Caltech was actually generating startups at
| a higher rate than Stanford. But almost none of those
| Caltech startups were doing anything that would bring them
| to the public's attention, or even to the average HN
| reader's attention.
|
| For example one I remember was a company developing
| improved ion thrusters for spacecraft. Another was doing
| something to automate processing samples in medical labs.
|
| Also almost none of them were the "undergraduates drop out
| to form a company" startup we often hear about, where the
| founders aren't actually using much that they actually
| learned at the school, with the school functioning more as
| a place that brought the founders together.
|
| The Caltech startups were most often formed by professors
| and grad students, and sometimes undergraduates that were
| on their research team, and were formed to commercialize
| their research.
|
| My guess is that this is how it is at a lot of
| universities.
| Fomite wrote:
| Every university I've worked in has been dominated by
| this paradigm, has an office set up to support it, and a
| bunch of policies around what it means for your doctoral
| supervisor to also be your employer, etc.
| PaulHoule wrote:
| That's a specific field at a very specific time. In general
| there is a difference between research and development,
| you're going to expect the early work to be done in academia
| but the work to turn that into a product is done by
| commercial organizations.
|
| You get ahead as an academic computer scientist, for
| instance, by writing papers not by writing software. Now
| there really are brilliant software developers in academic CS
| but most researchers write something that kinda works and
| give a conference talk about it -- and that's OK because the
| work to make something you can give a talk about is probably
| 20% of the work it would take to make something you can put
| in front of customers.
|
| Because of that there are certain things academic researchers
| really can't do.
|
| As I see it my experience in getting a PhD and my experience
| in startups is essentially the same: "how do you make
| doing things nobody has ever done before routine?" Talk to
| people in either culture and you see the PhD students are
| thinking about either working in academia or a very short
| list of big prestigious companies and people at startups are
| sure the PhDs are too pedantic about everything.
|
| It took me a long time of looking at other people's side
| projects that are usually "I want to learn programming
| language X", "I want to rewrite something from _Software
| Tools_ in Rust" to realize just how foreign that kind of
| creative thinking is to people -- I've seen it for a long
| time that a side project is not worth doing unless: (1) I
| really need the product or (2) I can show people something
| they've never seen before or better yet both. These sound
| different, but if something doesn't satisfy (2) you can
| usually satisfy (1) off the shelf. It just amazes me how many
| type (2) things stay novel even after 20 years of waiting.
| stared wrote:
| > arXiv fulfills its function better the less power it has as
| an institution
|
| It is an interesting instance of the rule of least power,
| https://en.wikipedia.org/wiki/Rule_of_least_power.
| fidotron wrote:
| The irony of the TBL quotes there being the entire problem
| with the semantic web is the ontological tarpit that results
| due to the excessive expressive power of a general triple
| store.
| PaulHoule wrote:
| Well, I'd argue that many things in the semweb are not
| expressive enough and lead to the misunderstandings we
| have.
|
| People think, for instance, that RDFS and OWL are meant to
| SHACL people into bad and over-engineered ontologies. The
| problem is these standards _add_ facts and don't subtract
| facts. At risk of sounding like ChatGPT: it's a data
| transformation system not a validation system.
|
| That is, you're supposed to use RDFS to say something like
| ?s :myTermForLength ?o -> ?s :yourTermForLength ?o .
|
| The point of the namespace system is not to harass you, it
| is to be able to suck in data from unlimited sources and
| transform it. Trouble is it can't do the simple math
| required to do that for real, like
| ?s :lengthInFeet ?o -> ?s :lengthInInches 12*?o .
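|
| A minimal sketch of that "add facts, never subtract" behavior
| in plain Python, assuming triples are just tuples. The
| predicate names come from the two rules above; apply_rules is
| a hypothetical helper, not part of RDFS, OWL, or any RDF
| library, and the second rule is exactly the arithmetic that
| RDFS itself cannot express.

```python
# Triples are plain (subject, predicate, object) tuples; no RDF library is used.
Triple = tuple[str, str, float]


def apply_rules(facts: set[Triple]) -> set[Triple]:
    """Return the input facts plus every fact the two rules derive.

    Rules only ever add triples; nothing is removed or rejected.
    """
    derived = set(facts)
    for s, p, o in facts:
        if p == ":myTermForLength":
            # Renaming rule: the kind of mapping a subproperty-style rule covers.
            derived.add((s, ":yourTermForLength", o))
        elif p == ":lengthInFeet":
            # Arithmetic rule: needs ordinary code, here feet -> inches.
            derived.add((s, ":lengthInInches", 12 * o))
    return derived


facts = {(":plank", ":myTermForLength", 3), (":rope", ":lengthInFeet", 5)}
result = apply_rules(facts)
```

| Running it keeps both input triples and adds
| (":plank", ":yourTermForLength", 3) and
| (":rope", ":lengthInInches", 60). Nothing in this shape can
| reject a bad triple, which is the validation gap SHACL fills.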
|
| Because if you were trying OWL-style reasoning over
| arithmetic you would run into Kurt Gödel kinds of problems.
| Meanwhile you can't subtract facts that fail validation,
| you can't subtract facts that you just don't need in the
| next round of processing. It would have made sense to
| promote SHACL first instead of OWL because garbage in,
| garbage out: you are not going to reason successfully
| unless you have clean data... but what the hell do I know,
| I'm just an applications programmer who models business
| processes enough to automate them.
|
| Similarly the problem of ordered collections has never been
| dealt with properly in that world. PostgreSQL, N1QL and
| other post-relational and document DB languages can write
| queries involving ordered collections easily. I can write
| rather unobvious queries by hand to handle a lot of cases
| (wrote a paper about it) but I can't cover all the cases
| and I know back in the day I could write SPARQL queries much
| better than the average RDF postdoc or professor.
|
| As for underengineering, Dublin Core came out when I worked
| at a research library and it just doesn't come close in
| capability to MARC from 1970. Larry Masinter over at Adobe
| had to hack the standard to handle ordered collections
| because... the authors of a paper sure as hell care what
| order you write their names in. And it is all like that:
| RDF standards neglect basic requirements that they need to
| be useful and then all the complex/complicated stuff really
| stands out. If you could get the basics done maybe people
| would use them but they don't.
| light_hue_1 wrote:
| > Unfortunately, over the years, arXiv has become something
| like a "venue" in its own right, particularly in ML, with some
| decently cited papers never formally published and "preprints"
| being cited left and right. Consider the impression you get
| when seeing a reference to an arXiv preprint vs. a link to an
| author's institutional website.
|
| This just isn't true. arXiv is not a venue. There's no place
| that gives you credit for arXiv papers. No one cares if you
| cite an arXiv paper or some random website. The vast vast
| majority of papers that have any kind of attention or citations
| are published in another venue.
| contubernio wrote:
| A Fields medal was awarded based mainly on this paper never
| published elsewhere: https://arxiv.org/abs/math/0211159
| auggierose wrote:
| I think there is a misunderstanding here. Does arXiv count
| as a publication? Yes, pretty much anything that gives you
| a DOI does, for example Zenodo. Does it function as a
| reputable anything? No.
|
| The paper you link to counts as a publication, but its
| reputation stands on its own, it has nothing to do with
| arXiv as a venue. Ideally, that's how it is for all papers,
| but it isn't, just by publishing in certain venues your
| paper automatically gets a certain amount of reputation
| depending on the venue.
| fc417fc802 wrote:
| > Ideally, that's how it is for all papers, but it isn't
|
| We require a method of filtering such that a given
| researcher doesn't have to personally vet _in
| excruciating detail_ every paper he comes across because
| there simply isn't enough time in the day for that.
|
| Ideally such a system would individually for each paper
| provide a multi-dimensional score that was reputable. How
| can those be calculated in a manner such that they're
| reputable? Who knows; that exercise is left for the
| reader.
|
| In practice "well it got published in Nature" makes for a
| pretty decent spam filter followed by metrics such as how
| many times it's been cited since publication, checking
| that the people citing it are independent authors who
| actually built directly on top of the work, and checking
| how many of such citing authors are from a different
| field.
| mitthrowaway2 wrote:
| Can't we do better than that?
|
| PageRank was a decent solution for websites. Can't we
| treat citations as a graph, calculate per-author and per-
| paper trustworthiness scores, update when a paper gets
| retracted, and mix in a dash of HN-style community
| upvotes/downvotes and openly-viewable commentary and Q&A
| by a community of experts and nonexperts alike?
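|
| A rough sketch of just the citation-graph part, assuming
| plain PageRank by power iteration over a dict that maps each
| paper to the papers it cites. citation_rank, the paper names,
| and the "drop retracted papers before scoring" policy are
| illustrative assumptions, not an existing service; per-author
| scores, votes, and commentary are left out.

```python
def citation_rank(citations, retracted=frozenset(), damping=0.85, iterations=50):
    """Score papers by PageRank over a citation graph.

    citations maps each paper to the list of papers it cites.
    Retracted papers are removed before scoring (one possible policy).
    """
    papers = {p for p in citations if p not in retracted}
    for cited in citations.values():
        papers.update(c for c in cited if c not in retracted)
    # Outgoing citations, restricted to papers that survived the filter.
    out = {p: [c for c in citations.get(p, []) if c in papers] for p in papers}
    n = len(papers)
    rank = {p: 1.0 / n for p in papers}
    for _ in range(iterations):
        # Papers that cite nothing spread their score evenly over all papers.
        dangling = sum(rank[p] for p in papers if not out[p])
        new_rank = {p: (1.0 - damping) / n + damping * dangling / n for p in papers}
        for p in papers:
            for c in out[p]:
                new_rank[c] += damping * rank[p] / len(out[p])
        rank = new_rank
    return rank


# Hypothetical toy graph: A and B cite C, D cites A.
citations = {"A": ["C"], "B": ["C"], "C": [], "D": ["A"]}
scores = citation_rank(citations)
scores_after_retraction = citation_rank(citations, retracted={"C"})
```

| On this toy graph C scores highest; once C is retracted it
| disappears from the result and A takes the top spot. Scores
| always sum to 1, so they are only meaningful relative to
| each other.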
| auggierose wrote:
| You know that is what PageRank was originally for, right?
| mitthrowaway2 wrote:
| Sure. In that case I guess I'm just waiting for a couple
| of college kids in a garage to start a website that
| actually uses it for its intended purpose, so that we can
| finally deprecate PrestigiousPrivateJournalRank.
| fc417fc802 wrote:
| Of course we could! My tongue in cheek "exercise is left
| for the reader" comment was meant to convey that it's
| deceptively simple.
|
| Just one example off the top of my head. How do you
| handle negative citations? For example a reputable author
| citing a known incorrect paper to refute it. You need
| more metadata than we currently have available.
|
| tl;dr just draw the rest of the fucking owl.
|
| Upvotes, downvotes, and commentary? That's _extremely_
| complicated. Long term data persistence? Moderation? Real
| names? Verification of lab affiliations? Who sets the
| rules? How do you cope with jurisdictional boundaries and
| related censorship requirements? The scientific
| literature is fundamentally an open and above all
| international collaboration. Any sort of closed,
| centralized, or proprietary implementation is likely to
| be a nonstarter.
|
| Thus if your goal is a universal system then I'm fairly
| certain you need to solve the decentralized social
| networking problem as a more or less hard prerequisite to
| solving the decentralized scientific literature review
| problem. This is because you need to solve all the same
| problems but now with a much higher standard for data
| retention and replication.
|
| Very topically I assume you'd need a federated protocol.
| It would need to be formally standardized. It would need
| a good story for data replication and archival which
| pretty much rules out ActivityPub and ATProto as they
| currently stand so you're back to the drawing board.
|
| A nontrivial part of the above likely involves also
| solving the decentralized petname system problem that GNS
| attempts to address.
|
| I think a fully generalized scoring or ranking system is
| exceedingly unlikely to be a realistic undertaking.
| There's no problem with isolated private venues (ie
| journals) we just need to rethink how they work. Services
| such as arxiv provide a DOI so there's nothing stopping
| "journals" that are actually nothing more than
| lightweight review platforms that don't actually host any
| papers themselves from being built.
| auggierose wrote:
| > Upvotes, downvotes, and commentary? That's extremely
| complicated.
|
| No, it is not. Don't throw the baby out with the bath
| water. Zenodo is centralized, and that is fine. A system
| hosted by CERN would be universal enough for most
| purposes.
|
| The truth is, most papers cannot stand on their own, they
| need a reputable venue. While it is difficult to get into
| Nature, it is much more difficult to actually contribute
| something substantial to science. That's why we don't
| have a system like that.
| fc417fc802 wrote:
| I think you've misunderstood me. Did you read my final
| paragraph? I was agreeing with what you wrote there -
| that simply rethinking how centralized journals operate
| could accomplish the majority of the goal while
| sidestepping most of the complexity.
|
| That said, I disagree that papers require a centralized
| venue in any fundamental sense. They _currently_ need
| such a venue because we don't have a better process for
| vetting and filtering them at scale. The issue is that
| decentralizing such a process in an acceptable manner is
| a monstrously complicated prospect.
| auggierose wrote:
| > We require a method of filtering such that a given
| researcher doesn't have to personally vet in excruciating
| detail every paper he comes across because there simply
| isn't enough time in the day for that.
|
| We do require such a method. Isn't that what AI is for?
| Strictly working as a filter. You still need to
| personally vet in excruciating detail every paper you
| rely on for your work.
| fc417fc802 wrote:
| Maybe. I think that's still experimental and far too
| resource intensive to do on an individual basis. However
| an intensive LLM review performed by a centralized
| service once per paper as a sort of independent
| literature watchdog would likely be of value. I haven't
| heard of such a thing yet though.
| light_hue_1 wrote:
| It was not awarded because that paper is on arxiv. That
| paper could have been printed and sent out by mail. Or
| posted on 4chan, etc. It just so happens that it was on
| arxiv, which made no difference to anything.
| queuebert wrote:
| > Unfortunately, over the years, arXiv has become something
| like a "venue" in its own right, ...
|
| In my experience as a publishing scientist, this is partly
| because publishing with "reputable" journals is an increasingly
| onerous process, with exorbitant fees, enshittified UIs, and
| useless reviews. The alternative is to upload to arXiv and move
| on with your life.
| groundzeros2015 wrote:
| That's true. But that's separate from the use in ML and
| blockchain circles as a form of marketing -- using academic
| appearances.
| jjk166 wrote:
| That sounds more like an issue of certain fields having
| crappy standards because the people in those fields benefit
| from crappy standards than an issue with the site they
| happen to host papers on.
| groundzeros2015 wrote:
| I don't buy "some fields are just more honorable".
| Everyone uses publishing for personal gain.
|
| But yes it's a people problem, not an arxiv problem.
| StableAlkyne wrote:
| Every field and every publisher has this issue though.
|
| I've read papers in the chemical literature that were
| clearly thinly veiled case studies for whatever instrument
| or software the authors were selling. Hell, I've read
| papers that had interesting results, only to dig into the
| math and find something fundamentally wrong. The worst was
| an incorrect CFD equation that I traced through a telephone
| game of 4 papers only to find something to the effect of
| "We speculate adding $term may improve accuracy, but we
| have not extensively tested this"
|
| Just because something passed peer review does not make it
| a good paper. It just means somebody* looked at it and
| didn't find any obvious problems.
|
| If you are engaged in research, or in a position where
| you're using the scientific literature, it is vital that
| you read every paper with a critical lens. Contrary to
| popular belief, the literature isn't a stone tablet sent
| from God. It's messy and filled with contradictory ideas.
|
| *Usually it's actually one of their grad students
| groundzeros2015 wrote:
| I completely agree. Sophisticated marketing campaigns
| include everything from academic literature to bikini-clad
| women.
| Aurornis wrote:
| > and with just enough moderation to not devolve into spam and
| chaos
|
| arXiv has become a target for grifters in other domains like
| health and supplements. I've seen several small scale health
| influencers who ChatGPT some "papers" and then upload them to
| arXiv, then cite arXiv as proof of their "published research".
| It's not fooling anyone who knows how research works, but it's
| very convincing to an average person who thinks that
| they're doing the right thing when they follow sources that
| have done academic research.
|
| I've been surprised at how bad and obviously grifty some of the
| documents I've seen on arXiv have become lately. Is there any
| moderation, or is it a free for all as long as you can get an
| invite?
| aimarketintel wrote:
| This is great news for anyone building tools on top of arXiv
| data. The API (export.arxiv.org/api/) is one of the best free
| academic data sources -- structured Atom feed with full
| abstracts, authors, categories, and publication dates.
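|
| A minimal sketch of reading that feed with only the Python
| standard library, assuming the documented query endpoint
| under export.arxiv.org/api/ and its search_query, start, and
| max_results parameters. build_query_url and parse_feed are
| hypothetical helper names, and SAMPLE_FEED is a made-up
| placeholder entry (not a real paper) so the parser can be
| tried without touching the network.

```python
import urllib.parse
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"
API_URL = "http://export.arxiv.org/api/query"


def build_query_url(search_query, start=0, max_results=10):
    """Build a query URL; the result can be fetched with urllib.request."""
    params = {"search_query": search_query, "start": start, "max_results": max_results}
    return API_URL + "?" + urllib.parse.urlencode(params)


def parse_feed(atom_xml):
    """Turn an Atom feed (str or bytes) into a list of plain dicts."""
    root = ET.fromstring(atom_xml)
    papers = []
    for entry in root.findall(ATOM + "entry"):
        papers.append({
            "id": entry.findtext(ATOM + "id", default="").strip(),
            # Titles may be wrapped across lines, so collapse whitespace.
            "title": " ".join(entry.findtext(ATOM + "title", default="").split()),
            "published": entry.findtext(ATOM + "published", default="").strip(),
            "authors": [a.findtext(ATOM + "name", default="").strip()
                        for a in entry.findall(ATOM + "author")],
            "categories": [c.get("term") for c in entry.findall(ATOM + "category")],
        })
    return papers


# Placeholder feed for illustration only; every value here is invented.
SAMPLE_FEED = (
    '<feed xmlns="http://www.w3.org/2005/Atom">'
    "<entry>"
    "<id>http://arxiv.org/abs/0000.00000v1</id>"
    "<published>2000-01-01T00:00:00Z</published>"
    "<title>A placeholder\n  title</title>"
    "<summary>Placeholder abstract.</summary>"
    "<author><name>First Author</name></author>"
    "<author><name>Second Author</name></author>"
    '<category term="cs.DL"/>'
    "</entry>"
    "</feed>"
)

url = build_query_url("all:electron", max_results=5)
papers = parse_feed(SAMPLE_FEED)
```

| For live data, the bytes returned by
| urllib.request.urlopen(url).read() can be passed straight to
| parse_feed. Anyone polling in bulk should check arXiv's
| published API usage terms first.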
|
| I've been using it as one of 9 data sources in a market
| research tool -- arXiv papers are a strong leading indicator of
| where an industry is heading. Academic research today often
| becomes commercial products in 2-3 years.
| PaulHoule wrote:
| Review papers are interesting.
|
| Bibliometrics reveal that they are highly cited. Internal data
| we had at arXiv 20 years ago show they are highly read. Reading
| review papers is a big part of the way you go from a civilian
| to an expert with a PhD.
|
| On the other hand, they fall through the cracks of the normal
| methods of academic evaluation.
|
| They create a lot of value for people but they are not likely
| to advance your career that much as an academic, certainly not
| in proportion to the value they create, or at least the value
| they used to create.
|
| One of the most fun things I did on the way to a PhD was
| writing a literature review on giant magnetoresistance for the
| experimentalist on my thesis committee. I went from knowing
| hardly anything about the topic to writing a summary that
| taught him a lot he didn't know. Given any random topic in any
| field you could task me with writing a review paper and I could
| go out and do a literature search and write up a summary. An
| expert would probably get some details right that I'd get
| wrong, might have some insights I'd miss, but it's actually a
| great job for a beginner, it will teach you the field much more
| effectively than reading a review paper!
|
| How you regulate review papers is pretty tricky. If it is
| original research the criterion of "is it original research" is
| an important limit. There might already be 25 review papers on
| a topic, but maybe I think they all suck (they might) and I can
| write the 26th and explain it to people the way I wish it was
| explained to me.
|
| Now you might say in the arXiv age there was not a limit on
| pages, but LLMs really do problematize things because they are
| pretty good at summarization. Send one off on the mission to
| write a review paper and in some ways they will do better than
| I do, in other ways will do worse. Plenty of people have no
| taste or sense of quality and they are going to miss the latter
| -- hypothetically people could do better as a centaur but I
| think usually they don't because of that.
|
| One could make the case that LLMs make review papers obsolete
| since you can always ask one to write a review for you or just
| have conversations about the literature with them. I know I
| could have spent a very long time studying the literature on
| Heart Rate Variability and eventually made up my mind about
| which of the 20 or so metrics I want to build into my
| application and I did look at some review papers and can
| highlight sentences that support my decisions but I made those
| decisions based on a few weekends of experiments and talking to
| LLMs. The funny thing is that if you went to a conference and
| met the guy who wrote the review paper and gave them the hard
| question of "I can only display one on my consumer-facing HRV
| app, which one do I show?" they would give you that clear
| answer that isn't in the review paper and maybe the odds are
| 70-80% that it will be my answer.
| jballanc wrote:
| I exited academia for industry 15 years ago, and since then I
| haven't had nearly as much time to read review papers as I
| would like. For that reason, my view may be a bit outdated,
| but one thing I remember finding incredibly useful about
| review papers is that they provided a venue for speculation.
|
| In the typical "experimental report" sort of paper, the focus
| is typically narrowed to a knife's edge around the hypothesis,
| the methods, the results, and analysis. Yes, there is the
| "Introduction" and a "Discussion", but increasingly I saw
| "Introductions" become a venue to do citation bartering (I'll
| cite your paper in the intro to my next paper if you cite
| that paper in the intro to your next paper) and "Discussion"
| turn into a place to float your next grant proposal before
| formal scoring.
|
| Review papers, on the other hand, were more open to
| speculation. I remember reading a number that were framed as
| "here's what has been reported, here's what that likely
| means...and here's where I think the field could push forward
| in meaningful ways". Since the veracity of a review is
| generally judged on how well it covers and summarizes what's
| already been reported, and since no one is getting their next
| grant from a review, there's more space for the author to
| bring in their own thoughts and opinions.
|
| I agree that LLMs have largely removed the need for review
| papers as a reference for the current state of a field...but
| I'll miss the forward-looking speculation.
|
| Science is staring down the barrel of a looming crisis that
| looks like an echo chamber of epic proportions, and the only
| way out is to figure out how to motivate reporting negative
| results and sharing speculative outsider thinking.
| PaulHoule wrote:
| My feelings about that outsider thing are pretty mixed.
|
| On one hand I'm the person who implemented the endorsement
| system for arXiv. I also got a PhD in physics, did a postdoc
| in physics, then left the field. I can't say that I was
| mistreated, but I saw one of the stars of the field today
| crying every night when he was a postdoc because he was so
| dedicated to his work and the job market was so brutal --
| so I can say it really hurts when I see something that I
| think belittles that.
|
| On the other hand I am very much an interested outsider
| when it comes to biosignals, space ISRU, climate change,
| synthetic biology and all sorts of things. With my startup
| and hackathon experience it is routine for me to go look at
| a lot of literature in a new field and cook it down and
| realize things are a lot simpler than they look and build a
| demo that knocks the socks off the postdocs because...
| that's what I do.
|
| But Riemann Hypothesis, Collatz, dropping names of anyone
| who wrote a popular book, I don't do that. What drives me
| nuts about crackpots is that they are all interested in the
| same things whereas real scientists are interested in
| something different. [1] It was a big part of our thinking
| about arXiv -- crackpot submissions were a tiny fraction of
| submission to arXiv but they would have been half the
| submissions to certain fields like quantum gravity.
|
| I've sat around campfires where hippies were passing a
| spliff around and talking about that kind of stuff and was
| really amused recently when we found out that Epstein did
| the thing with professors who would have known better -- I
| mean, I will use my seduction toolbox to get people like
| that to say more than they should but not to have the same
| conversation I could have at a music festival.
|
| [1] e.g. I think Tolstoy got it backwards!
| aleph_minus_one wrote:
| > crackpot submissions were a tiny fraction of submission
| to arXiv but they would have been half the submissions to
| certain fields like quantum gravity
|
| Just some very outsider thought:
|
| Could it be that this problem is rather self-inflicted by
| researchers and their marketing?
|
| Physicists market all the time that resolving these
| questions about quantum gravity will give the answers to
| the deepest questions that plagued philosophers over
| millennia. Well, such marketing attracts crackpots who
| do believe that they have something to tell about such
| topics.
|
| Relatedly, to improve their chances of getting research
| funding, a lot of researchers do outreach to the
| general public to show the importance of the questions
| that they work on. Of course this means that people from
| the general public who now get interested in such
| questions will make their own attempt to make a
| contribution because - well, this researcher just told me
| how important it is to think about such questions. Of
| course such a person from the general public typically
| does not have the deep scientific knowledge such that
| their contribution meets the high scientific standards.
| abdullahkhalids wrote:
| > Unfortunately, over the years, arXiv has become something
| like a "venue" in its own right, particularly in ML, with some
| decently cited papers never formally published and "preprints"
| being cited left and right.
|
| This has been a common practice in physics, especially the more
| theoretical branches, since the inception of arXiv. Senior
| researchers write a paper draft, and then send copies to some
| of their peers, get and incorporate feedback, and just submit
| to arxiv.
| godelski wrote:
| And this is really how it should be. Honestly the only thing
| I want arxiv to do is become more like open review. Allow
| comments by peers and some better linking to data and project
| pages.
|
| It works for physics because physicists are very rigorous. So
| papers don't change very much. It also works for ML because
| everyone is moving so fast that it's closer to doing open
| research. Sloppier, but as long as the readers are other
| experts then it's generally fine.
|
| I think research should really just be open. It helps
| everyone. The AI slop and mass publishing is exploiting our
| laziness; evaluating people on quantity rather than quality.
| I'm not sure why people are so resistant to making this
| change. Yes, it's harder, but it has a lot of benefits. And
| at the end of the day it doesn't matter if a paper is
| generated if it's actually a quality paper (not in just how
| it reads, but the actual research). Slop is slop and we
| shouldn't want slop regardless. But if we evaluate on quality
| and everything is open it becomes much easier to figure out
| who is producing slop, collusion rings, plagiarist rings, and
| all that. A little extra work for a lot of benefits. But we
| seem to be willing to put in a lot of work to avoid doing
| more work
| abdullahkhalids wrote:
| I don't agree actually that is how it should or can work
| for everyone. Senior researchers produce good quality
| research, and they have a network of high quality peers
| built over decades. Both those are necessary for them to
| reach out and ask for feedback, and get genuine and high
| quality feedback.
|
| Junior researchers don't have these typically. They also
| benefit more from anonymous feedback, which enables the
| reviewers to bluntly identify wrong or close to wrong
| results. So I think open journals should continue to exist.
| They fill an essential role in the scientific ecosystem.
| godelski wrote:
| Mostly I'm fine with journals and conferences but I think
| it's the prestige that has fucked everything over.
|
| I want reviews of my papers! But I want reviews by people
| who care. I don't want reviews by people who don't want
| to review. I don't want reviews by people who think it's
| their job to reject or find flaws in the work. I want
| reviews by people who care. I want reviews by people who
| want to make my work better. I want reviews by people who
| understand all works are flawed and we can't tackle every
| one in every paper (the problem isn't solved, so there's
| always more!).
|
| So low bars. Forget the prestige, citation count,
| novelty, and all the bullshit and just focus on the
| actual work and that the act of publishing is about
| communicating. Publishing is the main difference between
| private and public labs. Private labs do fine research,
| without all the formal review. It's just that nobody
| learns about it. They don't give back to the community.
|
| So my ideal system still has reviewers, journals, and
| conferences but I think we'd get along just fine without
| them. I believe that if we can't recognize that then we
| can't use these other tools to make things better.
|
| They aren't fundamental tools needed to make the process
| work, they're tools that _can_ make the process work
| better. But I'm not convinced they're doing a good job
| of that right now.
| lokar wrote:
| You could imagine separating the "publishing" part, which
| really should just be open with minimal anti-spam etc, from
| the "this was reviewed by a trusted group of people so you
| should give it more consideration" part. You could do the
| second without it being attached to the publishing.
| godelski wrote:
| I think your phrasing was good. A lot of people conflate
| a work being published with it being peer reviewed, and assume
| that "peer reviewed" means "correct".
|
| I think when you think about publishing as what it
| actually is, researchers communicating to researchers,
| what I said makes much more sense. I do think formal
| review does help reduce slop but I think anyone who has
| published anything is also very aware of how noisy the
| system is and how good works get rejected or delayed
| because they aren't "novel" enough.
|
| Honestly, my ideal system is journals with low bars. We
| forget this prestige bullshit and silliness of novelty
| (often it's novel to niche experts but not to others) and
| basically check if it looks like due diligence was done,
| there's nothing obviously wrong, no obvious
| plagiarism, and then maybe a little back and forth to
| help communicate. But I think we've gotten too lost in
| this idea of needing to publish fast and that it has to be
| important. Important to who? Tons of stuff is only
| considered important later, we've got a long track record
| of not being so great at that. But we have a long track
| record of at least some people working on what we later
| find out is important.
| nickpsecurity wrote:
| There's a lot of stuff with basic errors in peer reviewed
| journals. Things also can get rejected for anything from
| formatting to politics.
|
| I like Arxiv better. I get the paper, know it's probably
| not reviewed (like in many journals), and review it if I
| want to. I used to use CiteSeerX, too, to get tons of
| CompSci papers. Even better, OpenReview might have some
| good observations.
| fsckboy wrote:
| > _We've seen the kind of appeasement prose from their
| statement and FAQ [1] countless times before_
|
| what are you referring to, who is being appeased who shouldn't
| be? what are you worried about happening?
| asimpleusecase wrote:
| I wonder if there are plans to licence the content for AI
| training
| KellyCriterion wrote:
| I'd guess OAI & co have already copied without asking?
| mkl wrote:
| No need to ask - the whole point is open access.
| https://info.arxiv.org/help/bulk_data.html
| mkl wrote:
| It's been available all along:
| https://info.arxiv.org/help/bulk_data.html
| shevy-java wrote:
| "Recently arXiv's growth has accelerated. Since 2022, it has
| expanded its staff to 27, in large part to deal with a 50%
| increase in submitted manuscripts."
|
| I am wary of that. IMO the business model is damaged therein. You
| can say in 2022 we had 27; bankrupt in 2030.
| Aerolfos wrote:
| And they hired a LinkedIn business idiot to run the new
| organization - so the aim is for an infinite growth tech startup
| in terms of governance, despite the technical legal status of
| non-profit. It shows in the language they use in the
| announcement, too ("improved financial viability in the long
| run")
|
| OpenAI shows exactly how well that works and what that kind of
| governance does to a company and to its support of science and
| the commons.
|
| TL;DR, it's fucked.
| swiftcoder wrote:
| > raised concerns about the proposed $300,000 salary for arXiv's
| new CEO, saying it seemed high
|
| Is a mid-to-high engineering salary outlandish for a CEO of what
| is likely to be a fairly major non-profit? Even non-profits have
| to be somewhat competitive when it comes to salary, and the ideal
| candidate is likely someone who would be balancing this against a
| tenured position at a major university
| mort96 wrote:
| Salaries in the US are so bonkers. Everywhere else outside of
| the US, $300,000 is an outlandishly high salary. To call it "mid
| to high" is insane.
| HappyPanacea wrote:
| Yes the obvious play is to move human labor to cheaper
| countries like France (including CEO of course).
| renewiltord wrote:
| The reason the French can't build these things is the same
| reason they shouldn't be allowed to be in charge. It's a
| preprint PDF host. Just make your own if you can run this
| one.
| magnio wrote:
| They do have their own: https://hal.science/
|
| It is actually quite common to come across HAL in
| subfields of mathematics in my experience.
| bjourne wrote:
| HAL is decidedly second-tier. Given the option, everyone
| would pick arXiv over HAL. Hence, HAL hosts lots of stuff
| that didn't (even) make it to arXiv => lots of subpar
| dredge.
| Miraltar wrote:
| > HAL is decidedly second-tier. Given the option,
| everyone would pick arXiv over HAL.
|
| Can you elaborate on that?
| linhns wrote:
| I agree that dredge is a huge problem with HAL, but it's
| getting better. Meanwhile, arXiv is still stuck with an
| unfriendly UI.
| renewiltord wrote:
| That's great. People will use whichever one is better.
| swiftcoder wrote:
| Turns out that "better" for many people means "better
| moderated", since static hosting is hard to
| differentiate. And at present Arxiv is winning that one
| (at the expense of considerably higher running costs due
| to said moderation)
| 0x3f wrote:
| The net salary in France might be low but the overall cost
| of hiring is quite high. Besides, why go to the middle when
| you can just find even cheaper places, if that's your prime
| metric?
| swiftcoder wrote:
| Even in the states, it's more a distortion caused by the big
| tech centres. A software engineer in Ohio doesn't command
| that kind of salary, but in San Francisco or Seattle that'll
| buy you a moderately-senior engineer.
|
| And while academic salaries are generally not great, tenured
| professors at big universities tend to make a fair bit (plus
| a lot more vacation time and perks than is normal in the US)
| philipallstar wrote:
| It's also caused by progressive tax rates. People take
| harder jobs based on net wage, not gross wage, so gross
| wage has to compensate.
| justin66 wrote:
| > A software engineer in Ohio doesn't command that kind of
| salary, but in San Francisco or Seattle that'll buy you a
| moderately-senior engineer.
|
| On the other hand, a CEO of a well-known nonprofit might
| command that kind of salary in Ohio. People often
| underestimate how much the leaders of nonprofits pay
| themselves.
| supern0va wrote:
| I'm not convinced that this is entirely some
| sort of widespread bad behavior. Many non-profit boards
| conduct research on salaries and essentially size their
| organization and pay something akin to a market rate for
| the given size and scope.
|
| However, even a small percentage of bad actors finding a
| way to inflate their salaries will, as a side effect,
| inflate salaries across the board because it influences
| the process that sets the salaries for the honest
| organizations.
|
| It's a fun problem.
| justin66 wrote:
| I suspect abuse is more prevalent at the low end, among
| nonprofits that don't do much.
|
| I stand by the point of my original post: _People often
| underestimate how much the leaders of nonprofits pay
| themselves._ These are figures you can look up and quiz
| your friends to test the hypothesis, if they're into that
| sort of thing. For a good time include some nonprofit
| hospitals.
| supern0va wrote:
| Outside of manipulating the board, they do not pay
| themselves, though. The board decides their comp package.
| justin66 wrote:
| That's fair, but the boards of nonprofits are as
| corruptible (I'm reluctant to use that word since we're
| talking about fairly standard practices, not outright
| crime, but whatever) as those in the corporate world. But
| I wouldn't want to keep talking about this situation as
| if it's all theoretical. In contrast with a lot of the
| corporate world, with nonprofits you can just go and look
| at what their officers are paid (it's public record) and
| decide for yourself what you feel about the figures.
| dev_l1x_be wrote:
| So is the living cost. Insurance, housing, etc. A better
| comparison is PPP.
| carlosjobim wrote:
| Living costs are similarly high in many places that have
| nowhere near the salaries of the US.
|
| It's still the land of opportunities. It's easier to find
| ways to reduce your living costs than ways to increase your
| salary.
| 0x3f wrote:
| Not everywhere. Switzerland exists. Also cost of living is a
| thing so if anything US/CH just ramp up to match that. The
| rest of Europe has high CoL but terrible salaries. Asia has
| bad salaries but low CoL (on average).
| mort96 wrote:
| According to swissdevjobs.ch[1], the top 10% salary for a
| senior software developer in Switzerland is 135,000 Swiss
| francs; that's roughly $170,000 per year.
|
| So if this is correct, then even in Switzerland, it seems
| like $300,000 per year would be an obscenely high salary
| for a senior developer.
|
| [1]: https://swissdevjobs.ch/salaries/all/all/Senior
| 0x3f wrote:
| Well first of all it's a CEO position, not an SWE :)
|
| Even if we scope it to SWE, I don't think that's far off
| the US percentiles.
|
| In London I imagine the top 10% SWE is not even 100k GBP.
| In Germany even worse.
| mort96 wrote:
| I responded to the idea that $300,000/year is a "mid-to-
| high engineering salary". CEO salaries are absurdly high
| everywhere.
| 0x3f wrote:
| Oh right, well it depends on CoL doesn't it? You can
| reframe European salaries as 'obscene' by world standards
| too. Both the US and Europe have totally broken and
| unaffordable housing markets, for example, but at least
| the Bay Area compensates with salary. I would say that
| relative to costs it's more that other salaries are
| obscenely low, if anything. People in Europe should be
| rioting, but unfortunately only the home owners are
| politically active.
| mort96 wrote:
| Do cities like San Francisco not have janitors?
| Waiters? Food delivery drivers? Or do those jobs command
| a six-figure salary too? If they can live comfortably in
| the city on a five-figure salary, maybe the argument that
| "cost of living is so high in SF that you can't live
| without a $300,000/year salary" is just a little bit
| overblown?
|
| I can not imagine what one could possibly need $300,000
| per year for unless an apartment costs like $200,000 per
| year.
| 0x3f wrote:
| You get by on a low salary by living with multiple people
| in the same apartment. Or you live far away and commute.
| Or both.
|
| Not really a tenable long-term situation for a senior
| employee with plans to start a family. Family homes of
| decent size and area are literally millions of dollars.
| mort96 wrote:
| I guess I don't understand why programmers somehow
| deserve a better life than other people. Janitors deserve
| to start families too, don't they?
| 0x3f wrote:
| It's not about deserving, programmers just have enough
| market power to be able to choose to go elsewhere.
| Janitors and other more fungible employees do not.
|
| Besides, I did already say that everyone else was
| underpaid relative to costs. But that's not unique to the
| Bay Area. Cost of housing relative to income is terrible
| in almost all of the major European cities too.
|
| Once cities become wealthy enough to develop a home
| owning class, they seem to cease being able to provision
| adequate housing supply in general.
| throw-the-towel wrote:
| Usually this kind of argument leads to _punishing the
| programmers_, not lifting up the janitors.
| mort96 wrote:
| That's kind of two sides of the same coin, isn't it? The
| cost of living is so high in part _because_ so many have
| ridiculously high salaries, isn't it?
| swiftcoder wrote:
| > The cost of living is so high in part because so many
| have ridiculously high salaries
|
| Bigger problem in the SF area is that a bunch of folks
| who owned property before the gold rush have ended up
| real-estate-rich, and formed a voting bloc that actively
| prevents the construction of new housing (on the basis
| that it might devalue their accidental real estate
| investment)
| prepend wrote:
| It's about how the market values those skillsets, not
| about what people "deserve."
|
| No one is sitting around and setting salaries based on
| the intrinsic human dignity of the people working jobs.
| throw-the-towel wrote:
| > I can not imagine what one could possibly need $300,000
| per year for unless an apartment costs like $200,000 per
| year.
|
| Being able to afford unpredictable expenses and not have
| it bankrupt you. In the US, that would include
| healthcare. Everywhere in the world, that would be useful
| if you were laid off.
| mort96 wrote:
| To build an emergency fund, you just need an income
| that's a bit higher than your expenses. If you earn
| $60,000 after tax per year, and spend $50,000 per year,
| you have a decent $10,000 emergency fund after one year
| and a massive $100,000 emergency fund after a decade. You
| don't need $300,000 per year to save.
| swiftcoder wrote:
| > Do cities like San Francisco not have janitors?
| Waiters?
|
| When I used to visit the Meta campus in Menlo Park, the
| QA folk I worked with were commuting 2 hours each way
| just to be able to afford housing. I've no idea how far
| away the janitorial staff must have lived to do the same
| jalla wrote:
| I worked at Redwood Shores. On a walk across the 101, I
| discovered where the cleaning staff and food workers
| lived. In cars, under the bridge or parked in a quiet
| corner of the street next to industrial or commercial
| property.
| swiftcoder wrote:
| > Oh right, well it depends on CoL doesn't it?
|
| To some extent, maybe, but often not. For example, London
| has similar cost of living to the Bay Area, and when I
| was at Meta experienced folks like Dan Abramov over in
| London were making about the same as fresh college hires
| in Menlo Park...
| 0x3f wrote:
| Yeah I was talking more about the definition of obscene.
| Like is it obscene to make 300k if housing is so
| expensive? I say no, and that London salaries are just
| bad. Although it would be preferable to fix the housing
| market.
|
| To be fair though, Dan specifically is kind of notorious
| for messing up his comp negotiation. Did you not see the
| Twitter pile on at the time?
| swiftcoder wrote:
| > Dan specifically is kind of notorious for messing up
| his comp negotiation
|
| Indeed, but having seen the infamous spreadsheet, he
| didn't have all _that_ much headroom (unless he agreed to
| move to the US)
| groundzeros2015 wrote:
| Note that you are seeing an explicit tradeoff of different
| economic systems.
| ZpJuUuNaQ5 wrote:
| >Salaries in the US are so bonkers.
|
| Sure, but the cost of living there is significantly higher as
| well. Anyway, I can hardly even comprehend these kinds of
| sums, though I am a bit of an outlier, as I earn around
| $27,700 as an SWE in Europe, which is low even by the
| standards of companies in my own country.
| nozzlegear wrote:
| > _Sure, but the cost of living there is significantly
| higher as well._
|
| The US is huge though, and the cost of living is
| astronomically lower outside of those big tech hub cities.
| I live in a tiny town in the midwest with a big house and a
| big yard that we bought for $89k USD in 2016[+]. I'm able
| to support myself and my wife comfortably on just my (self-
| employed) SWE salary.
|
| [+] Real estate inflation index for our area says the house
| would have cost us around $130-$150k USD in 2026.
| segmondy wrote:
| Everyone outside the US doesn't deal with USD. Your comment
| is bonkers. Read up on purchasing power. All locations are
| not equal.
| jltsiren wrote:
| The traditional definition of high income starts at 2x the
| median. Looking at the US as a whole, anything above $125k
| should be considered high income. But it doesn't feel like
| that, because median wages are unusually low in the US
| relative to mean wages. Upper middle class salaries, on the
| other hand, have grown very high, and they have distorted
| people's perceptions. Even now, we are debating whether
| almost 5x the median should be considered high income.
| MattDamonSpace wrote:
| The US has an enormous per capita GDP for that large a
| country
| ryukoposting wrote:
| Silicon Valley is the _only_ place in the United States where
| $300K is even close to the "middle" of anything.
|
| I just moved to SV a few months ago from the Midwest (and not
| a particularly cheap part of it). Telling my coworkers who
| aren't from the US what a house costs in Wisconsin, you'd
| have thought _I_ was the one who moved from a foreign
| country.
| swiftcoder wrote:
| > Silicon Valley is the only place in the United States
| where $300K is even close to the "middle" of anything.
|
| It does heavily cluster around SV, for sure, but
| Seattle/New York/Boston/Arlington will all get you there,
| and Chicago/Austin/etc aren't all that far behind at this
| point
| Supermancho wrote:
| As a datapoint, I get paid just under 250k/yr and I'm an
| above average developer in his very late career, at a
| midwest company. 300k avg for SV is about right.
|
| The local college and medical administrators are the ones
| that own the mansions in my city. I have a family, house
| and mortgage plus my large medical expenses (cardiac) I can
| handle...until I can't.
| snovymgodym wrote:
| It's frankly not that crazy of a salary for an important
| executive position.
|
| The city manager of a small city in Texas gets paid around
| that much and that's taxpayer money.
|
| Now what collegiate football coaches are paid, that's pretty
| crazy.
| mort96 wrote:
| I didn't say it's a crazy salary for an important executive
| position, I said it's wild to call it a "mid-high
| engineering salary"
| Drupon wrote:
| Europoors should keep quiet when talking about US tech
| culture.
| HappyPanacea wrote:
| arXiv's CEO doesn't need to be a tenured professor equivalent;
| it is a preprint repository ffs.
| 0x3f wrote:
| It's a bit more complex than an S3 bucket though because the
| value comes from the reputation network, which can't really
| be replicated easily.
|
| Though, saying that, I suppose all the reputation data is
| kind of public. Apart from emails/accounts.
| groundzeros2015 wrote:
| > It's a bit more complex than an S3 bucket
|
| It's even less. I would bet if it's not now, for the vast
| majority of its life it was a machine at someone's desk at
| Cornell.
| PaulHoule wrote:
| When I was involved it was an x86 machine in a rack in
| Rhodes Hall.
|
| I had a copy of the whole thing under my desk though in
| Olin Library on a Pentium 3 machine from IBM that was
| built like a piece of military hardware. In April the sun
| would shine in the windows of my office, the HVAC system
| was unable to cool my office, and temperatures would soar
| above 100F and I'd be sitting there in a tank top and
| drinking a lot of water and sports drinks and visitors
| would ask me how I could stand it.
| groundzeros2015 wrote:
| Thanks for confirming. We need to stop marketing for AWS
| by talking about the ability to use the internet in AWS
| branded product terms.
| 0x3f wrote:
| The S3 API/UX/cost model is so seductively simple for
| static hosting though. I kind of think they deserve their
| ubiquity. Not on 90% of their products though.
| PaulHoule wrote:
| It's great for some applications, like to serve up the QR
| codes for this system
|
| https://mastodon.social/@UP8/116086491667959840
|
| I could even make those cards tradeable like NFTs, use
| DynamoDB as the ledger, and not worry about the cost at
| all.
|
| On the other hand if you are talking about something
| bandwidth heavy forget about AWS. Video hosting with
| Cloudfront doesn't seem that difficult, even developing a
| YouTube clone where anybody could upload a video and it
| gets hosted seems like a moderate sized project. But with
| the bandwidth meter always running that kind of system
| could put you into the poorhouse pretty quickly if it
| caught on. Much of why YouTube doesn't have competition
| is exactly that: Google's costs are very low _and_ they
| have an established system of monetization.
|
| I am keeping my photo albums on Behance rather than self-
| hosting because I lost enough money on a big photo site
| in AWS that it drove my wife furious and it took me a few
| years to pay off the debt.
| groundzeros2015 wrote:
| > I lost enough money on a big photo site in AWS
|
| I'm sorry what. This is supposed to persuade me?
| Hendrikto wrote:
| For anybody outside the SV, and especially outside the US, this
| seems high, yes.
|
| arXiv does not need to and should not optimize for "shareholder
| value", which is at least nominally the justification for
| outlandish CEO pay packages.
| kingstnap wrote:
| arXiv doesn't need much. All they do is host static pdfs
| uploaded by someone else with free CDN services from Fastly
| [0]. I'm sure they could get academics to volunteer
| moderation services as well.
|
| In reality you could host the entire thing for well under
| $50k/year in hardware and storage if someone else is
| providing a free CDN. Their costs could be incredibly low.
|
| But just like Wikipedia, I see them very likely and very quickly
| becoming a money hole that pretends to barely be kept afloat
| by donations, when in reality what's actually happening is
| that a ridiculous number of rent seekers managed to ride the
| coattails of it being the de facto preprint server for AI
| papers to land themselves cushy jobs at a place that spends
| 90+% of its money on flights and hotels and wages for its
| staff.
|
| I'm already expecting their financial reports to look
| ridiculously headcount heavy with Personnel Expenses,
| Meetings and Travel blowing up. As well as the classic
| Wikipedia-style "we spend a ton of money on unclear costs" [1].
|
| What's already sad is they stopped having a real broken-down
| report that actually showed things. Like, look at this
| beautiful screenshot of an Excel sheet. Imagine if Wikipedia
| produced anything this clear. [2]
|
| [0] https://blog.arxiv.org/2023/12/18/faster-arxiv-with-fastly/
|
| [1]
| https://info.arxiv.org/about/reports/FY26_Budget_Public.pdf
|
| [2]
| https://info.arxiv.org/about/reports/2020_arXiv_Budget.pdf
| OneDeuxTriSeiGo wrote:
| > arXiv doesn't need much. All they do is host static pdfs
| uploaded by someone else with free CDN services from Fastly
| [0]. I'm sure they could get academics to volunteer
| moderation services as well.
|
| This just isn't true. arXiv nowadays has to deal with major
| moderation demands due to the influx of absolute drivel,
| spam, and slop that non-academics and less-than-quality
| academics have been uploading to the site.
|
| Moderation for arXiv isn't perfect or comprehensive but
| they put so much work into trying to keep the worst of the
| content off their site. At this point while they aren't
| doing full blown peer review, they are putting a lot of
| work into providing first pass moderation that ensures the
| content in their academic categories is of at least some
| level of respectable academic quality.
| prepend wrote:
| Volunteer moderators are a valid option. And I think may
| work out better than paid employees.
| OneDeuxTriSeiGo wrote:
| Volunteer moderators are a valid option; however, this is
| also the way peer review works, and the system is
| unfortunately very problematic and exploitative.
|
| First pass sanity checks are also a lot less fun than
| proper peer review so paying moderators to do it is
| probably safer in the long run or else you end up with
| cliques of moderators who only keep moderating out of
| spite/personal vendettas against certain groups or
| fields.
| weitendorf wrote:
| > In reality you could host the entire thing for well under
| $50k/year in hardware
|
| I could pay Anthropic $400 to write more code than you have
| in your entire lifetime.
|
| Sure, you're able to operate a website acting as
| essentially the most important and highest volume venue for
| sharing academic research in the world, but come on, why
| couldn't I just ask Claude Code or some web developer in a
| foreign country to do the same thing?
| jjk166 wrote:
| $300k for a top executive position isn't especially high for
| anywhere in the US. That's around what the administrative
| director of a hospital would be making, which seems like a
| much smaller scope than leading ArXiv. For comparison, my
| roommate works for a non-profit that serves Philadelphia
| whose CEO's salary is $1.1 million. The CEO of the wikimedia
| foundation, which is similar in terms of role, has a salary
| of $450k. General average for US CEOs including for profits
| is around $800k and for large organizations tens of millions
| is not atypical.
|
| Non-profits aren't maximizing stock value, but they do need
| to optimize for stakeholder value - you want to maximize the
| amount of money being donated in and you want to make the
| most of the donations you receive, both to advance the
| primary mission of the non-profit and to instill confidence
| in donors. This demands competent leadership. The idea that
| just because something is not being done for profit means the
| value of the person's contributions is worth less is absurd.
| So long as the CEO provides more than $300k of value by
| leading the organization, which might include access to their
| personal connections, then the salary is sensible.
| DonsDiscountGas wrote:
| Considering the value and prominence of arxiv to the world,
| this seems low to me. Although more importantly the rest of the
| staff needs to be well paid too, and if that's the ceiling it's
| a bit concerning. It's crazy to me that people thought this was
| too high.
| prepend wrote:
| Yes, considering the workload and responsibility of the
| position.
|
| Non-profits run into the problem of creating cushy jobs that
| just burn donor money.
|
| Arxiv is basically a giant folder in the cloud and shouldn't
| have such high paying jobs. At least not if they want rational
| people to keep donating.
| bonoboTP wrote:
| I fear their Mozilla-ification and Wikipedia-ification. Scope
| creep, various outreach feel-good programs, ballooning costs,
| lost focus etc. And other types of enshittification.
|
| Any change to the basic premise will be a negative step.
|
| They should just be boring quiet unopinionated neutral
| background infrastructure.
| kergonath wrote:
| > They should just be quiet unopinionated neutral background
| infrastructure.
|
| Exactly. It should be a utility. Not quite dumb pipe, but not
| too far either.
| doctorwho42 wrote:
| We don't do 'utility' in America. Everything has S.V. brain
| rot - it's mixed with wall street brain rot, and now if you
| aren't extracting wealth out of what you have access to - you
| are failing.
| musicale wrote:
| I mean... someone needs to "unlock value" from ArXiv,
| right?
| Hendrikto wrote:
| > Mozilla-ification
|
| All the Mozilla executives have done for the last 15+ years is
|
| * lay off developers
|
| * spend lots of money on stupid side projects nobody asked for
| or wants
|
| * increase their own salaries
|
| and all that with the backdrop of falling quality, market
| share, and relevance.
|
| I would happily donate to Firefox, but this fucked up
| organization will never see a single cent from me. They will
| spend it on anything but Firefox, which is the only thing
| anybody wants them to spend it on.
|
| It might already be too late, and we will be left with a
| browser monopoly.
| bonoboTP wrote:
| And it is a risk for Arxiv too that once they start to drink
| the koolaid and start going to the same cocktail parties that
| these kinds of nonprofit board members and execs go to, they
| will feel the need to prance around with some fancy stuff.
|
| "oh no, you see we are not a preprint server host anymore,
| our mission is a values driven blablabla to make a meaningful
| change in the blablabla, we have spent X dollars to promote
| the blablabla, take me seriously please I'm also fancy like
| you! "
| musicale wrote:
| Well, maybe they don't need to be a nonprofit. How about a
| public benefit corporation?
|
| And maybe that public benefit thing, well we don't really
| need it do we? Now that we're deep into AI you know.
|
| For-profit has a nice ring to it. We're delivering value to
| founders and shareholders, where it belongs.
| swed420 wrote:
| > It might already be too late, and we will be left with a
| browser monopoly.
|
| Ladybird continues to have the appearance of making progress,
| fwiw:
|
| https://ladybird.org/newsletter/2026-02-28/
| cge wrote:
| >They will spend it on anything but Firefox, which is the
| only thing anybody wants them to spend it on.
|
| Mozilla certainly won't spend it on Firefox, because the
| structure of the organization legally prohibits them from
| spending any of their donation money on Firefox. The 'side
| projects' are, at least officially, the real purpose of
| Mozilla.
| bonoboTP wrote:
| They built the brand on Firefox then did a bait and switch.
| How many people who donate to Mozilla know that it's not
| helping Firefox?
|
| But yeah, this is just how it works. Things can't stay good
| for too long. One must always be on the lookout for the new
| small thing that's not yet corrupted. Stay with it for a
| while until it rots, then jump to the next replacement.
| musicale wrote:
| > They will spend it on anything but Firefox, which is the
| only thing anybody wants them to spend it on.
|
| ;_;
| musicale wrote:
| My prediction exactly.
|
| Maybe a bloated foundation (pursuing expensive objectives
| completely unrelated to ArXiv's core mission of hosting PDFs),
| new classes of unnecessary management staff, new and useless
| paid features that nobody wants, and obnoxious nag banners
| claiming "ArXiv is not for sale!" but demanding money anyway.
| ACCount37 wrote:
| Frankly, the only beef I have with arXiv as is: its insistence on
| blocking AI access.
|
| I had to tell my AI to set up an MCP for "fetch while bypassing
| arXiv's rate limit" so that it doesn't burn 40k tokens looking
| for workarounds every time it wants to look at a paper and gets
| hit with a "sorry, meatbags only" wall.
|
| Very annoying, given how relevant arXiv papers are for ML
| specifically, and how many papers there are. Can't "human
| flesh search" through all of them to pick the relevant ones for
| your work, and they just had to insist on making it harder for
| AIs to do it too.
| spiralcoaster wrote:
| I hope they ramp up their blocking of AI access. The last thing
| we need is providers like this getting hammered by AI
| vedantxn wrote:
| we got this before gta 6
| contubernio wrote:
| What is worrisome about this development, and corollary actions
| like the hiring of a CEO with a $300,000/year salary, is that the
| essentially independent and community based platform will
| disappear. The ArXiv exists because mathematicians and
| physicists, and later computer scientists and engineers, posted
| there, freely, their work, with minimal attention to licensing
| and other commercial aspects. It has thrived because it required
| no peer review and made interesting things accessible quickly to
| whoever cared to read them.
|
| A setup as a US-based "non-profit" is worrisome, if only because
| 300K is an obscene salary even in a for-profit setting. That the
| US-based posters can't see this is evidence of the basic problem
| which is that the US, both left and right, has been taken over by
| a neoliberal feudal antidemocratic nativist mindset that is
| anathema to the sort of free interchange of ideas that underlay
| the ArXiv's development in the hands of mathematicians and
| physicists now swept aside and ignored by machine learning
| grifters and technicians who program computers.
| doctorwho42 wrote:
| As a US based academic, I have to say when I saw the salary I
| immediately gawked. I think it's not Americans but Silicon
| Valley-ites and tech bros on here who have lived with inflated
| salary/net worth that think it's just a middle of the road
| salary. I regularly interact with friends in engineering who
| make like $200k + benefits ($), and I wonder why I don't jump
| ship to that weird land.
| juped wrote:
| >Cornell, for example, had a limited capacity to pay software
| developers to maintain and upgrade the site, which still has a
| very no-frills look and feel.
|
| arXiv is doomed. It was nice while it lasted.
| oscaracso wrote:
| I am not a software engineer, although I do write programs.
| What is it about digital infrastructure that requires
| maintenance? In the natural world, there is corrosion, thermal
| fluctuation, radiation, seismic activity, vandalism,
| whathaveyou. What are the issues facing the arxiv demanding the
| attention of multiple people 'round the clock?
| bonoboTP wrote:
| They have to update the software stack, replace usage of
| deprecated APIs, support new latex packages etc. They could
| probably minimize these by limiting the scope but just
| keeping a small, tightly scoped software functional is always
| boring, people want to work on fun new features, they enjoy
| the brand recognition and feel like they should do more
| stuff.
|
| I wonder when they will introduce the algorithmic feed and
| the social network features.
| taormina wrote:
| Given that Cornell charges what, $50k a year as an Ivy League,
| $300k feels like almost nothing.
| PaulHoule wrote:
| This is going to be in NYC where $300k does not go as far as it
| does in Ithaca.
| peyton wrote:
| Heh, you might want to look up what they're charging young
| people now.
| taormina wrote:
| $71k?! Well, that's 4, 4.5 students worth of tuition then.
| losvedir wrote:
| arXiv is great. It's just a problem that there's so much slop.
| What if arXiv offered a subscription service that people in
| different fields could use to just see a curated selection of the
| top papers in their field each month. Established researchers in
| each field could then review some of the preprints for putting
| into the curated monthly list.
|
| Oh, wait.
| bonoboTP wrote:
| > see a curated selection of the top papers in their field
|
| https://www.scholar-inbox.com
| hereme888 wrote:
| From my limited experience, arXiv appears to include many low-
| quality, unreproducible papers, and some are straight-up self-
| marketing rather than serious scientific work.
| kingstnap wrote:
| If you get some more experience you will find normal journals
| are exactly like that as well.
| whiplash451 wrote:
| I'm not sure why we're so focused on filtering what gets into
| arxiv (which is an uphill battle and DOA at this point) vs fixing
| the _indexing_, i.e. the page rank of academia.
|
| Google "sorted out" a messy web with pagerank. Academic papers
| link to each other. What prevents us from building a ranking
| from there?
|
| I'm conscious I might be over-simplifying things, but curious to
| see what I am missing.
| tokai wrote:
| PageRank was inspired by bibliometrics and evaluation of
| science publications. It's messed up now because of the
| rankings. Further fiddling with ranking will not fix the
| problem.
| j2kun wrote:
| +1, PageRank was taken from academia. They even cited it in
| their original work. Funny how the origins of these things
| get forgotten.
| krick wrote:
| I am of the same opinion, and ultimately ArXiv becoming a
| journal that can prevent one from publishing a paper -- no
| matter how junk it is -- would pretty much kill its purpose.
| But I suppose that now, when flooding the internet with LLM-
| generated garbage is almost endorsed by some satanic people, it
| is pretty much a security issue to have some sort of filter on
| uploads.
|
| Now, honestly, I have no idea why one would spend resources on
| uploading terabytes of LLM garbage to arXiv, but they sure can.
| Even if some crazy person is publishing like 2 nonsense papers
| daily, it is no harm and, if anything, valid data for
| psychology research. But if somebody actually floods it with
| non-human-generated content, well, I suppose it isn't even that
| expensive to make ArXiv totally unusable (and perhaps even
| unfeasible to host). So there has to be some filtering. But
| only to prevent the abuse.
|
| Otherwise, I indeed think that proper ranking, linking and
| user-driven moderation (again, not to prevent anybody from
| posting anything, but to label papers as more interesting for
| the specific community) is the only right way to go.
| muhneesh wrote:
| tangentially related: https://readabstracted.com/
| Drblessing wrote:
| ArXiv is dead. Expect a paywall within three years, or other
| enshittification and slop added.
| Apocryphon wrote:
| Maybe they'll do something like what Anna's Archive did
| hirako2000 wrote:
| Do research papers published on Elsevier's sort of media remain
| more prestigious?
|
| I read a dozen papers a month, typically on arxiv, never from
| paywalled journals. I find the quality on par. But maybe I'm
| missing something.
| Fomite wrote:
| This is _very_ variable based on field. HN is heavily biased
| toward ArXiv-friendly fields.
| krick wrote:
| It's not that hard to make a mirror of arXiv. Basically, anybody
| who can pay for hosting can do it (which, I suppose, isn't very
| cheap now that the whole world uses it). The problem is making
| users switch, because academia seems to have this weird tradition
| of resisting all practices that, god forbid, might improve global
| research capabilities and move scientific progress forward. But
| then, if arXiv _actually_ becomes unusable, I suppose they won't
| really have much choice but to switch?
|
| And, FWIW, I do think that arXiv truly has a vast potential to be
| improved. It is currently in the position to change the whole
| process of how the research results are shared, yet it is still,
| as others have said, only PDF hosting. And since the
| universities couldn't break out of the whole Elsevier & co. scam
| despite the internet existing for 30 years, to me, breaking
| free from the university affiliation sounds like a good thing.
|
| But, of course, I am talking only about the possibilities being
| out there. I know nothing about the people in charge of the whole
| endeavor, and ultimately it depends on them only, if it sails or
| sinks.
| tokai wrote:
| This is exactly what happened last time when scientific
| publishing got cornered. Journals run by departments and research
| groups were spun out or sold off to publishers and independent
| orgs. And they continued to slowly boil the frog over 50 years
| with fees and gate keeping.
|
| It's especially problematic because while ArXiv loves to claim to
| be working for open science, they don't default to open
| licensing. Many of the publications they host are not Open
| Access, and are only read access. So there is definitely the
| potential to close things off at some point in the future, when
| some CEO needs to increase value.
| lifeisstillgood wrote:
| I am sure it's a dumb idea but why is there a problem for say the
| National Science Foundation or something to run a website that
| replicates ArXiv - if you are from an accredited university or
| whatever you can publish papers, fulfilling the "pdf store"
| function.
|
| Then getting peer reviewed is a harder process but one can see
| some form of credit on the site coming from doing a decent
| reviewer's job.
|
| I suspect I am missing a lot of nuance ...
| prepend wrote:
| The moderation is difficult but not unprecedented.
|
| I think NIST hosts the CVE repo (through a contract to MITRE)
| Fomite wrote:
| Given the last two years and what has been done to science
| funding, having a load bearing thing like ArXiv not housed with
| the U.S. government is, I think, pretty self-evidently a good
| idea.
| MetaMonk wrote:
| https://youtu.be/4P5xSntVWQE
| jeremie_strand wrote:
| ArXiv provides such an easy interface to navigate scientific
| papers, most are from computer science of course. Hope they can
| grow bigger and solve the paywall pain in open research. Any
| implication for bioRxiv?
| Fomite wrote:
| bioRxiv is already housed at Cold Spring Harbor Laboratory,
| which is an independent non-profit.
| AccessScan wrote:
| Going independent makes sense for arXiv. But the more interesting
| part is what it tells us about how we fund the stuff that
| actually keeps research moving. arXiv runs on about seven million
| dollars a year and handles hundreds of thousands of papers.
| That's roughly twenty bucks a paper. This is the backbone of how
| physicists, computer scientists, and mathematicians share work.
| Traditional publishers charge thousands per article. The math is
| almost laughable. arXiv has never had an efficiency problem. The
| problem is that we've just accepted that something this important
| should survive on voluntary contributions and the occasional
| donation saving the day. Look at what happened with bioRxiv and
| medRxiv when they spun off into openRxiv. That only happened
| about a year ago. Nobody knows yet if it actually works long-term
| or if it just kicks the money problems down the road. But both
| platforms, totally separately, came to the same conclusion. We
| need to leave the university. That says something. Universities
| aren't built to fund outside infrastructure forever. Their
| budgets follow enrollment, grants, and endowment performance.
| That doesn't line up with the steady, predictable funding arXiv
| needs to keep the lights on. Ginsparg calling it a "Perils of
| Pauline" situation is probably the most honest thing anyone said
| about this. Everyone treats arXiv like it will always be there.
| But it's been one bad year away from serious trouble for most of
| its life. The real test for the nonprofit won't be the first few
| years. Cornell and Simons have that covered. It'll be five or ten
| years from now when the excitement fades and they're competing
| for donor money against whatever the next crisis in academic
| publishing turns out to be. The worry about AI-generated junk is
| actually where independence could help. A university-hosted arXiv
| could only spend so much on moderation tools. An independent org
| with a focused mission can make that a real budget priority.
| Whether they can keep up with the flood of low-quality
| submissions is a different question entirely.
| ide0666 wrote:
| The endorsement system is a real barrier for independent
| researchers. I've been trying to get endorsed for cs.NE for weeks
| -- the work is published on aiXiv with video results, but without
| an institutional email or personal connection to an existing
| author, you're stuck. Glad to see arXiv thinking about
| independence -- hope they also rethink access for non-
| institutional researchers.
| tamimy wrote:
| It's quite interesting to see that a lot of opinions here think
| ArXiv will turn to shit because it will go "corporate". Are there
| any examples where this has not been the case?
| beezle wrote:
| I go back to xxx.lanl.gov days - that is, the beginning. Back
| then it was all physics, some math and a little quantitative
| finance (not bitcoin). And the quality was pretty good because it
| was a _preprint_ archive. In fact, a headline from 2000:
|
| APS and BNL Host XXX e-Print Archive Mirror Feb. 1, 2000
|
| The APS is establishing, in cooperation with Brookhaven National
| Laboratory, the first electronic mirror in the United States for
| the Los Alamos e-Print Archive.
|
| Today, from the landing page, it describes itself as "arXiv is a
| free distribution service and an open-access archive for nearly
| 2.4 million scholarly articles in the fields of [long list].
| Materials on this site are not peer-reviewed by arXiv."
|
| Well, that's a large part of the problem. A lot of the stuff
| there now will never see a journal (even of dubious quality) and
| there is limited filtering of what new submissions will be
| stored. GIGO.
|
| Best thing ArXiv could do is go back to their roots - limit the
| fields and return to preprint only. Spin off the comp sci stuff
| for sure to someone else along with all its headaches.
|
| fixed: url
___________________________________________________________________
(page generated 2026-03-21 23:01 UTC)