[HN Gopher] Show HN: AI Peer Reviewer - Multiagent System for Sc...
___________________________________________________________________
Show HN: AI Peer Reviewer - Multiagent System for Scientific
Manuscript Analysis
After waiting 8 months for a journal response, or two months for
co-author feedback that consisted of "looks good" and a single
comma change, we built an AI-powered peer review system that helps
researchers improve their manuscripts rapidly before submission.
The system uses multiple specialized agents to analyze different
aspects of scientific papers, from methodology to writing quality.

Key features:
- 24 specialized agents analyzing sections, scientific rigor, and
  writing quality
- Detailed feedback with actionable recommendations
- PDF report generation
- Support for custom review criteria and target journals

Two ways to use it:
1. Cloud version (free during testing): https://www.rigorous.company
   - Upload your manuscript
   - Get a comprehensive PDF report within 1-2 working days
   - No setup required
2. Self-hosted version (GitHub): https://github.com/robertjakob/rigorous
   - Use your own OpenAI API keys
   - Full control over the review process
   - Customize agents and criteria
   - MIT licensed

The system is particularly useful for researchers preparing
manuscripts before submission to co-authors or target journals.
Would love to get feedback from the HN community, especially from
PhDs and researchers across all academic fields. The project is
open source and we welcome contributions!

GitHub: https://github.com/robertjakob/rigorous
Cloud version: https://www.rigorous.company
Author : rjakob
Score : 83 points
Date : 2025-05-31 13:51 UTC (9 hours ago)
(HTM) web link (github.com)
(TXT) w3m dump (github.com)
| poisonborz wrote:
| Do I see this right, you expect people to submit their scientific
| research papers, and there is zero information on who you are,
| how to contact you, what happens with the uploaded data / any
| privacy policy, and so on...
|
| (except your github usernames on the repo posted only here)
|
| Regardless of how useful this is, it's hard to take it seriously.
| rjakob wrote:
| Fair point.
|
| We're in very early MVP mode, trying to move fast and see if
| this works. We pushed a Cloud version to support users who
| don't want to run the GitHub script themselves. That said,
| you're absolutely encouraged to run it yourself (with your
| OpenAI key) -- the results are identical.
|
| For context: we're two recent ETH Zurich PhD graduates.
|
| Robert Jakob: https://www.linkedin.com/in/robertjakob Kevin
| O'Sullivan: https://www.linkedin.com/in/kevosull
|
| Going to add contact information immediately.
|
| Thanks again for the feedback -- it's exactly what we need at
| this stage.
| sigmoid10 wrote:
| How can it be run "locally" if you don't support locally hosted
| LLMs? The overlap between people who wouldn't trust a cloud
| API wrapper like yours, but would willingly let their
| (possibly sensitive) documents be sent to some AI provider's
| API seems rather small to me. Either embrace the cloud fully
| and don't worry about data confidentiality, or go full local
| and embrace the anxious community. This in between seems like
| a waste of time tbh.
|
| (I'm not trying to sound overly critical - I very much like
| the idea and the premise. I merely wouldn't use this business
| approach)
| eddythompson80 wrote:
| > This in between seems like a waste of time tbh.
|
| Hard disagree. The "in between" is where most people are
| already ending up. Initially, everyone was worried about
| privacy and what OpenAI was doing with their precious private
| data: "They will train on it. Privacy is important to me. I'm
| not about to give OpenAI access to my private, secure Google
| Drive backups or Gmail history or Facebook private messages or
| any real, private, 'local only' information."
|
| Also, among those who understand data privacy concerns, when it
| comes to work data, in the span of 2-3 years all the business
| folks I know went from "this is confidential business
| information, please never upload it to ChatGPT and only email
| it to me" to "just put everything into ChatGPT and see what it
| tells you."
|
| The initial worry was driven by not understanding how LLMs
| worked. What if it just learned as you talked to it? What if it
| used that learning with somebody else? "I told it a childhood
| secret; will it turn around and tell others my secret?"
|
| People understand how that works now, and some of those
| concerns have faded. Most understand that the risk is similar
| to the rest of their existing digital life.
| sigmoid10 wrote:
| As someone who actually deals with this on a regular
| basis, I can guarantee you that serious companies
| definitely do not "just put everything in ChatGPT" if
| they have any sort of respectable legal department.
| Especially in Europe, where you have GDPR concerns on top
| of any business concerns. People who actually understand
| the privacy issues nowadays either use stuff like Azure's
| OpenAI custom hosting to be compliant with the law or go
| full open weight self hosted. Everything else is a legal
| time-bomb.
| eddythompson80 wrote:
| Of course they aren't putting it on ChatGPT. Their data
| is stored in S3, Snowflake, BigQuery, or Azure Storage.
| It makes more sense to use the respective cloud provider's
| LLM hosting service. You can use OpenAI's GPT models or
| Anthropic's models hosted on Azure or AWS.
|
| You're telling me companies in Europe aren't putting all
| their user data on AWS and Azure regions in Europe? Both
| AWS and Azure are gigantic in Europe.
| spmurrayzzz wrote:
| Supporting local models can be done by overriding one or
| two environment variables, as long as your local inference
| server has an OpenAI-compliant endpoint (which the majority
| of local stacks ship with).
|
| Was there some level of support beyond this that you were
| referring to?
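|
| For illustration, roughly what that override looks like if the
| scripts use the standard `openai` Python client (a sketch only;
| the local server URL and model name are placeholders, and whether
| the repo reads these exact variables is an assumption):
|
|     # Point the official OpenAI Python client at a local
|     # OpenAI-compatible server instead of api.openai.com.
|     # OPENAI_BASE_URL and OPENAI_API_KEY are read by the client itself.
|     import os
|     from openai import OpenAI
|
|     os.environ["OPENAI_BASE_URL"] = "http://localhost:8000/v1"  # e.g. a vLLM/llama.cpp server
|     os.environ["OPENAI_API_KEY"] = "not-needed-locally"         # most local servers ignore it
|
|     client = OpenAI()  # picks up the env vars above
|     resp = client.chat.completions.create(
|         model="local-model",  # whatever name the local server exposes
|         messages=[{"role": "user", "content": "Review the methods section."}],
|     )
|     print(resp.choices[0].message.content)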
| rjakob wrote:
| Good point. Current focus is on improving AI feedback
| quality, not business model. But we'll definitely consider
| local model support for privacy-conscious users. Thanks for
| the input!
| raphman wrote:
| It seems you have the option to run the tools yourself (with an
| OpenAI API key). The cloud version is for convenience. I agree
| that a privacy/usage policy is necessary.
| rjakob wrote:
| https://www.rigorous.company/privacy
| jpeloquin wrote:
| Even extremely privacy-conscious authors could submit their
| paper to the service at the same time they publish their
| preprint v1, then if the service's feedback is useful, publish
| preprint v2 and submit v2 as the version of record.
| rjakob wrote:
| ...or run it themselves. The code is open source:
| https://github.com/robertjakob/rigorous
|
| Note: The current version uses the OpenAI API, but it should
| be adaptable to run on local models instead.
| atrettel wrote:
| I agree. I've worked at a national lab before and I immediately
| thought this service is a massive security risk. It will
| definitely be hard for some scientists to use these kinds of
| cloud services, especially if their research truly is cutting
| edge and sensitive. I think many people will just ignore things
| like this because they want to keep their jobs, etc.
| floren wrote:
| Presumably this would have to come after R&A has ok'd it for
| public release.
| rjakob wrote:
| As mentioned above, there is an open-source version for those
| who want full control. The free cloud version is mainly for
| convenience and faster iteration. We don't store manuscript
| files longer than necessary to generate feedback
| (https://www.rigorous.company/privacy), and we have no
| intention of using manuscripts for anything beyond testing
| the AI reviewer.
| piombisallow wrote:
| Just upload an already published paper to test it
| rjakob wrote:
| Cool! We'll get back asap.
|
| We'd be happy to hear what kind of feedback you find useful,
| what is useless, and what you would want in an ideal review
| report. (https://docs.google.com/forms/d/1EhQvw-HdGRqfL01jZaayoaiTWLS)
| 8organicbits wrote:
| One to two days is certainly better than eight months, but I'm
| curious about that delay. Can you explain why working days factor
| in to the turnaround time?
| rjakob wrote:
| Right now, the core workflow takes about 8 minutes locally,
| mostly because we haven't optimized anything yet. There's
| plenty of low-hanging fruit: parallelization, smarter batching,
| etc. With a bit of tuning, we could bring it down to 1-2
| minutes. That said, we'll probably want to add more agents for
| deeper feedback and quality control, so the final runtime might
| go up again. At this stage, we're figuring out what's actually
| useful, what isn't, what additional signals we should look at,
| and what the right format and level of detail for feedback
| should be. The cloud version includes a manual review step,
| partly to control costs, partly because we're still discovering
| edge cases. So the current 1-2 day turnaround is really just a
| safety net while we iterate. If we decide to continue with
| this, the goal is to deliver meaningful feedback in minutes.
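|
| For a sense of the parallelization we mean: the agent calls are
| independent of each other, so something like this sketch would
| already cut most of the wall-clock time (run_agent here is a
| hypothetical wrapper around a single model call, not the actual
| code in the repo):
|
|     # Sketch: run the independent reviewer agents concurrently rather
|     # than one after another. run_agent() is a hypothetical wrapper
|     # around one model call; agent_prompts maps agent name -> prompt.
|     from concurrent.futures import ThreadPoolExecutor
|
|     def review_in_parallel(agent_prompts, manuscript, run_agent, max_workers=8):
|         results = {}
|         with ThreadPoolExecutor(max_workers=max_workers) as pool:
|             futures = {
|                 name: pool.submit(run_agent, prompt, manuscript)
|                 for name, prompt in agent_prompts.items()
|             }
|             for name, future in futures.items():
|                 results[name] = future.result()  # collect each agent's feedback
|         return results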
| hirenj wrote:
| It is a real shame that peer review reports only started being
| published relatively recently. These would have provided valuable
| training information about what peer review actually does.
| Unfortunately, I now fully expect that public peer review reports
| will be poorer in quality, and oftentimes superficial.
|
| On this tool, I fully expect that it will not capture high-level
| conceptual peer review, but it could very much serve a role in
| identifying errors of omission from a manuscript, as a checklist
| to improve quality (as long as this remains an author-controlled
| process).
|
| I will be interested to throw in some of my own published papers
| to see if it catches all the things I know I would have liked to
| improve in my papers.
| rjakob wrote:
| Thanks for the feedback. Totally agree. It's a real shame we
| don't have more historical peer review data. It would be great
| if research was fully transparent.
|
| We did find a few datasets that offer a starting point:
| https://arxiv.org/abs/2212.04972
| https://arxiv.org/abs/2211.06651
| https://arxiv.org/abs/1804.09635
|
| There's also interesting potential in comparing preprints to
| their final published versions to reverse-engineer the kinds of
| changes peer review typically drives.
|
| A growing number of journals and publishers (like PLOS, Nature
| Communications, and BMJ) now publish peer review reports
| openly, which could be valuable as training data.
|
| That said, while this kind of data might help generate feedback
| to improve publication odds (by surfacing common reviewer
| demands early), I am not fully convinced it would lead to the
| best feedback. In our experience, reviewer comments can be
| inconsistent or even unreasonable, yet authors often comply
| anyway to get past the gate.
|
| We're also working on a pre-submission screening tool that
| checks whether a manuscript meets hard requirements like
| formatting or scope for specific journals and conferences,
| hoping this will save a lot of time.
|
| Would love to hear your take on what kind of feedback you find
| useful, what feels like nonsense, and what you would want in an
| ideal review report... via this questionnaire
| https://docs.google.com/forms/d/1EhQvw-HdGRqfL01jZaayoaiTWLS...
| eterm wrote:
| If an AI agent is a "Peer", does that mean you want papers
| written by AI agents to review?
| brookst wrote:
| Does peer review typically select for the same demographics as
| the author?
|
| I was joking, but probably so, to the extent that 80% of peer
| reviewers are men and 80% of authors of peer reviewed articles
| are men[0]
|
| 0.
| https://www.jhsgo.org/article/S2589-5141%2820%2930046-3/full...
| Muller20 wrote:
| Peer review has nothing to do with demographics. It's about
| expertise in the research area.
| falcor84 wrote:
| As they say: in theory, theory and practice are identical;
| in practice, they aren't.
| yusina wrote:
| You don't have much experience in it, do you? The real
| world peer review process could not be further from what you
| are describing.
|
| Source: I've personally been involved in peer reviewing in
| fields as diverse as computer science, quantum physics and
| applied animal biology. I've recently left those fields in
| part because of how terrible some of the real-world
| practices are.
| rjakob wrote:
| I think the ideal scenario would include a fair, merit-based
| AI reviewer working alongside human experts. AI could also
| act as a "reviewer of the reviewers," flagging potential
| flaws or biases in human evaluations to help ensure fairness,
| accountability and consistency.
| yusina wrote:
| > a fair, merit-based AI reviewer
|
| That's a dream which is unlikely to come true.
|
| One reason being that the training data is not unbiased and
| it's very hard to make it less biased, let alone unbiased.
|
| The other issue being that the AI companies behind the
| models are not interested in this. You can see the Grok
| saga playing out in plain sight, but the competitors are
| not much better. They patch a few things over, but don't
| solve it at the root. And they don't have incentives to do
| that.
| rjakob wrote:
| Wouldn't that just require a robust, predefined ruleset
| we could all agree on? Let's make the dream come true!
| yusina wrote:
| The rule set is simple: "Don't be biased." But what does that
| mean? And _that_ is the problem. It's hard (read:
| impossible) to define in technical, formal terms. That's
| because bias is, at its root, a social problem, not a
| technical one. Therefore you won't be able to solve it
| with technology, just like poverty, world peace, or racism.
|
| The best you can hope for is to provide technical means
| to point out indicators of bias. But anything beyond that
| could, at worst, do more harm than good. ("The tool said
| this result is unbiased now! Keep your skepticism to
| yourself and let me publish!")
| rjakob wrote:
| Then let's try to be the least biased and fully
| transparent (which should also help with bias)
| spankibalt wrote:
| The AI pseudoagent[1], unless sentient _and_ proficient in the
| chosen field of expertise, is not a peer. It's just a
| simulacrum of one. As such, it can only manifest simulacra of
| concepts such as "biases", "fairness", "accountability", etc.
|
| The way I see it, it can function, at best, as a tool of text
| analysis, e.g. as part of augmented analytics engines in a
| CAQDAS.
|
| 1. Agents are defined as _having_ agency, with sentience as an
| obvious prerequisite.
| rjakob wrote:
| We honestly didn't think much about the term "AI peer
| reviewer" and didn't mean to imply it's equivalent to human
| peer review. We'll stick to using "AI reviewer" going
| forward.
| azalemeth wrote:
| I'd really love to try this with a paper I'm trying to get into a
| good journal, but I'm currently getting `Unexpected token R in
| JSON at position 0`. Specifically:
|
|     Failed to load resource: the server responded with a status of 413 ()
|     Upload error: SyntaxError: Unexpected token R in JSON at position 0
|         at handleSubmit (page-16118756a8a735f3.js?dpl=dpl_FUDTF4yxczg6WEnWPsbcumgAgEEa:1)
|     POST https://www.rigorous.company/api/upload net::ERR_TIMED_OUT
|     Upload error: TypeError: Failed to fetch
|         at handleSubmit (page-16118756a8a735f3.js?dpl=dpl_FUDTF4yxczg6WEnWPsbcumgAgEEa:1:631)
|         at Object.fi (fd9d1056-c0e6d0dfe0642aea.js:9:66106)
|         [... further frames in the minified bundle fd9d1056-c0e6d0dfe0642aea.js ...]
|
| which may be the HN hug of death.
|
| The article in question is currently on arXiv, and I'd love
| to know what you think of it - https://arxiv.org/abs/2504.16621.
| The latest version of the manuscript (locally, shrunk) is 9.4 MiB
| (9879403 bytes), so I don't get why I'm hitting the 413 error if
| you have a 10 MB / MiB limit :-).
|
| As an aside, I know from many others that ChatGPT is already
| writing a lot of reports for journals - curated by a human but
| not exclusively so. Is this a good thing for science?
| rjakob wrote:
| Hi azalemeth, I'd be happy to run the script on your manuscript
| and debug the issue. Could you send me your contact info via X
| (https://x.com/robertjakob) or LinkedIn
| (https://www.linkedin.com/in/robertjakob/) so I can send the
| feedback report back to you once it's ready? Feedback would be
| super appreciated.
| eddythompson80 wrote:
| I was talking with a friend/coworker this week. I came to the
| realization that "Code Reviews Are Dead".
|
| They were already on life support. The need to "move fast",
| "there is no time", "we have a 79-file PR with 7k line changes
| that we have been working on for 6 weeks. Can you please review
| it quickly? We wanna demo at tomorrow's GTM meeting". Management
| found zero value in code reviews. You still can't catch
| everything, so what's the point? They can't measure the value of
| such a process.
|
| Now? Now every junior dev is pushing 12 PRs a week, all adding 37
| new files and thousands of lines that have been auto-generated
| with a ton of patterns and themes that are all over the place,
| and you expect anyone to keep up?
|
| Just merge it. I have seen people go from:
|
| > "asking who is best to review changes in area X? I have a
| couple of questions to make sure I'm doing things right"
|
| To
|
| > "this seems to work fine. Can I get a quick review? Trying to
| push it out and see how it works"
|
| To
|
| > "need 2 required approvals on this PR please?"
| mikojan wrote:
| Oh my god.. The horror.. Please do not let this be my future..
| eddythompson80 wrote:
| The horror indeed, but I don't really see a way out of this.
| I was mainly curious to see how it would affect something like
| "Peer Review", though I suspect the incentives there are
| different, so the processes might only share the word
| "Review" without much bearing on each other.
|
| Regarding code reviews, I can't see a way out unfortunately.
| We already have GitHub (and other) agents/features where you
| write an issue on a repo and kick off an agent to "implement
| it and send a PR for the repo". As it exists today, every
| repo has 100X more issues, discussions, and comments than
| it has PRs. Now imagine if the barrier to opening a PR is
| basically: open an issue + click a "Have a go at it, GitHub"
| button. Who has the time or bandwidth to review that? That
| wouldn't make any sense either.
| rjakob wrote:
| Based on my experience, many reviewers are already using AI
| extensively. I recently ran reviewer feedback from a top CS
| conference through an AI detector, and two out of three
| responses were clearly flagged as AI-generated.
|
| In my view, the peer-review process is flawed. Reviewers
| have little incentive to engage meaningfully. There's no
| financial compensation, and often no way to even get credit
| for it. It would be cool to have something like a Google
| Scholar page for reviewers to showcase their contributions
| and signal expertise.
| AStonesThrow wrote:
| The only thing worse than an LLM for making stuff up and
| giving fake numbers is an LLM "Detector". They are so
| full of false positives and false negatives and bogus
| percentages, as to be actively harmful to human trust and
| academic integrity. And how do you follow up, to verify
| or falsify their results?
| rjakob wrote:
| Fair. Though in this case, it was obvious even without a
| detector.
| rjakob wrote:
| wild times
| yusina wrote:
| > I came to the realization "Code Reviews Are Dead".
|
| If that's how it works at your company then run as fast as you
| can. There are many reasonable alternatives that won't push
| this AI-generated BS on you.
|
| That is, if you care. If you don't then please stay where you
| are so reasonable places don't need to fight in-house pressure
| to move in that direction.
| deepdarkforest wrote:
| For easy/format stuff for specific journals it will be useful.
| But please, please for the love of god don't try to give actual
| feedback. We have enough GPT generated reviews on openreview as
| it is. The point of reviews is to get deep insights/comments from
| industry experts who have knowledge ABOVE the LLMs. The bar is
| very low, I know, but we should do better as the research
| community.
| rjakob wrote:
| Totally agree! The tool is designed to provide early-stage
| feedback, so experts can focus their attention on the most
| relevant points at later review stages.
| yusina wrote:
| Please convince us that these are not just words. As much as
| I'd want to believe you, the sweet VC money is in the
| feedback that many people here advise you against. It will be
| hard to stay away from it.
| skartik wrote:
| I uploaded a paper.
| rjakob wrote:
| thanks! We'll get back to you with the feedback report
| (currently receiving loads of submissions).
| rjakob wrote:
| Once you receive the report, we'd really appreciate your
| feedback on what we can improve via
| https://docs.google.com/forms/d/1EhQvw-HdGRqfL01jZaayoaiTWLS...
| etrautmann wrote:
| Russ Poldrack just did a deep dive on reviewing papers with AI,
| finding serious issues with the results:
| https://substack.com/home/post/p-164416431
| rjakob wrote:
| Thanks for sharing. We'll take a closer look. There's
| definitely something we can learn from it.
| karencarits wrote:
| I'll hopefully get to test it soon. To me, LLMs have so far been
| great for proofreading and getting suggestions for alternative -
| perhaps more fluent - phrasings. One thing that immediately
| struck me, though: having 'company' in the URL makes me think
| corporate and makes me much more skeptical than a more generic
| name would.
| rjakob wrote:
| Haha fair point, domain name was a 5-second, "what's available
| for $6" kind of decision. Definitely not trying to go full
| corporate just yet
| karencarits wrote:
| Great! Also, checking journal author guidelines is usually
| very boring and time consuming, so that would be a nice
| addition! Like, pasting the guidelines in full and getting
| notified if I am not following some specs
| rjakob wrote:
| We are already looking into that:
| https://github.com/robertjakob/rigorous/tree/main/Agent2_Out...
|
| Would be great to see contributions from the community!
| yusina wrote:
| IMO that's what this should focus on: language. That's what LLMs
| excel at. Perhaps branch out to providing localized papers to markets
| like China or France (hah, sorry).
|
| Judging the actual contents may feel like the holy grail but is
| unlikely to be taken well by the actual academic research
| community. At least the part that cares about progressing human
| knowledge instead of performative paper milling.
| stephenstevo wrote:
| Bank
| rjakob wrote:
| NOTE: We've received a bunch of submissions from you all (which
| is awesome -- thank you!).
|
| We're working through them and will send out reports asap!
|
| Since we're currently covering the model costs for you, we'd
| appreciate any feedback via this short form in return:
| https://docs.google.com/forms/d/1EhQvw-HdGRqfL01jZaayoaiTWLS...
|
| Thanks again for testing!
| Syrocco wrote:
| I uploaded a 5.5 MB PDF on your website, but after a JSON error
| similar to another comment, I had to compress it (using a random
| online compressor) to 3.3 MB for the upload to work!
| rjakob wrote:
| Thanks for the heads-up. We'll raise the file size limit
| shortly.
| isoprophlex wrote:
| Submitting your original, important, unpublished, research to
| some random entity. I would be VERY surprised if more than 2% of
| academics think this is a good idea.
| rjakob wrote:
| I wish my own manuscripts were that important...
|
| Regarding security concerns: there is an open-source version
| for those who want full control. The free cloud version is
| mainly for convenience and faster iteration. We don't store
| manuscript files longer than necessary to generate feedback
| (https://www.rigorous.company/privacy), and we have no
| intention of using manuscripts for anything beyond testing the
| AI reviewer.
| karencarits wrote:
| I guess the paper would be complete enough to publish as a
| preprint at the stage where this specific service is most
| useful
| atrettel wrote:
| I'm a PhD and researcher who has worked in various fields,
| including at a national lab.
|
| I think AI systems like this could greatly help with peer review,
| especially as a first check before submitting a manuscript to a
| journal.
|
| That said, this particular system appears to focus on the wrong
| issues with peer review, in my opinion. I'll ignore the fact that
| an AI system is _not a peer_ since another person already brought
| that up [1]. Even if this kind of system were a peer, the system
| appears to be checking superficial issues and not the deeper
| issues that many peer reviewers/referees care about. I'll also
| ignore any security risks (other posts discuss that too).
|
| A previous advisor of mine said that a good peer reviewer needs to
| think about one major/deep question when reviewing a manuscript:
| Does the manuscript present any novel theories, novel
| experiments, or novel simulations; or does it serve as a useful
| literature review?
|
| Papers with more novelty are inherently more publishable. This
| system does not address this major question and focuses on
| superficial aspects like writing quality, as if peer review is
| mere distributed editing and not something deeper. It is possible
| for even a well-written manuscript to lack any novelty, and
| novelty is what makes it worthy of publication. Moreover, many
| manuscripts have at best superficial literature reviews that name
| drop important papers and often mischaracterize their importance.
|
| It takes deep expertise in a subject to see how a work is novel
| and fits into the larger picture of a given field. This system
| does nothing to aid in that. Does it help identify what parts of
| a paper you should emphasize to prove its novelty? That is, does
| it help you find the "holes" in the field that need patching?
| Does it help show what parts of your literature review are
| lacking?
|
| A lot of peer review is kinda performative, but if we are going
| to create automated systems to help with peer review, I would
| like them to focus on the most important task of peer review:
| assessing the novelty of the work.
|
| (I will note that I have not tried out this particular system
| myself. I am basing my comments here on the documentation I
| looked at on GitHub and the information in this thread.)
|
| [1] https://news.ycombinator.com/item?id=44144672
| yusina wrote:
| I would say the exact opposite. Let the machine do the
| easy/simple/boring stuff. Let the human peer reviewer do the
| big question. That's what the human is good at and excited
| about and it's what the machine will not be good at. The
| question is a philosophical one: Is this a good idea? Is it
| relevant? Is it important? This is highly subjective and needs
| folks in the field to build consensus about. Back in my PhD
| days, I'd have loved if a machine could have taken care of the
| simple stuff so humans could focus entirely on the big
| questions.
|
| (A machine could point to similar work though.)
| atrettel wrote:
| You raise a good point overall. I was just trying to respond
| to the idea of it replacing a human entirely, as if the
| authors submit it to the system and a journal editor has to
| make the decision to publish it or not. I would love to focus
| more on the big-picture stuff, but in my experience most peer
| reviews amount to "Could you phrase this differently?" rather
| than "Is this a good idea?". I think the latter is a much
| better question to ask.
| rjakob wrote:
| Thanks for the thoughtful feedback. That's very helpful.
|
| We didn't think too deeply about the term "AI peer reviewer"
| and didn't mean to imply it's equivalent to human peer review.
| Based on your comments, we'll stick to using "AI reviewer"
| going forward.
|
| Regarding security: there is an open-source version for those
| who want full control. The free cloud version is mainly for
| convenience and faster iteration. We don't store manuscript
| files longer than necessary to generate feedback
| (https://www.rigorous.company/privacy), and we have no
| intention of using manuscripts for anything beyond testing the
| AI reviewer.
|
| On novelty: totally agree it's a core part of good peer review.
| The current version actually includes agents evaluating
| originality, contribution, impact, and significance. It's still
| v1 of course but we want to improve it. We'd actually love for
| critical thinkers like you to help shape it. If you're open to
| testing it with a preprint and sharing your thoughts on the
| feedback, that would be extremely valuable to us.
|
| Thanks again for engaging, we really appreciate it.
| atrettel wrote:
| No worries, I appreciate that you took the time to read and
| respond!
|
| When I first read "Originality and Contribution" at [1], I
| actually assumed it was a plagiarism check. It did not occur
| to me until now that you were referring to novelty with that.
| Similarly, I assumed "Impact and Significance" referred to
| more about whether the subject was appropriate for a given
| journal or not (would readers of this journal find this
| information significant/relevant/impactful or should it be
| published elsewhere?). That's a question that many journals
| do ask of referees, independent of overall novelty, but I see
| how you mean a different aspect of novelty instead.
|
| I'm not opposed to testing your system with a manuscript of
| my own, but currently the one manuscript that I have
| approaching publication is still in the internal review stage
| at my national lab, and I don't know precisely when it will
| be ready for external submission. But I'll keep it in mind
| whenever it passes all of the internal checks.
|
| [1] https://github.com/robertjakob/rigorous/blob/main/Agent1_Pee...
| mattmanser wrote:
| I had a quick look at the repo as I wondered what you meant by
| multiple specialized agents.
|
| Fundamentally, each of those 24 agents seems to be just:
|
| "load from pdf > put text into this prompt > Call OpenAI API"
|
| So is it actually just posting 24 different prompts to a
| generalist AI?
|
| I'm also wondering about the prompts, one I read said "find 3-4
| problems per section....find 10-15 problems per paper". What
| happens when you put up a good paper? Does this force it to find
| meaningless, nit-picky problems? Have you tried it on papers
| which are acknowledged to be well written?
|
| From a programming perspective, the code has a lot of room
| for improvement.
|
| The big one is if you'd used the same interface for each "agent"
| you could have had them all self-register and call themselves in
| a loop rather than having to do what you've done in this file:
|
| https://github.com/robertjakob/rigorous/blob/main/Agent1_Pee...
|
| TBH, that's a bit of a WTF file. The `def
| _determine_research_type` method looks like a placeholder you've
| forgotten about too, as it uses a pretty wonky way to determine
| the paper type.
|
| Also, you really didn't need specialized classes for each prompt;
| you could have just had the prompts as text files that a single
| class loads as templates and substitutes text into. As it stands,
| you're going to have a lot of work whenever you need to update
| the way your prompting works, having to change 24 files each
| time, probably by cut/pasting, which is error-prone.
|
| I've done it before where you have the templates in a folder, and
| the program just dynamically loads them. So you can add more
| really easily. Next stage is to add pre-processor directives to
| your loader that allows you to put some config at the top of each
| text file.
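|
| Something like this, as a rough sketch (the prompts/ layout and
| the {manuscript} placeholder are just illustrative, not your
| actual file names):
|
|     # Sketch: each prompt is a plain text file in prompts/ containing
|     # a {manuscript} placeholder; one loader replaces the 24 classes.
|     from pathlib import Path
|
|     class PromptLibrary:
|         def __init__(self, prompt_dir="prompts"):
|             # e.g. prompts/methodology.txt, prompts/novelty.txt, ...
|             self.templates = {
|                 p.stem: p.read_text(encoding="utf-8")
|                 for p in Path(prompt_dir).glob("*.txt")
|             }
|
|         def render(self, name, manuscript_text):
|             # Adding a new "agent" is just dropping in a new text file.
|             return self.templates[name].format(manuscript=manuscript_text)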
|
| I'm also not looking that hard at the code, but it seems you dump
| the entire paper into each prompt, rather than just the section
| it needs to review, which seems like an easy money saver if you
| asked an AI to chop up the paper, then just inject the section
| needed to reduce your costs for tokens. Although you then run the
| risk of it chopping it up badly.
|
| Finally, and this is a real nitpick but it's twitch-inducing when
| reading the prompts, comments in JavaScript are two forward
| slashes, not a hash.
| rjakob wrote:
| Best feedback so far!
|
| You're right: In the current version each "agent" essentially
| loads the whole paper, applies a specialized prompt, and calls
| the OpenAI API. The specialization lies in how each prompt
| targets a specific dimension of peer review (e.g.,
| methodological soundness, novelty, citation quality). While
| it's not specialization via architecture yet (i.e., different
| models), it's prompt-driven specialization, essentially
| simulating a review committee, where each member is focused on
| a distinct concern. We're currently using a long-context, cost-
| efficient model (GPT-4.1-nano style) for these specialized
| agents to keep it viable for now. Think of it as an army of
| reviewers flagging areas for potential improvement.
|
| To synthesize and refine feedback, we also run Quality Control
| agents (acting like an associate editor), which review all
| prior outputs from the individual agents to reduce redundancy
| and surface the most constructive insights (and filter out less
| relevant feedback).
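|
| Conceptually, that synthesis pass is just one more call over the
| combined agent outputs, something like this simplified sketch
| (the prompt wording and model name are illustrative, not the
| production code):
|
|     # Simplified sketch of the quality-control pass: merge every
|     # agent's raw notes and ask one "associate editor" prompt to
|     # dedupe and keep the most constructive points.
|     from openai import OpenAI
|
|     client = OpenAI()
|
|     def synthesize(agent_feedback: dict) -> str:
|         combined = "\n\n".join(
|             f"## {agent}\n{notes}" for agent, notes in agent_feedback.items()
|         )
|         resp = client.chat.completions.create(
|             model="gpt-4.1-nano",  # long-context, cost-efficient tier
|             messages=[
|                 {"role": "system",
|                  "content": "You are an associate editor. Merge these reviewer "
|                             "notes, remove redundant points, and keep only "
|                             "constructive, actionable feedback."},
|                 {"role": "user", "content": combined},
|             ],
|         )
|         return resp.choices[0].message.content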
|
| On your point about nitpicking: we've tested the system on
| several well-regarded, peer-reviewed papers. While the output
| is generally reasonable and we have not discovered "made up"
| issues yet, there are occasional instances where the feedback is
| misaligned. We're convinced, however, that we can almost fully
| eliminate such noise in future iterations (community feedback is
| super important to achieve this).
|
| On the code side: 100% agree. This is very much an MVP focused
| on testing potential value to researchers, and the repeated
| agent classes were helpful for fast iteration. However, your
| suggestion of switching to template-based prompt loading and
| dynamic agent registration is great and would improve
| maintainability and scalability. We'll 100% consider it in the
| next version.
|
| The _determine_research_type method is indeed a stub. Good
| catch. Also, lol @ the JS comment hashes, touche.
|
| If you're open to contributing or reviewing, we'd love to
| collaborate!
| howon92 wrote:
| This is a great idea. Can you share more about what "24
| specialized agents" mean in this context? I assume each agent is
| not simply an LLM model with a specific prompt (e.g. "You're the
| world's best biologist. Review this biology research paper.") but
| is a lot more sophisticated. I am trying to learn how
| sophisticated it is
| rjakob wrote:
| Here is a description of how it works:
| https://github.com/robertjakob/rigorous/tree/main/Agent1_Pee...
___________________________________________________________________
(page generated 2025-05-31 23:00 UTC)