Jun 19, 2026 by Katsutoshi Seki
Tags: english paper science

Citation: Seki, K. (2026). How will paper peer review change in the age of AI. Zenodo. https://doi.org/10.5281/zenodo.20763535

In recent years, there has been active debate over whether it is appropriate to use AI for peer review of papers. However, much of this debate is premised on the current performance of AI. It is true that AI-based peer review has its limits at present, but what matters in designing institutions is not the present but the future. While AI capabilities will continue to improve, the institutions of academic publishing do not change easily. In this article, therefore, I consider how human reviewers and AI would divide their roles if AI were to be fully incorporated into the peer review process.


Problems with the current peer review system

In the current peer review system, human reviewers take on a great many roles. Reviewers read papers, search for logical errors and insufficient explanations, check the validity of cited references, and examine problems with statistical analyses and experimental design. At the same time, they must also judge whether the research is genuinely novel, how important it is, and whether it is worthy of publication.

In recent years, however, the burden of peer review itself has also been increasing. The spread of large language models has lowered the cost of writing papers, and researchers can now write papers more easily than before. As AI comes to support research activities, this trend will likely grow even stronger. As a result, while the number of submitted papers increases, it is hard to imagine that the number of humans available to review them will increase at the same rate.

If AI makes it easier to write papers, then unless AI is also introduced on the reading side—that is, on the peer review side—the balance of the entire academic publishing system may collapse.

AI handles the front stage of the peer review process

Against this backdrop, in the future, AI peer review may be incorporated into the stage prior to paper submission.

However, this does not mean that authors would freely use an AI of their own choosing. Considering the confidentiality of papers and the fairness of peer review, it will likely take the form of publishers or academic societies providing an official AI peer review system.

Once authors finish writing a paper, they submit the manuscript to the AI peer review system provided by that publisher. The AI reads the paper and points out logical leaps, insufficient explanations, missing citations, questions about statistical processing, and so on. Authors make revisions in response to these comments and have the AI check it again. This exchange may be repeated any number of times.

Even today, authors improve their manuscripts while receiving comments from collaborators and colleagues. The dialogue with AI can be thought of as a mechanism for doing this more systematically and continuously. Note that this back-and-forth at the draft stage is, so to speak, free consultation, and its content itself does not become part of the submission.

What is submitted is not the log but the session ID

The exchange with AI at the draft stage may amount to dozens, or in some cases hundreds, of rounds. If human reviewers had to read all of it, the burden would only increase, and there is no need in the first place to show reviewers the process of trial and error.

Therefore, what is included in the submission is not the exchange at the draft stage, but one more round of AI peer review, executed at the point when the author judges the paper to be “complete.” The manuscript submitted to the AI at this stage is the completed version itself. Accordingly, if the AI points out a new problem here, the author has only one option left within that session: to explain why the comment is not valid. If the author instead considers the AI’s comment valid and revises the manuscript, that is not a “revision” within the same session but the creation of a new, different completed version. In that case, the author must run the final-check AI peer review again on the new completed version. The author repeats this final check until every remaining AI comment can be answered through explanation alone. Once they judge that no further revisions or explanations are necessary, they submit.

Of this series of attempts, the publisher’s system records in association with the paper only the last final-check session, the one corresponding to the manuscript that was actually submitted. The history of earlier attempts and of consultation at the draft stage is not included in the submission. In other words, what accompanies the manuscript is not a lengthy log but only the session ID pointing to this single, final session.
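As a rough illustration only, the submission rule described above could be sketched as follows. Every name here (`FinalCheckSession`, `run_final_check`, `submit`, `digest`) is hypothetical, and using a SHA-256 fingerprint to detect that the manuscript was revised after the final check is an assumption of this sketch, not something the proposal specifies.

```python
from dataclasses import dataclass, field
from hashlib import sha256
from typing import Callable, Dict, List
from uuid import uuid4


def digest(manuscript: str) -> str:
    """Fingerprint of the manuscript text (assumed way to detect a revision)."""
    return sha256(manuscript.encode("utf-8")).hexdigest()


@dataclass
class FinalCheckSession:
    manuscript_digest: str
    ai_comments: List[str]
    # Author explanations, keyed by the index of the AI comment they answer.
    explanations: Dict[int, str] = field(default_factory=dict)
    session_id: str = field(default_factory=lambda: uuid4().hex)


def run_final_check(manuscript: str,
                    ai_review: Callable[[str], List[str]]) -> FinalCheckSession:
    """Start a new final-check session for one completed version."""
    return FinalCheckSession(digest(manuscript), ai_review(manuscript))


def submit(manuscript: str, session: FinalCheckSession) -> Dict[str, str]:
    """Build the submission: the manuscript fingerprint plus the session ID only."""
    if digest(manuscript) != session.manuscript_digest:
        # A revised manuscript is a new completed version and needs a new session.
        raise ValueError("manuscript differs from the one checked; run a new final check")
    return {"manuscript_digest": session.manuscript_digest,
            "session_id": session.session_id}
```

The point of the sketch is that the draft-stage history never enters the submission: `submit` returns only a reference to the single final session, and it refuses a manuscript that no longer matches the version that session checked.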

Human reviewers check the unresolved points

Human reviewers first read the paper. Afterward, as needed, they refer to the AI peer review session at that submission stage. Particularly important are the points on which the AI and the author did not reach agreement until the end.

These are places where the AI considers there to be a problem, but the author responded through explanation rather than revision. Such places become candidates for human reviewers to check intensively. However, we need to think carefully about how this mechanism affects the author’s behavior (this point is taken up again in the discussion of author incentives near the end of the article). On the other hand, for problems that the AI pointed out and the author has already revised, reviewers may not need to follow the details.

Through this mechanism, human reviewers can concentrate not on the task of searching out problems from among countless possibilities, but on the task of evaluating already-organized points.
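A minimal sketch of how the final session might be filtered for the human reviewer is shown below. It assumes, beyond what the text states, that the AI records for each comment whether it was persuaded by the author’s explanation; the `ReviewedComment` type and its fields are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ReviewedComment:
    ai_comment: str
    author_explanation: str
    # Assumption of this sketch: the AI states whether the explanation persuaded it.
    ai_accepts_explanation: bool


def unresolved_points(comments: List[ReviewedComment]) -> List[ReviewedComment]:
    """Return the comments on which the AI and the author did not reach agreement."""
    return [c for c in comments if not c.ai_accepts_explanation]


session = [
    ReviewedComment("Missing citation for the method.", "Cited in Section 2.", True),
    ReviewedComment("Sample size is not justified.", "Standard for this design.", False),
]
for point in unresolved_points(session):
    print(point.ai_comment, "->", point.author_explanation)
```

Under these assumptions the reviewer sees one already-organized point instead of the whole session.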

The reviewer’s role shifts from “error hunting” to “value judgment”

If such a mechanism takes hold, the very role of human reviewers will likely change.

Much of the technical work—discovering logical contradictions and insufficient explanations, searching for related literature, and checking statistical processing—will come to be handled by AI. On the other hand, human reviewers will spend more time on judgments such as whether the research is genuinely novel, how much impact it has on the field, and whether it is worth publishing.

Of course, there will likely continue to be cases where humans discover problems the AI overlooked. However, I think that the main role of human reviewers will gradually shift from “searching for errors” to “evaluating the value of the research.”

Publishing the session log as an appendix to the paper

The final session at the submission stage is not only recorded within the publisher’s system; when a paper is accepted, perhaps that record could also be published as an appendix.

In discussions of open peer review by human reviewers, the resistance to having reviewers’ identities and evaluations made public has long been the greatest obstacle. Reviewers fear retaliatory evaluations from authors in response to candid criticism, and so they seek anonymity. However, in an AI peer review session, there is no “reviewer’s career” to protect. The greatest objection that open peer review has faced does not apply to this design.

Furthermore, combining publication with the design in which the author’s only option in the final session is “explanation,” not “revision,” can be expected to have another effect. The author’s explanation is seen not only by a limited readership of editors and reviewers but by every expert who reads the paper. Rather than being entrusted to the judgment of a single reviewer, verification of the explanation’s validity is distributed across the entire field. This idea connects naturally with the practice, used by some academic journals, of publishing review comments and author responses.

However, if publication were left to the author’s free choice (opt-in), the very fact of “not publishing” could give rise to suspicion that something is being hidden, and the choice could come to have de facto coercive force. To avoid this, it would be more realistic to make publication of the session the default for all accepted papers. On the other hand, how to handle the session when a paper is rejected or resubmitted to another publisher has not yet been sufficiently examined.

And this mechanism of publication is connected to a more fundamental problem concerning author incentives. I would like to discuss that again in the next section.

However, in order to make the mechanism described so far actually function, there remain issues that have not yet been resolved. Finally, I would like to list the main points of contention.

Points of contention that require future discussion

Author incentives and the risk of “pandering”

In this design, the places within the final session at the submission stage where the AI and the author did not agree become the focus of human reviewers’ intensive checking. However, a rule under which disagreement itself draws the reviewers’ attention turns a genuine rebuttal into a risk for the author. If authors come to think it is safer to superficially accept the AI’s comments and revise, even when they are confident in their original position, the quality of scientific discussion could end up declining.

The proposal to publish the session log as an appendix to the paper does not resolve this problem; rather, it may strengthen it in a different form. If the explanation is to be published, the author will write that explanation not as “words to convince a single reviewer” but as “a permanent record exposed to the eyes of the entire field.” This can work toward raising the quality of the explanation, but at the same time it can also work toward increasing the psychological cost of leaving the disagreement with the AI on record. As a result, there is a risk that the pressure toward “just revising it for now” rather than “rebutting” will become even stronger than before.

This is a problem that exists in a different form even in current peer review between humans, but in a mechanism where the dialogue with AI is mechanically recorded and may ultimately be made public, that pressure may work more strongly and more uniformly. Operational ingenuity is needed so as not to make disagreement a “hidden penalty,” but this is a problem that involves peer review culture and the training of evaluators, and it is not something that can be solved immediately through technology.

The governance problem of single/multiple AI systems

The design in which publishers or academic societies provide an official AI peer review system seems reasonable from the standpoint of confidentiality and fairness. However, this creates a new problem of power concentration. If one publisher depends on a single AI vendor, the systematic biases and blind spots of that AI affect the review of every paper that publisher puts out. Conversely, if different vendors are adopted by each publisher, review standards could implicitly fragment by field or medium.

Whichever way it proceeds, problems that belong to governance rather than technology come to the fore—such as the criteria for selecting AI vendors, the transparency of evaluations, and ensuring consistency among vendors. This is a point of contention that cannot be resolved by the judgment of a single researcher or a single publisher, and requires consensus-building across the entire academic publishing world.

Evaluation and auditing of the AI peer review system itself

When introducing an official AI peer review system, the question of who evaluates that AI itself, and how, becomes a problem. Since the AI influences the author’s revision policy and the allocation of human reviewers’ attention, it is not a mere auxiliary tool but a part of the institutional infrastructure of academic publishing.

Therefore, it is necessary to continuously evaluate which kinds of problems the AI is good at discovering and which kinds of problems it tends to overlook. It is also necessary to verify whether there is any systematic bias toward particular research methods, theoretical positions, writing styles, languages, or regions. Furthermore, since review standards may implicitly change with model updates, mechanisms such as version control, impact assessment at the time of updates, and third-party auditing will likely be necessary.
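One way such an impact assessment at update time might look is sketched below, assuming a benchmark of manuscripts with known, labeled problems exists. The category names, the result format, and the 0.05 tolerance are all hypothetical choices made for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# One entry per known problem in the benchmark: (problem category, was it detected?)
Result = Tuple[str, bool]


def detection_rates(results: Iterable[Result]) -> Dict[str, float]:
    """Fraction of known problems the AI detected, per category."""
    found: Dict[str, int] = defaultdict(int)
    total: Dict[str, int] = defaultdict(int)
    for category, detected in results:
        total[category] += 1
        found[category] += int(detected)
    return {c: found[c] / total[c] for c in total}


def regressions(old: Dict[str, float], new: Dict[str, float],
                tolerance: float = 0.05) -> Dict[str, float]:
    """Categories where the new version detects noticeably less than the old one.

    Returns the change in detection rate (negative numbers) for each such category.
    """
    return {c: new.get(c, 0.0) - old[c]
            for c in old
            if new.get(c, 0.0) < old[c] - tolerance}
```

Running `detection_rates` on the same benchmark before and after a model update, and passing both results to `regressions`, would flag categories of problems that the updated model has started to overlook.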

Differences by field

The usefulness and risks of AI peer review differ by field. In fields with many checking items that are relatively easy to formalize—such as statistical analysis, experimental design, data processing, and conformity to reporting guidelines—AI peer review may function effectively.

On the other hand, in fields where theoretical originality, reinterpretation of concepts, historical context, and the fine details of reading literature are important, the AI may lean too heavily toward existing standard understandings. In that case, AI peer review may work in the direction of encouraging conformity to existing frameworks rather than supporting new arguments. Therefore, rather than introducing AI peer review uniformly across all fields, operation suited to the peer review culture and the nature of papers in each field is necessary.

Cost burden and fairness of access

An official AI peer review system incurs costs for computational resources, maintenance, security, auditing, and so on. Who bears that cost—the publisher, the academic society, the author, the research institution, or the funding agency—is an important point of contention.

If the usage fee is added on top of submission fees or publication fees, it could become a new barrier for authors with limited research funds. Moreover, if only major publishers and leading academic societies can put in place high-performance AI peer review systems, the gap with small-scale journals could also widen. If AI peer review is to be institutionalized, mechanisms for ensuring fairness of access—such as fee waivers, shared infrastructure, and nonprofit foundations—need to be discussed as well.