Why GenAI Makes Research Evaluation Reform Even More Urgent 

December 03, 2024

There is a lot of hype surrounding generative AI (GenAI) tools, with discussions ranging from fears of AI taking over the world to hopes that it could solve crises like climate change. In academic research, GenAI currently offers tools that can make our work easier by automating repetitive tasks, helping with grammar, and debugging code. In theory, these tools should allow researchers to focus on more meaningful aspects of their work, such as critical thinking, reviewing, and supervising students. GenAI also has great potential to boost research quality by supporting literature search, aiding in the understanding of complex concepts, pointing out different perspectives or concepts from other disciplines, and brainstorming next steps. However, an increase in research quality through AI is not guaranteed, thanks to the current academic reward system, which focuses almost exclusively on research output in the form of published papers.
 
There is often little recognition or reward for conducting thorough, transparently documented research, especially if it does not yield results deemed 'exciting' enough for publication in so-called 'high-impact journals', the prestigious outlets in each field. GenAI is becoming increasingly proficient at generating finished products, whether polished text, code, or visual outputs. The ease with which these products can be created makes it tempting for researchers to rely heavily on AI tools to speed up paper production. Is this necessarily a problem? Given how overworked researchers are, any help they can get should be appreciated. However, I do see real dangers in outsourcing large parts of the research process to 'black box' tools, which will make it even harder to evaluate and assure the quality of the research process behind finished papers. In this blog post, I will explore why the rise of GenAI, combined with the flaws in current evaluation systems, makes reforming research assessment more urgent than ever.

Where Overreliance on GenAI Can Be Dangerous

1. Data Analysis

GenAI is highly effective at teaching and assisting with data analysis. You can upload spreadsheets and receive fully completed statistical analyses with conclusions. It can even write entire analysis scripts for you. This process, if done manually, can be time-consuming, especially for researchers with limited experience in data analysis. However, if you do not fully understand the code that GenAI generates, it is easy to overlook important analytical details. Every decision you make in the data analysis pipeline needs to be well justified, and the principles of good scientific practice require that researchers remain accountable for the entire research process. Over-reliance on GenAI can undermine this accountability. Additionally, AI tools can unintentionally lead to p-hacking, a questionable research practice in which multiple exploratory analyses are run, increasing the likelihood of false positives and ultimately compromising the integrity of research findings.
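To make the false-positive risk concrete, here is a minimal Python sketch (my own hypothetical illustration; the group sizes and number of tests are arbitrary) of what uncorrected exploratory testing does. Both groups are drawn from the same distribution, so any 'significant' difference is a false positive by construction:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

n_tests = 20      # number of exploratory outcome variables tested
n_per_group = 30  # sample size per group
alpha = 0.05      # conventional significance threshold

# Both groups come from the SAME distribution, so the null hypothesis
# is true for every test: any "significant" result is a false positive.
false_positives = 0
for _ in range(n_tests):
    group_a = rng.normal(loc=0.0, scale=1.0, size=n_per_group)
    group_b = rng.normal(loc=0.0, scale=1.0, size=n_per_group)
    _, p_value = stats.ttest_ind(group_a, group_b)
    if p_value < alpha:
        false_positives += 1

print(f"'Significant' findings on pure noise: {false_positives} / {n_tests}")
# Probability of at least one false positive across 20 independent tests:
print(f"P(>=1 false positive) = {1 - (1 - alpha) ** n_tests:.2f}")  # ~0.64
```

At a threshold of 0.05, running 20 independent tests on pure noise gives roughly a 64% chance of at least one 'significant' finding. An AI assistant that cheerfully runs every analysis you ask for makes it very easy to stumble into this without noticing.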

2. Writing Papers / Thinking with GenAI

Writing a high-quality scientific paper requires a deep understanding of the topic and logical coherence of your thoughts. Ideally (and of course this is not true of every paper I have read), every word connects back to a well-thought-out argument. When GenAI steps in to aid with writing, it can change the meaning in subtle ways, and this may go unnoticed if you do not have a clear, logical idea in mind. AI-generated text may sound convincing, but the logic may not hold. In fact, a lot of my thinking actually happens while writing, because the writing process forces me to think about arguments more deeply. Nevertheless, I have, for example, used ChatGPT to help write this article. In this case, I knew quite well what I wanted to convey, asked for some phrasing suggestions, and then used the output for my next iteration. This helped me work more efficiently, but I still took ownership of the text and made sure it reflected what I wanted to say. When working under pressure, a tool that can seemingly fix some of the missing links is surely welcome. GenAI can accelerate paper production, and I fear that it will substantially contribute to the already existing problem of low-quality literature: papers that are polished on the surface but lack substance, flooding the research landscape. We are already seeing this manifest in the rise of paper mills.

3. Plagiarism and Attribution

While it is possible for multiple people to arrive at similar ideas independently, coincidentally producing the exact same phrasing is statistically unlikely in traditional writing. Such copying is called plagiarism and is considered a form of scientific misconduct. In academic work, direct quotes or paraphrased ideas must be appropriately cited. With GenAI, however, the way a large language model encodes and represents information is only partially understood, and it is nearly impossible to trace specific outputs back to the data that influenced them. As a result, proper attribution becomes problematic when using GenAI. This may not be a significant issue when AI is used for tasks like grammar checks, but it becomes a serious concern when AI generates entire paragraphs of polished content or creates illustrations. With the uptake of these tools, we will need broader debates about the nature of creativity and about what intellectual work we aim to protect, and why. As of now, using someone else's wording or ideas without attribution is considered a serious violation of academic ethics. In addition, with GenAI we will not only have more literature; the literature may also become less and less diverse in terms of which ideas are expressed.

Why GenAI Worsens the Existing Incentive Problems

The current pressure to publish, particularly in 'high-impact journals', already drives researchers toward questionable research practices. GenAI opens even more doors for boosting your publication list. When the focus is on output, on how many papers you can produce rather than the robustness of your research process, AI tools are likely to be misused. It is easy to imagine researchers delegating large portions of their work to AI, as we would to a student or research assistant. Of course, a student would also complete tasks independently, but the supervisor still holds responsibility for ensuring the work is done correctly. Over time, as trust builds, supervisors might double-check the work less rigorously. I can see the same dynamic emerging with AI tools. The critical difference, however, is that students themselves can be held accountable for adhering to good scientific practices. AI, on the other hand, cannot.

One can argue that research has always required a certain level of trust in tools. Even before GenAI, we trusted that hardware was constructed correctly and that software programs worked as intended. This is true, but responsible researchers always perform sanity checks on the tools they use, and the same should apply to GenAI. To conduct these checks, we need a solid understanding of the processes that generate the results. The more experience a researcher has with these processes, the easier it is to spot potential errors. However, if the next generation of scientists becomes accustomed to automating entire processes simply because they are rewarded for output, the research process itself will likely be neglected more and more.

The Need for Transparency

How can we preserve (or better: bring back) the focus on the processes that create the scientific output? By rewarding researchers for making their entire research process as transparent as possible: not just rewarding them for a finished paper, but also for well-documented and reviewed code, shared and reproduced protocols, thought-through research strategies, and replication samples. If we want to ensure quality in research, research evaluation needs to focus on the entire research process, not just the pretty PDF that comes at the end. Only in this way can we even attempt to properly evaluate the quality of a piece of scientific work. This ties into the broader movement towards open science, where transparency and accountability are key to advancing knowledge responsibly.

GenAI makes the need for transparency more urgent than ever, but also more challenging. Simply stating that you used a particular AI tool for certain parts of your research is an important declaration, but it does not tell the whole story. It is equally critical to explain why you used the tool, what it does, and how you validated its output. While GenAI can certainly speed up and streamline processes, you still need to fully understand what is happening in order to remain accountable and transparent. This is essential to maintaining good scientific practice and ensuring the integrity of your research.

Impact on the Academic Environment

In a recent article, Watermeyer et al. critically discuss the broader implications of AI tools for the academic working environment. One of the dangers discussed is that the pressure to produce, combined with GenAI tools that can perform tasks outside a researcher's own skillset, will push academics further towards individualistic excelling and away from the collaborative nature of robust research.

In the current academic system, success is usually measured by journal impact factors, citation counts, and the sheer number of papers published. The more researchers are pushed to meet these numerical benchmarks, the less time they have to deeply engage with their work. If not managed carefully, GenAI could exacerbate this problem, making it easier to produce papers but harder to ensure they make meaningful contributions to their fields. What I hope to see is GenAI being used to alleviate researchers’ workloads, giving them the space to develop more grounded theories, create more robust tools, and dedicate more time to collaboration with other researchers and broader society. Achieving this, however, requires a shift in how we evaluate and hire researchers, rewarding them for such activities.

Conclusion: Reform is Urgent

If we continue to reward output over process, GenAI will exacerbate the problems we already have with the reliability of scientific findings. We need to reform how research is evaluated, focusing on transparency, process, and quality over quantity. Only then can we ensure that AI tools are used to enhance, rather than undermine, the integrity of academic research. To see which reforms are already ongoing, see my previous blog post.