The ethics of using artificial intelligence in writing medical research papers
Abstract
The rapid integration of large language models into medical publishing offers considerable potential for improving drafting efficiency but simultaneously raises substantial concerns regarding research integrity, accountability, and the reliability of the scientific record. Recent incidents in which artificial intelligence (AI) systems were listed as coauthors have prompted urgent regulatory revisions. In this review, we identify a global consensus that strictly prohibits AI authorship, as algorithms lack both moral agency and legal accountability. Transparency has emerged as an essential requirement, and undisclosed AI use is increasingly regarded as a form of ethical misconduct. Key risks include “hallucinations” (notably citation fabrication), algorithmic bias, and potential violations of privacy regulations (e.g., the Health Insurance Portability and Accountability Act) when protected health information is processed through cloud-based platforms. The analysis indicates that rigid prohibitions are operationally unenforceable and instead supports a “human stewardship” model in which AI functions as a drafting scaffold subjected to rigorous human verification. AI represents a lasting transformation in medical writing that necessitates a shift from simple prohibition to structured governance. To preserve epistemic validity, we propose a framework built on task segmentation, mandatory cross-referencing of claims, and data sovereignty. Ultimately, AI must remain a transparent assistive tool, with full responsibility for the manuscript residing exclusively with the human investigator.
Introduction
Recent years have witnessed a rapid convergence of medical publishing and artificial intelligence (AI) [1,2]. With the advent of accessible large language models (LLMs) such as ChatGPT in late 2022, the barrier to using generative text technologies has effectively vanished [3]. Although these tools hold significant potential for alleviating the burden of manuscript drafting and polishing [4,5], their integration inevitably raises substantial concerns regarding research integrity and ethical liability [6].
Despite their utility in refining academic prose, LLMs have come under scrutiny for generating hallucinations and counterfeit references, thereby compromising the reliability of scientific literature [7]. The vulnerability of the publishing ecosystem was starkly illustrated by recent incidents in which AI models were formally recognized as coauthors, a practice that drew sharp criticism [8-10]. These events served as a tipping point, prompting immediate and decisive action from medical journals to redefine the boundaries of accountability and authorship.
There is a growing consensus that while AI functions as a robust mechanism for scholarly assistance, its application requires rigorous oversight to maintain the epistemic validity of medical science [11,12]. In response, the academic publishing community has mobilized to establish governance frameworks capable of addressing these novel legal and ethical complexities [13,14]. Central to these emerging protocols are the dual imperatives of absolute transparency and protection of scientific rigor [15,16]. Against this backdrop, this narrative review offers a structured analysis of the ethical implications of AI in medical authorship. The article follows the formal IMRaD structure (Introduction, Methods, Results, and Discussion), closing with a Conclusion, to ensure a comprehensive evaluation of this shifting paradigm.
As this study relied exclusively on publicly available literature, Institutional Review Board approval was not required.
Methods
To delineate the ethical landscape of AI-assisted medical writing, a comprehensive narrative review was conducted. We performed a targeted literature search of PubMed/MEDLINE and Google Scholar, covering the period from inception through November 2025, using search terms including “AI writing medical research ethics,” “AI in scientific publishing guidelines,” and “ChatGPT authorship medical journals.” The inclusion criteria encompassed peer-reviewed original research, editorial perspectives, and consensus statements addressing the integration of generative AI in manuscript preparation. The analysis prioritized regulatory frameworks established by major governance bodies, specifically the International Committee of Medical Journal Editors (ICMJE), World Association of Medical Editors (WAME) [17], and Committee on Publication Ethics (COPE). Additionally, we examined the submission protocols of high-impact journals such as The New England Journal of Medicine (NEJM), Nature, Science, the Journal of the American Medical Association (JAMA), and the British Medical Journal (BMJ) to identify evolving standards. Information from these sources was thematically synthesized into six core ethical domains: authorship accountability, transparency, plagiarism/originality, factual accuracy, algorithmic bias, and data confidentiality.
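For readers who wish to reproduce or extend the search, the query logic can also be expressed programmatically. The following is a minimal sketch using Biopython’s Entrez wrapper for the NCBI E-utilities; the library choice, contact address, and result cap are illustrative assumptions, not the tooling actually used for this review.

```python
# A minimal sketch (not the actual tooling used) reproducing the PubMed
# arm of the search strategy via Biopython's Entrez module. The contact
# email and result cap are placeholder values; NCBI requires an email.
from Bio import Entrez

Entrez.email = "researcher@example.org"

SEARCH_TERMS = [
    "AI writing medical research ethics",
    "AI in scientific publishing guidelines",
    "ChatGPT authorship medical journals",
]

for term in SEARCH_TERMS:
    # esearch returns matching PubMed IDs; mindate/maxdate bound the
    # window at the review's November 2025 cutoff.
    handle = Entrez.esearch(db="pubmed", term=term, retmax=100,
                            datetype="pdat", mindate="1900/01",
                            maxdate="2025/11")
    record = Entrez.read(handle)
    handle.close()
    print(f"{term}: {record['Count']} records")
```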
Results
We identified key ethical dimensions requiring vigilance in AI-assisted scholarship, particularly concerning authorship and accountability. A review of global guidelines reveals a strict prohibition against listing generative AI as a coauthor [11,18] (Table 1). This restriction is grounded in the understanding that AI lacks the capacity for accountability—a core requirement for authorship [19,20]. As AI cannot accept responsibility for the accuracy or integrity of a study, it fails to meet the established criteria for academic credit. The 2023 ICMJE guidelines reinforce this, stipulating that AI contributions must be acknowledged within the text but never included in the byline [11]. Therefore, human authors must retain full ownership of the scientific output. Even when AI assists in text generation, the human researcher acts as the final guarantor of the work’s quality, bias control, and adherence to ethical norms [21,22]. In practice, this requires AI to be used solely as a drafting aid, subject to meticulous human review to ensure that scientific validity is not compromised by automated content generation [14,15].
Transparency and disclosure represent the bedrock of trust in the evolving landscape of AI-assisted scholarship [23,24]. A robust consensus has crystallized within the academic community: the use of generative AI must be explicitly declared [25]. Leading publishers and research institutions now mandate that any application of these tools—ranging from initial drafting to linguistic refinement—be formally documented within the Methods, Acknowledgments, or a designated disclosure section [26-28]. The primary objective of these protocols is to provide editors and peer reviewers with the context necessary to rigorously assess the manuscript’s integrity. Noncompliance with these disclosure norms is increasingly categorized as a serious ethical violation, carrying potential repercussions ranging from editorial rejection to post-publication retraction. For example, the American College of Gastroenterology has stipulated that failure to attest to AI use constitutes a breach of publication ethics [24]. Similarly, major high-impact journals including Nature, BMJ, NEJM, JAMA, and Science have codified policies in 2023–2024 requiring granular reporting; authors are often expected to specify the software version, the prompts employed, and the precise sections of text affected [29-31]. A compliant statement might read, for instance: “ChatGPT (GPT-4, OpenAI) was used to improve the grammar of the Discussion; all output was reviewed and verified by the authors.” This level of transparency ensures that the distinction between human intellectual contribution and algorithmic output remains unambiguous. Although specific formatting requirements may vary across journals, the underlying ethical imperative is uniform: obscuring AI involvement is deceptive [32,33]. By adhering to open disclosure, authors uphold the principle that AI is a tool to be acknowledged rather than a silent collaborator to be concealed.
The domain of plagiarism and originality presents a complex ethical frontier in the era of AI-augmented authorship. Plagiarism—defined as the uncredited appropriation of another’s intellectual work—remains a cardinal violation of research integrity [34,35]. Because LLMs operate by synthesizing probabilistic patterns from massive training corpora of existing literature, there is a tangible risk that they may inadvertently reproduce verbatim phrases or distinctive syntactic structures from prior publications [14,36]. Consequently, researchers utilizing these tools must exercise extreme vigilance to ensure that algorithmic outputs do not constitute derivative works lacking proper attribution. Routine editorial screening now involves sophisticated plagiarism detection software; thus, AI-generated text that closely mirrors published sources is likely to trigger similarity flags [37-39]. Furthermore, because AI lacks genuine semantic understanding, it may regurgitate common scientific descriptions that verge on plagiarism if not rigorously cited. Beyond textual similarity, significant ambiguities persist regarding intellectual property (IP) rights [15].
The terms of service for various AI platforms differ markedly: some assign ownership to the user only under specific payment tiers (e.g., Midjourney) [40], while others grant ownership but retain complex rights over the generated content. For instance, both OpenAI [41] and Google [42] explicitly state that while users own their output, the models “may generate the same or similar content for others,” thereby creating potential legal disputes over the provenance and exclusivity of the text.
There is also a latent risk of copyright laundering, wherein an AI incorporates protected material from its training data into the output without citation [43]. Ethically, the burden of verification rests entirely with the human author, who must guarantee that the manuscript is original. To mitigate these risks, AI should be employed strictly as a scaffolding tool for drafting, with the final manuscript subjected to rigorous human editing and citation verification to uphold the standards of academic honesty [44,45].
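To make the similarity concern concrete, the toy check below illustrates the principle behind such screening. It uses only Python’s standard library and a deliberately naive threshold; commercial editorial tools operate against indexed corpora and are far more sophisticated, so this is a sketch of the idea, not of any real detector.

```python
# Toy similarity screen: flags near-verbatim overlap between a drafted
# passage and a known published sentence. Threshold and example strings
# are illustrative assumptions only.
from difflib import SequenceMatcher

def flag_similarity(draft: str, source: str, threshold: float = 0.85) -> bool:
    """Return True when the draft closely mirrors the source text."""
    ratio = SequenceMatcher(None, draft.lower(), source.lower()).ratio()
    return ratio >= threshold

draft = "Large language models synthesize probabilistic patterns from existing literature."
published = "Large language models synthesise probabilistic patterns from existing literature."
if flag_similarity(draft, published):
    print("Near-verbatim overlap: quote and cite the source, or rephrase.")
```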
A paramount ethical challenge in AI-assisted writing lies in the domain of accuracy and verifiability [46-48]. Generative models have an inherent limitation often termed hallucination or confabulation, wherein the system synthesizes rhetorically persuasive but factually spurious content [49,50]. Within the rigorous context of medical research, this propensity poses a severe risk; data interpretations, clinical claims, or statistical figures generated by AI may be fundamentally erroneous despite their authoritative tone. Of particular concern is the documented phenomenon of citation hallucination, where models invent nonexistent bibliographic references or misattribute findings to unrelated sources [51]. This behavior threatens the epistemic integrity of the scientific record, as peer review and reproducibility depend entirely on the ability to trace assertions back to primary evidence. The propagation of such illusory citations not only wastes the time of readers and editors but also creates a false lineage of knowledge. Consequently, the burden of verification falls unequivocally on the human investigator. It is incumbent upon authors to rigorously cross-reference every AI-generated claim and citation against established databases (e.g., PubMed/MEDLINE) to confirm their existence and relevance [13,52]. Treating AI outputs with skepticism is a nonnegotiable standard; failure to perform due diligence in verifying algorithmic content may be construed as professional negligence or scientific misconduct, as it compromises the veracity of the published work [53,54].
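The verification duty described above lends itself to simple automation as a first pass. The sketch below, assuming Biopython is installed, queries PubMed for each reference title and flags titles returning zero hits; the example title and the exact matching strategy are illustrative, and a zero-hit result warrants human follow-up rather than automatic rejection.

```python
# First-pass citation check: a title returning no PubMed records is a
# candidate hallucination and must be verified manually. The title list
# below is hypothetical.
from Bio import Entrez

Entrez.email = "researcher@example.org"  # NCBI requires a contact address

def pubmed_title_hits(title: str) -> int:
    """Count PubMed records whose title field matches the given string."""
    handle = Entrez.esearch(db="pubmed", term=f'"{title}"[Title]')
    record = Entrez.read(handle)
    handle.close()
    return int(record["Count"])

draft_reference_titles = [
    "Effects of example therapy on hypothetical outcomes: a randomized trial",
]

for title in draft_reference_titles:
    if pubmed_title_hits(title) == 0:
        print(f"No PubMed match; verify manually: {title}")
```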
The issue of bias and fairness is a critical ethical consideration. LLMs are trained on massive datasets that are not manually curated; as a result, they tend to absorb and reproduce the historical prejudices and structural imbalances present in human text [55]. Without careful human supervision, there is a risk that these tools will repeat or even amplify these inequities. In medical writing, this problem often manifests as a lack of inclusive terminology or a narrow focus on research topics relevant only to certain populations [56]. There is also the risk of citation bias, whereby AI suggests references primarily from prominent Western journals because they appear most frequently in its training data, thereby overlooking important work from other regions [57,58]. While some argue that AI could make language more neutral, experts warn that these models contain hidden biases that may simply replace one form of discrimination with another [59]. Therefore, authors must act as a filter. It is the responsibility of the researcher to carefully edit any AI-generated text to ensure it meets the standards of equity and objectivity expected in medicine (Table 2).
The preservation of privacy and confidentiality stands as a foundational obligation in the era of algorithmic writing. Medical scholarship frequently involves handling sensitive datasets, including protected health information and proprietary clinical trial results. A critical ethical hazard arises when such data are submitted to generative AI systems, which predominantly operate on external, third-party cloud infrastructure [60]. Because user inputs are often retained to retrain model architectures, there is a tangible risk that confidential information could be absorbed and subsequently exposed [61]. Specifically, the input of identifiable patient details into commercial AI platforms may constitute a direct violation of privacy laws, such as the Health Insurance Portability and Accountability Act in the United States [62]. Furthermore, the transmission of unpublished manuscripts—potentially containing novel findings or confidential peer review data—to unsecured AI servers jeopardizes data sovereignty. In response to these concerns, editorial bodies have implemented strict prohibitions; for example, JAMA explicitly prohibits peer reviewers from using AI to process submissions, citing breach of confidentiality as a disqualifying factor [20]. Consequently, researchers must abandon the assumption that online tools are secure by default. Ethical practice dictates that AI utilization be strictly compartmentalized: it may be applied to nonsensitive structural editing but must be rigorously excluded from processing raw patient data or IP [63]. Where AI is deemed necessary for sensitive tasks, it should be restricted to local, offline environments that guarantee data isolation [64,65]. Ultimately, the efficiency conferred by AI cannot supersede the inviolable duty to protect patient privacy and research confidentiality; compliance with institutional data protection standards remains nonnegotiable.
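As a concrete illustration of this compartmentalization principle, a local de-identification pass can be run before any text leaves the researcher’s machine. The regex patterns below are a deliberately minimal sketch under assumed US-style identifier formats; genuine HIPAA Safe Harbor de-identification covers eighteen identifier classes and requires human review, so this must not be mistaken for a compliant pipeline.

```python
# Minimal local redaction pass (sketch only): strips a few obvious
# identifier patterns before text is pasted into any cloud-based tool.
# Real de-identification requires far broader coverage plus human review.
import re

REDACTIONS = [
    (re.compile(r"\b\d{3}-\d{2}-\d{4}\b"), "[SSN]"),            # US social security numbers
    (re.compile(r"\b\d{1,2}/\d{1,2}/\d{4}\b"), "[DATE]"),       # slash-formatted dates
    (re.compile(r"\bMRN[:\s]*\d+\b", re.IGNORECASE), "[MRN]"),  # medical record numbers
    (re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"), "[EMAIL]"),   # email addresses
]

def redact(text: str) -> str:
    """Replace recognizable identifiers with placeholder tokens."""
    for pattern, token in REDACTIONS:
        text = pattern.sub(token, text)
    return text

note = "Patient MRN: 483920 seen on 03/14/2024; contact jdoe@example.com."
print(redact(note))  # -> Patient [MRN] seen on [DATE]; contact [EMAIL].
```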
The integration of AI into scientific writing presents a classic dual-use dilemma: it offers substantial efficiency gains that must be carefully balanced against ethical vulnerabilities [66,67]. The arguments for beneficence are strong, particularly in terms of linguistic equity; AI tools can level the playing field for researchers facing language barriers, thereby enhancing the diversity of the scientific record [68]. Moreover, by managing the mechanical aspects of writing, AI enables researchers to focus on substantive inquiry. Recognizing these advantages, the field is shifting away from prohibition—which risks creating a shadow economy of unregulated use—toward a model of transparency and oversight [69]. The results of this review highlight that the integrity of the manuscript depends on the primacy of the human author. AI should be conceptualized as a scaffold for writing, not an architect of ideas. Ethical practice dictates that authors must remain the sole originators of the scientific narrative, using AI only to refine expression. One proposed framework asserts that responsible use necessitates comprehensive human verification of all AI-generated suggestions to ensure the work remains a genuine scholarly effort [70]. Ultimately, the distinction between ethical support and academic misconduct lies in the depth of human engagement: AI can assist in the process, but it cannot be allowed to supplant the intellectual labor of research.
Discussion
Collectively, these findings clearly show that the integration of AI into medical scholarship necessitates a rigorous ethical infrastructure. The dominant paradigm emerging from the literature is one of human stewardship: while AI may serve in an auxiliary capacity to refine the mechanics of prose, the preservation of scientific integrity remains the exclusive domain of the human investigator. A strong consensus has crystallized around several inviolable tenets, most notably the absolute disqualification of AI from authorship eligibility [71]. By uniformly proscribing the listing of algorithms as coauthors, the scientific community reinforces the principle that authorship is not merely a function of text generation but of intellectual responsibility. This distinction is critical; it ensures that the locus of accountability—for factual accuracy, ethical compliance, and originality—resides with a moral agent capable of answering for the work [72,73]. Ultimately, this framework defends the human element of research, positioning AI as a sophisticated instrument subject to the unwavering control and scrutiny of the author.
A second nonnegotiable standard is transparency. Ethical practice requires that the utilization of AI tools be fully visible to the reader [71]. This act of disclosure is not a trivial formality but a crucial safeguard for the validity of the scientific record. By revealing which sections were generated or refined by AI, authors enable the audience to calibrate their trust and scrutinize the content for potential machine-generated inaccuracies. Transparency also serves a broader community function: it allows the field to identify patterns of error—such as specific types of hallucinations—that may be linked to particular algorithms [23,74-76]. The current trend toward requiring comprehensive disclosure (listing model architecture, version numbers, and specific prompts) is vital for ensuring the reproducibility of the drafting process. This level of detail transforms the black box of AI writing into a transparent methodology. By treating AI usage as a form of methodological disclosure—comparable to reporting statistical software or laboratory reagents—researchers uphold the long-standing tradition of openness that underpins credible science.
The discussion also underscores the critical role of education and cultural norms in mitigating ethical risks. Although formal regulations establish a necessary scaffold, they are insufficient to ensure ethical behavior universally. This insufficiency is exemplified by the failure of AI-detection tools [77,78]. Technological efforts to discern AI-generated text remain unreliable, frequently producing false positives in which authentic human writing is erroneously flagged [78]. As detection becomes increasingly impractical, the scientific community must abandon the pursuit of perfect policing in favor of fostering ethical stewardship. Experts argue that the essence of authorship lies in intellectual ownership rather than text generation [72,79]. Thus, provided the human author retains full control over the analysis and vouches for its accuracy, AI functions as a permissible auxiliary tool. The challenge is to ensure AI is treated as an aid and not as a scientist. To achieve this, mentorship and institutional training must address the limitations of generative models, ensuring that researchers understand the distinction between augmentation and automation [80,81]. By embedding these values into academic training, AI can be normalized as a method for enhancing productivity without compromising standards. Ultimately, AI offers operational speed, but it can never replace the ethical and critical burden of the human investigator.
It is also evident that total bans represent an ineffective solution. Early attempts by journals such as Science to proscribe AI-generated text entirely faced criticism for being operationally unenforceable [9,68,82]. Scholars argue that such rigidity inadvertently encourages opaque, undisclosed usage, thereby making the influence of the technology difficult to monitor [68,69]. A superior alternative is the adoption of pragmatic governance. This approach is exemplified by the recent Delphi recommendations in Regional Anesthesia & Pain Medicine, which emphasize the development of ethical guardrails and constructive guidelines rather than strict prohibition [83]. The group achieved consensus on mandating transparency, precluding AI authorship, and creating operational checklists for editors. This represents a strategic shift from interdiction to regulation, aiming to integrate AI into the scientific process under strict oversight. By acknowledging the ubiquity of these tools, this framework focuses on managing risks rather than resisting adoption. Crucially, it promotes fairness by establishing a uniform set of rules, preventing disparities between transparent users and those who might employ AI covertly. In the long term, such well-defined guidelines will likely elevate the ethical stigma of undisclosed AI use to the level of serious misconduct. Ultimately, passing off algorithmic text as human work may come to be viewed with the same gravity as data falsification.
Practical implementation is the final, critical step in the ethical adoption of AI. Based on a synthesis of emerging evidence and current editorial guidelines, we propose five concrete pillars for medical researchers. (1) Task segmentation: deploy AI strictly for mechanical improvements (e.g., grammar and formatting) while retaining human control over scientific interpretation and result synthesis to mitigate error. (2) Rigorous validation: treat all AI-generated content as a draft requiring skepticism; manual verification of statistics and references is mandatory. (3) Archival transparency: maintain a record of the specific prompts used. This practice supports reproducibility and serves as evidence of authorship if challenged; some guidelines even recommend submitting these logs. (4) Data sovereignty: ensure no protected health information is transmitted to unsecured servers; anonymization is a prerequisite for use. (5) Methodological citation: acknowledge AI appropriately as a software tool. Current scholarship rejects AI authorship, advocating instead for technical citation (e.g., ChatGPT-5 used for editing) within the methods section or references [17,68,84]. By adhering to these protocols, the research community can harness the speed and clarity offered by AI technologies while simultaneously reinforcing the ethical rigor that defines medical science (Fig. 1).
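To make pillar (3) actionable, AI interactions can be captured in an append-only local log at the moment of use. The sketch below assumes a JSON-lines file with illustrative field names; no journal mandates this exact format, but such a record can substantiate a disclosure statement if authorship is later challenged.

```python
# Sketch of an archival prompt log (pillar 3): one JSON record per AI
# interaction, appended locally. File name and fields are illustrative.
import json
from datetime import datetime, timezone

def log_ai_use(prompt: str, model: str, section: str,
               logfile: str = "ai_prompt_log.jsonl") -> None:
    """Append a timestamped record of one AI-assisted editing step."""
    entry = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "model": model,        # model name and version actually used
        "section": section,    # manuscript section the output affected
        "prompt": prompt,
    }
    with open(logfile, "a", encoding="utf-8") as f:
        f.write(json.dumps(entry) + "\n")

log_ai_use("Shorten this paragraph without changing its meaning: ...",
           model="example-model-v1", section="Discussion")
```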
Fig. 1. A framework for the ethical integration of artificial intelligence (AI) in medical manuscript development and reporting. This diagram outlines five operational pillars intended to guide medical researchers in the responsible use of AI tools throughout the writing and publication processes. The framework highlights the necessity of maintaining human intellectual control over scientific interpretation while allowing AI to support mechanical aspects of drafting. It requires rigorous manual verification of all AI-generated content as provisional material, mandates archival transparency of prompts to support reproducibility, safeguards patient information through anonymization before use, and complies with current standards stipulating that AI be cited as a methodological tool rather than listed as an author. PHI, protected health information.
Conclusions
AI technology represents a permanent evolution in medical writing, yet its use is contingent upon strict ethical compliance. This review underscores that maintaining the trustworthiness of scientific publishing requires authors to treat AI as a transparent, assistive tool rather than as an intellectual proxy. The ethical imperative is twofold: leveraging AI to improve clarity and efficiency while rigorously safeguarding accuracy, confidentiality, and originality through human verification. As technology advances, the focus must shift toward educational initiatives that normalize responsible use and updated governance frameworks. Ultimately, the partnership between human expertise and AI can benefit science, but only if moral agency and final liability for the work remain exclusively with the human investigator.
Notes
Conflicts of interest
Hyunyong Hwang is an associate editor of the journal but was not involved in the peer reviewer selection, evaluation, or decision process of this article. No other potential conflicts of interest relevant to this article were reported.
Funding
None.
Author contributions
Conceptualization: SY, HH. Data curation: SY, HH. Formal analysis: SY, HH. Investigation: SY, HH. Methodology: SY, HH. Project administration: HH. Resources: SY, HH. Supervision: HH. Visualization: HH. Writing-original draft: SY. Writing-review & editing: HH. All authors read and approved the final manuscript.
