Generative AI and Psychiatric Diagnosis: Progress or Threat to Subjectivity?

neuronapis
May 2

By Paulo HA Oliveira | NEURONAPIS Research Center

The incorporation of generative artificial intelligence (GAI) systems into clinical practice in mental health is one of the fastest-growing and, paradoxically, least evaluated phenomena in contemporary medicine. Between 2023 and 2025, the number of scientific publications on large language models (LLMs) applied to psychiatry grew rapidly, but the pace of technical output has not been matched by the depth of ethical and epistemological reflection that the field demands.

This article does not set out to reject the technology's potential. Rather, it examines that potential with the rigor that the complexity of human psychological suffering requires.

1. The Current Scenario: Documented Capabilities

Peer-reviewed literature documents promising applications of LLMs in psychiatry. A systematic review published in Frontiers in Psychiatry (Omar et al., 2024), which screened 771 retrieved records and included 16 studies in accordance with PRISMA guidelines, identified three main domains of use: clinical reasoning, social media analysis, and educational support. GPT-3.5 and GPT-4 generated psychodynamic formulations with statistically significant agreement among expert raters (Kendall's W = 0.728; p = 0.012) and, for cases of mild depression, showed closer alignment with accepted treatment guidelines than general practitioners did.
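Kendall's W, the statistic cited above, is a coefficient of concordance: it measures how closely several raters agree when each ranks the same set of items, from 0 (no agreement) to 1 (complete agreement). For readers unfamiliar with it, the following is a minimal Python sketch of the standard formula without correction for tied ranks. The function name and the rating matrix are hypothetical illustrations, not code or data from Omar et al. (2024) or from the studies it reviews.

```python
from typing import Sequence


def kendalls_w(rankings: Sequence[Sequence[float]]) -> float:
    """Kendall's coefficient of concordance W (no tie correction).

    rankings[j][i] is the rank that rater j gives to item i, with each
    rater using the ranks 1..n exactly once.
    W = 12 * S / (m**2 * (n**3 - n)), where m = number of raters,
    n = number of items, and S = sum of squared deviations of each
    item's rank sum from the mean rank sum.
    """
    m = len(rankings)
    n = len(rankings[0])
    if m < 2 or n < 2:
        raise ValueError("need at least two raters and two items")
    if any(len(r) != n for r in rankings):
        raise ValueError("every rater must rank the same number of items")

    # Rank sum R_i for each item across all raters
    rank_sums = [sum(r[i] for r in rankings) for i in range(n)]
    # Expected rank sum if raters assigned ranks at random: m * (n + 1) / 2
    mean_rank_sum = m * (n + 1) / 2
    s = sum((r_i - mean_rank_sum) ** 2 for r_i in rank_sums)
    return 12 * s / (m ** 2 * (n ** 3 - n))


# Hypothetical example: three raters ranking four generated formulations
raters = [
    [1, 2, 3, 4],
    [1, 3, 2, 4],
    [2, 1, 3, 4],
]
print(round(kendalls_w(raters), 3))  # → 0.778
```

Values close to 1 indicate that raters ordered the items in nearly the same way; the coefficient says nothing by itself about whether the rated outputs are clinically correct.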

More recently, in August 2025, a study published in Frontiers in Psychiatry (Digital Mental Health section) evaluated 15 recent LLMs, including DeepSeek-R1, GPT-4.1, and Llama 4, on tasks ranging from clinical knowledge tests to diagnostic screening. Models such as DeepSeek-R1, QwQ, and GPT-4.1 were among the strongest performers in clinical knowledge assessment and diagnostic support, although significant limitations remained in generalizing to non-Western contexts (Frontiers in Psychiatry, DOI: 10.3389/fpsyt.2025.1646974).

A systematic review published in JMIR Mental Health (Wang et al., 2025), conducted according to PRISMA 2020 guidelines across six databases (PubMed, ACM, Scopus, Embase, PsycInfo, and Google Scholar), analyzed 79 studies (of 783 identified), covering 2019 to 2024, on generative AI applications in mental health, including diagnostic tools. The authors acknowledge advances in diagnostic accuracy and reduced barriers to access, but identify ethical concerns across all application domains, notably privacy, algorithmic bias, and the absence of consolidated regulatory frameworks (PMC12254713).

2. The Problem of Structural Bias: Empirical Data

One of the most significant, and most disturbing, findings in the field came from a study published in npj Digital Medicine (Nature Portfolio) in June 2025. Bouguettaya, Stuart, and Aboujaoude evaluated four LLMs (Claude, ChatGPT, Gemini, and NewMes-15) using ten psychiatric cases representing five distinct diagnoses, each presented under three experimental conditions: race not mentioned (neutral), race implied, and race explicitly stated.

The evaluators, a clinical psychologist and a social psychologist, rated the 120 resulting outputs for bias. When the patient's race was indicated, whether implicitly or explicitly, the models frequently proposed inferior treatment plans, even though diagnostic decisions remained relatively stable. NewMes-15 exhibited the highest degree of racial bias, and Gemini the lowest. The authors concluded that LLMs have the potential to perpetuate racial disparities in psychiatric care (Bouguettaya et al., npj Digital Medicine, 8:332, 2025. DOI: 10.1038/s41746-025-01746-4).
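The figure of 120 outputs is consistent with a full factorial design: 4 models × 10 cases × 3 race conditions. The short Python sketch below simply enumerates that grid to make the arithmetic explicit. The case labels are hypothetical placeholders, and the assumption that every model received every case under every condition is inferred from the numbers reported above rather than taken from the paper's methods section.

```python
from collections import Counter
from itertools import product

# Model names are from the study; case IDs are hypothetical placeholders.
models = ["Claude", "ChatGPT", "Gemini", "NewMes-15"]
cases = [f"case_{i:02d}" for i in range(1, 11)]  # ten psychiatric vignettes
conditions = ["neutral", "implied", "explicit"]  # how the patient's race is conveyed

# Assumed full factorial grid: every model answers every case under every condition
design = list(product(models, cases, conditions))
print(len(design))  # → 120

# Under this assumption, each race condition contributes one third of the outputs
per_condition = Counter(condition for _, _, condition in design)
print(per_condition)
```

Under this design, each model contributes 30 outputs, so any comparison of one model's treatment plans across race conditions rests on 10 cases per condition, a small sample worth keeping in mind when interpreting the findings.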

This finding is not accidental; it is structural. LLMs are trained on corpora of biomedical literature and clinical records that reflect decades of systemic inequality: underrepresentation of non-white populations, pathologization of culturally specific behaviors, and unequal access to care. A scoping review published in Frontiers in Psychology (June 2025) reinforces this point, documenting that LLM-based interventions grounded predominantly in Western therapeutic frameworks, such as Cognitive Behavioral Therapy (CBT), show limited cultural resonance in African and other non-Western contexts, which compromises engagement and therapeutic efficacy (Frontiers in Psychology, DOI: 10.3389/fpsyg.2025.1715306).

3. Methodological Limitations and the Problem of Clinical Validity

The most comprehensive systematic review on the subject published to date analyzed 205 studies in psychiatry, psychology, and psychotherapy; it was conducted between March and July 2025 and published in Electronics (MDPI) in January 2026. It points to a critical methodological weakness: the vast majority of LLM performance evaluations rely on small, non-longitudinal, single-session datasets, which severely limits the clinical generalizability of the results (Electronics, MDPI, DOI: 10.3390/electronics15030524).

A systematic review of 40 articles published in JMIR Mental Health (Guo et al., 2024) identified inconsistencies in generated text, factual hallucinations, and the absence of a standardized, benchmarked ethical framework, risks that in a psychiatric context can have direct clinical consequences. The authors conclude that, at present, the risks of clinical LLM use may outweigh the benefits for applications that go beyond initial support or psychoeducation (Guo et al., 2024. DOI: 10.2196/57400).

4. The Epistemological Dimension: Subjectivity as an Irreducible Object

Psychiatry occupies a unique position in medicine: its central clinical object is the patient's subjective experience, a phenomenon that by its very nature resists full algorithmic operationalization. Formal psychiatric diagnosis, organized around classification systems such as the DSM-5-TR or ICD-11, presupposes not only the recognition of diagnostic criteria but also the contextualized interpretation of narratives, affects, and behaviors within a clinical relationship.

LLMs operate by statistically modeling linguistic patterns, an operation fundamentally distinct from clinical understanding. As a systematic review in JMIR Mental Health (Wang et al., 2025) points out, GAI has substantial limitations in managing complex cases, assessing suicide risk, and integrating contextual variables that are not captured in typed text. The review also highlights the so-called "black box" problem: because the models' internal processes are opaque, the reasoning that produced a clinical recommendation cannot be audited.

This opacity is not merely a technical problem; it is an ethical one. In high-risk contexts, such as the management of suicidal crises, the inability to understand an automated system's decision-making process, and to hold anyone accountable for it, represents a breach of fundamental principles of medical ethics.

5. The Emerging Regulatory Framework

In June 2025, the American Psychological Association (APA) published Ethical Guidance for AI in the Professional Practice of Health Service Psychology, the first document of its kind produced by the association. The guidance, updated in July 2025, establishes that the final clinical decision and ethical responsibility must rest exclusively with the human professional: AI can suggest and support, but it cannot diagnose or replace clinical judgment. It further emphasizes the need for explicit informed consent when AI tools are used in a therapeutic context (APA, 2025. Available at: apa.org/topics/artificial-intelligence-machine-learning/ethical-guidance-ai-professional-practice).

In the academic literature, Pillay (2025), writing in the journal Healthcare (MDPI), proposes an integrated ethical framework that draws on the codes of the APA, ACA, AMA, and NASW and is organized into five pillars: autonomy and informed consent; beneficence and non-maleficence; confidentiality, privacy, and transparency; justice and equity; and professional integrity and accountability. This model represents the current state of the art in normative discussion on the subject.

6. NEURONAPIS Positioning

NEURONAPIS recognizes that GAI has genuine potential to expand access to mental health care in overburdened health systems, a structural problem that is especially relevant in the Brazilian context. This potential, however, must be subjected to rigorous scientific evaluation rather than celebrated uncritically.

The available data support a clear conclusion: LLMs, in their current state, do not have sufficient evidence of validity, fairness, and safety to be used as stand-alone psychiatric diagnostic tools. Their clinical application, when justified, should be restricted to support functions, initial screening, and psychoeducation, under continuous professional supervision and with clearly established consent and audit structures.

The question that remains open, and which NEURONAPIS proposes as a research agenda, is epistemological: to what extent can systems based on statistical correlations in language contribute to understanding something as singular as psychological suffering? The answer to this question is not found in the technical laboratory. It lies at the intersection of neuroscience, philosophy of mind, and clinical ethics, precisely the territory this institution proposes to inhabit.

References

Bouguettaya, A., Stuart, E. M., & Aboujaoude, E. (2025). Racial bias in AI-mediated psychiatric diagnosis and treatment: a qualitative comparison of four large language models. npj Digital Medicine, 8, 332. https://doi.org/10.1038/s41746-025-01746-4
Wang, X., Zhou, Y., & Zhou, G. (2025). The Application and Ethical Implication of Generative AI in Mental Health: Systematic Review. JMIR Mental Health, 12, e70610. https://doi.org/10.2196/70610 (PMC12254713)
Omar, M., et al. (2024). Applications of large language models in psychiatry: a systematic review. Frontiers in Psychiatry. https://doi.org/10.3389/fpsyt.2024.1422807 (PMC11228775)
Frontiers in Psychiatry – Digital Mental Health. (2025). Evaluation of large language models on mental health: from knowledge test to illness diagnosis. https://doi.org/10.3389/fpsyt.2025.1646974
Frontiers in Psychology. (2025). Exploring the application boundaries of LLMs in mental health: a systematic scoping review. https://doi.org/10.3389/fpsyg.2025.1715306
Guo, Z., Lai, A., Thygesen, J., Farrington, J., Keen, T., & Li, K. (2024). Large language models for mental health applications: Systematic review. JMIR Mental Health, 11, e57400. https://doi.org/10.2196/57400
Pillay, Y. (2025). Ethical decision-making guidelines for mental health clinicians in the artificial intelligence (AI) era. Healthcare, 13(23), 3057. https://doi.org/10.3390/healthcare13233057
American Psychological Association. (2025). Ethical Guidance for AI in the Professional Practice of Health Service Psychology (updated Jul. 2025). https://www.apa.org/topics/artificial-intelligence-machine-learning/ethical-guidance-ai-professional-practice
Electronics – MDPI. (2026). A Systematic Review of Large Language Models in Mental Health: Opportunities, Challenges, and Future Directions. https://doi.org/10.3390/electronics15030524
