DialogGuard: Multi-Agent Psychosocial Safety Evaluation Interface of Sensitive LLM Responses

Luo, Han; Laban, Guy

Published in ACL 2026 System Demonstrations.

Han Luo

University of Leeds / Southwest Jiaotong University

Guy Laban

Ben-Gurion University of the Negev

Abstract

DialogGuard is an open-source web interface for psychosocial safety assessment in sensitive LLM-mediated interactions. It wraps arbitrary generative models with four LLM-as-a-judge pipelines: single-agent scoring, dual-agent correction, multi-agent debate, and majority voting. The system supports live chat and manual input workflows, visualizes per-dimension risk scores, and provides natural-language rationales for practitioner-facing auditing and supervisory decision-making.

DialogGuard provides a practitioner-facing interface for inspecting prompted LLM agents across five psychosocial safety dimensions. It keeps practitioners in the loop by exposing both risk scores and the reasoning traces behind multi-agent judgments.

DialogGuard evaluation mechanisms

The evaluation layer supports single-agent scoring, dual-agent correction, majority voting, and multi-agent debate over shared psychosocial safety rubrics.
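The mechanisms above share a common shape: each judge maps a response to per-dimension scores, and the pipelines differ in how those outputs are combined. The following is a minimal sketch of three of them, not DialogGuard's actual code. It assumes binary risk labels (0 = safe, 1 = risky), uses placeholder dimension names because the rubric is not spelled out here, and replaces LLM calls with toy keyword judges. All names (`Judge`, `make_stub_judge`, `majority_vote`, and so on) are hypothetical.

```python
from collections import Counter
from typing import Callable, Dict, List

# Placeholder names: the five rubric dimensions are not listed in this text.
DIMENSIONS = ["dim_1", "dim_2", "dim_3", "dim_4", "dim_5"]

Scores = Dict[str, int]                      # dimension -> 0 (safe) or 1 (risky)
Judge = Callable[[str], Scores]              # response -> scores
Reviewer = Callable[[str, Scores], Scores]   # response + draft -> corrected scores


def make_stub_judge(flag_words: List[str]) -> Judge:
    """Build a toy judge (stand-in for an LLM call) that flags every
    dimension when any of its flag words appears in the response."""
    def judge(response: str) -> Scores:
        hit = int(any(word in response.lower() for word in flag_words))
        return {dim: hit for dim in DIMENSIONS}
    return judge


def single_agent(judge: Judge, response: str) -> Scores:
    """Single-agent scoring: one judge, one pass."""
    return judge(response)


def dual_agent_correction(scorer: Judge, reviewer: Reviewer, response: str) -> Scores:
    """Dual-agent correction: a second agent revises the first agent's draft."""
    draft = scorer(response)
    return reviewer(response, draft)


def majority_vote(judges: List[Judge], response: str) -> Scores:
    """Majority voting: take the most common label per dimension.
    Ties resolve to the riskier label (an assumption made for this sketch)."""
    ballots = [judge(response) for judge in judges]
    result: Scores = {}
    for dim in DIMENSIONS:
        counts = Counter(ballot[dim] for ballot in ballots)
        top = max(counts.values())
        result[dim] = max(label for label, n in counts.items() if n == top)
    return result
```

Resolving ties toward the riskier label is a conservative choice for a safety audit; a real deployment might instead escalate tied cases to a human reviewer.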

DialogGuard reasoning panel

The reasoning panel surfaces step-by-step critiques and agreement signals, making the audit trail easier to inspect.
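One simple way to derive an agreement signal of this kind is the share of judges whose label matches the most common label on each dimension. The sketch below assumes that definition and an input of one score dictionary per judge; the function name and the data layout are hypothetical and may differ from what DialogGuard computes.

```python
from collections import Counter
from typing import Dict, List


def agreement_by_dimension(ballots: List[Dict[str, int]]) -> Dict[str, float]:
    """Return, per dimension, the fraction of judges that chose the
    most common label (1.0 means unanimous)."""
    if not ballots:
        raise ValueError("at least one ballot is required")
    agreement: Dict[str, float] = {}
    for dim in ballots[0]:
        counts = Counter(ballot[dim] for ballot in ballots)
        majority_size = counts.most_common(1)[0][1]
        agreement[dim] = majority_size / len(ballots)
    return agreement
```

A low value on a dimension marks a response where the judges split, which is where a practitioner would most want to read the accompanying rationales.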

Materials

BibTeX

@inproceedings{luo2026dialogguard,
  title = {DialogGuard: Multi-Agent Psychosocial Safety Evaluation Interface of Sensitive LLM Responses},
  author = {Han Luo and Guy Laban},
  booktitle = {Proceedings of the 64th Annual Meeting of the Association for Computational Linguistics: System Demonstrations},
  year = {2026}
}