DialogGuard: Multi-Agent Psychosocial Safety Evaluation Interface of Sensitive LLM Responses
Published in ACL 2026 System Demonstrations.
Abstract
DialogGuard is an open-source web interface for psychosocial safety assessment in sensitive LLM-mediated interactions. It wraps arbitrary generative models with four LLM-as-a-judge pipelines: single-agent scoring, dual-agent correction, multi-agent debate, and majority voting. The system supports live chat and manual input workflows, visualizes per-dimension risk scores, and provides natural-language rationales for practitioner-facing auditing and supervisory decision-making.
DialogGuard provides a practitioner-facing interface for inspecting the responses of prompted LLM agents across five psychosocial safety dimensions. It keeps practitioners in the loop by exposing both the risk scores and the reasoning traces behind multi-agent judgments.

The evaluation layer supports single-agent scoring, dual-agent correction, majority voting, and multi-agent debate over shared psychosocial safety rubrics.
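
As a rough illustration of how one of these pipelines could combine judge outputs, the sketch below applies majority voting to per-dimension scores. It is not DialogGuard's implementation: the dimension names are placeholders (the five dimensions are not listed here), the integer score scale is assumed, and breaking ties toward the higher-risk score is an assumption made only for this example. Each judge is modeled as a plain function from a response string to a score dictionary, where a real system would call an LLM.

```python
from collections import Counter
from typing import Callable, Dict, List

# Placeholder names (assumption): the page mentions five dimensions without listing them.
DIMENSIONS = ["dim_1", "dim_2", "dim_3", "dim_4", "dim_5"]

Scores = Dict[str, int]          # dimension name -> integer risk score (assumed scale)
Judge = Callable[[str], Scores]  # hypothetical judge: response text -> scores


def majority_vote(response: str, judges: List[Judge]) -> Scores:
    """Return, for each dimension, the score that most judges assigned.

    Ties are broken toward the higher score, a conservative choice
    assumed for this sketch rather than taken from the system.
    """
    if not judges:
        raise ValueError("at least one judge is required")
    all_scores = [judge(response) for judge in judges]
    result: Scores = {}
    for dim in DIMENSIONS:
        votes = Counter(scores[dim] for scores in all_scores)
        top_count = max(votes.values())
        # Collect every score that reached the top count, then pick the riskiest.
        result[dim] = max(s for s, c in votes.items() if c == top_count)
    return result


def constant_judge(value: int) -> Judge:
    """Build a stub judge that gives every dimension the same score."""
    return lambda response: {dim: value for dim in DIMENSIONS}


if __name__ == "__main__":
    panel = [constant_judge(0), constant_judge(1), constant_judge(1)]
    print(majority_vote("example response", panel))
```

Treating judges as plain callables keeps the aggregation step independent of which model or prompt produces the scores, so the same function could sit behind stub judges in tests and model-backed judges in a live setting.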

The reasoning panel surfaces step-by-step critiques and agreement signals, making the audit trail easier to inspect.
Materials