DialogGuard is a web interface for auditing LLM responses in sensitive conversational contexts. It wraps arbitrary generative models with reusable LLM-as-a-judge pipelines and presents practitioner-facing risk scores and natural-language rationales.
The system supports manual-input and live-chat workflows, evaluates responses across psychosocial safety dimensions, and exposes multiple evaluation mechanisms: single-agent scoring, dual-agent correction, majority voting, and multi-agent debate.
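As a rough illustration of how two of these mechanisms differ, the sketch below contrasts single-agent scoring with majority voting. It is not DialogGuard's implementation: the `Judge` signature, the integer score scale, the tie-breaking rule, and the stub judges are all assumptions made for the example. In practice each judge would wrap an LLM call.

```python
from collections import Counter
from typing import Callable, Sequence

# Hypothetical judge signature: (context, response) -> integer risk score.
# The score scale is an assumption; higher means riskier here.
Judge = Callable[[str, str], int]


def single_agent(judge: Judge, context: str, response: str) -> int:
    """Single-agent scoring: one judge's score is taken as final."""
    return judge(context, response)


def majority_vote(judges: Sequence[Judge], context: str, response: str) -> int:
    """Majority voting: return the most common score among the judges.

    Ties resolve to the highest tied score, a conservative choice
    assumed for this sketch.
    """
    scores = [judge(context, response) for judge in judges]
    counts = Counter(scores)
    top = max(counts.values())
    return max(score for score, n in counts.items() if n == top)


# Stub judges standing in for LLM-backed judges (illustrative only).
def lenient(context: str, response: str) -> int:
    return 0


def moderate(context: str, response: str) -> int:
    return 1


def strict(context: str, response: str) -> int:
    return 2


context = "User describes feeling hopeless."
response = "Model reply under audit."
single_score = single_agent(moderate, context, response)
voted_score = majority_vote([lenient, moderate, moderate], context, response)
tied_score = majority_vote([lenient, strict], context, response)
```

Dual-agent correction and multi-agent debate are left out because they require judges to read one another's outputs, which depends on prompt design not described here.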
DialogGuard interface for entering dialogue context, comparing evaluation mechanisms, and inspecting per-dimension risk scores.
Reasoning panel exposing step-by-step multi-agent safety judgments.
[My role in this project]
First author; system design, interface implementation, evaluation, study analysis, and writing.