paper · arXiv · Trust 82 · Primary · Published 4d ago · Live · 3d ago
DialogPII: A multilingual dataset of synthetic dialog transcripts to detect personal information
Conversational data collected in domains such as healthcare or social sciences is a valuable resource for research and automated analysis. However, responsible data sharing requires the detection and removal of personally identifiable and sensitive information to protect individual privacy. To support the development and evaluation of automatic de-identification systems, we present DialogPII, a multilingual dataset of synthetic dialogs and speech-derived transcripts for personal information detection. DialogPII covers eight interaction scenarios (emergency calls, medical anamnesis interviews,
