Summary
Combining the latest advancements in speech-to-text transcription and speaker labeling (diarization), we are developing a platform that quickly produces accurate transcriptions of audio and video files. The goal is to provide a scalable, automated, and secure approach to generating transcriptions and analyzing those outputs.
This effort began as a solution to the administrative burden healthcare professionals face, specifically in patient note-taking. Evaluation so far has covered more than 40 hours of medical conversations between patients and their providers. The system handles complex conversations, noisy environments, and overlapping dialogue. Above all, it strives to make manual evaluation more efficient.
CAT-Talk is a secure, web-based AI platform offering fast, speaker-labeled, time-stamped transcripts, with integrated summarization and theme-extraction tools, hosted on UK-owned, NIST 800-53 and HIPAA-compliant infrastructure.
Access
Our transcription services platform is available for experimental use. For more information, please reach out to ai@uky.edu.
Available Models
Integral to this system are two highly advanced open-source models:
- PyAnnote's speaker-diarization-3.1 for speaker labeling
- OpenAI's whisper-large-v3 for transcription
Diarization and transcription tasks are executed and combined into a single time-stamped, speaker-labeled transcript using WhisperX. When a user uploads an audio file via our web interface, ClearML manages job scheduling, allowing multiple files to be processed simultaneously. The resulting transcript is optimized for simple human verification through a user-friendly web interface.
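For illustration, below is a minimal sketch of how a WhisperX-based pipeline of this kind can be assembled. The file name and token placeholder are assumptions, and exact WhisperX entry points vary between releases; this is not the platform's production code.

```python
# Minimal sketch: transcribe, align, and diarize one file with WhisperX.
# Assumptions: whisperx is installed, a CUDA GPU is available, and an HF
# token grants access to the gated pyannote/speaker-diarization-3.1 model.
# API names reflect recent WhisperX releases and may differ by version.
import whisperx

DEVICE = "cuda"
AUDIO_PATH = "visit_recording.wav"  # hypothetical uploaded file

audio = whisperx.load_audio(AUDIO_PATH)

# 1) Transcribe with whisper-large-v3 (batched for speed).
asr_model = whisperx.load_model("large-v3", DEVICE, compute_type="float16")
result = asr_model.transcribe(audio, batch_size=16)

# 2) Align the output to get word-level timestamps.
align_model, metadata = whisperx.load_align_model(
    language_code=result["language"], device=DEVICE
)
result = whisperx.align(result["segments"], align_model, metadata, audio, DEVICE)

# 3) Diarize (pyannote speaker-diarization-3.1 under the hood) and
#    attach speaker labels to the transcript segments.
diarize_model = whisperx.DiarizationPipeline(
    use_auth_token="YOUR_HF_TOKEN", device=DEVICE
)
diarize_segments = diarize_model(audio)
result = whisperx.assign_word_speakers(diarize_segments, result)

# Emit a simple time-stamped, speaker-labeled transcript.
for seg in result["segments"]:
    speaker = seg.get("speaker", "UNKNOWN")
    print(f"[{seg['start']:7.2f}-{seg['end']:7.2f}] {speaker}: {seg['text'].strip()}")
```

On the scheduling side, ClearML's remote-execution pattern might look like the following; the project and queue names here are hypothetical.

```python
from clearml import Task

# Register the transcription job with ClearML, then hand it off to a
# worker queue so multiple uploads can be processed in parallel.
task = Task.init(project_name="CAT-Talk", task_name="transcribe-upload")
task.execute_remotely(queue_name="gpu-transcription", exit_process=True)
# ...the WhisperX pipeline above then runs on the remote worker...
```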
Collaborative Projects Using CAT-Talk
A few of the collaborative projects using CAAI’s CAT-Talk transcription platform:
SpeakEZ – A collaboration with UK’s Nunn Center for Oral History
Ambient Listening – An exploration of quality improvement in healthcare settings.
Resources
Read more about the development in the linked paper, Toward Automated Clinical Transcriptions.
A training video is available on YouTube.