Summary
Combining the latest advancements in speech-to-text transcription and speaker labeling (diarization), we are developing a platform that can quickly produce accurate transcriptions of audio and video files. The goal is to provide a scalable, automated, and secure approach to generating transcriptions and performing analysis on those outputs.
This effort began as a solution to the administrative burden healthcare professionals face, specifically in patient note-taking. Evaluation so far has covered over 40 hours of medical conversations between patients and their providers. The system handles complex conversations, noisy environments, and overlapping dialogue. Above all, it strives to make manual evaluation more efficient.
CAT-Talk is a secure, web-based AI platform offering fast, speaker-labeled, time-stamped transcripts, with integrated summarization and theme-extraction tools, hosted on UK-owned, NIST 800-53- and HIPAA-compliant infrastructure.
Access
Our transcription services platform is available for experimental use. For more information, please reach out to ai@uky.edu.
Collaborative Projects using CAAI’s Transcription Platform
SpeakEZ – A collaboration with UK’s Nunn Center for Oral History
Ambient Listening – An exploration of quality improvement in healthcare settings.
Models
Integral to this system are two highly advanced open-source tools: the PyAnnote speaker-diarization-3.1 model for speaker labeling and OpenAI’s Whisper for transcription. Diarization and transcription tasks are executed in parallel to optimize processing efficiency. When a user uploads an audio file via our web interface, ClearML manages job scheduling, ensuring simultaneous processing. The two outputs are then merged by aligning timestamps and calculating speaker probabilities. Discrepancies between diarization and transcription timestamps are resolved by probabilistic matching, using the Whisper transcription as the source of truth. Speaker identification is further refined through the use of an LLM, which provides additional context-aware adjustments that improve the accuracy of speaker labeling. The result is one unified, time-stamped, speaker-labeled transcript, optimized for simple human verification via a user-friendly web interface.
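The merge step described above can be sketched as follows. This is a minimal, hypothetical illustration, not CAT-Talk's actual implementation: it assumes Whisper produces time-stamped text segments and PyAnnote produces time-stamped speaker turns, and assigns each segment the speaker with the greatest total overlapping duration (overlap acting as an unnormalized speaker probability). All function and field names here are illustrative.

```python
def overlap(a_start, a_end, b_start, b_end):
    """Duration of overlap between two time intervals, in seconds."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))

def merge_transcript(whisper_segments, diarization_turns):
    """Label each transcript segment with its most likely speaker.

    whisper_segments:  [{"start": float, "end": float, "text": str}]
    diarization_turns: [{"start": float, "end": float, "speaker": str}]
    """
    merged = []
    for seg in whisper_segments:
        # Accumulate overlapping duration per speaker; this acts as an
        # unnormalized probability for each candidate speaker.
        scores = {}
        for turn in diarization_turns:
            d = overlap(seg["start"], seg["end"], turn["start"], turn["end"])
            if d > 0:
                scores[turn["speaker"]] = scores.get(turn["speaker"], 0.0) + d
        # Whisper timestamps are the source of truth: the segment is kept
        # either way, and an unresolved speaker is flagged for downstream
        # LLM refinement or human review.
        speaker = max(scores, key=scores.get) if scores else "UNKNOWN"
        merged.append({**seg, "speaker": speaker})
    return merged
```

In practice, the LLM pass described above would then revisit low-confidence or "UNKNOWN" labels using conversational context.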
Resources
Read more about this development in the linked paper, Toward Automated Clinical Transcriptions.
A training video is available on YouTube.