CAT-Talk: AI-Powered Transcription Platform

Summary

Project Description: CAT-Talk is a cutting-edge, secure, web-based AI platform designed to quickly generate accurate, speaker-labeled, and time-stamped transcriptions from audio and video files. It integrates advanced speech-to-text and speaker-diarization capabilities with summarization and theme-extraction tools, all within secure, HIPAA-compliant infrastructure.

Problem: Healthcare professionals often face a significant administrative burden, particularly with patient note-taking and documentation. Manual transcription of medical conversations is time-consuming, prone to human error, and can detract from direct patient care. Existing solutions typically involve manual processes or less sophisticated transcription tools that struggle with complex conversations, noisy environments, and overlapping dialogue, leading to inefficiencies and potential inaccuracies.

Existing Solutions: The primary existing “solution” for detailed documentation in many healthcare settings is manual note-taking and transcription, which is highly labor-intensive and inefficient. While some basic transcription services exist, they often lack the accuracy, speaker identification, and security features necessary for sensitive medical conversations. CAT-Talk aims to significantly improve upon these manual and less advanced digital methods, streamlining the process and making manual verification more efficient.

Solution: CAT-Talk combines the latest advancements in speech-to-text transcription and speaker labeling (diarization) into a unified platform. It uses PyAnnote speaker-diarization-3.1 for accurate speaker identification and OpenAI’s whisper-large-v3 for high-quality speech-to-text transcription. These two processes are executed and merged into a single, time-stamped, speaker-labeled transcript using WhisperX, which handles the alignment and combination. When a user uploads an audio file via the secure web interface, ClearML manages job scheduling so that transcription and diarization run efficiently in parallel. The resulting transcript is presented for easy human verification in a user-friendly web interface that offers integrated summarization and theme-extraction tools. The entire platform runs on University of Kentucky-owned infrastructure that is NIST SP 800-53 and HIPAA compliant, ensuring robust data security and privacy.
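
The CAT-Talk codebase itself is not public, but the open-source components named above are typically combined roughly as in the sketch below. This is a minimal illustration only, assuming a GPU, a placeholder local audio file, and a Hugging Face access token for the gated pyannote model; exact function locations can differ between WhisperX versions, and CAT-Talk’s actual integration is not shown here.

    # Minimal sketch of the open-source pipeline CAT-Talk builds on; the file path
    # and Hugging Face token are placeholders, not CAT-Talk's real configuration.
    import whisperx

    device = "cuda"              # assumes a GPU is available
    audio_file = "visit.wav"     # placeholder for a file uploaded via the web interface

    # 1. Speech-to-text with Whisper large-v3 (batched inference through WhisperX).
    audio = whisperx.load_audio(audio_file)
    model = whisperx.load_model("large-v3", device, compute_type="float16")
    result = model.transcribe(audio, batch_size=16)

    # 2. Forced phoneme alignment for accurate word-level timestamps.
    align_model, metadata = whisperx.load_align_model(
        language_code=result["language"], device=device
    )
    result = whisperx.align(result["segments"], align_model, metadata, audio, device)

    # 3. Speaker diarization with pyannote speaker-diarization-3.1, then merge the
    #    speaker labels into the aligned transcript.
    diarize_model = whisperx.DiarizationPipeline(
        use_auth_token="YOUR_HF_TOKEN", device=device  # placeholder token
    )
    diarize_segments = diarize_model(audio)
    result = whisperx.assign_word_speakers(diarize_segments, result)

    # Each segment now carries start/end times, text, and (where assigned) a speaker label.
    for seg in result["segments"]:
        print(f'[{seg["start"]:7.2f}-{seg["end"]:7.2f}] {seg.get("speaker", "UNKNOWN")}: {seg["text"]}')

In CAT-Talk these steps are scheduled as ClearML jobs rather than run inline, and the merged transcript is handed to the web interface for verification, summarization, and theme extraction.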

Impact: CAT-Talk provides a scalable, automated, and secure approach to generating and analyzing audio-video transcriptions. By drastically reducing the need for manual transcription, it frees up valuable time for healthcare professionals, allowing them to focus more on patient care. The system’s ability to accurately handle complex, noisy, and overlapping conversations significantly improves the quality and reliability of documentation. This directly contributes to reducing administrative burden, enhancing efficiency, and potentially improving the quality of healthcare delivery by providing readily accessible and analyzable conversational data.

Datasets/Model

Description of Datasets: Development and evaluation of CAT-Talk have drawn on more than 40 hours of simulated medical conversations between patients and providers. This dataset was crucial for training and validating the system’s ability to handle the nuances of clinical dialogue, including specialized terminology, varied speaking styles, and challenging acoustic conditions. It currently comprises audio and video files with associated ground-truth transcriptions and speaker labels.

DOIs: There are no public DOIs for this specific internal dataset due to its protected nature.

Available Models: CAT-Talk’s functionality is built on several advanced open-source models and tools:

  • PyAnnote speaker-diarization-3.1: This model performs speaker labeling (diarization), accurately identifying who is speaking and when. It is an open-source, PyTorch-based pipeline that processes mono audio sampled at 16 kHz and outputs the diarization as an Annotation instance (see the stand-alone sketch after this list).
  • OpenAI’s whisper-large-v3: This is a state-of-the-art automatic speech recognition (ASR) and speech translation model used for the core transcription task. It is a Transformer-based encoder-decoder model trained on massive amounts of audio data, designed to deliver high accuracy across a wide range of languages and audio conditions, even in noisy environments with multiple speakers.
  • WhisperX: This tool combines the outputs from PyAnnote and Whisper. It integrates diarization and transcription tasks into a single time-stamped, speaker-labeled transcript by leveraging forced phoneme alignment and voice activity detection to ensure accurate word-level timestamps and precise speaker segmentation.
  • ClearML: This is an MLOps platform that manages job scheduling for the transcription and diarization processes, ensuring simultaneous and efficient processing when a user uploads an audio file via the web interface. ClearML helps automate, integrate, and scale AI development workflows.
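
As a stand-alone illustration of the diarization component described in the first bullet above, the sketch below runs pyannote/speaker-diarization-3.1 directly. The file name and Hugging Face token are placeholders, and the model’s gated-access terms must be accepted on Hugging Face; within CAT-Talk this model is invoked through WhisperX rather than called directly.

    # Stand-alone use of pyannote/speaker-diarization-3.1 (placeholder token and file).
    from pyannote.audio import Pipeline

    pipeline = Pipeline.from_pretrained(
        "pyannote/speaker-diarization-3.1", use_auth_token="YOUR_HF_TOKEN"
    )

    # The pipeline expects mono 16 kHz audio and returns an Annotation instance.
    diarization = pipeline("visit.wav")

    # Iterate over speaker turns: start/end times plus anonymous speaker labels.
    for turn, _, speaker in diarization.itertracks(yield_label=True):
        print(f"{turn.start:7.1f}s - {turn.end:7.1f}s  {speaker}")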

Access

How to use dataset, code, etc: Our transcription platform, CAT-Talk, is available for experimental use within the secure, University of Kentucky-owned infrastructure. Due to the sensitive nature of the data and the HIPAA-compliant environment, direct public access to the dataset or to the underlying code for independent deployment is not available. To request access or inquire about collaboration, please contact vaiden.logan@uky.edu.

Ownership

Project Status: CAT-Talk is an actively ongoing project. The platform is continuously being developed and refined, with current efforts focused on enhancing its capabilities and expanding its applications, particularly within healthcare and historical documentation.

Other projects using this (if applicable): CAT-Talk serves as a collaborative platform for several other projects, demonstrating its versatility and utility across different domains:

  • SpeakEZ: A collaboration with the University of Kentucky’s Nunn Center for Oral History, aimed at leveraging CAT-Talk for the accurate and efficient transcription of oral history interviews, enhancing their accessibility and research value.

Publications: A paper detailing the development and usage of CAT-Talk has been published.

Researchers utilizing CAT-Talk in their work are encouraged to cite the platform and relevant publications appropriately.

Resources Utilized

Cost breakdowns: Detailed cost breakdowns for the development of CAT-Talk are not publicly available as it is an internal project utilizing institutional resources.

Services used: The development and operation of CAT-Talk heavily utilize:

  • NIST SP 800-53 and HIPAA-compliant infrastructure: This ensures the secure handling and processing of sensitive health information.
  • ClearML: For managing job scheduling, workflow automation, and experiment tracking (see the sketch after this list).
  • PyAnnote (open-source model): For speaker diarization.
  • OpenAI Whisper (open-source model): For speech-to-text transcription.
  • WhisperX (open-source tool): For combining diarization and transcription outputs.
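
The snippet below is a hypothetical sketch of how an uploaded file’s transcription job could be registered with ClearML and handed to a worker queue. The project, task, and queue names are illustrative only and do not reflect CAT-Talk’s actual configuration.

    # Hypothetical sketch: register a transcription job with ClearML and hand it to
    # a worker queue. Project, task, and queue names are placeholders.
    from clearml import Task

    task = Task.init(project_name="cat-talk", task_name="transcribe-upload")

    # Record the job parameters (e.g., which uploaded file and model to use).
    task.connect({"audio_file": "visit.wav", "model": "large-v3"})

    # Hand execution to a ClearML agent listening on a (hypothetical) GPU queue;
    # the local process exits and the agent re-runs this script remotely.
    task.execute_remotely(queue_name="gpu", exit_process=True)

    # ... the WhisperX transcription/diarization steps would run here on the agent ...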

The project involves dedicated development staff, though specific FTE allocations are not provided in the summary.