Ultra-Fast Speech Transcription
Powered by WhisperX

A high-performance microservice for accurate, speaker-aligned speech transcription at scale.

System Telemetry

Buffer: active_stream_01.wav

00:42.15

Speaker 1
Speaker 2
Speaker 1
Speaker 3

[00:02.1] S1: Synchronizing neural weights...

[00:05.4] S2: Awaiting user confirmation for ingest.

[00:08.9] S1: Transcription protocol established.


Powerful Features

Everything you need for enterprise-grade speech-to-text processing.

High-Quality Transcription

Built on WhisperX for state-of-the-art accuracy with word-level timestamps and speaker diarization.

Easy Integration

Simple REST API to submit transcription tasks and retrieve results in standard formats (JSON, SRT, VTT).
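As an illustration of the SRT output format, here is a minimal sketch that renders speaker-labelled segments as SRT text. The segment keys (start, end, text, speaker) and both function names are assumptions made for this example; the service's actual response schema is not documented here and may differ.

```python
def srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    ms = round(seconds * 1000)
    hours, ms = divmod(ms, 3_600_000)
    minutes, ms = divmod(ms, 60_000)
    secs, ms = divmod(ms, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments) -> str:
    """Render a list of segment dicts as SRT cues.

    Each segment is assumed (hypothetically) to carry 'start' and 'end'
    in seconds, a 'text' string, and an optional 'speaker' label.
    """
    cues = []
    for index, seg in enumerate(segments, start=1):
        text = seg["text"].strip()
        speaker = seg.get("speaker")
        if speaker:
            # Prefix the cue text with the diarization label.
            text = f"[{speaker}] {text}"
        start = srt_timestamp(seg["start"])
        end = srt_timestamp(seg["end"])
        cues.append(f"{index}\n{start} --> {end}\n{text}\n")
    # SRT separates cues with a blank line.
    return "\n".join(cues)


if __name__ == "__main__":
    demo = [
        {"start": 5.4, "end": 8.9, "speaker": "S2",
         "text": "Awaiting user confirmation for ingest."},
        {"start": 8.9, "end": 12.0, "speaker": "S1",
         "text": "Transcription protocol established."},
    ]
    print(segments_to_srt(demo))
```

VTT output follows the same cue structure, with a WEBVTT header line and a period instead of a comma before the milliseconds.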

Scalable Architecture

GPU-accelerated workers managed by Redis queues to handle high-volume transcription workloads.
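The queue pattern can be sketched as follows, assuming a plain Redis list used as a FIFO queue. The queue name, the task fields, and both helper functions are hypothetical; the project's real worker code may use a different library or layout. The helpers accept any client exposing lpush and brpop, such as a redis-py redis.Redis instance.

```python
import json


def enqueue_task(client, queue_name, task):
    """Producer side: push a JSON-encoded task onto the head of the list."""
    client.lpush(queue_name, json.dumps(task))


def next_task(client, queue_name, timeout=5):
    """Worker side: block for up to `timeout` seconds waiting for a task.

    Popping from the tail with BRPOP while producers LPUSH at the head
    gives first-in, first-out ordering. Returns None on timeout.
    """
    item = client.brpop(queue_name, timeout=timeout)
    if item is None:
        return None
    _key, raw = item  # brpop returns a (list name, value) pair
    return json.loads(raw)


# Example wiring with redis-py (host, port, and names are assumptions):
#
#   import redis
#   client = redis.Redis(host="localhost", port=6379)
#   enqueue_task(client, "transcription_tasks", {"audio": "meeting.wav"})
#   task = next_task(client, "transcription_tasks")
```

Because each worker blocks on the same list, adding capacity under this design means starting more worker processes that call next_task in a loop.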

Customizable Models

Support for different Whisper models and configurations tailored to your specific language or performance needs.

Getting Started

Launch your high-performance transcription microservice in minutes with Docker.

1. Clone the repository
2. Configure your models in GEMINI.md
3. Run docker-compose up --build
$ git clone https://github.com/your-repo/whisperx.git
$ cd whisperx
$ docker-compose up --build -d
# API starts on :8002
# Whisper starts on :9000
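Once the containers are up, a quick way to confirm that both ports are accepting connections is a plain TCP check. This is a minimal sketch that assumes the services are published on localhost; it only tests that something is listening and does not call any API endpoint. The function name is made up for this example.

```python
import socket


def port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    # Ports taken from the startup notes above; localhost is an assumption.
    for name, port in (("API", 8002), ("Whisper", 9000)):
        state = "listening" if port_open("localhost", port) else "not reachable"
        print(f"{name} on :{port} is {state}")
```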