EasyDeL Whisper API Server#
The EasyDeL Whisper API Server provides an API compatible with OpenAI’s Whisper API for audio transcription and translation. This allows you to run your own Whisper service using JAX/Flax via EasyDeL.
Installation#
The server requires FastAPI and other dependencies. Install them with:
pip install "fastapi[all]" uvicorn easydel
Running the Server#
Option 1: Using the CLI#
EasyDeL provides a built-in CLI to start the Whisper server:
python -m easydel.inference.vwhisper.cli --model "openai/whisper-large-v3-turbo" --port 8000
Option 2: Using the Server Module#
import easydel as ed
from jax import numpy as jnp

ed.inference.vwhisper.run_server(
    model_name="openai/whisper-large-v3-turbo",
    host="0.0.0.0",
    port=8000,
    dtype=jnp.bfloat16,
)
Option 3: Using the Example Script#
python examples/whisper_server_example.py --model "openai/whisper-large-v3-turbo" --port 8000
API Endpoints#
The server provides the following OpenAI-compatible endpoints:
Transcription#
POST /v1/audio/transcriptions
Transcribes audio into the same language as the audio.
Parameters:
file: The audio file to transcribe (required)
model: Model name (ignored, but required for OpenAI API compatibility)
prompt: Optional text to guide the model’s style
response_format: The format of the transcript output (json, text, srt, verbose_json, vtt)
temperature: Sampling temperature (0-1)
language: Language code of the input audio
timestamp_granularities: When returning timestamps, specify granularity (word, segment)
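As a rough illustration of how these parameters fit together, the sketch below builds the non-file form fields for a transcription request and rejects values outside the ranges listed above. The helper name `build_transcription_fields` and the `ALLOWED_FORMATS` constant are hypothetical and not part of EasyDeL. `timestamp_granularities` is left out because this page does not say how the server expects it to be encoded.

```python
# Hypothetical helper, not part of EasyDeL: assembles form fields client-side.
ALLOWED_FORMATS = {"json", "text", "srt", "verbose_json", "vtt"}


def build_transcription_fields(
    model="whisper-large-v3-turbo",
    response_format="json",
    temperature=None,
    language=None,
    prompt=None,
):
    """Return the non-file form fields for POST /v1/audio/transcriptions."""
    if response_format not in ALLOWED_FORMATS:
        raise ValueError(f"unsupported response_format: {response_format!r}")
    if temperature is not None and not 0 <= temperature <= 1:
        raise ValueError("temperature must be between 0 and 1")

    fields = {"model": model, "response_format": response_format}
    optional = {"temperature": temperature, "language": language, "prompt": prompt}
    # Only send optional parameters the caller actually set.
    fields.update({k: str(v) for k, v in optional.items() if v is not None})
    return fields


fields = build_transcription_fields(language="en", temperature=0.2)
print(fields)
```

The resulting dictionary can be passed as the `data` argument of a multipart POST, with the audio sent separately under the `file` key.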
Translation#
POST /v1/audio/translations
Translates audio into English.
Parameters:
file: The audio file to translate (required)
model: Model name (ignored, but required for OpenAI API compatibility)
prompt: Optional text to guide the model’s style
response_format: The format of the transcript output (json, text, srt, verbose_json, vtt)
temperature: Sampling temperature (0-1)
timestamp_granularities: When returning timestamps, specify granularity (word, segment)
Client Usage#
Python Requests#
import requests

# The model name is ignored by the server but kept for OpenAI API compatibility
data = {"model": "whisper-large-v3-turbo", "response_format": "json"}

# For transcription
with open("audio.mp3", "rb") as audio:
    response = requests.post("http://localhost:8000/v1/audio/transcriptions", files={"file": audio}, data=data)
response.raise_for_status()  # Fail early on HTTP errors
print(response.json()["text"])

# For translation
with open("audio.mp3", "rb") as audio:
    response = requests.post("http://localhost:8000/v1/audio/translations", files={"file": audio}, data=data)
response.raise_for_status()
print(response.json()["text"])
Using the Example Client#
python examples/whisper_client_example.py audio.mp3 --server http://localhost:8000 --mode transcribe --language en --timestamps
Response Format#
The server returns responses in a format compatible with the OpenAI Whisper API:
{
  "text": "Transcribed or translated text",
  "segments": [
    {
      "start": 0.0,
      "end": 2.5,
      "text": "Text segment with timestamp"
    },
    ...
  ]
}
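A minimal sketch of consuming this response shape: it turns each segment into a single timestamped line. The helper names `format_timestamp` and `segment_lines` are hypothetical, and the code assumes `segments` may be absent (for example when timestamps were not requested), so it falls back to an empty list.

```python
import json


def format_timestamp(seconds):
    """Format a time in seconds as HH:MM:SS.mmm."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"


def segment_lines(payload):
    """Return one 'start --> end  text' line per segment, if any."""
    return [
        f"{format_timestamp(s['start'])} --> {format_timestamp(s['end'])}  {s['text'].strip()}"
        for s in payload.get("segments", [])  # assumption: key may be missing
    ]


# Sample payload shaped like the response above.
sample = json.loads(
    '{"text": "Hello there", '
    '"segments": [{"start": 0.0, "end": 2.5, "text": "Hello there"}]}'
)
lines = segment_lines(sample)
print("\n".join(lines))
```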
Model Support#
The server supports any Whisper model available on Hugging Face, including:
openai/whisper-tiny
openai/whisper-base
openai/whisper-small
openai/whisper-medium
openai/whisper-large-v3-turbo
openai/whisper-large-v3
Performance Considerations#
The first request is slower because the model is loaded and compiled.
Subsequent requests are much faster.
Smaller models can significantly improve speed.
bfloat16 provides a good balance of speed and accuracy.
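One way to observe the warm-up cost is to time several identical requests in a row. The sketch below is a generic timing helper. The name `time_calls` is hypothetical, and the no-op passed to it here is only a stand-in for a function that actually posts audio to the server.

```python
import time


def time_calls(send_request, repeats=3):
    """Call send_request() repeatedly and return each call's duration in seconds."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        send_request()
        durations.append(time.perf_counter() - start)
    return durations


# Stand-in callable; replace with a function that sends a real request.
durations = time_calls(lambda: None, repeats=3)
for i, seconds in enumerate(durations, start=1):
    print(f"request {i}: {seconds:.4f}s")
```

Against a running server, the first duration would be expected to be noticeably larger than the rest.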
Compatibility with OpenAI API#
This API implementation is designed to be drop-in compatible with the OpenAI Whisper API. You can use existing clients by just changing the base URL.
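As an illustration of changing only the base URL, the following sketch points the official `openai` Python package at a local server. It assumes the package is installed separately (`pip install openai`) and that the local server does not check the API key; neither is stated on this page. The helper names `api_base` and `transcribe` are hypothetical.

```python
def api_base(server_url):
    """Append the /v1 prefix used by the endpoints on this page."""
    return server_url.rstrip("/") + "/v1"


def transcribe(audio_path, server_url="http://localhost:8000"):
    """Transcribe a file through the OpenAI client, pointed at a local server."""
    # Imported lazily so api_base() works without the package installed.
    from openai import OpenAI  # third-party: pip install openai

    # Assumption: the local server ignores the API key; the client only
    # requires that some value is set.
    client = OpenAI(base_url=api_base(server_url), api_key="not-needed")
    with open(audio_path, "rb") as audio:
        result = client.audio.transcriptions.create(
            model="whisper-large-v3-turbo",
            file=audio,
        )
    return result.text


# Usage, with the server running locally:
# print(transcribe("audio.mp3"))
```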
Limitations#
Some advanced OpenAI features may not be fully supported
Performance will depend on your hardware (GPU/TPU recommended)