Pro-TeVA: Prototype-based Explainable Tone Recognition for Yoruba
Upload an audio file or record your voice to detect Yoruba tone patterns.
Yoruba Tones:
- High Tone (H) (◌́): Syllable with high pitch
- Low Tone (B) (◌̀): Syllable with low pitch
- Mid Tone (M) (◌): Syllable with neutral/middle pitch
- Space ( | ): Word boundary (detected automatically)
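The tone-diacritic conventions above can be illustrated with a small sketch that reads H/B/M labels off an orthographically marked Yoruba word. This is not part of Pro-TeVA (which works on audio); it only demonstrates the labeling scheme. It assumes NFD-decomposed text and ignores tone-bearing syllabic nasals (ń, ǹ).

```python
import unicodedata

# Combining diacritics used for Yoruba tone marks (assumes NFD-decomposed text)
ACUTE, GRAVE = "\u0301", "\u0300"
VOWELS = set("aeiou")  # base vowels; ẹ/ọ decompose to e/o plus a dot-below mark

def tone_pattern(word: str) -> str:
    """Read H/B/M tone labels off a diacritic-marked Yoruba word.
    Illustrative only: syllabic nasals that carry tone are not handled."""
    tones = []
    decomposed = unicodedata.normalize("NFD", word.lower())
    i = 0
    while i < len(decomposed):
        ch = decomposed[i]
        if ch in VOWELS:
            # Collect this vowel's combining marks to find a tone diacritic
            marks = []
            j = i + 1
            while j < len(decomposed) and unicodedata.combining(decomposed[j]):
                marks.append(decomposed[j])
                j += 1
            if ACUTE in marks:
                tones.append("H")
            elif GRAVE in marks:
                tones.append("B")
            else:
                tones.append("M")  # unmarked vowel = mid tone
            i = j
        else:
            i += 1
    return " → ".join(tones)
```

For example, `tone_pattern("Yorùbá")` yields the pattern mid, low, high.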
Space Detection: combined
🎧 Audio Examples
Click on an example to load it
🎤 Input Audio
👩 Female Voice (Yof):
👨 Male Voice (Yom):
📝 Tips:
- Speak clearly in Yoruba
- Keep recordings under 10 seconds
- Avoid background noise
- Pause slightly between words for better boundary detection
🎯 Results
About Pro-TeVA:
Pro-TeVA (Prototype-based Temporal Variational Autoencoder) is an explainable neural model for tone recognition.
Unlike black-box models, Pro-TeVA provides transparency through:
- Interpretable F0 (pitch) features
- Visualizable tone prototypes
- F0 reconstruction for explainability
- High performance: 17.74% Tone Error Rate
Model Architecture:
- Feature Extractor: HuBERT (Orange/SSA-HuBERT-base-60k)
- Encoder: 2-layer Bidirectional GRU (512 neurons)
- Variational Autoencoder: Compact latent representations
- Prototype Layer: 10 learnable tone prototypes
- Decoder: F0 reconstruction (VanillaNN)
- Output: CTC-based sequence prediction
Space Detection:
- Method: combined
- Uses F0 contours, silence patterns, and tone duration
- Automatically detects word boundaries in continuous speech
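As a rough intuition for the silence cue used above, here is a minimal sketch of boundary detection over an F0 contour, where unvoiced frames are represented as 0. This is an assumption-laden illustration: the actual "combined" method also weighs F0 contour shape and tone duration, and its internals are not shown here.

```python
def detect_boundaries(f0, min_gap=5):
    """Illustrative silence-based word-boundary detection.
    Treats runs of >= min_gap unvoiced frames (f0 == 0) as word
    boundaries, placed at the midpoint of each gap. Trailing silence
    is ignored (end of utterance, not a word boundary)."""
    boundaries, run_start = [], None
    for i, v in enumerate(f0):
        if v == 0:
            if run_start is None:
                run_start = i
        else:
            if run_start is not None and i - run_start >= min_gap:
                boundaries.append((run_start + i) // 2)  # gap midpoint
            run_start = None
    return boundaries
```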
API Access:
- REST API enabled for programmatic access
- Use the Gradio client: `pip install gradio_client`
- See the README for full API documentation
Built with ❤️ using SpeechBrain and Gradio
Model Checkpoint: CKPT+2025-10-20+08-19-07+00
Pro-TeVA JSON API
Upload file first via /gradio_api/upload, then pass the returned path here
API Documentation
Pro-TeVA provides two API endpoints for programmatic access to tone recognition.
Available Endpoints
| Endpoint | Description | Output Type |
|---|---|---|
| `/predict` | UI endpoint with visualizations | Text + Plots |
| `/predict_json` | Pure JSON for APIs | Structured JSON |
JSON API Endpoint (Recommended)
Endpoint: /predict_json
This is the recommended endpoint for programmatic access as it returns pure JSON data.
Input
- audio_file: Audio file (WAV, MP3, FLAC)
- Recommended: 16kHz sample rate, mono
- Max duration: ~10 seconds
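The recommendations above can be checked before uploading. The helper below is a hypothetical pre-flight check (not part of the Pro-TeVA API) using only Python's standard `wave` module, so it covers WAV input only:

```python
import wave

def preflight(path: str) -> list[str]:
    """Check a WAV file against the recommended input format
    (16 kHz, mono, <= ~10 s). Returns a list of warnings; an
    empty list means the file matches the recommendations."""
    warnings = []
    with wave.open(path, "rb") as wf:
        rate = wf.getframerate()
        channels = wf.getnchannels()
        frames = wf.getnframes()
    if rate != 16000:
        warnings.append(f"sample rate is {rate} Hz, recommended 16000 Hz")
    if channels != 1:
        warnings.append(f"{channels} channels, recommended mono")
    if frames / rate > 10:
        warnings.append(f"duration {frames / rate:.1f} s exceeds ~10 s")
    return warnings
```

MP3 and FLAC inputs would need a third-party reader such as `soundfile` for the same checks.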
Output Schema
```json
{
  "success": true,
  "tone_sequence": [
    {
      "index": 1,
      "label": "H",
      "name": "High Tone",
      "symbol": "◌́"
    }
  ],
  "tone_string": "H → B → M",
  "statistics": {
    "total_tones": 3,
    "word_boundaries": 1,
    "sequence_length": 4,
    "high_tones": 1,
    "low_tones": 1,
    "mid_tones": 1
  },
  "f0_data": {
    "extracted": [120.5, 125.3, ...],
    "predicted": [118.2, 123.8, ...],
    "length": 100
  },
  "settings": {
    "space_detection_enabled": true,
    "space_detection_method": "combined"
  }
}
```
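A response with this schema can be reduced to a one-line summary. The helper below is an illustrative sketch (the function name and output format are our own); the field names follow the schema above:

```python
def summarize(result: dict) -> str:
    """Turn a /predict_json response into a one-line summary.
    Field names follow the documented output schema."""
    if not result.get("success"):
        return f"error: {result.get('error', 'unknown')}"
    stats = result["statistics"]
    return (f"{result['tone_string']} "
            f"(tones: {stats['total_tones']}, "
            f"boundaries: {stats['word_boundaries']})")
```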
Python Examples
Installation
```bash
pip install gradio_client
```
Basic Usage
```python
from gradio_client import Client

# Connect to Pro-TeVA
client = Client("https://huggingface.co/spaces/Obiang/Pro-TeVA")

# Get JSON response
result = client.predict(
    audio_file="path/to/audio.wav",
    api_name="/predict_json"
)

# Parse results
print(f"Success: {result['success']}")
print(f"Tones: {result['tone_string']}")
print(f"Statistics: {result['statistics']}")
```
Batch Processing
```python
from gradio_client import Client

client = Client("https://huggingface.co/spaces/Obiang/Pro-TeVA")

audio_files = ["audio1.wav", "audio2.wav", "audio3.wav"]

for audio_path in audio_files:
    result = client.predict(
        audio_file=audio_path,
        api_name="/predict_json"
    )
    if result['success']:
        print(f"{audio_path}: {result['tone_string']}")
    else:
        print(f"{audio_path}: Error - {result['error']}")
```
cURL Examples
Step 1: Submit Request
```bash
curl -X POST "https://Obiang-Pro-TeVA.hf.space/call/predict_json" \
  -H "Content-Type: application/json" \
  -d '{
    "data": ["https://example.com/audio.wav"]
  }'
```
Response:
```json
{"event_id": "abc123def456"}
```
Step 2: Get Results
```bash
curl -N "https://Obiang-Pro-TeVA.hf.space/call/predict_json/abc123def456"
```
Response (Server-Sent Events):
```
event: complete
data: {"success": true, "tone_sequence": [...], ...}
```
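The `data:` payload of that server-sent-events stream is what carries the result. As a minimal sketch of the parsing step (the function is our own, not part of any client library), the last `data:` line can be extracted and decoded like this:

```python
import json

def parse_sse(body: str) -> dict:
    """Extract the JSON payload from a Gradio server-sent-events response.
    Minimal sketch: takes the last 'data:' line; a real stream may carry
    several events (e.g. heartbeats) before the final 'complete' event."""
    payload = None
    for line in body.splitlines():
        if line.startswith("data:"):
            payload = line[len("data:"):].strip()
    return json.loads(payload)
```

In practice you would feed this the body returned by the second `curl` step above.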
One-liner with jq
# Submit and get event_id
EVENT_ID=$(curl -s -X POST "https://Obiang-Pro-TeVA.hf.space/call/predict_json" \
-H "Content-Type: application/json" \
-d '{"data": ["audio.wav"]}' | jq -r '.event_id')
# Get results
curl -N "https://Obiang-Pro-TeVA.hf.space/call/predict_json/$EVENT_ID"
JavaScript Example
```javascript
import { client } from "@gradio/client";

async function predictTones(audioBlob) {
  const app = await client("https://huggingface.co/spaces/Obiang/Pro-TeVA");
  const result = await app.predict("/predict_json", {
    audio_file: audioBlob
  });
  console.log("Tones:", result.data.tone_string);
  console.log("Statistics:", result.data.statistics);
  return result.data;
}
```
Error Handling
Error Response Schema
```json
{
  "success": false,
  "error": "Error message here",
  "traceback": "Full error traceback..."
}
```
Python Error Handling
```python
from gradio_client import Client

try:
    client = Client("https://huggingface.co/spaces/Obiang/Pro-TeVA")
    result = client.predict(
        audio_file="audio.wav",
        api_name="/predict_json"
    )
    if result['success']:
        print(f"Tones: {result['tone_string']}")
    else:
        print(f"Error: {result['error']}")
except Exception as e:
    print(f"Connection error: {str(e)}")
```
Rate Limits
- Hugging Face Spaces: Standard rate limits apply
- Free tier: Suitable for development and testing
- For high-volume usage: Consider deploying your own instance
Tone Labels Reference
| Index | Label | Name | Symbol |
|---|---|---|---|
| 0 | BLANK | CTC Blank | - |
| 1 | H | High Tone | ◌́ |
| 2 | B | Low Tone | ◌̀ |
| 3 | M | Mid Tone | ◌ |
| 4 | SPACE | Word Boundary | \| |
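The index column above implies a straightforward decoding of predicted index sequences. The snippet below is an illustrative sketch (names are our own): it maps indices to labels and drops CTC blanks, as a CTC decoder would after collapsing repeats.

```python
# Index-to-label mapping taken from the reference table above
ID2LABEL = {0: "<blank>", 1: "H", 2: "B", 3: "M", 4: "|"}

def decode(ids):
    """Render a predicted index sequence as a label string,
    dropping CTC blank tokens (index 0)."""
    return " ".join(ID2LABEL[i] for i in ids if i != 0)
```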
Support
For questions or issues, please open an issue on the repository or check the README for more details.