Pro-TeVA: Prototype-based Explainable Tone Recognition for Yoruba

Upload an audio file or record your voice to detect Yoruba tone patterns.

Yoruba Tones:

High Tone (H) (◌́): Syllable with high pitch
Low Tone (B) (◌̀): Syllable with low pitch
Mid Tone (M) (◌): Syllable with neutral/middle pitch
Space ( | ): Word boundary (detected automatically)

Space Detection: combined

🎧 Audio Examples

Click on an example to load it

🎤 Input Audio

Record or Upload Audio

👩 Female Voice (Yof):

👨 Male Voice (Yom):

📝 Tips:

Speak clearly in Yoruba
Keep recordings under 10 seconds
Avoid background noise
Pause slightly between words for better boundary detection

🎯 Results

Predicted Tone Sequence

Extracted vs Predicted F0

About Pro-TeVA:

Pro-TeVA (Prototype-based Temporal Variational Autoencoder) is an explainable neural model for tone recognition.

Unlike black-box models, Pro-TeVA provides transparency through:

Interpretable F0 (pitch) features
Visualizable tone prototypes
F0 reconstruction for explainability

The model also reaches a 17.74% Tone Error Rate (lower is better).

Model Architecture:

Feature Extractor: HuBERT (Orange/SSA-HuBERT-base-60k)
Encoder: 2-layer Bidirectional GRU (512 neurons)
Variational Autoencoder: Compact latent representations
Prototype Layer: 10 learnable tone prototypes
Decoder: F0 reconstruction (VanillaNN)
Output: CTC-based sequence prediction

Space Detection:

Method: combined
Uses F0 contours, silence patterns, and tone duration
Automatically detects word boundaries in continuous speech

API Access:

REST API enabled for programmatic access
Use Gradio client: pip install gradio_client
See README for full API documentation

Built with ❤️ using SpeechBrain and Gradio

Model Checkpoint: CKPT+2025-10-20+08-19-07+00

Pro-TeVA Architecture

Pro-TeVA JSON API

Upload file first via /gradio_api/upload, then pass the returned path here

Audio File Path

Prediction Result

API Documentation

Pro-TeVA provides two API endpoints for programmatic access to tone recognition.

Available Endpoints

Endpoint	Description	Output Type
`/predict`	UI endpoint with visualizations	Text + Plots
`/predict_json`	Pure JSON for APIs	Structured JSON

JSON API Endpoint (Recommended)

Endpoint: /predict_json

This is the recommended endpoint for programmatic access as it returns pure JSON data.

Input

audio_file: Audio file (WAV, MP3, FLAC)
Recommended: 16kHz sample rate, mono
Max duration: ~10 seconds
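
The input recommendations above can be checked locally before uploading. The following is a minimal sketch that uses only Python's standard `wave` module, so it covers WAV files only; the function name `check_wav` and the two constants are illustrative and not part of Pro-TeVA.

```python
import wave

RECOMMENDED_RATE_HZ = 16_000  # recommended sample rate stated above
MAX_DURATION_S = 10.0         # approximate maximum duration stated above


def check_wav(path):
    """Return a list of warnings for a WAV file; an empty list means it matches."""
    warnings = []
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        channels = wav.getnchannels()
        duration = wav.getnframes() / rate
    if rate != RECOMMENDED_RATE_HZ:
        warnings.append(f"sample rate is {rate} Hz, recommended {RECOMMENDED_RATE_HZ} Hz")
    if channels != 1:
        warnings.append(f"{channels} channels, recommended mono")
    if duration > MAX_DURATION_S:
        warnings.append(f"duration is {duration:.1f} s, recommended under {MAX_DURATION_S:.0f} s")
    return warnings
```

Calling `check_wav("audio.wav")` on a 16 kHz mono recording shorter than 10 seconds returns an empty list. MP3 and FLAC files would need a third-party decoder, which this sketch deliberately leaves out.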

Output Schema

{
  "success": true,
  "tone_sequence": [
    {
      "index": 1,
      "label": "H",
      "name": "High Tone",
      "symbol": "◌́"
    }
  ],
  "tone_string": "H → B → M",
  "statistics": {
    "total_tones": 3,
    "word_boundaries": 1,
    "sequence_length": 4,
    "high_tones": 1,
    "low_tones": 1,
    "mid_tones": 1
  },
  "f0_data": {
    "extracted": [120.5, 125.3, ...],
    "predicted": [118.2, 123.8, ...],
    "length": 100
  },
  "settings": {
    "space_detection_enabled": true,
    "space_detection_method": "combined"
  }
}
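
As a rough illustration of how the schema above might be consumed, the sketch below counts tone labels and compares the two F0 tracks. It assumes that `extracted` and `predicted` are aligned frame by frame, which the schema does not state, and `summarize_response` is a hypothetical helper name.

```python
from collections import Counter


def summarize_response(response):
    """Collect tone labels, per-label counts and the mean absolute F0 difference."""
    if not response.get("success"):
        raise ValueError(response.get("error", "unknown error"))
    labels = [tone["label"] for tone in response["tone_sequence"]]
    # Assumption: the two F0 lists are aligned frame by frame.
    pairs = list(zip(response["f0_data"]["extracted"], response["f0_data"]["predicted"]))
    mae = sum(abs(e - p) for e, p in pairs) / len(pairs) if pairs else None
    return {"labels": labels, "counts": dict(Counter(labels)), "f0_mae": mae}


# Hand-written example in the shape of the schema (not real model output).
example = {
    "success": True,
    "tone_sequence": [
        {"index": 1, "label": "H", "name": "High Tone", "symbol": "◌́"},
        {"index": 2, "label": "B", "name": "Low Tone", "symbol": "◌̀"},
        {"index": 3, "label": "M", "name": "Mid Tone", "symbol": "◌"},
    ],
    "f0_data": {"extracted": [120.0, 125.0], "predicted": [118.0, 124.0], "length": 2},
}
summary = summarize_response(example)
print(summary["labels"])  # ['H', 'B', 'M']
print(summary["f0_mae"])  # 1.5
```

A small mean difference suggests that the decoder reconstructs the pitch contour closely; the value is in whatever unit `f0_data` uses.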

Python Examples

Installation

pip install gradio_client

Basic Usage

from gradio_client import Client

# Connect to Pro-TeVA by Space ID (the huggingface.co/spaces page URL is not the app URL)
client = Client("Obiang/Pro-TeVA")

# Get JSON response
result = client.predict(
    audio_file="path/to/audio.wav",
    api_name="/predict_json"
)

# Parse results
print(f"Success: {result['success']}")
print(f"Tones: {result['tone_string']}")
print(f"Statistics: {result['statistics']}")

Batch Processing

from gradio_client import Client

client = Client("Obiang/Pro-TeVA")

audio_files = ["audio1.wav", "audio2.wav", "audio3.wav"]

for audio_path in audio_files:
    result = client.predict(
        audio_file=audio_path,
        api_name="/predict_json"
    )

    if result['success']:
        print(f"{audio_path}: {result['tone_string']}")
    else:
        print(f"{audio_path}: Error - {result['error']}")

cURL Examples

Step 1: Submit Request

curl -X POST "https://Obiang-Pro-TeVA.hf.space/gradio_api/call/predict_json" \
  -H "Content-Type: application/json" \
  -d '{
    "data": ["https://example.com/audio.wav"]
  }'

Response:

{"event_id": "abc123def456"}

Step 2: Get Results

curl -N "https://Obiang-Pro-TeVA.hf.space/gradio_api/call/predict_json/abc123def456"

Response (Server-Sent Events):

event: complete
data: {"success": true, "tone_sequence": [...], ...}
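
The Server-Sent Events response shown above can also be parsed in Python once the stream text has been read. This is a minimal sketch that assumes each `data:` payload fits on one line, as in the example; `parse_sse_result` is an illustrative name, and the real payload may be shaped differently (for instance wrapped in a list).

```python
import json


def parse_sse_result(stream_text):
    """Return the JSON payload of the first `complete` event, or None if there is none."""
    event = None
    for line in stream_text.splitlines():
        if line.startswith("event:"):
            event = line[len("event:"):].strip()
        elif line.startswith("data:") and event == "complete":
            return json.loads(line[len("data:"):].strip())
    return None


# Hand-written stream in the shape of the example above.
sample = (
    "event: heartbeat\n"
    "data: null\n"
    "\n"
    "event: complete\n"
    'data: {"success": true, "tone_string": "H → B → M"}\n'
)
print(parse_sse_result(sample))
```

Events other than `complete` are skipped, so a stream that ends without one yields `None`.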

Scripted Version with jq

# Submit the request and capture the event_id
EVENT_ID=$(curl -s -X POST "https://Obiang-Pro-TeVA.hf.space/gradio_api/call/predict_json" \
  -H "Content-Type: application/json" \
  -d '{"data": ["audio.wav"]}' | jq -r '.event_id')

# Stream the results for that event
curl -N "https://Obiang-Pro-TeVA.hf.space/gradio_api/call/predict_json/$EVENT_ID"

JavaScript Example

import { Client } from "@gradio/client";

async function predictTones(audioBlob) {
  const app = await Client.connect("Obiang/Pro-TeVA");

  const result = await app.predict("/predict_json", {
    audio_file: audioBlob
  });

  console.log("Tones:", result.data.tone_string);
  console.log("Statistics:", result.data.statistics);

  return result.data;
}

Error Handling

Error Response Schema

{
  "success": false,
  "error": "Error message here",
  "traceback": "Full error traceback..."
}

Python Error Handling

from gradio_client import Client

try:
    client = Client("Obiang/Pro-TeVA")
    result = client.predict(
        audio_file="audio.wav",
        api_name="/predict_json"
    )

    if result['success']:
        print(f"Tones: {result['tone_string']}")
    else:
        print(f"Error: {result['error']}")

except Exception as e:
    # Covers connection failures as well as other client-side errors
    print(f"Request failed: {e}")

Rate Limits

Hugging Face Spaces: Standard rate limits apply
Free tier: Suitable for development and testing
For high-volume usage: Consider deploying your own instance
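
If requests fail intermittently on a shared Space, one common pattern is to retry with exponential backoff. The sketch below is generic and not specific to Pro-TeVA; `call_with_backoff`, its parameters and the delay values are illustrative assumptions.

```python
import time


def call_with_backoff(request, max_attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call `request()` until it succeeds, doubling the wait after each failure."""
    for attempt in range(max_attempts):
        try:
            return request()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** attempt)
```

With the Python client from the examples above, a call could be wrapped as `call_with_backoff(lambda: client.predict(audio_file="audio.wav", api_name="/predict_json"))`. The `sleep` parameter exists only so the waiting can be replaced in tests.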

Tone Labels Reference

Index	Label	Name	Symbol
0	BLANK	CTC Blank	-
1	H	High Tone	◌́
2	B	Low Tone	◌̀
3	M	Mid Tone	◌
4	SPACE	Word Boundary	|
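
To show how the indices in this table relate to a CTC output, the sketch below applies the textbook greedy CTC rule: merge consecutive repeats, then drop blanks. The documentation only says the output is CTC-based, so this is not necessarily how Pro-TeVA decodes, and the frame sequence is made up for illustration.

```python
from itertools import groupby

# Index-to-label mapping copied from the table above.
LABELS = {0: "BLANK", 1: "H", 2: "B", 3: "M", 4: "SPACE"}


def ctc_collapse(frame_indices):
    """Merge consecutive repeats, then drop CTC blanks (index 0)."""
    merged = [index for index, _ in groupby(frame_indices)]
    return [LABELS[index] for index in merged if index != 0]


frames = [0, 1, 1, 0, 0, 2, 2, 2, 4, 3, 0, 3]  # hypothetical per-frame argmax
print(ctc_collapse(frames))  # ['H', 'B', 'SPACE', 'M', 'M']
```

The blank between the last two frames is what keeps the two mid tones separate; without it they would merge into one.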

Support

For questions or issues, please open an issue on the repository or check the README for more details.