Pro-TeVA: Prototype-based Explainable Tone Recognition for Yoruba

Upload an audio file or record your voice to detect Yoruba tone patterns.

Yoruba Tones:

  • High Tone (H) (◌́): Syllable with high pitch
  • Low Tone (B) (◌̀): Syllable with low pitch
  • Mid Tone (M) (◌, unmarked): Syllable with neutral/middle pitch
  • Space ( | ): Word boundary (detected automatically)
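
The label scheme above can be turned into readable output with a few lines of Python. This is a minimal sketch, assuming the prediction arrives as a whitespace-separated string such as "H B | M H"; that string format and the helper names split_words and apply_tone are illustrative assumptions, not part of the Pro-TeVA interface.

```python
import unicodedata

# Combining diacritic for each tone label; the mid tone carries no mark.
TONE_MARKS = {"H": "\u0301", "B": "\u0300", "M": ""}


def split_words(prediction: str) -> list[list[str]]:
    """Group tone labels into words, using "|" as the word boundary."""
    words, current = [], []
    for token in prediction.split():
        if token == "|":
            if current:
                words.append(current)
            current = []
        elif token in TONE_MARKS:
            current.append(token)
        else:
            raise ValueError(f"unknown tone label: {token!r}")
    if current:
        words.append(current)
    return words


def apply_tone(vowel: str, tone: str) -> str:
    """Attach the tone's diacritic to a vowel and compose it (NFC)."""
    return unicodedata.normalize("NFC", vowel + TONE_MARKS[tone])
```

For example, split_words("H B | M H") groups the labels into two words of two syllables each, and apply_tone("a", "H") returns the letter a with an acute accent.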

Space Detection: combined

🎧 Audio Examples

Click on an example to load it

🎤 Input Audio

👩 Female Voice (Yof):

👨 Male Voice (Yom):

📝 Tips:

  • Speak clearly in Yoruba
  • Keep recordings under 10 seconds
  • Avoid background noise
  • Pause slightly between words for better boundary detection

🎯 Results


About Pro-TeVA:

Pro-TeVA (Prototype-based Temporal Variational Autoencoder) is an explainable neural model for tone recognition.

Unlike black-box models, Pro-TeVA provides transparency through:

  • Interpretable F0 (pitch) features
  • Visualizable tone prototypes
  • F0 reconstruction for explainability
  • High performance: 17.74% Tone Error Rate

Model Architecture:

  • Feature Extractor: HuBERT (Orange/SSA-HuBERT-base-60k)
  • Encoder: 2-layer Bidirectional GRU (512 neurons)
  • Variational Autoencoder: Compact latent representations
  • Prototype Layer: 10 learnable tone prototypes
  • Decoder: F0 reconstruction (VanillaNN)
  • Output: CTC-based sequence prediction
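
To illustrate the final stage, here is a minimal sketch of greedy CTC decoding, the standard way to turn per-frame predictions into a label sequence: merge consecutive repeats, then drop the blank symbol. The blank index of 0 and the id-to-label table are assumptions made for the example; the actual vocabulary and decoding strategy used by Pro-TeVA are not stated here.

```python
from itertools import groupby

# Hypothetical id-to-label table; index 0 is assumed to be the CTC blank.
ID_TO_LABEL = {1: "H", 2: "B", 3: "M", 4: "|"}


def ctc_greedy_decode(frame_ids, blank_id=0):
    """Collapse repeated frame ids, then remove blanks."""
    collapsed = [key for key, _ in groupby(frame_ids)]
    return [key for key in collapsed if key != blank_id]


def to_labels(ids):
    """Map decoded ids to tone labels."""
    return [ID_TO_LABEL[i] for i in ids]
```

A blank between two identical ids keeps them as separate tones, so the frame sequence 1 1 0 1 decodes to two high tones rather than one.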

Space Detection:

  • Method: combined
  • Uses F0 contours, silence patterns, and tone duration
  • Automatically detects word boundaries in continuous speech
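
As a rough illustration of one cue named above, the sketch below looks for silence patterns in an F0 track: a run of unvoiced frames between two voiced regions is treated as a candidate word boundary when it lasts long enough. It is not the combined method itself. The frame shift, the pause threshold, and the convention that unvoiced frames have F0 equal to 0 are all assumptions chosen for the example.

```python
def find_pauses(f0, frame_ms=10.0, min_pause_ms=150.0):
    """Return (start, end) frame ranges of internal unvoiced runs.

    f0 is a sequence of pitch values in Hz, with 0 marking unvoiced frames.
    Leading and trailing silence is ignored; end is exclusive.
    """
    pauses = []
    start = None          # first frame of the current unvoiced run
    seen_voiced = False   # becomes True after the first voiced frame
    for i, value in enumerate(f0):
        if value > 0:
            if start is not None and seen_voiced:
                if (i - start) * frame_ms >= min_pause_ms:
                    pauses.append((start, i))
            start = None
            seen_voiced = True
        elif start is None:
            start = i
    return pauses
```

Short unvoiced stretches, such as those caused by voiceless consonants, fall below the threshold and are skipped, which is one reason a real detector would combine this cue with F0 contour and duration information.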

API Access:

  • REST API enabled for programmatic access
  • Install the Gradio client: pip install gradio_client
  • See README for full API documentation
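
A call through the Gradio client might look like the sketch below. The space identifier and the endpoint name are placeholders, since neither is given here; the README is the authority on the real values and on the shape of the returned result.

```python
def predict_tones(audio_path, space_id, api_name="/predict"):
    """Send one audio file to a Gradio app and return its raw result.

    space_id and api_name are hypothetical placeholders; substitute the
    values documented in the README.
    """
    # Imported here so the module loads even without gradio_client installed.
    from gradio_client import Client, handle_file

    client = Client(space_id)
    # handle_file wraps a local path or URL so the client uploads it.
    return client.predict(handle_file(audio_path), api_name=api_name)
```

Usage would be something like predict_tones("sample.wav", "owner/space-name"), with the placeholder arguments replaced by the documented ones.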

Built with ❤️ using SpeechBrain and Gradio

Model Checkpoint: CKPT+2025-10-20+08-19-07+00

Pro-TeVA JSON API

Upload the audio file first via /gradio_api/upload, then pass the returned path here.
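
The second step of that flow can be sketched as follows: once the upload has returned a server-side path, it is wrapped in a file descriptor and posted to the app's call route. The base URL, the endpoint name, the /gradio_api/call/ route, and the payload layout are assumptions based on common Gradio conventions, not details confirmed on this page.

```python
import json


def build_call_request(base_url, api_name, uploaded_path):
    """Build the URL and JSON body for a call that passes an uploaded file.

    base_url and api_name are hypothetical; uploaded_path is the path
    returned by the earlier upload to /gradio_api/upload.
    """
    url = f"{base_url.rstrip('/')}/gradio_api/call/{api_name.lstrip('/')}"
    body = {
        "data": [
            # Assumed file descriptor layout for Gradio file inputs.
            {"path": uploaded_path, "meta": {"_type": "gradio.FileData"}}
        ]
    }
    return url, json.dumps(body)
```

The returned pair can then be sent with any HTTP client as a POST request with a JSON content type.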