ArVoice Dataset

ArVoice: A Multi-Speaker Dataset for Arabic Speech Synthesis

Creative Commons License

  • 84 hours
  • 7 human speakers (train/test), 4 synthetic speakers (train/test)
  • 2.5K+ unique utterances
  • Tasks: TTS, diacritic restoration

ArVoice | HF Dataset | Citation: coming soon

ArVoice is a multi-speaker Modern Standard Arabic (MSA) speech corpus with fully diacritized transcriptions. It is intended primarily for multi-speaker speech synthesis and can also be useful for other tasks such as speech-based diacritic restoration, voice conversion, and deepfake detection. ArVoice comprises: (1) a new professionally recorded set from 6 voice talents with diverse demographics; (2) a modified subset of the Arabic Speech Corpus; and (3) high-quality synthetic speech from 2 commercial systems. The complete corpus contains a total of 83.52 hours of speech across 11 voices; around 10 hours are human speech from 7 speakers (the 6 new voice talents plus the Arabic Speech Corpus speaker).

The modified subset and the full synthetic subset are available on Hugging Face. To access the new professionally recorded subset, sign this agreement. If you use the dataset or the transcriptions provided on Hugging Face, please cite the paper.
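Because the transcriptions are fully diacritized, input/target pairs for diacritic restoration can be derived by removing the diacritics from each transcription. The following is a minimal sketch, assuming the transcriptions are Unicode strings that use the standard Arabic combining marks U+064B to U+0652 plus superscript alef U+0670; the function names are hypothetical and are not part of any ArVoice tooling.

```python
# Assumed diacritic set: fathatan (U+064B) through sukun (U+0652),
# plus superscript alef (U+0670). Adjust if the corpus uses other marks.
ARABIC_DIACRITICS = {chr(cp) for cp in range(0x064B, 0x0653)} | {"\u0670"}


def strip_diacritics(text: str) -> str:
    """Remove Arabic diacritic marks, keeping base letters and spaces."""
    return "".join(ch for ch in text if ch not in ARABIC_DIACRITICS)


def make_restoration_pair(diacritized: str) -> tuple[str, str]:
    """Hypothetical helper: (undiacritized model input, diacritized target)."""
    return strip_diacritics(diacritized), diacritized


if __name__ == "__main__":
    # "kataba" written with a fatha after each letter.
    word = "\u0643\u064e\u062a\u064e\u0628\u064e"
    print(strip_diacritics(word) == "\u0643\u062a\u0628")  # True
```

For speech-based diacritic restoration, the undiacritized text would be paired with the corresponding audio as input, and the original diacritized transcription would serve as the target.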