Speech Lab

ArVoice: A Multi-Speaker Dataset for Arabic Speech Synthesis

84 hours
7 human speakers (train/test), 4 synthetic speakers (train/test)
2.5K+ unique utterances
TTS, Diacritic Restoration

@inproceedings{toyin2025arvoice,
  title={ArVoice: A Multi-Speaker Dataset for Arabic Speech Synthesis},
  author={Toyin, Hawau Olamide and Marew, Rufael and Alblooshi, Humaid and Magdy, Samar M and Aldarmaki, Hanan},
  booktitle={Proc. Interspeech 2025},
  year={2025}
}

ArVoice is a multi-speaker Modern Standard Arabic (MSA) speech corpus with fully diacritized transcriptions. It is intended for multi-speaker speech synthesis and can also be useful for other tasks such as speech-based diacritic restoration, voice conversion, and deepfake detection. ArVoice comprises: (1) a new professionally recorded set from 6 voice talents with diverse demographics; (2) a modified subset of the Arabic Speech Corpus; and (3) high-quality synthetic speech from 2 commercial systems. The complete corpus totals 83.52 hours of speech across 11 voices; around 10 hours are human speech from 7 speakers.

The modified subset and the full synthetic subset are available on Hugging Face. To access the new professionally recorded subset, sign this agreement. If you use the dataset or the transcriptions provided on Hugging Face, please cite the paper.
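
Because the transcriptions are fully diacritized, text-side input/target pairs for diacritic restoration can be derived by stripping the marks from each transcript. The snippet below is a minimal sketch of that idea and is not part of the ArVoice release: the helper names `strip_diacritics` and `make_restoration_pair` are hypothetical, and the set of code points treated as diacritics (U+064B to U+0652 plus U+0670) is an assumption that may need adjusting to the conventions of the actual transcripts.

```python
import re
from typing import Tuple

# Assumption: "diacritics" means the Arabic marks from fathatan to sukun
# (U+064B-U+0652) plus the superscript alef (U+0670).
_DIACRITICS = re.compile("[\u064B-\u0652\u0670]")


def strip_diacritics(text: str) -> str:
    """Remove Arabic diacritic marks, leaving the base letters untouched."""
    return _DIACRITICS.sub("", text)


def make_restoration_pair(diacritized: str) -> Tuple[str, str]:
    """Return (undiacritized input, diacritized target) for one transcript."""
    return strip_diacritics(diacritized), diacritized


# The word "kataba" written with escapes so each mark is visible:
# kaf + fatha, teh + fatha, beh + fatha.
example = "\u0643\u064E\u062A\u064E\u0628\u064E"
source, target = make_restoration_pair(example)
print(source, target)
```

For speech-based restoration, the same pairs would be matched with the corresponding audio; how the audio and transcripts are organized in the released files is not covered here.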