Lagos-NWU Yoruba Speech Corpus

Daniel van Niekerk; Etienne Barnard; Oluwapelumi Giwa; Azeez Sosimi

Title	Lagos-NWU Yoruba Speech Corpus
Description	This speech corpus, consisting of 16 female speakers and 17 male speakers, was recorded in Lagos, Nigeria, for the purpose of speech recognition research. Each speaker recorded about 130 utterances read from short texts selected for phonetic coverage. Recordings were made with a microphone connected to a laptop computer in a quiet office environment.
Contact name	Daniel van Niekerk
Contact email	daniel.vanniekerk@nwu.ac.za
Publisher(s)	North-West University; Centre for Text Technology (CTexT); University of Lagos (Nigeria)
License	Creative Commons Attribution 2.5 South Africa License: http://creativecommons.org/licenses/by/2.5/za/legalcode
Language(s)	Yoruba
Author(s)	Daniel van Niekerk; Etienne Barnard; Oluwapelumi Giwa; Azeez Sosimi
URI	https://hdl.handle.net/20.500.12185/431
ISLRN	573-526-122-515-8
Media type	Speech
Type	Data
Media category	Monolingual speech corpora: Annotated
Format extent	268 MB (zipped)
Version	1
Format size	Number of speakers: 33; Number of utterances: 4316; Audio length: 165 min (including non-speech segments); Per speaker: approx. 130 utterances, amounting to approx. 5 minutes of audio
Format medium	UTF-8 encoded Unicode text; RIFF-WAVE 16-bit PCM samples at 16 kHz sampling rate
Source	Web; Magazines; Literature and student reports; Audio recordings (normal office environment)
Stratum	16 female speakers and 17 male speakers recorded in Lagos, Nigeria
Primary collection	Resource Catalogue
Secondary collection	Resource Index
ISO639 code	yor
Submit date	2018-02-05T20:20:56Z; 2018-03-05T17:51:10Z
Date available	2018-02-05T20:20:56Z; 2018-03-05T17:51:10Z
Date created	2015-02-06
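
The audio format stated under Format medium (16-bit PCM samples at 16 kHz) can be checked per file with Python's standard `wave` module. The following is a minimal sketch only: the helper name `check_wav_format` is hypothetical, and this record does not describe the archive's internal directory layout or channel count, so no file paths or channel checks are assumed here.

```python
import wave

# Values taken from the "Format medium" field of this record.
EXPECTED_RATE_HZ = 16000     # 16 kHz sampling rate
EXPECTED_SAMPLE_BYTES = 2    # 16-bit samples


def check_wav_format(path):
    """Return (matches, duration_seconds) for a RIFF-WAVE file.

    `matches` is True when the sampling rate and sample width agree
    with the values stated in the record. Hypothetical helper, not
    part of the corpus distribution.
    """
    with wave.open(str(path), "rb") as wav:
        rate = wav.getframerate()
        width = wav.getsampwidth()
        frames = wav.getnframes()
    matches = rate == EXPECTED_RATE_HZ and width == EXPECTED_SAMPLE_BYTES
    return matches, frames / rate
```

Summing the returned durations over all extracted recordings gives a figure that can be compared with the stated total of about 165 minutes.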

Files in this item

Name: yorubaspeechcorpus.zip
Size: 267.5 MB
Format: application/zip
MD5: 1fe4be6a91524db8ea4f7032760a9c1d
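
The MD5 value listed above can be used to confirm that a download completed intact. The following is a minimal sketch using Python's standard `hashlib`; the function names `md5_of_file` and `matches_published_md5` are illustrative, and the local path is an assumption about where the archive was saved.

```python
import hashlib
from pathlib import Path

# MD5 value as published in this item record.
EXPECTED_MD5 = "1fe4be6a91524db8ea4f7032760a9c1d"


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_md5(path, expected=EXPECTED_MD5):
    """Compare a file's MD5 digest with the expected hex string."""
    return md5_of_file(path) == expected.lower()


if __name__ == "__main__":
    archive = Path("yorubaspeechcorpus.zip")  # assumed local download path
    if archive.exists():
        print("checksum OK" if matches_published_md5(archive) else "checksum mismatch")
    else:
        print(f"{archive} not found")
```

Reading in chunks keeps memory use low for an archive of this size. MD5 is suitable here only as an integrity check against accidental corruption, not as protection against deliberate tampering.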

This item appears in the following Collection(s)

Resource Catalogue [335]
A collection of language resources available for download from the RMA of SADiLaR. The collection mostly consists of resources developed with funding from the Department of Arts and Culture.
Resource Index [386]
A collection of language resource metadata mostly collected during the NHN funded technology audit of 2009, as well as the SADiLaR technology audit of 2018. Not all resources in this collection are available for download.
