NCHLT Speech II Corpus

Title	NCHLT Speech II Corpus
Description	The speech corpus, generated from audio samples from the National Parliament aligned using Hansard transcriptions, is provided as audio files and transcriptions. The XML files provide the following metadata for each session: audio filename, audio orthography, GOP (goodness of pronunciation) score, start time (seconds) and end time (seconds). The audio files are formatted as 16-bit signed integer PCM, single channel (mono), with a 16 kHz sample rate.
Contact name	Karen Calteaux
Contact email	KCalteaux@csir.co.za
Publisher(s)	Meraka Institute, CSIR
License	Creative Commons Attribution 3.0 South Africa (CC BY 3.0 ZA): http://creativecommons.org/licenses/by/3.0/za/
Language(s)	English
Author(s)	Jaco Badenhorst; Febe de Wet; Neil Kleynhans; Thipe Modipa
Contributor	Alfred Tshoane; Georg Schlunz; Stanly Ramunyisi; Raymond Molapo; Nic de Vries
URI	https://hdl.handle.net/20.500.12185/273
Media type	Speech
Type	Data
Media category	Monolingual speech corpora: Annotated
Format extent	5.6 GB
Version	1
Format medium	Text; 16 kHz; 16 bit; *.wav
Project	NCHLT Speech II
Source	Audio recordings collected on smartphones in a non-studio environment; text prompts from various sources, predominantly .gov.za websites
Database	Monolingual Speech Corpora: Annotated
Primary collection	Resource Catalogue
Secondary collection	Resource Index
ISO639 code	eng
Submit date	2018-02-06T09:46:40Z; 2018-03-05T15:23:12Z
Date available	2018-02-06T09:46:40Z; 2018-03-05T15:23:12Z
Date created	2016-05-09
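The record states the audio format (16-bit signed PCM, mono, 16 kHz) and lists the per-session fields in the XML files, but not the XML schema. The following is a minimal sketch, assuming Python 3 and only its standard `wave` and `xml.etree.ElementTree` modules, of how a user might check a downloaded file against the stated format and read the segment metadata. The function names and the XML element and attribute names (`segment`, `audio`, `orth`, `gop`, `start`, `end`) are hypothetical placeholders, to be adapted to the actual files in the corpus.

```python
import io
import wave
import xml.etree.ElementTree as ET

# Audio format as stated in the record: 16-bit signed PCM, single channel, 16 kHz.
EXPECTED_SAMPLE_WIDTH_BYTES = 2
EXPECTED_CHANNELS = 1
EXPECTED_RATE_HZ = 16000


def check_wav_format(source):
    """Return a list of mismatches between a WAV file and the stated format.

    `source` may be a path or a binary file-like object. The `wave` module
    only reads uncompressed PCM, so other encodings raise `wave.Error`.
    An empty list means the file matches.
    """
    problems = []
    with wave.open(source, "rb") as wav:
        if wav.getsampwidth() != EXPECTED_SAMPLE_WIDTH_BYTES:
            problems.append(f"sample width {wav.getsampwidth() * 8} bit")
        if wav.getnchannels() != EXPECTED_CHANNELS:
            problems.append(f"{wav.getnchannels()} channels")
        if wav.getframerate() != EXPECTED_RATE_HZ:
            problems.append(f"sample rate {wav.getframerate()} Hz")
    return problems


def read_segments(xml_text):
    """Parse per-segment metadata from an XML string.

    HYPOTHETICAL schema: each <segment> element is assumed to carry the
    fields listed in the record as attributes (audio filename, orthography,
    GOP score, start and end time in seconds). Adapt the names to the real files.
    """
    root = ET.fromstring(xml_text)
    segments = []
    for seg in root.iter("segment"):
        start = float(seg.get("start"))
        end = float(seg.get("end"))
        segments.append({
            "audio": seg.get("audio"),
            "orth": seg.get("orth"),
            "gop": float(seg.get("gop")),
            "start": start,
            "end": end,
            "duration": end - start,  # segment length in seconds
        })
    return segments


if __name__ == "__main__":
    # Build a 0.1 s silent WAV in memory with the stated format and check it.
    buf = io.BytesIO()
    with wave.open(buf, "wb") as out:
        out.setnchannels(1)
        out.setsampwidth(2)
        out.setframerate(16000)
        out.writeframes(b"\x00\x00" * 1600)
    buf.seek(0)
    print(check_wav_format(buf))  # []

    sample = ('<session><segment audio="utt_0001.wav" orth="good morning" '
              'gop="-1.25" start="0.50" end="2.00"/></session>')
    print(read_segments(sample)[0]["duration"])  # 1.5
```

Reading the format from the file header, rather than trusting the file extension, catches recordings that were resampled or converted to stereo after download.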

Resource Catalogue [335]
A collection of language resources available for download from the RMA of SADiLaR. The collection mostly consists of resources developed with funding from the Department of Arts and Culture.
Resource Index [386]
A collection of language resource metadata, mostly collected during the NHN-funded technology audit of 2009 and the SADiLaR technology audit of 2018. Not all resources in this collection are available for download.