NCHLT Siswati Auxiliary Speech Corpus

Febe de Wet; Laura Martinus; Jaco Badenhorst

Title	NCHLT Siswati Auxiliary Speech Corpus
Description	The corpus contains orthographically transcribed broadband speech in each of South Africa's eleven official languages. Transcriptions are provided in XML format.
Contact name	Karen Calteaux
Contact email	KCalteaux@csir.co.za
Publisher(s)	CSIR Meraka Institute; North-West University
License	Creative Commons Attribution 3.0 Unported (CC BY 3.0): https://creativecommons.org/licenses/by/3.0/legalcode
Language(s)	Siswati
Author(s)	Febe de Wet; Laura Martinus; Jaco Badenhorst
Contributor	Charl van Heerden; Etienne Barnard; Marelie Davel; Alta de Waal
Subject	Siswati; Speech corpora; Transcribed
Citation	Jaco Badenhorst, Laura Martinus and Febe de Wet, "BLSTM harvesting of auxiliary NCHLT speech data", In Proceedings of SAUPEC/ROBMECH/PRASA 2019, Bloemfontein, South Africa, January 2019.; Etienne Barnard, Marelie H. Davel, Charl van Heerden, Febe de Wet and Jaco Badenhorst, "The NCHLT Speech Corpus of the South African languages", In Proc. 4th International Workshop on Spoken Language Technologies for Under-resourced Languages (SLTU), St Petersburg, Russia, May 2014.; Charl van Heerden, Marelie H. Davel and Etienne Barnard, "The semi-automated creation of stratified speech corpora", In Proc. Pattern Recognition Association of South Africa annual symposium (PRASA), Johannesburg, South Africa, Dec 2013, pp. 115-119.; N.J. de Vries, M.H. Davel, J. Badenhorst, W.D. Basson, F. de Wet, E. Barnard and A. de Waal, "A smartphone-based ASR data collection tool for under-resourced languages", Speech Communication, Volume 56, January 2014, pp. 119-131.; Marelie H. Davel, Charl van Heerden and Etienne Barnard, "Validating Smartphone-Collected Speech Corpora", In Proc. 3rd International Workshop on Spoken Language Technologies for Under-resourced Languages (SLTU), Cape Town, South Africa, May 2012, pp. 68-75.; C. van Heerden, M.H. Davel and E. Barnard, "Medium-Vocabulary Speech Recognition for Under-Resourced Languages", In Proc. 3rd International Workshop on Spoken Language Technologies for Under-resourced Languages (SLTU), Cape Town, South Africa, May 2012, pp. 146-151.; J. Badenhorst, A. de Waal and F. de Wet, "Quality measurements for mobile data collection in the developing world", In Proc. 3rd International Workshop on Spoken Language Technologies for Under-resourced Languages (SLTU), Cape Town, South Africa, May 2012, pp. 139-145.
URI	https://hdl.handle.net/20.500.12185/515
Media type	Speech
Media category	Annotated Monolingual Speech Corpus
Format extent	Aux 1: 78:48:56, Aux 2: 167:42:11
Version	1
Format size	Aux 1: 6.17 GB, Aux 2: 13.1 GB
Format medium	N/A
Project	NCHLT Speech
Primary collection	Resource Catalogue
Secondary collection	Resource Index
ISO639 code	ssw
Submit date	2019-07-17T06:49:56Z
Date available	2019-07-17T06:49:56Z
Date created	2019-06-01

Files in this item

Name: ssw-aux1.zip
Size: 6.179 GB
Format: application/zip
MD5: 5bc57100e3c43d9d0a0dc00a062a206a

Name: ssw-aux2.zip
Size: 13.14 GB
Format: application/zip
MD5: 73c14e956ce284d6d2c9e2e0df7915eb
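
Because both archives are several gigabytes, it may be worth checking a finished download against the MD5 values listed for each file. The following is a minimal sketch, assuming the archives were saved locally under their listed names. The helper names `md5_of_file` and `verify_download` are hypothetical and not part of any tooling distributed with the corpus.

```python
import hashlib
from pathlib import Path

# MD5 values copied from the file listing in this record.
EXPECTED_MD5 = {
    "ssw-aux1.zip": "5bc57100e3c43d9d0a0dc00a062a206a",
    "ssw-aux2.zip": "73c14e956ce284d6d2c9e2e0df7915eb",
}


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks to limit memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path):
    """Return True if the file's MD5 matches the value listed for its name.

    Hypothetical helper: assumes the local file keeps its listed name.
    """
    path = Path(path)
    if path.name not in EXPECTED_MD5:
        raise KeyError(f"No listed checksum for {path.name}")
    return md5_of_file(path) == EXPECTED_MD5[path.name]
```

Calling `verify_download("ssw-aux1.zip")` returns `True` only when the local file's digest matches the listed value. A `False` result suggests an incomplete or corrupted download.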

This item appears in the following Collection(s)

Resource Catalogue [335]
A collection of language resources available for download from the RMA of SADiLaR. The collection mostly consists of resources developed with funding from the Department of Arts and Culture.
Resource Index [386]
A collection of language resource metadata, mostly collected during the NHN-funded technology audit of 2009 and the SADiLaR technology audit of 2018. Not all resources in this collection are available for download.
