South African Directory Enquiries (SADE) Name Corpus
Title | South African Directory Enquiries (SADE) Name Corpus |
Description | "Audio and tagged orthographic transcriptions of South African names produced by first-language speakers of 4 languages: Afrikaans, English, isiZulu, Sesotho. Utterances are tagged with speaker language, word language, speaker identity, speaker gender, broad phonemic pronunciation and pronunciation modality ('intended language')." |
Contact name | Marelie H. Davel |
Contact email | marelie.davel@gmail.com |
Publisher(s) | North-West University; Molo Afrika Speech Technologies; IntSyst Labs CC |
License | Creative Commons Attribution 3.0 Unported License (CC BY 3.0): http://creativecommons.org/licenses/by/3.0/ |
Language(s) | Afrikaans; English; isiZulu; Sesotho |
Author(s) | Charl van Heerden; Marelie Davel; Oluwapelumi Giwa; J.W.F. Thirion |
Contributor | Anina Lambrechts; Bulelwa Matjene; Etienne Barnard; Marelie H. Davel; Nadia Barnard; Sarina le Roux; and various language practitioners from 'The Translation World'. |
Citation | Thirion, J.W., van Heerden, C., Giwa, O. and Davel, M.H. 2019. The South African directory enquiries (SADE) name corpus. Language Resources and Evaluation, pp.1-30. |
URI | https://hdl.handle.net/20.500.12185/378 |
Media type | Speech |
Type | Data |
Media category | Multilingual speech corpora: annotated |
Format extent | 494 MB (zipped) |
Version | 1.1 |
Format size | 13h56m09s (40 speakers, each producing 400 utterances, 16,000 utterances in total) |
Format medium | Text; Microsoft Wav files |
Project | South African Directory Enquiry System |
Source | Telephone recordings |
Primary collection | Resource Catalogue |
Secondary collection | Resource Index |
ISO639 code | afr; eng; zul; sot |
Submit date | 2018-02-05T20:21:10Z; 2018-03-05T17:48:33Z |
Date available | 2018-02-05T20:21:10Z; 2018-03-05T17:48:33Z |
Date created | 2015-09-07 |
This item appears in the following Collection(s)
- Resource Catalogue [335]
  A collection of language resources available for download from the RMA of SADiLaR. The collection mostly consists of resources developed with funding from the Department of Arts and Culture.
- Resource Index [386]
  A collection of language resource metadata mostly collected during the NHN-funded technology audit of 2009, as well as the SADiLaR technology audit of 2018. Not all resources in this collection are available for download.