Linguistically enriched corpora for conjunctively written South African languages

Title	Linguistically enriched corpora for conjunctively written South African languages
Description	This resource contains linguistically annotated data for four official South African languages of the Nguni family that use a conjunctive orthography (isiNdebele, isiXhosa, isiZulu and Siswati), as well as English. The data set is parallel across all five languages, and the four Nguni languages are annotated with three types of linguistic information: morphological analysis, part-of-speech tags and lemmas. The protocols and tagsets used during annotation are included.
Contact name	Tanja Gaustad
Contact email	tanja.gaustad@nwu.ac.za
Publisher(s)	North-West University, Centre for Language Technology (CTexT)
License	CC BY 4.0 - https://creativecommons.org/licenses/by/4.0/
Language(s)	English; isiNdebele; isiXhosa; isiZulu; Siswati
Author(s)	Puttkammer, Martin; Gaustad, Tanja
Contributor	Pienaar, Wikus; du Toit, Jaco; Gent, Sunny
Subject	Nguni languages; POS; Morphology; Lemma; Parallel data
URI	https://hdl.handle.net/20.500.12185/546
Media type	Text
Media category	Parallel multilingual annotated text corpus
Format extent	Min. 50,000 tokens per language
Version	1.0
Format size	10 MB
Format medium	N/A
Project	Linguistic corpus enrichment for conjunctively written South African languages
Submit date	2021-09-30T12:41:11Z
Date available	2021-09-30T12:41:11Z
Date created	2021-09
Verification status	Level 0
