Afribooms Afrikaans Dependency Treebank
Title | Afribooms Afrikaans Dependency Treebank |
Description | This is the annotated corpus developed for Afrikaans in the Afribooms project. The corpus includes annotations for lemma, part-of-speech (POS) and dependency relations. The lemma and POS information originates from the source corpus; only the dependency tags and relations were added in this project. |
Contact name | Daniel van Niekerk |
Contact email | daniel.vanniekerk@nwu.ac.za |
Publisher(s) | North-West University; Centre for Text Technology (CTexT); Katholieke Universiteit Leuven (Belgium) |
License | Creative Commons Attribution 2.5 South Africa License: http://creativecommons.org/licenses/by/2.5/za/legalcode |
Language(s) | Afrikaans |
Author(s) | Daniel van Niekerk |
Citation | L. Augustinus, P. Dirix, D.R. van Niekerk, I. Schuurman, V. Vandeghinste, F. van Eynde and G.B. van Huyssteen, "AfriBooms: an online treebank for Afrikaans," in Proceedings of the Language Resources and Evaluation Conference (LREC), pp. 677-682, Portorož, Slovenia, May 2016. |
URI | https://hdl.handle.net/20.500.12185/282 |
ISLRN | 798-848-095-593-5 |
Media type | Text |
Type | Data |
Media category | Monolingual text corpora: Annotated |
Format extent | 1.5 MB (zipped) |
Version | 1 |
Format size | Train set: 1663 sentences, 43895 words. Test set: 271 sentences, 5381 words |
Format medium | UTF-8; FoLiA XML format (http://ilk.uvt.nl/folia) |
Software requirements | XML-capable software |
Source | Government Documents |
Stratum | Government domain text |
Database | Monolingual Text Corpora: Annotated |
Primary collection | Resource Catalogue |
Secondary collection | Resource Index |
ISO639 code | afr |
Submit date | 2018-02-05T20:20:56Z; 2018-03-05T17:44:42Z |
Date available | 2018-02-05T20:20:56Z; 2018-03-05T17:44:42Z |
Date created | 2015-02-10 |
This item appears in the following Collection(s)
- Resource Catalogue [335]
  A collection of language resources available for download from the RMA of SADiLaR. The collection mostly consists of resources developed with funding from the Department of Arts and Culture.
- Resource Index [386]
  A collection of language resource metadata mostly collected during the NHN-funded technology audit of 2009, as well as the SADiLaR technology audit of 2018. Not all resources in this collection are available for download.
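
The record lists the format as FoLiA XML and requires only XML-capable software, so the files can likely be read with a standard XML parser. The following is a minimal sketch in Python, assuming the usual FoLiA layout in which sentences are `s` elements, words are `w` elements, and each word carries a `t` (text) child plus `lemma` and `pos` children with a `class` attribute. The function names and the `SAMPLE` document are hypothetical; the sample is hand-written for illustration, is not taken from the corpus, and its POS tags are placeholders rather than the tagset used in the treebank.

```python
import xml.etree.ElementTree as ET

# FoLiA namespace; this is the URI the record gives for the format.
FOLIA_NS = {"folia": "http://ilk.uvt.nl/folia"}

# Hand-written illustration only: not corpus data, placeholder POS tags.
SAMPLE = """<FoLiA xmlns="http://ilk.uvt.nl/folia" xml:id="example">
  <text xml:id="example.text">
    <s xml:id="example.s.1">
      <w xml:id="example.s.1.w.1"><t>Die</t><lemma class="die"/><pos class="DET"/></w>
      <w xml:id="example.s.1.w.2"><t>wet</t><lemma class="wet"/><pos class="NOUN"/></w>
    </s>
  </text>
</FoLiA>"""


def count_sentences_and_words(xml_text):
    """Return (sentence_count, word_count) for one FoLiA document string."""
    root = ET.fromstring(xml_text)
    sentences = root.findall(".//folia:s", FOLIA_NS)
    words = root.findall(".//folia:w", FOLIA_NS)
    return len(sentences), len(words)


def read_tokens(xml_text):
    """Yield (text, lemma, pos) per word; a missing annotation becomes None."""
    root = ET.fromstring(xml_text)
    for word in root.iterfind(".//folia:w", FOLIA_NS):
        text = word.findtext("folia:t", default=None, namespaces=FOLIA_NS)
        lemma = word.find("folia:lemma", FOLIA_NS)
        pos = word.find("folia:pos", FOLIA_NS)
        yield (
            text,
            lemma.get("class") if lemma is not None else None,
            pos.get("class") if pos is not None else None,
        )


if __name__ == "__main__":
    print(count_sentences_and_words(SAMPLE))
    for token in read_tokens(SAMPLE):
        print(token)
```

Running the counting function over every file in the train or test set and summing the results would be one way to check a download against the sentence and word totals given under "Format size", assuming the files follow the layout above.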