    • African Wordnet: Tshivenda 1.0 

      African Wordnet Project (UNISA, 2017-06-20) ~ Resource Catalogue
      Developed using the expand model with Princeton WordNet 2.0 as basis. Each wordnet contains synsets with at least the following fields: Word form (lemma; ...
    • African Wordnet: Sesotho sa Leboa 1.0 

      African Wordnet Project (UNISA, 2017-06-20) ~ Resource Catalogue
      Developed using the expand model with Princeton WordNet 2.0 as basis. Each wordnet contains synsets with at least the following fields: Word form (lemma; ...
    • NCHLT Optical Character Recognition for South African Languages 

      Martin Puttkammer, et al. (North-West University; Centre for Text Technology (CTexT), 2017-02-23) ~ Resource Catalogue
      An OCR system is an application that enables one to convert scanned paper documents into editable and searchable texts. The engine analyses the structure ...
    • Lwazi Setswana TTS corpus 

      Daniel van Niekerk, et al. (Meraka Institute, CSIR, 2013-03-27) ~ Resource Catalogue
      Orthographic and phonemically aligned transcriptions
    • Autshumato English-Setswana Parallel Corpora 

      Cindy McKellar (North-West University; Centre for Text Technology (CTexT), 2016-10-28) ~ Resource Catalogue
      Aligned English-Setswana parallel corpus. This set contains data that was translated by professional translators, data that was sourced as translated ...
    • NCHLT English Text Corpora 

      Martin Puttkammer, et al. (North-West University; Centre for Text Technology (CTexT), 2016-09-09) ~ Resource Catalogue
      Collection consisting of a clean corpus, lexicon, frequency list and named-entity lists developed during the NCHLT Text project.
    • Bukantswe Sesotho-English Bilingual Dictionary 

      J. A. K. Olivier (North-West University, 2016-07-07) ~ Resource Catalogue
      Bilingual English-Sesotho dictionary. This dataset represents a basic Sesotho dictionary compiled in the creation of a Sesotho language resource. The ...
    • Autshumato Setswana Monolingual Corpora 

      Cindy McKellar (North-West University; Centre for Text Technology (CTexT), 2016-10-28) ~ Resource Catalogue
      Setswana monolingual corpus, a deliverable of the Autshumato project. The data is provided as a UTF-8 text file, with each sentence on a new line.
    • NCHLT South African Language Identifier 

      Martin Puttkammer, et al. (North-West University; Centre for Text Technology (CTexT), 2016-04-29) ~ Resource Catalogue
      A graphical user interface and command line tool to automatically classify a document, paragraph, sentence or phrase as one of the eleven official South ...
    • NCHLT Sepedi Phrase Chunk Annotated Corpus 

      D.J. Prinsloo, et al. (North-West University; Centre for Text Technology (CTexT), 2016-04-29) ~ Resource Catalogue
      Phrase chunk annotated data for the NCHLT Text Resource Development: Phase II Project. The phrase chunk annotated data is a subset of the 50,000 tokens ...