Development of a digital dictionary for Nǀuu
The ǂKhomani are an indigenous community of the southern Kalahari, found predominantly in the Northern Cape, South Africa, but also in the southern regions of Namibia and Botswana (Crawhall, 2001). Today they are the speakers of the last living South African !Ui-Taa language, Nǀuu (Sands et al., 2007). Currently, only two elderly speakers of Nǀuu remain (Jones, 2019). To document and preserve the language and to share this information freely, we aim to develop a dictionary of the Nǀuu language.
The data that forms the basis of this dictionary stems from a project that started about 20 years ago. At that time, 26 fluent speakers of Nǀuu were identified and asked to provide information about their mother tongue. As a result, recordings of over 1,500 lexical items were collected, together with their translations in Afrikaans, Khoekhoegowab and English. For the Nǀuu entries, IPA transcriptions were created based on the audio recordings of multiple speakers. Furthermore, a wide range of audio recordings by Nǀuu speakers is available. Of these, 1,561 audio files are labeled and referenced as “Dictionary Recording”; they contain one sample recording per lexical entry. Another 4,860 audio files are labeled and referenced as “Recordings”; in these, the lexical entries are used in context (in the form of a sentence). Finally, approximately 20,000 additional audio files are present in various categories, e.g., diphthong recordings, primer recordings, and targeted lists.
Within the “Digital Dictionary Resources for Nǀuu” project, we will develop two main resources: a physical dictionary and a digital dictionary that can be accessed online (through a dictionary portal) as well as on mobile phones (in the form of a mobile app). For this to happen, several steps are essential. Firstly, the existing dataset will need to be cleaned up, making sure that transcriptions are consistent and uniform, and that translations are appropriate. Three kinds of metadata should also be properly assigned: descriptive metadata, which describes the dictionary itself (e.g., title, authors, unique identifier); structural metadata, which denotes how the information is represented within the electronic files; and administrative metadata, which provides information on file types, access rights, etc. Secondly, the clean dataset will be made available in a repository, making sure it adheres to the FAIR principles (Findable, Accessible, Interoperable, and Reusable) (Wilkinson et al., 2016). Finally, the content of the dataset will be converted to a format that can be incorporated into a database that forms the backend of both a dictionary app (usable on mobile phones) and a dictionary portal (accessible through a web browser).
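To make the final step more concrete, the following is a minimal sketch of how cleaned entries and their audio references might be loaded into a small relational database. It is an illustration only: the table names, column names, the `build_database` and `lookup_by_english` helpers, and the placeholder rows are all assumptions and do not describe the project's actual schema or data.

```python
import sqlite3

# Hypothetical schema: one row per lexical entry, one row per audio file.
SCHEMA = """
CREATE TABLE entry (
    id            INTEGER PRIMARY KEY,
    headword      TEXT NOT NULL,   -- Nǀuu orthographic form
    ipa           TEXT,            -- IPA transcription
    afrikaans     TEXT,
    khoekhoegowab TEXT,
    english       TEXT
);
CREATE TABLE recording (
    id        INTEGER PRIMARY KEY,
    entry_id  INTEGER NOT NULL REFERENCES entry(id),
    kind      TEXT NOT NULL CHECK (kind IN ('dictionary', 'context')),
    file_name TEXT NOT NULL
);
"""

# Placeholder rows, not real dictionary content.
ENTRIES = [
    (1, "<headword-1>", "<ipa-1>", "<afrikaans-1>", "<khoekhoegowab-1>", "example"),
]
RECORDINGS = [
    (1, "dictionary", "entry1_word.mp3"),
    (1, "context", "entry1_sentence.mp3"),
]


def build_database(entries, recordings):
    """Create an in-memory database and load the given rows into it."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO entry (id, headword, ipa, afrikaans, khoekhoegowab, english) "
        "VALUES (?, ?, ?, ?, ?, ?)",
        entries,
    )
    conn.executemany(
        "INSERT INTO recording (entry_id, kind, file_name) VALUES (?, ?, ?)",
        recordings,
    )
    conn.commit()
    return conn


def lookup_by_english(conn, english):
    """Return (headword, kind, file_name) rows for an English translation."""
    cursor = conn.execute(
        "SELECT e.headword, r.kind, r.file_name "
        "FROM entry AS e JOIN recording AS r ON r.entry_id = e.id "
        "WHERE e.english = ? ORDER BY r.id",
        (english,),
    )
    return cursor.fetchall()
```

Keeping recordings in their own table would let a single entry point to both its “Dictionary Recording” file and any number of in-context recordings, and the same database could then serve both the app and the portal.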
Several specific challenges, mostly related to the accessibility and user-friendliness of the mobile dictionary app, are currently being worked on. For instance, how can we help users search for lexical items that include the symbols representing click sounds, given that typical mobile phone keyboards do not provide these symbols? How can we best create a list of suggestions for lexical items when click symbols are either missing or incorrect in the search query? How can we provide information on semantically related words, so that people can browse the dictionary not only alphabetically but also by similarity of meaning? On a more practical note, how can we make audio recordings available through the mobile app while keeping data usage to a minimum?
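One possible way to approach the click-symbol search questions is to compare queries and headwords on a simplified key from which click letters and diacritics have been removed, and then rank candidates by string similarity. The sketch below illustrates that idea only; it is not the project's chosen solution. The `search_key` and `suggest` functions are hypothetical, and the headwords are made-up placeholder strings, not real Nǀuu words.

```python
import difflib
import unicodedata

# Click letters, plus ASCII look-alikes that users may type instead.
CLICK_LETTERS = "ǀǁǃǂʘ"
ASCII_LOOKALIKES = "|!"

# Made-up placeholder headwords (not real Nǀuu words).
HEADWORDS = ["ǀtest", "ǂdemo", "ǁsample", "ǃdemo"]


def search_key(text):
    """Reduce a word or query to a lowercase key without clicks or diacritics."""
    decomposed = unicodedata.normalize("NFD", text)
    kept = [
        ch
        for ch in decomposed
        if ch not in CLICK_LETTERS
        and ch not in ASCII_LOOKALIKES
        and not unicodedata.combining(ch)
    ]
    return "".join(kept).casefold()


def suggest(query, headwords, limit=5):
    """Suggest headwords whose simplified key is close to the query's key."""
    # Group headwords by key, since several may share one simplified form.
    index = {}
    for word in headwords:
        index.setdefault(search_key(word), []).append(word)

    # difflib ranks keys by similarity; 0.6 is an arbitrary cutoff.
    close_keys = difflib.get_close_matches(
        search_key(query), list(index), n=limit, cutoff=0.6
    )
    return [word for key in close_keys for word in index[key]]
```

With this approach, a query typed without any click symbol, or with the wrong one, still maps to the same key as the intended headword, so all headwords sharing that key are offered as suggestions. The trade-off is that words differing only in their click become indistinguishable at search time and must be told apart in the result list.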
Finally, the project plans to provide an educational component. Limited-edition print versions of the physical dictionary will be made available to elderly community members and to those without computer and internet access. Additionally, several demonstration workshops are planned to illustrate the use of the mobile dictionary app, with priority given to mother tongue speakers of Nǀuu or Khoekhoe varieties and their descendants in both the Northern and Western Cape. With this educational component, we hope to bridge the information gap between academics, speakers of endangered languages, and the South African public, and thereby to create a better environment for understanding our historical and contemporary context.
Funding
Department of Sport, Arts and Culture Republic of South Africa
Department of Science and Innovation Republic of South Africa
Rhodes University
SADiLaR
Categories
- Applied linguistics and educational linguistics
- Communication technology and digital media studies
- Comparative language studies
- Computational linguistics
- Language studies not elsewhere classified
- Sociolinguistics
- Other language, communication and culture not elsewhere classified
- Lexicography and semantics
- Linguistics not elsewhere classified
- Translation and interpretation studies