THE CORPUS

The CorpAfroAs corpus is archived in the form of interrelated wav, IMDI, and eaf files. A beta version will be accessible to the public in December 2012, and the final version in the fall of 2013.

CorpAfroAs Samples

You are welcome to use the CorpAfroAs Format and Tools for your data. Please cite the CorpAfroAs project when you use our annotation scheme and/or software and/or procedures. Thank you.


tx: the text in broad phonetic transcription, segmented into 'phonological' words (with assimilations, sandhi, etc.).
mot: phonological (i.e. 'morphophonological', as opposed to the broad phonetic transcription) transcription, segmented into grammatical words, with no morphemic separators.
mb: morphophonological transcription, segmented into morphemes (one cell per morpheme); "-" goes in the cell that contains the affix, and "=" goes in the cell that contains the clitic.
ge: a gloss for each morpheme cell. The glosses are grammatical category labels based on the Leipzig Glossing Rules, as expanded by CorpAfroAs. Other relevant information (parts of speech, verb class, syncretism phenomena, etc.) goes into the rx tier.
rx: "tier x": all information relevant and necessary for retrieval purposes. If a cell has more than one label, the labels are separated with a slash.
ft: free translation.
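
The tier scheme above can be pictured as a small data structure. The following Python sketch is only an illustration and is not part of the CorpAfroAs tools: the names Unit, cell_kind and rx_labels are hypothetical, the sample values are invented placeholders rather than real language data, and the idea that rx has one cell per mb cell is an assumption made here for the sake of the check.

```python
from dataclasses import dataclass
from typing import List


def cell_kind(mb_cell: str) -> str:
    """Classify an mb cell by its separator: '=' marks a clitic, '-' an affix."""
    if mb_cell.startswith("=") or mb_cell.endswith("="):
        return "clitic"
    if mb_cell.startswith("-") or mb_cell.endswith("-"):
        return "affix"
    return "other"  # no separator in this cell


def rx_labels(rx_cell: str) -> List[str]:
    """Split an rx cell into its labels; several labels are separated by '/'."""
    return [label for label in rx_cell.split("/") if label]


@dataclass
class Unit:
    """One annotated unit, with one field per tier (hypothetical layout)."""
    tx: str          # broad phonetic text
    mot: List[str]   # grammatical words
    mb: List[str]    # one cell per morpheme
    ge: List[str]    # one gloss per mb cell
    rx: List[str]    # retrieval labels (assumed: one cell per mb cell)
    ft: str          # free translation

    def problems(self) -> List[str]:
        """Report cell-count mismatches between mb and the tiers below it."""
        found = []
        if len(self.ge) != len(self.mb):
            found.append("ge must have one gloss per mb cell")
        if len(self.rx) != len(self.mb):
            found.append("rx is assumed to have one cell per mb cell")
        return found


# Invented placeholder values, not real language data.
unit = Unit(
    tx="abcde fg",
    mot=["abcde", "fg"],
    mb=["abc", "-de", "=fg"],
    ge=["GLOSS1", "GLOSS2", "GLOSS3"],
    rx=["LABEL1/LABEL2", "LABEL3", "LABEL4"],
    ft="placeholder translation",
)
print([cell_kind(cell) for cell in unit.mb])
print(rx_labels(unit.rx[0]))
print(unit.problems())
```

A check of this kind may help catch a missing gloss before a file is archived; the actual constraints enforced by ELAN-CorpA may differ.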

Sample in ELAN

The following Beja sample can be viewed using the software ELAN or ELAN-CorpA.

The list of abbreviations and symbols can be found on the Glosses page.
When searched on the corpus site, it will appear as:
Sample sound file: BEJ_MV_NARR_11.wav
Sample ELAN file: BEJ_MV_NARR_11.eaf
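
Because eaf files are XML, the tiers of a sample can also be read outside ELAN. The sketch below is a minimal, non-authoritative example using Python's standard library. It relies on the general ELAN layout (TIER elements carrying a TIER_ID attribute and containing ANNOTATION_VALUE elements). The function names are hypothetical, the embedded snippet is an invented placeholder and not the content of BEJ_MV_NARR_11.eaf, and the tier identifiers in the actual file may differ from the bare tier names used here.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List


def tiers_from_root(root: ET.Element) -> Dict[str, List[str]]:
    """Map each TIER_ID to the list of its annotation values, in file order."""
    tiers: Dict[str, List[str]] = {}
    for tier in root.iter("TIER"):
        values = [value.text or "" for value in tier.iter("ANNOTATION_VALUE")]
        tiers[tier.get("TIER_ID", "")] = values
    return tiers


def read_eaf(path: str) -> Dict[str, List[str]]:
    """Parse an eaf file from disk and return its tiers."""
    return tiers_from_root(ET.parse(path).getroot())


# Invented placeholder document that mimics the ELAN layout.
SAMPLE = """<ANNOTATION_DOCUMENT>
  <TIER TIER_ID="tx">
    <ANNOTATION>
      <ALIGNABLE_ANNOTATION ANNOTATION_ID="a1" TIME_SLOT_REF1="ts1" TIME_SLOT_REF2="ts2">
        <ANNOTATION_VALUE>placeholder text</ANNOTATION_VALUE>
      </ALIGNABLE_ANNOTATION>
    </ANNOTATION>
  </TIER>
  <TIER TIER_ID="ft" PARENT_REF="tx">
    <ANNOTATION>
      <REF_ANNOTATION ANNOTATION_ID="a2" ANNOTATION_REF="a1">
        <ANNOTATION_VALUE>placeholder translation</ANNOTATION_VALUE>
      </REF_ANNOTATION>
    </ANNOTATION>
  </TIER>
</ANNOTATION_DOCUMENT>"""

tiers = tiers_from_root(ET.fromstring(SAMPLE))
for tier_id, values in tiers.items():
    print(tier_id, values)
```

With a downloaded sample, read_eaf("BEJ_MV_NARR_11.eaf") would return the same kind of dictionary, assuming the file is in the current directory.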