Corpus Ladino

Powered by TEITOK
© Maarten Janssen, 2014

Welcome to CoDiAJe, the Annotated Diachronic Corpus of Judeo-Spanish.

CoDiAJe is a structured, multi-genre diachronic corpus of text samples from the 16th century to the 21st century. The samples are classified by text type, period, and geographical origin, and are enriched automatically or semi-automatically with different types of linguistic annotation.

CoDiAJe is accompanied by metadata on the authors (birthplace, place of residence, social status, etc.) and on the documents (text type, date, place, alphabet, print or manuscript, original or translation).

The digital edition workflow in CoDiAJe consists of two main tasks: first, the linguistic processing and annotation of the documents using NLP tools (Freeling: http://nlp.lsi.upc.edu/freeling/; Neotag: http://www.lrec-conf.org/proceedings/lrec2012/summaries/1098.html); second, the XML encoding of the metadata and of the linguistic annotation embedded in the texts, so that they can be visualized and searched via TEITOK.
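
As a rough illustration of the second task, the sketch below uses Python's standard xml.etree.ElementTree module to combine document metadata and token-level annotation in one XML tree. It is not the CoDiAJe pipeline: the element and attribute names (TEI, teiHeader, note, tok, lemma, pos), the flat metadata layout, and the sample values are all assumptions chosen for illustration, and the annotations are typed in by hand rather than produced by Freeling or Neotag.

```python
import xml.etree.ElementTree as ET


def build_document(metadata, tokens):
    """Build a small TEI-like XML tree from metadata and annotated tokens.

    The element and attribute names used here are illustrative assumptions,
    not the schema actually used by CoDiAJe.
    """
    root = ET.Element("TEI")

    # Document-level metadata, stored as flat typed notes for simplicity.
    header = ET.SubElement(root, "teiHeader")
    for key, value in metadata.items():
        field = ET.SubElement(header, "note", {"type": key})
        field.text = value

    # Token-level annotation: one element per token, with the annotation
    # carried in attributes and the written form as the element text.
    text = ET.SubElement(root, "text")
    body = ET.SubElement(text, "body")
    sentence = ET.SubElement(body, "s")
    for form, lemma, pos in tokens:
        tok = ET.SubElement(sentence, "tok", {"lemma": lemma, "pos": pos})
        tok.text = form
        tok.tail = " "  # keep a space between tokens in the serialized text

    return root


# Made-up sample values, for illustration only.
metadata = {"text_type": "letter", "alphabet": "Latin", "medium": "manuscript"}
tokens = [("la", "el", "DET"), ("kaza", "kaza", "NOUN")]

doc = build_document(metadata, tokens)
xml_string = ET.tostring(doc, encoding="unicode")
print(xml_string)
```

Keeping the annotation in attributes on each token element leaves the running text readable while still letting a search interface filter on lemma or part of speech.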