The texts in this corpus have been annotated at two levels: textual and linguistic.
The textual mark-up was produced by the HSMS and then converted into the format of the Text Encoding Initiative (TEI). The original HSMS transcription manual can be found here. In the conversion to TEI, all the information in the original transcription was preserved and mapped to its TEI equivalent.
Each text in the corpus has been tokenized (split into words), and each word has been annotated with a Part-of-Speech (POS) tag, a lemma, and a normalized form. For improved readability, the normalized form follows modern Spanish spelling. The tags used for the POS labels are described here.
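The word-level annotation can be pictured as one record per token that holds the transcribed form, its POS tag, its lemma and its normalized form. The following is a minimal Python sketch, assuming each word is stored as a TEI-style `<w>` element. The attribute names `pos`, `lemma` and `norm`, the tag values `V` and `N`, and the sample words are all hypothetical illustrations, not the corpus's documented encoding or tagset.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    """One word of a text together with its linguistic annotation."""
    form: str        # word as transcribed from the source
    pos: str         # Part-of-Speech tag
    lemma: str       # dictionary (lemmatized) form
    normalized: str  # form in modern Spanish spelling


# Hypothetical TEI-style fragment. The attribute names and tag values are
# assumptions for illustration only, not the corpus's actual schema.
SAMPLE = """<s>
  <w pos="V" lemma="decir" norm="dijo">dixo</w>
  <w pos="N" lemma="hijo" norm="hijo">fijo</w>
</s>"""


def read_tokens(xml_text: str) -> list[Token]:
    """Collect every <w> element in the fragment as a Token record."""
    root = ET.fromstring(xml_text)
    return [
        Token(
            form=w.text or "",
            pos=w.get("pos", ""),
            lemma=w.get("lemma", ""),
            normalized=w.get("norm", ""),
        )
        for w in root.iter("w")
    ]


if __name__ == "__main__":
    for tok in read_tokens(SAMPLE):
        print(tok.form, tok.pos, tok.lemma, tok.normalized)
```

Keeping the transcribed form next to the normalized form in the same record means a search can target either the original spelling or the modern one without losing the link between them.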