Textos Andinos

Betelgeuse - The Armpit of Orion - Or How Betelgeuse Got Its Name

The etymology of this famous star has been garbled by ignorance and disregard. Wikipedia sums up the current thinking well:

The last part of the name, “-elgeuse”, comes from the Arabic الجوزاء (al-Jauzā’), a historical Arabic name of the constellation Orion, a feminine name in old Arabian legend, and of uncertain meaning. Because جوز j-w-z, the root of jauzā’, means “middle”, al-Jauzā’ roughly means “the Central One”. The modern Arabic name for Orion is الجبار al-Jabbār (“the Giant”), although the use of الجوزاء al-Jauzā’ in the name of the star has continued. The full name is a corruption of the Arabic يد الجوزاء Yad al-Jauzā’ meaning “the Hand of al-Jauzā”, i.e., Orion.

European mistransliteration into medieval Latin led to the first character y (ي, with two dots underneath) being misread as a b (ب, with only one dot underneath). During the Renaissance, the star’s name was written as بيت الجوزاء Bait al-Jauzā’ (“house of Orion”) or بط الجوزاء Baţ al-Jauzā’, incorrectly thought to mean “armpit of Orion” (a true translation of “armpit” would be ابط, transliterated as Ibţ). This led to the modern rendering as Betelgeuse.

And that’s how one of our most famous stars (no, not Michael Keaton - the other Betelgeuse) ended up as the armpit of Orion. Hopefully, I will make different mistakes, not similar ones!

Textos Andinos - An Introduction

Textos Andinos, a two-volume book set by Martti Pärssinen and Jukka Kiviharju, transcribes original court documents from the Spanish archives in which the plaintiffs used khipus in their court recitations. To quote the authors:

Synopsis of Andean Texts
The first Spanish chroniclers regularly mentioned that the Incas memorized important matters of the towns and of the Inca state through the use of curious knots tied in colored cords. In fact, among the Andean peoples, these cords, or khipus, remained in common use until 1583, the year in which the Third Council of Lima decreed their total prohibition. Since then, scholars have debated whether the khipu system was a type of writing or merely a mnemonic system. Surprisingly, the texts found in the archives have received very little attention in the general discussion, even though these texts are based on the very knots and cords in question. Archival texts make up one of the most authentic and important sources on the indigenous history of the Andean countries.

In the Andean Texts project, the former director of the Ibero-American Institute of Finland, Martti Pärssinen, and the researcher Jukka Kiviharju have collected, transcribed and analyzed numerous texts based on the khipus. They have located the texts in various archives (for example, the General Archive of the Indies in Seville and the Departmental Archive in Cuzco) and in published sources. The Spanish transcriptions and translations of the khipus clearly demonstrate that writing by means of knotted cords was both possible and practiced in the Andean area. The results of the analysis support the hypothesis that most of the khipus functioned as an ideographic system, intelligible without prior knowledge of any particular language, but there is also evidence that the texts included phonetic elements, especially in the encoding of personal names and place names.

Within the framework of the project, two works have been published, the first of which includes 22 texts or memoirs based on the khipus and their presentations. The second volume includes 43 texts.

These texts were the subject of a bachelor’s thesis at Harvard by Manuel Medrano:

This thesis analyzes the Textos Andinos, a compilation of sixteenth-century Spanish transcriptions of indigenous narrations of khipus—knotted-string recording devices used in the Inka Empire for recording information. I compile the largest digitized and syntactically-annotated corpus of khipu transcriptions to date from the Textos. Textual interpretation is employed to suggest an exegetical typology of khipu transcriptions. I apply Ascher and Ascher’s (1997) concept of “insistence” to illuminate the idiosyncrasies of the texts. The output of the close reading—a primordial division of 72 khipu transcriptions—is subjected to exploratory multivariate analysis, based in corpus linguistics, to suggest a statistical typology of the corpus. Chronology and the recording of currency emerge as the most significantly distinguishable typological categories for describing khipu narration in the early colonial Andes. A significant differentiation is found in the essential narrative structures of pre- and postconquest khipu transcriptions. Novel statistical support is offered for Urton’s (1998) hypothesis: postconquest khipu narrations were characterized by attenuated clauses and enumerated lists, constituting a flattening of the expressive capacity of khipus following the Spanish conquest. I offer formal principles for a Khipu Transcription Corpus (KTC)—a novel online repository of early colonial khipu transcriptions. Following these principles, it is argued that aggregate analysis of the texts in a corpus framework establishes the enabling infrastructure for a statistically-informed khipu transcription insistence.

From Manuel, I have received the original XML files used in his thesis. The XML files are made up of word-level elements, where each word records:

  • The index/location in the document
  • The modernized Spanish text
  • The lemmatized text (soft instead of softly, run instead of running, etc.)
  • The part of speech/POS (noun, verb, etc.)
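To make that word-level structure concrete, here is a minimal sketch in Python that reads one such file with the standard library’s xml.etree.ElementTree. The element and attribute names (recitation, word, index, text, lemma, pos) and the sample sentence are hypothetical stand-ins; the actual schema of Manuel’s files may well differ, so treat this as a template rather than a parser for the real corpus.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass
class Word:
    index: int   # position of the word in the recitation
    text: str    # modernized Spanish form
    lemma: str   # lemmatized form
    pos: str     # part-of-speech tag


# Hypothetical sample: the real files' tag and attribute names may differ.
SAMPLE = """<recitation id="example">
  <word index="1" text="tenía" lemma="tener" pos="VERB"/>
  <word index="2" text="cuatro" lemma="cuatro" pos="NUM"/>
  <word index="3" text="guacas" lemma="guaca" pos="NOUN"/>
</recitation>"""


def parse_words(xml_text: str) -> list[Word]:
    """Return every <word> element of a recitation, sorted by its index."""
    root = ET.fromstring(xml_text)
    words = [
        Word(int(w.get("index")), w.get("text"), w.get("lemma"), w.get("pos"))
        for w in root.iter("word")
    ]
    return sorted(words, key=lambda w: w.index)


words = parse_words(SAMPLE)
print([w.lemma for w in words])  # ['tener', 'cuatro', 'guaca']
```

Sorting on the index attribute, rather than trusting element order, is a small defensive choice in case a file lists words out of sequence.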

Each word has been modernized into contemporary Spanish, but the Quechua words have not been rendered in modern Quechua. For example, the Quechua wak’a (which I’ll loosely translate as “temple”), Hispanicized as huaca, stayed in its original form as guaca. Because I think of a tasty fruit when I see guaca, I’ve changed all occurrences of guaca to huaca. Many more typos and fixes remain. Ceques, from the Quechua word for line and a metaphor for a road name/number, are misspelled in as many ways as possible in the corpus, LOL, my favorite being czeke.
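Spelling fixes like guaca to huaca are easy to automate. Here is a minimal sketch, assuming a plain lookup table of variant spellings: only guaca and czeke come from the corpus notes above, and the table (SPELLING_FIXES, a hypothetical name) would have to grow as more variants turn up. Matching whole words, with an optional plural s, keeps unrelated words such as guacamayo (macaw) intact.

```python
import re

# Variant spelling -> preferred form. Only these two pairs come from the
# notes above; this is an illustrative, not exhaustive, table.
SPELLING_FIXES = {
    "guaca": "huaca",
    "czeke": "ceque",
}

# A known variant as a whole word, optionally followed by a plural "s".
_PATTERN = re.compile(
    r"\b(" + "|".join(map(re.escape, SPELLING_FIXES)) + r")(s?)\b",
    re.IGNORECASE,
)


def normalize(text: str) -> str:
    """Replace known variant spellings, keeping plurals and initial capitals."""
    def fix(m: re.Match) -> str:
        stem, plural = m.group(1), m.group(2)
        repl = SPELLING_FIXES[stem.lower()]
        if stem[0].isupper():
            repl = repl.capitalize()
        return repl + plural

    return _PATTERN.sub(fix, text)


print(normalize("Las guacas y el czeke de Guaca"))
```

For example, normalize("Las guacas y el czeke de Guaca") gives "Las huacas y el ceque de Huaca", while normalize("guacamayo") is returned unchanged.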

For the rest of this study, when needed to avoid ambiguity, I will refer to these Textos Andinos documents as recitations.

A Browser For Text Recitations

To see and read the text recitations, two browsers have been built:

  • The WordCloud Image Browser with the documents sorted by “type” (see LDA analyses below).
  • The Recitation Browser, a conventional columnar table with the documents sorted as you wish by columns such as year, number of measured nouns, etc.

Either of these browsers will take you to a more in-depth page describing each recitation.


Initially, we can use conventional data science NLP techniques to analyze and cluster the recitations. However, the goal is to build a bridge between recitations and actual corded khipu!
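As a first taste of grouping recitations by shared vocabulary, here is a sketch of TF-IDF weighting plus cosine similarity over lemmatized recitations, in plain Python. This is a deliberately simpler, swapped-in technique, not the LDA analysis used to assign the browser’s “types”, and the three toy recitations are invented lemma lists, not corpus data.

```python
import math
from collections import Counter


def tfidf_vectors(docs: list[list[str]]) -> list[dict[str, float]]:
    """Weight each lemma by term frequency times inverse document frequency."""
    n = len(docs)
    # Document frequency: in how many recitations does each lemma appear?
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append({
            term: (count / len(doc)) * math.log(n / df[term])
            for term, count in tf.items()
        })
    return vectors


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    """Cosine similarity of two sparse vectors; 0.0 if either is all zeros."""
    dot = sum(w * b.get(term, 0.0) for term, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def most_similar_pair(vectors: list[dict[str, float]]) -> tuple[float, int, int]:
    """Return (similarity, i, j) for the most similar pair of recitations."""
    return max(
        (cosine(vectors[i], vectors[j]), i, j)
        for i in range(len(vectors))
        for j in range(i + 1, len(vectors))
    )


# Invented toy data: each recitation reduced to its list of lemmas.
recitations = [
    ["huaca", "ceque", "huaca", "sacrificio"],
    ["huaca", "ceque", "llama"],
    ["peso", "oro", "plata", "peso"],
]
vecs = tfidf_vectors(recitations)
print(most_similar_pair(vecs))
```

Here the two huaca/ceque recitations pair up, while the one about pesos, gold and silver shares no vocabulary with either and scores 0.0 against both. This only measures textual similarity between recitations; it says nothing yet about the corded khipus themselves.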

I suppose there are a million ways to do this, and all of them seem hard…

Problems worthy of attack prove their worth by hitting back. - Piet Hein, Grooks 1

The following analyses help understand the overall nature of the Textos Andinos…