Importing The Harvard Khipu Database (KDB)



Betelgeuse - The Armpit of Orion - Or how Betelgeuse Got its Name.

The etymology of this famous star has been garbled by ignorance and disregard. Wikipedia sums up the current thinking well:

The last part of the name, “-elgeuse”, comes from the Arabic الجوزاء (al-Jauzā’), a historical Arabic name of the constellation Orion, a feminine name in old Arabian legend, and of uncertain meaning. Because جوز j-w-z, the root of jauzā’, means “middle”, al-Jauzā’ roughly means “the Central One”. The modern Arabic name for Orion is الجبار al-Jabbār (“the Giant”), although the use of الجوزاء al-Jauzā’ in the name of the star has continued. The full name is a corruption of the Arabic يد الجوزاء Yad al-Jauzā’ meaning “the Hand of al-Jauzā’”, i.e., Orion.

European mistransliteration into medieval Latin led to the first character y (ي, with two dots underneath) being misread as a b (ب, with only one dot underneath). During the Renaissance, the star’s name was written as بيت الجوزاء Bait al-Jauzā’ (“house of Orion”) or بط الجوزاء Baţ al-Jauzā’, incorrectly thought to mean “armpit of Orion” (a true translation of “armpit” would be ابط, transliterated as Ibţ). This led to the modern rendering, Betelgeuse.

And that’s how one of our most famous stars (no, not Michael Keaton - the other Betelgeuse), ended up as the armpit of Orion. Hopefully, I will make different, not similar, mistakes!

Importing the Harvard Khipu Database

The Harvard Khipu Database Project stored khipu measurements in two forms: Excel files and a larger set of SQL tables. The SQL tables are almost… ready for data mining. However, to make this project portable (in a software sense, not a cloth one LOL) and to avoid the hassle of running a SQL server, I am converting the SQL to CSV files. I note that Urton et al. prefix the database tables with collca, the Quechua word for warehouse. One Quechua spelling is qollqa. I say “one” because Quechua orthography varies by dialect. It could be collca, khollqa, kolka…

Since the khipu database/tables are so small (a total of 100 MB in SQL statements), I used an open-source MySQL server (MariaDB) and TablePlus, a SQL GUI, to:

  1. Restore the Khipu Project MySQL Database by concatenating all the SQL files:
    cat collca*.sql > make_khipu_db.sql
    and running the resulting SQL file on the MariaDB MySQL server.

  2. Save all the tables and query results of the Khipu Project MySQL Database as CSV files (using TablePlus)

    Voilà - we now have a bunch of CSV files ready to be read into Python pandas DataFrames.
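
TablePlus did the exporting here, but step 2 can also be scripted. Below is a minimal sketch, not what I actually ran: it uses Python's built-in sqlite3 module as a stand-in for the MariaDB server so that it runs anywhere, and the function name export_tables_to_csv and the two-row sample table are made up for illustration.

```python
import csv
import sqlite3
import tempfile
from pathlib import Path


def export_tables_to_csv(connection, table_names, out_dir):
    """Write each table to <out_dir>/<table>.csv (header row first); return the paths."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for table in table_names:
        # Table names come from our own trusted list, so quoting them inline is acceptable.
        cursor = connection.execute(f'SELECT * FROM "{table}"')
        header = [column[0] for column in cursor.description]
        csv_path = out_dir / f"{table}.csv"
        with open(csv_path, "w", newline="", encoding="utf-8") as handle:
            writer = csv.writer(handle)
            writer.writerow(header)
            writer.writerows(cursor)
        written.append(csv_path)
    return written


# Stand-in database (sqlite3, NOT MariaDB) holding one tiny sample table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE khipu_main (khipu_id INTEGER, investigator_num TEXT)")
conn.executemany("INSERT INTO khipu_main VALUES (?, ?)",
                 [(1000166, "AS010"), (1000167, "AS011")])

csv_paths = export_tables_to_csv(conn, ["khipu_main"], tempfile.mkdtemp())
with open(csv_paths[0], newline="", encoding="utf-8") as handle:
    exported_rows = list(csv.reader(handle))
print(exported_rows)
```

With a real MariaDB server, only the connection object would change; the loop over table names stays the same.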

The Harvard Khipu database schema (description image by the Khipu Database Project) is shown below. The Python classes reconstruct this schema.



1. Creating the Initial Khipu DB


As said previously, the SQL database tables and query results are stored as CSV tables instead of as SQL CREATE statements. Key tables include khipu_main, cord, and cord_cluster. Tables that end with dc are code descriptors for the symbolic codes used in other tables. For example, ascher_color_dc tells you that the color code MB translates to Medium Brown…
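
To make the role of the dc tables concrete, here is a minimal sketch of decoding a symbolic code with a pandas left merge. The column names (as_color_cd, color_descr, color_cd) and the sample rows are assumptions for illustration only; check the real CSV headers before borrowing this.

```python
import pandas as pd

# Hypothetical miniature of a code-descriptor (dc) table.
ascher_color_dc = pd.DataFrame({
    "as_color_cd": ["MB", "W"],
    "color_descr": ["Medium Brown", "White"],
})

# Hypothetical cords carrying symbolic color codes; "ZZ" is deliberately not in the dc table.
cords = pd.DataFrame({"cord_id": [1, 2, 3], "color_cd": ["MB", "W", "ZZ"]})

# A left merge keeps every cord and attaches the description where the code is known.
decoded = cords.merge(ascher_color_dc, how="left",
                      left_on="color_cd", right_on="as_color_cd")

# Codes with no descriptor come back as NaN, which makes typos easy to list.
unknown_codes = decoded[decoded.color_descr.isna()].color_cd.tolist()
print(decoded[["cord_id", "color_cd", "color_descr"]])
print(unknown_codes)
```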

2. Data Cleansing


We start by building a virginal object-oriented database (OODB) of khipus (essentially Python class objects). Building the initial Khipu OODB of about 620 khipus takes about 10 minutes.

2.1 Create fresh copies of CSV Database

Code
############################################################################################################################################
# Load required libraries and initialize Jupyter notebook
############################################################################################################################################
# Khipu Imports
import khipu_utils as ku
import khipu_kamayuq as kamayuq  # A Khipu Maker is known (in Quechua) as a Khipu Kamayuq
import khipu_qollqa as kq

# Make a clean CSV directory to build the KFG database from scratch.
# Copy the cleaner original CSV files (i.e. with some minor data fixes, like UR189 instead of Ur189) to the working directory.
# Copy those files to 'clean' files, which become the working CSVs.
import shutil
import numpy as np   # used below for np.where / np.isnan
import pandas as pd  # used below for pd.read_csv
BUILD_FRESH_OODB = True
CSV_dir = kq.qollqa_data_directory()
if BUILD_FRESH_OODB:
    kq.prepare_data_directory()
    #Which does this:
    #     os.system(f"cd {CSV_dir};cp collca_CSV/CSV_BEGIN/*.csv {CSV_dir}")
    #     shutil.copy(f"{CSV_dir}khipu_main.csv", f"{CSV_dir}khipu_main_clean.csv");
    #     shutil.copy(f"{CSV_dir}primary_cord.csv", f"{CSV_dir}primary_cord_clean.csv");
    #     shutil.copy(f"{CSV_dir}cord_cluster.csv", f"{CSV_dir}cord_cluster_clean.csv");
    #     shutil.copy(f"{CSV_dir}cord.csv", f"{CSV_dir}cord_clean.csv");
    #     shutil.copy(f"{CSV_dir}ascher_cord_color.csv", f"{CSV_dir}ascher_cord_color_clean.csv");
    #     shutil.copy(f"{CSV_dir}knot_cluster.csv", f"{CSV_dir}knot_cluster_clean.csv");
    #     shutil.copy(f"{CSV_dir}knot.csv", f"{CSV_dir}knot_clean.csv");

2.2 Create Foundation OODB

Code
# Build a fresh version of the object oriented database (OODB) that starts with the "raw" database.
# Remove Duplicates along the way...
print("Building initial khipu OODB")
kq.update_data_directory(CSV_dir)
all_khipus = [aKhipu for aKhipu in kamayuq.fetch_all_khipus(clean_build=True).values()]
print(f"Done - built and fetched {len(all_khipus)} khipus")
Building initial khipu OODB
0: 1000166
25: 1000334
50: 1000364
75: 1000044
100: 1000143
125: 1000070
150: 1000421
175: 1000446
200: 1000581
225: 1000642
250: 1000023
275: 1000303
300: 1000266
325: 1000249
350: 1000340
375: 1000176
400: 1000057
425: 1000120
450: 1000291
475: 1000407
500: 1000412
Unable to create khipu id 1000484 - exception 1000484
525: 1000499
550: 1000524
575: 1000553
600: 1000605
625: 1000653
Done - built and fetched 595 khipus

3. Data Errors and Data Cleansing

3.1 Big Picture Review

Let’s start by looking at the big picture: what khipus do we have to work with, and what’s the “quality” and “integrity” of the data? We’ve already had one khipu fail - Khipu ID 1000484, known as UR167 or B/3453A from the American Museum of Natural History. This failure happens because there is also a Khipu ID 1000474 that is known as UR167!
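
A name collision like this is easy to check for up front. The following is a minimal sketch using pandas' duplicated(keep=False) on a three-row sample; the two UR167 ids and the UR181 id are taken from this page, and the DataFrame itself is a stand-in for khipu_main.

```python
import pandas as pd

# Stand-in for khipu_main: two rows share the investigator name UR167.
khipu_main_sample = pd.DataFrame({
    "khipu_id": [1000474, 1000484, 1000491],
    "investigator_num": ["UR167", "UR167", "UR181"],
})

# keep=False marks every member of a duplicated group, not just the later ones.
duplicate_mask = khipu_main_sample.investigator_num.duplicated(keep=False)
duplicate_names_df = khipu_main_sample[duplicate_mask].sort_values(
    ["investigator_num", "khipu_id"])
print(duplicate_names_df)
```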

The database contains many, many errors (Have I said that already? 😂). Some are structural, like mispointed cords, and some are transcription errors. I fix transcription errors in three places:

  • Changing the SQL/CSV data, i.e. replacing the data directly in the beginning CSVs. For example, khipu_main.csv (or the equivalent SQL table) has two errors: an empty row without any information (khipu id 10000500) and a mislabeled investigator name, Ur189. Before loading the database, I deleted the empty row by hand and edited the name to UR189 using MS Excel. As another example, I used MS Excel to restore the primary cord information for khipu AS014 that was missing from the primary cord CSV file.
  • Modifying the data as it’s being saved. For example, Ascher Cord Colors have hundreds of typos and non-regular codes that need cleansing and normalizing.
  • Replacing/fixing data in the code itself. The most common example is fixing “impossible” cord cluster information for AS014, AS024, AS094, AS187, and AS207B, whose clusters consist of things like 3 cords, starting at 35 cm and spaced 66 cm apart, on a 3 cm primary cord. Other examples include fixing incorrectly labeled top cords, trimming cord lengths (over 25’) for UR149, and handling knots with missing cords, long knots with num_turns=0 but values > 0, etc…
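
The “impossible” cluster case lends itself to a simple sanity check. This is a minimal sketch, assuming that “spaced” means the distance between consecutive cords in a cluster; the function name and parameters are hypothetical and are not part of the khipu classes.

```python
def cluster_fits_on_primary_cord(start_cm, spacing_cm, num_cords, pcord_length_cm):
    """Return True if every cord in the cluster attaches within the primary cord's length.

    Assumes cords are evenly spaced, so the last cord sits at
    start_cm + spacing_cm * (num_cords - 1).
    """
    last_attachment_cm = start_cm + spacing_cm * max(num_cords - 1, 0)
    return start_cm >= 0 and last_attachment_cm <= pcord_length_cm


# The example from the text: 3 cords, starting at 35 cm, spaced 66 cm apart, on a 3 cm primary cord.
print(cluster_fits_on_primary_cord(35, 66, 3, 3))

# A plausible cluster: 3 cords, starting at 1 cm, spaced 0.5 cm apart, on a 3 cm primary cord.
print(cluster_fits_on_primary_cord(1.0, 0.5, 3, 3.0))
```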

A quick glance at khipu_main, the top-level khipu dataframe:

Code
khipu_main_df = pd.read_csv(f"{CSV_dir}khipu_main.csv") 
khipu_main_df = kq.clean_column_names(khipu_main_df)
khipu_main_df.head()
khipu_id earliest_age latest_age provenance date_discovered discovered_by museum_descr museum_name nickname museum_num ... investigator_num complete created_by created_on changed_by changed_on duplicate_flag duplicate_id archive_num orig_inv_num
0 1000166 0000-00-00 0000-00-00 NaN 0000-00-00 NaN NaN "Niedersächsische Landesmuseum, Hanover, West ... NaN 6271 ... AS010 0.0 katie 5/24/12 13:33 NaN 0000-00-00 00:00:00 0.0 0.0 0.0 AS010
1 1000167 0000-00-00 0000-00-00 NaN 0000-00-00 NaN NaN "Niedersächsische Landesmuseum, Hanover, West ... NaN 10087 ... AS011 0.0 katie 5/24/12 13:33 NaN 0000-00-00 00:00:00 0.0 0.0 0.0 AS011
2 1000180 0000-00-00 0000-00-00 NaN 0000-00-00 NaN NaN "Niedersächsische Landesmuseum, Hanover, West ... NaN 10217 ... AS012 0.0 leah 5/24/12 13:33 leah 10/21/03 9:59 0.0 0.0 0.0 AS012
3 1000184 0000-00-00 0000-00-00 NaN 0000-00-00 NaN NaN "Niedersächsische Landesmuseum, Hanover, West ... NaN 10086 ... AS013 0.0 leah 5/24/12 13:33 leah 11/10/03 13:07 0.0 0.0 0.0 AS013
4 1000185 0000-00-00 0000-00-00 NaN 0000-00-00 NaN NaN British Museum NaN NaN ... AS014 0.0 leah 11/17/03 13:07 leah 11/17/03 13:09 0.0 0.0 0.0 AS014

5 rows × 22 columns


So we have 634 khipus to start with in the database. In our first pass at creating a database above, we were only able to create 595 khipus, with 28 culled as duplicates and 11 being culled due to zero cords, or data integrity issues.

As I have discovered, however, that’s not where the culling stops. A liberal number of mispointed or incomplete records exist in the SQL database. Most of the issues have to do with cords pointing to the wrong place - for example, Pendant Cord 1 belonging to Khipu 1 having a subsidiary cord that is attached to Khipu 2… which in turn has a subsidiary cord attached to Khipu 1, which in turn…
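
Circular pointers of this kind can be found by walking each cord's parent chain. Below is a minimal sketch of that idea, assuming the parent links have been loaded into a plain dictionary; the function name and the sample ids are invented for illustration, and this is not the check used in the khipu classes.

```python
def find_cords_in_cycles(parent_of):
    """Return the set of cord ids that sit on a cycle of parent pointers.

    parent_of maps cord_id -> parent id. A parent that is not itself a key
    (for example a primary cord id) or is None ends the chain normally.
    """
    in_cycle = set()
    finished = set()  # cords whose chains were already fully explored
    for start in parent_of:
        path = []
        position = {}  # cord id -> index in the current path
        node = start
        while (node is not None and node in parent_of
               and node not in finished and node not in position):
            position[node] = len(path)
            path.append(node)
            node = parent_of[node]
        if node in position:  # we walked back onto the current path: a cycle
            in_cycle.update(path[position[node]:])
        finished.update(path)
    return in_cycle


# Cord 1 hangs from primary cord 100; cords 2 and 3 point at each other; cord 4 hangs off cord 2.
sample_parents = {1: 100, 2: 3, 3: 2, 4: 2}
print(find_cords_in_cycles(sample_parents))
```

Cord 4 is not reported because it only leads into the loop; it is not on the loop itself.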

Let’s clean up funky values like ‘NaN’ (Not a Number):

Code
khipu_df = khipu_main_df
khipu_df.museum_descr = khipu_df.museum_descr.fillna(value='')
khipu_df.nickname = khipu_df.nickname.fillna(value='')
khipu_df.provenance = khipu_df.provenance.fillna(value='')
khipu_df.provenance = np.where(khipu_df.provenance == 'unknown','Unknown', khipu_df.provenance)
khipu_df.provenance = np.where(khipu_df.provenance == '','Unknown', khipu_df.provenance)
khipu_df.region = khipu_df.region.fillna(value='')
khipu_df.region = np.where(khipu_df.region == 'unknown','Unknown', khipu_df.region)
khipu_df.region = np.where(khipu_df.region == '','Unknown', khipu_df.region)
khipu_df.conditionofkhipu = khipu_df.conditionofkhipu.fillna(value='')
print(f"Size of khipu dataframe is {khipu_df.shape}")
# khipu_df
Size of khipu dataframe is (634, 22)
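
The fillna/np.where pairs above repeat the same pattern for provenance and region. As a minimal sketch, the pattern could be folded into one helper; normalize_unknown is a hypothetical name, and unlike the cell above it also trims whitespace and ignores case, which is an assumption about what is wanted.

```python
import pandas as pd


def normalize_unknown(series, placeholder="Unknown"):
    """Replace missing, empty, or 'unknown' (any case) entries with a single placeholder."""
    cleaned = series.fillna("").astype(str).str.strip()
    is_unknown = cleaned.str.lower().isin(["", "unknown"])
    return cleaned.mask(is_unknown, placeholder)


# Made-up sample values for illustration.
sample_provenance = pd.Series([None, "unknown", "", "Ica"])
print(normalize_unknown(sample_provenance).tolist())
```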


Apparently some khipus are in fragmentary condition. Let’s pick those out so they can be excluded from this study. Also, orig_inv_num (the original investigator who described the khipu) generally matches investigator_num. Some Ascher descriptions are replaced by Urton descriptions, but on the whole most Ascher descriptions are honored and labeled as such. In the khipu drawings, I display and restore the original investigator name from Urton’s palimpsest relabeling of Ascher and Brezine khipus.

Code
fragmentary_khipus_df = khipu_df[khipu_df.conditionofkhipu == "Fragmentary"]
fragmentary_khipu_ids = list(fragmentary_khipus_df.khipu_id.values)
fragmentary_khipu_names = list(fragmentary_khipus_df.investigator_num.values)
print(f"fragmentary_khipu_names: {fragmentary_khipu_names}")
fragmentary_khipu_names: ['QU03', 'QU04', 'QU05', 'QU06', 'QU07', 'QU10', 'QU14', 'QU15', 'QU17', 'QU18', 'QU19']

3.2 Examining Primary Cord Data

We now have a cleaned-up khipu dataframe with 634 khipus to investigate.
Most?! khipus have a primary cord. Let’s examine the primary cord database:

Code
primary_cord_df = pd.read_csv(f"{CSV_dir}primary_cord.csv") 
# Once again, let's clean up the columns
primary_cord_df = kq.clean_column_names(primary_cord_df)
primary_cord_df
khipu_id pcord_id structure thickness notes attached_to pcord_length fiber termination beginning created_by created_date changed_by changed_date twist plainnotes
0 1000000 1000000 P 0.0 NaN 0.0 26.0 CN K K cbrezine 11/23/11 19:42 NaN 0000-00-00 00:00:00 S NaN
1 1000001 1000001 P 0.0 nudo de comienzo entre 0.0 - 0.5 cm 0.0 16.5 CN K T cbrezine 11/23/11 19:42 NaN 0000-00-00 00:00:00 S nudo de comienzo entre 0.0 - 0.5 cm
2 1000002 1000002 P 0.0 solamente existe cordon principal entre: 0.0-5... 0.0 10.5 CL NaN NaN cbrezine 11/23/11 19:42 NaN 0000-00-00 00:00:00 S solamente existe cordon principal entre: 0.0-5...
3 1000003 1000003 P 0.0 4.0 cm: nudo que une khipu 109B con up Top Cor... 0.0 98.0 CN K K cbrezine 11/23/11 19:42 cbrezine 5/29/03 9:40 S 4.0 cm: nudo que une khipu 109B con up Top Cor...
4 1000004 1000004 P 0.0 65.5 cm: una prolongacion del cordon principal... 0.0 65.5 CN K T cbrezine 11/23/11 19:42 cbrezine 3/3/04 12:05 S 65.5 cm: una prolongacion del cordon principal...
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
630 1000657 1000657 P 0.0 NaN NaN 25.0 CN K T gurton 2/26/18 12:29 gurton 2/26/18 12:29 NaN NaN
631 1000658 1000658 P 0.0 NaN NaN 44.0 CN B NB gurton 2/26/18 12:29 gurton 2/26/18 12:29 NaN NaN
632 1000659 1000659 P 0.0 NaN NaN 57.0 CN K T gurton 2/26/18 12:29 gurton 2/26/18 12:29 NaN NaN
633 1000660 1000660 P 0.0 NaN NaN 19.0 CN K K gurton 2/26/18 12:29 gurton 2/26/18 12:29 NaN NaN
634 1000661 1000661 P 0.0 NaN NaN 31.0 CN K T gurton 2/26/18 12:29 gurton 2/26/18 12:29 NaN NaN

635 rows × 16 columns


Two questions immediately arise. Are there any primary cords that are not attached to a khipu (in which case we should remove them)? And what do the notes for the primary cords say? Those should be reviewed as well.

Remove primary cords belonging to fragmentary khipus or to the null row…

Code
print(f"Before: primary_cord_df.shape = {primary_cord_df.shape}")
errant_khipu_ids = list((set(primary_cord_df.khipu_id.values) - set(khipu_df.khipu_id.values)) - set(fragmentary_khipu_ids))
errant_khipu_names = khipu_main_df[khipu_main_df.khipu_id.isin(errant_khipu_ids)].investigator_num.values
print(f"Removing errant_khipu_ids {errant_khipu_ids}")
print(f"Removing errant_khipu_names {errant_khipu_names}")

khipu_ids = khipu_df.khipu_id.values
primary_cord_df = primary_cord_df[primary_cord_df.khipu_id.isin(khipu_ids)]
print(f"After: primary_cord_df.shape = {primary_cord_df.shape}")

primary_cord_khipu_ids = primary_cord_df.khipu_id.values
print(f"Before: khipu_df.shape = {khipu_df.shape}")
khipu_df = khipu_df[khipu_df.khipu_id.isin(primary_cord_khipu_ids)]
print(f"After: khipu_df.shape = {khipu_df.shape}")
Before: primary_cord_df.shape = (635, 16)
Removing errant_khipu_ids [1000498, 1000594]
Removing errant_khipu_names []
After: primary_cord_df.shape = (633, 16)
Before: khipu_df.shape = (634, 22)
After: khipu_df.shape = (633, 22)

3.3 Cords and Cord Groups


A few khipus have no cords. The khipu_kamayuq fetch routine filters out zero-cord khipus, so this step is redundant, but I include it here for occasional reference.

Code
all_khipus = [aKhipu for aKhipu in kamayuq.fetch_all_khipus(remove_zero_cord_khipus=False).values()]
zero_cord_khipu_ids = [aKhipu.khipu_id for aKhipu in all_khipus if aKhipu.num_attached_cords()==0]
zero_cord_khipu_name = [kq.khipu_name_from_id(anId) for anId in zero_cord_khipu_ids]
zero_cord_khipu_name.sort()

print(f"Removing zero_cord_khipu_name {zero_cord_khipu_name}")
print(f"Before: khipu_df.shape = {khipu_df.shape}, Zero cord ids: {len(zero_cord_khipu_ids)}")
khipu_df = khipu_df[~khipu_df.khipu_id.isin(zero_cord_khipu_ids)]
print(f"After: khipu_df.shape = {khipu_df.shape}")
print(len(all_khipus))
Removing zero_cord_khipu_name []
Before: khipu_df.shape = (633, 22), Zero cord ids: 0
After: khipu_df.shape = (633, 22)
595


Do the same for cords, cord clusters, and ascher_cord_colors:

Code
valid_khipu_ids = list(set(khipu_df.khipu_id.values) & set(kq.cord_cluster_df.khipu_id.values))
print(f"Before: cord_cluster_df.shape = {kq.cord_cluster_df.shape}")
cord_cluster_df = kq.cord_cluster_df[kq.cord_cluster_df.khipu_id.isin(valid_khipu_ids)]
print(f"After: cord_cluster_df.shape = {cord_cluster_df.shape}")

cord_df = pd.read_csv(f"{CSV_dir}cord.csv") 
cord_df = kq.clean_column_names(cord_df)

print(f"Before: cord_df.shape = {cord_df.shape}")
cord_df = cord_df[cord_df.khipu_id.isin(valid_khipu_ids)]
print(f"After: cord_df.shape = {cord_df.shape}")

ascher_cord_color_df = pd.read_csv(f"{CSV_dir}ascher_cord_color.csv") 
ascher_cord_color_df = kq.clean_column_names(ascher_cord_color_df)

# Ascher cord colors also point to primary cords (see pcord_flag)
print(f"Before: ascher_cord_color_df.shape = {ascher_cord_color_df.shape}")
valid_cord_color_ids = list(set(cord_df.cord_id.values) | set(primary_cord_df.pcord_id.values))
ascher_cord_color_df = ascher_cord_color_df[ascher_cord_color_df.cord_id.isin(valid_cord_color_ids)]

print(f"After: ascher_cord_color_df.shape = {ascher_cord_color_df.shape}")

# Many cords (1 in 6!) have NaN as their attached_to. What's up with that?
print(cord_df[np.isnan(cord_df.attached_to)].shape)
Before: cord_cluster_df.shape = (15699, 18)
After: cord_cluster_df.shape = (15699, 18)
Before: cord_df.shape = (56870, 25)
After: cord_df.shape = (55805, 25)
Before: ascher_cord_color_df.shape = (58609, 27)
After: ascher_cord_color_df.shape = (57341, 27)
(9253, 25)

3.4 Cord Clusters with Incorrect Cord Pointers

Some cords have missing parent cords. By comparing the pendant_from fields of cords against the cord_id of clusters, I discovered that 44 khipus have cord clusters that point to cords that don’t belong to the khipu. For example, look at UR181/1000491, which has a cord (cord_id=3052039) whose pendant_from, 1000592, actually points to UR254/1000592.
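
The UR181 example can be reproduced in miniature. This is a minimal sketch, not the comparison I ran: it only checks cords that hang from a primary cord, the second cord row (3052040) is invented, and the frames are tiny stand-ins for cord_df and primary_cord_df.

```python
import pandas as pd

# Stand-in cords: the first row is the UR181 cord from the text, the second is made up.
cords = pd.DataFrame({
    "cord_id": [3052039, 3052040],
    "khipu_id": [1000491, 1000491],
    "pendant_from": [1000592, 1000491],
})

# Stand-in primary cords for UR181 (1000491) and UR254 (1000592).
primary_cords = pd.DataFrame({
    "pcord_id": [1000491, 1000592],
    "khipu_id": [1000491, 1000592],
})

# Look up the khipu that owns each cord's parent, then compare it with the cord's own khipu.
joined = cords.merge(primary_cords, how="left",
                     left_on="pendant_from", right_on="pcord_id",
                     suffixes=("", "_parent"))
cross_khipu = joined[joined.khipu_id_parent.notna()
                     & (joined.khipu_id != joined.khipu_id_parent)]
print(cross_khipu[["cord_id", "khipu_id", "khipu_id_parent"]])
```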

UR003 and UR149

Two of the khipus, UR003 and UR149, have Excel files. On viewing the Excel files, I find that UR003 has 371 cords that have something in their fields and a total of 146 cords that say nothing, while the database says it has 758 directly attached pendants and 761 cord cluster pendants. The database says UR149 has 256 to 265 cords, but the Excel spreadsheet says it has 272 cords. Clearly something’s wrong.

Code
has_cord_parents_mask = cord_df.pendant_from.isin(cord_df.cord_id.values)
has_pcord_parents_mask = cord_df.pendant_from.isin(primary_cord_df.pcord_id.values)
has_parents_mask = (has_cord_parents_mask | has_pcord_parents_mask)
num_orphan_cords = sum(~has_parents_mask)
print(f"# of cords missing parents = {num_orphan_cords}")
print(f"Before: cord_df.shape = {cord_df.shape}")
cord_df = cord_df[has_parents_mask]
print(f"After: cord_df.shape = {cord_df.shape}")
# of cords missing parents = 288
Before: cord_df.shape = (55805, 25)
After: cord_df.shape = (55517, 25)

Some clusters have zero cords. What to do about these? For now we leave them in, and do defensive coding…

Code
khipus_with_zero_cord_clusters = []
for khipu in all_khipus:
    for cluster in khipu.cord_clusters():
        if cluster.num_cords() == 0:
            khipus_with_zero_cord_clusters.append(khipu.name())
malformed_khipus = sorted(list(set(khipus_with_zero_cord_clusters)))
print(f"Khipus with zero cord clusters ({len(malformed_khipus)}): {malformed_khipus}")
Khipus with zero cord clusters (5): ['AS012', 'UR146', 'UR188', 'UR190', 'UR255']
Code
clusters_from_cords = set(list(cord_df.cluster_id.values)) 
clusters_in_clusters = set(list(cord_cluster_df.cluster_id.values))
missing_cord_clusters = clusters_from_cords - clusters_in_clusters
def is_existing_cluster(aClusterID): return (aClusterID in clusters_in_clusters)
is_existing_cluster_mask = [is_existing_cluster(aClusterID) for aClusterID in cord_df.cluster_id.values]
cord_df = cord_df[is_existing_cluster_mask]

kq.save_khipu_df(khipu_df)
kq.save_primary_cord_df(primary_cord_df)
kq.save_cord_cluster_df(cord_cluster_df)
kq.save_cord_df(cord_df)
kq.save_ascher_cord_color_df(ascher_cord_color_df)

3.5 Rebuild DB

And finally a new rebuild

Code
%%capture cell_output.txt
kq.load_KFG_DB()
kq.build_KFG_qollqa()
all_khipus = [aKhipu for aKhipu in kamayuq.fetch_all_khipus(clean_build=True).values()]
kq.backup_qollqa(f"CSV_BASECAMP_2_{len(all_khipus)}")

4. “Fixing” Malformed Harvard SQL Khipus:

Some cords have missing clusters. I found 82 khipus that had cords with malformed clusters. In most cases, they appear to be subsidiary cords, or cords of an unknown attachment (maybe unattached from the main khipu). I suspect the clusters were lost when the khipus were translated from Excel format to SQL.

After six weeks of work, I was able to reconstruct 77 of these 82 malformed khipus by splicing together two sources: a publicly available set of excerpted Excel files from Jon Clindaniel’s Ph.D. thesis, which contain only knot and cord information, and the validly placed (but otherwise incorrect) pendants, clusters, and primary cords from the SQL database. That work is documented here.

For each recoverable khipu, the SQL khipu is stripped down to its pendants, and then a new structure of knots and subsidiary cords is grafted onto that from the Excel spreadsheet for that khipu. This is all done by the utility class ExcelKhipu, which updates all the CSV pandas files with the new information.

Ultimately, these files have been saved in KFG Excel format, and we will reconstruct these files later when we read in pure KFG Excel files.

Code
print("Khipus that were recoverable using Jon Clindaniel's Thesis files.")
# Recoverable Khipus
recoverable_khipu_rep = ku.multiline(", ".join(kq.recoverable_khipus()))
print(f" {recoverable_khipu_rep}")
Khipus that were recoverable using Jon Clindaniel's Thesis files.
 HP009, HP033, HP034, HP036, HP037, HP038, HP039, HP040, HP041, HP042, HP043, 
 HP044, HP046 A, HP046 B, HP047, HP051 A, HP053, HP054, HP057, JC001, JC002, 
 JC003, JC004, JC005, JC006, JC007, JC008, JC009, JC010, JC011, JC012, JC013, 
 JC014, JC015, JC016, JC017, JC018, JC019, JC020, JC021, JC022, JC023, UR196, 
 UR206, UR209, UR251, UR252, UR254, UR257, UR258, UR259, UR260, UR261, UR262, 
 UR263, UR266, UR267A, UR267B, UR268, UR269, UR270, UR271, UR272, UR273A, UR273B, 
 UR274A, UR275, UR276, UR277, UR278, UR279, UR281, UR284, UR288, UR293
Code
#%%capture cell_output.txt
# Run as an offline process due to namespace issues
# import excel_khipu
# excel_khipu.reconcile_recoverable_khipus()
#os.system(f"cd {kq.project_directory()}/code/classes;python excel_khipu.py")
# kamayuq.rebuild_khipu_OODB()
# kq.build_KFG_qollqa()

# (khipu_dict, all_khipus) = kamayuq.fetch_khipus()
# kq.backup_qollqa(f"CSV_BASECAMP_3_{len(all_khipus)}")

After that offline process, let’s continue…

Code
kq.load_KFG_DB()

khipu_df = kq.khipu_df
primary_cord_df = kq.primary_cord_df
cord_cluster_df = kq.cord_cluster_df
cord_df = kq.cord_df
ascher_cord_color_df = kq.ascher_cord_color_df
knot_cluster_df = kq.knot_cluster_df
knot_df = kq.knot_df

5. Delete Orphaned Data

We now have lots of orphaned data. Let’s remove it.
  1. First delete all known bad khipus
  2. Then delete orphaned primary cords.
  3. Then delete clusters with no cords.
  4. Then delete ascher colors with no cords.
  5. Then delete orphaned knot clusters.
  6. Then delete orphaned knots.
Code
# Make sure all khipu that are in deleted_khipus_df are deleted in khipu_df
deleted_khipus_df = pd.read_csv(f"{CSV_dir}/deleted_khipus.csv")
khipu_df = khipu_df[~ khipu_df.investigator_num.isin(deleted_khipus_df.name)]

# Delete orphaned primary cords
primary_cord_df = primary_cord_df[primary_cord_df.khipu_id.isin(khipu_df.khipu_id)]

#Remove cords and cord_clusters that have no khipus associated with them as a result of all this deletion
cord_cluster_df = cord_cluster_df[cord_cluster_df.khipu_id.isin(khipu_df.khipu_id.values)]
cord_df = cord_df[cord_df.khipu_id.isin(khipu_df.khipu_id.values)]

# Remove any ascher colors that have no cords associated with them
# Note that Ascher cord colors also point to primary cords (see pcord_flag)
print(f"Before: ascher_cord_color_df.shape = {ascher_cord_color_df.shape}")
valid_cord_color_ids = list(set(cord_df.cord_id.values) | set(primary_cord_df.pcord_id.values))
ascher_cord_color_df = ascher_cord_color_df[ascher_cord_color_df.cord_id.isin(valid_cord_color_ids)]

# Remove knot clusters and knots from previously eliminated khipu. 
print(f"Before: knot_cluster_df.shape = {knot_cluster_df.shape}")
knot_cluster_df = knot_cluster_df[knot_cluster_df.cord_id.isin(cord_df.cord_id.values)]
print(f"After: knot_cluster_df.shape = {knot_cluster_df.shape}")

# Remove orphaned knots
knot_df = knot_df[knot_df.cord_id.isin(cord_df.cord_id)]
Before: ascher_cord_color_df.shape = (57341, 28)
Before: knot_cluster_df.shape = (59506, 12)
After: knot_cluster_df.shape = (33116, 12)

6. Ascher Cord Colors

Many of the colors in the khipus are malformed. For example, YB:W and W:YB are the same cord color (mottled) but are listed as separate colors. So I normalize them so that the color codes are always sorted by grey-scale value. Similarly, I recode badly formed Ascher cord colors such as W**BS to W:BS.
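
Both ideas fit in a few lines. The sketch below is self-contained and deliberately simplified: the regular-expression rule is a swapped-in shortcut that only handles the starred codes (it is not how khipu_cord_color works), the grey-scale numbers are invented, and both function names are hypothetical.

```python
import re

# Invented grey-scale values (higher = lighter); the real ones live in khipu_cord_color.
GREY_VALUE = {"W": 255, "YB": 170, "MB": 110, "BS": 60}


def tidy_starred_code(raw):
    """Collapse runs of '*' or '_' into ':' and trim stray separators, e.g. W**BS -> W:BS."""
    return re.sub(r"[*_]+", ":", raw).strip(":")


def normalize_mottled(descriptor):
    """Sort the codes of a mottled (colon-separated) descriptor, lightest first."""
    codes = [code for code in descriptor.split(":") if code]
    ordered = sorted(codes, key=lambda code: (GREY_VALUE[code], code), reverse=True)
    return ":".join(ordered)


print(tidy_starred_code("W**BS"))
print(normalize_mottled("YB:W"), normalize_mottled("W:YB"))
print(normalize_mottled(tidy_starred_code("BS*W*")))
```

Sorting on the pair (grey value, code) keeps the order stable even when two codes happen to share a grey value.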

Code
import khipu_cord_color as kcc

# A sampling of bogus/transformed color descriptors
bogus_mixed_color_types = {'-':'PK', '-MB:AB':'MB:AB', 'BY':'YB', 'GGW':'W:GG', 'AB:AB-GG':'AB-GG', 'MB:MB-W':'MB-W', 'W-MB-MB':'W-MB', 'AB-AB-GG-GG-MB':'AB-GG-MB',
                           'W**SR':'W:SR', 'W**BS':'W:BS', 'W*BS':'W:BS', 'W**VB':'W:VB', 'GR***':'GR', 'KB**SY':'KB:SY', 'W-AB-AB-MB':'W-AB-MB', 'MB-MB-TG-TG':'MB-TG', 'RB(0-0)AB:MB(0-0)':'RB:AB:MB',
                           'W*BS*':'W:BS', 'W**SR':'W:SR', 'W-MB-MB':'W-MB', 'W**BS':'W:BS', 'BS*KB*':'BS:KB', 'ABW':'AB:W', 'MB-MB:W':'MB:W', 'AB-AB-MB':'AB-MB', 
                           'BS*DB*':'BS:DB', 'DB*VB*':'DB:VB', 'DB*0G*':'DB:0G', 'DB*KB*':'DB:KB', 'DB*SR*':'DB:SR', 'AB-AB-KB':'AB-KB', 'BS*_SY*':'BS:SY', '*':'PK', 
                           'W-W-AB-KB':'W-AB-KB', 'AB-AB-KB-KB':'AB-KB', 'W-W-AB-MB':'W-AB-MB', 'W-W-KB-KB':'W-KB', 'MBABGG':'MB:AB:GG', 'AB-MB-MB':'AB-MB', 'W**VB*':'W:VB', 
                           'GR***':'GR', 'W-AB-AB-MB':'W-AB-MB', 'KB**SY':'KB:SY', 'W-DB-DB':'W-DB', '#NAME?':'PK', 'AB-AB-CB':'AB-CB', 'MB-MB-TG-TG':'MB-TG', 
                           'W-AB-AB-GG':'W-AB-GG', 'AB-AB-CB-CB':'AB-CB', 'LG-AB-AB':'LG-AB', 'AB-GG-GG-KB-KB':'AB-GG-KB', 'W-W-AB-AB':'W-AB', 'BS*SY*':'BS:SY',
                           'KB*SR*':'KB:SR', 'W*0G*':'W:0G', 'W*SR*':'W:SR', 'W*SY*':'W:SY', 'W*VB*':'W:VB', 'W*DB*':'W:DB', 'KB*W*':'KB:W', 'W*KB*':'W:KB', 'BS*VB*':'BS:VB',
                           'BS*0G*':'BS:0G', 'BS*SR*':'BS:SR', 'DB*SY*':'DB:SY', 'FB*W*':'FB:W', 'DB*BS*':'DB:BS', 'KB*BS*':'KB:BS', 'W*':'W', 'W**DB*':'W:DB', 'W*FB*':'W:FB',
                           'W*KB**':'W:KB', 'W*SR**':'W:SR', 'DB*GR*':'DB:GR', 'BS*GR*':'BS:GR', 'DB*BS**':'DB:BS', 'B***':'B', 'DB*_W*':'DB:W', 'DB*W*':'DB:W','DB*':'DB',
                           'W-W-KB':'W-KB', 'GLSRYBMG':'GL:SR:YB:MG', 'GLSRYBWMG':'GL:SR:YB:W:MG', 'W-AB-MB-MB': 'W-AB-MB', 'WGSRMG':'W:G:SR:MG',
                           'W-W-MB-HB-HB':'W-MB-HB', 'AB-AB-BG-KB':'AB-BG-KB', 'KBW':'KB:W', 'MBAB':'MB:AB', 'YB-YB-FB':'YB-FB', 
                           'W-W-MB-MB':'W-MB', 'W*_DB':'W:DB', 'W**DB':'W:DB', 'W:W:GG:KB':'W:GG:KB', 'ABKB':'AB:KB', 'MBCB':'MB:CB', 'RL-RL-FR':'RL-FR', 'W-AB-AB':'W-AB',
                           'AB-AB-HB-HB':'AB-AB-HB', 'GG-GG-MB-MB':'GG-MB', 'W-W-AB':'W-AB', 'W-MB-MB:W':'W-MB:W', 'AB-AB-MB-MB':'AB-MB', 'MB(0-0)RB(0-0)MB:RB(0-0)':'MB:RB:MB',
                           'FR(0-0)AB:MB(0-0)':'FR:AB:MB', 'FR:AB(0-0)AB-MB(0-0)':'FR:AB-MB', 'AB:LG(0-0)MB(0-0)':'AB:LG:MB', 'MB:AB(0-0)MB(0-0)':'MB:AB', 'BDW':'BD:W',
                           'AB:YB(0-0)AB(0-0)MB:AB(0-0)':'AB:YB:MB', 'BY(0-0)KB:BY(0-0)BY(0-0)':'BY:KB', 'RB:AB(0-0)AB(0-0)':'RB:AB', 'MB:W(0-9)W(9-41.5)':'MB:W',
                           'W:':'W', ':-W':'W', ':W-':'W', ':W-W':'W', 'W***':'W', 'W***':'W', }
    

# An illustration of the correction:
def well_formed_color(colorcode_descriptor):
    colorcode_descriptor = kcc.fix_color_code(colorcode_descriptor) # Transform bogus colors
    (pattern, color_codes, rgbcolors) = kcc.parse_color_code(colorcode_descriptor)
    
    # Make it so that MB:W and W:MB are the same (W:MB)
    if pattern == 'barberpole': 
        colorcode_descriptor = "-".join(sorted(color_codes, key=lambda x: (kcc.color_code_to_grey_value(x),x), reverse=True))
    elif pattern == 'mottled': 
        colorcode_descriptor = ":".join(sorted(color_codes, key=lambda x: (kcc.color_code_to_grey_value(x),x), reverse=True))
    elif pattern == 'striped': 
        colorcode_descriptor = "%".join(sorted(color_codes, key=lambda x: (kcc.color_code_to_grey_value(x),x), reverse=True))
        
    return colorcode_descriptor

# Update the Ascher Cord Color Dataframe with well-formed colors
well_formed_colors = [kcc.well_formed_color(x) for x in list(ascher_cord_color_df.full_color.values)]
ascher_cord_color_df['full_color'] = well_formed_colors
/var/folders/ky/m3w1tm2d79v8dsqhx97f7hsm0000gn/T/ipykernel_48475/390068618.py:40: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  ascher_cord_color_df['full_color'] = well_formed_colors
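
The SettingWithCopyWarning above appears because ascher_cord_color_df was produced earlier by boolean filtering, so pandas cannot tell whether the assignment is meant for the original frame. A minimal sketch of one common way to avoid it, shown on a made-up three-row frame, is to take an explicit copy right after filtering:

```python
import pandas as pd

# Made-up stand-in for the Ascher cord color table.
colors_df = pd.DataFrame({"cord_id": [1, 2, 3],
                          "full_color": ["YB:W", "W:YB", "MB"]})

# .copy() makes the filtered frame an independent object,
# so assigning a column to it is unambiguous and raises no warning.
kept_df = colors_df[colors_df.cord_id.isin([1, 2])].copy()
kept_df["full_color"] = ["W:YB", "W:YB"]

print(kept_df)
print(colors_df)  # the original frame is untouched
```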

Once again, save the cleaned DataFrames and rebuild.

Code
%%capture cell_output.txt
kq.save_khipu_df(khipu_df)
kq.save_primary_cord_df(primary_cord_df)
kq.save_cord_cluster_df(cord_cluster_df)
kq.save_cord_df(cord_df)
kq.save_ascher_cord_color_df(ascher_cord_color_df)
kq.save_knot_cluster_df(knot_cluster_df)
kq.save_knot_df(knot_df)

kq.load_KFG_DB()
kamayuq.rebuild_khipu_OODB()
kq.build_KFG_qollqa()
all_khipus = [aKhipu for aKhipu in kamayuq.fetch_all_khipus(clean_build=True).values()]
kq.backup_qollqa(f"CSV_BASECAMP_4_{len(all_khipus)}")

7. Importing Additional Khipus

7.1 Using the KFG Excel format

The Khipu Field Guide has its own khipu Excel file format - a six-sheet Excel workbook that documents a khipu. Using this format, khipus from external sources can be translated into Excel and imported. These currently include:
  • 1 khipu reconstructed from Carol Mackey’s thesis (CM009)
  • 2 new khipus from Sabine Hyland (SH001, SH002)
  • 9 new khipus from Marcia and Robert Ascher’s journal article Numbers and Relations from Ancient Andean Quipus. Using the source from the original article I have imported their 9 khipus into the Khipu Field Guide (KFG). That work is documented here. The Ascher_Excel_Book class does all the heavy lifting and produces, after many transformations, a set of standardized KFG Excel Khipu files.
  • 22 new khipus via spreadsheets from Manuel Medrano
  • 22 new khipus via spreadsheets from Kylie Quave to completely replace malformed khipus originally referenced from the Harvard KDB
  • 42 khipus, rebuilt by hand by Ashok Khosla, via spreadsheets, from multiple sources, that completely replace malformed Pereyra and Urton khipus.
  • 77 khipus, rebuilt computationally by Ashok Khosla, via spreadsheets, from multiple sources, that completely replace malformed KDB SQL khipus, using information from Jon Clindaniel’s Ph.D. thesis.

First let’s list the khipus to import:

Code
working_directory = f"{kq.project_directory()}/data/XLS/ADD_XLSX"
khipu_names = sorted(ku.basename_glob(r'[A-Z].*\.xlsx', working_directory))
khipu_rep = ku.multiline(", ".join([ku.basename(name) for name in khipu_names]))
print(f"Processing {len(khipu_names)} khipus: {khipu_rep}")
Processing 253 khipus: AS001, AS002, AS003, AS004, AS005, AS006, AS007, AS008, AS009, AS012, AS014, 
 AS016, AS019, AS024, AS027, AS028, AS042, AS044, AS045, AS063, AS069, AS071, 
 AS072, AS073, AS074, AS080, AS090_N2, AS094, AS156, AS170, AS182B, AS187, 
 AS207B, AS209, CM009, HP009, HP016, HP017, HP027, HP033, HP034, HP037, HP038, 
 HP039, HP040, HP042, HP043, HP044, HP045, HP046_A, HP046_B, HP047, HP048, 
 HP051_A, HP053, HP054, HP055, HP057, JC001, JC002, JC003, JC004, JC005, JC006, 
 JC007, JC008, JC009, JC010, JC011, JC012, JC013, JC014, JC015, JC016, JC017, 
 JC018, JC019, JC020, JC021, JC022, JC023, KH0001, KH0032, KH0033, KH0049, 
 KH0057, KH0058, KH0067, KH0080, KH0081, KH0083, KH0197, KH0227, KH0267, KH0350, 
 MM001, MM002, MM003, MM004, MM005, MM006_AN001, MM007_AN002, MM008, MM009, 
 MM010, MM011, MM012, MM013, MM014, MM015, MM016, MM017, MM018, MM019, MM020, 
 MM021, MM022, MM1086, QU001, QU002, QU003, QU004, QU005, QU006, QU007, QU008, 
 QU009, QU010, QU011, QU012, QU013, QU014, QU015, QU016, QU017, QU018, QU019, 
 QU020, QU021, QU022, SH001, SH002, UR001, UR002, UR003, UR004, UR011, UR012, 
 UR014, UR015, UR017, UR018, UR022, UR026, UR032, UR039, UR044, UR047, UR049, 
 UR050, UR052, UR053A, UR054, UR055, UR059, UR063, UR074, UR085, UR088, UR089, 
 UR1034, UR104, UR1057, UR108, UR1095, UR1098, UR110, UR112, UR113, UR1131, 
 UR1138, UR1149, UR118, UR119, UR122, UR130, UR131A, UR138, UR141, UR143, UR144, 
 UR146, UR147, UR149, UR153, UR154, UR155, UR164, UR165, UR166, UR167, UR169, 
 UR177, UR184, UR188, UR190, UR191, UR193, UR196, UR198, UR201, UR206, UR209, 
 UR211, UR221, UR231, UR237, UR251, UR252, UR253, UR254, UR255, UR256, UR257, 
 UR258, UR259, UR260, UR261, UR262, UR263, UR264, UR266, UR267A, UR267B, UR268, 
 UR269, UR270, UR271, UR272, UR273A, UR273B, UR274A, UR275, UR276, UR277, UR278, 
 UR279, UR280, UR284, UR288, UR291A, UR292A, UR293
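
With a listing this long, a quick tally by name prefix shows how many files each source contributes. The following is a minimal sketch using only the standard library. `prefix_counts` and `sample_names` are illustrative names, not part of the KFG code, and the sample is just a handful of names copied from the output above.

```python
import re
from collections import Counter

def prefix_counts(names):
    """Count khipu names by their leading alphabetic prefix (e.g. 'UR' in 'UR053A')."""
    counts = Counter()
    for name in names:
        match = re.match(r"[A-Z]+", name)
        if match:
            counts[match.group(0)] += 1
    return counts

# A few names copied from the listing above (not the full set of files).
sample_names = ["AS001", "AS090_N2", "HP046_A", "JC001", "KH0001",
                "MM006_AN001", "SH001", "SH002", "UR053A"]

counts = prefix_counts(sample_names)
for prefix, count in sorted(counts.items()):
    print(f"{prefix}: {count}")
```

Run against the real `khipu_names` list (after stripping directories and extensions), the same function would give a per-source count to compare with the totals expected from each contributor.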

And then import them:

Code
%%capture cell_output.txt
import warnings
warnings.simplefilter("ignore")

import kfg_excel_reader
khipu_builder = kfg_excel_reader.KFG_Excel_Reader(working_directory, khipu_names, base_id=6000000, publish=True, run_silent=True)

kamayuq.rebuild_khipu_OODB()
kq.build_KFG_qollqa()
all_khipus = [aKhipu for aKhipu in kamayuq.fetch_all_khipus(clean_build=True).values()]
KFG_Excel_Reader: Importing AS002.xlsx
KFG_Excel_Reader: Importing AS003.xlsx
KFG_Excel_Reader: Importing AS004.xlsx
KFG_Excel_Reader: Importing AS005.xlsx
KFG_Excel_Reader: Importing AS006.xlsx
KFG_Excel_Reader: Importing AS007.xlsx
KFG_Excel_Reader: Importing AS008.xlsx
KFG_Excel_Reader: Importing AS009.xlsx
KFG_Excel_Reader: Importing AS012.xlsx
KFG_Excel_Reader: Importing AS014.xlsx
KFG_Excel_Reader: Importing AS016.xlsx
KFG_Excel_Reader: Importing AS019.xlsx
KFG_Excel_Reader: Importing AS024.xlsx
KFG_Excel_Reader: Importing AS027.xlsx
KFG_Excel_Reader: Importing AS028.xlsx
KFG_Excel_Reader: Importing AS042.xlsx
KFG_Excel_Reader: Importing AS044.xlsx
KFG_Excel_Reader: Importing AS045.xlsx
KFG_Excel_Reader: Importing AS063.xlsx
KFG_Excel_Reader: Importing AS069.xlsx
KFG_Excel_Reader: Importing AS071.xlsx
KFG_Excel_Reader: Importing AS072.xlsx
KFG_Excel_Reader: Importing AS073.xlsx
KFG_Excel_Reader: Importing AS074.xlsx
KFG_Excel_Reader: Importing AS080.xlsx
KFG_Excel_Reader: Importing AS090_N2.xlsx
KFG_Excel_Reader: Importing AS094.xlsx
KFG_Excel_Reader: Importing AS156.xlsx
KFG_Excel_Reader: Importing AS170.xlsx
KFG_Excel_Reader: Importing AS182B.xlsx
KFG_Excel_Reader: Importing AS187.xlsx
KFG_Excel_Reader: Importing AS207B.xlsx
KFG_Excel_Reader: Importing AS209.xlsx
KFG_Excel_Reader: Importing CM009.xlsx
KFG_Excel_Reader: Importing HP009.xlsx
KFG_Excel_Reader: Importing HP016.xlsx
KFG_Excel_Reader: Importing HP017.xlsx
KFG_Excel_Reader: Importing HP027.xlsx
KFG_Excel_Reader: Importing HP033.xlsx
KFG_Excel_Reader: Importing HP034.xlsx
KFG_Excel_Reader: Importing HP037.xlsx
KFG_Excel_Reader: Importing HP038.xlsx
KFG_Excel_Reader: Importing HP039.xlsx
KFG_Excel_Reader: Importing HP040.xlsx
KFG_Excel_Reader: Importing HP042.xlsx
KFG_Excel_Reader: Importing HP043.xlsx
KFG_Excel_Reader: Importing HP044.xlsx
KFG_Excel_Reader: Importing HP045.xlsx
KFG_Excel_Reader: Importing HP046_A.xlsx
KFG_Excel_Reader: Importing HP046_B.xlsx
KFG_Excel_Reader: Importing HP047.xlsx
KFG_Excel_Reader: Importing HP048.xlsx
KFG_Excel_Reader: Importing HP051_A.xlsx
KFG_Excel_Reader: Importing HP053.xlsx
KFG_Excel_Reader: Importing HP054.xlsx
KFG_Excel_Reader: Importing HP055.xlsx
KFG_Excel_Reader: Importing HP057.xlsx
KFG_Excel_Reader: Importing JC001.xlsx
KFG_Excel_Reader: Importing JC002.xlsx
KFG_Excel_Reader: Importing JC003.xlsx
KFG_Excel_Reader: Importing JC004.xlsx
KFG_Excel_Reader: Importing JC005.xlsx
KFG_Excel_Reader: Importing JC006.xlsx
KFG_Excel_Reader: Importing JC007.xlsx
KFG_Excel_Reader: Importing JC008.xlsx
KFG_Excel_Reader: Importing JC009.xlsx
KFG_Excel_Reader: Importing JC010.xlsx
KFG_Excel_Reader: Importing JC011.xlsx
KFG_Excel_Reader: Importing JC012.xlsx
KFG_Excel_Reader: Importing JC013.xlsx
KFG_Excel_Reader: Importing JC014.xlsx
KFG_Excel_Reader: Importing JC015.xlsx
KFG_Excel_Reader: Importing JC016.xlsx
KFG_Excel_Reader: Importing JC017.xlsx
KFG_Excel_Reader: Importing JC018.xlsx
KFG_Excel_Reader: Importing JC019.xlsx
KFG_Excel_Reader: Importing JC020.xlsx
KFG_Excel_Reader: Importing JC021.xlsx
KFG_Excel_Reader: Importing JC022.xlsx
KFG_Excel_Reader: Importing JC023.xlsx
KFG_Excel_Reader: Importing KH0001.xlsx
KFG_Excel_Reader: Importing KH0032.xlsx
KFG_Excel_Reader: Importing KH0033.xlsx
KFG_Excel_Reader: Importing KH0049.xlsx
KFG_Excel_Reader: Importing KH0057.xlsx
KFG_Excel_Reader: Importing KH0058.xlsx
KFG_Excel_Reader: Importing KH0067.xlsx
KFG_Excel_Reader: Importing KH0080.xlsx
KFG_Excel_Reader: Importing KH0081.xlsx
KFG_Excel_Reader: Importing KH0083.xlsx
KFG_Excel_Reader: Importing KH0197.xlsx
KFG_Excel_Reader: Importing KH0227.xlsx
KFG_Excel_Reader: Importing KH0267.xlsx
KFG_Excel_Reader: Importing KH0350.xlsx
KFG_Excel_Reader: Importing MM001.xlsx
KFG_Excel_Reader: Importing MM002.xlsx
KFG_Excel_Reader: Importing MM003.xlsx
KFG_Excel_Reader: Importing MM004.xlsx
KFG_Excel_Reader: Importing MM005.xlsx
KFG_Excel_Reader: Importing MM006_AN001.xlsx
KFG_Excel_Reader: Importing MM007_AN002.xlsx
KFG_Excel_Reader: Importing MM008.xlsx
KFG_Excel_Reader: Importing MM009.xlsx
KFG_Excel_Reader: Importing MM010.xlsx
KFG_Excel_Reader: Importing MM011.xlsx
KFG_Excel_Reader: Importing MM012.xlsx
KFG_Excel_Reader: Importing MM013.xlsx
KFG_Excel_Reader: Importing MM014.xlsx
KFG_Excel_Reader: Importing MM015.xlsx
KFG_Excel_Reader: Importing MM016.xlsx
KFG_Excel_Reader: Importing MM017.xlsx
KFG_Excel_Reader: Importing MM018.xlsx
KFG_Excel_Reader: Importing MM019.xlsx
KFG_Excel_Reader: Importing MM020.xlsx
KFG_Excel_Reader: Importing MM021.xlsx
KFG_Excel_Reader: Importing MM022.xlsx
KFG_Excel_Reader: Importing MM1086.xlsx
KFG_Excel_Reader: Importing QU001.xlsx
KFG_Excel_Reader: Importing QU002.xlsx
KFG_Excel_Reader: Importing QU003.xlsx
KFG_Excel_Reader: Importing QU004.xlsx
KFG_Excel_Reader: Importing QU005.xlsx
KFG_Excel_Reader: Importing QU006.xlsx
KFG_Excel_Reader: Importing QU007.xlsx
KFG_Excel_Reader: Importing QU008.xlsx
KFG_Excel_Reader: Importing QU009.xlsx
KFG_Excel_Reader: Importing QU010.xlsx
KFG_Excel_Reader: Importing QU011.xlsx
KFG_Excel_Reader: Importing QU012.xlsx
KFG_Excel_Reader: Importing QU013.xlsx
KFG_Excel_Reader: Importing QU014.xlsx
KFG_Excel_Reader: Importing QU015.xlsx
KFG_Excel_Reader: Importing QU016.xlsx
KFG_Excel_Reader: Importing QU017.xlsx
KFG_Excel_Reader: Importing QU018.xlsx
KFG_Excel_Reader: Importing QU019.xlsx
KFG_Excel_Reader: Importing QU020.xlsx
KFG_Excel_Reader: Importing QU021.xlsx
KFG_Excel_Reader: Importing QU022.xlsx
KFG_Excel_Reader: Importing SH001.xlsx
KFG_Excel_Reader: Importing SH002.xlsx
KFG_Excel_Reader: Importing UR001.xlsx
KFG_Excel_Reader: Importing UR002.xlsx
KFG_Excel_Reader: Importing UR003.xlsx
KFG_Excel_Reader: Importing UR004.xlsx
KFG_Excel_Reader: Importing UR011.xlsx
KFG_Excel_Reader: Importing UR012.xlsx
KFG_Excel_Reader: Importing UR014.xlsx
KFG_Excel_Reader: Importing UR015.xlsx
KFG_Excel_Reader: Importing UR017.xlsx
KFG_Excel_Reader: Importing UR018.xlsx
KFG_Excel_Reader: Importing UR022.xlsx
KFG_Excel_Reader: Importing UR026.xlsx
KFG_Excel_Reader: Importing UR032.xlsx
KFG_Excel_Reader: Importing UR039.xlsx
KFG_Excel_Reader: Importing UR044.xlsx
KFG_Excel_Reader: Importing UR047.xlsx
KFG_Excel_Reader: Importing UR049.xlsx
KFG_Excel_Reader: Importing UR050.xlsx
KFG_Excel_Reader: Importing UR052.xlsx
KFG_Excel_Reader: Importing UR053A.xlsx
KFG_Excel_Reader: Importing UR054.xlsx
KFG_Excel_Reader: Importing UR055.xlsx
KFG_Excel_Reader: Importing UR059.xlsx
KFG_Excel_Reader: Importing UR063.xlsx
KFG_Excel_Reader: Importing UR074.xlsx
KFG_Excel_Reader: Importing UR085.xlsx
KFG_Excel_Reader: Importing UR088.xlsx
KFG_Excel_Reader: Importing UR089.xlsx
KFG_Excel_Reader: Importing UR1034.xlsx
KFG_Excel_Reader: Importing UR104.xlsx
KFG_Excel_Reader: Importing UR1057.xlsx
KFG_Excel_Reader: Importing UR108.xlsx
KFG_Excel_Reader: Importing UR1095.xlsx
KFG_Excel_Reader: Importing UR1098.xlsx
KFG_Excel_Reader: Importing UR110.xlsx
KFG_Excel_Reader: Importing UR112.xlsx
KFG_Excel_Reader: Importing UR113.xlsx
KFG_Excel_Reader: Importing UR1131.xlsx
KFG_Excel_Reader: Importing UR1138.xlsx
KFG_Excel_Reader: Importing UR1149.xlsx
KFG_Excel_Reader: Importing UR118.xlsx
KFG_Excel_Reader: Importing UR119.xlsx
KFG_Excel_Reader: Importing UR122.xlsx
KFG_Excel_Reader: Importing UR130.xlsx
KFG_Excel_Reader: Importing UR131A.xlsx
KFG_Excel_Reader: Importing UR138.xlsx
KFG_Excel_Reader: Importing UR141.xlsx
KFG_Excel_Reader: Importing UR143.xlsx
KFG_Excel_Reader: Importing UR144.xlsx
KFG_Excel_Reader: Importing UR146.xlsx
KFG_Excel_Reader: Importing UR147.xlsx
KFG_Excel_Reader: Importing UR149.xlsx
KFG_Excel_Reader: Importing UR153.xlsx
KFG_Excel_Reader: Importing UR154.xlsx
KFG_Excel_Reader: Importing UR155.xlsx
KFG_Excel_Reader: Importing UR164.xlsx
KFG_Excel_Reader: Importing UR165.xlsx
KFG_Excel_Reader: Importing UR166.xlsx
KFG_Excel_Reader: Importing UR167.xlsx
KFG_Excel_Reader: Importing UR169.xlsx
KFG_Excel_Reader: Importing UR177.xlsx
KFG_Excel_Reader: Importing UR184.xlsx
KFG_Excel_Reader: Importing UR188.xlsx
KFG_Excel_Reader: Importing UR190.xlsx
KFG_Excel_Reader: Importing UR191.xlsx
KFG_Excel_Reader: Importing UR193.xlsx
KFG_Excel_Reader: Importing UR196.xlsx
KFG_Excel_Reader: Importing UR198.xlsx
KFG_Excel_Reader: Importing UR201.xlsx
KFG_Excel_Reader: Importing UR206.xlsx
KFG_Excel_Reader: Importing UR209.xlsx
KFG_Excel_Reader: Importing UR211.xlsx
KFG_Excel_Reader: Importing UR221.xlsx
KFG_Excel_Reader: Importing UR231.xlsx
KFG_Excel_Reader: Importing UR237.xlsx
KFG_Excel_Reader: Importing UR251.xlsx
KFG_Excel_Reader: Importing UR252.xlsx
KFG_Excel_Reader: Importing UR253.xlsx
KFG_Excel_Reader: Importing UR254.xlsx
KFG_Excel_Reader: Importing UR255.xlsx
KFG_Excel_Reader: Importing UR256.xlsx
KFG_Excel_Reader: Importing UR257.xlsx
KFG_Excel_Reader: Importing UR258.xlsx
KFG_Excel_Reader: Importing UR259.xlsx
KFG_Excel_Reader: Importing UR260.xlsx
KFG_Excel_Reader: Importing UR261.xlsx
KFG_Excel_Reader: Importing UR262.xlsx
KFG_Excel_Reader: Importing UR263.xlsx
KFG_Excel_Reader: Importing UR264.xlsx
KFG_Excel_Reader: Importing UR266.xlsx
KFG_Excel_Reader: Importing UR267A.xlsx
KFG_Excel_Reader: Importing UR267B.xlsx
KFG_Excel_Reader: Importing UR268.xlsx
KFG_Excel_Reader: Importing UR269.xlsx
KFG_Excel_Reader: Importing UR270.xlsx
KFG_Excel_Reader: Importing UR271.xlsx
KFG_Excel_Reader: Importing UR272.xlsx
KFG_Excel_Reader: Importing UR273A.xlsx
KFG_Excel_Reader: Importing UR273B.xlsx
KFG_Excel_Reader: Importing UR274A.xlsx
KFG_Excel_Reader: Importing UR275.xlsx
KFG_Excel_Reader: Importing UR276.xlsx
KFG_Excel_Reader: Importing UR277.xlsx
KFG_Excel_Reader: Importing UR278.xlsx
KFG_Excel_Reader: Importing UR279.xlsx
KFG_Excel_Reader: Importing UR280.xlsx
KFG_Excel_Reader: Importing UR284.xlsx
KFG_Excel_Reader: Importing UR288.xlsx
KFG_Excel_Reader: Importing UR291A.xlsx
KFG_Excel_Reader: Importing UR292A.xlsx
KFG_Excel_Reader: Importing UR293.xlsx
So as to not blotto hand-edited previous files, copy Text files by hand if needed!
Starting fetch:
0: 1000166
25: 1000361
50: 1000231
75: 1000058
100: 1000070
125: 1000423
150: 1000564
175: 1000303
200: 1000298
225: 1000336
250: 1000208
275: 1000053
300: 1000122
325: 1000322
350: 1000416
375: 1000503
400: 1000536
425: 1000649
450: 6000017
475: 6000042
500: 6000067
525: 6000092
550: 6000117
575: 6000142
600: 6000167
625: 6000192
650: 6000217
675: 6000242
That took 224 seconds --- 3.7 minutes
Made 654 khipus
(654, 49)
Made: khipu_summary: (654, 49)
0: 1000166
25: 1000361
50: 1000231
75: 1000058
100: 1000070
125: 1000423
150: 1000564
175: 1000303
200: 1000298
225: 1000336
250: 1000208
275: 1000053
300: 1000122
325: 1000322
350: 1000416
375: 1000503
400: 1000536
425: 1000649
450: 6000017
475: 6000042
500: 6000067
525: 6000092
550: 6000117
575: 6000142
600: 6000167
625: 6000192
650: 6000217
675: 6000242

8. Updated Museum Numbers and Provenance

The Open Khipu Repository (OKR) has updated museum numbers and provenance for several khipus. Let’s use their inventory of changes to update the KFG database:

Code
museum_num_updates = [('KH0120', 'VA24370(A)'), ('KH0121', 'VA24370(B)'), ('KH0142', 'VA63042(A)'), ('KH0143', 'VA63042(B)'),
('KH0189', 'VA16145(A)'), ('KH0190', 'VA16145(B)'), ('KH0193', 'VA37859(A)'), ('KH0194', 'VA37859(B)'), ('KH0197', 'VA66832'),
('KH0264', 'TM 4/5446'), ('KH0265', 'TM 4/5446'), ('KH0273', '32.30.30/53(A)'), ('KH0348', '1924.18.0001'), ('KH0349', '1931.37.0001'),
('KH0437', 'VA42597(A)'), ('KH0438', 'VA42597(B)'), ('KH0441', 'VA47114c(A)'), ('KH0442', 'VA47114c(B)'), ('KH0443', 'VA47114c(C)'),
('KH0447', 'VA16141(A)'), ('KH0448', 'VA16141(B)'), ('KH0450', 'VA42508(A)'), ('KH0451', 'VA42508(B)'), ('KH0458', 'VA47114b'),
('KH0463', 'VA44677a(A)'), ('KH0464', 'VA44677a(B)'), ('KH0468', 'VA63038(A)'), ('KH0469', 'VA63038(B)'), ('KH0478', 'VA42607(A)'),
('KH0479', 'VA42607(B)'), ('KH0480', 'VA42607(C)'), ('KH0481', 'VA42607(D)'), ('KH0484', 'VA42578i28'), ('KH0535', 'MSP 1389/RN 43370'),
('KH0558', 'MSP 1422/RN 43403'), ('KH0567', 'MNAAHP 4202'), ('KH0587', 'MNAAHP 30564'), ('KH0588', 'B397/T41299.22'), ('KH0589', 'B376/T41299.23'),
('KH0590', 'B388/T41299.24'), ('KH0591', 'B378/T41299.25'), ('KH0592', 'B377/T41299.26'), ('KH0593', 'B384/T41299.27'), ('KH0594', 'B372/T41299.28'),
('KH0595', 'B367/T41299.29'), ('KH0596', 'B366/T41299.30'), ('KH0597', 'B374/T41299.31'), ('KH0598', 'B375/T41299.32'),
('KH0599', 'B391/T41299.20'), ('KH0600', 'B369/T41299.33.A-B'), ('KH0601', 'B399/T41299.34'), ('KH0602', 'B373/T41299.18'),
('KH0603', 'B383&B383A/T41299.35.A-B'), ('KH0604', 'B395/T41299.36'), ('KH0605', 'B382/T41299.37'), ('KH0606', 'B371/T41299.38'),
('KH0405', '41.0/1550, B/3453A')]

provenance_updates = [('KH0085', 'Rancho San Juan, Ica Valley'), ('KH0086', 'Rancho San Juan, Ica Valley')]

# Load the KFG database, then apply each update to the matching khipu row.
kq.load_KFG_DB()
khipu_df = kq.khipu_df
for (okr_name, new_museum_num) in museum_num_updates:
    # Translate the OKR name to the KFG name stored in investigator_num
    kfg_name = kq.okr_name_to_kfg_name(okr_name)
    khipu_df.loc[khipu_df.investigator_num == kfg_name, 'museum_num'] = new_museum_num
for (okr_name, new_provenance) in provenance_updates:
    kfg_name = kq.okr_name_to_kfg_name(okr_name)
    khipu_df.loc[khipu_df.investigator_num == kfg_name, 'provenance'] = new_provenance

kq.save_khipu_df(khipu_df)
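
Hand-maintained update lists are easy to mistype, so a small check before applying them can help. The sketch below is an assumption-laden illustration, not part of the KFG code: `check_updates` is a hypothetical helper that reports khipu names appearing more than once (where a later entry would silently overwrite an earlier one) and museum numbers shared by several khipus. Shared numbers can be legitimate, as with KH0264 and KH0265 above, but they are worth a second look.

```python
from collections import defaultdict

def check_updates(updates):
    """Return (duplicate_names, shared_values) for a list of (name, value) pairs."""
    seen = set()
    duplicate_names = []
    names_by_value = defaultdict(list)
    for name, value in updates:
        if name in seen:
            duplicate_names.append(name)
        seen.add(name)
        names_by_value[value].append(name)
    # Keep only values that are assigned to more than one khipu.
    shared_values = {value: names for value, names in names_by_value.items()
                     if len(names) > 1}
    return duplicate_names, shared_values

# Four entries copied from museum_num_updates above.
sample_updates = [('KH0120', 'VA24370(A)'), ('KH0121', 'VA24370(B)'),
                  ('KH0264', 'TM 4/5446'), ('KH0265', 'TM 4/5446')]

duplicate_names, shared_values = check_updates(sample_updates)
print("Repeated khipu names:", duplicate_names)
print("Museum numbers shared by several khipus:", shared_values)
```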

9. Export of Harvard KDB Khipus to Khipu FieldGuide Excel

Finally, we do a complete rebuild of the SQL database and then export each in-memory khipu to a standardized KFG Excel file. Those files serve as the basis for any further modifications.

Code
%%capture cell_output.txt 
# Final Complete rebuild.
# Refresh in-memory databases
kq.load_KFG_DB()
kamayuq.rebuild_khipu_OODB()

# Update khipu similar neighbors list
kamayuq.make_html_notebook(f"{kq.project_directory()}/khipufieldguide/notebook/Khipu_EDA.ipynb")

# Update khipu summary statistics, etc
kq.build_KFG_qollqa()

all_khipus = [aKhipu for aKhipu in kamayuq.fetch_all_khipus(clean_build=False, run_silent=True).values()]
kq.backup_qollqa(f"CSV_FINAL_{len(all_khipus)}")

# Save the final KFG database in excel format
from kfg_excel2khipu import export_all_KFG_khipus
export_all_KFG_khipus()
[NbConvertApp] Converting notebook /Users/ashokkhosla/Desktop/Khipu/fieldguide/khipufieldguide/notebook/Khipu_EDA.ipynb to notebook
[NbConvertApp] Writing 4876550 bytes to /Users/ashokkhosla/Desktop/Khipu/fieldguide/khipufieldguide/notebook/Khipu_EDA.ipynb
[NbConvertApp] Converting notebook /Users/ashokkhosla/Desktop/Khipu/fieldguide/khipufieldguide/notebook/Khipu_EDA.ipynb to html
[NbConvertApp] Writing 5013019 bytes to /Users/ashokkhosla/Desktop/Khipu/fieldguide/khipufieldguide/notebook/Khipu_EDA.html
Code
print(f"Final build - Made {len(all_khipus)} khipus")

After considerable cross-referencing of various sources, programming, and hand-work, all of the khipus referenced in the KDB collection that had been lost are now reconstructed, and an additional 50 khipus have been added! Time for a celebration! :-)

🎊 🎉 🎇 🥳 🎊 🎉 🎇 🥳 🎊 🎉 🎇 🥳

10. Nudo Desnudo

So how do you represent an unknotted knot? The authors of the Harvard database decided to make knot clusters with zero knots. Did this break the computer code until it was fixed? Absolutely! :-) Zen knots.
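
To illustrate the kind of guard a zero-knot cluster calls for, here is a minimal sketch. It does not reproduce the actual KFG fix: `KnotCluster`, `first_knot_type`, and `total_knots` are hypothetical stand-ins, and the knot type codes are only placeholders.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class KnotCluster:
    """Hypothetical stand-in for a cluster of knots on a cord."""
    knot_types: List[str] = field(default_factory=list)  # may be empty

    def num_knots(self) -> int:
        return len(self.knot_types)

    def first_knot_type(self) -> str:
        # knot_types[0] would raise IndexError on a zero-knot cluster,
        # so return an explicit empty placeholder instead.
        return self.knot_types[0] if self.knot_types else ""

def total_knots(clusters):
    """Count knots across clusters; empty clusters contribute zero."""
    return sum(cluster.num_knots() for cluster in clusters)

clusters = [KnotCluster(["S", "S", "S"]), KnotCluster([]), KnotCluster(["L"])]
knot_total = total_knots(clusters)
first_types = [cluster.first_knot_type() for cluster in clusters]
print(knot_total)
print(first_types)
```

The point is simply that any code indexing into a cluster, or taking a minimum or maximum over its knots, has to decide what an empty cluster means before it touches the data.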