Reconciling Excel Files

As mentioned in the Database Review page, approximately 80 khipus in the Harvard KDB are malformed. The root of the problem is that some khipu cords have missing clusters. I found 82 khipus that had cords with missing clusters. They appear to be subsidiary cords, or cords of an unknown attachment (maybe unattached from the main khipu). I suspect the clusters failed to appear, when a translation of the Excel format for khipus was made to SQL.

After a month of work, I was able to reconstruct 51 of these malformed 82 khipu by splicing a combination of a publicly available set of excerpted Excel files, containing only knot and cord information and the validly placed, but otherwise incorrect pendants and clusters and primary cord from the SQL database. That work is documented here.

For each recoverable khipu, the SQL khipu is stripped down to it’s pendants, and then a new structure of knots and subsidiary cords is grafted onto that from the Excel spreadsheet for that khipu. This is all done by the utility class ExcelKhipu which updates all the CSV pandas files with new information.

There are approximately 80 funky files. What percentage of these can be recovered?

Code
#all_khipu_names = kq.sorted_khipu_names()
good_khipu_names = kq.sorted_khipu_names()
funky_khipus = sorted(kq.recoverable_khipus()+kq.funky_khipus())
len(funky_khipus),print(funky_khipus)
['HP009', 'HP033', 'HP034', 'HP036', 'HP037', 'HP038', 'HP039', 'HP040', 'HP041', 'HP042', 'HP043', 'HP044', 'HP045', 'HP046 A', 'HP046 B', 'HP047', 'HP048', 'HP051 A', 'HP053', 'HP054', 'HP055', 'HP057', 'JC001', 'JC002', 'JC003', 'JC004', 'JC005', 'JC006', 'JC007', 'JC008', 'JC009', 'JC010', 'JC011', 'JC012', 'JC013', 'JC014', 'JC015', 'JC016', 'JC017', 'JC018', 'JC019', 'JC020', 'JC021', 'JC022', 'JC023', 'UR044', 'UR144', 'UR190', 'UR193', 'UR196', 'UR206', 'UR209', 'UR251', 'UR252', 'UR253', 'UR254', 'UR257', 'UR258', 'UR259', 'UR260', 'UR261', 'UR262', 'UR263', 'UR266', 'UR267A', 'UR267B', 'UR268', 'UR269', 'UR270', 'UR271', 'UR272', 'UR273A', 'UR273B', 'UR274A', 'UR275', 'UR276', 'UR277', 'UR278', 'UR279', 'UR280', 'UR281', 'UR284', 'UR288', 'UR292 A', 'UR293']
(85, None)

How many of these 85 funky khipu have Excel files that we can use for some data recovery?

Code
exceldir = f"{ku.project_directory()}/data/XLS"
def find_khipu_excel_file(name):
    clindaniel_directory = "/Users/ashokkhosla/Desktop/Khipu/research/Clindaniel/Code and Data for DASH/Data/Individual_All_Khipus"
    search_path =  [f"{clindaniel_directory}/Khipu_{name}.xlsx", f"{exceldir}/{name}.xls", f"{exceldir}/{name}.xls"]
    first_file_found = [filename for filename in search_path if os.path.exists(filename)]
    return first_file_found[0] if first_file_found else ""
                            
recoverable_files = [(name, find_khipu_excel_file(name)) for name in funky_khipus if find_khipu_excel_file(name)]
len(recoverable_files)
recoverable_files
84
[('HP009',
  '/Users/ashokkhosla/Desktop/Khipu/research/Clindaniel/Code and Data for DASH/Data/Individual_All_Khipus/Khipu_HP009.xlsx'),
 ('HP033',
  '/Users/ashokkhosla/Desktop/Khipu/research/Clindaniel/Code and Data for DASH/Data/Individual_All_Khipus/Khipu_HP033.xlsx'),
 ('HP034',
  '/Users/ashokkhosla/Desktop/Khipu/research/Clindaniel/Code and Data for DASH/Data/Individual_All_Khipus/Khipu_HP034.xlsx'),
 ('HP036',
  '/Users/ashokkhosla/Desktop/Khipu/research/Clindaniel/Code and Data for DASH/Data/Individual_All_Khipus/Khipu_HP036.xlsx'),
 ('HP037',
  '/Users/ashokkhosla/Desktop/Khipu/research/Clindaniel/Code and Data for DASH/Data/Individual_All_Khipus/Khipu_HP037.xlsx'),
 ('HP038',
  '/Users/ashokkhosla/Desktop/Khipu/research/Clindaniel/Code and Data for DASH/Data/Individual_All_Khipus/Khipu_HP038.xlsx'),
 ('HP039',
  '/Users/ashokkhosla/Desktop/Khipu/research/Clindaniel/Code and Data for DASH/Data/Individual_All_Khipus/Khipu_HP039.xlsx'),
 ('HP040',
  '/Users/ashokkhosla/Desktop/Khipu/research/Clindaniel/Code and Data for DASH/Data/Individual_All_Khipus/Khipu_HP040.xlsx'),
 ('HP041',
  '/Users/ashokkhosla/Desktop/Khipu/research/Clindaniel/Code and Data for DASH/Data/Individual_All_Khipus/Khipu_HP041.xlsx'),
 ('HP042',
  '/Users/ashokkhosla/Desktop/Khipu/research/Clindaniel/Code and Data for DASH/Data/Individual_All_Khipus/Khipu_HP042.xlsx'),
 ('HP043',
  '/Users/ashokkhosla/Desktop/Khipu/research/Clindaniel/Code and Data for DASH/Data/Individual_All_Khipus/Khipu_HP043.xlsx'),
 ('HP044',
  '/Users/ashokkhosla/Desktop/Khipu/research/Clindaniel/Code and Data for DASH/Data/Individual_All_Khipus/Khipu_HP044.xlsx'),
 ('HP045',
  '/Users/ashokkhosla/Desktop/Khipu/research/Clindaniel/Code and Data for DASH/Data/Individual_All_Khipus/Khipu_HP045.xlsx'),
 ('HP046 A',
  '/Users/ashokkhosla/Desktop/Khipu/research/Clindaniel/Code and Data for DASH/Data/Individual_All_Khipus/Khipu_HP046 A.xlsx'),
 ('HP046 B',
  '/Users/ashokkhosla/Desktop/Khipu/research/Clindaniel/Code and Data for DASH/Data/Individual_All_Khipus/Khipu_HP046 B.xlsx'),
 ('HP047',
  '/Users/ashokkhosla/Desktop/Khipu/research/Clindaniel/Code and Data for DASH/Data/Individual_All_Khipus/Khipu_HP047.xlsx'),
 ('HP048',
  '/Users/ashokkhosla/Desktop/Khipu/research/Clindaniel/Code and Data for DASH/Data/Individual_All_Khipus/Khipu_HP048.xlsx'),
 ('HP051 A',
  '/Users/ashokkhosla/Desktop/Khipu/research/Clindaniel/Code and Data for DASH/Data/Individual_All_Khipus/Khipu_HP051 A.xlsx'),
 ('HP053',
  '/Users/ashokkhosla/Desktop/Khipu/research/Clindaniel/Code and Data for DASH/Data/Individual_All_Khipus/Khipu_HP053.xlsx'),
 ('HP054',
  '/Users/ashokkhosla/Desktop/Khipu/research/Clindaniel/Code and Data for DASH/Data/Individual_All_Khipus/Khipu_HP054.xlsx'),
 ('HP055',
  '/Users/ashokkhosla/Desktop/Khipu/research/Clindaniel/Code and Data for DASH/Data/Individual_All_Khipus/Khipu_HP055.xlsx'),
 ('HP057',
  '/Users/ashokkhosla/Desktop/Khipu/research/Clindaniel/Code and Data for DASH/Data/Individual_All_Khipus/Khipu_HP057.xlsx'),
 ('JC001',
  '/Users/ashokkhosla/Desktop/Khipu/research/Clindaniel/Code and Data for DASH/Data/Individual_All_Khipus/Khipu_JC001.xlsx'),
 ('JC002',
  '/Users/ashokkhosla/Desktop/Khipu/research/Clindaniel/Code and Data for DASH/Data/Individual_All_Khipus/Khipu_JC002.xlsx'),
 ('JC003',
  '/Users/ashokkhosla/Desktop/Khipu/research/Clindaniel/Code and Data for DASH/Data/Individual_All_Khipus/Khipu_JC003.xlsx'),
 ('JC004',
  '/Users/ashokkhosla/Desktop/Khipu/research/Clindaniel/Code and Data for DASH/Data/Individual_All_Khipus/Khipu_JC004.xlsx'),
 ('JC005',
  '/Users/ashokkhosla/Desktop/Khipu/research/Clindaniel/Code and Data for DASH/Data/Individual_All_Khipus/Khipu_JC005.xlsx'),
 ('JC006',
  '/Users/ashokkhosla/Desktop/Khipu/research/Clindaniel/Code and Data for DASH/Data/Individual_All_Khipus/Khipu_JC006.xlsx'),
 ('JC007',
  '/Users/ashokkhosla/Desktop/Khipu/research/Clindaniel/Code and Data for DASH/Data/Individual_All_Khipus/Khipu_JC007.xlsx'),
 ('JC008',
  '/Users/ashokkhosla/Desktop/Khipu/research/Clindaniel/Code and Data for DASH/Data/Individual_All_Khipus/Khipu_JC008.xlsx'),
 ('JC009',
  '/Users/ashokkhosla/Desktop/Khipu/research/Clindaniel/Code and Data for DASH/Data/Individual_All_Khipus/Khipu_JC009.xlsx'),
 ('JC010',
  '/Users/ashokkhosla/Desktop/Khipu/research/Clindaniel/Code and Data for DASH/Data/Individual_All_Khipus/Khipu_JC010.xlsx'),
 ('JC011',
  '/Users/ashokkhosla/Desktop/Khipu/research/Clindaniel/Code and Data for DASH/Data/Individual_All_Khipus/Khipu_JC011.xlsx'),
 ('JC012',
  '/Users/ashokkhosla/Desktop/Khipu/research/Clindaniel/Code and Data for DASH/Data/Individual_All_Khipus/Khipu_JC012.xlsx'),
 ('JC013',
  '/Users/ashokkhosla/Desktop/Khipu/research/Clindaniel/Code and Data for DASH/Data/Individual_All_Khipus/Khipu_JC013.xlsx'),
 ('JC014',
  '/Users/ashokkhosla/Desktop/Khipu/research/Clindaniel/Code and Data for DASH/Data/Individual_All_Khipus/Khipu_JC014.xlsx'),
 ('JC015',
  '/Users/ashokkhosla/Desktop/Khipu/research/Clindaniel/Code and Data for DASH/Data/Individual_All_Khipus/Khipu_JC015.xlsx'),
 ('JC016',
  '/Users/ashokkhosla/Desktop/Khipu/research/Clindaniel/Code and Data for DASH/Data/Individual_All_Khipus/Khipu_JC016.xlsx'),
 ('JC017',
  '/Users/ashokkhosla/Desktop/Khipu/research/Clindaniel/Code and Data for DASH/Data/Individual_All_Khipus/Khipu_JC017.xlsx'),
 ('JC018',
  '/Users/ashokkhosla/Desktop/Khipu/research/Clindaniel/Code and Data for DASH/Data/Individual_All_Khipus/Khipu_JC018.xlsx'),
 ('JC019',
  '/Users/ashokkhosla/Desktop/Khipu/research/Clindaniel/Code and Data for DASH/Data/Individual_All_Khipus/Khipu_JC019.xlsx'),
 ('JC020',
  '/Users/ashokkhosla/Desktop/Khipu/research/Clindaniel/Code and Data for DASH/Data/Individual_All_Khipus/Khipu_JC020.xlsx'),
 ('JC021',
  '/Users/ashokkhosla/Desktop/Khipu/research/Clindaniel/Code and Data for DASH/Data/Individual_All_Khipus/Khipu_JC021.xlsx'),
 ('JC022',
  '/Users/ashokkhosla/Desktop/Khipu/research/Clindaniel/Code and Data for DASH/Data/Individual_All_Khipus/Khipu_JC022.xlsx'),
 ('JC023',
  '/Users/ashokkhosla/Desktop/Khipu/research/Clindaniel/Code and Data for DASH/Data/Individual_All_Khipus/Khipu_JC023.xlsx'),
 ('UR044',
  '/Users/ashokkhosla/Desktop/Khipu/research/Clindaniel/Code and Data for DASH/Data/Individual_All_Khipus/Khipu_UR044.xlsx'),
 ('UR144',
  '/Users/ashokkhosla/Desktop/Khipu/research/Clindaniel/Code and Data for DASH/Data/Individual_All_Khipus/Khipu_UR144.xlsx'),
 ('UR190',
  '/Users/ashokkhosla/Desktop/Khipu/research/Clindaniel/Code and Data for DASH/Data/Individual_All_Khipus/Khipu_UR190.xlsx'),
 ('UR193',
  '/Users/ashokkhosla/Desktop/Khipu/research/Clindaniel/Code and Data for DASH/Data/Individual_All_Khipus/Khipu_UR193.xlsx'),
 ('UR196',
  '/Users/ashokkhosla/Desktop/Khipu/research/Clindaniel/Code and Data for DASH/Data/Individual_All_Khipus/Khipu_UR196.xlsx'),
 ('UR206',
  '/Users/ashokkhosla/Desktop/Khipu/research/Clindaniel/Code and Data for DASH/Data/Individual_All_Khipus/Khipu_UR206.xlsx'),
 ('UR209',
  '/Users/ashokkhosla/Desktop/Khipu/research/Clindaniel/Code and Data for DASH/Data/Individual_All_Khipus/Khipu_UR209.xlsx'),
 ('UR251',
  '/Users/ashokkhosla/Desktop/Khipu/research/Clindaniel/Code and Data for DASH/Data/Individual_All_Khipus/Khipu_UR251.xlsx'),
 ('UR252',
  '/Users/ashokkhosla/Desktop/Khipu/research/Clindaniel/Code and Data for DASH/Data/Individual_All_Khipus/Khipu_UR252.xlsx'),
 ('UR253',
  '/Users/ashokkhosla/Desktop/Khipu/research/Clindaniel/Code and Data for DASH/Data/Individual_All_Khipus/Khipu_UR253.xlsx'),
 ('UR254',
  '/Users/ashokkhosla/Desktop/Khipu/research/Clindaniel/Code and Data for DASH/Data/Individual_All_Khipus/Khipu_UR254.xlsx'),
 ('UR257',
  '/Users/ashokkhosla/Desktop/Khipu/research/Clindaniel/Code and Data for DASH/Data/Individual_All_Khipus/Khipu_UR257.xlsx'),
 ('UR258',
  '/Users/ashokkhosla/Desktop/Khipu/research/Clindaniel/Code and Data for DASH/Data/Individual_All_Khipus/Khipu_UR258.xlsx'),
 ('UR259',
  '/Users/ashokkhosla/Desktop/Khipu/research/Clindaniel/Code and Data for DASH/Data/Individual_All_Khipus/Khipu_UR259.xlsx'),
 ('UR260',
  '/Users/ashokkhosla/Desktop/Khipu/research/Clindaniel/Code and Data for DASH/Data/Individual_All_Khipus/Khipu_UR260.xlsx'),
 ('UR261',
  '/Users/ashokkhosla/Desktop/Khipu/research/Clindaniel/Code and Data for DASH/Data/Individual_All_Khipus/Khipu_UR261.xlsx'),
 ('UR262',
  '/Users/ashokkhosla/Desktop/Khipu/research/Clindaniel/Code and Data for DASH/Data/Individual_All_Khipus/Khipu_UR262.xlsx'),
 ('UR263',
  '/Users/ashokkhosla/Desktop/Khipu/research/Clindaniel/Code and Data for DASH/Data/Individual_All_Khipus/Khipu_UR263.xlsx'),
 ('UR266',
  '/Users/ashokkhosla/Desktop/Khipu/research/Clindaniel/Code and Data for DASH/Data/Individual_All_Khipus/Khipu_UR266.xlsx'),
 ('UR267A',
  '/Users/ashokkhosla/Desktop/Khipu/research/Clindaniel/Code and Data for DASH/Data/Individual_All_Khipus/Khipu_UR267A.xlsx'),
 ('UR267B',
  '/Users/ashokkhosla/Desktop/Khipu/research/Clindaniel/Code and Data for DASH/Data/Individual_All_Khipus/Khipu_UR267B.xlsx'),
 ('UR268',
  '/Users/ashokkhosla/Desktop/Khipu/research/Clindaniel/Code and Data for DASH/Data/Individual_All_Khipus/Khipu_UR268.xlsx'),
 ('UR269',
  '/Users/ashokkhosla/Desktop/Khipu/research/Clindaniel/Code and Data for DASH/Data/Individual_All_Khipus/Khipu_UR269.xlsx'),
 ('UR270',
  '/Users/ashokkhosla/Desktop/Khipu/research/Clindaniel/Code and Data for DASH/Data/Individual_All_Khipus/Khipu_UR270.xlsx'),
 ('UR271',
  '/Users/ashokkhosla/Desktop/Khipu/research/Clindaniel/Code and Data for DASH/Data/Individual_All_Khipus/Khipu_UR271.xlsx'),
 ('UR272',
  '/Users/ashokkhosla/Desktop/Khipu/research/Clindaniel/Code and Data for DASH/Data/Individual_All_Khipus/Khipu_UR272.xlsx'),
 ('UR273A',
  '/Users/ashokkhosla/Desktop/Khipu/research/Clindaniel/Code and Data for DASH/Data/Individual_All_Khipus/Khipu_UR273A.xlsx'),
 ('UR273B',
  '/Users/ashokkhosla/Desktop/Khipu/research/Clindaniel/Code and Data for DASH/Data/Individual_All_Khipus/Khipu_UR273B.xlsx'),
 ('UR274A',
  '/Users/ashokkhosla/Desktop/Khipu/research/Clindaniel/Code and Data for DASH/Data/Individual_All_Khipus/Khipu_UR274A.xlsx'),
 ('UR275',
  '/Users/ashokkhosla/Desktop/Khipu/research/Clindaniel/Code and Data for DASH/Data/Individual_All_Khipus/Khipu_UR275.xlsx'),
 ('UR276',
  '/Users/ashokkhosla/Desktop/Khipu/research/Clindaniel/Code and Data for DASH/Data/Individual_All_Khipus/Khipu_UR276.xlsx'),
 ('UR277',
  '/Users/ashokkhosla/Desktop/Khipu/research/Clindaniel/Code and Data for DASH/Data/Individual_All_Khipus/Khipu_UR277.xlsx'),
 ('UR278',
  '/Users/ashokkhosla/Desktop/Khipu/research/Clindaniel/Code and Data for DASH/Data/Individual_All_Khipus/Khipu_UR278.xlsx'),
 ('UR279',
  '/Users/ashokkhosla/Desktop/Khipu/research/Clindaniel/Code and Data for DASH/Data/Individual_All_Khipus/Khipu_UR279.xlsx'),
 ('UR280',
  '/Users/ashokkhosla/Desktop/Khipu/research/Clindaniel/Code and Data for DASH/Data/Individual_All_Khipus/Khipu_UR280.xlsx'),
 ('UR281',
  '/Users/ashokkhosla/Desktop/Khipu/research/Clindaniel/Code and Data for DASH/Data/Individual_All_Khipus/Khipu_UR281.xlsx'),
 ('UR284',
  '/Users/ashokkhosla/Desktop/Khipu/research/Clindaniel/Code and Data for DASH/Data/Individual_All_Khipus/Khipu_UR284.xlsx'),
 ('UR288',
  '/Users/ashokkhosla/Desktop/Khipu/research/Clindaniel/Code and Data for DASH/Data/Individual_All_Khipus/Khipu_UR288.xlsx'),
 ('UR293',
  '/Users/ashokkhosla/Desktop/Khipu/research/Clindaniel/Code and Data for DASH/Data/Individual_All_Khipus/Khipu_UR293.xlsx')]

So we lose UR292A. Additionally, as we will see in a bit, UR280 requires special, by-hand, attention. So in total we are able to reconstruct, automatically, 80 of the broken khipus.

Code
reconciliable_khipus = []
reconciliable_names = kq.recoverable_khipus() + kq.funky_khipus()
reconciliable_khipus = [(name, filename) for (name, filename) in recoverable_files if name in reconciliable_names]
len(reconciliable_khipus); 
#reconciliable_khipus
Code
import xlrd
import csv

def convert_excel_file_to_csv(input_xlsx_file, output_csv_file):
    with xlrd.open_workbook(input_xlsx_file) as wb:
        sheet = wb.sheet_by_index(0)  # wb.sheet_by_name('sheet_name')
        with open(output_csv_file, 'w', newline="") as f:
            col = csv.writer(f)
            for row in range(sheet.nrows):
                col.writerow(sheet.row_values(row))
            
for khipu_name, filename in reconciliable_khipus:
    input_xlsx_file = filename
    output_csv_file = f"{exceldir}/CSV/{kq.legal_filename(khipu_name)}.csv"
    convert_excel_file_to_csv(input_xlsx_file, output_csv_file)

The first order of business for each of these files is giving them a “level-name” like the existing drawings. The first requirement is labeling pendants vs subs.

Code
problem_files = kq.funky_khipus()
def label_cords(aKhipuName):
    try:
        csv_file = f"{exceldir}/CSV/{kq.legal_filename(aKhipuName)}.csv"
        the_khipu_df = pd.read_csv(csv_file)
    except:
        print(f"label_cords({aKhipuName}) => Unable to find file {csv_file}")
        return False
    
    is_subsidiary_mask = [attach_pos > 0 for attach_pos in the_khipu_df['attach position'].values]
    if not 'is_subsidiary' in the_khipu_df.columns:
        the_khipu_df.insert(1, 'is_subsidiary', is_subsidiary_mask)
    level_names = []

    for index, is_subsidiary in enumerate(is_subsidiary_mask):
        suffix = int(list(the_khipu_df['element pos-n'].values)[index])
        is_pendant = not is_subsidiary
        if is_pendant:
            level_name = f"p{suffix}"
            if level_name in level_names:
                is_subsidiary = True
                the_khipu_df.iloc[index].is_subsidiary = True
            else:
                last_prefix = [level_name]
                last_state = 'pendant'
                
        if is_subsidiary and last_state=='pendant':
            last_prefix.append(f"s{suffix}")
            level_name = "".join(last_prefix)
            last_suffix = suffix
            last_state = 'subsidiary'
        elif is_subsidiary and last_state=='subsidiary':
            if suffix == last_suffix + 1:
                last_prefix.pop()
                last_prefix.append(f"s{suffix}")
            elif suffix < last_suffix + 1:
                last_prefix.append(f"s{suffix}")
            else:
                last_prefix.pop()
                last_prefix.pop()
                last_prefix.append(f"s{suffix}")
                if aKhipuName in problem_files:
                    print(f"??? - {aKhipuName}-{index}:{suffix}")

            level_name = "".join(last_prefix)
            last_suffix = suffix
            last_state = 'subsidiary'
        level_names.append(level_name)

    if (len(level_names)) != len(set(level_names)):
        success = False
        # print(f"WARNING- nonunique level names for khipu {aKhipuName}")
    else:
        success = True
        
    if not 'level_name' in the_khipu_df.columns:
        the_khipu_df.insert(1, 'level_name', level_names)
    else:
        the_khipu_df['level_name'] = level_names
        
    # Sometimes last subsidiary_state is wrong...
    # So let's recheck
    is_subsidiary_mask = ['s' in level_name for level_name in level_names]
    for column_name in the_khipu_df.columns:
        if column_name.startswith("Unnamed"):
            the_khipu_df.pop(column_name)

    if not 'is_subsidiary' in the_khipu_df.columns:
        the_khipu_df.insert(1, 'is_subsidiary', is_subsidiary_mask)
    else:
        the_khipu_df.pop('is_subsidiary')
        the_khipu_df.insert(2, 'is_subsidiary', is_subsidiary_mask)
        
     
    the_khipu_df.to_csv(csv_file)
    return success
    
labeling_failures = []
labeling_attempts = [label_cords(aKhipuName) for aKhipuName in reconciliable_names]
for index, success in enumerate(labeling_attempts): 
    if not success: 
        labeling_failures.append(reconciliable_names[index])
if labeling_failures:
    print(f"Following {len(labeling_failures)} khipu files failed to produce good level names:\n\t{labeling_failures}")
??? - HP045-71:3
??? - UR253-28:18
??? - UR253-64:10
??? - UR253-68:37
??? - UR280-8:5
??? - UR280-30:3
??? - UR280-34:3
??? - UR280-46:3
??? - UR280-48:4
??? - UR280-50:5
??? - UR280-52:6
??? - UR280-62:5
??? - UR280-103:4
??? - UR280-111:4
??? - UR280-151:5
??? - UR280-185:3
??? - UR280-191:3
??? - UR280-198:3
label_cords(UR292 A) => Unable to find file /Users/ashokkhosla/Desktop/Khipu/fieldguide/data/XLS/CSV/UR292_A.csv
Following 2 khipu files failed to produce good level names:
    ['UR280', 'UR292 A']

HP045, UR253, and UR280 fails with this code. Let’s examine UR280:

Code
csv_file = f"{exceldir}/CSV/UR280.csv"
the_khipu_df = pd.read_csv(csv_file)
the_label_names = list(the_khipu_df.level_name.values)
from collections import Counter
Counter(the_label_names).most_common(10)
[('p6s1s2s3', 2),
 ('p6s1s2s4', 2),
 ('p6s1s2s5', 2),
 ('p7s2s1s3', 2),
 ('p7s2s1s4', 2),
 ('p7s2s1s4s1', 2),
 ('p1', 1),
 ('p1s1', 1),
 ('p1s2', 1),
 ('p1s2s1', 1)]

Looking at the actual CSV file, I can see why the algorithm fails. I also don’t see any easy way to solve this, other than hand editing and fixing it. So that’s what we will do. This leaves with a list of 80 recoverable files:

Ashok goes off and hand-edits UR280 Maybe…

Code
# For now..
reconciliable_names = [name for name in reconciliable_names if name != 'UR280']
print(reconciliable_names)
['HP009', 'HP033', 'HP034', 'HP036', 'HP037', 'HP038', 'HP039', 'HP040', 'HP041', 'HP042', 'HP043', 'HP044', 'HP046 A', 'HP046 B', 'HP047', 'HP051 A', 'HP053', 'HP054', 'HP057', 'JC001', 'JC002', 'JC003', 'JC004', 'JC005', 'JC006', 'JC007', 'JC008', 'JC009', 'JC010', 'JC011', 'JC012', 'JC013', 'JC014', 'JC015', 'JC016', 'JC017', 'JC018', 'JC019', 'JC020', 'JC021', 'JC022', 'JC023', 'UR190', 'UR193', 'UR196', 'UR206', 'UR209', 'UR251', 'UR252', 'UR254', 'UR257', 'UR258', 'UR259', 'UR260', 'UR261', 'UR262', 'UR263', 'UR266', 'UR267A', 'UR267B', 'UR268', 'UR269', 'UR270', 'UR271', 'UR272', 'UR273A', 'UR273B', 'UR274A', 'UR275', 'UR276', 'UR277', 'UR278', 'UR279', 'UR281', 'UR284', 'UR288', 'UR293', 'HP045', 'HP048', 'HP055', 'UR044', 'UR144', 'UR253', 'UR292 A']

The last thing we have to do is ensure we at least have pendant matches in the reconciliable khipu. That way we can use any cord cluster data that successfully came through in the SQL database.

Code
khipu_dict, all_khipus = kamayuq.fetch_khipus(allow_funky_khipus = True)
    
def csv_pendant_names(aKhipuName):
    try:
        csv_file = f"{exceldir}/CSV/{kq.legal_filename(aKhipuName)}.csv"
        the_khipu_df = pd.read_csv(csv_file)
    except:
        print(f"csv_pendant_names(): CSV_file for {aKhipuName} does not exist => {csv_file}")
        return []
    
    the_label_names = list(the_khipu_df.level_name.values)
    the_pendant_names = [x for x in the_label_names if not "s" in x]
    return the_pendant_names

def khipu_pendant_names(aKhipuName):
    try:
        khipu = khipu_dict[aKhipuName]
    except:
        print(f"Unable to retrieve khipu {aKhipuName} from khipu_kamayuq")
        return []
    
    # Try different combos of pendant cord extraction to validate dbase
    # pendant_cords = khipu.pendant_cords()
    # pendant_cords = [aCord for aCord in khipu.cc_cords() if ("p" in aCord.level_name) and (not "s" in aCord.level_name)]
    pendant_cords = ku.flatten_list([aCluster.pendant_cords() for aCluster in khipu[:]])
    the_pendant_names = [aCord.level_name for aCord in pendant_cords]
    return the_pendant_names

for aKhipuName in reconciliable_names:
    csv_names = csv_pendant_names(aKhipuName)
    khipu_names = khipu_pendant_names(aKhipuName)
    if set(csv_names) != set(khipu_names):
        print(f"DANG!: {aKhipuName} has {len(csv_names)} csv pendants and {len(khipu_names)} khipu_pendants")
        union_of_two = set(csv_names).union(set(khipu_names))
        intersection_of_two = set(csv_names).intersection(set(khipu_names))
        diff_of_two = sorted(list(union_of_two.difference(intersection_of_two)))
        print(f"Difference = {diff_of_two}")
        print(f"CSV pendants: ({len(csv_names)}): {csv_names}")
        print(f"Khipu pendants:  ({len(khipu_names)}): {khipu_names}")
    else:
        #print(f"SMILE: {aKhipuName} has {len(csv_pendant_names)} csv pendants and {len(khipu_pendant_names)} khipu_pendants")
        continue
Unable to retrieve khipu HP045 from khipu_kamayuq
DANG!: HP045 has 48 csv pendants and 0 khipu_pendants
Difference = ['p1', 'p10', 'p11', 'p12', 'p13', 'p14', 'p15', 'p16', 'p17', 'p18', 'p19', 'p2', 'p20', 'p21', 'p22', 'p23', 'p24', 'p25', 'p26', 'p27', 'p28', 'p29', 'p3', 'p30', 'p31', 'p32', 'p33', 'p34', 'p35', 'p36', 'p37', 'p38', 'p39', 'p4', 'p40', 'p41', 'p42', 'p43', 'p44', 'p45', 'p46', 'p47', 'p48', 'p5', 'p6', 'p7', 'p8', 'p9']
CSV pendants: (48): ['p1', 'p2', 'p3', 'p4', 'p5', 'p6', 'p7', 'p8', 'p9', 'p10', 'p11', 'p12', 'p13', 'p14', 'p15', 'p16', 'p17', 'p18', 'p19', 'p20', 'p21', 'p22', 'p23', 'p24', 'p25', 'p26', 'p27', 'p28', 'p29', 'p30', 'p31', 'p32', 'p33', 'p34', 'p35', 'p36', 'p37', 'p38', 'p39', 'p40', 'p41', 'p42', 'p43', 'p44', 'p45', 'p46', 'p47', 'p48']
Khipu pendants:  (0): []
Unable to retrieve khipu HP048 from khipu_kamayuq
DANG!: HP048 has 8 csv pendants and 0 khipu_pendants
Difference = ['p1', 'p2', 'p3', 'p4', 'p5', 'p6', 'p7', 'p8']
CSV pendants: (8): ['p1', 'p2', 'p3', 'p4', 'p5', 'p6', 'p7', 'p8']
Khipu pendants:  (0): []
Unable to retrieve khipu HP055 from khipu_kamayuq
DANG!: HP055 has 189 csv pendants and 0 khipu_pendants
Difference = ['p1', 'p10', 'p100', 'p101', 'p102', 'p103', 'p104', 'p105', 'p106', 'p107', 'p108', 'p109', 'p11', 'p110', 'p111', 'p112', 'p113', 'p114', 'p115', 'p116', 'p117', 'p118', 'p119', 'p12', 'p120', 'p121', 'p122', 'p123', 'p124', 'p125', 'p126', 'p127', 'p128', 'p129', 'p13', 'p130', 'p131', 'p132', 'p133', 'p134', 'p135', 'p136', 'p137', 'p138', 'p139', 'p14', 'p140', 'p141', 'p142', 'p143', 'p144', 'p145', 'p146', 'p147', 'p148', 'p149', 'p15', 'p150', 'p151', 'p152', 'p153', 'p154', 'p155', 'p156', 'p157', 'p158', 'p159', 'p16', 'p160', 'p161', 'p162', 'p163', 'p164', 'p165', 'p166', 'p167', 'p168', 'p169', 'p17', 'p170', 'p171', 'p172', 'p173', 'p174', 'p175', 'p176', 'p177', 'p178', 'p179', 'p18', 'p180', 'p181', 'p182', 'p183', 'p184', 'p185', 'p186', 'p187', 'p188', 'p189', 'p19', 'p2', 'p20', 'p21', 'p22', 'p23', 'p24', 'p25', 'p26', 'p27', 'p28', 'p29', 'p3', 'p30', 'p31', 'p32', 'p33', 'p34', 'p35', 'p36', 'p37', 'p38', 'p39', 'p4', 'p40', 'p41', 'p42', 'p43', 'p44', 'p45', 'p46', 'p47', 'p48', 'p49', 'p5', 'p50', 'p51', 'p52', 'p53', 'p54', 'p55', 'p56', 'p57', 'p58', 'p59', 'p6', 'p60', 'p61', 'p62', 'p63', 'p64', 'p65', 'p66', 'p67', 'p68', 'p69', 'p7', 'p70', 'p71', 'p72', 'p73', 'p74', 'p75', 'p76', 'p77', 'p78', 'p79', 'p8', 'p80', 'p81', 'p82', 'p83', 'p84', 'p85', 'p86', 'p87', 'p88', 'p89', 'p9', 'p90', 'p91', 'p92', 'p93', 'p94', 'p95', 'p96', 'p97', 'p98', 'p99']
CSV pendants: (189): ['p1', 'p2', 'p3', 'p4', 'p5', 'p6', 'p7', 'p8', 'p9', 'p10', 'p11', 'p12', 'p13', 'p14', 'p15', 'p16', 'p17', 'p18', 'p19', 'p20', 'p21', 'p22', 'p23', 'p24', 'p25', 'p26', 'p27', 'p28', 'p29', 'p30', 'p31', 'p32', 'p33', 'p34', 'p35', 'p36', 'p37', 'p38', 'p39', 'p40', 'p41', 'p42', 'p43', 'p44', 'p45', 'p46', 'p47', 'p48', 'p49', 'p50', 'p51', 'p52', 'p53', 'p54', 'p55', 'p56', 'p57', 'p58', 'p59', 'p60', 'p61', 'p62', 'p63', 'p64', 'p65', 'p66', 'p67', 'p68', 'p69', 'p70', 'p71', 'p72', 'p73', 'p74', 'p75', 'p76', 'p77', 'p78', 'p79', 'p80', 'p81', 'p82', 'p83', 'p84', 'p85', 'p86', 'p87', 'p88', 'p89', 'p90', 'p91', 'p92', 'p93', 'p94', 'p95', 'p96', 'p97', 'p98', 'p99', 'p100', 'p101', 'p102', 'p103', 'p104', 'p105', 'p106', 'p107', 'p108', 'p109', 'p110', 'p111', 'p112', 'p113', 'p114', 'p115', 'p116', 'p117', 'p118', 'p119', 'p120', 'p121', 'p122', 'p123', 'p124', 'p125', 'p126', 'p127', 'p128', 'p129', 'p130', 'p131', 'p132', 'p133', 'p134', 'p135', 'p136', 'p137', 'p138', 'p139', 'p140', 'p141', 'p142', 'p143', 'p144', 'p145', 'p146', 'p147', 'p148', 'p149', 'p150', 'p151', 'p152', 'p153', 'p154', 'p155', 'p156', 'p157', 'p158', 'p159', 'p160', 'p161', 'p162', 'p163', 'p164', 'p165', 'p166', 'p167', 'p168', 'p169', 'p170', 'p171', 'p172', 'p173', 'p174', 'p175', 'p176', 'p177', 'p178', 'p179', 'p180', 'p181', 'p182', 'p183', 'p184', 'p185', 'p186', 'p187', 'p188', 'p189']
Khipu pendants:  (0): []
Unable to retrieve khipu UR044 from khipu_kamayuq
DANG!: UR044 has 54 csv pendants and 0 khipu_pendants
Difference = ['p1', 'p10', 'p11', 'p12', 'p13', 'p14', 'p15', 'p16', 'p17', 'p18', 'p19', 'p2', 'p20', 'p21', 'p22', 'p23', 'p24', 'p25', 'p26', 'p27', 'p28', 'p29', 'p3', 'p30', 'p31', 'p32', 'p33', 'p34', 'p35', 'p36', 'p37', 'p38', 'p39', 'p4', 'p40', 'p41', 'p42', 'p43', 'p44', 'p45', 'p46', 'p47', 'p48', 'p49', 'p5', 'p50', 'p51', 'p52', 'p53', 'p54', 'p6', 'p7', 'p8', 'p9']
CSV pendants: (54): ['p1', 'p2', 'p3', 'p4', 'p5', 'p6', 'p7', 'p8', 'p9', 'p10', 'p11', 'p12', 'p13', 'p14', 'p15', 'p16', 'p17', 'p18', 'p19', 'p20', 'p21', 'p22', 'p23', 'p24', 'p25', 'p26', 'p27', 'p28', 'p29', 'p30', 'p31', 'p32', 'p33', 'p34', 'p35', 'p36', 'p37', 'p38', 'p39', 'p40', 'p41', 'p42', 'p43', 'p44', 'p45', 'p46', 'p47', 'p48', 'p49', 'p50', 'p51', 'p52', 'p53', 'p54']
Khipu pendants:  (0): []
Unable to retrieve khipu UR144 from khipu_kamayuq
DANG!: UR144 has 96 csv pendants and 0 khipu_pendants
Difference = ['p1', 'p10', 'p11', 'p12', 'p13', 'p14', 'p15', 'p16', 'p17', 'p18', 'p19', 'p2', 'p20', 'p21', 'p22', 'p23', 'p24', 'p25', 'p26', 'p27', 'p28', 'p29', 'p3', 'p30', 'p31', 'p32', 'p33', 'p34', 'p35', 'p36', 'p37', 'p38', 'p39', 'p4', 'p40', 'p41', 'p42', 'p43', 'p44', 'p45', 'p46', 'p47', 'p48', 'p49', 'p5', 'p50', 'p51', 'p52', 'p53', 'p54', 'p55', 'p56', 'p57', 'p58', 'p59', 'p6', 'p60', 'p61', 'p62', 'p63', 'p64', 'p65', 'p66', 'p67', 'p68', 'p69', 'p7', 'p70', 'p71', 'p72', 'p73', 'p74', 'p75', 'p76', 'p77', 'p78', 'p79', 'p8', 'p80', 'p81', 'p82', 'p83', 'p84', 'p85', 'p86', 'p87', 'p88', 'p89', 'p9', 'p90', 'p91', 'p92', 'p93', 'p94', 'p95', 'p96']
CSV pendants: (96): ['p1', 'p2', 'p3', 'p4', 'p5', 'p6', 'p7', 'p8', 'p9', 'p10', 'p11', 'p12', 'p13', 'p14', 'p15', 'p16', 'p17', 'p18', 'p19', 'p20', 'p21', 'p22', 'p23', 'p24', 'p25', 'p26', 'p27', 'p28', 'p29', 'p30', 'p31', 'p32', 'p33', 'p34', 'p35', 'p36', 'p37', 'p38', 'p39', 'p40', 'p41', 'p42', 'p43', 'p44', 'p45', 'p46', 'p47', 'p48', 'p49', 'p50', 'p51', 'p52', 'p53', 'p54', 'p55', 'p56', 'p57', 'p58', 'p59', 'p60', 'p61', 'p62', 'p63', 'p64', 'p65', 'p66', 'p67', 'p68', 'p69', 'p70', 'p71', 'p72', 'p73', 'p74', 'p75', 'p76', 'p77', 'p78', 'p79', 'p80', 'p81', 'p82', 'p83', 'p84', 'p85', 'p86', 'p87', 'p88', 'p89', 'p90', 'p91', 'p92', 'p93', 'p94', 'p95', 'p96']
Khipu pendants:  (0): []
DANG!: UR253 has 57 csv pendants and 57 khipu_pendants
Difference = ['p56', 'p57']
CSV pendants: (57): ['p1', 'p2', 'p3', 'p4', 'p5', 'p6', 'p7', 'p8', 'p9', 'p10', 'p11', 'p12', 'p13', 'p14', 'p15', 'p16', 'p17', 'p18', 'p19', 'p20', 'p21', 'p22', 'p23', 'p24', 'p25', 'p26', 'p27', 'p28', 'p29', 'p30', 'p31', 'p32', 'p33', 'p34', 'p35', 'p36', 'p37', 'p38', 'p39', 'p40', 'p41', 'p42', 'p43', 'p44', 'p45', 'p46', 'p47', 'p48', 'p49', 'p50', 'p51', 'p52', 'p53', 'p54', 'p55', 'p56', 'p57']
Khipu pendants:  (57): ['p1', 'p2', 'p3', 'p4', 'p5', 'p6', 'p7', 'p8', 'p9', 'p10', 'p11', 'p12', 'p13', 'p14', 'p15', 'p16', 'p17', 'p18', 'p18', 'p19', 'p20', 'p21', 'p22', 'p23', 'p24', 'p25', 'p26', 'p27', 'p28', 'p29', 'p30', 'p31', 'p32', 'p33', 'p34', 'p35', 'p36', 'p37', 'p37', 'p38', 'p39', 'p40', 'p41', 'p42', 'p43', 'p44', 'p45', 'p46', 'p47', 'p48', 'p49', 'p50', 'p51', 'p52', 'p53', 'p54', 'p55']
csv_pendant_names(): CSV_file for UR292 A does not exist => /Users/ashokkhosla/Desktop/Khipu/fieldguide/data/XLS/CSV/UR292_A.csv
Unable to retrieve khipu UR292 A from khipu_kamayuq

Alas, the khipu in the Harvard Khipu DB for HP045 has three pendants labeled p35. So the reconciliation scheme won’t work. Also, the khipu in the Harvard Khipu DB for UR255 has two pendants named p37, so that also won’t work under the current reconciliation scheme.

Code
reconciliable_names = [name for name in reconciliable_names if (name != 'HP045') and (name != 'UR255')]
print(reconciliable_names)
['HP009', 'HP033', 'HP034', 'HP036', 'HP037', 'HP038', 'HP039', 'HP040', 'HP041', 'HP042', 'HP043', 'HP044', 'HP046 A', 'HP046 B', 'HP047', 'HP051 A', 'HP053', 'HP054', 'HP057', 'JC001', 'JC002', 'JC003', 'JC004', 'JC005', 'JC006', 'JC007', 'JC008', 'JC009', 'JC010', 'JC011', 'JC012', 'JC013', 'JC014', 'JC015', 'JC016', 'JC017', 'JC018', 'JC019', 'JC020', 'JC021', 'JC022', 'JC023', 'UR190', 'UR193', 'UR196', 'UR206', 'UR209', 'UR251', 'UR252', 'UR254', 'UR257', 'UR258', 'UR259', 'UR260', 'UR261', 'UR262', 'UR263', 'UR266', 'UR267A', 'UR267B', 'UR268', 'UR269', 'UR270', 'UR271', 'UR272', 'UR273A', 'UR273B', 'UR274A', 'UR275', 'UR276', 'UR277', 'UR278', 'UR279', 'UR281', 'UR284', 'UR288', 'UR293', 'HP048', 'HP055', 'UR044', 'UR144', 'UR253', 'UR292 A']

And we’re good to go! Using the Excel files as guides, we should be able to recover 51 files out of the 82 funky khipu (about 60%). This will grow the KDB from 510 good khipu to 561 - 10%!

Code
#import excel_khipu
#excel_khipu.reconcile_recoverable_khipus(reconciliable_names)

This strategy works for the “funky” khipus. Are there other files that might be salvageable from their Excel records?

Code
# Are there other files that might be salvageable?
# Yes - 11...
CSV_dir = kq.qollqa_data_directory()
deleted_khipus_df = pd.read_csv(f"{CSV_dir}deleted_khipus.csv")
#dtale.show(deleted_khipus_df)
deleted_khipu_names = list(deleted_khipus_df.name.values)
one_sheet_files = [name for name in deleted_khipu_names if os.path.exists(f"{exceldir}/{name}.xlsx")] 
multi_sheet_files = [name for name in deleted_khipu_names if os.path.exists(f"{exceldir}/{name}.xls")] 
(len(one_sheet_files), len(multi_sheet_files))
print(f"({len(one_sheet_files)}) One sheet salvageable files:")
print(one_sheet_files)
print(f"({len(multi_sheet_files)}) Multi sheet salvageable files:")
print(sorted(multi_sheet_files))
(1, 11)
(1) One sheet salvageable files:
['UR280']
(11) Multi sheet salvageable files:
['HP025', 'HP026', 'HP028', 'UR044', 'UR044', 'UR070', 'UR071', 'UR082', 'UR144', 'UR144', 'UR158']
Code
multi_sheet_paths = " ".join([f"{exceldir}/{name}.xls" for name in multi_sheet_files])
os.system(f"open {multi_sheet_paths}")
0

So only 3 files/khipus are worth reconstructing. And the three are not very useful on first review. Punt for now.