Emesal proto

Origins of Emesal // University of Helsinki

Analysis
Morphology	Distribution of nominal morphemes over EG and ES vocabulary
Predictability by word	Concordances, 3D scatterplots and statistics on word embeddings
Predictability by text	Heatmaps, Oracc links
More info	Information about the process and data.
Acknowledgements: Niek Veldhuis, Steve Tinney, Noah Kröll, Sebastian Fink, Krister Lindén (PI)

generated with emesal_vectors.py -- asahala 2022

def demo():
    PREFIX = '2nd_mill_'
    threshold = 10
    #purge(PREFIX)

    window = 10 # prediction context window
    vector_window = 3 # vector window
    split_lines = True

    dataset = [text for text in EmesalFinder.find_texts()
               if text.millennium == '2nd' and
               text.word_count > 30 and
               text.emesal_ratio >= 0.1 and
               text.lacuna_ratio <= 0.33]

    #batch_process(n=5, threshold=threshold, data=dataset, filename=PREFIX, vector_window=vector_window, window=window, split_lines=split_lines)
    #general_vectors_makedata(PREFIX, vector_window=vector_window)
    #general_vectors_similarities(PREFIX)
    predict_emesal(PREFIX, dataset, window=window)
    generate_statistics(PREFIX, dataset, threshold)
    generate_morphology_table(PREFIX, dataset, threshold)

demo()