| Analysis | Description |
|---|---|
| Morphology | Distribution of nominal morphemes over EG and ES vocabulary |
| Predictability by word | Concordances, 3D scatterplots and statistics on word embeddings |
| Predictability by text | Heatmaps, Oracc links |
| More info | Information about the process and data. |
Acknowledgements: Niek Veldhuis, Steve Tinney, Noah Kröll, Sebastian Fink, Krister Lindén (PI)
generated with emesal_vectors.py -- asahala 2022
def demo():
    PREFIX = '2nd_mill_'
    threshold = 10
    #purge(PREFIX)
    window = 10         # prediction context window
    vector_window = 3   # vector window
    split_lines = True
    dataset = [text for text in EmesalFinder.find_texts()
               if text.millennium == '2nd' and
               text.word_count > 30 and
               text.emesal_ratio >= 0.1 and
               text.lacuna_ratio <= 0.33]
    #batch_process(n=5, threshold=threshold, data=dataset, filename=PREFIX, vector_window=vector_window, window=window, split_lines=split_lines)
    #general_vectors_makedata(PREFIX, vector_window=vector_window)
    #general_vectors_similarities(PREFIX)
    predict_emesal(PREFIX, dataset, window=window)
    generate_statistics(PREFIX, dataset, threshold)
    generate_morphology_table(PREFIX, dataset, threshold)


demo()
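
The dataset selection in `demo()` can be tried in isolation with the minimal sketch below. The attribute names (`millennium`, `word_count`, `emesal_ratio`, `lacuna_ratio`) and the thresholds come from the script above. The `Text` dataclass, the `select_texts` helper and the sample records are hypothetical stand-ins for whatever `EmesalFinder.find_texts()` actually returns. The comments assume that `emesal_ratio` is the share of Emesal words in a text and `lacuna_ratio` the share of broken passages, which the script itself does not confirm.

```python
from dataclasses import dataclass


@dataclass
class Text:
    """Hypothetical stand-in for the objects yielded by EmesalFinder.find_texts()."""
    name: str
    millennium: str
    word_count: int
    emesal_ratio: float   # assumed: share of Emesal words in the text
    lacuna_ratio: float   # assumed: share of the text lost to lacunae


def select_texts(texts, millennium='2nd', min_words=30,
                 min_emesal=0.1, max_lacuna=0.33):
    """Apply the same four conditions as the list comprehension in demo()."""
    return [t for t in texts
            if t.millennium == millennium
            and t.word_count > min_words       # strictly more than 30 words
            and t.emesal_ratio >= min_emesal   # at least 10 % Emesal
            and t.lacuna_ratio <= max_lacuna]  # at most 33 % broken


# Invented sample records, for illustration only.
sample = [
    Text('A', '2nd', 120, 0.25, 0.10),  # passes every condition
    Text('B', '2nd', 30, 0.50, 0.00),   # rejected: word_count is not > 30
    Text('C', '1st', 200, 0.40, 0.05),  # rejected: wrong millennium
    Text('D', '2nd', 80, 0.10, 0.33),   # passes: both ratios sit on the boundary
    Text('E', '2nd', 80, 0.05, 0.10),   # rejected: too little Emesal
    Text('F', '2nd', 80, 0.30, 0.50),   # rejected: too fragmentary
]

selected = select_texts(sample)
print([t.name for t in selected])
```

Note that the word-count condition is strict (`>`), while the two ratio conditions are inclusive (`>=`, `<=`), so a text of exactly 30 words is dropped but one with exactly 10 % Emesal is kept.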