Here are the results of a small study that measures the spelling distance between English words and their translations in other languages. The computation goes through a list of basic English words, uses the Google Translate API to translate each one into the target language, and then computes the Levenshtein distance between each English/translated pair of words. The final distance for a language is the average over all pairs.
This only looks at how the words are spelled; the next step is to look at their phonemes.
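To make the computation concrete, here is a minimal Python sketch of the distance and averaging steps. It is not the code used for the study. It assumes the word pairs have already been translated, so the Google Translate call is left out, and the sample pairs are hypothetical. It also assumes each distance is divided by the length of the English word before averaging. That is only one normalization that would be consistent with a score above 100%; the study may have used a different one.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance counting insertions, deletions and substitutions."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution (or match)
            ))
        previous = current
    return previous[-1]


def average_distance(pairs):
    """Mean of per-pair distances.

    Assumption: each distance is divided by the English word's length,
    and English words are non-empty.
    """
    ratios = [levenshtein(en, tr) / len(en) for en, tr in pairs]
    return sum(ratios) / len(ratios)


# Hypothetical sample pairs, not taken from the study's datasets.
pairs = [("water", "wasser"), ("house", "haus"), ("night", "nacht")]
print(f"{average_distance(pairs):.2%}")
```

Dividing by the English word's length means a translation much longer than the original can score above 100% for that pair.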
Feel free to use the datasets below, and please let me know what you’re working on 🙂
Language | Distance from English |
Swedish | 63.88% |
Danish | 66.69% |
Dutch | 66.78% |
French | 69.31% |
German | 72.27% |
Italian | 76.89% |
Spanish | 82.14% |
Albanian | 88.61% |
Croatian | 90.74% |
Estonian | 91.45% |
Polish | 92.48% |
Hungarian | 102.2% |