Here are the results of a small study measuring the spelling distance between English and other languages. The computation goes through a list of basic English words, uses the Google Translate API to get their translations into other languages, and computes the Levenshtein distance between each English/translated pair of words. The final distance for a language is the average over all pairs.
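
To make the procedure concrete, here is a minimal sketch of the distance step in Python. It is not the code used for the study. The function names (`levenshtein`, `average_distance`) and the sample word pairs are made up for illustration, the translation step is skipped (the pairs are hard-coded instead of fetched from the Google Translate API), and dividing each distance by the length of the English word is an assumption about how the percentages are obtained. That choice would at least be consistent with a score above 100%, which is possible when translations are much longer than the English words.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance: minimum number of single-character insertions,
    deletions, and substitutions needed to turn `a` into `b`."""
    # Row-by-row dynamic programming; only the previous row is kept.
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute (or match)
            ))
        previous = current
    return previous[-1]


def average_distance(pairs):
    """Average normalized distance over (english, translated) pairs, in percent.

    ASSUMPTION: each distance is divided by the English word's length.
    The original post does not state the normalization it uses.
    """
    ratios = [levenshtein(en.lower(), tr.lower()) / len(en) for en, tr in pairs]
    return 100 * sum(ratios) / len(ratios)


# Hypothetical sample pairs (English, Swedish), not taken from the datasets.
sample_pairs = [("water", "vatten"), ("house", "hus"), ("hand", "hand")]

if __name__ == "__main__":
    print(f"{average_distance(sample_pairs):.2f}%")
```

Comparing lowercased words is also a choice made for this sketch; a different normalization (for example, dividing by the longer word's length) would cap every score at 100%.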

This only looks at the spelling of words; the next step is to look at their phonemes.

Feel free to use the datasets below and please let me know what you’re working on :)

Language      Distance from English
Swedish       63.88%
Danish        66.69%
Dutch         66.78%
French        69.31%
German        72.27%
Italian       76.89%
Spanish       82.14%
Albanian      88.61%
Croatian      90.74%
Estonian      91.45%
Polish        92.48%
Hungarian     102.2%