Visualizing Lexical Distance in Three Dimensions

If you do not know Stephan Steinbach’s blog Alternative Transport, check it out. Among subjects that interest me, Stephan covers data visualization. At one time, a picture from his post on lexical distance among languages of Europe went viral. Since then, he has been pondering ways to improve the illustration by including more languages, positioning them more scientifically, and going into the third dimension. Eventually, we decided to co-operate in this endeavour. Yesterday, Stephan published his article on the dimensional limits of graphs and today, I am publishing this post.

Without further ado, here is the visualization of lexical distance among 210 living and extinct languages, mainly from the Old World.

210-Languages

Here is what it took to make this visualization:

  • Take Vincent Beaufils’s list of the 18 most stable word stems, “eye”, “ear”, “nose”, “hand”, “tongue”, “tooth”, “death”, “water”, “sun”, “wind”, “night”, “two”, “three”, “four”, “I”, “you”, “who”, and “name”, and collect these words for each of the languages in question.
  • For each pair of languages, add up the Brown–Holman–Wichmann distances between all pairs of corresponding words. This kind of distance takes into account 692 correspondences between consonants and vowels that recur among the world’s languages.
  • Apply multidimensional scaling to the matrix of distances. This approach minimizes the sum of squared differences between the input and output distances. Keep the first three dimensions of the result.
  • Using Matplotlib, draw the labels of the languages in three dimensions. Move the viewpoint with uniform speed along the tennis ball seam as defined parametrically by López-López:
    x(t) = (1 − b)sin t + b sin 3t
    y(t) = (1 − b)cos t − b cos 3t
    z(t) = √(4(1 − b)b) cos 2t
    Following López-López, set b = 0.20.
  • Make each label less transparent the closer it is to the viewpoint. Concretely, vary the alpha channel of the labels from 0.05 far away from the viewpoint to 1.0 close to it.
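
The steps above can be sketched in Python with NumPy. This is only an illustration under stated assumptions, not the code behind the figures. The `word_distance` callable is a hypothetical stand-in for the Brown–Holman–Wichmann measure, which is not reproduced here. SMACOF is one common way to minimize the sum of squared differences, and the post does not say which solver was used. The linear mapping from distance to alpha and the `radius` scaling of the seam are likewise assumptions. All function names are made up for this example.

```python
import numpy as np


def distance_matrix(wordlists, word_distance):
    """Sum a per-word distance over corresponding words for each language pair.

    `word_distance` is a hypothetical callable standing in for the
    Brown-Holman-Wichmann measure, which is not reproduced here.
    """
    n = len(wordlists)
    delta = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            total = sum(word_distance(a, b)
                        for a, b in zip(wordlists[i], wordlists[j]))
            delta[i, j] = delta[j, i] = total
    return delta


def stress(delta, x):
    """Sum of squared differences between input and embedded distances."""
    d = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=-1)
    return float(np.sum(np.triu(delta - d, k=1) ** 2))


def smacof_mds(delta, dim=3, iters=300, seed=0):
    """Metric MDS by SMACOF iterations (an assumed choice of solver)."""
    n = delta.shape[0]
    x = np.random.default_rng(seed).standard_normal((n, dim))
    for _ in range(iters):
        d = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=-1)
        # Guttman transform: off-diagonal entries are -delta/d (0 where d == 0),
        # and each diagonal entry makes its row sum to zero.
        with np.errstate(divide="ignore", invalid="ignore"):
            b = np.where(d > 0, -delta / d, 0.0)
        np.fill_diagonal(b, 0.0)
        np.fill_diagonal(b, -b.sum(axis=1))
        x = b @ x / n
    return x


def seam(t, b=0.20):
    """Tennis ball seam on the unit sphere, following the formulas above."""
    t = np.asarray(t, dtype=float)
    return np.stack([
        (1 - b) * np.sin(t) + b * np.sin(3 * t),
        (1 - b) * np.cos(t) - b * np.cos(3 * t),
        np.sqrt(4 * (1 - b) * b) * np.cos(2 * t),
    ], axis=-1)


def uniform_speed_viewpoints(frames, b=0.20, radius=1.0, samples=20000):
    """Viewpoints spaced by equal arc length, so the camera speed is uniform."""
    t = np.linspace(0.0, 2 * np.pi, samples)
    segments = np.linalg.norm(np.diff(seam(t, b), axis=0), axis=1)
    arc = np.concatenate([[0.0], np.cumsum(segments)])
    targets = np.linspace(0.0, arc[-1], frames, endpoint=False)
    # Invert the arc-length function numerically to find the matching t values.
    return radius * seam(np.interp(targets, arc, t), b)


def label_alphas(coords, viewpoint, far=0.05, near=1.0):
    """Alpha per label: `near` for the closest label, `far` for the farthest.

    The linear interpolation in between is an assumption.
    """
    dist = np.linalg.norm(coords - viewpoint, axis=1)
    span = dist.max() - dist.min()
    if span == 0:
        return np.full(len(coords), near)
    return near - (near - far) * (dist - dist.min()) / span
```

With a distance matrix `delta` in hand, `coords = smacof_mds(delta)` gives three coordinates per language, and for each frame `label_alphas(coords, viewpoint)` yields values that could be passed as the `alpha` argument when drawing the labels with Matplotlib's 3D `text` method. The seam lies on the unit sphere, so `radius` would have to be chosen to place the viewpoint outside the cloud of labels.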

As a bonus, here are visualizations of lexical distance among languages of the Indo-European family and its Germanic, Italic, and Romance branches.

Indo-European
Germanic

Italic-and-Romance

The colors come from Ethan Schoonover’s Solarized palette. The elegant narrow font is Delicious, a free font from exljbris Font Foundry.
