Reproducing the AuthaGraph World Map

The AuthaGraph world map by Hajime Narukawa took the media by storm when it was honoured with the Japan Good Design Grand Award in October 2016. After several rounds of copying the news, the journalists dubbed it “the most accurate world map ever”, comparing it with the usual straw man, Mercator projection. Hype aside, the facts are: the map almost avoids the chopping of land, almost has the equal-area property, and does a decent job with land shapes. Its large distortions near the vertices of the basal tetrahedron and drastic discontinuities along the boundary are masked by the ocean.

Since the map projection is not public, I set out to develop its lookalike with PROJ.4. My way is simpler than the AuthaGraph method at the cost of losing the proportions of areas that AuthaGraph claims to preserve. You can compare the results below. Among the most visible changes, my Iceland lies on the European side of the map, Australia properly looks smaller than China, and Brazil’s shape is distorted differently. Given the overall distortions of AuthaGraph, treating Earth as an ellipsoid would not buy us much, so my method assumes it is a sphere. The code is available on Bitbucket.

Original AuthaGraph

pseudo-AuthaGraph

The top border of the original AuthaGraph map corresponds to a great-circle arc that separates Africa from Iceland. If the border passed through the North Pole, it would have to follow a meridian. Any meridian near Africa with westernmost longitude 17° 33′ W and Iceland with easternmost longitude 13° 16′ W must pass through either land. As the border avoids land, it must not pass through the North Pole, which is projected inside the map. Therefore, in spite of the claims about tiling the plane with the map, juxtaposing two maps yields a map with two North Poles close to each other. Incidentally, even the Bering Strait is blocked for a land-avoiding meridian by St. Lawrence Island and Unalaska.

The map of the Arctic below shows the possible locations of the northern vertex with a land-avoiding top border. My northern vertex lies around 73° N 121° E, near the delta of Lena.

arctic

Another constraint is to keep Australia and New Zealand in the middle triangle of the map. The original map does not fulfill it entirely: the Stewart Island lies in its Antarctic triangle. Therefore, I based the projection on a disphenoid, an irregular tetrahedron that can be rotated, scaled, and translated so that its vertices have Cartesian coordinates (+a, +b, +c), (+a, -b, -c), (-a, +b, -c), and (-a, -b, +c). A disphenoid can still be developed to a parallelogram so we can cut a right triangle or trapezoid near the acute-angle vertex of its net and attach it to the opposite side, making the map rectangular.

My approach to projecting a point P from a sphere onto a plane begins with finding the spherical triangle ABC, ABD, ACD, or BCD that contains P.

Given a tetrahedron KLMN inscribed into the unit sphere, the point P lies inside the spherical triangle KLM if \left(P \cdot (K \times L)\right)\left(N \cdot (K \times L)\right) \le 0, \left(P \cdot (K \times M)\right)\left(N \cdot (K \times M)\right) \le 0, and \left(P \cdot (L \times M)\right)\left(N \cdot (L \times M)\right) \le 0.

I considered several ways to map spherical triangles onto planar triangles, keeping in mind that the projection should be continuous on the edges shared by two triangles and smooth in their interiors. Here are my attempts:

  • Barycentric coordinates.

    The barycentric coordinates of point P with respect to a spherical or planar triangle KLM are the ratios of areas of spherical or planar triangles: (\Delta PLM / \Delta KLM, \Delta KPM / \Delta KLM, \Delta KLP / \Delta KLM).

    I was finding a point P' with planar barycentric coordinates corresponding to the spherical barycentric coordinates of P. Unfortunately, this elegant mapping is continuous on the shared edge of two triangles only if they are symmetric with respect to this edge. In an irregular tetrahedron, this is not the case.

  • Gnomonic projection, that is casting from the centre of the sphere onto the plane KLM. Although this projection is continuous on the edges of triangles, I find its distortion unacceptable.
  • Find points K', L', and M' on the sphere, being the intersections of great circles LM with KP, KM with LP, and KL with MP. Express the location of point P as a triple \left(\lvert KM'\rvert / \lvert KL \rvert, \lvert LK'\rvert / \lvert LM \rvert, \lvert ML'\rvert / \lvert MK \rvert\right). On a plane, the three lines that correspond to this triple in general do not cross in one point. Treating the centre of the intersection triangle as P' leads to discontinuities along the edges. I took the incentre of the intersection triangle instead.

The above method produces the map shown below, with sharp turns on the edges of the basal tetrahedron. I smoothed the joints by interpolation.

authagraph1

The cheetah spots on the map below are Tissot indicatrices for the projection, magnified images of infinitesimal circles on Earth’s surface.

The Tissot indicatrix is an ellipse with scale factor along the meridian h = \sqrt{(\partial x/\partial\phi)^2 + (\partial y/\partial\phi)^2}/R, scale factor along the parallel k = \sqrt{(\partial x/\partial\lambda)^2 + (\partial y/\partial\lambda)^2}/(R\cos\phi), and the angle \theta' at which the meridian and parallel intersect given by \sin\theta' = \left((\partial y/\partial\phi)(\partial x/\partial\lambda) - (\partial x/\partial\phi)(\partial y/\partial\lambda)\right)/(R^2 hk\cos\phi). The angular distortion is equal to \lvert\pi/2 - \theta'\rvert.

tissot

Using the Nelder–Mead method, I numerically minimized the total angular distortion in 461 approximately equally spaced points located on land while keeping the borders of the map at least 20 kilometres from any land. With respect to these conditions, the optimal parameters of the projection turned out to be:

  • The coordinates of the northern vertex: 73.10° N 120.96° E.
  • The rotation of the tetrahedron: −23.56°.
  • The coordinates of the vertices: b = 0.763 a, c = 0.963 a.
  • The vertical scale factor: 1.03.

These parameters yield a rectangle whose sides are in ratio 4:1.96. To separate Antarctica from Africa, I cut the map vertically at x = 1.6.

With a forward projection, you can make vector maps. For making raster maps, you need the inverse projection. As inverting the equations of the projection algebraically is a hopeless task, I resorted to the numerical Broyden’s method. The initial guess of (\phi, \lambda) = f^{-1}(x, y) comes from inverting the gnomonic projection. When the accuracy is set to 10-7 R, i.e. about one metre, the method usually finds \phi and \lambda after 5 iterations.

The map below is based on a public domain raster image from Natural Earth. The other maps in this post use public domain shapefiles from the same source.

earth

Reproducing the AuthaGraph World Map

Warping Maps

Polish railway network
In this warped map of Poland, straight line distance (emphatically not distance measured along the segments) between points in the network is approximately proportional to their rail distance. I do not have a good name for this kind of map. All I know is: if it showed the travel time, we would call it a time-space map. By the way, here is the unwarped version.

Interestingly, the map is the best possible in the sense of minimizing the squared error of the approximation. The rest of this article explains how to produce it.

  1. Scrape the data about Polish railway lines from http://semaforek.kolej.org.pl and put it into an SQLite database. Here is a sample:
    SELECT line, name, kind, metrage, another_line
    FROM Lines JOIN Names USING(name_id) JOIN Kinds USING(kind_id)
    WHERE line = 137 ORDER BY metrage LIMIT 21;
    
    137 | Katowice              | stacja węzłowa     |   381 |   1
    137 | Katowice              | stacja węzłowa     |   381 | 138
    137 | Katowice              | stacja węzłowa     |   381 | 139
    137 | Katowice              | stacja węzłowa     |   381 | 656
    137 | Katowice              | stacja węzłowa     |   381 | 713
    137 | Katowice Załęże       | przystanek osobowy |  2528 |
    137 | Katowice Towarowa KTC | stacja węzłowa     |  2913 | 713
    137 | Chorzów Batory        | stacja węzłowa     |  6166 | 131
    137 | Chorzów Batory        | stacja węzłowa     |  6166 | 164
    137 | Chorzów Batory        | stacja węzłowa     |  6166 | 713
    137 | Świętochłowice        | przystanek osobowy |  8413 |
    137 | Ruda Chebzie          | stacja węzłowa     | 11740 | 187
    137 | Ruda Chebzie          | stacja węzłowa     | 11740 | 189
    137 | Ruda Śląska           | przystanek osobowy | 14068 |
    137 | Zabrze                | stacja             | 18926 |
    137 | Gliwice               | stacja węzłowa     | 27100 | 141
    137 | Gliwice               | stacja węzłowa     | 27100 | 147
    137 | Gliwice               | stacja węzłowa     | 27100 | 168
    137 | Gliwice               | stacja węzłowa     | 27100 | 200
    137 | Gliwice               | stacja węzłowa     | 27100 | 671
    137 | Gliwice               | stacja węzłowa     | 27100 | 711
    
  2. Construct a list of 1838 edges between all terminal and junction nodes of the network, labelled with their length in metres:
    edges = [
        (u’Katowice’, u’Katowice Towarowa KTC’, 2532),
        (u’Katowice Towarowa KTC’, u’Chorzów Batory’, 3253),
        (u’Chorzów Batory’, u’Ruda Chebzie’, 5574),
        (u’Ruda Chebzie’, u’Gliwice’, 15360), ... ]
    
  3. Convert the list of edges into an adjacency matrix, putting infinite distance between nonadjacent vertices. The matrix has 1258 rows and columns.
  4. Compute the shortest distances between all pairs of vertices with the Floyd–Warshall algorithm, obtaining the so-called proximity matrix D.
  5. Perform multidimensional scaling (MDS) of the proximity matrix D, in its classical version invented by Torgerson in 1952:
    coords = sklearn.manifold.MDS(
        dissimilarity=’precomputed’).fit_transform(D)
    

    Under the hood, it would perform four steps:

    1. set up the matrix of squared proximities D(2)
      [if the ith point has coordinates xiyizi,…, then dij(2) = (xixj)2 + (yiyj)2 + (zizj)2 +⋯],
    2. perform the “double centring”: subtract the row mean and the column mean from the elements of D(2), add the total mean, and multiply by −1/2, obtaining matrix B
      [bij = xixj + yiyj + zizj +⋯],
    3. find the singular value decomposition of B into the product of three matrices USUT,
    4. take the initial two singular vectors U2 and two corresponding singular values S2 — the rows of matrix X = U2S21/2 give the coordinates of the points
      [the initial two coordinates computed by SVD follow the principal axes of B, the remaining coordinates are less important, so bijxixj + yiyj or in matrix form BXXT = U2S2U2T],

    were it not using another algorithm, called SMACOF (Scaling by Majorizing a Complicated Function).

  6. The obtained map is randomly oriented. Rotate it to move Gdańsk Główny (G) straight north of Katowice (KO).
  7. If Lublin (Lb) happens to lie west of the Katowice–Gdańsk Główny meridian, flip the X axis.

On my machine, steps 2–7 take about 20 seconds. The code is available on Bitbucket.

Warping Maps

Tracking Spanish Flu through Austro–Hungarian Press

The area of the red circles on this map is proportional to the smoothed relative number of mentions of influenza in newspapers throughout the Austro–Hungarian lands in the last four months of 1918. In the same vein as Google was detecting influenza epidemics using search engine query data, perhaps these numbers approximate the local timeseries of influenza morbidity.

Austria-Hungary

The 1918–1919 pandemic, called Spanish flu, killed at least five times more people than World War I. Here is a chart of its mortality in England and Wales (source: Edwin O. Jordan, Epidemic Influenza: A Survey, 1927):

weekly-mortality

The three waves occurred at similar times worldwide. I concentrated on the second, most deadly wave in the Austro–Hungarian empire and in the states that emerged in its lands. The full-text search of ANNO (AustriaN Newspapers Online, a digitisation initiative of the Austrian National Library) allowed me to count all the occurrences of the words {Grippe, Influenza}, {chřipka, chřipky, chřipce, chřipku, chřipkou} (in the Czech newspaper Dělnické listy), and {grypa, grypą, grypę, grypie, grypy, influenza, influenzy} (in the Polish newspaper Kurjer Lwowski) by the day of their publication in 27 newspapers from 14 cities and towns. A little side discovery is that the name “Spanish flu” in all three languages first appeared in print at the beginning of July 1918.

Despite the inevitable inaccuracy of this method, among others due to OCR errors and the shadowing of the epidemic by other topics like the surrender of Austria–Hungary, the end of the monarchy, and the independence of national states, noise tends to cancel out. I inspected all 29 mentions of flu in Kurjer Lwowski in the week of Oct 7–13, 1918. Out of them:

  • 19 concerned local events,
  • 4 concerned events elsewhere in Austria,
  • 5 concerned events elsewhere in Europe.

I fitted the Gompertz curve y = c \exp(-e^{-b(t-t_0)}) to the cumulative numbers, keeping in mind the fact that the issues of some newspapers from the end of the year are missing from ANNO (recall the independence). For instance, here is the best fit for the cumulative number of times flu was mentioned in newspapers published in Vienna:

Vienna1

and a full-year chart of mentions of flu per week in the same newspapers:

Vienna2

I tried also to verify Andrew Price–Smith’s claim that Spanish flu tipped the balance of power towards the Entente in the last days of World War I. Unfortunately, many belligerent states do not have online archives of newspapers from 1918, the German archives are unsearchable, and Gallica and the British Newspaper Archive are useless: the mentions of grippe or influenza in French and British press are few and far between outside advertisements, a sign of wartime censorship. However, this table from Epidemic Influenza: A Survey showing that the highest mortality co-occurred in France and Germany makes me doubt Price–Smith’s claim:

highest-weekly-mortality

The map of Austria–Hungary above comes from MPIDR [Max Planck Institute for Demographic Research] and CGG [Chair for Geodesy and Geoinformatics, University of Rostock] 2012: MPIDR Population History GIS Collection (slightly modified version of a GIS-File by Rumpler and Seger 2010) – Rostock (Rumpler, H. and Seger, M. 2010: Die Habsburgermonarchie 1848–1918. Band IX: Soziale Strukturen. 2. Teilband: Die Gesellschaft der Habsburgermonarchie im Kartenbild. Verwaltungs-, Sozial- und Infrastrukturen. Nach dem Zensus von 1910, Wien).

Tracking Spanish Flu through Austro–Hungarian Press

Slicing Earth, Carefully

On wet and dry great and small circles

Can you slice Earth flat through its centre while touching no land? If not, then which cut minimizes the length of sliced land? Which cut maximizes it? My favourite search engine returned only partial and inaccessible answers. Which planar cut of Earth that intersects no land is the largest? I have not even found this question online.

With right tools and some brute force, such problems prove easy. I used Cartopy for map projections and Shapely for geometric objects.

Assuming that Earth is a sphere, its intersection with the cutting plane is a great circle when the plane passes through the centre of Earth or a small circle otherwise.
Let us map the surface of Earth with the stereographic projection with centre at the point (lat, lon).

# The source projection for proj.transform_point()
# and proj.project_geometry(). This one serves as
# a no-op projection.
PLATE_CARREE = crs.PlateCarree()

# Small circles on a sphere are easy. I do not know
# how to specify their equivalent on an ellipsoid.
SPHERICAL_GLOBE = crs.Globe(ellipse='sphere')

EARTH_RADIUS_KM = 6370.997

def Project(lat, lon, land, klass=crs.Stereographic):
  proj = klass(
      central_longitude=lon,
      central_latitude=lat,
      globe=SPHERICAL_GLOBE)
  # Allow one metre of error. In fact, the error
  # is at most on the order of 1e-8 metres.
  assert math.hypot(*proj.transform_point(
      lon, lat, PLATE_CARREE)) <= 1.
  land_projected = [
      # buffer(0) fixes most problems with geometric
      # objects, such as self-crossing.
      proj.project_geometry(g, PLATE_CARREE).buffer(0)
      for g in land]
  return proj, land_projected

Centre in Krakow

The stereographic projection with centre at (50°N, 20°E)

It tempted me to use the azimuthal equidistant projection known from the United Nations emblem instead. It preserves the ratios of distances along the diameters, and the diameters correspond to great circles. So I could, in theory, get the answer by measuring the length of the intersection of straight lines with land. Unfortunately, in my tests, land on antipodes often disappeared after this projection, which made the intersection undermeasured.

The stereographic projection proved more robust, maybe because it cannot map the antipodes so their neighbourhood is cut from the map. One of circles concentric to its conventional boundary corresponds to a great circle, and the ratios of distances are preserved along it. I measured the length of its intersection with land. The (lat, lon) pairs that determine the centre of the projection were vertices of a more or less evenly spaced mesh covering the sphere.

The code that realizes the above idea is so short that I show it all:

NUM_VERTICES = 3600

def Circle(lat, lon, radius_deg, proj):
  # (lon, clat) is a vertex of a polygon that
  # approximates the circle.
  # The correction makes the perimeter of the polygon
  # equal to the circumference of the circle.
  # For a 3600-gon, the correction factor is approximately
  # 1 + 7.6e-7.
  # The nonlinearity of the projection is negligible.
  shift = radius_deg * math.pi / (
      NUM_VERTICES * math.sin(math.pi / NUM_VERTICES))
  clat = lat + shift
  if clat > 90:
    clat = 180 - clat
    lon = lon + 180
  r = math.hypot(*proj.transform_point(
      lon, clat, PLATE_CARREE))
  return geometry.LinearRing(
      [(r * math.sin(t), r * math.cos(t))
       for t in numpy.linspace(
       0, 2 * math.pi, NUM_VERTICES, endpoint=False)])

def LengthOfIntersectionInKm(
      great_circle, land_projected):
  # The geometric objects in |land_projected|
  # only overlap on their boundaries.
  return sum(
      g.intersection(great_circle).length
      for g in land_projected) / great_circle.length * (
      2 * math.pi * EARTH_RADIUS_KM)

def GreatCircleLandIntersection(lat, lon, land):
  proj, land_projected = Project(lat, lon, land)
  great_circle = Circle(lat, lon, 90, proj)
  try:
    return LengthOfIntersectionInKm(
        great_circle, land_projected)
  except geos.TopologicalError:
    # The exception is raised e.g. for (lat, lon) =
    # (6, -135) and land from 'ne_110m_land'.
    return -1.

def EvaluateForMeshPoints(
    fn, lat0, lat1, lon0, lon1, lat_step, land, log):
  for lat in numpy.linspace(
      lat0, lat1, abs(lat1 - lat0) / lat_step + 1):
    # Make the spacing of longitudes approximately
    # the same as the spacing of latitudes.
    endpoint = (abs(lon1 - lon0) != 360)
    num_lons = (
        abs(lon1 - lon0) / lat_step *
        math.cos(math.radians(lat)) + endpoint
        if abs(lat) != 90 else 1)
    for lon in numpy.linspace(
        lon0, lon1, num_lons, endpoint=endpoint):
      print >>log, lat, lon, fn(lat, lon, land)

def ReadShapes(filename):
  shapes = list(shapereader.Reader(filename).geometries())
  # 'ne_110m_land' contains 127 MultiPolygons.
  # 'ne_10m_land' contains 1 MultiPolygon with
  # 4063 Polygons.
  # Intersecting with multiple Polygons proved faster
  # than intersecting with one MultiPolygon.
  return shapes if len(shapes) > 1 else shapes[0]

def main():
  # For this crude approximation, use land data dubbed
  # 'scale 1:110 million', with simplified coastline
  # and no smaller islands. Read a local copy.
  land = ReadShapes('shapes/ne_110m_land')
  with open('great_circle_110m.txt', 'w') as log:
    EvaluateForMeshPoints(
        GreatCircleLandIntersection,
        0, 90, -180, +180,
        1, land, log)

On my machine, it was running for about an hour. Let us check out the circles with the shortest land intersection:

$ sort -k3 -n great_circle_110m.txt | head
4.0 -108.802228412 2637.84025153
6.0 -116.648044693 2675.44337382
6.0 -114.636871508 2706.60790556
4.0 -109.805013928 2747.17793236
6.0 -115.642458101 2784.60240991
5.0 -114.636871508 2789.91418816
5.0 -113.631284916 2839.06706021
6.0 -112.625698324 2868.19346786
6.0 -113.631284916 2880.62812684
4.0 -110.807799443 2939.329449

and those with the longest land intersection:

$ sort -k3 -n -r great_circle_110m.txt | head
4.0 -154.930362117 22662.3962712
3.0 -154.930362117 22560.3219242
3.0 -155.933147632 22438.262132
4.0 -153.927576602 22430.6255353
3.0 -153.927576602 22401.2028276
4.0 -155.933147632 22367.8729212
5.0 -153.854748603 22325.8557016
5.0 -152.849162011 22318.0202642
6.0 -153.854748603 22241.6673472
5.0 -154.860335196 22239.5669657

For ‘ne_110m_land’ data coming from Natural Earth, the error in the measured length may be on the order of a few kilometres, at least on the equator where we have duplicate data. For instance, the points (0.0 −160.0) and (0.0 20.0) determine the same great circle seen from two sides:

$ grep ^0 great_circle_110m.txt | sort -k3 -n -r | head
0.0 -160.0 22153.8112556
0.0 20.0 22153.1791523
0.0 21.0 21444.598626
0.0 -159.0 21441.5582888
0.0 -161.0 21171.2386781
0.0 19.0 21170.0842492
0.0 -156.0 20870.1373834
0.0 24.0 20868.6096133
0.0 22.0 20795.1111135
0.0 -158.0 20794.7358011

When we use the most accurate land shapes, ‘ne_10m_land’, the error decreases to a few dozen metres:

$ sort -k3 -n -r great_circle_through_poles_10m.txt | head
0.0 -160.0 21984.2636822
0.0 20.0 21984.1709172
0.0 21.0 21459.8373531
0.0 -159.0 21459.7796093
0.0 19.0 21080.7771226
0.0 -161.0 21080.7567727
0.0 -158.0 20878.5033907
0.0 22.0 20878.4813585
0.0 26.0 20706.5668797
0.0 -154.0 20706.5664931

Another program searched more thoroughly for great circles using projections with centres in six regions that had the lowest local minimums of the overlap fraction: (4–8°N 108–118°W), (23–25°N 78–81°E), (51–64°N 45–50°E), (49–53°N 3–6°W), (42–45°N 153–157°E), and (76–79°N 98–103°E), and in the region with the highest local maximum: (0–7°N 151–161°W). The latitude step was 30′ and the ’ne_10m_land’ shapes were used. Then I ran another search with latitude step 6′ in the most promising subregions. Finally, I applied scipy.optimize.minimize():

from scipy import optimize
import earth

def LandIntersection(latlon, land):
  return earth.GreatCircleLandIntersection(
      latlon[0], latlon[1], land)

def NegativeLandIntersection(latlon, lands):
  return -LandIntersection(latlon, lands)

METHOD = 'Nelder-Mead'

def main():
  land = earth.ReadShapes('shapes/ne_10m_land')
  for x0 in [
      (5.2, -114.45), (24.1, 79.17), (57.5, 46.75),
      (50.7, -4.76), (44.1, 154.79), (77.4, 100.5)]:
    print optimize.minimize(
        LandIntersection, x0=x0,
        args=(land,), method=METHOD)
  print optimize.minimize(
      NegativeLandIntersection, x0=(3.4, -154.14),
      args=(land,), method=METHOD)

Here is the gallery of the locally optimal great circles, in the azimuthal equidistant projection. I swapped the centres of the first and the last map with their antipodes to keep the Old World near the centre.

On to the question about the largest small circle that does not pass through land. For crude results, it is enough to use ‘ne_110m_land’ and evaluate this function at the vertices of the mesh:

def LargestCircleThroughOcean(
    lat, lon, land, r0=90, r1=0, rstep=0.1):

  def CircleIntersectsLand(lat, lon, r):
    circle = Circle(lat, lon, r, proj)
    return any(
        g.intersects(circle) for g in land_projected)

  proj, land_projected = Project(lat, lon, land)
  for r in numpy.linspace(
      r0, r1, abs(r1 - r0) / rstep + 1):
    try:
      if not CircleIntersectsLand(lat, lon, r):
        break
    except geos.TopologicalError:
      return -1.
  else:
    return r
  while rstep > 1e-10:
    rstep /= 2.
    rm = r + rstep
    circle = Circle(lat, lon, rm, proj)
    if not CircleIntersectsLand(lat, lon, rm):
      r = rm
  return r

I fine-tuned the results similarly to the way described above. You may have guessed that there is a large circle in the Pacific Ocean between Queensland in Australia and Baja California Sur in Mexico, shown here in the orthographic projection:

 Pacific

But I bet you would not imagine that its radius is beaten by this circle around Australia and Antarctica:

Australia-and-Antarctica

This is its tour de force south of the Rote Island, between the Thursday Island and the Moa Island in the Torres Strait, and through the Louisiade Archipelago:

Torres-Strait

Here is the link to the code and some open problems: one hard, one tricky, and two easy ones. Please leave a comment if you solve any of them.

  1. What hemisphere of Earth has the most ocean? Hints: i. use an equal-area projection, ii. Shapely objects have an area property.
  2. What hemisphere has the most land?
  3. What is the largest small circle that intersects only land? You can confine the search to Asia and find the radii of candidate circles through plain binary search because the largest circle does not contain the Caspian Sea.
  4. How would the results change on an Earth ellipsoid?
Slicing Earth, Carefully

Smiths, Millers, Priests: European Occupational Surnames

Here is the map of the most frequent occupational surnames in European countries and the corresponding trades.

surnames

Country Surname Transliteration
Belarus Кавалёў Kavalyow
Bulgaria Попов Popov
Greece Παπαδόπουλος Papadopoulos
Macedonia Поповски Popovski
Russia Кузнецов Kuznetsov
Serbia Поповић Popovic
Ukraine Мельник Melnyk

I made it with Cartopy, Shapely, and Natural Earth data. The surnames are taken mainly from the appropriate Wikipedia page. Redditors provided data for Sweden, Norway, Lithuania, Latvia, Bulgaria, Macedonia, Serbia, Montenegro, Bosnia and Herzegovina, Turkey, and Catalonia (Ferrer = Smith), as well as corrected my mistakes in Ukraine and Austria. I sincerely appreciate their help. Click on the links to see relevant comments.

This is a quick hack, not serious research. The map takes into account countries rather than ethnic or cultural areas (update as of October 1, 2015: now the maps of Spain and Serbia include the most frequent Catalan and Kosovar occupational surnames, respectively). The methodology is simplistic: I always picked the most frequent occupational surname even though Wikipedia aptly notices that in the Netherlands the set of {Smit, Smits, Smid, de Smit, Smet, Smith} outnumbers both {Visser, Visscher, Vissers, de Visser} and {Bakker, Bekker, de Bakker, Backer}. Similarly, redditors commented that {Schmidt, Schmitt, Schmitz, Schmid} outnumber {Meier, Meyer, Maier, Mayer} and {Müller} in Germany, {Maier, Mair, Mayer} outnumber {Huber} in Austria, {Seppälä, Seppänen} outnumber {Kinnunen} in Finland, and {Herrero, Herrera, Ferrer} outnumber {Molina} in Spain. I learned that occupational surnames are alien to Nordic countries so Möller and Møller are relatively rare imports in Sweden and Norway, that Molina and Ferreira are “second-order occupational surnames” as they derive from places rather than from professions, and that surnames in Turkey are so recent invention that Avcı probably was not a real occupation.

In case this is not obvious, the disappearance of the smallest countries on the map is not my fault.

The code is available on Bitbucket. Please leave a comment if you know how to fill the blank spots or have other ideas for improvement.

Smiths, Millers, Priests: European Occupational Surnames