Wisła in Fact Likes Cracovia but Doesn’t Know How to Start Talking

The relationships between fans of football clubs in Poland come in four kinds: neutrality, friendship (zgoda), enmity (kosa), or alliance (układ). The belief that there are two disjoint blocs gathered around The Great Triad (Arka, Cracovia, and Lech) and Three Kings of Great Cities (Śląsk, Wisła, and Lechia) is false. Here is the largest connected component of the graph of friendships.


The graph of enmities would be less clear. For instance, Cracovia has friendships with Tarnovia and Sandecja but Tarnovia and Sandecja are enemies. Or GKS, Górnik, Ruch, and Zagłębie: every two of them are enemies.
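For readers who want to reproduce the idea, here is a minimal sketch of finding the largest connected component of a friendship graph. The edge list is a toy one built only from clubs named above, and treating each triad as three mutual friendships is an assumption. The real edge list from the source page is not reproduced here, so this toy graph necessarily falls apart into two components, unlike the full graph.

```python
from collections import defaultdict, deque

# Toy edge list (assumption): each triad is treated as three mutual
# friendships, plus the two Cracovia friendships mentioned in the text.
FRIENDSHIPS = [
    ("Arka", "Cracovia"), ("Arka", "Lech"), ("Cracovia", "Lech"),
    ("Śląsk", "Wisła"), ("Śląsk", "Lechia"), ("Wisła", "Lechia"),
    ("Cracovia", "Tarnovia"), ("Cracovia", "Sandecja"),
]

def largest_component(edges):
    """Return the largest connected component of an undirected graph as a set."""
    neighbours = defaultdict(set)
    for a, b in edges:
        neighbours[a].add(b)
        neighbours[b].add(a)
    seen, best = set(), set()
    for start in list(neighbours):
        if start in seen:
            continue
        component, queue = {start}, deque([start])
        while queue:  # breadth-first search from an unvisited club
            node = queue.popleft()
            for other in neighbours[node] - component:
                component.add(other)
                queue.append(other)
        seen |= component
        if len(component) > len(best):
            best = component
    return best

component = largest_component(FRIENDSHIPS)
print(sorted(component))
```

With the complete list of friendships from the source, the same function would return the component drawn in the figure.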

Source: http://polscyhools.w.interiowo.pl/ekipy.html.


Stylometry—It Works! (in Some Circumstances)

This post was supposed to reveal the author of the 13th Book of Pan Tadeusz, an anonymous pornographic sequel to the Polish national epic. Despite my attempts that took into account rhyming sounds, word syllable count, and custom morphological analysis for Early Modern Polish, I failed to identify the author. Which is not that bad: authorship attribution, especially when regurgitated by journalists, is often reduced to ex cathedra statements: “a computer has proven that work X was written by author Y”; the fact that the confidence level is unknown is not reported.

Instead of a literary discovery, I present you with a little game: Which Polish text is your writing like? It tells me that The 13th Book is most similar to Antymonachomachia by Ignacy Krasicki, who died 33 years before the publication of Pan Tadeusz. Oh well.

The game is based on texts from Wolne Lektury, the Polish equivalent of Project Gutenberg. I appreciate Radek Czajka’s help in downloading them.

Since I know little about writing style analysis (known as stylometry), the entire sophistication of my program lies in calculating the frequency of a few dozen tokens in each text. This idea is similar to Alphonse Bertillon’s anthropometry, an efficient late-19th-century system of identifying recidivists by classifying eleven body parts as small, medium or large.

We compare text style rather than text topics, so the program pays little attention to content words. It counts final punctuation marks, commas, and 86 frequent function words, that is conjunctions, prepositions, adverbs, and so-called qubliks. These counts are divided by the total number of tokens in the text, yielding a 90-dimensional vector of token frequency for each text.
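A minimal sketch of such a frequency vector follows. The feature list is a hypothetical, heavily shortened stand-in for the 90 tokens (I assume the final punctuation marks are the full stop, question mark, and exclamation mark), and the tokenizer is a simple regular expression rather than the one actually used.

```python
import re

# Hypothetical, shortened feature list: final punctuation marks, the comma,
# and a handful of function words instead of the 86 used in the post.
PUNCTUATION = [".", "?", "!", ","]
FUNCTION_WORDS = ["i", "w", "na", "nie", "się", "że"]
FEATURES = PUNCTUATION + FUNCTION_WORDS

def tokenize(text):
    """Split text into lowercase words and single punctuation marks."""
    return re.findall(r"\w+|[^\w\s]", text.lower())

def frequency_vector(text):
    """Return the relative frequency of each feature among all tokens."""
    tokens = tokenize(text)
    total = len(tokens) or 1  # avoid division by zero for empty input
    return [tokens.count(feature) / total for feature in FEATURES]

vector = frequency_vector("Nie wiem, czy to prawda. I nie chcę wiedzieć!")
print(dict(zip(FEATURES, vector)))
```

Content words such as wiem or prawda count only towards the total, so they affect the vector as little as possible.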

The figure below shows the results of hierarchical clustering of the texts longer than 5000 tokens, obtained with

        frequency_matrix, method='ward', distance='euclidean'))


I, for one, am impressed by how it gathers together most of the texts written by Kasprowicz, Krasicki, Rzewuski, and Sienkiewicz, or translated by Boy–Żeleński and Ulrich.
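To show what Ward clustering does with such vectors, here is a naive pure-Python sketch. It is a stand-in for the library call, not the code behind the figure: at each step it merges the two clusters whose union increases the within-cluster sum of squares the least. The function names and the four sample vectors are made up for illustration.

```python
from itertools import combinations

def centroid(points):
    """Coordinate-wise mean of a list of equal-length vectors."""
    return [sum(column) / len(points) for column in zip(*points)]

def ward_cost(a, b):
    """Increase in within-cluster sum of squares caused by merging a and b."""
    ca, cb = centroid(a), centroid(b)
    squared = sum((x - y) ** 2 for x, y in zip(ca, cb))
    return len(a) * len(b) / (len(a) + len(b)) * squared

def ward_merges(vectors):
    """Naive agglomerative clustering; returns the merges in the order made."""
    clusters = {i: [v] for i, v in enumerate(vectors)}
    members = {i: (i,) for i in clusters}
    merges = []
    while len(clusters) > 1:
        i, j = min(combinations(clusters, 2),
                   key=lambda p: ward_cost(clusters[p[0]], clusters[p[1]]))
        merges.append((members[i], members[j]))
        clusters[i] += clusters.pop(j)  # cluster j is absorbed by cluster i
        members[i] += members.pop(j)
    return merges

# Made-up frequency vectors: texts 0 and 1 are alike, and so are 2 and 3.
vectors = [[0.10, 0.02], [0.11, 0.02], [0.30, 0.09], [0.31, 0.08]]
print(ward_merges(vectors))
```

The list of merges, read from first to last, corresponds to a dendrogram read from the leaves to the root. This version recomputes every cost at every step, so it only suits small inputs.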

How reliable are the results? To answer this question, I perturbed the token counts: for each text composed of N tokens, I replaced the count k of each counted token with a random variable with the binomial distribution B(N, k/N), that is, the number of heads in N tosses of a biased coin whose heads probability is k/N. For each text from Wolne Lektury, the x axis in the figures below shows the total number of tokens. The y axis shows the frequency with which the point nearest to the perturbed one in the Euclidean metric corresponded to a different text, or to a text by another author/translator, measured over 1000 such random perturbations. In case you wonder how the y axis can be logarithmic and contain zero at once, the plotted variable is log(y + 0.001).
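The experiment can be sketched as follows for the text misattribution case. The data layout (a list of pairs of N and the feature counts) and the function names are my assumptions for this illustration. The default number of trials is lowered to 200 to keep the pure-Python coin tossing fast.

```python
import math
import random

def binomial(n, p, rng):
    """Number of heads in n tosses of a coin with heads probability p."""
    return sum(rng.random() < p for _ in range(n))

def nearest(point, references):
    """Index of the reference vector closest to point (Euclidean metric)."""
    return min(range(len(references)),
               key=lambda i: math.dist(point, references[i]))

def misattribution_rate(counts, index, trials=200, seed=0):
    """Share of perturbations of text `index` whose nearest text is another one.

    counts[i] is a pair (N, [k_1, k_2, ...]): the total number of tokens
    and the counts of the tracked tokens in text i.
    """
    rng = random.Random(seed)
    references = [[k / n for k in ks] for n, ks in counts]
    n, ks = counts[index]
    misses = 0
    for _ in range(trials):
        perturbed = [binomial(n, k / n, rng) / n for k in ks]
        if nearest(perturbed, references) != index:
            misses += 1
    return misses / trials

# Two made-up texts of 200 tokens each with very different token counts.
print(misattribution_rate([(200, [100, 10]), (200, [10, 100])], index=0))
```

The author variant would compare the author of the nearest text instead of its index.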

I approximated both the text misattribution probability and the author misattribution probability by 1 − (erf(√N/c))^b, with empirical values of the constants b and c depending on the language, the tokens, and the texts.

Here is my hand-waving explanation of this formula. The coordinates of perturbed points, multiplied by N, have a multivariate binomial distribution (it does not matter whether the coordinates are correlated or not). When N approaches infinity and k/N remains constant, the binomial distribution is asymptotically normal with variance proportional to N (by the central limit theorem applied to tossing the coin), and the multivariate binomial distribution is asymptotically multivariate normal. Dividing the random variables by N, we return to the coordinates, which asymptotically have a multivariate normal distribution with individual variances and covariances proportional to 1/N.

The points divide the 90-dimensional vector space into Voronoi cells whose centres correspond to the mean vectors of the distributions. Moving a point to the other side of some wall of its Voronoi cell means moving it by more than d, the distance from the centre to that wall, in the direction perpendicular to the wall. The projection of any multivariate normal distribution with variances and covariances proportional to 1/N onto a vector is a (univariate) normal distribution with variance proportional to 1/N. The probability that a normal random variable with variance σ² = a/N differs from its mean by more than d (that is, that the perturbed point crosses the wall, causing a misattribution) equals 1 − erf(d/(σ√2)) = 1 − erf(d√N/√(2a)) = 1 − erf(√N/c), where c = √(2a)/d. Since the Voronoi cell has many walls in different directions, the overall probability that the point exits its cell is approximately equal to 1 − erf(√N/c_1) × ⋯ × erf(√N/c_n). Since 1 − erf decreases rapidly, the factors with the largest c_i (those of the nearest walls) dominate the product, so the whole expression can be approximated by the formula 1 − (erf(√N/c))^b.
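To get a feeling for how quickly this approximation falls with the text length, one can evaluate it directly. The values of b and c below are made up for illustration and are not the fitted constants.

```python
import math

def misattribution_probability(n_tokens, b, c):
    """Approximate misattribution probability 1 - erf(sqrt(N)/c)**b."""
    return 1 - math.erf(math.sqrt(n_tokens) / c) ** b

# b=50 and c=40 are hypothetical; the post fits both constants empirically.
for n in (1773, 5000, 20000):
    print(n, misattribution_probability(n, b=50, c=40))
```

With these hypothetical constants, a text of 1773 tokens is almost always misattributed, while a text ten times longer almost never is.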


The figures explain why it was hard to attribute The 13th Book to an author: even if other works by the author belonged to the Wolne Lektury corpus (they probably do not), The 13th Book has merely 1773 tokens.


Sipping Rum: Some New Palindromes

A somewhat popular sport [1, 2] is extending Leigh Mercer’s immortal palindrome “A man, a plan, a canal—Panama!” It occurred to me that its principle can be applied to the Polish palindrome by Julian Tuwim: “Popija rum as, samuraj i pop.” (“An ace, a samurai, and an Orthodox priest are sipping rum.”) All we need is a computer program and a list of Polish animate nouns. Here we go:

Popija rum as, said, diak, goj, drab, tokolog, igrek, odlewca, mim, tenor, abba, rodak, imam, alkad, gigant, alb, ober, retor, fan, ilot, rapper, nowy car, usar, adresat, efor, papa, grek, saper, treser, epik, bob, angol, ananas, aga, mameluk, urka, tatka, ergolog, ladro, lis, ork, induna, grum, fleja, batiar, akyn, wał, sowar, psar, kudła, renegat, symplak, ilota, kat, alumn, amor, eponim, daremnik, spec, tan, gajowy, durnota, kret, inka, mods, esbol, rajtar, bidak, tamada, mongoloid, arat, sir, abat, imamita, barista, radiolog, nomada, mat, kadi, brat, jarl, obses, domak, niter, katon, rudy woj, agnat, cep, skin, mer, admin, operoman, mulat, akatolik, alp, mysta, generał, duk, ras, prawosławny, karaita, baj, elf, murga, nudnik, rosi, lord, algolog, reak, tata, kruk, ulema, mag, asan, analog, nabob, kiper, eser, trep, asker, gapa, profeta, serdar, asura, cywon, rep, partolina, froter, reb, oblat, nagi gdak, lama, mikado, rab, baronet, mima, cwel, doker, gigolo, kot, bard, jog, kaid, dias, samuraj i pop.

(The adjectives nowy, rudy, and nagi got mixed among nouns. For a better effect, I manually removed the commas that followed them.)
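A sentence of this length is hard to verify by eye, so a checker is handy. This small sketch only tests whether the letters read the same in both directions, ignoring spaces, punctuation, and case. It is not the generating program.

```python
def is_palindrome(sentence):
    """True if the letters of the sentence read the same in both directions."""
    letters = [ch for ch in sentence.casefold() if ch.isalpha()]
    return letters == letters[::-1]

print(is_palindrome("Popija rum as, samuraj i pop."))
print(is_palindrome("Popija rum samuraj."))
```

The first call checks Tuwim’s original sentence and the second a deliberately broken variant.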

Lazily, I used a slightly modified version of Peter Norvig’s backtracking program. I extracted the nouns from a text file used in the Polish morphological analyzer Polimorfologik. The lines of the file look like this:

samuraj         samuraj subst:sg:nom:m1
samuraja        samuraj subst:sg:acc:m1+subst:sg:gen:m1
samurajach      samuraj subst:pl:loc:m1
samurajami      samuraj subst:pl:inst:m1

The appropriate forms of nouns can be extracted with

$ grep 'subst:sg.*nom.*m1' polimorfologik.txt | cut -f 1 > npdict.txt

The m1 class contains masculine-personal (aka virile) nouns. Although the names of animals from the m2 class would also suit our purposes, that class also contains names of odd things like currencies, dances, car brands, or mushroom genera that would look strange in the sentence. With apologies to feminists, I have no means of extracting feminine-personal or neuter-personal nouns automatically, as they play no special role in Polish grammar.
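The same extraction can be sketched in Python. I assume that the three columns are separated by tabs (as cut -f 1 suggests), that alternative readings are joined with "+", and that features are separated by ":" or ".". Unlike the grep pattern, this version requires all the features to occur in one and the same reading.

```python
import re

def virile_nominatives(lines):
    """Yield forms tagged as singular nominative masculine-personal nouns."""
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # skip malformed lines
        form, tags = fields[0], fields[2]
        # Alternative readings are joined with "+".
        for reading in tags.split("+"):
            features = re.split(r"[:.]", reading)
            if features[0] == "subst" and {"sg", "nom", "m1"} <= set(features):
                yield form
                break

sample = [
    "samuraj\tsamuraj\tsubst:sg:nom:m1",
    "samuraja\tsamuraj\tsubst:sg:acc:m1+subst:sg:gen:m1",
    "samurajach\tsamuraj\tsubst:pl:loc:m1",
]
print(list(virile_nominatives(sample)))
```

Replacing "m1" with another gender tag, or "sg" with "pl", selects other noun classes in the same way.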

The palindrome above contains only singular forms of common nouns (152 words in total). If we also allow singular masculine forms of adjectives, we can get a 269-word palindrome:

Popija rum as, said, diak, goj, drab, perski murga, kadi beż netto, klawy rzutki froter, kto, penolog, iglany sini magaski frant, utyty bojowny kaper, trak, sanowy cynawy rusy rotny sracz, sowar, aspan, ilot, raptor, mamlas, on, rebe, wali lis, jebak, cacy woli amor, fan, wodnik, ergolog, lama, mim, asker, gad, jarski men, as, pajac, darmy rosi preser, oferent, rapsod, nabab, abat, symplak, induna, tebriski gamrat, akatolik, rodak, ladro, lisi migany wandal, papa, cwel, doker, gid, rumski golkiper, asura, elew, okej arat, siwy ratar, maksi potowy żywotni wig, orski alb, obyły baca, cywil, paskuda, gemajn, inka, tępy wolowaty raby skin, turkos, esbol, użyty spec, ki marecki mima, kruk, a-ż popi lżywy wozak, angol, ananas, a-z raja, jary picer, kok, tato, nominat, inaki cap, ospały wilk, muli pupka, mods, aborter, dr, enat, siny talib, abba, reb, be rab, babi latynista, nerd, retro, bas, domak, pupil, umkliwy łaps, opaci kani tani mono tatko, kreci pyra, jajarz, asan, analog, nakazowy wyżli pop, żak, urka, mimik, ceramik, cep, syty żul, obses, okrutnik, sybaryta, wolowy pętak, ninja, mega duk, sapliwy caca były bob, laik, srogi wintowy żywotopis, kamrat, arywista, rajek, ow, elear, usar, epik, logik, smurd, igrek, odlewca, papla, dnawy nagi misi lord, alkad, ork, ilota, katar, magik, sir, beta, nudnik, alp, mysta, baba, bandos, partner, efor, eser, pisorym, radca, japs, anemik, srajda, grek, sam, imam, algolog, rekin, down, afro, mailowy cacka, bej, si lila, weber, nosal, mamrot, partolina, psar, awosz, car, syn, torys, urywany cywon, askar, trep, akyn, woj, obyty tutnar, fiks, aga, mini synal, gigolo, nepot, kret, orfik, tuz, rywal, kot, tenże bidaka, grum, iks, rep, bard, jog, kaid, dias, samuraj i pop.

Using plural and proper nouns, we can reach at least 1493 words, for instance:

Popija rum as, kalif, drab, Belg, Remus, Leon, pank, said, diak, Goj, Acis, Pac, Tabak, Kajus, reb, luj, Iwon, Noe, Selim, kacap, Damon, Melcer, epicy, Rob, Atkinson, Jahn, Wahl, Omar, turowiec, ubici, nabab, Mobutu, helota, Nagórski, Dyda, kraker, Gola, Timur, Gil, Ramzes, Romanik, imperator, Ajnos, waleci, biker, rajtar, Umer, Miotk, Popek, Nils, Lefeld, epik, car, Tamil, Amoni, Capała, Sarnat, Jeron, Urban, Orwelle, Tym, Einar, Usarek, rabi, nemrodzi, Kramnik, Sałacki, Sawini, basza, Idzi, Zaremba, raja, Rogacz, Rubeni, Atlas, ontolog, anemicy, Ron, imitator, Inka, Bulik, setkarz, sublokator, Eisler, akatolicy, tato, idioci, Vidor, Apacz, Alan, Ziomek, Albin, Rom, Oktawian, odaj, Arka, kadi, boss, Ursyn, Ahmad, Dassin, Eden, Siadlak, Oscar, Idec, udecy, Rupert, rastamani, Armin, Onak, Rola, Gill, Eco, były, Tom, Eluard, Niski, Nyk, Anatol, Olson, Orkan, rasta, Geller, Bask, oblat, Emil, Ado, Izydor, ras, sipaj, Amnon, Nelson, imam, lapicyda, Bill, Iwan, Alo, Skiba, Tim, Jankes, Uryga, Nycz, Darek, Ardelli, arbek, lirycy, Raczek, Orion, Roda, Cortazar, Odo, idol, Otis, Sobik, cep, opaci, Sak, Lew, Morka, bydlak, Sommer, Feret, Tyrsi, robol, opilec, Rama, Gert, ufolog, Iżyccy, Malka, Kwak, Malak, Tal, Redak, Solti, Lopez, Sot, rabbi, Latacz, Darren, Sopata, Taj, ninje, limnolog, flisak, udek, Kimak, Byrski, Fibak, Masaj, Elsner, efeb, lokator, Papała, Bacik, Cepil, Sabat, sietniak, tokolog, Igrek, Sade, Mahomet, Opacki, sowar, Racki, Amor, Rawik, suswał, Syta, rwacz, Dow, Zadura, Dunin, Olas, Bakuła, Bil, Loeb, ergolog, Lasek, Liw, jarle, Varga, mener, Tabaka, logicy, Razin, Roja, Paganini, Lak, Baran, ilot, rapper, twoi, Wołosi, Colin, Eweni, Borak, Jaksa, gimnazjasta, Rams, Uri, Cywka, Durka, Meisels, Ilia, Koj, Jordi, Vadim, Arrow, Agaton, Rudnik, Eric, Neil, knajak, Tyrawa, Bazan, Nestor, Olek, Nawoj, uwol, Racine, Magnus, mahdi, Wronka, Bełza, Floryni, Zub, Ołdak, Lasota, Linde, Ibsen, Negr, Ezopi, Woda, Swatek, odlewcy, Cezary, preser, eks, ubol, Koba, 
katar, Katz, Sapir, Nehru, lider, ubek, Collina, blogger, git, Nastak, Inuita, Maj, akolici, Fuk, nomada, rapsod, Adad, Jagger, autycy, Ted, Jeka, Jakub, ulema, Kwoka, Byrnes, rajas, abaci, Bukała, Bigos, satrapa, kortyzani, Lewek, alastor, profes, Ares, Kobak, lord, Labuda, Prada, Gleba, Trak, Sas, Able, Baka, Dubik, Cortez, Carsten, Rokita, Sornat, stenograf, Asser, Roth, celnik, Surmiak, Kern, Elgar, Edgar, Agis, tutnar, Ozawa, Kret, ultrasi, Bulak, Rurak, Dubois, Eldar, Bator, askar, Buksa, Kulig, Annamita, Rodak, Edyp, Agha, psor, Engel, Orione, Rota, King, asan, agregat, symplak, Ezaw, Ed, Toni, Gamrat, Artur, Ammon, organik, radny, doyeni, woj, agnat, sir, tatul, Pini, Drabik, modele, Depa, Dowi, Losey, Lesik, Sobota, Lis, pajac, Rolnik, Topor, Knysak, Turner, Olsen, Rabin, Mizak, Lada, Tyl, kady, filareci, patron, etatowi, Sasak, lemani, Wontor, Wanat, esbol, logik, Seliga, Mały, Waluk, Komsta, radiesteci, Nata, lump, Radlak, Stefani, Samsel, Tibor, Deptuła, baje, rabini, Norris, Ajmar, Grecy, zabici, Cini, akrobata, Dustin, Abel, Bąba, krytycy, Brando, Baj, Lizut, Zin, Bielat, Fokker, Edison, Jarema, logograf, Ilje, Bata, Pazik, smerd, Renat, Hempel, Rus, Rey, matoł, Spytko, jubilat, fani, Sudnik, etolog, Nanaj, Abram, Otokar, wyrypaje, Izraele, Elkana, Motak, Josh, Calik, Kelles, seksoman, acanek, Kazulin, Odon, alim, Kahn, abba, Dante, Mulawa, Lasak, Puk, Jagła, renegat, Samon, Osak, Nijak, Golan, Aznar, Ficek, Odil, aktor, Gruza, maruda, Mick, eleata, Pol, Alec, Lehr, Habakuk, oblaci, Fin, Adam, Rojak, modsi, Welt, Opałka, Sroka, Galaty, Malik, Sałata, Inuici, Walas, homeopata, Kunz, Siwik, Cąkała, Kazko, Wonder, Flak, urka, Maldini, mohele, nemrod, Nash, Allan, orator, Mamak, John, Eros, Kenar, Fik, Cebulak, Lompa, Grek, Rapak, Pot, stangreci, Pen, Rubaj, elf, Kukiz, androfag, Rumas, Ornat, Ante, fajter, rajca, geje, Becu, Mak, lokat, pedagog, adept, togat, Simon, Erik, Cudak, sensat, Natanek, inki, Piccoli, kretyn, Liszcz, sipaje, tamada, Morgała, 
Duff, Alois, Anan, Aresi, Reje, Makuła, Bułat, Olak, Jedak, Repin, Nita, Kudyba, Breza, gej, Dejon, Gała, Parda, Hatak, Lussac, Ulf, larrikini, leweller, rafiner, Atapask, insi, Boni, Capone, Fryz, sumici, Mann, Arent, Socyn, Tomas, Roger, groom, kat, Rufin, Tomasze, Najder, froter, Pinda, Rataj, Zappa, Prodi, skini, model, papa, Dudała, Kalukin, Lisik, Swat, Sycz, Durak, trombon, Snell, atleci, foks, Kamil, Lutz, Solak, Renn, Ernest, pokraka, Meir, uczeni, tramp, pedał, Duk, wał, Carnot, Romeo, Fedak, Ratka, Dydycz, Rubik, Celej, Opoka, baca, Kiwak, Bubak, Orest, Keler, Ebert, Saługa, mimik, pupka, Trawka, mods, Rudi, Wadas, Korpak, Majak, Messi, papla, Sawka, tępak, Dudycz, Delon, Allach, Corelli, Heine, Lejb, Resnais, sahib, mozarab, Otto, Klein, adresant, sardar, homo, niter, Al, Hadała, Bugała, Basta, serdar, DJ, adamici, Numida, wałkoń, erudyci, Gama, Kutz, Skałka, nudnik, Nobel, opętany, zupak, typas, Elamici, Follett, Sujka, Basak, Tutak, Pałka, Perez, Cliff, Latała, hip, opat, Sierak, Pęksa, Haba, Bremer, kok, Loba, Pękała, pokraki, Kasiak, Lepka, Boba, goje, tatka, Pułka, Pełka, Josif, Lars, esbole, Druszcz, sułtan, Hawel, pludrak, Solon, gapa, Klim, Kazik, alumni, Soski, Marecki, Sowa, profeci, Noam, Nalepa, Korda, Lece, dewot, Stowe, Dece, ladro, kapelan, Mao, Nicefor, Paw, Osik, ceramik, Sosin, Mulak, Izak, Milka, Pagnol, Oskar, Dul, Plewa, Hnat, Łuszcz, Surdel, obses, Ralfi, Sojak, łepak, Łupak, Tate, joga, Bobak, Pelka, Isak, Ikar, Kopała, Kępa, Bolko, Kremer, baba, Has, Kępka, reista, popi, Hałat, Alf, Fil, Czerepak, Łapka, Tutka, Sabak, Just, Tell, ofici, Malesa, Pytka, Puzyna, tępole, Bonk, induna, Kłak, Sztuka, magicy, dureń, Okła, Wadim, unici, Madaj, dr, adresat, Sabała, Gubała, Dahl, Aretino, Mohr, Adrast, Naser, Daniel, Kott, Obara, zombi, Hass, Ian, Serb, jelenie, Hille, Roch, Callan, Oledzcy, Dudka, pętak, wasal, Papis, Semka, Jamka, Proksa, Dawid, Urs, domak, Wartak, pupki, mima, Guła, streber, elekt, Seroka, Bubka, Wika, Cabak, opoje, 
Lecki, burzcy, Dydak, Tarka, Defoe, Morton, Racław, Kudła, Depp, Martinez, Curie, Makar, Kopt, Sen, Renner, kalosz, Tulli, Maks, Kofi, Celt, Allen, snob, Mortka, Rudzcy, Stawski, silni, Kula, Kała, Duda, paple, Dominik, Sidor, Papp, Azjata, radni, pretor, Fred, Janez, samotni, Furtak, Moor, Gregor, samotny, Costner, Annamici, muszyr, fenopaci, Nobis, Niksa, patareni, Farrell, Ewelini, Kir, Ralf, Lucas, Sulka, Taha, Drapała, gnoje, DJ-e, gazer, baby, Dukat, inni, Perka, Dej, Kalota, Łuba, Łuka, Mejer, Iser, ananasi, Olaf, Fudała, Gromada, Mateja, Piszcz, silny, Terki, Locci, piknik, enat, Antas, Neska, Ducki, renomista, Gott, pedagoga, Depta, Kolka, muce, beje, Gac, Jarret, Jafet, Natan, Rosa, murga, Ford, nazi, Kuk, fleja, Burne, picer, Gnat, Stopka, Parker, gap, Molka, Lubecki, Franek, Soren, Hojka, mamrot, Aron, Allah, Sandor, menele, hominid, lama, Kruk, Alfred, Nowok, zakała, Kącki, Wisznu, Kata, Poe, Mohs, alawici, uniata, Łaski, lamy, Talaga, Korsak, Łapot, Lewis, Domka, Jorma, Dani, Fic, alb, Okuka, Bahr, Helcel, alopata, elekci, Madura, Mazur, Grot, Kali, dokeci, Franz, analog, Kaj, Inkas, onomasta, generał, Gaj, Kupka, Salawa, Lumet, Nadab, ban, Hak, Milan, Odoni, luzak, Ken, acan, Amos, Kessel, Lekki, Lach, Sojka, Tomana, Klee, Lear, Zieja, pyry, wrak, Otomar, Bajan, Angol, Otek, Indusi, Naftali, Bujok, typ, Słota, Myers, Urlep, Mehta, nerd, Remski, Zapata, bej, Lifar, gogo, lamer, Ajnosi, Derek, Kofta, Leibniz, tuz, Ilja, Bodnar, Bycy, Tyrka, bąble, banit, Suda, Tabor, kainici, Ciba, zycer, Gram, Jasir, ronini, Bareja, Bałut, Pedro, bitles, Masina, Fet, skald, Arp, Mulat, Anicet, seid, arat, Smok, kulawy, łamagi, Leski, Gollob, Seta, Nawrot, Nowina, Melka, Sasi, Wota, tenor, tapicer, Ali, Fyda, Klyta, Dalka, Zimni, Barnes, Loren, Rutka, syn, Kropotkin, Lorca, japsi, Lato, boski, Selye, Soliwoda, pedele, Domki, Bardini, Pluta, Tristan, gajowi, Ney, Odyn, Darkin, agronom, Marut, ratar, Maginot, dewa, zek, alp, mysta, Ger, Ganas, Agni, Kato, Renoir, Oleg, 
Nero, spah, gapy, Deka, Dorati, man, nagi, Lukas, Kubrak, Sarota, Brad, Lesio, Budka, Rurka, lubi, Sart, Luter, Kawa, Zoran, Tutsi, Gara, gdera, Glen, Rek, Kaim, Ruskin, Lech, Torres, Safar, Gonet, Stan, Rosati, Kornet, sracze, Trocki, Buda, kabel, Basa, skartabel, Gad, Arpad, Ubald, Rolka, bokser, as, efor, Prot, Salak, Ewelin, Azy, Troka, par, Tasso, Gibała, Kubica, Basaj, Arsen, Rybakow, Kamel, Ubu, Kajak, ejdetycy, Tuareg, Gajda, Dados, Parada, Monk, ufici, lokaj, Amati, unikat, Santi, Greg, Golba, Nil, Locke, Bur, Edi, Lur, Henri, Paszta, Krata, kaboklo, busker, eser, pyra, zecy, cwel, doketa, wsadowi, pozer, Gennes, biedni, Latos, alkad, łobuziny, Rolf, az, łebak, Norwid, Hamsun, gameni, Carlo, wujo, Wankel, Or-Ot, Senna, Zabawa, Rytka, Jan, klienci, rekin, durnota, Gawor, Rami, David, Roj, Jokai, Lisle, Siemak, Rudak, wycirus, Marat, Saj, Zan, Migas, Kajka, Robin, Ewen, iloci, Sołowiow, trep, partolina, rab, Kalinin, aga, Pajor, nizaryci, Golak, abat, Rene, mag, Ravel, Raj, Wilkes, algolog, rebe, Olli, Bałuk, Absaloni, nuda, Ruda, zwodzca, Wratysław, Suski, Warro, Maik, Carra, Wosik, Capote, Mohamed, asker, gigolo, Kot, Kain, teista, bas, Lipecki, Cabała, Paprota, Kolbe, Ferens, Leja, Sam, kabi, fiks, Rybka, Mikke, Dukas, Ilf, Golon, Milej, ninja, tata, Posner, radzca, Talib, Bartosze, Polit, Loska, Derlatka, Lam, Kawka, Klamyccy, żigolo, Futrega, Marceli, Polo, Boris, Rytter, Efrem, Moskal, Dyba, Kromwel, Kasica, Popecki, Bossi, Tolo, Diodor, Aza, Troc, Adorno, Irokez, carycy, Rilke, Braille, Drake, Radzcy, Nagy, Rusek, najmita, Bik, Solana, Will, ibadyci, Palma, Minos, Lennon, Maja, Pissarro, Dyzio, Dali, metal, Boksa, Brel, legat, Sarna, Kronos, Lolo, tan, akyn, Iks, Indra, ulem, otyły, Bocelli, Gal, ork, anonim, Raina, mat, Sartre, puryce, duce, Dirac, Sokal, Daisne, Denis, Saddam, hanys, Russo, bidaka, Kraj, Adonai, Wat, komorni, Blake, moi, znalazca, Parodi, Vico, idiota, Tyc, ilota, Karel, Sierota, Kolbusz, Rak, Teski, Lubak, Niro, Tati, minoryci, 
menago, Lot, nosal, Taine, burzca, Goraj, Arab, mer, Aziz, Diaz, Sabini, Wasik, Cała, skin, markiz, Dor, meni, Barker, asura, niemy, Telle, Wrona, Bruno, Rejtan, Rasała, Pacino, Mali, Matracki, pedle, Fels, Linke, pop, kto, Imre, Murat, Jarre, kibice, Lawson, Jarota, rep, Mikina, Morse, zmarli, grum, italogrek, Arkady, Diks, Róg, Anatole, Hutu, Bomba, banici, buce, Iwo, Rut, ramol, Hawn, Hajnos, Nik, taboryci, Perec, Lem, nomad, Pacak, Miles, eon, nowi, Jul, Ber, Sujak, Kabat, cap, Sica, jog, kaid, dias, Knap, Noel, Sumer, Gleb, bard, Filak, samuraj i pop.


Tracking Spanish Flu through Austro–Hungarian Press

The area of the red circles on this map is proportional to the smoothed relative number of mentions of influenza in newspapers throughout the Austro–Hungarian lands in the last four months of 1918. Just as Google detected influenza epidemics from search engine query data, perhaps these numbers approximate the local time series of influenza morbidity.


The 1918–1919 pandemic, called Spanish flu, killed at least five times more people than World War I. Here is a chart of its mortality in England and Wales (source: Edwin O. Jordan, Epidemic Influenza: A Survey, 1927):


The three waves occurred at similar times worldwide. I concentrated on the second, most deadly wave in the Austro–Hungarian empire and in the states that emerged in its lands. The full-text search of ANNO (AustriaN Newspapers Online, a digitisation initiative of the Austrian National Library) allowed me to count all the occurrences of the words {Grippe, Influenza}, {chřipka, chřipky, chřipce, chřipku, chřipkou} (in the Czech newspaper Dělnické listy), and {grypa, grypą, grypę, grypie, grypy, influenza, influenzy} (in the Polish newspaper Kurjer Lwowski) by the day of their publication in 27 newspapers from 14 cities and towns. A little side discovery is that the name “Spanish flu” in all three languages first appeared in print at the beginning of July 1918.
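Counting the mentions by day can be sketched as below. The input format (pairs of a date and the full text of an issue) and the two sample issues are made up for illustration, since the counts in this post came from ANNO's full-text search, whose interface is not shown here.

```python
import re
from collections import Counter

# Word forms listed above for the Polish newspaper.
POLISH_FORMS = {"grypa", "grypą", "grypę", "grypie", "grypy",
                "influenza", "influenzy"}

def mentions_per_day(issues, forms):
    """Count occurrences of the given word forms per publication date.

    issues is an iterable of (date_string, full_text) pairs; this input
    format is hypothetical.
    """
    counts = Counter()
    for date, text in issues:
        words = re.findall(r"\w+", text.lower())
        counts[date] += sum(word in forms for word in words)
    return counts

# Made-up sample issues, not quotations from Kurjer Lwowski.
issues = [("1918-10-07", "Grypa w mieście. Influenza szerzy się."),
          ("1918-10-08", "Wiadomości z frontu.")]
print(mentions_per_day(issues, POLISH_FORMS))
```

Listing the inflected forms explicitly, as in the post, avoids the need for a morphological analyser but misses forms distorted by OCR.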

This method is inevitably inaccurate, among other things because of OCR errors and the shadowing of the epidemic by other topics such as the surrender of Austria–Hungary, the end of the monarchy, and the independence of national states, but the noise tends to cancel out. I inspected all 29 mentions of flu in Kurjer Lwowski in the week of Oct 7–13, 1918. Out of them:

  • 19 concerned local events,
  • 4 concerned events elsewhere in Austria,
  • 5 concerned events elsewhere in Europe.

I fitted the Gompertz curve y = c \exp(-e^{-b(t-t_0)}) to the cumulative numbers, keeping in mind the fact that the issues of some newspapers from the end of the year are missing from ANNO (recall the independence). For instance, here is the best fit for the cumulative number of times flu was mentioned in newspapers published in Vienna:


and a full-year chart of mentions of flu per week in the same newspapers:


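For the curious, here is one simple way to fit the Gompertz curve mentioned above. It is a swapped-in technique, not necessarily how the fits shown here were obtained: the ceiling c is assumed to be known, which turns the problem into ordinary linear regression. The data are synthetic and noise-free, generated from made-up parameters.

```python
import math

def gompertz(t, c, b, t0):
    """Cumulative count y = c * exp(-exp(-b * (t - t0)))."""
    return c * math.exp(-math.exp(-b * (t - t0)))

def fit_gompertz(times, cumulative, c):
    """Estimate b and t0 for a given ceiling c by linear least squares.

    Since ln(-ln(y / c)) = -b * (t - t0), a straight line fitted to the
    transformed data has slope -b and intercept b * t0.
    """
    points = [(t, math.log(-math.log(y / c)))
              for t, y in zip(times, cumulative) if 0 < y < c]
    n = len(points)
    mean_t = sum(t for t, _ in points) / n
    mean_z = sum(z for _, z in points) / n
    slope = (sum((t - mean_t) * (z - mean_z) for t, z in points)
             / sum((t - mean_t) ** 2 for t, _ in points))
    intercept = mean_z - slope * mean_t
    b = -slope
    return b, intercept / b

# Synthetic data from hypothetical parameters c=1000, b=0.5, t0=7 (weeks).
weeks = list(range(1, 17))
data = [gompertz(t, c=1000, b=0.5, t0=7) for t in weeks]
b, t0 = fit_gompertz(weeks, data, c=1000)
print(b, t0)
```

With real, noisy counts, c would have to be estimated as well, for example by trying several values and keeping the one with the smallest residual. The missing issues at the end of the year make that estimate uncertain.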
I also tried to verify Andrew Price–Smith’s claim that Spanish flu tipped the balance of power towards the Entente in the last days of World War I. Unfortunately, many belligerent states do not have online archives of newspapers from 1918, the German archives are unsearchable, and Gallica and the British Newspaper Archive are useless: the mentions of grippe or influenza in the French and British press are few and far between outside advertisements, a sign of wartime censorship. However, this table from Epidemic Influenza: A Survey, showing that the highest mortality occurred at the same time in France and Germany, makes me doubt Price–Smith’s claim:


The map of Austria–Hungary above comes from MPIDR [Max Planck Institute for Demographic Research] and CGG [Chair for Geodesy and Geoinformatics, University of Rostock] 2012: MPIDR Population History GIS Collection (slightly modified version of a GIS-File by Rumpler and Seger 2010) – Rostock (Rumpler, H. and Seger, M. 2010: Die Habsburgermonarchie 1848–1918. Band IX: Soziale Strukturen. 2. Teilband: Die Gesellschaft der Habsburgermonarchie im Kartenbild. Verwaltungs-, Sozial- und Infrastrukturen. Nach dem Zensus von 1910, Wien).


When Will Earnings in Poland Match Those in Western Europe?

They already did, 450 years ago.


Source of data: Bob Allen’s home page; indirectly also three Polish books: Ceny w Krakowie w latach 1369–1600, 1601–1795, 1796–1914. The prices at the end of the period would appear less inflated if expressed in gold: the gold/silver price ratio was approximately 15.5 until 1872, whereas in the early 1900s, it grew to 33–40.


Bad Philosophy: Clark’s Argument against Spectrum Inversion

Many philosophical debates are haunted by Locke’s thought experiment of spectrum inversion: “if the idea that a violet produced in one man’s mind by his eyes were the same that a marigold produced in another man’s, and vice versa” [1].

Austen Clark attempted to prove rigorously that spectrum inversion would be detectable by observing the subjects’ response to colours [2, 3]. He argued as follows: Suppose that two people perceive mutually inverted spectra and agree in their judgements of visual stimuli and the relations between them. Therefore, the colour solid (i.e. the visible colours as co-ordinates in some space) must exhibit symmetry. Since this solid (represented in three-dimensional Euclidean space e.g. as the Munsell colour system [4]) is in fact asymmetric, the premise must be false.

Can you spot the fallacy? It stems from assuming that everyone’s colour solid is identical. Clark mistook a textbook model for the real thing — a fallacy known as reification. The map is not the territory. Both the Munsell colour system and other proposed colour solids abstract away from differences in visual perception that occur among observers and even across experiments with the same observer [5–9]. One can easily conceive of a personal colour solid that remains the same (within measurement error) when mirrored or rotated. If there cannot be people who function like us with a mirrored or rotated colour solid, it is not its asymmetry that forbids their existence.

[1] Locke, John. “An Essay Concerning Human Understanding”, book II, ch. XXXII, par. 15. http://humanum.arts.cuhk.edu.hk/Philosophy/Locke/echu/lok0042.htm#15

[2] Clark, Austen. Spectrum Inversion and the Color Solid. Southern Journal of Philosophy, vol. 23, no. 4, Winter 1986, pp. 431–443. http://selfpace.uconn.edu/paper/CSOLID.HTM

[3] Clark, Austen. “A Theory of Sentience”. Oxford University Press: 2000. pp. 17–18. http://citeseerx.ist.psu.edu/viewdoc/download?doi=

[4] https://en.wikipedia.org/wiki/Munsell_color_system

[5] Webster, Michael A.; Eriko Miyahara; Gokhan Malkoc; Vincent E. Raker. Variations in normal color vision. I. Cone-opponent axes. Journal of the Optical Society of America A, vol. 17, no. 9, September 2000, pp. 1535–1544. http://www.ncbi.nlm.nih.gov/pubmed/10975363

[6] Webster, Michael A.; Eriko Miyahara; Gokhan Malkoc; Vincent E. Raker. Variations in normal color vision. II. Unique hues. Journal of the Optical Society of America A, vol. 17, no. 9, September 2000, pp. 1545–1555. http://localhopf.cns.nyu.edu/events/vjclub/archive/webster2000.pdf

[7] Webster, Michael A.; Shernaaz M. Webster; Shrikant Bharadwaj; Richa Verma; Jaikishan Jaikumar; Gitanjali Madan; E. Vaithilingham. Variations in normal color vision. III. Unique hues in Indian and United States observers. Journal of the Optical Society of America A, vol. 19, no. 10, October 2002, pp. 1951–1962. http://wolfweb.unr.edu/homepage/mwebster/assets/pdfs/WebsterJOSA2002.pdf

[8] Malkoc, Gokhan; Paul Kay; Michael A. Webster. Variations in normal color vision. IV. Binary hues and hue scaling. Journal of the Optical Society of America A, vol. 22, no. 10, October 2005, pp. 2154–2168. http://www1.icsi.berkeley.edu/~kay/mkw.josa.2005.pdf

[9] Juricevic, Igor; Michael A. Webster. Variations in normal color vision. V. Simulations of adaptation to natural color environments. Visual Neuroscience, vol. 26, no. 1, January 2009, pp. 133–145. http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2684467/
