Wyżły ruszyły w grąd

Inspired by the article “Vector models and Russian literature”, I wrote a program that paraphrases Polish texts. It uses fastText word vectors pre-trained on Polish Wikipedia by Facebook Research as a thesaurus, Gensim to retrieve similar words, and PoliMorf, a Polish morphological dictionary, to filter the suggestions.
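The post names the tools but not the exact replacement rule. Below is a minimal sketch of one plausible rule, under my own assumptions: each word is replaced by its most similar word by cosine similarity (the measure Gensim's `KeyedVectors.most_similar` ranks by) that carries the same morphological tag, so the inflected form still fits the sentence. Toy two-dimensional vectors and single tags stand in for the fastText model and PoliMorf; `VECTORS`, `TAGS`, and `paraphrase_word` are hypothetical names.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def paraphrase_word(word, vectors, tags, topn=10):
    """Replace word by its most similar neighbour that has the same tag.

    vectors: word -> embedding (stand-in for fastText vectors);
    tags: word -> morphological tag (stand-in for PoliMorf).
    Unknown words are returned unchanged.
    """
    if word not in vectors or word not in tags:
        return word
    neighbours = sorted((w for w in vectors if w != word),
                        key=lambda w: cosine(vectors[word], vectors[w]),
                        reverse=True)[:topn]
    for candidate in neighbours:
        # The morphological filter: keep only candidates with the same tag.
        if tags.get(candidate) == tags[word]:
            return candidate
    return word


def paraphrase(text, vectors, tags):
    return " ".join(paraphrase_word(w, vectors, tags) for w in text.split())


# Hypothetical toy data: "psa" is the closest neighbour of "kot",
# but it is rejected because it is in the genitive, not the nominative.
VECTORS = {"kot": [1.0, 0.0], "pies": [0.9, 0.1], "psa": [0.95, 0.05],
           "idzie": [0.0, 1.0], "biegnie": [0.1, 0.9]}
TAGS = {"kot": "subst:sg:nom:m2", "pies": "subst:sg:nom:m2",
        "psa": "subst:sg:gen:m2", "idzie": "fin:sg:ter:imperf",
        "biegnie": "fin:sg:ter:imperf"}
```

Here `paraphrase("kot idzie", VECTORS, TAGS)` gives "pies biegnie". With such a greedy rule, applying the function a second time can swap the words right back, which is one way iterated paraphrases can drift back toward the original.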

Here are assorted fragments of Polish texts after paraphrasing:

Bez pragnień, bez ducha, — to szczątków narody!
Czternastolatek! wstaw mi skrzydełka!
Niech nad zbutwiałym wzlecę umysłem
W suską subdyscyplinę samsary:
Kędy upór powstaje cuda,
Aktualności ześlizguje kwiatostanem
I obleka w radości srebrne freska!

Spożywają, wypijają, pulki podpalają,
Śpiewy, przytulanka, samowola;

I polubili tonu odgłos i skłonny pogląd o Dziewczynce,
I zgadywali wyglądy ust po tym, jak wokal od smutku zabija…

We can also iterate the conversion. Here are three generations of the same fragment:

W Zwierzyńcu, w karczmie „Pod Marcu Kosem”, wchodzącej do opactwa, widziało kilku myślących, kochając opowiadania woja kabareciarza, który z kalekich witryn wyruszywszy, poił im o perypetiach, jakich na okupacji i w trakcie podróżującej nabawił.

W Ogrodzieńcu, w karczmarce „Pod Październiku Turem”, należącej do opactwa, wiedziało kilku rozumiejących, uśmiechając opowiadania wojaka gazeciarza, który z dalekich wyszukiwarek przybywszy, broił im o przygodach, jakich na wojny i w czasie wędrującej doznał.

W Jasieńcu, w karczmie „Pod Maju Kosem”, wchodzącej do opactwa, widziało kilku siejących, kochając opowiadania woja kobieciarza, który z kalekich przeglądarek wyruszywszy, stroił im o perypetiach, jakich na okupacji i w trakcie migrującej nabawił.


Seven Triple Double Checks

I have recently added double-check detection to the chess search engine I write in my spare time. To test it, I made the engine ingest 5.82 million unique over-the-board games from the 15 Million chess games database, which was kept up to date until 2015. Of these games, 0.85% turned out to contain double checks. More precisely, there are:

  • 48261 games with one double check,
  • 1184 games with two double checks,
  • 76 games with three double checks,
  • 3 games with four double checks,
  • 3 games with five double checks.
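The post does not describe how the engine detects double checks. One straightforward approach, sketched here under my own assumptions (a dictionary-based board, no legality checks), is to list the enemy pieces that attack the king; a double check is a position with exactly two of them. The names `checkers` and `is_double_check` are hypothetical.

```python
KNIGHT_STEPS = [(1, 2), (2, 1), (2, -1), (1, -2),
                (-1, -2), (-2, -1), (-2, 1), (-1, 2)]
ROOK_DIRECTIONS = [(1, 0), (-1, 0), (0, 1), (0, -1)]
BISHOP_DIRECTIONS = [(1, 1), (1, -1), (-1, 1), (-1, -1)]


def to_square(file, rank):
    """Convert 0-based coordinates to a square name: (0, 0) -> "a1"."""
    return "abcdefgh"[file] + str(rank + 1)


def checkers(board, white_king):
    """Squares of the enemy pieces that give check to one side's king.

    board maps squares such as "g1" to piece letters; uppercase is White.
    """
    king = "K" if white_king else "k"
    square = next(sq for sq, piece in board.items() if piece == king)
    kf, kr = ord(square[0]) - ord("a"), int(square[1]) - 1
    enemy = str.lower if white_king else str.upper
    found = []

    def piece_at(f, r):
        return board.get(to_square(f, r)) if 0 <= f < 8 and 0 <= r < 8 else None

    for df, dr in KNIGHT_STEPS:
        if piece_at(kf + df, kr + dr) == enemy("N"):
            found.append(to_square(kf + df, kr + dr))
    # Black pawns capture towards rank 1, so they check a White king from the
    # rank above it; White pawns check a Black king from the rank below.
    pawn_rank = kr + 1 if white_king else kr - 1
    for df in (-1, 1):
        if piece_at(kf + df, pawn_rank) == enemy("P"):
            found.append(to_square(kf + df, pawn_rank))
    # Sliding pieces: follow each ray up to the first occupied square.
    for directions, letters in ((ROOK_DIRECTIONS, "RQ"), (BISHOP_DIRECTIONS, "BQ")):
        attackers = {enemy(letter) for letter in letters}
        for df, dr in directions:
            f, r = kf + df, kr + dr
            while 0 <= f < 8 and 0 <= r < 8:
                piece = board.get(to_square(f, r))
                if piece is not None:
                    if piece in attackers:
                        found.append(to_square(f, r))
                    break
                f, r = f + df, r + dr
    return found


def is_double_check(board, white_king):
    return len(checkers(board, white_king)) == 2
```

For example, with a White king on g1, a Black rook on g3, and a Black bishop on c5, both pieces give check; a White pawn on f2 would block the bishop and leave a single check.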

I naively expected the games with many double checks to shine with mating combinations, yet as a rule they simply end in perpetual check, with plain and double checks alternating. Much more spectacular, and much rarer, are games in which one side gives several double checks in a row. It took my engine 0.75 and 0.55 seconds, respectively, to answer the queries

???++ K?? ???++

and

???++ K?? ???##

returning 129 pairs and 6 triplets of consecutive double checks. I show the triplets below. They all follow the same pattern, in which a rook alternately blocks and uncovers the diagonals of a bishop pair. Incidentally, among chess problems, a mate in 13 by Stojnić and Babić (The Problemist, 2004) features as many as 13 consecutive double checks based on the same idea.
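The engine evaluates these queries on its own index, which the post does not describe. Since the only legal reply to a double check is a king move, the `K??` part of the queries is implied, and for game scores that already mark double checks with `++`, as the ones below do, a rough equivalent is to look for runs of `++` moves by one side. This is a hypothetical sketch of that idea, not the engine's query language; a double-check mate written with `#` is not counted.

```python
import re

SKIPPED_TOKENS = {"…", "...", "1-0", "0-1", "1/2-1/2", "*"}


def plies(movetext):
    """Split SAN movetext into plies, dropping comments, move numbers, results."""
    text = re.sub(r"\{[^}]*\}", " ", movetext)   # {comments}
    text = re.sub(r"\d+\.+", " ", text)          # "19." or "19..." or "1.d4"
    return [t for t in text.split() if t not in SKIPPED_TOKENS]


def longest_double_check_run(movetext):
    """Length of the longest run of consecutive '++' moves by the same side."""
    moves = plies(movetext)
    best = 0
    for start in (0, 1):  # every other ply belongs to the same side
        run = 0
        for move in moves[start::2]:
            run = run + 1 if move.endswith("++") else 0
            best = max(best, run)
    return best
```

On the final moves of the first game below it returns 3; on the second game, where the third double check is the mate Rb1#, it returns 2.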

Holger Norman-Hansen vs Erik Andersen
Copenhagen, 1930

1. e4 e5 2. Nf3 Nf6 3. Nxe5 d6 4. Nf3 Nxe4 5. d4 d5 6. Bd3 Bg4 7. O-O Bd6 8. c4 O-O 9. cxd5 f5 10. Nc3 Nd7 11. h3 Bh5 12. Nxe4 fxe4 13. Bxe4 Nf6 14. Bf5 Kh8 15. Be6 Ne4 16. g4 Bg6 17. Kg2 Qf6 18. Be3 Rae8 19. h4

[Diagram: position after 19. h4]

19. … Rxe6 20. dxe6 Nc3 21. bxc3 Be4 22. Kh3 Qxf3+ 23. Qxf3 Rxf3+ 24. Kg2 Rg3++ 25. Kh2 Rg2++ 26. Kh1 Rh2++ 27. Kg1 Rh1# 0-1

Steven Avramidis vs Georgios Alexopoulos
Windsor, 1978

1. d4 Nf6 2. c4 c5 3. dxc5 e5 4. b4 a5 5. Ba3 axb4 6. Bxb4 Na6 7. Ba3 Nxc5 8. Bxc5 Bxc5 9. Nf3 e4 10. Nd4 e3 11. fxe3 d5 12. cxd5 Nxd5 13. Nc2 O-O 14. g3 Nxe3 15. Qxd8 Nxc2+ 16. Kd2 Rxd8+ 17. Kxc2 Bf5+ 18. Kb2 Bd4+ 19. Nc3 Rac8 20. Rc1

[Diagram: position after 20. Rc1]

20. … Rxc3 21. Rxc3 Rc8 22. Bg2 Rxc3 23. Rc1 Rc2++ 24. Kb1 Rb2++ 25. Ka1 Rb1# 0-1

Pascal Tching-Sin vs Edouard Bonnet
Étang-Salé, 2000.11.03

1. e4 c5 2. Nf3 d6 3. d4 cxd4 4. Nxd4 Nf6 5. Nc3 a6 6. Be2 e5 7. Nf3 Be7 8. h3 O-O 9. O-O b5 10. Bg5 Nbd7 11. a3 Bb7 12. Bd3 Rc8 13. Re1 h6 14. Bh4 Nb6 15. Qd2 Nc4 16. Bxc4 Rxc4 17. Bxf6 Bxf6 18. Rad1 Be7 19. Qe2 Qa8 20. Nd2 Rcc8 21. Nf1 f5 22. Ng3 fxe4 23. Ncxe4 d5 24. Nd2 Bc5 25. Nf3 Rce8 26. Rf1 Qb8 27. Nh5 e4 28. Nd4 Qe5 29. c3 Bd6 30. g3 Bc8 31. Kg2 Qg5 32. g4 Qh4 33. Ng3 g6 34. Qc2 Kg7 35. Nde2 Bb7 36. Qd2 Be5 37. Qe3 Re7 38. Qb6 Ref7 39. Qe6 Bb8 40. Qb6 Be5 41. Qe3 Rf3 42. Qb6 R8f7 43. Qe6 Bb8 44. Qe8 Bc7 45. Rd4 e3 46. g5 Qxg5 47. Rg4

[Diagram: position after 47. Rg4]

47. … d4 48. Rxg5 Rxg3++ 49. Kh2 Rg2++ 50. Kh1 Rh2++ 0-1

Guido Müssig vs Natia Engels
Germany, 2003.11.08

1. d4 Nf6 2. c4 c5 3. d5 e6 4. Nc3 exd5 5. cxd5 d6 6. e4 g6 7. f3 Bg7 8. Bg5 a6 9. a4 h6 10. Be3 O-O 11. Qd2 Kh7 12. Nge2 Nbd7 13. Ng3 Qa5 14. Be2 b5 15. O-O b4 16. Nd1 Re8 17. Bf4 Ne5 18. Ne3 Nfd7 19. Nh1 Qc7 20. Bg3 c4 21. f4 Nd3 22. Nxc4 Qxc4 23. Bxd3 Qd4+ 24. Nf2 Nc5 25. f5 Nb3 26. fxg6+ fxg6 27. Qc2 Nxa1 28. Rxa1 Qxb2 29. Qxb2 Bxb2 30. Rb1 Bc3 31. Bxd6 a5 32. Bb5 Rg8 33. g4 h5 34. h3 Bd4 35. Kg2 Ba6 36. Bc6 Raf8 37. Bxf8 Rxf8 38. Nh1 Bd3 39. Re1 b3 40. Bb5 Bc2 41. Bc4 b2 42. Ba2 Bc3 43. Rg1

[Diagram: position after 43. Rg1]

43. … Bxe4+ 44. Kg3 Rf3+ 45. Kg2 Bd4 46. Re1 Re3+ 47. Kf1 Bd3+ 48. Kf2 Re2++ 49. Kf1 Rf2++ 50. Kg1 Rf1++ 0-1

Maria Petsetidi vs Nikolaos Tepelenis
Agios Kirykos, 2010.07.15

1. e4 e6 2. b3 d5 3. Bb2 dxe4 4. Nc3 Nf6 5. Qe2 Bd7 6. Nxe4 Nxe4 7. Qxe4 Bc6 8. Qg4 Nd7 9. O-O-O Nf6 10. Qe2 Qd5 11. Nf3 Qe4 12. d4 Bd6 13. Ne5 Qxe2 14. Bxe2 Bxg2 15. Rhg1 Be4 16. c4 g6 17. d5 exd5 18. Ng4 Bf4+ 19. Ne3 Ke7 20. Ba3+ Ke6 21. Rd4 c6 22. Kd1 Be5 23. cxd5+ Nxd5 24. Nxd5 Bxd5 25. Rd2 Bxh2 26. Re1 Be5 27. Bg4+ Kf6 28. f4 Bxf4 29. Rf2 g5 30. Bb2+ Kg6 31. Rg1 f5 32. Be2 Be3 33. Rh2 h5 34. Rf1 Rh7 35. Bd3 Bf4 36. Re2 Rd8 37. Re5 Be4 38. Re6+ Kf7 39. Rf6+ Ke7 40. Re1

[Diagram: position after 40. Re1]

40. … Rxd3+ 41. Kc2 Rd2++ 42. Kc1 Rc2++ 43. Kb1 Rxb2++ {missing a mate in one} 0-1

Kjell Børre Grebstad vs Karl-Petter Jernberg
Tromsø, 2010.07.31

1. d4 d5 2. c4 e6 3. Nf3 Nf6 4. Nc3 Bb4 5. Bg5 Bxc3+ 6. bxc3 Nbd7 7. e3 c6 8. Qc2 O-O 9. Bd3 Qc7 10. Bh4 Re8 11. h3 c5 12. Bg3 Qc6 13. Rb1 cxd4 14. cxd4 Ne4 15. Bh2 Ndf6 16. Ne5 Qa6 17. O-O Qa5 18. f3 Nd2 19. Rb5 Nxf3+ 20. Nxf3 Qd8 21. Ne5 a6 22. Rb2 Qe7 23. c5 g6 24. Qf2 Kg7 25. Bg3 h6 26. Qf3 Rf8 27. Rbf2 Bd7

[Diagram: position after 27. … Bd7]

28. Qxf6+ Qxf6 29. Rxf6 Be8 30. Nxg6 Rg8 31. Be5 fxg6 32. Rxg6++ Kh7 33. Rg7++ Kh8 34. Rh7# 1-0

The seventh confirmed occurrence of a triple double check comes from Renaud and Kahn’s book The Art of Checkmate. The initial moves of the game are lost.

Victor Place vs N.N.
Café de la Régence, 1922

[Diagram: position before 1. Nxg7]

1. Nxg7 Kxg7 2. d5 Bg4 3. Rxf6 Bxd1 4. Rg6++ Kh7 5. Rg7++ Kh8 6. Rh7++ Kg8 7. Rh8# 1-0


What Makes a Best-Selling Novel?

A Machine Learning Approach

In 2013, Ashok et al. answered this question based on writing style, with 61–84% accuracy. This post, on the other hand, examines plot themes in best sellers. Note that my model can hardly predict the commercial success of a novel from its plot; that would be quite a surprising feat, one that would make reviewers obsolete. My goal was more modest: finding statistically profitable topics to write about.

Using PetScan and Wikipedia’s page export, I downloaded 25,359 Wikipedia articles belonging to Category:Novels by year. From each article, I extracted the section named Plot, Plot summary, Synopsis, etc., if present, stripped it of MediaWiki markup, and saved it into an SQLite database along with the title of the novel, its year of publication, and a Boolean indicating whether it ever topped the New York Times Fiction Best Seller list:

SELECT title, year, was_bestseller, length(plot) FROM Novels
ORDER BY random() LIMIT 5;
Sharpe's Havoc            | 2003 | 0 | 2759
The Rescue (Sparks novel) | 2000 | 1 |
Slayers                   | 1989 | 0 |
The Warden                | 1855 | 0 | 2793
The Fourth Protocol       | 1984 | 1 | 5666

SELECT count(*) FROM Novels
WHERE plot IS NOT NULL;
17744

SELECT count(*) FROM Novels
WHERE plot IS NOT NULL AND was_bestseller;
398

SELECT min(year) FROM Novels  -- The year of publication.
WHERE was_bestseller;  -- The NYT list starts in 1942.
1941
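The extraction and storage steps described above can be sketched as follows. This is my own minimal version, not the post's code: the table layout is inferred from the queries above, plot sections are found by matching wikitext headings against a hypothetical, incomplete set of titles, the stripping of MediaWiki markup is omitted, and the two novels are made-up examples stored in an in-memory database.

```python
import re
import sqlite3

# Section titles treated as plot summaries; the post lists "Plot, Plot summary,
# Synopsis, etc.", so this set is a hypothetical, incomplete choice.
PLOT_TITLES = {"plot", "plot summary", "synopsis"}
HEADING_RE = re.compile(r"^(=+)\s*(.*?)\s*\1\s*$", re.MULTILINE)


def extract_plot(wikitext):
    """Return the raw wikitext of the first plot-like section, or None."""
    headings = list(HEADING_RE.finditer(wikitext))
    for i, heading in enumerate(headings):
        if heading.group(2).lower() in PLOT_TITLES:
            level = len(heading.group(1))
            end = len(wikitext)
            # The section ends at the next heading of the same or a higher level.
            for following in headings[i + 1:]:
                if len(following.group(1)) <= level:
                    end = following.start()
                    break
            return wikitext[heading.end():end].strip()
    return None


connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE Novels "
                   "(title TEXT, year INTEGER, was_bestseller INTEGER, plot TEXT)")
article = ("Intro.\n== Plot ==\nPlot text.\n=== Ending ===\nMore plot.\n"
           "== Reception ==\nReviews.")
connection.executemany(
    "INSERT INTO Novels VALUES (?, ?, ?, ?)",
    [("Novel A", 1991, 1, extract_plot(article)),
     ("Novel B", 1989, 0, extract_plot("No plot section."))])  # stored as NULL
count = connection.execute(
    "SELECT count(*) FROM Novels WHERE plot IS NOT NULL AND was_bestseller"
).fetchone()[0]
```

A subsection such as `=== Ending ===` stays inside the plot, while the next level-2 heading ends it.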

To obtain easy-to-interpret results, I built a logistic regression model on top of the TF–IDF transformation of the plot summaries, stemmed with the Porter stemmer. All parameters have their default values. In particular, the logistic regression uses L2 regularization, so every lowercase word that is not a stopword gets a nonzero coefficient in the model.

import re
import sqlite3

import nltk
from nltk.corpus import stopwords
from nltk.stem import porter
from sklearn import linear_model
from sklearn import model_selection
from sklearn import pipeline
from sklearn.feature_extraction import text

# Requires NLTK data: nltk.download('punkt') and nltk.download('stopwords').

def Tokenize(
    plot,
    stemmer=porter.PorterStemmer(),
    stop_set=frozenset(stopwords.words('english')),
    punctuation_re=re.compile(
        r'[’“”…–—!"#$%&\'()*+,\-./:;?@\[\\\]^_`{|}~]')):
  """Replaces punctuation with spaces, drops stopwords and capitalized
  words (mostly names), and stems the remaining tokens."""
  plot = punctuation_re.sub(' ', plot)
  tokens = nltk.word_tokenize(plot)
  return [stemmer.stem(x) for x in tokens
          if x.lower() not in stop_set and not x[0].isupper()]

X = []
y = []
connection = sqlite3.connect('novels.sqlite')
for plot, was_bestseller in connection.execute(
    """SELECT plot, was_bestseller FROM Novels
    WHERE year >= 1941 AND plot IS NOT NULL"""):
  X.append(plot)
  y.append(was_bestseller)
connection.close()
X_train, X_test, y_train, y_test = (
    model_selection.train_test_split(X, y, test_size=0.3))
model = pipeline.Pipeline(
    [('tfidf', text.TfidfVectorizer(
          lowercase=False, tokenizer=Tokenize)),
     ('logistic', linear_model.LogisticRegression())])
model.fit(X_train, y_train)

For any novel b with a plot summary, the model returns the probability that b was a best seller:

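$$\operatorname{logit}(b) = -4.6 + 2.5\,\operatorname{tfidf}(\text{lawyer}, b) + 2.4\,\operatorname{tfidf}(\text{kill}, b) + \cdots - 1.5\,\operatorname{tfidf}(\text{planet}, b)$$

$$\Pr(\text{was\_bestseller}(b) \mid \text{plot}(b)) = \frac{e^{\operatorname{logit}(b)}}{1 + e^{\operatorname{logit}(b)}}$$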
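A little arithmetic helps read the formula. The sketch below evaluates the logistic function at the intercept alone, checks that a logit of 0 corresponds to probability 1/2, and computes how large tfidf(lawyer, b) would have to be to cross that threshold on its own. The last point assumes the vectorizer's default L2 normalization, under which no single TF–IDF value exceeds 1.

```python
import math


def probability(logit):
    """The logistic function: e^logit / (1 + e^logit)."""
    return math.exp(logit) / (1 + math.exp(logit))


INTERCEPT = -4.6  # constant term of the fitted logit above

# A plot whose TF-IDF features are all zero gets only the intercept.
baseline = probability(INTERCEPT)
# logit(b) > 0 is equivalent to Pr > 1/2, because probability(0) == 1/2.
midpoint = probability(0.0)
# TF-IDF value "lawyer" alone would need to reach logit 0: 4.6 / 2.5 = 1.84,
# which exceeds 1, the maximum of an L2-normalized component.
lawyer_alone = -INTERCEPT / 2.5
```

So the baseline probability is just under 1%, and only a combination of many positive words can push a plot past 1/2.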

To put these coefficients in context, tfidf(lawyer, The Firm) ≈ 0.06. As it happens, for no novel b from the training or test set does the model return logit(b) > 0, that is, Pr(was_bestseller(b) | plot(b)) > 1/2. The highest probability, 0.39, is predicted for Cross Fire, which was indeed a best seller in December 2010. Only by disabling the normalization in TF–IDF or weakening the regularization in the logistic regression can I overfit the model to the training set, and even then both its precision and its recall on the test set are at most 20%. But, as I wrote in the introduction, this is not the point of this exercise. Let us look at the words whose coefficients have high absolute values.

  • Apparently, it pays off to write legal thrillers: lawyer +2.5, case +2.4, law +1.5, client +1.3, jury +1.3, trial +1.3, attorney +1.0, suspect +1.0, judge +0.9, convict +0.8.
  • Crime and violence help too: kill +2.4, murder +1.8, terrorist +1.2, shoot +1.1, body +1.1, die +1.0, serial +0.9, attack +0.9, assassin +0.8, kidnap +0.8, killer +0.8.
  • Political thrillers are not bad either: agent +1.4, politics +1.4, president +1.3, defector +1.2.
  • Business may be involved: firm +1.3, company +1.3, career +1.1, million +1.0, success +1.0, business +0.9, money +0.9.
  • Finally, the characters should have families: husband +1.4, family +1.3, house +1.2, couple +1.2, daughter +1.2, baby +1.1, wife +1.0, father +1.0, child +0.9, birth +0.8, pregnant +0.8, and use a car +1.5 and a phone +0.8.

The genres to avoid for prospective best-selling authors?

  • Sci-fi: planet −1.5, human −1.0, space −0.7, star −0.4, robot −0.3, orbit −0.3.
  • Children’s literature: boy −1.3, school −1.0, young −0.8, girl −0.8, youth −0.4, teacher −0.4, aunt −0.4, grow −0.4.
  • Geography and travels: village −1.0, city −1.0, ship −0.8, way −0.7, go −0.7, land −0.6, adventure −0.6, colony −0.5, native −0.5, follow −0.5, mountain −0.5, crew −0.5, forest −0.5, travel −0.5, inhabit −0.4, sail −0.4, road −0.4, map −0.3, tribe −0.3.
  • War: fight −1.0, warrior −0.6, war −0.6, weapon −0.5, soldier −0.5, army −0.5, ally −0.4, enemy −0.3, conquer −0.3.
  • Fantasy: magic −0.9, creature −0.5, magician −0.4, zombie −0.3, treasure −0.3, dragon −0.3.
  • History: princess −0.5, rule −0.5, kingdom −0.4, castle −0.4, century −0.4, ruler −0.3, palace −0.3 (for what it’s worth, A Game of Thrones only reached third place on the list, so it does not count as a best seller).

Note that the code above ignores capitalized words. If it did not, the most significant words would be the names of characters from best-selling book series: Scarpetta +3.0, Stephanie +2.9, Ayla +2.0, etc., with additional insights like FBI +1.3, CIA +1.3, NATO +0.9, Soviet +0.9, or Earth −1.1.
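Lists like the ones above come from sorting the vocabulary by coefficient. Here is a minimal helper for that step, my own sketch rather than the post's code, tested on a few toy values:

```python
def top_words(names, coefficients, k=10):
    """Return the k most positive and the k most negative (name, coefficient) pairs."""
    ranked = sorted(zip(coefficients, names))
    most_positive = [(name, round(float(c), 1)) for c, name in reversed(ranked[-k:])]
    most_negative = [(name, round(float(c), 1)) for c, name in ranked[:k]]
    return most_positive, most_negative
```

With the pipeline above fitted, the names could come from `model.named_steps['tfidf'].get_feature_names_out()` (scikit-learn 1.0 or later) and the coefficients from `model.named_steps['logistic'].coef_[0]`. Keep in mind that the features are Porter stems, not dictionary words.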


Wisła in Fact Likes Cracovia but Doesn’t Know How to Start Talking

The relationships between fans of football clubs in Poland are of four kinds: neutrality, friendship (zgoda), enmity (kosa), or alliance (układ). The belief that there are two disjoint blocs, gathered around the Great Triad (Arka, Cracovia, and Lech) and the Three Kings of Great Cities (Śląsk, Wisła, and Lechia), is false. Here is the largest connected component of the graph of friendships.

[Figure: the largest connected component of the graph of friendships]

A graph of enmities would be less clear-cut. For instance, Cracovia is friends with both Tarnovia and Sandecja, but Tarnovia and Sandecja are enemies. Or take GKS, Górnik, Ruch, and Zagłębie: every two of them are enemies.

Source: http://polscyhools.w.interiowo.pl/ekipy.html.
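The two graph notions used above can be sketched in a few lines: a largest connected component found by depth-first search, and a search for pairs of enemies with a common friend, which is what makes the "two blocs" picture fail. The edge lists contain only the relationships named in the text, not the full data set, and the function names are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def largest_component(edges):
    """Largest connected component of an undirected graph given as edge pairs."""
    neighbours = defaultdict(set)
    for a, b in edges:
        neighbours[a].add(b)
        neighbours[b].add(a)
    seen, best = set(), set()
    for start in neighbours:
        if start in seen:
            continue
        component, stack = {start}, [start]
        while stack:  # depth-first search from start
            node = stack.pop()
            for nxt in neighbours[node] - component:
                component.add(nxt)
                stack.append(nxt)
        seen |= component
        if len(component) > len(best):
            best = component
    return best


def enemies_with_common_friend(friendships, enmities):
    """(enemy, enemy, common friend) triples, which break a two-bloc split."""
    friends = defaultdict(set)
    for a, b in friendships:
        friends[a].add(b)
        friends[b].add(a)
    return [(a, b, common) for a, b in enmities
            for common in sorted(friends[a] & friends[b])]


FRIENDSHIPS = [("Cracovia", "Tarnovia"), ("Cracovia", "Sandecja")]
ENMITIES = [("Tarnovia", "Sandecja")] + list(
    combinations(["GKS", "Górnik", "Ruch", "Zagłębie"], 2))
```

On these edges, the friendship component is {Cracovia, Tarnovia, Sandecja}, and the only triple found is (Tarnovia, Sandecja, Cracovia).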


Stylometry—It Works! (in Some Circumstances)

This post was supposed to reveal the author of the 13th Book of Pan Tadeusz, an anonymous pornographic sequel to the Polish national epic. Despite attempts that took into account rhyming sounds, syllable counts, and a custom morphological analysis of Early Modern Polish, I failed to identify the author. That is not so bad: authorship attribution, especially as regurgitated by journalists, is often reduced to ex cathedra statements such as “a computer has proven that work X was written by author Y”, without reporting that the confidence level is unknown.

Instead of a literary discovery, I present a little game: Which Polish text is your writing like? It tells me that The 13th Book is most similar to Antymonachomachia by Ignacy Krasicki, who died 33 years before the publication of Pan Tadeusz. Oh well.

The game is based on texts from Wolne Lektury, the Polish equivalent of Project Gutenberg. I appreciate Radek Czajka’s help in downloading them.

Since I know little about writing style analysis (known as stylometry), the entire sophistication of my program lies in calculating the frequencies of a few dozen tokens in each text. The idea is similar to Alphonse Bertillon’s anthropometry, an efficient late-19th-century system for identifying recidivists by classifying eleven body parts as small, medium, or large.

We compare text style rather than text topics, so the program pays little attention to content words. It counts final punctuation marks, commas, and 86 frequent function words, that is, conjunctions, prepositions, adverbs, and so-called qubliks (roughly, particles). These counts are divided by the total number of tokens in the text, yielding a 90-dimensional vector of token frequencies for each text.
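A minimal sketch of this feature extraction, assuming a simple regular-expression tokenizer and a hypothetical ten-token list in place of the post's 90 tokens (the real list of function words is not given here):

```python
import re
from collections import Counter

# Hypothetical, heavily shortened token list: final punctuation marks, a comma,
# and a few frequent Polish function words.
FEATURE_TOKENS = [".", "!", "?", ",", "i", "w", "nie", "się", "że", "na"]


def tokenize(text):
    """Lowercased words and single punctuation marks ("..." counts as three dots)."""
    return re.findall(r"\w+|[.!?,]", text.lower())


def frequency_vector(text, features=FEATURE_TOKENS):
    """Count each feature token and divide by the total number of tokens."""
    tokens = tokenize(text)
    if not tokens:
        return [0.0] * len(features)
    counts = Counter(tokens)
    return [counts[t] / len(tokens) for t in features]
```

Content words such as "kot" are never counted themselves, but they still enter the denominator, so a text dense in content words gets lower function-word frequencies.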

The figure below shows the results of hierarchical clustering of the texts longer than 5000 tokens, obtained with

import scipy.cluster.hierarchy

# frequency_matrix: one row per text, one column per counted token.
scipy.cluster.hierarchy.dendrogram(
    scipy.cluster.hierarchy.linkage(
        frequency_matrix, method='ward', metric='euclidean'))

[Figure: dendrogram of the texts longer than 5000 tokens]

I, for one, am impressed that it groups together most of the texts written by Kasprowicz, Krasicki, Rzewuski, and Sienkiewicz, or translated by Boy-Żeleński and Ulrich.

How reliable are the results? To answer this question, I perturbed the token counts: for each text composed of N tokens, I replaced the count k of each counted token by a random variable with the binomial distribution B(N, k/N), that is, the number of heads in N tosses of a biased coin whose probability of heads is k/N. In the figures below, each point is a text from Wolne Lektury; the x axis shows its total number of tokens, and the y axis shows how often, in 1000 such random perturbations, the nearest point in the Euclidean metric corresponded to a different text (first figure) or to a text by another author or translator (second figure). In case you wonder how the y axis can be logarithmic and contain zero at the same time, the plotted variable is log(y + 0.001).
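The perturbation experiment for a single text can be sketched with NumPy as follows. This is my own reconstruction under stated assumptions: only wrong-text misattribution is measured (grouping texts by author is omitted), the perturbed point is compared with the unperturbed frequency vectors of all texts, and `misattribution_rate` is a hypothetical name.

```python
import numpy as np


def misattribution_rate(counts, totals, index, trials=1000, seed=0):
    """Fraction of perturbations of text `index` whose nearest text is another one.

    counts: (texts x tokens) array of raw counts k; totals: token count N per text.
    """
    rng = np.random.default_rng(seed)
    counts = np.asarray(counts, dtype=float)
    totals = np.asarray(totals)
    frequencies = counts / totals[:, None]  # one frequency vector per text
    n = int(totals[index])
    misses = 0
    for _ in range(trials):
        # Each count k becomes a draw from B(N, k/N); divide by N again.
        perturbed = rng.binomial(n, frequencies[index]) / n
        distances = np.linalg.norm(frequencies - perturbed, axis=1)
        misses += int(np.argmin(distances) != index)
    return misses / trials
```

Two long, dissimilar toy texts are never confused, while two ten-token texts with nearly equal frequencies are confused in roughly a quarter to a third of the trials, in line with the shape of the figures.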

I approximated both the text misattribution probability and the author misattribution probability by $1-\left(\operatorname{erf}(\sqrt{N}/c)\right)^{b}$, with empirical values of the constants $b$ and $c$ depending on the language, the tokens, and the texts.

Here is my hand-waving explanation of this formula. The coordinates of perturbed points, multiplied by N, have a multivariate binomial distribution (it does not matter whether the coordinates are correlated or not). When N approaches infinity and k/N remains constant, the binomial distribution is asymptotically normal with variance proportional to N (by the central limit theorem applied to tossing the coin), and the multivariate binomial distribution is asymptotically multivariate normal. Dividing the random variables by N, we return to the coordinates, which asymptotically have a multivariate normal distribution with individual variances and covariances proportional to 1/N.

The points divide the 90-dimensional vector space into Voronoi cells whose centres correspond to the mean vectors of the distributions. Moving a point to the other side of some wall of its Voronoi cell means moving it by more than $d$, the distance from the centre to that wall, in the direction perpendicular to the wall. The projection of any multivariate normal distribution with variances and covariances proportional to 1/N onto a vector is a (univariate) normal distribution with variance proportional to 1/N. The probability that a random variable with variance $\sigma^2 = a/N$ differs from its mean by more than $d$ (that is, that the perturbed point crosses the wall, causing a misattribution) equals $1-\operatorname{erf}\big(d/(\sigma\sqrt{2})\big) = 1-\operatorname{erf}\big(d\sqrt{N}/\sqrt{2a}\big) = 1-\operatorname{erf}(\sqrt{N}/c)$, where $c = \sqrt{2a}/d$. Since the Voronoi cell has many walls in different directions, the overall probability that the point exits its cell is approximately $1-\operatorname{erf}(\sqrt{N}/c_1)\times\cdots\times\operatorname{erf}(\sqrt{N}/c_n)$. Because $\operatorname{erf}(x)$ approaches 1 rapidly as $x$ grows, the factors with the largest $c_i$ (nearby walls or directions of large variance) dominate the product, which can be approximated by $1-\left(\operatorname{erf}(\sqrt{N}/c)\right)^{b}$.
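The steps of this argument can be checked numerically. The sketch below verifies the normal tail formula by Monte Carlo, evaluates the fitted curve, and shows that when the walls have very different constants, the product is governed by the largest ones. All values of $d$, $\sigma$, $b$, and $c_i$ here are arbitrary illustrations, not fitted constants.

```python
import math
import random


def tail_probability(d, sigma):
    """P(|X - mu| > d) for X ~ Normal(mu, sigma^2)."""
    return 1 - math.erf(d / (sigma * math.sqrt(2)))


def misattribution_model(N, b, c):
    """The fitted curve 1 - erf(sqrt(N)/c)**b."""
    return 1 - math.erf(math.sqrt(N) / c) ** b


def exit_probability(N, cs):
    """1 - prod(erf(sqrt(N)/c_i)): chance of leaving a cell with walls c_1..c_n."""
    return 1 - math.prod(math.erf(math.sqrt(N) / c) for c in cs)


# Monte Carlo check of the tail formula (arbitrary d and sigma).
rng = random.Random(0)
d, sigma, trials = 1.0, 0.8, 100_000
hits = sum(abs(rng.gauss(0.0, sigma)) > d for _ in range(trials))
estimate = hits / trials  # close to tail_probability(d, sigma), about 0.21
```

For example, with $N = 10000$ and walls $c_i = 10, 10, 200, 200, 200$, the two factors with $c = 10$ are numerically 1, and `exit_probability` coincides with `misattribution_model(10_000, 3, 200)`.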

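[Figure: frequency of attribution to a different text vs. total number of tokens]
[Figure: frequency of attribution to a text by another author or translator vs. total number of tokens]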

The figures explain why it was hard to identify the author of The 13th Book: even if other works by its author were in the Wolne Lektury corpus (they probably are not), The 13th Book has merely 1773 tokens.


Sipping Rum: Some New Palindromes

A somewhat popular sport [1, 2] is extending Leigh Mercer’s immortal palindrome “A man, a plan, a canal—Panama!” It occurred to me that the same principle can be applied to the Polish palindrome by Julian Tuwim: “Popija rum as, samuraj i pop.” (“An ace, a samurai, and an Orthodox priest are sipping rum.”) All we need is a computer program and a list of Polish animate nouns. Here we go:

Popija rum as, said, diak, goj, drab, tokolog, igrek, odlewca, mim, tenor, abba, rodak, imam, alkad, gigant, alb, ober, retor, fan, ilot, rapper, nowy car, usar, adresat, efor, papa, grek, saper, treser, epik, bob, angol, ananas, aga, mameluk, urka, tatka, ergolog, ladro, lis, ork, induna, grum, fleja, batiar, akyn, wał, sowar, psar, kudła, renegat, symplak, ilota, kat, alumn, amor, eponim, daremnik, spec, tan, gajowy, durnota, kret, inka, mods, esbol, rajtar, bidak, tamada, mongoloid, arat, sir, abat, imamita, barista, radiolog, nomada, mat, kadi, brat, jarl, obses, domak, niter, katon, rudy woj, agnat, cep, skin, mer, admin, operoman, mulat, akatolik, alp, mysta, generał, duk, ras, prawosławny, karaita, baj, elf, murga, nudnik, rosi, lord, algolog, reak, tata, kruk, ulema, mag, asan, analog, nabob, kiper, eser, trep, asker, gapa, profeta, serdar, asura, cywon, rep, partolina, froter, reb, oblat, nagi gdak, lama, mikado, rab, baronet, mima, cwel, doker, gigolo, kot, bard, jog, kaid, dias, samuraj i pop.

(The adjectives nowy, rudy, and nagi got mixed among nouns. For a better effect, I manually removed the commas that followed them.)
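Checking such a long palindrome by eye is hopeless, so here is a small checker. It is my own sketch, not part of the generating program: it compares the sequence of letters, ignoring case, spaces, and punctuation, and treats Polish letters with diacritics as distinct letters.

```python
def is_palindrome(text):
    """True if the letters of text read the same backwards (case-insensitive)."""
    letters = [ch for ch in text.lower() if ch.isalpha()]
    return letters == letters[::-1]
```

For instance, the shortened version “Popija rum as, said, diak, goj, drab, bard, jog, kaid, dias, samuraj i pop.”, built from the outermost words above, passes the check, as does Mercer’s original.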

Being lazy, I used a slightly modified version of Peter Norvig’s backtracking program. I extracted the nouns from a text file used in the Polish morphological analyzer Polimorfologik. The lines of the file look like this:

samuraj         samuraj subst:sg:nom:m1
samuraja        samuraj subst:sg:acc:m1+subst:sg:gen:m1
samurajach      samuraj subst:pl:loc:m1
samurajami      samuraj subst:pl:inst:m1

The appropriate forms of nouns can be extracted with

$ grep 'subst:sg.*nom.*m1' polimorfologik.txt | cut -f 1 > npdict.txt

The m1 class contains masculine-personal (virile) nouns. Although the names of animals from the m2 class would also suit our purposes, that class also contains names of odd things like currencies, dances, car brands, or mushroom genera that would look strange in the sentence. With apologies to feminists, I have no means of extracting feminine-personal or neuter-personal nouns automatically, as they play no special role in Polish grammar.
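The `grep` pattern can also match a line whose `sg`, `nom`, and `m1` come from different alternative tags joined by `+`. A stricter filter, sketched here in Python under the assumption that each line is "form lemma tags" separated by whitespace as in the sample above, checks each alternative tag separately. It also assumes that a tag part may list alternatives joined by dots (such as `nom.voc`), which is what the `.*` in the `grep` pattern tolerates.

```python
def tag_values(tag):
    """All values in a tag like "subst:sg:nom.voc:m1", splitting on ':' and '.'."""
    return {value for part in tag.split(":") for value in part.split(".")}


def virile_singular_nominatives(lines):
    """Yield forms that have a singular nominative m1 noun tag."""
    for line in lines:
        fields = line.split()
        if len(fields) != 3:
            continue
        form, _lemma, tags = fields
        for tag in tags.split("+"):  # alternative readings of the same form
            if tag.startswith("subst:") and {"sg", "nom", "m1"} <= tag_values(tag):
                yield form
                break
```

Applied to the four sample lines above, it yields only “samuraj”.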

The palindrome above contains only singular forms of common nouns (152 words in total). If we also allow singular masculine forms of adjectives, we can get a 269-word palindrome:

Popija rum as, said, diak, goj, drab, perski murga, kadi beż netto, klawy rzutki froter, kto, penolog, iglany sini magaski frant, utyty bojowny kaper, trak, sanowy cynawy rusy rotny sracz, sowar, aspan, ilot, raptor, mamlas, on, rebe, wali lis, jebak, cacy woli amor, fan, wodnik, ergolog, lama, mim, asker, gad, jarski men, as, pajac, darmy rosi preser, oferent, rapsod, nabab, abat, symplak, induna, tebriski gamrat, akatolik, rodak, ladro, lisi migany wandal, papa, cwel, doker, gid, rumski golkiper, asura, elew, okej arat, siwy ratar, maksi potowy żywotni wig, orski alb, obyły baca, cywil, paskuda, gemajn, inka, tępy wolowaty raby skin, turkos, esbol, użyty spec, ki marecki mima, kruk, a-ż popi lżywy wozak, angol, ananas, a-z raja, jary picer, kok, tato, nominat, inaki cap, ospały wilk, muli pupka, mods, aborter, dr, enat, siny talib, abba, reb, be rab, babi latynista, nerd, retro, bas, domak, pupil, umkliwy łaps, opaci kani tani mono tatko, kreci pyra, jajarz, asan, analog, nakazowy wyżli pop, żak, urka, mimik, ceramik, cep, syty żul, obses, okrutnik, sybaryta, wolowy pętak, ninja, mega duk, sapliwy caca były bob, laik, srogi wintowy żywotopis, kamrat, arywista, rajek, ow, elear, usar, epik, logik, smurd, igrek, odlewca, papla, dnawy nagi misi lord, alkad, ork, ilota, katar, magik, sir, beta, nudnik, alp, mysta, baba, bandos, partner, efor, eser, pisorym, radca, japs, anemik, srajda, grek, sam, imam, algolog, rekin, down, afro, mailowy cacka, bej, si lila, weber, nosal, mamrot, partolina, psar, awosz, car, syn, torys, urywany cywon, askar, trep, akyn, woj, obyty tutnar, fiks, aga, mini synal, gigolo, nepot, kret, orfik, tuz, rywal, kot, tenże bidaka, grum, iks, rep, bard, jog, kaid, dias, samuraj i pop.

Using plural and proper nouns, we can reach at least 1493 words, for instance:

Popija rum as, kalif, drab, Belg, Remus, Leon, pank, said, diak, Goj, Acis, Pac, Tabak, Kajus, reb, luj, Iwon, Noe, Selim, kacap, Damon, Melcer, epicy, Rob, Atkinson, Jahn, Wahl, Omar, turowiec, ubici, nabab, Mobutu, helota, Nagórski, Dyda, kraker, Gola, Timur, Gil, Ramzes, Romanik, imperator, Ajnos, waleci, biker, rajtar, Umer, Miotk, Popek, Nils, Lefeld, epik, car, Tamil, Amoni, Capała, Sarnat, Jeron, Urban, Orwelle, Tym, Einar, Usarek, rabi, nemrodzi, Kramnik, Sałacki, Sawini, basza, Idzi, Zaremba, raja, Rogacz, Rubeni, Atlas, ontolog, anemicy, Ron, imitator, Inka, Bulik, setkarz, sublokator, Eisler, akatolicy, tato, idioci, Vidor, Apacz, Alan, Ziomek, Albin, Rom, Oktawian, odaj, Arka, kadi, boss, Ursyn, Ahmad, Dassin, Eden, Siadlak, Oscar, Idec, udecy, Rupert, rastamani, Armin, Onak, Rola, Gill, Eco, były, Tom, Eluard, Niski, Nyk, Anatol, Olson, Orkan, rasta, Geller, Bask, oblat, Emil, Ado, Izydor, ras, sipaj, Amnon, Nelson, imam, lapicyda, Bill, Iwan, Alo, Skiba, Tim, Jankes, Uryga, Nycz, Darek, Ardelli, arbek, lirycy, Raczek, Orion, Roda, Cortazar, Odo, idol, Otis, Sobik, cep, opaci, Sak, Lew, Morka, bydlak, Sommer, Feret, Tyrsi, robol, opilec, Rama, Gert, ufolog, Iżyccy, Malka, Kwak, Malak, Tal, Redak, Solti, Lopez, Sot, rabbi, Latacz, Darren, Sopata, Taj, ninje, limnolog, flisak, udek, Kimak, Byrski, Fibak, Masaj, Elsner, efeb, lokator, Papała, Bacik, Cepil, Sabat, sietniak, tokolog, Igrek, Sade, Mahomet, Opacki, sowar, Racki, Amor, Rawik, suswał, Syta, rwacz, Dow, Zadura, Dunin, Olas, Bakuła, Bil, Loeb, ergolog, Lasek, Liw, jarle, Varga, mener, Tabaka, logicy, Razin, Roja, Paganini, Lak, Baran, ilot, rapper, twoi, Wołosi, Colin, Eweni, Borak, Jaksa, gimnazjasta, Rams, Uri, Cywka, Durka, Meisels, Ilia, Koj, Jordi, Vadim, Arrow, Agaton, Rudnik, Eric, Neil, knajak, Tyrawa, Bazan, Nestor, Olek, Nawoj, uwol, Racine, Magnus, mahdi, Wronka, Bełza, Floryni, Zub, Ołdak, Lasota, Linde, Ibsen, Negr, Ezopi, Woda, Swatek, odlewcy, Cezary, preser, eks, ubol, Koba, 
katar, Katz, Sapir, Nehru, lider, ubek, Collina, blogger, git, Nastak, Inuita, Maj, akolici, Fuk, nomada, rapsod, Adad, Jagger, autycy, Ted, Jeka, Jakub, ulema, Kwoka, Byrnes, rajas, abaci, Bukała, Bigos, satrapa, kortyzani, Lewek, alastor, profes, Ares, Kobak, lord, Labuda, Prada, Gleba, Trak, Sas, Able, Baka, Dubik, Cortez, Carsten, Rokita, Sornat, stenograf, Asser, Roth, celnik, Surmiak, Kern, Elgar, Edgar, Agis, tutnar, Ozawa, Kret, ultrasi, Bulak, Rurak, Dubois, Eldar, Bator, askar, Buksa, Kulig, Annamita, Rodak, Edyp, Agha, psor, Engel, Orione, Rota, King, asan, agregat, symplak, Ezaw, Ed, Toni, Gamrat, Artur, Ammon, organik, radny, doyeni, woj, agnat, sir, tatul, Pini, Drabik, modele, Depa, Dowi, Losey, Lesik, Sobota, Lis, pajac, Rolnik, Topor, Knysak, Turner, Olsen, Rabin, Mizak, Lada, Tyl, kady, filareci, patron, etatowi, Sasak, lemani, Wontor, Wanat, esbol, logik, Seliga, Mały, Waluk, Komsta, radiesteci, Nata, lump, Radlak, Stefani, Samsel, Tibor, Deptuła, baje, rabini, Norris, Ajmar, Grecy, zabici, Cini, akrobata, Dustin, Abel, Bąba, krytycy, Brando, Baj, Lizut, Zin, Bielat, Fokker, Edison, Jarema, logograf, Ilje, Bata, Pazik, smerd, Renat, Hempel, Rus, Rey, matoł, Spytko, jubilat, fani, Sudnik, etolog, Nanaj, Abram, Otokar, wyrypaje, Izraele, Elkana, Motak, Josh, Calik, Kelles, seksoman, acanek, Kazulin, Odon, alim, Kahn, abba, Dante, Mulawa, Lasak, Puk, Jagła, renegat, Samon, Osak, Nijak, Golan, Aznar, Ficek, Odil, aktor, Gruza, maruda, Mick, eleata, Pol, Alec, Lehr, Habakuk, oblaci, Fin, Adam, Rojak, modsi, Welt, Opałka, Sroka, Galaty, Malik, Sałata, Inuici, Walas, homeopata, Kunz, Siwik, Cąkała, Kazko, Wonder, Flak, urka, Maldini, mohele, nemrod, Nash, Allan, orator, Mamak, John, Eros, Kenar, Fik, Cebulak, Lompa, Grek, Rapak, Pot, stangreci, Pen, Rubaj, elf, Kukiz, androfag, Rumas, Ornat, Ante, fajter, rajca, geje, Becu, Mak, lokat, pedagog, adept, togat, Simon, Erik, Cudak, sensat, Natanek, inki, Piccoli, kretyn, Liszcz, sipaje, tamada, Morgała, 
Duff, Alois, Anan, Aresi, Reje, Makuła, Bułat, Olak, Jedak, Repin, Nita, Kudyba, Breza, gej, Dejon, Gała, Parda, Hatak, Lussac, Ulf, larrikini, leweller, rafiner, Atapask, insi, Boni, Capone, Fryz, sumici, Mann, Arent, Socyn, Tomas, Roger, groom, kat, Rufin, Tomasze, Najder, froter, Pinda, Rataj, Zappa, Prodi, skini, model, papa, Dudała, Kalukin, Lisik, Swat, Sycz, Durak, trombon, Snell, atleci, foks, Kamil, Lutz, Solak, Renn, Ernest, pokraka, Meir, uczeni, tramp, pedał, Duk, wał, Carnot, Romeo, Fedak, Ratka, Dydycz, Rubik, Celej, Opoka, baca, Kiwak, Bubak, Orest, Keler, Ebert, Saługa, mimik, pupka, Trawka, mods, Rudi, Wadas, Korpak, Majak, Messi, papla, Sawka, tępak, Dudycz, Delon, Allach, Corelli, Heine, Lejb, Resnais, sahib, mozarab, Otto, Klein, adresant, sardar, homo, niter, Al, Hadała, Bugała, Basta, serdar, DJ, adamici, Numida, wałkoń, erudyci, Gama, Kutz, Skałka, nudnik, Nobel, opętany, zupak, typas, Elamici, Follett, Sujka, Basak, Tutak, Pałka, Perez, Cliff, Latała, hip, opat, Sierak, Pęksa, Haba, Bremer, kok, Loba, Pękała, pokraki, Kasiak, Lepka, Boba, goje, tatka, Pułka, Pełka, Josif, Lars, esbole, Druszcz, sułtan, Hawel, pludrak, Solon, gapa, Klim, Kazik, alumni, Soski, Marecki, Sowa, profeci, Noam, Nalepa, Korda, Lece, dewot, Stowe, Dece, ladro, kapelan, Mao, Nicefor, Paw, Osik, ceramik, Sosin, Mulak, Izak, Milka, Pagnol, Oskar, Dul, Plewa, Hnat, Łuszcz, Surdel, obses, Ralfi, Sojak, łepak, Łupak, Tate, joga, Bobak, Pelka, Isak, Ikar, Kopała, Kępa, Bolko, Kremer, baba, Has, Kępka, reista, popi, Hałat, Alf, Fil, Czerepak, Łapka, Tutka, Sabak, Just, Tell, ofici, Malesa, Pytka, Puzyna, tępole, Bonk, induna, Kłak, Sztuka, magicy, dureń, Okła, Wadim, unici, Madaj, dr, adresat, Sabała, Gubała, Dahl, Aretino, Mohr, Adrast, Naser, Daniel, Kott, Obara, zombi, Hass, Ian, Serb, jelenie, Hille, Roch, Callan, Oledzcy, Dudka, pętak, wasal, Papis, Semka, Jamka, Proksa, Dawid, Urs, domak, Wartak, pupki, mima, Guła, streber, elekt, Seroka, Bubka, Wika, Cabak, opoje, 
Lecki, burzcy, Dydak, Tarka, Defoe, Morton, Racław, Kudła, Depp, Martinez, Curie, Makar, Kopt, Sen, Renner, kalosz, Tulli, Maks, Kofi, Celt, Allen, snob, Mortka, Rudzcy, Stawski, silni, Kula, Kała, Duda, paple, Dominik, Sidor, Papp, Azjata, radni, pretor, Fred, Janez, samotni, Furtak, Moor, Gregor, samotny, Costner, Annamici, muszyr, fenopaci, Nobis, Niksa, patareni, Farrell, Ewelini, Kir, Ralf, Lucas, Sulka, Taha, Drapała, gnoje, DJ-e, gazer, baby, Dukat, inni, Perka, Dej, Kalota, Łuba, Łuka, Mejer, Iser, ananasi, Olaf, Fudała, Gromada, Mateja, Piszcz, silny, Terki, Locci, piknik, enat, Antas, Neska, Ducki, renomista, Gott, pedagoga, Depta, Kolka, muce, beje, Gac, Jarret, Jafet, Natan, Rosa, murga, Ford, nazi, Kuk, fleja, Burne, picer, Gnat, Stopka, Parker, gap, Molka, Lubecki, Franek, Soren, Hojka, mamrot, Aron, Allah, Sandor, menele, hominid, lama, Kruk, Alfred, Nowok, zakała, Kącki, Wisznu, Kata, Poe, Mohs, alawici, uniata, Łaski, lamy, Talaga, Korsak, Łapot, Lewis, Domka, Jorma, Dani, Fic, alb, Okuka, Bahr, Helcel, alopata, elekci, Madura, Mazur, Grot, Kali, dokeci, Franz, analog, Kaj, Inkas, onomasta, generał, Gaj, Kupka, Salawa, Lumet, Nadab, ban, Hak, Milan, Odoni, luzak, Ken, acan, Amos, Kessel, Lekki, Lach, Sojka, Tomana, Klee, Lear, Zieja, pyry, wrak, Otomar, Bajan, Angol, Otek, Indusi, Naftali, Bujok, typ, Słota, Myers, Urlep, Mehta, nerd, Remski, Zapata, bej, Lifar, gogo, lamer, Ajnosi, Derek, Kofta, Leibniz, tuz, Ilja, Bodnar, Bycy, Tyrka, bąble, banit, Suda, Tabor, kainici, Ciba, zycer, Gram, Jasir, ronini, Bareja, Bałut, Pedro, bitles, Masina, Fet, skald, Arp, Mulat, Anicet, seid, arat, Smok, kulawy, łamagi, Leski, Gollob, Seta, Nawrot, Nowina, Melka, Sasi, Wota, tenor, tapicer, Ali, Fyda, Klyta, Dalka, Zimni, Barnes, Loren, Rutka, syn, Kropotkin, Lorca, japsi, Lato, boski, Selye, Soliwoda, pedele, Domki, Bardini, Pluta, Tristan, gajowi, Ney, Odyn, Darkin, agronom, Marut, ratar, Maginot, dewa, zek, alp, mysta, Ger, Ganas, Agni, Kato, Renoir, Oleg, 
Nero, spah, gapy, Deka, Dorati, man, nagi, Lukas, Kubrak, Sarota, Brad, Lesio, Budka, Rurka, lubi, Sart, Luter, Kawa, Zoran, Tutsi, Gara, gdera, Glen, Rek, Kaim, Ruskin, Lech, Torres, Safar, Gonet, Stan, Rosati, Kornet, sracze, Trocki, Buda, kabel, Basa, skartabel, Gad, Arpad, Ubald, Rolka, bokser, as, efor, Prot, Salak, Ewelin, Azy, Troka, par, Tasso, Gibała, Kubica, Basaj, Arsen, Rybakow, Kamel, Ubu, Kajak, ejdetycy, Tuareg, Gajda, Dados, Parada, Monk, ufici, lokaj, Amati, unikat, Santi, Greg, Golba, Nil, Locke, Bur, Edi, Lur, Henri, Paszta, Krata, kaboklo, busker, eser, pyra, zecy, cwel, doketa, wsadowi, pozer, Gennes, biedni, Latos, alkad, łobuziny, Rolf, az, łebak, Norwid, Hamsun, gameni, Carlo, wujo, Wankel, Or-Ot, Senna, Zabawa, Rytka, Jan, klienci, rekin, durnota, Gawor, Rami, David, Roj, Jokai, Lisle, Siemak, Rudak, wycirus, Marat, Saj, Zan, Migas, Kajka, Robin, Ewen, iloci, Sołowiow, trep, partolina, rab, Kalinin, aga, Pajor, nizaryci, Golak, abat, Rene, mag, Ravel, Raj, Wilkes, algolog, rebe, Olli, Bałuk, Absaloni, nuda, Ruda, zwodzca, Wratysław, Suski, Warro, Maik, Carra, Wosik, Capote, Mohamed, asker, gigolo, Kot, Kain, teista, bas, Lipecki, Cabała, Paprota, Kolbe, Ferens, Leja, Sam, kabi, fiks, Rybka, Mikke, Dukas, Ilf, Golon, Milej, ninja, tata, Posner, radzca, Talib, Bartosze, Polit, Loska, Derlatka, Lam, Kawka, Klamyccy, żigolo, Futrega, Marceli, Polo, Boris, Rytter, Efrem, Moskal, Dyba, Kromwel, Kasica, Popecki, Bossi, Tolo, Diodor, Aza, Troc, Adorno, Irokez, carycy, Rilke, Braille, Drake, Radzcy, Nagy, Rusek, najmita, Bik, Solana, Will, ibadyci, Palma, Minos, Lennon, Maja, Pissarro, Dyzio, Dali, metal, Boksa, Brel, legat, Sarna, Kronos, Lolo, tan, akyn, Iks, Indra, ulem, otyły, Bocelli, Gal, ork, anonim, Raina, mat, Sartre, puryce, duce, Dirac, Sokal, Daisne, Denis, Saddam, hanys, Russo, bidaka, Kraj, Adonai, Wat, komorni, Blake, moi, znalazca, Parodi, Vico, idiota, Tyc, ilota, Karel, Sierota, Kolbusz, Rak, Teski, Lubak, Niro, Tati, minoryci, 
menago, Lot, nosal, Taine, burzca, Goraj, Arab, mer, Aziz, Diaz, Sabini, Wasik, Cała, skin, markiz, Dor, meni, Barker, asura, niemy, Telle, Wrona, Bruno, Rejtan, Rasała, Pacino, Mali, Matracki, pedle, Fels, Linke, pop, kto, Imre, Murat, Jarre, kibice, Lawson, Jarota, rep, Mikina, Morse, zmarli, grum, italogrek, Arkady, Diks, Róg, Anatole, Hutu, Bomba, banici, buce, Iwo, Rut, ramol, Hawn, Hajnos, Nik, taboryci, Perec, Lem, nomad, Pacak, Miles, eon, nowi, Jul, Ber, Sujak, Kabat, cap, Sica, jog, kaid, dias, Knap, Noel, Sumer, Gleb, bard, Filak, samuraj i pop.
