The Vector Space of the Polish Parliament in Pictures

Strictly speaking, only of its lower house, the Sejm. During the current term, its 460 deputies have voted 5545 times since the 8th of November 2011. The Sejm website publishes the individual results of each vote as PDF files. I loaded them into an SQLite database with an unholy hodgepodge of shell scripts, wget, pdftohtml, and Python. Here is a sample query and its output:

SELECT deputy, party, voting, result
FROM VotingResults
JOIN Deputies USING(deputy_id)
JOIN Parties USING(party_id)
JOIN Votings USING(voting_id)
JOIN Results USING(result_id)
ORDER BY random()
LIMIT 5;

HOK MAREK          | PO  | data/glos_32_10.html | NAY
MILLER LESZEK      | SLD | data/glos_49_42.html | AYE
MASŁOWSKA GABRIELA | PiS | data/glos_18_14.html | ABSTAIN
ROMANEK ANDRZEJ    | SP  | data/glos_93_58.html | AYE
ŁOPATA JAN         | PSL | data/glos_57_65.html | ABSENT
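The query implies one table per entity (deputies, parties, votings, results) plus a VotingResults table that links them by ID. The sketch below rebuilds that layout in an in-memory SQLite database with Python's sqlite3 module. The ID column names come from the USING clauses. The column types, the remaining table layout, and the sample rows (taken from two rows of the output above) are assumptions, not the author's actual schema.

```python
import sqlite3

# Hypothetical schema: ID columns inferred from the USING(...) clauses,
# everything else is an assumption for illustration.
SCHEMA = """
CREATE TABLE Parties (party_id INTEGER PRIMARY KEY, party TEXT);
CREATE TABLE Deputies (deputy_id INTEGER PRIMARY KEY, deputy TEXT,
                       party_id INTEGER REFERENCES Parties);
CREATE TABLE Votings (voting_id INTEGER PRIMARY KEY, voting TEXT);
CREATE TABLE Results (result_id INTEGER PRIMARY KEY, result TEXT);
CREATE TABLE VotingResults (voting_id INTEGER REFERENCES Votings,
                            deputy_id INTEGER REFERENCES Deputies,
                            result_id INTEGER REFERENCES Results);
"""

QUERY = """
SELECT deputy, party, voting, result
FROM VotingResults
JOIN Deputies USING(deputy_id)
JOIN Parties USING(party_id)
JOIN Votings USING(voting_id)
JOIN Results USING(result_id)
ORDER BY random()
LIMIT 5
"""


def build_sample_db():
    """Create an in-memory database holding two sample rows."""
    con = sqlite3.connect(":memory:")
    con.executescript(SCHEMA)
    con.executemany("INSERT INTO Results VALUES (?, ?)",
                    enumerate(["AYE", "NAY", "ABSTAIN", "ABSENT"], start=1))
    con.executemany("INSERT INTO Parties VALUES (?, ?)", [(1, "PO"), (2, "SLD")])
    con.executemany("INSERT INTO Deputies VALUES (?, ?, ?)",
                    [(1, "HOK MAREK", 1), (2, "MILLER LESZEK", 2)])
    con.executemany("INSERT INTO Votings VALUES (?, ?)",
                    [(1, "data/glos_32_10.html"), (2, "data/glos_49_42.html")])
    # (voting_id, deputy_id, result_id): HOK votes NAY, MILLER votes AYE.
    con.executemany("INSERT INTO VotingResults VALUES (?, ?, ?)",
                    [(1, 1, 2), (2, 2, 1)])
    con.commit()
    return con


if __name__ == "__main__":
    con = build_sample_db()
    for row in con.execute(QUERY):
        print(" | ".join(row))
```

Because every foreign key is named identically in both tables, the `USING` joins stay short. The trade-off is that the ID column names must be kept consistent across the whole schema.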

The database contains as many as 509 deputies, not 460, because some of them died during the term or were appointed or elected to other posts and were replaced. Each deputy is assigned to a party according to his or her latest affiliation. The database holds the results of 5539 votes: I skipped 6 elections of speakers because their PDFs had a different layout.

Using Python and NumPy, we can put the results into a matrix R with 5539 rows (votes) and 509 columns (deputies). Each column is a deputy's vector in the 5539-dimensional space of voting results. The elements of R are +1 for AYE, −1 for NAY, and 0 for ABSTAIN and ABSENT. Next, centre each row so that its mean is zero. Shift only the AYE and NAY entries, so that ABSTAIN and ABSENT stay at a neutral zero:

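import numpy
# Mean of the AYE/NAY entries in each row (ABSTAIN and ABSENT are 0).
M = R.sum(axis=1) / numpy.fabs(R).sum(axis=1)
MT = M[:, numpy.newaxis]  # column vector that broadcasts across deputies
# Subtract the mean from AYE/NAY entries only; ABSTAIN/ABSENT stay at 0.
Rcentred = numpy.where(R != 0, R - MT, 0)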
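Here is a minimal sketch of how R might be assembled and centred. It assumes the votes arrive as (voting_index, deputy_index, result) triples with 0-based, contiguous indices. The helper names `build_matrix` and `centre_rows` are hypothetical, not the author's script. It also assumes that deputies not in office for a given vote simply stay at 0. Unlike the snippet above, it guards against a row with no AYE or NAY entries, which would otherwise divide by zero.

```python
import numpy as np

# Encoding from the text: AYE -> +1, NAY -> -1, ABSTAIN and ABSENT -> 0.
ENCODING = {"AYE": 1.0, "NAY": -1.0, "ABSTAIN": 0.0, "ABSENT": 0.0}


def build_matrix(triples, n_votings, n_deputies):
    """Hypothetical helper: fill R (votings x deputies) from
    (voting_index, deputy_index, result) triples with 0-based indices.
    Assumption: deputies not in office for a voting simply stay at 0."""
    R = np.zeros((n_votings, n_deputies))
    for v, d, result in triples:
        R[v, d] = ENCODING[result]
    return R


def centre_rows(R):
    """Subtract each row's mean over its AYE/NAY entries from those entries
    only, leaving ABSTAIN/ABSENT at zero, so every row sums to zero.
    Rows with no AYE/NAY entry are left as zeros instead of dividing by 0."""
    counts = np.abs(R).sum(axis=1)  # number of AYE/NAY votes per row
    M = np.divide(R.sum(axis=1), counts,
                  out=np.zeros(len(R)), where=counts > 0)
    return np.where(R != 0, R - M[:, np.newaxis], 0.0)


triples = [(0, 0, "AYE"), (0, 1, "AYE"), (0, 2, "NAY"),
           (1, 0, "ABSENT"), (1, 1, "NAY"), (1, 2, "ABSTAIN"),
           (2, 0, "ABSENT"), (2, 1, "ABSENT"), (2, 2, "ABSENT")]
R = build_matrix(triples, n_votings=3, n_deputies=3)
Rcentred = centre_rows(R)
print(Rcentred.sum(axis=1))  # each row now sums to (numerically) zero
```

In the first toy row, two AYEs and one NAY give a mean of 1/3. The row therefore becomes [2/3, 2/3, −4/3], which sums to zero.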

Then perform the Singular Value Decomposition of Rcentred:

U, S, VT = numpy.linalg.svd(Rcentred, full_matrices=False)

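U holds the left-singular vectors of Rcentred. These are orthonormal directions in the 5539-dimensional voting space. The first captures as much of the variance of Rcentred as possible, the second as much of the remainder as possible, and so on. S contains the corresponding singular values in descending order. We do not use VT.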
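VT is not wasted information, though. With `full_matrices=False`, NumPy returns factors satisfying Rcentred = U·diag(S)·VT. Projecting the deputy columns onto the left-singular vectors, Uᵀ·Rcentred, therefore gives exactly diag(S)·VT. The sketch below checks both identities on a small random stand-in matrix. The data and the plain mean-centring are illustrative assumptions, not the Sejm data.

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-in for Rcentred: 40 hypothetical votings x 12 deputies, entries in {-1, 0, +1}.
R = rng.choice([-1.0, 0.0, 1.0], size=(40, 12))
Rc = R - R.mean(axis=1, keepdims=True)  # plain row centring, for illustration only

U, S, VT = np.linalg.svd(Rc, full_matrices=False)

# The thin SVD reconstructs the matrix.
reconstructed = U @ np.diag(S) @ VT
# Projecting every deputy (column) onto the left-singular vectors...
projections = U.T @ Rc
# ...gives the same numbers as scaling the rows of VT by the singular values.
same_as_scaled_VT = S[:, np.newaxis] * VT
```

In other words, the coordinates plotted below could equally be read off as `S[:3, None] * VT[:3]`.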

We have just carried out a Principal Component Analysis of R. It lets us visualize the 5539 dimensions of R in a much smaller number of dimensions. In our case, the first three left-singular vectors account for 70.1%, 8.5%, and 3.6% of the total variance, respectively. This is what the columns of Rcentred look like when projected onto these three vectors:
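For a centred matrix, the share of the total variance along the i-th left-singular vector is the standard PCA ratio S_i² / ΣS². The percentages above were presumably computed this way. The three coordinates of each deputy are the dot products of the deputy's column with the first three left-singular vectors. The function names below are hypothetical, and the demo runs on random data rather than the real matrix.

```python
import numpy as np


def explained_variance_ratio(S):
    """Fraction of the total variance along each left-singular vector: S_i**2 / sum(S**2)."""
    return S**2 / np.sum(S**2)


def project(Rcentred, U, k=3):
    """Coordinates of every deputy (column of Rcentred) along the first k
    left-singular vectors; the result has shape (k, n_deputies)."""
    return U[:, :k].T @ Rcentred


# Demo on a hypothetical random matrix (60 votings x 15 deputies),
# centred with a plain row mean for illustration only.
rng = np.random.default_rng(1)
R = rng.choice([-1.0, 0.0, 1.0], size=(60, 15))
Rcentred = R - R.mean(axis=1, keepdims=True)
U, S, VT = np.linalg.svd(Rcentred, full_matrices=False)

ratios = explained_variance_ratio(S)
coords = project(Rcentred, U)
print([f"{100 * r:.1f}%" for r in ratios[:3]])
print(coords.shape)  # (3, 15)
```

Each column of `coords` gives the (x, y, z) position of one deputy in the scatter plot.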

A scatter plot of three principal components of the voting patterns

The x axis, often called the partisan axis, corresponds to the division between the government (PO+PSL) and the opposition (everybody else). The two votes with the largest absolute weights in the first left-singular vector, −0.0216 and +0.0217, both concern the government's involvement in the Amber Gold scandal.

The y axis appears to correspond to a right-wing versus left-wing polarization (everybody else versus SLD+RP). Its most significant votes are the rejection of a proposal to repeal the law against offending religious feelings (−0.0451) and a proposal to raise taxes on copper and silver mining (+0.0447).

The z axis, in turn, seems to reflect attitudes towards the European Union (everybody else versus PiS+RP+ZP, also known as SP). Its two most significant votes, with weights −0.0463 and +0.0470, concern aligning Polish law with EU law.
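Finding the most significant votes for an axis amounts to sorting one column of U by absolute weight. A minimal sketch follows, assuming `labels` holds one description per voting, for example the `voting` column from the database. The helper `top_votings` is hypothetical, and the toy weights are not a real singular vector.

```python
import numpy as np


def top_votings(U, component, labels, n=2):
    """Return the n votings with the largest absolute weight in column
    `component` of U, as (label, weight) pairs, largest first."""
    weights = U[:, component]
    order = np.argsort(-np.abs(weights))[:n]
    return [(labels[i], float(weights[i])) for i in order]


# Toy weights for 3 hypothetical votings and 2 components.
U_demo = np.array([[0.1, 0.5],
                   [-0.7, 0.2],
                   [0.3, -0.6]])
labels_demo = ["voting A", "voting B", "voting C"]
print(top_votings(U_demo, 0, labels_demo))  # [('voting B', -0.7), ('voting C', 0.3)]
```

Singular vectors are only defined up to sign. `numpy.linalg.svd` may return −u instead of u, together with the negated row of VT. That flips every weight in the vector and mirrors the corresponding axis in the plots. Only the relative signs within one vector carry meaning.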

Finally, here are the voting vectors of individual deputies projected onto the xy, xz, and yz planes.

EDIT: here is a zoomable version, made with the Bokeh visualization library. Thanks for the tip, stared!

The 1st and the 2nd principal components of the voting patterns

The 1st and the 3rd principal components of the voting patterns

The 2nd and the 3rd principal components of the voting patterns
