In fact, only of the lower house, the Sejm. During its current term, the 460 deputies have voted 5545 times since 8 November 2011. The Sejm website publishes the individual results of each vote as PDF files. I put the results into an SQLite database with an unholy hodgepodge of shell scripts, wget, pdftohtml, and Python. Here is a sample query and its output:
SELECT deputy, party, voting, result
FROM VotingResults
JOIN Deputies USING(deputy_id)
JOIN Parties USING(party_id)
JOIN Votings USING(voting_id)
JOIN Results USING(result_id)
ORDER BY random() LIMIT 5;

HOK MAREK | PO | data/glos_32_10.html | NAY
MILLER LESZEK | SLD | data/glos_49_42.html | AYE
MASŁOWSKA GABRIELA | PiS | data/glos_18_14.html | ABSTAIN
ROMANEK ANDRZEJ | SP | data/glos_93_58.html | AYE
ŁOPATA JAN | PSL | data/glos_57_65.html | ABSENT
The database registers 509 deputies rather than 460, since some of them died during the term or were appointed or elected to other posts. The deputies are grouped into parties according to their latest status. There are data on 5539 votings. I skipped 6 elections of speakers, whose PDF layout was different.
Using Python and numpy, we can put the results into a matrix R with 5539 rows and 509 columns. Each column corresponds to one deputy and is a vector in the 5539-dimensional space of voting results. The elements of R are +1 for AYE, −1 for NAY, and 0 for ABSTAIN and ABSENT. Shift the AYE and NAY entries of each row so that the row mean becomes zero, while keeping ABSTAIN and ABSENT at zero:
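import numpy

# Mean of each row (voting) over the AYE and NAY votes only
M = R.sum(axis=1) / numpy.fabs(R).sum(axis=1)
MT = M[:, numpy.newaxis]
# Subtract the row mean from AYE and NAY entries; leave ABSTAIN and ABSENT at 0
Rcentred = numpy.where(R != 0, R - MT, 0)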
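To see what the encoding and the centring do, here is a minimal sketch on a made-up example with 2 votings and 4 deputies. The records list and the ENCODING dictionary are assumptions for illustration only; the real data come from the SQLite database.

```python
import numpy

# Hypothetical toy records: (voting index, deputy index, result).
records = [
    (0, 0, "AYE"), (0, 1, "AYE"), (0, 2, "NAY"), (0, 3, "ABSENT"),
    (1, 0, "NAY"), (1, 1, "ABSTAIN"), (1, 2, "NAY"), (1, 3, "AYE"),
]
# Encoding described in the text: +1, -1, and 0 for ABSTAIN and ABSENT.
ENCODING = {"AYE": 1.0, "NAY": -1.0, "ABSTAIN": 0.0, "ABSENT": 0.0}

# Rows are votings, columns are deputies.
R = numpy.zeros((2, 4))
for voting, deputy, result in records:
    R[voting, deputy] = ENCODING[result]

# Same centring as above: the mean is taken over AYE and NAY entries only.
M = R.sum(axis=1) / numpy.fabs(R).sum(axis=1)
MT = M[:, numpy.newaxis]
Rcentred = numpy.where(R != 0, R - MT, 0)

print(Rcentred)
print(Rcentred.sum(axis=1))  # each row now sums to zero (up to rounding)
```

In the first voting, two AYE and one NAY give a mean of 1/3, so AYE becomes 2/3 and NAY becomes −4/3, while the absent deputy stays at 0. A voting in which nobody voted AYE or NAY would make M a 0/0 division; numpy would warn about it, but numpy.where would still leave that row at zero.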
Then perform the Singular Value Decomposition of Rcentred:
U, S, VT = numpy.linalg.svd(Rcentred, full_matrices=False)
The columns of U are the so-called left-singular vectors of Rcentred. They are ordered so that each successive one points along the direction in which the remaining variance of Rcentred is the largest possible. S contains the so-called singular values of Rcentred. We do not use VT.
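The following is a sketch of how the share of variance per component and the coordinates of the deputies along the leading left-singular vectors could be computed from U and S. It assumes the usual PCA convention that the variance along a component is proportional to the square of its singular value. The small random matrix is only a stand-in for the real Rcentred, and the names share and coords are mine.

```python
import numpy

# Stand-in for Rcentred: 6 votings (rows) x 4 deputies (columns),
# with every row shifted to zero mean.
rng = numpy.random.default_rng(0)
Rcentred = rng.normal(size=(6, 4))
Rcentred -= Rcentred.mean(axis=1, keepdims=True)

U, S, VT = numpy.linalg.svd(Rcentred, full_matrices=False)

# Fraction of the total variance carried by each component.
share = S**2 / numpy.sum(S**2)

# Coordinates of every deputy (column) along the first three
# left-singular vectors; the result has shape (3, number of deputies).
coords = U[:, :3].T @ Rcentred

print(share)
print(coords.shape)
```

The same coordinates can be read from the other factors as S[:3, numpy.newaxis] * VT[:3], so projecting onto U is a matter of taste rather than necessity.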
We have just carried out the Principal Component Analysis of R. Thanks to it, we can visualize the 5539 dimensions of R in a much smaller number of dimensions. In our case, the first three left-singular vectors contribute 70.1%, 8.5%, and 3.6% of the total variance of R, respectively. The projection of the vectors of Rcentred along its first three left-singular vectors looks like this:
The x axis, usually called the partisan axis, corresponds to the division between the government (PO+PSL) and the opposition (everybody else). The votings with the highest absolute weights in the first left-singular vector concern the involvement of the government in the Amber Gold scandal: this (−0.0216) and this (+0.0217).
The y axis appears to correspond to the polarization between the right wing and the left wing (everybody else versus SLD+RP). Its most significant votings are the rejection of a proposal to waive the law against insulting religious feelings (−0.0451) and the proposal to raise the taxes on copper and silver mining (+0.0447).
The z axis, in turn, seems to correspond to sentiments towards the European Union (everybody else versus PiS+RP+ZP aka SP). Its most significant votings concern two alignments between Polish and EU law: this (−0.0463) and this (+0.0470).
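The most significant votings of an axis can be looked up directly in the corresponding column of U. The sketch below assumes that "most significant" means the most negative and the most positive weight. The function extreme_votings, the toy matrix, and the voting names are hypothetical.

```python
import numpy

def extreme_votings(U, component, voting_names):
    """Return (name, weight) of the most negative and most positive
    entries of one left-singular vector (a column of U)."""
    weights = U[:, component]
    lowest = int(numpy.argmin(weights))
    highest = int(numpy.argmax(weights))
    return ((voting_names[lowest], float(weights[lowest])),
            (voting_names[highest], float(weights[highest])))

# Toy stand-in for U: 3 votings (rows) x 2 components (columns).
U_toy = numpy.array([[ 0.1, -0.7],
                     [-0.6,  0.2],
                     [ 0.5,  0.3]])
names = ["voting_a", "voting_b", "voting_c"]

negative, positive = extreme_votings(U_toy, 0, names)
print(negative, positive)
```

Keep in mind that the sign of a singular vector is arbitrary: flipping a column of U together with the matching row of VT gives an equally valid decomposition, so only the opposition between the two votings is meaningful, not which one is negative.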
Finally, here are the votes of individual deputies projected on the xy, xz, and yz planes.