Brexit and Data Science

What can we make of the indicative votes in Parliament?

If you're following the Brexit process (and who in the UK isn't?), you might be aware that MPs held indicative votes on the 27th of March 2019 to find out which next steps for Brexit they could stand behind.

Well, all of the proposals failed. Since the voting patterns of individual MPs were quite varied, I wondered whether I could spend a few minutes applying some of the basic methods we teach on the Cambridge Spark Applied Data Science bootcamp to see if I could find anything interesting.

Data Collection

First of all, the data!

All of the division (aka vote) results are published as CSV files.

The votes we're interested in are divisions #655 to #662.

I downloaded the appropriate CSV files from the corresponding division pages (for example https://commonsvotes.digiminster.com/Divisions/Details/655 ) and put them in a folder called 'data' (as we do on the bootcamp).

Looking at the files, you can see that they have the same structure, where line 4 is the name of the division, and the voting results start at line 10:

 
Division Number: 386 Division Date: 27/03/2019
Mr Baron's motion B (No deal)
Aye Count: 160 Noes Count: 400
Members recorded
Member,Party,Constituency,Vote,Proxy Member
"Diane Abbott","Labour","Hackney North and Stoke Newington","No",""
"Debbie Abrahams","Labour","Oldham East and Saddleworth","No",""
"Nigel Adams","Conservative","Selby and Ainsty","No Vote Recorded",""
"Bim Afolami","Conservative","Hitchin and Harpenden","No",""

 

We can write some simple Python code to read in these files one by one, parse the name of the vote, load the records as a CSV, add the name as the column label, and then combine all 8 votes into a single pandas dataframe.

Note that I chose to convert the results from text to numbers, where -1 is No, 1 is Aye and 0 means no vote was recorded.

 

import pandas as pd
from functools import reduce


def vote_text_to_int(vote):
    # Encode the vote text as a number: Aye -> 1, No -> -1, no vote -> 0
    if vote == 'Aye':
        return 1
    elif vote == 'No':
        return -1
    elif vote == 'No Vote Recorded':
        return 0
    else:
        raise ValueError('Unsupported vote: {}'.format(vote))


def read_division(d):
    file_name = 'data/Division{}.csv'.format(d)
    # Line 4 of each file holds the name of the division
    with open(file_name, 'r') as f:
        division_name = f.readlines()[3].strip()
    # The CSV header is on line 10, so skip the 9 lines before it
    votes_data = pd.read_csv(file_name, skiprows=9)
    votes_data['Vote'] = votes_data['Vote'].apply(vote_text_to_int)
    # Name the vote column after the division so the 8 files can be joined
    votes_data.rename(columns={'Vote': division_name}, inplace=True)
    del votes_data['Proxy Member']
    votes_data.set_index(['Member', 'Party', 'Constituency'], inplace=True)
    return votes_data


# Read divisions 655 to 662 and join them on (Member, Party, Constituency)
data = [read_division(i) for i in range(655, 662 + 1)]
data = reduce(lambda x, y: x.join(y), data)
data.reset_index(inplace=True)
data.head()

 

The resulting pandas dataframe looks something like this:

[Screenshot: the first rows of the combined dataframe]
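
As a quick sanity check on the encoding, the Ayes and Noes per division can be tallied from the combined dataframe and compared with the "Aye Count" and "Noes Count" lines in each file's header. The snippet below is only a sketch: `tally_votes` is a hypothetical helper, and the toy dataframe with made-up members and motions stands in for the real `data` so that it runs without the CSV files.

```python
import pandas as pd


def tally_votes(df, vote_columns):
    """Count Ayes (1), Noes (-1) and unrecorded votes (0) for each division."""
    votes = df[vote_columns]
    return pd.DataFrame({
        'Aye': (votes == 1).sum(),
        'No': (votes == -1).sum(),
        'No vote': (votes == 0).sum(),
    })


# Toy stand-in for the combined dataframe (made-up members and motions)
toy = pd.DataFrame({
    'Member': ['MP 1', 'MP 2', 'MP 3', 'MP 4'],
    'Party': ['A', 'A', 'B', 'B'],
    'Constituency': ['North', 'South', 'East', 'West'],
    'Motion X': [1, 1, -1, 0],
    'Motion Y': [-1, -1, -1, 1],
})

# As in the real dataframe, the vote columns start at position 3
tally = tally_votes(toy, toy.columns[3:])
print(tally)
```

With the real data, the equivalent call would be `tally_votes(data, data.columns[3:])`, giving one row per division.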

With this data in hand, I wanted to know whether there are obvious clusters of MPs who vote in a similar fashion. I decided to use t-SNE, a powerful algorithm that helps visualize high-dimensional data in 2 dimensions, which makes the overall structure of the data easier to understand (https://lvdmaaten.github.io/tsne/).

After playing with some of the settings, I ended up with the picture below:

[Figure: t-SNE scatter plot of MPs, coloured by party]

The code to generate this is actually quite simple (thanks to sklearn and seaborn!)

from sklearn.manifold import TSNE
import seaborn as sns

# The first 3 columns are Member, Party and Constituency; the rest are the votes
feat_cols = data.columns[3:]

# Project the 8 vote columns down to 2 dimensions
# (recent scikit-learn releases rename n_iter to max_iter)
tsne = TSNE(n_components=2, verbose=1, perplexity=25, n_iter=500)
tsne_results = tsne.fit_transform(data.loc[:, feat_cols].values)

data_tsne = data.copy()
data_tsne['x-tsne'] = tsne_results[:, 0]
data_tsne['y-tsne'] = tsne_results[:, 1]

# Plot every MP, coloured by party
sns.set(rc={'figure.figsize': (13, 13)})
sns.scatterplot(data=data_tsne, x='x-tsne', y='y-tsne', hue='Party')

 

I was a little surprised by this picture. I expected tighter factions (maybe 4-5 of them), but it seems there is one bigger, tight cluster of Conservative MPs, 2 other small clusters, and everybody else all over the place.
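
One way to put a rough number on "all over the place" is to count how many distinct combinations of votes occur within each party. The snippet below is a sketch on made-up data: `distinct_patterns_by_party` is a hypothetical helper, and the toy dataframe only mimics the shape of the real one.

```python
import pandas as pd


def distinct_patterns_by_party(df, vote_columns):
    """For each party, count its MPs and their distinct vote combinations."""
    patterns = (
        df.groupby('Party')[vote_columns]
        .apply(lambda group: len(group.drop_duplicates()))
        .rename('distinct_patterns')
    )
    sizes = df.groupby('Party').size().rename('mps')
    return pd.concat([sizes, patterns], axis=1)


# Toy stand-in for the combined dataframe (made-up parties and motions)
toy_votes = pd.DataFrame({
    'Party': ['A', 'A', 'A', 'B', 'B'],
    'Motion X': [1, 1, -1, 0, 0],
    'Motion Y': [-1, -1, 1, 1, 1],
})

summary = distinct_patterns_by_party(toy_votes, ['Motion X', 'Motion Y'])
print(summary)
```

A party whose MPs all voted the same way has a single pattern, while a party with nearly as many patterns as MPs has no common line at all.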

If opinions are this varied, it is not surprising that Parliament is struggling to find a majority. I was also interested in who the MPs in this bigger cluster are. To find out, I turned to a clustering algorithm, k-means, which in a few lines of Python code tagged the cluster in question with a cluster label of 0:

[Figure: t-SNE scatter plot of MPs, coloured by k-means cluster]

The code for this is also absolutely minimal because of how amazing scikit-learn is (are you starting to see a pattern here?).

from sklearn.cluster import KMeans

# Cluster the MPs on their 2-dimensional t-SNE coordinates
kmeans = KMeans(n_clusters=5, random_state=0).fit(data_tsne.loc[:, ['x-tsne', 'y-tsne']])

data_kmeans = data_tsne.copy()
data_kmeans['cluster'] = kmeans.predict(data_tsne.loc[:, ['x-tsne', 'y-tsne']])

sns.scatterplot(data=data_kmeans, x='x-tsne', y='y-tsne', hue='cluster')

 

Now that these MPs are tagged, it's easy to filter them from the dataframe.

data_kmeans[data_kmeans.cluster == 0].Member.values
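
K-means numbers its clusters arbitrarily, so the cluster of interest will not necessarily get the label 0 on another run or with a different `random_state`. Here is a small sketch that looks the label up instead of hard-coding it, assuming the cluster you want is simply the one with the most members. `largest_cluster_members` is a hypothetical helper and the toy dataframe is made up.

```python
import pandas as pd


def largest_cluster_members(df, cluster_column='cluster'):
    """Return the label of the most populous cluster and the names in it."""
    largest = df[cluster_column].value_counts().idxmax()
    members = df.loc[df[cluster_column] == largest, 'Member'].tolist()
    return largest, members


# Toy stand-in for data_kmeans (made-up members and cluster labels)
toy_clusters = pd.DataFrame({
    'Member': ['MP 1', 'MP 2', 'MP 3', 'MP 4', 'MP 5'],
    'cluster': [2, 0, 2, 1, 2],
})

label, members = largest_cluster_members(toy_clusters)
print(label, members)
```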

 

The full list is as follows:

['Adam Afriyie', 'David Amess', 'Stuart Andrew', 'Richard Bacon', 'Steve Baker', 'John Baron', 'Henry Bellingham', 'Crispin Blunt', 'Peter Bone', 'Ben Bradley', 'Suella Braverman', 'Jack Brereton', 'Andrew Bridgen', 'Fiona Bruce', 'Conor Burns', 'William Cash', 'Rehman Chishti', 'Christopher Chope', 'Simon Clarke', 'Geoffrey Clifton-Brown', 'Tracey Crouch', 'David T. C. Davies', 'Philip Davies', 'Caroline Dinenage', 'Nadine Dorries', 'Steve Double', 'Richard Drax', 'James Duddridge', 'Iain Duncan Smith', 'Charlie Elphicke', 'Nigel Evans', 'Michael Fabricant', 'Mark Francois', 'Marcus Fysh', 'Zac Goldsmith', 'James Gray', 'Chris Green', 'Mark Harper', 'Rebecca Harris', 'John Hayes', 'Kate Hoey', 'Philip Hollobone', 'Adam Holloway', 'Eddie Hughes', 'Ranil Jayawardena', 'Bernard Jenkin', 'Andrea Jenkyns', 'Boris Johnson', 'Caroline Johnson', 'Gareth Johnson', 'David Jones', 'Mark Lancaster', 'Edward Leigh', 'Andrew Lewer', 'Julian Lewis', 'Julia Lopez', 'Jonathan Lord', 'Tim Loughton', 'Craig Mackinlay', 'Rachel Maclean', 'Anne Main', 'Alan Mak', 'Kit Malthouse', 'Paul Maynard', 'Stephen McPartland', 'Esther McVey', 'Stephen Metcalfe', 'Maria Miller', 'Amanda Milling', 'Nigel Mills', 'Sheryll Murray', "Neil O'Brien", 'Matthew Offord', 'Priti Patel', 'Owen Paterson', 'Christopher Pincher', 'Tom Pursglove', 'Will Quince', 'Dominic Raab', 'Jacob Rees-Mogg', 'Laurence Robertson', 'Andrew Rosindell', 'Lee Rowley', 'Paul Scully', 'Grant Shapps', 'Henry Smith', 'Royston Smith', 'Mark Spencer', 'Andrew Stephenson', 'Bob Stewart', 'Iain Stewart', 'Graham Stuart', 'Rishi Sunak', 'Desmond Swayne', 'Ross Thomson', 'Justin Tomlinson', 'Michael Tomlinson', 'Anne-Marie Trevelyan', 'Shailesh Vara', 'Giles Watling', 'Helen Whately', 'Heather Wheeler', 'John Whittingdale', 'Bill Wiggin', 'Mike Wood', 'William Wragg']
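
A cross-tabulation of cluster against party is a compact way to see how each cluster is composed. The snippet below is a sketch on made-up data; with the real results the same `pd.crosstab` call would take the `cluster` and `Party` columns of `data_kmeans`.

```python
import pandas as pd

# Toy stand-in for data_kmeans (made-up parties and cluster labels)
toy_kmeans = pd.DataFrame({
    'Party': ['A', 'A', 'A', 'B', 'B', 'C'],
    'cluster': [0, 0, 1, 0, 1, 1],
})

# Rows are clusters, columns are parties, cells are numbers of MPs
composition = pd.crosstab(toy_kmeans['cluster'], toy_kmeans['Party'])
print(composition)
```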

 


Written by

David Illes
Teaching Fellow, Cambridge Spark
