Election Data Classification Project – End-to-End Analysis

Problem Definition

The objective of this project is to predict voter preference (Labour vs Conservative) using demographic, economic perception, political leadership ratings, and political awareness variables.

This is a binary classification problem, where the target variable is:

vote_Labour (1 = Labour, 0 = Conservative)

The analysis aims to:

Understand data structure and distributions
Identify relationships between predictors and voting behavior
Build and compare multiple classification models
Select the best model based on performance metrics

Git Link

Dataset Overview

Rows: 1,525 voters
Columns: 9 features + 1 target
Data Types:
- Numerical: Age, economic conditions, leader ratings, political knowledge
- Categorical: Vote, Gender
Missing Values: None
Duplicates: 8 (not materially impactful)
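
The overview figures above come from a handful of standard pandas checks. A minimal sketch follows, using a tiny made-up frame in place of the real file; the column names and values are hypothetical and only illustrate the calls.

```python
import pandas as pd

# Toy stand-in for the election data; column names and values are hypothetical.
df = pd.DataFrame({
    "vote": ["Labour", "Conservative", "Labour", "Labour", "Labour"],
    "age": [43, 36, 35, 43, 43],
    "gender": ["female", "male", "male", "female", "female"],
})

n_rows, n_cols = df.shape
missing_per_column = df.isna().sum()        # count of missing values per column
n_duplicates = int(df.duplicated().sum())   # rows identical to an earlier row

print(n_rows, n_cols, n_duplicates)
print(missing_per_column)

# Optional: drop exact duplicates before modelling.
df_clean = df.drop_duplicates().reset_index(drop=True)
```

Whether to drop duplicates is a judgment call: with no voter identifier, two identical rows may be two different people who gave the same answers.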

Target Variable Distribution

Labour voters: ~70%
Conservative voters: ~30%

➡️ Dataset is moderately imbalanced, which makes recall and AUC important evaluation metrics in addition to accuracy.
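
To see why accuracy alone can mislead here, it helps to compute the class shares and the accuracy of a trivial "always predict Labour" baseline. The sketch below uses a made-up ten-row series with the same 70/30 split; the variable names are illustrative.

```python
import pandas as pd

# Hypothetical toy target with the same 70/30 split described above.
vote = pd.Series(["Labour"] * 7 + ["Conservative"] * 3, name="vote")

# Share of each class in the sample.
class_share = vote.value_counts(normalize=True)

# Binary target as defined in the problem statement: 1 = Labour, 0 = Conservative.
vote_Labour = (vote == "Labour").astype(int)

# Accuracy of a model that always predicts the majority class.
baseline_accuracy = class_share.max()

print(class_share)
print(baseline_accuracy)
```

Any model has to beat this majority-class baseline (roughly 70% on this dataset) before its accuracy means much.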

Univariate Analysis – Key Observations

Age

Minimum age: 24
Average age: ~54
Maximum age: 93
Distribution slightly right-skewed: most voters are middle-aged or older, with a longer tail toward the oldest ages.

Economic Conditions

Most voters rate national and household economic conditions at 3 or 4.
Suggests generally neutral to positive economic sentiment.

Leadership Ratings

Blair ratings skew higher than Hague, indicating stronger preference for Blair.
Leadership perception shows potential influence on voting behavior.

Political Knowledge

Majority fall in medium knowledge categories (1–2).
Very few voters have extremely high political awareness.

Gender

Slightly more females (53%) than males (47%).
Gender alone does not strongly separate voting behavior.

Multivariate & Bivariate Analysis

Correlation Analysis

Moderate correlations observed between:
- Economic household & national conditions
- Leadership ratings and vote choice
No strong multicollinearity detected.
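
A correlation check of this kind can be sketched as follows. The data is randomly generated, the column names are hypothetical, and the 0.8 cut-off for flagging multicollinearity is an assumed rule of thumb rather than a value taken from the analysis.

```python
import numpy as np
import pandas as pd

# Randomly generated stand-in data; NOT the election dataset.
rng = np.random.default_rng(0)
n = 200
national = rng.integers(1, 6, size=n)
household = np.clip(national + rng.integers(-1, 2, size=n), 1, 5)
df = pd.DataFrame({
    "econ_national": national,
    "econ_household": household,
    "age": rng.integers(24, 94, size=n),
})

# Pearson correlation between every pair of numeric columns.
corr = df.corr()

# Flag pairs whose absolute correlation exceeds an assumed threshold.
threshold = 0.8
high_pairs = [
    (a, b, corr.loc[a, b])
    for i, a in enumerate(corr.columns)
    for b in corr.columns[i + 1:]
    if abs(corr.loc[a, b]) > threshold
]

print(corr.round(2))
print(high_pairs)
```

The same `corr` frame is what a heatmap would visualise; the list of flagged pairs is just a quicker way to read it.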

Vote-Wise Distribution Insights

Labour voters generally:
- Rate Blair higher
- Show slightly better economic sentiment
- Have marginally higher political knowledge

Visual Techniques Used

Violin plots (vote vs numeric variables)
Pair plots for interaction patterns
Heatmaps for correlation
Strip plots to capture density and overlap

➡️ These patterns indicate that economic perception and leadership ratings are key predictors.

Data Preprocessing

Dropped irrelevant ID column
Converted categorical variables using one-hot encoding
Applied Min-Max scaling to numerical variables
Train-test split: 70% training / 30% testing
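
The four preprocessing steps above might look like this in code. This is a sketch on made-up data with hypothetical column names. Fitting the scaler on the training split only is a choice made here to avoid leaking test information; the original analysis may have scaled before splitting.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

# Made-up stand-in data; column names are hypothetical.
rng = np.random.default_rng(42)
n = 100
df = pd.DataFrame({
    "id": np.arange(n),
    "age": rng.integers(24, 94, size=n).astype(float),
    "blair": rng.integers(1, 6, size=n).astype(float),
    "gender": rng.choice(["female", "male"], size=n),
    "vote": rng.choice(["Labour", "Conservative"], size=n, p=[0.7, 0.3]),
})

# 1. Drop the ID column, which carries no predictive information.
df = df.drop(columns=["id"])

# 2. One-hot encode categoricals; drop_first keeps one dummy per binary column.
encoded = pd.get_dummies(df, columns=["gender", "vote"], drop_first=True)
X = encoded.drop(columns=["vote_Labour"])
y = encoded["vote_Labour"].astype(int)

# 3. 70/30 split, stratified so both parts keep the same class ratio.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, stratify=y, random_state=42
)
X_train, X_test = X_train.copy(), X_test.copy()

# 4. Min-Max scale numeric columns using ranges learned from the training set.
numeric_cols = ["age", "blair"]
scaler = MinMaxScaler()
X_train[numeric_cols] = scaler.fit_transform(X_train[numeric_cols])
X_test[numeric_cols] = scaler.transform(X_test[numeric_cols])
```

Stratifying the split matters with a 70/30 class balance, since it keeps the Conservative share the same in train and test.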

Model Building & Evaluation

Evaluation Metrics Chosen

Accuracy: Overall correctness
Recall: Important due to class imbalance
F1-Score: Balance between precision and recall
ROC-AUC: Measures discrimination capability
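
All four metrics are available in scikit-learn. The sketch below computes them on ten hand-written labels and scores (made up for illustration) with a 0.5 decision threshold, which is an assumption.

```python
from sklearn.metrics import accuracy_score, f1_score, recall_score, roc_auc_score

# Hand-written toy example: 7 Labour (1) and 3 Conservative (0) voters.
y_true = [1, 1, 1, 1, 1, 1, 1, 0, 0, 0]
# Hypothetical predicted probabilities of voting Labour.
y_score = [0.9, 0.8, 0.8, 0.7, 0.7, 0.6, 0.4, 0.65, 0.3, 0.2]
# Hard predictions using an assumed 0.5 threshold.
y_pred = [int(s >= 0.5) for s in y_score]

accuracy = accuracy_score(y_true, y_pred)   # share of correct predictions
recall = recall_score(y_true, y_pred)       # share of actual Labour voters found
f1 = f1_score(y_true, y_pred)               # harmonic mean of precision and recall
auc = roc_auc_score(y_true, y_score)        # ranking quality across all thresholds

print(accuracy, recall, f1, auc)
```

Note that accuracy, recall, and F1 use the thresholded predictions, while ROC-AUC is computed from the raw scores.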

Model Performance Summary (Test Data)

Model	Accuracy	AUC
Naive Bayes	0.83	0.885
Logistic Regression	0.82	0.883
KNN	0.82	0.864
AdaBoost	0.82	0.879
Bagging	0.80	0.878
Random Forest	0.82	0.888
Gradient Boosting	0.83	0.904
Decision Tree	0.76	0.732
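
A comparison table like the one above can be produced with a simple loop over models. The sketch below runs on synthetic data from `make_classification`, so its numbers will not match the table. Default hyperparameters and `GaussianNB` as the Naive Bayes variant are assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (AdaBoostClassifier, BaggingClassifier,
                              GradientBoostingClassifier, RandomForestClassifier)
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (about 70% positive class); NOT the election dataset.
X, y = make_classification(n_samples=600, n_features=8, n_informative=5,
                           weights=[0.3, 0.7], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, stratify=y, random_state=0)

models = {
    "Naive Bayes": GaussianNB(),
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "KNN": KNeighborsClassifier(),
    "AdaBoost": AdaBoostClassifier(random_state=0),
    "Bagging": BaggingClassifier(random_state=0),
    "Random Forest": RandomForestClassifier(random_state=0),
    "Gradient Boosting": GradientBoostingClassifier(random_state=0),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
}

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    y_score = model.predict_proba(X_test)[:, 1]  # probability of class 1
    results[name] = {
        "accuracy": accuracy_score(y_test, y_pred),
        "auc": roc_auc_score(y_test, y_score),
    }

# Print models from highest to lowest test AUC.
for name, s in sorted(results.items(), key=lambda kv: kv[1]["auc"], reverse=True):
    print(f"{name:<20} accuracy={s['accuracy']:.2f}  auc={s['auc']:.3f}")
```

One plausible reason a single unpruned Decision Tree scores a low AUC is that its leaves tend to output probabilities of exactly 0 or 1, which gives the ROC curve very few thresholds to work with.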

🏆 Best Model: Gradient Boosting

Why Gradient Boosting?

Highest ROC-AUC (0.904) on test data
Strong balance between bias and variance
Good generalization (no sign of overfitting on the held-out test set)
Consistent recall for Labour voters
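
A common way to back up a generalization claim is to compare the same metric on the training and test splits. The sketch below does this for Gradient Boosting on synthetic data; it illustrates the check and does not reproduce the project's figures.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in data; NOT the election dataset.
X, y = make_classification(n_samples=600, n_features=8, n_informative=5,
                           weights=[0.3, 0.7], random_state=1)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, stratify=y, random_state=1)

model = GradientBoostingClassifier(random_state=1).fit(X_train, y_train)

# Same metric on both splits; a large gap suggests overfitting.
train_auc = roc_auc_score(y_train, model.predict_proba(X_train)[:, 1])
test_auc = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])
gap = train_auc - test_auc

print(f"train AUC={train_auc:.3f}  test AUC={test_auc:.3f}  gap={gap:.3f}")
```

How large a gap is acceptable depends on the problem; cross-validation gives a more stable estimate than a single split.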

Interpretation

Gradient Boosting effectively captures non-linear interactions between:

Economic sentiment
Leadership perception
Political awareness
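
One way to inspect which inputs a fitted Gradient Boosting model relies on is its `feature_importances_` attribute. In the sketch below the data is generated so that the first column drives the label by construction; the column names are hypothetical and the resulting ranking says nothing about the real election data.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier

# Generated data: the label depends mostly on "leader_rating" BY CONSTRUCTION.
rng = np.random.default_rng(0)
n = 500
X = pd.DataFrame({
    "leader_rating": rng.normal(size=n),
    "econ_household": rng.normal(size=n),
    "political_knowledge": rng.normal(size=n),
})
noise = 0.1 * rng.normal(size=n)
y = (X["leader_rating"] + 0.2 * X["econ_household"] + noise > 0).astype(int)

model = GradientBoostingClassifier(random_state=0).fit(X, y)

# Impurity-based importances, normalised to sum to 1.
importances = pd.Series(model.feature_importances_, index=X.columns)
importances = importances.sort_values(ascending=False)
print(importances)
```

Impurity-based importances can be biased toward high-cardinality features, so `sklearn.inspection.permutation_importance` on the test set is a useful cross-check.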

Model Improvement (Bagging & Boosting)

Hyperparameter tuning applied to Bagging:
- Tree depth
- Minimum samples per leaf
- Feature sampling
Result:
- Slight improvement in recall
- No significant test accuracy gain
Boosting models matched or exceeded Bagging on test AUC (Gradient Boosting 0.904 and AdaBoost 0.879 vs Bagging 0.878).

➡️ Boosting proved more suitable for this dataset.

🧠 Key Business & Analytical Insights

Leadership perception (Blair vs Hague) strongly influences vote choice
Economic outlook at household level matters more than national perception
Political knowledge improves prediction confidence
Ensemble models clearly outperform a single Decision Tree, though Naive Bayes and Logistic Regression remain competitive
Accuracy alone is insufficient; AUC and recall are critical

🛠️ Tools & Skills Demonstrated

Languages & Libraries

Python (Pandas, NumPy, Scikit-learn)
Matplotlib, Seaborn

Techniques

EDA & visualization
Feature scaling & encoding
Classification modeling
ROC-AUC analysis
Ensemble learning (Bagging, Boosting)

Final Recommendation

For real-world voter prediction systems:

Use Gradient Boosting for deployment
Monitor AUC and recall, not just accuracy
Regularly retrain as political sentiment shifts
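
The monitoring advice can be turned into a simple check that runs whenever fresh labelled data arrives. The function name, the AUC and recall floors, and the 0.5 threshold below are hypothetical placeholders, not values from this project.

```python
from sklearn.metrics import recall_score, roc_auc_score


def needs_retraining(y_true, y_score, min_auc=0.85, min_recall=0.80, threshold=0.5):
    """Return True when AUC or recall on recent labelled data falls below its floor."""
    y_pred = [int(s >= threshold) for s in y_score]
    auc = roc_auc_score(y_true, y_score)
    recall = recall_score(y_true, y_pred)
    return bool(auc < min_auc or recall < min_recall)


# Made-up batch of recent labels and model scores.
recent_true = [1, 1, 1, 0, 0, 1, 0, 1]
recent_score = [0.9, 0.8, 0.4, 0.3, 0.6, 0.7, 0.2, 0.35]
print(needs_retraining(recent_true, recent_score))
```

In this toy batch the ranking is still reasonable but two of the five Labour voters fall below the threshold, so recall drops under the assumed floor and the check flags the model.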

AI Councel Lab
