
Election Data Classification Project – End-to-End Analysis


Problem Definition

The objective of this project is to predict voter preference (Labour vs Conservative) from demographic variables, economic perceptions, ratings of party leaders, and political awareness.

This is a binary classification problem, where the target variable is:

  • vote_Labour (1 = Labour, 0 = Conservative)

The analysis aims to:

  • Understand data structure and distributions

  • Identify relationships between predictors and voting behavior

  • Build and compare multiple classification models

  • Select the best model based on performance metrics

Dataset Overview

  • Rows: 1,525 voters

  • Columns: 9 features + 1 target

  • Data Types:

    • Numerical: Age, economic conditions, leader ratings, political knowledge

    • Categorical: Vote, Gender

  • Missing Values: None

  • Duplicates: 8 (not materially impactful)
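
The overview figures above (row and column counts, data types, missing values, duplicates) can be reproduced with a few pandas calls. The sketch below runs on a tiny made-up frame, not the real election data, and the column names are assumptions chosen for illustration.

```python
import pandas as pd

# Tiny synthetic stand-in for the election data; column names are assumptions.
df = pd.DataFrame({
    "age": [43, 36, 35, 24, 41, 43],
    "economic_cond_national": [3, 4, 4, 4, 2, 3],
    "Blair": [4, 4, 5, 2, 1, 4],
    "gender": ["female", "male", "male", "female", "male", "female"],
    "vote": ["Labour", "Labour", "Labour", "Labour", "Conservative", "Labour"],
})


def summarize(frame: pd.DataFrame) -> dict:
    """Collect the basic structure checks used in a dataset overview."""
    return {
        "rows": frame.shape[0],
        "columns": frame.shape[1],
        "missing_values": int(frame.isna().sum().sum()),
        # Rows that exactly repeat an earlier row.
        "duplicate_rows": int(frame.duplicated().sum()),
        # Everything that is not numeric is treated as categorical here.
        "categorical_columns": frame.select_dtypes(exclude="number").columns.tolist(),
    }


summary = summarize(df)
print(summary)
```

On the real data the same function would be called on the loaded DataFrame; whether to drop the duplicate rows afterwards is a separate judgment call.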

Target Variable Distribution

  • Labour voters: ~70%

  • Conservative voters: ~30%

➡️ Dataset is moderately imbalanced, which makes recall and AUC important evaluation metrics in addition to accuracy.
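
To see why accuracy alone can mislead with a 70/30 split, it helps to compute the majority-class baseline: a model that always predicts Labour is already right about 70% of the time. A minimal sketch, using ten synthetic votes that mimic the reported split (the `vote` column name is an assumption):

```python
import pandas as pd

# Ten synthetic votes matching the reported ~70/30 split; not the real data.
votes = pd.Series(["Labour"] * 7 + ["Conservative"] * 3, name="vote")

# Binary target as defined in the problem statement: 1 = Labour, 0 = Conservative.
vote_labour = (votes == "Labour").astype(int)

# Share of each class, indexed by 0 and 1.
class_share = vote_labour.value_counts(normalize=True).sort_index()

# Accuracy of a trivial model that always predicts the majority class.
majority_baseline = class_share.max()

print(class_share)
print(f"Majority-class baseline accuracy: {majority_baseline:.2f}")
```

Any model worth keeping should beat this baseline on accuracy and also show useful recall on the minority (Conservative) class.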


Univariate Analysis – Key Observations

Age

  • Minimum age: 24

  • Average age: ~54

  • Maximum age: 93

  • Distribution is slightly right-skewed: most voters are middle-aged, with a longer tail toward older ages.

Economic Conditions

  • Most voters rate national and household economic conditions between 3–4.

  • Suggests generally neutral to positive economic sentiment.

Leadership Ratings

  • Blair ratings skew higher than Hague, indicating stronger preference for Blair.

  • Leadership perception shows potential influence on voting behavior.

Political Knowledge

  • Majority fall in medium knowledge categories (1–2).

  • Very few voters have extremely high political awareness.

Gender

  • Slightly more females (53%) than males (47%).

  • Gender alone does not strongly separate voting behavior.


Multivariate & Bivariate Analysis

Correlation Analysis

  • Moderate correlations observed between:

    • Economic household & national conditions

    • Leadership ratings and vote choice

  • No strong multicollinearity detected.
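
A simple way to run this kind of check is to compute the correlation matrix and list any feature pairs above a chosen cutoff. The sketch below uses randomly generated ratings rather than the real data; the column names and the 0.8 cutoff are assumptions, not values from the original analysis.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 500

# Synthetic 1-5 ratings; household sentiment is built to track national sentiment loosely.
national = rng.integers(1, 6, size=n)
household = np.clip(national + rng.integers(-2, 3, size=n), 1, 5)

df = pd.DataFrame({
    "economic_cond_national": national,
    "economic_cond_household": household,
    "Blair": rng.integers(1, 6, size=n),
    "Hague": rng.integers(1, 6, size=n),
})

corr = df.corr()  # Pearson correlation by default


def high_corr_pairs(corr_matrix: pd.DataFrame, threshold: float = 0.8) -> list:
    """Return (feature_a, feature_b, r) for each pair with |r| >= threshold."""
    cols = corr_matrix.columns
    pairs = []
    for i in range(len(cols)):
        for j in range(i + 1, len(cols)):
            r = float(corr_matrix.iloc[i, j])
            if abs(r) >= threshold:
                pairs.append((cols[i], cols[j], r))
    return pairs


print(corr.round(2))
print("Pairs above cutoff:", high_corr_pairs(corr))
```

An empty list at the chosen cutoff is what "no strong multicollinearity" would look like; the same matrix can be passed to a Seaborn heatmap for the visual version.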

Vote-Wise Distribution Insights

  • Labour voters generally:

    • Rate Blair higher

    • Show slightly better economic sentiment

    • Have marginally higher political knowledge

Visual Techniques Used

  • Violin plots (vote vs numeric variables)

  • Pair plots for interaction patterns

  • Heatmaps for correlation

  • Strip plots to capture density and overlap

➡️ These patterns indicate that economic perception and leadership ratings are key predictors.


Data Preprocessing

  • Dropped irrelevant ID column

  • Converted categorical variables using one-hot encoding

  • Applied Min-Max scaling to numerical variables

  • Train-test split: 70% training / 30% testing
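
The preprocessing steps above can be sketched as follows. The data is synthetic and the column names (`id`, `age`, `Blair`, `gender`, `vote`) are assumptions. One deliberate choice that may differ from the original notebook: the scaler is fitted on the training split only, which is a common way to keep test-set information out of training.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

rng = np.random.default_rng(42)
n = 200

# Synthetic stand-in data; not the real election dataset.
df = pd.DataFrame({
    "id": np.arange(n),
    "age": rng.integers(24, 94, size=n),
    "Blair": rng.integers(1, 6, size=n),
    "gender": rng.choice(["female", "male"], size=n),
    "vote": rng.choice(["Labour", "Conservative"], size=n, p=[0.7, 0.3]),
})

# 1. Drop the identifier column, which carries no predictive information.
df = df.drop(columns=["id"])

# 2. One-hot encode; drop_first keeps one column per binary category
#    (gender_male and vote_Labour, since categories are sorted alphabetically).
encoded = pd.get_dummies(df, columns=["gender", "vote"], drop_first=True, dtype=int)

numeric_cols = ["age", "Blair"]
encoded[numeric_cols] = encoded[numeric_cols].astype(float)

X = encoded.drop(columns=["vote_Labour"])
y = encoded["vote_Labour"]

# 3. 70/30 split, stratified so both splits keep the class ratio.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, stratify=y, random_state=42
)
X_train, X_test = X_train.copy(), X_test.copy()

# 4. Min-Max scaling: fit on train, then apply the same transform to test.
scaler = MinMaxScaler()
X_train[numeric_cols] = scaler.fit_transform(X_train[numeric_cols])
X_test[numeric_cols] = scaler.transform(X_test[numeric_cols])

print(X_train.head())
```

Because the scaler only sees training rows, scaled test values can fall slightly outside the 0 to 1 range, which is expected.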


Model Building & Evaluation

Evaluation Metrics Chosen

  • Accuracy: Overall correctness

  • Recall: Important due to class imbalance

  • F1-Score: Balance between precision and recall

  • ROC-AUC: Measures discrimination capability
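
As a small worked example of these four metrics, the sketch below scores ten hand-made predictions with scikit-learn. The labels and probabilities are invented for illustration and the 0.5 decision threshold is an assumption.

```python
import numpy as np
from sklearn.metrics import accuracy_score, f1_score, recall_score, roc_auc_score

# Ten invented voters: 1 = Labour, 0 = Conservative.
y_true = np.array([1, 1, 1, 1, 1, 1, 1, 0, 0, 0])

# Invented predicted probabilities of voting Labour.
y_score = np.array([0.9, 0.8, 0.8, 0.7, 0.7, 0.65, 0.4, 0.6, 0.3, 0.2])

# Hard predictions at an assumed 0.5 threshold.
y_pred = (y_score >= 0.5).astype(int)

metrics = {
    "accuracy": accuracy_score(y_true, y_pred),  # share of correct predictions
    "recall": recall_score(y_true, y_pred),      # Labour voters correctly found
    "f1": f1_score(y_true, y_pred),              # harmonic mean of precision and recall
    "roc_auc": roc_auc_score(y_true, y_score),   # ranking quality, threshold-free
}

for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Note that ROC-AUC is computed from the probabilities, not the thresholded predictions, which is why it can differ between models that have the same accuracy.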

Model Performance Summary (Test Data)

Model                  Accuracy   AUC
Naive Bayes            0.83       0.885
Logistic Regression    0.82       0.883
KNN                    0.82       0.864
AdaBoost               0.82       0.879
Bagging                0.80       0.878
Random Forest          0.82       0.888
Gradient Boosting      0.83       0.904
Decision Tree          0.76       0.732
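
A comparison like this is usually produced by fitting each model in a loop and collecting test-set scores. The sketch below shows one way to do it with scikit-learn defaults on synthetic data from `make_classification`; the hyperparameters and data are assumptions, so the scores it prints will not match the numbers reported in this post.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import (AdaBoostClassifier, BaggingClassifier,
                              GradientBoostingClassifier, RandomForestClassifier)
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data with a roughly 30/70 class split (not the election data).
X, y = make_classification(n_samples=600, n_features=8, n_informative=5,
                           weights=[0.3, 0.7], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, stratify=y, random_state=0
)

# Default settings throughout; the original analysis may have used tuned values.
models = {
    "Naive Bayes": GaussianNB(),
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "KNN": KNeighborsClassifier(),
    "AdaBoost": AdaBoostClassifier(random_state=0),
    "Bagging": BaggingClassifier(random_state=0),
    "Random Forest": RandomForestClassifier(random_state=0),
    "Gradient Boosting": GradientBoostingClassifier(random_state=0),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
}

rows = []
for name, model in models.items():
    model.fit(X_train, y_train)
    rows.append({
        "model": name,
        "accuracy": accuracy_score(y_test, model.predict(X_test)),
        # AUC uses the predicted probability of the positive class.
        "auc": roc_auc_score(y_test, model.predict_proba(X_test)[:, 1]),
    })

results = (pd.DataFrame(rows)
           .set_index("model")
           .sort_values("auc", ascending=False))
print(results.round(3))
```

Sorting by AUC puts the best-ranking model at the top, which mirrors how the best model is chosen in this post.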

🏆 Best Model: Gradient Boosting

Why Gradient Boosting?

  • Highest ROC-AUC (0.904) on test data

  • Strong balance between bias and variance

  • Good generalization (no overfitting)

  • Consistent recall for Labour voters

Interpretation

Gradient Boosting effectively captures non-linear interactions between:

  • Economic sentiment

  • Leadership perception

  • Political awareness
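
One common way to inspect which inputs a Gradient Boosting model relies on is its `feature_importances_` attribute. The sketch below is illustrative only: the data is synthetic, the column names are assumptions, and the target is generated from a toy rule based on the Blair and Hague ratings, so the resulting ranking reflects that toy rule rather than any finding about real voters.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(0)
n = 600

# Synthetic features with assumed names and rating scales.
X = pd.DataFrame({
    "Blair": rng.integers(1, 6, size=n),
    "Hague": rng.integers(1, 6, size=n),
    "economic_cond_household": rng.integers(1, 6, size=n),
    "political_knowledge": rng.integers(0, 4, size=n),
    "age": rng.integers(24, 94, size=n),
})

# Toy rule (an assumption): Labour when Blair is rated above Hague, plus noise.
y = ((X["Blair"] - X["Hague"] + rng.normal(0, 1, size=n)) > 0).astype(int)

model = GradientBoostingClassifier(random_state=0).fit(X, y)

# Impurity-based importances; they are normalized to sum to 1.
importances = (pd.Series(model.feature_importances_, index=X.columns)
               .sort_values(ascending=False))
print(importances.round(3))
```

Impurity-based importances can favor features with many distinct values, so permutation importance on held-out data is a useful cross-check.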


Model Improvement (Bagging & Boosting)

  • Hyperparameter tuning applied to Bagging:

    • Tree depth

    • Minimum samples per leaf

    • Feature sampling

  • Result:

    • Slight improvement in recall

    • No significant test accuracy gain

  • Boosting models scored higher than Bagging on the test set (Gradient Boosting: accuracy 0.83, AUC 0.904; Bagging: accuracy 0.80, AUC 0.878).

➡️ Boosting proved more suitable for this dataset.


🧠 Key Business & Analytical Insights

  1. Leadership perception (Blair vs Hague) strongly influences vote choice

  2. Economic outlook at household level matters more than national perception

  3. Political knowledge improves prediction confidence

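  4. Tree ensembles clearly outperform a single Decision Tree; Gradient Boosting achieves the highest AUC, with Naive Bayes and Logistic Regression close behind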

  5. Accuracy alone is insufficient—AUC and recall are critical


🛠️ Tools & Skills Demonstrated

Languages & Libraries

  • Python (Pandas, NumPy, Scikit-learn)

  • Matplotlib, Seaborn

Techniques

  • EDA & visualization

  • Feature scaling & encoding

  • Classification modeling

  • ROC-AUC analysis

  • Ensemble learning (Bagging, Boosting)

Final Recommendation

For real-world voter prediction systems:

  • Use Gradient Boosting for deployment

  • Monitor AUC and recall, not just accuracy

  • Regularly retrain as political sentiment shifts


















