
Election Data Classification Project – End-to-End Analysis


Problem Definition

The objective of this project is to predict voter preference (Labour vs Conservative) from demographic variables, economic perceptions, ratings of political leaders, and political knowledge.

This is a binary classification problem, where the target variable is:

  • vote_Labour (1 = Labour, 0 = Conservative)

The analysis aims to:

  • Understand data structure and distributions

  • Identify relationships between predictors and voting behavior

  • Build and compare multiple classification models

  • Select the best model based on performance metrics

Dataset Overview

  • Rows: 1,525 voters

  • Columns: 9 features + 1 target

  • Data Types:

    • Numerical: Age, economic conditions, leader ratings, political knowledge

    • Categorical: Vote, Gender

  • Missing Values: None

  • Duplicates: 8 rows (too few to materially affect the results)
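The checks behind this overview can be sketched with pandas. The snippet below is only an illustration on a tiny made-up table: the column names and values are assumptions, not the project's actual data.

```python
import pandas as pd

# Toy stand-in for the election data; column names and values are hypothetical.
df = pd.DataFrame({
    "age": [43, 36, 35, 43, 24],
    "gender": ["female", "male", "male", "female", "male"],
    "vote": ["Labour", "Labour", "Conservative", "Labour", "Labour"],
})

n_rows, n_cols = df.shape                   # number of rows and columns
missing_per_column = df.isna().sum()        # missing values in each column
n_duplicates = int(df.duplicated().sum())   # rows that repeat an earlier row

print(f"Rows: {n_rows}, Columns: {n_cols}")
print(df.dtypes)                            # numeric vs. object (categorical) columns
print(missing_per_column)
print(f"Duplicate rows: {n_duplicates}")
```

On the real dataset the same calls would report the row count, data types, missing values, and duplicate count listed above.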

Target Variable Distribution

  • Labour voters: ~70%

  • Conservative voters: ~30%

➡️ Dataset is moderately imbalanced, which makes recall and AUC important evaluation metrics in addition to accuracy.
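To see why accuracy alone can mislead here, it helps to compute the majority-class baseline. This is a minimal sketch on a made-up 70/30 target column; the series below is an assumption that only mirrors the approximate split reported above.

```python
import pandas as pd

# Hypothetical target: 1 = Labour, 0 = Conservative, in a 70/30 ratio.
vote_labour = pd.Series([1] * 7 + [0] * 3, name="vote_Labour")

# Share of each class in the target.
class_share = vote_labour.value_counts(normalize=True)

# Accuracy of a trivial model that always predicts the majority class.
majority_baseline = class_share.max()

print(class_share)
print(f"Majority-class baseline accuracy: {majority_baseline:.2f}")
```

A model that always predicts Labour would already reach roughly 70% accuracy, so any classifier has to be judged against that floor and with metrics that look at both classes.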


Univariate Analysis – Key Observations

Age

  • Minimum age: 24

  • Average age: ~54

  • Maximum age: 93

  • Distribution is slightly right-skewed: most voters are middle-aged, with a longer tail toward senior ages.

Economic Conditions

  • Most voters rate both national and household economic conditions at 3 or 4.

  • Suggests generally neutral to positive economic sentiment.

Leadership Ratings

  • Blair ratings skew higher than Hague, indicating stronger preference for Blair.

  • Leadership perception shows potential influence on voting behavior.

Political Knowledge

  • The majority of voters fall in the medium knowledge categories (1–2).

  • Very few voters have extremely high political awareness.

Gender

  • Slightly more females (53%) than males (47%).

  • Gender alone does not strongly separate voting behavior.


Multivariate & Bivariate Analysis

Correlation Analysis

  • Moderate correlations observed between:

    • Household and national economic conditions

    • Leadership ratings and vote choice

  • No strong multicollinearity detected.
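A correlation check of this kind can be sketched as follows. The data is synthetic and the column names, the 1–5 rating range, and the 0.8 cut-off for flagging multicollinearity are all assumptions chosen for illustration.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 500

# Synthetic ratings on an assumed 1-5 scale.
national = rng.integers(1, 6, size=n)
# Household rating loosely tracks the national one (national +/- 1, clipped to 1-5).
household = np.clip(national + rng.integers(-1, 2, size=n), 1, 5)
blair = rng.integers(1, 6, size=n)  # independent of the economic ratings

df = pd.DataFrame({
    "economic_national": national,
    "economic_household": household,
    "blair": blair,
})

corr = df.corr()  # Pearson correlation matrix

# Flag feature pairs whose absolute correlation exceeds an illustrative threshold.
threshold = 0.8
high_pairs = [
    (a, b, round(corr.loc[a, b], 3))
    for i, a in enumerate(corr.columns)
    for b in corr.columns[i + 1:]
    if abs(corr.loc[a, b]) > threshold
]

print(corr.round(2))
print("Pairs above threshold:", high_pairs)
```

The same matrix is what a correlation heatmap visualizes; an empty list of flagged pairs would correspond to the "no strong multicollinearity" finding.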

Vote-Wise Distribution Insights

  • Labour voters generally:

    • Rate Blair higher

    • Show slightly better economic sentiment

    • Have marginally higher political knowledge

Visual Techniques Used

  • Violin plots (vote vs numeric variables)

  • Pair plots for interaction patterns

  • Heatmaps for correlation

  • Strip plots to capture density and overlap

➡️ These patterns indicate that economic perception and leadership ratings are key predictors.


Data Preprocessing

  • Dropped irrelevant ID column

  • Converted categorical variables using one-hot encoding

  • Applied Min-Max scaling to numerical variables

  • Train-test split: 70% training / 30% testing
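The four preprocessing steps can be sketched with pandas and scikit-learn. The toy table, its column names, and the random seed are assumptions. The sketch also fits the scaler on the training split only, which is a design choice to avoid leaking test information; the original does not say where the scaler was fitted.

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

# Toy stand-in for the election data; all names and values are hypothetical.
df = pd.DataFrame({
    "id": range(1, 11),
    "age": [43, 36, 35, 24, 41, 47, 57, 77, 39, 70],
    "blair": [4, 4, 5, 2, 1, 4, 4, 4, 4, 5],
    "gender": ["female", "male", "male", "female", "male",
               "male", "male", "male", "female", "male"],
    "vote": ["Labour", "Labour", "Labour", "Labour", "Labour",
             "Labour", "Conservative", "Labour", "Conservative", "Conservative"],
})
num_cols = ["age", "blair"]
df = df.astype({col: float for col in num_cols})

# 1. Drop the ID column, which carries no predictive information.
df = df.drop(columns=["id"])

# 2. One-hot encode the categorical columns (drop_first keeps one dummy per binary variable).
encoded = pd.get_dummies(df, columns=["gender", "vote"], drop_first=True)
X = encoded.drop(columns=["vote_Labour"])
y = encoded["vote_Labour"].astype(int)

# 3. 70/30 split, stratified so both sets keep the class ratio.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, stratify=y, random_state=42
)

# 4. Min-Max scale numeric columns: fit on train, apply to both splits.
scaler = MinMaxScaler()
X_train = X_train.copy()
X_test = X_test.copy()
X_train[num_cols] = scaler.fit_transform(X_train[num_cols])
X_test[num_cols] = scaler.transform(X_test[num_cols])

print(X_train.head())
```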


Model Building & Evaluation

Evaluation Metrics Chosen

  • Accuracy: Overall correctness

  • Recall: Important due to class imbalance

  • F1-Score: Balance between precision and recall

  • ROC-AUC: Measures discrimination capability
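As a small worked example, the four metrics can be computed with scikit-learn on hand-made labels and scores. The values below are invented for illustration and the 0.5 decision threshold is an assumption.

```python
from sklearn.metrics import accuracy_score, f1_score, recall_score, roc_auc_score

# Invented ground truth (1 = Labour, 0 = Conservative) and predicted probabilities.
y_true = [1, 1, 1, 1, 1, 1, 1, 0, 0, 0]
y_prob = [0.9, 0.8, 0.75, 0.7, 0.65, 0.6, 0.4, 0.55, 0.3, 0.2]

# Hard predictions at an assumed 0.5 threshold.
y_pred = [int(p >= 0.5) for p in y_prob]

accuracy = accuracy_score(y_true, y_pred)   # share of correct predictions
recall = recall_score(y_true, y_pred)       # share of actual Labour voters found
f1 = f1_score(y_true, y_pred)               # harmonic mean of precision and recall
auc = roc_auc_score(y_true, y_prob)         # ranking quality, threshold-independent

print(f"Accuracy={accuracy:.3f} Recall={recall:.3f} F1={f1:.3f} AUC={auc:.3f}")
```

Note that accuracy, recall, and F1 depend on the chosen threshold, while ROC-AUC is computed from the probabilities themselves.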

Model Performance Summary (Test Data)

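| Model | Accuracy | AUC |
|---|---|---|
| Naive Bayes | 0.83 | 0.885 |
| Logistic Regression | 0.82 | 0.883 |
| KNN | 0.82 | 0.864 |
| AdaBoost | 0.82 | 0.879 |
| Bagging | 0.80 | 0.878 |
| Random Forest | 0.82 | 0.888 |
| Gradient Boosting | 0.83 | 0.904 |
| Decision Tree | 0.76 | 0.732 |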
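A comparison like the one above can be produced with a simple loop over scikit-learn estimators. The sketch below runs on synthetic data from make_classification, so its numbers will not match the reported results. The hyperparameters, the random seeds, and the choice of GaussianNB as the Naive Bayes variant are assumptions.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import (AdaBoostClassifier, BaggingClassifier,
                              GradientBoostingClassifier, RandomForestClassifier)
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Synthetic, moderately imbalanced stand-in for the voter data (about 30/70).
X, y = make_classification(n_samples=600, n_features=8, n_informative=4,
                           weights=[0.3, 0.7], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, stratify=y, random_state=0
)

models = {
    "Naive Bayes": GaussianNB(),
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "KNN": KNeighborsClassifier(),
    "AdaBoost": AdaBoostClassifier(n_estimators=50, random_state=0),
    "Bagging": BaggingClassifier(n_estimators=50, random_state=0),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "Gradient Boosting": GradientBoostingClassifier(random_state=0),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
}

rows = []
for name, model in models.items():
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)               # hard labels for accuracy
    y_prob = model.predict_proba(X_test)[:, 1]   # positive-class scores for AUC
    rows.append({
        "model": name,
        "accuracy": accuracy_score(y_test, y_pred),
        "auc": roc_auc_score(y_test, y_prob),
    })

# Rank models by test AUC, best first.
results = pd.DataFrame(rows).sort_values("auc", ascending=False).reset_index(drop=True)
print(results.round(3))
```

Evaluating every model on the same held-out split keeps the comparison fair; cross-validation would give more stable estimates on a dataset of this size.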

🏆 Best Model: Gradient Boosting

Why Gradient Boosting?

  • Highest ROC-AUC (0.904) on test data, with accuracy (0.83) tied for best with Naive Bayes

  • Strong balance between bias and variance

  • Good generalization (no overfitting)

  • Consistent recall for Labour voters

Interpretation

Gradient Boosting effectively captures non-linear interactions between:

  • Economic sentiment

  • Leadership perception

  • Political awareness
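One common way to inspect what a Gradient Boosting model has learned is its impurity-based feature importances. The sketch below uses synthetic data in which a hypothetical leader_rating feature drives the target by construction, so it only demonstrates the mechanics and says nothing about the project's actual importances.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(0)
n = 400

# Hypothetical feature names; values are random noise, not survey data.
X = pd.DataFrame({
    "leader_rating": rng.normal(size=n),
    "economic_sentiment": rng.normal(size=n),
    "political_knowledge": rng.normal(size=n),
})

# Synthetic target driven mostly by leader_rating, a little by economic_sentiment.
signal = X["leader_rating"] + 0.3 * X["economic_sentiment"] + 0.1 * rng.normal(size=n)
y = (signal > 0).astype(int)

model = GradientBoostingClassifier(n_estimators=50, random_state=0).fit(X, y)

# Importances are normalized to sum to 1; higher means more influence on the splits.
importances = pd.Series(model.feature_importances_, index=X.columns)
importances = importances.sort_values(ascending=False)
print(importances.round(3))
```

Impurity-based importances can be biased toward high-cardinality features, so permutation importance on the test set is a reasonable cross-check.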


Model Improvement (Bagging & Boosting)

  • Hyperparameter tuning applied to Bagging:

    • Tree depth

    • Minimum samples per leaf

    • Feature sampling

  • Result:

    • Slight improvement in recall

    • No significant test accuracy gain

  • Boosting models outperformed Bagging on test data (Gradient Boosting: accuracy 0.83, AUC 0.904; AdaBoost: 0.82, 0.879; Bagging: 0.80, 0.878).

➡️ Boosting proved more suitable for this dataset.
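The Bagging tuning described above can be sketched with GridSearchCV. The data is synthetic, and the grid values, fold count, and recall scoring are assumptions, since the original does not list its search space.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic, moderately imbalanced stand-in for the voter data.
X, y = make_classification(n_samples=400, n_features=8, n_informative=4,
                           weights=[0.3, 0.7], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, stratify=y, random_state=0
)

bagging = BaggingClassifier(DecisionTreeClassifier(random_state=0),
                            n_estimators=20, random_state=0)

# The nested-parameter prefix is "estimator" in recent scikit-learn releases and
# "base_estimator" in older ones, so look it up instead of hard-coding it.
prefix = "estimator" if "estimator" in bagging.get_params() else "base_estimator"

param_grid = {
    f"{prefix}__max_depth": [3, 5, None],      # tree depth
    f"{prefix}__min_samples_leaf": [1, 5],     # minimum samples per leaf
    "max_features": [0.5, 1.0],                # share of features drawn per tree
}

# Optimize recall with 3-fold cross-validation on the training split.
search = GridSearchCV(bagging, param_grid, scoring="recall", cv=3)
search.fit(X_train, y_train)

best_params = search.best_params_
test_recall = search.score(X_test, y_test)                      # uses the recall scorer
test_accuracy = accuracy_score(y_test, search.predict(X_test))  # accuracy for comparison

print(best_params)
print(f"Test recall={test_recall:.3f}, test accuracy={test_accuracy:.3f}")
```

Reporting both test recall and test accuracy makes it visible when tuning improves one without moving the other, which is the pattern described above.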


🧠 Key Business & Analytical Insights

  1. Leadership perception (Blair vs Hague) strongly influences vote choice

  2. Economic outlook at household level matters more than national perception

  3. Political knowledge improves prediction confidence

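  4. Ensemble models clearly outperform a single Decision Tree (test AUC 0.878–0.904 vs 0.732), although Naive Bayes and Logistic Regression remain competitive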

  5. Accuracy alone is insufficient; AUC and recall are critical


🛠️ Tools & Skills Demonstrated

Languages & Libraries

  • Python (Pandas, NumPy, Scikit-learn)

  • Matplotlib, Seaborn

Techniques

  • EDA & visualization

  • Feature scaling & encoding

  • Classification modeling

  • ROC-AUC analysis

  • Ensemble learning (Bagging, Boosting)

Final Recommendation

For real-world voter prediction systems:

  • Use Gradient Boosting for deployment

  • Monitor AUC and recall, not just accuracy

  • Regularly retrain as political sentiment shifts


















