
Digital Marketing Ads Clustering Using Machine Learning


ads24x7 is a digital marketing company that has secured $10 million in seed funding and is expanding into marketing analytics. Its Marketing Intelligence team has collected data, and the company wants you (its newly appointed data analyst) to segment ads by type based on the features provided. Use a clustering procedure to segment the ads into homogeneous groups.


🔍 Project Objective

This project focuses on applying unsupervised machine learning and dimensionality reduction techniques to solve two real-world analytical problems:

  1. Segment digital advertisements based on performance metrics to optimize marketing strategy.

  2. Reduce high-dimensional census data using PCA to extract meaningful population insights efficiently.

The project demonstrates strong skills in EDA, clustering, PCA, business interpretation, and actionable recommendations.


🧠 Part 1: Digital Marketing Ads Clustering (Business Analytics + ML)

📌 Problem Statement

A digital marketing company wanted to segment advertisements into homogeneous groups based on performance indicators such as CTR, CPM, CPC, revenue, spend, device type, and platform.

⚙️ Approach

  • Performed detailed EDA (univariate & bivariate analysis)

  • Treated missing values using domain-specific formulas for CTR, CPM, and CPC

  • Detected and treated outliers using the IQR method

  • Applied z-score scaling so that features measured on different scales contribute equally to the distance calculations used in clustering

  • Used:

    • Hierarchical Clustering (Ward + Euclidean)

    • K-Means Clustering

  • Identified optimal clusters using:

    • Elbow method

    • Silhouette score
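The preprocessing steps above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the column names (Impressions, Clicks, Spend, CTR, CPM, CPC) and the tiny data frame are assumptions, CTR is assumed to be stored as a percentage, and outliers are assumed to be capped at the IQR fences rather than dropped.

```python
import numpy as np
import pandas as pd

# Hypothetical column names; the real dataset's headers may differ.
KPI_COLS = ["CTR", "CPM", "CPC"]

def fill_kpis(df):
    """Fill missing CTR, CPM and CPC from their standard definitions."""
    out = df.copy()
    derived = {
        "CTR": out["Clicks"] / out["Impressions"] * 100,   # clicks per 100 impressions
        "CPM": out["Spend"] / out["Impressions"] * 1000,   # cost per 1,000 impressions
        "CPC": out["Spend"] / out["Clicks"],               # cost per click
    }
    for col, values in derived.items():
        # Zero clicks or impressions produce inf; leave those as missing.
        values = values.replace([np.inf, -np.inf], np.nan)
        out[col] = out[col].fillna(values)
    return out

def cap_outliers_iqr(df, cols, k=1.5):
    """Clip each column to [Q1 - k*IQR, Q3 + k*IQR]."""
    out = df.copy()
    for col in cols:
        q1, q3 = out[col].quantile([0.25, 0.75])
        iqr = q3 - q1
        out[col] = out[col].clip(lower=q1 - k * iqr, upper=q3 + k * iqr)
    return out

def zscore(df, cols):
    """Scale each column to mean 0 and (population) standard deviation 1."""
    out = df.copy()
    out[cols] = (out[cols] - out[cols].mean()) / out[cols].std(ddof=0)
    return out

# Tiny made-up frame, for illustration only.
ads = pd.DataFrame({
    "Impressions": [1000.0, 2000.0, 4000.0, 5000.0],
    "Clicks": [10.0, 40.0, 20.0, 50.0],
    "Spend": [5.0, 10.0, 8.0, 100.0],
    "CTR": [1.0, np.nan, 0.5, 1.0],
    "CPM": [5.0, 5.0, np.nan, 20.0],
    "CPC": [np.nan, 0.25, 0.4, 2.0],
})

numeric_cols = list(ads.columns)
filled = fill_kpis(ads)
capped = cap_outliers_iqr(filled, numeric_cols)
scaled = zscore(capped, numeric_cols)
print(scaled.round(2))
```

Capping keeps every ad in the data set, which matters when the goal is to assign each ad to a segment; dropping outlier rows would be the alternative if extreme ads were considered data errors.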

📊 Key Results

  • Optimal number of clusters identified as 5

  • Each cluster represented a distinct ad performance pattern

  • Certain clusters delivered high revenue with low CPC

  • Large ad sizes did not necessarily translate to better performance
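One way a cluster count and per-cluster profiles like those above might be obtained is sketched below. The data is synthetic (five well-separated blobs built with scikit-learn's make_blobs), and the feature names kpi_1 to kpi_3 are placeholders, so the numbers it prints say nothing about the real ads data.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import AgglomerativeClustering, KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import adjusted_rand_score, silhouette_score
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the scaled ad metrics (placeholder feature names).
centers = np.array([[0, 0, 0], [8, 0, 0], [0, 8, 0], [0, 0, 8], [8, 8, 8]])
X_raw, _ = make_blobs(n_samples=500, centers=centers, cluster_std=0.7,
                      random_state=42)
features = ["kpi_1", "kpi_2", "kpi_3"]
X = StandardScaler().fit_transform(X_raw)

# Elbow method (inertia) and silhouette score over a range of k.
rows = []
for k in range(2, 9):
    km = KMeans(n_clusters=k, n_init=10, random_state=42).fit(X)
    rows.append({"k": k,
                 "inertia": km.inertia_,
                 "silhouette": silhouette_score(X, km.labels_)})
scores = pd.DataFrame(rows).set_index("k")
best_k = int(scores["silhouette"].idxmax())
print(scores.round(3))

# Fit both algorithms at the chosen k and check how closely they agree.
kmeans_labels = KMeans(n_clusters=best_k, n_init=10,
                       random_state=42).fit_predict(X)
ward_labels = AgglomerativeClustering(n_clusters=best_k,
                                      linkage="ward").fit_predict(X)
agreement = adjusted_rand_score(kmeans_labels, ward_labels)

# Cluster profile: mean of each original (unscaled) feature per cluster.
profile = (pd.DataFrame(X_raw, columns=features)
             .assign(cluster=kmeans_labels)
             .groupby("cluster")
             .mean())
print(profile.round(2))
```

Profiling on the unscaled values keeps the cluster means in business units, which is what makes statements such as "high revenue with low CPC" readable.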

💡 Business Insights

  • Video ads generated the highest average revenue

  • Mobile ads had lower CPM and higher reach

  • Poster-sized vertical ads showed best CTR and efficiency

  • A significant portion of ads consumed budget with poor ROI

📈 Recommendations

  • Increase investment in mobile video ads

  • Prioritize poster-sized creatives for higher conversions

  • Reduce spend on clusters with high CPC & low revenue

  • Use clustering as a recurring optimization strategy


📉 Part 2: Census Data Analysis Using PCA (Data Science)

📌 Problem Statement

The Indian Census dataset contained 57+ highly correlated variables, making analysis complex and inefficient.
The objective was to reduce dimensionality while retaining as much variance as possible.

⚙️ Approach

  • Conducted EDA on selected demographic and workforce variables

  • Treated outliers and scaled data using z-score normalization

  • Verified suitability using:

    • Bartlett’s Test of Sphericity

    • KMO Test (0.93 – excellent adequacy)

  • Applied Principal Component Analysis (PCA) using Scikit-learn

  • Used Scree plot and cumulative explained variance for PC selection
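The two suitability checks listed above can be computed directly from the correlation matrix. The sketch below is a hand-written illustration using NumPy and SciPy, not the library calls used in the project, and it runs on synthetic one-factor data rather than the census file. Both functions invert or take the determinant of the correlation matrix, so they assume it is not singular.

```python
import numpy as np
from scipy.stats import chi2

def bartlett_sphericity(X):
    """Bartlett's test: H0 says the correlation matrix is the identity."""
    n, p = X.shape
    R = np.corrcoef(X, rowvar=False)
    _, logdet = np.linalg.slogdet(R)          # log|R|, numerically stable
    stat = -(n - 1 - (2 * p + 5) / 6) * logdet
    dof = p * (p - 1) / 2
    return stat, chi2.sf(stat, dof)

def kmo(X):
    """Overall Kaiser-Meyer-Olkin measure of sampling adequacy."""
    R = np.corrcoef(X, rowvar=False)
    inv = np.linalg.inv(R)
    scale = np.sqrt(np.outer(np.diag(inv), np.diag(inv)))
    partial = -inv / scale                    # partial correlations
    off = ~np.eye(R.shape[0], dtype=bool)     # off-diagonal mask
    r2 = (R[off] ** 2).sum()
    a2 = (partial[off] ** 2).sum()
    return r2 / (r2 + a2)

# Synthetic data: six variables driven by one shared factor plus noise.
rng = np.random.default_rng(0)
factor = rng.normal(size=(500, 1))
X = factor + 0.5 * rng.normal(size=(500, 6))

stat, p_value = bartlett_sphericity(X)
kmo_value = kmo(X)
print(f"Bartlett chi2={stat:.1f}, p={p_value:.3g}, KMO={kmo_value:.2f}")
```

A small p-value rejects the hypothesis that the variables are uncorrelated, and a KMO close to 1 indicates that correlations are largely shared rather than pairwise, which is the situation where PCA compresses well.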

📊 Key Results

  • Reduced 57 variables → 5 principal components

  • These 5 PCs explained 90.6% of total variance

  • Principal components captured:

    • Population size

    • Workforce composition

    • Agricultural labor patterns

    • Gender-based employment distribution
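As a rough sketch of how a component count can be chosen from cumulative explained variance and how components can be interpreted through their loadings, consider the following. The data is synthetic (12 variables generated from 3 latent factors), the var_0 to var_11 names are placeholders, and the 90% threshold is an assumption chosen to mirror the figure reported above.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in: 12 correlated variables built from 3 latent factors.
rng = np.random.default_rng(0)
n_rows, n_vars = 300, 12
latent = rng.normal(size=(n_rows, 3))
mixing = rng.normal(size=(3, n_vars))
X = latent @ mixing + 0.3 * rng.normal(size=(n_rows, n_vars))
cols = [f"var_{i}" for i in range(n_vars)]

# Standardize, then fit PCA with all components to inspect the variance curve.
X_std = StandardScaler().fit_transform(X)
pca = PCA().fit(X_std)
cum_var = np.cumsum(pca.explained_variance_ratio_)

# Smallest number of components whose cumulative variance reaches 90%.
n_keep = int(np.searchsorted(cum_var, 0.90) + 1)

# Loadings: weight of each original variable on each retained component.
loadings = pd.DataFrame(
    pca.components_[:n_keep].T,
    index=cols,
    columns=[f"PC{i + 1}" for i in range(n_keep)],
)
# Variable with the largest absolute weight on each component.
dominant = loadings.abs().idxmax()

print(f"components kept: {n_keep}, variance explained: {cum_var[n_keep - 1]:.3f}")
print(dominant)
```

Reading which variables dominate each column of the loadings table is what allows a component to be given a label such as population size or workforce composition.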

💡 Insights

  • Strong correlation between male & female population metrics

  • Workforce participation patterns varied significantly across states

  • PCA removed multicollinearity (principal components are uncorrelated by construction) while preserving most of the variance in the data
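The multicollinearity point can be checked numerically by comparing correlations before and after the transformation. The snippet below is a small illustration on synthetic data (six noisy copies of one underlying signal), not the census variables.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

def max_abs_offdiag_corr(M):
    """Largest absolute correlation between two different columns of M."""
    C = np.corrcoef(M, rowvar=False)
    return np.abs(C - np.diag(np.diag(C))).max()

# Six noisy copies of one signal: strongly collinear by design.
rng = np.random.default_rng(1)
base = rng.normal(size=(400, 1))
X = np.hstack([base + 0.2 * rng.normal(size=(400, 1)) for _ in range(6)])

X_std = StandardScaler().fit_transform(X)
pc_scores = PCA(n_components=3).fit_transform(X_std)

before = max_abs_offdiag_corr(X_std)      # high: inputs move together
after = max_abs_offdiag_corr(pc_scores)   # near zero: components are orthogonal
print(f"max |corr| before PCA: {before:.3f}, after PCA: {after:.2e}")
```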


🛠 Skills Demonstrated

Technical Skills

  • Python, Pandas, NumPy

  • Scikit-learn

  • Clustering (K-Means, Hierarchical)

  • PCA & linear algebra concepts

  • Data preprocessing & scaling

Analytics & Business Skills

  • Exploratory Data Analysis

  • Marketing analytics

  • KPI interpretation

  • Insight generation

  • Recommendation framing


🚀 Business Impact

  • Enables data-driven ad optimization

  • Reduces marketing spend inefficiency

  • Improves campaign ROI and targeting

  • Simplifies complex demographic datasets for faster decision-making


🏁 Conclusion

This project showcases my ability to:

  • Translate business problems into data science solutions

  • Apply machine learning to practical problems, not just in theory

  • Convert complex analysis into clear business recommendations



🔗 Project Resources

  • 📁 Dataset & Code

  • 📊 Report

















