
Inferential Statistics in Practice: From Probability to ANOVA


🔍 Project Overview 

This project applies inferential statistics to several real-world problems across sports analytics, manufacturing quality control, marketing operations, and healthcare.

The objective was to move beyond descriptive statistics and apply probability theory, hypothesis testing, and ANOVA techniques to draw meaningful conclusions and support data-driven decision-making.



🎯 Key Objectives

  • Apply probability concepts to real datasets

  • Use normal distribution and Z-tests for quality analysis

  • Perform hypothesis testing (Z-test, T-test)

  • Analyze multi-factor effects using One-Way & Two-Way ANOVA

  • Translate statistical results into business insights and recommendations


🧠 Problem 1: Sports Injury Probability Analysis

Business Question

Can player position help explain the likelihood of foot injuries in a football team?

Approach

  • Used conditional probability and joint probability

  • Analyzed injury distribution across playing positions
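As a rough illustration of the approach above, joint and conditional probabilities can be read off a cross-tabulation. The sketch below uses a small made-up squad (the `players` table, its positions, and its counts are assumptions, not the project's data), so the numbers it prints will not match the figures reported in this post.

```python
import pandas as pd

# Hypothetical squad data: positions and injury counts are made up for illustration.
players = pd.DataFrame({
    "position": ["Striker"] * 4 + ["Winger"] * 3 + ["Midfielder"] * 3,
    "status": ["Injured", "Injured", "Injured", "Not injured",
               "Injured", "Not injured", "Not injured",
               "Injured", "Not injured", "Not injured"],
})

# Marginal probability P(injured): share of all players who are injured.
p_injured = (players["status"] == "Injured").mean()

# Joint probability P(position and status): every cell divided by the grand total.
joint = pd.crosstab(players["position"], players["status"], normalize="all")

# Conditional probability P(injured | position): each row sums to 1.
p_injured_given_position = pd.crosstab(
    players["position"], players["status"], normalize="index"
)["Injured"]

# Conditional probability P(position | injured): each column sums to 1.
p_position_given_injured = pd.crosstab(
    players["position"], players["status"], normalize="columns"
)["Injured"]

print(f"P(injured) = {p_injured:.2f}")
print(joint.round(2))
print(p_injured_given_position.round(2))
print(p_position_given_injured.round(2))
```

Note that P(injured | position) and P(position | injured) answer different questions: the first describes how risky a position is, the second describes who makes up the injured group.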

Key Insights

  • Overall injury probability: 61%

  • Among injured players, strikers accounted for the largest share of injuries

  • Injury likelihood varies noticeably by playing position

Impact

Helps coaching and medical staff focus preventive care strategies on high-risk positions.


🏭 Problem 2: Manufacturing Quality Control (Normal Distribution)

Business Question

What proportion of cement gunny bags fail strength requirements?

Approach

  • Assumed normal distribution

  • Used Z-score-based probability estimation

  • Visualized probability regions for decision clarity
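A minimal sketch of the Z-score calculation, assuming strength is normally distributed. The mean, standard deviation, and threshold below are placeholder values chosen for illustration, not the parameters used in the project.

```python
from scipy.stats import norm

# Assumed (hypothetical) parameters; replace with the real process values.
mean_strength = 50.0   # assumed mean breaking strength
std_strength = 10.0    # assumed standard deviation
min_threshold = 38.0   # assumed minimum acceptable strength

# Z-score: how many standard deviations the threshold lies from the mean.
z = (min_threshold - mean_strength) / std_strength

# P(strength < threshold): area under the normal curve to the left of z.
p_below = norm.cdf(z)

# P(strength >= threshold): the survival function gives the right-hand tail.
p_at_least = norm.sf(z)

print(f"z = {z:.2f}")
print(f"P(below threshold)    = {p_below:.4f}")
print(f"P(at least threshold) = {p_at_least:.4f}")
```

The same two calls cover most of the questions in this kind of problem: `norm.cdf` for "what share falls below X" and `norm.sf` for "what share is at least X".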

Key Insights

  • About 11% of bags fall below the minimum strength threshold

  • Over 82% meet the acceptable strength criteria

  • Identified the strength ranges (risk zones) that contribute to material loss

Impact

Supports supply chain quality checks and reduces wastage risk.


🧪 Problem 3: Stone Hardness Testing (Hypothesis Testing)

Business Question

Are unpolished stones suitable for high-quality printing?

Statistical Techniques Used

  • One-sample Z-test (large sample, sample mean compared against the required hardness value)

  • Independent two-sample T-test

  • Outlier treatment and distribution analysis
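The two tests listed above could look roughly like this in SciPy. Everything numeric here is an assumption made for illustration: the hardness readings are evenly spaced placeholder values and `required_hardness` is a made-up threshold, so the output says nothing about the real stones.

```python
import numpy as np
from scipy import stats

# Placeholder hardness readings (hypothetical, evenly spaced for reproducibility).
unpolished = np.linspace(100, 170, 50)
polished = np.linspace(130, 166, 50)
required_hardness = 150  # assumed minimum hardness for printing

# One-sample, left-tailed Z-test.
# H0: mean hardness >= required_hardness, H1: mean hardness < required_hardness.
# With a large sample, the sample standard deviation stands in for the
# population standard deviation.
std_error = unpolished.std(ddof=1) / np.sqrt(unpolished.size)
z_stat = (unpolished.mean() - required_hardness) / std_error
p_z = stats.norm.cdf(z_stat)  # left-tail p-value

# Independent two-sample t-test (Welch's version, which does not assume
# equal variances). H0: both stone types have the same mean hardness.
t_stat, p_t = stats.ttest_ind(unpolished, polished, equal_var=False)

print(f"Z-test: z = {z_stat:.2f}, p = {p_z:.4g}")
print(f"t-test: t = {t_stat:.2f}, p = {p_t:.4g}")
```

Welch's variant is used here as a cautious default; if the variances are shown to be similar, `equal_var=True` gives the classic pooled t-test.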

Key Findings

  • Mean hardness of unpolished stones is significantly below the required threshold

  • Polished stones show higher and more consistent hardness

Recommendation

Zingaro, the company in this case study, is justified in rejecting unpolished stones for printing applications.


🦷 Problem 4: Dental Implant Hardness Analysis (ANOVA)

Business Question

How do dentist, method, and alloy influence implant hardness?

Techniques Used

  • One-Way ANOVA

  • Two-Way ANOVA with interaction effects

  • Shapiro-Wilk Test (normality)

  • Levene Test (variance equality)

  • Tukey post-hoc analysis
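The workflow above (assumption checks, ANOVA, then post-hoc comparison) might be sketched as follows. The data are simulated, and the column names `dentist`, `method`, and `hardness`, the group labels, and the effect sizes are all assumptions; alloy type is left out to keep the example short.

```python
import numpy as np
import pandas as pd
from scipy import stats
from statsmodels.formula.api import ols
from statsmodels.stats.anova import anova_lm
from statsmodels.stats.multicomp import pairwise_tukeyhsd

# Simulated (hypothetical) balanced design: 3 dentists x 3 methods x 5 implants.
rng = np.random.default_rng(0)
method_means = {"M1": 700, "M2": 740, "M3": 780}  # made-up method effects
rows = [
    {"dentist": d, "method": m, "hardness": h}
    for d in ["D1", "D2", "D3"]
    for m, mu in method_means.items()
    for h in rng.normal(mu, 15, size=5)
]
df = pd.DataFrame(rows)
groups = [g["hardness"].to_numpy() for _, g in df.groupby("method")]

# Two-way model with main effects and the dentist x method interaction.
model = ols("hardness ~ C(dentist) * C(method)", data=df).fit()

# Assumption checks: normality of residuals and equality of group variances.
shapiro_stat, shapiro_p = stats.shapiro(model.resid)
levene_stat, levene_p = stats.levene(*groups)

# One-way ANOVA on method alone.
f_stat, f_p = stats.f_oneway(*groups)

# Two-way ANOVA table (Type II sums of squares).
anova_table = anova_lm(model, typ=2)

# Tukey HSD: which pairs of methods differ?
tukey = pairwise_tukeyhsd(endog=df["hardness"], groups=df["method"], alpha=0.05)

print(f"Shapiro-Wilk p = {shapiro_p:.3f}, Levene p = {levene_p:.3f}")
print(f"One-way ANOVA (method): F = {f_stat:.2f}, p = {f_p:.4g}")
print(anova_table)
print(tukey.summary())
```

In the ANOVA table, the `C(dentist):C(method)` row is the interaction term; a small p-value there means the effect of method depends on which dentist performs the procedure.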

Key Insights

  • Dentist alone does not significantly impact hardness

  • Implant method significantly affects hardness

  • Strong interaction exists between dentist and method

  • Optimal methods vary by alloy type

Business Impact

  • Standardizes implant procedures

  • Improves treatment outcomes

  • Reduces variability in medical results


🛠 Skills Demonstrated

Statistical & Analytical Skills

  • Probability theory

  • Hypothesis testing

  • Z-test, T-test

  • One-Way & Two-Way ANOVA

  • Post-hoc analysis

Tools & Techniques

  • Python

  • Pandas, NumPy

  • SciPy, StatsModels

  • Data visualization

  • Statistical interpretation


📈 Overall Impact

This project showcases the ability to:

  • Choose the right statistical test for each problem

  • Validate assumptions before modeling

  • Interpret statistical output in business terms

  • Support decisions with data-backed evidence


🏁 Conclusion

Inferential statistics is a critical foundation for data science and analytics.
This project demonstrates how statistical methods can directly support sports strategy, manufacturing quality, marketing optimization, and healthcare decision-making.












