Predicting Sleep Quality from Lifestyle Factors

Overview

This project explores the relationship between sleep quality, physical activity, stress levels, and other lifestyle factors using statistical analysis and machine learning techniques.

The analysis covers exploratory data analysis, hypothesis testing, regression modeling, classification, and clustering to identify which factors are associated with sleep quality. Together these steps form a full data science workflow, from data exploration through predictive modeling.


Dataset

Source: Sleep Health and Lifestyle Dataset

Observations: 374 individuals

Features: 13 variables describing demographic, health, and lifestyle characteristics.

Key variables include:

The Person ID column was removed during preprocessing because it is only a row identifier and carries no information about sleep quality.
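
As a minimal sketch of that preprocessing step, assuming the data is held in a pandas DataFrame: only the "Person ID" column name comes from the text above, while the other column names and all values below are made up for illustration.

```python
import pandas as pd


def drop_identifier(df: pd.DataFrame, id_col: str = "Person ID") -> pd.DataFrame:
    """Return a copy of df without the identifier column.

    errors="ignore" keeps the call safe if the column was already removed.
    """
    return df.drop(columns=[id_col], errors="ignore")


# Made-up rows standing in for the real dataset.
demo = pd.DataFrame(
    {
        "Person ID": [1, 2, 3],
        "Physical Activity Level": [30, 60, 45],  # assumed column name
        "Quality of Sleep": [6, 8, 7],            # assumed column name
    }
)
cleaned = drop_identifier(demo)
print(cleaned.columns.tolist())
```

Because `drop` returns a new DataFrame, the original `demo` frame keeps its identifier column, which can be handy when rows need to be traced back later.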


Exploratory Data Analysis

Univariate Analysis

Descriptive statistics were computed for all numerical variables, including:

Histograms and boxplots were generated to examine the distributions of:

These visualizations helped identify potential skewness, spread, and outliers within the dataset.
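
The kind of summary described above could be computed roughly as follows. This is a sketch, not the project's code: the `summarize` helper, the "Daily Steps" name, and the sample values are assumptions, and the 1.5 × IQR rule is the convention boxplots typically use to flag outliers.

```python
import pandas as pd


def summarize(series: pd.Series) -> dict:
    """Descriptive statistics plus a boxplot-style (1.5 * IQR) outlier count."""
    q1, q3 = series.quantile(0.25), series.quantile(0.75)
    iqr = q3 - q1
    lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    n_outliers = int(((series < lower) | (series > upper)).sum())
    return {
        "mean": float(series.mean()),
        "median": float(series.median()),
        "std": float(series.std()),
        "skew": float(series.skew()),  # > 0 means a longer right tail
        "n_outliers": n_outliers,
    }


# Made-up values with one unusually high observation.
steps = pd.Series([4000, 5000, 5500, 6000, 6500, 7000, 20000], name="Daily Steps")
summary = summarize(steps)
print(summary)
```

With matplotlib installed, `steps.plot.hist()` and `steps.plot.box()` would draw the corresponding histogram and boxplot.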


Bivariate Analysis

Pearson correlation was used to examine relationships between numerical variables.

A correlation heatmap revealed several notable relationships:

A Seaborn pairplot was also generated to visually explore relationships between all numerical variables.
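
A correlation matrix of this kind can be sketched with pandas. The data here is synthetic and the three column names are assumptions, so the printed coefficients say nothing about the real dataset.

```python
import numpy as np
import pandas as pd

# Synthetic data: sleep rises with activity, stress falls as sleep improves.
rng = np.random.default_rng(0)
n = 200
activity = rng.uniform(30, 90, n)
sleep = 4 + 0.05 * activity + rng.normal(0, 0.5, n)
stress = 10 - sleep + rng.normal(0, 0.5, n)

df = pd.DataFrame(
    {
        "Physical Activity Level": activity,  # assumed column name
        "Quality of Sleep": sleep,            # assumed column name
        "Stress Level": stress,               # assumed column name
    }
)

# Pairwise Pearson coefficients; the result is a symmetric DataFrame.
corr = df.corr(method="pearson")
print(corr.round(2))
```

Passing `corr` to `seaborn.heatmap(corr, annot=True)` would render the heatmap, and `seaborn.pairplot(df)` would produce the pairwise scatter plots.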


Statistical Testing

A hypothesis test was performed to evaluate the relationship between physical activity level and sleep quality.

Null Hypothesis (H₀): There is no monotonic association between physical activity level and sleep quality.

Alternative Hypothesis (H₁): There is a monotonic association between physical activity level and sleep quality.

Because the variables were not normally distributed, Spearman's rank correlation, which does not assume normality, was used instead of a Pearson test.

Results:

Because the p-value was below 0.05, the null hypothesis was rejected: physical activity and sleep quality show a statistically significant positive association, although the association is relatively weak.
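
The test procedure can be sketched with `scipy.stats.spearmanr`. The data below is synthetic, so the resulting coefficient and p-value are illustrative only and are not the project's reported results. The 0.05 threshold is the one stated above.

```python
import numpy as np
from scipy.stats import spearmanr

# Synthetic stand-ins: activity in minutes, sleep quality as a 1-10 score.
rng = np.random.default_rng(1)
activity = rng.uniform(30, 90, 100)
sleep = np.clip(np.round(4 + 0.04 * activity + rng.normal(0, 1.0, 100)), 1, 10)

# spearmanr ranks both variables and correlates the ranks.
rho, p_value = spearmanr(activity, sleep)

alpha = 0.05
reject_null = p_value < alpha
print(f"rho={rho:.3f}, p={p_value:.4g}, reject H0: {reject_null}")
```

Because Spearman's coefficient works on ranks, it tolerates the many tied values that an integer sleep score produces, and it measures monotonic rather than strictly linear association.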


Regression Analysis

A linear regression model was used to examine how physical activity predicts sleep quality.

Key results:

Interpretation:

A scatter plot with a regression line was created to visualize the relationship between physical activity and sleep quality.
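
A single-predictor fit like this one can be sketched with `scipy.stats.linregress`. The library choice and the five data points are assumptions made for illustration. The source does not say which implementation was used.

```python
from scipy.stats import linregress

# Made-up observations: activity (minutes/day) and sleep quality score.
activity = [30, 45, 60, 75, 90]
sleep_quality = [5.6, 6.2, 7.1, 7.6, 8.5]

fit = linregress(activity, sleep_quality)

slope = fit.slope              # change in sleep score per extra minute
intercept = fit.intercept      # predicted score at zero activity
r_squared = fit.rvalue ** 2    # share of variance explained by the line

# Prediction for a new, hypothetical activity level.
predicted = intercept + slope * 50
print(f"slope={slope:.3f}, intercept={intercept:.2f}, R^2={r_squared:.3f}")
```

The R² value is the natural companion to the slope here: a statistically significant slope can still explain only a small share of the variance, which is what a weak correlation implies.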


Classification Models

Sleep quality scores were treated as discrete class labels and predicted using two machine learning models.

Logistic Regression

Features used:

Model results:

A confusion matrix was visualized using a Seaborn heatmap.
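
A multiclass logistic regression with a confusion matrix could look roughly like this in scikit-learn. The two features, the three-class target, and every number are synthetic assumptions. The project's actual feature set and scores are not reproduced here.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic features: activity (minutes) and stress (integer scale).
rng = np.random.default_rng(42)
n = 300
X = np.column_stack([rng.uniform(30, 90, n), rng.integers(3, 9, n)])

# Synthetic target: a latent score cut into three equally sized classes.
latent = 0.03 * X[:, 0] - 0.5 * X[:, 1] + rng.normal(0, 0.5, n)
y = np.digitize(latent, np.quantile(latent, [1 / 3, 2 / 3]))

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Scaling first helps the solver converge when features differ in range.
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

acc = accuracy_score(y_test, y_pred)
cm = confusion_matrix(y_test, y_pred, labels=[0, 1, 2])  # rows: true, cols: predicted
print(f"accuracy={acc:.2f}")
print(cm)
```

The matrix can be drawn with `seaborn.heatmap(cm, annot=True, fmt="d")`. Its diagonal holds the correctly classified counts, so accuracy equals the trace divided by the total.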


Random Forest Classification

A Random Forest classifier was implemented using the same features.

Results improved substantially over logistic regression:

The Random Forest model likely performed better because tree ensembles can capture nonlinear relationships and feature interactions that a linear decision boundary misses, and they tend to be less sensitive to uneven class sizes.
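
The nonlinearity argument can be illustrated on a toy problem. In this sketch the label depends on the interaction of two synthetic features (an XOR-like pattern), which no single linear boundary can separate. The data and the hyperparameters are assumptions, not the project's setup.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Synthetic XOR-like data: class 1 when both features share the same sign.
rng = np.random.default_rng(7)
X = rng.uniform(-1, 1, size=(400, 2))
y = (X[:, 0] * X[:, 1] > 0).astype(int)

logreg = LogisticRegression(max_iter=1000)
forest = RandomForestClassifier(
    n_estimators=200,
    class_weight="balanced",  # explicit reweighting; not applied by default
    random_state=7,
)

# Mean accuracy over 5 stratified folds for each model.
logreg_acc = cross_val_score(logreg, X, y, cv=5).mean()
forest_acc = cross_val_score(forest, X, y, cv=5).mean()
print(f"logistic regression: {logreg_acc:.2f}, random forest: {forest_acc:.2f}")
```

Note that a random forest does not correct for class imbalance automatically. In scikit-learn that behavior is opted into, for example through `class_weight="balanced"` as above.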


Clustering Analysis

K-Means clustering was used to group individuals based on:

Four clusters were identified.

Cluster interpretations:

The clustering produced a silhouette score of 0.71 (on a scale from -1 to 1), indicating well-separated clusters.
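
One common way to arrive at a cluster count and a silhouette score is sketched below with scikit-learn. The four synthetic blobs are an assumption chosen so that four clusters is the natural answer. The scores it prints are unrelated to the 0.71 reported above.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

# Synthetic data: four compact groups at the corners of a square.
centers = [[0, 0], [10, 0], [0, 10], [10, 10]]
X, _ = make_blobs(n_samples=200, centers=centers, cluster_std=1.0, random_state=0)

# K-Means uses Euclidean distance, so features are standardized first.
X_scaled = StandardScaler().fit_transform(X)

# Fit K-Means for several k and record the mean silhouette of each solution.
scores = {}
for k in range(2, 7):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X_scaled)
    scores[k] = silhouette_score(X_scaled, labels)

best_k = max(scores, key=scores.get)
print({k: round(float(s), 2) for k, s in scores.items()}, "best k:", best_k)
```

The silhouette compares each point's distance to its own cluster with its distance to the nearest other cluster, so values close to 1 mean tight, well-separated groups and values near 0 mean overlapping ones.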


Technologies Used


Key Findings


Why This Project Matters

Sleep health is closely connected to lifestyle behaviors such as physical activity, stress management, and daily habits. This project demonstrates how statistical analysis and machine learning can be used to identify patterns in health data and better understand the factors influencing sleep quality.