
Predicting Obesity Levels in Latin American Cities

Overview

This project focuses on predicting obesity levels using demographic, dietary, and lifestyle factors collected through survey data. Obesity is associated with numerous chronic health conditions, making early identification of at-risk populations important for preventative health initiatives.

The objective of this project was to develop classification models capable of estimating obesity levels based on behavioral patterns and demographic characteristics. Two modeling scenarios were evaluated: one using both behavioral and demographic features, and another using behavioral factors alone. Comparing these scenarios helps assess how much predictive power demographic information contributes to obesity classification.

These models could be used to support public health monitoring, early risk screening, and targeted health interventions. For example, health organizations or local governments could use similar models to identify behavioral risk patterns within populations, design preventative education programs, or allocate resources toward communities with higher predicted obesity risk.


Dataset

Features include:

Target Variable


Data Preparation

Several preprocessing steps were applied to prepare the dataset for modeling:

Two feature sets were created:

Behavioral + Demographics

Behavioral-Only
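The two feature sets differ only in which columns are passed to the models, so the behavioral-only set is a strict subset of the combined set. The sketch below assumes each survey record is a Python dict keyed by column name. Every column name in it (`vegetable_freq`, `age`, and so on), the helper `select_features`, and the `FEATURE_SETS` mapping are hypothetical. The project's actual feature list and its split into behavioral and demographic columns are not reproduced here. The target column is deliberately left out of both sets.

```python
from typing import Any

# Hypothetical column names for illustration only; substitute the
# dataset's actual behavioral and demographic columns.
BEHAVIORAL_COLS = ["vegetable_freq", "main_meals", "water_intake", "physical_activity"]
DEMOGRAPHIC_COLS = ["gender", "age"]

# The two modeling scenarios: the combined set and the behavioral-only subset.
# The target (obesity level) is kept out of both feature sets.
FEATURE_SETS = {
    "behavioral_demographics": BEHAVIORAL_COLS + DEMOGRAPHIC_COLS,
    "behavioral_only": list(BEHAVIORAL_COLS),
}


def select_features(rows: list[dict[str, Any]], columns: list[str]) -> list[dict[str, Any]]:
    """Keep only the requested columns from each record; a KeyError flags a missing column."""
    return [{col: row[col] for col in columns} for row in rows]


if __name__ == "__main__":
    sample = [{"vegetable_freq": 2, "main_meals": 3, "water_intake": 2,
               "physical_activity": 1, "gender": "Female", "age": 24,
               "target": "Normal_Weight"}]
    for name, cols in FEATURE_SETS.items():
        print(name, select_features(sample, cols))
```

Defining both scenarios from the same column lists means the only thing that changes between them is the presence of the demographic columns. That keeps the accuracy comparison between the two scenarios fair.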


Exploratory Data Analysis (EDA)

Exploratory analysis was conducted to better understand the distribution of obesity levels and behavioral patterns in the dataset.

Key visualizations included:

These visualizations help highlight relationships between lifestyle behaviors and obesity levels.


Modeling Approach

This project uses supervised classification models to predict obesity levels.

Three classifiers were implemented: Logistic Regression, Random Forest, and Gradient Boosting.

A preprocessing pipeline was constructed using:

Models were trained and evaluated using an 80/20 train-test split with stratified sampling, which preserves the proportion of each obesity level in both the training and test sets.
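Stratification does not rebalance the classes. It keeps each class's share roughly the same in the training and test portions, so a rare obesity level cannot end up missing from the test set by chance. In scikit-learn this is typically done with `train_test_split(X, y, test_size=0.2, stratify=y)`. Whether the project used that function is not stated here. The dependency-free sketch below only illustrates the idea. It groups row indices by label, shuffles each group with a fixed seed, and sends about 20% of every group to the test set. The function name `stratified_split` is hypothetical.

```python
import random
from collections import defaultdict
from typing import Hashable, Sequence, TypeVar

T = TypeVar("T")


def stratified_split(items: Sequence[T], labels: Sequence[Hashable],
                     test_size: float = 0.2, seed: int = 42):
    """Split (item, label) pairs into train/test, sampling test_size of each label separately."""
    if len(items) != len(labels):
        raise ValueError("items and labels must have the same length")

    # Group row indices by class label.
    by_label: dict[Hashable, list[int]] = defaultdict(list)
    for idx, label in enumerate(labels):
        by_label[label].append(idx)

    rng = random.Random(seed)  # fixed seed for a reproducible split
    train_idx: list[int] = []
    test_idx: list[int] = []
    for idxs in by_label.values():
        rng.shuffle(idxs)
        n_test = round(len(idxs) * test_size)  # per-class test count
        test_idx.extend(idxs[:n_test])
        train_idx.extend(idxs[n_test:])

    # Restore original row order within each split.
    train = [(items[i], labels[i]) for i in sorted(train_idx)]
    test = [(items[i], labels[i]) for i in sorted(test_idx)]
    return train, test


if __name__ == "__main__":
    labels = ["Normal_Weight"] * 60 + ["Obesity_Type_I"] * 40
    train, test = stratified_split(list(range(100)), labels)
    print(len(train), len(test))  # 80 20, with 12 and 8 test rows per class
```

Because each class is sampled separately, a class that makes up 40% of the data also makes up 40% of the test set.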


Results

Behavioral + Demographics Models

Model                  Accuracy
Logistic Regression    0.6170
Random Forest          0.8274
Gradient Boosting      0.7612

Behavioral-Only Models

Model                  Accuracy
Logistic Regression    0.5650
Random Forest          0.7139
Gradient Boosting      0.6643
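Comparing the two tables model by model shows how much accuracy is lost when demographic features are removed. The short sketch below uses the accuracy values from the two tables. It reports the absolute gap in percentage points and the relative drop as a share of the combined-feature accuracy. The dictionary names and the helper `accuracy_gaps` are illustrative only.

```python
# Accuracy values copied from the two results tables.
WITH_DEMOGRAPHICS = {"Logistic Regression": 0.6170, "Random Forest": 0.8274, "Gradient Boosting": 0.7612}
BEHAVIORAL_ONLY = {"Logistic Regression": 0.5650, "Random Forest": 0.7139, "Gradient Boosting": 0.6643}


def accuracy_gaps(with_demo: dict[str, float],
                  without_demo: dict[str, float]) -> dict[str, tuple[float, float]]:
    """Return (absolute gap in percentage points, relative drop in %) for each model."""
    gaps = {}
    for model, acc in with_demo.items():
        diff = acc - without_demo[model]
        gaps[model] = (round(diff * 100, 2), round(diff / acc * 100, 1))
    return gaps


if __name__ == "__main__":
    for model, (pp, rel) in accuracy_gaps(WITH_DEMOGRAPHICS, BEHAVIORAL_ONLY).items():
        print(f"{model}: -{pp} pp ({rel}% relative drop)")
```

The gaps come out to 5.2 points for Logistic Regression, 11.35 points for Random Forest, and 9.69 points for Gradient Boosting, which are relative drops of about 8.4%, 13.7%, and 12.7%. Random Forest is the most accurate model in both scenarios, and it also loses the most accuracy when demographics are removed.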

Key Observations


Model Insights

Feature importance analysis from the Random Forest models revealed several influential predictors.

Behavioral-Only Model

Top predictors included:

Behavioral + Demographics Model


Visualizations

Additional visualizations were created to support model interpretation:

These visualizations provide insights into classification performance and highlight where prediction improvements occur when demographic information is included.


Technologies Used


Why This Project Matters

Obesity continues to be a growing public health concern in many parts of the world, including Latin America. Predictive models like those developed in this project can help researchers and policymakers better understand how lifestyle behaviors influence obesity risk within populations.

In practical settings, similar models could be used to:

While behavioral data alone provides a reasonable baseline for obesity prediction, incorporating demographic information substantially improves classification accuracy. Every model scored higher with demographics included, with gains ranging from about 5.2 percentage points (Logistic Regression) to about 11.4 percentage points (Random Forest). These findings highlight how combining lifestyle patterns with demographic context can enhance predictive health analytics, while behavioral-only models can still support broader community-level assessments.