
Predicting 30-Day Hospital Readmissions for Diabetic Patients

Overview

This project predicts whether a diabetic patient will be readmitted to the hospital within 30 days of discharge, using real-world clinical data. Hospital readmissions are costly and often indicate gaps in patient care. In this setting a false negative, meaning a high-risk patient the model fails to flag, can have serious consequences, so the model needs to miss as few of these patients as possible.

The goal of this project was to build an interpretable classification model that prioritizes recall, ensuring that high-risk patients are identified and can receive preventative follow-up care.


Dataset

Target Variable


Data Cleaning & Preparation


Exploratory Data Analysis (EDA)

Key insights from the exploratory analysis:

Visualizations included:


Modeling Approach

Model: Logistic Regression

Logistic regression was chosen for its interpretability and suitability as a baseline model in a high-dimensional healthcare dataset.

Three model configurations were evaluated:

  1. Standard Logistic Regression
  2. Class-Balanced Logistic Regression
  3. Balanced Logistic Regression with Custom Probability Thresholds
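The three configurations can be sketched roughly as follows. This is a minimal illustration rather than the project's actual pipeline: it assumes scikit-learn, substitutes synthetic data from `make_classification` for the clinical dataset, and the helper `evaluate` and all variable names are hypothetical. The 0.35 threshold is the one reported in the results.

```python
# Illustrative sketch on synthetic data; not the project's actual pipeline.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_score, recall_score
from sklearn.model_selection import train_test_split

# Synthetic, imbalanced stand-in for the clinical data (assumption).
X, y = make_classification(
    n_samples=2000, n_features=20, weights=[0.7, 0.3], random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)


def evaluate(model, threshold=0.5):
    """Return (recall, precision) for the positive class at a threshold."""
    proba = model.predict_proba(X_test)[:, 1]
    pred = (proba >= threshold).astype(int)
    return (
        recall_score(y_test, pred),
        precision_score(y_test, pred, zero_division=0),
    )


# 1. Standard logistic regression.
standard = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# 2. Class-balanced: errors on the rarer class are weighted more heavily.
balanced = LogisticRegression(max_iter=1000, class_weight="balanced").fit(
    X_train, y_train
)

# 3. Same balanced model, but flag a patient at a lower probability cutoff.
results = {
    "standard": evaluate(standard),
    "balanced": evaluate(balanced),
    "balanced_t035": evaluate(balanced, threshold=0.35),
}

for name, (rec, prec) in results.items():
    print(f"{name}: recall={rec:.2f} precision={prec:.2f}")
```

Because the third configuration reuses the fitted balanced model and only lowers the cutoff, it flags a superset of the patients flagged at 0.5, so its recall can never be lower than that of the second configuration. The numbers printed here come from synthetic data and will not match the project's results.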

Results

| Model Variant | Recall (Readmitted) | Precision | Accuracy | ROC-AUC |
|---|---|---|---|---|
| Standard Logistic Regression | 0.49 | 0.62 | 0.63 | 0.67 |
| Balanced Logistic Regression | 0.58 | 0.60 | 0.63 | 0.67 |
| Balanced + Threshold (0.35) | 0.89 | 0.50 | 0.55 | 0.67 |

Key takeaway:
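Class weighting raised recall for readmitted patients from 0.49 to 0.58, and lowering the decision threshold to 0.35 raised it to 0.89, so the final configuration captures most potential readmissions. The cost is lower precision (0.62 to 0.50) and lower accuracy (0.63 to 0.55). ROC-AUC is 0.67 for all three variants; changing the threshold in particular cannot move it, because ROC-AUC is computed across all thresholds. This tradeoff is appropriate in a healthcare context where missing an at-risk patient is more costly than a false positive.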
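To make the threshold tradeoff concrete, here is a small worked example in plain Python. The labels and scores are made-up toy values, not project data, and `metrics_at_threshold` is a hypothetical helper written for this illustration.

```python
def metrics_at_threshold(y_true, scores, threshold):
    """Compute recall, precision, and accuracy for the positive class."""
    preds = [1 if s >= threshold else 0 for s in scores]
    pairs = list(zip(y_true, preds))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return {
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "accuracy": (tp + tn) / len(y_true),
    }


# Toy data (hypothetical): 4 readmitted patients, 6 not readmitted.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
scores = [0.90, 0.60, 0.45, 0.40, 0.55, 0.48, 0.42, 0.38, 0.10, 0.05]

at_050 = metrics_at_threshold(y_true, scores, 0.50)
at_035 = metrics_at_threshold(y_true, scores, 0.35)

print("threshold 0.50:", at_050)
print("threshold 0.35:", at_035)
```

In this toy case, lowering the cutoff from 0.50 to 0.35 flags all four readmitted patients instead of two (recall 0.5 to 1.0), but also flags four non-readmitted patients instead of one, so precision and accuracy both fall. The same mechanism drives the pattern in the results table.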


Technologies Used


Future Improvements


Why This Project Matters

This project demonstrates applied machine learning in healthcare, emphasizing ethical tradeoffs, evaluation beyond accuracy, and decision-making under uncertainty using real-world clinical data.