
Modeling Steam Game Prices for Indie Developers

Overview

This project estimates baseline prices for Steam games to help indie and small development teams make informed pricing decisions. Without access to market research or publisher support, developers may struggle to price games fairly: a price set too high can discourage buyers, while one set too low can undervalue their work.

The goal of this project was to build predictive regression models that use game characteristics (genres, categories, release timing) and, optionally, publisher context to provide baseline pricing guidance for developers.


Dataset

Target Variable


Data Cleaning & Preparation


Exploratory Data Analysis (EDA)

Key insights:

Visualizations included:


Modeling Approach

Models: Ridge Regression, Random Forest, Gradient Boosting, LightGBM

Two scenarios were evaluated:

  1. No-Publisher Model – simulates self-published or first-time developers; uses only game attributes (genres, categories, release features).
  2. Publisher-Aware Model – simulates games released through established publishers; incorporates publisher, release year, and recommendation counts in addition to game attributes.
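
The two scenarios differ only in which columns the model is allowed to see. A minimal sketch of that split follows. The column names (`release_month`, `recommendations`, and so on) and the helper `select_features` are hypothetical stand-ins, since the project's actual feature names are not listed here.

```python
# Hypothetical column names; the project's real feature names are not given.
GAME_ATTRIBUTES = ["genres", "categories", "release_month"]
PUBLISHER_CONTEXT = ["publisher", "release_year", "recommendations"]

# Each scenario maps to the list of columns a model may use as input.
SCENARIOS = {
    "no_publisher": GAME_ATTRIBUTES,
    "publisher_aware": GAME_ATTRIBUTES + PUBLISHER_CONTEXT,
}


def select_features(row: dict, scenario: str) -> dict:
    """Keep only the columns that the given scenario is allowed to see."""
    return {col: row[col] for col in SCENARIOS[scenario]}


# Illustrative record, not taken from the dataset.
game = {
    "genres": "Indie;Puzzle",
    "categories": "Single-player",
    "release_month": 10,
    "publisher": "Example Publisher",
    "release_year": 2021,
    "recommendations": 340,
    "price": 9.99,  # target variable, never passed in as a feature
}

no_pub = select_features(game, "no_publisher")
pub_aware = select_features(game, "publisher_aware")
print(sorted(no_pub))
print(sorted(pub_aware))
```

Keeping the scenario definition in one mapping makes it easy to train the same set of models on both feature sets and compare them on equal terms.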

Evaluation Metrics: MAE ($), RMSE ($), R², and MAPE (%).
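
The metrics reported in the Results tables can be computed along these lines. This is a dependency-free sketch, not the project's own evaluation code. Treating R² as part of the metric set is an assumption, and the prices below are made up for illustration.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return MAE, RMSE, R2 and MAPE (in percent) for two equal-length lists."""
    n = len(y_true)
    errors = [p - t for t, p in zip(y_true, y_pred)]

    mae = sum(abs(e) for e in errors) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)

    # R2 compares the model's squared error to that of predicting the mean.
    mean_true = sum(y_true) / n
    ss_res = sum(e * e for e in errors)
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    r2 = 1 - ss_res / ss_tot

    # MAPE divides each error by the actual price, so cheap games weigh heavily.
    mape = 100 * sum(abs(e) / abs(t) for e, t in zip(errors, y_true)) / n

    return {"MAE": mae, "RMSE": rmse, "R2": r2, "MAPE": mape}


# Made-up prices in dollars, chosen to include one very cheap game.
actual = [0.99, 4.99, 9.99, 19.99]
predicted = [4.99, 6.99, 8.99, 15.99]

metrics = regression_metrics(actual, predicted)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

In this toy example the $0.99 game is off by $4, which alone is an error of roughly 400%. This is one way a MAPE above 100% can coexist with an MAE of only a few dollars.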


Results

No-Publisher Model

Model              MAE ($)  RMSE ($)  R²     MAPE (%)
Ridge Regression   4.37     6.86      0.254  152.3
Random Forest      3.83     6.40      0.350  108.7
Gradient Boosting  4.03     6.54      0.323  124.7
LightGBM           3.88     6.36      0.359  114.3

Publisher-Aware Model

Model              MAE ($)  RMSE ($)  R²     MAPE (%)
Ridge Regression   4.49     7.01      0.222  159.9
Random Forest      3.64     6.02      0.425  110.2
Gradient Boosting  3.75     6.05      0.421  120.9
LightGBM           3.66     6.07      0.415  111.3
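
To compare the two scenarios directly, the MAE values from the tables above can be subtracted model by model. The snippet below only does that arithmetic. The helper name `mae_change` is hypothetical.

```python
# MAE ($) copied from the two results tables above.
mae_no_publisher = {
    "Ridge Regression": 4.37,
    "Random Forest": 3.83,
    "Gradient Boosting": 4.03,
    "LightGBM": 3.88,
}
mae_publisher_aware = {
    "Ridge Regression": 4.49,
    "Random Forest": 3.64,
    "Gradient Boosting": 3.75,
    "LightGBM": 3.66,
}


def mae_change(before: dict, after: dict) -> dict:
    """Change in MAE per model; negative means the second scenario is better."""
    return {model: round(after[model] - before[model], 2) for model in before}


changes = mae_change(mae_no_publisher, mae_publisher_aware)
for model, delta in changes.items():
    print(f"{model}: {delta:+.2f}")
```

By these figures, adding publisher context lowers MAE for the three tree-based models by roughly $0.19 to $0.28 and raises it slightly for Ridge Regression.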

Key takeaway:


Technologies Used


Future Improvements


Why This Project Matters

This project demonstrates applied machine learning for business decision-making in the gaming industry. It highlights how structured historical data can inform fair pricing, particularly for independent developers, while emphasizing ethical use of predictive models to avoid pressuring developers toward market averages.