Comparative Analysis of Random Forest and Gradient Boosting for Predicting Health Risk Levels in Coffee Consumers Using Lifestyle and Physiological Data

Sri Winiarti; Ahmad Azhari

Authors

Sri Winiarti, Universitas Ahmad Dahlan
Ahmad Azhari, Universitas Ahmad Dahlan

Keywords:

Machine Learning, Random Forest, Gradient Boosting, Health Risk Prediction, Coffee Consumption

Abstract

The growing consumption of coffee as part of modern lifestyles raises concerns about its potential impact on health, particularly when interacting with lifestyle and physiological factors such as sleep quality, stress level, and physical activity. However, accurately predicting health risk levels remains challenging due to the complex and non-linear nature of these relationships. This study aims to compare the performance of two ensemble learning methods, Random Forest (RF) and Gradient Boosting (GB), for predicting health risk levels among coffee consumers using lifestyle and physiological data.

The proposed methodology integrates data preprocessing, feature selection using Mutual Information (MI) and Recursive Feature Elimination (RFE), and model optimization through Grid Search with cross-validation. RF utilizes a bagging approach to improve stability and reduce overfitting, while GB applies a boosting mechanism to iteratively minimize prediction errors, offering higher flexibility in capturing complex patterns. Compared to traditional machine learning methods, the combination of MI–RFE feature selection and ensemble models provides improved accuracy, robustness, and interpretability.
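The pipeline described above can be sketched with scikit-learn as follows. This is a minimal illustration, not the authors' implementation. It assumes that MI is applied first as a filter and RFE second as a wrapper (the abstract does not state the order), and it uses synthetic data in place of the study's dataset. The feature counts, the decision tree inside RFE, and the parameter grids are all hypothetical choices.

```python
from functools import partial

from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.feature_selection import RFE, SelectKBest, mutual_info_classif
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.pipeline import Pipeline
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the lifestyle/physiological data (hypothetical shape).
X, y = make_classification(
    n_samples=300, n_features=12, n_informative=6, n_classes=3, random_state=0
)


def build_search(model, param_grid):
    """MI filter -> RFE wrapper -> classifier, tuned by grid search with CV."""
    pipeline = Pipeline([
        # Keep the 8 features with the highest mutual information (assumed k).
        ("mi", SelectKBest(partial(mutual_info_classif, random_state=0), k=8)),
        # Recursively drop the weakest features down to 5 (assumed target).
        ("rfe", RFE(DecisionTreeClassifier(random_state=0), n_features_to_select=5)),
        ("clf", model),
    ])
    cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=0)
    return GridSearchCV(pipeline, param_grid, scoring="accuracy", cv=cv)


searches = {
    "RF": build_search(
        RandomForestClassifier(random_state=0),
        {"clf__n_estimators": [50, 100]},
    ),
    "GB": build_search(
        GradientBoostingClassifier(random_state=0),
        {"clf__learning_rate": [0.05, 0.1]},
    ),
}

results = {}
for name, search in searches.items():
    search.fit(X, y)
    results[name] = (search.best_score_, search.best_params_)
    print(name, round(search.best_score_, 4), search.best_params_)
```

Placing both selection steps inside the `Pipeline` means they are refitted on each training fold, so the cross-validated accuracy is not inflated by features chosen with knowledge of the held-out fold.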

Experimental results show that RF achieves the best and most stable performance, with an accuracy of 0.9492, while GB reaches a competitive accuracy of 0.9422 after parameter tuning. Statistical tests confirm that the performance difference is statistically significant. These findings demonstrate the effectiveness of ensemble learning in lifestyle-based health prediction and provide insights for developing intelligent preventive healthcare systems.
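The abstract does not name the statistical test used. One common option, shown here purely as an assumption, is a paired t-test on per-fold accuracies obtained from identical cross-validation splits. The sketch below uses synthetic data and hypothetical model settings, so its numbers bear no relation to the reported results.

```python
from scipy.stats import ttest_rel
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Synthetic stand-in data; not the study's dataset.
X, y = make_classification(
    n_samples=300, n_features=12, n_informative=6, n_classes=3, random_state=0
)

# A fixed random_state gives both models exactly the same folds,
# which is what makes the per-fold scores paired.
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)

rf_scores = cross_val_score(
    RandomForestClassifier(n_estimators=50, random_state=0),
    X, y, cv=cv, scoring="accuracy",
)
gb_scores = cross_val_score(
    GradientBoostingClassifier(n_estimators=50, random_state=0),
    X, y, cv=cv, scoring="accuracy",
)

# Paired t-test on the fold-wise accuracy differences.
t_stat, p_value = ttest_rel(rf_scores, gb_scores)
print(f"RF mean={rf_scores.mean():.4f}  GB mean={gb_scores.mean():.4f}")
print(f"t={t_stat:.3f}  p={p_value:.4f}")
```

Because the training sets of different folds overlap, this test tends to be optimistic, so a small p-value from it is best read as indicative rather than conclusive.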
