Comparative Analysis of Random Forest and Gradient Boosting for Predicting Health Risk Levels in Coffee Consumers Using Lifestyle and Physiological Data
Keywords: Machine Learning, Random Forest, Gradient Boosting, Health Risk Prediction, Coffee Consumption
Abstract
The growing consumption of coffee as part of modern lifestyles raises concerns about its potential impact on health, particularly when interacting with lifestyle and physiological factors such as sleep quality, stress level, and physical activity. However, accurately predicting health risk levels remains challenging due to the complex and non-linear nature of these relationships. This study aims to compare the performance of two ensemble learning methods, Random Forest (RF) and Gradient Boosting (GB), for predicting health risk levels among coffee consumers using lifestyle and physiological data.
The proposed methodology integrates data preprocessing, feature selection using Mutual Information (MI) and Recursive Feature Elimination (RFE), and model optimization through Grid Search with cross-validation. RF utilizes a bagging approach to improve stability and reduce overfitting, while GB applies a boosting mechanism to iteratively minimize prediction errors, offering higher flexibility in capturing complex patterns. Compared to traditional machine learning methods, the combination of MI–RFE feature selection and ensemble models provides improved accuracy, robustness, and interpretability.
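The workflow described above can be sketched roughly as follows with scikit-learn. This is an illustrative outline only, not the authors' implementation: the synthetic dataset, the number of features kept at each stage, the estimator used inside RFE, and the parameter grids are all assumptions chosen for brevity.

```python
from functools import partial

from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.feature_selection import RFE, SelectKBest, mutual_info_classif
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.pipeline import Pipeline

# Synthetic stand-in for the lifestyle/physiological data (assumption):
# 12 features, 3 hypothetical risk levels.
X, y = make_classification(
    n_samples=300, n_features=12, n_informative=5, n_classes=3, random_state=0
)


def build_search(estimator, param_grid):
    """MI filter -> RFE wrapper -> classifier, tuned by grid search with CV."""
    pipe = Pipeline([
        # Stage 1: keep the 8 features with the highest mutual information.
        ("mi", SelectKBest(partial(mutual_info_classif, random_state=0), k=8)),
        # Stage 2: recursively eliminate features down to 5, ranked by a
        # small random forest (the ranking estimator is an assumption).
        ("rfe", RFE(RandomForestClassifier(n_estimators=25, random_state=0),
                    n_features_to_select=5)),
        # Stage 3: the ensemble model being evaluated.
        ("model", estimator),
    ])
    cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=0)
    return GridSearchCV(pipe, param_grid, cv=cv, scoring="accuracy")


searches = {
    "RF": build_search(
        RandomForestClassifier(random_state=0),
        {"model__n_estimators": [50, 100], "model__max_depth": [None, 5]},
    ),
    "GB": build_search(
        GradientBoostingClassifier(random_state=0),
        {"model__n_estimators": [50, 100], "model__learning_rate": [0.05, 0.1]},
    ),
}

results = {}
for name, search in searches.items():
    search.fit(X, y)
    results[name] = (search.best_score_, search.best_params_)
    print(name, round(search.best_score_, 4), search.best_params_)
```

Placing both selection steps inside the pipeline means they are refit on each training fold, so the cross-validated accuracy is not inflated by features chosen with knowledge of the held-out data.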
Experimental results show that RF achieves the best and most stable performance with an accuracy of 0.9492, while GB reaches competitive performance (0.9422) after parameter tuning. Statistical tests confirm that the performance difference is significant. These findings demonstrate the effectiveness of ensemble learning in lifestyle-based health prediction and provide insights for developing intelligent preventive healthcare systems.
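The abstract does not name the statistical test used. As one possible way to check whether two classifiers differ, the sketch below applies a paired t-test to per-fold accuracies from repeated stratified cross-validation. The synthetic data, the fold counts, and the choice of test are assumptions made for illustration and may not match the study.

```python
from scipy.stats import ttest_rel
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.model_selection import RepeatedStratifiedKFold, cross_val_score

# Synthetic stand-in data (assumption), 3 hypothetical risk levels.
X, y = make_classification(
    n_samples=300, n_features=10, n_informative=5, n_classes=3, random_state=0
)

# A fixed random_state makes both models see exactly the same splits,
# so the fold scores are paired.
cv = RepeatedStratifiedKFold(n_splits=5, n_repeats=2, random_state=0)

rf_scores = cross_val_score(
    RandomForestClassifier(n_estimators=50, random_state=0),
    X, y, cv=cv, scoring="accuracy",
)
gb_scores = cross_val_score(
    GradientBoostingClassifier(n_estimators=50, random_state=0),
    X, y, cv=cv, scoring="accuracy",
)

# Paired t-test on the fold-wise accuracy differences.
t_stat, p_value = ttest_rel(rf_scores, gb_scores)
print("RF mean:", rf_scores.mean(), "GB mean:", gb_scores.mean())
print("t =", t_stat, "p =", p_value)
```

Because cross-validation folds share training data, the fold scores are not independent and a plain paired t-test tends to be optimistic; a corrected resampled t-test or McNemar's test on a held-out set are common alternatives.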
Copyright (c) 2026 Journal of Universal Transformation, Education, Research, and Utility (JUTERU)

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
