Project CEMAPRE internal
| Title | Integrating Statistical and Machine Learning Models for Robust Time Series Forecasting in Health Insurance Markets |
| Participants | Jorge Caiado (Principal Investigator), Ana Jesus |
| Summary | Accurate forecasting of health insurance sales is essential for strategic planning, risk management, and operational efficiency in increasingly volatile economic and healthcare environments. This research project aims to advance time series forecasting methodology by systematically integrating classical statistical models with modern machine learning and deep learning approaches. Building on prior evidence that traditional models such as ARIMAX remain highly competitive when combined with rigorous preprocessing and exogenous information, the project will compare statistical, machine learning, and deep learning models, individually and in hybrid combinations, for monthly health insurance sales forecasting. Special emphasis will be placed on the role of exogenous macroeconomic and healthcare system indicators, outlier detection and correction, and robust validation strategies. Using real-world insurance data from the Portuguese market, the project will evaluate Holt-Winters, ARIMA/ARIMAX, Random Forest, and XGBoost models, as well as neural architectures such as CNNs, RNNs, LSTMs, and Transformers. The expected outcome is a set of validated forecasting frameworks that combine accuracy, robustness, and interpretability, supporting evidence-based decision-making in regulated insurance environments. |