Research projects

Project: CEMAPRE internal

Title: Integrating Statistical and Machine Learning Models for Robust Time Series Forecasting in Health Insurance Markets
Participants: Jorge Caiado (Principal Investigator), Ana Jesus
Summary: Accurate forecasting of health insurance sales is essential for strategic planning, risk management,
and operational efficiency in increasingly volatile economic and healthcare environments. This
research project aims to advance time series forecasting methodologies by systematically integrating
classical statistical models with modern machine learning and deep learning approaches.
Building on prior evidence that traditional models such as ARIMAX remain highly competitive when
combined with rigorous preprocessing and exogenous information, the project will conduct a
comprehensive comparative and hybrid analysis of statistical, machine learning, and deep learning
models for monthly health insurance sales forecasting. Special emphasis will be placed on the role
of exogenous macroeconomic and healthcare system indicators, outlier detection and correction, and
robust validation strategies.
Using real-world insurance data from the Portuguese market, the project will evaluate models
including Holt-Winters, ARIMA/ARIMAX, Random Forest, XGBoost, and advanced neural architectures such
as CNNs, RNNs, LSTMs, and Transformers. The expected outcome is a set of validated forecasting
frameworks that combine accuracy, robustness, and interpretability, supporting evidence-based
decision-making in regulated insurance environments.