2  Return Analysis

The objective of this chapter is to identify the optimal ARMA(p,q) model to describe the dynamics of ASML log-returns. To determine the most suitable stochastic process, we adopt a systematic approach structured in the following stages:

  1. Exploratory Analysis: Analysis of autocorrelation functions (ACF/PACF) to assess temporal dependency and detect potential seasonality or distinct patterns.
  2. Model Selection (Grid Search): Estimation of various order combinations, residual analysis (plots and Ljung-Box test), and selection of the “winning” models based on the minimization of AIC (Akaike Information Criterion) and BIC (Bayesian Information Criterion).
  3. Out-of-Sample Validation: Comparison of the predictive performance of the selected models against a benchmark (historical mean) using a test set withheld from the estimation process.

2.1 ACF and PACF Analysis

Analyzing correlograms is the preliminary step to identify potential orders for the AR (AutoRegressive) and MA (Moving Average) components.

Code
acf_res <- acf(log_returns$LogReturns, plot = FALSE, lag.max = 20)
pacf_res <- pacf(log_returns$LogReturns, plot = FALSE, lag.max = 20)

ci <- qnorm((1 + 0.95)/2)/sqrt(length(log_returns$LogReturns))

df_acf <- data.frame(
  lag = as.numeric(acf_res$lag)[-1],
  acf = as.numeric(acf_res$acf)[-1])

df_pacf <- data.frame(
  lag = as.numeric(pacf_res$lag),
  pacf = as.numeric(pacf_res$acf))

fig_acf <- plot_ly(df_acf, x = ~lag, y = ~acf, type = 'bar', name = 'ACF',
  marker = list(color = '#1f77b4')) %>%
  
  add_segments(x = min(df_acf$lag), xend = max(df_acf$lag), y = ci, yend = ci, 
  line = list(color = 'red', dash = 'dash', width = 1), showlegend = FALSE) %>%
  
  add_segments(x = min(df_acf$lag), xend = max(df_acf$lag), y = -ci, yend = -ci, 
  line = list(color = 'red', dash = 'dash', width = 1), showlegend = FALSE) %>%
  
  layout(yaxis = list(title = "ACF"))

fig_pacf <- plot_ly(df_pacf, x = ~lag, y = ~pacf, type = 'bar', name = 'PACF',
  marker = list(color = '#ff7f0e')) %>%
  
  add_segments(x = min(df_pacf$lag), xend = max(df_pacf$lag), y = ci, yend = ci, 
  line = list(color = 'red', dash = 'dash', width = 1), showlegend = FALSE) %>%
  
  add_segments(x = min(df_pacf$lag), xend = max(df_pacf$lag), y = -ci, yend = -ci, 
  line = list(color = 'red', dash = 'dash', width = 1), showlegend = FALSE) %>%
  
  layout(yaxis = list(title = "PACF"))

subplot(fig_acf, fig_pacf, nrows = 1, margin = 0.05, titleY = TRUE) %>%
  layout(title = "Autocorrelation & Partial Autocorrelation", margin = list(t = 50))

Log-Returns ACF and PACF

Observations: The autocorrelation at lag-1 is statistically significant, yet its magnitude is limited (approximately -0.04). Neither the ACF nor the PACF exhibits a clear decay or a sharp cutoff. This ambiguity makes it difficult to visually identify a pure AR or MA process, suggesting instead a mixed ARMA structure.

A classical iterative approach (“Box-Jenkins”) would proceed by analyzing the residuals:

  • If the residuals’ ACF shows a sharp cutoff \(\rightarrow\) add an MA term.
  • If the residuals’ PACF shows a sharp cutoff \(\rightarrow\) add an AR term.
  • If both plots decay slowly or are unclear \(\rightarrow\) increase both orders.

However, given the ambiguity of the observed patterns and considering that high-order lags (\(>3\)) are uncommon in financial contexts, we opted for a more transparent and systematic approach. We will perform a comprehensive search, testing all possible combinations within a limited range.

2.2 Models

To assess the goodness of fit for each configuration, we analyze two fundamental diagnostic outputs:

  • Residual Analysis (checkresiduals): This function allows us to verify the absence of autocorrelation in the residuals. In addition to a visual inspection of the plots, the Ljung-Box Test is performed.
    • \(H_0\) (Null Hypothesis): The residuals are independently distributed (White Noise).
    • Interpretation: A p-value \(>0.05\) indicates that there is insufficient evidence to reject the null hypothesis. Therefore, the model has adequately captured the temporal structure of the data.
  • Parameter Significance (coeftest): This reports the estimated coefficients (ar and ma) along with their Standard Errors. Using the z-test, we verify whether the coefficients are statistically different from zero (p-value \(<0.05\)).
Code
set.seed(123)
Code
arma_10 <- Arima(log_returns$LogReturns, order=c(1,0,0))
checkresiduals(arma_10)


    Ljung-Box test

data:  Residuals from ARIMA(1,0,0) with non-zero mean
Q* = 39.515, df = 9, p-value = 9.299e-06

Model df: 1.   Total lags used: 10
Code
coeftest(arma_10)

z test of coefficients:

             Estimate  Std. Error z value Pr(>|z|)   
ar1       -0.03766328  0.01236477 -3.0460 0.002319 **
intercept  0.00051722  0.00033887  1.5263 0.126937   
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Code
arma_01 <- Arima(log_returns$LogReturns, order=c(0,0,1))
checkresiduals(arma_01)


    Ljung-Box test

data:  Residuals from ARIMA(0,0,1) with non-zero mean
Q* = 39.316, df = 9, p-value = 1.01e-05

Model df: 1.   Total lags used: 10
Code
coeftest(arma_01)

z test of coefficients:

             Estimate  Std. Error z value Pr(>|z|)   
ma1       -0.03870356  0.01255525 -3.0827 0.002052 **
intercept  0.00051675  0.00033802  1.5288 0.126326   
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Code
arma_11 <- Arima(log_returns$LogReturns, order=c(1,0,1))
checkresiduals(arma_11)


    Ljung-Box test

data:  Residuals from ARIMA(1,0,1) with non-zero mean
Q* = 30.019, df = 8, p-value = 0.0002097

Model df: 2.   Total lags used: 10
Code
coeftest(arma_11)

z test of coefficients:

             Estimate  Std. Error z value  Pr(>|z|)    
ar1        0.74803078  0.09296953  8.0460 8.556e-16 ***
ma1       -0.78463504  0.08698476 -9.0204 < 2.2e-16 ***
intercept  0.00049195  0.00030041  1.6376    0.1015    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Code
arma_20 <- Arima(log_returns$LogReturns, order=c(2,0,0))
checkresiduals(arma_20)


    Ljung-Box test

data:  Residuals from ARIMA(2,0,0) with non-zero mean
Q* = 39.327, df = 8, p-value = 4.275e-06

Model df: 2.   Total lags used: 10
Code
coeftest(arma_20)

z test of coefficients:

             Estimate  Std. Error z value Pr(>|z|)   
ar1       -0.03811219  0.01237289 -3.0803 0.002068 **
ar2       -0.01178765  0.01237261 -0.9527 0.340731   
intercept  0.00051767  0.00033491  1.5457 0.122176   
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Code
arma_02 <- Arima(log_returns$LogReturns, order=c(0,0,2))
checkresiduals(arma_02)


    Ljung-Box test

data:  Residuals from ARIMA(0,0,2) with non-zero mean
Q* = 38.929, df = 8, p-value = 5.066e-06

Model df: 2.   Total lags used: 10
Code
coeftest(arma_02)

z test of coefficients:

             Estimate  Std. Error z value Pr(>|z|)   
ma1       -0.03931359  0.01239570 -3.1716 0.001516 **
ma2       -0.01363511  0.01243057 -1.0969 0.272685   
intercept  0.00051676  0.00033299  1.5519 0.120694   
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Code
arma_12 <- Arima(log_returns$LogReturns, order=c(1,0,2))
checkresiduals(arma_12)


    Ljung-Box test

data:  Residuals from ARIMA(1,0,2) with non-zero mean
Q* = 29.667, df = 7, p-value = 0.0001093

Model df: 3.   Total lags used: 10
Code
coeftest(arma_12)

z test of coefficients:

             Estimate  Std. Error z value  Pr(>|z|)    
ar1        0.74660442  0.12291149  6.0743 1.245e-09 ***
ma1       -0.78624421  0.12342306 -6.3703 1.886e-10 ***
ma2        0.00387487  0.01562497  0.2480    0.8041    
intercept  0.00052359  0.00030184  1.7346    0.0828 .  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Code
arma_21 <- Arima(log_returns$LogReturns, order=c(2,0,1))
checkresiduals(arma_21)


    Ljung-Box test

data:  Residuals from ARIMA(2,0,1) with non-zero mean
Q* = 29.433, df = 7, p-value = 0.0001206

Model df: 3.   Total lags used: 10
Code
coeftest(arma_21)

z test of coefficients:

             Estimate  Std. Error z value  Pr(>|z|)    
ar1        0.75216134  0.10514098  7.1538 8.439e-13 ***
ar2        0.00505353  0.01520891  0.3323   0.73968    
ma1       -0.79218013  0.10452116 -7.5791 3.479e-14 ***
intercept  0.00052281  0.00030083  1.7379   0.08223 .  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Code
arma_22 <- Arima(log_returns$LogReturns, order=c(2,0,2))
checkresiduals(arma_22)


    Ljung-Box test

data:  Residuals from ARIMA(2,0,2) with non-zero mean
Q* = 27.067, df = 6, p-value = 0.0001407

Model df: 4.   Total lags used: 10
Code
coeftest(arma_22)

z test of coefficients:

             Estimate  Std. Error z value  Pr(>|z|)    
ar1        0.13889774  0.20808368  0.6675 0.5044470    
ar2        0.49269101  0.13447436  3.6638 0.0002485 ***
ma1       -0.18349571  0.20966818 -0.8752 0.3814804    
ma2       -0.50222839  0.14273213 -3.5187 0.0004337 ***
intercept  0.00052958  0.00029978  1.7666 0.0772978 .  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Code
arma_30 <- Arima(log_returns$LogReturns, order=c(3,0,0))
checkresiduals(arma_30)


    Ljung-Box test

data:  Residuals from ARIMA(3,0,0) with non-zero mean
Q* = 27.506, df = 7, p-value = 0.0002702

Model df: 3.   Total lags used: 10
Code
coeftest(arma_30)

z test of coefficients:

             Estimate  Std. Error z value  Pr(>|z|)    
ar1       -0.03863782  0.01236319 -3.1252 0.0017767 ** 
ar2       -0.01340434  0.01237126 -1.0835 0.2785840    
ar3       -0.04157144  0.01236735 -3.3614 0.0007755 ***
intercept  0.00051836  0.00032127  1.6135 0.1066419    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Code
arma_03 <- Arima(log_returns$LogReturns, order=c(0,0,3))
checkresiduals(arma_03)


    Ljung-Box test

data:  Residuals from ARIMA(0,0,3) with non-zero mean
Q* = 26.596, df = 7, p-value = 0.000394

Model df: 3.   Total lags used: 10
Code
coeftest(arma_03)

z test of coefficients:

             Estimate  Std. Error z value  Pr(>|z|)    
ma1       -0.03884123  0.01237078 -3.1398 0.0016909 ** 
ma2       -0.01337127  0.01235618 -1.0822 0.2791849    
ma3       -0.04358374  0.01269756 -3.4325 0.0005982 ***
intercept  0.00051686  0.00031767  1.6270 0.1037265    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Code
arma_13 <- Arima(log_returns$LogReturns, order=c(1,0,3))
checkresiduals(arma_13)


    Ljung-Box test

data:  Residuals from ARIMA(1,0,3) with non-zero mean
Q* = 27.07, df = 6, p-value = 0.0001405

Model df: 4.   Total lags used: 10
Code
coeftest(arma_13)
Warning in sqrt(diag(se)): NaNs produced

z test of coefficients:

             Estimate  Std. Error z value Pr(>|z|)  
ar1        0.33895106         NaN     NaN      NaN  
ma1       -0.37787376         NaN     NaN      NaN  
ma2       -0.00033557         NaN     NaN      NaN  
ma3       -0.03535024         NaN     NaN      NaN  
intercept  0.00051530  0.00031168  1.6533  0.09828 .
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Code
arma_23 <- Arima(log_returns$LogReturns, order=c(2,0,3))
checkresiduals(arma_23)


    Ljung-Box test

data:  Residuals from ARIMA(2,0,3) with non-zero mean
Q* = 25.221, df = 5, p-value = 0.0001263

Model df: 5.   Total lags used: 10
Code
coeftest(arma_23)

z test of coefficients:

             Estimate  Std. Error z value Pr(>|z|)  
ar1        0.12928859  0.26476852  0.4883  0.62533  
ar2        0.22589080  0.19486455  1.1592  0.24637  
ma1       -0.16826815  0.26487623 -0.6353  0.52525  
ma2       -0.23309281  0.19068257 -1.2224  0.22155  
ma3       -0.03297002  0.01859016 -1.7735  0.07614 .
intercept  0.00050770  0.00030819  1.6474  0.09948 .
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Code
arma_31 <- Arima(log_returns$LogReturns, order=c(3,0,1))
checkresiduals(arma_31)


    Ljung-Box test

data:  Residuals from ARIMA(3,0,1) with non-zero mean
Q* = 27.5, df = 6, p-value = 0.0001167

Model df: 4.   Total lags used: 10
Code
coeftest(arma_31)

z test of coefficients:

             Estimate  Std. Error z value Pr(>|z|)   
ar1       -0.01946408  0.55210686 -0.0353 0.971877   
ar2       -0.01274576  0.02435614 -0.5233 0.600760   
ar3       -0.04139634  0.01342376 -3.0838 0.002044 **
ma1       -0.01920247  0.55288877 -0.0347 0.972294   
intercept  0.00051900  0.00032097  1.6170 0.105887   
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Code
arma_32 <- Arima(log_returns$LogReturns, order=c(3,0,2))
checkresiduals(arma_32)


    Ljung-Box test

data:  Residuals from ARIMA(3,0,2) with non-zero mean
Q* = 40.334, df = 5, p-value = 1.279e-07

Model df: 5.   Total lags used: 10
Code
coeftest(arma_32)

z test of coefficients:

             Estimate  Std. Error  z value  Pr(>|z|)    
ar1        0.03147753  0.02132298   1.4762  0.139883    
ar2       -0.97055711  0.01697418 -57.1784 < 2.2e-16 ***
ar3       -0.03525799  0.01280948  -2.7525  0.005914 ** 
ma1       -0.06976751  0.01731179  -4.0301 5.576e-05 ***
ma2        0.96109526  0.02058307  46.6935 < 2.2e-16 ***
intercept  0.00053285  0.00033637   1.5841  0.113169    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Code
arma_33 <- Arima(log_returns$LogReturns, order=c(3,0,3))
checkresiduals(arma_33)


    Ljung-Box test

data:  Residuals from ARIMA(3,0,3) with non-zero mean
Q* = 8.5727, df = 4, p-value = 0.07271

Model df: 6.   Total lags used: 10
Code
coeftest(arma_33)
Warning in sqrt(diag(se)): NaNs produced

z test of coefficients:

             Estimate  Std. Error  z value Pr(>|z|)    
ar1       -0.93303265         NaN      NaN      NaN    
ar2        0.68686580  0.06141265  11.1844  < 2e-16 ***
ar3        0.81111732  0.07456554  10.8779  < 2e-16 ***
ma1        0.89063594         NaN      NaN      NaN    
ma2       -0.72881204  0.05908310 -12.3354  < 2e-16 ***
ma3       -0.80508569  0.07150341 -11.2594  < 2e-16 ***
intercept  0.00060076  0.00028783   2.0872  0.03687 *  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

As observed from the outputs, the simpler models (such as AR(1) and MA(1)) yield very low p-values for the Ljung-Box test (\(<0.05\)). This indicates that significant autocorrelation remains within the residuals, implying that these models have failed to capture all the relevant information.

Conversely, as we increase complexity towards an ARMA(3,3), the p-value rises above the critical 5% threshold. Consequently, we fail to reject the null hypothesis: the residuals of this model behave as White Noise. This suggests that the ARMA(3,3) is mathematically effective in capturing the patterns of the time series, although its high complexity warrants caution regarding the risk of overfitting.

Additionally, the ARMA(1,3) model exhibits instability in parameter estimation (incalculable Std. Error), pointing to potential over-parameterization.

2.3 AIC vs BIC

We visualize the trends of the information criteria for all estimated models.

Code
models_list <- list(
  "AR(1)" = arma_10, "MA(1)" = arma_01, "ARMA(1,1)" = arma_11,
  "AR(2)" = arma_20, "MA(2)" = arma_02, "ARMA(1,2)" = arma_12,
  "ARMA(2,1)" = arma_21, "ARMA(2,2)" = arma_22,
  "AR(3)" = arma_30, "MA(3)" = arma_03, "ARMA(1,3)" = arma_13,
  "ARMA(2,3)" = arma_23, "ARMA(3,1)" = arma_31, "ARMA(3,2)" = arma_32,
  "ARMA(3,3)" = arma_33
)

metrics_df <- data.frame(
  Model = names(models_list),
  AIC = sapply(models_list, AIC),
  BIC = sapply(models_list, BIC)
)

metrics_df$Model <- factor(metrics_df$Model, levels = metrics_df$Model)

min_aic_val <- min(metrics_df$AIC)
min_bic_val <- min(metrics_df$BIC)

plot_aic <- plot_ly(metrics_df, x = ~Model, y = ~AIC, name = 'AIC',
  type = 'scatter', mode = 'lines+markers',
  line = list(color = '#1f77b4', width = 2),
  marker = list(size = 8, color = '#1f77b4')) %>% 
  layout(title = "AIC Trend", yaxis = list(title = "AIC"))

best_aic <- metrics_df[metrics_df$AIC == min_aic_val, ]
plot_aic <- plot_aic %>%
  add_trace(data = best_aic, x = ~Model, y = ~AIC, 
    type = 'scatter', mode = 'markers', name = 'Best AIC',
    marker = list(color = 'red', size = 12, symbol = 'star'),
    showlegend = FALSE
  )

plot_bic <- plot_ly(metrics_df, x = ~Model, y = ~BIC, name = 'BIC',
  type = 'scatter', mode = 'lines+markers',
  line = list(color = '#ff7f0e', width = 2),
  marker = list(size = 8, color = '#ff7f0e')) %>% 
  layout(title = "BIC Trend", yaxis = list(title = "BIC"))

best_bic <- metrics_df[metrics_df$BIC == min_bic_val, ]
plot_bic <- plot_bic %>%
  add_trace(
    data = best_bic, x = ~Model, y = ~BIC, 
    type = 'scatter', mode = 'markers', name = 'Best BIC',
    marker = list(color = 'red', size = 12, symbol = 'star'),
    showlegend = FALSE
  )

final_plot <- subplot(plot_aic, plot_bic, nrows = 2, shareX = TRUE, titleY = TRUE) %>%
  layout(
    title = "Model Selection Criteria: AIC & BIC Trends",
    hovermode = "x unified",
    margin = list(t = 50)
  )

final_plot

Comparison between AIC and BIC

The graphical analysis of the information criteria highlights a typical divergence found in financial time series modeling.

As evidenced by the plot, the AIC criterion, which prioritizes goodness of fit, identifies the ARMA(3,3) as the optimal model (minimum value). Conversely, the BIC criterion, which penalizes model complexity more severely to prevent overfitting, suggests a much more parsimonious model: the ARMA(1,1).

Faced with these conflicting indications (a complex model vs a simple model), it is not possible to determine a priori which one is superior. To resolve this ambiguity and assess which model generalizes better on unobserved data, we will proceed with an Out-of-Sample validation, computing forecasts and comparing error metrics against actual data.

2.4 Forecast

Since the information criteria (AIC and BIC) suggest different models, we proceed with an Out-of-Sample validation to test the actual predictive capacity on data not used for estimation.

We split the time series into two subsets:

  • Training Set (\(80\%\)): Used to estimate the model parameters.
  • Test Set (\(20\%\)): Used to compare forecasts against actual data (h-step-ahead forecast).
Code
n_total <- nrow(log_returns)
n_train <- floor(0.80 * n_total)

train_data <- log_returns$LogReturns[1:n_train]
test_data <- log_returns$LogReturns[(n_train + 1):n_total]

set.seed(123)

mod_winner_bic <- Arima(train_data, order=c(1,0,1))
mod_winner_aic <- Arima(train_data, order=c(3,0,3))
mod_naive <- Arima(train_data, order=c(0,0,0)) 

h_steps <- length(test_data)

fc_bic <- forecast(mod_winner_bic, h=h_steps)
fc_aic <- forecast(mod_winner_aic, h=h_steps)
fc_naive <- forecast(mod_naive, h=h_steps)

The following plot displays the time series trend and the forecasts generated by the three models over the test period.

Note

Given the stationary nature and high volatility of log-returns, it is expected that medium-to-long-term forecasts will tend to converge rapidly towards the unconditional mean of the process. For this reason we apply method as one-step-ahead and k-step-ahead.

Code
fit_bic_rolling <- Arima(log_returns$LogReturns, model = mod_winner_bic)
one_step_bic <- fitted(fit_bic_rolling)[(n_train + 1):n_total]

fit_aic_rolling <- Arima(log_returns$LogReturns, model = mod_winner_aic)
one_step_aic <- fitted(fit_aic_rolling)[(n_train + 1):n_total]

dates_test  <- log_returns$Date[(n_train + 1):n_total]

fig_roll <- plot_ly(
  x = dates_test, 
  y = test_data, 
  type = 'scatter', 
  mode = 'lines', 
  name = 'Actual Test Data', 
  line = list(color = 'black', width = 1, dash = 'dot')
)

fig_roll <- add_trace(
  fig_roll, 
  x = dates_test, 
  y = as.numeric(one_step_bic), 
  name = 'Rolling ARMA(1,1)', 
  line = list(color = 'blue', width = 1)
)

fig_roll <- add_trace(
  fig_roll, 
  x = dates_test, 
  y = as.numeric(one_step_aic), 
  name = 'Rolling ARMA(3,3)', 
  line = list(color = 'red', width = 1)
)

fig_roll <- layout(
  fig_roll,
  title = "One-Step-Ahead Rolling Forecast (Test Set: 20%)",
  xaxis = list(title = "Date"),
  yaxis = list(title = "Log Return"),
  legend = list(orientation = "h", x = 0.1, y = -0.2)
)

fig_roll

Comparison One-Step-Ahead Rolling Forecast

Code
calc_k_step_rolling <- function(model_obj, series, n_train, k=5) {
  n_total <- length(series)
  test_indices <- (n_train + 1):n_total
  forecasts <- numeric(length(test_indices))
  
  for (i in seq_along(test_indices)) {
    target_idx <- test_indices[i]
    origin_idx <- target_idx - k
    
    if (origin_idx > 0) {
      subset_data <- series[1:origin_idx]
      fit_temp <- Arima(subset_data, model = model_obj)
      
      fc_temp <- forecast(fit_temp, h = k)
      
      forecasts[i] <- as.numeric(fc_temp$mean[k])
    } else {
      forecasts[i] <- NA
    }
  }
  return(forecasts)
}

k_horizon <- 5
vec_5step_bic <- calc_k_step_rolling(mod_winner_bic, log_returns$LogReturns, n_train, k = k_horizon)
vec_5step_aic <- calc_k_step_rolling(mod_winner_aic, log_returns$LogReturns, n_train, k = k_horizon)

dates_test <- log_returns$Date[(n_train + 1):n_total]

fig_roll_5 <- plot_ly(
  x = dates_test, 
  y = test_data, 
  type = 'scatter', 
  mode = 'lines', 
  name = 'Actual Test Data', 
  line = list(color = 'black', width = 1, dash = 'dot')
)

fig_roll_5 <- add_trace(
  fig_roll_5, 
  x = dates_test, 
  y = vec_5step_bic, 
  name = "Rolling ARMA(1,1)", 
  line = list(color = 'blue', width = 1.5)
)

fig_roll_5 <- add_trace(
  fig_roll_5, 
  x = dates_test, 
  y = vec_5step_aic, 
  name = "Rolling ARMA(3,3)", 
  line = list(color = 'red', width = 1.5)
)

fig_roll_5 <- layout(
  fig_roll_5,
  title = "Five-Step-Ahead Rolling Forecast (Test Set: 20%)",
  xaxis = list(title = "Date"),
  yaxis = list(title = "Log Return"),
  legend = list(orientation = "h", x = 0.1, y = -0.2)
)

fig_roll_5

Comparison Five-Step-Ahead Rolling Forecast

We evaluate the predictive performance using:

  • MAE (Mean absolute Error)
  • RMSE (Root Mean Squared Error)
  • Theil’s U2 index: The ratio between the model’s RMSE and the Naive model’s RMSE.
    • \(U<1\): The model outperforms the benchmark.
    • \(U \approx 1\): The model performs equivalently to the benchmark.
    • \(U>1\): The model performs worse than the benchmark.
Code
get_metrics_robust <- function(forecast_obj, actuals) {
  vec_pred <- as.numeric(forecast_obj$mean)
  vec_obs <- as.numeric(actuals)
  
  acc <- accuracy(vec_pred, vec_obs)
  
  return(c(RMSE = acc[1, "RMSE"], MAE = acc[1, "MAE"]))
}

res_bic <- get_metrics_robust(fc_bic, test_data)
res_aic <- get_metrics_robust(fc_aic, test_data)
res_naive <- get_metrics_robust(fc_naive, test_data)

u_bic <- res_bic["RMSE"] / res_naive["RMSE"]
u_aic <- res_aic["RMSE"] / res_naive["RMSE"]
u_naive <- 1.0

validation_table <- data.frame(
  Model = c("ARMA(1,1) [BIC Winner]", "ARMA(3,3) [AIC Winner]", "Naive (Benchmark)"),
  RMSE = c(res_bic["RMSE"], res_aic["RMSE"], res_naive["RMSE"]),
  MAE = c(res_bic["MAE"], res_aic["MAE"], res_naive["MAE"]),
  Theil_U2 = c(u_bic, u_aic, u_naive)
)

datatable(validation_table, options = list(dom = 't'), rownames = FALSE)

Comparison of out-of-sample error metrics

As highlighted in the table, the Theil’s U2 index for both ARMA models is extremely close to 1. This indicates that the statistical models, despite their complexity, fail to significantly outperform the simple forecast based on the historical mean (Naive Benchmark).

Important

This result is consistent with the Efficient Market Hypothesis (EMH), according to which past returns contain little to no useful information for predicting future short-term returns, making the series resemble a Random Walk.

2.5 Conclusion

The analysis conducted in this chapter has highlighted the limitations of stationary linear models in forecasting the direction of ASML returns.

Although information criteria (AIC) identified the ARMA(3,3) as a mathematical structure capable of “whitening” the in-sample residuals, the Out-of-Sample validation delivered an unequivocal verdict: a Theil’s U index close to 1 confirms that no ARMA model systematically outperforms the Naive benchmark. This empirical result corroborates the hypothesis that, regarding the conditional mean, returns follow a process closely resembling a Random Walk, rendering trading strategies based solely on past autocorrelation futile.

However, the unpredictability of the mean does not imply a total absence of structure within the series. Visual inspection of the log-returns plot reveals evident volatility clustering: periods of high volatility tend to be followed by high volatility, and periods of calm by calm (a phenomenon known as conditional heteroskedasticity).

By assuming constant variance over time (homoskedasticity), ARMA models fail to capture this fundamental characteristic of financial markets. Therefore, in the next chapter, we will shift our focus from predicting the sign of returns to modeling their magnitude (risk) by introducing the GARCH (Generalized AutoRegressive Conditional Heteroskedasticity) family of models.