---
title: "Dictionary of statespacer"
description: >
  `statespacer()` returns an object containing many items. In this vignette,
  you will learn what exactly is returned by `statespacer()`, so that you can
  obtain more value out of the statespacer package.
output: rmarkdown::html_vignette
bibliography: ../inst/REFERENCES.bib
vignette: >
  %\VignetteIndexEntry{Dictionary of statespacer}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

This document provides extensive details about the object that is returned by `statespacer()`. To do so, we start by introducing the form of the general linear Gaussian state space model, following the notation used by @durbin2012time. Getting a grasp of this notation will help you get the most out of the statespacer package!

## The general linear Gaussian state space model

There are many ways to write down the general linear Gaussian state space model. We use the form used by @durbin2012time:

$$
\begin{aligned}
y_t ~ &= ~ Z_t\alpha_t ~ + ~ \varepsilon_t, &\varepsilon_t ~ &\sim ~ N(0, ~ H_t), \\
\alpha_{t+1} ~ &= ~ T_t\alpha_t ~ + ~ R_t\eta_t, &\eta_t ~ &\sim ~ N(0, ~ Q_t), \\
& &\alpha_1 ~ &\sim ~ N(a_1, ~ P_1),
\end{aligned}
$$

where $y_t$ is the *observation vector*, a $p ~ \times ~ 1$ vector of dependent variables at time $t$, $\alpha_t$ is the unobserved *state vector*, an $m ~ \times ~ 1$ vector of state variables at time $t$, and $\varepsilon_t$ and $\eta_t$ are the disturbance vectors of the observation equation and the state equation, respectively. To initialise the model, $a_1$ is used as the initial guess of the state vector, and $P_1$ is the corresponding uncertainty of that guess. The matrices $Z_t$, $H_t$, $T_t$, $R_t$, and $Q_t$ are called the *system matrices* of the state space model. Different specifications of these system matrices lead to different interpretations of the model at hand.

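
As a small illustration of how a choice of system matrices pins down a model, consider the univariate local level model, a standard textbook special case with $p = m = 1$. The symbols $\mu_t$, $\sigma^2_\varepsilon$, and $\sigma^2_\eta$ are introduced here for illustration only and are not names used in the object returned by `statespacer()`. Setting

$$
Z_t ~ = ~ 1, \qquad T_t ~ = ~ 1, \qquad R_t ~ = ~ 1, \qquad H_t ~ = ~ \sigma^2_\varepsilon, \qquad Q_t ~ = ~ \sigma^2_\eta,
$$

and writing $\alpha_t = \mu_t$, the general form reduces to

$$
\begin{aligned}
y_t ~ &= ~ \mu_t ~ + ~ \varepsilon_t, &\varepsilon_t ~ &\sim ~ N(0, ~ \sigma^2_\varepsilon), \\
\mu_{t+1} ~ &= ~ \mu_t ~ + ~ \eta_t, &\eta_t ~ &\sim ~ N(0, ~ \sigma^2_\eta),
\end{aligned}
$$

that is, a random walk level observed with noise. Richer components change only the dimensions and entries of the same five system matrices.
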
## The object as returned by statespacer

With a better understanding of the notation, it is easier to find our way around the object that is returned by `statespacer()`. Let's say we store the object returned by `statespacer()` in a variable called `fit`, that is, `fit <- statespacer(...)`. `fit` is then a list containing many items, including other lists. This section describes the items that are included in `fit` one by one.

### function_call

`function_call` is a list that contains, as the name suggests, the call to the `statespacer()` function, including default values for the input arguments that were not specified. For details about the various input arguments, check out `?statespacer`.

### system_matrices

`system_matrices` is a list containing all of the system matrices of each of the components. For the variance - covariance matrices $H$ and $Q$, it also contains 2 decompositions, namely the Cholesky $LDL^{\top}$ decomposition, where $L$ is the loading matrix and $D$ is the diagonal matrix, and the correlation / standard deviation decomposition. The initial guess for the state vector, `a1`, is also included, together with the corresponding uncertainty split into its diffuse component, `P_inf`, and its stationary component, `P_star`. Further, it contains `Z_padded`, which is a list containing the $Z$ matrices of the components augmented with zeroes, such that their dimension is $p ~ \times ~ m$. These matrices are useful for extracting individual components (which is already done for you), or for extracting standard deviations of the components. There's also a vector called `state_label`, which labels the state vector to indicate which state parameters belong to which component. If components are specified that introduce parameters into the system matrices, then these parameters are also included here.
At the moment, these parameters are `lambda` (frequency) and `rho` (damping factor) for the cycles, `AR` and `MA` for the ARIMA components, `SAR` and `SMA` for the SARIMA components, and `self_spec` for the self-specified component. Note that coefficients of explanatory variables are put into the state vector, so these are treated as state parameters, and readily returned by the Kalman filter.

### predicted

`predicted` is a list that contains the one-step ahead predicted (predicting time $t$ using data up to time $t ~ - ~ 1$) objects as returned by the Kalman filter:

* `yfit` is the predicted value of $y$.
* `v` is the prediction error.
* `Fmat` is the uncertainty of the prediction.
* `a` is the predicted state.
* `P` is the uncertainty of the predicted state.
* `P_inf` is the diffuse part of `P`.
* `P_star` is the non-diffuse part of `P`.
* `a_fc` is the predicted state for time $N ~ + ~ 1$ ($N$ being the last observed time point).
* `P_fc` is the uncertainty of `a_fc`.
* `P_inf_fc` is the diffuse part of `P_fc`.
* `P_star_fc` is the non-diffuse part of `P_fc`.

Further, the contributions of the components to the predicted values are extracted separately.

### filtered

`filtered` is a list that contains the filtered (estimates for time $t$ using data up to time $t$) objects as returned by the Kalman filter. Here, `a` is the filtered state, `P` is the uncertainty of the filtered state, `P_inf` is the diffuse part of `P`, and `P_star` is the non-diffuse part of `P`. Further, the filtered values of the components are extracted separately.

### smoothed

`smoothed` is a list that contains the smoothed (estimates for time $t$ using all of the time points) objects as returned by the Kalman smoother:

* `a` is the smoothed state.
* `V` is the uncertainty of the smoothed state.
* `eta` is the smoothed state disturbance.
* `eta_var` is the uncertainty of `eta`.
* `epsilon` is the smoothed observation disturbance.
* `epsilon_var` is the uncertainty of `epsilon`.

Further, the smoothed values of the components are extracted separately.

### diagnostics

`diagnostics` is a list that contains items useful for diagnostic tests and model selection:

* `initialisation_steps` is the number of time steps required before the diffuse elements of the state vector were initialised.
* `loglik` is the loglikelihood value at the estimated parameters.
* `AIC` is the Akaike Information Criterion for the model.
* `BIC` is the Bayesian Information Criterion for the model.
* `r` is the scaled smoothed state disturbance.
* `N` is the uncertainty of `r`.
* `param_indices` is a list containing the indices of the parameters in the parameter vector for each of the components.
* `hessian` is the Hessian of the loglikelihood evaluated at the estimated parameters.

The following objects are only returned if `diagnostics = TRUE`:

* `e` is the smoothing error.
* `D` is the uncertainty of `e`.
* `Tstat_observation` is the T-statistic for testing whether deviations from the observation equation are significant.
* `Tstat_state` is the T-statistic for testing whether deviations from the state equation are significant.
* `v_normalised` is the normalised prediction error.
* `Skewness` is the skewness of `v_normalised`.
* `Kurtosis` is the kurtosis of `v_normalised`.
* `Jarque_Bera` is the Jarque-Bera statistic for testing for normality.
* `Jarque_Bera_criticalvalue` is the critical value of the Jarque-Bera test.
* `correlogram` is the correlogram of `v_normalised`.
* `Box_Ljung` contains the Box-Ljung statistics for testing for serial correlation.
* `Box_Ljung_criticalvalues` contains the critical values of the Box-Ljung tests.
* `Heteroscedasticity` contains the statistics for testing for heteroscedasticity.
* `Heteroscedasticity_criticalvalues` contains the critical values of the heteroscedasticity tests.

### optim

`optim` is the list as returned by `stats::optim` or `optimx::optimr`, depending on whether you have optimx installed.
See `?stats::optim` and `?optimx::optimr` for details. Only returned if `fit = TRUE`.

### loglik_fun

`loglik_fun` is the loglikelihood function that takes `param` as its only argument. It returns the loglikelihood at the specified parameters.

### standard_errors

`standard_errors` is a list that contains the standard errors for the transformed parameters. Its structure mimics the structure of `system_matrices`, but it only includes those system matrices that depend on the parameters. Only returned if `standard_errors = TRUE`.

## Order of parameter input

This section provides details about the parameter vector that's supplied to `statespacer()`. It clarifies which elements are used for which components.

Most components use a variance - covariance matrix, which is constructed using the Cholesky $LDL^{\top}$ decomposition. The parameters supplied to build the variance - covariance matrix are ordered as follows: First, parameters are used for the diagonal matrix $D$ and transformed by $\exp(2x)$. Second, the remaining parameters are assigned columnwise to the loading matrix $L$, so first the $1^{st}$ column, then the $2^{nd}$ column, and so on.

The parameters are assigned to the components in the following order:

1. The variance - covariance matrix, $H$, of the observation equation. Unless the $H$ matrix is self-specified!
1. The Local Level component.
1. The Local Level + Slope component, in that order.
1. The BSM components, in the order of the specified `BSM_vec`.
1. Explanatory Variables, if the coefficients are time-varying. The coefficients themselves go into the state vector, so they don't need any parameters!
1. Local Level + Explanatory Variables in the Level. First the parameters go to the variance - covariance matrix of the Level, after which the remaining parameters go to the variance - covariance matrix of the Explanatory Variables (if time-varying).
1. Local Level + Slope + Explanatory Variables in the Level.
First the parameters go to the variance - covariance matrix of the Level, then they go to the variance - covariance matrix of the Slope, after which the remaining parameters go to the variance - covariance matrix of the Explanatory Variables (if time-varying).
1. The Cycle components, in the order of the specified cycles. The first parameter is used for the frequency, $\lambda$, of the cycle. The second parameter is used for the damping factor, $\rho$, of the cycle, but only if `damping_factor_ind = TRUE`. The remaining parameters are used for the variance - covariance matrix.
1. ARIMA, in the order of the specified ARIMA components. First, the parameters are used for the variance - covariance matrix. Then, the remaining parameters are used for the AR coefficients, followed by the MA coefficients.
1. SARIMA, in the order of the specified SARIMA components. First, the parameters are used for the variance - covariance matrix. Then, the remaining parameters are used in the order of the specified seasonalities `s`: first for the AR coefficients of the first seasonality, then for the MA coefficients of the first seasonality, and so on for the subsequent seasonalities.
1. The self-specified part.

Care should be taken in specifying the initial parameters! Usually, I check out the variances of the dependent variables, apply the transformation $0.5\log(x)$ to those variances, and specify the results as initial values for the parameters that go to the various variance - covariance matrices. This transformation is the inverse of the $\exp(2x)$ transformation applied to the diagonal matrix $D$. For the AR and MA coefficients, it might be beneficial to initialise them close to 0, to prevent them from converging to unit root solutions. The information in this section should make the trial and error process of finding proper initial parameters less cumbersome!

## References