Transformations

Let the regression variable under consideration be denoted by

\begin{equation}\label{eq:VecNot} \mathbf{X} = \begin{pmatrix} x_1 & \cdots & x_n \end{pmatrix}^\top. \end{equation}

Adopting the matrix notation used in Table 1 to represent the monthly number of working days in Germany, Equation \eqref{eq:VecNot} can be rewritten loosely as

\begin{equation}\label{eq:MatNot} \mathbf{X} = \begin{pmatrix} x_{1, 1} & x_{1, 2} & \cdots & x_{1, \tau} \\ x_{2, 1} & x_{2, 2} & \cdots & x_{2, \tau} \\ \vdots & \vdots & \ddots & \vdots \\ x_{n_1 ,1} & x_{n_2, 2} & \cdots & x_{n_\tau, \tau} \end{pmatrix}, \end{equation}

which will be used for convenience in the sequel. In Equation \eqref{eq:MatNot}, $n_j$ is the number of observations in the $j$-th period and $\tau$ is the number of observations per year (that is $\tau = 2$ for biannual data, $\tau = 4$ for quarterly data and $\tau = 12$ for monthly data), giving $n = \sum_{j = 1}^\tau n_j$. In most cases, the regression variable will cover a span of complete years, leading to $n_1 = n_2 = \cdots = n_\tau$. However, this does not constitute a necessary condition and, therefore, the plug-in allows for some flexibility here.

Lead/Lag

Using Equation \eqref{eq:VecNot}, a led or lagged regression variable is given by $\mathbf{X}_k = (x_{1-k}, …, x_{n-k})^\top$. The shift parameter $k$ can be specified via the argument Number of periods. Thus, $k > 0$ yields a lagged version of $\mathbf{X}$ and $k < 0$ yields a led version.

Regimes

As opposed to grouping (see below), this transformation creates a single daughter regression variable from a mother regression variable. More precisely, it produces an on-off copy of the latter variable over a set of user-defined spans called active periods.

To carry out this transformation in a TransReg document, the checkbox Enabled has to be marked first in the Regimes section of the specification panel. Afterwards, click into the white box next to Regimes to open a new menu where a list of active periods can be generated using the respective Start and End arguments. Save your list by clicking the Done button and the complementary inactive periods are automatically recognised. Use the drop down menu Value for inactive periods in order to assign values of $0.0$ (default) or $NaN$ to the inactive periods.

Grouping

Grouping is the process of creating a set of period-specific daughter regression variables from a mother regression variable. The maximum number of groups allowed in this process is set automatically to $\tau$. For any daughter regression variable, the columns which should not be taken into account are treated as either zeros or NaN’s, depending on the user’s choice. Thus, separation of the figures for period $\tau$ from $\mathbf{X}$ as given in Equation \eqref{eq:MatNot} may create the two daughter variables

\begin{equation*} \mathbf{X}_1 = \begin{pmatrix} x_{1, 1} & \cdots & x_{1, \tau - 1} & 0.0 \\ x_{2, 1} & \cdots & x_{2, \tau - 1} & 0.0 \\ \vdots & \ddots & \vdots & \vdots \\ x_{n_1, 1} & \cdots & x_{n_{\tau - 1}, \tau - 1} & 0.0 \end{pmatrix} \quad \text{and} \quad \mathbf{X}_2 = \begin{pmatrix} 0.0 & \cdots & 0.0 & x_{1, \tau} \\ 0.0 & \cdots & 0.0 & x_{2, \tau} \\ \vdots & \ddots & \vdots & \vdots \\ 0.0 & \cdots & 0.0 & x_{n_\tau, \tau} \end{pmatrix}, \end{equation*}

Centring

A variable $\mathbf{X}$ is called centred if $\mathbb{E} (\mathbf{X}) = 0$. Therefore a centred version of $\mathbf{X}$ is always obtained in theory by the transformation $\mathbf{X} - \mathbb{E} (\mathbf{X})$. In practice, however, the theoretical expectation of $\mathbf{X}$ is usually unknown and thus estimated by a sample mean.

Section Sample means briefly describes the two sample means implemented in the plug-in.
Section Pre-test provides details on the centring pre-test which is performed automatically for any regression variable that has not been centred by the plug-in so far.
Section Calculations outlines the options available for centring and how they can be put into practice in the specification panel.

Sample means

The plug-in implements the global and period-specific sample means.

The global sample mean of $\mathbf{X}$ is given by

\begin{equation} \label{eq:meanG} \bar{\mathbf{X}}_g = \frac{1}{n} \sum_{j = 1}^\tau \sum_{i = 1}^{n_j} x_{i, j}. \end{equation}

The vector of period-specific sample means is defined as $\bar{\mathbf{X}}_{ps} = \begin{pmatrix} \bar{\mathbf{X}}_1 & \cdots & \bar{\mathbf{X}}_\tau \end{pmatrix}^\top$, where

\begin{equation} \label{eq:meanPS} \bar{\mathbf{X}}_j = \frac{1}{n_j} \sum_{i = 1}^{n_j} x_{i, j} \end{equation} for each $j \in {1, \ldots, \tau}$.

Combining Equations \eqref{eq:meanG} and \eqref{eq:meanPS}, it follows immediately that

\begin{equation} \label{eq:meanGPS} \bar{\mathbf{X}}_g = \sum_{j = 1}^\tau w_j \bar{\mathbf{X}}_j, \end{equation}

where $w_j = n_j / n$. Thus the global sample mean of $\mathbf{X}$ always equals the weighted sum of period-specific sample means of $\mathbf{X}$ where the weight of each period is given by the period’s share to the total number of observations.

Pre-test

The centring pre-test is performed automatically for any regression variable which is labelled either as original or as grouped in order to avoid centring of regression variables that have been centred already. The basic idea is to check whether the global sample mean and/or the vector of period-specific sample means of the variable in question is sufficiently close to zero. To this end, let $\varepsilon_\mathit{low} > 0$ and $\varepsilon_\mathit{upp} > 0$ with $\varepsilon_\mathit{low} < \varepsilon_\mathit{upp}$ be a lower and an upper threshold, respectively, which account for different levels of sufficiency and can be customised in the global options (see Section Global Options). The pre-test then consists of two steps:

It is checked whether $\mathbf{X}$ has been centred already around its period-specific sample means. This is assumed to be true if $||\bar{\mathbf{X}}_{ps}||_2 \leq \varepsilon_\mathit{low}$ or at least $||\bar{\mathbf{X}}_{ps}||_2 \leq \varepsilon_\mathit{upp}$, where $||\cdot||_2$ denotes the Euclidean norm. If so, the global sample mean of $\mathbf{X}$ is also close to zero according to Equation \eqref{eq:meanGPS} and, therefore, not checked separately.
If $||\bar{\mathbf{X}}_{ps}||_2 > \varepsilon_\mathit{upp}$ holds in Step (1), it is checked whether $\mathbf{X}$ has been centred already around its global sample mean. This is assumed to be true if $|\bar{\mathbf{X}}_g| \leq \varepsilon_\mathit{low}$ or at least $|\bar{\mathbf{X}}_g| \leq \varepsilon_\mathit{upp}$.

Depending on the outcome of the pre-test, a verbal comment is displayed in the respective column of the data panel. The possible comments are summarised in Table 3. Note that any indication of centring will generate a warning message and the regression variable in question is not centred.

Table 1: Possible comments of the centring pre-test

Pre-test column		Displayed if
Centred (seasonal means)		$\|\|\bar{\mathbf{X}}_{ps}\|\|_2 \leq \varepsilon_\mathit{low}$.
Probably centred (seasonal means)		$\varepsilon_\mathit{low} < \|\|\bar{\mathbf{X}}_{ps}\|\|_2 \leq \varepsilon_\mathit{upp}$.
Centred (global mean)		$\|\|\bar{\mathbf{X}}_{ps}\|\|_2 > \varepsilon_\mathit{upp}$ and $\|\bar{\mathbf{X}}_g\,\| \leq \varepsilon_\mathit{low}$.
Probably centred (global mean)		$\|\|\bar{\mathbf{X}}_{ps}\|\|_2 > \varepsilon_\mathit{upp}$ and $\varepsilon_\mathit{low} < \|\bar{\mathbf{X}}_g\| \leq \varepsilon_\mathit{upp}$.
Not centred		$\|\|\bar{\mathbf{X}}_{ps}\|\|_2 > \varepsilon_\mathit{upp}$ and $\|\bar{\mathbf{X}}_g\| > \varepsilon_\mathit{upp}$.

Calculations

For any original regression variable $\mathbf{X}$, the transformed variable centred around the global sample mean of $\mathbf{X}$ is obtained as:

\begin{equation*} \mathbf{X}^\mathit{Cen}_g = \begin{pmatrix} x_{1, 1} - \bar{\mathbf{X}}_g & x_{1, 2} - \bar{\mathbf{X}}_g & \cdots & x_{1, \tau} - \bar{\mathbf{X}}_g \\ x_{2, 1} - \bar{\mathbf{X}}_g & x_{2, 2} - \bar{\mathbf{X}}_g & \cdots & x_{2, \tau} - \bar{\mathbf{X}}_g \\ \vdots & \vdots & \ddots & \vdots \\ x_{n_1 ,1} - \bar{\mathbf{X}}_g & x_{n_2, 2} - \bar{\mathbf{X}}_g & \cdots & x_{n_\tau, \tau} - \bar{\mathbf{X}}_g \end{pmatrix}, \end{equation*}

see also Example 2 below. Similarly, the transformed variable centred around the period-specific sample means of $\mathbf{X}$ is given by:

\begin{equation*} \mathbf{X}^\mathit{Cen}_{ps} = \begin{pmatrix} x_{1, 1} - \bar{\mathbf{X}}_1 & x_{1, 2} - \bar{\mathbf{X}}_2 & \cdots & x_{1, \tau} - \bar{\mathbf{X}}_\tau \\ x_{2, 1} - \bar{\mathbf{X}}_1 & x_{2, 2} - \bar{\mathbf{X}}_2 & \cdots & x_{2, \tau} - \bar{\mathbf{X}}_\tau \\ \vdots & \vdots & \ddots & \vdots \\ x_{n_1 ,1} - \bar{\mathbf{X}}_1 & x_{n_2, 2} - \bar{\mathbf{X}}_2 & \cdots & x_{n_\tau, \tau} - \bar{\mathbf{X}}_\tau \end{pmatrix}, \end{equation*}

see also Example 3 below. For any grouped regression variable, these transformations are carried out only in the “active” periods and the NaN’s, if introduced in the “inactive” periods, are replaced with zeros, see Example 4 below. Note that the latter replacement is also applied to missing values in both original and grouped regression variables.

To carry out centring in a TransReg document, go to the CENTRING section of the specification panel and use the drop down menu Sample mean in order to specify the type of sample mean. The implemented options are summarised in Table 2. In case of centring around a global sample mean or period-specific sample means, the span used for calculating the selected sample mean can be further specified. To this end, click on Calculation span, use the drop down menu Type to select a span category and, if necessary, provide the requested information. The implemented span categories are summarised in Table 3. In addition, the argument Extension horizon can be used to specify the number of periods (positive value) or the number of years (negative value) by which the centred regression variable should be extended with zeros. Eventually, confirm your choice by clicking on Calculate in the bottom line of the specification panel and the centred regression variable appears in the data panel under the following default name:

\[MyRegVar \rhd Centred.\]

If grouped regression variables have been created prior to centring, the default names of the centred daughter regression variables are:

\[\text{MyRegVar} \rhd \text{GroupG1} \rhd \text{Centred}, \\ \text{MyRegVar} \rhd \text{GroupG2} \rhd \text{Centred}, \\ \ldots \\ \text{MyRegVar} \rhd \text{GroupGN} \rhd \text{Centred}.\]

Table 2: Sample means for centring (default in boldface)

Sample mean		Action
None		No centring.
Global		Centring around global sample mean as defined in equation \eqref{eq:meanG}.
Seasonal		Centring around period-specific sample means as defined in Equation \eqref{eq:meanPS}.

Table 3: Calculation span for centring (default in boldface)

Type	Start	End
All	Start of regression variable	End of regression variable
From	User-defined date	End of regression variable
To	Start of regression variable	User-defined date
Between	User-defined date	User-defined date
Last	User-defined number of periods	End of regression variable
First	Start of regression variable	User-defined number of periods
Excluding	User-defined number of periods	User-defined number of periods

Example 2: (Centring around global sample mean) Using the span from January 2011 to December 2020, the global sample mean of the monthly number of working days is (rounded to one decimal place) given by \begin{equation*} \bar{\mathbf{X}}_g = 20.8. \end{equation*} Accordingly, the monthly number of working days centred around this average reads \begin{equation*} {\small\mathbf{X}^\mathit{Cen}_g = \left(\begin{array}{*{12}{r}} -0.2 & -0.8 & 1.8 & -1.8 & 1.2 & -1.5 & 0.2 & 2.0 & 1.2 & -0.9 & 0.5 & 0.2 \\ 0.8 & -0.2 & 1.2 & -1.8 & -0.8 & -0.5 & 1.2 & 2.0 & -0.8 & 1.1 & 0.5 & -3.8 \\ 1.2 & -1.2 & -0.8 & 0.2 & -1.5 & -0.8 & 2.2 & 1.0 & 0.2 & 1.1 & -0.5 & -2.8 \\ 0.8 & -0.8 & -0.2 & -0.8 & -0.8 & -1.5 & 2.2 & 0.0 & 1.2 & 1.1 & -0.8 & -1.8 \\ -0.2 & -1.2 & 1.2 & -0.8 & -2.8 & 0.5 & 2.2 & 0.2 & 1.2 & 1.2 & 0.2 & -0.8 \\ -1.2 & -0.2 & 0.2 & 0.2 & -1.5 & 1.2 & 0.2 & 2.0 & 1.2 & -0.9 & 0.5 & 0.2 \\ 0.8 & -1.2 & 2.2 & -2.8 & 0.2 & -0.5 & 0.2 & 2.0 & 0.2 & -0.8 & 0.5 & -1.8 \\ 1.2 & -1.2 & 0.2 & -0.8 & -1.5 & 0.2 & 1.2 & 2.0 & -0.8 & 1.1 & 0.5 & -3.8 \\ 1.2 & -0.8 & -0.2 & -0.8 & 0.2 & -2.5 & 2.2 & 1.0 & 0.2 & 1.1 & -0.5 & -2.8 \\ 0.8 & -1.2 & 1.2 & -0.8 & -1.8 & -0.5 & 2.2 & 0.2 & 1.2 & 1.2 & 0.2 & -0.8 \end{array}\right).} \end{equation*} What jumps immediately to the eye is the fact that the centred number of working days is always negative in February and positive or at least zero in July and August. This might be disadvantageous in some practical applications. Thus the use of period-specific sample means seems to be preferable in these cases.

Example 3: (Centring around month-specific sample means) Using again the span from January 2011 to December 2020, the month-specific sample means of the number of working days are (again rounded to one decimal place) given by \begin{equation*} \bar{\mathbf{X}}_{ps} = \begin{pmatrix} 21.3 & 19.9 & 21.5 & 19.8 & 19.9 & 20.2 & 22.2 & 22.0 & 21.3 & 21.3 & 20.9 & 19.0 \end{pmatrix}^\top. \end{equation*} Thus the monthly number of working days centred around these averages is \begin{equation*} {\mathbf{X}^\mathit{Cen}_{ps} = \left(\begin{array}{*{12}{r}} -0.7 & 0.1 & 1.1 & -0.8 & 2.1 & -0.9 & -1.2 & 0.8 & 0.7 & -1.4 & 0.4 & 2.0 \\ 0.3 & 0.7 & 0.5 & -0.8 & 0.1 & 0.1 & -0.2 & 0.8 & -1.3 & 0.6 & 0.4 & -2.0 \\ 0.7 & -0.3 & -1.5 & 1.2 & -0.6 & -0.2 & 0.8 & -0.2 & -0.3 & 0.6 & -0.6 & -1.0 \\ 0.3 & 0.1 & -0.9 & 0.2 & 0.1 & -0.9 & 0.8 & -1.2 & 0.7 & 0.6 & -0.9 & 0.0 \\ -0.7 & -0.3 & 0.5 & 0.2 & -1.9 & 1.1 & 0.8 & -1.0 & 0.7 & 0.7 & 0.1 & 1.0 \\ -1.7 & 0.7 & -0.5 & 1.2 & -0.6 & 1.8 & -1.2 & 0.8 & 0.7 & -1.4 & 0.4 & 2.0 \\ 0.3 & -0.3 & 1.5 & -1.8 & 1.1 & 0.1 & -1.2 & 0.8 & -0.3 & -1.3 & 0.4 & 0.0 \\ 0.7 & -0.3 & -0.5 & 0.2 & -0.6 & 0.8 & -0.2 & 0.8 & -1.3 & 0.6 & 0.4 & -2.0 \\ 0.7 & 0.1 & -0.9 & 0.2 & 1.1 & -1.9 & 0.8 & -0.2 & -0.3 & 0.6 & -0.6 & -1.0 \\ 0.3 & -0.3 & 0.5 & 0.2 & -0.9 & 0.1 & 0.8 & -1.0 & 0.7 & 0.7 & 0.1 & 1.0 \end{array}\right).} \end{equation*} As opposed to the use of the global sample mean in Example 2, this centred number of working days has positive and negative values in each month.
To demonstrate the impact of the span used for calculating the sample means, we now shorten this span to the period from January 2015 to December 2020 and note that the month-specific sample means change visibly, especially in May and June, to \begin{equation*} \bar{\mathbf{X}}_{ps} = \begin{pmatrix} 21.2 & 19.8 & 21.6 & 19.8 & 19.6 & 20.5 & 22.2 & 22.0 & 21.3 & 21.3 & 21.0 & 19.2 \end{pmatrix}^\top. \end{equation*} As a result, the centred monthly number of working days changes visibly as well. It now reads \begin{equation*} {\mathbf{X}^\mathit{Cen}_{ps} = \left(\begin{array}{*{12}{r}} -0.6 & 0.2 & 1.2 & -0.8 & 2.4 & -1.2 & -1.2 & 0.8 & 0.7 & -1.4 & 0.3 & 1.8 \\ 0.4 & 0.8 & 0.4 & -0.8 & 0.4 & -0.2 & -0.2 & 0.8 & -1.3 & 0.6 & 0.3 & -2.2 \\ 0.8 & -0.2 & -1.6 & 1.2 & -0.3 & -0.5 & 0.8 & -0.2 & -0.3 & 0.6 & -0.7 & -1.2 \\ 0.4 & 0.2 & -1.0 & 0.2 & 0.5 & -1.2 & 0.8 & -1.2 & 0.7 & 0.6 & -1.0 & -0.2 \\ -0.6 & -0.2 & 0.4 & 0.2 & -1.6 & 0.8 & 0.8 & -1.0 & 0.7 & 0.7 & 0.0 & 0.8 \\ -1.6 & 0.8 & -0.6 & 1.2 & -0.9 & 1.5 & -1.2 & 0.8 & 0.7 & -1.4 & 0.3 & 1.8 \\ 0.4 & -0.2 & 1.4 & -1.8 & 1.4 & -0.2 & -1.2 & 0.8 & -0.3 & -1.3 & 0.3 & -0.2 \\ 0.8 & -0.2 & -0.6 & 0.2 & -0.3 & 0.5 & -0.2 & 0.8 & -1.3 & 0.6 & 0.3 & -2.2 \\ 0.8 & 0.2 & -1.0 & 0.2 & 1.4 & -1.6 & 0.8 & -0.2 & -0.3 & 0.6 & -0.7 & -1.2 \\ 0.4 & -0.2 & 0.4 & 0.2 & -0.6 & -0.2 & 0.8 & -1.0 & 0.7 & 0.7 & 0.0 & 0.8 \end{array}\right),} \end{equation*} exhibiting even some sign changes compared to the centred version calculated above.

Example 4: (Grouping and centring around period-specific sample means) In Example 1, the monthly number of working days has been split into two daughter regression variables: one for the months January through November and one for December. Using the month-specific sample means calculated from the entire observation span in Example 3, the centred versions of the grouped regression variables are given by \begin{equation*} {\mathbf{X}^\mathit{Cen}_{1, ps} = \left(\begin{array}{*{12}{r}} -0.7 & 0.1 & 1.1 & -0.8 & 2.1 & -0.9 & -1.2 & 0.8 & 0.7 & -1.4 & 0.4 & 0.0 \\ 0.3 & 0.7 & 0.5 & -0.8 & 0.1 & 0.1 & -0.2 & 0.8 & -1.3 & 0.6 & 0.4 & 0.0 \\ 0.7 & -0.3 & -1.5 & 1.2 & -0.6 & -0.2 & 0.8 & -0.2 & -0.3 & 0.6 & -0.6 & 0.0 \\ 0.3 & 0.1 & -0.9 & 0.2 & 0.1 & -0.9 & 0.8 & -1.2 & 0.7 & 0.6 & -0.9 & 0.0 \\ -0.7 & -0.3 & 0.5 & 0.2 & -1.9 & 1.1 & 0.8 & -1.0 & 0.7 & 0.7 & 0.1 & 0.0 \\ -1.7 & 0.7 & -0.5 & 1.2 & -0.6 & 1.8 & -1.2 & 0.8 & 0.7 & -1.4 & 0.4 & 0.0 \\ 0.3 & -0.3 & 1.5 & -1.8 & 1.1 & 0.1 & -1.2 & 0.8 & -0.3 & -1.3 & 0.4 & 0.0 \\ 0.7 & -0.3 & -0.5 & 0.2 & -0.6 & 0.8 & -0.2 & 0.8 & -1.3 & 0.6 & 0.4 & 0.0 \\ 0.7 & 0.1 & -0.9 & 0.2 & 1.1 & -1.9 & 0.8 & -0.2 & -0.3 & 0.6 & -0.6 & 0.0 \\ 0.3 & -0.3 & 0.5 & 0.2 & -0.9 & 0.1 & 0.8 & -1.0 & 0.7 & 0.7 & 0.1 & 0.0 \end{array}\right)} \end{equation*} and \begin{equation*} {\mathbf{X}^\mathit{Cen}_{2, ps} = \left(\begin{array}{*{12}{r}} 0.0 & 0.0 & 0.0 & 0.0 & 0.0 & 0.0 & 0.0 & 0.0 & 0.0 & 0.0 & 0.0 & 2.0 \\ 0.0 & 0.0 & 0.0 & 0.0 & 0.0 & 0.0 & 0.0 & 0.0 & 0.0 & 0.0 & 0.0 & -2.0 \\ 0.0 & 0.0 & 0.0 & 0.0 & 0.0 & 0.0 & 0.0 & 0.0 & 0.0 & 0.0 & 0.0 & -1.0 \\ 0.0 & 0.0 & 0.0 & 0.0 & 0.0 & 0.0 & 0.0 & 0.0 & 0.0 & 0.0 & 0.0 & 0.0 \\ 0.0 & 0.0 & 0.0 & 0.0 & 0.0 & 0.0 & 0.0 & 0.0 & 0.0 & 0.0 & 0.0 & 1.0 \\ 0.0 & 0.0 & 0.0 & 0.0 & 0.0 & 0.0 & 0.0 & 0.0 & 0.0 & 0.0 & 0.0 & 2.0 \\ 0.0 & 0.0 & 0.0 & 0.0 & 0.0 & 0.0 & 0.0 & 0.0 & 0.0 & 0.0 & 0.0 & 0.0 \\ 0.0 & 0.0 & 0.0 & 0.0 & 0.0 & 0.0 & 0.0 & 0.0 & 0.0 & 0.0 & 0.0 & -2.0 \\ 0.0 & 0.0 & 0.0 & 0.0 & 0.0 & 0.0 & 0.0 & 0.0 & 0.0 & 0.0 & 0.0 & -1.0 \\ 0.0 & 0.0 & 0.0 & 0.0 & 0.0 & 0.0 & 0.0 & 0.0 & 0.0 & 0.0 & 0.0 & 1.0 \end{array}\right).} \end{equation*} Note that the process of centring replaces automatically the NaN's in each grouped monthly number of working days with zeros.