The Impact of Product and Labour Market Reform on Growth: Evidence for OECD Countries Based on Local Projections

This paper examines the impact of labour and product market reforms on economic growth in 25 OECD countries between 1985 and 2013, and tests whether this impact is conditioned by the fiscal policy stance, i.e. whether reforms coincide with fiscal expansions or fiscal adjustments. Our local projection results suggest that controlling for the endogeneity of reforms and for the stance of fiscal policy is crucial. To control for endogeneity, we use the Augmented Inverse Probability Weighted estimator. Our results suggest that product market reforms mostly have a slightly negative effect on growth, except when implemented during periods of neutral fiscal policy. Labour market reforms hurt growth under tight and neutral fiscal policy but are conducive to economic growth if introduced during periods of expansionary fiscal policy.

"In every press conference since I became ECB President, I have ended the introductory statement with a call to accelerate structural reforms in Europe. The same message was also conveyed repeatedly by my predecessors, in three quarters of all press conferences since the introduction of the euro. The term "structural reforms" is actually mentioned in approximately one third of all speeches by various members of the ECB Executive Board. By comparison, it features in only about 2% of speeches by governors of the Federal Reserve" (Draghi, 2015).

Introduction
Productivity growth is the most important component of economic growth. Regulation is widely believed to help explain cross-country productivity differences, as it limits the competitive pressures that push firms to improve (Nicoletti and Scarpetta, 2003; Aghion and Griffith, 2005; Cette et al., 2016). Structural reforms are therefore often called for, as illustrated by the quote from former ECB President Mario Draghi. Given the centrality of labour and product markets to the functioning of the economy, most research focuses on the output effects of reforms in these fields.[1] As pointed out by , these reforms broadly involve deregulating retail trade, professional services and certain segments of network industries, primarily by reducing barriers to entry; easing hiring and dismissal regulations for regular workers; and increasing the ability of and incentives for the non-employed to find jobs.[2]
Product market reform may enhance productivity growth. For instance, Arnold et al. (2016) report that banking, telecommunications, insurance and transport reforms all had significant positive effects on the productivity of manufacturing firms in India. One mechanism through which reform may affect productivity is by enhancing firm dynamics, i.e. the process of entry into, growth within, and exit from the market (de Haan and Parlevliet, 2018). Firm entry and exit (business churning) is often regarded as key to economic growth, as Schumpeterian creative destruction shifts resources from less productive to more productive firms, fostering innovation and the adoption of new technology (ECB, 2018). Business churning is affected by country-specific conditions that influence the incentives for firms to invest in new technology or to adapt existing technologies to maintain their competitive edge. But competitive pressure may also spur productivity growth through other channels.
For instance, the entry of new competitors may directly encourage productivity growth in incumbent firms (Aghion et al., 2004), while more competition in the markets for intermediate goods allows firms to boost productivity through cheaper inputs (Bourlès et al., 2013). Several studies have examined the impact of product market reform on economic growth, reporting mixed results (see Parlevliet et al., 2018 for a review of the literature).
Most research on labour market reform focuses on its effect on unemployment (see Brancaccio et al., 2018 for a discussion of the literature). However, this type of reform may also affect economic growth. As pointed out by Brancaccio et al. (2018), several models suggest that a high degree of labour market rigidity causes wage stickiness, which impedes the spontaneous balancing of labour demand and supply (Blanchard and Giavazzi, 2003). Removing these rigidities would move the economy towards the production frontier and thereby promote economic growth. For example, in the Solow (1956) growth model, labour market rigidities can be argued to lead to a high capital-labour ratio, which results in low levels of savings and capital accumulation and thus low growth. Furthermore, a rigid labour market implies that the economy produces below its potential and slows the convergence of the economy toward its steady state (Alonso et al., 2004).
[1] Note, however, that several studies (discussed below) do not find strong evidence that structural reform enhances growth. For instance,  estimate panel VARs for 20 OECD countries over the period  and report that labour and product market deregulation involves potential short-run costs in the form of higher unemployment and lower output.
[2] It is not clear whether labour and product market reforms are substitutes or complements. Following the theoretical work of Blanchard and Giavazzi (2003), who demonstrated a degree of substitutability between product and labour market regulations in a general equilibrium setting, several studies have investigated this relationship empirically, with different outcomes (see Parlevliet et al., 2018). The relationship between both types of reform may also differ between the short and the long run. For instance,  find complementarity in the short run, but substitutability in the longer run.
Most empirical studies, however, do not provide strong support for the growth-enhancing effects of labour market reform. For instance, Campos and Nugent (2018), who use the change in their index of the rigidity of employment laws as a proxy for labour market reform in a panel of more than 140 countries over the 1960-2004 period, report that changes in rigidity do not systematically affect economic growth. Likewise, Brancaccio et al. (2018), who consider the effects of a change in an employment protection index for a sample of 23 countries and 24 years, find no statistically significant positive impact of labour market reform on GDP growth.
We examine the impact of labour market and product market reforms on economic growth in 25 OECD countries. In addition, we test whether this impact is conditioned by the fiscal policy stance. We employ the local projections (LP) approach (Jordà, 2005). LP is a flexible alternative to vector autoregression models, since it does not impose dynamic restrictions. Furthermore, it is better suited to estimating non-linear or state-dependent effects, such as, in our case, effects that depend on the stance of fiscal policy. In estimating our models, we follow Teulings and Zubanov (2014) and include leads of the reform dummies. This alleviates the bias caused by overlapping forecast horizons: the forecast horizon of an observation that precedes a treatment by construction overlaps with that later treatment, but in a standard LP setup this is not registered in the data for the affected observation.
The effects of structural reforms on growth are important to study in their own right. However, an important related issue is whether reforms deliver more when implemented in combination with specific types of fiscal policy. Ignoring this conditionality may explain why previous studies on the growth-enhancing effect of structural reform reach different conclusions. We therefore condition on fiscal policy. Bordon et al. (2018) report that the impact of product market reforms on employment is positive and significant in the medium term when the reforms are initiated under non-restrictive fiscal policy. However, under a restrictive fiscal policy stance, the impact of product market reforms on employment is negative and statistically significant five years after the reform has been launched. Furthermore, the likelihood that a structural reform occurs may depend on the presence of fiscal adjustments or expansions (Mierau et al., 2007). This is important, as we endogenize the occurrence of labour and product market reforms.
Some papers are closely related to our work. Bordon et al. (2018) investigate the impact of structural reforms on employment using OECD labour market reform indicators and the local projection (LP) approach, while controlling for endogeneity. Unlike Bordon et al. (2018), however, we examine the impact of reforms on economic growth. Most importantly, instead of using the OECD reform indicators, we use the reform indicators of . The main advantage of this database is that it identifies the exact timing of major legislative and regulatory actions by advanced economies since the early 1970s in key labour and product market policy areas. Furthermore, it captures reforms in areas for which OECD indicators exist but do not cover all relevant policy dimensions .
Another paper that is strongly related to our work is Duval and Furceri (2018), who also use local projections and the same database as the current paper. These authors examine the effects of labour and product market reforms on output, employment and productivity, and analyse how these vary with prevailing macroeconomic conditions and policies. Product market reforms are found to raise productivity and output, but the gains materialise only slowly. The impact of labour market reforms is primarily on employment. The authors also find that the economy's response to reforms significantly improves when they are accompanied by fiscal or monetary stimulus. There are three main differences with our work. First, instead of using fiscal policy shocks identified as the forecast error of government expenditure, we identify the stance of fiscal policy based on the presence of fiscal adjustments and fiscal expansions, following the approach suggested by Wiese et al. (2018). In our view, fiscal adjustments and expansions better capture the stance of fiscal policy; this approach builds upon the literature initiated by Alesina and Perotti (1995); see also Alesina et al. (2019). Second, we control for the endogeneity of reforms using the Augmented Inverse Probability Weighted (AIPW) estimator proposed by Jordà and Taylor (2015), following Bordon et al. (2018). Third, we include treatment leads to alleviate the bias from overlapping forecast horizons, as proposed by Teulings and Zubanov (2014).
Our findings indicate that controlling for the endogeneity of reforms and for the stance of fiscal policy is crucial. Our results suggest that product market reforms mostly have a slightly negative effect on growth. Labour market reforms hurt growth under restrictive and neutral fiscal policy but are conducive to economic growth if introduced during periods of expansionary[3] fiscal policy.
The remainder of the paper is organized as follows. Section 2 discusses the data used. Section 3 outlines our methodology, while section 4 presents our main findings. Section 5 offers a robustness analysis and section 6 concludes.

Structural reform
Most previous research on the impact of structural reform uses OECD regulation indicators (see, for instance, Bouis et al., 2012; Faccini, 2014; and Bordon et al., 2018). These range from 0 to 6 and capture the restrictiveness of regulation in labour and product markets. A reform is then identified as a fall in the index. In our research, we instead use the narrative reform database of . Drawing on , we may summarize the methodology used to construct this database as follows. In a first step, all legislative and regulatory actions related to product and labour market regulation mentioned in the OECD Economic Survey are identified for the 26 countries over the entire sample. Next, out of the more than 1000 actions, major reforms are identified based on three criteria: (1) the OECD Economic Survey uses strong normative language to describe the action, suggestive of an important measure; (2) the policy action is mentioned repeatedly across different editions of the OECD Economic Survey for the country considered; (3) the OECD indicator of product or labour market regulation displays a very large change. The reform variables are two dummies (for labour and product market reforms) equal to one if there is a major reform identified by . There are 248 product market reforms and 79 labour market reforms in our sample, which runs from 1985 to 2013 (see Table 1).

Fiscal policy stance
Following the approach suggested by Wiese et al. (2018), we apply the Bai and Perron (B&P) (1998, 2003) approach to the cyclically adjusted primary budget balance (CAPB) as a share of GDP to identify fiscal adjustments and fiscal expansions. This approach is more objective than those used so far in the literature on fiscal adjustments, because it takes the variability of the budget balance within countries into account. Adjustments are generally defined in the literature as a discretionary (i.e. cyclically adjusted) and significant improvement in the general government's budget balance. Significant in this case does not refer to statistical significance, but to whether the change in the cyclically adjusted (primary) budget balance exceeds some (subjectively selected) threshold. Such threshold filters are thus based on a 'one-size-fits-all' principle and do not take into account that the budgetary processes in some countries may lead to a much more volatile budget balance than in others. A filter that does not take volatility into account is prone to identify fiscal adjustments and expansions that result from the budgetary institutions in place (or other factors driving fiscal policy volatility), rather than from deliberate attempts by politicians to improve the budget balance or to make fiscal policy expansionary. Our approach to identifying the beginning of a period of tight or expansionary fiscal policy is based on the identification of statistically significant changes in the data generating process of the CAPB.[4] Bai and Perron (1998, 2003) developed a general method for this purpose. Consider a model with $m$ possible structural breaks:

$$ y_t = \delta_j + u_t, \qquad t = T_{j-1}+1, \ldots, T_j, \qquad j = 1, \ldots, m+1 \qquad (1) $$

where $y_t$ is the dependent variable, in our case the cyclically adjusted primary budget balance of each individual country separately; $\delta_j$ ($j = 1, \ldots, m+1$) are the estimated constants, i.e. the means of the $m+1$ segments of the time series $y_t$; $T_1, \ldots, T_m$ are the break dates (with $T_0 = 0$ and $T_{m+1} = T$); and $u_t$ is the error term.
The B&P filter finds the segmentation of the series that yields the lowest sum of squared residuals (SSR), up to a maximum number of breaks. The maximum number of breaks is restricted by the trimming parameter h, which specifies the minimum number of observations that must lie between consecutive breaks. We set h = 0.15, which means that at least 15% of a country's time series has to pass between breaks.[5] The algorithm is straightforward. First, it searches over all admissible sets of breaks, up to the maximum allowed by the trimming parameter h, and determines for each number of breaks the set that minimizes the SSR. Then, F-tests determine whether the improvement in fit from allowing additional breaks is larger than what could be expected by chance, based on the asymptotic distribution derived in Bai and Perron (1998). We use the test procedure recommended by Bai and Perron (2003) to select the optimal number and timing of breaks. That is, depending on the properties of the individual time series, we choose the appropriate filter specification and test. Generally, though, the error distribution is allowed to differ across segments.[6] Autocorrelation and potential heteroskedasticity are accounted for non-parametrically by running the filter with a heteroskedasticity and autocorrelation consistent (HAC) estimate of the variance-covariance matrix.
The B&P method identifies the break date (the initiation of a fiscal adjustment or fiscal expansion) as the first year after an upward or downward structural break in the CAPB. Therefore, we take a one-year lag to identify the start of the fiscal adjustment or expansion. This method identifies the beginning, but not the end, of periods of tight or expansionary fiscal policy. In line with Wiese et al. (2018), we define the periods such that tight fiscal policy after an upward structural break continues as long as the change in the CAPB is positive, and expansionary fiscal policy after a downward structural break continues as long as the change in the CAPB is negative. Observations that are not classified as either expansionary or tight fiscal policy are labelled as observations with neutral fiscal policy.
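To make the break-detection and stance-labelling steps concrete, the sketch below (a minimal illustration, not the authors' code) finds the SSR-minimising partition of a CAPB series into a fixed number m of mean regimes, subject to the 15% trimming, by dynamic programming, and then applies the continuation rule for tight and loose episodes. It assumes m is known: the sequential F-tests that Bai and Perron (1998, 2003) use to choose m, the HAC variance estimator and the one-year timing convention described above are not implemented. The function names are hypothetical.

```python
import math

def segment_ssr(cum, cum_sq, a, b):
    """SSR of y[a:b] around its own mean (one constant delta_j per regime)."""
    s = cum[b] - cum[a]
    return (cum_sq[b] - cum_sq[a]) - s * s / (b - a)

def optimal_mean_breaks(y, m, trim=0.15):
    """Start indices of regimes 2..m+1 in the SSR-minimising partition of y.

    Every regime must contain at least ceil(trim * T) observations.
    The number of breaks m is taken as given (no sup-F testing here).
    """
    T = len(y)
    min_len = max(1, math.ceil(trim * T))
    if (m + 1) * min_len > T:
        raise ValueError("too many breaks for this trimming parameter")
    cum, cum_sq = [0.0], [0.0]
    for v in y:
        cum.append(cum[-1] + v)
        cum_sq.append(cum_sq[-1] + v * v)
    inf = float("inf")
    # cost[k][t]: lowest SSR for y[:t] split into k + 1 regimes
    cost = [[inf] * (T + 1) for _ in range(m + 1)]
    back = [[-1] * (T + 1) for _ in range(m + 1)]
    for t in range(min_len, T + 1):
        cost[0][t] = segment_ssr(cum, cum_sq, 0, t)
    for k in range(1, m + 1):
        for t in range((k + 1) * min_len, T + 1):
            for s in range(k * min_len, t - min_len + 1):
                c = cost[k - 1][s] + segment_ssr(cum, cum_sq, s, t)
                if c < cost[k][t]:
                    cost[k][t], back[k][t] = c, s
    starts, t = [], T
    for k in range(m, 0, -1):  # walk back from the last regime to the second
        t = back[k][t]
        starts.append(t)
    return sorted(starts)

def label_stance(y, starts):
    """'tight' after an upward mean shift while the CAPB keeps rising,
    'loose' after a downward shift while it keeps falling, else 'neutral'."""
    bounds = [0] + list(starts) + [len(y)]
    means = [sum(y[a:b]) / (b - a) for a, b in zip(bounds[:-1], bounds[1:])]
    labels = ["neutral"] * len(y)
    for j, s in enumerate(starts):
        upward = means[j + 1] > means[j]
        t = s
        while t < len(y):
            d = y[t] - y[t - 1]
            # the first year of the new regime is always labelled; later years
            # only while the CAPB keeps moving in the direction of the shift
            if t > s and not (d > 0 if upward else d < 0):
                break
            labels[t] = "tight" if upward else "loose"
            t += 1
    return labels
```

For a toy series such as `[0.0]*5 + [2.0, 2.5, 3.0, 3.0, 3.0] + [1.0, 0.5, 0.0, 0.0, 0.0]` with m = 2, the regimes start at indices 5 and 10; the first three years of the upward regime are labelled tight and the first three years of the downward regime loose. Dynamic programming is used because it returns the global SSR minimum for a given m without enumerating every break combination.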
Data for the cyclically adjusted primary budget balance (CAPB) come from the OECD and begin in 1985 for some countries, but later for several others.[7] Due to the limited availability of CAPB data, we lose observations when we partition the data by fiscal policy stance (see Figure 1).
As Figure 1 shows, product market reforms seem unrelated to fiscal policy. Labour market reforms, by contrast, happen more frequently during periods of tight fiscal policy than during periods of expansionary fiscal policy, but this partly reflects the fact that we observe fewer fiscal expansions than fiscal adjustments.

Figure 1. Fiscal policy stance and market reforms
Note: The figure displays the distribution of major product and/or labour market reforms during loose (expansionary), neutral and tight (restrictive) fiscal policy. In total there are 140 observations with tight fiscal policy, 112 with loose fiscal policy and 545 with neutral fiscal policy. There are 24 observations where a product and labour market reform is introduced simultaneously.

Dependent and other variables
Most other data come from the Penn World Tables (PWT) version 9.0 (Feenstra et al., 2015). The dependent variable is the cumulative growth of real GDP per capita, projected stepwise forward in time, i.e. from year 0 to 1, 0 to 2, and so on up to 0 to 5 years. The cumulative growth rates are calculated from annual real growth rates, i.e. log differences of real GDP per capita (real GDP in 2011 PPP US$ divided by population size, both from PWT 9.0).
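A minimal sketch of how this dependent variable can be built for one country, assuming annual series of real GDP and population; the function name is hypothetical. Summing the annual log growth rates from t+1 to t+h gives exactly $\log y_{t+h} - \log y_t$, which is why cumulating annual rates and taking the long log difference are equivalent.

```python
import math

def cumulative_growth(gdp, pop, max_h=5):
    """Return {h: [log y_{t+h} - log y_t for each t]} with y = GDP per capita.

    Entries are None where t + h falls outside the sample.
    """
    log_y = [math.log(g / p) for g, p in zip(gdp, pop)]
    # annual growth rates: growth[t] = log y_t - log y_{t-1} (undefined for t = 0)
    growth = [None] + [log_y[t] - log_y[t - 1] for t in range(1, len(log_y))]
    T = len(log_y)
    out = {}
    for h in range(1, max_h + 1):
        # cumulating annual rates from t+1 to t+h equals log y_{t+h} - log y_t
        out[h] = [sum(growth[t + 1:t + h + 1]) if t + h < T else None
                  for t in range(T)]
    return out
```

With GDP growing by 2% a year and constant population, `cumulative_growth(...)[3][0]` is approximately 3 log(1.02), while horizons that run past the end of the sample are returned as `None`.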
The political variables included in the AIPW estimator are our own updates of variables used in Wiese et al. (2018). They are explained in more detail in section 3. Table A2 in Appendix 2 provides a description of all variables used and their sources.

Estimation methods
The basic regression model that we estimate is:

$$ \log y_{i,t+h} - \log y_{i,t} = \alpha_i + \sum_{r} \beta_r^h R_{r,i,t} + \sum_{j=0}^{3} \gamma_j^h \, \Delta \log y_{i,t-j} + \theta^{h\prime} X_{i,t} + \varepsilon_{i,t+h} \qquad (2) $$

where $h = 1, \ldots, 5$ is the forecast horizon and $\log y_{i,t+h} - \log y_{i,t}$ denotes the cumulative growth of real GDP per capita over the forecast horizon. $\alpha_i$ denotes country fixed effects and $R_{r,i,t}$ are the reform dummies (where $r$ indexes product and labour market reforms). Note that both reform indicators are always included simultaneously in the regressions. In all OLS LP regressions (and in the AIPW LP outcome regression, see below), we include an AR(4) term in the growth rate between $t-1$ and $t$, $\Delta \log y_{i,t} = \log y_{i,t} - \log y_{i,t-1}$.[8] $X_{i,t}$ is a vector of additional control variables. It contains the output gap, calculated with the Hodrick-Prescott filter using a high smoothing parameter ($\lambda = 100$), as recommended in Jordà and Taylor (2015). Our results are robust to using OECD data on the output gap, which is calculated with a production function approach; the correlation coefficient between the OECD output gap and our own is 0.845. $X_{i,t}$ also includes the change in physical capital (gross investment relative to GDP) and the year-on-year percentage change in the human capital index from PWT 9.0.
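The output gap control can be illustrated with a minimal Hodrick-Prescott filter written from its definition: the trend $\tau$ solves $(I + \lambda D^\top D)\tau = y$, where $D$ is the second-difference matrix. The sketch assumes the gap is measured in percent as 100 times the deviation of log GDP from its trend (an assumption; the units are not stated here) and uses a dense solver, which is adequate for annual series of a few decades. The function names are hypothetical.

```python
import math

def hp_trend(y, lam=100.0):
    """Hodrick-Prescott trend: solves (I + lam * D'D) tau = y, D = second differences."""
    T = len(y)
    A = [[1.0 if r == c else 0.0 for c in range(T)] for r in range(T)]
    d = (1.0, -2.0, 1.0)  # one row of D acts on (tau_i, tau_{i+1}, tau_{i+2})
    for i in range(T - 2):
        for a in range(3):
            for b in range(3):
                A[i + a][i + b] += lam * d[a] * d[b]
    # Gaussian elimination with partial pivoting on the augmented matrix [A | y]
    M = [A[r] + [float(y[r])] for r in range(T)]
    for col in range(T):
        piv = max(range(col, T), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, T):
            f = M[r][col] / M[col][col]
            for c in range(col, T + 1):
                M[r][c] -= f * M[col][c]
    # back substitution
    tau = [0.0] * T
    for r in range(T - 1, -1, -1):
        acc = sum(M[r][c] * tau[c] for c in range(r + 1, T))
        tau[r] = (M[r][T] - acc) / M[r][r]
    return tau

def output_gap(gdp, lam=100.0):
    """Output gap in percent: 100 * (log GDP - HP trend of log GDP)."""
    log_y = [math.log(g) for g in gdp]
    trend = hp_trend(log_y, lam)
    return [100.0 * (v - t) for v, t in zip(log_y, trend)]
```

Two properties are useful as checks: a linear series is its own HP trend (its second differences are zero), and the gap sums to zero over the sample. A larger $\lambda$ yields a smoother trend, which is why a high value such as 100 is used for annual data here.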
Treatment leads and lags are also included, as suggested by Teulings and Zubanov (2014). The leads are included to avoid the bias that results from overlapping forecast horizons.[9] This bias is deterministic in the sense that it is a consequence of calculating the projections themselves, under the hypothesis that reforms do affect economic growth. We also include treatment lags in our models. Unlike the number of leads, which is dictated by the forecast horizon, how long the effect of reforms persists in the data is an empirical question. Again, we use Akaike's information criterion to determine the lag length; it consistently indicates 5 lags of the treatment variable.
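The role of the Teulings and Zubanov (2014) leads can be seen by building the regressors of equation (2) for one country and one horizon h. The sketch below is a hypothetical helper, not the authors' code: it stacks the reform dummy at t, h leads, five lags and the AR(4) growth terms. The control vector $X_{i,t}$, the second reform type and the country fixed effects (e.g. country dummies when countries are pooled) are left out for brevity.

```python
def lp_design(reform, dlogy, h, n_lags=5, n_ar=4):
    """Rows (t, outcome, regressors) for horizon h of equation (2), one country.

    reform[t] is the 0/1 reform dummy; dlogy[t] = log y_t - log y_{t-1}
    (dlogy[0] may be None). Regressors: R_t, leads R_{t+1..t+h},
    lags R_{t-1..t-n_lags}, and growth terms dlogy[t], ..., dlogy[t-n_ar+1].
    """
    T = len(reform)
    # earliest t with all treatment lags and AR terms available (dlogy index >= 1)
    first = max(n_lags, n_ar)
    rows = []
    for t in range(first, T - h):
        x = [reform[t]]
        x += [reform[t + k] for k in range(1, h + 1)]       # Teulings-Zubanov leads
        x += [reform[t - k] for k in range(1, n_lags + 1)]  # treatment lags
        x += [dlogy[t - j] for j in range(n_ar)]            # AR(4) growth terms
        outcome = sum(dlogy[t + 1:t + h + 1])               # log y_{t+h} - log y_t
        rows.append((t, outcome, x))
    return rows
```

With a single reform at t = 7 and h = 2, the row for t = 5 has a zero reform dummy but a one in its second lead: the reform falls inside that observation's forecast window, and the lead records this instead of letting the observation act as a clean control.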
The major drawback of equation (2) is that it ignores that structural reforms may be introduced in countries and years where the expected benefits of reform are higher than in countries and years where no reforms are introduced. Failing to account for this can lead to selection bias. Therefore, we proceed with a quasi-experimental method, namely the Augmented Inverse Probability Weighted (AIPW) estimator proposed by Jordà and Taylor (2015). In the first step, we estimate logit models for the likelihood that a structural reform occurs. This latent variable framework captures the idea that reforms are introduced in periods when the expected benefits of reform are large. In the second step, we estimate local projections specified as in equation (2), but weight observations inversely according to the predicted probabilities from the logit model. Specifically, observations in which a reform took place are assigned a weight equal to the inverse of the probability score, whereas observations in which no reform took place receive a weight equal to the inverse of one minus the probability score. Treated observations with a low probability score therefore receive a higher weight in the regression, as do control observations with a high probability score. This places more weight on observations that are comparable and hence reduces selection bias. The augmented weighting adds an adjustment factor to the treatment effect when the estimated probability scores are close to zero or one. The method is said to be doubly robust, as it only requires one of two conditions to hold: either the conditional mean model or the probability score model is correctly specified. Weighting can be interpreted as removing the correlation between the covariates and the reform indicator, while regression removes the direct effect of the covariates (see Imbens and Wooldridge, 2009 for more details). We report the Average Treatment Effect (ATE), calculated as the average difference between treated and non-treated (control) observations based on the weighted OLS regression lines for both groups.
[8] Our choice of information criterion is based on the fact that the Akaike criterion is the least likely to choose an autoregressive process of too low an order; see Lütkepohl and Krätzig (2004).
[9] The bias increases with the forecast horizon; see Teulings and Zubanov (2014). The leads of the treatment dummies ensure that it is registered in the data if the outcome for a specific observation is affected by a treatment ahead in time. This is most often the case for control observations, i.e. country-year pairs in which no reform took place. However, in the IMF narrative reform data, reforms at times occur repeatedly within our forecast horizon of 5 years. In that case, the Teulings and Zubanov (2014) approach also registers that the outcome of a treated observation may be affected by later treatments, which would otherwise have meant an upward bias in the estimated effect of reforms. This is especially the case for product market reforms, which happen frequently in the data (see Table 1).
Table 2 shows the LP estimates of our basic model (equation 2). That is, here we neither control for endogeneity nor condition on the stance of fiscal policy. Figure 2 shows the corresponding impulse response functions for product and labour market reforms, based on the estimates shown in Table 2.
Note to Figure 2: The impulse responses are based on the estimates shown in Table 2. The dark grey shaded areas display the 90% error bands, the light grey shaded areas display the 95% error bands. Year t=1 is the first year after a reform took place at t=0.
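One common form of the AIPW estimator (see Imbens and Wooldridge, 2009) combines the two group-specific outcome regressions with inverse-probability-weighted residuals:
ATE = mean over observations of [m1(x) - m0(x) + D(y - m1(x))/p - (1 - D)(y - m0(x))/(1 - p)].
The sketch below assumes a single covariate and given propensity scores p; in the LP setting it would be applied separately at each horizon h, with cumulative growth as the outcome. It illustrates the technique under these assumptions and may differ in detail from the implementation used in the paper; the function names are hypothetical.

```python
def ols_line(xs, ys):
    """Intercept and slope of a simple OLS regression of ys on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope

def aipw_ate(y, d, x, p):
    """Augmented IPW estimate of the average treatment effect.

    y: outcomes, d: 0/1 treatment, x: one covariate, p: propensity scores in (0, 1).
    """
    treated = [(xi, yi) for xi, yi, di in zip(x, y, d) if di == 1]
    control = [(xi, yi) for xi, yi, di in zip(x, y, d) if di == 0]
    a1, b1 = ols_line(*zip(*treated))  # outcome model for treated units
    a0, b0 = ols_line(*zip(*control))  # outcome model for control units
    total = 0.0
    for yi, di, xi, pi in zip(y, d, x, p):
        m1, m0 = a1 + b1 * xi, a0 + b0 * xi
        # regression difference plus inverse-probability-weighted residual corrections
        total += (m1 - m0) + di * (yi - m1) / pi - (1 - di) * (yi - m0) / (1 - pi)
    return total / len(y)
```

If the outcome models fit exactly (for example y = 1 + 2x + 3D), the residual terms vanish and the estimate equals the true effect of 3 whatever the propensity scores; if instead the outcome models are misspecified but the scores are correct, the weighted residual terms correct the bias. This is the double robustness described above.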

Local Projection results
It is quite remarkable that in this very simple setup, product and labour market reforms do not affect output growth. The estimated coefficient on the output gap is negative, as expected: when output is above potential, future growth is predicted to be significantly lower, and vice versa. The output gap thus controls for mean reversion.
Notes to Table 2: The table shows the local projection estimates of the effects of labour and product market reforms on cumulative economic growth, not conditioned on the fiscal policy stance. The models are based on equation (2) and include as many treatment leads as the forecast horizon, five lags of the treatment variable, and country fixed effects. The number of treatment lags was determined by Akaike's information criterion. Standard errors clustered at the country level are shown in parentheses: *** p<0.01, ** p<0.05, * p<0.1.
The sum of the coefficients on the lagged dependent variable in column (1) is larger than one, which may imply a non-stationary growth process. However, panel stationarity tests reject non-stationarity (results available on request). In columns (2)-(5) of Table 2, the sum is much larger than one, which is unsurprising since we estimate cumulative growth rates. The insignificant physical and human capital elasticities are perhaps also not surprising in a sample of OECD countries; see Mankiw et al. (1992) for a similar finding.
Next, we include the stance of fiscal policy. Table 3 presents the estimation results and Figure 3 shows the corresponding impulse response functions. Surprisingly, the main takeaway from Table 3 is that product and labour market reforms only have an effect during periods of tight fiscal policy. Specifically, for product market reforms the cumulative effect on growth after 5 years is almost 2% of additional GDP. For labour market reforms, the effect is positive and significant: after 5 years, 3.7% of additional GDP is estimated.
Note to Figure 3: The figure displays the impulse responses based on the estimates in Table 3 with the solid black lines. The impulse responses are partitioned by fiscal policy stance: the panels on the left display the impulse responses under tight fiscal policy, the panels in the middle under neutral fiscal policy and the panels on the right under loose fiscal policy. The dark grey shaded areas display the 90% error bands, the light grey shaded areas display the 95% error bands. Year t=1 is the first year after a reform took place at t=0.
Notes to Table 3: The table shows the local projection estimates of the effects of labour and product market reforms on cumulative economic growth, conditional on the fiscal policy stance. The models are based on equation (2), but also include as many treatment leads as the forecast horizon, five lags of the treatment variable, and country fixed effects. The number of treatment lags was determined by Akaike's information criterion. Standard errors clustered at the country level are shown in parentheses: *** p<0.01, ** p<0.05, * p<0.1.

Quasi-experimental results
In an ideal randomized controlled trial (RCT) setting, where treatments are assigned randomly, we would expect the probability density function of each control variable included in equation (2) to be the same for the sub-populations of treated and control units. The overlap of the densities should be close to perfect. For example, the distribution of the deviation between actual and potential GDP (the output gap) should be similar for the subpopulation in which a major product market reform takes place and the subpopulation of all other (control) observations. A simple way to check whether this condition holds is to test for equality of means between the subsamples. This is done in Table 4. As is evident, the balance of the output gap between treated and control observations in particular is a cause for concern. This indicates that we cannot assume that treatments are assigned randomly, as is done in the simple LP analysis above. Specifically, the balance test in Table 4 indicates that, compared with control observations, the output gap is on average negative for treated observations, implying that the economy is running below its potential when reforms take place. This suggests that labour and product market reforms cannot be viewed as exogenous events.
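A balance test of this kind can be sketched as a difference in means with a two-sided test. The sketch below assumes Welch (unequal-variance) standard errors and uses a normal approximation for the p-value; the exact variant behind Table 4 is not stated, so this is illustrative only, and the function name is hypothetical.

```python
import math
from statistics import mean, variance

def balance_test(treated, control):
    """Difference in means (treated minus control) with a two-sided test.

    Uses Welch (unequal-variance) standard errors and a normal approximation
    for the p-value, which is reasonable in samples of several hundred.
    """
    diff = mean(treated) - mean(control)
    se = math.sqrt(variance(treated) / len(treated) + variance(control) / len(control))
    t_stat = diff / se
    p_value = math.erfc(abs(t_stat) / math.sqrt(2.0))  # equals 2 * (1 - Phi(|t|))
    return diff, se, t_stat, p_value
```

Applied to the output gap, a significantly negative `diff` would indicate that reform years are concentrated in periods in which the economy runs below potential, which is the pattern described above.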
Note that this balance condition also underlies the implicit assumption, made when estimating the simple LPs presented above, that the coefficients on the controls in equation (2) are the same for the treatment and the control groups. The AIPW estimates below relax this assumption, as a separate regression is specified for the treatment group and for the control group (Imbens and Wooldridge, 2009; Jordà and Taylor, 2015). The difference between the predicted outcomes $\log y_{i,t+h} - \log y_{i,t}$ of the two regressions then serves to calculate the (weighted) ATEs.
Notes to Table 4: Observations: 691. Standard errors of a two-sided t-test are reported in parentheses: *** p<0.01, ** p<0.05, * p<0.1.
When policy interventions like labour and product market reforms are driven by endogenous responses to control variables (as shown in Table 4), the observed treatment and control units can be viewed as being oversampled from the part of the distribution in which the propensity score of treatment reaches high values. The simple local projections presented above are based on the sampled distribution and will therefore be biased: too much weight is given to treated observations with a high probability of treatment and too little weight to control observations with a high probability of treatment. Inverse weighting using propensity scores shifts the probability mass away from the oversampled region of the distribution towards the under-sampled region. This shift rebalances the sample such that the re-weighted sample can be viewed as reconstructing the true distribution of outcomes for treated and control observations. In other words, we can view the rebalancing as if we had observed a random sample for each group, unaffected by endogenous responses to control variables.
Thus, the regressions for both the control group and the treatment group are less susceptible to bias, and their difference can be used to calculate an unbiased estimate of the ATE of reforms on economic growth (see Imbens and Wooldridge, 2009 and Jordà and Taylor, 2015 for more details).[10]
To estimate the propensity scores, ideally any predictor of treatment should be included, regardless of whether that variable is included in the model specified in equation (2). Therefore, we follow Jordà and Taylor (2015) and estimate a saturated propensity score model. As predictors of the likelihood of structural reform ("the treatment in t+1"), we use the following variables. First, we include the output gap and the lagged growth rates, as these variables capture the idea that reforms are more likely to occur after times of economic crisis (Drazen and Grilli, 1993). In line with this argument, we also include the unemployment rate and the inflation rate, as these variables may also signify difficult economic times. We also add the fiscal policy stance to take into account that reforms may be more likely under some fiscal policy regimes than under others. We add political variables to account for the fact that reforms are more viable under certain political circumstances. Specifically, we add: (1) A variable counting the number of years a government has held office, to capture the idea that reforms become less likely the longer a government holds office. (2) An election variable indicating that an executive or legislative election took place, to capture the idea that reforms are typically more likely after a new government takes office (Haggard and Webb, 1993). (3) A variable measuring government ideology, to capture the idea that the political colour of a government matters for the policies it implements (Hibbs, 1977).
(4) A variable measuring political fractionalisation, to capture the idea that more politically fragmented governments may find it harder to implement economic reforms (Alesina and Drazen, 1991). We also control for the possibility that labour and product market reforms may be related (Fiori et al., 2012) by including labour market reforms as a predictor of product market reforms in t+1, and product market reforms as a predictor of labour market reforms in t+1. In the logit model for labour market reforms, we also add institutional variables capturing the strictness of hiring and firing conditions for workers on temporary or regular contracts. This takes a level effect into account, as countries with very flexible hiring and firing conditions are typically less likely to reform the labour market (Turrini et al., 2015). We also include variables for duration dependence (Carter and Signorino, 2010). Specifically, we add a variable that counts the number of years since the last reform, plus its squared and cubed terms. F-tests show that the duration dependence variables are jointly significant and therefore belong in the model. Table A1 in Appendix 2 provides a description of the variables used.
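The duration-dependence terms described above can be illustrated with a short Python sketch. It assumes an annual 0/1 reform indicator for a single country in calendar order, measures the clock at the end of year t (so that the terms can serve as predictors of a reform in t+1), and resets the clock to 0 in a reform year. As a hypothetical convention, the clock simply starts at the beginning of the sample, with no special treatment of left-censoring. The function name `duration_terms` is illustrative and not taken from the authors' code.

```python
def duration_terms(reform):
    """Return (k, k**2, k**3) for each year, where k counts years since the last reform.

    `reform` is a sequence of 0/1 reform indicators for one country in calendar order.
    The clock is measured at the end of year t, so a reform in year t sets k = 0;
    the resulting terms can be used as predictors of a reform in year t+1.
    Assumption: the clock starts at 0 just before the first sample year
    (left-censoring is not handled in any special way).
    """
    terms = []
    k = 0
    for r in reform:
        k = 0 if r == 1 else k + 1
        terms.append((k, k ** 2, k ** 3))
    return terms


# Example: reforms in the second and fifth year of a six-year sample
print(duration_terms([0, 1, 0, 0, 1, 0]))
# [(1, 1, 1), (0, 0, 0), (1, 1, 1), (2, 4, 8), (0, 0, 0), (1, 1, 1)]
```

Including the squared and cubed terms lets the estimated reform hazard rise or fall non-monotonically with the time since the last reform, which is the motivation given by Carter and Signorino (2010) for this cubic polynomial.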
Although relatively few variables are highly significant, the model has a high predictive ability: the area under the ROC curve is above or close to 0.8 in all the reported logit models. 11 In all specifications shown in Table 5, the area under the ROC curve is statistically significantly different from 0.5. 12 Figures 4-6 provide smooth kernel density estimates of the distribution of the propensity scores for treatment and control units to check for overlap. The plotted densities are based on models 1-3 in Table 5, respectively. In the ideal RCT setting, the distributions of propensity scores for treated and control units would be nearly identical. Although the logit models used to estimate the propensity scores all have high predictive ability, Figures 4-6 make clear that there is considerable overlap between the distributions for treated and control units. This indicates that our logit models are satisfactory and can be used to identify the ATEs properly using our quasi-experimental estimation strategy. Figures 5 and 6 also make clear that some treated units have a propensity score that is very close to 0. In practice, this means that these observations receive very high weights when weighting inversely by the propensity score. Although the AIPW estimator adds an adjustment factor to the treatment effect when the estimated propensity scores are close to 0 for treated observations (and close to 1 for control observations), this is not enough to stabilise the estimator in our setting. Therefore, we truncate the propensity scores for labour market reforms and joint reforms (see the notes to Figures 5 and 6), following Imbens (2004) and Cole and Hernán (2008).
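To make the estimator more concrete, the following Python sketch implements the textbook AIPW formula (as described, for example, in Imbens and Wooldridge, 2009) for a single horizon h, together with the kind of symmetric propensity-score truncation discussed above. It is a simplified illustration under stated assumptions, not the authors' implementation: the outcome `y` is assumed to be cumulative growth between t and t+h, the propensity scores are taken as given (for example from a logit model), and the outcome regressions are plain OLS without fixed effects, treatment leads and lags, or clustered standard errors. The function name `aipw_ate` and its arguments are hypothetical.

```python
import numpy as np


def aipw_ate(y, d, X, pscore, clip=None):
    """Augmented inverse probability weighted (doubly robust) estimate of an ATE.

    y      : outcome, e.g. cumulative growth between t and t+h (one horizon)
    d      : 0/1 treatment indicator
    X      : control variables, shape (n,) or (n, k)
    pscore : estimated propensity scores P(d = 1 | X)
    clip   : if given, scores are truncated to the interval [clip, 1 - clip]
    """
    y = np.asarray(y, dtype=float)
    d = np.asarray(d, dtype=int)
    p = np.asarray(pscore, dtype=float)
    if clip is not None:
        p = np.clip(p, clip, 1.0 - clip)  # symmetric truncation, e.g. 0.1 and 0.9

    Z = np.column_stack([np.ones(len(y)), X])  # add an intercept

    # Separate outcome regressions for treated and control units, so the
    # coefficients on the controls may differ between the two groups.
    b1, *_ = np.linalg.lstsq(Z[d == 1], y[d == 1], rcond=None)
    b0, *_ = np.linalg.lstsq(Z[d == 0], y[d == 0], rcond=None)
    m1, m0 = Z @ b1, Z @ b0  # predicted outcomes for every unit under each regime

    # Regression prediction plus an inverse-probability-weighted residual
    # correction; combining the two is what makes the estimator doubly robust.
    mu1 = m1 + d * (y - m1) / p
    mu0 = m0 + (1 - d) * (y - m0) / (1.0 - p)
    return float(np.mean(mu1 - mu0))


# Simulated example: treatment is more likely when x is high, true ATE = 0.5
rng = np.random.default_rng(0)
n = 5000
x = rng.normal(size=n)
p_true = 1.0 / (1.0 + np.exp(-x))
d = rng.binomial(1, p_true)
y = 1.0 + 2.0 * x + 0.5 * d + rng.normal(size=n)
print(aipw_ate(y, d, x, p_true, clip=0.1))
```

The printed estimate should be close to 0.5. Truncation bounds the inverse weights (at most 1/0.1 = 10 with `clip=0.1`), trading a small potential bias for a large reduction in variance when some treated units have propensity scores near 0.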

Figure 4. Overlap of propensity scores for product market reforms
11 ROC stands for Receiver Operating Characteristic. It is also referred to as the Correct Classification Frontier. If the model had no predictive ability, the area under the ROC curve would be 0.5; perfect classification ability would correspond to an area under the ROC curve equal to 1. The area under the ROC curve is approximately normally distributed in large samples.

12 In line with Jordà and Taylor (2015), we include country dummies in the estimations. If we estimate the models in Table 5 without country fixed effects, the predictive ability declines, although the area under the ROC curve is still statistically different from 0.5. Since the model with country fixed effects is superior in predicting treatment, we proceed with the fixed-effects specification for the propensity scores in the AIPW estimates, despite the incidental parameter problem in the logit model.
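The area under the ROC curve described in footnote 11 equals the probability that a randomly chosen treated observation receives a higher predicted score than a randomly chosen control observation, with ties counted as one half. The Python sketch below computes it through this pairwise (Mann-Whitney) comparison. The function name `roc_auc` is illustrative, and the sketch does not compute the standard error needed for the test against 0.5.

```python
import numpy as np


def roc_auc(scores, labels):
    """Area under the ROC curve via the Mann-Whitney pairwise comparison.

    scores : predicted propensity scores
    labels : 1 for treated observations, 0 for control observations
    """
    s = np.asarray(scores, dtype=float)
    y = np.asarray(labels).astype(bool)
    pos, neg = s[y], s[~y]
    diff = pos[:, None] - neg[None, :]  # every treated-control pair
    return float(np.mean(diff > 0) + 0.5 * np.mean(diff == 0))


print(roc_auc([0.9, 0.4, 0.6, 0.2], [1, 1, 0, 0]))  # 3 of 4 pairs ranked correctly: 0.75
```

A model with no predictive ability ranks treated and control units at random (AUC near 0.5), while a perfect classifier ranks every treated unit above every control unit (AUC equal to 1).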

Figure 5. Overlap of propensity scores for labour market reforms
Note: In the AIPW estimates of labour market reforms below, we set propensity scores lower than 0.1 to 0.1, because many observations have a very low p-score. Otherwise the estimator becomes unstable (dividing 1 by a very small number gives a very large weight to treated observations with a low predicted p-score). To keep symmetry, we also truncate high propensity scores at 0.9, but this has no consequences, as can be seen in Figure 5.

Figure 6. Overlap of propensity scores for the interaction of labour and product market reforms
Note: In the AIPW estimates of the interaction of labour and product market reforms below, we set propensity scores lower than 0.05 to 0.05, because some observations have a very low p-score. Otherwise the estimator becomes unstable (dividing 1 by a very small number gives a very large weight to treated observations with a low predicted p-score). To keep symmetry, we also truncate high propensity scores at 0.95, but this has no consequences, as can be seen in Figure 6.

Note: The table reports point estimates of a logit specification to predict the probability of treatment in t+1. In model 3, treatment is defined as observations in which a product market reform and a labour market reform occurred simultaneously; there are 24 such treatments. Because fixed effects are included, only the 13 countries in which simultaneous reforms occurred at least once can be used to estimate this model. Standard errors clustered at the country level are shown in parentheses: *** p<0.01, ** p<0.05, * p<0.1.
Tables 6 and 7 present the results using the quasi-experimental doubly robust Augmented Inverse Probability Weighted (AIPW) estimator proposed by Jordà and Taylor (2015). Table 6 shows the estimation results when we do not condition on fiscal policy, while Figure 7 shows the corresponding impulse response functions. Without conditioning on the fiscal policy stance at the time of the reform, the effects of reform on GDP growth are small. Only labour market reforms significantly affect economic growth: after 3 years, cumulative economic growth has declined by 0.3%.
However, if we split the sample and estimate ATEs for each type of reform under different fiscal policy stances, a more fine-grained pattern emerges (Table 7 and Figure 8). Product market reforms mostly have slightly negative growth effects, except for a very small positive effect during periods of neutral fiscal policy. Labour market reforms hurt growth if introduced during periods of tight or neutral fiscal policy, but they are conducive to economic growth if introduced during periods of loose fiscal policy.

Notes: The table shows the ATE responses of the AIPW local projection estimates of labour and product market reforms on cumulative economic growth, unconditional on the fiscal policy stance. Compared to Table 2, the number of observations drops due to the unavailability of some of the reform predictor variables. The models are based on equation 2, but also include treatment leads equal to the forecast horizon, five lags of the treatment variable, and country fixed effects. The number of treatment lags was determined by Akaike's information criterion. Standard errors clustered at the country level are shown in parentheses: *** p<0.01, ** p<0.05, * p<0.1.
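The table notes state that the number of treatment lags was chosen by Akaike's information criterion. A simplified Python sketch of this kind of selection is shown below, under hypothetical assumptions: a single time series rather than a panel, no other controls or fixed effects, OLS estimation, and AIC = n log(RSS/n) + 2k computed on a common estimation sample, so that the candidate lag orders are compared on the same observations. The function name `aic_lag_order` is illustrative.

```python
import numpy as np


def aic_lag_order(y, d, max_lag=5):
    """Choose the number of lags of a treatment indicator d by AIC.

    Each candidate model regresses y_t on a constant and d_{t-1}, ..., d_{t-L}
    by OLS; all candidates use the same sample (t = max_lag, ..., n - 1).
    """
    y = np.asarray(y, dtype=float)
    d = np.asarray(d, dtype=float)
    t = np.arange(max_lag, len(y))  # common estimation sample
    n_obs = len(t)
    best_lag, best_aic = None, np.inf
    for L in range(1, max_lag + 1):
        cols = [np.ones(n_obs)] + [d[t - j] for j in range(1, L + 1)]
        Z = np.column_stack(cols)
        beta, *_ = np.linalg.lstsq(Z, y[t], rcond=None)
        rss = float(np.sum((y[t] - Z @ beta) ** 2))
        aic = n_obs * np.log(rss / n_obs) + 2 * Z.shape[1]  # k = number of coefficients
        if aic < best_aic:
            best_lag, best_aic = L, aic
    return best_lag


# Simulated example in which the treatment affects y with one and two lags
rng = np.random.default_rng(1)
n = 400
d = rng.binomial(1, 0.2, size=n).astype(float)
y = rng.normal(0.0, 0.5, size=n)
y[1:] += 1.0 * d[:-1]
y[2:] += 0.8 * d[:-2]
print(aic_lag_order(y, d))
```

Because AIC penalises each extra coefficient only lightly, it can select more lags than the true number; in this simulation it should not select fewer than two.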

Figure 7. ATE responses of AIPW local projection estimates of the effect of product and labour market reforms on cumulative economic growth in t+1-5
Notes: The figure plots the ATE responses of product market (left panel) and labour market (right panel) reforms on cumulative economic growth from Table 6 with the solid black lines. The dark grey shaded areas display the 90% error bands, the light grey shaded areas display the 95% error bands. Year t=1 is the first year after a reform took place at t=0.

Notes: The table shows the ATE responses of the AIPW estimates of labour and product market reforms on cumulative economic growth, conditional on the fiscal policy stance. The models are based on equation 2, but also include treatment leads equal to the forecast horizon, five lags of the treatment variable, and country fixed effects. The number of treatment lags was determined by Akaike's information criterion. Standard errors clustered at the country level are shown in parentheses: *** p<0.01, ** p<0.05, * p<0.1.

Figure 8. ATE responses of AIPW local projection estimates of the effect of product and labour market reforms on cumulative economic growth in t+1-5, partitioned by fiscal policy stance
Notes: The figure plots the ATE responses of product market (upper panels) and labour market (lower panels) reforms on cumulative economic growth from Table 7 with the solid black lines. The impulse responses are partitioned by fiscal policy stance: the left panels display the ATE responses under tight fiscal policy, the middle panels under neutral fiscal policy and the right panels under loose fiscal policy. The dark grey shaded areas display the 90% error bands, the light grey shaded areas display the 95% error bands. Year t=1 is the first year after a reform took place at t=0.
As shown, our findings change drastically when we control for the non-random assignment of treatment. Without conditioning on fiscal policy, product market reforms have no statistically significant effect on economic growth, while the effect of labour market reforms is significantly negative throughout the evaluation period.
When conditioning on fiscal policy, a more fine-grained pattern emerges. Contrary to the simple LP results, which ignore treatment selection, we now find mostly negative effects of both product and labour market reforms during fiscal adjustments. However, labour market reforms have a positive effect on growth if implemented during a fiscal expansion, while their effect on growth is again negative if implemented when fiscal policy is neutral. Product market reforms have little to no effect if fiscal policy is neutral or loose.

Robustness analysis
As a first robustness test, we analyse the joint effect of labour and product market reforms. In practice, this amounts to analysing whether reforms work better or worse when implemented as broad reform packages, i.e. simultaneous reforms in both the product and the labour market.
Unfortunately, we only have 25 observations in which major reforms occur in both the product and the labour market simultaneously. Therefore, it is not possible to conduct this analysis while conditioning on fiscal policy: there are simply too few treated observations for each type of fiscal policy stance.
The results reported in Table 8 and Figure 9 suggest that the effect of joint labour and product market reforms is negative in the short term but becomes positive in the medium term. In the short run, the effect of economy-wide reforms is negative and just falls short of significance at the 10% level after one year. After 2 years the effect becomes positive, and after five years GDP has grown by 3%. The effect after 5 years is almost significant at the 5% level.

Notes: The table shows the ATE responses of the AIPW estimates of simultaneous labour and product market reforms on cumulative economic growth, unconditional on the fiscal policy stance. The models are based on equation 2, but also include treatment leads equal to the forecast horizon, five lags of the treatment variable, country fixed effects, and the indicators of labour and product market reforms. The number of treatment lags was determined by Akaike's information criterion. Standard errors clustered at the country level are shown in parentheses: *** p<0.01, ** p<0.05, * p<0.1.

Notes: The figure plots the ATE responses of simultaneous labour and product market reforms on cumulative economic growth from Table 8 with the solid black lines. The dark grey shaded areas display the 90% error bands, the light grey shaded areas display the 95% error bands. Year t=1 is the first year after a reform took place at t=0.

Additionally, we check whether our main AIPW findings are sensitive to the way we identify the fiscal policy stance. As an alternative to the method based on structural break tests, we apply threshold criteria, as is common in the literature on fiscal adjustments (cf. Alesina and Perotti, 1995; see Wiese et al., 2018 for an overview of thresholds used in the literature). Specifically, we define the start of a fiscal adjustment as a positive change in the CAPB larger than 1.5 percent of GDP; the adjustment continues as long as the change in the CAPB is positive.
A change in the CAPB below -1.5 percent of GDP indicates the start of a fiscal expansion, which continues as long as the change in the CAPB is negative. In this way, we identify 152 periods with fiscal adjustments, 128 periods with fiscal expansions and 517 periods with neutral fiscal policy; see Figure A3 in the online Appendix 1 for the distribution of reforms over these types of fiscal policy stance.

Notes: The table shows the ATE responses of the AIPW estimates of labour and product market reforms on cumulative economic growth, conditional on the fiscal policy stance determined using a threshold approach. The models are based on equation 2, but also include treatment leads equal to the forecast horizon, five lags of the treatment variable, and country fixed effects. The number of treatment lags was determined by Akaike's information criterion. Standard errors clustered at the country level are shown in parentheses: *** p<0.01, ** p<0.05, * p<0.1.
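The threshold rule described above can be written as a small state machine. The Python sketch below is an illustration under assumptions: `d_capb` is a hypothetical sequence of annual changes in the CAPB (in percent of GDP) for one country, an episode ends as soon as the change in the CAPB is zero or has the opposite sign, and the labels "tight", "neutral" and "loose" stand for fiscal adjustments, neutral fiscal policy and fiscal expansions. The function name `classify_stance` is illustrative.

```python
def classify_stance(d_capb, threshold=1.5):
    """Label each year 'tight', 'neutral' or 'loose' from annual changes in the CAPB.

    A fiscal adjustment ('tight') starts when the CAPB rises by more than
    `threshold` percent of GDP and continues while the change stays positive.
    A fiscal expansion ('loose') starts when the CAPB falls by more than
    `threshold` and continues while the change stays negative.
    All other years are 'neutral'.
    """
    stance = []
    prev = "neutral"
    for x in d_capb:
        if prev == "tight" and x > 0:
            cur = "tight"  # ongoing adjustment
        elif prev == "loose" and x < 0:
            cur = "loose"  # ongoing expansion
        elif x > threshold:
            cur = "tight"  # start of an adjustment
        elif x < -threshold:
            cur = "loose"  # start of an expansion
        else:
            cur = "neutral"
        stance.append(cur)
        prev = cur
    return stance


print(classify_stance([0.5, 2.0, 0.3, -0.1, -1.6, -0.4, 0.2]))
# ['neutral', 'tight', 'tight', 'neutral', 'loose', 'loose', 'neutral']
```

The continuation conditions are checked before the start conditions, so a small positive change following an adjustment year keeps the episode going, whereas the same change in an otherwise neutral year does not start one.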

Figure 10. ATE responses of AIPW local projection estimates of the effect of product and labour market reforms on cumulative economic growth in t+1-5, partitioned by alternative fiscal policy stance
Notes: The figure plots the ATE responses of product market (upper panels) and labour market (lower panels) reforms on cumulative economic growth from Table 9 with the solid black lines. The impulse responses are partitioned by the fiscal policy stance determined using a threshold approach: the left panels display the ATE responses under tight fiscal policy, the middle panels under neutral fiscal policy and the right panels under loose fiscal policy. The dark grey shaded areas display the 90% error bands, the light grey shaded areas display the 95% error bands. Year t=1 is the first year after a reform took place at t=0.

Figure 10 suggests that our conclusion that labour market reforms enhance economic growth under loose fiscal policy also holds under the alternative definitions of fiscal adjustments and expansions. Under tight and neutral fiscal policy, labour market reforms have a negative effect on growth; under tight fiscal policy this negative effect is significant only in the first years after the reform, while under neutral fiscal policy it becomes significant only after some years. In line with our previous findings, product market reforms generally have a negative or non-significant effect on economic growth.
Finally, a cause of concern about our estimates may be the Nickell (1981) bias. Specifically, we estimate a dynamic panel model with fixed effects. As Nickell (1981) shows, the demeaning (within) transformation creates a correlation between the lagged dependent variable and the error term, which biases the estimated coefficient of the lagged dependent variable. If the independent variables of interest are correlated with the lagged dependent variable, their coefficients may be biased as well. This is particularly a problem in a large-N, small-T context, whereas we have small N and relatively large T. The size of the bias can be gauged as follows.
If the AR(1) coefficient on $\Delta y_{i,t}$, denoted here by $\rho$, is positive (as in our case), the bias is invariably negative, so that the persistence captured by $\rho$ will be underestimated. For reasonably large values of $T$, the asymptotic bias of the estimate of $\rho$ as $N \to \infty$ is approximately $-(1+\rho)/(T-1)$. In our case $\rho = 0.67$, so the bias is about $-0.062$, less than 1/10 of the true value. This even assumes that $N$ tends to infinity, which is far from the case in our application. Furthermore, the correlation between the labour and product market reform indicators and $\Delta y_{i,t}$ is low and negative: the correlation coefficient between product (labour) market reforms and lagged GDP growth is -0.04 (-0.08). Because of this negative correlation, the Nickell bias also leads to an underestimation of the impulse responses of reforms on growth. This, in combination with the relatively small size of the bias in the AR(1) term and the large T relative to N, leads us to conclude that the Nickell bias is negligible in our case. 13
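The back-of-the-envelope calculation above can be reproduced with the Nickell approximation. The Python sketch below assumes T = 28, the number of time periods implied by the reported bias of about -0.062; T is not stated explicitly in this passage, so this value is an assumption. The function name `nickell_bias` is illustrative.

```python
def nickell_bias(rho, T):
    """Approximate asymptotic (N -> infinity) bias of the within estimator of an
    AR(1) coefficient rho in a fixed-effects panel with T periods (Nickell, 1981).
    The approximation -(1 + rho) / (T - 1) is meant for reasonably large T."""
    return -(1.0 + rho) / (T - 1)


rho, T = 0.67, 28  # T = 28 is an assumption (implied by the reported bias)
bias = nickell_bias(rho, T)
print(round(bias, 3))             # -0.062
print(round(abs(bias) / rho, 3))  # 0.092, i.e. less than one tenth of rho
```

Because the approximate bias shrinks at rate 1/(T - 1), the same AR(1) coefficient would be biased by roughly twice as much in a panel with half as many years.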

Conclusion
Our findings indicate that controlling for endogeneity of reforms and the stance of fiscal policy is crucial. Our results suggest that product market reforms mostly have slightly negative growth effects. Labour market reforms hurt growth under tight and neutral fiscal policy but are conducive to economic growth if introduced during periods of expansionary fiscal policy. Additionally, we show that product and labour market reforms are substitutes in the short run, but complements in the medium run.
One important topic for future research is to analyse the electoral effects of reforms. Recently, Alesina et al. (2020) found that liberalising reforms are costly to incumbents when implemented close to elections. They also find that the electoral effects depend on the state of the economy at the time of reform: reforms are penalised during contractions, whereas liberalising reforms undertaken in expansions are often rewarded. Our results suggest that in analysing the electoral consequences of reform, it is important to distinguish between labour and product market reforms, as they may affect economic growth differently, and to take the fiscal policy stance into account, since expansionary fiscal policy may alleviate the negative short-run growth effects of reform.

13 GMM estimation is not suited to cases with large T and small N. Instead, a method based on recursive substitution could be used, but, as noted in Teulings and Zubanov (2014), a disadvantage of such an approach is a sizeable efficiency loss.