How does the age structure of worker flows affect firm performance?

We develop a method for decomposing firm performance into the impacts of worker inflows and outflows and apply it to study whether older workers are costly to firms. Our estimation equations are derived from a variant of the decomposition methods frequently used for measuring micro-level sources of industry productivity growth. Using comprehensive linked employer–employee data, we study the productivity and wage effects, and hence the profitability effects, of the hiring and separation of younger and older workers. The evidence shows that the separations of older workers are profitable to firms, especially in the manufacturing ICT industries. To account for the correlation between the worker flows and productivity shocks, we first estimate the shocks from a production function using materials as a proxy variable. In the second step, the estimated shock is used as a control variable in our productivity, wage, and profitability equations.


Introduction
The increasing average age of the work force poses difficult challenges both to firms and to society as a whole. As a result of changes in the population age structure, an increasing share of firms' employees is in higher age groups. Pressures on the sustainability of pension systems have led governments to find ways of getting people to lengthen their working lives. These measures include reducing the incentives to use early retirement channels and increasing the mandatory retirement age, but also extending the subjective right to continue working. It has also been argued that a greater awareness of the actual (low) level of pensions in defined contribution pension plans will force many older employees to work longer. These developments will further increase the average age of the work force. The firms' incentives may not, however, align with the increasing supply. On the contrary, firms are often reluctant to hire or even retain older employees and may prefer early retirement. Whether having older workers is profitable for firms depends on how productivity and wages develop with age.
Ageing is most likely to affect performance in jobs that require physical strength. However, the development of cognitive abilities with age is not quite straightforward. Different kinds of abilities develop differently by age, and the "productivity potential" of workers also depends on the development of the demand for different abilities (e.g. Skirbekk 2008, Ilmakunnas et al. 2010). It is therefore likely that the consequences of ageing differ between industries. Employees and employers tend to view these issues quite differently (e.g. van Dalen et al. 2010). The advocates of older employees emphasize that lower turnover and greater experience may compensate for age-related losses in working capacity and advise firms to hold onto their 50+ employees (e.g. Towers Perrin 2005). On the other hand, firms often think that there is an unfavorable productivity-compensation deficit (Munnell and Sass 2008). Declining productivity with age may be a real phenomenon or a stereotype, but it is easier to observe age-related increases in labor costs, which are related to health care costs and seniority-based pay. The discrepancy between productivity and wage has been explained by the existence of deferred payments (Lazear 1979). According to this argument, the lower pay of employees in their early career is repaid by the firm in the form of a wage that exceeds productivity later in the career.
Economists have tried to measure age-related changes in productivity and labor costs. Individual-level productivity measures are available only in very special cases. Moreover, the relationship between age and performance at the individual level may differ from the relationship between average age and labor productivity at the firm level, as the work environment and interaction between workers also matter. Therefore linked employer-employee data sets have been used for analyzing the impact of work force characteristics, like average age or the shares of employees in different age groups, on plant- or firm-level productivity and wage (e.g. Hellerstein et al. 1999; Aubert and Crépon 2003; Ilmakunnas et al. 2004; Daveri and Maliranta 2007). This line of work is continuing, with emphasis on differences between firms (see e.g. Cataldi et al. 2011, on ICT and non-ICT firms, Zwick 2012, and Mahlberg et al. 2013a, on sector differences, Göbel and Zwick 2013, on firms with different personnel measures, and Vandenberghe et al. 2013, on firms with different levels of training), differences between different types of employees (see e.g. Vandenberghe 2013, on gender differences), regional differences (e.g. Mahlberg et al. 2013b), and on estimation methods like GMM (e.g. van Ours and Stoeldraijer 2011, Göbel and Zwick 2013) or proxy variable methods (e.g. Vandenberghe 2013).
A drawback of this line of studies is that they do not pay much attention to how the structure of the work force is determined. 1 The purpose of this paper is to extend the analysis by directly examining how the age structure of the work force changes through the inflow and outflow of labor input and how the flows subsequently influence firm profitability. We disaggregate the labor flows to and from firms by age into three groups, "young" (30 or less), "middle-aged" (31 to 50), and "old" (over 50). (We emphasize that the labels "young" and "old" are used just for illustrative purposes and refer to relative age.) We decompose firm-level labor productivity change into the effects of the hiring and separation rates of the age groups and into the effect of the productivity growth of those workers in different age groups who stay in the firm. A similar decomposition can be made for firm wage growth. Combining the two decompositions, we also obtain an equation for firm profitability change, which is the main interest of this paper. These decompositions follow Maliranta et al. (2009), who applied them to the flows of R&D workers.
The decomposition resembles those frequently used to decompose industry-level productivity change into the impacts of the entry and exit of firms and of productivity growth in continuing firms (e.g. Foster et al. 2001; Balk 2016). One difference from these productivity decompositions, however, is that in our case the productivities are not observed but estimated. Our decomposition leads to a simple estimation equation, where the parameters have the interpretation of relative productivity levels of the different employee groups. To be able to perform the analysis we need detailed and comprehensive linked employer-employee data. We use the Finnish Longitudinal Employer-Employee Data (FLEED) data set of Statistics Finland, which covers basically all firms in the country and all of their employees. The decompositions relate to performance change in the intervals 1995-1997, 1997-1999, 1999-2001, 2001-2003, and 2003-2005. To account for the endogeneity of the flows, we first estimate a production function using the proxy variable method suggested by Olley and Pakes (1996) and Levinsohn and Petrin (2003); see also Ackerberg et al. (2007). From this estimation we obtain a measure of the productivity shocks, which is then used as a control variable in the estimated decompositions.
Our results show that there is a positive relationship between separations of older workers and firms' profitability, which is economically and statistically significant. This holds especially in the manufacturing ICT industries. The positive connection is mainly due to a lower relative productivity level of the separated older workers. Separation of younger worker groups is, on the other hand, markedly less profitable, mainly because of their higher relative productivity. As for the hiring side, our results show that the (recently) hired older workers are positively related to profitability. This is not surprising: hiring of older workers is quite rare, so those who are hired can be expected to be exceptional. Hiring of younger workers is (initially) unprofitable, which is consistent with the view that accumulation of firm-specific knowledge is needed for better productivity (and profitability). Indeed, our results also show that the productivity growth of the staying younger workers exceeds that of the mid-aged and, even more, that of the older workers. On the other hand, these productivity changes correspond to wage growth. However, we have to be careful in interpreting these associations as causal.
The structure of the paper is as follows. In Sect. 2 we discuss related literature on turnover and firm performance. In Sect. 3 we describe the decomposition of the growth in productivity, wage, and profitability to the impacts of the labor flows. In Sect. 4 we describe the data set and present the estimation results. Section 5 concludes the paper with some suggestions for further research.

Worker turnover and firm performance
Our analysis is related to other research on labor turnover and firm performance. The traditional view in both labor economics and the management literature emphasizes the negative aspects of turnover. In the management literature (e.g. Dalton et al. 1982), separation is called dysfunctional when the high-productivity workers whom the organization would like to keep are the ones leaving. 2 This involves adjustment costs in the form of rehiring and training, but also less direct costs in the form of disruption of informal communication structures. However, labor turnover can also be functional, i.e. in the interest of the organization. This can happen e.g. when low-productivity workers quit or their separation from the firm is initiated by the employer. Replacing the leavers by new workers also brings new ideas and knowledge to the firm. 3 This reasoning has generated some work where an optimal turnover rate is investigated, often using quits as the measure of turnover. There are only a few studies that have examined the separate effects of total hiring and total separation on performance (Bingley and Westergaard-Nielsen 2004, Siebert and Zubanov 2009), but they do not consider subgroups of the labor flows.
It is important to consider both hiring and separation flows and a detailed decomposition of them. If all employees were perfect substitutes, simultaneous separation and hiring would just cause costs without having a positive impact on productivity. The only necessary turnover would be that needed for expanding or reducing the total size of the labor input. However, if there is a connection e.g. between the age structure of the work force and performance, it is the inflow and outflow of different types of employees that firms should control to optimize the work force structure. The optimal age mix of employees is based on the relative productivities and wages of the age groups, but the choice is constrained by legal limits on layoffs, the availability of different types of employees (i.e., local labor supply), and differences in the quit propensities of different employee types. Therefore, firms may not always be at the optimum and will adjust towards it through the inflow and outflow of different age groups.
There is recent empirical research using large-scale employer-employee data sets where the inflow of new employees is explicitly seen as a means of knowledge transfer and attention is paid to the decomposition of the flow. The impact of several different types of knowledge flows on productivity has been investigated: Boschma et al. (2009) decompose hiring according to experience from similar, related, or unrelated industries, Balsvik (2011) by experience in multinational enterprises, Stoyanov and Zubanov (2012) by the productivity level of the firms where the hired workers are coming from, and Parrotta and Pozzoli (2012) according to the type of knowledge carriers (highly educated or technically educated employees with above average earnings).
While the studies discussed above have decomposed the hiring flow in many different ways, only a few studies have considered also the relationship of worker outflow to productivity. Kronenberg and Carree (2010) explain productivity growth by the qualities of hired and exiting workers. The quality variables include average age of the hired and exited, average productivity of previous employer, and shares of employees hired from the same industry or exiting to the same industry. Kaiser et al. (2015) estimate a patent production function, where the explanatory variables include the shares of hired workers from patenting and non-patenting firms and exited workers disaggregated in a similar way. Maliranta et al. (2009) decompose both the hiring and separation rates by age (young/old), education (low/high), tenure (short/long), previous job (R&D/other), and current job (R&D/other) and include these rates as explanatory variables in models for productivity and profitability growth. The staying workers are included as the shares of the corresponding subgroups. Our approach is similar to that in Maliranta et al. (2009), but we concentrate on the age effects. 4

Decomposition of firm performance
Assume a production function with $M$ worker types (age groups) $L_{tj}$, $j = 1, \ldots, M$, at time $t$ as the inputs:

$$Y_t = f(L_{t1}, \ldots, L_{tM}). \tag{1}$$

Take a first-order Taylor approximation of the production function about $X^0 = (L^0_1, \ldots, L^0_M)$:

$$Y_t = \Big[ f(X^0) - \sum_j f'_j(X^0) L^0_j + R \Big] + \sum_j f'_j(X^0) L_{tj}, \tag{2}$$

where $R$ is the remainder term of the approximation. Dividing by $L_t = \sum_j L_{tj}$ and denoting the term in the brackets $A$ we obtain

$$\frac{Y_t}{L_t} = \frac{A}{L_t} + \sum_j g_j \frac{L_{tj}}{L_t}, \tag{3}$$

where $g_j = f'_j(X^0)$, i.e. the marginal product of worker type $j$. Assuming constant returns, we can replace the marginal productivities $g_j$ by labor productivities $Y_{tj}/L_{tj}$, where $Y_{tj}$ is the output accounted for by worker type $j$. For example, in a Cobb-Douglas function the marginal products are $g_j = a_j Y_t / L_{tj} \equiv Y_{tj}/L_{tj}$, where $a_j$ is the power of labor input $L_{tj}$ in the production function. 5 We consider the development of productivity between points of time 0 and 1 (i.e. $t = 0, 1$). Given the approximation above, the firm's labor productivity at time 1 can then be expressed as the average of labor productivities, weighted by labor shares:

$$\frac{Y_1}{L_1} = \sum_j \frac{L_{1j}}{L_1} \frac{Y_{1j}}{L_{1j}} + e_{Y/L,1}, \tag{4}$$

where the error term $e_{Y/L,1}$ has been included to reflect approximation errors, unobservable factors in our formulation, and the $A/L$ term in (3). Each worker age group can further be divided into two subgroups: workers who worked in the firm at the previous point of time 0 and are still working there, i.e., stayers (stay), and those who are working in the firm at time 1 but were not there at time 0, i.e., they were hired after 0 (hire). The firm's labor productivity level can then be expressed as follows:

$$\frac{Y_1}{L_1} = \sum_j \frac{L_{1j,stay}}{L_1} \frac{Y_{1j,stay}}{L_{1j,stay}} + \sum_j \frac{L_{1j,hire}}{L_1} \frac{Y_{1j,hire}}{L_{1j,hire}} + e_{Y/L,1}. \tag{5}$$

The shares of stayers and hired workers add up to one:

$$\sum_j \frac{L_{1j,stay}}{L_1} + \sum_j \frac{L_{1j,hire}}{L_1} = 1. \tag{6}$$

Taking this adding-up constraint into account and multiplying and dividing the first term on the right-hand side of (5) by $L_{1,stay} = \sum_j L_{1j,stay}$, it can be written as follows:

$$\frac{Y_1}{L_1} = \sum_j \frac{L_{1j,stay}}{L_{1,stay}} \frac{Y_{1j,stay}}{L_{1j,stay}} + \sum_j \frac{L_{1j,hire}}{L_1} \left( \frac{Y_{1j,hire}}{L_{1j,hire}} - \frac{Y_{1,stay}}{L_{1,stay}} \right) + e_{Y/L,1}, \tag{7}$$

where $Y_{1,stay}/L_{1,stay} = \sum_j (L_{1j,stay}/L_{1,stay})(Y_{1j,stay}/L_{1j,stay})$ is the average productivity of all stayers at time 1. To write the labor productivity level of the firm at time 0 we define a third subgroup, those who were in the firm at time 0 but are no longer there at time 1, i.e. those who have separated after 0 (sepa).

We can write the time 0 productivity in an analogous way to (7), defining $L_{0,stay} = \sum_j L_{0j,stay}$:

$$\frac{Y_0}{L_0} = \sum_j \frac{L_{0j,stay}}{L_{0,stay}} \frac{Y_{0j,stay}}{L_{0j,stay}} + \sum_j \frac{L_{0j,sepa}}{L_0} \left( \frac{Y_{0j,sepa}}{L_{0j,sepa}} - \frac{Y_{0,stay}}{L_{0,stay}} \right) + e_{Y/L,0}. \tag{8}$$

We are interested in labor productivity growth, i.e., the growth of the productivity level between points of time 0 and 1, i.e.

$$\Delta \frac{Y}{L} = \frac{Y_1}{L_1} - \frac{Y_0}{L_0}. \tag{9}$$

2 One dictionary definition of the adjective dysfunctional is "characterized by a breakdown of normal or beneficial relationships between members of the group" (http://www.collinsdictionary.com).
3 The positive influences of turnover have been emphasized more formally in models where the search and matching process allocates workers to their best uses in firms (e.g. Jovanovic 1979). Worker flows and the matching process may be particularly important for productivity when technological change is rapid (see e.g. Aghion and Howitt 1996).
4 Vandenberghe (2010) has adopted our decomposition approach.
We define the worker age groups in such a way that none of the staying workers changes her group between times 0 and 1, i.e., $L_{0j,stay} = L_{1j,stay}$ and therefore $L_{0j,stay}/L_{0,stay} = L_{1j,stay}/L_{1,stay}$ for all $j$. Note that people are, of course, aging over time, but the age groups should be understood as cohorts rather than absolute age groups.
We then obtain 6

$$\Delta \frac{Y}{L} = \sum_j \frac{L_{1j,stay}}{L_{1,stay}} \left( \frac{Y_{1j,stay}}{L_{1j,stay}} - \frac{Y_{0j,stay}}{L_{0j,stay}} \right) + \sum_j \frac{L_{1j,hire}}{L_1} \left( \frac{Y_{1j,hire}}{L_{1j,hire}} - \frac{Y_{1,stay}}{L_{1,stay}} \right) + \sum_j \frac{L_{0j,sepa}}{L_0} \left( \frac{Y_{0,stay}}{L_{0,stay}} - \frac{Y_{0j,sepa}}{L_{0j,sepa}} \right) + \Delta e_{Y/L}. \tag{10}$$

The first set of terms on the right-hand side of Eq. (10) shows the productivity growth "within workers", i.e. the productivity growth that accumulates over time for those who are staying in the firm. It can be interpreted as productivity growth due to the accumulation of human capital through experience. The within-worker productivity growth may vary across the age groups, and the total effect is a labor share weighted average of productivity changes in the different groups. A firm has rapid productivity growth when a large proportion of its workers have a high productivity growth rate. These workers may have human capital that enables them to adopt or innovate more productive techniques. In other words, these workers have dynamic long-run effects on the company's productivity. This can be called the Nelson-Phelps effect, after the seminal work by Nelson and Phelps (1966).
The second set of terms indicates the productivity effects of hiring of workers in different age groups. As can be seen from (10), hiring of type j workers has a positive impact on productivity change when these hired workers have a higher productivity level than the average staying workers. Newly hired workers may be more productive than incumbents at time 1 because they have learned more productive techniques when they worked for the previous employer, or have more recent education, for example. Adjustment costs related to the hiring of new employees are implicitly included in our formulation. The relative productivity of the hired workers should therefore be understood as productivity net of adjustment costs.
The third set of terms indicates the productivity effects of separations of different worker age groups. Quite analogously to the hiring effect, separation of type j workers has a positive effect on productivity change when these workers have a lower productivity level than the average incumbent worker at the initial time 0. Again, the productivity impact of separations is net of adjustment costs. Finally, there is the change in the error term.
The terms of the decomposition have analogues in the firm-level productivity analyses, where productivity growth is decomposed to growth within firms, between firms and through entry and exit. Here the productivity growth of the stayers plays the role of within-firm productivity growth, and the hiring and separation effects are analogous to entry and exit. However, in (10) there is no analogue to the between effect of the firm-level analysis, since the group of stayers in an age cohort does not switch to another group (i.e., the ''market shares'' of the age cohorts do not change among stayers).
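The accounting behind the decomposition can be checked numerically. The following minimal Python sketch uses made-up group sizes and productivity levels; the function name `decompose` and all numbers are illustrative assumptions, not values from the data. It computes the within-worker, hiring, and separation terms described above for Eq. (10), ignoring the error term, and compares their sum with the directly computed change in output per worker.

```python
def decompose(stay, hire, sepa):
    """Split the change in firm labor productivity into three parts.

    stay: {group: (workers, productivity at time 0, productivity at time 1)}
    hire: {group: (workers, productivity at time 1)}
    sepa: {group: (workers, productivity at time 0)}
    """
    l_stay = sum(n for n, _, _ in stay.values())
    l1 = l_stay + sum(n for n, _ in hire.values())  # employment at time 1
    l0 = l_stay + sum(n for n, _ in sepa.values())  # employment at time 0

    # Average productivity of all stayers at times 0 and 1.
    y0_stay = sum(n * y0 for n, y0, _ in stay.values()) / l_stay
    y1_stay = sum(n * y1 for n, _, y1 in stay.values()) / l_stay

    # Within-worker growth, weighted by each group's share among stayers.
    within = sum(n / l_stay * (y1 - y0) for n, y0, y1 in stay.values())
    # Hired workers are compared with the average stayer at time 1.
    hiring = sum(n / l1 * (y - y1_stay) for n, y in hire.values())
    # Separated workers are compared with the average stayer at time 0.
    separation = sum(n / l0 * (y0_stay - y) for n, y in sepa.values())

    # Direct calculation of the change in output per worker.
    out1 = sum(n * y1 for n, _, y1 in stay.values()) + sum(n * y for n, y in hire.values())
    out0 = sum(n * y0 for n, y0, _ in stay.values()) + sum(n * y for n, y in sepa.values())
    total = out1 / l1 - out0 / l0
    return within, hiring, separation, total


# Hypothetical firm with three age groups (illustrative numbers only).
stay = {"young": (20, 40.0, 46.0), "middle": (50, 55.0, 57.0), "old": (30, 50.0, 50.5)}
hire = {"young": (10, 38.0), "middle": (5, 60.0), "old": (1, 58.0)}
sepa = {"young": (6, 45.0), "middle": (4, 56.0), "old": (8, 44.0)}

within, hiring, separation, total = decompose(stay, hire, sepa)
print(within, hiring, separation, total)
```

In this toy example the separated "old" group has a productivity level (44) below the stayers' time 0 average (50.5), so its separation contributes positively to productivity change, in line with the interpretation of the third set of terms.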
Besides labor productivity, we can use a similar decomposition for the average wage level in the firm, since the average wage in the firm is a share weighted average of wages in the worker groups. In this case we just replace Y in the equations above by the wage sum W. Without the error terms, Eq. (10) and a corresponding equation for wage growth are in principle identities. We can observe the labor flows, but we do not know the productivities, so the equations cannot be used directly for assessing productivity and wage differences between the age groups. There are some influences, however, that have not been taken into account so far. They are absorbed in the error term, which allows us to use the equations as a basis for estimating the productivities. There are likely to be differences across firms in the productivities of different age groups. If we use (10) as a model for estimating parameters that correspond to the age-specific productivities, we will estimate average productivities. Any firm differences will therefore be included in the error term. So far we have not taken into account other inputs, especially capital, that affect productivity. Capital input could have been included already in the production function (1), in which case the capital-labor ratio would appear in the approximation (3). We will therefore include the capital-labor ratio and other control variables Z to account for other exogenous influences on firm productivity, wage, and profits. Inclusion of a constant term takes into account the productivity growth trend. After these observable influences are taken into account, the error accounts for all unobservables.
The terms of (10) can be turned into growth rates by dividing them by the average of the productivity levels at times 0 and 1. The growth rate is then a close approximation of the more common log-difference. We obtain the following estimation models:

$$\frac{\Delta (Y/L)}{\overline{Y/L}} = b_{Y,0} + \sum_{j=1}^{M-1} b_{Y,j,stay}\, STAYSH_j + \sum_{j=1}^{M} b_{Y,j,hire}\, HR_j + \sum_{j=1}^{M} b_{Y,j,sepa}\, SR_j + c_Y' Z + e_Y, \tag{11}$$

$$\frac{\Delta (W/L)}{\overline{W/L}} = b_{W,0} + \sum_{j=1}^{M-1} b_{W,j,stay}\, STAYSH_j + \sum_{j=1}^{M} b_{W,j,hire}\, HR_j + \sum_{j=1}^{M} b_{W,j,sepa}\, SR_j + c_W' Z + e_W, \tag{12}$$

where $\overline{Y/L} = 0.5[(Y_0/L_0) + (Y_1/L_1)]$ and $\overline{W/L} = 0.5[(W_0/L_0) + (W_1/L_1)]$ are averages of productivity and wage, respectively, $HR_j = L_{1j,hire}/L_1$ and $SR_j = L_{0j,sepa}/L_0$ are the hiring and separation rates, respectively, and $STAYSH_j = L_{0j,stay}/L_{0,stay}$ $(= L_{1j,stay}/L_{1,stay})$ is the share of staying workers. The shares of the $M$ stayer groups sum up to one, so if they were all included in (11) and (12), they would have a linear dependence with the constant term. For the hiring and exit variables there is no such problem, as they are rates (e.g. hired of type $j$ in relation to the number of all workers, not to all hires) and therefore all $M$ groups can be included. In the estimations, we use panel data of firms over non-overlapping time periods to calculate two-year differences within the periods. Therefore the equations to be estimated will be indexed with $i$ (firm) and $s$ (period), which are not shown in (11) and (12).

6 The decomposition is related to those used commonly in firm or plant-level productivity analysis (e.g. J. Haltiwanger 1997), but is closer to those used by Maliranta (1997), Vainiomäki (1999), Maliranta and Ilmakunnas (2005) as well as Diewert and Fox (2009).
The productivity and wage gaps by age can be analyzed both on the hiring side and on the separation side. On the hiring side the coefficients of our main interest that will be estimated with Eqs. (11) and (12) have the following interpretations:

$$b_{Y,j,hire} = \frac{Y_{1j,hire}/L_{1j,hire} - Y_{1,stay}/L_{1,stay}}{\overline{Y/L}}, \tag{13}$$

$$b_{W,j,hire} = \frac{W_{1j,hire}/L_{1j,hire} - W_{1,stay}/L_{1,stay}}{\overline{W/L}}, \tag{14}$$

i.e. they measure the productivity and wage, respectively, of hired workers in age group $j$, compared to all staying workers. These differences are scaled by firm (over time average) productivity and average wage, respectively, which follows from the transformations of the dependent variables in (11) and (12). Equations (13) and (14) are approximately equal to log-differences, and hence the parameters measure relative differences between hired and staying workers. For the separation side, the estimable coefficients are obtained analogously as

$$b_{Y,j,sepa} = \frac{Y_{0,stay}/L_{0,stay} - Y_{0j,sepa}/L_{0j,sepa}}{\overline{Y/L}}, \tag{15}$$

$$b_{W,j,sepa} = \frac{W_{0,stay}/L_{0,stay} - W_{0j,sepa}/L_{0j,sepa}}{\overline{W/L}}, \tag{16}$$

which measure the relative productivity and wage difference, respectively, of all staying workers and separated workers of type $j$. The intercepts in (11) and (12) indicate the growth rate in the reference age group of the stayers, and the coefficients of the included $STAYSH_j$ variables ($M - 1$ age group variables) indicate the differences between the growth rates in age groups $j$ and in the reference group.
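In this paper, we are particularly interested in profitability effects. Profitability is measured as follows:

$$P = \frac{OPM}{W(1 + a)} + 1 = \frac{Y}{W(1 + a)}, \tag{17}$$

where OPM denotes operating margin (i.e., $OPM = Y - W(1 + a)$), and $a$, the ratio of payroll taxes to wages, is assumed to be constant over time and across the worker groups. 7 The growth rate of profitability (17) is therefore the difference between the growth rates of productivity $Y/L$ and wage $W/L$, which is approximated as

$$\frac{\Delta P}{\overline{P}} \approx \frac{\Delta (Y/L)}{\overline{Y/L}} - \frac{\Delta (W/L)}{\overline{W/L}}, \tag{18}$$

where $\overline{P} = 0.5[P_0 + P_1]$. By inserting (11) and (12) into (18) we obtain an equation for the profitability change

$$\frac{\Delta P}{\overline{P}} = b_{P,0} + \sum_{j=1}^{M-1} b_{P,j,stay}\, STAYSH_j + \sum_{j=1}^{M} b_{P,j,hire}\, HR_j + \sum_{j=1}^{M} b_{P,j,sepa}\, SR_j + c_P' Z + e_P, \tag{19}$$

where, on the basis of (18), the following approximations hold:

$$b_{P,j,hire} \approx b_{Y,j,hire} - b_{W,j,hire}, \tag{20}$$

$$b_{P,j,sepa} \approx b_{Y,j,sepa} - b_{W,j,sepa}, \tag{21}$$

$$b_{P,j,stay} \approx b_{Y,j,stay} - b_{W,j,stay}. \tag{22}$$

Equation (20) can be developed as follows:

$$b_{P,j,hire} \approx \ln \frac{Y_{1j,hire}/L_{1j,hire}}{Y_{1,stay}/L_{1,stay}} - \ln \frac{W_{1j,hire}/L_{1j,hire}}{W_{1,stay}/L_{1,stay}} \tag{23}$$

$$= \ln \frac{Y_{1j,hire}/W_{1j,hire}}{Y_{1,stay}/W_{1,stay}} = \ln \frac{P_{1,j,hire}}{P_{1,stay}}, \tag{24}$$

which shows that the parameter of the hiring variable for the worker group $j$ in the profit Eq. (19) can be interpreted as a measure of the relative profitability difference of the hired group $j$ workers and all stayers at time 1. Analogously, we obtain that

$$b_{P,j,sepa} \approx \ln \frac{P_{0,stay}}{P_{0,j,sepa}}, \tag{25}$$

which provides us with a measure of the relative profitability difference of the staying workers and the separated group $j$ workers before the latter leave.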
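The approximation that profitability growth equals productivity growth minus wage growth can be illustrated with a few lines of Python. This is only a sketch under stated assumptions: profitability is taken to be value added per unit of labor cost, following the variable definitions in the data section; the helper names `growth` and `profitability`, the payroll tax ratio of 0.25, and all levels are hypothetical.

```python
from math import log


def growth(x0, x1):
    """Change between two levels divided by the average of the levels."""
    return (x1 - x0) / (0.5 * (x0 + x1))


def profitability(y_per_worker, w_per_worker, a):
    """Value added per unit of labor cost; a is the payroll tax ratio (assumed constant)."""
    return y_per_worker / (w_per_worker * (1.0 + a))


a = 0.25                # hypothetical payroll tax ratio
y0, y1 = 50.0, 55.0     # value added per employee at times 0 and 1
w0, w1 = 30.0, 31.5     # wage sum per employee at times 0 and 1

p0 = profitability(y0, w0, a)
p1 = profitability(y1, w1, a)

g_p = growth(p0, p1)                        # profitability growth
g_gap = growth(y0, y1) - growth(w0, w1)     # productivity growth minus wage growth
g_log = log(p1 / p0)                        # log-difference for comparison

print(g_p, g_gap, g_log)
```

With a constant payroll tax ratio the factor (1 + a) cancels from the growth rate, so the three numbers differ only through the approximation of log-differences by average-based growth rates.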

Econometric issues
When using Eqs. (11), (12) and (19) for estimation, there are possible sources of bias. First, there can be unobservable firm heterogeneity in both firm productivity and wage levels, which is correlated with the employee characteristics (flows and shares of different age groups). In particular, new firms often start with a new work force which only slowly evolves over time (Haltiwanger et al. 1999, 2007). Therefore, firm vintage and worker cohorts tend to be tied together, with young workers being employed in firms that have new equipment and a high productivity level. Since we are using growth rates as the dependent variables, time-invariant unobservables of this kind are eliminated. Our approach is related to the use of long differences to eliminate time-invariant unobservable effects in panel data models (e.g. Griliches and Mairesse 1998; Haltiwanger et al. 2007). We define the growth rates and labor flows in five different time periods and pool them in estimation. We also control for some observable firm characteristics, industry and region, included in Z. These account for industry and region specific trends in the levels. Second, there is heterogeneity across workers. Since firms may hire the best applicants and lay off poor performers, the hiring and separation flows may be unrepresentative of the whole population. Heterogeneity can to some extent be taken into account by disaggregating the flows by age and by investigating the exit flows also by destination. For example, flows to unemployment may be a special group of all separations. Selection should affect productivity growth and wage growth in the same way under the null hypothesis of competitive labor markets (see Hellerstein and Neumark 2007). Therefore selection should be a less serious issue when we examine their difference, i.e. the productivity-wage gaps, which directly relate to our measure of firm performance.
Third, the hiring rate and the involuntary part of the separation rate (i.e. layoffs) are based on the firms' decisions and are therefore possibly correlated with productivity shocks, which are part of the error terms. Time-varying shocks would remain in the equations for the growth rates. For example, positive productivity shocks may lead to the hiring of new, young workers, which then causes an overestimate of their productivity effect (cf. Olley and Pakes 1996; Levinsohn and Petrin 2003). On the other hand, one of our main interests is the profitability effect of the separations of older workers. It is not obvious why a positive productivity or profitability shock should increase separations of older workers, which would generate a spurious positive correlation between profitability change and the separation of older workers. More probably, a negative profitability shock would increase a company's incentives to encourage its older employees to retire early, for instance. As a consequence, our main results are more likely to be biased in the direction of not showing older workers being overpaid even when they actually may be.
We use a proxy variable approach, suggested by Olley and Pakes (1996) and Levinsohn and Petrin (2003), for dealing with the shocks. Materials is the proxy variable, which is a function of capital and the productivity shock. The capital variable is the end of previous year capital stock, which is not affected by the current period shock. We first estimate a production function in level form using annual data, without disaggregating the labor input by age, including in this estimation a polynomial of capital and the proxy variable to account for the shock. The method yields an estimate of the productivity shock for each firm in each year. Then the shock is used in differenced form as an additional control variable in our productivity, wage, and profitability change models.
Another way in which we address the issue of correlation of the flows and shocks is to identify those workers who have left the company for the old age pension. We consider this a more exogenous type of separation than the other separations.
Fourth, there can be productivity differences across firms in the productivity of a certain age group. This can arise from decreasing returns. For example, extensive use of younger employees in a firm lowers their marginal productivity. There may also be genuine technological differences between firms or industries, including differences in economies of scale 8 or efficiency differences. These factors would imply that the coefficients vary across firms. We can still obtain an unbiased estimate of the mean coefficient and account for the firm differences by correcting standard errors for clustering within firms. To account for heterogeneity in technology we also estimate the models separately for some subsets of the data. Besides the whole business sector, we present results separately for the industrial and service sectors. In addition, in each of these sectors we study ICT and non-ICT firms separately.
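Correcting standard errors for clustering within firms can be sketched for the simplest case of a regression with an intercept and a single regressor. The code below is an illustrative assumption rather than the estimation code behind the results: the function name `ols_slope_with_cluster_se`, the variable names, and the data are hypothetical, and no small-sample correction is applied. Residual-weighted scores are summed within each firm before squaring, so that arbitrary correlation of the errors within a firm across periods is allowed.

```python
from collections import defaultdict
from math import sqrt


def ols_slope_with_cluster_se(x, y, cluster):
    """OLS slope of y on x (with intercept) and its cluster-robust standard error."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    x_dev = [xi - x_mean for xi in x]
    sxx = sum(v * v for v in x_dev)

    slope = sum(v * yi for v, yi in zip(x_dev, y)) / sxx
    intercept = y_mean - slope * x_mean
    resid = [yi - intercept - slope * xi for xi, yi in zip(x, y)]

    # Sum x_dev * residual within each cluster (firm), then square the sums.
    score = defaultdict(float)
    for v, e, g in zip(x_dev, resid, cluster):
        score[g] += v * e
    variance = sum(s * s for s in score.values()) / sxx ** 2
    return slope, sqrt(variance)


# Hypothetical pooled firm-period observations (illustrative numbers only):
# x = separation rate of older workers, y = profitability growth.
x = [0.00, 0.02, 0.05, 0.01, 0.03, 0.06, 0.00, 0.04, 0.02, 0.05]
y = [0.01, 0.02, 0.04, 0.00, 0.03, 0.05, -0.01, 0.02, 0.01, 0.03]
firm = ["A", "A", "A", "B", "B", "B", "C", "C", "D", "D"]

slope, cluster_se = ols_slope_with_cluster_se(x, y, firm)
print(slope, cluster_se)
```

Treating every observation as its own cluster in this function reduces the formula to the usual heteroskedasticity-robust variance, which makes the role of the within-firm summation easy to see.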
Finally, price differences across firms may cause biases when a common deflator is used for all firms in an industry (see e.g. Foster et al. 2008). For example, the profitability level of a low-price firm will be underestimated and that of a high-price firm overestimated. However, to the extent that there are firm differences only in price levels, the bias is eliminated when profitability changes are examined, since the log-differences of the prices would be the same. If also price growth differs, the bias is not eliminated.

Data and variables
The unique identification codes for persons, companies and plants used in the different registers form the backbone of the Finnish administrative register network and the Finnish statistical system. This provides an excellent opportunity to construct cross-sectionally and dynamically representative data for various research purposes by linking different administrative data sources (see Abowd and Kramarz 1999).
The data for this study are drawn from the FLEED. The data set merges comprehensive administrative records of all labor force members as well as all employers/enterprises (including information also on their establishments) subject to value added tax (VAT). It can be complemented by a range of additional information from both private and public sources. FLEED currently covers the years 1990 onwards with near-perfect traceability of employers and employees across time. The employment statistics, educational statistics, taxation records, business register, financial statement statistics, manufacturing census as well as various surveys are among the original sources of the FLEED variables.
To define the labor flows and changes in productivity, wage, and profitability, we use 2-year windows. The flows and changes are defined for the five time periods 1995-1997, 1997-1999, 1999-2001, 2001-2003, and 2003-2005. 9 The observation unit is a firm. In principle we also have data on establishments, but establishment-level information on value added and some other relevant variables, like capital intensity, are lacking beyond the manufacturing sector. Further, the links between employees and firms are more reliable than those between employees and establishments, especially in multi-unit firms. A drawback of the data is that we are not able to systematically identify the role of change in ownership, like mergers and acquisitions.
Our estimation sample covers the industry and service sectors. The industry sector is defined very broadly as consisting of branches that are not included in services, i.e., mining, manufacturing, public utilities and construction. The service sector comprises retail and wholesale trade, business services and personal services. Real estate and financial intermediation are excluded due to problems in measuring output in a reliable manner. The number of observations in the estimation sample by branches is shown in Appendix Table 7.
The dependent variables are defined as follows. Labor productivity growth is measured as a two-year rate of change of value added per employee [defined as in (11)], average wage growth is correspondingly a two-year rate of change of wage sum per employee, and change in profitability is a two-year relative change in value added per labor costs (wages and social security payments). These variables are measured in nominal terms, and price changes are controlled for by a set of industry dummies that are interacted with the period dummies. 10 The labor flows are based on the comparisons of employees in the firms at two points of time s and s-2, where s denotes the end year of a three-year period. 11 The flow rates are calculated separately for three age groups, ''young'' (at most 30 years), ''middle-aged'' (31-50 years), and ''old'' (51 years or older). We use fairly broad age groups to ensure that we have enough employees in the groups when hiring and separation are disaggregated by age. With narrowly defined groups, e.g. 10-year intervals, many of the flows would be zeroes in the smaller firms. In each period age is based on the situation at the end year. For example, those who were 28 years old in s-2 are 30 years old in s and hence included among the ''young''. Those who were 30 already in year s-2 are 32 in s, and hence included in the middle group. The age group classification is thus based on year s age, and not on the age at which the employees were last observed in the firm.
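The link between the three dependent variables can be made explicit with a short derivation. This is only a sketch under two assumptions that are not stated in the text: that rates of change are measured as log differences, and that social security payments are a proportion $\tau$ of the wage sum. The symbols $Y$ (value added), $L$ (number of employees), $W$ (wage sum) and $\tau$ are introduced here purely for illustration.

```latex
\begin{align*}
\frac{Y}{(1+\tau)W} &= \frac{Y/L}{(1+\tau)\,W/L}, \\
\Delta \ln \frac{Y}{(1+\tau)W} &= \Delta \ln \frac{Y}{L} - \Delta \ln \frac{W}{L} - \Delta \ln (1+\tau).
\end{align*}
```

If the contribution rate is roughly constant over the two-year window, the last term vanishes, and the change in profitability is approximately the productivity change minus the average wage change.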
The hiring rate $HR_{jis}$ for age group j is the number of new employees in firm i in the age group (those in the firm in s, but not in s-2) divided by the number of all employees of the firm in s. The separation rate $SR_{jis}$ is correspondingly the number of exited employees of firm i in age group j (those in the firm in s-2, but no longer in s), divided by the number of all employees in the firm in s-2. The share of stayers, $STAYSH_{jis}$, is the number of staying employees of firm i in age group j (those in the firm both in s and s-2), divided by the number of all stayers of the firm. The sum of these stayer shares is therefore one, so one of the groups is left out of the estimation.
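The flow-rate definitions above can be sketched in a few lines of Python. This is an illustrative sketch, not the authors' code: the function names, the dictionary layout (a hypothetical employee id mapped to age in year s) and the toy numbers are all assumptions. For separated workers, age in s is taken to be age in s-2 plus two, in line with the rule that classification is based on year s age.

```python
GROUPS = ("young", "middle", "old")


def age_group(age_in_s):
    """Classify by age in the end year s: young (<=30), middle (31-50), old (51+)."""
    if age_in_s <= 30:
        return "young"
    if age_in_s <= 50:
        return "middle"
    return "old"


def flow_rates(start, end):
    """Compute hiring rates, separation rates and stayer shares for one firm.

    start, end: dicts {employee_id: age in year s} for the employees observed
    in the firm in s-2 and in s (hypothetical layout). Assumes both years have
    at least one employee and that the firm has at least one stayer.
    """
    hires = dict.fromkeys(GROUPS, 0)
    seps = dict.fromkeys(GROUPS, 0)
    stays = dict.fromkeys(GROUPS, 0)
    for emp, age in end.items():
        if emp in start:
            stays[age_group(age)] += 1   # in the firm in both s-2 and s
        else:
            hires[age_group(age)] += 1   # in the firm in s, but not in s-2
    for emp, age in start.items():
        if emp not in end:
            seps[age_group(age)] += 1    # in the firm in s-2, but not in s
    n_stayers = sum(stays.values())
    hr = {g: hires[g] / len(end) for g in GROUPS}        # denominator: employees in s
    sr = {g: seps[g] / len(start) for g in GROUPS}       # denominator: employees in s-2
    staysh = {g: stays[g] / n_stayers for g in GROUPS}   # denominator: all stayers
    return hr, sr, staysh


# Toy firm: four employees in s-2, five in s (ages are ages in year s).
start = {1: 29, 2: 45, 3: 58, 4: 60}
end = {1: 29, 2: 45, 5: 27, 6: 40, 7: 33}
hr, sr, staysh = flow_rates(start, end)
```

Because the stayer shares sum to one by construction, one of them has to be dropped in a regression that includes all three groups.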
As a control variable we use the change in the productivity shock, estimated using the proxy variable approach. We also control for the log of capital per employee, which is entered in two-year difference form to be consistent with the form of the dependent variables. The capital stock is measured by the book value at the end of the previous year. 12 Finally, we have a set of dummy variables. These include controls for regional effects 13 (20 regions), industry dummies (46 industries), period dummies (5 three-year periods), as well as interacted industry and period dummies, which account for the effects of idiosyncratic industry shocks in addition to price changes.
Before conducting the econometric analysis we leave out some potentially erroneous observations that may distort our results. First, we remove those observations where the number of linked employees differs by more than 10 % from the number of employees in the company data. Such a difference indicates that the linking of the individual and firm data is incomplete. Second, we remove some potentially influential outliers that we detected by using the method proposed by Hadi (1992, 1994). The method is useful for finding multiple outliers in multivariate data. Identification of outliers is made on the basis of three variables: (1) the growth rate of average monthly earnings calculated from the data on individuals (employment statistics), (2) the growth rate of average wage calculated from the company data, and (3) the productivity growth rate. The first two variables should be highly correlated with each other because they are essentially gauging the same thing, but may sometimes differ due to possible inaccuracies in the links between employees and their employers, for instance. Wage growth is usually correlated with productivity growth, but sometimes they may be very different because of measurement errors in output and/or labor input. The identified outliers (508 out of 24,842 firm-period observations) are removed from all estimations. We also restrict the sample to firms that employ at least ten employees and leave out the firms with over 10,000 employees; 11 observations are dropped from the sample due to the upper limit. Table 1 gives some descriptive summary statistics of our basic sample that is used in the regression analysis below. Because some observations cannot be used in the analysis due to missing values of the explanatory variables, we are finally left with 23,738 observations. The average number of linked employees per company is 85.2, which is close to the average number of employees in these firms according to company data (85.8 persons).
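The link-quality and firm-size restrictions described above can be expressed as a simple filter. The sketch below is illustrative only: the function and parameter names are hypothetical, and it assumes that the 10 % gap is measured relative to the employee count in the company data and that firm size is also taken from the company data, neither of which is specified in the text. The Hadi outlier screening is not covered.

```python
def keep_observation(linked_employees, company_employees,
                     max_link_gap=0.10, min_size=10, max_size=10_000):
    """Return True if a firm-period observation passes the sample restrictions.

    linked_employees:  number of employees linked to the firm in the individual data
    company_employees: number of employees reported in the company data
    """
    if company_employees <= 0:
        return False
    # Assumption: the gap is expressed relative to the company-data head count.
    link_gap = abs(linked_employees - company_employees) / company_employees
    if link_gap > max_link_gap:
        return False   # incomplete linking of individual and firm data
    # Assumption: the size limits apply to the company-data head count.
    return min_size <= company_employees <= max_size
```

For example, a firm reporting 85 employees with 86 linked individuals is kept, while one with only 70 linked individuals is dropped because the gap exceeds 10 %.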
In other words, our regressions are based on over 2 million individual-period observations. Because we have five periods, our sample covers over 400,000 individuals per period. This figure includes those individuals who are employed by a company in our sample either in the initial (s-2) or end year (s) of a period or both. This is roughly one-third of the total employment in the non-farm business sector. 14 The average nominal productivity growth rate is 4.5 %. Average wage growth rate, calculated from company data, is 6.7 %. This is reasonably close to the average growth of monthly earnings of the linked employees, 6.9 %, obtained from the register data on individuals (employment statistics). The average hiring rate, which is the sum of the hiring rates of the three age groups, is 28.1 % and the average separation rate, the sum of the separation rates of the three age groups, is 24.1 %. 15 Young employees account for 17.4 % of the staying employees and the old workers account for 21.9 %. Table 2 reports the baseline estimates. All of the estimation results reported below are based on weighted estimation, with firm employment used as the weight. 16 Since the productivity shock is a generated variable, we use robust standard errors, adjusted for clustering at the firm level.
Footnote 11 continued: why we estimate the models also by sector (industry and services). In principle, owner-managers can also cause measurement error. For small firms where the owner is working, he is included in the number of employees only if he takes at least half of his income as salary (as opposed to capital income). In any case, the restrictions we use (i.e., leaving out firms with fewer than ten employees and firms with missing employer-firm link) mean that this measurement error should not be large. We thank an anonymous referee for pointing out these issues to us.
Footnote 12: The proxy variable approach is based on the assumption that capital is inherited from the previous period, and the proxy variable (materials) and labor choices happen after the shock is realized. Therefore, the latter variables are endogenous, but capital stock itself is not.
Footnote 13: Some of the firms operate in more than one region. The region of a firm refers to the one where the employment share is the highest.
Footnote 14: The number of observations drops because of the reasons discussed in the text.
J Prod Anal (2016) 46:43-62

Basic estimation results
The first two columns show the results for the productivity and wage change equations with the productivity shock included as a control variable. The entries in the third column are from a separate estimation for profitability change, but they are roughly equal to the differences of the corresponding entries for productivity and wage change in columns 1 and 2. The last three columns have the corresponding results without productivity shock. Consider first the results with the productivity shock included. Hiring of young employees lowers the productivity level. This implies that the young hires have lower productivity than all staying employees, presumably because of their lack of general skills. However, they also have lower wages, so that the negative effect on profitability is smaller than the effect on productivity. Hiring the mid-aged also has a negative association with productivity, but it is weaker than that of the youngest. Their connection to average wage is small and not significantly different from zero. The results on productivity and wage give a statistically significantly negative net connection to profitability. The hired old workers have a positive relationship with productivity, but it is imprecisely estimated, and the relationship with wage is negative, but not significant. The effect of hiring old employees on profitability is, however, positive and clearly significant. It should be noted that the number of recently hired older workers is quite small (see Table 1). These cases are thus quite exceptional and therefore it is unsurprising to find that these individuals are not a burden to the firm's profitability.
On the separation side, exiting young employees have a positive coefficient both in productivity and wage equations, implying that they have lower productivity and wage than the staying employees. Our point estimates suggest that young leavers are underpaid, but because of a relatively large standard error the result on profitability does not differ from zero in a statistically significant way. Separation of the mid-aged has a higher coefficient in the productivity equation than the youngest. This results in a clearly significant positive connection to profitability. Separation of the oldest group has the largest coefficient both in productivity and wage equations, and in combination they yield an economically and statistically significant positive relationship between separation and profitability.
Footnote 15: Note that these figures underestimate actual turnover among the employees, since e.g. hiring of an employee after the start of a period and subsequent separation of the same employee before the end of the period is not included in the turnover rates.
Footnote 16: There are two main reasons why we prefer using employment weighted estimation. First, we are ultimately concerned with productivity differences between different employment groups. Unfortunately we are unable to measure productivity at the level of individuals but only at the level of firms, so that in a sense we are using aggregated data. In order to give an equal weight to each individual in our analysis, we should give a larger weight to large firms than to smaller firms. Second, weighted estimation provides us with a more efficient procedure in the presence of heteroscedasticity.
These estimates indicate that the separating employees have a clearly lower productivity level than the continuing employees, but they are also paid somewhat less on average. The net result is thus an increase in profitability for all of the age groups, but the gap between pay and productivity increases with age. These results seem to support the deferred pay argument that gives rise to wage profiles that give a high pay at the end of the career. 17
The results for the stayers indicate that the older employees have slower productivity growth than the young. This is consistent with the idea that the accumulation of productivity-enhancing (firm-specific) experience is greatest at the beginning of the career and then gradually slows down over time. Wages seem to develop correspondingly.
Table notes: Non-reported variables include regional dummies and interactions of industry and period dummies. Employment weighted estimation. Robust standard errors in parentheses, clustered at the firm level. Firms with at least 10 and at most 10,000 employees included. * p < 0.1; ** p < 0.05; *** p < 0.01
Footnote 17: In the Finnish pension system the pension was until 1996 based on the last four years' pay and until 2004 on the last ten years' pay in each employment relationship, which gave incentives for obtaining a high pay at the end of the career. This combination of backloaded wage and a fixed retirement age is consistent with the deferred payment model of Lazear (1979), although it is a result of a quite different institutional setting. The system has been based on a mix of centralized negotiations between labor unions, employer organizations and the government, and firm-level wage setting. Lazear's argument is somewhat difficult to use in the connection of labor flows, as it is an equilibrium model where the rising wage profile and unprofitability of the older employees is part of the ''package''. However, even in this case unexpected increases in the costs of older employees or an increase in the share of older employees through
Comparison of the estimates with the productivity shock included (first three columns of Table 2) and those without the shock (last three columns) shows that the shock variable is significant in the productivity and profitability equations, but not in the wage equation. When the shock variable is included, some of the point estimates or their statistical significance differ from those obtained without the shock variable, but the main pattern of the coefficients remains the same. This suggests that differencing over time has already largely succeeded in purging the unobservables, and that accounting for the time-varying shock has only a marginal additional influence on the results.
We have conducted some robustness analyses, which we briefly comment on without showing the results in tables. First, we added firm size (number of employees) as a variable to account for scale economies. This had hardly any impact on the results. Second, we included worker characteristics as control variables. These included average tenure, average education years, and the share of females. This also gave qualitatively the same results, and the changes were mostly in the third decimals of the coefficients. Third, we considered the possibility that there is a selection bias because of exit of low productivity firms. Although many firms exit, exiting firms account for a small share of employees, so we would expect the role of the selection bias to be mitigated, especially in weighted estimation. Using an inverse Mills ratio selection correction term indeed gave results that were fairly similar to those in Table 2. 18 Fourth, we estimated the models without weighting. This produced in general coefficients that were lower in absolute value than those in our basic results. The pattern of the coefficients was, however, similar in the productivity and profitability equations, while the wage results had some differences. In the profitability model the only qualitative difference to the results of Table 2 was that the coefficient of the hired old employees was negative and not significantly different from zero.

Disaggregated results
To investigate whether the results hold for different subsectors within the whole business sector, we estimate the models separately for the industry and service sectors. Table 3 shows that there are some notable differences in the results. We find that the $R^2$ measures of the productivity and profitability equations are substantially higher in the industry sector than in the service sector. This may reflect a greater importance of idiosyncratic factors or measurement problems regarding productivity and profitability in service industries. In the wage equation $R^2$ is similar in both sectors.
More interestingly, the separation of older workers is particularly profitable in the industry sector. The results for the whole business sector are clearly driven by this sector. In contrast, the only statistically significant result on the hiring side in the industry sector is the negative coefficient for the mid-aged. The coefficient of the hired old workers is positive, but imprecisely estimated. These newly hired workers are paid less than their productivity. In the service sector the results are different in some respects. The labor flows of the older employees are profit neutral to firms, and hiring of young and mid-aged is negatively associated with profitability change.
The separations of the oldest age group may be driven by very different influences. Some of these employees are retiring. Some are laid off and may face periods of unemployment. Some are still looking for new jobs and quit to move to other firms. Finally, some withdraw from the labor market. To investigate the role of heterogeneity among the older employees, we have disaggregated the separation rate of the age group over 50 years into three flow rates. These are separation rate to pension (old age pension or disability pension), unemployment (including unemployment pension), and other (job-to-job moves and withdrawal from the labor market). For the sake of comparison, we have divided separations of the other age groups by destination into unemployment and other. There are very few in these age groups who end up in retirement; they have been included in the category ''other''. The estimation results with this disaggregation are shown in Table 4, separately for the industry and service sectors. Now the outflow of older workers into retirement and unemployment is found to have a statistically and economically significant positive relationship with productivity in the industry sector (column 1 of Table 4), indicating that these worker groups had a lower-than-average productivity level before they left. The results for wages (the second column) show that the wage level of these workers was below the average level, since their exit had a positive effect on average wage. Comparison of the coefficients of these exits in the productivity and wage equations indicates that these worker groups had been paid more than their productivity and their separations have thus been profitable to firms. This can also be seen in the positive coefficients of the third column, the profit equation. The productivity-wage gap is quite substantial. On the other hand, the results of Table 4 do not provide evidence that those older workers that have left the firm for some other destination, e.g. employment in another firm, had been overpaid. These workers account for roughly one-third of the total separations of the older workers. So, a substantial proportion of the older workers are not found to be overpaid in our analysis. This includes, besides separations to other destinations, also hired and staying older workers.
Footnote 17 continued: aging may make the system unsustainable and give rise to incentives for separations.
Footnote 18: The probability of exit was modelled as a function of firm size, industry, employee characteristics (average tenure and education years, share of women), capital-labor ratio, productivity shock, and an indicator for growing firms. All of these variables were measured in the year prior to exit.
Interestingly, we do not find statistical evidence that those young or middle aged workers who separated into unemployment had been overpaid. In these age groups only exits to other destinations than unemployment have a positive impact on profitability. For the service sector the results are again different. The only significant relationships with profitability are obtained for the exit of the oldest age group to unemployment and the mid-aged to other destinations.
Our interpretation of the results is that especially the outflows to unemployment reflect the firms' choices (i.e., these separations are a selected group) whereas especially the route to old-age pension is a more exogenous event to the firm. It is worth noting that the Finnish pension and unemployment insurance systems have had an early exit route called ''unemployment pension tunnel'', which has allowed the unemployed to withdraw from the labor market at a relatively early stage by successively transferring to unemployment compensation, unemployment pension and finally to normal pension. It has actually been relatively common for the firms to use this system for downsizing their labor force, which can be seen as an increase in the unemployment risk at an age where the ''tunnel'' starts (Kyyrä and Wilke 2007). 19 It can also be argued that the larger use of the unemployment pension has in many cases been in the mutual interest of the firms and their employees (Hakola and Uusitalo 2005). Our results are quite consistent with the existence of this policy that makes it easy for firms to concentrate labor shedding on the older employees. The use of temporary workers also makes it easy and cheap to downsize the labor force when necessary. Unfortunately, our data do not allow us to distinguish between temporary and permanent employees, but the temporary employees are usually young. Also Table 1 above shows that the flow rate to unemployment has been quite high among the young. However, our estimation results do not give support to the view that this kind of downsizing would have been profitability-enhancing.
Footnote 19: In addition to the unemployment pension system, disability pension also gives incentives for laying off older employees.
In order to examine more carefully what drives our findings of the negative productivity effects especially in the industry sector, we have disaggregated our data further. First, we have classified the companies into ICT and non-ICT groups on the basis of 2- or 3-digit industries. Our starting point is the widely used classification proposed by van Ark et al. (2003). In their classification the ICT industries of the business sector consist of a broad variety of industries. We concentrate here on a somewhat narrower definition. 20 More specifically, we have excluded industries 18, 35, 36-37, 51, and 52 from the group of ICT industries. Table 5 shows the results of the profitability change equations for the ICT and non-ICT industries, with separations disaggregated by destination. The results are presented for the industry and service sectors separately.
Columns (1) and (2) show the results for the ICT and non-ICT parts of the industry sector, respectively. It should be noted that all the ICT industries of the industry sector belong to manufacturing. The table shows only the estimates for the profitability equation, but they are dominated by the effects of the hiring and separation flows on productivity. The coefficients for the separations of the older workers are positive and especially in the case of workers exiting to pension the coefficient is higher for the ICT part of the industry than for the non-ICT part. This evidence supports the view that skill obsolescence lowers the relative productivity of older employees and, in addition, weakens their ability to adopt or innovate new technologies. The results of column (1) thus provide evidence that older workers may be a burden to firms particularly in the manufacturing ICT industries. However, in contrast to these results, hiring of older workers is positively related to profitability in the ICT industry. It seems that the firms are able to pick up highly productive employees. On the other hand, hiring of the mid-aged lowers profitability in the non-ICT sector. As for the young and mid-aged workers, their separation to other firms has a significant positive coefficient in the ICT industries. That is, these workers in the ICT sector have profitability that is below average. In both the ICT and non-ICT parts of the service sector (columns 3 and 4) the impacts of the separations of older employees on profitability change are smaller than in the industry sector. Some of the point estimates are even negative, although not statistically significant. Interestingly, the positive association between profitability and separations of older workers to unemployment is driven by the non-ICT part of the service sector, whereas the positive impacts of separations of young and mid-aged to other firms are driven by the ICT part of the sector.
One might suspect that our results are driven by some kind of downsizing effect, i.e. that firms aim to improve their profitability by reducing their work force. The results would in this case be dominated by separations in downsizing firms. It should be noted, however, that since our explanatory variables include all hiring and separation flows, we have implicitly controlled for employment growth (or decline). In any case, for inspecting the downsizing hypothesis more carefully we have estimated the profitability equation by dividing the firms into two separate groups: the declining firms where employment has decreased and the expanding ones where employment has increased. Firms with stable employment are omitted. These results are shown in Table 6 for the industry and service sectors separately. Again we show only the results for the profitability change equation, but they are dominated by the effects of the flows on productivity. The main difference between growing and declining firms in the industry sector (columns 1 and 2 of the table) is that separations of older workers to unemployment have improved profitability more in the declining than in the growing firms. This supports the argument that downsizing has an important role in this labor flow. Growing industrial firms are also able to hire high-productivity older employees. Another difference is that separations of the young to other destinations than unemployment have improved profitability in the growing firms of the industry sector, but not in the declining ones. This may be an indication of more natural turnover in growing firms. In the service sector (columns 3 and 4 of the table) growing firms have hired low-productivity young and mid-aged workers, which has led to lower profitability. In contrast, hiring of these age groups to declining service sector firms has had a positive, but statistically non-significant relationship with profitability.
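The remark that the flow variables implicitly capture employment growth follows from the flow definitions: the number of stayers equals both employment in s times one minus the total hiring rate and employment in s-2 times one minus the total separation rate. The sketch below uses this identity to split firms into expanding, declining and stable groups. The function names and the tolerance are illustrative assumptions, not the authors' procedure.

```python
def employment_growth_factor(total_hr, total_sr):
    """Employment in s divided by employment in s-2, implied by the flow rates.

    total_hr: sum of the age-group hiring rates (denominator: employment in s)
    total_sr: sum of the age-group separation rates (denominator: employment in s-2)
    Identity: stayers = N_s * (1 - total_hr) = N_{s-2} * (1 - total_sr).
    Requires total_hr < 1, i.e. the firm has at least one stayer.
    """
    return (1.0 - total_sr) / (1.0 - total_hr)


def classify_firm(total_hr, total_sr, tol=1e-9):
    """Label a firm-period observation by the sign of its employment change."""
    growth = employment_growth_factor(total_hr, total_sr)
    if growth > 1.0 + tol:
        return "expanding"
    if growth < 1.0 - tol:
        return "declining"
    return "stable"   # omitted from the split-sample estimation
```

For instance, a firm with 100 employees in s-2, of whom 20 separate, and 40 hires has 120 employees in s; its total hiring rate is 40/120, its total separation rate is 20/100, and the implied growth factor is 1.2.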

Conclusions
We have proposed a new way of estimating the performance effects of age using flows of labor to and from firms. The results support the argument that at the end of the working career wage exceeds productivity at least for some workers. Our results are consistent with some firm- or plant-level studies which have found a hump-shaped or declining relationship between productivity and age. Newer studies, on the other hand, tend to support a relatively flat age-productivity relationship at older ages (e.g. Zwick 2012, Mahlberg et al. 2013a).
Footnote 20: See discussion in Daveri (2004). The results obtained by using the broad definition of ICT are available on request. It should be noted that ''Financial intermediation'' (ISIC 65-67) industries are excluded from our estimation sample.
Older workers appear to be particularly costly to firms in the manufacturing ICT industries. Further, the results indicate that in these cases older workers are negatively related to profitability (and productivity) growth, on top of lowering the current profitability (and productivity) level of the firm. These findings may be explained by rigidities in the wage formation which drive a wedge between the wage and productivity levels of older workers when rapid technological change makes skills obsolete at a rate that exceeds the rate of learning-by-doing. Such rigidities may derive from deferred compensation, employment protection that is focused more heavily on older workers, or insider power of the older workers in wage bargaining. On the other hand, we have found that a sizeable share of the older workers is not overpaid.
Our approach has features that help to account for many kinds of estimation biases. In particular, we have used a proxy variable method to estimate productivity shocks and used the shock as a control variable in our estimations. Still, we have to be careful in giving the results a causal interpretation, since the estimated shocks can be correlated with some other unaccounted time-varying unobservables. Therefore we have made several robustness checks of the results. In particular, separation to pension can be considered to be more exogenous than separations to other destinations and therefore more immune to estimation biases. Our finding of positive profitability effects from these separations of the older employees gives particularly strong support for our conclusions.
We also emphasize that the intention is not to measure the performance effects of hiring or separating a randomly chosen employee, but the effects of the hires and separations actually done.
It should be noted that our estimates gauge the total association of outflows of the oldest workers with productivity. Besides a direct productivity interpretation (i.e. a worker's efficiency in her own task), the estimates of this kind of analysis may arguably also capture various indirect channels that come into being through the diffusion of knowledge between different worker groups within a firm. Important as the diffusion of the tacit knowledge of older workers to the younger ones may be to the employer in many circumstances, our results suggest that generally this effect does not outweigh possible shortfalls in productivity or in the ability to adopt new and more productive techniques.
The institutional setting has obviously contributed to the results. Firms have incentives to downsize by laying off the oldest employees. On the other hand, the pension system has given incentives for wage profiles that peak at the end of the career. Our results support the view that firms have followed these incentives created by the institutions for improving their performance.
There are changes in the labor market, however, that will in the future increase incentives for keeping the aging labor force at work. Pension reforms increase the mandatory retirement ages or extend employees' rights to stay at work. There is also reduced availability of labor because of smaller age cohorts. These developments create pressures for firms to keep their older employees and to use new means for improving their performance, like changes in work organization and rotation of tasks. In addition, there is a shift away from final salary pensions to systems where the benefits are more closely tied to contributions over the working career. The new environment with longer, less fixed retirement age and fewer incentives for bargaining a back-loaded wage may give rise to flatter age-wage profiles in the future.
The approach that we have used could be extended in various ways, which are left for future work. We have disaggregated the exit by destination, but obviously an interesting question would also be where the hired employees are coming from, for example, whether they are coming from low or high productivity firms. The employees could also be examined in terms of other characteristics, e.g. education or gender. A potential problem with this kind of disaggregation is, however, that the number of worker groups would grow fast. If we had for example three educational groups in addition to the three age groups, we would have nine age-education combinations and the models would have nine hiring rate variables, nine exit rate variables and eight stayer shares. In the smaller firms many of the age-education cells would be zeros and most likely the precision of the estimation would suffer as the coefficients would be identified from a smaller number of non-zero observations.
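The growth in the number of regressors described above is easy to tabulate. The helper below is a hypothetical illustration of the counting rule implied by the text: one hiring rate and one separation rate per worker cell, and one stayer share fewer than the number of cells because the shares sum to one.

```python
def count_flow_variables(n_age_groups, n_other_groups=1):
    """Count flow regressors when age groups are crossed with another grouping."""
    cells = n_age_groups * n_other_groups
    return {
        "hiring_rates": cells,
        "separation_rates": cells,
        "stayer_shares": cells - 1,   # one share is omitted as the reference group
    }


baseline = count_flow_variables(3)          # three age groups only
with_education = count_flow_variables(3, 3)  # three age groups x three education groups
```

With three age groups alone there are 3 + 3 + 2 = 8 flow variables; crossing them with three education groups gives 9 + 9 + 8 = 26.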
An interesting extension would also be to use actual wages of individual employees to calculate the wage decomposition directly instead of estimating it using average wages. However, for data reasons the results would most likely be different. The wage sum in the company data, which we have used for calculating average wages, is based on wages actually paid during the year, as reported by the firm. On the other hand, the data on individual wages in FLEED are based on tax registers and measure the total earnings of the individuals during the year. For workers who switch jobs the earnings therefore come from different firms. For example, if a person who has a low-pay job for the first half of the year switches to a high-pay job in another firm for the second half of the year, he would appear as a medium-pay hire in the new firm on the basis of his annual earnings although he is actually a high-pay hire. When average wages are used, this would not be an issue. Similar problems appear for separations. An employee who switches to another firm in the middle of the year, would contribute to the decomposition with his earnings in the previous year. The wage paid during the first half of the year of separation would not be included, unlike in the average wage. Further work is needed to investigate potential biases in both approaches. 21
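The job-switcher example can be made concrete with a small calculation. The wage levels and the function name below are hypothetical and chosen only to illustrate the point; they are not taken from FLEED.

```python
def implied_monthly_earnings(spells):
    """Average monthly pay implied by annual earnings summed over all job spells.

    spells: list of (monthly_wage, months_worked) tuples within one calendar year.
    """
    total_earnings = sum(wage * months for wage, months in spells)
    total_months = sum(months for _, months in spells)
    return total_earnings / total_months


# Six months in a low-pay job, then six months in a high-pay job in a new firm.
low_pay, high_pay = 2000, 4000
annual_based = implied_monthly_earnings([(low_pay, 6), (high_pay, 6)])
```

Here the annual earnings imply a monthly pay of 3,000, so the worker would look like a medium-pay hire in the new firm even though the wage actually paid there is 4,000.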