Sorting Based on Urban Heritage and Income: Evidence from the Amsterdam Metropolitan Area

Urban heritage is often concentrated in conservation areas with a protected status. Previous research argues that urban heritage especially attracts higher-educated households, who are likely to have higher incomes. The presence of these households may have a further impact on the attractiveness of the neighborhoods concerned, for instance through endogenous amenities like better shops or schools. If this is the case for high income households, conservation areas will have a further impact on the area's attractiveness through the demographic composition of the residential area. In this paper we investigate the interaction between the preference for urban heritage, as an exogenous amenity, and the preference for areas with a high concentration of high income households, as an endogenous amenity. We develop a logit-based sorting model in which different income groups interact and estimate it for the Amsterdam metropolitan area. Results show that all employed households highly value conservation areas and prefer to live in areas with a high concentration of high income households. We investigate the impact of urban heritage on house prices and welfare through counterfactual simulations. The disappearance of urban heritage would result in a substantially more suburbanized location pattern of the high income households in the Amsterdam metropolitan area, and in lower welfare for all income groups.


Introduction
Urban amenities are closely related to the current urban revival. The decentralization of employment, the improved possibilities for communication and the secular decrease in transport costs have weakened the strength of traditional forces behind the concentration of economic activities. Since the amenities that make a city attractive for consumers, such as shops, theatres and restaurants, remain localized, the 'consumer city' (Glaeser et al., 2001) becomes more important.
Moreover, Brueckner et al. (1999) suggest that the presence of urban amenities is an important driver of the location of high and low income households in urban areas. In their theory, the demand for these amenities is highly income elastic. This implies that the rich will tend to locate in the city center when it is amenity-rich, while they will otherwise prefer the suburbs where land is much cheaper. In their theoretical model, amenities are taken as exogenous as is plausible for historic city centers. Other urban amenities, like shops, restaurants and theatres, are affected by the composition of the population in a neighborhood. Through this route, urban heritage may have a further impact on the attractiveness of neighborhoods. To the extent that this effect is positive, this secondary effect of the exogenous amenities reinforces their primary impact.
The economic literature on heritage has mainly focused on the impact of designation of monuments and conservation areas. See Navrud and Ready (2002) and Noonan (2003) for surveys of the early literature. Examples are Coulson and Leichenko (2001), Coulson and Lahr (2005), and more recently Been et al. (2014), Ahlfeldt et al. (2017) and Koster and Rouwendal (2017). Recently there has been an increasing interest in cultural heritage in a wider urban economic context (Van Duijn and Rouwendal, 2013; Falck et al., 2015; Sheppard, 2015).
The relative location of high and low income households is an important issue in urban economics (see, for instance, Wheaton, 1977). The monocentric model, which is the workhorse of this literature, suggests that the ratio of the income elasticity of the demand for housing to that of the value of commuting time is the driving force of the spatial distribution of incomes in urban areas. If the value of time is roughly proportional to the wage (and hence income) and the income elasticity of housing demand is less than 1, as much of the literature suggests, the model predicts that the rich should live in the city center. Since many cities (in the US as well as elsewhere) do not confirm this prediction, other factors must be important. One possibility is the durability of housing, which tends to make older housing, which is often overrepresented in the central city, less suitable for high income households. 1 However, old houses are not necessarily inferior. For instance, in many European cities ancient buildings (the canal houses in Amsterdam are a clear example) are regarded as highly attractive urban heritage and are often inhabited by high income people. 2 This observation suggests that, under appropriate conditions, old housing may in fact contribute to the concentration of high income households in city centers.
Earlier work has documented for the San Francisco Bay Area that the attractiveness of neighborhoods for specific groups of households is influenced substantially by the presence of particular household groups. 3 The observed pattern confirms the sociological principle of 'homophily', that similarity breeds connection (see, for instance, McPherson et al., 2001). There are several possible mechanisms behind this phenomenon. One is that households belonging to a particular group like to meet similar households and such interaction is facilitated by physical proximity.
Another is that households with similar characteristics like the same type of amenities, such as schools, shops and restaurants with particular characteristics, and that these tend to emerge close to the concentrations of these households through the market or other allocation mechanisms as is suggested by the literature on Tiebout sorting. 4 The preference of households to live in neighborhoods where similar households are located interacts with the exogenous amenities and may therefore reinforce their impact. In other words, an overrepresentation of high income households in conservation areas will have a further impact on the area's attractiveness through preferences for demographic composition.
To investigate these issues, we develop and estimate a residential sorting model in which urban heritage, indicated by conservation areas, has a direct impact on the residential choice behavior of households, while we also incorporate preferences for demographic composition that may cause an additional impact of this amenity on neighborhood attractiveness for particular groups. Our model is of the 'horizontal,' logit-based type (see Kuminoff et al., 2013). This family of sorting models is closely related to the Berry, Levinsohn, and Pakes (1995) (BLP) model, which is a workhorse model in industrial organization and was first applied to study residential sorting by Bayer and his co-authors. 5
We make three methodological contributions to the literature on sorting models. First, we relax the assumption that all actors attach the same value to the unobserved characteristics of the choice alternatives. Second, we apply the methodology of Belloni, Chen, Chernozhukov, & Hansen (2012) to the choice of instruments for the price and the share of high income households in the sorting model. Third, we clarify the working of the instrument for the price used in Bayer, Ferreira, & McMillan (2007). These aspects of the paper are further discussed in section 2.
1 See Bond and Coulson (1989) for an example of this 'filtering' literature, and Chen and Rosenthal (2008), and Brueckner and Rosenthal (2009) for recent contributions.
2 Lazrak et al. (2014) find in a hedonic analysis on Dutch data that the prices of houses listed as monuments are 15-20% higher than those of otherwise comparable houses.
3 The classical references for these sorting effects are Schelling (1971, 1978).
4 See Tiebout (1956) and Epple et al. (1984; 1998) for seminal contributions.
5 See Bayer, McMillan & Rueben (2004), Bayer & Timmins (2005), and Bayer & McMillan (2012).
We find that households in all income groups attach a large value to urban heritage in their neighborhood, which we measure as the part of it included in a conservation area. The values we find are substantially larger than those reported in Van Duijn and Rouwendal (2013) who take the much larger municipalities as their spatial units of analysis. 6 Our results also suggest that all households prefer to live in neighborhoods with a high share of high incomes. Although we do not find large differences in willingness to pay for urban heritage by income, a counterfactual simulation suggests that without urban heritage the location pattern of the high income households would be substantially different from what it is now, with significant concentrations of this group in some suburban locales, a pattern that is reminiscent of location patterns in U.S. metropolitan areas.
Our study sheds new light on the interaction between the effect of an exogenous amenity, cultural heritage, and that of the endogenous share of high income households on the attractiveness of neighborhoods. High income households are attracted to areas with cultural heritage, which makes these areas even more attractive to other households, which leads to a multiplier effect.
Moreover, our simulations show that in the absence of cultural heritage in the city center, Amsterdam would look much more like a U.S. suburbanized metropolitan area, which confirms the idea that historical inner cities are an important background of the difference between European and American urban areas as was argued by Brueckner et al. (1999).
The remainder of this paper is structured as follows. Section 2 discusses the methodology concerning the residential sorting model, its issues, and the introduction of spatial elements in the model. Section 3 describes the data and discusses some descriptive statistics of household and neighborhood characteristics, which is followed by the estimation results in Section 4. In Section 5 we discuss the implications of the estimation results and Section 6 concludes.
6 Van Duijn and Rouwendal (2013) also develop a sorting model and estimate it on Dutch data, but there are a number of important differences with the present paper. They apply the model to all municipalities in the Netherlands, whereas we focus on the Amsterdam metropolitan area and use much smaller spatial units. They use survey data whereas the present paper uses administrative data. They do not consider the role of demographic composition, which is arguably less important at the municipal level, whereas this is a main focus of this paper. Moreover, they maintain the assumption that unobserved neighborhood characteristics have the same value for all households.

2 The location choice model

The first stage
Our methodology follows Berry, Levinsohn, and Pakes (1995), from now on BLP, who addressed a number of important issues in discrete choice models of market demand. A main innovation of the BLP paper was that the possible presence of unobserved characteristics of the choice alternatives could be dealt with explicitly. This clarified an endogeneity issue associated with the price variable that had long plagued the estimation of logit models of market demand. BLP showed that, under appropriate assumptions, the endogeneity could be dealt with by including alternative-specific constants representing the average utility attached to the alternatives in the logit model. These alternative-specific constants absorb the impact of the unobserved characteristics as well as the endogenous price and they are analyzed further in a second estimation step using methods for linear equations. 7
We assume that each consumer i (i=1…I) has preferences over N neighborhoods (j=1…N). These preferences refer to neighborhood characteristics that are not all observed by the researcher. Two observed characteristics that are of special importance are the neighborhood housing price, $P_j$, and the share of households belonging to a particular group in the neighborhood population, $S_j$. This group is special because it evokes a 'social interaction' effect in that the choices of these actors affect the choice behavior of others in a direct way. We denote the other observed characteristics as a vector $X_j$ and unobserved characteristics as a scalar $\xi_j$. Moreover we allow for idiosyncratic differences in preferences over alternatives, $\varepsilon_{ij}$.
The housing price and the share of households belonging to a particular group have a special status because they are determined by the model. The prices are pinned down by the market equilibrium condition that the share of actors choosing a choice alternative must equal the share of the housing stock present there. The choice behavior of the households belonging to the particular group affects the choice behavior of other groups in a direct way.
A restrictive assumption about the unobserved characteristics of the choice alternatives that, to the best of our knowledge, is imposed in all applications of BLP-type models is that they are evaluated identically by all actors. Since heterogeneity of preferences related to observed characteristics of the actors is generally found to be important in empirical work, this asymmetry is potentially restrictive. Below we generalize the BLP framework to a setup that allows a partial relaxation of this assumption by estimating separate models for subgroups of the population. That is, we assume that each consumer belongs to a group g (g=1…G) and that members of a given group have similar preferences. More specifically, we assume that the coefficients in the utility function referring to the price, the share of the special group and the unobserved neighborhood characteristics are identical for all members of a given group, while we allow the coefficients for the other neighborhood characteristics to be individual-specific. We specify the utility function for household $i$ as:
$$
\begin{aligned}
u_{ij} &= \left[\alpha^{g(i)} P_j + \beta^{g(i)} S_j + \gamma^{g(i)\prime} X_j + \xi_j^{g(i)}\right] + \left[\gamma_i^{\prime} X_j\right] + \varepsilon_{ij} \\
       &= \delta_j^{g(i)} + \mu_{ij} + \varepsilon_{ij},
\end{aligned}
\tag{1}
$$
where $g(i)$ denotes the group to which household $i$ belongs, and $\delta_j^{g(i)}$ and $\mu_{ij}$ in the second line are the first and second terms in square brackets, respectively, in the first line.
7 See Berry (1995) and BLP for further discussion. In these papers the price of the choice alternatives does not occur in the first (logit) stage of the estimation procedure.
When G=1, our specification of the utility function is the conventional one in which $\delta_j$ equals the average utility attached to alternative j in the population and $\mu_{ij}$ is the deviation of the deterministic part of the utility from the average. The obvious disadvantage of this specification is that it does not allow for heterogeneity in the evaluation of the endogenous variables $P_j$ and $S_j$ and in the unobserved characteristics $\xi_j$. Although some researchers include such variables in the deviation from the average $\mu_{ij}$, a drawback is that the coefficients of this term have to be estimated in the first step where the endogeneity of these variables cannot be addressed. Opinions on the importance of this issue differ, but it seems desirable to avoid it if possible.
For G>1 some heterogeneity in the evaluation of endogenous and unobserved characteristics is allowed: heterogeneity across the groups is possible, but within each group homogeneity is still required. This may be regarded as a modest generalization of the conventional specification, but a potentially important one. For instance, in the setting of this paper it allows us to estimate different coefficients for the housing price and the share of the special group without evoking the concerns about endogeneity that are associated with introducing these variables in the first stage of the estimation procedure.
To estimate the model, we start with estimating G logit models with alternative-specific constants. For each of these models the sum of the estimated probabilities $\pi_{ij}$, $i \in g$, that alternative j will be chosen by consumers in the group, divided by the total number of consumers in the group, $I^g$, equals the observed share of households in that group choosing alternative j, which we denote as $s_j^g$: 8
$$\frac{1}{I^g}\sum_{i \in g} \pi_{ij} = s_j^g, \tag{2}$$
which implies that the contraction mapping technique can be used. It follows also that, for the total population, the sum of the estimated probabilities that alternative j will be chosen, divided by the total population, is equal to the observed share of the total population that chooses j:
$$\frac{1}{I}\sum_{i=1}^{I} \pi_{ij} = s_j. \tag{3}$$
This property will be used when computing the instrument for the price in the second stage.
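The share-matching property of the alternative-specific constants is what makes the contraction mapping workable. As a minimal sketch (not the authors' code), assume the household-specific utility deviations of one group are already known and stored in a hypothetical array `mu`; the function names, the synthetic data and the normalisation on the first alternative are all illustrative assumptions. The constants are then found by repeatedly adding the log difference between observed and predicted shares:

```python
import numpy as np

def choice_probs(delta, mu):
    """Logit choice probabilities; delta has shape (J,), mu has shape (I, J)."""
    v = delta[None, :] + mu
    v -= v.max(axis=1, keepdims=True)        # guard against overflow in exp
    ev = np.exp(v)
    return ev / ev.sum(axis=1, keepdims=True)

def solve_constants(mu, observed_shares, tol=1e-12, max_iter=10_000):
    """Find constants such that mean predicted probabilities equal observed shares."""
    delta = np.zeros(mu.shape[1])
    for _ in range(max_iter):
        predicted = choice_probs(delta, mu).mean(axis=0)
        step = np.log(observed_shares) - np.log(predicted)
        delta += step                        # contraction-mapping update
        if np.max(np.abs(step)) < tol:
            break
    return delta - delta[0]                  # constants are identified up to a level

# Synthetic illustration (all numbers are made up).
rng = np.random.default_rng(0)
I, J = 500, 6
mu = rng.normal(scale=0.5, size=(I, J))      # hypothetical utility deviations
true_delta = np.array([0.0, 0.4, -0.3, 0.8, 0.1, -0.6])
observed_shares = choice_probs(true_delta, mu).mean(axis=0)

delta_hat = solve_constants(mu, observed_shares)
predicted_shares = choice_probs(delta_hat, mu).mean(axis=0)
```

In an actual estimation the deviations depend on coefficients that are estimated jointly with the constants, so a loop of this kind would sit inside the likelihood maximisation for each group.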
Before moving to a discussion of that stage it is useful to observe that the procedure just proposed requires the presence of enough observations of actors belonging to each of the groups to be able to estimate the model. At the very least, there should be one observation of each group choosing each alternative, but in practice one would like to have more. This is not always an innocuous requirement. For instance, low and high income households may be strongly overrepresented in some parts of urban areas and strongly underrepresented in others. Extreme sorting would therefore invalidate the procedure just outlined. However, apart from this requirement, no additional assumptions are necessary to enable this method to work.
To complete the description of the model, we let $H$ denote the set of actors belonging to the group causing social interaction. The shares $S_j$ are determined as:
$$S_j = \frac{\sum_{i \in H} \pi_{ij}}{\sum_{i=1}^{I} \pi_{ij}}. \tag{4}$$

The second stage
The second stage consists of G linear regressions:
$$\delta_j^g = \alpha^g P_j + \beta^g S_j + \gamma^{g\prime} X_j + \xi_j^g, \qquad g = 1,\dots,G, \tag{5}$$
where $\delta_j^g$ denotes the alternative specific constant for neighborhood j that has been estimated in the logit model referring to group g. The unobserved characteristics as evaluated by the various groups are the error terms in these equations. Since they may be correlated with the price and the share of high income households, these variables have to be instrumented.
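To illustrate why instrumenting matters in a regression of this kind, the following is a minimal two-stage least squares sketch for a single group. It is an assumption-laden toy example rather than the estimator used in the paper: the data are simulated, the instruments `z_price` and `z_share` are hypothetical, and the number of synthetic alternatives is far larger than the number of neighborhoods in the study so that the contrast with OLS is clearly visible.

```python
import numpy as np

def two_stage_least_squares(y, endog, exog, instruments):
    """2SLS: project endogenous regressors on [exog, instruments], then regress y."""
    W = np.column_stack([exog, instruments])              # full instrument matrix
    endog_hat = W @ np.linalg.lstsq(W, endog, rcond=None)[0]
    R = np.column_stack([endog_hat, exog])                # second-stage regressors
    return np.linalg.lstsq(R, y, rcond=None)[0]

# Simulated second stage (all coefficients are made up).
rng = np.random.default_rng(1)
J = 5000
xi = rng.normal(size=J)                                   # unobserved quality
x = rng.normal(size=J)                                    # exogenous characteristic
z_price = rng.normal(size=J)                              # hypothetical instruments
z_share = rng.normal(size=J)
price = z_price + 0.5 * xi + 0.5 * rng.normal(size=J)     # correlated with xi
share = z_share + 0.5 * xi + 0.5 * rng.normal(size=J)     # correlated with xi
delta = -1.0 * price + 2.0 * share + 0.5 * x + xi

exog = np.column_stack([np.ones(J), x])
endog = np.column_stack([price, share])
iv = two_stage_least_squares(delta, endog, exog, np.column_stack([z_price, z_share]))
ols = np.linalg.lstsq(np.column_stack([endog, exog]), delta, rcond=None)[0]
# Coefficient order in both vectors: price, share, constant, x.
```

Because price and share are positively correlated with the error term by construction, the OLS coefficients are pulled away from the true values, while the 2SLS estimates stay close to them.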
Instruments have been difficult to find. The literature has therefore relied on the use of nonlinear functions of the exogenous characteristics of the choice alternatives, following the lead of BLP. These authors argued that the characteristics of other alternatives are excluded (by economic theory) from the utility of a given alternative and can therefore be used to construct instruments of the price. They proposed sums of the characteristics of sets of these alternatives. The intuition is that in a market with imperfect competition -like the automobile market -the price of a given alternative is set with an eye on the characteristics of its competitors. For the housing market a similar logic applies as the willingness to pay for a house at a given location is determined by the presence of good alternatives.
The econometric context of this issue is the fact that conditional moment restrictions make it possible to use arbitrary functions of exogenous variables as instruments. Chamberlain (1987) proposed the choice of instruments that minimize the asymptotic variance and BLP argue that their choice may be interpreted as an approximation to such optimal instruments. Approximations are useful, because exact derivation of the optimal instruments is often hard in practice.
The literature contains a number of different suggestions for functions of characteristics that can be considered as useful approximations to the optimal instruments. For instance, Bayer et al. (2007) noted, in the context of residential sorting, that the utility of living in a particular neighborhood can be affected by amenities in contiguous neighborhoods, which invalidates the exclusion restriction for using such amenities as instruments. They therefore exclude characteristics of choice alternatives that are close (within 3 km), and propose the use of characteristics that are nearby, but not too close, as instruments (sometimes referred to as doughnut instruments).
Belloni et al. (2012) propose a method for selecting instruments from a large set of candidates. They furthermore show that the method allows for heteroscedasticity and present a procedure to deal with weak instruments. This method is potentially very useful as it offers the possibility to select the best of a potentially large number of instruments. Below, we use this method to select instruments from the large set of candidates that have been proposed in the literature.
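The doughnut construction described above can be sketched in a few lines. This is an illustrative assumption about how such instruments might be computed, not the procedure of any particular paper: the 3 km inner radius comes from the text, whereas the 10 km outer radius, the function name and the toy coordinates are hypothetical.

```python
import numpy as np

def doughnut_instruments(coords_km, X, inner_km=3.0, outer_km=10.0):
    """Sum characteristics of alternatives located in a ring around each alternative.

    coords_km: (J, 2) coordinates in km; X: (J, K) exogenous characteristics.
    Alternatives closer than inner_km (including the alternative itself) are
    excluded; outer_km is an assumed cutoff for 'nearby'.
    """
    diff = coords_km[:, None, :] - coords_km[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))               # pairwise distances
    ring = (dist > inner_km) & (dist <= outer_km)         # doughnut membership
    return ring.astype(float) @ X

# Four alternatives on a line at 0, 2, 6 and 20 km, with one characteristic.
coords = np.array([[0.0, 0.0], [2.0, 0.0], [6.0, 0.0], [20.0, 0.0]])
X = np.array([[1.0], [10.0], [100.0], [1000.0]])
Z = doughnut_instruments(coords, X)
```

For the first alternative only the one at 6 km falls inside the ring, so its instrument equals 100; the alternative at 20 km has no neighbors within the outer radius and gets 0.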

The housing price
In this subsection we pay attention to other instruments for the housing price and the share of a special group of households that have been proposed in the literature, but that use additional exogenous information. These instruments were developed in various papers with Bayer as a common co-author. Hence we refer to them as Bayer's instruments although other authors were involved in the various papers as well.
In the previous section we already mentioned Bayer et al. (2007)'s 'doughnut' version of the BLP instruments for the price associated with the choice alternatives. They use them for preliminary computations of the model and then switch to "a more powerful instrument by calculating the predicted vector of market clearing prices for a version of the model that sets the vector of unobserved characteristics to zero" (Bayer, Ferreira and McMillan, 2007, p. 621). To understand how this instrument works, it is important to note that the heterogeneity of the population plays an essential role in its functioning. To see this, we consider the case of a single group (G=1), which is the setting in which Bayer et al. (2007) work. Moreover, for the moment we also ignore the 'social interaction' effect associated with the presence of a special group of households. Without heterogeneity the average utility of an alternative would be its actual utility for all actors and in the first stage we would only estimate the alternative-specific constants. Using (5), while suppressing the superscript $g$, we can write the housing price of alternative $j$ as:
$$P_j = \frac{1}{\alpha}\left\{\delta_j - \gamma' X_j - \xi_j\right\}. \tag{6}$$
The counterfactual price when the vector of unobserved characteristics equals 0 is:
$$\tilde{P}_j = \frac{1}{\alpha}\left\{\delta_j - \gamma' X_j\right\}, \tag{7}$$
which would be useless as an instrument.
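What happens if the population is heterogeneous? In Appendix B we derive two equations that are analogous to (6) and (7): 10
$$P_j = \frac{1}{\alpha}\left\{\frac{1}{I}\sum_{i=1}^{I}\ln\sum_{m=1}^{N}\exp\left(\delta_m + \mu_{im}\right) + \frac{1}{I}\sum_{i=1}^{I}\ln \pi_{ij} - \frac{1}{I}\sum_{i=1}^{I}\mu_{ij} - \gamma' X_j - \xi_j\right\}, \tag{8}$$
$$\tilde{P}_j = \frac{1}{\alpha}\left\{\frac{1}{I}\sum_{i=1}^{I}\ln\sum_{m=1}^{N}\exp\left(\tilde{\delta}_m + \mu_{im}\right) + \frac{1}{I}\sum_{i=1}^{I}\ln \tilde{\pi}_{ij} - \frac{1}{I}\sum_{i=1}^{I}\mu_{ij} - \gamma' X_j\right\}, \tag{9}$$
where a tilde denotes a value in the counterfactual situation without unobserved characteristics. The first two terms in the braces on the right-hand side are the averages of the expected maximum utilities and the logged choice probabilities of alternative $j$ in the population. Since demand for housing has to be equal to supply of housing, we must have $\frac{1}{I}\sum_i \pi_{ij} = s_j$, that is, the average of the probability that $j$ will be chosen in the population must be equal to the share of the housing stock present in $j$. We can therefore rewrite $\frac{1}{I}\sum_i \ln \pi_{ij} = \frac{1}{I}\sum_i \ln\left(\pi_{ij}/s_j\right) + \ln s_j$. The first term on the right-hand side is (apart from the sign) an entropy measure of inequality (see, for instance, Shorrocks, 1980, p. 622 or Theil, 1967, p. 126-7).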
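The entropy interpretation can be made explicit with a short derivation. The notation here is an assumption where the text does not fix it: $\pi_{ij}$ is the probability that household $i$ chooses alternative $j$, $s_j$ is the share of the housing stock in $j$, and $I$ is the number of households.

```latex
\begin{align*}
\frac{1}{I}\sum_{i=1}^{I}\ln \pi_{ij}
  &= \frac{1}{I}\sum_{i=1}^{I}\ln\frac{\pi_{ij}}{s_j} + \ln s_j
   = -\,\underbrace{\frac{1}{I}\sum_{i=1}^{I}\ln\frac{s_j}{\pi_{ij}}}_{\text{mean logarithmic deviation}} + \ln s_j,
  \qquad s_j = \frac{1}{I}\sum_{i=1}^{I}\pi_{ij}, \\
\frac{1}{I}\sum_{i=1}^{I}\ln\frac{s_j}{\pi_{ij}}
  &\ge \ln s_j - \ln\!\left(\frac{1}{I}\sum_{i=1}^{I}\pi_{ij}\right) = 0
  \qquad \text{(Jensen's inequality)}.
\end{align*}
```

The mean logarithmic deviation is zero only when every household has the same probability of choosing $j$, which is the homogeneous case. Any dispersion in the choice probabilities across households therefore lowers the average logged probability below $\ln s_j$, and this dispersion is the channel through which the heterogeneity of the population enters the price.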
In the model, the heterogeneity in the choice probabilities is related to the heterogeneity of the characteristics of the actors. This distribution of the actor characteristics is exogenous information and thus can serve as the basis of an instrument. However, this information cannot be used in the way suggested by (8), because the choice probabilities depend on the unobserved characteristics. The suggestion of Bayer et al. (2007) is therefore to strip the unobserved characteristics from the model and compute the prices that will equilibrate demand and supply in that situation, given in (9). 11 Note that it is crucial to transform the information about the actor characteristics in such a way that it becomes alternative-specific. The differences in valuation of the characteristics of the choice alternatives determine the variation in choice behavior. Nonlinear interaction between the distribution of actor characteristics and the choice alternative characteristics is therefore crucial. The instrument uses the (counterfactual) choice probabilities for this purpose. The nonlinearity of discrete choice models is known to be a source of identification in itself, see Brock and Durlauf (2001), and this may contribute further to the usefulness of Bayer's instrument.
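A minimal sketch of how counterfactual market-clearing prices of this kind could be computed is given below. It rests on several assumptions that are not taken from the text: the price enters utility linearly with a single negative coefficient `alpha`, the household-specific deviations `mu` are given, social interaction is left out, and prices are normalised to have mean zero because only relative prices are pinned down in this toy setup.

```python
import numpy as np

def choice_probs(v):
    """Row-wise logit probabilities for a utility array of shape (I, J)."""
    v = v - v.max(axis=1, keepdims=True)
    ev = np.exp(v)
    return ev / ev.sum(axis=1, keepdims=True)

def price_instrument(alpha, gamma, X, mu, stock_shares, tol=1e-12, max_iter=10_000):
    """Prices that clear the market when unobserved quality is set to zero."""
    p = np.zeros(X.shape[0])
    for _ in range(max_iter):
        demand = choice_probs(alpha * p + X @ gamma + mu).mean(axis=0)
        # alpha < 0: excess demand gives a positive step, so the price rises.
        step = (np.log(stock_shares) - np.log(demand)) / alpha
        p += step
        if np.max(np.abs(step)) < tol:
            break
    return p - p.mean()                                   # assumed normalisation

# Synthetic illustration (all numbers are made up).
rng = np.random.default_rng(2)
I, J = 400, 5
X = rng.normal(size=(J, 2))
gamma = np.array([0.5, -0.2])
alpha = -1.5
mu = rng.normal(scale=0.7, size=(I, J))                   # heterogeneous tastes
stock_shares = np.array([0.10, 0.30, 0.20, 0.25, 0.15])

p_iv = price_instrument(alpha, gamma, X, mu, stock_shares)
demand_at_iv = choice_probs(alpha * p_iv + X @ gamma + mu).mean(axis=0)

# Homogeneous population: the result collapses to a linear function of X.
p_homog = price_instrument(alpha, gamma, X, np.zeros((1, J)), stock_shares)
closed_form = (np.log(stock_shares) - X @ gamma) / alpha
```

With homogeneous households the computed prices coincide, up to the normalisation, with a linear combination of the logged stock shares and the observed characteristics, which illustrates why heterogeneity is essential for the instrument to carry additional information.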

Related issues
The presence of a social interaction effect complicates the computation of the price instrument. If it were included in the model when the counterfactual equilibrium prices are computed, one can no longer maintain that the instrument is based only on exogenous information. This problem can be solved by leaving the social interaction effect out of the model when computing the price instrument. That is, $\beta$ is set equal to 0 in this computation. Although this may lower the correlation between the instrument and the observed prices, the presence of all other characteristics will probably be sufficient to keep its value sufficiently high for the instrumental variable to work.
10 To verify the analogy, note that with homogeneous actors we have $\pi_j = \exp(\delta_j)/\sum_m \exp(\delta_m)$, which implies that $\delta_j = \ln \pi_j + \ln \sum_m \exp(\delta_m)$, and note that $\ln \sum_m \exp(\delta_m)$, the 'logsum', gives the expected utility of the actors.
11 See Bernasco et al. (2017) for the use of a similar instrument for criminal activity conditional upon neighborhood choice.
Another issue is that we also need an instrument for the shares of the actors belonging to the special group that causes the social interaction effect. We propose to construct it by, again, using the model with all $\xi_j$s set equal to zero and the prices set equal to the instrument values. That is, we compute the instrument values as the shares of the special group implied by the counterfactual situation in which there are no unobserved characteristics, prices are set equal to their instrument values and the social interaction effect is also set equal to zero. In other words, the instrument for the shares consists of the shares of the special group associated with the counterfactual situation to which our price instrument refers. This instrument thus uses the same information as Bayer's price instrument, but in a different way. What we do is analogous to using a nonlinear function of one or more exogenous characteristics to find new candidate instruments.
The procedure for computing the two instruments can be refined somewhat through an iterative process. That is, one can substitute the instrument values for the $S_j$s into the choice probabilities and set $\beta$ at its appropriate value. This will distort the market equilibrium on which the price instrument has been based. Prices can then be re-adjusted to find a second counterfactual equilibrium in which the values of the instrument for the shares of the social interaction group have been taken into account. The implied choice probabilities can then be used to compute a second version of the instrument for these shares. And so on, until convergence. This is the procedure we used for computing Bayer's instruments as they have been used in the selection procedure discussed below. 12

Other issues
In this subsection we deal with some remaining issues in the specification of our model. One is the specification of the heterogeneity in the evaluation of neighborhood characteristics by households belonging to the same group. A second concerns the possibility that some characteristics may have an impact that transcends the boundaries of the neighborhood in which they are located.
The heterogeneity in preferences within groups is embodied in the terms $\gamma_i$, which are household-specific. The dimension of this vector is equal to the number of neighborhood characteristics $K$. We assume that its elements $1 \dots K$ are linear functions of the characteristics of the households. Denoting the value of characteristic $l$ of household $i$ as $z_{il}$, we postulate:
$$\gamma_{ik} = \sum_{l=1}^{L} \gamma_{kl}^{g} z_{il}, \tag{10}$$
where $L$ is the number of household characteristics. Note that we allow this function to be specific for the group to which the individual belongs. It follows then that we can write the second term in square brackets on the right-hand side of (1) as:
$$\mu_{ij} = \sum_{k=1}^{K}\sum_{l=1}^{L} \gamma_{kl}^{g} z_{il} X_{kj}.$$
In the first stage of the estimation procedure we estimate the coefficients $\gamma_{kl}^{g}$ jointly with the alternative-specific constants $\delta_j^g$ for each group $g$. This provides a fairly flexible way for dealing with heterogeneity in household preferences.
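The household-specific part of utility is a double sum over neighborhood and household characteristics, which can be computed for all households and neighborhoods of a group in one step. The sketch below is only an illustration under assumed array layouts: `Z` holds household characteristics with shape (I, L), `X` holds neighborhood characteristics with shape (J, K), and `Gamma` holds the interaction coefficients with shape (K, L). All names and numbers are hypothetical.

```python
import numpy as np

def utility_deviation(Z, X, Gamma):
    """mu[i, j] = sum over k and l of Gamma[k, l] * Z[i, l] * X[j, k]."""
    return np.einsum("il,kl,jk->ij", Z, Gamma, X)

# Two households with two characteristics, two neighborhoods with one characteristic.
Z = np.array([[1.0, 0.0],
              [0.0, 2.0]])
X = np.array([[1.0],
              [3.0]])
Gamma = np.array([[0.5, -1.0]])

mu = utility_deviation(Z, X, Gamma)
```

The first household values the neighborhood characteristic at 0.5 per unit and the second at -2 per unit, so `mu` contains 0.5 and 1.5 in its first row and -2 and -6 in its second row. The same result is obtained from the matrix product `Z @ Gamma.T @ X.T`.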
Since our neighborhoods are relatively small, it is probable that their attractiveness is determined in part by amenities in other neighborhoods in the proximity. For instance, having the ancient Amsterdam city center within walking distance may still be experienced as an attractive property of a neighborhood, even though it does not belong itself to that center. We therefore allow for the possibility that some neighborhood characteristics are indicators of amenities in different, but close-by, neighborhoods. More specifically, if amenity k in neighborhood j affects the well-being of the inhabitants of surrounding neighborhoods, we define characteristic k' of such a neighborhood j' as a distance-weighted average of amenity k in other neighborhoods:
$$X_{k'j'} = \sum_{j \in N_{j'}} w_{jj'} X_{kj}, \tag{11}$$
where $N_{j'}$ is the set of neighborhoods in the proximity of j' and the weight $w_{jj'}$ decreases exponentially with the distance between j and j'. This 'potential' formulation was also employed by Van Duijn and Rouwendal (2013), who used the much larger municipality as their spatial unit of analysis. Clearly, $X_{k'j'}$ in (11) can be interpreted as a spatial lag of $X_{kj}$ with exponential weights. 13, 14
13 We only use the spatial lag for exogenous neighborhood characteristics.
14 The distance decay coefficient, φ, is set at 0.2. The function is therefore exponentially decreasing and weights are going towards zero when distance increases (weight < 0.1 if distance is 5 km). The cutoff point is set at 5 km.
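A spatial lag of this type can be sketched as follows. The text gives the decay coefficient (0.2) and the cutoff (5 km) but not the exact weight function or any normalisation, so the weight `exp(-phi * distance)` used here is an assumption and may differ from the weights actually applied; the function name and the toy distances are hypothetical as well.

```python
import numpy as np

def spatial_lag(dist_km, amenity, phi=0.2, cutoff_km=5.0):
    """Distance-weighted sum of an amenity in other neighborhoods.

    dist_km: (J, J) matrix of pairwise distances; amenity: (J,) vector.
    Assumed weight: exp(-phi * distance), zero beyond the cutoff and
    zero for the neighborhood itself.
    """
    w = np.exp(-phi * dist_km)
    w[dist_km > cutoff_km] = 0.0             # neighborhoods beyond the cutoff
    np.fill_diagonal(w, 0.0)                 # exclude the own neighborhood
    return w @ amenity

# Three neighborhoods; the first and third are 6 km apart (beyond the cutoff).
dist = np.array([[0.0, 2.0, 6.0],
                 [2.0, 0.0, 4.0],
                 [6.0, 4.0, 0.0]])
heritage = np.array([1.0, 0.0, 2.0])         # e.g. km² of conservation area
lag = spatial_lag(dist, heritage)
```

Only the middle neighborhood has heritage within 5 km in this example, so it is the only one with a positive spatial lag. Dividing each row of the weight matrix by its sum would turn the weighted sum into a weighted average.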

3 Data and descriptive statistics

Urban conservation areas
We estimate the model on microdata for the Amsterdam metropolitan area. The historic city center of Amsterdam and its canal belt are world-famous and were almost completely listed as World Heritage by UNESCO in 2010. The center has many urban amenities like shops, restaurants and theatres, and has a cosmopolitan atmosphere which is regarded by many as very attractive. It is also a very popular residential area with high house prices. The ancient canal houses are still highly appreciated for residential purposes and only affordable by the rich.
In the Netherlands, the parts of urban areas that are regarded as being exceptionally
If the preferences of the households coincide (at least to a substantial extent) with the expert judgments that are behind the listing as conservation areas, one should expect that they are more attractive than non-listed areas. If the households that choose to live there are predominantly rich, they become overrepresented in these areas and if the composition of the neighborhood population is one of its relevant attributes for location choice, this will have a further impact on the housing market equilibrium in the area. Ignoring this effect when it is present may lead to biased estimates of the coefficients of a sorting model as the omitted variable is probably correlated with neighborhood characteristics that are included, like urban heritage and house prices. The main purpose of our empirical work is to disentangle the effects of the urban heritage per se and that of the related sorting by income.
15 In Dutch: Beschermde stadsgezichten.
16 In Dutch: Rijksdienst voor het Cultureel Erfgoed. It is part of the Ministry of Education, Culture and Science.
17 The National Park Service is a government office of the United States Department of the Interior. Note that the criteria for designation as a conservation area differ between countries.

The Amsterdam metropolitan area
The study area consists of the municipality of Amsterdam and a number of surrounding municipalities. The spatial unit we use is that of the neighborhood 18 which is considerably smaller than a municipality.  Figure A.1 in Appendix A shows a map of the resulting areas. Below, we will still refer to these 85 choice alternatives as neighborhoods.
The boundaries of our (aggregated) neighborhoods do not always coincide with those of the conservation areas. We use the size of the area inside the boundaries of a neighborhood that belongs to a conservation area as an indicator of the amount of urban heritage in that neighborhood. For instance, the historic city center of Amsterdam is large, 679 hectares (6.79 km²), 20 and contains the canals, many gabled houses and numerous other monuments. There are ten neighborhoods that cover a part of it. The one with the largest part of the Amsterdam historic city center is Nieuwmarkt en Lastage (1.03 km²).
Figure 3 shows the share of the high income households, defined as the 25% of households with the highest incomes, per neighborhood in the study area. The left panel refers to all households and it suggests that the rich are underrepresented in the city center. Although this may give the impression that Amsterdam is more similar to American cities than one would have expected, the background is entirely different. Housing policy in the Netherlands has for a long time emphasized the construction of social housing, especially in the large cities. Even in the center of Amsterdam, the share of social housing in the total stock is large, and since this sector is only accessible to low-income households, this has a substantial impact on the share of high income households in our study area. 21 It is therefore also of interest to look at the share of high income households in the owner-occupied sector only, and this is what panel b) of Figure 3 does. Now the picture is different, and the share of high income households in some of the central area neighborhoods is higher than in the surrounding neighborhoods.
As will be clear from this discussion, the owner-occupied and rental sectors of the Dutch housing market differ much. While the market mechanism determines allocation in the former part, rent control and the associated excess demand and queuing are dominant in the latter. 22 We will take this difference into account when estimating the sorting model by treating rental and owner-

Data
To estimate the sorting model we need information about households and neighborhoods.
Administrative data on households was provided by Statistics Netherlands (CBS). The information refers to 2008 and contains approximately 600,000 households spread over the study area. We select households with at least one employed member and divide this population into three groups of equal size on the basis of income: high, middle and low income. For computational reasons, we take a random sample of these remaining 329,701 households based on the number of observations per choice alternative. For each choice alternative which contains 100 or more of each income group we take a random sample. This leaves us with a dataset in which we observe a minimum of 30 households from each income group per choice alternative. In total this sample contains 86,663 households, which is 26% of the population of households in which at least one member is employed. 23
Table 1 shows an overview of the household and neighborhood characteristics. Household characteristics include income, 24 housing tenure, composition (whether the individuals within the household are a couple and whether the household has children under the age of 18), age and neighborhood of residence. An important limitation of the CBS data is that they do not contain information about education levels. This is well known to be an important variable as it appears that especially young higher educated people appreciate urban heritage and the associated urban amenities. Although income and education are in general strongly correlated, this is less true for the younger households. Neighborhood characteristics include, apart from the conservation area, the price of a standard house, the share of high income households in the population of the neighborhood, distance to the nearest concentration of 100,000 jobs, distance to the nearest intercity train station, size of nature and water, and the average age of the housing stock. The price of a standard house is a hedonic price index based on transaction data from the Dutch Association of Real Estate Agents (NVM). We construct the index on the basis of a straightforward log-linear hedonic price regression that includes neighborhood dummies. This index gives the price of an owner-occupied house with a given bundle of characteristics. This enables us to compare prices within neighborhoods. The average of these prices is around €230,000.
22 The allocation system for public housing is based on choice-based lettings. The private rental sector in the Netherlands is negligibly small.
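The construction of the standard-house price index can be illustrated with a minimal sketch. It assumes synthetic transactions, a single house characteristic (log floor area) and a 100 m² reference house; none of these choices, nor the variable names, come from the NVM data or from the authors' specification.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic transactions (hypothetical, not NVM data).
n_obs, n_nbh = 600, 3
nbh = rng.integers(0, n_nbh, size=n_obs)              # neighborhood of each sale
log_area = np.log(rng.uniform(50, 200, size=n_obs))   # log floor area in m2
true_nbh_effect = np.array([12.0, 12.3, 11.8])        # assumed log price levels
log_price = (true_nbh_effect[nbh]
             + 0.7 * (log_area - np.log(100.0))
             + rng.normal(0.0, 0.1, size=n_obs))

# Log-linear hedonic regression: one dummy per neighborhood (no separate
# constant) plus the house characteristic, centered at the reference house.
dummies = np.eye(n_nbh)[nbh]
X = np.column_stack([dummies, log_area - np.log(100.0)])
coef, *_ = np.linalg.lstsq(X, log_price, rcond=None)

# The dummy coefficients are the log prices of the reference house, so their
# exponentials give a price index that is comparable across neighborhoods.
standard_price = np.exp(coef[:n_nbh])
print(standard_price.round(-3))
```

Because the house characteristic is centered at the reference house, each neighborhood dummy can be read directly as the log price of that house in that neighborhood.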

Estimation results
This section reports and discusses the results of the two estimation steps of the sorting model for neighborhoods in the Amsterdam area. Because allocation of housing in the rental sector deviates from the market mechanism, we cannot regard the observed choices in this segment of the market as revealing the household's preferences in the same way as this occurs in a market setting. We have therefore estimated separate sets of coefficients for the owner-occupied and rental sectors and will only report estimation results of the former below. 25

First step estimation results
The first step of our residential sorting model involves the estimation of an MNL model for each of the three income groups. The utility function has been extensively discussed in Section 2 (see Equations 1 and 5), which give the utility that household i of group g attaches to alternative j. We estimate the alternative-specific constants for every alternative and income group, as well as the coefficients of the interactions between household and neighborhood characteristics. As discussed before, we do not include the possibly endogenous variables housing price and the share of high income households in the MNL model. Their impact will be absorbed by the alternative-specific constants. Also, separate sets of coefficients were estimated for alternatives referring to rental and owner-occupied housing.
Note: Parameter estimates are used to calculate the deviations from the mean indirect utility with all variables normalized to have mean zero. Significance at the 90%, 95% and 99% levels is indicated by *, **, and ***, respectively. The first step results for renters can be found in Table D.1 of Appendix D. The regression results based on other specifications can be obtained from the author.
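A minimal sketch of the first step for one income group is given below. It assumes a utility of the form constant plus household-by-neighborhood interactions, treats the interaction coefficients as given rather than estimating them by maximum likelihood, and recovers the alternative-specific constants with a share-matching contraction of the kind used in the sorting literature. All names and numbers are hypothetical.

```python
import numpy as np

def choice_probabilities(delta, beta, Z, X):
    """MNL probabilities, households in rows and alternatives in columns.

    delta: (J,) alternative-specific constants
    beta:  (L, K) coefficients on household x neighborhood interactions
    Z:     (N, L) household characteristics, demeaned within the group
    X:     (J, K) exogenous neighborhood characteristics
    """
    v = delta[None, :] + Z @ beta @ X.T       # systematic utility, (N, J)
    v -= v.max(axis=1, keepdims=True)         # guard against overflow
    expv = np.exp(v)
    return expv / expv.sum(axis=1, keepdims=True)

def fit_constants(shares, beta, Z, X, tol=1e-12, max_iter=1000):
    """Adjust the constants until predicted shares equal observed shares."""
    delta = np.zeros(len(shares))
    for _ in range(max_iter):
        predicted = choice_probabilities(delta, beta, Z, X).mean(axis=0)
        step = np.log(shares) - np.log(predicted)
        delta = delta + step
        if np.max(np.abs(step)) < tol:
            break
    return delta - delta[0]                   # normalize: first constant is 0

# Synthetic check: recover known constants from the shares they imply.
rng = np.random.default_rng(1)
N, J, L, K = 500, 4, 2, 3
Z = rng.normal(size=(N, L))
Z -= Z.mean(axis=0)
X = rng.normal(size=(J, K))
beta = 0.3 * rng.normal(size=(L, K))
true_delta = np.array([0.0, 0.5, -0.4, 1.0])
shares = choice_probabilities(true_delta, beta, Z, X).mean(axis=0)
delta_hat = fit_constants(shares, beta, Z, X)
```

The constants are only identified up to an additive shift, which is why the sketch normalizes the first one to zero.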

Second step estimation results
The second step of the residential sorting model consists of three 2SLS estimations, one for each income group. The candidate instruments and the results of the LASSO selection are further discussed in Appendix C. 26 We initially limited the set of instruments to one group of candidates at a time to see how the various suggestions for candidate instruments that have been made in the literature work out for our data. We also provide results for the complete set of candidate instruments. Since we have two endogenous variables, the price and the share of high income households, we carried out two instrument selection procedures. In general this resulted in the choice of different instruments. In the second stage we used the union of the sets of the selected instruments for the two variables as instruments.
Note: The dependent variable represents the alternative-specific constants estimated in the first step of the sorting model. Standard errors are in parentheses. Significance at the 90%, 95% and 99% levels is indicated by *, **, and ***, respectively.
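The use of the union of the two selected instrument sets can be sketched as follows. The data are synthetic, the "selected" index sets are simply assumed rather than produced by the LASSO procedure, and the sample is much larger than the 85 neighborhoods of the paper so that the illustration is stable.

```python
import numpy as np

def two_sls(y, exog, endog, instruments):
    """2SLS: project the endogenous regressors on the exogenous regressors
    and the instruments, then regress y on exog and the fitted values."""
    W = np.column_stack([exog, instruments])
    fitted = W @ np.linalg.lstsq(W, endog, rcond=None)[0]
    R = np.column_stack([exog, fitted])
    return np.linalg.lstsq(R, y, rcond=None)[0]

rng = np.random.default_rng(2)
n = 5000
Z = rng.normal(size=(n, 4))      # four hypothetical candidate instruments
u = rng.normal(size=n)           # unobserved quality, source of endogeneity
price = Z[:, 0] + 0.5 * Z[:, 1] + u + rng.normal(size=n)
share = Z[:, 1] + Z[:, 2] + 0.5 * u + rng.normal(size=n)
y = 1.0 - 1.0 * price + 2.0 * share + u + rng.normal(size=n)

# Assumed outcome of two separate selection runs, one per endogenous variable.
selected_for_price = {0, 1}
selected_for_share = {1, 2}
union = sorted(selected_for_price | selected_for_share)

const = np.ones((n, 1))
coef = two_sls(y, const, np.column_stack([price, share]), Z[:, union])
print(coef.round(2))             # constant, price coefficient, share coefficient
```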
The results in Column (6) of Table 3 show that, for all income groups, households appear to appreciate living close to high income households. The coefficient is significantly higher for high income households than for low income households. This result confirms that of  who found self-segregation based on income and ethnicity while all households prefer to live close to high income households. This suggests it is more likely that low income households are pushed out of gentrifying neighborhoods, rather than leaving them because they dislike their changing demographic composition.
The coefficients of conservation areas are positive and significant for all income groups. Somewhat unexpectedly, the coefficients do not differ significantly between income groups. A priori, we expected that the average high income household would have a stronger preference for living close to conservation areas than the average medium or low income household. This result suggests that income might not affect the preference for living close to conservation areas, but, again, that high income households are likely to push out the other income groups because they can spend more on housing. However, results from the first step discussed in Section 4.1 suggest that other household characteristics, such as marital status and age, are also important in determining the preference for conservation areas.
We observe similar results for the spatial lag of conservation areas. The positive and significant coefficients show that conservation areas have a positive external effect on the attractiveness of locations just outside the neighborhoods in which they are located. This result is in line with Ahlfeldt et al. (2017) and Been et al. (2017). It shows that spatially lagged exogenous variables are important to include in location choice models. Households do not only enjoy the characteristics from their own neighborhood but also those from surrounding neighborhoods.
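A spatially lagged variable of this kind can be sketched as a distance-weighted sum over the other neighborhoods. The inverse-distance weights used here are an assumption made for illustration; the weighting scheme actually used by the authors is not stated in this section.

```python
import numpy as np

def spatial_lag(values, coords):
    """Inverse-distance weighted sum of a characteristic over all other
    neighborhoods (hypothetical weighting scheme)."""
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))   # pairwise distances in km
    np.fill_diagonal(dist, np.inf)            # own neighborhood gets weight 0
    return (1.0 / dist) @ values

# Three neighborhoods on a line, 1 km apart; only the first has heritage.
coords = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0]])
conservation_km2 = np.array([1.0, 0.0, 0.0])
lag = spatial_lag(conservation_km2, coords)
```

With these weights, one square kilometer of conservation area counts fully for a neighbor at 1 km and for half at 2 km.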
Only a few of the other neighborhood characteristics that we include in our location choice model are also important for the location choice of households. The coefficients for the distance to the nearest concentration of 100,000 jobs and for the distance to the nearest intercity train station show a negative but insignificant effect. This is not too surprising as the Amsterdam area can be regarded as one labor market with good accessibility to public transport. For nature we find the expected positive signs of the estimated coefficients, but they are not significant. For water, we find inconclusive results for the main effect but a negative and highly significant effect for its spatial lag.
Both amenities are relatively abundantly present in the suburban areas constructed in the postwar period that are less appreciated and we therefore suspect that these variables pick up the presence of the associated negative amenities. 27 The age of the housing stock should be expected to absorb this effect to some extent. The coefficients have a positive sign, indicating that older neighborhoods are in general more attractive. This is in contrast to most US city centers where low income households seem to occupy old homes (Rosenthal, 2008).

Implications
In this section, we consider the implications of our estimation results reported in Section 4. The sorting model allows us to calculate the marginal willingness-to-pay (MWTP) of each type of household that we included in the analysis. These figures give a clear overview of the impact of different neighborhood characteristics on the location choice of heterogeneous households with respect to the price of a standard house. Furthermore, the sorting model also allows us to do a counterfactual analysis. The general equilibrium property, where housing demand has to match the housing supply, enables us to show how prices of a standard house change when we change one of the neighborhood or household characteristics. 28 We report changes in the price of a standard house for several areas within the Amsterdam metropolitan area if there were no differences in the availability of conservation areas among all neighborhoods in the Amsterdam area. 29

Marginal willingness-to-pay
The estimation results reported in Section 4 enable us to calculate the MWTP of heterogeneous households for neighborhood characteristics (see Appendix E for technical details). This allows us to compare the MWTP between the income groups and neighborhoods. Column 1 of Table 4 reports the average MWTP of the three income groups. For the high income group the figures represent the MWTP of the average high income household. Columns 2 through 4 report the deviations from the average of each income group for couples, households with children and age.
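As a hedged sketch of how MWTP figures of this kind are commonly obtained in sorting models with a log-price specification (the authors' own derivation is in Appendix E and may differ), assume that the mean utility of group g for neighborhood j is linear in the log price and in characteristic k. The symbols \alpha^g and \beta_k^g are hypothetical names for the two coefficients. Holding utility constant and differentiating gives:

```latex
\delta_j^g = \alpha^g \ln p_j + \beta_k^g x_{kj} + \dots
\qquad\Longrightarrow\qquad
\mathrm{MWTP}_k^g
  = \left.\frac{\partial p_j}{\partial x_{kj}}\right|_{d\delta_j^g = 0}
  = -\frac{\beta_k^g}{\alpha^g}\, p_j
```

With a negative price coefficient, a positively valued characteristic yields a positive MWTP, expressed in euros of the price of a standard house.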
The mean MWTP, in terms of higher prices for a standard house, for an additional percentage point of high income households in the neighborhood is largest for high income households (€2,666 per percentage point). As we discussed above, we did not allow for differences in the MWTP within the three income groups to prevent endogeneity issues. The differences between the groups that we find are small. Low income households are still willing to pay around €1,700 to live in a neighborhood with a one percentage point higher share of high income households.
The MWTP of an average high income household for living inside conservation areas is large and significant (€31,183 for an additional km² of conservation area). This implies that the average high income household that has to pay this amount extra for a house in a neighborhood with an extra square kilometer of conservation area still reaches the same utility as when living in an otherwise equal neighborhood without this amenity. 30 The figures for the other income groups are somewhat lower but of the same order of magnitude. The figures are much higher than those reported in Van Duijn and Rouwendal (2013) who used much larger spatial units. Their results refer to living in the municipality of Amsterdam instead of another municipality, whereas the results reported here allow us to differentiate between various neighborhoods within the municipality.
28 Note that our model explains only relative house prices. We therefore assume that the average house price remains unchanged in counterfactual simulations.
29 Similar interpretations are the change in prices if there were no conservation areas in the Amsterdam area or if no household valued conservation areas.
30 Note that the ceteris paribus condition involved refers also to the random part of the utility function.
Living close to, rather than inside, conservation areas also contributes significantly to wellbeing. The mean MWTP is highest for high income households (€9,637). This number can be interpreted as the amount, in terms of the price of a standard house, that the average household is willing to pay for an extra square kilometer of conservation area in surrounding neighborhoods, where the distance between adjacent neighborhoods is 1 km (the average distance between the cores of neighborhoods in the Amsterdam area is somewhat lower than 1 km).
The interpretation of the mean MWTP figures of the other neighborhood characteristics is similar. Deviations from the mean of each group are shown in Columns 2 through 4 of Table 4. Their interpretation can be clarified as follows: if a household belongs to the high income group and is a couple with children under 18, while the age of the head of the household is equal to the average in this group, its MWTP for an additional square kilometer of conservation area is €3,963 (= €1,986 + €1,977) lower than that of the average high income household, a reduction of around 13%.
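The arithmetic of this example can be checked with a few lines. The figures are those quoted in the text; assigning the first deviation to couples and the second to children is an assumption based on the order in which they are listed.

```python
mean_mwtp = 31_183          # euros per km2, average high income household
deviations = {
    "couple": -1_986,       # assumed to be the first quoted deviation
    "children": -1_977,     # assumed to be the second quoted deviation
    "age": 0,               # head of household at the group-average age
}

total_deviation = sum(deviations.values())
household_mwtp = mean_mwtp + total_deviation
relative_change = 100 * total_deviation / mean_mwtp

print(total_deviation)            # -3963
print(household_mwtp)             # 27220
print(round(relative_change, 1))  # -12.7
```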
These results show that there is large heterogeneity between different types of households.
On average, high income couples seem to prefer areas outside the historic center and further from the intercity station where the housing stock is younger compared to the average high income household. Middle income couples seem only to prefer to live further from the intercity station and in areas with water compared to the average middle income household. Low income couples seem not to differ much from the average low income household. High income households with young children prefer to live in neighborhoods outside conservation areas with more green, further away from the labor market but close to an intercity station compared to the average high income household. Middle and low income households with young children prefer younger neighborhoods compared to the average household of that group. The age component seems to play a large role as well. Note that we did not include retired households in the sample, so the positive deviations from the mean for living in conservation areas are not surprising. Depending on the income group, the MWTP for living in conservation areas of households whose head is 10 years older than the average is around €11,000 to €13,000 higher in terms of house prices (around a 35% increase from the mean MWTP). Similar positive, but slightly smaller, numbers are found for the presence of nature and water. This suggests that older households who are still working have a stronger preference for residing in conservation areas, preferably with a lot of green and water, than younger households.
Some of the effects we find are not monotone in income. Low income households are sometimes more similar to high income households than to medium income households. This may be related to our lack of information about education that was noted earlier and the attractiveness of Amsterdam for highly educated young people, who often do not (yet) have a high income. More generally, it may have to do with the attraction that a cosmopolitan city like Amsterdam exerts on specific groups of people who like the special atmosphere of the city although living there does not enable them to reach a particularly high income.

Impact on house prices
The sorting model suggests that house prices react to changes in amenities. The general equilibrium property of the sorting model allows us to estimate the changes in house prices when the amount of a neighborhood amenity changes. We have carried out a counterfactual simulation in which we compute the price of a standard house that would prevail if there were no differences in the availability of conservation areas between the neighborhoods in the Amsterdam area. We set conservation areas at zero in all neighborhoods. 31 Evidently, the spatial spillovers of conservation areas will also disappear but all other neighborhood characteristics remain unchanged.
Note: The predicted house prices, which take into account the general equilibrium property of the sorting framework and the scaling, are reported for a counterfactual simulation that sets all urban heritage to zero.
We use the following procedure. First, we compute the new price equilibrium while keeping the shares of high income households constant in the demand equations. 32 At this price equilibrium the shares of high income households will differ from what they were originally. The second step is to adjust these shares and re-compute the price equilibrium. This procedure is continued until the shares of high income households no longer change. In each step the house prices are scaled so as to keep the average housing price equal to its value in the situation where urban heritage was present. 33
Stadsdeel Zuid is outside of the historic city center but, because of the large spatial spillover effects of the historic city center, house prices in this neighborhood will also decrease. The gap between the price of a standard house in Amsterdam and elsewhere decreases. However, the prices in the city center will still be higher than in most other areas in Amsterdam due to the attractive central location close to the Central station and the accessibility to the job market.
31 Results will not change if we set the variable at any other value, e.g. the average over all neighborhoods. The important thing is that there are no differences in this amenity between the neighborhoods.
32 The price equilibrium computed in this way is unique (see, for instance, Rouwendal, 1990).
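The iterative procedure (clear the market at fixed shares, update the shares, repeat, and rescale prices) can be sketched on a toy economy. Everything below is hypothetical: three neighborhoods, three equally sized income groups, made-up coefficients, and no unobserved quality or rental segment. It is not the authors' implementation.

```python
import numpy as np

def group_demand(log_p, heritage, share_high, xi, params, sizes):
    """Expected demand per group (rows) and neighborhood (columns)."""
    demand = []
    for (a, b, c), n in zip(params, sizes):
        v = a * log_p + b * heritage + c * share_high + xi
        ev = np.exp(v - v.max())
        demand.append(n * ev / ev.sum())
    return np.array(demand)

def clear_market(log_p, heritage, share_high, xi, params, sizes, supply,
                 tol=1e-10):
    """Adjust log prices until total demand equals supply everywhere."""
    scale = max(abs(a) for a, _, _ in params)
    for _ in range(10_000):
        total = group_demand(log_p, heritage, share_high, xi,
                             params, sizes).sum(axis=0)
        excess = np.log(total) - np.log(supply)
        if np.max(np.abs(excess)) < tol:
            break
        log_p = log_p + excess / scale    # raise prices where demand is high
    return log_p

def equilibrium(heritage, xi, params, sizes, supply, mean_price, tol=1e-8):
    """Outer loop on the share of high income households (group 0)."""
    J = len(supply)
    share_high = np.full(J, sizes[0] / sum(sizes))
    log_p = np.zeros(J)
    for _ in range(1_000):
        log_p = clear_market(log_p, heritage, share_high, xi,
                             params, sizes, supply)
        demand = group_demand(log_p, heritage, share_high, xi, params, sizes)
        new_share = demand[0] / demand.sum(axis=0)
        if np.max(np.abs(new_share - share_high)) < tol:
            break
        share_high = new_share
    prices = np.exp(log_p)
    # Only relative prices matter for the choice probabilities, so rescaling
    # once at the end is equivalent to rescaling in every step.
    prices *= mean_price / prices.mean()
    return prices, share_high

# (price, heritage, share) coefficients for high, middle and low income.
params = [(-2.0, 1.0, 1.0), (-2.5, 0.9, 0.8), (-3.0, 0.8, 0.6)]
sizes = [100.0, 100.0, 100.0]
supply = np.array([100.0, 100.0, 100.0])
heritage = np.array([1.0, 0.2, 0.0])      # km2 of conservation area
xi = np.zeros(3)

base_prices, base_share = equilibrium(heritage, xi, params, sizes, supply,
                                      mean_price=230_000)
cf_prices, cf_share = equilibrium(np.zeros(3), xi, params, sizes, supply,
                                  mean_price=230_000)
```

In this toy setting the neighborhoods become identical once heritage is removed, so counterfactual prices and shares equalize; in the paper the remaining neighborhood characteristics keep prices in the city center above those elsewhere.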
Columns 4 and 5 of Table 5 report the actual percentage of high income households and the counterfactual percentage of high income households after we set the area of conservation areas at zero in all neighborhoods. For many districts and municipalities the share of the high income households hardly changes. However, we find strong increases in the share of high income households in suburban municipalities where the initial share of high income households was already high, such as Abcoude and Landsmeer, and almost no changes in Diemen and Amstelveen, which are closer to the city of Amsterdam. This suggests that when urban heritage in the city center disappears, high income households are likely to move to 'posh' suburban municipalities. Our simulations thus confirm the impression that urban heritage contributes significantly to keeping the rich households (especially those rich workers who are single, older and without young children, as can be concluded from the MWTP figures) in the center of the metropolitan area. These results are in line with the amenity-based theory of Brueckner et al. (1999) that high income households will only locate in city centers if city centers have a strong amenity advantage over suburban municipalities.
33 The procedure used by Bayer and McMillan (2012) is identical.
34 In Dutch: Wijk of stadsdeel. This is an administrative unit between the neighborhood and municipal level.

Conclusion
In this empirical paper, we use a logit-based sorting model to investigate the impact of urban heritage on demographic composition and house prices in a metropolitan region. Our analysis uses the sorting framework developed by BLP and  in which the price of a standard house is explained by the housing supply and demand equilibrium. To mitigate the concerns about endogeneity, we extended the sorting model by splitting the population into three groups, based on income, and estimated different logit models for each of these groups. Within each group we allow for heterogeneous preferences of different households. In particular, we allow for differences in the sensitivity for the housing price and the share of high income households between these groups, but not within them.
Another novel aspect of our analysis is the use of Belloni et al.'s (2012) selection procedure for optimal instruments. We find that all candidate instruments that are nonlinear functions of the neighborhood characteristics and do not use any additional exogenous information perform poorly. In contrast, the instruments suggested by Bayer and his co-authors in several papers perform very well. We show that these instruments use additional information that combines information about household characteristics (crossed with the neighborhood characteristics) and the nonlinearity of the model.
We find that all households attach a large value to the proximity of urban heritage and that differences in proximity to heritage within the municipality are important. We also find that households attach value to the share of high income households living in a neighborhood. All households prefer to live in neighborhoods with a higher share of high income households, which is in line with . We find that all income groups attach roughly the same value to urban heritage, but the value varies with other household characteristics. This does not confirm our a priori expectations based on Brueckner et al. (1999) -that high income households attach a larger value to the proximity of urban heritage than lower income households -but it unravels new insights in the value of urban heritage.
We use the estimated model to calculate marginal willingness to pay for neighborhood amenities and to compute counterfactual equilibrium prices for the situation in which urban heritage would be absent. We find a large willingness to pay for urban heritage that contributes to substantial differences in attractiveness and, as a consequence, in house prices between areas in the Amsterdam metropolitan region. Although we found relatively weak evidence for Brueckner et al.'s (1999) suggestion that especially high income households are attracted to the urban amenities, our simulations show that the disappearance of urban heritage would lead to a substantially more suburbanized location pattern of the high income households in our study region.

APPENDIX B. DERIVATIONS FOR BAYER'S INSTRUMENTS
The derivations presented here are related to the discussion in Section 2.3 of Weijschede-van der Straaten et al. (2017). To see how Bayer's instrument for the price works with a heterogeneous population, we start from (1) and write the individual choice probabilities as: Taking logs on both sides and rearranging gives: The second term on the right-hand side is the expected maximum utility of consumer i, which we will denote as . Taking averages over the population, we find: Note that in (B.7) the unobserved characteristic does not occur because it has been set equal to 0, and that and ln have been given a superscript because their values will change when the unobserved characteristics are set equal to zero and prices are adjusted so as to re-establish a market equilibrium.
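The first steps described here follow the standard logit algebra, sketched below in assumed notation: v_{ij} is the systematic utility of household i for alternative j, P_{ij} the choice probability and N the number of households. The symbols and equation numbering of the original appendix are not reproduced, and the later steps leading to (B.7) are not reconstructed.

```latex
P_{ij} = \frac{\exp(v_{ij})}{\sum_{k} \exp(v_{ik})}
\qquad\Longrightarrow\qquad
\ln P_{ij} = v_{ij} - \ln \sum_{k} \exp(v_{ik})

\frac{1}{N} \sum_{i} \ln P_{ij}
  = \frac{1}{N} \sum_{i} v_{ij}
  - \frac{1}{N} \sum_{i} \ln \sum_{k} \exp(v_{ik})
```

The logsum term equals the expected maximum utility of household i up to an additive constant.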

APPENDIX C. THE CANDIDATE INSTRUMENTS AND THEIR SELECTION
In this appendix we provide further information about the sets of candidate instruments and the outcomes of Belloni et al.'s (2012) selection procedure. Full results of first stage regressions are available from the authors upon request.

Squares and cross products
The set consists of the 9 squared exogenous variables and their 36 cross products. Selected instruments for the log price were the squares of variables 1 and 2, and the cross product of variables 2 and 9 and that of variables 6 and 9. For the share of high income households, only one candidate was selected: the cross product of variables 4 and 5.

Cubes and third order cross products
The set consists of the 9 cubed exogenous variables and their 84 third order cross products. For the price one candidate is selected: the cube of variable 2. For the share of high income households also only one candidate is selected: the cross product of exogenous variables 3, 4 and 6.
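The counts of candidates in these two sets can be reproduced with a short sketch. It assumes that "cross products" means products of distinct variables, which is the reading consistent with the counts of 36 and 84; the function names are hypothetical.

```python
from itertools import combinations
from math import prod

def candidate_terms(n_vars, order):
    """Index tuples for the pure powers and the cross products of distinct
    variables of the given order (variables numbered from 1)."""
    variables = range(1, n_vars + 1)
    powers = [(k,) * order for k in variables]
    cross = list(combinations(variables, order))
    return powers, cross

def evaluate(term, x):
    """Value of one candidate instrument for a neighborhood with
    characteristics x (a sequence indexed from 0)."""
    return prod(x[k - 1] for k in term)

squares, cross2 = candidate_terms(9, 2)
cubes, cross3 = candidate_terms(9, 3)
print(len(squares), len(cross2))   # 9 36
print(len(cubes), len(cross3))     # 9 84
```

Under this reading, mixed terms such as the square of one variable times another are not part of the candidate sets.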

BLP-like instruments
The set consists of the 9 sums of the characteristics of neighborhoods at a distance of less than 5 km, less than 10 km and less than 15 km (27 candidate instruments), the 9 sums of the characteristics of neighborhoods at a distance of more than 5, 10 and 15 km (again 27 candidate instruments), and the 9 sums of the characteristics of the neighborhoods at a distance between 5 and 10, 10 and 15 and 5 and 15 km (27 'doughnut' instruments). Selected instruments for the price are the sum of characteristic 2 over neighborhoods at a distance of less than 5 km, the sum of characteristic 2 over neighborhoods at a distance of more than 5 km, and the sum of characteristic 1 over neighborhoods at a distance of more than 15 km. Selected instruments for the share of high income households are the sum of characteristic 7 over neighborhoods at a distance of less than 10 km and the sum of characteristic 7 over neighborhoods at a distance between 5 and 10 km.
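The nine distance bands can be sketched as follows for a single characteristic. The strict inequalities at the band limits and the exclusion of the own neighborhood are assumptions; the text does not specify how boundary cases are treated.

```python
import math

# (lower, upper) limits in km: three inner discs, three outer regions and
# three 'doughnuts'. With 9 characteristics this gives 81 candidates.
BANDS = [(0.0, 5.0), (0.0, 10.0), (0.0, 15.0),
         (5.0, math.inf), (10.0, math.inf), (15.0, math.inf),
         (5.0, 10.0), (10.0, 15.0), (5.0, 15.0)]

def band_sums(values, distances, j):
    """Sums of one characteristic over the other neighborhoods whose
    distance to neighborhood j falls strictly inside each band."""
    return [
        sum(v for k, v in enumerate(values)
            if k != j and lower < distances[j][k] < upper)
        for lower, upper in BANDS
    ]

# Four hypothetical neighborhoods on a line, positions in km.
positions = [0.0, 4.0, 8.0, 20.0]
distances = [[abs(a - b) for b in positions] for a in positions]
values = [1.0, 2.0, 3.0, 4.0]
sums_for_first = band_sums(values, distances, 0)
```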

Rank instruments
The instruments are the rankings of the neighborhoods based on each of the 9 characteristics. For the logged price the ranks of characteristics 1, 2 and 9 were chosen, for the share of high income households those of characteristics 5 and 7.

Distance to an ideal neighborhood and number of close competitors
The ideal neighborhood is defined as one in which all characteristics have the most preferred value that exists in the set of neighborhoods. Distance measures were based on normalized values of the characteristics. Candidate instruments were: (i) the Euclidean distance to the ideal, (ii) Manhattan distance, (iii) the maximum of the difference to the ideal value for the 9 characteristics. The close competitors were defined as the number of neighborhoods at a Euclidean distance in characteristics space of at most 1, 2 or 3. In total this gives us 6 instruments based on normalized characteristics. For the log price the Manhattan distance was selected, for the share of high income households the maximum of the differences and the number of neighborhoods at a Euclidean distance of less than 3.
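The six candidates in this set can be sketched as below. The sketch assumes that the normalized characteristics are oriented so that a higher value is always the more preferred one, which the text does not state; the function name is hypothetical.

```python
import math

def ideal_distance_instruments(X, radii=(1.0, 2.0, 3.0)):
    """X: one row of normalized characteristics per neighborhood.
    Returns, per neighborhood, the Euclidean, Manhattan and maximum
    distance to the ideal, followed by the number of other neighborhoods
    within each radius in characteristics space."""
    n_chars = len(X[0])
    ideal = [max(row[k] for row in X) for k in range(n_chars)]
    result = []
    for j, row in enumerate(X):
        gaps = [ideal[k] - row[k] for k in range(n_chars)]   # all >= 0
        euclidean = math.sqrt(sum(g * g for g in gaps))
        manhattan = sum(gaps)
        largest = max(gaps)
        competitors = [
            sum(1 for i, other in enumerate(X)
                if i != j and math.dist(row, other) <= r)
            for r in radii
        ]
        result.append([euclidean, manhattan, largest] + competitors)
    return result

# Three hypothetical neighborhoods with two characteristics each.
X = [[0.0, 0.0], [3.0, 4.0], [1.0, 0.0]]
instruments = ideal_distance_instruments(X)
print(instruments[0])   # [5.0, 7.0, 4.0, 1, 1, 1]
```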

Bayer's instruments
The computation of these instruments has been discussed in the main text. The price instrument was selected for the log price, the other instrument for the share of high income households. In contrast to all sets of candidate instruments discussed above the first stages are very strong.

Other issues
When we consider all instruments jointly only Bayer's price instrument is selected for the log price and Bayer's share instrument for the share of high income households.
When we remove the two Bayer instruments from the set of candidates, the selected instruments for the price are the cross product of characteristics 2 and 9, the cross product of characteristics 1, 4 and 9, and the Manhattan distance to the ideal neighborhood. The selected instruments for the share of high income households are the cross product of characteristics 1, 4 and 6, the cross product of characteristics 1, 5 and 8, and the sum of characteristic 7 for neighborhoods at a distance between 5 and 10 km.
Note: Parameter estimates are used to calculate the deviations from the mean indirect utility with all variables normalized to have mean zero. Significance at the 90%, 95% and 99% levels is indicated by *, **, and ***, respectively. The regression results based on other specifications can be obtained from the author.