Harmonizing Data Across Geographic Scopes/Layers

General Supervisor · August 9, 2015

A common problem in the analysis of spatial data is deciding how to harmonize data collected at different levels of geographic resolution. For example, suppose Municipality $X$ crosses a county border, so that part of $X$ lies in County $A$ and the rest lies in County $B$. If the municipality is my unit of analysis, but detailed socioeconomic data exist only at the county level, how do I map those data to $X$? The two approaches I have seen are land area-based proportional allocation and plurality allocation. I find both unsatisfying, so this question is an effort to discover other approaches.

Land Area-Based Proportional Allocation

In this scheme, let $p_A$ be the proportion of County $A$'s land area that lies within $X$, and let $p_B$ be the corresponding proportion of County $B$'s land area. Assume that structures and people are uniformly distributed within each county. If $n_A$ and $n_B$ are the population counts of County $A$ and County $B$, respectively, the population of $X$ is estimated as $n_X = p_A n_A + p_B n_B$.
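
A minimal Python sketch of this allocation, assuming the area shares $p_c$ have already been computed (for instance from a GIS overlay of municipal and county boundaries). The function name `area_weighted_count` and all numbers are hypothetical, for illustration only.

```python
def area_weighted_count(shares, counts):
    """Estimate a municipality's count by land-area proportional allocation.

    shares: county -> p_c, the fraction of county c's land area inside X
    counts: county -> n_c, the count (e.g. population) of county c
    Assumes the counted units are spread uniformly over each county.
    """
    return sum(p * counts[c] for c, p in shares.items())


# Hypothetical example: X covers 10% of County A's area and 25% of County B's.
shares = {"A": 0.10, "B": 0.25}
counts = {"A": 50_000, "B": 20_000}
n_x = area_weighted_count(shares, counts)
print(n_x)  # 10000.0
```

The same function extends unchanged to municipalities that touch more than two counties, since it simply sums $p_c n_c$ over every county in `shares`.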

This approach has a nice intuitive feel insofar as it treats $X$ as a composite with characteristics drawn from both counties. However, the uniform distribution assumption is a heroic one, and if we want to say anything beyond simple counts (e.g., the distribution of income), even stronger assumptions are needed. (One could assume that incomes within each county are randomly distributed according to that county's income distribution, and then combine the distributions of $A$ and $B$ into a mixture weighted by $p_A n_A$ and $p_B n_B$.)
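
Under the same uniformity assumption, the "fold the distributions together" idea can be sketched for binned income data: $X$ receives $p_A n_A$ people whose incomes follow County $A$'s distribution and $p_B n_B$ people whose incomes follow County $B$'s, so $X$'s bracket shares are the count-weighted average of the county bracket shares. The income brackets, the `mixed_bracket_shares` helper, and the figures below are hypothetical.

```python
def mixed_bracket_shares(shares, counts, bracket_shares):
    """Mixture of county income distributions, weighted by allocated counts.

    shares: county -> p_c (fraction of county c's land area inside X)
    counts: county -> n_c (county population)
    bracket_shares: county -> list of fractions of people in each income
        bracket; every list uses the same (hypothetical) brackets and sums to 1.
    """
    # People allocated to X from each county under uniform spreading: p_c * n_c.
    weights = {c: p * counts[c] for c, p in shares.items()}
    total = sum(weights.values())
    n_brackets = len(bracket_shares[next(iter(weights))])
    return [
        sum(weights[c] * bracket_shares[c][k] for c in weights) / total
        for k in range(n_brackets)
    ]


shares = {"A": 0.10, "B": 0.25}
counts = {"A": 50_000, "B": 20_000}
brackets = {"A": [0.5, 0.3, 0.2], "B": [0.2, 0.3, 0.5]}  # low / middle / high
mix = mixed_bracket_shares(shares, counts, brackets)
print([round(s, 2) for s in mix])  # [0.35, 0.3, 0.35]
```

Both counties contribute 5,000 people here, so $X$'s distribution is the simple average of the two; the result is only as good as the uniform-distribution assumption behind the weights.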

Plurality Allocation

In this scheme, let $p_{X \in A}$ and $p_{X \in B}$ be the proportions of $X$'s land area that lie in County $A$ and County $B$, respectively. Suppose $p_{X \in A} > p_{X \in B}$, which in this two-county case means the majority of $X$ lies in County $A$ (with more than two counties, a plurality suffices). Municipality $X$ then takes on all the characteristics of County $A$, and County $B$ is forsaken.
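
A sketch of plurality allocation, assuming the land area of $X$ inside each county is known. `plurality_county` and `plurality_value` are hypothetical helpers, and the median-income figures are made up; ties go to whichever county appears first in the dictionary, an arbitrary choice.

```python
def plurality_county(area_in_county):
    """Return (county, share) for the county holding the largest share of X.

    area_in_county: county -> land area of X inside that county (any units).
    """
    total = sum(area_in_county.values())
    fractions = {c: a / total for c, a in area_in_county.items()}
    winner = max(fractions, key=fractions.get)  # first maximum wins ties
    return winner, fractions[winner]


def plurality_value(area_in_county, county_values):
    """X inherits the plurality county's value; other counties are ignored."""
    winner, _ = plurality_county(area_in_county)
    return county_values[winner]


# Hypothetical: 51% of X's area is in County A, 49% in County B.
areas = {"A": 51.0, "B": 49.0}
median_income = {"A": 62_000, "B": 41_000}
print(plurality_county(areas))                 # ('A', 0.51)
print(plurality_value(areas, median_income))   # 62000
```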

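This approach avoids the cross-tabulation pitfalls of proportional allocation: every characteristic, including full distributions, comes directly from a single county, so no assumptions are needed to combine county distributions. It does, however, fail a face validity test when $p_{X \in A}$ is only barely the largest share. For example, if $p_{X \in A} = 0.51$, do we really believe that $X$ is adequately represented by County $A$'s characteristics? Furthermore, if I am simulating an activity that relies on this method, how do I true up the simulated aggregates at the state level? Because each municipality inherits the values of only one county, the municipal results generally will not sum to the state totals implied by the county data.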
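
The true-up problem can be seen in a toy state with two counties, $A$ and $B$, split exactly between two municipalities, $X$ and $Y$. All names, areas, populations, and per-person activity rates below are hypothetical. Each municipality's population is set to what uniform area weighting gives, so the only difference between the two methods is how the county-level rate is attached. Because the municipal shares of each county sum to 1, area-weighted allocation reproduces the state total implied by the county data, while plurality allocation does not.

```python
# Hypothetical toy state: counties A and B, partitioned by municipalities X and Y.
county_pop = {"A": 50_000, "B": 20_000}
county_rate = {"A": 3.0, "B": 1.0}      # activity per person, e.g. trips/year
county_area = {"A": 100.0, "B": 200.0}  # km^2
# Land area of each municipality inside each county (km^2); sums to county_area.
muni_area = {
    "X": {"A": 60.0, "B": 20.0},
    "Y": {"A": 40.0, "B": 180.0},
}
# Municipal populations under uniform area weighting: 0.6*50k + 0.1*20k, etc.
muni_pop = {"X": 32_000, "Y": 38_000}


def state_total():
    """Benchmark: the state aggregate implied directly by the county data."""
    return sum(county_pop[c] * county_rate[c] for c in county_pop)


def area_weighted_total():
    """Allocate each county's activity total to municipalities by area share."""
    total = 0.0
    for pieces in muni_area.values():
        for c, a in pieces.items():
            p = a / county_area[c]  # share of county c's area in this municipality
            total += p * county_pop[c] * county_rate[c]
    return total


def plurality_total(pop):
    """Give each municipality the rate of the county holding most of its area."""
    total = 0.0
    for m, pieces in muni_area.items():
        winner = max(pieces, key=pieces.get)
        total += pop[m] * county_rate[winner]
    return total


print(state_total())                # 170000.0
print(round(area_weighted_total())) # 170000
print(plurality_total(muni_pop))    # 134000.0
```

Plurality allocation lands about 21% below the benchmark here: $Y$ lies mostly in County $B$ by area, but under the uniform assumption more than half of its people come from the denser County $A$, so assigning $Y$ County $B$'s rate loses much of the activity.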

Ideas?

There must be a better way. Any suggestions?
