An Empirical Test of A Geodemographic Segmentation System - Part I

Above all, this is about selling effectively and efficiently.  Who are the target consumers?  You can start off with the entire human population of planet Earth.  But in practice, marketing and sales tend to be done differently by country.  So let us focus on a single country such as the United States.  There are about 280 million persons living in the United States.  Are they all potential customers?  That depends on the product, on account of means, needs and legal requirements.  For example, a two-year-old is not going to purchase an automobile; as another example, the minimum legal age for alcohol consumption is 21 years.

Assume now that the target consumers are adults (persons age 18 or over), of whom there are more than 210 million in the United States.  Are they equally likely to consume the product/service at issue?  In most cases, this universe is segmented into different groups on account of access, means, needs, benefits, attitudes and so on.  A smart marketer would create a segmentation of the universe and focus the marketing efforts on those segments most likely to consume the product/service, for reasons of effectiveness and efficiency.  This particular paradigm is accepted as gospel.

In practice, it is difficult to classify individual persons into segments.  For example, there is no extensive, accurate database about all the adults in the United States.  While some information is known about some people, the available data are far from complete or accurate.  Into this vacuum comes the notion of geodemographic segmentation.  While we don't have information for individual persons, we do know that these people have to live somewhere.  Every geographical location in the United States can be classified by state, county, city, town, census tract, census block group, census block or postal zipcode.  At each geographical level (such as the zipcode), aggregate information is available from the US Bureau of the Census on demographic variables such as median age, median household income, percent of home ownership, residential density, etc.  A geodemographic segmentation system classifies people on the basis of the aggregate demographics of the geographical location of their home addresses.
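The lookup idea can be sketched in a few lines of Python.  This is only an illustration under stated assumptions: the zipcodes, the segment labels "A", "B" and "C", and the names ZIP_TO_SEGMENT and assign_segment are all hypothetical and do not come from any vendor or from the Census Bureau.

```python
# Hypothetical lookup table: home zipcode -> segment label.
# Real tables come from the segmentation vendor; these entries are made up.
ZIP_TO_SEGMENT = {
    "00001": "A",
    "00002": "B",
    "00003": "C",
}


def assign_segment(home_zipcode, table=ZIP_TO_SEGMENT):
    """Return the segment for a home zipcode, or None if it is not in the table."""
    # Zipcodes stay as strings so leading zeros survive; ZIP+4 is cut to 5 digits.
    return table.get(home_zipcode.strip()[:5])


people = [
    {"name": "person 1", "zipcode": "00001"},
    {"name": "person 2", "zipcode": "00003-1234"},  # ZIP+4 form
    {"name": "person 3", "zipcode": "99999"},       # not in the table
]
for person in people:
    print(person["name"], assign_segment(person["zipcode"]))
```

Note that nothing about the individual person enters the classification.  Two very different people at the same address receive the same segment.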

There are many geodemographic segmentation systems, depending on the list of demographic variables used and the level of geography.  The subject of this article is the zipcode-based PRIZM NE system constructed by Claritas.  The segmentation system divides the country into 66 mutually exclusive and exhaustive segments using age, income, presence of children, marital status, home ownership and urbanicity.  Details of the nomenclature and description of these 66 segments can be found at this page.  It may be too unwieldy to work with this large a number of segments, so Claritas provides two standard sets of groups: the social groups and the lifestage groups.

There are 14 social groups based upon three levels of affluence (low, moderate and high) and four levels of urbanization (Urban (U), Suburban (S), Second City (C) and Town & Country (T)).  There are 11 lifestage groups based upon three levels of affluence (low, moderate and high) and three age-and-children combinations ("Younger Years", "Family Life" and "Mature Years").

Now anyone can construct any number of segmentation systems.  These systems will differ by the level of geography, the list of demographic variables and the statistical technique used for segmentation.  At the end of the day, regardless of the segmentation system, there has to be some form of validation that the system will satisfy the need for effectiveness and efficiency in marketing and communications.  That validation is the subject of this article.

How do we validate a segmentation system?  We cannot use the geodemographic variables that went into the construction of the system, because that would be a self-fulfilling prophecy.  Instead, the validation can be done by running a survey on a small sample in which primary data are collected from individuals, and then seeing whether those data are consistent with the character of the segments.

We will now refer to the MARS 2004 study.  This is a mail survey of 21,054 adults in the United States conducted during the first quarter of 2004.  Each person has a mailing address with a zipcode that permits a unique assignment to the associated PRIZM NE segment.  Since the social and lifestage groups are based upon income, age, urbanicity and presence of children, we will check whether the individual responses are consistent with the predictions of the PRIZM NE segmentation system.
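The validation step just described, assigning each respondent to a segment by zipcode and then tabulating individual answers within each segment, can be sketched as follows.  This is a minimal sketch under stated assumptions, not the procedure actually used for the charts in this article: the rows are invented (they are not MARS 2004 data), and the segment labels "X" and "Y", the zipcodes and the function name distribution_by_segment are hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical survey rows: (zipcode, self-reported household income bracket).
# Invented for illustration only; these are not MARS 2004 responses.
responses = [
    ("00001", "high"),
    ("00001", "high"),
    ("00001", "middle"),
    ("00002", "middle"),
    ("00002", "low"),
    ("00002", "low"),
]

# Hypothetical zipcode -> segment table.
zip_to_segment = {"00001": "X", "00002": "Y"}


def distribution_by_segment(rows, lookup):
    """Return {segment: {answer: share of that segment's respondents}}."""
    counts = defaultdict(Counter)
    for zipcode, answer in rows:
        segment = lookup.get(zipcode)
        if segment is None:
            continue  # respondent cannot be assigned to any segment
        counts[segment][answer] += 1
    return {
        segment: {answer: n / sum(c.values()) for answer, n in c.items()}
        for segment, c in counts.items()
    }


shares = distribution_by_segment(responses, zip_to_segment)
for segment in sorted(shares):
    print(segment, shares[segment])
```

The important point is that the answers being tabulated come from the individual respondents, while the segment comes only from the zipcode.  The two sources are independent, which is what makes the comparison a test.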

First, we look at household income distributions within the PRIZM NE segments, as shown in the following chart.  For example, within social group S (Suburban), we see that household income decreases down the affluence scale (1=elite suburbs, 2=affluentials, 3=middleburbs, 4=inner suburbs); as another example, within lifestage group F ("Family Life"), the same occurs (1=accumulated wealth, 2=young accumulators, 3=mainstream families, 4=sustaining families).

(source: MARS 2004)
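The claim that income decreases down the affluence scale within a group can also be checked mechanically rather than by eye.  The sketch below assumes group codes of the form letter-plus-rank (S1 through S4, as in the labels above) and uses the median as the summary statistic.  Both the choice of the median and the income figures are assumptions made for illustration, and the function name decreases_with_rank is hypothetical.

```python
from statistics import median

# Hypothetical respondent incomes (in dollars) keyed by group code.
# The codes follow the S1..S4 labels in the text; the incomes are invented.
incomes = {
    "S1": [150_000, 120_000, 95_000],
    "S2": [90_000, 85_000, 70_000],
    "S3": [60_000, 52_000, 48_000],
    "S4": [40_000, 35_000, 30_000],
}


def decreases_with_rank(income_by_code, group_letter):
    """True if median income falls strictly from rank 1 to the last rank."""
    # Keep only the codes of the requested group, ordered by affluence rank.
    codes = sorted(
        (code for code in income_by_code if code.startswith(group_letter)),
        key=lambda code: int(code[1:]),
    )
    medians = [median(income_by_code[code]) for code in codes]
    # Compare each rank's median with the next rank's median.
    return all(a > b for a, b in zip(medians, medians[1:]))


print(decreases_with_rank(incomes, "S"))
```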

Next, we look at the distributions of the age of the head of household within the PRIZM NE segments, as shown in the chart below.  Within the social groups, there does not appear to be any pattern either across groups or within groups.  After all, the construction of the social groups did not consider age.  Within the lifestage groups, if we look at the percent of persons 50+, there is a progression as we move from "Younger Years" (Y) through "Family Life" (F) to "Mature Years" (M).

(source: MARS 2004)

Next, we look at the county size distributions within the PRIZM NE segments.  County size is a standard way of characterizing the urban nature of geographical areas.  The United States is divided into more than 3,000 geographical areas known as counties, of various physical sizes and population counts.  At one end of the scale, the "A" counties are those in the largest metropolitan areas such as New York City, Los Angeles, Chicago, Philadelphia, Washington DC, etc.  At the other end of the scale, the "D" counties are those that have small populations and are away from large metropolitan areas.  County size is in fact a geodemographic variable itself.  From the chart, we see that there is an obvious relationship with the social groups.  The lifestage groups do not consider urbanicity, which is why there does not seem to be any obvious pattern among them.

(source: MARS 2004)

Finally, we look at the presence of children within the PRIZM NE segments.  Within the social groups, there does not appear to be any pattern.  Within the lifestage groups, the highest incidences occur among the "Family Life" (F) groups, the next highest among the "Younger Years" (Y) groups and the lowest among the "Mature Years" (M) groups.

(source: MARS 2004)

In summary, we can say that the PRIZM NE geodemographic segmentation system behaves as claimed when compared to actual data obtained via individual-level surveys.  Throughout all this, we should remember that geodemographics is a second-best solution to actual individual-level data.  For example, consider the segment labeled "Affluent Empty Nests."  If the label were applied on the basis of individual-level data, these people would by definition have no children living with them anymore.  Yet segment M1 has an incidence of children that is lower than the overall average of 38% but still much higher than zero.  PRIZM NE can identify those zipcodes with lower-than-average incidences, but these are still mixed neighborhoods in which some people have children.  Still, accurate individual-level data are rarely available, and geodemographic segmentation systems will certainly improve effectiveness and efficiency over pure random guessing.
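The comparison of a segment's incidence against the overall average is often expressed as an index, where 100 means "same as average."  The sketch below illustrates that arithmetic.  The index convention is a common one in marketing research but is our assumption here, the ten respondent flags are invented, and the function names are hypothetical; only the 38% overall figure comes from the text above.

```python
def incidence(flags):
    """Share of respondents for whom the flag is true."""
    return sum(flags) / len(flags)


def index_vs_average(segment_rate, overall_rate):
    """Segment rate as a percentage of the overall rate (100 = average)."""
    return 100.0 * segment_rate / overall_rate


# Hypothetical flags for ten respondents in one segment:
# True means the respondent has children living at home.
segment_flags = [True, False, False, False, True,
                 False, False, False, False, False]
segment_rate = incidence(segment_flags)  # 2 of the 10 invented respondents

# 0.38 is the overall average quoted in the text.
print(round(index_vs_average(segment_rate, 0.38)))
```

An index well below 100 but well above zero is exactly the situation described for the "Affluent Empty Nests" segment: the zipcodes lean in the labeled direction, but individual households within them still vary.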

(posted by Roland Soong, 6/27/2004)
