Massively Categorical Variables: Revealing the Information in Zip Codes


  • Thomas J. Steenburgh (Yale University, New Haven, Connecticut 06520)
  • Andrew Ainslie (University of California, Los Angeles, Los Angeles, California 90095)
  • Peder Hans Engebretson (ClearInfo, Denver, Colorado)


We introduce the idea of a massively categorical variable, a variable such as zip code that takes on too many values to treat in the standard manner. We show how to use a massively categorical variable directly as an explanatory variable. As an application of this concept, we explore several of the issues that analysts confront when trying to develop a direct marketing campaign. We begin by pointing out that the data contained in many of the common sources are masked through aggregation in order to protect consumer privacy. This creates some difficulty when trying to construct models of individual-level behavior. We show how to take full advantage of such data through a hierarchical Bayesian variance components (HBVC) model. The flexibility of our approach allows us to combine several sources of information, some of which may not be aggregated, in a coherent manner. We show that the conventional modeling practice understates the uncertainty in its parameter estimates. We explore an array of financial considerations, including ones in which the marginal benefit is non-linear, to make robust model comparisons. To implement the decision rules that determine the optimal number of prospects to contact, we develop an algorithm based on the Markov chain Monte Carlo (MCMC) output from parameter estimation. We conclude the analysis by demonstrating how to determine an organization's willingness to pay for additional data.
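The abstract names two computational ideas without giving their details: pooling information across the many levels of a variable such as zip code, and choosing how many prospects to contact from MCMC output. The Python sketch below illustrates both under simplifying assumptions; it is not the authors' HBVC model or their algorithm. `shrink_zip_rates` uses a normal-approximation empirical-Bayes shrinkage with the between-zip variance `tau2` assumed known, and `best_contact_count` assumes you already have posterior draws of each prospect's response probability, a fixed margin per response, and a fixed cost per contact. The optional `utility` argument is a hypothetical stand-in for a non-linear marginal benefit. All function and parameter names are illustrative.

```python
from __future__ import annotations

from statistics import fmean
from typing import Callable, Sequence


def shrink_zip_rates(
    responses: dict[str, tuple[int, int]], tau2: float
) -> dict[str, float]:
    """Partially pool zip-level response rates toward the overall rate.

    responses maps a zip code to (responders, mailed). tau2 is the
    between-zip variance of true rates, ASSUMED known in this sketch;
    a full hierarchical model would estimate it jointly.
    """
    total_y = sum(y for y, _ in responses.values())
    total_n = sum(n for _, n in responses.values())
    pooled = total_y / total_n
    shrunk = {}
    for zip_code, (y, n) in responses.items():
        raw = y / n
        # Binomial sampling variance of the raw rate, evaluated at the pooled rate.
        v = pooled * (1 - pooled) / n
        # Weight on the zip's own data: small zips (large v) lean on the pooled rate.
        w = tau2 / (tau2 + v)
        shrunk[zip_code] = w * raw + (1 - w) * pooled
    return shrunk


def best_contact_count(
    draws: Sequence[Sequence[float]],
    margin: float,
    cost: float,
    utility: Callable[[float], float] = lambda x: x,
) -> tuple[int, float]:
    """Pick how many prospects to contact from posterior draws.

    draws[s][i] is the response probability of prospect i in MCMC draw s.
    Prospects are ranked by posterior mean; for each k the utility of
    profit is averaged over draws, and the best k is returned.
    """
    n = len(draws[0])
    means = [fmean(d[i] for d in draws) for i in range(n)]
    order = sorted(range(n), key=lambda i: means[i], reverse=True)
    # Contacting nobody yields zero profit in every draw.
    best_k, best_value = 0, utility(0.0)
    # Running expected number of responders per draw as k grows.
    cum = [0.0] * len(draws)
    for k, i in enumerate(order, start=1):
        for s, d in enumerate(draws):
            cum[s] += d[i]
        value = fmean(utility(margin * c - cost * k) for c in cum)
        if value > best_value:
            best_k, best_value = k, value
    return best_k, best_value


rates = shrink_zip_rates({"small": (2, 10), "large": (30, 1000)}, tau2=0.0004)
k, value = best_contact_count(
    [[0.10, 0.02, 0.05], [0.12, 0.01, 0.03]], margin=20.0, cost=1.0
)
print(rates, k, value)
```

With the default linear utility, the rule reduces to contacting every prospect whose expected profit is positive. Passing a concave `utility` penalizes profit that varies across draws, which can change the chosen k; that is one way the uncertainty carried by the MCMC output can enter the decision.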

Suggested Citation

  • Thomas J. Steenburgh & Andrew Ainslie & Peder Hans Engebretson, 2003. "Massively Categorical Variables: Revealing the Information in Zip Codes," Marketing Science, INFORMS, vol. 22(1), pages 40-57, August.
  • Handle: RePEc:inm:ormksc:v:22:y:2003:i:1:p:40-57

    References listed on IDEAS

    1. A. Gelman & Y. Goegebeur & F. Tuerlinckx & I. Van Mechelen, 2000. "Diagnostic checks for discrete data regression models using posterior predictive simulations," Journal of the Royal Statistical Society Series C, Royal Statistical Society, vol. 49(2), pages 247-268.
    2. Arthur Hsu & Ronald T. Wilcox, 2000. "Stochastic Prediction in Multinomial Logit Models," Management Science, INFORMS, vol. 46(8), pages 1137-1144, August.
    3. Peter E. Rossi & Robert E. McCulloch & Greg M. Allenby, 1996. "The Value of Purchase History Data in Target Marketing," Marketing Science, INFORMS, vol. 15(4), pages 321-340.
    4. Jan Roelf Bult & Tom Wansbeek, 1995. "Optimal Selection for Direct Mail," Marketing Science, INFORMS, vol. 14(4), pages 378-394.

    Cited by:

    1. M. Ballings & D. Van Den Poel & E. Verhagen, 2013. "Evaluating the Added Value of Pictorial Data for Customer Churn Prediction," Working Papers of Faculty of Economics and Business Administration, Ghent University, Belgium 13/869, Ghent University, Faculty of Economics and Business Administration.
    2. repec:eee:proeco:v:191:y:2017:i:c:p:85-96 is not listed on IDEAS
    3. van Dijk, Bram & Paap, Richard, 2008. "Explaining individual response using aggregated data," Journal of Econometrics, Elsevier, vol. 146(1), pages 1-9, September.
    4. P. Baecke & D. Van Den Poel, 2012. "Including Spatial Interdependence in Customer Acquisition Models: a Cross-Category Comparison," Working Papers of Faculty of Economics and Business Administration, Ghent University, Belgium 12/788, Ghent University, Faculty of Economics and Business Administration.
    5. P. Baecke & D. Van Den Poel, 2010. "Improving purchasing behavior predictions by data augmentation with situational variables," Working Papers of Faculty of Economics and Business Administration, Ghent University, Belgium 10/658, Ghent University, Faculty of Economics and Business Administration.
    6. repec:eee:ijrema:v:34:y:2017:i:3:p:593-603 is not listed on IDEAS
    7. P. Baecke & D. Van Den Poel, 2012. "Improving Customer Acquisition Models by Incorporating Spatial Autocorrelation at Different Levels of Granularity," Working Papers of Faculty of Economics and Business Administration, Ghent University, Belgium 12/819, Ghent University, Faculty of Economics and Business Administration.
    8. André Bonfrer & Xavier Drèze, 2009. "Real-Time Evaluation of E-mail Campaign Performance," Marketing Science, INFORMS, vol. 28(2), pages 251-263, 03-04.
    9. Matthew Nagler, 2006. "An exploratory analysis of the determinants of cooperative advertising participation rates," Marketing Letters, Springer, vol. 17(2), pages 91-102, April.
    10. Jeonghye Choi & David R. Bell & Leonard M. Lodish, 2012. "Traditional and IS-Enabled Customer Acquisition on the Internet," Management Science, INFORMS, vol. 58(4), pages 754-769, April.
    11. M. Ballings & D. Van Den Poel, 2012. "The Relevant Length of Customer Event History for Churn Prediction: How long is long enough?," Working Papers of Faculty of Economics and Business Administration, Ghent University, Belgium 12/804, Ghent University, Faculty of Economics and Business Administration.
    12. Steven M. Shugan, 2003. "Editorial: Compartmentalized Reviews and Other Initiatives: Should Marketing Scientists Review Manuscripts in Consumer Behavior?," Marketing Science, INFORMS, vol. 22(2), pages 151-160.
    13. repec:eee:ijrema:v:29:y:2012:i:4:p:337-345 is not listed on IDEAS
    14. Steven M. Shugan, 2004. "The Impact of Advancing Technology on Marketing and Academic Research," Marketing Science, INFORMS, vol. 23(4), pages 469-475.
    15. Andrew Ainslie & Xavier Drèze & Fred Zufryden, 2005. "Modeling Movie Life Cycles and Market Share," Marketing Science, INFORMS, vol. 24(3), pages 508-517, November.
    16. Kelvyn Jones & Dewi Owen & Ron Johnston & James Forrest & David Manley, 2015. "Modelling the occupational assimilation of immigrants by ancestry, age group and generational differences in Australia: a random effects approach to a large table of counts," Quality & Quantity: International Journal of Methodology, Springer, vol. 49(6), pages 2595-2615, November.
    17. Ron Borzekowski & Raphael Thomadsen & Charles Taragin, 2009. "Competition and price discrimination in the market for mailing lists," Quantitative Marketing and Economics (QME), Springer, vol. 7(2), pages 147-179, June.


    IDEAS is a RePEc service hosted by the Research Division of the Federal Reserve Bank of St. Louis. RePEc uses bibliographic data supplied by the respective publishers.