
Sparse Density Trees and Lists: An Interpretable Alternative to High-Dimensional Histograms

Author

Listed:
  • Siong Thye Goh

    (Lee Kong Chian School of Business, Singapore Management University, Singapore 178899)

  • Lesia Semenova

    (Department of Computer Science, Duke University, Durham, North Carolina 27708)

  • Cynthia Rudin

    (Department of Computer Science, Duke University, Durham, North Carolina 27708)

Abstract

We present sparse tree-based and list-based density estimation methods for binary/categorical data. Our density estimation models are higher-dimensional analogies to variable bin-width histograms. In each leaf of the tree (or list), the density is constant, similar to the flat density within the bin of a histogram. Histograms, however, cannot easily be visualized in more than two dimensions, whereas our models can. The accuracy of histograms fades as dimensions increase, whereas our models have priors that help with generalization. Our models are sparse, unlike high-dimensional fixed-bin histograms. We present three generative modeling methods, where the first one allows the user to specify the preferred number of leaves in the tree within a Bayesian prior. The second method allows the user to specify the preferred number of branches within the prior. The third method returns density lists (rather than trees) and allows the user to specify the preferred number of rules and the length of rules within the prior. The new approaches often yield a better balance between sparsity and accuracy of density estimates than other methods for this task. We present an application to crime analysis, where we estimate how unusual each type of modus operandi is for a house break-in.
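
The histogram analogy in the abstract can be illustrated with a minimal sketch. It is not the authors' method: there is no Bayesian prior and no tree learning. The data, the hand-built leaves, and all function names below are hypothetical. Each leaf is assumed to take a constant density equal to the fraction of observations falling in it divided by the number of distinct binary outcomes it covers.

```python
from itertools import product

# Hypothetical toy data: each row is (x0, x1, x2), all binary features.
DATA = [
    (0, 0, 0), (0, 0, 1), (0, 1, 0), (0, 1, 1),
    (0, 0, 0), (0, 1, 1), (1, 0, 0), (1, 0, 1),
]
N_FEATURES = 3

# Hand-built tree with three leaves (an assumption, not a learned model).
# Each leaf is a set of {feature_index: required_value} constraints.
LEAVES = [
    {0: 0},        # x0 == 0: covers 4 of the 8 possible outcomes
    {0: 1, 1: 0},  # x0 == 1 and x1 == 0: covers 2 outcomes
    {0: 1, 1: 1},  # x0 == 1 and x1 == 1: covers 2 outcomes
]


def in_leaf(x, leaf):
    """True if observation x satisfies every constraint of the leaf."""
    return all(x[j] == v for j, v in leaf.items())


def leaf_volume(leaf, n_features=N_FEATURES):
    """Number of distinct binary outcomes the leaf covers."""
    return 2 ** (n_features - len(leaf))


def leaf_densities(data, leaves):
    """Constant density per leaf: (points in leaf) / (n * leaf volume)."""
    n = len(data)
    return [
        sum(in_leaf(x, leaf) for x in data) / (n * leaf_volume(leaf))
        for leaf in leaves
    ]


def density(x, leaves, densities):
    """Look up the flat density of the leaf that contains x."""
    for leaf, d in zip(leaves, densities):
        if in_leaf(x, leaf):
            return d
    raise ValueError("x is not covered by any leaf")


DENSITIES = leaf_densities(DATA, LEAVES)
# Total probability mass over all 2**3 possible outcomes.
TOTAL = sum(density(x, LEAVES, DENSITIES)
            for x in product((0, 1), repeat=N_FEATURES))
```

With three leaves instead of the eight cells of a full fixed-bin histogram, the estimate still sums to one over all outcomes, which is the sparsity trade-off the abstract describes.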

Suggested Citation

  • Siong Thye Goh & Lesia Semenova & Cynthia Rudin, 2024. "Sparse Density Trees and Lists: An Interpretable Alternative to High-Dimensional Histograms," INFORMS Journal on Data Science, INFORMS, vol. 3(1), pages 28-48, April.
  • Handle: RePEc:inm:orijds:v:3:y:2024:i:1:p:28-48
    DOI: 10.1287/ijds.2021.0001

    Download full text from publisher

    File URL: http://dx.doi.org/10.1287/ijds.2021.0001
    Download Restriction: no


    References listed on IDEAS

    1. Cattaneo, Matias D & Jansson, Michael & Ma, Xinwei, 2020. "Simple Local Polynomial Density Estimators," Department of Economics, Working Paper Series qt9vt997qn, Department of Economics, Institute for Business and Economic Research, UC Berkeley.
    2. Kaiyuan Wu & Wei Hou & Hongbo Yang, 2018. "Density estimation via the random forest method," Communications in Statistics - Theory and Methods, Taylor & Francis Journals, vol. 47(4), pages 877-889, February.
    3. Cattaneo, Matias D & Jansson, Michael & Ma, Xinwei, 2020. "Simple Local Polynomial Density Estimators," University of California at San Diego, Economics Working Paper Series qt9vt997qn, Department of Economics, UC San Diego.
    4. Matias D. Cattaneo & Michael Jansson & Xinwei Ma, 2020. "Simple Local Polynomial Density Estimators," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 115(531), pages 1449-1455, July.
    5. Luo Lu & Hui Jiang & Wing H. Wong, 2013. "Multivariate Density Estimation by Bayesian Sequential Partitioning," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 108(504), pages 1402-1410, December.
    6. Tao Chen & Julian Morris & Elaine Martin, 2006. "Probability density estimation via an infinite Gaussian mixture model: application to statistical process monitoring," Journal of the Royal Statistical Society Series C, Royal Statistical Society, vol. 55(5), pages 699-715, November.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Diogo G.C. Britto & Caio de Holanda & Alexandre Fonseca & Breno Sampaio, 2025. "Parental leave, family, and firms," WIDER Working Paper Series wp-2025-71, World Institute for Development Economic Research (UNU-WIDER).
    2. Luis R. Martinez & Jonas Jessen & Guo Xu, 2023. "A Glimpse of Freedom: Allied Occupation and Political Resistance in East Germany," American Economic Journal: Applied Economics, American Economic Association, vol. 15(1), pages 68-106, January.
    3. Jessen, Jonas & Jessen, Robin & Galecka-Burdziak, Ewa & Góra, Marek & Kluve, Jochen, 2023. "The Micro and Macro Effects of Changes in the Potential Benefit Duration," IZA Discussion Papers 15978, Institute of Labor Economics (IZA).
    4. Aaron Albert & Nathan Wozny, 2024. "The Impact of Academic Probation: Do Intensive Interventions Help?," Journal of Human Resources, University of Wisconsin Press, vol. 59(3), pages 852-878.
    5. KAREKURVE-RAMACHANDRA, VARUN & Singh, Sudhir & Stommes, Drew, 2024. "Political Exit: The Unintended Effects of Electoral Rules," OSF Preprints d7xsk_v1, Center for Open Science.
    6. Serena Canaan & Pierre Mouganie & Peng Zhang, 2025. "The long‐run educational benefits of high‐achieving classrooms," Journal of Policy Analysis and Management, John Wiley & Sons, Ltd., vol. 44(4), pages 1347-1373, September.
    7. Johnsen, Julian V. & Willén, Alexander, 2022. "The effect of negative income shocks on pensioners," Labour Economics, Elsevier, vol. 76(C).
    8. Brodeur, Abel & Cook, Nikolai & Heyes, Anthony, 2022. "We Need to Talk about Mechanical Turk: What 22,989 Hypothesis Tests Tell Us about Publication Bias and p-Hacking in Online Experiments," IZA Discussion Papers 15478, Institute of Labor Economics (IZA).
    9. Fernando Alexandre & Miguel Chaves & Miguel Portela, 2025. "Investment grants and firms’ productivity: how effective is a grant booster shot?," Small Business Economics, Springer, vol. 64(4), pages 1601-1641, April.
    10. Matthew Kubic, 2025. "The benefits of article 11 pro forma disclosure," Review of Accounting Studies, Springer, vol. 30(3), pages 2768-2821, September.
    11. Houmark, Mikkel Aagaard & Jørgensen, Cecilie Marie Løchte & Kristiansen, Ida Lykke & Gensowski, Miriam, 2024. "Effects of extending paid parental leave on children’s socio-emotional skills and well-being in adolescence," European Economic Review, Elsevier, vol. 170(C).
    12. Guida Ayza Estopa, 2024. "Return-to-work policies for disability insurance recipients: The role of financial incentives," French Stata Users' Group Meetings 2024 17, Stata Users Group.
    13. Federico Boffa & Vincenzo Mollisi & Giacomo A. M. Ponzetto, 2023. "Do incompetent politicians breed populist voters? Evidence from Italian municipalities," Economics Working Papers 1861, Department of Economics and Business, Universitat Pompeu Fabra.
    14. Gonzalez-Eiras, Martín & Sanz, Carlos, 2021. "Women’s representation in politics: The effect of electoral systems," Journal of Public Economics, Elsevier, vol. 198(C).
    15. repec:irs:cepswp:2024-01 is not listed on IDEAS
    16. Gurgand, Marc & Lorenceau, Adrien & Mélonio, Thomas, 2023. "Student loans: Credit constraints and higher education in South Africa," Journal of Development Economics, Elsevier, vol. 161(C).
    17. Seungho Choi & Raphael Jonghyeon & Simon Xu, 2023. "The Strategic Use of Corporate Philanthropy: Evidence from Bank Donations," Review of Finance, European Finance Association, vol. 27(5), pages 1883-1930.
    18. De Benedetto, Marco Alberto & De Paola, Maria & Scoppa, Vincenzo & Smirnova, Janna, 2025. "Erasmus program and labor market outcomes: Evidence from a fuzzy regression discontinuity design," Labour Economics, Elsevier, vol. 93(C).
    19. Babii, Andrii & Kumar, Rohit, 2023. "Isotonic regression discontinuity designs," Journal of Econometrics, Elsevier, vol. 234(2), pages 371-393.
    20. Diogo G. C. Britto & Paolo Pinotti & Breno Sampaio, 2022. "The Effect of Job Loss and Unemployment Insurance on Crime in Brazil," Econometrica, Econometric Society, vol. 90(4), pages 1393-1423, July.
    21. Albanese, Andrea & Cockx, Bart & Dejemeppe, Muriel, 2024. "Long-term effects of hiring subsidies for low-educated unemployed youths," Journal of Public Economics, Elsevier, vol. 235(C).

