IDEAS home Printed from https://ideas.repec.org/a/inm/ormoor/v46y2021i3p970-1007.html

Data Exploration by Representative Region Selection: Axioms and Convergence

Author

Listed:
  • Alexander S. Estes

    (Department of Industrial and Systems Engineering, University of Minnesota, Minneapolis, Minnesota 55455)

  • Michael O. Ball

    (Robert H. Smith School of Business and Institute of Systems Research, University of Maryland, College Park, Maryland 20742)

  • David J. Lovell

    (Department of Civil and Environmental Engineering and Institute of Systems Research, University of Maryland, College Park, Maryland 20742)

Abstract

We present a new type of unsupervised learning problem in which we find a small set of representative regions that approximates a larger data set. These regions may be presented to a practitioner along with additional information in order to help the practitioner explore the data set. An advantage of this approach is that it does not rely on cluster structure of the data. We formally define this problem, and we present axioms that should be satisfied by functions that measure the quality of representatives. We provide a quality function that satisfies all of these axioms. Using this quality function, we formulate two optimization problems for finding representatives. We provide convergence results for a general class of methods, and we show that these results apply to several specific methods, including methods derived from the solution of the optimization problems formulated in this paper. We provide an example of how representative regions may be used to explore a data set.

Suggested Citation

  • Alexander S. Estes & Michael O. Ball & David J. Lovell, 2021. "Data Exploration by Representative Region Selection: Axioms and Convergence," Mathematics of Operations Research, INFORMS, vol. 46(3), pages 970-1007, August.
  • Handle: RePEc:inm:ormoor:v:46:y:2021:i:3:p:970-1007
    DOI: 10.1287/moor.2020.1115
    as

    Download full text from publisher

    File URL: http://dx.doi.org/10.1287/moor.2020.1115
    Download Restriction: no

    File URL: https://libkey.io/10.1287/moor.2020.1115?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item
    ---><---

    References listed on IDEAS

    as
    1. Hassin, Refael & Levin, Asaf & Morad, Dana, 2003. "Lexicographic local search and the p-center problem," European Journal of Operational Research, Elsevier, vol. 151(2), pages 265-279, December.
    2. Kaluszka, M., 1998. "On the Devroye-Györfi methods of correcting density estimators," Statistics & Probability Letters, Elsevier, vol. 37(3), pages 249-257, March.
    3. Ingrid K. Glad & Nils Lid Hjort & Nikolai G. Ushakov, 2003. "Correction of Density Estimators that are not Densities," Scandinavian Journal of Statistics, Danish Society for Theoretical Statistics;Finnish Statistical Society;Norwegian Statistical Association;Swedish Statistical Association, vol. 30(2), pages 415-427, June.
    4. Sourour Elloumi & Martine Labbé & Yves Pochet, 2004. "A New Formulation and Resolution Method for the p-Center Problem," INFORMS Journal on Computing, INFORMS, vol. 16(1), pages 84-94, February.
    5. Dorit S. Hochbaum & David B. Shmoys, 1985. "A Best Possible Heuristic for the k -Center Problem," Mathematics of Operations Research, INFORMS, vol. 10(2), pages 180-184, May.
    6. Burman, Prabir & Polonik, Wolfgang, 2009. "Multivariate mode hunting: Data analytic tools with measures of significance," Journal of Multivariate Analysis, Elsevier, vol. 100(6), pages 1198-1218, July.
    7. Caruso, C. & Colorni, A. & Aloi, L., 2003. "Dominant, an algorithm for the p-center problem," European Journal of Operational Research, Elsevier, vol. 149(1), pages 53-64, August.
    Full references (including those not matched with items on IDEAS)

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Alexander Estes & David J. Lovell & Michael O. Ball, 2019. "Unsupervised prototype reduction for data exploration and an application to air traffic management initiatives," EURO Journal on Transportation and Logistics, Springer;EURO - The Association of European Operational Research Societies, vol. 8(5), pages 467-510, December.
    2. Chandra Ade Irawan & Said Salhi & Zvi Drezner, 2016. "Hybrid meta-heuristics with VNS and exact methods: application to large unconditional and conditional vertex $$p$$ p -centre problems," Journal of Heuristics, Springer, vol. 22(4), pages 507-537, August.
    3. Raphael Kramer & Manuel Iori & Thibaut Vidal, 2020. "Mathematical Models and Search Algorithms for the Capacitated p -Center Problem," INFORMS Journal on Computing, INFORMS, vol. 32(2), pages 444-460, April.
    4. José Alejandro Cornejo Acosta & Jesús García Díaz & Ricardo Menchaca-Méndez & Rolando Menchaca-Méndez, 2020. "Solving the Capacitated Vertex K-Center Problem through the Minimum Capacitated Dominating Set Problem," Mathematics, MDPI, vol. 8(9), pages 1-16, September.
    5. Nicolas Dupin & Frank Nielsen & El-Ghazali Talbi, 2021. "Unified Polynomial Dynamic Programming Algorithms for P-Center Variants in a 2D Pareto Front," Mathematics, MDPI, vol. 9(4), pages 1-30, February.
    6. Callaghan, Becky & Salhi, Said & Nagy, Gábor, 2017. "Speeding up the optimal method of Drezner for the p-centre problem in the plane," European Journal of Operational Research, Elsevier, vol. 257(3), pages 722-734.
    7. Yi Xu & Jigen Peng & Yinfeng Xu, 2018. "The mixed center location problem," Journal of Combinatorial Optimization, Springer, vol. 36(4), pages 1128-1144, November.
    8. Gaar, Elisabeth & Sinnl, Markus, 2022. "A scaleable projection‐based branch‐and‐cut algorithm for the p‐center problem," European Journal of Operational Research, Elsevier, vol. 303(1), pages 78-98.
    9. Tsuruta, Yasuhito, 2024. "Bias correction for kernel density estimation with spherical data," Journal of Multivariate Analysis, Elsevier, vol. 203(C).
    10. Patricia Domínguez-Marín & Stefan Nickel & Pierre Hansen & Nenad Mladenović, 2005. "Heuristic Procedures for Solving the Discrete Ordered Median Problem," Annals of Operations Research, Springer, vol. 136(1), pages 145-173, April.
    11. Olivier Thas, 2009. "Comments on: Goodness-of-fit tests in mixed modes," TEST: An Official Journal of the Spanish Society of Statistics and Operations Research, Springer;Sociedad de Estadística e Investigación Operativa, vol. 18(2), pages 260-264, August.
    12. Barbara Anthony & Vineet Goyal & Anupam Gupta & Viswanath Nagarajan, 2010. "A Plant Location Guide for the Unsure: Approximation Algorithms for Min-Max Location Problems," Mathematics of Operations Research, INFORMS, vol. 35(1), pages 79-101, February.
    13. Zhuqi Miao & Balabhaskar Balasundaram & Eduardo L. Pasiliao, 2014. "An exact algorithm for the maximum probabilistic clique problem," Journal of Combinatorial Optimization, Springer, vol. 28(1), pages 105-120, July.
    14. Renbo Huang & Guirong Zhuo & Lu Xiong & Shouyi Lu & Wei Tian, 2023. "A Review of Deep Learning-Based Vehicle Motion Prediction for Autonomous Driving," Sustainability, MDPI, vol. 15(20), pages 1-43, October.
    15. Christopher R. Genovese & Marco Perone-Pacifico & Isabella Verdinelli & Larry Wasserman, 2016. "Non-parametric inference for density modes," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 78(1), pages 99-126, January.
    16. Wei Ding & Ke Qiu, 2020. "Approximating the asymmetric p-center problem in parameterized complete digraphs," Journal of Combinatorial Optimization, Springer, vol. 40(1), pages 21-35, July.
    17. Jesus Garcia-Diaz & Jairo Sanchez-Hernandez & Ricardo Menchaca-Mendez & Rolando Menchaca-Mendez, 2017. "When a worse approximation factor gives better performance: a 3-approximation algorithm for the vertex k-center problem," Journal of Heuristics, Springer, vol. 23(5), pages 349-366, October.
    18. Yasuhito Tsuruta, 2026. "Wrapped flat-top kernel density estimation with circular data," Statistical Papers, Springer, vol. 67(2), pages 1-25, April.
    19. Alexandre Leblanc, 2010. "A bias-reduced approach to density estimation using Bernstein polynomials," Journal of Nonparametric Statistics, Taylor & Francis Journals, vol. 22(4), pages 459-475.
    20. Sergio Cabello, 2023. "Faster distance-based representative skyline and k-center along pareto front in the plane," Journal of Global Optimization, Springer, vol. 86(2), pages 441-466, June.

    More about this item

    Keywords

    ;
    ;
    ;
    ;
    ;
    ;
    ;
    ;
    ;
    ;

    JEL classification:

    Statistics

    Access and download statistics

    Corrections

    All material on this site has been provided by the respective publishers and authors. You can help correct errors and omissions. When requesting a correction, please mention this item's handle: RePEc:inm:ormoor:v:46:y:2021:i:3:p:970-1007. See general information about how to correct material in RePEc.

    If you have authored this item and are not yet registered with RePEc, we encourage you to do it here. This allows to link your profile to this item. It also allows you to accept potential citations to this item that we are uncertain about.

    If CitEc recognized a bibliographic reference but did not link an item in RePEc to it, you can help with this form .

    If you know of missing items citing this one, you can help us creating those links by adding the relevant references in the same way as above, for each refering item. If you are a registered author of this item, you may also want to check the "citations" tab in your RePEc Author Service profile, as there may be some citations waiting for confirmation.

    For technical questions regarding this item, or to correct its authors, title, abstract, bibliographic or download information, contact: Chris Asher (email available below). General contact details of provider: https://edirc.repec.org/data/inforea.html .

    Please note that corrections may take a couple of weeks to filter through the various RePEc services.

    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.