Printed from https://ideas.repec.org/a/eee/ecosta/v2y2017icp73-80.html

Big Data in context and robustness against heterogeneity

Author

Listed:
  • Marron, J.S.

Abstract

The phrase Big Data has generated substantial current discussion within and outside of the field of statistics. Some personal observations about this phenomenon are discussed. One contribution is to put this set of ideas into a larger historical context. Another is to point out the related important concept of robustness against data heterogeneity, and some earlier methods which had that property, and also to discuss a number of interesting open problems motivated by this concept.

Suggested Citation

  • Marron, J.S., 2017. "Big Data in context and robustness against heterogeneity," Econometrics and Statistics, Elsevier, vol. 2(C), pages 73-80.
  • Handle: RePEc:eee:ecosta:v:2:y:2017:i:c:p:73-80
    DOI: 10.1016/j.ecosta.2016.06.001

    Download full text from publisher

    File URL: http://www.sciencedirect.com/science/article/pii/S2452306216300016
    Download Restriction: Full text for ScienceDirect subscribers only. Contains open access articles


    References listed on IDEAS

    1. Peter Hall & J. S. Marron & Amnon Neeman, 2005. "Geometric representation of high dimension, low sample size data," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 67(3), pages 427-444, June.
    2. Marron, J.S. & Todd, Michael J. & Ahn, Jeongyoun, 2007. "Distance-Weighted Discrimination," Journal of the American Statistical Association, American Statistical Association, vol. 102, pages 1267-1271, December.
    3. John Shawe-Taylor & Keith Howker & Phil Gosset & Mark Hyland & Herman Verrelst & Yves Moreau & Christof Stoermann & Peter Burge, 2000. "Novel Techniques for Profiling and Fraud Detection in Mobile Telecommunications," World Scientific Book Chapters, in: Business Applications Of Neural Networks The State-of-the-Art of Real-World Applications, chapter 8, pages 113-139, World Scientific Publishing Co. Pte. Ltd.
    4. Jeffrey T Leek & John D Storey, 2007. "Capturing Heterogeneity in Gene Expression Studies by Surrogate Variable Analysis," PLOS Genetics, Public Library of Science, vol. 3(9), pages 1-12, September.
    5. Charles M. Perou & Therese Sørlie & Michael B. Eisen & Matt van de Rijn & Stefanie S. Jeffrey & Christian A. Rees & Jonathan R. Pollack & Douglas T. Ross & Hilde Johnsen & Lars A. Akslen & Øystein Flu, 2000. "Molecular portraits of human breast tumours," Nature, Nature, vol. 406(6797), pages 747-752, August.
    6. Makoto Aoshima & Kazuyoshi Yata, 2014. "A distance-based, misclassification rate adjusted classifier for multiclass, high-dimensional data," Annals of the Institute of Statistical Mathematics, Springer;The Institute of Statistical Mathematics, vol. 66(5), pages 983-1010, October.
    7. Xiaosun Lu & J. S. Marron & Perry Haaland, 2014. "Object-Oriented Data Analysis of Cell Images," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 109(506), pages 548-559, June.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Yugo Nakayama & Kazuyoshi Yata & Makoto Aoshima, 2020. "Bias-corrected support vector machine with Gaussian kernel in high-dimension, low-sample-size settings," Annals of the Institute of Statistical Mathematics, Springer;The Institute of Statistical Mathematics, vol. 72(5), pages 1257-1286, October.
    2. Ishii, Aki & Yata, Kazuyoshi & Aoshima, Makoto, 2022. "Geometric classifiers for high-dimensional noisy data," Journal of Multivariate Analysis, Elsevier, vol. 188(C).
    3. Makoto Aoshima & Kazuyoshi Yata, 2019. "High-Dimensional Quadratic Classifiers in Non-sparse Settings," Methodology and Computing in Applied Probability, Springer, vol. 21(3), pages 663-682, September.
    4. Makoto Aoshima & Kazuyoshi Yata, 2019. "Distance-based classifier by data transformation for high-dimension, strongly spiked eigenvalue models," Annals of the Institute of Statistical Mathematics, Springer;The Institute of Statistical Mathematics, vol. 71(3), pages 473-503, June.
    5. Mark Reimers, 2010. "Making Informed Choices about Microarray Data Analysis," PLOS Computational Biology, Public Library of Science, vol. 6(5), pages 1-7, May.
    6. Niladri Roy Chowdhury & Dianne Cook & Heike Hofmann & Mahbubul Majumder & Eun-Kyung Lee & Amy Toth, 2015. "Using visual statistical inference to better understand random class separations in high dimension, low sample size data," Computational Statistics, Springer, vol. 30(2), pages 293-316, June.
    7. Makoto Aoshima & Kazuyoshi Yata, 2014. "A distance-based, misclassification rate adjusted classifier for multiclass, high-dimensional data," Annals of the Institute of Statistical Mathematics, Springer;The Institute of Statistical Mathematics, vol. 66(5), pages 983-1010, October.
    8. Sudhir Varma, 2020. "Blind estimation and correction of microarray batch effect," PLOS ONE, Public Library of Science, vol. 15(4), pages 1-15, April.
    9. Nakayama, Yugo & Yata, Kazuyoshi & Aoshima, Makoto, 2021. "Clustering by principal component analysis with Gaussian kernel in high-dimension, low-sample-size settings," Journal of Multivariate Analysis, Elsevier, vol. 185(C).
    10. Anil K. Ghosh & Munmun Biswas, 2016. "Distribution-free high-dimensional two-sample tests based on discriminating hyperplanes," TEST: An Official Journal of the Spanish Society of Statistics and Operations Research, Springer;Sociedad de Estadística e Investigación Operativa, vol. 25(3), pages 525-547, September.
    11. Kazuyoshi Yata & Makoto Aoshima, 2020. "Geometric consistency of principal component scores for high‐dimensional mixture models and its application," Scandinavian Journal of Statistics, Danish Society for Theoretical Statistics;Finnish Statistical Society;Norwegian Statistical Association;Swedish Statistical Association, vol. 47(3), pages 899-921, September.
    12. Jung, Sungkyu, 2018. "Continuum directions for supervised dimension reduction," Computational Statistics & Data Analysis, Elsevier, vol. 125(C), pages 27-43.
    13. Bolivar-Cime, A. & Marron, J.S., 2013. "Comparison of binary discrimination methods for high dimension low sample size data," Journal of Multivariate Analysis, Elsevier, vol. 115(C), pages 108-121.
    14. Pedro Galeano & Daniel Peña, 2019. "Data science, big data and statistics," TEST: An Official Journal of the Spanish Society of Statistics and Operations Research, Springer;Sociedad de Estadística e Investigación Operativa, vol. 28(2), pages 289-329, June.
    15. Yang, Xi & Hoadley, Katherine A. & Hannig, Jan & Marron, J.S., 2023. "Jackstraw inference for AJIVE data integration," Computational Statistics & Data Analysis, Elsevier, vol. 180(C).
    16. Manish G & Anil Kumar Badana & Rama Rao Malla, 2017. "Emerging Diagnostic and Prognostic Biomarkers of Triple Negative Breast Cancer," Biomedical Journal of Scientific & Technical Research, Biomedical Research Network+, LLC, vol. 1(3), pages 561-565, August.
    17. Jacob Elnaggar & Fern Tsien & Lucio Miele & Chindo Hicks & Clayton Yates & Melisa Davis, 2019. "An Integrative Genomics Approach for Associating Genetic Susceptibility with the Tumor Immune Microenvironment in Triple Negative Breast Cancer," Biomedical Journal of Scientific & Technical Research, Biomedical Research Network+, LLC, vol. 15(1), pages 1-12, February.
    18. Yata, Kazuyoshi & Aoshima, Makoto, 2013. "PCA consistency for the power spiked model in high-dimensional settings," Journal of Multivariate Analysis, Elsevier, vol. 122(C), pages 334-354.
    19. Jung, Sungkyu & Sen, Arusharka & Marron, J.S., 2012. "Boundary behavior in High Dimension, Low Sample Size asymptotics of PCA," Journal of Multivariate Analysis, Elsevier, vol. 109(C), pages 190-203.
    20. María Elena Martínez & Jonathan T Unkart & Li Tao & Candyce H Kroenke & Richard Schwab & Ian Komenaka & Scarlett Lin Gomez, 2017. "Prognostic significance of marital status in breast cancer survival: A population-based study," PLOS ONE, Public Library of Science, vol. 12(5), pages 1-14, May.

    More about this item

    Keywords

    Big data; Robustness against heterogeneity
