
Generalized Feature Embedding for Supervised, Unsupervised, and Online Learning Tasks

Authors

  • Eric Golinko (Florida Atlantic University)
  • Xingquan Zhu (Florida Atlantic University)

Abstract

Feature embedding is an emerging research area that aims to transform features from the original space into a new space to support effective learning. Many feature embedding algorithms exist, but they often suffer from several major drawbacks: (1) they handle only a single feature type, or require users to separate features into different feature views and supply that information for embedding learning; (2) they are designed for either supervised or unsupervised learning tasks, but not both; and (3) embeddings for new out-of-training samples have to be obtained through a retraining phase, which makes them unsuitable for online learning tasks. In this paper, we propose a generalized feature embedding algorithm, GEL, for supervised, unsupervised, and online learning tasks. GEL learns feature embeddings from any type of data, including data with mixed feature types. For supervised learning tasks with class label information, GEL leverages a Class Partitioned Instance Representation (CPIR) process to arrange instances, based on their labels, as a dense binary representation via row and feature vectors for feature embedding learning. If class labels are unavailable, CPIR naturally degenerates and treats all instances as one class. Based on the CPIR representation, GEL uses eigenvector decomposition to convert the proximity matrix into a low-dimensional space. For new out-of-training samples, low-dimensional representations are derived through a direct conversion without a retraining phase. The learned numerical embedding features can be used directly to represent instances for effective learning. Experiments and comparisons on 28 datasets, including categorical, numerical, and ordinal features, demonstrate that embedding features learned by GEL can effectively represent the original instances for clustering, classification, and online learning.
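
The pipeline outlined in the abstract (binary instance representation, class partitioning, eigendecomposition of a proximity matrix, and direct conversion of new samples) can be illustrated with a rough sketch. This is not the authors' GEL implementation: the abstract does not specify how CPIR or the proximity matrix is built, so the one-hot encoding, the per-class co-occurrence proximity, and the function names `one_hot`, `fit_embedding`, and `transform` are all assumptions made for illustration.

```python
import numpy as np

def one_hot(X, categories):
    """Binary indicator matrix with one column per (feature, value) pair.
    Assumes features are already integer-coded categories."""
    blocks = []
    for j, cats in enumerate(categories):
        blocks.append((X[:, [j]] == np.asarray(cats)[None, :]).astype(float))
    return np.hstack(blocks)

def fit_embedding(X, categories, y=None, dim=2):
    """Learn a projection matrix W (hypothetical stand-in for GEL training)."""
    B = one_hot(X, categories)
    if y is None:
        # No labels: treat all instances as a single class.
        y = np.zeros(len(X), dtype=int)
    m = B.shape[1]
    P = np.zeros((m, m))
    # Assumed proximity: sum of per-class, size-normalized co-occurrence matrices.
    for c in np.unique(y):
        Bc = B[y == c]
        P += Bc.T @ Bc / len(Bc)
    # eigh returns eigenvalues in ascending order; keep the top `dim` eigenvectors.
    _, eigvecs = np.linalg.eigh(P)
    return eigvecs[:, ::-1][:, :dim]

def transform(X, categories, W):
    """Embed samples (including unseen ones) by direct projection, no retraining."""
    return one_hot(X, categories) @ W

# Toy data: two binary categorical features, two classes.
X_train = np.array([[0, 1], [1, 0], [0, 0], [1, 1]])
categories = [[0, 1], [0, 1]]
y_train = np.array([0, 0, 1, 1])

W = fit_embedding(X_train, categories, y_train, dim=2)
Z_new = transform(np.array([[1, 0]]), categories, W)  # out-of-training sample
print(Z_new.shape)
```

The point of the sketch is the last step: once `W` is learned, a new sample is embedded with a single matrix product, which is what makes this style of embedding usable in an online setting.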

Suggested Citation

  • Eric Golinko & Xingquan Zhu, 2019. "Generalized Feature Embedding for Supervised, Unsupervised, and Online Learning Tasks," Information Systems Frontiers, Springer, vol. 21(1), pages 125-142, February.
  • Handle: RePEc:spr:infosf:v:21:y:2019:i:1:d:10.1007_s10796-018-9850-y
    DOI: 10.1007/s10796-018-9850-y

    Download full text from publisher

    File URL: http://link.springer.com/10.1007/s10796-018-9850-y
    File Function: Abstract
    Download Restriction: Access to the full text of the articles in this series is restricted.



    Citations


    Cited by:

    1. Thouraya Bouabana-Tebibel & Stuart H. Rubin & Lydia Bouzar-Benlabiod, 2019. "Guest Editorial: Recent Trends in Reuse and Integration," Information Systems Frontiers, Springer, vol. 21(1), pages 1-3, February.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Michael Greenacre, 2012. "Fuzzy coding in constrained ordinations," Economics Working Papers 1325, Department of Economics and Business, Universitat Pompeu Fabra.
    2. Michael Greenacre, 2008. "Correspondence analysis of raw data," Economics Working Papers 1112, Department of Economics and Business, Universitat Pompeu Fabra, revised Jul 2009.
    3. Pötscher, Benedikt M. & Preinerstorfer, David, 2023. "How Reliable Are Bootstrap-Based Heteroskedasticity Robust Tests?," Econometric Theory, Cambridge University Press, vol. 39(4), pages 789-847, August.
    4. Martinetti, Davide & Geniaux, Ghislain, 2017. "Approximate likelihood estimation of spatial probit models," Regional Science and Urban Economics, Elsevier, vol. 64(C), pages 30-45.
    5. Thouraya Bouabana-Tebibel & Stuart H. Rubin, 2016. "Towards common reusable semantics," Information Systems Frontiers, Springer, vol. 18(5), pages 819-823, October.
    6. Aaron T L Lun & Hervé Pagès & Mike L Smith, 2018. "beachmat: A Bioconductor C++ API for accessing high-throughput biological data from a variety of R matrix types," PLOS Computational Biology, Public Library of Science, vol. 14(5), pages 1-15, May.
    7. Greenacre, Michael, 2009. "Power transformations in correspondence analysis," Computational Statistics & Data Analysis, Elsevier, vol. 53(8), pages 3107-3116, June.
    8. Gangl, Katharina & Kastlunger, Barbara & Kirchler, Erich & Voracek, Martin, 2012. "Confidence in the economy in times of crisis: Social representations of experts and laypeople," Journal of Behavioral and Experimental Economics (formerly The Journal of Socio-Economics), Elsevier, vol. 41(5), pages 603-614.
    9. Martina Zámková & Martin Prokop, 2014. "Comparison of Consumer Behavior of Slovaks and Czechs in the Market of Organic Products by Using Correspondence Analysis," Acta Universitatis Agriculturae et Silviculturae Mendelianae Brunensis, Mendel University Press, vol. 62(4), pages 783-795.
    10. Mengyue Wang & Xin Li & Patrick Y. K. Chau, 2021. "Leveraging Image-Processing Techniques for Empirical Research: Feasibility and Reliability in Online Shopping Context," Information Systems Frontiers, Springer, vol. 23(3), pages 607-626, June.
    11. C. J. Torrecilla-Salinas & O. Troyer & M. J. Escalona & M. Mejías, 2019. "A Delphi-based expert judgment method applied to the validation of a mature Agile framework for Web development projects," Information Technology and Management, Springer, vol. 20(1), pages 9-40, March.
    12. Claudia Diamantini & Paolo Lo Giudice & Domenico Potena & Emanuele Storti & Domenico Ursino, 2021. "An Approach to Extracting Topic-guided Views from the Sources of a Data Lake," Information Systems Frontiers, Springer, vol. 23(1), pages 243-262, February.
    13. Harold Doran, 2023. "A Collection of Numerical Recipes Useful for Building Scalable Psychometric Applications," Journal of Educational and Behavioral Statistics, vol. 48(1), pages 37-69, February.
    14. James Flamino & Alessandro Galeazzi & Stuart Feldman & Michael W. Macy & Brendan Cross & Zhenkun Zhou & Matteo Serafino & Alexandre Bovet & Hernán A. Makse & Boleslaw K. Szymanski, 2023. "Political polarization of news media and influencers on Twitter in the 2016 and 2020 US presidential elections," Nature Human Behaviour, Nature, vol. 7(6), pages 904-916, June.
    15. Monroy-Gómez-Franco, Luis, 2023. "Shades of social mobility: Colorism, ethnic origin and intergenerational social mobility," The Quarterly Review of Economics and Finance, Elsevier, vol. 90(C), pages 247-266.
    16. Huynh Evertsen, Phuc & Rasmussen, Einar & Nenadic, Oleg, 2022. "Commercializing circular economy innovations: A taxonomy of academic spin-offs," Technological Forecasting and Social Change, Elsevier, vol. 185(C).
    17. Borong Zou & Hong Wang & Hui Li & Ling Li & Yuhan Zhao, 2022. "Predicting stock index movement using twin support vector machine as an integral part of enterprise system," Systems Research and Behavioral Science, Wiley Blackwell, vol. 39(3), pages 428-439, May.
    18. Peter Ingwersen & Antonio Eleazar Serrano-López, 2018. "Smart city research 1990–2016," Scientometrics, Springer;Akadémiai Kiadó, vol. 117(2), pages 1205-1236, November.
    19. Oliver Wieczorek & Melanie Malzahn, 2024. "Exploring an extinct society through the lens of Habitus-Field theory and the Tocharian text corpus," Palgrave Communications, Palgrave Macmillan, vol. 11(1), pages 1-10, December.
    20. Huichen Gao & Shijuan Wang, 2022. "The Intellectual Structure of Research on Rural-to-Urban Migrants: A Bibliometric Analysis," IJERPH, MDPI, vol. 19(15), pages 1-19, August.
