Printed from https://ideas.repec.org/p/ant/wpaper/2017005.html

A benchmarking study of classification techniques for behavioral data

Author

Listed:
  • DE CNUDDE, Sofie
  • MARTENS, David
  • EVGENIOU, Theodoros
  • PROVOST, Foster

Abstract

Previous academic research has emphasized the predictive power of ubiquitous, big behavioral data. The ultra-high-dimensional and sparse nature of such data, however, poses significant challenges for state-of-the-art classification techniques. Moreover, no consensus exists on a feasible trade-off between classification performance and computational complexity. This work contributes in this direction through a systematic benchmarking study. Forty-three fine-grained behavioral data sets are analyzed with 11 classification techniques. Statistical performance comparisons, enriched with learning curve analyses, demonstrate two important findings. First, an inherent AUC-time trade-off becomes clear, making the choice of an appropriate classifier dependent on time restrictions and data set characteristics. Logistic regression achieves the best AUC, but also requires the most time. In addition, L2 regularization proves better than sparse L1 regularization. An attractive trade-off is found in a similarity-based technique called PSN. Second, the results illustrate that significant value lies in collecting and analyzing even more data, in both the instance and the feature dimension, in contrast to findings on traditional data. The results of this study provide guidance for researchers and practitioners in selecting appropriate classification techniques, sample sizes and data features, while also providing focus for scalable algorithm design in the face of large behavioral data.
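
The AUC-time trade-off described in the abstract can be illustrated with a minimal sketch. It is not the authors' benchmarking code: the function names `auc`, `fit_count_scorer` and `benchmark`, the toy data, and the count-based scorer are all assumptions made for illustration. Each instance is stored as the set of its active feature indices, which is one common way to hold sparse behavioral data. The sketch records, for every classifier, its test AUC (computed with the rank-sum formula) and its fitting time.

```python
import time
from typing import Callable, Dict, List, Sequence, Set, Tuple

Instance = Set[int]                      # indices of the active (non-zero) features
Scorer = Callable[[Instance], float]     # maps an instance to a ranking score
Fitter = Callable[[Sequence[Instance], Sequence[int]], Scorer]


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via the rank-sum formula; ties get average ranks."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1       # 1-based average rank of the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC needs at least one positive and one negative label")
    rank_sum_pos = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def fit_count_scorer(X: Sequence[Instance], y: Sequence[int]) -> Scorer:
    """Toy baseline (hypothetical, not from the paper): each feature's weight is
    the number of positive minus the number of negative instances containing it."""
    weights: Dict[int, int] = {}
    for features, label in zip(X, y):
        for f in features:
            weights[f] = weights.get(f, 0) + (1 if label == 1 else -1)
    return lambda features: float(sum(weights.get(f, 0) for f in features))


def benchmark(
    fitters: Dict[str, Fitter],
    train_X: Sequence[Instance], train_y: Sequence[int],
    test_X: Sequence[Instance], test_y: Sequence[int],
) -> List[Tuple[str, float, float]]:
    """Return (name, test AUC, fit time in seconds) for every classifier."""
    results = []
    for name, fit in fitters.items():
        start = time.perf_counter()
        scorer = fit(train_X, train_y)
        elapsed = time.perf_counter() - start
        scores = [scorer(x) for x in test_X]
        results.append((name, auc(test_y, scores), elapsed))
    return results


# Invented toy data: features 1-3 co-occur with the positive class, 7-9 with the negative.
train_X = [{1, 2}, {2, 3}, {7, 8}, {8, 9}]
train_y = [1, 1, 0, 0]
test_X = [{1, 3}, {7, 9}, {2}, {8}]
test_y = [1, 0, 1, 0]

results = benchmark({"count_scorer": fit_count_scorer}, train_X, train_y, test_X, test_y)
for name, auc_value, seconds in results:
    print(f"{name}: AUC={auc_value:.2f}, fit time={seconds:.6f}s")
```

Further classifiers can be compared by adding entries to the dictionary passed to `benchmark`; plotting AUC against fit time for each entry would then expose the kind of trade-off the abstract refers to.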

Suggested Citation

  • DE CNUDDE, Sofie & MARTENS, David & EVGENIOU, Theodoros & PROVOST, Foster, 2017. "A benchmarking study of classification techniques for behavioral data," Working Papers 2017005, University of Antwerp, Faculty of Applied Economics.
  • Handle: RePEc:ant:wpaper:2017005

    Download full text from publisher

    File URL: https://repository.uantwerpen.be/docman/irua/f3979a/142910.pdf
    Download Restriction: no

    Cited by:

    1. DE CNUDDE, Sofie & MARTENS, David & PROVOST, Foster, 2018. "An exploratory study towards applying and demystifying deep learning classification on behavioral big data," Working Papers 2018002, University of Antwerp, Faculty of Applied Economics.
