
New Theory of Discriminant Analysis

In: New Theory of Discriminant Analysis After R. Fisher

Author

  • Shuichi Shinmura (Seikei University, Faculty of Economics)

Abstract

A new theory of discriminant analysis (“the Theory”) after R. Fisher is explained. There are five serious problems with discriminant analysis. I completely solve these problems through five mathematical programming-based linear discriminant functions (MP-based LDFs). First, I develop an optimal linear discriminant function using integer programming (IP-OLDF) based on a minimum number of misclassifications (minimum NM (MNM)) criterion. We consider discriminating data with n cases and p variables. The case xi = (x1i, …, xpi) is a p-vector (i = 1, …, n). Because I formulate IP-OLDF in the p-dimensional discriminant coefficient space b, the n linear hyperplanes (xi · b + 1 = 0) divide the coefficient space into finitely many convex polyhedrons (CPs). LDFs that correspond to the interior points of one CP misclassify the same k cases, and this clearly reveals the relationship between NM and the discriminant coefficients. Because there are finitely many CPs in the discriminant coefficient space, we should select a CP interior point with MNM. We call this CP the “optimal CP (OCP).” MNM decreases monotonically (MNMp ≥ MNM(p+1)). Therefore, if MNMp = 0, the MNMs of all models that include these p variables are zero. If the data are in general position, IP-OLDF searches for a vertex of the true OCP. However, if the data are not in general position, such as Student data, IP-OLDF might not find a vertex of the true OCP. Therefore, I develop Revised IP-OLDF, which searches for an interior point of the true OCP directly. If an LDF corresponds to a CP vertex or edge, more than p cases lie on the discriminant hyperplane, and the LDF cannot discriminate these cases correctly (Problem 1). This means that the NM might not be true. Only Revised IP-OLDF is free from Problem 1. When IP-OLDF discriminates the Swiss banknote data, which have six variables, the MNM of the two-variable model (X4, X6) is zero. Therefore, the MNMs of the 16 models that include (X4, X6) are zero, and the other 47 models are not linearly separable.
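The MNM criterion can be illustrated without integer programming. The sketch below is a hypothetical brute-force search, not the book’s IP-OLDF formulation: in two dimensions it tries a hyperplane just beside every pair of cases, where the small shift eps keeps the hyperplane in a CP interior so that no case sits exactly on it, echoing why Revised IP-OLDF targets the interior point of the OCP.

```python
from itertools import combinations

def count_errors(points, labels, w, b):
    """Count misclassified cases for the LDF f(x) = w.x + b; a case lying
    exactly on the hyperplane (f(x) = 0) counts as an error (Problem 1)."""
    errs = 0
    for x, y in zip(points, labels):
        s = w[0] * x[0] + w[1] * x[1] + b
        if y * s <= 0:
            errs += 1
    return errs

def brute_force_mnm(points, labels, eps=1e-6):
    """Brute-force minimum number of misclassifications (MNM) in 2-D.
    Candidate hyperplanes pass just beside every pair of cases (shifted
    by eps so they stay in a convex-polyhedron interior); both
    orientations of the normal are tried."""
    best = len(points)
    for p, q in combinations(points, 2):
        w = (p[1] - q[1], q[0] - p[0])        # normal of the line through p, q
        b0 = -(w[0] * p[0] + w[1] * p[1])     # that line is w.x + b0 = 0
        for b in (b0 + eps, b0 - eps):
            for s in (1, -1):
                best = min(best, count_errors(points, labels,
                                              (s * w[0], s * w[1]), s * b))
    return best

# One class-1 outlier, (5, 5), sits deep inside class 2's region.
points = [(0, 0), (1, 0), (0, 1), (5, 5), (3, 3), (4, 3), (3, 4)]
labels = [1, 1, 1, 1, -1, -1, -1]
print(brute_force_mnm(points, labels))  # 1
```

A result of 0 would signal LSD; the positive value here signals overlap, the distinction the text draws between MNM = 0 and MNM > 0.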
Although a hard-margin SVM (H-SVM) indicates linearly separable data (LSD) clearly, there is little research on LSD discrimination. Most statisticians erroneously believe that the purpose of discrimination is to discriminate overlapping data, not LSD. All LDFs, with the exception of H-SVM and Revised IP-OLDF, might not discriminate LSD correctly (Problem 2). Moreover, such LDFs cannot determine whether the data overlap or are LSD, whereas MNM decides this: MNM = 0 means LSD, and MNM > 0 means overlap. I demonstrate that Fisher’s LDF and a quadratic discriminant function (QDF) cannot judge the pass/fail determination using examination scores, and that the 18 error rates of both discriminant functions are very high. Using Japanese-automobile data, I explain the defect of the generalized inverse matrix technique and show that QDF misclassifies all cases of class 1 to class 2 in a particular case (Problem 3). Fisher never formulated an equation for the standard errors (SEs) of the error rate and discriminant coefficients (Problem 4). The k-fold cross-validation for small samples (Method 1) solves Problem 4. It offers the mean error rates, M1 and M2, of the training and validation samples, in addition to 95% confidence intervals (CIs) of the error rate and coefficients. I propose a simple and powerful model selection procedure that selects the best model with minimum M2 instead of the leave-one-out (LOO) procedure. The best models of Revised IP-OLDF are better than those of seven other LDFs. For more than ten years, many researchers have struggled to analyze the microarray dataset (the dataset), which is LSD (Problem 5). We call the linearly separable dataset the largest Matroska. Only Revised IP-OLDF can select features naturally and find a smaller gene set or subspace (a smaller Matroska) in the dataset. When we discriminate this smaller Matroska again, we find a still smaller Matroska.
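The M1/M2 evaluation of Method 1 can be sketched in a few lines. The code below is a schematic stand-in: it substitutes a nearest-class-mean rule for the book’s MP-based LDFs and omits the resampling the book performs, keeping only the k-fold split and the two error-rate means.

```python
import random

def centroid_fit(train):
    """Stand-in classifier: the mean of each class (not one of the
    book's MP-based LDFs)."""
    sums = {1: [0.0, 0.0], -1: [0.0, 0.0]}
    counts = {1: 0, -1: 0}
    for x, y in train:
        sums[y][0] += x[0]; sums[y][1] += x[1]; counts[y] += 1
    return {y: (sums[y][0] / counts[y], sums[y][1] / counts[y]) for y in sums}

def centroid_err(means, sample):
    """Error rate of the nearest-class-mean rule on a sample."""
    errs = 0
    for x, y in sample:
        d = {lab: (x[0] - m[0]) ** 2 + (x[1] - m[1]) ** 2
             for lab, m in means.items()}
        if min(d, key=d.get) != y:
            errs += 1
    return errs / len(sample)

def method1(data, k=5, seed=0):
    """k-fold cross-validation: mean error rates M1 (training) and
    M2 (validation), in the spirit of the text's Method 1."""
    rng = random.Random(seed)
    data = data[:]
    rng.shuffle(data)
    folds = [data[i::k] for i in range(k)]
    m1 = m2 = 0.0
    for i in range(k):
        val = folds[i]
        train = [c for j, f in enumerate(folds) if j != i for c in f]
        means = centroid_fit(train)
        m1 += centroid_err(means, train)   # training error of this fold
        m2 += centroid_err(means, val)     # validation error of this fold
    return m1 / k, m2 / k

# Two well-separated clusters: both error-rate means should be zero.
data = ([((i * 0.1, i * 0.2), 1) for i in range(10)] +
        [((10 + i * 0.1, 10 + i * 0.2), -1) for i in range(10)])
m1, m2 = method1(data)
print(m1, m2)  # 0.0 0.0
```

Model selection then reduces to comparing M2 across candidate models and keeping the one with the minimum, as the text proposes in place of LOO.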
When we can no longer find a smaller Matroska, I call the last one a small Matroska (SM), which is a linearly separable gene subspace. Because the dataset has this Matroska structure, I develop the Matroska feature-selection method (Method 2), which reveals the surprising structure of the dataset: it is the disjoint union of several SMs, each a linearly separable subspace or model. Now, we can analyze each SM very quickly because all SMs are small samples. The Theory is most suitable for analyzing such datasets.
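The Matroska decomposition of Method 2 can be caricatured with a toy check. The real Method 2 uses Revised IP-OLDF, and an SM may need several genes; the hypothetical sketch below only peels off single-gene SMs, i.e. features whose two class ranges do not overlap.

```python
def separable_1d(values, labels):
    """A single feature separates the two classes iff the class ranges
    do not overlap (a one-gene small Matroska in the text's terms)."""
    a = [v for v, y in zip(values, labels) if y == 1]
    b = [v for v, y in zip(values, labels) if y == -1]
    return max(a) < min(b) or max(b) < min(a)

def matroska_sketch(X, labels):
    """Toy Method 2: split the feature set into single-gene SMs (features
    that are linearly separable on their own) and the rest."""
    sms, rest = [], []
    for j in range(len(X[0])):
        col = [row[j] for row in X]
        (sms if separable_1d(col, labels) else rest).append(j)
    return sms, rest

# Toy "microarray": 4 genes; genes 0 and 2 each separate the classes alone.
X = [(0, 5, 1, 9), (1, 3, 2, 8),    # class 1
     (9, 4, 8, 8), (8, 6, 9, 10)]   # class 2
labels = [1, 1, -1, -1]
print(matroska_sketch(X, labels))  # ([0, 2], [1, 3])
```

In the book’s terms, each index in the first list is one SM, and the full dataset decomposes into the disjoint union of such separable subspaces plus a non-separable remainder.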

Suggested Citation

  • Shuichi Shinmura, 2016. "New Theory of Discriminant Analysis," Springer Books, in: New Theory of Discriminant Analysis After R. Fisher, chapter 0, pages 1-35, Springer.
  • Handle: RePEc:spr:sprchp:978-981-10-2164-0_1
    DOI: 10.1007/978-981-10-2164-0_1

