IDEAS home Printed from
   My bibliography  Save this paper

Novel Methods for Multivariate Ordinal Data applied to Genetic Diplotypes, Genomic Pathways, Risk Profiles, and Pattern Similarity


  • Wittkowski, Knut M.


Introduction: Conventional statistical methods for multivariate data (e.g., discriminant/regression) are based on the (generalized) linear model, i.e., the data are interpreted as points in a Euclidian space of independent dimensions. The dimensionality of the data is then reduced by assuming the components to be related by a specific function of known type (linear, exponential, etc.), which allows the distance of each point from a hyperspace to be determined. While mathematically elegant, these approaches may have shortcomings when applied to real world applications where the relative importance, the functional relationship, and the correlation among the variables tend to be unknown. Still, in many applications, each variable can be assumed to have at least an “orientation”, i.e., it can reasonably assumed that, if all other conditions are held constant, an increase in this variable is either “good” or “bad”. The direction of this orientation can be known or unknown. In genetics, for instance, having more “abnormal” alleles may increase the risk (or magnitude) of a disease phenotype. In genomics, the expression of several related genes may indicate disease activity. When screening for security risks, more indicators for atypical behavior may constitute raise more concern, in face or voice recognition, more indicators being similar may increase the likelihood of a person being identified. Methods: In 1998, we developed a nonparametric method for analyzing multivariate ordinal data to assess the overall risk of HIV infection based on different types of behavior or the overall protective effect of barrier methods against HIV infection. By using u-statistics, rather than the marginal likelihood, we were able to increase the computational efficiency of this approach by several orders of magnitude. Results: We applied this approach to assessing immunogenicity of a vaccination strategy in cancer patients. While discussing the pitfalls of the conventional methods for linking quantitative traits to haplotypes, we realized that this approach could be easily modified into to a statistically valid alternative to a previously proposed approaches. We have now begun to use the same methodology to correlate activity of anti-inflammatory drugs along genomic pathways with disease severity of psoriasis based on several clinical and histological characteristics. Conclusion: Multivariate ordinal data are frequently observed to assess semiquantitative characteristics, such as risk profiles (genetic, genomic, or security) or similarity of pattern (faces, voices, behaviors). The conventional methods require empirical validation, because the functions and weights chosen cannot be justified on theoretical grounds. The proposed statistical method for analyzing profiles of ordinal variables, is intrinsically valid. Since no additional assumptions need to be made, the often time-consuming empirical validation can be skipped.

Suggested Citation

  • Wittkowski, Knut M., 2003. "Novel Methods for Multivariate Ordinal Data applied to Genetic Diplotypes, Genomic Pathways, Risk Profiles, and Pattern Similarity," MPRA Paper 4570, University Library of Munich, Germany.
  • Handle: RePEc:pra:mprapa:4570

    Download full text from publisher

    File URL:
    File Function: original version
    Download Restriction: no

    References listed on IDEAS

    1. Li K-C. & Aragon Y. & Shedden K. & Thomas Agnan C., 2003. "Dimension Reduction for Multivariate Response Data," Journal of the American Statistical Association, American Statistical Association, vol. 98, pages 99-109, January.
    2. repec:aph:ajpbhl:1998:88:4:671-674_8 is not listed on IDEAS
    3. Quinn McNemar, 1947. "Note on the sampling error of the difference between correlated proportions or percentages," Psychometrika, Springer;The Psychometric Society, vol. 12(2), pages 153-157, June.
    4. repec:aph:ajpbhl:1998:88:4:590-596_3 is not listed on IDEAS
    Full references (including those not matched with items on IDEAS)


    Citations are extracted by the CitEc Project, subscribe to its RSS feed for this item.

    Cited by:

    1. Jan Ours & Frederic Vermeulen, 2007. "Ranking Dutch Economists," De Economist, Springer, vol. 155(4), pages 469-487, December.
    2. Wittkowski Knut M. & Song Tingting & Anderson Kent & Daniels John E., 2008. "U-Scores for Multivariate Data in Sports," Journal of Quantitative Analysis in Sports, De Gruyter, vol. 4(3), pages 1-24, July.
    3. Morales José F. & Song Tingting & Auerbach Arleen D. & Wittkowski Knut M., 2008. "Phenotyping Genetic Diseases Using an Extension of µ-Scores for Multivariate Data," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 7(1), pages 1-20, June.

    More about this item


    ranking; nonparametric; robust; scoring; multivariate;

    JEL classification:

    • C35 - Mathematical and Quantitative Methods - - Multiple or Simultaneous Equation Models; Multiple Variables - - - Discrete Regression and Qualitative Choice Models; Discrete Regressors; Proportions
    • C44 - Mathematical and Quantitative Methods - - Econometric and Statistical Methods: Special Topics - - - Operations Research; Statistical Decision Theory
    • C14 - Mathematical and Quantitative Methods - - Econometric and Statistical Methods and Methodology: General - - - Semiparametric and Nonparametric Methods: General


    Access and download statistics


    All material on this site has been provided by the respective publishers and authors. You can help correct errors and omissions. When requesting a correction, please mention this item's handle: RePEc:pra:mprapa:4570. See general information about how to correct material in RePEc.

    For technical questions regarding this item, or to correct its authors, title, abstract, bibliographic or download information, contact: (Joachim Winter). General contact details of provider: .

    If you have authored this item and are not yet registered with RePEc, we encourage you to do it here. This allows to link your profile to this item. It also allows you to accept potential citations to this item that we are uncertain about.

    If CitEc recognized a reference but did not link an item in RePEc to it, you can help with this form .

    If you know of missing items citing this one, you can help us creating those links by adding the relevant references in the same way as above, for each refering item. If you are a registered author of this item, you may also want to check the "citations" tab in your RePEc Author Service profile, as there may be some citations waiting for confirmation.

    Please note that corrections may take a couple of weeks to filter through the various RePEc services.

    IDEAS is a RePEc service hosted by the Research Division of the Federal Reserve Bank of St. Louis . RePEc uses bibliographic data supplied by the respective publishers.