IDEAS home Printed from https://ideas.repec.org/a/bpj/ijbist/v4y2008i1n18.html

Testing for Associations with Missing High-Dimensional Categorical Covariates

Author

Listed:
  • Schumi Jennifer

    (Statistics Collaborative, Inc.)

  • DiRienzo A. Gregory

    (Harvard University)

  • DeGruttola Victor

    (Harvard University)

Abstract

Understanding how long-term clinical outcomes relate to short-term response to therapy is an important topic of research with a variety of applications. In HIV, early measures of viral RNA levels are known to be a strong prognostic indicator of future viral load response. However, mutations observed in the high-dimensional viral genotype at an early time point may change this prognosis. Unfortunately, some subjects may not have a viral genetic sequence measured at the early time point, and the sequence may be missing for reasons related to the outcome. Complete-case analyses of missing data are generally biased when the assumption that data are missing completely at random is not met, and methods incorporating multiple imputation may not be well-suited for the analysis of high-dimensional data. We propose a semiparametric multiple testing approach to the problem of identifying associations between potentially missing high-dimensional covariates and response. Following the recent exposition by Tsiatis, unbiased nonparametric summary statistics are constructed by inversely weighting the complete cases according to the conditional probability of being observed, given data that is observed for each subject. Resulting summary statistics will be unbiased under the assumption of missing at random. We illustrate our approach through an application to data from a recent AIDS clinical trial, and demonstrate finite sample properties with simulations.
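The inverse-weighting idea in the abstract can be illustrated with a small simulation. The sketch below is not the authors' test statistic or their multiple testing procedure. It assumes a single binary covariate (mutation present or absent), a binary outcome that is observed for every subject, and missingness of the covariate that depends only on that outcome, so the data are missing at random. All function names, sample sizes, and probabilities are hypothetical choices made for illustration. Under these assumed settings the true response rate among mutation carriers is 0.7; the complete-case estimate drifts toward roughly 0.84, while weighting each complete case by the inverse of its estimated probability of being observed should recover a value near 0.7.

```python
import random


def estimate_prob_observed(y, observed):
    """Estimate P(covariate observed | outcome) as the observed fraction per outcome level."""
    probs = {}
    for level in set(y):
        flags = [r for yi, r in zip(y, observed) if yi == level]
        probs[level] = sum(flags) / len(flags)
    return probs


def complete_case_mean(x, y, observed, level):
    """Unweighted mean outcome among complete cases with covariate == level."""
    vals = [yi for xi, yi, r in zip(x, y, observed) if r and xi == level]
    return sum(vals) / len(vals)


def ipw_mean(x, y, observed, probs, level):
    """Inverse-probability-weighted mean outcome among complete cases with covariate == level."""
    num = den = 0.0
    for xi, yi, r in zip(x, y, observed):
        if r and xi == level:
            w = 1.0 / probs[yi]  # weight = 1 / estimated P(observed | outcome)
            num += w * yi
            den += w
    return num / den


def simulate(n=20000, seed=0):
    """Simulate one data set (all settings are illustrative assumptions)."""
    rng = random.Random(seed)
    x, y, observed = [], [], []
    for _ in range(n):
        xi = 1 if rng.random() < 0.3 else 0                    # mutation present?
        yi = 1 if rng.random() < (0.7 if xi else 0.4) else 0   # response, always observed
        r = rng.random() < (0.9 if yi else 0.4)                # genotype observed? depends on outcome only
        x.append(xi if r else None)                            # covariate is missing when r is False
        y.append(yi)
        observed.append(r)
    probs = estimate_prob_observed(y, observed)
    return complete_case_mean(x, y, observed, 1), ipw_mean(x, y, observed, probs, 1)


cc, ipw = simulate()
print(f"complete-case: {cc:.3f}  IPW: {ipw:.3f}  (simulated truth: 0.700)")
```

Because the outcome here is discrete, the probability of being observed is estimated nonparametrically by the observed fraction within each outcome level; with continuous or high-dimensional observed data, some model for that probability would be needed instead.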

Suggested Citation

  • Schumi Jennifer & DiRienzo A. Gregory & DeGruttola Victor, 2008. "Testing for Associations with Missing High-Dimensional Categorical Covariates," The International Journal of Biostatistics, De Gruyter, vol. 4(1), pages 1-17, September.
  • Handle: RePEc:bpj:ijbist:v:4:y:2008:i:1:n:18
    DOI: 10.2202/1557-4679.1102

    Download full text from publisher

    File URL: https://doi.org/10.2202/1557-4679.1102
    Download Restriction: For access to full text, subscription to the journal or payment for the individual article is required.



    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Christina C. Bartenschlager & Michael Krapp, 2015. "Theorie und Methoden multipler statistischer Vergleiche," AStA Wirtschafts- und Sozialstatistisches Archiv, Springer;Deutsche Statistische Gesellschaft - German Statistical Society, vol. 9(2), pages 107-129, November.
    2. Alessio Farcomeni, 2009. "Generalized Augmentation to Control the False Discovery Exceedance in Multiple Testing," Scandinavian Journal of Statistics, Danish Society for Theoretical Statistics;Finnish Statistical Society;Norwegian Statistical Association;Swedish Statistical Association, vol. 36(3), pages 501-517, September.
    3. Joseph P. Romano & Michael Wolf, 2008. "Balanced Control of Generalized Error Rates," IEW - Working Papers 379, Institute for Empirical Research in Economics - University of Zurich.
    4. Merrill Birkner & Sandra Sinisi & Mark van der Laan, 2004. "Multiple Testing and Data Adaptive Regression: An Application to HIV-1 Sequence Data," U.C. Berkeley Division of Biostatistics Working Paper Series 1161, Berkeley Electronic Press.
    5. van der Laan Mark J. & Hubbard Alan E., 2006. "Quantile-Function Based Null Distribution in Resampling Based Multiple Testing," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 5(1), pages 1-30, May.
    6. Miecznikowski, Jeffrey C. & Gold, David & Shepherd, Lori & Liu, Song, 2011. "Deriving and comparing the distribution for the number of false positives in single step methods to control k-FWER," Statistics & Probability Letters, Elsevier, vol. 81(11), pages 1695-1705, November.
    7. Joseph P. Romano & Azeem M. Shaikh & Michael Wolf, 2010. "Hypothesis Testing in Econometrics," Annual Review of Economics, Annual Reviews, vol. 2(1), pages 75-104, September.
    8. Mathur, Maya B & VanderWeele, Tyler, 2018. "Statistical methods for evidence synthesis," Thesis Commons kd6ja, Center for Open Science.
    9. Yang Yang & Victor DeGruttola, 2008. "Resampling-Based Multiple Testing Methods with Covariate Adjustment: Application to Investigation of Antiretroviral Drug Susceptibility," Biometrics, The International Biometric Society, vol. 64(2), pages 329-336, June.
    10. Pallavi Basu & Luella Fu & Alessio Saretto & Wenguang Sun, 2021. "Empirical Bayes Control of the False Discovery Exceedance," Working Papers 2115, Federal Reserve Bank of Dallas.
    11. Wang, Li & Xu, Xingzhong, 2012. "Step-up procedure controlling generalized family-wise error rate," Statistics & Probability Letters, Elsevier, vol. 82(4), pages 775-782.
    12. Hossain, Ahmed & Beyene, Joseph & Willan, Andrew R. & Hu, Pingzhao, 2009. "A flexible approximate likelihood ratio test for detecting differential expression in microarray data," Computational Statistics & Data Analysis, Elsevier, vol. 53(10), pages 3685-3695, August.
    13. Günther Fink & Margaret McConnell & Sebastian Vollmer, 2014. "Testing for heterogeneous treatment effects in experimental data: false discovery risks and correction procedures," Journal of Development Effectiveness, Taylor & Francis Journals, vol. 6(1), pages 44-57, January.
    14. Irene Castro-Conde & Jacobo Uña-Álvarez, 2015. "Power, FDR and conservativeness of BB-SGoF method," Computational Statistics, Springer, vol. 30(4), pages 1143-1161, December.
    15. Aureo de Paula & Xun Tang, 2010. "Inference of Signs of Interaction Effects in Simultaneous Games with Incomplete Information, Second Version," PIER Working Paper Archive 11-003, Penn Institute for Economic Research, Department of Economics, University of Pennsylvania, revised 12 Feb 2011.
    16. Guo Wenge & Peddada Shyamal, 2008. "Adaptive Choice of the Number of Bootstrap Samples in Large Scale Multiple Testing," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 7(1), pages 1-21, March.
    17. Áureo de Paula & Xun Tang, 2012. "Inference of Signs of Interaction Effects in Simultaneous Games With Incomplete Information," Econometrica, Econometric Society, vol. 80(1), pages 143-172, January.
    18. Cerioli, Andrea & Farcomeni, Alessio, 2011. "Error rates for multivariate outlier detection," Computational Statistics & Data Analysis, Elsevier, vol. 55(1), pages 544-553, January.
    19. de Uña-Alvarez Jacobo, 2012. "The Beta-Binomial SGoF method for multiple dependent tests," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 11(3), pages 1-32, May.
    20. Rubin Daniel & Dudoit Sandrine & van der Laan Mark, 2006. "A Method to Increase the Power of Multiple Testing Procedures Through Sample Splitting," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 5(1), pages 1-20, August.
