Printed from https://ideas.repec.org/a/bla/jorssb/v77y2015i5p947-972.html

A split-and-merge Bayesian variable selection approach for ultrahigh dimensional regression

Author

Listed:
  • Qifan Song
  • Faming Liang

Abstract

We propose a Bayesian variable selection approach for ultrahigh dimensional linear regression based on the strategy of split and merge. The approach proposed consists of two stages: split the ultrahigh dimensional data set into a number of lower dimensional subsets and select relevant variables from each of the subsets, and aggregate the variables selected from each subset and then select relevant variables from the aggregated data set. Since the approach proposed has an embarrassingly parallel structure, it can be easily implemented in a parallel architecture and applied to big data problems with millions or more of explanatory variables. Under mild conditions, we show that the approach proposed is consistent, i.e. the true explanatory variables can be correctly identified by the approach as the sample size becomes large. Extensive comparisons of the approach proposed have been made with penalized likelihood approaches, such as the lasso, elastic net, sure independence screening and iterative sure independence screening. The numerical results show that the approach proposed generally outperforms penalized likelihood approaches: the models selected by the approach tend to be more sparse and closer to the true model.
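
The two-stage structure described in the abstract can be illustrated with a small sketch. It is not the authors' method: the Bayesian selector is replaced by a simple greedy forward selection with a BIC-type stopping rule (an extra 2*log(p) penalty term is an assumption made here to keep noise variables out), the split is assumed to be over explanatory variables (columns), and all function names (`forward_select`, `split_and_merge`, `simulate`) and the simulated data are hypothetical.

```python
import math
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def forward_select(columns, y, max_vars=5):
    """Greedy forward selection; a stand-in for the paper's Bayesian selector.

    columns: dict {variable index: list of n observations}.
    Stops when the drop in n*log(RSS) no longer exceeds the penalty
    log(n) + 2*log(p), where p is the number of candidates (an assumption).
    """
    n = len(y)
    if not columns:
        return []
    penalty = math.log(n) + 2.0 * math.log(len(columns))
    mean_y = sum(y) / n
    resid = [v - mean_y for v in y]
    # Centred working copies; orthogonalised against each selected column.
    work = {}
    for j, col in columns.items():
        m = sum(col) / n
        work[j] = [v - m for v in col]
    selected = []
    rss = dot(resid, resid)
    while work and len(selected) < max_vars:
        best_j, best_gain = None, 0.0
        for j, x in work.items():
            norm2 = dot(x, x)
            if norm2 > 1e-10:
                gain = dot(x, resid) ** 2 / norm2  # RSS reduction from adding j
                if gain > best_gain:
                    best_j, best_gain = j, gain
        if best_j is None:
            break
        new_rss = max(rss - best_gain, 1e-12)
        if n * math.log(rss / new_rss) <= penalty:
            break
        q = work.pop(best_j)
        qq = dot(q, q)
        coef = dot(q, resid) / qq
        resid = [r - coef * v for r, v in zip(resid, q)]
        for j in list(work):  # Gram-Schmidt step against the new column
            c = dot(work[j], q) / qq
            work[j] = [a - c * b for a, b in zip(work[j], q)]
        selected.append(best_j)
        rss = new_rss
    return selected


def split_and_merge(X_cols, y, n_subsets=4, seed=0):
    """Stage 1: select within random column subsets. Stage 2: select again
    among the pooled stage-1 survivors. Returns (survivors, final)."""
    rng = random.Random(seed)
    indices = list(range(len(X_cols)))
    rng.shuffle(indices)
    subsets = [indices[k::n_subsets] for k in range(n_subsets)]
    survivors = []
    for subset in subsets:  # independent tasks; could be mapped over workers
        survivors += forward_select({j: X_cols[j] for j in subset}, y)
    final = forward_select({j: X_cols[j] for j in survivors}, y)
    return sorted(survivors), sorted(final)


def simulate(n=200, p=400, noise_sd=0.5, seed=1):
    """Toy data: three true variables among p independent Gaussian columns."""
    rng = random.Random(seed)
    true_coefs = {3: 3.0, 177: -3.0, 350: 3.0}
    X_cols = [[rng.gauss(0, 1) for _ in range(n)] for _ in range(p)]
    y = [sum(b * X_cols[j][i] for j, b in true_coefs.items())
         + rng.gauss(0, noise_sd) for i in range(n)]
    return X_cols, y, sorted(true_coefs)


if __name__ == "__main__":
    X_cols, y, truth = simulate()
    survivors, final = split_and_merge(X_cols, y)
    print("true variables:    ", truth)
    print("stage-1 survivors: ", survivors)
    print("stage-2 selection: ", final)
```

Because each stage-one subset is processed independently, the loop over subsets is the part that could be distributed across workers; the merge stage then operates on a much smaller candidate set. The stage-two pass is also where variables picked up spuriously in a single subset get a second chance to be dropped.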

Suggested Citation

  • Qifan Song & Faming Liang, 2015. "A split-and-merge Bayesian variable selection approach for ultrahigh dimensional regression," Journal of the Royal Statistical Society Series B, Royal Statistical Society, vol. 77(5), pages 947-972, November.
  • Handle: RePEc:bla:jorssb:v:77:y:2015:i:5:p:947-972

    Download full text from publisher

    File URL: http://hdl.handle.net/10.1111/rssb.2015.77.issue-5
    Download Restriction: Access to full text is restricted to subscribers.

    Citations

    Cited by:

    1. Guangbao Guo & Yue Sun & Xuejun Jiang, 2020. "A partitioned quasi-likelihood for distributed statistical inference," Computational Statistics, Springer, vol. 35(4), pages 1577-1596, December.
    2. Aliaksandr Hubin & Geir Storvik, 2018. "Mode jumping MCMC for Bayesian variable selection in GLMM," Computational Statistics & Data Analysis, Elsevier, vol. 127(C), pages 281-297.
    3. Guangbao Guo & Guoqi Qian & Lu Lin & Wei Shao, 2021. "Parallel inference for big data with the group Bayesian method," Metrika: International Journal for Theoretical and Applied Statistics, Springer, vol. 84(2), pages 225-243, February.
    4. JooChul Lee & HaiYing Wang & Elizabeth D. Schifano, 2020. "Online updating method to correct for measurement error in big data streams," Computational Statistics & Data Analysis, Elsevier, vol. 149(C).
    5. Jianglin Fang, 2023. "A split-and-conquer variable selection approach for high-dimensional general semiparametric models with massive data," Journal of Multivariate Analysis, Elsevier, vol. 194(C).
    6. Adam Jaeger & Nicole A. Lazar, 2020. "Split sample empirical likelihood," Computational Statistics & Data Analysis, Elsevier, vol. 150(C).
    7. Runmin Shi & Faming Liang & Qifan Song & Ye Luo & Malay Ghosh, 2018. "A Blockwise Consistency Method for Parameter Estimation of Complex Models," Sankhya B: The Indian Journal of Statistics, Springer;Indian Statistical Institute, vol. 80(1), pages 179-223, December.
