Doubly Weighted Estimation Approach for Linear Regression Analysis with Two-stage Cluster Samples

Author

Listed:
  • Brajendra C. Sutradhar

    (Memorial University)

Abstract

In a two-stage cluster sampling (TSCS) setup, a sample of clusters is chosen at the first stage from a large number of clusters belonging to a finite population (FP), and at the second stage a random sample of individuals is chosen from the selected clusters. In this sampling setup, it is of interest to collect responses, along with certain multi-dimensional fixed covariates, from all individuals selected in the second-stage clusters, and to examine the effects of these covariates on the responses. In some studies, the fixed covariates from the so-called sampling frame, consisting of all individuals in the first-stage clusters, may be available. Because the responses in a given cluster share a common random cluster effect, they are correlated. Thus, if the data from the first-stage clusters were all available, one could estimate the regression parameters (effects) by using the standard infinite-population-based generalized least squares (GLS) approach, which produces more efficient estimates than the simpler ordinary least squares (OLS) approach. In the present TSCS setup, however, the first-stage clustered data are not available, and hence the estimation has to be done using the second-stage clusters, where the responses can no longer be assumed to arise from the infinite population; rather, there is a sampling effect to consider in order to develop appropriate estimating equations for the regression parameters. Nevertheless, four decades of existing studies, including the pioneering work of Prasad and Rao (J. Am. Stat. Assoc., 85, 163–171, 1990), have used the same GLS estimation, treating the second-stage clusters as the first-stage clusters under a super-population-model-based correlation structure.
In this paper, we revisit this important inference issue and find that, because the existing second-stage-cluster-based GLS approach is constructed ignoring the sampling effect (of the first-stage clusters), it produces biased and hence inconsistent estimates of the regression parameters and other related subsequent effects, let alone any efficiency gain. As a remedy, on top of the sampling weights we introduce an inverse correlation weight for the second-stage clustered elements and provide a doubly weighted GLS (DWGLS) estimation approach, which produces unbiased and consistent estimates of the regression parameters. The correlation parameters are also consistently estimated. To illustrate the estimation biases caused by sampling mis-specification, a numerical illustration using a hypothetical two-stage cluster sample is provided under a simpler, specialized linear cluster model with no covariates, without any loss of generality. For the general regression case, the unbiasedness and consistency of the proposed estimator of the regression parameter, which is of main interest, are studied analytically in detail. The asymptotic normality of the regression estimator is also studied for the construction of confidence intervals when needed.
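
The contrast between OLS and GLS under a shared random cluster effect can be made concrete with a small sketch. The code below is not the paper's DWGLS estimator, whose exact form is not given in the abstract. It assumes a linear model y_ij = x_ij'β + γ_i + ε_ij with an exchangeable within-cluster covariance, and the optional `w_list` argument is a hypothetical way of combining per-observation sampling weights with the inverse covariance. All function names, parameter names, and simulation settings are illustrative assumptions.

```python
import numpy as np


def exchangeable_cov(n, sigma2_e, sigma2_g):
    """Within-cluster covariance implied by a shared random cluster effect:
    sigma2_e on the diagonal plus sigma2_g in every cell."""
    return sigma2_e * np.eye(n) + sigma2_g * np.ones((n, n))


def cluster_gls(X_list, y_list, sigma2_e, sigma2_g, w_list=None):
    """GLS over clusters with known variance components.

    If w_list is given, each cluster's inverse covariance is pre- and
    post-multiplied by diag(sqrt(w)). This weighting is an assumption made
    for illustration only, not the estimator derived in the paper.
    """
    p = X_list[0].shape[1]
    lhs = np.zeros((p, p))
    rhs = np.zeros(p)
    for i, (X, y) in enumerate(zip(X_list, y_list)):
        S_inv = np.linalg.inv(exchangeable_cov(len(y), sigma2_e, sigma2_g))
        if w_list is not None:
            D = np.diag(np.sqrt(w_list[i]))
            S_inv = D @ S_inv @ D
        lhs += X.T @ S_inv @ X
        rhs += X.T @ S_inv @ y
    return np.linalg.solve(lhs, rhs)


def ols(X_list, y_list):
    """Pooled OLS that ignores the within-cluster correlation."""
    X = np.vstack(X_list)
    y = np.concatenate(y_list)
    return np.linalg.lstsq(X, y, rcond=None)[0]


def simulate_clusters(rng, n_clusters, beta, sigma2_e, sigma2_g):
    """Draw clusters of 3 to 8 units with an intercept and one covariate."""
    X_list, y_list = [], []
    for _ in range(n_clusters):
        n = int(rng.integers(3, 9))
        X = np.column_stack([np.ones(n), rng.normal(size=n)])
        gamma = rng.normal(scale=np.sqrt(sigma2_g))  # shared cluster effect
        eps = rng.normal(scale=np.sqrt(sigma2_e), size=n)
        X_list.append(X)
        y_list.append(X @ beta + gamma + eps)
    return X_list, y_list


rng = np.random.default_rng(0)
beta_true = np.array([1.0, 2.0])
X_list, y_list = simulate_clusters(rng, 200, beta_true, sigma2_e=1.0, sigma2_g=0.5)

beta_ols = ols(X_list, y_list)
beta_gls = cluster_gls(X_list, y_list, sigma2_e=1.0, sigma2_g=0.5)
print("OLS:", beta_ols)
print("GLS:", beta_gls)
```

When the cluster-effect variance is set to zero and no weights are passed, the GLS function reduces to pooled OLS, which is a convenient sanity check. The sketch treats the variance components as known, whereas the abstract states that the correlation parameters are themselves estimated.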

Suggested Citation

  • Brajendra C. Sutradhar, 2024. "Doubly Weighted Estimation Approach for Linear Regression Analysis with Two-stage Cluster Samples," Sankhya B: The Indian Journal of Statistics, Springer;Indian Statistical Institute, vol. 86(1), pages 55-90, May.
  • Handle: RePEc:spr:sankhb:v:86:y:2024:i:1:d:10.1007_s13571-023-00321-9
    DOI: 10.1007/s13571-023-00321-9

    Download full text from publisher

    File URL: http://link.springer.com/10.1007/s13571-023-00321-9
    File Function: Abstract
    Download Restriction: Access to the full text of the articles in this series is restricted.


    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Sun-Joo Cho & Sarah Brown-Schmidt & Woo-yeol Lee, 2018. "Autoregressive Generalized Linear Mixed Effect Models with Crossed Random Effects: An Application to Intensive Binary Time Series Eye-Tracking Data," Psychometrika, Springer;The Psychometric Society, vol. 83(3), pages 751-771, September.
    2. Brajendra C. Sutradhar, 2018. "Semi-parametric Dynamic Models for Longitudinal Ordinal Categorical Data," Sankhya A: The Indian Journal of Statistics, Springer;Indian Statistical Institute, vol. 80(1), pages 80-109, February.
    3. Ana Isabel Polo Peña & Dolores María Frías Jamilena & Miguel Ángel Rodríguez Molina, 2017. "The effects of perceived value on loyalty: the moderating effect of market orientation adoption," Service Business, Springer;Pan-Pacific Business Association, vol. 11(1), pages 93-116, March.
    4. H. Kaufmann, 1988. "On existence and uniqueness of maximum likelihood estimates in quantal and ordinal response models," Metrika: International Journal for Theoretical and Applied Statistics, Springer, vol. 35(1), pages 291-313, December.
    5. Moysiadis, Theodoros & Fokianos, Konstantinos, 2014. "On binary and categorical time series models with feedback," Journal of Multivariate Analysis, Elsevier, vol. 131(C), pages 209-228.
    6. Moritz Berger & Gerhard Tutz, 2021. "Transition models for count data: a flexible alternative to fixed distribution models," Statistical Methods & Applications, Springer;Società Italiana di Statistica, vol. 30(4), pages 1259-1283, October.
    7. Brajendra C. Sutradhar, 2022. "Multinomial Logistic Mixed Models for Clustered Categorical Data in a Complex Survey Sampling Setup," Sankhya A: The Indian Journal of Statistics, Springer;Indian Statistical Institute, vol. 84(2), pages 743-789, August.
    8. Peiming Wang & Martin Puterman, 1999. "Markov Poisson regression models for discrete time series. Part 1: Methodology," Journal of Applied Statistics, Taylor & Francis Journals, vol. 26(7), pages 855-869.
    9. Zhen, X. & Basawa, I.V., 2009. "Observation-driven generalized state space models for categorical time series," Statistics & Probability Letters, Elsevier, vol. 79(24), pages 2462-2468, December.
    10. Song, Peter X.-K. & Freeland, R. Keith & Biswas, Atanu & Zhang, Shulin, 2013. "Statistical analysis of discrete-valued time series using categorical ARMA models," Computational Statistics & Data Analysis, Elsevier, vol. 57(1), pages 112-124.
    11. Brajendra C Sutradhar, 2018. "A Parameter Dimension-Split Based Asymptotic Regression Estimation Theory for a Multinomial Panel Data Model," Sankhya A: The Indian Journal of Statistics, Springer;Indian Statistical Institute, vol. 80(2), pages 301-329, August.
    12. Chiranjit Dutta & Nalini Ravishanker & Sumanta Basu, 2022. "Modeling Multivariate Positive-Valued Time Series Using R-INLA," Papers 2206.05374, arXiv.org, revised Jul 2022.
    13. Konstantinos Fokianos & Benjamin Kedem, 2004. "Partial Likelihood Inference For Time Series Following Generalized Linear Models," Journal of Time Series Analysis, Wiley Blackwell, vol. 25(2), pages 173-197, March.
    14. Ginger M. Davis & Katherine B. Ensor, 2007. "Multivariate Time‐Series Analysis With Categorical and Continuous Variables in an Lstr Model," Journal of Time Series Analysis, Wiley Blackwell, vol. 28(6), pages 867-885, November.
    15. Xu Gao & Daniel Gillen & Hernando Ombao, 2018. "Fisher information matrix of binary time series," METRON, Springer;Sapienza Università di Roma, vol. 76(3), pages 287-304, December.
    16. Heikki Kauppi, 2008. "Yield-Curve Based Probit Models for Forecasting U.S. Recessions: Stability and Dynamics," Discussion Papers 31, Aboa Centre for Economics.
    17. R. Prabhakar Rao & Brajendra C. Sutradhar, 2020. "Multiple Categorical Covariates-Based Multinomial Dynamic Response Model," Sankhya A: The Indian Journal of Statistics, Springer;Indian Statistical Institute, vol. 82(1), pages 186-219, February.
    18. Pruscha Helmut & Göttlein Axel, 2003. "Forecasting of Categorical Time Series Using a Regression Model," Stochastics and Quality Control, De Gruyter, vol. 18(2), pages 223-240, January.
    19. Zhen, X. & Basawa, I.V., 2009. "Categorical time series models for contingency tables," Statistics & Probability Letters, Elsevier, vol. 79(10), pages 1331-1336, May.
    20. Eziyi Ibem & Dolapo Amole, 2014. "Satisfaction with Life in Public Housing in Ogun State, Nigeria: A Research Note," Journal of Happiness Studies, Springer, vol. 15(3), pages 495-501, June.
