
Polling India via regression and post-stratification of non-probability online samples

Authors

  • Roberto Cerina
  • Raymond Duch

Abstract

Recent technological advances have facilitated the collection of large-scale administrative data and the online surveying of the Indian population. Building on these, we propose a strategy for more robust, frequent and transparent projections of the Indian vote during the campaign. We execute a modified multilevel regression and post-stratification (MrP) model of Indian vote preferences that introduces innovations to each of its three core components: the stratification frame, the training data, and the learner. For the post-stratification frame, we propose a novel Data Integration approach that allows the simultaneous estimation of counts from multiple complementary sources, such as census tables and auxiliary surveys. For the training data, we assemble panels of respondents from two unorthodox online populations: Amazon Mechanical Turk workers and Facebook users. And as a modeling tool, we replace the Bayesian multilevel regression learner with Random Forests. Our 2019 pre-election forecasts for the two largest Lok Sabha coalitions were very close to actual outcomes: we predicted 41.8% for the NDA, against an observed value of 45.0%, and 30.8% for the UPA, against an observed vote share of just under 31.3%. Our uniform-swing seat projection outperforms other pollsters: we had the lowest absolute error of 89 seats (along with a poll from ‘Jan Ki Baat’), the lowest error on the NDA-UPA lead (a mere 8 seats), and we are the only pollster that can capture real-time preference shifts due to salient campaign events.
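The post-stratification step of MrP can be illustrated with a minimal sketch. Assuming a stratification frame of cells, each with an area label, a set of covariates and an estimated voter count, and any trained learner that returns per-party probabilities for a cell, the area-level estimate is the count-weighted average of the cell-level predictions. The function names, the toy predictor and the two-cell frame below are hypothetical; this is not the authors' code, and a fixed lookup stands in for the Random Forest learner.

```python
from collections import defaultdict


def poststratify(cells, predict):
    """Aggregate cell-level predicted vote shares to area-level estimates.

    cells: iterable of (area, covariates, count) tuples, where count is the
        estimated number of voters in that cell of the stratification frame.
    predict: callable mapping covariates to {party: probability}; a stand-in
        for any trained learner (hypothetical interface).
    """
    totals = defaultdict(lambda: defaultdict(float))  # area -> party -> weighted sum
    weights = defaultdict(float)                      # area -> total voter count
    for area, covariates, count in cells:
        for party, p in predict(covariates).items():
            totals[area][party] += count * p
        weights[area] += count
    # Divide each weighted sum by the area's total count.
    return {
        area: {party: s / weights[area] for party, s in parties.items()}
        for area, parties in totals.items()
    }


def toy_predict(covariates):
    # Hypothetical learner: fixed probabilities for urban vs. rural cells.
    if covariates["urban"]:
        return {"NDA": 0.5, "UPA": 0.3, "Other": 0.2}
    return {"NDA": 0.4, "UPA": 0.35, "Other": 0.25}


# Hypothetical frame: one area split into an urban and a rural cell.
frame = [
    ("State A", {"urban": True}, 300),
    ("State A", {"urban": False}, 700),
]
est = poststratify(frame, toy_predict)
print({party: round(share, 3) for party, share in est["State A"].items()})
```

Because the rural cell holds 70% of the voters, the area estimate for the NDA (0.43) sits closer to the rural prediction than to the urban one; this reweighting toward the population's composition is what lets a non-representative training sample yield population-level estimates.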

Suggested Citation

  • Roberto Cerina & Raymond Duch, 2021. "Polling India via regression and post-stratification of non-probability online samples," PLOS ONE, Public Library of Science, vol. 16(11), pages 1-34, November.
  • Handle: RePEc:plo:pone00:0260092
    DOI: 10.1371/journal.pone.0260092

    Download full text from publisher

    File URL: https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0260092
    Download Restriction: no

    File URL: https://journals.plos.org/plosone/article/file?id=10.1371/journal.pone.0260092&type=printable
    Download Restriction: no

    References listed on IDEAS

    1. Stefan Wager & Susan Athey, 2018. "Estimation and Inference of Heterogeneous Treatment Effects using Random Forests," Journal of the American Statistical Association, Taylor & Francis Journals, vol. 113(523), pages 1228-1242, July.
    2. Ghitza, Yair & Gelman, Andrew, 2020. "Voter Registration Databases and MRP: Toward the Use of Large-Scale Databases in Public Opinion Research," Political Analysis, Cambridge University Press, vol. 28(4), pages 507-531, October.
    3. Jackman, Simon & Spahn, Bradley, 2019. "Why Does the American National Election Study Overestimate Voter Turnout?," Political Analysis, Cambridge University Press, vol. 27(2), pages 193-207, April.
    4. Zellner, Arnold & Tobias, Justin, 1998. "A Note on Aggregation, Disaggregation and Forecasting Performance," CUDARE Working Papers 198677, University of California, Berkeley, Department of Agricultural and Resource Economics.
    5. Cerina, Roberto & Duch, Raymond, 2020. "Measuring public opinion via digital footprints," International Journal of Forecasting, Elsevier, vol. 36(3), pages 987-1002.
    6. Jennifer L. Castle & David F. Hendry, 2010. "Nowcasting from disaggregates in the face of location shifts," Journal of Forecasting, John Wiley & Sons, Ltd., vol. 29(1-2), pages 200-214.
    7. Jeffrey R. Lax & Justin H. Phillips, 2009. "How Should We Estimate Public Opinion in The States?," American Journal of Political Science, John Wiley & Sons, vol. 53(1), pages 107-121, January.
    8. Park, David K. & Gelman, Andrew & Bafumi, Joseph, 2004. "Bayesian Multilevel Estimation with Poststratification: State-Level Estimates from National Polls," Political Analysis, Cambridge University Press, vol. 12(4), pages 375-385.
    9. Duch, Raymond & Laroze, Denise & Robinson, Thomas & Beramendi, Pablo, 2020. "Multi-modes for Detecting Experimental Measurement Error," Political Analysis, Cambridge University Press, vol. 28(2), pages 263-283, April.
    10. Dietrich, Simone & Winters, Matthew S., 2015. "Foreign Aid and Government Legitimacy," Journal of Experimental Political Science, Cambridge University Press, vol. 2(2), pages 164-171, January.
    11. Buttice, Matthew K. & Highton, Benjamin, 2013. "How Does Multilevel Regression and Poststratification Perform with Conventional National Surveys?," Political Analysis, Cambridge University Press, vol. 21(4), pages 449-467.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Lauderdale, Benjamin E. & Bailey, Delia & Blumenau, Jack & Rivers, Douglas, 2020. "Model-based pre-election polling for national and sub-national outcomes in the US and UK," International Journal of Forecasting, Elsevier, vol. 36(2), pages 399-413.
    2. Marina Christofoletti & Tânia R. B. Benedetti & Felipe G. Mendes & Humberto M. Carvalho, 2021. "Using Multilevel Regression and Poststratification to Estimate Physical Activity Levels from Health Surveys," IJERPH, MDPI, vol. 18(14), pages 1-16, July.
    3. Wang, Wei & Rothschild, David & Goel, Sharad & Gelman, Andrew, 2015. "Forecasting elections with non-representative polls," International Journal of Forecasting, Elsevier, vol. 31(3), pages 980-991.
    4. Margaret Weden & Christine Peterson & Jeremy Miles & Regina Shih, 2015. "Evaluating Linearly Interpolated Intercensal Estimates of Demographic and Socioeconomic Characteristics of U.S. Counties and Census Tracts 2001–2009," Population Research and Policy Review, Springer;Southern Demographic Association (SDA), vol. 34(4), pages 541-559, August.
    5. Facchini, Giovanni & Hatton, Timothy J. & Steinhardt, Max F., 2024. "Opening Heaven’s Door: Public Opinion and Congressional Votes on the 1965 Immigration Act," The Journal of Economic History, Cambridge University Press, vol. 84(1), pages 232-270, March.
    6. Caldarulo, Mattia & Mossberger, Karen & Howell, Anthony, 2023. "Community-wide broadband adoption and student academic achievement," Telecommunications Policy, Elsevier, vol. 47(1).
    7. Christopher Claassen & Richard Traunmüller, 2020. "Improving and Validating Survey Estimates of Religious Demography Using Bayesian Multilevel Models and Poststratification," Sociological Methods & Research, vol. 49(3), pages 603-636, August.
    8. Laura C. Dawkins & Daniel B. Williamson & Stewart W. Barr & Sally R. Lampkin, 2020. "‘What drives commuter behaviour?': a Bayesian clustering approach for understanding opposing behaviours in social surveys," Journal of the Royal Statistical Society Series A, Royal Statistical Society, vol. 183(1), pages 251-280, January.
    9. Cerina, Roberto & Duch, Raymond, 2020. "Measuring public opinion via digital footprints," International Journal of Forecasting, Elsevier, vol. 36(3), pages 987-1002.
    10. Jinan N. Allan & Joseph T. Ripberger & Wesley Wehde & Makenzie Krocak & Carol L. Silva & Hank C. Jenkins‐Smith, 2020. "Geographic Distributions of Extreme Weather Risk Perceptions in the United States," Risk Analysis, John Wiley & Sons, vol. 40(12), pages 2498-2508, December.
    11. Matto Mildenberger & Jennifer R. Marlon & Peter D. Howe & Anthony Leiserowitz, 2017. "The spatial distribution of Republican and Democratic climate opinions at state and local scales," Climatic Change, Springer, vol. 145(3), pages 539-548, December.
    12. Matto Mildenberger & Peter Howe & Erick Lachapelle & Leah Stokes & Jennifer Marlon & Timothy Gravelle, 2016. "The Distribution of Climate Change Public Opinion in Canada," PLOS ONE, Public Library of Science, vol. 11(8), pages 1-14, August.
    13. Lechner, Michael, 2018. "Modified Causal Forests for Estimating Heterogeneous Causal Effects," IZA Discussion Papers 12040, Institute of Labor Economics (IZA).
    14. William Arbour, 2021. "Can Recidivism be Prevented from Behind Bars? Evidence from a Behavioral Program," Working Papers tecipa-683, University of Toronto, Department of Economics.
    15. Alexandre Belloni & Victor Chernozhukov & Denis Chetverikov & Christian Hansen & Kengo Kato, 2018. "High-dimensional econometrics and regularized GMM," CeMMAP working papers CWP35/18, Centre for Microdata Methods and Practice, Institute for Fiscal Studies.
    16. Dimitris Bertsimas & Agni Orfanoudaki & Rory B. Weiner, 2020. "Personalized treatment for coronary artery disease patients: a machine learning approach," Health Care Management Science, Springer, vol. 23(4), pages 482-506, December.
    17. Nicolaj N. Mühlbach, 2020. "Tree-based Synthetic Control Methods: Consequences of moving the US Embassy," CREATES Research Papers 2020-04, Department of Economics and Business Economics, Aarhus University.
    18. Kyle Colangelo & Ying-Ying Lee, 2019. "Double debiased machine learning nonparametric inference with continuous treatments," CeMMAP working papers CWP72/19, Centre for Microdata Methods and Practice, Institute for Fiscal Studies.
    19. Shonosuke Sugasawa & Hisashi Noma, 2021. "Efficient screening of predictive biomarkers for individual treatment selection," Biometrics, The International Biometric Society, vol. 77(1), pages 249-257, March.
    20. Ruoxuan Xiong & Allison Koenecke & Michael Powell & Zhu Shen & Joshua T. Vogelstein & Susan Athey, 2021. "Federated Causal Inference in Heterogeneous Observational Data," Papers 2107.11732, arXiv.org, revised Apr 2023.


    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.