Polling India via regression and post-stratification of non-probability online samples

Authors

  • Roberto Cerina
  • Raymond Duch

Abstract

Recent technological advances have facilitated the collection of large-scale administrative data and the online surveying of the Indian population. Building on these, we propose a strategy for more robust, frequent and transparent projections of the Indian vote during the campaign. We execute a modified multilevel regression and post-stratification (MrP) model of Indian vote preferences that proposes innovations to each of its three core components: the stratification frame, the training data, and the learner. For the post-stratification frame we propose a novel Data Integration approach that allows the simultaneous estimation of counts from multiple complementary sources, such as census tables and auxiliary surveys. For the training data we assemble panels of respondents from two unorthodox online populations: Amazon Mechanical Turk workers and Facebook users. As a modeling tool, we replace the Bayesian multilevel regression learner with Random Forests. Our 2019 pre-election forecasts for the two largest Lok Sabha coalitions were very close to actual outcomes: we predicted 41.8% for the NDA, against an observed value of 45.0%, and 30.8% for the UPA, against an observed vote share of just under 31.3%. Our uniform-swing seat projection outperforms other pollsters: we had the lowest absolute error of 89 seats (along with a poll from ‘Jan Ki Baat’) and the lowest error on the NDA-UPA lead (a mere 8 seats), and we are the only pollster that can capture real-time preference shifts due to salient campaign events.
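
The two aggregation steps named in the abstract, post-stratification and a uniform-swing seat projection, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the cell layout, and every number below are hypothetical, and the learner is left out entirely (the per-cell predicted shares stand in for whatever a fitted model, such as a Random Forest, would return). Only the coalition labels NDA and UPA come from the abstract.

```python
from collections import defaultdict


def poststratify(cells, area_key="state"):
    """Weight each cell's predicted share by its population count, per area.

    `cells` is a list of dicts with an area label, a population `count`
    taken from the stratification frame, and a model-predicted `pred_share`.
    """
    weighted = defaultdict(float)
    totals = defaultdict(float)
    for cell in cells:
        area = cell[area_key]
        weighted[area] += cell["count"] * cell["pred_share"]
        totals[area] += cell["count"]
    return {area: weighted[area] / totals[area] for area in totals}


def uniform_swing_seats(prev_seat_shares, prev_national, new_national):
    """Add the national change in each party's share to every seat's
    previous result and count the projected winners."""
    swing = {p: new_national[p] - prev_national[p] for p in new_national}
    seats = defaultdict(int)
    for shares in prev_seat_shares.values():
        projected = {p: s + swing.get(p, 0.0) for p, s in shares.items()}
        seats[max(projected, key=projected.get)] += 1
    return dict(seats)


# Hypothetical stratification frame: two areas, two age cells each.
frame = [
    {"state": "A", "age": "18-34", "count": 600, "pred_share": 0.50},
    {"state": "A", "age": "35+", "count": 400, "pred_share": 0.40},
    {"state": "B", "age": "18-34", "count": 300, "pred_share": 0.30},
    {"state": "B", "age": "35+", "count": 700, "pred_share": 0.20},
]
estimates = poststratify(frame)

# Hypothetical previous results in three seats, plus national shares.
previous = {
    "seat1": {"NDA": 0.45, "UPA": 0.40},
    "seat2": {"NDA": 0.30, "UPA": 0.45},
    "seat3": {"NDA": 0.50, "UPA": 0.30},
}
seats = uniform_swing_seats(
    previous,
    prev_national={"NDA": 0.40, "UPA": 0.35},
    new_national={"NDA": 0.45, "UPA": 0.33},
)
print(estimates, seats)
```

Keeping the learner separate from the aggregation reflects the three-component structure the abstract describes: any model that yields one prediction per frame cell can feed the same post-stratification step.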

Suggested Citation

  • Roberto Cerina & Raymond Duch, 2021. "Polling India via regression and post-stratification of non-probability online samples," PLOS ONE, Public Library of Science, vol. 16(11), pages 1-34, November.
  • Handle: RePEc:plo:pone00:0260092
    DOI: 10.1371/journal.pone.0260092

    Download full text from publisher

    File URL: https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0260092
    Download Restriction: no

    File URL: https://journals.plos.org/plosone/article/file?id=10.1371/journal.pone.0260092&type=printable
    Download Restriction: no


    Citations

    Cited by:

    1. Anurag Barthwal & Mamta Bhatt & Shwetank Avikal & Chandra Prakash, 2025. "Machine learning-based prediction models for electoral outcomes in India: a comparative analysis of exit polls from 2014–2021," Quality & Quantity: International Journal of Methodology, Springer, vol. 59(1), pages 313-338, February.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Lauderdale, Benjamin E. & Bailey, Delia & Blumenau, Jack & Rivers, Douglas, 2020. "Model-based pre-election polling for national and sub-national outcomes in the US and UK," International Journal of Forecasting, Elsevier, vol. 36(2), pages 399-413.
    2. Marina Christofoletti & Tânia R. B. Benedetti & Felipe G. Mendes & Humberto M. Carvalho, 2021. "Using Multilevel Regression and Poststratification to Estimate Physical Activity Levels from Health Surveys," IJERPH, MDPI, vol. 18(14), pages 1-16, July.
    3. Hyndman, Rob J. & Ahmed, Roman A. & Athanasopoulos, George & Shang, Han Lin, 2011. "Optimal combination forecasts for hierarchical time series," Computational Statistics & Data Analysis, Elsevier, vol. 55(9), pages 2579-2589, September.
    4. Wang, Wei & Rothschild, David & Goel, Sharad & Gelman, Andrew, 2015. "Forecasting elections with non-representative polls," International Journal of Forecasting, Elsevier, vol. 31(3), pages 980-991.
    5. Katja Heinisch & Rolf Scheufele, 2018. "Bottom-up or direct? Forecasting German GDP in a data-rich environment," Empirical Economics, Springer, vol. 54(2), pages 705-745, March.
    6. Margaret Weden & Christine Peterson & Jeremy Miles & Regina Shih, 2015. "Evaluating Linearly Interpolated Intercensal Estimates of Demographic and Socioeconomic Characteristics of U.S. Counties and Census Tracts 2001–2009," Population Research and Policy Review, Springer;Southern Demographic Association (SDA), vol. 34(4), pages 541-559, August.
    7. Monteforte, Libero, 2007. "Aggregation bias in macro models: Does it matter for the euro area?," Economic Modelling, Elsevier, vol. 24(2), pages 236-261, March.
    8. Brüggemann, Ralf & Lütkepohl, Helmut, 2013. "Forecasting contemporaneous aggregates with stochastic aggregation weights," International Journal of Forecasting, Elsevier, vol. 29(1), pages 60-68.
    9. Frédérick Demers & Annie De Champlain, 2005. "Forecasting Core Inflation in Canada: Should We Forecast the Aggregate or the Components?," Staff Working Papers 05-44, Bank of Canada.
    10. Facchini, Giovanni & Hatton, Timothy J. & Steinhardt, Max F., 2024. "Opening Heaven’s Door: Public Opinion and Congressional Votes on the 1965 Immigration Act," The Journal of Economic History, Cambridge University Press, vol. 84(1), pages 232-270, March.
    11. Marcus P. A. Cobb, 2020. "Aggregate density forecasting from disaggregate components using Bayesian VARs," Empirical Economics, Springer, vol. 58(1), pages 287-312, January.
    12. Hubrich, Kirstin, 2005. "Forecasting euro area inflation: Does aggregating forecasts by HICP component improve forecast accuracy?," International Journal of Forecasting, Elsevier, vol. 21(1), pages 119-136.
    13. Colin Bermingham & Antonello D’Agostino, 2014. "Understanding and forecasting aggregate and disaggregate price dynamics," Empirical Economics, Springer, vol. 46(2), pages 765-788, March.
    14. Hendry, David F. & Hubrich, Kirstin, 2011. "Combining Disaggregate Forecasts or Combining Disaggregate Information to Forecast an Aggregate," Journal of Business & Economic Statistics, American Statistical Association, vol. 29(2), pages 216-227.
    15. Caldarulo, Mattia & Mossberger, Karen & Howell, Anthony, 2023. "Community-wide broadband adoption and student academic achievement," Telecommunications Policy, Elsevier, vol. 47(1).
    16. Christopher Claassen & Richard Traunmüller, 2020. "Improving and Validating Survey Estimates of Religious Demography Using Bayesian Multilevel Models and Poststratification," Sociological Methods & Research, vol. 49(3), pages 603-636, August.
    17. Barbara Batóg & Jacek Batóg, 2021. "Regional Government Revenue Forecasting: Risk Factors of Investment Financing," Risks, MDPI, vol. 9(12), pages 1-15, November.
    18. Jing Zeng, 2016. "Combining country-specific forecasts when forecasting Euro area macroeconomic aggregates," Empirica, Springer;Austrian Institute for Economic Research;Austrian Economic Association, vol. 43(2), pages 415-444, May.
    19. Francisco Craveiro Dias & Maximiano Pinheiro & António Rua, 2016. "A bottom-up approach for forecasting GDP in a data rich environment," Economic Bulletin and Financial Stability Report Articles and Banco de Portugal Economic Studies, Banco de Portugal, Economics and Research Department.
    20. Cerina, Roberto & Duch, Raymond, 2020. "Measuring public opinion via digital footprints," International Journal of Forecasting, Elsevier, vol. 36(3), pages 987-1002.
