IDEAS home Printed from https://ideas.repec.org/p/ucm/doicae/1920.html

Improving the representativeness of a simple random sample: an optimization model and its application to the Continuous Sample of Working Lives

Author

Listed:
  • Vicente Nuñez-Antón

    (Department of Applied Economics III (Econometrics and Statistics), Faculty of Economics and Business, University of the Basque Country UPV/EHU, Bilbao. (Spain).)

  • Juan Manuel Pérez-Salamero González

    (Department of Financial Economics and Actuarial Science, Faculty of Economics, University of Valencia, Valencia. (Spain).)

  • Marta Regúlez-Castillo

    (Department of Applied Economics III (Econometrics and Statistics), Faculty of Economics and Business, University of the Basque Country UPV/EHU, Bilbao. (Spain).)

  • Carlos Vidal-Meliá

    (Department of Financial Economics and Actuarial Science, Faculty of Economics, University of Valencia, Valencia (Spain) and research affiliation with the Instituto Complutense de Análisis Económico (ICAE), Complutense University of Madrid (Spain) and the Centre of Excellence in Population Ageing Research (CEPAR), UNSW (Australia).)

Abstract

This paper develops an optimization model for selecting a large subsample that improves the representativeness of a simple random sample previously obtained from a population larger than the population of interest. The problem formulation involves convex mixed-integer nonlinear programming (convex MINLP) and is therefore NP-hard. However, the solution is found by maximizing the “constant of proportionality” – in other words, maximizing the size of the subsample taken from a stratified random sample with proportional allocation – and restricting it to a p-value high enough to achieve a good fit to the population of interest using Pearson’s chi-square goodness-of-fit test. The beauty of the model is that it gives the user the freedom to choose between a larger subsample with a poorer fit and a smaller subsample with a better fit. The paper also applies the model to a real case: The Continuous Sample of Working Lives (CSWL), which is a set of anonymized microdata containing information on individuals from Spanish Social Security records. Several waves (2005-2017) are first examined without using the model and the conclusion is that they are not representative of the target population, which in this case is people receiving a pension income. The model is then applied and the results prove that it is possible to obtain a large dataset from the CSWL that (far) better represents the pensioner population for each of the waves analysed.
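The trade-off described in the abstract (a larger subsample with a poorer fit versus a smaller one with a better fit) can be illustrated with a small sketch. This is not the authors' convex MINLP formulation: it is a simple grid search over the constant of proportionality, written under the assumption that only per-stratum counts of the population and of the original sample are available. The function names (`chi2_sf`, `gof_p_value`, `largest_subsample`), the `min_p` and `steps` parameters, and the stratum counts in the usage note are hypothetical and chosen only for illustration.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable (series expansion)."""
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    # Series for the regularized lower incomplete gamma function P(a, z).
    term = 1.0 / a
    total = term
    n = 1
    while abs(term) > 1e-15 * abs(total) and n < 100000:
        term *= z / (a + n)
        total += term
        n += 1
    p_lower = total * math.exp(-z + a * math.log(z) - math.lgamma(a))
    return max(0.0, 1.0 - p_lower)


def gof_p_value(sizes, pop_counts):
    """Pearson chi-square goodness-of-fit p-value of a subsample
    against the population's stratum proportions."""
    total = sum(sizes)
    pop_total = sum(pop_counts)
    stat = 0.0
    for m, big_n in zip(sizes, pop_counts):
        expected = total * big_n / pop_total
        stat += (m - expected) ** 2 / expected
    return chi2_sf(stat, len(sizes) - 1)


def largest_subsample(sample_counts, pop_counts, min_p=0.5, steps=1000):
    """Grid search (a heuristic, not an exact MINLP solver): for each
    candidate constant of proportionality k, take round(k * N_h) units
    from stratum h, capped at the n_h units available in the sample,
    and keep the largest allocation whose p-value is at least min_p."""
    k_max = max(n / big_n for n, big_n in zip(sample_counts, pop_counts))
    best = None
    for i in range(1, steps + 1):
        k = k_max * i / steps
        sizes = [min(n, round(k * big_n))
                 for n, big_n in zip(sample_counts, pop_counts)]
        if min(sizes) == 0:
            continue  # skip allocations that leave a stratum empty
        if gof_p_value(sizes, pop_counts) < min_p:
            continue
        if best is None or sum(sizes) > sum(best):
            best = sizes
    return best
```

For example, with made-up population counts `[5000, 3000, 2000]` and sample counts `[300, 100, 80]`, calling `largest_subsample([300, 100, 80], [5000, 3000, 2000], min_p=0.5)` returns per-stratum subsample sizes; raising `min_p` shrinks the subsample and tightens the fit, while lowering it does the opposite.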

Suggested Citation

  • Vicente Nuñez-Antón & Juan Manuel Pérez-Salamero González & Marta Regúlez-Castillo & Carlos Vidal-Meliá, 2019. "Improving the representativeness of a simple random sample: an optimization model and its application to the Continuous Sample of Working Lives," Documentos de Trabajo del ICAE 2019-20, Universidad Complutense de Madrid, Facultad de Ciencias Económicas y Empresariales, Instituto Complutense de Análisis Económico.
  • Handle: RePEc:ucm:doicae:1920

    Download full text from publisher

    File URL: https://eprints.ucm.es/id/eprint/55423/1/1920.pdf
    Download Restriction: no

    References listed on IDEAS

    1. Catalina Amuedo-Dorantes & Cristina Borra, 2013. "On the differential impact of the recent economic downturn on work safety by nativity: the Spanish experience," IZA Journal of Migration and Development, Springer;Forschungsinstitut zur Zukunft der Arbeit GmbH (IZA), vol. 2(1), pages 1-26, December.
    2. Brindusa Anghel & Henrique Basso & Olympia Bover & José María Casado & Laura Hospido & Mario Izquierdo & Ivan A. Kataryniuk & Aitor Lacuesta & José Manuel Montero & Elena Vozmediano, 2018. "Income, consumption and wealth inequality in Spain," SERIEs: Journal of the Spanish Economic Association, Springer;Spanish Economic Association, vol. 9(4), pages 351-387, November.
    3. Ignacio Moral-Arce & Ció Patxot & Guadalupe Souto, 2008. "La sostenibilidad del sistema de pensiones. una aproximación a partir de la MCVL," Revista de Economia Aplicada, Universidad de Zaragoza, Departamento de Estructura Economica y Economia Publica, vol. 16(E-1), pages 29-66, Special N.
    4. Kontopantelis, Evangelos, 2013. "A Greedy Algorithm for Representative Sampling: repsample in Stata," Journal of Statistical Software, Foundation for Open Access Statistics, vol. 55(c01).
    5. J Ignacio García-Pérez & Ioana Marinescu & Judit Vall Castello, 2019. "Can Fixed-term Contracts Put Low Skilled Youth on a Better Career Path? Evidence from Spain," The Economic Journal, Royal Economic Society, vol. 129(620), pages 1693-1730.
    6. Christian Dudel & María Andrée López Gómez & Fernando G. Benavides & Mikko Myrskylä, 2018. "The Length of Working Life in Spain: Levels, Recent Trends, and the Impact of the Financial Crisis," European Journal of Population, Springer;European Association for Population Studies, vol. 34(5), pages 769-791, December.
    7. José María Arranz & Carlos García-Serrano, 2014. "Duration of Joblessness and Long-term Unemployment: Is Duration as Long as Official Statistics Say?," AIEL Series in Labour Economics, in: Miguel Ángel Malo & Dario Sciulli (ed.), Disadvantaged Workers, edition 127, chapter 0, pages 297-320, Springer.
    8. Patricia Peinado & Felipe Serrano, 2011. "A Dynamic Analysis of the Effect of Social Security Reform on Spanish Widow Pensioners," Panoeconomicus, Savez ekonomista Vojvodine, Novi Sad, Serbia, vol. 58(5), pages 759-771, December.
    9. Jorge De La Roca & Diego Puga, 2017. "Learning by Working in Big Cities," The Review of Economic Studies, Review of Economic Studies Ltd, vol. 84(1), pages 106-142.
    10. Stéphane Bonhomme & Laura Hospido, 2017. "The Cycle of Earnings Inequality: Evidence from Spanish Social Security Data," Economic Journal, Royal Economic Society, vol. 127(603), pages 1244-1278, August.
    11. Marie, Olivier & Vall Castello, Judit, 2012. "Measuring the (income) effect of disability insurance generosity on labour market participation," Journal of Public Economics, Elsevier, vol. 96(1), pages 198-210.
    12. Anton Grafström & Lina Schelin, 2014. "How to Select Representative Samples," Scandinavian Journal of Statistics, Danish Society for Theoretical Statistics;Finnish Statistical Society;Norwegian Statistical Association;Swedish Statistical Association, vol. 41(2), pages 277-290, June.
    13. José A. Díaz-García & Rogelio Ramos-Quiroga, 2012. "Optimum allocation in multivariate stratified random sampling: stochastic matrix mathematical programming," Statistica Neerlandica, Netherlands Society for Statistics and Operations Research, vol. 66(4), pages 492-511, November.
    14. J. Ignacio Conde-Ruiz & Clara I. González, 2016. "From Bismarck to Beveridge: the other pension reform in Spain," SERIEs: Journal of the Spanish Economic Association, Springer;Spanish Economic Association, vol. 7(4), pages 461-490, November.
    15. Raquel Vegas Sánchez & Isabel Argimón & Marta Botella & Clara González, 2013. "Old age pensions and retirement in Spain," SERIEs: Journal of the Spanish Economic Association, Springer;Spanish Economic Association, vol. 4(3), pages 273-307, August.
    16. Alicia Gómez–Tello & Rosella Nicolini, 2017. "Immigration and productivity: a Spanish tale," Journal of Productivity Analysis, Springer, vol. 47(2), pages 167-183, April.
    17. Álvarez de Toledo, Pablo & Núñez, Fernando & Usabiaga, Carlos, 2014. "An empirical approach on labour segmentation. Applications with individual duration data," Economic Modelling, Elsevier, vol. 36(C), pages 252-267.
    18. Carlos Vidal-Meliá & María del Carmen Boado-Penas & Ole Settergren, 2009. "Automatic Balance Mechanisms in Pay-As-You-Go Pension Systems," The Geneva Papers on Risk and Insurance - Issues and Practice, Palgrave Macmillan;The Geneva Association, vol. 34(2), pages 287-317, April.
    19. Valliant, Richard & Gentle, James E., 1997. "An application of mathematical programming to sample allocation," Computational Statistics & Data Analysis, Elsevier, vol. 25(3), pages 337-360, August.
    20. José Mª Arranz & Carlos García-Serrano & Virginia Hernanz, 2012. "How do we pursue labormetrics? An application using the MCVL," Alcamentos 1201, Universidad de Alcalá, Departamento de Economía., revised 2013.
    21. Sophie Baillargeon & Louis‐Paul Rivest, 2009. "A General Algorithm for Univariate Stratification," International Statistical Review, International Statistical Institute, vol. 77(3), pages 331-344, December.
    22. Stéphane Bonhomme & Laura Hospido, 2013. "Earnings inequality in Spain: new evidence using tax data," Applied Economics, Taylor & Francis Journals, vol. 45(30), pages 4212-4225, October.
    23. Rebollo-Sanz, Yolanda, 2012. "Unemployment insurance and job turnover in Spain," Labour Economics, Elsevier, vol. 19(3), pages 403-426.
    24. Pablo Álvarez de Toledo & Fernando Núñez & Carlos Usabiaga, 2011. "An Empirical Analysis of the Matching Process in the Spanish Public Employment Agencies: The Vacancies," Working Papers 11.03, Universidad Pablo de Olavide, Department of Economics.
    25. Raquel Carrasco & J. Ignacio García-Pérez, 2015. "Employment Dynamics Of Immigrants Versus Natives: Evidence From The Boom-Bust Period In Spain, 2000–2011," Economic Inquiry, Western Economic Association International, vol. 53(2), pages 1038-1060, April.
    26. Inmaculada Cebrián & Gloria Moreno, 2015. "The Effects of Gender Differences in Career Interruptions on the Gender Wage Gap in Spain," Feminist Economics, Taylor & Francis Journals, vol. 21(4), pages 1-27, October.
    27. José Arranz & Carlos García-Serrano, 2014. "Duration and Recurrence of Unemployment Benefits," Journal of Labor Research, Springer, vol. 35(3), pages 271-295, September.
    28. José María Arranz & Carlos García-Serrano, 2014. "The interplay of the unemployment compensation system, fixed-term contracts and rehirings," International Journal of Manpower, Emerald Group Publishing Limited, vol. 35(8), pages 1236-1259, October.
    29. Alfonso R. Sanchez Martín & Virginia Sanchez Marcos, 2010. "Demographic Change and Pension Reform in Spain: An Assessment in a Two-Earner, OLG Model," Fiscal Studies, Institute for Fiscal Studies, vol. 31(3), pages 405-452, September.
    30. Juan José Alonso Fernández & José Enrique Devesa Carpio & Mar Devesa Carpio & Inmaculada Domínguez Fabián & Borja Encinas Goenechea & Robert Meneu Gaya, 2018. "Towards an adequate and sustainable replacement rate in defined benefit pension systems: The case of Spain," International Social Security Review, John Wiley & Sons, vol. 71(1), pages 51-70, January.
    31. Ignacio García Pérez, J. & Osuna, Victoria, 2014. "Dual labour markets and the tenure distribution: Reducing severance pay or introducing a single contract," Labour Economics, Elsevier, vol. 29(C), pages 1-13.
    32. José María Arranz & Carlos García Serrano, 2011. "Are the MCVL tax data useful? Ideas for mining," Hacienda Pública Española / Review of Public Economics, IEF, vol. 199(4), pages 151-186, December.
    33. Juan Manuel Pérez-Salamero & Marta Regúlez Castillo & Carlos Vidal Meliá, 2016. "Análisis de la representatividad de la MCVL: el caso de las prestaciones del sistema público de pensiones," Hacienda Pública Española / Review of Public Economics, IEF, vol. 217(2), pages 67-130, June.
    34. Dudel, Christian & López Gómez, María Andrée & Benavides, Fernando G. & Myrskylä, Mikko, 2018. "The length of working life in Spain: levels, recent trends, and the impact of the financial crisis," LSE Research Online Documents on Economics 86990, London School of Economics and Political Science, LSE Library.
    35. González, Libertad & Ortega, Francesc, 2011. "How do very open economies adjust to large immigration flows? Evidence from Spanish regions," Labour Economics, Elsevier, vol. 18(1), pages 57-70, January.
    36. María Andrée López Gómez & Laura Serra & George L Delclos & Fernando G Benavides, 2017. "Employment history indicators and mortality in a nested case-control study from the Spanish WORKing life social security (WORKss) cohort," PLOS ONE, Public Library of Science, vol. 12(6), pages 1-15, June.
    37. Claudia D’Ambrosio & Andrea Lodi, 2013. "Mixed integer nonlinear programming tools: an updated practical overview," Annals of Operations Research, Springer, vol. 204(1), pages 301-320, April.
    38. Mingfeng Lin & Henry C. Lucas & Galit Shmueli, 2013. "Research Commentary - Too Big to Fail: Large Samples and the p-Value Problem," Information Systems Research, INFORMS, vol. 24(4), pages 906-917, December.

    Citations


    Cited by:

    1. Anne M. Garvey & Juan Manuel Pérez-Salamero González & Manuel Ventura-Marco & Carlos Vidal-Meliá, 2021. "From “Table 29” to the actuarial balance sheet: is it really that big a leap?," Documentos de Trabajo del ICAE 2021-05, Universidad Complutense de Madrid, Facultad de Ciencias Económicas y Empresariales, Instituto Complutense de Análisis Económico.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Juan Manuel Pérez-Salamero González & Marta Regúlez-Castillo & Carlos Vidal-Meliá, 2017. "The continuous sample of working lives: improving its representativeness," SERIEs: Journal of the Spanish Economic Association, Springer;Spanish Economic Association, vol. 8(1), pages 43-95, March.
    2. Cristina Lafuente, 2020. "Unemployment in administrative data using survey data as a benchmark," SERIEs: Journal of the Spanish Economic Association, Springer;Spanish Economic Association, vol. 11(2), pages 115-153, June.
    3. Juan Manuel Pérez-Salamero & Marta Regúlez Castillo & Carlos Vidal Meliá, 2016. "Análisis de la representatividad de la MCVL: el caso de las prestaciones del sistema público de pensiones," Hacienda Pública Española / Review of Public Economics, IEF, vol. 217(2), pages 67-130, June.
    4. Samuel Bentolila & J. Ignacio García-Pérez & Marcel Jansen, 2017. "Are the Spanish Long-Term Unemployed Unemployable?," Working Papers wp2018_1707, CEMFI.
    5. Samuel Bentolila & J. Ignacio García-Pérez & Marcel Jansen, 2017. "Are the Spanish long-term unemployed unemployable?," SERIEs: Journal of the Spanish Economic Association, Springer;Spanish Economic Association, vol. 8(1), pages 1-41, March.
    6. Garcia-Louzao, Jose & Hospido, Laura & Ruggieri, Alessandro, 2023. "Dual returns to experience," Labour Economics, Elsevier, vol. 80(C).
    7. Virginia Sanchez Marcos & Ezgi Kaya & Nezih Guner, 2017. "Labor Market Frictions and Lowest Low Fertility," 2017 Meeting Papers 1015, Society for Economic Dynamics.
    8. J. Ignacio Conde-Ruiz & Clara I. González, 2016. "From Bismarck to Beveridge: the other pension reform in Spain," SERIEs: Journal of the Spanish Economic Association, Springer;Spanish Economic Association, vol. 7(4), pages 461-490, November.
    9. Cem Özgüzel, 2021. "The Cushioning Effect of Immigrant Mobility," CESifo Working Paper Series 9268, CESifo.
    10. Cem Özgüzel, 2020. "The Cushioning Effect of Immigrant Mobility: Evidence from the Great Recession in Spain," Working Papers halshs-03000365, HAL.
    11. Cristina Lafuente, 2018. "The best of the two worlds: assessing the use of administrative data for the study of employment," Edinburgh School of Economics Discussion Paper Series 286, Edinburgh School of Economics, University of Edinburgh.
    12. Cem Ozguzel, 2019. "Essays on migration and productivity [Essais sur les migrations et la productivité]," PSE-Ecole d'économie de Paris (Postprint) tel-03381203, HAL.
    13. Manuel Arellano & Stéphane Bonhomme & Micole De Vera & Laura Hospido & Siqi Wei, 2022. "Income risk inequality: Evidence from Spanish administrative records," Quantitative Economics, Econometric Society, vol. 13(4), pages 1747-1801, November.
    14. Florentino Felgueroso & José-Ignacio García-Pérez & Marcel Jansen & David Troncoso-Ponce, 2018. "The Surge in Short-Duration Contracts in Spain," De Economist, Springer, vol. 166(4), pages 503-534, December.
    15. J. Ignacio Conde-Ruiz & Manu García & Luis A. Puch & Jesús Ruiz, 2019. "Calendar effects in daily aggregate employment creation and destruction in Spain," SERIEs: Journal of the Spanish Economic Association, Springer;Spanish Economic Association, vol. 10(1), pages 25-63, March.
    16. Amparo Nagore García & Arthur van Soest, 2017. "New job matches and their stability before and during the crisis," International Journal of Manpower, Emerald Group Publishing Limited, vol. 38(7), pages 975-995, October.
    17. Amuedo-Dorantes Catalina & Borra Cristina, 2017. "Retirement Decisions in Recessionary Times: Evidence from Spain," The B.E. Journal of Economic Analysis & Policy, De Gruyter, vol. 17(2), pages 1-21, April.
    18. Julián Costas-Fernández & Simón Lodato, 2022. "Inequality, poverty and the composition of redistribution," Social Choice and Welfare, Springer;The Society for Social Choice and Welfare, vol. 59(4), pages 925-967, November.
    19. Camarero Garcia, Sebastian & Hansch, Michelle, 2020. "The effect of unemployment insurance benefits on (self-)employment: Two sides of the same coin?," ZEW Discussion Papers 20-062, ZEW - Leibniz Centre for European Economic Research.
    20. David R. Agrawal & Dirk Foremny, 2019. "Relocation of the Rich: Migration in Response to Top Tax Rate Changes from Spanish Reforms," The Review of Economics and Statistics, MIT Press, vol. 101(2), pages 214-232, May.

    More about this item

    Keywords

    Optimization; Subsampling; Chi-square test; P-value; Continuous Sample of Working Lives

    JEL classification:

    • C61 - Mathematical and Quantitative Methods - - Mathematical Methods; Programming Models; Mathematical and Simulation Modeling - - - Optimization Techniques; Programming Models; Dynamic Analysis
    • C81 - Mathematical and Quantitative Methods - - Data Collection and Data Estimation Methodology; Computer Programs - - - Methodology for Collecting, Estimating, and Organizing Microeconomic Data; Data Access
    • C12 - Mathematical and Quantitative Methods - - Econometric and Statistical Methods and Methodology: General - - - Hypothesis Testing: General
    • H55 - Public Economics - - National Government Expenditures and Related Policies - - - Social Security and Public Pensions
    • J26 - Labor and Demographic Economics - - Demand and Supply of Labor - - - Retirement; Retirement Policies
