Dealing with the biased effects issue when handling huge datasets: the case of INVALSI data

Author

Listed:

E. Raffinetti
I. Romeo

Registered:

Emanuela Raffinetti

Abstract

The growing prevalence of huge datasets is steering research toward statistical methods that can cope with the problems caused by their complexity. On the one hand, the literature proposes several techniques, especially for reducing computation time and the number of variables. On the other hand, statistical inference has received less attention. Indeed, when a very large number of statistical units is involved, variables with no actual impact on the phenomenon under study may wrongly be judged significant. This paper suggests a subsampling procedure for reducing the number of statistical units and provides a novel index for assessing the significance of effects. The proposal is validated by comparing the results obtained on the original data with those obtained from the proposed subsampling approach. The illustrative application focuses on the educational dataset made available by the National Committee for the Evaluation of the Italian Education Systems (INVALSI). This dataset collects information about student characteristics and achievement in Maths in the lower secondary schools of the Lombardy region (Italy). Because the data are hierarchical, a multilevel model is implemented to investigate the effects of both individual and school factors on students' Maths scores.
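
The inference problem described in the abstract can be illustrated with a minimal sketch. It is not the authors' subsampling procedure or their index; it only assumes a one-sample z-test with known unit variance and a hypothetical standardized effect of 0.01, and shows how the two-sided p-value for that same negligible effect shrinks as the number of statistical units grows. The function name `two_sided_p_value` and the chosen sample sizes are illustrative assumptions.

```python
import math


def two_sided_p_value(standardized_effect: float, n: int) -> float:
    """Two-sided p-value of a one-sample z-test (unit variance assumed).

    The test statistic is z = effect * sqrt(n), so for a fixed effect
    it grows with the number of statistical units n.
    """
    z = standardized_effect * math.sqrt(n)
    # For a standard normal Z, P(|Z| > |z|) = erfc(|z| / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2))


# Hypothetical effect: 1% of a standard deviation, i.e. practically negligible.
tiny_effect = 0.01

for n in (1_000, 10_000, 100_000, 1_000_000):
    p = two_sided_p_value(tiny_effect, n)
    print(f"n={n:>9,}  p-value={p:.3g}")
```

Under these assumptions the effect is far from significant at the smallest sample size and overwhelmingly significant at the largest, although its practical size never changes. This is the kind of behaviour that motivates reducing the number of units before judging significance.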

Suggested Citation

E. Raffinetti & I. Romeo, 2015. "Dealing with the biased effects issue when handling huge datasets: the case of INVALSI data," Journal of Applied Statistics, Taylor & Francis Journals, vol. 42(12), pages 2554-2570, December.

Handle: RePEc:taf:japsta:v:42:y:2015:i:12:p:2554-2570
DOI: 10.1080/02664763.2015.1043867

References listed on IDEAS

Kapetanios, George, 2010. "A Testing Procedure for Determining the Number of Factors in Approximate Factor Models With Large Datasets," Journal of Business & Economic Statistics, American Statistical Association, vol. 28(3), pages 397-409.
- George Kapetanios, 2005. "A Testing Procedure for Determining the Number of Factors in Approximate Factor Models with Large Datasets," Working Papers 551, Queen Mary University of London, School of Economics and Finance.
S. K. Vines, 2000. "Simple principal components," Journal of the Royal Statistical Society Series C, Royal Statistical Society, vol. 49(4), pages 441-451.
Hugh Chipman & Hong Gu, 2005. "Interpretable dimension reduction," Journal of Applied Statistics, Taylor & Francis Journals, vol. 32(9), pages 969-987.

Citations

Cited by:

Paolo Giudici & Emanuela Raffinetti, 2021. "Cyber risk ordering with rank-based statistical models," AStA Advances in Statistical Analysis, Springer;German Statistical Society, vol. 105(3), pages 469-484, September.
Malavasi, Matteo & Peters, Gareth W. & Shevchenko, Pavel V. & Trück, Stefan & Jang, Jiwook & Sofronov, Georgy, 2022. "Cyber risk frequency, severity and insurance viability," Insurance: Mathematics and Economics, Elsevier, vol. 106(C), pages 90-114.
Matteo Malavasi & Gareth W. Peters & Pavel V. Shevchenko & Stefan Truck & Jiwook Jang & Georgy Sofronov, 2021. "Cyber Risk Frequency, Severity and Insurance Viability," Papers 2111.03366, arXiv.org, revised Mar 2022.
Daniel Doz & Mara Cotič & Darjo Felda, 2023. "Random Forest Regression in Predicting Students’ Achievements and Fuzzy Grades," Mathematics, MDPI, vol. 11(19), pages 1-19, September.

Most related items

These are the items that most often cite the same works as this one and are cited by the same works as this one.

Trendafilov, Nickolay T. & Vines, Karen, 2009. "Simple and interpretable discrimination," Computational Statistics & Data Analysis, Elsevier, vol. 53(4), pages 979-989, February.
T. F. Cox & D. S. Arnold, 2018. "Simple components," Journal of Applied Statistics, Taylor & Francis Journals, vol. 45(1), pages 83-99, January.
Nickolay Trendafilov, 2014. "From simple structure to sparse components: a review," Computational Statistics, Springer, vol. 29(3), pages 431-454, June.
GUO-FITOUSSI, Liang, 2013. "A Comparison of the Finite Sample Properties of Selection Rules of Factor Numbers in Large Datasets," MPRA Paper 50005, University Library of Munich, Germany.
Claudio Morana, 2014. "Factor Vector Autoregressive Estimation of Heteroskedastic Persistent and Non Persistent Processes Subject to Structural Breaks," Working Papers 273, University of Milano-Bicocca, Department of Economics, revised May 2014.
Matteo Barigozzi & Antonio M. Conti & Matteo Luciani, 2014. "Do Euro Area Countries Respond Asymmetrically to the Common Monetary Policy?," Oxford Bulletin of Economics and Statistics, Department of Economics, University of Oxford, vol. 76(5), pages 693-714, October.
- Barigozzi, Matteo & Conti, Antonio & Luciani, Matteo, 2012. "Do Euro area countries respond asymmetrically to the common monetary policy?," LSE Research Online Documents on Economics 43344, London School of Economics and Political Science, LSE Library.
- Matteo Luciani & Antoniomaria Conti & Matteo Barigozzi, 2013. "Do Euro Area Countries Respond Asymmetrically to the Common Monetary Policy?," ULB Institutional Repository 2013/153330, ULB -- Universite Libre de Bruxelles.
- Matteo Barigozzi & Antonio M. Conti & Matteo Luciani, 2013. "Do euro area countries respond asymmetrically to the common monetary policy?," Temi di discussione (Economic working papers) 923, Bank of Italy, Economic Research and International Relations Area.
Kim, Hyun Hak & Swanson, Norman R., 2018. "Mining big data using parsimonious factor, machine learning, variable selection and shrinkage methods," International Journal of Forecasting, Elsevier, vol. 34(2), pages 339-354.
Alain-Philippe Fortin & Patrick Gagliardini & O. Scaillet, 2022. "Eigenvalue tests for the number of latent factors in short panels," Swiss Finance Institute Research Paper Series 22-81, Swiss Finance Institute.
- Alain-Philippe Fortin & Patrick Gagliardini & Olivier Scaillet, 2022. "Eigenvalue tests for the number of latent factors in short panels," Papers 2210.16042, arXiv.org.
Wei, Jie & Chen, Hui, 2020. "Determining the number of factors in approximate factor models by twice K-fold cross validation," Economics Letters, Elsevier, vol. 191(C).
Yongfu Huang & Muhammad G. Quibria, 2015. "The global partnership for sustainable development," Natural Resources Forum, Blackwell Publishing, vol. 0(3-4), pages 157-174, August.
- Huang, Yongfu & Quibria, M. G., 2013. "The Global Partnership for Sustainable Development," WIDER Working Paper Series 057, World Institute for Development Economic Research (UNU-WIDER).
Gagliardini, Patrick & Ossola, Elisa & Scaillet, Olivier, 2019. "A diagnostic criterion for approximate factor structure," Journal of Econometrics, Elsevier, vol. 212(2), pages 503-521.
- Patrick Gagliardini & Elisa Ossola & Olivier Scaillet, 2016. "A diagnostic criterion for approximate factor structure," Papers 1612.04990, arXiv.org, revised Aug 2017.
- Patrick Gagliardini & Elisa Ossola & O. Scaillet, 2016. "A Diagnostic Criterion for Approximate Factor Structure," Swiss Finance Institute Research Paper Series 16-51, Swiss Finance Institute, revised Dec 2016.
Xu Cheng & Zhipeng Liao & Frank Schorfheide, 2016. "Shrinkage Estimation of High-Dimensional Factor Models with Structural Instabilities," The Review of Economic Studies, Review of Economic Studies Ltd, vol. 83(4), pages 1511-1543.
- Xu Cheng & Zhipeng Liao & Frank Schorfheide, 2013. "Shrinkage estimation of high-dimensional factor models with structural instabilities," Working Papers 14-4, Federal Reserve Bank of Philadelphia.
- Xu Cheng & Zhipeng Liao & Frank Schorfheide, 2014. "Shrinkage Estimation of High-Dimensional Factor Models with Structural Instabilities," NBER Working Papers 19792, National Bureau of Economic Research, Inc.
Tae-Hwy Lee & Ekaterina Seregina, 2024. "Optimal Portfolio Using Factor Graphical Lasso," Journal of Financial Econometrics, Oxford University Press, vol. 22(3), pages 670-695.
- Tae-Hwy Lee & Ekaterina Seregina, 2020. "Optimal Portfolio Using Factor Graphical Lasso," Papers 2011.00435, arXiv.org, revised Apr 2023.
- Tae-Hwy Lee & Ekaterina Seregina, 2023. "Optimal Portfolio Using Factor Graphical Lasso," Working Papers 202302, University of California at Riverside, Department of Economics.
- Tae-Hwy Lee & Ekaterina Seregina, 2020. "Optimal Portfolio Using Factor Graphical Lasso," Working Papers 202025, University of California at Riverside, Department of Economics.
Yunus Emre Ergemen & Carlos Vladimir Rodríguez-Caballero, 2016. "A Dynamic Multi-Level Factor Model with Long-Range Dependence," CREATES Research Papers 2016-23, Department of Economics and Business Economics, Aarhus University.
Gu, Shihao & Kelly, Bryan & Xiu, Dacheng, 2021. "Autoencoder asset pricing models," Journal of Econometrics, Elsevier, vol. 222(1), pages 429-450.
Claudio Barbieri & Mattia Guerini & Mauro Napoletano, 2021. "The anatomy of government bond yields synchronization in the Eurozone," SciencePo Working papers Main hal-03373853, HAL.
- Claudio Barbieri & Mattia Guerini & Mauro Napoletano, 2021. "The Anatomy of Government Bond Yields Synchronization in the Eurozone," LEM Papers Series 2021/07, Laboratory of Economics and Management (LEM), Sant'Anna School of Advanced Studies, Pisa, Italy.
- Claudio Barbieri & Mattia Guerini & Mauro Napoletano, 2021. "The anatomy of government bond yields synchronization in the Eurozone," Working Papers hal-03373853, HAL.
- Claudio Barbieri & Mattia Guerini & Mauro Napoletano, 2024. "The anatomy of government bond yields synchronization in the Eurozone," SciencePo Working papers Main hal-04530954, HAL.
- Claudio Barbieri & Mattia Guerini & Mauro Napoletano, 2021. "The Anatomy of Government Bond Yields Synchronization in the Eurozone," GREDEG Working Papers 2021-08, Groupe de REcherche en Droit, Economie, Gestion (GREDEG CNRS), Université Côte d'Azur, France.
- Claudio Barbieri & Mattia Guerini & Mauro Napoletano, 2024. "The anatomy of government bond yields synchronization in the Eurozone," Post-Print hal-04530954, HAL.
Linton, O. B. & Rücker, M. & Vogt, M. & Walsh, C., 2024. "Estimation and Inference in High-Dimensional Panel Data Models with Interactive Fixed Effects," Janeway Institute Working Papers 2429, Faculty of Economics, University of Cambridge.
Guowei Cui & Milda Norkutė & Vasilis Sarafidis & Takashi Yamagata, 2022. "Two-stage instrumental variable estimation of linear panel data models with interactive effects [Eigenvalue ratio test for the number of factors]," The Econometrics Journal, Royal Economic Society, vol. 25(2), pages 340-361.
- Guowei Cui & Milda Norkuté & Vasilis Sarafidis & Takashi Yamagata, 2020. "Two-Stage Instrumental Variable Estimation of Linear Panel Data Models with Interactive Effects," ISER Discussion Paper 1101, Institute of Social and Economic Research, The University of Osaka.
- Milda Norkute & Guowei Cui & Vasilis Sarafidis & Takashi Yamagata, 2021. "Two-Stage Instrumental Variable Estimation of Linear Panel Data Models with Interactive Effects," Bank of Lithuania Working Paper Series 90, Bank of Lithuania.
- Cui, Guowei & Norkute, Milda & Sarafidis, Vasilis & Yamagata, Takashi, 2020. "Two-Stage Instrumental Variable Estimation of Linear Panel Data Models with Interactive Effects," MPRA Paper 102827, University Library of Munich, Germany.
Cynthia Fan Yang, 2021. "Common factors and spatial dependence: an application to US house prices," Econometric Reviews, Taylor & Francis Journals, vol. 40(1), pages 14-50, January.
- Yang, Cynthia Fan, 2017. "Common Factors and Spatial Dependence: An Application to US House Prices," MPRA Paper 89032, University Library of Munich, Germany, revised 20 Aug 2018.
Andres Sagner, 2020. "High Dimensional Quantile Factor Analysis," Working Papers Central Bank of Chile 886, Central Bank of Chile.
