Printed from https://ideas.repec.org/a/prs/ecoprv/ecop_0249-4744_1995_num_119_3_5738.html

Nettoyage de fichiers dans le cas de données individuelles : recherche de la cohérence transversale

Author

Listed:
  • Elizabeth Kremp

Abstract

Cleaning Files Containing Individual Data: The Search for Transversal Consistency, by Elizabeth Kremp. This article first defines the notions of aberrant values (outliers) and extreme values. It then reviews the statistical tools and presents several univariate methods for identifying these values. Eight techniques based on these tools and methods are tested on a file of company data for one ratio. One lesson from these tests is that methods seeking to identify aberrant points should rely on robust statistics. Three of these techniques are then applied to seven ratios in order to compare them, to evaluate the role played by the choice of ratios, and to measure the cumulative effect of eliminating observations. Two of the three produce very similar results. The simplest to implement eliminates the observations lying more than three interquartile ranges from the first and third quartiles. However, if the true population distribution of the variable studied is far from normal, this technique can eliminate too many observations; in that case, a variant that eliminates only the observations lying more than five interquartile ranges away appears preferable.
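
As a rough illustration of the quartile-based rule summarised in the abstract, the sketch below drops observations lying more than k interquartile ranges below the first quartile or above the third, with k = 3 for the basic rule and k = 5 for the more lenient variant. The function names, the sample values, and the use of the "inclusive" quartile method of Python's `statistics.quantiles` are assumptions made for illustration; the abstract does not say how the article computes its quartiles, and this is not the author's implementation.

```python
from statistics import quantiles


def iqr_fences(values, k=3.0):
    """Return (lower, upper) cut-offs: Q1 - k*IQR and Q3 + k*IQR."""
    # Assumption: quartiles use the "inclusive" interpolation method;
    # the article's exact quartile definition is not given in the abstract.
    q1, _median, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def trim_outliers(values, k=3.0):
    """Keep only the observations lying inside the k-IQR fences."""
    lower, upper = iqr_fences(values, k)
    return [v for v in values if lower <= v <= upper]


# Hypothetical ratio values, invented for illustration only.
sample = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 25.0, 100.0]

kept_k3 = trim_outliers(sample, k=3.0)  # drops 25.0 and 100.0
kept_k5 = trim_outliers(sample, k=5.0)  # drops only 100.0
```

On this made-up sample the k = 3 rule removes two observations while the k = 5 variant removes one, which mirrors the abstract's point that the wider fence is less aggressive when the data are far from normal.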

Suggested Citation

  • Elizabeth Kremp, 1995. "Nettoyage de fichiers dans le cas de données individuelles : recherche de la cohérence transversale," Économie et Prévision, Programme National Persée, vol. 119(3), pages 171-193.
  • Handle: RePEc:prs:ecoprv:ecop_0249-4744_1995_num_119_3_5738
    DOI: 10.3406/ecop.1995.5738

    Download full text from publisher

    File URL: https://doi.org/10.3406/ecop.1995.5738
    Download Restriction: no

    File URL: https://www.persee.fr/doc/ecop_0249-4744_1995_num_119_3_5738
    Download Restriction: no


    References listed on IDEAS

    1. Krasker, William S. & Kuh, Edwin & Welsch, Roy E., 1983. "Estimation for dirty data and flawed models," Handbook of Econometrics, in: Z. Griliches† & M. D. Intriligator (ed.), Handbook of Econometrics, edition 1, volume 1, chapter 11, pages 651-698, Elsevier.
    2. Jacques Mairesse & Elizabeth Kremp, 1993. "A look at productivity at the firm level in eight French service industries," Journal of Productivity Analysis, Springer, vol. 4(1), pages 211-234, June.
    3. William Gould & Ali S. Hadi, 1993. "Identifying multivariate outliers," Stata Technical Bulletin, StataCorp LP, vol. 2(11).

    Citations

    Cited by:

    1. Gilbert Cette & Simon Corde & Rémy Lecat, 2017. "Stagnation of productivity in France: A legacy of the crisis or a structural slowdown?," Economie et Statistique / Economics and Statistics, Institut National de la Statistique et des Etudes Economiques (INSEE), issue 494-495-4, pages 11-36.
    2. Paul-Antoine Chevalier & Rémy Lecat & Nicholas Oulton, 2009. "Convergence of Firm-Level Productivity, Globalisation, Information Technology and Competition: Evidence from France," CEP Discussion Papers dp0916, Centre for Economic Performance, LSE.
    3. Rym Ben Ayed Mouelhi & Mohamed Goaied, 2001. "Efficience technique et incitations salariales. Analyse empirique sur un panel incomplet des industries textiles en Tunisie," Economie & Prévision, La Documentation Française, vol. 148(2), pages 99-111.
    4. Gilbert Cette & Sandra Nevoux & Loriane Py, 2022. "The impact of ICTs and digitalization on productivity and labor share: evidence from French firms," Economics of Innovation and New Technology, Taylor & Francis Journals, vol. 31(8), pages 669-692, November.
    5. Francesco Daveri & Rémy Lecat & Maria Laura Parisi, 2016. "Service Deregulation, Competition, and the Performance of French and Italian Firms," Scottish Journal of Political Economy, Scottish Economic Society, vol. 63(3), pages 278-302, July.
    6. Claude Mathieu & Yann Nicolas, 2006. "Coûts d'ajustement de la demande de travail : une comparaison entre la France et la République tchèque," Economie & Prévision, La Documentation Française, vol. 0(2), pages 135-152.
    7. Justine Valette & Paul Amadieu & Patrick Sentis, 2018. "Les coopératives résistent-elles mieux ? Une analyse de survie des coopératives agricoles françaises," Post-Print hal-01990418, HAL.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Franco Peracchi, 1988. "Robust Estimators of Regression," UCLA Economics Working Papers 476, UCLA Department of Economics.
    2. Almas Heshmati, 2003. "Productivity Growth, Efficiency and Outsourcing in Manufacturing and Service Industries," Journal of Economic Surveys, Wiley Blackwell, vol. 17(1), pages 79-112, February.
    3. Par Hansson & Magnus Henrekson, 1994. "What makes a country socially capable of catching up?," Review of World Economics (Weltwirtschaftliches Archiv), Springer;Institut für Weltwirtschaft (Kiel Institute for the World Economy), vol. 130(4), pages 760-783, December.
    4. Ben Abdallah, Khaled & Belloumi, Mounir & De Wolf, Daniel, 2015. "International comparisons of energy and environmental efficiency in the road transport sector," Energy, Elsevier, vol. 93(P2), pages 2087-2101.
    5. Sule Ozler & James Harrigan, 1988. "Export Instability and Growth," UCLA Economics Working Papers 486, UCLA Department of Economics.
    6. Billor, Nedret & Hadi, Ali S. & Velleman, Paul F., 2000. "BACON: blocked adaptive computationally efficient outlier nominators," Computational Statistics & Data Analysis, Elsevier, vol. 34(3), pages 279-298, September.
    7. Hideki Toya & Mark Skidmore & Raymond Robertson, 2010. "A Reevaluation of the Effect of Human Capital Accumulation on Economic Growth Using Natural Disasters as an Instrument," Eastern Economic Journal, Palgrave Macmillan;Eastern Economic Association, vol. 36(1), pages 120-137.
    8. Ingco, Merlinda D. & Hilker, James H., 1988. "Michigan State University Agriculture Model: U.S. Livestock and Poultry Supply and Demand Component -- Model Structure, Specification, and Empirical Results," Agricultural Economic Report Series 201371, Michigan State University, Department of Agricultural, Food, and Resource Economics.
    9. Ali, Mukhtar M. & Sharma, Subhash C., 1996. "Robustness to nonnormality of regression F-tests," Journal of Econometrics, Elsevier, vol. 71(1-2), pages 175-205.
    10. Beggs, John J, 1988. "Diagnostic Testing in Applied Econometrics," The Economic Record, The Economic Society of Australia, vol. 64(185), pages 81-101, June.
    11. Bockerman, Petri & Maliranta, Mika, 2007. "The micro-level dynamics of regional productivity growth: The source of divergence in Finland," Regional Science and Urban Economics, Elsevier, vol. 37(2), pages 165-182, March.
    12. repec:zbw:bofitp:2007_021 is not listed on IDEAS
    13. Fornaro, Paolo & Luomaranta, Henri, 2017. "Small and Medium Firms, Aggregate Productivity and the Role of Dependencies," ETLA Working Papers 47, The Research Institute of the Finnish Economy.
    14. Mark C. Anderson & Rajiv D. Banker & Sury Ravindran, 2006. "Value Implications of Investments in Information Technology," Management Science, INFORMS, vol. 52(9), pages 1359-1376, September.
    15. Romero, Jorge A., 2022. "Lobbying and political expenses: Complements or substitutes?," Journal of Business Research, Elsevier, vol. 149(C), pages 558-575.
    16. Mark C. Anderson & Rajiv D. Banker & Sury Ravindran, 2000. "Executive Compensation in the Information Technology Industry," Management Science, INFORMS, vol. 46(4), pages 530-547, April.
    17. Zaman, Asad & Rousseeuw, Peter J. & Orhan, Mehmet, 2001. "Econometric applications of high-breakdown robust regression techniques," Economics Letters, Elsevier, vol. 71(1), pages 1-8, April.
    18. Rincke, Johannes, 2005. "Neighborhood Influence and Political Change: Evidence from US School Districts," ZEW Discussion Papers 05-16, ZEW - Leibniz Centre for European Economic Research.
    19. Bjorklund, Anders & Chadwick, Laura, 2003. "Intergenerational income mobility in permanent and separated families," Economics Letters, Elsevier, vol. 80(2), pages 239-246, August.
    20. Dollar, David, 1990. "Economic Reform and Allocative Efficiency in China's State-Owned Industry," Economic Development and Cultural Change, University of Chicago Press, vol. 39(1), pages 89-105, October.
    21. Meier, Carsten-Patrick, 2004. "Investigating the impact of an appreciation of the euro in a small macroeconometric model of Germany and the euro area," Kiel Working Papers 1204, Kiel Institute for the World Economy (IfW Kiel).
