
Outlier detection in networks with missing links

Author

Listed:
  • Gaucher, Solenne
  • Klopp, Olga
  • Robin, Geneviève

Abstract

Outliers arise in networks for various reasons, such as fraudulent behaviour by malicious users or faults in measurement instruments, and can significantly impair network analyses. In addition, real-life networks are likely to be incompletely observed, with links missing because of individual non-response or machine failures. Identifying outliers in the presence of missing links is therefore a crucial problem in network analysis. A new algorithm is introduced that detects outliers in a network and simultaneously predicts the missing links. The proposed method is statistically sound: under fairly general assumptions, it detects the outliers exactly and achieves the best known error for the prediction of missing links at polynomial computational cost. The sub-linear convergence of the algorithm is proven, which confirms its computational efficiency. A simulation study demonstrates the good behaviour of the algorithm in terms of both outlier detection and prediction of the missing links. The method is also illustrated with an application in epidemiology and with the analysis of a political Twitter network. The algorithm is freely available as an R package on the Comprehensive R Archive Network.

Suggested Citation

  • Gaucher, Solenne & Klopp, Olga & Robin, Geneviève, 2021. "Outlier detection in networks with missing links," Computational Statistics & Data Analysis, Elsevier, vol. 164(C).
  • Handle: RePEc:eee:csdana:v:164:y:2021:i:c:s0167947321001420
    DOI: 10.1016/j.csda.2021.107308

    Download full text from publisher

    File URL: http://www.sciencedirect.com/science/article/pii/S0167947321001420
    Download Restriction: Full text for ScienceDirect subscribers only.



    Citations


    Cited by:

    1. Follain, Bertille & Wang, Tengyao & Samworth, Richard J., 2022. "High-dimensional changepoint estimation with heterogeneous missingness," LSE Research Online Documents on Economics 115014, London School of Economics and Political Science, LSE Library.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Damian Przekop, 2020. "Feature Engineering for Anti-Fraud Models Based on Anomaly Detection," Central European Journal of Economic Modelling and Econometrics, Central European Journal of Economic Modelling and Econometrics, vol. 12(3), pages 301-316, September.
    2. Francesca Ieva & Anna Maria Paganoni, 2020. "Component-wise outlier detection methods for robustifying multivariate functional samples," Statistical Papers, Springer, vol. 61(2), pages 595-614, April.
    3. Andrzej Chmielowiec, 2021. "Algorithm for error-free determination of the variance of all contiguous subsequences and fixed-length contiguous subsequences for a sequence of industrial measurement data," Computational Statistics, Springer, vol. 36(4), pages 2813-2840, December.
    4. Marc Chataigner & Stéphane Crépey & Jiang Pu, 2020. "Nowcasting Networks," Post-Print hal-03910123, HAL.
    5. Greco, Salvatore & Ishizaka, Alessio & Tasiou, Menelaos & Torrisi, Gianpiero, 2019. "Sigma-Mu efficiency analysis: A methodology for evaluating units through composite indicators," European Journal of Operational Research, Elsevier, vol. 278(3), pages 942-960.
    6. David Juárez-Varón & Victoria Tur-Viñes & Alejandro Rabasa-Dolado & Kristina Polotskaya, 2020. "An Adaptive Machine Learning Methodology Applied to Neuromarketing Analysis: Prediction of Consumer Behaviour Regarding the Key Elements of the Packaging Design of an Educational Toy," Social Sciences, MDPI, vol. 9(9), pages 1-23, September.
    7. Stéphane Crépey & Lehdili Noureddine & Nisrine Madhar & Maud Thomas, 2022. "Anomaly Detection on Financial Time Series by Principal Component Analysis and Neural Networks," Working Papers hal-03777995, HAL.
    8. Zhongqiu Wang & Guan Yuan & Haoran Pei & Yanmei Zhang & Xiao Liu, 2020. "Unsupervised learning trajectory anomaly detection algorithm based on deep representation," International Journal of Distributed Sensor Networks, vol. 16(12), pages 15501477209, December.
    9. Arata, Linda & Fabrizi, Enrico & Sckokai, Paolo, 2020. "A worldwide analysis of trend in crop yields and yield variability: Evidence from FAO data," Economic Modelling, Elsevier, vol. 90(C), pages 190-208.
    10. Wentao Yang & Huaxi He & Dongsheng Wei & Hao Chen, 2022. "Generating pseudo-absence samples of invasive species based on outlier detection in the geographical characteristic space," Journal of Geographical Systems, Springer, vol. 24(2), pages 261-279, April.
    11. Fournier, Nicholas PhD & Farid, Yashar Zeinali PhD & Patire, Anthony David PhD, 2021. "Potential Erroneous Degradation of High Occupancy Vehicle (HOV) Facilities," Institute of Transportation Studies, Research Reports, Working Papers, Proceedings qt3z76r7tj, Institute of Transportation Studies, UC Berkeley.
    12. Richter, Lucas & Lehna, Malte & Marchand, Sophie & Scholz, Christoph & Dreher, Alexander & Klaiber, Stefan & Lenk, Steve, 2022. "Artificial Intelligence for Electricity Supply Chain automation," Renewable and Sustainable Energy Reviews, Elsevier, vol. 163(C).
    13. Tommaso Barbariol & Enrico Feltresi & Gian Antonio Susto, 2020. "Self-Diagnosis of Multiphase Flow Meters through Machine Learning-Based Anomaly Detection," Energies, MDPI, vol. 13(12), pages 1-24, June.
    14. Puteri Paramita & Zuduo Zheng & Md Mazharul Haque & Simon Washington & Paul Hyland, 2018. "User satisfaction with train fares: A comparative analysis in five Australian cities," PLOS ONE, Public Library of Science, vol. 13(6), pages 1-26, June.
    15. Liqun Diao & Grace Y. Yi, 2023. "Classification Trees with Mismeasured Responses," Journal of Classification, Springer;The Classification Society, vol. 40(1), pages 168-191, April.
    16. Durgesh Samariya & Amit Thakkar, 2023. "A Comprehensive Survey of Anomaly Detection Algorithms," Annals of Data Science, Springer, vol. 10(3), pages 829-850, June.
    17. Sisman, S. & Aydinoglu, A.C., 2022. "Improving performance of mass real estate valuation through application of the dataset optimization and Spatially Constrained Multivariate Clustering Analysis," Land Use Policy, Elsevier, vol. 119(C).
    18. Marino, Maria Francesca & Pandolfi, Silvia, 2022. "Hybrid maximum likelihood inference for stochastic block models," Computational Statistics & Data Analysis, Elsevier, vol. 171(C).
    19. Gasser, Patrick, 2020. "A review on energy security indices to compare country performances," Energy Policy, Elsevier, vol. 139(C).
    20. Saint‐Clair Chabert‐Liddell & Pierre Barbillon & Sophie Donnet, 2022. "Impact of the mesoscale structure of a bipartite ecological interaction network on its robustness through a probabilistic modeling," Environmetrics, John Wiley & Sons, Ltd., vol. 33(2), March.
