Best practices and tools in R and Python for statistical processing and visualization of lipidomics and metabolomics data

Authors

  • Jakub Idkowiak (University of Pardubice; KU Leuven)
  • Jonas Dehairs (KU Leuven)
  • Jana Schwarzerová (Brno University of Technology; University of Vienna; University Hospital Ostrava)
  • Dominika Olešová (Slovak Academy of Sciences)
  • Jacob X. M. Truong (North Terrace)
  • Aleš Kvasnička (Palacký University Olomouc; Oslo University Hospital)
  • Marios Eftychiou (VIB-KU Leuven Center for Cancer Biology; VIB Center for AI & Computational Biology; KU Leuven)
  • Ruben Cools (VIB-KU Leuven Center for Cancer Biology; VIB Center for AI & Computational Biology; KU Leuven)
  • Xander Spotbeen (KU Leuven)
  • Robert Jirásko (University of Pardubice)
  • Vullnet Veseli (University of Pardubice)
  • Marco Giampà (KU Leuven; VIB-KU Leuven Center for Cancer Biology)
  • Vincent de Laat (KU Leuven)
  • Lisa M. Butler (North Terrace)
  • Wolfram Weckwerth (University of Vienna)
  • David Friedecký (Palacký University Olomouc)
  • Jonas Demeulemeester (VIB-KU Leuven Center for Cancer Biology; VIB Center for AI & Computational Biology; KU Leuven)
  • Karel Hron (Palacký University Olomouc)
  • Johannes V. Swinnen (KU Leuven)
  • Michal Holčapek (University of Pardubice)

Abstract

Mass spectrometry-based lipidomics and metabolomics generate extensive data sets that, along with metadata such as clinical parameters, require specific data exploration skills to identify and visualize statistically significant trends and biologically relevant differences. Besides tailored methods developed by individual labs, a solid core of freely accessible tools exists for exploratory data analysis and visualization. We have compiled these tools here, covering the preparation of descriptive statistics, annotated box plots, hypothesis testing, volcano plots, lipid maps and fatty acyl chain plots, unsupervised and supervised dimensionality reduction, dendrograms, and heat maps. This review is intended for readers who would like to develop their skills in data analysis and visualization using freely available R or Python solutions. Beginners are guided through a selection of R and Python libraries for producing publication-ready graphics without being overwhelmed by code complexity. This manuscript, along with the associated GitBook code repository containing step-by-step instructions, offers readers a comprehensive guide and encourages the application of R and Python for robust and reproducible chemometric analysis of omics data.
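
As a rough illustration of how two of the listed steps, hypothesis testing and volcano plots, fit together, the sketch below computes the two coordinates a volcano plot displays for each lipid: the log2 fold change between group means and the negative log10 of a p-value. It is not taken from the article or its GitBook repository. The function names, the lipid labels, and the concentration values are all hypothetical, and an exact permutation test is swapped in for the t-test or Welch test that is more usual in practice, only so that the example runs on the Python standard library. No multiple-testing correction is applied.

```python
from itertools import combinations
from math import log2, log10
from statistics import mean


def permutation_p_value(group_a, group_b):
    """Exact two-sided permutation test on the difference of group means."""
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    n_b = len(pooled) - n_a
    total = sum(pooled)
    observed = abs(mean(group_a) - mean(group_b))
    hits = 0
    count = 0
    # Enumerate every way of relabelling n_a of the pooled samples as group A.
    for idx in combinations(range(len(pooled)), n_a):
        a_sum = sum(pooled[i] for i in idx)
        diff = abs(a_sum / n_a - (total - a_sum) / n_b)
        # Small tolerance guards against floating-point ties.
        if diff >= observed - 1e-12:
            hits += 1
        count += 1
    return hits / count


def volcano_coordinates(control, case):
    """Return {feature: (log2 fold change, -log10 p)} for each feature."""
    coords = {}
    for feature in control:
        fold_change = mean(case[feature]) / mean(control[feature])
        p_value = permutation_p_value(control[feature], case[feature])
        coords[feature] = (log2(fold_change), -log10(p_value))
    return coords


# Hypothetical toy concentrations (arbitrary units), four samples per group.
control = {"PC 34:1": [10.0, 11.0, 9.0, 10.0], "TG 52:2": [5.0, 6.0, 5.5, 6.5]}
case = {"PC 34:1": [20.0, 22.0, 18.0, 20.0], "TG 52:2": [5.5, 6.0, 5.0, 6.5]}

coords = volcano_coordinates(control, case)
for feature, (x, y) in coords.items():
    print(f"{feature}: log2FC={x:.2f}, -log10(p)={y:.2f}")
```

With four samples per group there are only 70 possible relabellings, so the smallest attainable two-sided p-value is 2/70. A real data set would normally use larger groups, a parametric or rank-based test from a statistics library, and a correction for the number of lipids tested before any significance threshold is drawn on the plot.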

Suggested Citation

  • Jakub Idkowiak & Jonas Dehairs & Jana Schwarzerová & Dominika Olešová & Jacob X. M. Truong & Aleš Kvasnička & Marios Eftychiou & Ruben Cools & Xander Spotbeen & Robert Jirásko & Vullnet Veseli & Marco, 2025. "Best practices and tools in R and Python for statistical processing and visualization of lipidomics and metabolomics data," Nature Communications, Nature, vol. 16(1), pages 1-19, December.
  • Handle: RePEc:nat:natcom:v:16:y:2025:i:1:d:10.1038_s41467-025-63751-1
    DOI: 10.1038/s41467-025-63751-1

Download full text from publisher

  File URL: https://www.nature.com/articles/s41467-025-63751-1
  File Function: Abstract
  Download Restriction: no



