Printed from https://ideas.repec.org/a/nat/natcom/v12y2021i1d10.1038_s41467-021-26111-3.html

A proteomics sample metadata representation for multiomics integration and big data analysis

Author

Listed:
  • Chengxin Dai

    (Chongqing University of Posts and Telecommunications)

  • Anja Füllgrabe

    (European Bioinformatics Institute, Wellcome Genome Campus)

  • Julianus Pfeuffer

    (Freie Universität Berlin
    Visualization and Data analysis, Zuse Institute Berlin)

  • Elizaveta M. Solovyeva

    (Moscow Institute of Physics and Technology
    N.N. Semenov Federal Research Center for Chemical Physics, Russian Academy of Sciences)

  • Jingwen Deng

    (Chongqing University of Posts and Telecommunications)

  • Pablo Moreno

    (European Bioinformatics Institute, Wellcome Genome Campus)

  • Selvakumar Kamatchinathan

    (European Bioinformatics Institute, Wellcome Genome Campus)

  • Deepti Jaiswal Kundu

    (European Bioinformatics Institute, Wellcome Genome Campus)

  • Nancy George

    (European Bioinformatics Institute, Wellcome Genome Campus)

  • Silvie Fexova

    (European Bioinformatics Institute, Wellcome Genome Campus)

  • Björn Grüning

    (Albert-Ludwigs-University Freiburg)

  • Melanie Christine Föll

    (Medical Center – University of Freiburg, Faculty of Medicine, University of Freiburg
    Northeastern University)

  • Johannes Griss

    (Medical University of Vienna)

  • Marc Vaudel

    (University of Bergen)

  • Enrique Audain

    (Universitätsklinikum Schleswig-Holstein Kiel)

  • Marie Locard-Paulet

    (University of Copenhagen)

  • Michael Turewicz

    (Ruhr University Bochum, Medical Faculty, Medizinisches Proteom-Center
    Ruhr University Bochum, Center for Protein Diagnostics (PRODI), Medical Proteome Analysis)

  • Martin Eisenacher

    (Ruhr University Bochum, Medical Faculty, Medizinisches Proteom-Center
    Ruhr University Bochum, Center for Protein Diagnostics (PRODI), Medical Proteome Analysis)

  • Julian Uszkoreit

    (Ruhr University Bochum, Medical Faculty, Medizinisches Proteom-Center
    Ruhr University Bochum, Center for Protein Diagnostics (PRODI), Medical Proteome Analysis)

  • Tim Van Den Bossche

    (VIB – UGent Center for Medical Biotechnology, VIB
    Ghent University)

  • Veit Schwämmle

    (University of Southern Denmark, Campusvej 55)

  • Henry Webel

    (University of Copenhagen)

  • Stefan Schulze

    (University of Pennsylvania, Department of Biology)

  • David Bouyssié

    (University of Toulouse, CNRS, UPS)

  • Savita Jayaram

    (nference Labs)

  • Vinay Kumar Duggineni

    (nference Labs)

  • Patroklos Samaras

    (Technical University of Munich)

  • Mathias Wilhelm

    (Technical University of Munich)

  • Meena Choi

    (Proteomics and Lipidomics, Genentech)

  • Mingxun Wang

    (University of California San Diego)

  • Oliver Kohlbacher

    (University of Tübingen
    University Hospital Tübingen)

  • Alvis Brazma

    (European Bioinformatics Institute, Wellcome Genome Campus)

  • Irene Papatheodorou

    (European Bioinformatics Institute, Wellcome Genome Campus)

  • Nuno Bandeira

    (University of California San Diego
    University of California)

  • Eric W. Deutsch

    (Institute for Systems Biology, 401 Terry Ave N)

  • Juan Antonio Vizcaíno

    (European Bioinformatics Institute, Wellcome Genome Campus)

  • Mingze Bai

    (Chongqing University of Posts and Telecommunications
    Beijing Proteome Research Center, National Center for Protein Sciences (Beijing), Beijing Institute of Life Omics)

  • Timo Sachsenberg

    (University of Tübingen)

  • Lev I. Levitsky

    (N.N. Semenov Federal Research Center for Chemical Physics, Russian Academy of Sciences)

  • Yasset Perez-Riverol

    (European Bioinformatics Institute, Wellcome Genome Campus)

Abstract

The amount of public proteomics data is rapidly increasing but there is no standardized format to describe the sample metadata and their relationship with the dataset files in a way that fully supports their understanding or reanalysis. Here we propose to develop the transcriptomics data format MAGE-TAB into a standard representation for proteomics sample metadata. We implement MAGE-TAB-Proteomics in a crowdsourcing project to manually curate over 200 public datasets. We also describe tools and libraries to validate and submit sample metadata-related information to the PRIDE repository. We expect that these developments will improve the reproducibility and facilitate the reanalysis and integration of public proteomics datasets.
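
The abstract does not spell out the file layout, so the following is only a minimal sketch of the kind of check a validation tool might run. It assumes a tab-delimited table in which each row links one sample to one data file. The column names ("source name", "characteristics[organism]", "comment[data file]"), the function name validate_sample_table, and the example rows are all assumptions made for illustration; none of them is taken from this page.

```python
import csv
import io

# Hypothetical required columns: the abstract does not list any, so these
# names are assumptions chosen for illustration only.
REQUIRED_COLUMNS = ("source name", "characteristics[organism]", "comment[data file]")


def validate_sample_table(text):
    """Return a list of human-readable problems found in a tab-delimited table."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    header = reader.fieldnames or []
    # First check that every required column is present in the header.
    problems = [f"missing column: {name}" for name in REQUIRED_COLUMNS if name not in header]
    if problems:
        return problems
    # Then check that every row fills in each required column.
    for row_number, row in enumerate(reader, start=2):  # row 1 is the header
        for name in REQUIRED_COLUMNS:
            if not (row.get(name) or "").strip():
                problems.append(f"row {row_number}: empty value for {name}")
    return problems


# Made-up example: the second sample has no data file recorded.
EXAMPLE = (
    "source name\tcharacteristics[organism]\tcomment[data file]\n"
    "sample 1\tHomo sapiens\trun_01.raw\n"
    "sample 2\tHomo sapiens\t\n"
)

if __name__ == "__main__":
    for problem in validate_sample_table(EXAMPLE):
        print(problem)
```

Returning a list of problems rather than raising on the first one lets a curator fix every gap in a single pass, which matters when many datasets are annotated by hand.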

Suggested Citation

  • Chengxin Dai & Anja Füllgrabe & Julianus Pfeuffer & Elizaveta M. Solovyeva & Jingwen Deng & Pablo Moreno & Selvakumar Kamatchinathan & Deepti Jaiswal Kundu & Nancy George & Silvie Fexova & Björn Grüni, 2021. "A proteomics sample metadata representation for multiomics integration and big data analysis," Nature Communications, Nature, vol. 12(1), pages 1-8, December.
  • Handle: RePEc:nat:natcom:v:12:y:2021:i:1:d:10.1038_s41467-021-26111-3
    DOI: 10.1038/s41467-021-26111-3

    Download full text from publisher

    File URL: https://www.nature.com/articles/s41467-021-26111-3
    File Function: Abstract
    Download Restriction: no

    Citations

    Cited by:

    1. Tine Claeys & Tim Van Den Bossche & Yasset Perez-Riverol & Kris Gevaert & Juan Antonio Vizcaíno & Lennart Martens, 2023. "lesSDRF is more: maximizing the value of proteomics data through streamlined metadata annotation," Nature Communications, Nature, vol. 14(1), pages 1-4, December.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Katrin Stuber & Tobias Schneider & Jill Werner & Michael Kovermann & Andreas Marx & Martin Scheffner, 2021. "Structural and functional consequences of NEDD8 phosphorylation," Nature Communications, Nature, vol. 12(1), pages 1-15, December.
    2. S. Mouron & M. J. Bueno & A. Lluch & L. Manso & I. Calvo & J. Cortes & J. A. Garcia-Saenz & M. Gil-Gil & N. Martinez-Janez & J. V. Apala & E. Caleiras & Pilar Ximénez-Embún & J. Muñoz & L. Gonzalez-Co, 2022. "Phosphoproteomic analysis of neoadjuvant breast cancer suggests that increased sensitivity to paclitaxel is driven by CDK4 and filamin A," Nature Communications, Nature, vol. 13(1), pages 1-18, December.
    3. Jonathan J. Swietlik & Stefanie Bärthel & Chiara Falcomatà & Diana Fink & Ankit Sinha & Jingyuan Cheng & Stefan Ebner & Peter Landgraf & Daniela C. Dieterich & Henrik Daub & Dieter Saur & Felix Meissn, 2023. "Cell-selective proteomics segregates pancreatic cancer subtypes by extracellular proteins in tumors and circulation," Nature Communications, Nature, vol. 14(1), pages 1-17, December.
    4. Jennifer G. Abelin & Erik J. Bergstrom & Keith D. Rivera & Hannah B. Taylor & Susan Klaeger & Charles Xu & Eva K. Verzani & C. Jackson White & Hilina B. Woldemichael & Maya Virshup & Meagan E. Olive &, 2023. "Workflow enabling deepscale immunopeptidome, proteome, ubiquitylome, phosphoproteome, and acetylome analyses of sample-limited tissues," Nature Communications, Nature, vol. 14(1), pages 1-22, December.
    5. Yiqun Zhang & Fengju Chen & Darshan S. Chandrashekar & Sooryanarayana Varambally & Chad J. Creighton, 2022. "Proteogenomic characterization of 2002 human cancers reveals pan-cancer molecular subtypes and associated pathways," Nature Communications, Nature, vol. 13(1), pages 1-19, December.
    6. Brian D. Lehmann & Antonio Colaprico & Tiago C. Silva & Jianjiao Chen & Hanbing An & Yuguang Ban & Hanchen Huang & Lily Wang & Jamaal L. James & Justin M. Balko & Paula I. Gonzalez-Ericsson & Melinda , 2021. "Multi-omics analysis identifies therapeutic vulnerabilities in triple-negative breast cancer subtypes," Nature Communications, Nature, vol. 12(1), pages 1-18, December.
    7. Yuen Lam Dora Ng & Evelyn Ramberger & Stephan R. Bohl & Anna Dolnik & Christian Steinebach & Theresia Conrad & Sina Müller & Oliver Popp & Miriam Kull & Mohamed Haji & Michael Gütschow & Hartmut Döhne, 2022. "Proteomic profiling reveals CDK6 upregulation as a targetable resistance mechanism for lenalidomide in multiple myeloma," Nature Communications, Nature, vol. 13(1), pages 1-13, December.
    8. Shizhong Ke & Fabin Dang & Lin Wang & Jia-Yun Chen & Mandar T. Naik & Wenxue Li & Abhishek Thavamani & Nami Kim & Nandita M. Naik & Huaxiu Sui & Wei Tang & Chenxi Qiu & Kazuhiro Koikawa & Felipe Batal, 2024. "Reciprocal antagonism of PIN1-APC/CCDH1 governs mitotic protein stability and cell cycle entry," Nature Communications, Nature, vol. 15(1), pages 1-21, December.
    9. Pasquale Simeone & Stefano Tacconi & Serena Longo & Paola Lanuti & Sara Bravaccini & Francesca Pirini & Sara Ravaioli & Luciana Dini & Anna M. Giudetti, 2021. "Expanding Roles of De Novo Lipogenesis in Breast Cancer," IJERPH, MDPI, vol. 18(7), pages 1-16, March.
    10. Karama Asleh & Gian Luca Negri & Sandra E. Spencer Miko & Shane Colborne & Christopher S. Hughes & Xiu Q. Wang & Dongxia Gao & C. Blake Gilks & Stephen K. L. Chia & Torsten O. Nielsen & Gregg B. Morin, 2022. "Proteomic analysis of archival breast cancer clinical specimens identifies biological subtypes with distinct survival outcomes," Nature Communications, Nature, vol. 13(1), pages 1-19, December.
    11. Ling Li & Mingming Niu & Alyssa Erickson & Jie Luo & Kincaid Rowbotham & Kai Guo & He Huang & Yuxin Li & Yi Jiang & Junguk Hur & Chunyu Liu & Junmin Peng & Xusheng Wang, 2022. "SMAP is a pipeline for sample matching in proteogenomics," Nature Communications, Nature, vol. 13(1), pages 1-9, December.
    12. Lingling Li & Dongxian Jiang & Hui Liu & Chunmei Guo & Rui Zhao & Qiao Zhang & Chen Xu & Zhaoyu Qin & Jinwen Feng & Yang Liu & Haixing Wang & Weijie Chen & Xue Zhang & Bin Li & Lin Bai & Sha Tian & Su, 2023. "Comprehensive proteogenomic characterization of early duodenal cancer reveals the carcinogenesis tracks of different subtypes," Nature Communications, Nature, vol. 14(1), pages 1-24, December.
    13. Sam Crowl & Ben T. Jordan & Hamza Ahmed & Cynthia X. Ma & Kristen M. Naegle, 2022. "KSTAR: An algorithm to predict patient-specific kinase activities from phosphoproteomic data," Nature Communications, Nature, vol. 13(1), pages 1-16, December.
    14. Brijesh Kumar & Aditi S. Khatpe & Jiang Guanglong & Katie Batic & Poornima Bhat-Nakshatri & Maggie M. Granatir & Rebekah Joann Addison & Megan Szymanski & Lee Ann Baldridge & Constance J. Temm & Georg, 2023. "Stromal heterogeneity may explain increased incidence of metaplastic breast cancer in women of African descent," Nature Communications, Nature, vol. 14(1), pages 1-22, December.
    15. Hailiang Zhang & Lin Bai & Xin-Qiang Wu & Xi Tian & Jinwen Feng & Xiaohui Wu & Guo-Hai Shi & Xiaoru Pei & Jiacheng Lyu & Guojian Yang & Yang Liu & Wenhao Xu & Aihetaimujiang Anwaier & Yu Zhu & Da-Long, 2023. "Proteogenomics of clear cell renal cell carcinoma response to tyrosine kinase inhibitor," Nature Communications, Nature, vol. 14(1), pages 1-21, December.
    16. Yuanyuan Qu & Jinwen Feng & Xiaohui Wu & Lin Bai & Wenhao Xu & Lingli Zhu & Yang Liu & Fujiang Xu & Xuan Zhang & Guojian Yang & Jiacheng Lv & Xiuping Chen & Guo-Hai Shi & Hong-Kai Wang & Da-Long Cao &, 2022. "A proteogenomic analysis of clear cell renal cell carcinoma in a Chinese population," Nature Communications, Nature, vol. 13(1), pages 1-21, December.
    17. Fengju Chen & Yiqun Zhang & Darshan S. Chandrashekar & Sooryanarayana Varambally & Chad J. Creighton, 2023. "Global impact of somatic structural variation on the cancer proteome," Nature Communications, Nature, vol. 14(1), pages 1-19, December.
    18. Isabelle Rose Leo & Luay Aswad & Matthias Stahl & Elena Kunold & Frederik Post & Tom Erkers & Nona Struyf & Georgios Mermelekas & Rubin Narayan Joshi & Eva Gracia-Villacampa & Päivi Östling & Olli P. , 2022. "Integrative multi-omics and drug response profiling of childhood acute lymphoblastic leukemia cell lines," Nature Communications, Nature, vol. 13(1), pages 1-19, December.

    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.