Phylo: A Citizen Science Approach for Improving Multiple Sequence Alignment

Author

Listed:
  • Alexander Kawrykow
  • Gary Roumanis
  • Alfred Kam
  • Daniel Kwak
  • Clarence Leung
  • Chu Wu
  • Eleyine Zarour
  • Phylo players
  • Luis Sarmenta
  • Mathieu Blanchette
  • Jérôme Waldispühl

Abstract

Background: Comparative genomics, or the study of the relationships of genome structure and function across different species, offers a powerful tool for studying evolution, annotating genomes, and understanding the causes of various genetic disorders. However, aligning multiple sequences of DNA, an essential intermediate step for most types of analyses, is a difficult computational task. In parallel, citizen science, an approach that takes advantage of the fact that the human brain is exquisitely tuned to solving specific types of problems, is becoming increasingly popular. There, instances of hard computational problems are dispatched to a crowd of non-expert human game players and solutions are sent back to a central server.

Methodology/Principal Findings: We introduce Phylo, a human-based computing framework applying “crowd sourcing” techniques to solve the Multiple Sequence Alignment (MSA) problem. The key idea of Phylo is to convert the MSA problem into a casual game that can be played by ordinary web users with minimal prior knowledge of the biological context. We applied this strategy to improve the alignment of the promoters of disease-related genes from up to 44 vertebrate species. Since the launch in November 2010, we have received more than 350,000 solutions submitted by more than 12,000 registered users. Our results show that the submitted solutions contributed to improving the accuracy of up to 70% of the alignment blocks considered.

Conclusions/Significance: We demonstrate that, combined with classical algorithms, crowd computing techniques can be successfully used to help improve the accuracy of MSA. More importantly, we show that an NP-hard computational problem can be embedded in a casual game that can be easily played by people without significant scientific training. This suggests that citizen science approaches can be used to exploit the billions of “human-brain peta-flops” of computation that are spent every day playing games.
Phylo is available at: http://phylo.cs.mcgill.ca.
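
To make concrete what "improving an alignment" can mean, the following is a minimal sketch of one common way to score a multiple sequence alignment, the sum-of-pairs score. It is not Phylo's actual scoring function, which the abstract does not describe. The function names, the scoring values (MATCH, MISMATCH, GAP), and the toy sequences are all assumptions made for illustration.

```python
from itertools import combinations

# Hypothetical scoring values chosen for illustration; not taken from the paper.
MATCH, MISMATCH, GAP = 1, -1, -2


def pair_score(a: str, b: str) -> int:
    """Score one pair of aligned characters ('-' denotes a gap)."""
    if a == "-" and b == "-":
        return 0  # two gaps facing each other carry no information
    if a == "-" or b == "-":
        return GAP
    return MATCH if a == b else MISMATCH


def sum_of_pairs(alignment: list[str]) -> int:
    """Sum the pair scores over every column and every pair of rows."""
    if len({len(row) for row in alignment}) > 1:
        raise ValueError("all rows must have the same length")
    return sum(
        pair_score(a, b)
        for column in zip(*alignment)
        for a, b in combinations(column, 2)
    )


# Toy example: the third row is rearranged, as a player might do by sliding a gap.
before = ["ACGT-", "A-GTC", "ACG-T"]
after = ["ACGT-", "A-GTC", "ACGT-"]

if __name__ == "__main__":
    print(sum_of_pairs(before), sum_of_pairs(after))
```

Under these assumed values, a candidate alignment counts as an improvement when its score is higher than that of the starting alignment, which is how a submitted solution could be compared against a computed baseline.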

Suggested Citation

  • Alexander Kawrykow & Gary Roumanis & Alfred Kam & Daniel Kwak & Clarence Leung & Chu Wu & Eleyine Zarour & Phylo players & Luis Sarmenta & Mathieu Blanchette & Jérôme Waldispühl, 2012. "Phylo: A Citizen Science Approach for Improving Multiple Sequence Alignment," PLOS ONE, Public Library of Science, vol. 7(3), pages 1-9, March.
  • Handle: RePEc:plo:pone00:0031362
    DOI: 10.1371/journal.pone.0031362

    Download full text from publisher

    File URL: https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0031362
    Download Restriction: no

    File URL: https://journals.plos.org/plosone/article/file?id=10.1371/journal.pone.0031362&type=printable
    Download Restriction: no


    References listed on IDEAS

    1. Cédric Notredame, 2007. "Recent Evolutions of Multiple Sequence Alignment Algorithms," PLOS Computational Biology, Public Library of Science, vol. 3(8), pages 1-4, August.
    2. Manolis Kellis & Nick Patterson & Matthew Endrizzi & Bruce Birren & Eric S. Lander, 2003. "Sequencing and comparison of yeast species to identify genes and regulatory elements," Nature, Nature, vol. 423(6937), pages 241-254, May.

    Citations

    Cited by:

    1. Matthew Staffelbach & Peter Sempolinski & Tracy Kijewski-Correa & Douglas Thain & Daniel Wei & Ahsan Kareem & Gregory Madey, 2015. "Lessons Learned from Crowdsourcing Complex Engineering Tasks," PLOS ONE, Public Library of Science, vol. 10(9), pages 1-19, September.
    2. Naihui Zhou & Zachary D Siegel & Scott Zarecor & Nigel Lee & Darwin A Campbell & Carson M Andorf & Dan Nettleton & Carolyn J Lawrence-Dill & Baskar Ganapathysubramanian & Jonathan W Kelly & Iddo Fried, 2018. "Crowdsourcing image analysis for plant phenomics to generate ground truth data for machine learning," PLOS Computational Biology, Public Library of Science, vol. 14(7), pages 1-16, July.
    3. Andrei P. Kirilenko & Travis Desell & Hany Kim & Svetlana Stepchenkova, 2017. "Crowdsourcing Analysis of Twitter Data on Climate Change: Paid Workers vs. Volunteers," Sustainability, MDPI, vol. 9(11), pages 1-15, November.
    4. Barbara Strobl & Simon Etter & Ilja van Meerveld & Jan Seibert, 2019. "The CrowdWater game: A playful way to improve the accuracy of crowdsourced water level class data," PLOS ONE, Public Library of Science, vol. 14(9), pages 1-23, September.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Tao Song & Hong Gu, 2014. "Discriminative Motif Discovery via Simulated Evolution and Random Under-Sampling," PLOS ONE, Public Library of Science, vol. 9(2), pages 1-10, February.
    2. Arribas-Gil Ana & Matias Catherine, 2017. "A time warping approach to multiple sequence alignment," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 16(2), pages 133-144, April.
    3. Alessandro L. V. Coradini & Christopher Ne Ville & Zachary A. Krieger & Joshua Roemer & Cara Hull & Shawn Yang & Daniel T. Lusk & Ian M. Ehrenreich, 2023. "Building synthetic chromosomes from natural DNA," Nature Communications, Nature, vol. 14(1), pages 1-12, December.
    4. Valerie Storms & Marleen Claeys & Aminael Sanchez & Bart De Moor & Annemieke Verstuyf & Kathleen Marchal, 2010. "The Effect of Orthology and Coregulation on Detecting Regulatory Motifs," PLOS ONE, Public Library of Science, vol. 5(2), pages 1-11, February.
    5. Robert K Bradley & Adam Roberts & Michael Smoot & Sudeep Juvekar & Jaeyoung Do & Colin Dewey & Ian Holmes & Lior Pachter, 2009. "Fast Statistical Alignment," PLOS Computational Biology, Public Library of Science, vol. 5(5), pages 1-15, May.
    6. Rahul Siddharthan & Eric D Siggia & Erik van Nimwegen, 2005. "PhyloGibbs: A Gibbs Sampling Motif Finder That Incorporates Phylogeny," PLOS Computational Biology, Public Library of Science, vol. 1(7), pages 1-23, December.
    7. Harri Lähdesmäki & Alistair G Rust & Ilya Shmulevich, 2008. "Probabilistic Inference of Transcription Factor Binding from Multiple Data Sources," PLOS ONE, Public Library of Science, vol. 3(3), pages 1-24, March.
    8. Leelavati Narlikar & Raluca Gordân & Alexander J Hartemink, 2007. "A Nucleosome-Guided Map of Transcription Factor Binding Sites in Yeast," PLOS Computational Biology, Public Library of Science, vol. 3(11), pages 1-10, November.
    9. J Roman Arguello & Carolina Sellanes & Yann Ru Lou & Robert A Raguso, 2013. "Can Yeast (S. cerevisiae) Metabolic Volatiles Provide Polymorphic Signaling?," PLOS ONE, Public Library of Science, vol. 8(8), pages 1-12, August.
    10. Fabio Pardi & Nick Goldman, 2005. "Species Choice for Comparative Genomics: Being Greedy Works," PLOS Genetics, Public Library of Science, vol. 1(6), pages 1-1, December.
    11. Krishna B. S. Swamy & Hsin-Yi Lee & Carmina Ladra & Chien-Fu Jeff Liu & Jung-Chi Chao & Yi-Yun Chen & Jun-Yi Leu, 2022. "Proteotoxicity caused by perturbed protein complexes underlies hybrid incompatibility in yeast," Nature Communications, Nature, vol. 13(1), pages 1-14, December.
    12. Eilon Sharon & Shai Lubliner & Eran Segal, 2008. "A Feature-Based Approach to Modeling Protein–DNA Interactions," PLOS Computational Biology, Public Library of Science, vol. 4(8), pages 1-17, August.
    13. Siewert Elizabeth A & Kechris Katerina J, 2009. "Prediction of Motifs Based on a Repeated-Measures Model for Integrating Cross-Species Sequence and Expression Data," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 8(1), pages 1-36, September.
    14. Lit-Hsin Loo & Danai Laksameethanasan & Yi-Ling Tung, 2014. "Quantitative Protein Localization Signatures Reveal an Association between Spatial and Functional Divergences of Proteins," PLOS Computational Biology, Public Library of Science, vol. 10(3), pages 1-17, March.
    15. Christian L Barrett & Bernhard O Palsson, 2006. "Iterative Reconstruction of Transcriptional Regulatory Networks: An Algorithmic Approach," PLOS Computational Biology, Public Library of Science, vol. 2(5), pages 1-10, May.
    16. Kemal Sonmez & Naunihal T Zaveri & Ilan A Kerman & Sharon Burke & Charles R Neal & Xinmin Xie & Stanley J Watson & Lawrence Toll, 2009. "Evolutionary Sequence Modeling for Discovery of Peptide Hormones," PLOS Computational Biology, Public Library of Science, vol. 5(1), pages 1-12, January.
    17. Kenzie D MacIsaac & Ernest Fraenkel, 2006. "Practical Strategies for Discovering Regulatory DNA Sequence Motifs," PLOS Computational Biology, Public Library of Science, vol. 2(4), pages 1-10, April.
