Inference of Functionally-Relevant N-acetyltransferase Residues Based on Statistical Correlations

Authors

Andrew F Neuwald
Stephen F Altschul

Abstract

Over evolutionary time, members of a superfamily of homologous proteins sharing a common structural core diverge into subgroups filling various functional niches. At the sequence level, such divergence appears as correlations that arise from residue patterns distinct to each subgroup. Such a superfamily may be viewed as a population of sequences corresponding to a complex, high-dimensional probability distribution. Here we model this distribution as hierarchical interrelated hidden Markov models (hiHMMs), which describe these sequence correlations implicitly. By characterizing such correlations one may hope to obtain information regarding functionally-relevant properties that have thus far evaded detection. To do so, we infer a hiHMM distribution from sequence data using Bayes’ theorem and Markov chain Monte Carlo (MCMC) sampling, which is widely recognized as the most effective approach for characterizing a complex, high dimensional distribution. Other routines then map correlated residue patterns to available structures with a view to hypothesis generation. When applied to N-acetyltransferases, this reveals sequence and structural features indicative of functionally important, yet generally unknown biochemical properties. Even for sets of proteins for which nothing is known beyond unannotated sequences and structures, this can lead to helpful insights. We describe, for example, a putative coenzyme-A-induced-fit substrate binding mechanism mediated by arginine residue switching between salt bridge and π-π stacking interactions. A suite of programs implementing this approach is available (psed.igs.umaryland.edu).

Author Summary: Protein sequence data, when gathered in great quantity, contain important but implicit biological information manifest as statistical correlations. Here we describe an approach to access this information by comprehensively modeling and characterizing the distribution of sequences belonging to a major protein superfamily. This approach takes as input a large set of unaligned sequences belonging to the superfamily. By applying the minimum description length principle, it seeks the statistical model that best explains the sequences while avoiding over-fitting the data. It concurrently aligns the sequences and, to model evolutionary divergence, partitions them into subgroups that are hierarchically-arranged based upon correlated residue patterns. Auxiliary routines create PyMOL scripts to visualize the locations of correlated residues within available structures. Because these correlations likely arise from structural and biochemical constraints, they can help elucidate protein properties important for functional specificity. Comparing and contrasting sequence and structural features in this way may therefore suggest, in the light of published studies, plausible biological hypotheses for experimental investigation. We illustrate this approach with N-acetyltransferases.
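
The summary mentions auxiliary routines that create PyMOL scripts to show where correlated residues sit in a structure. The following is only a minimal sketch of that idea, not the authors' implementation: the function name `pymol_highlight_script`, the file name, the chain identifier, the residue numbers, and the colors are all hypothetical placeholders. It assumes the correlated positions have already been mapped to residue numbers in a structure file, and it emits standard PyMOL commands (`load`, `hide`, `show`, `select`, `color`, `deselect`).

```python
def pymol_highlight_script(structure_path, chain, groups):
    """Return PyMOL command-script text that highlights residue groups.

    groups maps a selection name to a (color, residue_numbers) pair.
    All names and numbers passed in are the caller's assumptions.
    """
    lines = [
        f"load {structure_path}",   # read the structure file
        "hide everything",          # start from a blank view
        "show cartoon",             # draw the backbone as a cartoon
        "color grey80",             # neutral base color for the whole protein
    ]
    for name, (color, residues) in groups.items():
        # PyMOL joins several residue numbers in one selection with '+'
        resi = "+".join(str(r) for r in sorted(residues))
        lines.append(f"select {name}, chain {chain} and resi {resi}")
        lines.append(f"show sticks, {name}")
        lines.append(f"color {color}, {name}")
    lines.append("deselect")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical inputs: placeholders, not values taken from the paper.
    script = pymol_highlight_script(
        "example_structure.pdb",
        "A",
        {
            "subgroup_pattern": ("red", [12, 45, 78]),
            "superfamily_pattern": ("yellow", [30, 31]),
        },
    )
    print(script)
```

Saving the returned text to a `.pml` file and opening it in PyMOL would color each residue group separately, which is one plausible way to compare subgroup-specific patterns against those shared across the superfamily.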

Suggested Citation

Andrew F Neuwald & Stephen F Altschul, 2016. "Inference of Functionally-Relevant N-acetyltransferase Residues Based on Statistical Correlations," PLOS Computational Biology, Public Library of Science, vol. 12(12), pages 1-30, December.

Handle: RePEc:plo:pcbi00:1005294
DOI: 10.1371/journal.pcbi.1005294

Most related items

These are the items that most often cite the same works as this one and are cited by the same works as this one.

Alexander Frankel & Maximilian Kasy, 2022. "Which Findings Should Be Published?," American Economic Journal: Microeconomics, American Economic Association, vol. 14(1), pages 1-38, February.
- Kasy, Maximilian & Frankel, Alexander, 2018. "Which findings should be published?," MetaArXiv mbvz3, Center for Open Science.
Jyotirmoy Sarkar, 2018. "Will P-Value Triumph over Abuses and Attacks?," Biostatistics and Biometrics Open Access Journal, Juniper Publishers Inc., vol. 7(4), pages 66-71, July.
Stanley, T. D. & Doucouliagos, Chris, 2019. "Practical Significance, Meta-Analysis and the Credibility of Economics," IZA Discussion Papers 12458, IZA Network @ LISER.
Karin Langenkamp & Bodo Rödel & Kerstin Taufenbach & Meike Weiland, 2018. "Open Access in Vocational Education and Training Research," Publications, MDPI, vol. 6(3), pages 1-12, July.
Kevin J. Boyle & Mark Morrison & Darla Hatton MacDonald & Roderick Duncan & John Rose, 2016. "Investigating Internet and Mail Implementation of Stated-Preference Surveys While Controlling for Differences in Sample Frames," Environmental & Resource Economics, Springer;European Association of Environmental and Resource Economists, vol. 64(3), pages 401-419, July.
Jelte M Wicherts & Marjan Bakker & Dylan Molenaar, 2011. "Willingness to Share Research Data Is Related to the Strength of the Evidence and the Quality of Reporting of Statistical Results," PLOS ONE, Public Library of Science, vol. 6(11), pages 1-7, November.
Valentine, Kathrene D & Buchanan, Erin Michelle & Scofield, John E. & Beauchamp, Marshall T., 2017. "Beyond p-values: Utilizing Multiple Estimates to Evaluate Evidence," OSF Preprints 9hp7y, Center for Open Science.
Anton, Roman, 2014. "Sustainable Intrapreneurship - The GSI Concept and Strategy - Unfolding Competitive Advantage via Fair Entrepreneurship," MPRA Paper 69713, University Library of Munich, Germany, revised 01 Feb 2015.
Dudek, Thomas & Brenøe, Anne Ardila & Feld, Jan & Rohrer, Julia, 2022. "No Evidence That Siblings' Gender Affects Personality across Nine Countries," IZA Discussion Papers 15137, IZA Network @ LISER.
- Thomas Dudek & Anne Ardila Brenoe & Jan Feld & Julia M. Rohrer, 2022. "No Evidence that Siblings’ Gender Affects Personality Across Nine Countries," CEBI working paper series 22-02, University of Copenhagen. Department of Economics. The Center for Economic Behavior and Inequality (CEBI).
- Thomas Dudek & Anne Ardila Brenøe & Jan Feld & Julia M. Rohrer, 2022. "No evidence that siblings’ gender affects personality across nine countries," ECON - Working Papers 408, Department of Economics - University of Zurich.
Uwe Hassler & Marc‐Oliver Pohle, 2022. "Unlucky Number 13? Manipulating Evidence Subject to Snooping," International Statistical Review, International Statistical Institute, vol. 90(2), pages 397-410, August.
- Uwe Hassler & Marc-Oliver Pohle, 2020. "Unlucky Number 13? Manipulating Evidence Subject to Snooping," Papers 2009.02198, arXiv.org.
Frederique Bordignon, 2020. "Self-correction of science: a comparative study of negative citations and post-publication peer review," Scientometrics, Springer;Akadémiai Kiadó, vol. 124(2), pages 1225-1239, August.
Omar Al-Ubaydli & John A. List, 2015. "Do Natural Field Experiments Afford Researchers More or Less Control than Laboratory Experiments? A Simple Model," NBER Working Papers 20877, National Bureau of Economic Research, Inc.
- Omar Al-Ubaydli & John List, 2015. "Do Natural Field Experiments Afford Researchers More or Less Control than Laboratory Experiments? A Simple Model," Artefactual Field Experiments 00458, The Field Experiments Website.
Aurelie Seguin & Wolfgang Forstmeier, 2012. "No Band Color Effects on Male Courtship Rate or Body Mass in the Zebra Finch: Four Experiments and a Meta-Analysis," PLOS ONE, Public Library of Science, vol. 7(6), pages 1-11, June.
Ankur Moitra & Dhruv Rohatgi, 2022. "Provably Auditing Ordinary Least Squares in Low Dimensions," Papers 2205.14284, arXiv.org, revised Jun 2022.
Dragana Radicic & Geoffrey Pugh & Hugo Hollanders & René Wintjes & Jon Fairburn, 2016. "The impact of innovation support programs on small and medium enterprises innovation in traditional manufacturing industries: An evaluation for seven European Union regions," Environment and Planning C, vol. 34(8), pages 1425-1452, December.
Li, Lunzheng & Maniadis, Zacharias & Sedikides, Constantine, 2021. "Anchoring in Economics: A Meta-Analysis of Studies on Willingness-To-Pay and Willingness-To-Accept," Journal of Behavioral and Experimental Economics (formerly The Journal of Socio-Economics), Elsevier, vol. 90(C).
Charles F. Manski, 2018. "Reasonable patient care under uncertainty," Health Economics, John Wiley & Sons, Ltd., vol. 27(10), pages 1397-1421, October.
Eric van Diessen & Willemiek J E M Zweiphenning & Floor E Jansen & Cornelis J Stam & Kees P J Braun & Willem M Otte, 2014. "Brain Network Organization in Focal Epilepsy: A Systematic Review and Meta-Analysis," PLOS ONE, Public Library of Science, vol. 9(12), pages 1-21, December.
Kathryn Oliver & Annette Boaz, 2019. "Transforming evidence for policy and practice: creating space for new conversations," Humanities and Social Sciences Communications, Palgrave Macmillan, vol. 5(1), pages 1-10, December.
Neuwald Andrew F., 2014. "Protein domain hierarchy Gibbs sampling strategies," Statistical Applications in Genetics and Molecular Biology, De Gruyter, vol. 13(4), pages 497-517, August.
