
Investigating the mitochondrial genomic landscape of Arabidopsis thaliana by long-read sequencing

Authors

Listed:
  • Bansho Masutani
  • Shin-ichi Arimura
  • Shinichi Morishita

Abstract

Plant mitochondrial genomes have distinctive features compared to those of animals; namely, they are large and divergent, with sizes ranging from hundreds of thousands to a few million bases. Recombination among repetitive regions is thought to produce similar structures that differ slightly, known as “multipartite structures,” which contribute to different phenotypes. Although many reference plant mitochondrial genomes represent almost all the genes in mitochondria, the full spectrum of their structures remains largely unknown. The emergence of long-read sequencing technology is expected to reveal this landscape; however, many studies have aimed to assemble only one representative circular genome, because properly understanding multipartite structures using existing assemblers is not feasible. To elucidate multipartite structures, we leveraged the information in existing reference genomes and classified long reads according to their corresponding structures. We developed a method that exploits two classic algorithms, partial order alignment (POA) and the hidden Markov model (HMM), to construct a sensitive read classifier. This method enables us to represent a set of reads as a POA graph and analyze it using the HMM. We can then calculate the likelihood of a read occurring in a given cluster, resulting in an iterative clustering algorithm. For synthetic data, our proposed method reliably detected a single variation site in 9,000-bp synthetic long reads with a 15% sequencing-error rate and produced accurate clustering. It was also capable of clustering long reads from six very similar sequences containing only slight differences. For real data, we assembled putative multipartite structures of mitochondrial genomes of Arabidopsis thaliana from nine accessions sequenced using PacBio Sequel. The results indicated that there are recurrent and strain-specific structures in A. thaliana mitochondrial genomes.

Author summary

Plant mitochondria have genes with important functions. For example, some mitochondrial genomes contain a gene responsible for cytoplasmic male sterility, a phenotype in which the plant is unable to produce mature pollen. However, despite their small sizes, plant mitochondrial genomes can be difficult to assemble even with state-of-the-art long-read sequencers. The main obstacle is their high structural diversity and low sequence diversity, which hamper traditional methods of assembling plant mitochondrial genomes. Here, we introduce a new method for grouping long reads by individual structure. For this purpose, we explored two traditional models in sequence analysis, the hidden Markov model and partial order alignment, which enable us to detect a single-base variation among several thousand bases and output accurate clusters while coping with the observation errors associated with long-read sequencing. Applying this method to nine PacBio Sequel read datasets from Arabidopsis thaliana, we uncovered putative, previously unknown structures of plant mitochondrial genomes, suggesting that strain-specific structures are present in mitochondrial genomes and that linear DNA fragments appear repeatedly in several strains.
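
The likelihood-based iterative clustering mentioned in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: it assumes reads are already aligned to a common length with no insertions or deletions, and it swaps in a simple column-wise base-frequency profile where the paper uses a POA graph analyzed with an HMM. The function names (`build_profile`, `log_likelihood`, `cluster_reads`), the `error_rate` mixing parameter, and the farthest-first seeding are all hypothetical choices made for illustration.

```python
import math
from collections import Counter

BASES = "ACGT"

def build_profile(members, length, error_rate):
    """Per-column base probabilities for one cluster.

    Assumption: a column-wise profile stands in for the POA graph + HMM.
    Observed frequencies are mixed with a uniform error term so that
    every base keeps a non-zero probability.
    """
    n = len(members)
    profile = []
    for i in range(length):
        counts = Counter(read[i] for read in members)
        column = {}
        for base in BASES:
            freq = counts[base] / n if n else 0.25  # empty cluster: uniform
            column[base] = (1 - error_rate) * freq + error_rate / 4
        profile.append(column)
    return profile

def log_likelihood(read, profile):
    """Log-probability of a read under a cluster profile."""
    return sum(math.log(column[base]) for base, column in zip(read, profile))

def cluster_reads(reads, k, error_rate=0.15, max_iter=50):
    """Hard-assignment iterative clustering of equal-length reads."""
    length = len(reads[0])

    # Hypothetical seeding: start from read 0, then repeatedly pick the
    # read that fits the already chosen seeds worst (farthest-first).
    seeds = [0]
    while len(seeds) < k:
        seed_profiles = [build_profile([reads[s]], length, error_rate) for s in seeds]
        def best_fit(read):
            return max(log_likelihood(read, p) for p in seed_profiles)
        seeds.append(min(range(len(reads)), key=lambda i: best_fit(reads[i])))

    profiles = [build_profile([reads[s]], length, error_rate) for s in seeds]
    labels = None
    for _ in range(max_iter):
        # Assign each read to the cluster under which it is most likely.
        new_labels = [
            max(range(k), key=lambda c: log_likelihood(read, profiles[c]))
            for read in reads
        ]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        # Re-estimate each cluster's profile from its current members.
        profiles = [
            build_profile([r for r, l in zip(reads, labels) if l == c], length, error_rate)
            for c in range(k)
        ]
    return labels
```

Calling `cluster_reads(reads, k=2)` on a list of equal-length strings over `ACGT` returns one cluster index per read. Because this stand-in ignores indels and uses hard assignments, it should not be expected to reach the sensitivity the abstract reports for the POA and HMM approach.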

Suggested Citation

  • Bansho Masutani & Shin-ichi Arimura & Shinichi Morishita, 2021. "Investigating the mitochondrial genomic landscape of Arabidopsis thaliana by long-read sequencing," PLOS Computational Biology, Public Library of Science, vol. 17(1), pages 1-16, January.
  • Handle: RePEc:plo:pcbi00:1008597
    DOI: 10.1371/journal.pcbi.1008597

    Download full text from publisher

    File URL: https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1008597
    Download Restriction: no

    File URL: https://journals.plos.org/ploscompbiol/article/file?id=10.1371/journal.pcbi.1008597&type=printable
    Download Restriction: no

