Printed from https://ideas.repec.org/a/plo/pone00/0288140.html

Mining algorithm of accumulation sequence of unbalanced data based on probability matrix decomposition

Author

Listed:
  • Shaoxia Mou
  • Heming Zhang

Abstract

Because of the inherent characteristics of the accumulation sequence of unbalanced data, mining results for this kind of data are often affected by the large number of categories, which degrades mining performance. To address this problem and optimize the performance of cumulative sequence mining, this paper studies an algorithm for mining the cumulative sequence of unbalanced data based on probability matrix decomposition. The natural nearest neighbors of the minority samples in the unbalanced data cumulative sequence are determined, and the minority samples are clustered according to the natural nearest neighbor relationship. Within each cluster, new samples are generated from the core points of dense regions and the non-core points of sparse regions, and these new samples are added to the original data accumulation sequence to balance it. On the balanced cumulative sequence, the probability matrix decomposition method generates two random matrices with Gaussian-distributed entries, and a linear combination of low-dimensional eigenvectors is used to explain the preference of specific users for the data sequence. At the same time, from a global perspective, the AdaBoost idea is used to adaptively adjust the sample weights and optimize the probability matrix decomposition algorithm. Experimental results show that the algorithm effectively generates new samples, reduces the imbalance of the data accumulation sequence, and obtains more accurate mining results. It optimizes the global error while also handling single-sample errors more efficiently. The minimum RMSE is obtained when the decomposition dimension is 5. The proposed algorithm has good classification performance on the balanced cumulative sequence, and it achieves the best average ranking on the F-value, G-mean, and AUC metrics.
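
The factorization step described in the abstract can be illustrated with a minimal sketch of standard probabilistic matrix factorization trained by stochastic gradient descent. It is not the authors' implementation: the function names `pmf_factorize` and `rmse`, the learning rate, the regularization weight, the epoch count, and the toy data are all assumptions made for illustration, and the oversampling and AdaBoost-style reweighting steps are omitted. Only the latent dimension of 5 is taken from the abstract.

```python
import math
import random


def pmf_factorize(ratings, n_rows, n_cols, dim=5, lr=0.01, reg=0.05,
                  epochs=500, seed=0):
    """Fit R ~ U V^T on observed (row, col, value) triples by SGD.

    The L2 penalty `reg` plays the role of zero-mean Gaussian priors
    on the entries of U and V (hypothetical hyperparameters).
    """
    rng = random.Random(seed)
    # Two random matrices whose entries are drawn from a Gaussian.
    U = [[rng.gauss(0.0, 0.1) for _ in range(dim)] for _ in range(n_rows)]
    V = [[rng.gauss(0.0, 0.1) for _ in range(dim)] for _ in range(n_cols)]
    ratings = list(ratings)  # copy so the caller's list is not shuffled
    for _ in range(epochs):
        rng.shuffle(ratings)
        for i, j, r in ratings:
            # Prediction is a linear combination of low-dimensional factors.
            pred = sum(u * v for u, v in zip(U[i], V[j]))
            err = r - pred
            for k in range(dim):
                u_ik, v_jk = U[i][k], V[j][k]
                U[i][k] += lr * (err * v_jk - reg * u_ik)
                V[j][k] += lr * (err * u_ik - reg * v_jk)
    return U, V


def rmse(ratings, U, V):
    """Root mean squared error over the observed triples."""
    se = 0.0
    for i, j, r in ratings:
        pred = sum(u * v for u, v in zip(U[i], V[j]))
        se += (r - pred) ** 2
    return math.sqrt(se / len(ratings))


if __name__ == "__main__":
    # Toy observations (row, col, value); purely illustrative data.
    toy = [(0, 0, 5.0), (0, 1, 3.0), (0, 3, 1.0),
           (1, 0, 4.0), (1, 2, 2.0), (1, 3, 1.0),
           (2, 0, 1.0), (2, 1, 1.0), (2, 3, 5.0),
           (3, 1, 2.0), (3, 2, 5.0), (3, 3, 4.0)]
    U, V = pmf_factorize(toy, n_rows=4, n_cols=4)
    print("training RMSE:", rmse(toy, U, V))
```

Comparing `rmse` across several values of `dim` on held-out triples is one way to reproduce the kind of dimension sweep the abstract alludes to, under the same assumptions.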

Suggested Citation

  • Shaoxia Mou & Heming Zhang, 2023. "Mining algorithm of accumulation sequence of unbalanced data based on probability matrix decomposition," PLOS ONE, Public Library of Science, vol. 18(7), pages 1-17, July.
  • Handle: RePEc:plo:pone00:0288140
    DOI: 10.1371/journal.pone.0288140

    Download full text from publisher

    File URL: https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0288140
    Download Restriction: no

    File URL: https://journals.plos.org/plosone/article/file?id=10.1371/journal.pone.0288140&type=printable
    Download Restriction: no



    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. René Ulloa-Espíndola & Jenny Cuyo-Cuyo & Elisa Lalama-Noboa, 2023. "Towards Rural Resilience: Assessing Future Spatial Urban Expansion and Population Growth in Quito as a Measure of Resilience," Land, MDPI, vol. 12(2), pages 1-30, January.
    2. Carlos Manjarrez-Domínguez & Mario Iván Uc-Campos & Mario Edgar Esparza-Vela & María del Rosario Baray-Guerrero & Omar Giner-Chávez & Eduardo Santellano-Estrada, 2023. "Geospatial-Temporal Dynamics of Land Use in the Juárez Valley: Urbanization and Displacement of Agriculture," Sustainability, MDPI, vol. 15(11), pages 1-20, May.
    3. Jessica Strzempko & Robert Gilmore Pontius, 2023. "The Flow Matrix Offers a Straightforward Alternative to the Problematic Markov Matrix," Land, MDPI, vol. 12(7), pages 1-18, July.
    4. J. Ronald Eastman & Jiena He, 2020. "A Regression-Based Procedure for Markov Transition Probability Estimation in Land Change Modeling," Land, MDPI, vol. 9(11), pages 1-12, October.
