
A Method for Filtering Pages by Similarity Degree based on Dynamic Programming

Author

Listed:
  • Ziyun Deng

    (College of Economics and Trade, Changsha Commerce & Tourism College, Changsha 410116, China
    National Supercomputing Center in Changsha, Hunan University, Changsha 410116, China)

  • Tingqin He

    (National Supercomputing Center in Changsha, Hunan University, Changsha 410116, China)

Abstract

To obtain target webpages from a large collection of webpages, we propose a Method for Filtering Pages by Similarity Degree based on Dynamic Programming (MFPSDDP). The method relies on one of three "same relationships" between two nodes, each of which we define. The main innovation of MFPSDDP is that it does not need to know the structure of the webpages in advance. First, we present the design, which uses a queue and two threads. Then, we propose a dynamic programming algorithm for calculating the length of the longest common subsequence and a formula for calculating similarity. To obtain detailed-information webpages from 200,000 webpages downloaded from the website “www.jd.com”, we choose the Completely Same Relationship (CSR) and set the similarity threshold to 0.2. Among the four filtering methods compared, the Recall Ratio (RR) of MFPSDDP ranks in the middle. When the number of webpages filtered is nearly 200,000, the Precision Ratio (PR) of MFPSDDP is the highest of the four methods, reaching 85.1%, which is 13.3 percentage points higher than the PR of a Method for Filtering Pages by Containing Strings (MFPCS).
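
The two computational pieces named in the abstract, the longest-common-subsequence length and a similarity score compared against a threshold, can be illustrated with a minimal Python sketch. The abstract does not give the similarity formula or the definition of the Completely Same Relationship, so the sketch makes two labeled assumptions: nodes are represented as plain strings that count as "the same" when they are equal, and similarity is the LCS length divided by the length of the longer sequence. The function names are hypothetical; only the 0.2 threshold comes from the abstract.

```python
from typing import Sequence

# Threshold value reported in the abstract.
THRESHOLD = 0.2


def lcs_length(a: Sequence[str], b: Sequence[str]) -> int:
    """Length of the longest common subsequence of a and b.

    Standard dynamic programming, keeping only the previous row of the
    table, so memory is O(len(b)) and time is O(len(a) * len(b)).
    Assumption: two nodes match when their string forms are equal.
    """
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def similarity(a: Sequence[str], b: Sequence[str]) -> float:
    """Assumed normalization: LCS length over the longer sequence length."""
    longest = max(len(a), len(b))
    return lcs_length(a, b) / longest if longest else 0.0


def is_target(page: Sequence[str], reference: Sequence[str],
              threshold: float = THRESHOLD) -> bool:
    """Keep a page when its similarity to a reference page meets the threshold."""
    return similarity(page, reference) >= threshold


if __name__ == "__main__":
    reference = ["html", "body", "div", "span"]
    candidate = ["html", "body", "ul", "div"]
    print(similarity(candidate, reference), is_target(candidate, reference))
```

Under these assumptions, a candidate page is compared with a known target page by the sequence of its nodes, so no hand-written template of the page structure is required. A different normalization or node-matching rule would change the scores and the appropriate threshold.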

Suggested Citation

  • Ziyun Deng & Tingqin He, 2018. "A Method for Filtering Pages by Similarity Degree based on Dynamic Programming," Future Internet, MDPI, vol. 10(12), pages 1-12, December.
  • Handle: RePEc:gam:jftint:v:10:y:2018:i:12:p:124-:d:190446

    Download full text from publisher

    File URL: https://www.mdpi.com/1999-5903/10/12/124/pdf
    Download Restriction: no

    File URL: https://www.mdpi.com/1999-5903/10/12/124/
    Download Restriction: no


    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Yi Peng, 2015. "Regional earthquake vulnerability assessment using a combination of MCDM methods," Annals of Operations Research, Springer, vol. 234(1), pages 95-110, November.
    2. Roman Vavrek, 2019. "Evaluation of the Impact of Selected Weighting Methods on the Results of the TOPSIS Technique," International Journal of Information Technology & Decision Making (IJITDM), World Scientific Publishing Co. Pte. Ltd., vol. 18(06), pages 1821-1843, November.
    3. Isaac Machorro-Cano & Ingrid Aylin Ríos-Méndez & José Antonio Palet-Guzmán & Nidia Rodríguez-Mazahua & Lisbeth Rodríguez-Mazahua & Giner Alor-Hernández & José Oscar Olmedo-Aguirre, 2023. "Medical Opinions Analysis about the Decrease of Autopsies Using Emerging Pattern Mining," Data, MDPI, vol. 9(1), pages 1-14, December.
    4. Borja Ena & Alberto Gomez & Borja Ponte & Paolo Priore & Diego Diaz, 2022. "Homogeneous grouping of non-prime steel products for online auctions: a case study," Annals of Operations Research, Springer, vol. 315(1), pages 591-621, August.
    5. John A. Aloysius & Hartmut Hoehle & Soheil Goodarzi & Viswanath Venkatesh, 2018. "Big data initiatives in retail environments: Linking service process perceptions to shopping outcomes," Annals of Operations Research, Springer, vol. 270(1), pages 25-51, November.
    6. Wei-Chang Yeh & Yunzhi Jiang & Yee-Fen Chen & Zhe Chen, 2016. "A New Soft Computing Method for K-Harmonic Means Clustering," PLOS ONE, Public Library of Science, vol. 11(11), pages 1-14, November.
    7. R. Sujatha & T. M. Rajalaxmi, 2016. "Hierarchical Fuzzy Hidden Markov Chain for Web Applications," International Journal of Information Technology & Decision Making (IJITDM), World Scientific Publishing Co. Pte. Ltd., vol. 15(01), pages 83-118, January.
    8. Goran Matošević & Jasminka Dobša & Dunja Mladenić, 2021. "Using Machine Learning for Web Page Classification in Search Engine Optimization," Future Internet, MDPI, vol. 13(1), pages 1-20, January.
    9. Alessandro Massaro & Daniele Giannone & Vitangelo Birardi & Angelo Maurizio Galiano, 2021. "An Innovative Approach for the Evaluation of the Web Page Impact Combining User Experience and Neural Network Score," Future Internet, MDPI, vol. 13(6), pages 1-21, May.
    10. Dongwook Kim & Sungbum Kim, 2017. "The Role of Mobile Technology in Tourism: Patents, Articles, News, and Mobile Tour App Reviews," Sustainability, MDPI, vol. 9(11), pages 1-45, November.
    11. Xiaoli Wang & Yun Liu & Yanbing Ju, 2018. "Sustainable Public Procurement Policies on Promoting Scientific and Technological Innovation in China: Comparisons with the U.S., the UK, Japan, Germany, France, and South Korea," Sustainability, MDPI, vol. 10(7), pages 1-27, June.
    12. Idil Yavuz & Orrin Cooper, 2017. "A dynamic clustering method to improve the coherency of an ANP Supermatrix," Annals of Operations Research, Springer, vol. 254(1), pages 507-531, July.
    13. Sheng, Jie & Amankwah-Amoah, Joseph & Wang, Xiaojun, 2017. "A multidisciplinary perspective of big data in management research," International Journal of Production Economics, Elsevier, vol. 191(C), pages 97-112.
    14. Sheng, Jie & Amankwah-Amoah, Joseph & Wang, Xiaojun, 2019. "Technology in the 21st century: New challenges and opportunities," Technological Forecasting and Social Change, Elsevier, vol. 143(C), pages 321-335.
