
Automatic classification of academic web page types

Authors

  • Patrick Kenekayoro (University of Wolverhampton)
  • Kevan Buckley (University of Wolverhampton)
  • Mike Thelwall (University of Wolverhampton)

Abstract

Counts of hyperlinks between websites can be unreliable for webometrics studies, so researchers have sought alternative counting methods or have tried to identify why links are created. Manual classification of individual links is infeasible for large webometrics studies, so a more efficient way to identify the reasons for link creation is needed to fully harness the potential of hyperlinks for webometrics research. This paper describes a machine learning method that automatically classifies hyperlink source and target page types in university websites. The method achieved 78% accuracy in classifying web page types and up to 74% accuracy in predicting link target page types from link source page characteristics.

Suggested Citation

  • Patrick Kenekayoro & Kevan Buckley & Mike Thelwall, 2014. "Automatic classification of academic web page types," Scientometrics, Springer;Akadémiai Kiadó, vol. 101(2), pages 1015-1026, November.
  • Handle: RePEc:spr:scient:v:101:y:2014:i:2:d:10.1007_s11192-014-1292-9
    DOI: 10.1007/s11192-014-1292-9

    Download full text from publisher

    File URL: http://link.springer.com/10.1007/s11192-014-1292-9
    File Function: Abstract
    Download Restriction: Access to the full text of the articles in this series is restricted.


    References listed on IDEAS

    1. Liwen Vaughan & Guozhu Wu, 2004. "Links to commercial websites as a source of business information," Scientometrics, Springer;Akadémiai Kiadó, vol. 60(3), pages 487-496, August.
    2. Mike Thelwall, 2001. "Extracting macroscopic information from Web links," Journal of the American Society for Information Science and Technology, Association for Information Science & Technology, vol. 52(13), pages 1157-1168.
    3. Mike Thelwall, 2006. "Interpreting social science link analysis research: A theoretical framework," Journal of the American Society for Information Science and Technology, Association for Information Science & Technology, vol. 57(1), pages 60-68, January.
    4. Judit Bar-Ilan, 2004. "A microscopic link analysis of academic institutions within a country — the case of Israel," Scientometrics, Springer;Akadémiai Kiadó, vol. 59(3), pages 391-403, March.

    Citations

    Cited by:

    1. Patrick Kenekayoro & Kevan Buckley & Mike Thelwall, 2015. "Clustering research group website homepages," Scientometrics, Springer;Akadémiai Kiadó, vol. 102(3), pages 2023-2039, March.
    2. Patrick Kenekayoro, 2018. "Identifying named entities in academic biographies with supervised learning," Scientometrics, Springer;Akadémiai Kiadó, vol. 116(2), pages 751-765, August.
    3. Florian Kreuchauff & Vladimir Korzinov, 2017. "A patent search strategy based on machine learning for the emerging field of service robotics," Scientometrics, Springer;Akadémiai Kiadó, vol. 111(2), pages 743-772, May.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Bar-Ilan, Judit, 2008. "Informetrics at the beginning of the 21st century—A review," Journal of Informetrics, Elsevier, vol. 2(1), pages 1-52.
    2. José-Antonio Ontalba-Ruipérez & Enrique Orduna-Malea & Adolfo Alonso-Arroyo, 2016. "Identifying institutional relationships in a geographically distributed public health system using interlinking and co-authorship methods," Scientometrics, Springer;Akadémiai Kiadó, vol. 106(3), pages 1167-1191, March.
    3. George Masterton & Erik J. Olsson & Staffan Angere, 2016. "Linking as voting: how the Condorcet jury theorem in political science is relevant to webometrics," Scientometrics, Springer;Akadémiai Kiadó, vol. 106(3), pages 945-966, March.
    4. Liwen Vaughan & Margaret E. I. Kipp & Yijun Gao, 2007. "Why are Websites co-linked? The case of Canadian universities," Scientometrics, Springer;Akadémiai Kiadó, vol. 72(1), pages 81-92, July.
    5. David Gunnarsson Lorentzen, 2014. "Webometrics benefitting from web mining? An investigation of methods and applications of two research fields," Scientometrics, Springer;Akadémiai Kiadó, vol. 99(2), pages 409-445, May.
    6. Kim Holmberg & Mike Thelwall, 2009. "Local government web sites in Finland: A geographic and webometric analysis," Scientometrics, Springer;Akadémiai Kiadó, vol. 79(1), pages 157-169, April.
    7. Giada Baldessarelli & Nathalie Lazaric & Michele Pezzoni, 2022. "Organizational routines: Evolution in the research landscape of two core communities," Post-Print halshs-03718851, HAL.
    8. Tranos, Emmanouil & Incera, Andre Carrascal & Willis, George, 2022. "Using the web to predict regional trade flows: data extraction, modelling, and validation," OSF Preprints 9bu5z, Center for Open Science.
    9. Mike Thelwall, 2017. "Judit Bar-Ilan: information scientist, computer scientist, scientometrician," Scientometrics, Springer;Akadémiai Kiadó, vol. 113(3), pages 1235-1244, December.
    10. Judit Bar-Ilan, 2004. "A microscopic link analysis of academic institutions within a country — the case of Israel," Scientometrics, Springer;Akadémiai Kiadó, vol. 59(3), pages 391-403, March.
    11. Enrique Orduña-Malea, 2021. "Dot-science top level domain: Academic websites or dumpsites?," Scientometrics, Springer;Akadémiai Kiadó, vol. 126(4), pages 3565-3591, April.
    12. Judit Bar-Ilan & Mark Levene, 2015. "The hw-rank: an h-index variant for ranking web pages," Scientometrics, Springer;Akadémiai Kiadó, vol. 102(3), pages 2247-2253, March.
    13. Enrique Orduña-Malea & Rodrigo Costas, 2021. "Link-based approach to study scientific software usage: the case of VOSviewer," Scientometrics, Springer;Akadémiai Kiadó, vol. 126(9), pages 8153-8186, September.
    14. Gali Halevi, 2020. "The scientific legacy of Judit Bar-Ilan," Scientometrics, Springer;Akadémiai Kiadó, vol. 123(3), pages 1201-1209, June.
    15. Enrique Orduña-Malea & José-Antonio Ontalba-Ruipérez, 2013. "Selective linking from social platforms to university websites: a case study of the Spanish academic system," Scientometrics, Springer;Akadémiai Kiadó, vol. 95(2), pages 593-614, May.
    16. Mike Thelwall, 2012. "Journal impact evaluation: a webometric perspective," Scientometrics, Springer;Akadémiai Kiadó, vol. 92(2), pages 429-441, August.
    17. Judit Bar-Ilan & Rina Azoulay, 2012. "Map of nonprofit organization websites in Israel," Journal of the Association for Information Science & Technology, Association for Information Science & Technology, vol. 63(6), pages 1142-1167, June.
    18. Abbasiharofteh, Milad & Kinne, Jan & Krüger, Miriam, 2021. "The strength of weak and strong ties in bridging geographic and cognitive distances," ZEW Discussion Papers 21-049, ZEW - Leibniz Centre for European Economic Research.
    19. Lola García-Santiago & Felix Moya-Anegón, 2009. "Using co-outlinks to mine heterogeneous networks," Scientometrics, Springer;Akadémiai Kiadó, vol. 79(3), pages 681-702, June.
    20. Han Park & Mike Thelwall, 2008. "Link analysis: Hyperlink patterns and social structure on politicians’ Web sites in South Korea," Quality & Quantity: International Journal of Methodology, Springer, vol. 42(5), pages 687-697, October.
