Printed from https://ideas.repec.org/a/spr/scient/v101y2014i2d10.1007_s11192-014-1292-9.html

Automatic classification of academic web page types

Authors

  • Patrick Kenekayoro (University of Wolverhampton)
  • Kevan Buckley (University of Wolverhampton)
  • Mike Thelwall (University of Wolverhampton)

Abstract

Counts of hyperlinks between websites can be unreliable for webometrics studies, so researchers have sought alternative counting methods or tried to identify why links in websites are created. Manual classification of individual links in websites is infeasible for large webometrics studies, so a more efficient approach to identifying the reasons for link creation is needed to fully harness the potential of hyperlinks for webometrics research. This paper describes a machine learning method that automatically classifies hyperlink source and target page types in university websites. The method achieved 78 % accuracy in classifying web page types and up to 74 % accuracy in predicting link target page types from link source page characteristics.

Suggested Citation

  • Patrick Kenekayoro & Kevan Buckley & Mike Thelwall, 2014. "Automatic classification of academic web page types," Scientometrics, Springer;Akadémiai Kiadó, vol. 101(2), pages 1015-1026, November.
  • Handle: RePEc:spr:scient:v:101:y:2014:i:2:d:10.1007_s11192-014-1292-9
    DOI: 10.1007/s11192-014-1292-9

    Download full text from publisher

    File URL: http://link.springer.com/10.1007/s11192-014-1292-9
    File Function: Abstract
    Download Restriction: Access to the full text of the articles in this series is restricted.

    References listed on IDEAS

    1. Liwen Vaughan & Guozhu Wu, 2004. "Links to commercial websites as a source of business information," Scientometrics, Springer;Akadémiai Kiadó, vol. 60(3), pages 487-496, August.
    2. Mike Thelwall, 2001. "Extracting macroscopic information from Web links," Journal of the American Society for Information Science and Technology, Association for Information Science & Technology, vol. 52(13), pages 1157-1168.
    3. Mike Thelwall, 2006. "Interpreting social science link analysis research: A theoretical framework," Journal of the American Society for Information Science and Technology, Association for Information Science & Technology, vol. 57(1), pages 60-68, January.
    4. Judit Bar-Ilan, 2004. "A microscopic link analysis of academic institutions within a country — the case of Israel," Scientometrics, Springer;Akadémiai Kiadó, vol. 59(3), pages 391-403, March.

    Citations

    Cited by:

    1. Patrick Kenekayoro & Kevan Buckley & Mike Thelwall, 2015. "Clustering research group website homepages," Scientometrics, Springer;Akadémiai Kiadó, vol. 102(3), pages 2023-2039, March.
    2. Patrick Kenekayoro, 2018. "Identifying named entities in academic biographies with supervised learning," Scientometrics, Springer;Akadémiai Kiadó, vol. 116(2), pages 751-765, August.
    3. Florian Kreuchauff & Vladimir Korzinov, 2017. "A patent search strategy based on machine learning for the emerging field of service robotics," Scientometrics, Springer;Akadémiai Kiadó, vol. 111(2), pages 743-772, May.

    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Bar-Ilan, Judit, 2008. "Informetrics at the beginning of the 21st century—A review," Journal of Informetrics, Elsevier, vol. 2(1), pages 1-52.
    2. José-Antonio Ontalba-Ruipérez & Enrique Orduna-Malea & Adolfo Alonso-Arroyo, 2016. "Identifying institutional relationships in a geographically distributed public health system using interlinking and co-authorship methods," Scientometrics, Springer;Akadémiai Kiadó, vol. 106(3), pages 1167-1191, March.
    3. George Masterton & Erik J. Olsson & Staffan Angere, 2016. "Linking as voting: how the Condorcet jury theorem in political science is relevant to webometrics," Scientometrics, Springer;Akadémiai Kiadó, vol. 106(3), pages 945-966, March.
    4. Liwen Vaughan & Margaret E. I. Kipp & Yijun Gao, 2007. "Why are Websites co-linked? The case of Canadian universities," Scientometrics, Springer;Akadémiai Kiadó, vol. 72(1), pages 81-92, July.
    5. David Gunnarsson Lorentzen, 2014. "Webometrics benefitting from web mining? An investigation of methods and applications of two research fields," Scientometrics, Springer;Akadémiai Kiadó, vol. 99(2), pages 409-445, May.
    6. Kim Holmberg & Mike Thelwall, 2009. "Local government web sites in Finland: A geographic and webometric analysis," Scientometrics, Springer;Akadémiai Kiadó, vol. 79(1), pages 157-169, April.
    7. Tranos, Emmanouil & Incera, Andre Carrascal & Willis, George, 2022. "Using the web to predict regional trade flows: data extraction, modelling, and validation," OSF Preprints 9bu5z, Center for Open Science.
    8. Mike Thelwall, 2017. "Judit Bar-Ilan: information scientist, computer scientist, scientometrician," Scientometrics, Springer;Akadémiai Kiadó, vol. 113(3), pages 1235-1244, December.
    9. Judit Bar-Ilan, 2004. "A microscopic link analysis of academic institutions within a country — the case of Israel," Scientometrics, Springer;Akadémiai Kiadó, vol. 59(3), pages 391-403, March.
    10. Judit Bar-Ilan & Mark Levene, 2015. "The hw-rank: an h-index variant for ranking web pages," Scientometrics, Springer;Akadémiai Kiadó, vol. 102(3), pages 2247-2253, March.
    11. Borisov, Petar & Petrov, Kamen & Tsonkov, Nikolay, 2024. "Integration perspectives for improving regional policy in rural areas of Bulgaria," Agricultural and Resource Economics: International Scientific E-Journal, Agricultural and Resource Economics: International Scientific E-Journal, vol. 10(1), March.
    12. Mike Thelwall, 2012. "Journal impact evaluation: a webometric perspective," Scientometrics, Springer;Akadémiai Kiadó, vol. 92(2), pages 429-441, August.
    13. Abbasiharofteh, Milad & Kinne, Jan & Krüger, Miriam, 2021. "The strength of weak and strong ties in bridging geographic and cognitive distances," ZEW Discussion Papers 21-049, ZEW - Leibniz Centre for European Economic Research.
    14. Han Park & Mike Thelwall, 2008. "Link analysis: Hyperlink patterns and social structure on politicians’ Web sites in South Korea," Quality & Quantity: International Journal of Methodology, Springer, vol. 42(5), pages 687-697, October.
    15. Muhammad Omar & Arif Mehmood & Gyu Sang Choi & Han Woo Park, 2017. "Global mapping of artificial intelligence in Google and Google Scholar," Scientometrics, Springer;Akadémiai Kiadó, vol. 113(3), pages 1269-1305, December.
    16. Benedetto Lepori & Isidro F. Aguillo & Marco Seeber, 2014. "Size of web domains and interlinking behavior of higher education institutions in Europe," Scientometrics, Springer;Akadémiai Kiadó, vol. 100(2), pages 497-518, August.
    17. Frank Bakker & Iina Hellsten, 2013. "Capturing Online Presence: Hyperlinks and Semantic Networks in Activist Group Websites on Corporate Social Responsibility," Journal of Business Ethics, Springer, vol. 118(4), pages 807-823, December.
    18. Young Mee Chung & So Young Yu & Yong Kwang Kim & Su Yeon Kim, 2009. "Characteristics and link structure of a national scholarly Web space: The case of South Korea," Scientometrics, Springer;Akadémiai Kiadó, vol. 80(3), pages 595-612, September.
    19. Font-Julián, Cristina I & Ontalba-Ruipérez, José-Antonio & Orduña-Malea, Enrique & Thelwall, Mike, 2022. "Which types of online resource support US patent claims?," Journal of Informetrics, Elsevier, vol. 16(1).
    20. Liwen Vaughan & Guozhu Wu, 2004. "Links to commercial websites as a source of business information," Scientometrics, Springer;Akadémiai Kiadó, vol. 60(3), pages 487-496, August.
