
SiSOB Data Extraction and Codification: A Tool to Analyse Scientific Careers

Authors
  • Aldo Geuna

    (Department of Economics and Statistics Cognetti De Martiis, University of Turin, Italy and BRICK, Collegio Carlo Alberto, Moncalieri (Turin), Italy)

  • Rodrigo Kataishi

    (Department of Economics and Statistics Cognetti De Martiis, University of Turin, Italy and BRICK, Collegio Carlo Alberto, Moncalieri (Turin), Italy)

  • Manuel Toselli

    (Department of Economics and Statistics Cognetti De Martiis, University of Turin, Italy and BRICK, Collegio Carlo Alberto, Moncalieri (Turin), Italy)

  • Eduardo Guzmán

    (Department of Languages and Computer Science, University of Malaga, Spain)

  • Cornelia Lawson

    (Department of Economics and Statistics Cognetti De Martiis, University of Turin, Italy and BRICK, Collegio Carlo Alberto, Moncalieri (Turin), Italy and School of Sociology and Social Policy, University of Nottingham)

  • Ana Fernandez-Zubieta

    (Institute for Advanced Social Studies - Spanish Council for Scientific Research)

  • Beatriz Barros

    (Department of Languages and Computer Science, University of Malaga, Spain)

Abstract

This paper describes the methodology and software tool used to build a database on the careers and productivity of academics from public information available on the Internet, and provides a first analysis of the data collected for a sample of 360 US scientists funded by the National Institutes of Health (NIH) and 291 UK scientists funded by the Biotechnology and Biological Sciences Research Council (BBSRC). The tool’s structured outputs can be used either for econometric research or for data representation in policy analysis. The methodology and software tool are validated on a sample of US and UK biomedical scientists, but can be applied to any country where scientists’ CVs are available in English. We provide an overview of the motivations for constructing the database and of the data crawling and data mining techniques used to transform webpage-based and CV information into a relational database. We describe the database and the effectiveness of our algorithms, and suggest further improvements. The software is released under the GNU General Public License, a free software licence, with the aim of making it available to the community of social scientists and economists interested in analysing scientific production and scientific careers, who, it is hoped, will develop the tool further.

Suggested Citation

  • Aldo Geuna & Rodrigo Kataishi & Manuel Toselli & Eduardo Guzmán & Cornelia Lawson & Ana Fernandez-Zubieta & Beatriz Barros, 2015. "SiSOB Data Extraction and Codification: A Tool to Analyse Scientific Careers," SPRU Working Paper Series 2015-03, SPRU - Science and Technology Policy Research, University of Sussex.
  • Handle: RePEc:sru:ssewps:2015-03

    Download full text from publisher

    File URL: https://www.sussex.ac.uk/webteam/gateway/file.php?name=2015-03-swps-geuna-etal.pdf&site=25
    Download Restriction: no


    References listed on IDEAS

    1. Carolina Cañibano & Barry Bozeman, 2009. "Curriculum vitae method in science policy and research evaluation: the state-of-the-art," Research Evaluation, Oxford University Press, vol. 18(2), pages 86-94, June.
    2. Fernández-Zubieta, Ana & Geuna, Aldo & Lawson, Cornelia, 2013. "Researchers’ mobility and its impact on scientific productivity," Department of Economics and Statistics Cognetti de Martiis. Working Papers 201313, University of Turin.
    3. Loet Leydesdorff & Henry Etzkowitz, 1996. "Emergence of a Triple Helix of university—industry—government relations," Science and Public Policy, Oxford University Press, vol. 23(5), pages 279-286, October.
    4. Azoulay, Pierre & Stellman, Andrew & Zivin, Joshua Graff, 2006. "PublicationHarvester: An open-source software tool for science policy research," Research Policy, Elsevier, vol. 35(7), pages 970-974, September.
    5. Gaughan, Monica & Robin, Stephane, 2004. "National science training policy and early scientific careers in France and the United States," Research Policy, Elsevier, vol. 33(4), pages 569-581, May.
    6. Mangematin, V., 2000. "PhD job market: professional trajectories and incentives during the PhD," Research Policy, Elsevier, vol. 29(6), pages 741-756, June.
    7. Carolina Cañibano & Javier Otamendi & Inés Andújar, 2008. "Measuring and assessing researcher mobility from CV analysis: the case of the Ramón y Cajal programme in Spain," Research Evaluation, Oxford University Press, vol. 17(1), pages 17-31, March.
    8. Cristiano Antonelli & Chiara Franzoni & Aldo Geuna, 2011. "The organization, economics, and policy of scientific research: what we do know and what we don't know--an agenda for research," Industrial and Corporate Change, Oxford University Press, vol. 20(1), pages 201-213, February.
    9. Fernandez-Zubieta, Ana & Geuna, Aldo & Lawson, Cornelia, 2015. "What do We Know of the Mobility of Research Scientists and of its Impact on Scientific Production," Department of Economics and Statistics Cognetti de Martiis. Working Papers 201522, University of Turin.
    10. Dietz, James S. & Bozeman, Barry, 2005. "Academic careers, patents, and productivity: industry experience as scientific and technical human capital," Research Policy, Elsevier, vol. 34(3), pages 349-367, April.
    11. Chiara Franzoni & Giuseppe Scellato & Paula Stephan, 2012. "Foreign Born Scientists: Mobility Patterns for Sixteen Countries," NBER Working Papers 18067, National Bureau of Economic Research, Inc.
    12. Monica Gaughan & Branco Ponomariov, 2008. "Faculty publication productivity, collaboration, and grants velocity: using curricula vitae to compare center-affiliated and unaffiliated scientists," Research Evaluation, Oxford University Press, vol. 17(2), pages 103-110, June.
    13. Ana F Zubieta, 2009. "Recognition and weak ties: is there a positive effect of postdoctoral position on academic performance and career development?," Research Evaluation, Oxford University Press, vol. 18(2), pages 105-115, June.
    14. Monica Gaughan & Barry Bozeman, 2002. "Using curriculum vitae to compare some impacts of NSF research grants with research center funding," Research Evaluation, Oxford University Press, vol. 11(1), pages 17-26, April.
    15. Laudeline Auriol & Max Misu & Rebecca Ann Freeman, 2013. "Careers of Doctorate Holders: Analysis of Labour Market and Mobility Indicators," OECD Science, Technology and Industry Working Papers 2013/4, OECD Publishing.
    16. Jeremy Howells & Ronnie Ramlogan & Shu-Li Cheng, 2012. "Innovation and university collaboration: paradox and complexity within the knowledge economy," Cambridge Journal of Economics, Oxford University Press, vol. 36(3), pages 703-721.

    Citations

    Citations are extracted by the CitEc Project.


    Cited by:

    1. Lawson, Cornelia & Geuna, Aldo & Fernández-Zubieta, Ana & Toselli, Manuel & Kataishi, Rodrigo, 2015. "International Careers of Researchers in Biomedical Sciences: A Comparison of the US and the UK," Department of Economics and Statistics Cognetti de Martiis. Working Papers 201514, University of Turin.
    2. Stéphane Lhuillery & Julio Raffo & Intan Hamdan-Livramento, 2016. "Measuring creativity: Learning from innovation measurement," WIPO Economic Research Working Papers 31, World Intellectual Property Organization - Economics and Statistics Division.
    3. Loet Leydesdorff & Gaston Heimeriks & Daniele Rotolo, 2016. "Journal portfolio analysis for countries, cities, and organizations: Maps and comparisons," Journal of the Association for Information Science & Technology, Association for Information Science & Technology, vol. 67(3), pages 741-748, March.
    4. Stapleton, Lee & Sorrell, Steve & Schwanen, Tim, 2016. "Estimating direct rebound effects for personal automotive travel in Great Britain," Energy Economics, Elsevier, vol. 54(C), pages 313-325.
    5. repec:kap:jtecht:v:42:y:2017:i:4:d:10.1007_s10961-017-9556-1 is not listed on IDEAS

    More about this item

    Keywords

    Information retrieval; Extraction and data integration; Academic careers; Research productivity; Mobility of Research Scientists

    JEL classification:

    • C81 - Mathematical and Quantitative Methods - - Data Collection and Data Estimation Methodology; Computer Programs - - - Methodology for Collecting, Estimating, and Organizing Microeconomic Data; Data Access
    • C88 - Mathematical and Quantitative Methods - - Data Collection and Data Estimation Methodology; Computer Programs - - - Other Computer Software
    • I23 - Health, Education, and Welfare - - Education - - - Higher Education; Research Institutions
    • O31 - Economic Development, Innovation, Technological Change, and Growth - - Innovation; Research and Development; Technological Change; Intellectual Property Rights - - - Innovation and Invention: Processes and Incentives

    Corrections

    All material on this site has been provided by the respective publishers and authors. You can help correct errors and omissions. When requesting a correction, please mention this item's handle: RePEc:sru:ssewps:2015-03. See general information about how to correct material in RePEc.

    For technical questions regarding this item, or to correct its authors, title, abstract, bibliographic or download information, contact Russell Eke or Rebekah McClure. General contact details of provider: http://edirc.repec.org/data/spessuk.html

    IDEAS is a RePEc service hosted by the Research Division of the Federal Reserve Bank of St. Louis. RePEc uses bibliographic data supplied by the respective publishers.