Automating survey coding for occupation

Author

Listed:
  • Schierholz, Malte

    (Institute for Employment Research (IAB), Nuremberg, Germany ; Universität Mannheim, MZES)

Abstract

"Currently, most surveys ask for occupation with open-ended questions. The verbatim responses are coded afterwards into a classification with hundreds of categories and thousands of jobs, which is an error-prone, time-consuming, and costly task. Research related to the coding of occupations is summarized with an international literature review. Special attention is paid to our main topic, the automation of coding. A prominent approach for automated coding is to consult a dictionary on the correct code. In contrast, we focus on data-based methods where codes for new answers are predicted from those answers that are already coded. Four different coding methods are tested on two data sets: (1) Rule-based Coding that consults a dictionary, (2) data-based Naive Bayes that allows coding for text answers with multiple words, (3) data-based Bayesian Categorical is used to improve performance when relatively few answers were coded before, and (4) Combined Methods (Boosting) combining predictions from the first three methods. The proposed Bayesian Categorical model is able to code 38% of all answers at 3% error rate without human interaction. In all remaining cases or for higher quality human intellect is needed to decide on the correct code and computer software can only assist by suggesting possible job codes. With the prototype software we developed for this task, we expect that for 74% of all answers the correct category is provided within the top five code suggestions. The training data used for prediction consists of only 32882 coded answers which is small compared to other systems with similar purpose. The proportions given above are expected to improve with additional training data." (Author's abstract, IAB-Doku) ((en))
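
The data-based idea in the abstract (predict a code for a new answer from answers that are already coded, code automatically only when the prediction is confident enough, and otherwise show a ranked list of suggestions) can be illustrated with a minimal sketch. This is not the report's implementation: it is a plain multinomial Naive Bayes with add-one smoothing, and the function names, the toy answers, the codes "C1" to "C3", and the threshold values are all hypothetical.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy training data: (verbatim answer, code) pairs.
# The codes "C1".."C3" are placeholders, not real classification codes.
TRAINING = [
    ("nurse", "C1"), ("registered nurse", "C1"), ("hospital nurse", "C1"),
    ("truck driver", "C2"), ("bus driver", "C2"),
    ("school teacher", "C3"), ("primary school teacher", "C3"),
]

def train(coded_answers):
    """Count how often each code and each (code, word) pair occurs."""
    code_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, code in coded_answers:
        code_counts[code] += 1
        for word in text.lower().split():
            word_counts[code][word] += 1
            vocab.add(word)
    return code_counts, word_counts, vocab

def suggest(model, text, k=5):
    """Return up to k (code, posterior probability) pairs, best first."""
    code_counts, word_counts, vocab = model
    total = sum(code_counts.values())
    # Words never seen in training carry no information and are skipped.
    words = [w for w in text.lower().split() if w in vocab]
    log_scores = {}
    for code, n in code_counts.items():
        denom = sum(word_counts[code].values()) + len(vocab)
        score = math.log(n / total)  # prior of the code
        for w in words:
            # Add-one smoothing avoids zero probabilities.
            score += math.log((word_counts[code][w] + 1) / denom)
        log_scores[code] = score
    # Normalise the log scores into posterior probabilities.
    top = max(log_scores.values())
    weights = {c: math.exp(s - top) for c, s in log_scores.items()}
    z = sum(weights.values())
    ranked = sorted(((c, w / z) for c, w in weights.items()),
                    key=lambda pair: -pair[1])
    return ranked[:k]

def code_automatically(model, text, threshold=0.9):
    """Return a code only if the best suggestion is confident enough;
    None means the answer is left to a human coder."""
    best_code, prob = suggest(model, text, k=1)[0]
    return best_code if prob >= threshold else None

model = train(TRAINING)
suggestions = suggest(model, "nurse")
```

Raising the threshold in `code_automatically` trades coverage for accuracy: fewer answers are coded without human interaction, but at a lower expected error rate. Answers that fall below the threshold can still be passed to a human coder together with the ranked list from `suggest`.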

Suggested Citation

  • Schierholz, Malte, 2014. "Automating survey coding for occupation," FDZ-Methodenreport 201410 (en), Institut für Arbeitsmarkt- und Berufsforschung (IAB), Nürnberg [Institute for Employment Research, Nuremberg, Germany].
  • Handle: RePEc:iab:iabfme:201410(en)
    Download full text from publisher

    File URL: https://doku.iab.de/fdz/reporte/2014/MR_10-14_EN.pdf
    Download Restriction: no