Recent developments in the statistical processing of textual data

Authors

  • Ludovic Lebart
  • André Salem
  • Lisette Berry

Abstract

Statisticians are accustomed to processing numerical, ordinal or nominal data. In many circumstances, such as socio‐economic or epidemiological sample surveys and documentary databases, these data are juxtaposed with textual data (for example, responses to open questions in surveys). This article presents a series of language‐independent procedures based upon applying multivariate techniques (such as correspondence analysis and clustering) to sets of generalized lexical profiles. The generalized lexical profile of a text is a vector whose components are the frequencies of each word (graphical form) or ‘repeated segment’ (sequence of words appearing with a significant frequency in the text). The processing of such large (and often sparse) vectors and matrices requires special algorithms. The main outputs are the following: (1) printouts of the characteristic words and characteristic responses for each category of respondent (these categories are generally derived from available nominal variables); (2) graphical displays of the proximities between words or segments and categories of respondents; (3) when a combination of several texts is analysed, graphical displays of proximities between words or segments and each text, or between words or segments and groupings of texts. The systematic use of ‘repeated segments’ is a valuable aid in interpreting the results from a semantic point of view.
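
The notion of a generalized lexical profile can be illustrated with a minimal Python sketch. It is not the authors' implementation. The tokenizer, the function names (`tokenize`, `ngrams`, `lexical_table`), the `min_freq` threshold standing in for "significant frequency", the `max_len` cap on segment length, and the toy survey responses are all assumptions made for illustration. The sketch counts words and repeated segments per category of respondent and stores each profile sparsely as a `Counter`. A real analysis would then pass the resulting table to correspondence analysis or clustering.

```python
import re
from collections import Counter


def tokenize(text):
    """Split a text into lowercase graphical forms (a simple regex assumption)."""
    return re.findall(r"\w+", text.lower())


def ngrams(tokens, n):
    """Return all contiguous sequences of n tokens, as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def lexical_table(responses, min_freq=2, max_len=3):
    """Build one generalized lexical profile per category.

    responses: iterable of (category, text) pairs.
    min_freq:  hypothetical threshold; a segment is 'repeated' if it occurs
               at least this many times in the whole corpus.
    max_len:   longest segment considered, in words (an assumption).
    Returns a dict mapping each category to a Counter of word and
    repeated-segment frequencies (a sparse profile vector).
    """
    tokenized = [(cat, tokenize(text)) for cat, text in responses]

    # Count candidate segments over the whole corpus, never letting a
    # segment span two different responses.
    seg_counts = Counter()
    for _, toks in tokenized:
        for n in range(2, max_len + 1):
            seg_counts.update(ngrams(toks, n))
    repeated = {seg for seg, c in seg_counts.items() if c >= min_freq}

    # Accumulate words and retained segments into each category's profile.
    table = {}
    for cat, toks in tokenized:
        profile = table.setdefault(cat, Counter())
        profile.update(toks)
        for n in range(2, max_len + 1):
            profile.update(
                " ".join(seg) for seg in ngrams(toks, n) if seg in repeated
            )
    return table


# Toy open-ended survey responses, invented for this example.
responses = [
    ("under 30", "a good job and good health"),
    ("under 30", "a good job"),
    ("over 60", "good health and peace"),
]

for category, profile in lexical_table(responses).items():
    print(category, dict(profile))
```

Because each profile is a `Counter`, absent words or segments cost no storage and read back as zero, which suits the large, sparse vectors the abstract mentions.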

Suggested Citation

  • Ludovic Lebart & André Salem & Lisette Berry, 1991. "Recent developments in the statistical processing of textual data," Applied Stochastic Models and Data Analysis, John Wiley & Sons, vol. 7(1), pages 47-62, March.
  • Handle: RePEc:wly:apsmda:v:7:y:1991:i:1:p:47-62
    DOI: 10.1002/asm.3150070106

    Download full text from publisher

    File URL: https://doi.org/10.1002/asm.3150070106
    Download Restriction: no

