
Detección de Idioma en Twitter (Language Detection on Twitter)

Author

Listed:
  • Yudivián Almeida-Cruz

    (Universidad de La Habana)

  • Suilan Estévez-Velarde

    (Universidad de La Habana)

  • Alejandro Piad-Morffis

    (Universidad de La Habana)

Abstract

This paper presents an approach to identifying the language of tweets that requires neither training sets nor aggregated information. The approach relies on techniques based on trigram recognition and small-words algorithms. These algorithms are evaluated both individually and combined in a composite model. The paper also analyzes how pre-processing the tweets affects the accuracy of language identification. Finally, following a series of experiments, the best of the alternatives studied is determined.
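The abstract names three ingredients: tweet pre-processing, trigram and small-words scoring, and a composite model. The following Python sketch illustrates how such pieces could fit together; it is not the authors' implementation. The word lists, the reference sentences used to build trigram profiles, the function names, and the equal-weight combination are all assumptions made for illustration, since the abstract does not specify them.

```python
import re
from collections import Counter

# Assumption: tiny hand-picked lists of frequent short function words.
SMALL_WORDS = {
    "en": {"the", "and", "of", "to", "is", "in", "it", "for", "on", "with"},
    "es": {"el", "la", "de", "que", "y", "en", "los", "es", "por", "con"},
}

# Assumption: short reference sentences used only to derive trigram profiles.
SAMPLE_TEXT = {
    "en": "the quick brown fox jumps over the lazy dog and this is what "
          "the people think about it",
    "es": "el rápido zorro marrón salta sobre el perro perezoso y esto es "
          "lo que la gente piensa de ello",
}


def preprocess(tweet: str) -> str:
    """Lowercase and strip URLs, @mentions, '#', digits, and punctuation."""
    text = tweet.lower()
    text = re.sub(r"https?://\S+", " ", text)   # URLs
    text = re.sub(r"@\w+", " ", text)           # user mentions
    text = text.replace("#", " ")               # keep the hashtag word itself
    text = re.sub(r"[^\w\s]|\d|_", " ", text)   # punctuation, digits, underscores
    return " ".join(text.split())


def trigrams(text: str) -> Counter:
    """Count character trigrams, padding with one space on each side."""
    padded = f" {text} "
    return Counter(padded[i:i + 3] for i in range(len(padded) - 2))


PROFILES = {lang: set(trigrams(sample)) for lang, sample in SAMPLE_TEXT.items()}


def trigram_score(text: str, lang: str) -> float:
    """Fraction of the text's trigrams that occur in the language profile."""
    grams = trigrams(text)
    total = sum(grams.values())
    if total == 0:
        return 0.0
    return sum(n for g, n in grams.items() if g in PROFILES[lang]) / total


def small_word_score(text: str, lang: str) -> float:
    """Fraction of the text's words that are known small words."""
    words = text.split()
    if not words:
        return 0.0
    return sum(w in SMALL_WORDS[lang] for w in words) / len(words)


def detect(tweet: str, weight: float = 0.5):
    """Combine both scores; `weight` is a hypothetical mixing parameter."""
    text = preprocess(tweet)
    scores = {
        lang: weight * trigram_score(text, lang)
        + (1 - weight) * small_word_score(text, lang)
        for lang in SMALL_WORDS
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None
```

Setting `weight` to 1.0 or 0.0 uses the trigram or the small-words score alone, which mirrors the abstract's comparison of each algorithm on its own against the combined model. Calling `detect` on text passed through `preprocess` versus raw text is one way to probe the effect of pre-processing on accuracy.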

Suggested Citation

  • Yudivián Almeida-Cruz & Suilan Estévez-Velarde & Alejandro Piad-Morffis, 2014. "Detección de Idioma en Twitter (Language Detection on Twitter)," Revista Internacional de Gestión del Conocimiento y la Tecnología (GECONTEC), Revista Internacional de Gestión del Conocimiento y la Tecnología (GECONTEC), vol. 2(3), pages 35-45.
  • Handle: RePEc:rge:journl:v:2:y:2014:i:3:p:35-45
    DOI: 10.5281/zenodo.7080732

    Download full text from publisher

    File URL: https://gecontec.org/index.php/unesco/article/view/65/55
    File Function: Full text
    Download Restriction: no


    More about this item

    Keywords

    Language detection; n-grams; trigrams; small words; Twitter

    JEL classification:

    • M1 - Business Administration and Business Economics; Marketing; Accounting; Personnel Economics - - Business Administration
    • M15 - Business Administration and Business Economics; Marketing; Accounting; Personnel Economics - - Business Administration - - - IT Management
    • O3 - Economic Development, Innovation, Technological Change, and Growth - - Innovation; Research and Development; Technological Change; Intellectual Property Rights
    • O31 - Economic Development, Innovation, Technological Change, and Growth - - Innovation; Research and Development; Technological Change; Intellectual Property Rights - - - Innovation and Invention: Processes and Incentives
    • O32 - Economic Development, Innovation, Technological Change, and Growth - - Innovation; Research and Development; Technological Change; Intellectual Property Rights - - - Management of Technological Innovation and R&D
    • D8 - Microeconomics - - Information, Knowledge, and Uncertainty
    • D81 - Microeconomics - - Information, Knowledge, and Uncertainty - - - Criteria for Decision-Making under Risk and Uncertainty

