Effect of Dimension Size and Window Size on Word Embedding in Classification Tasks

Author

  • Dávid Držík
  • Jozef Kapusta

Abstract

Background: Static word embedding models such as Word2Vec and GloVe remain widely used in natural language processing, yet key hyperparameters are often selected heuristically rather than through systematic validation.

Objective: This study provides an extrinsic evaluation of context window size and embedding dimensionality for Word2Vec (CBOW and Skip-gram) and GloVe embeddings in a downstream spam classification task.

Methods: Embeddings were trained on a large external corpus and evaluated using a neural network and several classical machine learning classifiers.

Results: The results show that context window size has a moderate influence on performance, whereas embedding dimensionality has a clearer effect: values below approximately 50 degrade performance, while increases beyond moderate ranges (approximately 100-150) yield diminishing returns. Across all experiments, Word2Vec achieves higher stability and performance than GloVe.

Conclusion: Overall, the findings suggest that robust classification performance can be achieved with moderate embedding dimensionalities and smaller context windows, providing practical guidance for efficient embedding configuration.
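
To make the two hyperparameters concrete, the following is a minimal sketch, not the authors' code. It shows how a symmetric context window determines the (center, context) pairs a Skip-gram model trains on, and how a window-by-dimension grid could be enumerated. The function names, the example sentence, and the grid values are illustrative assumptions; the abstract does not list the exact values tested. Real Word2Vec implementations may also shrink the effective window at random for each position, which this sketch omits.

```python
from itertools import product


def skipgram_pairs(tokens, window):
    """Return (center, context) pairs for a fixed symmetric context window."""
    pairs = []
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:  # a word is not its own context
                pairs.append((center, tokens[j]))
    return pairs


# Hypothetical grid values, chosen only for illustration.
WINDOW_SIZES = [2, 5, 10]
DIMENSIONS = [25, 50, 100, 150, 300]


def config_grid(windows=WINDOW_SIZES, dims=DIMENSIONS):
    """Enumerate every (window, dimensionality) combination to evaluate."""
    return [{"window": w, "vector_size": d} for w, d in product(windows, dims)]


# Assumed toy sentence; a larger window yields more training pairs per token.
tokens = "win a free prize now".split()
for w in (1, 2):
    print(w, len(skipgram_pairs(tokens, w)))
```

Each configuration in the grid would then be passed to an embedding trainer and scored on the downstream classifier, which is the extrinsic evaluation the abstract describes.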

Suggested Citation

  • Dávid Držík & Jozef Kapusta, . "Effect of Dimension Size and Window Size on Word Embedding in Classification Tasks," Acta Informatica Pragensia, Prague University of Economics and Business, vol. 0.
  • Handle: RePEc:prg:jnlaip:v:preprint:id:309
    DOI: 10.18267/j.aip.309

    Download full text from publisher

    File URL: http://aip.vse.cz/doi/10.18267/j.aip.309.html
    Download Restriction: free of charge

