
Entity name recognition of cross-border e-commerce commodity titles based on TWs-LSTM

Authors

  • Yongcong Luo (Nanjing University of Aeronautics and Astronautics)
  • Jing Ma (Nanjing University of Aeronautics and Astronautics)
  • Chi Li (Cainiao Logistics Co., Ltd.)

Abstract

Commodity information must be matched to an HSCode so that exports can clear customs quickly. It is therefore particularly important to identify the entity names in commodity titles on e-commerce platforms quickly and accurately. To address this problem, an approach based on TWs-LSTM is proposed to identify commodity entity names. In this paper, we apply the TF-IDF algorithm to the commodity text corpus to obtain the weight matrix of the commodity words. Meanwhile, we use the Word2Vec model to represent the semantic meanings of the words extracted from the bag of words. Then, the weight vector of each commodity title and the word vector of every word in the title are combined into a new one-dimensional vector. We use these one-dimensional vectors to represent the commodity titles and call this representation the TWs model. Finally, we feed the TWs vectors into an LSTM for commodity entity name recognition. In the experimental stage, we divide the commodity entity name data into a training set and a testing set and compare the TWs-LSTM model with other text processing models. With the TWs-LSTM model, the F1-score reached 64.58% on the commodity title corpus of the Tmall platform, where TWs-LSTM achieves state-of-the-art results in comparison with the baseline models and previous studies.
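
The construction of the TWs vector described in the abstract can be illustrated with a minimal Python sketch. The abstract does not say exactly how the TF-IDF weights and the word vectors are combined, so this sketch assumes one plausible reading: each word's TF-IDF weight is concatenated with its word vector, and the per-word pieces are flattened into a single one-dimensional vector padded to a fixed length. The function names (tfidf_weights, tws_vector), the toy titles, and the hand-written two-dimensional embeddings (standing in for trained Word2Vec vectors) are all hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def tfidf_weights(titles):
    """Return one {word: tf-idf weight} dict per tokenized title."""
    n_docs = len(titles)
    # Document frequency: number of titles that contain each word.
    df = Counter(word for title in titles for word in set(title))
    weights = []
    for title in titles:
        tf = Counter(title)
        weights.append({
            word: (count / len(title)) * math.log(n_docs / df[word])
            for word, count in tf.items()
        })
    return weights


def tws_vector(title, weights, embeddings, max_len):
    """Flatten [tf-idf weight] + word vector per word into one 1-D list.

    Assumption: combination by concatenation; titles are truncated or
    zero-padded to max_len words so every output has the same length.
    """
    dim = len(next(iter(embeddings.values())))
    words = title[:max_len]
    flat = []
    for word in words:
        vec = embeddings.get(word, [0.0] * dim)  # unknown words map to zeros
        flat.extend([weights.get(word, 0.0)] + list(vec))
    # Zero-pad the remaining word slots.
    flat.extend([0.0] * ((max_len - len(words)) * (dim + 1)))
    return flat


# Toy data: tokenized titles and stand-in embeddings (hypothetical values).
titles = [["red", "cotton", "dress"], ["cotton", "shirt"]]
embeddings = {
    "red": [0.1, 0.2],
    "cotton": [0.3, 0.4],
    "dress": [0.5, 0.6],
    "shirt": [0.7, 0.8],
}

weights = tfidf_weights(titles)
vectors = [
    tws_vector(title, w, embeddings, max_len=3)
    for title, w in zip(titles, weights)
]
```

Each entry of vectors has length max_len * (dim + 1) and could be reshaped to (max_len, dim + 1) to serve as a sequence input to an LSTM. In practice the embeddings would come from a trained Word2Vec model rather than a hand-written dictionary.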

Suggested Citation

  • Yongcong Luo & Jing Ma & Chi Li, 2020. "Entity name recognition of cross-border e-commerce commodity titles based on TWs-LSTM," Electronic Commerce Research, Springer, vol. 20(2), pages 405-426, June.
  • Handle: RePEc:spr:elcore:v:20:y:2020:i:2:d:10.1007_s10660-019-09371-6
    DOI: 10.1007/s10660-019-09371-6

    Download full text from publisher

    File URL: http://link.springer.com/10.1007/s10660-019-09371-6
    File Function: Abstract
    Download Restriction: Access to the full text of the articles in this series is restricted.


    Most related items

    These are the items that most often cite the same works as this one and are cited by the same works as this one.
    1. Magerman, Tom & Looy, Bart Van & Debackere, Koenraad, 2015. "Does involvement in patenting jeopardize one’s academic footprint? An analysis of patent-paper pairs in biotechnology," Research Policy, Elsevier, vol. 44(9), pages 1702-1713.
    2. Xiang Zhu & Yunqiu Zhang, 2020. "Co-word analysis method based on meta-path of subject knowledge network," Scientometrics, Springer;Akadémiai Kiadó, vol. 123(2), pages 753-766, May.
    3. Hongchen Li & Zhong Yang & Jiaming Han & Shangxiang Lai & Qiuyan Zhang & Chi Zhang & Qianhui Fang & Guoxiong Hu, 2020. "TL-Net: A Novel Network for Transmission Line Scenes Classification," Energies, MDPI, vol. 13(15), pages 1-15, July.
    4. Diego R Amancio, 2015. "Probing the Topological Properties of Complex Networks Modeling Short Written Texts," PLOS ONE, Public Library of Science, vol. 10(2), pages 1-17, February.
    5. Sabrina L. Woltmann & Lars Alkærsig, 2018. "Tracing university–industry knowledge transfer through a text mining approach," Scientometrics, Springer;Akadémiai Kiadó, vol. 117(1), pages 449-472, October.
    6. Chiarello, Filippo & Fantoni, Gualtiero & Hogarth, Terence & Giordano, Vito & Baltina, Liga & Spada, Irene, 2021. "Towards ESCO 4.0 – Is the European classification of skills in line with Industry 4.0? A text mining approach," Technological Forecasting and Social Change, Elsevier, vol. 173(C).
    7. Yonghan Ju & So Young Sohn, 2015. "Identifying patterns in rare earth element patents based on text and data mining," Scientometrics, Springer;Akadémiai Kiadó, vol. 102(1), pages 389-410, January.
    8. Su Jin Seo & Eun Jin Han & So Young Sohn, 2015. "Trend analysis of academic research and technical development pertaining to gas hydrates," Scientometrics, Springer;Akadémiai Kiadó, vol. 105(2), pages 905-920, November.
    9. Solomija Buk & Yuri Krynytskyi & Andrij Rovenchak, 2019. "Properties Of Autosemantic Word Networks In Ukrainian Texts," Advances in Complex Systems (ACS), World Scientific Publishing Co. Pte. Ltd., vol. 22(06), pages 1-22, December.
    10. Puccetti, Giovanni & Giordano, Vito & Spada, Irene & Chiarello, Filippo & Fantoni, Gualtiero, 2023. "Technology identification from patent texts: A novel named entity recognition method," Technological Forecasting and Social Change, Elsevier, vol. 186(PB).
    11. Julie Callaert & Joris Grouwels & Bart Looy, 2012. "Delineating the scientific footprint in technology: Identifying scientific publications within non-patent references," Scientometrics, Springer;Akadémiai Kiadó, vol. 91(2), pages 383-398, May.
    12. Jongchan Kim & Jaehyun Choi & Sangsung Park & Dongsik Jang, 2018. "Patent Keyword Extraction for Sustainable Technology Management," Sustainability, MDPI, vol. 10(4), pages 1-18, April.
    13. Jiyoung Woo & Jaeseok Yun, 2020. "Content Noise Detection Model Using Deep Learning in Web Forums," Sustainability, MDPI, vol. 12(12), pages 1-16, June.
    14. Wagner, Stefan & Sternitzke, Christian & Walter, Sascha, 2022. "Mapping Markush," Research Policy, Elsevier, vol. 51(10).
    15. Chen, Lixin, 2017. "Do patent citations indicate knowledge linkage? The evidence from text similarities between patents and their citations," Journal of Informetrics, Elsevier, vol. 11(1), pages 63-79.
    16. Higham, Kyle & de Rassenfosse, Gaétan & Jaffe, Adam B., 2021. "Patent Quality: Towards a Systematic Framework for Analysis and Measurement," Research Policy, Elsevier, vol. 50(4).
    17. Yixing Wang & Meiqin Liu & Zhejing Bao & Senlin Zhang, 2018. "Short-Term Load Forecasting with Multi-Source Data Using Gated Recurrent Unit Neural Networks," Energies, MDPI, vol. 11(5), pages 1-19, May.
    18. Kai Chen & Rabea Jamil Mahfoud & Yonghui Sun & Dongliang Nan & Kaike Wang & Hassan Haes Alhelou & Pierluigi Siano, 2020. "Defect Texts Mining of Secondary Device in Smart Substation with GloVe and Attention-Based Bidirectional LSTM," Energies, MDPI, vol. 13(17), pages 1-17, September.
    19. Xuefeng Wang & Huichao Ren & Yun Chen & Yuqin Liu & Yali Qiao & Ying Huang, 2019. "Measuring patent similarity with SAO semantic analysis," Scientometrics, Springer;Akadémiai Kiadó, vol. 121(1), pages 1-23, October.
    20. Kai Ding & Chen Yao & Yifan Li & Qinglong Hao & Yaqiong Lv & Zengrui Huang, 2022. "A Review on Fault Diagnosis Technology of Key Components in Cold Ironing System," Sustainability, MDPI, vol. 14(10), pages 1-28, May.
