Automated Identification of Sensitive Financial Data Based on the Topic Analysis

Author

Meng Li
(School of Software Engineering, Beijing Jiaotong University, Beijing 100044, China)
Jiqiang Liu
(School of Software Engineering, Beijing Jiaotong University, Beijing 100044, China)
Yeping Yang
(School of Software Engineering, Beijing Jiaotong University, Beijing 100044, China)

Abstract

Data governance protects and manages data throughout its entire life cycle. Governance problems persist, however, including security risks, privacy breaches, and difficulties in data management and access control, all of which raise the risk of data leakage and abuse. Security classification and grading of data has therefore become an important task: it identifies sensitive data accurately so that maintenance and management measures can be matched to each sensitivity level. This work starts from the problems in current data security classification and grading, such as inconsistent standards, difficult data acquisition and sorting, and weak semantic information in data fields, to identify the limitations of current methods and directions for improvement. The proposed method for automatically identifying sensitive financial data is based on topic analysis and combines Jieba word segmentation, word frequency statistics, the skip-gram model, K-means clustering, and other techniques. Experts assisted in selecting appropriate keywords to improve accuracy. The method was trained and tested on the descriptive text library and real business data of a Chinese financial institution to demonstrate its effectiveness and usefulness, and the evaluation indicators showed that it is effective for data security classification. The method addresses the challenge of assigning sensitivity levels to texts with limited semantic information, overcomes limitations on extending the model across domains, and provides an optimized application model. These results also point toward real-time updating of the method.
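
The stages named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. It uses only the Python standard library and rests on several assumptions: whitespace splitting stands in for Jieba segmentation, expert keyword counts stand in for skip-gram embeddings (only the construction of skip-gram training pairs is shown), and the field descriptions, the keyword list, and all function names are hypothetical.

```python
from collections import Counter
import math


def tokenize(text):
    # Assumption: whitespace split stands in for Jieba word segmentation.
    return text.lower().split()


def word_frequencies(docs):
    # Word frequency statistics over all field descriptions.
    counts = Counter()
    for doc in docs:
        counts.update(tokenize(doc))
    return counts


def skipgram_pairs(tokens, window=2):
    # (center, context) pairs of the kind a skip-gram model is trained on.
    pairs = []
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs


def keyword_vector(text, keywords):
    # Assumption: keyword counts replace averaged skip-gram embeddings.
    counts = Counter(tokenize(text))
    return [float(counts[k]) for k in keywords]


def kmeans(points, k, iterations=20):
    # Lloyd's algorithm with a deterministic start: the first k points.
    centers = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assign each point to its nearest center.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centers[c]))
            for p in points
        ]
        # Move each center to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers


# Hypothetical field descriptions and expert-chosen keywords.
docs = [
    "customer account password",
    "branch office address",
    "customer account balance",
    "branch opening hours",
]
keywords = ["account", "branch"]

freqs = word_frequencies(docs)
pairs = skipgram_pairs(tokenize(docs[0]), window=1)
vectors = [keyword_vector(d, keywords) for d in docs]
labels, centers = kmeans(vectors, k=2)
print(freqs.most_common(3))
print(labels)
```

In a fuller version, a skip-gram model trained on such pairs would supply dense vectors for each field description, and the resulting clusters would still need expert review before being mapped to sensitivity levels.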

Suggested Citation

Meng Li & Jiqiang Liu & Yeping Yang, 2024. "Automated Identification of Sensitive Financial Data Based on the Topic Analysis," Future Internet, MDPI, vol. 16(2), pages 1-17, February.

Handle: RePEc:gam:jftint:v:16:y:2024:i:2:p:55-:d:1335738

Citations

Cited by:

Yage Jin & Hongming Chen & Rui Ma & Yanhua Wu & Qingxin Li, 2025. "ICRSSD: Identification and Classification for Railway Structured Sensitive Data," Future Internet, MDPI, vol. 17(7), pages 1-24, June.

Most related items

These are the items that most often cite the same works as this one and are cited by the same works as this one.

Shouxiang Wang & Pengfei Dong & Yingjie Tian, 2017. "A Novel Method of Statistical Line Loss Estimation for Distribution Feeders Based on Feeder Cluster and Modified XGBoost," Energies, MDPI, vol. 10(12), pages 1-17, December.
Eugénie Coche & Ans Kolk & Václav Ocelík, 2024. "Unravelling cross-country regulatory intricacies of data governance: the relevance of legal insights for digitalization and international business," Journal of International Business Policy, Palgrave Macmillan, vol. 7(1), pages 112-127, March.
Pawel Dlotko & Wanling Qiu & Simon Rudkin, 2022. "Topological Data Analysis Ball Mapper for Finance," Papers 2206.03622, arXiv.org.
Antonella Pireddu & Angelico Bedini & Mara Lombardi & Angelo L. C. Ciribini & Davide Berardi, 2024. "A Review of Data Mining Strategies by Data Type, with a Focus on Construction Processes and Health and Safety Management," IJERPH, MDPI, vol. 21(7), pages 1-26, June.
J. Fernando Vera & Rodrigo Macías, 2021. "On the Behaviour of K-Means Clustering of a Dissimilarity Matrix by Means of Full Multidimensional Scaling," Psychometrika, Springer;The Psychometric Society, vol. 86(2), pages 489-513, June.
Struijk, Mylène, 2023. "IT Governance in the digital era : Insights from meta-organizations," Other publications TiSEM a6f02085-ff68-427f-b65a-8, Tilburg University, School of Economics and Management.
Shamim, Saqib & Yang, Yumei & Ul Zia, Najam & Khan, Zaheer & Shariq, Syed Muhammad, 2023. "Mechanisms of cognitive trust development in artificial intelligence among front line employees: An empirical examination from a developing economy," Journal of Business Research, Elsevier, vol. 167(C).
Moritz von Zahn & Jan Zacharias & Maximilian Lowin & Johannes Chen & Oliver Hinz, 2025. "Navigating AI conformity: A design framework to assess fairness, explainability, and performance," Electronic Markets, Springer;IIM University of St. Gallen, vol. 35(1), pages 1-24, December.
Sara Dolnicar & Friedrich Leisch, 2017. "Using segment level stability to select target segments in data-driven market segmentation studies," Marketing Letters, Springer, vol. 28(3), pages 423-436, September.
Muhamad Rizki & Muhammad Zudhy Irawan & Puspita Dirgahayani & Prawira Fajarindra Belgiawan & Retno Wihanesta, 2022. "Low Emission Zone (LEZ) Expansion in Jakarta: Acceptability and Restriction Preference," Sustainability, MDPI, vol. 14(19), pages 1-22, September.
Marc-André Filz & Jan Philipp Bosse & Christoph Herrmann, 2024. "Digitalization platform for data-driven quality management in multi-stage manufacturing systems," Journal of Intelligent Manufacturing, Springer, vol. 35(6), pages 2699-2718, August.
Anassaya Chawviang & Supaporn Kiattisin, 2022. "Sustainable Development: Smart Co-Operative Management Framework," Sustainability, MDPI, vol. 14(6), pages 1-25, March.
Aslani, Mehrdad & Faraji, Jamal & Hashemi-Dezaki, Hamed & Ketabi, Abbas, 2022. "A novel clustering-based method for reliability assessment of cyber-physical microgrids considering cyber interdependencies and information transmission errors," Applied Energy, Elsevier, vol. 315(C).
Rene Abraham & Johannes Schneider & Jan vom Brocke, 2023. "A taxonomy of data governance decision domains in data marketplaces," Electronic Markets, Springer;IIM University of St. Gallen, vol. 33(1), pages 1-13, December.
Nunan, Daniel & Di Domenico, MariaLaura, 2022. "Value creation in an algorithmic world: Towards an ethics of dynamic pricing," Journal of Business Research, Elsevier, vol. 150(C), pages 451-460.
Vítor Ribeiro & João Barata & Paulo Rupino Cunha, 2024. "Modeling inter-organizational business process governance in the age of collaborative networks," Electronic Markets, Springer;IIM University of St. Gallen, vol. 34(1), pages 1-27, December.
J. Fernando Vera & Rodrigo Macías, 2017. "Variance-Based Cluster Selection Criteria in a K-Means Framework for One-Mode Dissimilarity Data," Psychometrika, Springer;The Psychometric Society, vol. 82(2), pages 275-294, June.
Cristina Tortora & Mireille Gettler Summa & Marina Marino & Francesco Palumbo, 2016. "Factor probabilistic distance clustering (FPDC): a new clustering method," Advances in Data Analysis and Classification, Springer;German Classification Society - Gesellschaft für Klassifikation (GfKl);Japanese Classification Society (JCS);Classification and Data Analysis Group of the Italian Statistical Society (CLADAG);International Federation of Classification Societies (IFCS), vol. 10(4), pages 441-464, December.
Jaehong Yu & Hua Zhong & Seoung Bum Kim, 2020. "An Ensemble Feature Ranking Algorithm for Clustering Analysis," Journal of Classification, Springer;The Classification Society, vol. 37(2), pages 462-489, July.
Haoyang Ping & Zhuocheng Li & Xizhu Shen & Haizhen Sun, 2024. "Optimization of Vegetable Restocking and Pricing Strategies for Innovating Supermarket Operations Utilizing a Combination of ARIMA, LSTM, and FP-Growth Algorithms," Mathematics, MDPI, vol. 12(7), pages 1-17, March.
