Author
Listed:
- Yage Jin
(School of Computer Science & Technology, Beijing Institute of Technology, Beijing 100081, China)
- Hongming Chen
(School of Computer Science & Technology, Beijing Institute of Technology, Beijing 100081, China)
- Rui Ma
(School of Computer Science & Technology, Beijing Institute of Technology, Beijing 100081, China)
- Yanhua Wu
(The Center of National Railway Intelligent Transportation System Engineering and Technology, Beijing 100081, China
Institute of Computing Technology, China Academy of Railway Sciences Corporation Limited, Beijing 100081, China)
- Qingxin Li
(The Center of National Railway Intelligent Transportation System Engineering and Technology, Beijing 100081, China
Institute of Computing Technology, China Academy of Railway Sciences Corporation Limited, Beijing 100081, China)
Abstract
The rapid growth of the railway industry has resulted in the accumulation of large volumes of structured data, making data security a critical component of reliable railway system operations. However, existing methods for identifying and classifying sensitive data often suffer from limitations such as overly coarse identification granularity and insufficient flexibility in classification. To address these issues, we propose ICRSSD, a two-stage method for identifying and classifying sensitive data in the railway domain. The identification stage focuses on obtaining the sensitivity of all attributes. We first divide structured data into canonical data and semi-canonical data at a finer granularity to improve identification accuracy. For canonical data, we use information entropy to calculate the initial sensitivity and then update the attribute sensitivities through cluster analysis and association rule mining. For semi-canonical data, we calculate attribute sensitivity by using a combination of regular expressions and keyword lists. In the classification stage, to further enhance accuracy, we adopt a dynamic, multi-granularity classification strategy. It considers the relative sensitivity of attributes across different scenarios and classifies them into three levels based on the sensitivity values obtained during the identification stage. Additionally, we design a rule base specifically for the identification and classification of sensitive data in the railway domain. This rule base enables effective data identification and classification while also supporting the expiry management of sensitive attribute labels. To improve the efficiency of regular expression generation, we developed an auxiliary tool with the help of large language models and a well-designed prompt framework. We conducted experiments on a real-world dataset from the railway domain. The results demonstrate that ICRSSD significantly improves the accuracy and adaptability of sensitive data identification and classification in the railway domain.
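The abstract does not give the formulas behind the entropy-based initial sensitivity or the three-level classification, so the following Python is only a minimal sketch of the general idea, not the authors' method. It assumes that an attribute's initial sensitivity can be approximated by the Shannon entropy of its values normalized by the logarithm of the row count, and that the three levels are assigned by relative rank within one scenario. The function names, the normalization, the rank-based level rule, and the sample columns are all hypothetical.

```python
import math
from collections import Counter


def shannon_entropy(values):
    """Shannon entropy (in bits) of the empirical distribution of `values`."""
    total = len(values)
    if total == 0:
        return 0.0
    counts = Counter(values)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def initial_sensitivity(columns):
    """Map each attribute name to its entropy divided by log2(row count).

    Assumption: an attribute whose values are all distinct scores 1.0 and
    a constant attribute scores 0.0. The paper's actual formula may differ.
    """
    scores = {}
    for name, values in columns.items():
        n = len(values)
        max_entropy = math.log2(n) if n > 1 else 1.0
        scores[name] = shannon_entropy(values) / max_entropy
    return scores


def classify_relative(scores):
    """Assign levels 1 (lowest) to 3 (highest) by rank within this scenario.

    Hypothetical rule: attributes are sorted by score and split into thirds,
    so the level depends on the other attributes present, not on a fixed
    threshold.
    """
    ranked = sorted(scores, key=scores.get)
    return {name: 1 + (3 * i) // len(ranked) for i, name in enumerate(ranked)}


# Hypothetical sample columns, invented for illustration only.
columns = {
    "id_number": ["a", "b", "c", "d"],
    "station": ["X", "X", "Y", "Y"],
    "country": ["CN", "CN", "CN", "CN"],
}
scores = initial_sensitivity(columns)
levels = classify_relative(scores)
```

Ranking within a scenario, rather than using fixed cut-offs, is one simple way to make the level of an attribute depend on which other attributes appear alongside it, which is the sense of "relative sensitivity" assumed here.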
Suggested Citation
Yage Jin & Hongming Chen & Rui Ma & Yanhua Wu & Qingxin Li, 2025.
"ICRSSD: Identification and Classification for Railway Structured Sensitive Data,"
Future Internet, MDPI, vol. 17(7), pages 1-24, June.
Handle:
RePEc:gam:jftint:v:17:y:2025:i:7:p:294-:d:1691596