IDEAS home Printed from https://ideas.repec.org/a/plo/pone00/0238694.html
   My bibliography  Save this article

A detection method for android application security based on TF-IDF and machine learning

Author

Listed:
  • Hongli Yuan
  • Yongchuan Tang
  • Wenjuan Sun
  • Li Liu

Abstract

Android is the most widely used mobile operating system (OS). A large number of third-party Android application (app) markets have emerged. The absence of third-party market regulation has prompted research institutions to propose different malware detection techniques. However, due to improvements of malware itself and Android system, it is difficult to design a detection method that can efficiently and effectively detect malicious apps for a long time. Meanwhile, adopting more features will increase the complexity of the model and the computational cost of the system. Permissions play a vital role in the security of the Android apps. Term Frequency—Inverse Document Frequency (TF-IDF) is used to assess the importance of a word for a file set in a corpus. The static analysis method does not need to run the app. It can efficiently and accurately extract the permissions from an app. Based on this cognition and perspective, in this paper, a new static detection method based on TF-IDF and Machine Learning is proposed. The system permissions are extracted in Android application package’s (Apk’s) manifest file. TF-IDF algorithm is used to calculate the permission value (PV) of each permission and the sensitivity value of apk (SVOA) of each app. The SVOA and the number of the used permissions are learned and tested by machine learning. 6070 benign apps and 9419 malware are used to evaluate the proposed approach. The experiment results show that only use dangerous permissions or the number of used permissions can’t accurately distinguish whether an app is malicious or benign. For malware detection, the proposed approach achieve up to 99.5% accuracy and the learning and training time only needs 0.05s. For malware families detection, the accuracy is 99.6%. The results indicate that the method for unknown/new sample’s detection accuracy is 92.71%. Compared against other state-of-the-art approaches, the proposed approach is more effective by detecting malware and malware families.

Suggested Citation

  • Hongli Yuan & Yongchuan Tang & Wenjuan Sun & Li Liu, 2020. "A detection method for android application security based on TF-IDF and machine learning," PLOS ONE, Public Library of Science, vol. 15(9), pages 1-19, September.
  • Handle: RePEc:plo:pone00:0238694
    DOI: 10.1371/journal.pone.0238694
    as

    Download full text from publisher

    File URL: https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0238694
    Download Restriction: no

    File URL: https://journals.plos.org/plosone/article/file?id=10.1371/journal.pone.0238694&type=printable
    Download Restriction: no

    File URL: https://libkey.io/10.1371/journal.pone.0238694?utm_source=ideas
    LibKey link: if access is restricted and if your library uses this service, LibKey will redirect you to where you can use your library subscription to access this item
    ---><---

    More about this item

    Statistics

    Access and download statistics

    Corrections

    All material on this site has been provided by the respective publishers and authors. You can help correct errors and omissions. When requesting a correction, please mention this item's handle: RePEc:plo:pone00:0238694. See general information about how to correct material in RePEc.

    If you have authored this item and are not yet registered with RePEc, we encourage you to do it here. This allows to link your profile to this item. It also allows you to accept potential citations to this item that we are uncertain about.

    We have no bibliographic references for this item. You can help adding them by using this form .

    If you know of missing items citing this one, you can help us creating those links by adding the relevant references in the same way as above, for each refering item. If you are a registered author of this item, you may also want to check the "citations" tab in your RePEc Author Service profile, as there may be some citations waiting for confirmation.

    For technical questions regarding this item, or to correct its authors, title, abstract, bibliographic or download information, contact: plosone (email available below). General contact details of provider: https://journals.plos.org/plosone/ .

    Please note that corrections may take a couple of weeks to filter through the various RePEc services.

    IDEAS is a RePEc service. RePEc uses bibliographic data supplied by the respective publishers.