
Exploring efficient grouping algorithms in regular expression matching

Authors

Listed:
  • Chengcheng Xu
  • Jinshu Su
  • Shuhui Chen

Abstract

Background: Regular expression matching (REM) is widely employed as the major tool for deep packet inspection (DPI) applications. For automatic processing, the regular expression patterns need to be converted to a deterministic finite automaton (DFA). However, with the ever-increasing scale and complexity of pattern sets, the state explosion problem poses a great challenge to DFA-based regular expression matching. Rule grouping is a direct way to address the state explosion problem: the original rule set is divided into multiple disjoint groups, and each group is compiled into a separate DFA, which significantly restrains the severe state explosion that occurs when all the rules are compiled into a single DFA.

Objective: For practical implementation, the total number of DFA states should be as small as possible, so that the data structures of these DFAs can be deployed in fast on-chip memories for rapid access. In addition, to support fast pattern updates in some applications, the time cost of grouping should be as small as possible. In this study, we aimed to propose an efficient grouping method that generates as few states as possible with as little time overhead as possible.

Methods: When multiple patterns are compiled into a single DFA, the number of DFA states is usually greater than the total number of states obtained by compiling each pattern into a separate DFA. This is mainly caused by the semantic overlaps among different rules. By quantifying the interaction value for each pair of rules, the rule grouping problem can be reduced to the maximum k-cut graph partitioning problem. We then propose a heuristic algorithm, called the one-step greedy (OSG) algorithm, to solve this NP-hard problem. In addition, a subroutine named the heuristic initialization (HI) algorithm is devised to further optimize the grouping algorithms.

Results: We employed three practical rule sets for the experimental evaluation. The results show that the OSG algorithm outperforms the state-of-the-art grouping solutions in both the total number of DFA states and the time cost of grouping. The HI subroutine also demonstrates a significant optimization effect on the grouping algorithms.

Conclusions: The DFA state explosion problem has become the most challenging issue in regular expression matching applications. Rule grouping is a practical approach that divides the original rule set into multiple disjoint groups. In this paper, we investigate the current grouping solutions and propose a compact and efficient grouping algorithm. Experiments conducted on practical rule sets demonstrate the superiority of our proposal.
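
The reduction described under Methods can be illustrated with a small sketch. It is not the authors' OSG or HI algorithm, whose details the abstract does not give; it is a generic greedy heuristic for the same kind of partitioning problem. The function names, the dictionary layout, and the interaction values are all hypothetical. The sketch assumes the pairwise interaction values are already known (for instance, the extra DFA states produced when two rules are compiled together rather than separately) and places each rule into the group where it adds the least interaction. Minimizing the interaction kept inside groups is equivalent to maximizing the weight cut between groups.

```python
from typing import Dict, List, Tuple

Pair = Tuple[int, int]  # (smaller rule index, larger rule index)


def pair_weight(interaction: Dict[Pair, float], i: int, j: int) -> float:
    """Interaction value of rules i and j; pairs not listed count as 0."""
    return interaction.get((min(i, j), max(i, j)), 0.0)


def greedy_group(n_rules: int, k: int,
                 interaction: Dict[Pair, float]) -> List[List[int]]:
    """Illustrative greedy partition of rules 0..n_rules-1 into k groups.

    Hypothetical helper, not the paper's OSG algorithm.
    """
    # Handle the most "troublesome" rules first: those with the largest
    # total interaction with all other rules.
    order = sorted(
        range(n_rules),
        key=lambda i: -sum(pair_weight(interaction, i, j)
                           for j in range(n_rules) if j != i),
    )
    groups: List[List[int]] = [[] for _ in range(k)]
    for rule in order:
        # Cost of a group = interaction the rule would add inside it.
        costs = [sum(pair_weight(interaction, rule, member) for member in group)
                 for group in groups]
        # Pick the cheapest group; break ties in favor of the smaller group.
        best = min(range(k), key=lambda g: (costs[g], len(groups[g])))
        groups[best].append(rule)
    return groups


def intra_group_weight(groups: List[List[int]],
                       interaction: Dict[Pair, float]) -> float:
    """Total interaction remaining inside groups (lower is better)."""
    total = 0.0
    for group in groups:
        for a in range(len(group)):
            for b in range(a + 1, len(group)):
                total += pair_weight(interaction, group[a], group[b])
    return total


# Made-up example: rules 0 and 1 interact strongly, as do rules 2 and 3.
example = {(0, 1): 10.0, (2, 3): 8.0, (0, 2): 1.0}
result = greedy_group(4, 2, example)
print(result, intra_group_weight(result, example))
```

In this made-up example the heuristic separates the two strongly interacting pairs, so no interaction remains inside either group. A real implementation would still need a way to measure or estimate the pairwise interaction values, which the abstract does not specify.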

Suggested Citation

  • Chengcheng Xu & Jinshu Su & Shuhui Chen, 2018. "Exploring efficient grouping algorithms in regular expression matching," PLOS ONE, Public Library of Science, vol. 13(10), pages 1-14, October.
  • Handle: RePEc:plo:pone00:0206068
    DOI: 10.1371/journal.pone.0206068

    Download full text from publisher

    File URL: https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0206068
    Download Restriction: no

    File URL: https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0206068&type=printable
    Download Restriction: no



