Journal "Software Engineering"
a journal on theoretical and applied science and technology
ISSN 2220-3397

Vol. 7, no. 1, 2016

DOI: 10.17587/prin.7.3-13
Real-Time Analysis of Streaming Data for Presence of Keywords and Key Phrases
V. A. Vasenin, vasenin@msu.ru, V. A. Roganov, var@msu.ru, Moscow State University, 119234, Moscow, Russian Federation, M. D. Dzabraev, dzabraew@gmail.com, Scientific Research Institute for System Analysis of the Russian Academy of Sciences, 117218, Moscow, Russian Federation
Corresponding author: Vasenin Valery A., Professor, Moscow State University, 119234, Moscow, Russian Federation, e-mail: vasenin@msu.ru
Received on October 15, 2015
Accepted on November 5, 2015

This article presents the results of the first stage of research, in which the authors propose an approach to detecting keywords and key phrases in streaming data. The key factor in solving this problem is the speed of the analyzer's computational kernel, which must process high-intensity data streams in real time. The article presents examples of efficient algorithms that solve the formulated problem; most of them are widely used in many areas of data processing. The proposed approach is an optimization of the Rabin–Karp algorithm that combines a special form of injective hash function with the first fit decreasing method (FFDM). Additional special preprocessing of the keyword list yields a speed-up of two or more times compared with a widely used implementation of the Commentz–Walter algorithm. The performance of the existing CUDA-grep analyzer was also evaluated. The article contains benchmark results for all the analyzers considered. A parallel version of the described algorithm is under development. A simple experiment with this kind of algorithm showed that a throughput above 100 Gbit/s is achievable on a single multi-core CPU. A CUDA version of the proposed analyzer is also planned.
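
To make the starting point of the proposed approach concrete, here is a minimal sketch of plain multi-keyword Rabin–Karp search in Python. It is a textbook version and not the authors' optimized method. It uses an ordinary polynomial rolling hash modulo a large prime (an assumption) instead of the special injective hash function, and it does not apply FFDM. Because the hash is not injective, each hash hit is verified byte by byte to rule out collisions. The names `find_keywords`, `poly_hash`, `BASE` and `MOD` are hypothetical.

```python
from collections import defaultdict

BASE = 256
MOD = (1 << 61) - 1  # large prime modulus (assumption; any large prime works)


def poly_hash(s: bytes) -> int:
    """Polynomial hash of a byte string modulo MOD."""
    h = 0
    for b in s:
        h = (h * BASE + b) % MOD
    return h


def find_keywords(text: bytes, keywords: list) -> list:
    """Return sorted (offset, keyword) pairs for every keyword occurrence."""
    # Group keywords by length, then by hash: one rolling window per length.
    by_len = defaultdict(lambda: defaultdict(list))
    for kw in keywords:
        if kw:
            by_len[len(kw)][poly_hash(kw)].append(kw)

    hits = []
    for m, table in by_len.items():
        if m > len(text):
            continue
        high = pow(BASE, m - 1, MOD)  # weight of the byte leaving the window
        h = poly_hash(text[:m])
        for i in range(len(text) - m + 1):
            if i > 0:
                # Roll the window: drop text[i-1], append text[i+m-1].
                h = ((h - text[i - 1] * high) * BASE + text[i + m - 1]) % MOD
            for kw in table.get(h, ()):
                # Hash equality is only a candidate; confirm the bytes.
                if text[i:i + m] == kw:
                    hits.append((i, kw))
    hits.sort()
    return hits
```

For example, `find_keywords(b"the attack at dawn", [b"attack", b"dawn", b"at"])` reports `at` at offsets 4 and 11, `attack` at 4 and `dawn` at 14. Grouping keywords by length keeps one rolling hash per distinct length, so the hashing cost grows with the number of distinct lengths rather than with the number of keywords.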

Keywords: deep packet inspection, DPI, text analyzer, deterministic finite state automata, nondeterministic finite state automata, perfect hash function, CUDA
pp. 3–13
For citation:
Vasenin V. A., Roganov V. A., Dzabraev M. D. Real-Time Analysis of Streaming Data for Presence of Keywords and Key Phrases, Programmnaya Ingeneria, 2016, vol. 7, no. 1, pp. 3–13.