ABSTRACTS OF ARTICLES OF THE JOURNAL "INFORMATION TECHNOLOGIES".
No. 7. Vol. 29. 2023

DOI: 10.17587/it.29.351-359

G. A. Drachev, PhD Student,
National Research University "Higher School of Economics", Moscow, Russian Federation

Development of an Algorithm for Extracting and Encoding Data from Log Messages of a Computing System for Anomaly Detection Systems

This article describes the development of an algorithm that automatically analyzes a log message, transforms it into a list of features represented as a fixed-length vector, and accumulates the resulting vectors into a single dataset. The resulting dataset is intended for use in machine-learning-based anomaly detection systems. An additional requirement for the algorithm is support for the variety of protocols used to collect log messages in a computing system. These goals were achieved by developing a software package that collects and parses log messages in order to isolate and encode their features. The package can collect log messages through several channels: syslog, SNMP, SQL, and reading text and binary files. The article examines the data that can be extracted from the log messages of a computing system, describes the use of Lua scripts for data enrichment, and presents the resulting list of features. It proposes a method for encoding text data extracted from log messages and an algorithm for transforming an arbitrary log message into a feature vector of fixed dimension. It also provides a methodology for forming a dataset for subsequent use in training an anomaly detection system for a computing system, and gives an example of a dataset storage structure.

Keywords: computer attacks, information security, anomaly detection, log messages, data preprocessing, dataset formation
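The general idea of turning an arbitrary log message into a fixed-length feature vector can be illustrated with a minimal Python sketch. It is not the article's implementation: the RFC 5424-style regular expression, the choice of six features, the use of SHA-256 to encode text fields, and the names `encode_text`, `to_vector` and `build_dataset` are all assumptions made for illustration only.

```python
import hashlib
import re
from datetime import datetime

# Hypothetical pattern for an RFC 5424-style syslog line (an assumption,
# not the parsing rules used in the article).
SYSLOG_RE = re.compile(
    r"<(?P<pri>\d{1,3})>1 (?P<ts>\S+) (?P<host>\S+) (?P<app>\S+) "
    r"(?P<procid>\S+) (?P<msgid>\S+) (?P<sd>-|\[.*?\]) ?(?P<msg>.*)"
)

VECTOR_LENGTH = 6  # illustrative dimension, chosen for this sketch


def encode_text(value: str) -> float:
    """Map arbitrary text to a float in [0, 1) using a hash digest.

    SHA-256 is a stand-in; the article's encoding method may differ.
    """
    digest = hashlib.sha256(value.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") / 2**64


def to_vector(line: str) -> list[float]:
    """Convert one log line into a vector of exactly VECTOR_LENGTH floats."""
    m = SYSLOG_RE.match(line)
    if m is None:
        # Unparseable input still yields a vector of the same length.
        return [0.0] * (VECTOR_LENGTH - 1) + [encode_text(line)]
    pri = int(m["pri"])
    try:
        ts = datetime.fromisoformat(m["ts"].replace("Z", "+00:00")).timestamp()
    except ValueError:
        ts = 0.0  # missing or malformed timestamp
    return [
        float(pri // 8),         # syslog facility
        float(pri % 8),          # syslog severity
        ts,                      # event time as a Unix timestamp
        encode_text(m["host"]),  # text fields encoded as numbers
        encode_text(m["app"]),
        encode_text(m["msg"]),
    ]


def build_dataset(lines: list[str]) -> list[list[float]]:
    """Accumulate the vectors of many log lines into one dataset."""
    return [to_vector(line) for line in lines]
```

For example, `build_dataset(["<34>1 2023-07-01T12:00:00Z host1 sshd 1234 ID47 - Failed password for root"])` returns a single six-element row whose first two values are the facility (4) and severity (2) decoded from the priority field. Because every row has the same length regardless of the input, the rows can be stacked directly into a training matrix.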

P. 351-359

 

