Journal "Software Engineering"
a journal on theoretical and applied science and technology
ISSN 2220-3397
Issue N5 2018 year
The algorithms for computation on strings find increasing use in various applied fields, such as word processing, data compression, cryptography, speech recognition, computational geometry, molecular biology. Their relevance has significantly increased in recent decades with the appearance of new subject areas that require efficient processing of giant character sequences. These areas, in particular, include computer vision, information search in global networks, computational genomics and some others. Many effective algorithms for texts comparing and analyzing are based on preprocessing — a preliminary study of the structure of a string. As a result of this processing, some auxiliary construction is built, which allows one to significantly reduce the number of redundant character comparisons. At this stage, only one of the processed strings (for example, a sample or text) is usually examined, and the other string compared can remain unknown. It is important that preprocessing takes an acceptable amount of time. This is followed by the phase of direct search or analysis of substrings, on which the earlier obtained information about the string is used to reduce the work. This article considers two of the most well-known and effective schemes for preliminary analysis and description of the strings structure — borders arrays (prefix-function) and Z-blocks (Z-function). The author carries out a comparative analysis of their expressive possibilities. He also proposes and justifies a new linear algorithm for transforming a borders array into an array of Z-blocks, discusses some of its modifications that can improve the efficiency of work. The results obtained can be applied to practical study of large character sequences.