ABSTRACTS OF ARTICLES OF THE JOURNAL "INFORMATION TECHNOLOGIES".
No. 1. Vol. 25. 2019

DOI: 10.17587/it.25.53-58

Yu. V. Polishuk, PhD, Associate Professor, Department of Computer Security and Mathematical Software of Information Systems, Orenburg State University

The Method of Storing Electronic Documents with Semistructured Content

In the course of an enterprise's work, accompanying operational documentation is produced, usually as a collection of documents. Documents of a single type share a general structure, but their layout and the order in which information is placed in them differ. This is because the documentation is prepared according to the internal standards of the enterprise or to GOST requirements, which define the content of the documents. The content of such documents is semistructured, i.e. it is represented by semistructured information: information in which a certain structure can be identified, but this structure is not completely known in advance or may change over time. Over a long period of operation, an enterprise accumulates a large volume of electronic documentation with semistructured content, usually in the form of MS Word documents, and minimizing its storage volume is a pressing task. The described method minimizes the storage volume of electronic documents with semistructured content by identifying a single form template for the documents, extracting the factual data from the documents, and then compressing the template and the factual data. A document is restored by extracting the form template and the factual data of the specified document from the archive and applying the form template to the recovered document content. The method compresses electronic documents with semistructured content up to 22 times more efficiently than standard archivers.
Keywords: semistructured data; information compression; electronic document processing

P. 53–58
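
The storage idea summarized in the abstract (one shared form template, per-document factual data, compression of both, and restoration by applying the template) can be illustrated with a minimal sketch. This is not the author's implementation: it works on plain text rather than MS Word files, and the names TEMPLATE, template_to_regex, pack, and unpack, the "{field}" placeholder syntax, and the use of JSON with zlib are all assumptions made for illustration.

```python
import json
import re
import zlib

# Hypothetical form template shared by all documents of one type.
# Fields are marked as {name}; the template must contain no other braces.
TEMPLATE = (
    "WELL INSPECTION REPORT\n"
    "Well: {well}\n"
    "Date: {date}\n"
    "Pressure, MPa: {pressure}\n"
    "Prepared in accordance with the internal standard of the enterprise.\n"
)


def template_to_regex(template):
    """Build a regex with one named group per {field} in the template."""
    parts = re.split(r"\{(\w+)\}", template)
    pattern = ""
    for i, part in enumerate(parts):
        # Even items are literal text, odd items are field names.
        pattern += re.escape(part) if i % 2 == 0 else f"(?P<{part}>.*?)"
    return re.compile(pattern, re.DOTALL)


def pack(documents, template):
    """Keep the template once plus each document's factual data, compressed."""
    regex = template_to_regex(template)
    facts = []
    for doc in documents:
        match = regex.fullmatch(doc)
        if match is None:
            raise ValueError("document does not follow the template")
        facts.append(match.groupdict())
    payload = json.dumps({"template": template, "facts": facts})
    return zlib.compress(payload.encode("utf-8"), 9)


def unpack(archive):
    """Restore every document by applying the template to its factual data."""
    payload = json.loads(zlib.decompress(archive).decode("utf-8"))
    return [payload["template"].format(**fact) for fact in payload["facts"]]


if __name__ == "__main__":
    # Illustrative documents generated from the hypothetical template.
    docs = [
        TEMPLATE.format(well=f"W-{n}", date=f"2018-05-{n:02d}", pressure=10 + n)
        for n in range(1, 21)
    ]
    archive = pack(docs, TEMPLATE)
    assert unpack(archive) == docs
    separate = sum(len(zlib.compress(d.encode("utf-8"), 9)) for d in docs)
    print(len(archive), "bytes packed together;", separate, "bytes compressed one by one")
```

In this sketch the boilerplate text of the form is stored only once, so the archive grows mainly with the factual data. The sizes it prints depend on the toy data and say nothing about the compression ratio reported for the method.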

 
