Journal "Software Engineering"
a journal on theoretical and applied science and technology
ISSN 2220-3397
Issue No. 6, 2018
The paper presents a solution to the problem of estimating the forecast error of the statistical probability that a feature sequence is present in the objects that verify the model of the object domain under study. The author treats the feature sequences of objects as n-grams: each object is an n-gram of fixed length. A population of "mutating" objects grouped by a given criterion (e.g. time) forms evolving multisets of variable cardinality. In this formalization, a common element of two multisets, called a meme-gram, indicates the presence of a feature sequence. The probability of this event is defined as a functional of the number of copies of the meme-gram. The model built on these axioms is called the meme-gram model. The solution focuses on estimating the relative frequency of repeated elements of multisets when their number can be forecast for only some of these elements. The proposed solution is particularly relevant to building knowledge representation models for self-learning systems when only a limited number of training examples is available from a large, continuously changing body of Big Data.
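The multiset formalization above can be illustrated with a minimal Python sketch. It assumes that objects are fixed-length n-grams extracted from a feature sequence, that each time window yields one multiset (a `collections.Counter`), and that a meme-gram's copy count is the minimum of its multiplicities in the two multisets (standard multiset intersection). The helper names (`ngrams`, `window_multiset`, `meme_grams`, `relative_frequency`) and the use of plain relative frequency in place of the probability functional are illustrative assumptions, not the author's definitions.

```python
from collections import Counter
from typing import List, Tuple

NGram = Tuple[str, ...]


def ngrams(features: List[str], n: int) -> List[NGram]:
    """Split a feature sequence into overlapping objects, i.e. n-grams of fixed length n."""
    return [tuple(features[i:i + n]) for i in range(len(features) - n + 1)]


def window_multiset(features: List[str], n: int) -> Counter:
    """Assumption: one time window gives one multiset of objects (n-gram -> number of copies)."""
    return Counter(ngrams(features, n))


def meme_grams(a: Counter, b: Counter) -> Counter:
    """Common elements of two multisets; each keeps the smaller of its two copy counts."""
    return a & b


def relative_frequency(common: Counter, gram: NGram) -> float:
    """Hypothetical stand-in for the probability functional: copies of one meme-gram
    divided by the total number of meme-gram copies."""
    total = sum(common.values())
    return common[gram] / total if total else 0.0


if __name__ == "__main__":
    w1 = window_multiset(["a", "b", "c", "a", "b"], 2)
    w2 = window_multiset(["b", "c", "a", "b", "b", "c"], 2)
    common = meme_grams(w1, w2)
    print(common)
    print(relative_frequency(common, ("a", "b")))
```

Here `("a", "b")` occurs twice in the first window and once in the second, so it enters the intersection with one copy, while `("b", "b")` appears only in the second window and is not a meme-gram. Multiset intersection is chosen because it matches the "common element of two multisets" wording directly; the paper's actual functional and forecasting of copy counts are not reproduced here.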