

Over the past few decades, machine learning (ML) algorithms have become a very useful tool in tasks where designing and programming explicit, rule-based algorithms is infeasible. Some examples of applications where machine learning has been applied successfully are spam filtering, optical character recognition (OCR), search engines and computer vision. One of the most common tasks in ML is supervised learning, where the goal is to learn a general model able to predict the correct label of unseen examples from a set of known labeled input data.

In supervised learning it is often assumed that the data are independent and identically distributed (i.i.d.). This means that each sample in the data set has the same probability distribution as the others and that all samples are mutually independent. However, classification problems in real-world databases can break this i.i.d. assumption. For example, consider the case of object recognition in image understanding. In this case, if one pixel belongs to a certain object category, it is very likely that neighboring pixels also belong to the same object, with the exception of the borders. Another example is the case of a laughter detection application from voice records. A laugh has a clear pattern alternating voice and non-voice segments; thus, discriminant information comes from the alternating pattern, and not just from the samples on their own. Another example can be found in the case of signature section recognition in an e-mail. In this case, the signature is usually found at the end of the mail, so important discriminant information is found in the context. Another case is part-of-speech tagging, in which each example describes a word that is categorized as noun, verb, adjective, etc. In this case, certain label patterns are very unlikely to occur. All these applications present a common feature: the sequence/context of the labels matters.

Sequential learning (25) breaks the i.i.d. assumption and assumes that samples are not independently drawn from a joint distribution of the data samples X and their labels Y. In sequential learning the training data actually consists of sequences of pairs (x, y), so that neighboring examples exhibit some kind of correlation. Sequential learning applications usually consider one-dimensional relationship support, but these types of relationships appear very frequently in other domains, such as images or video.

Sequential learning should not be confused with time series prediction. The main difference between both problems lies in the fact that sequential learning has access to the whole data set before any prediction is made, and the full set of labels is to be provided at the same time. Time series prediction, on the other hand, has access to the real labels only up to the current time t, and the goal is to predict the label at t + 1.

Another related but different problem is sequence classification. In this case, the problem is to predict a single label for an input sequence. If we consider the image domain, the goal of sequential learning is to classify the pixels of the image taking their context into account, while sequence classification is equivalent to classifying one full image as one class.

Sequential learning has been addressed from different perspectives: from the point of view of meta-learning, by means of sliding window techniques, recurrent sliding windows or stacked sequential learning, where the method is formulated as a combination of classifiers; or from the point of view of graphical models, using for example Hidden Markov Models or Conditional Random Fields. In this thesis, we are concerned with meta-learning strategies. (17) showed that stacked sequential learning (SSL from now on) performed better than CRF and HMM on a subset of problems called “sequential partitioning problems”. These problems are characterized by long runs of identical labels.
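The sliding-window and stacked sequential learning ideas mentioned above can be illustrated with a minimal NumPy sketch. The function names (`window_features`, `stack_predictions`), the zero-padding at the sequence ends, and the window half-width `w` are assumptions of this sketch, not details of the cited methods:

```python
import numpy as np

def window_features(X, w):
    """Concatenate each sample's feature vector with those of its w
    neighbors on each side (zero-padded at the sequence ends), so a
    sequential labeling task becomes ordinary classification."""
    T, d = X.shape
    padded = np.vstack([np.zeros((w, d)), X, np.zeros((w, d))])
    # Row t of the result holds X[t-w], ..., X[t], ..., X[t+w] side by side.
    return np.hstack([padded[i:i + T] for i in range(2 * w + 1)])

def stack_predictions(X, y_hat, w):
    """Second stage of a stacked-sequential-learning-style scheme:
    extend each sample's features with a window of the base
    classifier's predicted labels y_hat."""
    return np.hstack([X, window_features(y_hat.reshape(-1, 1), w)])
```

For a 5-sample sequence with 2 features per sample, `window_features(X, w=1)` yields a 5 x 6 matrix that any standard classifier can consume; `stack_predictions` then reuses the same windowing on the base classifier's predictions, which is how a second-stage classifier can exploit the label context (e.g. the long runs of identical labels that characterize sequential partitioning problems).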
