Suppose I want to predict whether a person belongs to class 1 (healthy) or class 2 (fever). I have a dataset whose observations take values in the following domain: {normal, cold, dizzy}
The transition matrix will contain the transition probabilities estimated from our training data set, while the initial vector will contain the probability that a person starts (day 1) in state x from the domain {normal, cold, dizzy}; this is also estimated from our training set.
If I want to build a first-order Markov chain (MC), I would generate a 3x3 transition matrix and a 1x3 initial vector for each class:
    > TransitionMatrix
            normal  cold  dizzy
    normal      NA    NA     NA
    cold        NA    NA     NA
    dizzy       NA    NA     NA

    > Initial Vector
          normal  cold  dizzy
    [1,]      NA    NA     NA
Each NA is filled with the corresponding probability.
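To make the first-order setup concrete, here is a minimal sketch of how the initial vector and the transition matrix for one class could be filled in by relative frequencies. The function name `estimate_first_order` and the `healthy` training sequences are hypothetical, made up for illustration; the sketch assumes each training example is a non-empty list of daily states.

```python
from collections import Counter

STATES = ["normal", "cold", "dizzy"]

def estimate_first_order(sequences, states=STATES):
    """Estimate (initial vector, transition matrix) by relative frequencies."""
    # Count which state each sequence starts with (day 1).
    start_counts = Counter(seq[0] for seq in sequences if seq)
    # Count every consecutive pair (today, tomorrow).
    pair_counts = Counter()
    for seq in sequences:
        pair_counts.update(zip(seq, seq[1:]))

    n_starts = sum(start_counts.values())
    initial = [start_counts[s] / n_starts for s in states]

    transition = []
    for a in states:
        row_total = sum(pair_counts[(a, b)] for b in states)
        # A state never seen as a predecessor gets an all-zero row here;
        # in practice some smoothing would probably be needed.
        transition.append(
            [pair_counts[(a, b)] / row_total if row_total else 0.0 for b in states]
        )
    return initial, transition

# Hypothetical training sequences for the class "healthy".
healthy = [
    ["normal", "normal", "cold"],
    ["normal", "cold", "normal"],
    ["cold", "normal", "normal"],
    ["normal", "normal", "normal"],
]
initial, transition = estimate_first_order(healthy)
```

The same function would be called once per class, giving one initial vector and one 3x3 matrix for healthy and another pair for fever.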
1 - My question is about the transition matrix of a higher-order chain. For example, in a second-order MC, would we have a transition matrix of size domain² x domain², like this:
                    normal->normal  normal->cold  normal->dizzy  cold->normal  cold->cold  cold->dizzy  dizzy->normal  dizzy->cold  dizzy->dizzy
    normal->normal              NA            NA             NA            NA          NA           NA             NA           NA            NA
    normal->cold                NA            NA             NA            NA          NA           NA             NA           NA            NA
    normal->dizzy               NA            NA             NA            NA          NA           NA             NA           NA            NA
    cold->normal                NA            NA             NA            NA          NA           NA             NA           NA            NA
    cold->cold                  NA            NA             NA            NA          NA           NA             NA           NA            NA
    cold->dizzy                 NA            NA             NA            NA          NA           NA             NA           NA            NA
    dizzy->normal               NA            NA             NA            NA          NA           NA             NA           NA            NA
    dizzy->cold                 NA            NA             NA            NA          NA           NA             NA           NA            NA
    dizzy->dizzy                NA            NA             NA            NA          NA           NA             NA           NA            NA
Here, cell (1,1) would represent the following sequence: normal->normal->normal->normal
Or would it instead simply be of size domain² x domain, as follows:
                    normal  cold  dizzy
    normal->normal      NA    NA     NA
    normal->cold        NA    NA     NA
    normal->dizzy       NA    NA     NA
    cold->normal        NA    NA     NA
    cold->cold          NA    NA     NA
    cold->dizzy         NA    NA     NA
    dizzy->normal       NA    NA     NA
    dizzy->cold         NA    NA     NA
    dizzy->dizzy        NA    NA     NA
Here, cell (1,1) represents normal->normal->normal, which is different from the previous representation.
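To illustrate what I mean by the domain² x domain layout, here is a minimal sketch that fills such a table from counts of three consecutive states. The function name `estimate_second_order` and the `healthy` sequences are hypothetical; each row is keyed by the pair (state two days ago, state yesterday) and holds the probabilities of today's state. This only shows one reading of that layout, not a claim that it is the correct one.

```python
from collections import Counter
from itertools import product

STATES = ["normal", "cold", "dizzy"]

def estimate_second_order(sequences, states=STATES):
    """Build a domain^2 x domain table: (prev2, prev1) -> probabilities of next."""
    # Count every run of three consecutive states.
    triple_counts = Counter()
    for seq in sequences:
        triple_counts.update(zip(seq, seq[1:], seq[2:]))

    table = {}
    for prev2, prev1 in product(states, repeat=2):
        total = sum(triple_counts[(prev2, prev1, nxt)] for nxt in states)
        # Pairs never observed in training get an all-zero row (no smoothing).
        table[(prev2, prev1)] = [
            triple_counts[(prev2, prev1, nxt)] / total if total else 0.0
            for nxt in states
        ]
    return table

# Hypothetical training sequences for the class "healthy".
healthy = [
    ["normal", "normal", "normal", "cold"],
    ["normal", "normal", "cold", "normal"],
    ["cold", "normal", "normal", "normal"],
]
table = estimate_second_order(healthy)
```

With three states, `table` has 9 rows of 3 entries each, and the entry for `("normal", "normal")` at position 0 corresponds to normal->normal->normal.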
2 - What about the initial vector for a second-order MC? Do we need two initial vectors of size 1 x domain, like this:
          normal  cold  dizzy
    [1,]      NA    NA     NA
This leads to two initial vectors per class: the first gives the probability of each state in {normal, cold, dizzy} on the first day for the class healthy/fever, and the second gives the probability of each state on the second day for healthy/fever. This gives 4 initial vectors in total.
Or do we need only one initial vector of size 1 x domain², as follows:
          normal->normal  normal->cold  normal->dizzy  cold->normal  cold->cold  cold->dizzy  dizzy->normal  dizzy->cold  dizzy->dizzy
    [1,]              NA            NA             NA            NA          NA           NA             NA           NA            NA
I can see how the second way of representing the initial vector would be problematic if we want to classify an observation that contains only one state.
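To make that concern concrete, here is a minimal sketch, assuming the 1 x domain² vector is estimated from the first two days of each training sequence. The names `estimate_pair_initial` and `day1_marginal` and the `healthy` data are hypothetical. It also shows that a day-1 distribution can be recovered from the pair vector by summing over the second state, which might be one way around the single-state problem, though I am not sure it is the intended approach.

```python
from collections import Counter
from itertools import product

STATES = ["normal", "cold", "dizzy"]

def estimate_pair_initial(sequences, states=STATES):
    """Estimate the 1 x domain^2 initial vector over (day 1, day 2) pairs."""
    # Only sequences with at least two days contribute a starting pair.
    pair_counts = Counter((seq[0], seq[1]) for seq in sequences if len(seq) >= 2)
    total = sum(pair_counts.values())
    return {pair: pair_counts[pair] / total for pair in product(states, repeat=2)}

def day1_marginal(pair_initial, states=STATES):
    """Sum the pair vector over the day-2 state to get day-1 probabilities."""
    return {a: sum(pair_initial[(a, b)] for b in states) for a in states}

# Hypothetical training sequences for the class "healthy".
healthy = [
    ["normal", "normal", "cold"],
    ["normal", "cold", "normal"],
    ["cold", "normal", "normal"],
    ["normal", "normal", "normal"],
]
pair_initial = estimate_pair_initial(healthy)
day1 = day1_marginal(pair_initial)
```

Here `pair_initial` has 9 entries summing to 1, and `day1` is a 3-entry distribution that could score an observation consisting of a single state.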