What you should do ...
An honest, if unglamorous, suggestion is to write a for loop that counts all the transitions and state-emission pairs present in the sequences, then normalize the rows of the two resulting matrices (transition and emission) so that each row sums to 1. This is what hmmestimate does in the end, and it is probably how you should do it.
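The counting approach described above can be sketched as follows. This is a minimal illustration, not a drop-in replacement for hmmestimate: the toy data, the cell-array layout, and the names num_states, num_symbols, tr_counts and em_counts are all assumptions made for the example.

```matlab
% Hypothetical toy data: each cell holds one observed sequence and its state path.
seqs   = {[1 1 2 1], [2 2 1]};
states = {[1 1 2 2], [2 2 1]};
num_states  = 2;   % assumed known in advance
num_symbols = 2;   % assumed known in advance

tr_counts = zeros(num_states, num_states);
em_counts = zeros(num_states, num_symbols);

for k = 1:numel(seqs)
    x = seqs{k};
    s = states{k};
    for t = 1:numel(s)
        % count the emission of symbol x(t) from state s(t)
        em_counts(s(t), x(t)) = em_counts(s(t), x(t)) + 1;
        % count the transition s(t) -> s(t+1), never across sequence boundaries
        if t < numel(s)
            tr_counts(s(t), s(t+1)) = tr_counts(s(t), s(t+1)) + 1;
        end
    end
end

% normalize each row so that it sums to 1
tr_hat = bsxfun(@rdivide, tr_counts, sum(tr_counts, 2));
em_hat = bsxfun(@rdivide, em_counts, sum(em_counts, 2));
```

Because transitions are only counted inside each sequence, nothing leaks across sequence boundaries. A state that never appears would give a row of zeros and therefore NaN after normalization, so real data may need a guard for that case.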
However, let's go ahead and force the square peg into the round hole ...
What you could do
If you concatenate your sequences, the result can be run through hmmestimate. This gives the correct emission matrix, but the spurious transitions between adjacent sequences will contaminate the transition probabilities. The trick around this is to augment each sequence with a new, unique state and a corresponding emission. All the information about the concatenation points is then confined to a subset of the output matrices that you can discard.
Example
Let's generate some data so the input is clear.
% true transition and emission probabilities
tr = [0.9 0.1; 0.05 0.95];
em = [0.9 0.1; 0.2 0.8];
num_seqs = 100;
seq_len = 100;
seqs = zeros(num_seqs,seq_len);
states = zeros(num_seqs,seq_len);
% generate some sequences
for i = 1:num_seqs
    [seqs(i,:), states(i,:)] = hmmgenerate(seq_len,tr,em);
end
Using hmmestimate to estimate
Note that MATLAB represents states as consecutive integers, so we need to use the next unused integer as our separator token. In this example, that is 3.
% augment the sequences
seqs_aug = [3*ones(num_seqs,1) seqs];
states_aug = [3*ones(num_seqs,1) states];
% concatenate the rows, and estimate
% credit: http:
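The concatenate, estimate and discard steps might look like the sketch below. It is written under assumptions: the names seqs_cat, states_cat, tr_aug and em_aug are made up for illustration, the data setup is repeated only so the block runs on its own, and hmmgenerate and hmmestimate require the Statistics and Machine Learning Toolbox.

```matlab
% Setup repeated from the example so this block is self-contained.
rng(1);
tr = [0.9 0.1; 0.05 0.95];
em = [0.9 0.1; 0.2 0.8];
num_seqs = 100;
seq_len = 100;
seqs = zeros(num_seqs, seq_len);
states = zeros(num_seqs, seq_len);
for i = 1:num_seqs
    [seqs(i,:), states(i,:)] = hmmgenerate(seq_len, tr, em);
end

% Prefix every sequence with the separator state/emission 3.
seqs_aug = [3*ones(num_seqs,1) seqs];
states_aug = [3*ones(num_seqs,1) states];

% Concatenate row by row into one long vector (transpose first, because
% reshape walks down columns).
seqs_cat = reshape(seqs_aug.', 1, []);      % hypothetical name
states_cat = reshape(states_aug.', 1, []);  % hypothetical name

% Estimate on the augmented data; both outputs are 3-by-3 here.
[tr_aug, em_aug] = hmmestimate(seqs_cat, states_cat);

% Discard the separator's row and column.
tr_hat = tr_aug(1:2, 1:2);
em_hat = em_aug(1:2, 1:2);

% The real states sometimes move to the separator state, so their rows
% lost some mass when column 3 was dropped; renormalize them.
tr_hat = bsxfun(@rdivide, tr_hat, sum(tr_hat, 2));
% The real states never emit the separator symbol, so the rows of em_hat
% already sum to 1.
```

Dropping the third row and column removes every transition that touches a sequence boundary, which is why only the transition rows need renormalizing.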
Using rng(1) before creating the data above, this gives
tr_hat % [0.9008 0.0992; 0.0490 0.9510]
em_hat % [0.9090 0.0910; 0.1950 0.8050]