Counting runs in matrix columns

I have a matrix of 1s and -1s with randomly interspersed 0s:

    %// create matrix of 1s and -1s
    hypwayt = randn(10,5);
    hypwayt(hypwayt > 0) = 1;
    hypwayt(hypwayt < 0) = -1;

    %// create numz random indices at which to insert 0s (pairs of indices may
    %// repeat, so final number of inserted zeros may be < numz)
    numz = 15;
    a = 1;
    b = 10;
    r = round((b-a).*rand(numz,1) + a);
    s = round((5-1).*rand(numz,1) + a);
    for nx = 1:numz
        hypwayt(r(nx),s(nx)) = 0;
    end

Input:

    hypwayt =
        -1     1     1     1     1
         1    -1     1     1     1
         1    -1     1     0     0
        -1     1     0    -1     1
         1    -1     0     0     0
        -1     1    -1    -1    -1
         1     1     0     1    -1
         0     1    -1     1    -1
        -1     0     1     1     0
         1    -1     0    -1    -1

I want to count how many times each nonzero element is repeated consecutively within its column, to produce something like this:

The main idea (provided by @rayryeng): for each column, independently of the others, every time you encounter a new number you start a running count, and it increments every time you encounter the same number as the previous one. As soon as you encounter a different number, the count resets to 1, unless that number is 0, in which case the output is 0.

Expected Result:

    hypwayt_runs =
         1     1     1     1     1
         1     1     2     2     2
         2     2     3     0     0
         1     1     0     1     1
         1     1     0     0     0
         1     1     1     1     1
         1     2     0     1     2
         0     3     1     2     3
         1     0     1     3     0
         1     1     0     1     1

What is the cleanest way to accomplish this?

4 answers

There is probably a better way, but this should work.

Using cumsum, diff, accumarray and bsxfun:

    %// A is the input matrix (hypwayt in the question)
    %// doing the 'diff' along default dim to get the adjacent equality
    out = [ones(1,size(A,2));diff(A)];
    %// Putting all other elements other than zero to 1
    out(out ~= 0) = 1;
    %// getting all the indexes of 0 elements
    ind = find(out == 0);
    %// doing 'diff' on indices to find adjacent indices
    out1 = [0;diff(ind)];
    %// Putting all those elements which are 1 to zero and rest to 1
    %// (a gap other than 1 starts a new group of adjacent indices)
    out1 = out1 ~= 1;
    %// counting each unique group number of elements
    out1 = accumarray(cumsum(out1),1);
    %// Creating a mask for next operation
    mask = bsxfun(@le, (1:max(out1)).',out1.');
    %// Doing colon operation from 2 to maxsize
    out1 = bsxfun(@times,mask,(2:size(mask,1)+1).');
    %// Assign the values from the out1 to corresponding indices of out
    out(ind) = out1(mask);
    %// finally replace all elements of A which were zero to zero
    out(A==0) = 0

Results:

Input:

    >> A
    A =
        -1     1     1     1     1
         1    -1     1     1     1
         1    -1     1     0     0
        -1     1     0    -1     1
         1    -1     0     0     0
        -1     1    -1    -1    -1
         1     1     0     1    -1
         0     1    -1     1    -1
        -1     0     1     1     0
         1    -1     0    -1    -1

Output:

    >> out
    out =
         1     1     1     1     1
         1     1     2     2     2
         2     2     3     0     0
         1     1     0     1     1
         1     1     0     0     0
         1     1     1     1     1
         1     2     0     1     2
         0     3     1     2     3
         1     0     1     3     0
         1     1     0     1     1
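For readers without MATLAB, here is a rough pure-Python sketch of the same pipeline, useful for following what each step produces. Assumptions of this sketch: the matrix is a list of rows, it is flattened column-major to mimic MATLAB's linear indexing, itertools.accumulate plays the role of cumsum, collections.Counter stands in for accumarray(..., 1), and the function name run_counts_grouped is hypothetical.

```python
from collections import Counter
from itertools import accumulate


def run_counts_grouped(A):
    """Column-wise run counts, mirroring the diff / cumsum / accumarray / bsxfun steps."""
    n_rows, n_cols = len(A), len(A[0])
    # Flatten column-major, like MATLAB's A(:)
    flat = [A[r][c] for c in range(n_cols) for r in range(n_rows)]
    # out: 1 where a column starts or the value differs from the one above, 0 otherwise
    out = [1 if i % n_rows == 0 or flat[i] != flat[i - 1] else 0
           for i in range(len(flat))]
    # ind: linear indices of elements equal to the element directly above them
    ind = [i for i, v in enumerate(out) if v == 0]
    # A new group starts wherever the gap to the previous index is not exactly 1
    starts = [1 if k == 0 or ind[k] - ind[k - 1] != 1 else 0
              for k in range(len(ind))]
    group_id = list(accumulate(starts))   # cumsum -> group number of each index
    sizes = Counter(group_id)             # accumarray(..., 1) -> size of each group
    # A group of size n receives 2, 3, ..., n + 1 (what the bsxfun mask produces)
    values = [v for g in sorted(sizes) for v in range(2, sizes[g] + 2)]
    for i, v in zip(ind, values):
        out[i] = v
    # Elements that were 0 in the input are forced back to 0
    out = [0 if x == 0 else o for x, o in zip(flat, out)]
    # Un-flatten back to a list of rows
    return [[out[c * n_rows + r] for c in range(n_cols)] for r in range(n_rows)]
```

Because row 1 of every column is forced to 1, adjacent linear indices can never join runs from two different columns, which is why grouping by "gap equals 1" is safe.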

As motivation for Dev-IL, here is a loop-based solution. Although the code is readable, I'd expect it to be slow, because it has to visit each element individually.

    hypwayt = [-1  1  1  1  1;
                1 -1  1  1  1;
                1 -1  1  0  0;
               -1  1  0 -1  1;
                1 -1  0  0  0;
               -1  1 -1 -1 -1;
                1  1  0  1 -1;
                0  1 -1  1 -1;
               -1  0  1  1  0;
                1 -1  0 -1 -1];

    %// Initialize output array: 1 for nonzero elements, 0 for zeros
    %// (this also covers zeros in the first row, which the loop never visits)
    out = double(hypwayt ~= 0);

    %// For each column
    for idx = 1 : size(hypwayt, 2)
        %// Previous value initialized as the first row
        prev = hypwayt(1,idx);

        %// For each row after this point...
        for idx2 = 2 : size(hypwayt,1)
            %// If the current value isn't equal to the previous value...
            if hypwayt(idx2,idx) ~= prev
                %// Set the new previous value
                prev = hypwayt(idx2,idx);
                %// Case for 0
                if hypwayt(idx2,idx) == 0
                    out(idx2,idx) = 0;
                end
                %// Else, reset the value to 1
                %// Already done by initialization
            %// If equal, increment
            %// Must also check for 0
            else
                if hypwayt(idx2,idx) ~= 0
                    out(idx2,idx) = out(idx2-1,idx) + 1;
                else
                    out(idx2,idx) = 0;
                end
            end
        end
    end

Output

    >> out
    out =
         1     1     1     1     1
         1     1     2     2     2
         2     2     3     0     0
         1     1     0     1     1
         1     1     0     0     0
         1     1     1     1     1
         1     2     0     1     2
         0     3     1     2     3
         1     0     1     3     0
         1     1     0     1     1
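To make the counting rule concrete outside MATLAB, here is the same per-column loop sketched in pure Python. It is only an illustration under these assumptions: the matrix is a plain list of rows, and the function name run_counts is hypothetical. Instead of tracking prev, it compares each element with the one directly above it, which amounts to the same thing.

```python
def run_counts(matrix):
    """Column-wise run counts: a repeated nonzero value extends the count above it,
    a value that differs from the one above restarts at 1, and every 0 maps to 0."""
    n_rows = len(matrix)
    n_cols = len(matrix[0]) if n_rows else 0
    out = [[0] * n_cols for _ in range(n_rows)]
    for c in range(n_cols):
        for r in range(n_rows):
            v = matrix[r][c]
            if v == 0:
                out[r][c] = 0                   # zeros always map to 0
            elif r > 0 and matrix[r - 1][c] == v:
                out[r][c] = out[r - 1][c] + 1   # same as the element above: extend the run
            else:
                out[r][c] = 1                   # first row or a new value: start a new run
    return out


hypwayt = [
    [-1,  1,  1,  1,  1],
    [ 1, -1,  1,  1,  1],
    [ 1, -1,  1,  0,  0],
    [-1,  1,  0, -1,  1],
    [ 1, -1,  0,  0,  0],
    [-1,  1, -1, -1, -1],
    [ 1,  1,  0,  1, -1],
    [ 0,  1, -1,  1, -1],
    [-1,  0,  1,  1,  0],
    [ 1, -1,  0, -1, -1],
]

for row in run_counts(hypwayt):
    print(row)
```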

Based on rayryeng's answer, here is my loop-based solution.

Inputs

    hypwayt = [ -1  1  1  1  1
                 1 -1  1  1  1
                 1 -1  1  0  0
                -1  1  0 -1  1
                 1 -1  0  0  0
                -1  1 -1 -1 -1
                 1  1  0  1 -1
                 0  1 -1  1 -1
                -1  0  1  1  0
                 1 -1  0 -1 -1 ];

    expected_out = [ 1  1  1  1  1
                     1  1  2  2  2
                     2  2  3  0  0
                     1  1  0  1  1
                     1  1  0  0  0
                     1  1  1  1  1
                     1  2  0  1  2
                     0  3  1  2  3
                     1  0  1  3  0
                     1  1  0  1  1 ];

Actual code:

    CNT_INIT = 2;              %// a constant representing an initialized counter
    out = hypwayt;             %// "preallocation"
    out(2:end,:) = diff(out);  %// ...we'll deal with the top row later
    hyp_nnz = hypwayt~=0;      %// nonzero mask for later brevity
    cnt = CNT_INIT;            %// first initialization of the counter

    for ind1 = 2:numel(out)
        switch abs(out(ind1))
            case 2 %// switch from -1 to 1 and vice versa:
                out(ind1) = 1;
                cnt = CNT_INIT;
            case 0 %// means we have the same number again:
                out(ind1) = cnt*hyp_nnz(ind1); %// put cnt unless we're zero
                cnt = cnt+1;
            case 1 %// means we transitioned to/from zero:
                out(ind1) = hyp_nnz(ind1); %// was it a nonzero element?
                cnt = CNT_INIT;
        end
    end

    %// Finally, take care of the top row:
    out(1,:) = hyp_nnz(1,:);

Validation:

 assert(isequal(out,expected_out)) 

I suppose this could be simplified further by using some of the more "complex" MATLAB functions, but IMHO it looks pretty elegant :)

Note: the top row of out is computed twice (once in the loop, for every column except the first, and once at the end), which is a small inefficiency. However, this allows all the logic to live in a single loop over numel(), which, in my opinion, justifies this tiny bit of extra computation.
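A rough Python sketch of the same single-pass idea follows, for checking the case analysis on the |diff| values. Assumptions: entries are only -1, 0 and 1 (so the absolute difference between vertical neighbours is 0, 1 or 2), the matrix is a list of rows flattened column-major to mimic MATLAB's linear indexing, and run_counts_linear is a hypothetical name.

```python
CNT_INIT = 2  # counter value given to the second element of a run


def run_counts_linear(A):
    """Single linear pass over a column-major flattening, classifying each step by |diff|."""
    n_rows, n_cols = len(A), len(A[0])
    flat = [A[r][c] for c in range(n_cols) for r in range(n_rows)]  # like hypwayt(:)
    nnz = [x != 0 for x in flat]                                     # like hypwayt ~= 0
    # out(2:end,:) = diff(out): every element except a column's top one becomes
    # its difference with the element above; top elements keep their raw value
    out = [flat[i] if i % n_rows == 0 else flat[i] - flat[i - 1]
           for i in range(len(flat))]
    cnt = CNT_INIT
    for i in range(1, len(out)):
        step = abs(out[i])
        if step == 2:        # -1 -> 1 or 1 -> -1: a new run starts
            out[i] = 1
            cnt = CNT_INIT
        elif step == 0:      # same value again: continue the run (zeros stay 0)
            out[i] = cnt * nnz[i]
            cnt += 1
        else:                # step == 1: transition to or from zero
            out[i] = int(nnz[i])
            cnt = CNT_INIT
    # Finally, take care of the top row, as out(1,:) = hyp_nnz(1,:) does
    for c in range(n_cols):
        out[c * n_rows] = int(nnz[c * n_rows])
    return [[out[c * n_rows + r] for c in range(n_cols)] for r in range(n_rows)]
```

Crossing a column boundary is harmless here: the top element of each column keeps its raw value (abs 0 or 1), so either the counter is reset or the element is zeroed, and the final top-row pass fixes the very first element, which the loop skips.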


This is a good problem, and since @rayryeng did not offer a vectorized solution, here's mine in just a few lines (okay, that's dishonest: it took me half a day to get it working). The basic idea is to use cumsum as the final step.

    p = size(hypwayt,2); % keep nb of columns in mind
    % H1 is the mask of consecutive identical values, but kept as an array of double (it will be incremented later).
    % An extra row of zeros is appended as a sentinel, so that a run reaching the bottom of a column
    % is closed within that column instead of offsetting the next column.
    H1 = [zeros(1,p); diff(hypwayt)==0; zeros(1,p)];
    % H2 marks the element right after each sequence of identical values ends. Note the first line of trues.
    H2 = [true(1,p); diff(~H1)>0];
    % 1st trick: compute the vectorized cumsum of H1
    H3 = cumsum(H1(:));
    % 2nd trick: take the diff of H3(H2).
    % It results in a vector of the lengths of consecutive sequences of identical values, interleaved with some zeros.
    % Subtract it from H1 at the same locations
    H1(H2) = H1(H2)-[0;diff(H3(H2))];
    % H1 is ready to be cumsummed! Add one to the array, since all lengths are decreased by one.
    Output = cumsum(H1)+1;
    % Drop the sentinel row
    Output = Output(1:end-1,:);
    % Last, force input zeros to be zero
    Output(hypwayt==0) = 0;

And the expected result:

    Output =
         1     1     1     1     1
         1     1     2     2     2
         2     2     3     0     0
         1     1     0     1     1
         1     1     0     0     0
         1     1     1     1     1
         1     2     0     1     2
         0     3     1     2     3
         1     0     1     3     0
         1     1     0     1     1

Let me add some explanation. The big trick, of course, is the second one: it took me a while to figure out how to quickly compute the lengths of consecutive identical values. The first one is just a small trick to do everything without any loops. If you cumsum H1 directly, you get the result plus some offsets. These offsets are removed in a cumsum-compatible way: take the local differences of H3 at some key positions and subtract them from H1 right after the end of each sequence. There are quite a few of these special positions, and I also include the first row (the first row of H2): each element of the first row is considered different from the last element of the previous column.

Hopefully this is clearer now (and I hope there is no flaw in some special case...).
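The offset-removal trick is easier to see on a single column. Below is a pure-Python sketch; the names column_run_counts and run_counts_cumsum are hypothetical, and itertools.accumulate stands in for cumsum. Unlike the MATLAB code, it handles one column at a time, so no cross-column bookkeeping is needed, and it appends a zero sentinel so that a run touching the bottom of the column is closed too.

```python
from itertools import accumulate


def column_run_counts(col):
    """Run counts for one column via cumsum with offset correction."""
    n = len(col)
    # H1: 1 where an element equals the one above it (the first element never does)
    h1 = [0] + [1 if col[i] == col[i - 1] else 0 for i in range(1, n)]
    h1.append(0)  # sentinel so that a run reaching the bottom is also closed
    # H2: the first position, plus every position right after a run of 1s in H1 ends
    h2 = [i == 0 or (h1[i - 1] == 1 and h1[i] == 0) for i in range(len(h1))]
    # H3: running sum of H1, computed before H1 is modified
    h3 = list(accumulate(h1))
    # At each H2 position, subtract the number of 1s accumulated since the previous one
    keys = [i for i, flag in enumerate(h2) if flag]
    for prev, cur in zip(keys, keys[1:]):
        h1[cur] -= h3[cur] - h3[prev]
    # A plain cumsum now restarts at 0 after every run; +1 turns it into run lengths
    counts = [s + 1 for s in accumulate(h1)][:n]  # drop the sentinel
    return [0 if x == 0 else c for x, c in zip(col, counts)]


def run_counts_cumsum(matrix):
    """Apply column_run_counts to every column of a list-of-rows matrix."""
    cols = [column_run_counts(list(col)) for col in zip(*matrix)]
    return [list(row) for row in zip(*cols)]
```

For example, the column [1, 1, 0, 0, -1] gives H1 = [0, 1, 0, 1, 0, 0] (with sentinel), the corrections turn it into [0, 1, -1, 1, -1, 0], and cumsum + 1 yields [1, 2, 1, 2, 1], which becomes [1, 2, 0, 0, 1] once the input zeros are forced back to 0.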
