Inconsistent Matlab regex behavior with tokens

I have a set of strings (a few thousand) that I need to parse. They look like this:

'22-213-1-0,0'
'4-23-1-1,0'
'85-572A-1-1,0'
'3-13-1-1,0'
'6-58A-1-1,0'

I want the first number, the second number, and the letter (if it exists) to be returned separately:

'22' '213' ''
'4'  '23'  ''
'85' '572' 'A'
'3'  '13'  ''
'6'  '58'  'A' 

I used regex for this:

input = {'22-213-1-0,0' '4-23-1-1,0' '85-572A-1-1,0' '3-13-1-1,0' '6-58A-1-1,0'}'

test='(\d*)+[-]+(\d*)+(\w)+[-]\w*';

for i=1:length(input)

    parsedstring=regexp(input(i),test,'tokens');
    output(i,1)=cellfun(@str2num,parsedstring{1}{1}(1));
    output(i,2)=cellfun(@str2num,parsedstring{1}{1}(2));
    letter(i)=parsedstring{1}{1}(3);
end

But the results seem inconsistent:

output =

22    21
 4     2
85   572
 3     1
 6    58

letter =

'3'    '3'    'A'    '3'    'A'

Why does the regex sometimes drop the last digit of the second number? I thought this might happen when the first number is only one digit long, but the last row shows that a one-digit first number is sometimes parsed correctly. What am I missing?

+4
3 answers

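You need \w?, not \w. In your pattern (\w)+ has to match at least one character, and \w also matches digits, so when there is no letter it takes the last digit of the second number. With the letter made optional and the extra quantifiers removed, test becomes,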

test='(\d*)-(\d*)(\w?)-.*';
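
As a rough sketch of how this pattern could be dropped into the loop from the question: the variable name strs is an assumption (chosen to avoid shadowing the built-in input function), and str2double with the 'once' option are substitutions for the original cellfun(@str2num, ...) calls, not something the answer itself specifies.

```matlab
% Sample strings from the question (renamed from "input" to avoid shadowing input()).
strs = {'22-213-1-0,0'; '4-23-1-1,0'; '85-572A-1-1,0'; '3-13-1-1,0'; '6-58A-1-1,0'};

% Pattern from this answer: two digit groups, then an optional word character.
test = '(\d*)-(\d*)(\w?)-.*';

n = numel(strs);
output = zeros(n, 2);   % first and second number
letter = cell(n, 1);    % '' when no letter is present

for i = 1:n
    % With 'once', regexp returns the tokens of the first match as a flat cell array.
    tok = regexp(strs{i}, test, 'tokens', 'once');
    output(i, 1) = str2double(tok{1});
    output(i, 2) = str2double(tok{2});
    letter{i}    = tok{3};
end
```

Note that \w still matches digits and underscores; because \d* is greedy it consumes every digit first, so the third token only ever receives a trailing letter for strings shaped like the samples.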
+1

This captures the 2 numbers (and the letter, if there is one):

(\d+)-(\d+)([a-zA-Z])?.*
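
A variation on the same idea, sketched with named tokens so the pieces can be read back by name. The field names first, second and letter are made up for illustration, and the ? is moved inside the last group here (an assumption, so that the group always returns a value, empty when there is no letter).

```matlab
% Named tokens: (?<name>...) lets regexp return a struct when 'names' is requested.
pat = '(?<first>\d+)-(?<second>\d+)(?<letter>[a-zA-Z]?)';

% A string with a letter after the second number.
s = regexp('85-572A-1-1,0', pat, 'names', 'once');
first  = str2double(s.first);
second = str2double(s.second);
letter = s.letter;

% A string without a letter: the letter field comes back empty.
t = regexp('4-23-1-1,0', pat, 'names', 'once');
noLetter = isempty(t.letter);
```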
+1

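As already noted, the letter has to be optional, and I would also anchor the pattern to the start of the string with ^. Putting that together: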

^(\d+)-(\d+)(\w?)-

The ? is a quantifier (like + and *) that means "zero or one".
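
A minimal sketch of the anchored pattern applied to all strings at once, assuming the sample data from the question. The names strs, pat, matched, numbers and letters are illustrative, and the use of cellfun with str2double is one possible approach rather than the only one.

```matlab
% Sample strings from the question (named strs to avoid shadowing input()).
strs = {'22-213-1-0,0'; '4-23-1-1,0'; '85-572A-1-1,0'; '3-13-1-1,0'; '6-58A-1-1,0'};

% ^ anchors the match at the start; (\w?) captures zero or one word character.
pat = '^(\d+)-(\d+)(\w?)-';

% One 1x3 cell of tokens per string; a string that does not match yields an empty cell.
tok = cellfun(@(s) regexp(s, pat, 'tokens', 'once'), strs, 'UniformOutput', false);
matched = ~cellfun(@isempty, tok);

% Stack the token rows into an N-by-3 cell array and convert the numeric columns.
tok = vertcat(tok{matched});
numbers = str2double(tok(:, 1:2));
letters = tok(:, 3);
```

Keeping the matched mask makes it possible to tell which input rows were skipped if some strings do not follow the expected layout.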

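Also, the extra quantifiers are not needed. For example, in (\d*)+, the group can already match any number of digits, so the outer + adds nothing. The same goes for [-]+, which can be written as a plain -.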

0