Oracle REGEXP_SUBSTR doesn't honor null values

I have a problem with REGEXP_SUBSTR not honoring a null value.

select REGEXP_SUBSTR ('2035197553,2,S,14-JUN-14,,P', '[^,]+', 1, 1) AS phn_nbr,
       REGEXP_SUBSTR ('2035197553,2,S,14-JUN-14,,P', '[^,]+', 1, 2) AS phn_pos,
       REGEXP_SUBSTR ('2035197553,2,S,14-JUN-14,,P', '[^,]+', 1, 3) AS phn_typ,
       REGEXP_SUBSTR ('2035197553,2,S,14-JUN-14,,P', '[^,]+', 1, 4) AS phn_strt_dt,
       REGEXP_SUBSTR ('2035197553,2,S,14-JUN-14,,P', '[^,]+', 1, 5) AS phn_end_dt,
       REGEXP_SUBSTR ('2035197553,2,S,14-JUN-14,,P', '[^,]+', 1, 6) AS pub_indctr
from dual;

If phn_end_dt is null and pub_indctr is not null, pub_indctr is shifted to phn_end_dt.

Result:

PHN_NBR    PHN_POS PHN_TYP PHN_STRT_DT PHN_END_DT PUB_INDCTR
---------- ------- ------- ----------- ---------- ------------
2035197553 2       S       14-JUN-14   P

It should be:

PHN_NBR    PHN_POS PHN_TYP PHN_STRT_DT PHN_END_DT PUB_INDCTR
---------- ------- ------- ----------- ---------- ------------
2035197553 2       S       14-JUN-14              P

Any suggestions?

+1
oracle regex
Aug 27 '14 at 14:13
5 answers

Thanks for pointing me in the right direction, I used this to solve the problem.

SELECT REGEXP_SUBSTR (val, '([^,]*),|$', 1, 1, NULL, 1) phn_nbr,
       REGEXP_SUBSTR (val, '([^,]*),|$', 1, 2, NULL, 1) phn_pos,
       REGEXP_SUBSTR (val, '([^,]*),|$', 1, 3, NULL, 1) phn_typ,
       REGEXP_SUBSTR (val, '([^,]*),|$', 1, 4, NULL, 1) phn_strt_dt,
       REGEXP_SUBSTR (val, '([^,]*),|$', 1, 5, NULL, 1) phn_end_dt,
       REGEXP_SUBSTR (val || ',', '([^,]*),|$', 1, 6, NULL, 1) pub_indctr
FROM (SELECT '2035197553,2,S,14-JUN-14,,P' val FROM dual);

Oracle version: Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production

0
Aug 14 '15 at 13:14

You can solve your problem as follows:

with t(val) as (
  select '2035197553,2,S,14-JUN-14,,P' from dual
),
t1 (val) as (
  select ',' || val || ',' from t
)
select substr(val, REGEXP_INSTR(val, ',', 1, 1) + 1, REGEXP_INSTR(val, ',', 1, 1 + 1) - REGEXP_INSTR(val, ',', 1, 1) - 1) a,
       substr(val, REGEXP_INSTR(val, ',', 1, 2) + 1, REGEXP_INSTR(val, ',', 1, 2 + 1) - REGEXP_INSTR(val, ',', 1, 2) - 1) b,
       substr(val, REGEXP_INSTR(val, ',', 1, 3) + 1, REGEXP_INSTR(val, ',', 1, 3 + 1) - REGEXP_INSTR(val, ',', 1, 3) - 1) c,
       substr(val, REGEXP_INSTR(val, ',', 1, 4) + 1, REGEXP_INSTR(val, ',', 1, 4 + 1) - REGEXP_INSTR(val, ',', 1, 4) - 1) d,
       substr(val, REGEXP_INSTR(val, ',', 1, 5) + 1, REGEXP_INSTR(val, ',', 1, 5 + 1) - REGEXP_INSTR(val, ',', 1, 5) - 1) e,
       substr(val, REGEXP_INSTR(val, ',', 1, 6) + 1, REGEXP_INSTR(val, ',', 1, 6 + 1) - REGEXP_INSTR(val, ',', 1, 6) - 1) f
from t1;

A          B C D         E F
---------- - - --------- - -
2035197553 2 S 14-JUN-14 - P
+2
Aug 27 '14 at 14:29

A typical CSV parsing approach is as follows:

WITH t(csv_str) AS (
  SELECT '2035197553,2,S,14-JUN-14,,P' FROM dual
  UNION ALL
  SELECT '2035197553,2,S,14-JUN-14,,' FROM dual
)
SELECT LTRIM(REGEXP_SUBSTR (',' || csv_str, ',[^,]*', 1, 1), ',') AS phn_nbr,
       LTRIM(REGEXP_SUBSTR (',' || csv_str, ',[^,]*', 1, 2), ',') AS phn_pos,
       LTRIM(REGEXP_SUBSTR (',' || csv_str, ',[^,]*', 1, 3), ',') AS phn_typ,
       LTRIM(REGEXP_SUBSTR (',' || csv_str, ',[^,]*', 1, 4), ',') AS phn_strt_dt,
       LTRIM(REGEXP_SUBSTR (',' || csv_str, ',[^,]*', 1, 5), ',') AS phn_end_dt,
       LTRIM(REGEXP_SUBSTR (',' || csv_str, ',[^,]*', 1, 6), ',') AS pub_indctr
FROM t;

I like to prepend a comma to my CSV string and then match each comma together with the non-comma characters that follow it.

Search pattern explanation

The search pattern looks for the nth substring (corresponding to the nth element in the CSV) that has the following:

- It starts with the character ','.

- It is followed by the pattern '[^,]'. This is a non-matching character list. The caret, ^, indicates that the characters in the list must not be matched.

- This non-matching character list has the * quantifier, which means it can occur 0 or more times.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Once a match is found, I use the LTRIM function to remove the leading comma from the matched substring.

What I like about this approach is that the occurrences of the search pattern always line up with the occurrences of the comma.
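
To see that alignment on the empty fifth field, the two patterns can be put side by side. This is only a sketch against the sample string from the question; the column aliases (plus_field5, comma_field5, comma_field6) are made up for illustration. Going by the results reported in this thread, the first column should come back as P (the shifted value), the second as null, and the third as P.

```sql
-- Compare the original pattern with the leading-comma pattern
-- on the empty 5th field and on the 6th field that follows it.
-- Aliases are illustrative only.
WITH t (csv_str) AS (
  SELECT '2035197553,2,S,14-JUN-14,,P' FROM dual
)
SELECT REGEXP_SUBSTR (csv_str, '[^,]+', 1, 5)                       AS plus_field5,
       LTRIM(REGEXP_SUBSTR (',' || csv_str, ',[^,]*', 1, 5), ',')   AS comma_field5,
       LTRIM(REGEXP_SUBSTR (',' || csv_str, ',[^,]*', 1, 6), ',')   AS comma_field6
FROM t;
```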

+2
Aug 27 '14 at

I am afraid that your accepted answer does not handle the case where you need a value after the null position (try to get the 6th field):

SQL> select REGEXP_SUBSTR ('2035197553,2,S,14-JUN-14,,P', '[^,]*', 1, 6) phn_end_dt
  2  from dual;

P
-


You need to do this instead, I believe (works on 11g):

SQL> select REGEXP_SUBSTR ('2035197553,2,S,14-JUN-14,,P', '([^,]*)(,|$)', 1, 6, NULL, 1) phn_end_dt
  2  from dual;

P
-
P

I just discovered this after posting my own question: REGEX to select the nth value from a list, allowing null values
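
The same pattern can also be applied to every position at once instead of repeating the call per column. The following is a minimal sketch, not part of the original answers: it assumes Oracle 11g or later (for REGEXP_COUNT and the subexpression argument of REGEXP_SUBSTR) and a single input row, and the names t, val, field_pos and field_val are made up for illustration.

```sql
-- Sketch: return one row per field of a single CSV string, keeping empty fields.
-- Assumes exactly one row in t; CONNECT BY LEVEL on a multi-row source
-- would need extra conditions to avoid multiplying rows.
WITH t (val) AS (
  SELECT '2035197553,2,S,14-JUN-14,,P' FROM dual
)
SELECT LEVEL AS field_pos,
       -- Subexpression 1 is the field text without its trailing delimiter.
       REGEXP_SUBSTR (val, '([^,]*)(,|$)', 1, LEVEL, NULL, 1) AS field_val
FROM t
CONNECT BY LEVEL <= REGEXP_COUNT (val, ',') + 1;
```

The number of fields is taken as the number of commas plus one, so an empty field still gets its own row with a null field_val.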

+2
Sep 03 '14 at 19:28

You need to change this line,

REGEXP_SUBSTR ('2035197553,2,S,14-JUN-14,,P', '[^,]+', 1, 5) AS phn_end_dt,

to

REGEXP_SUBSTR ('2035197553,2,S,14-JUN-14,,P', '[^,]*', 1, 5) AS phn_end_dt,
                                                   ^

[^,]+ matches any character other than a comma, one or more times. [^,]* matches any character other than a comma, zero or more times. Thus [^,]+ requires at least one non-comma character, which an empty field does not have. Changing + to * lets the regex engine match an empty string.

+1
Aug 27 '14 at 14:15


