Splitting a string into multiple rows in Oracle

I know that PHP and MYSQL answered this question to some extent, but I was wondering if anyone could teach me the simplest approach to splitting a string (comma separated) into multiple lines in Oracle 10g (preferably) and 11g.

The table is as follows:

Name | Project | Error 108 test Err1, Err2, Err3 109 test2 Err1 

I want to create the following:

 Name | Project | Error 108 Test Err1 108 Test Err2 108 Test Err3 109 Test2 Err1 

I saw several potential solutions for the stack, however they only took into account one column (which is a comma separated string). Any help would be greatly appreciated.

+96
string split sql oracle plsql oracle10g tokenize oracle11g
Jan 14 '13 at 23:20
source share
14 answers

This can be an improved way (also with regex and join):

 with temp as ( select 108 Name, 'test' Project, 'Err1, Err2, Err3' Error from dual union all select 109, 'test2', 'Err1' from dual ) select distinct t.name, t.project, trim(regexp_substr(t.error, '[^,]+', 1, levels.column_value)) as error from temp t, table(cast(multiset(select level from dual connect by level <= length (regexp_replace(t.error, '[^,]+')) + 1) as sys.OdciNumberList)) levels order by name 

EDIT : Here is a simple (as in "not in depth") explanation of the request.

  1. length (regexp_replace(t.error, '[^,]+')) + 1 uses regexp_replace to remove everything that is not a separator (in this case, a comma), and length +1 to get the number of elements (errors).
  2. select level from dual connect by level <= (...) uses a hierarchical query to create a column with a growing number of matches found, from 1 to the total number of errors.

    Preview:

     select level, length (regexp_replace('Err1, Err2, Err3', '[^,]+')) + 1 as max from dual connect by level <= length (regexp_replace('Err1, Err2, Err3', '[^,]+')) + 1 
  3. table(cast(multiset(.....) as sys.OdciNumberList)) casts oracles.
    • cast(multiset(.....)) as sys.OdciNumberList converts several collections (one collection for each row in the source dataset) into one collection of OdciNumberList numbers.
    • The table() function converts the collection into a result set.
  4. FROM without combining creates a cross-connection between your dataset and multiset. As a result, the row in the data set with 4 matches will be repeated 4 times (with an increase in the number in the column named "column_value").

    Preview:

     select * from temp t, table(cast(multiset(select level from dual connect by level <= length (regexp_replace(t.error, '[^,]+')) + 1) as sys.OdciNumberList)) levels 
  5. trim(regexp_substr(t.error, '[^,]+', 1, levels.column_value)) uses column_value as the nth_appearance / ocurrence parameter for regexp_substr .
  6. You can add some other columns from the dataset (e.g. t.name, t.project ) to simplify visualization.

Some links to Oracle docs:

+104
Nov 25 '14 at 22:05
source share

regular expressions are a wonderful thing :)

 with temp as ( select 108 Name, 'test' Project, 'Err1, Err2, Err3' Error from dual union all select 109, 'test2', 'Err1' from dual ) SELECT distinct Name, Project, trim(regexp_substr(str, '[^,]+', 1, level)) str FROM (SELECT Name, Project, Error str FROM temp) t CONNECT BY instr(str, ',', 1, level - 1) > 0 order by Name 
+28
Jan 15 '13 at 4:12
source share

There is a huge difference between the two below:

  • delimiting a single line
  • Separating rows with separators for multiple rows in a table.

If you do not limit the rows, the CONNECT BY clause will create several rows and will not produce the desired result.

In addition to Regular Expressions , several other alternatives are used:

  • XMLTable
  • MODEL .

Customization

 SQL> CREATE TABLE t ( 2 ID NUMBER GENERATED ALWAYS AS IDENTITY, 3 text VARCHAR2(100) 4 ); Table created. SQL> SQL> INSERT INTO t (text) VALUES ('word1, word2, word3'); 1 row created. SQL> INSERT INTO t (text) VALUES ('word4, word5, word6'); 1 row created. SQL> INSERT INTO t (text) VALUES ('word7, word8, word9'); 1 row created. SQL> COMMIT; Commit complete. SQL> SQL> SELECT * FROM t; ID TEXT ---------- ---------------------------------------------- 1 word1, word2, word3 2 word4, word5, word6 3 word7, word8, word9 SQL> 

Using XMLTABLE :

 SQL> SELECT id, 2 trim(COLUMN_VALUE) text 3 FROM t, 4 xmltable(('"' 5 || REPLACE(text, ',', '","') 6 || '"')) 7 / ID TEXT ---------- ------------------------ 1 word1 1 word2 1 word3 2 word4 2 word5 2 word6 3 word7 3 word8 3 word9 9 rows selected. SQL> 

Using MODEL :

 SQL> WITH 2 model_param AS 3 ( 4 SELECT id, 5 text AS orig_str , 6 ',' 7 || text 8 || ',' AS mod_str , 9 1 AS start_pos , 10 Length(text) AS end_pos , 11 (Length(text) - Length(Replace(text, ','))) + 1 AS element_count , 12 0 AS element_no , 13 ROWNUM AS rn 14 FROM t ) 15 SELECT id, 16 trim(Substr(mod_str, start_pos, end_pos-start_pos)) text 17 FROM ( 18 SELECT * 19 FROM model_param MODEL PARTITION BY (id, rn, orig_str, mod_str) 20 DIMENSION BY (element_no) 21 MEASURES (start_pos, end_pos, element_count) 22 RULES ITERATE (2000) 23 UNTIL (ITERATION_NUMBER+1 = element_count[0]) 24 ( start_pos[ITERATION_NUMBER+1] = instr(cv(mod_str), ',', 1, cv(element_no)) + 1, 25 end_pos[iteration_number+1] = instr(cv(mod_str), ',', 1, cv(element_no) + 1) ) 26 ) 27 WHERE element_no != 0 28 ORDER BY mod_str , 29 element_no 30 / ID TEXT ---------- -------------------------------------------------- 1 word1 1 word2 1 word3 2 word4 2 word5 2 word6 3 word7 3 word8 3 word9 9 rows selected. SQL> 
+28
May 05 '15 at 5:24
source share

A few other examples:

 SELECT trim(regexp_substr('Err1, Err2, Err3', '[^,]+', 1, LEVEL)) str_2_tab FROM dual CONNECT BY LEVEL <= regexp_count('Err1, Err2, Err3', ',')+1 / SELECT trim(regexp_substr('Err1, Err2, Err3', '[^,]+', 1, LEVEL)) str_2_tab FROM dual CONNECT BY LEVEL <= length('Err1, Err2, Err3') - length(REPLACE('Err1, Err2, Err3', ',', ''))+1 / 

Alternatively, you can use DBMS_UTILITY.comma_to_table and table_to_comma: http://www.oracle-base.com/articles/9i/useful-procedures-and-functions-9i.php#DBMS_UTILITY.comma_to_table

+8
Jan 17 '13 at 20:27
source share

I would like to suggest a different approach using the PIPELINED table function. This is somewhat similar to the XMLTABLE technique, except that you provide your own custom function for splitting a character string:

 -- Create a collection type to hold the results CREATE OR REPLACE TYPE typ_str2tbl_nst AS TABLE OF VARCHAR2(30); / -- Split the string according to the specified delimiter CREATE OR REPLACE FUNCTION str2tbl ( p_string VARCHAR2, p_delimiter CHAR DEFAULT ',' ) RETURN typ_str2tbl_nst PIPELINED AS l_tmp VARCHAR2(32000) := p_string || p_delimiter; l_pos NUMBER; BEGIN LOOP l_pos := INSTR( l_tmp, p_delimiter ); EXIT WHEN NVL( l_pos, 0 ) = 0; PIPE ROW ( RTRIM( LTRIM( SUBSTR( l_tmp, 1, l_pos-1) ) ) ); l_tmp := SUBSTR( l_tmp, l_pos+1 ); END LOOP; END str2tbl; / -- The problem solution SELECT name, project, TRIM(COLUMN_VALUE) error FROM t, TABLE(str2tbl(error)); 

Results:

  NAME PROJECT ERROR ---------- ---------- -------------------- 108 test Err1 108 test Err2 108 test Err3 109 test2 Err1 

The problem with this type of approach is that the optimizer often does not know the power of the table function, and it will have to guess. This can be potentially dangerous for your execution plans, so this solution can be expanded to provide performance statistics for the optimizer.

You can see this optimizer rating by running EXPLAIN PLAN at the above query:

 Execution Plan ---------------------------------------------------------- Plan hash value: 2402555806 ---------------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | ---------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 16336 | 366K| 59 (0)| 00:00:01 | | 1 | NESTED LOOPS | | 16336 | 366K| 59 (0)| 00:00:01 | | 2 | TABLE ACCESS FULL | T | 2 | 42 | 3 (0)| 00:00:01 | | 3 | COLLECTION ITERATOR PICKLER FETCH| STR2TBL | 8168 | 16336 | 28 (0)| 00:00:01 | ---------------------------------------------------------------------------------------------- 

Despite the fact that the collection has only 3 values, the optimizer rated 8168 rows for it (the default value). At first this may seem inappropriate, but the optimizer may be enough to decide on a suboptimal plan.

The solution is to use optimizer extensions to provide statistics for the collection:

 -- Create the optimizer interface to the str2tbl function CREATE OR REPLACE TYPE typ_str2tbl_stats AS OBJECT ( dummy NUMBER, STATIC FUNCTION ODCIGetInterfaces ( p_interfaces OUT SYS.ODCIObjectList ) RETURN NUMBER, STATIC FUNCTION ODCIStatsTableFunction ( p_function IN SYS.ODCIFuncInfo, p_stats OUT SYS.ODCITabFuncStats, p_args IN SYS.ODCIArgDescList, p_string IN VARCHAR2, p_delimiter IN CHAR DEFAULT ',' ) RETURN NUMBER ); / -- Optimizer interface implementation CREATE OR REPLACE TYPE BODY typ_str2tbl_stats AS STATIC FUNCTION ODCIGetInterfaces ( p_interfaces OUT SYS.ODCIObjectList ) RETURN NUMBER AS BEGIN p_interfaces := SYS.ODCIObjectList ( SYS.ODCIObject ('SYS', 'ODCISTATS2') ); RETURN ODCIConst.SUCCESS; END ODCIGetInterfaces; -- This function is responsible for returning the cardinality estimate STATIC FUNCTION ODCIStatsTableFunction ( p_function IN SYS.ODCIFuncInfo, p_stats OUT SYS.ODCITabFuncStats, p_args IN SYS.ODCIArgDescList, p_string IN VARCHAR2, p_delimiter IN CHAR DEFAULT ',' ) RETURN NUMBER AS BEGIN -- I'm using basically half the string lenght as an estimator for its cardinality p_stats := SYS.ODCITabFuncStats( CEIL( LENGTH( p_string ) / 2 ) ); RETURN ODCIConst.SUCCESS; END ODCIStatsTableFunction; END; / -- Associate our optimizer extension with the PIPELINED function ASSOCIATE STATISTICS WITH FUNCTIONS str2tbl USING typ_str2tbl_stats; 

Verification of the final implementation plan:

 Execution Plan ---------------------------------------------------------- Plan hash value: 2402555806 ---------------------------------------------------------------------------------------------- | Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | ---------------------------------------------------------------------------------------------- | 0 | SELECT STATEMENT | | 1 | 23 | 59 (0)| 00:00:01 | | 1 | NESTED LOOPS | | 1 | 23 | 59 (0)| 00:00:01 | | 2 | TABLE ACCESS FULL | T | 2 | 42 | 3 (0)| 00:00:01 | | 3 | COLLECTION ITERATOR PICKLER FETCH| STR2TBL | 1 | 2 | 28 (0)| 00:00:01 | ---------------------------------------------------------------------------------------------- 

As you can see, the power on the plan above is not 8196 guessed value anymore. This is still not true because we are passing a column instead of a string literal to a function.

Some adjustment to the functional code will be necessary to give a closer assessment in this particular case, but I think the general concept is largely explained here.

The str2tbl function used in this answer was originally developed by Tom Keith: https://asktom.oracle.com/pls/asktom/f?p=100:11:07::::P11_QUESTION_ID:110612348061

The concept of associating statistics with object types can be further explored by reading this article: http://www.oracle-developer.net/display.php?id=427

The technique described here works in 10g +.

+6
Oct 27 '16 at 13:55
source share

REGEXP_COUNT was not added until Oracle 11i. Here's the Oracle 10g solution, taken from Art.

 SELECT trim(regexp_substr('Err1, Err2, Err3', '[^,]+', 1, LEVEL)) str_2_tab FROM dual CONNECT BY LEVEL <= LENGTH('Err1, Err2, Err3') - LENGTH(REPLACE('Err1, Err2, Err3', ',', '')) + 1; 
+4
Jul 21 '14 at 20:20
source share

I think the best way I connect and regex function

  with temp as ( select 108 Name, 'test' Project, 'Err1, Err2, Err3' Error from dual union all select 109, 'test2', 'Err1' from dual ) SELECT distinct Name, Project, trim(regexp_substr(str, '[^,]+', 1, level)) str FROM (SELECT Name, Project, Error str FROM temp) t CONNECT BY instr(str, ',', 1, level - 1) > 0 order by Name 

SOURCE

+4
Nov 19 '14 at 13:17
source share

Without using connect or regexp :

  with mytable as ( select 108 name, 'test' project, 'Err1,Err2,Err3' error from dual union all select 109, 'test2', 'Err1' from dual ) ,x as ( select name ,project ,','||error||',' error from mytable ) ,iter as (SELECT rownum AS pos FROM all_objects ) select x.name,x.project ,SUBSTR(x.error ,INSTR(x.error, ',', 1, iter.pos) + 1 ,INSTR(x.error, ',', 1, iter.pos + 1)-INSTR(x.error, ',', 1, iter.pos)-1 ) error from x, iter where iter.pos < = (LENGTH(x.error) - LENGTH(REPLACE(x.error, ','))) - 1; 
+2
Nov 26 '15 at
source share

Here is an alternative implementation using XMLTABLE that allows you to distinguish between different data types:

 select xmltab.txt from xmltable( 'for $text in tokenize("a,b,c", ",") return $text' columns txt varchar2(4000) path '.' ) xmltab ; 

... or if your delimited rows are stored in one or more rows of the table:

 select xmltab.txt from ( select 'a;b;c' inpt from dual union all select 'd;e;f' from dual ) base inner join xmltable( 'for $text in tokenize($input, ";") return $text' passing base.inpt as "input" columns txt varchar2(4000) path '.' ) xmltab on 1=1 ; 
+2
Dec 01 '17 at 13:16
source share

Starting with Oracle 12c, you can use JSON_TABLE and JSON_ARRAY :

 CREATE TABLE tab(Name, Project, Error) AS SELECT 108,'test' ,'Err1, Err2, Err3' FROM dual UNION SELECT 109,'test2','Err1' FROM dual; 

And request:

 SELECT * FROM tab t OUTER APPLY (SELECT TRIM(p) AS p FROM JSON_TABLE(REPLACE(JSON_ARRAY(t.Error), ',', '","'), '$[*]' COLUMNS (p VARCHAR2(4000) PATH '$'))) s; 

Exit:

 β”Œβ”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β” β”‚ Name β”‚ Project β”‚ Error β”‚ P β”‚ β”œβ”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€ β”‚ 108 β”‚ test β”‚ Err1, Err2, Err3 β”‚ Err1 β”‚ β”‚ 108 β”‚ test β”‚ Err1, Err2, Err3 β”‚ Err2 β”‚ β”‚ 108 β”‚ test β”‚ Err1, Err2, Err3 β”‚ Err3 β”‚ β”‚ 109 β”‚ test2 β”‚ Err1 β”‚ Err1 β”‚ β””β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”˜ 

db <> Fiddle demo

+2
Jun 06 '18 at 12:48
source share

I would like to add another method. In this case, recursive queries are used, which I have not seen in other answers. It is supported by Oracle with 11gR2.

 with cte0 as ( select phone_number x from hr.employees ), cte1(xstr,xrest,xremoved) as ( select x, x, null from cte0 union all select xstr, case when instr(xrest,'.') = 0 then null else substr(xrest,instr(xrest,'.')+1) end, case when instr(xrest,'.') = 0 then xrest else substr(xrest,1,instr(xrest,'.') - 1) end from cte1 where xrest is not null ) select xstr, xremoved from cte1 where xremoved is not null order by xstr 

It is quite flexible with a split character. Just change it to INSTR calls.

+1
Aug 17 '15 at 13:54
source share

I had the same problem and xmltable helped me:

SELECT id, trim (COLUMN_VALUE) text FROM t, xmltable (('' '|| REPLACE (text,', ',' "," ') ||' "'))

+1
Jan 14 '18 at 4:34
source share

I used the DBMS_UTILITY.comma_to _table function, actually its work code is as follows

 declare l_tablen BINARY_INTEGER; l_tab DBMS_UTILITY.uncl_array; cursor cur is select * from qwer; rec cur%rowtype; begin open cur; loop fetch cur into rec; exit when cur%notfound; DBMS_UTILITY.comma_to_table ( list => rec.val, tablen => l_tablen, tab => l_tab); FOR i IN 1 .. l_tablen LOOP DBMS_OUTPUT.put_line(i || ' : ' || l_tab(i)); END LOOP; end loop; close cur; end; 

I used my own names for tables and columns

-one
Oct 20 '14 at 12:18
source share
 CREATE FUNCTION dbo.BreakStringIntoRows (@CommadelimitedString varchar(1000)) RETURNS @Result TABLE (Column1 VARCHAR(100)) AS BEGIN DECLARE @IntLocation INT WHILE (CHARINDEX(',', @CommadelimitedString, 0) > 0) BEGIN SET @IntLocation = CHARINDEX(',', @CommadelimitedString, 0) INSERT INTO @Result (Column1) --LTRIM and RTRIM to ensure blank spaces are removed SELECT RTRIM(LTRIM(SUBSTRING(@CommadelimitedString, 0, @IntLocation))) SET @CommadelimitedString = STUFF(@CommadelimitedString, 1, @IntLocation, '') END INSERT INTO @Result (Column1) SELECT RTRIM(LTRIM(@CommadelimitedString))--LTRIM and RTRIM to ensure blank spaces are removed RETURN END GO --Using the UDF to convert comma separated values into rows SELECT * FROM dbo.BreakStringIntoRows('Apple,Banana,Orange') SELECT * FROM dbo.BreakStringIntoRows('Apple , Banana, Orange') 
-2
Apr 11 '16 at 11:59 on
source share



All Articles