SQL Regex - replace a substring from another field

I have a database table (Oracle 11g) of questionnaire feedback, including multiple-choice questions that allow several answers. The Options column holds every value the user can select, and the Answers column holds the numeric positions of the values they selected.

    ID_NO  OPTIONS                           ANSWERS
    1001   Apple Pie|Banana-Split|Cream Tea  1|2
    1002   Apple Pie|Banana-Split|Cream Tea  2|3
    1003   Apple Pie|Banana-Split|Cream Tea  1|2|3

I need a query that decodes the answers and returns their text versions as a single string.

    ID_NO  ANSWERS  ANSWER_DECODE
    1001   1|2      Apple Pie|Banana-Split
    1002   2|3      Banana-Split|Cream Tea
    1003   1|2|3    Apple Pie|Banana-Split|Cream Tea

I have experimented with regular expressions to replace values and extract substrings, but I cannot work out how to combine the two correctly.

    WITH feedback AS (
      SELECT 1001 id_no, 'Apple Pie|Banana-Split|Cream Tea' options, '1|2' answers FROM DUAL UNION
      SELECT 1002 id_no, 'Apple Pie|Banana-Split|Cream Tea' options, '2|3' answers FROM DUAL UNION
      SELECT 1003 id_no, 'Apple Pie|Banana-Split|Cream Tea' options, '1|2|3' answers FROM DUAL
    )
    SELECT id_no,
           options,
           REGEXP_SUBSTR(options||'|', '(.)+?\|', 1, 2) second_option,
           answers,
           REGEXP_REPLACE(answers, '(\d)+', ' \1 ') answer_numbers,
           REGEXP_REPLACE(answers, '(\d)+', REGEXP_SUBSTR(options||'|', '(.)+?\|', 1, To_Number('2'))) "???"
    FROM feedback

I do not want to define or decode the responses manually in SQL. There are many polls with different questions (and a different number of options), so I am hoping for a solution that works dynamically for all of them.

I tried splitting the options and answers into separate rows using LEVEL and joining them back together on the codes, but this runs very slowly with the actual data set (a 5-option question with 600 rows of answers).

    WITH feedback AS (
      SELECT 1001 id_no, 'Apple Pie|Banana-Split|Cream Tea' options, '1|2' answers FROM DUAL UNION
      SELECT 1002 id_no, 'Apple Pie|Banana-Split|Cream Tea' options, '2|3' answers FROM DUAL UNION
      SELECT 1003 id_no, 'Apple Pie|Banana-Split|Cream Tea' options, '1|2|3' answers FROM DUAL
    )
    SELECT answer_rows.id_no,
           ListAgg(option_rows.answer) WITHIN GROUP(ORDER BY option_rows.lvl)
    FROM (SELECT DISTINCT
                 LEVEL lvl,
                 REGEXP_SUBSTR(options||'|', '(.)+?\|', 1, LEVEL) answer
          FROM (SELECT DISTINCT
                       options,
                       REGEXP_COUNT(options||'|', '(.)+?\|') num_choices
                FROM feedback)
          CONNECT BY LEVEL <= num_choices
         ) option_rows
    LEFT OUTER JOIN
         (SELECT DISTINCT
                 id_no,
                 to_number(REGEXP_SUBSTR(answers, '(\d)+', 1, LEVEL)) answer
          FROM (SELECT DISTINCT
                       id_no,
                       answers,
                       To_Number(REGEXP_SUBSTR(answers, '(\d)+$')) max_answer
                FROM feedback)
          WHERE to_number(REGEXP_SUBSTR(answers, '(\d)+', 1, LEVEL)) IS NOT NULL
          CONNECT BY LEVEL <= max_answer
         ) answer_rows
      ON option_rows.lvl = answer_rows.answer
    GROUP BY answer_rows.id_no
    ORDER BY answer_rows.id_no

If there is no solution using Regex, is there a better way than LEVEL to separate the values? Or is there another approach that will work?

4 answers

It is slow because you expand each row too many times; your CONNECT BY clauses look across all the rows, so you end up with a huge amount of data to sort, which is probably why you ended up with DISTINCT.

You can add two PRIOR clauses to the CONNECT BY: first, to stay on the same ID_NO, and second, to avoid a loop. Any non-deterministic function will do for the latter; I chose dbms_random.value, but you can use sys_guid or something else if you prefer. You also do not need so many subqueries; two are enough, or two CTEs, which in my opinion are a little clearer:

    WITH feedback AS (
      SELECT 1001 id_no, 'Apple Pie|Banana-Split|Cream Tea' options, '1|2' answers FROM DUAL UNION
      SELECT 1002 id_no, 'Apple Pie|Banana-Split|Cream Tea' options, '2|3' answers FROM DUAL UNION
      SELECT 1003 id_no, 'Apple Pie|Banana-Split|Cream Tea' options, '1|2|3' answers FROM DUAL
    ),
    option_rows AS (
      SELECT id_no,
             LEVEL answer,
             REGEXP_SUBSTR(options, '[^|]+', 1, LEVEL) answer_text
      FROM feedback
      CONNECT BY LEVEL <= REGEXP_COUNT(options, '[^|]+')
             AND id_no = PRIOR id_no
             AND PRIOR dbms_random.value IS NOT NULL
    ),
    answer_rows AS (
      SELECT id_no,
             REGEXP_SUBSTR(answers, '[^|]+', 1, LEVEL) answer
      FROM feedback
      CONNECT BY LEVEL <= REGEXP_COUNT(answers, '[^|]+')
             AND PRIOR id_no = id_no
             AND PRIOR dbms_random.value IS NOT NULL
    )
    SELECT option_rows.id_no,
           LISTAGG(option_rows.answer, '|') WITHIN GROUP (ORDER BY option_rows.answer) AS answers,
           LISTAGG(option_rows.answer_text, '|') WITHIN GROUP (ORDER BY option_rows.answer) AS answer_decode
    FROM option_rows
    JOIN answer_rows
      ON option_rows.id_no = answer_rows.id_no
     AND option_rows.answer = answer_rows.answer
    GROUP BY option_rows.id_no
    ORDER BY option_rows.id_no;

Which gives:

         ID_NO ANSWERS    ANSWER_DECODE
    ---------- ---------- ----------------------------------------
          1001 1|2        Apple Pie|Banana-Split
          1002 2|3        Banana-Split|Cream Tea
          1003 1|2|3      Apple Pie|Banana-Split|Cream Tea

I also changed the regular expression pattern, so you do not need to append or strip the | delimiter.
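To illustrate the pattern change in isolation, here is a minimal sketch using only literals on DUAL. The third column is my own addition to show one caveat I am aware of: `[^|]+` matches runs of non-pipe characters, so an empty element is skipped rather than returned as NULL, which matters only if a list can contain empty entries.

```sql
-- Illustrative sketch; the string literals are sample values, not real data.
SELECT REGEXP_COUNT('Apple Pie|Banana-Split|Cream Tea', '[^|]+')        AS num_options,    -- 3
       REGEXP_SUBSTR('Apple Pie|Banana-Split|Cream Tea', '[^|]+', 1, 2) AS second_option,  -- Banana-Split
       REGEXP_SUBSTR('a||c', '[^|]+', 1, 2)                             AS skips_empty     -- c (the empty element is skipped)
FROM   DUAL;
```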


Check out this compact solution:

    with sample_data as (
      select 'ala|ma|kota' options, '1|2' answers from dual
      union all
      select 'apples|oranges|bacon', '1|2|3' from dual
      union all
      select 'a|b|c|d|e|f|h|i', '1|3|4|5|8' from dual
    )
    select answers,
           options,
           regexp_replace(
             regexp_replace(options, '([^|]+)\|([^|]+)\|([^|]+)', '\' || replace(answers, '|', '|\')),
             '[|]+', '|') answer_decode
    from sample_data;

Output:

    ANSWERS   OPTIONS              ANSWER_DECODE
    --------- -------------------- ---------------------------
    1|2       ala|ma|kota          ala|ma
    1|2|3     apples|oranges|bacon apples|oranges|bacon
    1|3|4|5|8 a|b|c|d|e|f|h|i      a|c|d|f|h|i
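Note that the pattern in this query has exactly three capture groups, so it seems to rely on questions having three options: for the eight-option row the output shows a|c|d|f|h|i where the answers 1|3|4|5|8 would point to a|c|d|e|i. Below is a hedged sketch of one way the same idea might be stretched to up to nine options. The anchored nine-field pattern is my own assumption, not part of the answer, and nine is a hard ceiling because backreferences only go from \1 to \9 (so answers must be single digits). If a row has more than nine options the pattern does not match and the options string comes back unchanged.

```sql
-- Sketch only: supports at most nine options and single-digit answers (\1..\9).
with sample_data as (
  select 'ala|ma|kota' options, '1|2' answers from dual
  union all
  select 'a|b|c|d|e|f|h|i', '1|3|4|5|8' from dual
)
select answers,
       options,
       regexp_replace(
         regexp_replace(
           options,
           -- nine optional fields; fields beyond the last option match as empty
           '^([^|]*)\|?([^|]*)\|?([^|]*)\|?([^|]*)\|?([^|]*)\|?([^|]*)\|?([^|]*)\|?([^|]*)\|?([^|]*)$',
           -- turns answers such as 1|3 into the replacement string \1|\3
           '\' || replace(answers, '|', '|\')),
         '[|]+', '|') answer_decode  -- collapse any doubled delimiters
from sample_data;
```

Anchoring with ^ and $ makes the whole options string a single match, so each backreference refers to a fixed position in the list rather than to a position within a repeating group of three.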

I wrote a similar solution in MySQL (I do not have Oracle installed at the moment), and noted in the comments what needs to change for the query to work in Oracle.

Also, the ugliest part of my code will look much better in Oracle, since it has a much better INSTR function.

The idea is to CROSS JOIN with a list of numbers (1 to 10, to support up to 10 options per survey question) and split the OPTIONS field into separate rows. You do this using both the list of numbers and the Oracle INSTR function (see the comments in the code).

From there, you filter out rows that have not been selected, and group everything together.

    -- I've used GROUP_CONCAT in MySQL; in Oracle use LISTAGG (11g R2 and later),
    -- since WM_CONCAT is undocumented
    select ID_NO,
           ANSWERS,
           group_concat(broken_down_options separator '|') `OPTIONS`
    from (
      select your_table.ID_NO,
             your_table.ANSWERS,
             -- Luckily, you're using ORACLE so you can use an INSTR function that has the "occurrence" parameter
             -- INSTR(string, substring, [position, [occurrence]])
             -- use the nums.num field as input for the occurrence parameter
             -- and just put '1' under "position"
             case
               when nums.num = 1 then
                 substr(your_table.`OPTIONS`, 1, instr(your_table.`OPTIONS`, '|') - 1)
               when nums.num = 2 then
                 substr(substr(your_table.`OPTIONS`, instr(your_table.`OPTIONS`, '|') + 1),
                        1,
                        instr(substr(your_table.`OPTIONS`, instr(your_table.`OPTIONS`, '|') + 1), '|') - 1)
               else
                 substr(your_table.`OPTIONS`,
                        length(your_table.`OPTIONS`) - instr(reverse(your_table.`OPTIONS`), '|') + 2)
             end broken_down_options
      from (select 1 num union all select 2 num union all select 3 num union all
            select 4 num union all select 5 num union all select 6 num union all
            select 7 num union all select 8 num union all select 9 num union all
            select 10 num
           ) nums
      CROSS JOIN
           (select 1001 ID_NO, 'Apple Pie|Banana-Split|Cream Tea' `OPTIONS`, '1|2' ANSWERS union
            select 1002 ID_NO, 'Apple Pie|Banana-Split|Cream Tea' `OPTIONS`, '2|3' ANSWERS union
            select 1003 ID_NO, 'Apple Pie|Banana-Split|Cream Tea' `OPTIONS`, '1|2|3' ANSWERS
           ) your_table
      -- for example: 2|3 matches 2 and 3 but not 1
      where your_table.ANSWERS like concat(concat('%', nums.num), '%')
    ) some_query
    group by ID_NO, ANSWERS
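For comparison, here is a rough sketch of how the same number-list idea might look written directly for Oracle, using the occurrence parameter of INSTR that the comments mention. The `nums` and `padded` CTEs and their column names are hypothetical helpers of mine, and I have assumed Oracle 11g R2 or later for LISTAGG. I also match on '|n|' against a delimiter-padded answers string instead of using LIKE '%n%', so that, for example, 1 is not found inside 10.

```sql
-- Sketch only: assumes Oracle 11g R2+ (LISTAGG) and at most 10 options per question.
WITH feedback AS (
  SELECT 1001 id_no, 'Apple Pie|Banana-Split|Cream Tea' options, '1|2' answers FROM DUAL UNION ALL
  SELECT 1002, 'Apple Pie|Banana-Split|Cream Tea', '2|3' FROM DUAL UNION ALL
  SELECT 1003, 'Apple Pie|Banana-Split|Cream Tea', '1|2|3' FROM DUAL
),
nums AS (    -- hypothetical helper: the numbers 1..10
  SELECT LEVEL num FROM DUAL CONNECT BY LEVEL <= 10
),
padded AS (  -- wrap both lists in delimiters so every element is enclosed by '|'
  SELECT id_no,
         answers,
         '|' || options || '|' opts,
         '|' || answers || '|' ans
  FROM feedback
)
SELECT p.id_no,
       p.answers,
       -- the n-th option is the text between the n-th and (n+1)-th delimiter
       LISTAGG(SUBSTR(p.opts,
                      INSTR(p.opts, '|', 1, n.num) + 1,
                      INSTR(p.opts, '|', 1, n.num + 1) - INSTR(p.opts, '|', 1, n.num) - 1),
               '|') WITHIN GROUP (ORDER BY n.num) AS answer_decode
FROM padded p
JOIN nums n
  ON INSTR(p.ans, '|' || TO_CHAR(n.num) || '|') > 0  -- keep only the selected option numbers
GROUP BY p.id_no, p.answers
ORDER BY p.id_no;
```

Because the option text is located by delimiter position, this version does not need a separate CASE branch per option number.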

Create a stored procedure and follow the steps below.

  • Declare an array of the size you need.
  • Get the option data from the first row. Use a regular expression or LEVEL to extract the values between the pipes, and store them in the array. Note: this only has to be done once, so you do not need to repeat it for each row.
  • Now, in a loop over the rows, select the answers and use the array values to decode them.
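The steps above could be sketched as an anonymous PL/SQL block along the following lines. All names here (`t_option_list`, `l_options`, `l_source` and so on) are hypothetical, the rows come from inline literals rather than a real table, and the block assumes every row shares the same options string, as the second step implies.

```sql
-- Sketch only: run with server output enabled (e.g. SET SERVEROUTPUT ON in SQL*Plus).
DECLARE
  TYPE t_option_list IS TABLE OF VARCHAR2(4000) INDEX BY PLS_INTEGER;
  l_options  t_option_list;                                     -- step 1: the array
  l_source   VARCHAR2(4000) := 'Apple Pie|Banana-Split|Cream Tea';
  l_token    VARCHAR2(40);
  l_idx      PLS_INTEGER;
  l_pos      PLS_INTEGER;
  l_decoded  VARCHAR2(4000);
BEGIN
  -- Step 2: split the options once and keep them in the array.
  FOR i IN 1 .. REGEXP_COUNT(l_source, '[^|]+') LOOP
    l_options(i) := REGEXP_SUBSTR(l_source, '[^|]+', 1, i);
  END LOOP;

  -- Step 3: for each row, look up every answer number in the array.
  FOR r IN (SELECT 1001 id_no, '1|2' answers FROM DUAL UNION ALL
            SELECT 1002, '2|3' FROM DUAL UNION ALL
            SELECT 1003, '1|2|3' FROM DUAL
            ORDER BY 1)
  LOOP
    l_decoded := NULL;
    l_pos     := 1;
    LOOP
      l_token := REGEXP_SUBSTR(r.answers, '[^|]+', 1, l_pos);
      EXIT WHEN l_token IS NULL;                                -- no more answers in this row
      l_idx := TO_NUMBER(l_token);
      IF l_options.EXISTS(l_idx) THEN                           -- ignore numbers with no matching option
        IF l_decoded IS NOT NULL THEN
          l_decoded := l_decoded || '|';
        END IF;
        l_decoded := l_decoded || l_options(l_idx);
      END IF;
      l_pos := l_pos + 1;
    END LOOP;
    DBMS_OUTPUT.PUT_LINE(r.id_no || ' ' || r.answers || ' ' || l_decoded);
  END LOOP;
END;
/
```

If different polls have different option lists, the split in step 2 would have to be repeated (or cached) per distinct options string rather than done once.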