For short strings with few numbers
If the count of numbers in a string and their maximum length are bounded, there is a solution based on regular expressions.
The idea is this:
- Pad every run of digits with 20 leading zeros
- Strip the surplus zeros with a second regular expression. This step can be slow because of regexp backtracking.
Assumptions:
- The maximum length of a number is known in advance (for example, 20 digits).
- All numeric elements can be padded (in other words, a value such as lpad('1 ', 3000, '1 ') will not work, because the padded numbers do not fit into varchar2(4000)).
The following query is optimized for the case of "short numbers" (see *?) and takes 0.4 seconds. However, with this approach you have to decide the padding length up front.
select *
from (
  select dbms_random.string('X', 30) val from xmltable('1 to 1000')
)
order by regexp_replace(
           regexp_replace(val, '(\d+)', lpad('0', 20, '0')||'\1')
         , '0*?(\d{21}(\D|$))', '\1');
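To see what the two regexp_replace calls do, here is a minimal sketch on a few fixed values. The sample strings and the column aliases padded and sort_key are made up for illustration. The trimming pattern in this sketch requires the 21 kept digits to be followed by a non-digit or the end of the string, so that the lazy 0*? has to consume the surplus zeros.

```sql
-- Sketch: intermediate and final sort keys for hand-picked sample values.
with samples as (
  select 'A7B'   val from dual union all
  select 'A12B'  val from dual union all
  select 'A007B' val from dual
)
select val
       -- step 1: prefix every run of digits with 20 zeros
     , regexp_replace(val, '(\d+)', lpad('0', 20, '0') || '\1') as padded
       -- step 2: keep only the last 21 digits of every run
     , regexp_replace(
         regexp_replace(val, '(\d+)', lpad('0', 20, '0') || '\1')
       , '0*?(\d{21}(\D|$))', '\1') as sort_key
from samples
order by sort_key, val;
```

After step 2 every number occupies exactly 21 characters, so plain string comparison puts 7 before 12. Note that A7B and A007B end up with the same key, which is why the sketch adds val as a tie-breaker.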
Smart approach
Although a separate natural_sort function may be convenient, there is a little-known trick in pure SQL.
Key ideas:
- Strip leading zeros from all numbers, so that 02 sorts between 1 and 3: regexp_replace(val, '(^|\D)0+(\d+)', '\1\2'). Note: this can lead to an "unexpected" ordering of 10.02 > 10.1 (since 02 is converted to 2); however, there is no single right answer on how to sort things like 10.02.03
- Convert " to "" so that text containing quotes works correctly
- Convert the input string to a comma-delimited format: '"'||regexp_replace(..., '([^0-9]+)', '","\1","')||'"'
- Convert the csv to a list of elements via xmltable
- Augment the numeric elements so that string comparison orders them correctly
- Use length(length(num))||length(num)||num instead of lpad(num, 10, '0'), as the latter is less compact and does not support numbers of 11 or more digits.
Note: The response time is approximately 3-4 seconds for sorting a list of 1000 random strings of length 30 (generating the random strings takes 0.2 seconds). The main time consumer is xmltable, which splits the text into rows. If you use PL/SQL instead of xmltable to split the string into rows, the response time drops to 0.4 s for the same 1000 rows.
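The length(length(num))||length(num)||num encoding from the list above can be checked on a few fixed values. This is only a sketch; the sample numbers and the alias sort_key are made up for illustration.

```sql
-- Sketch: length-prefixed keys compare as strings in numeric order.
with nums as (
  select '9'            num from dual union all
  select '10'           num from dual union all
  select '123456789012' num from dual   -- 12 digits, too long for lpad(num, 10, '0')
)
select num
       -- <number of digits in the length> || <length> || <digits>
     , length(length(num)) || length(num) || num as sort_key
from nums
order by sort_key;
```

The keys are 119, 1210 and 212123456789012. A shorter number always gets a smaller prefix, so 9 sorts before 10, and numbers of equal length fall back to ordinary digit-by-digit comparison.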
The following query performs a natural sort of 100 random alphanumeric strings:
select *
from (
  select (select listagg(case when regexp_like(w, '^[0-9]')
                              then length(length(w))||length(w)||w
                              else w
                         end) within group (order by ord)
          from xmltable(t.csv columns w varchar2(4000) path '.'
                                    , ord for ordinality) q
         ) order_by
       , t.*
  from (
    select '"'||regexp_replace(replace(
                   regexp_replace(val, '(^|\D)0+(\d+)', '\1\2')
                 , '"', '""')
               , '([^0-9]+)', '","\1","')||'"' csv
         , t.*
    from (
      select dbms_random.string('X', 30) val from xmltable('1 to 100')
    ) t
  ) t
) t
order by order_by;
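Because the query sorts random data, the intermediate values are hard to follow. The sketch below applies only the string-preparation steps to one fixed value; the sample string 'ab012cd3' and the aliases no_leading_zeros and csv are chosen here for illustration.

```sql
-- Sketch: the string-preparation steps of the query on a single fixed value.
select val
       -- drop leading zeros: 'ab012cd3' becomes 'ab12cd3'
     , regexp_replace(val, '(^|\D)0+(\d+)', '\1\2') as no_leading_zeros
       -- escape quotes, then wrap every non-digit run in "," ... ","
     , '"' || regexp_replace(
                replace(regexp_replace(val, '(^|\D)0+(\d+)', '\1\2'), '"', '""')
              , '([^0-9]+)', '","\1","') || '"' as csv
from (select 'ab012cd3' val from dual);
```

The csv column comes out as "","ab","12","cd","3", that is, an empty leading element followed by alternating text and number elements. Applying the length prefix to the numeric elements and concatenating everything in order gives the key ab1212cd113.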
The fun part is that this order by can be expressed without subqueries, so it is a handy tool for driving your reviewer crazy:
select *
from (select dbms_random.string('X', 30) val from xmltable('1 to 100')) t
order by (
  select listagg(case when regexp_like(w, '^[0-9]')
                      then length(length(w))||length(w)||w
                      else w
                 end) within group (order by ord)
  from xmltable('$X' passing xmlquery(('"'||regexp_replace(replace(
                    regexp_replace(t.val, '(^|\D)0+(\d+)', '\1\2')
                  , '"', '""')
                , '([^0-9]+)', '","\1","')||'"')
                returning sequence
              ) as X
              columns w varchar2(4000) path '.', ord for ordinality) q
);
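The timing note earlier mentions that splitting the string in PL/SQL instead of xmltable is considerably faster. The answer does not show that code, so the function below is only a rough sketch of how such a splitter might look. The name natural_sort_key, the parameter p_val and the local variables are hypothetical, and it assumes the resulting key fits into the varchar2 size that SQL can return.

```sql
-- Sketch (hypothetical helper): build the same kind of sort key in PL/SQL.
create or replace function natural_sort_key(p_val in varchar2)
  return varchar2
  deterministic
is
  l_key   varchar2(32767);
  l_token varchar2(4000);
  l_pos   pls_integer := 1;
begin
  loop
    -- take the next run of digits or the next run of non-digits
    l_token := regexp_substr(p_val, '\d+|\D+', l_pos);
    exit when l_token is null;
    l_pos := l_pos + length(l_token);

    if regexp_like(l_token, '^\d') then
      -- drop leading zeros, keeping a single 0 for an all-zero token
      l_token := nvl(ltrim(l_token, '0'), '0');
      -- prefix the digits with their length so string comparison is numeric
      l_key := l_key || length(length(l_token)) || length(l_token) || l_token;
    else
      l_key := l_key || l_token;
    end if;
  end loop;
  return l_key;
end natural_sort_key;
/
```

With such a function in place the sort could be written as order by natural_sort_key(val). No csv conversion or quote escaping is needed here, because the string never passes through XQuery.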