Oracle: How can I implement the "natural" order in an SQL query?

E.g.,

foo1 foo2 foo10 foo100 

but not

 foo1 foo10 foo100 foo2 

Update: I am not interested in coding the sort myself (although that is interesting in its own right), but in having the database do the sorting for me.

sql oracle
3 answers

You can use functions in your ORDER BY clause. In this case, you can split out the non-numeric and numeric parts of the field and use them as separate ordering criteria. The query below orders by a leading number, then by a trailing number, then by the value itself.

    select *
    from t
    order by to_number(regexp_substr(a, '^[0-9]+')),
             to_number(regexp_substr(a, '[0-9]+$')),
             a;

You can also create a function-based index to support this:

    create index t_ix1 on t (
        to_number(regexp_substr(a, '^[0-9]+')),
        to_number(regexp_substr(a, '[0-9]+$')),
        a
    );
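
For values shaped like the question's examples (a text prefix followed by digits), ordering by the non-numeric part and the numeric part as two separate criteria could be sketched as follows. The sample rows and the prefix pattern '^[^0-9]*' are assumptions made for illustration, not part of the answer above.

```sql
-- Sketch only: assumes every value looks like <text><digits>.
with t as (
  select 'foo10'  as a from dual union all
  select 'foo2'        from dual union all
  select 'bar100'      from dual union all
  select 'foo1'        from dual union all
  select 'bar9'        from dual
)
select a
from t
order by regexp_substr(a, '^[^0-9]*'),            -- non-numeric prefix, compared as text
         to_number(regexp_substr(a, '[0-9]+$')),  -- trailing digits, compared as a number
         a;                                       -- tie-breaker
-- Rows come back as: bar9, bar100, foo1, foo2, foo10
```

Putting the prefix first keeps values with different prefixes grouped together before the numeric comparison is applied.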

I use the following function to zero-pad every sequence of digits shorter than 10 digits found in the value, so that each one becomes exactly 10 digits long. It works even for mixed sets of values that contain one or more sequences of digits.

    create or replace function NATURAL_ORDER( P_STR varchar2 ) return varchar2
    is
    /** --------------------------------------------------------------------
        Replaces all sequences of digits shorter than 10 digits with 0-padded
        numbers that are exactly 10 digits long. Useful for ordering by
        the NATURAL ORDER algorithm.
     */
        l_result varchar2( 32700 );
        l_len    integer;
        l_ix     integer;
        l_end    integer;
    begin
        l_result := P_STR;
        l_len := LENGTH( l_result );
        l_ix := 1;
        while l_len > 0 loop
            -- start of the next run of digits, searching from l_ix
            l_ix := REGEXP_INSTR( l_result, '[0-9]{1,9}', l_ix, 1, 0 );
            EXIT when l_ix = 0;
            -- position just past that run (first non-digit, or end of string)
            l_end := REGEXP_INSTR( l_result, '[^0-9]|$', l_ix, 1, 0 );
            if ( l_end - l_ix >= 10 ) then
                -- already 10 or more digits: leave the run unchanged
                l_ix := l_end;
            else
                l_result := substr( l_result, 1, l_ix - 1 )
                         || LPAD( SUBSTR( l_result, l_ix, l_end - l_ix ), 10, '0' )
                         || substr( l_result, l_end );
                l_ix := l_ix + 10;
            end if;
        end loop;
        return l_result;
    end;
    /

For example:

    select 'ABC' || LVL || 'DEF' as STR
    from (
        select LEVEL as LVL
        from DUAL
        start with 1=1
        connect by LEVEL <= 35
    )
    order by NATURAL_ORDER( STR )
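
To see what the function feeds into the sort, it may help to select the generated key next to the original value. This is a small sketch that assumes the NATURAL_ORDER function above has already been compiled in the current schema; the sample rows are taken from the question.

```sql
-- Sketch only: requires the NATURAL_ORDER function defined above.
with t as (
  select 'foo10'  as val from dual union all
  select 'foo2'          from dual union all
  select 'foo100'        from dual union all
  select 'foo1'          from dual
)
select val,
       natural_order(val) as sort_key  -- each digit run left-padded with zeros to 10 digits
from t
order by natural_order(val);
```

Because every digit run in the key has the same width, an ordinary string comparison of the keys should list the rows in the order the question asks for: foo1, foo2, foo10, foo100.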

For short strings with a small number of numerics

If the number of numeric parts and their maximum length are bounded, there is a regexp-based solution.

The idea is this:

  • Pad every number with 20 leading zeros
  • Remove the excess zeros with a second regular expression. This can be slow because of regexp backtracking.

Assumptions:

  • The maximum length of a number is known in advance (for example, 20 digits).
  • All numbers can be padded (in other words, a value such as lpad('1 ', 3000, '1 ') will fail, because the padded numbers do not fit into varchar2(4000) )

The following query is optimized for the case of "short numbers" (see the non-greedy *? ) and takes 0.4 seconds. With this approach, however, you need to fix the padding length in advance.

    select *
    from (
        select dbms_random.string('X', 30) val from xmltable('1 to 1000')
    )
    order by regexp_replace(
                 regexp_replace(val, '(\d+)', lpad('0', 20, '0')||'\1')
               -- the trailing (\D|$) makes the kept 21 digits the last ones of each number
               , '0*?(\d{21}(\D|$))', '\1');
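
The two steps (pad, then trim) are easier to follow on a few fixed values than on random strings. The sketch below uses the question's sample values and shows the intermediate result of each step. The column names padded and sort_key are made up for illustration, and the trim pattern with a trailing (\D|$) is an assumption of this sketch so that exactly the last 21 digits of each number are kept.

```sql
-- Sketch only: pad every number with 20 zeros, then trim each one back to 21 digits.
with t as (
  select 'foo1'   as val from dual union all
  select 'foo10'         from dual union all
  select 'foo100'        from dual union all
  select 'foo2'          from dual
)
select val,
       -- step 1: prefix each run of digits with 20 zeros
       regexp_replace(val, '(\d+)', lpad('0', 20, '0') || '\1') as padded,
       -- step 2: lazily drop leading zeros until exactly 21 digits remain
       regexp_replace(
         regexp_replace(val, '(\d+)', lpad('0', 20, '0') || '\1'),
         '0*?(\d{21}(\D|$))', '\1') as sort_key
from t
order by sort_key;
```

After step 2 every number occupies the same 21 characters, so comparing the keys as plain strings compares the numbers by value.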

Smart approach

Although a separate natural_sort function may be convenient, there is a little-known trick for doing it in pure SQL.

Key ideas:

  • Strip leading zeros from all numbers, so that 02 is ordered between 1 and 3 : regexp_replace(val, '(^|\D)0+(\d+)', '\1\2') . Note: this can lead to an “unexpected” ordering of 10.02 > 10.1 (since 02 is converted to 2 ); however, there is no single right answer for how to sort things like 10.02.03
  • Convert " to "" so that text containing quotes works correctly
  • Convert the input string to a comma-delimited format: '"'||regexp_replace(..., '([^0-9]+)', '","\1","')||'"'
  • Convert the CSV to a list of items via xmltable
  • Augment the numeric items so that a plain string sort orders them correctly
  • Use length(length(num))||length(num)||num instead of lpad(num, 10, '0') , as the latter is less compact and does not support numbers of 11 or more digits.

Note: the response time is approximately 3-4 seconds for sorting a list of 1000 random strings of length 30 (generating the random strings takes 0.2 seconds). The main consumer of time is xmltable , which splits the text into rows. If you use PL/SQL instead of xmltable to split a string into rows, the response time drops to 0.4 s for the same 1000 rows.
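
To see why the length-prefixed encoding from the last bullet sorts correctly as text, here is a minimal sketch on a few literal numbers, including one with 11 digits. The sample values and the column name sort_key are assumptions made for illustration.

```sql
-- Sketch only: encode each number as <digits in its length><its length><the number itself>.
with nums as (
  select '2'           as num from dual union all
  select '10'                 from dual union all
  select '100'                from dual union all
  select '12345678901'        from dual
)
select num,
       length(length(num)) || length(num) || num as sort_key
from nums
order by sort_key;
-- sort_key values, in order: 112, 1210, 13100, 21112345678901
```

A longer number always gets a larger prefix, so the string comparison settles on length first and only compares the digits themselves when the lengths match.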

The following query performs a natural sort of 100 random alphanumeric strings:

    select *
    from (
        select (select listagg(case when regexp_like(w, '^[0-9]')
                                    then length(length(w))||length(w)||w
                                    else w
                               end) within group (order by ord)
                from xmltable(t.csv columns w varchar2(4000) path '.', ord for ordinality) q
               ) order_by
             , t.*
        from (
            select '"'||regexp_replace(replace(
                           regexp_replace(val, '(^|\D)0+(\d+)', '\1\2')
                         , '"', '""')
                         , '([^0-9]+)', '","\1","')||'"' csv
                 , t.*
            from (
                select dbms_random.string('X', 30) val from xmltable('1 to 100')
            ) t
        ) t
    ) t
    order by order_by;
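
The csv column is the least obvious part of the query above. A minimal sketch on a single literal shows what the nested replacements produce before xmltable splits the result into items; the input 'foo02bar10' is an assumed sample value.

```sql
-- Sketch only: build the quoted, comma-delimited item list for one value.
select '"' || regexp_replace(
                replace(
                  regexp_replace('foo02bar10', '(^|\D)0+(\d+)', '\1\2'),  -- strip leading zeros: foo2bar10
                  '"', '""'),                                              -- double any embedded quotes
                '([^0-9]+)', '","\1","')                                   -- close and reopen quotes around each non-digit run
           || '"' as csv
from dual;
-- csv: "","foo","2","bar","10"
```

Each quoted item then becomes one row in xmltable, with ord preserving the original position so that listagg can reassemble the augmented key in order.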

The interesting part is that this ORDER BY can be expressed without subqueries in the FROM clause, so it makes a handy tool for driving your reviewer crazy:

    select *
    from (select dbms_random.string('X', 30) val from xmltable('1 to 100')) t
    order by (
        select listagg(case when regexp_like(w, '^[0-9]')
                            then length(length(w))||length(w)||w
                            else w
                       end) within group (order by ord)
        from xmltable('$X'
                 passing xmlquery(('"'||regexp_replace(replace(
                             regexp_replace(t.val, '(^|\D)0+(\d+)', '\1\2')
                           , '"', '""')
                           , '([^0-9]+)', '","\1","')||'"')
                         returning sequence) as X
                 columns w varchar2(4000) path '.', ord for ordinality) q
    );