SQL Server ISNUMERIC() equivalent in Amazon Redshift

  • I am using Amazon Redshift as a data store.
  • I have a string field (field1). Some rows begin with four digits and others with letters:

'test alpha'
'1382 test beta'

  • I want to filter out the rows that do not start with four digits.
  • Looking at the Redshift documentation, I don't believe ISNUMBER or ISNUMERIC exist as functions. The SIMILAR TO operator seems to be the best option.
  • I tried

    where left(field1,4) like '[0-9][0-9][0-9][0-9]'

This did not work, and from the link below it seems that Redshift may not support this:

https://forums.aws.amazon.com/message.jspa?messageID=439850

Is there an error in my WHERE clause? If not, and this syntax is not supported in Redshift, is there another way to filter? I was thinking about using

cast(left(field1,4) as integer) 

and then skipping the row if the cast raises an error, but I'm not sure how to do this in Amazon Redshift. Or is there some other proxy for an isnumeric filter?
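
The cast-and-catch idea can be prototyped locally before touching the cluster. This is a minimal Python sketch, not Redshift behaviour: `starts_with_number` is a hypothetical helper, and Python's `int()` stands in for `cast(left(field1,4) as integer)`, whose parsing rules may differ.

```python
def starts_with_number(value):
    """Hypothetical helper: True if the first four characters parse as an integer.

    int() is only a stand-in for cast(left(field1, 4) as integer);
    a failed conversion is treated as "does not start with a number".
    """
    if value is None:
        return False
    try:
        int(value[:4])  # stand-in for cast(left(field1, 4) as integer)
    except ValueError:
        return False
    return True


rows = ['test alpha', '1382 test beta']
print([r for r in rows if starts_with_number(r)])  # ['1382 test beta']
```

Note that `int()` also accepts prefixes such as `'+123'` or `' 12 '`, so a cast-based check is looser than a strict "four digits" test.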

Thanks.

8 answers

Redshift does not seem to support any of the following:

    where left(field1,4) like '[0-9][0-9][0-9][0-9]'
    where left(field1,4) ~ '^[0-9]{4}'
    where left(field1,4) like '^[0-9]{4}'

What works:

    where left(field1,4) between 0 and 9999

This returns all rows starting with four numeric characters.

It seems that although field1 is a string, the BETWEEN condition interprets left(field1,4) as a number when those characters are numeric (and does not raise an error when they are not). I will follow up if I find a problem. For example, I don't have any values below 1000, so I assume, but am not sure, that '0001' is interpreted as 1.
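
The open question about '0001' can be illustrated with a local analogy. This Python sketch assumes, as the answer reports but does not confirm, that a non-numeric prefix simply fails the test, and that the implicit conversion parses like Python's `int()`; `in_range_prefix` is a hypothetical helper, not a Redshift function.

```python
def in_range_prefix(value, low=0, high=9999):
    """Hypothetical helper emulating `left(field1, 4) between 0 and 9999`.

    Assumption: a prefix that cannot be read as a number fails the test
    instead of raising an error (as the answer reports for Redshift).
    """
    try:
        number = int(value[:4])  # '0001' parses as 1
    except (TypeError, ValueError):  # None or non-numeric prefix
        return False
    return low <= number <= high


for row in ['0001 low', '1382 test beta', 'test alpha', '-123 x']:
    print(repr(row), in_range_prefix(row))
```

Under this analogy a leading-zero prefix such as '0001' becomes 1 and still falls inside the range, while a prefix like '-123' parses but falls outside it.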

Try something like:

    where field1 ~ '^[0-9]{4}'

It matches any row where field1 starts with four digits.
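
To see which sample values the `~ '^[0-9]{4}'` condition keeps, the pattern can be tried locally with Python's `re` module. This is an approximation: Python's regex dialect overlaps with, but is not identical to, the POSIX syntax Redshift uses, though this simple pattern means the same thing in both.

```python
import re

# Like the `~` operator, re.search looks for a match anywhere in the string,
# so the leading ^ is what pins the four digits to the start.
FOUR_DIGIT_PREFIX = re.compile(r'^[0-9]{4}')

rows = ['test alpha', '1382 test beta', '12 short']
kept = [r for r in rows if FOUR_DIGIT_PREFIX.search(r)]
print(kept)  # ['1382 test beta']
```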

Although a lot of time has passed since this question was asked, I did not find an adequate answer. Therefore, I feel obligated to share my solution, which works fine on my Redshift cluster today (March 2016).

UDF:

    create or replace function isnumeric (aval VARCHAR(20000))
      returns bool IMMUTABLE
    as $$
      try:
          int(aval)
      except:
          return False
      return True
    $$ language plpythonu;

Usage:

    select isnumeric(mycolumn), * from mytable where isnumeric(mycolumn) = false
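
The UDF body is ordinary Python, so its logic can be checked locally before deploying. The sketch below is a Python 3 copy of the same try/except test; `isnumeric` here is a plain local function, not the Redshift UDF, and it catches `TypeError`/`ValueError` explicitly instead of using a bare `except`.

```python
def isnumeric(aval):
    """Local copy of the UDF logic: True if int() accepts the whole value."""
    try:
        int(aval)
    except (TypeError, ValueError):  # None raises TypeError, 'abc' ValueError
        return False
    return True


for value in ['1382', '1382 test beta', ' 42 ', '-7', '3.14', None]:
    print(repr(value), isnumeric(value))
```

Because `int()` is applied to the whole value, `'1382 test beta'` is rejected; for the original question, passing `left(field1, 4)` to the function would test only the prefix. Decimals such as `'3.14'` are rejected too.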

It looks like what you are looking for is the SIMILAR TO operator (see the Redshift documentation):

    where left(field1,4) similar to '[0-9]{4}'
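
SIMILAR TO differs from `~` in that the pattern must match the entire string, which is why it is applied to the four-character prefix returned by `left()`. A rough Python analogue uses `re.fullmatch`; `left4_similar_to_digits` is a hypothetical helper, and this sketch ignores SIMILAR TO-specific syntax such as `%` and `_`.

```python
import re


def left4_similar_to_digits(value):
    """Hypothetical helper emulating `left(value, 4) similar to '[0-9]{4}'`."""
    # SIMILAR TO must match the whole operand, hence fullmatch, not search.
    return re.fullmatch(r'[0-9]{4}', value[:4]) is not None


print(left4_similar_to_digits('1382 test beta'))  # True
print(left4_similar_to_digits('test alpha'))      # False
# Without left(), the whole-string requirement makes the same pattern fail:
print(re.fullmatch(r'[0-9]{4}', '1382 test beta') is not None)  # False
```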

According to Amazon, POSIX regular-expression operators such as ~ are slow: https://docs.aws.amazon.com/redshift/latest/dg/pattern-matching-conditions.html

Using Redshift's own REGEXP_* functions seems to be faster: https://docs.aws.amazon.com/redshift/latest/dg/String_functions_header.html

To get a true/false check for integers only, I have successfully used the following:

    REGEXP_COUNT(my_field_to_check, '^[0-9]+$') > 0

REGEXP_COUNT returns 1 if the value is entirely numeric and 0 otherwise.
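
Why comparing with 0 works: with both anchors the pattern can match at most once, so the count is 1 or 0. The helper below is a hypothetical Python stand-in for `REGEXP_COUNT(my_field_to_check, '^[0-9]+$')`; it uses `re.fullmatch` to avoid a Python quirk where `$` also matches just before a trailing newline.

```python
import re


def digits_only_count(value):
    """Hypothetical stand-in for REGEXP_COUNT(value, '^[0-9]+$').

    With both anchors the pattern can match at most once, so this returns 1 or 0.
    """
    return 1 if re.fullmatch(r'[0-9]+', value) else 0


for value in ['1382', '1382 test beta', '', '-7']:
    print(repr(value), digits_only_count(value) > 0)
```

Only unsigned integers pass: an empty string or a signed value such as `'-7'` yields 0.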

    where regexp_instr(field1,'^[0-9]{4}') = 0

excludes rows that start with four digits (regexp_instr returns 1 for rows whose field1 starts with four digits).
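
A local sketch of why `= 0` selects the non-matching rows: REGEXP_INSTR returns the 1-based position where the first match starts, or 0 when there is no match. `regexp_instr` below is a hypothetical Python stand-in that models only the two-argument form.

```python
import re


def regexp_instr(source, pattern):
    """Hypothetical stand-in for REGEXP_INSTR(source, pattern).

    Returns the 1-based start position of the first match, or 0 if none.
    """
    match = re.search(pattern, source)
    return match.start() + 1 if match else 0


rows = ['test alpha', '1382 test beta']
# Mirrors: where regexp_instr(field1, '^[0-9]{4}') = 0
print([r for r in rows if regexp_instr(r, r'^[0-9]{4}') == 0])  # ['test alpha']
```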

We tried the following and it worked in most of our scenarios:

    column ~ '^[-]{0,1}[0-9]{1,}[.]{0,1}[0-9]{0,}$'

It matches positive and negative integers and decimal numbers.
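
The signed-number pattern can be exercised locally with Python's `re` module to see its edge cases. This sketch uses `re.fullmatch` in place of the `^...$` anchors; it is a check of the pattern itself, not of Redshift.

```python
import re

# Optional minus, one or more digits, optional dot, optional trailing digits.
NUMBER = re.compile(r'[-]{0,1}[0-9]{1,}[.]{0,1}[0-9]{0,}')

for value in ['42', '-42', '3.14', '-0.5', '12.', '.5', '+7', '1e3']:
    print(repr(value), NUMBER.fullmatch(value) is not None)
```

Values such as `'12.'` are accepted, while `'.5'`, an explicit `'+7'`, and exponent notation like `'1e3'` are rejected.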

Redshift should support SIMILAR TO:

    WHERE field1 SIMILAR TO '[0-9]{4}%'

This matches values of field1 that start with four characters in the range 0 to 9, followed by anything else.
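
A simplified Python sketch of how `'[0-9]{4}%'` behaves: `similar_to` is a hypothetical helper that translates only the two SQL wildcards, `%` (any sequence) and `_` (any single character), into regex syntax and then requires a whole-string match. It does not handle escapes or the other SIMILAR TO features.

```python
import re


def similar_to(value, pattern):
    """Hypothetical, simplified emulation of `value SIMILAR TO pattern`.

    Only % -> .* and _ -> . are translated; the rest is passed to Python's
    re unchanged, and the whole string must match.
    """
    regex = pattern.replace('%', '.*').replace('_', '.')
    return re.fullmatch(regex, value, flags=re.DOTALL) is not None


for row in ['test alpha', '1382 test beta', '1382']:
    print(repr(row), similar_to(row, '[0-9]{4}%'))
```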
