Monday, January 28, 2008

Is this string a number? Really?

Jared Still ponders this question in a recent blog post. He runs some benchmarks against the various approaches and comes to the conclusion that using TRANSLATE() is the fastest approach. Which is fine, as far as it goes. It's a solution which works for Jared's situation but is not universally applicable.

Note that I have slightly complicated Jared's approach to allow for fake_number values of varying length:
`SQL> select * from detect_numeric  2  order by 1  3  /FAKE_NUMBER--------------------0000000000010000020000100000110000120000200000210000229 rows selected.SQL> select fake_number  2  from detect_numeric  3  where lpad('|', length(fake_number), '|')  4       = translate(fake_number,'0123456789','||||||||||')  5  order by 1  6  /FAKE_N------0000000000010000020000100000110000120000200000210000229 rows selected.SQL>`
So far so good. Let's add another record....
`SQL> insert into detect_numeric values ('123.45')  2  /1 row created.SQL> select fake_number  2  from detect_numeric  3  where lpad('|', length(fake_number), '|')  4       = translate(fake_number,'0123456789','||||||||||')  5  order by 1  6  /FAKE_N------0000000000010000020000100000110000120000200000210000229 rows selected.SQL>`
Wha'pen? Isn't 123.45 is a number? Well, no, not in this context. The TRANSLATE() call is only counting digits. Hmmm, obviously we need to allow for decimal points.
`SQL> select fake_number  2  from detect_numeric  3  where lpad('|', length(fake_number), '|')  4       = translate(fake_number,'0123456789.','|||||||||||')  5  order by 1  6  /FAKE_N------000000000001000002000010000011000012000020000021000022123.4510 rows selected.SQL>`
Problem solved? Not quite. There's more to being numeric than just comprising digits and decimal points. They have to be in the right quantities and right places. Let's add an IP address to the mix....
` SQL> insert into detect_numeric values ('127.0.0.1')  2  /1 row created.SQL> select fake_number  2  from detect_numeric  3  where lpad('|', length(fake_number), '|')  4       = translate(fake_number,'0123456789.','|||||||||||')  5  order by 1  6  /FAKE_NUMBER--------------------000000000001000002000010000011000012000020000021000022123.45127.0.0.111 rows selected.SQL> `
And that's why we might need a function like IS_NUMERIC(), which wraps a TO_NUMBER call:
`SQL> create or replace function is_numeric  2      (p_str in varchar2, p_fmt_msk in varchar2 := null)  3      return varchar2  4  as  5    return_value varchar2(5);  6    n number;  7  begin  8      begin  9         if p_fmt_msk is null then 10         n := to_number(p_str); 11         else 12         n := to_number(p_str, p_fmt_msk); 13         end if; 14         return_value := 'TRUE'; 15      exception 16        when others then 17         return_value := 'FALSE'; 18      end; 19    return return_value; 20  end; 21  /Function created.SQL> column is_numeric format a10SQL> select fake_number, is_numeric(fake_number) is_numeric  2  from detect_numeric  3  order by 2,1  4  /FAKE_NUMBER          IS_NUMERIC-------------------- ----------127.0.0.1            FALSE000000               TRUE000001               TRUE000002               TRUE000010               TRUE000011               TRUE000012               TRUE000020               TRUE000021               TRUE000022               TRUE123.45               TRUE11 rows selected.SQL> `

All of which underlines the importance of understanding the data with which we are working. If we just need to assert that a string consists solely of digits then a simple TRANSLATE() will suffice and will be very efficient. But if we need to assert something more precise - that the string contains a valid number - then we may require a slower but more reliable approach.

Incidentally, anyone who is interested in seeing how to use 10g's regex functionality to winnow numeric strings from non-numeric strings should read this OTN Forum post from CD.

Labels: ,