Monday, March 31, 2008

Comparing CHAR values

Here is a table with a single row of data.

SQL> create table my_tab (c2 char(2))
2 /

SQL> insert into my_tab (c2) values ('Y ')
2 /

1 row created.

SQL>

Which of the following queries will match that row?

select * from my_tab where c2 = 'Y'
/

select * from my_tab where c2 = 'Y '
/

select * from my_tab where c2 = 'Y '
/

If you said all three you get a cigar (providing you live in a jurisdiction where such infernal devices are still permitted).

SQL> select * from my_tab where c2 = 'Y';
c2
--
Y

SQL> select * from my_tab where c2 = 'Y ';
c2
--
Y

SQL> select * from my_tab where c2 = 'Y ';
c2
--
Y

The reason is due to Oracle's rules for comparing blank-padded datatypes. If the two values are of differing sizes Oracle pads the smaller variable with the requisite number of blank spaces. Obviously it is documented.

On the whole I think this is a boon - CHAR columns are a pain in the neck at the best of times, without having to worry unnecessarily about additional RPAD calls. This is unfortunate if you are relying on 'Y ' being different to 'Y'; but if your application depends on trailing spaces for data integrity then you probably have bigger problems.

8 comments:

Noons said...

why would CHAR still be used as a data type baffles me.

Worked once in an IBM site where all character fields were mandatorily defined as CHAR in all databases, be it db2 or oracle.

The theory was that it made for better performance as VARCHAR implies a lookup for the length while CHAR is fixed length.

Never been able to prove it one way or the other although I'm quite sure in db2 it must be terribly important...

anyways: I digress. good catch, Andrew.

Laurent Schneider said...

note the "like" operator is different.

SCOTT@LSC08> select * from t where x like 'x';
no rows selected
SCOTT@LSC08> select * from t where x like 'x ';
X
--
x
SCOTT@LSC08> select * from t where x like 'x ';
no rows selected

It used to be different until 9.2.0.5 (+- 10%) where they "corrected" to the appropriate behavior which is apparently correct

Anonymous said...

CHAR has to be there for backward compatibility and because, if it wasn't there, people would demand it. Maybe not for very good reasons, but it's free to leave it in.

I've never seen the performance improvements of char comparison over varchar2 actually demonstrated, though I've seen it quoted a few times. I suspect that when it comes to packing out the smallest variable to the length of the longest, any small performance gain is lost.

However, there is the space saving consideration. There is no length byte overhead. Thus for a one character column, it still makes sense to set it to char(1) as opposed to varchar2(1). Yes, disk is cheap and the overall impact on your database space usage is tiny, but heck it's a free saving.

SydOracle said...

Worth adding :
set serveroutput on

declare
v_v1 varchar2(1) := 'Y';
v_v2 varchar2(2) := 'Y ';
v_v3 varchar2(3) := 'Y ';
v_c1 char(1) := 'Y';
v_c2 char(2) := 'Y ';
v_c3 char(3) := 'Y ';
v_cnt number;
begin
select count(*) into v_cnt from my_tab where c2 = v_v1;
dbms_output.put_line ('v1:'||v_cnt);
select count(*) into v_cnt from my_tab where c2 = v_v2;
dbms_output.put_line ('v2:'||v_cnt);
select count(*) into v_cnt from my_tab where c2 = v_v3;
dbms_output.put_line ('v3:'||v_cnt);
select count(*) into v_cnt from my_tab where c2 = v_c1;
dbms_output.put_line ('c1:'||v_cnt);
select count(*) into v_cnt from my_tab where c2 = v_c2;
dbms_output.put_line ('c2:'||v_cnt);
select count(*) into v_cnt from my_tab where c2 = v_c3;
dbms_output.put_line ('c3:'||v_cnt);
end;
/

v1:0
v2:1
v3:0
c1:1
c2:1
c3:1

When you compare a VARCHAR2 to a CHAR, the results differ.
Two conclusions
1. When you use a literal in SQL, it is a CHAR, not a VARCHAR2. So a lot of people use CHAR and don't know it.
2. When comparing a CHAR to a VARCHAR2, the CHAR gets cast to a VARCHAR2, not the other way round.

PS. A nullable CHAR still has to store a length, or at least a null indicator. But the 'nullability' is actually a metadata property, so even a column declared as a NOT NULL CHAR(1) has a byte set aside for length.

Joel Garry said...

The 3rd party stuff I work on only recently, generationally speaking, converted from char to varchar. I'm still finding all sorts of things with trailing blanks, null strangenesses, etc. etc. every time I try to upgrade or mass-process something. But definitely better than when it was char.

word: lhjdwmex

Anonymous said...

Not to mention this on version 10.2.0.3

SQL> SELECT DECODE(c2, 'Y', 'YES', 'NO') none,
2 DECODE(c2, 'Y ', 'YES', 'NO') one,
3 DECODE(c2, 'Y ', 'YES', 'NO') two
4 FROM t;

NON ONE TWO
--- --- ---
NO YES NO
At least case works like a plain select

William Robertson said...

I think some sort of perverse standardisation requirement is behind CHAR, certainly not backward compatibility. It was introduced in its present form in Oracle v6 and has not changed since that day. There was no previous version that relied on blank-padding semantics.

Also there is no length byte and no space saving, as is fairly obvious from any simple test.

William Robertson said...

Correction - now I check, CHAR was introduced in its present form in Oracle v7.0 and has not changed since that day apart an increase in maximum length from the original 255 to 2000.

I'm afraid the early 90s are becoming a bit of a blur.