Monday, November 07, 2005

Trailing Spaces in SQL Server 2000

String values in SQL Server are rather unique, and I recently discovered some quirks of strings in SQL Server I though I would share. Some of these quirks are documented in SQL Server Books Online and some are not.

Trailing Spaces and the LEN Function
Having programmed in C/C++, VB, C#, and other programming languages, I am very familiar with functions that return the length of strings. In all of the cases that I know of a “X” (a string with a single X character) string is 1 character long and a “X ” (an X character with a trailing space) string is two characters long. This does not appear to be the case in SQL Server. The following is the results of a script I wrote to demonstrate this.

Line Expression        Result    Unicode Exp.      Unicode Result
---- ----------------- --------- ----------------- --------------
 1   LEN('X')          1         LEN(N'X')         1
 2   LEN('X ')         1         LEN(N'X ')        1
 3   LEN('X     ')     1         LEN(N'X    ')     1
 4   LEN(' X ')        2         LEN(N' X ')       2

As you can see, the LEN function ignores trailing spaces. This is actually a documented feature of the function. SQL Server Books Online states the following:

LEN
Returns the number of characters, rather than the number of bytes, of the given string expression, excluding trailing blanks.

If you do want to know the length of a string including trailing spaces, you have a few options. One is to use the DATALENGTH function. SQL Server Books Online defines DATALENGTH as follows:

DATALENGTH
Returns the number of bytes used to represent any expression.

Running the same set of strings through the DATALENGTH function returns the following results:

Line Expression           Result  Unicode Exp.          Unicode R
---- -------------------- ------- --------------------- ---------
 5   DATALENGTH('X')      1       DATALENGTH(N'X')      2
 6   DATALENGTH('X ')     2       DATALENGTH(N'X ')     4
 7   DATALENGTH('X     ') 6       DATALENGTH(N'X     ') 12
 8   DATALENGTH(' X ')    3       DATALENGTH(N' X ')    6

These results are more in line with what you would expect, except you need to be aware of what kind of strings you are dealing with. If you use the DATALENGTH function on a Unicode string the result is actually twice the length of the string. You will need to divide the result by two to get the string length.

Another option would be to create a user defined function that appended a single known character to the end of the input string, perform the LEN function and subtracted one from the result, but I do not see how that is any better than just using the DATALENGTH function.

Comparing Strings with Trailing Spaces
It would appear that trailing spaces have no significance to other SQL Server 2000 operations as well. For example, if I were to compare the string “X” (a string with a single X character) with the string “X ” (an X character with a trailing space), I would expect them to be unequal. However, in SQL Server 2000 these strings are considered equal.

This makes since if the strings in question are CHAR or NCHAR strings, because CHAR and NCHAR strings are always padded with extra spaces to the length of the string. If you have an “X” (a string with a single X character) string and a “X ” (an X character with a trailing space) stored in a CHAR(3) variable or column, the actual value of both strings is “X ” (an X character with two trailing spaces).

If the strings in question are VARCHAR or NVARCHAR strings and ANSI_PADDING is ON, no padding or trimming of string values should take place. If you store an “X” (a string with a single X character) string into a VARCHAR(3) variable or column, an “X” (a string with a single X character) string is exactly what is stored. Likewise, if you store an “X ” (an X character with a trailing space) string into a VARCHAR(3) variable or column, a “X ” (an X character with a trailing space) string is exactly what is stored. Therefore if you compare “X” (a string with a single X character) and “X ” (an X character with a trailing space), one would expect them to be different. That, however, is not the case with SQL Server 2000. Below is a query and the query results that demonstrates this oddity:

DECLARE @NVarChar1 NVARCHAR(10)
DECLARE @NVarChar2 NVARCHAR(10)
 
SET @NVarChar1 = N'X'
SET @NVarChar2 = N'X '
 
SELECT NCHAR(39) + @NVarChar1 + NCHAR(39) AS [@NVarChar1],
      NCHAR(39) + @NVarChar2 + NCHAR(39) AS [@NVarChar2],
      CASE WHEN @NVarChar1 = @NVarChar2 
      THEN N'True' ELSE N'False' END AS [@NVarChar1 = @NVarChar2]

Results:
@NVarChar1   @NVarChar2   @NVarChar1 = @NVarChar2
------------ ------------ -----------------------
'X'          'X '         True

If trailing spaces are significant in your query, one way around this would be to add the same character to the end of both strings when you do the comparison. This will not change the results of strings that are equal, but strings with unequal numbers of trailing spaces would no longer be equal.

SELECT NCHAR(39) + @NVarChar1 + NCHAR(39) AS [@NVarChar1],
      NCHAR(39) + @NVarChar2 + NCHAR(39) AS [@NVarChar2],
      CASE WHEN @NVarChar1 + NCHAR(1) = @NVarChar2 + NCHAR(1)
      THEN N'True' ELSE N'False' END AS [@NVarChar1 = @NVarChar2]
 
Results:
@NVarChar1   @NVarChar2   @NVarChar1 = @NVarChar2
------------ ------------ -----------------------
'X'          'X '         False

Some Lessons Learned
Here are some things to remember about SQL Server 2000:

   LEN('X') = 1
   LEN('X ') = 1
   DATALENGTH('X') = 1
   DATALENGTH('X ') = 2
   'X' = 'X ' returns True

No comments: