Tuesday, November 29, 2005

Dynamic IN Clauses in SQL Server 2000

Background
Sometimes it is necessary to limit the results of a SELECT statement by some list of values. This is typically done using the IN operator. The following is an example query that uses the IN operator:

SELECT *
FROM TestOrder
WHERE OrderDateTime BETWEEN '1/1/2005' AND '1/4/2005' AND
    State IN ('MO', 'TX')

Often both the list of values used to limit the results of a SELECT statement and the size of that list is dynamic. Consider an application that allows users to search for orders by state. If this application allows users to search for orders in multiple states at the same time, the number of states selected could be different for each search.

This is not a big deal if the application dynamically generates the SQL statements, but if the application is required to use stored procedures for all of its data access, these dynamic lists can cause trouble. In SQL Server 2000 there is no mechanism for passing in an array of values to a stored procedure, so other ways must be employed.

Dynamically Generate SQL in Stored Procedure
One solution is to dynamically generate SQL in the stored procedure. The following is a stored procedure that makes use of this technique:

CREATE PROCEDURE DynamicFindOrders1
    @StartDateTime DATETIME,
    @EndDateTime DATETIME,
    @StateList VARCHAR(7500)
AS
BEGIN
    SET NOCOUNT ON;

    DECLARE @DynamicSQL VARCHAR(8000)

    SET @DynamicSQL = 'SELECT * ' +
        'FROM TestOrder ' +
        'WHERE OrderDateTime BETWEEN ''' +
            CONVERT(VARCHAR(10), @StartDateTime, 101) +
            ''' AND ''' +
            CONVERT(VARCHAR(10), @EndDateTime, 101) + ''' AND ' +
            'State IN (' + @StateList + ')'

    SET NOCOUNT OFF;

    EXECUTE (@DynamicSQL)
END

The following is an example call to this stored procedure:
EXEC DynamicFindOrders1 '1/1/2005', '1/4/2005', '''MO'', ''TX'''

This solution works, but some of the benefits of stored procedures are lost when dynamic SQL is introduced. Another version of using dynamic SQL is shown below:

CREATE PROCEDURE DynamicFindOrders2
    @StartDateTime DATETIME,
    @EndDateTime DATETIME,
    @StateList VARCHAR(3000)
AS
BEGIN
    SET NOCOUNT ON;

    DECLARE @DynamicSQL NVARCHAR(4000)

    SET @DynamicSQL = 'SELECT * ' +
        'FROM TestOrder ' +
        'WHERE OrderDateTime BETWEEN @StartDateTime AND @EndDateTime AND ' +
            'State IN (' + @StateList + ')'

    SET NOCOUNT OFF;

    EXECUTE sp_executesql @DynamicSQL,
        N'@StartDateTime DATETIME, @EndDateTime DATETIME',
        @StartDateTime, @EndDateTime
END

The following is an example call to this stored procedure:
EXEC DynamicFindOrders2 '1/1/2005', '1/4/2005', '''MO'', ''TX'''

Using a User Defined Function
Another way to dynamically restrict the results of a SELECT statement in a stored procedure is through the use of a user defined function. The following is an example function that returns True (1) if a value appears in a list and False (0) if the value does not appear in the list.

-- Purpose: Indicates if the input value is in the list of values.
-- Returns: True if the indicated value is in the list of values.
-- Arguments:
-- InputValue - Value to look for
-- ListOfValues - Delimeter list of values
-- Delimeter - Delimeter character, cannot be a space (default: ,)
-------------------------------------------------------------------
CREATE FUNCTION IsValueInList(
    @InputValue VARCHAR(7996),
    @ListOfValues VARCHAR(7996),
    @Delimeter VARCHAR(2) = ',')
RETURNS BIT AS
BEGIN
    RETURN CASE WHEN CHARINDEX(@Delimeter + @InputValue + @Delimeter,
            @Delimeter + @ListOfValues + @Delimeter) > 0 THEN 1
        ELSE 0
    END
END

This function uses a comma character (,) as its delimiter by default. The delimiter cannot appear as part of a value in the @ListOfValues argument. A space character cannot be used as a delimiter. The @ListOfValues argument cannot have any white space between items in the list. For example, “MO,TX,MN” would be an acceptable @ListOfValues, but “MO, TX, MN” would not work.

Using this function, the stored procedure used above can be rewritten as follows:

CREATE PROCEDURE FunctionFindOrders1
    @StartDateTime DATETIME,
    @EndDateTime DATETIME,
    @StateList VARCHAR(7996)
AS
BEGIN
    SELECT *
    FROM TestOrder
    WHERE OrderDateTime BETWEEN @StartDateTime AND @EndDateTime AND
        dbo.IsValueInList(State, @StateList, DEFAULT) = 1
END

The following is an example call to this stored procedure:
EXEC FunctionFindOrders1 '1/1/2005', '1/4/2005', 'MO,TX'

This approach eliminates the dynamic SQL in the stored procedure, but it also causes the stored procedure to use a less efficient query plan. Assuming there is an index on the OrderDateTime and State columns, the dynamic SQL solution will often make use of both indexes (i.e. use index seeks). However, in the case of the user defined function only the OrderDateTime index can be used. An index scan (SQL Server reads every row in the index) is performed on the State index to resolve the “dbo.IsValueInList(State, @StateList, DEFAULT) = 1” condition.

Converting List to Table
Another approach is to convert the list of values into an in-memory table and join against it to restrict the results of the SELECT statement. The following user defined function demonstrates how to convert a delimited string into an in-memory table:

-- Purpose: Converts an input string into a single value table.
-- Returns: Table with one VARCHAR(1000) column called Value.
-- Arguments:
-- InputString - String to convert
-- Delimeter - Delimeter character, cannot be a space (default: ,)
-- ExcludeEmpty - If true, do not include empty items (default: False)
-- TrimValues - If true, leading and trailing spaces are removed
-- (default: True)
-------------------------------------------------------------------
CREATE FUNCTION ConvertStringToTable(
    @InputString VARCHAR(8000),
    @Delimeter VARCHAR(2) = ',',
    @ExcludeEmpty BIT = 0,
    @TrimValues BIT = 1)
RETURNS @Values TABLE (Value VARCHAR(1000)) AS
BEGIN

DECLARE @DelimitedList VARCHAR(8000)
DECLARE @Continue BIT
DECLARE @StartPos INTEGER
DECLARE @TextLength INTEGER
DECLARE @DelimeterPosition INTEGER
DECLARE @TextValue VARCHAR(1000)
DECLARE @DelimeterLength INTEGER

IF @TrimValues = 1 BEGIN
    SET @DelimitedList = LTRIM(RTRIM(@InputString))
END ELSE BEGIN
    SET @DelimitedList = @InputString
END

IF LEN(@DelimitedList) = 0 BEGIN
    SET @Continue = 0
END
ELSE BEGIN
    IF LEN(@Delimeter) < 1 BEGIN
        -- Cannot use '', ' ', etc. for a delimeter
        SET @Delimeter = ','
    END
    SET @Continue = 1
    SET @StartPos = 1
    SET @TextLength = LEN(@DelimitedList)
    SET @DelimeterLength = LEN(@Delimeter)
END

WHILE @Continue = 1 BEGIN
    SET @DelimeterPosition = CHARINDEX(@Delimeter,
        SUBSTRING(@DelimitedList, @StartPos,
        @TextLength - @StartPos + @DelimeterLength))
    IF @DelimeterPosition > 0 BEGIN
        IF @TrimValues = 1 BEGIN
            SET @TextValue = RTRIM(LTRIM(SUBSTRING(@DelimitedList,
                @StartPos, (@DelimeterPosition - 1))))
        END ELSE BEGIN
            SET @TextValue = SUBSTRING(@DelimitedList, @StartPos,
                (@DelimeterPosition - 1))
        END
        SET @StartPos = @DelimeterPosition + @StartPos +
            (@DelimeterLength - 1)
    END
    ELSE BEGIN
        IF @TrimValues = 1 BEGIN
            SET @TextValue = RTRIM(LTRIM(SUBSTRING(@DelimitedList,
                @StartPos, (@TextLength - @StartPos) +
                @DelimeterLength)))
        END ELSE BEGIN
            SET @TextValue = SUBSTRING(@DelimitedList, @StartPos,
                (@TextLength - @StartPos) + @DelimeterLength)
        END
        SELECT @Continue = 0
    END
    IF @ExcludeEmpty = 0 OR @TextValue <> '' BEGIN
        INSERT INTO @Values VALUES(@TextValue)
    END
END -- WHILE @Continue = 1 BEGIN

RETURN
END

Unlike the IsValueInList function shown above, this function allows white space in-between values. The following is the example stored procedure rewritten to use the ConvertStringToTable function:

CREATE PROCEDURE FunctionFindOrders2
    @StartDateTime DATETIME,
    @EndDateTime DATETIME,
    @StateList VARCHAR(8000)
AS
BEGIN
    SELECT *
    FROM TestOrder
        INNER JOIN dbo.ConvertStringToTable(@StateList, DEFAULT, 1, 1)
            AS StateList ON StateList.Value = TestOrder.State
    WHERE OrderDateTime BETWEEN @StartDateTime AND @EndDateTime
END

The following is an example call to this stored procedure:
EXEC FunctionFindOrders2 '1/1/2005', '1/4/2005', 'MO,TX'

This approach eliminates the dynamic SQL in the stored procedure, and also performs much better than the IsValueInList function.

In a rather unscientific test, I found that the two dynamic SQL stored procedures perform the best. The ConvertStringToTable stored procedure performs almost as well as the dynamic SQL, but it spends a fair amount of CPU resources to convert input string into a table. The IsValueInList stored procedure performs horribly compared to the other three options.

Things to Think About

Dynamically Generate SQL in Stored Procedure
Pros
  • SQL Query Optimizer can make best use of available indexes
  • Works well with large amounts of data
Cons
  • May be susceptible to SQL Injection attack

Using the IsValueInList User Defined Function
Pros
  • Not susceptible to SQL Injection attack
  • Avoids use of dynamic SQL
Cons
  • SQL Server will have to perform index scans or table scans
  • Does not work well with large amounts of data

Using the ConvertStringToTable User Defined Function
Pros
  • Not susceptible to SQL Injection attack
  • Avoids use of dynamic SQL
Cons
  • Takes time to convert the string into an in-memory table

Other Techniques to Consider
  • Consider restricting the number of items that can be searched on to a fixed amount so that dynamic SQL will not have to be used.

No comments: