Tuesday, November 29, 2005

Dynamic IN Clauses in SQL Server 2000

Background
Sometimes it is necessary to limit the results of a SELECT statement by some list of values. This is typically done using the IN operator. The following is an example query that uses the IN operator:

SELECT *
FROM TestOrder
WHERE OrderDateTime BETWEEN '1/1/2005' AND '1/4/2005' AND
    State IN ('MO', 'TX')

Often both the list of values used to limit the results of a SELECT statement and the size of that list is dynamic. Consider an application that allows users to search for orders by state. If this application allows users to search for orders in multiple states at the same time, the number of states selected could be different for each search.

This is not a big deal if the application dynamically generates the SQL statements, but if the application is required to use stored procedures for all of its data access, these dynamic lists can cause trouble. In SQL Server 2000 there is no mechanism for passing in an array of values to a stored procedure, so other ways must be employed.

Dynamically Generate SQL in Stored Procedure
One solution is to dynamically generate SQL in the stored procedure. The following is a stored procedure that makes use of this technique:

CREATE PROCEDURE DynamicFindOrders1
    @StartDateTime DATETIME,
    @EndDateTime DATETIME,
    @StateList VARCHAR(7500)
AS
BEGIN
    SET NOCOUNT ON;

    DECLARE @DynamicSQL VARCHAR(8000)

    SET @DynamicSQL = 'SELECT * ' +
        'FROM TestOrder ' +
        'WHERE OrderDateTime BETWEEN ''' +
            CONVERT(VARCHAR(10), @StartDateTime, 101) +
            ''' AND ''' +
            CONVERT(VARCHAR(10), @EndDateTime, 101) + ''' AND ' +
            'State IN (' + @StateList + ')'

    SET NOCOUNT OFF;

    EXECUTE (@DynamicSQL)
END

The following is an example call to this stored procedure:
EXEC DynamicFindOrders1 '1/1/2005', '1/4/2005', '''MO'', ''TX'''

This solution works, but some of the benefits of stored procedures are lost when dynamic SQL is introduced. Another version of using dynamic SQL is shown below:

CREATE PROCEDURE DynamicFindOrders2
    @StartDateTime DATETIME,
    @EndDateTime DATETIME,
    @StateList VARCHAR(3000)
AS
BEGIN
    SET NOCOUNT ON;

    DECLARE @DynamicSQL NVARCHAR(4000)

    SET @DynamicSQL = 'SELECT * ' +
        'FROM TestOrder ' +
        'WHERE OrderDateTime BETWEEN @StartDateTime AND @EndDateTime AND ' +
            'State IN (' + @StateList + ')'

    SET NOCOUNT OFF;

    EXECUTE sp_executesql @DynamicSQL,
        N'@StartDateTime DATETIME, @EndDateTime DATETIME',
        @StartDateTime, @EndDateTime
END

The following is an example call to this stored procedure:
EXEC DynamicFindOrders2 '1/1/2005', '1/4/2005', '''MO'', ''TX'''

Using a User Defined Function
Another way to dynamically restrict the results of a SELECT statement in a stored procedure is through the use of a user defined function. The following is an example function that returns True (1) if a value appears in a list and False (0) if the value does not appear in the list.

-- Purpose: Indicates if the input value is in the list of values.
-- Returns: True if the indicated value is in the list of values.
-- Arguments:
-- InputValue - Value to look for
-- ListOfValues - Delimeter list of values
-- Delimeter - Delimeter character, cannot be a space (default: ,)
-------------------------------------------------------------------
CREATE FUNCTION IsValueInList(
    @InputValue VARCHAR(7996),
    @ListOfValues VARCHAR(7996),
    @Delimeter VARCHAR(2) = ',')
RETURNS BIT AS
BEGIN
    RETURN CASE WHEN CHARINDEX(@Delimeter + @InputValue + @Delimeter,
            @Delimeter + @ListOfValues + @Delimeter) > 0 THEN 1
        ELSE 0
    END
END

This function uses a comma character (,) as its delimiter by default. The delimiter cannot appear as part of a value in the @ListOfValues argument. A space character cannot be used as a delimiter. The @ListOfValues argument cannot have any white space between items in the list. For example, “MO,TX,MN” would be an acceptable @ListOfValues, but “MO, TX, MN” would not work.

Using this function, the stored procedure used above can be rewritten as follows:

CREATE PROCEDURE FunctionFindOrders1
    @StartDateTime DATETIME,
    @EndDateTime DATETIME,
    @StateList VARCHAR(7996)
AS
BEGIN
    SELECT *
    FROM TestOrder
    WHERE OrderDateTime BETWEEN @StartDateTime AND @EndDateTime AND
        dbo.IsValueInList(State, @StateList, DEFAULT) = 1
END

The following is an example call to this stored procedure:
EXEC FunctionFindOrders1 '1/1/2005', '1/4/2005', 'MO,TX'

This approach eliminates the dynamic SQL in the stored procedure, but it also causes the stored procedure to use a less efficient query plan. Assuming there is an index on the OrderDateTime and State columns, the dynamic SQL solution will often make use of both indexes (i.e. use index seeks). However, in the case of the user defined function only the OrderDateTime index can be used. An index scan (SQL Server reads every row in the index) is performed on the State index to resolve the “dbo.IsValueInList(State, @StateList, DEFAULT) = 1” condition.

Converting List to Table
Another approach is to convert the list of values into an in-memory table and join against it to restrict the results of the SELECT statement. The following user defined function demonstrates how to convert a delimited string into an in-memory table:

-- Purpose: Converts an input string into a single value table.
-- Returns: Table with one VARCHAR(1000) column called Value.
-- Arguments:
-- InputString - String to convert
-- Delimeter - Delimeter character, cannot be a space (default: ,)
-- ExcludeEmpty - If true, do not include empty items (default: False)
-- TrimValues - If true, leading and trailing spaces are removed
-- (default: True)
-------------------------------------------------------------------
CREATE FUNCTION ConvertStringToTable(
    @InputString VARCHAR(8000),
    @Delimeter VARCHAR(2) = ',',
    @ExcludeEmpty BIT = 0,
    @TrimValues BIT = 1)
RETURNS @Values TABLE (Value VARCHAR(1000)) AS
BEGIN

DECLARE @DelimitedList VARCHAR(8000)
DECLARE @Continue BIT
DECLARE @StartPos INTEGER
DECLARE @TextLength INTEGER
DECLARE @DelimeterPosition INTEGER
DECLARE @TextValue VARCHAR(1000)
DECLARE @DelimeterLength INTEGER

IF @TrimValues = 1 BEGIN
    SET @DelimitedList = LTRIM(RTRIM(@InputString))
END ELSE BEGIN
    SET @DelimitedList = @InputString
END

IF LEN(@DelimitedList) = 0 BEGIN
    SET @Continue = 0
END
ELSE BEGIN
    IF LEN(@Delimeter) < 1 BEGIN
        -- Cannot use '', ' ', etc. for a delimeter
        SET @Delimeter = ','
    END
    SET @Continue = 1
    SET @StartPos = 1
    SET @TextLength = LEN(@DelimitedList)
    SET @DelimeterLength = LEN(@Delimeter)
END

WHILE @Continue = 1 BEGIN
    SET @DelimeterPosition = CHARINDEX(@Delimeter,
        SUBSTRING(@DelimitedList, @StartPos,
        @TextLength - @StartPos + @DelimeterLength))
    IF @DelimeterPosition > 0 BEGIN
        IF @TrimValues = 1 BEGIN
            SET @TextValue = RTRIM(LTRIM(SUBSTRING(@DelimitedList,
                @StartPos, (@DelimeterPosition - 1))))
        END ELSE BEGIN
            SET @TextValue = SUBSTRING(@DelimitedList, @StartPos,
                (@DelimeterPosition - 1))
        END
        SET @StartPos = @DelimeterPosition + @StartPos +
            (@DelimeterLength - 1)
    END
    ELSE BEGIN
        IF @TrimValues = 1 BEGIN
            SET @TextValue = RTRIM(LTRIM(SUBSTRING(@DelimitedList,
                @StartPos, (@TextLength - @StartPos) +
                @DelimeterLength)))
        END ELSE BEGIN
            SET @TextValue = SUBSTRING(@DelimitedList, @StartPos,
                (@TextLength - @StartPos) + @DelimeterLength)
        END
        SELECT @Continue = 0
    END
    IF @ExcludeEmpty = 0 OR @TextValue <> '' BEGIN
        INSERT INTO @Values VALUES(@TextValue)
    END
END -- WHILE @Continue = 1 BEGIN

RETURN
END

Unlike the IsValueInList function shown above, this function allows white space in-between values. The following is the example stored procedure rewritten to use the ConvertStringToTable function:

CREATE PROCEDURE FunctionFindOrders2
    @StartDateTime DATETIME,
    @EndDateTime DATETIME,
    @StateList VARCHAR(8000)
AS
BEGIN
    SELECT *
    FROM TestOrder
        INNER JOIN dbo.ConvertStringToTable(@StateList, DEFAULT, 1, 1)
            AS StateList ON StateList.Value = TestOrder.State
    WHERE OrderDateTime BETWEEN @StartDateTime AND @EndDateTime
END

The following is an example call to this stored procedure:
EXEC FunctionFindOrders2 '1/1/2005', '1/4/2005', 'MO,TX'

This approach eliminates the dynamic SQL in the stored procedure, and also performs much better than the IsValueInList function.

In a rather unscientific test, I found that the two dynamic SQL stored procedures perform the best. The ConvertStringToTable stored procedure performs almost as well as the dynamic SQL, but it spends a fair amount of CPU resources to convert input string into a table. The IsValueInList stored procedure performs horribly compared to the other three options.

Things to Think About

Dynamically Generate SQL in Stored Procedure
Pros
  • SQL Query Optimizer can make best use of available indexes
  • Works well with large amounts of data
Cons
  • May be susceptible to SQL Injection attack

Using the IsValueInList User Defined Function
Pros
  • Not susceptible to SQL Injection attack
  • Avoids use of dynamic SQL
Cons
  • SQL Server will have to perform index scans or table scans
  • Does not work well with large amounts of data

Using the ConvertStringToTable User Defined Function
Pros
  • Not susceptible to SQL Injection attack
  • Avoids use of dynamic SQL
Cons
  • Takes time to convert the string into an in-memory table

Other Techniques to Consider
  • Consider restricting the number of items that can be searched on to a fixed amount so that dynamic SQL will not have to be used.

Thursday, November 17, 2005

Using COM Objects in Multi-Threaded .NET Applications

This article discusses how .NET applications, including multi-threaded .NET applications, use the different kinds of COM objects. If you are writing a .NET application that uses COM objects, this article should help you avoid some problems related to COM objects and threading.

Background Information on .NET Threads
The following is a quote from the .NET Framework Class Library Help in Visual Studio 2003:

An apartment is a logical container within a process for objects sharing the same thread access requirements. All objects in the same apartment can receive calls from any thread in the apartment. The .NET Framework does not use apartments, and managed objects are responsible for using all shared resources in a thread-safe manner themselves.

Because COM classes use apartments, the common language runtime needs to create and initialize an apartment when calling a COM object in a COM interop situation. A managed thread can create and enter a single-threaded apartment (STA) that allows only one thread, or a multithreaded apartment (MTA) that contains one or more threads.

COM classes created with Visual Basic 6.0 have a ThreadingModel of Apartment which means they will be placed into a single-threaded apartment (STA) when they are created. COM classes created with Visual C++ 6.0 can have a ThreadingModel of Apartment, Free, or Both. COM classes with a ThreadingModel of Free will be placed into a multithreaded apartment (MTA) when they are created. COM classes with a ThreadingModel of Both will be placed into whatever apartment their creator is in. If their creator is not in an apartment, they will be placed into an MTA.

The Examples
All of the results shown below were created using a ComObjectThreading.exe application I wrote. This application will create the indicated number of instances of the selected COM object using one of three methods. If Object Location is set to Main Thread, the COM objects are created on the application’s main thread. If Object Location is set to Dedicated MTA Threads, the application creates the indicated number of MTA threads and each MTA thread creates a copy of the COM object. If Object Location is set to Dedicated STA Threads, the application creates the indicated number of STA threads and each STA thread creates a copy of the COM object.

There are four different COM classes you can create with this application:
  • CsgVbComInfo.ThreadInfo
  • CsgVcComInfo.ApartmentThreadInfo
  • CsgVcComInfo.BothThreadInfo
  • CsgVcComInfo.FreeThreadInfo
CsgVbComInfo.ThreadInfo (written in Visual Basic 6.0) and CsgVcComInfo.ApartmentThreadInfo (written in Visual C++ 6.0) are COM classes with a ThreadingModel of Apartment. CsgVcComInfo.BothThreadInfo is a Visual C++ 6.0 COM class with a ThreadingModel of Both. CsgVcComInfo.FreeThreadInfo is a Visual C++ 6.0 COM class with a ThreadingModel of Free.

The following examples demonstrate which thread a COM object is assigned to in different scenarios.

Example 1: Apartment Class - Main Thread
The following are the results of creating 5 instances of the CsgVbComInfo.ThreadInfo (Apartment) class with Object Location set to Main Thread:

GUI Thread ID: 3568
Object Location: Main Thread
Object Created: CsgVbComInfo.ThreadInfo

Instance   Application   
Thread ID
Object
Thread ID   
135683568
235683568
335683568
435683568
535683568

Since the main application thread in this application is an STA thread, the Apartment COM object can be executed on it. This is a good scenario because there are no thread context switches when going from the .NET application to the COM class.

Example 2: Free Class - Main Thread
The following are the results of creating 5 instances of the CsgVcComInfo.FreeThreadInfo (Free) class with Object Location set to Main Thread:

GUI Thread ID: 3568
Object Location: Main Thread
Object Created: CsgVcComInfo.FreeThreadInfo

Instance   Application   
Thread ID
Object
Thread ID   
135682848
235682848
335682848
435682848
535682848

Since the main application thread is an STA thread and COM objects with a ThreadingModel of Free must be executed on an MTA thread, the Free COM object has to be executed on a different thread. This is a less than ideal scenario because there is a thread context switch when going from the .NET application to the COM class. Depending on how often the COM object is used, this could impact performance.

Example 3: Both Class - Main Thread
The following are the results of creating 5 instances of the CsgVcComInfo.BothThreadInfo (Both) class with Object Location set to Main Thread:

GUI Thread ID: 3568
Object Location: Main Thread
Object Created: CsgVcComInfo.BothThreadInfo

Instance   Application   
Thread ID
Object
Thread ID   
135683568
235683568
335683568
435683568
535683568

Since COM objects with a ThreadingModel of Both can be executed on either an STA or an MTA thread, the COM object can be executed on the main thread. This is a good scenario because there are no thread context switches when going from the .NET application to the COM class.

Example 4: Apartment Class - Dedicated MTA Threads
The following are the results of creating 5 instances of the CsgVbComInfo.ThreadInfo (Apartment) class with Object Location set to Dedicated MTA Threads:

GUI Thread ID: 3568
Object Location: Dedicated MTA Threads
Object Created: CsgVbComInfo.ThreadInfo

Instance   Application   
Thread ID
Object
Thread ID   
438323692
114163692
34923692
29043692
523563692

Since the threads have their ApartmentState property set to MTA, the Apartment COM objects cannot be executed on them. The application creates and executes the Apartment COM objects on a special thread I am calling the Default STA thread. This is not the same thread as the main application thread (although the Apartment COM object could execute on that thread), but it is a dedicated thread that is created to handle Apartment COM objects that cannot be executed on their caller’s thread. There is only one Default STA thread for an application, so all requests for Apartment COM objects coming from MTA threads are processed by this one thread. If multiple MTA threads make requests of Apartment COM objects at the same time, all of those requests are queued and processed one at a time. Of all the scenarios presented in this article, this is the worst. Not only do you have a thread context switch when a COM object is used, but all of the COM objects reside on the same thread. Because of the thread context switching, this scenario is actually a little worse than having a single threaded application.

Example 5: Free Class - Dedicated MTA Threads
The following are the results of creating 5 instances of the CsgVcComInfo.FreeThreadInfo (Free) class with Object Location set to Dedicated MTA Threads:

GUI Thread ID: 3568
Object Location: Dedicated MTA Threads
Object Created: CsgVcComInfo.FreeThreadInfo

Instance   Application   
Thread ID
Object
Thread ID   
235203520
527242724
339763976
429962996
124962496

In this scenario each COM object is executed on the same thread as its caller. This is a very good scenario because there are no thread context switches when going from the .NET application to the COM class and multiple threads can execute at the same time.

Example 6: Both Class - Dedicated MTA Threads
The following are the results of creating 5 instances of the CsgVcComInfo.BothThreadInfo (Both) class with Object Location set to Dedicated MTA Threads:

GUI Thread ID: 3568
Object Location: Dedicated MTA Threads
Object Created: CsgVcComInfo.BothThreadInfo

Instance   Application   
Thread ID
Object
Thread ID   
125682568
232843284
322562256
519361936
427722772

Since a COM class with a ThreadingModel of Both can be executed on an MTA thread, the COM objects are executed on the same thread as their callers. This is a very good scenario because there are no thread context switches when going from the .NET application to the COM class and multiple threads can execute at the same time.

Example 7: Apartment Class - Dedicated STA Threads
The following are the results of creating 5 instances of the CsgVbComInfo.ThreadInfo (Apartment) class with Object Location set to Dedicated STA Threads:

GUI Thread ID: 3568
Object Location: Dedicated STA Threads
Object Created: CsgVbComInfo.ThreadInfo

Instance   Application   
Thread ID
Object
Thread ID   
240804080
1580580
420562056
310121012
520682068

In this scenario each COM object is executed on the same thread as its caller. This is a very good scenario because there are no thread context switches when going from the .NET application to the COM class and multiple threads can execute at the same time.

Example 8: Free Class - Dedicated STA Threads
The following are the results of creating 5 instances of the CsgVcComInfo.FreeThreadInfo (Free) class with Object Location set to Dedicated STA Threads:

GUI Thread ID: 3568
Object Location: Dedicated STA Threads
Object Created: CsgVcComInfo.FreeThreadInfo

Instance   Application   
Thread ID
Object
Thread ID   
331962608
537203780
12923780
233363780
421403780

Since each of the threads have their ApartmentState property set to STA, the Free COM objects cannot be executed on them. The application creates and executes the Free COM objects on an MTA thread. Since MTA COM objects can be executed on any MTA thread, when a caller accesses the COM object, it will be placed onto any available MTA thread. That is why there are two different Object Thread ID values in the results. As the number of dedicated STA threads is increased, there is more variation in the number of MTA threads used. In one scenario where 50 dedicated STA threads were created, 10 MTA threads were used, and in another scenario where 500 dedicated STA threads were created, 164 MTA threads were used. This is a less than ideal scenario because there is a thread context switch when going from the .NET application to the COM class. Depending on how often the COM object is used, this could impact performance.

Example 9: Both Class - Dedicated STA Threads
The following are the results of creating 5 instances of the CsgVcComInfo.FreeThreadInfo (Free) class with Object Location set to Dedicated STA Threads:

GUI Thread ID: 3568
Object Location: Dedicated STA Threads
Object Created: CsgVcComInfo.BothThreadInfo

Instance   Application   
Thread ID
Object
Thread ID   
280808080
180768076
480888088
380848084
580928092

Since a COM class with a ThreadingModel of Both can be executed on an STA thread, the COM objects are executed on the same thread as their callers. This is a very good scenario because there are no thread context switches when going from the .NET application to the COM class and multiple threads can execute at the same time.

Summary
The following table summaries the examples shown above. The left hand column indicates the ApartmentState of the caller’s thread. The top row indicates the ThreadingModel of the COM class being created. The contents of the table indicate which thread will process the COM object.

ApartmentState
of Thread
COM Class Threading Model
ApartmentFreeBoth
STASame STAMTASame STA
MTADefault STASame MTASame MTA

The ideal scenarios are the ones that are either Same STA or Same MTA. The worst scenario is the Default STA scenario.

Lessons Learned
  • Whenever possible, create your COM classes to have a ThreadingModel of Both. This will allow them to work optimally in both STA and MTA environments. Unfortunately this is not possible with Visual Basic 6.0.
  • Be sure to consider the ApartmentState of the thread and the ThreadingModel of the COM class whenever using COM classes in .NET applications.
  • Avoid creating COM classes with a ThreadingModel of Apartment from a thread with an ApartmentState of MTA. Be careful when using COM classes created in Visual Basic 6.0 because they will have a ThreadingModel of Apartment.

Wednesday, November 16, 2005

MTS vs. COM+ From VB 6.0’s Perspective

Back in the days of Windows NT 4.0 there was a product called Microsoft Transaction Service (MTS) that allowed ActiveX/COM objects to execute on a separate server (i.e. DCOM) and allowed them to create distributed transactions. ActiveX/COM classes could be created using Visual Basic 6.0 or Visual C++ 6.0. It didn’t really matter which language was used, because they both ran and performed about the same.

In Windows 2000 Microsoft replaced MTS with Component Services (COM+). They were functionally equivalent (COM+ had some new features, but was backward compatible) and at that time Microsoft had released no documentation indicating that they were any different. However, as people started upgrading their Windows NT 4.0 servers to Windows 2000, they started seeing performance problems with their middle-tier components (objects that were running in MTS, but now were running in COM+), especially components written in VB 6.0 (but some VC++ 6.0 object were also having trouble).

It turns out that Microsoft made some major changes when they moved from MTS to COM+, and it was these changes that caused many n-tier applications to perform badly under Windows 2000/COM+. The following article explains the changes that were made and what can be done to restore the performance seen in Windows NT 4.0.

Background Information on COM Objects
COM makes use of a concept called apartments. An apartment is like a container that objects are placed in when they are created. Once an object is created and put into an apartment (or container), it cannot be moved into another apartment (or container). There are three kinds of apartments (or containers) in COM+: Single-threaded apartments (STAs), Multithreaded apartments (MTAs), and Thread-neutral apartments (TNAs). An STA always contains one and only one thread. All objects that are created and placed into an STA share the same execution thread. An error will be raised if another thread tries to execute or use the object. An MTA can contain many threads. An object created and placed into an MTA can be executed or manipulated on any of the threads that are assigned to that MTA. A TNA does not have any threads at all. This article only discusses STAs and MTAs.

All COM classes have a threading model. The threading model for a class is stored in the registry. The following chart, which comes from Tim Ewald’s book Transactional COM+: Building Scalable Applications, describes which apartment, based on its threading model registry setting, a COM object is placed in when it is created:

ThreadingModel API ValueThreadingModel Registry SettingApartment
COMAdminThreadingModelApartmentApartmentSTA
COMAdminThreadingModelFreeFreeMTA
COMAdminThreadingModelMain{none}Main STA
COMAdminThreadingModelBothBoth Either STA or MTA
COMAdminThreadingModelNeutral Neutral TNA

This article will only discuss COM classes with a ThreadingModel of Apartment, Free, or Both. When a COM object with a ThreadingModel registry setting of Apartment is created, it will be placed into an STA. When a COM object with a ThreadingModel registry setting of Free is created, it will be placed into an MTA. When a COM object with a ThreadingModel registry setting of Both is created, it will be placed into whatever apartment its creator is in. If the object creating it is in an STA, it will be placed into that same STA. If its creator is in an MTA, it will be placed into that same MTA. If the creator is in another process or on another machine, the object will be placed into an MTA.

Visual Basic 6.0 can only create COM classes with a ThreadingModel of Apartment. This means that all Visual Basic 6.0 objects will be placed into an STA. Visual C++ 6.0 allows you to create classes with a ThreadingModel of Apartment, Free, or Both.

The Change
Each server application (as opposed to library applications) in MTS has an STA thread pool of 100 threads. When MTS creates an object on behalf of a client, it places the object into one of the 100 STAs it has at its disposal. Once the object is created, it cannot be moved into another STA or executed by any thread other than the thread that is associated with that STA. When all 100 STAs have an object in them, MTS starts placing newly created objects into the existing STAs. This process is called multiplexing. If two objects in the same STA want to execute at the same time, one of them will have to wait for the other object to finish executing before it can execute. The performance of your middle-tier can start to suffer when MTS starts multiplexing COM objects.

In MTS all COM objects run in an STA, even COM objects that are marked Free or Both in the registry. Therefore all classes, regardless of their ThreadingModel registry setting, perform about the same.

In COM+ all that changed. COM+ has two thread pools, an STA thread pool and an MTA thread pool. The MTA thread pool has no limit to the number of threads that can be created and used by COM+. The STA, however, is very limited. The following excerpt from Microsoft Knowledge Base Article 282490 (INFO: Thread Pool Differences Between COM+ and MTS) describes how the COM+ STA thread pool works:

  • The thread pool size is initialized to 7 plus the number of processors.

  • The thread pool size increases with the number of queued requests.

  • The maximum size that the thread pool can reach is 10 multiplied by the number of processors.

  • The thread pool decreases in size when the traffic is low.
For a typical dual processor COM+ server, the STA thread pool is initialized to 9 threads and increases in size to 20 threads. This situation is much worse for a single processor COM+ server. The STA thread pool for COM+ applications running on a single processor COM+ server can only increase in size to 10 threads.

If a middle-tier application written in Visual Basic 6.0 running in NT 4.0/MTS is upgraded to Windows 2000/COM+, the application goes from having 100 threads of execution down to only 10 threads (or 10 times the number of processors on the server). This can cause a huge impact on performance.

One of Microsoft’s recommendations on how to restore performance when upgrading to Windows 2000/COM+ is to write all COM classes in a way that allows them to run in an MTA. This is not too hard to do in Visual C++, but it is impossible to do in Visual Basic 6.0. Ironically enough, Microsoft admits in an article written by Michael McKeown titled Preserving Application Performance When Porting from MTS to COM+ that under MTS, classes with a ThreadingModel of Apartment was the way to go. In COM+, however, a ThreadingModel of Both (or Free) is preferred.

In my opinion Microsoft really hurt Visual Basic 6.0 when they made the decision to restrict the number of threads in the STA thread pool in COM+. It would have been good if Microsoft had modified COM+ to allow a user to specify the number of threads in the STA thread pool or had released a Service Pack to Visual Basic 6.0 allowing COM classes created with Visual Basic 6.0 to have a ThreadingModel of Both or Free in addition to Apartment.

Example
The following are the results of an application I wrote to demonstrate how many threads COM+ was using for a given COM+ Server Application and how it assigned objects to those threads. Contact me if you would like a copy of the source code for the dlls and executable I used to generate this data. I ran all of my testing using a dual processor COM+ server.

The first test demonstrates how COM+ distributes Visual Basic 6.0 objects to threads in the STA thread pool. I created 140 instances of the CsgVbInternals.ThreadInfo class in a COM+ Server Application and recorded which thread the object was created on. The Instance column indicates which instance of the object was created; the Thread ID column indicates the ID of the windows thread the object was assigned to; and the Thread Count column is my application’s ID number for that thread.

Instance   Thread ID   Thread Count
118361
21522
314323
415844
518085
615246
78447
814968
913529
1013529
1114968
128447
1315246
1418085
1515844
1614323
171522
1818361
1918361
201522
2114323
2215844
2318085
2415246
258447
2614968
2713529
2813529
2914968
308447
3115246
3218085
3315844
3414323
351522
3618361
3718361
381522
3914323
4015844
4118085
4215246
438447
4414968
4513529
46159610
47159610
48159610
49159610
50159610
51152011
52152011
53152011
54152011
55152011
56139212
  
95170819
96138420
97138420
98138420
99138420
100138420
101138420
102170819
103151218
104147217
105165616
106126415
107160014
108167613
109139212
110152011
111159610
11213529
11314968
1148447
11515246
11618085
11715844
11814323
1191522
12018361
12118361
1221522
12314323
12415844
12518085
12615246
1278447
12814968
12913529
130159610
131152011
132139212
133167613
134160014
135126415
136165616
137147217
138151218
139170819
140138420

COM+ creates 9 threads for the STA thread pool and starts to allocate the newly created COM objects to them. Once each thread has 5 objects associated with it, it starts to create additional threads. Each new thread gets 5 objects assigned to it. Once all threads have 5 objects associated with them, a new thread is created. This continues until 20 threads are created, each having 5 objects assigned to them. Then COM+ starts assigning each newly created object to one of the existing 20 thread. This continues until all threads have 6 objects. Once all threads have 6 objects, COM+ starts assigning a seventh object to each thread. This will continue indefinitely.

If you have an enterprise software solution that will have over 9 (in the case of a dual processor machine) COM objects created on a single COM+ server, you are guaranteed to have multiple objects on the same STA thread.

The results are the same if you create a COM class with a ThreadingModel of Apartment in Visual C++. If, however, you create a COM class with a ThreadingModel of Both or Free, you get a much different result. The following are the results of running a testing that created 140 instances of the CsgComPlusTestLib.FreeInfo class.

Instance   Thread ID   Thread Count
117161
217161
317161
417161
  
13817161
13917161
14017161

COM+ executed all 140 instances of the object on the same thread. Since objects with a ThreadingModel of Free can be executed on any thread in the MTA thread pool, COM+ only needs to create a new thread if all existing threads are busy. Since I only had one application creating the objects, there was never any reason for COM+ to create another thread.

The following are some of the results generated by running three instances of my test application that created CsgComPlusTestLib.FreeInfo objects.

Instance   Thread ID   Thread Count
117161
215762
315762
417161
517161
  
5817161
5915762
6017161
6117161
6217283
6317283
  

Since there are up to three objects executing at the same time, you see three different threads in the results.

Since objects in an MTA can be executed on any thread in that MTA’s thread pool, COM+ can make much better use of existing threads.

How to Restore Performance
Microsoft’s primary recommendation on how to solve this problem is to make all COM+ classes have a ThreadingModel of Free or Both. This means Visual Basic 6.0 cannot be used to create classes that will be used in COM+. If the middle-tier for a large, enterprise software solution was written in Visual Basic 6.0, odds are it is not going to be rewritten in another language that supports a ThreadingModel of Free or Both anytime soon.

As of Post Windows 2000 Service Pack 2 COM+ Rollup Hotfix 10 (see Knowledge Base Article 294510) Microsoft added a registry setting (HKEY_LOCAL_MACHINE\Software\Microsoft\COM3\STAThreadPool Value name: EmulateMTSBehavior Data type: REG_DWORD) that allows COM+ to use the threading model MTS used. This allows COM+ to create 100 threads in its STA thread pool. Knowledge Base Article 303071 (Registry key for tuning COM+ thread and activity) discusses how to use this registry setting.

The following are the results of running my test application creating the CsgVbInternals.ThreadInfo class on a COM+ server that has had this registry setting changed.

Instance   Thread ID   Thread Count
119041
219442
317523
  
97224897
98225298
99225699
1002260100
1012260100
102225699
103225298
104224897
  
19817523
19919442
20019041
20119041

This registry setting allows Visual Basic 6.0 COM objects to perform the same way in COM+ as they did in MTS.

References
Transactional COM+: Building Scalable Applications
by Tim Ewald
http://www.amazon.com/exec/obidos/
tg/detail/-/0201615940/qid%3D1131988786


Preserving Application Performance When Porting from MTS to COM+
by Michael McKeown
http://msdn.microsoft.com/library/en-us/dncomser/
html/portingwinntapps_mtstocom.asp
(this link is broken, but the article can be found at http://web.archive.org/web/20060911204243/http://msdn.microsoft.com/library/en-us/dncomser/html/portingwinntapps_mtstocom.asp)

INFO: Thread Pool Differences Between COM+ and MTS
Knowledge Base Article: 282490
https://support.microsoft.com/en-us/kb/282490

Registry key for tuning COM+ thread and activity
Knowledge Base Article: 303071
https://support.microsoft.com/en-us/kb/303071

INFO: Post Windows 2000 Service Pack 2 COM+ Rollup Hotfix 10 Is Available
Knowledge Base Article: 294510
http://support.microsoft.com/kb/294510/

Thursday, November 10, 2005

COALESCE vs. ISNULL

I found a really good article that describes the differences between COALESCE and ISNULL on SQL Server. You can read this article at http://www.aspfaq.com/show.asp?id=2532. The following is a summary of the article.

COALESCE

ISNULL

ISO SQL Standard Command

Yes

No

Number of Arguments

Unlimited

2

Return value datatype

Based on all arguments

Datatype of first argument

Computed Column Nullability

Nullable

Not Nullable



A lot of situations only require two arguments, so you will have to decide which function is best for your situation. The return value’s datatype is often the deciding factor. For example:

DECLARE @SmallString NVARCHAR(3)
SET @SmallString = NULL

SELECT ISNULL(@SmallString, N'1234567') AS [ISNULL]

SELECT COALESCE(@SmallString, N'1234567') AS [COALESCE]

Results:

ISNULL
------
123

COALESCE
--------
1234567

Another factor to consider is ISNULL will perform better in certain situations. The article listed above contains more details on this.

Modifying Calculated Fields in ADO

The Problem
One of the limitations of ADO Recordsets is that you cannot edit or modify calculated or constant fields. Microsoft has a knowledgebase article (241818-PRB: Calculated Field Contents Cannot be Modified by ADO) that describes this issue at http://support.microsoft.com/default.aspx?scid=kb;en-us;241818 https://support.microsoft.com/en-us/kb/241818. The article suggests using the Shape command to get around this issue. That workaround is OK for some situations, but there are other situations where you may want to avoid using the Shape command. For those situations you can try the following technique.

The Workaround
Instead of using constants or calculated fields in your SELECT statement, create a temporary or in-memory table (or tables) and select the constants or calculated fields from there. The following is a sample stored procedure that returns a constant (Status) and a calculated field (StateCounty):

CREATE PROCEDURE StatusReadOnly
AS
BEGIN

    SELECT TestID, 'Ready' AS Status,
        State + ', ' + County AS StateCounty
    FROM Test

END

The following script can be used to create the Test table used in the stored procedure above:

CREATE TABLE [dbo].[Test](
    [TestID] [int] IDENTITY(1,1) NOT NULL,
    [State] [nchar](2) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
    [County] [nvarchar](50) COLLATE SQL_Latin1_General_CP1_CI_AS NULL,
    CONSTRAINT [PK_Test] PRIMARY KEY CLUSTERED
(
    [TestID] ASC
) ON [PRIMARY]
) ON [PRIMARY]

If you run the stored procedure and place the results in an ADO Recordset, you will not be able to edit the Status or the StateCounty fields.

The following stored procedure uses in-memory tables to prevent the Status and StateCounty fields from being read-only:

CREATE PROCEDURE StatusReadWrite
AS
BEGIN
    -- SET NOCOUNT ON added to prevent extra result sets from
    -- interfering with SELECT statements.
    SET NOCOUNT ON;

    -- Build in-memory tables
    DECLARE @Constants TABLE (
        Status NVARCHAR(15) NOT NULL)
    DECLARE @CalculatedValues TABLE (
        TestID INT NOT NULL,
        StateCounty NVARCHAR(54))

    -- Populate table with desired editable constants
    INSERT INTO @Constants VALUES (N'Ready')

    -- Populate table with desired editable calculated values
    INSERT INTO @CalculatedValues (TestID, StateCounty)
    SELECT TestID, State + N', ' + County
    FROM TEST

    SET NOCOUNT OFF;

    -- Return data to user
    SELECT Test.TestID, Constants.Status AS Status,
        CalculatedValues.StateCounty AS StateCounty
    FROM Test
        INNER JOIN @CalculatedValues AS CalculatedValues
            ON CalculatedValues.TestID = Test.TestID
        CROSS JOIN @Constants AS Constants

END

If you run this stored procedure and place its results in an ADO Recordset, you will be able to edit both the Status and the StateCounty fields. This stored procedure demonstrates using an in-memory table for constant values and it also demonstrates using an in-memory table for calculated fields. The example above makes two passes through the Test table to obtain the desired results, but it could be rewritten to only make one pass through the Test table by copying all fields, not just the calculated fields, to the in-memory table.

I have written a very simple VB 6 application that you can use to test these two stored procedures. You can download the application and its source files at http://private.csgsolutions.com/heironimus/
codesamples/StoredProcTester.zip
(link is no longer valid - download is no longer available). The application prompts you for a database connection string and the name of the stored procedure you want to run. When you press the "Get Recordset" button, it executes the stored procedure and places the results in a data grid. You can use the data grid to test whether or not you can edit a particular field.

Pros

  • You do not have to use the Shape command.

Cons

  • If you are returning constants, you have to create and populate a constants in-memory or temporary table. This could negatively affect performance.

  • If you are returning calculated fields, you may have to make two passes through your data to get the desired results. This could adversely affect performance.

  • Your queries may become more complex.

Monday, November 07, 2005

Trailing Spaces in SQL Server 2000

String values in SQL Server are rather unique, and I recently discovered some quirks of strings in SQL Server I though I would share. Some of these quirks are documented in SQL Server Books Online and some are not.

Trailing Spaces and the LEN Function
Having programmed in C/C++, VB, C#, and other programming languages, I am very familiar with functions that return the length of strings. In all of the cases that I know of a “X” (a string with a single X character) string is 1 character long and a “X ” (an X character with a trailing space) string is two characters long. This does not appear to be the case in SQL Server. The following is the results of a script I wrote to demonstrate this.

Line Expression        Result    Unicode Exp.      Unicode Result
---- ----------------- --------- ----------------- --------------
 1   LEN('X')          1         LEN(N'X')         1
 2   LEN('X ')         1         LEN(N'X ')        1
 3   LEN('X     ')     1         LEN(N'X    ')     1
 4   LEN(' X ')        2         LEN(N' X ')       2

As you can see, the LEN function ignores trailing spaces. This is actually a documented feature of the function. SQL Server Books Online states the following:

LEN
Returns the number of characters, rather than the number of bytes, of the given string expression, excluding trailing blanks.

If you do want to know the length of a string including trailing spaces, you have a few options. One is to use the DATALENGTH function. SQL Server Books Online defines DATALENGTH as follows:

DATALENGTH
Returns the number of bytes used to represent any expression.

Running the same set of strings through the DATALENGTH function returns the following results:

Line Expression           Result  Unicode Exp.          Unicode R
---- -------------------- ------- --------------------- ---------
 5   DATALENGTH('X')      1       DATALENGTH(N'X')      2
 6   DATALENGTH('X ')     2       DATALENGTH(N'X ')     4
 7   DATALENGTH('X     ') 6       DATALENGTH(N'X     ') 12
 8   DATALENGTH(' X ')    3       DATALENGTH(N' X ')    6

These results are more in line with what you would expect, except you need to be aware of what kind of strings you are dealing with. If you use the DATALENGTH function on a Unicode string the result is actually twice the length of the string. You will need to divide the result by two to get the string length.

Another option would be to create a user defined function that appended a single known character to the end of the input string, perform the LEN function and subtracted one from the result, but I do not see how that is any better than just using the DATALENGTH function.

Comparing Strings with Trailing Spaces
It would appear that trailing spaces have no significance to other SQL Server 2000 operations as well. For example, if I were to compare the string “X” (a string with a single X character) with the string “X ” (an X character with a trailing space), I would expect them to be unequal. However, in SQL Server 2000 these strings are considered equal.

This makes since if the strings in question are CHAR or NCHAR strings, because CHAR and NCHAR strings are always padded with extra spaces to the length of the string. If you have an “X” (a string with a single X character) string and a “X ” (an X character with a trailing space) stored in a CHAR(3) variable or column, the actual value of both strings is “X ” (an X character with two trailing spaces).

If the strings in question are VARCHAR or NVARCHAR strings and ANSI_PADDING is ON, no padding or trimming of string values should take place. If you store an “X” (a string with a single X character) string into a VARCHAR(3) variable or column, an “X” (a string with a single X character) string is exactly what is stored. Likewise, if you store an “X ” (an X character with a trailing space) string into a VARCHAR(3) variable or column, a “X ” (an X character with a trailing space) string is exactly what is stored. Therefore if you compare “X” (a string with a single X character) and “X ” (an X character with a trailing space), one would expect them to be different. That, however, is not the case with SQL Server 2000. Below is a query and the query results that demonstrates this oddity:

DECLARE @NVarChar1 NVARCHAR(10)
DECLARE @NVarChar2 NVARCHAR(10)
 
SET @NVarChar1 = N'X'
SET @NVarChar2 = N'X '
 
SELECT NCHAR(39) + @NVarChar1 + NCHAR(39) AS [@NVarChar1],
      NCHAR(39) + @NVarChar2 + NCHAR(39) AS [@NVarChar2],
      CASE WHEN @NVarChar1 = @NVarChar2 
      THEN N'True' ELSE N'False' END AS [@NVarChar1 = @NVarChar2]

Results:
@NVarChar1   @NVarChar2   @NVarChar1 = @NVarChar2
------------ ------------ -----------------------
'X'          'X '         True

If trailing spaces are significant in your query, one way around this would be to add the same character to the end of both strings when you do the comparison. This will not change the results of strings that are equal, but strings with unequal numbers of trailing spaces would no longer be equal.

SELECT NCHAR(39) + @NVarChar1 + NCHAR(39) AS [@NVarChar1],
      NCHAR(39) + @NVarChar2 + NCHAR(39) AS [@NVarChar2],
      CASE WHEN @NVarChar1 + NCHAR(1) = @NVarChar2 + NCHAR(1)
      THEN N'True' ELSE N'False' END AS [@NVarChar1 = @NVarChar2]
 
Results:
@NVarChar1   @NVarChar2   @NVarChar1 = @NVarChar2
------------ ------------ -----------------------
'X'          'X '         False

Some Lessons Learned
Here are some things to remember about SQL Server 2000:

   LEN('X') = 1
   LEN('X ') = 1
   DATALENGTH('X') = 1
   DATALENGTH('X ') = 2
   'X' = 'X ' returns True

Friday, November 04, 2005

Variable Processing in SQL Server

or Why Won’t SQL Server Use My Index?

I ran into an interesting situation recently when I was writing a new stored procedure. I had written a very simple query that I was confident SQL Server 2000 would use an index when it executed, but SQL Server would not use the index (unless I explicitly told it to use it). I did some research and discovered why SQL Server would not use the index. The results of my research are given below.

The Problem
I have a typical order table with an OrderID column, a ProductOrderID primary key column, a bunch of other columns, and an OrderDateTime column. The clustered index for the table is on the OrderID column and there is an index on the OrderDateTime column. I needed to write a stored procedure that returned information about orders that occurred between two datetime values. I wrote a query in SQL Query Analyzer similar to the one shown below to return the information I needed:

-- Examples
DECLARE @StartDateTime DATETIME
DECLARE @EndDateTime DATETIME

SET @StartDateTime = '11/1/2005'
SET @EndDateTime = '11/2/2005'

-- Attempt 1
SELECT ProductOrderID, TotalPrice
FROM ProductOrder
WHERE OrderDateTime BETWEEN @StartDateTime AND @EndDateTime


When I ran the query in SQL Query Analyzer the performance was horrible, so I took a look at the query plan and saw that it was doing a Clustered Index Scan (which is basically a table scan) instead of doing an Index Seek on the OrderDateTime index. This baffled me so I though I would change the query to use >= and <= operators instead of the BETWEEN operator.

-- Attempt 2
SELECT ProductOrderID, TotalPrice
FROM ProductOrder
WHERE OrderDateTime >= @StartDateTime AND OrderDateTime <= @EndDateTime


This query behaved the exact same way. If I added a query hint to the query (as shown below) I could get it to use the index, but I really did not want to do that. (I tend to believe that SQL Server can optimize a query better than I can.)

-- Attempt 1 With Hint
SELECT ProductOrderID, TotalPrice
FROM ProductOrder WITH (INDEX(ix_ProductOrder_OrderDateTime))
WHERE OrderDateTime BETWEEN @StartDateTime AND @EndDateTime


This new stored procedure was actually replacing some dynamically generated SQL, so I though I would run the query with the values explicitly specified (Dynamic SQL Method) instead of using variables (Stored Procedure Method).

-- Dynamic SQL
SELECT ProductOrderID, TotalPrice
FROM ProductOrder
WHERE OrderDateTime BETWEEN '11/1/2005' AND '11/2/2005'


Just as I would expect, this version used the OrderDateTime index. At this point I was very confused and was beginning to doubt the conventional wisdom that stored procedures are better than dynamic SQL (but that is another topic all together).

A coworker and I did some digging and found the following useful newsgroup threads and articles:

Optimizing Variables and Parameters
By Kalen Delaney
http://www.windowsitpro.com/SQLServer/Article/ArticleID/42801/42801.html
http://sqlmag.com/t-sql/optimizing-variables-and-parameters

Inside Search Arguments
By Kalen Delaney
http://www.windowsitpro.com/Articles/Index.cfm?ArticleID=42349
http://sqlmag.com/t-sql/inside-search-arguments

Dates and Indexes in Query
microsoft.public.sqlserver.programming
http://groups.google.com/group/microsoft.public.sqlserver.programming
/browse_thread/thread/a27a1d7930f42ad9/ac2b0b3a280427f2


I would recommend reading any and all of the resources listed above, but the Optimizing Variables and Parameters article said it best:


Although you have a valid SARG in the SELECT statement, when the optimizer evaluates the query, it hasn't yet executed the SET statement that assigns the specific value to the variable. The optimizer optimizes the entire batch at once. You can't put the variable assignment in a separate batch because the scope of a local variable is one batch. Because the optimizer doesn't know what value to look for in the index-statistics histogram, it has to guess.


Basically SQL Server doesn’t know what the value of the variable is going to be (even though I had just set it in the line above), so it guesses what it might be and goes from there. The article goes on to explain exactly how SQL Server guesses, but in most cases (including mine) it guesses wrong.

The Solution
So what is the solution to this problem? Put the query in a stored procedure which, ironically enough, is where it was destined for anyway.

When I put the query into the following stored procedure and ran it, it used the index just like it was supposed to.

CREATE PROCEDURE GetProductOrderInfoForRange(

    @StartDateTime DATETIME,
    @EndDateTime DATETIME) AS
BEGIN
    SELECT ProductOrderID, TotalPrice
    FROM ProductOrder
    WHERE OrderDateTime BETWEEN @StartDateTime AND @EndDateTime
END

-- Stored Procedure Method
EXEC GetProductOrderInfoForRange
    @StartDateTime = '11/1/2005',
    @EndDateTime = '11/2/2005'


Some Lessons Learned
Unlike some compiled languages that used optimizers (like C/C++, .NET, etc.) using a literal constant is different than using a variable that is set to a constant value. Do not expect the optimizer to know what the value of a variable will be or that a variable will not change.

If you are testing or analyzing the performance of a query, it is best to test it in the context that it will actually be running in. In other words, if a query is destined for a stored procedure, go ahead and test it in a stored procedure. (Of course if I had done that in the first place, I would not have learned about SARGs and how SQL Servers query optimizer works.)