Almost everyone who has worked on Sybase for a fair amount of time knows the importance of running update statistics on tables and indexes. Just to reiterate: ASE's cost-based optimizer uses statistics on the tables, indexes and columns named in a query to estimate query costs, and it chooses the access method it determines to have the least cost. That cost estimate cannot be accurate if the statistics are not. Hence, running update statistics is extremely important for tables that see frequent data modifications.
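As a quick refresher, the command comes in a few flavors; a rough sketch (the table and index names below are just placeholders):

-- refresh statistics on the leading column of each index on the table
update statistics orders_tbl

-- refresh statistics for one specific index
update statistics orders_tbl orders_tbl_idx

-- refresh statistics on every column of every index on the table
update index statistics orders_tbl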
Let me come to the problem statement now. One of our stored procedures was running terribly slow. Our application's architecture was designed in such a way that the definition of temporary tables (# tables) and their indexes (if any) lives in a script, while the population/updating of the # table is done inside the stored procedure. A wrapper script first calls this # table creation script and then executes the stored proc.
This wrapper script is controlled by a scheduler that runs throughout the day: as soon as data is available at the source, it populates the target by invoking the wrapper script.
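As a rough sketch of that layout (every object name below is made up for illustration), the wrapper session looked something like this:

-- # table creation script: defines the temp table and its index up front
create table #Table (id int, val varchar(30))
create index idx_id on #Table (id)
go

-- the stored proc then populates and updates #Table in the same session
exec load_and_update_proc
go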
After running set showplan on and set fmtonly on against the culprit stored proc, I noticed that the index wasn't being used while updating the # table. The reason was obvious: the index on the # table was created in a separate script, and the # table was populated afterwards inside the stored proc. Ideally, the index should be created after the data has been inserted into the table, for two reasons: 1] the insert is faster without the index, and 2] when the index is created on the temp table after the insert, it comes with up-to-date statistics.
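For reference, capturing the plan without actually running the proc is just a couple of session settings (the proc name here is hypothetical):

set showplan on    -- print the query plan chosen for each statement
set fmtonly on     -- return format information only, without executing the statements
go
exec load_and_update_proc
go
set fmtonly off
set showplan off
go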
Unfortunately, due to the architectural design I couldn't create the index after inserting the data. Hence, I thought of running update statistics inside the stored procedure, immediately after the data is inserted.
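Inside the proc, the first cut was simply this (the source table name is hypothetical and the column lists are elided, as in the real code):

insert #Table
select ....... from source_table

-- refresh statistics on the freshly loaded # table
update statistics #Table

update #Table
set .......
where .......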
Well, as expected this trick worked wonders, with one little problem. For a large number of records, update stats worked well: the subsequent data modification used the index instead of going for a table scan, and the completion time of the proc dropped from hours to just a few minutes.
The problem occurred when the number of records was small (say 100). Update stats followed by the data modification took more time than the data modification alone. The reason: even though we run update stats after populating the table, the optimizer still goes for a table scan instead of the index, because for such a small number of rows it finds the table scan cheaper (remember, it's a cost-based optimizer).
So, say update stats followed by the data modification takes 6 seconds for a batch of 100 records, while the data modification alone takes 3 seconds. Most of the time the number of records to be updated is a typical batch of around 100, and once in a while a large batch of millions of records comes in.
Update stats is not going to pay off for the typical batch of 100, even though it does wonders for a large batch of millions. So it boiled down to just two scenarios: either a typical batch of around 100 records or a large batch of millions (in other words, anything greater than 100). So we did this:
After inserting the data into the # table, check the number of rows inserted by capturing the value of the global variable @@rowcount, like this:
declare @cnt int

-- capture the row count immediately after the insert into #Table;
-- @@rowcount is reset by every statement, so this must come first
select @cnt = @@rowcount

if @cnt > 100
begin
    -- large batch: refresh statistics so the optimizer will pick the index
    update statistics #Table
end

-- a small batch falls through without update stats; a table scan is cheap enough
update #Table
set .......
where .......
.......
Now, the proc started behaving exactly as we wanted it to. For a small batch it simply skipped update stats and performed the data modification (via a table scan, with the expected throughput). For a large batch, update stats ran prior to the data modification, and the desired index on the column in the where clause was used.
By now you must have realized that if you can anticipate the number of rows to be updated, you can follow the above approach. Otherwise, you need to define a threshold if there is significant variation in the number of records being updated each time, as sketched below.
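If the batch size varies, the same guard can be driven by a tunable threshold rather than a hard-coded 100 (the names and the value here are illustrative):

declare @cnt int, @stats_threshold int
select @stats_threshold = 100   -- tune this to the point where update stats starts paying off

insert #Table
select ....... from source_table
select @cnt = @@rowcount

if @cnt > @stats_threshold
begin
    update statistics #Table
end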