Monday, July 8, 2013

conditional update stats

Almost everyone who has worked on Sybase for a fair amount of time knows the importance of running update statistics on tables and indexes. Just to reiterate: ASE's cost-based optimizer uses statistics on the tables, indexes and columns named in a query to estimate query costs, and it chooses the access method it determines has the least cost. But this cost estimate cannot be accurate if the statistics are not accurate. Hence, running update statistics is extremely important for tables undergoing frequent data modifications.

Let me come to the problem statement now. One of our stored procedures was running terribly slow. The architecture of our application was designed in such a way that the definition of temporary tables (# tables) and their indexes (if any) is stored in a script, while population/updates of the # table happen inside the stored procedure. A wrapper script first calls this # table creation script, followed by the execution of the stored proc.

This wrapper script, controlled by a scheduler, runs throughout the day. As soon as data is available at the source, it populates the target using the wrapper script mentioned above.

After running the culprit stored proc with set showplan on and set fmtonly on, I noticed that the index wasn't being used while updating the # table. The reason was obvious: the index on the # table was created in a different script, and the # table was populated later, inside the stored proc. Ideally, an index should be created after data is inserted into the table, for two reasons: 1] insertion is faster without the index; 2] an index created on the temp table after the insert starts out with up-to-date statistics.
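For reference, this is roughly how the plan can be inspected without actually executing the statements (the proc name here is illustrative, not from our application):

set showplan on
set fmtonly on
go
exec proc_populate_target   -- hypothetical proc name
go
set fmtonly off
set showplan off
go

With fmtonly on, Adaptive Server compiles the batch and shows the plan without running the data modifications, so you can check index usage safely.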

Unfortunately, due to the architectural design, I couldn't create the index after inserting the data. Hence, I thought of running update statistics inside the stored procedure immediately after the data is inserted.

Well, as expected, this trick did wonders, with one little problem. For a large number of records, update stats worked well: the subsequent data modification used the index instead of going for a table scan, reducing the completion time of the proc from hours to just a few minutes.

The problem occurred when the number of records was small (say, 100). Update stats followed by the data modification took more time than the data modification alone. The reason: even with fresh statistics, the optimizer still goes for a table scan instead of the index, because for such a small number of rows it finds the table scan cheaper (remember, it's a cost-based optimizer).

So, let's say update stats followed by the data modification takes 6 seconds for a batch of 100 records, while the data modification alone takes 3 seconds. Most of the time, the number of records to be updated was a typical batch of around 100, and a few times a large batch of a million records would come.

The update stats that does wonders for a large batch of millions is pure overhead for a typical batch of 100. So it turned out there were only two scenarios: either a typical batch of around 100 records, or a much larger batch. So, we did this:

After inserting data into the # table, check the number of rows inserted by capturing the value of the global variable @@rowcount, like this:

declare @cnt int

-- capture @@rowcount immediately after the insert:
-- almost any intervening statement resets it
select @cnt = @@rowcount

if @cnt > 100
begin
     update statistics #Table
end

update #Table
set .......
where .......
.......


Now the proc started behaving exactly as we wanted. For a small batch it simply skipped update stats and performed the data modification (via a table scan, with the expected throughput). For a large batch, update stats ran prior to the data modification, which then used the desired index on the column named in the where clause.

Well, you must have realized by now that if you can anticipate the number of rows to be updated, you can follow the above approach as-is. Otherwise, if there is significant variation in the number of records being updated each time, you need to pick an appropriate threshold.

Friday, July 5, 2013

difference between @@pagesize and @@maxpagesize

My friend working on Java asked me whether he could create a column of size varchar(8000). So I asked him to check the page size of the ASE server (or simply, Sybase server). He said it's 2048. So I confidently told him: no, you cannot create it, since the page size is 2K and you are trying to create a column larger than the allowed size. But, to my surprise, he said there is already a table in the database with a varchar(8000) column.

So I started searching in the Sybase manuals and other informative Sybase blogs, and found out this: on version 12.5 and onwards, @@pagesize always returns the number 2048, a kind of default page size. You can find the actual page size of the server by checking the value of the global variable @@maxpagesize.

Back to my friend's problem. I asked him for the value of the global variable @@maxpagesize, and he said it's 8192. So clearly it's an 8K page server. He can still create a table with a single column larger than 8192 bytes, or with the total bytes of all columns exceeding the 8192 limit; creating such a table merely throws a warning. But while populating it, if the values in a row actually exceed the 8192 limit, Adaptive Server throws an error and the insert/update command fails.

Moral of the story: always check the value of the global variable @@maxpagesize instead of relying on the @@pagesize global variable.
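A quick way to compare the two on any server (the values in the comments are what my friend's 8K server would report, going by the behavior described above):

select @@pagesize      -- 2048 on 12.5 and onwards, regardless of the real page size
select @@maxpagesize   -- the server's actual page size, e.g. 8192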

P.S.: If your database design requires a column to store values up to approx. 2047 bytes per row/record, you might want to think of the TEXT datatype. For instance, if you need a column in your table to store user comments or something similar occupying up to 2K, think of declaring it as TEXT. Adaptive Server allocates an entire 2K page to store the value of a TEXT column. Always declare this column as null; otherwise, even if you don't put any value in the TEXT column while inserting/updating, Adaptive Server will assign a text pointer and allocate an entire 2K page to store the value (meaning sheer wastage of space).
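A minimal sketch of such a table (the table and column names are illustrative):

-- declaring the TEXT column as null avoids assigning a text pointer
-- and a dedicated 2K page for rows that never receive a comment
create table user_feedback
(
    id       int   not null,
    comments text  null
)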

Also, consider following limitations of TEXT datatype before going for it:

  • A stored procedure can neither take a text parameter nor declare a text local variable.
  • It cannot be indexed.
  • It cannot be included in order by, group by, compute or union clauses.
  • It cannot be included in subqueries or joins.
  • It cannot be used in a where clause, with one exception: you can use the like operator on such a column.

So, take a decision after considering the above limitations.

Sunday, April 21, 2013

Welcome to the Sybase world

This is my first post here. I would like to share my Sybase experiences with you all. I started working on the Sybase RDBMS in 2005, right from version 11.x (where, for example, I needed to drop a trigger to disable it), and to date have worked up to version 12.5.4 (where I can simply disable a trigger without dropping it). I am yet to work on the Sybase 15.x versions.

In this blog, I will share whatever I have understood about Sybase. I worked on DB2 for some time but, frankly speaking, I am not an expert in DB2.

So, please feel free to post your comments on my posts; your suggestions will help me make this blog better.

Thanks,
Hrushi