ridge.uoregon.edu!enews.sgi.com!news.corp.sgi.com!mew.corp.sgi.com!pablo
Subject: Sybase FAQ: 7/16 - section 6
Date: 1 Sep 1997 06:02:47 GMT
Summary: Info about SQL Server, bcp, isql and other goodies
Posting-Frequency: monthly

Archive-name: databases/sybase-faq/part7
URL: http://reality.sgi.com/pablo/Sybase_FAQ

                 Q6.1: ALTERNATIVE TO ROW AT A TIME PROCESSING
                                       
   
     _________________________________________________________________
   
   Someone asked how they could speed up their processing. They were
   batch updating/inserting gobs of information. Their algorithm went
   something like this:
   
     ... In another case I do:

If exists (select record) then
        update record
else
        insert record

     I'm not sure which way is faster or if it makes a difference. I am
     doing this for as many as 4000 records at a time (calling a stored
     procedure 4000 times!). I am interested in knowing any way to
     improve this. The parameter translation alone on the procedure calls
     takes 40 seconds for 4000 records. I am using _exec_ in DB-Lib.
     
     Would RPC or CT-Lib be better/faster?
     
   A netter responded stating that it was faster to ditch their algorithm
   and apply a set-based strategy:
   
     The way to fix your approach is to convert the row-at-a-time
     processing (the more traditional way of thinking) into
     batch-at-a-time processing (the more relational way of thinking).
     Now I'm not trying to insult you or say that you suck or anything
     like that; we just need to dial you in to thinking in relational
     terms.
     
     The idea is to do batches (or bundles) of rows rather than
     processing a single one at a time.
     
     So let's take your example (since you didn't give exact values
     [probably out of kindness to save my eyeballs] I'll use your generic
     example to extend what I'm talking about):
     
     Before:

        if exists (select record) then
           update record
        else
           insert record

     
     
     New way:
    1. Load _all_ your rows into a table named _new_stuff_ in a separate
       work database (call it _work_db_) and load it using _bcp_ -- no
       third-generation language (3GL) code needed.
         1. truncate _new_stuff_ and drop all its indexes
         2. sort your data with the UNIX sort utility, ordering by the
            clustered index columns
         3. load it using _bcp_
         4. create the clustered index using _with sorted_data_ and any
            ancillary non-clustered indexes.
    2. Assuming that your target table is called _old_stuff_
    3. Do the _update_ in a single batch:

   begin tran

     /* delete any rows in old_stuff which would normally
     ** have been updated... we'll insert 'em instead!
     ** Essentially, treat the update as a delete/insert.
     */

     delete old_stuff
       from old_stuff,
            new_stuff
      where old_stuff.key = new_stuff.key

    /* insert entire new table:  this adds any rows
    ** that would have been updated before and
    ** inserts the new rows
    */
     insert old_stuff
        select * from new_stuff

   commit tran


     You can do all this _without_ writing 3-GL, using _bcp_ and a shell
     script.
     
     A word of caution: since these inserts/deletes are batch oriented,
     you may blow your log if you attempt to do too many at a time. To
     avoid this, use the _set rowcount_ directive to create _bite-size_
     chunks.
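
   The whole set-based recipe can be sketched outside of T-SQL. The
   following Python model (a hypothetical simulation, not Sybase code)
   treats the two tables as dicts keyed by the table key, performs the
   delete-then-insert merge described above, and does the deletes in
   bite-size chunks the way a _set rowcount N_ / repeated _delete_ loop
   would:

```python
# Simulation of the set-based delete/insert merge described above.
# old_stuff and new_stuff are modeled as dicts keyed on the table key.

def merge(old_stuff, new_stuff, chunk_size=2):
    """Apply new_stuff to old_stuff as a delete-then-insert, in chunks."""
    # Phase 1: delete rows in old_stuff whose key appears in new_stuff,
    # chunk_size rows at a time (mimics "set rowcount N" + repeated delete
    # until @@rowcount reports no rows were affected).
    while True:
        doomed = [k for k in old_stuff if k in new_stuff][:chunk_size]
        if not doomed:
            break
        for k in doomed:
            del old_stuff[k]
    # Phase 2: insert the entire new table -- this re-adds the rows that
    # would have been updated and adds the genuinely new rows.
    old_stuff.update(new_stuff)
    return old_stuff

old = {1: "a", 2: "b", 3: "c"}
new = {2: "B", 4: "d"}
print(merge(old, new))   # key 2 replaced, key 4 inserted, keys 1 and 3 kept
```

   The point of the sketch is that no row is ever examined one at a
   time by the client: both phases are whole-set operations.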
     
   
     _________________________________________________________________

                Q6.2: WHEN SHOULD I EXECUTE AN _SP_RECOMPILE?_
                                       
   
     _________________________________________________________________
   
   An _sp_recompile_ should be issued any time a new index is added or
   _update statistics_ is run. Dropping an index will cause an automatic
   recompile of all objects that are dependent on the table.
   
   The _sp_recompile_ command simply increments the _schemacnt_ counter
   for the given table. Each dependent object's counter is checked
   against this counter, and if they differ the SQL Server recompiles
   the object.
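
   The counter mechanism can be sketched as follows (a hypothetical
   model; the real server-side structures are internal to SQL Server):

```python
# Sketch of the schemacnt mechanism: sp_recompile just bumps a counter
# on the table; each dependent object remembers the counter value it
# was compiled against and recompiles when the values no longer match.

class Table:
    def __init__(self):
        self.schemacnt = 0

    def sp_recompile(self):
        self.schemacnt += 1        # all sp_recompile actually does

class Procedure:
    def __init__(self, table):
        self.table = table
        self.compiled_at = table.schemacnt

    def execute(self):
        if self.compiled_at != self.table.schemacnt:
            # counters differ: recompile against the current schema
            self.compiled_at = self.table.schemacnt
            return "recompiled, then ran"
        return "ran cached plan"

t = Table()
p = Procedure(t)
print(p.execute())   # ran cached plan
t.sp_recompile()
print(p.execute())   # recompiled, then ran
print(p.execute())   # ran cached plan
```
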
     _________________________________________________________________

                 Q6.3: WHAT ARE THE DIFFERENT TYPES OF LOCKS?
                                       
   
     _________________________________________________________________
   
   First off, just to get it out of the way: there is no method to
   perform row-level locking. If you think you need row-level locking,
   you probably aren't thinking set based -- see Q6.1 for set processing.
   
   The SQL Server uses locking to ensure the sanity of your queries.
   Without locking there is no way to ensure the integrity of your
   operation. Imagine a transaction that debited one account and
   credited another. If the transaction didn't lock out readers/writers,
   someone could potentially see erroneous data.
   
   Essentially, the SQL Server attempts to use the least intrusive lock
   possible -- a page lock -- to satisfy a request. If it accumulates
   around 200 page locks, it escalates the lock to a table lock and
   releases all the page locks, thus performing the task more
   efficiently.
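
   The escalation behaviour can be sketched like this (a hypothetical
   model; the exact threshold and bookkeeping are internal to the
   server):

```python
# Model of page-lock to table-lock escalation: once a statement holds
# roughly 200 page locks on a table, the server trades them all for a
# single table lock.

ESCALATION_THRESHOLD = 200   # approximate; the real value is server-internal

def acquire_page_locks(pages_needed):
    """Return the set of locks actually held after escalation logic."""
    page_locks = set()
    for page in range(pages_needed):
        page_locks.add(page)
        if len(page_locks) >= ESCALATION_THRESHOLD:
            # escalate: release every page lock, take one table lock
            return {"TABLE"}
    return page_locks

print(len(acquire_page_locks(50)))   # 50 -- individual page locks kept
print(acquire_page_locks(5000))      # {'TABLE'} -- escalated
```
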
   
   There are three types of locks:
     * page locks
     * table locks
     * demand locks
       
Page Locks

   There are three types of page locks:
     * shared
     * exclusive
     * update
       
  shared
  
   These locks are requested and used by readers of information. More
   than one connection can hold a shared lock on a data page.
   
   This allows for multiple readers.
   
  exclusive
  
   The SQL Server uses exclusive locks when data is to be modified. Only
   _one_ connection may have an exclusive lock on a given data page. If a
   table is large enough and the data is spread sufficiently, more than
   one connection may update different data pages of a given table
   simultaneously.
   
  update
  
   An update lock is placed during a _delete_ or an _update_ while the
   SQL Server is hunting for the pages to be altered. While an update
   lock is in place there can still be shared locks, thus allowing for
   higher throughput.
   
   The update lock(s) are promoted to exclusive locks once the SQL Server
   is ready to perform the _delete/update_.
   
Table Locks

   There are three types of table locks:
     * intent
     * shared
     * exclusive
       
  intent
  
   Intent locks indicate the intention to acquire a shared or exclusive
   lock on a data page. They prevent other transactions from acquiring a
   conflicting lock on the table while those page locks are held.
   
  shared
  
   This is similar to a page level shared lock but it affects the entire
   table. This lock is typically applied during the creation of a
   non-clustered index.
   
  exclusive
  
   This is similar to a page level exclusive lock but it affects the
   entire table. If an _update_ or _delete_ affects the entire table, an
   exclusive table lock is generated. Also, during the creation of a
   clustered index an exclusive lock is generated.
   
Demand Locks

   A demand lock prevents further shared locks from being set. The SQL
   Server sets a demand lock to indicate that a transaction is next in
   line to lock a table or a page.
   
   This avoids a writer being postponed indefinitely by a steady flurry
   of readers when it wishes to make a change.
     _________________________________________________________________

                 Q6.4: WHAT'S THE PURPOSE OF USING _HOLDLOCK_?
                                       
   
     _________________________________________________________________
   
   All _select/readtext_ statements acquire shared locks (see Q6.3) to
   retrieve their information. After the information is retrieved, the
   shared lock(s) is/are released.
   
   The _holdlock_ option is used within _transactions_ so that after the
   _select/readtext_ statement the locks are held until the end of the
   transaction:
     * commit transaction
     * rollback transaction
       
   If _holdlock_ is not used within a transaction, the shared locks are
   released as soon as the _select/readtext_ completes.
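
   The practical difference can be sketched as follows (a hypothetical
   model of lock compatibility, not client library code): a reader
   selects a page, and before its transaction commits a writer tries to
   take an exclusive lock on the same page.

```python
# Toy model: does a writer's exclusive lock request succeed before the
# reader's transaction has committed?

def writer_can_update(reader_uses_holdlock):
    held = {"shared"}                 # reader's select takes a shared lock
    if not reader_uses_holdlock:
        held.discard("shared")        # released right after the select
    # exclusive locks are incompatible with any shared lock still held
    return "shared" not in held

print(writer_can_update(False))  # True  -- the shared lock is already gone
print(writer_can_update(True))   # False -- blocked until commit/rollback
```
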
     _________________________________________________________________

               Q6.6: HOW DO I FIND THE OLDEST OPEN TRANSACTION?
                                       
   
     _________________________________________________________________
   

select h.spid, u.name, p.cmd, h.name, h.starttime,
       p.hostname, p.hostprocess, p.program_name
from master..syslogshold h, master..sysprocesses p, master..sysusers u
where h.spid = p.spid
  and p.suid = u.suid
  and h.spid != 0 /* not replication truncation point */

   
     _________________________________________________________________

       Q6.7: HOW DO I FIND THE OLDEST OPEN TRANSACTION IN THE CURRENT DATABASE?
                                       
   
     _________________________________________________________________
   
   System 11 and beyond:

select h.spid, convert(varchar(20), h.name), h.starttime
  from master..syslogshold h, sysindexes i
 where h.dbid = db_id()
   and h.spid != 0
   and i.id = 8                       /* syslogs */
   and h.page in (i.first, i.first+1) /* first page of log = oldest xact */

   
     _________________________________________________________________

                        Q6.8: THE _TIMESTAMP_ DATATYPE
                                       
   
     _________________________________________________________________
   
   The timestamp datatype is a user-defined datatype supplied by Sybase,
   defined as:
   
     varbinary(8) NULL
     
   It has a special use when used to define a table column. A table may
   have at most one column of type timestamp, and whenever a row
   containing a timestamp column is inserted or updated the value in the
   timestamp column is automatically updated. This much is covered in the
   documentation.
   
   What isn't covered is what the values placed in timestamp columns
   actually represent. It is a common misconception that timestamp
   values bear some relation to calendar date and/or clock time. They
   don't - the datatype is badly named. SQL Server keeps a counter that
   is incremented for every write operation - you can see its current
   value via the global variable @@DBTS (though don't try to use this
   value to predict what will get inserted into a timestamp column, as
   every connection shares the same counter).
   
   The value is maintained between server startups and increases
   monotonically over time (though again you cannot rely on this
   behaviour). Eventually the value will wrap, potentially causing huge
   problems, though you will be warned before it does - see Sybase
   Technical News Volume 5, Number 1 (see Q10.3.1). You _cannot_ convert
   this value to a datetime value - it is simply an 8-byte integer.
   
     Note that the global timestamp value is used for recovery purposes
     in the event of an RDBMS crash. As transactions are committed to
     the log, each transaction gets a unique timestamp value. The
     checkpoint process places a marker in the log with its unique
     timestamp value. If the RDBMS crashes, recovery is the process of
     looking for transactions that need to be rolled forward and/or
     backward from the checkpoint event. If a transaction spans the
     checkpoint event and never completed, it too needs to be rolled
     back.
     
     Essentially, this describes the _write-ahead log protocol_ described
     by _C.J. Date_ in _An Introduction to Database Systems_ - see Q12.2.
     
   
   
   So what is it for? It was created to support the browse-mode
   functions of DB-Library (and for recovery, as mentioned above). It
   enables an application to easily support optimistic locking by
   guaranteeing that a _watch_ column in a row will change value if any
   other column in that row is updated. The browse functions check that
   the timestamp value is still the same as when the row was read before
   attempting an update. This behaviour is easy to replicate without
   using the actual client browse-mode functions - just read the
   timestamp value along with the other data retrieved to the client,
   and compare the stored value with the current value prior to an
   update.
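
   The optimistic-locking pattern can be sketched as follows (a
   hypothetical client-side model; a module-level counter stands in for
   the server-maintained global write counter behind @@DBTS):

```python
# Optimistic locking with a "watch" column: every write bumps the row's
# timestamp, and an update only succeeds if the timestamp still matches
# the value the client read earlier.

_counter = 0   # stands in for the server's global write counter

def _next_ts():
    global _counter
    _counter += 1
    return _counter

class Row:
    def __init__(self, value):
        self.value = value
        self.ts = _next_ts()        # set on insert

    def read(self):
        return self.value, self.ts  # client keeps the ts it saw

    def update(self, new_value, expected_ts):
        if self.ts != expected_ts:  # someone else wrote in between
            return False
        self.value = new_value
        self.ts = _next_ts()        # bumped automatically on every update
        return True

row = Row("balance=100")
value, seen = row.read()
# another connection sneaks in an update...
row.update("balance=90", row.ts)
# ...so our update, based on the now-stale timestamp, is rejected
print(row.update("balance=50", seen))   # False
```
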
     _________________________________________________________________

             Q6.9: STORED PROCEDURE RECOMPILATION AND RERESOLUTION
                                       
   
     _________________________________________________________________
   
   When a stored procedure is created, the text is placed in syscomments
   and a parse tree is placed in sysprocedures. At this stage there is no
   compiled _query plan_.
   
   A compiled _query plan_ for the procedure only ever exists in memory
   (that is, in the procedure cache) and is created under the following
   conditions:
    1. A procedure is executed for the first time.
    2. A procedure is executed by a second or subsequent user when the
       first plan in cache is still in use.
    3. The procedure cache is flushed by server restart or cache LRU
       flush procedure.
    4. The procedure is executed or created using the _with recompile_
       option.
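
   Conditions 1 and 2 can be sketched with a toy procedure cache (a
   hypothetical model; the real cache management is internal to the
   server): a cached plan is reused when free, and a concurrent second
   execution compiles an extra plan.

```python
# Sketch of per-execution plan handling in the procedure cache.

class ProcCache:
    def __init__(self):
        self.plans = []          # each entry: {"in_use": bool}

    def execute(self):
        for plan in self.plans:
            if not plan["in_use"]:
                plan["in_use"] = True
                return "reused plan"
        # no free plan in cache: compile another one for this execution
        self.plans.append({"in_use": True})
        return "compiled new plan"

    def finish(self):
        # executions complete; their plans become free for reuse
        for plan in self.plans:
            plan["in_use"] = False

cache = ProcCache()
print(cache.execute())   # compiled new plan (first ever execution)
print(cache.execute())   # compiled new plan (first plan still in use)
cache.finish()
print(cache.execute())   # reused plan
```
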
       
   
   
   If the objects the procedure refers to change in some way - indexes
   dropped, table definition changed, etc. - the procedure will be
   _reresolved_, which updates sysprocedures with a modified tree. Under
   10.x the tree actually grows, and in extreme cases the procedure can
   become too big to execute. This problem disappears in Sybase System
   11. This reresolution will _always_ occur if the stored procedure
   uses temporary tables (tables whose names start with "#").
   
   There is apparently no way of telling if a procedure has been
   reresolved.
     _________________________________________________________________
-- 
Pablo Sanchez              | Ph # (415) 933.3812        Fax # (415) 933.2821
pablo@sgi.com              | Pg # (800) 930.5635  -or-  pablo_p@pager.sgi.com
===============================================================================
I am accountable for my actions.   http://reality.sgi.com/pablo [ /Sybase_FAQ ]
