Tuesday, 1 September 2015

What is heap table?

Heap table:-
A table without cluster index is called Heap table. Now you are thinking why we are talking about this.
So as SQL developer we should aware and In next post I will explain which is related to maintenance how to Rebuild a heap table.
Now our next question is how to determine all the heap table from a database
so below is simplest query to determine all the heap tables from a  database
SELECT T.Name ‘Heaptable’
FROM sys.indexes I
INNER JOIN sys.tables T
ON I.object_id = T.object_id
WHERE I.type = 0 AND T.type = ‘U’
I ran this query on Adventureworks database and 2 tables with heap as shown in below figure
heaptable

What is extent?

Extents are  basic unit in which space is allocated to table or indexes.An extent is 8 contiguous pages .Each page is having size of 8k.

Saturday, 13 June 2015

Performance Tuning - Statistics - The heart of Performance.

Last post we have seen how to read excecution plan .It was brief  post about execution plan.you may have question in your mind how this execution plan is created ?On what basis SQL Server engine generate  plan?  how many plan gets generated  and which plan SQL Engine select?
   I will try to make it as simple as .There should be something  in sql server which act as meta data to take decision  .Yes there is .That something is called as Statistics. 
  Now next question can come to in your mind how it gets created and who create it?
   Statistics get created in following cases and Sql server engine automatically creates it.
1.When we create any index .
2. Running query using where  or join clause so individual statistics will be created first time .
 It will be updated later point if auto_update_statistics is ON.

E.g.
CREATE TABLE  CREDITCARD
  (
     [Credit_Card_id] [INT] NOT NULL,
     [Name]           [VARCHAR](100) NULL,
     [Limit]          [INT] NULL,
     [Valid_from]     [INT] NULL,
     [Valid_To]       [INT] NULL,
 
  )

  Go
 select * from CREDITCARD
  Now we have created table and seeing statistics are created are not.
  You will see no statistics are created yet.

Lets run below query ,

SELECT *
  FROM   CREDITCARD
  WHERE  Credit_Card_id = 1
         AND Name = 'test'
         AND Limit = 1
         AND Valid_from = 20
         AND Valid_To = 30
Now if you refresh statistics you will see five statistics object are created .Lets see how to see  
what data is there in that statistics .


We will use DBCC   SHOW_STATISTICS to see data inside statistics. 

Note : For demo I am considering only one statistics object.Name of your statistic will be different as it creates dynamically.

  DBCC SHOW_STATISTICS('CREDITCARD',  _WA_Sys_00000001_0425A276)



Now we are seeing statistics but no data is there.Now lets do some insert.Lets  insert data and run once above command.

                insert into CREDITCARD 
               select 1,'Shrikant',25000,1,22
  
              Go
           SELECT *
                         FROM   CREDITCARD
                        WHERE  Credit_Card_id = 1
                       AND Name = 'test'
                       AND Limit = 1
                      AND Valid_from = 20
                    AND Valid_To = 30 
  
  DBCC SHOW_STATISTICS('CREDITCARD', _WA_Sys_00000001_0425A276).
Now you will see very important information.
  • The number of rows and pages occupied by a table's data
  • The time that statistics were last updated
  • The average length of keys in a column
  • Histograms showing the distribution of data in a column
  • String summaries that are used when performing LIKE queries on character data


In this third row is very important .In this we can see how many distinct values are there for that column .Ther is one column  called EQ_ROW which is very useful for calculating plan.
Lets go bit inside.I am inserting a 10000 record in above table.

  declare @lclId int =1
  while 1= 1
  begin
  if @lclId>10000
  break
  
  insert into CREDITCARD 
  select  @lclId,'Shrikant',25000,1,22
  set @lclId +=1
    end 
Now If you run following select statement and DBBC command
Go
  SELECT *
  FROM   CREDITCARD
  WHERE   
           Name = 'Shrikant'
         AND Limit = 25000
         AND Valid_from = 1
         AND Valid_To = 22  and Credit_Card_id =1
  GO
  DBCC SHOW_STATISTICS('CREDITCARD',_WA_Sys_00000001_0425A276 )
  
you can see vaues for update and EQ_Row.EQ row is used to calculate estimated row count.

Note : You can manually update statistics by using sp_updatestats procedure for entire database.
If you want to update statistics for a table in above case you can use

UPDATE STATISTICS CREDITCARD

If you want to update statistics for a specific statistics in above case you can use

UPDATE STATISTICS CREDITCARD     _WA_Sys_00000001_0425A276;


In next post we will try to understand how statistics are useful for performance tuning.
  

Saturday, 6 June 2015

Sql Tunining - Understanding execution Plan in Simple Way

  Last post we have seen how to clear data pages ,index pages and execution plan from buffer.Now next step in performance tuning is to understand execution plan.you may have question like how the execution plan is  generated? who generate it?how it is important? or you may have lot of question which I have not mentioned here.  Execution plan is visual representation of the operation performed by database engine in order to return data required by the query.If you cam read and understand execution plans then you can better understand how you query is executing internally.

How it will helpful to increase performance as I don't know how to read it ?
              I will answer this question like suppose you have one query and you have executed  and execution plan is generated .You have execution plan too.Reading execution is bit difficult first time but after some time you will find it easy. You may have question from which side I will start reading it? You always need too start from right side.Logic is simple we will have  few input  query ,which will merge at last and provide result  .Yo can see few different terms like merge join ,inner join,nested loop,RID look up,Stream Aggregated etc all this term are called operator in execution plan .So you will see what are my input queries which are generating result,which query is taking more time  and how better I can  tune query to get performance.
         Lets try to understand simple execution Plan,
Script :

CREATE TABLE  TEST3
  (
     [Credit_Card_id] [INT] NOT NULL,
     [Name]           [VARCHAR](100) NULL,
     [Limit]          [INT] NULL,
     [Valid_from]     [INT] NULL,
     [Valid_To]       [INT] NULL,
    
  )

select * from Test3

We have created script and lets run it ans see execution plan,

You can see above execution plan  and try to understand how it looks like.It shows two operator ie. Select and Table Scan.From where you will start reading it ? Off course you will start from right .you can understand there is table scan on Test3 table and finally select is running and giving output.Lets try to understand bit complex execution plan.

Script :

SELECT *
FROM   test3 Test1
       INNER JOIN test3 test2
               ON Test1.Credit_Card_id = test2.Credit_Card_id

  This query has two input as we are joining  two table .Lets  understand it with help of execution plan




 In above plan you can see three operator Select,Table Scan,Hash Join .We can clearly see there are two input Test1 and Test2  each one is contributing 13% so total time for forming input is 26%.
Now you can see 73% of time is for hash join which is because our input tables are not having any index if we create index on this table we can avoid hash join and gain performance.
Lets create index  and will see what will happen,


CREATE TABLE  TEST3
  (
     [Credit_Card_id] [INT] NOT NULL,
     [Name]           [VARCHAR](100) NULL,
     [Limit]          [INT] NULL,
     [Valid_from]     [INT] NULL,
     [Valid_To]       [INT] NULL,
   PRIMARY KEY CLUSTERED ( [Credit_Card_id] ASC )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
  )

Run below query,
SELECT *
FROM   test3 Test1
       INNER JOIN test3 test2
               ON Test1.Credit_Card_id = test2.Credit_Card_id

Execution Plan,


.



In above plan you can see comparison cost is 0% So we have increase performance of the query.Please let me know  if you have doubt.Please comment if you like it.


Friday, 5 June 2015

Sql Tuning Basic - Part 1

In previous post we have seen why performance is important for any organization  and how it related with fame and name in market.Now lets start our technical discussion from this post onwards.
Before start performance tuning concept we need to know some thing about  Buffer Pool.
             SQL Server buffer pool , also called an SQL Server cache ,is  a place in system memory that is used for table and index data pages as they are modified or read from disk also it contains execution plane .As this is basic post I will try to make as simple as I can do.So when we are doing performance testing we must clear  buffer cache.One of my friend ask me what will happen if we will not clear buffer pool.Answers was " It can mislead your testing as data (pages)  are in buffer so it will not fetch from disk and it will choose execution plan from cache  so you will see performance but it is not ".
 He asked me  again

How will you clear buffer pool i,e Index/Data pages and execution plan from memory?

Ans -->
      It is very simple , Sql server has DBCC command ,We will use following command

              DBCC dropcleanbuffers

    Note : This command is very useful it will help us to do our performance  testing without shut-down or restarting server.
     If your more intrested to know what is there in my buffer at that moment and if you want see it .
You can use following DMV which will give all detail .

        select  *  From sys.dm_os_buffer_descriptors

 If you run above command you will lose your data /index pages and execution plan from buffer pool.That what we want .Now you are ready to start your performance testing which will give you accurate result on test and production if it is clone.
  There is one more command ideally used before clearing cache its CheckPoint

CHECKPOINT :
   Writes all dirty pages for the current database to disk.Dirty pages are data pages that have been entered into the buffer cache and modified ,but not yet written to disk.Checkpoints save time during a later recovery by creating a point at which all dirty pages are guaranteed to have been written to disk.

Checkpoint
Go
DBCC DropCleanBuffers

Please use above above command before comparing stats ie. before and after.Next post we will start actual performance tuning.Please comment if you like this post.



Sql Tunning - Introduction

    Some of my colleague and friends asked me to write post  on performance tuning .I was busy on some development project so I didn't get time to write a post.  Performance is one measure on which we rate anything, suppose I have developed product it has all feature which satisfy any business need.It gives accurate prediction . I want to sell that product  in million dollar  but  If I say for accurate prediction you only need to wait .Then buyer will straight forward say "Sorry  not interested " .Now days life is fast no one wants to wait .They need result on the fly so if your product is so good but very low performance then  no one will buy it.
   So performance is key factor for any organization to get fame in the market.Suppose  company developed product  and sold to local  customer.Starting time customer was very happy with result ,response and services from the company.Now customer expands business to all over  India .So company will charge more money from them  for support.They will happily give it with some negotiation.As the business expands customer will start facing issue with response i.e. performance.They will say initial time we was getting response in millisecond now we need to wait  for  1 minute .So response time is increased by 60 - 100% .Then situation will become very worse for company .System was originally tested on database whose size varies between  1 - 2 GB. Now data size growing drastically i.e. it is flowing in TB  . So response time what there are getting is correct only interesting thing everything is working as expected .It will work as expected if they have proper deadlock management and blocking mechanism.
   Thing is how you face your customer now.They will force you give us solution to resolve performance issue .Now they are more dependant on you .So this is time to get fame and name in the market.Making customer happy is always healthier for company financial growth.
   This is time to think something bigger which you have not thought while system was developed.How will you make customer happy in this case , if they got good response  from application they will be happy.
  This is just introductory post I will start technical things  from next post which we can use to increase performance .

Sunday, 10 May 2015

Views are not refreshing in SQL Server when there is any schema change?

                It was weird behaviour I observed while working on one of the development project.Let me explain step by step.
  I have created two view as vwTest1  and vwTest2 .
e.g
Create Table Test (Id Int ,Name varchar(100))
                       ---->  Above table is created with two column as Id & Name
Lets create two view with  same structure,

Create View vwTest1
As
Select  Id ,Name  From Test
                     ----> First View with explicit column name has been created.

Create View vwTest2
As
Select *  From Test

  If we see result of these two view will be same i.e it will show only two column.

Now there is new requirement :
    Name needs to change to CustomerName .

Now I am changing structure  of table as per requirement.

i.e
   Sp_Rename 'Test.Name','CustomerName'

Lets see result of table and views.

Select * from Test
  ---> It will have updated name of column.

Select * from VwTest1
  ---> It will have updated name of column.

Select * from VwTest2
  ---> It will have old name of column.

 Why this happened ?
                         When creating a view the metdata of the view is stored in the system tables, this includes the columns in the SELECT statement.  When you don't explicitly name each column, but rather use the wild card, "*", then SQL Server will list the columns that are contained in the underlying table at the time of creation.  If the underlying table(s) are changed by adding or removing columns after a view has been created then this can affect the view as the metadata of those columns are not automatically updated, if the view is created WITH SCHEMABINDING then the us of the wildcard is prohibited. 

Solution :
 If we have view which uses " * " in its creation then we need manually execute Sp_refreshview
to refresh metadata of views .

e.g
                  exec sp_refreshview VwTest2
Now if we see result  of vwTest2   we can see latest column .






Saturday, 21 March 2015

Write Query to get top N number of Employee of each department whose salary is maximum.

     One of my friend was facing issue in his development so I worked and solved his issue.I will try to share his requirement with respect to Employee table.
     His requirement was to show top 3 employee of each department having maximum salary.It seems to be bit confusing but it  is  not.
 Code :
We have following table :

Create Table Employee
(
EmployeeID Int Identity Primary Key,
EmployeeName Varchar(10),
DepartmentID Int ,
Salary Money
)

--Insert data

 INSERT INTO Employee
            (EmployeeName,
             DepartmentID,
             Salary)
VALUES      ('S12',
             1,
             2000),
            ('S13',
             1,
             3000),
            ('S14',
             1,
             4000),
            ('S15',
             1,
             5000),
            ('S16',
             1,
             6000),
            ('S17',
             1,
             7000),
            ('S18',
             1,
             8000),
            ('S19',
             1,
             9000),
            ('S110',
             1,
             10000),
            ('S11',
             2,
             1000),
            ('S12',
             2,
             2000),
            ('S13',
             2,
             3000),
            ('S14',
             2,
             4000),
            ('S15',
             2,
             5000),
            ('S16',
             2,
             6000),
            ('S17',
             2,
             7000),
            ('S18',
             2,
             8000),
            ('S19',
             2,
             9000),
            ('S110',
             2,
             10000),
            ('S11',
             3,
             1000),
            ('S12',
             3,
             2000),
            ('S13',
             3,
             3000),
            ('S14',
             3,
             4000),
            ('S15',
             3,
             5000),
            ('S16',
             4,
             6000),
            ('S17',
             4,
             7000),
            ('S18',
             4,
             8000),
            ('S19',
             4,
             9000),
            ('S110',
             4,
             10000)


Now we have data  as follow

Now question is ,really can we get it through query ?

Answer is yes.

Code :

;WITH cte
     AS (SELECT *,
                (SELECT Count(*)
                 FROM   employee
                 WHERE   salary>=e.salary
                        AND departmentID = e.departmentID) AS count
         FROM   Employee e)

SELECT *
FROM   cte
WHERE count <= 3

Logic is very simple but for understanding it will take some time.I will try to explain step by step.
1.We have main  table Employe. We will add one column in CTE as count.
2.Count will be calculated based on corelated subquery..
3.It will check how many employee of that department are getting equal are more than that  salary .
4.How value for  first emplyeeId is calculated with respect to DepartmentId
 -->
--First Row
  select *,(select  count(*) from Emplyee b where b.salary <=1000  and b.departmentID=a.departmentID
)

so for first row count will come as 10
EmployeeID EmployeeName DepartmentID Salary count
 1                 S11                           1                1000.00 10

--Second Row
  select *,(select  count(*) from Emplyee b where b.salary <=2000
and b.departmentID=a.departmentID
)

so for second row count will come as 9
EmployeeID EmployeeName DepartmentID Salary count
2                  S12                   1                 2000.00 9

 Like this all count  will be calculated with respect to departmentId
If we want to find out top 5 salary of employee we  only need to replace 3 by 5 in above query as follow
SELECT *
FROM   cte
WHERE count <= 5
  Same thing we can achieve through ranking function 
;
with cte as(
select * , rank() over ( PARTITION by DepartmentID   order  by salary desc  )  cn
from Employee  emp
)
select * From cte where cn<=3 order by DepartmentID  asc



;
with cte as(
select * , row_number() over ( PARTITION by DepartmentID   order  by salary desc  )  cn
from Employee emp
)
select * From cte where cn<=3 order by DepartmentID  asc




;
with cte as(
select * , dense_rank() over ( PARTITION by DepartmentID   order  by salary desc  )  cn
from Employee emp
)
select * From cte where cn<=3 order by DepartmentID  asc


Write program in SQL Server to print pyramid

                    Today,I was reading one of  programming language book .I saw lot of program with respect to pyramids .I always consider it as game of printing some number or  special character.
So I thought why don't I utilize my Saturday with that games.Now we considering that special character as '*'.Lets play with this.We will start with printing pyramids .

Output:
1.



Script :

DECLARE @lclMaxLevel INT=5
DECLARE @lclPrintCount INT =0

WHILE @lclMaxLevel > 0
  BEGIN
      PRINT Space(@lclMaxLevel)
            + Replicate('*', @lclPrintCount)
            + Replicate('*', @lclPrintCount+1)

      SET @lclMaxLevel=@lclMaxLevel - 1
      SET @lclPrintCount=@lclPrintCount + 1
  END

2.
Script :
DECLARE @lclMaxLevel INT=5
DECLARE @lclPrintCount INT =0

WHILE @lclMaxLevel > 0
  BEGIN
      PRINT Space(@lclMaxLevel)
            + Replicate('*', @lclPrintCount+1)

      SET @lclMaxLevel=@lclMaxLevel - 1
      SET @lclPrintCount=@lclPrintCount + 1
  END

  Please give your feedback if you liked it.

Nested Transaction -How it is important.

 Recently I was  encountered with issue  ,it was related with Transaction .I can share code with respect to adventurework database
Script:

SELECT Count(*)
FROM   SALESLT.SalesOrderDetail
WHERE  SalesOrderID IN ( 71774, 71780 )

BEGIN TRY
--Added try catch to avoid exception
    BEGIN TRAN
--First Transaction
    BEGIN TRAN
--Second Transaction
    BEGIN TRAN
--Third Transaction

    UPDATE SALESLT.SalesOrderDetail
    SET    UnitPrice = 5000
    WHERE  SalesOrderID = 71774

    COMMIT -- First Transaction End

    UPDATE SALESLT.SalesOrderDetail
    SET    UnitPrice = 1000
    WHERE  SalesOrderID = 71780

    COMMIT -- Second Transaction End

    UPDATE SALESLT.SalesOrderDetail
    SET    UnitPrice = 1000
    WHERE  SalesOrderID = 71781
--No Entry in Table So it will raise manual error

    IF @@ROWCOUNT <= 1
      RAISERROR('Call Rollback',16,1)
--it will go to Catch block

    COMMIT
END try

BEGIN catch
    PRINT 'Call Rollback'

    ROLLBACK
--Data will be rollback
END Catch

SELECT Count(*)
FROM   SALESLT.SalesOrderDetail
WHERE  SalesOrderID IN ( 71774, 71780 )

   Result of both query will be same as all transaction are rollback.
  Many of us can confuse with respect to result.




Summary:
   It has total  three transaction  out of which two transaction succeed  and one is failed .  Though inner transaction are committed without error but outer transaction got rollback..So all data got rollback. While implementing transaction  if it is required we need take care of this case.



Friday, 9 January 2015

What is a deadlock and what is a live lock?

Deadlock is a situation when two processes, each having a lock on one piece of data, attempt to acquire a lock on the other’s piece. Each process would wait indefinitely for the other to release the lock, unless one of the user processes is terminated. SQL Server detects deadlocks and terminates one user’s process. A livelock is one, where a request for an exclusive lock is repeatedly denied because a series of overlapping shared locks keeps interfering. SQL Server detects the situation after four denials and refuses further shared locks. A livelock also occurs when read transactions monopolize a table or page, forcing a write transaction to wait indefinitely.

What are the steps you will take to improve performance of a poor performing query?


This is a very open ended question and there could be a lot of reasons behind the poor performance of a query. But some general issues that you could talk about would be: No indexes, table scans, missing or out of date statistics, blocking, excess recompilations of stored procedures, procedures and triggers without SET NOCOUNT ON, poorly written query with unnecessarily complicated joins, too much normalization, excess usage of cursors and temporary tables. Some of the tools/ways that help you troubleshooting performance problems are: SET SHOWPLAN_ALL ON, SET SHOWPLAN_TEXT ON, SET STATISTICS IO ON, SQL Server Profiler, Windows NT /2000 Performance monitor, Graphical execution plan in Query Analyzer.

How are the UNIQUE and PRIMARY KEY constraints different?

A UNIQUE constraint is similar to PRIMARY key, but you can have more than one UNIQUE constraint per table.
When you declare a UNIQUE constraint, SQL Server creates a UNIQUE index to speed up the process of searching for duplicates. In this case the index defaults to NONCLUSTERED index, because you can have only one CLUSTERED index per table.
* The number of UNIQUE constraints per table is limited by the number of indexes on the table
Contrary to PRIMARY key UNIQUE constraints can accept NULL but just once. If the constraint is defined in a combination of fields, then every field can accept NULL and can have some values on them, as long as the combination values is unique.

What is primary key and Candidate Key?

 Each table has one and only one primary key, which can consist of one or many columns. A concatenated primary key comprises two or more columns. In a single table, you might find several columns, or groups of columns, that might serve as a primary key and are called candidate keys. A table can have more than one candidate key, but only one candidate key can become the primary key for that table

What are System database in SQL Server?

  There are mainly four system database:
1.Msdb
2.Master.
3.Model.
4.Tempdb


 Let us see one by one.

Master Database

Master database is system database. It contains information about server’s configuration. It is a very important database and important to backup Master database. Without Master database, server can't be started.

MSDB Database

It stores information related to database backups, DTS packages, Replication, SQL Agent information, SQL Server jobs.

TEMPDB Database

It stores temporary objects like temporary tables and temporary stored procedure.

Model Database

It is a template database used in the creation of new database.


Resourse Database
The resoure Database is a read-only database that contains all the system objects that are included with SQL Server. SQL Server system objects, such as sys.objects, are physically persisted in the Resource database, but they logically appear in the sys schema of every database. The Resource database does not contain user data or user metadata.

How to get which Process is Blocked in SQL SERVER ?

 There are two ways to get this sp_who and sp_who2 . You cannot get any detail about the sp_who2 but its provide more information than the sp_who . And other option from which we can find which process is blocked by other process is by using Enterprise Manager or Management Studio, these two commands work much faster and more efficiently than these GUI-based front-ends.