Tuesday, 10 December 2013

Difference Between Index Scan and Index Seek.

Index Scan :
                   Index Scan scans each and every record in the index. `Table Scan` is where the table is
processed row by row from beginning to end. If the index is a clustered index then an
index scan is really a table scan. Since a scan touches every row in the table whether or
not it qualifies, the cost is proportional to the total number of rows in the table.
Hence, a scan is an efficient strategy if the table is small.

Index Seek :
                   Since a seek only touches rows that qualify and pages that contain these qualifying
 rows, the cost is proportional to the number of qualifying rows and pages rather than to
 the total number of rows in the table.

               Before we go over the concept of scan and seek we need to understand what SQL Server does before applying any kind of index on query. When any query is ran SQL Server has to determine that if any particular index can be applied on that particular query or not.SQL Server uses search predicates to make decision right before applying indexes to any given query.

Predicate is an expression that evaluates to TRUE, FALSE, or UNKNOWN. Predicates are used in the search condition of WHERE clauses and HAVING clauses, the join conditions of FROM clauses, and other constructs where a Boolean value is required.

Suppose,
TableX has five columns : Col1,Col2,Col3,Col4,Col5
Index1 on TableX contains two columns : Col2,Col3
Query1 on TableX retrieves two columns : Col1,Col3

          Now when Query1 is ran on TableX  it will use search predicates Col1,Col3 to figure out if it will use Index1 or not. As Col1,Col3 of Query1 are not same as Col2,Col3 or Index1 there are good chances that Query1 will not use Index1. This scenario will end up in table scan. If Query1 would have used Index1 it would have resulted in table seek.

Friday, 6 December 2013

What is BigData?

                                         Big data is a buzzword, or catch-phrase, used to describe a massive volume of both structured and unstructured data that is so large that it's difficult to process using traditional database and software techniques.
While the term may seem to reference the volume of data, that isn't always the case. The term big data -- especially when used by vendors -- may refer to the technology (which includes tools and processes) that an organization requires to handle the large amounts of data and storage facilities.
The term big data is believed to have originated with Web search companies who had to query very large distributed aggregations of loosely-structured data.

An Example of Big Data

An example of big data might be petabytes (1,024 terabytes) or exabytes(1,024 petabytes) of data consisting of billions to trillions of records of millions of people -- all from different sources (e.g. Web, sales, customer contact center, social media, mobile data and so on). The data is typically loosely structured data that is often incomplete and inaccessible.
When dealing with larger datasets, organizations face difficulties in being able to create, manipulate, and manage big data. Big data is particularly a problem in business analytics because standard tools and procedures are not designed to search and analyze massive datasets.


What is Hadoop?

                                                                Hadoop is a free, Java-based programming framework that supports the processing of large data sets in a distributed computing environment. It is part of the Apache project sponsored by the Apache Software Foundation.
Hadoop makes it possible to run applications on systems with thousands of nodes involving thousands of terabytes. Its distributed file system facilitates rapid data transfer rates among nodes and allows the system to continue operating uninterrupted in case of a node failure. This approach lowers the risk of catastrophic system failure, even if a significant number of nodes become inoperative.
Hadoop was inspired by Google's MapReduce, a software framework in which an application is broken down into numerous small parts. Any of these parts (also called fragments or blocks) can be run on any node in the cluster. Doug Cutting, Hadoop's creator, named the framework after his child's stuffed toy elephant. The current Apache Hadoop ecosystem consists of the Hadoop kernel, MapReduce, the Hadoop distributed file system (HDFS) and a number of related projects such as Apache Hive, HBase and Zookeeper.
The Hadoop framework is used by major players including Google, Yahoo and IBM, largely for applications involving search engines and advertising. The preferred operating systems areWindows and Linux but Hadoop can also work with BSD and OS X.

Thursday, 5 December 2013

Getting number from alphanumeric string in SQL Server

 Suppose ,
            We have string which is containing numbers,character,special character.Requirement is we want only numbers .We don't want characters ,special character etc.I created function which will only return  number.

Create function fnOnlyNumbers(@Value varchar(255))
 returns varchar(255)
 begin
 if PATINDEX('%[0-9]%',@Value)>0
 begin
 while  isnumeric(@Value)=0
 begin
 set @Value=ltrim(rtrim(REPLACE(REPLACE (@Value, SUBSTRING (@Value ,PATINDEX ( '%[!@#$a-z`?><.,/A-Z() *&]%' , @Value ),1),'') ,' ','')))
 end
 end
 return  case when  isnumeric(@Value)=1 then   @Value else Null end
 end


How to run :
select dbo.fnOnlyNumbers('sss1$2>>34 ')
Output : 1234

Note : If we are not passing any value then we will get Null value.