Modern Performance in SQL Server

Modern Performance - SQL Server

Joe Chang

& SolidQ

www.qdpma.com

About Joe

•

SQL Server consultant since 1999

•

Query Optimizer execution plan cost formulas (2002)

•

True cost structure of SQL plan operations (2003?)

•

Database with distribution statistics only, no data 2004

•

Decoding statblob/stats_stream

–

writing your own statistics

•

Disk IO cost structure

•

Tools for system monitoring, execution plan analysis

See ExecStats

http://www.qdpma.com/ExecStats/SQLExecStats.html

Download:

http://www.qdpma.com/ExecStatsZip.html

Blog:

http://sqlblog.com/blogs/joe_chang/default.aspx

Overview

•

Why performance is still important today?

–

Brute force?

•

Yes, but …

•

Special Topics

•

Automating data collections

•

SQL Server Engine

–

What developers/DBA need to know?

CPU & Memory 2001 and 2012

2001 – 4 sockets, 4 cores

Pentium III Xeon, 900MHz

4-8GB memory?

Xeon MP 2002-4

2012 – 4 sockets, 8 cores each

4 x 8 = 32 cores total

768GB (48 x 16GB),

Westmere-EX 1TB

15 cores in next generation?

FSB

L2

MCH

Each core today is more

than 10x over PIII

16GB  $191

32GB  $794

Storage 2001 versus 2012/13

QPI

QPI

100 x 10K HDD

125 IOPS each = 12.5K IOPS

IO Bandwidth limited: 1.3GB/s

(1/3 memory bandwidth)

64 SSDs, >10K+ IOPS each, 1M

IOPS possible

IO Bandwidth 10GB/s easy

SAN vendors –

questionable BW

MCH

http://www.qdpma.com/Storage/Storage2013.html

http://www.qdpma.com/ppt/Storage_2013.pptx

SAN

http://sqlblog.com/blogs/joe_chang/archive/2013/05/10/enterprise-storage-systems-emc-vmax.aspx

http://sqlblog.com/blogs/joe_chang/archive/2013/02/25/emc-vnx2-and-vnx-future.aspx

Performance Past, Present, Future

•

When will servers be so powerful that …

–

Been saying this for a long time

•

Today – 10 to 100X overkill

–

32-cores, 60-cores later in 2013?

–

Enough memory that IO is only sporadic

–

Unlimited IOPS with SSD

•

What can go wrong?

Today’s topic

Factors to Consider

SQL

Tables

Indexes

Query

Optimizer

Statistics

Compile

Parameters

Storage

Engine

Hardware

Special Topics

•

Data type mistmatch

•

Multiple Optional Search Arguments (SARG)

–

Function on SARG

•

Parameter Sniffing versus Variables

•

Statistics related (big topic)

•

first OR, then AND/OR combinations

•

Complex Query with sub-expressions

•

Parallel Execution

Not in order of priority

http://blogs.msdn.com/b/sqlcat/archive/2013/09/09/when-to-break-down-complex-queries.aspx

1a. Data type mismatch

DECLARE

@name

nvarchar

(25) =

N'Customer#000002760'

SELECT

FROM

 CUSTOMER

WHERE

 C_NAME =

@name

SELECT

FROM

 CUSTOMER

WHERE

 C_NAME = CONVERT(

varchar

@name

.NET auto-parameter discovery?

Unable to use index seek

1b. Type Mismatch – Row Estimate

SELECT

FROM

 CUSTOMER

WHERE

 C_NAME

LIKE

N'Customer#00000276%'

SELECT

FROM

 CUSTOMER

WHERE

 C_NAME

LIKE

'Customer#00000276%'

Row estimate

error could have

severe

consequences in a

complex query

SELECT TOP plus Row Estimate Error

SELECT TOP

[Document]

[ArtifactID]

FROM

[Document]

NOLOCK

WHERE

[Document]

[AccessControlListID_D]

IN

AND EXISTS (

SELECT

[DocumentBatch]

[BatchArtifactID]

FROM

[DocumentBatch]

NOLOCK

INNER JOIN

[Batch]

NOLOCK

ON

[Batch]

ArtifactID

[DocumentBatch]

[BatchArtifactID]

WHERE

[DocumentBatch]

[DocumentArtifactID]

[Document]

[ArtifactID]

AND

[Batch]

[Name]

LIKE

N'%Value%'

ORDER BY

[Document]

[ArtifactID]

Data type mismatch – results in estimate rows high

Top clause – easy to find first 1000 rows

In fact, there are few rows that match SARG

Wrong plan for evaluating large number of rows

http://www.qdpma.com/CBO/Relativity.html

2. Multiple Optional SARG

DECLARE

@Orderkey

 int,

@Partkey

 int = 1

SELECT

FROM

 LINEITEM

WHERE

@Orderkey

 IS NULL

OR

 L_ORDERKEY =

@Orderkey

AND

@Partkey

 IS NULL

OR

 L_PARTKEY =

@Partkey

AND (@Partkey IS NOT NULL OR @Orderkey IS NOT NULL)

Dynamically Built Parameterized SQL

DECLARE

@Orderkey

int

@Partkey

int

@SQL

nvarchar

),

@Param

nvarchar

SELECT

@SQL

N‘/* Comment */

SELECT * FROM LINEITEM WHERE 1=1‘

@Param

N'@Orderkey int, @Partkey int'

IF

@Orderkey

IS

NOT

NULL)

SELECT

@SQL

@SQL

N' AND L_ORDERKEY = @Orderkey'

IF

@Partkey

IS

NOT

NULL)

SELECT

@SQL

@SQL

N' AND L_PARTKEY = @Partkey'

PRINT

@SQL

exec

sp_executesql

@SQL

@Param

@Orderkey

@Partkey

IF block is easier for few options

Dynamically built parameterized SQL better for many options

Considering /*comment*/ to help identify this

IF block

DECLARE

@Orderkey

 int,

@Partkey

 int = 1

IF

@Orderkey

IS

NOT

NULL)

  SELECT

FROM

 LINEITEM

  WHERE

 (L_ORDERKEY =

@Orderkey

AND

@Partkey

 IS NULL

OR

 L_PARTKEY =

@Partkey

ELSE IF

@Partkey

IS

NOT

NULL)

  SELECT

FROM

 LINEITEM

  WHERE

 (L_PARTKEY =

@Partkey

Need to consider impact of Parameter Sniffing,

Consider the OPTIMIZER FOR hint

This is actually the stored procedure

parameters

2b. Function on column SARG

SELECT

COUNT

(*),

SUM

(L_EXTENDEDPRICE)

FROM

 LINEITEM

WHERE

YEAR

(L_SHIPDATE) = 1995

AND

MONTH

(L_SHIPDATE) = 1

SELECT

COUNT

(*),

SUM

(L_EXTENDEDPRICE)

FROM

 LINEITEM

WHERE

 L_SHIPDATE

BETWEEN

'1995-01-01'

AND

'1995-01-31'

DECLARE

@Startdate

 date,

@Days

 int = 1

SELECT

COUNT

(*),

SUM

(L_EXTENDEDPRICE)

FROM

 LINEITEM

WHERE

 L_SHIPDATE

BETWEEN

@Startdate

AND

DATEADD

(dd,1,

@Startdate

Estimated versus Actual Plan - rows

Estimated Plan – 1 row???

Actual Plan – actual rows 77,356

3 Parameter Sniffing

-- first call, procedure compiles with these parameters

exec

 p_Report @startdate =

'2011-01-01'

, @enddate =

'2011-12-31'

-- subsequent calls, procedure executes with original plan

exec

 p_Report @startdate =

'2012-01-01'

, @enddate =

'2012-01-07'

Need different execution plans for narrow and wide range

Options:

1) WITH RECOMPILE

2) main procedure calls 1 of 2 identical sub-procedures

One sub-procedure is only called for narrow range

Other called for wide range

Skewed data distributions also important

Example: Large & small customers

Assuming date data type

4 Statistics

•

Auto-recompute points

•

Sampling strategy

–

Percentage

–

Random pages versus random rows

–

Histogram Equal and Range Rows

–

Out of bounds, value does not exist

–

etc

Statistics Used by the Query Optimizer in SQL Server 2008

Writer: Eric N. Hanson and Yavor Angelov

Contributor: Lubor Kollar

http://msdn.microsoft.com/en-us/library/dd535534.aspx

Statistics Structure

•

Stored (mostly) in binary field

Scalar values

Density Vector –

limit 30, half in NC, half Cluster key

Histogram

Up to 200 steps

Consider not blindly using IDENTITY on critical tables

Example: Large customers get low ID values

Small customers get high ID values

http://sqlblog.com/blogs/joe_chang/archive/2012/05/05/decoding-stats-stream.aspx

Statistics Auto/Re-Compute

•

Automatically generated on query compile

•

Recompute at 6 rows, 500, every 20%?

Has this changed?

Statistics Sampling

•

Sampling theory

–

True random sample

–

Sample error - square root     N

•

Relative error 1/   N

•

SQL Server sampling

–

All rows in random pages

Row Estimate Problems

•

Skewed data distribution

•

Out of bounds

•

Value does not exist

Loop Join - Table Scan on Inner Source

Estimated out from first 2 tabes (at right) is

zero or 1 rows. Most efficient join to third

table (without index on join column) is a loop

join with scan. If row count is 2 or more, then

a fullscan is performed for each row from

outer source

Default statistics rules may lead to serious ETL issues

Consider custom strategy

Compile Parameter Not Exists

Main procedure

 has cursor around view_Servers

First server in view_Servers is

’CAESIUM’

Cursor executes

sub-procedure

 for each Server

sql:

SELECT

MAX

(ID)

FROM

 TReplWS

WHERE

 Hostname = @ServerName

But

CAESIUM

 does not exist in TReplWS!

Good and Bad Plan?

SqlPlan Compile Parameters

SqlPlan Compile Parameters

<?

xml

version

1.0

encoding

utf-8

?>

ShowPlanXML

xmlns

http://schemas.microsoft.com/sqlserver/2004/07/showplan

Version

1.1

Build

10.50.2500.0

BatchSequence

Batch

Statements

StmtSimple

StatementText

@ServerName varchar(50) SELECT @maxid = ISNULL(MAX(id),0)

 FROM TReplWS WHERE Hostname = @ServerName

StatementId

StatementCompId

StatementType

SELECT

StatementSubTreeCost

0.0032843

StatementEstRows

StatementOptmLevel

FULL

QueryHash

0x671D2B3E17E538F1

QueryPlanHash

0xEB64FB22C47E1CF2

StatementOptmEarlyAbortReason

GoodEnoughPlanFound

StatementSetOptions

QUOTED_IDENTIFIER

true

ARITHABORT

false

CONCAT_NULL_YIELDS_NULL

true

ANSI_NULLS

true

ANSI_PADDING

true

ANSI_WARNINGS

true

NUMERIC_ROUNDABORT

false

/>

QueryPlan

CachedPlanSize

CompileTime

CompileCPU

CompileMemory

RelOp

NodeId

PhysicalOp

Compute Scalar

LogicalOp

Compute Scalar

EstimateRows

EstimateIO

EstimateCPU

1e-007

“

AvgRowSize

EstimatedTotalSubtreeCost

0.0032843

Parallel

EstimateRebinds

EstimateRewinds

</

RelOp

ParameterList

ColumnReference

Column

@ServerName

ParameterCompiledValue

'CAESIUM'

/>

</

ParameterList

</

QueryPlan

</

StmtSimple

</

Statements

</

Batch

</

BatchSequence

</

ShowPlanXML

Compile parameter values at bottom of sqlplan file

5a Single Table OR

-- Single table

SELECT

FROM

LINEITEM

WHERE

L_ORDERKEY

OR

L_PARTKEY

5a Join 2 Tables, OR in SARG

-- subsequent calls, procedure executes with original plan

SELECT

O_ORDERDATE

O_ORDERKEY

L_SHIPDATE

L_QUANTITY

FROM

LINEITEM

INNER

JOIN

ORDERS

ON

O_ORDERKEY

L_ORDERKEY

WHERE

L_PARTKEY

OR

O_CUSTKEY

5a UNION instead of OR

SELECT

O_ORDERDATE

O_ORDERKEY

L_SHIPDATE

L_QUANTITY

O_CUSTKEY

L_PARTKEY

FROM

LINEITEM

INNER

JOIN

ORDERS

ON

O_ORDERKEY

L_ORDERKEY

WHERE

L_PARTKEY

UNION

(ALL)

SELECT

O_ORDERDATE

O_ORDERKEY

L_SHIPDATE

L_QUANTITY

O_CUSTKEY

L_PARTKEY

FROM

LINEITEM

INNER

JOIN

ORDERS

ON

O_ORDERKEY

L_ORDERKEY

WHERE

O_CUSTKEY

--

AND (

L_PARTKEY

<>

OR

L_PARTKEY

IS NULL)

--

Caution:

 select list should

have keys to ensure

correct rows

UNION removes duplicates

UNION ALL does not

-- Hugo Kornelis trick --

5b AND/OR Combinations

•

Hash Join is good method to process many rows

–

Requirement is equality join condition

•

In complex SQL with AND/OR or IN NOT IN combinations

–

Query optimizer may not be to determine that equality join

condition exists

–

Execution plan will use loop join,

–

and attempt to force hash join will be rejected

•

Re-write using UNION in place of OR

•

And LEFT JOIN in place of NOT IN

SELECT xx FROM A WHERE col1 IN (expr1) AND col2 NOT IN (expr2)

SELECT xx FROM A WHERE (expr1) AND (expr2 OR expr3)

More on AND/OR combinations:

http://www.qdpma.com/CBO/Relativity3.html

Complex Query with Sub-expression

•

Query complexity –

really high compile cost

•

Repeating sub-expressions (including CTE)

–

Must be evaluated multiple times

•

Main Problem -

Row estimate error propagation

•

Solution/Strategy –

Get a good execution plan

–

Temp table when estimate is high, actual is low.

More on AND/OR combinations:

http://www.qdpma.com/CBO/Relativity4.html

http://blogs.msdn.com/b/sqlcat/archive/2013/09/09/when-to-break-down-complex-queries.aspx

When Estimate is low, and actual rows is high, need to balance temp

table insert overhead versus plan benefit. Would a join hint work?

Temp Table and Table Variable

•

Forget what other people have said

–

Most is Cr@p

•

Temp Tables –

subject to statistics auto/re-compile

•

Table variable –

no statistics, assumes 1 row

•

Question:

In each specific case: does the statistics

and recompile help or not?

–

Yes: temp table

–

No: table variable

Parallelism

•

Designed for 1998 era

–

Cost Threshold for Parallelism: default 5

–

Max Degree of Parallelism – instance level

–

OPTION (MAXDOP n) – query level

•

Today – complex system – 32 cores

–

Plan cost 5 query might run in 10ms?

–

Some queries at DOP 4

–

Others at DOP 16?

More on Parallelism:

http://www.qdpma.com/CBO/ParallelismComments.html

http://www.qdpma.com/CBO/ParallelismOnset.html

Really need to rethink

parallelism / NUMA strategies

Full-Text Search

Loop Join with FT as inner

Source Full Text search

Potentially executed

many times

varchar

max

) stored in lob pages

•

Disk IO to lob pages is synchronous?

–

Must access row to get 16 byte link?

–

Feature request: index pointer to lob

Summary

•

Hardware today is really powerful

–

Storage may not be – SAN vendor disconnect

•

Standard performance practice

–

Top resource consumers, index usage

•

But also Look for serious blunders

http://www.qdpma.com/CBO/SQLServerCostBasedOptimizer.html

http://www.qdpma.com/CBO/Relativity.html

http://blogs.msdn.com/b/sqlcat/archive/2013/09/09/when-to-break-down-complex-queries.aspx

Special Topics

•

Data type mismatch

•

Multiple Optional Search Arguments (SARG)

–

Function on SARG

•

Parameter Sniffing versus Variables

•

Statistics related (big topic)

•

AND/OR

•

Complex Query with sub-expressions

•

Parallel Execution

Slide Note

Embed Share

Download

Explore the evolution of performance in SQL Server with insights from Joe Chang, a seasoned SQL Server consultant. Delve into topics like query optimization, execution plans, CPU and memory advancements, storage technologies, and the significance of performance in today's data environment. Discover how automation and data collection play a crucial role in enhancing SQL Server performance.

les_la Follow

Uploaded on Sep 16, 2024 | 0 Views

Download Presentation

Please find below an Image/Link to download the presentation.

The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author.If you encounter any issues during the download, it is possible that the publisher has removed the file from their server.

You are allowed to download the files provided on this website for personal or commercial use, subject to the condition that they are used lawfully. All files are the property of their respective owners.

Download Presentation

The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author.

E N D

Presentation Transcript

Modern Performance - SQL Server Joe Chang www.qdpma.com & SolidQ

About Joe SQL Server consultant since 1999 Query Optimizer execution plan cost formulas (2002) True cost structure of SQL plan operations (2003?) Database with distribution statistics only, no data 2004 Decoding statblob/stats_stream writing your own statistics Disk IO cost structure Tools for system monitoring, execution plan analysis See ExecStats http://www.qdpma.com/ExecStats/SQLExecStats.html Download: http://www.qdpma.com/ExecStatsZip.html Blog: http://sqlblog.com/blogs/joe_chang/default.aspx

Overview Why performance is still important today? Brute force? Yes, but Special Topics Automating data collections SQL Server Engine What developers/DBA need to know?

CPU & Memory 2001 and 2012 PCI-E PCI-E PCI-E PCI-E PCI-E PCI-E PCI-E PCI-E PCI-E PCI-E PCI-E PCI-E P L2 P P P QPI PCI-E C4 QPI PCI-E C4 C3 C3 QPI C2 C5 C2 C5 FSB LLC LLC C1 C0 C6 C7 C1 C0 C6 C7 MCH MI MI MI MI QPI QPI 2001 4 sockets, 4 cores Pentium III Xeon, 900MHz 4-8GB memory? QPI PCI-E C4 QPI PCI-E C4 C3 C3 C2 C5 C2 C5 QPI LLC LLC C1 C0 C6 C7 C1 C0 C6 C7 MI MI MI MI Xeon MP 2002-4 DMI 2 PCI-E PCI-E PCI-E PCI-E PCI-E PCI-E PCI-E PCI-E PCI-E PCI-E PCI-E 2012 4 sockets, 8 cores each 4 x 8 = 32 cores total 768GB (48 x 16GB), Westmere-EX 1TB 15 cores in next generation? Each core today is more than 10x over PIII 16GB $191 32GB $794

Storage 2001 versus 2012/13 QPI 192 GB 192 GB QPI MCH PCIe x4 PCIe x8 PCIe x8 PCIe x8 PCIe x8 PCIe x8 PCI PCI PCI PCI RAID RAID RAID RAID 10GbE RAID RAID RAID RAID IB HDD HDD HDD HDD SSD SSD SSD SSD HDD HDD HDD HDD HDD HDD HDD HDD 2001 100 x 10K HDD 125 IOPS each = 12.5K IOPS IO Bandwidth limited: 1.3GB/s (1/3 memory bandwidth) 2013 64 SSDs, >10K+ IOPS each, 1M IOPS possible IO Bandwidth 10GB/s easy SAN vendors questionable BW http://www.qdpma.com/Storage/Storage2013.html http://www.qdpma.com/ppt/Storage_2013.pptx

SAN Node 1 Node 2 Node 1 Node 768 GB 768 GB 2 768 GB 768 GB 8 Gbps FC or 10Gbps FCOE x8 x8 x8 x8 x8 x8 SSD SSD Switch Switch 8 Gb FC Switch Switch SP A SP B 24 GB 24 GB SP A SP B x4 SAS 2GB/s 24 GB 24 GB x4 SAS 2GB/s Data 1 Data 2 Data 3 Data 4 Data 5 Data 6 Data 7 Data 8 Main Volume Data 9 Data 10 Data 11 Data 12 Data 13 Data 14 Data 15 Data 16 Log volume SSD 1 SSD 2 SSD 3 SSD 4 Hot Spares SSD 10K 7.2K Log 1 Log 2 Log 3 Log 4 http://sqlblog.com/blogs/joe_chang/archive/2013/05/10/enterprise-storage-systems-emc-vmax.aspx http://sqlblog.com/blogs/joe_chang/archive/2013/02/25/emc-vnx2-and-vnx-future.aspx

Performance Past, Present, Future When will servers be so powerful that Been saying this for a long time Today 10 to 100X overkill 32-cores, 60-cores later in 2013? Enough memory that IO is only sporadic Unlimited IOPS with SSD What can go wrong? Today s topic

Factors to Consider SQL Tables Indexes Compile Parameters Statistics Query Optimizer Storage Engine Hardware

Special Topics Data type mistmatch Multiple Optional Search Arguments (SARG) Function on SARG Parameter Sniffing versus Variables Statistics related (big topic) first OR, then AND/OR combinations Complex Query with sub-expressions Parallel Execution Not in order of priority http://blogs.msdn.com/b/sqlcat/archive/2013/09/09/when-to-break-down-complex-queries.aspx

1a. Data type mismatch DECLARE @name nvarchar(25) = N'Customer#000002760' SELECT * FROM CUSTOMER WHERE C_NAME = @name SELECT * FROM CUSTOMER WHERE C_NAME = CONVERT(varchar, @name) .NET auto-parameter discovery? Unable to use index seek

1b. Type Mismatch Row Estimate SELECT * FROM CUSTOMER WHERE C_NAME LIKE N'Customer#00000276%' SELECT * FROM CUSTOMER WHERE C_NAME LIKE 'Customer#00000276%' Row estimate error could have severe consequences in a complex query

SELECT TOP plus Row Estimate Error SELECT TOP 1000 [Document].[ArtifactID] FROM [Document] (NOLOCK) WHERE [Document].[AccessControlListID_D] IN (1,1000064,1000269) AND EXISTS ( SELECT [DocumentBatch].[BatchArtifactID] FROM [DocumentBatch] (NOLOCK) INNER JOIN [Batch] (NOLOCK) ON [Batch].ArtifactID = [DocumentBatch].[BatchArtifactID] WHERE [DocumentBatch].[DocumentArtifactID] = [Document].[ArtifactID] AND [Batch].[Name] LIKE N'%Value%' ) ORDER BY [Document].[ArtifactID] Data type mismatch results in estimate rows high Top clause easy to find first 1000 rows In fact, there are few rows that match SARG Wrong plan for evaluating large number of rows http://www.qdpma.com/CBO/Relativity.html

2. Multiple Optional SARG DECLARE @Orderkey int, @Partkey int = 1 SELECT * FROM LINEITEM WHERE (@Orderkey IS NULL OR L_ORDERKEY = @Orderkey) AND (@Partkey IS NULL OR L_PARTKEY = @Partkey) AND (@Partkey IS NOT NULL OR @Orderkey IS NOT NULL)

Dynamically Built Parameterized SQL DECLARE @Orderkey int, @Partkey int = 1 , @SQL nvarchar(500), @Param nvarchar(100) SELECT @SQL = N /* Comment */ SELECT * FROM LINEITEM WHERE 1=1 , @Param = N'@Orderkey int, @Partkey int' IF (@Orderkey IS NOT NULL) SELECT @SQL = @SQL + N' AND L_ORDERKEY = @Orderkey' IF (@Partkey IS NOT NULL) SELECT @SQL = @SQL + N' AND L_PARTKEY = @Partkey' PRINT @SQL exec sp_executesql @SQL, @Param, @Orderkey, @Partkey IF block is easier for few options Dynamically built parameterized SQL better for many options Considering /*comment*/ to help identify this

IF block DECLARE @Orderkey int, @Partkey int = 1 This is actually the stored procedure parameters IF (@Orderkey IS NOT NULL) SELECT * FROM LINEITEM WHERE (L_ORDERKEY = @Orderkey) AND (@Partkey IS NULL OR L_PARTKEY = @Partkey) ELSE IF (@Partkey IS NOT NULL) SELECT * FROM LINEITEM WHERE (L_PARTKEY = @Partkey) Need to consider impact of Parameter Sniffing, Consider the OPTIMIZER FOR hint

2b. Function on column SARG SELECT COUNT(*), SUM(L_EXTENDEDPRICE) FROM LINEITEM WHERE YEAR(L_SHIPDATE) = 1995 AND MONTH(L_SHIPDATE) = 1 SELECT COUNT(*), SUM(L_EXTENDEDPRICE) FROM LINEITEM WHERE L_SHIPDATE BETWEEN '1995-01-01' AND '1995-01-31' DECLARE @Startdate date, @Days int = 1 SELECT COUNT(*), SUM(L_EXTENDEDPRICE) FROM LINEITEM WHERE L_SHIPDATE BETWEEN @Startdate AND DATEADD(dd,1,@Startdate)

Estimated versus Actual Plan - rows Estimated Plan 1 row??? Actual Plan actual rows 77,356

3 Parameter Sniffing -- first call, procedure compiles with these parameters exec p_Report @startdate = '2011-01-01', @enddate = '2011-12-31' -- subsequent calls, procedure executes with original plan exec p_Report @startdate = '2012-01-01', @enddate = '2012-01-07' Assuming date data type Need different execution plans for narrow and wide range Options: 1) WITH RECOMPILE 2) main procedure calls 1 of 2 identical sub-procedures One sub-procedure is only called for narrow range Other called for wide range Skewed data distributions also important Example: Large & small customers

4 Statistics Auto-recompute points Sampling strategy Percentage Random pages versus random rows Histogram Equal and Range Rows Out of bounds, value does not exist etc Statistics Used by the Query Optimizer in SQL Server 2008 Writer: Eric N. Hanson and Yavor Angelov Contributor: Lubor Kollar http://msdn.microsoft.com/en-us/library/dd535534.aspx

Statistics Structure Stored (mostly) in binary field Scalar values Density Vector limit 30, half in NC, half Cluster key Histogram Up to 200 steps Consider not blindly using IDENTITY on critical tables Example: Large customers get low ID values Small customers get high ID values http://sqlblog.com/blogs/joe_chang/archive/2012/05/05/decoding-stats-stream.aspx

Statistics Auto/Re-Compute Automatically generated on query compile Recompute at 6 rows, 500, every 20%? Has this changed?

Statistics Sampling Sampling theory True random sample Sample error - square root N Relative error 1/ N SQL Server sampling All rows in random pages

Row Estimate Problems Skewed data distribution Out of bounds Value does not exist

Loop Join - Table Scan on Inner Source Estimated out from first 2 tabes (at right) is zero or 1 rows. Most efficient join to third table (without index on join column) is a loop join with scan. If row count is 2 or more, then a fullscan is performed for each row from outer source Default statistics rules may lead to serious ETL issues Consider custom strategy

Compile Parameter Not Exists Main procedure has cursor around view_Servers First server in view_Servers is CAESIUM Cursor executes sub-procedure for each Server sql: SELECT MAX(ID) FROM TReplWS WHERE Hostname = @ServerName But CAESIUM does not exist in TReplWS!

Good and Bad Plan?

SqlPlan Compile Parameters

SqlPlan Compile Parameters <?xml version="1.0" encoding="utf-8"?> <ShowPlanXML xmlns="http://schemas.microsoft.com/sqlserver/2004/07/showplan" Version="1.1" Build="10.50.2500.0"> <BatchSequence> <Batch> <Statements> <StmtSimple StatementText="@ServerName varchar(50) SELECT @maxid = ISNULL(MAX(id),0) FROM TReplWS WHERE Hostname = @ServerName" StatementId="1" StatementCompId="43" StatementType="SELECT" StatementSubTreeCost="0.0032843" StatementEstRows="1" StatementOptmLevel="FULL" QueryHash="0x671D2B3E17E538F1" QueryPlanHash="0xEB64FB22C47E1CF2" StatementOptmEarlyAbortReason="GoodEnoughPlanFound"> <StatementSetOptions QUOTED_IDENTIFIER="true" ARITHABORT="false" CONCAT_NULL_YIELDS_NULL="true" ANSI_NULLS="true" ANSI_PADDING="true" ANSI_WARNINGS="true" NUMERIC_ROUNDABORT="false" /> <QueryPlan CachedPlanSize="16" CompileTime="1" CompileCPU="1" CompileMemory="168"> <RelOp NodeId="0" PhysicalOp="Compute Scalar" LogicalOp="Compute Scalar" EstimateRows="1" EstimateIO="0" EstimateCPU="1e-007 AvgRowSize="15" EstimatedTotalSubtreeCost="0.0032843" Parallel="0" EstimateRebinds="0" EstimateRewinds="0"> </RelOp> <ParameterList> <ColumnReference Column="@ServerName" ParameterCompiledValue="'CAESIUM'" /> </ParameterList> </QueryPlan> </StmtSimple> </Statements> </Batch> </BatchSequence> </ShowPlanXML> Compile parameter values at bottom of sqlplan file

5a Single Table OR -- Single table SELECT * FROM LINEITEM WHERE L_ORDERKEY = 1 OR L_PARTKEY = 184826

5a Join 2 Tables, OR in SARG -- subsequent calls, procedure executes with original plan SELECT O_ORDERDATE, O_ORDERKEY, L_SHIPDATE, L_QUANTITY FROM LINEITEM INNER JOIN ORDERS ON O_ORDERKEY = L_ORDERKEY WHERE L_PARTKEY = 184826 OR O_CUSTKEY = 137099

5a UNION instead of OR SELECT O_ORDERDATE, O_ORDERKEY, L_SHIPDATE, L_QUANTITY, O_CUSTKEY, L_PARTKEY FROM LINEITEM INNER JOIN ORDERS ON O_ORDERKEY = L_ORDERKEY WHERE L_PARTKEY = 184826 UNION (ALL) SELECT O_ORDERDATE, O_ORDERKEY, L_SHIPDATE, L_QUANTITY, O_CUSTKEY, L_PARTKEY FROM LINEITEM INNER JOIN ORDERS ON O_ORDERKEY = L_ORDERKEY WHERE O_CUSTKEY = 137099 -- AND (L_PARTKEY <> 184826 OR L_PARTKEY IS NULL) -- Caution: select list should have keys to ensure correct rows UNION removes duplicates UNION ALL does not -- Hugo Kornelis trick --

5b AND/OR Combinations Hash Join is good method to process many rows Requirement is equality join condition SELECT xx FROM A WHERE col1 IN (expr1) AND col2 NOT IN (expr2) SELECT xx FROM A WHERE (expr1) AND (expr2 OR expr3) In complex SQL with AND/OR or IN NOT IN combinations Query optimizer may not be to determine that equality join condition exists Execution plan will use loop join, and attempt to force hash join will be rejected Re-write using UNION in place of OR And LEFT JOIN in place of NOT IN More on AND/OR combinations: http://www.qdpma.com/CBO/Relativity3.html

Complex Query with Sub-expression Query complexity really high compile cost Repeating sub-expressions (including CTE) Must be evaluated multiple times Main Problem - Row estimate error propagation Solution/Strategy Get a good execution plan Temp table when estimate is high, actual is low. When Estimate is low, and actual rows is high, need to balance temp table insert overhead versus plan benefit. Would a join hint work? More on AND/OR combinations: http://www.qdpma.com/CBO/Relativity4.html http://blogs.msdn.com/b/sqlcat/archive/2013/09/09/when-to-break-down-complex-queries.aspx

Temp Table and Table Variable Forget what other people have said Most is Cr@p Temp Tables subject to statistics auto/re-compile Table variable no statistics, assumes 1 row Question: In each specific case: does the statistics and recompile help or not? Yes: temp table No: table variable

Parallelism Designed for 1998 era Cost Threshold for Parallelism: default 5 Max Degree of Parallelism instance level OPTION (MAXDOP n) query level Today complex system 32 cores Plan cost 5 query might run in 10ms? Some queries at DOP 4 Others at DOP 16? Really need to rethink parallelism / NUMA strategies More on Parallelism: http://www.qdpma.com/CBO/ParallelismComments.html http://www.qdpma.com/CBO/ParallelismOnset.html

Full-Text Search Loop Join with FT as inner Source Full Text search Potentially executed many times

varchar(max) stored in lob pages Disk IO to lob pages is synchronous? Must access row to get 16 byte link? Feature request: index pointer to lob

Summary Hardware today is really powerful Storage may not be SAN vendor disconnect Standard performance practice Top resource consumers, index usage But also Look for serious blunders http://www.qdpma.com/CBO/SQLServerCostBasedOptimizer.html http://www.qdpma.com/CBO/Relativity.html http://blogs.msdn.com/b/sqlcat/archive/2013/09/09/when-to-break-down-complex-queries.aspx

Special Topics Data type mismatch Multiple Optional Search Arguments (SARG) Function on SARG Parameter Sniffing versus Variables Statistics related (big topic) AND/OR Complex Query with sub-expressions Parallel Execution

Modern Performance in SQL Server

Download Presentation

Presentation Transcript

Related

More Related Content