Modern Performance in SQL Server

Modern Performance - SQL Server
Joe Chang
& SolidQ
www.qdpma.com
About Joe
SQL Server consultant since 1999
Query Optimizer execution plan cost formulas (2002)
True cost structure of SQL plan operations (2003?)
Database with distribution statistics only, no data 2004
Decoding statblob/stats_stream
writing your own statistics
Disk IO cost structure
Tools for system monitoring, execution plan analysis
See ExecStats 
http://www.qdpma.com/ExecStats/SQLExecStats.html
Download: 
http://www.qdpma.com/ExecStatsZip.html
Blog: 
http://sqlblog.com/blogs/joe_chang/default.aspx
Overview
Why performance is still important today?
Brute force?
Yes, but …
Special Topics
Automating data collections
SQL Server Engine
What developers/DBA need to know?
CPU & Memory 2001 and 2012
2001 – 4 sockets, 4 cores
Pentium III Xeon, 900MHz
4-8GB memory?
Xeon MP 2002-4
2012 – 4 sockets, 8 cores each
4 x 8 = 32 cores total
768GB (48 x 16GB), 
Westmere-EX 1TB
15 cores in next generation?
FSB
P
L2
P
MCH
Each core today is more
than 10x over PIII
16GB  $191
32GB  $794
Storage 2001 versus 2012/13
P
C
I
e
 
x
8
P
C
I
e
 
x
8
P
C
I
e
 
x
8
P
C
I
e
 
x
8
P
C
I
e
 
x
8
P
C
I
e
 
x
4
I
B
QPI
QPI
1
9
2
 
G
B
1
9
2
 
G
B
2001
100 x 10K HDD
125 IOPS each = 12.5K IOPS
IO Bandwidth limited: 1.3GB/s
(1/3 memory bandwidth)
2013
64 SSDs, >10K+ IOPS each, 1M
IOPS possible
IO Bandwidth 10GB/s easy
SAN vendors –
questionable BW
P
C
I
P
C
I
P
C
I
P
C
I
MCH
R
A
I
D
R
A
I
D
R
A
I
D
R
A
I
D
http://www.qdpma.com/Storage/Storage2013.html
http://www.qdpma.com/ppt/Storage_2013.pptx
SAN
http://sqlblog.com/blogs/joe_chang/archive/2013/05/10/enterprise-storage-systems-emc-vmax.aspx
http://sqlblog.com/blogs/joe_chang/archive/2013/02/25/emc-vnx2-and-vnx-future.aspx
Performance Past, Present, Future
When will servers be so powerful that …
Been saying this for a long time
Today – 10 to 100X overkill
32-cores, 60-cores later in 2013?
Enough memory that IO is only sporadic
Unlimited IOPS with SSD
What can go wrong?
Today’s topic
Factors to Consider
SQL
Tables
Indexes
Query
Optimizer
Statistics
Compile
Parameters
Storage
Engine
Hardware
Special Topics
Data type mistmatch
Multiple Optional Search Arguments (SARG)
Function on SARG
Parameter Sniffing versus Variables
Statistics related (big topic)
first OR, then AND/OR combinations
Complex Query with sub-expressions
Parallel Execution
Not in order of priority
http://blogs.msdn.com/b/sqlcat/archive/2013/09/09/when-to-break-down-complex-queries.aspx
1a. Data type mismatch
DECLARE
 
@name
 
nvarchar
(25) = 
N'Customer#000002760'
SELECT
 * 
FROM
 CUSTOMER 
WHERE
 C_NAME = 
@name
SELECT
 * 
FROM
 CUSTOMER 
WHERE
 C_NAME = CONVERT(
varchar
, 
@name
)
.NET auto-parameter discovery?
Unable to use index seek
1b. Type Mismatch – Row Estimate
SELECT
 * 
FROM
 CUSTOMER 
WHERE
 C_NAME 
LIKE
 
N'Customer#00000276%'
SELECT
 * 
FROM
 CUSTOMER 
WHERE
 C_NAME 
LIKE
 
'Customer#00000276%'
Row estimate
error could have
severe
consequences in a
complex query
SELECT TOP plus Row Estimate Error
SELECT TOP 
1000
 
[Document]
.
[ArtifactID]
FROM
 
[Document]
 
(
NOLOCK
)
WHERE
 
[Document]
.
[AccessControlListID_D]
 
IN
 
(
1
,
1000064
,
1000269
)
AND EXISTS (
  
SELECT
 
[DocumentBatch]
.
[BatchArtifactID]
  
FROM
 
[DocumentBatch]
 
(
NOLOCK
)
  
INNER JOIN
 
[Batch]
 
(
NOLOCK
)
  
ON
 
[Batch]
.
ArtifactID
 
=
 
[DocumentBatch]
.
[BatchArtifactID]
  
WHERE
 
[DocumentBatch]
.
[DocumentArtifactID]
 
=
 
[Document]
.
[ArtifactID]
  
AND
 
[Batch]
.
[Name]
 
LIKE
 
N'%Value%'
)
ORDER BY
 
[Document]
.
[ArtifactID]
Data type mismatch – results in estimate rows high
Top clause – easy to find first 1000 rows
In fact, there are few rows that match SARG
Wrong plan for evaluating large number of rows
http://www.qdpma.com/CBO/Relativity.html
2. Multiple Optional SARG
DECLARE
 
@Orderkey
 int, 
@Partkey
 int = 1
SELECT
 * 
FROM
 LINEITEM 
WHERE
 (
@Orderkey
 IS NULL 
OR
 L_ORDERKEY = 
@Orderkey
) 
AND
 (
@Partkey
 IS NULL 
OR
 L_PARTKEY = 
@Partkey
)
AND (@Partkey IS NOT NULL OR @Orderkey IS NOT NULL)
Dynamically Built Parameterized SQL
DECLARE
 
@Orderkey
 
int
,
 
@Partkey
 
int
 
=
 1
,
 
@SQL
 
nvarchar
(
500
),
 
@Param
 
nvarchar
(
100
)
SELECT
 
@SQL
 
=
 
N‘/* Comment */
SELECT * FROM LINEITEM WHERE 1=1‘
,
 
@Param
 
=
 
N'@Orderkey int, @Partkey int'
IF 
(
@Orderkey
 
IS
 
NOT
 
NULL)
 
SELECT
 
@SQL
 
=
 
@SQL
 
+
 
N' AND L_ORDERKEY = @Orderkey'
IF 
(
@Partkey
 
IS
 
NOT
 
NULL)
 
SELECT
 
@SQL
 
=
 
@SQL
 
+
 
N' AND L_PARTKEY = @Partkey'
PRINT
 
@SQL
exec
 
sp_executesql
 
@SQL
,
 
@Param
,
 
@Orderkey
,
 
@Partkey
IF block is easier for few options
Dynamically built parameterized SQL better for many options
Considering /*comment*/ to help identify this
IF block
DECLARE
 
@Orderkey
 int, 
@Partkey
 int = 1
IF 
(
@Orderkey
 
IS
 
NOT
 
NULL)
  SELECT
 * 
FROM
 LINEITEM 
  WHERE
 (L_ORDERKEY = 
@Orderkey
) 
  AND
 (
@Partkey
 IS NULL 
OR
 L_PARTKEY = 
@Partkey
)
ELSE IF 
(
@Partkey
 
IS
 
NOT
 
NULL)
  SELECT
 * 
FROM
 LINEITEM 
  WHERE
 (L_PARTKEY = 
@Partkey
)
Need to consider impact of Parameter Sniffing,
Consider the OPTIMIZER FOR hint
This is actually the stored procedure
parameters
2b. Function on column SARG
SELECT
 
COUNT
(*), 
SUM
(L_EXTENDEDPRICE) 
FROM
 LINEITEM 
WHERE
 
YEAR
(L_SHIPDATE) = 1995 
AND
 
MONTH
(L_SHIPDATE) = 1
SELECT
 
COUNT
(*), 
SUM
(L_EXTENDEDPRICE) 
FROM
 LINEITEM 
WHERE
 L_SHIPDATE 
BETWEEN
 
'1995-01-01'
 
AND
 
'1995-01-31'
DECLARE
 
@Startdate
 date, 
@Days
 int = 1
SELECT
 
COUNT
(*), 
SUM
(L_EXTENDEDPRICE) 
FROM
 LINEITEM 
WHERE
 L_SHIPDATE 
BETWEEN
 
@Startdate 
 
AND
 
DATEADD
(dd,1,
@Startdate
)
Estimated versus Actual Plan - rows
Estimated Plan – 1 row???
Actual Plan – actual rows 77,356
3 Parameter Sniffing
-- first call, procedure compiles with these parameters
exec
 p_Report @startdate = 
'2011-01-01'
, @enddate = 
'2011-12-31'
-- subsequent calls, procedure executes with original plan
exec
 p_Report @startdate = 
'2012-01-01'
, @enddate = 
'2012-01-07'
Need different execution plans for narrow and wide range
Options: 
1) WITH RECOMPILE
2) main procedure calls 1 of 2 identical sub-procedures
One sub-procedure is only called for narrow range
Other called for wide range
Skewed data distributions also important
Example: Large & small customers
Assuming date data type
4 Statistics
Auto-recompute points
Sampling strategy
Percentage
Random pages versus random rows
Histogram Equal and Range Rows
Out of bounds, value does not exist
etc
Statistics Used by the Query Optimizer in SQL Server 2008
Writer: Eric N. Hanson and Yavor Angelov
Contributor: Lubor Kollar
http://msdn.microsoft.com/en-us/library/dd535534.aspx
Statistics Structure
Stored (mostly) in binary field
Scalar values
Density Vector – 
limit 30, half in NC, half Cluster key
Histogram
Up to 200 steps
Consider not blindly using IDENTITY on critical tables
Example: Large customers get low ID values
Small customers get high ID values
http://sqlblog.com/blogs/joe_chang/archive/2012/05/05/decoding-stats-stream.aspx
Statistics Auto/Re-Compute
Automatically generated on query compile
Recompute at 6 rows, 500, every 20%?
Has this changed?
Statistics Sampling
Sampling theory
True random sample
Sample error - square root     N
Relative error 1/   N
SQL Server sampling
All rows in random pages
Row Estimate Problems
Skewed data distribution
Out of bounds
Value does not exist
Loop Join - Table Scan on Inner Source
Estimated out from first 2 tabes (at right) is
zero or 1 rows. Most efficient join to third
table (without index on join column) is a loop
join with scan. If row count is 2 or more, then
a fullscan is performed for each row from
outer source
Default statistics rules may lead to serious ETL issues
Consider custom strategy
 
Compile Parameter Not Exists
Main procedure
 has cursor around view_Servers
First server in view_Servers is 
’CAESIUM’
Cursor executes 
sub-procedure
 for each Server
sql: 
 
SELECT
 
MAX
(ID) 
FROM
 TReplWS 
 
WHERE
 Hostname = @ServerName
But 
CAESIUM
 does not exist in TReplWS!
Good and Bad Plan?
SqlPlan Compile Parameters
SqlPlan Compile Parameters
<?
xml
 
version
=
"
1.0
"
 
encoding
=
"
utf-8
"
?>
<
ShowPlanXML
 
xmlns
=
"
http://schemas.microsoft.com/sqlserver/2004/07/showplan
"
 
Version
=
"
1.1
"
 
Build
=
"
10.50.2500.0
"
>
  <
BatchSequence
>
    <
Batch
>
      <
Statements
>
        <
StmtSimple
 
StatementText
=
"
@ServerName varchar(50) SELECT @maxid = ISNULL(MAX(id),0)
 
 FROM TReplWS WHERE Hostname = @ServerName
"
 
 
StatementId
=
"
1
"
 
StatementCompId
=
"
43
"
 
StatementType
=
"
SELECT
"
 
StatementSubTreeCost
=
"
0.0032843
"
 
StatementEstRows
=
"
1
"
 
StatementOptmLevel
=
"
FULL
"
 
QueryHash
=
"
0x671D2B3E17E538F1
"
 
QueryPlanHash
=
"
0xEB64FB22C47E1CF2
"
 
 
StatementOptmEarlyAbortReason
=
"
GoodEnoughPlanFound
"
>
          <
StatementSetOptions
 
QUOTED_IDENTIFIER
=
"
true
"
 
ARITHABORT
=
"
false
"
 
CONCAT_NULL_YIELDS_NULL
=
"
true
"
 
ANSI_NULLS
=
"
true
"
 
 
ANSI_PADDING
=
"
true
"
 
ANSI_WARNINGS
=
"
true
"
 
NUMERIC_ROUNDABORT
=
"
false
"
 />
          <
QueryPlan
 
CachedPlanSize
=
"
16
"
 
CompileTime
=
"
1
"
 
CompileCPU
=
"
1
"
 
CompileMemory
=
"
168
"
>
            <
RelOp
 
NodeId
=
"
0
"
 
PhysicalOp
=
"
Compute Scalar
"
 
LogicalOp
=
"
Compute Scalar
"
 
 
EstimateRows
=
"
1
"
 
EstimateIO
=
"
0
"
 
EstimateCPU
=
"
1e-007
 
AvgRowSize
=
"
15
"
 
EstimatedTotalSubtreeCost
=
"
0.0032843
"
 
Parallel
=
"
0
"
 
EstimateRebinds
=
"
0
"
 
EstimateRewinds
=
"
0
"
>
           </
RelOp
>
            <
ParameterList
>
              <
ColumnReference
 
Column
=
"
@ServerName
"
 
ParameterCompiledValue
=
"
'CAESIUM'
"
 />
            </
ParameterList
>
          </
QueryPlan
>
        </
StmtSimple
>
      </
Statements
>
    </
Batch
>
  </
BatchSequence
>
</
ShowPlanXML
>
Compile parameter values at bottom of sqlplan file
5a Single Table OR
-- Single table
SELECT
 
*
 
FROM
 
LINEITEM
 
WHERE
 
L_ORDERKEY
 
=
 1
 
OR
 
L_PARTKEY
 
=
 184826
5a Join 2 Tables, OR in SARG
-- subsequent calls, procedure executes with original plan
SELECT
 
O_ORDERDATE
,
 
O_ORDERKEY
,
 
L_SHIPDATE
,
 
L_QUANTITY
FROM
 
LINEITEM
 
INNER
 
JOIN
 
ORDERS
 
ON
 
O_ORDERKEY
 
=
 
L_ORDERKEY
WHERE
 
L_PARTKEY
 
=
 184826 
OR
 
O_CUSTKEY
 
=
 137099
5a UNION instead of OR
SELECT
 
O_ORDERDATE
,
 
O_ORDERKEY
,
 
L_SHIPDATE
,
 
L_QUANTITY
,
  
O_CUSTKEY
,
 
L_PARTKEY
 
FROM
 
LINEITEM
 
INNER
 
JOIN
 
ORDERS
 
ON
 
O_ORDERKEY
 
=
 
L_ORDERKEY 
WHERE
  
L_PARTKEY
 
=
 184826
UNION 
(ALL)
SELECT
 
O_ORDERDATE
,
 
O_ORDERKEY
,
 
L_SHIPDATE
,
 
L_QUANTITY
,
  
O_CUSTKEY
,
 
L_PARTKEY
 
FROM
 
LINEITEM
 
INNER
 
JOIN
 
ORDERS
 
ON
 
O_ORDERKEY
 
=
 
L_ORDERKEY 
WHERE
  
O_CUSTKEY
 
=
 137099 
-- 
AND (
L_PARTKEY
 
<>
 184826 
OR 
L_PARTKEY 
IS NULL) 
--
Caution:
 select list should
have keys to ensure
correct rows
UNION removes duplicates
UNION ALL does not
-- Hugo Kornelis trick --
5b AND/OR Combinations
Hash Join is good method to process many rows
Requirement is equality join condition
In complex SQL with AND/OR or IN NOT IN combinations
Query optimizer may not be to determine that equality join
condition exists
Execution plan will use loop join,
and attempt to force hash join will be rejected
Re-write using UNION in place of OR
And LEFT JOIN in place of NOT IN
SELECT xx FROM A WHERE col1 IN (expr1) AND col2 NOT IN (expr2)
SELECT xx FROM A WHERE (expr1) AND (expr2 OR expr3)
More on AND/OR combinations:
http://www.qdpma.com/CBO/Relativity3.html
Complex Query with Sub-expression
Query complexity – 
really high compile cost
Repeating sub-expressions (including CTE)
Must be evaluated multiple times
Main Problem - 
Row estimate error propagation
Solution/Strategy – 
Get a good execution plan
Temp table when estimate is high, actual is low.
More on AND/OR combinations: 
http://www.qdpma.com/CBO/Relativity4.html
http://blogs.msdn.com/b/sqlcat/archive/2013/09/09/when-to-break-down-complex-queries.aspx
When Estimate is low, and actual rows is high, need to balance temp
table insert overhead versus plan benefit. Would a join hint work?
Temp Table and Table Variable
Forget what other people have said
Most is Cr@p
Temp Tables – 
subject to statistics auto/re-compile
Table variable – 
no statistics, assumes 1 row
Question: 
In each specific case: does the statistics
and recompile help or not?
Yes: temp table
No: table variable
Parallelism
Designed for 1998 era
Cost Threshold for Parallelism: default 5
Max Degree of Parallelism – instance level
OPTION (MAXDOP n) – query level
Today – complex system – 32 cores
Plan cost 5 query might run in 10ms?
Some queries at DOP 4
Others at DOP 16?
More on Parallelism:
http://www.qdpma.com/CBO/ParallelismComments.html
http://www.qdpma.com/CBO/ParallelismOnset.html
Really need to rethink
parallelism / NUMA strategies
Full-Text Search
Loop Join with FT as inner 
Source Full Text search 
Potentially executed 
many times
varchar
(
max
) stored in lob pages
Disk IO to lob pages is synchronous?
Must access row to get 16 byte link?
Feature request: index pointer to lob
Summary
Hardware today is really powerful
Storage may not be – SAN vendor disconnect
Standard performance practice
Top resource consumers, index usage
But also Look for serious blunders
http://www.qdpma.com/CBO/SQLServerCostBasedOptimizer.html
http://www.qdpma.com/CBO/Relativity.html
http://blogs.msdn.com/b/sqlcat/archive/2013/09/09/when-to-break-down-complex-queries.aspx
Special Topics
Data type mismatch
Multiple Optional Search Arguments (SARG)
Function on SARG
Parameter Sniffing versus Variables
Statistics related (big topic)
AND/OR
Complex Query with sub-expressions
Parallel Execution
Slide Note
Embed
Share

Explore the evolution of performance in SQL Server with insights from Joe Chang, a seasoned SQL Server consultant. Delve into topics like query optimization, execution plans, CPU and memory advancements, storage technologies, and the significance of performance in today's data environment. Discover how automation and data collection play a crucial role in enhancing SQL Server performance.

  • SQL Server
  • Performance
  • Joe Chang
  • Query Optimization
  • Storage Technologies

Uploaded on Sep 16, 2024 | 0 Views


Download Presentation

Please find below an Image/Link to download the presentation.

The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author. Download presentation by click this link. If you encounter any issues during the download, it is possible that the publisher has removed the file from their server.

E N D

Presentation Transcript


  1. Modern Performance - SQL Server Joe Chang www.qdpma.com & SolidQ

  2. About Joe SQL Server consultant since 1999 Query Optimizer execution plan cost formulas (2002) True cost structure of SQL plan operations (2003?) Database with distribution statistics only, no data 2004 Decoding statblob/stats_stream writing your own statistics Disk IO cost structure Tools for system monitoring, execution plan analysis See ExecStats http://www.qdpma.com/ExecStats/SQLExecStats.html Download: http://www.qdpma.com/ExecStatsZip.html Blog: http://sqlblog.com/blogs/joe_chang/default.aspx

  3. Overview Why performance is still important today? Brute force? Yes, but Special Topics Automating data collections SQL Server Engine What developers/DBA need to know?

  4. CPU & Memory 2001 and 2012 PCI-E PCI-E PCI-E PCI-E PCI-E PCI-E PCI-E PCI-E PCI-E PCI-E PCI-E PCI-E P L2 P P P QPI PCI-E C4 QPI PCI-E C4 C3 C3 QPI C2 C5 C2 C5 FSB LLC LLC C1 C0 C6 C7 C1 C0 C6 C7 MCH MI MI MI MI QPI QPI 2001 4 sockets, 4 cores Pentium III Xeon, 900MHz 4-8GB memory? QPI PCI-E C4 QPI PCI-E C4 C3 C3 C2 C5 C2 C5 QPI LLC LLC C1 C0 C6 C7 C1 C0 C6 C7 MI MI MI MI Xeon MP 2002-4 DMI 2 PCI-E PCI-E PCI-E PCI-E PCI-E PCI-E PCI-E PCI-E PCI-E PCI-E PCI-E 2012 4 sockets, 8 cores each 4 x 8 = 32 cores total 768GB (48 x 16GB), Westmere-EX 1TB 15 cores in next generation? Each core today is more than 10x over PIII 16GB $191 32GB $794

  5. Storage 2001 versus 2012/13 QPI 192 GB 192 GB QPI MCH PCIe x4 PCIe x8 PCIe x8 PCIe x8 PCIe x8 PCIe x8 PCI PCI PCI PCI RAID RAID RAID RAID 10GbE RAID RAID RAID RAID IB HDD HDD HDD HDD SSD SSD SSD SSD HDD HDD HDD HDD HDD HDD HDD HDD 2001 100 x 10K HDD 125 IOPS each = 12.5K IOPS IO Bandwidth limited: 1.3GB/s (1/3 memory bandwidth) 2013 64 SSDs, >10K+ IOPS each, 1M IOPS possible IO Bandwidth 10GB/s easy SAN vendors questionable BW http://www.qdpma.com/Storage/Storage2013.html http://www.qdpma.com/ppt/Storage_2013.pptx

  6. SAN Node 1 Node 2 Node 1 Node 768 GB 768 GB 2 768 GB 768 GB 8 Gbps FC or 10Gbps FCOE x8 x8 x8 x8 x8 x8 SSD SSD Switch Switch 8 Gb FC Switch Switch SP A SP B 24 GB 24 GB SP A SP B x4 SAS 2GB/s 24 GB 24 GB x4 SAS 2GB/s Data 1 Data 2 Data 3 Data 4 Data 5 Data 6 Data 7 Data 8 Main Volume Data 9 Data 10 Data 11 Data 12 Data 13 Data 14 Data 15 Data 16 Log volume SSD 1 SSD 2 SSD 3 SSD 4 Hot Spares SSD 10K 7.2K Log 1 Log 2 Log 3 Log 4 http://sqlblog.com/blogs/joe_chang/archive/2013/05/10/enterprise-storage-systems-emc-vmax.aspx http://sqlblog.com/blogs/joe_chang/archive/2013/02/25/emc-vnx2-and-vnx-future.aspx

  7. Performance Past, Present, Future When will servers be so powerful that Been saying this for a long time Today 10 to 100X overkill 32-cores, 60-cores later in 2013? Enough memory that IO is only sporadic Unlimited IOPS with SSD What can go wrong? Today s topic

  8. Factors to Consider SQL Tables Indexes Compile Parameters Statistics Query Optimizer Storage Engine Hardware

  9. Special Topics Data type mistmatch Multiple Optional Search Arguments (SARG) Function on SARG Parameter Sniffing versus Variables Statistics related (big topic) first OR, then AND/OR combinations Complex Query with sub-expressions Parallel Execution Not in order of priority http://blogs.msdn.com/b/sqlcat/archive/2013/09/09/when-to-break-down-complex-queries.aspx

  10. 1a. Data type mismatch DECLARE @name nvarchar(25) = N'Customer#000002760' SELECT * FROM CUSTOMER WHERE C_NAME = @name SELECT * FROM CUSTOMER WHERE C_NAME = CONVERT(varchar, @name) .NET auto-parameter discovery? Unable to use index seek

  11. 1b. Type Mismatch Row Estimate SELECT * FROM CUSTOMER WHERE C_NAME LIKE N'Customer#00000276%' SELECT * FROM CUSTOMER WHERE C_NAME LIKE 'Customer#00000276%' Row estimate error could have severe consequences in a complex query

  12. SELECT TOP plus Row Estimate Error SELECT TOP 1000 [Document].[ArtifactID] FROM [Document] (NOLOCK) WHERE [Document].[AccessControlListID_D] IN (1,1000064,1000269) AND EXISTS ( SELECT [DocumentBatch].[BatchArtifactID] FROM [DocumentBatch] (NOLOCK) INNER JOIN [Batch] (NOLOCK) ON [Batch].ArtifactID = [DocumentBatch].[BatchArtifactID] WHERE [DocumentBatch].[DocumentArtifactID] = [Document].[ArtifactID] AND [Batch].[Name] LIKE N'%Value%' ) ORDER BY [Document].[ArtifactID] Data type mismatch results in estimate rows high Top clause easy to find first 1000 rows In fact, there are few rows that match SARG Wrong plan for evaluating large number of rows http://www.qdpma.com/CBO/Relativity.html

  13. 2. Multiple Optional SARG DECLARE @Orderkey int, @Partkey int = 1 SELECT * FROM LINEITEM WHERE (@Orderkey IS NULL OR L_ORDERKEY = @Orderkey) AND (@Partkey IS NULL OR L_PARTKEY = @Partkey) AND (@Partkey IS NOT NULL OR @Orderkey IS NOT NULL)

  14. Dynamically Built Parameterized SQL DECLARE @Orderkey int, @Partkey int = 1 , @SQL nvarchar(500), @Param nvarchar(100) SELECT @SQL = N /* Comment */ SELECT * FROM LINEITEM WHERE 1=1 , @Param = N'@Orderkey int, @Partkey int' IF (@Orderkey IS NOT NULL) SELECT @SQL = @SQL + N' AND L_ORDERKEY = @Orderkey' IF (@Partkey IS NOT NULL) SELECT @SQL = @SQL + N' AND L_PARTKEY = @Partkey' PRINT @SQL exec sp_executesql @SQL, @Param, @Orderkey, @Partkey IF block is easier for few options Dynamically built parameterized SQL better for many options Considering /*comment*/ to help identify this

  15. IF block DECLARE @Orderkey int, @Partkey int = 1 This is actually the stored procedure parameters IF (@Orderkey IS NOT NULL) SELECT * FROM LINEITEM WHERE (L_ORDERKEY = @Orderkey) AND (@Partkey IS NULL OR L_PARTKEY = @Partkey) ELSE IF (@Partkey IS NOT NULL) SELECT * FROM LINEITEM WHERE (L_PARTKEY = @Partkey) Need to consider impact of Parameter Sniffing, Consider the OPTIMIZER FOR hint

  16. 2b. Function on column SARG SELECT COUNT(*), SUM(L_EXTENDEDPRICE) FROM LINEITEM WHERE YEAR(L_SHIPDATE) = 1995 AND MONTH(L_SHIPDATE) = 1 SELECT COUNT(*), SUM(L_EXTENDEDPRICE) FROM LINEITEM WHERE L_SHIPDATE BETWEEN '1995-01-01' AND '1995-01-31' DECLARE @Startdate date, @Days int = 1 SELECT COUNT(*), SUM(L_EXTENDEDPRICE) FROM LINEITEM WHERE L_SHIPDATE BETWEEN @Startdate AND DATEADD(dd,1,@Startdate)

  17. Estimated versus Actual Plan - rows Estimated Plan 1 row??? Actual Plan actual rows 77,356

  18. 3 Parameter Sniffing -- first call, procedure compiles with these parameters exec p_Report @startdate = '2011-01-01', @enddate = '2011-12-31' -- subsequent calls, procedure executes with original plan exec p_Report @startdate = '2012-01-01', @enddate = '2012-01-07' Assuming date data type Need different execution plans for narrow and wide range Options: 1) WITH RECOMPILE 2) main procedure calls 1 of 2 identical sub-procedures One sub-procedure is only called for narrow range Other called for wide range Skewed data distributions also important Example: Large & small customers

  19. 4 Statistics Auto-recompute points Sampling strategy Percentage Random pages versus random rows Histogram Equal and Range Rows Out of bounds, value does not exist etc Statistics Used by the Query Optimizer in SQL Server 2008 Writer: Eric N. Hanson and Yavor Angelov Contributor: Lubor Kollar http://msdn.microsoft.com/en-us/library/dd535534.aspx

  20. Statistics Structure Stored (mostly) in binary field Scalar values Density Vector limit 30, half in NC, half Cluster key Histogram Up to 200 steps Consider not blindly using IDENTITY on critical tables Example: Large customers get low ID values Small customers get high ID values http://sqlblog.com/blogs/joe_chang/archive/2012/05/05/decoding-stats-stream.aspx

  21. Statistics Auto/Re-Compute Automatically generated on query compile Recompute at 6 rows, 500, every 20%? Has this changed?

  22. Statistics Sampling Sampling theory True random sample Sample error - square root N Relative error 1/ N SQL Server sampling All rows in random pages

  23. Row Estimate Problems Skewed data distribution Out of bounds Value does not exist

  24. Loop Join - Table Scan on Inner Source Estimated out from first 2 tabes (at right) is zero or 1 rows. Most efficient join to third table (without index on join column) is a loop join with scan. If row count is 2 or more, then a fullscan is performed for each row from outer source Default statistics rules may lead to serious ETL issues Consider custom strategy

  25. Compile Parameter Not Exists Main procedure has cursor around view_Servers First server in view_Servers is CAESIUM Cursor executes sub-procedure for each Server sql: SELECT MAX(ID) FROM TReplWS WHERE Hostname = @ServerName But CAESIUM does not exist in TReplWS!

  26. Good and Bad Plan?

  27. SqlPlan Compile Parameters

  28. SqlPlan Compile Parameters <?xml version="1.0" encoding="utf-8"?> <ShowPlanXML xmlns="http://schemas.microsoft.com/sqlserver/2004/07/showplan" Version="1.1" Build="10.50.2500.0"> <BatchSequence> <Batch> <Statements> <StmtSimple StatementText="@ServerName varchar(50) SELECT @maxid = ISNULL(MAX(id),0) FROM TReplWS WHERE Hostname = @ServerName" StatementId="1" StatementCompId="43" StatementType="SELECT" StatementSubTreeCost="0.0032843" StatementEstRows="1" StatementOptmLevel="FULL" QueryHash="0x671D2B3E17E538F1" QueryPlanHash="0xEB64FB22C47E1CF2" StatementOptmEarlyAbortReason="GoodEnoughPlanFound"> <StatementSetOptions QUOTED_IDENTIFIER="true" ARITHABORT="false" CONCAT_NULL_YIELDS_NULL="true" ANSI_NULLS="true" ANSI_PADDING="true" ANSI_WARNINGS="true" NUMERIC_ROUNDABORT="false" /> <QueryPlan CachedPlanSize="16" CompileTime="1" CompileCPU="1" CompileMemory="168"> <RelOp NodeId="0" PhysicalOp="Compute Scalar" LogicalOp="Compute Scalar" EstimateRows="1" EstimateIO="0" EstimateCPU="1e-007 AvgRowSize="15" EstimatedTotalSubtreeCost="0.0032843" Parallel="0" EstimateRebinds="0" EstimateRewinds="0"> </RelOp> <ParameterList> <ColumnReference Column="@ServerName" ParameterCompiledValue="'CAESIUM'" /> </ParameterList> </QueryPlan> </StmtSimple> </Statements> </Batch> </BatchSequence> </ShowPlanXML> Compile parameter values at bottom of sqlplan file

  29. 5a Single Table OR -- Single table SELECT * FROM LINEITEM WHERE L_ORDERKEY = 1 OR L_PARTKEY = 184826

  30. 5a Join 2 Tables, OR in SARG -- subsequent calls, procedure executes with original plan SELECT O_ORDERDATE, O_ORDERKEY, L_SHIPDATE, L_QUANTITY FROM LINEITEM INNER JOIN ORDERS ON O_ORDERKEY = L_ORDERKEY WHERE L_PARTKEY = 184826 OR O_CUSTKEY = 137099

  31. 5a UNION instead of OR SELECT O_ORDERDATE, O_ORDERKEY, L_SHIPDATE, L_QUANTITY, O_CUSTKEY, L_PARTKEY FROM LINEITEM INNER JOIN ORDERS ON O_ORDERKEY = L_ORDERKEY WHERE L_PARTKEY = 184826 UNION (ALL) SELECT O_ORDERDATE, O_ORDERKEY, L_SHIPDATE, L_QUANTITY, O_CUSTKEY, L_PARTKEY FROM LINEITEM INNER JOIN ORDERS ON O_ORDERKEY = L_ORDERKEY WHERE O_CUSTKEY = 137099 -- AND (L_PARTKEY <> 184826 OR L_PARTKEY IS NULL) -- Caution: select list should have keys to ensure correct rows UNION removes duplicates UNION ALL does not -- Hugo Kornelis trick --

  32. 5b AND/OR Combinations Hash Join is good method to process many rows Requirement is equality join condition SELECT xx FROM A WHERE col1 IN (expr1) AND col2 NOT IN (expr2) SELECT xx FROM A WHERE (expr1) AND (expr2 OR expr3) In complex SQL with AND/OR or IN NOT IN combinations Query optimizer may not be to determine that equality join condition exists Execution plan will use loop join, and attempt to force hash join will be rejected Re-write using UNION in place of OR And LEFT JOIN in place of NOT IN More on AND/OR combinations: http://www.qdpma.com/CBO/Relativity3.html

  33. Complex Query with Sub-expression Query complexity really high compile cost Repeating sub-expressions (including CTE) Must be evaluated multiple times Main Problem - Row estimate error propagation Solution/Strategy Get a good execution plan Temp table when estimate is high, actual is low. When Estimate is low, and actual rows is high, need to balance temp table insert overhead versus plan benefit. Would a join hint work? More on AND/OR combinations: http://www.qdpma.com/CBO/Relativity4.html http://blogs.msdn.com/b/sqlcat/archive/2013/09/09/when-to-break-down-complex-queries.aspx

  34. Temp Table and Table Variable Forget what other people have said Most is Cr@p Temp Tables subject to statistics auto/re-compile Table variable no statistics, assumes 1 row Question: In each specific case: does the statistics and recompile help or not? Yes: temp table No: table variable

  35. Parallelism Designed for 1998 era Cost Threshold for Parallelism: default 5 Max Degree of Parallelism instance level OPTION (MAXDOP n) query level Today complex system 32 cores Plan cost 5 query might run in 10ms? Some queries at DOP 4 Others at DOP 16? Really need to rethink parallelism / NUMA strategies More on Parallelism: http://www.qdpma.com/CBO/ParallelismComments.html http://www.qdpma.com/CBO/ParallelismOnset.html

  36. Full-Text Search Loop Join with FT as inner Source Full Text search Potentially executed many times

  37. varchar(max) stored in lob pages Disk IO to lob pages is synchronous? Must access row to get 16 byte link? Feature request: index pointer to lob

  38. Summary Hardware today is really powerful Storage may not be SAN vendor disconnect Standard performance practice Top resource consumers, index usage But also Look for serious blunders http://www.qdpma.com/CBO/SQLServerCostBasedOptimizer.html http://www.qdpma.com/CBO/Relativity.html http://blogs.msdn.com/b/sqlcat/archive/2013/09/09/when-to-break-down-complex-queries.aspx

  39. Special Topics Data type mismatch Multiple Optional Search Arguments (SARG) Function on SARG Parameter Sniffing versus Variables Statistics related (big topic) AND/OR Complex Query with sub-expressions Parallel Execution

More Related Content

giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#giItT1WQy@!-/#