Managing Data Integrity with Constraints
Constraints play a crucial role in maintaining the integrity of your database by limiting and restricting the type of data that can be stored. They prevent bad data from entering the database, ensuring data quality and consistency. Neglecting constraints can lead to serious data issues and even catastrophic consequences as seen in historical events. It's essential to understand the importance of constraints and implement them effectively in database management.
Download Presentation
Please find below an Image/Link to download the presentation.
The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author. Download presentation by click this link. If you encounter any issues during the download, it is possible that the publisher has removed the file from their server.
E N D
Presentation Transcript
Tame Your Unruly Data With Constraints! Rob Volk
Speaker Bio / Contact SQL Server DBA/Developer since 1998 / 6.5 Started with dBase III & Clipper And of course MS Access Wireless telecom, utility billing, credit cards SQLTeam moderator & contributor Blog: http://weblogs.sqlteam.com/robv/ Twitter: @sql_r Email: sql_r@outlook.com I do deranged things in SQL Server, so you don t have to
What are Constraints? constraint |k n str nt| noun a limitation or restriction : the availability of water is the main constraint on food production | time constraints make it impossible to do everything. Constraints maintain the integrity of your database Domain what kind of data types/attributes Entity what kind of entities/objects/rows Relational control relations between entities Constraints don t allow good data, they prevent bad data But be mindful of what your constraints allow Let me tell you a story... (SalesOrderDetails)
Why Dont We Use Constraints? Developer ignorance Not necessarily stupid, just didn t know Cross-database integrity not supported Multi-platform support (or lack thereof) Data cleansing as a career (e.g. Data Quality Services) The app does all the checking, we don t need them Yeah, right. We ll get to that later
What arent Constraints? Identity columns Application Logic Triggers Security Development environment Execution restrictions Server Policies (PBM etc.) XML Schema Collections (these are a good thing though) Still important, just not managed using Constraints Constraints Still important, just not managed using
Examples February 31 Incomplete phone numbers Still invalid if padded or trimmed Duplicate rows* Sales to non-existent customer accounts Did they get the merchandise? Did they pay us? WHO ARE THESE PEOPLE?
But really, how bad could it be? Ariane Flight 501 (16 bit integer overflow) Mars Climate Orbiter (metric vs. US) USS Yorktown (1997, divide by zero) F-22 (crashes, Intl. Date Line) & F-35 (entire fleet grounded) Qantas Flight 72 (didn t crash, nose dived twice, injuries) Patriot Missile (clock drift, missed target, soldiers died) Therac-25 (deaths) Not data but process, lack of safeguards at every level More: http://goo.gl/4Akqm
Domain Constraints Domain fancy name for type or attribute (column in SQL) Data type - what kind of data you re storing (number, date, etc.) Nullability - whether it s required or not Default Check - what values are valid and invalid Demo (discovery script)
Domain Constraints - Types First constraint: Data type Choose the right type VARCHAR(max) is a bad sign Float or real -> Bad! Money -> Better (but still problematic) Dates (use proper date/time types) Good data type = storage and performance efficiency But don t be TOO efficient Ariane 501 NVARCHAR(max) is overkill for a phone # Ariane 501
Domain Constraints NOT NULL Limit or eliminate nullable columns Nulls are not comparable, even to themselves Not applicable vs. unknown Which one does Null represent? Nulls cannot be typed properly (or at all) Read Chris Date In SQL Server, nulls have storage overhead
Domain Constraints - Default Use this default when value is unknown or unsupplied N/A vs. Unknown can be addressed with defined values Can be useful in cascading foreign keys SET DEFAULT supported in SQL Server 2005+
Domain Constraints - CHECK Restrict values further than what data type allows Dates Numeric/Money Character Read Alex Kuznetsov
Entity Constraints Entity fancy name for row or record (tuple even fancier) Primary Key - value(s) that identify the entity Unique - value(s) that must be unique Table-level check - values that are valid or invalid One or more domains/attributes/columns Demo (discovery script)
Domain Constraints - External Very often ignored, not all values are valid (special meaning) Doesn t matter if external system can t provide invalid value Read Joe Celko Read Karen L pez (@DataChick) Data, Measurements and Standards in SQL
Domain Constraints - Internal Very often ignored (and rarely documented) $10,000 utility bills for single (unoccupied) apartment Percentage values (tax rate, discount) See AdventureWorks Sales tables (smallmoney) App could handle, but then must also fix if wrong Date range overlaps (Alex Kuznetsov)
Entity Constraints Primary Key Uniquely identifies the row/record/tuple/entity Must provide a value (NOT NULL by definition) Logical construct, NOT physical Surrogate vs. natural key Celko: if you don t have a key, you don t have a table Demo (discovery script)
Entity Constraints - Unique Candidate keys (natural keys) Phone number is unique, but not good candidate for PK Screen Actors Guild (billed/credited name) SSN, Credit card # (unique but can t be used for ID) Multiple columns are unique Constraint vs. index (logical vs. physical)
Entity Constraints - CHECK Multi-column conditions Pregnancy status (N/A for males or young girls) Unless you're British...http://goo.gl/msSeP ANSI/ISO supports subqueries Single table only e.g. Force date- or time-only values in datetime columns (< SQL 2008) Demo (discovery script)
Relational Constraints Foreign keys Requires data in one table to reference another table s data Referenced table (parent) Referencing table (child) Can be one OR MORE columns Referential actions on UPDATE/DELETE in referenced Demo (discovery script)
Demos AdventureWorks Good (Birth and Hire dates) Could be better (Salesperson commission pct.) SalesOrderDetail
Performance UNIQUE/PRIMARY keys can improve performance Demo Can also save space in index structures CHECK and Foreign Key Eliminate data access if WHERE clause violates CHECK constraint Must be trusted (next slides)
Trusted Constraints ALTER TABLE CHECK/NOCHECK CONSTRAINT CHECK WITH CHECK Performance improvement Demo Partitioned views
Untrusted Constraints Constraint could be enabled but data could be invalid Performance hit Demo How to detect: sys.check_constraints.is_not_trusted = 1 sys.foreign_keys.is_not_trusted = 1 Primary key and Unique are always trusted
Considerations Be wary of meets our current needs Be wary of will also meet all possible future needs Aaron Bertrand Bad Habits to Kick blog If there is a standard defined for your industry, use it (Celko ISO) Sanity check, make sure you covered everything If you must deviate from the standard, document why Multi-platform support Doomed Least-feature support = baseline Constraints (if any) go into app logic Use DBCC CHECKCONSTRAINTS! (thanks to Neil Hambly for reminding me!) PASS Data Architecture Virtual Chapter! June 21 2012: http://dataarch.sqlpass.org/
DOMAIN Support ANSI/ISO Combines data type, nullability, default and checks in one object Not available in SQL Server Please vote on Connect: http://goo.gl/HbRo2 PostgreSQL support! Example: CREATE DOMAIN us_postal_code AS varchar(10) NOT NULL DEFAULT('00000') CHECK( VALUE ~ '^\\d{5}$' OR VALUE ~ '^\\d{5}-\\d{4}$' ); Can be simulated with user-defined types and rules
Rules Deprecated vendor extension (Celko gripe) Since 7.0, still supported in SQL 2012 Not part of formal definition of object Must reference data value and have condition on it No anonymous rules like with constraints Only one rule can bind to an object But that rule can have multiple conditions And rule can bind to multiple objects Which is too bad, because
Unusual Constraint Use Security Prevent sa or sysadmin updates/inserts Execution Ad-hoc updates/inserts Cursors Connection settings like ANSI_PADDING Time-of-day Production vs. Dev/QA/UAT environments (thanks Argenis!) See blog post: http://goo.gl/p23NL Unusual data restrictions
Can you have too many? IMHO, no What about performance? Layers? App logic + stored procedure + triggers + constraints + rules Don t assume a single layer can do all the work What s the cost of fixing bad data? And will it actually get fixed? Importing/migrating bad data or design Cruft from old systems ETL transform bad data, but don t constrain it in destination Therac-25 Never time to do it right, Always time to do it over -Ron Soukup, SQL Server PM Ariane 501
But IF: Customers don t care Boss doesn t care Devs/$$$ Contractor say app will cover Make them personally fix any bad data Yes, that includes customers Have contractor refund fee AND fix bad data at no cost If they don t agree or can t be convinced: Add constraints anyway Unit testing An ounce of prevention is worth a metric ton of cure
Butbut Now I m just having fun Devs say Application does everything And DB constraints will make sure it works Then why use SQL Server? Write your own storage engine. Or use NoSQL. The data will be right eventually. In 5 years we ll rewrite your app in Ocaml, or Scala, or Blub Moral: you re using SQL Server anyway, use what it offers Ifapp actually works correctly, constraints won t interfere Even if you switch RDBMS, USE CONSTRAINTS! Credit: Andy Leonard & Buck Woody
And if you think Imstrict SQLite testing procedure: http://goo.gl/GfvIH 1177x as much test code as program code Space Shuttle Software Development: http://goo.gl/1k2kt 11 versions * 400K+ lines each = 17 errors total F-35 (JSF) Software coding standard: http://goo.gl/4rPxu 141 pages on C++ programming guidelines Which is really ironic Mars Spacecraft (Curiosity, Opportunity/Spirit, Phoenix) Extremely limited resources (20 MHz/10 MB RAM)
Conclusion Types of constraints Domain, Entity, Relational Why you need them Process, procedure and environmental Apps and triggers can t reliably protect or enforce New apps/processes must duplicate all logic Assume layers/components/processes will fail or change Protect data and DB integrity with constraints Constraints document integrity rules Use them! And be sure to check on them (DBCC)
References Books Online! MSDN Books Beginning SQL Server 2008 for Developers: From Novice to Professional by Robin Dewson (Apress) Pro SQL Server 2008 Relational Database Design and Implementation by Louis Davidson, with Kevin Kline, Scott Klein, Kurt Windisch(Apress) A Developer s Guide to Data Modeling for SQL Server, by Eric Johnson and Joshua Jones (Addison-Wesley) Defensive Database Programming by Alex Kuznetsov (Red Gate Books, PDF link below) SQL Antipatterns: Avoiding the Pitfalls of Database Programming by Bill Karwin (Pragmatic Bookshelf) SQL and Relational Theory: How to Write Accurate SQL Code by C. J. Date(O Reilly) Blogs/Sites http://weblogs.sqlteam.com/ http://sqlblog.com/ (lots of great people) http://sqlskills.com/ (Paul Randal, Kimberly Tripp, Jonathan Kehayias, Bob Beuchemin, Erin Stellato, Joe Sack, Glenn Berry) http://tsql.solidq.com/ (Itzik Ben-Gan) http://www.simple-talk.com/sql/database-administration/constraints-and-the-test-driven-database/ http://www.simple-talk.com/books/sql-books/defensive-database-programming/ http://sqlblog.com/blogs/alexander_kuznetsov/archive/2009/04/28/using-rowversion-to-enforce-business-rules.aspx http://sqlblog.com/blogs/alexander_kuznetsov/archive/2009/03/08/storing-intervals-of-time-with-no-overlaps.aspx http://sqlblog.com/blogs/hugo_kornelis/archive/2007/03/29/can-you-trust-your-constraints.aspx http://goo.gl/xCwdK (Datachix - Trusted Constraints) More Books Microsoft SQL Server 2008 Internals by Kalen Delaney, Paul S. Randal, Kimberly L. Tripp, Conor Cunningham, Adam Machanic (MSPress) SQL Server MVP Deep Dives by SQL Server MVPs for War Child International (Manning) SQL Server MVP Deep Dives 2 by SQL Server MVPs for Operation Smile (Manning) Microsoft SQL Server 2008: T-SQL Fundamentals by Itzik Ben-Gan (MSPress) SQL for Smarties by Joe Celko (Morgan Kauffman)
Coming up Speaker Title Room Leonard Lobel Geospatial Data Types in SQL Server 2012 Theatre Mark Stacey Analysis Services for the DBA Exhibition B Dennis Lloyd Jr Thwarting Database Defects Suite 3 Dmytro Andriychenko Kerberos for SQL Server and SharePoint the easy way Suite 1 Justin Langford Using Windows Azure to provide SQL Server Disaster Recovery Suite 2 Dave Ballantyne Estimation, Statistics and Oh My! Suite 4 #SQLBIT S
Thanks! SQLBits Sponsors, Organizers & Volunteers!