ETL Process Management with T-SQL by Richard Swinbank

etl process management with tsql n.w
1 / 58
Embed
Share

Explore ETL process management with T-SQL by Richard Swinbank, covering the execution of ETL processes using SSIS packages, T-SQL stored procedures, and more. Learn about desirable ETL behaviors, process dependencies, and different approaches for managing process execution efficiently.

  • ETL
  • T-SQL
  • Richard Swinbank
  • Process Management
  • SSIS

Uploaded on | 0 Views


Download Presentation

Please find below an Image/Link to download the presentation.

The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author. If you encounter any issues during the download, it is possible that the publisher has removed the file from their server.

You are allowed to download the files provided on this website for personal or commercial use, subject to the condition that they are used lawfully. All files are the property of their respective owners.

The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author.

E N D

Presentation Transcript


  1. ETL process management with TSQL Richard Swinbank

  2. ETL process management ETL performed by a collection of processes SSIS packages TSQL stored procedures Other bits of sticky tape and string Lots of them! Process execution has to be managed What runs when In what order What happens when things go wrong

  3. Five desirable ETL behaviours Parallel processing Fast to finish Convenient way to locate faults Fast to fix Easy to resume after error Fast to restart with as little as possible left to do Fast to finish after restart Easy to add new processes We ll come back to this

  4. A very small example A B C E F H D G I J Ten processes We ll be using stored procedures for now Process dependencies Let s look at some possible approaches

  5. Approach #1: Stepwise SQL Agent job Call each SP in a separate job step FYI, demo.usp_ProcessE is broken

  6. Agent job: Step-by-step A B C E F H D G I J

  7. Agent job: Step-by-step A B C E F H D G I J

  8. Agent job: Step #1 A B C E F H D G I J

  9. Agent job: Step #2 A B C E F H D G I J

  10. Agent job: Step #3 A B C E F H D G I J

  11. Agent job: Step #4 A B C E F H D G I J

  12. Agent job: Step #5 A B C E F H D G I J

  13. Stepwise SQL Agent job: Results

  14. Stepwise SQL Agent job: Evaluation Parallel processing Convenient way to locate faults

  15. Stepwise SQL Agent job: Evaluation Parallel processing Convenient way to locate faults Easy to resume after error

  16. Stepwise SQL Agent job: Evaluation Parallel processing Convenient way to locate faults Easy to resume after error with as little as possible left to do

  17. Approach #2: Master SSIS package Call each SP from an Execute SQL Task Deploy package to SSIS catalog; run in agent job

  18. Master SSIS package: Results

  19. Master SSIS package: Results

  20. Master SSIS package: Evaluation Parallel processing

  21. Master SSIS package: Evaluation Parallel processing Convenient(ish) way to locate faults

  22. Master SSIS package: Evaluation Parallel processing Convenient(ish) way to locate faults Easy to resume after error

  23. Master SSIS package: Evaluation Parallel processing Convenient(ish) way to locate faults Easy to resume after error with as little as possible left to do

  24. Recap We ve identified some desirable behaviours Parallel processing Convenient way to locate faults Easy to resume after error with as little as possible left to do (Easy to add new processes we ll come back to this) We ve looked at two process management approaches Stepwise SQL Agent Job Master SSIS package Each has some of the behaviours we want but neither has all of them

  25. Dependency-driven process management in TSQL 1. Table of processes Process demo.usp_ProcessA demo.usp_ProcessB demo.usp_ProcessC demo.usp_ProcessD demo.usp_ProcessE demo.usp_ProcessF demo.usp_ProcessG demo.usp_ProcessH demo.usp_ProcessI demo.usp_ProcessJ Status Ready Ready Ready Not ready Ready Not ready Not ready Not ready Not ready Not ready 2. Dependency information Process demo.usp_ProcessD demo.usp_ProcessF demo.usp_ProcessG demo.usp_ProcessG demo.usp_ProcessH demo.usp_ProcessI demo.usp_ProcessI demo.usp_ProcessJ RunsAfter demo.usp_ProcessA demo.usp_ProcessB demo.usp_ProcessB demo.usp_ProcessC demo.usp_ProcessC demo.usp_ProcessE demo.usp_ProcessF demo.usp_ProcessG 3. Process handler SP

  26. Process handler WHILE (anything s ready) BEGIN Pseudo-TSQL SELECT ready process EXECUTE selected process UPDATE ProcessList SET process status = Done , process s dependants = Ready END

  27. Process handler A B C WHILE (anything s ready) BEGIN SELECT ready process EXECUTE selected process UPDATE ProcessList SET process status = Done , process s dependants = Ready E F H D G I J END Process demo.usp_ProcessA demo.usp_ProcessB demo.usp_ProcessC demo.usp_ProcessD demo.usp_ProcessE demo.usp_ProcessF demo.usp_ProcessG demo.usp_ProcessH demo.usp_ProcessI demo.usp_ProcessJ Status Ready Ready Ready Not ready Ready Not ready Not ready Not ready Not ready Not ready

  28. Process handler A B C WHILE (anything s ready) BEGIN SELECT ready process EXECUTE selected process UPDATE ProcessList SET process status = Done , process s dependants = Ready E F H D G I J END Process demo.usp_ProcessA demo.usp_ProcessB demo.usp_ProcessC demo.usp_ProcessD demo.usp_ProcessE demo.usp_ProcessF demo.usp_ProcessG demo.usp_ProcessH demo.usp_ProcessI demo.usp_ProcessJ Status Done Ready Ready Ready Ready Not ready Not ready Not ready Not ready Not ready

  29. Process handler A B C WHILE (anything s ready) BEGIN SELECT ready process EXECUTE selected process UPDATE ProcessList SET process status = Done , process s dependants = Ready E F H D G I J END Process demo.usp_ProcessA demo.usp_ProcessB demo.usp_ProcessC demo.usp_ProcessD demo.usp_ProcessE demo.usp_ProcessF demo.usp_ProcessG demo.usp_ProcessH demo.usp_ProcessI demo.usp_ProcessJ Status Done Ready Done Ready Ready Not ready Not ready Ready Not ready Not ready

  30. Process handler A B C WHILE (anything s ready) BEGIN SELECT ready process EXECUTE selected process UPDATE ProcessList SET process status = Done , process s dependants = Ready E F H D G I J END Process demo.usp_ProcessA demo.usp_ProcessB demo.usp_ProcessC demo.usp_ProcessD demo.usp_ProcessE demo.usp_ProcessF demo.usp_ProcessG demo.usp_ProcessH demo.usp_ProcessI demo.usp_ProcessJ Status Done Ready Done Ready Ready Not ready Not ready Ready Not ready Not ready

  31. Better process handler WHILE (anything s ready) BEGIN BEGIN TRY SELECT ready process EXECUTE selected process UPDATE ProcessList SET process status = Done , process s dependants = Ready END TRY BEGIN CATCH UPDATE ProcessList SET process status = Errored END CATCH END

  32. Better process handler A B C WHILE (anything s ready) BEGIN BEGIN TRY E F H D G SELECT ready process EXECUTE selected process UPDATE ProcessList SET process status = Done , process s dependants = Ready I J Process demo.usp_ProcessA demo.usp_ProcessB demo.usp_ProcessC demo.usp_ProcessD demo.usp_ProcessE demo.usp_ProcessF demo.usp_ProcessG demo.usp_ProcessH demo.usp_ProcessI demo.usp_ProcessJ Status Done Ready Done Ready Ready Not ready Not ready Ready Not ready Not ready END TRY BEGIN CATCH UPDATE ProcessList SET process status = Errored END CATCH END

  33. Better process handler A B C WHILE (anything s ready) BEGIN BEGIN TRY E F H D G SELECT ready process EXECUTE selected process UPDATE ProcessList SET process status = Done , process s dependants = Ready I J Process demo.usp_ProcessA demo.usp_ProcessB demo.usp_ProcessC demo.usp_ProcessD demo.usp_ProcessE demo.usp_ProcessF demo.usp_ProcessG demo.usp_ProcessH demo.usp_ProcessI demo.usp_ProcessJ Status Done Ready Done Ready Errored Not ready Not ready Ready Not ready Not ready END TRY BEGIN CATCH UPDATE ProcessList SET process status = Errored END CATCH END

  34. Better process handler A B C WHILE (anything s ready) BEGIN BEGIN TRY E F H D G SELECT ready process EXECUTE selected process UPDATE ProcessList SET process status = Done , process s dependants = Ready I J Process demo.usp_ProcessA demo.usp_ProcessB demo.usp_ProcessC demo.usp_ProcessD demo.usp_ProcessE demo.usp_ProcessF demo.usp_ProcessG demo.usp_ProcessH demo.usp_ProcessI demo.usp_ProcessJ Status Done Done Done Ready Errored Ready Ready Ready Not ready Not ready END TRY BEGIN CATCH UPDATE ProcessList SET process status = Errored END CATCH END

  35. Better process handler A B C WHILE (anything s ready) BEGIN BEGIN TRY E F H D G SELECT ready process EXECUTE selected process UPDATE ProcessList SET process status = Done , process s dependants = Ready I J Process demo.usp_ProcessA demo.usp_ProcessB demo.usp_ProcessC demo.usp_ProcessD demo.usp_ProcessE demo.usp_ProcessF demo.usp_ProcessG demo.usp_ProcessH demo.usp_ProcessI demo.usp_ProcessJ Status Done Done Done Done Errored Ready Ready Ready Not ready Not ready END TRY BEGIN CATCH UPDATE ProcessList SET process status = Errored END CATCH END

  36. Better process handler A B C WHILE (anything s ready) BEGIN BEGIN TRY E F H D G SELECT ready process EXECUTE selected process UPDATE ProcessList SET process status = Done , process s dependants = Ready I J Process demo.usp_ProcessA demo.usp_ProcessB demo.usp_ProcessC demo.usp_ProcessD demo.usp_ProcessE demo.usp_ProcessF demo.usp_ProcessG demo.usp_ProcessH demo.usp_ProcessI demo.usp_ProcessJ Status Done Done Done Done Errored Ready Done Ready Not ready Ready END TRY BEGIN CATCH UPDATE ProcessList SET process status = Errored END CATCH END

  37. Better process handler A B C WHILE (anything s ready) BEGIN BEGIN TRY E F H D G SELECT ready process EXECUTE selected process UPDATE ProcessList SET process status = Done , process s dependants = Ready I J Process demo.usp_ProcessA demo.usp_ProcessB demo.usp_ProcessC demo.usp_ProcessD demo.usp_ProcessE demo.usp_ProcessF demo.usp_ProcessG demo.usp_ProcessH demo.usp_ProcessI demo.usp_ProcessJ Status Done Done Done Done Errored Done Done Ready Not ready Ready END TRY BEGIN CATCH UPDATE ProcessList SET process status = Errored END CATCH END

  38. Better process handler A B C WHILE (anything s ready) BEGIN BEGIN TRY E F H D G SELECT ready process EXECUTE selected process UPDATE ProcessList SET process status = Done , process s dependants = Ready I J Process demo.usp_ProcessA demo.usp_ProcessB demo.usp_ProcessC demo.usp_ProcessD demo.usp_ProcessE demo.usp_ProcessF demo.usp_ProcessG demo.usp_ProcessH demo.usp_ProcessI demo.usp_ProcessJ Status Done Done Done Done Errored Done Done Ready Not ready Done END TRY BEGIN CATCH UPDATE ProcessList SET process status = Errored END CATCH END

  39. Better process handler A B C WHILE (anything s ready) BEGIN BEGIN TRY E F H D G SELECT ready process EXECUTE selected process UPDATE ProcessList SET process status = Done , process s dependants = Ready I J Process demo.usp_ProcessA demo.usp_ProcessB demo.usp_ProcessC demo.usp_ProcessD demo.usp_ProcessE demo.usp_ProcessF demo.usp_ProcessG demo.usp_ProcessH demo.usp_ProcessI demo.usp_ProcessJ Status Done Done Done Done Errored Done Done Done Not ready Done END TRY BEGIN CATCH UPDATE ProcessList SET process status = Errored END CATCH END

  40. Better process handler: Evaluation Parallel processing Convenient way to locate faults Easy to resume after error with as little as possible left to do WHILE (anything s ready) BEGIN BEGIN TRY SELECT ready process EXECUTE selected process UPDATE ProcessList SET process status = Done , process s dependants = Ready Process demo.usp_ProcessA demo.usp_ProcessB demo.usp_ProcessC demo.usp_ProcessD demo.usp_ProcessE demo.usp_ProcessF demo.usp_ProcessG demo.usp_ProcessH demo.usp_ProcessI demo.usp_ProcessJ Status Done Done Done Done Errored Done Done Done Not ready Done END TRY BEGIN CATCH UPDATE ProcessList SET process status = Errored END CATCH END

  41. Parallel processing Run multiple handlers at the same time Must prevent different handlers from running the same process Make handler reserve a process before executing it (and set status of processes in execution to Running ) Reserve by inserting details into reservations table Process demo.usp_ProcessA demo.usp_ProcessC demo.usp_ProcessE demo.usp_ProcessB demo.usp_ProcessD Primary key Catch PK violation, continue

  42. Parallelisable process handler WHILE (anything s ready) BEGIN BEGIN TRY -- for process execution SELECT ready process BEGIN TRY -- for process reservation INSERT ready process details INTO ProcessReservations END TRY BEGIN CATCH CONTINUE END CATCH UPDATE ProcessList SET process status = Running EXECUTE selected process UPDATE ProcessList SET process status = Done [...]

  43. What about SSIS packages? Can t EXECUTE selected package Execute package using SSIS catalog SPs SSISDB.catalog.create_execution SSISDB.catalog.start_execution Need EXECUTE-like behaviour Return only when package execution has finished Raise error if something goes wrong Wrap up in package runner SP Handler executes process or package runner IF process is SP EXECUTE process ELSE IF process is SSIS package EXECUTE usp_RunPackage @process

  44. Demo Sprockit my implementation of this approach Pure TSQL & SQL Server Agent Completely free & open-source

  45. Those five five desirable behaviours Parallel processing Convenient way to locate faults Easy to resume after error with as little as possible left to do Easy to add new processes? What s so hard about this anyway?!

  46. Adding new processes Where do I put new processdemo.usp_ProcessK? A B C E F H D G I J To decide, I need to know what everything else does Difficult unless I know the ETL landscape very well Takes a while for newbies to get up to speed

  47. Process dependencies A B C E F H D G I J

  48. Process dependencies A C B D E F G H I J

  49. Resource dependencies T01 T02 T03 A C B T07 T06 T04 D E F G H T05 T09 T08 T11 T10 I J T12 T13 T14

  50. Resource dependencies T01 T02 T03 A C B T07 T06 T04 D E F G H T05 T09 T08 T11 T10 I J T12 T13 T14

Related


More Related Content