Using In-Database Tools for Weekly Challenges in Alteryx

Slide Note
Embed
Share

Explore how to leverage In-Database tools in Alteryx for solving weekly challenges presented by the Alteryx Community. Learn the concept of Weekly Challenges, the role of In-Database tools, and how they work to process data efficiently. Dive into solving specific challenges using SQL setup and blending techniques to achieve desired outcomes. Follow the step-by-step instructions to join data, summarize results, and enhance your analytical skills in Alteryx.


Uploaded on Sep 16, 2024 | 0 Views


Download Presentation

Please find below an Image/Link to download the presentation.

The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author. Download presentation by click this link. If you encounter any issues during the download, it is possible that the publisher has removed the file from their server.

E N D

Presentation Transcript


  1. Solving Weekly Challenges using In-Database tools Thales Donizeti Senior Data Engineer at phData

  2. What are Weekly Challenges? An initiative by the Alteryx Community Each week a use case is presented You have an Analytics question, and you must present a solution using Alteryx Every solution you post count to a board

  3. What are In-Database tools? Alteryx tools that visually build SQL Data Processing stays in the database you re working in Alteryx sends SQL through the tools and the database transforms the data A hybrid approach is often used in workflows

  4. How do In-Database tools work? Every tool builds a CTE (Common Table Expression) in the target database Alteryx pushes a consolidated SQL statement with CTEs that references the previous tools Data only leaves the database if a Data Stream Out / Browse In-DB tool is used You can see the SQL built by Alteryx at any time by using the Dynamic Output In-DB tool You can also push your own SQL with the Dynamic Input In-DB tool Dynamic Input In-DB Dynamic Output In-DB

  5. Solving Weekly Challenge #1 with In-Database Tools + SQL - Setup

  6. Weekly Challenge #1 Join to Range A company in Australia has source data which is made up of a series of postal codes (eg. 2000, 2001, 2002 etc.) amongst some other data fields. They have a separate reference table which contains postcode ranges (eg. 2000 to 2002) which they would like to use to match/filter their main data. Each Customer Record needs to be joined to the Lookup table based on a Postal Area Ranged region. Then finally summarize the customer data by Region, Sales Rep, and Responder, then a count of customers. Check and see what the result should look like by looking at the data labeled 'Output'. Your mission is to take the input files and blend them so your result matches the output shown.

  7. Weekly Challenge #1 Start files Solution

  8. Setup Set up a database of your choice Create additional resources in your database (database, schema, user, roles, etc.) Install the ODBC drivers (if any) Create the Data connections in Alteryx (regular and In-DB) Load the Start Files data in database tables Connect Alteryx to the tables

  9. Database Snowflake A 30-day trial can be used ($400 credit) the trial can be renewed with the same email account Minimal setup to get started (no local infrastructure required) ANSI-SQL compliant Easy and quick to use connectors in Alteryx Fast and scalable, can handle large datasets After the setup, create the database resources (database, schema, warehouse, role)

  10. ODBC Driver and Connections Download the Snowflake ODBC Driver from the Alteryx license portal Install the driver and set up an ODBC DSN (search for ODBC in Windows) Input the User, Snowflake Account URL, database, warehouse and role You will need to reference this DSN in Alteryx

  11. Data Connection to load data Create a Data Connection pointing to Snowflake Bulk Set up the connection by selecting the DSN created and picking up the Local staging method. Use the Table Stage.

  12. In-DB connection Create an In-DB Connection pointing to Snowflake Click on the Connection String arrow to insert the DSN and your credentials to Snowflake Make sure you change the Table/FieldName SQL Style to None Set up the write method with the Bulk Loader (follow the steps on the previous slide)

  13. Load Start files to Snowflake Double check the data types before outputting the start files to Snowflake (specially for string sizes) Use an Auto-Field tool and a Select tool Create a table for each text input tool. You can call it Source Data and Reference Data Make sure you create a dedicated schema in Snowflake for each challenge Use the Data Connection created previously

  14. Solving Weekly Challenge #1 with In-Database Tools + SQL - Workflow

  15. Original Solution

  16. In-DB approach - Connect to the Reference Data Connect to the Reference Data by using the Connect In-DB tool and the In-DB connection

  17. Formula In-DB tool Use string manipulation formulas from Snowflake to split the Range column In Alteryx we would use the Text to Columns tool, in SQL we have to think differently

  18. Output the Query generated Use the Dynamic Output In-DB tool to see the query that has been generated Select to output the Query, Query Alias List and Last Query Alias (we will use it)

  19. Paste the full query output in Snowflake You can use a SQL formatter to make it cleaner From here, we will need to work using SQL to keep the workflow going Alteryx generates CTE names by referencing the tools in the workflow with a unique ID We will build part of the workflow in Snowflake and go back to Alteryx

  20. Add the Last Query Alias to the CTE Make sure you modify the query by adding the Last Query Alias to the CTE It s the last tool alias

  21. Add a Recursive CTE To generate a table with all possible combinations of the Range column, we will use a Recursive CTE Make sure you uppercase the CTE name (TOOL_USER_1) and quote them The CTE names can be anything you want (just avoid special characters)

  22. Use a Range Join (SQL) SQL allows range joins (BETWEEN clauses) Join the Recursive CTE output with the referenced data In Alteryx, the recursive CTE + range join would be equivalent to the Generate Rows tool

  23. Transform the user added CTEs into a Query Alias List Convert the CTEs into a Query Alias List It should be as simple as adding an equal sign instead of the word AS Add it to a Text Input tool (this step could potentially be automated if you have .SQL files)

  24. Combine the Query Alias Lists Union the Query Alias List produced by the In-DB tools and the one it was just created using SQL Trim any whitespaces (they can cause errors) Concatenate the records into a single record by using the newline delimiter (\n) Add the In-DB connection created in the setup with a Formula tool Pass the Connection and the combined Query Alias List to the Dynamic Input In-DB tool

  25. Final In-DB steps Join the Source Data with the last step Use the Summarize In-DB tool to get the final count of Customers by Region, Sales Rep and Responder Stream out the data / Write to a table / See the final query

  26. Pick 3 Weekly Challenges you would like to see being solved using In-DB tools in the future

Related