Optimizing Queries and Constructing Percentage Cube in Business Intelligence
Presented at ENSMA, this paper introduces optimized strategies for constructing a percentage cube to enhance processing of complex queries in modern business intelligence. It discusses challenges, comparison between percentage queries and percentage cube, and showcases a data cube analysis of car sales. The focus is on introducing new aggregation functions and optimization techniques for efficient data processing.
Download Presentation
Please find below an Image/Link to download the presentation.
The content on the website is provided AS IS for your information and personal use only. It may not be sold, licensed, or shared on other websites without obtaining consent from the author. Download presentation by click this link. If you encounter any issues during the download, it is possible that the publisher has removed the file from their server.
E N D
Presentation Transcript
ENSMA : Ecole Nationale Suprieure de Mcanique et d A rotechnique LIAS : Laboratoire d Informatique et d Automatique pour les Syst mes Percentage cube queries Optimisation Authors: Yiqun Zhang, Carlos Ordonez, Javier Garcia-Garcia, Ladjel Bellatreche Presented by: Abdallah KHELIL Page 1
Product Quarter Region Sales Clio 4 Q1 Europe 77340 RoW 84772 Q2 Europe 95368 RoW 111266 Q3 Europe 58305 RoW 69088 Pre-aggregated data allow to accelerate processing of some complex queries needed for modern business intelligence functionalities Page 2 2
Product Quarter Region Sales Clio 4 Q1 Europe 77340 RoW 84772 Optimized queries aggregation functions New approach to compute percentage cube by introducing the pct () function. New optimization strategies to accelerate percentage cube construction to process standard with Proposal Q2 Europe 95368 RoW 111266 Q3 Europe 58305 It function percentage lacks an important is Q RoW 69088 which the Q In relationship between aggregated measures from different levels of clustering. general, it is difficult to get the Q: Total sales Q : Total sales for the RoW w.r.t Q3 Page 3 3
Challenges Construct a percentage cube from fact table and measures Region Quarter Id_Region Desc_Region Id_Quarter Desc_Quarter Dimension 1 Dimension 2 Sales Id_Region Id_Quarter Qty_Sold Quantity of cars sold Fact table Page 4 4
Percentage queries vs. Percentage cube Percentage queries: Have one total and the individual level setting in the output SELECT L1 Lj, R1 Rk pct(A TOTAL BY L1 Lj BREAKDOWN BY R1 Rk) FROM F GROUP BY L,R; Percentage cube: Explores all the possible grouping column combinations SELECT L1 Lj, R1 Rk pct(A) FROM F GROUP BY L,R; WITH PERCENTAGE CUBE; Page 5 5
Data cube Quantity of cars sold in 2016 Group Renault Region Quarter Qty_Sold Europe Q1 Q2 Q3 Q1 Q2 Q3 77340 95368 58305 84772 111266 69088 Rest of the world Page 6 6
Percentage cube TOTAL BY Region BREAKDOWN BY Quarter Quarter Region Qty_Sold % Q1 Europe 34% Region Region Quarter Quarter Q2 Q3 Europe Europe 40% 26% Region Quarter Q1 RoW 32% Region Quarter Q2 RoW 42% Region Quarter Q3 RoW 26% Q Q SELECT Quarter,Region, pct(Qty_Sold TOTAL BY Region BREAKDOWN BY Quarter ) FROM Sales_Table GROUP BY Quarter,Region; Q: Total sales in region RoW Q : Total sales for Q3 within RoW Page 7 7
Percentage cube TOTAL BY ALL BREAKDOWN BY Quarter, Region Quarter Region Qty_Sold % Q1 Europe 16% ALL ALL Quarter, Region Quarter, Region Q2 Q3 Europe Europe 19% 12% ALL Quarter, Region Q1 RoW 17% ALL Quarter, Region Q2 RoW 22% Q ALL Quarter, Region Q3 RoW 14% Q SELECT Quarter,Region, pct(Qty_Sold BREAKDOWN BY Quarter,Region) Q: Total sales Q : Total sales considering RoW in the Q3 FROM Sales_Table GROUP BY Quarter,Region; Page 8 8
A: standard aggregated data L: first aggregation level R: second aggregation level Output: pct cube Input: data cube ? pct (A, {L}, {R}) Sales 77340 84772 95368 111266 58305 69088 Qty_Sold % 16% 19% 12% 17% 22% 14% We propose two strategies (w.r.t standard SQL operators) to implement the pct() function : OLAP windows. GROUP-BY. pct (Qty_Sold, , {Quarter,Region}) Page 9 9
Implementation of the pct() function using OLAP windows X pct (Qty_Sold, ,{Quarter,Region}) Qty_Sold A OVER(PARTITION BY {R}) R={Quarter, Region} 77340 95368 A R L Region Quarter Qty_Sold 1 Europe Q1 Q2 77340 95368 Y 2 Total Input: data cube 496139 Output: percentage cube Region Quarter X/Y 3 Europe Q1 Q2 16% 19% Page 10 10
Implementation of the pct() function using GROUP-BY pct (Qty_Sold, ,{Quarter,Region}) Findv A (GROUP BY {R}) R={Quarter, Region} Qty_Sold A R L 77340 95368 Region Quarter Qty_Sold 1 Europe Q1 Q2 77340 95368 2 A (GROUP BY {L}) L= Findv/Ftotal Region Quarter Ftotal 3 16% 19% Europe Q1 Q2 Total 496139 Page 11 11
Size of an OLAP cube A B Sales 1 A B Sales Europe Q1 Q2 77340 95368 Europe Europe Q1 Q2 77340 95368 2 A Sales Europe 172708 A B B Sales A B Q1 Q2 ... 162112 206634 A * B * Sales * * 496139 Page 12 12
Size of a percentage cube B A Sales Q1 Europe 48% 1 A B Sales A B Q2 Europe 46% A B A B A B A B A * A Europe Q1 34% A Percentage cube size : The size of a percentage cube is larger than the size of a standard OLAP cube. Solution: pruning the groups that have a number of elements exceeding a given threshold: Direct pruning Cascaded pruning vs. OLAP cube size : Q2 41% B 2 A B Sales A * * B Europe Q1 16% Europe Q2 19% B * B * * A Sales Europe 47% B Sales Q1 33% Q2 42% Page 13 13
Iceberg queries Vs. Percentage cube Iceberg queries Need of a threshold Pruning partitions that do not satisfy the threshold Percentage cube Need of a threshold Grouping by the total level aggregation (Ftotal) Pruning groups that do not satisfy a specified size (the threshold ) Page 14 14
Thd=30 ABCD(1080) BCD(108) ABC(90) ABD(90) ACD(180) BC(18) BD(18) CD(36) AD(30) AC(30) AB(15) A(5) B(3) C(6) D(6) Direct pruning Verify all possible combinations of grouping columns with respect to the given threshold Page 15 15
Thd=30 BCD(108) ABC(90) ABD(90) ACD(180) BC(18) BD(18) CD(36) AD(30) AC(30) AB(15) A(5) B(3) C(6) D(6) Cascaded pruning 1. 2. 3. Iterate on all combinations of the elements of L from bottom to top. A group is pruned if it does not satisfy the given threshold We stop going further in the search space with possible sets that are super sets of the pruned ones Page 16 16
Percentage aggregation: GROUP-BY vs OLAP . Scale factor=4, N=24M |L1| |R1| OLAP row_number() GROUP-BY Ftotal from F Ftotal from Findv DISTINCT 52 50 50 48 49 63 43 44 62 25 4 7 12 7 12 25 7 12 25 8 7 7 5 5 5 6 7 29 28 28 22 22 30 23 27 54 1K 10 10 19 35 13 56 16 10 11 51 200K Page 17 17
Cube generation: GROUP-BY vs. OLAP Scale factor=1, N= 6M d 2 3 4 5 6 GROUP-BY OLAP 4 41 155 545 2402 9174 10 30 148 1147 GROUP-BY method is about 10 times faster than the OLAP window function method for all d's. Page 18 18
Direct pruning vs. Cascaded pruning d Threshold (% of N) Direct pruning SF=1 91 91 90 124 268 269 272 437 Cascaded pruning SF=1 70 72 74 130 177 182 186 436 SF=8 690 693 692 728 1767 1769 1774 1981 SF=8 5 10% 8% 6% 674 675 676 729 1680 1687 1687 1930 0 6 10% 8% 6% 0 Page 19 19
Proposal New approach to compute percentage cube by introducing the pct () function. New optimization strategies construction Direct pruning Cascaded pruning to accelerate percentage cube Future work Page 20 20
QUESTIONS Page 21 21