Understanding SQL Aggregation and Grouping in Database Systems


SQL provides the aggregation functions SUM, AVG, COUNT, MIN, and MAX for computing a single value over a column. DISTINCT removes duplicate values before aggregating, GROUP BY applies an aggregation separately to each group of tuples, HAVING eliminates whole groups, and NULL values are ignored by aggregation.


Uploaded on Sep 26, 2024




Presentation Transcript


  1. CSCE-608 Database Systems, Spring 2024. Instructor: Jianer Chen. Office: PETR 428. Phone: 845-4259. Email: chen@cse.tamu.edu. Notes 12: SQL Grouping, Aggregation, and HAVING.

  2. SQL (Structured Query Language) is a very-high-level language:
     * you say what to do rather than how to do it;
     * you avoid many of the data-manipulation details needed in procedural languages like C++ or Java.
     The database management system figures out the best way to execute a query; this is called query optimization. SQL is used for both data definition and data manipulation.

  3. Aggregations: SUM, AVG, COUNT, MIN, and MAX can be applied to a column in a SELECT clause to produce that aggregation on the column. Also, COUNT(*) counts the number of tuples. Example: from Sells(bar, beer, price), find the average price of Bud:
       SELECT AVG(price)
       FROM Sells
       WHERE beer = 'Bud';
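
A minimal sketch of the slide's query, run through Python's built-in sqlite3 module as a stand-in DBMS. The Sells rows are hypothetical sample data, not from the slides; the second query shows COUNT(*) and the other aggregations over the same Bud tuples.

```python
import sqlite3

# In-memory database with hypothetical Sells(bar, beer, price) rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Sells (bar TEXT, beer TEXT, price REAL)")
conn.executemany(
    "INSERT INTO Sells VALUES (?, ?, ?)",
    [("Joe's", "Bud", 2.50), ("Sue's", "Bud", 3.00), ("Sue's", "Miller", 3.00)],
)

# The slide's query: a single aggregate value over all Bud tuples.
(avg_bud,) = conn.execute(
    "SELECT AVG(price) FROM Sells WHERE beer = 'Bud'"
).fetchone()

# The other aggregations applied to the same tuples, plus COUNT(*).
stats = conn.execute(
    "SELECT COUNT(*), SUM(price), MIN(price), MAX(price) "
    "FROM Sells WHERE beer = 'Bud'"
).fetchone()

print(avg_bud)  # 2.75
print(stats)    # (2, 5.5, 2.5, 3.0)
```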

  4. Eliminating Duplicates in Aggregation: use DISTINCT inside an aggregation. Example: using Sells(bar, beer, price), find the number of different prices charged for Bud:
       SELECT COUNT(DISTINCT price)
       FROM Sells
       WHERE beer = 'Bud';
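
The sketch below (sqlite3 again, hypothetical rows) contrasts COUNT(price) with COUNT(DISTINCT price): two of the three bars charge the same price, so the distinct count is one smaller.

```python
import sqlite3

# Hypothetical data: three bars sell Bud, two of them at the same price.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Sells (bar TEXT, beer TEXT, price REAL)")
conn.executemany(
    "INSERT INTO Sells VALUES (?, ?, ?)",
    [("Joe's", "Bud", 2.50), ("Sue's", "Bud", 3.00), ("Bob's", "Bud", 3.00)],
)

# COUNT(price) counts every non-NULL price; COUNT(DISTINCT price) counts each value once.
all_prices, distinct_prices = conn.execute(
    "SELECT COUNT(price), COUNT(DISTINCT price) FROM Sells WHERE beer = 'Bud'"
).fetchone()
print(all_prices, distinct_prices)  # 3 2
```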

  5. NULL Is Ignored in Aggregation: NULL never contributes to a sum, average, or count, and can never be the minimum or maximum of a column. But if there are no non-NULL values in a column, then the result of SUM, AVG, MIN, or MAX is NULL, while COUNT returns 0.
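
A small sqlite3 sketch of these rules, with hypothetical rows in which one Bud price is NULL and Miller has no known price at all.

```python
import sqlite3

# Hypothetical data: one Bud price is unknown (NULL), and Miller has only NULL prices.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Sells (bar TEXT, beer TEXT, price REAL)")
conn.executemany(
    "INSERT INTO Sells VALUES (?, ?, ?)",
    [("Joe's", "Bud", 2.50), ("Sue's", "Bud", None), ("Joe's", "Miller", None)],
)

query = ("SELECT SUM(price), AVG(price), MIN(price), MAX(price), COUNT(price) "
         "FROM Sells WHERE beer = ?")

# The NULL Bud price is ignored: every aggregate sees only the 2.50.
bud = conn.execute(query, ("Bud",)).fetchone()
# No non-NULL Miller price: SUM/AVG/MIN/MAX give NULL (None in Python), COUNT gives 0.
miller = conn.execute(query, ("Miller",)).fetchone()
print(bud)     # (2.5, 2.5, 2.5, 2.5, 1)
print(miller)  # (None, None, None, None, 0)
```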

  6. Example: Effect of NULLs. Using Sells(bar, beer, price):
       SELECT COUNT(*) FROM Sells WHERE beer = 'Bud';
     gives the number of bars that sell Bud, while
       SELECT COUNT(price) FROM Sells WHERE beer = 'Bud';
     gives the number of bars that sell Bud at a known (non-NULL) price.
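
The difference between the two queries can be checked with sqlite3 on hypothetical data where one of three Bud-selling bars has a NULL price.

```python
import sqlite3

# Hypothetical data: three bars sell Bud, but Sue's price is unknown (NULL).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Sells (bar TEXT, beer TEXT, price REAL)")
conn.executemany(
    "INSERT INTO Sells VALUES (?, ?, ?)",
    [("Joe's", "Bud", 2.50), ("Sue's", "Bud", None), ("Bob's", "Bud", 3.00)],
)

# COUNT(*) counts tuples; COUNT(price) skips tuples whose price is NULL.
(bars_selling_bud,) = conn.execute(
    "SELECT COUNT(*) FROM Sells WHERE beer = 'Bud'").fetchone()
(bars_with_known_price,) = conn.execute(
    "SELECT COUNT(price) FROM Sells WHERE beer = 'Bud'").fetchone()
print(bars_selling_bud, bars_with_known_price)  # 3 2
```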

  7. Grouping: we may follow a SELECT-FROM-WHERE expression by GROUP BY and a list of attributes. The relation that results from the SELECT-FROM-WHERE is grouped according to the values of all those attributes, and any aggregation is applied only within each group.

  8-9. Example: Grouping. From Sells(bar, beer, price), find the average price for each beer:
       SELECT beer, AVG(price)
       FROM Sells
       GROUP BY beer;
     The query outputs one tuple for each group.
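
A sketch of the grouping query with sqlite3 and hypothetical rows for two beers. The ORDER BY clause is an addition for deterministic output; it is not part of the slide's query.

```python
import sqlite3

# Hypothetical Sells rows for two beers.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Sells (bar TEXT, beer TEXT, price REAL)")
conn.executemany(
    "INSERT INTO Sells VALUES (?, ?, ?)",
    [("Joe's", "Bud", 2.50), ("Sue's", "Bud", 3.00),
     ("Joe's", "Miller", 2.00), ("Sue's", "Miller", 3.00)],
)

# One output tuple per beer group; ORDER BY only fixes the output order.
rows = conn.execute(
    "SELECT beer, AVG(price) FROM Sells GROUP BY beer ORDER BY beer"
).fetchall()
print(rows)  # [('Bud', 2.75), ('Miller', 2.5)]
```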

  10-11. Example: Grouping. From Sells(bar, beer, price) and Frequents(drinker, bar), find for each drinker the average price of Bud at the bars they frequent:
       SELECT drinker, AVG(price)
       FROM Frequents, Sells
       WHERE beer = 'Bud' AND Frequents.bar = Sells.bar
       GROUP BY drinker;
     The query first computes the drinker-bar-price tuples for Bud, then groups them by drinker.
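
The join-then-group query can be tried with sqlite3 on hypothetical Sells and Frequents rows; the ORDER BY clause is added only to fix the output order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Sells (bar TEXT, beer TEXT, price REAL)")
conn.execute("CREATE TABLE Frequents (drinker TEXT, bar TEXT)")
# Hypothetical data: Sally visits Joe's and Sue's, Tom visits only Joe's.
conn.executemany("INSERT INTO Sells VALUES (?, ?, ?)",
                 [("Joe's", "Bud", 2.50), ("Sue's", "Bud", 3.00),
                  ("Joe's", "Miller", 2.00)])
conn.executemany("INSERT INTO Frequents VALUES (?, ?)",
                 [("Sally", "Joe's"), ("Sally", "Sue's"), ("Tom", "Joe's")])

# Join and filter to (drinker, bar, price) tuples for Bud, then average per drinker.
rows = conn.execute("""
    SELECT drinker, AVG(price)
    FROM Frequents, Sells
    WHERE beer = 'Bud' AND Frequents.bar = Sells.bar
    GROUP BY drinker
    ORDER BY drinker
""").fetchall()
print(rows)  # [('Sally', 2.75), ('Tom', 2.5)]
```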

  12. Restriction on SELECT Lists With Aggregation: if any aggregation is used, then each element of the SELECT list must be either 1. aggregated, or 2. an attribute on the GROUP BY list.

  13. Illegal Query Example. Using Sells(bar, beer, price), you might think you could find the bar that sells Bud the cheapest by:
       SELECT bar, MIN(price)
       FROM Sells
       WHERE beer = 'Bud';
     But this query is illegal in SQL: bar is neither aggregated nor on a GROUP BY list.
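
One standard-conforming way to get the intended answer is to compute the minimum in a subquery and compare each Bud price against it. This sketch uses sqlite3 and hypothetical prices. SQLite itself is permissive about the illegal SELECT bar, MIN(price) form: it accepts such "bare" columns as an extension, while stricter systems such as PostgreSQL reject the query, so the rewrite is the portable choice.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Sells (bar TEXT, beer TEXT, price REAL)")
# Hypothetical data: Joe's is the cheapest Bud seller.
conn.executemany("INSERT INTO Sells VALUES (?, ?, ?)",
                 [("Joe's", "Bud", 2.50), ("Sue's", "Bud", 3.00),
                  ("Bob's", "Miller", 1.50)])

# The outer query has no aggregation, and the subquery's SELECT list is only MIN(price),
# so neither violates the restriction on SELECT lists.
cheapest = conn.execute("""
    SELECT bar, price
    FROM Sells
    WHERE beer = 'Bud'
      AND price = (SELECT MIN(price) FROM Sells WHERE beer = 'Bud')
""").fetchall()
print(cheapest)  # [("Joe's", 2.5)]
```

If several bars tie for the lowest Bud price, this version returns all of them.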

  14. HAVING Clauses: HAVING <condition> may follow a GROUP BY clause. If so, the condition applies to each group, and groups not satisfying the condition are eliminated.

  15-20. Example. From Sells(bar, beer, price) and Beers(name, manf), find the average price of those beers that are either served in at least three bars or are manufactured by Pete's:
       SELECT beer, AVG(price)
       FROM Sells
       GROUP BY beer
       HAVING COUNT(bar) >= 3 OR
              beer IN (SELECT name FROM Beers WHERE manf = 'Pete''s');
     GROUP BY beer groups the (bar, beer, price) tuples of Sells by beer. COUNT(bar) >= 3 holds when at least 3 bars appear in the beer's group, and the IN condition holds when the beer is made by Pete's. (Inside an SQL string literal, the apostrophe in Pete's is written twice.)
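
A sketch of the HAVING query with sqlite3. The rows, including the beer name Summerbrew and the manufacturer name Miller Brewing, are hypothetical: Bud is sold in three bars, Miller in one, and Summerbrew, made by Pete's, in one, so only Bud and Summerbrew should survive the HAVING condition. ORDER BY is added only for deterministic output.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Sells (bar TEXT, beer TEXT, price REAL)")
conn.execute("CREATE TABLE Beers (name TEXT, manf TEXT)")
# Hypothetical data: Bud in 3 bars, Miller in 1, Summerbrew (by Pete's) in 1.
conn.executemany("INSERT INTO Sells VALUES (?, ?, ?)",
                 [("Joe's", "Bud", 2.50), ("Sue's", "Bud", 3.00),
                  ("Bob's", "Bud", 3.50), ("Joe's", "Miller", 2.00),
                  ("Sue's", "Summerbrew", 3.00)])
conn.executemany("INSERT INTO Beers VALUES (?, ?)",
                 [("Bud", "Anheuser-Busch"), ("Miller", "Miller Brewing"),
                  ("Summerbrew", "Pete's")])

# The condition is tested once per beer group, not once per tuple.
rows = conn.execute("""
    SELECT beer, AVG(price)
    FROM Sells
    GROUP BY beer
    HAVING COUNT(bar) >= 3
        OR beer IN (SELECT name FROM Beers WHERE manf = 'Pete''s')
    ORDER BY beer
""").fetchall()
print(rows)  # [('Bud', 3.0), ('Summerbrew', 3.0)]
```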

  21. Requirements on HAVING Conditions: these conditions may refer to any relation or tuple-variable in the FROM clause. They may refer to attributes of those relations, as long as the attribute makes sense within a group, i.e., it is either 1. a grouping attribute, or 2. aggregated.

  22-23. Requirements on HAVING Conditions: these requirements are easier to understand from an implementation viewpoint. A query of the form SELECT ... FROM ... WHERE ... GROUP BY ... HAVING ... is processed as follows:
     step 1, FROM: input;
     step 2, WHERE: pick the proper tuples;
     step 3, GROUP BY: group the picked tuples;
     step 4, HAVING: pick the proper groups;
     step 5, SELECT: compute the output.
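
To make the evaluation order (FROM, WHERE, GROUP BY, HAVING, SELECT) concrete, the sketch below evaluates the Pete's HAVING query from the earlier example clause by clause in plain Python, with no DBMS involved. The rows are hypothetical and the function name run_having_query is illustrative only; a real DBMS may choose a different physical plan as long as the result is the same.

```python
from collections import defaultdict

# Hypothetical rows for Sells(bar, beer, price) and Beers(name, manf).
SELLS = [("Joe's", "Bud", 2.50), ("Sue's", "Bud", 3.00), ("Bob's", "Bud", 3.50),
         ("Joe's", "Miller", 2.00), ("Sue's", "Summerbrew", 3.00)]
BEERS = [("Bud", "Anheuser-Busch"), ("Miller", "Miller Brewing"),
         ("Summerbrew", "Pete's")]


def run_having_query(sells, beers):
    """Evaluate, one clause at a time:
         SELECT beer, AVG(price) FROM Sells GROUP BY beer
         HAVING COUNT(bar) >= 3 OR beer IN (SELECT name FROM Beers WHERE manf = 'Pete''s')
    """
    # Step 1, FROM: the input is every tuple of Sells.
    tuples = list(sells)
    # Step 2, WHERE: this query has no WHERE clause, so every tuple is picked.
    picked = tuples
    # Step 3, GROUP BY beer: group the picked tuples by their beer value.
    groups = defaultdict(list)
    for bar, beer, price in picked:
        groups[beer].append((bar, price))
    # Step 4, HAVING: keep a group only if the condition holds for the group as a whole.
    petes_beers = {name for name, manf in beers if manf == "Pete's"}
    kept = {}
    for beer, members in groups.items():
        bar_count = sum(1 for bar, _ in members if bar is not None)  # COUNT(bar)
        if bar_count >= 3 or beer in petes_beers:
            kept[beer] = members
    # Step 5, SELECT: one output tuple per surviving group; AVG skips NULL prices.
    result = []
    for beer, members in kept.items():
        prices = [price for _, price in members if price is not None]
        avg = sum(prices) / len(prices) if prices else None
        result.append((beer, avg))
    return sorted(result)


rows = run_having_query(SELLS, BEERS)
print(rows)  # [('Bud', 3.0), ('Summerbrew', 3.0)]
```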

  24-25. Database Modifications: a modification command does not return a result (as a query does), but changes the database in some way. There are three kinds of modifications:
     1. Insert a tuple or tuples.
     2. Delete a tuple or tuples.
     3. Update the value(s) of an existing tuple or tuples.

  26-29. Insertion: to insert a single tuple:
       INSERT INTO <relation> VALUES (<list of values>);
     Example: add to Likes(drinker, beer) the fact that Sally likes Bud:
       INSERT INTO Likes VALUES ('Sally', 'Bud');
     We may add a list of attributes to <relation>, for two reasons:
     1. We forget the order of the relation's attributes.
     2. We don't have values for all attributes, and want the system to fill in the missing ones with default values.
     So another solution for the above example is:
       INSERT INTO Likes (beer, drinker) VALUES ('Bud', 'Sally');
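
A sqlite3 sketch of both INSERT forms. Likes follows the slides; the Drinkers(name, addr) table and its DEFAULT 'unknown' value are hypothetical, added only to show a missing attribute being filled with its default. Because Likes has no key here, inserting the same fact twice produces two identical tuples.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Likes (drinker TEXT, beer TEXT)")
# Hypothetical table used only to illustrate default values.
conn.execute("CREATE TABLE Drinkers (name TEXT, addr TEXT DEFAULT 'unknown')")

# Values listed in the relation's attribute order: (drinker, beer).
conn.execute("INSERT INTO Likes VALUES ('Sally', 'Bud')")
# With an explicit attribute list, the values may come in any order.
conn.execute("INSERT INTO Likes (beer, drinker) VALUES ('Bud', 'Sally')")
# Omitted attributes are filled in with their defaults.
conn.execute("INSERT INTO Drinkers (name) VALUES ('Sally')")

likes = conn.execute("SELECT drinker, beer FROM Likes").fetchall()
drinkers = conn.execute("SELECT name, addr FROM Drinkers").fetchall()
print(likes)     # [('Sally', 'Bud'), ('Sally', 'Bud')]
print(drinkers)  # [('Sally', 'unknown')]
```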

  30. Inserting Many Tuples: we may insert the entire result of a query into a relation, using the form:
       INSERT INTO <relation> (<subquery>);

  31-37. Example. Using Frequents(drinker, bar), enter into the new relation PotBuddies(name) all of Sally's potential buddies, i.e., those drinkers who frequent at least one bar that Sally also frequents:
       INSERT INTO PotBuddies
         (SELECT d2.drinker
          FROM Frequents d1, Frequents d2
          WHERE d1.drinker = 'Sally' AND
                d2.drinker <> 'Sally' AND
                d1.bar = d2.bar);
     1. Find all potential buddies of Sally by pairing Sally with those who frequent the bars Sally frequents, e.g., the (d1, d2) pairs (Sally, Joe's, Tom, Joe's), (Sally, Sue's, Jeff, Sue's), and (Sally, Sue's, Mary, Sue's).
     2. Collect the drinkers d2.drinker: Tom, Jeff, Mary.
     3. Add these drinkers to PotBuddies.
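
The PotBuddies example can be reproduced with sqlite3, using Frequents rows chosen to match the pairs shown on the slides (Sally at Joe's and Sue's, Tom at Joe's, Jeff and Mary at Sue's). One syntax assumption: SQLite expects INSERT INTO <relation> SELECT ..., so the parentheses around the subquery are dropped here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Frequents (drinker TEXT, bar TEXT)")
conn.execute("CREATE TABLE PotBuddies (name TEXT)")
# Sample Frequents rows consistent with the pairs shown on the slides.
conn.executemany("INSERT INTO Frequents VALUES (?, ?)",
                 [("Sally", "Joe's"), ("Sally", "Sue's"), ("Tom", "Joe's"),
                  ("Jeff", "Sue's"), ("Mary", "Sue's")])

# d1 ranges over Sally's visits, d2 over other drinkers at the same bar.
conn.execute("""
    INSERT INTO PotBuddies
    SELECT d2.drinker
    FROM Frequents d1, Frequents d2
    WHERE d1.drinker = 'Sally'
      AND d2.drinker <> 'Sally'
      AND d1.bar = d2.bar
""")
buddies = [name for (name,) in conn.execute("SELECT name FROM PotBuddies ORDER BY name")]
print(buddies)  # ['Jeff', 'Mary', 'Tom']
```

If a drinker shared two or more bars with Sally, the subquery would produce that drinker once per shared bar; SELECT DISTINCT d2.drinker would avoid the duplicates.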

  38-40. Deletion: to delete the tuples satisfying a condition from some relation:
       DELETE FROM <relation> WHERE <condition>;
     Example. Delete from Likes(drinker, beer) the fact that Sally likes Bud:
       DELETE FROM Likes WHERE drinker = 'Sally' AND beer = 'Bud';
     To make the relation Likes empty:
       DELETE FROM Likes;
     Note that no WHERE clause is needed.
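
A sqlite3 sketch of both deletions on hypothetical Likes rows: the first removes only the tuple that matches the condition, and the second, with no WHERE clause, empties the relation while leaving the table itself in place.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Likes (drinker TEXT, beer TEXT)")
# Hypothetical Likes rows.
conn.executemany("INSERT INTO Likes VALUES (?, ?)",
                 [("Sally", "Bud"), ("Sally", "Miller"), ("Tom", "Bud")])

# Delete only the tuples satisfying the condition.
conn.execute("DELETE FROM Likes WHERE drinker = 'Sally' AND beer = 'Bud'")
after_one = conn.execute(
    "SELECT drinker, beer FROM Likes ORDER BY drinker, beer").fetchall()

# No WHERE clause: every tuple is deleted, but the (now empty) relation still exists.
conn.execute("DELETE FROM Likes")
(remaining,) = conn.execute("SELECT COUNT(*) FROM Likes").fetchone()
print(after_one)  # [('Sally', 'Miller'), ('Tom', 'Bud')]
print(remaining)  # 0
```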

  41-42. Example: Delete Many Tuples. Delete from Beers(name, manf) all beers for which there is another beer by the same manufacturer:
       DELETE FROM Beers b
       WHERE EXISTS (SELECT name FROM Beers
                     WHERE manf = b.manf AND name <> b.name);
     The subquery finds the beers that have the same manufacturer as, and a different name from, the beer represented by tuple b.
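
The sketch below runs this deletion in sqlite3 on hypothetical rows (Summerbrew and its manufacturer Pete's are illustrative). Support for an alias on the DELETE target varies between systems, so instead of writing Beers b it aliases the subquery's copy as other; inside the subquery, Beers.manf and Beers.name then refer to the tuple being tested for deletion. Both Anheuser-Busch beers are deleted.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Beers (name TEXT, manf TEXT)")
# Hypothetical data: Anheuser-Busch makes two beers, Pete's makes one.
conn.executemany("INSERT INTO Beers VALUES (?, ?)",
                 [("Bud", "Anheuser-Busch"), ("Bud Lite", "Anheuser-Busch"),
                  ("Summerbrew", "Pete's")])

# "other" is the subquery's copy of Beers; unaliased Beers is the outer tuple.
conn.execute("""
    DELETE FROM Beers
    WHERE EXISTS (SELECT name FROM Beers AS other
                  WHERE other.manf = Beers.manf AND other.name <> Beers.name)
""")
remaining = conn.execute("SELECT name, manf FROM Beers").fetchall()
print(remaining)  # [('Summerbrew', "Pete's")]
```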

  43-49. Semantics of Deletion. Task: delete from Beers(name, manf) all beers for which there is another beer by the same manufacturer. Suppose Anheuser-Busch makes only Bud and Bud Lite:

       name       manf
       Bud        Anheuser-Busch
       Bud Lite   Anheuser-Busch

     Suppose we come to the tuple b for Bud first. The subquery is nonempty because of the Bud Lite tuple, so we delete Bud. Now, when b is the tuple for Bud Lite, do we delete that tuple too? Answer: yes, we delete Bud Lite as well.

  50. Reason: deletion proceeds in two stages:
     1. Mark all tuples for which the WHERE condition is satisfied.
     2. Delete the marked tuples.
     So when the Bud Lite tuple is tested, the condition is still evaluated against the original relation, in which Bud is present.
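
The difference between SQL's mark-then-delete semantics (first mark every tuple whose WHERE condition holds against the original relation, then delete all marked tuples) and a naive delete-as-you-go loop can be shown in plain Python. The function names are hypothetical and the rows are the Bud/Bud Lite example; the naive version is included only to show the behavior SQL avoids.

```python
# The Bud/Bud Lite example as Beers(name, manf) tuples.
BEERS = [("Bud", "Anheuser-Busch"), ("Bud Lite", "Anheuser-Busch")]


def has_other_beer(b, table):
    # The WHERE condition: some other beer in `table` has b's manf and a different name.
    return any(manf == b[1] and name != b[0] for name, manf in table)


def two_stage_delete(table):
    # Stage 1: mark every tuple whose condition holds against the ORIGINAL table.
    marked = [b for b in table if has_other_beer(b, table)]
    # Stage 2: delete all marked tuples.
    return [b for b in table if b not in marked]


def naive_one_pass_delete(table):
    # NOT SQL semantics: deletes immediately, so later tests see a shrunken table.
    current = list(table)
    for b in list(table):
        if has_other_beer(b, current):
            current.remove(b)
    return current


print(two_stage_delete(BEERS))       # []
print(naive_one_pass_delete(BEERS))  # [('Bud Lite', 'Anheuser-Busch')]
```

With the naive loop, the result would depend on the order in which tuples are visited; marking first makes the outcome order-independent.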
