- May 15, 2020

ISYS2120: Data & Information ManagementWeek 4: Relational Algebra – SQL Nested Subqueries Presented byDr. Matloob KhushiSchool of Computer ScienceCf. Kifer/Bernstein/Lewis – Chapter 5.1; Ramakrishnan/Gehrke – Chapter 4.2; Ullman/Widom – Chapter 2.4Administration Assignment 1 is due by 18:00:00 on 6th September 2019 Late Penalty A lateness penalty of 10% (1 mark of your total assessment) per day on your maximum mark will be imposed. E.g. If you submit 2 days late, your maximum mark will be 10 – 2*1 = 8. Group formation Everyone should be in a group by now if not please let your tutor know.04a-204a-3Outline Foundations of Declarative Querying Relational Algebra Six basic operations Join operation Composition and equivalence rules More SQL: Nested SubqueriesBased on slides from Kifer/Bernstein/Lewis (2006) “Database Systems”and from Ramakrishnan/Gehrke (2003) “Database Management Systems”,and also including material from Fekete/Röhm.How Can We Talk to a Database?04a-4Database Querying with SQL04a-5SELECT studIDFROM Enrolled E, UnitOfStudy UWHERE E.uosCode = U.uosCodeAND U.uosName = ‘Database Systems I’“List all students (by SID) who are enrolled in Database Systems I”Many Ways to Write this Query…04a-6SELECT studIDFROM Enrolled E, UnitOfStudy UWHERE E.uosCode = U.uosCodeAND uosName = ‘Database Systems I’SELECT studIDFROM EnrolledWHERE uosCode IN (SELECT uosCodeFROM UnitOfStudyWHERE uosName = ‘Database Systems I’)SELECT studIDFROM (SELECT studID, uosNameFROM Enrolled NATURAL JOIN UnitOfStudy)RWHERE uosName= ‘Database Systems I’SELECT sid AS studIDFROM (SELECT u.uosCode AS u1, e.uosCode AS u2, e.studId AS sid, u.uosName AS titleFROM Enrolled e, UnitOfStudy u) SubQWHERE u1=u2 AND title = ‘Database Systems I’“List all students (by SID) who are enrolled in Database Systems I”How do we know what a DBMS is doing? All the SQL queries on the previous slide are equivalent On the same data set, they produce the same result Even when run on different database systems by various vendors, their results will be the same Why? Why do we know that this is the case? How can database system vendors even guarantee this? And why are some not equivalent?Wrong:04a-7SELECT studIDFROM Enrolled E, UnitOfStudy UWHERE uosName = ’Database Systems I’04a-8The Big Idea Users request information from a database using a query language A query that extracts information can be seen as calculating a relation from combining one or more relations in the current state of the database Relational algebra (RA) defines some basic operators that can be used to express a calculation like that Each operator takes one or more relation instances, and produces a relation instance (the operation part of the relational data model) Why is it important? Helps us understanding the precise meaning of declarative SQL queries. Intermediate language used within DBMS (cf. chapter on db tuning)04a-9The Role of Relational Algebra in a DBMS[cf. Kifer/Bernstein/Lewis, Figure 5.1]What is an Algebra? A language based on operators and a domain of values: basic operators atomic operands (constants and variables) rules to form complex expressions from operators and operands, typically using nesting with parentheses Example: Algebra of Arithmetic Operators for usual arithmetic operations: + – * / Operands: variables (x, y, z, …) and numerical constants Example arithmetic expression: 100 – ((x + y) / 2) In databases, operators and operands are finite relations, which leads to the Relational Algebra We refer to expression as a query and the value produced as query result04a-1004a-11Relational Algebra (RA) 1. Set Operations Union ( ∪ ) tuples in relation 1 or in relation 2. Intersection ( ∩ ) tuples in relation 1, as well as in relation 2. Difference ( – ) tuples in relation 1, but not in relation 2. 2. Operations that remove parts of a relation Selection ( σ ) selects a subset of rows from relation. Projection ( π ) deletes unwanted columns from relation. 3. Operations that combine tuples from two relations Cross-product ( X ) allows us to fully combine two relations. Join ( ) to combine matching tuples from two relations. 4. A schema-level ‘rename’ operation Rename ( ρ ) allows us to rename one field or even whole relation Domain: set of relations operators take one/more relations as inputs and gives a new relation as result=> RA queries by nesting of multiple operators (cf. composition rules at end)Visualisation of Relational Algebra04a-12Projection ( π ) Selection ( σ ) X =Cross-product ( X ) – =Set Minus ( – ) ∩ =Set Intersection ( ∩ ) ∪ =Set Union ( ∪ ) =Join ( ) 04a-13Running ExampleStudentsid name gender country1001 Ian M AUS1002 Ha Tschi F ROK1003 Grant M AUS1004 Simon M GBR1005 Jesse F CHN1006 Franzisca F GERStudent UnitOfStudygendernamesidtitleuos_codecreditPointsUnitOfStudyuos_code title creditPointsCOMP5138 Relational DBMS 6COMP5318 Data Mining 6INFO6007 IT Project Management 6SOFT1002 Algorithms 12ISYS3207 IS Project 4COMP5702 MIT Research Project 18Enrolledsid uos_code semester1001 COMP5138 2005-S21002 COMP5702 2005-S21003 COMP5138 2005-S21006 COMP5318 2005-S21001 INFS6014 2004-S11003 ISYS3207 2005-S2country semesterenrolled04a-14Set Operations These operations take two input relations R and S Set Union R ∪ S Definition: R U S = { t | t ∈ R ∨ t ∈ S } Set Intersection R ∩ S Definition: R ∩ S = { t | t ∈ R ∧ t ∈ S } Set Difference R – S Definition: R – S = { t | t ∈ R ∧ t ∉ S } Important constraint: R and S have the same schema R, S have the same arity (same number of fields) `Corresponding’ fields must have the same names and domains04a-15Projection ‘Deletes’ attributes that are not in projection list. Schema of result contains exactly the fields in the projection list with distinct values, with the same names that they had in the input relation. Examples:Studentname countryIan AUSHa Tschi ROKGrant AUSSimon GBRJesse CHNFranzisca GER∏ name, country (Student) ∏ title (UnitOfStudy)UnitOfStudytitleRelational DBMSData MiningIT Project ManagementAlgorithmsIS ProjectMIT Research Project04a-16Selection Selects rows that satisfy a selection condition. Example: Result relation can be the input for another relational algebra operation! (Operator composition.) Example:Studentsid name gender country1001 Ian M AUS1003 Grant M AUSStudentnameIanGrantσ country=‘AUS’ (Student)∏ name ( σ country=‘AUS’ (Student) )04a-17Cross-Product Defined as: R x S = {t s | t ∈ R ∧ s ∈ S} each tuple of R is paired with each tuple of S. Resulting schema has one field per field of R and S, with field names `inherited’ if possible. It might end in a conflict with two fields of the same name -> rename needed Sometimes also called Cartesian product Example:αβ12αββγ10102010aabbA BRC D ESA B C Dresultααααββββ11112222αββγαββγ1010201010102010aabbaabbEx =04a-18Joins Conditional Join: Example: Result schema same as the cross-product’s result schema. Sometimes called theta-join. Equi-Join: Special case where the condition c contains only equalities.sid given family_name gender country empid first_name last_name room1001 Cho Chung M AUS 47112344 Vera Chung 3211004 Ciao Poon M CHN 12345678 Simon Poon 4311004 Ciao Poon M CHN 99004400 Josiah Poon 4821111 Alice Poon F AUS 12345678 Simon Poon 4311111 Alice Poon F AUS 99004400 Josiah Poon 482… … … … … … …04a-19Natural Join Natural Join: Equijoin on all common fields. Result schema similar to cross-product, but only one copy of fields for which equality is specified.UnitOfStudyuos_code title pointsCOMP5138 Relational DBMS 6COMP5318 Data Mining 6INFO6007 IT Project Mgmt. 6SOFT1002 Algorithms 12ISYS3207 IS Project 4COMP5702 MIT Research Project 18Enrolledsid uos_code1001 COMP51381002 COMP57021003 COMP51381006 COMP53181001 INFO60071003 ISYS3207resultsid uos_code title points1001 COMP5138 Relational DBMS 61002 COMP5702 MIT Research Project 181003 COMP5138 Relational DBMS 61006 COMP5318 Data Mining 61001 INFO6007 IT Project Mgmt. 61003 ISYS3207 IS Project 4=04a-20Rename Operation Allows to name, and therefore to refer to, the results of relational-algebra expressions. Allows to refer to a relation by more than one name. Notation 1: ρ x (E) returns the expression E under the name X (rename whole relation) Notation 2: ρ X (a1 b1) (E) returns the result of expression E under the name X, and with the attributes a1 …. renamed to b1 … (rename individual attributes) (assumes that the relational-algebra expression E has arity n) Example:ρ Classlist( sid student ) ( σ uos_code=‘ISYS2120’ (Enrolled) )Basic versus Derived Operators We can distinguish between basic and derived RA operators Only 6 basic operators are required to express everything else: Union ( ∪ ) tuples in relation 1 or in relation 2. Set Difference ( – ) tuples in relation 1, but not in relation 2. Selection ( σ ) selects a subset of rows from relation. Projection ( π ) deletes unwanted columns from relation. Cross-product ( X ) allows us to fully combine two relations. Rename ( ρ ) allows us to rename one field to another name. Additional (derived) operations: intersection, join, division: Not essential, but (very!) useful. Cf. Join: 04a-2104a-22Relational Expressions A basic expression in the relational algebra consists of either one of the following: Variable that refers to a relation of the same name in the database A constant relation Let E1 and E2 be relational-algebra expressions; then the following are all relational-algebra expressions: E1 ∪ E2 E1 – E2 E1 x E2 E1 E2 σP (E1), P is a (complex) predicate on attributes in E1 πS (E1), S is a list consisting of some of the attributes in E1 ρX (E1), x is the new name for the result of E104a-23Equivalence Rules The following equivalence rules hold: Commutation rules1. πA ( σp ( R ) ) = σp ( πA ( R ) )2. R S = S R Association rule1. R (S T) = (R S) T Idempotence rules1. πA ( πB ( R ) ) = πA ( R ) if A ⊆ B2. σp1 ( σp2 ( R )) = σp1 ∧ p2 ( R ) Distribution rules1. πA ( R ∪ S ) = πA ( R ) ∪ πA ( S )2. σP ( R ∪ S ) = σP ( R ) ∪ σP ( S )3. σP ( R S ) = σP (R) S if P only references R4. πA,B(R S ) = πA (R) πB ( S ) if join-attr. in (A ∩ B)5. R ( S ∪ T ) = ( R S ) ∪ ( R T ) These are the basis for the automatic optimisation of relational queries04a-24You should now be able … to understand the Foundations of SQL basic operations of Relational Algebra:Projection, Selection, Rename, Set Operations, Cross Product the meaning of a relational join differences between an equi-join, a theta-join and a natural join to answer a (simple) English question on a given schema with a relational algebra query to translate a relational algebra expression into a SQL query and vice versa… (with the help of the next lecture)Time for a Break…04a-2504a-26Outline Foundations of Declarative Querying Relational Algebra Six basic operations Join operation Composition and equivalence rules More SQL: Nested SubqueriesBased on slides from Kifer/Bernstein/Lewis (2006) “Database Systems”and from Ramakrishnan/Gehrke (2003) “Database Management Systems”,and also including material from Röhm.Many Ways to Write this Query…04a-27SELECT actor_idFROM Film_Actor a, Film fWHERE a.film_id = f.film_idAND f.title = ‘VELVET TERMINATOR’SELECT actor_idFROM Film_ActorWHERE film_id IN (SELECT film_idFROM FilmWHERE title = ‘VELVET TERMINATOR’)SELECT actor_idFROM (SELECT actor_id, titleFROM Film_Actor NATURAL JOIN Film)RWHERE title = ‘VELVET TERMINATOR’SELECT aid AS actor_idFROM (SELECT f.film_id AS f1, a.film_id AS f2, actor_id AS aid, f.title AS titleFROM Film f, Film_Actor a) RWHERE f1=f2 AND title = ‘VELVET TERMINATOR’“Finds the actors (by ID) who played in the film VELVET TERMINATOR.”05-28Nested Subqueries SQL provides a mechanism for the nesting of subquerieshelping in the formulation of complex queries A subquery is a select-from-where expression that is nested within another query. In a condition of the WHERE clause As a “table” of the FROM clauseWithin the HAVING clause A common use of subqueries is to perform tests for set membership, set comparisons, and set cardinality.05-29Example: Nested Queries Find the names of students who have enrolled in ‘ISYS2120’?SELECT name FROM StudentWHERE sid IN ( SELECT sidFROM EnrolledWHERE uos_code=‘ISYS2120’ ) Which students have the same name as a lecturer?SELECT sid, name FROM StudentWHERE name IN ( SELECT nameFROM Lecturer )Subquery is embedded in parentheses. In this case it returns a list that will be used in the WHERE clause of the outer queryThe IN operator will test to see if the SID value of a row is included in the list returned from the subqueryCorrelated vs. Noncorrelated Subqueries Noncorrelated subqueries: Do not depend on data from the outer query Execute once for the entire outer query Correlated subqueries:Make use of data from the outer query Execute once for each row of the outer query Can use the EXISTS operator05-3005-31SELECT nameFROM StudentWHERE sid IN ( SELECT DISTINCT sidFROM Enrolled );These are the only students that have IDs in the Enrolled table2. The outer query executes on the results of the subquery and returns the searched student namesNo reference to data in outer query, so subquery executes once only1. The subquery executes first and returns as intermediate result all student IDs from the Enrolled tableSIDProcessing a Noncorrelated Subquery05-32Correlated Nested Queries With correlated nested queries, the inner subquery depends on the outer query Example:Find all students who have enrolled in lectures given by ‘Einstein’.SELECT DISTINCT nameFROM Student, Enrolled eWHERE Student.sid = e.sid ANDEXISTS ( SELECT 1FROM Lecturers, UnitofStudy uWHERE name = ‘Einstein’ ANDlecturer = empid ANDu.uos_code=e.uos_code )Subqueryrefers toEnrolled05-33Processing a Correlated Subquery1. First join the Student and Enrolled tables;2. get the uos_code of the 1. tuple3. Evaluate the subquery for the current uos_code to check whether it is taught by Einstein4. If yes, include in result.5. Loop to step (2) until whole outer query is checked. Student |>Note: only the students that enrolled in a course taught by Albert Einstein will be included in the final resultsSubquery refers to outer-query data, so executes once for each row of outer query05-34In vs. Exists Function The comparison operator IN compares a value v with a set (or multi-set) of values V, and evaluates to true if v is one of the elements in V A query written with nested SELECT… FROM… WHERE… blocks and using the = or IN comparison operators can always be expressed as a single block query. EXISTS is used to check whether the result of a correlated nested query is empty (contains no tuples) or not05-35In vs. Exists Function Find all students who have enrolled in lectures given by ‘Einstein’.SELECT distinct nameFROM Student JOIN Enrolled E USING (sid)WHERE EXISTS ( SELECT *FROM Lecturer JOIN UnitOfStudy UON (lecturer=empid)WHERE name = ‘Einstein’ ANDU.uos_code = E.uos_code )Query using IN without a subquerySELECT distinct nameFROM StudentWHERE Student.sid IN (SELECT e.sidFROM Enrolled e, Lecturer, UOS uWHERE name = ‘Einstein’AND lecturer = empidAND u.uos_code = e.uos_code)SELECT distinct students.nameFROM Student,Enrolled e,Lecturer,UOS uWHERE Student.sid = e.sidAND lecturer.name = ‘Einstein’AND lecturer = empidAND u.uos_code = e.uos_code05-36Set Comparison Operators in SQL (not) exists clause tests whether a set is (not) empty (true ⇔ R ≠ Ø) (true ⇔ R = Ø) unique clause (note: not supported by Oracle or PostgreSQL) tests whether a subquery has any duplicate tuples in its result all clause tests whether a predicate is true for the whole setF comp ALL R ⇔ ∀ t ∈ R : (F comp t) some clause (any) tests whether some comparison holds for at least one set elementF comp SOME R ⇔ ∃ t ∈ R : (F comp t)where comp can be: , ≥, =, ≠ F is a fixed value or an attribute R is a relation05-37Examples: Set Comparison Find the students with highest marks.SELECT S.sidFROM Student NATURAL JOIN AssessmentWHERE mark >= ALL ( SELECT markFROM AssessmentWHERE uos_code=‘ISYS2120’ ) Find students which never repeated any subjects.SELECT sid, nameFROM StudentWHERE unique ( SELECT uos_codeFROM EnrolledWHERE Enrolled.sid = Student.sid )05-38Examples: Set Comparison (cont’d) SQL does not directly support universal quantification (for all) SQL Work-around: Search predicates of the form “for all” or “ for every” can be formulated using the not exists clause Example: Find courses where all enrolled student already have a grade.SELECT uos_codeFROM UnitOfStudy UWHERE NOT EXISTS( SELECT *FROM Enrolled EWHERE E.uos_code=U.uos_codeand grade is null )Query execution Query execution time could greatly differ on different DBs. Cf Khushi, M; 2015 https://doi.org/10.1002/jcb.2504904a-3904a-40References Kifer/Bernstein/Lewis (2nd edition – 2006) Chapter 5.1one section on RA that covers everything as discussed here in the lecture Ramakrishnan/Gehrke (3rd edition – the ‘Cow’ book (2003)) Chapter 4.2one compact section on RA, including a discussion of relational division Ullman/Widom (3rd edition – 2008) Chapter 2.4a nice and gentle introduction to the basic RA operations,leaves out relational division though Chapters 5.1 and 5.2goes beyond what we cover here in the lecture by extending RA to bags and also introduces grouping, aggregation and sorting operators