- May 15, 2020

墨尔本大学 INFO20003 Assignment3课业解析 题意：查询处理和优化 解析： 第一题：

考虑两张关系表Parts和Supply，分别有60000和150000条记录，每页存储50条记录，二者均没有索引。对于下列查询语句，以磁盘I/O次数（页面数量）作为成本计算的方法，计算下列4种连接方式的成本，分别为Page-oriented Nested Loops Join、Block-oriented Nested Loops Join、Sort-Merge Join、Hash Join，写出成本计算公式：

SELECT *

FROM Parts INNER JOIN Supply

ON Parts.PartID = Supply.PID;

第二题：

考虑下面的关系模式：

Employee (EmpID, firstname, lastname, department, salary)

员工信息表包含1200个页面，每页存储120条记录。对于下列查询语句，分析和估计不同索引方式下查询的成本，clustered B+ tree index on (department, salary)、unclustered B+ tree index on (salary)、 unclustered Hash index

on (department)、unclustered Hash index on (salary）

SELECT *

FROM Employee

WHERE salary > 300,000 AND department = ‘Marketing’;

第三题：

考虑下面三种关系模式：

Employee (eid: integer, salary: integer, name: char(30))

Project (projid: integer, code: char(20), start: date, end: date, eid: integer)

Department (did: integer, projid: integer, budget: real, floor: integer)

对于下列查询语句,分别计算下列4种查询方案的成本，假设Project.projid和employee.salary都有B+树索引。

SELECT e.name, d.projid

FROM Employee e, Project p, Department d

WHERE e.eid = p.eid AND p.projid = d.projid

AND e.salary < 300,000 AND p.code = ‘alpha 340’; 涉及知识点： 嵌套循环连接、哈希连接、排序合并连接更多可加微信讨论微信号IT_51zuoyejunpdf
1INFO20003 Semester 2, 2019Assignment 3 – Query Processing and Query OptimisationDue: 6:00pm Friday 18 OctoberSubmission: Via LMS https://lms.unimelb.edu.auWeighting: 10% of your total assessment. The assignment will be gradedout of 20 marks.Question 1 (5 marks)Consider two relations called Parts and Supply. Imagine that relation Parts has 60,000 tuplesand Supply has 150,000 tuples. Both relations store 50 tuples per page. Consider the followingSQL statement:SELECT *FROM Parts INNER JOIN SupplyON Parts.PartID = Supply.PID;We wish to evaluate an equijoin between Supply and Parts, with an equality conditionParts.PartID = Supply.PID. There are 202 buffer pages available in memory for this operation.Both relations are stored as (unsorted) heap files. Neither relation has any indexes built on it.Consider the alternative join strategies described below and calculate the cost of eachalternative. Evaluate the algorithms using the number of disk I/O's (i.e. pages) as the cost. Foreach strategy, provide the formulae you use to calculate your cost estimates.a) Page-oriented Nested Loops Join. Consider Parts as the outer relation. (1 mark)b) Block-oriented Nested Loops Join. Consider Parts as the outer relation. (1 mark)c) Sort-Merge Join. Assume that Sort-Merge Join can be done in 2 passes. (1 mark)d) Hash Join. (1 mark)e) What would be the lowest possible cost to perform this query, assuming that noindexes are built on any of the two relations, and assuming that sufficient buffer spaceis available? What would be the minimum buffer size required to achieve this cost?Explain briefly. (1 mark)2Question 2 (5 marks)Consider a relation with the following schema:Employee (EmpID, firstname, lastname, department, salary)The Employee relation has 1200 pages and each page stores 120 tuples. The departmentattribute can take one of six values (“Marketing”, “Human Resource”, “Finance”, “PublicRelations”, “Sales and Distribution”, “Operation Management”) and salary can have valuesbetween 100,000 and 500,000 ([100,000, 500,000]).Suppose that the following SQL query is executed frequently using the given relation:SELECT *FROM EmployeeWHERE salary > 300,000 AND department = ‘Marketing’;Your job is to analyse the query plans and estimate the cost of the best plan utilizing theinformation given about different indexes in each part.a) Compute the estimated result size for the query, and the reduction factor of each filter.(1 mark)b) Compute the estimated cost of the best plan assuming that a clustered B+ tree indexon (department, salary) is the only index available. Suppose there are 300 indexpages. Discuss and calculate alternative plans. (1 mark)c) Compute the estimated cost of the best plan assuming that an unclustered B+ treeindex on (salary) is the only index available. Suppose there are 200 index pages.Discuss and calculate alternative plans. (1 mark)d) Compute the estimated cost of the best plan assuming that an unclustered Hash indexon (department) is the only index available. Discuss and calculate alternative plans.(1 mark)e) Compute the estimated cost of the best plan assuming that an unclustered Hash indexon (salary) is the only index available. Discuss and calculate alternative plans.(1 mark)3Question 3 (10 marks)Consider the following relational schema and SQL query. The schema captures informationabout employees, their departments and the projects they are involved in.Employee (eid: integer, salary: integer, name: char(30))Project (projid: integer, code: char(20), start: date, end: date, eid: integer)Department (did: integer, projid: integer, budget: real, floor: integer)Consider the following query:SELECT e.name, d.projidFROM Employee e, Project p, Department dWHERE e.eid = p.eid AND p.projid = d.projidAND e.salary < 300,000 AND p.code = ‘alpha 340’;The system’s statistics indicate that there are 1000 different project code values, and salaryof the employees range from 100,000 to 500,000 ([100,000, 500,000]). There is a total of60,000 projects, 5,000 employees and 20,000 departments in the database. Each relation fits100 tuples in a page. Assume eid is a candidate key for Employee, projid is a candidate keyfor Project, and did is a candidate key for the Department table. Suppose there exists aclustered B+ tree index on (Project.projid) of size 200 pages and suppose there is a clusteredB+ tree index on (employee.salary) of size 10 pages.a) Compute the estimated result size and the reduction factors (selectivity) of this query.(2 marks)b) Compute the cost of the plans shown below. Assume that sorting of any relation (ifrequired) can be done in 2 passes. NLJ is a Page-oriented Nested Loops Join.Assume that 100 tuples of a resulting join between Employee and Project fit in a page.Similarly, 100 tuples of a resulting join between Project and Department fit in a page.If selection over filtering predicates is not marked in the plan, assume it will happenon-the-fly after all joins are performed, as the last operation in the plan.(8 marks, 2 marks per plan)4Formatting Requirements:For each question, present an answer in the following format:• Show the question number before each question. You do not need to include the text of the questionitself.• Show your answer in blue text (please type your answers on a computer).• Start Question 2 and Question 3 on a new page.For each of the calculations, provide all the formulae you used to calculate your cost estimates, andshow your working, not only the result.Employee(Heap scan)Project(Heap scan)Department(Heap scan)Project(Heap scan)Department(Heap scan)Employee(Heap scan)NLJNLJHJSMJ[1] [2][3] [4]Project(Heap scan)Employee(Index scan (salary))Department(Heap scan)NLJHJ