How are each of the acid properties fulfilled?
A: undo/redo logging
C: serialisable schedules
I: serialisable schedules and isolation levels
D: redo/undo logging, recoverable schedules
What is query processing used for?
Here we’re taking in queries and making a sequence of operations we can actually do on a database and then executing these on the physical hard disk in the execution engine
What does it mean that SQL queries are declarative
They tell the DBMS what we want, not how to get it
How Does Query Processing Work?
-Optimising by modifying it.
-physical query plan involves choosing between different algorithms
What is a query plan?
A relational algebra expression that is obtained from an SQL Query
- they tell us exactly how to compute the result to a query
How are query plans typically represented?
Why do DBMSs use relational algebra over SQL for query plans?
How do we apply the select operation on a relation R
For each tuple t in R, if t satisfies condition, output t
- this requires reading the whole file
- so is very slow
How is a naive computation of a natural join carried out?
What is the run time of the Naïve Computation of Joins on relations S and R
|R| * |S| = number of tuples in R times number of tuples in S
What are equijoins?
Equijoins are just a variant of natural joins, but equijoins allow you to have two columns with the same output whereas natural join removes one of them
What is the run time of an equijoin?
What are the rules for doing an equijoin?
What happens if we’re doing the equijoin of R and S and column A has a lot of duplicated?
The size of the output will be very big (not efficient)
What sorting algorithm do we use to sort R by A and S by B and how does this affect the run time?
What is the run time of an equijoin (including sorting) if there’s not too many duplicates in A?
O(|R|log2|R| + |S|log2|S|)
- we can leave out the size of the output because this is linear so will be insignificant as the size of R and S increase (only if there aren’t too many duplicate rows in A)
Which algorithm is typically faster? merging sort algorithm or nested loop?
Merging sort algorithm
- exponential as opposed to quadratic
Why is it important that we take into account that relations are stored on the hard disk?
Can we read an entire relation in a faster way?
Yes, selection can be done faster if we know where to find the rows for G401
- this can be done by sorting and indexing
What does an index do?
an index addresses one or more attributes and corresponds to some table.
What is a primary index?
It points to the location of record on the disk and the data is sorted based on value
- typically the primary index is exactly what you define with primary keys
What is a secondary index?
What is the run time in regards to searching for an item in the table of indexes?
What is the run time for then going through each row that is listed in the index?