In the previous section (Relational Data Model Formalities) we defined the mathematical notion of the relational model. Now we know how the data can be stored using a relational data model but we do not know what to do with all these tables to retrieve something from the database yet. For example somebody could ask for the names of all suppliers that sell the part 'Screw'. Therefore two rather different kinds of notations for expressing operations on relations have been defined:
The Relational Algebra which is an algebraic notation, where queries are expressed by applying specialized operators to the relations.
The Relational Calculus which is a logical notation, where queries are expressed by formulating some logical restrictions that the tuples in the answer must satisfy.
The Relational Algebra was introduced by E. F. Codd in 1972. It consists of a set of operations on relations:
SELECT (σ): extracts tuples from a relation that satisfy a given restriction. Let R be a table that contains an attribute A. σA=a(R) = {t ∈ R ∣ t(A) = a} where t denotes a tuple of R and t(A) denotes the value of attribute A of tuple t.
PROJECT (π): extracts specified attributes (columns) from a relation. Let R be a relation that contains an attribute X. πX(R) = {t(X) ∣ t ∈ R}, where t(X) denotes the value of attribute X of tuple t.
PRODUCT (×): builds the Cartesian product of two relations. Let R be a table with arity k1 and let S be a table with arity k2. R × S is the set of all k1 + k2-tuples whose first k1 components form a tuple in R and whose last k2 components form a tuple in S.
UNION (∪): builds the set-theoretic union of two tables. Given the tables R and S (both must have the same arity), the union R ∪ S is the set of tuples that are in R or S or both.
INTERSECT (∩): builds the set-theoretic intersection of two tables. Given the tables R and S, R ∩ S is the set of tuples that are in R and in S. We again require that R and S have the same arity.
DIFFERENCE (− or ∖): builds the set difference of two tables. Let R and S again be two tables with the same arity. R - S is the set of tuples in R but not in S.
JOIN (∏): connects two tables by their common attributes. Let R be a table with the attributes A,B and C and let S be a table with the attributes C,D and E. There is one attribute common to both relations, the attribute C. R ∏ S = πR.A,R.B,R.C,S.D,S.E(σR.C=S.C(R × S)). What are we doing here? We first calculate the Cartesian product R × S. Then we select those tuples whose values for the common attribute C are equal (σR.C = S.C). Now we have a table that contains the attribute C two times and we correct this by projecting out the duplicate column.
Example 1-2. An Inner Join
Let's have a look at the tables that are produced by evaluating the steps necessary for a join. Let the following two tables be given:
R: S: A | B | C C | D | E ---+---+--- ---+---+--- 1 | 2 | 3 3 | a | b 4 | 5 | 6 6 | c | d 7 | 8 | 9
First we calculate the Cartesian product R × S and get:
R x S: A | B | R.C | S.C | D | E ---+---+-----+-----+---+--- 1 | 2 | 3 | 3 | a | b 1 | 2 | 3 | 6 | c | d 4 | 5 | 6 | 3 | a | b 4 | 5 | 6 | 6 | c | d 7 | 8 | 9 | 3 | a | b 7 | 8 | 9 | 6 | c | d
After the selection σR.C=S.C(R × S) we get:
A | B | R.C | S.C | D | E ---+---+-----+-----+---+--- 1 | 2 | 3 | 3 | a | b 4 | 5 | 6 | 6 | c | d
To remove the duplicate column S.C we project it out by the following operation: πR.A,R.B,R.C,S.D,S.E(σR.C=S.C(R × S)) and get:
A | B | C | D | E ---+---+---+---+--- 1 | 2 | 3 | a | b 4 | 5 | 6 | c | d
DIVIDE (÷): Let R be a table with the attributes A, B, C, and D and let S be a table with the attributes C and D. Then we define the division as:
R ÷ S = {t ∣ ∀ ts ∈ S ∃ tr ∈ Rsuch that tr(A,B)=t∧tr(C,D)=ts} where tr(x,y) denotes a tuple of table R that consists only of the components x and y. Note that the tuple t only consists of the components A and B of relation R.
Given the following tables
R: S: A | B | C | D C | D ---+---+---+--- ---+--- a | b | c | d c | d a | b | e | f e | f b | c | e | f e | d | c | d e | d | e | f a | b | d | eR ÷ S is derived as
A | B ---+--- a | b e | d
For a more detailed description and definition of the relational algebra refer to [Ullman, 1988] or [Date, 1994].
Example 1-3. A Query Using Relational Algebra
Recall that we formulated all those relational operators to be able to retrieve data from the database. Let's return to our example from the previous section (Operations in the Relational Data Model) where someone wanted to know the names of all suppliers that sell the part Screw. This question can be answered using relational algebra by the following operation:
πSUPPLIER.SNAME(σPART.PNAME='Screw'(SUPPLIER ∏ SELLS ∏ PART))
We call such an operation a query. If we evaluate the above query against the our example tables (The Suppliers and Parts Database) we will obtain the following result:
SNAME ------- Smith Adams
The relational calculus is based on the first order logic. There are two variants of the relational calculus:
The Domain Relational Calculus (DRC), where variables stand for components (attributes) of the tuples.
The Tuple Relational Calculus (TRC), where variables stand for tuples.
We want to discuss the tuple relational calculus only because it is the one underlying the most relational languages. For a detailed discussion on DRC (and also TRC) see Date, 1994 or Ullman, 1988.
The queries used in TRC are of the following form:
x(A) ∣ F(x)where x is a tuple variable A is a set of attributes and F is a formula. The resulting relation consists of all tuples t(A) that satisfy F(t).
If we want to answer the question from example A Query Using Relational Algebra using TRC we formulate the following query:
{x(SNAME) ∣ x ∈ SUPPLIER ∧ ∃ y ∈ SELLS ∃ z ∈ PART (y(SNO)=x(SNO) ∧ z(PNO)=y(PNO) ∧ z(PNAME)='Screw')}
Evaluating the query against the tables from The Suppliers and Parts Database again leads to the same result as in A Query Using Relational Algebra.
The relational algebra and the relational calculus have the same expressive power; i.e. all queries that can be formulated using relational algebra can also be formulated using the relational calculus and vice versa. This was first proved by E. F. Codd in 1972. This proof is based on an algorithm ("Codd's reduction algorithm") by which an arbitrary expression of the relational calculus can be reduced to a semantically equivalent expression of relational algebra. For a more detailed discussion on that refer to Date, 1994 and Ullman, 1988.
It is sometimes said that languages based on the relational calculus are "higher level" or "more declarative" than languages based on relational algebra because the algebra (partially) specifies the order of operations while the calculus leaves it to a compiler or interpreter to determine the most efficient order of evaluation.