Databases
Contents
[hide]Terminology of relational model
A database is an organized collection of data. It is the collection of schemas, tables, queries, reports, views, and other objects. https://en.wikipedia.org/wiki/Database
- What is the primary key: title_id
- What is the foreign key: Pub_id
- What is the cardinality: 13
- select count(title_id) as 'Cardinality of the «titles» table' from titles;
- What is its degree: 9
- What is the domain of contract: Smallint(6) NOT NULL
- What is the domain of country in the table Publishers: varchar(15) NOT NULL
Database design
Los siguietes archivos corresponden a las clase del Prof. Greg South at CCT. De estas clases se deriva gran parte del contenido de esta sección. Especialmente el contenido relacionado con la creación de Entity relationship diagrams:
- Media:Clase_del_Prof_Greg_South_at_CCT-Entity_relationship_diagrams1.pdf
- Media:Clase_del_Prof_Greg_South_at_CCT-ER_diagrams-Sample_Solutions.pdf
- Media:Clase_del_Prof_Greg_South_at_CCT-ER_diagrams-Company_database_example.pdf
Methodology
- Gather Requirements
- Conceptual design: Build a model independent of the choice of DBMS
- Logical design: Create the Relational Model
- Physical design: (How the database is stored in hardware) Produce MySQL create statements for each relation
Conceptual design
Entity relationship modelling
- Entity relationship (ER) model is a popular high-level conceptual data model.
- The main component of the model is the Entity relationship diagram.
Entity relationship diagram - ERD
La siguiente página proporciona una aplicación que permite crear ERD: https://www.draw.io/
The E-R diagram is a simple way of representing the data entities being modelled and the relationships between these data entities. It's Composed of entities, attributes and relationships.
Numerous ERD styles exist. The differences between the styles are only on how the entities and their relationships are illustrated. This module focuses on Chen's ERD style.
- Entity: an entity is any object in the system that we want to model and store information about. They are objects or items of interest
- In a University Database we might have: Entity for «Students», «Modules» and «Lecturers».
- Attributes: an attribute is a property of an entity.
- Students attributes such as: «StudentID», «Name» and «Course»
- Relationship: links between entities.
Example 1
Design an ER diagram for such a bookstore:
- The database must store book, author, publisher and warehouse information.
- For every book you must capture the title, isbn, year and price information. The isbn value is unique for a book.
- For every author you must store an id, name, address and the URL of their homepage. Each author can write many books, and each book can have many authors, for example.
- For every publisher you must store an id, name, address, phone number and an URL of their website.
- Books are stored at several warehouses, each of which has a code, address and phone number.
- A book has only one publisher.
- The warehouse stocks many different books. A book may be stocked at multiple warehouses.
- The database records the number of copies of a book stocked at various warehouses.
Con la información proporcionada, debemos identificar the entities, attributes and the relationships between entities.
Entities and attributes:
- book: b_isbn , b_title, b_year, b_price
- author: au_id, au_name, au_address, au_url
- publisher: pu_id, pu_name, pu_address, pu_phone, pu_url
- warehouse: w_code, w_address, w_phone
- num_copies (The database records the number of copies...) (este atributo no ha sido especificado como parte de un entidad en particular)
Relationships:
- Each author can «write» many books, and each book can have many authors: El verbo que en este caso describe la relación entre author y book es «write»
- A book has only one publisher: «publish»
- Books are «stored» at several warehouses: «stores/stocke»
- The warehouse «stocks» many different books. A book may be stocked at multiple warehouses: «stores/stocke»
- The database records the number of copies of a book «stocked» at various warehouses. «stores/stocke»
Example 2: the Company database example
- The company is organized into departments. Each department has a unique name, a unique number, and a particular employee who manages the department. We keep track of the start date when that employee began managing the department. A department may have several locations.
- A department controls a number of projects, each of which has a unique name, a unique number, and a single location.
- The database will store each employee’s name, Social Security number, address, salary, sex (gender), and birth date. An employee is assigned to one department, but may work on several projects, which are not necessarily controlled by the same department. It is required to keep track of the cur- rent number of hours per week that an employee works on each project, as well as the direct supervisor of each employee (who is another employee).
- The database will keep track of the dependents of each employee for insurance purposes, including each dependent’s first name, sex, birth date, and relationship to the employee.
Logical design
Relational Model
Transform the ER Model to Relational Model (logical data model).
Para explicar la tranformación del ER Model en el Relational Model nos vamos a basar en el Example 2: the Company database example
Seven step algorithm to convert the basic ER model constructs into relations:
Step 1: Mapping of regular entity types
- For each regular (strong) entity type E in the ER schema, create a relation R that includes all the simple attributes of E.
- Include only the simple component attributes of a composite attribute. E.g. see name attribute in Employee
- Choose one of the key attributes of E as the primary key for R.
Remember:
- Strong Entity Type - Entity type that is not dependent on some other entity type.
- Weak Entity Type - Entity type that is dependent on some other entity type.
Employee | |||||||
---|---|---|---|---|---|---|---|
Fname | Minit | Lname | Ssn | Bdate | Address | Sex | Salary |
Department | |
---|---|
Dname | Dnumber |
Project | ||
---|---|---|
Pname | Pnumber | Plocation |
Note: The foreign key, relationship attributes and multivalued attributes, if any, are not included yet; they will be added during subsequent steps.
Step 2: Mapping of weak entity type
For each weak entity type:
- Create a relation that includes all simple attributes.
- Include the simple component attributes of any composite attributes.
- Include as a foreign key, the primary key of the owner entity.
- The primary key of the new relation will be the combination of this foreign key and the partial key of the weak entity.
Side note on simple/composite attributes (back on er design):
- It is important to note whether an attribute is simple or composite
- Composite attributes are made up of simple attributes
- E.g, the address attribute can be simple and hold all the details of an address as a single value, such as, ’30 Westland St, Glasgow, G11 93Y’
- However, the address attribute may also represent a composite attribute made up of simple attributes that hold the address details as separate values in the street (30 Westland St’), city (‘Glasgow’), and postcode (‘G11 93Y’)
- The option to represent address details as a simple or composite attribute is determined by the users’ requirements:
- If the user doesn’t need to access the separate components of an address, we represent the address attribute as a simple attribute
- On the other hand if the user does need to access the individual components of an address, we represent the address attribute as being composite, made up of the required simple attributes
In our example, we create the weak entity type DEPENDENT:
Dependent | ||||
---|---|---|---|---|
Essn | Dependent_name | Sex | Bdate | Relationship |
- We include the primary key Ssn of the EMPLOYEE relation -which corresponds to the owner entity type- as a foreign key attribute of DEPENDENT; we rename it Essn, although this is not necessary.
- The primary key of the DEPENDENT relation is the combination {Essn, Dependent_name}, because Dependent_name (also renamed from Name in ER model) is the partial key of DEPENDENT.
Step 3: Mapping of binary 1:1 relationship types
For each binary 1:1 relationship type:
- Choose one of the relations and include the other’s PK in it as a FK.
- Include any simple attributes of the relationship (relationship attributes)
- In our example, we map the 1:1 relationship type MANAGES from our ER diagram by choosing the participating entity type DEPARTMENT to serve in the role of foreign key because its participation in the MANAGES relationship type is total (every department has a manager).
- The participation in the relationship is full on the Department side and partial on the Employee side. For this reason, IT IS BETTER to insert the PK of Employee as a FK in the Department Relation.
- Note: EMPLOYEE participation is partial. DEPARTMENT participation is not clear from the requirements. We question the users, who say that a department must have a manager at all times, which implies total participation. So, every department has a manager
- The participation in the relationship is full on the Department side and partial on the Employee side. For this reason, IT IS BETTER to insert the PK of Employee as a FK in the Department Relation.
- We include the primary key of the EMPLOYEE relation as foreign key in the DEPARTMENT relation and rename it to Mgr_ssn.
- We also include the simple attribute Start_date of the MANAGES relationship type in the DEPARTMENT relation and rename it Mgr_start_date
Department | |||
---|---|---|---|
Dname | Dnumber | Mgr_ssn | Mgr_start_date |
FURTHER NOTE:
- Note that it is possible to include the primary key of department as a foreign key in Employee table instead.
- In our example, this amounts to having a foreign key attribute, say Department_managed in the EMPLOYEE relation, but it will have a NULL value for employee tuples who do not manage a department.
- This would be a bad choice, because if only 2% of employees manage a department, then 98% of the foreign keys would be NULL in this case.
Step 4: Mapping of binary 1 to N relationship types
- For each binary 1 to N relationship in the ER diagram, include, as a foreign key, in the entity on the N side the primary key of the 1 side.
- Include simple attributes of 1:N relationship type as attributes of the relation on the N side.
In our example:
- For WORKS_FOR we include the primary key Dnumber of the DEPARTMENT relation as foreign key in the EMPLOYEE relation and call it Dno
- For SUPERVISION we include the primary key of the EMPLOYEE relation as foreign key in the EMPLOYEE relation itself -because the relationship is recursive- and call it Super_ssn.
Employee | |||||||||
---|---|---|---|---|---|---|---|---|---|
Fname | Minit | Lname | Ssn | Bdate | Address | Sex | Salary | Super_ssn | Dno |
- The CONTROLS relationship is mapped to the foreign key attribute Dnum of PROJECT, which references the primary key Dnumber of the DEPARTMENT relation.
Project | |||
---|---|---|---|
Pname | Pnumber | Plocation | Dnum |
Step 5: Mapping of binary m:n relationship types
- For each binary M:N relationship type, create a new relation:
- Include in the new relation as foreign keys, the primary keys of the two participating entities.
- The primary key of the new relation will be the combination of these two foreign keys.
- Include any simple attributes of M:N relationship type.
- In our example, we map the M:N relationship type WORKS_ON from our ERD, by creating the relation WORKS_ON:
- We include the primary keys of the PROJECT and EMPLOYEE relations as foreign keys in WORKS_ON and rename them Pno and Essn, respectively (renaming is not required; it is a design choice).
- We also include an attribute Hours in WORKS_ON to represent the Hours attribute of the relationship type.
- The primary key of the WORKS_ON relation is the combination of the foreign key attributes {Essn, Pno}.
Works_on | ||
---|---|---|
Essn | Pno | Hours |
Step 6: Mapping of multivalued attributes
- For each multivalued attribute A, create a new relation R:
- This relation R will include an attribute corresponding to A, plus the primary key attribute K -as a foreign key in R- of the relation that represents the entity type OR relationship type that has A as a multivalued attribute.
- The primary key of R is the combination of A and K. If the multivalued attribute is composite, we include its simple components.
- In our example, we create a relation DEPT_LOCATIONS:
- The attribute Dlocation represents the multivalued attribute LOCATIONS of DEPARTMENT, whereas Dnumber -as foreign key- represents the primary key of the DEPARTMENT relation.
- The primary key of DEPT_LOCATIONS is the combination of {Dnumber, Dlocation}.
Dept_locations | |
---|---|
Dnumber | Dlocation |
Step 7: Mapping of n-ary relationship types
- For each n-ary relationship type R, where n > 2 (relationship que envuelven más de dos entities) create a new relationship relation S to represent R:
- Include as foreign key attributes in S the primary keys of the relations that represent the participating entity types.
- Also include any simple attributes of the n-ary relationship type
- Este caso no se encuentra en nuestro ejemplo (Company ER). Considere la siguiente relación:
Offers | |||
---|---|---|---|
course_no | instr_no | semestre_no | room_no |
Resultado
Discussion and summary
ER Model | Relational model |
---|---|
Entity type | Entity relation |
1:1 or 1:N relationship type | Foreign key (or relationship relation) |
M:N relationship type | Relationship relation and tow foreign keys |
n-ary relationship type | Relationship relation and n foreign keys |
Simple attribute | Attribute |
Composite attribute | Set of simple component attributes |
Multivalued attribute | Relation and foreign key |
Value set | Domain |
Key attribute | Primary (or secondary) key |
- One of the main points to note in a relational schema, in contrast to an ER schema, is that relationship types are not represented explicitly; instead, they are represented by having two attributes A and B, one a primary key and the other a foreign key (over the same domain) included in two relations S and T.
- Two tuples in S and T are related when they have the same value for A and B.
Normalisation
Validate the Relational Model using Normalisation.
Normalization is a technique for producing a suitable set of relations that support the data requirements of an nterprise.
A suitable set of relations include
- The minimal number of attributes necessary to support the data requirements of the enterprise;
- Attributes with a close logical relationship are found in the same relation;
- Minimal redundancy with each attribute represented only once with the important exception of attributes that form all or part of foreign keys. (these are essential for the joining of related relations)
Potential benefits of using a database that has a suitable set of relations is that the database will be:
- Easier for the user to access and maintain the data:
- If we minimize data redundancy:
- Updates to the data stored in the database are achieved with a minimal number of operations thus reducing the opportunities for data inconsistencies.
- Take up minimal storage space on the computer - Reduction in the file storage space required by the relations thus minimizing costs.
- If we minimize data redundancy:
We highlight two main approaches for using normalization Approach
- Bottom-up approach: Bottom-up standalone database design technique Approach
- Top-down approach: Validation technique which may have been created using a top-down approach such as ER modeling
No matter which approach is used, the goal is the same – creating a set of well-designed relations that meet the data requirements of the enterprise
Bottom-up vs Top down:
- Bottom-up design methodology (also called design by synthesis) considers the basic relationships among individual attributes as the starting point and uses those to construct relation schemas. This approach is not very popular in practice because it suffers from the problem of having to collect a large number of binary relationships among attributes as the starting point. For practical situations, it is next to impossible to capture binary relationships among all such pairs of attributes.
- In contrast, a top-down design methodology (also called design by analysis) starts with a number of groupings of attributes into relations that exist together naturally, for example, on an invoice, a form, or a report. The relations are then analyzed individually and collectively, leading to further decomposition until all desirable properties are met.
Informal design guidelines for relation schemas
Before discussing the formal theory of relational database design, we discuss informal guidelines that may be used as measures to determine the quality of relation schema design:
- Making sure that the semantics of the attributes is clear in the schema
- Reducing the redundant information in tuples
Imparting clear semantics to attributes in relations
- Guideline 1: Design a relation schema so that it is easy to explain its meaning.
- Do not combine attributes from multiple entity types and relationship types into a single relation.
- Whenever we group attributes to form a relation schema, we assume that attributes belonging to one relation have certain real-world meaning and a proper interpretation associated with them.
- In general, the easier it is to explain the semantics of the relation -or in other words, what a relation exactly means and stands for- the better the relation schema design will be.
- To illustrate this, consider Figure 1, a simplified version of the COMPANY relational database schema and Figure 2, which presents an example of populated relation states of this schema.
- The meaning of the EMPLOYEE relation schema is simple: Each tuple represents an employee, with values for the employee’s name (Ename), Social Security number (Ssn),... and the number of the department that the employee works for (Dnumber).
- The Dnumber attribute is a foreign key that represents an implicit relationship between EMPLOYEE and DEPARTMENT.
Examples of violating guideline 1
- The tuple in the EMP_DEPT relation schema represents a single employee but includes, along with the Dnumber (the identifier for the department he/she works for), additional information-namely, the name (Dname) of the department for which the employee works and the Social Security number (Dmgr_ssn) of the department manager.
- Esta tuple contiene información de dos entidades distintas (Employee and Department). It violates
Guideline 1 by mixing attributes from distinct real-world entities. Esto no es apropiado...
EMP_DEPT | ||||||
---|---|---|---|---|---|---|
Ename | Ssn | Bdate | Address | Dnumber | Dname | Dmgr_ssn |
Data redundancy
- Major aim of relational database design is to group attributes into relations to minimize data redundancy.
Grouping attributes into relation schemas has a significant effect on storage space.
For example, compare the space used by the two relations EMPLOYEE and DEPARTMENT with that for an EMP_DEPT base relation (which is the result of applying the NATURAL JOIN operation to EMPLOYEE and DEPARTMENT).
- In EMP_DEPT, the attribute values pertaining to a particular department (Dnumber, Dname, Dmgr_ssn) are repeated for every employee who works for that department. In contrast, each department’s information appears only once in the DEPARTMENT relation in Figure 2. Only the department number (Dnumber) is repeated in the EMPLOYEE relation for each employee who works in that department as a foreign key.
An anomaly is anything we try to do with a database that leads to unexpected and/or unpredictable results.
These anomalies can be classified into:
- Insertion anomalies,
- Deletion anomalies,
- and Modification (Update) anomalies.
Insertion anomalies
- To insert a new employee tuple into EMP_DEPT, we must include either the attribute values for the department that the employee works for, or NULLs (if the employee does not work for a department as yet).
- For example, to insert a new tuple for an employee who works in department number 5, we must enter all the attribute values of department 5 correctly so that they are consistent with the corresponding values for department 5 in other tuples in EMP_DEPT.
- In the EMPLOYEE and DEPARTMENT design, we do not have to worry about this consistency problem because we enter only the department number in the employee tuple; all other attribute values of department 5 are recorded only once in the database, as a single tuple in the DEPARTMENT relation
- It is difficult to insert a new department that has no employees as yet in the EMP_DEPT relation. The only way to do this is to place NULL values in the attributes for employee. This violates the entity integrity for EMP_DEPT because its primary key Ssn cannot be null.
- Moreover, when the first employee is assigned to that department, we do not need this tuple with NULL values anymore. This problem does not occur in the EMPLOYEE and DEPARTMENT design because a department is entered in the DEPARTMENT relation whether or not any employees work for it, and whenever an employee is assigned to that department, a corresponding tuple is inserted in EMPLOYEE.
Deletion anomalies
- If we delete from EMP_DEPT an employee tuple that happens to represent the last employee working for a particular department, the information concerning that department is lost inadvertently from the database.
- This problem does not occur in the EMPLOYEE and DEPARTMENT design because DEPARTMENT tuples are stored separately.
Modification (Update) anomalies
- In EMP_DEPT, if we change the value of one of the attributes of a particular department -say, the manager of department 5- we must update the tuples of all employees who work in that department; otherwise, the database will become inconsistent.
- If we fail to update some tuples, the same department will be shown to have two different values for manager in different employee tuples, which would be wrong
Storing natural joins of relations leads to an additional problem referred to as Update anomalies.
Normalisation
Ver esta página: https://www.1keydata.com/database-normalization/first-normal-form-1nf.php
pag. 8, 14
FUNCTIONAL DEPENDENCIES Previously, we have dealt with the informal measures of database design. We now introduce a formal tool for analysis of relational schemas that enables us to detect and describe some of the above-mentioned problems in precise terms
The single most important concept in relational schema design theory is that of a functional dependency. Functional dependency describes the relationship between attributes.
For example, if A and B are attributes of relation R, B is functionally dependent on A (denoted A --> B), if each value of A in R is associated with exactly one value of B in R.
An alternative way to describe the relationship between attributes A and B is to say that "A functionally determines B"
When we say: A --> B:
- A and B may each consist of one or more attributes.
- The determinant of a functional dependency refers to the attribute or group of attributes on the left-hand side of the arrow. (For example, A is the determinant of B)
Example functional dependency that holds: Consider the values shown in staffNo and sName attributes of the Staff relation. Based on sample data, the following functional dependencies appear to hold:
- staffNo --> sName
- sName --> staffNo
However, if the values shown in the Staff relation simply represent a set of values at a given moment in time, then we are not so interested in such relationships between attributes. The reason is that we want to identify functional dependencies that hold for all possible values for attributes of a relation as these represent the types of integrity constraints that we need to identify.
If we know the staff number(staffNo) of a member of staff, we can determine the name of the member of staff (sName). However, it is possible for the sName attribute to hold duplicate values for members of staff with the same name, then we would not be able to determine the staff number of some members of staff.
The only functional dependency that remains true for all possible values for the staffNo and sName attributes of the Staff relation is:
- staffNo --> sName
- staffNo functionally determines sName
Characteristics of functional dependencies:
- Determinants should have the minimal number of attributes necessary to maintain the functional dependency with the attribute(s) on the right hand-side.
- This requirement is called full functional dependency.
- So that if A and B are attributes of a relation, B is functionally dependent on A if B is functionally dependent on A, but not on any proper subset of A. [[[WHATTT!!!!]]]
Example partial and full functional dependency:
- Consider the following functional dependency that exists in the Staff relation:
- staffNo, sName --> branchNo
It is correct to say that each value of (staffNo, sName) is associated with a single value of branchNo. However, it is NOT a full functional dependency, because branchNo is also functionally dependent on as subset of (staffNo, sName), namely staffNo.
In other words the functional dependency shown is an example of partial dependency.
Type of functional dependency that we are interested in identifying is a full functional dependency:
- staffNo→ branchNo
Transitive dependency:
So far we have discussed functional dependencies that we are interested in for the purposes of normalization.
However, there is an additional type of functional dependency called a transitive dependency that we need to recognize because its existence in a relation can potentially cause update anomalies.
Transitive dependency describes a condition where A, B, and C are attributes of a relation such that if A → B and B → C, then C is transitively dependent on A via B (provided that A is not functionally dependent on B or C).
Example of transitive functional dependency:
- Consider functional dependencies in the StaffBranch relation
- staffNo --> sName, position, salary, branchNo, bAddress
- branchNo --> bAddress
- Transitive dependency, branchNo --> bAddress exists on staffNo via branchNo
- In other words, the StaffNo attribute functionally determines the bAddress via the branchNo attribute and neither branchNo or bAddress functionally determines staffNo
Remember: In StaffBranch relation there is redundant data, the details of a branch are repeated for every member of staff located at a branch.
Normalization of relations:
- The normalization process, as first proposed by Codd, takes a relation schema through a series of tests to certify whether it satisfies a certain normal form.
- The process, which proceeds in a top-down fashion by evaluating each relation against the criteria for normal forms and decomposing relations as necessary, can thus be considered as relational design by analysis.
- Initially, Codd proposed three normal forms, which he called first (1NF), second (2NF), and third normal form (3NF).
Why normalize: Normalization of data can be considered a process of analyzing the given relation schemas based on their functional dependencies and primary keys to achieve the desirable properties of:
- Minimizing redundancy and
- minimizing the insertion, deletion, and update anomalies discussed
Normalization is often executed as a series of steps. Each step corresponds to a specific normal form.
Normalization approach: 1st Normal Form:
- Each table contains all atomic data items(no repeating values), no repeating groups, and a designated primary key (no duplicated rows)
2nd Normal Form (composite keys):
- In 1NF
- Includes no partial dependencies
- No attribute dependent on a portion of primary key
3rd Normal Form:
- In 2NF
- Contains no transitive dependencies
- Attributes are functionally dependent on a non-key attributes (no non-key values based on other non-key values)
For a relational data model, it is important to recognize that it is only First Normal Form (1NF) that is critical in creating relations; all subsequent normal forms are optional. However, to avoid the update anomalies discussed previously, it is generally recommended that we proceed to at least Third Normal Form (3NF)
1NF - FIRST NORMAL FORM:
A database is in first normal form if it satisfies the following conditions:
- Contains only atomic values:
- An atomic value is a value that cannot be divided. For example, in the table shown below, the values in the [Color] column in the first row can be divided into "red" and "green", hence [TABLE_PRODUCT] is not in 1NF.
- There are no repeating groups:
- A repeating group means that a table contains two or more columns that are closely related. For example, a table that records data on a book and its author(s) with the following columns: [Book ID], [Author 1], [Author 2], [Author 3] is not in 1NF because [Author 1], [Author 2], and [Author 3] are all repeating the same attribute.
2NF – SECOND NORMAL FORM:
What is SQL
SQL (Structured Query Language) is a special-purpose programming language designed for managing data held in a relational database management system (RDBMS), or for stream processing in a relational data stream management system (RDSMS). https://en.wikipedia.org/wiki/SQL
What is a Database management system - DBMS
https://en.wikipedia.org/wiki/Database
A database management system (DBMS) is a computer software application that interacts with the user, other applications, and the database itself to capture and analyze data. A general-purpose DBMS is designed to allow the definition, creation, querying, update, and administration of databases.
Well-known DBMSs include MySQL, PostgreSQL, MongoDB, Microsoft SQL Server, Oracle, Sybase, SAP HANA, and IBM DB2.
A database is not generally portable across different DBMSs, but different DBMS can interoperate by using standards such as SQL and ODBC or JDBC to allow a single application to work with more than one DBMS. Database management systems are often classified according to the database model that they support; the most popular database systems since the 1980s have all supported the relational model as represented by the SQL language.
The DBMS acronym is sometimes extended to indicate the underlying database model, with RDBMS for the relational, OODBMS for the object (oriented), and ORDBMS for the object-relational model. Other extensions can indicate some other characteristic, such as DDBMS for distributed database management systems.
MySQL