Relational databases - structure and design
Relational databases (RDBMS) follow a structured approach where individual entries or records are stored in tables organized into rows and columns. Each record has a unique identifier called a primary key, and the fields in a row represent specific data items associated with that record.
For example, the row ID: 123, Ram Ratan Sastri, 57, Cranfield Gardens, London, W12 represents a record with ID: 123 as the primary key.
Each column corresponds to a field, such as Name or Address. Entities like User or Address are described by their attributes, such as First Name, Street, or Postcode.
The relationship between entities could be described in the database structure as - the User has an address. The multiplicity of the relationship can be established by asking key questions
-
Can one user have many addresses (yes - home address, office address, permanent address, forwarding address), the multiplicity of this relationship therefore is 1 user can have many or n number of addresses.
-
Can one address have many users (yes - a flat/ home may have many occupants or an office block can have several employees using the address, therefore 1 flat can 0 occupants (empty) or have n number of users as well)
Entities can be considered strong or weak depending again on how they are viewed.
- Can an address exist without a user (yes - strong entity)
- Can a user exist without an address (debatable, but probably no, therefore weak entity)
You can have two strong entities and two weak entities where one entity is stronger than the other, often determining strong and weak entities depends on user requirements and logical deductions and conclusions.
Key elements of RDBMS
- Tables: Collections of records organized in rows and columns
- Records: Individual entries in a table
- Primary keys: Unique identifiers for records
- Fields: Specific data points within a record
- Entities: Concepts or objects represented in the database
- Attributes: Characteristics of entities
- Relationships: Connections between entities usually established by foreign keys
- Multiplicities: The connections between entities determined by the minimum participation and the maximum participation, or cardinality, or a record
Data Occurrences in a Table
In a database table, data occurrences refer to the individual rows (or records) that represent specific instances of data. Each row contains values for the columns (or fields) defined in the table schema. For example:
| ID | Name | Age | City |
|---|---|---|---|
| 1 | Alice | 25 | New York |
| 2 | Bob | 30 | Los Angeles |
| 3 | Charlie | 22 | Chicago |
Key Concepts
- Columns: Represent the attributes or properties of the data (e.g.,
ID,Name,Age,City). - Rows: Represent individual occurrences of data (e.g., Alice, Bob, Charlie).
- Primary Key: A unique identifier for each row (e.g.,
IDin the table above).
Understanding relationships and multiplicity in RDBMS
Entities in a database are connected through relationships, which can be described as:
-
One-to-One (1..1): Each instance of Entity A is related to at most one instance of Entity B, and vice versa. Example: A person can have only one passport, and each passport belongs to only one person. This is also referred to as mandatory participation - as both the minimum and maximum instances that the data is required 1:1 on both sides. In the next section we will see how participation can be optional.
-
One-to-Many (1..*): Each instance of Entity A can be related to multiple instances of Entity B, but each instance of Entity B is related to at most one instance of Entity A. Example: A department can have many employees, but each employee belongs to only one department.
-
Many-to-Many (..): Each instance of Entity A can be related to multiple instances of Entity B, and vice versa. Example: Students can enroll in multiple courses, and each course can have multiple students.
Understanding participation and cardinality in multiplicity
When designing a database, two important concepts in the Entity-Relationship Diagram (ERD) are participation and cardinality. These concepts help define the relationships between entities and ensure the database structure accurately represents the real-world scenario.
Participation
Participation refers to whether all instances of an entity are required to participate in a relationship. It is the minimum number of instances to participate in the relationship, participation can be zero, one or many.
-
Total or mandatory participation: Every instance of the entity must participate in the relationship.
Example: If every employee in a company must belong to at least one department, theEmployeeentity has total participation in theworks inrelationship. -
Partial or optional participation: Some instances of the entity may not participate in the relationship. Example: Not all customers of a bank may have a loan. In this case, the
Customerentity has partial participation in thehas loanrelationship. Customers:Loans can be represented as 0:n showing this partial participation of an entity on one side of the relationship.
Where there is no participation possible on both sides for example a teacher may require a computer but not all teachers require computers. The relationship of the teacher:computer is 0:n, equally not every computer is allocated to a teacher computer to teacher could be 0:n. Here the paricitipation is partial on both sides. If the requirement is every teacher has a computer that would become 1:1 and mandatory participation.
Cardinality
Cardinality defines the maximum number of instances of one entity that can or must be associated with instances of another entity. It specifies the “degree” of the relationship. Cardinality is the opposite side of the spectrum to participation in a relationship. It refers to the maximum number of instances of the data required in the relationship.Any requirement above 1 instance of the record is classified as n number or many, represented by the asterisk in ERDs.
To summarize participation and cardinality depend on how the database requirements are outlined and met. Therefore the requirements of the database and the use of its instances determine participation and cardinality of entities in a relationship. Cardinality is important when establishing the mapping rules from a conceptual ERD to a logical ERD and ensuring the right keys are chosen for database tables that are to be built.
Occurrence vs. Multiplicity in Databases
Occurrence
- Definition: Refers to a specific instance of data in a database.
- Example: In a table of
Students, each row represents an occurrence of a student. - Context: Focuses on individual data points or records.
Multiplicity
- Definition: Describes the number of possible relationships between entities in a database.
- Example: In an ER diagram, a
1-to-manyrelationship betweenTeachersandClassesindicates multiplicity. - Context: Focuses on the cardinality of relationships (e.g.,
1:1,1:N,N:M).
Key Differences
| Aspect | Occurrence | Multiplicity |
|---|---|---|
| Focus | Individual data instances | Relationships between entities |
| Representation | Rows in a table | Cardinality in relationships |
| Example | A single student record | 1-to-many teacher-class link |
Preventing redundancy in RDBMS design
Data redundancy refers to the duplication of data within a database. While redundancy can sometimes improve availability, unintentional redundancy due to poor design can lead to inefficiencies.
Causes of data redundancy
- Poor database design.
- Lack of normalization.
- Manual data entry errors.
- Data replication for fault tolerance.
Problems with data redundancy
- Increased storage costs: Duplicate data consumes more space.
- Data inconsistency: Conflicting versions of data can arise.
- Maintenance overheads: Managing redundant data requires extra effort.
Benefits of controlled redundancy
- Improved Data Availability: Ensures data is accessible during failures.
- Faster Data Access: Reduces access time in distributed systems.
How RDBMS reduces redundancy
- Normalization:
- The process of organizing data into tables and defining relationships between them to minimize duplication and redundancy.
- Ensures that each piece of data is stored only once, reducing storage requirements and improving data consistency.
- Primary keys:
- Unique identifiers for records in a table.
- Prevents duplicate entries and ensures that each record can be uniquely identified.
- DBMS tools:
- Built-in mechanisms in relational database management systems (RDBMS) to enforce data integrity and consistency.
- Includes constraints like unique constraints, foreign keys, and checks to avoid redundant data.
- Regular audits:
- Periodic reviews of the database to identify and remove unnecessary duplicates.
- Helps maintain data quality and ensures that redundancy is minimized over time.
What is data normalisation
Data normalization is the process of organizing data to reduce redundancy and improve integrity. It involves structuring data into tables and ensuring logical dependencies.
Goals of data normalization
- Minimize data redundancy.
- Ensure data consistency.
- Simplify database maintenance.
What are normal data forms
- First Normal Form (1NF):
- Ensure all columns contain atomic values.
- Remove duplicate rows.
- Second Normal Form (2NF):
- Achieve ojectives of the first normal form
- Ensure all non-key attributes depend fully on the primary key.
- Third Normal Form (3NF):
- Achieve objectives of the second normal form
- Remove transitive dependencies.
- Boyce-Codd Normal Form (BCNF):
- A stricter version of the third normal form where every determinant is a candidate key.
Steps to normalize data
- Identify the data and its relationships
- Define primary keys for unique identification
- Eliminate redundancy by breaking down tables
- Establish relationships using foreign keys
- Validate against the desired normal form
Relationships Between Data Design and Data Systems
Data design and data systems are closely interconnected. Effective data design ensures that data systems are efficient, scalable, and maintainable. Below are the key aspects of their relationship:
1. Data Normalization
- Design Aspect: Organizing data to reduce redundancy and improve integrity.
- System Impact: Normalized tables reduce storage requirements and improve query performance.
- Example: Splitting a table into two related tables to avoid duplicate data.
2. Relationships Between Tables
- Design Aspect: Defining how tables relate to each other (e.g., one-to-one, one-to-many, many-to-many).
- System Impact: Relationships are implemented using foreign keys, ensuring data consistency.
- Example: A
Customerstable and anOrderstable linked by aCustomerID.
3. Indexing
- Design Aspect: Creating indexes on frequently queried columns.
- System Impact: Improves query performance but may increase storage and write operation costs.
- Example: Indexing the
Namecolumn in a table for faster search.
4. Scalability
- Design Aspect: Designing for future growth (e.g., partitioning, sharding).
- System Impact: Ensures the system can handle large volumes of data and users.
- Example: Partitioning a table by date for efficient querying.
5. Data Integrity
- Design Aspect: Enforcing rules to ensure data accuracy and consistency (e.g., constraints, validations).
- System Impact: Prevents invalid data from being entered into the system.
- Example: A
NOT NULLconstraint on a required column.
6. Data Access Patterns
- Design Aspect: Understanding how data will be accessed and designing accordingly.
- System Impact: Optimizes the system for read-heavy or write-heavy workloads.
- Example: Using a denormalized schema for read-heavy analytics systems.
Conclusion
Relational databases provide a robust framework for organizing and managing data efficiently. By leveraging key concepts like normalization, primary keys, and relationships, RDBMS minimizes redundancy, ensures data consistency, and simplifies maintenance. While controlled redundancy can enhance availability in specific scenarios, unintentional redundancy should be avoided to maintain the integrity and efficiency of the database. A well-designed RDBMS is essential for building reliable, scalable, and maintainable data-driven applications.
The design of data structures and relationships directly impacts the performance, scalability, and reliability of data systems. A well-thought-out data design ensures that the data system can meet current and future requirements efficiently.