Explain insertion, deletion and modification anomalies with suitable examples

This post will give a short and simple explanation of database normalization and other aspects of it. As the saying has it, “nobody is perfect!” and I am not an exception, so please feel free to comment out any mistakes that I might make.

Note: The realm of Database is huge and shouldn’t be underestimated based on this post.

What do we need Normalization for? To solve these issues:

Anomalies

1- Update Anomaly: Let say we have 10 columns in a table out of which 2 are called employee Name and employee address. Now if one employee changes it’s location then we would have to update the table. But the problem is, if the table is not normalized one employee can have multiple entries and while updating all of those entries one of them might get missed.

2- Insertion Anomaly: Let’s say we have a table that has 4 columns. Student ID, Student Name, Student Address and Student Grades. Now when a new student enroll in school, even though first three attributes can be filled but 4th attribute will have NULL value because he doesn't have any marks yet.

3- Deletion Anomaly: This anomaly indicates unnecessary deletion of important information from the table. Let’s say we have student’s information and courses they have taken as follows (student ID,Student Name, Course, address). If any student leaves the school then the entry related to that student will be deleted. However, that deletion will also delete the course information even though course depends upon the school and not the student.

Normalization try to bring the tables to granular state where these issues can be avoided. In simple words it tries to split tables into multiple tables and defines relationships between them using keys.

Important Keys

a) Primary Key: This key uniquely identify each entry in a table. This value cannot be repeated inside a table and cannot hold null values. Generally first columns is defined as primary key. Example (Student ID).

b) Foreign Key: This key can have repetitive values, but to uniquely identify each entry the table can still have primary key column separate of foreign key column. However, the foreign key will create a relation with another table where those values are defined as primary keys.

c) Compound Key: This is the methods of defining multiple columns as primary key. Situations where no column have unique values in a table, we can define a combination of two or more than two columns as unique and set it as primary key. For example: (Student Name, Address, Marks, …, etc.) Here it is likely that student can have same names, therefore we define combination of student name and address as primary key. Now it is more unlikely that there can be student with same name and same address.

d) Candidate Key: In simple words a candidate key is a key that can also serve as a primary key. For example: (Student ID, Student Roll No., Address, Marks) Here student id is primary key because it does not have repetitive value, does not have null values. However, student roll number also holds all the properties of primary key and thus considered as candidate key.

f) Surrogate Key: This means an artificially created value that uniquely identify each entry in a table when no other column was able to hold properties of a primary key. It is an additional column and generally holds integer values.

Remember, we cannot make any changes to table which have its primary key work as foreign key in another table. In other words, we cannot make changes to a primary key if it is referred to by foreign key in from another table. Or to say, we cannot make changes to primary key of a parent table if it has a child with foreign key referring to parent’s primary key.

Normalization

1NF: The first normal form signifies that each cell of the table must only have single value. Therefore, each intersection of rows and columns must hold atomic values. For example: If we have a column name phone_number than each row for that column must save only single phone number.

2NF: We saw candidate key above and here is where it plays a role. 2NF rule signifies that no non-prime attributes in the table are dependent on any of the candidate key. In simple words, If the table is representative of two different entities then it should be broken down into their own entities. For example: If we have a table (Student ID, Student Name, Course Number, Course Name, Teacher ID, Teacher Name) this is representing information about each student enrolled in each course which is taught by each teacher in school. Since it is a representative of three different entities it must be normalized into 2NF form.

3NF: This rule signifies that tables must be in 2NF form and each table should only contain columns that are non-transitively depended on primary key of their own table. In simple words, if we have a table (Transaction ID, price, quantity, total_sales) here the total sales is the product of price and quantity (price*quantity). Hence sales is transitively depended in Transaction ID which is a primary key here. So each attribute must directly depend upon the primary key.

What are insertion deletion and modification anomalies?

deletion anomaly: occurs when you delete a record that may contain attributes that shouldn't be deleted. functional dependency (FD): describes how individual attributes are related. insertion anomaly: occurs when you are inserting inconsistent information into a table.

What is insertion anomaly with example?

An insertion anomaly is the inability to add data to the database due to the absence of other data. For example, assume Student_Group is defined so that null values are not allowed. If a new employee is hired but not immediately assigned to a Student_Group then this employee could not be entered into the database.

What is modification anomaly example?

Likewise, when a student is deleted from the database due to graduation course data should not be inadvertently lost. These types of problems are examples of modification anomalies. Unexpected side effects that occur when changing the rows of a table with excessive redundancies.

Which of the following is an example of an deletion anomaly?

A deletion anomaly is the unintended loss of data due to deletion of other data. For example, if the student group Beta Alpha Psi disbanded and was deleted from the table above, J. Longfellow and the Accounting department would cease to exist.