What is distributed database? Explain data fragmentation techniques in detail with suitable example.

Distributed database (DDB)

 A distributed database (DDB) is a collection of multiple, logically interrelated databases distributed over a computer network. A distributed database management system (DDBMS) is the software that manages the DDB and provides an access mechanism that makes this distribution transparent to the users. The distributed database (DDB) and distributed database management system (DDBMS) together are called Distributed database systems (DDBS).


Data Fragmentation

Fragmentation is the task of dividing a table into a set of smaller tables. The subsets of the table are called fragments. These fragments may be stored at different locations. Moreover, fragmentation increases parallelism and provides better disaster recovery. Fragmentation can be of three types:

  • Vertical Fragmentation
  • Horizontal Fragmentation
  • Hybrid Fragmentation

Fragmentation should be done in a way so that the original table can be reconstructed from the fragments. This is needed so that the original table can be reconstructed from the fragments whenever required. This requirement is called "reconstructiveness".

Vertical Fragmentation

In vertical fragmentation, the fields or columns of a table are grouped into fragments. In order to maintain re-constructiveness, each fragment should contain the primary key field(s) of the table. Vertical fragmentation can be used to enforce the privacy of data.


Example 4.1: Let us consider that a college management system keeps records of all registered students in a Student table having the following schema.


Now, the address details are maintained in the admin section. In this case, the designer will fragment the database as follows: 

CREATE TABLE Std_address AS

SELECT Stu_id, Stu_address

FROM Student;

By executing the above query, we get the following result:


Horizontal Fragmentation

Horizontal fragmentation groups the tuples of a table in accordance with the values of one or more fields. Horizontal fragmentation should also confirm the rule of reconstructiveness. Each horizontal fragment must have all columns of the original base table.


Example 4.2: In the student schema which is shown in figure 4.7 if the details of all students of department 1 need to be maintained at the respective faculty, then the designer will horizontally. fragment the database as follows:

CREATE TABLE Department AS

SELECT *FROM Student

WHERE Dept_id=1;

By executing the above query, we get the following result:


Hybrid Fragmentation

In hybrid fragmentation, a combination of horizontal and vertical fragmentation techniques is used. This is the most flexible fragmentation technique since it generates fragments with minimal extraneous information. However, reconstruction of the original table is often an expensive task. Hybrid fragmentation can be done in two alternative ways:

1. At first, generate a set of horizontal fragments; then generate vertical fragments from one or more of the horizontal fragments.

2. At first, generate a set of vertical fragments; then generate horizontal fragments from one or more of the vertical fragments.




Comments

Popular posts from this blog

Suppose that a data warehouse for Big-University consists of the following four dimensions: student, course, semester, and instructor, and two measures count and avg_grade. When at the lowest conceptual level (e.g., for a given student, course, semester, and instructor combination), the avg_grade measure stores the actual course grade of the student. At higher conceptual levels, avg_grade stores the average grade for the given combination. a) Draw a snowflake schema diagram for the data warehouse. b) Starting with the base cuboid [student, course, semester, instructor], what specific OLAP operations (e.g., roll-up from semester to year) should one perform in order to list the average grade of CS courses for each BigUniversity student. c) If each dimension has five levels (including all), such as “student < major < status < university < all”, how many cuboids will this cube contain (including the base and apex cuboids)?

Suppose that a data warehouse consists of the four dimensions; date, spectator, location, and game, and the two measures, count and charge, where charge is the fee that a spectator pays when watching a game on a given date. Spectators may be students, adults, or seniors, with each category having its own charge rate. a) Draw a star schema diagram for the data b) Starting with the base cuboid [date; spectator; location; game], what specific OLAP operations should perform in order to list the total charge paid by student spectators at GM Place in 2004?

Discuss classification or taxonomy of virtualization at different levels.