Explain different approaches for constructing a data-warehouse (i.e top-down approach and the Bottom-up approach)

 Data Warehouse Architecture

  • A data warehouse is a heterogeneous collection of different data sources organized under a unified schema. 
  • There are 2 approaches for constructing a data-warehouse: The top-down approach and the Bottom-up approach are explained below.

1)Top-down approach



The essential components are discussed below:

1. External Sources –

The external source is a source from where data is collected irrespective of the type of data. Data can be structured, semi-structured, and unstructured as well.

2. Stage Area –

Since the data, extracted from the external sources do not follow a particular format, so there is a need to validate this data to load into the data warehouse. For this purpose, it is recommended to use the ETL tool.

• E(Extracted): Data is extracted from an external data source.

• T(Transform): Data is transformed into the standard format.

• L(Load): Data is loaded into the data warehouse after transforming it into the standard format.

3. Data-warehouse –

After cleansing of data, it is stored in the data warehouse as a central repository. It actually stores the metadata and the actual data gets stored in the data marts. Note that the data warehouse stores the data in its purest form in this top-down approach.

4. Data Marts –

Datamart is also a part of the storage component. It stores the information of a particular function of an organization which is handled by a single authority. There can be as many data marts in an organization depending upon the functions. We can also say that the data mart contains a subset of the data stored in the data warehouse.

5. Data Mining –

The practice of analyzing the big data present in the data warehouse is data mining. It is used to find the hidden patterns that are present in the database or in the data warehouse with the help of an algorithm of data mining. This approach is defined by Inmon as – data warehouse as a central repository for the complete organization and data marts are created from it after the complete data warehouse has been created.


Advantages of Top-Down Approach –

1. Since the data marts are created from the data warehouse, provides a consistent dimensional view of data marts.

2. Also, this model is considered the strongest model for business changes. That’s why big organizations prefer to follow this approach.

3. Creating a data mart from a data warehouse is easy.


Disadvantages of Top-Down Approach –

1. The cost, time is taken in designing, and its maintenance are very high.



2) Bottom-up approach:



1. First, the data is extracted from external sources (the same as happens in the top-down approach).

2. Then, the data go through the staging area (as explained above) and is loaded into data marts instead of the data warehouse. The data marts are created first and provide reporting capability. It addresses a single business area.

3. These data marts are then integrated into the data warehouse. This approach is given by Kimball as – data marts are created first and provide a thin view for analyses and a data warehouse is created after complete data marts have been created.


Advantages of Bottom-Up Approach –

1. As the data marts are created first, so the reports are quickly generated.

2. We can accommodate more data marts here and in this way the data warehouse can be extended.

3. Also, the cost and time taken in designing this model are low comparatively.


 Disadvantages of Bottom-Up Approach –

1. This model is not strong as the top-down approach as the dimensional view of data marts is not consistent as it is in the above approach.



Comments

Popular posts from this blog

Suppose that a data warehouse for Big-University consists of the following four dimensions: student, course, semester, and instructor, and two measures count and avg_grade. When at the lowest conceptual level (e.g., for a given student, course, semester, and instructor combination), the avg_grade measure stores the actual course grade of the student. At higher conceptual levels, avg_grade stores the average grade for the given combination. a) Draw a snowflake schema diagram for the data warehouse. b) Starting with the base cuboid [student, course, semester, instructor], what specific OLAP operations (e.g., roll-up from semester to year) should one perform in order to list the average grade of CS courses for each BigUniversity student. c) If each dimension has five levels (including all), such as “student < major < status < university < all”, how many cuboids will this cube contain (including the base and apex cuboids)?

Suppose that a data warehouse consists of the four dimensions; date, spectator, location, and game, and the two measures, count and charge, where charge is the fee that a spectator pays when watching a game on a given date. Spectators may be students, adults, or seniors, with each category having its own charge rate. a) Draw a star schema diagram for the data b) Starting with the base cuboid [date; spectator; location; game], what specific OLAP operations should perform in order to list the total charge paid by student spectators at GM Place in 2004?

Discuss classification or taxonomy of virtualization at different levels.