Explain the different approaches for constructing a data warehouse (i.e., the top-down approach and the bottom-up approach)

- December 14, 2021

Data Warehouse Architecture

A data warehouse is a heterogeneous collection of different data sources organized under a unified schema.

There are two approaches for constructing a data warehouse: the top-down approach and the bottom-up approach. Both are explained below.

1) Top-down approach:

The essential components are discussed below:

1. External Sources –

An external source is any source from which data is collected, irrespective of the type of data. The data can be structured, semi-structured, or unstructured.

2. Staging Area –

Since the data extracted from the external sources does not follow a particular format, it must be validated before it is loaded into the data warehouse. For this purpose, it is recommended to use an ETL tool.

• E (Extract): Data is extracted from the external data sources.

• T (Transform): Data is transformed into the standard format.

• L (Load): Data is loaded into the data warehouse after it has been transformed into the standard format.
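The three ETL steps above can be sketched in a few lines of Python. This is only a minimal illustration, not how a production ETL tool works: the CSV content, the `orders` table, and the function names `extract`, `transform`, and `load` are all assumptions made for the example, and an in-memory SQLite database stands in for the data warehouse.

```python
import csv
import io
import sqlite3

# Hypothetical raw export from an external source; the columns and values are made up.
RAW_CSV = """order_id,customer,amount,order_date
1, Alice ,120.50,2021-12-01
2,BOB,not-a-number,2021-12-02
3,carol,75,2021-12-03
"""

def extract(raw_text):
    """E: read rows from the external source as dictionaries."""
    return list(csv.DictReader(io.StringIO(raw_text)))

def transform(rows):
    """T: validate each row and convert it to the standard format."""
    clean = []
    for row in rows:
        try:
            amount = float(row["amount"])
        except ValueError:
            continue  # reject rows that fail validation
        customer = row["customer"].strip().title()
        clean.append((int(row["order_id"]), customer, amount, row["order_date"]))
    return clean

def load(rows, conn):
    """L: write the standardized rows into the warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders "
        "(order_id INTEGER PRIMARY KEY, customer TEXT, amount REAL, order_date TEXT)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?, ?)", rows)
    conn.commit()

# An in-memory SQLite database stands in for the data warehouse (an assumption).
warehouse = sqlite3.connect(":memory:")
load(transform(extract(RAW_CSV)), warehouse)
loaded = warehouse.execute(
    "SELECT order_id, customer, amount FROM orders ORDER BY order_id"
).fetchall()
print(loaded)
```

The row with an invalid amount is dropped during the transform step, so only validated data in the standard format reaches the warehouse table.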

3. Data Warehouse –

After cleansing, the data is stored in the data warehouse, which acts as the central repository. It holds the cleansed data together with its metadata, and the data marts are populated from it. Note that in the top-down approach the data warehouse stores the data in its purest, most detailed form.

4. Data Marts –

A data mart is also part of the storage component. It stores the information of a particular function of an organization, which is handled by a single authority. An organization can have as many data marts as it has functions. We can also say that a data mart contains a subset of the data stored in the data warehouse.
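The idea that a data mart is a subset of the warehouse for one function can be sketched as follows. The table `warehouse_facts`, its columns, the `mart_` naming scheme, and the helper `build_data_mart` are hypothetical and exist only for this example; a real data mart would usually live in its own schema or database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical warehouse table covering every function of the organization.
conn.execute(
    "CREATE TABLE warehouse_facts "
    "(id INTEGER PRIMARY KEY, department TEXT, item TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO warehouse_facts (department, item, amount) VALUES (?, ?, ?)",
    [
        ("sales", "laptop", 900.0),
        ("sales", "mouse", 20.0),
        ("hr", "training", 300.0),
        ("finance", "audit", 1500.0),
    ],
)

def build_data_mart(conn, department):
    """Copy the rows of one department into its own data mart table."""
    if not department.isidentifier():
        raise ValueError("department must be a simple name")
    mart = f"mart_{department}"
    conn.execute(f"CREATE TABLE {mart} (id INTEGER PRIMARY KEY, item TEXT, amount REAL)")
    conn.execute(
        f"INSERT INTO {mart} SELECT id, item, amount "
        "FROM warehouse_facts WHERE department = ?",
        (department,),
    )
    return mart

sales_mart = build_data_mart(conn, "sales")
sales_rows = conn.execute(
    f"SELECT item, amount FROM {sales_mart} ORDER BY id"
).fetchall()
print(sales_rows)
```

Because every mart is filled from the same warehouse table, all marts share the same definitions of the data, which is the source of the consistency claimed for the top-down approach.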

5. Data Mining –

Data mining is the practice of analyzing the large volume of data held in the data warehouse. Data mining algorithms are used to find hidden patterns in the database or data warehouse.

Inmon defines the top-down approach as follows: the data warehouse is a central repository for the complete organization, and data marts are created from it after the complete data warehouse has been built.

Advantages of Top-Down Approach –

1. Since the data marts are created from the data warehouse, this approach provides a consistent dimensional view across data marts.

2. This model is also considered the most robust against business changes, which is why big organizations prefer to follow this approach.

3. Creating a data mart from a data warehouse is easy.

Disadvantages of Top-Down Approach –

1. The cost and time required to design and maintain it are very high.

2) Bottom-up approach:

1. First, the data is extracted from external sources (the same as happens in the top-down approach).

2. Then, the data goes through the staging area (as explained above) and is loaded into data marts instead of the data warehouse. The data marts are created first and provide reporting capability. Each data mart addresses a single business area.

3. These data marts are then integrated into the data warehouse. Kimball defines this approach as follows: data marts are created first and provide a thin view for analysis, and the data warehouse is created after the data marts are complete.
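The bottom-up flow described in the steps above can be sketched like this: separate data marts exist first, and the warehouse is assembled from them afterwards. The mart contents, the shared `customer_id` key, and the `integrate` function are assumptions made for the example; the sketch also assumes that every mart identifies customers in the same way, which real marts built independently may not do.

```python
# Hypothetical data marts, each built first for a single business area.
sales_mart = [
    {"customer_id": 1, "metric": "revenue", "value": 920.0},
    {"customer_id": 2, "metric": "revenue", "value": 75.0},
]
support_mart = [
    {"customer_id": 1, "metric": "tickets", "value": 3.0},
]

def integrate(marts):
    """Combine per-area marts into one warehouse keyed by the shared customer_id."""
    warehouse = {}
    for area, rows in marts.items():
        for row in rows:
            customer = warehouse.setdefault(row["customer_id"], {})
            # Prefix each metric with its business area so the origin stays visible.
            customer[f"{area}.{row['metric']}"] = row["value"]
    return warehouse

warehouse = integrate({"sales": sales_mart, "support": support_mart})
print(warehouse)
```

Adding another mart only means passing one more entry to `integrate`, which mirrors how the warehouse is extended mart by mart in this approach.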

Advantages of Bottom-Up Approach –

1. As the data marts are created first, reports can be generated quickly.

2. More data marts can be added over time, and in this way the data warehouse can be extended.

3. The cost and time taken to design this model are comparatively low.

Disadvantages of Bottom-Up Approach –

1. This model is not as strong as the top-down approach, because the dimensional view of the data marts is not as consistent as it is in that approach.
