Explain the different components of data warehouse. How data cube precomputation is performed? Describe.

Components of Data Warehouse 

A typical data warehouse has four main components: a central database, ETL (extract, transform, and load) tools, metadata, and access tools. All of these components are engineered for speed so that we can get results quickly and analyze data on the fly.




                                              Figure: Components of data warehouse

The figure shows the essential elements of a typical warehouse. We see the ETL shown on the left. The Data staging element serves as the next building block. In the middle, we see the Data Storage component that handles the data warehouses data. This element not only stores and manages the data; it also keeps track of data using the metadata repository. The Information Delivery component is shown on the right consists of all the different ways of making the information from the data warehouses available to the users. The major four components of Datawarehouse are listed below:

1. Central database

A database serves as the foundation of your data warehouse. Traditionally, these have been standard relational databases running on-premise or in the cloud. But because of Big Data, the mitu need for true, real-time performance, and a drastic reduction in the cost of RAM, in-memory e databases are rapidly gaining in popularity.

2. Data integration

Data is pulled from source systems and modified to align the information for rapid analytical consumption using a variety of data integration approaches such as ETL (extract, transform, load) and ELT as well as real-time data replication, bulk-load processing, data transformation, and data quality and enrichment services. 

3. Metadata

Metadata is data about your data. It specifies the source, usage, values, and other features of the datasets in your data warehouse. There are business metadata, which adds context to your data, and technical metadata, which describes how to access data - including where it resides and how it is structured.

4. Data warehouse access tools

Access tools allow users to interact with the data in your data warehouse. Examples of access tools include query and reporting tools, application development tools, data mining tools, and OLAP tools.

 Data cube precomputation is performed in the following ways:-

There are three choices for data cube materialization given a base cuboid:

1. No materialization: Do not pre-compute any of the cuboids. This leads to computing expensive multidimensional aggregates on the fly, which can be extremely slow.

2. Full materialization: Pre-compute all of the cuboids. The resulting lattice of computed cuboids is referred to as the full cube. This choice typically requires huge amounts of memory space in order to store all of the pre-computed cuboids.

3. Partial materialization: Selectively compute a proper subset of the whole set of possible cuboids. Alternatively, we may compute a subset of the cube, which contains only those cells that satisfy some user-specified criterion, such as where the tuple count of each cell is above some threshold. We will use the term subcube to refer to the latter case, where only some of the cells may be pre-computed for various cuboids. Partial materialization represents an interesting trade-off between storage space and response time.


Comments

Popular posts from this blog

Suppose that a data warehouse for Big-University consists of the following four dimensions: student, course, semester, and instructor, and two measures count and avg_grade. When at the lowest conceptual level (e.g., for a given student, course, semester, and instructor combination), the avg_grade measure stores the actual course grade of the student. At higher conceptual levels, avg_grade stores the average grade for the given combination. a) Draw a snowflake schema diagram for the data warehouse. b) Starting with the base cuboid [student, course, semester, instructor], what specific OLAP operations (e.g., roll-up from semester to year) should one perform in order to list the average grade of CS courses for each BigUniversity student. c) If each dimension has five levels (including all), such as “student < major < status < university < all”, how many cuboids will this cube contain (including the base and apex cuboids)?

Suppose that a data warehouse consists of the four dimensions; date, spectator, location, and game, and the two measures, count and charge, where charge is the fee that a spectator pays when watching a game on a given date. Spectators may be students, adults, or seniors, with each category having its own charge rate. a) Draw a star schema diagram for the data b) Starting with the base cuboid [date; spectator; location; game], what specific OLAP operations should perform in order to list the total charge paid by student spectators at GM Place in 2004?

Discuss classification or taxonomy of virtualization at different levels.