Explain the different components of data warehouse. How data cube precomputation is performed? Describe.
Components of Data Warehouse
A typical data warehouse has four main components: a central database, ETL (extract, transform, and load) tools, metadata, and access tools. All of these components are engineered for speed so that we can get results quickly and analyze data on the fly.
Figure: Components of data warehouse
The figure shows the essential elements of a typical warehouse. We see the ETL shown on the left. The Data staging element serves as the next building block. In the middle, we see the Data Storage component that handles the data warehouses data. This element not only stores and manages the data; it also keeps track of data using the metadata repository. The Information Delivery component is shown on the right consists of all the different ways of making the information from the data warehouses available to the users. The major four components of Datawarehouse are listed below:
1. Central database
A database serves as the foundation of your data warehouse. Traditionally, these have been standard relational databases running on-premise or in the cloud. But because of Big Data, the mitu need for true, real-time performance, and a drastic reduction in the cost of RAM, in-memory e databases are rapidly gaining in popularity.
2. Data integration
Data is pulled from source systems and modified to align the information for rapid analytical consumption using a variety of data integration approaches such as ETL (extract, transform, load) and ELT as well as real-time data replication, bulk-load processing, data transformation, and data quality and enrichment services.
3. Metadata
Metadata is data about your data. It specifies the source, usage, values, and other features of the datasets in your data warehouse. There are business metadata, which adds context to your data, and technical metadata, which describes how to access data - including where it resides and how it is structured.
4. Data warehouse access tools
Access tools allow users to interact with the data in your data warehouse. Examples of access tools include query and reporting tools, application development tools, data mining tools, and OLAP tools.
Data cube precomputation is performed in the following ways:-
There are three choices for data cube materialization given a base cuboid:
1. No materialization: Do not pre-compute any of the cuboids. This leads to computing expensive multidimensional aggregates on the fly, which can be extremely slow.
2. Full materialization: Pre-compute all of the cuboids. The resulting lattice of computed cuboids is referred to as the full cube. This choice typically requires huge amounts of memory space in order to store all of the pre-computed cuboids.
3. Partial materialization: Selectively compute a proper subset of the whole set of possible cuboids. Alternatively, we may compute a subset of the cube, which contains only those cells that satisfy some user-specified criterion, such as where the tuple count of each cell is above some threshold. We will use the term subcube to refer to the latter case, where only some of the cells may be pre-computed for various cuboids. Partial materialization represents an interesting trade-off between storage space and response time.
Comments
Post a Comment