Describe Multi-Way Array Aggregation (Multi-Way) Method for Computing Full Cubes with suitable example.

 The Multi-Way Array Aggregation (Multi-Way) Method for Computing Full Cubes

• It is an array-based "bottom-up" algorithm. The Multi-way Array Aggregation (or simply Multi-Way) method computes a full data cube by using a multidimensional array as its basic data structure. It is a typical MOLAP approach that uses direct array addressing, where dimension values are accessed via the position or index of their corresponding array locations. Hence, Multi-Way cannot perform any value-based reordering as an optimization technique. A different approach is developed for the array-based cube construction, as follows:

• Partition the array into chunks: A chunk is a sub-cube that is small enough to fit into the memory available for cube computation. Chunking is a method for dividing an n-dimensional array into small n-dimensional chunks, where each chunk is stored as an object on a disk. The chunks are compressed so as to remove wasted space resulting from empty array cells.

• Compute aggregates by visiting (i.e., accessing the values at) cube cells: The order in which cells are visited can be optimized so as to minimize the number of times that each cell must be revisited, thereby reducing memory access and storage costs.


Aggregation Strategy

1. Partitions array into chunks

2. Chunk: a small sub-cube that fits in memory

3. Data addressing

4. Uses chunk id and offset

5. Multi-way Aggregation

6. Computes aggregates in multi-way

7. Visits chunks in the order

a. to minimize memory access

b. to minimize memory space


The best traversing order to do multi-way aggregation:

• Method: the planes should be sorted and computed according to their size in ascending order.

• Idea: keep the smallest planes in the main memory, fetch and compute only one chunk at a time for the largest plane

• Limitation of the method: computing well only for a small number of dimensions

• If there are a large number of dimensions, "bottom-up computation”  and iceberg computation methods can be explored.

Example: Consider a 3-D data array containing the three dimensions A, B, and C. The 3-D array is partitioned into small chunks. Here, the array is partitioned into 64 chunks as shown in figure Dimension A is organized into four equal-sized partitions a0, a1, a2, and a3. B is organized into equal-sized partitions and Dimensions C is similarly organized into four partitions each. Chunk 2...... 64 corresponds to the sub-cubes a0b0c0, a1b0c0..., a3b3bc3, respectively.Suppose that the cardinality and size of the array of A, B, and C is 40, 400, and 4000, respectively. Thus, the size of each partition is 10, 100, and 1000, respectively. Full materialization of the corresponding data cube involves the computation of all of the cuboids defining this cube i.e., (apex A, B, C, AB, AC, BC, ABC).




Comments

Popular posts from this blog

Suppose that a data warehouse for Big-University consists of the following four dimensions: student, course, semester, and instructor, and two measures count and avg_grade. When at the lowest conceptual level (e.g., for a given student, course, semester, and instructor combination), the avg_grade measure stores the actual course grade of the student. At higher conceptual levels, avg_grade stores the average grade for the given combination. a) Draw a snowflake schema diagram for the data warehouse. b) Starting with the base cuboid [student, course, semester, instructor], what specific OLAP operations (e.g., roll-up from semester to year) should one perform in order to list the average grade of CS courses for each BigUniversity student. c) If each dimension has five levels (including all), such as “student < major < status < university < all”, how many cuboids will this cube contain (including the base and apex cuboids)?

Discuss classification or taxonomy of virtualization at different levels.

Suppose that a data warehouse consists of the four dimensions; date, spectator, location, and game, and the two measures, count and charge, where charge is the fee that a spectator pays when watching a game on a given date. Spectators may be students, adults, or seniors, with each category having its own charge rate. a) Draw a star schema diagram for the data b) Starting with the base cuboid [date; spectator; location; game], what specific OLAP operations should perform in order to list the total charge paid by student spectators at GM Place in 2004?