What are the goals that data mining attempts to facilitate? / Different Data Mining Tasks

 Goals of Data Mining 

Data mining is typically carried out with some end goals or applications. Broadly speaking, these goals fall into the following classes: prediction, identification, classification, and optimization.

a) Prediction

  • Data mining can show how certain attributes within the data will behave in the future.
  • Examples of predictive data mining include the analysis of buying transactions to predict what consumers will buy under certain discounts, how much sales volume a store will generate in a given period, and whether deleting a product line will yield more profits.
  • In such applications, business logic is coupled with data mining. In a scientific context, certain seismic wave patterns may predict an earthquake with high probability.


b) Identification

  • Data patterns can be used to identify the existence of an item, an event, or an activity.
  • For example, intruders trying to break into a system may be identified by the programs executed, files accessed, and CPU time per session. In biological applications, the existence of a gene may be identified by certain sequences of nucleotide symbols in the DNA sequence.
  • The area known as authentication is a form of identification. It ascertains whether a user is indeed a specific user or one from an authorized class, and involves comparing parameters, images, or signals against a database.


c) Classification

  • Data mining can partition the data so that different classes or categories can be identified based on combinations of parameters.
  •  For example, customers in a supermarket can be categorized into discount-seeking shoppers, shoppers in a rush, loyal regular shoppers, shoppers attached to name brands, and infrequent shoppers. This classification may be used in different analyses of customer buying transactions as a post-mining activity. 
  • Sometimes classification based on common domain knowledge is used as an input to decompose the mining problem and make it simpler. For instance, health foods, party foods, or school lunch foods are distinct categories in the supermarket business. 
  • It makes sense to analyze relationships within and across categories as separate problems. Such categorization may be used to encode the data appropriately before subjecting it to further data mining.


d) Optimization

  •  One eventual goal of data mining may be to optimize the use of limited resources such as time, space, money, or materials and to maximize output variables such as sales or profits under a given set of constraints. 
  • As such, this goal of data mining resembles the objective function used in operations research problems that deal with optimization under constraints.
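As a rough illustration of optimization under constraints, the sketch below searches for the stocking plan that maximizes profit within a fixed amount of shelf space. The product names, per-unit figures, and the function name best_stocking_plan are all hypothetical and chosen only for this example. Real operations research problems would normally use a proper solver rather than exhaustive search.

```python
from itertools import product

# Hypothetical per-unit figures for two products (illustrative only).
PROFIT = {"A": 5, "B": 4}   # profit per unit
SPACE = {"A": 3, "B": 2}    # shelf space used per unit
CAPACITY = 12               # total shelf space available (the constraint)


def best_stocking_plan():
    """Exhaustively search unit counts; return (profit, units_a, units_b)."""
    best = (0, 0, 0)
    for a, b in product(range(CAPACITY + 1), repeat=2):
        # Constraint: the plan must fit into the available shelf space.
        if a * SPACE["A"] + b * SPACE["B"] <= CAPACITY:
            # Objective function: total profit of the plan.
            profit = a * PROFIT["A"] + b * PROFIT["B"]
            if profit > best[0]:
                best = (profit, a, b)
    return best


print(best_stocking_plan())
```

Here the profit expression plays the role of the objective function, and the shelf-space check is the constraint.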


The term data mining is popularly used in a very broad sense. In some situations, it includes statistical analysis and constrained optimization, as well as machine learning. There is no sharp line separating data mining from these disciplines. It is beyond our scope, therefore, to discuss in detail the entire range of applications that make up this vast body of work.


                         OR,

Introduction to Data Mining Tasks / Goals

Data mining tasks/goals can generally be classified into two types based on what a specific task tries to achieve: descriptive tasks and predictive tasks. Descriptive data mining tasks characterize the general properties of the data, whereas predictive data mining tasks perform inference on the available data set to predict how a new data set will behave.


Different Data Mining Tasks

There are a number of data mining tasks/goals such as classification, prediction, time-series analysis, association, clustering, summarization, etc. All these tasks are either predictive data mining tasks or descriptive data mining tasks. A data mining system can execute one or more of the above-specified tasks as part of data mining.





  • Predictive data mining tasks build a model from the available data set that helps predict unknown or future values of another data set of interest. A medical practitioner trying to diagnose a disease based on a patient's medical test results is an example of a predictive data mining task.
  • Descriptive data mining tasks usually find patterns that describe the data and derive new, significant information from the available data set. A retailer trying to identify products that are purchased together is an example of a descriptive data mining task.


a) Classification

  • Classification derives a model to determine the class of an object based on its attributes. A collection of records is available, each with a set of attributes. One of the attributes is the class attribute, and the goal of the classification task is to assign a class to each new record as accurately as possible.
  • Classification can be used in direct marketing, that is, to reduce marketing costs by targeting the set of customers who are likely to buy a new product. Using the available data, it is possible to know which customers purchased similar products in the past and which did not. Hence, the {purchase, don’t purchase} decision forms the class attribute in this case. Once the class attribute is assigned, demographic and lifestyle information about customers who purchased similar products can be collected, and promotional emails can be sent to them directly.
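To make the idea of assigning a class to a new record concrete, here is a minimal sketch using a 1-nearest-neighbour rule. This is only one of many possible classification techniques. The training records, the two attributes (age and income), and the function name classify are assumptions made up for this illustration.

```python
import math

# Hypothetical training records: (age, yearly income in thousands) -> class label.
TRAINING = [
    ((25, 30), "don't purchase"),
    ((30, 42), "don't purchase"),
    ((45, 80), "purchase"),
    ((50, 95), "purchase"),
]


def classify(record):
    """Return the class label of the closest training record (1-nearest neighbour)."""
    nearest = min(TRAINING, key=lambda item: math.dist(item[0], record))
    return nearest[1]


# A new customer with no known class label.
print(classify((48, 85)))
```

In practice the attributes would usually be scaled first so that one attribute does not dominate the distance.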


b) Prediction

The prediction task predicts the possible values of missing or future data. Prediction involves developing a model based on the available data, and this model is then used to predict values for a new data set of interest. For example, a model can predict the income of an employee based on education, experience, and other demographic factors such as place of residence and gender. Prediction analysis is also used in areas such as medical diagnosis and fraud detection.
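The income example can be sketched with a simple least-squares line fitted to past data and then applied to a new value. The figures below are invented for illustration, only one factor (years of experience) is used, and fit_line is a hypothetical helper name.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical historical data: years of experience and income in thousands.
experience = [1, 3, 5, 7]
income = [32, 40, 48, 56]

slope, intercept = fit_line(experience, income)

# Predict the income of an employee with 10 years of experience.
predicted = slope * 10 + intercept
print(predicted)
```

A realistic model would combine several factors, but the pattern is the same: fit on available data, then predict for new data.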


c) Time-Series Analysis

A time series is a sequence of events or observations ordered in time, where the next event is typically influenced by one or more of the preceding events. A time series reflects the process being measured, and certain components affect the behavior of that process. Time-series analysis includes methods for analyzing time-series data in order to extract useful patterns, trends, rules, and statistics. Stock market prediction is an important application of time-series analysis.
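One simple way to extract a trend from time-series data is a moving average, sketched below. The price values and the window size are made up for this example, and a moving average is only one of many time-series methods.

```python
def moving_average(series, window):
    """Return the average of each run of `window` consecutive values."""
    return [
        sum(series[i:i + window]) / window
        for i in range(len(series) - window + 1)
    ]


# Hypothetical daily closing prices.
prices = [10, 12, 11, 13, 15, 14, 16]

# Smoothing over 3 days hides the day-to-day noise and exposes the upward trend.
trend = moving_average(prices, 3)
print(trend)
```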


d) Association

Association discovers the connections among a set of items, that is, it identifies relationships between objects. Association analysis is used for commodity management, advertising, catalog design, direct marketing, etc. A retailer can identify the products that customers normally purchase together, or even find the customers who respond to promotions of the same kind of products. If a retailer finds that beer and nappies are mostly bought together, the retailer can put nappies on sale to promote the sale of beer.
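Associations of this kind are commonly measured with support (how often the items appear together) and confidence (how often the second item appears when the first does). The sketch below computes both for the beer and nappies example. The transactions are invented, and the function names are hypothetical.

```python
# Hypothetical shopping baskets.
TRANSACTIONS = [
    {"beer", "nappies", "crisps"},
    {"beer", "nappies"},
    {"beer", "bread"},
    {"nappies", "milk"},
    {"beer", "nappies", "milk"},
]


def count(itemset):
    """Number of transactions that contain every item in `itemset`."""
    return sum(1 for basket in TRANSACTIONS if itemset <= basket)


def support(itemset):
    """Fraction of all transactions that contain `itemset`."""
    return count(itemset) / len(TRANSACTIONS)


def confidence(antecedent, consequent):
    """Fraction of transactions with `antecedent` that also contain `consequent`."""
    return count(antecedent | consequent) / count(antecedent)


print(support({"beer", "nappies"}))
print(confidence({"beer"}, {"nappies"}))
```

A rule such as "beer implies nappies" is usually considered interesting only when both values exceed thresholds chosen by the analyst.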


e) Clustering

Clustering is used to identify data objects that are similar to one another. The similarity can be decided based on a number of factors such as purchase behavior, responsiveness to certain actions, geographical location, and so on. For example, an insurance company can cluster its customers based on age, residence, income, etc. This grouping helps the company understand its customers better and hence provide better customized services.
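As a small illustration, the sketch below groups customers by age using a basic k-means procedure on one attribute. The ages, the starting centroids, and the function name k_means_1d are assumptions for this example. A real insurer would cluster on several attributes at once.

```python
def k_means_1d(values, centroids, iterations=10):
    """Basic k-means on single numbers; returns (centroids, clusters)."""
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        # Assignment step: put each value in the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for value in values:
            nearest = min(
                range(len(centroids)),
                key=lambda i: abs(value - centroids[i]),
            )
            clusters[nearest].append(value)
        # Update step: move each centroid to the mean of its cluster.
        centroids = [
            sum(cluster) / len(cluster) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return centroids, clusters


# Hypothetical customer ages and two arbitrary starting centroids.
ages = [22, 25, 27, 58, 61, 64]
centroids, clusters = k_means_1d(ages, centroids=[20, 70])
print(clusters)
```

Unlike classification, no class labels are given in advance; the groups emerge from the similarity of the records themselves.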


f) Summarization

Summarization is the generalization of data. A set of relevant data is summarized, resulting in a smaller set that gives aggregated information about the data. For example, the shopping done by a customer can be summarized into total products, total spending, offers used, etc. Such high-level summarized information can be useful to sales or customer relationship teams for detailed analysis of customer and purchase behavior. Data can be summarized at different abstraction levels and from different angles.
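The customer shopping example can be sketched as a simple aggregation. The purchase records and field names below are hypothetical and serve only to show the idea of reducing detailed data to per-customer totals.

```python
from collections import defaultdict

# Hypothetical detailed purchase records.
PURCHASES = [
    {"customer": "C1", "items": 3, "amount": 40.0, "offer_used": True},
    {"customer": "C2", "items": 1, "amount": 15.5, "offer_used": False},
    {"customer": "C1", "items": 2, "amount": 20.0, "offer_used": False},
]


def summarize(purchases):
    """Aggregate purchases into total items, total spending, and offers used per customer."""
    summary = defaultdict(
        lambda: {"total_items": 0, "total_spent": 0.0, "offers_used": 0}
    )
    for purchase in purchases:
        row = summary[purchase["customer"]]
        row["total_items"] += purchase["items"]
        row["total_spent"] += purchase["amount"]
        row["offers_used"] += int(purchase["offer_used"])
    return dict(summary)


print(summarize(PURCHASES))
```

Grouping by a different key, such as store or month, would summarize the same data from a different angle.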


In summary, the different data mining tasks are the core of the data mining process. The various prediction and classification tasks actually extract the required information from the available data sets.
