How Does Classification Work? / Learning and Testing of Classification
Data classification is a two-step process. The steps are:
1. Building the classifier: This step is also known as the model construction, training, or learning phase. In this step, a model (classifier) is constructed by applying a classification algorithm to a training set made up of database tuples and their associated class labels. The resulting classifier can take the form of a decision tree, a set of if-then rules, a neural network with adjusted weights, mathematical formulae, and so on. For example, consider a class-labeled training dataset of employees with Name, Rank, and Years as input attributes (features) and Tenured as the class attribute, which has two possible values, no and yes. In the learning step, the classification algorithm learns from this training data a model or rule such as IF rank = 'Professor' OR years > 6 THEN tenured = 'yes', as shown in the figure below.
2. Using the classifier for classification: Once the model has been constructed, we can use it for classification only if it is accurate enough for the demands of our application. So, before using the model, we first need to test its accuracy, and for that we need test data. The test data is randomly selected from the general data set and has the same structure as the training data, i.e., it is also already labeled. However, the test data must be independent of the training dataset; otherwise the accuracy estimate will be overly optimistic, because a model that has over-fitted the training data would go undetected. To measure the accuracy of a model, the known label of each test tuple is compared with the label the model assigns to it. The accuracy rate is the percentage of test samples that are correctly classified by the model.
Accuracy = (Number of correct classifications / Total number of test cases) × 100%
If the accuracy calculated in this way is acceptable, the model can then be used to classify new data tuples whose class labels are not known. For example:
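The testing step can be sketched in a few lines of Python. This is only an illustrative sketch: the rule is the one quoted in the learning step, while the employee names, the test tuples, and the function names `classify` and `accuracy` are invented here and do not come from the original figure.

```python
# Hypothetical labeled test tuples: (name, rank, years, tenured).
# All names and values are invented for illustration only.
test_set = [
    ("Ann", "Assistant Prof", 2, "no"),
    ("Bob", "Associate Prof", 7, "no"),
    ("Cara", "Professor", 5, "yes"),
    ("Dev", "Assistant Prof", 7, "yes"),
]


def classify(rank, years):
    """Apply the rule: IF rank = 'Professor' OR years > 6 THEN tenured = 'yes'."""
    return "yes" if rank == "Professor" or years > 6 else "no"


def accuracy(labeled_tuples):
    """Fraction of tuples whose predicted label matches the known label."""
    correct = sum(
        1
        for _, rank, years, label in labeled_tuples
        if classify(rank, years) == label
    )
    return correct / len(labeled_tuples)


acc = accuracy(test_set)
print(f"Accuracy: {acc:.0%}")  # 3 of the 4 test tuples are classified correctly

# If that accuracy is acceptable, classify a new tuple whose label is unknown.
print(classify("Professor", 4))
```

In this made-up test set the rule misclassifies one tuple (an Associate Prof with 7 years who is not tenured), so the estimated accuracy is 75%.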
OR,
Data classification is a two-step process:
(1) Model construction
Training data are analyzed by a classification algorithm, and a classifier is built that describes a predetermined set of data classes or concepts. This step is also called the training phase or learning stage.
(2) Model usage
Test data are used to estimate the accuracy of the classification rules. If the accuracy is considered acceptable, the rules can be applied to the classification of new data tuples.
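To make the two steps concrete, here is a minimal sketch that learns a very simple model from training tuples and then estimates its accuracy on separate test tuples. The learner (one rule per attribute value, predicting the majority class) is chosen here only because it is short; it is not a method prescribed by the text, and the data and the function names `build_classifier`, `predict`, and `estimate_accuracy` are hypothetical.

```python
from collections import Counter, defaultdict


def build_classifier(training_data):
    """Step 1, model construction: map each attribute value to its majority class."""
    counts_by_value = defaultdict(Counter)
    for value, label in training_data:
        counts_by_value[value][label] += 1
    rules = {
        value: counts.most_common(1)[0][0]
        for value, counts in counts_by_value.items()
    }
    # Fallback for attribute values never seen in training: overall majority class.
    default = Counter(label for _, label in training_data).most_common(1)[0][0]
    return rules, default


def predict(model, value):
    """Apply the learned rules to one attribute value."""
    rules, default = model
    return rules.get(value, default)


def estimate_accuracy(model, test_data):
    """Step 2, model usage: fraction of test tuples classified correctly."""
    correct = sum(1 for value, label in test_data if predict(model, value) == label)
    return correct / len(test_data)


# Hypothetical (rank, tenured) tuples, invented for illustration.
training_data = [
    ("Professor", "yes"), ("Professor", "yes"),
    ("Associate Prof", "yes"), ("Associate Prof", "no"), ("Associate Prof", "no"),
    ("Assistant Prof", "no"), ("Assistant Prof", "no"),
]
test_data = [
    ("Professor", "yes"), ("Associate Prof", "no"), ("Assistant Prof", "no"),
    ("Associate Prof", "yes"), ("Lecturer", "no"),
]

model = build_classifier(training_data)
acc = estimate_accuracy(model, test_data)
print(model[0])
print(f"Estimated accuracy: {acc:.0%}")
```

The test tuples are kept apart from the training tuples, so the estimate reflects how the rules behave on data the model has not seen, including a rank ("Lecturer") that never occurred in training.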
OR,
With the help of the bank loan application that we discussed above, let us see how classification works. The data classification process includes two steps:
a) Building the Classifier or Model
b) Using Classifier for Classification
a) Building the Classifier or Model
• This step is the learning step or the learning phase.
• In this step the classification algorithms build the classifier.
• The classifier is built from the training set made up of database tuples and their associated class labels.
• Each tuple in the training set belongs to a predefined category or class, which is given by its class label. These tuples are also referred to as samples, objects, or data points.
b) Using Classifier for Classification
In this step, the classifier is used for classification. Here the test data is used to estimate the accuracy of classification rules. The classification rules can be applied to the new data tuples if the accuracy is considered acceptable.
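A rough sketch of how labeled data might be divided so that the test tuples stay independent of the training tuples, followed by the "acceptable accuracy" check, is given below. The 30% test fraction, the 0.8 threshold, the loan labels "safe" and "risky", and the function names `holdout_split` and `is_acceptable` are all assumptions made for illustration; the acceptable level depends on the application.

```python
import random


def holdout_split(labeled_data, test_fraction=0.3, seed=0):
    """Randomly partition labeled tuples into disjoint training and test sets."""
    shuffled = list(labeled_data)
    random.Random(seed).shuffle(shuffled)  # fixed seed only to make the split repeatable
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def is_acceptable(accuracy, threshold=0.8):
    """Decide whether the estimated accuracy meets the application's requirement."""
    return accuracy >= threshold


# Hypothetical loan tuples: (applicant_id, class_label). Values are invented.
loan_data = [(i, "risky" if i % 3 == 0 else "safe") for i in range(10)]

train, test = holdout_split(loan_data)
print(len(train), len(test))

# Only if the accuracy measured on `test` is acceptable would the classifier
# be applied to new tuples whose labels are unknown.
print(is_acceptable(0.75), is_acceptable(0.9))
```

Because every tuple lands in exactly one of the two sets, no test tuple has been seen during training.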