classify

Classification of feature vectors using a KNN classifier.

The KNN class contains the classifier. Once trained with the train() method, it can classify() new data points. The test() method classifies many vectors at once and returns the classifier's accuracy against a gold standard.

Author: Kjetil Valle <kjetilva@stud.ntnu.no>
class classify.KNN(use_centroids=False, k=5)

K-nearest neighbors classifier.

Classifier for labeled data in feature-vector format. Supports k-nearest classification against trained data samples, and 1-nearest classification against class centroids.
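
The centroid mode can be pictured as replacing each class by the mean of its training vectors and then doing a 1-nearest lookup against those means. The following is a minimal sketch of the centroid step only, assuming NumPy and the features x documents layout used on this page; class_centroids is a hypothetical helper, not part of this module, and the module's own computation may differ.

```python
import numpy as np

def class_centroids(features, labels):
    """Return (classes, centroids) for an N x M feature matrix.

    features: N x M array, one column per sample (hypothetical input).
    labels:   length-M sequence giving the class of each column.
    """
    features = np.asarray(features, dtype=float)
    labels = np.asarray(labels)
    classes = sorted(set(labels.tolist()))
    # Averaging the columns of one class yields a single N-vector for it.
    centroids = np.column_stack(
        [features[:, labels == c].mean(axis=1) for c in classes]
    )
    return classes, centroids

features = np.array([[1.0, 3.0, 0.0, 0.0],
                     [0.0, 0.0, 2.0, 4.0]])
labels = ["a", "a", "b", "b"]
classes, centroids = class_centroids(features, labels)
```

Here centroids has one column per class, in the order given by classes, so it can be used in place of the full training matrix.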

classify(qs, distance_metric='cosine')

Classifies a list of query cases.

Only the active features are used when classifying; all other features are ignored. The set of active features can be changed with set_active_features().

The feature matrix qs has the same layout as the one used in train(), i.e. an NxM matrix where N is the number of features and M the number of documents.

The string distance_metric defines which metric to use when comparing feature vectors.
See http://docs.scipy.org/doc/scipy/reference/spatial.distance.html#scipy.spatial.distance.cdist for list of supported metrics.

Returns classification of each of the input cases.
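
As an illustration of the idea, here is a minimal sketch of k-nearest classification by majority vote under cosine distance. It is not this module's implementation: cosine_distances and knn_classify are hypothetical names, the distance is computed by hand with NumPy rather than through cdist so the steps stay visible, and tie-breaking is an assumption. Note that the matrices here are features x documents, whereas cdist expects one observation per row, so a transpose would be needed when calling it directly.

```python
from collections import Counter

import numpy as np

def cosine_distances(a, b):
    """Pairwise cosine distance between the columns of a (N x Q) and b (N x M).

    Assumes no column is all zeros (its norm would be 0).
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    a_unit = a / np.linalg.norm(a, axis=0, keepdims=True)
    b_unit = b / np.linalg.norm(b, axis=0, keepdims=True)
    return 1.0 - a_unit.T @ b_unit  # shape Q x M

def knn_classify(train, labels, queries, k=5):
    """Label each query column by majority vote among its k nearest training columns."""
    dist = cosine_distances(queries, train)
    results = []
    for row in dist:
        nearest = np.argsort(row, kind="stable")[:k]
        votes = Counter(labels[i] for i in nearest)
        results.append(votes.most_common(1)[0][0])
    return results

train = np.array([[1.0, 0.9, 0.0, 0.1],
                  [0.0, 0.1, 1.0, 0.9]])
labels = ["x", "x", "y", "y"]
queries = np.array([[1.0, 0.2],
                    [0.1, 1.0]])
predicted = knn_classify(train, labels, queries, k=3)
```

With k=3, each query gets two votes from its own cluster and one from the other, so the first query is labeled "x" and the second "y".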

set_active_features(list=None)

Changes the set of active features.

Takes a list of features to make active. This can be either a list of feature indices, or a boolean list with length equal to the number of features, where True marks a feature as active. If None, all features are activated.
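
The two accepted forms can be reduced to a single boolean mask over the feature rows. The sketch below shows one way to do that, assuming NumPy; resolve_active is a hypothetical helper, and telling the two forms apart by element type is an assumption rather than something this page specifies.

```python
import numpy as np

def resolve_active(selection, n_features):
    """Return a boolean mask of length n_features for the given selection."""
    if selection is None:
        # None activates every feature.
        return np.ones(n_features, dtype=bool)
    selection = list(selection)
    is_bool_list = len(selection) == n_features and all(
        isinstance(s, (bool, np.bool_)) for s in selection
    )
    if is_bool_list:
        return np.array(selection, dtype=bool)
    # Otherwise treat the entries as feature indices.
    mask = np.zeros(n_features, dtype=bool)
    mask[np.asarray(selection, dtype=int)] = True
    return mask

by_index = resolve_active([0, 2], 4)
by_flags = resolve_active([True, False, True, False], 4)
everything = resolve_active(None, 4)

# Applying the mask keeps only the active feature rows of an N x M matrix.
features = np.arange(12, dtype=float).reshape(4, 3)
active_rows = features[by_index, :]
```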

test(features, gold)

Tests this classifier against a set of labeled data.

It is assumed that the classifier has been trained before this method is called.

features is an NxM (features x documents) feature matrix, and gold is a list with the label of each document in the feature matrix.

Returns the accuracy of the classifier over the supplied labeled data.
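
Accuracy here is understood as the share of documents whose predicted label matches the gold label. A minimal sketch follows, assuming the value is a fraction between 0 and 1; this page does not say whether test() reports a fraction or a percentage, and accuracy is a hypothetical helper.

```python
def accuracy(predicted, gold):
    """Fraction of positions where the predicted label equals the gold label."""
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must have the same length")
    if not gold:
        return 0.0
    correct = sum(1 for p, g in zip(predicted, gold) if p == g)
    return correct / len(gold)

score = accuracy(["a", "b", "a", "b"], ["a", "b", "b", "b"])
```

Three of the four predictions match the gold labels, so score is 0.75.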

train(features, labels)

Trains the KNN on a set of data.

Takes an NxM feature matrix, features, holding M samples of N features each. See the output of data.read_files().

The list labels holds one label for each of the M samples, in the same order.
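
Because samples are stored as columns, per-document vectors have to be transposed before being passed in. The sketch below builds such a matrix by hand, assuming NumPy; to_feature_matrix and the sample data are hypothetical, and data.read_files() may produce its output differently.

```python
import numpy as np

def to_feature_matrix(doc_vectors):
    """Stack per-document feature vectors as columns of an N x M matrix."""
    return np.array(doc_vectors, dtype=float).T

# Two documents, three features each (made-up values).
doc_vectors = [[1, 0, 2],
               [0, 3, 1]]
labels = ["sports", "politics"]

features = to_feature_matrix(doc_vectors)  # shape (3, 2): N=3 features, M=2 samples

# One label per column is required.
assert features.shape[1] == len(labels)
```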
