Documentation¶

al¶

al.instance_strategies¶

The al.instance_strategies module implements various active learning strategies.

class al.instance_strategies.BaseStrategy(seed=0)¶: Class - Base strategy

class al.instance_strategies.BootstrapFromEach(seed)¶

Class - used if not bootstrapped

bootstrap(pool, y, k=1)¶

Parameters

pool (int) - range of numbers within length of pool
y - None or possible pool
k (int) - 1 or possible bootstrap size

Returns

chosen array of indices
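The class name suggests that the bootstrap set is drawn evenly from each class label. The following is a minimal sketch of that idea, not the library's code. It assumes `pool` is a sequence of indices, `y` holds one label per index, and `k` items are drawn per label. The function name `bootstrap_from_each`, the `seed` argument, and the use of Python's `random` module are illustrative assumptions.

```python
import random
from collections import defaultdict

def bootstrap_from_each(pool, y, k=1, seed=0):
    """Pick up to k indices per class label (hypothetical stand-in)."""
    rng = random.Random(seed)
    # Group the pool indices by their label.
    by_label = defaultdict(list)
    for idx in pool:
        by_label[y[idx]].append(idx)
    chosen = []
    for label in sorted(by_label):
        members = by_label[label]
        # Sample without replacement; a small class contributes all it has.
        chosen.extend(rng.sample(members, min(k, len(members))))
    return chosen

labels = [0, 1, 0, 1, 1, 0]
picked = bootstrap_from_each(range(len(labels)), labels, k=2)
```

With two labels and `k=2`, `picked` holds four distinct indices, two per label.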

class al.instance_strategies.ErrorReductionStrategy(classifier, classifier_args, seed=0, sub_pool=None)¶

Class - used if strategy selected is erreduct, inherits from al.instance_strategies.BaseStrategy

chooseNext(pool, X=None, model=None, k=1, current_train_indices=None, current_train_y=None)¶

Overrides method BaseStrategy.chooseNext

Parameters

pool (int) - range of numbers within length of pool
X - None or pool.toarray()
model - None
k (int) - 1 or step size
current_train_indices - None or array of trained indices
current_train_y - None or train_indices specific to y_pool

Returns

[candidates[i] for i in uis[:k]]

log_loss(probs)¶

Computes log_loss

Parameters

probs

Returns

ll/(len(probs)*1.)
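The return expression `ll/(len(probs)*1.)` indicates an accumulated loss `ll` averaged over the rows of `probs`. How `ll` is accumulated is not documented here. The sketch below assumes each row of `probs` is a predicted class distribution and that `ll` sums the expected log loss (the entropy) of each row, which is a common choice in expected error reduction. The name `average_log_loss` is hypothetical.

```python
import math

def average_log_loss(probs):
    """Average expected log loss over rows of class probabilities (assumed definition)."""
    ll = 0.0
    for row in probs:
        for p in row:
            if p > 0:  # skip zero probabilities, since log(0) is undefined
                ll -= p * math.log(p)
    # Same averaging step as the documented return expression.
    return ll / (len(probs) * 1.)

confident = average_log_loss([[1.0, 0.0]])
uncertain = average_log_loss([[0.5, 0.5]])
```

Under this assumption a fully confident prediction contributes zero loss, and a 50/50 prediction over two classes contributes `log(2)`.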

class al.instance_strategies.LogGainStrategy(classifier, classifier_args, seed=0, sub_pool=None)¶

Class - used if strategy selected is loggain, inherits from al.instance_strategies.BaseStrategy

chooseNext(pool, X=None, model=None, k=1, current_train_indices=None, current_train_y=None)¶

Overrides method BaseStrategy.chooseNext

Parameters

pool (int) - range of numbers within length of pool
X - None or pool.toarray()
model - None
k (int) - 1 or step size
current_train_indices - None or array of trained indices
current_train_y - None or train_indices specific to y_pool

Returns

[candidates[i] for i in uis[:k]]

log_gain(probs, labels)¶

Computes log_gain

Parameters

probs, labels

Returns

lg - computed log_gain

class al.instance_strategies.QBCStrategy(classifier, classifier_args, seed=0, sub_pool=None, num_committee=4)¶

Class - used if strategy selected is qbc, inherits from al.instance_strategies.BaseStrategy

chooseNext(pool, X=None, model=None, k=1, current_train_indices=None, current_train_y=None)¶

Overrides method BaseStrategy.chooseNext

Parameters

pool (int) - range of numbers within length of pool
X - None or pool.toarray()
model - None
k (int) - 1 or step size
current_train_indices - None or array of trained indices
current_train_y - None or train_indices specific to y_pool

Returns

[candidates[i] for i in dis[:k]]

vote_entropy(sample)¶

Computes vote entropy.

Parameters

sample

Returns

out (int)
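Vote entropy measures how much a committee disagrees about one instance. The sketch below uses the standard unnormalized definition and assumes `sample` is the list of labels predicted by the committee members for a single instance. The library's exact input format and return type are not documented here, so this is only an illustration.

```python
import math
from collections import Counter

def vote_entropy(sample):
    """Entropy of the committee's vote distribution for one instance."""
    votes = Counter(sample)
    total = float(len(sample))
    out = 0.0
    for count in votes.values():
        share = count / total  # fraction of the committee voting for this label
        out -= share * math.log(share)
    return out

unanimous = vote_entropy([1, 1, 1, 1])
split = vote_entropy([0, 0, 1, 1])
```

A unanimous committee gives zero entropy, and an even split between two labels gives `log(2)`, so instances with higher values are the ones the committee disagrees on most.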

class al.instance_strategies.RandomBootstrap(seed)¶

Class - used if strategy selected is rand

bootstrap(pool, y=None, k=1)¶

Parameters

pool (int) - range of numbers within length of pool
y - None or possible pool
k (int) - 1 or possible bootstrap size

Returns

randS.chooseNext(pool, k=k) - choose next pool

class al.instance_strategies.RandomStrategy(seed=0)¶

Class - used if strategy is rand, inherits from al.instance_strategies.BaseStrategy

chooseNext(pool, X=None, model=None, k=1, current_train_indices=None, current_train_y=None)¶

Overrides method BaseStrategy.chooseNext

Parameters

pool (int) - range of numbers within length of pool
X - None or pool.toarray()
model - None
k (int) - 1 or step size
current_train_indices - None or array of trained indices
current_train_y - None or train_indices specific to y_pool

Returns

[list_pool[i] for i in rand_indices[:k]] - array of random permutations given pool
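The return expression can be read as: shuffle the positions of the pool, then take the first `k`. The sketch below shows this under the assumption that `pool` is any iterable of indices. It uses Python's `random` module with an explicit seed; the library's actual random number generator is not documented here, and the function name is hypothetical.

```python
import random

def choose_next_random(pool, k=1, seed=0):
    """Return k pool entries picked uniformly at random without replacement."""
    rng = random.Random(seed)
    list_pool = list(pool)
    # A random permutation of positions into list_pool.
    rand_indices = list(range(len(list_pool)))
    rng.shuffle(rand_indices)
    return [list_pool[i] for i in rand_indices[:k]]

pool = range(100, 120)
chosen = choose_next_random(pool, k=5, seed=0)
```

Fixing the seed makes the selection repeatable, which matters when averaging several trials.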

class al.instance_strategies.RotateStrategy(strategies)¶

Class - inherits from al.instance_strategies.BaseStrategy

chooseNext(pool, X=None, model=None, k=1, current_train_indices=None, current_train_y=None)¶

Overrides method BaseStrategy.chooseNext

Parameters

pool (int) - range of numbers within length of pool
X - None or pool.toarray()
model - None
k (int) - 1 or step size
current_train_indices - None or array of trained indices
current_train_y - None or train_indices specific to y_pool

Returns

self.strategies[self.counter].chooseNext(pool, X, model, k=k, current_train_indices = current_train_indices, current_train_y = current_train_y)
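The return expression delegates to `self.strategies[self.counter]`, which suggests a round-robin over the given strategies. The sketch below illustrates that pattern. When the counter is advanced and what it starts at are assumptions, and `FixedStrategy` is a hypothetical stand-in used only to make the rotation visible.

```python
class FixedStrategy:
    """Hypothetical stand-in: returns the first k pool items tagged with its name."""
    def __init__(self, name):
        self.name = name

    def chooseNext(self, pool, X=None, model=None, k=1,
                   current_train_indices=None, current_train_y=None):
        return [(self.name, i) for i in list(pool)[:k]]

class RotateSketch:
    """Round-robin delegation over a list of strategies (assumed behavior)."""
    def __init__(self, strategies):
        self.strategies = strategies
        self.counter = -1  # assumption: advanced before each call

    def chooseNext(self, pool, X=None, model=None, k=1,
                   current_train_indices=None, current_train_y=None):
        self.counter = (self.counter + 1) % len(self.strategies)
        return self.strategies[self.counter].chooseNext(
            pool, X, model, k=k,
            current_train_indices=current_train_indices,
            current_train_y=current_train_y)

rotate = RotateSketch([FixedStrategy("a"), FixedStrategy("b")])
used = [rotate.chooseNext(range(3))[0][0] for _ in range(3)]
```

Three consecutive calls use the strategies in the order `a`, `b`, `a`.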

class al.instance_strategies.UncStrategy(seed=0, sub_pool=None)¶

Class - used if strategy selected is unc, inherits from al.instance_strategies.BaseStrategy

chooseNext(pool, X=None, model=None, k=1, current_train_indices=None, current_train_y=None)¶

Overrides method BaseStrategy.chooseNext

Parameters

pool (int) - range of numbers within length of pool
X - None or pool.toarray()
model - None
k (int) - 1 or step size
current_train_indices - None or array of trained indices
current_train_y - None or train_indices specific to y_pool

Returns

[candidates[i] for i in uis[:k]]
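The return expression picks the first `k` entries of an ordering `uis` over the candidates. For uncertainty sampling, a natural ordering is by how unsure the model is. The sketch below assumes the uncertainty score is the largest predicted class probability, so a lower value means a less confident prediction, and takes the probabilities as a plain list instead of calling a model. The library's actual scoring rule is not documented here, and the function name is hypothetical.

```python
def choose_most_uncertain(candidates, probs, k=1):
    """Return the k candidates whose top class probability is lowest."""
    # uis: positions sorted from least to most confident prediction.
    uis = sorted(range(len(candidates)), key=lambda i: max(probs[i]))
    return [candidates[i] for i in uis[:k]]

candidates = [10, 11, 12]
probs = [[0.9, 0.1], [0.55, 0.45], [0.7, 0.3]]
chosen = choose_most_uncertain(candidates, probs, k=2)
```

Here candidate 11 is selected first because its prediction is closest to 50/50, followed by candidate 12.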

al.learning_curve¶

The al.learning_curve module implements the methods needed to run a given active learning strategy.

class al.learning_curve.LearningCurve¶

Class - run multiple trials or run trials one at a time

run_trials(X_pool, y_pool, X_test, y_test, al_strategy, classifier_name, classifier_arguments, bootstrap_size, step_size, budget, num_trials)¶

Runs a given active learning strategy for multiple trials and returns the average performance.

Parameters

X_pool - returned from load_svmlight_file
y_pool - returned from load_svmlight_file
X_test - returned from load_svmlight_file
y_test - returned from load_svmlight_file
al_strategy - Represents a list of strategies for choosing the next samples (default - rand).
classifier_name - Represents the classifier that will be used (default - MultinomialNB).
classifier_arguments - Represents the arguments that will be passed to the classifier (default - ‘’).
bootstrap_size - Sets the bootstrap size (default - 10).
step_size - Sets the step size (default - 10).
budget - Sets the budget (default - 500).
num_trials - Number of trials (default - 10).

Returns

(values, avg_accu, avg_auc) - training sizes and the corresponding average accuracy and average AUC
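The relationship between `bootstrap_size`, `step_size`, `budget`, and the averaged result can be sketched as follows. The sketch assumes that training starts at `bootstrap_size` labeled instances, grows by `step_size` per iteration, and stops once `budget` is reached (inclusive), and that per-trial scores are averaged point by point. Both helper names are hypothetical.

```python
def training_sizes(bootstrap_size=10, step_size=10, budget=500):
    """Training-set sizes at which performance is measured (assumed schedule)."""
    return list(range(bootstrap_size, budget + 1, step_size))

def average_over_trials(trial_scores):
    """Average scores position by position across trials of equal length."""
    num_trials = len(trial_scores)
    return [sum(column) / float(num_trials) for column in zip(*trial_scores)]

sizes = training_sizes()
# Two made-up trials measured at two training sizes each.
averaged = average_over_trials([[0.5, 1.0], [1.0, 0.0]])
```

With the documented defaults this schedule gives the sizes 10, 20, ..., 500.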

front_end.cl¶

The front_end.cl.run_al_cl module implements the methods needed to run the command-line interface.

class front_end.cl.run_al_cl.cmd_parse¶

Class - command line parser

assign_args()¶: Assigns values to each of the specified command line arguments for use by al.learning_curve

main()¶: Calls retrieve_args, assign_args, run_al

retrieve_args()¶: Adds arguments to the parser for each respective setting of the command line interface

run_al()¶: Calls al.learning_curve.LearningCurve and draws plots using utils.utils
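The flags used in the command-line example in this section (`-c`, `-nt`, `-st`, `-bs`, `-b`, `-sz`, `-sp`, `-d`) could be declared with `argparse` roughly as follows. This is a sketch, not the actual `retrieve_args` implementation. The `dest` names are hypothetical, the defaults are taken from the `run_trials` parameter list, treating `-st` as a list is an assumption, and the file names `pool.svm` and `test.svm` are placeholders.

```python
import argparse

def build_parser():
    """Declare the flags seen in the documented example (names are assumptions)."""
    parser = argparse.ArgumentParser(description="Sketch of the run_al_cl flags")
    parser.add_argument("-c", dest="classifier", default="MultinomialNB")
    parser.add_argument("-st", dest="strategies", nargs="+", default=["rand"])
    parser.add_argument("-nt", dest="num_trials", type=int, default=10)
    parser.add_argument("-bs", dest="bootstrap_size", type=int, default=10)
    parser.add_argument("-sz", dest="step_size", type=int, default=10)
    parser.add_argument("-b", dest="budget", type=int, default=500)
    parser.add_argument("-sp", dest="sub_pool", type=int, default=None)
    parser.add_argument("-d", dest="data", nargs="+")  # one or two dataset paths
    return parser

command = ("-c MultinomialNB -nt 5 -st rand -bs 10 -b 500 "
           "-sz 10 -sp 250 -d pool.svm test.svm")
args = build_parser().parse_args(command.split())
```

Parsing the example command this way yields `num_trials=5`, `budget=500`, `sub_pool=250`, and two data paths.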

front_end.cl.run_al_cl.load_data(dataset1, dataset2=None)¶

Loads the dataset(s) given in the svmlight / libsvm format and assumes a train/test split

Parameters

dataset1 (str) - Path to the file of the first dataset.
dataset2 (str or None) - If not None, path to the file of second dataset

Returns

(X_pool, X_test, y_pool, y_test) - Pool and test files
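In the svmlight / libsvm format, each line holds a label followed by sparse `index:value` pairs. The `run_trials` parameters are described as returned from `load_svmlight_file`, presumably the scikit-learn loader, which handles this parsing. The standard-library sketch below only illustrates the line layout; it is not how `load_data` is implemented and it ignores optional fields such as `qid`.

```python
def parse_svmlight_line(line):
    """Parse one 'label index:value ...' line into (label, {index: value})."""
    # Drop an optional trailing comment, then split on whitespace.
    body = line.split("#", 1)[0].split()
    label = float(body[0])
    features = {}
    for token in body[1:]:
        index, value = token.split(":", 1)
        features[int(index)] = float(value)
    return label, features

label, features = parse_svmlight_line("1 3:0.5 10:2 # example row")
```

Only the non-zero features are listed on each line, which is why the loaded matrices are sparse.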

Examples¶

The following code runs front_end.cl.run_al_cl with the following parameters:

number of trials - 5

strategy - rand

bootstrap - 10

budget - 500

step size - 10

subpool - 250

data paths - ../../../data/imdb-binary-pool-mindf5-ng11 ../../../data/imdb-binary-test-mindf5-ng11
python run_al_cl.py -c MultinomialNB -nt 5 -st rand -bs 10 -b 500 -sz 10 -sp 250 -d ../../../data/imdb-binary-pool-mindf5-ng11 ../../../data/imdb-binary-test-mindf5-ng11
The output of this code is:

Status:
Loading took 17.88s.

trial 0
trial 1
trial 2
trial 3
trial 4
Data output is placed in a file in your current working directory. The default filename is avg_results.txt.

Sample Data Output:
rand
accuracy
train size,mean
10,0.557016
20,0.538432
30,0.534664
40,0.575320
50,0.651672
60,0.621416
70,0.670400
80,0.645680
90,0.659520
100,0.610160
110,0.658024
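The sample output above can be read back for further analysis. The sketch below assumes the layout shown: a strategy line, a metric line, a `train size,mean` header, then comma-separated rows. It also assumes a single block per file; the real `avg_results.txt` may contain more blocks. The function name is hypothetical.

```python
def parse_results(text):
    """Parse one results block into (strategy, metric, [(train_size, mean), ...])."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    strategy, metric = lines[0], lines[1]
    rows = []
    for line in lines[3:]:  # lines[2] is the 'train size,mean' header
        size, mean = line.split(",")
        rows.append((int(size), float(mean)))
    return strategy, metric, rows

sample = """rand
accuracy
train size,mean
10,0.557016
20,0.538432
30,0.534664
"""
strategy, metric, rows = parse_results(sample)
best_size, best_mean = max(rows, key=lambda row: row[1])
```

For the three rows used here, the highest mean accuracy is at training size 10.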
Plot Image:

front_end.gui¶

The front_end.gui module provides a GUI for running the active learning strategies.

class front_end.gui.run_al_gui.HelperFunctions¶

Class - includes helper functions.

all_combos()¶: Retrieve all possible combinations of classifier and strategy

gray_out()¶: Enables or disables the show_plots checkboxes depending on which classifier-strategy combinations have been run

gray_run()¶: Enables or disables the run checkboxes depending on whether the data has been loaded
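Enumerating every classifier-strategy pair, as `all_combos` is described as doing, is a Cartesian product. The sketch below joins each pair as `classifier-strategy`, following the `MultinomialNB-rand` name that appears in the GUI example's terminal output. Taking the two lists as arguments is an assumption, and the strategy names are the ones mentioned in al.instance_strategies.

```python
from itertools import product

def all_combos(classifiers, strategies):
    """Return every 'classifier-strategy' name (assumed naming scheme)."""
    return ["%s-%s" % (clf, strat) for clf, strat in product(classifiers, strategies)]

combos = all_combos(["MultinomialNB"],
                    ["rand", "unc", "qbc", "erreduct", "loggain"])
```

One classifier and five strategies give five combinations, starting with `MultinomialNB-rand`.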

load_data(dataset1, dataset2=None)¶

Loads the dataset(s) given in the svmlight / libsvm format and assumes a train/test split

Parameters

dataset1 (str) - Path to the file of the first dataset.
dataset2 (str or None) - If not None, path to the file of second dataset

Returns

Pool and test files - X_pool, X_test, y_pool, y_test

open_data(filetype)¶

Sets label values in the GUI and calls front_end.gui.run_al_gui.HelperFunctions.load_data and front_end.gui.run_al_gui.HelperFunctions.gray_run

Parameters

filetype (str) - ‘train’, ‘test’, or ‘single’

class front_end.gui.run_al_gui.Main¶

Integrates all objects together

exit_master()¶: Closes gui

run()¶: Calls front_end.gui.run_al_gui.Main.show_menubar

show_menubar()¶: Configures menubar

class front_end.gui.run_al_gui.MainCanvas(master)¶

Class - creates main canvas (window)

add_alerts()¶: Creates alerts (labels)

add_buttons()¶: Creates buttons; calls front_end.gui.run_al_gui.MainCanvas.show_plots, front_end.gui.run_al_gui.MainCanvas.clear_plots, front_end.gui.run_al_gui.MainCanvas.run, front_end.gui.run_al_gui.MainCanvas.reset, front_end.gui.run_al_gui.MainCanvas.save_auc, front_end.gui.run_al_gui.MainCanvas.save_acc

add_classifier_frame_2(master)¶

Create show_plots classifier frame

Parameters

master - main Tkinter window

add_run_classifier_frame(master)¶

Creates run classifier frame

Parameters

master - main Tkinter window

add_run_strategy_frame(master)¶

Creates run strategy frame

Parameters

master - main Tkinter window

add_strategy_frame_2(master)¶

Create show_plots strategy frame

Parameters

master - main Tkinter window

clean(params_dict)¶

Cleans parameter values

Parameters

params_dict (dict) - parameters to be reset

clear_plots()¶: Clear the plots and show empty plots; calls front_end.gui.run_al_gui.MainCanvas.clean and front_end.gui.run_al_gui.MainCanvas.show_plots

plot_acc(clas_strat, width_org, height_org, savefile)¶

Plots accuracy

Parameters

clas_strat (list) - classifier-strategy combinations
width_org (int) - picture width
height_org (int) - picture height
savefile (str) - path to picture’s save location

plot_auc(clas_strat, width_org, height_org, savefile)¶

Plots auc

Parameters

clas_strat (list) - classifier-strategy combinations
width_org (int) - picture width
height_org (int) - picture height
savefile (str) - path to picture’s save location

reset()¶: Resets the gui; calls front_end.gui.run_al_gui.MainCanvas.clean

run()¶: Calls al.learning_curve.run_trials, utils.utils.assign_plot_params, utils.utils.data_to_py, and front_end.gui.run_al_gui.HelperFunctions.gray_out

save_acc()¶: Saves accuracy plot; calls front_end.gui.run_al_gui.MainCanvas.show_plots

save_auc()¶: Saves auc plot; calls front_end.gui.run_al_gui.MainCanvas.show_plots

show_plots(auc_save=False, acc_save=False)¶

Show the plots; calls front_end.gui.run_al_gui.MainCanvas.plot_acc and front_end.gui.run_al_gui.MainCanvas.plot_auc

Parameters

auc_save (str) - False or path to auc plot’s save location
acc_save (str) - False or path to accuracy plot’s save location

class front_end.gui.run_al_gui.MenuWindow(master)¶

Class - creates the menu bar (including the file and edit menus)

show_editmenu(master)¶

Creates the edit menubar and calls front_end.gui.run_al_gui.ParamsWindow.display_pref

Parameters

master - main Tkinter window

show_filemenu(master)¶

Creates the file menubar and calls front_end.gui.run_al_gui.HelperFunctions.open_data

Parameters

master - main Tkinter window

class front_end.gui.run_al_gui.ParamsWindow¶

Class - shows the parameters window in edit->parameters

check_int(param)¶

Check to make sure the user-defined parameter is a valid integer

Parameters

param (tuple) - user-defined parameter

Returns

True or False - True if the parameter is a valid integer, False otherwise
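A validity check of this kind can be sketched by attempting the conversion and catching the failure. The layout of the `param` tuple is not documented here; the sketch assumes it is a `(name, raw_value)` pair where `raw_value` is the text typed into the entry box.

```python
def check_int(param):
    """Return True if the value in a (name, raw_value) tuple parses as an integer."""
    name, raw_value = param  # assumed tuple layout
    try:
        int(str(raw_value).strip())
    except ValueError:
        return False
    return True

results = [check_int(("budget", "500")),
           check_int(("budget", "5.5")),
           check_int(("budget", "abc"))]
```

Only the first value passes, because `int` rejects decimal and non-numeric strings.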

display_params()¶: Create labels, entry boxes, etc. for the parameters window

display_pref()¶: Display the parameters in a separate window

exit_pref()¶: Close the parameters window

Examples¶

The following provides an in-depth look at a sample run of front_end.gui.run_al_gui:
python run_al_gui.py
GUI Main Window (with all values reset)

Setting up the GUI to reproduce the following command-line run:
python run_al_cl.py -c MultinomialNB -d /home/geet/Dropbox/Research/Bilgic/data/20_newsgroups_train /home/geet/Dropbox/Research/Bilgic/data/20_newsgroups_train hash -nt 5 -st rand -bs 10 -b 500 -sz 10 -sp 250
Choose train and test data files:

Edit parameters to match specified run:

Choose MultinomialNB and rand as the classifier-strategy combination:

Run terminal output:
python run_al_cl.py -pf MultinomialNB-rand -c MultinomialNB -d /home/geet/Dropbox/Research/Bilgic/data/20_newsgroups_train /home/geet/Dropbox/Research/Bilgic/data/20_newsgroups_train hash -nt 5 -st rand -bs 10 -b 500 -sz 10 -sp 250
trial 0
trial 1
trial 2
trial 3
trial 4
Show plots when done: