Documentation

al

al.instance_strategies

The al.instance_strategies module implements various active learning strategies.

class al.instance_strategies.BaseStrategy(seed=0)

Class - Base strategy

class al.instance_strategies.BootstrapFromEach(seed)

Class - used if not bootstrapped

bootstrap(pool, y, k=1)

Parameters

  • pool (int) - range of numbers within length of pool
  • y - None or possible pool
  • k (int) - 1 or possible bootstrap size

Returns

  • chosen array of indices
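The documentation does not say how BootstrapFromEach selects its indices. Going by the class name and the y parameter, one plausible reading is that it draws k instances from each class label. The sketch below illustrates that reading only; the function name bootstrap_from_each and the use of Python's random module are assumptions, not the library's code.

```python
import random
from collections import defaultdict


def bootstrap_from_each(pool, y, k=1, seed=0):
    """Pick up to k indices per class label (assumed behaviour, not confirmed)."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for idx in pool:
        by_label[y[idx]].append(idx)  # group pool indices by their label
    chosen = []
    for label in sorted(by_label):
        members = by_label[label]
        chosen.extend(rng.sample(members, min(k, len(members))))
    return chosen


labels = [0, 1, 0, 1, 1, 0]
picked = bootstrap_from_each(range(len(labels)), labels, k=2)
```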
class al.instance_strategies.ErrorReductionStrategy(classifier, classifier_args, seed=0, sub_pool=None)

Class - used if strategy selected is erreduct, inherits from al.instance_strategies.BaseStrategy

chooseNext(pool, X=None, model=None, k=1, current_train_indices=None, current_train_y=None)

Overrides BaseStrategy.chooseNext

Parameters

  • pool (int) - range of numbers within length of pool
  • X - None or pool.toarray()
  • model - None
  • k (int) - 1 or step size
  • current_train_indices - None or array of trained indices
  • current_train_y - None or train_indices specific to y_pool

Returns

  • [candidates[i] for i in uis[:k]]
log_loss(probs)

Computes log_loss

Parameters

  • probs

Returns

  • ll/(len(probs)*1.)
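The return value above only shows that an accumulated quantity ll is divided by the number of rows in probs. How ll is accumulated is not documented. A minimal sketch, assuming probs is a list of per-instance class-probability vectors and ll sums -p * log(p) over every entry (the expected log loss often used in error reduction), might look like this; the function name is hypothetical.

```python
import math


def average_expected_log_loss(probs):
    """Mean over instances of -sum(p * log(p)); the accumulation rule is an assumption."""
    ll = 0.0
    for row in probs:
        for p in row:
            if p > 0:  # treat 0 * log(0) as 0
                ll -= p * math.log(p)
    return ll / (len(probs) * 1.)


uniform = average_expected_log_loss([[0.5, 0.5]])
certain = average_expected_log_loss([[1.0, 0.0]])
```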
class al.instance_strategies.LogGainStrategy(classifier, classifier_args, seed=0, sub_pool=None)

Class - used if strategy selected is loggain, inherits from al.instance_strategies.BaseStrategy

chooseNext(pool, X=None, model=None, k=1, current_train_indices=None, current_train_y=None)

Overrides BaseStrategy.chooseNext

Parameters

  • pool (int) - range of numbers within length of pool
  • X - None or pool.toarray()
  • model - None
  • k (int) - 1 or step size
  • current_train_indices - None or array of trained indices
  • current_train_y - None or train_indices specific to y_pool

Returns

  • [candidates[i] for i in uis[:k]]
log_gain(probs, labels)

Computes log_gain

Parameters

  • probs, labels

Returns

  • lg - computed log_gain
class al.instance_strategies.QBCStrategy(classifier, classifier_args, seed=0, sub_pool=None, num_committee=4)

Class - used if strategy selected is qbc, inherits from al.instance_strategies.BaseStrategy

chooseNext(pool, X=None, model=None, k=1, current_train_indices=None, current_train_y=None)

Overrides BaseStrategy.chooseNext

Parameters

  • pool (int) - range of numbers within length of pool
  • X - None or pool.toarray()
  • model - None
  • k (int) - 1 or step size
  • current_train_indices - None or array of trained indices
  • current_train_y - None or train_indices specific to y_pool

Returns

  • [candidates[i] for i in dis[:k]]
vote_entropy(sample)

Computes vote entropy.

Parameters

  • sample

Returns

  • out (int)
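For reference, vote entropy in query-by-committee is commonly defined as the entropy of the label distribution voted by the committee members. The sketch below assumes sample is a list containing one predicted label per committee member; the library's actual input layout is not documented here, and the function name is hypothetical.

```python
import math
from collections import Counter


def vote_entropy_sketch(votes):
    """Entropy of the committee's vote distribution for one instance."""
    counts = Counter(votes)
    n = len(votes)
    return -sum((c / n) * math.log(c / n) for c in counts.values())


agree = vote_entropy_sketch([1, 1, 1, 1])     # committee fully agrees
split = vote_entropy_sketch([0, 1, 0, 1])     # committee evenly split
```

Instances with higher vote entropy are the ones the committee disagrees on most, so they would be queried first.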
class al.instance_strategies.RandomBootstrap(seed)

Class - used if strategy selected is rand

bootstrap(pool, y=None, k=1)

Parameters

  • pool (int) - range of numbers within length of pool
  • y - None or possible pool
  • k (int) - 1 or possible bootstrap size

Returns

  • randS.chooseNext(pool, k=k) - choose next pool
class al.instance_strategies.RandomStrategy(seed=0)

Class - used if strategy is rand, inherits from al.instance_strategies.BaseStrategy

chooseNext(pool, X=None, model=None, k=1, current_train_indices=None, current_train_y=None)

Overrides BaseStrategy.chooseNext

Parameters

  • pool (int) - range of numbers within length of pool
  • X - None or pool.toarray()
  • model - None
  • k (int) - 1 or step size
  • current_train_indices - None or array of trained indices
  • current_train_y - None or train_indices specific to y_pool

Returns

  • [list_pool[i] for i in rand_indices[:k]] - array of random permutations given pool
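The return expression above suggests that the pool is turned into a list, a random permutation of its positions is drawn, and the first k entries are kept. A minimal sketch of that idea, assuming a seeded generator from Python's random module (the library may use a different generator):

```python
import random


def choose_random(pool, k=1, seed=0):
    """Return k pool entries chosen via a random permutation of positions."""
    rng = random.Random(seed)
    list_pool = list(pool)
    rand_indices = list(range(len(list_pool)))
    rng.shuffle(rand_indices)  # random permutation of positions in the pool
    return [list_pool[i] for i in rand_indices[:k]]


first = choose_random(range(100, 120), k=3, seed=0)
again = choose_random(range(100, 120), k=3, seed=0)
```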
class al.instance_strategies.RotateStrategy(strategies)

Class - inherits from al.instance_strategies.BaseStrategy

chooseNext(pool, X=None, model=None, k=1, current_train_indices=None, current_train_y=None)

Overrides BaseStrategy.chooseNext

Parameters

  • pool (int) - range of numbers within length of pool
  • X - None or pool.toarray()
  • model - None
  • k (int) - 1 or step size
  • current_train_indices - None or array of trained indices
  • current_train_y - None or train_indices specific to y_pool

Returns

  • self.strategies[self.counter].chooseNext(pool, X, model, k=k, current_train_indices = current_train_indices, current_train_y = current_train_y)
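The return expression shows that RotateStrategy delegates each call to self.strategies[self.counter]. How the counter advances is not documented; the sketch below assumes a simple round-robin. FirstK and LastK are hypothetical stand-in strategies used only to make the example runnable.

```python
class FirstK:
    """Hypothetical stand-in: returns the first k pool entries."""
    def chooseNext(self, pool, X=None, model=None, k=1,
                   current_train_indices=None, current_train_y=None):
        return list(pool)[:k]


class LastK:
    """Hypothetical stand-in: returns the last k pool entries."""
    def chooseNext(self, pool, X=None, model=None, k=1,
                   current_train_indices=None, current_train_y=None):
        return list(pool)[-k:]


class RotateSketch:
    def __init__(self, strategies):
        self.strategies = strategies
        self.counter = -1  # advanced before each call, so the first call uses index 0

    def chooseNext(self, pool, X=None, model=None, k=1,
                   current_train_indices=None, current_train_y=None):
        # Assumed round-robin rotation through the wrapped strategies.
        self.counter = (self.counter + 1) % len(self.strategies)
        return self.strategies[self.counter].chooseNext(
            pool, X, model, k=k,
            current_train_indices=current_train_indices,
            current_train_y=current_train_y)


rotator = RotateSketch([FirstK(), LastK()])
picks = [rotator.chooseNext(range(10), k=2) for _ in range(3)]
```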
class al.instance_strategies.UncStrategy(seed=0, sub_pool=None)

Class - used if strategy selected is unc, inherits from al.instance_strategies.BaseStrategy

chooseNext(pool, X=None, model=None, k=1, current_train_indices=None, current_train_y=None)

Overrides BaseStrategy.chooseNext

Parameters

  • pool (int) - range of numbers within length of pool
  • X - None or pool.toarray()
  • model - None
  • k (int) - 1 or step size
  • current_train_indices - None or array of trained indices
  • current_train_y - None or train_indices specific to y_pool

Returns

  • [candidates[i] for i in uis[:k]]
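Uncertainty sampling generally ranks candidates by how unsure the current model is and returns the top k. The sketch below uses one common measure, 1 minus the largest class probability; whether UncStrategy uses this measure, a margin, or entropy is not stated in this documentation. The function name and the probs argument are hypothetical.

```python
def choose_most_uncertain(candidates, probs, k=1):
    """Return the k candidates whose top class probability is lowest."""
    uncerts = [1.0 - max(row) for row in probs]
    # Indices of candidates sorted from most to least uncertain.
    uis = sorted(range(len(candidates)), key=lambda i: uncerts[i], reverse=True)
    return [candidates[i] for i in uis[:k]]


candidates = [10, 11, 12]
probs = [[0.9, 0.1], [0.55, 0.45], [0.7, 0.3]]
chosen = choose_most_uncertain(candidates, probs, k=2)
```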

al.learning_curve

The al.learning_curve module implements the methods needed to run a given active learning strategy.

class al.learning_curve.LearningCurve

Class - run multiple trials or run trials one at a time

run_trials(X_pool, y_pool, X_test, y_test, al_strategy, classifier_name, classifier_arguments, bootstrap_size, step_size, budget, num_trials)

Runs a given active learning strategy for multiple trials and returns the average performance.

Parameters

  • X_pool - returned from load_svmlight_file
  • y_pool - returned from load_svmlight_file
  • X_test - returned from load_svmlight_file
  • y_test - returned from load_svmlight_file
  • al_strategy - Represents a list of strategies for choosing the next samples (default - rand).
  • classifier_name - Represents the classifier that will be used (default - MultinomialNB).
  • classifier_arguments - Represents the arguments that will be passed to the classifier (default - ‘’).
  • bootstrap_size - Sets the bootstrap size (default - 10).
  • step_size - Sets the step size (default - 10).
  • budget - Sets the budget (default - 500).
  • num_trials - Number of trials (default - 10).

Returns

  • (values, avg_accu, avg_auc) - training_size, respective average performance
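To make the relationship between bootstrap_size, step_size, budget, and the averaged result concrete, here is a minimal sketch. It assumes the training size starts at bootstrap_size and grows by step_size up to and including budget, and that each trial yields one score per training size. Both helper names are hypothetical and this is not the library's implementation.

```python
def training_sizes(bootstrap_size, step_size, budget):
    """Training-set sizes at which performance would be measured (assumed inclusive of budget)."""
    return list(range(bootstrap_size, budget + 1, step_size))


def average_over_trials(per_trial):
    """Average scores per training size; per_trial is a list of {size: score} dicts."""
    sizes = sorted(per_trial[0])
    return {s: sum(trial[s] for trial in per_trial) / len(per_trial) for s in sizes}


values = training_sizes(10, 10, 50)
avg_accu = average_over_trials([{10: 0.5, 20: 0.7}, {10: 0.7, 20: 0.9}])
```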

front_end.cl

The front_end.cl.run_al_cl module implements the methods needed to run the command-line interface.

class front_end.cl.run_al_cl.cmd_parse

Class - command line parser

assign_args()

Assigns values to each of the specified command line arguments for use by al.learning_curve

main()

Calls retrieve_args, assign_args, run_al

retrieve_args()

Adds arguments to the parser for each respective setting of the command line interface
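As an illustration of what such a parser could look like, the sketch below registers the short flags that appear in the example command in this documentation (-c, -d, -nt, -st, -bs, -b, -sz, -sp) with argparse. The dest names, types, and the reuse of the run_trials defaults are assumptions; the real parser may define more options or different defaults.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the flags shown in the documented example."""
    parser = argparse.ArgumentParser(description="Run an active learning strategy")
    parser.add_argument("-c", dest="classifier", default="MultinomialNB")
    parser.add_argument("-d", dest="data", nargs="+")  # one or two dataset paths
    parser.add_argument("-nt", dest="num_trials", type=int, default=10)
    parser.add_argument("-st", dest="strategy", default="rand")
    parser.add_argument("-bs", dest="bootstrap_size", type=int, default=10)
    parser.add_argument("-b", dest="budget", type=int, default=500)
    parser.add_argument("-sz", dest="step_size", type=int, default=10)
    parser.add_argument("-sp", dest="sub_pool", type=int, default=None)
    return parser


# Placeholder file names, used only to exercise the parser.
argv = "-c MultinomialNB -nt 5 -st rand -bs 10 -b 500 -sz 10 -sp 250 -d pool.svm test.svm".split()
args = build_parser().parse_args(argv)
```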

run_al()

Calls al.learning_curve.LearningCurve and draws plots using utils.utils

front_end.cl.run_al_cl.load_data(dataset1, dataset2=None)

Loads the dataset(s) given in the svmlight / libsvm format and assumes a train/test split

Parameters

  • dataset1 (str) - Path to the file of the first dataset.
  • dataset2 (str or None) - If not None, path to the file of second dataset

Returns

  • (X_pool, X_test, y_pool, y_test) - Pool and test files
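For readers unfamiliar with the svmlight / libsvm format: each line holds one instance as a label followed by index:value pairs, with an optional trailing comment after "#". The sketch below parses a single such line in pure Python to show the layout. It is not how load_data works internally (the parameter descriptions elsewhere in this documentation point to load_svmlight_file), it ignores ranking-style qid fields, and parse_svmlight_line is a hypothetical helper.

```python
def parse_svmlight_line(line):
    """Parse 'label idx:val idx:val ... # comment' into (label, {idx: val})."""
    line = line.split("#", 1)[0].strip()  # drop the optional trailing comment
    parts = line.split()
    label = float(parts[0])
    features = {}
    for item in parts[1:]:
        index, value = item.split(":", 1)
        features[int(index)] = float(value)
    return label, features


label, features = parse_svmlight_line("1 3:0.5 7:2 # example row")
```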

Examples

The command below runs front_end.cl.run_al_cl with these parameters:

  • number of trials - 5
  • strategy - rand
  • bootstrap - 10
  • budget - 500
  • step size - 10
  • subpool - 250
  • data paths - ../../../data/imdb-binary-pool-mindf5-ng11 ../../../data/imdb-binary-test-mindf5-ng11

python run_al_cl.py -c MultinomialNB -nt 5 -st rand -bs 10 -b 500 -sz 10 -sp 250 -d ../../../data/imdb-binary-pool-mindf5-ng11 ../../../data/imdb-binary-test-mindf5-ng11

The output of this code is:

Status:

Loading took 17.88s.

trial 0
trial 1
trial 2
trial 3
trial 4

Data output is placed in a file in your current working directory. The default filename is avg_results.txt.

Sample Data Output:

rand
accuracy
train size,mean
10,0.557016
20,0.538432
30,0.534664
40,0.575320
50,0.651672
60,0.621416
70,0.670400
80,0.645680
90,0.659520
100,0.610160
110,0.658024
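A results block laid out like the sample above (strategy name, measure name, a "train size,mean" header, then one row per training size) can be read back with a few lines of Python. The sketch assumes exactly that layout for a single block; the full avg_results.txt may contain further sections that this hypothetical helper does not handle.

```python
def parse_results(text):
    """Parse one block: strategy line, measure line, header line, then size,mean rows."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    strategy, measure = lines[0], lines[1]
    rows = []
    for ln in lines[3:]:  # skip the 'train size,mean' header at index 2
        size, mean = ln.split(",")
        rows.append((int(size), float(mean)))
    return strategy, measure, rows


sample = """rand
accuracy
train size,mean
10,0.557016
20,0.538432
30,0.534664
"""
strategy, measure, rows = parse_results(sample)
```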

Plot Image:

_images/run1.png

front_end.gui

The GUI module to run the active learning strategies.

class front_end.gui.run_al_gui.HelperFunctions

Class - includes helper functions.

all_combos()

Retrieve all possible combinations of classifier and strategy

gray_out()

Enables or disables the show_plots checkboxes depending on which classifiers - strategies have been run

gray_run()

Enables or disables run checkboxes depending on if the data has been loaded

load_data(dataset1, dataset2=None)

Loads the dataset(s) given in the svmlight / libsvm format and assumes a train/test split

Parameters

  • dataset1 (str) - Path to the file of the first dataset.
  • dataset2 (str or None) - If not None, path to the file of second dataset

Returns

  • Pool and test files - X_pool, X_test, y_pool, y_test
open_data(filetype)

Sets label values in the GUI and calls front_end.gui.run_al_gui.HelperFunctions.load_data and front_end.gui.run_al_gui.HelperFunctions.gray_run

Parameters

  • filetype (str) - ‘train’, ‘test’, or ‘single’
class front_end.gui.run_al_gui.Main

Integrates all objects together

exit_master()

Closes gui

run()

Calls front_end.gui.run_al_gui.Main.show_menubar

show_menubar()

Configures menubar

class front_end.gui.run_al_gui.MainCanvas(master)

Class - creates main canvas (window)

add_alerts()

Creates alerts (labels)

add_buttons()

Creates buttons; calls front_end.gui.run_al_gui.MainCanvas.show_plots, front_end.gui.run_al_gui.MainCanvas.clear_plots, front_end.gui.run_al_gui.MainCanvas.run, front_end.gui.run_al_gui.MainCanvas.reset, front_end.gui.run_al_gui.MainCanvas.save_auc, front_end.gui.run_al_gui.MainCanvas.save_acc

add_classifier_frame_2(master)

Create show_plots classifier frame

Parameters

  • master - main Tkinter window
add_run_classifier_frame(master)

Creates run classifier frame

Parameters

  • master - main Tkinter window
add_run_strategy_frame(master)

Creates run strategy frame

Parameters

  • master - main Tkinter window
add_strategy_frame_2(master)

Create show_plots strategy frame

Parameters

  • master - main Tkinter window
clean(params_dict)

Cleans parameter values

Parameters

  • params_dict (dict) - parameters to be reset

clear_plots()

Clear the plots and show empty plots; calls front_end.gui.run_al_gui.MainCanvas.clean and front_end.gui.run_al_gui.MainCanvas.show_plots

plot_acc(clas_strat, width_org, height_org, savefile)

Plots accuracy

Parameters

  • clas_strat (list) - classifier-strategy combinations
  • width_org (int) - picture width
  • height_org (int) - picture height
  • savefile (str) - path to picture’s save location

plot_auc(clas_strat, width_org, height_org, savefile)

Plots auc

Parameters

  • clas_strat (list) - classifier-strategy combinations
  • width_org (int) - picture width
  • height_org (int) - picture height
  • savefile (str) - path to picture’s save location

reset()

Resets the gui; calls front_end.gui.run_al_gui.MainCanvas.clean

run()

Calls al.learning_curve.LearningCurve.run_trials, utils.utils.assign_plot_params, utils.utils.data_to_py, and front_end.gui.run_al_gui.HelperFunctions.gray_out

save_acc()

Saves accuracy plot; calls front_end.gui.run_al_gui.MainCanvas.show_plots

save_auc()

Saves auc plot; calls front_end.gui.run_al_gui.MainCanvas.show_plots

show_plots(auc_save=False, acc_save=False)

Show the plots; calls front_end.gui.run_al_gui.MainCanvas.plot_acc and front_end.gui.run_al_gui.MainCanvas.plot_auc

Parameters

  • auc_save (str) - False or path to auc plot’s save location
  • acc_save (str) - False or path to accuracy plot’s save location
class front_end.gui.run_al_gui.MenuWindow(master)

Class - creates the menu bar (including the file and edit menus)

show_editmenu(master)

Creates the edit menubar and calls front_end.gui.run_al_gui.ParamsWindow.display_pref

Parameters

  • master - main Tkinter window
show_filemenu(master)

Creates the file menubar and calls front_end.gui.run_al_gui.HelperFunctions.open_data

Parameters

  • master - main Tkinter window
class front_end.gui.run_al_gui.ParamsWindow

Class - shows the parameters window in edit->parameters

check_int(param)

Check to make sure the user-defined parameter is a valid integer

Parameters

  • param (tuple) - user-defined parameter

Returns

  • True or False - True if the parameter is a valid integer, False otherwise
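A check of this kind can be sketched as follows. The documentation says param is a tuple but does not describe its contents, so the (name, raw_value) layout below is an assumption, as is the use of int() for validation.

```python
def check_int_sketch(param):
    """Return True if the value in a hypothetical (name, raw_value) tuple parses as an integer."""
    _name, raw = param
    try:
        int(str(raw).strip())
    except ValueError:
        return False
    return True


ok = check_int_sketch(("budget", "500"))
bad = check_int_sketch(("budget", "5oo"))
```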

display_params()

Create labels, entry boxes, etc. for the parameters window

display_pref()

Display the parameters in a separate window

exit_pref()

Close the parameters window

Examples

The following provides an in-depth look at a sample run of front_end.gui.run_al_gui

python run_al_gui.py

GUI Main Window (with all values reset)

_images/gui_main.png

Set up the GUI to reproduce the following run of the command-line interface:

python run_al_cl.py -c MultinomialNB -d /home/geet/Dropbox/Research/Bilgic/data/20_newsgroups_train /home/geet/Dropbox/Research/Bilgic/data/20_newsgroups_train hash -nt 5 -st rand -bs 10 -b 500 -sz 10 -sp 250

Choose train and test data files:

_images/choose_data.png

_images/loaded_gui.png

Edit parameters to match specified run:

_images/edit_parameters.png

Choose MultinomialNB and rand as the classifier-strategy combination:

_images/choose_clas_strat.png

Run terminal output:

python run_al_cl.py -pf MultinomialNB-rand -c MultinomialNB -d /home/geet/Dropbox/Research/Bilgic/data/20_newsgroups_train /home/geet/Dropbox/Research/Bilgic/data/20_newsgroups_train hash -nt 5 -st rand -bs 10 -b 500 -sz 10 -sp 250
trial 0
trial 1
trial 2
trial 3
trial 4

Show plots when done:

_images/show_plots.png

utils.utils

The utils.utils module implements various helper functions.

utils.utils.assign_plot_params(avg_accu, avg_auc)

Assigns plot parameters

Parameters

  • avg_accu - respective average accuracy performance
  • avg_auc - respective average auc performance

Returns

  • accu_x (list)
  • accu_y (list)
  • auc_x (list)
  • auc_y (list)
utils.utils.data_to_file(filename, strategy, accu_y, auc_y, values)

Places data in file

Parameters

  • filename (str) - user-specified path
  • strategy
  • accu_y (list)
  • auc_y (list)
  • values (list)
utils.utils.data_to_py(filename, c, st, acc_x, acc_y, auc_x, auc_y)

Places plot data in python file

Parameters

  • filename (str) - user-specified path
  • c - classifier
  • st - strategy
  • acc_x (list)
  • acc_y (list)
  • auc_x (list)
  • auc_y (list)
utils.utils.draw_plots(strategy, accu_x, accu_y, auc_x, auc_y)

Draws the plot

Parameters

  • strategy
  • accu_x (list)
  • accu_y (list)
  • auc_x (list)
  • auc_y (list)
utils.utils.show_plt()

Shows the plot