Documentation
al
al.instance_strategies
The al.instance_strategies module implements various active learning strategies.
-
class al.instance_strategies.BaseStrategy(seed=0)
Class - Base strategy
-
class al.instance_strategies.BootstrapFromEach(seed)
Class - used if not bootstrapped
-
bootstrap(pool, y, k=1)
Parameters
- pool (int) - range of indices within the length of the pool
- y - None or possible pool
- k (int) - bootstrap size (default 1)
Returns
- array of chosen indices
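The class name suggests that the bootstrap draws k instances from each class label, although the reference above does not spell this out. The following is a minimal sketch under that assumption; bootstrap_from_each_sketch and its seed argument are hypothetical names, not the library's API.

```python
import random
from collections import defaultdict


def bootstrap_from_each_sketch(pool, y, k=1, seed=0):
    """Pick up to k random pool indices for every distinct label in y.

    Assumption: y[i] is the label of pool index i.
    """
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for idx in pool:
        by_label[y[idx]].append(idx)

    chosen = []
    for label in sorted(by_label):
        members = by_label[label]
        # Never ask for more samples than the class contains.
        chosen.extend(rng.sample(members, min(k, len(members))))
    return chosen


labels = [0, 1, 0, 1, 1, 0]
picked = bootstrap_from_each_sketch(range(len(labels)), labels, k=2)
print(picked)
```

Seeding each class this way gives the first model at least one example per label, which a purely random bootstrap does not guarantee.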
-
-
class al.instance_strategies.ErrorReductionStrategy(classifier, classifier_args, seed=0, sub_pool=None)
Class - used if the selected strategy is erreduct; inherits from al.instance_strategies.BaseStrategy
-
chooseNext(pool, X=None, model=None, k=1, current_train_indices=None, current_train_y=None)
Overrides method BaseStrategy.chooseNext
Parameters
- pool (int) - range of indices within the length of the pool
- X - None or pool.toarray()
- model - None
- k (int) - step size (default 1)
- current_train_indices - None or array of trained indices
- current_train_y - None or train_indices specific to y_pool
Returns
- [candidates[i] for i in uis[:k]]
-
log_loss(probs)
Computes log_loss
Parameters
- probs
Returns
- ll/(len(probs)*1.)
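The return expression above divides an accumulated value ll by the number of rows in probs. A minimal sketch of one way to obtain such a value follows, assuming that probs holds one class-probability distribution per instance and that ll sums -p*log(p) over every entry. That accumulation rule is an assumption, and expected_log_loss_sketch is a hypothetical name.

```python
import math


def expected_log_loss_sketch(probs):
    """Average expected log loss over a list of probability distributions.

    Assumption: ll accumulates -p * log(p) for every class probability.
    """
    ll = 0.0
    for row in probs:
        for p in row:
            if p > 0:  # treat 0 * log(0) as 0
                ll -= p * math.log(p)
    return ll / (len(probs) * 1.)


print(expected_log_loss_sketch([[0.5, 0.5], [0.9, 0.1]]))
```

A lower value means the model is, on average, more confident about the instances it was evaluated on.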
-
-
class al.instance_strategies.LogGainStrategy(classifier, classifier_args, seed=0, sub_pool=None)
Class - used if the selected strategy is loggain; inherits from al.instance_strategies.BaseStrategy
-
chooseNext(pool, X=None, model=None, k=1, current_train_indices=None, current_train_y=None)
Overrides method BaseStrategy.chooseNext
Parameters
- pool (int) - range of indices within the length of the pool
- X - None or pool.toarray()
- model - None
- k (int) - step size (default 1)
- current_train_indices - None or array of trained indices
- current_train_y - None or train_indices specific to y_pool
Returns
- [candidates[i] for i in uis[:k]]
-
log_gain(probs, labels)
Computes log_gain
Parameters
- probs, labels
Returns
- lg - computed log_gain
-
-
class al.instance_strategies.QBCStrategy(classifier, classifier_args, seed=0, sub_pool=None, num_committee=4)
Class - used if the selected strategy is qbc; inherits from al.instance_strategies.BaseStrategy
-
chooseNext(pool, X=None, model=None, k=1, current_train_indices=None, current_train_y=None)
Overrides method BaseStrategy.chooseNext
Parameters
- pool (int) - range of indices within the length of the pool
- X - None or pool.toarray()
- model - None
- k (int) - step size (default 1)
- current_train_indices - None or array of trained indices
- current_train_y - None or train_indices specific to y_pool
Returns
- [candidates[i] for i in dis[:k]]
-
vote_entropy(sample)
Computes vote entropy.
Parameters
- sample
Returns
- out (int)
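For reference, the textbook definition of vote entropy for a query-by-committee learner can be sketched as below. It assumes that sample is the list of labels voted by the committee members for a single instance; the exact input layout and return type used by QBCStrategy.vote_entropy are not documented here, and vote_entropy_sketch is a hypothetical name.

```python
import math
from collections import Counter


def vote_entropy_sketch(votes):
    """Entropy of the committee's vote distribution for one instance."""
    counts = Counter(votes)
    n = len(votes)
    # Sum -(V(c)/C) * log(V(c)/C) over the labels that received votes.
    return -sum((c / n) * math.log(c / n) for c in counts.values())


print(vote_entropy_sketch([0, 0, 1, 1]))  # committee split evenly
print(vote_entropy_sketch([1, 1, 1, 1]))  # committee fully agrees
```

Instances on which the committee disagrees the most receive the highest entropy and are therefore the first candidates for labeling.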
-
-
class al.instance_strategies.RandomBootstrap(seed)
Class - used if the selected strategy is rand
-
bootstrap(pool, y=None, k=1)
Parameters
- pool (int) - range of indices within the length of the pool
- y - None or possible pool
- k (int) - bootstrap size (default 1)
Returns
- randS.chooseNext(pool, k=k) - choose next pool
-
-
class al.instance_strategies.RandomStrategy(seed=0)
Class - used if the selected strategy is rand; inherits from al.instance_strategies.BaseStrategy
-
chooseNext(pool, X=None, model=None, k=1, current_train_indices=None, current_train_y=None)
Overrides method BaseStrategy.chooseNext
Parameters
- pool (int) - range of indices within the length of the pool
- X - None or pool.toarray()
- model - None
- k (int) - step size (default 1)
- current_train_indices - None or array of trained indices
- current_train_y - None or train_indices specific to y_pool
Returns
- [list_pool[i] for i in rand_indices[:k]] - the first k pool items of a random permutation of the pool
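The return expression above can be illustrated with a short sketch. It assumes that rand_indices is a random permutation of the positions in list_pool; choose_next_random_sketch and its seed argument are hypothetical names used only for illustration.

```python
import random


def choose_next_random_sketch(pool, k=1, seed=0):
    """Return k distinct items drawn uniformly at random from pool."""
    rng = random.Random(seed)
    list_pool = list(pool)
    # Assumed: rand_indices is a random permutation of the pool positions.
    rand_indices = list(range(len(list_pool)))
    rng.shuffle(rand_indices)
    return [list_pool[i] for i in rand_indices[:k]]


print(choose_next_random_sketch({3, 7, 9, 11}, k=2))
```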
-
-
class al.instance_strategies.RotateStrategy(strategies)
Class - inherits from al.instance_strategies.BaseStrategy
-
chooseNext(pool, X=None, model=None, k=1, current_train_indices=None, current_train_y=None)
Overrides method BaseStrategy.chooseNext
Parameters
- pool (int) - range of indices within the length of the pool
- X - None or pool.toarray()
- model - None
- k (int) - step size (default 1)
- current_train_indices - None or array of trained indices
- current_train_y - None or train_indices specific to y_pool
Returns
- self.strategies[self.counter].chooseNext(pool, X, model, k=k, current_train_indices = current_train_indices, current_train_y = current_train_y)
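The return expression above delegates to whichever strategy self.counter currently points at. A minimal sketch of such a rotation follows, assuming the counter advances by one, modulo the number of strategies, on every call. The advancement rule and the classes RotateSketch, FirstK, and LastK are assumptions made for illustration.

```python
class FirstK:
    """Hypothetical stand-in strategy: returns the first k pool items."""

    def chooseNext(self, pool, X=None, model=None, k=1,
                   current_train_indices=None, current_train_y=None):
        return list(pool)[:k]


class LastK:
    """Hypothetical stand-in strategy: returns the last k pool items."""

    def chooseNext(self, pool, X=None, model=None, k=1,
                   current_train_indices=None, current_train_y=None):
        return list(pool)[-k:]


class RotateSketch:
    """Cycles through the given strategies, one per call."""

    def __init__(self, strategies):
        self.strategies = strategies
        self.counter = -1  # assumed: advanced before each delegation

    def chooseNext(self, pool, X=None, model=None, k=1,
                   current_train_indices=None, current_train_y=None):
        self.counter = (self.counter + 1) % len(self.strategies)
        return self.strategies[self.counter].chooseNext(
            pool, X, model, k=k,
            current_train_indices=current_train_indices,
            current_train_y=current_train_y)


rotate = RotateSketch([FirstK(), LastK()])
for _ in range(3):
    print(rotate.chooseNext([1, 2, 3, 4], k=1))
```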
-
-
class al.instance_strategies.UncStrategy(seed=0, sub_pool=None)
Class - used if the selected strategy is unc; inherits from al.instance_strategies.BaseStrategy
-
chooseNext(pool, X=None, model=None, k=1, current_train_indices=None, current_train_y=None)
Overrides method BaseStrategy.chooseNext
Parameters
- pool (int) - range of indices within the length of the pool
- X - None or pool.toarray()
- model - None
- k (int) - step size (default 1)
- current_train_indices - None or array of trained indices
- current_train_y - None or train_indices specific to y_pool
Returns
- [candidates[i] for i in uis[:k]]
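The return expression above picks the first k candidates according to an ordering uis. A common way to build such an ordering for uncertainty sampling is to rank candidates by how unsure the model is about them. The sketch below assumes the least-confident criterion (1 minus the largest class probability); the criterion actually used by UncStrategy is not stated here, and choose_most_uncertain_sketch is a hypothetical name.

```python
def choose_most_uncertain_sketch(candidates, probs, k=1):
    """Return the k candidates whose top class probability is lowest.

    probs[i] is assumed to be the predicted class distribution of
    candidates[i].
    """
    uncertainty = [1.0 - max(row) for row in probs]
    # uis: candidate positions sorted from most to least uncertain.
    uis = sorted(range(len(candidates)),
                 key=lambda i: uncertainty[i], reverse=True)
    return [candidates[i] for i in uis[:k]]


candidates = [10, 20, 30]
probs = [[0.9, 0.1], [0.55, 0.45], [0.7, 0.3]]
print(choose_most_uncertain_sketch(candidates, probs, k=2))
```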
-
al.learning_curve
The al.learning_curve module implements the methods needed to run a given active learning strategy.
-
class al.learning_curve.LearningCurve
Class - run multiple trials or run trials one at a time
-
run_trials(X_pool, y_pool, X_test, y_test, al_strategy, classifier_name, classifier_arguments, bootstrap_size, step_size, budget, num_trials)
Runs a given active learning strategy for multiple trials and returns the average performance.
Parameters
- X_pool - returned from load_svmlight_file
- y_pool - returned from load_svmlight_file
- X_test - returned from load_svmlight_file
- y_test - returned from load_svmlight_file
- al_strategy - Represents a list of strategies for choosing the next samples (default - rand).
- classifier_name - Represents the classifier that will be used (default - MultinomialNB).
- classifier_arguments - Represents the arguments that will be passed to the classifier (default - ‘’).
- bootstrap_size - Sets the bootstrap size (default - 10).
- step_size - Sets the step size (default - 10).
- budget - Sets the budget (default - 500).
- num_trials - Number of trials (default - 10).
Returns
- (values, avg_accu, avg_auc) - training sizes and the respective average performance
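The averaging step behind the returned tuple can be sketched as follows. This is a simplified illustration, not the library's code: it assumes each trial yields a mapping from training size to a score, and average_trials_sketch is a hypothetical helper.

```python
from collections import defaultdict


def average_trials_sketch(trial_results):
    """Average per-trial scores by training size.

    trial_results is assumed to be a list of dicts, one per trial,
    mapping training size -> score (accuracy or AUC).
    """
    scores_by_size = defaultdict(list)
    for trial in trial_results:
        for size, score in trial.items():
            scores_by_size[size].append(score)

    values = sorted(scores_by_size)  # training sizes in increasing order
    averages = [sum(scores_by_size[s]) / len(scores_by_size[s])
                for s in values]
    return values, averages


trials = [{10: 0.5, 20: 0.75}, {10: 0.75, 20: 0.25}]
print(average_trials_sketch(trials))
```

Averaging over several trials smooths out the variance introduced by the random bootstrap and by randomized strategies.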
-
front_end.cl
The front_end.cl.run_al_cl module implements the methods needed to run the command-line interface.
-
class front_end.cl.run_al_cl.cmd_parse
Class - command line parser
-
assign_args()
Assigns values to each of the specified command line arguments for use by al.learning_curve
-
main()
Calls retrieve_args, assign_args, run_al
-
retrieve_args()
Adds arguments to the parser for each respective setting of the command line interface
-
run_al()
Calls al.learning_curve.LearningCurve and draws plots using utils.utils
-
-
front_end.cl.run_al_cl.load_data(dataset1, dataset2=None)
Loads the dataset(s) given in the svmlight / libsvm format and assumes a train/test split
Parameters
- dataset1 (str) - Path to the file of the first dataset.
- dataset2 (str or None) - If not None, path to the file of the second dataset
Returns
- (X_pool, X_test, y_pool, y_test) - Pool and test files
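For readers unfamiliar with the svmlight / libsvm format mentioned above, each line holds a label followed by sparse index:value pairs. The sketch below parses a single line to show the layout. It is only an illustration with a hypothetical helper name; the reference above indicates that the data itself is read with load_svmlight_file.

```python
def parse_svmlight_line_sketch(line):
    """Parse '<label> <index>:<value> ...' into (label, {index: value})."""
    # Anything after '#' is a comment in this format.
    line = line.split("#", 1)[0].strip()
    parts = line.split()
    label = float(parts[0])

    features = {}
    for token in parts[1:]:
        if token.startswith("qid:"):
            continue  # optional query id used by ranking data sets
        index, value = token.split(":", 1)
        features[int(index)] = float(value)
    return label, features


print(parse_svmlight_line_sketch("1 3:0.5 10:2 # a positive example"))
```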
Examples
The following command runs front_end.cl.run_al_cl with the following parameters:
- number of trials - 5
- strategy - rand
- bootstrap - 10
- budget - 500
- step size - 10
- subpool - 250
- data paths - ../../../data/imdb-binary-pool-mindf5-ng11 ../../../data/imdb-binary-test-mindf5-ng11
python run_al_cl.py -c MultinomialNB -nt 5 -st rand -bs 10 -b 500 -sz 10 -sp 250 -d ../../../data/imdb-binary-pool-mindf5-ng11 ../../../data/imdb-binary-test-mindf5-ng11
The output of this command is:
Status:
Loading took 17.88s.
trial 0
trial 1
trial 2
trial 3
trial 4
Data output is placed in a file in your current working directory. The default filename is avg_results.txt.
Sample Data Output:
rand
accuracy
train size,mean
10,0.557016
20,0.538432
30,0.534664
40,0.575320
50,0.651672
60,0.621416
70,0.670400
80,0.645680
90,0.659520
100,0.610160
110,0.658024
Plot Image:
front_end.gui
The GUI module to run the active learning strategies.
-
class front_end.gui.run_al_gui.HelperFunctions
Class - includes helper functions.
-
all_combos()
Retrieves all possible combinations of classifier and strategy
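Enumerating every classifier-strategy pair is a Cartesian product. A minimal sketch follows; the function name, its arguments, and the second classifier in the usage line are hypothetical, while the classifier-strategy naming mirrors the MultinomialNB-rand label that appears in the GUI example on this page.

```python
from itertools import product


def all_combos_sketch(classifiers, strategies):
    """Return every 'classifier-strategy' label for the given names."""
    return ["%s-%s" % (c, s) for c, s in product(classifiers, strategies)]


print(all_combos_sketch(["MultinomialNB", "LogisticRegression"],
                        ["rand", "unc", "qbc"]))
```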
-
gray_out()
Enables or disables the show_plots checkboxes depending on which classifier-strategy combinations have been run
-
gray_run()
Enables or disables the run checkboxes depending on whether the data has been loaded
-
load_data(dataset1, dataset2=None)
Loads the dataset(s) given in the svmlight / libsvm format and assumes a train/test split
Parameters
- dataset1 (str) - Path to the file of the first dataset.
- dataset2 (str or None) - If not None, path to the file of the second dataset
Returns
- Pool and test files - X_pool, X_test, y_pool, y_test
-
open_data(filetype)
Sets label values in the gui and calls front_end.gui.run_al_gui.HelperFunctions.load_data and front_end.gui.run_al_gui.HelperFunctions.gray_run
Parameters
- filetype (str) - ‘train’, ‘test’, or ‘single’
-
-
class front_end.gui.run_al_gui.Main
Integrates all objects together
-
exit_master()
Closes the gui
-
run()
Configures the menubar
-
-
class front_end.gui.run_al_gui.MainCanvas(master)
Class - creates main canvas (window)
-
add_alerts()
Creates alerts (labels)
Creates buttons; calls front_end.gui.run_al_gui.MainCanvas.show_plots, front_end.gui.run_al_gui.MainCanvas.clear_plots, front_end.gui.run_al_gui.MainCanvas.run, front_end.gui.run_al_gui.MainCanvas.reset, front_end.gui.run_al_gui.MainCanvas.save_auc, front_end.gui.run_al_gui.MainCanvas.save_acc
-
add_classifier_frame_2(master)
Creates the show_plots classifier frame
Parameters
- master - main Tkinter window
-
add_run_classifier_frame(master)
Creates the run classifier frame
Parameters
- master - main Tkinter window
-
add_run_strategy_frame(master)
Creates the run strategy frame
Parameters
- master - main Tkinter window
-
add_strategy_frame_2(master)
Creates the show_plots strategy frame
Parameters
- master - main Tkinter window
-
clean(params_dict)
Cleans parameter values
Parameters
- params_dict (dict) - parameters to be reset
-
clear_plots()
Clears the plots and shows empty plots; calls front_end.gui.run_al_gui.MainCanvas.clean and front_end.gui.run_al_gui.MainCanvas.show_plots
-
plot_acc(clas_strat, width_org, height_org, savefile)
Plots accuracy
Parameters
- clas_strat (list) - classifier-strategy combinations
- width_org (int) - picture width
- height_org (int) - picture height
- savefile (str) - path to picture’s save location
-
plot_auc(clas_strat, width_org, height_org, savefile)
Plots auc
Parameters
- clas_strat (list) - classifier-strategy combinations
- width_org (int) - picture width
- height_org (int) - picture height
- savefile (str) - path to picture’s save location
-
reset()
Resets the gui; calls front_end.gui.run_al_gui.MainCanvas.clean
-
run()
Calls al.learning_curve.LearningCurve.run_trials, utils.utils.assign_plot_params, utils.utils.data_to_py, and front_end.gui.run_al_gui.HelperFunctions.gray_out
-
save_acc()
Saves the accuracy plot; calls front_end.gui.run_al_gui.MainCanvas.show_plots
-
save_auc()
Saves the auc plot; calls front_end.gui.run_al_gui.MainCanvas.show_plots
-
show_plots(auc_save=False, acc_save=False)
Shows the plots; calls front_end.gui.run_al_gui.MainCanvas.plot_acc and front_end.gui.run_al_gui.MainCanvas.plot_auc
Parameters
- auc_save (str) - False or path to auc plot’s save location
- acc_save (str) - False or path to accuracy plot’s save location
-
-
class front_end.gui.run_al_gui.MenuWindow(master)
Class - creates the menu bar (including the file and edit menus)
Creates the edit menubar and calls front_end.gui.run_al_gui.ParamsWindow.display_pref
Parameters
- master - main Tkinter window
Creates the file menubar and calls front_end.gui.run_al_gui.HelperFunctions.open_data
Parameters
- master - main Tkinter window
-
class front_end.gui.run_al_gui.ParamsWindow
Class - shows the parameters window in edit->parameters
-
check_int(param)
Checks that the user-defined parameter is a valid integer
Parameters
- param (tuple) - user-defined parameter
Returns
- True if the parameter is a valid integer, False otherwise
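A validity check of this kind is commonly written as an attempted conversion. The sketch below validates a single raw entry value, such as the text typed into an entry box. How the documented tuple argument is unpacked is not specified here, so is_valid_int_sketch and its single-value signature are assumptions.

```python
def is_valid_int_sketch(value):
    """Return True if value can be converted to an int, else False."""
    try:
        int(value)
    except (TypeError, ValueError):
        return False
    return True


for raw in ["10", "abc", "", "1.5", None]:
    print(repr(raw), is_valid_int_sketch(raw))
```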
-
display_params()
Creates labels, entry boxes, etc. for the parameters window
-
display_pref()
Displays the parameters in a separate window
-
exit_pref()
Closes the parameters window
-
Examples
The following provides an in-depth look at a sample run of front_end.gui.run_al_gui
python run_al_gui.py
GUI Main Window (with all values reset)
Set up the gui to reproduce the following run of the command line interface:
python run_al_cl.py -c MultinomialNB -d /home/geet/Dropbox/Research/Bilgic/data/20_newsgroups_train /home/geet/Dropbox/Research/Bilgic/data/20_newsgroups_train hash -nt 5 -st rand -bs 10 -b 500 -sz 10 -sp 250
Choose train and test data files:
Edit parameters to match specified run:
Choose MultinomialNB and rand as the classifier-strategy combination:
Run terminal output:
python run_al_cl.py -pf MultinomialNB-rand -c MultinomialNB -d /home/geet/Dropbox/Research/Bilgic/data/20_newsgroups_train /home/geet/Dropbox/Research/Bilgic/data/20_newsgroups_train hash -nt 5 -st rand -bs 10 -b 500 -sz 10 -sp 250
trial 0
trial 1
trial 2
trial 3
trial 4
Show plots when done:
utils.utils
The utils.utils module implements various helper functions.
-
utils.utils.assign_plot_params(avg_accu, avg_auc)
Assigns plot parameters
Parameters
- avg_accu - respective average accuracy performance
- avg_auc - respective average auc performance
Returns
- accu_x (list)
- accu_y (list)
- auc_x (list)
- auc_y (list)
-
utils.utils.data_to_file(filename, strategy, accu_y, auc_y, values)
Places data in a file
Parameters
- filename (str) - user-specified path
- strategy
- accu_y (list)
- auc_y (list)
- values (list)
-
utils.utils.data_to_py(filename, c, st, acc_x, acc_y, auc_x, auc_y)
Places plot data in a python file
Parameters
- filename (str) - user-specified path
- c - classifier
- st - strategy
- acc_x (list)
- acc_y (list)
- auc_x (list)
- auc_y (list)
-
utils.utils.draw_plots(strategy, accu_x, accu_y, auc_x, auc_y)
Draws the plot
Parameters
- strategy
- accu_x (list)
- accu_y (list)
- auc_x (list)
- auc_y (list)
-
utils.utils.show_plt()
Shows the plot