Maximum size of the dataset to train on. Value should be > 0. Default is 1000000.
Function to set parameters before passing the data into the validation step, e.g. to do data balancing or dropping based on the labels.
Parameters set while examining the data.
Fraction of data to reserve for the test set. Default is 0.1.
Seed for data splitting
Function to use to create the training set and test set.
(dataTrain, dataTest)
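A minimal sketch of a seeded train/test split that returns a (train, test) pair, written in Python for illustration only. The function name `split_data`, the parameter names, and the default seed of 42 are assumptions and are not taken from the documented API; only the default reserve fraction of 0.1 comes from the text above.

```python
import random


def split_data(rows, reserve_test_fraction=0.1, seed=42):
    """Randomly assign each row to the training set or the test set.

    Hypothetical helper: names and the default seed are assumptions.
    """
    if not 0.0 <= reserve_test_fraction < 1.0:
        raise ValueError("reserve_test_fraction must be in [0, 1)")
    rng = random.Random(seed)  # fixed seed makes the split reproducible
    data_train, data_test = [], []
    for row in rows:
        # Each row lands in the test set with probability reserve_test_fraction.
        if rng.random() < reserve_test_fraction:
            data_test.append(row)
        else:
            data_train.append(row)
    return data_train, data_test
```

Because assignment is per-row and random, the test set holds roughly, not exactly, the requested fraction. Reusing the same seed reproduces the same split.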
Rebalance the training data within the validation step
Data to prepare for model training. The first column must be the label, as a double.
A balanced training set and a test set.
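One possible way to rebalance training data is sketched below in Python. It assumes, as stated above, that the first column of each row is the label as a double. The balancing strategy (downsampling every class to the size of the smallest class) and the name `rebalance` are assumptions chosen for illustration; the text does not say which strategy is actually used.

```python
import random
from collections import defaultdict


def rebalance(train_rows, seed=42):
    """Downsample every class to the size of the smallest class.

    Hypothetical helper: the strategy and names are assumptions.
    """
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for row in train_rows:
        by_label[row[0]].append(row)  # first column is the label
    if not by_label:
        return []
    smallest = min(len(group) for group in by_label.values())
    balanced = []
    for label in sorted(by_label):
        # Sample without replacement so no row is duplicated.
        balanced.extend(rng.sample(by_label[label], smallest))
    rng.shuffle(balanced)
    return balanced
```

Only the training rows are passed in; in this sketch the test set is left untouched so that evaluation still reflects the original label distribution.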
Add a splitter parameter to name the label column
Instance that will make a holdout set and prepare the data for multiclass modeling. Creates an instance that will split data into training and test sets, filtering out any labels that don't meet the minimum fraction cutoff or fall in the top N labels specified.
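The label filtering described above can be sketched in Python as follows. This sketch assumes one reading of the description: a label is kept only if it is among the top N most frequent labels and its share of the data meets the minimum fraction. The names `labels_to_keep`, `filter_rows`, `max_label_categories`, and `min_label_fraction`, and their default values, are hypothetical.

```python
from collections import Counter


def labels_to_keep(labels, max_label_categories=100, min_label_fraction=0.0):
    """Return the labels that are in the top N by count and meet the cutoff.

    Hypothetical helper: names, defaults, and the "both conditions" reading
    are assumptions.
    """
    total = len(labels)
    if total == 0:
        return set()
    counts = Counter(labels)
    top = counts.most_common(max_label_categories)  # top N labels by frequency
    return {label for label, n in top if n / total >= min_label_fraction}


def filter_rows(rows, keep):
    """Drop rows whose label (first column) is not in the kept set."""
    return [row for row in rows if row[0] in keep]
```

For example, with labels occurring 6, 3, and 1 times out of 10, a top-2 limit keeps the first two labels, and so does a minimum fraction of 0.2 with a top-3 limit.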