unique name of the operation this stage performs
uid for instance
type tag for numeric feature type
type tag for numeric feature value type
numeric evidence for feature type value
Input Features type
Computed splits
whether a split should be performed
computed split values
bucket labels
Checks the input length
input features
true if the input size is as expected, false otherwise
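A minimal sketch of such a length check, assuming the stage expects exactly two inputs (a label and a feature). The names `InputLengthSketch` and `expectedInputs`, and the use of `String` as a stand-in for a feature, are hypothetical and not taken from the library.

```scala
object InputLengthSketch {
  // Assumption: a binary stage takes exactly two inputs (label, feature).
  val expectedInputs: Int = 2

  // Returns true if the input size is as expected, false otherwise.
  def checkInputLength(features: Array[String]): Boolean =
    features.length == expectedInputs
}
```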
Check if the stage is serializable
Failure if not serializable
Compute splits using DecisionTreeClassifier
input dataset of (label, feature) tuples
feature name
computed Splits
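To illustrate how a split flag, split values, and bucket labels could fit together, here is a small sketch. It assumes the thresholds have already been extracted from a fitted tree, that the outer buckets are left open with infinite bounds, and that labels are written as half-open intervals. `SplitsSketch`, `Splits`, and `fromThresholds` are hypothetical names, and the label format is an assumption.

```scala
object SplitsSketch {
  // Hypothetical container mirroring the three fields: split flag, split values, bucket labels.
  final case class Splits(shouldSplit: Boolean, splits: Array[Double], bucketLabels: Array[String])

  // Assumption: thresholds are the split points taken from a fitted decision tree.
  def fromThresholds(thresholds: Seq[Double]): Splits = {
    val sorted = thresholds.distinct.sorted
    if (sorted.isEmpty) {
      // No thresholds found: signal that the feature should not be split.
      Splits(shouldSplit = false, Array.empty[Double], Array.empty[String])
    } else {
      // Pad with infinite bounds so every value falls into some bucket.
      val bounds = (Double.NegativeInfinity +: sorted :+ Double.PositiveInfinity).toArray
      // One label per bucket, written as a half-open interval [lower-upper).
      val labels = bounds.sliding(2).map { case Array(lo, hi) => s"[$lo-$hi)" }.toArray
      Splits(shouldSplit = true, bounds, labels)
    }
  }
}
```

With n distinct thresholds the sketch yields n + 2 split values and n + 1 bucket labels.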
Makes a copy of this instance with new parameters; Spark internals call this method in several places. The default implementation finds the constructor and makes a copy for any class, as long as all constructor parameters are vals (this is why type tags are written as implicit vals in base classes).
Note: the convention in Spark is to make the uid a constructor argument, so that copies share a uid with the original. Developers should follow this convention.
new parameters to add to the instance
a new instance with the same uid
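The uid convention can be sketched as follows. This is an illustration only: `CopySketch`, `BucketizerStage`, and `copyWith` are hypothetical, and the real copy mechanism locates the constructor generically rather than calling it by hand.

```scala
object CopySketch {
  // Hypothetical stage: every constructor parameter is a val, uid included.
  final class BucketizerStage(val operationName: String, val uid: String, val maxDepth: Int) {
    // Builds a new instance through the constructor, overriding only the
    // supplied parameter and reusing the original uid.
    def copyWith(newMaxDepth: Int): BucketizerStage =
      new BucketizerStage(operationName, uid, newMaxDepth)
  }
}
```

Because the uid travels through the constructor, the copy is a distinct object that still identifies as the same stage.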
Spark operation on the dataset that produces the RDD for the constructor fit function and then turns the output function into a Model
input data for this stage
a fitted model that will perform the transformation specified by the function defined in constructor fit
Function that fits the binary model
Gets an input feature. Note: this method is NOT safe to use outside the driver; use the getTransientFeature method instead.
array of features
NoSuchElementException if the features are not set
RuntimeException in case one of the features is null
Gets the input features. Note: this method is NOT safe to use outside the driver; use the getTransientFeatures method instead.
array of features
NoSuchElementException if the features are not set
RuntimeException in case one of the features is null
Output features that will be created by this stage
feature of type OutputFeatures
Name of output feature (i.e. column created by this stage)
Gets an input feature at index i
input index
the input feature at that index, if one exists
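An index lookup that may or may not find a feature maps naturally onto `Option`. The sketch below assumes the features are held in an array; `InputLookupSketch` and `getInputFeature` as written here are hypothetical and generic over the element type.

```scala
object InputLookupSketch {
  // Returns Some(feature) when index i is in range, None otherwise.
  def getInputFeature[A](features: Array[A], i: Int): Option[A] =
    if (i >= 0 && i < features.length) Some(features(i)) else None
}
```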
Gets the input Features
Criterion used for information gain calculation (case-insensitive). Supported: "entropy" and "gini". (default = gini)
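For readers unfamiliar with the two criteria, the following sketch computes both from per-class counts and resolves the criterion name case-insensitively. It uses the textbook definitions with base-2 logarithms for entropy; `ImpuritySketch` and its functions are hypothetical and are not the implementation used by the classifier.

```scala
object ImpuritySketch {
  // Gini impurity: 1 - sum(p_k^2) over class proportions p_k.
  def gini(counts: Seq[Long]): Double = {
    val total = counts.sum.toDouble
    if (total == 0) 0.0 else 1.0 - counts.map(c => math.pow(c / total, 2)).sum
  }

  // Entropy: -sum(p_k * log2(p_k)), skipping empty classes.
  def entropy(counts: Seq[Long]): Double = {
    val total = counts.sum.toDouble
    if (total == 0) 0.0
    else counts.filter(_ > 0).map { c =>
      val p = c / total
      -p * (math.log(p) / math.log(2))
    }.sum
  }

  // Case-insensitive lookup of the criterion name; anything else is rejected.
  def impurity(name: String): Seq[Long] => Double = name.toLowerCase match {
    case "gini" => gini
    case "entropy" => entropy
    case other => throw new IllegalArgumentException("Unsupported criterion: " + other)
  }
}
```

Information gain for a candidate split is then the parent impurity minus the size-weighted impurity of the two children.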
Function to convert InputFeatures to an Array of FeatureLike
an Array of FeatureLike
Maximum number of bins. Must be >= 2 and >= number of categories in any categorical feature. (default = 32)
Maximum depth of the tree (>= 0). E.g., depth 0 means 1 leaf node; depth 1 means 1 internal node + 2 leaf nodes. (default = 5)
Minimum information gain for a split to be considered at a tree node. Should be >= 0.0. (default = 0.0)
Minimum number of instances each child must have after split. If a split causes the left or right child to have fewer than minInstancesPerNode, the split will be discarded as invalid. Should be >= 1. (default = 1)
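The constraints and defaults of the tree parameters can be collected into one validated holder, sketched below. `TreeParamsSketch` and `TreeParams` are hypothetical; the field names other than `minInstancesPerNode` are assumed to follow Spark's parameter naming, and the real stage exposes these as Spark params rather than as a case class.

```scala
object TreeParamsSketch {
  // Hypothetical parameter holder; defaults follow the documented values.
  final case class TreeParams(
    impurity: String = "gini",
    maxBins: Int = 32,
    maxDepth: Int = 5,
    minInfoGain: Double = 0.0,
    minInstancesPerNode: Int = 1
  ) {
    require(Set("gini", "entropy")(impurity.toLowerCase), "impurity must be gini or entropy")
    require(maxBins >= 2, "maxBins must be >= 2")
    require(maxDepth >= 0, "maxDepth must be >= 0")
    require(minInfoGain >= 0.0, "minInfoGain must be >= 0.0")
    require(minInstancesPerNode >= 1, "minInstancesPerNode must be >= 1")
  }

  // A binary tree of depth d has at most 2^d leaves: depth 0 gives 1 leaf, depth 1 gives 2.
  def maxLeaves(depth: Int): Int = 1 << depth
}
```

Since each leaf corresponds to one bucket, the maximum depth bounds the number of buckets a single feature can be split into.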
numeric evidence for feature type value
Function to be called on getMetadata
Function to be called on setInput
unique name of the operation this stage performs
Function to convert OutputFeatures to an Array of FeatureLike
an Array of FeatureLike
Should output feature be a response? Yes, if any of the input features are.
true if the output feature should be a response
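The rule reduces to a single `exists` over the inputs. In this sketch `ResponseSketch` and `Feature` are hypothetical stand-ins for the library's feature type.

```scala
object ResponseSketch {
  // Hypothetical stand-in for a feature carrying a response flag.
  final case class Feature(name: String, isResponse: Boolean)

  // The output is a response if any input feature is a response.
  def outputIsResponse(inputs: Seq[Feature]): Boolean =
    inputs.exists(_.isResponse)
}
```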
Get the metadata describing the output vector
This does not trigger onGetMetadata()
Metadata of output vector
Input features that will be used by the stage
feature of type InputFeatures
Sets input features
feature like type
array of input features
this stage
Option to keep track of invalid values
Option to keep track of values that were missing
Stage unique name consisting of the stage operation name and uid
stage name
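A sketch of how the two parts might be combined. The underscore separator is an assumption made for illustration, and `StageNameSketch` is a hypothetical name.

```scala
object StageNameSketch {
  // Assumption: the operation name and uid are joined with an underscore.
  def stageName(operationName: String, uid: String): String =
    s"${operationName}_$uid"
}
```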
Translates the input and output features into the Spark schema checks and changes that will occur on the underlying data frame
schema of the input data frame
a new schema with the output features added
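The check-then-append behavior can be sketched without Spark using a plain sequence of fields. `SchemaSketch` and `Field` are hypothetical stand-ins (not Spark's `StructType` or `StructField`), and the specific checks shown are assumptions about what a schema validation would reasonably cover.

```scala
object SchemaSketch {
  // Hypothetical stand-in for a schema field.
  final case class Field(name: String, dataType: String)

  // Checks that every input column is present and the output column is new,
  // then returns a new schema with the output column appended.
  def transformSchema(schema: Seq[Field], inputs: Seq[String], output: Field): Seq[Field] = {
    val present = schema.map(_.name).toSet
    val missing = inputs.filterNot(present).mkString(", ")
    require(missing.isEmpty, "Missing input columns: " + missing)
    require(!present(output.name), "Output column already exists: " + output.name)
    schema :+ output
  }
}
```

The input schema is left untouched; callers receive a new sequence.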
type tag for first input
type tag for second input
type tag for first input value
type tag for second input value
type tag for output
type tag for output value
uid for instance
Compute the output vector metadata only from the input features. Vectorizers use this to derive the full vector, including pivot columns or indicator features.
Vector metadata from input features
Get the name of the output vector
Output vector name as a string
Smart bucketizer for numeric map values based on a Decision Tree classifier.
numeric feature type value
numeric map feature type
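To give a feel for bucketizing the values of a numeric map, the sketch below assigns each value to the index of the half-open interval that contains it. It assumes each map key has its own precomputed split bounds and that keys without bounds are skipped; the real stage produces a vector with metadata rather than a map of indices. `MapBucketizeSketch`, `bucketIndex`, and `bucketize` are hypothetical names.

```scala
object MapBucketizeSketch {
  // Index of the bucket [bounds(i), bounds(i + 1)) that contains value, if any.
  def bucketIndex(bounds: Array[Double], value: Double): Option[Int] = {
    val idx = bounds.indices.dropRight(1)
      .indexWhere(i => value >= bounds(i) && value < bounds(i + 1))
    if (idx >= 0) Some(idx) else None
  }

  // Assumption: each map key has its own bounds; keys without bounds are skipped.
  def bucketize(
    values: Map[String, Double],
    boundsByKey: Map[String, Array[Double]]
  ): Map[String, Int] =
    values.flatMap { case (key, value) =>
      boundsByKey.get(key).flatMap(b => bucketIndex(b, value)).map(idx => key -> idx)
    }
}
```

Values that fall outside every interval (for example NaN) get no bucket here, which is one place where tracking invalid or missing values would come in.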