Contains results of Raw Feature Filter tests for a given feature
feature name
map key associated with distribution (when the feature is a map)
training fill rate did not meet min required
null indicator correlation (absolute) exceeded max allowed
scoring fill rate did not meet min required
distribution mismatch: Jensen-Shannon (JS) divergence exceeded max allowed
distribution mismatch: fill rate difference exceeded max allowed
distribution mismatch: fill ratio difference exceeded max allowed
feature excluded after failing one or more tests
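As a rough illustration of how the per-test flags above relate to the final exclusion decision, here is a minimal sketch. The class and field names are hypothetical, not the library's own, and it assumes exclusion is simply the logical OR of the individual test failures, as the last description suggests.

```scala
// Hypothetical mirror of the fields described above; all names are illustrative only.
final case class ExclusionReasonsSketch(
  name: String,                     // feature name
  key: Option[String],              // map key, when the feature is a map
  trainingUnfilled: Boolean,        // training fill rate below the minimum
  nullLabelCorrelated: Boolean,     // |null indicator correlation| above the maximum
  scoringUnfilled: Boolean,         // scoring fill rate below the minimum
  jsDivergenceMismatch: Boolean,    // JS divergence above the maximum
  fillRateDiffMismatch: Boolean,    // fill rate difference above the maximum
  fillRatioDiffMismatch: Boolean    // fill ratio difference above the maximum
) {
  // Assumption: a feature is excluded when at least one test failed.
  def excluded: Boolean = Seq(
    trainingUnfilled, nullLabelCorrelated, scoringUnfilled,
    jsDivergenceMismatch, fillRateDiffMismatch, fillRatioDiffMismatch
  ).exists(identity)
}
```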
Class containing summary information for a feature
name of the feature
map key associated with distribution (when the feature is a map)
total number of feature values seen
number of empties seen in feature
binned counts of feature values (hashed for strings, evenly spaced bins for numerics)
either min and max number of tokens for text data, or splits used for bins for numeric data
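The binning described above (evenly spaced bins for numerics, hashing for strings) might look roughly like the following sketch. The object and method names are hypothetical, and `String.hashCode` is used only as a stand-in for whatever hash function the library actually applies.

```scala
object BinningSketch {
  // Map a numeric value to one of `bins` evenly spaced bins over [min, max].
  // Values outside the range are clamped to the first or last bin.
  def numericBin(value: Double, min: Double, max: Double, bins: Int): Int = {
    require(bins > 0 && max > min, "need at least one bin and a non-empty range")
    val width = (max - min) / bins
    val idx = math.floor((value - min) / width).toInt
    math.max(0, math.min(bins - 1, idx))
  }

  // Map a string token to a bin by hashing (stand-in hash: String.hashCode).
  def textBin(token: String, bins: Int): Int = {
    require(bins > 0, "need at least one bin")
    val h = token.hashCode % bins
    if (h < 0) h + bins else h
  }

  // Accumulate bin indices into a vector of binned counts.
  def histogram(binIndices: Seq[Int], bins: Int): Array[Double] = {
    val counts = Array.fill(bins)(0.0)
    binIndices.foreach(i => counts(i) += 1.0)
    counts
  }
}
```

For example, `BinningSketch.histogram(Seq(0.5, 3.0, 3.9).map(BinningSketch.numericBin(_, 0.0, 10.0, 5)), 5)` puts one value in the first bin and two in the second.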
Contains RFF filtered data, features to drop, and results from RFF
RFF cleaned data
raw features dropped by RFF
keys in map features dropped by RFF
feature information calculated from the training data
Specialized stage that loads data and computes distributions and empty counts on raw features. This information is then used to decide which raw features should be excluded from the workflow DAG. Note: raw features that are not explicitly blocklisted, but go unused because they are inputs to explicitly blocklisted features, are currently not present as raw features in the model or in ModelInsights. However, they are accessible from an OpWorkflowModel via getRawFeatureFilterResults().
datatype of the reader
Contains configuration settings for Raw Feature Filter
Contains raw feature metrics computed in Raw Feature Filter
feature name
map key associated with distribution (when the feature is a map)
proportion of values that are null in the training distribution
correlation between null indicator and the label in the training distribution
proportion of values that are null in the scoring distribution
Jensen-Shannon (JS) divergence between the training and scoring distributions
absolute difference in fill rates between the training and scoring distributions
ratio of difference in fill rates between the training and scoring distributions
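The metrics above can be sketched from binned counts and null counts as follows. This is an illustrative sketch, not the library's implementation: the names are hypothetical, the JS divergence uses base-2 logarithms (an assumption, which bounds it to [0, 1]), and the fill ratio is read as the larger fill rate divided by the smaller one, which is only one plausible interpretation of the description.

```scala
object MetricsSketch {
  // Fill rate: fraction of values that are not null.
  def fillRate(count: Long, nulls: Long): Double =
    if (count == 0L) 0.0 else (count - nulls).toDouble / count

  private def log2(x: Double): Double = math.log(x) / math.log(2.0)

  // Turn raw bin counts into a probability distribution.
  private def normalize(counts: Seq[Double]): Seq[Double] = {
    val total = counts.sum
    if (total == 0.0) counts else counts.map(_ / total)
  }

  // Jensen-Shannon divergence: 0.5 * KL(P || M) + 0.5 * KL(Q || M), with M = (P + Q) / 2.
  def jsDivergence(trainCounts: Seq[Double], scoreCounts: Seq[Double]): Double = {
    require(trainCounts.length == scoreCounts.length, "distributions must share bins")
    val p = normalize(trainCounts)
    val q = normalize(scoreCounts)
    p.zip(q).map { case (pi, qi) =>
      val mi = 0.5 * (pi + qi)
      // Bins with zero probability contribute nothing (0 * log 0 is taken as 0).
      def term(x: Double): Double = if (x > 0.0) 0.5 * x * log2(x / mi) else 0.0
      term(pi) + term(qi)
    }.sum
  }

  // Absolute difference in fill rates.
  def fillRateDiff(train: Double, score: Double): Double = math.abs(train - score)

  // Assumed reading of "fill ratio difference": larger fill rate over smaller one.
  def fillRatioDiff(train: Double, score: Double): Double = {
    val lo = math.min(train, score)
    val hi = math.max(train, score)
    if (lo == 0.0) Double.PositiveInfinity else hi / lo
  }
}
```

Under these assumptions, identical distributions give a JS divergence of 0 and distributions with no overlapping bins give 1.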
Contains configuration and results from RawFeatureFilter
configuration settings for RawFeatureFilter
feature distributions calculated from training data
feature metrics calculated by RawFeatureFilter
results of RawFeatureFilter tests (reasons why feature is dropped or not)
Class used to get summaries of prepared features to determine distribution binning strategy
minimum value seen for double, minimum number of tokens in one text for text
maximum value seen for double, maximum number of tokens in one text for text
sum of values for double, total number of tokens for text
number of doubles for double, number of texts for text
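Because each of the four fields above can be merged pairwise, summaries for individual values can be combined across a dataset. The sketch below illustrates this; the class name, the `combine` method, and the helper constructors are hypothetical and are not taken from the library.

```scala
// Hypothetical summary with the four fields described above.
final case class SummarySketch(min: Double, max: Double, sum: Double, count: Double) {
  // Merge two summaries: min of mins, max of maxes, sums and counts added.
  def combine(that: SummarySketch): SummarySketch = SummarySketch(
    math.min(min, that.min),
    math.max(max, that.max),
    sum + that.sum,
    count + that.count
  )
}

object SummarySketch {
  // Identity element for combine.
  val empty: SummarySketch =
    SummarySketch(Double.PositiveInfinity, Double.NegativeInfinity, 0.0, 0.0)

  // Summary of one numeric value.
  def ofDouble(v: Double): SummarySketch = SummarySketch(v, v, v, 1.0)

  // Summary of one text: min, max and sum are all its token count; count is one text.
  def ofText(tokens: Seq[String]): SummarySketch = {
    val n = tokens.length.toDouble
    SummarySketch(n, n, n, 1.0)
  }
}
```

Folding per-value summaries with `combine` yields the range needed to choose numeric bin splits, or the minimum and maximum token counts for text.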