com.salesforce.op.stages.impl.preparators
Identifies which features to drop based on the input exclusion criteria, and returns an array of the dropped columns, with messages for logging why each column was dropped
ColumnStatistics containing multivariate statistics computed by Spark
Min variance for dropping features
Min correlation with label for dropping features
Max correlation with label for dropping features
Max correlation between features for dropping the later features
Max Cramer's V for dropping categorical features
Max allowed confidence of association rules for dropping features
Threshold for association rule
Whether to remove features descended from parent feature with derived features that meet exclusion criteria
Whether individual hash is dropped or kept independently of related null indicators or other hashes
columns to drop, with exclusion reasons
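As a rough illustration of how exclusion criteria like these might be applied, the sketch below checks a simplified per-column record against three of the thresholds above and collects the reasons for dropping. `SimpleColumnStats`, `reasonsToDrop`, and `columnsToDrop` are hypothetical names for this example and not the library's ColumnStatistics API; comparing the absolute value of the correlation is also an assumption.

```scala
// Hypothetical, simplified stand-in for ColumnStatistics (not the library's class).
final case class SimpleColumnStats(
  name: String,
  variance: Option[Double],
  corrLabel: Option[Double] // None when no correlation was computed for this column
)

object ExclusionSketch {
  // Returns the reasons (possibly none) why a single column would be dropped.
  // Assumption: correlations are compared by absolute value.
  def reasonsToDrop(
    col: SimpleColumnStats,
    minVariance: Double,
    minCorrelation: Double,
    maxCorrelation: Double
  ): List[String] = {
    val lowVariance = col.variance.filter(_ < minVariance)
      .map(v => s"variance $v lower than min variance $minVariance")
    val lowCorr = col.corrLabel.filter(c => math.abs(c) < minCorrelation)
      .map(c => s"label correlation $c lower than min correlation $minCorrelation")
    val highCorr = col.corrLabel.filter(c => math.abs(c) > maxCorrelation)
      .map(c => s"label correlation $c higher than max correlation $maxCorrelation")
    List(lowVariance, lowCorr, highCorr).flatten
  }

  // Pairs each dropped column with its reasons, ready for logging.
  def columnsToDrop(
    cols: Seq[SimpleColumnStats],
    minVariance: Double,
    minCorrelation: Double,
    maxCorrelation: Double
  ): Seq[(String, List[String])] =
    cols
      .map(c => c.name -> reasonsToDrop(c, minVariance, minCorrelation, maxCorrelation))
      .filter(_._2.nonEmpty)
}
```

Keeping the reasons next to the column name makes it straightforward to log one line per dropped column.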
Builds an Array of ColumnStatistics objects containing all the data we calculate for each column (e.g. mean, max, variance, correlation, Cramer's V, etc.)
Sequence of OpVectorColumnMetadata to use for grouping features
Multivariate statistics previously computed by Spark
Name of label and index of the column corresponding to the label
Array containing correlations between each feature vector element and the label
Indices that we actually compute correlations for (e.g. hashed text features can be ignored)
Array of CategoricalGroupStats for each group of feature vector indices corresponding to a categorical feature
Array of ColumnStatistics objects, one for each column in metaCols
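Because correlations are only computed for a subset of indices, they have to be realigned with the full column list before per-column statistics can be assembled. A minimal sketch of that step, assuming the correlation array is ordered like the index array; `alignCorrelations` is a hypothetical helper and not part of the library.

```scala
object CorrelationAlignment {
  // Assumption: corrs(i) belongs to column corrIndices(i).
  // Columns without a computed correlation (e.g. hashed text) map to None.
  def alignCorrelations(
    numColumns: Int,
    corrIndices: Array[Int],
    corrs: Array[Double]
  ): Array[Option[Double]] = {
    require(corrIndices.length == corrs.length, "one correlation per index expected")
    val byIndex = corrIndices.zip(corrs).toMap
    Array.tabulate(numColumns)(i => byIndex.get(i))
  }
}
```

Representing a missing correlation as `None` keeps "not computed" distinct from a correlation of zero.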
Transformation used in derived feature filters. If removeBadFeatures is false, then this is just identity (does nothing); otherwise, returns OPVector with only columns in indicesToKeep
column indices of derived features to keep
whether to remove any features
OPVector with bad features dropped if removeBadFeatures is true
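The shape of this transformation can be sketched as a function that returns another function. The example follows the return description (features are dropped only when removeBadFeatures is true) and uses a plain `Array[Double]` as a stand-in for OPVector, which is an assumption made to keep the example self-contained; `RemoveFeaturesSketch` is a hypothetical name.

```scala
object RemoveFeaturesSketch {
  // Array[Double] stands in for OPVector here (an assumption for this example).
  def removeFeatures(
    indicesToKeep: Array[Int],
    removeBadFeatures: Boolean
  ): Array[Double] => Array[Double] =
    if (removeBadFeatures) values => indicesToKeep.map(i => values(i)) // keep selected columns only
    else identity // nothing is removed
}
```

Deciding on the flag once, outside the returned function, avoids re-checking it for every row the transformation is applied to.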