Apply OpIndexToStringNoFilter transformer.
Optional array of labels specifying index-string mapping. If not provided or if empty, then metadata from input feature is used instead.
name to give strings that appear in transform but not in fit
how to transform values not seen in fitting
deindexed text feature
OpStringIndexerNoFilter for converting text into indices
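As an illustration of the de-indexing described above, here is a minimal plain-Python sketch of the idea, not the library's implementation. The function name `index_to_string` and the placeholder value `"unseen"` are hypothetical; the sketch assumes that any index without a matching label is replaced by the unseen name rather than filtered out.

```python
from typing import List, Sequence


def index_to_string(indices: Sequence[int],
                    labels: Sequence[str],
                    unseen_name: str = "unseen") -> List[str]:
    """Map each integer index back to its label.

    Indices that fall outside the label array (values not seen in
    fitting) are replaced by `unseen_name` instead of being dropped.
    """
    return [labels[i] if 0 <= i < len(labels) else unseen_name
            for i in indices]


# Index 5 has no label, so it is given the unseen name.
print(index_to_string([0, 2, 5], ["a", "b", "c"]))
```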
Apply SanityChecker estimator.
Rate to downsample the data for statistical calculations (note: actual sampling will not be exact due to Spark's dataset sampling behavior)
Seed to use when sampling
Lower limit on number of samples in downsampled data set (note: sample limit will not be exact, due to Spark's dataset sampling behavior)
Upper limit on number of samples in downsampled data set (note: sample limit will not be exact, due to Spark's dataset sampling behavior)
Maximum correlation (absolute value) allowed between a feature in the feature vector and the label
Minimum correlation (absolute value) allowed between a feature in the feature vector and the label
Which coefficient to use for computing correlation
Minimum amount of variance allowed for each feature and label
If set to true, this will automatically remove all the bad features from the feature vector
remove all features descended from a parent feature
protect text shared hash from related null indicators and other hashes
Maximum allowed confidence of association rules in categorical variables. A categorical variable will be removed if there is a choice where the maximum confidence is above this threshold, and the support for that choice is above the min rule support parameter, defined below.
Categoricals can be removed if an association rule is found between one of the choices and a categorical label where the confidence of that rule is above maxRuleConfidence and the support fraction of that choice is above minRuleSupport.
If true, then only calculate correlations between features and label instead of the entire correlation matrix, which includes all feature-feature correlations
Setting for what categories of feature vector columns to exclude from the correlation calculation (e.g. hashed text features)
If true, treat label as categorical. If not set, check number of distinct labels to decide whether a label should be treated as categorical.
sanity checked feature vector
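The checks described by the parameters above can be sketched in plain Python. This is a simplified illustration under stated assumptions, not the SanityChecker implementation: it uses Pearson correlation only, sample variance, and a single categorical choice; the names `check_feature` and `rule_stats` and the threshold values in the usage lines are hypothetical.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple


def variance(xs: Sequence[float]) -> float:
    """Sample variance (n - 1 in the denominator)."""
    n = len(xs)
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / (n - 1)


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation; NaN when either input is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return float("nan") if sx == 0 or sy == 0 else cov / (sx * sy)


def check_feature(values: Sequence[float], label: Sequence[float],
                  min_variance: float, min_corr: float,
                  max_corr: float) -> List[str]:
    """Return the reasons (if any) for flagging one feature column."""
    reasons = []
    if variance(values) < min_variance:
        reasons.append("variance below minimum")
    r = pearson(values, label)
    if not math.isnan(r):
        if abs(r) > max_corr:
            reasons.append("correlation above maximum")
        elif abs(r) < min_corr:
            reasons.append("correlation below minimum")
    return reasons


def rule_stats(has_choice: Sequence[bool],
               labels: Sequence[str]) -> Tuple[float, float]:
    """Return (max confidence, support) for one categorical choice.

    Support is the fraction of rows with the choice; confidence is the
    largest share of any single label among those rows.
    """
    chosen = [lab for flag, lab in zip(has_choice, labels) if flag]
    if not chosen:
        return 0.0, 0.0
    confidence = max(Counter(chosen).values()) / len(chosen)
    return confidence, len(chosen) / len(labels)


# A feature identical to the label is flagged as too correlated.
print(check_feature([0, 0, 1, 1], [0, 0, 1, 1], 1e-5, 0.0, 0.95))
# A choice that always implies one label has confidence 1.0.
print(rule_stats([True, True, True, False], ["y", "y", "y", "n"]))
```

A choice would be a removal candidate in this sketch when its confidence exceeds the maximum rule confidence and its support exceeds the minimum rule support.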
Apply standard isotonic regression transformer shortcut function.
feature to calibrate against
whether the fitted function is increasing (default: true) or decreasing
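For readers unfamiliar with isotonic regression, the following is a minimal sketch of the pool-adjacent-violators approach in plain Python. It is an illustration only, not the shortcut function itself: the name `isotonic_fit` is hypothetical, and the sketch assumes the target values are already ordered by the feature being calibrated against.

```python
from typing import List, Sequence


def isotonic_fit(y: Sequence[float], increasing: bool = True) -> List[float]:
    """Fit a monotone sequence to y with pool-adjacent-violators.

    y is assumed to be sorted by the calibrating feature. A decreasing
    fit is obtained by negating, fitting increasing, and negating back.
    """
    vals = list(y) if increasing else [-v for v in y]
    blocks: List[List[float]] = []  # each block is [sum, count]
    for v in vals:
        blocks.append([v, 1])
        # Merge while the previous block's mean exceeds the last one's.
        while (len(blocks) > 1 and
               blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]):
            s, c = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += c
    fitted: List[float] = []
    for s, c in blocks:
        fitted.extend([s / c] * int(c))
    return fitted if increasing else [-f for f in fitted]


# The out-of-order pair (3.0, 2.0) is pooled to its mean.
print(isotonic_fit([1.0, 3.0, 2.0, 4.0]))
```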
Apply PercentileBucketizer transformer shortcut function.
number of bins to scale into
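The idea of scaling values into a fixed number of percentile bins can be sketched as follows. This is a simplified, exact-rank illustration in plain Python; the function name `percentile_buckets` is hypothetical, and a Spark-based transformer would typically rely on approximate quantiles, so bin boundaries may differ.

```python
from bisect import bisect_left
from typing import List, Sequence


def percentile_buckets(values: Sequence[float], buckets: int) -> List[int]:
    """Assign each value a bin in 0..buckets-1 by its rank.

    The rank of a value is the number of values strictly below it, so
    tied values always land in the same bin.
    """
    ordered = sorted(values)
    n = len(ordered)
    return [bisect_left(ordered, v) * buckets // n for v in values]


# With two bins, the lower half maps to 0 and the upper half to 1.
print(percentile_buckets([40, 10, 30, 20], 2))
```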
Apply real vectorizer: Converts a sequence of RealNN features into a vector feature.
other features of same type
Z-normalization shortcut function using OpStandardScaler.
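Z-normalization subtracts the mean and divides by the standard deviation. The plain-Python sketch below illustrates the arithmetic only; the name `z_normalize` is hypothetical, and the use of the sample standard deviation (n - 1 denominator) and the handling of constant input are assumptions of this sketch.

```python
import math
from typing import List, Sequence


def z_normalize(values: Sequence[float]) -> List[float]:
    """Scale values to zero mean and unit sample standard deviation.

    Inputs with fewer than two values, or with zero spread, are mapped
    to all zeros to avoid dividing by zero.
    """
    n = len(values)
    if n < 2:
        return [0.0] * n
    mean = sum(values) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    if std == 0:
        return [0.0] * n
    return [(v - mean) / std for v in values]


print(z_normalize([1.0, 2.0, 3.0]))
```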