Package com.salesforce.op.filters

Type Members

  1. case class ExclusionReasons(name: String, key: Option[String], trainingUnfilledState: Boolean, trainingNullLabelLeaker: Boolean, scoringUnfilledState: Boolean, jsDivergenceMismatch: Boolean, fillRateDiffMismatch: Boolean, fillRatioDiffMismatch: Boolean, excluded: Boolean) extends Product with Serializable

    Contains results of Raw Feature Filter tests for a given feature

    name

    feature name

    key

    map key associated with distribution (when the feature is a map)

    trainingUnfilledState

    training fill rate did not meet min required

    trainingNullLabelLeaker

    null indicator correlation (absolute) exceeded max allowed

    scoringUnfilledState

    scoring fill rate did not meet min required

    jsDivergenceMismatch

    distribution mismatch: JS Divergence exceeded max allowed

    fillRateDiffMismatch

    distribution mismatch: fill rate difference exceeded max allowed

    fillRatioDiffMismatch

    distribution mismatch: fill ratio difference exceeded max allowed

    excluded

    feature excluded after failing one or more tests

  2. case class FeatureDistribution(name: String, key: Option[String], count: Long, nulls: Long, distribution: Array[Double], summaryInfo: Array[Double], moments: Option[Moments] = None, cardEstimate: Option[TextStats] = None, type: FeatureDistributionType = FeatureDistributionType.Training) extends FeatureDistributionLike with Product with Serializable

    Class containing summary information for a feature

    name

    name of the feature

    key

    map key associated with distribution (when the feature is a map)

    count

    total count of feature seen

    nulls

    number of empties seen in feature

    distribution

    binned counts of feature values (hashed for strings, evenly spaced bins for numerics)

    summaryInfo

    either min and max number of tokens for text data, or splits used for bins for numeric data
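
    The fields above can be illustrated with a small, self-contained sketch. It does not use the library class: `Dist`, `splits`, `binCounts`, and `fromNumeric` are hypothetical names, and the choice to close the last bin on the right is an assumption made for the example. It shows how `count`, `nulls`, `distribution`, and `summaryInfo` might be filled in for a numeric feature with evenly spaced bins.

```scala
object FeatureDistributionSketch {
  // Hypothetical, simplified stand-in for FeatureDistribution (not the library class).
  final case class Dist(
    name: String,
    key: Option[String],
    count: Long,
    nulls: Long,
    distribution: Array[Double],
    summaryInfo: Array[Double]
  ) {
    // Fraction of rows in which the feature is filled; 0.0 when nothing was seen.
    def fillRate: Double = if (count == 0L) 0.0 else (count - nulls).toDouble / count
  }

  // Evenly spaced bin edges between min and max (bins + 1 edges).
  def splits(min: Double, max: Double, bins: Int): Array[Double] = {
    require(bins > 0 && max >= min, "need bins > 0 and max >= min")
    val step = (max - min) / bins
    Array.tabulate(bins + 1)(i => min + i * step)
  }

  // Count values per bin; the last bin is closed on the right so max is included.
  def binCounts(values: Seq[Double], edges: Array[Double]): Array[Double] = {
    val bins = edges.length - 1
    val counts = Array.fill(bins)(0.0)
    val (lo, hi) = (edges.head, edges.last)
    values.foreach { v =>
      if (v >= lo && v <= hi) {
        val idx =
          if (hi == lo) 0
          else math.min(((v - lo) / (hi - lo) * bins).toInt, bins - 1)
        counts(idx) += 1.0
      }
    }
    counts
  }

  // Build a distribution for a numeric feature; None marks an empty value.
  def fromNumeric(name: String, raw: Seq[Option[Double]], bins: Int): Dist = {
    val present = raw.flatten
    val edges =
      if (present.isEmpty) Array(0.0, 0.0)
      else splits(present.min, present.max, bins)
    Dist(
      name = name,
      key = None,
      count = raw.size.toLong,
      nulls = raw.count(_.isEmpty).toLong,
      distribution = binCounts(present, edges),
      summaryInfo = edges
    )
  }
}
```

    In this sketch `summaryInfo` carries the bin edges, matching the "splits used for bins" reading for numeric data; text features would instead store token-count bounds.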

  3. case class FilteredRawData(cleanedData: DataFrame, featuresToDrop: Array[OPFeature], mapKeysToDrop: Map[String, Set[String]], rawFeatureFilterResults: RawFeatureFilterResults) extends Product with Serializable

    Contains the data filtered by Raw Feature Filter (RFF), the features to drop, and the RFF results

    cleanedData

    RFF cleaned data

    featuresToDrop

    raw features dropped by RFF

    mapKeysToDrop

    keys in map features dropped by RFF

    rawFeatureFilterResults

    feature information calculated from the training data
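
    As a rough illustration of how `featuresToDrop` and `mapKeysToDrop` relate to per-feature results, the sketch below splits exclusion records into whole features (no map key) and individual map keys. The `Exclusion` record and `dropLists` helper are hypothetical and use plain feature names rather than `OPFeature`; the grouping rule is an assumption for the example, not the library's implementation.

```scala
object DropListsSketch {
  // Hypothetical minimal record: feature name, optional map key, exclusion flag.
  final case class Exclusion(name: String, key: Option[String], excluded: Boolean)

  // Returns (names of whole features to drop, map keys to drop per map feature).
  def dropLists(results: Seq[Exclusion]): (Set[String], Map[String, Set[String]]) = {
    val dropped = results.filter(_.excluded)
    // Records without a key describe a whole (non-map) feature.
    val features = dropped.collect { case Exclusion(n, None, _) => n }.toSet
    // Records with a key describe one key of a map feature.
    val mapKeys = dropped
      .collect { case Exclusion(n, Some(k), _) => n -> k }
      .groupBy(_._1)
      .map { case (n, pairs) => n -> pairs.map(_._2).toSet }
    (features, mapKeys)
  }
}
```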

  4. class RawFeatureFilter[T] extends Serializable

    Specialized stage that loads data and computes distributions and empty counts for raw features. This information is then used to decide which raw features should be excluded from the workflow DAG.

    Note: Currently, raw features that are not explicitly blocklisted, but are unused because they are inputs to explicitly blocklisted features, are not present as raw features in the model or in ModelInsights. However, they are accessible from an OpWorkflowModel via getRawFeatureFilterResults().

    T

    datatype of the reader

  5. case class RawFeatureFilterConfig(minFill: Double = 0.0, maxFillDifference: Double = Double.PositiveInfinity, maxFillRatioDiff: Double = Double.PositiveInfinity, maxJSDivergence: Double = 1.0, maxCorrelation: Double = 1.0, correlationType: CorrelationType = CorrelationType.Pearson, jsDivergenceProtectedFeatures: Seq[String] = Seq.empty, protectedFeatures: Seq[String] = Seq.empty) extends Product with Serializable

    Contains configuration settings for Raw Feature Filter
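
    The numeric settings read as thresholds that per-feature metrics are compared against. The sketch below mirrors those thresholds, with the defaults from the signature above, in a local `Thresholds` class and derives failure flags from a set of metrics. `Thresholds`, `Metrics`, `Reasons`, and `check` are hypothetical names, the strict comparisons are an assumption, and `correlationType` and the protected-feature lists are left out.

```scala
object ThresholdSketch {
  // Local mirror of the numeric settings, using the defaults shown in the signature.
  final case class Thresholds(
    minFill: Double = 0.0,
    maxFillDifference: Double = Double.PositiveInfinity,
    maxFillRatioDiff: Double = Double.PositiveInfinity,
    maxJSDivergence: Double = 1.0,
    maxCorrelation: Double = 1.0
  )

  // Hypothetical per-feature metrics; None means the metric was not computed.
  final case class Metrics(
    trainingFillRate: Double,
    trainingNullLabelAbsoluteCorr: Option[Double] = None,
    scoringFillRate: Option[Double] = None,
    jsDivergence: Option[Double] = None,
    fillRateDiff: Option[Double] = None,
    fillRatioDiff: Option[Double] = None
  )

  // One flag per test; a feature is excluded when any test fails.
  final case class Reasons(
    trainingUnfilledState: Boolean,
    trainingNullLabelLeaker: Boolean,
    scoringUnfilledState: Boolean,
    jsDivergenceMismatch: Boolean,
    fillRateDiffMismatch: Boolean,
    fillRatioDiffMismatch: Boolean
  ) {
    def excluded: Boolean =
      trainingUnfilledState || trainingNullLabelLeaker || scoringUnfilledState ||
        jsDivergenceMismatch || fillRateDiffMismatch || fillRatioDiffMismatch
  }

  // Compare each metric with its threshold; missing metrics never fail a test.
  def check(m: Metrics, t: Thresholds): Reasons = Reasons(
    trainingUnfilledState = m.trainingFillRate < t.minFill,
    trainingNullLabelLeaker = m.trainingNullLabelAbsoluteCorr.exists(_ > t.maxCorrelation),
    scoringUnfilledState = m.scoringFillRate.exists(_ < t.minFill),
    jsDivergenceMismatch = m.jsDivergence.exists(_ > t.maxJSDivergence),
    fillRateDiffMismatch = m.fillRateDiff.exists(_ > t.maxFillDifference),
    fillRatioDiffMismatch = m.fillRatioDiff.exists(_ > t.maxFillRatioDiff)
  )
}
```

    Under these assumptions the default thresholds never exclude a feature, so filtering only takes effect once at least one setting is tightened, for example `Thresholds(minFill = 0.1)`.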

  6. trait RawFeatureFilterFormats extends AnyRef

  7. case class RawFeatureFilterMetrics(name: String, key: Option[String], trainingFillRate: Double, trainingNullLabelAbsoluteCorr: Option[Double], scoringFillRate: Option[Double], jsDivergence: Option[Double], fillRateDiff: Option[Double], fillRatioDiff: Option[Double]) extends Product with Serializable

    Contains raw feature metrics computed in Raw Feature Filter

    name

    feature name

    key

    map key associated with distribution (when the feature is a map)

    trainingFillRate

    proportion of values that are non-null (filled) in the training distribution

    trainingNullLabelAbsoluteCorr

    correlation between null indicator and the label in the training distribution

    scoringFillRate

    proportion of values that are non-null (filled) in the scoring distribution

    jsDivergence

    Jensen-Shannon (JS) divergence between the training and scoring distributions

    fillRateDiff

    absolute difference in fill rates between the training and scoring distributions

    fillRatioDiff

    ratio of the larger to the smaller fill rate between the training and scoring distributions
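
    To make the distribution-comparison metrics concrete, here is a minimal sketch of how they could be computed from binned counts and fill rates. It is not the library's code: the helper names are hypothetical, the JS divergence uses base-2 logarithms (so it falls in [0, 1]), and the fill ratio is taken as the larger fill rate divided by the smaller, which is an assumption.

```scala
object MetricsSketch {
  private def log2(x: Double): Double = math.log(x) / math.log(2.0)

  // Turn binned counts into probabilities; requires a positive total.
  def normalize(counts: Array[Double]): Array[Double] = {
    val total = counts.sum
    require(total > 0.0, "distribution must have a positive total")
    counts.map(_ / total)
  }

  // Kullback-Leibler divergence in bits; bins where p is 0 contribute nothing.
  private def kl(p: Array[Double], q: Array[Double]): Double =
    p.zip(q).collect { case (pi, qi) if pi > 0.0 => pi * log2(pi / qi) }.sum

  // Jensen-Shannon divergence between two binned distributions of equal length.
  def jsDivergence(a: Array[Double], b: Array[Double]): Double = {
    require(a.length == b.length, "distributions must have the same number of bins")
    val (p, q) = (normalize(a), normalize(b))
    val m = p.zip(q).map { case (pi, qi) => 0.5 * (pi + qi) }
    0.5 * kl(p, m) + 0.5 * kl(q, m)
  }

  // Absolute difference between training and scoring fill rates.
  def fillRateDiff(train: Double, score: Double): Double = math.abs(train - score)

  // Larger fill rate over smaller one; infinite when the smaller is zero (assumed convention).
  def fillRatioDiff(train: Double, score: Double): Double = {
    val (small, large) = (math.min(train, score), math.max(train, score))
    if (small == 0.0) Double.PositiveInfinity else large / small
  }
}
```

    With base-2 logarithms, two distributions with no overlapping bins give a divergence of 1.0 and identical distributions give 0.0.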

  8. case class RawFeatureFilterResults(rawFeatureFilterConfig: RawFeatureFilterConfig = RawFeatureFilterConfig(), rawFeatureDistributions: Seq[FeatureDistribution] = Seq.empty, rawFeatureFilterMetrics: Seq[RawFeatureFilterMetrics] = Seq.empty, exclusionReasons: Seq[ExclusionReasons] = Seq.empty) extends Product with Serializable

    Contains configuration and results from RawFeatureFilter

    rawFeatureFilterConfig

    configuration settings for RawFeatureFilter

    rawFeatureDistributions

    feature distributions calculated from training data

    rawFeatureFilterMetrics

    feature metrics calculated by RawFeatureFilter

    exclusionReasons

    results of RawFeatureFilter tests (reasons why feature is dropped or not)

  9. case class Summary(min: Double, max: Double, sum: Double, count: Double) extends Product with Serializable

    Class used to get summaries of prepared features to determine distribution binning strategy

    min

    minimum value seen for double, minimum number of tokens in one text for text

    max

    maximum value seen for double, maximum number of tokens in one text for text

    sum

    sum of values for double, total number of tokens for text

    count

    number of doubles for double, number of texts for text
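
    A summary of this shape can be built incrementally by combining per-value summaries. The sketch below uses a local `SummaryLike` class with the same four fields; `combine`, `empty`, `ofDouble`, `ofText`, and `summarize` are hypothetical helpers for illustration and are not taken from the library.

```scala
object SummarySketch {
  // Local stand-in with the same fields as Summary.
  final case class SummaryLike(min: Double, max: Double, sum: Double, count: Double) {
    // Merge two summaries: widen the range and add up sums and counts.
    def combine(other: SummaryLike): SummaryLike = SummaryLike(
      math.min(min, other.min),
      math.max(max, other.max),
      sum + other.sum,
      count + other.count
    )
  }

  // Identity element for combine: an empty range with nothing seen.
  val empty: SummaryLike =
    SummaryLike(Double.PositiveInfinity, Double.NegativeInfinity, 0.0, 0.0)

  // Summary of a single numeric value.
  def ofDouble(v: Double): SummaryLike = SummaryLike(v, v, v, 1.0)

  // Summary of a single text, measured by its number of tokens.
  def ofText(tokens: Seq[String]): SummaryLike = {
    val n = tokens.size.toDouble
    SummaryLike(n, n, n, 1.0)
  }

  // Fold a collection of numeric values into one summary.
  def summarize(values: Seq[Double]): SummaryLike =
    values.map(ofDouble).foldLeft(empty)(_ combine _)
}
```

    Because `combine` is associative and `empty` is its identity, partial summaries computed on separate data partitions can be merged in any order, and the resulting `min` and `max` can then bound the bins.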

Value Members

  1. object FeatureDistribution extends Serializable

  2. object RawFeatureFilter extends Serializable

  3. object RawFeatureFilterConfig extends RawFeatureFilterFormats with Serializable

  4. object RawFeatureFilterResults extends RawFeatureFilterFormats with Serializable

  5. object Summary extends Product with Serializable
