com.salesforce.op.filters

RawFeatureFilter

Related Docs: object RawFeatureFilter | package filters

class RawFeatureFilter[T] extends Serializable

Specialized stage that loads data and computes distributions and empty counts for raw features. This information is then used to determine which raw features should be excluded from the workflow DAG. Note: currently, raw features that are not explicitly blacklisted, but are unused because they are inputs to explicitly blacklisted features, do not appear as raw features in the model or in ModelInsights. However, they are accessible from an OpWorkflowModel via getRawFeatureDistributions().

T

type of the records read by the training and scoring readers
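The empty counts mentioned in the class description translate directly into fill rates. The following is a minimal sketch, not the library's implementation: `FillRateSketch`, `fillRate` and `passesMinFill` are hypothetical names, values are modeled as `Option`s (with `None` meaning empty), and the `minFill` threshold is applied to the training data and, when present, the scoring data, as the `minFill` parameter description states.

```scala
// Hypothetical sketch (not the TransmogrifAI implementation): fill rates from
// empty counts, plus a minFill check on training and optional scoring data.
object FillRateSketch {

  /** Fraction of rows in which the feature is non-empty; 0.0 when there are no rows. */
  def fillRate(values: Seq[Option[Any]]): Double =
    if (values.isEmpty) 0.0
    else values.count(_.isDefined).toDouble / values.size

  /** Keeps a feature only if every available dataset meets the minFill threshold. */
  def passesMinFill(
      train: Seq[Option[Any]],
      scoring: Option[Seq[Option[Any]]],
      minFill: Double): Boolean = {
    // The scoring fill rate is checked only when scoring data is supplied.
    val rates = fillRate(train) :: scoring.map(fillRate).toList
    rates.forall(_ >= minFill)
  }

  def main(args: Array[String]): Unit = {
    val age = Seq(Some(31), None, Some(45), Some(27)) // 3 of 4 rows filled
    println(fillRate(age))                             // 0.75
    println(passesMinFill(age, None, minFill = 0.5))   // true
  }
}
```

Without a scoring dataset only the training fill rate is tested, which mirrors the `scoringReader` description: exclusion then falls back to training data alone.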

Linear Supertypes
Serializable, Serializable, AnyRef, Any

Instance Constructors

  1. new RawFeatureFilter(trainingReader: Reader[T], scoringReader: Option[Reader[T]], bins: Int, minFill: Double, maxFillDifference: Double, maxFillRatioDiff: Double, maxJSDivergence: Double, maxCorrelation: Double, correlationType: CorrelationType = CorrelationType.Pearson, jsDivergenceProtectedFeatures: Set[String] = Set.empty, protectedFeatures: Set[String] = Set.empty, textBinsFormula: (Summary, Int) ⇒ Int = RawFeatureFilter.textBinsFormula, timePeriod: Option[TimePeriod] = None)

    trainingReader

    reader to get the training data

    scoringReader

    reader to get the scoring data for comparison (optional; if absent, features are excluded based on the training data only)

    bins

    number of bins to use in computing feature distributions (histograms for numerics, hashes for strings)

    minFill

    minimum fill rate a feature must have in both the training and scoring datasets to be kept

    maxFillDifference

    maximum acceptable difference in fill rate between the training and scoring data for a feature to be kept

    maxFillRatioDiff

    maximum acceptable ratio of fill rates between the training and scoring data (larger / smaller) for a feature to be kept

    maxJSDivergence

    maximum Jensen-Shannon divergence between the training and scoring distributions for a feature to be kept

    maxCorrelation

    maximum absolute correlation allowed between a raw predictor's null indicator and the label

    correlationType

    type of correlation metric to use

    jsDivergenceProtectedFeatures

    features that are protected from removal by the JS divergence check

    protectedFeatures

    features that are protected from removal

    textBinsFormula

    formula to compute the number of bins for text features. Its input arguments are a Summary and the number of bins used in computing feature distributions (histograms for numerics, hashes for strings); its output is the number of bins for the text features.

    timePeriod

    time period used to apply a circular date transformation to date features; if not specified, the regular numeric feature transformation is used
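The training-vs-scoring thresholds above (`bins`, `maxFillDifference`, `maxFillRatioDiff`, `maxJSDivergence`) can be illustrated with a small self-contained sketch. It is an assumption-laden illustration rather than the library's code: `DistributionCheckSketch` and its methods are hypothetical, numeric values are placed into equal-width bins over a shared range (the hashing used for strings is not shown), the Jensen-Shannon divergence uses base-2 logarithms so it falls in [0, 1], and the fill difference is taken as an absolute difference.

```scala
// Hypothetical sketch, not the TransmogrifAI implementation. Assumptions:
// equal-width numeric bins over a shared range, base-2 logs for the
// Jensen-Shannon divergence, and an absolute fill-rate difference.
object DistributionCheckSketch {

  /** Counts values into `bins` equal-width bins spanning [lo, hi]. */
  def histogram(values: Seq[Double], bins: Int, lo: Double, hi: Double): Array[Double] = {
    require(bins > 0, "bins must be positive")
    val counts = Array.fill(bins)(0.0)
    val width = if (hi > lo) (hi - lo) / bins else 1.0
    values.foreach { v =>
      // Clamp the index so that v == hi lands in the last bin.
      val idx = math.min(bins - 1, math.max(0, ((v - lo) / width).toInt))
      counts(idx) += 1.0
    }
    counts
  }

  /** Turns bin counts into a probability distribution (all zeros if empty). */
  def normalize(counts: Array[Double]): Array[Double] = {
    val total = counts.sum
    if (total == 0.0) counts.map(_ => 0.0) else counts.map(_ / total)
  }

  private def log2(x: Double): Double = math.log(x) / math.log(2.0)

  /** Jensen-Shannon divergence of two aligned distributions; lies in [0, 1]. */
  def jsDivergence(p: Array[Double], q: Array[Double]): Double = {
    require(p.length == q.length, "distributions must use the same bins")
    val m = p.zip(q).map { case (a, b) => (a + b) / 2.0 }
    // KL(a || m), skipping zero-probability terms (0 * log 0 is taken as 0).
    def kl(a: Array[Double]): Double =
      a.zip(m).collect { case (ai, mi) if ai > 0.0 => ai * log2(ai / mi) }.sum
    0.5 * kl(p) + 0.5 * kl(q)
  }

  /** True if the feature passes the fill-difference, fill-ratio and JS checks. */
  def keep(
      trainFill: Double,
      scoreFill: Double,
      js: Double,
      maxFillDifference: Double,
      maxFillRatioDiff: Double,
      maxJSDivergence: Double): Boolean = {
    val diff = math.abs(trainFill - scoreFill)
    val smaller = math.min(trainFill, scoreFill)
    // Ratio of larger to smaller fill rate; infinite if one side is never filled.
    val ratio =
      if (smaller == 0.0) Double.PositiveInfinity
      else math.max(trainFill, scoreFill) / smaller
    diff <= maxFillDifference && ratio <= maxFillRatioDiff && js <= maxJSDivergence
  }
}
```

Building both histograms over the same [lo, hi] range keeps the bins aligned, which the divergence requires. A feature listed in `jsDivergenceProtectedFeatures` or `protectedFeatures` would presumably bypass the corresponding check rather than being evaluated by a function like `keep`.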

Value Members

  1. final def !=(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  2. final def ##(): Int

    Definition Classes
    AnyRef → Any
  3. final def ==(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  4. final def asInstanceOf[T0]: T0

    Definition Classes
    Any
  5. val bins: Int

    number of bins to use in computing feature distributions (histograms for numerics, hashes for strings)

  6. def clone(): AnyRef

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  7. val correlationType: CorrelationType

    type of correlation metric to use

  8. final def eq(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  9. def equals(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  10. def finalize(): Unit

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( classOf[java.lang.Throwable] )
  11. def generateFilteredRaw(rawFeatures: Array[OPFeature], parameters: OpParams)(implicit spark: SparkSession): FilteredRawData

    Function that gets the raw features and parameters used in the workflow. It uses this information, along with the readers for this stage, to determine which features should be dropped from the workflow.

    rawFeatures

    raw features used in the workflow

    parameters

    parameters used in the workflow

    spark

    implicit Spark session

    returns

    DataFrame with bad features and bad map keys removed, and a list of all features that should be dropped from the DAG

  12. final def getClass(): Class[_]

    Definition Classes
    AnyRef → Any
  13. def hashCode(): Int

    Definition Classes
    AnyRef → Any
  14. final def isInstanceOf[T0]: Boolean

    Definition Classes
    Any
  15. val jsDivergenceProtectedFeatures: Set[String]

    features that are protected from removal by the JS divergence check

  16. lazy val log: Logger

    Attributes
    protected
  17. val maxCorrelation: Double

    maximum absolute correlation allowed between a raw predictor's null indicator and the label

  18. val maxFillDifference: Double

    maximum acceptable difference in fill rate between the training and scoring data for a feature to be kept

  19. val maxFillRatioDiff: Double

    maximum acceptable ratio of fill rates between the training and scoring data (larger / smaller) for a feature to be kept

  20. val maxJSDivergence: Double

    maximum Jensen-Shannon divergence between the training and scoring distributions for a feature to be kept

  21. val minFill: Double

    minimum fill rate a feature must have in both the training and scoring datasets to be kept

  22. final def ne(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  23. final def notify(): Unit

    Definition Classes
    AnyRef
  24. final def notifyAll(): Unit

    Definition Classes
    AnyRef
  25. val protectedFeatures: Set[String]

    features that are protected from removal

  26. val scoringReader: Option[Reader[T]]

    reader to get the scoring data for comparison (optional; if absent, features are excluded based on the training data only)

  27. final def synchronized[T0](arg0: ⇒ T0): T0

    Definition Classes
    AnyRef
  28. val textBinsFormula: (Summary, Int) ⇒ Int

    formula to compute the number of bins for text features. Its input arguments are a Summary and the number of bins used in computing feature distributions (histograms for numerics, hashes for strings); its output is the number of bins for the text features.

  29. val timePeriod: Option[TimePeriod]

    time period used to apply a circular date transformation to date features; if not specified, the regular numeric feature transformation is used

  30. def toString(): String

    Definition Classes
    AnyRef → Any
  31. val trainingReader: Reader[T]

    reader to get the training data

  32. final def wait(): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  33. final def wait(arg0: Long, arg1: Int): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  34. final def wait(arg0: Long): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
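The `maxCorrelation` member compares each raw predictor's null indicator with the label. A hedged sketch of that idea follows, using Pearson correlation (the default `correlationType` in the constructor signature). `NullLabelCorrelationSketch`, `pearson` and `leaksLabel` are hypothetical names, the label is assumed to be numeric, and treating an undefined (NaN) correlation as harmless is a choice made for this illustration only.

```scala
// Hypothetical sketch (not the TransmogrifAI implementation): Pearson correlation
// between a feature's null indicator (1.0 when empty, 0.0 when filled) and the label.
object NullLabelCorrelationSketch {

  /** Pearson correlation; NaN when either series has zero variance. */
  def pearson(x: Seq[Double], y: Seq[Double]): Double = {
    require(x.size == y.size && x.nonEmpty, "series must be non-empty and of equal length")
    val mx = x.sum / x.size
    val my = y.sum / y.size
    val cov = x.zip(y).map { case (a, b) => (a - mx) * (b - my) }.sum
    val vx = x.map(a => (a - mx) * (a - mx)).sum
    val vy = y.map(b => (b - my) * (b - my)).sum
    cov / math.sqrt(vx * vy)
  }

  /** True if the feature's missingness is more correlated with the label than allowed. */
  def leaksLabel(values: Seq[Option[Any]], label: Seq[Double], maxCorrelation: Double): Boolean = {
    val nullIndicator = values.map(v => if (v.isEmpty) 1.0 else 0.0)
    val corr = pearson(nullIndicator, label)
    // A NaN correlation (constant indicator or constant label) is treated as "no signal".
    !corr.isNaN && math.abs(corr) > maxCorrelation
  }
}
```

A feature whose emptiness tracks the label this closely is a likely source of label leakage, which is the situation the `maxCorrelation` threshold is meant to catch.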
