com.salesforce.op

OpWorkflowModel

Related Docs: object OpWorkflowModel | package op

class OpWorkflowModel extends OpWorkflowCore

A workflow model is a container and executor for the sequence of transformations that have been fit to the data to produce the desired output features.
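
As a rough illustration of how the members documented on this page fit together, the sketch below scores and evaluates with a fitted model, prints a summary, and saves the model. It relies only on the signatures listed on this page. The names WorkflowModelUsage and scoreEvaluateAndSave are hypothetical, the model is assumed to already have a reader or input dataset set, and the import path com.salesforce.op.evaluators for OpEvaluatorBase and EvaluationMetrics is an assumption. Compiling it requires the library and Spark on the classpath.

```scala
import com.salesforce.op.OpWorkflowModel
// Assumed package for the evaluator types used in the signatures on this page.
import com.salesforce.op.evaluators.{EvaluationMetrics, OpEvaluatorBase}
import org.apache.spark.sql.{DataFrame, SparkSession}

object WorkflowModelUsage {

  // Hypothetical helper: scores and evaluates with a fitted model, then saves it.
  // Assumes a reader (or input dataset) has already been set on the model.
  def scoreEvaluateAndSave(
    model: OpWorkflowModel,
    evaluator: OpEvaluatorBase[_ <: EvaluationMetrics],
    modelPath: String
  )(implicit spark: SparkSession): (DataFrame, EvaluationMetrics) = {
    // All other parameters keep the defaults shown in the signature.
    val (scores, metrics) = model.scoreAndEvaluate(evaluator = evaluator)

    // Compact, print-friendly summary of the fitted model.
    println(model.summaryPretty())

    // Persist the fitted model, overwriting anything already at the path.
    model.save(modelPath, overwrite = true)

    (scores, metrics)
  }
}
```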

Linear Supertypes
OpWorkflowCore, AnyRef, Any

Instance Constructors

  1. new OpWorkflowModel(uid: String = UID[OpWorkflowModel], trainingParams: OpParams)

    uid

    unique identifier for this workflow model

    trainingParams

    params that were used during model training

Value Members

  1. final def !=(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  2. final def ##(): Int

    Definition Classes
    AnyRef → Any
  3. final def ==(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  4. def applyTransformationsDAG(rawData: DataFrame, dag: StagesDAG, persistEveryKStages: Int)(implicit spark: SparkSession): DataFrame

    Efficiently applies all fitted stages grouping by level in the DAG where possible

    rawData

    data to transform

    dag

    computation graph

    persistEveryKStages

    breaks in computation to persist

    spark

    spark session

    returns

    transformed dataframe

    Attributes
    protected
    Definition Classes
    OpWorkflowCore
  5. final def asInstanceOf[T0]: T0

    Definition Classes
    Any
  6. def checkReadersAndFeatures(): Unit

    Check that readers and features are set and that params match them

    Attributes
    protected
    Definition Classes
    OpWorkflowCore
  7. def checkUnmatchedFeatures(): Unit

    Determine if any of the raw features do not have a matching reader

    Attributes
    protected
    Definition Classes
    OpWorkflowCore
  8. def clone(): AnyRef

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  9. def computeDataUpTo(feature: OPFeature, persistEveryKStages: Int = OpWorkflowModel.PersistEveryKStages)(implicit spark: SparkSession): DataFrame

    Returns a dataframe containing all the columns generated up to and including the feature input

    feature

    input feature to compute up to

    persistEveryKStages

    persist data in transforms every k stages for performance improvement

    returns

    Dataframe containing columns corresponding to all of the features generated up to the feature given

    Definition Classes
    OpWorkflowModel → OpWorkflowCore
    Exceptions thrown

    IllegalArgumentException if a feature is not part of this workflow

  10. def computeDataUpTo(feature: OPFeature, path: String)(implicit spark: SparkSession): Unit

    Computes a dataframe containing all the columns generated up to the feature input and saves it to the specified path in avro format

    Definition Classes
    OpWorkflowCore
  11. final def eq(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  12. def equals(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  13. def evaluate[T <: EvaluationMetrics](evaluator: OpEvaluatorBase[T], metricsPath: Option[String] = None, scoresPath: Option[String] = None)(implicit arg0: ClassTag[T], spark: SparkSession): T

    Load up the data by the reader, transform it and then evaluate

    evaluator

    OP Evaluator

    metricsPath

    path to write out the metrics

    spark

    spark session

    returns

    evaluation metrics

  14. def finalize(): Unit

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( classOf[java.lang.Throwable] )
  15. def findOriginStageId(feature: OPFeature): Option[Int]

    Looks at model parents to match parent stage for features (since features are created from the estimator not the fitted transformer)

    feature

    feature to find the origin stage for

    returns

    index of the parent stage

    Attributes
    protected
    Definition Classes
    OpWorkflowCore
  16. def generateRawData()(implicit spark: SparkSession): DataFrame

    Used to generate dataframe from reader and raw features list

    returns

    Dataframe with all the features generated + persisted

    Attributes
    protected
    Definition Classes
    OpWorkflowModel → OpWorkflowCore
  17. final def getBlacklist(): Array[OPFeature]

    Get the list of raw features which have been blacklisted

    returns

    blacklisted features

    Definition Classes
    OpWorkflowCore
  18. final def getBlacklistMapKeys(): Map[String, Set[String]]

    Get the list of Map Keys which have been blacklisted

    returns

    blacklisted map keys

    Definition Classes
    OpWorkflowCore
  19. final def getClass(): Class[_]

    Definition Classes
    AnyRef → Any
  20. def getMetadata(features: OPFeature*): Map[OPFeature, Metadata]

    Get the metadata associated with the features

    features

    features to get metadata for

    returns

    metadata associated with the features

    Exceptions thrown

    IllegalArgumentException if a feature is not part of this workflow

  21. def getOriginStageOf[T <: FeatureType](feature: FeatureLike[T]): OpPipelineStage[T]

    Gets the fitted stage that generates the input feature

    T

    Type of feature

    feature

    feature to get the origin stage for

    returns

    Fitted origin stage for feature

    Exceptions thrown

    IllegalArgumentException if a feature is not part of this workflow

  22. final def getParameters(): OpParams

    Get the parameter settings passed into the workflow

    returns

    OpParams set for this workflow

    Definition Classes
    OpWorkflowCore
  23. final def getRawFeatureDistributions(): Array[FeatureDistribution]

    Get raw feature distribution information computed on training and scoring data during raw feature filter

    returns

    sequence of feature distribution information

    Definition Classes
    OpWorkflowCore
  24. final def getRawScoringFeatureDistributions(): Array[FeatureDistribution]

    Get raw feature distribution information computed on scoring data during raw feature filter

    returns

    sequence of feature distribution information

    Definition Classes
    OpWorkflowCore
  25. final def getRawTrainingFeatureDistributions(): Array[FeatureDistribution]

    Get raw feature distribution information computed on training data during raw feature filter

    returns

    sequence of feature distribution information

    Definition Classes
    OpWorkflowCore
  26. final def getResultFeatures(): Array[OPFeature]

    Get the final features generated by the workflow

    returns

    result features for workflow

    Definition Classes
    OpWorkflowCore
  27. final def getStages(): Array[OPStage]

    Get the stages used in this workflow

    returns

    stages in the workflow

    Definition Classes
    OpWorkflowCore
  28. def getUpdatedFeatures(features: Array[OPFeature]): Array[OPFeature]

    Gets the updated version of a feature when the DAG has been modified with a raw feature filter

    features

    features to get the updated history for

    returns

    Updated instance of feature

    Exceptions thrown

    IllegalArgumentException if a feature is not part of this workflow

  29. def hashCode(): Int

    Definition Classes
    AnyRef → Any
  30. final def isInstanceOf[T0]: Boolean

    Definition Classes
    Any
  31. lazy val log: Logger

    Attributes
    protected
    Definition Classes
    OpWorkflowCore
  32. def modelInsights(feature: OPFeature): ModelInsights

    Get model insights for the model used to create the input feature. Will traverse the DAG to find the LAST model selector and sanity checker used in the creation of the selected feature

    feature

    feature to find model info for

    returns

    Model insights class containing summary of modeling and sanity checking

  33. final def ne(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  34. final def notify(): Unit

    Definition Classes
    AnyRef
  35. final def notifyAll(): Unit

    Definition Classes
    AnyRef
  36. def save(path: String, overwrite: Boolean = true): Unit

    Save this model to a path

    path

    path to save the model

    overwrite

    should overwrite if the path exists

  37. def score(path: Option[String] = None, keepRawFeatures: Boolean = OpWorkflowModel.KeepRawFeatures, keepIntermediateFeatures: Boolean = ..., persistEveryKStages: Int = OpWorkflowModel.PersistEveryKStages, persistScores: Boolean = OpWorkflowModel.PersistScores)(implicit spark: SparkSession): DataFrame

    Loads the data as specified by the data reader, then transforms it using the transformers specified in this workflow. The key and result features are always kept in the returned dataframe, and there are options to keep the other raw and intermediate features as well.

    This method optimizes scoring by applying OpTransformer stages in bulk at each step. The remaining stages are applied sequentially (as org.apache.spark.ml.Pipeline does).

    path

    optional path to write out the scores to a file

    keepRawFeatures

    flag to enable keeping raw features in the output DataFrame as well

    keepIntermediateFeatures

    flag to enable keeping intermediate features in the output DataFrame as well

    persistEveryKStages

    how often to break up the Catalyst plan by persisting the data (applies to non-OpTransformer stages only); set to Int.MaxValue to turn off (not recommended)

    persistScores

    should persist the final scores dataframe

    returns

    Dataframe that contains all the columns generated by the transformers in this workflow model as well as the key and result features, along with other features if the above flags are set to true.

  38. def scoreAndEvaluate(evaluator: OpEvaluatorBase[_ <: EvaluationMetrics], path: Option[String] = None, keepRawFeatures: Boolean = OpWorkflowModel.KeepRawFeatures, keepIntermediateFeatures: Boolean = ..., persistEveryKStages: Int = OpWorkflowModel.PersistEveryKStages, persistScores: Boolean = OpWorkflowModel.PersistScores, metricsPath: Option[String] = None)(implicit spark: SparkSession): (DataFrame, EvaluationMetrics)

    Loads the data as specified by the data reader, then transforms it using the transformers specified in this workflow. The key and result features are always kept in the returned dataframe, and there are options to keep the other raw and intermediate features as well.

    This method optimizes scoring by applying OpTransformer stages in bulk at each step. The remaining stages are applied sequentially (as org.apache.spark.ml.Pipeline does).

    evaluator

    evaluator to use for metrics generation

    path

    optional path to write out the scores to a file

    keepRawFeatures

    flag to enable keeping raw features in the output DataFrame as well

    keepIntermediateFeatures

    flag to enable keeping intermediate features in the output DataFrame as well

    persistEveryKStages

    how often to break up the Catalyst plan by persisting the data (applies to non-OpTransformer stages only); set to Int.MaxValue to turn off (not recommended)

    persistScores

    should persist the final scores dataframe

    metricsPath

    optional path to write out the metrics to a file

    returns

    Dataframe that contains all the columns generated by the transformers in this workflow model as well as the key and result features, along with other features if the above flags are set to true. Also returns metrics computed with evaluator.

  39. def setBlacklist(features: Array[OPFeature]): OpWorkflowModel.this.type

    Attributes
    protected[com.salesforce.op]
  40. def setBlacklistMapKeys(mapKeys: Map[String, Set[String]]): OpWorkflowModel.this.type

    Attributes
    protected[com.salesforce.op]
  41. def setFeatures(features: Array[OPFeature]): OpWorkflowModel.this.type

    Attributes
    protected[com.salesforce.op]
  42. final def setInputDataset[T](ds: Dataset[T], key: (T) ⇒ String = ReaderKey.randomKey)(implicit arg0: scala.reflect.api.JavaUniverse.WeakTypeTag[T]): OpWorkflowModel.this.type

    Sets the input dataset, which contains columns corresponding to the raw features used in the workflow. The type of the dataset (Dataset[T]) must match the type of the FeatureBuilders[T] used to generate the raw features.

    ds

    input dataset for workflow

    key

    key extract function

    returns

    this workflow

    Definition Classes
    OpWorkflowCore
  43. final def setInputRDD[T](rdd: RDD[T], key: (T) ⇒ String = ReaderKey.randomKey)(implicit arg0: scala.reflect.api.JavaUniverse.WeakTypeTag[T]): OpWorkflowModel.this.type

    Sets the input RDD, which contains columns corresponding to the raw features used in the workflow. The type of the RDD (RDD[T]) must match the type of the FeatureBuilders[T] used to generate the raw features.

    rdd

    input rdd for workflow

    key

    key extract function

    returns

    this workflow

    Definition Classes
    OpWorkflowCore
  44. final def setParameters(newParams: OpParams): OpWorkflowModel.this.type

    Sets reader parameters from an OpParams object for the run (stage parameters passed in will have no effect).

    newParams

    new parameter values

  45. final def setReader(r: Reader[_]): OpWorkflowModel.this.type

    Set data reader that will be used to generate data frame for stages

    r

    reader for workflow

    returns

    this workflow

    Definition Classes
    OpWorkflowCore
  46. def summary(): String

    Extracts all summary metadata from transformers in JSON format

    returns

    json string summary

  47. def summaryJson(): JValue

    Extracts all summary metadata from transformers in JSON format

    returns

    json summary

  48. def summaryPretty(insights: ModelInsights = ..., topK: Int = 15): String

    Generates a high-level model summary in a compact, print-friendly format containing: selected model info, model evaluation results, and feature correlations/contributions/cramersV values.

    insights

    model insights to compute the summary against

    topK

    top K of feature correlations/contributions/cramersV values to print

    returns

    high level model summary in a compact print friendly format

  49. final def synchronized[T0](arg0: ⇒ T0): T0

    Definition Classes
    AnyRef
  50. def toString(): String

    Definition Classes
    AnyRef → Any
  51. val trainingParams: OpParams

    params that were used during model training

  52. val uid: String

    unique identifier for this workflow model

    Definition Classes
    OpWorkflowModel → OpWorkflowCore
  53. final def wait(): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  54. final def wait(arg0: Long, arg1: Int): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  55. final def wait(arg0: Long): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  56. final def withWorkflowCV: OpWorkflowModel.this.type

    :: Experimental :: Decides whether the cross-validation/train-validation split will be done at the workflow level. This removes issues with data leakage, but it will impact the runtime.

    returns

    this workflow that will train part of the DAG in the cross-validation/train validation split

    Definition Classes
    OpWorkflowCore
    Annotations
    @Experimental()
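
The persistEveryKStages parameter of score, scoreAndEvaluate and computeDataUpTo controls how often the data is persisted while stages are applied sequentially. The sketch below illustrates that idea on plain Scala collections, without Spark: stages are applied in order and a checkpoint is recorded after every k-th stage. The names Stage and applyWithCheckpoints, and the exact checkpoint rule, are assumptions made for illustration and are not the library's implementation.

```scala
object PersistEveryKSketch {

  // Hypothetical stand-in for a fitted stage: a name and a pure transformation.
  final case class Stage(name: String, transform: Vector[Double] => Vector[Double])

  // Applies the stages in order and records the names of the stages after
  // which the data would be persisted (every k-th stage, counting from 1).
  def applyWithCheckpoints(
    data: Vector[Double],
    stages: Seq[Stage],
    k: Int
  ): (Vector[Double], Seq[String]) = {
    require(k > 0, "k must be positive")
    stages.zipWithIndex.foldLeft((data, Vector.empty[String])) {
      case ((current, checkpoints), (stage, i)) =>
        val next = stage.transform(current)
        // (i + 1) is the 1-based position of the stage that was just applied.
        if ((i + 1) % k == 0) (next, checkpoints :+ stage.name)
        else (next, checkpoints)
    }
  }

  def main(args: Array[String]): Unit = {
    val stages = Seq(
      Stage("scale", _.map(_ * 2.0)),
      Stage("shift", _.map(_ + 1.0)),
      Stage("square", _.map(x => x * x)),
      Stage("negate", _.map(-_))
    )
    val (result, checkpoints) = applyWithCheckpoints(Vector(1.0, 2.0), stages, k = 2)
    println(result)      // transformed data after all four stages
    println(checkpoints) // stages after which a persist would have happened
  }
}
```

In this sketch, passing k = Int.MaxValue records no checkpoints for any realistic number of stages, which mirrors the documented way of turning persistence off.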
