com.salesforce.op

OpWorkflowModel

Related Docs: object OpWorkflowModel | package op

class OpWorkflowModel extends OpWorkflowCore

A workflow model is a container and executor for the sequence of transformations that have been fit to the data to produce the desired output features.

Linear Supertypes
OpWorkflowCore, AnyRef, Any

Instance Constructors

  1. new OpWorkflowModel(uid: String = UID[OpWorkflowModel], trainingParams: OpParams)

    uid

    unique identifier for this workflow model

    trainingParams

    params that were used during model training

Value Members

  1. final def !=(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  2. final def ##(): Int

    Definition Classes
    AnyRef → Any
  3. final def ==(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  4. def applyTransformationsDAG(rawData: DataFrame, dag: StagesDAG, persistEveryKStages: Int)(implicit spark: SparkSession): DataFrame

    Efficiently applies all fitted stages grouping by level in the DAG where possible

    rawData

    data to transform

    dag

    computation graph

    persistEveryKStages

    breaks in computation to persist

    spark

    spark session

    returns

    transformed dataframe

    Attributes
    protected
    Definition Classes
    OpWorkflowCore
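
    The two ideas in this entry, grouping stages by DAG level and breaking the computation every k stages to persist, can be sketched in plain Scala. This is one plausible reading, not the library's implementation: `Stage`, `groupByLevel` and `persistPoints` are hypothetical names, and `level` is assumed to be an execution depth where 0 runs first.

    ```scala
    object PersistEveryKSketch {
      // Hypothetical stand-in for a fitted stage; not a TransmogrifAI type.
      final case class Stage(name: String, level: Int)

      // Groups stages by level, lowest level first, so that each group
      // could be applied to the data together.
      def groupByLevel(stages: Seq[Stage]): Seq[Seq[Stage]] =
        stages.groupBy(_.level).toSeq.sortBy(_._1).map(_._2)

      // 0-based indices of the stages after which the data would be persisted
      // when persisting every k stages. A very large k yields no persist points.
      def persistPoints(numStages: Int, k: Int): Seq[Int] = {
        require(k > 0, "k must be positive")
        (1 to numStages).filter(_ % k == 0).map(_ - 1)
      }
    }
    ```

    Under these assumptions, seven sequential stages with k = 3 are persisted after the third and sixth stage, and k = Int.MaxValue never persists.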
  5. final def asInstanceOf[T0]: T0

    Definition Classes
    Any
  6. var blocklistedFeatures: Array[OPFeature]

    Attributes
    protected
    Definition Classes
    OpWorkflowCore
  7. var blocklistedMapKeys: Map[String, Set[String]]

    Attributes
    protected
    Definition Classes
    OpWorkflowCore
  8. def checkReadersAndFeatures(): Unit

    Check that readers and features are set and that params match them

    Attributes
    protected
    Definition Classes
    OpWorkflowCore
  9. def checkUnmatchedFeatures(): Unit

    Determine if any of the raw features do not have a matching reader

    Attributes
    protected
    Definition Classes
    OpWorkflowCore
  10. def clone(): AnyRef

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  11. def computeDataUpTo(feature: OPFeature, persistEveryKStages: Int = OpWorkflowModel.PersistEveryKStages)(implicit spark: SparkSession): DataFrame

    Returns a dataframe containing all the columns generated up to and including the feature input

    feature

    input feature to compute up to

    persistEveryKStages

    persist data in transforms every k stages for performance improvement

    returns

    Dataframe containing columns corresponding to all of the features generated up to the feature given

    Definition Classes
    OpWorkflowModel → OpWorkflowCore
    Exceptions thrown

    IllegalArgumentException if a feature is not part of this workflow
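
    A minimal usage sketch, assuming `model` is an already fitted OpWorkflowModel and TransmogrifAI and Spark are on the classpath. The object and helper names are hypothetical, and the import path for OPFeature is an assumption. Wrapping the call in Try turns the documented IllegalArgumentException into a value the caller can inspect.

    ```scala
    import com.salesforce.op.OpWorkflowModel
    import com.salesforce.op.features.OPFeature // assumed package for OPFeature
    import org.apache.spark.sql.{DataFrame, SparkSession}
    import scala.util.Try

    object ComputeUpToSketch {
      // Returns the columns generated up to and including `feature`,
      // or a Failure if the feature is not part of the workflow.
      // Naming `persistEveryKStages` selects the DataFrame-returning overload.
      def columnsUpTo(model: OpWorkflowModel, feature: OPFeature, k: Int)
                     (implicit spark: SparkSession): Try[DataFrame] =
        Try(model.computeDataUpTo(feature, persistEveryKStages = k))
    }
    ```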

  12. def computeDataUpTo(feature: OPFeature, path: String)(implicit spark: SparkSession): Unit

    Computes a dataframe containing all the columns generated up to the feature input and saves it to the specified path in avro format

    Definition Classes
    OpWorkflowCore
  13. def copy(): OpWorkflowModel

    Creates a copy of this OpWorkflowModel instance

    returns

    copy of this OpWorkflowModel instance

  14. final def eq(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  15. def equals(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  16. def evaluate[T <: EvaluationMetrics](evaluator: OpEvaluatorBase[T], metricsPath: Option[String] = None, scoresPath: Option[String] = None)(implicit arg0: ClassTag[T], spark: SparkSession): T

    Load up the data by the reader, transform it and then evaluate

    evaluator

    OP Evaluator

    metricsPath

    path to write out the metrics

    scoresPath

    path to write out the scores

    spark

    spark session

    returns

    evaluation metrics

  17. def finalize(): Unit

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( classOf[java.lang.Throwable] )
  18. def findOriginStageId(feature: OPFeature): Option[Int]

    Looks at model parents to match parent stage for features (since features are created from the estimator not the fitted transformer)

    feature

    feature to find the origin stage for

    returns

    index of the parent stage

    Attributes
    protected
    Definition Classes
    OpWorkflowCore
  19. def generateRawData()(implicit spark: SparkSession): DataFrame

    Used to generate dataframe from reader and raw features list

    returns

    Dataframe with all the features generated + persisted

    Attributes
    protected
    Definition Classes
    OpWorkflowModel → OpWorkflowCore
  20. final def getAllFeatures(): Array[OPFeature]

    Get all the features that potentially are generated by the workflow: raw, intermediate and result features

    returns

    all the features that potentially are generated by the workflow: raw, intermediate and result features

    Definition Classes
    OpWorkflowCore
  21. final def getBlocklist(): Array[OPFeature]

    Get the list of raw features which have been blocklisted

    returns

    blocklisted features

    Definition Classes
    OpWorkflowCore
  22. final def getBlocklistMapKeys(): Map[String, Set[String]]

    Get the list of Map Keys which have been blocklisted

    returns

    blocklisted map keys

    Definition Classes
    OpWorkflowCore
  23. final def getClass(): Class[_]

    Definition Classes
    AnyRef → Any
  24. def getMetadata(features: OPFeature*): Map[OPFeature, Metadata]

    Get the metadata associated with the features

    features

    features to get metadata for

    returns

    metadata associated with the features

    Exceptions thrown

    IllegalArgumentException if a feature is not part of this workflow

  25. def getOriginStageOf[T <: FeatureType](feature: FeatureLike[T]): OpPipelineStage[T]

    Gets the fitted stage that generates the input feature

    T

    Type of feature

    feature

    feature to get the origin stage for

    returns

    Fitted origin stage for feature

    Exceptions thrown

    IllegalArgumentException if a feature is not part of this workflow

  26. final def getParameters(): OpParams

    Get the parameter settings passed into the workflow

    returns

    OpWorkflowParams set for this workflow

    Definition Classes
    OpWorkflowCore
  27. final def getRawFeatureDistributions(): Seq[FeatureDistribution]

    Get raw feature distribution information computed on training and scoring data during raw feature filter

    returns

    sequence of feature distribution information

    Definition Classes
    OpWorkflowCore
  28. final def getRawFeatureFilterResults(): RawFeatureFilterResults

    Get raw feature filter results (filter configuration, feature distributions, and feature exclusion reasons)

    returns

    raw feature filter results

    Definition Classes
    OpWorkflowCore
  29. final def getRawFeatures(): Array[OPFeature]

    Get the raw features generated by the workflow

    returns

    raw features for workflow

    Definition Classes
    OpWorkflowCore
  30. final def getRawScoringFeatureDistributions(): Seq[FeatureDistribution]

    Get raw feature distribution information computed on scoring data during raw feature filter

    returns

    sequence of feature distribution information

    Definition Classes
    OpWorkflowCore
  31. final def getRawTrainingFeatureDistributions(): Seq[FeatureDistribution]

    Get raw feature distribution information computed on training data during raw feature filter

    returns

    sequence of feature distribution information

    Definition Classes
    OpWorkflowCore
  32. final def getReader(): Reader[_]

    Get data reader that will be used to generate data frame for stages

    returns

    reader for workflow

    Definition Classes
    OpWorkflowCore
  33. final def getResultFeatures(): Array[OPFeature]

    Get the final features generated by the workflow

    returns

    result features for workflow

    Definition Classes
    OpWorkflowCore
  34. final def getStages(): Array[OPStage]

    Get the stages used in this workflow

    returns

    stages in the workflow

    Definition Classes
    OpWorkflowCore
  35. def getUpdatedFeatures(features: Array[OPFeature]): Array[OPFeature]

    Gets the updated version of a feature when the DAG has been modified with a raw feature filter

    features

    features to get the updated versions of

    returns

    Updated instance of feature

    Exceptions thrown

    IllegalArgumentException if a feature is not part of this workflow

  36. def hashCode(): Int

    Definition Classes
    AnyRef → Any
  37. final def isInstanceOf[T0]: Boolean

    Definition Classes
    Any
  38. final def isWorkflowCV: Boolean

    Whether the cross-validation/train-validation-split will be done at workflow level

    returns

    true if the cross-validation will be done at workflow level, false otherwise

    Definition Classes
    OpWorkflowCore
  39. var isWorkflowCVEnabled: Boolean

    Attributes
    protected
    Definition Classes
    OpWorkflowCore
  40. lazy val log: Logger

    Attributes
    protected
    Definition Classes
    OpWorkflowCore
  41. def modelInsights(feature: OPFeature): ModelInsights

    Get model insights for the model used to create the input feature. Will traverse the DAG to find the LAST model selector and sanity checker used in the creation of the selected feature

    feature

    feature to find model info for

    returns

    Model insights class containing summary of modeling and sanity checking

  42. final def ne(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  43. final def notify(): Unit

    Definition Classes
    AnyRef
  44. final def notifyAll(): Unit

    Definition Classes
    AnyRef
  45. var parameters: OpParams

    Attributes
    protected
    Definition Classes
    OpWorkflowCore
  46. var rawFeatureFilterResults: RawFeatureFilterResults

    Attributes
    protected
    Definition Classes
    OpWorkflowCore
  47. var rawFeatures: Array[OPFeature]

    Attributes
    protected
    Definition Classes
    OpWorkflowCore
  48. var reader: Option[Reader[_]]

    Attributes
    protected
    Definition Classes
    OpWorkflowCore
  49. var resultFeatures: Array[OPFeature]

    Attributes
    protected
    Definition Classes
    OpWorkflowCore
  50. def save(path: String, overwrite: Boolean = true, modelStagingDir: String = WorkflowFileReader.modelStagingDir): Unit

    Save this model to a path

    path

    path to save the model

    overwrite

    should overwrite if the path exists

    modelStagingDir

    local folder to copy and unpack stored model to for loading
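
    A minimal usage sketch, assuming `model` is an already fitted OpWorkflowModel and TransmogrifAI is on the classpath. The object and helper names are hypothetical. Passing overwrite = false departs from the default, and any exception raised by save is captured in the Try rather than thrown.

    ```scala
    import com.salesforce.op.OpWorkflowModel
    import scala.util.Try

    object SaveSketch {
      // Saves the model to `path` without overwriting an existing location.
      // The staging directory keeps its default value.
      def saveOnce(model: OpWorkflowModel, path: String): Try[Unit] =
        Try(model.save(path, overwrite = false))
    }
    ```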

  51. def score(path: Option[String] = None, keepRawFeatures: Boolean = OpWorkflowModel.KeepRawFeatures, keepIntermediateFeatures: Boolean = ..., persistEveryKStages: Int = OpWorkflowModel.PersistEveryKStages, persistScores: Boolean = OpWorkflowModel.PersistScores)(implicit spark: SparkSession): DataFrame

    Load up the data as specified by the data reader then transform that data using the transformers specified in this workflow. We will always keep the key and result features in the returned dataframe, but there are options to keep the other raw & intermediate features.

    This method optimizes scoring by applying OpTransformer stages in bulk at each step. The remaining stages are applied sequentially (as org.apache.spark.ml.Pipeline does).

    path

    optional path to write out the scores to a file

    keepRawFeatures

    flag to enable keeping raw features in the output DataFrame as well

    keepIntermediateFeatures

    flag to enable keeping intermediate features in the output DataFrame as well

    persistEveryKStages

    how often to break up catalyst by persisting the data (applies for non OpTransformer stages only), to turn off set to Int.MaxValue (not recommended)

    persistScores

    should persist the final scores dataframe

    returns

    Dataframe that contains all the columns generated by the transformers in this workflow model as well as the key and result features, along with other features if the above flags are set to true.
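
    A minimal usage sketch, assuming `model` is an already fitted OpWorkflowModel with a reader or input dataset attached, and that TransmogrifAI and Spark are on the classpath. The object name, helper name and `outputPath` are hypothetical; only the parameter names come from the signature above.

    ```scala
    import com.salesforce.op.OpWorkflowModel
    import org.apache.spark.sql.{DataFrame, SparkSession}

    object ScoreSketch {
      // Scores the data supplied by the model's reader and also writes the
      // scores to `outputPath`. Raw features are requested in the output;
      // intermediate features are not.
      def scoreKeepingRaw(model: OpWorkflowModel, outputPath: String)
                         (implicit spark: SparkSession): DataFrame =
        model.score(
          path = Some(outputPath),
          keepRawFeatures = true,
          keepIntermediateFeatures = false
        )
    }
    ```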

  52. def scoreAndEvaluate(evaluator: OpEvaluatorBase[_ <: EvaluationMetrics], path: Option[String] = None, keepRawFeatures: Boolean = OpWorkflowModel.KeepRawFeatures, keepIntermediateFeatures: Boolean = ..., persistEveryKStages: Int = OpWorkflowModel.PersistEveryKStages, persistScores: Boolean = OpWorkflowModel.PersistScores, metricsPath: Option[String] = None)(implicit spark: SparkSession): (DataFrame, EvaluationMetrics)

    Load up the data as specified by the data reader then transform that data using the transformers specified in this workflow. We will always keep the key and result features in the returned dataframe, but there are options to keep the other raw & intermediate features.

    This method optimizes scoring by applying OpTransformer stages in bulk at each step. The remaining stages are applied sequentially (as org.apache.spark.ml.Pipeline does).

    evaluator

    evaluator to use for metrics generation

    path

    optional path to write out the scores to a file

    keepRawFeatures

    flag to enable keeping raw features in the output DataFrame as well

    keepIntermediateFeatures

    flag to enable keeping intermediate features in the output DataFrame as well

    persistEveryKStages

    how often to break up catalyst by persisting the data (applies for non OpTransformer stages only), to turn off set to Int.MaxValue (not recommended)

    persistScores

    should persist the final scores dataframe

    metricsPath

    optional path to write out the metrics to a file

    returns

    Dataframe that contains all the columns generated by the transformers in this workflow model as well as the key and result features, along with other features if the above flags are set to true. Also returns metrics computed with evaluator.
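
    A minimal usage sketch, assuming `model` is an already fitted OpWorkflowModel and `evaluator` was built elsewhere. The object and helper names are hypothetical, and the import path for the evaluator types is an assumption. The returned tuple is destructured into the scored DataFrame and the metrics.

    ```scala
    import com.salesforce.op.OpWorkflowModel
    import com.salesforce.op.evaluators.{EvaluationMetrics, OpEvaluatorBase} // assumed package
    import org.apache.spark.sql.{DataFrame, SparkSession}

    object ScoreAndEvaluateSketch {
      // Scores the reader's data, evaluates it with `evaluator`,
      // and writes the metrics to `metricsPath`.
      def scoreWithMetrics(
        model: OpWorkflowModel,
        evaluator: OpEvaluatorBase[_ <: EvaluationMetrics],
        metricsPath: String
      )(implicit spark: SparkSession): (DataFrame, EvaluationMetrics) = {
        val (scores, metrics) = model.scoreAndEvaluate(
          evaluator = evaluator,
          metricsPath = Some(metricsPath)
        )
        (scores, metrics)
      }
    }
    ```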

  53. def setBlocklist(features: Array[OPFeature]): OpWorkflowModel.this.type

    Attributes
    protected[com.salesforce.op]
  54. def setBlocklistMapKeys(mapKeys: Map[String, Set[String]]): OpWorkflowModel.this.type

    Attributes
    protected[com.salesforce.op]
  55. def setFeatures(features: Array[OPFeature]): OpWorkflowModel.this.type

    Attributes
    protected[com.salesforce.op]
  56. final def setInputDataset[T](ds: Dataset[T], key: (T) ⇒ String = ReaderKey.randomKey)(implicit arg0: scala.reflect.api.JavaUniverse.WeakTypeTag[T]): OpWorkflowModel.this.type

    Set input dataset which contains columns corresponding to the raw features used in the workflow. The type of the dataset (Dataset[T]) must match the type of the FeatureBuilders[T] used to generate the raw features.

    ds

    input dataset for workflow

    key

    key extract function

    returns

    this workflow

    Definition Classes
    OpWorkflowCore
  57. final def setInputRDD[T](rdd: RDD[T], key: (T) ⇒ String = ReaderKey.randomKey)(implicit arg0: scala.reflect.api.JavaUniverse.WeakTypeTag[T]): OpWorkflowModel.this.type

    Set input rdd which contains columns corresponding to the raw features used in the workflow. The type of the rdd (RDD[T]) must match the type of the FeatureBuilders[T] used to generate the raw features.

    rdd

    input rdd for workflow

    key

    key extract function

    returns

    this workflow

    Definition Classes
    OpWorkflowCore
  58. final def setParameters(newParams: OpParams): OpWorkflowModel.this.type

    Set reader parameters from OpWorkflowParams object for run (stage parameters passed in will have no effect)

    newParams

    new parameter values

  59. final def setReader(r: Reader[_]): OpWorkflowModel.this.type

    Set data reader that will be used to generate data frame for stages

    r

    reader for workflow

    returns

    this workflow

    Definition Classes
    OpWorkflowCore
  60. var stages: Array[OPStage]

    Attributes
    protected
    Definition Classes
    OpWorkflowCore
  61. def summary(): String

    Extracts all summary metadata from transformers in JSON format

    returns

    json string summary

  62. def summaryJson(): JValue

    Extracts all summary metadata from transformers in JSON format

    returns

    json summary

  63. def summaryPretty(insights: ModelInsights = ..., topK: Int = 15): String

    Generates a high-level model summary in a compact, print-friendly format containing: selected model info, model evaluation results and feature correlations/contributions/cramersV values.

    insights

    model insights to compute the summary against

    topK

    top K of feature correlations/contributions/cramersV values to print

    returns

    high level model summary in a compact print friendly format
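
    A minimal usage sketch, assuming `model` is an already fitted OpWorkflowModel and TransmogrifAI is on the classpath. The object and helper names are hypothetical. The `insights` argument is left at its default and only `topK` is overridden.

    ```scala
    import com.salesforce.op.OpWorkflowModel

    object SummarySketch {
      // Prints the compact summary, limited to the top `k`
      // correlations/contributions/cramersV values.
      def printSummary(model: OpWorkflowModel, k: Int = 5): Unit =
        println(model.summaryPretty(topK = k))
    }
    ```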

  64. final def synchronized[T0](arg0: ⇒ T0): T0

    Definition Classes
    AnyRef
  65. def toString(): String

    Definition Classes
    AnyRef → Any
  66. val trainingParams: OpParams

    params that were used during model training

  67. val uid: String

    unique identifier for this workflow model

    Definition Classes
    OpWorkflowModel → OpWorkflowCore
  68. final def wait(): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  69. final def wait(arg0: Long, arg1: Int): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  70. final def wait(arg0: Long): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  71. final def withWorkflowCV: OpWorkflowModel.this.type

    :: Experimental :: Decides whether the cross-validation/train-validation-split will be done at workflow level. This will remove issues with data leakage; however, it will impact the runtime.

    returns

    this workflow that will train part of the DAG in the cross-validation/train validation split

    Definition Classes
    OpWorkflowCore
    Annotations
    @Experimental()
