Class

com.salesforce.op.stages.impl.feature

OpLDA


class OpLDA extends OpEstimatorWrapper[OPVector, OPVector, LDA, LDAModel]

Wrapper around Spark ML LDA (Latent Dirichlet Allocation) for use with OP pipelines.
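
As a rough illustration of how the wrapper might be wired into a pipeline, the sketch below chains only setters listed on this page. The input feature `termCounts` and the helper `ldaOutput` are hypothetical names, and the import paths for `FeatureLike` and `OPVector` are assumptions about the library layout rather than something this page states.

```scala
import com.salesforce.op.features.FeatureLike          // assumed package
import com.salesforce.op.features.types.OPVector       // assumed package
import com.salesforce.op.stages.impl.feature.OpLDA

object OpLDAUsageSketch {
  // `termCounts` is a hypothetical, already-built vector feature
  // (for example, per-document term counts).
  def ldaOutput(termCounts: FeatureLike[OPVector]): FeatureLike[OPVector] =
    new OpLDA()
      .setInput(termCounts)    // InputFeatures = FeatureLike[OPVector]
      .setK(5)                 // number of topics, must be > 1
      .setMaxIter(20)          // maximum number of iterations
      .setOptimizer("online")  // or "em"
      .setSeed(42L)            // fixed seed for repeatable runs
      .getOutput()             // output feature created by this stage
}
```

Each setter returns `OpLDA.this.type`, which is what makes the chained style possible.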

Linear Supertypes
OpEstimatorWrapper[OPVector, OPVector, LDA, LDAModel], SwUnaryEstimator[OPVector, OPVector, LDAModel, LDA], SparkWrapperParams[LDA], OpPipelineStage1[OPVector, OPVector], HasOut[OPVector], HasIn1, OpPipelineStage[OPVector], OpPipelineStageBase, MLWritable, OpPipelineStageParams, InputParams, Estimator[SwUnaryModel[OPVector, OPVector, LDAModel]], PipelineStage, Logging, Params, Serializable, Serializable, Identifiable, AnyRef, Any

Instance Constructors

  1. new OpLDA(uid: String = UID[OpLDA])

Type Members

  1. final type InputFeatures = FeatureLike[OPVector]

    Input Features type

    Definition Classes
    OpPipelineStage1 → OpPipelineStage → InputParams
  2. final type OutputFeatures = FeatureLike[OPVector]

    Definition Classes
    OpPipelineStage → OpPipelineStageBase

Value Members

  1. final def !=(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  2. final def ##(): Int

    Definition Classes
    AnyRef → Any
  3. final def $[T](param: Param[T]): T

    Attributes
    protected
    Definition Classes
    Params
  4. final def ==(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  5. final def asInstanceOf[T0]: T0

    Definition Classes
    Any
  6. final def checkInputLength(features: Array[_]): Boolean

    Checks the input length

    features

    input features

    returns

    true if the input size is as expected, false otherwise

    Definition Classes
    OpPipelineStage1 → InputParams
  7. def checkSerializable: Try[Unit]

    Check if the stage is serializable

    returns

    Failure if not serializable

    Definition Classes
    OpPipelineStageBase
  8. final def clear(param: Param[_]): OpLDA.this.type

    Definition Classes
    Params
  9. def clone(): AnyRef

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  10. final def copy(extra: ParamMap): OpLDA.this.type

    Makes a copy of the instance with new parameters. Several methods in Spark internals call it. The default implementation finds the constructor and makes a copy for any class, as long as all constructor parameters are vals; this is why type tags are written as implicit vals in base classes.

    Note that the convention in Spark is to have the uid be a constructor argument, so that copies share a uid with the original (developers should follow this convention).

    extra

    new parameters to add to the instance

    returns

    a new instance with the same uid

    Definition Classes
    OpPipelineStageBase → Params
  11. def copyValues[T <: Params](to: T, extra: ParamMap): T

    Attributes
    protected
    Definition Classes
    Params
  12. final def defaultCopy[T <: Params](extra: ParamMap): T

    Attributes
    protected
    Definition Classes
    Params
  13. final def eq(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  14. def equals(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  15. val estimator: LDA

    the estimator to wrap

    Definition Classes
    OpEstimatorWrapper
  16. def explainParam(param: Param[_]): String

    Definition Classes
    Params
  17. def explainParams(): String

    Definition Classes
    Params
  18. final def extractParamMap(): ParamMap

    Definition Classes
    Params
  19. final def extractParamMap(extra: ParamMap): ParamMap

    Definition Classes
    Params
  20. def finalize(): Unit

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( classOf[java.lang.Throwable] )
  21. def fit(dataset: Dataset[_]): SwUnaryModel[OPVector, OPVector, LDAModel]

    Definition Classes
    SwUnaryEstimator → Estimator
  22. def fit(dataset: Dataset[_], paramMaps: Array[ParamMap]): Seq[SwUnaryModel[OPVector, OPVector, LDAModel]]

    Definition Classes
    Estimator
    Annotations
    @Since( "2.0.0" )
  23. def fit(dataset: Dataset[_], paramMap: ParamMap): SwUnaryModel[OPVector, OPVector, LDAModel]

    Definition Classes
    Estimator
    Annotations
    @Since( "2.0.0" )
  24. def fit(dataset: Dataset[_], firstParamPair: ParamPair[_], otherParamPairs: ParamPair[_]*): SwUnaryModel[OPVector, OPVector, LDAModel]

    Definition Classes
    Estimator
    Annotations
    @Since( "2.0.0" ) @varargs()
  25. final def get[T](param: Param[T]): Option[T]

    Definition Classes
    Params
  26. final def getClass(): Class[_]

    Definition Classes
    AnyRef → Any
  27. final def getDefault[T](param: Param[T]): Option[T]

    Definition Classes
    Params
  28. def getInputColParamNames(): Array[String]

    Gets names of parameters that control input columns for Spark stage

    Definition Classes
    SparkWrapperParams
  29. final def getInputFeature[T <: FeatureType](i: Int): Option[FeatureLike[T]]

    Gets an input feature. Note: this method IS NOT safe to use outside the driver; please use the getTransientFeature method instead.

    returns

    maybe an input feature

    Definition Classes
    InputParams
    Exceptions thrown

    NoSuchElementException if the features are not set

    RuntimeException in case one of the features is null

  30. final def getInputFeatures(): Array[OPFeature]

    Gets the input features. Note: this method IS NOT safe to use outside the driver; please use the getTransientFeatures method instead.

    returns

    array of features

    Definition Classes
    InputParams
    Exceptions thrown

    NoSuchElementException if the features are not set

    RuntimeException in case one of the features is null

  31. final def getInputSchema(): StructType

    Definition Classes
    OpPipelineStageParams
  32. final def getMetadata(): Metadata

    Definition Classes
    OpPipelineStageParams
  33. final def getOrDefault[T](param: Param[T]): T

    Definition Classes
    Params
  34. def getOutput(): FeatureLike[OPVector]

    Output features that will be created by this stage

    returns

    feature of type OutputFeatures

    Definition Classes
    HasOut → OpPipelineStageBase
  35. def getOutputColParamNames(): Array[String]

    Gets names of parameters that control output columns for Spark stage

    Definition Classes
    SparkWrapperParams
  36. final def getOutputFeatureName: String

    Name of output feature (i.e. column created by this stage)

    Definition Classes
    OpPipelineStage
  37. def getParam(paramName: String): Param[Any]

    Definition Classes
    Params
  38. def getSparkMlStage(): Option[LDA]

    Method to access the spark stage being wrapped

    returns

    Option of spark ml stage

    Definition Classes
    SparkWrapperParams
  39. def getStageSavePath(): Option[String]

    Gets a save path for wrapped spark stage

    Definition Classes
    SparkWrapperParams
  40. final def getTransientFeature(i: Int): Option[TransientFeature]

    Gets an input feature at index i

    i

    input index

    returns

    maybe an input feature

    Definition Classes
    InputParams
  41. final def getTransientFeatures(): Array[TransientFeature]

    Gets the input Features

    returns

    input features

    Definition Classes
    InputParams
  42. final def hasDefault[T](param: Param[T]): Boolean

    Definition Classes
    Params
  43. def hasParam(paramName: String): Boolean

    Definition Classes
    Params
  44. def hashCode(): Int

    Definition Classes
    AnyRef → Any
  45. final def in1: TransientFeature

    Attributes
    protected
    Definition Classes
    HasIn1
  46. def initializeLogIfNecessary(isInterpreter: Boolean, silent: Boolean): Boolean

    Attributes
    protected
    Definition Classes
    Logging
  47. def initializeLogIfNecessary(isInterpreter: Boolean): Unit

    Attributes
    protected
    Definition Classes
    Logging
  48. final def inputAsArray(in: InputFeatures): Array[OPFeature]

    Function to convert InputFeatures to an Array of FeatureLike

    returns

    an Array of FeatureLike

    Definition Classes
    OpPipelineStage1 → InputParams
  49. val inputParamName: String

    Definition Classes
    OpLDA → SwUnaryEstimator
  50. final def isDefined(param: Param[_]): Boolean

    Definition Classes
    Params
  51. final def isInstanceOf[T0]: Boolean

    Definition Classes
    Any
  52. final def isSet(param: Param[_]): Boolean

    Definition Classes
    Params
  53. def isTraceEnabled(): Boolean

    Attributes
    protected
    Definition Classes
    Logging
  54. def log: Logger

    Attributes
    protected
    Definition Classes
    Logging
  55. def logDebug(msg: ⇒ String, throwable: Throwable): Unit

    Attributes
    protected
    Definition Classes
    Logging
  56. def logDebug(msg: ⇒ String): Unit

    Attributes
    protected
    Definition Classes
    Logging
  57. def logError(msg: ⇒ String, throwable: Throwable): Unit

    Attributes
    protected
    Definition Classes
    Logging
  58. def logError(msg: ⇒ String): Unit

    Attributes
    protected
    Definition Classes
    Logging
  59. def logInfo(msg: ⇒ String, throwable: Throwable): Unit

    Attributes
    protected
    Definition Classes
    Logging
  60. def logInfo(msg: ⇒ String): Unit

    Attributes
    protected
    Definition Classes
    Logging
  61. def logName: String

    Attributes
    protected
    Definition Classes
    Logging
  62. def logTrace(msg: ⇒ String, throwable: Throwable): Unit

    Attributes
    protected
    Definition Classes
    Logging
  63. def logTrace(msg: ⇒ String): Unit

    Attributes
    protected
    Definition Classes
    Logging
  64. def logWarning(msg: ⇒ String, throwable: Throwable): Unit

    Attributes
    protected
    Definition Classes
    Logging
  65. def logWarning(msg: ⇒ String): Unit

    Attributes
    protected
    Definition Classes
    Logging
  66. final def ne(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  67. final def notify(): Unit

    Definition Classes
    AnyRef
  68. final def notifyAll(): Unit

    Definition Classes
    AnyRef
  69. def onGetMetadata(): Unit

    Function to be called on getMetadata

    Attributes
    protected
    Definition Classes
    OpPipelineStageParams
  70. def onSetInput(): Unit

    Function to be called on setInput

    Attributes
    protected
    Definition Classes
    InputParams
  71. val operationName: String

    unique name of the operation this stage performs

    Definition Classes
    SwUnaryEstimator → OpPipelineStageBase
  72. final def outputAsArray(out: OutputFeatures): Array[OPFeature]

    Function to convert OutputFeatures to an Array of FeatureLike

    returns

    an Array of FeatureLike

    Definition Classes
    OpPipelineStage → OpPipelineStageBase
  73. def outputFeatureUid: String

    Attributes
    protected[com.salesforce.op]
    Definition Classes
    OpPipelineStage1 → OpPipelineStage
  74. def outputIsResponse: Boolean

    Should output feature be a response? Yes, if any of the input features are.

    returns

    true if the output feature should be a response

    Definition Classes
    OpPipelineStage
  75. val outputParamName: String

    Definition Classes
    OpLDA → SwUnaryEstimator
  76. lazy val params: Array[Param[_]]

    Definition Classes
    Params
  77. def save(path: String): Unit

    Definition Classes
    MLWritable
    Annotations
    @Since( "1.6.0" ) @throws( ... )
  78. final def set(paramPair: ParamPair[_]): OpLDA.this.type

    Attributes
    protected
    Definition Classes
    Params
  79. final def set(param: String, value: Any): OpLDA.this.type

    Attributes
    protected
    Definition Classes
    Params
  80. final def set[T](param: Param[T], value: T): OpLDA.this.type

    Definition Classes
    Params
  81. def setCheckpointInterval(value: Int): OpLDA.this.type

    Set param for checkpoint interval (>= 1) or disable checkpoint (-1). E.g. 10 means that the cache will get checkpointed every 10 iterations.
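
    To make the interval semantics concrete, here is a minimal sketch that lists which iterations would be checkpointed for a given setting. The object and method names are hypothetical, and counting iterations from 1 is an assumption made for illustration; this is not the wrapped stage's actual scheduling code.

```scala
object CheckpointSketch {
  /** Iterations (1-based, by assumption) at which a checkpoint would happen. */
  def checkpointedIterations(maxIter: Int, interval: Int): Seq[Int] = {
    require(interval == -1 || interval >= 1, "interval must be -1 or >= 1")
    if (interval == -1) Seq.empty                     // -1 disables checkpointing
    else (1 to maxIter).filter(_ % interval == 0)     // every `interval` iterations
  }
}
```

    With an interval of 10 and 30 iterations, the sketch reports checkpoints at iterations 10, 20 and 30; with -1 it reports none.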

  82. final def setDefault(paramPairs: ParamPair[_]*): OpLDA.this.type

    Attributes
    protected
    Definition Classes
    Params
  83. final def setDefault[T](param: Param[T], value: T): OpLDA.this.type

    Attributes
    protected
    Definition Classes
    Params
  84. def setDocConcentation(value: Array[Double]): OpLDA.this.type

    Set param for concentration parameter (commonly named "alpha") for the prior placed on documents' distributions over topics ("theta").

    This is the parameter to a Dirichlet distribution, where larger values mean more smoothing (more regularization).

    If not set by the user, then docConcentration is set automatically. If set to singleton vector [alpha], then alpha is replicated to a vector of length k in fitting. Otherwise, the docConcentration vector must be length k. (default = automatic)

    Optimizer-specific parameter settings:

    • EM
      • Currently only supports symmetric distributions, so all values in the vector should be the same.
      • Values should be > 1.0
      • default = uniformly (50 / k) + 1, where 50/k is common in LDA libraries and +1 follows from Asuncion et al. (2009), who recommend a +1 adjustment for EM.
    • Online
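
    The replication rule and the EM default described above can be sketched in plain Scala. The object and method names are hypothetical and the code only mirrors the arithmetic stated in the text; it is not the library's implementation.

```scala
object DocConcentrationSketch {
  /** EM default from the text: uniformly (50 / k) + 1. */
  def emDefault(k: Int): Array[Double] = {
    require(k > 1, "k must be > 1")
    Array.fill(k)(50.0 / k + 1.0)
  }

  /** Replicates a singleton [alpha] to length k; otherwise requires length k. */
  def expand(alpha: Array[Double], k: Int): Array[Double] =
    if (alpha.length == 1) Array.fill(k)(alpha.head)
    else {
      require(alpha.length == k,
        s"docConcentration must have length 1 or $k, got ${alpha.length}")
      alpha
    }
}
```

    For k = 10 the EM default works out to 50 / 10 + 1 = 6.0 in every position.
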
  85. def setDocConcentration(value: Double): OpLDA.this.type

  86. final def setInput(features: InputFeatures): OpLDA.this.type

    Input features that will be used by the stage

    returns

    feature of type InputFeatures

    Definition Classes
    OpPipelineStageBase
  87. final def setInputFeatures[S <: OPFeature](features: Array[S]): OpLDA.this.type

    Sets input features

    S

    feature like type

    features

    array of input features

    returns

    this stage

    Attributes
    protected
    Definition Classes
    InputParams
  88. def setK(value: Int): OpLDA.this.type

    Set param for number of topics (clusters) to infer. Must be > 1. Default: 10.

  89. def setMaxIter(value: Int): OpLDA.this.type

    Set param for maximum number of iterations (>= 0). Default: 20.

  90. final def setMetadata(m: Metadata): OpLDA.this.type

    Definition Classes
    OpPipelineStageParams
  91. def setOptimizer(value: String): OpLDA.this.type

    Set param for optimizer or inference algorithm used to estimate the LDA model.

    Currently supported (case-insensitive):

    • "online": Online Variational Bayes (default)
    • "em": Expectation-Maximization
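
    Because the supported names are matched case-insensitively, a caller-side check might look like the sketch below. The helper is hypothetical and only illustrates the matching rule stated above; it is not how the wrapped stage parses the value.

```scala
import java.util.Locale

object OptimizerSketch {
  private val supported = Set("online", "em")

  /** Returns the canonical lower-case name, or None if the name is unsupported. */
  def normalize(name: String): Option[String] = {
    val lower = name.toLowerCase(Locale.ROOT)
    if (supported.contains(lower)) Some(lower) else None
  }
}
```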

    For details, see the following papers:

  92. def setOutputFeatureName(name: String): OpLDA.this.type

    Definition Classes
    OpPipelineStage
  93. def setSeed(value: Long): OpLDA.this.type

    Set param for random seed.

  94. def setSparkMlStage(stage: Option[LDA]): OpLDA.this.type

    Attributes
    protected
    Definition Classes
    SparkWrapperParams
  95. def setStageSavePath(path: String): OpLDA.this.type

    Sets a save path for wrapped spark stage

    Definition Classes
    SparkWrapperParams
  96. def setSubsamplingRate(value: Double): OpLDA.this.type

    For Online optimizer only: optimizer = "online".

    Set param for fraction of the corpus to be sampled and used in each iteration of mini-batch gradient descent, in range (0, 1].

    Note that this should be adjusted in sync with LDA.maxIter so the entire corpus is used. Specifically, set both so that maxIterations * miniBatchFraction >= 1, i.e. maxIter * subsamplingRate >= 1 in terms of this class's parameters.

    Note: This is the same as the miniBatchFraction parameter in org.apache.spark.mllib.clustering.OnlineLDAOptimizer.

    Default: 0.05, i.e., 5% of total documents.
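
    The coverage rule above is simple arithmetic, sketched here in plain Scala. The object and method names are hypothetical; the check mirrors the stated inequality and nothing more.

```scala
object SubsamplingSketch {
  /** True when maxIter * subsamplingRate >= 1, the rule stated above. */
  def coversCorpus(maxIter: Int, subsamplingRate: Double): Boolean = {
    require(subsamplingRate > 0.0 && subsamplingRate <= 1.0,
      "subsamplingRate must be in (0, 1]")
    maxIter * subsamplingRate >= 1.0
  }

  /** Smallest iteration count that satisfies the rule for a given rate. */
  def minIterations(subsamplingRate: Double): Int = {
    require(subsamplingRate > 0.0 && subsamplingRate <= 1.0,
      "subsamplingRate must be in (0, 1]")
    math.ceil(1.0 / subsamplingRate).toInt
  }
}
```

    The documented defaults (maxIter = 20, subsamplingRate = 0.05) sit exactly on the boundary, since 20 * 0.05 = 1; halving maxIter to 10 without raising the rate would violate the rule.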

  97. def setTopicConcentration(value: Double): OpLDA.this.type

    Set param for concentration parameter (commonly named "beta" or "eta") for the prior placed on topics' distributions over terms.

    This is the parameter to a symmetric Dirichlet distribution.

    Note: The topics' distributions over terms are called "beta" in the original LDA paper by Blei et al., but are called "phi" in many later papers such as Asuncion et al., 2009.

    If not set by the user, then topicConcentration is set automatically. (default = automatic)

    Optimizer-specific parameter settings:

    • EM
      • Value should be > 1.0
      • default = 0.1 + 1, where 0.1 gives a small amount of smoothing and +1 follows Asuncion et al. (2009), who recommend a +1 adjustment for EM.
    • Online
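
    The EM guidance above reduces to one constant and one comparison, sketched here with hypothetical names. It restates the documented numbers and is not taken from the library's code.

```scala
object TopicConcentrationSketch {
  /** EM default from the text: 0.1 of smoothing plus the +1 adjustment. */
  val emDefault: Double = 0.1 + 1.0

  /** EM guidance from the text: the value should be > 1.0. */
  def isValidForEm(value: Double): Boolean = value > 1.0
}
```
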
  98. final val sparkInputColParamNames: StringArrayParam

    Definition Classes
    SparkWrapperParams
  99. final val sparkMlStage: SparkStageParam[LDA]

    Definition Classes
    SparkWrapperParams
  100. final val sparkOutputColParamNames: StringArrayParam

    Definition Classes
    SparkWrapperParams
  101. final def stageName: String

    Stage unique name consisting of the stage operation name and uid

    returns

    stage name

    Definition Classes
    OpPipelineStageBase
  102. final def synchronized[T0](arg0: ⇒ T0): T0

    Definition Classes
    AnyRef
  103. def toString(): String

    Definition Classes
    Identifiable → AnyRef → Any
  104. final def transformSchema(schema: StructType): StructType

    This function translates the input and output features into spark schema checks and changes that will occur on the underlying data frame

    schema

    schema of the input data frame

    returns

    a new schema with the output features added

    Definition Classes
    OpPipelineStageBase
  105. def transformSchema(schema: StructType, logging: Boolean): StructType

    Attributes
    protected
    Definition Classes
    PipelineStage
    Annotations
    @DeveloperApi()
  106. implicit val tti: scala.reflect.api.JavaUniverse.TypeTag[OPVector]

    type tag for input

    Definition Classes
    SwUnaryEstimator
  107. implicit val tto: scala.reflect.api.JavaUniverse.TypeTag[OPVector]

    type tag for output

    Definition Classes
    SwUnaryEstimator → HasOut
  108. implicit val ttov: scala.reflect.api.JavaUniverse.TypeTag[Value]

    type tag for output value

    Definition Classes
    SwUnaryEstimator → HasOut
  109. val uid: String

    stage uid

    Definition Classes
    SwUnaryEstimator → Identifiable
  110. final def wait(): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  111. final def wait(arg0: Long, arg1: Int): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  112. final def wait(arg0: Long): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  113. final def write: MLWriter

    Definition Classes
    OpPipelineStageBase → MLWritable
