Input Features type
Checks the input length
input features
true if the input size is as expected, false otherwise
Check if the stage is serializable
Failure if not serializable
This method is used to make a copy of the instance with new parameters in several methods in Spark internals. The default implementation finds the constructor and makes a copy of any class, as long as all constructor params are vals (this is why type tags are written as implicit vals in base classes).
Note: the convention in Spark is to have the uid be a constructor argument, so that copies will share a uid with the original (developers should follow this convention).
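A minimal sketch of the copy convention described above, using hypothetical stand-in types (not the real Spark `Params`/`ParamMap` API): the uid is a constructor argument, so a copy made through the constructor shares the original's uid, and all constructor params are vals so a reflective default copy can read them back.

```scala
// Hypothetical stand-in for Spark's ParamMap.
case class ParamMap(pairs: Map[String, Any] = Map.empty)

// All constructor params are vals, matching the convention above.
class SimpleStage(val uid: String, val k: Int) {
  // Copy with extra params: a new instance that shares the original's uid.
  def copy(extra: ParamMap): SimpleStage = {
    val newK = extra.pairs.get("k").map(_.asInstanceOf[Int]).getOrElse(k)
    new SimpleStage(uid, newK)
  }
}

val stage  = new SimpleStage("lda_0001", k = 10)
val copied = stage.copy(ParamMap(Map("k" -> 20)))
// copied shares the uid "lda_0001" but carries the new k
```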
new parameters to add to the instance
a new instance with the same uid
the estimator to wrap
Gets names of parameters that control input columns for Spark stage
Gets an input feature Note: this method IS NOT safe to use outside the driver, please use getTransientFeature method instead
array of features
NoSuchElementException
if the features are not set
RuntimeException
in case one of the features is null
Gets the input features Note: this method IS NOT safe to use outside the driver, please use getTransientFeatures method instead
array of features
NoSuchElementException
if the features are not set
RuntimeException
in case one of the features is null
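The access rules above can be sketched in plain Scala with a hypothetical holder class (feature types simplified to `String`): the strict getter throws `NoSuchElementException` when features are not set and `RuntimeException` when one of them is null, while an index-based accessor returns an `Option` instead of throwing.

```scala
// Hypothetical holder illustrating the documented failure modes.
class FeatureHolder(private var features: Option[Array[String]] = None) {
  def setInput(fs: Array[String]): this.type = { features = Some(fs); this }

  // Strict access: throws if unset or if any feature is null.
  def getInputFeatures(): Array[String] = {
    val fs = features.getOrElse(
      throw new NoSuchElementException("Input features are not set"))
    if (fs.exists(_ == null))
      throw new RuntimeException("Input features cannot be null")
    fs
  }

  // Safe, Option-based access by index (no exceptions).
  def getInputFeature(i: Int): Option[String] =
    features.flatMap(fs => if (i >= 0 && i < fs.length) Option(fs(i)) else None)
}
```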
Method to access the local version of stage being wrapped
Option of the MLeap runtime version of the Spark stage after reloading as local
Output features that will be created by this stage
feature of type OutputFeatures
Gets names of parameters that control output columns for Spark stage
Name of output feature (i.e. column created by this stage)
Method to access the spark stage being wrapped
Option of spark ml stage
Gets a save path for wrapped spark stage
Gets an input feature at index i
input index
maybe an input feature
Gets the input Features
Function to convert InputFeatures to an Array of FeatureLike
an Array of FeatureLike
Function to be called on getMetadata
Function to be called on setInput
unique name of the operation this stage performs
Function to convert OutputFeatures to an Array of FeatureLike
an Array of FeatureLike
Should output feature be a response? Yes, if any of the input features are.
true if the output feature should be a response
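The rule above reduces to a one-line check; a sketch with a hypothetical simplified feature type:

```scala
// Hypothetical stand-in for a feature with a response flag.
case class Feat(name: String, isResponse: Boolean)

// The output is a response if any of the input features is one.
def outputIsResponse(inputs: Seq[Feat]): Boolean = inputs.exists(_.isResponse)
```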
Set param for checkpoint interval (>= 1) or disable checkpoint (-1). E.g. 10 means that the cache will get checkpointed every 10 iterations.
Set param for concentration parameter (commonly named "alpha") for the prior placed on documents' distributions over topics ("theta").
This is the parameter to a Dirichlet distribution, where larger values mean more smoothing (more regularization).
If not set by the user, then docConcentration is set automatically. If set to singleton vector [alpha], then alpha is replicated to a vector of length k in fitting. Otherwise, the docConcentration vector must be length k. (default = automatic)
Optimizer-specific parameter settings:
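The docConcentration resolution rule above (singleton [alpha] replicated to length k, otherwise the vector must already be length k) can be sketched as a plain function; the name `resolveDocConcentration` is hypothetical, not Spark API:

```scala
// Hypothetical helper mirroring the documented docConcentration rule.
def resolveDocConcentration(alpha: Array[Double], k: Int): Array[Double] =
  alpha match {
    case Array(single)          => Array.fill(k)(single) // replicate [alpha] to length k
    case v if v.length == k     => v                     // already length k
    case v => throw new IllegalArgumentException(
      s"docConcentration must have length 1 or $k, got ${v.length}")
  }
```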
Input features that will be used by the stage
feature of type InputFeatures
Sets input features
feature like type
array of input features
this stage
Set param for number of topics (clusters) to infer. Must be > 1. Default: 10.
Set param for maximum number of iterations (>= 0). Default: 20.
Set param for optimizer or inference algorithm used to estimate the LDA model.
Currently supported (case-insensitive):
For details, see the following papers:
Set param for random seed.
Sets a save path for wrapped spark stage
For Online optimizer only: optimizer = "online".
Set param for fraction of the corpus to be sampled and used in each iteration of mini-batch gradient descent, in range (0, 1].
Note that this should be adjusted in sync with LDA.maxIter so the entire corpus is used. Specifically, set both so that maxIterations * miniBatchFraction >= 1.
Note: This is the same as the miniBatchFraction parameter in org.apache.spark.mllib.clustering.OnlineLDAOptimizer.
Default: 0.05, i.e., 5% of total documents.
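The constraint above is simple arithmetic; a quick sanity check (the helper name is ours, not part of the API):

```scala
// Does maxIter * miniBatchFraction cover the entire corpus at least once?
def coversCorpus(maxIter: Int, miniBatchFraction: Double): Boolean =
  maxIter * miniBatchFraction >= 1.0
```

With the default fraction of 0.05, at least 20 iterations are needed: `coversCorpus(20, 0.05)` holds while `coversCorpus(10, 0.05)` does not.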
Set param for concentration parameter (commonly named "beta" or "eta") for the prior placed on topics' distributions over terms.
This is the parameter to a symmetric Dirichlet distribution.
Note: The topics' distributions over terms are called "beta" in the original LDA paper by Blei et al., but are called "phi" in many later papers such as Asuncion et al., 2009.
If not set by the user, then topicConcentration is set automatically. (default = automatic)
Optimizer-specific parameter settings:
Stage unique name consisting of the stage operation name and uid
stage name
This function translates the input and output features into spark schema checks and changes that will occur on the underlying data frame
schema of the input data frame
a new schema with the output features added
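A simplified model of that schema translation, with hypothetical types standing in for Spark's `StructType`/`StructField`: the input features are checked against the incoming schema, then the output feature's column is appended.

```scala
// Hypothetical simplified schema types (stand-ins for StructField/StructType).
case class Field(name: String, dataType: String)
case class Schema(fields: Seq[Field])

// Check inputs exist in the schema, then append the output column.
def transformSchema(schema: Schema, inputs: Seq[String], output: Field): Schema = {
  val missing = inputs.filterNot(n => schema.fields.exists(_.name == n))
  require(missing.isEmpty, s"Input features missing from schema: $missing")
  Schema(schema.fields :+ output)
}
```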
type tag for input
type tag for output
type tag for output value
stage uid
Wrapper around spark ml LDA (Latent Dirichlet Allocation) for use with OP pipelines
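A hedged configuration sketch of how such a wrapper might be used in a pipeline, pulling together the params documented above. The class name, package path, input feature, and setter names are assumptions based on the conventions this doc describes, not verified API; this fragment needs a full Spark/OP environment to run.

```scala
// Assumed package path and class name for the wrapper (not verified).
import com.salesforce.op.stages.impl.feature.OpLDA

val lda = new OpLDA()
  .setInput(textVector)      // assumed: a vectorized text feature
  .setK(10)                  // number of topics, must be > 1
  .setMaxIter(20)
  .setOptimizer("online")
  .setSubsamplingRate(0.05)  // with maxIter = 20, covers the corpus once

// The output feature (topic distribution) to wire into downstream stages.
val topicDistribution = lda.getOutput()
```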