Class

com.salesforce.op.stages.impl.feature

SmartTextMapVectorizer

Related Doc: package feature

class SmartTextMapVectorizer[T <: OPMap[String]] extends SequenceEstimator[T, OPVector] with PivotParams with CleanTextFun with SaveOthersParams with TrackNullsParam with MinSupportParam with TextTokenizerParams with TrackTextLenParam with HashingVectorizerParams with MapHashingFun with OneHotFun with MapStringPivotHelper with MapVectorizerFuns[String, OPMap[String]] with MaxCardinalityParams with MinLengthStdDevParams with NameDetectFun[Text]

Converts a sequence of text map features into a vector by detecting categoricals that are disguised as text. A categorical is represented as a vector consisting of occurrences of the top K most common values of that feature, plus occurrences of non-top-K values and a null indicator (if enabled). Non-categoricals are converted into a vector using the hashing trick; in addition, a null indicator is created for each non-categorical (if enabled).

Detection and removal of names in the input columns can be enabled with the sensitiveFeatureMode param.
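
As a rough illustration of the categorical branch described above, the following standalone sketch counts the values of one text column, treats it as categorical only when its number of distinct values does not exceed a cardinality limit, keeps the top K values that meet a minimum support, and one-hot encodes each value into [top values..., other, null]. The names `CategoricalSketch`, `topValues` and `encode` are hypothetical and only loosely mirror the `maxCardinality`, `minSupport` and null-tracking parameters; this is not TransmogrifAI's implementation, which also handles the hashed branch and map inputs.

```scala
// Hypothetical sketch of the categorical branch; not TransmogrifAI's implementation.
object CategoricalSketch {
  /** Returns the top-K values (by count, ties broken alphabetically) that occur at least
    * minSupport times, or None when the column has more than maxCardinality distinct
    * values and would therefore be hashed instead of pivoted. Nulls are not counted. */
  def topValues(values: Seq[Option[String]], maxCardinality: Int, topK: Int, minSupport: Int): Option[Seq[String]] = {
    val counts: Map[String, Int] = values.flatten.groupBy(identity).map { case (v, vs) => v -> vs.size }
    if (counts.size > maxCardinality) None
    else Some(
      counts.toSeq
        .filter { case (_, c) => c >= minSupport }
        .sortBy { case (v, c) => (-c, v) }
        .take(topK)
        .map(_._1))
  }

  /** One-hot encodes a value into [top values..., other, null]; the null slot exists only if trackNulls. */
  def encode(value: Option[String], top: Seq[String], trackNulls: Boolean): Vector[Double] = {
    val slots = Array.fill(top.size + 1 + (if (trackNulls) 1 else 0))(0.0)
    value match {
      case Some(v) =>
        val i = top.indexOf(v)
        slots(if (i >= 0) i else top.size) = 1.0 // non-top values go to the "other" slot
      case None if trackNulls => slots(top.size + 1) = 1.0
      case None => () // no null indicator requested
    }
    slots.toVector
  }
}
```

Ranking ties alphabetically keeps the chosen top values deterministic across runs, which matters if the resulting vector layout must be reproducible.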

Linear Supertypes
NameDetectFun[Text], NameDetectParams, MinLengthStdDevParams, MaxCardinalityParams, MapVectorizerFuns[String, OPMap[String]], CleanTextMapFun, MapPivotParams, VectorizerDefaults, MapStringPivotHelper, OneHotFun, UniqueCountFun, MapHashingFun, HashingFun, HashingVectorizerParams, TrackTextLenParam, TextTokenizerParams, TextMatchingParams, LanguageDetectionParams, MinSupportParam, TrackNullsParam, SaveOthersParams, CleanTextFun, PivotParams, TextParams, SequenceEstimator[T, OPVector], OpPipelineStageN[T, OPVector], HasOut[OPVector], HasInN, OpPipelineStage[OPVector], OpPipelineStageBase, MLWritable, OpPipelineStageParams, InputParams, Estimator[SequenceModel[T, OPVector]], PipelineStage, Logging, Params, Serializable, Serializable, Identifiable, AnyRef, Any

Instance Constructors

  1. new SmartTextMapVectorizer(uid: String = UID[SmartTextMapVectorizer[T]])(implicit tti: scala.reflect.api.JavaUniverse.TypeTag[T], ttiv: scala.reflect.api.JavaUniverse.TypeTag[Map[String, String]])

    uid

    uid for the instance

Type Members

  1. final type InputFeatures = Array[FeatureLike[T]]

    Input Features type

    Definition Classes
    OpPipelineStageN → OpPipelineStage → InputParams
  2. type MapMap = Map[String, Map[String, Long]]

    Definition Classes
    MapStringPivotHelper
  3. final type OutputFeatures = FeatureLike[OPVector]

    Definition Classes
    OpPipelineStage → OpPipelineStageBase
  4. type SeqMapMap = Seq[utils.spark.SequenceAggregators.MapMap]

    Definition Classes
    MapStringPivotHelper
  5. type SeqSeqTupArr = Seq[Seq[(String, Array[String])]]

    Definition Classes
    MapStringPivotHelper
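
The `MapMap` alias reads as per-key value counts (map key → value → count), and `SeqMapMap` as a sequence of such tables, presumably one per input feature. The hypothetical sketch below shows how such tables could be built from individual rows and merged by summing counts; `MapMapSketch`, `toMapMap` and `merge` are illustrative names, not members of `MapStringPivotHelper`.

```scala
// Hypothetical sketch: per-key value counts in the shape of the MapMap alias.
object MapMapSketch {
  type MapMap = Map[String, Map[String, Long]]

  /** Converts one text map into counts: every (key, value) pair is counted once. */
  def toMapMap(m: Map[String, String]): MapMap =
    m.map { case (k, v) => k -> Map(v -> 1L) }

  /** Merges two count tables, summing the counts of matching (key, value) pairs. */
  def merge(a: MapMap, b: MapMap): MapMap =
    (a.keySet ++ b.keySet).map { k =>
      val x = a.getOrElse(k, Map.empty[String, Long])
      val y = b.getOrElse(k, Map.empty[String, Long])
      val summed = (x.keySet ++ y.keySet).map(v => v -> (x.getOrElse(v, 0L) + y.getOrElse(v, 0L))).toMap
      k -> summed
    }.toMap
}
```

Because `merge` is associative, per-partition tables can be combined in any order, which is what makes this shape convenient for a distributed reduce.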

Value Members

  1. final def !=(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  2. final def ##(): Int

    Definition Classes
    AnyRef → Any
  3. final def $[T](param: Param[T]): T

    Attributes
    protected
    Definition Classes
    Params
  4. final def ==(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  5. final val allowListKeys: StringArrayParam

    Definition Classes
    MapPivotParams
  6. final def asInstanceOf[T0]: T0

    Definition Classes
    Any
  7. final val autoDetectLanguage: BooleanParam

    Indicates whether to attempt language detection.

    Definition Classes
    LanguageDetectionParams
  8. final val autoDetectThreshold: DoubleParam

    Language detection threshold. If none of the detected languages has a confidence greater than the threshold, defaultLanguage is used.

    Definition Classes
    LanguageDetectionParams
  9. final val binaryFreq: BooleanParam

    Definition Classes
    HashingVectorizerParams
  10. final val blockListKeys: StringArrayParam

    Definition Classes
    MapPivotParams
  11. implicit def booleanToDouble(v: Boolean): Double

    Definition Classes
    VectorizerDefaults
  12. final def checkInputLength(features: Array[_]): Boolean

    Checks the input length

    features

    input features

    returns

    true if the input size is as expected, false otherwise

    Definition Classes
    OpPipelineStageN → InputParams
  13. final def checkSerializable: Try[Unit]

    Check if the stage is serializable

    returns

    Failure if not serializable

    Definition Classes
    SequenceEstimator → OpPipelineStageBase
  14. final val cleanKeys: BooleanParam

    Definition Classes
    MapPivotParams
  15. def cleanMap[V](m: Map[String, V], shouldCleanKey: Boolean, shouldCleanValue: Boolean): Map[String, V]

    Definition Classes
    CleanTextMapFun
  16. final val cleanText: BooleanParam

    Definition Classes
    TextParams
  17. def cleanTextFn(s: String, shouldClean: Boolean): String

    Definition Classes
    CleanTextFun
  18. final def clear(param: Param[_]): SmartTextMapVectorizer.this.type

    Definition Classes
    Params
  19. def clone(): AnyRef

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  20. final def copy(extra: ParamMap): SmartTextMapVectorizer.this.type

    Makes a copy of the instance with new parameters; several methods in Spark internals rely on it. The default implementation finds the constructor and makes a copy for any class, AS LONG AS ALL CONSTRUCTOR PARAMS ARE VALS (this is why type tags are written as implicit vals in base classes).

    Note: the convention in Spark is to make the uid a constructor argument, so that copies share a uid with the original (developers should follow this convention).

    extra

    new parameters to add to the instance

    returns

    a new instance with the same uid

    Definition Classes
    OpPipelineStageBase → Params
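
    To make the uid convention concrete, here is a minimal, self-contained model of the copy contract: the copy is a new object that keeps the original `uid`, and values in `extra` override the current parameter values. `CopySketch.Stage` and its `Map[String, Any]` parameter store are hypothetical stand-ins, not Spark's `Params`/`ParamMap` API.

    ```scala
    // Hypothetical model of the copy contract; not Spark's Params implementation.
    object CopySketch {
      /** Minimal stand-in for a pipeline stage: a uid plus a map of parameter values. */
      final class Stage(val uid: String, val params: Map[String, Any]) {
        /** Returns a new instance with the same uid; values in `extra` override existing ones. */
        def copy(extra: Map[String, Any]): Stage = new Stage(uid, params ++ extra)
      }
    }
    ```

    Keeping the uid stable lets code that looks stages up by uid treat the copy as the same logical stage.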
  21. def copyValues[T <: Params](to: T, extra: ParamMap): T

    Attributes
    protected
    Definition Classes
    Params
  22. def countMapUniques[V](dataset: Dataset[Seq[Map[String, V]]], size: Int, bits: Int)(implicit kryo: KryoSerializer, ct: ClassTag[V]): (Seq[Map[String, HLL]], Long)

    Counts unique values of each of the sequence and map key components in the dataset using HyperLogLog (HLL)

    V

    value type

    dataset

    dataset to count unique values

    size

    size of each sequence component

    bits

    number of bits for the HyperLogLog (HLL) sketch

    kryo

    kryo serializer to serialize V value into array of bytes

    ct

    class tag of V - needed by kryo

    returns

    HyperLogLog (HLL) unique-value counts for each of the sequence components, and the total row count

    Definition Classes
    UniqueCountFun
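
    HyperLogLog trades exactness for bounded memory. The sketch below swaps in exact distinct counting on in-memory collections to show what is being counted: for each position in the input sequence, the number of distinct values per map key, plus the total row count. `UniqueCountSketch` is a hypothetical name, and it returns plain `Int` counts rather than HLL objects, so it only suits small data.

    ```scala
    // Exact (non-HLL) stand-in for the per-key unique counting described above.
    object UniqueCountSketch {
      /** For each sequence position, counts distinct values per map key; also returns the row count. */
      def countMapUniques[V](rows: Seq[Seq[Map[String, V]]], size: Int): (Seq[Map[String, Int]], Long) = {
        val perPosition: Seq[Map[String, Int]] = (0 until size).map { i =>
          rows
            .flatMap(row => row(i).toSeq) // (key, value) pairs at position i
            .groupBy(_._1)                // group by map key
            .map { case (k, kvs) => k -> kvs.map(_._2).distinct.size }
        }
        (perPosition, rows.size.toLong)
      }
    }
    ```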
  23. def countUniques[V](dataset: Dataset[Seq[V]], size: Int, bits: Int)(implicit kryo: KryoSerializer, ct: ClassTag[V]): (Seq[HLL], Long)

    Counts unique values of each of the sequence components in the dataset using HyperLogLog (HLL)

    V

    value type

    dataset

    dataset to count unique values

    size

    size of each sequence component

    bits

    number of bits for the HyperLogLog (HLL) sketch

    kryo

    kryo serializer to serialize V value into array of bytes

    ct

    class tag of V - needed by kryo

    returns

    HyperLogLog (HLL) unique-value counts for each of the sequence components, and the total row count

    Definition Classes
    UniqueCountFun
  24. final val coveragePct: DoubleParam

    Definition Classes
    MaxCardinalityParams
  25. final def defaultCopy[T <: Params](extra: ParamMap): T

    Attributes
    protected
    Definition Classes
    Params
  26. final val defaultLanguage: Param[String]

    Default language to assume when autoDetectLanguage is disabled or fails to make a good enough prediction.

    Definition Classes
    LanguageDetectionParams
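
    The fallback rule stated for `autoDetectLanguage`, `autoDetectThreshold` and `defaultLanguage` can be sketched as a pure function over detection scores. The language codes, the `Map[String, Double]` score representation and the name `chooseLanguage` are assumptions for illustration; the real stage uses its own `Language` type and detector.

    ```scala
    // Hypothetical sketch of the language fallback rule; not the library's API.
    object LanguageFallbackSketch {
      /** Returns the most confident detected language when detection is enabled and its
        * confidence is strictly greater than `threshold`; otherwise returns `default`. */
      def chooseLanguage(
          autoDetect: Boolean,
          detected: Map[String, Double], // language code -> confidence (assumed representation)
          threshold: Double,
          default: String): String =
        if (!autoDetect || detected.isEmpty) default
        else {
          val (lang, confidence) = detected.maxBy(_._2)
          if (confidence > threshold) lang else default
        }
    }
    ```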
  27. final def eq(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  28. def equals(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  29. def explainParam(param: Param[_]): String

    Definition Classes
    Params
  30. def explainParams(): String

    Definition Classes
    Params
  31. final def extractParamMap(): ParamMap

    Definition Classes
    Params
  32. final def extractParamMap(extra: ParamMap): ParamMap

    Definition Classes
    Params
  33. def filterKeys[V](m: Map[String, V], shouldCleanKey: Boolean, shouldCleanValue: Boolean): Map[String, V]

    Attributes
    protected
    Definition Classes
    MapPivotParams
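
    The key filtering controlled by `allowListKeys` and `blockListKeys` might look roughly like the following. This sketch assumes that an empty allow list means "allow every key" and that the block list is applied after the allow list; the real `filterKeys` also takes `shouldCleanKey`/`shouldCleanValue` flags, which are omitted here, and `KeyFilterSketch` is a hypothetical name.

    ```scala
    // Hypothetical sketch of allow/block-list key filtering; semantics are assumed.
    object KeyFilterSketch {
      /** Keeps only allow-listed keys (when the allow list is non-empty), then drops block-listed keys. */
      def filterKeys[V](m: Map[String, V], allowList: Set[String], blockList: Set[String]): Map[String, V] =
        m.filter { case (k, _) => (allowList.isEmpty || allowList.contains(k)) && !blockList.contains(k) }
    }
    ```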
  34. def finalize(): Unit

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( classOf[java.lang.Throwable] )
  35. def fit(dataset: Dataset[_]): SequenceModel[T, OPVector]

    Uses Spark operations to turn the input dataset into the Dataset expected by the fit function defined in the constructor, then turns that function's output into a Model.

    dataset

    input data for this stage

    returns

    a fitted model that performs the transformation specified by the fit function defined in the constructor

    Definition Classes
    SequenceEstimator → Estimator
  36. def fit(dataset: Dataset[_], paramMaps: Array[ParamMap]): Seq[SequenceModel[T, OPVector]]

    Definition Classes
    Estimator
    Annotations
    @Since( "2.0.0" )
  37. def fit(dataset: Dataset[_], paramMap: ParamMap): SequenceModel[T, OPVector]

    Definition Classes
    Estimator
    Annotations
    @Since( "2.0.0" )
  38. def fit(dataset: Dataset[_], firstParamPair: ParamPair[_], otherParamPairs: ParamPair[_]*): SequenceModel[T, OPVector]

    Definition Classes
    Estimator
    Annotations
    @Since( "2.0.0" ) @varargs()
  39. def fitFn(dataset: Dataset[Seq[Map[String, String]]]): SequenceModel[T, OPVector]

    Function that fits the sequence model

    Definition Classes
    SmartTextMapVectorizer → SequenceEstimator
  40. final def get[T](param: Param[T]): Option[T]

    Definition Classes
    Params
  41. def getAutoDetectLanguage: Boolean

    Definition Classes
    LanguageDetectionParams
  42. def getAutoDetectThreshold: Double

    Definition Classes
    LanguageDetectionParams
  43. def getCategoryMaps[V](in: Dataset[Seq[Map[String, V]]], convertToMapOfMaps: (Map[String, V]) ⇒ MapMap, shouldCleanKeys: Boolean, shouldCleanValues: Boolean): Dataset[SeqMapMap]

    Attributes
    protected
    Definition Classes
    MapStringPivotHelper
  44. final def getClass(): Class[_]

    Definition Classes
    AnyRef → Any
  45. final def getCoveragePct: Double

    Definition Classes
    MaxCardinalityParams
  46. final def getDefault[T](param: Param[T]): Option[T]

    Definition Classes
    Params
  47. def getDefaultLanguage: Language

    Definition Classes
    LanguageDetectionParams
  48. def getHashAlgorithm: HashAlgorithm

    Definition Classes
    HashingVectorizerParams
  49. def getHashSpaceStrategy: HashSpaceStrategy

    Definition Classes
    HashingVectorizerParams
  50. final def getInputFeature[T <: FeatureType](i: Int): Option[FeatureLike[T]]

    Gets an input feature. Note: this method IS NOT safe to use outside the driver; please use the getTransientFeature method instead.

    returns

    the input feature at index i, if it exists

    Definition Classes
    InputParams
    Exceptions thrown

    NoSuchElementException if the features are not set

    RuntimeException in case one of the features is null

  51. final def getInputFeatures(): Array[OPFeature]

    Gets the input features. Note: this method IS NOT safe to use outside the driver; please use the getTransientFeatures method instead.

    returns

    array of features

    Definition Classes
    InputParams
    Exceptions thrown

    NoSuchElementException if the features are not set

    RuntimeException in case one of the features is null

  52. final def getInputSchema(): StructType

    Definition Classes
    OpPipelineStageParams
  53. def getKeyValues(in: Dataset[Seq[Map[String, String]]], shouldCleanKeys: Boolean, shouldCleanValues: Boolean): Seq[Seq[String]]

    Attributes
    protected
    Definition Classes
    MapVectorizerFuns
  54. final def getMaxCardinality: Int

    Definition Classes
    MaxCardinalityParams
  55. final def getMetadata(): Metadata

    Definition Classes
    OpPipelineStageParams
  56. final def getMinLengthStdDev: Double

    Definition Classes
    MinLengthStdDevParams
  57. def getMinTokenLength: Int

    Definition Classes
    TextTokenizerParams
  58. def getNumFeatures(): Int

    Definition Classes
    HashingVectorizerParams
  59. final def getOrDefault[T](param: Param[T]): T

    Definition Classes
    Params
  60. def getOutput(): FeatureLike[OPVector]

    Output features that will be created by this stage

    returns

    feature of type OutputFeatures

    Definition Classes
    HasOut → OpPipelineStageBase
  61. final def getOutputFeatureName: String

    Name of output feature (i.e. column created by this stage)

    Definition Classes
    OpPipelineStage
  62. def getParam(paramName: String): Param[Any]

    Definition Classes
    Params
  63. def getSensitiveFeatureMode: SensitiveFeatureMode

    Definition Classes
    NameDetectParams
  64. def getStripHtml: Boolean

    Definition Classes
    TextTokenizerParams
  65. def getTextLengthType: TextLengthType

    Definition Classes
    MinLengthStdDevParams
  66. def getToLowercase: Boolean

    Definition Classes
    TextMatchingParams
  67. def getTopValues(categoryMaps: Dataset[SeqMapMap], inputSize: Int, topK: Int, minSup: Int): SeqSeqTupArr

    Attributes
    protected
    Definition Classes
    MapStringPivotHelper
  68. final def getTransientFeature(i: Int): Option[TransientFeature]

    Gets an input feature at index i

    i

    input index

    returns

    the input feature at index i, if it exists

    Definition Classes
    InputParams
  69. final def getTransientFeatures(): Array[TransientFeature]

    Gets the input Features

    returns

    input features

    Definition Classes
    InputParams
  70. def getUnseenName: String

    Definition Classes
    SaveOthersParams
  71. val guardMaxNumberOfTokens: IntParam

    Definition Classes
    NameDetectParams
  72. val guardMinCountForStdDevCheck: IntParam

    Definition Classes
    NameDetectParams
  73. val guardMinCountForUniqueCheck: IntParam

    Definition Classes
    NameDetectParams
  74. val guardMinStdDev: DoubleParam

    Definition Classes
    NameDetectParams
  75. val guardMinTextLength: IntParam

    Definition Classes
    NameDetectParams
  76. val guardMinUniqueCheck: IntParam

    Definition Classes
    NameDetectParams
  77. val guardPctMaxNumberOfTokens: DoubleParam

    Definition Classes
    NameDetectParams
  78. val guardPctMinTextLength: DoubleParam

    Definition Classes
    NameDetectParams
  79. final def hasDefault[T](param: Param[T]): Boolean

    Definition Classes
    Params
  80. def hasParam(paramName: String): Boolean

    Definition Classes
    Params
  81. def hash(inputs: Seq[Map[String, TextList]], allKeys: Seq[Seq[String]], params: HashingFunctionParams): OPVector

    Attributes
    protected
    Definition Classes
    MapHashingFun
  82. def hash[T <: OPCollection](in: Seq[T], features: Array[TransientFeature], params: HashingFunctionParams): OPVector

    Hashes input sequence of values into OPVector using the supplied hashing params

    Attributes
    protected
    Definition Classes
    HashingFun
  83. final val hashAlgorithm: Param[String]

    Definition Classes
    HashingVectorizerParams
  84. def hashCode(): Int

    Definition Classes
    AnyRef → Any
  85. final val hashSpaceStrategy: Param[String]

    Definition Classes
    HashingVectorizerParams
  86. final val hashWithIndex: BooleanParam

    Definition Classes
    HashingVectorizerParams
  87. def hashingTF(params: HashingFunctionParams): HashingTF

    HashingTF instance

    Attributes
    protected
    Definition Classes
    HashingFun
  88. val ignoreNulls: BooleanParam

    Definition Classes
    NameDetectParams
  89. final def inN: Array[TransientFeature]

    Attributes
    protected
    Definition Classes
    HasInN
  90. def initializeLogIfNecessary(isInterpreter: Boolean, silent: Boolean): Boolean

    Attributes
    protected
    Definition Classes
    Logging
  91. def initializeLogIfNecessary(isInterpreter: Boolean): Unit

    Attributes
    protected
    Definition Classes
    Logging
  92. final def inputAsArray(in: InputFeatures): Array[OPFeature]

    Function to convert InputFeatures to an Array of FeatureLike

    returns

    an Array of FeatureLike

    Definition Classes
    OpPipelineStageN → InputParams
  93. final def isDefined(param: Param[_]): Boolean

    Definition Classes
    Params
  94. final def isInstanceOf[T0]: Boolean

    Definition Classes
    Any
  95. final def isSet(param: Param[_]): Boolean

    Definition Classes
    Params
  96. def isSharedHashSpace(p: HashingFunctionParams, numFeatures: Option[Int] = None): Boolean

    Determine if the transformer should use a shared hash space for all features or not

    returns

    true if a shared hash space should be used, false otherwise

    Attributes
    protected
    Definition Classes
    HashingFun
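
    One way the shared-versus-separate decision could be expressed is shown below. The three strategy objects echo the `HashSpaceStrategy` type returned by `getHashSpaceStrategy`, but their names here, the rule used for the automatic case and the `maxTotalFeatures` limit are all assumptions made for illustration, not the library's logic.

    ```scala
    // Hypothetical sketch of a hash-space decision; strategy names and the Auto rule are assumed.
    object HashSpaceSketch {
      sealed trait Strategy
      case object Shared extends Strategy
      case object Separate extends Strategy
      case object Auto extends Strategy

      /** Shared and Separate are explicit choices; for Auto, this sketch shares one space only
        * when giving every input its own numFeatures-sized space would exceed maxTotalFeatures. */
      def isSharedHashSpace(strategy: Strategy, numFeatures: Int, numInputs: Int, maxTotalFeatures: Int): Boolean =
        strategy match {
          case Shared   => true
          case Separate => false
          case Auto     => numFeatures.toLong * numInputs > maxTotalFeatures
        }
    }
    ```

    Separate spaces avoid collisions between inputs, at the cost of a vector whose width grows with the number of inputs.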
  97. def isTraceEnabled(): Boolean

    Attributes
    protected
    Definition Classes
    Logging
  98. def log: Logger

    Attributes
    protected
    Definition Classes
    Logging
  99. def logDebug(msg: ⇒ String, throwable: Throwable): Unit

    Attributes
    protected
    Definition Classes
    Logging
  100. def logDebug(msg: ⇒ String): Unit

    Attributes
    protected
    Definition Classes
    Logging
  101. def logError(msg: ⇒ String, throwable: Throwable): Unit

    Attributes
    protected
    Definition Classes
    Logging
  102. def logError(msg: ⇒ String): Unit

    Attributes
    protected
    Definition Classes
    Logging
  103. def logInfo(msg: ⇒ String, throwable: Throwable): Unit

    Attributes
    protected
    Definition Classes
    Logging
  104. def logInfo(msg: ⇒ String): Unit

    Attributes
    protected
    Definition Classes
    Logging
  105. def logName: String

    Attributes
    protected
    Definition Classes
    Logging
  106. def logTrace(msg: ⇒ String, throwable: Throwable): Unit

    Attributes
    protected
    Definition Classes
    Logging
  107. def logTrace(msg: ⇒ String): Unit

    Attributes
    protected
    Definition Classes
    Logging
  108. def logWarning(msg: ⇒ String, throwable: Throwable): Unit

    Attributes
    protected
    Definition Classes
    Logging
  109. def logWarning(msg: ⇒ String): Unit

    Attributes
    protected
    Definition Classes
    Logging
  110. def makeOutputVectorMetadata(topValues: SeqSeqTupArr, inputFeatures: Array[TransientFeature], operationName: String, outputName: String, stageName: String, trackNulls: Boolean = false): OpVectorMetadata

    Attributes
    protected
    Definition Classes
    MapStringPivotHelper
  111. def makeSmartTextMapVectorizerModelArgs(aggregatedStats: Array[TextMapStats]): SmartTextMapVectorizerModelArgs

  112. def makeVectorColumnMetadata(topValues: SeqSeqTupArr, inputFeatures: Array[TransientFeature], unseenName: String, trackNulls: Boolean = false): Array[OpVectorColumnMetadata]

    Attributes
    protected
    Definition Classes
    MapStringPivotHelper
  113. def makeVectorColumnMetadata(shouldTrackNulls: Boolean, unseen: Option[String], topValues: Seq[Seq[String]], features: Array[TransientFeature]): Array[OpVectorColumnMetadata]

    Attributes
    protected
    Definition Classes
    OneHotFun
  114. def makeVectorColumnMetadata(hashFeatures: Array[TransientFeature], ignoreFeatures: Array[TransientFeature], params: HashingFunctionParams, hashKeys: Seq[Seq[String]], ignoreKeys: Seq[Seq[String]], shouldTrackNulls: Boolean, shouldTrackLen: Boolean): Array[OpVectorColumnMetadata]

    Attributes
    protected
    Definition Classes
    MapHashingFun
  115. def makeVectorColumnMetadata(features: Array[TransientFeature], params: HashingFunctionParams): Array[OpVectorColumnMetadata]

    Attributes
    protected
    Definition Classes
    HashingFun
  116. def makeVectorMetaWithNullIndicators(allKeys: Seq[Seq[String]]): OpVectorMetadata

    Attributes
    protected
    Definition Classes
    MapVectorizerFuns
  117. def makeVectorMetadata(allKeys: Seq[Seq[String]]): OpVectorMetadata

    Attributes
    protected
    Definition Classes
    MapVectorizerFuns
  118. def makeVectorMetadata(shouldTrackNulls: Boolean, unseen: Option[String], topValues: Seq[Seq[String]], outputName: String, features: Array[TransientFeature], stageName: String): OpVectorMetadata

    Attributes
    protected
    Definition Classes
    OneHotFun
  119. def makeVectorMetadata(features: Array[TransientFeature], params: HashingFunctionParams, outputName: String): OpVectorMetadata

    Attributes
    protected
    Definition Classes
    HashingFun
  120. final val maxCardinality: IntParam

    Definition Classes
    MaxCardinalityParams
  121. final val minLengthStdDev: DoubleParam

    Definition Classes
    MinLengthStdDevParams
  122. final val minSupport: IntParam

    Definition Classes
    MinSupportParam
  123. final val minTokenLength: IntParam

    Minimum token length, >= 1.

    Definition Classes
    TextTokenizerParams
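
    A minimal sketch of what `minTokenLength` controls: tokens shorter than the limit are dropped after tokenization. The regex split on non-letter, non-digit characters and the name `TokenizerSketch.tokenize` are assumptions; they stand in for whatever text analyzer the library actually uses.

    ```scala
    // Hypothetical tokenizer illustrating a minimum token length; not the library's analyzer.
    object TokenizerSketch {
      /** Lowercases, splits on runs of non-letter/non-digit characters, and drops tokens
        * shorter than minTokenLength (which must be >= 1). */
      def tokenize(text: String, minTokenLength: Int = 1): Seq[String] = {
        require(minTokenLength >= 1, "minTokenLength must be >= 1")
        text.toLowerCase.split("[^\\p{L}\\p{N}]+").toSeq.filter(_.length >= minTokenLength)
      }
    }
    ```

    For example, with `minTokenLength = 3` the text "Hi, a big dog!" keeps only "big" and "dog".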
  124. val nameThreshold: DoubleParam

    Definition Classes
    NameDetectParams
  125. final def ne(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  126. final def notify(): Unit

    Definition Classes
    AnyRef
  127. final def notifyAll(): Unit

    Definition Classes
    AnyRef
  128. final val numFeatures: IntParam

    Definition Classes
    HashingVectorizerParams
  129. def onGetMetadata(): Unit

    Function to be called on getMetadata

    Attributes
    protected
    Definition Classes
    OpPipelineStageParams
  130. def onSetInput(): Unit

    Function to be called on setInput

    Definition Classes
    VectorizerDefaults → InputParams
  131. val operationName: String

    unique name of the operation this stage performs

    Definition Classes
    SequenceEstimator → OpPipelineStageBase
  132. final def outputAsArray(out: OutputFeatures): Array[OPFeature]

    Function to convert OutputFeatures to an Array of FeatureLike

    returns

    an Array of FeatureLike

    Definition Classes
    OpPipelineStage → OpPipelineStageBase
  133. def outputFeatureUid: String

    Attributes
    protected[com.salesforce.op]
    Definition Classes
    OpPipelineStageN → OpPipelineStage
  134. def outputIsResponse: Boolean

    Should output feature be a response? Yes, if any of the input features are.

    returns

    true if the output feature should be a response

    Definition Classes
    OpPipelineStage
  135. def outputVectorMeta: OpVectorMetadata

    Get the metadata describing the output vector

    This does not trigger onGetMetadata()

    returns

    Metadata of output vector

    Attributes
    protected
    Definition Classes
    VectorizerDefaults
  136. lazy val params: Array[Param[_]]

    Definition Classes
    Params
  137. def prepare[T <: OPCollection](el: T, shouldHashWithIndex: Boolean, shouldPrependFeatureName: Boolean, featureNameHash: Int): Iterable[Any]

    Function that prepares the input columns to be hashed. Note that the MurmurHash3 hashing algorithm is only defined for primitive types, so tuples need to be converted to strings. MultiPickList sets are hashed as is, since there is no meaningful order in the selected choices. Lists and vectors can be hashed with or without their indices, since order may be important. Maps are hashed as (key,value) strings.

    el

    element we are hashing (e.g. an OPList, OPMap, etc.)

    returns

    an Iterable object corresponding to the hashed element

    Attributes
    protected
    Definition Classes
    HashingFun
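
    The preparation rules above (maps hashed as (key,value) strings, lists optionally paired with their index) can be sketched with the Scala standard library's `MurmurHash3.stringHash` feeding a term-frequency vector, i.e. the hashing trick. This is an illustration under assumptions: the helper names are hypothetical, and the bucket indices will not match Spark's `HashingTF`, which uses its own hashing setup.

    ```scala
    import scala.util.hashing.MurmurHash3

    // Hypothetical sketch of preparing values and applying the hashing trick.
    object HashingSketch {
      /** Maps are hashed as "(key,value)" strings, e.g. "(color,red)". */
      def prepareMap(m: Map[String, String]): Seq[String] =
        m.toSeq.map { case (k, v) => (k, v).toString }

      /** Lists are hashed with or without their indices; with indices, order affects the result. */
      def prepareList(xs: Seq[String], withIndex: Boolean): Seq[String] =
        if (withIndex) xs.zipWithIndex.map { case (x, i) => (i, x).toString } else xs

      /** Term-frequency vector: each term increments the bucket its hash falls into. */
      def hashingTF(terms: Seq[String], numFeatures: Int): Vector[Double] = {
        require(numFeatures > 0, "numFeatures must be positive")
        val tf = Array.fill(numFeatures)(0.0)
        terms.foreach { t =>
          val h = MurmurHash3.stringHash(t)
          val bucket = ((h % numFeatures) + numFeatures) % numFeatures // non-negative modulo
          tf(bucket) += 1.0
        }
        tf.toVector
      }
    }
    ```

    Because distinct terms can land in the same bucket, a larger `numFeatures` reduces collisions at the cost of a wider vector.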
  138. final val prependFeatureName: BooleanParam

    Definition Classes
    HashingVectorizerParams
  139. def save(path: String): Unit

    Definition Classes
    MLWritable
    Annotations
    @Since( "1.6.0" ) @throws( ... )
  140. val sensitiveFeatureMode: Param[String]

    Definition Classes
    NameDetectParams
  141. val seqIConvert: FeatureTypeSparkConverter[T]

    Definition Classes
    SequenceEstimator
  142. implicit val seqIEncoder: Encoder[Seq[T.Value]]

    Definition Classes
    SequenceEstimator
  143. final def set(paramPair: ParamPair[_]): SmartTextMapVectorizer.this.type

    Attributes
    protected
    Definition Classes
    Params
  144. final def set(param: String, value: Any): SmartTextMapVectorizer.this.type

    Attributes
    protected
    Definition Classes
    Params
  145. final def set[T](param: Param[T], value: T): SmartTextMapVectorizer.this.type

    Definition Classes
    Params
  146. final def setAllowListKeys(keys: Array[String]): SmartTextMapVectorizer.this.type

    Definition Classes
    MapPivotParams
  147. def setAutoDetectLanguage(value: Boolean): SmartTextMapVectorizer.this.type

    Definition Classes
    LanguageDetectionParams
  148. def setAutoDetectThreshold(value: Double): SmartTextMapVectorizer.this.type

    Definition Classes
    LanguageDetectionParams
  149. def setBinaryFreq(v: Boolean): SmartTextMapVectorizer.this.type

    Definition Classes
    HashingVectorizerParams
  150. final def setBlockListKeys(keys: Array[String]): SmartTextMapVectorizer.this.type

    Definition Classes
    MapPivotParams
  151. def setCleanKeys(clean: Boolean): SmartTextMapVectorizer.this.type

    Definition Classes
    MapPivotParams
  152. def setCleanText(clean: Boolean): SmartTextMapVectorizer.this.type

    Definition Classes
    TextParams
  153. final def setCoveragePct(v: Double): SmartTextMapVectorizer.this.type

    Definition Classes
    MaxCardinalityParams
  154. final def setDefault(paramPairs: ParamPair[_]*): SmartTextMapVectorizer.this.type

    Attributes
    protected
    Definition Classes
    Params
  155. final def setDefault[T](param: Param[T], value: T): SmartTextMapVectorizer.this.type

    Attributes
    protected
    Definition Classes
    Params
  156. def setDefaultLanguage(value: Language): SmartTextMapVectorizer.this.type

    Definition Classes
    LanguageDetectionParams
  157. def setGuardCheckValues(maxNumberOfTokens: Int = $(guardMaxNumberOfTokens), pctMaxNumberOfTokens: Double = $(guardPctMaxNumberOfTokens), minTextLength: Int = $(guardMinTextLength), pctMinTextLength: Double = $(guardPctMinTextLength), minCountForStdDevCheck: Int = $(guardMinCountForStdDevCheck), minStdDev: Double = $(guardMinStdDev), minCountForUniqueCheck: Int = $(guardMinCountForUniqueCheck), minUniqueCheck: Int = $(guardMinUniqueCheck)): SmartTextMapVectorizer.this.type

    Definition Classes
    NameDetectParams
  158. def setHashAlgorithm(h: HashAlgorithm): SmartTextMapVectorizer.this.type

    Definition Classes
    HashingVectorizerParams
  159. def setHashSpaceStrategy(v: HashSpaceStrategy): SmartTextMapVectorizer.this.type

    Definition Classes
    HashingVectorizerParams
  160. def setHashWithIndex(v: Boolean): SmartTextMapVectorizer.this.type

    Definition Classes
    HashingVectorizerParams
  161. def setIgnoreNulls(value: Boolean): SmartTextMapVectorizer.this.type

    Definition Classes
    NameDetectParams
  162. final def setInput(features: FeatureLike[T]*): SmartTextMapVectorizer.this.type

    Definition Classes
    OpPipelineStageN
  163. final def setInput(features: InputFeatures): SmartTextMapVectorizer.this.type

    Input features that will be used by the stage

    returns

    feature of type InputFeatures

    Definition Classes
    OpPipelineStageBase
  164. final def setInputFeatures[S <: OPFeature](features: Array[S]): SmartTextMapVectorizer.this.type

    Sets input features

    S

    feature-like type

    features

    array of input features

    returns

    this stage

    Attributes
    protected
    Definition Classes
    InputParams
  165. final def setMaxCardinality(v: Int): SmartTextMapVectorizer.this.type

    Definition Classes
    MaxCardinalityParams
  166. final def setMetadata(m: Metadata): SmartTextMapVectorizer.this.type

    Definition Classes
    OpPipelineStageParams
  167. final def setMinLengthStdDev(v: Double): SmartTextMapVectorizer.this.type

    Definition Classes
    MinLengthStdDevParams
  168. def setMinSupport(min: Int): SmartTextMapVectorizer.this.type

    Definition Classes
    MinSupportParam
  169. def setMinTokenLength(value: Int): SmartTextMapVectorizer.this.type

    Definition Classes
    TextTokenizerParams
  170. def setNumFeatures(v: Int): SmartTextMapVectorizer.this.type

    Definition Classes
    HashingVectorizerParams
  171. def setOutputFeatureName(name: String): SmartTextMapVectorizer.this.type

    Definition Classes
    OpPipelineStage
  172. def setPrependFeatureName(v: Boolean): SmartTextMapVectorizer.this.type

    Definition Classes
    HashingVectorizerParams
  173. def setSensitiveFeatureMode(v: SensitiveFeatureMode): SmartTextMapVectorizer.this.type

    Definition Classes
    NameDetectParams
  174. def setStripHtml(value: Boolean): SmartTextMapVectorizer.this.type

    Definition Classes
    TextTokenizerParams
  175. def setTextLengthType(v: TextLengthType): SmartTextMapVectorizer.this.type

    Definition Classes
    MinLengthStdDevParams
  176. def setThreshold(value: Double): SmartTextMapVectorizer.this.type

    Definition Classes
    NameDetectParams
  177. def setToLowercase(value: Boolean): SmartTextMapVectorizer.this.type

    Definition Classes
    TextMatchingParams
  178. def setTopK(numberToKeep: Int): SmartTextMapVectorizer.this.type

    Definition Classes
    PivotParams
  179. def setTrackNulls(v: Boolean): SmartTextMapVectorizer.this.type

    Option to keep track of values that were missing

    Definition Classes
    TrackNullsParam
  180. def setTrackTextLen(v: Boolean): SmartTextMapVectorizer.this.type

    Option to keep track of text lengths

    Definition Classes
    TrackTextLenParam
  181. def setUnseenName(unseenNameIn: String): SmartTextMapVectorizer.this.type

    Definition Classes
    SaveOthersParams
  182. def shouldRemoveSensitive: Boolean

    Definition Classes
    NameDetectParams
  183. final def stageName: String

    Unique stage name, consisting of the stage operation name and uid

    returns

    stage name

    Definition Classes
    OpPipelineStageBase
  184. final val stripHtml: BooleanParam

    Definition Classes
    TextTokenizerParams
  185. final def synchronized[T0](arg0: ⇒ T0): T0

    Definition Classes
    AnyRef
  186. final val textLengthType: Param[String]

    Definition Classes
    MinLengthStdDevParams
  187. final val toLowercase: BooleanParam

    Indicates whether to convert all characters to lowercase before string operations.

    Definition Classes
    TextMatchingParams
  188. def toString(): String

    Definition Classes
    Identifiable → AnyRef → Any
  189. def tokenize(text: Text, languageDetector: LanguageDetector = TextTokenizer.LanguageDetector, analyzer: TextAnalyzer = ...): TextTokenizerResult

    Definition Classes
    TextTokenizerParams
  190. final val topK: IntParam

    Definition Classes
    PivotParams
  191. final val trackNulls: BooleanParam

    Definition Classes
    TrackNullsParam
  192. final val trackTextLen: BooleanParam

    Definition Classes
    TrackTextLenParam
  193. final def transformSchema(schema: StructType): StructType

    Translates the input and output features into Spark schema checks and into the changes that will occur on the underlying DataFrame

    schema

    schema of the input data frame

    returns

    a new schema with the output features added

    Definition Classes
    OpPipelineStageBase
  194. def transformSchema(schema: StructType, logging: Boolean): StructType

    Attributes
    protected
    Definition Classes
    PipelineStage
    Annotations
    @DeveloperApi()
  195. implicit val tti: scala.reflect.api.JavaUniverse.TypeTag[T]

    type tag for input

    Definition Classes
    SequenceEstimator
  196. implicit val ttiv: scala.reflect.api.JavaUniverse.TypeTag[T.Value]

    type tag for input value

    Definition Classes
    SequenceEstimator
  197. implicit val tto: scala.reflect.api.JavaUniverse.TypeTag[OPVector]

    type tag for output

    Definition Classes
    SequenceEstimator → HasOut
  198. implicit val ttov: scala.reflect.api.JavaUniverse.TypeTag[Value]

    type tag for output value

    Definition Classes
    SequenceEstimator → HasOut
  199. val uid: String

    unique identifier (uid) of this instance

    Definition Classes
    SequenceEstimator → Identifiable
  200. final val unseenName: Param[String]

    Definition Classes
    SaveOthersParams
  201. def vectorMetadataFromInputFeatures: OpVectorMetadata

    Compute the output vector metadata only from the input features. Vectorizers use this to derive the full vector, including pivot columns or indicator features.

    returns

    Vector metadata from input features

    Attributes
    protected
    Definition Classes
    VectorizerDefaults
  202. def vectorMetadataWithNullIndicators: OpVectorMetadata

    Attributes
    protected
    Definition Classes
    VectorizerDefaults
  203. def vectorOutputName: String

    Get the name of the output vector

    returns

    Output vector name as a string

    Attributes
    protected
    Definition Classes
    VectorizerDefaults
  204. final def wait(): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  205. final def wait(arg0: Long, arg1: Int): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  206. final def wait(arg0: Long): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  207. final def write: MLWriter

    Definition Classes
    OpPipelineStageBase → MLWritable
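
Every setter listed above returns SmartTextMapVectorizer.this.type, even though the setters come from different traits (PivotParams, HashingVectorizerParams, TrackNullsParam and so on, per their Definition Classes). The self-contained sketch below illustrates the Scala idiom that makes this possible: each mixin's setter returns this.type, so calls chain and the result keeps the concrete class type. The trait and class names are hypothetical stand-ins. The placeholder defaults are arbitrary and are not TransmogrifAI's. The real stages store values in Spark Params rather than in plain fields.

```scala
// Hypothetical stand-ins for param traits; NOT the TransmogrifAI classes.
// Placeholder defaults are arbitrary. Real stages keep values in Spark Params.
trait PivotParamsSketch {
  private var topKValue: Int = 0
  def setTopK(numberToKeep: Int): this.type = { topKValue = numberToKeep; this }
  def getTopK: Int = topKValue
}

trait HashingParamsSketch {
  private var numFeaturesValue: Int = 0
  def setNumFeatures(v: Int): this.type = { numFeaturesValue = v; this }
  def getNumFeatures: Int = numFeaturesValue
}

trait TrackNullsSketch {
  private var trackNullsValue: Boolean = true
  def setTrackNulls(v: Boolean): this.type = { trackNullsValue = v; this }
  def getTrackNulls: Boolean = trackNullsValue
}

// Mixing the traits in yields one class whose inherited setters all return the concrete type.
final class VectorizerSketch extends PivotParamsSketch with HashingParamsSketch with TrackNullsSketch

// Each call returns this.type, so the whole chain stays typed as VectorizerSketch.
val configured: VectorizerSketch =
  new VectorizerSketch().setTopK(10).setNumFeatures(256).setTrackNulls(false)
```

Returning this.type rather than a fixed trait type is what allows a setter defined in one trait to be followed by a setter from another trait in the same chain.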
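
The stage detects categoricals disguised as text, and setMaxCardinality (MaxCardinalityParams, listed above) is one of the parameters involved. The following is a minimal sketch under the assumption that the rule is simply "a key with at most maxCardinality distinct non-null values is pivoted, otherwise its text is hashed". This is not confirmed as the library's exact logic. MaxCardinalityParams also defines setCoveragePct, which this sketch deliberately ignores. CardinalitySketch and its members are hypothetical names.

```scala
// Hypothetical sketch of cardinality-based detection; not the library's exact rule.
object CardinalitySketch {
  sealed trait Treatment
  case object Categorical extends Treatment // pivot into top-K columns
  case object Hashed extends Treatment      // hash the tokens instead

  // Decide for one key from its observed values (None = key absent in that row).
  def treatmentFor(values: Seq[Option[String]], maxCardinality: Int): Treatment =
    if (values.flatten.distinct.size <= maxCardinality) Categorical else Hashed

  // Apply the rule to every key seen across the rows of a text map feature.
  def treatments(rows: Seq[Map[String, String]], maxCardinality: Int): Map[String, Treatment] = {
    val keys = rows.flatMap(_.keys).distinct
    keys.map(k => k -> treatmentFor(rows.map(_.get(k)), maxCardinality)).toMap
  }
}
```

Because each map key is judged separately, a single text map feature can produce a mix of pivoted and hashed keys.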
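
A categorical is represented by the occurrences of its top K most common values, an entry for values outside the top K, and an optional null indicator. The setTopK, setMinSupport and setTrackNulls setters listed above configure this encoding. Below is a simplified sketch for a single map key. It makes two illustrative assumptions: that minSupport is the minimum count a value needs before it can be kept, and that ties are broken alphabetically. TopKPivotSketch is a hypothetical name, not part of the library.

```scala
// Hypothetical, simplified top-K pivot for one map key; not the library code.
object TopKPivotSketch {
  // Keep values seen at least minSupport times (assumed meaning), then the topK most frequent.
  def fitTopK(values: Seq[Option[String]], topK: Int, minSupport: Int): Seq[String] = {
    val counts: Map[String, Int] =
      values.flatten.groupBy(identity).map { case (v, vs) => v -> vs.size }
    counts.toSeq
      .filter { case (_, c) => c >= minSupport }
      .sortBy { case (v, c) => (-c, v) } // most frequent first, ties alphabetical
      .take(topK)
      .map(_._1)
  }

  // Layout: one slot per kept value, then an "other" slot, then a null slot if trackNulls.
  def encode(value: Option[String], kept: Seq[String], trackNulls: Boolean): Vector[Double] = {
    val pivot = kept.map(k => if (value.contains(k)) 1.0 else 0.0)
    val other = if (value.exists(v => !kept.contains(v))) 1.0 else 0.0
    val nullSlot = if (trackNulls) Seq(if (value.isEmpty) 1.0 else 0.0) else Seq.empty[Double]
    (pivot ++ Seq(other) ++ nullSlot).toVector
  }
}
```

For example, with the values a, b, a, c, (missing), a, b and topK = 2, the kept values are a and b. A c therefore lands in the "other" slot, and a missing value sets the null slot.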
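
The tokenize member listed above works together with setToLowercase, setMinTokenLength and setStripHtml to control how text values are split before hashing. The stage's tokenizer takes a LanguageDetector and a TextAnalyzer, so the sketch below is only a crude stand-in. It makes two assumptions: HTML stripping is approximated by deleting anything between angle brackets, and a token is a run of letters or digits. TokenizerSketch is hypothetical.

```scala
import java.util.Locale

// Crude, hypothetical tokenizer; the real stage uses a TextAnalyzer and language detection.
object TokenizerSketch {
  def tokenize(
      text: String,
      toLowercase: Boolean,
      minTokenLength: Int,
      stripHtml: Boolean
  ): Seq[String] = {
    // Naive tag removal; fine for a demo, not for real-world HTML.
    val noHtml = if (stripHtml) text.replaceAll("<[^>]*>", " ") else text
    val cased = if (toLowercase) noHtml.toLowerCase(Locale.ROOT) else noHtml
    // Split on any run of characters that are neither letters nor digits,
    // then drop empty strings and tokens shorter than minTokenLength.
    cased.split("[^\\p{L}\\p{N}]+").toSeq.filter(t => t.nonEmpty && t.length >= minTokenLength)
  }
}
```

With toLowercase enabled and minTokenLength = 2, "Hello, World! A b-test" yields hello, world and test. The single-character tokens are dropped.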
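
Non-categorical keys are converted into a vector with the hashing trick, which setNumFeatures, setBinaryFreq and setPrependFeatureName (listed above) tune. The sketch below shows the general technique for one key's tokens. Each token is hashed into one of numFeatures buckets, and the stage either counts hits or, when binaryFreq is set, records presence only.

The sketch rests on several assumptions:
- It uses MurmurHash3.stringHash from the Scala standard library as a stand-in hash. The stage's actual algorithm is chosen through setHashAlgorithm.
- Modelling prependFeatureName as prefixing each token with the key name is an assumption.
- HashingTrickSketch is a hypothetical name.

```scala
import scala.util.hashing.MurmurHash3

// Hypothetical sketch of the hashing trick for one map key; not the library code.
object HashingTrickSketch {
  def hashTokens(
      tokens: Seq[String],
      numFeatures: Int,
      binaryFreq: Boolean,
      featureName: Option[String] // Some(name) models prependFeatureName = true (assumption)
  ): Array[Double] = {
    require(numFeatures > 0, "numFeatures must be positive")
    val vec = Array.fill(numFeatures)(0.0)
    tokens.foreach { token =>
      val term = featureName.fold(token)(name => s"${name}_$token")
      // floorMod keeps the bucket index in [0, numFeatures) even for negative hashes.
      val idx = java.lang.Math.floorMod(MurmurHash3.stringHash(term), numFeatures)
      vec(idx) = if (binaryFreq) 1.0 else vec(idx) + 1.0
    }
    vec
  }
}
```

The output size is fixed at numFeatures however many distinct tokens appear. The trade-off is that unrelated tokens can collide in the same bucket.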
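
The setMinLengthStdDev and setTextLengthType setters (MinLengthStdDevParams, listed above) refer to the spread of text lengths. The sketch below only computes that statistic and compares it with a threshold. It uses the population standard deviation, taken over either whole entries or whitespace-separated tokens. This page does not describe what the stage does with a key that falls below the threshold. Treating textLengthType as a choice between whole entries and tokens is an assumption, modelled here by the hypothetical byTokens flag. LengthStdDevSketch is also hypothetical.

```scala
// Hypothetical sketch of a text-length spread check; not the library code.
object LengthStdDevSketch {
  // Population standard deviation of a sequence of lengths (0.0 for an empty input).
  def stdDev(lengths: Seq[Int]): Double =
    if (lengths.isEmpty) 0.0
    else {
      val mean = lengths.sum.toDouble / lengths.size
      math.sqrt(lengths.map(l => math.pow(l - mean, 2)).sum / lengths.size)
    }

  // Lengths of whole entries, or of whitespace-separated tokens when byTokens is true.
  def lengths(entries: Seq[String], byTokens: Boolean): Seq[Int] =
    if (byTokens) entries.flatMap(e => e.split("\\s+").filter(_.nonEmpty).toSeq).map(_.length)
    else entries.map(_.length)

  // True when the spread of lengths reaches the minLengthStdDev threshold.
  def meetsMinLengthStdDev(entries: Seq[String], minLengthStdDev: Double, byTokens: Boolean): Boolean =
    stdDev(lengths(entries, byTokens)) >= minLengthStdDev
}
```

A field whose entries all have the same length, such as fixed-width codes, has a standard deviation of zero and so never reaches a positive threshold.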

Inherited from NameDetectFun[Text]

Inherited from NameDetectParams

Inherited from MinLengthStdDevParams

Inherited from MaxCardinalityParams

Inherited from MapVectorizerFuns[String, OPMap[String]]

Inherited from CleanTextMapFun

Inherited from MapPivotParams

Inherited from VectorizerDefaults

Inherited from MapStringPivotHelper

Inherited from OneHotFun

Inherited from UniqueCountFun

Inherited from MapHashingFun

Inherited from HashingFun

Inherited from HashingVectorizerParams

Inherited from TrackTextLenParam

Inherited from TextTokenizerParams

Inherited from TextMatchingParams

Inherited from LanguageDetectionParams

Inherited from MinSupportParam

Inherited from TrackNullsParam

Inherited from SaveOthersParams

Inherited from CleanTextFun

Inherited from PivotParams

Inherited from TextParams

Inherited from SequenceEstimator[T, OPVector]

Inherited from OpPipelineStageN[T, OPVector]

Inherited from HasOut[OPVector]

Inherited from HasInN

Inherited from OpPipelineStage[OPVector]

Inherited from OpPipelineStageBase

Inherited from MLWritable

Inherited from OpPipelineStageParams

Inherited from InputParams

Inherited from Estimator[SequenceModel[T, OPVector]]

Inherited from PipelineStage

Inherited from Logging

Inherited from Params

Inherited from Serializable

Inherited from Serializable

Inherited from Identifiable

Inherited from AnyRef

Inherited from Any

Ungrouped