Class

com.salesforce.op.dsl.RichTextFeature

RichTextFeature

Related Doc: package RichTextFeature

implicit class RichTextFeature[T <: Text] extends AnyRef

Linear Supertypes
AnyRef, Any

Instance Constructors

  1. new RichTextFeature(f: FeatureLike[T])(implicit arg0: scala.reflect.api.JavaUniverse.TypeTag[T], ttiv: scala.reflect.api.JavaUniverse.TypeTag[Option[String]])

Value Members

  1. final def !=(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  2. final def ##(): Int

    Definition Classes
    AnyRef → Any
  3. final def ==(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  4. final def asInstanceOf[T0]: T0

    Definition Classes
    Any
  5. def clone(): AnyRef

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  6. def detectLanguages(languageDetector: LanguageDetector = LangDetector.DefaultDetector): FeatureLike[RealMap]

    Detect the language of the text

    languageDetector

    a language detector instance

    returns

    real map feature containing the detected languages with confidence scores. Confidence scores are in the range [0.0, 1.0], with higher values implying greater confidence.

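    The autoDetectThreshold and defaultLanguage parameters documented on other members of this class describe how a confidence map like this one is used: if no detected language has confidence greater than the threshold, the default language applies. A minimal plain-Scala sketch of that rule follows. It is not TransmogrifAI code: it assumes languages are plain string codes (rather than Language values inside a RealMap), and picking the highest-scoring language when several pass the threshold is an assumption made here.

    ```scala
    object LanguageFallbackSketch {
      /** Returns the most confident language scoring above `threshold`,
        * or `defaultLanguage` when none does (assumed rule, see lead-in). */
      def pickLanguage(
          confidences: Map[String, Double],
          threshold: Double,
          defaultLanguage: String): String = {
        // Keep only languages whose confidence is strictly greater than the threshold
        val confident = confidences.filter { case (_, score) => score > threshold }
        if (confident.isEmpty) defaultLanguage
        else confident.maxBy { case (_, score) => score }._1
      }
    }
    ```
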
  7. final def eq(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  8. def equals(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  9. val f: FeatureLike[T]

  10. def finalize(): Unit

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( classOf[java.lang.Throwable] )
  11. final def getClass(): Class[_]

    Definition Classes
    AnyRef → Any
  12. def hashCode(): Int

    Definition Classes
    AnyRef → Any
  13. def indexed(unseenName: String = ..., handleInvalid: StringIndexerHandleInvalid = StringIndexerHandleInvalid.NoFilter): FeatureLike[RealNN]

    Apply OpStringIndexerNoFilter estimator.

    A label indexer that maps a text column of labels to an ML feature of label indices. The indices are in [0, numLabels), ordered by label frequency, so the most frequent label gets index 0.

    unseenName

    name to give strings that appear in transform but not in fit

    handleInvalid

    how to transform values not seen in fitting

    returns

    indexed real feature

    See also

    OpIndexToString for the inverse transformation

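    The frequency ordering described above can be illustrated with a small plain-Scala sketch. It is not OpStringIndexerNoFilter itself: the alphabetical tie-break and the choice to map values unseen during fitting to the extra index numLabels are assumptions made for illustration. The index is returned as a Double because the method produces a RealNN feature.

    ```scala
    object FrequencyIndexerSketch {
      /** Fit: assign indices by descending label frequency
        * (ties broken alphabetically, an assumption of this sketch). */
      def fit(labels: Seq[String]): Map[String, Int] =
        labels
          .groupBy(identity)
          .toSeq
          .map { case (label, occurrences) => (label, occurrences.size) }
          .sortBy { case (label, count) => (-count, label) }
          .map(_._1)
          .zipWithIndex
          .toMap

      /** Transform: labels not seen during fit get index numLabels (assumed behavior). */
      def transform(index: Map[String, Int], label: String): Double =
        index.getOrElse(label, index.size).toDouble
    }
    ```
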
  14. final def isInstanceOf[T0]: Boolean

    Definition Classes
    Any
  15. final def ne(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  16. final def notify(): Unit

    Definition Classes
    AnyRef
  17. final def notifyAll(): Unit

    Definition Classes
    AnyRef
  18. def pivot(others: Array[FeatureLike[T]] = Array.empty, topK: Int = TransmogrifierDefaults.TopK, minSupport: Int = TransmogrifierDefaults.MinSupport, cleanText: Boolean = TransmogrifierDefaults.CleanText, trackNulls: Boolean = TransmogrifierDefaults.TrackNulls): FeatureLike[OPVector]

    Converts a sequence of Text features into a vector keeping the top K most common values of each Text feature (i.e. the final vector has length k * number of Text inputs), plus two additional columns: one for "other" values, which captures values that do not make the cut or were not seen in training, and one for nulls.

    others

    other features to include in the pivot

    topK

    number of most frequent values to keep

    minSupport

    minimum number of times a value must occur to be retained in the pivot

    cleanText

    if true, ignores capitalization and punctuation when grouping categories

    trackNulls

    keep an extra column that indicates whether the feature was null

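    The following plain-Scala sketch illustrates the top-K pivot for a single feature: fitting keeps at most topK values that occur at least minSupport times, and transforming emits one indicator column per kept value followed by an "other" column and a null column (that is, it assumes trackNulls = true). The clean helper, which lowercases and drops punctuation, is a simplified stand-in for cleanText rather than the library's text-cleaning function, and the alphabetical tie-break is an assumption.

    ```scala
    object TopKPivotSketch {
      // Simplified stand-in for cleanText: lowercase and drop punctuation (assumption)
      private def clean(s: String, cleanText: Boolean): String =
        if (cleanText) s.toLowerCase.replaceAll("\\p{Punct}", "") else s

      /** Fit: keep at most topK values seen at least minSupport times, most frequent first. */
      def fit(values: Seq[Option[String]], topK: Int, minSupport: Int, cleanText: Boolean): Seq[String] =
        values.flatten
          .map(clean(_, cleanText))
          .groupBy(identity)
          .toSeq
          .collect { case (value, hits) if hits.size >= minSupport => (value, hits.size) }
          .sortBy { case (value, count) => (-count, value) }
          .take(topK)
          .map(_._1)

      /** Transform: one column per kept value, then an "other" column, then a null column. */
      def transform(kept: Seq[String], value: Option[String], cleanText: Boolean): Vector[Double] = {
        val row = Array.fill(kept.size + 2)(0.0)
        value match {
          case None => row(kept.size + 1) = 1.0 // null indicator
          case Some(v) =>
            val i = kept.indexOf(clean(v, cleanText))
            if (i >= 0) row(i) = 1.0 else row(kept.size) = 1.0 // top-K hit or "other"
        }
        row.toVector
      }
    }
    ```
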
  19. def recognizeEntities(languageDetector: LanguageDetector = ..., analyzer: TextAnalyzer = NameEntityRecognizer.Analyzer, sentenceSplitter: SentenceSplitter = NameEntityRecognizer.Splitter, tagger: NameEntityTagger[_ <: TaggerResult] = NameEntityRecognizer.Tagger, autoDetectLanguage: Boolean = ..., autoDetectThreshold: Double = ..., defaultLanguage: Language = ...): FeatureLike[MultiPickListMap]

    Find named entities in the text using the OpenNLP-based OpenNLPAnalyzer

    languageDetector

    a language detector instance

    analyzer

    a text analyzer instance

    sentenceSplitter

    sentence splitter

    tagger

    named entity recognition tagger

    autoDetectLanguage

    indicates whether to attempt language detection

    autoDetectThreshold

    Language detection threshold. If none of the detected languages have confidence greater than the threshold, defaultLanguage is used.

    defaultLanguage

    default language to assume if autoDetectLanguage is disabled or fails to make a sufficiently confident prediction.

    returns

    named entity sets feature

  20. def smartVectorize(maxCategoricalCardinality: Int, numHashes: Int, autoDetectLanguage: Boolean, minTokenLength: Int, toLowercase: Boolean, cleanText: Boolean = TransmogrifierDefaults.CleanText, trackNulls: Boolean = TransmogrifierDefaults.TrackNulls, topK: Int = TransmogrifierDefaults.TopK, minSupport: Int = TransmogrifierDefaults.MinSupport, unseenName: String = TransmogrifierDefaults.OtherString, hashWithIndex: Boolean = ..., binaryFreq: Boolean = TransmogrifierDefaults.BinaryFreq, prependFeatureName: Boolean = ..., autoDetectThreshold: Double = TextTokenizer.AutoDetectThreshold, hashSpaceStrategy: HashSpaceStrategy = ..., defaultLanguage: Language = TextTokenizer.DefaultLanguage, hashAlgorithm: HashAlgorithm = ..., others: Array[FeatureLike[T]] = Array.empty): FeatureLike[OPVector]

    Vectorize text features by treating low-cardinality text features as categoricals and applying the hashing trick to high-cardinality ones.

    maxCategoricalCardinality

    max cardinality for a text feature to be treated as categorical

    numHashes

    number of features (hashes) to generate

    autoDetectLanguage

    indicates whether to attempt language detection

    minTokenLength

    minimum token length, >= 1.

    toLowercase

    indicates whether to convert all characters to lowercase before analyzing

    cleanText

    indicates whether to ignore capitalization and punctuation

    trackNulls

    indicates whether or not to track null values in a separate column.

    topK

    number of most common elements to be used as categorical pivots

    minSupport

    minimum number of occurrences an element must have to appear in pivot

    unseenName

    name to give indexes which do not have a label name associated with them

    hashWithIndex

    include indices when hashing a feature that has them (OPLists or OPVectors)

    binaryFreq

    if true, term frequency vector will be binary such that non-zero term counts will be set to 1.0

    prependFeatureName

    if true, prepends the input feature name to each token of that feature

    autoDetectThreshold

    Language detection threshold. If none of the detected languages have confidence greater than the threshold, defaultLanguage is used.

    hashSpaceStrategy

    strategy to determine whether to use shared hash space for all included features

    defaultLanguage

    default language to assume if autoDetectLanguage is disabled or fails to make a sufficiently confident prediction.

    hashAlgorithm

    hash algorithm to use

    others

    additional text features

    returns

    result feature of type Vector

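    The core routing decision of smartVectorize can be sketched as follows. The sketch counts the distinct non-null values seen in training and treats the feature as categorical when that count is at most maxCategoricalCardinality; the exact comparison and how the library computes cardinality are assumptions here, and the Strategy type is hypothetical.

    ```scala
    object SmartVectorizeRoutingSketch {
      sealed trait Strategy
      case object Categorical extends Strategy // pivot into top-K indicator columns
      case object Hashed extends Strategy      // tokenize and apply the hashing trick

      /** Decide per feature from the number of distinct non-null values seen in training. */
      def route(values: Seq[Option[String]], maxCategoricalCardinality: Int): Strategy = {
        val cardinality = values.flatten.distinct.size
        if (cardinality <= maxCategoricalCardinality) Categorical else Hashed
      }
    }
    ```
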
  21. final def synchronized[T0](arg0: ⇒ T0): T0

    Definition Classes
    AnyRef
  22. def toMultiPickList: FeatureLike[MultiPickList]

    Convert this Text feature into a MultiPickList feature, whose category is a one-element set of this Text's value.

    returns

    A new MultiPickList feature

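    A plain-Scala sketch of this conversion, modeling a Text value as Option[String] and a MultiPickList value as Set[String]. Mapping an empty Text to an empty set is an assumption made here.

    ```scala
    object ToMultiPickListSketch {
      /** A present Text value becomes a one-element set; an empty Text (None) becomes an empty set. */
      def toMultiPickList(text: Option[String]): Set[String] =
        text.fold(Set.empty[String])(value => Set(value))
    }
    ```
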
  23. def toNGramSimilarity(that: FeatureLike[T], nGramSize: Int = NGramSimilarity.nGramSize): FeatureLike[RealNN]

    Apply N-gram Similarity transformer

    that

    other text feature

    nGramSize

    the size of the n-gram to be used to compute the string distance

    returns

    real-valued feature containing the n-gram similarity score between the two text features

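    The exact similarity measure used by the N-gram Similarity transformer is not described on this page. To illustrate the general idea, the sketch below swaps in a simpler measure: the Jaccard similarity of the two strings' character n-gram sets, which is 1.0 for identical strings and 0.0 when they share no n-grams. The default n = 3 in the sketch is an assumption, not necessarily the value of NGramSimilarity.nGramSize.

    ```scala
    object NGramJaccardSketch {
      /** Character n-grams of `s`; a non-empty string shorter than n is its own single gram. */
      def nGrams(s: String, n: Int): Set[String] = {
        require(n >= 1, "n must be at least 1")
        if (s.isEmpty) Set.empty
        else if (s.length < n) Set(s)
        else s.sliding(n).toSet
      }

      /** Jaccard similarity |A intersect B| / |A union B|, in [0, 1]; 0.0 when both sets are empty. */
      def similarity(a: String, b: String, n: Int = 3): Double = {
        val (ga, gb) = (nGrams(a, n), nGrams(b, n))
        val union = ga union gb
        if (union.isEmpty) 0.0 else (ga intersect gb).size.toDouble / union.size
      }
    }
    ```
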
  24. def toString(): String

    Definition Classes
    AnyRef → Any
  25. def tokenize(autoDetectLanguage: Boolean = TextTokenizer.AutoDetectLanguage, autoDetectThreshold: Double = TextTokenizer.AutoDetectThreshold, defaultLanguage: Language = TextTokenizer.DefaultLanguage, minTokenLength: Int = TextTokenizer.MinTokenLength, toLowercase: Boolean = TextTokenizer.ToLowercase, stripHtml: Boolean = TextTokenizer.StripHtml): FeatureLike[TextList]

    Tokenize text using LuceneTextAnalyzer with OptimaizeLanguageDetector

    autoDetectLanguage

    indicates whether to attempt language detection

    autoDetectThreshold

    Language detection threshold. If none of the detected languages have confidence greater than the threshold, defaultLanguage is used.

    defaultLanguage

    default language to assume if autoDetectLanguage is disabled or fails to make a sufficiently confident prediction.

    minTokenLength

    minimum token length, >= 1.

    toLowercase

    indicates whether to convert all characters to lowercase before analyzing

    stripHtml

    indicates whether to strip HTML tags from the text before analyzing

    returns

    tokenized feature

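    LuceneTextAnalyzer applies language-specific analysis, which a short example cannot reproduce. The sketch below only illustrates how minTokenLength, toLowercase and stripHtml shape the output, using a naive regex tokenizer that splits on runs of characters that are neither letters nor digits; it is a simplified stand-in, not the Lucene analyzer.

    ```scala
    object SimpleTokenizerSketch {
      /** Naive tokenizer: optional HTML stripping, optional lowercasing,
        * splitting on runs of non-letter/non-digit characters, then a length filter. */
      def tokenize(
          text: String,
          minTokenLength: Int,
          toLowercase: Boolean,
          stripHtml: Boolean): Seq[String] = {
        require(minTokenLength >= 1, "minTokenLength must be >= 1")
        val noHtml = if (stripHtml) text.replaceAll("<[^>]*>", " ") else text
        val cased = if (toLowercase) noHtml.toLowerCase else noHtml
        cased.split("[^\\p{L}\\p{N}]+").toSeq.filter(t => t.nonEmpty && t.length >= minTokenLength)
      }
    }
    ```
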
  26. def tokenize(languageDetector: LanguageDetector, analyzer: TextAnalyzer, autoDetectLanguage: Boolean, autoDetectThreshold: Double, defaultLanguage: Language, minTokenLength: Int, toLowercase: Boolean): FeatureLike[TextList]

    Tokenize text using the provided analyzer

    languageDetector

    a language detector instance

    analyzer

    a text analyzer instance

    autoDetectLanguage

    indicates whether to attempt language detection

    autoDetectThreshold

    Language detection threshold. If none of the detected languages have confidence greater than the threshold, defaultLanguage is used.

    defaultLanguage

    default language to assume if autoDetectLanguage is disabled or fails to make a sufficiently confident prediction.

    minTokenLength

    minimum token length, >= 1.

    toLowercase

    indicates whether to convert all characters to lowercase before analyzing

    returns

    tokenized feature

  27. def tokenizeRegex(pattern: String, group: Int = 1, minTokenLength: Int = TextTokenizer.MinTokenLength, toLowercase: Boolean = TextTokenizer.ToLowercase): FeatureLike[TextList]

    Tokenize text using regex pattern matching to construct distinct tokens. NOTE: This Tokenizer does not output tokens that are of zero length.

    pattern

    is the regular expression

    group

    selects the matching group as the token (default: 1). A value of -1 is equivalent to splitting the text on the pattern, in which case the tokens are the segments between matches (without empty tokens).

    minTokenLength

    minimum token length, >= 1.

    toLowercase

    indicates whether to convert all characters to lowercase before analyzing

    returns

    tokenized feature

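    The group semantics documented above can be sketched with scala.util.matching.Regex: with group = -1 the text is split on the pattern, otherwise capture group `group` of every match becomes a token, and zero-length tokens are dropped in both cases. Lowercasing tokens after matching is an assumption of this sketch, not documented library behavior.

    ```scala
    import scala.util.matching.Regex

    object RegexTokenizerSketch {
      /** group == -1 splits the text on the pattern; any other value emits that capture
        * group of every match. Zero-length tokens are never emitted. */
      def tokenizeRegex(
          text: String,
          pattern: String,
          group: Int,
          minTokenLength: Int,
          toLowercase: Boolean): Seq[String] = {
        val regex = new Regex(pattern)
        val groupCount = regex.pattern.matcher("").groupCount()
        require(group == -1 || (group >= 0 && group <= groupCount), s"no capture group $group in pattern")
        val raw: Seq[String] =
          if (group == -1) regex.split(text).toSeq
          else regex.findAllMatchIn(text).map(_.group(group)).filter(_ != null).toSeq // unmatched groups are null
        raw
          .map(token => if (toLowercase) token.toLowerCase else token)
          .filter(token => token.nonEmpty && token.length >= minTokenLength)
      }
    }
    ```
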
  28. implicit val ttiv: scala.reflect.api.JavaUniverse.TypeTag[Option[String]]

  29. def vectorize(numHashes: Int, autoDetectLanguage: Boolean, minTokenLength: Int, toLowercase: Boolean, trackNulls: Boolean = TransmogrifierDefaults.TrackNulls, hashWithIndex: Boolean = ..., binaryFreq: Boolean = TransmogrifierDefaults.BinaryFreq, prependFeatureName: Boolean = ..., autoDetectThreshold: Double = TextTokenizer.AutoDetectThreshold, hashSpaceStrategy: HashSpaceStrategy = ..., defaultLanguage: Language = TextTokenizer.DefaultLanguage, hashAlgorithm: HashAlgorithm = ..., languageDetector: LanguageDetector = TextTokenizer.LanguageDetector, analyzer: TextAnalyzer = TextTokenizer.Analyzer, others: Array[FeatureLike[T]] = Array.empty): FeatureLike[OPVector]

    Vectorize text features by first tokenizing each using TextTokenizer and then applying OPCollectionHashingVectorizer.

    numHashes

    number of features (hashes) to generate

    autoDetectLanguage

    indicates whether to attempt language detection

    minTokenLength

    minimum token length, >= 1.

    toLowercase

    indicates whether to convert all characters to lowercase before analyzing

    trackNulls

    indicates whether or not to track null values in a separate column. Since features may be combined into a shared hash space here, null values should be tracked separately.

    hashWithIndex

    include indices when hashing a feature that has them (OPLists or OPVectors)

    binaryFreq

    if true, term frequency vector will be binary such that non-zero term counts will be set to 1.0

    prependFeatureName

    if true, prepends the input feature name to each token of that feature

    autoDetectThreshold

    Language detection threshold. If none of the detected languages have confidence greater than the threshold, defaultLanguage is used.

    hashSpaceStrategy

    strategy to determine whether to use shared hash space for all included features

    defaultLanguage

    default language to assume if autoDetectLanguage is disabled or fails to make a sufficiently confident prediction.

    hashAlgorithm

    hash algorithm to use

    languageDetector

    a language detector instance

    analyzer

    a text analyzer instance

    others

    other text features to vectorize with the parent feature

    returns

    result feature of type Vector

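    A minimal sketch of the hashing trick for one already-tokenized feature: each token is hashed into one of numHashes buckets and counted, binaryFreq clips counts to 1.0, and trackNulls appends a column flagging an empty input. MurmurHash3 from the Scala standard library is used purely for illustration (this page does not show the default hashAlgorithm), treating an empty token list as null is an assumption, and shared hash spaces, hashWithIndex and prependFeatureName are omitted.

    ```scala
    import scala.util.hashing.MurmurHash3

    object HashingTrickSketch {
      /** Hash each token into one of numHashes buckets and count occurrences.
        * With trackNulls, one extra trailing column flags an empty (null) input. */
      def hashTokens(
          tokens: Seq[String],
          numHashes: Int,
          binaryFreq: Boolean = false,
          trackNulls: Boolean = false): Vector[Double] = {
        require(numHashes > 0, "numHashes must be positive")
        val counts = Array.fill(numHashes)(0.0)
        tokens.foreach { token =>
          // floorMod keeps the bucket non-negative even for negative hash codes
          val bucket = java.lang.Math.floorMod(MurmurHash3.stringHash(token), numHashes)
          counts(bucket) += 1.0
        }
        val tf = if (binaryFreq) counts.map(c => if (c > 0.0) 1.0 else 0.0) else counts
        val nullColumn = if (trackNulls) Vector(if (tokens.isEmpty) 1.0 else 0.0) else Vector.empty[Double]
        tf.toVector ++ nullColumn
      }
    }
    ```
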
  30. final def wait(): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  31. final def wait(arg0: Long, arg1: Int): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  32. final def wait(arg0: Long): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
