Package

com.salesforce.op.stages.impl

feature

package feature

Type Members

  1. class AbsoluteValueTransformer[I <: OPNumeric[_]] extends UnaryTransformer[I, Real]

    Absolute value transformer

    I

    input feature type

  2. class AddTransformer[I1 <: OPNumeric[_], I2 <: OPNumeric[_]] extends BinaryTransformer[I1, I2, Real]

    Plus function truth table (Real as example):

    Real.empty + Real.empty = Real.empty
    Real.empty + Real(x) = Real(x)
    Real(x) + Real.empty = Real(x)
    Real(x) + Real(y) = Real(x + y)
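
    The truth table can be mirrored with plain Option[Double] values standing in for Real. This is a minimal sketch for illustration only: AddSketch and plus are hypothetical names, not part of the library.

```scala
object AddSketch {
  // Hypothetical stand-in: None plays the role of Real.empty, Some(x) of Real(x).
  def plus(a: Option[Double], b: Option[Double]): Option[Double] =
    (a, b) match {
      case (None, None)       => None        // empty + empty = empty
      case (None, Some(y))    => Some(y)     // empty + y = y
      case (Some(x), None)    => Some(x)     // x + empty = x
      case (Some(x), Some(y)) => Some(x + y) // x + y
    }
}
```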

  3. class AliasTransformer[I <: FeatureType] extends UnaryTransformer[I, I]

    No-op (identity) alias feature transformer allowing renaming features without applying a transformation on values.

    I

    feature type

  4. class BinaryMapVectorizer[T <: OPMap[Boolean]] extends OPMapVectorizer[Boolean, T]

    Class for vectorizing BinaryMap features. Fills missing keys with args.defaultValue, which does not depend on the key, so getFillByKey returns an empty sequence.

    T

    input feature type to vectorize into an OPVector

  5. final class BinaryMapVectorizerModel[T <: OPMap[Boolean]] extends OPMapVectorizerModel[Boolean, T]

  6. class BinaryVectorizer extends SequenceTransformer[Binary, OPVector] with VectorizerDefaults with TrackNullsParam

    Vectorizes Binary inputs where each input is transformed into 2 vector elements where the first element is [1 -> true] or [0 -> false] and the second element is [1 -> filled value] or [0 -> original value]. The vector representation for each input is concatenated into a final vector representation.

    Example:

    Data: Seq[(Binary, Binary)] = ((Some(false), None)) => f1, f2
    new BinaryVectorizer().setInput(f1, f2).setFillValue(10)

    will produce Array(0.0, 0.0, 10.0, 1.0)
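
    The layout of that output can be reproduced with a small stand-alone sketch. It assumes null tracking is enabled and models Binary as Option[Boolean]; BinaryVectorizerSketch and vectorize are hypothetical names, not the library's implementation.

```scala
object BinaryVectorizerSketch {
  // Each input becomes two elements: the value (or the fill value when the
  // input is missing) followed by an indicator that is 1.0 when the fill
  // value was used and 0.0 when the original value was kept.
  def vectorize(inputs: Seq[Option[Boolean]], fillValue: Double): Array[Double] =
    inputs.flatMap {
      case Some(v) => Seq(if (v) 1.0 else 0.0, 0.0)
      case None    => Seq(fillValue, 1.0)
    }.toArray
}
```

    With inputs Seq(Some(false), None) and a fill value of 10.0, the sketch yields the same four elements as the example above.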

  7. class CeilTransformer[I <: OPNumeric[_]] extends UnaryTransformer[I, Integral]

    Ceil transformer

    I

    input feature type

  8. trait CleanTextFun extends AnyRef

  9. trait CleanTextMapFun extends CleanTextFun

  10. sealed trait CombinationStrategy extends EnumEntry with Serializable

    Model Combination Strategies

  11. sealed trait DateListPivot extends EnumEntry with Serializable

  12. class DateListVectorizer[T <: OPList[Long]] extends SequenceTransformer[T, OPVector] with VectorizerDefaults with TrackNullsParam

    Converts a sequence of DateLists features into a vector feature. Can choose how to pivot the features

  13. class DateMapToUnitCircleVectorizer[T <: DateMap] extends SequenceEstimator[T, OPVector] with DateToUnitCircleParams with MapVectorizerFuns[Long, T]

    Transforms a Date or DateTime field into a Cartesian coordinate representation of an extracted time period on the unit circle, following: http://webspace.ship.edu/pgmarr/geo441/lectures/lec%2016%20-%20directional%20statistics.pdf

    Parameter timePeriod: the time period to extract from the timestamp. One of: DayOfMonth, DayOfWeek, DayOfYear, HourOfDay, MonthOfYear, WeekOfMonth, WeekOfYear.

    We extract the timePeriod from the timestamp and map it onto a unit circle divided into equally spaced time periods. For example, when timePeriod = HourOfDay, the timestamp 01/01/2018 6:37 maps to the point on the circle with angle radians = 2*math.Pi*6/24. We return the Cartesian coordinates of this point: (math.cos(radians), math.sin(radians)).

    The first time period always has angle 0.

    Note: We use the ISO week date format (https://en.wikipedia.org/wiki/ISO_week_date#First_week). Monday is the first day of the week, and the first week of the year is the week with the first Monday after Jan 1.
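
    The mapping described above can be sketched in a few lines. This is an illustration under the assumption that the extracted period is already available as a 0-based index; UnitCircleSketch and toUnitCircle are hypothetical names.

```scala
object UnitCircleSketch {
  // Maps a 0-based period index out of `numPeriods` equally spaced periods
  // onto the unit circle and returns (cos(radians), sin(radians)).
  def toUnitCircle(periodIndex: Int, numPeriods: Int): (Double, Double) = {
    val radians = 2 * math.Pi * periodIndex / numPeriods
    (math.cos(radians), math.sin(radians))
  }
}
```

    For HourOfDay, the 6:37 timestamp corresponds to toUnitCircle(6, 24), a quarter turn from the first period, which sits at angle 0.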

  14. final class DateMapToUnitCircleVectorizerModel[T <: DateMap] extends SequenceModel[T, OPVector] with CleanTextMapFun

    Model for DateMapToUnitCircleVectorizer

    T

    DateMap type

  15. class DateMapVectorizer[T <: OPMap[Long]] extends OPMapVectorizer[Long, T]

    Class for vectorizing DateMap features. Fills missing keys with args.defaultValue, which does not depend on the key, so getFillByKey returns an empty sequence.

    T

    input feature type to vectorize into an OPVector

  16. final class DateMapVectorizerModel[T <: OPMap[Long]] extends OPMapVectorizerModel[Long, T]

  17. trait DateToUnitCircleParams extends Params

  18. class DateToUnitCircleTransformer[T <: Date] extends SequenceTransformer[T, OPVector] with DateToUnitCircleParams

    Transforms a Date or DateTime field into a Cartesian coordinate representation of an extracted time period on the unit circle, following: http://webspace.ship.edu/pgmarr/geo441/lectures/lec%2016%20-%20directional%20statistics.pdf

    Parameter timePeriod: the time period to extract from the timestamp. One of: DayOfMonth, DayOfWeek, DayOfYear, HourOfDay, MonthOfYear, WeekOfMonth, WeekOfYear.

    We extract the timePeriod from the timestamp and map it onto a unit circle divided into equally spaced time periods. For example, when timePeriod = HourOfDay, the timestamp 01/01/2018 6:37 maps to the point on the circle with angle radians = 2*math.Pi*6/24. We return the Cartesian coordinates of this point: (math.cos(radians), math.sin(radians)).

    The first time period always has angle 0.

    Note: We use the ISO week date format (https://en.wikipedia.org/wiki/ISO_week_date#First_week). Monday is the first day of the week, and the first week of the year is the week with the first Monday after Jan 1.

  19. class DecisionTreeNumericBucketizer[N, I2 <: OPNumeric[N]] extends BinaryEstimator[RealNN, I2, OPVector] with DecisionTreeNumericBucketizerParams with VectorizerDefaults with TrackInvalidParam with TrackNullsParam with NumericBucketizerMetadata with AllowLabelAsInput[OPVector]

    Smart bucketizer for numeric values based on a Decision Tree classifier.

    N

    numeric feature type value

    I2

    numeric feature type

  20. final class DecisionTreeNumericBucketizerModel[I2 <: OPNumeric[_]] extends BinaryModel[RealNN, I2, OPVector] with AllowLabelAsInput[OPVector]

  21. trait DecisionTreeNumericBucketizerParams extends AnyRef

  22. class DecisionTreeNumericMapBucketizer[N, I2 <: OPMap[N]] extends BinaryEstimator[RealNN, I2, OPVector] with DecisionTreeNumericBucketizerParams with VectorizerDefaults with TrackInvalidParam with TrackNullsParam with NumericBucketizerMetadata with MapPivotParams with CleanTextMapFun with AllowLabelAsInput[OPVector]

    Smart bucketizer for numeric map values based on a Decision Tree classifier.

    N

    numeric feature type value

    I2

    numeric map feature type

  23. final class DecisionTreeNumericMapBucketizerModel[I2 <: OPMap[_]] extends BinaryModel[RealNN, I2, OPVector] with CleanTextMapFun with AllowLabelAsInput[OPVector]

  24. final class DescalerTransformer[I1 <: Real, I2 <: Real, O <: Real] extends BinaryTransformer[I1, I2, O]

    A transformer that takes as inputs a feature to descale and a (potentially different) scaled feature which contains the metadata for reconstructing the inverse scaling function. Transforms the 1st input feature by applying the inverse of the scaling function found in the metadata of the 2nd input feature.

    - 1st input feature: the feature to descale
    - 2nd input feature: the scaled feature containing the metadata for constructing the scaling used to make this column

    I1

    feature type for first input

    I2

    feature type for the second input

    O

    output feature type

  25. class DivideTransformer[I1 <: OPNumeric[_], I2 <: OPNumeric[_]] extends BinaryTransformer[I1, I2, Real]

    Divide function truth table (Real as example):

    Real.empty / Real.empty = Real.empty
    Real.empty / Real(x) = Real.empty
    Real(x) / Real.empty = Real.empty
    Real(x) / Real(y) = Real(x / y) filter ("is not NaN or Infinity")
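
    A minimal sketch of these division semantics, using Option[Double] as a stand-in for Real. DivideSketch and divide are hypothetical names chosen for illustration; the library's own implementation may differ.

```scala
object DivideSketch {
  // None plays the role of Real.empty. The result is empty when either
  // operand is empty, or when the quotient is NaN or infinite
  // (for example after dividing by zero).
  def divide(a: Option[Double], b: Option[Double]): Option[Double] =
    a.flatMap(x => b.map(y => x / y))
      .filter(q => !q.isNaN && !q.isInfinite)
}
```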

  26. class DropIndicesByTransformer extends UnaryTransformer[OPVector, OPVector]

    Allows columns to be dropped from a feature vector based on properties of the metadata about what is contained in each column (will work only on vectors) created with OpVectorMetadata

  27. class EmailToPickListMapTransformer extends OPMapTransformer[Email, PickList, EmailMap, PickListMap]

  28. case class EmptyScalerArgs() extends ScalingArgs with Product with Serializable

    Case class for Scaling families that take no parameters

  29. class ExistsTransformer[I <: FeatureType] extends UnaryTransformer[I, Binary]

  30. class ExpTransformer[I <: OPNumeric[_]] extends UnaryTransformer[I, Real]

    Exp transformer: returns Euler's number e raised to the power of feature value

    I

    input feature type

  31. class FillMissingWithMean[N, I <: OPNumeric[N]] extends UnaryEstimator[I, RealNN]

    Fill missing values with mean for any numeric feature
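
    The idea can be sketched on a plain sequence of optional values. This is an assumption-laden illustration: FillMeanSketch, fillWithMean and the default parameter (used when no value is present at all) are hypothetical and not taken from the library.

```scala
object FillMeanSketch {
  // Replaces every missing value with the mean of the values that are present.
  // `default` is a hypothetical fallback used when there are no values at all.
  def fillWithMean(values: Seq[Option[Double]], default: Double = 0.0): Seq[Double] = {
    val present = values.flatten
    val mean = if (present.isEmpty) default else present.sum / present.size
    values.map(_.getOrElse(mean))
  }
}
```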

  32. final class FillMissingWithMeanModel[I <: OPNumeric[_]] extends UnaryModel[I, RealNN]

  33. class FilterMap[I <: OPMap[_]] extends UnaryTransformer[I, I] with MapPivotParams with TextParams with CleanTextMapFun

    Filters maps by keys provided in an allowlist or blocklist

    I

    input feature type

  34. class FilterTransformer[I <: FeatureType] extends UnaryTransformer[I, I]

  35. class FloorTransformer[I <: OPNumeric[_]] extends UnaryTransformer[I, Integral]

    Floor transformer

    I

    input feature type

  36. sealed class GenderDetectStrategy extends EnumEntry

    Defines the different kinds of gender detection strategies that are possible

    toString is overridden to provide serialization during the Spark map and reduce steps; the fromString function then provides deserialization back to the GenderDetectStrategy class for the companion transformer.

  37. class GeolocationMapVectorizer extends SequenceEstimator[GeolocationMap, OPVector] with MapVectorizerFuns[Seq[Double], GeolocationMap] with TrackNullsParam

  38. final class GeolocationMapVectorizerModel extends SequenceModel[GeolocationMap, OPVector] with CleanTextMapFun

  39. class GeolocationVectorizer extends SequenceEstimator[Geolocation, OPVector] with VectorizerDefaults with TrackNullsParam with GeolocationFunctions

    Converts a sequence of Geolocation features into a vector feature. Can choose to fill null values with the mean or a constant

  40. final class GeolocationVectorizerModel extends SequenceModel[Geolocation, OPVector] with VectorizerDefaults

  41. sealed trait HashAlgorithm extends EnumEntry with Serializable

    Hashing Algorithms

  42. sealed trait HashSpaceStrategy extends EnumEntry with Serializable

    Hash space strategy

  43. case class HashingFunctionParams(hashWithIndex: Boolean, prependFeatureName: Boolean, numFeatures: Int, numInputs: Int, maxNumOfFeatures: Int, binaryFreq: Boolean, hashAlgorithm: HashAlgorithm, hashSpaceStrategy: HashSpaceStrategy) extends Product with Serializable

    Hashing Parameters

    hashWithIndex

    if true, include indices when hashing a feature that has them (OPLists or OPVectors)

    prependFeatureName

    if true, prepends the input feature name to each token of that feature

    numFeatures

    number of features (hashes) to generate

    numInputs

    number of inputs

    maxNumOfFeatures

    max number of features (hashes)

    binaryFreq

    if true, term frequency vector will be binary such that non-zero term counts will be set to 1.0

    hashAlgorithm

    hash algorithm to use

    hashSpaceStrategy

    strategy to determine whether to use shared hash space for all included features

  44. class HumanNameDetector[T <: Text] extends UnaryEstimator[T, NameStats] with NameDetectFun[T]

    Unary estimator for identifying whether a single Text column is a name or not. If the column does appear to be a name, a custom map will be returned that contains the guessed gender for each entry (gender detection only supported for English at the moment). If the column does not appear to be a name, then the output will be an empty map.

    T

    the FeatureType (subtype of Text) to operate over

  45. case class HumanNameDetectorMetadata(treatAsName: Boolean, predictedNameProb: Double, genderResultsByStrategy: Map[String, GenderStats]) extends MetadataLike with Product with Serializable

  46. class HumanNameDetectorModel[T <: Text] extends UnaryModel[T, NameStats] with NameDetectFun[T]

  47. sealed abstract class Inclusion extends EnumEntry with Serializable

  48. sealed trait IndexToStringHandleInvalid extends EnumEntry with Serializable

  49. class IntegralMapVectorizer[T <: OPMap[Long]] extends OPMapVectorizer[Long, T]

    Class for vectorizing IntegralMap features. Fills missing keys with the mode for that key.

    T

    input feature type to vectorize into an OPVector

  50. final class IntegralMapVectorizerModel[T <: OPMap[Long]] extends OPMapVectorizerModel[Long, T]

  51. class IntegralVectorizer[T <: Integral] extends SequenceEstimator[T, OPVector] with VectorizerDefaults with TrackNullsParam

    Converts a sequence of Integral features into a vector feature. Can choose to fill null values with the mean or a constant

  52. final class IntegralVectorizerModel[T <: Integral] extends SequenceModel[T, OPVector] with VectorizerDefaults

  53. class IsValidPhoneDefaultCountry extends UnaryTransformer[Phone, Binary] with PhoneParams

    Transformer to determine if a phone number is valid when no country code is available. The default locale will be used for validation. All phone numbers with fewer than 2 characters will be categorized as invalid. All phone numbers that start with "+" will be evaluated with international formatting.

    Returns a binary feature: true if the phone number is valid, false if it is invalid, and none if the phone number is none.

  54. class IsValidPhoneMapDefaultCountry extends UnaryTransformer[PhoneMap, BinaryMap] with PhoneParams

    Transformer to determine if a map of phone numbers is valid when no country code is available. The default locale will be used for validation. All phone numbers with fewer than 2 characters will be categorized as invalid. All phone numbers that start with "+" will be evaluated with international formatting.

    Returns a binary map feature: true if the phone number is valid, false if it is invalid, and none if the phone number is none.

  55. class IsValidPhoneNumber extends BinaryTransformer[Phone, Text, Binary] with PhoneCountryParams

    Determine whether a phone number is valid given the country's regional code. By default the regional code will be checked against those provided in Google's PhoneNumber library. If the input regional code is not found, the default locale will be used for validation.

    If the user provided a country name to code mapping, the phone number can only be validated against the input mapping. This transformer will first match on regional code; failing that, it will select the country with the closest Q-Distance.

    All phone numbers with fewer than 2 characters will be categorized as invalid.

    All phone numbers that start with "+" will be evaluated with international formatting.

    Returns a binary feature: true if the phone number is valid, false if it is invalid, and none if the phone number is none.

  56. class JaccardSimilarity extends BinaryTransformer[MultiPickList, MultiPickList, RealNN]

    Calculates the Jaccard Similarity between two sets. If both inputs are empty, Jaccard Similarity is defined as 1.0
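
    As a minimal sketch of the definition, with MultiPickList modeled as Set[String] (JaccardSketch and jaccard are hypothetical names for illustration):

```scala
object JaccardSketch {
  // |A intersect B| / |A union B|, with the empty/empty case defined as 1.0.
  def jaccard(a: Set[String], b: Set[String]): Double =
    if (a.isEmpty && b.isEmpty) 1.0
    else (a intersect b).size.toDouble / (a union b).size
}
```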

  57. class LangDetector[T <: Text] extends UnaryTransformer[T, RealMap]

    Transformer that detects the language of the text

  58. trait LanguageDetectionParams extends Params

  59. case class LinearScaler(args: LinearScalerArgs) extends Scaler with Product with Serializable

    A case class representing a linear scaling function

    args

    case class containing the slope and intercept of the scaling function

  60. case class LinearScalerArgs(slope: Double, intercept: Double) extends ScalingArgs with Product with Serializable

    Parameters needed to uniquely define a linear scaling function

    slope

    the slope of the linear scaler

    intercept

    the y axis intercept of the linear scaler
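
    A sketch of how a slope and intercept define a scaling function and its inverse. It assumes the scaling has the form slope * v + intercept with a non-zero slope; LinearScalerSketch, Args, scale and descale are hypothetical names, not the library's API.

```scala
object LinearScalerSketch {
  // Hypothetical stand-in for LinearScalerArgs.
  final case class Args(slope: Double, intercept: Double)

  // Assumed form of the linear scaling function.
  def scale(args: Args)(v: Double): Double = args.slope * v + args.intercept

  // Inverse of `scale`; requires a non-zero slope.
  def descale(args: Args)(v: Double): Double = (v - args.intercept) / args.slope
}
```

    Because the two arguments determine the function uniquely, keeping them in metadata is enough to reconstruct the inverse later.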

  61. case class LogScaler() extends Scaler with Product with Serializable

    A case class representing a logarithmic scaling function

  62. class LogTransformer[I <: OPNumeric[_]] extends UnaryTransformer[I, Real]

    Log base N transformer

    I

    input feature type

  63. trait MapPivotParams extends Params

  64. trait MapStringPivotHelper extends SaveOthersParams

  65. trait MapVectorizerFuns[A, T <: OPMap[A]] extends VectorizerDefaults with MapPivotParams with CleanTextMapFun

  66. trait MaxCardinalityParams extends Params

  67. trait MaxPctCardinalityParams extends Params

  68. class MimeTypeDetector extends UnaryTransformer[Base64, Text] with MimeTypeDetectorParams

    Detects MIME type for Base64 encoded binary data.

  69. class MimeTypeMapDetector extends UnaryTransformer[Base64Map, PickListMap] with MimeTypeDetectorParams

    Detects MIME type for Base64Map encoded binary data.

  70. trait MinLengthStdDevParams extends Params

  71. trait MinSupportParam extends Params

  72. class MultiLabelJoiner extends BinaryTransformer[RealNN, OPVector, RealMap]

    Joins probability score with label from string indexer stage

    returns

    Map(label -> probability)

  73. class MultiPickListMapVectorizer[T <: OPMap[Set[String]]] extends SequenceEstimator[T, OPVector] with PivotParams with MapPivotParams with TextParams with MapStringPivotHelper with CleanTextMapFun with MinSupportParam with TrackNullsParam with MaxPctCardinalityParams with MaxPctCardinalityFun

    Converts a sequence of KeyMultiPickList features into a vector keeping the top K most common occurrences of each key in the maps for that feature (i.e. the final vector has length K * number of keys * number of features). Each key found will also generate an "other" column, which captures values that do not make the cut or were not seen in training. Note that any keys not seen in training will be ignored.

  74. final class MultiPickListMapVectorizerModel[T <: OPMap[Set[String]]] extends SequenceModel[T, OPVector] with CleanTextMapFun

  75. class MultiplyTransformer[I1 <: OPNumeric[_], I2 <: OPNumeric[_]] extends BinaryTransformer[I1, I2, Real]

    Multiply function truth table (Real as example):

    Real.empty * Real.empty = Real.empty
    Real.empty * Real(x) = Real.empty
    Real(x) * Real.empty = Real.empty
    Real(x) * Real(y) = Real(x * y) filter ("is not NaN or Infinity")

  76. class NameEntityRecognizer[T <: Text] extends UnaryTransformer[T, MultiPickListMap] with LanguageDetectionParams

    Name Entity NameEntityType text recognizer.

    Note: when providing your own analyzer/splitter/tagger, make sure they can work together; for instance, OpenNLP models require their own analyzers to be provided when tokenizing. The returned feature type is a MultiPickListMap which contains sets of entities for all the tokens.

    T

    text feature type

  77. class NumericBucketizer[I1 <: OPNumeric[_]] extends UnaryTransformer[I1, OPVector] with VectorizerDefaults with NumericBucketizerParams with NumericBucketizerMetadata

    Numeric Bucketizer

    I1

    numeric feature type

  78. trait NumericBucketizerParams extends TrackInvalidParam with TrackNullsParam

  79. sealed trait NumericMapDefaultParam extends Params

  80. class OPCollectionHashingVectorizer[T <: OPCollection] extends SequenceTransformer[T, OPVector] with VectorizerDefaults with PivotParams with CleanTextFun with HashingFun with HashingVectorizerParams

    Generic hashing vectorizer to convert features of type OPCollection into Vectors

    In more detail: it hashes the entries in the collection using the specified hashing algorithm to build a single vector. If the desired number of features (= hash space size) for all features combined is larger than Integer.Max (the maximal index for a vector), then all the features use the same hash space. There are also options to hash indices together with entries for collections where that makes sense (OPLists and OPVectors), and to force a shared hash space even if the number of features is not high enough to require it.

  81. sealed abstract class OPCollectionTransformer[I <: FeatureType, O <: FeatureType, ICol <: OPCollection, OCol <: OPCollection] extends UnaryTransformer[ICol, OCol]

    Abstract base class for a set of transformer wrappers that allow unary transformers between non-collection types to be used on collection types. For example, we can use a UnaryLambdaTransformer[Email, Integer] on a map's values, creating a UnaryLambdaTransformer[EmailMap, IntegralMap]. This base class will be inherited by concrete classes for OPMaps, OPList, and OPSets (in order to enforce not allowing these collection types to be transformed into each other, eg. no MultiPickList to RealMap transformations).

    The OP type hierarchy does not allow direct type checking of such transformer wrappers (eg. Real#Value is Option[Double] and RealMap#Value is Map[String, Double], so there's no way to enforce that a RealMap can only hold what is contained in a Real) since the types themselves are not created with typetags for performance reasons. However, we can still enforce that operations like building a UnaryLambdaTransformer[RealMap, StringMap] from a UnaryLambdaTransformer[Real, Integer] is not possible by using the Spark types in validateTypes.

    I

    input feature type for supplied non-collection transformer

    O

    output feature type for supplied non-collection transformer

    ICol

    input feature type for desired collection transformer

    OCol

    output feature type for desired collection transformer

  82. abstract class OPMapVectorizer[A, T <: OPMap[A]] extends SequenceEstimator[T, OPVector] with MapVectorizerFuns[Double, RealMap] with NumericMapDefaultParam with TrackNullsParam

    Base class for vectorizing OPMap[A] features. Individual vectorizers for different feature types need to implement the getFillByKey function (which calculates any fill values that differ by key - means, modes, etc.) and the makeModel function (which specifies which type of model will be returned).

    A

    value type for underlying map

    T

    input feature type to vectorize into an OPVector

  83. sealed abstract class OPMapVectorizerModel[A, I <: OPMap[A]] extends SequenceModel[I, OPVector] with CleanTextMapFun

  84. sealed case class OPMapVectorizerModelArgs(allKeys: Seq[Seq[String]], fillByKey: Seq[Map[String, Double]], shouldCleanKeys: Boolean, shouldCleanValues: Boolean, defaultValue: Double, trackNulls: Boolean = TransmogrifierDefaults.TrackNulls) extends Product with Serializable

    OPMap vectorizer model arguments

    allKeys

    all keys per feature

    fillByKey

    fill values for features

    shouldCleanKeys

    should clean map keys

    shouldCleanValues

    should clean map values

    defaultValue

    default value to replace with

    trackNulls

    add column to track null values for each map key

  85. class OpCountVectorizer extends OpEstimatorWrapper[TextList, OPVector, CountVectorizer, CountVectorizerModel]

    Wrapper around spark ml CountVectorizer for use with OP pipelines

  86. class OpHashingTF extends OpTransformerWrapper[TextList, OPVector, HashingTF]

    Wrapper for org.apache.spark.ml.feature.HashingTF

    Maps a sequence of terms to their term frequencies using the hashing trick. Currently we use Austin Appleby's MurmurHash 3 algorithm (MurmurHash3_x86_32) to calculate the hash code value for the term object. Since a simple modulo is used to transform the hash function to a column index, it is advisable to use a power of two as the numFeatures parameter; otherwise the features will not be mapped evenly to the columns.

    See also

    HashingTF for more info
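
    The hashing trick described for OpHashingTF can be sketched as follows. This is an illustration, not Spark's implementation: it uses scala.util.hashing.MurmurHash3.stringHash from the Scala standard library, so the column indices will generally differ from those Spark produces. HashingTFSketch and termFrequencies are hypothetical names.

```scala
import scala.util.hashing.MurmurHash3

object HashingTFSketch {
  // Non-negative modulo, so negative hash codes still map to a valid column.
  private def nonNegativeMod(x: Int, mod: Int): Int = {
    val raw = x % mod
    if (raw < 0) raw + mod else raw
  }

  // Maps terms to a term-frequency vector of length numFeatures.
  // With binary = true, every non-zero count is set to 1.0.
  def termFrequencies(terms: Seq[String], numFeatures: Int, binary: Boolean = false): Array[Double] = {
    val tf = Array.fill(numFeatures)(0.0)
    terms.foreach { term =>
      val idx = nonNegativeMod(MurmurHash3.stringHash(term), numFeatures)
      tf(idx) = if (binary) 1.0 else tf(idx) + 1.0
    }
    tf
  }
}
```

    Distinct terms can collide in the same column; a larger numFeatures lowers the collision rate at the cost of a wider vector.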

  87. class OpIndexToString extends OpTransformerWrapper[RealNN, Text, IndexToString]

    Wrapper for org.apache.spark.ml.feature.IndexToString

    NOTE THAT THIS CLASS EITHER FILTERS OUT OR THROWS AN ERROR IF PREVIOUSLY UNSEEN VALUES APPEAR

    A transformer that maps a feature of indices back to a new feature of corresponding text values. The index-string mapping is either from the ML attributes of the input feature, or from user-supplied labels (which take precedence over ML attributes).

    See also

    OpStringIndexer for converting text into indices

  88. class OpIndexToStringNoFilter extends UnaryTransformer[RealNN, Text] with SaveOthersParams

    A transformer that maps a feature of indices back to a new feature of corresponding text values. The index-string mapping is either from the ML attributes of the input feature, or from user-supplied labels (which take precedence over ML attributes).

    See also

    OpStringIndexerNoFilter for converting text into indices

  89. class OpLDA extends OpEstimatorWrapper[OPVector, OPVector, LDA, LDAModel]

    Wrapper around spark ml LDA (Latent Dirichlet Allocation) for use with OP pipelines

  90. class OpNGram extends OpTransformerWrapper[TextList, TextList, NGram]

    Wrapper for org.apache.spark.ml.feature.NGram

    A feature transformer that converts the input array of strings into an array of n-grams. Null values in the input array are ignored. It returns an array of n-grams where each n-gram is represented by a space-separated string of words.

    When the input is empty, an empty array is returned. When the input array length is less than n (number of elements per n-gram), no n-grams are returned.

    See also

    NGram for more info
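
    The n-gram behavior described for OpNGram, including the short-input case, can be sketched with the standard library alone. NGramSketch and ngrams are hypothetical names; the sketch assumes n >= 1 and does not model null handling.

```scala
object NGramSketch {
  // Returns space-separated n-grams; inputs shorter than n yield no n-grams.
  def ngrams(tokens: Seq[String], n: Int): List[String] =
    if (tokens.length < n) Nil
    else tokens.sliding(n).map(_.mkString(" ")).toList
}
```

    The explicit length check matters because sliding(n) on a shorter sequence would otherwise return one partial window.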

  91. abstract class OpOneHotVectorizer[T <: FeatureType] extends SequenceEstimator[T, OPVector] with PivotParams with CleanTextFun with SaveOthersParams with TrackNullsParam with MinSupportParam with OneHotFun with MaxPctCardinalityParams

    Converts a sequence of features into a vector keeping the top K most common occurrences of each feature (i.e. the final vector has length K * number of inputs), plus an additional column for "other" values, which captures values that do not make the cut or values not seen in training, and an additional column for empty values unless null tracking is disabled.
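
    For a single input feature, the column layout can be sketched as below. This is a simplified illustration with null tracking enabled; it ignores minimum support and text cleaning, and OneHotSketch, fitTopK and encode are hypothetical names rather than the library's API.

```scala
object OneHotSketch {
  // Picks the topK most frequent non-empty values seen in training.
  // Ties are broken alphabetically (an assumption of this sketch).
  def fitTopK(training: Seq[Option[String]], topK: Int): Seq[String] =
    training.flatten
      .groupBy(identity)
      .toSeq
      .sortBy { case (value, occurrences) => (-occurrences.size, value) }
      .take(topK)
      .map(_._1)

  // Encodes one value as topK pivot columns, then an "other" column,
  // then a null-tracking column.
  def encode(value: Option[String], top: Seq[String]): Array[Double] = {
    val out = Array.fill(top.size + 2)(0.0)
    value match {
      case None => out(top.size + 1) = 1.0
      case Some(v) =>
        val idx = top.indexOf(v)
        if (idx >= 0) out(idx) = 1.0 else out(top.size) = 1.0
    }
    out
  }
}
```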

  92. abstract class OpOneHotVectorizerModel[T <: FeatureType] extends SequenceModel[T, OPVector] with CleanTextFun with OneHotModelFun[T]

  93. class OpScalarStandardScaler extends UnaryEstimator[RealNN, RealNN]

    Wraps Spark's native StandardScaler, which operates on vectors, to enable it to operate directly on scalars.

  94. final class OpScalarStandardScalerModel extends UnaryModel[RealNN, RealNN]

  95. class OpSetVectorizer[T <: OPSet[_]] extends OpOneHotVectorizer[T]

    Converts a sequence of OpSet features into a vector keeping the top K most common occurrences of each feature (i.e. the final vector has length K * number of inputs), plus an additional column for "other" values, which captures values that do not make the cut or values not seen in training, and an additional column for empty values unless null tracking is disabled.

  96. final class OpSetVectorizerModel[T <: OPSet[_]] extends OpOneHotVectorizerModel[T]

    Permalink
  97. class OpStopWordsRemover extends OpTransformerWrapper[TextList, TextList, StopWordsRemover]

    Permalink

    Wrapper for org.apache.spark.ml.feature.StopWordsRemover

    A feature transformer that filters out stop words from input.

    Note

    Null values in the input array are preserved unless null is explicitly added to stopWords.

    See also

    StopWordsRemover for more info

    Stop words (Wikipedia)

  98. class OpStringIndexer[T <: Text] extends OpEstimatorWrapper[T, RealNN, StringIndexer, StringIndexerModel]

    Permalink

    Wrapper for org.apache.spark.ml.feature.StringIndexer

    Note that this class either filters out previously unseen values or throws an error when they appear.

    A label indexer that maps a text column of labels to an ML feature of label indices. The indices are in [0, numLabels), ordered by label frequency, so the most frequent label gets index 0.

    See also

    OpIndexToString for the inverse transformation
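
    The frequency-ordered indexing and the two ways of handling previously unseen values can be sketched in plain Scala. This does not use the Spark StringIndexer; `fit`, `transform`, the `skipUnseen` flag, and the alphabetical tie-break are hypothetical names and choices made for illustration.

```scala
object LabelIndexSketch {
  // Assign indices in [0, numLabels) by descending label frequency.
  def fit(labels: Seq[String]): Map[String, Double] =
    labels.groupBy(identity)
      .toSeq
      .map { case (label, occurrences) => (label, occurrences.size) }
      .sortBy { case (label, count) => (-count, label) } // tie-break is an assumption
      .zipWithIndex
      .map { case ((label, _), index) => label -> index.toDouble }
      .toMap

  // Unseen labels are either dropped (skipUnseen = true) or cause an error.
  def transform(index: Map[String, Double], labels: Seq[String], skipUnseen: Boolean): Seq[Double] =
    labels.flatMap { label =>
      index.get(label) match {
        case Some(i) => Some(i)
        case None if skipUnseen => None
        case None => throw new NoSuchElementException(s"Unseen label: $label")
      }
    }
}
```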

  99. class OpStringIndexerNoFilter[I <: Text] extends UnaryEstimator[I, RealNN] with SaveOthersParams

    Permalink

    A label indexer that maps a text column of labels to an ML feature of label indices. The indices are in [0, numLabels), ordered by label frequency, so the most frequent label gets index 0.

    See also

    OpIndexToStringNoFilter for the inverse transformation

  100. final class OpStringIndexerNoFilterModel[I <: Text] extends UnaryModel[I, RealNN]

    Permalink
  101. class OpTextPivotVectorizer[T <: Text] extends OpOneHotVectorizer[T]

    Permalink

    Converts a sequence of Text features into a vector keeping the top K most common occurrences of each feature (i.e. the final vector has length K * number of inputs), plus an additional column for "other" values, which captures values that do not make the cut or were not seen in training, and an additional column for empty values unless null tracking is disabled.

  102. final class OpTextPivotVectorizerModel[T <: Text] extends OpOneHotVectorizerModel[T]

    Permalink
  103. class OpWord2Vec extends OpEstimatorWrapper[TextList, OPVector, Word2Vec, Word2VecModel]

    Permalink

    Wrapper around Spark ML Word2Vec for use with OP pipelines

  104. class ParsePhoneDefaultCountry extends UnaryTransformer[Phone, Phone] with PhoneParams

    Permalink

    Transformer to determine if a phone number is valid when no country code is available. The default locale will be used for validation. All phone numbers with fewer than 2 characters will be categorized as invalid. All phone numbers that start with "+" will be evaluated with international formatting.

    Returns the stripped number if the number is valid, and None otherwise.

  105. class ParsePhoneNumber extends BinaryTransformer[Phone, Text, Phone] with PhoneCountryParams

    Permalink

    Determine whether a phone number is valid given the country's regional code. By default the regional code will be checked against those provided in Google's PhoneNumber library. If the input regional code is not found, the default locale will be used for validation.

    If the user provided a country name to code mapping, the phone number can only be validated against the input mapping. This transformer will first match on regional code; failing that, it will select the country with the closest Q-Distance.

    All phone numbers with fewer than 2 characters will be categorized as invalid.

    All phone numbers that start with "+" will be evaluated with international formatting.

    Returns the stripped number if the number is valid, and None otherwise.

  106. class PercentileCalibrator extends UnaryEstimator[RealNN, RealNN]

    Permalink

    Wraps around org.apache.spark.ml.feature.QuantileDiscretizer

  107. final class PercentileCalibratorModel extends UnaryModel[RealNN, RealNN]

    Permalink
  108. trait PhoneCountryParams extends PhoneParams

    Permalink
  109. trait PhoneParams extends Params

    Permalink
  110. trait PivotParams extends TextParams

    Permalink
  111. class PowerTransformer[I <: OPNumeric[_]] extends UnaryTransformer[I, Real]

    Permalink

    Power transformer

    I

    input feature type

  112. final class PredictionDescaler[I <: Real, O <: Real] extends BinaryTransformer[Prediction, I, O]

    Permalink

    Applies to the input column the inverse of the scaling function defined in the Prediction feature metadata.

    - 1st input feature is the Prediction feature to descale
    - 2nd input feature is the scaled Prediction feature containing the metadata for constructing the scaling used to make this column

    I

    input feature type

    O

    output feature type

  113. class RealMapVectorizer[T <: OPMap[Double]] extends OPMapVectorizer[Double, T]

    Permalink

    Class for vectorizing RealMap features. Fills missing keys with the mean for that key.

    T

    input feature type to vectorize into an OPVector

  114. final class RealMapVectorizerModel[T <: OPMap[Double]] extends OPMapVectorizerModel[Double, T]

    Permalink
  115. class RealNNVectorizer extends SequenceTransformer[RealNN, OPVector] with VectorizerDefaults

    Permalink

    Converts a sequence of non-nullable real features into a vector feature

  116. class RealVectorizer[T <: Real] extends SequenceEstimator[T, OPVector] with VectorizerDefaults with TrackNullsParam

    Permalink

    Converts a sequence of nullable numeric features into a vector feature. Null values can be filled with the mean or a constant.
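
    Filling missing values with a mean or a constant can be sketched in plain Scala for one input feature. The names `fitMean` and `vectorize`, the fallback of 0.0 when no value is present, and the layout of the optional null-indicator slot are assumptions for illustration, not the library's behaviour.

```scala
object FillSketch {
  // Mean of the values that are present; 0.0 when none are (an assumption).
  def fitMean(values: Seq[Option[Double]]): Double = {
    val present = values.flatten
    if (present.isEmpty) 0.0 else present.sum / present.size
  }

  // Replace a missing value with `fill` (a fitted mean or a chosen constant).
  // When trackNulls is set, append a 1.0/0.0 indicator that the value was missing.
  def vectorize(value: Option[Double], fill: Double, trackNulls: Boolean): Array[Double] = {
    val filled = value.getOrElse(fill)
    if (trackNulls) Array(filled, if (value.isEmpty) 1.0 else 0.0) else Array(filled)
  }
}
```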

  117. final class RealVectorizerModel[T <: Real] extends SequenceModel[T, OPVector] with VectorizerDefaults

    Permalink
  118. class ReplaceTransformer[I <: FeatureType] extends UnaryTransformer[I, I]

    Permalink
  119. class RoundDigitsTransformer[I <: OPNumeric[_]] extends UnaryTransformer[I, Real]

    Permalink

    Round digits transformer

    I

    input feature type

  120. class RoundTransformer[I <: OPNumeric[_]] extends UnaryTransformer[I, Integral]

    Permalink

    Round transformer

    I

    input feature type

  121. trait SaveOthersParams extends Params

    Permalink
  122. class ScalarAddTransformer[I <: OPNumeric[_], N] extends UnaryTransformer[I, Real]

    Permalink

    Scalar addition transformer

    I

    input feature type

    N

    value type

  123. class ScalarDivideTransformer[I <: OPNumeric[_], N] extends UnaryTransformer[I, Real]

    Permalink

    Scalar divide transformer

    I

    input feature type

    N

    value type

  124. class ScalarMultiplyTransformer[I <: OPNumeric[_], N] extends UnaryTransformer[I, Real]

    Permalink

    Scalar multiply transformer

    I

    input feature type

    N

    value type

  125. class ScalarSubtractTransformer[I <: OPNumeric[_], N] extends UnaryTransformer[I, Real]

    Permalink

    Scalar subtract transformer

    I

    input feature type

    N

    value type

  126. trait Scaler extends Serializable

    Permalink

    A trait for defining a new family of scaling functions:

    - scalingType: a ScalingType Enum for the scaling name
    - args: a case class containing the args needed to define scaling and inverse scaling functions
    - scale: the scaling function
    - descale: the inverse scaling function

    To add a new family of scaling functions: add an entry to the ScalingType enum, define a case class extending Scaler, and add a case statement to both the Scaler and ScalerMetadata case classes.
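
    The shape of such a scaler family can be sketched in self-contained Scala. Everything below is a hypothetical stand-in that mirrors the four members named above (scalingType, args, scale, descale); the concrete names `Linear`, `Logarithmic`, `LinearArgs`, `NoArgs`, `LinearScaler` and `LogScaler`, and the factory signature, are assumptions and not the library's definitions.

```scala
object ScalerSketch {
  sealed trait ScalingType
  case object Linear extends ScalingType
  case object Logarithmic extends ScalingType

  trait ScalingArgs
  case class LinearArgs(slope: Double, intercept: Double) extends ScalingArgs
  case object NoArgs extends ScalingArgs

  trait Scaler extends Serializable {
    def scalingType: ScalingType
    def args: ScalingArgs
    def scale(v: Double): Double
    def descale(v: Double): Double // inverse of scale
  }

  case class LinearScaler(args: LinearArgs) extends Scaler {
    require(args.slope != 0.0, "slope must be non-zero for the inverse to exist")
    val scalingType: ScalingType = Linear
    def scale(v: Double): Double = args.slope * v + args.intercept
    def descale(v: Double): Double = (v - args.intercept) / args.slope
  }

  case class LogScaler() extends Scaler {
    val scalingType: ScalingType = Logarithmic
    val args: ScalingArgs = NoArgs
    def scale(v: Double): Double = math.log(v)
    def descale(v: Double): Double = math.exp(v)
  }

  // Factory: rebuilds a scaler from its family and args. A new family needs a new case here.
  def apply(scalingType: ScalingType, args: ScalingArgs): Scaler = (scalingType, args) match {
    case (Linear, a: LinearArgs) => LinearScaler(a)
    case (Logarithmic, _) => LogScaler()
    case (t, a) => throw new IllegalArgumentException(s"Args $a do not match scaling type $t")
  }
}
```

    Keeping the scaling type and args as plain data is what would let a downstream stage rebuild the scaler and apply `descale` to undo the transformation.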

  127. case class ScalerMetadata(scalingType: ScalingType, scalingArgs: ScalingArgs) extends Product with Serializable

    Permalink

    Metadata containing the info needed to reconstruct a Scaler instance

    scalingType

    the family of functions containing the scaler

    scalingArgs

    the args uniquely defining a function in the scaling family

  128. final class ScalerTransformer[I <: Real, O <: Real] extends UnaryTransformer[I, O]

    Permalink

    Scaling transformer that applies a scaling function to a numerical feature

    I

    input feature type

    O

    output feature type

  129. trait ScalingArgs extends JsonLike

    Permalink

    A trait to be extended by a case class containing the args needed to define a family of scaling & descaling functions

  130. sealed trait ScalingType extends EnumEntry with Serializable

    Permalink
  131. class SetNGramSimilarity extends NGramSimilarity[MultiPickList]

    Permalink

    Computes character n-gram distance for MultiPickList features.

  132. case class SmartTextFeatureInfo(key: String, vectorizationMethod: TextVectorizationMethod, topValues: Array[String]) extends JsonLike with Product with Serializable

    Permalink

    Info about each feature within a text map

    key

    name of a feature

    vectorizationMethod

    method to use for text vectorization (either pivot, hashing, or ignoring)

    topValues

    most common values of a feature (only for categoricals)

  133. class SmartTextMapVectorizer[T <: OPMap[String]] extends SequenceEstimator[T, OPVector] with PivotParams with CleanTextFun with SaveOthersParams with TrackNullsParam with MinSupportParam with TextTokenizerParams with TrackTextLenParam with HashingVectorizerParams with MapHashingFun with OneHotFun with MapStringPivotHelper with MapVectorizerFuns[String, OPMap[String]] with MaxCardinalityParams with MinLengthStdDevParams with NameDetectFun[Text]

    Permalink

    Convert a sequence of text map features into a vector by detecting categoricals that are disguised as text. A categorical will be represented as a vector consisting of occurrences of the top K most common values of that feature plus occurrences of non-top-K values and a null indicator (if enabled). Non-categoricals will be converted into a vector using the hashing trick. In addition, a null indicator is created for each non-categorical (if enabled).

    Detection and removal of names in the input columns can be enabled with the sensitiveFeatureMode param.

  134. final class SmartTextMapVectorizerModel[T <: OPMap[String]] extends SequenceModel[T, OPVector] with TextTokenizerParams with TrackTextLenParam with MapHashingFun with TextMapPivotVectorizerModelFun[OPMap[String]]

    Permalink
  135. case class SmartTextMapVectorizerModelArgs(allFeatureInfo: Seq[Seq[SmartTextFeatureInfo]], shouldCleanKeys: Boolean, shouldCleanValues: Boolean, shouldTrackNulls: Boolean, hashingParams: HashingFunctionParams) extends JsonLike with Product with Serializable

    Permalink

    Arguments for SmartTextMapVectorizerModel

    allFeatureInfo

    info about each feature within each text map

    shouldCleanKeys

    should clean feature keys

    shouldCleanValues

    should clean feature values

    shouldTrackNulls

    should track nulls

    hashingParams

    hashing function params

  136. class SmartTextVectorizer[T <: Text] extends SequenceEstimator[T, OPVector] with PivotParams with CleanTextFun with SaveOthersParams with TrackNullsParam with MinSupportParam with TextTokenizerParams with TrackTextLenParam with HashingVectorizerParams with HashingFun with OneHotFun with MaxCardinalityParams with MinLengthStdDevParams with NameDetectFun[T]

    Permalink

    Convert a sequence of text features into a vector by detecting categoricals that are disguised as text. A categorical will be represented as a vector consisting of occurrences of the top K most common values of that feature plus occurrences of non-top-K values and a null indicator (if enabled). Non-categoricals will be converted into a vector using the hashing trick. In addition, a null indicator is created for each non-categorical (if enabled).

    Detection and removal of names in the input columns can be enabled with the sensitiveFeatureMode param.
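
    The split between pivoted categoricals and hashed free text can be sketched in plain Scala. This is an illustration only: the cardinality test in `isCategorical`, the `maxCardinality` threshold name, and the use of `String.hashCode` as the hash function are assumptions and are not taken from the library.

```scala
object HashingSketch {
  // Assumed rule: treat a text feature as categorical when it has few distinct values.
  def isCategorical(values: Seq[String], maxCardinality: Int): Boolean =
    values.distinct.size <= maxCardinality

  // Hashing trick: each token increments the bucket chosen by its hash,
  // so the output width is fixed regardless of vocabulary size.
  def hashTokens(tokens: Seq[String], numFeatures: Int): Array[Double] = {
    require(numFeatures > 0, "numFeatures must be positive")
    val out = Array.fill(numFeatures)(0.0)
    tokens.foreach { token =>
      // floorMod keeps the index non-negative even for negative hash codes.
      val bucket = java.lang.Math.floorMod(token.hashCode, numFeatures)
      out(bucket) += 1.0
    }
    out
  }
}
```

    Distinct tokens can collide in the same bucket; that is the accepted trade-off of the hashing trick for a fixed vector size.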

  137. final class SmartTextVectorizerModel[T <: Text] extends SequenceModel[T, OPVector] with TextTokenizerParams with TrackTextLenParam with HashingFun with OneHotModelFun[Text]

    Permalink
  138. case class SmartTextVectorizerModelArgs(vectorizationMethods: Array[TextVectorizationMethod], topValues: Array[Seq[String]], shouldCleanText: Boolean, shouldTrackNulls: Boolean, hashingParams: HashingFunctionParams) extends JsonLike with Product with Serializable

    Permalink

    Arguments for SmartTextVectorizerModel

    vectorizationMethods

    method to use for text vectorization (either pivot, hashing, or ignoring)

    topValues

    top values of each feature

    shouldCleanText

    should clean text value

    shouldTrackNulls

    should track nulls

    hashingParams

    hashing function params

  139. class SqrtTransformer[I <: OPNumeric[_]] extends UnaryTransformer[I, Real]

    Permalink

    Square root transformer

    I

    input feature type

  140. sealed trait StringIndexerHandleInvalid extends EnumEntry with Serializable

    Permalink
  141. class SubstringTransformer[I1 <: Text, I2 <: Text] extends BinaryTransformer[I1, I2, Binary] with TextMatchingParams

    Permalink

    Checks if the first input is a substring of the second input

    I1

    first input feature type

    I2

    second input feature type

  142. class SubtractTransformer[I1 <: OPNumeric[_], I2 <: OPNumeric[_]] extends BinaryTransformer[I1, I2, Real]

    Permalink

    Minus function truth table (Real as example):

    Real.empty - Real.empty = Real.empty
    Real.empty - Real(x) = Real(-x)
    Real(x) - Real.empty = Real(x)
    Real(x) - Real(y) = Real(x - y)
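
    The truth table can be written out as a small plain-Scala function. Here `Option[Double]` is used as a stand-in for Real (with `None` playing the role of Real.empty); the `minus` helper is hypothetical and only mirrors the four rows of the table.

```scala
object MinusSketch {
  // Option[Double] models a nullable real; None corresponds to Real.empty.
  def minus(a: Option[Double], b: Option[Double]): Option[Double] = (a, b) match {
    case (None, None) => None
    case (None, Some(y)) => Some(-y)
    case (Some(x), None) => Some(x)
    case (Some(x), Some(y)) => Some(x - y)
  }
}
```

    In effect, an empty operand is treated like zero unless both operands are empty, in which case the result stays empty.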

  143. class TextLenTransformer[T <: TextList] extends SequenceTransformer[T, OPVector] with VectorizerDefaults with TextTokenizerParams with TextParams

    Permalink

    Sequence transformer for generating a sequence of text lengths from a sequence of TextList values (e.g. tokenized raw text)

  144. sealed trait TextLengthType extends EnumEntry with Serializable

    Permalink

    Method for computing text lengths

  145. class TextListNullTransformer[T <: TextList] extends SequenceTransformer[T, OPVector] with VectorizerDefaults

    Permalink

    Creates null indicator columns for a sequence of input TextList features, originally for use as a separate stage in null tracking for hashed text features (easier to do outside the hashing vectorizer since we can make a null indicator column for each input feature without having to add lots of complex logic in the hashing vectorizer to deal with metadata for shared vs. separate hash spaces).

  146. class TextMapHashingVectorizer[T <: OPMap[String]] extends OPMapVectorizer[String, T] with TextParams

    Permalink
  147. final class TextMapHashingVectorizerModel[T <: OPMap[String]] extends OPMapVectorizerModel[String, T] with TextTokenizerParams with HashingFun

    Permalink
  148. class TextMapLenEstimator[T <: OPMap[String]] extends SequenceEstimator[T, OPVector] with MapVectorizerFuns[String, T]

    Permalink

    Estimator for computing text lengths on fields stored in text maps. Note that because there are no maps from String to TextList, we need to do the tokenization here (unlike the TextLenTransformer).

  149. final class TextMapLenModel[T <: OPMap[String]] extends SequenceModel[T, OPVector] with CleanTextMapFun with TextTokenizerParams

    Permalink
  150. class TextMapNullEstimator[T <: OPMap[String]] extends SequenceEstimator[T, OPVector] with MapVectorizerFuns[String, T]

    Permalink

    Creates null indicator columns for a sequence of input TextMap features, originally for use as a separate stage in null tracking for hashed text features (easier to do outside the hashing vectorizer since we can make a null indicator column for each input feature without having to add lots of complex logic in the hashing vectorizer to deal with metadata for shared vs. separate hash spaces).

  151. final class TextMapNullModel[T <: OPMap[String]] extends SequenceModel[T, OPVector] with CleanTextMapFun with TextTokenizerParams

    Permalink
  152. class TextMapPivotVectorizer[T <: OPMap[String]] extends SequenceEstimator[T, OPVector] with PivotParams with MapPivotParams with TextParams with MapStringPivotHelper with CleanTextMapFun with MinSupportParam with TrackNullsParam with MaxPctCardinalityParams with MaxPctCardinalityFun

    Permalink

    Converts a sequence of KeyString features into a vector keeping the top K most common occurrences of each key in the maps for that feature (i.e. the final vector has length K * number of keys * number of features). Each key found will also generate an "other" column, which will capture values that do not make the cut or were not seen in training. Note that any keys not seen in training will be ignored.

  153. final class TextMapPivotVectorizerModel[T <: OPMap[String]] extends SequenceModel[T, OPVector] with TextMapPivotVectorizerModelFun[T]

    Permalink
  154. trait TextMatchingParams extends Params

    Permalink
  155. class TextNGramSimilarity[T <: Text] extends NGramSimilarity[T]

    Permalink

    Computes character n-gram distance for Text features.

  156. trait TextParams extends Params

    Permalink
  157. class TextTokenizer[T <: Text] extends UnaryTransformer[T, TextList] with TextTokenizerParams

    Permalink

    Transformer that takes anything of type Text or lower and returns a TextList of tokens extracted from that text

    Annotations
    @ReaderWriter()
  158. trait TextTokenizerParams extends LanguageDetectionParams with TextMatchingParams

    Permalink
  159. class TextTokenizerReaderWriter[T <: Text] extends OpPipelineStageReaderWriter[TextTokenizer[T]]

    Permalink

    Special reader/writer class for TextTokenizer stage

  160. sealed trait TextVectorizationMethod extends EnumEntry with Serializable

    Permalink

    Methods of vectorizing text (e.g. to be chosen by statistics computed in SmartTextVectorizer)

  161. sealed abstract class TimePeriod extends EnumEntry with Serializable

    Permalink
  162. class TimePeriodListTransformer[I <: DateList] extends UnaryTransformer[I, OPVector]

    Permalink

    TimePeriodListTransformer extracts one of a set of time periods from a date/datetime list

    I

    input feature type

  163. class TimePeriodMapTransformer[I <: DateMap] extends UnaryTransformer[I, IntegralMap]

    Permalink

    TimePeriodMapTransformer extracts one of a set of time periods from a date/datetime map

    I

    input feature type

  164. class TimePeriodTransformer[I <: Date] extends UnaryTransformer[I, Integral]

    Permalink

    TimePeriodTransformer extracts one of a set of time periods from a date/datetime

    I

    input feature type

  165. case class TimePeriodVal(value: Int, min: Int, max: Int) extends Product with Serializable

    Permalink
  166. class ToOccurTransformer[I <: FeatureType] extends UnaryTransformer[I, RealNN]

    Permalink

    Transformer that converts an input feature of type I into a doolean feature (a Boolean represented as a double) using a user-specified function that maps object type I to a Boolean

    I

    Object type to be mapped to a double (doolean).

  167. class TopNLabelJoiner extends MultiLabelJoiner

    Permalink

    Joins the probability score with the label from the string indexer stage, sorts by highest score, returns up to topN, and filters out the class UnseenLabel.

  168. class TopNLabelProbMap extends UnaryTransformer[RealMap, RealMap]

    Permalink

    Sorts the label probability map and returns the topN.
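
    Selecting the top N entries of a label probability map can be sketched in plain Scala. A `Map[String, Double]` stands in for RealMap here, and the `topN` helper with its alphabetical tie-break is a hypothetical illustration rather than the library's implementation.

```scala
object TopNSketch {
  // Keep the n labels with the highest probability; ties are broken by label (an assumption).
  def topN(probs: Map[String, Double], n: Int): Map[String, Double] =
    probs.toSeq
      .sortWith((a, b) => a._2 > b._2 || (a._2 == b._2 && a._1 < b._1))
      .take(n)
      .toMap
}
```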

  169. trait TrackInvalidParam extends Params

    Permalink

    Param that decides whether or not the values that are considered invalid are tracked

  170. trait TrackNullsParam extends Params

    Permalink

    Param that decides whether or not the values that were missing are tracked

  171. trait TrackTextLenParam extends Params

    Permalink

    Param that decides whether or not lengths of text are tracked during vectorization

  172. class UrlMapToPickListMapTransformer extends UnaryTransformer[URLMap, PickListMap]

    Permalink
  173. class ValidEmailTransformer extends UnaryTransformer[Email, Binary]

    Permalink

    Checks if an email is valid

  174. trait VectorizerDefaults extends OpPipelineStageBase

    Permalink
  175. class VectorsCombiner extends SequenceEstimator[OPVector, OPVector]

    Permalink

    Takes in a sequence of vectors and combines them into a single vector

  176. final class VectorsCombinerModel extends SequenceModel[OPVector, OPVector]

    Permalink

Value Members

  1. object CombinationStrategy extends Enum[CombinationStrategy] with Serializable

    Permalink
  2. object DateListPivot extends Enum[DateListPivot] with Serializable

    Permalink

    Enumeration object that contains the option to pivot the DateList feature

    1) SinceFirst - replace the feature by the number of days between the first event and reference date

    2) SinceLast - replace the feature by the number of days between the last event and reference date

    3) ModeDay - replace the feature by a pivot that indicates the mode of the day of the week. Example: if the mode is Monday, it returns (1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0)

    4) ModeMonth - replace the feature by a pivot that indicates the mode of the month

    5) ModeHour - replace the feature by a pivot that indicates the mode of the hour of the day.
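
    Three of these options can be sketched with java.time in plain Scala. The sketch assumes events and the reference date are epoch milliseconds interpreted in UTC, that Monday maps to the first slot (as in the ModeDay example), and that ties in the mode go to the earlier weekday; the helper names are hypothetical.

```scala
import java.time.{Instant, ZoneOffset}
import java.time.temporal.ChronoUnit

object DateListPivotSketch {
  private def daysBetween(from: Long, to: Long): Long =
    ChronoUnit.DAYS.between(Instant.ofEpochMilli(from), Instant.ofEpochMilli(to))

  // SinceFirst: whole days between the earliest event and the reference date.
  def sinceFirst(events: Seq[Long], reference: Long): Option[Long] =
    if (events.isEmpty) None else Some(daysBetween(events.min, reference))

  // SinceLast: whole days between the latest event and the reference date.
  def sinceLast(events: Seq[Long], reference: Long): Option[Long] =
    if (events.isEmpty) None else Some(daysBetween(events.max, reference))

  // ModeDay: 7-slot indicator of the most frequent day of the week (slot 0 = Monday).
  def modeDay(events: Seq[Long]): Array[Double] = {
    val out = Array.fill(7)(0.0)
    if (events.nonEmpty) {
      val days = events.map(ms => Instant.ofEpochMilli(ms).atZone(ZoneOffset.UTC).getDayOfWeek.getValue - 1)
      val (mode, _) = days.groupBy(identity)
        .map { case (day, ds) => (day, ds.size) }
        .toSeq
        .sortBy { case (day, count) => (-count, day) } // tie-break is an assumption
        .head
      out(mode) = 1.0
    }
    out
  }
}
```

    An empty date list yields None for the day counts and an all-zero pivot in this sketch; how the library represents that case is not stated above.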

  3. object DecisionTreeNumericBucketizer extends Product with Serializable

    Permalink
  4. object EmailToPickListMapTransformer extends Serializable

    Permalink
  5. object GenderDetectStrategy extends Enum[GenderDetectStrategy] with Product with Serializable

    Permalink
  6. object HashAlgorithm extends Enum[HashAlgorithm] with Serializable

    Permalink
  7. object HashSpaceStrategy extends Enum[HashSpaceStrategy] with Serializable

    Permalink
  8. object HumanNameDetectorMetadata extends Product with Serializable

    Permalink
  9. object Inclusion extends Enum[Inclusion] with Serializable

    Permalink
  10. object IndexToStringHandleInvalid extends Enum[IndexToStringHandleInvalid] with Serializable

    Permalink
  11. object LangDetector extends Serializable

    Permalink
  12. object NGramSimilarity extends Serializable

    Permalink
  13. object NameEntityRecognizer extends Serializable

    Permalink
  14. object NumericBucketizer extends Serializable

    Permalink
  15. object OpIndexToStringNoFilter extends Serializable

    Permalink
  16. object OpOneHotVectorizer extends Serializable

    Permalink
  17. object OpStringIndexerNoFilter extends Serializable

    Permalink
  18. object PercentileCalibrator extends Product with Serializable

    Permalink
  19. object PhoneNumberParser extends Product with Serializable

    Permalink
  20. object Scaler extends Serializable

    Permalink

    Scaler instance factory

  21. object ScalerMetadata extends Serializable

    Permalink
  22. object ScalingType extends Enum[ScalingType] with Serializable

    Permalink
  23. object SmartTextVectorizer extends Serializable

    Permalink
  24. object StringIndexerHandleInvalid extends Enum[StringIndexerHandleInvalid] with Serializable

    Permalink
  25. object TextLengthType extends Enum[TextLengthType] with Serializable

    Permalink
  26. object TextMapHashingVectorizerNames

    Permalink
  27. object TextTokenizer extends Serializable

    Permalink
  28. object TextVectorizationMethod extends Enum[TextVectorizationMethod] with Serializable

    Permalink
  29. object TikaHelper

    Permalink

    Tika helper

  30. object TimePeriod extends Enum[TimePeriod] with Serializable

    Permalink
  31. object ToOccurTransformer extends Serializable

    Permalink
  32. object TopNLabelJoiner extends Serializable

    Permalink
  33. object VectorizerUtils extends Product with Serializable

    Permalink

Ungrouped