feature

Type Members

class AbsoluteValueTransformer[I <: OPNumeric[_]] extends UnaryTransformer[I, Real]

Absolute value transformer

I

input feature type
class AddTransformer[I1 <: OPNumeric[_], I2 <: OPNumeric[_]] extends BinaryTransformer[I1, I2, Real]

Plus function truth table (Real as example):

Real.empty + Real.empty = Real.empty
Real.empty + Real(x) = Real(x)
Real(x) + Real.empty = Real(x)
Real(x) + Real(y) = Real(x + y)
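
The truth table above can be sketched with plain Scala options. This is a minimal illustration, not the library implementation: Option[Double] stands in for Real (None plays the role of Real.empty), and the object and method names are hypothetical.

```scala
object PlusSketch {
  // Option[Double] is a stand-in for Real; None plays the role of Real.empty.
  def plus(a: Option[Double], b: Option[Double]): Option[Double] = (a, b) match {
    case (Some(x), Some(y)) => Some(x + y) // both present: ordinary sum
    case (Some(x), None)    => Some(x)     // an empty operand is ignored
    case (None, Some(y))    => Some(y)
    case (None, None)       => None        // empty + empty stays empty
  }
}
```

Note that an empty operand behaves like an additive identity here, whereas the Multiply and Divide tables propagate emptiness.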
class AliasTransformer[I <: FeatureType] extends UnaryTransformer[I, I]

No-op (identity) alias feature transformer allowing renaming features without applying a transformation on values.

I

feature type
class BinaryMapVectorizer[T <: OPMap[Boolean]] extends OPMapVectorizer[Boolean, T]

Class for vectorizing BinaryMap features. Fills missing keys with args.defaultValue, which does not depend on the key, so getFillByKey returns an empty sequence.

T

input feature type to vectorize into an OPVector
final class BinaryMapVectorizerModel[T <: OPMap[Boolean]] extends OPMapVectorizerModel[Boolean, T]
class BinaryVectorizer extends SequenceTransformer[Binary, OPVector] with VectorizerDefaults with TrackNullsParam

Vectorizes Binary inputs where each input is transformed into 2 vector elements: the first element is [1 -> true] or [0 -> false], and the second element is [1 -> filled value] or [0 -> original value]. The vector representation for each input is concatenated into a final vector representation.

Example:

Data: Seq[(Binary, Binary)] = ((Some(false), None)) => f1, f2
new BinaryVectorizer().setInput(f1, f2).setFillValue(10)

will produce Array(0.0, 0.0, 10.0, 1.0)
class CeilTransformer[I <: OPNumeric[_]] extends UnaryTransformer[I, Integral]

Ceil transformer

I

input feature type
trait CleanTextFun extends AnyRef
trait CleanTextMapFun extends CleanTextFun
sealed trait CombinationStrategy extends EnumEntry with Serializable

Model Combination Strategies
sealed trait DateListPivot extends EnumEntry with Serializable
class DateListVectorizer[T <: OPList[Long]] extends SequenceTransformer[T, OPVector] with VectorizerDefaults with TrackNullsParam

Converts a sequence of DateList features into a vector feature. You can choose how to pivot the features.
class DateMapToUnitCircleVectorizer[T <: DateMap] extends SequenceEstimator[T, OPVector] with DateToUnitCircleParams with MapVectorizerFuns[Long, T]

Transforms a Date or DateTime field into a cartesian coordinate representation of an extracted time period on the unit circle, following http://webspace.ship.edu/pgmarr/geo441/lectures/lec%2016%20-%20directional%20statistics.pdf

Parameter timePeriod: the time period to extract from the timestamp, one of DayOfMonth, DayOfWeek, DayOfYear, HourOfDay, MonthOfYear, WeekOfMonth, WeekOfYear.

We extract the timePeriod from the timestamp and map it onto the unit circle, on which the time periods are equally spaced. For example, when timePeriod = HourOfDay, the timestamp 01/01/2018 6:37 maps to the point on the circle with angle radians = 2*math.Pi*6/24. We return the cartesian coordinates of this point: (math.cos(radians), math.sin(radians)).

The first time period always has angle 0.

Note: We use the ISO week date format (https://en.wikipedia.org/wiki/ISO_week_date#First_week). Monday is the first day of the week, and the first week of the year is the week with the first Monday after Jan 1.
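
The mapping described above can be sketched in a few lines. This is a minimal illustration of the formula only, assuming the time period has already been extracted as a zero-based integer; the object and method names are hypothetical and are not part of the library.

```scala
object UnitCircleSketch {
  // period: zero-based index of the extracted time period (e.g. hour 6)
  // numPeriods: how many periods are spaced around the circle (e.g. 24 hours)
  def toUnitCircle(period: Int, numPeriods: Int): (Double, Double) = {
    val radians = 2 * math.Pi * period / numPeriods
    (math.cos(radians), math.sin(radians))
  }
}
```

For the HourOfDay example, UnitCircleSketch.toUnitCircle(6, 24) gives an angle of a quarter turn, so the point lies (up to floating point error) at the top of the circle, while period 0 lands at angle 0.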
final class DateMapToUnitCircleVectorizerModel[T <: DateMap] extends SequenceModel[T, OPVector] with CleanTextMapFun

Model for DateMapToUnitCircleVectorizer

T

DateMap type
class DateMapVectorizer[T <: OPMap[Long]] extends OPMapVectorizer[Long, T]

Class for vectorizing DateMap features. Fills missing keys with args.defaultValue, which does not depend on the key, so getFillByKey returns an empty sequence.

T

input feature type to vectorize into an OPVector
final class DateMapVectorizerModel[T <: OPMap[Long]] extends OPMapVectorizerModel[Long, T]
trait DateToUnitCircleParams extends Params
class DateToUnitCircleTransformer[T <: Date] extends SequenceTransformer[T, OPVector] with DateToUnitCircleParams

Transforms a Date or DateTime field into a cartesian coordinate representation of an extracted time period on the unit circle, following http://webspace.ship.edu/pgmarr/geo441/lectures/lec%2016%20-%20directional%20statistics.pdf

Parameter timePeriod: the time period to extract from the timestamp, one of DayOfMonth, DayOfWeek, DayOfYear, HourOfDay, MonthOfYear, WeekOfMonth, WeekOfYear.

We extract the timePeriod from the timestamp and map it onto the unit circle, on which the time periods are equally spaced. For example, when timePeriod = HourOfDay, the timestamp 01/01/2018 6:37 maps to the point on the circle with angle radians = 2*math.Pi*6/24. We return the cartesian coordinates of this point: (math.cos(radians), math.sin(radians)).

The first time period always has angle 0.

Note: We use the ISO week date format (https://en.wikipedia.org/wiki/ISO_week_date#First_week). Monday is the first day of the week, and the first week of the year is the week with the first Monday after Jan 1.
class DecisionTreeNumericBucketizer[N, I2 <: OPNumeric[N]] extends BinaryEstimator[RealNN, I2, OPVector] with DecisionTreeNumericBucketizerParams with VectorizerDefaults with TrackInvalidParam with TrackNullsParam with NumericBucketizerMetadata with AllowLabelAsInput[OPVector]

Smart bucketizer for numeric values based on a Decision Tree classifier.

N

numeric feature type value

I2

numeric feature type
final class DecisionTreeNumericBucketizerModel[I2 <: OPNumeric[_]] extends BinaryModel[RealNN, I2, OPVector] with AllowLabelAsInput[OPVector]
trait DecisionTreeNumericBucketizerParams extends AnyRef
class DecisionTreeNumericMapBucketizer[N, I2 <: OPMap[N]] extends BinaryEstimator[RealNN, I2, OPVector] with DecisionTreeNumericBucketizerParams with VectorizerDefaults with TrackInvalidParam with TrackNullsParam with NumericBucketizerMetadata with MapPivotParams with CleanTextMapFun with AllowLabelAsInput[OPVector]

Smart bucketizer for numeric map values based on a Decision Tree classifier.

N

numeric feature type value

I2

numeric map feature type
final class DecisionTreeNumericMapBucketizerModel[I2 <: OPMap[_]] extends BinaryModel[RealNN, I2, OPVector] with CleanTextMapFun with AllowLabelAsInput[OPVector]
final class DescalerTransformer[I1 <: Real, I2 <: Real, O <: Real] extends BinaryTransformer[I1, I2, O]

A transformer that takes as inputs a feature to descale and a (potentially different) scaled feature which contains the metadata for reconstructing the inverse scaling function. Transforms the 1st input feature by applying the inverse of the scaling function found in the metadata of the 2nd input feature.

- 1st input feature: the feature to descale
- 2nd input feature: the scaled feature containing the metadata for constructing the scaling used to make this column

I1

feature type for first input

I2

feature type for the second input

O

output feature type
class DivideTransformer[I1 <: OPNumeric[_], I2 <: OPNumeric[_]] extends BinaryTransformer[I1, I2, Real]

Divide function truth table (Real as example):

Real.empty / Real.empty = Real.empty
Real.empty / Real(x) = Real.empty
Real(x) / Real.empty = Real.empty
Real(x) / Real(y) = Real(x / y) filter ("is not NaN or Infinity")
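
A minimal sketch of the divide table, again using Option[Double] as a stand-in for Real (None plays the role of Real.empty). The names are hypothetical and this is not the library code; it only illustrates how the "is not NaN or Infinity" filter turns division by zero into an empty result.

```scala
object DivideSketch {
  // Option[Double] is a stand-in for Real; None plays the role of Real.empty.
  def divide(a: Option[Double], b: Option[Double]): Option[Double] =
    a.flatMap(x => b.map(y => x / y))          // empty if either operand is empty
      .filter(q => !q.isNaN && !q.isInfinity)  // drop NaN and +/- Infinity
}
```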
class DropIndicesByTransformer extends UnaryTransformer[OPVector, OPVector]

Allows columns to be dropped from a feature vector based on properties of the metadata about what is contained in each column (works only on vectors created with OpVectorMetadata)
class EmailToPickListMapTransformer extends OPMapTransformer[Email, PickList, EmailMap, PickListMap]
case class EmptyScalerArgs() extends ScalingArgs with Product with Serializable

Case class for Scaling families that take no parameters
class ExistsTransformer[I <: FeatureType] extends UnaryTransformer[I, Binary]
class ExpTransformer[I <: OPNumeric[_]] extends UnaryTransformer[I, Real]

Exp transformer: returns Euler's number e raised to the power of feature value

I

input feature type
class FillMissingWithMean[N, I <: OPNumeric[N]] extends UnaryEstimator[I, RealNN]

Fill missing values with mean for any numeric feature
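
As a rough illustration of mean filling, the sketch below computes the mean over the present values and substitutes it for the missing ones. Option[Double] stands in for a numeric feature, the names are hypothetical, and the 0.0 fallback for an all-empty input is an assumption of this sketch rather than documented library behavior.

```scala
object FillMeanSketch {
  def fillWithMean(values: Seq[Option[Double]]): Seq[Double] = {
    val present = values.flatten
    // Assumption: fall back to 0.0 when there is nothing to average.
    val mean = if (present.isEmpty) 0.0 else present.sum / present.size
    values.map(_.getOrElse(mean))
  }
}
```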
final class FillMissingWithMeanModel[I <: OPNumeric[_]] extends UnaryModel[I, RealNN]
class FilterMap[I <: OPMap[_]] extends UnaryTransformer[I, I] with MapPivotParams with TextParams with CleanTextMapFun

Filters maps by keys provided in an allowlist or blocklist

I

input feature type
class FilterTransformer[I <: FeatureType] extends UnaryTransformer[I, I]
class FloorTransformer[I <: OPNumeric[_]] extends UnaryTransformer[I, Integral]

Floor transformer

I

input feature type
sealed class GenderDetectStrategy extends EnumEntry

Defines the different kinds of gender detection strategies that are possible.

We need to override toString in order to provide serialization during the Spark map and reduce steps; the fromString function then provides deserialization back to the GenderDetectStrategy class for the companion transformer.
class GeolocationMapVectorizer extends SequenceEstimator[GeolocationMap, OPVector] with MapVectorizerFuns[Seq[Double], GeolocationMap] with TrackNullsParam
final class GeolocationMapVectorizerModel extends SequenceModel[GeolocationMap, OPVector] with CleanTextMapFun
class GeolocationVectorizer extends SequenceEstimator[Geolocation, OPVector] with VectorizerDefaults with TrackNullsParam with GeolocationFunctions

Converts a sequence of Geolocation features into a vector feature. Can choose to fill null values with the mean or a constant
final class GeolocationVectorizerModel extends SequenceModel[Geolocation, OPVector] with VectorizerDefaults
sealed trait HashAlgorithm extends EnumEntry with Serializable

Hashing Algorithms
sealed trait HashSpaceStrategy extends EnumEntry with Serializable

Hash space strategy
case class HashingFunctionParams(hashWithIndex: Boolean, prependFeatureName: Boolean, numFeatures: Int, numInputs: Int, maxNumOfFeatures: Int, binaryFreq: Boolean, hashAlgorithm: HashAlgorithm, hashSpaceStrategy: HashSpaceStrategy) extends Product with Serializable

Hashing Parameters

hashWithIndex

if true, include indices when hashing a feature that has them (OPLists or OPVectors)

prependFeatureName

if true, prepends an input feature name to each token of that feature

numFeatures

number of features (hashes) to generate

numInputs

number of inputs

maxNumOfFeatures

max number of features (hashes)

binaryFreq

if true, term frequency vector will be binary such that non-zero term counts will be set to 1.0

hashAlgorithm

hash algorithm to use

hashSpaceStrategy

strategy to determine whether to use shared hash space for all included features
class HumanNameDetector[T <: Text] extends UnaryEstimator[T, NameStats] with NameDetectFun[T]

Unary estimator for identifying whether a single Text column is a name or not. If the column does appear to be a name, a custom map will be returned that contains the guessed gender for each entry (gender detection only supported for English at the moment). If the column does not appear to be a name, then the output will be an empty map.

T

the FeatureType (subtype of Text) to operate over
case class HumanNameDetectorMetadata(treatAsName: Boolean, predictedNameProb: Double, genderResultsByStrategy: Map[String, GenderStats]) extends MetadataLike with Product with Serializable
class HumanNameDetectorModel[T <: Text] extends UnaryModel[T, NameStats] with NameDetectFun[T]
sealed abstract class Inclusion extends EnumEntry with Serializable
sealed trait IndexToStringHandleInvalid extends EnumEntry with Serializable
class IntegralMapVectorizer[T <: OPMap[Long]] extends OPMapVectorizer[Long, T]

Class for vectorizing IntegralMap features. Fills missing keys with the mode for that key.

T

input feature type to vectorize into an OPVector
final class IntegralMapVectorizerModel[T <: OPMap[Long]] extends OPMapVectorizerModel[Long, T]
class IntegralVectorizer[T <: Integral] extends SequenceEstimator[T, OPVector] with VectorizerDefaults with TrackNullsParam

Converts a sequence of Integral features into a vector feature. Can choose to fill null values with the mean or a constant
final class IntegralVectorizerModel[T <: Integral] extends SequenceModel[T, OPVector] with VectorizerDefaults
class IsValidPhoneDefaultCountry extends UnaryTransformer[Phone, Binary] with PhoneParams

Transformer to determine if a phone number is valid when no country code is available. The default locale will be used for validation. All phone numbers with fewer than 2 characters will be categorized as invalid. All phone numbers that start with "+" will be evaluated with international formatting.

Returns a binary feature: true if the phone is valid, false if invalid, and none if the phone number is none.
class IsValidPhoneMapDefaultCountry extends UnaryTransformer[PhoneMap, BinaryMap] with PhoneParams

Transformer to determine if a map of phone numbers is valid when no country code is available. The default locale will be used for validation. All phone numbers with fewer than 2 characters will be categorized as invalid. All phone numbers that start with "+" will be evaluated with international formatting.

Returns a binary map feature: true if the phone is valid, false if invalid, and none if the phone number is none.
class IsValidPhoneNumber extends BinaryTransformer[Phone, Text, Binary] with PhoneCountryParams

Determine whether a phone number is valid given the country's regional code. By default the regional code will be checked against those provided in Google's PhoneNumber library. If the input regional code is not found, the default locale will be used for validation.

If the user provided a country name to code mapping, the phone number can only be validated against the input mapping. This transformer will first match on regional code; failing that, it will select the country with the closest Q-Distance.

All phone numbers with fewer than 2 characters will be categorized as invalid.

All phone numbers that start with "+" will be evaluated with international formatting.

Returns a binary feature: true if the phone is valid, false if invalid, and none if the phone number is none.
class JaccardSimilarity extends BinaryTransformer[MultiPickList, MultiPickList, RealNN]

Calculates the Jaccard Similarity between two sets. If both inputs are empty, Jaccard Similarity is defined as 1.0
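
For reference, the definition above can be sketched on plain Scala sets. Set[String] stands in for MultiPickList, and the object and method names are hypothetical; the only library-specific detail taken from the description is that two empty inputs give 1.0.

```scala
object JaccardSketch {
  // |A intersect B| / |A union B|, with the empty/empty case defined as 1.0
  def jaccard(a: Set[String], b: Set[String]): Double =
    if (a.isEmpty && b.isEmpty) 1.0
    else (a intersect b).size.toDouble / (a union b).size
}
```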
class LangDetector[T <: Text] extends UnaryTransformer[T, RealMap]

Transformer that detects the language of the text
trait LanguageDetectionParams extends Params
case class LinearScaler(args: LinearScalerArgs) extends Scaler with Product with Serializable

A case class representing a linear scaling function

args

case class containing the slope and intercept of the scaling function
case class LinearScalerArgs(slope: Double, intercept: Double) extends ScalingArgs with Product with Serializable

Parameters needed to uniquely define a linear scaling function

slope

the slope of the linear scaler

intercept

the x axis intercept of the linear scaler
case class LogScaler() extends Scaler with Product with Serializable

A case class representing a logarithmic scaling function
class LogTransformer[I <: OPNumeric[_]] extends UnaryTransformer[I, Real]

Log base N transformer

I

input feature type
trait MapPivotParams extends Params
trait MapStringPivotHelper extends SaveOthersParams
trait MapVectorizerFuns[A, T <: OPMap[A]] extends VectorizerDefaults with MapPivotParams with CleanTextMapFun
trait MaxCardinalityParams extends Params
trait MaxPctCardinalityParams extends Params
class MimeTypeDetector extends UnaryTransformer[Base64, Text] with MimeTypeDetectorParams

Detects MIME type for Base64 encoded binary data.
class MimeTypeMapDetector extends UnaryTransformer[Base64Map, PickListMap] with MimeTypeDetectorParams

Detects MIME type for Base64Map encoded binary data.
trait MinLengthStdDevParams extends Params
trait MinSupportParam extends Params
class MultiLabelJoiner extends BinaryTransformer[RealNN, OPVector, RealMap]

Joins probability score with label from string indexer stage

returns

Map(label -> probability)
class MultiPickListMapVectorizer[T <: OPMap[Set[String]]] extends SequenceEstimator[T, OPVector] with PivotParams with MapPivotParams with TextParams with MapStringPivotHelper with CleanTextMapFun with MinSupportParam with TrackNullsParam with MaxPctCardinalityParams with MaxPctCardinalityFun

Converts a sequence of KeyMultiPickList features into a vector keeping the top K most common occurrences of each key in the maps for that feature (i.e. the final vector has length K * number of keys * number of features). Each key found will also generate an "other" column which will capture values that do not make the cut or were not seen in training. Note that any keys not seen in training will be ignored.
final class MultiPickListMapVectorizerModel[T <: OPMap[Set[String]]] extends SequenceModel[T, OPVector] with CleanTextMapFun
class MultiplyTransformer[I1 <: OPNumeric[_], I2 <: OPNumeric[_]] extends BinaryTransformer[I1, I2, Real]

Multiply function truth table (Real as example):

Real.empty * Real.empty = Real.empty
Real.empty * Real(x) = Real.empty
Real(x) * Real.empty = Real.empty
Real(x) * Real(y) = Real(x * y) filter ("is not NaN or Infinity")
class NameEntityRecognizer[T <: Text] extends UnaryTransformer[T, MultiPickListMap] with LanguageDetectionParams

Name Entity NameEntityType text recognizer.

Note: when providing your own analyzer/splitter/tagger, make sure they can work together; for instance, OpenNLP models require their own analyzers to be provided when tokenizing. The returned feature type is a MultiPickListMap which contains sets of entities for all the tokens.

T

text feature type
class NumericBucketizer[I1 <: OPNumeric[_]] extends UnaryTransformer[I1, OPVector] with VectorizerDefaults with NumericBucketizerParams with NumericBucketizerMetadata

Numeric Bucketizer

I1

numeric feature type
trait NumericBucketizerParams extends TrackInvalidParam with TrackNullsParam
sealed trait NumericMapDefaultParam extends Params
class OPCollectionHashingVectorizer[T <: OPCollection] extends SequenceTransformer[T, OPVector] with VectorizerDefaults with PivotParams with CleanTextFun with HashingFun with HashingVectorizerParams

Generic hashing vectorizer to convert features of type OPCollection into Vectors.

In more detail: it hashes the entries in the collection using the specified hashing algorithm to build a single vector. If the desired number of features (= hash space size) for all features combined is larger than Integer.Max (the maximal index for a vector), then all the features use the same hash space. There are also options for the user to hash indices with collections where that makes sense (OPLists and OPVectors), and to force a shared hash space, even if the number of features is not high enough to require it.
sealed abstract class OPCollectionTransformer[I <: FeatureType, O <: FeatureType, ICol <: OPCollection, OCol <: OPCollection] extends UnaryTransformer[ICol, OCol]

Abstract base class for a set of transformer wrappers that allow unary transformers between non-collection types to be used on collection types. For example, we can use a UnaryLambdaTransformer[Email, Integer] on a map's values, creating a UnaryLambdaTransformer[EmailMap, IntegralMap]. This base class will be inherited by concrete classes for OPMaps, OPList, and OPSets (in order to enforce not allowing these collection types to be transformed into each other, e.g. no MultiPickList to RealMap transformations).

The OP type hierarchy does not allow direct type checking of such transformer wrappers (e.g. Real#Value is Option[Double] and RealMap#Value is Map[String, Double], so there's no way to enforce that a RealMap can only hold what is contained in a Real) since the types themselves are not created with typetags for performance reasons. However, we can still enforce that operations like building a UnaryLambdaTransformer[RealMap, StringMap] from a UnaryLambdaTransformer[Real, Integer] are not possible by using the Spark types in validateTypes.

I

input feature type for supplied non-collection transformer

O

output feature type for supplied non-collection transformer

ICol

input feature type for desired collection transformer

OCol

output feature type for desired collection transformer
abstract class OPMapVectorizer[A, T <: OPMap[A]] extends SequenceEstimator[T, OPVector] with MapVectorizerFuns[Double, RealMap] with NumericMapDefaultParam with TrackNullsParam

Base class for vectorizing OPMap[A] features. Individual vectorizers for different feature types need to implement the getFillByKey function (which calculates any fill values that differ by key - means, modes, etc.) and the makeModel function (which specifies which type of model will be returned).

A

value type for underlying map

T

input feature type to vectorize into an OPVector
sealed abstract class OPMapVectorizerModel[A, I <: OPMap[A]] extends SequenceModel[I, OPVector] with CleanTextMapFun
sealed case class OPMapVectorizerModelArgs(allKeys: Seq[Seq[String]], fillByKey: Seq[Map[String, Double]], shouldCleanKeys: Boolean, shouldCleanValues: Boolean, defaultValue: Double, trackNulls: Boolean = TransmogrifierDefaults.TrackNulls) extends Product with Serializable

OPMap vectorizer model arguments

allKeys

all keys per feature

fillByKey

fill values for features

shouldCleanKeys

should clean map keys

shouldCleanValues

should clean map values

defaultValue

default value to replace with

trackNulls

add column to track null values for each map key
class OpCountVectorizer extends OpEstimatorWrapper[TextList, OPVector, CountVectorizer, CountVectorizerModel]

Wrapper around spark ml CountVectorizer for use with OP pipelines
class OpHashingTF extends OpTransformerWrapper[TextList, OPVector, HashingTF]

Wrapper for org.apache.spark.ml.feature.HashingTF

Maps a sequence of terms to their term frequencies using the hashing trick. Currently we use Austin Appleby's MurmurHash 3 algorithm (MurmurHash3_x86_32) to calculate the hash code value for the term object. Since a simple modulo is used to transform the hash function to a column index, it is advisable to use a power of two as the numFeatures parameter; otherwise the features will not be mapped evenly to the columns.

See also

HashingTF for more info
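
The hashing trick described above can be sketched as follows. This is an illustration only: it uses scala.util.hashing.MurmurHash3.stringHash from the Scala standard library as a stand-in, which is not guaranteed to produce the same hash values or column indices as Spark's HashingTF, and the object and method names are hypothetical.

```scala
import scala.util.hashing.MurmurHash3

object HashingSketch {
  // Maps each term to a column via hash modulo numFeatures and counts occurrences.
  def termFrequencies(terms: Seq[String], numFeatures: Int): Array[Double] = {
    val vec = Array.fill(numFeatures)(0.0)
    terms.foreach { term =>
      val hash = MurmurHash3.stringHash(term)
      // The hash may be negative, so normalize the remainder into [0, numFeatures).
      val index = ((hash % numFeatures) + numFeatures) % numFeatures
      vec(index) += 1.0
    }
    vec
  }
}
```

Different terms can collide in the same column, which is why the number of features is usually chosen well above the expected vocabulary size.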
class OpIndexToString extends OpTransformerWrapper[RealNN, Text, IndexToString]

Wrapper for org.apache.spark.ml.feature.IndexToString

NOTE THAT THIS CLASS EITHER FILTERS OUT OR THROWS AN ERROR IF PREVIOUSLY UNSEEN VALUES APPEAR

A transformer that maps a feature of indices back to a new feature of corresponding text values. The index-string mapping is either from the ML attributes of the input feature, or from user-supplied labels (which take precedence over ML attributes).

See also

OpStringIndexer for converting text into indices
class OpIndexToStringNoFilter extends UnaryTransformer[RealNN, Text] with SaveOthersParams

A transformer that maps a feature of indices back to a new feature of corresponding text values. The index-string mapping is either from the ML attributes of the input feature, or from user-supplied labels (which take precedence over ML attributes).

See also

OpStringIndexerNoFilter for converting text into indices
class OpLDA extends OpEstimatorWrapper[OPVector, OPVector, LDA, LDAModel]

Wrapper around spark ml LDA (Latent Dirichlet Allocation) for use with OP pipelines
class OpNGram extends OpTransformerWrapper[TextList, TextList, NGram]

Wrapper for org.apache.spark.ml.feature.NGram

A feature transformer that converts the input array of strings into an array of n-grams. Null values in the input array are ignored. It returns an array of n-grams where each n-gram is represented by a space-separated string of words.

When the input is empty, an empty array is returned. When the input array length is less than n (number of elements per n-gram), no n-grams are returned.

See also

NGram for more info
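
The n-gram behavior described for OpNGram can be approximated on a plain Scala sequence. This is a hypothetical sketch, not the Spark implementation; it reproduces the two edge cases from the description (empty input, and input shorter than n) and does not model null handling.

```scala
object NGramSketch {
  def ngrams(tokens: Seq[String], n: Int): List[String] =
    tokens.sliding(n)
      .filter(_.size == n)   // sliding yields a short window when tokens.size < n
      .map(_.mkString(" "))  // each n-gram is a space-separated string
      .toList
}
```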
abstract class OpOneHotVectorizer[T <: FeatureType] extends SequenceEstimator[T, OPVector] with PivotParams with CleanTextFun with SaveOthersParams with TrackNullsParam with MinSupportParam with OneHotFun with MaxPctCardinalityParams

Converts a sequence of features into a vector keeping the top K most common occurrences of each feature (i.e. the final vector has length K * number of inputs), plus an additional column for "other" values, which captures values that do not make the cut or values not seen in training, and an additional column for empty values unless null tracking is disabled.
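
A rough sketch of the top K pivot for a single input, assuming Option[String] as a stand-in for a text feature. All names are hypothetical, ties are broken alphabetically as an assumption of this sketch, and null tracking is always on here; the library's estimators expose more options (min support, text cleaning, and so on) that are not modeled.

```scala
object TopKPivotSketch {
  // "Fit": pick the K most frequent non-empty values (ties broken alphabetically).
  def fitTopK(values: Seq[Option[String]], k: Int): Seq[String] =
    values.flatten.groupBy(identity).toSeq
      .map { case (value, occurrences) => (value, occurrences.size) }
      .sortBy { case (value, count) => (-count, value) }
      .take(k)
      .map(_._1)

  // "Transform": K columns, then one "other" column, then one null-indicator column.
  def pivot(value: Option[String], top: Seq[String]): Array[Double] = {
    val vec = Array.fill(top.size + 2)(0.0)
    value match {
      case None => vec(top.size + 1) = 1.0
      case Some(v) =>
        val i = top.indexOf(v)
        vec(if (i >= 0) i else top.size) = 1.0
    }
    vec
  }
}
```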
abstract class OpOneHotVectorizerModel[T <: FeatureType] extends SequenceModel[T, OPVector] with CleanTextFun with OneHotModelFun[T]
class OpScalarStandardScaler extends UnaryEstimator[RealNN, RealNN]

Wraps Spark's native StandardScaler, which operates on vectors, to enable it to operate directly on scalars.
final class OpScalarStandardScalerModel extends UnaryModel[RealNN, RealNN]
class OpSetVectorizer[T <: OPSet[_]] extends OpOneHotVectorizer[T]

Converts a sequence of OpSet features into a vector keeping the top K most common occurrences of each feature (i.e. the final vector has length K * number of inputs), plus an additional column for "other" values, which captures values that do not make the cut or values not seen in training, and an additional column for empty values unless null tracking is disabled.
final class OpSetVectorizerModel[T <: OPSet[_]] extends OpOneHotVectorizerModel[T]
class OpStopWordsRemover extends OpTransformerWrapper[TextList, TextList, StopWordsRemover]

Wrapper for org.apache.spark.ml.feature.StopWordsRemover

Wrapper for org.apache.spark.ml.feature.StopWordsRemover

A feature transformer that filters out stop words from input.

Note

null values from input array are preserved unless adding null to stopWords explicitly.

See also

StopWordsRemover for more info

Stop words (Wikipedia)
class OpStringIndexer[T <: Text] extends OpEstimatorWrapper[T, RealNN, StringIndexer, StringIndexerModel]

Wrapper for org.apache.spark.ml.feature.StringIndexer

NOTE THAT THIS CLASS EITHER FILTERS OUT OR THROWS AN ERROR IF PREVIOUSLY UNSEEN VALUES APPEAR

A label indexer that maps a text column of labels to an ML feature of label indices. The indices are in [0, numLabels), ordered by label frequencies. So the most frequent label gets index 0.

See also

OpIndexToString for the inverse transformation
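
The frequency-ordered indexing described above, and its inverse, can be sketched on plain collections. This is an illustration with hypothetical names, not the Spark StringIndexer; alphabetical tie-breaking and returning None for unknown indices are assumptions of the sketch.

```scala
object LabelIndexSketch {
  // Most frequent label gets index 0; ties broken alphabetically (assumption).
  def fit(labels: Seq[String]): Vector[String] =
    labels.groupBy(identity).toVector
      .map { case (label, occurrences) => (label, occurrences.size) }
      .sortBy { case (label, count) => (-count, label) }
      .map(_._1)

  def toIndex(ordered: Vector[String], label: String): Option[Int] = {
    val i = ordered.indexOf(label)
    if (i >= 0) Some(i) else None // unseen labels have no index in this sketch
  }

  // Inverse mapping from an index back to the label text.
  def toLabel(ordered: Vector[String], index: Int): Option[String] =
    ordered.lift(index)
}
```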
class OpStringIndexerNoFilter[I <: Text] extends UnaryEstimator[I, RealNN] with SaveOthersParams

A label indexer that maps a text column of labels to an ML feature of label indices. The indices are in [0, numLabels), ordered by label frequencies. So the most frequent label gets index 0.

See also

OpIndexToStringNoFilter for the inverse transformation
final class OpStringIndexerNoFilterModel[I <: Text] extends UnaryModel[I, RealNN]
class OpTextPivotVectorizer[T <: Text] extends OpOneHotVectorizer[T]

Converts a sequence of Text features into a vector keeping the top K most common occurrences of each feature (i.e. the final vector has length K * number of inputs), plus an additional column for "other" values, which captures values that do not make the cut or values not seen in training, and an additional column for empty values unless null tracking is disabled.
final class OpTextPivotVectorizerModel[T <: Text] extends OpOneHotVectorizerModel[T]
class OpWord2Vec extends OpEstimatorWrapper[TextList, OPVector, Word2Vec, Word2VecModel]

Wrapper around spark ml word2vec for use with OP pipelines
class ParsePhoneDefaultCountry extends UnaryTransformer[Phone, Phone] with PhoneParams

Transformer to determine if a phone number is valid when no country code is available. The default locale will be used for validation. All phone numbers with fewer than 2 characters will be categorized as invalid. All phone numbers that start with "+" will be evaluated with international formatting.

Returns the stripped number if the number is valid, and None otherwise.
class ParsePhoneNumber extends BinaryTransformer[Phone, Text, Phone] with PhoneCountryParams

Determine whether a phone number is valid given the country's regional code. By default the regional code will be checked against those provided in Google's PhoneNumber library. If the input regional code is not found, the default locale will be used for validation.

If the user provides a country-name-to-code mapping, the phone number can only be validated against that mapping. The transformer first matches on regional code; failing that, it selects the country with the closest Q-Distance.

All phone numbers with fewer than 2 characters are categorized as invalid.

All phone numbers that start with "+" are evaluated with international formatting.

Returns the stripped number if the number is valid, and None otherwise.
class PercentileCalibrator extends UnaryEstimator[RealNN, RealNN]

Wraps around org.apache.spark.ml.feature.QuantileDiscretizer
final class PercentileCalibratorModel extends UnaryModel[RealNN, RealNN]
trait PhoneCountryParams extends PhoneParams
trait PhoneParams extends Params
trait PivotParams extends TextParams
class PowerTransformer[I <: OPNumeric[_]] extends UnaryTransformer[I, Real]

Power transformer

I

input feature type
final class PredictionDescaler[I <: Real, O <: Real] extends BinaryTransformer[Prediction, I, O]

Applies to the input column the inverse of the scaling function defined in the Prediction feature metadata.

- 1st input feature is the Prediction feature to descale

- 2nd input feature is scaled Prediction feature containing the metadata for constructing the scaling used to make this column

I

input feature type

O

output feature type
class RealMapVectorizer[T <: OPMap[Double]] extends OPMapVectorizer[Double, T]

Class for vectorizing RealMap features. Fills missing keys with the mean for that key.

T

input feature type to vectorize into an OPVector
final class RealMapVectorizerModel[T <: OPMap[Double]] extends OPMapVectorizerModel[Double, T]
class RealNNVectorizer extends SequenceTransformer[RealNN, OPVector] with VectorizerDefaults

Converts a sequence of real non nullable features into a vector feature
class RealVectorizer[T <: Real] extends SequenceEstimator[T, OPVector] with VectorizerDefaults with TrackNullsParam

Converts a sequence of Nullable Numeric features into a vector feature. Null values can be filled with the mean or a constant.
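A minimal sketch of the fill step, in plain Scala. The names `RealFillSketch`, `mean`, and `vectorize` are hypothetical. The two-slot layout when nulls are tracked and the 0.0 fallback for an all-empty column are assumptions for illustration.

```scala
object RealFillSketch {
  // Mean of the non-empty values; 0.0 when the column has no values (an assumption).
  def mean(column: Seq[Option[Double]]): Double = {
    val present = column.flatten
    if (present.isEmpty) 0.0 else present.sum / present.size
  }

  // Each input becomes one slot, or two when nulls are tracked: [value, wasNull].
  // `fills` holds one fill value per input: a fitted mean or a chosen constant.
  def vectorize(row: Seq[Option[Double]], fills: Seq[Double], trackNulls: Boolean): Seq[Double] =
    row.zip(fills).flatMap { case (value, fill) =>
      val filled = value.getOrElse(fill)
      if (trackNulls) Seq(filled, if (value.isEmpty) 1.0 else 0.0) else Seq(filled)
    }
}
```

Passing the per-column means as `fills` corresponds to mean filling; passing a repeated constant corresponds to constant filling.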
final class RealVectorizerModel[T <: Real] extends SequenceModel[T, OPVector] with VectorizerDefaults
class ReplaceTransformer[I <: FeatureType] extends UnaryTransformer[I, I]
class RoundDigitsTransformer[I <: OPNumeric[_]] extends UnaryTransformer[I, Real]

Round digits transformer

I

input feature type
class RoundTransformer[I <: OPNumeric[_]] extends UnaryTransformer[I, Integral]

Round transformer

I

input feature type
trait SaveOthersParams extends Params
class ScalarAddTransformer[I <: OPNumeric[_], N] extends UnaryTransformer[I, Real]

Scalar addition transformer

I

input feature type

N

value type
class ScalarDivideTransformer[I <: OPNumeric[_], N] extends UnaryTransformer[I, Real]

Scalar divide transformer

I

input feature type

N

value type
class ScalarMultiplyTransformer[I <: OPNumeric[_], N] extends UnaryTransformer[I, Real]

Scalar multiply transformer

I

input feature type

N

value type
class ScalarSubtractTransformer[I <: OPNumeric[_], N] extends UnaryTransformer[I, Real]

Scalar subtract transformer

I

input feature type

N

value type
trait Scaler extends Serializable

A trait for defining a new family of scaling functions:

scalingType: a ScalingType enum for the scaling name

args: a case class containing the args needed to define scaling and inverse scaling functions

scale: the scaling function

descale: the inverse scaling function

To add a new family of scaling functions: add an entry to the ScalingType enum, define a case class extending Scaler, and add a case statement to both Scaler and ScalerMetadata
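The pattern of a scaling family with an invertible function, rebuilt from stored metadata, can be sketched as follows. All names here (`ScalerSketch`, `Kind`, `SketchScaler`, `LinearSketch`, `LogSketch`, `fromMetadata`) are simplified, hypothetical stand-ins and do not reproduce the library's actual signatures.

```scala
object ScalerSketch {
  // Hypothetical stand-in for the scaling type enum.
  sealed trait Kind
  case object Linear extends Kind
  case object Logarithmic extends Kind

  // Hypothetical stand-in for the scaler trait: a function and its inverse.
  trait SketchScaler extends Serializable {
    def kind: Kind
    def scale(v: Double): Double
    def descale(v: Double): Double
  }

  final case class LinearSketch(slope: Double, intercept: Double) extends SketchScaler {
    require(slope != 0.0, "slope must be non-zero so the function can be inverted")
    val kind: Kind = Linear
    def scale(v: Double): Double = slope * v + intercept
    def descale(v: Double): Double = (v - intercept) / slope
  }

  case object LogSketch extends SketchScaler {
    val kind: Kind = Logarithmic
    def scale(v: Double): Double = math.log(v)
    def descale(v: Double): Double = math.exp(v)
  }

  // Rebuild a scaler from stored "metadata": the kind plus its arguments.
  // Adding a new family means adding a Kind and a case here.
  def fromMetadata(kind: Kind, args: Seq[Double]): SketchScaler = kind match {
    case Linear      => LinearSketch(args(0), args(1))
    case Logarithmic => LogSketch
  }
}
```

Storing only the kind and its arguments is what lets a later stage reconstruct the inverse function without access to the original scaler instance.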
case class ScalerMetadata(scalingType: ScalingType, scalingArgs: ScalingArgs) extends Product with Serializable

Metadata containing the info needed to reconstruct a Scaler instance

scalingType

the family of functions containing the scaler

scalingArgs

the args uniquely defining a function in the scaling family
final class ScalerTransformer[I <: Real, O <: Real] extends UnaryTransformer[I, O]

Scaling transformer that applies a scaling function to a numerical feature

I

input feature type

O

output feature type
trait ScalingArgs extends JsonLike

A trait to be extended by a case class containing the args needed to define a family of scaling & descaling functions
sealed trait ScalingType extends EnumEntry with Serializable
class SetNGramSimilarity extends NGramSimilarity[MultiPickList]

Compute char ngram distance for MultiPickList features.
case class SmartTextFeatureInfo(key: String, vectorizationMethod: TextVectorizationMethod, topValues: Array[String]) extends JsonLike with Product with Serializable

Info about each feature within a text map

key

name of a feature

vectorizationMethod

method to use for text vectorization (either pivot, hashing, or ignoring)

topValues

most common values of a feature (only for categoricals)
class SmartTextMapVectorizer[T <: OPMap[String]] extends SequenceEstimator[T, OPVector] with PivotParams with CleanTextFun with SaveOthersParams with TrackNullsParam with MinSupportParam with TextTokenizerParams with TrackTextLenParam with HashingVectorizerParams with MapHashingFun with OneHotFun with MapStringPivotHelper with MapVectorizerFuns[String, OPMap[String]] with MaxCardinalityParams with MinLengthStdDevParams with NameDetectFun[Text]

Convert a sequence of text map features into a vector by detecting categoricals that are disguised as text. A categorical will be represented as a vector consisting of occurrences of top K most common values of that feature plus occurrences of non top k values and a null indicator (if enabled). Non-categoricals will be converted into a vector using the hashing trick. In addition, a null indicator is created for each non-categorical (if enabled).

Detection and removal of names in the input columns can be enabled with the sensitiveFeatureMode param.
final class SmartTextMapVectorizerModel[T <: OPMap[String]] extends SequenceModel[T, OPVector] with TextTokenizerParams with TrackTextLenParam with MapHashingFun with TextMapPivotVectorizerModelFun[OPMap[String]]
case class SmartTextMapVectorizerModelArgs(allFeatureInfo: Seq[Seq[SmartTextFeatureInfo]], shouldCleanKeys: Boolean, shouldCleanValues: Boolean, shouldTrackNulls: Boolean, hashingParams: HashingFunctionParams) extends JsonLike with Product with Serializable

Arguments for SmartTextMapVectorizerModel

allFeatureInfo

info about each feature with each text map

shouldCleanKeys

should clean feature keys

shouldCleanValues

should clean feature values

shouldTrackNulls

should track nulls

hashingParams

hashing function params
class SmartTextVectorizer[T <: Text] extends SequenceEstimator[T, OPVector] with PivotParams with CleanTextFun with SaveOthersParams with TrackNullsParam with MinSupportParam with TextTokenizerParams with TrackTextLenParam with HashingVectorizerParams with HashingFun with OneHotFun with MaxCardinalityParams with MinLengthStdDevParams with NameDetectFun[T]

Convert a sequence of text features into a vector by detecting categoricals that are disguised as text. A categorical will be represented as a vector consisting of occurrences of top K most common values of that feature plus occurrences of non top k values and a null indicator (if enabled). Non-categoricals will be converted into a vector using the hashing trick. In addition, a null indicator is created for each non-categorical (if enabled).

Detection and removal of names in the input columns can be enabled with the sensitiveFeatureMode param.
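The split between categoricals and free text can be illustrated with a small plain-Scala sketch. The single cardinality threshold used in `chooseMethod` is an assumption for illustration only (the real estimator exposes several params), and `SmartTextSketch`, `chooseMethod`, and `hashTokens` are hypothetical names.

```scala
import scala.util.hashing.MurmurHash3

object SmartTextSketch {
  sealed trait Method
  case object Pivot extends Method
  case object Hash extends Method

  // Treat a text column as categorical when it has few distinct values.
  // The single-threshold rule is an assumption for illustration.
  def chooseMethod(column: Seq[Option[String]], maxCardinality: Int): Method =
    if (column.flatten.distinct.size <= maxCardinality) Pivot else Hash

  // Hashing trick: each token increments the slot chosen by its hash.
  def hashTokens(tokens: Seq[String], numFeatures: Int): Array[Double] = {
    val out = Array.fill(numFeatures)(0.0)
    tokens.foreach { t =>
      val h = MurmurHash3.stringHash(t)
      val slot = ((h % numFeatures) + numFeatures) % numFeatures // non-negative modulo
      out(slot) += 1.0
    }
    out
  }
}
```

Columns chosen for `Pivot` would then be handled like a top-K pivot, while `Hash` columns produce a fixed-width vector regardless of vocabulary size.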
final class SmartTextVectorizerModel[T <: Text] extends SequenceModel[T, OPVector] with TextTokenizerParams with TrackTextLenParam with HashingFun with OneHotModelFun[Text]
case class SmartTextVectorizerModelArgs(vectorizationMethods: Array[TextVectorizationMethod], topValues: Array[Seq[String]], shouldCleanText: Boolean, shouldTrackNulls: Boolean, hashingParams: HashingFunctionParams) extends JsonLike with Product with Serializable

Arguments for SmartTextVectorizerModel

vectorizationMethods

method to use for text vectorization (either pivot, hashing, or ignoring)

topValues

top values to each feature

shouldCleanText

should clean text value

shouldTrackNulls

should track nulls

hashingParams

hashing function params
class SqrtTransformer[I <: OPNumeric[_]] extends UnaryTransformer[I, Real]

Square root transformer

I

input feature type
sealed trait StringIndexerHandleInvalid extends EnumEntry with Serializable
class SubstringTransformer[I1 <: Text, I2 <: Text] extends BinaryTransformer[I1, I2, Binary] with TextMatchingParams

Checks if the first input is a substring of the second input

I1

first input feature type

I2

second input feature type
class SubtractTransformer[I1 <: OPNumeric[_], I2 <: OPNumeric[_]] extends BinaryTransformer[I1, I2, Real]

Minus function truth table (Real as example):

Real.empty - Real.empty = Real.empty
Real.empty - Real(x) = Real(-x)
Real(x) - Real.empty = Real(x)
Real(x) - Real(y) = Real(x - y)
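The same truth table, modeled with `Option[Double]` standing in for Real. This is a sketch of the semantics only, and `MinusSketch` and `minus` are hypothetical names.

```scala
object MinusSketch {
  // Mirrors the truth table: an empty operand behaves like zero,
  // except that two empty operands stay empty.
  def minus(a: Option[Double], b: Option[Double]): Option[Double] = (a, b) match {
    case (None, None)       => None
    case (None, Some(y))    => Some(-y)
    case (Some(x), None)    => Some(x)
    case (Some(x), Some(y)) => Some(x - y)
  }
}
```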
class TextLenTransformer[T <: TextList] extends SequenceTransformer[T, OPVector] with VectorizerDefaults with TextTokenizerParams with TextParams

Sequence transformer for generating a sequence of text lengths from a sequence of TextList values (e.g. tokenized raw text)
sealed trait TextLengthType extends EnumEntry with Serializable

Method for computing text lengths
class TextListNullTransformer[T <: TextList] extends SequenceTransformer[T, OPVector] with VectorizerDefaults

Creates null indicator columns for a sequence of input TextList features. Originally intended for use as a separate stage in null tracking for hashed text features: this is easier to do outside the hashing vectorizer, since a null indicator column can be made for each input feature without adding complex logic to the hashing vectorizer to deal with metadata for shared vs. separate hash spaces.
class TextMapHashingVectorizer[T <: OPMap[String]] extends OPMapVectorizer[String, T] with TextParams
final class TextMapHashingVectorizerModel[T <: OPMap[String]] extends OPMapVectorizerModel[String, T] with TextTokenizerParams with HashingFun
class TextMapLenEstimator[T <: OPMap[String]] extends SequenceEstimator[T, OPVector] with MapVectorizerFuns[String, T]

Estimator for computing text lengths on fields stored in text maps. Note that because there are no maps from String to TextList, we need to do the tokenization here (unlike the TextLenTransformer).
final class TextMapLenModel[T <: OPMap[String]] extends SequenceModel[T, OPVector] with CleanTextMapFun with TextTokenizerParams
class TextMapNullEstimator[T <: OPMap[String]] extends SequenceEstimator[T, OPVector] with MapVectorizerFuns[String, T]

Creates null indicator columns for a sequence of input TextMap features. Originally intended for use as a separate stage in null tracking for hashed text features: this is easier to do outside the hashing vectorizer, since a null indicator column can be made for each input feature without adding complex logic to the hashing vectorizer to deal with metadata for shared vs. separate hash spaces.
final class TextMapNullModel[T <: OPMap[String]] extends SequenceModel[T, OPVector] with CleanTextMapFun with TextTokenizerParams
class TextMapPivotVectorizer[T <: OPMap[String]] extends SequenceEstimator[T, OPVector] with PivotParams with MapPivotParams with TextParams with MapStringPivotHelper with CleanTextMapFun with MinSupportParam with TrackNullsParam with MaxPctCardinalityParams with MaxPctCardinalityFun

Converts a sequence of KeyString features into a vector, keeping the top K most common occurrences of each key in the maps for that feature (i.e. the final vector has length K * number of keys * number of features). Each key found also generates an "other" column, which captures values that do not make the cut or were not seen in training. Note that any keys not seen in training are ignored.
final class TextMapPivotVectorizerModel[T <: OPMap[String]] extends SequenceModel[T, OPVector] with TextMapPivotVectorizerModelFun[T]
trait TextMatchingParams extends Params
class TextNGramSimilarity[T <: Text] extends NGramSimilarity[T]

Compute char ngram distance for Text features.
trait TextParams extends Params
class TextTokenizer[T <: Text] extends UnaryTransformer[T, TextList] with TextTokenizerParams

Transformer that takes anything of type Text or lower and returns a TextList of tokens extracted from that text

Annotations

@ReaderWriter()
trait TextTokenizerParams extends LanguageDetectionParams with TextMatchingParams
class TextTokenizerReaderWriter[T <: Text] extends OpPipelineStageReaderWriter[TextTokenizer[T]]

Special reader/writer class for TextTokenizer stage
sealed trait TextVectorizationMethod extends EnumEntry with Serializable

Methods of vectorizing text (e.g. to be chosen by statistics computed in SmartTextVectorizer)
sealed abstract class TimePeriod extends EnumEntry with Serializable
class TimePeriodListTransformer[I <: DateList] extends UnaryTransformer[I, OPVector]

TimePeriodListTransformer extracts one of a set of time periods from a date/datetime list

I

input feature type
class TimePeriodMapTransformer[I <: DateMap] extends UnaryTransformer[I, IntegralMap]

TimePeriodMapTransformer extracts one of a set of time periods from a date/datetime map

I

input feature type
class TimePeriodTransformer[I <: Date] extends UnaryTransformer[I, Integral]

TimePeriodTransformer extracts one of a set of time periods from a date/datetime

I

input feature type
case class TimePeriodVal(value: Int, min: Int, max: Int) extends Product with Serializable
class ToOccurTransformer[I <: FeatureType] extends UnaryTransformer[I, RealNN]

Transformer that converts an input feature of type I into a "doolean" feature (a Boolean encoded as a double), using a user-specified function that maps object type I to a Boolean

I

Object type to be mapped to a double (doolean).
class TopNLabelJoiner extends MultiLabelJoiner

Joins the probability score with the label from the string indexer stage, sorts by highest score, and returns up to topN entries. The UnseenLabel class is filtered out.
class TopNLabelProbMap extends UnaryTransformer[RealMap, RealMap]

Sorts the label probability map and returns the topN.
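A minimal sketch of the sort-and-truncate step in plain Scala. `TopNSketch` and `topN` are hypothetical names. The optional `exclude` set is one way to model dropping an unseen-label class, and applying the filter before truncation is an assumption.

```scala
object TopNSketch {
  // Keep the n highest-probability labels, highest first.
  // Labels in `exclude` are dropped before truncation (an assumption about ordering).
  def topN(probs: Map[String, Double], n: Int, exclude: Set[String] = Set.empty): Seq[(String, Double)] =
    probs.toSeq
      .filterNot { case (label, _) => exclude(label) }
      .sortBy { case (label, p) => (-p, label) }
      .take(n)
}
```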
trait TrackInvalidParam extends Params

Param that decides whether or not the values that are considered invalid are tracked
trait TrackNullsParam extends Params

Param that decides whether or not the values that were missing are tracked
trait TrackTextLenParam extends Params

Param that decides whether or not lengths of text are tracked during vectorization
class UrlMapToPickListMapTransformer extends UnaryTransformer[URLMap, PickListMap]
class ValidEmailTransformer extends UnaryTransformer[Email, Binary]

Checks if an email is valid
trait VectorizerDefaults extends OpPipelineStageBase
class VectorsCombiner extends SequenceEstimator[OPVector, OPVector]

Takes in a sequence of vectors and combines them into a single vector
final class VectorsCombinerModel extends SequenceModel[OPVector, OPVector]

Value Members

object CombinationStrategy extends Enum[CombinationStrategy] with Serializable
object DateListPivot extends Enum[DateListPivot] with Serializable

Enumeration object that contains the option to pivot the DateList feature

1) SinceFirst - replace the feature by the number of days between the first event and reference date

2) SinceLast - replace the feature by the number of days between the last event and reference date

3) ModeDay - replace the feature by a pivot that indicates the mode of the day of the week. Example: if the mode is Monday, it returns (1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0)

4) ModeMonth - replace the feature by a pivot that indicates the mode of the month

5) ModeHour - replace the feature by a pivot that indicates the mode of the hour of the day.
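Three of these options can be sketched with `java.time`. This is an illustration under stated assumptions: events are taken to be epoch milliseconds interpreted in UTC, an empty list yields `None`, and `DateListPivotSketch` and its methods are hypothetical names rather than the library's API.

```scala
import java.time.{DayOfWeek, Instant, LocalDate, ZoneOffset}
import java.time.temporal.ChronoUnit

object DateListPivotSketch {
  // Assumes events are epoch milliseconds interpreted in UTC.
  private def toDate(ms: Long): LocalDate =
    Instant.ofEpochMilli(ms).atZone(ZoneOffset.UTC).toLocalDate

  // SinceFirst: whole days from the earliest event to the reference date.
  def sinceFirst(events: Seq[Long], reference: Long): Option[Long] =
    events.reduceOption(_ min _).map(first => ChronoUnit.DAYS.between(toDate(first), toDate(reference)))

  // SinceLast: whole days from the latest event to the reference date.
  def sinceLast(events: Seq[Long], reference: Long): Option[Long] =
    events.reduceOption(_ max _).map(last => ChronoUnit.DAYS.between(toDate(last), toDate(reference)))

  // ModeDay: one-hot over Monday..Sunday for the most frequent weekday.
  def modeDay(events: Seq[Long]): Option[Seq[Double]] =
    if (events.isEmpty) None
    else {
      val counts = events.map(ms => toDate(ms).getDayOfWeek).groupBy(identity).toSeq
        .map { case (day, hits) => (day, hits.size) }
      // Ties go to the earlier weekday (an assumption).
      val mode = counts.sortBy { case (day, n) => (-n, day.getValue) }.head._1
      Some(DayOfWeek.values().toSeq.map(d => if (d == mode) 1.0 else 0.0))
    }
}
```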
object DecisionTreeNumericBucketizer extends Product with Serializable
object EmailToPickListMapTransformer extends Serializable
object GenderDetectStrategy extends Enum[GenderDetectStrategy] with Product with Serializable
object HashAlgorithm extends Enum[HashAlgorithm] with Serializable
object HashSpaceStrategy extends Enum[HashSpaceStrategy] with Serializable
object HumanNameDetectorMetadata extends Product with Serializable
object Inclusion extends Enum[Inclusion] with Serializable
object IndexToStringHandleInvalid extends Enum[IndexToStringHandleInvalid] with Serializable
object LangDetector extends Serializable
object NGramSimilarity extends Serializable
object NameEntityRecognizer extends Serializable
object NumericBucketizer extends Serializable
object OpIndexToStringNoFilter extends Serializable
object OpOneHotVectorizer extends Serializable
object OpStringIndexerNoFilter extends Serializable
object PercentileCalibrator extends Product with Serializable
object PhoneNumberParser extends Product with Serializable
object Scaler extends Serializable

Scaler instance factory
object ScalerMetadata extends Serializable
object ScalingType extends Enum[ScalingType] with Serializable
object SmartTextVectorizer extends Serializable
object StringIndexerHandleInvalid extends Enum[StringIndexerHandleInvalid] with Serializable
object TextLengthType extends Enum[TextLengthType] with Serializable
object TextMapHashingVectorizerNames
object TextTokenizer extends Serializable
object TextVectorizationMethod extends Enum[TextVectorizationMethod] with Serializable
object TikaHelper

Tika helper
object TimePeriod extends Enum[TimePeriod] with Serializable
object ToOccurTransformer extends Serializable
object TopNLabelJoiner extends Serializable
object VectorizerUtils extends Product with Serializable
