FeatureLike
Vectorize text map features by treating low-cardinality text features as categoricals and applying the hashing trick to high-cardinality ones.
maximum cardinality for a text feature to be treated as categorical
number of features (hashes) to generate
indicates whether to attempt language detection
minimum token length, >= 1.
indicates whether to convert all characters to lowercase before analyzing
clean map keys before pivoting
indicates whether to ignore capitalization and punctuation
indicates whether or not to track null values in a separate column.
indicates whether or not to track the length of the text in a separate column
number of most common elements to be used as categorical pivots
minimum number of occurrences an element must have to appear in pivot
name to give indexes which do not have a label name associated with them
include indices when hashing a feature that has them (OPLists or OPVectors)
if true, the term frequency vector will be binary, such that non-zero term counts are set to 1.0
if true, prepends the input feature name to each token of that feature
Language detection threshold. If none of the detected languages has a confidence greater than the threshold, then defaultLanguage is used.
strategy to determine whether to use shared hash space for all included features
default language to assume in case autoDetectLanguage is disabled or fails to make a sufficiently confident prediction.
hash algorithm to use
additional text features
result feature of type Vector
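The per-key decision described above (pivot low-cardinality keys, hash high-cardinality ones) can be sketched in plain Scala. This is a minimal illustration under stated assumptions, not the library's implementation: `SmartTextMapSketch`, `bucket`, `hashText`, `pivot` and `vectorize` are hypothetical names, the tokenizer is a simple split on non-word characters, and a missing or unseen value is assumed to fall into a single trailing slot. Only `maxCardinality`, `numHashes` and `minTokenLength` mirror the parameters documented here.

```scala
import scala.util.hashing.MurmurHash3

// Hypothetical sketch; none of these names come from the library.
object SmartTextMapSketch {
  // Map a token to a bucket in [0, numHashes); floorMod keeps the index non-negative.
  def bucket(token: String, numHashes: Int): Int =
    java.lang.Math.floorMod(MurmurHash3.stringHash(token), numHashes)

  // Hashing trick: term-frequency vector of fixed size numHashes.
  def hashText(text: String, numHashes: Int, minTokenLength: Int = 1): Array[Double] = {
    val vec = Array.fill(numHashes)(0.0)
    text.toLowerCase.split("\\W+")
      .filter(_.length >= minTokenLength)
      .foreach(token => vec(bucket(token, numHashes)) += 1.0)
    vec
  }

  // One-hot pivot over known categories; the last slot catches missing or unseen values.
  def pivot(value: Option[String], categories: Seq[String]): Array[Double] = {
    val vec = Array.fill(categories.size + 1)(0.0)
    val idx = value.map(categories.indexOf(_)).filter(_ >= 0).getOrElse(categories.size)
    vec(idx) = 1.0
    vec
  }

  // For each map key: pivot if its distinct value count is at most maxCardinality, else hash.
  def vectorize(rows: Seq[Map[String, String]], maxCardinality: Int, numHashes: Int): Seq[Array[Double]] = {
    val keys = rows.flatMap(_.keys).distinct.sorted
    val valuesByKey = keys.map(k => k -> rows.flatMap(_.get(k)).distinct.sorted).toMap
    rows.map { row =>
      keys.flatMap { k =>
        val cats = valuesByKey(k)
        if (cats.size <= maxCardinality) pivot(row.get(k), cats).toSeq
        else hashText(row.getOrElse(k, ""), numHashes).toSeq
      }.toArray
    }
  }
}
```

With a `color` key that takes two distinct values and a free-text `bio` key, `SmartTextMapSketch.vectorize(rows, maxCardinality = 2, numHashes = 8)` would produce three pivot columns for `color` and eight hash columns for `bio`.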
Apply TextMapVectorizer on any OPMap that has string values
clean text before pivoting
clean map keys before pivoting
whether or not to prepend feature name hash to the tokens before hashing
keys to whitelist
keys to blacklist
other features of the same type
option to keep track of values that were missing
option to add a column containing the text length to the feature vector
size of hash space
strategy to determine whether to use shared hash space for all included features
an OPVector feature
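The key handling and hashing options listed above can be illustrated with a small plain-Scala sketch. It is an assumption-laden illustration rather than the behaviour of TextMapVectorizer: `TextMapHashingSketch`, `cleanKey`, `filterKeys` and `hashMap` are hypothetical names, the key-cleaning rule is invented for the example, an empty whitelist is assumed to mean "allow every key", and all keys share one hash space of size `numHashes`.

```scala
import scala.util.hashing.MurmurHash3

// Hypothetical sketch; none of these names come from the library.
object TextMapHashingSketch {
  // Assumed cleaning rule: trim, lowercase, collapse other characters to "_".
  def cleanKey(key: String): String =
    key.trim.toLowerCase.replaceAll("[^a-z0-9]+", "_")

  // Keep whitelisted keys (an empty whitelist allows all), then drop blacklisted ones.
  def filterKeys(m: Map[String, String], allow: Set[String], block: Set[String]): Map[String, String] =
    m.filter { case (k, _) => (allow.isEmpty || allow.contains(k)) && !block.contains(k) }

  // Hash every token of every value into one shared space of size numHashes.
  // With prependKey = true the key is prefixed to each token, so identical
  // words under different keys are hashed as different terms.
  def hashMap(m: Map[String, String], numHashes: Int, prependKey: Boolean): Array[Double] = {
    val vec = Array.fill(numHashes)(0.0)
    for ((k, text) <- m; token <- text.toLowerCase.split("\\W+") if token.nonEmpty) {
      val term = if (prependKey) s"${k}_$token" else token
      vec(java.lang.Math.floorMod(MurmurHash3.stringHash(term), numHashes)) += 1.0
    }
    vec
  }
}
```

Prefixing the key trades some extra collisions in the shared space for the ability to distinguish the same word under different keys; a separate hash space per key would avoid cross-key collisions at the cost of a wider vector.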
Enrichment functions for TextMap Features (they are hashed by default instead of being pivoted)