Class

com.salesforce.op.utils.spark.RichDataset

RichDataFrame

implicit class RichDataFrame extends RichDataset

A data frame with three quantifiers: forall, exists, and forNone (see below). The rest of the extended functionality comes from RichDataset.
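
For illustration, a hedged sketch of the quantifiers in action. The names age and df, the sample predicates, and the imports are assumptions for this sketch, not part of the API; the implicit syntax is assumed to come into scope via the RichDataset._ import:

     import org.apache.spark.sql.DataFrame
     import com.salesforce.op.features.FeatureLike
     import com.salesforce.op.features.types._
     import com.salesforce.op.utils.spark.RichDataset._

     val age: FeatureLike[RealNN] = ???  // a previously defined feature (hypothetical)
     val df: DataFrame = ???             // a data frame whose schema contains the age column

     // For RealNN the predicate receives the raw value, an Option[Double];
     // the implicit FeatureTypeSparkConverter instances are assumed to come from types._
     val allNonNegative: Boolean = df.forall(age)(_.exists(_ >= 0))
     val someOver100: Boolean    = df.exists(age)(_.exists(_ > 100))
     val noneMissing: Boolean    = df.forNone(age)(_.isEmpty)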

Linear Supertypes
RichDataset, AnyRef, Any

Instance Constructors

  1. new RichDataFrame(ds: DataFrame)

    ds: data frame

Value Members

  1. final def !=(arg0: Any): Boolean

    Definition Classes: AnyRef → Any
  2. final def ##(): Int

    Definition Classes: AnyRef → Any
  3. final def ==(arg0: Any): Boolean

    Definition Classes: AnyRef → Any
  4. final def asInstanceOf[T0]: T0

    Definition Classes: Any
  5. def clone(): AnyRef

    Attributes: protected[java.lang]
    Definition Classes: AnyRef
    Annotations: @throws( ... )
  6. def collect[F1 <: FeatureType, F2 <: FeatureType, F3 <: FeatureType, F4 <: FeatureType, F5 <: FeatureType](f1: FeatureLike[F1], f2: FeatureLike[F2], f3: FeatureLike[F3], f4: FeatureLike[F4], f5: FeatureLike[F5])(implicit arg0: FeatureTypeSparkConverter[F1], arg1: ClassTag[F1], arg2: FeatureTypeSparkConverter[F2], arg3: ClassTag[F2], arg4: FeatureTypeSparkConverter[F3], arg5: ClassTag[F3], arg6: FeatureTypeSparkConverter[F4], arg7: ClassTag[F4], arg8: FeatureTypeSparkConverter[F5], arg9: ClassTag[F5]): Array[(F1, F2, F3, F4, F5)]

    Collects features from the dataset.

    Running collect requires moving all the data into the application's driver process, and doing so on a very large dataset can crash the driver process with OutOfMemoryError.

    returns: array of feature values

    Definition Classes: RichDataset
    Exceptions thrown: IllegalArgumentException if dataset schema does not match the features

  7. def collect[F1 <: FeatureType, F2 <: FeatureType, F3 <: FeatureType, F4 <: FeatureType](f1: FeatureLike[F1], f2: FeatureLike[F2], f3: FeatureLike[F3], f4: FeatureLike[F4])(implicit arg0: FeatureTypeSparkConverter[F1], arg1: ClassTag[F1], arg2: FeatureTypeSparkConverter[F2], arg3: ClassTag[F2], arg4: FeatureTypeSparkConverter[F3], arg5: ClassTag[F3], arg6: FeatureTypeSparkConverter[F4], arg7: ClassTag[F4]): Array[(F1, F2, F3, F4)]

    Collects features from the dataset.

    Running collect requires moving all the data into the application's driver process, and doing so on a very large dataset can crash the driver process with OutOfMemoryError.

    returns: array of feature values

    Definition Classes: RichDataset
    Exceptions thrown: IllegalArgumentException if dataset schema does not match the features

  8. def collect[F1 <: FeatureType, F2 <: FeatureType, F3 <: FeatureType](f1: FeatureLike[F1], f2: FeatureLike[F2], f3: FeatureLike[F3])(implicit arg0: FeatureTypeSparkConverter[F1], arg1: ClassTag[F1], arg2: FeatureTypeSparkConverter[F2], arg3: ClassTag[F2], arg4: FeatureTypeSparkConverter[F3], arg5: ClassTag[F3]): Array[(F1, F2, F3)]

    Collects features from the dataset.

    Running collect requires moving all the data into the application's driver process, and doing so on a very large dataset can crash the driver process with OutOfMemoryError.

    returns: array of feature values

    Definition Classes: RichDataset
    Exceptions thrown: IllegalArgumentException if dataset schema does not match the features

  9. def collect[F1 <: FeatureType, F2 <: FeatureType](f1: FeatureLike[F1], f2: FeatureLike[F2])(implicit arg0: FeatureTypeSparkConverter[F1], arg1: ClassTag[F1], arg2: FeatureTypeSparkConverter[F2], arg3: ClassTag[F2]): Array[(F1, F2)]

    Collects features from the dataset.

    Running collect requires moving all the data into the application's driver process, and doing so on a very large dataset can crash the driver process with OutOfMemoryError.

    returns: array of feature values

    Definition Classes: RichDataset
    Exceptions thrown: IllegalArgumentException if dataset schema does not match the features

  10. def collect[F1 <: FeatureType](f: FeatureLike[F1])(implicit arg0: FeatureTypeSparkConverter[F1], arg1: ClassTag[F1]): Array[F1]

    Collects features from the dataset.

    Running collect requires moving all the data into the application's driver process, and doing so on a very large dataset can crash the driver process with OutOfMemoryError.

    returns: array of feature values

    Definition Classes: RichDataset
    Exceptions thrown: IllegalArgumentException if dataset schema does not match the features
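
    For example, a hedged sketch (the features age and name and the data frame df are hypothetical placeholders whose columns are assumed to exist):

     import org.apache.spark.sql.DataFrame
     import com.salesforce.op.features.FeatureLike
     import com.salesforce.op.features.types._
     import com.salesforce.op.utils.spark.RichDataset._

     val age: FeatureLike[RealNN] = ???  // hypothetical
     val name: FeatureLike[Text]  = ???  // hypothetical
     val df: DataFrame            = ???

     val ages: Array[RealNN]         = df.collect(age)        // one feature
     val rows: Array[(RealNN, Text)] = df.collect(age, name)  // two features per row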

  11. val ds: Dataset[_]

    Definition Classes: RichDataset
  12. final def eq(arg0: AnyRef): Boolean

    Definition Classes: AnyRef
  13. def equals(arg0: Any): Boolean

    Definition Classes: AnyRef → Any
  14. def exists[T <: FeatureType](feature: FeatureLike[T])(predicate: (T#Value) ⇒ Boolean)(implicit arg0: FeatureTypeSparkConverter[T]): Boolean

    Given a feature and a predicate, checks that some values satisfy the predicate.

    T: column value type
    feature: feature that describes the column
    returns: a quantifier that acts on predicate, T ⇒ Boolean, producing true iff at least one value satisfies the predicate

  15. def exists[T](columnName: String)(predicate: (T) ⇒ Boolean): Boolean

    Given a column name and a predicate, checks that some values satisfy the predicate.

    T: column value type
    columnName: column name
    predicate: predicate, T ⇒ Boolean
    returns: true iff at least one value satisfies the predicate
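
    A hedged sketch of the column-name form (the column "state" and its String type are assumptions):

     import org.apache.spark.sql.DataFrame
     import com.salesforce.op.utils.spark.RichDataset._

     val df: DataFrame = ???  // hypothetical data frame with a string column "state"

     // T cannot be inferred from a column name, so it is given explicitly
     val hasCalifornia: Boolean = df.exists[String]("state")(_ == "CA")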

  16. def finalize(): Unit

    Attributes: protected[java.lang]
    Definition Classes: AnyRef
    Annotations: @throws( classOf[java.lang.Throwable] )
  17. def forNone[T <: FeatureType](feature: FeatureLike[T])(predicate: (T#Value) ⇒ Boolean)(implicit arg0: FeatureTypeSparkConverter[T]): Boolean

    Given a feature and a predicate, checks that none of the values satisfy the predicate.

    T: column value type
    feature: feature that describes the column
    returns: a quantifier that acts on predicate, T ⇒ Boolean, producing true iff none of the values satisfy the predicate

  18. def forNone[T](columnName: String)(predicate: (T) ⇒ Boolean): Boolean

    Given a column name and a predicate, checks that none of the values satisfy the predicate.

    T: column value type
    columnName: column name
    predicate: predicate, T ⇒ Boolean
    returns: true iff none of the values satisfy the predicate
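
    For instance, as a data-quality check (a hedged sketch; the numeric column "balance" is an assumption):

     import org.apache.spark.sql.DataFrame
     import com.salesforce.op.utils.spark.RichDataset._

     val df: DataFrame = ???  // hypothetical data frame with a numeric column "balance"

     // true iff no row carries a negative balance
     val noNegatives: Boolean = df.forNone[Double]("balance")(_ < 0)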

  19. def forall[T <: FeatureType](feature: FeatureLike[T])(predicate: (T#Value) ⇒ Boolean)(implicit arg0: FeatureTypeSparkConverter[T]): Boolean

    Given a feature and a predicate, checks that all values satisfy the predicate.

    T: column value type
    feature: feature that describes the column
    returns: a quantifier that acts on predicate, T ⇒ Boolean, producing true iff all values satisfy the predicate

  20. def forall[T](columnName: String)(predicate: (T) ⇒ Boolean): Boolean

    Given a column name and a predicate, checks that all values satisfy the predicate.

    T: column value type
    columnName: column name
    predicate: predicate, T ⇒ Boolean
    returns: true iff all values satisfy the predicate

    Examples of usage:

     myDF allOf "MyNumericColumn" should beBetween(-1, 1)
     myDF someOf "MyStringColumn" should (
       (x: String) => (x contains "Country") || (x contains "State")
     )
  21. final def getClass(): Class[_]

    Definition Classes: AnyRef → Any
  22. def hashCode(): Int

    Definition Classes: AnyRef → Any
  23. def isEmpty: Boolean

    Check if the dataset is empty.

    returns: true if the dataset is empty, false otherwise

    Definition Classes: RichDataset
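
    A minimal illustrative guard (df is a hypothetical placeholder):

     import org.apache.spark.sql.DataFrame
     import com.salesforce.op.utils.spark.RichDataset._

     val df: DataFrame = ???  // hypothetical
     if (df.isEmpty) println("nothing to process")  // skip work on an empty dataset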
  24. final def isInstanceOf[T0]: Boolean

    Definition Classes: Any
  25. def metadata(features: OPFeature*): Map[OPFeature, Metadata]

    Returns metadata map for features.

    features: features to get metadata for
    returns: metadata map for features

    Definition Classes: RichDataset
    Exceptions thrown: IllegalArgumentException if dataset schema does not match the features
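
    A hedged sketch (the feature age is a hypothetical placeholder):

     import org.apache.spark.sql.DataFrame
     import org.apache.spark.sql.types.Metadata
     import com.salesforce.op.features.OPFeature
     import com.salesforce.op.utils.spark.RichDataset._

     val age: OPFeature = ???  // hypothetical
     val df: DataFrame  = ???

     val meta: Map[OPFeature, Metadata] = df.metadata(age)
     meta.get(age).foreach(m => println(m.json))  // inspect the column metadata as JSON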

  26. final def ne(arg0: AnyRef): Boolean

    Definition Classes: AnyRef
  27. final def notify(): Unit

    Definition Classes: AnyRef
  28. final def notifyAll(): Unit

    Definition Classes: AnyRef
  29. def saveAvro(path: String, cleanNames: Boolean = true, options: Map[String, String] = Map.empty, saveMode: SaveMode = SaveMode.ErrorIfExists)(implicit spark: SparkSession): Unit

    Converts a data frame with complex feature names and vector types into an Avro-compatible format and saves it to the specified location.

    path: location to save the data
    cleanNames: whether to clean column names of non-alphanumeric characters before saving
    options: output options for the underlying data source
    saveMode: specifies the behavior when data or a table already exists. Options include:

    • SaveMode.Overwrite: overwrite the existing data.
    • SaveMode.Append: append the data.
    • SaveMode.Ignore: ignore the operation (i.e. no-op).
    • SaveMode.ErrorIfExists: default option, throw an exception at runtime.

    spark: spark session used to save the original schema information with metadata

    Definition Classes: RichDataset
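
    A hedged sketch (the output path is a hypothetical placeholder):

     import org.apache.spark.sql.{DataFrame, SaveMode, SparkSession}
     import com.salesforce.op.utils.spark.RichDataset._

     implicit val spark: SparkSession = SparkSession.builder().getOrCreate()
     val df: DataFrame = ???  // hypothetical

     // Overwrite any existing data at the (hypothetical) target location
     df.saveAvro(path = "/tmp/features.avro", saveMode = SaveMode.Overwrite)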
  30. def select(features: OPFeature*): DataFrame

    Selects features from the dataset.

    features: features to select
    returns: a dataset containing the selected features

    Definition Classes: RichDataset
    Exceptions thrown: IllegalArgumentException if dataset schema does not match the features
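
    A hedged sketch (the features age and name are hypothetical placeholders):

     import org.apache.spark.sql.DataFrame
     import com.salesforce.op.features.OPFeature
     import com.salesforce.op.utils.spark.RichDataset._

     val age: OPFeature  = ???  // hypothetical
     val name: OPFeature = ???  // hypothetical
     val df: DataFrame   = ???

     val slim: DataFrame = df.select(age, name)  // keep only the two feature columns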

  31. final def synchronized[T0](arg0: ⇒ T0): T0

    Definition Classes: AnyRef
  32. def take[F1 <: FeatureType, F2 <: FeatureType, F3 <: FeatureType, F4 <: FeatureType, F5 <: FeatureType](n: Int, f1: FeatureLike[F1], f2: FeatureLike[F2], f3: FeatureLike[F3], f4: FeatureLike[F4], f5: FeatureLike[F5])(implicit arg0: FeatureTypeSparkConverter[F1], arg1: ClassTag[F1], arg2: FeatureTypeSparkConverter[F2], arg3: ClassTag[F2], arg4: FeatureTypeSparkConverter[F3], arg5: ClassTag[F3], arg6: FeatureTypeSparkConverter[F4], arg7: ClassTag[F4], arg8: FeatureTypeSparkConverter[F5], arg9: ClassTag[F5]): Array[(F1, F2, F3, F4, F5)]

    Collects features from the dataset and returns the first n values.

    Running take requires moving data into the application's driver process, and doing so with a very large n can crash the driver process with OutOfMemoryError.

    n: number of values to return
    returns: array of the first n feature values

    Definition Classes: RichDataset
    Exceptions thrown: IllegalArgumentException if dataset schema does not match the features

  33. def take[F1 <: FeatureType, F2 <: FeatureType, F3 <: FeatureType, F4 <: FeatureType](n: Int, f1: FeatureLike[F1], f2: FeatureLike[F2], f3: FeatureLike[F3], f4: FeatureLike[F4])(implicit arg0: FeatureTypeSparkConverter[F1], arg1: ClassTag[F1], arg2: FeatureTypeSparkConverter[F2], arg3: ClassTag[F2], arg4: FeatureTypeSparkConverter[F3], arg5: ClassTag[F3], arg6: FeatureTypeSparkConverter[F4], arg7: ClassTag[F4]): Array[(F1, F2, F3, F4)]

    Collects features from the dataset and returns the first n values.

    Running take requires moving data into the application's driver process, and doing so with a very large n can crash the driver process with OutOfMemoryError.

    n: number of values to return
    returns: array of the first n feature values

    Definition Classes: RichDataset
    Exceptions thrown: IllegalArgumentException if dataset schema does not match the features

  34. def take[F1 <: FeatureType, F2 <: FeatureType, F3 <: FeatureType](n: Int, f1: FeatureLike[F1], f2: FeatureLike[F2], f3: FeatureLike[F3])(implicit arg0: FeatureTypeSparkConverter[F1], arg1: ClassTag[F1], arg2: FeatureTypeSparkConverter[F2], arg3: ClassTag[F2], arg4: FeatureTypeSparkConverter[F3], arg5: ClassTag[F3]): Array[(F1, F2, F3)]

    Collects features from the dataset and returns the first n values.

    Running take requires moving data into the application's driver process, and doing so with a very large n can crash the driver process with OutOfMemoryError.

    n: number of values to return
    returns: array of the first n feature values

    Definition Classes: RichDataset
    Exceptions thrown: IllegalArgumentException if dataset schema does not match the features

  35. def take[F1 <: FeatureType, F2 <: FeatureType](n: Int, f1: FeatureLike[F1], f2: FeatureLike[F2])(implicit arg0: FeatureTypeSparkConverter[F1], arg1: ClassTag[F1], arg2: FeatureTypeSparkConverter[F2], arg3: ClassTag[F2]): Array[(F1, F2)]

    Collects features from the dataset and returns the first n values.

    Running take requires moving data into the application's driver process, and doing so with a very large n can crash the driver process with OutOfMemoryError.

    n: number of values to return
    returns: array of the first n feature values

    Definition Classes: RichDataset
    Exceptions thrown: IllegalArgumentException if dataset schema does not match the features

  36. def take[F1 <: FeatureType](n: Int, f: FeatureLike[F1])(implicit arg0: FeatureTypeSparkConverter[F1], arg1: ClassTag[F1]): Array[F1]

    Collects features from the dataset and returns the first n values.

    Running take requires moving data into the application's driver process, and doing so with a very large n can crash the driver process with OutOfMemoryError.

    n: number of values to return
    returns: array of the first n feature values

    Definition Classes: RichDataset
    Exceptions thrown: IllegalArgumentException if dataset schema does not match the features
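
    A hedged sketch (the feature age is a hypothetical placeholder):

     import org.apache.spark.sql.DataFrame
     import com.salesforce.op.features.FeatureLike
     import com.salesforce.op.features.types._
     import com.salesforce.op.utils.spark.RichDataset._

     val age: FeatureLike[RealNN] = ???  // hypothetical
     val df: DataFrame = ???

     // Fetch only the first 10 values of the feature
     val first10: Array[RealNN] = df.take(10, age)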

  37. def toString(): String

    Definition Classes: AnyRef → Any
  38. final def wait(): Unit

    Definition Classes: AnyRef
    Annotations: @throws( ... )
  39. final def wait(arg0: Long, arg1: Int): Unit

    Definition Classes: AnyRef
    Annotations: @throws( ... )
  40. final def wait(arg0: Long): Unit

    Definition Classes: AnyRef
    Annotations: @throws( ... )
