com.salesforce.op.stages.impl.tuning

DataCutter

Related Docs: object DataCutter | package tuning

class DataCutter extends Splitter with DataCutterParams

Splitter that creates a holdout set and prepares the data for multiclass modeling. It splits the data into training and test sets, filtering out any labels that do not meet the minimum fraction cutoff or do not fall within the top N labels specified.
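The label filtering described above can be sketched in plain Scala. This is a minimal, hypothetical model, not TransmogrifAI's implementation: a `Seq[Double]` of labels stands in for a Spark `DataFrame`, the names `LabelCutSketch`, `cutLabels`, and `LabelCut` are invented for illustration, and it assumes the fraction cutoff is applied before the top-N limit, with frequency ties broken by the smaller label value.

```scala
// Hypothetical, simplified model of DataCutter-style label filtering.
// Plain Scala collections stand in for a Spark DataFrame; names are illustrative only.
object LabelCutSketch {

  /** Labels kept for modeling and labels dropped, most frequent first. */
  final case class LabelCut(kept: Seq[Double], dropped: Seq[Double])

  /**
   * Keeps labels whose share of all rows is at least `minLabelFraction`,
   * then keeps at most `maxLabelCategories` of the most frequent survivors.
   * Assumption: the fraction cutoff is applied before the top-N limit.
   */
  def cutLabels(labels: Seq[Double], minLabelFraction: Double, maxLabelCategories: Int): LabelCut = {
    require(labels.nonEmpty, "need at least one row")
    val total = labels.size.toDouble

    // (label, rowCount) pairs, most frequent first; ties broken by the smaller label.
    val counts: Seq[(Double, Int)] = labels
      .groupBy(identity)
      .map { case (label, rows) => (label, rows.size) }
      .toSeq
      .sortWith((a, b) => if (a._2 != b._2) a._2 > b._2 else a._1 < b._1)

    val kept = counts
      .filter { case (_, count) => count / total >= minLabelFraction }
      .take(maxLabelCategories)
      .map(_._1)
    val dropped = counts.map(_._1).filterNot(label => kept.contains(label))
    LabelCut(kept, dropped)
  }
}
```

With ten rows labelled 0.0 (4 rows), 1.0 (3), 2.0 (2), and 3.0 (1), a `minLabelFraction` of 0.15 removes 3.0 (a 0.1 share), and a `maxLabelCategories` of 2 then removes 2.0, leaving 0.0 and 1.0.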

Linear Supertypes
DataCutterParams, Splitter, SplitterParams, Params, Serializable, Serializable, Identifiable, AnyRef, Any

Instance Constructors

  1. new DataCutter(uid: String = UID[DataCutter])

Value Members

  1. final def !=(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  2. final def ##(): Int
    Definition Classes
    AnyRef → Any
  3. final def $[T](param: Param[T]): T
    Attributes
    protected
    Definition Classes
    Params
  4. final def ==(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  5. final def asInstanceOf[T0]: T0
    Definition Classes
    Any
  6. def checkPreconditions(): Unit
    Attributes
    protected
    Definition Classes
    Splitter
  7. final def clear(param: Param[_]): DataCutter.this.type
    Definition Classes
    Params
  8. def clone(): AnyRef
    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  9. def copy(extra: ParamMap): DataCutter
    Definition Classes
    DataCutter → Params
  10. def copyValues[T <: Params](to: T, extra: ParamMap): T
    Attributes
    protected
    Definition Classes
    Params
  11. final def defaultCopy[T <: Params](extra: ParamMap): T
    Attributes
    protected
    Definition Classes
    Params
  12. final def eq(arg0: AnyRef): Boolean
    Definition Classes
    AnyRef
  13. def equals(arg0: Any): Boolean
    Definition Classes
    AnyRef → Any
  14. def explainParam(param: Param[_]): String
    Definition Classes
    Params
  15. def explainParams(): String
    Definition Classes
    Params
  16. final def extractParamMap(): ParamMap
    Definition Classes
    Params
  17. final def extractParamMap(extra: ParamMap): ParamMap
    Definition Classes
    Params
  18. def finalize(): Unit
    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( classOf[java.lang.Throwable] )
  19. final def get[T](param: Param[T]): Option[T]
    Definition Classes
    Params
  20. final def getClass(): Class[_]
    Definition Classes
    AnyRef → Any
  21. final def getDefault[T](param: Param[T]): Option[T]
    Definition Classes
    Params
  22. def getLabelsFromMetadata(data: DataFrame): Array[String]
  23. def getMaxLabelCategories: Int
    Definition Classes
    DataCutterParams
  24. def getMaxTrainingSample: Int
    Definition Classes
    SplitterParams
  25. def getMinLabelFraction: Double
    Definition Classes
    DataCutterParams
  26. final def getOrDefault[T](param: Param[T]): T
    Definition Classes
    Params
  27. def getParam(paramName: String): Param[Any]
    Definition Classes
    Params
  28. def getReserveTestFraction: Double
    Definition Classes
    SplitterParams
  29. def getSeed: Long
    Definition Classes
    SplitterParams
  30. final def hasDefault[T](param: Param[T]): Boolean
    Definition Classes
    Params
  31. def hasParam(paramName: String): Boolean
    Definition Classes
    Params
  32. def hashCode(): Int
    Definition Classes
    AnyRef → Any
  33. final def isDefined(param: Param[_]): Boolean
    Definition Classes
    Params
  34. final def isInstanceOf[T0]: Boolean
    Definition Classes
    Any
  35. final def isSet(param: Param[_]): Boolean
    Definition Classes
    Params
  36. final val labelColumnName: Param[String]
    Definition Classes
    SplitterParams
  37. final val maxLabelCategories: IntParam
    Definition Classes
    DataCutterParams
  38. final val maxNamesForDroppedLabels: IntParam
    Definition Classes
    DataCutterParams
  39. final val maxTrainingSample: IntParam
    Maximum size of the dataset to train on. Value should be > 0. Default is 1000000.
    Definition Classes
    SplitterParams
  40. final val minLabelFraction: DoubleParam
    Definition Classes
    DataCutterParams
  41. final def ne(arg0: AnyRef): Boolean
    Definition Classes
    AnyRef
  42. final def notify(): Unit
    Definition Classes
    AnyRef
  43. final def notifyAll(): Unit
    Definition Classes
    AnyRef
  44. lazy val params: Array[Param[_]]
    Definition Classes
    Params
  45. def preValidationPrepare(data: DataFrame): PrevalidationVal
    Function to set parameters before passing the data into the validation step, e.g. to balance the data or drop rows based on their labels.
    returns
    Parameters set while examining the data
    Definition Classes
    DataCutter → Splitter
  46. final val reserveTestFraction: DoubleParam
    Fraction of the data to reserve for testing. Default is 0.1.
    Definition Classes
    SplitterParams
  47. final val seed: LongParam
    Seed for data splitting.
    Definition Classes
    SplitterParams
  48. final def set(paramPair: ParamPair[_]): DataCutter.this.type
    Attributes
    protected
    Definition Classes
    Params
  49. final def set(param: String, value: Any): DataCutter.this.type
    Attributes
    protected
    Definition Classes
    Params
  50. final def set[T](param: Param[T], value: T): DataCutter.this.type
    Definition Classes
    Params
  51. final def setDefault(paramPairs: ParamPair[_]*): DataCutter.this.type
    Attributes
    protected
    Definition Classes
    Params
  52. final def setDefault[T](param: Param[T], value: T): DataCutter.this.type
    Attributes
    protected
    Definition Classes
    Params
  53. def setMaxLabelCategories(value: Int): DataCutter.this.type
    Definition Classes
    DataCutterParams
  54. def setMaxTrainingSample(value: Int): DataCutter.this.type
    Definition Classes
    SplitterParams
  55. def setMinLabelFraction(value: Double): DataCutter.this.type
    Definition Classes
    DataCutterParams
  56. def setReserveTestFraction(value: Double): DataCutter.this.type
    Definition Classes
    SplitterParams
  57. def setSeed(value: Long): DataCutter.this.type
    Definition Classes
    SplitterParams
  58. def split[T](data: Dataset[T]): (Dataset[T], Dataset[T])
    Function used to create the training set and the test set.
    returns
    (dataTrain, dataTest)
    Definition Classes
    Splitter
  59. final def synchronized[T0](arg0: ⇒ T0): T0
    Definition Classes
    AnyRef
  60. def toString(): String
    Definition Classes
    Identifiable → AnyRef → Any
  61. val uid: String
    Definition Classes
    Splitter → Identifiable
  62. def validationPrepare(data: Dataset[Row]): Dataset[Row]
    Rebalance the training data within the validation step.
    data
    the data to prepare for model training; the first column must be the label, as a Double
    returns
    balanced training set and a test set
    Definition Classes
    DataCutter → Splitter
  63. final def wait(): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  64. final def wait(arg0: Long, arg1: Int): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  65. final def wait(arg0: Long): Unit
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  66. def withLabelColumnName(label: String): Splitter
    Add a splitter parameter to name the label column.
    Definition Classes
    Splitter
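The `reserveTestFraction` and `seed` parameters work together when the holdout set is carved out. The sketch below is a hypothetical, plain-Scala illustration, not how `DataCutter.split` is implemented: the `SplitSketch` object and its `split` method are invented here and operate on an in-memory `Seq` rather than a Spark `Dataset`. Each row goes to the test set with probability `reserveTestFraction`, and fixing the seed makes the split reproducible.

```scala
import scala.util.Random

// Hypothetical sketch of a seeded train/test split; not DataCutter's implementation.
object SplitSketch {

  /**
   * Assigns each row to the test set with probability `reserveTestFraction`
   * and returns (train, test). The same seed always yields the same split.
   */
  def split[T](rows: Seq[T], reserveTestFraction: Double, seed: Long): (Seq[T], Seq[T]) = {
    val rng = new Random(seed)
    // Draw one random number per row, in order, so the assignment is reproducible.
    val tagged = rows.map(row => (row, rng.nextDouble() < reserveTestFraction))
    val train = tagged.collect { case (row, false) => row }
    val test = tagged.collect { case (row, true) => row }
    (train, test)
  }
}
```

Because each row is assigned independently, the test set holds roughly, not exactly, `reserveTestFraction` of the rows (about 100 of 1000 rows at the default 0.1). On real data, Spark's `Dataset.randomSplit(weights, seed)` offers the same kind of seeded, approximate split.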
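`maxTrainingSample` caps the size of the dataset used for training (default 1000000; the value should be > 0). One common way to enforce such a cap is to derive a sampling fraction from the row count. The sketch below assumes that approach and is hypothetical: `DownsampleSketch` and `sampleFraction` are invented names, not part of the library, and this page does not say how `DataCutter` actually applies the cap.

```scala
// Hypothetical helper: how much of a dataset to sample so that at most
// maxTrainingSample rows remain. Not part of TransmogrifAI's API.
object DownsampleSketch {

  def sampleFraction(rowCount: Long, maxTrainingSample: Int): Double = {
    require(maxTrainingSample > 0, "maxTrainingSample should be > 0")
    require(rowCount >= 0, "rowCount cannot be negative")
    if (rowCount <= maxTrainingSample) 1.0     // small enough: keep every row
    else maxTrainingSample.toDouble / rowCount // e.g. 1000000 / 4000000 = 0.25
  }
}
```

The resulting fraction could be passed to Spark's `Dataset.sample(withReplacement = false, fraction, seed)`, which keeps approximately, not exactly, that share of rows.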
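Most setters on this page return `DataCutter.this.type` rather than `DataCutter`, so calls can be chained while keeping the receiver's most specific type. The sketch below shows that Scala pattern with an invented `CutterSettings` class (and an invented subclass), not the real Spark `Params` machinery; it uses only the two defaults documented here, `reserveTestFraction` = 0.1 and `maxTrainingSample` = 1000000.

```scala
// Hypothetical stand-in for a Params-style settings class; not TransmogrifAI code.
class CutterSettings {
  private var reserveTestFraction: Double = 0.1 // documented default
  private var maxTrainingSample: Int = 1000000  // documented default

  // Returning this.type lets subclasses chain setters without losing their own type.
  def setReserveTestFraction(value: Double): this.type = {
    reserveTestFraction = value
    this
  }

  def setMaxTrainingSample(value: Int): this.type = {
    require(value > 0, "maxTrainingSample should be > 0")
    maxTrainingSample = value
    this
  }

  def getReserveTestFraction: Double = reserveTestFraction
  def getMaxTrainingSample: Int = maxTrainingSample
}

// A subclass adds a method; chained setters still return DescribedCutterSettings.
class DescribedCutterSettings extends CutterSettings {
  def describe: String =
    s"reserveTestFraction=$getReserveTestFraction, maxTrainingSample=$getMaxTrainingSample"
}
```

For example, `new DescribedCutterSettings().setReserveTestFraction(0.2).setMaxTrainingSample(500).describe` compiles and yields `"reserveTestFraction=0.2, maxTrainingSample=500"`. If the setters were declared to return `CutterSettings`, the final `.describe` call would not compile.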
