All Downloads are FREE. Search and download functionalities are using the official Maven repository.

com.intel.analytics.bigdl.ppml.examples.README.md Maven / Gradle / Ivy

The newest version!
## End to end Spark GBT Example On CriteoClickLogsDataset

### Usage

Run this example in spark local mode:

1.select a KeyManagementService(`SimpleKeyManagementService` or `EHSMKeyManagementService`)

2.prepare 1g [dataset](https://ailab.criteo.com/download-criteo-1tb-click-logs-dataset/) and a primaryKey and a dataKey(refer to [this](https://github.com/intel-analytics/BigDL/blob/main/ppml/services/kms-utils/docker/README.md))

3.prepare a BigDL PPML Client Container(refer to PPML tutorial)

4.run the following command in the container

> your input file can be CSV, JSON, PARQUET or other textfile with or without encryption. if input file is not encrypted, specify the `inputEncryptMode==plain_text`. else, for encrypted CSV, JSON and other textfile, specify the `inputEncryptMode==AES/CBC/PKCS5Padding`. for encrypted parquet file, specify the `inputEncryptMode==AES_GCM_CTR_V1 or AES_GCM_V1`.
>
> in this example, the input file is a plain CSV file

- for `SimpleKeyManagementService` 

  ```bash
  /opt/jdk8/bin/java \
      -cp '/ppml/trusted-big-data-ml/spark-encrypt-io-0.3.0-SNAPSHOT.jar:/ppml/trusted-big-data-ml/work/bigdl-2.1.0-SNAPSHOT/jars/*:/ppml/trusted-big-data-ml/work/spark-3.1.2/conf/:/ppml/trusted-big-data-ml/work/spark-3.1.2/jars/*:/ppml/trusted-big-data-ml/work/spark-3.1.2/examples/jars/*' -Xmx16g \
      org.apache.spark.deploy.SparkSubmit \
      --master local[4] \
      --executor-memory 8g \
      --driver-memory 8g \
      --class com.intel.analytics.bigdl.ppml.examples.xgbClassifierTrainingExampleOnCriteoClickLogsDataset \
      --conf spark.network.timeout=10000000 \
      --conf spark.executor.heartbeatInterval=10000000 \
      --verbose \
      --jars local:///ppml/trusted-big-data-ml/work/bigdl-2.1.0-SNAPSHOT/jars/bigdl-ppml-spark_3.1.2-2.1.0-SNAPSHOT.jar \
      local:///ppml/trusted-big-data-ml/work/bigdl-2.1.0-SNAPSHOT/jars/bigdl-ppml-spark_3.1.2-2.1.0-SNAPSHOT.jar \
      --trainingDataPath /your/training/data/path \
      --modelSavePath /your/model/save/path \
      --inputEncryptMode plain_text \
      --primaryKeyPath /your/primary/key/path/primaryKey \
      --dataKeyPath /your/data/key/path/dataKey \
      --kmsType SimpleKeyManagementService \
      --simpleAPPID your_app_id \
      --simpleAPIKEY your_api_key \
      --maxDepth 5 \
      --maxIter 100
  ```

- for `EHSMKeyManagementService`

  ```bash
  /opt/jdk8/bin/java \
      -cp '/ppml/trusted-big-data-ml/spark-encrypt-io-0.3.0-SNAPSHOT.jar:/ppml/trusted-big-data-ml/work/bigdl-2.1.0-SNAPSHOT/jars/*:/ppml/trusted-big-data-ml/work/spark-3.1.2/conf/:/ppml/trusted-big-data-ml/work/spark-3.1.2/jars/*:/ppml/trusted-big-data-ml/work/spark-3.1.2/examples/jars/*' -Xmx16g \
      org.apache.spark.deploy.SparkSubmit \
      --master local[4] \
      --executor-memory 8g \
      --driver-memory 8g \
      --class com.intel.analytics.bigdl.ppml.examples.xgbClassifierTrainingExampleOnCriteoClickLogsDataset \
      --conf spark.network.timeout=10000000 \
      --conf spark.executor.heartbeatInterval=10000000 \
      --verbose \
      --jars local:///ppml/trusted-big-data-ml/work/bigdl-2.1.0-SNAPSHOT/jars/bigdl-ppml-spark_3.1.2-2.1.0-SNAPSHOT.jar \
      local:///ppml/trusted-big-data-ml/work/bigdl-2.1.0-SNAPSHOT/jars/bigdl-ppml-spark_3.1.2-2.1.0-SNAPSHOT.jar \
      --trainingDataPath /your/training/data/path \
      --modelSavePath /your/model/save/path \
      --inputEncryptMode plain_text \
      --primaryKeyPath /your/primary/key/path/primaryKey \
      --dataKeyPath /your/data/key/path/dataKey \
      --kmsType EHSMKeyManagementService \
      --kmsServerIP you_kms_server_ip \
      --kmsServerPort you_kms_server_port \
      --ehsmAPPID your_app_id \
      --ehsmAPIKEY your_api_key \
      --maxDepth 5 \
      --maxIter 100
  ```

  




© 2015 - 2025 Weber Informatics LLC | Privacy Policy