## DLClassifierLogisticRegression

The DLClassifierLogisticRegression example demonstrates how to use the BigDL DLClassifier to train a
logistic regression model. DLClassifier extends the Spark Estimator and can act as a stage in an
ML Pipeline. The feature column can be an Array or a Spark Vector, while the label column should
hold Double values.

## DLEstimatorMultiLabelLR

The DLEstimatorMultiLabelLR example demonstrates how to use the BigDL DLEstimator to train a
multi-label logistic regression model. DLEstimator extends the Spark Estimator and can act as a
stage in an ML Pipeline. Both the feature and label columns can be an Array or a Spark Vector. The
feature column may also be Double.

## DLClassifierLeNet
The DLClassifierLeNet example demonstrates how to use BigDL with a Spark ML pipeline to train a LeNet5 model on the MNIST dataset and run prediction with it.

To learn more about Spark ML, please refer to the Spark ML documentation.
### Preparation

To run this example, you first need to prepare the dataset.

1. Prepare the dataset

You can download the MNIST data from [here](http://yann.lecun.com/exdb/mnist/). Unzip all the
files and put them in one folder (e.g. mnist).

There are four files:

* **train-images-idx3-ubyte** contains the training images.
* **train-labels-idx1-ubyte** contains the training labels.
* **t10k-images-idx3-ubyte** contains the validation images.
* **t10k-labels-idx1-ubyte** contains the validation labels.
 
For more detail, please refer to the download page.
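
Before submitting the job, it can help to confirm that the folder really contains the four unzipped files listed above. The following is a minimal sketch, not part of BigDL: the function name `check_mnist_dir` is hypothetical, and it only checks that the files exist, not that their contents are valid.

```shell
# Hypothetical helper (not part of BigDL): verify that a folder holds the
# four unzipped MNIST files. Prints each missing file to stderr and
# returns a non-zero status if any file is absent.
check_mnist_dir() {
  mnist_dir="$1"
  mnist_missing=0
  for mnist_file in \
      train-images-idx3-ubyte train-labels-idx1-ubyte \
      t10k-images-idx3-ubyte t10k-labels-idx1-ubyte; do
    if [ ! -f "$mnist_dir/$mnist_file" ]; then
      echo "missing: $mnist_dir/$mnist_file" >&2
      mnist_missing=1
    fi
  done
  return "$mnist_missing"
}

# Example call (the folder name is an assumption):
#   check_mnist_dir ./mnist && echo "dataset folder looks complete"
```

A common cause of a failed check is passing the folder that still holds the `.gz` archives rather than the unzipped files.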

### Run this example

Command to run the example in Spark local mode:
```
spark-submit \
--master local[physical_core_number] \
--class com.intel.analytics.bigdl.example.MLPipeline.DLClassifierLeNet \
./dist/lib/bigdl-VERSION-jar-with-dependencies.jar \
-f path_to_mnist_folder \
-b batch_size
```
Command to run the example in Spark standalone mode:
```
spark-submit \
--master spark://... \
--executor-cores cores_per_executor \
--total-executor-cores total_cores_for_the_job \
--driver-class-path dist/lib/bigdl-VERSION-jar-with-dependencies.jar \
--class com.intel.analytics.bigdl.example.MLPipeline.DLClassifierLeNet  \
dist/lib/bigdl-VERSION-jar-with-dependencies.jar \
-f path_to_mnist_folder \
-b batch_size
```
Command to run the example in Spark yarn mode:
```
spark-submit \
--master yarn \
--deploy-mode client \
--executor-cores cores_per_executor \
--num-executors executors_number \
--driver-class-path dist/lib/bigdl-VERSION-jar-with-dependencies.jar \
--class com.intel.analytics.bigdl.example.MLPipeline.DLClassifierLeNet  \
dist/lib/bigdl-VERSION-jar-with-dependencies.jar \
-f path_to_mnist_folder \
-b batch_size
```
where

* -f: the folder where you put your MNIST data
* -b: the mini-batch size. The mini-batch size is expected to be a multiple of node_number * core_number
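
As an illustration of the batch-size constraint above, here is a small sketch that rounds a requested batch size up to the nearest multiple of node_number * core_number. The function name `round_batch_size` is hypothetical and not part of BigDL; it only does the arithmetic and assumes all three arguments are positive integers.

```shell
# Hypothetical helper (not part of BigDL): round a requested batch size up
# to the nearest multiple of node_number * core_number.
# Arguments: requested_batch_size node_number core_number
round_batch_size() {
  rbs_requested="$1"
  rbs_unit=$(( $2 * $3 ))   # node_number * core_number
  # Integer ceiling division, then scale back up to a multiple of the unit.
  echo $(( (rbs_requested + rbs_unit - 1) / rbs_unit * rbs_unit ))
}

# 2 nodes with 4 cores each give a unit of 8; 100 is rounded up to 104.
round_batch_size 100 2 4   # prints 104
```

The result can then be passed to the commands above as the `-b` value. In local mode, node_number would be 1.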