## Summary
This example demonstrates how to use BigDL to load a BigDL or [Torch](http://torch.ch/) model trained on [ImageNet](http://image-net.org/) data, and then apply the loaded model to classify the contents of a set of images in a Spark ML pipeline.

## Preparation

To get started with this example, you need to prepare your model and dataset.

1. Prepare the model.

    The Torch ResNet model used in this example can be found in [Resnet Torch Model](https://github.com/facebook/fb.resnet.torch/tree/master/pretrained).
    The BigDL Inception model used in this example can be trained with [BigDL Inception](https://github.com/intel-analytics/BigDL/tree/master/spark/dl/src/main/scala/com/intel/analytics/bigdl/models/inception).
    Choose one of them, put the trained model in $modelPath, and set $modelType accordingly (torch or bigdl).
   
2. Prepare the prediction dataset.

    Put your image data for prediction in the ./predict folder. Alternatively, you can run the example on the ImageNet-2012 validation dataset, which is available from [ImageNet](http://image-net.org/). After you download the file (ILSVRC2012_img_val.tar), run the following commands to prepare the data.
    
    ```bash
    mkdir predict
    tar -xvf ILSVRC2012_img_val.tar -C ./predict/
    ```
  
  
    Note: For a large dataset, you may want to read the image data from HDFS. The following commands transform the images into Hadoop sequence files:

```bash
mkdir -p val/images
mv predict/* val/images/
java -cp bigdl_folder/lib/bigdl-VERSION-jar-with-dependencies-and-spark.jar com.intel.analytics.bigdl.models.utils.ImageNetSeqFileGenerator -f ./ --validationOnly --hasName
mv val/*.seq predict/
```
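
Before running the example, it can help to confirm that the extraction or conversion actually produced files. The following is a minimal sketch, not part of BigDL: the helper name `count_files` is hypothetical, and it assumes the validation images carry a `.JPEG` extension and the generator writes `.seq` files, as the commands above suggest.

```bash
# Hypothetical helper (not part of BigDL): count the files under a folder
# whose names end with the given extension, matched case-insensitively.
count_files() {
  local dir="$1" ext="$2"
  # tr strips the padding that some wc implementations add.
  find "$dir" -type f -iname "*.${ext}" | wc -l | tr -d ' '
}
```

For example, `count_files ./predict JPEG` reports the number of local images, and `count_files ./predict seq` reports the number of sequence files after the conversion step.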

  
## Run this example

Command to run the example in Spark local mode:
```bash
spark-submit \
--master local[physical_core_number] \
--driver-memory 10g --executor-memory 20g \
--class com.intel.analytics.bigdl.example.imageclassification.ImagePredictor \
./dist/lib/bigdl-VERSION-jar-with-dependencies.jar \
--modelPath ./resnet-18.t7 \
--folder ./predict \
--modelType torch \
--batchSizePerCore 16 \
--isHdfs false
```
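
The placeholder in `local[...]` stands for the number of physical cores on the machine. One way to look that number up is sketched below, assuming a Linux host with `lscpu` (util-linux) installed. The function name `physical_cores` is hypothetical, and the fallback reports logical CPUs, which can exceed the physical core count when hyper-threading is enabled. The result could then be passed as `--master "local[$(physical_cores)]"`.

```bash
# Hypothetical helper: print the number of physical cores on this machine.
physical_cores() {
  local n=""
  if command -v lscpu >/dev/null 2>&1; then
    # lscpu prints one "core,socket" pair per logical CPU;
    # the number of unique pairs is the number of physical cores.
    n="$(lscpu -p=Core,Socket 2>/dev/null | grep -v '^#' | sort -u | wc -l | tr -d ' ')"
  fi
  if [ -z "$n" ] || [ "$n" -eq 0 ]; then
    # Fallback: logical CPU count (may overcount with hyper-threading).
    n="$(getconf _NPROCESSORS_ONLN)"
  fi
  echo "$n"
}
```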
Command to run the example in Spark standalone mode:
```bash
spark-submit \
--master spark://... \
--executor-cores 8 \
--total-executor-cores 32 \
--class com.intel.analytics.bigdl.example.imageclassification.ImagePredictor \
./dist/lib/bigdl-VERSION-jar-with-dependencies.jar \
--modelPath ./resnet-18.t7 \
--folder ./predict \
--modelType torch \
--batchSizePerCore 16 \
--isHdfs false
```
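
In standalone mode, `--total-executor-cores` and `--executor-cores` together bound how many executors the application can get. The arithmetic is sketched below, assuming the cluster has enough free cores to grant the full request; the helper name `max_executors` is hypothetical. With the values above it prints 4, the same executor count that the YARN command requests explicitly with `--num-executors 4`.

```bash
# Hypothetical helper: upper bound on the number of executors for a
# standalone request, given total cores and cores per executor.
max_executors() {
  local total_cores="$1" cores_per_executor="$2"
  echo $(( total_cores / cores_per_executor ))
}

max_executors 32 8
```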
Command to run the example in Spark yarn mode:
```bash
spark-submit \
--master yarn \
--deploy-mode client \
--executor-cores 8 \
--num-executors 4 \
--class com.intel.analytics.bigdl.example.imageclassification.ImagePredictor \
./dist/lib/bigdl-VERSION-jar-with-dependencies.jar \
--modelPath ./resnet-18.t7 \
--folder ./predict \
--modelType torch \
--batchSizePerCore 16 \
--isHdfs false
```
where 

* ```--modelPath``` is the path to the model file.
* ```--folder``` is the folder containing the images to predict.
* ```--modelType``` is the type of model to load; it can be ```bigdl``` or ```torch```.
* ```--showNum``` is the number of results to show, default 100.
* ```--batchSizePerCore``` is the batch size to use for prediction, default 32.
* ```--isHdfs``` indicates where the prediction data is read from: "true" reads sequence files from HDFS, "false" reads local images. The default is "false".
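
The options above can be assembled by a small wrapper that rejects an unsupported model type before Spark starts. This is only a sketch: the function name `build_predict_args` is hypothetical, and it prints the application arguments rather than invoking `spark-submit`.

```bash
# Hypothetical helper: validate the model type and print the example's
# application arguments, one per line.
build_predict_args() {
  local model_path="$1" model_type="$2" folder="$3" is_hdfs="${4:-false}"
  case "$model_type" in
    torch|bigdl) ;;  # the two model types this README lists
    *) echo "modelType must be 'torch' or 'bigdl', got '$model_type'" >&2
       return 1 ;;
  esac
  printf '%s\n' \
    --modelPath "$model_path" \
    --modelType "$model_type" \
    --folder "$folder" \
    --isHdfs "$is_hdfs"
}
```

Calling `build_predict_args ./resnet-18.t7 torch ./predict` prints the same trailing options used in the commands above, while a call with any other model type returns a non-zero status and an error message.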



