com.intel.analytics.zoo.examples.resnet.README.md Maven / Gradle / Ivy
Go to download
Show more of this group Show more artifacts with this name
Show all versions of analytics-zoo-bigdl_0.12.2-spark_2.2.1 Show documentation
Show all versions of analytics-zoo-bigdl_0.12.2-spark_2.2.1 Show documentation
Big Data AI platform for distributed TensorFlow and PyTorch on Apache Spark.
The newest version!
# ResNet
This example demonstrates how to use Analytics Zoo to train and evaluate the [ResNet](https://arxiv.org/abs/1512.03385) architecture on ImageNet data
## Data Processing
We use pipeline to process the input data.
Input data are transformed by several pipeline classes, such as HFlip, BGRImgNormalizer, RandomCropper, etc.
## Model
ShortcutType is a unique feature defined in ResNet. ShortcutType-B is used for ImageNet.
Model is implemented in ResNet
## Get the JAR
You can build one by refer to the
[Build Page](https://analytics-zoo.github.io/master/#ScalaUserGuide/install/#build-with-script-recommended) from the source code.
## Train ResNet on ImageNet
This example shows the best practise we've experimented in multi-node training
### Prepare ImageNet DataSet
The imagenet dataset preparation can be found from
[BigDL inception Prepare the data](https://github.com/intel-analytics/analytics-zoo/tree/master/zoo/src/main/scala/com/intel/analytics/zoo/examples/inception#prepare-the-data).
### Training
* Spark standalone example command
```bash
export SPARK_HOME=the root directory of Spark
export ANALYTICS_ZOO_HOME=the folder where you extract the downloaded Analytics Zoo zip package
${ANALYTICS_ZOO_HOME}/bin/spark-shell-with-zoo.sh \
--master spark://xxx.xxx.xxx.xxx:xxxx \
--executor-cores cores_per_executor \
--total-executor-cores total_cores_for_the_job \
--class com.intel.analytics.zoo.examples.resnet.TrainImageNet \
-f hdfs://xxx.xxx.xxx.xxx:xxxx/imagenet \
--batchSize 8192 --nEpochs 90 --learningRate 0.1 --warmupEpoch 5 \
--maxLr 3.2 --cache /cache --depth 50 --classes 1000
```
### Parameters
--folder | -f [the directory to reach the data]
--batchSize [default 8192, should be n*nodeNumber*coreNumber]
--nEpochs [number of epochs to train]
--learningRate [default 0.1]
--warmupEpoch [warm up epochs]
--maxLr [max learning rate, default to 3.2]
--cache [directory to store snapshot]
--depth [number of layers for resnet, default to 50]
--classes [number of classes, default to 1000]
### Training reference
#### Hyper Parameters
**Global batch** : 8192
**Single batch per core** : 4
**Epochs** : 90
**Initial learning rate**: 0.1
**Warmup epochs**: 5
**Max learning rate**: 3.2
#### Training result (90 epochs)
**Top1 accuracy**: 0.76114
**Top5 accuracy**: 0.92724
© 2015 - 2025 Weber Informatics LLC | Privacy Policy