
org.apache.mahout.classifier.df.mapreduce.inmem.package-info

/**
 * <h2>In-memory MapReduce implementation of Random Decision Forests</h2>
 *
 * <p>Each mapper is responsible for growing a number of trees with a whole copy of the dataset loaded in
 * memory. It uses the reference implementation's code to build each tree and estimate the out-of-bag (oob)
 * error.</p>
 *
 * <p>The dataset is distributed to the slave nodes using the
 * {@link org.apache.hadoop.filecache.DistributedCache}.
 * A custom {@link org.apache.hadoop.mapreduce.InputFormat}
 * ({@link org.apache.mahout.classifier.df.mapreduce.inmem.InMemInputFormat}) is configured with the
 * desired number of trees and generates a number of {@link org.apache.hadoop.mapreduce.InputSplit}s
 * equal to the configured number of maps.</p>
 *
 * <p>There is no need for reducers: each map outputs the trees it built and, for each tree, the labels the
 * tree predicted for each out-of-bag instance. This step has to be done in the mapper because only there do
 * we know which instances are out-of-bag.</p>
 *
 * <p>The Forest builder ({@link org.apache.mahout.classifier.df.mapreduce.inmem.InMemBuilder}) is responsible
 * for configuring and launching the job.
 * At the end of the job it parses the output files and builds the corresponding
 * {@link org.apache.mahout.classifier.df.DecisionForest}.</p>
 */
package org.apache.mahout.classifier.df.mapreduce.inmem;
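The package description above mentions two mechanics that a small example may make more concrete: dividing the requested trees among the map tasks, and knowing inside the mapper which instances are out-of-bag. The following is a minimal JDK-only sketch under stated assumptions, not Mahout's code: the class `InMemSketch` and its methods `treesPerSplit`, `bootstrap`, and `outOfBag` are hypothetical names, and the even distribution of trees shown here is only one plausible policy; the actual `InMemInputFormat` may assign trees to splits differently.

```java
import java.util.Arrays;
import java.util.Random;
import java.util.stream.IntStream;

/** Hypothetical illustration only; it does not use or reproduce the Mahout or Hadoop APIs. */
final class InMemSketch {

  /**
   * Assumed policy: spread numTrees over numMaps splits as evenly as possible.
   * The first (numTrees % numMaps) splits receive one extra tree.
   */
  static int[] treesPerSplit(int numTrees, int numMaps) {
    if (numTrees < 0 || numMaps <= 0) {
      throw new IllegalArgumentException("numTrees must be >= 0 and numMaps must be > 0");
    }
    int[] counts = new int[numMaps];
    Arrays.fill(counts, numTrees / numMaps);
    for (int i = 0; i < numTrees % numMaps; i++) {
      counts[i]++;
    }
    return counts;
  }

  /**
   * Draws a bootstrap sample of n instances with replacement.
   * inBag[i] is true if instance i was drawn at least once.
   */
  static boolean[] bootstrap(int n, Random rng) {
    boolean[] inBag = new boolean[n];
    for (int draw = 0; draw < n; draw++) {
      inBag[rng.nextInt(n)] = true;
    }
    return inBag;
  }

  /** Returns the indices that were never drawn, i.e. the out-of-bag instances. */
  static int[] outOfBag(boolean[] inBag) {
    return IntStream.range(0, inBag.length).filter(i -> !inBag[i]).toArray();
  }

  public static void main(String[] args) {
    // Each "mapper" would grow its share of the trees on the full in-memory dataset.
    System.out.println("trees per split: " + Arrays.toString(treesPerSplit(10, 3)));

    // Only the code that drew the sample knows which instances are out-of-bag,
    // which is why oob predictions are produced where the tree is built.
    boolean[] inBag = bootstrap(20, new Random(1L));
    System.out.println("oob instances: " + Arrays.toString(outOfBag(inBag)));
  }
}
```

Because the in-bag mask exists only where the bootstrap sample is drawn, the out-of-bag predictions have to be emitted alongside the tree itself, which is consistent with the map-only design described above.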



