weka.gui.beans.README_KnowledgeFlow Maven / Gradle / Ivy

Go to download
Show more of this group Show more artifacts with this name
Show all versions of weka-stable Show documentation
The Waikato Environment for Knowledge Analysis (WEKA), a machine learning workbench. This is the stable version. Apart from bugfixes, this version does not receive any other updates.
There is a newer version: 3.8.6
Show newest version
===============================================================
KnowledgeFlow GUI Quick Primer
===============================================================

What's new in the KnowledgeFlow:

Components can now be grouped together under a "meta" component. Start
by placing some components on the layout and connecting them
together. Then select a subset of components with the mouse by holding
down the left button and dragging the resulting rectangle. You will
then be asked whether you wish to group the selected components and
for a name to give the group. The selected components will then be
replaced by a single icon on the layout. All grouped beans can still
be configured and connections made by right-clicking on the icon to
display a pop-up menu. At the moment meta components can't form part
of another group (this functionality will be added in a later
release). Eventually, functionality will be added to allow the user
to store custom groups in a user toolbar for reuse.

Introduction:

The KnowledgeFlow provides an alternative to the Explorer as a
graphical front end to Weka's core algorithms. The KnowledgeFlow is a
work in progress so some of the functionality from the Explorer is not
yet available. On the other hand, there are things that can be done in
the KnowledgeFlow but not in the Explorer.

The KnowledgeFlow presents a "data-flow" inspired interface to
Weka. The user can select Weka components from a tool bar, place them
on a layout canvas and connect them together in order to form a
"knowledge flow" for processing and analyzing data. At present, all of
Weka's classifiers, filters, clusterers, loaders and savers are
available in the KnowledgeFlow along with some extra tools.

The KnowledgeFlow can handle data either incrementally or in batches
(the Explorer handles batch data only). Of course learning from data
incrementally requires a classifier that can be updated on an instance
by instance basis. Currently in Weka there are five classifiers that
can handle data incrementally: NaiveBayesUpdateable, IB1, IBk, LWR
(locally weighted regression). There is also one meta classifier -
RacedIncrementalLogitBoost - that can use of any regression base
learner to learn from discrete class data incrementally.

Features of the KnowledgeFlow:

* intuitive data flow style layout
* process data in batches or incrementally 
* process multiple batches or streams in parallel! (each separate flow 
  executes in its own thread)
* chain filters together
* view models produced by classifiers for each fold in a cross validation
* visualize performance of incremental classifiers during 
  processing (scrolling plots of classification accuracy, RMS error, 
  predictions etc)

Components available in the KnowledgeFlow:

DataSources:
  All of Weka's loaders are available

DataSinks:
  All of Weka's savers are available

Filters:
  All of Weka's filters are available

Classifiers:
  All of Weka's classifiers are available

Clusterers:
  All of Weka's clusterers are available

Evaluation:
  TrainingSetMaker - make a data set into a training set
  TestSetMaker - make a data set into a test set
  CrossValidationFoldMaker - split any data set, training set or test set
    into folds
  TrainTestSplitMaker - split any data set, training set or test set into
    a training set and a test set
  ClassAssigner - assign a column to be the class for any data set, training
    set or test set
  ClassValuePicker - choose a class value to be considered as the "positive"
    class. This is useful when generating data for ROC style curves (see
    below)
  ClassifierPerformanceEvaluator - evaluate the performance of batch
    trained/tested classifiers
  IncrementalClassifierEvaluator - evaluate the performance of incrementally
    trained classifiers
  ClustererPerformanceEvaluator - evaluate the performance of batch
    trained/tested clusterers
  PredictionAppender - append classifier predictions to a test set. For
    discrete class problems, can either append predicted class labels or
    probability distributions

Visualization:
  DataVisualizer - component that can pop up a panel for visualizing data in
    a single large 2D scatter plot
  ScatterPlotMatrix - component that can pop up a panel containing a matrix of
    small scatter plots (clicking on a small plot pops up a large scatter 
    plot)
  AttributeSummarizer - component that can pop up a panel containing a matrix
    of histogram plots - one for each of the attributes in the input data
  ModelPerformanceChart - component that can pop up a panel for visualizing
    threshold (i.e. ROC style) curves.
  TextViewer - component for showing textual data. Can show data sets, 
    classification performance statistics etc.
  GraphViewer - component that can pop up a panel for visualizing tree based
    models
  StripChart - component that can pop up a panel that displays a scrolling
    plot of data (used for viewing the online performance of incremental
    classifiers)


---------------

Launching the KnowledgeFlow:

The Weka GUI Chooser window is used to launch Weka's graphical
environments. Select the button labeled "KnowledgeFlow" to start the
KnowledgeFlow. Alternatively, you can launch the KnowledgeFlow from a
terminal window by typing "java weka.gui.beans.KnowledgeFlow".

At the top of the KnowledgeFlow window is are seven tabs: DataSources,
DataSinks, Filters, Classifiers, Clusterers, Evaluation and
Visualization. The names are pretty much self explanatory.

EXAMPLE:
-----------------
Setting up a flow to load an arff file (batch mode) and
perform a cross validation using J48 (Weka's C4.5 implementation).

First start the KnowlegeFlow.

Next click on the DataSources tab and choose "ArffLoader" from the
toolbar (the mouse pointer will change to a "cross hairs").

Next place the ArffLoader component on the layout area by clicking
somewhere on the layout (A copy of the ArffLoader icon will appear on
the layout area).

Next specify an arff file to load by first right clicking the mouse
over the ArffLoader icon on the layout. A pop-up menu will
appear. Select "Configure" under "Edit" in the list from this menu and
browse to the location of your arff file. Alternatively, you can
double-click on the icon to bring up the configuration dialog (if
the component in question has one).

Next click the "Evaluation" tab at the top of the window and choose the
"ClassAssigner" (allows you to choose which column to be the class)
component from the toolbar. Place this on the layout.

Now connect the ArffLoader to the ClassAssigner: first right click
over the ArffLoader and select the "dataSet" under "Connections" in
the menu. A "rubber band" line will appear. Move the mouse over the
ClassAssigner component and left click - a red line labeled "dataSet"
will connect the two components.

Next right click over the ClassAssigner and choose "Configure" from
the menu. This will pop up a window from which you can specify which
column is the class in your data (last is the default).

Next grab a "CrossValidationFoldMaker" component from the Evaluation
toolbar and place it on the layout. Connect the ClassAssigner to the
CrossValidationFoldMaker by right clicking over "ClassAssigner" and
selecting "dataSet" from under "Connections" in the menu.

Next click on the "Classifiers" tab at the top of the window and
scroll along the toolbar until you reach the "J48" component in the
"trees" section. Place a J48 component on the layout.

Connect the CrossValidationFoldMaker to J48 TWICE by first choosing
"trainingSet" and then "testSet" from the pop-up menu for the
CrossValidationFoldMaker.

Next go back to the "Evaluation" tab and place a
"ClassifierPerformanceEvaluator" component on the layout. Connect J48
to this component by selecting the "batchClassifier" entry from the
pop-up menu for J48.

Next go to the "Visualization" toolbar and place a "TextViewer"
component on the layout. Connect the ClassifierPerformanceEvaluator to
the TextViewer by selecting the "text" entry from the pop-up menu for
ClassifierPerformanceEvaluator.

Now start the flow executing by selecting "Start loading" from the
pop-up menu for ArffLoader. Depending on how big the data set is and
how long cross validation takes you will see some animation from some
of the icons in the layout (J48's tree will "grow" in the icon and the
ticks will animate on the ClassifierPerformanceEvaluator). You will
also see some progress information in the "Status" bar and "Log" at
the bottom of the window.

When finished you can view the results by choosing show results from
the pop-up menu for the TextViewer component.

Other cool things to add to this flow: connect a TextViewer and/or a
GraphViewer to J48 in order to view the textual or graphical
representations of the trees produced for each fold of the cross
validation (this is something that is not possible in the Explorer).
-----------------------------