weka.gui.beans.README_KnowledgeFlow Maven / Gradle / Ivy
===============================================================
KnowledgeFlow GUI Quick Primer
===============================================================
What's new in the KnowledgeFlow:
The KnowledgeFlow has undergone a major UI overhaul in Weka 3.7.4. New
features include:
* Move from tabbed toolbars of components to a tree
* Multiple flow layouts (each in its own tab)
* New icons for general controls
* Support for cut, copy, paste and delete of multiple components
* "Save as" as well as "Save"
* Informational notes can now be added to the layout
* Undo buffer
* "Select all" button
* Drag multiple components
* Snap-to-grid
* New buttons to run flow in parallel mode (all data sources launched simultaneously)
and sequential mode (data sources launched one after the other in an order
specified by the user)
* Template flows demonstrating example learning processes
* Plugin "perspectives" that allow additional functionality to take over
the main UI
Introduction:
The KnowledgeFlow provides an alternative to the Explorer as a
graphical front end to Weka's core algorithms. It presents a
"data-flow" inspired interface to Weka. The user can select Weka
components from a pallete, place them on a layout canvas and connect
them together in order to form a "knowledge flow" for processing and
analyzing data. At present, all of Weka's classifiers, filters,
clusterers, loaders and savers are available in the KnowledgeFlow
along with some extra tools.
The KnowledgeFlow can handle data either incrementally or in batches
(the Explorer handles batch data only). Of course learning from data
incrementally requires a classifier that can be updated on an instance
by instance basis. There are a number of schemes that can handle data
incrementally: NaiveBayesUpdateable, IB1, IBk, LWR (locally weighted
regression), SGD, SPegasos, Cobweb and RacedIncrementalLogitBoost.
Features of the KnowledgeFlow:
* intuitive data flow style layout
* process data in batches or incrementally
* process multiple batches or streams in parallel! (each separate flow
executes in its own thread). Alternatively, multiple streams can be
executed sequentially, in a user-specified order
* chain filters together
* view models produced by classifiers for each fold in a cross validation
* visualize performance of incremental classifiers during
processing (scrolling plots of classification accuracy, RMS error,
predictions etc)
* access additional non flow-based functionality through plugin
"perspectives"
Components available in the KnowledgeFlow:
DataSources:
All of Weka's loaders are available
DataSinks:
All of Weka's savers are available
Filters:
All of Weka's filters are available
Classifiers:
All of Weka's classifiers are available
Clusterers:
All of Weka's clusterers are available
Evaluation:
TrainingSetMaker - make a data set into a training set
TestSetMaker - make a data set into a test set
CrossValidationFoldMaker - split any data set, training set or test set
into folds
TrainTestSplitMaker - split any data set, training set or test set into
a training set and a test set
ClassAssigner - assign a column to be the class for any data set, training
set or test set
ClassValuePicker - choose a class value to be considered as the "positive"
class. This is useful when generating data for ROC style curves (see
below)
ClassifierPerformanceEvaluator - evaluate the performance of batch
trained/tested classifiers
IncrementalClassifierEvaluator - evaluate the performance of incrementally
trained classifiers
ClustererPerformanceEvaluator - evaluate the performance of batch
trained/tested clusterers
PredictionAppender - append classifier predictions to a test set. For
discrete class problems, can either append predicted class labels or
probability distributions
SerializedModelSaver - save a classifier out to a file for later use.
Visualization:
DataVisualizer - component that can pop up a panel for visualizing data in
a single large 2D scatter plot
ScatterPlotMatrix - component that can pop up a panel containing a matrix of
small scatter plots (clicking on a small plot pops up a large scatter
plot)
AttributeSummarizer - component that can pop up a panel containing a matrix
of histogram plots - one for each of the attributes in the input data
ModelPerformanceChart - component that can pop up a panel for visualizing
threshold (i.e. ROC style) curves.
TextViewer - component for showing textual data. Can show data sets,
classification performance statistics etc.
GraphViewer - component that can pop up a panel for visualizing tree based
models
StripChart - component that can pop up a panel that displays a scrolling
plot of data (used for viewing the online performance of incremental
classifiers)
CostBenefitAnalysis - interactively and graphically explore the effects
of changing costs/benefits and adjusting prediction thresholds.
---------------
Launching the KnowledgeFlow:
The Weka GUI Chooser window is used to launch Weka's graphical
environments. Select the button labeled "KnowledgeFlow" to start the
KnowledgeFlow. Alternatively, you can launch the KnowledgeFlow from a
terminal window by typing "java weka.gui.beans.KnowledgeFlow".
EXAMPLE:
-----------------
Setting up a flow to load an arff file (batch mode) and
perform a cross validation using J48 (Weka's C4.5 implementation). NOTE,
this example ("Cross validation") can be accessed from the Templates
button (third in from the right in the toolbar) in the KnowledgeFlow
UI.
First start the KnowlegeFlow.
Next expand the DataSources entry in the tree and choose "ArffLoader"
from the toolbar (the mouse pointer will change to a "cross hairs").
Next place the ArffLoader component on the layout area by clicking
somewhere on the layout (A copy of the ArffLoader icon will appear on
the layout area).
Next specify an arff file to load by first right clicking the mouse
over the ArffLoader icon on the layout. A pop-up menu will
appear. Select "Configure" under "Edit" in the list from this menu and
browse to the location of your arff file. Alternatively, you can
double-click on the icon to bring up the configuration dialog (if
the component in question has one).
Next expand the "Evaluation" entry in the tree and choose the
"ClassAssigner" (allows you to choose which column to be the class)
component from the toolbar. Place this on the layout.
Now connect the ArffLoader to the ClassAssigner: first right click
over the ArffLoader and select the "dataSet" under "Connections" in
the menu. A "rubber band" line will appear. Move the mouse over the
ClassAssigner component and left click - a red line labeled "dataSet"
will connect the two components.
Next right click over the ClassAssigner and choose "Configure" from
the menu. This will pop up a window from which you can specify which
column is the class in your data (last is the default).
Next grab a "CrossValidationFoldMaker" component from Evaluation
and place it on the layout. Connect the ClassAssigner to the
CrossValidationFoldMaker by right clicking over "ClassAssigner" and
selecting "dataSet" from under "Connections" in the menu.
Next expand the "Classifiers" entry in the tree, then the "trees"
sub-entry and select the "J48" component. Place it on the layout.
Connect the CrossValidationFoldMaker to J48 TWICE by first choosing
"trainingSet" and then "testSet" from the pop-up menu for the
CrossValidationFoldMaker.
Next go back to the "Evaluation" entry and place a
"ClassifierPerformanceEvaluator" component on the layout. Connect J48
to this component by selecting the "batchClassifier" entry from the
pop-up menu for J48.
Next expand the "Visualization" entry and place a "TextViewer"
component on the layout. Connect the ClassifierPerformanceEvaluator to
the TextViewer by selecting the "text" entry from the pop-up menu for
ClassifierPerformanceEvaluator.
Now start the flow executing by pressing the blue "play" icon at the
top-left of the display. Progress information for the executing
components willa appear in the "Status" area and "Log" at the bottom
of the window.
When finished you can view the results by choosing show results from
the pop-up menu for the TextViewer component.
Other cool things to add to this flow: connect a TextViewer and/or a
GraphViewer to J48 in order to view the textual or graphical
representations of the trees produced for each fold of the cross
validation (this is something that is not possible in the Explorer).
-----------------------------
© 2015 - 2025 Weber Informatics LLC | Privacy Policy