Search Algorithms: Find Pure Clusters
Search Algorithms: BPC
Introduction
Build Pure Clusters (BPC) is one of the three algorithms in Tetrad
designed to build pure
measurement/structural models (the others are the MIM Build algorithm
and the Purify algorithm).
The goal of Build Pure Clusters is to build a pure measurement model
using observed variables from a data set. Observed variables are
clustered into disjoint groups, each group representing
indicators of a single hidden variable. Variables in one group are not
indicators of the hidden variables associated with the other groups. Also, some variables given as input will not
be used because they do not fit into a pure measurement model along with
the chosen ones.
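The bookkeeping described above (disjoint groups, with some input variables left unused) can be sketched in a few lines of Python. This is only an illustration: the function name `check_clustering` and the variable names `X1` to `X8` are hypothetical and are not part of Tetrad, and the check covers only disjointness, not purity itself, which is a property of the underlying causal model.

```python
def check_clustering(clusters, observed):
    """Return (is_disjoint, unused) for a proposed clustering.

    clusters: list of lists of variable names, one list per hidden variable.
    observed: list of all variable names given as input.
    """
    seen = set()
    is_disjoint = True
    for cluster in clusters:
        members = set(cluster)
        if seen & members:
            # A variable appears in more than one group.
            is_disjoint = False
        seen |= members
    # Input variables that were not assigned to any group.
    unused = sorted(set(observed) - seen)
    return is_disjoint, unused


observed = ["X1", "X2", "X3", "X4", "X5", "X6", "X7", "X8"]
clusters = [["X1", "X2", "X3"], ["X4", "X5", "X6"]]
is_disjoint, unused = check_clustering(clusters, observed)
print(is_disjoint, unused)
```

In this hypothetical input, the two groups share no variable, and `X7` and `X8` are the variables left out of the model.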
The Build Pure Clusters algorithm assumes that the population can be
described as a measurement/structural model where observed variables are linear indicators of the unknown latents.
Notice that linearity among latents is not necessary (although it will
be necessary for the MIM Build algorithm),
and latents do not need to be continuous. It is also assumed that the
unknown population graph contains a pure subgraph where each latent has
at least three indicators. This assumption is not testable and should be
evaluated by the plausibility of the final model.
The current implementation of the algorithm
accepts only continuous data sets as input. For general information
about model building algorithms, consult the Search Algorithms page.
Entering Build Pure Clusters parameters
For example, consider a model with this true graph:
If data are generated from this model and a search is set up on the data with BPC selected, the following
parameters will be requested:
- depErrorsAlpha value: Build Pure Clusters uses
statistical hypothesis tests in order to generate models automatically.
The depErrorsAlpha value parameter is the significance level at which such tests
accept or reject the constraints that compose the final output. The
default value is 0.05, but the user may want to experiment with
different depErrorsAlpha values in order to test how sensitive the results
for her data are to this choice.
- number of iterations: Build Pure Clusters uses a
randomized procedure in order to generate a model, since in general
there are different pure measurement submodels of a given general
measurement/structural model. This option allows the user to specify a
given number of runs of the algorithm, where the outputs of the
runs are combined into a single model. This usually provides
models that are more robust against statistical fluctuations and
slight deviations from the assumptions.
- statistical test: as stated before, automated
model building is done by testing statistical hypotheses. Build Pure
Clusters provides two basic statistical tests that can be used.
Wishart's Tetrad test assumes that the given variables follow a multivariate
normal distribution. Bollen's Tetrad test does not make this assumption.
However, it needs to compute a matrix of fourth moments, which can be
time consuming. It is also less robust against sampling variability
when compared to Wishart's test if the data actually follow a
multivariate normal distribution.
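The constraints examined by both tests are tetrad constraints on the covariances of four variables. As a minimal sketch (not Tetrad's implementation; the function names, loadings, and error variances below are illustrative assumptions), the following Python code builds the population covariance matrix implied by a single latent with four linear indicators. It shows that all three tetrad differences vanish, and that adding a correlated error between two indicators breaks two of them. With sample data, the tests ask whether such differences are zero within sampling error.

```python
def implied_covariance(loadings, latent_var=1.0, error_var=1.0):
    """Population covariance of indicators x_i = loadings[i] * L + e_i,
    assuming independent errors e_i with a common variance."""
    n = len(loadings)
    cov = [[loadings[i] * loadings[j] * latent_var for j in range(n)]
           for i in range(n)]
    for i in range(n):
        cov[i][i] += error_var
    return cov


def tetrad_differences(cov, i, j, k, l):
    """Return the three tetrad differences among variables i, j, k, l."""
    return (
        cov[i][j] * cov[k][l] - cov[i][k] * cov[j][l],
        cov[i][j] * cov[k][l] - cov[i][l] * cov[j][k],
        cov[i][k] * cov[j][l] - cov[i][l] * cov[j][k],
    )


# Pure one-factor case: every tetrad difference is zero up to rounding.
pure_cov = implied_covariance([0.9, 0.7, 1.2, 0.5])
pure_diffs = tetrad_differences(pure_cov, 0, 1, 2, 3)

# Impure case: the errors of the first two indicators are correlated,
# so their covariance is no longer explained by the latent alone.
impure_cov = implied_covariance([0.9, 0.7, 1.2, 0.5])
impure_cov[0][1] += 0.3
impure_cov[1][0] += 0.3
impure_diffs = tetrad_differences(impure_cov, 0, 1, 2, 3)

print(pure_diffs)
print(impure_diffs)
```

In the impure case, the two differences that involve the covariance between the first two indicators become clearly nonzero, while the third remains approximately zero.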
Upon executing the search, BPC returns a pure measurement
model. Because of the internal randomization, outputs may vary from run
to run, but one should not expect large differences (and this can
actually be used to evaluate whether the assumptions are reasonable for a given
set of input variables). In our example, the outcome should be as
follows if the sample is representative of the population:
Edges with circles at the endpoints are added only to distinguish
latent variables from the indicators. BPC does not make
any claims about the causal relationships among latent variables (this
is the role of the MIM Build algorithm). The labels given to the latent
variables are arbitrary. As part of the analysis, a domain expert
should evaluate whether such latents indeed have a physical or abstract
meaning, or whether they should be discarded as meaningless. Such
reification is domain dependent.
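Because latent labels are arbitrary and outputs may vary between runs, any comparison of two outputs should ignore the labels. One possible way to do this, offered here only as a sketch and not as something Tetrad provides, is to compare which pairs of indicators end up in the same group. The function names and the example clusterings below are hypothetical.

```python
from itertools import combinations


def co_clustered_pairs(clusters):
    """Set of unordered indicator pairs that share a group."""
    pairs = set()
    for cluster in clusters:
        pairs.update(frozenset(p) for p in combinations(sorted(cluster), 2))
    return pairs


def pair_agreement(run_a, run_b):
    """Jaccard overlap of co-clustered pairs; 1.0 means identical groupings."""
    a, b = co_clustered_pairs(run_a), co_clustered_pairs(run_b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


# Two hypothetical outputs: same groups in a different order,
# except that the second run also assigns X7 to one group.
run_1 = [{"X1", "X2", "X3"}, {"X4", "X5", "X6"}]
run_2 = [{"X4", "X5", "X6"}, {"X1", "X2", "X3", "X7"}]
score = pair_agreement(run_1, run_2)
print(score)
```

A score close to 1 across repeated runs would be in line with the expectation that outputs do not differ much; what counts as a large difference is left to the analyst.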
Note: If the output is not arranged helpfully, use the Fruchterman-Reingold layout in the Layout menu to arrange it more
readably.