


    Search Algorithms: Find Pure Clusters
    


Search Algorithms: BPC



Introduction

Build Pure Clusters (BPC) is one of the three algorithms in Tetrad designed to build pure measurement/structural models (the others are the MIM Build algorithm and the Purify algorithm).

The goal of Build Pure Clusters is to build a pure measurement model using observed variables from a data set. Observed variables are clustered into disjoint groups, each group representing indicators of a single hidden variable. Variables in one group are not indicators of the hidden variables associated with the other groups. Also, some variables given as input will not be used because they do not fit into a pure measurement model along with the chosen ones.

The Build Pure Clusters algorithm assumes that the population can be described as a measurement/structural model where observed variables are linear indicators of the unknown latents. Notice that linearity among latents is not necessary (although it will be necessary for the MIM Build algorithm) and latents do not need to be continuous. It is also assumed that the unknown population graph contains a pure subgraph where each latent has at least three indicators. This assumption is not testable and should be evaluated by the plausibility of the final model.
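To make the purity assumption concrete, here is a minimal sketch of the kind of constraint that tetrad tests check. It assumes a single latent L with unit variance and linear indicators x_i = loadings[i] * L + e_i. In that setting every tetrad difference among four indicators vanishes in the population, while a covariance between two error terms (an impurity) breaks some of them. The function names, loadings, and error values are illustrative and are not part of Tetrad.

```python
def implied_covariance(loadings, error_vars, error_covs=None):
    """Population covariance of x_i = loadings[i] * L + e_i, with Var(L) = 1.

    error_covs maps an index pair (i, j) to Cov(e_i, e_j); a nonzero entry
    makes x_i and x_j impure indicators.
    """
    p = len(loadings)
    cov = [[loadings[i] * loadings[j] for j in range(p)] for i in range(p)]
    for i in range(p):
        cov[i][i] += error_vars[i]
    for (i, j), c in (error_covs or {}).items():
        cov[i][j] += c
        cov[j][i] += c
    return cov


def tetrad_differences(cov, i, j, k, m):
    """The three tetrad differences among variables i, j, k, m."""
    return (
        cov[i][j] * cov[k][m] - cov[i][k] * cov[j][m],
        cov[i][j] * cov[k][m] - cov[i][m] * cov[j][k],
        cov[i][k] * cov[j][m] - cov[i][m] * cov[j][k],
    )


# Made-up loadings and error variances for four indicators of one latent.
loadings = [1.0, 0.8, 1.2, 0.9]
error_vars = [1.0, 1.0, 1.0, 1.0]

pure = implied_covariance(loadings, error_vars)
# Same model, but the errors of x0 and x1 now covary (an impurity).
impure = implied_covariance(loadings, error_vars, {(0, 1): 0.3})

pure_tetrads = tetrad_differences(pure, 0, 1, 2, 3)
impure_tetrads = tetrad_differences(impure, 0, 1, 2, 3)
print("pure:  ", pure_tetrads)
print("impure:", impure_tetrads)
```

In the pure case all three differences are zero up to floating-point rounding. In the impure case the two differences that involve the covariance of x0 and x1 are nonzero, which is the kind of signal a search can use to leave one of those variables out of a cluster.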

The current implementation of the algorithm accepts only continuous data sets as input. For general information about model building algorithms, consult the Search Algorithms page.


Entering Build Pure Clusters parameters

For example, consider a model with this true graph:

If data is generated using this model and a search is constructed from the data, selecting BPC, the following parameters will be requested:

  • depErrorsAlpha value: Build Pure Clusters uses statistical hypothesis tests in order to generate models automatically. The depErrorsAlpha value parameter is the significance level at which such tests accept or reject the constraints that compose the final output. The default value is 0.05, but the user may want to experiment with different depErrorsAlpha values in order to see how sensitive the results for her data are to this choice.
  • number of iterations: Build Pure Clusters uses a randomized procedure in order to generate a model, since in general there are different pure measurement submodels of a given general measurement/structural model. This option allows the user to specify the number of runs of the algorithm; the outputs of the runs are combined into a single model. This usually provides models that are more robust against statistical fluctuations and slight deviations from the assumptions.
  • statistical test: as stated before, automated model building is done by testing statistical hypotheses. Build Pure Clusters provides two basic statistical tests that can be used. Wishart's Tetrad test assumes that the given variables follow a multivariate normal distribution. Bollen's Tetrad test does not make this assumption. However, it needs to compute a matrix of fourth moments, which can be time consuming. It is also less robust against sampling variability than Wishart's test if the data actually follows a multivariate normal distribution.
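The sketch below illustrates how a significance level such as depErrorsAlpha turns a test into an accept/reject decision, and why a distribution-free test needs fourth-order moments. It is not Tetrad's implementation of either Wishart's or Bollen's test. It is a simplified single-tetrad z-test based on the delta method, in which the sampling covariance of two sample covariances s_pq and s_uw is estimated as (m_pquw - s_pq * s_uw) / n, where m_pquw is a fourth-order sample moment. The simulated data, the function names, and the sample size are assumptions made for illustration.

```python
import math
import random


def simulate(n, error_cov_01=0.0, seed=0):
    """Draw n rows of four indicators x_i = L + e_i with unit error variance.

    error_cov_01 in [0, 1] is Cov(e_0, e_1); any value above 0 breaks purity.
    """
    rng = random.Random(seed)
    own_sd = math.sqrt(1.0 - error_cov_01)
    rows = []
    for _ in range(n):
        latent = rng.gauss(0.0, 1.0)
        shared = math.sqrt(error_cov_01) * rng.gauss(0.0, 1.0)
        errors = [rng.gauss(0.0, 1.0) for _ in range(4)]
        errors[0] = shared + own_sd * errors[0]
        errors[1] = shared + own_sd * errors[1]
        rows.append([latent + e for e in errors])
    return rows


def tetrad_p_value(rows, i, j, k, m):
    """Two-sided p-value for H0: s_ij * s_km - s_ik * s_jm = 0 (delta method)."""
    n = len(rows)
    width = len(rows[0])
    means = [sum(row[v] for row in rows) / n for v in range(width)]
    centered = [[row[v] - means[v] for v in range(width)] for row in rows]

    def moment(*idx):
        # Sample central moment of the listed variables (order 2 or 4 here).
        return sum(math.prod(c[v] for v in idx) for c in centered) / n

    pairs = [(i, j), (k, m), (i, k), (j, m)]
    s = [moment(*pair) for pair in pairs]
    tetrad = s[0] * s[1] - s[2] * s[3]
    # Partial derivatives of the tetrad difference w.r.t. each covariance.
    gradient = [s[1], s[0], -s[3], -s[2]]

    variance = 0.0
    for a, (p, q) in enumerate(pairs):
        for b, (u, w) in enumerate(pairs):
            # Fourth moments enter here: n * Cov(s_pq, s_uw) estimate.
            acov = moment(p, q, u, w) - s[a] * s[b]
            variance += gradient[a] * gradient[b] * acov
    z = tetrad / math.sqrt(variance / n)
    return math.erfc(abs(z) / math.sqrt(2.0))


alpha = 0.05  # plays the role of depErrorsAlpha in this sketch
p_pure = tetrad_p_value(simulate(1000), 0, 1, 2, 3)
p_impure = tetrad_p_value(simulate(1000, error_cov_01=0.8), 0, 1, 2, 3)
for label, p in [("pure", p_pure), ("impure", p_impure)]:
    verdict = "accepted" if p >= alpha else "rejected"
    print(f"{label}: p = {p:.4g}, vanishing tetrad {verdict} at alpha = {alpha}")
```

Even for a single tetrad, the variance estimate requires a fourth-order moment for every pair of covariances involved, which hints at why a test built on fourth moments becomes expensive as the number of variables grows. Raising or lowering alpha changes how readily a constraint is rejected, which is the sensitivity the depErrorsAlpha description refers to.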


Interpreting the Output

Upon executing the search, BPC returns a pure measurement model. Because of the internal randomization, outputs may vary from run to run, but one should not expect large differences (and this can actually be used to evaluate whether the assumptions are reasonable for a given set of input variables). In our example, the outcome should be as follows if the sample is representative of the population:

Edges with circles at the endpoints are added only to distinguish latent variables from the indicators. BPC does not make any claims about the causal relationships among latent variables (this is the role of the MIM Build algorithm). The labels given to the latent variables are arbitrary. As part of the analysis, a domain expert should evaluate whether such latents indeed have a physical or abstract meaning, or whether they should be discarded as meaningless. Such reification is domain dependent.
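Since latent labels are arbitrary and outputs may vary between runs, one informal way to compare two runs is to ignore the labels and look only at which pairs of observed variables end up in the same cluster. The sketch below is a hypothetical helper for that purpose and is not a Tetrad feature. It assumes each run's output has been written down by hand as a list of sets of variable names, and the variable names are made up.

```python
from itertools import combinations


def co_clustered_pairs(clusters):
    """Set of unordered variable pairs that share a cluster."""
    return {
        frozenset(pair)
        for cluster in clusters
        for pair in combinations(sorted(cluster), 2)
    }


def pair_agreement(run_a, run_b):
    """Jaccard overlap of co-clustered pairs between two runs (1.0 = identical)."""
    a = co_clustered_pairs(run_a)
    b = co_clustered_pairs(run_b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


# Two hypothetical runs; the second swaps X6 for X7 in one cluster.
run_1 = [{"X1", "X2", "X3"}, {"X4", "X5", "X6"}]
run_2 = [{"X1", "X2", "X3"}, {"X4", "X5", "X7"}]

print(pair_agreement(run_1, run_1))
print(pair_agreement(run_1, run_2))
```

Values close to 1.0 across several runs suggest a stable clustering. Consistently low values may indicate that the assumptions are a poor fit for the chosen input variables.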

Note: If the output is not arranged helpfully, use the Fruchterman-Reingold layout in the Layout menu to arrange it more readably.




