


    




    
        
            Loading Data
        
    




    Tabular Data
The basic file format for tabular data in Tetrad is a standard whitespace-, tab-, or comma-delimited listing
    of variables and data, where padding spaces are ignored. So, for instance (showing tabs as "\t"),

    X1 \t X2 \t X3 \t X4 \t X5

is interpreted as a list of five variable names, "X1", "X2", "X3", "X4", and
    "X5"; the spaces are ignored. If tabs or commas are used to delimit values, then missing values may be
    represented by consecutive tabs (or commas) with nothing but spaces in between. For example, 

    1.2\t\t1.4\t\t\t1.5

or

    1.2,,1.4,,, 1.5

would be read as 1.2, followed by a missing value, followed by 1.4, followed by two missing values, followed by 1.5.
    If only spaces are used to delimit values, then missing values must be denoted using asterisks ("*"), as
    follows:

    1.2 * 1.4 * * 1.5

Asterisks may always be used to denote missing values--e.g.,

    1.2\t * \t 1.4 \t * \t * \t 1.5

but allowing them to be specified by consecutive tabs or commas makes it easier to read some data sets.
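The missing-value rules above can be sketched in a few lines of Python. This is an illustration only, not Tetrad's own parser: the function name `parse_row` and the use of `None` for a missing value are assumptions made for the example, as is treating a row as delimited by tabs or commas whenever either character appears in it.

```python
import re

MISSING = None  # assumption of this sketch: a missing value is stored as None


def parse_row(line):
    """Split one data row into floats, using None for missing values."""
    if "\t" in line or "," in line:
        # Tab- or comma-delimited: a field that is empty or holds only
        # spaces counts as a missing value.
        fields = re.split(r"[\t,]", line)
    else:
        # Space-delimited: fields are runs of non-space characters, so a
        # missing value has to be written explicitly as "*".
        fields = line.split()
    values = []
    for field in fields:
        token = field.strip()  # padding spaces are ignored
        if token == "" or token == "*":
            values.append(MISSING)
        else:
            values.append(float(token))
    return values
```

Under these assumptions, all four example rows shown above parse to the same six values: 1.2, missing, 1.4, missing, missing, 1.5.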
When reading tabular data, Tetrad expects either rectangular continuous data sets or rectangular discrete data sets.
    Since data sets may in some cases be quite large, Tetrad does not try to guess which type of data set you are
    trying to load. Instead, you tell it explicitly by including "/continuousdata" or "/discretedata"
    at the top of the file, before the list of variable names.
Reading continuous data from a file is fairly straightforward. Reading discrete data, by contrast, can be a bit
    tricky. The difficulty is getting the details of the discrete variables right. Tetrad currently assumes that all
    discrete variables are nominal, sidestepping the problem of distinguishing between nominal and ordinal variables.
    Also, most of the time the categories for a variable can simply be read off of the column of values itself. Tetrad
    assumes that if the values in a column are non-negative integers, the categories for the variable for that column
    should be "0", "1", ..., "m", where "m" is the literal for the maximum
    integer in the column. So most of the time, if you are reading in integral data, Tetrad will get the categories for
    those variables correct as well. However, once in a while data needs to be read in whose variables
    don't satisfy these constraints. There are two basic problems:

    The variable for a column might be integral even if it shouldn't be interpreted as having categories 0, 1, ...,
        m for maximal m in the column. Or,
    
     The variable for a column might have categories that aren't attested anywhere in the column.

To solve these problems, for discrete data, a header section is allowed in which variables are defined.
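To make the default category rule and its two failure cases concrete, here is a minimal Python sketch. It is not Tetrad's implementation: the function name `infer_categories` is hypothetical, and the fallback for non-integer columns (distinct values in order of first appearance) is an assumption about what "read off of the column" means.

```python
def infer_categories(column):
    """Guess the category list for one discrete column of string tokens."""
    observed = [tok for tok in column if tok != "*"]  # skip missing values
    if observed and all(tok.isdecimal() for tok in observed):
        # Non-negative integers: categories are "0", "1", ..., "m",
        # where m is the largest integer seen in the column.
        m = max(int(tok) for tok in observed)
        return [str(i) for i in range(m + 1)]
    # Assumed fallback: distinct values in order of first appearance.
    seen = []
    for tok in observed:
        if tok not in seen:
            seen.append(tok)
    return seen
```

A column containing only the value "3" comes back with categories "0" through "3", which shows the first problem: the integer rule invents categories that may not be intended. A column of "low" and "high" values never yields a "medium" category, which shows the second: a category that is not attested in the data cannot be inferred from it.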


    

    


 
You can import any tab-delimited data. The data file needs a one-line header:

    

    For data for continuous variables, which must be numeric, the header line is:

    /continuous 

    

    For data for discrete variables, which can be alphanumeric, the header line is:

    /discrete

    

    For a lower triangular covariance or correlation matrix, the header is:

    /covariance

    

    You must also include a row with the names of the variables in the appropriate order for the data file or covariance
    matrix.

    

    For example, if you write an Excel file with the names of the variables in the first row, and save it as a text file, for
    example:

    

    /continuous

    X1    X2    X3

    4    2    1

    2    5    0.09

    

    and in the File menu above the Tetrad data sheet click "Load," then any previous data is erased and you see:

    

    

    

    



    
        Note that the variable names occur in the correct place at the top of each column. It is essential that the first row of the data file you wish to import
            contain the variable names, tab delimited. Also, do not include empty spaces between rows. 
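The file layout described above (a one-line header, then a row of variable names, then the data rows) can be sketched as a small Python reader. This is an illustrative sketch, not Tetrad's loader: the function name `read_tabular` is hypothetical, blank lines are skipped for convenience (which is more lenient than the text requires), and missing values are not handled.

```python
import io


def read_tabular(stream):
    """Read a header line, a row of variable names, and the data rows."""
    lines = [ln for ln in stream if ln.strip()]  # skip blank lines
    kind = None
    if lines and lines[0].strip().startswith("/"):
        # Header such as "/continuous" or "/discrete".
        kind = lines.pop(0).strip().lstrip("/")
    if not lines:
        raise ValueError("no row of variable names found")
    names = lines[0].split()
    rows = [ln.split() for ln in lines[1:]]
    for row in rows:
        if len(row) != len(names):
            raise ValueError("row does not match the variable names: %r" % row)
    if kind == "continuous":
        # Continuous data must be numeric.
        rows = [[float(v) for v in row] for row in rows]
    return kind, names, rows


# Usage with the example file shown earlier (tab-delimited).
sample = io.StringIO("/continuous\nX1\tX2\tX3\n4\t2\t1\n2\t5\t0.09\n")
kind, names, rows = read_tabular(sample)
```

The length check reflects the requirement that the data be rectangular: every row must supply one value per variable name.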

            

            A Data box does not require that any other boxes have flowgraph edges directed into it. A standalone
            Data box can be used to import an external data file. 

            

            A data file can be saved by opening the "File" tab and clicking "Save." It can be reloaded just as can any
            imported file.

            

            Inside a Tetrad data sheet you can use the mouse to select individual cells, rows or columns. (To select
            more than one column or row, hold down the Shift key.) Then, by opening the Edit tab, you can copy, delete
            or insert cells, rows and columns. For example, you can select the X2 column in the picture above, copy it,
            select a new empty column, and paste a copy of the selected column in the new column.

            

            The Manip tab in the Tetrad data sheet permits simple manipulations of the data. Other data manipulations
            use the Manipulate Data box.

            

            1. Continuous data can be projected to a set of discrete values. For example, if you select all of the
            columns in a data sheet