Loading Data
Tabular Data
The basic file format for tabular data in Tetrad is a standard whitespace-, tab-, or comma-delimited listing
of variables and data, in which padding spaces are ignored. So, for instance (showing tabs as "\t"),
X1 \t X2 \t X3 \t X4 \t X5
is interpreted as a list of five variable names, "X1", "X2", "X3", "X4", and
"X5"; the spaces are ignored. If tabs or commas are used to delimit values, then missing values may be
represented by consecutive tabs (or commas) with nothing but spaces in between. For example,
1.2\t\t1.4\t\t\t1.5
or
1.2,,1.4,,, 1.5
would be read as 1.2, followed by a missing value, followed by 1.4, followed by two missing values, followed by 1.5.
If only spaces are used to delimit values, then missing values must be denoted using asterisks ("*"), as
follows:
1.2 * 1.4 * * 1.5
Asterisks may always be used to denote missing values--e.g.,
1.2\t * \t 1.4 \t * \t * \t 1.5
but allowing them to be specified by consecutive tabs or commas makes it easier to read some data sets.
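The delimiter and missing-value rules above can be sketched in a few lines of Python. This is only an illustration of the rules as described here, not Tetrad's own parser; the function name `parse_row` and the use of `None` to stand for a missing value are assumptions made for the example.

```python
def parse_row(line):
    """Split one line into tokens, with None marking a missing value.

    Hypothetical helper for illustration; not part of Tetrad.
    """
    line = line.rstrip("\r\n")
    if "\t" in line:
        fields = line.split("\t")   # tab-delimited
    elif "," in line:
        fields = line.split(",")    # comma-delimited
    else:
        fields = line.split()       # whitespace-delimited
    tokens = []
    for field in fields:
        value = field.strip()       # padding spaces are ignored
        # An empty field or an asterisk denotes a missing value.
        tokens.append(None if value in ("", "*") else value)
    return tokens


for example in ["1.2\t\t1.4\t\t\t1.5", "1.2,,1.4,,, 1.5", "1.2 * 1.4 * * 1.5"]:
    print(parse_row(example))
```

Under these assumptions, all three example lines produce the same list: 1.2, a missing value, 1.4, two missing values, and 1.5.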
When reading tabular data, Tetrad expects either rectangular continuous data sets or rectangular discrete data sets.
Since data sets may in some cases be quite large, Tetrad does not try to guess which type of data set you are
trying to load. Instead, you tell it explicitly which type it is by including "/continuousdata" or "/discretedata"
at the top of the file, before the list of variable names.
Reading continuous data from a file is fairly straightforward. Reading discrete data, by contrast, can be a bit
tricky. The difficulty is getting the details of the discrete variables right. Tetrad currently assumes that all
discrete variables are nominal, sidestepping the problem of distinguishing between nominal and ordinal variables.
Also, most of the time the categories for a variable can simply be read off of the column of values itself. Tetrad
assumes that if the values in a column are non-negative integers, the categories for the variable for that column
should be "0", "1", ..., "m", where "m" is the literal for the maximum
integer in the column. So most of the time, if you are reading in integer data, Tetrad will get the categories for
your variables right. However, once in a while data needs to be read in whose variables
don't satisfy these constraints. There are two basic problems:
- The variable for a column might be integral even if it shouldn't be interpreted as having categories 0, 1, ...,
m for maximal m in the column. Or,
- The variable for a column might have categories that aren't attested anywhere in the column.
To solve these problems, for discrete data, a header section is allowed in which variables are defined.
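The default category rule, and the two ways it can go wrong, can be made concrete with a small Python sketch. It is not Tetrad code: the name `infer_categories` is hypothetical, and the fallback for non-integer columns (distinct values in order of first appearance) is an assumption chosen for the example.

```python
def infer_categories(column):
    """Guess the categories of a discrete column (illustrative only).

    column: list of string values, with None for missing entries.
    """
    present = [v for v in column if v is not None]
    if present and all(v.isascii() and v.isdigit() for v in present):
        # Non-negative integers: categories "0", "1", ..., "m",
        # where m is the maximum integer in the column.
        m = max(int(v) for v in present)
        return [str(i) for i in range(m + 1)]
    # Assumed fallback: distinct values in order of first appearance.
    seen = []
    for v in present:
        if v not in seen:
            seen.append(v)
    return seen


# Problem 1: codes 1 and 3 are expanded to 0..3, which may not be intended.
print(infer_categories(["1", "3", None, "3"]))
# Problem 2: a category such as "medium" that never occurs cannot be recovered.
print(infer_categories(["low", "high", "low"]))
```

The first call returns the categories "0" through "3" although only 1 and 3 occur in the column; the second returns only "low" and "high". Both cases need the variable definitions to be supplied explicitly.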
You can import any tab-delimited data. The data file needs a one-line header.
For data for continuous variables, which must be numeric, the header line is:
/continuous
For data for discrete variables, which can be alphanumeric, the header line is:
/discrete
For a lower triangular covariance or correlation matrix, the header is:
/covariance
You must also include a row with the names of the variables in the appropriate order for the data file or covariance
matrix.
For example, suppose you create an Excel file with the names of the variables in the first row and save it as a text
file like this:
/continuous
X1 X2 X3
4 2 1
2 5 0.09
If you then click "Load" in the File menu above the Tetrad data sheet, any previous data is erased and the imported data is displayed.
Note that the variable names occur in the correct place at the top of each column. It is essential that the first row of the data file you wish to import
contain the variable names, tab-delimited. Also, do not include blank lines between rows.
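As a rough sketch of the file layout described above (a one-line header, a row of variable names, then the data rows), the following Python function reads such a file into memory. It is not how Tetrad itself loads data; the name `read_tabular` and the returned tuple are assumptions for illustration. It handles only the "/continuous" and "/discrete" headers, and it splits on any whitespace, so it does not treat consecutive tabs as missing values.

```python
import io

HEADERS = {"/continuous", "/discrete"}


def read_tabular(stream):
    """Return (kind, variable_names, rows) from a text stream (sketch only)."""
    lines = stream.read().strip().splitlines()
    if len(lines) < 2 or lines[0].strip() not in HEADERS:
        raise ValueError("expected a header line followed by variable names")
    kind = lines[0].strip()[1:]        # "continuous" or "discrete"
    names = lines[1].split()           # row of variable names
    rows = []
    for number, line in enumerate(lines[2:], start=3):
        values = line.split()
        if len(values) != len(names):  # also rejects blank lines between rows
            raise ValueError(f"line {number}: expected {len(names)} values")
        if kind == "continuous":
            values = [float(v) for v in values]  # continuous data must be numeric
        rows.append(values)
    return kind, names, rows


sample = io.StringIO("/continuous\nX1 X2 X3\n4 2 1\n2 5 0.09\n")
kind, names, rows = read_tabular(sample)
print(kind, names, rows)
```

For the sample file above, the variable names are X1, X2, and X3, and the two data rows are read as floating-point numbers. A blank line between rows raises an error, because that row no longer matches the number of variables.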
A Data box does not require that any other boxes have flowgraph edges directed into it. A standalone
Data box can be used to import an external data file.
A data file can be saved by opening the "File" tab and clicking "Save." It can be reloaded just as can any
imported file.
Inside a Tetrad data sheet you can use the mouse to select individual cells, rows or columns. (To select
more than one column or row, hold down the Shift key.) Then, by opening the Edit tab, you can copy, delete
or insert cells, rows and columns. For example, you can select the X2 column in the picture above, copy it,
select a new empty column, and paste a copy of the selected column in the new column.
The Manip tab in the Tetrad data sheet permits simple manipulations of the data. Other data manipulations
use the Manipulate Data box.
1. Continuous data can be projected to a set of discrete values. For example, if you select all of the
columns in a data sheet