smile.data.package-info Maven / Gradle / Ivy
The newest version!
/*******************************************************************************
* Copyright (c) 2010 Haifeng Li
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*******************************************************************************/
/**
* Data and attribute encapsulation classes. A data is a set of datum objects,
* which are usually defined by attribute-value pairs. The datum object could
* be very sparse and thus is stored in a list to save space. A datum object
* may have an associated class label (for classification) or real-valued
* response value (for regression). Optionally, a datum object or attribute
* may have a (positive) weight value, whose meaning depends on applications.
* However, most machine learning methods are not able to utilize this extra
* weight information. There are, generally speaking, two major types of attributes:
*
* - Qualitative variables:
* - The data values are non-numeric categories. Examples: Blood type, Gender.
* - Quantitative variables:
* - The data values are counts or numerical measurements. A quantitative
* variable can be either discrete such as the number of students receiving
* an 'A' in a class, or continuous such as GPA, salary and so on.
*
* Another way of classifying data is by the measurement scales. In statistics,
* there are four generally used measurement scales:
*
* - Nominal data:
* - data values are non-numeric group labels. For example, Gender variable
* can be defined as male = 0 and female =1.
* - Ordinal data:
* - data values are categorical and may be ranked in some numerically
* meaningful way. For example, strongly disagree to strong agree may be
* defined as 1 to 5.
* - Continuous data:
* -
* Interval data:
* data values are ranged in a real interval, which can be as large as
* from negative infinity to positive infinity. The difference between two
* values are meaningful, however, the ratio of two interval data is not
* meaningful. For example temperature, IQ.
*
* Ratio data:
* both difference and ratio of two values are meaningful. For example,
* salary, weight.
*
*
*
* @author Haifeng Li
*/
package smile.data;
© 2015 - 2025 Weber Informatics LLC | Privacy Policy