de.citec.tcs.alignment.DissimilarityWeighting Maven / Gradle / Ivy

This module defines the AlignmentAlgorithm interface as well as some helper classes. An AlignmentAlgorithm computes an Alignment of two given input sequences, given a Comparator that works on these sequences. More details on the AlignmentAlgorithm can be found in the respective interface. More information on Comparators can be found in the comparators module.

The resulting 'Alignment' may be just a real-valued dissimilarity between the input sequences, or it may incorporate additional information, such as a full Alignment, a PathList, a PathMap or a CooptimalModel. Results that support the calculation of a gradient implement the DerivableAlignmentDistance interface. In more detail, the Alignment class represents the result of a backtracing scheme, listing all Operations that have been applied in one co-optimal Alignment.

A classic AlignmentAlgorithm does not yield a differentiable dissimilarity, because the minimum function is not differentiable. Therefore, this package also contains utility functions for a soft approximation of the minimum function, namely Softmin. For faster (parallel) computation of many different alignments or gradients, the module also provides the ParallelProcessingEngine, the SquareParallelProcessingEngine and the ParallelGradientEngine.
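To illustrate why a soft approximation of the minimum helps, here is a minimal stand-alone sketch of one common softmin definition: a weighted average of the inputs with weights proportional to exp(-beta * x_i). The class name `SoftminSketch` and its method are hypothetical and are not the API of the toolbox's Softmin class, whose exact definition may differ. The sketch only shows that the value is smooth in its inputs and moves toward the true minimum as beta grows.

```java
/** Hypothetical illustration only; not the toolbox's Softmin class. */
class SoftminSketch {

	/**
	 * Weighted average of x with weights exp(-beta * x_i) / sum_j exp(-beta * x_j).
	 * For beta = 0 this is the plain mean; for large beta it approaches min(x).
	 */
	static double softmin(double beta, double... x) {
		double numerator = 0;
		double denominator = 0;
		for (double xi : x) {
			final double e = Math.exp(-beta * xi);
			numerator += e * xi;
			denominator += e;
		}
		return numerator / denominator;
	}

	public static void main(String[] args) {
		final double[] x = {0.2, 0.5, 0.9};
		// Larger beta pulls the result closer to the true minimum 0.2.
		for (double beta : new double[]{0, 1, 10, 100}) {
			System.out.println("beta = " + beta + ", softmin = " + softmin(beta, x));
		}
	}
}
```

Unlike the hard minimum, this expression is differentiable with respect to every input, which is what makes gradient computation possible.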

/* 
 * TCS Alignment Toolbox Version 3
 * 
 * Copyright (C) 2016
 * Benjamin Paaßen
 * AG Theoretical Computer Science
 * Centre of Excellence Cognitive Interaction Technology (CITEC)
 * University of Bielefeld
 * 
 * This program is free software: you can redistribute it and/or modify
 * it under the terms of the GNU Affero General Public License as
 * published by the Free Software Foundation, either version 3 of the
 * License, or (at your option) any later version.
 * 
 * This program is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
 * GNU Affero General Public License for more details.
 * 
 * You should have received a copy of the GNU Affero General Public License
 * along with this program.  If not, see .
 */

package de.citec.tcs.alignment;

import java.util.List;
import lombok.NonNull;

/**
 * This is a helper class to enable users to weight a collection of dissimilarities based on
 * different schemes specified below.
 *
 * @author Benjamin Paassen - [email protected]
 */
public enum DissimilarityWeighting {

	/**
	 * This is a linear weighting of the input dissimilarities. In the TCS Alignment Toolbox,
	 * dissimilarities lie between 0 and 1. Therefore we can weight a dissimilarity d by 1-d.
	 *
	 * To be more precise, assume dissimilarities d_1, ... , d_M
	 *
	 * Then the weight of d_i is defined as:
	 *
	 * w_i := (1 - d_i) / (M - sum_j d_j)
	 */
	LINEAR,
	/**
	 * This is a softmin weighting of the dissimilarities. Assume dissimilarities d_1, ... , d_M.
	 * Then the weighting is defined as
	 *
	 * w_i := exp(-score(d_i) * beta) / (sum_j exp(-score(d_j) * beta))
	 *
	 * This is derived from the Boltzmann Free Energy Minimization. Beta is set to 4.6 to ensure
	 * that a worst possible path with score 1 contributes at most about 1% to the overall
	 * derivative if another path with an optimal score of 0 is present.
	 */
	SOFTMIN,
	/**
	 * In this case the weight of each dissimilarity is determined to be the value of the Gaussian
	 * probability density function with zero mean and standard deviation sigma at its score.
	 * Assume dissimilarities d_1, ... , d_M. Then the weighting is defined as
	 *
	 * w_i := exp(-0.5 * score(d_i)^2 / sigma^2) / (sum_j exp(-0.5 * score(d_j)^2 / sigma^2))
	 *
	 * sigma is set to 0.33 to ensure that a worst possible path with score 1 contributes at most
	 * about 1% to the overall derivative if another path with an optimal score of 0 is present.
	 */
	GAUSSIAN;

	private static final double SOFTMIN_BETA = 4.6;
	private static final double GAUSSIAN_NORMALIZATION = 1 / (2 * 0.33 * 0.33);

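	/**
	 * Convenience overload that copies the given scores into an array and delegates to
	 * {@link #calculateWeighting(double[])}.
	 *
	 * @param scores a list of dissimilarities.
	 *
	 * @return the normalized weights for the given dissimilarities.
	 */
	public double[] calculateWeighting(@NonNull final List<Double> scores) {
		final double[] scoreArr = new double[scores.size()];
		int i = 0;
		for (final Double score : scores) {
			scoreArr[i++] = score;
		}
		return calculateWeighting(scoreArr);
	}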

	/**
	 * This calculates the normalized weights (between 0 and 1 and adding up to 1) for the given
	 * dissimilarities.
	 *
	 * @param d an array of dissimilarities.
	 *
	 * @return the normalized weights (between 0 and 1 and adding up to 1) for the given
	 * dissimilarities.
	 */
	public double[] calculateWeighting(@NonNull double[] d) {
		if (this == SOFTMIN) {
			return Softmin.calculateSoftminProbabilities(SOFTMIN_BETA, d);
		}

		final double[] weights = new double[d.length];
		double normalization = 0;
		for (int i = 0; i < d.length; i++) {
			switch (this) {
				case LINEAR:
					weights[i] = 1 - d[i];
					break;
				case GAUSSIAN:
					weights[i] = Math.exp(-(d[i] * d[i]) * GAUSSIAN_NORMALIZATION);
					break;
				default:
					throw new UnsupportedOperationException(
							"The Weighting " + this + " is not supported!");
			}
			if (weights[i] < 0 || weights[i] > 1) {
				throw new IllegalArgumentException("A negative or too high dissimilarity was given, resulting in invalid weights.");
			}
			normalization += weights[i];
		}
		//normalize the weights
		if (normalization == 0) {
			// if there is no normalization possible (dividing by zero) then return a uniform
			// weighting
			for (int i = 0; i < d.length; i++) {
				weights[i] = 1. / d.length;
			}
		} else {
			for (int i = 0; i < weights.length; i++) {
				weights[i] /= normalization;
			}
		}

		return weights;
	}
}
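The three schemes can be compared side by side with a small stand-alone sketch. It follows the formulas given in the Javadoc above and reuses the constants 4.6 and 0.33, but the class `WeightingSketch` and its method names are hypothetical and do not depend on the toolbox, Lombok, or the Softmin class (whose implementation is not shown here). It omits the range check that the enum performs on the unnormalized weights.

```java
import java.util.Arrays;

/** Hypothetical stand-alone sketch; not part of the TCS Alignment Toolbox. */
class WeightingSketch {

	static final double BETA = 4.6;   // same value as SOFTMIN_BETA above
	static final double SIGMA = 0.33; // same sigma as in the GAUSSIAN Javadoc

	/** Divides each entry by the sum of all entries; falls back to uniform weights if the sum is 0. */
	static double[] normalize(double[] raw) {
		double sum = 0;
		for (double r : raw) {
			sum += r;
		}
		final double[] w = new double[raw.length];
		for (int i = 0; i < raw.length; i++) {
			w[i] = (sum == 0) ? 1.0 / raw.length : raw[i] / sum;
		}
		return w;
	}

	/** w_i = (1 - d_i) / (M - sum_j d_j) */
	static double[] linear(double[] d) {
		final double[] raw = new double[d.length];
		for (int i = 0; i < d.length; i++) {
			raw[i] = 1 - d[i];
		}
		return normalize(raw);
	}

	/** w_i = exp(-beta * d_i) / sum_j exp(-beta * d_j) */
	static double[] softmin(double[] d) {
		final double[] raw = new double[d.length];
		for (int i = 0; i < d.length; i++) {
			raw[i] = Math.exp(-BETA * d[i]);
		}
		return normalize(raw);
	}

	/** w_i = exp(-0.5 * d_i^2 / sigma^2) / sum_j exp(-0.5 * d_j^2 / sigma^2) */
	static double[] gaussian(double[] d) {
		final double[] raw = new double[d.length];
		for (int i = 0; i < d.length; i++) {
			raw[i] = Math.exp(-0.5 * d[i] * d[i] / (SIGMA * SIGMA));
		}
		return normalize(raw);
	}

	public static void main(String[] args) {
		final double[] d = {0.0, 0.5, 1.0};
		System.out.println("LINEAR   " + Arrays.toString(linear(d)));
		System.out.println("SOFTMIN  " + Arrays.toString(softmin(d)));
		System.out.println("GAUSSIAN " + Arrays.toString(gaussian(d)));
	}
}
```

For the input {0.0, 0.5, 1.0}, the linear scheme gives the worst score a weight of exactly 0, while the softmin and Gaussian schemes keep it small but positive. With only the two scores 0 and 1, both of those schemes assign the worse one a weight of about 1%, which matches the rationale given for beta and sigma.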



