com.hfg.bio.seq.pattern.ProteinPattern Maven / Gradle / Ivy
Go to download
Show more of this group Show more artifacts with this name
Show all versions of com_hfg Show documentation
Show all versions of com_hfg Show documentation
com.hfg xml, html, svg, and bioinformatics utility library
package com.hfg.bio.seq.pattern;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import com.hfg.bio.seq.BioSequenceType;
import com.hfg.bio.seq.Protein;
import com.hfg.bio.seq.SeqLocation;
//------------------------------------------------------------------------------
/**
Container for a protein pattern (motif).
From the PROSITE user manual:
The patterns are described using the following conventions:
- The standard IUPAC one-letter codes for the amino acids are used.
- The symbol 'x' is used for a position where any amino acid is accepted.
- Ambiguities are indicated by listing the acceptable amino acids for a given position, between square parentheses '[ ]'.
For example: [ALT] stands for Ala or Leu or Thr.
- Ambiguities are also indicated by listing between a pair of curly brackets '{ }' the amino acids that are not accepted
at a given position. For example: {AM} stands for any amino acid except Ala and Met.
- Each element in a pattern is separated from its neighbor by a '-'.
- Repetition of an element of the pattern can be indicated by following that element with a numerical value or a numerical
range between parenthesis. Examples: x(3) corresponds to x-x-x, x(2,4) corresponds to x-x or x-x-x or x-x-x-x.
- When a pattern is restricted to either the N- or C-terminal of a sequence, that pattern either starts with a '<' symbol
or respectively ends with a '>' symbol. In some rare cases (e.g. PS00267 or PS00539), '>' can also occur inside square
brackets for the C-terminal element. 'F-[GSTV]-P-R-L-[G>]' means that either 'F-[GSTV]-P-R-L-G' or 'F-[GSTV]-P-R-L>' are considered.
- A period ends the pattern.
Examples:
PA [AC]-x-V-x(4)-{ED}.
This pattern is translated as: [Ala or Cys]-any-Val-any-any-any-any-{any but Glu or Asp}
PA <A-x-[ST](2)-x(0,1)-V.
This pattern, which must be in the N-terminal of the sequence ('<'), is translated as: Ala-any-[Ser or Thr]-[Ser or Thr]-(any or none)-Val
@author J. Alex Taylor, hairyfatguy.com
*/
//------------------------------------------------------------------------------
// com.hfg Library
//
// This library is free software; you can redistribute it and/or
// modify it under the terms of the GNU Lesser General Public
// License as published by the Free Software Foundation; either
// version 2.1 of the License, or (at your option) any later version.
//
// This library is distributed in the hope that it will be useful,
// but WITHOUT ANY WARRANTY; without even the implied warranty of
// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
// Lesser General Public License for more details.
//
// You should have received a copy of the GNU Lesser General Public
// License along with this library; if not, write to the Free Software
// Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
//
// J. Alex Taylor, President, Founder, CEO, COO, CFO, OOPS hairyfatguy.com
// [email protected]
//------------------------------------------------------------------------------
public class ProteinPattern extends SeqPattern
{
//###########################################################################
// CONSTRUCTORS
//###########################################################################
//--------------------------------------------------------------------------
public ProteinPattern(String inPrositePattern)
{
super(inPrositePattern);
}
//###########################################################################
// PUBLIC METHODS
//###########################################################################
//--------------------------------------------------------------------------
public BioSequenceType getBioSequenceType()
{
return BioSequenceType.PROTEIN;
}
//--------------------------------------------------------------------------
public ProteinPattern setIgnoreGaps(boolean inValue)
{
return (ProteinPattern) super.setIgnoreGaps(inValue);
}
//--------------------------------------------------------------------------
@Override
public ProteinPattern setMaxMismatches(int inValue)
{
return (ProteinPattern) super.setMaxMismatches(inValue);
}
//--------------------------------------------------------------------------
protected ProteinPatternMatch createMatch(String inSeq, SeqLocation inLocation)
{
return new ProteinPatternMatch(this, inSeq, inLocation);
}
}