/*
* Copyright 2016 Miroslav Janíček
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*
* --
* Portions of this file are licensed under the Lua license. For Lua
* licensing details, please visit
*
* http://www.lua.org/license.html
*
* Copyright (C) 1994-2016 Lua.org, PUC-Rio.
*
* Permission is hereby granted, free of charge, to any person obtaining
* a copy of this software and associated documentation files (the
* "Software"), to deal in the Software without restriction, including
* without limitation the rights to use, copy, modify, merge, publish,
* distribute, sublicense, and/or sell copies of the Software, and to
* permit persons to whom the Software is furnished to do so, subject to
* the following conditions:
*
* The above copyright notice and this permission notice shall be
* included in all copies or substantial portions of the Software.
*
* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
* EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
* MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
* IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY
* CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT,
* TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE
* SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
*/
package net.sandius.rembulan.lib.impl;

import net.sandius.rembulan.lib.StringLib;
import net.sandius.rembulan.util.Check;

import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/**
* Patterns in Lua are described by regular strings, which are interpreted as patterns
* by the pattern-matching functions {@link StringLib#_find() {@code string.find}},
* {@link StringLib#_gmatch() {@code string.gmatch}},
* {@link StringLib#_gsub() {@code string.gsub}},
* and {@link StringLib#_match() {@code string.match}}.
* This section describes the syntax and the meaning (that is, what they match) of these
* strings.
*
*
* <h2>Character Class:</h2>
*
* <p>A <i>character class</i> is used to represent a set of characters. The following
* combinations are allowed in describing a character class:</p>
*
*
*
* <ul>
* <li><i>x</i>: (where <i>x</i> is not one of the magic characters
* {@code ^$()%.[]*+-?}) represents the character <i>x</i> itself.</li>
*
* <li>{@code .}: (a dot) represents all characters.</li>
*
* <li>{@code %a}: represents all letters.</li>
*
* <li>{@code %c}: represents all control characters.</li>
*
* <li>{@code %d}: represents all digits.</li>
*
* <li>{@code %g}: represents all printable characters except space.</li>
*
* <li>{@code %l}: represents all lowercase letters.</li>
*
* <li>{@code %p}: represents all punctuation characters.</li>
*
* <li>{@code %s}: represents all space characters.</li>
*
* <li>{@code %u}: represents all uppercase letters.</li>
*
* <li>{@code %w}: represents all alphanumeric characters.</li>
*
* <li>{@code %x}: represents all hexadecimal digits.</li>
*
* <li><code>%<i>x</i></code>: (where <i>x</i> is any non-alphanumeric character)
* represents the character <i>x</i>. This is the standard way to escape the magic characters.
* Any non-alphanumeric character (including all punctuation characters, even the non-magical)
* can be preceded by a {@code '%'} when used to represent itself in a pattern.</li>
*
* <li><code>[<i>set</i>]</code>: represents the class which is the union
* of all characters in <i>set</i>. A range of characters can be specified by separating the end
* characters of the range, in ascending order, with a {@code '-'}. All classes
* <code>%<i>x</i></code> described above can also be used as components in <i>set</i>.
* All other characters in <i>set</i> represent themselves. For example, {@code [%w_]}
* (or {@code [_%w]}) represents all alphanumeric characters plus the underscore,
* {@code [0-7]} represents the octal digits, and {@code [0-7%l%-]} represents the octal
* digits plus the lowercase letters plus the {@code '-'} character.
*
* <p>You can put a closing square bracket in a set by positioning it as the first character
* in the set. You can put a hyphen in a set by positioning it as the first or the last
* character in the set. (You can also use an escape for both cases.)</p>
*
* <p>The interaction between ranges and classes is not defined. Therefore, patterns like
* {@code [%a-z]} or {@code [a-%%]} have no meaning.</p>
* </li>
*
* <li><code>[^<i>set</i>]</code>: represents the complement of <i>set</i>,
* where <i>set</i> is interpreted as above.</li>
* </ul>
*
*
*
* <p>For all classes represented by single letters ({@code %a}, {@code %c}, etc.),
* the corresponding uppercase letter represents the complement of the class. For instance,
* {@code %S} represents all non-space characters.</p>
*
* <p>The definitions of letter, space, and other character groups depend on the current locale.
* In particular, the class {@code [a-z]} may not be equivalent to {@code %l}.</p>
*
*
* <h2>Pattern Item:</h2>
*
* <p>A <i>pattern item</i> can be</p>
*
*
*
* <ul>
* <li>a single character class, which matches any single character in the class;</li>
*
* <li>a single character class followed by {@code '*'}, which matches zero or more repetitions
* of characters in the class. These repetition items will always match the longest possible
* sequence;</li>
*
* <li>a single character class followed by {@code '+'}, which matches one or more repetitions
* of characters in the class. These repetition items will always match the longest possible
* sequence;</li>
*
* <li>a single character class followed by {@code '-'}, which also matches zero or more
* repetitions of characters in the class. Unlike {@code '*'}, these repetition items will
* always match the shortest possible sequence;</li>
*
* <li>a single character class followed by {@code '?'}, which matches zero or one occurrence
* of a character in the class. It always matches one occurrence if possible;</li>
*
* <li><code>%<i>n</i></code>, for <i>n</i> between 1 and 9; such an item matches a substring
* equal to the <i>n</i>-th captured string (see below);</li>
*
* <li><code>%b<i>xy</i></code>, where <i>x</i> and <i>y</i> are two distinct characters;
* such an item matches strings that start with <i>x</i>, end with <i>y</i>, and where the
* <i>x</i> and <i>y</i> are balanced. This means that, if one reads the string from
* left to right, counting +1 for an <i>x</i> and -1 for a <i>y</i>, the ending <i>y</i>
* is the first <i>y</i> where the count reaches 0. For instance, the item {@code %b()}
* matches expressions with balanced parentheses.</li>
*
* <li><code>%f[<i>set</i>]</code>, a frontier pattern; such an item matches an empty string
* at any position such that the next character belongs to <i>set</i> and the previous
* character does not belong to <i>set</i>. The set <i>set</i> is interpreted as previously
* described. The beginning and the end of the subject are handled as if they were
* the character {@code '\0'}.</li>
* </ul>
*
*
*
* <h2>Pattern:</h2>
*
* <p>A <i>pattern</i> is a sequence of pattern items. A caret {@code '^'} at the beginning
* of a pattern anchors the match at the beginning of the subject string. A {@code '$'}
* at the end of a pattern anchors the match at the end of the subject string. At other positions,
* {@code '^'} and {@code '$'} have no special meaning and represent themselves.</p>
*
*
* <h2>Captures:</h2>
*
* <p>A pattern can contain sub-patterns enclosed in parentheses; they describe captures.
* When a match succeeds, the substrings of the subject string that match captures are stored
* (captured) for future use. Captures are numbered according to their left parentheses.
* For instance, in the pattern {@code "(a*(.)%w(%s*))"}, the part of the string matching
* {@code "a*(.)%w(%s*)"} is stored as the first capture (and therefore has number 1);
* the character matching {@code "."} is captured with number 2, and the part matching
* {@code "%s*"} has number 3.</p>
*
* <p>As a special case, the empty capture {@code ()} captures the current string position
* (a number). For instance, if we apply the pattern {@code "()aa()"} on the string
* {@code "flaaap"}, there will be two captures: 3 and 5.</p>
*/
public class StringPattern {

    private final List items;
    private final int numCaptures;

    private StringPattern(
            List items,
            int numCaptures) {

        this.items = Check.notNull(items);
        this.numCaptures = Check.nonNegative(numCaptures);
    }

    private static final String MAGIC_CHARS = "^$()%.[]*+-?";
    private static final String PUNCTUATION_CHARS = ".,;:?!";

    private static boolean isMagic(char c) {
        return MAGIC_CHARS.indexOf(c) != -1;
    }

    public static class Match {

        private final String originalString;
        private final int beginIndex;
        private final int endIndex;
private final List