// Source of org.hazlewood.connor.bottema.emailaddress.EmailAddressCriteria (artifact: emailaddress-rfc2822)
package org.hazlewood.connor.bottema.emailaddress;
import static java.util.EnumSet.of;
import java.util.EnumSet;
/**
* Defines a set of restriction flags for email address validation. To remain completely true to RFC 2822, all criteria should be included
* (see {@link #RFC_COMPLIANT}).
* <p>
* There are a few basic use cases:
* <ul>
* <li>
* User wants to scrape as much data from a possibly-ugly address as they can and make a sensible address from it; these users typically allow all
* kinds of addresses (except perhaps for single-level domain addresses) because in the wild, legitimate senders often violate RFC 2822. E.g. if your goal is to
* parse spammy emails for analysis, you may want to allow every variation out there just so you can parse something useful.
* </li>
* <li>
* User wants to check whether an email address has proper, normal syntax, e.g. when checking a value entered in a form. These users typically make
* everything strict, since what most people consider a "valid" email address is a drastic subset of RFC 2822. For users with the strictest requirements,
* this library may not be enough: although it checks most of RFC 2822, it might still be too tolerant for their needs. (At the other end of
* the spectrum, most libraries use a simple [email protected] type regex, which is rarely a good idea.)
* </li>
* <li>
* User wants to intelligently parse a possibly-ugly address, with the goal being a cleaned-up, usable address that other software
* (MTAs, databases, whatever) can use or parse without breaking; {@link #DEFAULT} is tailored to this use case (with the possible exception of
* {@link #ALLOW_DOT_IN_A_TEXT}, which is not part of the defaults and can be added to taste). In our experience these defaults accepted "real" addresses
* the highest percentage of the time, and the addresses they rejected were almost all ridiculous.
* </li>
* </ul>
*
* @author Benny Bottema
*/
public enum EmailAddressCriteria {
/**
* This criterion changes the behavior of the domain parsing. If included, the parser will allow RFC 2822 domains, which include single-level domains (e.g.
* bob@localhost) as well as domain literals, e.g.:
* <p>
* {@code someone@[192.168.1.100]} or<br>
* {@code john.doe@[23:33:A2:22:16:1F]} or<br>
* {@code me@[my computer]}
* <p>
* The RFC says these are valid email addresses, but most people don't like allowing them. If you don't want to allow them, and only want to allow valid
* domain names (RFC 1035, x.y.z.com, etc.), and specifically only those with at least two levels
* ("example.com"), then don't include this criterion.
*/
ALLOW_DOMAIN_LITERALS,
/**
* This criterion states that, as per RFC 2822, quoted identifiers are allowed (using quotes and angle brackets around the raw address), e.g.:
* <p>
* {@code "John Smith" <[email protected]>}
* <p>
* The RFC says this is a valid mailbox. If you don't want to allow this because, for example, you only want users to enter a raw address
* ([email protected], with no quotes or angle brackets), then don't include this criterion.
*/
ALLOW_QUOTED_IDENTIFIERS,
/**
* This criterion allows "." to appear in atext (note: only atext which appears in the RFC 2822 "name-addr" part of the address, not the
* other instances).
* <p>
* The addresses:<br>
* {@code Kayaks.org <[email protected]>}<br>
* {@code Bob K. Smith<[email protected]>}
* <p>
* ...are not valid. They should be:<br>
* {@code "Kayaks.org" <[email protected]>}<br>
* {@code "Bob K. Smith" <[email protected]>}
* <p>
* If this criterion is not included, the parser will act per RFC 2822 and require the quotes; if included, it will allow the use of "." without
* quotes.
*/
ALLOW_DOT_IN_A_TEXT,
/**
* This criterion allows "[" or "]" to appear in atext. Not very useful, maybe, but there it is.
* <p>
* The address:<br>
* {@code [Kayaks] <[email protected]>}<br>
* ...is not valid. It should be:<br>
* {@code "[Kayaks]" <[email protected]>}
* <p>
* If this criterion is not included, the parser will act per RFC 2822 and require the quotes; if included, it will allow them to be missing.
* <p>
* One real-world example seen:
* <p>
* {@code Bob Smith [mailto:[email protected]]=20}
* <p>
* Use at your own risk. There may be some issue with enabling this feature in conjunction with {@link #ALLOW_DOMAIN_LITERALS}, but I haven't looked into
* that. If the {@link #ALLOW_DOMAIN_LITERALS} criterion is not included, I think this should be pretty safe. Whether or not it's useful is up to
* you.
*/
ALLOW_SQUARE_BRACKETS_IN_A_TEXT,
/**
* This criterion allows, as per RFC 2822, ")" or "(" to appear in quoted versions of the local part (they are never allowed in unquoted
* versions).
* <p>
* You can disallow it, but it is better to include this criterion. I left this hanging around (from an earlier incarnation of the code) as a random option you can
* switch off. No, it's not necessarily useful. Long story.
* <p>
* If this criterion is not included, such addresses are rejected even though they are valid, e.g. {@code "bob(hi)smith"@test.com}
*/
ALLOW_PARENS_IN_LOCALPART;
/**
* The default setting is not strictly RFC 2822 compliant. For example, it does not include the {@link #ALLOW_DOMAIN_LITERALS} criterion, so
* single-level domains (e.g. bob@localhost) and domain literals are rejected. Useful for cleaning up email strings so that other middleware
* (i.e. the next server) will be able to understand them.
* <p>
* Included in the defaults are:
* <ul>
* <li>{@link #ALLOW_QUOTED_IDENTIFIERS}</li>
* <li>{@link #ALLOW_PARENS_IN_LOCALPART}</li>
* </ul>
*/
public static final EnumSet<EmailAddressCriteria> DEFAULT = of(ALLOW_QUOTED_IDENTIFIERS, ALLOW_PARENS_IN_LOCALPART);
/**
* The most RFC 2822 compliant criteria set: it allows all compliant address forms, including the more exotic ones. Most useful for validating the broadest
* range of email addresses that should be allowed within the boundaries of RFC compliance.
*/
public static final EnumSet<EmailAddressCriteria> RFC_COMPLIANT = EnumSet.allOf(EmailAddressCriteria.class);
}
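
A minimal sketch of how the three use cases described in the class comment might map onto criteria sets. It does not call the library: `Criteria` is a local stand-in enum that mirrors the constants above so the example compiles on its own, and `CriteriaSketch`, `strict()`, `scraping()` and `defaultPlusDots()` are hypothetical names introduced only for illustration. How a set is then handed to the validator is not shown in this file, so that step is left out.

```java
import java.util.EnumSet;

public class CriteriaSketch {

    // Assumption: local stand-in mirroring the constants of EmailAddressCriteria.
    enum Criteria {
        ALLOW_DOMAIN_LITERALS,
        ALLOW_QUOTED_IDENTIFIERS,
        ALLOW_DOT_IN_A_TEXT,
        ALLOW_SQUARE_BRACKETS_IN_A_TEXT,
        ALLOW_PARENS_IN_LOCALPART
    }

    // Same members as the DEFAULT and RFC_COMPLIANT constants in the file above.
    static final EnumSet<Criteria> DEFAULT =
            EnumSet.of(Criteria.ALLOW_QUOTED_IDENTIFIERS, Criteria.ALLOW_PARENS_IN_LOCALPART);
    static final EnumSet<Criteria> RFC_COMPLIANT = EnumSet.allOf(Criteria.class);

    // Form validation use case: "make everything strict" means no criteria at all.
    static EnumSet<Criteria> strict() {
        return EnumSet.noneOf(Criteria.class);
    }

    // Scraping use case: allow everything except single-level domains and domain literals.
    static EnumSet<Criteria> scraping() {
        return EnumSet.complementOf(EnumSet.of(Criteria.ALLOW_DOMAIN_LITERALS));
    }

    // Clean-up use case: the defaults plus unquoted dots in the display name.
    // EnumSet is mutable, so copy first instead of adding to the shared constant.
    static EnumSet<Criteria> defaultPlusDots() {
        EnumSet<Criteria> criteria = EnumSet.copyOf(DEFAULT);
        criteria.add(Criteria.ALLOW_DOT_IN_A_TEXT);
        return criteria;
    }

    public static void main(String[] args) {
        System.out.println("strict:          " + strict());
        System.out.println("scraping:        " + scraping());
        System.out.println("defaultPlusDots: " + defaultPlusDots());
        System.out.println("rfc compliant:   " + RFC_COMPLIANT);
    }
}
```

Copying before modifying matters because the constants are exposed as plain mutable `EnumSet` instances; adding a criterion directly to a shared set would change it for every caller.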