org.hazlewood.connor.bottema.emailaddress.EmailAddressValidator Maven / Gradle / Ivy
Go to download
Show more of this group Show more artifacts with this name
Show all versions of emailaddress-rfc2822 Show documentation
Show all versions of emailaddress-rfc2822 Show documentation
The world's only more-or-less-2822-compliant Java-based email address extractor / verifier
/*
* RFC2822 email address parsing and extraction, some header verification.
*
* Validates an email address according to RFC 2822, using regular expressions.
*
* @author Les Hazlewood, Casey Connor, Benny Bottema
*/
package org.hazlewood.connor.bottema.emailaddress;
/*
* Original code Copyright 2008 Les Hazlewood
* Original code Copyright 2013-2016 Les Hazlewood, Boxbe, Inc., Casey Connor
* Original code Copyright 2016 Benny Bottema
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
import java.util.EnumSet;
/**
* A utility class to parse, clean up, and extract email addresses from messages per RFC2822 syntax. Designed to integrate with Javamail (this class will
* require that you have a javamail mail.jar in your classpath), but you could easily change the existing methods around to not use Javamail at all. For
* example, if you're changing the code, see the difference between getInternetAddress and getDomain: the latter doesn't depend on any javamail code. This is
* all a by-product of what this class was written for, so feel free to modify it to suit your needs.
*
* For real-world addresses, this class is roughly 3-4 times slower than parsing with InternetAddress (although recent versions of this class might be faster),
* but it can handle a whole lot more. Because of sensible design tradeoffs made in javamail, if InternetAddress has trouble parsing, it might throw an
* exception, but often it will silently leave the entire original string in the result of ia.getAddress(). This class can be trusted to only provide
* authenticated results.
*
* This class has been successfully used on many billion real-world addresses, live in production environments, but it's not perfect yet.
*
* Comments/Questions/Corrections welcome: https://github.com/bbottema/email-rfc2822-validator/issues
*
*
Historie:
*
* Started with code by Les Hazlewood: leshazlewood.com.
*
* Modified/added (Casey Connor): removed some functions, added support for CFWS token, corrected FWSP token, added some boolean flags, added getInternetAddress
* and extractHeaderAddresses and other methods, some optimization.
*
* Modified/added (Benny Bottema): modularized the code and separated configuration, validation and extraction functions.
*
* Where Mr. Hazlewood's version was more for ensuring certain forms that were passed in during registrations, etc, this handles more types of verifying as well
* a few forms of extracting the data in predictable, cleaned-up chunks.
*
* Note: CFWS means the "comment folded whitespace" token from 2822, in other words, whitespace and comment text that is enclosed in ()'s.
*
* Limitations: doesn't support nested CFWS (comments within (other) comments), doesn't support mailbox groups except when flat-extracting addresses from
* headers or when doing verification, doesn't support any of the obs-* tokens. Also: the getInternetAddress and extractHeaderAddresses methods return
* InternetAddress objects; if the personal name has any quotes or \'s in it at all, the InternetAddress object will always escape the name entirely and put it
* in quotes, so multiple-token personal names with those characters somewhere in them will always be munged into one big escaped string. This is not really a
* big deal at all, but I mention it anyway. (And you could get around it by a simple modification to those methods to not use InternetAddress objects.) See the
* docs of those methods for more info.
*
* Note: Unlike InternetAddress, this class will preserve any RFC-2047-encoding of international characters. Thus doing my_internetaddress.getPersonal() will
* return the 2047-encoded string, ready for use in an RFC-822-compliant message, whereas the common InternetAddress constructor (when used outside the context
* of EmailAddressValidator) would return the decoded version of the text, if any was needed. If you need the decoded form, you can do something like this
* (where ia is the InternetAddress object returned from an EmailAddressValidator method):
*
* ia.setPersonal(javax.mail.internet.MimeUtility.decodeText(ia.getPersonal()));
*
* ...subsequent calls to ia.getPersonal() will then return the decoded text.
*
* Note: This class does not do any header-length-checking. There are no such limitations on the email address grammar in 2822, though email headers in general
* do have length restrictions. So if the return path is 40000 unfolded characters long, but otherwise valid under 2822, this class will pass it.
*
* Examples of passing (2822-valid) addresses, believe it or not:
*
* bob @example.com
"bob" @ example.com
bob (comment) (other comment) @example.com (personal name)
*
"<bob \" (here) " < (hi there) "bob(the man)smith" (hi) @ (there) example.com (hello) > (again)
*
* (none of which are permitted by javamail's InternetAddress parsing, incidentally)
*
* By using getInternetAddress(), you can retrieve an InternetAddress object that, when toString()'ed, would reveal that the parser had converted the above
* into:
*
* <[email protected]>
<[email protected]>
"personal name" <[email protected]>
"<bob
* \" (here)" <"bob(the man)smith"@example.com> (respectively)
If parsing headers, however, you'll probably be calling
* extractHeaderAddresses().
*
* A future improvement may be to use this class to extract info from corrupted addresses, but for now, it does not permit them.
*
* Some of the configuration booleans allow a bit of tweaking already. The source code can be compiled with these booleans in various states. They are
* configured to what is probably the most commonly-useful state.
*
* @author Les Hazlewood, Casey Connor, Benny Bottema
* @version 1.13 (just regex validation engine)
*/
public final class EmailAddressValidator {
/**
* Private constructor; this is a utility class with static methods only, not designed for extension.
*/
private EmailAddressValidator() {
//
}
/**
* Validates an e-mail with default validation flags. The default setting is not strictly 2822 compliant. For example, it does not include the {@link
* EmailAddressCriteria#ALLOW_DOMAIN_LITERALS} criteria, which results in exclusions on single domains. Useful for cleaning up email strings that other
* middleware (ie. the next server) will be able to understand.
*
* @param email A string representing an email address.
* @return Whether the e-mail address is a valid address excluding the more exotic formats.
* @see EmailAddressCriteria#DEFAULT
*/
public static boolean isValid(final String email) {
return isValid(email, EmailAddressCriteria.DEFAULT);
}
/**
* Validates an e-mail with default validation flags that remains true
to RFC 2822.
*
* @param email A string representing an email address.
* @return Whether the e-mail address is compliant with RFC 2822.
* @see EmailAddressCriteria#RFC_COMPLIANT
*/
public static boolean isValidStrict(final String email) {
return isValid(email, EmailAddressCriteria.RFC_COMPLIANT);
}
/**
* Using the given validation criteria, checks to see if the specified string is a valid email address according to the RFC 2822 specification, which is
* remarkably squirrely. See doc for this class: 2822 not fully implemented, but probably close enough for almost any needs. Note that things like
* spaces
* in addresses ("bob @hi.com") are valid according to 2822! Read the docs for this class before using this method!
*
* If being used on a 2822 header, this method applies to Sender, Resent-Sender, only, although you can also use it on the Return-Path if you
* know it
* to be non-empty (see doc for isValidReturnPath()!). Folded header lines should work OK, but I haven't tested that.
*
* @param email A complete email address.
* @param criteria A set of criteria flags that restrict or relax RFC 2822 compliance.
* @return Whether the e-mail address is compliant with RFC 2822, configured using the passed in {@link EmailAddressCriteria}.
* @see EmailAddressCriteria
*/
public static boolean isValid(final String email, final EnumSet criteria) {
return isValidMailbox(email, Dragons.fromCriteria(criteria));
}
/**
* Checks to see if the specified string is a valid email address according to the RFC 2822 specification, which is remarkably squirrely. See doc for this
* class: 2822 not fully implemented, but probably close enough for almost any needs. Note that things like spaces in addresses ("bob @hi.com") are
* valid
* according to 2822! Read the docs for this class before using this method!
*
* If being used on a 2822 header, this method applies to Sender, Resent-Sender, only, although you can also use it on the Return-Path if you
* know it
* to be non-empty (see doc for isValidReturnPath()!). Folded header lines should work OK, but I haven't tested that.
*
* @param email the email address string to test for validity (null and "" OK, will return false for those)
* @param dragons the regular expressions compiled using given criteria, used to validate email strings with
* @return true if the given email text is valid according to RFC 2822, false otherwise.
*/
private static boolean isValidMailbox(String email, Dragons dragons) {
return (email != null) && dragons.MAILBOX_PATTERN.matcher(email).matches();
}
}