The Complete Email Regex Explained

What Is a Regular Expression (Regex)?

A regular expression (often shortened to regex or regexp) is a sequence of characters that defines a search pattern. Regular expressions are used across virtually every programming language and text editor to find, match, and manipulate strings of text. They are incredibly powerful for pattern matching tasks such as validating user input, searching through large bodies of text, or extracting specific data like email addresses, phone numbers, or URLs.

At its core, a regex works by describing the structure of the text you want to match. Instead of searching for a specific word, you describe the pattern of characters. For example, the regex \d{3} matches any three consecutive digits, whether that is "123", "456", or "789".
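As a quick illustration of that idea, the following Python sketch applies \d{3} to a short sample string. The sample text is made up for this example.

```python
import re

# \d{3} matches a run of exactly three consecutive digits.
sample = "Order 123 shipped, invoice 456, ref 78"
three_digit_runs = re.findall(r"\d{3}", sample)

print(three_digit_runs)  # ['123', '456'] ("78" has only two digits)
```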

When it comes to email extraction, regex is the go-to method. Every email address follows a predictable structure: a local part, the @ symbol, and a domain. This predictable format makes email addresses ideal candidates for regex matching.

The Standard Email Regex Pattern

One of the most commonly used email regex patterns is:

[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}

This pattern strikes a balance between accuracy and simplicity. It will correctly match the vast majority of real-world email addresses without being overly complex. Let us break down exactly what each part does.

Breaking Down Each Part

1. The Local Part: `[a-zA-Z0-9._%+-]+`

This is the portion before the @ symbol (e.g., "john.doe" in john.doe@example.com). The character class [a-zA-Z0-9._%+-] matches:

a-z – lowercase letters
A-Z – uppercase letters
0-9 – digits
. – periods (dots)
_ – underscores
% – percent signs
+ – plus signs
- – hyphens

The + quantifier after the bracket means "one or more of these characters." So the local part must contain at least one character from the allowed set.

2. The At Symbol: `@`

This is a literal match for the @ character, which separates the local part from the domain. Apart from rare quoted local parts (covered under the pitfalls below), an email address contains exactly one @ symbol. In regex, the @ character has no special meaning, so it matches itself directly.

3. The Domain Name: `[a-zA-Z0-9.-]+`

This matches the domain portion (e.g., "example" or "mail.example" in user@mail.example.com). The allowed characters are:

a-z, A-Z – letters
0-9 – digits
. – periods (for subdomains like mail.example)
- – hyphens (common in domain names like my-company.com)

Again, the + quantifier requires at least one character.

4. The Dot Before the TLD: `\.`

This matches a literal period (dot) character. The backslash is necessary because in regex, an unescaped . matches any character. The \. ensures we match only an actual dot, which separates the domain name from the top-level domain.
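The difference between an escaped and an unescaped dot is easy to see in a small experiment. This is a minimal Python sketch; the test strings are invented for illustration.

```python
import re

# An unescaped dot matches any character (except a newline by default),
# while an escaped dot matches only a literal period.
unescaped = re.compile(r"example.com")
escaped = re.compile(r"example\.com")

unescaped_hits_x = bool(unescaped.fullmatch("exampleXcom"))
escaped_hits_x = bool(escaped.fullmatch("exampleXcom"))
escaped_hits_dot = bool(escaped.fullmatch("example.com"))

print(unescaped_hits_x)  # True: "." accepted the "X"
print(escaped_hits_x)    # False: "\." needs a real dot
print(escaped_hits_dot)  # True
```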

5. The Top-Level Domain (TLD): `[a-zA-Z]{2,}`

This matches the TLD such as "com", "org", "net", "de", or newer TLDs like "technology" or "email". The {2,} quantifier means "two or more characters," which ensures we match valid TLDs (the shortest ones like ".ai" or ".uk" have two characters) while also accommodating longer TLDs like ".museum" or ".photography".
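To see how the five parts fit together, the pattern can be rewritten with named groups. The group names local, domain, and tld are labels chosen for this sketch, not part of the original pattern.

```python
import re

# The same standard pattern, split into its parts with named groups.
EMAIL_PARTS = re.compile(
    r"(?P<local>[a-zA-Z0-9._%+-]+)"   # 1. local part
    r"@"                              # 2. literal at symbol
    r"(?P<domain>[a-zA-Z0-9.-]+)"     # 3. domain name (may contain dots)
    r"\."                             # 4. literal dot before the TLD
    r"(?P<tld>[a-zA-Z]{2,})"          # 5. top-level domain
)

match = EMAIL_PARTS.fullmatch("john.doe@mail.example.com")
parts = match.groupdict() if match else {}

print(parts)
# {'local': 'john.doe', 'domain': 'mail.example', 'tld': 'com'}
```

Because the domain class is greedy and also allows dots, the regex engine backtracks until the last dot-plus-letters sequence is left over for the TLD.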

Common Regex Variations

Simple and Permissive

If you just need a quick check and do not care about edge cases:

.+@.+\..+

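This matches "anything, then @, then anything, then a dot, then anything." It is short and easy to remember, but it accepts many invalid strings, including ones with spaces or several @ symbols. Because . also matches spaces, it is only suitable for a rough check on a single string, not for extracting addresses from running text.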
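A short Python sketch shows how loose the permissive pattern is. The test strings are invented for this example.

```python
import re

PERMISSIVE = re.compile(r".+@.+\..+")

# Validation: anything with an @ followed later by a dot passes.
accepts_normal = bool(PERMISSIVE.fullmatch("user@example.com"))
accepts_spaces = bool(PERMISSIVE.fullmatch("not an email @ all. ok"))
accepts_two_ats = bool(PERMISSIVE.fullmatch("a@b@c.d"))
accepts_no_at = bool(PERMISSIVE.fullmatch("no-at-sign.com"))

# Extraction: the greedy ".+" swallows the whole line around the address.
swallowed = PERMISSIVE.search("Mail info@example.com today.").group()

print(accepts_normal, accepts_spaces, accepts_two_ats, accepts_no_at)
# True True True False
print(swallowed)  # Mail info@example.com today.
```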

Strict Practical Pattern

A more strict pattern that enforces reasonable constraints:

^[a-zA-Z0-9](?:[a-zA-Z0-9._%+-]{0,63})@(?:[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?\.)+[a-zA-Z]{2,63}$

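This version is anchored with ^ and $, so it is meant for validating a single string rather than scanning text. It adds several improvements:

The local part must start with an alphanumeric character and is capped at 64 characters (the limit set by RFC 5321)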
Domain labels must start and end with alphanumeric characters
Maximum label lengths are enforced (63 characters per RFC 1035)
The TLD is limited to 63 characters
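The constraints listed above can be checked with a few deliberately malformed inputs. This is a sketch in Python; the candidate addresses are made up to trigger one rule each.

```python
import re

STRICT = re.compile(
    r"^[a-zA-Z0-9](?:[a-zA-Z0-9._%+-]{0,63})@"
    r"(?:[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?\.)+"
    r"[a-zA-Z]{2,63}$"
)

candidates = [
    "user@example.com",            # ordinary address
    ".user@example.com",           # local part starts with a dot
    "user@-example.com",           # domain label starts with a hyphen
    "user@example-.com",           # domain label ends with a hyphen
    "user@" + "a" * 64 + ".com",   # domain label longer than 63 characters
]
results = [bool(STRICT.match(candidate)) for candidate in candidates]

print(results)  # [True, False, False, False, False]
```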

RFC 5322 Compliant Pattern

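The official RFC 5322 standard defines the full email syntax. A regex that approximates it closely is extremely long and complex. The widely circulated version below uses only lowercase character classes, so it has to be applied with a case-insensitive flag: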

(?:[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*|"(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21\x23-\x5b\x5d-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])*")@(?:(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?|\[(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?|[a-z0-9-]*[a-z0-9]:(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21-\x5a\x53-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])+)\])

In practice, almost nobody uses the full RFC 5322 regex. The standard pattern [a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,} covers the vast majority of real-world email addresses and is far easier to read, maintain, and debug.

Code Examples

JavaScript

Extract all emails from a string

function extractEmails(text) {
  const regex = /[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}/g;
  const matches = text.match(regex);
  return matches ? [...new Set(matches)] : [];
}

// Example usage
const text = "Contact us at info@example.com or support@my-company.org";
const emails = extractEmails(text);
console.log(emails);
// Output: ["info@example.com", "support@my-company.org"]

Validate a single email address

function isValidEmail(email) {
  const regex = /^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$/;
  return regex.test(email);
}

console.log(isValidEmail("user@example.com"));   // true
console.log(isValidEmail("invalid@.com"));        // false
console.log(isValidEmail("no-at-sign.com"));      // false

Python

Extract and validate emails

import re

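def extract_emails(text):
    pattern = r'[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}'
    # dict.fromkeys removes duplicates while keeping first-seen order
    # (a plain set would return the addresses in arbitrary order)
    return list(dict.fromkeys(re.findall(pattern, text)))

def is_valid_email(email):
    pattern = r'[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}'
    # fullmatch requires the whole string to match; with re.match and $,
    # a trailing newline such as "a@b.com\n" would still be accepted
    return re.fullmatch(pattern, email) is not None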

# Extract emails from text
text = """
Please reach out to sales@example.com for pricing.
Technical support is available at help@support.example.org.
You can also email the CEO directly: ceo@big-company.co.uk
"""
emails = extract_emails(text)
print(emails)
# Output: ['sales@example.com', 'help@support.example.org', 'ceo@big-company.co.uk']

# Validate single email
print(is_valid_email("test@example.com"))  # True
print(is_valid_email("not-an-email"))      # False

PHP

Extract emails from a string

<?php
function extractEmails(string $text): array {
    $pattern = '/[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}/';
    preg_match_all($pattern, $text, $matches);
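    // array_values reindexes the result so keys stay sequential after duplicates are dropped
    return array_values(array_unique($matches[0]));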
}

// Example usage
$text = "Email us at contact@example.com or admin@website.org for help.";
$emails = extractEmails($text);
print_r($emails);
// Output: Array ( [0] => contact@example.com [1] => admin@website.org )

// Validate a single email (PHP also has a built-in function)
$email = "user@example.com";
if (filter_var($email, FILTER_VALIDATE_EMAIL)) {
    echo "$email is valid\n";
}
?>

Common Pitfalls and Edge Cases

1. Internationalized Email Addresses (IDN)

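The standard regex does not handle internationalized domain names (IDN) or local parts with Unicode characters. Plain ASCII addresses like user@beispiel.de work fine, but a Punycode TLD such as the one in user@bsp.xn--e1afmapc contains digits and hyphens, which [a-zA-Z]{2,} does not allow: anchored validation rejects the address, and extraction cuts it off at user@bsp.xn. Characters like umlauts in the local part (müller@example.com) are outside the character class as well, so extraction returns only the fragment ller@example.com. If you need to support these, you must expand the character classes to include Unicode ranges and Punycode labels.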
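The following Python sketch compares the standard pattern with one possible Unicode-aware variant. The variant is an assumption made for illustration, not a vetted replacement: it relies on \w matching Unicode letters, digits, and underscore in Python 3, which is looser than the ASCII classes. The last lines use Python's built-in idna codec to show the Punycode form of a Unicode domain.

```python
import re

ASCII_EMAIL = re.compile(r"[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}")

# Hypothetical Unicode-aware variant: \w covers Unicode word characters,
# and [^\W\d_] narrows that to letters only for the TLD.
UNICODE_EMAIL = re.compile(r"[\w.%+-]+@[\w.-]+\.[^\W\d_]{2,}")

text = "Write to müller@example.com or info@bücher.example"

ascii_hits = ASCII_EMAIL.findall(text)
unicode_hits = UNICODE_EMAIL.findall(text)

print(ascii_hits)    # ['ller@example.com'] (first address is mangled, second is missed)
print(unicode_hits)  # ['müller@example.com', 'info@bücher.example']

# Converting a Unicode domain to its ASCII (Punycode) form.
punycode_domain = "bücher.example".encode("idna").decode("ascii")
print(punycode_domain)  # xn--bcher-kva.example
```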

2. New and Long TLDs

Since the introduction of new generic TLDs (gTLDs), email addresses can end with TLDs like .photography, .technology, or .international. The {2,} quantifier in our pattern handles these correctly. However, older patterns that used {2,4} would fail on these longer TLDs – make sure your regex uses {2,} or at least {2,63}.
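The effect of the older {2,4} quantifier is easy to reproduce. In this Python sketch the address is made up and OLD_STYLE is simply the standard pattern with the TLD length capped at four.

```python
import re

OLD_STYLE = re.compile(r"[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,4}")
CURRENT = re.compile(r"[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}")

address = "hello@studio.photography"

old_extracted = OLD_STYLE.search(address).group()
new_extracted = CURRENT.search(address).group()
old_validates = bool(OLD_STYLE.fullmatch(address))

print(old_extracted)  # hello@studio.phot (TLD cut off after four letters)
print(new_extracted)  # hello@studio.photography
print(old_validates)  # False
```

Note that in extraction mode the old pattern does not fail outright; it silently returns a truncated address, which is harder to notice.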

3. Quoted Local Parts

The email specification technically allows quoted strings in the local part, such as "john doe"@example.com or "very.(),:;<>[]".VERY.unusual@example.com. These are valid according to RFC 5322 but are extremely rare in practice. The standard regex does not match them in full: in extraction it either skips them or captures only an unquoted fragment (for example .VERY.unusual@example.com), which is acceptable for most real-world use cases.

4. IP Address Domains

Emails can technically use IP addresses instead of domain names: user@[192.168.1.1] or user@[IPv6:2001:db8::1]. These are valid but almost never seen in practice. The standard regex will not match them.
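If IP address domains do need to be recognized, one option is to match the bracketed form separately and let Python's ipaddress module judge the address itself. This is a rough sketch under that assumption; the pattern DOMAIN_LITERAL and the function has_valid_ip_domain are hypothetical names introduced here.

```python
import ipaddress
import re

# Local part, then a bracketed literal with an optional "IPv6:" tag.
DOMAIN_LITERAL = re.compile(r"([a-zA-Z0-9._%+-]+)@\[(IPv6:)?([^\]]+)\]")

def has_valid_ip_domain(email: str) -> bool:
    """Return True for addresses like user@[192.168.1.1] or user@[IPv6:...]."""
    match = DOMAIN_LITERAL.fullmatch(email)
    if match is None:
        return False
    _, ipv6_tag, literal = match.groups()
    try:
        address = ipaddress.ip_address(literal)
    except ValueError:
        return False
    # The "IPv6:" tag must be present for IPv6 and absent for IPv4.
    return address.version == (6 if ipv6_tag else 4)

print(has_valid_ip_domain("user@[192.168.1.1]"))        # True
print(has_valid_ip_domain("user@[IPv6:2001:db8::1]"))   # True
print(has_valid_ip_domain("user@[999.1.1.1]"))          # False
```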

5. Consecutive Dots

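The standard regex allows consecutive dots in the local part (e.g., user..name@example.com), which is technically invalid per RFC 5321; it also accepts a leading or trailing dot there. If you need to reject consecutive dots when validating a single address, add a negative lookahead at the start of the pattern: (?!.*\.\.)[a-zA-Z0-9._%+-]+. Be careful when using it for extraction, because .* scans to the end of the line, so a ".." anywhere later on the same line (an ellipsis, for instance) suppresses otherwise good matches.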
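For single-address validation, the lookahead can be combined with the standard pattern as in this Python sketch. The function name is_valid_without_double_dots is chosen for this example only.

```python
import re

# Standard pattern with a leading negative lookahead that rejects ".."
# anywhere in the string being validated.
NO_DOUBLE_DOTS = re.compile(
    r"(?!.*\.\.)[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}"
)

def is_valid_without_double_dots(email: str) -> bool:
    return NO_DOUBLE_DOTS.fullmatch(email) is not None

print(is_valid_without_double_dots("user.name@example.com"))   # True
print(is_valid_without_double_dots("user..name@example.com"))  # False
print(is_valid_without_double_dots("user@example..com"))       # False
```

Because the lookahead inspects the whole string, it also catches consecutive dots in the domain, which the plain standard pattern would accept.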

6. Trailing Periods in Extracted Text

When extracting emails from natural text, sentences like "Contact us at info@example.com." can result in the trailing period being captured as part of the email. Our regex handles this correctly because \.[a-zA-Z]{2,} requires at least two letters after the final dot, so a trailing period followed by a space or end of sentence will not be included.
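The behaviour described here can be verified directly. The sentences in this Python sketch are invented; the second one shows a limitation that appears when no space follows the period.

```python
import re

EMAIL = re.compile(r"[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}")

# A sentence-ending period followed by a space is left out of the match.
with_space = EMAIL.findall("Contact us at info@example.com. We reply fast.")

# Without the space, the next word looks like a TLD and gets captured too.
without_space = EMAIL.findall("Contact us at info@example.com.Then wait.")

print(with_space)     # ['info@example.com']
print(without_space)  # ['info@example.com.Then']
```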

When to Use Regex vs. Built-in Validation

Many programming languages offer built-in or well-established library validation that is more robust than a custom regex:

PHP: filter_var($email, FILTER_VALIDATE_EMAIL)
Python: The third-party email-validator library provides thorough validation including DNS checks
JavaScript: HTML5 <input type="email"> provides browser-native validation
.NET: System.Net.Mail.MailAddress parses and validates emails

Use regex when you need to extract emails from unstructured text. Use built-in validators when you need to validate a single email address from a form field. For extraction tasks, regex is the clear winner because built-in validators only check one address at a time and cannot scan through text.

Performance Tips

When processing very large amounts of text (megabytes of data), keep these performance tips in mind:

Compile the regex: In languages like Python, use re.compile() to pre-compile the pattern if you are using it repeatedly
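Use the global flag for extraction: In JavaScript, include the g flag to find all matches, not just the first one; leave it off a regex that you reuse with test(), because g makes test() remember its position between calls via lastIndex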
Avoid backtracking: The standard email regex is efficient and does not cause catastrophic backtracking, but overly complex patterns can
Deduplicate results: Use a Set (JavaScript) or set (Python) to efficiently remove duplicate email addresses from your results
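Put together, the tips above might look like the following in Python. This is a minimal sketch, assuming the input can be processed line by line and that addresses never span a line break; the function name extract_unique_emails is made up for this example.

```python
import re
from typing import Iterable, List

# Compiled once and reused for every line.
EMAIL = re.compile(r"[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}")

def extract_unique_emails(lines: Iterable[str]) -> List[str]:
    """Return each address once, in the order it first appears."""
    seen = set()
    unique = []
    for line in lines:  # works for a list of strings or an open text file
        for match in EMAIL.finditer(line):
            address = match.group()
            if address not in seen:  # set lookup keeps deduplication cheap
                seen.add(address)
                unique.append(address)
    return unique

sample_lines = [
    "Write to a@example.com and b@example.org",
    "Reminder: a@example.com is the main contact",
]
unique_emails = extract_unique_emails(sample_lines)
print(unique_emails)  # ['a@example.com', 'b@example.org']
```

Iterating over an open file object instead of a list avoids loading the whole text into memory at once.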

About the Author

Daniel Dorfer worked for nearly four years in technical support at GMX, one of Germany’s largest email providers, and for almost two years at united domains, a leading domain hoster and registrar. He is a founding member of the KIBC (KI Business Club). This website was built entirely with the help of Claude Code (Opus 4.6) by Anthropic.

FAQ

Is there a perfect email regex?

No. The RFC 5322 grammar allows constructs such as comments and quoted strings that are impractical to capture fully in one readable pattern. Real-world extractors use a pragmatic regex that matches the address formats that actually occur, and accept that exotic edge cases (quoted local parts, comments) are traded for speed and readability.

Why does my regex miss or mangle some addresses?

Typical causes: greedy dot matching swallowing punctuation at the end of sentences, missing support for subdomains or long TLDs, and case-sensitivity. Test against addresses embedded in prose, not just clean lists.

Do I need to write my own regex to extract emails?

Not for everyday use — our free browser-based extractor applies a robust RFC-oriented pattern, deduplicates and validates automatically. Writing your own pattern is mainly worthwhile inside scripts and pipelines.