How to Extract Emails from HTML Source Code

Where Email Addresses Hide in HTML

HTML source code often contains more email addresses than are visible on the rendered page. They can appear in many places within the markup:

Visible text content – email addresses displayed as plain text inside paragraphs, list items, or table cells.
Mailto links – clickable email links like <a href="mailto:info@example.com">Contact us</a>. The display text may say “Contact us” while the actual address is only in the href attribute.
Data attributes – some websites store email addresses in custom data-* attributes for use by JavaScript.
JavaScript variables – email addresses embedded in inline <script> blocks as string variables or JSON data.
HTML comments – developers sometimes leave commented-out code containing email addresses.
Meta tags – the <meta name="author"> tag or Open Graph tags may include email addresses.
Form fields – default or placeholder values in input fields sometimes contain example or real email addresses.

Method 1: Paste HTML into Our Tool

The fastest way to extract emails from HTML is to paste the raw source code directly into our tool:

View the page source with Ctrl+U (Windows/Linux) or Cmd+Option+U (macOS).
Select all with Ctrl+A and copy with Ctrl+C.
Go to extract-emails.com and paste the HTML source.
The tool automatically scans through all HTML content – including tags, attributes, comments, and scripts – to find every email address.
Results are deduplicated and ready to copy or download.

Our tool processes the raw text without rendering the HTML, so it catches email addresses in comments, scripts, and attributes that a browser would not display.

Method 2: View Source and Search Manually

For a quick check on a single page:

Open the page source with Ctrl+U.
Use Ctrl+F to search for @ – this highlights every occurrence of the @ symbol.
Also search for mailto: to find email links specifically.
Copy each email address you find.

This method is quick but error-prone for large HTML files with many occurrences.

Method 3: Python with BeautifulSoup

For automated extraction from HTML files or strings, Python provides the most flexible approach:

Extract emails from both text content and mailto links

import re
from bs4 import BeautifulSoup

def extract_emails_from_html(html_content):
    soup = BeautifulSoup(html_content, "html.parser")
    pattern = r'[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}'
    emails = set()

    # 1. Extract from visible text
    text = soup.get_text()
    emails.update(re.findall(pattern, text))

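    # 2. Extract from mailto: links (one href may list several addresses)
    for link in soup.find_all("a", href=True):
        href = link["href"].strip()
        if href.lower().startswith("mailto:"):
            # Drop the scheme and any ?subject=... query, then split recipients
            recipients = href[len("mailto:"):].split("?")[0]
            emails.update(a.strip() for a in recipients.split(",") if a.strip())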

    # 3. Extract from the raw HTML (catches scripts, comments, attributes)
    emails.update(re.findall(pattern, html_content))

    return sorted(emails)

# Example: from a file
with open("page.html", "r", encoding="utf-8") as f:
    html = f.read()

emails = extract_emails_from_html(html)
for email in emails:
    print(email)

Method 4: Command Line with curl and grep

A single command to fetch a page and extract all email addresses:

One-liner for the terminal

curl -s "https://example.com/contact" | grep -oE '[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}' | sort -u

For local HTML files:

grep -oE '[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}' page.html | sort -u

Process all HTML files in a directory:

grep -rhoE --include='*.html' '[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}' . | sort -u
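Where grep is not available, the same directory scan can be sketched in Python. This is a minimal illustration rather than part of the tool: the function name emails_in_directory and the *.html glob are assumptions, and the regex is the one used throughout this article.

```python
import re
from pathlib import Path

EMAIL_RE = re.compile(r'[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}')

def emails_in_directory(root):
    """Return sorted, deduplicated addresses from all *.html files under root."""
    found = set()
    for path in Path(root).rglob("*.html"):
        if not path.is_file():
            continue  # skip directories that happen to end in .html
        # errors="replace" keeps the scan going on files with a broken encoding
        text = path.read_text(encoding="utf-8", errors="replace")
        found.update(EMAIL_RE.findall(text))
    return sorted(found)
```

Calling emails_in_directory(".") scans the current directory and its subdirectories.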

Handling Obfuscated Emails

Many websites deliberately hide email addresses from automated extraction. Common obfuscation techniques and how to handle them:

JavaScript Encoding

Some sites construct email addresses dynamically in JavaScript:

// Obfuscated version on the website
var user = "info";
var domain = "example.com";
document.write(user + "@" + domain);

Standard regex will not catch these. You would need to either execute the JavaScript or manually inspect the script blocks.
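For the simplest case shown above, where string literals are assigned to variables and then joined with "@", a heuristic can resolve the address without running any JavaScript. The sketch below is an assumption-laden illustration: the function name resolve_concatenated_emails is hypothetical, and it only understands var/let/const assignments of plain string literals followed by a name + "@" + name expression. Anything more dynamic still needs a JavaScript engine or manual inspection.

```python
import re

# Matches simple assignments such as: var user = "info";
ASSIGN_RE = re.compile(r'\b(?:var|let|const)\s+(\w+)\s*=\s*(["\'])(.*?)\2\s*;')
# Matches expressions such as: user + "@" + domain
CONCAT_RE = re.compile(r'(\w+)\s*\+\s*(["\'])@\2\s*\+\s*(\w+)')

def resolve_concatenated_emails(script):
    """Rebuild addresses assembled as <name> + "@" + <name> from known variables."""
    variables = {name: value for name, _, value in ASSIGN_RE.findall(script)}
    emails = set()
    for left, _, right in CONCAT_RE.findall(script):
        # Only resolve when both sides are variables we saw assigned
        if left in variables and right in variables:
            emails.add(f"{variables[left]}@{variables[right]}")
    return sorted(emails)
```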

HTML Entity Encoding

Email addresses may be written with HTML entities, for example info&#64;example.com, where &#64; is the numeric entity for @. Our tool handles this automatically since it processes the decoded text.
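In your own scripts, the decoding step can be sketched with Python's standard html.unescape, which resolves numeric and named entities before the regex runs. The function name extract_entity_encoded is an assumption for illustration.

```python
import html
import re

EMAIL_RE = re.compile(r'[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}')

def extract_entity_encoded(raw_html):
    """Decode HTML entities first, then run the usual email regex."""
    decoded = html.unescape(raw_html)
    return sorted(set(EMAIL_RE.findall(decoded)))
```

Running the regex on the undecoded markup would miss these addresses, because the literal @ character never appears in the source.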

Text Replacement Patterns

Addresses written as “name [at] domain [dot] com” or “name(at)domain(dot)com”. These require additional regex patterns:

import re

def deobfuscate_emails(text):
    # Handle [at] and [dot] variations
    pattern = r'[a-zA-Z0-9._%+-]+\s*[\[\(]at[\]\)]\s*[a-zA-Z0-9.-]+\s*[\[\(]dot[\]\)]\s*[a-zA-Z]{2,}'
    matches = re.findall(pattern, text, re.IGNORECASE)
    emails = []
    for match in matches:
        email = re.sub(r'\s*[\[\(]at[\]\)]\s*', '@', match, flags=re.IGNORECASE)
        email = re.sub(r'\s*[\[\(]dot[\]\)]\s*', '.', email, flags=re.IGNORECASE)
        emails.append(email)
    return emails

Tips for Best Results

Always use the raw HTML source. The rendered page shows only a fraction of the email addresses present in the markup.
Check inline scripts. JavaScript blocks often contain email addresses in configuration objects, contact arrays, or analytics tags.
Inspect HTML comments. Developers leave comments with contact information, debug data, or old code containing email addresses.
Look at all attributes. Beyond href, check value, placeholder, data-email, and content attributes.
Handle encoding. HTML entities, URL encoding (%40 for @), and JavaScript escape sequences can all hide email addresses.
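To see where each address lives, the tips above can be combined into one pass that labels every match with its location. The following is a minimal sketch using only Python's standard html.parser module. The class name EmailLocator, the helper locate_emails, and the location labels are assumptions chosen for this example.

```python
import re
from html.parser import HTMLParser

EMAIL_RE = re.compile(r'[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}')

class EmailLocator(HTMLParser):
    """Record each address together with the place it was found."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.hits = []            # list of (location, email) tuples
        self._in_script = False

    def _scan(self, location, text):
        for email in EMAIL_RE.findall(text or ""):
            self.hits.append((location, email))

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self._in_script = True
        # Check every attribute, not just href
        for name, value in attrs:
            self._scan(f"attribute:{name}", value)

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_script = False

    def handle_data(self, data):
        self._scan("script" if self._in_script else "text", data)

    def handle_comment(self, data):
        self._scan("comment", data)

def locate_emails(html_source):
    parser = EmailLocator()
    parser.feed(html_source)
    parser.close()
    return parser.hits
```

Each result is a pair such as ("attribute:data-email", "jane@example.com"), which makes it easy to tell a visible address from one buried in a comment or script.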

Finding Emails in Data Attributes and Meta Tags

Modern HTML markup stores data in many more places than just text content and href attributes. Email addresses frequently appear in:

Custom data attributes: <span data-email="jane@example.com"> — These are common when developers want to pass data to JavaScript without exposing it in visible text.
Meta tags: <meta name="author" content="jane@example.com"> or Open Graph tags like <meta property="og:email" content="...">.
Form input values: <input type="hidden" name="reply_to" value="jane@example.com"> — Hidden fields often carry pre-filled reply addresses.
HTML comments: Developers sometimes leave debug information or TODO notes that include real contact addresses.
JSON-LD structured data: Schema.org blocks embedded in <script type="application/ld+json"> tags regularly include "email" fields for organizations and persons.

A complete regex pass over the raw HTML catches all of these automatically, which is why pasting the full source into an extractor is more thorough than reading the rendered page text.
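JSON-LD blocks can also be parsed as structured data instead of being scanned as text, which keeps only values stored under an "email" key. The sketch below assumes well-formed JSON inside the script tag. The function name emails_from_json_ld is hypothetical, and stripping a leading mailto: prefix is an assumption based on how some sites write the field.

```python
import json
import re

JSONLD_RE = re.compile(
    r'<script[^>]*type=["\']application/ld\+json["\'][^>]*>(.*?)</script>',
    re.IGNORECASE | re.DOTALL,
)

def emails_from_json_ld(html_source):
    """Collect string values of "email" keys from all JSON-LD blocks."""
    emails = set()

    def walk(node):
        if isinstance(node, dict):
            for key, value in node.items():
                if key == "email" and isinstance(value, str):
                    # Some sites write the value as "mailto:address"
                    if value.lower().startswith("mailto:"):
                        value = value[len("mailto:"):]
                    emails.add(value)
                else:
                    walk(value)
        elif isinstance(node, list):
            for item in node:
                walk(item)

    for block in JSONLD_RE.findall(html_source):
        try:
            walk(json.loads(block))
        except json.JSONDecodeError:
            continue  # skip malformed blocks rather than fail
    return sorted(emails)
```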

Base64 and URL-Encoded Email Addresses

Some developers intentionally obscure email addresses to defeat naive scrapers. Two common encoding techniques:

Base64 Encoding

The address is encoded as a Base64 string and decoded at runtime by JavaScript. In the HTML you might see:

<script>
  var e = atob('amFuZUBleGFtcGxlLmNvbQ==');
  document.getElementById('contact').textContent = e;
</script>

To recover the address, extract all atob(...) arguments and decode them. In Node.js: Buffer.from('amFuZUBleGFtcGxlLmNvbQ==', 'base64').toString() returns jane@example.com.
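The same recovery can be sketched in Python with the standard base64 module. This example assumes the payload is passed to atob as a quoted string literal, as in the snippet above. The function name decode_atob_emails is hypothetical.

```python
import base64
import binascii
import re

EMAIL_RE = re.compile(r'[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}')
# Captures the quoted Base64 literal inside atob('...') or atob("...")
ATOB_RE = re.compile(r'atob\(\s*(["\'])([A-Za-z0-9+/=]+)\1\s*\)')

def decode_atob_emails(source):
    """Decode every atob() string literal and keep those that contain an address."""
    emails = set()
    for _, payload in ATOB_RE.findall(source):
        try:
            decoded = base64.b64decode(payload, validate=True).decode("utf-8")
        except (binascii.Error, UnicodeDecodeError):
            continue  # not valid Base64 or not text
        emails.update(EMAIL_RE.findall(decoded))
    return sorted(emails)
```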

URL Encoding

In URL-encoded strings the @ sign becomes %40, and dots are sometimes written as %2E. This is common when an address is passed inside a query parameter, e.g.:

<a href="?reply=jane%40example%2Ecom">Reply</a>

Decode with decodeURIComponent() in JavaScript, or urllib.parse.unquote() in Python.
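As a minimal Python sketch, the percent-decoding can be applied to the whole source before the regex runs. The function name extract_url_encoded is an assumption. Decoding the entire document at once is a simplification that suits a quick scan but may alter unrelated percent sequences.

```python
import re
from urllib.parse import unquote

EMAIL_RE = re.compile(r'[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}')

def extract_url_encoded(raw_html):
    """Percent-decode the source, then run the usual email regex."""
    decoded = unquote(raw_html)
    return sorted(set(EMAIL_RE.findall(decoded)))
```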

CSS-Based Obfuscation

A subtler trick uses the CSS content property or reversed text (stored backwards in the markup and flipped for display with CSS) to show an address visually while keeping it out of the DOM text. These can be identified by inspecting the computed styles of contact elements, but they resist simple regex-based extraction, which is often the intent.

Frequently Asked Questions

Do I need to decode HTML entities before running my regex?: Yes. &#64; and &commat; both represent @. Run a full entity-decode pass first, or your regex will miss addresses that use entity encoding.
My regex finds thousands of matches — most are fake.: Apply a post-filter: remove addresses where the domain has no dot (e.g. user@localhost), addresses from common placeholder domains (example.com, test.com, sentry.io), and addresses longer than 254 characters. Also deduplicate — the same address often appears dozens of times in a single page.
Can I extract emails from minified JavaScript?: Yes — email addresses survive minification intact because they are string literals, not identifiers. Your regex will find them regardless of whitespace or variable name compression.
What about emails inside iframes?: An iframe loads a separate document. If the iframe is same-origin, you can access its contentDocument with JavaScript. Cross-origin iframes are blocked by the browser’s same-origin policy and require fetching the iframe URL separately.
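The post-filter described in the second answer above can be sketched as follows. The placeholder list and the 254-character limit come from that answer. The function name filter_candidates is hypothetical, and lowercasing addresses before deduplicating is an assumption that suits most lists even though local parts are technically case-sensitive.

```python
PLACEHOLDER_DOMAINS = {"example.com", "test.com", "sentry.io"}
MAX_ADDRESS_LENGTH = 254

def filter_candidates(candidates):
    """Drop implausible matches and duplicates, keeping first-seen order."""
    seen = set()
    kept = []
    for address in candidates:
        normalized = address.strip().lower()
        local, _, domain = normalized.rpartition("@")
        if not local or "." not in domain:
            continue  # no local part, or a dotless domain such as localhost
        if domain in PLACEHOLDER_DOMAINS:
            continue
        if len(normalized) > MAX_ADDRESS_LENGTH:
            continue
        if normalized in seen:
            continue
        seen.add(normalized)
        kept.append(normalized)
    return kept
```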


About the Author

Daniel Dorfer worked for nearly four years in technical support at GMX, one of Germany’s largest email providers, and for almost two years at united domains, a leading domain hoster and registrar. He is a founding member of the KIBC (KI Business Club). This website was built entirely with the help of Claude Code (Opus 4.6) by Anthropic.