How to Extract Emails from Word Documents (DOCX)

Why Word Documents Contain Email Addresses

Microsoft Word documents are among the most commonly exchanged files in business communication. As a result, they often contain email addresses in places you might not immediately think to look:

Contracts and proposals – legal agreements typically include contact details for all parties involved, including email addresses for correspondence and notifications.
Business reports – monthly or quarterly reports often list team members, stakeholders, and their email addresses in headers, footers, or appendices.
Mailing lists and directories – organizations frequently compile contact directories as Word documents, with hundreds or even thousands of email addresses.
Meeting minutes and agendas – participant lists at the top of meeting documents usually include email addresses for follow-up communication.
Email signatures in pasted content – when email threads are pasted into Word documents for archiving, every signature block contains one or more email addresses.
Cover letters and resumes – job applications bundled into a single document contain applicant contact information including email.

Manually scanning long Word documents to find and copy every email address is slow and error-prone. The methods below automate the process and help you capture addresses that are easy to miss, including those in tables, headers, and footers.

Method 1: Copy-Paste from Word

The simplest approach works for any Word document that you can open and select text from. No special tools or programming knowledge are required.

Open the Word document in Microsoft Word, Google Docs, LibreOffice Writer, or any other word processor.
Select all content with Ctrl+A (Windows/Linux) or Cmd+A (macOS).
Copy the selected text with Ctrl+C / Cmd+C.
Go to extract-emails.com and paste the text into the input field with Ctrl+V / Cmd+V.
The tool instantly identifies and lists every email address found in the pasted text. You can copy the results or download them as a file.

Limitation: Ctrl+A selects only the main body of the document. Content inside tables usually pastes correctly, but text in headers, footers, text boxes, and comments may be left out. To cover more of the document, use Method 2 or Method 3.

Method 2: Upload DOCX to Our Tool (Recommended)

Our online tool at extract-emails.com can read .docx files directly in your browser. This is the fastest and most reliable method for most users.

Visit extract-emails.com.
Drag and drop your .docx file onto the upload area, or click to select it from your file system.
The tool reads the document locally using JavaScript – no data is uploaded to any server.
Text is extracted from all paragraphs and tables within the document.
A regex pattern scans the extracted text for email addresses, and duplicates are removed automatically.
Results are displayed immediately. You can copy them to your clipboard or download them.

Privacy Benefit: The entire process runs in your browser. Your Word document never leaves your device, making this method safe for confidential documents such as contracts, HR files, or financial reports.

Supported formats: The tool supports .docx files (the modern XML-based format used by Word 2007 and later). Older .doc files (binary format) should first be saved as .docx in any modern word processor.
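When files arrive with misleading extensions, it can help to check the actual format before extraction. The following is a minimal sketch and not part of the online tool. The function name detect_word_format is hypothetical, and the check assumes the main document part uses the conventional name word/document.xml, which Word writes by default.

```python
import zipfile

# Signature of the OLE compound file container used by legacy .doc files
# (legacy .xls and .ppt files share the same container signature).
OLE_MAGIC = bytes.fromhex("d0cf11e0a1b11ae1")


def detect_word_format(path):
    """Return "docx", "legacy-ole", or "unknown" for the file at path."""
    with open(path, "rb") as handle:
        header = handle.read(8)

    if header == OLE_MAGIC:
        return "legacy-ole"

    # A .docx file is a ZIP package; look for the main document part.
    # Assumption: the part has the conventional name word/document.xml.
    if zipfile.is_zipfile(path):
        with zipfile.ZipFile(path) as package:
            if "word/document.xml" in package.namelist():
                return "docx"

    return "unknown"
```

A result of "legacy-ole" suggests the file should be opened in a word processor and saved as .docx first, regardless of its extension.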

Method 3: Python Script with python-docx

For developers, automation pipelines, or batch processing of many documents, a Python script using the python-docx library provides full control over the extraction process.

Basic Extraction from Paragraphs

Install python-docx and extract emails

pip install python-docx

import re
from docx import Document

def extract_emails_from_docx(docx_path):
    doc = Document(docx_path)
    text = ""

    # Extract text from all paragraphs
    for paragraph in doc.paragraphs:
        text += paragraph.text + "\n"

    pattern = r'[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}'
    emails = list(set(re.findall(pattern, text)))
    return sorted(emails)

# Example usage
emails = extract_emails_from_docx("contract.docx")
for email in emails:
    print(email)

Complete Extraction Including Tables

Word documents frequently store contact information in tables. The basic script above only reads paragraphs. The following version also scans every cell in every table, as well as the paragraphs in the header and footer of each section:

import re
from docx import Document

def extract_emails_complete(docx_path):
    doc = Document(docx_path)
    text_parts = []
    pattern = r'[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}'

    # Paragraphs
    for paragraph in doc.paragraphs:
        text_parts.append(paragraph.text)

    # Tables
    for table in doc.tables:
        for row in table.rows:
            for cell in row.cells:
                text_parts.append(cell.text)

    # Headers and footers
    for section in doc.sections:
        header = section.header
        footer = section.footer
        for paragraph in header.paragraphs:
            text_parts.append(paragraph.text)
        for paragraph in footer.paragraphs:
            text_parts.append(paragraph.text)

    full_text = "\n".join(text_parts)
    emails = list(set(re.findall(pattern, full_text)))
    return sorted(emails)

emails = extract_emails_complete("report.docx")
for email in emails:
    print(email)

Batch Processing Multiple Documents

If you have a folder full of Word documents, you can process them all at once:

import glob

# Relies on extract_emails_complete() from the previous script
all_emails = set()

for docx_file in glob.glob("documents/*.docx"):
    print(f"\n--- {docx_file} ---")
    emails = extract_emails_complete(docx_file)
    for email in emails:
        print(email)
    all_emails.update(emails)

print(f"\n=== Total unique emails: {len(all_emails)} ===")
for email in sorted(all_emails):
    print(email)

This approach integrates easily into larger data pipelines and can be combined with CSV or database output for further processing.
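As one possible way to produce the CSV output mentioned above, the sketch below writes one row per file and address using only the standard library. The function name write_emails_csv and the column names source_file and email are illustrative choices, not part of the scripts above.

```python
import csv


def write_emails_csv(emails_by_file, out_file):
    """Write (source_file, email) rows to an open text file; return the row count.

    emails_by_file maps a file name to an iterable of addresses found in it.
    """
    writer = csv.writer(out_file)
    writer.writerow(["source_file", "email"])
    rows = 0
    for source_file in sorted(emails_by_file):
        for email in sorted(emails_by_file[source_file]):
            writer.writerow([source_file, email])
            rows += 1
    return rows


# Usage sketch: collect results per file in the batch loop, then
# with open("emails.csv", "w", newline="", encoding="utf-8") as handle:
#     write_emails_csv(results, handle)
```

Keeping the source file next to each address makes it easier to trace a questionable match back to the document it came from.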

Handling Special Cases

Word documents have a rich internal structure that can hide email addresses in unexpected places. Here are the most common special cases and how to handle them:

Tables

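Contact lists and directories are often formatted as tables in Word. The python-docx library provides access to table content through doc.tables. Each table contains rows, and each row contains cells. The complete extraction script above already handles this case for top-level tables. A table nested inside a cell is not part of doc.tables and has to be reached through the tables property of that cell.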

Headers and Footers

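Company documents frequently include email addresses in the header or footer (e.g., a support email in the letterhead). These are stored in separate XML parts within the DOCX file and are not included in the main doc.paragraphs collection. Each section can have its own header and footer, so loop over doc.sections and read section.header.paragraphs and section.footer.paragraphs for every section, as the complete extraction script above does. Reading only doc.sections[0] would miss later sections.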
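Letterheads are often laid out as tables, and a section may also define separate first-page and even-page headers and footers. The sketch below extends the header and footer loop to cover those cases. The helper name iter_header_footer_text is hypothetical. It expects a python-docx Document object and relies on the first_page_header, even_page_header, first_page_footer, and even_page_footer properties of a section, which are available in current python-docx releases.

```python
def iter_header_footer_text(doc):
    """Yield text from every header and footer variant of every section.

    doc is expected to be a python-docx Document (for example, the result of
    docx.Document("report.docx")).
    """
    variants = (
        "header", "first_page_header", "even_page_header",
        "footer", "first_page_footer", "even_page_footer",
    )
    for section in doc.sections:
        for name in variants:
            container = getattr(section, name)
            # Plain paragraphs in the header or footer
            for paragraph in container.paragraphs:
                yield paragraph.text
            # Tables placed in the header or footer (common in letterheads)
            for table in container.tables:
                for row in table.rows:
                    for cell in row.cells:
                        yield cell.text
```

The yielded strings can be joined and passed to the same regex as the body text. A variant may hold text even when the document is not set to display it, so treat matches from these areas as candidates to review.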

Hyperlinks

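Some email addresses are stored as mailto: hyperlinks rather than visible text. While the display text might show "Contact us", the underlying link contains the actual email address. Word keeps these link targets in the relationships of the document part rather than in the paragraph text, so they have to be read from there:

from docx import Document
from docx.opc.constants import RELATIONSHIP_TYPE as RT

def extract_mailto_links(docx_path):
    doc = Document(docx_path)
    emails = set()

    # Hyperlink targets of the main document part; headers and footers
    # are separate parts with their own relationships.
    for rel in doc.part.rels.values():
        if rel.reltype != RT.HYPERLINK or not rel.is_external:
            continue
        target = rel.target_ref
        if target.lower().startswith("mailto:"):
            # Strip the scheme and any "?subject=..." query string
            address = target[len("mailto:"):].split("?", 1)[0]
            if address:
                emails.add(address)

    return sorted(emails)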

Tracked Changes and Comments

If the document has tracked changes (revisions) or comments, email addresses may appear in the revision metadata or comment text. The standard python-docx API does not expose tracked changes directly, but you can parse the underlying XML if needed. For most use cases, accepting all changes before extraction is the simplest approach.
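For cases where accepting changes is not an option, parsing the underlying XML can be sketched with the standard library alone. The example below opens the DOCX as a ZIP archive and scans every XML part under word/ (body, comments, headers, footers, footnotes), joining the text runs of each paragraph before applying the regex. The function name extract_emails_from_raw_xml is hypothetical, and the approach is deliberately simple: inserted and deleted text are joined together, so an address that was edited under tracked changes may come out mangled.

```python
import re
import zipfile
import xml.etree.ElementTree as ET

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
EMAIL_RE = re.compile(r"[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}")
# w:t holds regular run text; w:delText holds text deleted under tracked changes.
TEXT_TAGS = {f"{{{W_NS}}}t", f"{{{W_NS}}}delText"}


def extract_emails_from_raw_xml(docx_file):
    """Scan every WordprocessingML part in the package for email addresses.

    docx_file may be a path or a binary file-like object.
    """
    emails = set()
    with zipfile.ZipFile(docx_file) as package:
        for name in package.namelist():
            if not (name.startswith("word/") and name.endswith(".xml")):
                continue
            root = ET.fromstring(package.read(name))
            for paragraph in root.iter(f"{{{W_NS}}}p"):
                # Join the runs so an address split across runs still matches.
                text = "".join(
                    node.text or ""
                    for node in paragraph.iter()
                    if node.tag in TEXT_TAGS
                )
                emails.update(EMAIL_RE.findall(text))
    return sorted(emails)
```

Because this reads the raw parts, it also picks up text that the paragraph-based scripts skip, such as comment text and runs wrapped in insertion markup. For files from untrusted sources, a hardened XML parser is a safer choice than the default one used here.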

Tips for Best Results

Convert .doc to .docx first. The older binary .doc format is not supported by python-docx or our online tool. Open the file in Word or LibreOffice and save it as .docx before extraction.
Check headers, footers, and text boxes. These areas are easy to overlook but often contain important email addresses such as company support contacts or legal contacts.
Handle obfuscated addresses. Some documents deliberately disguise email addresses to prevent automated extraction, using formats like "name [at] domain [dot] com". These require additional regex patterns beyond the standard email matcher.
Validate your results. After extraction, scan the list for false positives – strings that match the email pattern but are not real addresses (e.g., placeholder text like your.name@example.com). Our email regex guide explains the pattern in detail.
Respect privacy. Email addresses extracted from business documents may be subject to data protection regulations such as GDPR. Ensure you have a legitimate purpose for collecting and using these addresses.
Remove duplicates. Documents with repeated headers or copied-in email threads often contain the same address multiple times. All methods above include deduplication.
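The tip on obfuscated addresses can be illustrated with a small pre-processing step. This is a sketch under stated assumptions: it only rewrites bracketed markers such as [at], (at), {at} and their "dot" counterparts, the function names are hypothetical, and lowercasing for deduplication treats addresses as case-insensitive, which matches common practice but is a simplification.

```python
import re

EMAIL_RE = re.compile(r"[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}")
# Bracketed "at" / "dot" markers, including the whitespace around them.
AT_RE = re.compile(r"\s*[\[({]\s*at\s*[\])}]\s*", re.IGNORECASE)
DOT_RE = re.compile(r"\s*[\[({]\s*dot\s*[\])}]\s*", re.IGNORECASE)


def deobfuscate(text):
    """Rewrite 'name [at] domain [dot] com' style markers to '@' and '.'."""
    text = AT_RE.sub("@", text)
    return DOT_RE.sub(".", text)


def extract_emails_with_obfuscated(text):
    """Find plain and bracket-obfuscated addresses, deduplicated in lowercase."""
    found = EMAIL_RE.findall(text) + EMAIL_RE.findall(deobfuscate(text))
    return sorted({address.lower() for address in found})
```

Restricting the rewrite to bracketed markers avoids turning ordinary uses of the words "at" and "dot" into false matches. Other disguises, such as spelled-out words without brackets, would need their own patterns and a manual review of the results.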

Extract Emails from Your Word Document Now

Upload your DOCX file or paste its text – our free tool finds every email address instantly, right in your browser.

About the Author

Daniel Dorfer worked for nearly four years in technical support at GMX, one of Germany’s largest email providers, and for almost two years at united domains, a leading domain hoster and registrar. He is a founding member of the KIBC (KI Business Club). This website was built entirely with the help of Claude Code (Opus 4.6) by Anthropic.