Initial commit: IMAP email downloader
Single-file Python script to download emails from IMAP servers:

- Downloads emails as .eml files preserving folder structure
- Extracts attachments to zip files
- Supports SSL and STARTTLS connections
- Incremental updates using UID tracking (default behavior)
- Multi-account support with separate folders per email
- Safety checks to prevent duplicate downloads

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@@ -0,0 +1,8 @@
# Python
__pycache__/
*.py[cod]
.venv/
venv/

# Downloads
download/
@@ -0,0 +1,133 @@
# IMAP Downloader

A simple Python script to download all emails from an IMAP server into individual EML files, preserving the folder structure.

## Features

- Downloads emails as standard `.eml` files
- Preserves IMAP folder hierarchy locally
- Extracts attachments into zip files alongside each email
- Supports SSL and STARTTLS connections
- Incremental updates using UID tracking (only download new emails)
- Multi-account support (separate folders per email address)
- Configurable download limit for testing/debugging

## Requirements

- Python 3.6+
- No external dependencies (uses only standard library)

## Installation

```bash
# Clone or download the script
git clone <repo-url>
cd imapdown

# Create virtual environment (optional but recommended)
python3 -m venv .venv
source .venv/bin/activate
```

## Usage

### Basic Usage

By default, the script only downloads new emails since the last run (incremental mode). On first run, it downloads everything.

```bash
# Download emails using SSL (most common)
./imapdown.py --server imap.example.com --email me@example.com --user me@example.com --password "secret" --ssl

# Using STARTTLS
./imapdown.py --server imap.example.com --email me@example.com --user me@example.com --password "secret" --starttls

# Custom port
./imapdown.py --server imap.example.com --email me@example.com --user me@example.com --password "secret" --ssl --port 12993
```
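
As a rough illustration of the incremental mode described in this section, the sketch below decides which UIDs to fetch from a raw `UID SEARCH` result. `pick_new_uids` is a hypothetical helper name used only for this example; it mirrors the filtering the script applies, but it is not the script's own function.

```python
def pick_new_uids(search_result, last_uid=None, remaining=None):
    """Return UIDs (as ints) newer than last_uid, capped at `remaining`."""
    uids = [int(u) for u in search_result.split()]
    if last_uid is not None:
        # A "UID n:*" search always matches the highest existing UID, even
        # when that UID is not newer than n, so older entries are dropped.
        uids = [u for u in uids if u > last_uid]
    if remaining is not None:
        uids = uids[:max(remaining, 0)]
    return uids


# Usage: the server returned UIDs 5, 6 and 7, and UID 5 was already saved.
print(pick_new_uids(b"5 6 7", last_uid=5))
```

On a first run there is no stored UID, so `last_uid` stays `None` and every UID in the folder is kept.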

### Full Download

To force a complete download of all emails (ignoring previous state):

```bash
./imapdown.py --server imap.example.com --email me@example.com --user me@example.com --password "secret" --ssl --full
```

**Note:** As a safety measure, `--full` will refuse to run if the download folder already contains emails. This prevents accidental duplicates. To re-download everything, first delete the folder:

```bash
rm -rf download/me@example.com/
./imapdown.py --server imap.example.com --email me@example.com --user me@example.com --password "secret" --ssl --full
```
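
The safety check in the note above amounts to looking for any `.eml` file under the account folder. A minimal sketch, assuming a hypothetical helper named `has_existing_emails` (the script performs an equivalent walk inline):

```python
import os
import tempfile


def has_existing_emails(base_dir):
    """Return True if any .eml file exists anywhere under base_dir."""
    for _root, _dirs, files in os.walk(base_dir):
        if any(name.endswith(".eml") for name in files):
            return True
    return False


# Usage: an empty folder passes the check; one saved email makes it fail.
with tempfile.TemporaryDirectory() as tmp:
    print(has_existing_emails(tmp))
    os.makedirs(os.path.join(tmp, "INBOX"))
    open(os.path.join(tmp, "INBOX", "1_example.eml"), "wb").close()
    print(has_existing_emails(tmp))
```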

### Debugging/Testing

Limit the number of emails downloaded:

```bash
./imapdown.py --server imap.example.com --email me@example.com --user me@example.com --password "secret" --ssl --limit 10
```

## Command Line Arguments

| Argument | Required | Description |
|----------|----------|-------------|
| `--server` | Yes | IMAP server hostname |
| `--email` | Yes | Email address (used for folder organization) |
| `--user` | Yes | Username for authentication |
| `--password` | Yes | Password for authentication |
| `--ssl` | No | Use implicit SSL/TLS (default port 993) |
| `--starttls` | No | Use STARTTLS (default port 143) |
| `--port` | No | Custom port (overrides defaults) |
| `--limit` | No | Maximum number of emails to download |
| `--full` | No | Download all emails (default: only new since last run) |

Note: `--ssl` and `--starttls` are mutually exclusive. If neither is given, the script connects without encryption on port 143 (unless `--port` overrides it).
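
The mutual exclusion and the default ports can be sketched with `argparse` as follows. `build_parser` and `effective_port` are hypothetical names for this illustration, and only the three connection-related options are included.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser()
    # argparse rejects the command line if both flags of a group are given.
    security = parser.add_mutually_exclusive_group()
    security.add_argument("--ssl", action="store_true")
    security.add_argument("--starttls", action="store_true")
    parser.add_argument("--port", type=int)
    return parser


def effective_port(args):
    """An explicit --port wins; otherwise 993 for --ssl and 143 for the rest."""
    if args.port:
        return args.port
    return 993 if args.ssl else 143


# Usage
args = build_parser().parse_args(["--ssl"])
print(effective_port(args))
```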

## Output Structure

```
./download/
├── user@example.com/
│   ├── .imapdown_state.json          # Tracks last downloaded UID per folder
│   ├── INBOX/
│   │   ├── 123_20240115_Meeting_notes.eml
│   │   ├── 124_20240116_Report.eml
│   │   └── 124_20240116_Report.zip   # Attachments (if any)
│   ├── Sent/
│   │   └── 456_20240114_RE_Question.eml
│   └── Archive/
│       └── 789_20240101_Old_email.eml
└── another@example.com/
    └── ...
```

### File Naming

Email files are named: `{UID}_{date}_{subject}.eml`

- **UID**: Unique identifier from the IMAP server
- **date**: Message date in `YYYYMMDD_HHMMSS` format (the example tree above shows only the date part)
- **subject**: Sanitized email subject (truncated to 50 characters)
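
To make the naming scheme concrete, here is a small sketch that builds a file name from a UID, a `Date` header and a subject. `build_eml_name` is a hypothetical helper for this example; it leaves out the script's fallback for missing or unparseable dates.

```python
import re
from email.utils import parsedate_to_datetime


def build_eml_name(uid, date_header, subject, max_subject=50):
    """Return '{UID}_{YYYYMMDD_HHMMSS}_{sanitized subject}.eml'."""
    stamp = parsedate_to_datetime(date_header).strftime("%Y%m%d_%H%M%S")
    # Replace characters that are unsafe in file names, then truncate.
    safe = re.sub(r'[<>:"/\\|?*\x00-\x1f]', "_", subject).strip(". ")
    safe = safe[:max_subject].strip(". ") or "untitled"
    return f"{uid}_{stamp}_{safe}.eml"


# Usage
print(build_eml_name(123, "Mon, 15 Jan 2024 09:30:00 +0000", "Meeting notes: Q1/Q2"))
```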

### Attachments

When an email contains attachments, they are extracted and saved in a zip file with the same base name as the `.eml` file but with a `.zip` extension.
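
The idea can be sketched in a few lines. This example is an assumption-laden illustration rather than the script's code: it uses the modern `EmailMessage` API and builds the zip in memory, whereas the script walks the MIME parts of a parsed message and writes the zip next to the `.eml` file. `attachments_to_zip` is a hypothetical name.

```python
import io
import zipfile
from email.message import EmailMessage


def attachments_to_zip(msg):
    """Return zip bytes holding every named attachment, or None if there are none."""
    parts = [(p.get_filename(), p.get_payload(decode=True))
             for p in msg.iter_attachments() if p.get_filename()]
    if not parts:
        return None
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        for name, data in parts:
            zf.writestr(name, data)
    return buf.getvalue()


# Usage: build a message with one attachment and list the zip contents.
msg = EmailMessage()
msg["Subject"] = "Report"
msg.set_content("See attached.")
msg.add_attachment(b"col1,col2\n1,2\n", maintype="application",
                   subtype="octet-stream", filename="report.csv")
print(zipfile.ZipFile(io.BytesIO(attachments_to_zip(msg))).namelist())
```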

## State Tracking

The script maintains a `.imapdown_state.json` file in each email account's folder. This file tracks the highest downloaded UID for each IMAP folder, enabling efficient incremental updates, which are the default behavior (no extra flag is needed).

Example state file:
```json
{
  "INBOX": 19334,
  "INBOX.Archive": 1770,
  "Sent": 892
}
```
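
A minimal sketch of how such a state file can drive the next search. The function names match the script's `load_state` and `save_state`, but the bodies here are illustrative: the write-then-rename step and the `search_criterion` helper are additions for this example, not the script's behavior.

```python
import json
import os
import tempfile

STATE_FILE = ".imapdown_state.json"


def load_state(base_dir):
    """Return the saved {folder: highest_uid} mapping, or {} if unreadable."""
    try:
        with open(os.path.join(base_dir, STATE_FILE), "r", encoding="utf-8") as f:
            return json.load(f)
    except (OSError, ValueError):
        return {}


def save_state(base_dir, state):
    """Write to a temporary file first so an interrupted run cannot truncate the state."""
    path = os.path.join(base_dir, STATE_FILE)
    tmp = path + ".tmp"
    with open(tmp, "w", encoding="utf-8") as f:
        json.dump(state, f, indent=2)
    os.replace(tmp, path)


def search_criterion(state, folder):
    """Build the UID SEARCH criterion for a folder from the saved state."""
    last_uid = state.get(folder)
    return "ALL" if last_uid is None else f"UID {int(last_uid) + 1}:*"


# Usage
with tempfile.TemporaryDirectory() as tmp:
    save_state(tmp, {"INBOX": 19334})
    print(search_criterion(load_state(tmp), "INBOX"))
```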

## License

MIT
Executable
+395
@@ -0,0 +1,395 @@
#!/usr/bin/env python3
"""Simple IMAP email downloader - downloads all emails to EML files."""

import argparse
import email
import email.header  # decode_header() is used below, so import it explicitly
import email.utils
import imaplib
import io
import json
import os
import re
import sys
import zipfile
from datetime import datetime


def parse_args():
    """Parse command line arguments."""
    parser = argparse.ArgumentParser(
        description="Download all emails from an IMAP server to EML files"
    )

    parser.add_argument("--server", required=True, help="IMAP server hostname")
    parser.add_argument("--email", required=True, help="Email address")
    parser.add_argument("--user", required=True, help="Username for authentication")
    parser.add_argument("--password", required=True, help="Password for authentication")

    security = parser.add_mutually_exclusive_group()
    security.add_argument("--ssl", action="store_true", help="Use implicit SSL/TLS (default port 993)")
    security.add_argument("--starttls", action="store_true", help="Use STARTTLS (default port 143)")

    parser.add_argument("--port", type=int, help="Custom port (default: 993 for SSL, 143 otherwise)")
    parser.add_argument("--limit", type=int, help="Limit number of emails to download (for debugging)")
    parser.add_argument("--full", action="store_true", help="Download all emails (default: only new emails since last run)")

    return parser.parse_args()


def decode_modified_utf7(s):
    """Decode IMAP modified UTF-7 folder names."""
    result = []
    i = 0
    while i < len(s):
        if s[i] == '&':
            if i + 1 < len(s) and s[i + 1] == '-':
                # "&-" is the escape for a literal ampersand
                result.append('&')
                i += 2
            else:
                end = s.find('-', i + 1)
                if end == -1:
                    result.append(s[i:])
                    break
                encoded = s[i + 1:end]
                if encoded:
                    # Modified base64 uses ',' instead of '/' and omits padding
                    encoded = encoded.replace(',', '/')
                    padding = (4 - len(encoded) % 4) % 4
                    encoded += '=' * padding
                    try:
                        import base64
                        decoded = base64.b64decode(encoded).decode('utf-16-be')
                        result.append(decoded)
                    except Exception:
                        result.append(s[i:end + 1])
                i = end + 1
        else:
            result.append(s[i])
            i += 1
    return ''.join(result)


def parse_folder_list(response):
    """Parse IMAP LIST response to extract folder names."""
    folders = []
    # The hierarchy delimiter is either a quoted character or NIL (RFC 3501)
    pattern = re.compile(r'\((?P<flags>.*?)\) (?P<delimiter>"(?:\\.|[^"\\])*"|NIL) (?P<name>.*)')

    for item in response:
        if isinstance(item, bytes):
            item = item.decode('utf-8', errors='replace')

        match = pattern.match(item)
        if match:
            name = match.group('name')
            if name.startswith('"') and name.endswith('"'):
                name = name[1:-1]
            name = decode_modified_utf7(name)
            folders.append(name)

    return folders


def sanitize_filename(name, max_length=50):
    """Sanitize a string for use as a filename."""
    if not name:
        return "untitled"
    name = re.sub(r'[<>:"/\\|?*\x00-\x1f]', '_', name)
    name = name.strip('. ')
    name = name[:max_length]
    name = name.strip('. ')
    return name or "untitled"


def sanitize_folder_path(folder_name):
    """Sanitize folder path for filesystem use."""
    # Both '/' and '.' are treated as hierarchy delimiters
    parts = folder_name.replace('/', os.sep).replace('.', os.sep).split(os.sep)
    sanitized = [sanitize_filename(p, max_length=100) for p in parts if p]
    return os.path.join(*sanitized) if sanitized else "INBOX"


def get_message_date(msg):
    """Extract date from email message."""
    date_str = msg.get('Date')
    if date_str:
        try:
            parsed = email.utils.parsedate_to_datetime(date_str)
            return parsed.strftime('%Y%m%d_%H%M%S')
        except Exception:
            pass
    return datetime.now().strftime('%Y%m%d_%H%M%S')


def get_message_subject(msg):
    """Extract and decode subject from email message."""
    subject = msg.get('Subject', '')
    if not subject:
        return 'no_subject'

    try:
        decoded_parts = email.header.decode_header(subject)
        decoded = []
        for part, charset in decoded_parts:
            if isinstance(part, bytes):
                charset = charset or 'utf-8'
                try:
                    decoded.append(part.decode(charset, errors='replace'))
                except Exception:
                    decoded.append(part.decode('utf-8', errors='replace'))
            else:
                decoded.append(part)
        return ''.join(decoded)
    except Exception:
        return str(subject)


def extract_attachments(msg, eml_filepath):
    """Extract attachments from email and save as zip file."""
    attachments = []

    for part in msg.walk():
        content_disposition = part.get('Content-Disposition', '')
        if 'attachment' in content_disposition or 'inline' in content_disposition:
            filename = part.get_filename()
            if filename:
                try:
                    decoded_parts = email.header.decode_header(filename)
                    decoded_filename = []
                    for data, charset in decoded_parts:
                        if isinstance(data, bytes):
                            charset = charset or 'utf-8'
                            decoded_filename.append(data.decode(charset, errors='replace'))
                        else:
                            decoded_filename.append(data)
                    filename = ''.join(decoded_filename)
                except Exception:
                    pass

                payload = part.get_payload(decode=True)
                if payload:
                    attachments.append((sanitize_filename(filename, max_length=100), payload))

    if attachments:
        zip_path = os.path.splitext(eml_filepath)[0] + '.zip'
        with zipfile.ZipFile(zip_path, 'w', zipfile.ZIP_DEFLATED) as zf:
            seen_names = {}
            for filename, data in attachments:
                if filename in seen_names:
                    seen_names[filename] += 1
                    name, ext = os.path.splitext(filename)
                    filename = f"{name}_{seen_names[filename]}{ext}"
                else:
                    seen_names[filename] = 0
                zf.writestr(filename, data)
        return len(attachments)
    return 0


STATE_FILE = '.imapdown_state.json'


def load_state(base_dir):
    """Load the state file tracking last downloaded emails."""
    state_path = os.path.join(base_dir, STATE_FILE)
    if os.path.exists(state_path):
        try:
            with open(state_path, 'r') as f:
                return json.load(f)
        except Exception:
            pass
    return {}


def save_state(base_dir, state):
    """Save the state file."""
    state_path = os.path.join(base_dir, STATE_FILE)
    with open(state_path, 'w') as f:
        json.dump(state, f, indent=2)


def connect_imap(server, port, use_ssl, use_starttls):
    """Connect to IMAP server with appropriate security."""
    if use_ssl:
        port = port or 993
        print(f"Connecting to {server}:{port} with SSL...")
        return imaplib.IMAP4_SSL(server, port)
    else:
        port = port or 143
        print(f"Connecting to {server}:{port}...")
        conn = imaplib.IMAP4(server, port)
        if use_starttls:
            print("Upgrading to TLS with STARTTLS...")
            conn.starttls()
        return conn


def download_folder(conn, folder_name, base_dir, limit=None, total_so_far=0, update_mode=False, last_uid=None):
    """Download all emails from a folder. Returns (downloaded_count, highest_uid)."""
    local_path = os.path.join(base_dir, sanitize_folder_path(folder_name))
    os.makedirs(local_path, exist_ok=True)

    try:
        status, _ = conn.select(f'"{folder_name}"', readonly=True)
        if status != 'OK':
            print(f" Could not select folder: {folder_name}")
            return 0, last_uid
    except Exception as e:
        print(f" Error selecting folder {folder_name}: {e}")
        return 0, last_uid

    if update_mode and last_uid is not None:
        status, data = conn.uid('SEARCH', None, f'UID {last_uid + 1}:*')
    else:
        status, data = conn.uid('SEARCH', None, 'ALL')

    if status != 'OK':
        print(f" Could not search folder: {folder_name}")
        return 0, last_uid

    uid_list = data[0].split()

    # Filter out UIDs <= last_uid (some servers return highest UID even when searching for higher)
    if update_mode and last_uid is not None:
        uid_list = [uid for uid in uid_list if int(uid) > last_uid]

    if not uid_list:
        print(f" {folder_name}: no new messages")
        return 0, last_uid

    if limit is not None:
        remaining = limit - total_so_far
        if remaining <= 0:
            return 0, last_uid
        uid_list = uid_list[:remaining]

    print(f" {folder_name}: {len(uid_list)} messages to download")
    downloaded = 0
    highest_uid = last_uid

    for uid in uid_list:
        try:
            uid_int = int(uid)
            status, data = conn.uid('FETCH', uid, '(RFC822)')
            if status != 'OK':
                continue

            raw_email = None
            for part in data:
                if isinstance(part, tuple):
                    raw_email = part[1]
                    break

            if raw_email is None:
                continue

            msg = email.message_from_bytes(raw_email)
            date_str = get_message_date(msg)
            subject = sanitize_filename(get_message_subject(msg))

            filename = f"{uid_int}_{date_str}_{subject}.eml"
            filepath = os.path.join(local_path, filename)

            # Append a counter if a file with the same name already exists
            counter = 1
            base_filepath = filepath
            while os.path.exists(filepath):
                name, ext = os.path.splitext(base_filepath)
                filepath = f"{name}_{counter}{ext}"
                counter += 1

            with open(filepath, 'wb') as f:
                f.write(raw_email)

            extract_attachments(msg, filepath)
            downloaded += 1

            if highest_uid is None or uid_int > highest_uid:
                highest_uid = uid_int

        except Exception as e:
            print(f" Error downloading UID {uid}: {e}")

    return downloaded, highest_uid


def main():
    args = parse_args()

    email_folder = sanitize_filename(args.email, max_length=100)
    base_dir = os.path.join(os.getcwd(), 'download', email_folder)
    os.makedirs(base_dir, exist_ok=True)

    if args.full:
        # Refuse a full download into a folder that already holds emails
        has_emails = False
        for root, dirs, files in os.walk(base_dir):
            if any(f.endswith('.eml') for f in files):
                has_emails = True
                break
        if has_emails:
            print(f"Error: --full specified but {base_dir} already contains emails.", file=sys.stderr)
            print("Delete the folder first to do a full re-download, or run without --full for incremental update.", file=sys.stderr)
            sys.exit(1)

    try:
        conn = connect_imap(args.server, args.port, args.ssl, args.starttls)
    except Exception as e:
        print(f"Connection failed: {e}", file=sys.stderr)
        sys.exit(1)

    try:
        status, _ = conn.login(args.user, args.password)
        if status != 'OK':
            print("Authentication failed", file=sys.stderr)
            sys.exit(1)
        print("Logged in successfully")
    except Exception as e:
        print(f"Authentication failed: {e}", file=sys.stderr)
        sys.exit(1)

    try:
        status, folder_data = conn.list()
        if status != 'OK':
            print("Could not list folders", file=sys.stderr)
            sys.exit(1)

        folders = parse_folder_list(folder_data)
        print(f"Found {len(folders)} folders")

        update_mode = not args.full
        state = load_state(base_dir) if update_mode else {}
        if args.full:
            print("Full download mode: downloading all emails")
        else:
            print("Incremental mode: only downloading new emails (use --full to download all)")

        total_downloaded = 0
        for folder in folders:
            last_uid = None
            if update_mode and folder in state:
                try:
                    last_uid = int(state[folder])
                except (ValueError, TypeError):
                    pass

            downloaded, highest_uid = download_folder(
                conn, folder, base_dir, args.limit, total_downloaded,
                update_mode=update_mode, last_uid=last_uid
            )
            total_downloaded += downloaded

            if highest_uid is not None:
                state[folder] = highest_uid

            # Compare against None so that "--limit 0" is honored here as well
            if args.limit is not None and total_downloaded >= args.limit:
                print(f" Reached limit of {args.limit} emails")
                break

        save_state(base_dir, state)
        print(f"\nDownloaded {total_downloaded} emails to {base_dir}")

    finally:
        try:
            conn.logout()
        except Exception:
            pass


if __name__ == '__main__':
    main()
@@ -0,0 +1,97 @@
# Implementation Plan: IMAP Downloader

## Overview

Create a single-file Python script (`imapdown.py`) that downloads all emails from an IMAP server and saves them as individual EML files in a local folder structure mirroring the IMAP mailbox hierarchy.

## Implementation Steps

### 1. Argument Parsing

Use `argparse` to handle command line arguments:

**Mandatory arguments:**
- `--server` - IMAP server hostname
- `--email` - Email address
- `--user` - Username for authentication
- `--password` - Password for authentication

**Optional arguments:**
- `--ssl` - Use implicit SSL/TLS (typically port 993)
- `--starttls` - Use STARTTLS upgrade (typically port 143)
- `--port` - Custom port (defaults: 993 for SSL, 143 for STARTTLS/plain)

Add mutual exclusion for `--ssl` and `--starttls`.

### 2. IMAP Connection

- Use Python's built-in `imaplib` module
- Connection logic:
  - If `--ssl`: Use `IMAP4_SSL` (default port 993)
  - If `--starttls`: Use `IMAP4`, then call `starttls()` (default port 143)
  - If neither: Use plain `IMAP4` (default port 143)
- Authenticate with provided credentials
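
The connection logic above could look roughly like the sketch below. Passing an explicit context from `ssl.create_default_context()` is an addition for this example: it makes certificate and hostname verification explicit instead of relying on `imaplib`'s defaults. `make_tls_context` and `connect` are hypothetical names, and `connect` needs a reachable server, so it is defined here but not called.

```python
import imaplib
import ssl


def make_tls_context():
    """Return a context that verifies the server certificate and hostname."""
    return ssl.create_default_context()


def connect(server, port=None, use_ssl=False, use_starttls=False):
    """Open an IMAP connection using implicit TLS, STARTTLS, or plain text."""
    if use_ssl:
        return imaplib.IMAP4_SSL(server, port or 993, ssl_context=make_tls_context())
    conn = imaplib.IMAP4(server, port or 143)
    if use_starttls:
        # Upgrade the plain connection before any credentials are sent.
        conn.starttls(ssl_context=make_tls_context())
    return conn
```

A caller would follow `connect(...)` with `conn.login(user, password)`; when neither flag is set, the credentials travel unencrypted.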

### 3. Folder Discovery

- Use `list()` method to get all mailbox folders
- Parse folder names and hierarchy delimiter
- Handle folder name encoding (IMAP uses modified UTF-7)
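
Modified UTF-7 keeps printable ASCII as-is, writes `&` as `&-`, and wraps everything else in `&...-` using base64 of the UTF-16BE bytes with `,` in place of `/` and no padding. The sketch below shows both directions under those rules. `decode_mutf7` and `encode_mutf7` are hypothetical names, and the decoder assumes well-formed input (it raises `ValueError` on an unterminated `&` sequence).

```python
import base64


def decode_mutf7(s):
    """Decode an IMAP modified UTF-7 mailbox name into a regular string."""
    out, i = [], 0
    while i < len(s):
        if s[i] != "&":
            out.append(s[i])
            i += 1
            continue
        end = s.index("-", i)
        chunk = s[i + 1:end]
        if not chunk:
            out.append("&")  # "&-" is a literal ampersand
        else:
            b64 = chunk.replace(",", "/")
            b64 += "=" * (-len(b64) % 4)  # restore the stripped padding
            out.append(base64.b64decode(b64).decode("utf-16-be"))
        i = end + 1
    return "".join(out)


def encode_mutf7(s):
    """Encode a regular string as an IMAP modified UTF-7 mailbox name."""
    out, run = [], []

    def flush():
        # Emit the pending run of non-ASCII characters as one "&...-" block.
        if run:
            b64 = base64.b64encode("".join(run).encode("utf-16-be")).decode("ascii")
            out.append("&" + b64.rstrip("=").replace("/", ",") + "-")
            run.clear()

    for ch in s:
        if ch == "&":
            flush()
            out.append("&-")
        elif 0x20 <= ord(ch) <= 0x7E:
            flush()
            out.append(ch)
        else:
            run.append(ch)
    flush()
    return "".join(out)


# Usage
print(encode_mutf7("Entwürfe"), decode_mutf7("Entw&APw-rfe"))
```

An encoder matters because the server expects the encoded form in commands such as `SELECT`, while the decoded form is what belongs in local directory names.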

### 4. Email Download

For each folder:
1. Create corresponding local directory structure
2. Select the folder with `select()`
3. Search for all messages with `search(None, 'ALL')`
4. For each message:
   - Fetch the complete RFC822 message
   - Generate a unique filename (using UID or message ID + date)
   - Save as `.eml` file

### 5. File Naming Strategy

Use a naming scheme that ensures uniqueness and provides useful info:
- Format: `{UID}_{date}_{subject_snippet}.eml`
- Sanitize subject for filesystem safety
- Handle duplicates by appending counter if needed
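
The duplicate handling in the last bullet can be sketched as a small helper that appends `_1`, `_2`, and so on before the extension until the path is free. `unique_path` is a hypothetical name for this illustration.

```python
import os
import tempfile


def unique_path(path):
    """Return path unchanged if free, else the first free 'name_N.ext' variant."""
    base, ext = os.path.splitext(path)
    candidate, counter = path, 1
    while os.path.exists(candidate):
        candidate = f"{base}_{counter}{ext}"
        counter += 1
    return candidate


# Usage: the second file with the same name gets a "_1" suffix.
with tempfile.TemporaryDirectory() as tmp:
    first = unique_path(os.path.join(tmp, "1_20240115_Report.eml"))
    open(first, "wb").close()
    print(os.path.basename(unique_path(first)))
```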

### 6. Error Handling

- Connection failures
- Authentication errors
- Folder access issues
- Invalid/corrupt messages
- Filesystem errors (permissions, disk space)

## Dependencies

Only Python standard library:
- `imaplib` - IMAP protocol
- `argparse` - Command line parsing
- `email` - Email message parsing
- `os` / `pathlib` - Filesystem operations
- `re` - Regex for sanitization
- `datetime` - Date handling

## Output Structure

```
./download/
├── INBOX/
│   ├── 1_20240115_Meeting_notes.eml
│   └── 2_20240116_Project_update.eml
├── Sent/
│   └── 1_20240114_RE_Question.eml
└── Archive/
    └── 2023/
        └── 1_20230501_Old_email.eml
```

## Testing Approach

1. Test argument parsing with various combinations
2. Test connection with SSL, STARTTLS, and plain
3. Test with folders containing special characters
4. Test with empty folders
5. Verify EML files are valid and openable
+27
@@ -0,0 +1,27 @@
# Simple IMAP downloader

A single-file Python script that downloads all emails from an IMAP inbox as EML files, one per email, into a local folder structure that mirrors the folder structure of the IMAP inbox.

## Arguments

Mandatory:

--server
--email
--user
--password

Optional (if not supplied, use sensible defaults):

--ssl or --starttls (either allowed but not both)
--port

## Environment

There is a Python virtual environment set up in `.venv`; use it.

## Additional requirements

- Limit the number of returned emails with `--limit xxx`; this is mainly for debugging purposes
- Ensure that file attachments (if available) are downloaded as well: zip them into a single zip file named after the downloaded `.eml` file but with a `.zip` extension
- Keep track of the latest email downloaded; if `--update` is specified, pull back only emails newer than the last email downloaded