@seliger seliger commented Dec 24, 2025

fmbox was treating eml files like mbox files: it assumed the first line was a "From" header and discarded it, but that is not always the case. For example, Mailcow uses doveadm from Dovecot to export Maildir-formatted spools whose first header line is not From:, so whichever header came first was being lost.

This change works around the issue by rewinding to the beginning of the file if the first line does not start with "From:".
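The rewind described above can be sketched roughly as follows. This is a minimal illustration and not the actual patch: the function name `skip_mbox_separator` and the `marker` parameter are hypothetical, and the `b"From:"` prefix is taken from the description above.

```python
import io


def skip_mbox_separator(stream, marker=b"From:"):
    """Consume the first line only if it starts with `marker`.

    Hypothetical helper: returns the consumed line, or None after
    rewinding when the first line is an ordinary header.
    """
    start = stream.tell()
    first_line = stream.readline()
    if not first_line.startswith(marker):
        # First line is a real header (e.g. from an .eml export), so
        # rewind and let the header parser see it with the rest.
        stream.seek(start)
        return None
    return first_line


# An .eml-style message whose first line is not a From line.
eml = io.BytesIO(b"X-Gmail-Labels: Archive/2025\nReturn-Path: <a@example.org>\n\nbody\n")
skip_mbox_separator(eml)
print(eml.readline())  # the X-Gmail-Labels line is still available
```

Remembering the starting offset with `tell()` instead of seeking to 0 keeps the sketch usable on a stream that is already positioned partway into a larger file.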

Below is an example header from one of the eml files processed.

X-Gmail-Labels: Archive/2025
Return-Path: <bounce+2404ec.293f67-sponsors=xxxxxxxxxxx.org@xxxxxxxxxxxxx.com>
Received: from board.xxxxxxxxxxx.org ([0.0.0.0])
	by d959cc03a3fb with LMTP
	id COhRGIOEVWiHPAAAmBUvNQ
	(envelope-from <bounce+2404ec.293f67-sponsors=xxxxxxxxxxx.org@xxxxxxxxxxxxx.com>); Fri, 20 Jun 2025 11:55:47 -0400
Received: from 76.static.xxxxxxxxxxxxx.com (76.static.xxxxxxxxxxxxx.com [0.0.0.0])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(No client certificate requested)
	by board.xxxxxxxxxxx.org (Postcow) with ESMTPS id 0666DAAAF00
	for <sponsors@xxxxxxxxxxx.org>; Fri, 20 Jun 2025 11:55:42 -0400 (EDT)
Authentication-Results: board.xxxxxxxxxxx.org;
	dkim=pass header.d=xxxxxxxxxxxxx.com header.s=krs header.b=24M9VyUe;
	dmarc=pass (policy=reject) header.from=xxxxxxxxxxxxx.com;
	spf=pass (board.xxxxxxxxxxx.org: domain of "bounce+2404ec.293f67-sponsors=xxxxxxxxxxx.org@xxxxxxxxxxxxx.com" designates 0.0.0.0 as permitted sender) smtp.mailfrom="bounce+2404ec.293f67-sponsors=xxxxxxxxxxx.org@xxxxxxxxxxxxx.com"
DKIM-Signature: a=rsa-sha256; v=1; c=relaxed/relaxed; d=xxxxxxxxxxxxx.com; q=dns/txt; s=krs; t=1750434942; x=1750442142;
 h=Message-Id: List-Unsubscribe-Post: List-Unsubscribe: To: To: From: From: Subject: Subject: Content-Type: Mime-Version: Date: Sender: Sender;
 bh=NWNeMIamMGwN6jKQmnXD+Cb7ysofCxgyInd43UNWHsg=;
 b=24M9VyUeucMx4VRhVt/6WGRIkE3nf1zr5lCn41PtRaAZW8ARM9S8TXWhTsm34ockIE1/fob4z7ysGbI3mX+/V8PHK7fndu6YhlqgyST936Vn3Bri3JpmNRxuvFenpdZ3pVODv4YOiknKDbrNqTiDyjaJXyo9JbMq25zWQsC4O48=
X-Mailgun-Sid: WyJlYjUyOSIsInNwb25zb3JzQHRpcHBlNGhmYWlyLm9yZyIsIjI5M2Y2NyJd
Received: by ffedc5b9c4d3 with HTTP id 6855847e91c12c4617ab78bc; Fri, 20 Jun 2025
 15:55:42 GMT
X-Mailgun-Sending-Ip: 0.0.0.0
Sender: noreply+automations@xxxxxxxxxxxxx.com
Date: Fri, 20 Jun 2025 15:55:42 +0000
Mime-Version: 1.0
Content-Type: multipart/alternative;
 boundary="5c600763ff86c733f479d12499008c37b15ae8ea1873b206ebbdcfb065cb"
Subject: Sponsorship Pledge
From: Xxxxxxxx Automations <noreply+automations@xxxxxxxxxxxxx.com>
To: sponsors@xxxxxxxxxxx.org
Message-Id: <20250620155542.83a8f3e7e36d0c9c@xxxxxxxxxxxxx.com>
X-Last-TLS-Session-Version: TLSv1.2
X-Rspamd-Queue-Id: 0666DAAAF00
X-Spamd-Result: default: False [0.76 / 15.00];
	URI_COUNT_ODD(1.00)[1];
	IP_REPUTATION_HAM(-0.53)[asn: 396479(-0.15), country: US(-0.01), ip: 0.0.0.0(-0.37)];
	MV_CASE(0.50)[];
	DMARC_POLICY_ALLOW(-0.50)[xxxxxxxxxxxxx.com,reject];
	MX_INVALID(0.50)[];
	FORGED_SENDER(0.30)[noreply@xxxxxxxxxxxxx.com,bounce@xxxxxxxxxxxxx.com];
	R_DKIM_ALLOW(-0.20)[xxxxxxxxxxxxx.com:s=krs];
	R_SPF_ALLOW(-0.20)[+ip4:0.0.0.0/21];
	MIME_GOOD(-0.10)[multipart/alternative,text/plain];
	HAS_LIST_UNSUB(-0.01)[];
	RCPT_MAILCOW_DOMAIN(0.00)[xxxxxxxxxxx.org];
	MIME_TRACE(0.00)[0:+,1:+,2:~];
	BCC(0.00)[];
	RCPT_COUNT_ONE(0.00)[1];
	ARC_NA(0.00)[];
	TO_MATCH_ENVRCPT_ALL(0.00)[];
	FROM_HAS_DN(0.00)[];
	TAGGED_FROM(0.00)[2404ec.293f67-sponsors=xxxxxxxxxxx.org,automations];
	MID_RHS_MATCH_FROM(0.00)[];
	TO_DN_NONE(0.00)[];
	FROM_NEQ_ENVFROM(0.00)[noreply@xxxxxxxxxxxxx.com,bounce@xxxxxxxxxxxxx.com];
	RCVD_TLS_LAST(0.00)[];
	ASN(0.00)[asn:396479, ipnet:0.0.0.0/24, country:US];
	MISSING_XM_UA(0.00)[];
	RCVD_COUNT_ONE(0.00)[1];
	DKIM_TRACE(0.00)[xxxxxxxxxxxxx.com:+];
	RBL_NIXSPAM_FAIL(0.00)[0.0.0.0:server fail];
	RCVD_IN_DNSWL_NONE(0.00)[0.0.0.0:from]

To test this, I came up with this snippet.

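import fmbox

# Open the .eml file and read its first (and only) message.
mbox = fmbox.fmbox("test.eml")
message = mbox.next()

# Single quotes inside the f-strings keep this valid on Python versions before 3.12.
print(f"Gmail Labels: {message.get_header(b'X-Gmail-Labels')}")
print(f"Return Path: {message.get_header(b'Return-Path')}")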

If you swap the two headers in the eml file, you can see that whichever header is on the first line is always dropped. For me, this was preventing GYB from sorting email into the correct label structure.
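For anyone who wants to repeat the swap experiment, the two input files can be generated with something like the sketch below. The file names, the placeholder address, and the temporary directory are assumptions for illustration. The standard-library `email` parser is used only as a baseline that should report both headers regardless of order.

```python
import tempfile
from email import message_from_bytes
from pathlib import Path

# Two headers from the example above (address replaced with a placeholder).
HEADERS = [b"X-Gmail-Labels: Archive/2025", b"Return-Path: <bounce@example.org>"]


def write_eml(path, headers):
    """Write a tiny message: the headers, a blank line, then a one-line body."""
    path.write_bytes(b"\n".join(headers) + b"\n\nbody\n")


workdir = Path(tempfile.mkdtemp())
original = workdir / "test.eml"
swapped = workdir / "test_swapped.eml"
write_eml(original, HEADERS)
write_eml(swapped, HEADERS[::-1])  # same headers, opposite order

# Baseline check: both headers are present in both files.
for path in (original, swapped):
    msg = message_from_bytes(path.read_bytes())
    print(path.name, msg["X-Gmail-Labels"], msg["Return-Path"])
```

Pointing the fmbox snippet at each generated file in turn should then show which header goes missing in each ordering.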

I'm not saying this is the best solution, but it works for my use case and respects files that start with an actual "From:" header on the first line, like traditional mbox bundles.

[BUG] fmbox loses first header for EML files