Problems with Exim and sssd
I'm putting this up here in the hope that Google will pick it up and that it helps someone else who runs into the same situation.
Earlier today, I ran into some issues on an Exim mailserver where deliveries were mostly not happening. The relevant messages in Exim's logfile were:
2015-05-14 09:57:03 1YsuZL-0003XH-4A == email@example.com R=outbound_spam T=remote_smtp defer (-1): smtp transport process returned non-zero status 0x0100: exit code 1
Attempting to deliver the message manually from the commandline resulted in various messages along the lines of:
1YsuZL-0003XH-4A Cannot open main log file "/var/log/exim/main.log": Permission denied: euid=93 egid=93
And using strace against the mail delivery attempt was showing mystifying things like this:
[pid 18566] open("/var/log/exim/main.log", O_WRONLY|O_APPEND) = -1 EACCES (Permission denied)
I say "mystifying" because the UID/GIDs there matched perfectly fine (the exim user/group are both 93), the setuid() and setgid() calls I saw from the strace seemed fine, and I even went so far as to give the exim user a temporary shell so that I could log in and verify that it had all the access it needed.
Additionally, while attempting to debug the problem, I'd noticed some disturbing messages while trying fresh SSH logins:
Last login: Thu May 14 10:47:58 2015 from foo.domain.com
id: cannot find name for user ID 601000001
id: cannot find name for group ID 601000001
id: cannot find name for user ID 601000001
I was still let into the box, but only because sssd had been caching my credentials (the machine talks to FreeIPA for user authentication). When I cleared out sssd's caches, I was no longer able to authenticate as my user.
Now, both of those problems show up on Google in a few places, but none of the suggested solutions turned out to apply here. For the Exim problem, the most concrete suggestion came from somebody who had fully removed and then reinstalled Exim. In the sssd case, someone had suggested running "ipa-client-install --uninstall" and then re-joining, which I tried without success.
In the end, the problem on my end turned out to be more basic: a backup restore process earlier in the day had, for reasons still unknown, changed both /var and /var/spool to be owned by mail, with a mode of 0750. With that mode, a process that is neither the owner nor in the owning group can't even traverse the directory, so nothing underneath it is reachable, no matter how correct the permissions on the file itself are. That's why my attempts to log in as the exim user itself worked fine - that user does have permission to go into those dirs. But other unprivileged daemons couldn't, and it was causing all kinds of weird havoc on the system.
So yeah: if you do run into these problems on a system, it's something you might want to check out, at least.
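If you want to check this quickly, here is a minimal sketch (not something I ran at the time) that prints the owner, group and mode of every directory leading up to a path. The function name describe_path_chain and the "missing other-execute bit" heuristic are my own assumptions for illustration. A flagged directory isn't necessarily wrong, but on something like /var it is a strong hint.

```python
import grp
import os
import pwd
import stat
from pathlib import Path


def describe_path_chain(target):
    """Return (path, owner, group, mode, blocked) for each ancestor of target,
    starting at the filesystem root and ending with target itself.

    `blocked` is True for a directory that lacks the "other" execute bit,
    meaning users outside the owner/group cannot traverse it. This is only
    a heuristic (hypothetical helper, not part of Exim or sssd).
    """
    path = Path(target)
    chain = list(path.parents)[::-1] + [path]
    rows = []
    for p in chain:
        st = os.stat(p)
        try:
            owner = pwd.getpwuid(st.st_uid).pw_name
        except KeyError:
            owner = str(st.st_uid)  # no name for this UID, show the number
        try:
            group = grp.getgrgid(st.st_gid).gr_name
        except KeyError:
            group = str(st.st_gid)  # no name for this GID, show the number
        mode = stat.S_IMODE(st.st_mode)
        blocked = stat.S_ISDIR(st.st_mode) and not (mode & stat.S_IXOTH)
        rows.append((str(p), owner, group, format(mode, "04o"), blocked))
    return rows


if __name__ == "__main__":
    try:
        for row in describe_path_chain("/var/log/exim/main.log"):
            path, owner, group, mode, blocked = row
            note = "  <-- others cannot traverse" if blocked else ""
            print(f"{mode} {owner}:{group} {path}{note}")
    except OSError as exc:
        print(f"could not stat path: {exc}")
```

Run it as root so that a locked-down parent directory doesn't stop the script itself from reaching the directories below it.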