Hmm, the pre-mix doesn’t actually reference the traefik network. Is it necessary?
I removed the reference to the traefik network and it seems to work fine.
I’m trying to sync my email over from my old server using mbsync. It’s super slow. Way slower than when I tried it before with a configured server rather than using docker. Have you seen performance problems using the cephfs? Any suggestions for how to pinpoint it?
CPU usage on all the nodes is at 3% or so. The nodes all have 16GB of RAM. NVME drives for boot and a separate NVME drive for storage. These things should be stupid fast.
top - 19:00:39 up 3 days, 22:02, 1 user, load average: 0.00, 0.02, 0.01
Tasks: 180 total, 1 running, 179 sleeping, 0 stopped, 0 zombie
%Cpu(s): 0.2 us, 0.1 sy, 0.0 ni, 99.7 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
KiB Mem : 16316016 total, 11687504 free, 1821032 used, 2807480 buff/cache
KiB Swap: 16662524 total, 16662524 free, 0 used. 14079380 avail Mem
  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
23619 ceph      20   0 1147072 324496  28944 S   1.3  2.0  11:48.93 ceph-osd
 1120 root      20   0 1622524  97136  33344 S   1.0  0.6  37:15.53 dockerd
25133 ceph      20   0  520532 144456  17240 S   0.7  0.9   5:12.71 ceph-mds
22597 ceph      20   0 1523088 1.086g  21560 S   0.3  7.0   8:34.26 ceph-mon
26307 root      20   0       0      0      0 S   0.3  0.0   0:00.11 kworker/1:0
27062 root      20   0   41800   3628   3032 R   0.3  0.0   0:00.06 top
    1 root      20   0   39424   7384   3892 S   0.0  0.0   0:06.88 systemd
    2 root      20   0       0      0      0 S   0.0  0.0   0:00.03 kthreadd
    3 root      20   0       0      0      0 S   0.0  0.0   0:00.14 ksoftirqd/0
    7 root      20   0       0      0      0 S   0.0  0.0   0:38.60 rcu_sched
    8 root      20   0       0      0      0 S   0.0  0.0   0:00.00 rcu_bh
    9 root      rt   0       0      0      0 S   0.0  0.0   0:00.06 migration/0
   10 root      rt   0       0      0      0 S   0.0  0.0   0:01.44 watchdog/0
I agree it doesn’t seem to be ceph which is struggling. Maybe try benchmarking copying a 10GB file (or a size > your available RAM, to avoid caching) locally on one of the nodes, vs copying the same file into cephfs?
docker-mailserver does include fetchmail support, although I haven’t played with it. Maybe you could use setup.sh to suck the mail off your old host that way?
D
On the local filesystem:
sudo dd if=/dev/zero of=here bs=1G count=1 oflag=direct
1+0 records in
1+0 records out
1073741824 bytes (1.1 GB, 1.0 GiB) copied, 0.833802 s, 1.3 GB/s
On the cephfs filesystem:
sudo dd if=/dev/zero of=here bs=1G count=1 oflag=direct
1+0 records in
1+0 records out
1073741824 bytes (1.1 GB, 1.0 GiB) copied, 30.706 s, 35.0 MB/s
That’s pretty terrible.
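One caveat on the dd numbers above: a single 1 GiB direct write measures streaming throughput, while a maildir sync is mostly thousands of small files, so per-file latency probably matters more. Here is a rough sketch for timing that pattern, assuming GNU dd and assuming each message is flushed to disk as it is written (that is a guess about how the sync behaves, not something measured here). The function name, the 4 KiB file size and the default count are all made up for illustration.

```shell
# Time writing COUNT small files (4 KiB each) into DIR, flushing each to disk.
# Usage: bench_small_files DIR [COUNT]
bench_small_files() {
  dir=$1
  count=${2:-1000}
  work="$dir/bench.$$"            # scratch subdirectory, removed afterwards
  mkdir -p "$work" || return 1

  start=$(date +%s)
  i=0
  while [ "$i" -lt "$count" ]; do
    # conv=fsync makes dd flush the file before exiting
    dd if=/dev/zero of="$work/msg.$i" bs=4k count=1 conv=fsync 2>/dev/null
    i=$((i + 1))
  done
  end=$(date +%s)

  rm -rf "$work"
  echo "$dir: $count files in $((end - start)) s"
}
```

Run it once against a local directory (for example `bench_small_files /tmp 1000`) and once against a directory on the cephfs mount, then compare the elapsed times.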
Okay, I’m trying to get rainloop up and running. It looks like they support sieve now. This is my first time trying to get a service up in this context, so I’m looking for advice. Here’s my docker-compose.yml. I get an error that I can’t get a secure connection to rainloop.gerg.org, so I’m wondering what I’m doing wrong:
version: '3'
services:
  mail:
    image: tvial/docker-mailserver:latest
    ports:
      - "25:25"
      - "587:587"
      - "993:993"
    volumes:
      - /var/data/mailserver/maildata:/var/mail
      - /var/data/mailserver/mailstate:/var/mail-state
      - /var/data/mailserver/config:/tmp/docker-mailserver
      - /var/data/mailserver/letsencrypt:/etc/letsencrypt
    env_file: /var/data/mailserver/.env
    networks:
      - internal
    deploy:
      replicas: 1
  rainloop:
    image: hardware/rainloop
    networks:
      - internal
      - traefik_public
    deploy:
      labels:
        - traefik.frontend.rule=Host:rainloop.gerg.org
        - traefik.docker.network=traefik_public
        - traefik.port=80
    volumes:
      - /var/data/mailserver/rainloop:/rainloop/data
networks:
  traefik_public:
    external: true
  internal:
    driver: overlay
    ipam:
      config:
        - subnet: 172.16.2.0/24
Looks like traefik.port should be 8888 instead of 80. Seems to work! Though I don’t have sieve working quite yet…
Excellent! Yes, traefik.port should be whatever port the “app” container listens on. Most containers listen on port 80, but if they don’t (and it’s not mentioned in their docs), you can find out by either:
- Inspecting their Dockerfile
- Inspecting the container using “docker inspect [container-id]”
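For the docker inspect route, the declared ports sit under .Config.ExposedPorts. A minimal sketch, assuming the image declares its ports with EXPOSE (not every image does, in which case the Dockerfile or docs are the fallback); `exposed_ports` is just a hypothetical helper name:

```shell
# Print the ports a container (or image) declares via EXPOSE, one per line.
# Takes a container or image ID/name as its only argument.
exposed_ports() {
  docker inspect --format '{{json .Config.ExposedPorts}}' "$1" \
    | grep -oE '[0-9]+/(tcp|udp|sctp)'
}
```

Calling it with the container ID (or an image name such as hardware/rainloop) should list the candidates for traefik.port. It prints nothing if no ports are declared.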
I’m keen to know how sieve goes. You’re one step ahead of my recipe now!
D
I didn’t have starttls turned on for sieve in rainloop. Once I did that, it seems to be working.
I ran into an odd issue with sieve rules. They make a bogus directory show up in the mail apps. The workaround is to put an override into config/dovecot.cf:
plugin {
  sieve = /var/mail/sieve/%d/%n/.dovecot.sieve
  sieve_dir = /var/mail/sieve/%d/%n/sieve
}
I actually like this fix better than the one that’s working its way through mailserver.
Here’s a reference to the issue: https://github.com/tomav/docker-mailserver/issues/508
I have a problem with my mailserver. It’s getting pounded (apparently by one of my devices?). I see this in the logs:
Oct 20 00:53:14 011c9f505ae6 dovecot: imap-login: Maximum number of connections from user+IP exceeded (mail_max_userip_connections=10): user=<[email protected]>, method=PLAIN, rip=10.255.0.2, lip=10.255.0.14, TLS, session=<GgPL5e9bGfQK/wAC>
Top shows:
I can’t start up a shell on the docker container:
oci runtime error: exec failed: container_linux.go:265: starting container process caused "process_linux.go:84: executing setns process caused \"exit status 15\""
And I can’t find a way to restart the service. Am I at the docker stack rm, docker stack deploy stage?
Remember, you’re using swarm ingress, so any inbound traffic will appear to come from that address. Maybe you’re being brute-forced? You should be able to shell into the container though, using something like this:
[root@ds3 ~]# docker exec -it 5947 bash
root@5947b30c889b:/#
Can you get logs using setup.sh?
Instead of deleting and redeploying the whole stack, you could just stop the specific container, and let swarm auto-recover it, but it will most likely auto-start it on a different node, which might break your inbound mail NAT.
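Because the ingress address hides the real client IP, the rip= field won’t tell you which device is responsible, but the user= field still can. A rough sketch for tallying the “connections exceeded” errors per user; it reads log lines on stdin, and `count_exceeded_by_user` is a hypothetical helper name:

```shell
# Read dovecot log lines on stdin and count "connections exceeded" errors
# per user, busiest user first.
count_exceeded_by_user() {
  grep 'Maximum number of connections from user+IP exceeded' \
    | sed -n 's/.*user=<\([^>]*\)>.*/\1/p' \
    | sort | uniq -c | sort -rn
}
```

Something like `docker service logs mailserver_mail 2>&1 | count_exceeded_by_user` should feed it, assuming the service is named mailserver_mail and the dovecot messages actually reach the container’s log output.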
I couldn’t stop the docker processes. Couldn’t even reboot. I ended up having to cycle power on the machine to get things back.
Then mail wasn’t working. Turns out keepalived on one of the machines was in a weird state (I must have missed starting ipvs, but it was in /etc/rc.local), so the firewall was pointing at the wrong machine. The downside of having the mail process pinned to one machine. Guess I need to spend some more time investigating mail in a swarm (unless you’ve already figured it out).
I wish I’d already figured it out, but sadly not.
I ended up in a bad state again with “maximum number of connections exceeded”. My “solution” is to add this to dovecot.cf:
protocol imap {
  # Space separated list of plugins to load (default is global mail_plugins).
  #mail_plugins = $mail_plugins
  # Maximum number of IMAP connections allowed for a user from each IP address.
  # NOTE: The username is compared case-sensitively.
  mail_max_userip_connections = 100
}
Then to restart the mailserver (I couldn’t find a way to restart a service from docker stack):
sudo docker service update --force mailserver_mail
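To confirm the override was picked up after the restart, one option is to ask dovecot for its effective non-default settings. A minimal sketch, assuming doveconf is available inside the mailserver image and that you run this on the node currently hosting the task; `show_userip_limit` is a hypothetical helper name:

```shell
# Print the mail_max_userip_connections setting from the running container.
# Takes the swarm service name; must run on the node hosting the task.
show_userip_limit() {
  # The name filter matches the task container, e.g. mailserver_mail.1.<id>
  id=$(docker ps --filter "name=$1" --format '{{.ID}}' | head -n 1)
  if [ -z "$id" ]; then
    echo "no local container found for $1" >&2
    return 1
  fi
  # doveconf -n lists only settings that differ from the defaults
  docker exec "$id" doveconf -n | grep mail_max_userip_connections
}
```

With the service from above that would be `show_userip_limit mailserver_mail` (with sudo if your docker setup needs it).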
I thought the LetsEncrypt certificates would automatically renew. They didn’t, and mail is not happy. Did I miss something, or did I misconfigure something?
Greg
Eeeew. I thought so too, but I’m in the same boat. I’ll check it out…
OK, so preliminary research says we have to renew our certs by doing something like this:
cd /var/data/mailserver
docker run -ti --rm -v "$(pwd)"/letsencrypt:/etc/letsencrypt certbot/certbot renew
Sadly, this doesn’t work for my certs, which were registered with --dns --manual. As it turns out, I have to regenerate them every 90 days.
Let me know how it goes?
D
No luck here:
Processing /etc/letsencrypt/renewal/mail.gerg.org.conf
-------------------------------------------------------------------------------
Cert is due for renewal, auto-renewing...
Could not choose appropriate plugin: The manual plugin is not working; there may be problems with your existing configuration.
The error was: PluginError('An authentication script must be provided with --manual-auth-hook when using the manual plugin non-interactively.',)
Attempting to renew cert (mail.gerg.org) from /etc/letsencrypt/renewal/mail.gerg.org.conf produced an unexpected error: The manual plugin is not working; there may be problems with your existing configuration.
The error was: PluginError('An authentication script must be provided with --manual-auth-hook when using the manual plugin non-interactively.',). Skipping.
All renewal attempts failed. The following certs could not be renewed:
/etc/letsencrypt/live/mail.gerg.org/fullchain.pem (failure)
-------------------------------------------------------------------------------
All renewal attempts failed. The following certs could not be renewed:
/etc/letsencrypt/live/mail.gerg.org/fullchain.pem (failure)
-------------------------------------------------------------------------------
I followed your recipe using the domain challenge, so I guess I also have to do the manual updates. Since I don’t have to worry about it again for 3 months, I’ll figure something out closer to that time.
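To avoid being surprised by the next expiry, a small check could be run from cron. This is only a sketch, assuming openssl is installed on the host; `cert_expires_within` is a hypothetical helper name, and the cert path in the usage note is inferred from the volume mapping and the certbot output above:

```shell
# Succeed (exit 0) if the certificate in FILE expires within DAYS days.
# Usage: cert_expires_within DAYS FILE
# Note: an unreadable or missing FILE is also reported as "expiring".
cert_expires_within() {
  days=$1
  cert=$2
  # openssl exits 0 when the cert is still valid after the given number of
  # seconds, so the result is inverted here.
  ! openssl x509 -checkend "$((days * 86400))" -noout -in "$cert" >/dev/null
}
```

For example: `cert_expires_within 14 /var/data/mailserver/letsencrypt/live/mail.gerg.org/fullchain.pem && echo "renew soon"`.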
Yeah, likewise, I just manually regenerated my certs. Some ideas here: we could add a “cron-type” container, à la NextCloud, which attempts the cert renewal daily (it should do nothing as long as the cert isn’t yet due for renewal). I noticed that the DNS TXT entry for the verification didn’t change, so it may be possible to fully automate the “manual” regeneration.
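A rough sketch of what that daily attempt might look like, not a tested solution. It reuses the certbot/certbot command from earlier in the thread (minus -ti, since nothing is interactive) and passes the --manual-auth-hook option that the error message asks for. The hook script itself is hypothetical: it would have to exist inside the certbot container (for instance under the mounted letsencrypt directory) and publish the TXT record. Whether a do-nothing hook would be enough depends on the TXT value really staying the same, which is unverified.

```shell
# Attempt one renewal run.
# Usage: renew_once DATA_DIR AUTH_HOOK
#   DATA_DIR  - host directory containing the letsencrypt folder
#   AUTH_HOOK - path to the (hypothetical) auth hook as seen inside the container
renew_once() {
  data_dir=$1
  auth_hook=$2
  docker run --rm \
    -v "$data_dir/letsencrypt:/etc/letsencrypt" \
    certbot/certbot renew --manual-auth-hook "$auth_hook"
}

# Try once a day, forever; certbot skips certs that are not yet due.
renew_daily() {
  while true; do
    renew_once "$1" "$2" || echo "renewal attempt failed" >&2
    sleep 86400
  done
}
```

For example: `renew_daily /var/data/mailserver /etc/letsencrypt/auth-hook.sh`, where auth-hook.sh is a placeholder name. The mail container would presumably still need a restart to pick up a renewed cert.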