Docker NFS volumes troubleshooting

I have a weird problem with NFS and volumes right now, maybe somebody here knows what’s up :slight_smile:

I’m trying to move my docker swarm to use volume mounts instead of bind mounts to an NFS share that is mounted on boot. As such, I started digging into the native support for this in docker (which isn’t super obvious at first). You can basically do the following in a compose file:

    <name of volume>:
        driver: local
        driver_opts:
            type: "nfs"
            device: "<nfs address>:<nfs volume mount and path>"
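The CLI equivalent looks roughly like this — a sketch, where the address, export path, and volume name are placeholder values I made up, not anything from my setup:

```shell
# Create the same kind of NFS-backed volume from the command line.
# 192.168.1.50, /volume1/dockervolumes and media_data are assumptions.
docker volume create \
  --driver local \
  --opt type=nfs \
  --opt o=addr=192.168.1.50,rw \
  --opt device=:/volume1/dockervolumes \
  media_data
```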

Or create it through the command line, same diff. One can also add an `o` key in the `driver_opts` to specify `addr=`, which also lets you pass NFS mount options (think `nolock`, `soft`, or `hard`). A quick `docker-compose up` and I do see the volume mounted, but when I try to access it from the Sonarr web interface to add a show, it says user bla can't write to this location. So we fiddle with the NFS export for a tick to make sure things are okay. My final export looks like this (on a Synology):

/volume1/dockervolumes  *(rw,async,no_wdelay,crossmnt,insecure,no_root_squash,insecure_locks,sec=sys,anonuid=1025,anongid=100)

I’m not too sure about the anonuid stuff, but it’s generated from the Synology web UI and has worked for a while now with fstab mounts and direct binds, so I’m guessing that isn’t the problem. I’d like to use all_squash, but it seems like docker swarm doesn’t like that because of some internal chowning it does, so no joy there. Figured it might be something with Sonarr image permissions, but PUID and PGID are set. To make sure I’m not cray cray, I do a quick docker exec -it <complicated swarm name> bash and try to create a file on the share… and lo and behold, it works. Try Sonarr again, still no go.
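One extra check worth doing here: the `docker exec` test above runs as root, and with no_root_squash root may succeed where the app's PUID user fails. A sketch of repeating the test as that user (the 1025:100 IDs and the in-container path are assumptions matching the export's anonuid/anongid):

```shell
# Re-run the write test as the UID:GID the app actually uses,
# instead of root. Path and IDs are placeholders.
docker exec -u 1025:100 -it <complicated swarm name> \
  touch /config/write-test-file
```

If this fails while the root version succeeds, the problem is UID mapping on the export rather than the docker volume plumbing.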

I’ve also been tinkering with some volume plugins. I settled on Rancher Convoy at first, but that had the same issues; then I tried OpenStack Netshare, which also does not work.

I guess my question is: how should this work, why is it so oddly hard, and why are the built-in solutions so obtuse and hard to discover? What am I missing here? Did I grab the wrong user from my hosts, and is it not allowed to write to the share?

I don’t have an answer, except to say that permissions over NFS with docker and Synology is a problem we share! I administered a variety of enterprise NFS servers (and other storage) for quite a while, and I suspect part of the problem is with Synology, but another part may be with docker — I just haven’t had enough time to nail it down yet.

One problem with Synology is that it reserves all UIDs below 1024, meaning you can’t create any UID that low on your Synology… and many Linux distros and docker images want to use UIDs lower than that as a default — UID 100 / GID 100 is common. With “no_root_squash”, the NFS server should do absolutely no remapping: if a client says it is UID 0-65535, the server just accepts what the client claims. However, I still see “Permission Denied” for some docker images. The anonuid and anongid options force the UID/GID to those values when the NFS server doesn’t recognize a UID/GID, which could work against us in this case — however, it can help with remapping sub-1024 UIDs into something Synology is okay with.

It may also have to do with NFSv4, which can provide some server-side authentication. That could make this worse because of the aforementioned Synology idiosyncrasy with UIDs lower than 1024. NFSv4 user mapping could also be a help for folks in our case, but without doing some packet sniffing, I can’t readily say.
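To make the squashing behavior concrete, here is a tiny Python model of how an NFS server decides which UID to use for a request. This is a simplified sketch of the export-option semantics described above, not real NFS server code; the default anonuid of 1025 matches the Synology export earlier in the thread.

```python
def map_uid(client_uid, root_squash=True, all_squash=False, anonuid=1025):
    """Return the UID the server effectively uses for a client request.

    Models the interaction of the root_squash / all_squash / anonuid
    export options (simplified sketch).
    """
    if all_squash:
        # all_squash: every client UID is remapped to the anonymous UID
        return anonuid
    if root_squash and client_uid == 0:
        # default root_squash: only root (UID 0) is remapped
        return anonuid
    # no_root_squash (or a non-root user): trust the client's claim
    return client_uid

# With no_root_squash, root stays root:
print(map_uid(0, root_squash=False))    # 0
# With all_squash, even UID 100 becomes the Synology anon user:
print(map_uid(100, all_squash=True))    # 1025
```

This is why all_squash breaks swarm's internal chowning (everything lands as 1025) while no_root_squash lets root-run init scripts through.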

What has worked for me thus far is this:

  • PUID/PGID of 1024/1024 in environment files. Note this is common to many images, but not all.
  • NFS options like you have above
  • Mount NFS on the docker swarm nodes in a common place, then bind mount each container into the spot it needs to be.
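Concretely, that last bullet looks something like this on each node — a sketch where the server address and paths are my assumptions, not yours:

```
# /etc/fstab on every swarm node: mount the share in one common place
192.168.1.50:/volume1/dockervolumes  /mnt/data  nfs  defaults,soft  0  0
```

Then in the stack file each service just bind-mounts its own subdirectory, e.g. `- /mnt/data/sonarr:/config`, and docker never needs to know NFS is involved.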

MariaDB seems to be the most difficult to get and keep going with this, but I’ve had trouble with Nextcloud as well. Generally they have one-time setup scripts that want to run as root and set a bunch of permissions up and those tend to bomb out. If you are having this problem too, something to try is to initially bind mount your container to a local FS on one node and get the container running so it gets through all its init functions, then shut down your stack, move the data with permissions and UID/GID intact to your NFS server, and re-point the stack/container at the NFS location.
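The "init locally, then move" workaround above, sketched as shell steps (the stack name and paths are my assumptions):

```shell
# 1. First deploy with the container bound to a local path so its
#    one-time root setup scripts can run unimpeded, then stop it:
docker stack rm mystack

# 2. Move the data to the NFS mount, preserving UID/GID and modes:
cp -a /srv/local-init/mariadb /mnt/data/mariadb

# 3. Re-point the bind mount at /mnt/data/mariadb in the stack file
#    and redeploy:
docker stack deploy -c docker-compose.yml mystack
```

The `-a` flag on `cp` is the important part — it keeps ownership intact so the already-initialized data stays readable by the container's user.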

I’ve not worked on this in a couple months, but will take up the charge again at some point. Perhaps this thread will motivate me. Please share your results and maybe we can nail this together!

FYI, there are several threads out there on the Internet about problems with NFS and docker so we aren’t totally alone.

Well, that sucks and seems to be the root cause, but I guess it’s to be expected for a consumer product. I wonder if we can tinker with the nfs-server on the Synology boxes without it being overwritten by an update (that also seems like a more enterprise-y solution than what they’re targeting, I guess).

I’ll give this a go and see, but you seem to indicate this hasn’t gained much traction on your setups?

This is what i have now, and am trying to move away from :stuck_out_tongue:

That is my next hurdle, I guess. Most of the apps that utilize these services weren’t exactly made cloud native (especially the SQLite-backed ones).

I’ve got quite a few things from the cookbook running. I’ve mainly had trouble with NextCloud and phpIPAM.

The main reason I chose to NFS mount on the swarm nodes and bind mount is that I want the storage to be easy to swap out. So my stuff is in /mnt/data, and that is NFS mounted now, but I could swap it out for Gluster or anything else as long as the data stayed in the same path. My other reason is that the volume mappings are easy and take one line, whereas NFS mounting takes a few lines… I have several stacks and, like any good sysadmin, I’m lazy. :slight_smile: My final reason for doing it this way is that I find myself poking around in the volumes outside docker to see what got dropped there, or to tweak a setting or whatever - this just feels easier to do if the data is always mounted on the swarm nodes.
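For comparison, here is the brevity trade-off in compose terms — a sketch with made-up names, showing why the bind-mount route feels lighter:

```
# bind mount: one line per service (NFS handled by the node's fstab)
services:
  app:
    volumes:
      - /mnt/data/app:/config

# named NFS volume: a whole driver_opts stanza per volume
volumes:
  appdata:
    driver: local
    driver_opts:
      type: "nfs"
      o: "addr=192.168.1.50,rw"
      device: ":/volume1/dockervolumes/app"
```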

I think the biggest challenge for us persistent-data-store-on-NFS folks is that the “cloud native” thinkers are usually thinking about object stores or other backends that are themselves “cloud native”… NFS is more of a backstop, mostly used in traditional enterprise IT shops. But probably our biggest challenge is the peculiarities of Synology’s NFS and UID handling/mapping… I’ve done all kinds of crazy stuff with NFS servers from v2 through v4 and it should all work. If we can crack the code on Synology, I think we’ll be most of the way there, and that should make NFS volumes (vs. bind mounting) work too.