Shared Storage (Ceph) - Funky Penguin's Geek Cookbook

While Docker Swarm is great for keeping containers running (and restarting those that fail), it does nothing for persistent storage. This means if you actually want your containers to persist any data across restarts (hint: you do!), you need to provide shared storage to every docker node.


This is a companion discussion topic for the original entry at https://geek-cookbook.funkypenguin.co.nz/ha-docker-swarm/shared-storage-ceph/

89 posts were split to a new topic: [Archived] Shared Storage (Ceph) Jewel

Thanks to @TNTechnoHermit's persistence, I spent the morning refreshing this recipe :slight_smile: Here's the freshly-baked version: Shared Storage (Ceph)

D


Thank you @funkypenguin and @TNTechnoHermit for all your incredible hard work, looking forward to testing this out later.

Quick question: I assume there's no way to run the Ceph dashboard as a Docker container, so that it can run on the swarm and be load-balanced/always available regardless of which node may become unavailable?

Hey @waynehaffenden - I assume that it's possible, since:

  1. the MGR is what provides the dashboard,
  2. it doesn't depend on any sort of persistence, and
  3. you can have more than one MGR.

I'm just unsure whether you can have more than one MGR active concurrently. I'm happy to accept PRs to the recipe if someone wants to test it out :wink:
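Untested, but on a cephadm-deployed (Octopus) cluster I'd expect the starting point to look something like the below - treat it as a sketch rather than a tested recipe:

# Enable the dashboard module on the active MGR (cephadm may have done this already)
sudo ceph mgr module enable dashboard
# Give the dashboard a self-signed cert so it can serve HTTPS
sudo ceph dashboard create-self-signed-cert
# Show which MGR instance is currently serving the dashboard (and on which URL)
sudo ceph mgr services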

D

The ingredients section says:

Each node should have the IP of every other participating node hard-coded in /etc/hosts ( including its own IP )

Maybe that's too obscure. The intention is that you manually populate /etc/hosts on each node, so that you're resilient to DNS failures, but you still get the improved readability of using actual node names instead of IP addresses.
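For example, with three hypothetical nodes (substitute your own names and addresses), every node's /etc/hosts would contain something like:

# /etc/hosts - identical on every node, so Ceph keeps resolving names during DNS outages
192.168.0.11 dn0
192.168.0.12 dn1
192.168.0.13 dn2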

Does it work if you populate /etc/hosts?

What OS are you using?

You're missing the port number. Also, as I understand it, you just need to specify one node, not all three - if all 3 nodes are up when the initial mount to, say, Node 1 is made, the mount will be aware of the other 2 nodes?

Perhaps just test the mount (using the mount command) directly to Node 1's IP (and only Node 1) first?

FYI, it's working for me without the port number (provided you've not deviated from the default port). And yes, you only need to specify one node, but it can't be the node you're on, since the mon daemon may not have started yet (@TNTechnoHermit had this problem initially). Since you can't predict which of your nodes will be up/down when this particular node boots, it's more resilient to just add them all :slight_smile:
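For a quick sanity check, something like this (an untested sketch - substitute your own node IP, and note it assumes ceph-common and the admin keyring are already in place) should confirm the mount works before you touch fstab:

# One-off test mount against a single mon, using the default port
sudo mkdir -p /mnt/ceph-test
sudo mount -t ceph 192.168.0.11:/ /mnt/ceph-test -o name=admin
# Clean up the test mount afterwards
sudo umount /mnt/ceph-test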

@funkypenguin I've finally got round to setting this up and it all appears to be working great, thank you. My only question: the output from ceph status shows a total of 30GiB (3 x 10GB disks) with 27GiB available, but a df -h on the mounted ceph path shows only 8.5G available? Here's the output from ceph status:

  cluster:
    id:     630cf0f4-a389-11ea-978c-fa163ec47a7e
    health: HEALTH_OK

  services:
    mon: 3 daemons, quorum dn0,dn1,dn2 (age 11h)
    mgr: dn0.qomczu(active, since 11h), standbys: dn1.cpxaem
    mds: data:1 {0=data.dn2.moduyx=up:active} 1 up:standby
    osd: 3 osds: 3 up (since 11h), 3 in (since 12h)

  task status:
    scrub status:
      mds.data.dn2.moduyx: idle

  data:
    pools:   3 pools, 65 pgs
    objects: 27 objects, 74 KiB
    usage:   3.0 GiB used, 27 GiB / 30 GiB avail
    pgs:     65 active+clean

Thank you

Edit: Also, I notice only one of my nodes is in standby as a manager - how do I get the 3rd node into standby too? If I reboot dn1 (the standby) or dn2 (the one not in standby) for a kernel update, ceph continues to operate with a warning. However, if I reboot dn0 (the one cephadm runs on), ceph hangs and doesn't operate until the VPS is back online (which concerns me - what would happen once I have docker volumes mounted here with DBs running etc., if ceph stops!). Any ideas?

Edit 2: (Sorry) I'm also receiving a health warning of 1 hosts fail cephadm check - how do I go about finding out which host has the issue, and any ideas on what might cause it? This is only a mess-around, so I'm happy to wipe and try again, but thought I'd best ask first :smiley:

P.S. I'm a complete noob to all of this stuff, so forgive my ignorance :slight_smile:

This'll mean that you have 30GiB of "raw" storage, but due to the number of replicas you've chosen (3, by default) in the pool used to back cephfs, you're only presented with 8.5G of availability for cephfs.

If (for example) you created another pool (say, for rbd storage) with 2 replicas instead of 3, you'd have 17G available in this pool, and as you consumed each pool, the available storage would be reduced accordingly.
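I haven't tested this on a cephadm-deployed cluster, but the usual way to inspect and change a pool's replica count is something like the below (the pool name is a placeholder - check ceph osd pool ls for yours, and bear in mind that fewer replicas means less protection against failure):

# List the pools, then check and change the replica count of the cephfs data pool
sudo ceph osd pool ls
sudo ceph osd pool get <data-pool-name> size
sudo ceph osd pool set <data-pool-name> size 2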


I've not tried this yet, but based on Orchestrator CLI - Ceph Documentation, I think you'd do something like ceph orch apply mgr 3 dn2. However, how do you know that ceph hangs when dn0 reboots? What does ceph -s on dn1/2 tell you when this happens?
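Again, untested on my side, but per the orchestrator docs the checks would look roughly like this (the placement syntax may differ slightly between releases):

# Show how many MGR daemons the orchestrator is scheduling, and where they run
sudo ceph orch ls mgr
# Ask the orchestrator for three MGRs (one active, the rest standby)
sudo ceph orch apply mgr 3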

@funkypenguin Thank you David for explaining that, it makes more sense now. My next question, of course, is how I would go about specifying 2 replicas for the pool I have, or for a new one (happy to start again if that's easier)? The idea in my mind is that I can then lose one VPS (for a reboot) while still having enough space left over across the 3 nodes I currently have set up.

@funkypenguin Thanks again, I'll have a play with this and report back. I noticed I've only got 2/3 nodes for MDS too. Out of interest, did you find this was the case when you tried it, or did you have all hosts showing for all the various services (like the old guide used to)?

When running ceph -s on dn1/2 (while dn0 is offline), it just sits and never returns. Also, if I put a test file in the ceph mount during this time, it is not copied over to the remaining host. If I reboot either of the other two (but not dn0), ceph -s returns immediately with the warning that a host is down (as expected), and the cluster continues to operate normally.

In fstab you need to put:

manager1,manager2,manager3:/ /var/data ceph name=admin,secret=[cephkey],noatime,_netdev 0 0

[cephkey] is generated by sudo ceph-authtool -p /etc/ceph/ceph.client.admin.keyring.

On Debian Buster you must install ceph-common.
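I believe mount.ceph also accepts a secretfile= option, if you prefer to keep the key out of fstab itself - an untested variant of the line above would be:

# /etc/fstab - same mount, but the key lives in a root-only file instead of inline
manager1,manager2,manager3:/ /var/data ceph name=admin,secretfile=/etc/ceph/admin.secret,noatime,_netdev 0 0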

Sorry for my bad English… I hope I have helped.

@funkypenguin, you forgot to add the secret option to the fstab example in the recipe.

Welcome @cristain_dkb! It turns out I didn't need a secret option - provided I installed ceph-common from the Octopus repo installed by cephadm, it just worked :slight_smile:

Maybe it's a distro thing. On Debian Buster without the secret option I had error 22, like @zeiglecm. But with the secret option it works perfectly.

Another thing: to run ceph orch host add [node-name], I needed to execute sudo ./cephadm shell --fsid [fsid] -c /etc/ceph/ceph.conf -k /etc/ceph/ceph.client.admin.keyring first.
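In other words, the sequence was roughly this (keeping the same placeholders for the fsid and node name):

# Open a shell inside the cephadm container, with the cluster config and admin keyring mounted
sudo ./cephadm shell --fsid [fsid] -c /etc/ceph/ceph.conf -k /etc/ceph/ceph.client.admin.keyring
# Then, from inside that shell, add the new host to the orchestrator
ceph orch host add [node-name]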

@funkypenguin, this is a great job, thank you very much. I'm going to deploy a Swarm at a university in Argentina with on-premise servers, and these guides are very useful.


@cristain_dkb I've added a section on Debian Buster, but I can't test it currently: Shared Storage (Ceph)

Would you mind validating that my example works for you?

Thanks!
D