In this article, I'll show you how to deploy a Ceph storage cluster on three nodes by using Ansible. We will go with the latest version of Ceph -- Nautilus.


I'll skip the preparations and assume that you already have three (physical or virtual) nodes ready for deployment.

So, the overall requirements are:

  • Three nodes running Ubuntu Bionic with at least one additional disk on each node for storage
  • Configured network (including firewall), so the nodes can communicate with each other (refer to the documentation for ports: Network Configuration Reference — Ceph Documentation)
  • You have basic knowledge of Ceph and how it's built (MONs, MGRs, OSDs, RGWs, etc.)
  • You have already configured a passwordless SSH user for Ansible on your nodes (in this tutorial we'll use a user called ansiblemaintenance; a quick check follows below)
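
A quick way to verify that last point before involving Ansible at all -- a minimal sketch, using the hypothetical host names that appear later in this guide:

for host in ceph1 ceph2 ceph3; do
  # should log in without a password and run sudo without prompting
  ssh ansiblemaintenance@${host}.yourdomain.com 'sudo -n true' && echo "${host}: OK"
done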

If you're new to Ansible and Ceph, please read carefully so you can avoid most of the mistakes I made. I'll do my best to explain the basics you need to know.

Step 1: Introduction

As I said before, in this tutorial we're going to install a Ceph storage cluster. And as you might already know, Ceph has a distributed nature and consists of different kinds of daemons: MONs, OSDs, MGRs, etc. These daemons are responsible for specific kinds of tasks in the cluster, for example:

  • MONs monitor the cluster state, its configuration, and data placement.
  • OSDs store the data and manage the on-disk object store (FileStore or BlueStore).
  • MGRs are responsible for maintenance tasks, bookkeeping, and cluster management.
  • RGWs provide an object storage layer with an interface compatible with the Amazon S3 and OpenStack Swift APIs, and so on.

So, when I say "we're going to install a storage cluster", I mean we're going to install just the storage cluster with all its functionality, but no additional daemons such as the MDS needed for CephFS. However, if you also need that functionality -- keep reading this tutorial: it's pretty easy to add the other daemons you need. If you also plan to use NFS over RGW, read the addition at the end of the article.

The distributed nature also means that we're able to install different daemons on different nodes. That means it's not much more effort to install everything on just three nodes or to distribute it over hundreds of different nodes in your network, just by editing one file. That's kind of... awesome, isn't it?

So, if you're going to deploy Ceph in production, you might want to stop at this point and think about how to distribute the Ceph services for your needs. For Ceph itself it's fine if you deploy everything on three nodes (at least three nodes is the recommendation from the Ceph team).

To keep the tutorial simple, I'll install all daemons on all three nodes, but also explain the way you can distribute them.

Step 2: Preparations

The Ansible repository for Ceph deployment is maintained by the Ceph team and hosted on Ceph's GitHub account. Let's clone it and look into its stable branches.

git clone git@github.com:ceph/ceph-ansible.git && cd ceph-ansible/
git fetch origin && git branch -a |grep stable

You should see output like this:

~/ceph-ansible$ git branch -a |grep stable
  remotes/origin/2451-bkp-stable-3.0
  remotes/origin/guits-fix_stable-4.0_update
  remotes/origin/guits-iscsi_stable3.0
  remotes/origin/guits-refact_ci_testing_releases_stable30
  remotes/origin/guits-refact_ci_testing_releases_stable31
  remotes/origin/mergify/bp/stable-4.0/pr-4168
  remotes/origin/stable-2.1
  remotes/origin/stable-2.2
  remotes/origin/stable-3.0
  remotes/origin/stable-3.1
  remotes/origin/stable-3.2
  remotes/origin/stable-4.0

Every major Ceph release has to be deployed with the matching ceph-ansible branch and the matching Ansible version on your machine. So the Mimic release has to be deployed with a stable-3.x branch, while Nautilus requires the stable-4.0 branch. You can also check the requirements here: ceph-ansible — ceph-ansible documentation. Let's check out the required branch and install the requirements defined in the repository:

git checkout stable-4.0
pip3 install -r requirements.txt

The step above installs all the tools necessary for the deployment, including the required Ansible version.

Note: I use pip3 for the installation since pip is based on Python 2.7, which goes EOL on 1 January 2020. It's also worth noting that installing Ansible with pip/pip3 makes your life a bit easier: I ran into a handful of dependency-related issues when installing Ansible with APT on Ubuntu Bionic (however, it's your choice ;)
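
You can quickly confirm which Ansible version pip3 pulled in (just a sanity check; the exact version range expected by stable-4.0 is pinned in its requirements.txt):

ansible --version | head -n 1
pip3 show ansible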

Step 3: The ceph-ansible repository layout

Don't be afraid if you look into the repository and see hundreds of files in it. We need most of them but have to modify just a handful. So, what do we need? Let's go through the repository and explain what they are for.

First, let's have a look into the root folder of the repository. We already installed all the requirements by running the command above, which uses the requirements.txt file. There is not much to explain there: it simply lists the tools to be installed, one per line. Besides that, we also have to modify the following files:

  • The inventory file, which we have to create in the root folder. It holds all the hosts we want to deploy Ceph on. This is the only file where we define the distribution of Ceph services across our infrastructure.
  • site.yaml -- the actual playbook for our cluster, which holds all the plays to run. This file is already in the root folder (as site.yml.sample), but has to be renamed.
  • Besides that, we also have some files inside the group_vars subfolder. These files hold the variables we have to change to deploy everything the right way (e.g. the right version of Ceph).

There are different ways to design the content of the files inside the group_vars subfolder. If you look into this folder, you'll see files named after the Ceph daemons:

~/ceph-ansible/group_vars$ ls -las
total 140
 4 drwxrwxr-x  2 ekoenig ekoenig  4096 Jun 28 22:01 .
 4 drwxrwxr-x 13 ekoenig ekoenig  4096 Jun 28 22:01 ..
28 -rw-rw-r--  1 ekoenig ekoenig 26762 Jun 28 22:01 all.yml.sample
 4 -rw-rw-r--  1 ekoenig ekoenig   310 Jun 28 22:01 ceph-fetch-keys.yml.sample
 4 -rw-rw-r--  1 ekoenig ekoenig  2217 Jun 28 22:01 clients.yml.sample
 4 -rw-rw-r--  1 ekoenig ekoenig   666 Jun 28 22:01 dashboards.yml.sample
 4 -rw-rw-r--  1 ekoenig ekoenig   740 Jun 28 22:01 docker-commons.yml.sample
 4 -rw-rw-r--  1 ekoenig ekoenig   269 Jun 28 22:01 factss.yml.sample
 4 -rw-rw-r--  1 ekoenig ekoenig   774 Jun 28 22:01 grafanas.yml.sample
 4 -rw-rw-r--  1 ekoenig ekoenig  4089 Jun 28 22:01 iscsigws.yml.sample
 4 -rw-rw-r--  1 ekoenig ekoenig  1618 Jun 28 22:01 mdss.yml.sample
 4 -rw-rw-r--  1 ekoenig ekoenig  1846 Jun 28 22:01 mgrs.yml.sample
 4 -rw-rw-r--  1 ekoenig ekoenig  2864 Jun 28 22:01 mons.yml.sample
 8 -rw-rw-r--  1 ekoenig ekoenig  4458 Jun 28 22:01 nfss.yml.sample
 4 -rw-rw-r--  1 ekoenig ekoenig   327 Jun 28 22:01 node-exporters.yml.sample
 8 -rw-rw-r--  1 ekoenig ekoenig  4945 Jun 28 22:01 osds.yml.sample
 4 -rw-rw-r--  1 ekoenig ekoenig   916 Jun 28 22:01 prometheuss.yml.sample
 4 -rw-rw-r--  1 ekoenig ekoenig  2286 Jun 28 22:01 rbdmirrors.yml.sample
 4 -rw-rw-r--  1 ekoenig ekoenig   501 Jun 28 22:01 rgwloadbalancers.yml.sample
 4 -rw-rw-r--  1 ekoenig ekoenig  2406 Jun 28 22:01 rgws.yml.sample
28 -rw-rw-r--  1 ekoenig ekoenig 26753 Jun 28 22:01 rhcs.yml.sample

In these files you can configure every variable you need for the corresponding service: which disks the OSDs should be deployed on, which object store you want to use -- FileStore or BlueStore -- which OSD scenario you want, and so on. Additionally, the file all.yaml holds the configuration that applies to every node in the cluster, no matter whether it's a MON or an OSD node. The mapping of services to nodes is done by grouping hosts in the inventory.

The infrastructure-playbooks subfolder also holds some interesting playbooks you might want to use in the future. So, if you plan to upgrade your cluster to a newer version, you can just copy/move/symlink the rolling-update.yaml playbook into the root folder of ceph-ansible, change the Ceph version in group_vars, and run it. We tested it by upgrading our cluster from Mimic to Nautilus, and it worked flawlessly. Also, the Ceph team does a great job of adding new infrastructure playbooks for our needs. Alright, let's configure our inventory file.

Step 4: The ceph-ansible inventory

In general, there are two ways to create an inventory file for Ansible: you can use either the YAML format or the INI format. The YAML format might give you a better overview. It looks like the following:

ceph:
  children:
    osds:
      hosts:
        ceph-osd1.domain.com:
          ansible_host: 10.1.100.10
        ceph-osd2.domain.com:
          ansible_host: 10.1.100.11
        ceph-osd3.domain.com:
          ansible_host: 10.1.100.12
    mons:
      hosts:
        ceph-mon1.domain.com:
          ansible_host: 10.1.100.20
        ceph-mon2.domain.com:
          ansible_host: 10.1.100.21
        ceph-mon3.domain.com:
          ansible_host: 10.1.100.22
  ...

The INI format (imo) looks a bit simpler; we'll use it in this guide. So, create a new file called inventory in the root folder and fill it with your hosts as in the example below. Since we deploy everything on just our three nodes, you can keep it just like this example.

You might ask: what about the lower section, [all:children]? And that's the most important part: the groups in this inventory map to our files inside the group_vars folder. So, all variables from group_vars/mons.yaml will be applied to the hosts defined in the [mons] group, and the same goes for all the other groups. Additionally, everything defined in group_vars/all.yaml will be applied to all groups, since the [all:children] group contains all the other groups. In other words: you could also put all the variables for every service just into group_vars/all.yaml and it would work.

[mons]
ceph1.yourdomain.com
ceph2.yourdomain.com
ceph3.yourdomain.com

[osds]
ceph1.yourdomain.com
ceph2.yourdomain.com
ceph3.yourdomain.com

[mgrs]
ceph1.yourdomain.com
ceph2.yourdomain.com
ceph3.yourdomain.com

[rgws]
ceph1.yourdomain.com
ceph2.yourdomain.com
ceph3.yourdomain.com

[all:children]
mons
osds
mgrs
rgws

Let's assume your infrastructure is a bit more complex than just three nodes and you want to deploy your Ceph storage cluster as follows:

  • Five physical storage nodes with a bunch of disks in them
  • Three VMs for mons
  • Two VMs for RGWs

In this case, your inventory file could look like the example below.

[mons]
ceph-mon1.yourdomain.com
ceph-mon2.yourdomain.com
ceph-mon3.yourdomain.com

[osds]
ceph-osd1.yourdomain.com
ceph-osd2.yourdomain.com
ceph-osd3.yourdomain.com
ceph-osd4.yourdomain.com
ceph-osd5.yourdomain.com

[mgrs]
ceph-osd1.yourdomain.com
ceph-osd3.yourdomain.com
ceph-osd5.yourdomain.com

[rgws]
ceph-rgw1.yourdomain.com
ceph-rgw2.yourdomain.com

[all:children]
mons
osds
mgrs
rgws

Pretty simple, right? Now plan your inventory and save it in the root folder of ceph-ansible.
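
Before moving on, it's worth checking that Ansible can actually reach every host in your inventory. A quick sanity check, assuming the ansiblemaintenance user from the requirements and the inventory file in the current folder:

ansible -i inventory all -m ping -u ansiblemaintenance

If every host reports "pong", you're good to continue.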

Step 5: Configuring the cluster

Now it's time to write some configuration for our cluster in the already mentioned group_vars/ folder. This configuration defines how everything should be set up in your cluster, e.g.:

Ceph core:

  • Which version of Ceph to deploy
  • Which repository to use
  • Cluster name
  • Services configuration (e.g. firewall & ntp)
  • Inventory groups mapping

Ceph mons:

  • Network interface for monitors
  • Public/cluster network and IP version

Ceph mgrs:

  • Which modules should be enabled

Ceph RGW:

  • Rados Gateway network interface
  • Creation of pools on deployment

Ceph OSDs:

  • OSD scenario
  • Physical disks for storage
  • Filesystem type
  • Object store type
  • Mount options for disks and so on

I'll split the configuration for specific services into their own files, but as I said before -- feel free to put everything into one all.yaml file.

Note: you have to rename these files from .yml.sample to .yaml (.yml works as well).
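
For the files used in this tutorial, that boils down to something like this (a sketch; cp instead of mv keeps the samples around for reference):

cd group_vars/
for f in all osds mgrs rgws; do cp "${f}.yml.sample" "${f}.yaml"; done
cd ..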

I'll give some important hints regarding certain variables. Please refer to the comments inside the files for a full description; they are pretty well documented.

The all.yaml file:

Define the repositories and the Ceph packages you want to install. There's no need to define your platform, since Ansible gets it by gathering facts.

---
ceph_mirror: http://download.ceph.com
ceph_origin: repository
ceph_repository: community
ceph_stable: true
ceph_stable_key: https://download.ceph.com/keys/release.asc
ceph_stable_release: nautilus
ceph_stable_repo: "{{ ceph_mirror }}/debian-{{ ceph_stable_release }}"
upgrade_ceph_packages: True

Set some configuration for the cluster. Be careful with the cluster name: if you change it, your ceph commands on all nodes will have to reference the defined name, and there are also currently some open issues in the Ceph tracker regarding manually defined cluster names.

Also, you can let Ceph configure the firewall, so all nodes can communicate with each other on the necessary ports.

Regarding the monitor interface: you have to define either the interface or the monitor address. The easiest option is to use the name of the primary interface on your Ceph nodes. Check it on your hosts with the ip a or ifconfig -a command (see also the helper one-liner after the snippet below).

cluster: ceph
ceph_conf_key_directory: /etc/ceph
fetch_directory: fetch/
ntp_service_enabled: true
configure_firewall: true
monitor_interface: ens18
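
If you're not sure which interface is the primary one, the default route usually tells you (a small helper; ens18 above is just the value from my environment):

ip -o -4 route show to default | awk '{print $5}'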

Finally, define our inventory groups:

mon_group_name: mons
osd_group_name: osds
mgr_group_name: mgrs
rgw_group_name: rgws
nfs_group_name: nfss

At the end, your group_vars/all.yaml file should look like this:

---
##################
## INSTALLATION ##
##################
ceph_mirror: http://download.ceph.com
ceph_origin: repository
ceph_repository: community
ceph_stable: true
ceph_stable_key: https://download.ceph.com/keys/release.asc
ceph_stable_release: nautilus
ceph_stable_repo: "{{ ceph_mirror }}/debian-{{ ceph_stable_release }}"
upgrade_ceph_packages: True

##################
## CONFIGURATION #
##################
cluster: ceph
ceph_conf_key_directory: /etc/ceph
fetch_directory: fetch/
ntp_service_enabled: true
configure_firewall: true
monitor_interface: ens18

##################
##   INVENTORY   #
##################
mon_group_name: mons
osd_group_name: osds
mgr_group_name: mgrs
rgw_group_name: rgws
nfs_group_name: nfss

The osds.yaml file:

In this file, we'll define settings for our storage disks.

First of all, disable OSD auto-discovery, since it automatically takes all available disks for OSDs. In a large cluster that might be a useful option, but in our small cluster we prefer to define the disks manually, as in the example below. To verify which disks are available, just log in via SSH to your storage nodes and check them with the fdisk -l command.

There are different OSD scenarios available, but it's worth mentioning that Ceph is moving away from the collocated and non-collocated modes. So, if you use one of these modes, you won't be able to add new disks to the cluster with the add-osd.yaml playbook later, since it will throw an error because of the GPT table. In that case you'd have to convert your non-LVM disks manually by following the guide: Adding/Removing OSDs — Ceph Documentation.

Also, Ceph recommends the XFS file system in production, but you're free to use the one you want. Refer to the official docs if in doubt: Hard Disk and File System Recommendations — Ceph Documentation. That's all for a minimal (but good) OSD configuration.

---
osd_auto_discovery: false
devices:
  - '/dev/sdc'
  - '/dev/sdd'
  - '/dev/sde'
osd_scenario: lvm
osd_mkfs_type: xfs
osd_objectstore: bluestore
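
Note that the devices list above applies to every host in the [osds] group. If your storage nodes don't all have the same disk layout, plain Ansible host_vars can override the list per host -- a minimal sketch (the file name below is hypothetical and has to match the inventory host name):

# host_vars/ceph-osd1.yourdomain.com.yml
---
devices:
  - '/dev/sdb'
  - '/dev/sdc'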

The mgrs.yaml file:

There's not much to the MGR configuration; basically, we just have to define which modules should be enabled after the installation:

---
ceph_mgr_modules:
  - status
  - dashboard

The rgws.yaml file:

The same goes for RGWs: there is not much to configure. You have to set radosgw_interface to the name of the interface on your system. Also, you can set the email address and the DNS name for your gateway.

For the frontend server, you can use either civetweb or beast. There's no need to configure anything; Ceph will automatically set up beast for you (it's also what the Ceph team currently recommends).

Just in case: you can also configure everything about your RGW later.

In the end, there's just one line required in the rgws.yaml file:

---
radosgw_interface: bond0

Well, that's all for the configuration. Now, let's take a look into the last part: the playbook.

Step 6: The playbook

As I wrote before, this file is already in the right place, but has to be renamed:

~/ceph-ansible$ mv site.yml.sample site.yaml

Now, since we're not going to use every Ceph service, it's important to clean up some things inside the playbook. First, remove all the groups we don't use from the first hosts section. It should only contain the groups defined in our inventory (the same ones mapped in the all.yaml file):

- hosts:
  - mons
  - osds
  - rgws
  - mgrs

Also, go through the file and remove the - hosts plays for the daemons we don't deploy. In the end, your site.yaml file should look like this:

---
- hosts:
  - mons
  - osds
  - mgrs
  - rgws
  - nfss

  gather_facts: false
  any_errors_fatal: true
  become: true

  tags:
    - always

  vars:
    delegate_facts_host: True

  pre_tasks:
    - name: check for python2
      stat:
        path: /usr/bin/python
      ignore_errors: yes
      register: systempython2

    - name: install python2 for debian based systems
      raw: sudo apt-get -y install python-simplejson
      ignore_errors: yes
      when:
        - systempython2.stat is undefined or systempython2.stat.exists == false

    - name: install python2 for fedora
      raw: sudo dnf -y install python creates=/usr/bin/python
      ignore_errors: yes
      when:
        - systempython2.stat is undefined or systempython2.stat.exists == false

    - name: install python2 for opensuse
      raw: sudo zypper -n install python-base creates=/usr/bin/python2.7
      ignore_errors: yes
      when:
        - systempython2.stat is undefined or systempython2.stat.exists == false

    - name: gather facts
      setup:
      when:
        - not delegate_facts_host | bool

    - name: gather and delegate facts
      setup:
      delegate_to: "{{ item }}"
      delegate_facts: True
      with_items: "{{ groups['all'] }}"
      run_once: true
      when:
        - delegate_facts_host | bool

    - name: install required packages for fedora > 23
      raw: sudo dnf -y install python2-dnf libselinux-python ntp
      when:
        - ansible_distribution == 'Fedora'
        - ansible_distribution_major_version|int >= 23

  roles:
    - ceph-defaults
    - ceph-validate
    - ceph-infra

- hosts: mons
  gather_facts: false
  become: True
  pre_tasks:
    - name: set ceph monitor install 'In Progress'
      run_once: true
      set_stats:
        data:
          installer_phase_ceph_mon:
            status: "In Progress"
            start: "{{ lookup('pipe', 'date +%Y%m%d%H%M%SZ') }}"
  roles:
    - role: ceph-defaults
      tags: ['ceph_update_config']
    - role: ceph-handler
    - role: ceph-common
    - role: ceph-config
      tags: ['ceph_update_config']
    - role: ceph-mon
  post_tasks:
    - name: set ceph monitor install 'Complete'
      run_once: true
      set_stats:
        data:
          installer_phase_ceph_mon:
            status: "Complete"
            end: "{{ lookup('pipe', 'date +%Y%m%d%H%M%SZ') }}"

- hosts: mgrs
  gather_facts: false
  become: True
  pre_tasks:
    - name: set ceph manager install 'In Progress'
      run_once: true
      set_stats:
        data:
          installer_phase_ceph_mgr:
            status: "In Progress"
            start: "{{ lookup('pipe', 'date +%Y%m%d%H%M%SZ') }}"
  roles:
    - role: ceph-defaults
      tags: ['ceph_update_config']
    - role: ceph-handler
    - role: ceph-common
    - role: ceph-config
      tags: ['ceph_update_config']
    - role: ceph-mgr
  post_tasks:
    - name: set ceph manager install 'Complete'
      run_once: true
      set_stats:
        data:
          installer_phase_ceph_mgr:
            status: "Complete"
            end: "{{ lookup('pipe', 'date +%Y%m%d%H%M%SZ') }}"

- hosts: osds
  gather_facts: false
  become: True
  pre_tasks:
    - name: set ceph osd install 'In Progress'
      run_once: true
      set_stats:
        data:
          installer_phase_ceph_osd:
            status: "In Progress"
            start: "{{ lookup('pipe', 'date +%Y%m%d%H%M%SZ') }}"
  roles:
    - role: ceph-defaults
      tags: ['ceph_update_config']
    - role: ceph-handler
    - role: ceph-common
    - role: ceph-config
      tags: ['ceph_update_config']
    - role: ceph-osd
  post_tasks:
    - name: set ceph osd install 'Complete'
      run_once: true
      set_stats:
        data:
          installer_phase_ceph_osd:
            status: "Complete"
            end: "{{ lookup('pipe', 'date +%Y%m%d%H%M%SZ') }}"

- hosts: rgws
  gather_facts: false
  become: True
  pre_tasks:
    - name: set ceph rgw install 'In Progress'
      run_once: true
      set_stats:
        data:
          installer_phase_ceph_rgw:
            status: "In Progress"
            start: "{{ lookup('pipe', 'date +%Y%m%d%H%M%SZ') }}"
  roles:
    - role: ceph-defaults
      tags: ['ceph_update_config']
    - role: ceph-handler
    - role: ceph-common
    - role: ceph-config
      tags: ['ceph_update_config']
    - role: ceph-rgw
  post_tasks:
    - name: set ceph rgw install 'Complete'
      run_once: true
      set_stats:
        data:
          installer_phase_ceph_rgw:
            status: "Complete"
            end: "{{ lookup('pipe', 'date +%Y%m%d%H%M%SZ') }}"

- hosts: nfss
  gather_facts: false
  become: True
  pre_tasks:
    - name: set ceph nfs install 'In Progress'
      run_once: true
      set_stats:
        data:
          installer_phase_ceph_nfs:
            status: "In Progress"
            start: "{{ lookup('pipe', 'date +%Y%m%d%H%M%SZ') }}"
  roles:
    - role: ceph-defaults
      tags: ['ceph_update_config']
    - role: ceph-handler
    - role: ceph-common
    - role: ceph-config
      tags: ['ceph_update_config']
    - role: ceph-nfs
  post_tasks:
    - name: set ceph nfs install 'Complete'
      run_once: true
      set_stats:
        data:
          installer_phase_ceph_nfs:
            status: "Complete"
            end: "{{ lookup('pipe', 'date +%Y%m%d%H%M%SZ') }}"

- hosts: mons
  gather_facts: false
  become: True
  tasks:
    - name: get ceph status from the first monitor
      command: ceph --cluster {{ cluster | default ('ceph') }} -s
      register: ceph_status
      changed_when: false
      delegate_to: "{{ groups['mons'][0] }}"
      run_once: true
      ignore_errors: true # we skip the error if mon_group_name is different than 'mons'

    - name: "show ceph status for cluster {{ cluster | default ('ceph') }}"
      debug:
        msg: "{{ ceph_status.stdout_lines }}"
      delegate_to: "{{ groups['mons'][0] }}"
      run_once: true
      when: not ceph_status.failed

And that's it, all configuration is done.

Step 7: Deployment

Now run the command below with the appropriate user and get yourself a coffee. Depending on your network speed and the available resources on the Ceph nodes, it might take between 15 and 40 minutes to get everything deployed.

ansible-playbook -i inventory site.yaml -e "ansible_user=ansiblemaintenance"

In some cases, e.g. because of a network failure or other reasons, the deployment might fail. Obviously, if there are errors like "[...] unreachable", you might want to take a closer look at why that happens. One possible reason could be that your deployment user isn't able to log in to the hosts or run commands with sudo. Typos in configuration files or playbooks will also cause the deployment to fail. After fixing the problem you can just re-run the command above to repeat the deployment.
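
Typos in the playbook itself can be caught before the run with a cheap pre-flight check, and if only a subset of hosts failed, ansible-playbook's --limit flag restricts the re-run to them (a sketch using our group names; when in doubt, a full re-run is the safer option):

ansible-playbook -i inventory site.yaml --syntax-check
ansible-playbook -i inventory site.yaml -e "ansible_user=ansiblemaintenance" --limit osds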


After the deployment, you'll get a health report and a play recap. If everything looks green - you did great and your cluster is running.

TASK [show ceph status for cluster ceph] *****************************************************************************************
Thursday 15 August 2019  20:47:10 +0100 (0:00:01.304)       0:09:54.657 *******
ok: [ceph1.yourdomain.com -> ceph1.yourdomain.com] => {
    "msg": [
        "  cluster:",
        "    id:     7d903b20-0d8d-4dc8-saf7-fxxxxxxg0f18",
        "    health: HEALTH_OK",
        " ",
        "  services:",
        "    mon:     3 daemons, quorum ceph1,ceph2,ceph3",
        "    mgr:     ceph1(active), standbys: ceph2, ceph3",
        "    osd:     12 osds: 12 up, 12 in",
        "    rgw:     3 daemons active",
        " ",
        "  data:",
        "    pools:   5 pools, 416 pgs",
        "    objects: 196  objects, 3.6 KiB",
        "    usage:   12 GiB used, 10 TiB / 10 TiB avail",
        "    pgs:     416 active+clean",
        " ",
        "  io:",
        "    client:   138 KiB/s rd, 84 B/s wr, 141 op/s rd, 86 op/s wr",
        " "
    ]
}

To verify, log in via SSH to one of the nodes and run ceph -s as root -- you'll get the same output as above.
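
You can also poke one of the Rados Gateways directly: an anonymous request against the frontend port (8080 in this setup, as used in the dashboard configuration below, unless you changed radosgw_frontend_port) should return a small ListAllMyBuckets XML document:

curl -s http://ceph1.yourdomain.com:8080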

Step 8: Enabling dashboard

Right now you can only see that your cluster is running from your terminal. With some basic commands, we can also set up the Ceph dashboard. It's pretty useful, since with the Nautilus release it became much more powerful: you're not only able to see things, but also to configure them directly in the dashboard. You can see the whole feature overview here: Ceph Dashboard — Ceph Documentation

So, what we have to do is enable the dashboard (it probably already is) and configure it. You can read about the configuration in the link above or just follow these steps (all commands have to be run as root on one of the three nodes). A quick way to find the dashboard URL afterwards is shown after the list.

  • Enable it:
ceph mgr module enable dashboard
  • Create a self-signed certificate and enable SSL in the configuration
ceph dashboard create-self-signed-cert
ceph config set mgr mgr/dashboard/ssl true
  • Set the address and port. Since every manager node runs its own instance of the dashboard, I want to be able to log in to the dashboard even if one node fails. You can set your own port for it.
ceph config set mgr mgr/dashboard/ceph1.example.com/server_addr 10.1.100.10
ceph config set mgr mgr/dashboard/ceph1.example.com/server_port 8443
ceph config set mgr mgr/dashboard/ceph1.example.com/ssl_server_port 8443
ceph config set mgr mgr/dashboard/ceph2.example.com/server_addr 10.1.100.11
ceph config set mgr mgr/dashboard/ceph2.example.com/server_port 8443
ceph config set mgr mgr/dashboard/ceph2.example.com/ssl_server_port 8443
ceph config set mgr mgr/dashboard/ceph3.example.com/server_addr 10.1.100.12
ceph config set mgr mgr/dashboard/ceph3.example.com/server_port 8443
ceph config set mgr mgr/dashboard/ceph3.example.com/ssl_server_port 8443
  • Create a dashboard admin user
ceph dashboard ac-user-create <username> <password> administrator
  • Create a Rados Gateway user
radosgw-admin user create --uid=<user_id> --display-name=<display_name> --system
ceph dashboard set-rgw-api-access-key <access_key>
ceph dashboard set-rgw-api-secret-key <secret_key>
  • Configure the RGW-API network
ceph dashboard set-rgw-api-host 10.1.100.10
ceph dashboard set-rgw-api-port 8080
ceph dashboard set-rgw-api-host 10.1.100.11
ceph dashboard set-rgw-api-port 8080
ceph dashboard set-rgw-api-host 10.1.100.12
ceph dashboard set-rgw-api-port 8080
  • Disable SSL verify for object gateway to avoid refused connections (for self-signed certificate only)
ceph dashboard set-rgw-api-ssl-verify False
  • Restart the dashboard module
ceph mgr module disable dashboard
ceph mgr module enable dashboard
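
Before opening the browser, you can ask the cluster which URL the active manager is actually serving the dashboard on:

ceph mgr services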

Here we go, now navigate to one of your nodes in the browser (https://10.1.100.10:8443), log in and enjoy your dashboard.

Step 9: Important setting for high availability

That's all for the setup, except for one thing that cost us some time to figure out: if you power off one (or two) of your nodes, the cluster in its current state will automatically recover. But while one of your nodes isn't available (due to a failure or planned maintenance), you won't be able to perform any I/O on the pools. The reason is that Ceph sets the min_size for the pools to 3 as a default value here. This value describes the minimum number of available replicas required for I/O. In larger clusters (at least in clusters with more than 3 nodes) it's not a problem, but with just three nodes it is. So, in our case, we want to be able to perform I/O operations even if only one node is available. What you need to do is simply set the min_size for all your pools (a loop that covers all pools at once is shown after the list):

  1. List your pools (or just get them from the dashboard)
# ceph osd lspools
1 .rgw.root
3 default.rgw.control
4 default.rgw.meta
11 default.rgw.log
12 default.rgw.buckets.index
13 default.rgw.buckets.data
  2. Set the min_size for every pool
ceph osd pool set default.rgw.control min_size 1
ceph osd pool set default.rgw.meta min_size 1
ceph osd pool set default.rgw.log min_size 1
ceph osd pool set default.rgw.buckets.index min_size 1
ceph osd pool set default.rgw.buckets.data min_size 1
ceph osd pool set <pool name> min_size 1
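
If you have more pools or don't feel like typing each name, a small loop over the pool list does the same thing (a sketch; double-check the output of ceph osd lspools first):

for pool in $(ceph osd lspools | awk '{print $2}'); do
  ceph osd pool set "$pool" min_size 1
done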

Addition: NFS over RGW

Ceph provides the functionality to use NFS over the Rados Gateway. It's a pretty nice feature if you're still using NFS in your environment. This part cost me the most time of the whole Ceph setup, for various reasons. On Ubuntu it's pretty easy to set up, but since our original setup was based on Debian (which we use as our primary server OS in our infrastructure), it was a bit harder to get it to work: missing repositories, incompatible repositories, unsigned packages (missing certificates) and some other things. But NFS isn't a primary Ceph module, so there's no point in complaining about it. Apart from that, the NFS setup is pretty easy.

In the inventory, just add the [nfss] group and also add it to the [all:children] group:

[nfss]
ceph1.example.com
ceph2.example.com
ceph3.example.com
[all:children]
nfss

In the group_vars, add the NFS packages as well as some other settings:

nfs_ganesha_stable: true
nfs_ganesha_stable_branch: V2.7-stable
nfs_ganesha_stable_deb_repo: "[trusted=yes] https://chacra.ceph.com/r/nfs-ganesha-stable/V2.7-stable/2356c3867730696aacc31874357b3499062fc902/ubuntu/bionic/flavors/ceph_nautilus"
ceph_nfs_log_file: "/var/log/ganesha/ganesha.log"

We want to use the stable packages of the latest release (2.7) and write log files. Currently there are no signed packages of the NFS modules for the Nautilus version of Ceph on Ubuntu Bionic, so a simple workaround is to use the development packages. Check the repository before you deploy; maybe signed packages are already available: Index of /nfs-ganesha/deb-V2.7-stable/nautilus/dists/.

Also, create a file in the group_vars folder called nfss.yaml (or rename the existing nfss.yml.sample) and configure the NFS module there. The settings are mostly self-explanatory.

---

####################
##### GENERAL ######
####################

ceph_nfs_enable_service: true
nfs_obj_gw: true
nfs_file_gw: false # IMPORTANT!
ceph_nfs_dynamic_exports: true # must be set true for failover!
ceph_nfs_rados_backend: true
ceph_nfs_rados_export_index: "ganesha-export-index"

###################
# FSAL RGW Config #
###################
ceph_nfs_rgw_export_id: 20134
ceph_nfs_rgw_pseudo_path: "/cephobject"
ceph_nfs_rgw_protocols: "3,4"
ceph_nfs_rgw_access_type: "RW"
ceph_nfs_rgw_user: "cephnfs"
ceph_nfs_rgw_squash: "Root_Squash"
rgw_client_name: client.rgw.{{ ansible_hostname }}

And finally, make sure the corresponding play is in your site.yaml playbook (if you kept the nfss play from the full example above, it's already there):

- hosts: nfss
  gather_facts: false
  become: True
  pre_tasks:
    - name: set ceph nfs install 'In Progress'
      run_once: true
      set_stats:
        data:
          installer_phase_ceph_nfs:
            status: "In Progress"
            start: "{{ lookup('pipe', 'date +%Y%m%d%H%M%SZ') }}"
  roles:
    - role: ceph-defaults
      tags: ['ceph_update_config']
    - role: ceph-handler
    - role: ceph-common
    - role: ceph-config
      tags: ['ceph_update_config']
    - role: ceph-nfs
  post_tasks:
    - name: set ceph nfs install 'Complete'
      run_once: true
      set_stats:
        data:
          installer_phase_ceph_nfs:
            status: "Complete"
            end: "{{ lookup('pipe', 'date +%Y%m%d%H%M%SZ') }}"

That's it. If your cluster is already running, you can just re-deploy it with new settings for the NFS module.
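
Once the playbook has run, you can test the export from any NFS client. A hedged example, assuming the pseudo path /cephobject from nfss.yaml above and one of the [nfss] hosts as the server:

sudo mkdir -p /mnt/cephnfs
sudo mount -t nfs -o vers=4.1 ceph1.example.com:/cephobject /mnt/cephnfs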