Introducing Phoenix-CI as a low-cost Gitlab CI alternative to K8s

We have used Gitlab as our favorite tool for all Git-related operations for years now and have watched it grow in features and simplify our workflows. When we started with Gitlab, Jenkins was still our CI tool of choice, but as soon as Gitlab got better at CI, we switched over to gitlab-runners on bare-metal root servers from Hetzner.

Over the following months and years we hit performance walls, since we only used a single root server for CI with two gitlab-runner instances for Docker and Shell environment CI jobs. Docker jobs running at the same time resulted in weird errors, and the CI itself was much slower when running multiple jobs at once.

At the beginning of 2019 I had an idea to improve the situation. Back then there was no good alternative, and running the CI on Kubernetes was not as mature as it is today.

So I invented our own low-cost Kubernetes alternative for our Gitlab CI. Phoenix-CI was born. We have [open-sourced it on Github](https://github.com/viafintech/phoenix-ci) under the MIT license.

So how does it actually work?

Well, sit down and get a tea. I had to put puzzle pieces together to make it work.

There was no support for Phoenix-CI in Gitlab, obviously, so I had to imagine what my final result should look like and find a way to make it work.

The idea was to use Hetzner Cloud servers instead of static root servers, so we can scale our gitlab-runner instances when we need more power. So far, so easy. I decided against the usual suspects, namely AWS, GCP and Azure, for several reasons. We test our internal source code on the CI, so I wanted it tested within German borders and ideally on servers from a German company, due to regulatory and legal concerns. The second reason was - and still is - that the Hetzner Cloud is way cheaper than the large players while still offering decent power and very good network connectivity. So this sounded perfect for us.

I don't want to bore you with the details of how I found the solution to make all that work with Gitlab, so here is how Phoenix-CI does its job.

Phoenix-CI is basically just a Python script plus some logic packed into cloud-init config files, the .gitlab-ci.yml and Gitlab CI Schedules.

The Python script is told how many runners of a specific kind you would like to have and scales up or down accordingly. It also takes the cloud-init config, which configures the freshly spawned Hetzner Cloud instance and makes it connect to our Gitlab CI, so it can be used as a shared runner by CI jobs.

Phoenix-CI currently uses Gitlab's CI Schedules to scale instances up or down at a specific time, e.g. scale up new instances in the morning on working days and scale them down again in the evening to reduce costs. This is not yet real auto-scaling based on load, but at least you scale according to business hours. You already save a lot of money when you only have one or two instances at night or on weekends and many more during business hours, when you really need them.

Phoenix-CI uses the Hetzner Cloud API to spawn new instances and configure them the way you need them. When you no longer need them, it unregisters them from Gitlab and removes the instances. For the latter, the cloud-init config also adds the Phoenix-CI Coordinator's SSH public key, so the Coordinator can connect to the machine and unregister the gitlab-runner before the instance is deleted.
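To make this a bit more tangible, here is a minimal sketch of what spawning one runner could look like. It is not the actual Phoenix-CI code. The function names, the image name ubuntu-22.04, the alpine:latest default image and the assumption that gitlab-runner and Docker are already installed on the instance are all mine. The endpoint and the name, server_type, image and user_data fields come from the Hetzner Cloud API. The sketch only builds the request and does not send anything.

```python
import json
import urllib.request

# Hetzner Cloud endpoint for creating servers.
HCLOUD_SERVERS_URL = "https://api.hetzner.cloud/v1/servers"


def render_cloud_init(coordinator_pubkey, gitlab_url, registration_token):
    """Build a cloud-config document for a fresh Docker runner (illustrative).

    Assumes gitlab-runner and Docker are already present on the instance.
    """
    register_cmd = [
        "gitlab-runner", "register", "--non-interactive",
        "--url", gitlab_url,
        "--registration-token", registration_token,
        "--executor", "docker",
        "--docker-image", "alpine:latest",  # assumed default job image
    ]
    return "\n".join([
        "#cloud-config",
        # Lets the Coordinator log in later to unregister the runner.
        "ssh_authorized_keys:",
        "  - " + json.dumps(coordinator_pubkey),
        # JSON lists are valid YAML, so the command is passed as an argv list.
        "runcmd:",
        "  - " + json.dumps(register_cmd),
        "",
    ])


def build_create_request(api_token, name, server_type, user_data,
                         image="ubuntu-22.04"):  # image name is an assumption
    """Return an unsent POST request that would create one instance."""
    payload = {
        "name": name,
        "server_type": server_type,
        "image": image,
        "user_data": user_data,
    }
    return urllib.request.Request(
        HCLOUD_SERVERS_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Passing the returned request to urllib.request.urlopen would actually create the server, so try this with a throwaway Hetzner Cloud project first.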

Since you can define as many Schedules as you want, you can really customize how Phoenix-CI should behave for you.

Compare the costs!

Back in 2019, we used a Hetzner EX41S-SSD root server with a Core i7-6700, 64GB of RAM and 512GB SSD storage with 1Gbit/s network bandwidth. We had the concurrent CI jobs limit set to 2, otherwise every new job would be too slow. This machine cost us about 54€ per month.

When we migrated over to Hetzner Cloud we started with 9 CX21 cloud instances, costing 47,52€ per month when running 24/7. But we only had them running during working days from 7am till 8pm. We reduced the running instances during the night and weekends to only 2 in case we needed a CI machine for an emergency run.

This reduced our costs dramatically to ~30€ per month, saving us roughly 45% per month compared to the root server. At the same time we were able to handle much more load and had far fewer errors, since each instance only handled one job at a time and no resources were shared within an instance. We also completely removed the operational costs of maintaining the root server.
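If you want to run the numbers for your own setup, the arithmetic can be sketched like this. The cost model is my own simplification, not our actual invoice: it assumes a flat price per instance that scales linearly with running hours, a base of 2 instances around the clock and the remaining 7 only from 7am to 8pm on working days. All function and variable names are made up for this example.

```python
HOURS_PER_WEEK = 7 * 24


def monthly_cost(price_per_instance, base_instances, peak_instances,
                 peak_hours_per_week):
    """Estimate the monthly cost when extra instances only run at peak times."""
    extra_instances = peak_instances - base_instances
    peak_fraction = peak_hours_per_week / HOURS_PER_WEEK
    return price_per_instance * (base_instances + extra_instances * peak_fraction)


def savings_percent(old_cost, new_cost):
    """Relative savings of new_cost compared to old_cost, in percent."""
    return (old_cost - new_cost) / old_cost * 100


# 9 instances cost 47.52 EUR per month when running 24/7.
price_per_instance = 47.52 / 9

# 13 hours (7am to 8pm) on 5 working days.
estimate = monthly_cost(price_per_instance, base_instances=2,
                        peak_instances=9, peak_hours_per_week=13 * 5)

print(f"estimated cloud cost: {estimate:.2f} EUR per month")
print(f"savings at 30 EUR vs. 54 EUR: {savings_percent(54, 30):.1f}%")
```

The simplified model lands at about 25€ per month, a bit below the ~30€ we actually paid, so treat it as a rough estimate and not as a reconstruction of the real bill. The ~45% figure follows directly from the 54€ and ~30€ numbers.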

We also calculated the costs of using Kubernetes on AWS for Gitlab CI. To make it short: it's more expensive. And why should you pay additional operational costs for maintaining K8s when you don't need to and can keep the CI simple!?

Sounds nice? Set up Phoenix-CI now!

Phoenix-CI just needs an already available gitlab-runner with a Shell executor to run its code. For that reason, simply [install a gitlab-runner](https://docs.gitlab.com/runner/install/) on the Gitlab host and [register it in Gitlab](https://docs.gitlab.com/runner/register/). Make sure that Python 3 is installed there and that the python3-virtualenv and virtualenv packages are available.

Phoenix-CI Coordinator

After that, please log in as the gitlab-runner user on the server and generate an ed25519 SSH keypair for Phoenix-CI. As this user, simply run ssh-keygen -t ed25519 and press enter when asked for a location or a password for the private key. When done, copy the contents of the file ~/.ssh/id_ed25519.pub for a later step. If your first runner runs on Debian, make sure to delete the .bash_logout of the gitlab-runner user in case you see a Profile load error in your first run later.

Now clone the [Phoenix-CI repository](https://github.com/viafintech/phoenix-ci) down to your own Gitlab instance. You can also mirror the code if you want to stay up to date.

Now go into the project settings -> CI/CD -> Runners and disable both Shared Runners and Group Runners for this project. Next, head to the Admin UI by clicking the wrench icon in the top menu and then click on Runners in the left menu. In there, edit the Phoenix-CI Coordinator runner and check the box that allows this runner to run untagged jobs. Further down, find the cloned Phoenix-CI project and restrict this runner to only run jobs from this project.

[Screenshot: Limit runner to project]

Now head back to the Phoenix-CI project and go into Settings -> CI/CD -> Variables this time.
In there create 4 new variables as stated in the screenshot. If you need more explanations for the variables please check out the [README.md](https://github.com/viafintech/phoenix-ci/blob/master/README.md) of the Phoenix-CI project.

The CI_MASTER_SSHKEY is the ed25519 public key that you created and copied in the first step. In CI_REGISTRATION_TOKEN and CI_REGISTRATION_URL you put in the details you can find in the Gitlab Admin area under Runners. The HCLOUD_TOKEN is the API token you generated in your Hetzner Cloud Phoenix-CI project. Please also mask the 2 token variables, so they won't show up in the logs of this project's CI runs when scaling the instances.
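Gitlab exposes CI/CD variables to the job as environment variables. A script could check them up front roughly like this. This is only a sketch and not the actual Phoenix-CI code. The function name load_settings and the checks themselves are my own assumptions; only the four variable names come from the setup above.

```python
import os

REQUIRED_VARIABLES = (
    "CI_MASTER_SSHKEY",
    "CI_REGISTRATION_TOKEN",
    "CI_REGISTRATION_URL",
    "HCLOUD_TOKEN",
)


def load_settings(environ=None):
    """Collect the four project variables and fail early if one is missing."""
    environ = os.environ if environ is None else environ
    missing = [name for name in REQUIRED_VARIABLES
               if not environ.get(name, "").strip()]
    if missing:
        raise RuntimeError("Missing CI/CD variables: " + ", ".join(missing))

    settings = {name: environ[name].strip() for name in REQUIRED_VARIABLES}

    # The public key, not the private key, belongs into CI_MASTER_SSHKEY.
    if not settings["CI_MASTER_SSHKEY"].startswith("ssh-ed25519 "):
        raise RuntimeError("CI_MASTER_SSHKEY does not look like an ed25519 public key")
    return settings
```

Failing early with a clear message is nicer than a half-configured instance that never shows up in Gitlab.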
You can find more details on how to generate the Hetzner Cloud API token [here](https://docs.hetzner.cloud/#overview-getting-started).

With all that, you are done with the basic setup of Phoenix-CI.
In the next step we only define when and how many runner instances you want to scale up or down.

Define Phoenix-CI Schedules

Since the basic Phoenix-CI setup is done, let's go into the Phoenix-CI project under CI/CD -> Schedules and define a new schedule there. Give it a name like Phoenix-CI Scale-up 7am business days.
Choose a custom interval pattern and put in something like 0 7 * * 1-5, which means 7am from Monday to Friday. Also define the correct timezone for your location. Then add the variables CI_DOCKER_RUNNER and CI_SHELL_RUNNER, so Phoenix-CI knows how many runners of which type it has to scale up.
If you don't need shell runners, simply set that variable's value to 0. Also add a variable called CI_RUN with the value 1. This tells our .gitlab-ci.yml to not run the tests but to actually do some scaling.

At this stage you can also override the Phoenix-CI default Hetzner Cloud instance it would spawn - the CX21 - with the CI_SERVER_TYPE variable.
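The way these schedule variables fit together can be sketched as follows. Again, this is an illustration and not the actual Phoenix-CI code. The function name, the returned dictionary and the lowercase cx21 identifier as fallback are my assumptions.

```python
DEFAULT_SERVER_TYPE = "cx21"  # assumed API identifier of the CX21 default


def parse_schedule(environ):
    """Turn the schedule variables into a scaling request.

    Returns None when CI_RUN is not set to 1, i.e. when the pipeline
    should run the normal tests instead of scaling.
    """
    if environ.get("CI_RUN") != "1":
        return None

    def runner_count(name):
        value = int(environ.get(name, "0"))
        if value < 0:
            raise ValueError(f"{name} must not be negative")
        return value

    return {
        "docker": runner_count("CI_DOCKER_RUNNER"),
        "shell": runner_count("CI_SHELL_RUNNER"),
        "server_type": environ.get("CI_SERVER_TYPE", DEFAULT_SERVER_TYPE),
    }
```

For the morning schedule from above, parse_schedule({"CI_RUN": "1", "CI_DOCKER_RUNNER": "3", "CI_SHELL_RUNNER": "0"}) would request 3 Docker runners and no shell runners on the default server type.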

That's it already for the first schedule!

Create additional schedules for more flexibility

As said before, you can now fully customize how many instances you want to have at certain times. Create additional schedules as you like, using the same parameters. Phoenix-CI always takes the desired number of workers that it should have "ready" to run CI jobs. If you have scaled up 5 servers in the morning and you tell Phoenix-CI to scale to 2 in the evening, it will remove the 3 oldest instances - sorted by creation date.
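The "desired number" logic can be sketched in a few lines. This is my own minimal illustration of the behaviour described above, not the actual Phoenix-CI code. The Instance class and the plan_scaling function are hypothetical names.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Instance:
    name: str
    created: datetime


def plan_scaling(instances, desired):
    """Compare running instances with the desired number.

    Returns (number_to_create, instances_to_remove). When scaling down,
    the oldest instances by creation date are picked for removal.
    """
    if desired < 0:
        raise ValueError("desired must not be negative")
    surplus = len(instances) - desired
    if surplus <= 0:
        return -surplus, []
    oldest_first = sorted(instances, key=lambda instance: instance.created)
    return 0, oldest_first[:surplus]
```

With 5 running instances and a desired number of 2, plan_scaling returns nothing to create and the 3 oldest instances to remove. With a desired number of 0 it returns all of them.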

To remove all instances you can simply tell Phoenix-CI to scale to 0. Like this.

[Screenshot: cleanup]

Scale them up!

Now it's time to scale up your runners! Don't wait for the schedule to run automatically. Click the "play" button and see Phoenix-CI in action!

[Screenshot: first run]

Bam! The first 3 instances have been created. Once the job is done, the instances receive their configuration through the [cloud-init config](https://cloudinit.readthedocs.io/en/latest/topics/examples.html). It will now take just a couple of minutes (usually only 1 to 2) until they are ready in Gitlab. They will then look like this in the Runner overview in Gitlab's Admin area.

My first job!

Of course, after all that you want to see your new Phoenix-CI in action, don't you?
Just create a new empty project and add a .gitlab-ci.yml with the following contents.

[Screenshot: examplejob]

If you don't have any other runners, it will pick a docker runner and hopefully show you that Phoenix-CI rocks!

Conclusion

We've been using Phoenix-CI since about May 2019, very successfully and with great stability. We have seldom had issues with it, and when we did, they came from Docker changing settings or from bugs within the gitlab-runner. Phoenix-CI and the whole scaling process are pretty rock-solid and reliable.

One other advantage is that by default, when you scale up new instances, these will automatically use the latest stable gitlab-runner version available, so you always have the most bug-free version in place.

By open-sourcing Phoenix-CI we want to give something back to the community and enable everyone to have a decent CI without paying too much for it. If you like this project, feel free to contribute.

The sensitive information that is shown in the pictures is of course only from a demo setup I've created for this blogpost. Always keep your own credentials safe!

Bonus

If you have read this far, I will tell you why it's named "Phoenix". When we rewrote large parts of our internal software in 2013/2014, we used the term "Phoenix" to differentiate between our old "Core" software and the new one. Likewise, "Phoenix-CI" got its name to show that it is our next-gen CI solution.