This article shows you how to install and use UFW on your Ubuntu 20.04 LTS system.
UFW comes pre-installed on most Ubuntu systems. If your build does not have the program already installed, you can install it using either the snap or the apt package manager:
$ sudo snap install ufw
I personally prefer using the apt package manager for this, because snap is less popular and I do not want the extra complexity. At the time of this writing, the UFW version packaged for the 20.04 release is 0.36.
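With apt, the installation is a one-liner:
$ sudo apt update && sudo apt install ufw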
If you are a beginner in the world of networking, the first thing you need to clarify is the difference between incoming and outgoing traffic.
When you install updates using apt-get, browse the internet, or check your email, what you are doing is sending “outgoing” requests to servers, such as Ubuntu, Google, etc. To access these services, you do not even need a public IP. Usually, a single public IP address is allocated for, say, a home broadband connection, and every device gets its own private IP. The router then handles the traffic using something known as NAT, or Network Address Translation.
The details of NAT and private IP addresses are beyond the scope of this article, but the video linked above is an excellent starting point. Coming back to UFW: by default, UFW allows all regular outgoing web traffic. Your browsers, package managers, and other programs pick a random high-numbered ephemeral port (on Linux, typically in the 32768-60999 range), and that is how each application keeps track of its connection(s).
When you are running servers in the cloud, they usually come with a public IP address and the above rules of allowing outgoing traffic still hold. Because you will still use utilities, like package managers, that talk to the rest of the world as a ‘client,’ UFW allows this by default.
The fun begins with incoming traffic. Applications, like the OpenSSH server that you use to login to your VM, listen on specific ports (like 22) for incoming requests, as do other applications. Web servers need access to ports 80 and 443.
It is part of a firewall's job to let specific applications listen for certain incoming traffic while blocking everything unnecessary. You may have a database server installed on your VM, but it usually does not need to listen for incoming requests on the interface with a public IP. Usually, it just listens on the loopback interface for requests.
There are many bots out on the Web that constantly bombard servers with bogus requests to brute force their way in, or to carry out a simple denial-of-service attack. A well-configured firewall, paired with companion tools like Fail2ban, can block most of these shenanigans.
But, for now, we will focus on a very basic setup.
Now that you have UFW installed on your system, we will look at some basic uses for this program. Since firewall rules are applied system-wide, the below commands are run as the root user. If you prefer, you can use sudo with proper privileges for this procedure.
By default, UFW is in an inactive state, which is a good thing. You do not want to block all incoming traffic on port 22, which is the default SSH port. If you are logged into a remote server via SSH and you block port 22, you will be locked out of the server.
UFW makes it easy for us to poke a hole just for OpenSSH. Run the below command:
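The usual way is via UFW's OpenSSH application profile:
$ sudo ufw allow OpenSSH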
Notice that I have still not enabled the firewall. With OpenSSH added to the list of allowed apps, we can now enable the firewall by entering the following command:
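$ sudo ufw enable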
The command may disrupt existing SSH connections. Proceed with operation (y|n)? y.
The firewall is now active and enabled on system startup.
Congratulations, UFW is now active and running. UFW now allows only OpenSSH to listen for incoming requests on port 22. To check the status of your firewall at any time, run the following command:
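$ sudo ufw status
The output will look something along these lines:
Status: active

To                         Action      From
--                         ------      ----
OpenSSH                    ALLOW       Anywhere
OpenSSH (v6)               ALLOW       Anywhere (v6)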
As you can see, OpenSSH can now receive requests from anywhere on the Internet, provided they reach the server on port 22. The (v6) line indicates that the same rule is applied for IPv6 as well.
You can, of course, block particular IP ranges, or allow only a particular range of IPs, depending on the security constraints you are working within.
For many popular applications, the ufw app list command automatically picks up new application profiles as the corresponding packages are installed. For example, upon installation of the Nginx web server, you will see the following new options appear:
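On Ubuntu, the Nginx package typically ships profiles along these lines:
  Nginx Full
  Nginx HTTP
  Nginx HTTPS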
Go ahead and try experimenting with these rules. Note that you can simply allow port numbers, rather than waiting for an application’s profile to show up. For example, to allow port 443 for HTTPS traffic, simply use the following command:
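$ sudo ufw allow 443
Appending /tcp, as in 443/tcp, restricts the rule to TCP traffic only.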
Now that you have the basics of UFW sorted, you can explore other powerful firewall capabilities, starting from allowing and blocking ranges of IP. Having clear and secure firewall policies will keep your systems safe and protected.
So far so good. Each Service can talk to every other Service, and this communication is possible across the entire Kubernetes cluster.
“If a tree falls in a forest and no one is around to hear it, does it make a sound?”
On a similar note, if your application doesn’t serve a purpose outside the Kubernetes cluster, does it really matter whether or not your cluster is well built? Probably not.
To give you a concrete example, let's say we have a classical web app composed of a frontend written in Node.js and a backend written in Python which uses a MySQL database. You deploy two corresponding services on your Kubernetes cluster.
You write a Dockerfile specifying how to package the frontend software into a container, and similarly you package your backend. Next, in your Kubernetes cluster, you deploy two services, each running a set of pods behind it. The web service can talk to the database cluster and vice versa.
However, Kubernetes doesn't expose any of these services (which are essentially HTTP endpoints) to the rest of the world. As stated in the official docs:
“Services are assumed to have virtual IPs only routable within the cluster network”
This is perfectly reasonable from a security standpoint: your services can talk to one another, but the cluster won't allow outside entities to talk to the services directly. For example, only your web frontend can talk to the database service, and no one else can even send requests to the database service.
The problem arises when we look at the use case of a frontend service. It needs to be exposed to the public so end users can use your application. We expose such Services using Kubernetes Ingress.
Ingress exposes HTTP and HTTPS routes from outside the cluster to services within the cluster. You can control the routing rules by defining the Kubernetes Ingress resource. But it does a lot more than that. Exposing a single Service can be achieved using various other alternatives like NodePort or Load Balancers but these facilities don’t have features that are sophisticated enough for a modern web app.
These are features like exposing multiple apps on a single IP, defining path-based routes, and so on.
So let's understand these features for the remainder of the article:
This is the simplest version of exposing a single service like a web frontend with an IP (or a domain name) and default HTTP and HTTPS ports (i.e, 80 and 443).
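A minimal manifest for this case might look like the one below; the service name (frontend) and port are placeholders for whatever your own Service uses:
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: frontend-ingress
spec:
  defaultBackend:
    service:
      name: frontend
      port:
        number: 80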
This is an ingress setup that lets you accept incoming traffic on a single IP and route it to multiple services.
It consists of a single Ingress resource with several path-based rules.
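For example (admin-service and home-service are placeholder names for your own Services):
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: fanout-ingress
spec:
  rules:
  - host: foo.bar.com
    http:
      paths:
      - path: /admin
        pathType: Prefix
        backend:
          service:
            name: admin-service
            port:
              number: 80
      - path: /home
        pathType: Prefix
        backend:
          service:
            name: home-service
            port:
              number: 80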
Single fanout is the case where a single IP is used for multiple services. The services can live at different paths in the URI; for example, foo.bar.com/admin can be a service for administrators and foo.bar.com/home can be the service that generates each user's home page.
The ingress port will always be 80 or 443, but the port where the services are running (inside the cluster) may differ quite a bit.
This kind of ingress helps us minimize the number of load balancers in the cluster, since it essentially acts like one.
Public IP addresses are finite. They are also quite expensive. The idea of name based virtual hosting is older than Kubernetes. The gist of it is that, you point the DNS records for different websites like ww1.example.com and ww2.example.com to the same IP address. The server running at that IP address will see the incoming request, and if the host name mentioned in the request is for ww1.example.com then it serves that website for you, and if ww2.example.com is requested, then that is served.
In the context of Kubernetes, we can run two services running at, say, port 80 and expose both of them on a single IP address using an ingress also of port 80. At the ingress point the traffic of ww1.example.com will get separated from the traffic for ww2.example.com. Hence the term name based virtual hosting.
Ingress in Kubernetes is too sophisticated to be covered in a single post. There are a variety of use cases for it, and a variety of Ingress Controllers that will add the Ingress functionality to your cluster. I would recommend starting with the Nginx Ingress Controller.
For further details and specifications you can also follow the official documentation.
With the container 'revolution', apps have grown to be much more than just a database and a frontend. Applications are split into various microservices which typically communicate with one another via a REST API (typically JSON-formatted payloads over HTTP). Docker containers are ideal for this kind of architecture. You can package your frontend 'microservice' into a Docker container, the database goes into another, and so on and so forth. Each service talks to the others over a predefined REST API, instead of the app being a monolith written as a single piece of software.
If you need to implement a new functionality or a feature, e.g, an analytics engine, you can simply write a new microservice for that and it would consume data via the REST API exposed by the various microservices of your web app. And as your functionality grows over time, this list of microservices will grow along with it as well.
You don’t want to deploy each individual container, configure it and then configure everything else to talk to it as well. That will get tedious with even three containers. Docker-Compose lets you automate the deployment of multiple containers.
Docker-Compose is one of the simplest tools that helps you transform the abstract idea of microservices into a functional set of Docker containers.
Now that we have split the web app into multiple containers, it makes little sense to keep them all on a single server (worse still, on a single virtual machine!). That's where services like Docker Swarm and Kubernetes come into play.
Docker Swarm allows you to run multiple replicas of your application across multiple servers. If your microservice is written in a way that it can scale ‘horizontally’ then you can use Docker Swarm to deploy your web app across multiple data centers and multiple regions. This offers resilience against the failure of one or more data centers or network links. This is typically done using a subcommand in Docker, that is, Docker Stack.
The Docker Stack subcommand behaves a lot like the docker-compose command, and that can lead to misconceptions for someone using either of the technologies.
In terms of usage and workflow, both the technologies work very similar to one another, and this causes confusion. The way you deploy your app using either Docker Swarm or Docker-Compose is very similar. You define your application in a YAML file, this file will contain the image name, the configuration for each image and also the scale (number of replicas) that each microservice will be required to meet in deployment.
The difference lies mostly in the backend: where docker-compose deploys containers on a single Docker host, Docker Swarm deploys them across multiple nodes. Loosely speaking, it can still do most things that docker-compose can, but it scales across multiple Docker hosts.
Both Docker Swarm and Docker-Compose have the following similarities: both take a YAML-formatted definition of your multi-container application, both are meant for multi-container (microservice) apps, and both let you run multiple replicas of a service.
The main difference between Docker Swarm and Docker-Compose is that Compose manages containers on a single Docker host, while Swarm orchestrates them across a cluster of nodes.
As described above, they are different tools and each solves a somewhat different problem, so it is not as if one is an alternative for the other. However, to give newcomers a sense of what I am talking about, here's a use case for Docker Compose.
Suppose you want to self-host a WordPress blog on a single server. Setting it up and maintaining it manually is not something you want to do, so what you would do instead is install Docker and Docker-Compose on your VPS and create a simple YAML file defining all the various aspects of your WordPress stack, like the one below:
Note: If you are using the below to deploy a WordPress site, please change all the passwords to something secure. Better yet, use Docker Secrets to store sensitive data like passwords, instead of having it in a plain text file.
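A typical compose file for such a stack looks something like this (image tags and credentials here are only illustrative):
version: '3.3'
services:
  db:
    image: mysql:5.7
    restart: always
    volumes:
      - db_data:/var/lib/mysql
    environment:
      MYSQL_ROOT_PASSWORD: ChangeThisRootPassword
      MYSQL_DATABASE: wordpress
      MYSQL_USER: wordpress
      MYSQL_PASSWORD: ChangeThisPassword
  wordpress:
    depends_on:
      - db
    image: wordpress:latest
    restart: always
    ports:
      - "80:80"
    environment:
      WORDPRESS_DB_HOST: db:3306
      WORDPRESS_DB_USER: wordpress
      WORDPRESS_DB_PASSWORD: ChangeThisPassword
      WORDPRESS_DB_NAME: wordpress
volumes:
  db_data: {}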
Once the file is created and both Docker and Docker-compose are installed, all you have to do is run:
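$ docker-compose up -d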
And your site will be up and running. If there’s an update, then run:
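With the usual Compose workflow, that would be:
$ docker-compose down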
Then throw away the old Docker images and run the docker-compose up -d command and new images will automatically be pulled in. Since you have the persistent data stored in a Docker Volume, your website’s content won’t be lost.
While Docker-Compose is more of an automation tool, Docker Swarm is meant for more demanding applications: web apps with hundreds or thousands of users, or workloads that need to scale in parallel. Companies with a large user base and stringent SLA requirements would want to use a distributed system like Docker Swarm. If your app is running across multiple servers and multiple data centers, then the chance of downtime due to an affected DC or network link is significantly reduced.
That said, I hesitate to recommend Docker Swarm for production use cases because competing technologies like Kubernetes are arguably more fitting for this task. Kubernetes is supported natively across many cloud providers and it works quite well with Docker Containers so you don’t even have to rebuild your app to take advantage of Kubernetes.
I hope that this rambling on Docker and its satellite projects was informative and you are more prepared for the docker ecosystem.
Horizontal scaling refers to spinning up more computers, i.e., VMs, containers or physical servers, in order to accommodate any surge in demand. This is in contrast to scaling 'vertically', which usually refers to replacing a slower machine (with less memory and storage) with a faster, 'larger' one.
With containers, scaling of both kinds has become very dynamic. You can set quotas for specific applications, fixing the amount of CPU, memory or storage that they may have access to. This quota can be changed to scale up or down as needed. Similarly, you can scale horizontally by spinning up more containers to accommodate an uptick in demand, and later scale down by destroying the excess containers you created. If you are using cloud-hosted services that bill you by the hour (or minute), this can substantially reduce your hosting bills.
In this article we will focus only on horizontal scaling which is not as dynamic as the above description, but it is a good starting point for someone learning the basics. So let’s start.
When you start your application stack by passing your compose file to the docker-compose CLI, you can use the --scale flag to specify the scalability of any particular service defined in there.
For example, for my docker-compose file:
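A minimal compose file matching the description below might look like this:
version: "3"
services:
  web:
    image: "nginx:latest"
    ports:
      - "80-85:80"
To run five instances of the web service, you would then start the stack with:
$ docker-compose up -d --scale web=5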
Here, the service is called web in the yml declaration, but it can be any individual component of your deployment, i.e., web front-end, database, monitoring daemon, etc. The general syntax requires you to pick one of the elements under the top-level services section. Also, depending on your service, you may have to modify other parts of the file. For example, the 80-85 range of host ports is given to accommodate 5 instances of Nginx containers, all listening on their internal port 80; the host listens on ports in the 80-85 range and redirects traffic from each unique port to one of the Nginx instances.
To see which container gets which port number you can use the command:
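$ docker-compose ps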
To scale more than one service, you need to mention them individually with the scale flag and number parameter to ensure that the desired number of instances are created. For example, if you have two different services you need to do something like this:
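Assuming the two services are called web and database, that would look something like this:
$ docker-compose up -d --scale web=5 --scale database=3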
This is the only way to do it, since you can't run the docker-compose up --scale command twice, once for each service. Doing so would scale the previous service back to a single container.
Later we will see how you can set scale value for a given image, from inside the docker-compose.yml. In case there’s a scale option set in the file, the CLI equivalent for the scale option will override the value in the file.
This option was added in docker-compose file version 2.2 and can technically be used, although I don’t recommend using it. It is mentioned here for completeness sake.
For my docker-compose.yml file:
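A version 2.2 file using the scale key might look like this:
version: "2.2"
services:
  web:
    image: "nginx:latest"
    ports:
      - "80-85:80"
    scale: 5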
This is a perfectly valid option, although it requires Docker Engine 1.13.0 or above.
Instead of using the scale flag, or the outdated scale value in your compose file, you should use the replicas value. This is a simple integer associated with a given service and works in pretty much the same way as the scale value does. The crucial difference is that Docker Swarm is explicitly meant for distributed systems.
This means you can have your application deployed across multiple nodes, VMs or physical servers, running across multiple different regions and multiple different data centers. This allows you to truly benefit from the multitude of service instances that are running.
It allows you to scale your application up and down by modifying a single variable; moreover, it offers greater resilience against downtime. If a data center is down or a network link fails, users can still access the application because another instance is running somewhere else. If you spread your application deployment across multiple geographical regions, e.g., EU, US and Asia Pacific, latency is also reduced for users accessing your application from those regions.
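In a Swarm stack file, this is expressed under the deploy key; the snippet below is a sketch, with the service name and replica count as placeholders:
version: "3.7"
services:
  web:
    image: "nginx:latest"
    ports:
      - "80:80"
    deploy:
      replicas: 5
You would then deploy it to a Swarm with something like:
$ docker stack deploy -c docker-compose.yml mystack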
While docker-compose scaling is useful for small environments, like a single Docker host running in production, it is also very useful for developers running Docker on their workstation. It can help them test how the app will scale in production, and under different circumstances. Using the scale flag circumvents the hassle of setting up a new Docker Swarm.
If you have a Docker Swarm instance running, feel free to play around with replicas. Here's the documentation on that matter.
A MySQL client can be any remote application, like phpMyAdmin or your custom web app, or MySQL's own command-line client, which is itself simply named mysql.
Setting up a MySQL server is often tedious: you have to set up user accounts, open ports, set passwords, create databases and tables, and so on. In this post, I will try to minimize some of your miseries by making a simple MySQL deployment using Docker-Compose. If this is your first time dealing with Compose, here's a quick tutorial on it, and while you are at it, you will want to know more about Docker volumes too. These are used to store persistent data for applications like MySQL.
Disclaimer: In no way is this compose file “production ready”. If you want to run a MySQL database in production, you will have to tighten up the security quite a bit more. This will include locking down the root account, setting up TLS, and setting stricter permissions on various databases for various database users, etc.
First ensure that Docker is installed on your workstation or server. To run a simple MySQL service, first create a new folder on your Docker host. Name it MySQLCompose:
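$ mkdir MySQLCompose
$ cd MySQLCompose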
Create a file docker-compose.yml in it using your favorite text editor, and write the following:
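The file below is a minimal sketch: the root password, the Adminer port and the mysql_native_password setting match what the rest of this article assumes, but you should adapt them to your setup:
version: "3"
services:
  db:
    image: mysql
    command: --default-authentication-plugin=mysql_native_password
    restart: always
    environment:
      MYSQL_ROOT_PASSWORD: UseADifferentPassword
    volumes:
      - db-data:/var/lib/mysql
  adminer:
    image: adminer
    restart: always
    ports:
      - "8080:8080"
volumes:
  db-data: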
Then run the following command from inside the same directory:
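$ docker-compose up -d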
With the above compose file, two new containers will be created: the first is the database service, and the second is an Adminer container which acts as a web frontend for database management.
Although the communication between the Adminer container and the MySQL service happens over TCP using port 3306, we don't have to open any ports on our database. This is because Docker containers on a user-defined bridge network can talk to one another on any port and resolve each other by name (which is not the case on a Docker host's default bridge network). You can list the Docker networks using the command docker network ls, and it will show you that a new network has indeed been created.
Visit http://localhost:8080 and log in as root using the password UseADifferentPassword, and you will get a very simple UI to interact with MySQL. MySQL can be configured to authenticate in a variety of ways; however, we have opted to use just mysql_native_password as the authentication method. You can pass the MySQL root password via an environment variable, as shown in the yml file itself.
NOTE: For the sake of clarity, I mentioned important credentials like the MySQL root password and other user passwords in plain text, here. This is obviously a security risk. The proper way to do this would be to use Docker secrets, but that’s a topic for another day.
WordPress is perhaps the classic example for highlighting the strengths and nuances of docker-compose. Like most regular installations of WordPress, the Docker variant also uses MySQL for its backend database. However, the database runs in one container, whereas the web server (along with the WordPress application itself) runs in another.
Here’s a snippet from the official documentation of docker-compose regarding its setup.
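It is roughly equivalent to the following (credentials are placeholders; check the docs for the exact current version):
version: '3.3'
services:
  db:
    image: mysql:5.7
    restart: always
    volumes:
      - db_data:/var/lib/mysql
    environment:
      MYSQL_ROOT_PASSWORD: somewordpress
      MYSQL_DATABASE: wordpress
      MYSQL_USER: wordpress
      MYSQL_PASSWORD: wordpress
  wordpress:
    depends_on:
      - db
    image: wordpress:latest
    restart: always
    ports:
      - "8000:80"
    environment:
      WORDPRESS_DB_HOST: db:3306
      WORDPRESS_DB_USER: wordpress
      WORDPRESS_DB_PASSWORD: wordpress
      WORDPRESS_DB_NAME: wordpress
volumes:
  db_data: {}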
This will create a WordPress website open at port 8000 of your Docker host. You can see that the services section defines two services under it:
First, the MySQL database, with a named volume to store persistent data and some environment variables to set up the MySQL user, database and password.
Second, the WordPress container, which has a web server, PHP and WordPress all installed in it. It needs to talk to the database (available at db:3306 internally); it listens on port 80 internally and is exposed to the rest of the world via the Docker host's port 8000. It also has a few environment variables defining where to find the database (db:3306), along with the database name, username and password that we defined on the MySQL service.
Hopefully, the above few examples illustrate how to configure a MySQL container. The underlying idea is that you pass your database name and other configuration details as environment variables. You can always refer to the description provided at Docker Hub and then you can configure MySQL for your own application.
Docker containers are meant to be a drop-in replacement for applications. They are meant to be disposable and easy to replace. This property is, in fact, the cornerstone of many CI/CD pipelines. When a change is pushed to your source repository, it triggers a chain of events: Docker images are automatically built, tested and (sometimes) even deployed right into production, replacing the older versions seamlessly.
But there’s often persistent data that needs to be preserved between different releases of your application. Examples include databases, configuration files for your apps, log files, and security credentials like API keys and TLS certificates.
To allow all this data to persist we will use Docker Volumes which are just parts of Docker Host’s filesystem (a directory or a block device formatted with a filesystem) that can be mounted inside a container at any desired location of the container’s filesystem.
To ensure that we are all on the same page, here’s the version of Docker runtime and Docker-Compose that I am using:
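You can check yours with:
$ docker --version
$ docker-compose --version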
Working with Compose is really straightforward. You write a YAML file describing your deployment and then deploy it using the docker-compose CLI. Let's start with a simple Ghost CMS deployment.
Create a directory called ComposeSamples and within it create a file called docker-compose.yaml
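For example (the port mapping simply publishes Ghost's default port):
$ mkdir ComposeSamples
$ cd ComposeSamples
Then, in docker-compose.yaml:
version: "3"
services:
  web:
    image: ghost:latest
    ports:
      - "2368:2368"
    volumes:
      - cms-content:/var/lib/ghost/content
volumes:
  cms-content: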
This compose file declares a single service, web, which runs the latest Ghost CMS image from Docker Hub's official repository. The exposed port is 2368 (more on this a little later), and a volume called cms-content is mounted at /var/lib/ghost/content. You can read about your particular application and its nuances by looking up that app's documentation. For example, the Ghost container's default port (2368) and the default mount point for the website's contents (/var/lib/ghost/content) are both mentioned in the container's official documentation.
If you are writing a new application of your own, think about all the persistent data it will need access to and accordingly set the mount points for your Docker volumes.
To test that the persistent volume works, try this:
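One way to check, assuming the compose file above:
$ docker-compose up -d
# visit http://localhost:2368, finish the setup and publish a test post
$ docker-compose down
$ docker-compose up -d
# the test post is still there, because it lives in the cms-content volume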
The syntax for introducing a volume with docker-compose is pretty straightforward. Under a service, you mention the name of the volume that you want to mount and the path inside the container where it should go. If you don't mention a name, you can go for a lazy syntax like the one below:
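Here only the container path is given, so Docker creates an anonymous volume for it:
services:
  web:
    image: ghost:latest
    volumes:
      - /var/lib/ghost/content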
If you want to be a bit more verbose, then you will have to mention the Docker Volume as a top level definition:
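Something like this, with the volume declared both under the service and at the top level:
services:
  web:
    image: ghost:latest
    volumes:
      - cms-content:/var/lib/ghost/content
volumes:
  cms-content: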
Although the latter version requires you to type more, it is also more explicit. Choose relevant names for your volumes so your colleagues can understand what's been done. You can go even further and mention the type of the volume (more on this later) and point out the source and target.
Bind mounts are parts of the host file system that can be mounted directly inside the Docker container. To introduce a bind mount, simply mention the host directory you want to share and the mount point inside the Docker container where it ought to be mounted:
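For example (using the same illustrative host path as below):
services:
  web:
    image: ghost:latest
    volumes:
      - /home/<USER>/projects/ghost:/var/lib/ghost/content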
I used the path /home/<USER>/projects/ghost as just an example, you can use whatever path on your Docker host you want, provided you have access to it, of course.
You can also use relative paths by using $PWD or ~, but that can easily lead to bugs and disasters in the real-world scenarios where you are collaborating with multiple other humans each with their own Linux environment. On the flip side, sometimes relative paths are actually easier to manage. For example, if your git repo is also supposed to be your bind mount using dot (.) to symbolize current directory may very well be ideal.
New users can clone the repo anywhere on their host system, run docker-compose up -d, and get pretty much the same result.
If you use a more verbose syntax, this is what your compose file will contain:
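Using the long syntax (available in newer compose file formats), the bind mount is spelled out with an explicit type, source and target:
version: "3.2"
services:
  web:
    image: ghost:latest
    volumes:
      - type: bind
        source: ./ghost
        target: /var/lib/ghost/content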
Organizing your applications such that the app is kept separate from its data can be very helpful, and volumes are a sane way to accomplish just that. Provided that they are backed up and secure, you are free to use the containers as disposable environments, even in production!
Upgrading from one version of the app to the next or using different versions of your app for A/B testing can become very streamlined as long as the way in which data is stored, or accessed is the same for both versions.
Microsoft recently announced that they will soon be shipping a Linux kernel that's integrated right into Windows 10. This will allow developers to leverage the Windows 10 platform when developing applications for Linux. In fact, this is the next step in the evolution of the Windows Subsystem for Linux (WSL). Let's review WSL version 1 before getting into the nitty-gritty of Linux kernel integration and what it means.
If you want to get started with WSL (v1) here is a guide for that.
The Windows Subsystem for Linux should really be called a Linux subsystem for Windows. It offers a driver (a subsystem) for the Windows OS, that translates Linux system calls into native Windows 10 system calls that the NT kernel understands.
This creates a somewhat believable illusion for Linux binaries that they are, in fact, running on top of a Linux kernel! This works well enough that not only can you run simple 64-bit binaries compiled for Linux, but you can run an entire userland (a.k.a. a Linux distribution like Debian or Ubuntu) on top of WSL v1. So when you install Ubuntu from the Microsoft Store, it just fetches the Ubuntu userland binaries that Canonical ships for Microsoft's WSL v1 environment.
However, WSL v1 is still far from perfect. Certain semantics that a Linux system would expect are totally unacceptable on Windows. For example, you can’t change the file name of an open file on Windows, but you can do that on Linux.
Other obvious examples include WSL v1’s poor filesystem performance and its inability to run Docker.
This new version of WSL solves all of these problems by including the Linux kernel running in the Hyper-V hypervisor. Throughout their announcement and demos, Craig Loewen and his colleague from Microsoft stressed on the point that, moving forward, Microsoft is going to invest hugely in virtualization technology.
Using a Linux Kernel solves all the semantic issues that can’t be solved using just a system call translation layer. This Linux kernel is going to be very light-weight and will be maintained by Microsoft with all the changes to the Linux Kernel made open source. In their demo, they showed how you can just migrate the existing WSL v1 apps to run unmodified into a WSL v2 environment. The file system performance has improved by 3-4 times and the whole system feels a lot more responsive.
So basically, we have a Linux VM with a lot of userland apps, e.g, OpenSUSE or Debian or Ubuntu userland. You get more than one distro on your Windows machine, with a single Linux VM, so that’s neat. But it also implies that every time you open your WSL Ubuntu terminal, you are booting a complete VM! That’s going to take a while, right?
Actually, no. The Linux kernel is small enough and lightweight enough that Microsoft was able to get remarkably fast boot times with it (around 1 second). They achieved this by removing all the bootloader code: since it is a VM, they directly load the Linux kernel into its address space in memory and set up the few pieces of VM state that the kernel expects. This is the same mechanism that powers another new Windows 10 feature, Windows Sandbox.
It has a very small memory footprint, it loads up in a second and the native ext4 filesystem format allows it to deliver a very smooth user experience. The VM only runs when you need it.
Moreover, it is not completely isolated like a traditional VM. You can very easily interact with the rest of the Windows system, including files in your Windows drives using WSL v2. It is not an isolated VM but an integrated part of Windows 10. How does it achieve that?
With WSL v1, accessing files and directories on the Windows side was trivial. Your Linux userland is just an app on Windows, so it can read and write files belonging to the native OS pretty easily.
With WSL v2, you have a VM running with its virtual harddisk (formatted with ext4, of course) and if we want a similar experience like that of WSL v1, we need some extra mechanisms. Enter Plan 9.
Plan 9 is an OS originally developed at Bell Labs. While it is unlikely that you will ever find it running in production, it still lives on as various other operating systems adopt interesting ideas that were incubated in it, including the 9P file server protocol.
WSL v2 will have a 9P protocol server running on your Windows 10 host, and a 9P client running inside your WSL apps. This allows you to access Windows 10 files natively inside your WSL environment. The C: drive will be mounted at /mnt/c, just like WSL v1, and every file inside the Windows 10 host can be reached by the WSL environment.
The converse will also be true. There will be a 9P protocol server running in your Linux environment with its corresponding client on Windows 10 host. This will allow users to access their Linux (ext4) file system from the Windows 10 environment. Allowing you to edit your source code or config files using your favorite IDE installed on Windows, listing all the directories using Windows 10 file explorer, and a lot more. Essentially, you will be running your favorite Linux distros with the Windows 10 UI.
In their announcement, Microsoft also hinted that this new environment will also be used by Docker to ship their future Docker for Windows apps. Since there’s a Linux kernel, running Docker on top of it, is going to be quite easy. In the demo, they ran Docker on top of Ubuntu running WSL v2 and it worked as if it is running on a native Ubuntu installation.
To give a very superficial example, if you have worked with Dockerfiles on a Windows system, you must have noticed the issues caused by the lack of Unix-like file permissions. That will no longer be a problem. Docker on Windows already uses a custom VM to provide Linux containers; presumably, it will now use WSL v2 and the Linux kernel that Microsoft ships.
Overall, I am very impressed with what Microsoft is doing to provide a haven for Linux developers who also want to use Windows. Hopefully, over the long run, it will encourage a lot of cross-pollination between the two different ecosystems.
If you are running the Windows 10 Home, Pro or Enterprise edition, you can get a taste of WSL v2 by opting in for the Preview builds of Windows 10. Here's a guide on how to do that.
However, it does get tiresome to reinstall the operating system inside your VM over and over again. It hinders your workflow, and, therefore, you need a reliable way to save the state of a working VM and to restore that state whenever needed.
I have previously discussed how snapshots work in VirtualBox and this time I wanted to discuss snapshots within Libvirt. I will be using QEMU-KVM as the backend hypervisor for my Libvirt installation. Your case might differ, but the overall functionality and interface should not be very different, since libvirt tries its best to standardize the frontend interface.
If you are not familiar with libvirt and qemu-kvm, here’s a guide on how you can setup KVM on Debian.
There are several ways in which you can take and manage snapshots of your VMs. GUI applications like virt-manager and oVirt offer this functionality, and you can even write custom scripts to interface with the libvirt API, which manages the entire range of snapshot operations for you.
However, I will be using virsh command line interface to show how you can manage your VMs and their snapshots. This utility comes with almost all default libvirt installations and should be available across a wide range of distributions.
For the commands below, make sure to replace the name of my VM, VM1, with the actual name of your VM. Libvirt often refers to virtual machines and containers as domains, so if you see an error message suggesting, say, "specify domain name", you need to supply your VM's name as one of the arguments to the command. Use the following command to list all the VMs under Libvirt's management:
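$ virsh list --all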
To take a snapshot of a VM simply run:
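$ virsh snapshot-create VM1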
And to list all the snapshots of a given VM use the command:
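$ virsh snapshot-list VM1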
You can see that the snapshot is created. By default, the name of the snapshot is its creation timestamp (the number of seconds since the UNIX epoch). The Creation Time column shows the time of creation in a human-readable fashion, and the State column shows the state of the VM when it was snapshotted. As this VM was running, the snapshot's state is also 'running', but that doesn't mean that the snapshot itself is running; it won't change with time. This feature is also known as a live snapshot, and it is quite valuable since it allows you to take a snapshot of your VM without any downtime. KVM guests, at least, work fine with live snapshots.
Certain workloads, however, do require you to stop the VM before it is snapshotted. This ensures that the data in the snapshot is consistent and that there is no half-written file or missing data. If the workload running in your VM has high IO, you probably need to turn the VM off before creating the snapshot. Let's create one this way:
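$ virsh shutdown VM1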
Domain VM1 is being shutdown
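$ virsh snapshot-create VM1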
Domain snapshot 1556533868 created
If you want to name the snapshots something other than timestamp, use the command:
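$ virsh snapshot-create-as VM1 snap1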
Obviously, you don’t have to name it snap1, you can pick any convenient name.
Taking a snapshot is of no use if you can't go back to it. In case you need to revert to a snapshot, simply use the command:
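$ virsh snapshot-revert VM1 snap1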
The name can be the timestamp or the user assigned name given to the snapshot.
Make sure that there’s no important data in your current VM, or if there is anything of importance, then take snapshot of your current VM and then revert back to an older snapshot.
The copy-on-write mechanism of qcow2 files allows each snapshot to take very small space. The space taken by a snapshot increases over time as the running image diverges from the snapshot. So as long as you are not rewriting a lot of data, your snapshots will take only a few MBs of storage.
It also means that the snapshots are very fast, since the copy-on-write mechanism only needs to record the point at which the snapshot was taken. Data blocks written to the qcow2 file after the snapshot don't belong to it, but the older ones do. It is as simple as that. My test bench uses a 5400 RPM hard drive that is by no means at the peak of its performance, and it still takes only a few seconds to take a live snapshot of a VM on this disk.
As with most libvirt and virsh related utilities, the snapshot functionality provides a very flexible interface with enterprise grade features like live snapshotting along with the benefits of copy-on-write mechanism.
The default naming convention also makes it easier for shell scripts to periodically remove old snapshots and replace them with newer ones. One of my older articles on OpenZFS snapshots and snapshot policies can also be applied to your KVM guests. For more information about the virsh snapshot utility, you can use the virsh help snapshot command. The help page is very small, precise and easy to understand.
Ever wondered how VPS providers configure your VMs, add your SSH keys, create users and install packages every time you spin up a new VM in the 'cloud'? Well, for most vendors the answer is cloud-init. Most OS vendors and distributions ship virtual disk images with their respective OSes preinstalled. The installation is very minimal and can serve as a template for the root filesystem of the OS. The OS maintainers are also kind enough to provide the virtual disk image in all the various formats, from raw disk images to qcow2 and even vmdk, vdi and vhd.
The image also has one extra package pre-installed, and that is cloud-init. It is the job of cloud-init to initialize the VM (typically within a cloud hosting service like DigitalOcean, AWS or Azure), talk to the hosting provider's datasource, and fetch the configuration information which it then uses to configure the VM.
The configuration information can include user-data like SSH keys, hostname of the instance, users and passwords along with any other arbitrary command that the user wants to run.
Cloud-init is a great tool if you are a cloud user: if you are spinning up VMs or containers and your cloud provider is kind enough to ask you for a cloud-config, it is great! With a cloud-config file, a.k.a. your user-data, you can add users, run arbitrary commands and install packages right as the VM is being created. The process can be repeated over and over without tedious commands being typed again and again. Soon you have a fleet of VMs, all with identical configuration.
However, if you dig a little deeper and see how the sausage is made, you will start to question some of cloud-init's aspects. For example, by default the datasource is like a REST endpoint, and these endpoints are essentially hardcoded into the cloud-init package itself. Sure, you can set up a datasource all by yourself, but the process is clunky and time intensive. The documentation for doing this is all but non-existent.
The official documentation is nothing but a user manual for end users relying on preexisting cloud services. It doesn’t tell you how you can setup your own cloud-init datasource, in case you are an upcoming vendor. Even the end-user documentation is poor, and I would recommend people using DigitalOcean’s excellent tutorial instead.
To make matters worse, users with home virtualization labs and small VPS startups find it difficult to benefit from those lightweight cloud images. You can't really start a VM off of those templates without a cloud-init datasource, or without some hackery that is difficult to automate and scale. In other words, you can't even choose to ignore cloud-init unless you want to craft your own templates.
In classic systemd fashion, it is breaking free from its predefined role and starting to mess with networking and other parts of the OS, which throws users off. It even gets bundled into the Ubuntu 18.04 server ISO, which makes absolutely no sense (at least not to me).
All the ranting aside, I still have to deal with cloud-init in my everyday use. I have a very minimal Debian 9 installation on x86_64 hardware, which I use as a KVM hypervisor. I really wanted to use the qcow2 disk images that are shipped by Ubuntu and CentOS. These disk images have the OS preinstalled in them, and to use them you simply need to supply the VM's configuration (users, passwords, SSH keys, hostname and so on) through a local cloud-init datasource.
Roughly, the steps are: write a user-data file (and a minimal meta-data file), bundle the two into a small ISO whose volume label is cidata, and attach that ISO to the VM as a CD-ROM so that cloud-init's NoCloud datasource picks it up on first boot.
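A user-data file matching the description below looks something like this:
#cloud-config
users:
  - name: root
chpasswd:
  list: |
    root:myPassword
  expire: False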
The only user I have here is the root user. If you don't mention any user, a default user named ubuntu gets created. The default username differs from one OS to another, which is why I recommend specifying a user, even if it is just root. The next part of the user-data file tells cloud-init to configure the password for every user you want to assign a password to. Again, I am setting the password for just the root user, and it is myPassword. Make sure that there's no space between the colon and the password string.
Better yet, you can use SSH-keys instead of having hardcoded passwords laying around.
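The ISO itself can be built with genisoimage, alongside a minimal meta-data file; the instance name used here (myVM) is just an example:
$ cat meta-data
instance-id: myVM
local-hostname: myVM
$ genisoimage -output cidata-myVM.iso -volid cidata -joliet -rock user-data meta-data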
Make sure that the file cidata-myVM.iso is situated in /var/lib/libvirt/images/
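With the disk image copied into place as well, the VM can then be created with something like virt-install (the paths, sizes and OS variant are illustrative):
$ virt-install --name myVM --memory 2048 --vcpus 2 \
    --disk /var/lib/libvirt/images/myVM.qcow2,device=disk,bus=virtio \
    --disk /var/lib/libvirt/images/cidata-myVM.iso,device=cdrom \
    --os-variant ubuntu18.04 --import --network network=default --noautoconsole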
You can now try logging into the VM by using the command virsh console myVM and using the root username and its corresponding password to login. To exit the console, simply type Ctrl+]
The cloud images that most vendors ship are really efficient in terms of resource utilization and they also feel really fast and responsive. The fact that we need to deal with the awkward cloud-init configuration as a starting point only hinders the community’s adoption of KVM and related technologies.
The community can learn a lot from the way Docker builds and ships its images. They are really easy to manage, both as running containers and as templates that are easy to distribute and use.
Just because it supports a large array of technologies doesn't mean you have to be familiar with all of them. You can focus on one technology, like KVM, and build your libvirt experience around that. This article will try to give a comprehensive criticism of the technology from the author's personal experience with it.
To get a hang of what Libvirt is capable of and how you can use it on your own system you can follow the following guides:
If you are already familiar with tools like virsh, virt-install, virt-manager, oVirt, etc then you are already using libvirt without even knowing it. The aforementioned tools use libvirt in the backend and provide a user-friendly interface be it command line or GUI.
Libvirt is designed to work with any hypervisor and has grown over the years to work with a wide array of hypervisors. The libvirt daemon exposes an API that can be used by apps like virt-manager or virsh (and even your custom Python scripts). The user requests are received by the API. These requests could be anything from creating a KVM guest to showing the memory used by a given LXC container.
The libvirt daemon then delegates the request to the appropriate libvirt hypervisor driver. This driver understands and implements all the specifics of a given virtualization technology and carries out the instructions accordingly.
There’s a different class of drivers for handling storage and even networks of VMs.
VMs need a lot of storage. The storage technology itself varies from hypervisor to hypervisor: VMware uses its own vmdk format, QEMU likes to use qcow2, there are also raw disk images, and LXC images are a different story altogether. Moreover, you may want to group together all the VM disk images and back them with different storage media, like an NFS server, a ZFS dataset or just a directory. This allows you to use libvirt across a variety of different use cases, from a single home server to an enterprise-grade scalable virtualization solution.
In libvirt vernacular, a single virtual storage device associated with any VM, like the qcow2, raw or vmdk image file of a VM or a mountable ISO, is known as a volume. The storage media used on the host to store a group of associated volumes is known as a pool. You can use an NFS server as a pool, or a ZFS dataset, as previously mentioned. If you don't have a fancy storage solution, you can simply use a directory.
By default, libvirt has two different pools: /var/lib/libvirt/images and /var/lib/libvirt/boot. Volumes for a single VM can be split across multiple pools. For example, I store all the clean cloud images and OS installer ISOs in the /var/lib/libvirt/boot pool, and for individual VMs the rootfs is installed in image files stored in /var/lib/libvirt/images.
You can even have a single pool for a single VM, or you can split the pools further for VM snapshots, backups ,etc. It’s all very flexible and allows you to organize your data as per your convenience.
Virsh is a popular tool to configure everything from your VMs to virtual machine networking and even storage. The configuration files themselves live in the XML format. You will find yourself issuing commands like:
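$ virsh dumpxml VM1
$ virsh edit VM1
(here VM1 is the domain name)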
And similarly, there are subcommands like net-dumpxml and pool-edit to view or modify the configuration of networks, pools, etc. If you are curious about where these configuration files live, you can go to /etc/libvirt/ and find your hypervisor's directory. The parent directory /etc/libvirt/ itself contains a lot of global configuration, like the drivers' config files (e.g., qemu.conf and lxc.conf) and the default behaviour of libvirt.
To look at specific configuration of individual components like the VMs, pools and volumes you have to go to the corresponding directories. For qemu guests this is /etc/libvirt/qemu
The autostart directory will contain symlinks to VM1.xml and VM2.xml if you have configured those VMs to autostart when the host system boots (virsh autostart VM1).
Similarly, /etc/libvirt/qemu/networks contains the configuration for the default network of QEMU guests, and /etc/libvirt/storage contains XMLs defining the storage pools.
If you are interested in setting up your own virtualization host a good place to start will be this article where I show how to install QEMU-KVM guests on a Debian host using libvirt and related tools.
After that you can start playing with the virsh CLI to view and manage entities like domains (libvirt calls guest VMs domains), networks, storage pools and volumes. This will make you comfortable enough with the technology that you can move on to other concepts like snapshots and network filters. I hope this article will prove to be a good starting point for you.
On the flip side, new concepts have been introduced in order to optimize the performance, life span and reliability of these novel devices as well. One such concept is the TRIM operation.
SSDs are blazingly fast and are getting faster and cheaper every year. Their reliability also has improved quite a bit since their inception. However, SSDs are still not as reliable as magnetic media, neither are they as durable as a hard disk. In fact, the underlying read-write mechanisms are very different from what one sees inside an HDD.
To understand the problems an SSD suffers from, and why we need TRIM operation to overcome those problems, let’s look at the structure of the SSD first. Data is stored typically in groups of 4KB cells, called pages. The pages are then grouped into clusters of 128 pages, called Blocks and each block is 512KB, for most SSDs.
You can read data from a page that contains some information, or you can write data to pages that are clean (with no preexisting data in them, just a series of 1s). However, you can't overwrite data on a 4KB page that has already been written to without first erasing the block it belongs to, all 512KB of it.
This is a consequence of the fact that the voltage required to flip a 0 back to a 1 (an erase) is much higher than the reverse, and it can only be applied to a whole block at a time; otherwise the excess voltage could flip bits in the adjacent cells and corrupt data.
When data is said to be ‘deleted’ by the OS, the SSD merely marks all the corresponding pages as invalid, rather than deleting the data. This is quite similar to what happens inside an HDD as well, the sectors are marked as free rather than getting physically zeroed out. This makes the deletion operation much much faster.
In case of HDDs, this works just fine. When new data needs to be written, you can overwrite the old data on a freed sector without any issues or worries about the surrounding sectors. HDDs can modify data in-place.
In the case of an SSD, this is not so simple. Let's say that you modify a file and that corresponds to a change to a single 4KB page. When you try to modify a 4KB page on an SSD, the entire contents of its block, the whole 512KB of it, need to be read into a cache (the cache can be built into the SSD, or it can be the system's main memory), then the block needs to be erased, and only then can you write the new data to your target 4KB page. You will also have to write back the remaining unmodified 508KB of data that you copied into the cache.
This adds up to the phenomenon of write amplification, where each write operation gets amplified into a read-modify-write operation on chunks of data that are much larger than the data that actually needs to be put in place.
Initially, this amplification doesn’t show up. Your SSD performs very well in the beginning. Eventually, as blocks get filled up, the inevitable point is reached where more and more write operations start involving the expensive read-modify-write operations. The user starts noticing that the SSD is not performing as well as it initially did.
SSD controllers also try to make sure that the data is spread out throughout the disk. So that all dies get equal levels of wear. This is important because flash memory cells tend to wear-out quickly, and therefore if we continuously use only the first few thousands of blocks ignoring the rest of the SSD, those few blocks will get worn out soon. Spreading data across multiple dies also improves your performance as you can read or write data in parallel.
However, now the writes are spread out, increasing the chances that any given block contains a mix of valid and invalid pages. This further accelerates the degradation process.
The TRIM command minimizes this performance degradation by periodically trimming the invalid pages. For example, Windows 10 TRIMs your SSD once every week. All the data that has been marked as deleted by the OS gets actually cleaned out of the memory cells by the SSD controller when that operation runs. Yes, it still has to go through the read-modify-write operation, but it happens only once a week and can be scheduled for the hours when your system is mostly idle.
The next time you want to write to a page, it is actually empty and ready for a direct write operation!
The actual frequency of the TRIM command depends on the kind of system you are running. Databases tend to do a lot of IO and would thus require more frequent trimming. However, if you do it too frequently, the database operations will slow down for the period when TRIM is running. It is the job of a system architect to find the right schedule and frequency.
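On Linux, the trimming is typically done with fstrim; most systemd-based distributions ship an fstrim.timer unit that runs it weekly, and you can also trigger it by hand:
$ sudo fstrim -v /
$ sudo systemctl enable --now fstrim.timer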
TRIM command is very useful in delaying the performance degradation of your device. It helps maintain the average performance of your device. But that’s only on average.
Suppose you are working on a text document and are constantly writing to the file, editing things out and saving so you don't lose any progress. The pages storing the document's data will still need to go through the excruciating read-modify-write cycle, because TRIM is not a service that constantly optimizes your SSD. Even if it did run as a service, the performance impact would still be visible, because it is built into the very mechanics of an SSD's operation.
Also, running TRIM too often can reduce the longevity of your storage, since all those erase and write cycles wear out the cells, eventually rendering the data stored within them read-only.
Despite all the shortcomings of an SSD it still packs massive performance benefits when compared against a traditional hard disk drive. As the market share for these magical devices grows, more research and engineering efforts will be directed towards bettering the underlying technology.
Operating system vendors, SSD chip manufacturers and the people who write all the complex firmware logic come together to give us this awesome device. TRIM is but one of the many, many layers of complexity that are packed in there.
I am using a Libvirt KVM installation on a Debian server. The Python scripts I will be using run in a Python 3.7.3 environment. This article is supposed to get your feet wet with Libvirt's Python bindings; when you are designing your application, you should always refer to the official documentation, which covers a wide range of use cases and is updated reasonably often.
Let’s install all the dependencies required for libvirt first:
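Building the bindings needs the libvirt development headers; something along these lines works on Debian-based systems:
$ sudo apt install python3 python3-pip pkg-config libvirt-dev
$ pip3 install libvirt-python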
That’s all the packages you need.
The following scripts and snippets are run locally on the Libvirt host, as root, rather than being run on a remote client. You can access the services remotely, however, that would require a long digression into securing the connection between the client and the server. Therefore, we will be connecting locally, for simplicity’s sake.
To get started, let’s open a Python prompt, import the libvirt library and open a connection with the libvirt.open method.
Type “help”, “copyright”, “credits” or “license” for more information.
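A minimal session, assuming a local QEMU/KVM host, looks like this:
>>> import libvirt
>>> conn = libvirt.open('qemu:///system')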
The variable conn can now be used to query your libvirt daemon and we will do that shortly. But first, a little digression.
Libvirt can be used to manage a number of different virtualization and containerization stacks; KVM-QEMU, Xen and LXC are the most popular of these. So when you enter libvirt.open('qemu:///system'), libvirt enables you to gather information about, and manage, QEMU guests. You can just as well talk to the LXD daemon or the Xen hypervisor using lxc:///system or xen:///system respectively.
Similarly, the method libvirt.open() is not the only one at your disposal. open(name), openAuth(uri, auth, flags) and openReadOnly(name) are three different calls each of which returns a virConnect object and offers varying level of control over the host. You can read more about them here. For now, we have conn as an object of the virConnect class. This object is a gateway for doing almost anything from configuring the hypervisor itself to modifying the guests and their resource allocation.
Once you are done working with the object, make sure to close the connection by calling the close method on it.
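In code, that is simply:
>>> conn.close()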
However, don't run the above command just yet, because we will play around with libvirt a bit more. Let's ask our hypervisor a few details about itself, like the hostname and the number of vCPUs it can offer to guest VMs in total.
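For example (the second call's argument selects the virtualization type; None falls back to the default):
>>> conn.getHostname()
>>> conn.getMaxVcpus(None)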
Now, we need to understand that with libvirt, metadata about objects (hypervisor stats, VMs, their networking and storage info, etc.) is represented in XML format. XML is sort of like JSON, only a bit clumsier (and a bit older). The data is stored and presented as a string literal, which means that if you query libvirt and the output of that query is XML, you will get a really long single-line output with '\n' present as a literal string rather than as a newline. Python's built-in print function can clean it up for human readability.
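For example, the host capabilities come back as one long XML string:
>>> print(conn.getCapabilities())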
If you are maintaining a large array of VMs you need a method to create hundreds of VMs with uniform configuration which also scale properly from simple single threaded workloads to multi-core, multi-threaded processing. Libvirt calls the guest VMs (or containers if you are using LXC) Domains and you can list information about individual domains as well as configure them if your virConnect object has sufficient privileges.
To get information about the VMs and their resource utilization you can use the following calls:
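The starting point is the list of running domain IDs:
>>> conn.listDomainsID()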
This returns an array of domain IDs, which are just small integers for a simple libvirt setup. A more reliable way of labeling your VMs, without ending up with two VMs (let's say on different nodes) having the same ID or name, is to use UUIDs. In libvirt, everything can have a UUID, which is a randomly generated 128-bit number. The chances of creating two identical UUIDs are quite small indeed.
The network for your virtual machines, the VMs themselves, and even the storage pools and volumes have their own individual UUIDs. Make liberal use of them in your Python code, instead of relying on human-assigned names. Unfortunately, the way to get the UUIDs of domains is a bit messy in the current implementation of this library, in my opinion. It requires you to supply the ID of the VM (the domain ID). Here is how it looks.
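A minimal sketch, looking up each running domain by its ID and asking for its UUID string:

>>> domainIDs = conn.listDomainsID()
>>> for domainID in domainIDs:
...     dom = conn.lookupByID(domainID)
...     print(dom.UUIDString())
...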
Now you can see the list of domain UUIDs. We have also stumbled across a new Python object, libvirt.virDomain, which has its own set of methods associated with it, much like the variable conn, which was a libvirt.virConnect object and had methods like listDomainsID() and lookupByID() associated with it.
For both these objects you can use Python's built-in dir() function to list their internal variables and methods.
For example:
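The output below is truncated to a few of the interesting entries; dom is obtained from the first running domain, reusing the domainIDs list from the loop above:

>>> dir(conn)
['baselineCPU', 'changeBegin', 'close', 'compareCPU', ... ]
>>> dom = conn.lookupByID(domainIDs[0])
>>> dir(dom)
['ID', 'OSType', 'UUID', 'UUIDString', 'XMLDesc', ... ]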
This can really help you recall quickly the exact name of a method and the object it ought to be used with. Now that we have a libvirt.virDomain object, let’s use it to list various details about this running VM.
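The info() method gives a compact summary; the numbers below are placeholders (state, maximum memory in KiB, current memory in KiB, vCPU count and CPU time in nanoseconds):

>>> dom.info()
[1, 1048576, 1048576, 1, 21494090000000]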
This gives you the information regarding the state of the VM, maximum memory and cpu cores as shown here.
You can also find other information about the VM using different methods, like OSType().
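For a KVM guest this typically reports 'hvm', meaning a fully hardware-virtualized machine:

>>> dom.OSType()
'hvm'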
There's a lot of flexibility when it comes to the API that libvirt exposes, and you only have to worry about your use case, without worrying about the enormous complexity that libvirt handles.
In my voyages into the Libvirt technology, the absence of UUIDs as a first class citizen was probably the only pain point that I faced which seemed like a bad design choice. Other than that, libvirt is pretty nifty for what it accomplishes. Yes, there are a lot of other things that could have been done in a better way, but that’s always the case with software. In hindsight, bad decisions are always obvious but the cost of rewriting a piece of software, as widespread as libvirt, is often tremendous.
A lot has been built on top of it, as the project has evolved slowly and steadily.
Instead of trying to learn the entire library at once, I would recommend coming up with a small project or an idea and implementing it using Python and Libvirt. The documentation is pretty extensive, with a lot of examples, and it really forces you to think about proper software design and the virtualization stack at the same time.
With that said, let's try and set up our own KVM hypervisor on a Debian 9 server.
Ideally, you will need a clean installation of your favorite Linux distribution on a machine (not a VM) that has a fairly modern CPU. Most modern Intel CPUs support VT-x extensions and, similarly, AMD has its AMD-V extensions. These extensions are "enhancements" built right into the silicon of your CPU which enable faster and more secure virtualization. You have to enable these extensions from inside your motherboard's BIOS/UEFI menu. Refer to your motherboard manual for more information.
If you don't want to sully your perfectly working Linux workstation, you can use a virtual machine in the cloud to run these experiments. DigitalOcean, for example, offers virtual machines which have nested virtualization enabled, allowing you to run VMs inside your cloud-hosted VM. Obviously, this will be a very inefficient way to run a hypervisor in practice, but as an experiment it will do just fine. Make sure to get at least 4GB of memory and more than 2 CPUs.
Once you have enabled the said extensions, you can verify that by running lscpu and looking for the Virtualization entry:
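On an Intel box the entry reads VT-x; on an AMD box it reads AMD-V. For example:

$ lscpu | grep Virtualization
Virtualization:      VT-x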
Now that we have the extensions enabled, it is time to move further up the stack.
KVM (or Kernel-based Virtual Machine) consists of a few Linux kernel modules that take advantage of the CPU extensions we enabled earlier. QEMU, on the other hand, consists of a bunch of userland programs that provide us with emulation capabilities. As standalone software, QEMU can be used to run programs from one architecture, like ARM, on another, like x86_64, and vice versa. It can be used to run anything from a single binary file to a complete operating system.
We will, of course, use it only to virtualize x86_64 operating systems on an x86_64 platform. And for that we need just a single package:
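On Debian and its derivatives that package is typically named qemu-kvm:

$ sudo apt install qemu-kvm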
You can verify that the package has loaded all the required modules by running:
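On an Intel system you should see both the generic kvm module and kvm_intel (kvm_amd on AMD hardware); the sizes shown here are placeholders:

$ lsmod | grep kvm
kvm_intel             245760  0
kvm                   737280  1 kvm_intel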
That's all you need, theoretically. But you will soon realize that it's not practical. Virtual machines are enormously complex, and we require a software wrapper to manage all the various demands like networking, filesystem management, etc. in a fairly automated (and scalable) way. To do this we need the Libvirt virtualization library/daemon.
Libvirt is a quintessential part of your virtualization stack. The libvirtd daemon runs virtualization-related services in the background: services that listen for requests like "Create a VM", "Destroy a VM", "Create a Network", etc. and execute them using basic Linux utilities like the qemu binaries, iptables, and so on.
Libvirt is very generalized and can be used to manage KVM guests, LXC containers and the Xen virtualization stack. We will just focus on Libvirt for KVM guests for now. Libvirtd exposes an API that can be consumed by GUI applications like virt-manager or oVirt, or by command line tools like virt-install, virsh, etc. We can even write our own custom clients that use the same standard API. We will be using the command line tools, like virsh and virt-install, so as to keep things standardized.
Let’s install all these tools:
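The package names below are the usual Debian/Ubuntu ones and may differ slightly on your release:

$ sudo apt install libvirt-clients libvirt-daemon-system virtinst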
We will also need another package, libguestfs-tools, to help us edit or modify guest VMs' hard disks and filesystems.
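Again, assuming a Debian-style package name:

$ sudo apt install libguestfs-tools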
Great! Now we have installed the entire stack and know how the architecture is laid out. To use libvirt (and related tools) add your user to libvirt-qemu and libvirt groups.
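A sketch of that group change (log out and back in for it to take effect):

$ sudo usermod -aG libvirt,libvirt-qemu $(whoami)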
Or run the commands as root user.
The virsh command line utility is something you will use a lot, when managing your VMs. You can simply type in virsh and drop into the virsh command line interface, or type virsh <subcommand> [Options] from your regular shell. Go through the output of virsh help whenever you are stuck with some VM related operation.
The first virsh command we will use will invoke the default network to which a VM may connect:
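The pair of commands implied here start the network and mark it for autostart:

$ virsh net-start default
$ virsh net-autostart default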
This will start the default network and will make sure that it is started automatically when the host reboots. To check the details about this default network use the command:
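net-dumpxml prints the network's XML definition:

$ virsh net-dumpxml default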
The XML output shows you the range of possible IP addresses and how the guests will communicate with the outside world. Basically, their traffic will go through NAT and they won't be a part of your host's external network. You can use bridge networking to expose each VM to the host machine's LAN.
To start a virtual machine, we need installation media (like the installation ISO for any operating system), and we need to decide how many CPUs and how much memory to allocate to the VM, and whether it needs VNC. This step is where you can really appreciate a GUI installer like virt-manager; however, we will do it using a rather long virt-install command.
I like to keep all of my boot media at /var/lib/libvirt/boot and all the VMs and their virtual hard disks at /var/lib/libvirt/images (the default location); this simplifies the organization.
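A sketch of fetching an ISO into that directory; the URL and release below are placeholders, substitute whichever image you want:

$ cd /var/lib/libvirt/boot
$ wget https://releases.ubuntu.com/18.04/ubuntu-18.04-desktop-amd64.iso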
The command above fetches Ubuntu desktop ISO, you can just as easily get CentOS or any other distribution that you desire.
To create a new VM and to boot it run:
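A sketch of what such a command can look like; every value here (name, memory, vCPUs, disk size, ISO path, os-variant) is an assumption you should adjust:

$ virt-install \
    --virt-type kvm \
    --name ubuntu-guest \
    --memory 2048 \
    --vcpus 2 \
    --cdrom /var/lib/libvirt/boot/ubuntu-18.04-desktop-amd64.iso \
    --disk size=20 \
    --os-variant ubuntu18.04 \
    --graphics vnc,listen=0.0.0.0 \
    --noautoconsole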
The above command is, indeed, complicated. I suggest saving such commands in text files and running them as executable scripts whenever you create a new VM. Most of the parameters, like the virt type and the VM name, are pretty self-explanatory. They are just tedious to write down.
The last option, for the VNC display, will start a VNC server and allow you to have console access to your VM remotely, by connecting to the host's port 5900. Open a VNC client on your desktop and go to your KVM host's IP on port 5900. Make sure you reach the host's IP and not the VM's IP. Your VNC client will connect to the video output of your VM, and you can proceed with the installation.
From here on you can try pausing, stopping and deleting the VMs. You can also modify the underlying infrastructure by adding pools for storage and configuring bridge networks. All the configuration files, for individual VMs, network interfaces and storage pools are stored at /etc/libvirt/ and /etc/libvirt/qemu.
Sometimes you will have to manually delete the hard disk files saved at /var/lib/libvirt/images even after removing the VM from libvirt. To automate things further, try importing the qcow2 cloud images that most Linux distributions, like Ubuntu and CentOS, publish. These have the OS preinstalled in them.
Setting this up is nowhere near as easy as setting up VirtualBox, and the reasons behind this are manifold. Most of the stack is complicated because it is designed to be modular and highly scalable. It doesn't make any assumptions about where you are running the VM; the environment can be a personal desktop or a data center. Working with a GUI can help reduce this complexity to some extent. However, these systems are designed to work with a REST API, to connect to your organization's billing systems, monitoring systems, etc. They are almost never touched by a human after being deployed.
That said, automation is the name of the game with libvirt and qemu-kvm. Peruse the official documentation, write your own cool script to spin up a fleet of VMs, and let us know if you found this tutorial useful.
By vertical scaling, I mean the optimal resource requirement per server. For example, if you need something small, like 1 to 8 vCPUs and 1 to 32 GB of memory, then you can consider any mainstream cloud hosting provider like DigitalOcean, Azure or AWS. This is by far the most common option, and probably what you want. These services can scale to host anything from your hobby projects and personal blogs to the complete stack of a typical DevOps pipeline.
For larger requirements, like upwards of 128GB of memory, it makes more economic sense to rent a dedicated server from vendors like OVH. The upfront cost might be higher in the latter case, but over longer periods of time it is significantly cheaper.
You might want something completely managed by a cloud provider, including FaaS options like AWS Lambda and Azure Functions, or PaaS options like Google App Engine. However, these don't strictly count as a Linux environment, because you are restricted to the environment the vendor offers you, rather than having your own Linux environment with root privileges, etc.
Horizontal scalability refers to how you can scale your software across multiple servers. In this context, it also refers to the reach of your cloud service provider. Do they have data centers close to where your users might be? If you plan on hosting something like a VPN, the question is different: do they have data centers in regions which respect user privacy?
If you think that your user base will grow, then you need to plan ahead for this.
When it comes to servers, automation is the name of the game. These are not devices that you, as an owner, are meant to interact with. They should quietly run the application they are supposed to run and get out of your way.
With that in mind, a lot of vendors offer a whole array of services, from cloud-init to an HTTP API with which you can remotely spin up servers and configure them. If your team is familiar with the API of one vendor and not the other, then it is better to stick to the familiar one rather than re-writing your entire tool-set for another vendor.
Once you have a set of scripts automating everything for you, it will save you hundreds of human hours in the long run. And you will never have to worry about a server being misconfigured because of human error. Visit the API documentation pages for all the vendors that you are considering, before making a decision.
To take full advantage of the open source software that comes with Linux you need to make sure that your server provider offers a certain minimum standard of features.
In case of cloud hosted VMs, these may include backups and snapshots, block devices, object store, floating IPs, managed firewall, private networking, DNS server and a wide range of Linux distros. In case of a dedicated server the features are more hardware specific like the availability of IPMI, remote KVM and a useful management interface like OVH’s vRack.
Monitoring your server is another crucial factor to take into account when renting a Linux server. What level of monitoring does the platform allow? Of course, there are bonus points for alerting as well.
Services like CloudWatch Logs from Amazon are more and more commonplace across all vendors. They give you a very fine-grained glimpse of your system's health and availability. Third-party services have also popped up to help you take advantage of this: you just sign up, and they tap into the metrics and present them to you as intuitive graphs and alerts.
With Linux, disaster is not a matter of “If it happens” but “When it happens”. When you inevitably run into an issue, you need to make sure that the vendor offers affordable and timely support to get you out of the mess. Moreover, when things go wrong on your cloud provider’s end you should get an immediate alert about it. You will be thankful for a communicative vendor when things go wrong. On the other hand, lack of communication can lead to frustration and anxiety.
If support contracts are expensive for you, consider opting for a vendor around which there is an active community of developers and operators. People who have, probably, solved the problem that you are facing or who can point you in the right direction. If you adopt an arcane, and poorly documented technology, you are out on your own. This is true with any technology, in general, but especially so with platforms that are controlled by another party.
Standards like HIPAA and PCI are a rabbit-hole of their own. Whether or not you see any merit in them, is a different matter. The plain fact of the matter is that if you are designing products that need to meet certain legal standards, then you better dot your ‘i’s and cross your ‘t’s.
Consult with the experts and talk to your vendor about it. See if their infrastructure meets the various standards you need to comply with before making a business decision.
Thanks to the ever-growing number of open source projects, renting a Linux server from any cloud platform is not much of a technical hindrance. The decision really boils down to economics, geo-location, legalities and personal preference.
On that note, I hope you found the above factors conducive to thought. I hope that it will help you select your perfect platform.
Most web servers, like Nginx and Apache, listen on port 80 by default and need quite a bit of configuration before they start using certificates to encrypt the traffic. Even once that is configured, the web server can still happily serve plain HTTP traffic. So visitors to your website can just type http://example.com instead of https://example.com, and all of their traffic will remain unencrypted. To circumvent this issue, we need to configure the HTTP servers such that they themselves redirect all HTTP traffic to HTTPS.
The setup I have is using an FQDN with a public IP, so I will be issuing an SSL certificate from LetsEncrypt rather than issuing a self-signed one. Depending on the kind of web server you are using, you can do this in multiple ways. But the general flow of it is like this:
Let's demonstrate various ways to achieve what we want. The first is the easiest solution, which uses Certbot.
I will be using Nginx as the example server. If you are running a different one, like Apache or HAProxy, then just visit the official Certbot page and select your OS and your web server of choice. For Nginx on Ubuntu 18.04, these are the commands you would need.
First, update your repo index.
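That is simply:

$ sudo apt update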
You would need to add the required third-party repositories, which Ubuntu may not have enabled by default.
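At the time, Certbot's instructions for Ubuntu 18.04 used the universe repository and the certbot PPA; treat the exact repositories as an assumption and check Certbot's site for the current equivalent:

$ sudo apt install software-properties-common
$ sudo add-apt-repository universe
$ sudo add-apt-repository ppa:certbot/certbot
$ sudo apt update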
And then install the certbot package with Nginx plugins, using the command below.
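The package names below are the Ubuntu 18.04 ones; newer releases ship python3-certbot-nginx instead:

$ sudo apt install certbot python-certbot-nginx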
The instructions will be different for different platforms and will install plugins for the web server, if available. The reason plugins make our lives so much easier is that they can automatically edit the configuration files of the web server to redirect the traffic as well. The downside is that if you are running a very customized server for a pre-existing website, the plugin may break some stuff in there.
For new websites, or very simple configurations, like a reverse proxy, the plugin works surprisingly well. To obtain the certificates and to redirect the traffic, simply run the below command and follow through the various interactive options as the package walks you through them.
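The interactive run boils down to a single command:

$ sudo certbot --nginx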
Output:
IMPORTANT NOTES:
– Congratulations! Your certificate and chain have been saved at:
Your key file has been saved at:
As shown in the above example, you only have to provide a valid email address and your domain name to get the certificate. This certificate is stored in /etc/letsencrypt/live/SUBDOMAIN.DOMAINNAME.TLD. The last directory will be named after your FQDN.
The most important part is selecting the Redirect option, which will do the job of redirecting all the HTTP traffic to HTTPS. If you are curious about what these changes are, you can inspect the config files in /etc/nginx/ to get the gist of it.
You may, instead, want to manually configure your server to use the certificates. To get just the certificates using certbot, run:
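The certonly subcommand fetches a certificate without touching the web server's config; the domain name here is, of course, a placeholder:

$ sudo certbot certonly --nginx -d yourdomainname.com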
As before, the certificates are saved in the directory /etc/letsencrypt/live/yourdomainname.com/
Now we can configure Nginx to use the files in this directory. First things first, I will get rid of the Debian-specific directory layout. The default site's config file lives in the /etc/nginx/sites-available subdirectory, with a symlink to it in /etc/nginx/sites-enabled. I will just delete the symlink and move the config file to /etc/nginx/conf.d with a .conf extension, to keep things more generalized and applicable to other distros as well.
I will be modifying this default config file to demonstrate how the TLS is enabled.
The following are the contents of the default config file, without the commented-out sections. The highlighted sections are the ones that you ought to add to your server configuration in order to enable TLS, and the last block in this config file detects whether the request is using TLS or not. If TLS is not being used, it simply returns a 301 redirect code to the client and rewrites the URL to use https instead. This way, you won't miss out on users who arrive over plain HTTP.
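A minimal sketch of such a server block, assuming the certificate paths from the previous step; the domain name and webroot are placeholders you should replace:

server {
    listen 80 default_server;
    listen [::]:80 default_server;
    listen 443 ssl default_server;
    listen [::]:443 ssl default_server;

    ssl_certificate /etc/letsencrypt/live/yourdomainname.com/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/yourdomainname.com/privkey.pem;

    root /var/www/html;
    index index.html index.htm index.nginx-debian.html;
    server_name yourdomainname.com;

    location / {
        try_files $uri $uri/ =404;
    }

    # last block: if the request did not arrive over TLS, redirect it
    if ($scheme != "https") {
        return 301 https://$host$request_uri;
    }
}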
There are a few extra parameters you can add to this config file, including parameters declaring the session timeout, the TLS versions you want to allow and the encryption ciphers the server will use. These can be borrowed from Certbot's recommended (but optional) configuration for Nginx.
Now, check if the configuration file is valid and restart the server.
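On a systemd-based distribution, that is:

$ sudo nginx -t
$ sudo systemctl restart nginx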
You can apply the same approach for more complicated web apps and services that need HTTPS. LetsEncrypt lets you issue certificates for multiple domain names at once, and you can host multiple websites behind your Nginx web server quite easily. If you followed the above example, try reaching your website over HTTP (http://SUBDOMAIN.DOMAIN.TLD) and you will be redirected to HTTPS automatically.
For other web servers, like Apache, use the appropriate Certbot plugin or refer to their official documentation.
Given how involved the process of benchmarking is, and how important it is when making a decision, we need a standard set of tools that we can use to benchmark our systems, get an easy-to-understand result, and use it to compare different hardware components and configurations effectively.
Here are a few free benchmarking tools that cover a wide array of hardware and use cases.
Now that the PC and desktop computing war between AMD and Intel, and also between AMD and Nvidia, is raging at an all-time high, this benchmark is strongly recommended. It can be used to push both your CPU and GPU to the utmost for workloads like video rendering and content creation.
The reason it is first on the list is that it is cross-platform. You can install it on macOS, Windows and, of course, Linux. The cross-platform nature of the software can also help you choose the best operating system for your rig, on top of letting you compare various hardware options.
The Phoronix Test Suite offers a more complete set of tools for benchmarking nearly any aspect of your system. Moreover, it is completely open source and not just free to use. The framework it offers is extensible and can accommodate any number of different tests that you may want to see your system perform. It is extremely powerful, flexible and useful for sysadmins as well as desktop enthusiasts.
Moreover, the official Phoronix website offers a very in-depth analysis of benchmarking procedures, in case you are new to this field. Their post detailing the impact of the Spectre and Meltdown mitigation patches on your system's performance is something I can personally recommend.
Your SSDs are perhaps not the foremost consideration while building a PC or a server, but they are important. Faster SSDs lead to snappier systems. The reason is quite simple: modern CPUs and memory are fast enough that once a program or piece of data reaches them, it can quickly be read or executed.
Secondary storage, like your SSDs, is the major bottleneck. The longer it takes for information to reach your CPU, the slower your experience will be. IOzone lets you have a really close peek at how your storage is doing. Sequential reads, sequential writes, as well as random IOPS, all need to be considered to select your perfect SSD.
Workloads like video streaming can benefit from higher sequential reads, whereas databases really benefit from higher random IOPS. So storage benchmarking is never as simple as running dd against a disk.
We have talked a lot about storage and compute; that leaves out one thing, and that is networking. While there are a ton of tools for network engineers to benchmark and monitor their 10G interfaces, I wanted to talk about a different layer of networking altogether.
The Web Latency Benchmark is a benchmark for your web browser from Google. This cross-platform benchmark is quite useful when comparing the real-world performance of your web browser. Things like delay between keystrokes and browser responses, scroll latency and jank and a few other things are measured by the benchmark.
Browsers are something we spend a lot of time working in; if the performance of Firefox and Chrome differs even slightly, it is worth the time to benchmark them and pick the better one.
Yes, the archival tool 7-Zip comes with its own benchmarking tool built into it. If your workload involves a lot of compressing and uncompressing, then this benchmark is really worth considering.
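Assuming the p7zip package is installed on your Linux system, the built-in benchmark is a single subcommand:

$ 7z b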
You can take this tool even further; things like running a password brute-force attack or a dictionary attack are possible workloads built around 7-Zip. If you want to see the difference between your CPU and GPU when handling these kinds of workloads (which can be multithreaded easily), 7-Zip has a lot to offer.
Before you start running benchmarks on your own system, I would highly encourage you to check out PassMark Software's website and try to infer what the different CPU benchmarks show and reflect. There's a multithreaded score, a single-threaded score, and different CPUs run at different clock speeds. In short, there is quite a bit of variation.
Try and picture yourself as someone trying to pick one of the CPUs for their own build, how would you decide which one is better for you? Good benchmarks should answer these questions for you.
Once you have created and started your NFS server, go to your Windows 10 machine, search for "Turn Windows Features on or off" and it will open a window with a list of available features. Search for NFS, and within the NFS sublist select the NFS client utility.
That’s it. Click OK and let Windows do its thing. You may have to reboot your system to let this feature kick in. Now go to your File Explorer, and in the text box where you typically write the path to a folder, type in the IP address of your NFS server prefixed by two backslashes, as shown:
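With the server described below, that address bar entry looks like this (substitute your own server's IP):

\\192.168.0.104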
In my case, the IP address of my NFS server was 192.168.0.104; the two backslashes before it tell Windows that what follows is an address for another computer, not a drive letter or something else. Congrats, you can now get the benefits of ZFS on Windows 10!
Now you can use this folder as if it were a part of your desktop computer. You get the benefits of ZFS, its reliability, its robustness, etc. And you also get the flexibility to work with the favorite software of your choice. This is, in fact, a very common practice in the enterprise, where the work directories of all employees are hosted remotely. This way, even if a careless employee does something catastrophic to their computer, the data on the remote end will be safe. Features like ZFS snapshots can help you take periodic snapshots of your work and store them in a read-only format.
Ransomware can encrypt your folder, maybe even your remote folder, if it is mounted and writable, but it can’t do anything to your snapshots which are read-only. If the ransomware targets Windows 10, it most likely can’t take into account a Linux or BSD underbelly, so that’s also another added layer of security (although not a bullet-proof one).
Speaking of security, when using such a setup you have to make absolutely sure that the network you are on, your home or office LAN, is trustworthy, and that there are no rogue parties on it. You certainly don't want to host it over open Wi-Fi (that is, Wi-Fi without a password) or on any network where you don't trust every computer or device attached to it.
NFS traffic is not encrypted, and any device that is a part of that NFS server’s network can snoop at the traffic flow, even if it doesn’t have direct read-write access to the files being hosted.
The answer to this depends heavily on the reliability of your NFS server and the speed of the connection between the server and the client. But even a pair of mirrored SSDs is more reliable than not having anything. If your data is crucial but you want local performance, you can do a periodic sync instead of working on the remote folder directly.
For workloads like video editing, design and other content creation, where you spend hours fine-tuning everything and a single system crash can result in hours of lost work, NFS can be a real lifesaver. Even when working with text files, like large Git repos, this can be helpful. ZFS’ copy-on-write mechanism can prevent partial writes so you avoid rampant data corruption that follows after a power failure or system crash.
For people who run benchmarking workloads, or just have to do clean installs many times a day, you can save yourself a ton of time and Internet bandwidth by locally hosting your preconfigured system images, which can be used to bring a new build up to speed in a matter of minutes.
Projects like Steamcache have really gone above and beyond when it comes to saving you both bandwidth and time. You can cache your game setups on an NFS server and reinstall your entire Steam library whenever the need arises. This also frees up space on your local disk. A write-up by Ars Technica was a real inspiration behind this article and the use cases I mentioned above.
More and more people work from home these days. Your desktop, and the data stored on it, are crucial for your work, and it is really worth the time and effort to create a small local backup solution for yourself, if you can. While solutions like Creative Cloud, Google Docs and Backblaze are really promising for various creative endeavors, we should remember that the cloud is just someone else's computer. Nothing is really bulletproof. The difference between having an added layer of redundancy and reliability, and not having anything, can really make or break your day.
With the Caddy web server, you get HTTPS or nothing. So let's see how you can install Caddy on Ubuntu and configure it to serve your web app. We will be getting our TLS certificates from LetsEncrypt.
Assume you have a VPS with the IP address 10.20.30.40 and an FQDN subdomain.example.com whose A record is pointing at this IP.
The VPS is running Ubuntu 18.04 LTS server edition, and the following configuration is done as the root user.
Caddy is written in Go and can run as a standalone executable binary. However, there are various plugins that you can build into it for specific DNS providers, etc. We will be installing the plain binary, without any plugins, so it works across all setups.
To get your binary visit their official downloads page and select all the plugins and telemetry that you require. Below it will be a bash command to download and place the caddy server binary in the right location. As root user, run:
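At the time of writing, the download page generated a one-liner along these lines; treat the exact URL and flag as assumptions and copy the command the site gives you instead:

$ curl https://getcaddy.com | bash -s personal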
Once that is done, we can locate the binary, by running:
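For example (the install script typically drops the binary in /usr/local/bin):

$ which caddy
/usr/local/bin/caddy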
If you ever need to remove the server, or update it with newer executable, you now know where to look.
If you don't have a website, just create an empty folder and run the commands in there. You may get an Error 404 in your browser, but the server setup can still be tested. If you do have a website, navigate to the directory containing its webroot. As a typical example, I will use /var/www/mysite with the following index.html stored inside it.
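The page itself does not matter much; a placeholder like this will do:

<!DOCTYPE html>
<html>
  <head><title>My Site</title></head>
  <body>
    <h1>Served by Caddy</h1>
  </body>
</html>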
This is enough to get us started. Now, in the same directory as this index.html page, run the following command:
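Running the bare binary with no arguments serves the current directory on port 2015 over plain HTTP:

$ caddy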
WARNING: File descriptor limit 1024 is too low for production servers. At least 8192 is recommended. Fix with `ulimit -n 8192`.
Leave caddy running in this state.
You can go to your server's public IP on port 2015 to test this: http://10.20.30.40:2015. Make sure that your firewall is not blocking this port.
You will see that index.html is automatically served. This follows the age-old convention that a website's first page is named index, which most web servers, like Nginx, Apache and even Caddy, serve up as the first page, even when you don't specify it by appending /index.html to the URL.
Now that you have confirmed that your website does indeed work with Caddy and can be served by it, it is time to set up HTTPS. To do this you can use the command line interface, or a config file called a Caddyfile. We will use the command line first.
In the same directory as your website, run the following command:
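With Caddy v1, telling it which hostname to serve was enough to trigger automatic HTTPS; a sketch, substituting your own FQDN:

$ caddy -host subdomain.example.com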
Output:
That’s it! Your website is now up and running. You can visit subdomain.example.com and it will be automatically redirected to HTTPS without any custom port number or other nuances.
It is that easy! You can hit CTRL+C to stop the server; the next time you start it, it will just reuse this certificate.
The above method is good for experimental use cases where you are just testing the waters. But if you want a web server running as a background process, you need to write a Caddyfile and tell the web server to use this configuration.
This is the simplest example for the same website we hosted above:
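A minimal sketch of such a Caddyfile, using the same hostname and webroot as above (Caddy v1 syntax):

subdomain.example.com {
    root /var/www/mysite
}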
The root directive tells the web server where the website is located; clients can't get out of this directory. It is generally a good idea to place your Caddyfile anywhere but inside this webroot; you can place it in the /etc/ folder or your home directory. For example, if the file is created at /etc/Caddyfile, you can tell the server to use this configuration by running the command:
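With Caddy v1 that was the -conf flag:

$ caddy -conf /etc/Caddyfile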
There are multiple directives that you can use to fine tune your server. You can enable logging, compression, reverse proxy, etc. The official documentation is a good place to start looking for directives related to your use case. Here’s another example where two websites with two different domain names are being served:
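A sketch, with hypothetical domain names, webroots and log paths:

site1.example.com {
    root /var/www/site1
    gzip
    log /var/log/caddy/site1.log
}

site2.example.com {
    root /var/www/site2
    gzip
    log /var/log/caddy/site2.log
}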
The gzip directive enables compression, if the client supports it. This improves performance, as more data can be sent over the same bandwidth in the same interval of time. Logging helps with debugging and keeping track of network activity.
The greatest strength of the Caddy web server is its easy-to-write and easy-to-read config file, and its flexibility across multiple platforms. However, due to its unusual licensing, the server is not strictly open source. The source code is open source, and you can totally compile it yourself and use the resulting executable, but the binary that you receive from the official site is not meant to be used for commercial purposes without a proper license.
This brings us back to the issue of complications, where instead of dealing with just config files, we also have to deal with compiling the source code, defeating the purpose of an easy-to-use web server. Let us know if you have any thoughts on Caddy, and whether any of your websites run on top of it.
I will detail the creation of an NFS mount point on a Windows 10 client in Part 2 of this series. For now, let's focus on an Ubuntu server offering NFS storage and an Ubuntu client trying to connect to it.
My NFS server is going to be based on Ubuntu 18.04 LTS. You can use your favorite Linux distro or FreeBSD, or any other OS that supports OpenZFS. My reason for using Ubuntu 18.04 is that it is quite popular and would considerably reduce the barrier of entry.
The NFS is supposed to be available only on my LAN which has the subnet mask of 255.255.255.0 and 192.168.0.1 as its default gateway. In plain English, this means that all the devices connected to my home network (WiFi and Ethernet, et al) will have IP addresses ranging from 192.168.0.2 through 192.168.0.254.
The NFS server will be configured to allow only devices with the aforementioned IP addresses to have access to it. This would ensure that only devices connected to my LAN are accessing my files, and the outside world can't reach them. If you have an 'open Wi-Fi' setup, or if the security on your router is dubious, this does not guarantee any security.
I wouldn't recommend running NFS over the public Internet without additional security measures.
Lastly, the commands to be run on the NFS server have the prompt server $, and the commands to be run on the client side have the prompt client $.
If you already have a zpool up and running, skip this step. On my NFS server, which is running Ubuntu 18.04 LTS server, I first install OpenZFS.
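On Ubuntu 18.04 the OpenZFS userland and kernel modules come from a single package:

server $ sudo apt install zfsutils-linux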
Next we will list all the available block devices, to see the new disks (and partitions) waiting to be formatted with zfs.
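lsblk is the quickest way to do that; the device names and sizes below are just an illustration:

server $ lsblk
NAME   MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
sda      8:0    0   50G  0 disk
└─sda1   8:1    0   50G  0 part /
sdb      8:16   0  100G  0 disk
sdc      8:32   0  100G  0 disk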
A typical example is shown above, but your naming convention might be wildly different. You will have to use your own judgement, and be very careful about it. You don’t want to accidentally format your OS disk. For example, the sda1 partition clearly has the root filesystem as its mount point so it is not wise to touch it. If you are using new disks, chances are they won’t have a mount point or any kind of partitioning.
Once you know the names of your devices, we will use the zpool create command to format a couple of these block devices (called sdb and sdc here) into a zpool with a single vdev that is made up of two mirrored disks.
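The pool name tank is the one used in the rest of this article:

server $ sudo zpool create tank mirror sdb sdc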
Moving forward, you can add disks in sets of two (each set becoming a new vdev) to grow the size of this zpool; the new vdevs will show up as mirror-1, mirror-2, etc. You don't have to create your zpool the way I did: you can use mirroring with more disks, you can use striping with no redundancy but better performance, or you can use RAIDZ. You can learn more about it here.
At the end of the day, what matters is that we have created a zpool named tank, upon which the NFS share will live. Let's create a dataset that will be shared. First, make sure that the pool, named 'tank', is mounted. The default mount point is '/tank'.
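A sketch of those two steps; the dataset name nfsshare is an arbitrary choice of mine:

server $ zfs list tank
server $ sudo zfs create tank/nfsshare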
When sharing an NFS directory, the superuser on the client system doesn't have access to anything on the share. While the client-side superuser is capable of doing anything on the client machine, the NFS mount is technically not a part of the client machine. So allowing operations on behalf of the client-side superuser, mapped as the server-side superuser, could result in security issues. By default, NFS maps client-side superuser actions to the nobody:nogroup user and group. If you intend to access the mounted files as root, the dataset on our NFS server should also carry the corresponding permissions.
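That is, something like the following, assuming the dataset created above:

server $ sudo chown -R nobody:nogroup /tank/nfsshare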
The NFS server will run any action by the client-side root as user nobody, so the above permission will allow the operations to go through.
If you are using a different (regular) username, it is often convenient to have a user with the same exact username on both sides.
Once you have Zpool created, you should install the nfs server package from your package manager:
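On Ubuntu that package is nfs-kernel-server:

server $ sudo apt install nfs-kernel-server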
Traditionally, an NFS server uses the /etc/exports file to get a list of approved clients and the files they will have access to. However, we will be using ZFS' built-in feature to achieve the same.
Simply use the command:
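The sharenfs property turns sharing on for a dataset:

server $ sudo zfs set sharenfs=on tank/nfsshare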
Earlier, I alluded to giving only certain IPs access. You can do so as follows:
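A sketch, restricting read-write access to the 192.168.0.0/24 subnet described earlier:

server $ sudo zfs set sharenfs='rw=@192.168.0.0/24' tank/nfsshare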
The 'rw' stands for read-write permissions, and that is followed by the range of IPs. Make sure that port numbers 111 and 2049 are open on your firewall. If you are using ufw, you can check that by running:
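For example:

server $ sudo ufw status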
Make a note of your server’s IP on the LAN, by using ifconfig or ip addr command. Let’s call it server.ip
Once the share is created, you can mount it on your client machine, by running the command:
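Assuming the client has the NFS utilities installed (the nfs-common package on Ubuntu), the mount looks like this:

client $ sudo mount -t nfs server.ip:/tank/nfsshare /mnt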
This will mount the NFS share on /mnt folder but you could have just as easily picked any other mount point of your choice.
File sharing is probably the most important aspect of system administration. It improves your understanding of the storage stack, networking, user permissions and privileges. You will quickly realize the importance of the Principle of Least Privilege; that is to say, give a user only the barest possible access it needs to do its job.
You will also learn about the interoperability between different operating systems. Windows users can access NFS files, so can the Mac and BSD users. You can’t restrict yourself to one OS when dealing with a network of machines all having their own conventions and vernacular. So go ahead and experiment with your NFS share. I hope you learned something.
RAID concerns itself with live data; it is a mechanism with which a running system combines multiple disks into a single storage entity. The data is then spread across all the disks in such a way that it can survive the failure of at least one (or more) of the physical disks. The simplest type of RAID array is RAID 1, or mirroring. This is where you copy (or mirror) the same data across two or more disks, such that if one of the disks fails, the data still survives and can still be actively used. There are other RAID configurations as well, and we will discuss those as we go along.
RAID, or Redundant Array of Inexpensive Disks, is a mechanism to store data across disks. There is a wide "array" of RAID setups that you can go with, but the two basic mechanisms that they are all based on are the following:
Mirroring implies that your data blocks are copied, mirrored, across multiple disks. If you mirror your data across three disks, you can survive up to two disks failing at any given time; the failed disks can then be replaced with new ones without much hassle. More generally, if you copy data across n+1 disks, you can withstand up to n disks failing. The downside is that you only get the storage capacity equal to the smallest disk in your RAID array.
A second approach is to split your data into two parts; using the two blocks of user data you can compute a third 'parity' block. The three blocks are all of the same size and are spread across different devices. A minimum of three devices is necessary for this configuration to work. If any one of the disks fails, you can recreate the blocks stored on that disk using the other two blocks. For example, if the second user block is lost, the first block and the parity block can be used to compute the second user block. If you are interested in how this works, check out this wonderful explanation.
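The usual way single parity is computed is bitwise XOR: the parity block is the XOR of the data blocks, and XORing the parity with the surviving block gives back the lost one. A minimal sketch in Python, with made-up two-byte blocks:

>>> block1 = bytes([0b10110100, 0b01100001])
>>> block2 = bytes([0b11001010, 0b00011111])
>>> parity = bytes(a ^ b for a, b in zip(block1, block2))
>>> # pretend block2 was lost; rebuild it from block1 and the parity block
>>> bytes(a ^ p for a, p in zip(block1, parity)) == block2
True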
This method can be improved upon further to have 2 or even 3 parity blocks. But more than 3 parity blocks aren’t seen in the industry that often. If you have one parity block you can survive one disk failure. Two parity block means you can withstand two disks failing and so on.
It is more efficient in terms of storage utilization than mirroring. With one parity block, you only need 50% more physical storage than the actual user data you are storing. This means that to store 1GB of data you will need 1.5GB of storage (plus a small overhead for the metadata). This is way more efficient than even the most efficient mirroring scheme, where you need at least 2GB of storage to mirror 1GB of data between two disks.
The downside is that random write operations are slowed down, thanks to the extra computation and write operation associated with the parity block. Also, the reliability isn't as good as that of n+1 mirrored disks, where you can prepare for any arbitrary number of disks failing.
RAID configurations can be as complex or as simple as you like; you can combine the parity and mirroring strategies and modify them to your enterprise's liking. There are dedicated RAID controllers to which you connect your physical disks, and the OS then sees a single logical disk as presented by the controller. LSI is one such vendor of RAID controllers. You can also perform RAID in software; OpenZFS is probably the best bet you have in that regard.
One last kind of RAID that gets an honourable mention is RAID 0. Technically, it is not a RAID scheme, because there is no redundancy involved. The idea behind RAID 0 is to simply spread your data across multiple storage devices without any resilience against disk failures. The advantage is a performance improvement. If you are writing 1GB of data to a single disk, the process is slow: the disk can only do a limited number of write operations per second, and your OS has to wait for it to finish one operation before new data is sent its way. If you spread the same 1GB of data across two such disks, you can write to (and read from) both of them simultaneously and gain quite a bit of performance.
The concept of backups is arguably more important than that of RAID. A backup, in the context of storage management, is a known good copy of data from a given point in time, from which you can restore files back into your main system when needed. In terms of implementation, there are many cloud-hosted solutions, and many offline ones as well, that can be used.
Tarsnap and Backblaze are my favorite managed backup services for both private and business use cases. You can also include Google Drive, iCloud or Dropbox in this definition of a backup solution but they are targeted more towards the consumer market than the enterprise. However, the underlying principle is still the same. When you sign in to a new iPhone or iPad all the data, your contacts, photos, media library etc, is synced from your iCloud account seamlessly and as you keep using your device the newer data gets silently backed into the Cloud and you don’t have to worry about it.
Your backup solution can be as simple as copying data to an external hard disk, or using rsync (or zfs send, if you are using OpenZFS) to periodically generate a copy of all the relevant information. This could include your Documents folder, your database, your source repository, or even your entire root file system packed into a flat zip or tarball. The important criteria that a good backup solution should meet are the following:
Now that we know a little bit about both RAID and backup, let’s highlight some differences between them.
RAID is always concerned with blocks of data, not how the filesystem presents that data to the user. Both software and hardware RAID deal with data as blocks of information; the size of the blocks may vary from 128 KiB to 1 MiB.
Backups on the other hand are much more flexible. They usually are performed on the file system level, although there is no hard and fast rule for this to be the case. They are also more granular. You can restore a single file from your backup, if your solution is flexible enough. RAID arrays are not backups, they are just a way to spread data across multiple disks. If a file is deleted, all its mirrored blocks and parity blocks are freed. End of story.
Backups are for everyone. The approach and extent may vary from personal use case to enterprise, but everyone with a digital life needs backup. RAID is more of a business/enterprise specific feature. You see RAID arrays in servers, storage devices like NAS and SANs, cloud hypervisors, etc. Pretty much any place that stores live critical data uses some form of RAID. Even the servers that run your cloud hosted backups probably use RAID arrays. These are not mutually exclusive technologies.
This doesn't mean you can't use RAID for your personal use case; it just has more utility in the enterprise. Part of the reason is that in the enterprise, disks are pounded with IO operations 24/7. In a production environment, like the storage backing a database, a video streaming service or a cloud hypervisor, the storage devices of your server will be under constant, grueling load; data is constantly being read from and written to these devices, often by several applications simultaneously. In these conditions your drives are much more likely to fail. Having a RAID configuration means that if a drive fails, you suffer little or no downtime. Most servers can continue to operate even after a disk failure, so you don't lose the new information and requests coming in every second.
An average desktop computer can hardly recreate the same stressful conditions, and even if the disk dies, if you are using a backup solution like Backblaze you can retrieve most of your lost data; losing a few hours' worth of work is probably the worst thing that can happen. Even this is becoming a rarity, thanks to cloud-hosted solutions like Adobe Creative Cloud, Office 365, etc.
If there is a single takeaway you want from this article, it should be this: RAID is NOT a substitute for backup. Always back your data up! There are many people out there who think that if you have RAID, the data is safe across multiple disks and so there is no need to back it up. Nothing could be further from the truth. RAID is meant to deal with a single specific issue: disks failing or returning erroneous data. Having RAID won't protect you from a million other threats like the following:
The data on your RAID array is live. If the OS, an application (or a user) goes haywire and deletes a few files here and there, then those files will be deleted all across your RAID array. Having a separate copy of your data, a backup, is the only way you can ever protect yourself against this kind of scenario.
If you are worried about your data, your first concern should be a backup solution. Most desktop users, except maybe power users, should invest more into a reliable backup instead of fiddling with RAID 1, RAID 5 or RAIDZ. If you want to build your own backup server, you need to think of a decent backup policy and a reliable storage backend. This article may be a good place to start. You can use rsync or zfs send to make periodic copies of your data to this backend.
If you are in the enterprise and are considering a RAID solution to store all of your live data, consider using OpenZFS. It offers a very flexible solution: everything from n-disk mirroring to RAIDZ1 with one parity block, and RAIDZ2 and RAIDZ3 with 2 and 3 parity blocks. You need to think a lot about your application's requirements before making a decision; there are trade-offs between read-write performance, resilience and storage efficiency. However, I would recommend that you only think about RAID after you have decided upon a backup solution.
The Internet is an untrusted channel of communication. When you send or receive information from an old HTTP site, http://www.example.com, in your browser, a lot of things can happen to your packets mid-way.
The first two problems can be solved by encrypting the message before it is sent over the Internet to the server. That is to say, by switching over to HTTPS. However, the last problem, the problem of Identity is where a Certificate Authority comes into play.
The main problem with encrypted communication over an insecure channel is “How do we start it?”
The very first step would involve the two parties, your browser and the server, exchanging encryption keys over the insecure channel. If you are unfamiliar with the term keys, think of them as really long, randomly generated passwords with which your data will be encrypted before being sent over the insecure channel.
Well, if the keys are sent over an insecure channel, anyone can listen in on that exchange and compromise the security of your HTTPS session in the future. Moreover, how can we trust that the party sending a key while claiming to be www.example.com is indeed the actual owner of that domain name? We could be having an encrypted conversation with a malicious party masquerading as a legitimate site and not know the difference.
So, the problem of ensuring identity is important if we wish to ensure secure key exchange.
You may have heard of LetsEncrypt, DigiCert, Comodo and a few other services that offer TLS certificates for your domain name. You can choose the one that fits your needs. Now, the person or organization who owns the domain has to prove in some way to their Certificate Authority that they indeed control it. This can be done either by creating a DNS record with a unique value in it, as requested by the Certificate Authority, or by adding a file to your web server with contents specified by the Certificate Authority; the CA can then read this file and confirm that you are the valid owner of the domain.
Then you negotiate a TLS certificate with the CA, and that results in a private key and a public TLS certificate issued for your domain. Messages encrypted with your private key can then be decrypted with the public certificate, and vice versa. This is known as asymmetric encryption.
Client browsers, like Firefox and Chrome (and sometimes the operating system itself), ship with knowledge of the Certificate Authorities. This information is baked into the browser/device from the very beginning (that is to say, when they are installed), so they know that they can trust certain CAs. Now, when they connect to www.example.com over HTTPS and see a certificate issued by, say, DigiCert, the browser can actually verify that using the CA certificates stored locally. Actually, there are a few more intermediary steps to it, but this is a good simplified overview of what's happening.
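If you want to look at this chain of trust yourself, the openssl command line tool can print the certificate chain a server presents; for example:

$ openssl s_client -connect www.example.com:443 -servername www.example.com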
Now that the certificate provided by www.example.com can be trusted, it is used to negotiate a unique symmetric encryption key, which is then used between the client and the server for the remainder of their session. In symmetric encryption, one key is used for both encryption and decryption, and it is usually much faster than its asymmetric counterpart.
If the idea of TLS and Internet security appeals to you, you can look further into this topic by digging into LetsEncrypt and their free TLS CA. There's a lot more minutiae to this entire rigmarole than stated above.
Other resources that I can recommend for learning more about TLS are Troy Hunt’s Blog and work done by EFF like HTTPS Everywhere and Certbot. All of the resources are free to access and really cheap to implement (you just have to pay for domain name registration and VPS hourly charges) and get a hands on experience.
Let's talk about the regular, non-UEFI, boot process first: what happens between the point in time where you press the power button and the point where your OS boots and presents you with a login prompt.
Step 1: The CPU is hardwired to run instructions from a physical component, called NVRAM or ROM, upon startup. These instructions constitute the system's firmware. And it is in this firmware that the distinction between BIOS and UEFI is drawn. For now, let's focus on BIOS.
It is the responsibility of the firmware, the BIOS, to probe various components connected to the system like disk controllers, network interfaces, audio and video cards, etc. It then tries to find and load the next set of bootstrapping code.
The firmware goes through storage devices (and network interfaces) in a predefined order, and tries to find a bootloader stored within them. This process is not something a user typically involves herself with. However, there’s a rudimentary UI that you can use to tweak various parameters concerning the system firmware, including the boot order.
You enter this UI typically by holding the F12, F2 or DEL key as the system boots. To find the specific key in your case, refer to your motherboard's manual.
Step 2: The BIOS then assumes that the boot device starts with an MBR (Master Boot Record), which contains a first-stage boot loader and a disk partition table. This first block, the boot block, is small, so the bootloader is very minimalist; it can't do much else, for example, read a file system or load a kernel image.
So the second stage bootloader is called into being.
Step 3: The second-stage bootloader is responsible for locating the proper operating system kernel and loading it into memory. The most common example, for Linux users, is the GRUB bootloader. If you are dual-booting, it even provides you with a simple UI to select the appropriate OS to start.
Even when you have a single OS installed, the GRUB menu lets you boot into advanced mode, or rescue a corrupt system by logging into single-user mode. Other operating systems have different boot loaders; FreeBSD comes with one of its own, as do other Unices.
Step 4: Once the appropriate kernel is loaded, there's still a whole list of userland processes waiting to be initialized. This includes your SSH server, your GUI, etc. if you are running in multi-user mode, or a set of utilities to troubleshoot your system if you are running in single-user mode.
Either way, an init system is required to handle the initial process creation and the continued management of critical processes. Here again we have a list of different options, from the traditional init shell scripts that primitive Unices used, to the immensely complex systemd implementation, which has taken over the Linux world and has its own controversial status in the community. BSDs have their own variant of init which differs from the two mentioned above.
This is a brief overview of the boot process. A lot of complexities have been omitted, in order to make the description friendly for the uninitiated.
The part where UEFI vs BIOS difference shows up is in the very first part. If the firmware is of a more modern variant, called UEFI, or Unified Extensible Firmware Interface, it offers a lot more features and customizations. It is supposed to be much more standardized so motherboard manufacturers don’t have to worry about every specific OS that might run on top of them and vice versa.
One key difference between UEFI and BIOS is that UEFI supports the more modern GPT partitioning scheme, and UEFI firmware has the capability to read files from a small FAT file system.
Often, this means that your UEFI configuration and binaries sit on a GPT partition on your hard disk. This is known as the ESP (EFI System Partition), typically mounted at /efi or /boot/efi.
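A quick way to check on Linux whether you booted in UEFI mode, and where the ESP is mounted, is to look for the efi directory under /sys and in the mount table:

$ ls /sys/firmware/efi && echo "Booted in UEFI mode"
$ mount | grep -i efi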
Having a mountable file system means that your running OS can read the same file system (and dangerously enough, edit it as well!). Many malware exploit this capability to infect the very firmware of your system, which persists even after an OS reinstall.
UEFI, being more flexible, eliminates the necessity of having a second-stage boot loader like GRUB. Oftentimes, if you are installing a single (well-supported) operating system like Ubuntu desktop or Windows with UEFI enabled, you can get away with not using GRUB or any other intermediate bootloader.
However, most UEFI systems still support a legacy BIOS option, and you can fall back to this if something goes wrong. Similarly, if the system is installed with both BIOS and UEFI support in mind, it will have an MBR-compatible block in the first few sectors of the hard disk. Likewise, if you need to dual boot your computer, or just want a second-stage bootloader for other reasons, you are free to use GRUB or any other bootloader that suits your use case.
UEFI was meant to unify the modern hardware platform so operating system vendors could freely develop on top of it. However, it has slowly turned into a somewhat controversial piece of technology, especially if you are trying to run an open source OS on top of it. That said, it has its merits, and it is better not to ignore its existence.
On the flip side, legacy BIOS is also going to stick around for at least a few more years. Understanding it is equally important in case you need to fall back to BIOS mode to troubleshoot a system. I hope this article has informed you well enough about both technologies that the next time you encounter a new system in the wild, you can follow the instructions of an obscure manual and feel right at home.
With that in mind, I would always suggest that users look for alternatives for deploying their applications and running their websites. This makes IT personnel more agile, as they can work across a wide range of platforms, and it makes your application platform-independent. Let's look at some such alternatives and see what they have to offer.
Using LunaNode has been a pure pleasure: a clean and intuitive GUI and a feature-rich platform with a whole range of supported operating systems and apps. Seriously, there's little I can say here that will be as compelling as checking out their service first-hand, so I strongly recommend you do that first.
The pricing is very competitive with vendors like DigitalOcean, so you don't have to worry about any surprise bills. You can spin up a whole range of virtual machines straight from a pre-existing template, as you would on any other cloud platform, or you can install an operating system from the ground up.
Most vendors try their best to hide away console access and direct interaction with the virtualization environment, but not LunaNode. LunaNode encourages you to upload your own custom VM images or install the operating system right from an ISO file, just as you would on a physical machine or in VirtualBox. This means you can go very deep in the stack with your VM. Personally, I am a FreeBSD user and I was pleased to see FreeBSD templates and ISOs.
DevOps personnel will find the automation provided by the startup scripts feature an absolute delight. Every time you create a new VM, you don't want to repeat tasks such as creating users and updating packages. LunaNode lets you add Bash or cloud-init scripts that configure a VM every time you launch a new one, as sketched below.
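As an illustration, a startup script is just an ordinary shell script run on first boot. The following hypothetical sketch (the 'deploy' username and package list are placeholders of my own, not anything from LunaNode's docs) creates a user, applies updates and enables a firewall on a fresh Ubuntu VM:
#!/bin/bash
# Hypothetical first-boot script for a fresh Ubuntu VM
set -euo pipefail
apt-get update && apt-get -y upgrade               # bring the base system up to date
apt-get -y install ufw fail2ban                    # install a couple of common packages
adduser --disabled-password --gecos "" deploy      # create a non-root user named 'deploy'
usermod -aG sudo deploy                            # give that user sudo privileges
ufw allow OpenSSH && ufw --force enable            # open SSH and switch the firewall on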
To sum it up, LunaNode offers:
1. Clean UI — Everything from creating VMs to monitoring and managing them is super simple. If you find the AWS console clunky and time-consuming, then this is the exact opposite of that.
2. API Access — Most people who are going to use LunaNode are programmers and admins, and for these professional users a flexible, feature-rich API is a lot faster than a GUI. LunaNode offers secure API access for those needs as well.
3. Easy automation using the technology of your choice — Bash or Cloud-init
4. Competitive pricing — It is slightly cheaper than the mainstream vendors. You can get started for as little as $3.5 per month on their memory-optimized nodes and scale from there to as much as 16 CPUs and 64 GB of memory. No surprise bills.
5. Data centers in the EU and North America. I hope they spin up more regions in the future.
6. Block Storage, Backups and Snapshots so you don’t lose valuable data.
7. DDoS protection, or, as they like to call it, Anti-DDoS. If any of your assets sit on an Internet-facing server, you need DDoS protection.
There is also a ton of other services and features, everything from DNS management to floating IPs and affinity groups, so you don't miss out on anything the mainstream vendors offer; if anything, everything is a lot easier. Check out the LunaNode home page here.
Linode has been around for a long time and has been one of the strongest competitors to DigitalOcean and Vultr, with quick-to-deploy SSDs and a worldwide network of data centers. However, it has a very Linux-centric bias: it supports a wide range of Linux distros but nothing else. Disappointingly, installing FreeBSD is not at all a straightforward process, which immediately raises a red flag for me.
To install FreeBSD, or any other custom OS, you would have to jump through a lot of hoops. The user experience is simple and smooth as long as you are doing what the designers of the UI wanted you to do; anything out of the ordinary requires a lot of workarounds and can possibly break the VM. This is quite unlike LunaNode, where the UI lets you do a lot more with your virtualized resources.
The pricing is similar to DigitalOcean's. You can start with a minimum $5-per-month VM and scale upwards from there. All the usual cloud features, like DNS and block storage, are available.
Linode wins out with more widespread infrastructure and years of experience running reliable cloud services; in that regard it is an exact replacement for DigitalOcean. However, it also has the same slightly higher pricing, and there is nothing really unique in its offerings. It is almost the same exact service as DigitalOcean or Vultr. Check out the Linode home page.
Following Linode's example, Vultr tries its best not to do anything out of the norm: the same pricing that you expect from mainstream vendors and the same set of features. It has one of the better user interfaces on the market and is slightly cheaper than Linode. One arena where it wins out is its offering of bare metal servers.
That's right: instead of being confined to a VM, you can get access to a bare metal server with complete control over the underlying CPU, memory and other resources. Combine that with the versatile range of operating systems Vultr offers, from Ubuntu, Fedora and other Linux distros to Windows, FreeBSD and OpenBSD, and it becomes quite an appealing option for power users.
Especially if your workload is mission-critical, you can distribute it across VMs with different underlying operating systems. This makes your application much more resilient to security threats: if there is a threat to the Linux nodes, you can take those systems down and patch them while the BSD or Windows VMs keep handling the business.
Disappointingly enough, the hardware the platform offers is not that modern. Their advertised Skylake CPUs, even for bare metal servers, are a couple of generations old at the time of this writing. If you are considering this option, you may want to wait for them to upgrade. The Vultr home page can be found here.
In the final analysis, LunaNode wins over the rest of the platforms. For professional workloads you will need DDoS protection, because Internet-facing servers get DDoSed, period. Its intuitive UI and permissive console access let you pick the distribution of your choice, and its pricing is on par with, if not cheaper than, most mainstream players.
They have data centers in France and Canada; however, I have used the service from the Asia-Pacific region without any noticeable latency whatsoever.
This guide assumes that you have a server set aside for MySQL, with an accessible static IP address, perhaps in the cloud or somewhere on your local network. The following commands in this subsection are to be executed in the server's shell. Let's quickly install and set up MySQL on Ubuntu.
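The exact commands are not reproduced here, but on Ubuntu 20.04 the installation and initial hardening typically look like this:
$ sudo apt update
$ sudo apt install mysql-server
$ sudo mysql_secure_installation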
The last command runs a script that changes some of MySQL's insecure defaults. First comes a prompt to install a password validation plugin, which checks whether the new passwords you set for users are strong enough. You can opt out of this plugin if you want. After this, you will be prompted to set the MySQL root user's password. Go ahead and set a strong one.
After this, you can pretty much say yes to every other prompt in the script, as it removes the test user, removes the test database, disables remote root login and finally reloads the privilege tables. Once that is done, since we have disallowed remote root login, let's create a database and a new user that can access it remotely without having to SSH (or log in) into the server's UNIX/Linux shell. But before we do that, let's verify whether our build of MySQL has TLS built in.
TLS is available in MySQL only if it was compiled in; there is no dynamic module to load. So if you are unsure whether your MySQL package has TLS support, you can check by querying the server's SSL-related variables.
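One way to do this (a quick sketch, assuming you can open the MySQL monitor as root on the server) is:
$ sudo mysql
mysql> SHOW VARIABLES LIKE '%ssl%';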
If the variables have_openssl and have_ssl are set to DISABLED, then SSL support is compiled in and you are good to go (you only need to enable it, as described further below). If the values are set to NO, you will have to get a different build of MySQL from your package manager or elsewhere.
By default, the MySQL server listens only on the loopback interface, that is, on the address 'localhost' or '127.0.0.1'. For remote connections we want it to listen on the public static IP too. To do this, open the file /etc/mysql/my.cnf and append a couple of lines at the very end of it.
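The exact snippet is not shown in the original; the usual pair of lines looks like this, with <StaticIP> as a placeholder for your server's address:
[mysqld]
bind-address = <StaticIP>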
Here, replace <StaticIP> with the actual IP of your server. If you are in doubt about which IP to use, you can use 0.0.0.0 to listen on all interfaces. Now restart the server for the new configuration to take effect.
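On a systemd-based Ubuntu install, the restart is simply:
$ sudo systemctl restart mysql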
Note: If you want to use the database in production, chances are that the client that will connect to this database – your front-end – will have a static IP. If that is the case, replace the '%' wildcard in the statements below with the appropriate client IP. '%' simply means 'any value'. We will be configuring myUser so that it can log in from any IP address (for example, the changing IP address of your domestic broadband connection), which is, arguably, insecure.
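With that caveat in mind, log in to the MySQL monitor as root and create the database and the user. The exact statements are not reproduced in the original; a typical sequence, using the myDatabase and myUser names from this guide, would look like this:
$ sudo mysql -u root -p
mysql> CREATE DATABASE myDatabase;
mysql> CREATE USER 'myUser'@'%' IDENTIFIED BY 'password';
mysql> GRANT ALL PRIVILEGES ON myDatabase.* TO 'myUser'@'%';
mysql> FLUSH PRIVILEGES;
If you want the server to refuse unencrypted logins for this user, you can append REQUIRE SSL to the CREATE USER statement.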
Replace 'password' with an actual strong password, and we have a user named myUser with complete access to the database myDatabase.
While you are logged in to the mysql shell as the MySQL root user, you can check the connection status by typing \s:
Pay attention to the highlighted lines about Connection and SSL. While this state is fine for a local login by the root user, when we log in over TLS as myUser the connection type will be TCP/IP rather than a raw socket, and an SSL cipher will be in use. There is a simple command to set this up, but first let's exit our mysql prompt.
Now, back in the server's shell, run the command that generates the TLS certificates and keys for the server.
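The original command is not shown here; on MySQL 5.7 and later, one way to generate a self-signed CA, server certificate and key in the data directory is the bundled mysql_ssl_rsa_setup utility, after which a restart picks the files up automatically:
$ sudo mysql_ssl_rsa_setup --uid=mysql    # writes ca.pem, server-cert.pem, server-key.pem, etc.
$ sudo systemctl restart mysql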
Once this is done, you can look at the have_ssl and related variables again.
There are new parameters indicating that a TLS certificate and key are in place and TLS is enabled. Now you can log out of this machine and open a MySQL client on your local computer. If you don't have one (and are using Debian or Ubuntu), install the MySQL shell client and connect to the remote server.
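A sketch of both steps, with <MySQLServerIP> standing in for your server's address:
$ sudo apt install mysql-client
$ mysql -u myUser -h <MySQLServerIP> -p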
Replace myUser and <MySQLServerIP> with your actual username and server IP, enter your chosen password, and you should be logged into the database. Check the connection again:
You can see that it now uses RSA-based encryption for your traffic and that the connection goes to a specific IP over TCP/IP. Your connection to this MySQL database is now secure.
This is the simplest way to secure your remote MySQL connections with TLS. Bear in mind that this is not the same as securing a phpMyAdmin client over TLS; that is TLS and HTTP combined, and requires you to secure the web interface. The connection between phpMyAdmin, which renders your web UI, and the database might still be unencrypted, which is fine as long as both are on the same server.
You can learn more about the TLS connection, the underlying CAs, and certificate and key management in the official MySQL documentation.