Raspberry Pi Cluster Part I: Goals, Hardware, Choosing OS

A while back I built a Raspberry Pi cluster with 1 x RPi4 and 2 x RPi3b devices. This was shortly after the release of the RPi4 and, because of the many fixes it required, I didn’t get far beyond hooking them up through a network switch.

Now that RPi4 has had some time to mature, I decided to start again from scratch and to document my journey in some detail.

Goals

Being able to get computers to coordinate together over a network to achieve various tasks is a valuable skill set that has been made affordable to acquire thanks to the RPi Foundation.

My goals are to build a cluster in order to figure out and/or practice the following technical competencies:

  • Hardware Skills: acquiring, organizing, and monitoring the cluster hardware
  • Networking Skills: setting up a network switch, DHCP server, DNS server, network-mounting drives, etc.
  • Dev Ops: installing, updating, managing the software, and backing everything up in a scalable manner
  • Web Server Skills: installing apache and/or nginx, with load balancing across the cluster nodes; also, being able to distribute python and node processes over the cluster nodes
  • Distributed-Computing Skills: e.g. being able to distribute CPU-intensive tasks across the nodes in the cluster
  • Database Skills: being able to create shards and/or replica nodes for MySQL, Postgres, and Mongo
  • Kubernetes Skills: implementing a Kubernetes cluster across my nodes

Those are the goals; I hope to make this a multi-part series with a lot of documentation that will help others learn from my research.

Hardware

RPi Devices

This is a 4-node cluster with the following nodes:

  • 1 x RPi4b (8GB RAM)
  • 3 x RPi3b

I’ll drop the ‘b’s from hereon.

The RPi4 will serve as the master/entry node. If you’re building from scratch then you may well want to go with 4 x RPi4. I chose to use 3 x RPi3 because I already had three from previous projects, and I liked the thought of having less power-hungry devices running 24/7. (Since their role is entirely one of cluster pedagogy/experimentation, it doesn’t bother me that their IO speed is lower than that of the RPi4. Also, the RPi4 really needs some sort of active cooling solution, while the RPi3 arguably does not, so my cluster will only have one fan running 24/7 instead of four.)

Equipment Organization

I know from my previous attempt that it’s really hard to keep your hardware neat, tidy and portable. It is important to me to be able to transport the cluster with minimal disassembly, so I sought to house everything on a single tray with a single power cable to operate it. That means that my cluster’s primary connection to the Internet is by wifi but, importantly, I’ve insisted that the nodes communicate with each other over ethernet through a switch. The network schematic therefore looks something like this:

RPi Cluster Network Schematic

The RPi4 will thus need to act as a router so that the other nodes can access the internet. Since each node has built-in wifi, I’m also going to establish direct links between each node and my home wifi router, but these shall only be used for initial setup and debugging purposes (if/when the network switch fails).
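As a rough idea of what "acting as a router" involves, here is a minimal sketch that only prints the commands one might use to enable IP forwarding and NAT on the RPi4. It is not the configuration used later in this series; the interface names wlan0 (wifi, facing the home router) and eth0 (ethernet, facing the switch) are assumptions, and the function name nat_commands is made up for illustration.

```shell
#!/bin/sh
# Sketch only: print (do not run) commands that would let the RPi4 forward
# traffic from the cluster switch out over wifi.
# Interface names are assumptions; check yours with `ip link`.

nat_commands() {
    wan_if="$1"   # interface facing the home router (assumed: wifi)
    lan_if="$2"   # interface facing the cluster switch (assumed: ethernet)

    # Allow the kernel to forward packets between interfaces.
    echo "sysctl -w net.ipv4.ip_forward=1"
    # Rewrite the source address of traffic leaving via the WAN interface.
    echo "iptables -t nat -A POSTROUTING -o $wan_if -j MASQUERADE"
    # Let cluster traffic out, and only replies to it back in.
    echo "iptables -A FORWARD -i $lan_if -o $wan_if -j ACCEPT"
    echo "iptables -A FORWARD -i $wan_if -o $lan_if -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT"
}

nat_commands wlan0 eth0
```

Printing the commands first makes it easy to review them before piping the output to a root shell. Note that rules added this way do not survive a reboot unless they are saved separately.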

To keep the RPi nodes arranged neatly, I got a cluster case for $20-$30. Unfortunately, the RPi4 has a different physical layout which spoils the symmetry of the build, but it also makes it easy to identify it. I also invested in a strip plug with USB-power connectors, so that I would only need a single plug to connect the cluster to the outside world. I was keen to power the RPi3s through the USB connectors on the strip plug in order to avoid having 5 power supplies,​*​ which gets bulky and ugly IMO.

Next, I had to decide what sort of storage drives I would use on my RPi3s. For the RPi4, there was no question that I would need an external SSD to make the most of its performance.

BEWARE when purchasing an SSD for your RPi4! Not all drives work on the RPi4 and I lost a ton of time/money with Sabrent. This time round I went with this 1TB drive made by Netac. So far, so good. If $130 is too pricey then just get a 120/240GB version in the $20-40 range. (I only got 1TB because I plan to use my cluster for some serious picture-file backups and serving.)

For the RPi3s, which I expect to use a lot less in general, there is not nearly as much to be gained from an external SSD. I also wanted to limit the cost of the setup as well as the number of cables floating around the cluster, so I decided to start off with SD cards for the RPi3s, though I am wary of this choice (and deem it likely that I will regret it as soon as one of them fails). I’m using 3 x 64GB Samsung Evo Plus (U3 speed). I’ll be sure to benchmark their performance once set up.

I also got a 2TB HDD to provide the RPi3s with some more durable read-write space, and on which I’ll be able to back up everything on the cluster.

I got a simple network switch, some short micro-USB cables, and some short flexible ethernet cables. Be careful with your ethernet cables: you want them short to keep your cluster tidy, but make sure they are not too rigid as a result. In my previous attempt I got cables that were short but so rigid that they created a lot of torque between the switch and node connectors, and made the whole cluster look/feel contorted.

I also got a high-quality power supply for my RPi4. As the primary node it will do the most work and has two external storage drives to power, so it needs a reliable voltage.

Finally, I also got a bunch of USB-A and USB-C Volt/Amp-Meters for a few bucks from China, because I like to know the state of the power going through the nodes.

So, in total, I calculate that the equipment will have cost ~$500. It adds up, but that’s not bad for a computing cluster.

4-Node RPi Cluster Hardware

And, yes, I need a tray upgrade.

Choosing an OS

When it came to choosing an OS, the only two I considered viable candidates were Raspbian OS (64 bit beta), or Ubuntu 20.04 LTS server (64 bit).

I went with Ubuntu in the end because my project is primarily pedagogic in nature and so, by choosing Ubuntu, I figured I’d be deepening my knowledge of a “real world” OS. I also just generally like Ubuntu, and it has long been my OS of choice on cloud servers.

For the RPi3s, I used the Raspberry Pi Imager application to select Ubuntu Server 20.04 and burned that image onto each SD card.

Raspberry Pi Imager

For the RPi4 though I wanted to boot from an external SSD drive, and this isn’t trivial yet with the official Ubuntu image. I therefore opted to use an image posted here that someone had built using the official image but with a few tweaks to enable booting from an external USB device. (It required you to first update the RPi4’s EEPROM, but I had already done that. It’s easily googled.)

Initial Setup

Once the cluster hardware had been assembled and wired up, I powered everything on and then had to go through each fresh install of Ubuntu and perform the following:

  • Login with ‘ubuntu’, ‘ubuntu’ credentials
  • Connect to wifi following this advice; note: you need to reboot after calling sudo netplan apply before it will work! (My netplan conf is included in the next part of this series.)
  • Update with sudo apt update; sudo apt upgrade -y
  • Set the timezone with sudo timedatectl set-timezone America/New_York; if you want to use a different timezone then list the ones available with timedatectl list-timezones.
  • Add a new user ‘dwd’ (sudo adduser dwd) and assign him the groups belonging to the original ubuntu user (sudo usermod -aG $(groups | sed "s/ /,/g") dwd)
  • Switch to dwd and disable ubuntu (sudo passwd -l ubuntu)
  • Install myconfig and use it to further install super versions of vim, tmux, etc. See this post for more details.
  • Install oh-my-zsh, powerlevel10k, and zsh-autosuggestions.
  • Install iTerm2 shell integration
  • Install nvm
  • Create a ~/.ssh/authorized_keys file enabling public-key ssh-ing
  • Change the value of /etc/hostname in order to call our nodes rpi0, rpi1, rpi2, rpi3.
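Two of the steps above are worth unpacking. The sed expression in the usermod step simply turns the space-separated output of groups into the comma-separated list that usermod -aG expects, and the authorized_keys step needs restrictive permissions on ~/.ssh to be accepted by sshd. The sketch below illustrates both; the function names groups_csv and install_pubkey are hypothetical helpers, not part of any tool mentioned in this post.

```shell
#!/bin/sh
# groups_csv: convert a space-separated group list (the format printed by
# `groups` with no arguments) into the comma-separated form for `usermod -aG`.
groups_csv() {
    printf '%s\n' "$1" | sed 's/ /,/g'
}

# install_pubkey: append a public key to <home>/.ssh/authorized_keys unless
# that exact line is already present, using the conventional 700/600 modes.
install_pubkey() {
    home_dir="$1"
    pubkey="$2"
    mkdir -p "$home_dir/.ssh"
    chmod 700 "$home_dir/.ssh"
    touch "$home_dir/.ssh/authorized_keys"
    chmod 600 "$home_dir/.ssh/authorized_keys"
    grep -qxF -- "$pubkey" "$home_dir/.ssh/authorized_keys" ||
        printf '%s\n' "$pubkey" >> "$home_dir/.ssh/authorized_keys"
}

groups_csv "ubuntu adm sudo"   # prints: ubuntu,adm,sudo
```

On a node, something like install_pubkey "$HOME" "$(cat id_ed25519.pub)" would then add a key copied over from your laptop; the key file name here is only an example.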

This workflow allowed me to get my four nodes into a productive state in a reasonably short amount of time. I also set up an iTerm2 profile so that my cluster nodes have a groovy Raspberry Pi background, making it quick and easy to distinguish where I am.

RPi4 node at the ready with tmux, vim, oh-my-zsh, powerlevel10k

Finally, we also want to allocate “swap space” on any device not using an SD card. (Swap space is the space you allocate on your storage disk that will get used if you use up your RAM. Most Linux distros nowadays will not allocate swap space on your boot drive by default, so it has to be done manually.)

Since only the RPi4 has an external drive, that’s all we’ll set up for now. (Later, once we have a single network-mounted HDD available to each node, we’ll allocate swap space there.) Use the following to add 8GB of swap:​†​

sudo fallocate -l 8G /swapfile
sudo chmod 600 /swapfile
sudo mkswap /swapfile
sudo swapon /swapfile

Then add the following line to /etc/fstab to make this change permanent:

/swapfile swap swap defaults 0 0
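If you script this step, it helps to make the /etc/fstab edit idempotent so that re-running the setup does not add the same entry twice. Here is a minimal sketch of one way to do that; ensure_line is a hypothetical helper name, and the demo runs against a scratch file rather than the real /etc/fstab.

```shell
#!/bin/sh
# ensure_line: append a line to a file only if that exact line is not
# already present (grep -x matches whole lines, -F treats it literally).
ensure_line() {
    file="$1"
    line="$2"
    grep -qxF -- "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
}

# Demonstrate on a scratch file instead of the real /etc/fstab.
scratch="$(mktemp)"
ensure_line "$scratch" "/swapfile swap swap defaults 0 0"
ensure_line "$scratch" "/swapfile swap swap defaults 0 0"   # second call is a no-op
cat "$scratch"
rm -f "$scratch"
```

Against the real file this would need to run as root. Afterwards, swapon --show and free -h are quick ways to confirm that the swap file is active.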

Summary

That’s it for part I. In the next part, we’re going to set up our ethernet connections between the RPi nodes using our network switch.


  1. ​*​
    4 x RPi + 1 x Network Switch
  2. ​†​
    According to lore it’s best practice to only add ~1/2 your RAM size as swap. However, I’ve never encountered issues by going up to x2.
