Tag: linux

  • Disk Partitions, Tables, Labels and File Systems

    Intro

    I just got a new disk, installed it physically into my Ubuntu Server, and realized that I’ve forgotten everything to do with disk formatting, partitioning, etc. Here I remind myself as to what exactly all of these concepts are and how to work with them.

    Identifying Disks

    When you’ve got a disk installed for the first time, run lsblk -o name,fstype,size,mountpoint,partuuid. If your disk is in a blank state, then you will probably only get this sort of info:

    ├─sdb1  vfat       512M /boot/efi     79033c3d-...
    └─sdb2  ext4       931G /             5c6b1ad5-...
    ...
    nvme0n1            3.6T

    In this case, I can see my disk has been detected and given the handle /dev/nvme0n1, and that it has a size of 3.6T, but that is it.

    Partition Tables

    Every disk needs to have some space dedicated to a partition table, that is, a space that enables interfacing systems to determine how the disk is partitioned. There are different standards (i.e. conventions) for how such tables are to be organized and read.

    A widely-supported type of partition table is the “GUID Partition Table” (GPT). GUID stands for “globally unique identifiers”.

    To see what partition table is used by a given disk, you can use ‘print’ within the wizard launched by parted in the following manner:

    ❯ sudo parted /dev/nvme0n1
    GNU Parted 3.4
    Using /dev/nvme0n1
    Welcome to GNU Parted! Type 'help' to view a list of commands.
    (parted) print
    Error: /dev/nvme0n1: unrecognised disk label
    Model: CT4000P3PSSD8 (nvme)
    Disk /dev/nvme0n1: 4001GB
    Sector size (logical/physical): 512B/512B
    Partition Table: unknown
    Disk Flags:

    In this case, we are told that we have an “unrecognised disk label”, and that the Partition Table is “unknown”. These two issues are one and the same: the ‘disk label’ is just another name for the partition table, and we don’t have one.

    Typing “help” will show the actions that we can take; in particular, it tells us we can create our partition table using either mklabel TYPE or mktable TYPE. Type gpt is ‘the’ solid and universally supported type, so I just went with that: mklabel gpt. Now when I type print I no longer get the earlier error messages.
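    The same steps can also be done non-interactively, if you prefer one-liners (a minimal sketch, reusing the device path from the example above):

    sudo parted --script /dev/nvme0n1 mklabel gpt
    sudo parted /dev/nvme0n1 print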

    Note: if you want to have a table of type ‘Master Boot Record’ (MBR), then you enter mklabel msdos. I have read that “GPT is a more modern standard, while MBR is more widely supported among older operating systems. For a typical cloud server, GPT is a better option.” Also, if you erase a disk on a Mac with Disk Utility, it gives you the option of an ‘Apple Partition Map’ (APM) for its ‘Scheme’ (as Mac calls the partition table). I did some light googling on this Scheme, and concluded that it is outdated — used for PowerPC Macs — and that the most recent Apple disks use GPT. In short, whenever you have to decide what Scheme to use, use GUID.

    Creating Partitions

    Now that we have a partition table, we can insert information into it to establish partitions. Still using parted, we can run mkpart as a wizard of sorts. This will prompt you for information about the partition you want to create.

    The first thing it asked me for was a “Partition name? []”. This was confusing, because the arguments for mkpart given by ‘help’ are: mkpart PART-TYPE [FS-TYPE] START END.

    So the argument is calling it a ‘type’, and the wizard is asking for a ‘name’. What’s going on? The confusion stems from different conventions used by different partition-table types. For partition tables of type msdos or dvh, you can specify a ‘type’ from one of three options: ‘primary’, ‘extended’ or ‘logical’. The wizard for parted mkpart is geared towards setting up partitions governed by msdos partition tables, which is why its documentation calls it a ‘type’. However, these categories do not apply to GUID partition tables. Instead, GPTs have a category that msdos does not have — a ‘name’ — which you have to specify (though it can just be an empty string). Hence when the wizard detects that you are filling in a GUID table, it prompts you for a ‘name’ instead of a ‘type’.

    What is the GPT ‘name’ used for? It can be used to boot the drive from /etc/fstab (see below). It does not determine the name of the handle in /dev/* for the partition (which, from what I can tell, is determined solely by the OS).

    Next, it asks for a “File system type? [ext2]”. The default is ext2. Note that parted does not go ahead and create a filesystem on this partition; this is just a ‘hint’ being recorded in the table as to the intended use of the partition, and certain programs might use this hint when e.g. looking to auto-mount the partition. Here I chose ‘ext4’ — the standard file system for linux.

    Next it asks for the ‘Start’ and then ‘End’ of the partition. Because I am only making one partition, and I want it to take up all the available space, I could just put 0% and 100% respectively. If you have to work around other partitions, then you need to note their start and end positions first using parted print, etc.
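    For reference, the whole interactive exchange boils down to something like this (a sketch; the partition name ‘mydata’ is arbitrary):

    sudo parted --script /dev/nvme0n1 mkpart mydata ext4 0% 100%
    sudo parted /dev/nvme0n1 print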

    To summarize so far, every disk needs to have its first few blocks set aside for a partition table. Partition tables are, no doubt, designed to strict conventions that convey to any interfacing system what sort of partition table it is dealing with (I’m not sure how exactly, but I’d guess it’s something like the very first byte tells you the type from 256 options established by the comp sci community). The partition table then in turn records where each partition begins, ends, its type, its name, etc. The only role of the program parted is to set/adjust/erase the contents of the partition table. (As far as I can tell, it does not read-write to any part of the disk outside of the partition table.)

    Note: if you delete a partition with parted, I expect that all that does is remove the corresponding entries from the partition table, without erasing the information within the partition itself. I therefore expect — but can’t be bothered to confirm — that if you were to then recreate a partition with the exact same bounds, you would be able to re-access the file system and its contents.

    File Systems

    Now that the partition table has been created, our disk has a well-defined range of addresses to which we can read/write data. In order to store data in the partition in an organized and efficient manner, we need another set of conventions by which programs acting on the data in the partition can operate. Such conventions are called the ‘file system’. Just as the partition table is the very first thing on the disk, modern ‘hierarchical’ file systems (i.e. ones that allow for folders within folders) work by reserving the very first section of the partition as a space that describes the contents and properties of the root directory, which in turn points to the location of data files and other directory files within it. Those directory files in turn point to the location of files and directories within them, etc. For an excellent, more-detailed overview of these concepts, see this video.

    Now, having quit parted, we can create a file system within our partition with:

    sudo mkfs.ext4 /dev/nvme0n1p1

    In practice, there can only be one file system per partition, so you don’t need to think about what sort of space the file system takes up — it is designed to work within the boundaries of the partition it finds itself in.

    Mounting

    To mount the partition temporarily, you use sudo mount /dev/nvme0n1p1 /mnt/temp where the dir /mnt/temp already exists. To have the partition mounted automatically on boot, add the following line to /etc/fstab:

    PARTLABEL=NameYouChose /mnt/data ext4 defaults 0 2

    …where:

    • PARTLABEL=NameYouChose is the criterion by which the OS will select from all detected partitions
    • /mnt/data is the path where the partition is being mounted
    • ext4 signals to the OS what sort of file system to expect to find on the partition
    • defaults means that this partition should be mounted with the default options, such as read-write support
    • 0 2 signifies that the partition should not be backed up by the legacy dump utility (the 0), and that its filesystem should be checked by fsck at boot, but as a 2nd priority, after your root volume (the 2)

    To put this line into effect without rebooting, run sudo mount -a.

    Note: just to make things a little more confusing, you can also use mkfs.ext4 with the -L flag to set yet another kind of ‘label’, this one stored within the file system itself rather than in the partition table. If you use this label, then you can mount the partition in the /etc/fstab file using LABEL=FlagYouChose (instead of PARTLABEL=NameYouChose, which refers to the name set with parted above).
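    To make the distinction concrete, here is a minimal sketch (the label ‘mydata’ and the mount point are assumptions):

    # set a filesystem label while creating the file system
    sudo mkfs.ext4 -L mydata /dev/nvme0n1p1
    # then, in /etc/fstab:
    # LABEL=mydata /mnt/data ext4 defaults 0 2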

    Volumes, Containers, etc.

    As a final note, sometimes the term ‘Volume’ is used in the context of disks and partitions. The important thing to note is that the term is used differently on different platforms. In Appleland, APFS has ‘Containers’, ‘Volumes’ and ‘Partitions’ as distinct constructs. From what I can tell, a Volume in Appleland is synonymous with a Logical Volume in other lands. An example of a Logical Volume is a RAID1 setup where you have two disks and your data is mirrored across both, but you interact with that data as though it were in one place (i.e. the fact that the data has been spread across two disks has been abstracted away and hidden from you). In general, an LV can be spread across multiple physical disks, but is presented to you as though you were dealing with one old-school physical disk.

    It’s not clear to me at this time what a Mac ‘container’ really is.

  • LAMP Stack App on AWS with Docker, RDS and phpMyAdmin

    Intro

    I was recently tasked with migrating an old LAMP-stack app (PHP v5) running on a CentOS 6 server to the newer PHP v7 on a CentOS 8 machine, and ensuring that the code didn’t break in the PHP upgrade. I figured the best way to do that would be to use Docker to simulate PHP 7 on a CentOS 8 machine running on my laptop.

    However, the plan changed: instead of deploying the new app on a CentOS 8 machine, it was decided that we would deploy the app to its own EC2 instance. Since I was already using Docker, and since I no longer had to plan for a CentOS 8 deployment, I decided to use Ubuntu 20.04 for the EC2 instance. I installed docker and docker-compose, and adapted the code to use the official PHP-Apache and phpMyAdmin Docker images. I also decided to use AWS RDS MySQL, and to use the EC2 instance to implement logical backups of the MySQL DB to AWS S3.

    The rest of this article consists of more detailed notes on how I went about all of this:

    • Dockerizing a LAMP-stack Application
      • php-apache docker image
      • creating dev and prod versions
      • updating code from PHPv5 to PHPv7
      • handling env variables
      • Adding a phpMyAdmin interface
    • AWS RDS MySQL Setup
      • rdsadmin overview
      • creating additional RDS users
      • connecting from a server
    • AWS EC2 Deployment
      • virtual machine setup
      • deploying prod version with:
        • Apache proxy with SSL Certification
        • OS daemonization
      • MySQL logical backups to AWS S3

    Dockerizing a LAMP-stack Application

    php-apache docker image

    I’ll assume the reader is somewhat familiar with Docker. I was given a code base in a dir called DatasetTracker developed several years ago with PHPv5. The first thing to do was to set up a git repo for the sake of development efficiency, which you can find here.

    Next, I had to try and get something working. The key with Docker is to find the official image and RTFM. In this case, you want the latest php-apache image, which leads to the first line in your docker file being: FROM php:7.4-apache. When you start up this container, you get an apache instance that will interpret php code within the dir /var/www/html and listen on port 80.

    creating dev and prod versions

    I decided to set up two deployment tiers: dev and prod. The dev tier is chiefly for local development, wherein changes to the code do not require you to restart the docker container. Also, you want to have php settings that allow you to debug the code. The only hiccup I experienced in getting this to work was understanding how php extensions are activated within a docker context. It turns out that the php-apache image comes with two command-line tools: pecl and docker-php-ext-install. In my case, I needed three extensions for the dev version of the code: xdebug, mysqli, and bcmath. Through trial and error I found that you could activate those extensions with the middle 3 lines in the docker file (see below).

    You can also set the configurations of your php to ‘development’ by copying the php.ini-development file. In summary, the essence of a php-apache docker file for development is as follows:

    FROM php:7.4-apache
    
    RUN pecl install xdebug
    RUN docker-php-ext-install mysqli
    RUN docker-php-ext-install bcmath
    
    RUN cp /usr/local/etc/php/php.ini-development /usr/local/etc/php/php.ini

    When you run a container based on this image, you just need to volume-mount the dir with your php code to /var/www/html to get instant updates, and to map port 80 to some random port for local development.
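    As a rough illustration of the dev workflow (the image tag, host port, and source path are assumptions rather than values from the repo):

    # build the dev image from the Dockerfile above
    docker build -t datasettracker-dev .
    # volume-mount the php code and map port 80 to an arbitrary local port
    docker run --rm -p 8080:80 -v "$PWD":/var/www/html datasettracker-dev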

    Next, we need to write a docker-compose file in order to have this image run as a container along with a phpMyAdmin application, as well as to coordinate environment variables in order to connect to the remote AWS RDS mysql instance.

    An aspect of the set up that required a bit of thought was how to log into phpMyAdmin. The docker-image info was a bit confusing. In the end though, I determined that you really only need one env variable — PMA_HOST — passed to the phpMyAdmin container through the docker-compose file. This env variable just needs to point to your remote AWS RDS instance. phpMyAdmin is really just an interface to your mysql instance, so you then log in through the interface with your mysql credentials. (See .env-template in the repo.)

    (NOTE: you might first need to also pass env variables for PMA_USER and PMA_PASSWORD to get it to work once, and then you can remove these; I am not sure why this seems to be needed.)
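    For a quick sanity check outside of docker-compose, the same wiring can be exercised with plain docker (a sketch; the RDS hostname and the host port are placeholders):

    # point phpMyAdmin at the remote RDS instance, then log in with your mysql credentials
    docker run --rm -p 8081:80 -e PMA_HOST=mydb.xxxxxxxx.us-east-1.rds.amazonaws.com phpmyadmin/phpmyadmin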

    updating code from PHPv5 to PHPv7

    Once I had an application running through docker-compose, I was able to edit the code to make it compatible with PHPv7. This included, amongst other things, replacing mysql_connect with mysqli_connect, and replacing hard-coded mysql credentials with code for grabbing such values from env variables. A big help was using the VSCode extension intelephense, which readily flags mistakes and code that is deprecated in PHPv7.

    AWS RDS MySQL Setup

    rdsadmin overview

    Note: discussions about ‘databases’ can be ambiguous. Here, I shall use ‘DB’ or ‘DB instance’ to refer to the mysql host/server, and ‘db’ to refer to the internal mysql collection of tables that you select with the syntax `use [db name];`. As such, a mysql DB instance can have multiple dbs within it.

    In order to migrate the mysql database from our old CentOS 6 server to an RDS instance, I first used the AWS RDS interface to create a MySQL DB instance.

    When I created the mysql DB instance via the AWS RDS interface, I assumed that the user I created was the root user with all privileges. But this is not the case! Behind the scenes, RDS creates a user called rdsadmin, and this user holds all the cards.

    To see the privileges of a given user, you need to use SHOW GRANTS FOR 'user'@'host'. Note: you need to provide the exact host associated with the user you are interested in; if you are not sure what the host is for the user, you first need to run:

    SELECT user, host FROM mysql.user WHERE user='user';

    In the case of an RDS DB instance, rdsadmin is created so as to only be able to log into the DB instance from the same host machine of the instance, so you need to issue the following command to view the permissions of the rdsadmin user:

    SHOW GRANTS for 'rdsadmin'@'localhost';

    I’ll call the user that you initially create via the AWS console the ‘admin’ user. You can view the admin’s privileges by running SHOW GRANTS; which yields the following result:

    GRANT SELECT, INSERT, UPDATE, DELETE, CREATE, 
    DROP, RELOAD, PROCESS, REFERENCES, INDEX, 
    ALTER, SHOW DATABASES, CREATE TEMPORARY TABLES, 
    LOCK TABLES, EXECUTE, REPLICATION SLAVE, 
    REPLICATION CLIENT, CREATE VIEW, SHOW VIEW, 
    CREATE ROUTINE, ALTER ROUTINE, CREATE USER, 
    EVENT, TRIGGER ON *.* TO `admin`@`%` 
    WITH GRANT OPTION

    The final part — WITH GRANT OPTION — is mysql for “you can give all of these permissions to another user”. So this user will let you create another user for each db you create.

    If you compare these privileges with those for rdsadmin, you’ll see that rdsadmin has the following extra privileges:

    SHUTDOWN, FILE, SUPER, CREATE TABLESPACE, CREATE ROLE, DROP ROLE, SERVICE_CONNECTION_ADMIN, SET_USER_ID, SYSTEM_USER

    Several of the operations covered by these privileges — such as shutdown — can instead be performed via the AWS console. In summary, rdsadmin is created in such a way that you can never use it directly, and you will never need to. The admin user has plenty of permissions, and one needs to consider best practices as to whether to use the admin user when connecting from one’s application.

    I personally think that it is good general practice to have a separate db for each deployment tier of an application. So if you are developing an app with, say, a ‘development’, ‘stage’, and ‘production’ deployment tier, then it’s wise to create a separate db for each tier. Alternatively, you might want to have the non-production tiers share a single db. The one thing that I believe is certain though is that you need a dedicated db for production, that it needs to have logical backups (i.e. mysqldump to file) carried out regularly, and that you ideally never edit the prod db directly (or, if you do, that you do so with much fear and trembling).

    Is it a good practice to have multiple dbs on a single DB instance? This totally depends on the nature of the applications and their expected load on the DB instance. Assuming that you do have multiple applications using dbs on the same DB instance, you might want to consider creating a specialized user for each application, so that the compromise of one user does not compromise ALL your applications. In that case, the role of the admin is ONLY to create users whose credentials will be used to connect an application to its db. The next section shows how to accomplish that.

    creating additional RDS users

    So let’s assume that you want to create a user whose sole purpose is to enable an application deployed on some host HA (application host) to connect to the host on which the DB instance is running, Hdb (DB host). Enter the RDS DB instance with your admin user credentials and enter:

    CREATE USER 'newuser'@'%' IDENTIFIED BY 'newuser_password';
    GRANT ALL PRIVILEGES ON db_name.* TO 'newuser'@'%';
    FLUSH PRIVILEGES;

    This will create user ‘newuser’ with full privileges on the database db_name (though not the admin user’s global privileges). The ‘user’@’%’ syntax means “this user connecting from any host”.

    Of course, if you want to be extra secure, you can specify that the user can only connect from specific hosts by running this command multiple times replacing the wildcard ‘%’.
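    For example, restricting such a user to a single application host might look like this (a sketch; the endpoint, IP address and credentials are placeholders):

    # connect as the admin user and create a host-restricted variant of the user
    mysql -h [RDS-endpoint] -u admin -p -e "
      CREATE USER 'newuser'@'203.0.113.10' IDENTIFIED BY 'newuser_password';
      GRANT ALL PRIVILEGES ON db_name.* TO 'newuser'@'203.0.113.10';
      FLUSH PRIVILEGES;"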

    As an aside, if you want to know the name of the host you are currently connecting from, then run:

    mysql> SELECT USER() ;
    +-------------------------------------------+
    | USER()                                    |
    +-------------------------------------------+
    | admin@c-XX-XX-XXX-XXX.hsd1.sc.comcast.net |
    +-------------------------------------------+
    1 row in set (0.07 sec)

    In this case, the host ‘c-XX-XX-XXX-XXX.hsd1.sc.comcast.net’ has been determined as pointing to my home’s public IP address (assigned by my ISP). (I assume that under the hood mysql has used something like nslookup MYPUBLIC_IPADDRESS to determine the hostname as it prefers that rather than my present IP address, which is assumed to be less permanent.)

    enabling user to change password

    As of Nov 2022, there seems to be an issue with phpMyAdmin whereby a user thus created cannot change his/her own password through the phpMyAdmin interface. Presumably, under the hood, the sql command to change the user’s password requires certain global privileges (and this user has none). A temporary solution is to connect to the DB instance with your admin user and run:

    GRANT CREATE USER ON *.* TO USERNAME WITH GRANT OPTION; 

    connecting from a server

    One thing that threw me for a while was the need to explicitly white-list IP addresses to access the DB instance. When I created the instance, I selected the option to be able to connect to the database from a public IP address. I assumed that this meant that, by default, all IP addresses were permitted. However, this is not the case! Rather, when you create the DB instance, RDS will determine the public IP address of your machine (in my case – my laptop at my home public IP address), and apply that to the inbound rule of the AWS security group attached to the DB instance.

    In order to be able to connect from our application running on a remote server, you need to go to that security group in the AWS console and add another inbound rule for MySQL/Aurora for connections from the IP address of your server.

    AWS EC2 Deployment

    virtual machine setup

    I chose Ubuntu server 20.04 for my OS with a single core and 20GB of storage. (The data will be stored in the external DB and S3 resources, so not much storage is needed.) I added 4GB of swap space and installed docker and docker-compose.
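    For reference, adding the swap space amounted to something like the following (a sketch; size and path are assumptions):

    # create and enable a 4GB swap file
    sudo fallocate -l 4G /swapfile
    sudo chmod 600 /swapfile
    sudo mkswap /swapfile
    sudo swapon /swapfile
    # persist it across reboots
    echo '/swapfile none swap sw 0 0' | sudo tee -a /etc/fstab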

    apache proxy with SSL Certification

    I used AWS Route 53 to create two endpoints pointing to the public IP address of the EC2 instance. To expose the two docker applications to the outside world, I installed apache on the EC2 instance and proxied these two endpoints to ports 6050 and 6051 (as shown in the config below). I also used certbot to establish SSL certification. The apache config looks like this:

    <IfModule mod_ssl.c>
    <Macro SSLStuff>
        ServerAdmin webmaster@localhost
        ErrorLog ${APACHE_LOG_DIR}/error.log
        CustomLog ${APACHE_LOG_DIR}/access.log combined
        Include /etc/letsencrypt/options-ssl-apache.conf
        SSLCertificateFile /etc/letsencrypt/live/xxx/fullchain.pem
        SSLCertificateKeyFile /etc/letsencrypt/live/xxx/privkey.pem
    </Macro>
    
    <VirtualHost _default_:443>
        Use SSLStuff
        DocumentRoot /var/www/html
    </VirtualHost>
    
    <VirtualHost *:443>
        Use SSLStuff
        ServerName dataset-tracker.astro-prod-it.aws.umd.edu
        ProxyPass / http://127.0.0.1:6050/
        ProxyPassReverse / http://127.0.0.1:6050/
    </VirtualHost>
    
    <VirtualHost *:443>
        Use SSLStuff
        ServerName dataset-tracker-phpmyadmin.astro-prod-it.aws.umd.edu
        ProxyPass / http://127.0.0.1:6051/
        ProxyPassReverse / http://127.0.0.1:6051/
        RequestHeader set X-Forwarded-Proto "https"
        RequestHeader set X-Forwarded-Port "443"
    </VirtualHost>
    </IfModule>

    OS daemonization

    Once you have cloned the code for the applications to the EC2 instance, you can start it in production mode with:

    docker-compose -f docker-compose.prod.yml up -d

    … where the flag ‘-d’ means to start it in the background (‘daemonized’).

    One of the nice things about using docker is that it becomes super easy to set up your application as a system service by simply adding restart: always to your docker-compose file. This setting tells docker to restart the container if it registers an internal error, or if the docker service itself is restarted. This means that if the EC2 instance crashes or is otherwise restarted, then docker (which, being a system service, will itself restart automatically) will automatically restart the application.

    MySQL logical backups to AWS S3

    Finally, we need to plan for disaster recovery. If the EC2 instance gets messed up, or the AWS RDS instance gets messed up, then we need to be able to restore the application as easily as possible.

    The application code is safe, thanks to github, and so we just need to make sure that we never lose our data. RDS performs regular disk backups, but I personally prefer to create logical backups because, in the event that the disk becomes corrupted, I feel wary about trying to find a past ‘uncorrupted’ state of the disk. Logical backups to file do not rely on the integrity of the entire disk, and thereby arguably provide a simpler and therefore less error-prone means to preserve data.

    (This is in accordance with my general philosophy of preferring to back up files rather than disk images. If something serious goes wrong at the level of e.g. disk corruption, I generally prefer to ‘start afresh’ with a clean OS and copy over files as needed, rather than to try and restore a previous snapshot of a disk. This approach also helps maintain disk cleanliness, since disks tend to accumulate garbage over time.)

    To achieve these backups, create an S3 bucket on AWS and call it e.g. ‘mysql-backups’. Then install s3fs, an open-source tool that mounts S3 buckets onto a linux file system, with sudo apt install s3fs.

    Next, add the following line to /etc/fstab:

    mysql-backups /path/to/dataset-tracker-mysql-backups fuse.s3fs allow_other,passwd_file=/home/user/.passwd-s3fs 0 0

    Next, you need to create an AWS IAM user with permissions for full programmatic access your S3 bucket. Obtain the Access key ID and Secret access key for that user and place them into a file /home/user/.passwd-s3fs in the format:

    [Access key ID]:[Secret access key]
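    Note that s3fs expects this credentials file to be readable only by its owner, so lock down its permissions after creating it (the key values below are placeholders):

    # store the IAM credentials and restrict the file permissions as s3fs requires
    echo 'AKIAXXXXXXXXXXXX:xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx' > /home/user/.passwd-s3fs
    chmod 600 /home/user/.passwd-s3fs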

    Now you can mount the S3 bucket by running sudo mount -a (which will read the /etc/fstab file).

    Check that the dir has successfully mounted by running df -h and/or by creating a test file within the dir /path/to/dataset-tracker-mysql-backups and checking in the AWS S3 console that that file has been placed in the bucket.

    Finally, we need to write a script to be run by a daily cronjob that will perform a mysql dump of your db to file to this S3-mounted dir, and to maintain a history of backups by removing old/obsolete backup files. You can see the script used in this project here, which was adapted from this article. Add this as a daily cronjob, and it will place a .sql file in your S3 dir and remove obsolete versions.
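    The script in the repo is the authority here, but the core of such a backup script is roughly the following (a sketch; the connection details, paths and retention window are assumptions):

    #!/bin/bash
    # dump the db into the S3-mounted dir with a datestamp
    BACKUP_DIR=/path/to/dataset-tracker-mysql-backups
    STAMP=$(date +%Y-%m-%d)
    mysqldump -h "$DB_HOST" -u "$DB_USER" -p"$DB_PASSWORD" "$DB_NAME" > "$BACKUP_DIR/backup-$STAMP.sql"
    # keep only the 14 most recent dumps
    ls -1t "$BACKUP_DIR"/backup-*.sql | tail -n +15 | xargs -r rm --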

  • Raspberry Pi Cluster V: Deploying a NextJs App on Ubuntu Server 20.04

    Intro

    In the last part I opened up my primary node to the Internet. We’re now in a position to make public-facing applications that will eventually connect up with microservices that harness the distributed computing power of the 4 RPi nodes in our cluster.

    Before we can make such parallel applications, we need to be able to deploy a web-facing interface to the primary node in our cluster. Not only is that a generally important thing to be able to do, but it allows us to separate web-specific code/processes from what we might call “computing” or “business-logic” code/processes (i.e. a microservices architecture).

    So in this post (and the next few), I am going to go through the necessary steps to get a MERN stack up and running on our primary node. This is not a cluster-specific task; it is something every full-stack developer needs to know how to do on a linux server.

    Tech Stack Overview

    In the last part, we used AWS Route 53 to set up a domain pointing to our primary node. I mentioned that you need to have a web server like Apache running to check that everything is working, namely the port forwarding and dynamic DNS cronjob.

    We are going to continue on here by creating a customer-facing application with the following features:

    • Full set up of Apache operating as our gateway/proxy web server
    • SSL Certification with certbot
    • NextJs as our application server (providing the “Express”, “React” and “Node” parts of our MERN stack)
    • User signup and authentication with:
      • AWS Cognito as our Authentication Server
      • MongoDB as our general/business-logic DB

    Apache & Certbot Setup

    Apache 101

    This section is aimed at beginners setting up Apache. Apache is a web server. Its job is to receive an http/https request and return a response. That response is usually one of three things:

    1. A copy of a file on the filesystem
    2. HTML representing the content within the filesystem with links to download individual files (a ‘file browser’)
    3. A response that Apache gets back from another server that Apache “proxied” your original request to.

    In my opinion, absolutely every web developer needs to know how to set up Apache and/or Nginx with SSL certification to be able to accomplish these three things. I tend to use Apache because I am just more used to it than Nginx.

    An important concept in Apache is that of a “virtual host”. The server that Apache runs on can host multiple applications. You might want to serve some files in a folder to the internet at one subdomain (e.g. myfiles.mydomain.com), a react app at another domain (e.g. react.mydomain.com), and an API with JSON responses at yet another domain (e.g. api.mydomain.com).

    In all three of these example subdomains, you are setting up the DNS server to point the subdomain to the same IP Address — the public IP Address of your home in this project’s case. So if requests are coming into the same apache server listening on port 443, then we need in general to configure Apache to separate these requests and have them processed by the appropriate process running on our machine. The main way that one configures Apache to separate requests is based on the target subdomain. This is done by creating a “virtual host” within the Apache configuration, as demonstrated below.

    Installing Apache and Certbot on Ubuntu 20.04

    Installing apache and certbot on Ubuntu 20.04 is quite straightforward.

    sudo apt install apache2
    sudo snap install core
    sudo snap refresh core
    sudo apt remove certbot
    sudo snap install --classic certbot
    sudo ln -s /snap/bin/certbot /usr/bin/certbot

    We also need to enable the following apache modules with the a2enmod tool (“Apache2-Enable-Module”) that gets installed along with the apache service:

    sudo a2enmod proxy_http proxy macro

    Make sure you have a dynamic domain name pointing to your public IP address, and run the certbot wizard with automatic apache configuration:

    sudo certbot --apache

    If this is the first time running, it will prompt you for an email, domain names, and whether to set up automatic redirects from http to https. (I recommend you do.) It will then modify your configuration files in /etc/apache2/sites-available. The file /etc/apache2/sites-available/000-default-le-ssl.conf looks something like this:

    <IfModule mod_ssl.c>
    <VirtualHost *:443>
            # ...        
            ServerAdmin webmaster@localhost
            DocumentRoot /var/www/html
            # ...
            ErrorLog ${APACHE_LOG_DIR}/error.log
            CustomLog ${APACHE_LOG_DIR}/access.log combined
            # ...
            ServerName www.yourdomain.com
            SSLCertificateFile /etc/letsencrypt/live/www.yourdomain.com/fullchain.pem
            SSLCertificateKeyFile /etc/letsencrypt/live/www.yourdomain.com/privkey.pem
            Include /etc/letsencrypt/options-ssl-apache.conf
    </VirtualHost>
    </IfModule>

    There is quite a lot of boilerplate stuff going on in this single virtual host. It basically says “create a virtual host so that requests received on port 443 with a target URL at subdomain www.yourdomain.com will get served a file from the directory /var/www/html; decrypt using the information within these SSL files; if errors occur, log them in the default location; etc.”.

    Since we might want to have lots of virtual hosts set up on this machine, each with certbot SSL certification, we will want to avoid having to repeat all of this boilerplate.

    To do this, let’s first disable this configuration with the tool sudo a2dissite 000-default-le-ssl.conf.

    Now let’s create a fresh configuration file with sudo touch /etc/apache2/sites-available/mysites.conf, enable it with sudo a2ensite mysites.conf, and add the following text:

    <IfModule mod_ssl.c>
    <Macro SSLStuff>
        ServerAdmin webmaster@localhost
        ErrorLog ${APACHE_LOG_DIR}/error.log
        CustomLog ${APACHE_LOG_DIR}/access.log combined
        Include /etc/letsencrypt/options-ssl-apache.conf
        SSLCertificateFile /etc/letsencrypt/live/www.yourdomain.com/fullchain.pem
        SSLCertificateKeyFile /etc/letsencrypt/live/www.yourdomain.com/privkey.pem
    </Macro>
    
    <VirtualHost _default_:443>
        Use SSLStuff
        DocumentRoot /var/www/notfound
    </VirtualHost>
    <VirtualHost *:443>
        Use SSLStuff
        ServerName www.yourdomain.com
        ProxyPass / http://127.0.0.1:5050/
        ProxyPassReverse / http://127.0.0.1:5050/
    </VirtualHost>
    </IfModule>

    Here we are making use of the apache “macro” module we enabled earlier to define the boilerplate configurations that we want all of our virtual hosts to have. By including the line Use SSLStuff in a virtual host, we thereby include everything we defined in the SSLStuff block.

    This configuration has two virtual hosts. The first one is a default; if a request is received without a recognized domain, then serve files from /var/www/notfound. (You of course need to create such a dir, and, at minimum, have an index.html file therein with a “Not found” message.)

    The second virtual host tells Apache to take any request sent to www.yourdomain.com and forward it on to localhost on port 5050 where, presumably, a separate server process will be listening for http requests. This port is arbitrary, and is where we will be setting up our nextJs app.

    Whenever you change apache configurations, you of course need to restart apache with sudo systemctl restart apache2. To quickly test that the proxied route is working, install node (I always recommend doing this with nvm), install a simple server with npm i -g http-server, create a test index.html file somewhere on your filesystem, and run http-server -p 5050.
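    Concretely, that smoke test might look like this (a sketch; the directory is arbitrary, and the port must match your ProxyPass line):

    npm i -g http-server
    mkdir -p ~/proxytest && echo 'hello via the apache proxy' > ~/proxytest/index.html
    http-server ~/proxytest -p 5050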

    Now visit the proxied domain and confirm that you are receiving the content of the index.html file you just created. The great thing about this set up is that Apache is acting as a single encryption gateway on port 443 for all of your apps, so you don’t need to worry about SSL configuration on your individual application servers; all of our inner microservices are safe!

    Expanding Virtual Hosts

    There will inevitably come a time when you want to add more virtual hosts for new applications on the same server. Say that I want to have a folder for serving miscellaneous files to the world.

    First, you need to go back to your DNS interface (AWS Route 53 in my case), and add a new subdomain pointing to your public IP Address.

    Next, in my case, where I am using a script to dynamically update the IP Address that my AWS-controlled domains point to, as I described in the last part of this cluster series, I need to open up crontab -e and add a line for this new domain.

    Next, I need to change the apache configuration by adding another virtual host and restarting apache:

    <VirtualHost *:443>
        Use SSLStuff
        DocumentRoot /var/www/miscweb
        ServerName misc.yourdomain.com
    </VirtualHost>

    Next, we need to create a dir at /var/www/miscweb (with /var/www being the conventional location for all dirs served by apache). Since /var/www has strict read/write permissions requiring sudo, and since I don’t want to have to remember to use sudo every time I want to edit a file therein, I tend to create the real folder in my home dir and link it there with, in this case:

    sudo ln -fs /home/myhome/miscweb /var/www/miscweb

    Next, I need to rerun certbot with a command to expand the domains listed in my active certificate. This is done with the following command:

    sudo certbot certonly --apache --cert-name www.yourdomain.com --expand -d \
    www.yourdomain.com,\
    misc.yourdomain.com

    Notice that when you run this expansion command you have to specify ALL of the domains to be included in the updated certificate including those that had been listed therein previously; it’s not enough to specify just the ones you want to add. Since it can be hard to keep up with all of your domains, I recommend that you keep track of this command with all of your active domains in a text file somewhere on your server. When you want to add another domain, first edit this file with one domain on each line and then copy that new command to the terminal to perform the update.

    If you want to prevent the user from browsing the files within ~/miscweb, then you need to place an index.html file in there. Add a simple message like “Welcome to my file browser for misc sharing” and check that it works by restarting apache and visiting the domain with https.

    Quick Deploy of NextJs

    We’ll talk more about nextJs in the next part. For now, we’ll do a very quick deployment of nextJs just to get the ball rolling.

    Normally, I’d develop my nextJs app on my laptop, push changes to github or gitlab, pull those changes down on the server, and restart it. However, since node is already installed on the RPi primary node, we can just give it a quick start by doing the following:

    • Install pm2 with npm i -g pm2
    • Create a fresh nextJs app in your home directory with cd ~; npx create-next-app --typescript
    • Move into the dir of the project you just created and edit the start script to include the port you will proxy to: "start": "next start -p 5050"
    • To run the app temporarily, run npm run start and visit the corresponding domain in the browser to see your nextJs boilerplate app in service
    • To run the app indefinitely (even after you log out of the ssh shell, etc.), you can use pm2 to run it as a managed background process like so: pm2 start npm --name "NextJsDemo" -- start $PWD -p 5050

    NextJs has several great features. First, it will pre-render all of your react pages for fast loading and strong SEO. Second, it comes with express-like API functionality built-in. Go to /api/hello at your domain to see the built-in demo route in action, and the corresponding code in pages/api/hello.ts.

    More on NextJs in the next part!

  • Raspberry Pi Cluster Part IV: Opening Up to the Internet

    Intro

    We want to be able to serve content from, and access, the cluster over the Internet from a home connection. To do that, we need to set up our home router with port forwarding, and open up ports for SSH, HTTP and HTTPS.

    All traffic goes through our primary node (RPi1). If I want to connect to a non-primary node from the outside world, then I can just ssh into RPi1 and then ssh onto the target node.

    So far, we have been able to move around the network by using passwords. This is not ideal. Having to type in a password each time slows us down and makes our automations trickier to secure. It’s also just not good practice to use passwords, since hackers will spam your ssh servers with brute-force password attacks.

    Network SSH Inter Communication

    So we want to be able to ssh from our laptop into any node, and from any node to any node, without using a password. We do this with private-public key pairs.

    First, if you have not done so already, create a public-private key pair on your unix laptop with ssh-keygen -t rsa and skip adding a passphrase:

    ❯ ssh-keygen -t rsa
    Generating public/private rsa key pair.
    Enter file in which to save the key (/home/me/.ssh/id_rsa):
    Enter passphrase (empty for no passphrase):
    Enter same passphrase again:
    Your identification has been saved in /home/me/.ssh/id_rsa
    Your public key has been saved in /home/me/.ssh/id_rsa.pub
    The key fingerprint is:
    ....
    The key's randomart image is:
    +---[RSA 3072]----+
    |      . +.o.     |
    ...
    |       .o++Boo   |
    +----[SHA256]-----+

    This will generate two files (“public” and “private”) in ~/.ssh (id_rsa.pub and id_rsa respectively). Now, in order to ssh into another host, we need to copy the content of the public key file into the file ~/.ssh/authorized_keys on the destination server. Once that is in place, you can ssh without needing a password.

    Rather than manually copying the content of the public key file into a remote host, Unix machines provide a script called `ssh-copy-id` that provides a shortcut for you. In this case, running it as ssh-copy-id name@server will copy the default id_rsa.pub file content to the /home/user/.ssh/authorized_keys file on the server. If you want to use a non-default public key file, you can specify it with ssh-copy-id -i [non-default-key-file] name@server.

    Once the public-private keys are available on the laptop, we can make this process easier by copy/pasting the script below to copy the public portion into each of the nodes. Each time, we’ll need to enter that node’s password, but afterwards we can ssh in without passwords.

    while read SERVER
    do
        ssh-copy-id user@"${SERVER}"
    done <<\EOF
    rpi1
    rpi2
    rpi3
    rpi4
    EOF

    Next, we want to be able to ssh from any node into any other node. We could just copy the private key we just created on the laptop into each node, but this is not the safest practice. So, in this case, I went into each node and repeated the process (created a private-public pair, then ran the above script).

    Securing SSH Connections

    Raspberry Pis are notorious for getting pwned by bots as soon as they’re opened up to the internet. One way to help ensure the security of your RPi is to use Two-Factor Authentication (2FA).

    I am not going to do that because it creates added complexity to keep up with, and the other measures I’ll be taking are good enough.

    Now that we have set up the ssh keys on our nodes, we need to switch off the ability to ssh in using a password — especially on our primary node, which will be open to the internet. We do this by editing the file /etc/ssh/sshd_config, changing the line PasswordAuthentication yes to PasswordAuthentication no, and restarting the sshd service. Having done that, the only way to ssh in now is with the public-private key pairs, and they only exist on my laptop. (If my laptop gets lost or destroyed, then the only way to access these nodes will be by directly connecting them to a monitor with keyboard, and logging in with a password.)

    The next precaution we’ll take is to change the port on which the sshd service runs on the primary node from the default 22 to some random port. We do this by uncommenting the line #Port 22 in the sshd_config file and changing the number to, say, Port 60022 and restarting the sshd service.

    Having made this change, you will need to specify this non-default port whenever you try to get into that node, e.g. ssh -p 60022 user@rpi1. A bit annoying but, I have heard through the grapevine, this will stop 99% of hacker bots in their tracks.
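    For reference, a minimal sketch of these two sshd_config changes (the port number is just an example; double-check the file before restarting):

    # disable password logins and move sshd off port 22
    sudo sed -i 's/^#\?PasswordAuthentication.*/PasswordAuthentication no/' /etc/ssh/sshd_config
    sudo sed -i 's/^#\?Port 22/Port 60022/' /etc/ssh/sshd_config
    sudo systemctl restart sshd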

    Finally, we can install a service called fail2ban with the usual sudo apt install fail2ban. This is a service that scans log files for suspicious behavior (e.g. failed logins, email-searching bots) and takes action to prevent malicious behavior by modifying the firewall for a period of time (e.g. banning the IP address).

    With these three measures in place, we can be confident in the security of the cluster when opening it up to the internet.

    Dynamic DNS

    To open up our cluster to the internet, we need to get our home’s public IP address. This is assigned to your home router/modem box by your Internet Service Provider (ISP). In my case, I am using a basic Xfinity service with very meager upload speeds. I’ve used Xfinity for several years now, and it tends to provide fairly stable IP addresses, and does not seem to care if you use your public IP address to host content. (By contrast, I once tried setting up port-forwarding for a friend who had a basic home internet connection provided by Cox, and Cox seemed to actively adapt to block this friend’s outgoing traffic. I.e. Cox wants you to upgrade to a business account to serve content from your home connection.)

    To see your public IP Address, run curl http://checkip.amazonaws.com.

    We want to point a domain name towards our home IP address, but since ISPs can change your public IP address without notice, we need to come up with a method to adapt to any such change. The “classic” way to adapt to these changes in IP address is to use a “dynamic” DNS (DDNS) service. Most modern modem/router devices will give you the option to set this up with an account from a company like no-ip.com, and there are plenty of tutorials on the web if you wish to go this route.

    However, I don’t want to pay a company like no-ip.com, or mess about with their “free-tier” service that requires you to e.g. confirm/renew the free service every month.

    Since my DNS records are managed with AWS Route 53, we can instead use a cronjob script to periodically check whether the IP Address assigned by the ISP is still the same as the one that the AWS DNS record points to and, if it has changed, use the AWS CLI to update the record. The process is described here and the adapted script that I am using is based on this gist. The only change I made was to extract the two key variables HOSTED_ZONE_ID and NAME into the script’s arguments (in order to allow me to run this same script for multiple domains).

    Once I had the script _update_ddns in place, I decided to run it every 10 minutes by opening crontab -e and adding the line:

    */10 * * * * /home/myname/ddns_update/_update_ddns [MY_HOSTED_ZONE] [DOMAIN] >/dev/null 2>&1

    Port Forwarding

    Finally, we need to tell our modem/router device to forward requests submitted to our ssh, http and https ports onto the primary node’s wifi IP Address. Every router/modem device will be different depending on the brand and model, so you’ll have to poke around for the “port-forwarding” setup.

    In my case, I’m using an Arris router, and it was not too hard to find the port-forwarding settings. You’ll then need to set up a bunch of rules that tell the device how to route packets that come from the external network on a given port (80 and 443 in the figure below) to the internal address and ports you want those packets directed to (192.168.0.51 in the figure below). Also add a rule if you want to be able to ssh into your primary node on the non-default port.

    Port forwarding on my home modem/router device.

    Make sure you have a server running, e.g. apache, when you test the URL you set up through route 53.

    And that’s it — we have a pretty secure server open to the internet.

  • Raspberry Pi Cluster Part II: Network Setup

    Introduction

    In the last post we got the hardware in order and made each of our 4 RPi nodes production ready with Ubuntu Server 20.04. We also established wifi connections between each node and the home router.

    In this post, I’m going to describe how to set up the “network topology” that will enable the cluster to become easily transportable. The primary RPi4 node will act as the gateway/router to the cluster. It will communicate with the home router on behalf of the whole network. If I move in the future, then I’ll only have to re-establish a wifi connection with this single node in order to restore total network access to each node. I also only need to focus on securing this node in order to expose the whole cluster to the internet. Here’s the schematic again:

    In my experience, it’s tough to learn hardware and networking concepts because the field is thick with jargon. I am therefore going to write as though to my younger self keenly interested in becoming self-reliant in the field of computer networking.

    Networking Fundamentals

    If you’re not confident with your network fundamentals, then I suggest you review the following topics by watching the linked explainer videos. (All these videos are made by the YouTube channel “Power Cert Animated Videos” and are terrific.)

    Before we get into the details of our cluster, let’s quickly review the three main things we need to think about when setting up a network: IP-address assignment, domain-name resolution, and routing.

    IP-Address Assignment

    At its core, networking is about getting fixed-length “packets” of 1s and 0s from one program running on a computer to another program running on any connected computer (including programs running on the same computer). For that to happen, each computer needs to have an address – an IP Address – assigned to it. As explained in the above video, the usual way in which that happens is by interacting with a DHCP server. (However, most computers nowadays run a process in the background that will attempt to negotiate an IP Address automatically in the event that no machine on its network identifies itself as a DHCP server.) In short, we’ll need to make sure that we have a DHCP server on our primary node in order to assign IP addresses to the other nodes.

    Domain-Name Resolution

    Humans do not like to write instructions as 1s and 0s, so we need each node in our network to be generally capable of translating a human-readable address (e.g. ‘www.google.com’, ‘rpi3’) into a binary IP address. This is where domain-name servers (DNS) and related concepts come in.

    The word “resolve” is used to describe the process of converting a human-readable address into an IP address. In general, an application that needs to resolve an IP address will interact with a whole bunch of other programs, networks and servers to obtain its target IP address. The term “resolver” is sometimes used to refer to this entire system of programs, networks and servers. The term resolver is also sometimes used to refer to a single element within such a system. (Context usually makes it clear.) From hereon, we’ll use “resolver” to refer to a single element within a system of programs, networks and servers whose job is to convert strings of letters to an IP Address, and “resolver system” to refer to the whole system.

    Three types of resolver to understand here are “stub resolvers”, “recursive resolvers”, and “authoritative resolvers”. A stub resolver is a program that basically acts as a cache within the resolver system. If it has recently received a request to return an IP address in exchange for a domain name (and therefore has it in its cache), then it will return that IP address. Otherwise, it will pass the request on to another resolver (which might also be a stub resolver that has to just pass the buck on).

    A recursive resolver will also act as a cache and if it does not have all of the information needed to return a complete result, then it will pass on a request for information to another resolver. Unlike a stub resolver though, it might not receive back a final answer to its question but, rather, an address to another resolver that might have the final answer. The recursive resolver will keep following any such lead until it gets its final answer.

    An “authoritative” resolver is a server that does not pass the buck on. It’s the final link in the chain, and if it does not have the answer or suggestions for another server to consult, then the resolution will fail, and all of these resolvers will send back a failure message.

    In summary, domain-name resolution is all about finding a simple lookup table that associates a string (domain name) with a number (the IP Address). This entry in the table is called an “A Record” (A for Address).

    Routing

    Once a program has an IP Address to send data to, it needs to know where first to send the packet in order to get it relayed. In order for this to happen, each network interface needs to have a router (gateway) address applied to it when configured. You can see the router(s) on a linux machine with route -n. In a home setup, this router will be the address of the wifi/modem box. Once the router address is determined, the application can just send packets there and the magic of Internet routing will take over.

    Ubuntu Server Networking Fundamentals

    Overview

    Ubuntu Server 20.04, which we’re using here, comes with several key services/tools that are installed/enabled by default or by common practice: systemd-resolved, systemd-networkd, NetworkManager and netplan.

    systemd-resolved

    You can learn the basics about it by running:

    man systemd-resolved

    This service is a stub resolver making it possible for applications running on the system to resolve hostnames. Applications running on the system can interact with it by issuing some low-level kernel jazz via their underlying C libraries, or by pinging the internal (“loopback”) network address 127.0.0.53. To see it in use as a stub server, you can run dig @127.0.0.53 www.google.com.

    You can check what DNS servers it is set up to consult by running resolvectl status. (resolvectl is a pre-installed tool that lets you interact with the running systemd-resolved service; see resolvectl --help to get a sense of what you can do with it.)

    Now we need to ask: how does systemd-resolved itself resolve hostnames? It does so by communicating over the network with a DNS server. But how do you configure it so that it knows which DNS servers to consult, and in what order of priority?

    systemd-networkd

    systemd-networkd is a pre-installed and pre-enabled service on Ubuntu that acts as a DHCP client (listening on port 68 for signals from a DHCP server). So when you switch on your machine and this service starts up, it will negotiate the assignment of an IP Address on the network based upon DHCP broadcast signals. In the absence of a DHCP server on the network, it will negotiate with any other device. I believe it is also involved in the configuration of interfaces.

    NetworkManager

    This is an older service that does much the same as networkd. It is NOT enabled by default, but is so prominent that I thought it would be worth mentioning in this discussion. (Also, during my research to try and get the cluster configured the way I want it, I installed NetworkManager and messed with it only to ultimately conclude that this was unnecessary and confusing.)

    Netplan

    Netplan is a pre-installed tool (not a service) that, in theory, makes it easier to configure systemd-resolved and either networkd or NetworkManager. The idea is that you declare your desired network end state in a YAML file (/etc/netplan/50-cloud-init.yaml) so that after start up (or after running netplan apply), it will do whatever needs to be done under the hood with the relevant services to get the network into your desired state.

    Other Useful Tools

    In general, when doing networking on linux machines, it’s useful to install a couple more packages:

    sudo apt install net-tools traceroute

    The net-tools package gives us a bunch of classic command-line utilities, such as netstat. I often use it (in an alias) to check what ports are in use on my machine: sudo netstat -tulpn.

    traceroute is useful in making sense of how your network is presently set up. Right off the bat, running traceroute google.com will show you how you reach google.

    Research References

    For my own reference, the research I am presenting here is derived in large part from the following articles:

    • This is the main article I consulted that shows someone using dnsmasq to set up a cluster very similar to this one, but using Raspbian instead of Ubuntu.
    • This article and this article on getting dnsmasq and system-resolved to handle single-word domain names.
    • Overview of netplan, NetworkManager, etc.
    • https://unix.stackexchange.com/questions/612416/why-does-etc-resolv-conf-point-at-127-0-0-53
    • This explains why you get the message “ignoring nameserver 127.0.0.1” when starting up dnsmasq.
    • Nice general intro to key concepts with linux
    • This aids understanding of systemd-resolved’s priorities when multiple DNS’s are configured on same system
    • https://opensource.com/business/16/8/introduction-linux-network-routing
    • https://www.grandmetric.com/2018/03/08/how-does-switch-work-2/
    • https://www.cloudsavvyit.com/3103/how-to-roll-your-own-dynamic-dns-with-aws-route-53/

    Setting the Primary Node

    OK, enough preliminaries, let’s get down to setting up our cluster.

    A chief goal is to try to set up the network so that as much of the configuration as possible is on the primary node. For example, if we want to be able to ssh from rpi2 to rpi3, then we do NOT want to have to go to each node and explicitly state where each hostname is to be found.

    So we want our RPi4 to operate as the single source of truth for domain-name resolution and IP-address assignment. We do this by running dnsmasq – a simple service that turns our node into a DNS and DHCP server:

    sudo apt install dnsmasq
    sudo systemctl status dnsmasq

    We configure dnsmasq with /etc/dnsmasq.conf. On a fresh install, this conf file will be full of fairly detailed notes. Still, it takes some time to get the hang of how it all fits together. This is the file I ended up with:

    # Choose the device interface to configure
    interface=eth0
    
    # Also listen on the loopback address
    # Note: this might be redundant
    listen-address=127.0.0.1
    
    # Enable addresses in range 10.0.0.1-128 to be leased out for 12 hours
    dhcp-range=10.0.0.1,10.0.0.128,12h
    
    # Assign static IPs to cluster members
    # Format = MAC,hostname,IP
    dhcp-host=ZZ:YY:XX:WW:VV:UU,rpi1,10.0.0.1
    dhcp-host=ZZ:YY:XX:WW:VV:UU,rpi2,10.0.0.2
    dhcp-host=ZZ:YY:XX:WW:VV:UU,rpi3,10.0.0.3
    dhcp-host=ZZ:YY:XX:WW:VV:UU,rpi4,10.0.0.4
    
    # Broadcast the router, DNS and netmask to this LAN
    dhcp-option=option:router,10.0.0.1
    dhcp-option=option:dns-server,10.0.0.1
    dhcp-option=option:netmask,255.255.255.0
    
    # Broadcast host-IP relations defined in /etc/hosts
    # And enable single-name domains
    # See here for more details
    expand-hosts
    domain=mydomain.net
    local=/mydomain.net/
    
    # Declare upstream DNS's; we'll just use Google's
    server=8.8.8.8
    server=8.8.4.4
    
    # Useful for debugging issues
    # Run 'journalctl -u dnsmasq' for resultant logs
    log-queries
    log-dhcp
    
    # These two are recommended default settings
    # though the exact scenarios they guard against 
    # are not entirely clear to me; see man for further details
    domain-needed
    bogus-priv

    Hopefully these comments are sufficient to convey what is going on here. Next, we make sure that the /etc/hosts file associates the primary node with its hostname, rpi1. It’s not clear to me why this is needed. The block of dhcp-host definitions above does succeed in enabling dnsmasq to resolve rpi2, rpi3, and rpi4, but the line for rpi1 does not work. I assume this is because dnsmasq is not the one assigning rpi1 its IP address, and dhcp-host entries only resolve for hosts whose addresses dnsmasq itself hands out. (Why that should be the case still seems odd to me.)

    # /etc/hosts
    10.0.0.1 rpi1
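
    Before rebooting anything, it’s worth asking dnsmasq to syntax-check the configuration, then restarting the service and keeping an eye on its logs:

    sudo dnsmasq --test                # prints 'syntax check OK' if the conf parses
    sudo systemctl restart dnsmasq
    journalctl -u dnsmasq -f           # follow the logs enabled by log-queries/log-dhcp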

    Finally, we need to configure the file /etc/netplan/50-cloud-init.yaml on the primary node to give this node a static IP address on both the wifi and ethernet networks.

    network:
        version: 2
        ethernets:
            eth0:
                dhcp4: no
                addresses: [10.0.0.1/24]
        wifis:
            wlan0:
                optional: true
                access-points:
                    "MY-WIFI-NAME":
                        password: "MY-PASSWORD"
                dhcp4: no
                addresses: [192.168.0.51/24]
                gateway4: 192.168.0.1
                nameservers:
                    addresses: [8.8.8.8,8.8.4.4]

    Once these configurations are in place and rpi1 is rebooted, you can expect ifconfig to show IP addresses assigned to eth0 and wlan0, and resolvectl dns to read something like:

    Global: 127.0.0.1
    Link 3 (wlan0): 8.8.8.8 8.8.4.4 2001:558:feed::1 2001:558:feed::2
    Link 2 (eth0): 10.0.0.1

    Setting up the Non-Primary Nodes

    Next we jump into the rpi2 node and edit its netplan config (under /etc/netplan/) to:

    network:
        version: 2
        ethernets:
            eth0:
                dhcp4: true
                optional: true
                gateway4: 10.0.0.1
                nameservers:
                    addresses: [10.0.0.1]
        wifis:
            wlan0:
                optional: true
                access-points:
                    "MY-WIFI-NAME":
                        password: "MY-PASSWORD"
                dhcp4: no
                addresses: [192.168.0.52/24]
                gateway4: 192.168.0.1
                nameservers:
                    addresses: [8.8.8.8,8.8.4.4]

    This tells netplan to set up systemd-networkd to get its IP Address from a DHCP server on the ethernet network (which will be found to be on rpi1 when the broadcast event happens), and to route traffic and submit DNS queries to 10.0.0.1.

    To reiterate, the wifi config isn’t strictly part of the cluster’s topology; it is optional, added because being able to ssh straight into any node makes life easier when setting up the network. In my current setup, I am assigning all the nodes static IP addresses on the wifi network of 192.168.0.51-54.

    Next, as described here, in order for our network to be able to resolve single-word domain names, we need to alter the behavior of systemd-resolved by linking these two files together:

    sudo ln -sf /run/systemd/resolve/resolv.conf /etc/resolv.conf

    This points /etc/resolv.conf at the file in which systemd-resolved records the actual DNS servers and search domains it has learned per link (in our case, the ones dnsmasq on rpi1 advertises over DHCP), rather than at the local stub resolver.
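
    A quick way to sanity-check the result (the exact contents depend on what the DHCP server advertises, so treat the expectations in the comments as rough):

    readlink -f /etc/resolv.conf   # should now resolve to /run/systemd/resolve/resolv.conf
    cat /etc/resolv.conf           # expect a 'nameserver 10.0.0.1' line and 'search mydomain.net'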

    After rebooting, and doing the same configuration on rpi3 and rpi4, we can run dig rpi1, dig rpi2, etc. on any of the non-primary nodes and expect to get the single-word hostnames resolved as we intend.

    If we go to rpi1 and check the ip-address leases:

    cat /var/lib/misc/dnsmasq.leases

    … then we can expect to see that dnsmasq has successfully acted as a DHCP server. You can also check that dnsmasq has been receiving DNS queries by examining the system logs: journalctl -u dnsmasq.

    Routing All Ethernet Traffic Through the Primary Node

    Finally, we want all nodes to be able to connect to the internet by routing through the primary node. This is achieved by first uncommenting the line net.ipv4.ip_forward=1 in the file /etc/sysctl.conf and then running the following commands:

    sudo iptables -t nat -A POSTROUTING -o wlan0 -j MASQUERADE
    sudo iptables -A FORWARD -i wlan0 -o eth0 -m state --state RELATED,ESTABLISHED -j ACCEPT
    sudo iptables -A FORWARD -i eth0 -o wlan0 -j ACCEPT

    These lines mean something like the following:

    1. When doing network-address translation (-t nat), just before a packet goes out via the wifi interface (-A POSTROUTING = “append a postrouting rule”), replace its source IP address with this machine’s IP address on the outbound (wifi) network
    2. Allow packets arriving on wifi to be forwarded out through ethernet, but only if they belong to a connection that is already established (or related to one), i.e. replies to traffic the cluster initiated
    3. Allow packets arriving on ethernet to be forwarded out through wifi
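
    Note that the change to /etc/sysctl.conf only takes effect at boot unless you load it by hand, so before testing it’s worth doing:

    sudo sysctl -p                          # reload /etc/sysctl.conf now
    cat /proc/sys/net/ipv4/ip_forward       # expect '1'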

    For these rules to survive across reboots you need to install:

    sudo apt install iptables-persistent

    and agree to storing the rules in /etc/iptables/rules.v4. Reboot, and you can now expect to be able to access the internet from any node, even when that node’s own wifi interface is down (sudo ifconfig wlan0 down).
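
    As a rough check that traffic really is flowing via rpi1 (assuming the setup above), you can run the following from one of the non-primary nodes:

    sudo ifconfig wlan0 down      # take down this node's own wifi
    traceroute -n google.com      # the first hop should be 10.0.0.1 (rpi1)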

    Summary

    So there we have it – an easily portable network. If you move location then you only need to adjust the wifi-connection details in the primary node, and the whole network will be connected to the Internet.

    In the next part, we’ll open the cluster up to the internet through our home router and discuss security and backups.

  • Launching Subprocesses in Python3

    Motivation

    It’s important in the world of IT to know a scripting language for data processing, task automation, etc. For simple tasks (e.g. moving files, starting programs), my go-to scripting language is Bash. But when I need something with more tools and precision (e.g. parsing HTML), I use python3.

    I recently decided to move towards coding in a more cross-platform manner, which basically means less bash and more python. That meant I needed to get more comfortable with python’s system for launching subprocesses. For years I’d been in the habit of copy/pasting code like this (which I probably grabbed from Stack Overflow originally), without really thinking through what was happening:

    import subprocess
    # Launch some shell script CMD
    p = subprocess.Popen(
        CMD,
        shell=True,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT
    )
    # Wait for it to finish
    p.wait()
    # Save the result to a str variable
    result: str = p.stdout.read().decode('utf8')

    Not only is this a lot of code compared to the Bash equivalent (i.e. just $CMD), I also wasn’t very clear on whether all of these verbose arguments and methods were really needed (spoiler: they’re not!), and that had been a mental block to using python3 more liberally in the past. So, having now actually read the subprocess documentation, I figured I’d consolidate what I learnt there by trying to describe it all to a younger self.

    Standard Streams Recap

    Skip this section if you’re already well-versed in the “Standard Streams” (stdin, stdout, and stderr). If you are not, you’ll need them to understand the above code block, so here is a super swift crash course (with focus on the two output streams: stdout and stderr).

    What streams “are” is a bit abstract (think inner OS/kernel magic), and it’s easier to learn to think in terms of how you work with them. Let’s start by thinking in general terms about the sort of things one wants to be able to do when writing a program:

    • receive input information
    • send signals/data to other running processes
    • interact with devices (files in storage, graphics cards, network cards, etc.)
    • signal to developers/administrators the state and/or output of the program
    • start child processes (that can also do all of the above)

    The fourth item (signalling to developers/administrators the state and/or output of the program) is what output streams are all about, and it will be our focus. When you write a program, you want to be able to get information about the state of the program to the user who will run the program, but you want that end user to get to decide how that information is to be viewed/processed.​*​ ​†​

    Since you do not know in advance what the user will want to do with the information that you will want the program to broadcast, the operating system (OS) provides you with the ability to send messages out of your program in a sort of “un-opinionated” manner. More specifically, the OS lets you place a label on the data you want to emit (viz. “standard” or “error”), but where such messages will go, and how they will be used, will not be decided at the moment you write the program. Rather, the user of the program will be in control as to where data output with the “standard” label will be sent, and where data output with the “error” label will be sent.​‡​

    The purpose of providing the programmer with two separate output channels is that it allows the end user to separate these messages by, for example, viewing the messages sent to stdout on the terminal and saving the messages sent to stderr to a file.

    To see this in action, we’ll consider the simple command “ping www.google.com” run on a linux terminal. (I choose ping because it runs indefinitely, allowing us to examine the properties of this process before it ends.)

    If your network connection is ok, this program will print a line to stdout every second. Now, the terminal is itself a program (a special program) that is designed to receive input and run other programs; and, being a program, it can (and does) send messages to stdout and stderr.

    Where do those messages “end up”? We can find the PID of the ping process (ps -ef | grep -Ei "PID|ping"), which is 5381 in this case, and then use that PID in the following command on linux:

    sudo ls -l /proc/5381/fd
    lrwx------ 1 root root 64 Sep 28 00:10 0 -> /dev/tty1
    lrwx------ 1 root root 64 Sep 28 00:10 1 -> /dev/tty1
    lrwx------ 1 root root 64 Sep 28 00:10 2 -> /dev/tty1

    The file descriptors shown in this print out (/proc/5381/fd/0, /proc/5381/fd/1, and /proc/5381/fd/2) tell you that the stdin, stdout and stderr respectively for this process all “point to” /dev/tty1. This is a linux virtual device file that you can think of as a handle or interface to a driver for the (emulated) terminal. (This is the same terminal that ping was started in, which can be confirmed by running the command tty.) Since ping prints to stdout, and since stdout points to the terminal emulator, the data is sent there and displayed on the screen accordingly.

    As stated earlier, the destination of messages sent to stdout and stderr is only determined at the moment that the program is turned into a new process by the OS. In the case of a linux terminal, the processes that are started therein, such as ping, inherit by default the same standard-stream destinations as those of the terminal. This is why the file descriptors above all point to the terminal device /dev/tty1 by default. But we can change what the file descriptors will point to when we start the process by using redirects.
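
    (You can see the starting point of this inheritance by inspecting your interactive shell’s own descriptors; $$ expands to the shell’s PID.)

    ls -l /proc/$$/fd    # 0, 1 and 2 should all point at the same tty device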

    For example, if we now begin a new process in the terminal with ping www.google.com 1> /dev/null, then we get a new PID (5816), and override the default value of the file descriptor (which would have been /proc/5816/fd/1 -> /dev/tty1), so that we won’t see any regular printout to the terminal. Examining the file descriptors again for the new ping process:

    sudo ls -l /proc/5816/fd
    lrwx------ 1 root root 64 Sep 28 00:10 0 -> /dev/tty1
    lrwx------ 1 root root 64 Sep 28 00:10 1 -> /dev/null
    lrwx------ 1 root root 64 Sep 28 00:10 2 -> /dev/tty1
    ...

    … confirms that stdout is pointing to /dev/null — the linux “black hole” — so the messages now just get thrown away. Likewise, if we now redirect stderr to a file, and stdout to stderr when starting ping:

    ping www.google.com 2> /tmp/temp.txt 1>&2 
    sudo ls -l /proc/5816/fd
    lrwx------ 1 root root 64 Sep 28 01:25 0 -> /dev/pts/0
    l-wx------ 1 root root 64 Sep 28 01:25 1 -> /tmp/temp.txt
    l-wx------ 1 root root 64 Sep 28 01:25 2 -> /tmp/temp.txt

    … then we get nothing printed to the terminal, and the output of ping ends up in /tmp/temp.txt, as expected.

    A few notes are useful here if this is new-ish to you:

    • The numbers around the redirect symbol > represent the standard streams as follows: stdin: 0, stdout: 1, stderr: 2. So 1>&2 means redirect stdout to stderr, etc.
    • A redirect symbol > without a number in front is short for 1> (redirect stdout to something)
    • You need an ampersand & after the > symbol whenever you redirect to a number representing a standard stream, otherwise the terminal will read e.g. 2>1 as “redirect stderr to a file named 1“. Don’t use an ampersand though if redirecting to a file path.
    • The order of the redirects in the earlier example might seem counterintuitive at first. You might expect it to look like ping www.google.com 2>&1 1> /tmp/temp.txt, which reads as though it says “redirect stderr to stdout, and stdout to a file”. But think of these redirects as setting what the file descriptors point to, and read the command from left to right: at the moment the terminal reads 2>&1, it sets /proc/5816/fd/2 to point to the destination currently held by /proc/5816/fd/1, which has not yet been changed from its default value; so that redirect has no useful effect, and stderr will still print to screen. That is why you need to first point one of the streams at the file (e.g. /proc/5816/fd/1 -> /tmp/temp.txt), and then point the other stream at the same thing as the first (e.g. /proc/5816/fd/2 -> /proc/5816/fd/1 -> /tmp/temp.txt). A short demonstration of this follows the list.
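
    Here is that demonstration. ls is given one path that exists and one that doesn’t, so it writes to both streams (the paths themselves are arbitrary):

    # Works: fd 1 is pointed at the file first, then fd 2 copies it,
    # so both streams end up in /tmp/both.txt
    ls /etc/hostname /nonexistent 1> /tmp/both.txt 2>&1

    # Doesn't do what you might hope: fd 2 copies fd 1 while fd 1 still
    # points at the terminal, so the error message stays on screen
    ls /etc/hostname /nonexistent 2>&1 1> /tmp/both.txt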

    You can also print messages to another terminal window by identifying its device file (tty), and then redirecting stdout to that device (e.g. echo hello > /dev/tty2).

    In summary, since most of us learn about programming and system admin in a terminal, it’s easy to come to think of the programs that we’re used to launching therein as being in some sense bound up with, or unusable without, the terminal. But those programs have no intrinsic tie to the terminal; the terminal has simply been determining, by default, where their standard streams point. Once you realize that, you can begin to appreciate the need to be able to explicitly set the streams of programs that are not launched by the terminal.

    Python3 subprocess.run and subprocess.Popen

    Now let’s go back to the python code I’d been pasting/copying for several years and see if we can understand and simplify what’s happening with the subprocess module.

    import subprocess
    # Launch CMD
    p = subprocess.Popen(
        CMD,
        shell=True,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT
    )
    # Wait for it to finish
    p.wait()
    # Save the result to result str
    result: str = p.stdout.read().decode('utf8').strip()

    The first thing I learned by reading the subprocess documentation is that (blush), I wasn’t even using the recommended method:

    The recommended approach to invoking subprocesses is to use the run() function for all use cases it can handle

    The run function is a simplified wrapper around the more flexible Popen interface I had been using, and you basically want to use it whenever you want to execute a synchronous​§​ process of finite duration. For example, you can list your PWD contents with the following in the python3 REPL:

    >>> import subprocess
    >>> subprocess.run(['/bin/ls'])
    temp1    temp2

    (Note: if you launch python3 in a terminal then it will inherit the shell’s environment variables, including $PATH, which means that you often don’t need to spell out the full path to the executable as I’ve done here.)

    Notice that the executed program, though launched as a separate process, still prints to the same terminal as the python3 REPL. We can discern why this happens by going through the same procedure we went through earlier, i.e. by launching an ongoing process like ping:

    >>> import subprocess
    >>> subprocess.run(['/bin/ping','www.google.com'])
    64 bytes from 172.253.63.104: icmp_seq=1 ttl=101 time=1.31 ms
    64 bytes from 172.253.63.104: icmp_seq=2 ttl=101 time=1.42 ms
    ...

    … and, in a separate window, finding the PID and examining the file descriptors of that process:

    ❯ sudo ls -l /proc/18175/fd
    lrwx------ 1 root root 64 Sep 28 21:29 0 -> /dev/tty1
    lrwx------ 1 root root 64 Sep 28 21:29 1 -> /dev/tty1
    lrwx------ 1 root root 64 Sep 28 21:29 2 -> /dev/tty1
    

    The ping process evidently inherited the same file descriptors as its parent process (the python3 REPL), which itself inherited those descriptors from its parent, the terminal. So both python3 and ping will print to the same terminal.

    Now, we want to be able to launch processes in python3 with something akin to redirection in the terminal. In particular, we want to be able to pipe the standard output streams of the process we launch with the subprocess module to the parent python3 process and to be able to capture that data as a python3 variable. We do that by providing stdout and stderr arguments, as shown in the following example:

    >>> from subprocess import run, PIPE
    >>> url = 'www.google.com'
    >>> p = run(['/bin/ping','-c','2',url], stdout=PIPE, stderr=PIPE)
    >>> p.stdout.decode('utf8')
    'PING www.google.com (172.217.2.100) 56(84) bytes of data. ...'

    Notice this time that we made ping run for only a finite duration by supplying the -c 2 (“count”) arguments, and that the process did not print to the terminal while running. This is because the stdout=PIPE argument has an effect similar to the terminal redirect (1>).

    Where/how was ping‘s stdout redirected? We can investigate as before by rerunning the above code (but without ‘-c’, ‘2’ to make the process run indefinitely), finding the PID of the new ping process in another window, and examining that process’ file descriptors:

    ❯ sudo ls -l /proc/33463/fd/
    lrwx------ 1 root root 64 Sep 28 20:58 0 -> /dev/tty1
    l-wx------ 1 root root 64 Sep 28 20:58 1 -> 'pipe:[110288]'
    l-wx------ 1 root root 64 Sep 28 20:58 2 -> 'pipe:[110289]'
    ...

    As we can see, ping‘s stdout is now being directed to a device labelled 'pipe:[110288]' (and stderr to a device labelled 'pipe:[110289]'). This “pipe” is an OS in-memory “unnamed” device​¶​ whose purpose is to connect a write-able file descriptor of one process to a read-able file descriptor of another process. (Pipes connect a process to a process, redirects connect a process to a file.) The number 110288 is the ID for the inode of the pipe device file in the filesystem. You can get more information on the pipe device file with the lsof (“list open files”) utility:

    ❯ lsof | grep -E "PID|110288"
    COMMAND   PID   ... FD     TYPE    DEVICE ...    NODE    NAME
    python3   33462 ... 3r     FIFO    0,13   ...    110288  pipe
    ping      33463 ... 1w     FIFO    0,13   ...    110288  pipe

    Here we can see that the pipe shows up in relation to the python3 and ping processes, with PID 33462 and 33463 respectively. The FD column gives the file descriptor number for the pipe file, and the letters r and w refer to read and write permissions. Referring to the previous ls -l command, we can confirm here that /proc/33463/fd/1 does indeed point to this pipe device file, and it does have write-only permissions.

    Let’s now look at the corresponding python3 file descriptors:

    > ls -l /proc/33462/fd/
    lrwx------ 1 dwd dwd 64 Sep 28 20:59 0 -> /dev/tty1
    lrwx------ 1 dwd dwd 64 Sep 28 20:59 1 -> /dev/tty1
    lrwx------ 1 dwd dwd 64 Sep 28 20:59 2 -> /dev/tty1
    lr-x------ 1 dwd dwd 64 Sep 28 20:59 3 -> 'pipe:[110288]'
    lr-x------ 1 dwd dwd 64 Sep 28 20:59 5 -> 'pipe:[110289]'

    Here we can see that the python3 parent process has kept its standard streams pointing to /dev/tty1 (so you can still interact with it through the terminal). In addition, it has created two new file descriptors (3 and 5) pointing to the two pipes we created in our subprocess.run command (one for stdout, one for stderr). The file descriptor /proc/33462/fd/3, as we have seen, is the read-only end of the pipe emanating from the stdout file descriptor of the ping process. This “non-standard stream” file descriptor is created by the python3 process in its underlying C code. That code is responsible for marshalling the data emitted from the pipe into a python runtime variable, which is why we are able to see the result of the ping process printed out in the python3 REPL as a string.

    For reference here is some relatively simple C code demonstrating inter-process communication through pipes, the sort of thing you’d find in python3‘s source code.
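
    If C isn’t your thing, here is a rough python3 sketch of the same idea. It isn’t what subprocess literally does internally; it just illustrates handing a pipe’s write end to a child process as its stdout and reading from the other end in the parent:

    import os
    import subprocess

    # Create an unnamed pipe; hand the write end to the child as its stdout
    read_fd, write_fd = os.pipe()
    child = subprocess.Popen(['/bin/echo', 'hello from the child'], stdout=write_fd)
    os.close(write_fd)      # the parent won't write, so close its copy of that end
    child.wait()
    print(os.read(read_fd, 1024).decode('utf8'))   # -> hello from the child
    os.close(read_fd)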

    Let’s return to the subprocess module. The other argument worth taking special note of is shell (default False). When set to True, subprocess.run passes the first argument (which the documentation recommends be a single string, rather than a list of strings, in this case) to the /bin/sh program for execution as a script. So now python3 launches a single child process, the sh shell, which can in turn launch arbitrarily many further child processes. That obviously has the advantage of letting you write more complex sequences of commands in a script-like format, and lets you take advantage of shell features like setting/expanding env variables. Piping of stdout and stderr works the same: any process you invoke in your shell script that writes to either stream will contribute to the strings that become accessible in run(...).stdout.decode('utf8') and run(...).stderr.decode('utf8').
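
    For example, something like the following (the command itself is arbitrary) lets you use a shell pipe directly, with the output still landing in .stdout:

    from subprocess import run, PIPE

    # shell=True: the whole string is handed to /bin/sh, so the pipe works
    p = run('ls -l /etc | grep -c conf', shell=True, stdout=PIPE, stderr=PIPE)
    print(p.stdout.decode('utf8').strip())   # number of lines of 'ls -l /etc' containing 'conf'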

    There are only two disadvantages that I can discern of using the shell=True argument:

    • Overhead: it takes resources to start a new shell
    • Complexity: there’s something to be said about keeping your calls to the outside world short, simple and infrequent

    Finally, let’s review the subprocess.Popen interface that subprocess.run wraps around. The main difference is that run is blocking, while Popen is not. That is, if you start a process with run, python will wait for it to finish before proceeding to the next line of code. Again, this is great when you just want to get, for example, the stdout of a command-line tool dumped straight into a string variable:

    >>> from subprocess import run
    >>> date = run('date', capture_output=True).stdout.decode('utf8')
    >>> date
    'Mon Sep 28 23:44:49 EDT 2020\n'

    Note: the argument (..., capture_output=True) is provided in python3.7+ as a shortcut for (..., stdout=PIPE, stderr=PIPE).

    Popen, by contrast, is a class constructor that will start the subprocess and return an object that lets you communicate with that process, after which python immediately moves on to the next line of code. This is useful if you want to launch a lot of processes, like network requests, in parallel. You then control the timing of the processes with the Popen.wait() method. The Popen object also exposes lower-level data structures owing to its asynchronous nature, meaning that, for example, you have to read the output stream via an intermediary .read() call before decoding it to a string. The equivalent code with Popen to the above run code is thus the more verbose pattern I had been using for so long:

    >>> from subprocess import Popen, PIPE
    >>> p = Popen('date', stdout=PIPE, stderr=PIPE)
    >>> p.wait()
    0
    >>> date = p.stdout.read().decode('utf8')
    >>> date
    'Mon Sep 28 23:44:49 EDT 2020\n'

    Summary

    I expect I’ll be using the following patterns a lot going forward.

    from subprocess import Popen, run, PIPE
    ### Simple stuff
    run(['mv', 'temp', 'temp2'])
    ### Simple I/O stuff
    date = run(['date'], capture_output=True).stdout.decode('utf8')
    ### Parallel stuff
    p1 = Popen(['curl', '-o', 'foo.html', 'https://www.foo.com'])
    p2 = Popen(['curl', '-o', 'bar.html', 'https://www.bar.com'])
    p1.wait()
    p2.wait()

    (And, yes, I know there are native python3 equivalents to all these commands.)

    Further reading

    https://www.linusakesson.net/programming/tty/

    https://www.informit.com/articles/article.aspx?p=2854374&seqNum=5

    https://lucasfcosta.com/2019/04/07/streams-introduction.html


    1. ​*​
      Is the message to be literally “viewed” (i.e. on the screen), is it to be “stored” (i.e. saved to a file on disk), is it to be “piped” (i.e. imbibed as input information by another program), or ignored (i.e. discarded)?
    2. ​†​
      The “user” of your program could of course be someone who writes a program that calls your program
    3. ​‡​
      The key here is that you don’t need to know what this channel “is”, only that the user will be provided with a systematic means to determine where messages designated for that channel will end up.
    4. ​§​
      I.e. your code will wait for it to finish
    5. ​¶​
    An unnamed device is one that does not show up in the /dev/ directory.