Tag: networking

  • Raspberry Pi Cluster Part IV: Opening Up to the Internet

    Intro

    We want to serve content from the cluster, and access it, over the Internet from a home connection. To do that, we need to set up our home router with port forwarding, and open up ports for SSH, HTTP and HTTPS.

    All traffic goes through our primary node (RPi1). If I want to connect to a non-primary node from the outside world, then I can just ssh into RPi1 and then ssh onto the target node.
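    As an aside, OpenSSH can chain those two hops for you in a single command. A hypothetical ~/.ssh/config entry on the laptop (host and user names are placeholders for your own) would look like:

    ```
    # ~/.ssh/config on the laptop (hypothetical names)
    Host rpi2 rpi3 rpi4
        User user
        ProxyJump user@rpi1
    ```

    With this in place, ssh rpi3 transparently tunnels through rpi1.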

    So far, we have been able to move around the network using passwords. This is not ideal. Having to type in a password each time slows us down and makes our automations trickier to secure. It’s also just not good practice to rely on passwords, since hackers will spam your ssh servers with brute-force password attacks.

    Network SSH Inter Communication

    So we want to be able to ssh from our laptop into any node, and from any node to any node, without using a password. We do this with private-public key pairs.

    First, if you have not done so already, create a public-private key pair on your unix laptop with ssh-keygen -t rsa and skip adding a passphrase:

    ❯ ssh-keygen -t rsa
    Generating public/private rsa key pair.
    Enter file in which to save the key (/home/me/.ssh/id_rsa):
    Enter passphrase (empty for no passphrase):
    Enter same passphrase again:
    Your identification has been saved in /home/me/.ssh/id_rsa
    Your public key has been saved in /home/me/.ssh/id_rsa.pub
    The key fingerprint is:
    ....
    The key's randomart image is:
    +---[RSA 3072]----+
    |      . +.o.     |
    ...
    |       .o++Boo   |
    +----[SHA256]-----+

    This will generate two files (“public” and “private”) in ~/.ssh (id_rsa.pub and id_rsa respectively). Now, in order to ssh into another host, we need to copy the content of the public key file into the file ~/.ssh/authorized_keys on the destination server. Once that is in place, you can ssh without needing a password.
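    To make the mechanics concrete, here is a sketch that simulates the append against a scratch directory standing in for the destination node’s home; the real command, shown in the comment, does the same thing over ssh (user@rpi1 is a placeholder):

    ```shell
    # Simulate installing a public key: "$remote_home" stands in for the
    # destination node's home directory.
    remote_home=/tmp/demo-remote-home
    mkdir -p "$remote_home/.ssh"
    printf 'ssh-rsa AAAA-fake-key me@laptop\n' >> "$remote_home/.ssh/authorized_keys"
    # On a real node, the equivalent append runs over ssh:
    #   cat ~/.ssh/id_rsa.pub | ssh user@rpi1 'mkdir -p ~/.ssh && cat >> ~/.ssh/authorized_keys'
    cat "$remote_home/.ssh/authorized_keys"
    ```
    
    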

    Rather than manually copying the content of the public key file into a remote host, Unix machines provide a script called `ssh-copy-id` that does this for you. Running ssh-copy-id name@server will copy the default id_rsa.pub file content to the /home/user/.ssh/authorized_keys file on the server. If you want to use a non-default public key file, you can specify it with ssh-copy-id -i [non-default file] name@server.

    Once a public-private key pair is available on the laptop, we can use the following script to copy the public portion into each of the nodes. We’ll need to type in the password once per node, but afterwards we can ssh without passwords.

    while read SERVER
    do
        ssh-copy-id user@"${SERVER}"
    done <<\EOF
    rpi1
    rpi2
    rpi3
    rpi4
    EOF

    Next, we want to be able to ssh from any node into any other node. We could just copy the private key we just created on the laptop into each node, but this is not the safest practice. So, in this case, I went into each node and repeated the process (create a private-public pair, and then ran the above script).

    Securing SSH Connections

    Raspberry Pis are notorious for getting pwned by bots as soon as they’re opened up to the internet. One way to help ensure the security of your RPi is to use Two-Factor Authentication (2FA).

    I am not going to do that because it creates added complexity to keep up with, and the other measures I’ll be taking are good enough.

    Now that we have set up the ssh keys on our nodes, we need to switch off the ability to ssh in using a password, especially on our primary node, which will be open to the internet. We do this by editing the file /etc/ssh/sshd_config and changing the line PasswordAuthentication yes to PasswordAuthentication no. Having done that, the only way to ssh in now is with the public-private key pairs, and they only exist on my laptop. (If my laptop gets lost or destroyed, then the only way to access these nodes will be by directly connecting them to a monitor and keyboard, and logging in with a password.)

    The next precaution we’ll take is to change the port on which the sshd service runs on the primary node from the default 22 to some random port. We do this by uncommenting the line #Port 22 in the sshd_config file and changing the number to, say, Port 60022 and restarting the sshd service.
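    The two sshd_config edits above (PasswordAuthentication and Port) can be made with sed. The sketch below demonstrates them on a scratch copy so the effect is visible; on the real node you would point sed at /etc/ssh/sshd_config and then run sudo systemctl restart sshd:

    ```shell
    # Demonstrate the two sshd_config edits on a scratch copy.
    demo=/tmp/sshd_config.demo
    printf '#Port 22\nPasswordAuthentication yes\n' > "$demo"
    # Uncomment Port and move it off 22; turn password logins off.
    sed -i -e 's/^#\?Port 22$/Port 60022/' \
           -e 's/^PasswordAuthentication yes$/PasswordAuthentication no/' "$demo"
    cat "$demo"
    ```
    
    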

    Having made this change, you will need to specify the non-default port whenever you try to get into that node, e.g. ssh -p 60022 user@rpi1. A bit annoying but, I have heard on the grapevine, this will stop 99% of hacker bots in their tracks.
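    To avoid typing -p every time, you can record the port in a ~/.ssh/config entry on the laptop (a hypothetical example; host, user and port are placeholders):

    ```
    Host rpi1
        HostName 192.168.0.51
        User user
        Port 60022
    ```

    After that, a plain ssh rpi1 uses port 60022 automatically.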

    Finally, we can install a service called fail2ban with the usual sudo apt install fail2ban. This is a service that scans log files for suspicious behavior (e.g. failed logins, email-searching bots) and takes action to prevent malicious behavior by modifying the firewall for a period of time (e.g. banning the IP address).
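    fail2ban ships with a default jail for sshd; a minimal override might look like the following sketch (the values are assumptions to adapt, and since we moved sshd off port 22 the jail needs to be told about the new port):

    ```
    # /etc/fail2ban/jail.local (sketch; adjust values to taste)
    [sshd]
    enabled  = true
    port     = 60022
    maxretry = 5
    findtime = 10m
    bantime  = 1h
    ```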

    With these three measures in place, we can be confident in the security of the cluster when opening it up to the internet.

    Dynamic DNS

    To open up our cluster to the internet, we need to get our home’s public IP address. This is assigned to your home router/modem box by your Internet Service Provider (ISP). In my case, I am using a basic Xfinity service with very meager upload speeds. I’ve used Xfinity for several years now, and it tends to provide fairly stable IP addresses, and does not seem to care if you use your public IP address to host content. (By contrast, I once tried setting up port-forwarding for a friend who had a basic home internet connection provided by Cox, and Cox seemed to actively adapt to block this friend’s outgoing traffic. I.e. Cox wants you to upgrade to a business account to serve content from your home connection.)

    To see your public IP Address, run curl http://checkip.amazonaws.com.

    We want to point a domain name towards our home IP address, but since ISPs can change your public IP address without notice, we need a method to adapt to any such change. The “classic” way to adapt to these changes in IP address is to use a “dynamic” DNS (DDNS) service. Most modern modem/router devices will give you the option to set this up with an account from a company like no-ip.com, and there are plenty of tutorials on the web if you wish to go this route.

    However, I don’t want to pay a company like no-ip.com, or mess about with their “free-tier” service that requires you to e.g. confirm/renew the free service every month.

    Since I manage my domain names with AWS Route 53, we can use a script, run as a cronjob, to periodically check that the IP address assigned by the ISP is the same as the one the AWS DNS server points to; if it has changed, the script uses the AWS CLI to update the record. The process is described here, and the script I am using is adapted from this gist. The only change I made was to extract the two key variables HOSTED_ZONE_ID and NAME into the script’s arguments (in order to allow me to run the same script for multiple domains).

    Once I had the script _update_ddns in place, I decided to run it every 10 minutes by opening crontab -e and adding the line:

    */10 * * * * /home/myname/ddns_update/_update_ddns [MY_HOSTED_ZONE] [DOMAIN] >/dev/null 2>&1
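    For reference, the script is roughly this shape (a sketch rather than the gist verbatim; it assumes the AWS CLI is installed and configured on the node, and the helper names are my own):

    ```shell
    #!/bin/sh
    # Sketch of _update_ddns: compare the ISP-assigned IP with the Route 53
    # record, and upsert the record only when they differ.
    # Usage: _update_ddns HOSTED_ZONE_ID NAME
    HOSTED_ZONE_ID=$1
    NAME=$2

    build_change_batch() {
        # Build the JSON body for change-resource-record-sets ($1 = record name, $2 = IP)
        printf '{"Changes":[{"Action":"UPSERT","ResourceRecordSet":{"Name":"%s","Type":"A","TTL":300,"ResourceRecords":[{"Value":"%s"}]}}]}' "$1" "$2"
    }

    update_if_changed() {
        current=$(curl -s http://checkip.amazonaws.com)
        recorded=$(dig +short "$NAME" @8.8.8.8)
        if [ "$current" != "$recorded" ]; then
            aws route53 change-resource-record-sets \
                --hosted-zone-id "$HOSTED_ZONE_ID" \
                --change-batch "$(build_change_batch "$NAME" "$current")"
        fi
    }

    # Only act when both arguments were supplied.
    if [ -n "$HOSTED_ZONE_ID" ] && [ -n "$NAME" ]; then
        update_if_changed
    fi
    ```
    
    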

    Port Forwarding

    Finally, we need to tell our modem/router device to forward requests submitted to our ssh, http and https ports onto the primary node’s wifi IP Address. Every router/modem device will be different depending on the brand and model, so you’ll have to poke around for the “port-forwarding” setup.

    In my case, I’m using an Arris router, and it was not too hard to find the port-forwarding setup. You then need to set up a bunch of rules that tell the device how to route packets that come from the external network on a given port (80 and 443 in the figure below), and to what internal address and ports you want those packets directed (192.168.0.51 in the figure below). Also add a rule if you want to be able to ssh into your primary node on the non-default port.

    Port forwarding on my home modem/router device.

    Make sure you have a server running, e.g. apache, when you test the URL you set up through Route 53.

    And that’s it — we have a pretty secure server open to the internet.

  • Raspberry Pi Cluster Part II: Network Setup

    Introduction

    In the last post we got the hardware in order and made each of our 4 RPi nodes production ready with Ubuntu Server 20.04. We also established wifi connections between each node and the home router.

    In this post, I’m going to describe how to set up the “network topology” that will enable the cluster to become easily transportable. The primary RPi4 node will act as the gateway/router to the cluster. It will communicate with the home router on behalf of the whole network. If I move in the future, then I’ll only have to re-establish a wifi connection with this single node in order to restore total network access to each node. I also only need to focus on securing this node in order to expose the whole cluster to the internet. Here’s the schematic again:

    In my experience, it’s tough to learn hardware and networking concepts because the field is thick with jargon. I am therefore going to write as though to my younger self keenly interested in becoming self-reliant in the field of computer networking.

    Networking Fundamentals

    If you’re not confident with your network fundamentals, then I suggest you review the following topics by watching the linked explainer videos. (All these videos are made by the YouTube channel “Power Cert Animated Videos” and are terrific.)

    Before we get into the details of our cluster, let’s quickly review the three main things we need to think about when setting up a network: IP-address assignment, domain-name resolution, and routing.

    IP-Address Assignment

    At its core, networking is about getting fixed-length “packets” of 1s and 0s from one program running on a computer to another program running on any connected computer (including programs running on the same computer). For that to happen, each computer needs to have an address – an IP Address – assigned to it. As explained in the above video, the usual way in which that happens is by interacting with a DHCP server. (However, most computers nowadays run a process in the background that will attempt to negotiate an IP Address automatically in the event that no machine on its network identifies itself as a DHCP server.) In short, we’ll need to make sure that we have a DHCP server on our primary node in order to assign IP addresses to the other nodes.

    Domain-Name Resolution

    Humans do not like to write instructions as 1s and 0s, so we need each node in our network to be generally capable of translating a human-readable address (e.g. ‘www.google.com’, ‘rpi3’) into a binary IP address. This is where domain-name servers (DNS) and related concepts come in.

    The word “resolve” is used to describe the process of converting a human-readable address into an IP address. In general, an application that needs to resolve an IP address will interact with a whole bunch of other programs, networks and servers to obtain its target IP address. The term “resolver” is sometimes used to refer to this entire system of programs, networks and servers. The term resolver is also sometimes used to refer to a single element within such a system. (Context usually makes it clear.) From hereon, we’ll use “resolver” to refer to a single element within a system of programs, networks and servers whose job is to convert strings of letters to an IP Address, and “resolver system” to refer to the whole system.

    Three types of resolver to understand here are “stub resolvers”, “recursive resolvers”, and “authoritative resolvers”. A stub resolver is a program that basically acts as a cache within the resolver system. If it has recently received a request to return an IP address in exchange for a domain name (and therefore has it in its cache), then it will return that IP address. Otherwise, it will pass the request on to another resolver (which might itself be a stub resolver that just passes the buck on again).

    A recursive resolver will also act as a cache and if it does not have all of the information needed to return a complete result, then it will pass on a request for information to another resolver. Unlike a stub resolver though, it might not receive back a final answer to its question but, rather, an address to another resolver that might have the final answer. The recursive resolver will keep following any such lead until it gets its final answer.

    An “authoritative” resolver is a server that does not pass the buck on. It’s the final link in the chain, and if it does not have the answer or suggestions for another server to consult, then the resolution will fail, and all of these resolvers will send back a failure message.

    In summary, domain-name resolution is all about finding a simple lookup table that associates a string (domain name) with a number (the IP Address). This entry in the table is called an “A Record” (A for Address).

    Routing

    Once a program has an IP Address to send data to, it needs to know where first to send the packet in order to get it relayed. For this to happen, each network interface needs to have a router address applied to it when configured. You can see the router(s) on a Linux machine with route -n. In a home setup, this router will be the address of the wifi/modem box. Once the router address is determined, the application can just send packets there and the magic of Internet routing will take over.

    Ubuntu Server Networking Fundamentals

    Overview

    Ubuntu Server 20.04, which we’re using here, comes with several key services/tools that are installed/enabled by default or by common practice: systemd-resolved, systemd-networkd, NetworkManager and netplan.

    systemd-resolved

    You can learn the basics by running:

    man systemd-resolved

    This service is a stub resolver making it possible for applications running on the system to resolve hostnames. Applications running on the system can interact with it by issuing some low-level kernel jazz via their underlying C libraries, or by pinging the internal (“loopback”) network address 127.0.0.53. To see it in use as a stub server, you can run dig @127.0.0.53 www.google.com.

    You can check what DNS servers it is set up to consult by running resolvectl status. (resolvectl is a pre-installed tool that lets you interact with the running systemd-resolved service; see resolvectl --help to get a sense of what you can do with it.)

    Now we need to ask: how does systemd-resolved resolve hostnames? It does so by communicating over a network with a DNS server. How do you configure it so that it knows which DNS servers to consult, and in what order of priority?

    systemd-networkd

    systemd-networkd is a pre-installed and pre-enabled service on Ubuntu that acts as a DHCP client (listening on port 68 for signals from a DHCP server). So when you switch on your machine and this service starts up, it will negotiate the assignment of an IP Address on the network based upon DHCP broadcast signals. In the absence of a DHCP server on the network, it will attempt to negotiate an address directly with the other devices. I believe it is also involved in the configuration of interfaces.

    NetworkManager

    This is an older service that does much the same as networkd. It is NOT enabled by default, but is so prominent that I thought it would be worth mentioning in this discussion. (Also, during my research to try and get the cluster configured the way I want it, I installed NetworkManager and messed with it only to ultimately conclude that this was unnecessary and confusing.)

    Netplan

    Netplan is a pre-installed tool (not a service) that, in theory, makes it easier to configure systemd-resolved and either networkd or NetworkManager. The idea is that you declare your desired network end state in a YAML file (/etc/netplan/50-cloud-init.yaml) so that after startup (or after running netplan apply), it will do whatever needs to be done under the hood with the relevant services to get the network into your desired state.

    Other Useful Tools

    In general, when doing networking on linux machines, it’s useful to install a couple more packages:

    sudo apt install net-tools traceroute

    The net-tools package gives us a bunch of classic command-line utilities, such as netstat. I often use it (in an alias) to check what ports are in use on my machine: sudo netstat -tulpn.

    traceroute is useful in making sense of how your network is presently set up. Right off the bat, running traceroute google.com will show you how you reach google.

    Research References

    For my own reference, the research I am presenting here is derived in large part from the following articles:

    • This is the main article I consulted that shows someone using dnsmasq to set up a cluster very similar to this one, but using Raspbian instead of Ubuntu.
    • This article and this article on getting dnsmasq and system-resolved to handle single-word domain names.
    • Overview of netplan, NetworkManager, etc.
    • https://unix.stackexchange.com/questions/612416/why-does-etc-resolv-conf-point-at-127-0-0-53
    • This explains why you get the message “ignoring nameserver 127.0.0.1” when starting up dnsmasq.
    • Nice general intro to key concepts with linux
    • This aids understanding of systemd-resolved’s priorities when multiple DNS’s are configured on same system
    • https://opensource.com/business/16/8/introduction-linux-network-routing
    • https://www.grandmetric.com/2018/03/08/how-does-switch-work-2/
    • https://www.cloudsavvyit.com/3103/how-to-roll-your-own-dynamic-dns-with-aws-route-53/

    Setting the Primary Node

    OK, enough preliminaries, let’s get down to setting up our cluster.

    A chief goal is to try to set up the network so that as much of the configuration as possible is on the primary node. For example, if we want to be able to ssh from rpi2 to rpi3, then we do NOT want to have to go to each node and explicitly state where each hostname is to be found.

    So we want our RPi4 to operate as the single source of truth for domain-name resolution and IP-address assignment. We do this by running dnsmasq – a simple service that turns our node into a DNS and DHCP server:

    sudo apt install dnsmasq
    sudo systemctl status dnsmasq

    We configure dnsmasq with /etc/dnsmasq.conf. On this fresh install, this conf file will be full of fairly detailed notes. Still, it takes some time to get the hang of how it all fits together. This is the file I ended up with:

    # Choose the device interface to configure
    interface=eth0
    
    # We will listen on the static IP address we declared earlier
    # Note: this might be redundant
    listen-address=127.0.0.1
    
    # Enable addresses in range 10.0.0.1-128 to be leased out for 12 hours
    dhcp-range=10.0.0.1,10.0.0.128,12h
    
    # Assign static IPs to cluster members
    # Format = MAC:hostname:IP
    dhcp-host=ZZ:YY:XX:WW:VV:UU,rpi1,10.0.0.1
    dhcp-host=ZZ:YY:XX:WW:VV:UU,rpi2,10.0.0.2
    dhcp-host=ZZ:YY:XX:WW:VV:UU,rpi3,10.0.0.3
    dhcp-host=ZZ:YY:XX:WW:VV:UU,rpi4,10.0.0.4
    
    # Broadcast the router, DNS and netmask to this LAN
    dhcp-option=option:router,10.0.0.1
    dhcp-option=option:dns-server,10.0.0.1
    dhcp-option=option:netmask,255.255.255.0
    
    # Broadcast host-IP relations defined in /etc/hosts
    # And enable single-name domains
    # See here for more details
    expand-hosts
    domain=mydomain.net
    local=/mydomain.net/
    
    # Declare upstream DNS's; we'll just use Google's
    server=8.8.8.8
    server=8.8.4.4
    
    # Useful for debugging issues
    # Run 'journalctl -u dnsmasq' for resultant logs
    log-queries
    log-dhcp
    
    # These two are recommended default settings
    # though the exact scenarios they guard against 
    # are not entirely clear to me; see man for further details
    domain-needed
    bogus-priv

    Hopefully these comments are sufficient to convey what is going on here. Next, we make sure that the /etc/hosts file associates the primary node with its domain name, rpi1. It’s not clear to me why this is needed. The block of dhcp-host definitions above does succeed in enabling dnsmasq to resolve rpi2, rpi3, and rpi4, but the line for rpi1 does not work. I assume that this is because dnsmasq is not setting the IP address of rpi1, and this type of setting only works for hosts whose IP Address it sets. (Why that is the case seems odd to me.)

    # /etc/hosts
    10.0.0.1 rpi1

    Finally, we need to configure the file /etc/netplan/50-cloud-init.yaml on the primary node in order to declare this node with a static IP Address on both the wifi and ethernet networks.

    network:
        version: 2
        ethernets:
            eth0:
                dhcp4: no
                addresses: [10.0.0.1/24]
        wifis:
            wlan0:
                optional: true
                access-points:
                    "MY-WIFI-NAME":
                        password: "MY-PASSWORD"
                dhcp4: no
                addresses: [192.168.0.51/24]
                gateway4: 192.168.0.1
                nameservers:
                    addresses: [8.8.8.8,8.8.4.4]

    Once these configurations are set up and rpi1 is rebooted, you can expect to find that ifconfig will show ip addresses assigned to eth0 and wlan0 as expected, and that resolvectl dns will read something like:

    Global: 127.0.0.1
    Link 3 (wlan0): 8.8.8.8 8.8.4.4 2001:558:feed::1 2001:558:feed::2
    Link 2 (eth0): 10.0.0.1

    Setting up the Non-Primary Nodes

    Next we jump into the rpi2 node and edit its /etc/netplan/50-cloud-init.yaml to:

    network:
        version: 2
        ethernets:
            eth0:
                dhcp4: true
                optional: true
                gateway4: 10.0.0.1
                nameservers:
                    addresses: [10.0.0.1]
        wifis:
            wlan0:
                optional: true
                access-points:
                    "MY-WIFI-NAME":
                        password: "MY-PASSWORD"
                dhcp4: no
                addresses: [192.168.0.52/24]
                gateway4: 192.168.0.1
                nameservers:
                    addresses: [8.8.8.8,8.8.4.4]

    This tells netplan to set up systemd-networkd to get its IP Address from a DHCP server on the ethernet network (which will be found to be on rpi1 when the broadcast event happens), and to route traffic and submit DNS queries to 10.0.0.1.

    To reiterate, the wifi config isn’t part of the network topology; this is optionally added because it makes life easier when setting up the network to be able to ssh straight into a node. In my current setup, I am assigning all the nodes static IP Addresses on the wifi network of 192.168.0.51-4.

    Next, as described here, in order for our network to be able to resolve single-word domain names, we need to alter the behavior of systemd-resolved by linking these two files together:

    sudo ln -sf /run/systemd/resolve/resolv.conf /etc/resolv.conf

    This causes the systemd-resolved stub resolver to dynamically determine a bunch of settings based upon what dnsmasq broadcasts on rpi1.

    After rebooting, and doing the same configuration on rpi3 and rpi4, we can run dig rpi1, dig rpi2, etc. on any of the non-primary nodes and expect to get the single-word hostnames resolved as we intend.

    If we go to rpi1 and check the ip-address leases:

    cat /var/lib/misc/dnsmasq.leases

    … then we can expect to see that dnsmasq has successfully acted as a DHCP server. You can also check that dnsmasq has been receiving DNS queries by examining the system logs: journalctl -u dnsmasq.

    Routing All Ethernet Traffic Through the Primary Node

    Finally, we want all nodes to be able to connect to the internet by routing through the primary node. This is achieved by first uncommenting the line net.ipv4.ip_forward=1 in the file /etc/sysctl.conf and then running the following commands:

    sudo iptables -t nat -A POSTROUTING -o wlan0 -j MASQUERADE
    sudo iptables -A FORWARD -i wlan0 -o eth0 -m state --state RELATED,ESTABLISHED -j ACCEPT
    sudo iptables -A FORWARD -i eth0 -o wlan0 -j ACCEPT

    These lines mean something like the following:

    1. When doing network-address translation (-t nat), and just before the packet is to go out via the wifi interface (-A POSTROUTING = “append a postrouting rule”), replace the source ip address with the ip address of this machine on the outbound network
    2. forward packets in from wifi to go out through ethernet
    3. forward packets in from ethernet to go out through wifi
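    You can check whether the forwarding switch has taken effect without rebooting:

    ```shell
    # The live value of the kernel's IPv4 forwarding switch: 1 = on, 0 = off.
    cat /proc/sys/net/ipv4/ip_forward
    # After editing /etc/sysctl.conf, 'sudo sysctl -p' applies it immediately.
    ```
    
    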

    For these rules to survive across reboots you need to install:

    sudo apt install iptables-persistent

    and agree to storing the rules in /etc/iptables/rules.v4. Reboot, and you can now expect to be able to access the internet from any node, even when the wifi interface is down (sudo ifconfig wlan0 down).

    Summary

    So there we have it – an easily portable network. If you move location then you only need to adjust the wifi-connection details in the primary node, and the whole network will be connected to the Internet.

    In the next part, we’ll open the cluster up to the internet through our home router and discuss security and backups.