Stuff I'm Up To

Technical Ramblings

Icinga2 and Cloudflare — April 28, 2024

Icinga2 and Cloudflare

With Cloudflare, I wanted to host my Icinga2 instance behind a tunnel. This posed a bit of an issue: whenever I tried to submit a passive check result, the logs showed sslv3 alert bad certificate.

I figured it was something to do with the Cloudflare TLS getting in the way, and I was right. Between Cloudflare and Icinga2, I needed to get Cloudflare to ignore the self-signed certificate of the Icinga2 service. There is a very simple option in the tunnel's TLS settings ("No TLS Verify") that turns off certificate verification. With verification disabled, passive results are now submitted correctly.
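
As a quick check that submissions work through the tunnel, something along these lines posts a passive check result to the Icinga2 API's process-check-result action. This is only a sketch: the hostname, API user credentials and host/service names are placeholders, not my actual setup.

import requests

# Placeholder values - substitute your tunnel hostname, API user and objects
ICINGA_URL = "https://icinga.example.com/v1/actions/process-check-result"

response = requests.post(
    ICINGA_URL,
    auth=("api-user", "api-password"),
    headers={"Accept": "application/json"},  # the Icinga2 API requires this header
    json={
        "type": "Service",
        "filter": 'host.name=="myhost" && service.name=="passive-check"',
        "exit_status": 0,
        "plugin_output": "OK - submitted via the Cloudflare tunnel",
    },
    timeout=10,
)
print(response.status_code, response.json())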

See also: Icinga2

Prometheus, Alert Manager and Docker Swarm — April 26, 2024

Prometheus, Alert Manager and Docker Swarm

This is not a complete plug and go HOW-TO for using Prometheus for scaling a Docker Swarm, but it does contain the building blocks for how to do it.

Orchestration

Orchestration is the ability to deploy and manage systems. If we want to manage Docker, we would typically require an orchestration tool such as Kubernetes or HashiCorp Nomad.

Kubernetes has a steep learning curve, and to automatically scale services with Nomad you need an enterprise licence.

Monitoring

Prometheus

Prometheus is a time series database that pulls key/value pairs (metrics) from systems that export the data over HTTP.

Some software exposes metrics as a built-in feature; other software needs an additional exporter service. If you install the prometheus-node-exporter service on Linux, you can gather a whole raft of metrics by visiting

http://localhost:9100/metrics

You then point Prometheus at the URL, and it will periodically scrape the data from the URL and process it into its time series database.

Our usage example is to monitor Nginx for the number of active connections. If it goes above 100, then we use AlertManager to trigger a message.

We scrape metrics from the nginx-prometheus-exporter – published on port 9113 – which collects its metrics from the Nginx stub_status location. This location is enabled by adding the following directive to default.conf

    location = /stub_status {
        stub_status;
        access_log off;
    }
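
Before wiring this into Prometheus, it's worth checking that the exporter is actually publishing the gauge we care about. A minimal sketch; the exporter address below matches the scrape target used in prometheus.yml.

import urllib.request

# Fetch the exporter's metrics page and print the active-connections gauge.
# The address is the same scrape target configured in prometheus.yml below.
with urllib.request.urlopen("http://192.168.121.174:9113/metrics", timeout=5) as resp:
    for line in resp.read().decode().splitlines():
        if line.startswith("nginx_connections_active"):
            print(line)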

prometheus.yml

global:
  scrape_interval: 15s
  scrape_timeout: 10s
  scrape_protocols:
  - OpenMetricsText1.0.0
  - OpenMetricsText0.0.1
  - PrometheusText0.0.4
  evaluation_interval: 1m

rule_files:
  - "rules.yml"

scrape_configs:
- job_name: nginx
  static_configs:
  - targets: ["192.168.121.174:9113"]

alerting:
  alertmanagers:
    - static_configs:
        - targets: ['192.168.121.174:9093']

Under the alerting: stanza, we add in the target IP address and port (9093) for our AlertManager instance.

rules.yml

# rules.yml
groups:
  - name: nginx
    rules:
      - alert: Nginx 100 active connections
        for: 1m
        expr: nginx_connections_active{job="nginx"} >= 100
        labels:
          severity: critical
        annotations:
          title: Nginx 100 active connections on {{ $labels.instance }}
          description: The Nginx on instance {{ $labels.instance }} has seen >100 active connections for the past 1 minute.
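
Before relying on the alert, the expression can be checked directly against the Prometheus query API. A quick sketch; the Prometheus address is an assumption here, so adjust it to wherever your instance is listening.

import json
import urllib.parse
import urllib.request

# Prometheus address is assumed - substitute your own instance
query = 'nginx_connections_active{job="nginx"}'
url = "http://192.168.121.174:9090/api/v1/query?" + urllib.parse.urlencode({"query": query})

with urllib.request.urlopen(url, timeout=5) as resp:
    result = json.load(resp)

# Each result carries the instance label and the current sample value
for sample in result["data"]["result"]:
    print(sample["metric"]["instance"], sample["value"][1])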

AlertManager

AlertManager is part of the Prometheus ecosystem. Prometheus hands it alerts whenever the conditions in the specified rules are met for the metrics it has received, and AlertManager takes care of routing and sending the notifications.

The messages it sends out can be of many types: SMTP email, web chat services such as Discord, web hooks, and so on.

If we use a web hook, we can configure AlertManager with a simple config:

alertmanager.yml

global:
  resolve_timeout: 5m
route:
  group_by: ['alertname']
  group_wait: 10s
  group_interval: 5m
  repeat_interval: 3h
  receiver: 'webhook'
receivers:
  - name: 'webhook'
    webhook_configs:
      - url: 'http://192.168.121.174:3000'
        send_resolved: true

The receivers: stanza contains the webhook URL for our custom web service that will handle the data that is passed to it.

Python

Using Python, we have a simple Flask script that listens on port 3000 for the data posted by the webhook call.

from flask import Flask, request, jsonify

app = Flask(__name__)

@app.route('/', methods=['POST'])
def process_webhook():
    try:
        alert_data = request.json
        # Process the alert data here (e.g., extract labels, annotations, etc.)
        # Implement your scaling logic based on the alert information
        # ...

        print("What we do to process the data goes here")

        # Return a response (optional)
        return jsonify({'message': 'Webhook received successfully'}), 200
    except Exception as e:
        return jsonify({'error': str(e)}), 500

if __name__ == '__main__':
    app.run(host='0.0.0.0', port=3000)
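You can exercise the endpoint without waiting for a real alert by posting a cut-down version of the AlertManager payload (shown in full below) at it. A small sketch, assuming the script is running locally:

import requests

# A trimmed-down version of the payload AlertManager sends (full examples below)
payload = {
    "receiver": "webhook",
    "status": "firing",
    "alerts": [
        {
            "status": "firing",
            "labels": {"alertname": "Nginx 100 active connections", "severity": "critical"},
        }
    ],
}

response = requests.post("http://localhost:3000/", json=payload, timeout=5)
print(response.status_code, response.json())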

The data that comes in from the AlertManager webhook is JSON, and when formatted looks like this:

When firing

{
    "receiver": "webhook",
    "status": "firing",
    "alerts": [
        {
            "status": "firing",
            "labels": {
                "alertname": "Nginx 100 active connections",
                "instance": "192.168.121.174:9113",
                "job": "nginx",
                "severity": "critical"
            },
            "annotations": {
                "description": "The Nginx on instance 192.168.121.174:9113 has seen >100 active connections for the past 1 minute.",
                "title": "Nginx 100 active connections on 192.168.121.174:9113"
            },
            "startsAt": "2024-04-26T17:21:05.311Z",
            "endsAt": "0001-01-01T00:00:00Z",
            "generatorURL": "http: //prometheus:9090/graph?g0.expr=nginx_connections_active%7Bjob%3D%22nginx%22%7D+%3E%3D+100&g0.tab=1",
            "fingerprint": "f19572f660b24b61"
        }
    ],
    "groupLabels": {
        "alertname": "Nginx 100 active connections"
    },
    "commonLabels": {
        "alertname": "Nginx 100 active connections",
        "instance": "192.168.121.174:9113",
        "job": "nginx",
        "severity": "critical"
    },
    "commonAnnotations": {
        "description": "The Nginx on instance 192.168.121.174:9113 has seen >100 active connections for the past 1 minute.",
        "title": "Nginx 100 active connections on 192.168.121.174:9113"
    },
    "externalURL": "http://alertmanager:9093",
    "version": "4",
    "groupKey": "{}:{alertname=\"Nginx 100 active connections\"}",
    "truncatedAlerts": 0
}

When resolved

{
    "receiver": "webhook",
    "status": "resolved",
    "alerts": [
        {
            "status": "resolved",
            "labels": {
                "alertname": "Nginx 100 active connections",
                "instance": "192.168.121.174:9113",
                "job": "nginx",
                "severity": "critical"
            },
            "annotations": {
                "description": "The Nginx on instance 192.168.121.174:9113 has seen >100 active connections for the past 1 minute.",
                "title": "Nginx 100 active connections on 192.168.121.174:9113"
            },
            "startsAt": "2024-04-26T17:21:05.311Z",
            "endsAt": "2024-04-26T17:23:05.311Z",
            "generatorURL": "http://prometheus:9090/graph?g0.expr=nginx_connections_active%7Bjob%3D%22nginx%22%7D+%3E%3D+100&g0.tab=1",
            "fingerprint": "f19572f660b24b61"
        }
    ],
    "groupLabels": {
        "alertname": "Nginx 100 active connections"
    },
    "commonLabels": {
        "alertname": "Nginx 100 active connections",
        "instance": "192.168.121.174:9113",
        "job": "nginx",
        "severity": "critical"
    },
    "commonAnnotations": {
        "description": "The Nginx on instance 192.168.121.174:9113 has seen >100 active connections for the past 1 minute.",
        "title": "Nginx 100 active connections on 192.168.121.174:9113"
    },
    "externalURL": "http://alertmanager:9093",
    "version": "4",
    "groupKey": "{}:{alertname=\"Nginx 100 active connections\"}",
    "truncatedAlerts": 0
}

We can then develop our python script to respond to the data received.

Docker

We can include a very simple capability in our webhook service: the ability to manage a service in Docker.

import docker

# Connect to the local Docker daemon (this needs to run on a Swarm manager node)
client = docker.from_env()

# Look up the Swarm service and scale it to the desired number of replicas
service = client.services.get('helloworld')
desired_replicas = 3  # set your desired replica count
service.scale(desired_replicas)
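
Putting the two together, the webhook handler can scale the service up while the alert is firing and back down once it resolves. A rough sketch, assuming the Swarm service is called helloworld (as above), the replica counts are arbitrary, and the script runs on a manager node:

import docker
from flask import Flask, request, jsonify

app = Flask(__name__)
client = docker.from_env()

SERVICE_NAME = 'helloworld'   # assumed service name
SCALE_UP = 5                  # replicas while the alert is firing (arbitrary)
SCALE_DOWN = 2                # replicas once it has resolved (arbitrary)

@app.route('/', methods=['POST'])
def process_webhook():
    try:
        alert_data = request.json
        service = client.services.get(SERVICE_NAME)
        if alert_data['status'] == 'firing':
            service.scale(SCALE_UP)
        else:  # 'resolved'
            service.scale(SCALE_DOWN)
        return jsonify({'message': 'Webhook received successfully'}), 200
    except Exception as e:
        return jsonify({'error': str(e)}), 500

if __name__ == '__main__':
    app.run(host='0.0.0.0', port=3000)
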
fish — April 19, 2024

fish

I’ve been using the excellent zsh and antigen for a while, but one thing really annoyed me. The autocomplete often showed duplicate characters or mangled the line somehow. I thought I’d have a look at fish shell.

In zsh I like my simple ay theme: it shows me where I am and what my git status is, and it turns out I can use the same/equivalent theme in fish. Even more fun, fish has an extension called "Oh My Fish", paying homage, or tongue in cheek, to "Oh My Zsh". Install fish and Oh My Fish:

pamac install fish
curl https://raw.githubusercontent.com/oh-my-fish/oh-my-fish/master/bin/install | fish

Get Oh My Fish to install the ays theme.

omf install ays

What am I missing? Docker aliases – I make big use of dcupd, dclf, etc. How do I get my aliases into fish? There doesn’t appear to be a docker/docker compose plugin like there is in Oh My Zsh, but aliases can be added to your fish config in ~/.config/fish/config.fish just like your ~/.zshrc. But then I discovered abbr, a nice feature of fish that acts like aliases but expands them out as you type, e.g. dclf becomes the full docker compose logs -f as soon as you hit the spacebar.

This is my config.fish: a few Docker abbreviations, along with exa, bat and nvim.

if status is-interactive
    # Commands to run in interactive sessions can go here
    abbr -a -- dcps 'docker compose ps'
    abbr -a -- dclf 'docker compose logs -f'
    abbr -a -- dcup 'docker compose up'
    abbr -a -- dcupd 'docker compose up -d'
    abbr -a -- dcdn 'docker compose down'
    abbr -a -- dcl 'docker compose logs'
    abbr -a -- ls 'exa'
    abbr -a -- cat 'bat -p'
    abbr -a -- vi nvim
    abbr -a -- vim nvim
end

Remove the welcome greeting with:

set -U fish_greeting

References

zsh antigen

Knock vs. Knockd —

Knock vs. Knockd

I’m working on a project that requires a machine to be contactable on a client’s remote network even when its DHCP fails. We’ve seen an issue where our devices disappear periodically. After investigation, we discovered that when the device’s DHCP lease expires, it fails to renew its address from the client’s DHCP/BOOTP server.

Often a site has a number of devices with different lease periods, and at least one device may still be online. Now we need to think of what we can do to use the online device to our advantage to somehow fix, or continue to work with, the offline device. My thinking was to use Avahi’s auto IP feature to at least give the offline device an IP address we could then get to from a working device.


When I enabled avahi-daemon and got avahi-autoipd giving the device an IPv4LL address (169.254.0.0/16), I found that knockd stopped working. Even though I could see the device had a valid IPv4LL address, and could see the knock packets using tcpdump on the device, knockd wasn’t seeing them.

This is going to be a problem. We don’t really want the client network to be able to see any ports on the devices, so knockd was a good way of keeping all ports closed unless we need them.

After some investigation, I found that knockd is not the only way to use port knocking. You can get iptables to handle it for you. https://wiki.archlinux.org/title/Port_knocking

I used this method and set the knock ports as necessary and now, even when the IP changes, the port knock still works.

*filter
:INPUT DROP [0:0]
:FORWARD DROP [0:0]
:OUTPUT ACCEPT [0:0]
:TRAFFIC - [0:0]
:SSH-INPUT - [0:0]
:SSH-INPUTTWO - [0:0]
# TRAFFIC chain for Port Knocking. The correct port sequence in this example is  8881 -> 7777 -> 9991; any other sequence will drop the traffic
-A INPUT -j TRAFFIC
-A TRAFFIC -p icmp --icmp-type any -j ACCEPT
-A TRAFFIC -m state --state ESTABLISHED,RELATED -j ACCEPT
-A TRAFFIC -m state --state NEW -m tcp -p tcp --dport 22 -m recent --rcheck --seconds 30 --name SSH2 -j ACCEPT
-A TRAFFIC -m state --state NEW -m tcp -p tcp -m recent --name SSH2 --remove -j DROP
-A TRAFFIC -m state --state NEW -m tcp -p tcp --dport 9991 -m recent --rcheck --name SSH1 -j SSH-INPUTTWO
-A TRAFFIC -m state --state NEW -m tcp -p tcp -m recent --name SSH1 --remove -j DROP
-A TRAFFIC -m state --state NEW -m tcp -p tcp --dport 7777 -m recent --rcheck --name SSH0 -j SSH-INPUT
-A TRAFFIC -m state --state NEW -m tcp -p tcp -m recent --name SSH0 --remove -j DROP
-A TRAFFIC -m state --state NEW -m tcp -p tcp --dport 8881 -m recent --name SSH0 --set -j DROP
-A SSH-INPUT -m recent --name SSH1 --set -j DROP
-A SSH-INPUTTWO -m recent --name SSH2 --set -j DROP
-A TRAFFIC -j DROP
COMMIT
# END or further rules
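
From the client side, the knock can be sent with anything that fires a TCP SYN at each port in order. A minimal sketch in Python; the host address is a placeholder and the sequence matches the rules above.

import socket
import time

HOST = "192.168.1.50"          # placeholder for the device's address
SEQUENCE = [8881, 7777, 9991]  # must match the order in the rules above

for port in SEQUENCE:
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.settimeout(0.5)          # the ports DROP, so connect() will time out
    try:
        s.connect((HOST, port))
    except (socket.timeout, OSError):
        pass                   # the SYN has been sent, which is all we need
    finally:
        s.close()
    time.sleep(0.2)

# Port 22 should now accept connections from this address for 30 seconds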

References

knockd

Docker and a Local Squid Proxy — April 6, 2024

Docker and a Local Squid Proxy

I’ve been repeatedly building a Docker multi-stage image and found that many of the Python requirements drag in some pretty large content. This isn’t great when the office network isn’t particularly fast and the Docker build stage repeatedly pulls the same files from the internet.

Time to add a caching proxy.

It would be nice if I could switch the proxy on and off as I need it. I don’t necessarily want it system-wide, just to cache the Docker requirements. For this, I can edit the user’s ~/.docker/config.json and add in the proxy settings so my requests are cached.

Add the Docker image for squid and just run it pretty much as is. Here’s my compose.yml

services:
  squid:
    image: ubuntu/squid
    ports:
      - 0.0.0.0:3128:3128
    environment:
      TZ: Europe/London
    volumes:
      - ./data:/var/spool/squid:rw
      - ./logs:/var/log/squid:rw
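
As a quick sanity check that the proxy is answering before touching the Docker config, a request routed through it should succeed. A small sketch; the proxy address matches the one used in the Docker config below, and the URL fetched is just an arbitrary example.

import requests

# Proxy address assumed to match the host running the squid container
proxies = {
    "http": "http://192.168.0.94:3128",
    "https": "http://192.168.0.94:3128",
}

# Any outbound request will do; a 200 confirms squid is forwarding traffic
resp = requests.get("https://pypi.org/simple/", proxies=proxies, timeout=10)
print(resp.status_code)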

Now I just point my ~/.docker/config.json at the proxy. Here’s my sample, showing the “proxies” stanza added to the config:

{
   "auths": {
       "ghcr.io": {
           "auth": "V2UgaGF2ZSBiZWVuIGV4cGVjdGluZyB5b3UgTXIgQm9uZC4K"
       }
   },
   "proxies": {
     "default": {
        "httpProxy": "http://192.168.0.94:3128",
        "httpsProxy": "http://192.168.0.94:3128",
        "noProxy": "*.domain.tld,127.0.0.0/8"
      }
   }  
}

Now when a Docker build pulls in web content it goes via my cache, which stops it hammering the slow internet connection so much.

I can even use this proxy in my browser using SwitchyOmega to turn it on and off.

AWS: SSH using Systems Manager — April 2, 2024

AWS: SSH using Systems Manager

Systems Manager

Systems Manager adds a layer of management to your EC2 instances. One particular benefit is being able to SSH into an EC2 instance without having to open a port to the server, giving an additional layer of security to the host.

It does this by connecting an agent service (the SSM Agent) running on your EC2 instance to Systems Manager, and you are then able to proxy an SSH session via an aws-cli ssm connection.

There are a number of Amazon Machine Images (AMIs) that come with the SSM Agent preinstalled – Ubuntu 22.04 LTS server is one of them.
