Monitoring My Discord Bot

14 July, 2022

I decided to add a health check to my Discord bot. Its uptime is totally unimportant, but it was still fun to do.

In terms of monitoring the Discord bot, there were a few options:

Build my own with cron, bash, and sendmail; install it on a separate host
Use cron to send pings to healthchecks.io
Use either updown.io or uptimerobot.com

I chose the second option. I found it compelling because pushing a ping from my host to the monitoring API doesn’t require me to punch a hole in my host’s firewall so that the monitoring service can scrape the health check endpoint. It’s also free, though it’s worth noting that when I played around with updown.io’s price estimator, I found it to be very inexpensive (on the order of cents per year for my use case).

Instead of exposing the endpoint, I have a cron job that executes a bash script, like this:

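#!/usr/bin/env bash

# Query the bot's local health endpoint; -f makes curl exit non-zero on an HTTP error.
details=$(curl -sSf http://localhost:8000/health)
# Ping healthchecks.io with the health details as the body and curl's exit status ($?) in the URL.
curl -m 10 --retry 5 --data-raw "$details" https://hc-ping.com/<redacted>/$?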

The way it works is the cron job runs on the same host that my Discord bot runs on. It hits the bot’s health check endpoint and stores the output in the details variable. The output is just JSON that describes the health of the bot, like this:

{
  "discord_connection":"connected",
  "discord_heartbeat_latency":"63 ms"
}

healthchecks.io can persist information that you send in your ping, and I wanted it to persist the health details with each check-in because I figured there’d be some forensic value in having that information.

The second curl invocation is the actual healthchecks.io ping. I include the exit status of the first curl invocation because the healthchecks.io API allows you to specify that the monitored service was able to check in but is experiencing failures.

To take advantage of that, the first curl invocation uses the -f flag so that its exit code will be non-zero if the health check endpoint returns an HTTP error status (400 or above), or if the request fails outright.
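
The script relies on a detail of bash that is easy to miss: a plain assignment from a command substitution takes on the exit status of the substituted command, so $? is still meaningful on the next line. Here is a minimal sketch of that behavior. The fake_health_check function is a hypothetical stand-in for the real curl call, and 22 is used because that is the exit code curl documents for an HTTP error when -f is set.

```shell
#!/usr/bin/env bash

# Hypothetical stand-in for `curl -sSf http://localhost:8000/health`:
# prints a response body, then exits with the status passed as $1.
fake_health_check() {
  echo '{"discord_connection":"connected"}'
  return "$1"
}

# The assignment itself reports the exit status of the command substitution.
details=$(fake_health_check 0)
healthy_status=$?

details=$(fake_health_check 22)
failing_status=$?

echo "healthy=$healthy_status failing=$failing_status"
```

Note that this only holds for a bare assignment. Writing something like `local details=$(...)` inside a function would report the status of `local` instead, which is one reason to keep the assignment on its own.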

The end result of this monitoring goes a little something like this:

My cron job runs every 5 minutes, and I configured the healthchecks.io monitor to expect a ping every 5 minutes.
I also configured a 15 minute grace period. If an expected ping doesn’t arrive and the grace period then elapses without one, healthchecks.io sends me an email.
If healthchecks.io receives a ping with a non-zero exit status from my cron job, it sends me an email.

Otherwise, no news is good news! Setting all of this up took approximately 5 minutes.
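
To make the timing concrete, here is a rough sketch of how the period and grace settings combine. It is only my mental model of the timeline, not how healthchecks.io is actually implemented; the check_state function and its state names are hypothetical. With a 5 minute period and a 15 minute grace period, a missed ping should only turn into an email about 20 minutes after the last successful one.

```shell
#!/usr/bin/env bash

period=300   # expected ping interval: 5 minutes, in seconds
grace=900    # grace period: 15 minutes, in seconds

# Hypothetical helper: classify a check by the seconds elapsed since its last ping.
check_state() {
  local elapsed=$1
  if (( elapsed <= period )); then
    echo "up"      # still inside the expected interval
  elif (( elapsed <= period + grace )); then
    echo "late"    # ping is overdue, but the grace period has not run out
  else
    echo "down"    # grace period elapsed: this is when the email would go out
  fi
}

check_state 120    # 2 minutes since the last ping
check_state 600    # 10 minutes since the last ping
check_state 1500   # 25 minutes since the last ping
```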

So what happens if the bot goes down and I get an email? Probably nothing, heh. It’ll be cool though. If anything, I’m guessing this will end up inadvertently tracking my ISP’s uptime since my Discord bot runs in my homelab.

I’m not in any way affiliated with healthchecks.io, and I’ve never spoken with them.