9 min read

How to Create a Simple System Health Monitoring Script in Bash

Table of Contents

Introduction

Manually checking your server’s health is a repetitive and inefficient task. A much better approach is to automate these checks with a simple Bash script. This allows you to get a quick, consistent overview of your system’s vital signs—like CPU load, memory, and disk space—anytime you need it.

This guide will walk you through creating a straightforward yet powerful health monitoring script. You’ll learn to check critical metrics and set custom thresholds to alert you when something needs attention.

Step 1: Create and Prepare the Script File

First, create a new file for your script. We’ll name it health_check.sh.

touch health_check.sh

Next, make the file executable. This permission allows you to run it directly from your terminal.

chmod +x health_check.sh

Now, open health_check.sh in your favorite text editor (like nano, vim, or VS Code) to start building the script.

Step 2: Build the Health Check Script

We will build the script piece by piece. The final result will be a single file that checks multiple system metrics.

The Basic Structure

Every Bash script should start with a “shebang” (#!) that tells the system which interpreter to use. We will also define some variables for color-coded output and set warning thresholds.

Copy this foundational code into your health_check.sh file:

#!/bin/bash

# ------------------------------------------------------------------
# A simple script to monitor basic system health metrics.
#
# Author: ShellFu
# ------------------------------------------------------------------

# --- Configuration ---
# Set the warning thresholds for CPU, memory, and disk usage (in percent).
CPU_THRESHOLD=80
MEM_THRESHOLD=80
DISK_THRESHOLD=90

# --- Colors for Output ---
RED='\033[0;31m'
GREEN='\033[0;32m'
YELLOW='\033[1;33m'
NC='\033[0m' # No Color

# --- Functions ---

# Function to print a separator line
print_separator() {
    printf -- '-%.0s' {1..60}
    printf "\n"
}

# --- Main Logic ---

echo "System Health Report - $(date)"
print_separator

This setup defines thresholds and color codes, making the final report easy to read.

Step 3: Check CPU Load

The system’s load average gives you an idea of how busy the CPU has been over the last 1, 5, and 15 minutes. We can get this from the uptime command.

Add this function to your script below the --- Functions --- comment:

# Function to check system load average
check_load() {
    echo -e "${YELLOW}System Load:${NC}"
    # Get load average from the 'uptime' command
    LOAD_AVG=$(uptime | awk -F'load average: ' '{print $2}')
    echo "  Load Average (1m, 5m, 15m): $LOAD_AVG"

    # For a simple CPU check, we'll use the 1-minute load average.
    # We compare it to the number of CPU cores.
    LOAD_1MIN=$(echo "$LOAD_AVG" | awk -F, '{print $1}')
    CPU_CORES=$(nproc)
    
    # bc is used for floating-point comparison
    if (( $(echo "$LOAD_1MIN > $CPU_CORES" | bc -l) )); then
        echo -e "  ${RED}Status: High 1-minute load average detected.${NC}"
    else
        echo -e "  ${GREEN}Status: Load average is within normal limits.${NC}"
    fi
    print_separator
}
  • uptime | awk -F'load average: ' '{print $2}': This command gets the output of uptime and uses awk to print only the load average values.
  • nproc: This command returns the number of available processing units (CPU cores). A common rule of thumb is that a load average higher than the number of cores indicates a performance bottleneck.

Step 4: Check Memory Usage

Next, let’s check how much RAM is being used. The free command provides this information.

Add this function to your script:

# Function to check memory usage
check_memory() {
    echo -e "${YELLOW}Memory Usage:${NC}"
    # Get memory usage and format it. 'free -m' shows values in megabytes.
    MEM_INFO=$(free -m)
    TOTAL_MEM=$(echo "$MEM_INFO" | awk 'NR==2 {print $2}')
    USED_MEM=$(echo "$MEM_INFO" | awk 'NR==2 {print $3}')
    MEM_PERCENT=$((USED_MEM * 100 / TOTAL_MEM))

    echo "  Total Memory: ${TOTAL_MEM}MB"
    echo "  Used Memory: ${USED_MEM}MB (${MEM_PERCENT}%)"
    
    if [ "$MEM_PERCENT" -gt "$MEM_THRESHOLD" ]; then
        echo -e "  ${RED}Status: High memory usage detected.${NC}"
    else
        echo -e "  ${GREEN}Status: Memory usage is normal.${NC}"
    fi
    print_separator
}
  • free -m: Displays system memory in megabytes.
  • awk 'NR==2 {print $2}': awk processes the output line by line. NR==2 tells it to only look at the second line (the “Mem:” line), and {print $2} prints the second column (total memory).
  • $((...)): This is Bash’s arithmetic expansion, used here to calculate the usage percentage.

Step 5: Check Disk Space

Running out of disk space is a common problem. This function will check the usage of the root filesystem (/).

Add this function to your script:

# Function to check disk space usage for the root partition
check_disk() {
    echo -e "${YELLOW}Disk Usage (/):${NC}"
    # 'df -h /' gets disk usage for the root filesystem in human-readable format.
    DISK_USAGE=$(df -h / | awk 'NR==2 {print $5}' | sed 's/%//')
    
    echo "  Usage: ${DISK_USAGE}%"

    if [ "$DISK_USAGE" -gt "$DISK_THRESHOLD" ]; then
        echo -e "  ${RED}Status: Low disk space detected.${NC}"
    else
        echo -e "  ${GREEN}Status: Disk space is sufficient.${NC}"
    fi
    print_separator
}
  • df -h /: Reports the disk space for the root (/) filesystem in a “human-readable” format (e.g., GB, MB).
  • awk 'NR==2 {print $5}': This grabs the usage percentage from the output.
  • sed 's/%//': This command removes the % symbol so we can perform a numerical comparison.

Step 6: Putting It All Together

Now, let’s call these functions from the main part of our script. Replace the --- Main Logic --- section with the following:

# --- Main Logic ---

echo "System Health Report - $(date)"
print_separator

check_load
check_memory
check_disk

echo "Health check complete."

Your final health_check.sh file should now look like this:

#!/bin/bash

# ------------------------------------------------------------------
# A simple script to monitor basic system health metrics.
#
# Author: ShellFu
# ------------------------------------------------------------------

# --- Configuration ---
# Set the warning thresholds for CPU, memory, and disk usage (in percent).
CPU_THRESHOLD=80
MEM_THRESHOLD=80
DISK_THRESHOLD=90

# --- Colors for Output ---
RED='\033[0;31m'
GREEN='\033[0;32m'
YELLOW='\033[1;33m'
NC='\033[0m' # No Color

# --- Functions ---

# Function to print a separator line
print_separator() {
    printf -- '-%.0s' {1..60}
    printf "\n"
}

# Function to check system load average
check_load() {
    echo -e "${YELLOW}System Load:${NC}"
    # Get load average from the 'uptime' command
    LOAD_AVG=$(uptime | awk -F'load average: ' '{print $2}')
    echo "  Load Average (1m, 5m, 15m): $LOAD_AVG"

    # For a simple CPU check, we'll use the 1-minute load average.
    # We compare it to the number of CPU cores.
    LOAD_1MIN=$(echo "$LOAD_AVG" | awk -F, '{print $1}')
    CPU_CORES=$(nproc)
    
    # bc is used for floating-point comparison
    if (( $(echo "$LOAD_1MIN > $CPU_CORES" | bc -l) )); then
        echo -e "  ${RED}Status: High 1-minute load average detected.${NC}"
    else
        echo -e "  ${GREEN}Status: Load average is within normal limits.${NC}"
    fi
    print_separator
}

# Function to check memory usage
check_memory() {
    echo -e "${YELLOW}Memory Usage:${NC}"
    # Get memory usage and format it. 'free -m' shows values in megabytes.
    MEM_INFO=$(free -m)
    TOTAL_MEM=$(echo "$MEM_INFO" | awk 'NR==2 {print $2}')
    USED_MEM=$(echo "$MEM_INFO" | awk 'NR==2 {print $3}')
    MEM_PERCENT=$((USED_MEM * 100 / TOTAL_MEM))

    echo "  Total Memory: ${TOTAL_MEM}MB"
    echo "  Used Memory: ${USED_MEM}MB (${MEM_PERCENT}%)"
    
    if [ "$MEM_PERCENT" -gt "$MEM_THRESHOLD" ]; then
        echo -e "  ${RED}Status: High memory usage detected.${NC}"
    else
        echo -e "  ${GREEN}Status: Memory usage is normal.${NC}"
    fi
    print_separator
}

# Function to check disk space usage for the root partition
check_disk() {
    echo -e "${YELLOW}Disk Usage (/):${NC}"
    # 'df -h /' gets disk usage for the root filesystem in human-readable format.
    DISK_USAGE=$(df -h / | awk 'NR==2 {print $5}' | sed 's/%//')
    
    echo "  Usage: ${DISK_USAGE}%"

    if [ "$DISK_USAGE" -gt "$DISK_THRESHOLD" ]; then
        echo -e "  ${RED}Status: Low disk space detected.${NC}"
    else
        echo -e "  ${GREEN}Status: Disk space is sufficient.${NC}"
    fi
    print_separator
}


# --- Main Logic ---

echo "System Health Report - $(date)"
print_separator

check_load
check_memory
check_disk

echo "Health check complete."

Step 7: Run the Script

Execute the script from your terminal to see the report.

./health_check.sh

You’ll see a clean, color-coded report. If all metrics are within the thresholds, the output will look something like this:

System Health Report - Sun Nov 16 22:31:38 UTC 2025
------------------------------------------------------------
System Load:
  Load Average (1m, 5m, 15m): 0.05, 0.15, 0.20
  Status: Load average is within normal limits.
------------------------------------------------------------
Memory Usage:
  Total Memory: 7950MB
  Used Memory: 2150MB (27%)
  Status: Memory usage is normal.
------------------------------------------------------------
Disk Usage (/):
  Usage: 45%
  Status: Disk space is sufficient.
------------------------------------------------------------
Health check complete.

If disk usage were above 90%, that section would show a red warning message instead.

Conclusion

You now have a simple, effective Bash script for monitoring your system’s health. You can run it anytime for a quick status check or customize it further. For example, you could add checks for running services (systemctl is-active nginx) or network connectivity.

As a next step, automate this script using cron to run it on a schedule. To run it every hour, edit your crontab:

crontab -e

And add the following line, making sure to use the full path to your script:

0 * * * * /home/youruser/health_check.sh > /var/log/health_report.log 2>&1

This simple automation is a cornerstone of effective Linux system administration.