Process Logs and XML Data Using Bash (Part 31/34)

I once had a cron job that quietly failed for three weeks straight because the script's logging just printed to a terminal that nobody was watching. By the time anyone noticed, we had no record of what went wrong or when it started. That single missing habit, writing logs that actually survive after the script exits, cost more debugging time than the script itself took to write.

This guide is for anyone who already writes basic bash scripts and now needs them to produce logs you can trust, and to read XML data without losing your mind over angle brackets. By the end, you will be able to build a script that logs cleanly to a file, tags severity levels, and pulls values out of an XML file using tools made for that job instead of fragile regex.

The Problem Most Scripts Have:

Picture a deployment script running on a remote server at 2 AM through cron. It fails. You SSH in the next morning and there is nothing. No log file, no timestamp, no clue which command broke. The terminal output that would have told you everything disappeared the moment the SSH session closed.

  • Output only goes to the screen, never saved to disk
  • No timestamps, so you cannot tell when something happened relative to other events
  • Errors and normal output mixed together with no way to separate them
  • Config or data files in XML format that get parsed with grep and break the moment the format shifts slightly

Sound familiar? Let's fix both problems properly.

Before diving into logging, it helps to understand how bash treats output streams in the first place. If you have not already gone through how echo works in shell scripts, that foundation will make the redirection tricks below click faster.

#01

Why Printing to the Screen Is Not Logging

A lot of beginner scripts use echo for everything and call it logging. It is not. The moment that terminal session closes, every line of output is gone. Real logging means the message survives on disk, has a timestamp attached, and ideally tells you whether it was routine information or something that needs attention.

There is also a stream problem. Bash gives every script two separate output channels by default: standard output (stdout, file descriptor 1) and standard error (stderr, file descriptor 2). Most beginner scripts dump both into the same place, which means when something breaks you are stuck scrolling through hundreds of lines of normal output looking for the one error buried in the middle.

bash
LinuxTeck.com
#!/bin/bash
echo "Starting backup job"
some_command_that_might_not_exist
echo "Backup finished"
OUTPUT
Starting backup job
script.sh: line 3: some_command_that_might_not_exist: command not found
Backup finished

Notice the script kept going and printed "Backup finished" even though the command in the middle failed. The error message went to stderr, the success message went to stdout, and both landed in the same terminal with no way to tell which one mattered. That is the exact gap real logging closes.
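
To see that the two streams really are separate channels, here is a minimal sketch. The function name demo_job and the temp files are made up for illustration; the point is only that each stream can be sent to its own destination.

```bash
#!/bin/bash
# demo_job is a hypothetical stand-in for a script that writes to both streams.
demo_job() {
  echo "Starting backup job"          # goes to stdout (fd 1)
  echo "backup target missing" >&2    # goes to stderr (fd 2)
  echo "Backup finished"              # goes to stdout (fd 1)
}

out_file=$(mktemp)
err_file=$(mktemp)

# Send each stream to its own file instead of letting them mix on the terminal.
demo_job >"$out_file" 2>"$err_file"

echo "--- stdout ---"
cat "$out_file"
echo "--- stderr ---"
cat "$err_file"
```

Once the streams land in different places, the one line that matters is no longer buried between the routine ones.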

#02

Bash Logging XML Parsing: Building a Timestamped Function

The simplest real upgrade is a small function that wraps every message with a timestamp before printing it. You define it once near the top of the script and call it instead of echo everywhere else. This alone makes a script's output ten times more useful when you are reading it after the fact.

bash
#!/bin/bash

log() {
  echo "[$(date '+%Y-%m-%d %H:%M:%S')] $*"
}

log "Starting system check"
sleep 1
log "Check completed"

OUTPUT
[2026-06-20 09:14:02] Starting system check
[2026-06-20 09:14:03] Check completed

The $* inside the function grabs whatever text you pass when calling log, so the function stays generic instead of being tied to one message. Once you have this in place, the next step is sending that output somewhere that does not vanish when the terminal closes.

Note:

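The single quotes around the date format string are a safe habit rather than a strict requirement. Percent signs are not special to bash, so "+%Y-%m-%d %H:%M:%S" in double quotes would work here too. What single quotes buy you is that nothing inside them gets expanded: a stray $ or backtick in a format string stays literal until date itself processes it. The one place a percent sign does bite is directly inside a crontab entry, where an unescaped % is turned into a newline, so keep the date call inside the script rather than on the cron line.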

#03

Separating Info, Warnings, and Errors

A single log function gets you timestamps but not severity. In production, you want to be able to grep a log file for just the errors without wading through every routine message. The fix is three small functions instead of one, with warnings and errors deliberately sent to stderr using >&2 so they can be filtered separately later.

bash
#!/bin/bash

info() { echo "[INFO] $(date '+%H:%M:%S') $*"; }
warn() { echo "[WARN] $(date '+%H:%M:%S') $*" >&2; }
error() { echo "[ERROR] $(date '+%H:%M:%S') $*" >&2; }

info "Service started"
warn "Disk usage above 80 percent"
error "Config file missing"

OUTPUT
[INFO] 09:21:10 Service started
[WARN] 09:21:10 Disk usage above 80 percent
[ERROR] 09:21:10 Config file missing

Now you can run the script and send warnings and errors to one file and routine messages to another with ./script.sh 2>errors.log 1>info.log. That single change is the difference between scrolling through hundreds of lines after an incident and running one grep command that shows you exactly what broke.
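
If three near-identical functions feel repetitive, the same routing can live in one place. The following is only a sketch of that idea: log taking the level as its first argument and the emit_all helper are both made up for this example, not something the scripts above depend on.

```bash
#!/bin/bash
# log LEVEL MESSAGE...: hypothetical helper. INFO goes to stdout,
# every other level goes to stderr.
log() {
  local level=$1
  shift
  local line
  line="[$level] $(date '+%H:%M:%S') $*"
  case $level in
    INFO) echo "$line" ;;
    *)    echo "$line" >&2 ;;
  esac
}

emit_all() {
  log INFO "Service started"
  log WARN "Disk usage above 80 percent"
  log ERROR "Config file missing"
}

# 2>&1 points stderr at the capture, then >/dev/null discards stdout,
# so only the WARN and ERROR lines end up in the variable.
problems=$(emit_all 2>&1 >/dev/null)

echo "$problems"
```

The order of the two redirections matters here: swapping them would send both streams to /dev/null and leave the variable empty.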

Common Mistake:

People write warn() { echo "[WARN] $*" >2; }, leaving the ampersand out of >&2. This does not redirect to stderr. It creates a literal file named 2 in the current directory and writes the message there silently. The script runs without any visible error, but your warnings just disappear into a junk file called "2" that nobody ever checks.

Fix: always use >&2 with the ampersand. The ampersand tells bash that 2 refers to file descriptor 2 (stderr), not a filename. Without it, bash treats 2 as a filename and creates that file.
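
You can watch this happen in a scratch directory. The sketch below uses two throwaway functions, bad_warn and good_warn (names invented for the demo), and captures whatever each one sends to stderr.

```bash
#!/bin/bash
# Work in a scratch directory so the stray file is easy to spot.
scratch=$(mktemp -d)
cd "$scratch" || exit 1

bad_warn()  { echo "[WARN] $*" >2; }    # typo: writes to a file named "2"
good_warn() { echo "[WARN] $*" >&2; }   # correct: writes to stderr

# 2>&1 inside the substitution captures anything sent to stderr.
bad_output=$(bad_warn "disk almost full" 2>&1)     # captures nothing
good_output=$(good_warn "disk almost full" 2>&1)   # captures the warning

echo "bad_warn sent to stderr:  '$bad_output'"
echo "good_warn sent to stderr: '$good_output'"
echo "files left behind in $scratch:"
ls "$scratch"
```

The listing at the end shows the junk file named 2, holding the warning that never reached stderr.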

#04

Sending Everything to a File Without Losing the Live View

Level functions are great inside the script, but you still need the output to land on disk. The trick most production scripts use is exec combined with tee near the top of the file. This redirects the script's own output streams for the rest of its run, so every later command writes to both the screen and the log file automatically, with no extra typing needed at each line.

bash
#!/bin/bash
set -euo pipefail

LOG_DIR="$HOME/logs"
mkdir -p "$LOG_DIR"
LOG_FILE="$LOG_DIR/run_$(date -u +%Y%m%dT%H%M%SZ).log"

exec > >(tee -a "$LOG_FILE") 2>&1

echo "Job started"
echo "Doing work..."
echo "Job finished"

OUTPUT
Job started
Doing work...
Job finished

$ ls ~/logs/
run_20260620T035512Z.log

$ cat ~/logs/run_20260620T035512Z.log
Job started
Doing work...
Job finished

The 2>&1 at the end merges stderr into the same stream so nothing slips through unlogged, and giving every run its own timestamped filename means you never overwrite yesterday's log by accident. This pattern is exactly what you want for anything running unattended, including scripts triggered by cron jobs where nobody is watching the terminal in real time.

You can take this further by catching failures automatically with a trap, so the exact failing command gets written to the log even if you never expected that specific command to break.

bash
#!/bin/bash
set -Eeuo pipefail

log() { echo "[$(date '+%Y-%m-%d %H:%M:%S')] $*"; }
trap 'log "[ERROR] Command failed: $BASH_COMMAND"' ERR

log "Deploy started"
cp /etc/app/config.yml /backup/
log "Deploy finished"

OUTPUT
[2026-06-20 09:32:18] Deploy started
cp: cannot create regular file '/backup/config.yml': No such file or directory
[2026-06-20 09:32:18] [ERROR] Command failed: cp /etc/app/config.yml /backup/

If you have not worked with exit codes before, it is worth reading through bash exit codes and error handling alongside this, since traps and exit codes work together to make a script fail loudly instead of silently.
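
The trap can carry a little more context than the command text alone. As a sketch, assuming you also want the exit code and the line number, the version below writes a throwaway script to a temp file and runs it in its own bash process so the trap fires exactly as it would in a real run. The failing ls path is deliberately fake.

```bash
#!/bin/bash
# Write a tiny throwaway script so the trap can be tried in isolation.
demo_script=$(mktemp)
cat >"$demo_script" <<'EOF'
#!/bin/bash
set -Eeuo pipefail
log() { echo "[$(date '+%Y-%m-%d %H:%M:%S')] $*"; }
# At the start of the ERR trap, $? still holds the failing command's exit code.
trap 'log "[ERROR] exit code $? at line $LINENO: $BASH_COMMAND"' ERR
log "Step started"
ls /nonexistent/path/for/demo >/dev/null 2>&1
log "Step finished"
EOF

demo_status=0
demo_output=$(bash "$demo_script") || demo_status=$?

echo "$demo_output"
echo "script exited with status $demo_status"
rm -f "$demo_script"
```

The "Step finished" line never appears, because set -e stops the script right after the trap has written its message.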

#05

Bash Logging XML Parsing: Reading XML Data Safely

This is where a lot of people get into trouble. XML shows up everywhere in older enterprise systems, deployment manifests, and config exports, and the instinct is to grab the value with grep or sed since that is what you already know. It works fine on a clean sample file and then quietly breaks the day someone adds a line break or an attribute in a different order.

The tool built for this job is xmllint. On Debian and Ubuntu it ships in the libxml2-utils package; on Fedora, RHEL, and similar distributions it comes with the libxml2 package itself. It understands XML structure instead of treating it as plain text, so it does not care about whitespace, attribute order, or line breaks the way grep does.

bash
sudo apt install libxml2-utils
xmllint --xpath "//server/@ip" servers.xml
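
Since a script that depends on xmllint may land on a machine where it is not installed, a small guard near the top saves a confusing failure later. This is just a sketch; require_cmd is a hypothetical helper name, not a standard command.

```bash
#!/bin/bash
# require_cmd is a hypothetical helper: it reports a missing tool on stderr
# and returns non-zero so the caller can decide whether to abort.
require_cmd() {
  local cmd=$1
  if ! command -v "$cmd" >/dev/null 2>&1; then
    echo "[ERROR] Required command not found: $cmd" >&2
    return 1
  fi
}

if require_cmd xmllint; then
  echo "xmllint is available"
else
  echo "Install libxml2-utils (Debian/Ubuntu) or libxml2 (Fedora/RHEL) first" >&2
fi
```

In a real script you would probably call require_cmd xmllint || exit 1 before touching any XML file.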

Say you have a config file listing servers, and you need a script to pull every IP address out of it automatically before running a health check loop. Here is the sample servers.xml file used for this example, followed by the script that reads it.

xml
<?xml version="1.0" encoding="UTF-8"?>
<infrastructure>
  <server ip="192.168.1.10" name="web-prod-01" />
  <server ip="192.168.1.20" name="db-prod-01" />
  <server ip="192.168.1.30" name="cache-prod-01" />
</infrastructure>

bash
#!/bin/bash
set -euo pipefail

CONFIG_FILE="servers.xml"

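if [ ! -f "$CONFIG_FILE" ]; then
  echo "[ERROR] Missing config: $CONFIG_FILE" >&2
  exit 1
fi

# xmllint prints ip="..." pairs. Older builds put all matches on one line,
# so anchor on ip=" and let \K keep only the value that follows it.
# Reading line by line with while avoids word-splitting on the result
xmllint --xpath "//server/@ip" "$CONFIG_FILE" | grep -oP 'ip="\K[^"]+' | while read -r ip; do
  echo "Checking $ip..."
done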

OUTPUT
Checking 192.168.1.10...
Checking 192.168.1.20...
Checking 192.168.1.30...

The --xpath flag returns each matching attribute in name="value" form (for example ip="192.168.1.10"), so the grep right after it keeps only the value between the quotes and gives you a clean list. Piping that straight into a while read -r ip loop instead of storing it in a variable first avoids word-splitting issues if a hostname or value ever contains a space. For repeated work where you need to both query and rewrite XML, xmlstarlet is worth installing alongside xmllint since it adds editing and transformation on top of querying.
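
If you would rather not post-process xmllint's output at all, XPath itself can hand back bare values. The sketch below is one way to do that, assuming a flat list of server elements: count() gives the number of matches and string() returns a single attribute value with no name or quotes attached. The sample file is recreated in a temp file so the example stands alone, and the loop is skipped if xmllint is not installed.

```bash
#!/bin/bash
# Hypothetical sample file, recreated here so the sketch is self-contained.
xml_file=$(mktemp)
cat >"$xml_file" <<'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<infrastructure>
  <server ip="192.168.1.10" name="web-prod-01" />
  <server ip="192.168.1.20" name="db-prod-01" />
</infrastructure>
EOF

ips=()
if command -v xmllint >/dev/null 2>&1; then
  # count() yields the number of <server> elements in the document.
  total=$(xmllint --xpath 'count(//server)' "$xml_file")
  for ((i = 1; i <= total; i++)); do
    # string() yields the bare attribute value of the i-th match.
    ips+=("$(xmllint --xpath "string((//server)[$i]/@ip)" "$xml_file")")
  done
fi

for ip in "${ips[@]}"; do
  echo "Checking $ip..."
done
rm -f "$xml_file"
```

This calls xmllint once per server, which is slower than a single query, so it suits small config files better than large exports.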

Common Mistake:

The classic mistake is writing grep "ip=" servers.xml | cut -d'"' -f2 and calling it done. It works exactly once, on the exact sample file you tested with. The moment someone reformats the XML onto a single line, adds a comment, or reorders the attributes, the cut field positions shift and the script returns garbage with no error at all, which is worse than a crash.

Fix: use xmllint --xpath or xmlstarlet sel instead of grep and cut for anything XML. They parse the document structure rather than guessing at text positions, so formatting changes do not silently corrupt your output.
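
The failure is easy to reproduce without any XML tooling. In this sketch, two hypothetical lines describe the same server with the attributes in a different order, and fragile_extract (a name invented here) applies the grep and cut approach to each.

```bash
#!/bin/bash
# Two hypothetical lines describing the same server; only the attribute order differs.
original='<server ip="192.168.1.10" name="web-prod-01" />'
reordered='<server name="web-prod-01" ip="192.168.1.10" />'

# The fragile approach: take whatever sits between the first pair of quotes.
fragile_extract() { grep "ip=" <<<"$1" | cut -d'"' -f2; }

from_original=$(fragile_extract "$original")
from_reordered=$(fragile_extract "$reordered")

echo "original order:  $from_original"    # prints 192.168.1.10
echo "reordered attrs: $from_reordered"   # prints web-prod-01
```

Both calls exit with status 0, so nothing tells the script that the second value is a hostname rather than an IP address.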

#06

Putting Logging and XML Parsing Into One Script

Here is the part that actually shows up in real environments. You get a config in XML, you need to act on it, and you need a record of what happened. A common pattern is reading a list of services from an XML manifest, restarting each one, and logging every step to a file so the next person does not have to guess what ran. Here is the sample services.xml file that matches the log output below it.

xml
<?xml version="1.0" encoding="UTF-8"?>
<manifest>
  <service name="nginx" type="web" />
  <service name="api-gateway" type="backend" />
</manifest>

bash
#!/bin/bash
set -Eeuo pipefail

LOG_DIR="$HOME/logs"
mkdir -p "$LOG_DIR"
LOG_FILE="$LOG_DIR/restart_$(date -u +%Y%m%dT%H%M%SZ).log"
exec > >(tee -a "$LOG_FILE") 2>&1

log() { echo "[$(date '+%H:%M:%S')] $*"; }
trap 'log "[ERROR] Failed at: $BASH_COMMAND"' ERR

MANIFEST="services.xml"

if [ ! -f "$MANIFEST" ]; then
  log "[ERROR] Manifest file not found: $MANIFEST"
  exit 1
fi

log "Reading manifest: $MANIFEST"

# Capture the names first, so a failing xmllint or grep stops the script here
# instead of feeding an empty line into the loop
services=$(xmllint --xpath "//service/@name" "$MANIFEST" | grep -oP 'name="\K[^"]+')

# A herestring (not a pipe) keeps the loop in the current shell,
# so set -e and the ERR trap apply to the commands inside it
while IFS= read -r svc; do
  log "Restarting $svc"
  sudo systemctl restart "$svc"
  log "$svc restarted successfully"
done <<< "$services"

log "All services processed"

OUTPUT
[03:55:12] Reading manifest: services.xml
[03:55:12] Restarting nginx
[03:55:13] nginx restarted successfully
[03:55:13] Restarting api-gateway
[03:55:13] api-gateway restarted successfully
[03:55:13] All services processed

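If this runs from cron and one of those services has a typo in the manifest, systemctl fails, the trap logs the failing command, and the script stops there. Because $BASH_COMMAND holds the command as written, with "$svc" still unexpanded, the wrong service name itself comes from the "Restarting ..." line just above the error and from systemctl's own message. Both are in the timestamped log file in ~/logs, sitting there waiting for you the next morning. No guessing, no recreating the failure by hand.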
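
One detail worth knowing is that $BASH_COMMAND contains the command text before variables are expanded. If you want the trap line itself to name the service, you can reference the loop variable inside the trap. The sketch below does that in a throwaway script; the misspelled name api-gatewya and the [ ... ] test standing in for systemctl are both made up so the example runs without root or real services.

```bash
#!/bin/bash
demo_script=$(mktemp)
cat >"$demo_script" <<'EOF'
#!/bin/bash
set -Eeuo pipefail
log() { echo "[$(date '+%H:%M:%S')] $*"; }
svc=""
# The trap body is expanded when it fires, so ${svc} is the current loop value.
trap 'log "[ERROR] Failed at: $BASH_COMMAND (service: ${svc:-none})"' ERR
for svc in nginx api-gatewya; do
  log "Restarting $svc"
  # Stand-in for "sudo systemctl restart": fails on the misspelled name.
  [ "$svc" != "api-gatewya" ]
done
log "All services processed"
EOF

trap_status=0
trap_output=$(bash "$demo_script") || trap_status=$?

echo "$trap_output"
rm -f "$demo_script"
```

The error line shows the command with a literal $svc in it, followed by the expanded service name in parentheses.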

Note:

If you are setting this up to run on a schedule rather than by hand, it is worth comparing cron against systemd timers first, since timers give you built-in logging through journalctl that pairs nicely with the manual logging shown here.

For debugging a script like this while you are still writing it, run it with bash -x script.sh once to see every command bash actually executes, variable expansions included. It is slower and noisier than the clean log output above, but it catches the kind of typo that a trap alone will not explain clearly.

END

What Changes Once You Have Real Logs and Real XML Parsing

Once a script logs to a timestamped file with severity levels baked in, you stop dreading the "it ran fine for me" conversation. You can hand a log file to a teammate and they can see exactly what happened without re-running anything. The trap pattern means failures point at the actual broken command instead of a vague exit code, which matters most at the worst possible time, during an incident at 3 AM.

The XML side matters for a different reason. A lot of infrastructure still ships config and inventory data in XML, and treating it as plain text with grep works right up until it does not, usually during a deploy you cannot easily roll back. Parsing it properly with xmllint or xmlstarlet means a reformatted file or an extra attribute does not quietly break your automation.

Once these two pieces are solid, the natural next step is wrapping the same logging pattern around log processing and structured output for bigger jobs, things like multi-server health checks or scheduled report generation, where the log file becomes the audit trail rather than an afterthought. The official GNU Bash reference manual is worth bookmarking once you start combining traps, process substitution, and file descriptors like this.

FAQ

Frequently Asked Questions

Why does my logging work fine when I run the script manually but the log file is empty when cron runs it?

Almost always a path problem. Cron runs with a much smaller environment and often a different working directory than your interactive shell, so relative paths like logs/run.log point somewhere unexpected. Use full paths like $HOME/logs/run.log or define an absolute path at the top of the script, and you will stop chasing this one.
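
One way to get an absolute path without hard-coding it is to derive it from the script's own location. This is a sketch; SCRIPT_DIR, LOG_DIR, and the logs/run.log layout are illustrative names, not a required structure.

```bash
#!/bin/bash
# Resolve the directory this script lives in, independent of the
# working directory cron (or anyone else) starts it from.
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"

# LOG_DIR and LOG_FILE are illustrative names; adjust the location to taste.
LOG_DIR="$SCRIPT_DIR/logs"
LOG_FILE="$LOG_DIR/run.log"

echo "Logs will be written to: $LOG_FILE"
```

A real script would follow this with mkdir -p "$LOG_DIR" before writing anything.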

Do I really need set -euo pipefail in every script, even small ones?

For anything you plan to keep around or run unattended, yes. Without it, a failed command in the middle of a pipeline can get silently swallowed and the script keeps going as if nothing happened. It costs you nothing to add and saves you from exactly the kind of silent failure this whole article is about avoiding.
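
The pipeline part is the easiest to see for yourself. In this small sketch, the same pipeline runs twice in separate bash processes, once with default settings and once with pipefail turned on.

```bash
#!/bin/bash
# Each variant runs in its own bash process so the options do not leak into this shell.
without_pipefail=0
bash -c 'false | true' || without_pipefail=$?

with_pipefail=0
bash -c 'set -o pipefail; false | true' || with_pipefail=$?

echo "exit status without pipefail: $without_pipefail"   # prints 0
echo "exit status with pipefail:    $with_pipefail"      # prints 1
```

Without pipefail only the last command in the pipeline counts, so the failure of the first one is invisible to set -e.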

What's the difference between exec > >(tee -a log) and just using >> log at the end of every line?

The exec and tee combo redirects the script's output streams once, so every later command automatically writes to both the screen and the file with zero extra typing. Appending >> log manually to every single line works too, but you have to remember it every time, and one missed line means a gap in your record.

My XML file has namespaces and xmllint returns nothing. What's going on?

XPath queries need to account for namespaces explicitly; they are not ignored just because you did not mention them. With xmlstarlet you can register a prefix using the -N option of sel. xmllint --xpath has no command-line option for registering one, so there the usual workaround is a local-name() match in your XPath, which sidesteps the namespace check entirely. This trips up almost everyone the first time they touch a namespaced XML file.
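
Here is a minimal sketch of the local-name() workaround. The file and its namespace URI are invented for the example, and the queries are skipped if xmllint is not installed.

```bash
#!/bin/bash
# Hypothetical namespaced file; the namespace URI is made up for the example.
ns_file=$(mktemp)
cat >"$ns_file" <<'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<infrastructure xmlns="http://example.com/infra">
  <server ip="192.168.1.10" name="web-prod-01" />
</infrastructure>
EOF

plain_ip=""
local_name_ip=""
if command -v xmllint >/dev/null 2>&1; then
  # //server looks for an element in no namespace, so it matches nothing here.
  plain_ip=$(xmllint --xpath 'string(//server/@ip)' "$ns_file")
  # local-name() compares only the element name and ignores the namespace.
  local_name_ip=$(xmllint --xpath "string(//*[local-name()='server']/@ip)" "$ns_file")
fi

echo "plain query:      '$plain_ip'"
echo "local-name query: '$local_name_ip'"
rm -f "$ns_file"
```

The trade-off is that local-name() matches an element called server in any namespace, which is fine for a single-vocabulary file but too loose for documents that mix several.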

Should log messages go to stdout or stderr?

Routine progress messages belong on stdout. Anything that signals a problem, warnings and errors, belongs on stderr. This separation is what lets you filter a script's output later with simple redirection instead of grepping through a wall of mixed text.

Is grep ever okay to use on an XML file, or should I avoid it completely?

Grep is fine for a quick sanity check, like confirming a string exists somewhere in the file. It is not fine for actually extracting values you plan to act on programmatically, since it has no concept of XML structure. Once the output of a command feeds into a loop or a decision in your script, switch to xmllint or xmlstarlet.

END

Summary

Now that you have a working pattern for both pieces, timestamped logging with severity levels and structure-aware XML parsing, you can stop guessing why a script failed and start reading the answer straight out of a log file. The next logical step is learning how to write reusable bash functions so the logging setup here becomes a shared block you drop into every script instead of retyping it.

LinuxTeck - A Complete Linux Learning Blog
Learn step-by-step how to automate Linux tasks with real-world scripts and practical examples.

About Sharon J

Sharon J is a Linux System Administrator with strong expertise in server and system management. She turns real-world experience into practical Linux guides on Linux Teck.
