CLI Practices

I’ve written hundreds of CLI tools over the years on a variety of operating systems, in a ton of languages.

I have a set of practices I invariably follow, and I think they’re good for anyone.

The Future

You can’t tell the future.

The one-offs that you think are unimportant often are the things that stick around, or that you need to go back to months and years later.

Sometimes you have to hand that code off to someone else to build on or maintain.

And sometimes, the thing you think is important and lasting goes nowhere.

Some of this advice applies to all code and not just CLIs, but I want to cover some of my practices that I’ve been called pedantic for… and thanked for.

Revision Control Everything

Don’t ever leave code on a laptop or server. Always keep a history. I’ve hit problems I knew I’d solved years ago and couldn’t find the solution because I failed to do this. I started following this rule about 10 years ago and I haven’t regretted it. At least once a month I go back to something I made, either for myself or as an example to help a friend.

For “real” things, I keep individual repositories. I may archive them after years of neglect. A lot are personal repos, code I never really want anyone to see.

For scripts, I make sure to keep them in a repository alongside the thing they support when I can. Maybe it’s a piece of software or a collection of data processing scripts that work on the same task or data, like capacity planning.

When they are truly one-off, I have two “scripts” repositories: one for personal, one for work. My work scripts are made to work with my employer’s resources and make no sense outside of that context. My personal scripts are just handy things I use.

Then I have things like shell functions. I manage those, my shell config, and all my dotfiles with chezmoi. I’ve made a lot of dotfile management systems over the years, and this is the only one I’ve stuck with for a significant amount of time.
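For illustration, the day-to-day with chezmoi looks roughly like this (the file name is just an example):

chezmoi add ~/.bashrc      # start managing an existing dotfile
chezmoi edit ~/.bashrc     # edit the managed copy
chezmoi apply              # write the changes back to $HOME
chezmoi cd                 # opens a shell in the source git repo to commit and push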

Always support options and arguments

Don’t tack them on later. Just start by writing support for options and arguments, or use a library for them. Assume that you’ll need them and follow the idiomatic method for whatever language you’re using.

For more “real” programming languages, I do tend to use libraries outside of the standard library. Here are a couple of examples:

Go

These days, I write most CLIs in Go. It’s a good fit. I use Cobra (and almost always Viper with it). I keep a config file for the cobra CLI in my dotfiles so I can scaffold a project out any time.

The standard library’s flag package is fine. I just find Cobra fast, friendly, and full-featured.
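For a sense of what that scaffolding looks like, here’s roughly the workflow with cobra-cli; the module path and subcommand name are hypothetical, and ~/.cobra.yaml is where defaults like author, license, and useViper live:

go install github.com/spf13/cobra-cli@latest   # one-time install of the generator
mkdir mytool && cd mytool
go mod init example.com/mytool                 # hypothetical module path
cobra-cli init                                 # generates main.go and cmd/root.go
cobra-cli add frob                             # generates cmd/frob.go for a "frob" subcommand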

Python

In Python I like Click. I used to use it more, but distribution is just such a pain, and it’s especially bad when sharing with someone unfamiliar with Python tooling. But for some tasks Python is the best tool for the job. Statistics work, or working in/with Python systems, are a couple of examples.

Shell

For Bourne-ish shells, I always start with the same boilerplate that uses the built-in getopts:

#!/bin/bash

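# get the program name with parameter expansion instead of basename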
prog=${0##*/}

SILENT=false
TARGET=""

usage() {
cat <<EOM

${prog} USAGE
  ${prog} is a tool that does X, Y, and Z

  ${prog} [-t OPT] [-s] ACTION
  ${prog} -h

OPTIONS
  -h        Print this handy message
  -s        An option without an argument
  -t OPT    Some option with an argument

ARGUMENTS
  ACTION    Some argument

EXAMPLES
  To do cool things run:
  ${prog} coolthing

  To do cool things silently run:
  ${prog} -s coolthing

  To do cool thing on all nunchucks run:
  ${prog} -t nunchucks coolthing
EOM
}

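# parse options; the trailing ":" in "hst:" means -t takes an argument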
while getopts hst: OPT; do
    case "$OPT" in
        h)
            usage
            exit 0
            ;;
        s)
            SILENT=true
            ;;
        t)
            TARGET="$OPTARG"
            ;;
        *)
            usage
            exit 22
            ;;
    esac
done

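# drop the options getopts consumed so only positional arguments remain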
shift $((OPTIND -1))

ACTION="$1"
if [ -z "$ACTION" ]; then
    echo "ERROR: Missing ACTION" >&2
    usage
    exit 22
fi

# maybe a 1 line command

It’s a lot of boilerplate, I know. You can probably use snippets or some tool in your editor. I’m weird enough to just have it as muscle memory. Key things:

  1. I always get the program name with POSIX parameter expansion instead of executing basename. I try to stay within the shell when I can and only execute commands when I need to or when it’s less obtuse.
  2. Always define a usage message and stick to the norms of a Unix system. There are a few flavors; this is just what I’ve settled on.
  3. Always have a help message.
  4. Use getopts to process the arguments and don’t write your own parser from scratch. You can also use getopt, but I try to stick to shell built-ins when I can. You can read the docs to see the differences.
  5. shift to strip out everything getopts dealt with.
  6. Do your validation. Make sure any constrained option argument or positional argument is checked as early as possible.
  7. Exit with the correct error code from errno.h. 22 is Invalid argument. It’s one of a few I have memorized.

The only time I don’t use getopts is when I know I’ll only have arguments that I can handle with a case statement.
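In that case, the whole parser can be something like this, assuming a usage function like the one above (do_start and do_stop are placeholder actions):

case "${1:-}" in
    start) do_start ;;          # hypothetical actions
    stop)  do_stop ;;
    ""|-h|--help)
        usage
        exit 0
        ;;
    *)
        echo "ERROR: Unknown action: $1" >&2
        usage
        exit 22
        ;;
esac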

I’ll draw out the reasoning for this more, since it applies to more than just shell.

Always give help

A lot of CLIs are very simple. And using what I’ve described will likely be more code than the actual logic. I have shell scripts that have all of that preceding a single command like ldapsearch or curl + jq.
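For example, the entire “logic” after the boilerplate might be a single pipeline like this (the URL and jq filter here are made up):

curl -s "https://inventory.example.com/api/v1/hosts" | jq -r '.hosts[].name'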

But in 6 months, I won’t remember how it works. Heck, usually 2 weeks. Reading the code can be painful because I have to look up documentation to remember exactly what that option or query does. You may end up passing it along to others who don’t understand any of the logic.

I’m very against wasting time on defensive programming “just in case.” This is not that. The majority of the time, the “in case” has actually happened for me.

Also, if you know your tool always has a -h or --help, then you know you can run it like that and not have to look to see if it will handle it or explode.

Sidenote: The hostname command on older Unix systems did not handle options. If you ran hostname -h, it would dutifully set the host name to -h. That caused all kinds of crazy things when cron, etc. kicked off and passed the host name to other commands. I always use uname -n to check the host name because of a tragic hostname -h accident.

Don’t be hostname. Be better than that.

Always use the correct error values

Read errno.h. Keep it handy. Only exit 1 for errors it doesn’t describe. Yes, there are a lot of bizarre error codes for things you’ve never encountered, but these are just like HTTP status codes. Giving the right value lets callers and scripts react to specific failures later.

A lot of times you can just return the error value of a command or function you ran that failed and not have to reinvent it.
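In shell, that can be as simple as capturing $? and passing it along (some_command is a stand-in):

some_command
rc=$?                        # capture the exit status immediately
if [ "$rc" -ne 0 ]; then
    echo "ERROR: some_command failed" >&2
    exit "$rc"               # pass the original code along instead of a generic exit 1
fi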

Validate first

Before you do anything, do your input validations. Print an error that is actionable by the user. If it’s a usage error, print out the usage. Help the user as quickly as possible and try not to let things explode by running with obviously bad or missing values.

Also, peeling off these things outside the domain of your function (in math terms) makes your logic cleaner. It makes the code readable to not have validation and “business” logic intermingled.

That brings up a tangential topic that bugs me. I’ll hear or read people say not to use break and continue, or to have only one return statement per function, because it makes code readable. I really couldn’t disagree more, for the reasons above. Forcing yourself into that rule results in jumping through hoops and a lot of inefficiency.

The idea of peeling away exceptional cases first was taught to me by an older programmer responsible for some traditional Unix tools that caught on, and many more that didn’t. More than his advice, the evidence in the codebase convinced me. It functions as an optimization, reducing unnecessary loop iterations, etc., but that’s only a side benefit. You get a section that clearly defines what the code is not going to handle (comments for those conditions are nice), and then a clear, distinct block that does the work.
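A sketch of that shape in shell (frobnicate, its config path, and do_the_work are all hypothetical):

frobnicate() {
    target="$1"

    # peel off what this function will not handle
    if [ -z "$target" ]; then
        echo "ERROR: Missing target" >&2
        return 22            # EINVAL
    fi
    if [ ! -r "/etc/frob/${target}.conf" ]; then
        echo "ERROR: No config for ${target}" >&2
        return 2             # ENOENT
    fi

    # one clear, distinct block that does the work
    do_the_work "$target"
}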

Whenever possible, act like a CLI

Whether it’s a shell script, an interpreted language, or compiled code, write it so that it could be used by a future shell script.

If you work on a file, take STDIN when no file is specified, or allow - to be specified as the file. Do the same with STDOUT. This gives your CLI the magic of participating in the Unix environment: being able to handle pipes, redirection, etc.
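A minimal shell pattern for the STDIN side (the tr stage is just a stand-in for real work):

# use the file argument if given; no argument or "-" means STDIN
infile="${1:--}"
if [ "$infile" != "-" ]; then
    exec < "$infile"         # redirect the script's STDIN to the file
fi
# from here on, read STDIN and write STDOUT so pipes and redirection work
tr '[:lower:]' '[:upper:]'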

As stated before, use correct error values, so that a script can react appropriately or pass them forward.

Don’t reinvent the wheel, cheat when you can

Don’t waste time writing things that already exist unless you’ve used them and they don’t work for your case. Look for opportunities to use existing tools for any of the functionality you need, most of the time.

The exception to this is if you’re using an exercise like this to learn something new. A new language, an API, etc.

But if you’re building something for work, do as little work as possible to get the job you need done. As your work is used, you’ll find the places that it needs to be changed and where the work really needs to be focused.

I wrote a script to work with hundreds of YAML files with an existing CLI. When it crossed into thousands of YAML files, it grew horribly inefficient. I borrowed the structs from the CLI, and rewrote it to be a single binary to avoid the overhead of creating hundreds of processes and opening files over and over. Note: I did not invent the system that used YAML files as a database. It’s handy and useful but does not scale gracefully.

When I wrote the original script, I didn’t know if it would be used regularly, monthly, etc., so I made a thing that worked. When it later turned out to be running daily in a pipeline and the dataset expanded, then it was time to make it more real. And because of the experience I knew exactly where the bottlenecks were, and in just a couple of days (including the myriad distractions at work) I was able to make a very real tool, usable by people who don’t write software.

Well, and I later turned that code into a more holistic piece of software that runs as a service with a REST interface and works around the inefficiency of files-as-database better. It’s worth it to do that and stay within the ecosystem of that very YAMLy toolchain.