Simplistic Python Command-Line Applications

2018-08-12 8 min read

Intro

Python is becoming the world’s most popular coding language, and there are so many uses for it. One of the most popular ones, probably, would be scripting.

When writing a command-line application in Python, parsing the arguments passed to it is usually problem #1.

If you’re dealing with a program that has its own dependencies, or there is an environment (hopefully a virtual one rather than a system-wide one) that you don’t mind installing a few packages into, you should probably consider using the excellent click library.

If that doesn’t hold, yet you’re going to share your application with others, please consider “investing” in proper command-line processing with argparse. I find its API a little tedious, but it’s the best you can do while relying solely on the standard library.
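For comparison, here is a minimal sketch of what the argparse route might look like for one required and one optional positional argument. The names parse, arg_1 and optional are placeholders chosen for illustration, not part of any real tool.

```python
import argparse


def parse(argv):
    # Hypothetical parser: one required and one optional positional argument.
    parser = argparse.ArgumentParser(description='Illustrative example.')
    parser.add_argument('arg_1', help='a required positional argument')
    parser.add_argument('optional', nargs='?', default='foo',
                        help='an optional positional argument')
    return parser.parse_args(argv)


args = parse(['a'])
print(args.arg_1, args.optional)  # a foo
```

In exchange for the extra lines, argparse generates help output and error messages on its own.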

If none of the above holds, we’re usually down to simplistic, minimalistic ad-hoc solutions, and those solutions tend to be as bizarre as their authors are creative.

I am guilty of the latter and have done it many times. So many times that I came up with a “pattern” that has some convenience “features” that I find handy. So let’s get straight to the code. Here it is:

import sys


def main(_stdin, _stdout, _stderr, self, arg_1, optional=None, *rest):
    raise NotImplementedError


if __name__ == '__main__':
    sys.exit(main(sys.stdin, sys.stdout, sys.stderr, *sys.argv))

And that’s it.

If you’d like a detailed explanation of the code above, keep reading. Otherwise, you can jump to the end to see a complete usage example.

Piecemeal

First of all, executing anything upon module import in Python is a bad idea that tends to lead to frustrating errors. Because of that, it’s generally recommended to put the code in a function that is executed only when the file is run as a script:

def main():
    raise NotImplementedError


if __name__ == '__main__':
    main()

Also, when writing Python, there are no excuses not to follow PEP 8. That’s why there are two blank lines between the main() function and the if statement, as well as between the function and the import sys statement at the top of the file.

import sys


def main():
    raise NotImplementedError


if __name__ == '__main__':
    main()

Yes, it is also PEP 8 that says imports should be grouped, ordered and put at the top of the file.

Why do we import sys? Because that’s how we access our command-line arguments as well as standard streams such as stdin, stdout and stderr in Python.

Given the sys module, we can access sys.argv, which is a list that holds all the arguments. However, instead of accessing it directly whenever we need a value, we’re going to do it once and in one place, where we call the main function:

import sys


def main(self, arg_1, optional='foo', *rest):
    raise NotImplementedError


if __name__ == '__main__':
    main(*sys.argv)

Not only do we make our code better by dealing with arguments in just one place, but we also leverage Python’s ability to unpack argument lists to map arguments to variables. And because we do that, we can now use default argument values and arbitrary argument lists. Let’s just not forget that the first item in sys.argv is the path to the script being executed (the script we’re writing right now), exactly as it was invoked (for instance, it can be something like ~/script.py). We call it self in this example, but if it’s not needed we can use _ instead of self in the main function definition, which is a conventional placeholder in Python for values that are passed but not used.

Now, what does it look like? Yes, right, it looks exactly like dependency injection. We’re not injecting much [yet], just the arguments themselves, but should we commit to never accessing sys.argv directly again, we have only one way to get the arguments our program was invoked with. That makes it really easy to trace the flow of parameters or to inject particular values to induce specific behaviors for testing purposes.
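To see the mapping at work without a shell, main can be called with hand-made lists standing in for sys.argv. This is only a sketch: the body of main here returns what it received so that the result is visible, and the argument values are made up.

```python
def main(self, arg_1, optional='foo', *rest):
    # Return what was received so the argument mapping is visible.
    return (self, arg_1, optional, rest)


# Hypothetical argument lists standing in for sys.argv.
print(main(*['./script.py', 'a']))
# ('./script.py', 'a', 'foo', ())
print(main(*['./script.py', 'a', 'b', 'c', 'd']))
# ('./script.py', 'a', 'b', ('c', 'd'))

# A missing required argument surfaces as a TypeError.
try:
    main(*['./script.py'])
except TypeError as error:
    print('TypeError:', error)
```

Note the last case: when a required argument is missing, the call itself fails, so giving every parameter a default is one way to keep control over the error message.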

But let’s take that even further! When writing scripts, dealing with data streams (stdin, stdout, stderr) may be even more common than dealing with command-line arguments. So let’s commit to the same principle as we did for arguments and inject them right away when we call the main function:

import sys


def main(_stdin, _stdout, _stderr, self, arg_1, optional='foo', *rest):
    raise NotImplementedError


if __name__ == '__main__':
    main(sys.stdin, sys.stdout, sys.stderr, *sys.argv)

In this example, we have stream parameter names prefixed with an underscore (_) to make them visually distinct from argument parameters, but that’s more of a personal style.

The idea here is that instead of accessing sys.stdin and the other streams directly whenever we need them, we use _stdin and its siblings instead. Should we split our code into a few functions, we would need to pass the stream parameters along, which may seem tedious, but on the other hand it makes our code more explicit and testable. Given that code is read [by humans] way more often than it is written, this relatively cheap “investment” can hardly be overvalued when someone is plowing through our script trying to figure out why it doesn’t work as expected.
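As a rough illustration of the testability claim, in-memory io.StringIO objects can stand in for the real streams. The shout helper below is hypothetical and exists only to have something that reads and writes.

```python
import io


def shout(_stdin, _stdout):
    # Hypothetical helper: upper-case everything read from the injected stream.
    for line in _stdin:
        _stdout.write(line.upper())


def main(_stdin, _stdout, _stderr, self, *rest):
    shout(_stdin, _stdout)


# In-memory streams stand in for sys.stdin, sys.stdout and sys.stderr.
fake_stdin = io.StringIO('foo\nbar\n')
fake_stdout = io.StringIO()
fake_stderr = io.StringIO()

main(fake_stdin, fake_stdout, fake_stderr, './script.py')
print(repr(fake_stdout.getvalue()))  # 'FOO\nBAR\n'
```

Nothing here touches the real terminal, so the same call can live in a unit test as is.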

Last but not least, exit codes are often under-appreciated. Not only can they be used to distinguish between successful and failed executions, but they can also highlight some common problems or help during debugging. All of that holds only when sensible exit codes are returned, though. And as usual, to promote that we need to make it easy to do. Given that there is not much use for return values in high-level functions such as main, we can conveniently use their return values as exit codes by passing them to sys.exit. The neat trick here is that in Python every function call has a return value; when none is set explicitly, it is None. If we look at sys.exit’s docstring:

>>> print(sys.exit.__doc__)
exit([status])

Exit the interpreter by raising SystemExit(status).
If the status is omitted or None, it defaults to zero (i.e., success).
If the status is an integer, it will be used as the system exit status.
If it is another kind of object, it will be printed and the system
exit status will be one (i.e., failure).

we see that None is treated as 0 which means success. In other words, we need to return something only when there is a failure.
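A small sketch can make this tangible. Since sys.exit works by raising SystemExit, the exception can be caught and its code attribute inspected. The functions succeed, fail and exit_code_of are made up for this illustration.

```python
import sys


def succeed():
    # No explicit return: the call evaluates to None.
    pass


def fail():
    # Any non-zero integer signals a failure; 2 is an arbitrary choice.
    return 2


def exit_code_of(function):
    # sys.exit raises SystemExit; its code attribute holds what was passed in.
    try:
        sys.exit(function())
    except SystemExit as error:
        return error.code


print(exit_code_of(succeed))  # None, reported by the interpreter as status 0
print(exit_code_of(fail))     # 2
```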

It must be noted that everything we talked about above is not foolproof and works only when the conditions above are met. Should somebody access sys.argv directly or use print without specifying a stream via the file parameter, the code becomes much harder to test and to reason about.
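The print pitfall is easy to reproduce. In the sketch below, leaky and explicit are hypothetical functions, and contextlib.redirect_stdout is used only to observe what ends up on the global sys.stdout.

```python
import contextlib
import io


def leaky(_stdout):
    # Forgets file=_stdout, so the text goes to the global sys.stdout.
    print('hello')


def explicit(_stdout):
    print('hello', file=_stdout)


injected_a = io.StringIO()
with contextlib.redirect_stdout(io.StringIO()) as global_a:
    leaky(injected_a)

injected_b = io.StringIO()
with contextlib.redirect_stdout(io.StringIO()) as global_b:
    explicit(injected_b)

print(repr(injected_a.getvalue()), repr(global_a.getvalue()))  # '' 'hello\n'
print(repr(injected_b.getvalue()), repr(global_b.getvalue()))  # 'hello\n' ''
```

The leaky version silently bypasses the injected stream, which is exactly the kind of slip that makes a test pass or fail for the wrong reason.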

Putting all of this together, we can proceed with a practical usage example.

Example

Let’s assume we need to write a small script that accepts two arguments: <search> and <replace> (optional, defaults to an empty string). Given those arguments, it reads standard input, searches for occurrences of <search>, replaces them with <replace> and writes the result to standard output. If too few or too many arguments are given, it should print a usage example and exit with status code 1.

In reality, nobody would ever need such a script since there are standard tools such as sed (or tr, for single characters) that do all of that, only 10 times better. However, it doesn’t matter, as we just need an idea for a script that involves all the standard streams along with some command-line argument processing.

Here is what it can look like:

#!/usr/bin/env python3.6
import sys


def usage(self, _stderr):
    print("Usage:", file=_stderr)
    print(f"\t{self} <search> [<replace>]", file=_stderr)
    print(file=_stderr)
    print("\tReads stdin and writes to stdout.", file=_stderr)
    print("\tReplaces <search> with <replace>.", file=_stderr)
    print("\t<replace> defaults to an empty string.", file=_stderr)
    print(file=_stderr)
    print("Example:", file=_stderr)
    print(f"\techo foo bar baz | {self} bar", file=_stderr)
    print(f"\techo foo bar baz | {self} bar foo", file=_stderr)
    print(file=_stderr)

    return 1


def search_and_replace(_stdin, _stdout, search, replace):
    for line in _stdin:
        _stdout.write(line.replace(search, replace))


def main(_stdin, _stdout, _stderr, self, search=None, replace='', *rest):
    return (
        search_and_replace(_stdin, _stdout, search, replace)
            if search and not rest
                else usage(self, _stderr)
    )


if __name__ == '__main__':
    sys.exit(main(sys.stdin, sys.stdout, sys.stderr, *sys.argv))

Python 3.6 is assumed here (and denoted in the shebang) so that we can use f-strings, the literal string interpolation introduced by PEP 498.

Another detail that may require some explanation is how a one-line expression

X if condition else Y

became a multi-line one

(
    X
        if condition
            else Y
)

because of the author’s personal preference to avoid intermediary variables or multiple return statements with a conventional multi-line if-else statement. There is no magic to it: parentheses () can be used as a safer alternative to backslashes \ when defining multi-line expressions, e.g., long strings:

>>> (
...     "a"
...     "b"
...     "c"
... )
'abc'

Now we can give it a try and see how it fails:

$ ./tr.py
Usage:
    ./tr.py <search> [<replace>]

    Reads stdin and writes to stdout.
    Replaces <search> with <replace>.
    <replace> defaults to an empty string.

Example:
    echo foo bar baz | ./tr.py bar
    echo foo bar baz | ./tr.py bar foo

and how it works:

$ echo foo bar baz | ./tr.py bar
foo  baz

$ echo foo bar baz | ./tr.py bar foo
foo foo baz

Voilà!