python, bytecode, and read-only containers

Upon first access, python compiles .py code into bytecode and stores it in .pyc files. Subsequent uses of those python sources are read from the .pyc files without needing to re-compile. This make startup time, but not runtime, faster.

But what about read-only filesystems? If python is running on a read-only filesystem no .pyc files are written and every use of a .py file involves compiling the code afresh. Everything works, the startup time of a script is just a little slower.

Read-only filesystems within a container are a security best practice in production environments. Often in kubernetes deployment manifests you might see something like:

securityContext:
readOnlyRootFilesystem: true

And if you’re running python only once within the container, the container has a little bit of overhead at startup as it compiles into bytecode and then everything is in memory and off it goes. But if you’re running python multiple times, or want to make even the single run start faster, we can pre-compile the code when we build the container with the built-in python compileall library.

# Compile code on sys.path (includes system packages and CWD):
RUN python -c "import compileall; compileall.compile_path(maxlevels=10)"

# Compile code in a directory:
RUN python -m compileall /path/to/code

This moves the compilation overhead to the container build where it happens once, and out of the startup.

Thanks to Itamar Turner-Trauring at https://pythonspeed.com/ for their excellent Production-ready Docker packaging for Python slide deck with this gem.

poetry auth via .netrc

poetry, the python package manager, provides several ways of authenticating against a repository. What isn’t explicitly documented, because it’s an implicit dependency, is that poetry can also use the ~/.netrc file for authentication when fetching packages.

poetry uses requests under the covers, and requests falls back to the ~/.netrc file. This is the same fallback method for pip for the same reason.

There are several (probably bad) reasons why someone would want to do this vs one of the explicit methods given by poetry. One that comes to mind is needing to install python packages from a private repository from inside a docker container by simply volume mapping the host’s ~/.netrc file to have poetry use the right creds.

This approach probably won’t work when publishing packages — caveat emptor.

While I’m not suggesting that this is a best practice, it’s good to know that it’s an available method in some extreme edge cases.

Secret sauce to including data_files in python bdist_rpm

2017/02/10 update: I’ve discovered a better solution (using an updated version of setuptools) and have updated the post below.

To add additional data files to python installation files — such as service files in /etc/init.d for RH-based distros — all the docs tell you to simply add a data_files line to your setup.py file, like:

    data_files=[('/etc/init.d', ['init.d/service_name'])],

If you do this, however, might get this lovely error when attempting to create the RPM:

$ python setup.py bdist_rpm
< snip >
running install_data
error: can't copy 'init.d/service_name': doesn't exist or not a regular file
error: Bad exit status from /var/tmp/rpm-tmp.DRsp99 (%install)


RPM build errors:
    Bad exit status from /var/tmp/rpm-tmp.DRsp99 (%install)
error: command 'rpmbuild' failed with exit status 1

This will perplex you for ages considering the file clearly exists in the path relative to setup.py, per the instructions:

$ ls -l setup.py
-rw-rw-r--. 1 cpeel cpeel 1233 Mar  9 14:37 setup.py
$ ls -l init.d/service_name
-rw-rw-r--. 1 cpeel cpeel 1242 Mar  9 14:40 init.d/service_name

The problem is that there are (at least) two different flavors of setup() functions and they are not completely alike. Both distutils and setuptools provide a setup() function and both packages have changed over time. See this StackOverflow example discussing the python setup tools landscape as of January 2017.

I don’t know what version of distutils I was using when I wrote this blog post, but it was probably some version with python 2.6. The version of distutils bundled with python 2.7.12, and presumably later, does not generate the RPM build error.

I also get the above RPM build error when using setup() from older versions of setuptools (at least 20.1.1). Updating to the most recent version of setuptools (34.1.1 as of February 2017) resolves the issue.

It appears that the general internet consensus, and official Python Packaging User Guide, is to use setuptools which can be updated independently of the python version you are using, so just be sure you’re using the latest version.

If you’re stuck on an older version of setuptools or distutils, the missing piece that took me hours to discover, is that even with data_files in setup.py, you need a MANIFEST.in file that tells them to include the files in the package:

$ cat MANIFEST.in 
recursive-include init.d *