Debugging MediaWiki extensions

Recently, while updating the Auth_phpBB MediaWiki extension, I got frustrated debugging issues that occurred on extension load. Here are some things that helped me track down the problem.

First, it’s useful to enable one or more debug flags in LocalSettings.php:

// Debugging
if (true) {
    // Show exceptions and DB backtraces
    $wgShowExceptionDetails = true;
    $wgShowDBErrorBacktrace = true;

    // Write out MediaWiki debug log messages
    $wgDebugLogFile = "/tmp/mw-debug.log";

    // And/or route specific log groups to individual files
    $wgDebugLogGroups = [
        "Auth_phpBB" => "/tmp/mw-debug-Auth_phpBB.log",
        "PluggableAuth" => "/tmp/mw-debug-PluggableAuth.log",
    ];

    // Disable ALL caching with the following two lines
    $wgEnableParserCache = false;
    $wgCachePages = false;
}

Enabling the full MediaWiki debug log file was the most helpful for debugging the extension, because in some failure scenarios the server doesn’t return any content to the client. This was particularly frustrating since I expected such failures to output something in the PHP error log, but they didn’t. The PHP error log is still a very useful place to look for other failure scenarios, though.

Remember to disable any debugging flags after you’re done with development and delete any debugging output files that were created!
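
When the full debug log gets noisy, a small filter helps pull out the lines that mention the extension. Here is a minimal Python sketch, assuming the log lives at the /tmp/mw-debug.log path configured above and that the extension’s name appears somewhere in the relevant lines. The helper name filter_log is made up for illustration.

```python
from pathlib import Path


def filter_log(lines, needle):
    """Return the lines that mention `needle`, ignoring case."""
    needle = needle.lower()
    return [line.rstrip("\n") for line in lines if needle in line.lower()]


if __name__ == "__main__":
    # Path taken from $wgDebugLogFile above; adjust to your own setting.
    log_path = Path("/tmp/mw-debug.log")
    if log_path.exists():
        with log_path.open(errors="replace") as handle:
            for line in filter_log(handle, "Auth_phpBB"):
                print(line)
```

Matching on a plain substring keeps the sketch independent of the exact log line format, which I have not tried to parse here.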

python, bytecode, and read-only containers

Upon first access, python compiles .py code into bytecode and stores it in .pyc files. Subsequent uses of those python sources are read from the .pyc files without needing to re-compile. This makes startup time, but not runtime, faster.

But what about read-only filesystems? If python is running on a read-only filesystem no .pyc files are written and every use of a .py file involves compiling the code afresh. Everything works, the startup time of a script is just a little slower.

Read-only filesystems within a container are a security best practice in production environments. Often in kubernetes deployment manifests you might see something like:

securityContext:
  readOnlyRootFilesystem: true

If you’re running python only once within the container, there is a little overhead at startup while it compiles the code into bytecode; after that everything is in memory and off it goes. But if you’re running python multiple times, or want to make even the single run start faster, you can pre-compile the code when building the container with the built-in python compileall library.

# Compile code on sys.path (includes system packages and CWD):
RUN python -c "import compileall; compileall.compile_path(maxlevels=10)"

# Compile code in a directory:
RUN python -m compileall /path/to/code

This moves the compilation overhead to the container build where it happens once, and out of the startup.
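
To check that pre-compilation actually produced bytecode files, the same compileall module can be driven from a script. This is a sketch, not part of the original build: the precompile helper is a hypothetical name, and it runs against a throwaway temporary directory rather than real application code.

```python
import compileall
import importlib.util
import tempfile
from pathlib import Path


def precompile(src_dir):
    """Compile every .py file under src_dir and return the expected .pyc paths."""
    # quiet=1 prints errors only; a truthy return value means no failures.
    ok = compileall.compile_dir(str(src_dir), quiet=1)
    if not ok:
        raise RuntimeError(f"compilation failed under {src_dir}")
    # cache_from_source() gives the __pycache__ location python will look in.
    return [
        Path(importlib.util.cache_from_source(str(py)))
        for py in sorted(Path(src_dir).rglob("*.py"))
    ]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        (Path(tmp) / "hello.py").write_text("print('hi')\n")
        for pyc in precompile(tmp):
            print(pyc.name, pyc.exists())
```

Running something like this during the image build (or in a test) confirms the .pyc files exist before the filesystem becomes read-only.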

Thanks to Itamar Turner-Trauring at https://pythonspeed.com/ for their excellent Production-ready Docker packaging for Python slide deck with this gem.

MediaWiki VisualEditor: slashes and namespaces

Recently I upgraded the wiki at Distributed Proofreaders to the MediaWiki 1.35 LTS release. This comes with the fancy new VisualEditor which should be great for our users who already deal with phpBB markup in the forums and our own custom markup for project formatting.

Setting up the VisualEditor was a bit of a head-scratcher in a couple of ways and hopefully this helps others who encounter similar problems.

Error contacting the Parsoid/RESTBase server (HTTP 404)

This one frustrated me for quite some time. Everything pointed to setting

AllowEncodedSlashes NoDecode

in our Apache config, but that wasn’t working for me. For reasons I don’t understand, I needed to include this in both the :80 and :443 VirtualHost sections, not just the :443 which was serving all traffic.

Error contacting the Parsoid/RESTBase server (HTTP 500)

This was thankfully pretty obvious by looking in the php_errors log. As the VisualEditor Troubleshooting section calls out, our $wgTmpDirectory had the wrong write permissions.

Enabling for Namespaces

The documentation says that to change the namespaces that the VisualEditor will be used on, you use the English canonical names. To get this to work, we needed to use the namespace constants instead. Note that the MW code will include all content namespaces as enabled by default so you only need to include those if you want to disable them.

$wgVisualEditorAvailableNamespaces = [
    // includes Content namespaces by default (Main)
    NS_PROJECT => true,
    NS_PROJECT_TALK => true,
    NS_TALK => true,
    NS_USER_TALK => true,
];

Stepping back from Distributed Proofreaders

After almost 14.5 years it’s time for me to step back from volunteering with Distributed Proofreaders. What was once an enjoyable activity has become a stressor that I simply don’t need 11 months into a pandemic.

In many ways DP has been a lifeline to me at various times in my life, giving me something constructive and meaningful I can do. This was true as I was going through my divorce a decade ago, during my sabbatical, and at the beginning of the pandemic. But the bitching and criticism that comes from virtually any change we make to the site recently has become unbearable. Complaints about changes aren’t new — humans are classically change-averse and our community seems to be doubly-so — but during the pandemic they’ve seemed to have increased in both frequency and volume.

Receiving verbal or written recognition of my work is important to me. Indeed, it’s the best, and easiest, way to keep me happy. While I have often received that type of feedback from Linda, the General Manager, and Sharon, a fellow admin and developer, I don’t usually get that from the rest of the community. Instead, I most often get the opposite. That’s very demoralizing after hours and hours of time spent.

Development Contributions

I’ve been a developer at DP for over a decade and the lead developer for the past 5+ years. Looking back I have to say we’ve collectively come a long way. I sat down and made a list of the most notable and memorable software changes that I was involved in and while there were some new features, almost all of the big changes were ensuring that the software could run on modern middleware.

My most enduring legacy at DP is likely to be that the site continues to function at all and that makes me incredibly happy.

New Features & Capabilities

Site Modernization

Middleware Support

Development Improvements

What’s Next

I’m not sure what stepping back means exactly or what’s next for me, but it’s time for a change. I’ve committed to finishing some of the planned maintenance work (assisting with the phpBB forum upgrade and eventual OS upgrade) and updating documentation. Beyond that, I’m not sure, but decidedly less of the forums and less dev work which results in all the despised changes.

I hope to find some other open source software I can contribute to. I thought perhaps I would work with other DP-adjacent open source projects like getting the Auth_phpBB MediaWiki extension updated to support the latest MediaWiki LTS, except that only took me about 12 hours.

Creating aspell dictionary packages for Ubuntu

Ubuntu packages many aspell dictionaries, but not all of them. If you’re a somewhat esoteric project like Distributed Proofreaders, you may discover that you need things like the Latin aspell dictionary (aspell-la), which I can’t seem to find packaged anywhere.

Installing from source

It’s super easy and perfectly possible to install any of the aspell dictionaries directly. Just fetch the file, configure, make, and make install and you’re golden:

wget https://ftp.gnu.org/gnu/aspell/dict/la/aspell6-la-20020503-0.tar.bz2
tar xvfj aspell6-la-20020503-0.tar.bz2
cd aspell6-la-20020503-0
./configure
make
make install

The quick-and-dirty approach works, but for systems maintained by multiple people it’s a recipe for disaster without a lot of documentation. How will someone remember that this needs to be done again for the next server upgrade or server migration? In these cases it’s usually best to create a system package and install the package.

Building & installing a package

Building a package for Ubuntu / Debian can be mind-bogglingly complicated when all you want to do is package up a few files to lay down on the filesystem. Luckily for aspell dictionaries we can easily borrow the template used by the aspell-en package.

Start by finding and downloading the aspell dictionary that you want to install from the list available and extracting it.

wget https://ftp.gnu.org/gnu/aspell/dict/la/aspell6-la-20020503-0.tar.bz2
tar xvfj aspell6-la-20020503-0.tar.bz2

Configure and build it to create the .rws file:

cd aspell6-la-20020503-0
./configure
make

Now head over to the aspell-en package on LaunchPad to find and download the aspell-en_*.debian.tar.xz file from the Ubuntu version that most closely matches your own, then extract it into the dictionary directory. This is the source file for the debian/ control directory used to build the aspell-en package, which we’ll use as a template for our own.

# from within aspell6-la-20020503-0/
wget https://launchpad.net/ubuntu/+archive/primary/+files/aspell-en_2017.08.24-0-0.1.debian.tar.xz
tar xvfJ aspell-en_2017.08.24-0-0.1.debian.tar.xz

This contains several files that we don’t need for our simple dictionary, so we can clean things up a bit. Keep in mind that we’re not creating a dictionary for distribution, just for ourselves, so this doesn’t have to be perfect.

cd debian
rm aspell-en.info-aspell changelog copyright extrawords.txt
cp ../COPYING copyright

You’ll need to update some of the files to reference your language, most of these are fairly straightforward:

  • control – Update references to aspell-en to your aspell dictionary; also update Maintainer and Description. You might need to change the debhelper version to whatever is installed on your system (Ubuntu 16.04 uses v9 not v10). If you change this, you should change it in compat too.
  • watch – Update the last line to point to where you got your aspell dictionary from — you probably just need to change the two instances of ‘en’ to your language’s code.

Three files require a little more finessing: install, rules, and source/format.

The install file specifies which files should be copied into the package for installation. For reasons that I, frankly, just don’t understand, we need to specify that the .rws file needs to be installed. Your install file should look like this:

*.multi         usr/lib/aspell
*.alias         usr/lib/aspell
*.dat           usr/lib/aspell
*.rws           var/lib/aspell

The rules file is a makefile that does all of the heavy lifting for building the package. The version for aspell-en includes bits that we don’t care about, namely everything related to docs and extrawords. We can remove those and update DICT_LANG, which leaves us with:

#!/usr/bin/make -f

include /usr/share/cdbs/1/rules/debhelper.mk

DICT_LANG := la

DEB_DH_MD5SUMS_ARGS += -Xvar/lib/aspell

install/aspell-$(DICT_LANG)::
        for f in `LC_ALL=C ls *.cwl`; do \
            gzip -9 -n -c "$$f" > "$(DEB_DESTDIR)/usr/share/aspell/"$$f".gz"; \
            WL=`echo $$f | sed 's/\.cwl$$//'`; \
            touch "$(DEB_DESTDIR)/var/lib/aspell/$$WL.rws"; \
            dh_link "var/lib/aspell/$$WL.rws" "usr/lib/aspell/$$WL.rws"; \
            echo "$$WL" >> "$(DEB_DESTDIR)/usr/share/aspell/$(DICT_LANG).contents"; \
        done

        touch $(DEB_DESTDIR)/var/lib/aspell/$(DICT_LANG).compat

        installdeb-aspell

Note that the 8-space indents above should be tabs in your version — this is a makefile!
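
If you copied the rules file from this page and ended up with spaces, a tiny script can restore the tabs. This is a sketch assuming the 8-space indents shown above; indent_to_tabs is a hypothetical helper name, and it only rewrites debian/rules if that file exists in the current directory.

```python
import re
from pathlib import Path


def indent_to_tabs(text, width=8):
    """Replace the first `width` leading spaces of each line with one tab."""
    return re.sub(rf"^ {{{width}}}", "\t", text, flags=re.MULTILINE)


if __name__ == "__main__":
    # Run from within aspell6-la-20020503-0/
    rules = Path("debian/rules")
    if rules.exists():
        rules.write_text(indent_to_tabs(rules.read_text()))
```

Only the first 8 spaces of a line are converted, so the deeper continuation lines keep their extra indentation after the tab, which make accepts.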

Next, change source/format to say we want to use the 1.0 version:

1.0

The last thing to do is to create the changelog file using dch. This file is used by the packager to determine the name and version of the package file. To keep things simple, I recommend sticking with the version from the source file itself, even if that differs from the normal Debian version format.

# from within aspell6-la-20020503-0/
dch --create -v 20020503-0 --package aspell-la

Now all that’s left is building the package:

# from within aspell6-la-20020503-0/
debuild -us -uc

If successful, this will put an aspell-la_20020503-0_all.deb file in the parent directory.

$ ls -1
aspell-la_20020503-0.dsc
aspell-la_20020503-0.tar.gz
aspell-la_20020503-0_all.deb
aspell-la_20020503-0_amd64.build
aspell-la_20020503-0_amd64.changes
aspell6-la-20020503-0
aspell6-la-20020503-0.tar.bz2

You can now install this via:

sudo apt install ./aspell-la_20020503-0_all.deb

Note, the ./ is required, otherwise it will look in the package catalog instead of on disk for the package.

You can test that your new dictionary works via:

$ echo hello | aspell list --lang=la

If that returns “hello” as a misspelled word, it worked. If you have problems, you can remove the package (sudo apt remove aspell-la), futz with some of the files, and try rebuilding it again. Things to watch out for are ensuring you’ve configured and make’d the package and that your changes to the install and rules files are correct.
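
The same check can be scripted, which is handy after a server upgrade or migration. This is a minimal sketch that shells out to the aspell list command shown above; misspelled_words is a hypothetical helper name, and the run parameter exists only so the function can be exercised without aspell installed.

```python
import shutil
import subprocess


def misspelled_words(text, lang, run=subprocess.run):
    """Return the words that `aspell list` reports as misspelled."""
    result = run(
        ["aspell", "list", f"--lang={lang}"],
        input=text,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.split()


if __name__ == "__main__":
    if shutil.which("aspell"):
        try:
            print(misspelled_words("hello", "la"))
        except subprocess.CalledProcessError as err:
            # aspell exits non-zero, for example, when the dictionary is missing.
            print("aspell failed:", err.stderr.strip())
```

A non-empty result for an English word such as “hello” suggests the Latin dictionary is installed and being used.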

DP code release with mysqli goodness

Today we set free the second DP code release this year: R201707. This comes just six months after the last major code release. Both were focused on getting us moved to modern coding practices and middleware.

Today’s release moved the code off the deprecated mysql PHP extension and over to the mysqli PHP extension for connecting to the MySQL database. This will enable the site to run on PHP 7.x in addition to PHP 5.3 and later. This change was essential in enabling the code to run on modern operating systems, such as Ubuntu 16.04¹.

This release also included the ability to run against phpBB 3.2 allowing pgdp.net and others to upgrade to the latest-and-greatest (and supported) version of phpBB.

Perhaps most importantly to some of our international users, this release includes a full French translation of the DP user interface.

Next up for the DP code is modernizing our HTML and CSS to bring it up-to-date as well as standardizing the look-and-feel across the site. Work is well under way by several volunteers on this front.

Many thanks to all of the volunteers who developed and tested the code in this release!


¹ Technically you can run PHP 5.6 on Ubuntu 16.04 as well, but 7.x is clearly the future.

CheckType parameters for processing XUnit test results

A Jenkins pipeline can publish XUnit test results as a step in a Jenkinsfile. Being unable to find any online documentation for the XUnitBuilder CheckType parameters, I dug into the code myself to find the answers.

Here’s a full XUnitBuilder stanza like that generated from the Jenkins Pipeline Snippet Generator (with the lines wrapped):

step([$class: 'XUnitBuilder',
     testTimeMargin: '3000',
     thresholdMode: 1,
     thresholds: [
       [$class: 'FailedThreshold',
         failureNewThreshold: '',
         failureThreshold: '',
         unstableNewThreshold: '',
         unstableThreshold: ''],
       [$class: 'SkippedThreshold',
         failureNewThreshold: '',
         failureThreshold: '',
         unstableNewThreshold: '',
         unstableThreshold: '']
     ],
     tools: [
       [$class: 'CheckType',
         deleteOutputFiles: false,
         failIfNotNew: false,
         pattern: '**/unittests.xml',
         skipNoTestFiles: false,
         stopProcessingIfError: true]
     ]
])

Here are the CheckType parameters and what they mean:

  • deleteOutputFiles – If true, the output files are deleted after being processed. If false they are left in-place. Default: false.
  • failIfNotNew – If true and files match the pattern but were not updated in the last build, the check fails. This helps ensure that all tests were run. Default: false.
  • pattern – File pattern that identifies XUnit-formatted output.
  • skipNoTestFiles – If true and no test files matching pattern are found, the check is skipped. If false and no tests are found the check fails. Default: false.
  • stopProcessingIfError – If true, any error (such as an empty result file) will stop any further processing. If false, errors will be reported but processing will continue. Default: true.

Note that you can get by with a much smaller step stanza by including only the values that differ from the defaults, e.g.:

step([$class: 'XUnitBuilder',
     tools: [
       [$class: 'CheckType',
         pattern: '**/unittests.xml',
         skipNoTestFiles: true]
     ]
])

 

DP code release with modern PHP goodness

Today I’m proud to announce a new release of the software that runs pgdp.net: R201701. The last release was a year ago and I’m trying to hold us to a yearly release cadence (compared to the 9-year gap before the last one).

This version contains a slew of small bug fixes and enhancements. The most notable two changes that I want to highlight are support for PHP versions > 5.3 and the new Format Preview feature.

This is the first DP release to not depend on PHP’s Magic Quotes, allowing the code to run on PHP versions > 5.3 up to, but not including, PHP 7.x. This means that the DP code can run on modern operating systems such as Ubuntu 14.04¹ and RHEL/CentOS 7. This is a behind-the-scenes change that end users should never notice.

The most exciting user-visible change in this release is the new Format Preview functionality that assists proofreaders in formatting rounds. The new tool renders formatting via a simple toggle allowing the user to see what the formatted page would look like and alerting if it detects markup problems.

What’s next for the DP code base? We have a smattering of smaller changes coming in over the next few months. The biggest change on the horizon is moving from the deprecated mysql extension to mysqli, which will allow the code to run on PHP 7.x, and moving to phpBB 3.2.

Many thanks to all of the DP volunteers who made this release possible, including developers, squirrels, and the multitude of people who assisted in testing!


¹ Ubuntu 16.04 uses PHP 7.0, but can be configured to use PHP 5.6.

Death to magic quotes

Magic quotes is a misguided feature of PHP that modifies user input to PHP pages so that the input can be used directly in SQL statements. This violates the programming principle of only escaping data when it is necessary and results in all kinds of weird edge cases.

This feature was deemed so misguided that it was deprecated in PHP 5.3 and removed entirely from PHP 5.4. The DP code base has relied on magic quotes to function from the beginning of the project in 2000.
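
To make the problem concrete, here is a small illustration in Python rather than PHP. The magic_quote function is my own emulation of the escaping magic quotes applied (backslashes in front of quotes, backslashes and NUL bytes), and store_and_fetch shows the alternative of leaving the data untouched and letting the database driver handle quoting through a bound parameter. Neither is taken from the DP code base.

```python
import sqlite3


def magic_quote(value):
    """Emulate magic quotes: backslash-escape quotes, backslashes and NULs."""
    # Escape backslashes first so the ones added below are not doubled.
    for char in ("\\", "'", '"'):
        value = value.replace(char, "\\" + char)
    return value.replace("\0", "\\0")


def store_and_fetch(name):
    """Store a value with a bound parameter and read it back."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (name TEXT)")
    # The driver handles quoting; the data itself is never modified.
    conn.execute("INSERT INTO users (name) VALUES (?)", (name,))
    (stored,) = conn.execute("SELECT name FROM users").fetchone()
    conn.close()
    return stored


if __name__ == "__main__":
    raw = "O'Reilly"
    print(magic_quote(raw))      # the altered input every script received
    print(store_and_fetch(raw))  # what a bound parameter stores
```

With magic quotes on, any code that displayed, compared, or re-saved the input had to remember to strip the extra backslashes again, which is where the weird edge cases came from.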

I’m very happy to report that after much development and validation effort, we’ve removed the dependency on magic quotes from the DP code base! The work was done over the course of a year, primarily by myself with help from jmdyck, and validated by a team of squirrels (shout-out to wfarrell and srjfoo) and other volunteers. It was rolled out in production on November 5th and has been almost 100% bug-free – quite an accomplishment given how much of the code was impacted. A huge thank you to the team who helped make this possible!

The biggest win is our ability to run the DP code on much more recent versions of PHP, all the way up to, and including, 5.6.¹

RIP magic quotes.


¹ It won’t work on PHP 7.0 or later because the code still relies on the deprecated mysql extension, although I fixed that on a branch last night!

Installing yaz for PHP on Ubuntu

tl;dr

Here’s how to install yaz on Ubuntu 20.04 with PHP 7.4:

sudo apt install yaz libyaz-dev php-dev php-pear
sudo pecl install yaz

The libyaz-dev package is the important, and oft-overlooked part.

Then add the following line to /etc/php/7.4/apache2/php.ini:

extension=yaz.so

And restart apache:

sudo systemctl restart apache2

Original post from 4 years ago follows.

Numerous sites on the internet have answered the basic question of “how do I install yaz for PHP on Ubuntu”, which basically boils down to some flavor of:

PHP 5.x

sudo apt-get install yaz
sudo apt-get install pecl      # Ubuntu pre-16.04
sudo apt-get install php-pear  # Ubuntu 16.04 and later
sudo pecl install yaz

Then add the following line to /etc/php5/apache2/php.ini:

extension=yaz.so

PHP 7.0

sudo apt-get install yaz
sudo apt-get install php7.0-dev php7.0-pear
# might just be php-dev and php-pear on your OS (eg: Ubuntu 16.04)
sudo pecl install yaz

Then add the following line to /etc/php/7.0/apache2/php.ini:

extension=yaz.so

But wait, that fails

Sadly, the pecl install will fail with the error:

checking for yaz-config... NONE
configure: error: YAZ not found (missing NONE)
ERROR: `/tmp/pear/temp/yaz/configure --with-yaz' failed

All the search results for this error solve it by downloading the yaz source code and compiling and installing it outside the package manager, which is non-ideal.

The missing piece is that yaz-config is included with the libyaz4-dev package:

sudo apt-get install libyaz4-dev
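
Before running pecl install, it can save some head-scratching to confirm the build tools are actually on the PATH. This is a rough sketch: yaz-config is the tool named in the error above, while checking for phpize and pecl as well is my own addition, and missing_tools is a hypothetical helper name.

```python
import shutil


def missing_tools(names, which=shutil.which):
    """Return the tools from `names` that cannot be found on PATH."""
    return [name for name in names if which(name) is None]


if __name__ == "__main__":
    missing = missing_tools(["yaz-config", "phpize", "pecl"])
    if missing:
        print("Missing build tools:", ", ".join(missing))
    else:
        print("All build tools found")
```

If yaz-config shows up as missing, the -dev package above is the likely culprit.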

Interestingly, this yaz install blog post does explicitly call out the need for the -dev packages, but doesn’t include the error you get when you don’t have them. Hopefully this blog post will tie the two bits together for future people perplexed by this.

Updates:

  • 2018-06-03: include PHP 7.0 instructions for Ubuntu 16.04.
  • 2020-12-05: include PHP 7.4 instructions for Ubuntu 20.04.