Creating aspell dictionary packages for Ubuntu

There are many aspell dictionary packages available for Ubuntu, but not every dictionary is packaged. If you’re a somewhat esoteric project like Distributed Proofreaders, you may discover that you need something like the Latin aspell dictionary (aspell-la), which I can’t seem to find packaged anywhere.

Installing from source

It’s super easy to install any of the aspell dictionaries directly from source. Just fetch the file, configure, make, and make install (as root for a system-wide install) and you’re golden:

wget https://ftp.gnu.org/gnu/aspell/dict/la/aspell6-la-20020503-0.tar.bz2
tar xvfj aspell6-la-20020503-0.tar.bz2
cd aspell6-la-20020503-0
./configure
make
make install

The quick-and-dirty approach works, but for systems maintained by multiple people it’s a recipe for disaster unless it is thoroughly documented. How will someone remember that this needs to be done again for the next server upgrade or server migration? In these cases it’s usually best to create a system package and install the package.

Building & installing a package

Building a package for Ubuntu / Debian can be mind-bogglingly complicated when all you want to do is package up a few files to lay down on the filesystem. Luckily, for aspell dictionaries we can easily borrow the template used by the aspell-en package.

Start by finding and downloading the aspell dictionary that you want to install from the list available and extracting it.

wget https://ftp.gnu.org/gnu/aspell/dict/la/aspell6-la-20020503-0.tar.bz2
tar xvfj aspell6-la-20020503-0.tar.bz2

Configure and build it to create the .rws file:

cd aspell6-la-20020503-0
./configure
make

Now head over to the aspell-en package on Launchpad, then find and download the aspell-en_*.debian.tar.xz file from the Ubuntu version that most closely matches your own, and extract it into the dictionary directory. This is the source file for the debian/ control directory used to build the aspell-en package, which we’ll use as a template for our own.

# from within aspell6-la-20020503-0/
wget https://launchpad.net/ubuntu/+archive/primary/+files/aspell-en_2017.08.24-0-0.1.debian.tar.xz
tar xvfJ aspell-en_2017.08.24-0-0.1.debian.tar.xz

This contains several files that we don’t need for our simple dictionary, so we can clean things up a bit. Keep in mind that we’re not creating a dictionary for distribution, just for ourselves, so this doesn’t have to be perfect.

cd debian
rm aspell-en.info-aspell changelog copyright extrawords.txt
cp ../COPYING copyright

You’ll need to update some of the files to reference your language. Most of these are fairly straightforward:

  • control – Update references to aspell-en to your aspell dictionary; also update Maintainer and Description. You might need to change the debhelper version to whatever is installed on your system (Ubuntu 16.04 uses v9 not v10). If you change this, you should change it in compat too.
  • watch – Update the last line to point to where you got your aspell dictionary from — you probably just need to change the two instances of ‘en’ to your language’s code.
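The mechanical part of these edits can be scripted. The sketch below is one way to do it, assuming GNU sed and that every occurrence of aspell-en in the file should become your own package name. The helper name retarget_package is made up for illustration, and Maintainer and Description still need editing by hand.

```shell
# Hypothetical helper: rewrite every "aspell-<old>" to "aspell-<new>" in a file.
# Assumes GNU sed (for -i and \b) and simple language codes such as "en" or "la".
#   $1 = file to edit, $2 = old language code, $3 = new language code
retarget_package() {
    sed -i "s/aspell-$2\b/aspell-$3/g" "$1"
}

# Usage, from within the debian/ directory:
#   retarget_package control en la
```

Review the result with a diff before building, since a blanket substitution can touch lines you did not intend to change.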

Three files require a little more finessing: install, rules, and source/format.

The install file specifies which files should be copied into the package for installation. For reasons that I, frankly, just don’t understand, we need to specify that the .rws file needs to be installed. Your install file should look like this:

*.multi         usr/lib/aspell
*.alias         usr/lib/aspell
*.dat           usr/lib/aspell
*.rws           var/lib/aspell
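Each line of the install file pairs a file pattern with a destination directory. To preview which built files each pattern would pick up, something like the following sketch may help. It only imitates the matching with plain shell globs and is not how the packaging tools themselves work; preview_install is a made-up name.

```shell
# Hypothetical helper: show which files in the current directory match each
# "pattern destination" line of an install-style file.
#   $1 = path to the install file
preview_install() {
    while read -r pattern dest || [ -n "$pattern" ]; do
        [ -z "$pattern" ] && continue
        # Unquoted on purpose so the shell expands the glob.
        for f in $pattern; do
            if [ -e "$f" ]; then
                printf '%s -> %s/\n' "$f" "$dest"
            else
                printf 'no match for %s\n' "$pattern" >&2
            fi
        done
    done < "$1"
}

# Usage, from within aspell6-la-20020503-0/ after running make:
#   preview_install debian/install
```

A "no match" message for *.rws is a hint that the configure and make steps have not been run yet.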

The rules file is a makefile that does all of the heavy lifting for building the package. The version for aspell-en includes bits that we don’t care about, namely everything related to docs and extrawords. We can remove those and update DICT_LANG, which leaves us with:

#!/usr/bin/make -f

include /usr/share/cdbs/1/rules/debhelper.mk

DICT_LANG := la

DEB_DH_MD5SUMS_ARGS += -Xvar/lib/aspell

install/aspell-$(DICT_LANG)::
        for f in `LC_ALL=C ls *.cwl`; do \
            gzip -9 -n -c "$$f" > "$(DEB_DESTDIR)/usr/share/aspell/"$$f".gz"; \
            WL=`echo $$f | sed 's/\.cwl$$//'`; \
            touch "$(DEB_DESTDIR)/var/lib/aspell/$$WL.rws"; \
            dh_link "var/lib/aspell/$$WL.rws" "usr/lib/aspell/$$WL.rws"; \
            echo "$$WL" >> "$(DEB_DESTDIR)/usr/share/aspell/$(DICT_LANG).contents"; \
        done

        touch $(DEB_DESTDIR)/var/lib/aspell/$(DICT_LANG).compat

        installdeb-aspell

Note that the 8-space indents above should be tabs in your version — this is a makefile!
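If you pasted the rules file from a web page, a quick check for space-indented lines can save a confusing build failure. This is a minimal sketch with made-up helper names, assuming recipe lines were indented with exactly eight spaces as shown above.

```shell
# Hypothetical helper: print (with line numbers) every line that starts with a space.
# Exit status is non-zero when no such line exists.
#   $1 = path to the rules file
find_space_indents() {
    grep -n '^ ' "$1"
}

# Hypothetical helper: turn a leading run of eight spaces into one tab.
# Reads stdin and writes stdout, so the original file is left untouched.
spaces_to_tabs() {
    tab=$(printf '\t')
    sed "s/^        /${tab}/"
}

# Usage:
#   find_space_indents debian/rules
#   spaces_to_tabs < debian/rules > debian/rules.new
```

Only the first eight spaces are converted; deeper continuation lines keep their remaining spaces after the tab, which make accepts.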

Then change source/format to say we want to use the 1.0 version:

1.0

The last thing to do is to create the changelog file using dch. This file is used by the packager to determine the name and version of the package file. To keep things simple, I recommend sticking with the version from the source file itself, even if that differs from the normal Debian version format.

# from within aspell6-la-20020503-0/
dch --create -v 20020503-0 --package aspell-la
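To see how the changelog drives the package file name, the sketch below reads the first line of a changelog and prints the .deb name you should expect. It assumes the usual first-line layout of "name (version) distribution; urgency=..." and an architecture-independent package; deb_name_from_changelog is a made-up name.

```shell
# Hypothetical helper: derive the expected .deb file name from a changelog.
# Assumes the first line looks like: aspell-la (20020503-0) UNRELEASED; urgency=medium
#   $1 = path to the changelog
deb_name_from_changelog() {
    head -n 1 "$1" | sed -n 's/^\([^ ]*\) (\([^)]*\)).*/\1_\2_all.deb/p'
}

# Usage, from within aspell6-la-20020503-0/:
#   deb_name_from_changelog debian/changelog
```

For the changelog created above this should print aspell-la_20020503-0_all.deb.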

Now all that’s left is building the package:

# from within aspell6-la-20020503-0/
debuild -us -uc

If successful, this will put an aspell-la_20020503-0_all.deb file in the parent directory.

$ ls -1
aspell-la_20020503-0.dsc
aspell-la_20020503-0.tar.gz
aspell-la_20020503-0_all.deb
aspell-la_20020503-0_amd64.build
aspell-la_20020503-0_amd64.changes
aspell6-la-20020503-0
aspell6-la-20020503-0.tar.bz2
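Before installing, it can be worth confirming that the compiled dictionary actually made it into the package. The sketch below reads a dpkg -c listing on stdin and looks for a regular .rws file under var/lib/aspell. The helper name has_rws is made up, and the check assumes the listing format that dpkg -c normally prints.

```shell
# Hypothetical check: succeed if a `dpkg -c` listing (on stdin) contains a
# real .rws file under var/lib/aspell. Symlink entries (" -> ") are ignored.
has_rws() {
    grep -v ' -> ' | grep -q '/var/lib/aspell/[^/]*\.rws$'
}

# Usage, from the directory holding the .deb:
#   dpkg -c aspell-la_20020503-0_all.deb | has_rws && echo "rws packaged"
```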

You can now install this via:

sudo apt install ./aspell-la_20020503-0_all.deb

Note that the ./ is required; otherwise apt will look in the package catalog instead of on disk for the package.

You can test that your new dictionary works via:

$ echo hello | aspell list --lang=la

If that returns “hello” as a misspelled word, it worked. If you have problems, you can remove the package (sudo apt remove aspell-la), futz with some of the files, and try rebuilding it. Things to watch out for: make sure you’ve configured and make’d the package, and that your changes to the install and rules files are correct.

DP code release with mysqli goodness

Today we set free the second DP code release this year: R201707. This comes just six months after the last major code release. Both were focused on getting us moved to modern coding practices and middleware.

Today’s release moved the code off the deprecated mysql PHP extension and over to the mysqli PHP extension for connecting to the MySQL database. This will enable the site to run on PHP 7.x in addition to PHP 5.3 and later. This change was essential in enabling the code to run on modern operating systems, such as Ubuntu 16.04 [1].

This release also included the ability to run against phpBB 3.2, allowing pgdp.net and others to upgrade to the latest-and-greatest (and supported) version of phpBB.

Perhaps most importantly to some of our international users, this release includes a full French translation of the DP user interface.

Next up for the DP code is modernizing our HTML and CSS to bring it up-to-date as well as standardizing the look-and-feel across the site. Work is well under way by several volunteers on this front.

Many thanks to all of the volunteers who developed and tested the code in this release!


[1] Technically you can run PHP 5.6 on Ubuntu 16.04 as well, but 7.x is clearly the future.

CheckType parameters for processing XUnit test results

A Jenkins pipeline can publish XUnit test results as a step in a Jenkinsfile. Being unable to find any online documentation for the XUnitBuilder CheckType parameters, I dug into the code myself to find the answers.

Here’s a full XUnitBuilder stanza like the one generated by the Jenkins Pipeline Snippet Generator (with the lines wrapped):

step([$class: 'XUnitBuilder',
     testTimeMargin: '3000',
     thresholdMode: 1,
     thresholds: [
       [$class: 'FailedThreshold',
         failureNewThreshold: '',
         failureThreshold: '',
         unstableNewThreshold: '',
         unstableThreshold: ''],
       [$class: 'SkippedThreshold',
         failureNewThreshold: '',
         failureThreshold: '',
         unstableNewThreshold: '',
         unstableThreshold: '']
     ],
     tools: [
       [$class: 'CheckType',
         deleteOutputFiles: false,
         failIfNotNew: false,
         pattern: '**/unittests.xml',
         skipNoTestFiles: false,
         stopProcessingIfError: true]
     ]
])

Here are the CheckType parameters and what they mean:

  • deleteOutputFiles – If true, the output files are deleted after being processed. If false they are left in-place. Default: false.
  • failIfNotNew – If true and files match the pattern but were not updated in the last build, the check fails. This helps ensure that all tests were run. Default: false.
  • pattern – File pattern that identifies XUnit-formatted output.
  • skipNoTestFiles – If true and no test files matching pattern are found, the check is skipped. If false and no tests are found the check fails. Default: false.
  • stopProcessingIfError – If true, any error (such as an empty result file) will stop any further processing. If false, errors will be reported but processing will continue. Default: true.

Note that you can get by with a much smaller step stanza by including only the values that differ from the defaults, e.g.:

step([$class: 'XUnitBuilder',
     tools: [
       [$class: 'CheckType',
         pattern: '**/unittests.xml',
         skipNoTestFiles: true]
     ]
])
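The pattern and failIfNotNew parameters described above can be roughly imitated outside Jenkins when debugging why a check fails. The sketch below lists result files that were not updated since a marker file was touched. It is only an approximation under my own assumptions (that the pattern matches at any directory depth, and that "new" means modified after the build started), not the plugin’s actual logic; stale_results and the marker file are made up for illustration.

```shell
# Hypothetical helper: list unittests.xml files that are NOT newer than a marker file.
#   $1 = directory to search, $2 = marker file touched when the build started
stale_results() {
    find "$1" -name 'unittests.xml' ! -newer "$2"
}

# Usage:
#   touch build-start.marker
#   ...run the tests...
#   stale_results . build-start.marker
```

Any file this prints would be a candidate for tripping failIfNotNew.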

 

DP code release with modern PHP goodness

Today I’m proud to announce a new release of the software that runs pgdp.net: R201701. The last release was a year ago, and I’m trying to hold us to a yearly release cadence (compared to the 9 years the previous one took).

This version contains a slew of small bug fixes and enhancements. The two most notable changes that I want to highlight are support for PHP versions > 5.3 and the new Format Preview feature.

This is the first DP release that does not depend on PHP’s Magic Quotes, allowing the code to run on PHP versions > 5.3 up to, but not including, PHP 7.x. This means that the DP code can run on modern operating systems such as Ubuntu 14.04 [1] and RHEL/CentOS 7. This is a behind-the-scenes change that end users should never notice.

The most exciting user-visible change in this release is the new Format Preview functionality that assists proofreaders in formatting rounds. The new tool renders formatting via a simple toggle allowing the user to see what the formatted page would look like and alerting if it detects markup problems.

What’s next for the DP code base? We have a smattering of smaller changes coming in over the next few months. The biggest change on the horizon is moving from the deprecated mysql extension to mysqli, which will allow the code to run on PHP 7.x, and moving to phpBB 3.2.

Many thanks to all of the DP volunteers who made this release possible, including developers, squirrels, and the multitude of people who assisted in testing!


[1] Ubuntu 16.04 uses PHP 7.0, but can be configured to use PHP 5.6.

Death to magic quotes

Magic quotes is a misguided feature of PHP that modifies user input to PHP pages so that the input can be used directly in SQL statements. This violates the programming principle of only escaping data when it is necessary and results in all kinds of weird edge cases.
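To make the edge cases concrete, here is a shell imitation of the kind of escaping magic quotes applied: a backslash in front of each single quote, double quote, and backslash (PHP also escaped NUL bytes, which this sketch ignores). It is an illustration only, not PHP and not the DP code, and magic_quote is a made-up name.

```shell
# Hypothetical imitation of magic-quotes escaping, for illustration only.
# Prefixes backslashes, single quotes, and double quotes with a backslash.
#   $1 = the "user input" string
magic_quote() {
    printf '%s\n' "$1" | sed -e 's/\\/\\\\/g' -e "s/'/\\\\'/g" -e 's/"/\\"/g'
}

# Escaped once, the value is ready for a naive SQL string:
#   magic_quote "O'Reilly"                     prints O\'Reilly
# Escaped again by code that does not know it was already escaped:
#   magic_quote "$(magic_quote "O'Reilly")"    prints O\\\'Reilly
```

The second call shows the classic problem: any code path that escapes on its own, or that echoes the value back to the user, has to know whether the input was already modified.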

This feature was deemed so misguided that it was deprecated in PHP 5.3 and removed entirely from PHP 5.4. The DP code base has relied on magic quotes to function from the beginning of the project in 2000.

I’m very happy to report that after much development and validation effort, we’ve removed the dependency on magic quotes from the DP code base! The work was done over the course of a year, primarily by me with help from jmdyck, and validated by a team of squirrels (shout-out to wfarrell and srjfoo) and other volunteers. It was rolled out in production on November 5th and has been almost 100% bug-free – quite an accomplishment given how much of the code was impacted. A huge thank you to the team who helped make this possible!

The biggest win is our ability to run the DP code on much more recent versions of PHP, all the way up to and including 5.6 [1].

RIP magic quotes.


[1] It won’t work on PHP 7.0 or later because the code still relies on the deprecated mysql extension, although I fixed that on a branch last night!

Installing yaz for PHP on Ubuntu

Numerous sites on the internet have answered the basic question of “how do I install yaz for PHP on Ubuntu”, which basically boils down to some flavor of:

PHP 5.x

sudo apt-get install yaz
sudo apt-get install pecl      # Ubuntu pre-16.04
sudo apt-get install php-pear  # Ubuntu 16.04 and later
sudo pecl install yaz

Then add the following line to /etc/php5/apache2/php.ini:

extension=yaz.so

PHP 7.0

sudo apt-get install yaz
sudo apt-get install php7.0-dev php7.0-pear
sudo pecl install yaz

Then add the following line to /etc/php/7.0/apache2/php.ini:

extension=yaz.so
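If you script this step, it helps to make it idempotent so that rerunning the setup does not add the line twice. The following is a minimal sketch with a made-up helper name, enable_extension; it needs to run with enough privileges to write to the ini file.

```shell
# Hypothetical helper: append "extension=<name>" to an ini file unless that
# exact line is already present.
#   $1 = path to php.ini, $2 = extension file name (e.g. yaz.so)
enable_extension() {
    line="extension=$2"
    grep -qxF "$line" "$1" 2>/dev/null || printf '%s\n' "$line" >> "$1"
}

# Usage (as root):
#   enable_extension /etc/php/7.0/apache2/php.ini yaz.so
```

Apache still needs to be restarted afterwards for the extension to load.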

But wait, that fails

Sadly, the pecl install will fail with the error:

checking for yaz-config... NONE
configure: error: YAZ not found (missing NONE)
ERROR: `/tmp/pear/temp/yaz/configure --with-yaz' failed

All the search results for this error solve it by downloading the yaz source code and compiling and installing it outside the package manager, which is non-ideal.

The missing piece is that yaz-config is included with the libyaz4-dev package:

sudo apt-get install libyaz4-dev
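A small pre-flight check can catch this before pecl gets as far as its configure step. The sketch below only tests whether a command is on the PATH and suggests a package if it is not; need_tool is a made-up name, and the package hint is whatever you pass in.

```shell
# Hypothetical pre-flight check: report whether a build tool is on the PATH.
#   $1 = command to look for, $2 = package to suggest if it is missing
need_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "$1 found"
    else
        echo "$1 missing; try installing $2" >&2
        return 1
    fi
}

# Usage:
#   need_tool yaz-config libyaz4-dev && sudo pecl install yaz
```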

Interestingly, this yaz install blog post does explicitly call out the need for the -dev packages, but doesn’t include the error you get when you don’t have them. Hopefully this blog post will tie the two bits together for future people perplexed by this.

Update 2018-06-03: Updated to include PHP 7.0 instructions for Ubuntu 16.04 and later.

Enabling DP development with a developer VM

Getting started doing development on the DP code can be quite challenging. You can get a copy of the source code quite readily, but creating a system to test any changes gets complicated due to the code dependencies — primarily its tight integration with phpBB.

For a long time now, developers could request an account on our TEST server, which has all the prerequisites installed, including a shared database with loaded data. There are a few downsides to using the TEST server, however. The primary one is that everyone uses the shared database, which significantly limits the changes that can be made without impacting others. Another downside is that you need internet connectivity to do development work.

Having a way to do development locally on your desktop would be ideal. Installations on modern desktops are almost impossible, however, given our current dependency on magic quotes, a “feature” which has us locked on PHP 5.3, a very archaic version that no modern Linux desktop includes.

Environments like this are a perfect use case for virtual machines. While validating the installation instructions on the recent release I set out to create a DP development VM. This ensured that our instructions could be used to set up a fully-working installation of DP as well as produce a VM that others could use.

The DP development VM is a VMware VM running Ubuntu 12.04 LTS with a fully-working installation of DP. It comes pre-loaded with a variety of DP user accounts (proofer, project manager, admin) and even a sample project ready for proofing. The VM is running the R201601 release of DP source directly from the master git repo, so it’s easy to update to newer ‘production’ milestones when they come out. With the included instructions a developer can start doing DP development within minutes of downloading the VM.

I used VMware because it was convenient: I already had Fusion on my Mac, and VMware Player is freely available for Windows and Linux. A better approach would have been VirtualBox [1], as it’s freely available for all platforms. Thankfully, it should be fairly straightforward to create a VirtualBox VM from the VMware .vmdk (I leave this as an exercise for another developer).

After I had the VM set up and working, I discovered Vagrant while doing some hacking on OpenLibrary. If I had to create the VM again I would probably go the Vagrant route. Although I expect it would take me a lot longer to set up, it would significantly improve the development experience.

It’s too early to know if the availability of the development VM will increase the number of developers contributing to DP, but having yet another tool in the development tool-box can’t hurt.

[1] Although I feel dirty using VirtualBox because it’s owned by Oracle. Granted, I feel dirty using MySQL for the same reason…