Creating aspell dictionary packages for Ubuntu

There are many aspell dictionary packages available for Ubuntu, but not all of them. If you’re a somewhat esoteric project like Distributed Proofreaders, you may discover that you need things like the Latin aspell dictionary (aspell-la) which I can’t seem to find packaged anywhere.

Installing from source

It’s super easy and perfectly possible to install any of the aspell dictionaries directly. Just fetch the file, configure, make, and make install and you’re golden:

wget https://ftp.gnu.org/gnu/aspell/dict/la/aspell6-la-20020503-0.tar.bz2
tar xvfj aspell6-la-20020503-0.tar.bz2
cd aspell6-la-20020503-0
./configure
make
make install

The quick and dirty works but for systems maintained by multiple people it’s a recipe for disaster without a lot of documentation. How will someone remember that this needs to be done again for the next server upgrade or server migration? In these cases it’s usually best to create a system package and install the package.

Building & installing a package

Building a package for Ubuntu / Debian can be mind-boggling complicated when all you want to do is package up a few files to lay down on the filesystem. Luckily for aspell dictionaries we can easily borrow the template used by the aspell-en package.

Start by finding and downloading the aspell dictionary that you want to install from the list available and extracting it.

wget https://ftp.gnu.org/gnu/aspell/dict/la/aspell6-la-20020503-0.tar.bz2
tar xvfj aspell6-la-20020503-0.tar.bz2

Configure and build it to create the .rws file:

cd aspell6-la-20020503-0
./configure
make

Now head over to the aspell-en package on LaunchPad, to find and download the aspell-en_*.debian.tar.xz file from the Ubuntu version that most closely matches your own, then extract it into the the dictionary directory. This is the source file for the debian/ control directory used to build the aspell-en package, which we’ll use as a template for our own.

# from within aspell6-la-20020503-0/
wget https://launchpad.net/ubuntu/+archive/primary/+files/aspell-en_2017.08.24-0-0.1.debian.tar.xz
tar xvfJ aspell-en_2017.08.24-0-0.1.debian.tar.xz

This contains several files that we don’t need for our simple dictionary, so we can clean things up a bit. Keep in mind that we’re not creating a dictionary for distribution, just for ourselves, so this doesn’t have to be perfect.

cd debian
rm aspell-en.info-aspell changelog copyright extrawords.txt
cp ../COPYING copyright

You’ll need to update some of the files to reference your language, most of these are fairly straightforward:

  • control – Update references to aspell-en to your aspell dictionary; also update Maintainer and Description. You might need to change the debhelper version to whatever is installed on your system (Ubuntu 16.04 uses v9 not v10). If you change this, you should change it in compat too.
  • watch – Update the last line to point to where you got your aspell dictionary from — you probably just need to change the two instances of ‘en’ to your language’s code.

Three files require a little more finessing: installrules, and source/format.

The install file specifies which files should be copied into the package for installation. For reasons that I, frankly, just don’t understand, we need to specify that the .rws file needs to be installed. Your install file should look like this:

*.multi         usr/lib/aspell
*.alias         usr/lib/aspell
*.dat           usr/lib/aspell
*.rws           var/lib/aspell

The rules files is a makefile that does all of the heavy lifting for building the package. The version for aspell-en includes bits that we don’t care about, namely everything related to docs and extrawords, we can remove those and update the DICT_LANG which leaves us with:

#!/usr/bin/make -f

include /usr/share/cdbs/1/rules/debhelper.mk

DICT_LANG := la

DEB_DH_MD5SUMS_ARGS += -Xvar/lib/aspell

install/aspell-$(DICT_LANG)::
        for f in `LC_ALL=C ls *.cwl`; do \
            gzip -9 -n -c "$$f" > "$(DEB_DESTDIR)/usr/share/aspell/"$$f".gz"; \
            WL=`echo $$f | sed 's/\.cwl$$//'`; \
            touch "$(DEB_DESTDIR)/var/lib/aspell/$$WL.rws"; \
            dh_link "var/lib/aspell/$$WL.rws" "usr/lib/aspell/$$WL.rws"; \
            echo "$$WL" >> "$(DEB_DESTDIR)/usr/share/aspell/$(DICT_LANG).contents"; \
        done

        touch $(DEB_DESTDIR)/var/lib/aspell/$(DICT_LANG).compat

        installdeb-aspell

Note that the 8-space indents above should be tabs in your version — this is a makefile!

The final thing to do is change source/format to say we want to use the 1.0 version:

1.0

The last thing to do is to create the changelog file using dch. This file is used by the packager to determine the name and version of the package file. To keep things simple, I recommend sticking with the version from the source file itself, even if that differs from the normal Debian version format.

# from within aspell6-la-20020503-0/
dch --create -v 20020503-0 --package aspell-la

Now all that’s left is building the package:

# from within aspell6-la-20020503-0/
debuild -us -uc

If successful, this will put a aspell-la_20020503-0_all.deb file in the parent directory.

$ ls -1
aspell-la_20020503-0.dsc
aspell-la_20020503-0.tar.gz
aspell-la_20020503-0_all.deb
aspell-la_20020503-0_amd64.build
aspell-la_20020503-0_amd64.changes
aspell6-la-20020503-0
aspell6-la-20020503-0.tar.bz2

You can now install this via:

sudo apt install ./aspell-la_20020503-0_all.deb

Note, the ./ is required, otherwise it will look in the package catalog instead of on disk for the package.

You can test that your new dictionary works via:

$ echo hello | aspell list --lang=la

If that returns with “hello” as misspelled word, it worked. If you have problems, you can remove the package (sudo apt remove aspell-la), futz with some of the files, and try rebuilding it again. Things to watch out for are ensuring you’ve configured and make’d the package and that your changes to the install and rules files are correct.

DP code release with mysqli goodness

Today we set free the second DP code release this year: R201707. This comes just six months after the last major code release. Both were focused on getting us moved to modern coding practices and middleware.

Today’s release moved the code off the deprecated mysql PHP extension and over to the mysqli PHP extension for connecting to the MySQL database. This will enable the site to run on PHP 7.x in addition to PHP 5.3 and later. This change was essential in enabling the code to run on modern operating systems, such as Ubuntu 16.041.

This release also included the ability to run against phpBB 3.2 allowing pgdp.net and others to upgrade to the latest-and-greatest (and supported) version of phpBB.

Perhaps most importantly to some of our international users, this release includes a full French translation of the DP user interface.

Next up for the DP code is modernizing our HTML and CSS to bring it up-to-date as well as standardizing the look-and-feel across the site. Work is well under way by several volunteers on this front.

Many thanks to all of the volunteers who developed and tested the code in this release!


1 Technically you can run PHP 5.6 on Ubuntu 16.04 as well, but 7.x is clearly the future.

CheckType parameters for processing XUnit test results

A Jenkins pipeline can publish XUnit test results as a step in a Jenkinsfile. Being unable to find any online documentation for the XUnitBuilder CheckType parameters, I dug into the code myself to find the answers.

Here’s a full XUnitBuilder stanza like that generated from the Jenkins Pipeline Snippet Generator (with the lines wrapped):

step([$class: 'XUnitBuilder',
     testTimeMargin: '3000',
     thresholdMode: 1,
     thresholds: [
       [$class: 'FailedThreshold',
         failureNewThreshold: '',
         failureThreshold: '',
         unstableNewThreshold: '',
         unstableThreshold: ''],
       [$class: 'SkippedThreshold',
         failureNewThreshold: '',
         failureThreshold: '',
         unstableNewThreshold: '',
         unstableThreshold: '']
     ],
     tools: [
       [$class: 'CheckType',
         deleteOutputFiles: false,
         failIfNotNew: false,
         pattern: '**/unittests.xml',
         skipNoTestFiles: false,
         stopProcessingIfError: true]
     ]
])

Here are the CheckType parameters and what they mean:

  • deleteOutputFiles – If true, the output files are deleted after being processed. If false they are left in-place. Default: false.
  • failIfNotNew – If true and files match the pattern but were not updated in the last build, the check fails. This helps ensure that all tests were run. Default: false.
  • pattern – File pattern that identifies XUnit-formatted output.
  • skipNoTestFiles – If true and no test files matching pattern are found, the check is skipped. If false and no tests are found the check fails. Default: false.
  • stopProcessingIfError – If true, any error (such as an empty result file) will stop any further processing. If false, errors will be reported but processing will continue. Default: true.

Note that you can get by with a much smaller step stanza by just including values that differ from the defaults, eg:

step([$class: 'XUnitBuilder',
     tools: [
       [$class: 'CheckType',
         pattern: '**/unittests.xml',
         skipNoTestFiles: true]
     ]
])

 

DP code release with modern PHP goodness

Today I’m proud to announce a new release of the software that runs pgdp.net: R201701. The last release was a year ago and I’m trying to hold us to a yearly release cadence (compared to the 9 years in the last one).

This version contains a slew of small bug fixes and enhancements. The most notable two changes that I want to highlight are support for PHP versions > 5.3 and the new Format Preview feature.

This is the first DP release to not depend on PHP’s Magic Quotes allowing the code to run on PHP versions > 5.3 up to, but not including, PHP 7.x. This means that the DP code can run on modern operating systems such as Ubuntu 14.041 and RHEL/CentOS 7. This is a behind-the-scenes change that end users should never notice.

The most exciting user-visible change in this release is the new Format Preview functionality that assists proofreaders in formatting rounds. The new tool renders formatting via a simple toggle allowing the user to see what the formatted page would look like and alerting if it detects markup problems.

What’s next for the DP code base? We have a smattering of smaller changes coming in over the next few months. The biggest change on the horizon is moving from the deprecated mysql extension to mysqli, which will allow the code to run on PHP 7.x, and moving to phpBB 3.2.

Many thanks to all of the DP volunteers who made this release possible, including developers, squirrels, and the multitude of people who assisted in testing!


1 Ubuntu 16.04 uses PHP 7.0, but can be configured to use PHP 5.6.

Death to magic quotes

Magic quotes is a misguided feature of PHP that modifies user input to PHP pages so that the input can be used directly in SQL statements. This violates the programing principle of only escaping data when it is necessary and results in all kinds of weird edge cases.

This feature was deemed so misguided that it was deprecated in PHP 5.3 and removed entirely from PHP 5.4. The DP code base has relied on magic quotes to function from the beginning of the project in 2000.

I’m very happy to report that after much development and validation effort, we’ve removed the dependency on magic quotes from the DP code base! The work was done over the course of a year, primarily by myself with help from jmdyck, and validated by a team of squirrels (shout-out to wfarrell and srjfoo) and other volunteers. It was rolled out in production on November 5th and has been almost 100% bug-free – quite an accomplishment given how much of the code was impacted. A huge thank you to the team who helped make this possible!

The biggest win is our ability to run the DP code on much more recent versions of PHP all the way up to, and including 5.6.1

RIP magic quotes.


1 It won’t work on PHP 7.0 or later because the code still relies on the deprecated mysql extension, although I fixed that on a branch last night!

Installing yaz for PHP on Ubuntu

Numerous sites on the internet have answered the basic question of “how do I install yaz for PHP on Ubuntu”. Which basically boils down to some flavor of:

PHP 5.x

sudo apt-get install yaz
sudo apt-get install pecl      # Ubuntu pre-16.04
sudo apt-get install php-pear  # Ubuntu 16.04 and later
sudo pecl install yaz

Then add the following line to /etc/php5/apache2/php.ini:

extension=yaz.so

PHP 7.0

sudo apt-get install yaz
sudo apt-get install php7.0-dev php7.0-pear
# might just be php-dev and php-pear on your OS (eg: Ubuntu 16.04)
sudo pecl install yaz

Then add the following line to /etc/php/7.0/apache2/php.ini:

extension=yaz.so

But wait, that fails

Sadly, the pecl install will fail with the error:

checking for yaz-config... NONE
configure: error: YAZ not found (missing NONE)
ERROR: `/tmp/pear/temp/yaz/configure --with-yaz' failed

All the search results for this error solve it by downloading the yaz source code and compiling and installing it outside the package manager, which is non-ideal.

The missing piece is that yaz-config is included with the libyaz4-dev package:

sudo apt-get install libyaz4-dev

Interestingly, this yaz install blog post does explicitly calls out the need for the -dev packages, but doesn’t include the error when you don’t have it. Hopefully this blog post will tie the two bits together for future people perplexed by this.

Update 2018-06-03: Updated to include PHP 7.0 instructions for Ubuntu 16.04 and later.

Enabling DP development with a developer VM

Getting started doing development on the DP code can be quite challenging. You can get a copy of the source code quite readily, but creating a system to test any changes gets complicated due to the code dependencies — primarily its tight integration with phpBB.

For a long time now, developers could request an account on our TEST server which has all the prerequisites installed, including a shared database with loaded data. There are a few downside with using the TEST server, however. The primary one being that everyone is using the shared database, significantly limiting the changes that could be made without impacting others. Another downside is that you need internet connectivity to do development work.

Having a way to do development locally on your desktop would be ideal. Installations on modern desktops are almost impossible, however, given our current dependency on magic quotes, a “feature” which has us locked on PHP 5.3, a very archaic version that no modern Linux desktop includes.

Environments like this are a perfect use case for virtual machines. While validating the installation instructions on the recent release I set out to create a DP development VM. This ensured that our instructions could be used to set up a fully-working installation of DP as well as produce a VM that others could use.

The DP development VM is a VMware VM running Ubuntu 12.04 LTS with a fully-working installation of DP. It comes pre-loaded with a variety of DP user accounts (proofer, project manager, admin) and even a sample project ready for proofing. The VM is running the R201601 release of DP source directly from the master git repo, so it’s easy to update to newer ‘production’ milestones when they come out. With the included instructions a developer can start doing DP development within minutes of downloading the VM.

I used VMware because it was convenient as I already had Fusion on my Mac and that VMware Player is freely available for Windows and Linux. A better approach would have been VirtualBox1 as it’s freely available for all platforms. Thankfully it should be fairly straightforward to create a VirtualBox VM from the VMware .vmdk (I leave this as an exercise to another developer).

After I had the VM set up and working I discovered vagrant while doing some hacking on OpenLibrary. If I had to create the VM again I would probably go the vagrant route. Although I expect it would take me a lot longer to set up it would significantly improve the development experience.

It’s too early to know if the availability of the development VM will increase the number of developers contributing to DP, but having yet another tool in the development tool-box can’t hurt.

1 Although I feel dirty using VirtualBox because it’s owned by Oracle. Granted, I feel dirty using MySQL for the same reason…

A new release of the DP site code, 9 years in the making

Today we released a new version of the Distributed Proofreaders code that runs pgdp.net! The announcement includes a list of what’s changed in the 9 years since the last release as well as a list of contributors, some statistics, and next steps. I’ve been working on getting a new release cut since mid-September so I’m pretty excited about it!

The prior release was in September 2006 and since that time there have been continuous, albeit irregular, updates to pgdp.net, but no package available for folks to download for new installations or to update their existing ones. Instead, enterprising individuals had to pull code from the ‘production’ tag in CVS (yes, seriously).

In the process of getting the code ready for release I noticed that there had been changes to the database on pgdp.net that hadn’t been reflected in the initial DB schema or the upgrade scripts in the code. So even if someone had downloaded the code from CVS they would have struggled to get it working.

As part of cutting the release I walked through the documentation that we provide, including the installation, upgrade, and configuration steps, and realized how much implied knowledge was in there. Much of the release process was me updating the documentation after learning what you were suppose to do.1 I ended up creating a full DP installation on a virtual machine to ensure the installation steps produced a working system. I’m not saying they’re now perfect, but they are certainly better than before.

Cutting a release is important for multiple reasons, including the ability for others to use code that is known to work. But the most important to me as a developer is the ability to reset dependency versions going forward. The current code, including that released today, continues to work on severely antiquated versions of PHP (4.x up through 5.3) and MySQL (4.x up to 5.1). This was a pseudo design decision in order to allow sites running on shared hosting with no control over their middleware to continue to function. Given how the hosting landscape has changed drastically over the past 9 years, and how really old those versions are, we decided it’s time to change that.

Going forward we’re resetting the requirements to be PHP 5.3 (but not later, due to our frustrating dependency on magic quotes) and MySQL 5.1 and later. This will allow us to use modern programming features like classes and exceptions that we couldn’t before.

Now that we have a release behind us, I’m excited to get more developers involved and start making some much-needed sweeping changes. Things like removing our dependency on magic quotes and creating a RESTful API to allow programmatic access to DP data. I’m hoping being on git and the availability of a development VM (more on that in a future blog post) will accelerate development.

If you’re looking for somewhere to volunteer as a developer for a literary2 great cause, come join us!

1 A serious hat-tip to all of my tech writer friends who do this on a daily basis!

2 See what I did there?

Secret sauce to including data_files in python bdist_rpm

2017/02/10 update: I’ve discovered a better solution (using an updated version of setuptools) and have updated the post below.

To add additional data files to python installation files — such as service files in /etc/init.d for RH-based distros — all the docs tell you to simply add a data_files line to your setup.py file, like:

    data_files=[('/etc/init.d', ['init.d/service_name'])],

If you do this, however, might get this lovely error when attempting to create the RPM:

$ python setup.py bdist_rpm
< snip >
running install_data
error: can't copy 'init.d/service_name': doesn't exist or not a regular file
error: Bad exit status from /var/tmp/rpm-tmp.DRsp99 (%install)


RPM build errors:
    Bad exit status from /var/tmp/rpm-tmp.DRsp99 (%install)
error: command 'rpmbuild' failed with exit status 1

This will perplex you for ages considering the file clearly exists in the path relative to setup.py, per the instructions:

$ ls -l setup.py
-rw-rw-r--. 1 cpeel cpeel 1233 Mar  9 14:37 setup.py
$ ls -l init.d/service_name
-rw-rw-r--. 1 cpeel cpeel 1242 Mar  9 14:40 init.d/service_name

The problem is that there are (at least) two different flavors of setup() functions and they are not completely alike. Both distutils and setuptools provide a setup() function and both packages have changed over time. See this StackOverflow example discussing the python setup tools landscape as of January 2017.

I don’t know what version of distutils I was using when I wrote this blog post, but it was probably some version with python 2.6. The version of distutils bundled with python 2.7.12, and presumably later, does not generate the RPM build error.

I also get the above RPM build error when using setup() from older versions of setuptools (at least 20.1.1). Updating to the most recent version of setuptools (34.1.1 as of February 2017) resolves the issue.

It appears that the general internet consensus, and official Python Packaging User Guide, is to use setuptools which can be updated independently of the python version you are using, so just be sure you’re using the latest version.

If you’re stuck on an older version of setuptools or distutils, the missing piece that took me hours to discover, is that even with data_files in setup.py, you need a MANIFEST.in file that tells them to include the files in the package:

$ cat MANIFEST.in 
recursive-include init.d *

Decreasing bitmap resolution when exporting PDFs in Inkscape

When exporting a document with bitmap images in it, Inkscape won’t downsample them to a lower resolution, despite the “Resolution for rasterization (dpi)” setting in the PDF export dialog. That export dialog setting only applies to bitmaps that have had filters applied against them. See bug 246677.

4 years ago I figured out how to work around this and today I had to relearn it. So as to not repeat this in another 4 years, I’m documenting this here for posterity.

The workaround is to apply an identity filter to each bitmap before saving it. An identity filter is one that doesn’t do anything, but because it is a filter it forces Inkscape to do the downsampling upon export. Because an identity filter doesn’t actually do anything, there isn’t one available in the Filters menu, but we can create one.

To create a reusable identity filter:

  1. Open a new Inkscape file
  2. From the Filters menu, select Filter Editor…
  3. In the Filter Editor pane, click the New button. This will add the filter “filter1” to the Filter list
  4. Double click “filter1” and rename it “Identity”
  5. To the right of the Add Effect: button there is a drop-down. Change it to Color Matrix, and hit the Add Effect: button
  6. Save the document as “Identity filter.svg”
  7. Now put the file where Inkscape can find it:
    • Linux/OS X: ~/.config/inkscape/filters
    • Windows XP: C:\Documents and Settings\[USERNAME]\Application Data\Inkscape\filters
    • Windows Vista and later: C:\Users\[USERNAME]\AppData\Roaming\Inkscape\filters
  8. The ‘filters’ directory may not exist, in which case just create it.
  9. Close and restart Inkscape
  10. This should add the Personal submenu under the Filters menu which should include our Identity filter.

To use the identity filter:

  1. In your Inkscape file, select the bitmaps you want to downsample. Be sure not to select any vector images or text since you don’t want to rasterize that.
  2. From the Filters menu, select Personal > Identity
  3. Now when you save the document as a PDF, adjust the “Resolution for rasterization (dpi)” setting. 90 is a decent number for documents that are being viewed on a computer screen. 150 is probably the smallest you want to go for anything that is being printed.