DP code release with modern PHP goodness

Today I’m proud to announce a new release of the software that runs pgdp.net: R201701. The last release was a year ago, and I’m trying to hold us to a yearly release cadence (compared to the 9-year gap before the last one).

This version contains a slew of small bug fixes and enhancements. The two most notable changes are support for PHP versions > 5.3 and the new Format Preview feature.

This is the first DP release that does not depend on PHP’s Magic Quotes, allowing the code to run on PHP versions > 5.3 up to, but not including, PHP 7.x. This means that the DP code can run on modern operating systems such as Ubuntu 14.04¹ and RHEL/CentOS 7. This is a behind-the-scenes change that end users should never notice.

The most exciting user-visible change in this release is the new Format Preview functionality that assists proofreaders in formatting rounds. The new tool renders formatting via a simple toggle, allowing the user to see what the formatted page would look like and alerting them if it detects markup problems.

What’s next for the DP code base? We have a smattering of smaller changes coming in over the next few months. The biggest changes on the horizon are moving to phpBB 3.2 and moving from the deprecated mysql extension to mysqli, which will allow the code to run on PHP 7.x.

Many thanks to all of the DP volunteers who made this release possible, including developers, squirrels, and the multitude of people who assisted in testing!

1 Ubuntu 16.04 uses PHP 7.0, but can be configured to use PHP 5.6.

Death to magic quotes

Magic quotes is a misguided feature of PHP that modifies user input to PHP pages so that the input can be used directly in SQL statements. This violates the programming principle of only escaping data when it is necessary and results in all kinds of weird edge cases.
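To make one of those edge cases concrete, here is a minimal sketch in Python rather than PHP. The `addslashes` helper is only a rough stand-in for what magic quotes did to incoming data, and the `users` table is hypothetical; none of this is DP code. It contrasts escaping everything on arrival with keeping the raw value and letting the database driver quote it at the SQL boundary.

```python
import sqlite3


def addslashes(value: str) -> str:
    """Rough stand-in for the escaping magic quotes applied to all input."""
    # Backslashes are escaped first so the ones added below are not doubled.
    for ch in ("\\", "'", '"'):
        value = value.replace(ch, "\\" + ch)
    return value


raw_input = "O'Brien"

# Magic-quotes style: the value is altered as soon as it arrives, so every
# non-SQL use (display, comparison, length checks) sees a stray backslash.
magic = addslashes(raw_input)

# Escape-at-the-boundary style: keep the raw value and let the driver
# handle quoting only when the value actually reaches SQL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT)")  # hypothetical table
conn.execute("INSERT INTO users (name) VALUES (?)", (raw_input,))
stored = conn.execute("SELECT name FROM users").fetchone()[0]

print(repr(magic))
print(repr(stored))
```

With the second approach the application only ever handles the value the user typed, which is the behavior the code has to take on explicitly once magic quotes is gone.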

This feature was deemed so misguided that it was deprecated in PHP 5.3 and removed entirely from PHP 5.4. The DP code base has relied on magic quotes to function from the beginning of the project in 2000.

I’m very happy to report that after much development and validation effort, we’ve removed the dependency on magic quotes from the DP code base! The work was done over the course of a year, primarily by me with help from jmdyck, and validated by a team of squirrels (shout-out to wfarrell and srjfoo) and other volunteers. It was rolled out in production on November 5th and has been almost 100% bug-free – quite an accomplishment given how much of the code was impacted. A huge thank you to the team who helped make this possible!

The biggest win is our ability to run the DP code on much more recent versions of PHP, all the way up to, and including, 5.6.¹

RIP magic quotes.

1 It won’t work on PHP 7.0 or later because the code still relies on the deprecated mysql extension, although I fixed that on a branch last night!

Enabling DP development with a developer VM

Getting started doing development on the DP code can be quite challenging. You can get a copy of the source code quite readily, but creating a system to test any changes gets complicated due to the code dependencies — primarily its tight integration with phpBB.

For a long time now, developers could request an account on our TEST server, which has all the prerequisites installed, including a shared database with loaded data. There are a few downsides to using the TEST server, however. The primary one is that everyone uses the shared database, which significantly limits the changes that can be made without impacting others. Another downside is that you need internet connectivity to do development work.

Having a way to do development locally on your desktop would be ideal. Installations on modern desktops are almost impossible, however, given our current dependency on magic quotes, a “feature” which has us locked on PHP 5.3, a very archaic version that no modern Linux desktop includes.

Environments like this are a perfect use case for virtual machines. While validating the installation instructions on the recent release I set out to create a DP development VM. This ensured that our instructions could be used to set up a fully-working installation of DP as well as produce a VM that others could use.

The DP development VM is a VMware VM running Ubuntu 12.04 LTS with a fully-working installation of DP. It comes pre-loaded with a variety of DP user accounts (proofer, project manager, admin) and even a sample project ready for proofing. The VM is running the R201601 release of DP source directly from the master git repo, so it’s easy to update to newer ‘production’ milestones when they come out. With the included instructions a developer can start doing DP development within minutes of downloading the VM.

I used VMware because it was convenient: I already had Fusion on my Mac, and VMware Player is freely available for Windows and Linux. A better approach would have been VirtualBox¹, as it’s freely available for all platforms. Thankfully it should be fairly straightforward to create a VirtualBox VM from the VMware .vmdk (I leave this as an exercise to another developer).

After I had the VM set up and working I discovered Vagrant while doing some hacking on OpenLibrary. If I had to create the VM again I would probably go the Vagrant route. Although I expect it would take me a lot longer to set up, it would significantly improve the development experience.

It’s too early to know if the availability of the development VM will increase the number of developers contributing to DP, but having yet another tool in the development tool-box can’t hurt.

1 Although I feel dirty using VirtualBox because it’s owned by Oracle. Granted, I feel dirty using MySQL for the same reason…

A new release of the DP site code, 9 years in the making

Today we released a new version of the Distributed Proofreaders code that runs pgdp.net! The announcement includes a list of what’s changed in the 9 years since the last release as well as a list of contributors, some statistics, and next steps. I’ve been working on getting a new release cut since mid-September so I’m pretty excited about it!

The prior release was in September 2006 and since that time there have been continuous, albeit irregular, updates to pgdp.net, but no package available for folks to download for new installations or to update their existing ones. Instead, enterprising individuals had to pull code from the ‘production’ tag in CVS (yes, seriously).

In the process of getting the code ready for release I noticed that there had been changes to the database on pgdp.net that hadn’t been reflected in the initial DB schema or the upgrade scripts in the code. So even if someone had downloaded the code from CVS they would have struggled to get it working.

As part of cutting the release I walked through the documentation that we provide, including the installation, upgrade, and configuration steps, and realized how much implied knowledge was in there. Much of the release process was me updating the documentation after learning what you were supposed to do.¹ I ended up creating a full DP installation on a virtual machine to ensure the installation steps produced a working system. I’m not saying they’re now perfect, but they are certainly better than before.

Cutting a release is important for multiple reasons, including the ability for others to use code that is known to work. But the most important to me as a developer is the ability to reset dependency versions going forward. The current code, including that released today, continues to work on severely antiquated versions of PHP (4.x up through 5.3) and MySQL (4.x up to 5.1). This was a pseudo design decision in order to allow sites running on shared hosting with no control over their middleware to continue to function. Given how the hosting landscape has changed drastically over the past 9 years, and how really old those versions are, we decided it’s time to change that.

Going forward we’re resetting the requirements to be PHP 5.3 (but not later, due to our frustrating dependency on magic quotes) and MySQL 5.1 and later. This will allow us to use modern programming features like classes and exceptions that we couldn’t before.

Now that we have a release behind us, I’m excited to get more developers involved and start making some much-needed sweeping changes. Things like removing our dependency on magic quotes and creating a RESTful API to allow programmatic access to DP data. I’m hoping being on git and the availability of a development VM (more on that in a future blog post) will accelerate development.

If you’re looking for somewhere to volunteer as a developer for a literary² great cause, come join us!

1 A serious hat-tip to all of my tech writer friends who do this on a daily basis!

2 See what I did there?

Development leadership failure

Last night I did some dev work for DP. Mostly some code cleanup (heaven knows we need it) but also rolling out some committed code to production. I’ve made a concerted effort to get committed-but-not-released code deployed — some of which has been waiting for, literally, years.

Even worse, we have reams of code updates sitting uncommitted (and slowly suffering from bitrot) in volunteers’ sandboxes waiting for code review. In the case of Amy’s new quizzes, for almost 5(!!!!) years. In other cases volunteers have done a crazy amount of legwork to address architectural issues that remain unimplemented due to no solid commitment that if they did the work it would be reviewed, committed, and deployed — like Laurent’s site localization effort.

These are clear systemic failures by development leadership, i.e., me. It’s obvious why, even when the project attracts developers, we can’t retain them.

The first step is to get through the backlog of outstanding work. I have Laurent’s localization work almost finished. This will allow the site to be translated into other languages — I think Portuguese and French are already done. Next up is getting Amy’s new quizzes pushed out. She’s done a marvelous job of keeping her code up to date with HEAD based on my initial work last night. Now to get them committed and rolled out. Then a site-wide change on our include()s required to get full site localization implemented.

After all that, we need to address how to better keep code committed and rolled out. I think we as a team suffer from “don’t commit until it’s perfect, then wait until it’s simmered before rolling it out”, where “simmered” means “sitting in CVS with no active testing done on it”. We need to move to more flexible check-in criteria or a more liberal roll-out. There’s no good reason why the bar is so crazy high on both ends of that.

But first – the backlog.

Mystery of the terrible throughput (or how I solved a TCP problem)

It all started out with a simple single stream reading test. Just a simple request for the entirety of an 8GB file. We do this stuff all the time. Except this time instead of 700 MB/s I was getting 130 MB/s. What?

Usually we test with jumbo frames (9000 MTU) but for this exercise we were using standard frames (1500 MTU). Still, there’s no way that was the difference. After 2 days I discover a method to consistently reproduce the problem: while the streaming test is running, toggle the LRO flag on the server’s network interface. This is just as crazy as making your car go faster by removing your soda from the cupholder. There’s no way that it has anything to do with it, but for some reason it does. Consistently. At last I have a reproducible, if ludicrous, defect.

Fast forward through 5 days of eliminating nodes, clients, switches, and NFS overcommits. Add in packet traces, kernel debugging output, and assorted analysis. Eventually Case catches the first real clue: the congestion window behaves distinctly differently in the ‘fast’ and ‘slow’ states. In the ‘fast’ state, the congestion window stays fairly constant. In the ‘slow’ state, the window oscillates wildly – starting at the MTU, growing really large, and starting over.

The LRO trick worked by causing enough retransmits that the stack dropped into slow start mode. One mystery solved. The reason we haven’t clearly identified this before is that after a node-client pair gets into the fast state, the slow start threshold is retained in the TCP hostcache between connections. Another mystery solved.

Fast forward through a few more days of slogging through TCP code down the path of blaming slow start threshold (or rather the lack of slow start in the slow state). By this time I’m way more familiar with the TCP code, and our kernel debugging framework, than I want to be. I notice that every time the congestion window drops back to the MTU it’s caused by an ENOBUFS error. It’s very unlikely we’re running out of buffer space though. Checking the called function reveals that the error would show up not only when we’re out of buffers, but also if we can’t return one immediately. We surmise the problem is some contention causing an inability to immediately get the requested buffer. So I change the code to reduce the congestion window by a single segment size (aka MTU) instead of dropping it all the way down to the segment size. The assumption being the next time we request a buffer of this size, we’re likely to get one.
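The difference between the two behaviors can be sketched with a toy model. This is Python, not the kernel code in question; the segment size, the function names, the linear window growth, and the fixed error interval are all assumptions made purely for illustration.

```python
MSS = 1460  # assumed segment size in bytes (illustrative value only)


def on_enobufs_original(cwnd: int, mss: int = MSS) -> int:
    """Behavior described above: collapse the window to a single segment."""
    return mss


def on_enobufs_patched(cwnd: int, mss: int = MSS) -> int:
    """Workaround described above: back off by one segment, never below one."""
    return max(mss, cwnd - mss)


def simulate(handler, rounds: int = 100, error_every: int = 10,
             max_cwnd: int = 64 * MSS) -> float:
    """Return the average window, in segments, over a toy run.

    The window grows by one segment per round (a simplification, not real
    TCP growth) and the handler is applied on every simulated ENOBUFS.
    """
    cwnd = max_cwnd
    total = 0
    for i in range(1, rounds + 1):
        if i % error_every == 0:
            cwnd = handler(cwnd)  # simulated ENOBUFS on this round
        else:
            cwnd = min(max_cwnd, cwnd + MSS)
        total += cwnd
    return total / rounds / MSS


print("original:", simulate(on_enobufs_original))
print("patched: ", simulate(on_enobufs_patched))
```

In this model the original handler keeps dragging the window back to one segment, which mirrors the oscillation seen in the ‘slow’ state, while the patched handler stays close to the maximum. The numbers it prints say nothing about real throughput.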

And performance shoots up to 900 MB/s — even higher than the previous fast state.

The reason we’re unable to return the requested buffer immediately is unclear, and frankly above my paygrade. I’ll happily let the kernel devs work on that (it involves slabs and uma and things geekier than me).

The core of the problem remains “why aren’t we able to return the requested buffer immediately” but until the devs conquer that one we have a valid, shippable, workaround. And a lowly tester found, identified, and fixed it!

A geek and his keyboard

Accepting the death of one keyboard and the failure of its backup was simply not an option, so I started off this morning with my trusty screwdriver.

I opened up the bottom of the dead keyboard and studied its innards. From top to bottom the keyboard consists of:

  1. keys
  2. translucent rubber layer
  3. flexible transparent layer with printed circuit
  4. flexible transparent buffer layer with no circuit
  5. flexible transparent layer with printed circuit
  6. 3 large white plastic structural pieces
  7. 1 PCB

Given the simple structure it is apparent that the PCB is the failing component of the backup keyboard. The PCB design and rev number differ between the two keyboards, but I thought swapping them out would be worth a shot. Fortunately the physical structure of both keyboards is identical. Unfortunately swapping them didn’t work and examining the circuit layers (#3) it’s obvious why: they changed the circuit layout to the PCB.

I went with Plan B, which was determining why those specific keys on the dead keyboard were dead. One look at layer #3 confirmed that all the dead keys are on the same circuit. Bringing out my trusty multimeter I discovered a break in the circuit to the PCB. But how to fix that? The transparent circuit layers are on a plastic layer so even if I had my soldering iron here in Denver, there was no way that was going to work. The dead gap wasn’t all that large, just a couple of millimeters; I just needed something to bridge it. A small piece of wire wasn’t optimal as it wouldn’t be flat and it would be hard to secure. Then the light bulb went on: aluminum foil. Conductive, easily trimmed down to the right size, and flat. Throw in a small piece of scotch tape and a few minutes later I have my first hardhack.

And thus far it works beautifully. As a bonus I moved layers #3-5 and #7 to the shell of the backup keyboard so I get the pearly white keys of the backup with the tried-and-true workings of the original.

I’m a bit concerned that the failure of that one circuit is simply a foreshadowing of things to come with different circuits. By the looks of the backup keyboard’s circuits it’s clear that the degradation isn’t from use but from age (which makes perfect sense anyway). We’ll see how long my hardhack works and if there are future failures elsewhere. Who knows, by the time I’m through maybe I’ll have a completely rebuilt keyboard full of aluminum foil.