Mystery of the terrible throughput (or how I solved a TCP problem)

It all started with a simple single-stream read test: one request for the entirety of an 8GB file. We do this stuff all the time. Except this time, instead of 700 MB/s, I was getting 130 MB/s. What?

Usually we test with jumbo frames (9000 MTU), but for this exercise we were using standard frames (1500 MTU). Still, there's no way that was the difference. After two days I discovered a way to consistently reproduce the problem: while the streaming test is running, toggle the LRO (large receive offload) flag on the server's network interface. This is just as crazy as making your car go faster by removing your soda from the cupholder. There's no way it should have anything to do with the problem, but for some reason it does. Consistently. At last I had a reproducible, if ludicrous, defect.

Fast forward through 5 days of eliminating nodes, clients, switches, and NFS overcommits. Add in packet traces, kernel debugging output, and assorted analysis. Eventually Case catches the first real clue: the TCP congestion window behaves distinctly differently in the 'fast' and 'slow' states. In the 'fast' state, the congestion window stays fairly constant. In the 'slow' state, the window oscillates wildly: starting at the MTU, growing really large, then collapsing and starting over.

The LRO trick worked by causing enough retransmits that the stack dropped into slow start mode. One mystery solved. And the reason we hadn't clearly identified this before: once a node-client pair gets into the fast state, the slow start threshold is retained in the TCP hostcache between connections. Another mystery solved.
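
To make the hostcache point concrete, here's a minimal sketch in C of how a per-host cache could carry the slow start threshold from one connection to the next. The structure and function names are invented for illustration; this is not the actual FreeBSD hostcache code.

    /*
     * Hypothetical sketch: a per-host cache that remembers the slow start
     * threshold across connections.  Names are invented for illustration;
     * this is not the real FreeBSD hostcache code.
     */
    #include <stddef.h>
    #include <stdint.h>

    struct hostcache_entry {
        uint32_t hc_ssthresh;   /* ssthresh remembered from a prior connection */
    };

    struct conn {
        uint32_t snd_ssthresh;  /* slow start threshold for this connection */
        uint32_t snd_cwnd;      /* congestion window, in bytes */
        uint32_t t_maxseg;      /* maximum segment size */
    };

    /* On connection setup, seed ssthresh from the cache if we have history. */
    static void
    conn_init(struct conn *tp, const struct hostcache_entry *hc)
    {
        tp->snd_cwnd = tp->t_maxseg;            /* cwnd always starts small */
        if (hc != NULL && hc->hc_ssthresh != 0)
            tp->snd_ssthresh = hc->hc_ssthresh; /* inherit the prior state */
        else
            tp->snd_ssthresh = UINT32_MAX;      /* no history: probe from scratch */
    }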

Fast forward through a few more days of slogging through TCP code, down the path of blaming the slow start threshold (or rather the lack of slow start in the slow state). By this time I'm more familiar with the TCP code, and our kernel debugging framework, than I ever wanted to be. I notice that every time the congestion window drops back to the MTU, it's caused by an ENOBUFS error. It's very unlikely we're actually running out of buffer space, though. Checking the called function reveals that the error shows up not only when we're out of buffers, but also when one can't be returned immediately. We surmise the problem is some contention preventing an immediate allocation of the requested buffer. So I change the code to reduce the congestion window by a single segment size (roughly the MTU) instead of dropping it all the way down to the segment size, the assumption being that the next time we request a buffer of this size, we're likely to get one.
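
And a rough sketch of the change itself, again with invented BSD-flavored names rather than the real tcp_output code: on ENOBUFS, back the window off by one segment instead of collapsing it to a single segment.

    /*
     * Toy illustration of the workaround, not the real FreeBSD tcp_output
     * code.  Field names loosely follow BSD conventions.
     */
    #include <stdint.h>

    struct tcpcb_sketch {
        uint32_t snd_cwnd;   /* congestion window, in bytes */
        uint32_t t_maxseg;   /* maximum segment size (roughly one MTU of payload) */
    };

    /* Called when an attempted send fails with ENOBUFS. */
    static void
    handle_enobufs(struct tcpcb_sketch *tp)
    {
        /*
         * Old behavior: collapse the window to a single segment, as if real
         * congestion had occurred.  With frequent transient ENOBUFS errors
         * the window oscillates between one segment and very large:
         *
         *     tp->snd_cwnd = tp->t_maxseg;
         *
         * Workaround: back off by just one segment, on the assumption that
         * the buffer shortage is momentary and the next allocation of this
         * size will likely succeed.
         */
        if (tp->snd_cwnd > tp->t_maxseg)
            tp->snd_cwnd -= tp->t_maxseg;
        else
            tp->snd_cwnd = tp->t_maxseg;
    }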

And performance shoots up to 900 MB/s — even higher than the previous fast state.

The reason we’re unable to return the requested buffer immediately is unclear, and frankly above my paygrade. I’ll happily let the kernel devs work on that (it involves slabs and uma and things geekier than me).

The core of the problem remains "why aren't we able to return the requested buffer immediately?" but until the devs conquer that one we have a valid, shippable workaround. And a lowly tester found, identified, and fixed it!

Isilon breaks the SPEC barrier

Today Isilon announced new SPEC benchmarking results for our new S200 platform released earlier this year. We were able to get over 1 million SPECsfs2008 NFSv3 IOPS — 75% higher than the nearest competition — and 1.6 million SPECsfs2008 CIFS IOPS — 126% higher than the nearest competition.

The far cooler thing we demonstrated to the industry, again, is that our products scale linearly as you add nodes: our IOPS scaled linearly across 7, 14, 28, 56, and 140 nodes. Chuck Hollis' blog entry has more detail on what the benchmark does and our results.

I wasn’t directly involved with this benchmarking exercise but I did play a role, as did the entire rest of the engineering team, in building, testing, and tweaking the software that enabled this spectacularly cool result.

And yes, I realize that benchmarks like these are really just a pissing match between competitors, but I know first-hand that they come up in sales opportunities, which makes them useful if not terribly informative.

Not feelin’ the EMC lovin’

Looks like the bonus information I was given before was incorrect. The amount was right, but instead of being a "thanks for doing such a good job" bonus, it's a "here's the maximum bonus you could be paid over the next 4 quarters if you meet your quarterly goals" carrot. In that light, the amount is much less "lovin'" and more of a joke. Apparently this is the new EMC way, which is disappointing.

Far from being an incentive, this carrot is more closely aligned with its vegetable namesake.

While I’ll continue to do my job to the best of my ability – I’ll be doing it because I enjoy it, not because of the orange tuber dangling in front of me.

Feelin’ the Isilon lovin’

I think Isilon likes me. I discovered today, much to my surprise, that I will receive a bonus for the 1.5 months I worked in 2011.

The dollar amount was 86% of my IBM bonus for the 12 months I worked in 2010.

It’s hard to do a direct comparison given the different economic climates, work location, recent acquisition, etc. but I’m still feelin’ loved more here than I was at IBM when I left.

Best of all was a small note from my manager regarding the bonus, reading simply:

Thank you for being so damn good at your job — you absolutely deserve it.

Isilon Customers

Isilon has many cool customers doing awesome things with our products, but most of them we can’t talk about. There are some we can talk about though and I wanted to share a few that were announced in press releases:

  • Sony Imageworks – makers of Alice in Wonderland and Spider-Man (via)
  • NAVTEQ – makers of map data used in things like GPS systems (via)
  • Mr. X Inc. – they did the post-production of TRON: Legacy (via)
  • Prologue – they did the post-production of Iron Man 2 and Sherlock Holmes (via)

Note that just because a company uses Isilon products doesn't necessarily mean they were used when making the films above, but you get the idea.

The Isilon press release page has the full listing; these are just a few that I thought most folks could relate to.

First day at Isilon

My first day at Isilon went well. My manager, Ryan, was in an all-day manager’s meeting so Case met me at the front desk and directed me to my cube. Yes, his name is Case and he’s the Performance Guru I’ll be learning from. That should make things interesting!

Today was amazingly crazy, although it had nothing to do with Isilon or the first day on the job. Instead, it was all about Riley. This morning while I was getting ready he ate a used dryer sheet. Or at least I'm 98% certain he ate a used dryer sheet, as I saw him chewing on something and found the meager remains of a dryer sheet left behind. After that he spent about 5 minutes hacking. This was around 7:30, and I was supposed to go into the office at 9:30. The very short version of the story is that I dashed him to the vet over my lunch break and rushed back to the office. Thus far he seems to be fine, but we're continuing to keep an eye on him.

A change from IBM & Denver: Isilon & Seattle

Today I accepted a job with Isilon in Seattle. The change is a lateral move onto a performance team. My official start date is November 16th. I haven't yet told IBM of my departure; I'm waiting until late October before doing so. My last day at IBM will be November 15th, but I'll be off on vacation the week prior. Or at least that's the plan. We'll fly up for a weekend in mid-October to scout out a place to live (likely an apartment in the Belltown or Lower Queen Anne neighborhoods).

We’re sad to leave Denver but excited about new opportunities in Seattle. B has already applied for jobs at the Starwood properties up there (W, Westin, Sheraton) and will likely be broaching the topic with his manager sometime this week as it is likely that any interested Seattle properties will be contacting his manager and we don’t want it to be a surprise. We’re crossing our fingers they don’t just let him go like Compass Bank effectively did.

B and I have decided to bring back our move-to-Denver slogan for the move to Seattle: It’s all part of the adventure!