iozone and caching – the bane of benchmarking

iozone is a common tool used by companies and researchers when benchmarking storage systems. What most iozone users don't seem to realize is that unless care is taken, the test may exercise only the storage system's cache and never touch the underlying storage.

iozone supports several test types; the most commonly used are:

  • -i0 – sequential write
  • -i1 – sequential read
  • -i2 – random write / random read

These are often used together in a single iozone execution, like:

iozone -i0 -i1 -i2 [...]

This will run a sequential write test, followed by a sequential read test, followed by a random write test, followed by a random read test. The problem with this approach is that it exercises both the client's VM cache and whatever front-end cache the storage system is using. Depending on the storage system, the initial sequential write may land in the storage system's front-end cache in addition to going to nonvolatile storage, so the subsequent sequential read may never hit the nonvolatile storage at all, making the read test a purely cached test. Ditto the random read test. Similarly, the client may cache some or all of the file, making the results even less useful.

The client cache is why the iozone docs recommend using a file size 2x the client's memory. At Isilon we find this unwieldy, particularly when you have 24GB clients. Moreover, it doesn't solve the problem of the storage system cache at all.

Instead, we use a wrapper script that runs each test in isolation and, between runs, flushes both the client's cache (assuming the client is Linux: sysctl vm.drop_caches=1, or just unmount and remount the storage) and the Isilon cluster's cache (isi_for_array isi_flush). This lets us use smaller file sizes while still measuring the underlying storage rather than the caches.

The above “-i0 -i1 -i2” test gets broken down and executed like this (in a loop, of course):

Isilon cluster cache flush
client cache flush
iozone -i0
Isilon cluster cache flush
client cache flush
iozone -i1
Isilon cluster cache flush
client cache flush
iozone -i2
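A minimal sketch of such a wrapper loop follows. The environment-specific commands are echo stubs here so the sketch runs anywhere; on a real setup they would be the isi_flush and drop_caches commands described above, and the iozone file size, record size, and path are purely illustrative:

```shell
#!/bin/sh
# Sketch of the flush-then-test loop described above. The three helpers
# are echo stubs: on a real setup flush_cluster would run
# "isi_for_array isi_flush" on the cluster, flush_client would run
# "sysctl vm.drop_caches=1" on the Linux client, and run_iozone would
# invoke the real binary (the -s/-r/-f values shown are illustrative).
flush_cluster() { echo "ssh cluster-node isi_for_array isi_flush"; }
flush_client()  { echo "sysctl vm.drop_caches=1"; }
run_iozone()    { echo "iozone $1 -s 4g -r 128k -f /mnt/cluster/iozone.tmp"; }

# One flush of each cache before every individual test run
for t in -i0 -i1 -i2; do
    flush_cluster
    flush_client
    run_iozone "$t"
done
```

Swapping the stubs for the real commands is all the wrapper fundamentally does; the value is in the ordering, not the plumbing.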

There's one other problem with this approach: random writes and random reads are performed by the same test operation (-i2). This prevents iozone from providing cache-free results, since the random read immediately follows the random write. At Isilon we've modified our iozone binary to split the random write and random read operations into separately runnable tests to work around this limitation.

If your intent is to test a system's cache, then by all means run all the iozone tests in a single execution. But if you want to test the underlying storage, you need to run them separately and flush caches between executions.

Hadoop OutOfMemory errors

If, when running a Hadoop job, you get errors like the following:

11/10/21 10:51:56 INFO mapred.JobClient: Task Id : attempt_201110201704_0002_m_000000_0, Status : FAILED
Error: Java heap space

The OOM isn't in the JVM that the Hadoop JobTracker or TaskTracker runs in (the maximum heap size for those is set in conf/hadoop-env.sh with HADOOP_HEAPSIZE) but rather in the separate JVM spawned for each task. The maximum heap size for those JVMs can be controlled via parameters in conf/mapred-site.xml. For instance, to raise the default max heap size from 200MB to 512MB, add these lines:

    <property>
        <name>mapred.map.child.java.opts</name>
        <value>-Xmx512m</value>
    </property>
    <property>
        <name>mapred.reduce.child.java.opts</name>
        <value>-Xmx512m</value>
    </property>

I find it sad that this took me a day to figure out. I kept googling for variations of “hadoop java out of memory”, which were all red herrings. If I had just googled for the literal error “Error: Java heap space” plus hadoop, I'd have gotten there a lot faster. Lesson learned: search for the literal error text instead of trying to outsmart Google with your diagnosis of the problem.

CSS box model woes

This weekend I spent a couple of hours struggling with a new DIV-based layout design for peelinc.com (it's not up yet, don't bother looking). The core of my problem was that I was designing for the border box model while the browser was rendering with the W3C content box model. Things didn't start working correctly until I realized I had to explicitly state that I wanted the border box model.

And really W3C – content box model? Trying to develop a fully-fluid layout using the content box model is nuts.
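For reference, opting into the border box model is a one-line CSS declaration (the vendor-prefixed forms were still needed by some browsers around this time; the selector here is just an illustrative example, not from my actual stylesheet):

```css
/* Ask the browser for the border box model: the declared width/height
   include padding and border, which is what a fluid layout wants. */
div.layout {
    -moz-box-sizing: border-box;     /* older Firefox */
    -webkit-box-sizing: border-box;  /* older Safari/Chrome */
    box-sizing: border-box;
}
```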

Except I couldn't get IE8 to render using the border box model even though it's supported. And yes, I had the !DOCTYPE specified, so it was supposed to be rendering in standards-compliant mode. After enough digging I figured out that despite the DOCTYPE, it was rendering in IE7 compatibility mode instead. Arg. I was able to get IE8 to cooperate by using the IE document compatibility meta tag:

<meta http-equiv="X-UA-Compatible" content="IE=edge">

Yet another reason why I hate hate hate IE.

But now, thankfully, the page renders exactly as I had intended on Safari, Chrome, Firefox, and IE8 without any browser hacks (if you don’t consider telling IE8 to use the freakin’ standards a hack). No idea what it’ll look like on IE6 or IE7, but frankly I’m not worried if it looks a bit wonky in those — the content will still be perfectly readable.

gridengine qmon font problems

Even after you get gridengine installed, you'll hit some lovely font problems when using qmon. After trying a few things, the easiest fix is just to follow the advice at this web page and set the fonts to fixed. Note that copying $SGE_ROOT/qmon/Qmon to $HOME is an important step; updating the file in place won't work. This should be done on the system with the qmon binary (should be obvious, but you never know).
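The mechanics look roughly like this. To keep the sketch runnable anywhere, temp directories stand in for $SGE_ROOT and $HOME, and the single resource line in the sample file is made up for illustration (a real Qmon file has many fontList entries):

```shell
#!/bin/sh
# Sketch of the fixed-font workaround. SGE_ROOT and DEST are temp
# stand-ins here; on a real system SGE_ROOT is your gridengine install
# and DEST is $HOME. The resource name below is illustrative only.
SGE_ROOT=$(mktemp -d)
DEST=$(mktemp -d)
mkdir -p "$SGE_ROOT/qmon"
printf 'Qmon*resourceForm*fontList: -adobe-helvetica-medium-r-normal-*\n' \
    > "$SGE_ROOT/qmon/Qmon"

# qmon reads the copy in $HOME; editing the original in place won't take
cp "$SGE_ROOT/qmon/Qmon" "$DEST/Qmon"
sed -i 's/fontList:.*/fontList: fixed/' "$DEST/Qmon"
grep fontList "$DEST/Qmon"   # -> Qmon*resourceForm*fontList: fixed
```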

If you decide to try to fix the fonts themselves, remember that the fonts need to exist on the X server rendering the qmon dialog, not necessarily on the system qmon is being run from. In other words, if you're doing X11 forwarding from a remote host to your desktop, it's your desktop's X11 server that needs the fonts, not the remote host. Thanks to this post for clarifying that pesky detail.

And finally, if you decide to poke around and get non-fixed fonts working, you’ll probably need these RHEL packages (via):

  • xorg-x11-font-utils
  • xorg-x11-fonts-100dpi
  • xorg-x11-fonts-75dpi
  • xorg-x11-fonts-misc

and possibly some futzing with xfs and fc-cache. I tried this approach and fixed the initial error messages from the X server (yay!), but it resulted in boxes instead of glyphs (boo), so I fell back to fixed fonts.

gridengine RPM for RHEL5 (and CentOS 5) is broken…

…and here’s how you fix it.

After installing gridengine, before running install_sge (or install_qmaster / install_execd):

  1. edit /usr/share/gridengine/util/install_modules/install_common.sh
  2. find the CreateGSEStartUpScripts() function and nuke the entire contents of the “if [ $create = true ]; then” block up until (but not including) the “if [ $euid = 0 …]” line.
  3. save and exit the file
  4. edit /usr/share/gridengine/util/install_modules/install_qmaster.sh
  5. find the StartQmaster() function and change the ‘$SGE_STARTUP_FILE -qmaster’ line to read ‘/etc/init.d/sgemaster’ and remove the remaining contents of the function up until (but not including) the ‘$INFOTEXT …’ line
  6. save and exit the file

Now you can run the appropriate installation program.

The problem is that the RHEL5 packaging tries to remove the gridengine startup scripts and replace them with its own. Sadly the job wasn't well done (and obviously wasn't tested). I've notified the package maintainers, but the above will get you going until the package is fixed (given that the package has existed in this form since 2008, I'm guessing it isn't that popular of a package as-is).