Hadoop OutOfMemory errors

When running a Hadoop job, you may get errors like the following:

11/10/21 10:51:56 INFO mapred.JobClient: Task Id : attempt_201110201704_0002_m_000000_0, Status : FAILED
Error: Java heap space

The OOM isn’t in the JVM that the Hadoop JobTracker or TaskTracker runs in (the maximum heap size for those is set in conf/hadoop-env.sh with HADOOP_HEAPSIZE), but rather in the separate JVM spawned for each task. The maximum heap size for those task JVMs can be controlled via parameters in conf/mapred-site.xml. For instance, to change the default max heap size from 200MB to 512MB, add these properties inside the &lt;configuration&gt; element:

   <property>
       <name>mapred.map.child.java.opts</name>
       <value>-Xmx512m</value>
   </property>
   <property>
       <name>mapred.reduce.child.java.opts</name>
       <value>-Xmx512m</value>
   </property>
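
If you only want to bump the heap for a single job rather than cluster-wide, the same properties can be set on the job’s Configuration in your driver. Here’s a minimal sketch using the mapreduce API from that era; the class and job names are just placeholders:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.mapreduce.Job;

    public class HeapSizeExample {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // Same properties as in mapred-site.xml, but only for this job
            conf.set("mapred.map.child.java.opts", "-Xmx512m");
            conf.set("mapred.reduce.child.java.opts", "-Xmx512m");

            Job job = new Job(conf, "heap-size-example");
            // ... set your mapper, reducer, and input/output paths as usual ...
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }

If your driver goes through ToolRunner, you can also pass the override on the command line with -D mapred.map.child.java.opts=-Xmx512m without touching any code.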

I find it sad that this took me a day to figure out. I kept googling for variations of “hadoop java out of memory”, which were all red herrings. If I had just googled for the literal error “Error: Java heap space” plus “hadoop” I’d have gotten there a lot faster. Lesson learned: search for the literal error message instead of trying to outsmart Google by describing the problem.
