TAMS Scholarship challenge: $31k raised!

Three weeks ago I threw out a challenge to raise money for the TAMS Alumni Association Scholarship Forever Fund to enable students of any financial means to attend TAMS. Together we raised $31,029.15!

This is fantastic – way to go TAMSters!

A hearty thank you to everyone who donated! And a shout out to Kim Cooper, the TAMS Alumni fundraising chair, for managing the challenge with no advance notice whatsoever.

tan, sec, cos, sin,
3.14159.
Solid, liquid, plasma, gas,
TAMS is gonna kick your ass!

Sonic cherry cokes

Those of you who know me (or have seen photos on Facebook) are aware of my appreciation1 of cherry cokes from Sonic Drive-In. But have you ever wondered what’s so special about them?

Let me start by correcting a common misconception: cherry cokes from Sonic are not simply the Cherry Coke soda from Coca-Cola. No, long before Coca-Cola came out with cherry- or vanilla-flavored drinks you could drive up to a Sonic and have those flavors added to whatever soda you wanted. Vanilla Dr Pepper – easy peasy. Cherry limeade (ie: Sprite with cherry and limes) – delicious. So a cherry coke is cherry syrup added to Coca-Cola.

The next thing to know about Sonic drinks is that their ice is awesome. The ice isn’t cubes or crescents but rather small crunchable spheres. They’re so popular in the south that you can buy bags of it directly from a Sonic Drive-In. I, personally, think that Coke is best served over ice and even better served over Sonic ice.

I really do enjoy the flavor of the beverage, but nostalgia is the real reason why I enjoy cherry cokes from Sonic so much. There wasn’t a whole lot in the little town I grew up in, but there was, and still is, a Sonic. And sometimes after picking us up from school, Mom would take us to Sonic for a treat2 and I’d get a cherry coke. Sometimes during the summer when I was working for my Dad in the print shop collating, stapling, and labeling newsletters he’d take some money out of the till and send me to Sonic to get everyone a drink in the afternoon.

With the exception of Seattle, every town I’ve lived in has had a Sonic nearby. So what used to be a “hey, let’s bop down the street and get a drink” has become a “let’s get in the car and drive 45 minutes to Tacoma”. Taking a selfie with my drink at Sonic and posting it to Facebook has become a shtick, one that brings me joy for absolutely no sensible or logical reason. And despite there being almost nothing for Daniel to eat or drink there, he still humors me and stops whenever we pass one on the road. That act of love also brings me great joy.

While I know it’s just another form of artificially-flavored high-fructose corn syrup served over frozen water in a styrofoam cup3, Sonic cherry cokes have a special place in my heart.

It doesn’t make a lot of sense, but thankfully not everything has to.

1 Some might go so far as to call it an unhealthy obsession.

2 Bonus that Happy Hour with half-price drinks is from 2-4. Those marketing guys at Sonic are brilliant.

3 I cringe every time I think about it like this.

Dear Recruiter

Dear Recruiter,
Thank you for the unsolicited message regarding employment at your company1. As you know, hiring in the tech industry is very competitive and high-performing individuals have many opportunities. To save us both time, please answer the following questions to see if your company is a good fit for me.

  1. Are you recruiting for Microsoft, Amazon, or Oracle? If so, your company is not a good fit.
  2. Would I be required to use a Windows desktop and/or develop/test software primarily for Windows? If so, you are not a good fit. Macs are great, Linux is even better.
  3. Where are your offices located? I don’t enjoy commuting, so if you’re not in Seattle, preferably downtown, you’re unlikely to be a good fit. I’m not currently interested in moving from Seattle, so if you are further afield, you are not a good fit although I might entertain working remotely for the right opportunity.
  4. What is your company’s relationship with the open source community? Are you consumers of open source projects? Do you contribute back to open source projects?
  5. Is your company on the HRC Corporate Equality Index? If so, what is your score? If you are not listed on the HRC Corporate Equality Index, please send me a copy of your corporate nondiscrimination and inclusion policies.
  6. What are you doing to enhance diversity and inclusion in your organization, specifically hiring and retention of women, LGBT, and minorities?
  7. How long is your paid maternity leave? Do you offer paid paternity leave? If so, how long?
  8. What are your vacation and sick policies?

If, after reviewing the above, you think your company would still be of interest, please contact me again with the requested information and we’ll talk further.

Sincerely,
— Casey

1 Whether that be from finding me on LinkedIn (good job) or sending an email to my work address (really?!).

The relevance of Distributed Proofreaders in a Google Books world

The mission of Distributed Proofreaders (DP) is to preserve public domain books by converting them into high-quality eBooks and publishing them to Project Gutenberg. But they’re not the only ones who are working to digitally preserve the world’s books. Other players in this space include:

  • The Internet Archive (TIA), through its OpenLibrary project, is digitizing all the world’s public domain books and making them accessible.
  • Google Books has a similar mission1, but focuses on indexing the contents of the books so users can search against them.
  • HathiTrust is a collaboration of multiple universities working to digitally preserve their collections. They work with TIA, Google Books, and others to source their material.

Both OpenLibrary and Google Books make not only the digitized images available, but also the underlying text. In fact, both bundle up that text into eBooks automatically. The creation of these eBooks is entirely automated, without any human interaction, and is thus lightning fast compared to DP.

If TIA, Google Books, and others are all providing digital books to the public, why does DP still exist? What is its relevance in today’s world?

The answer is actually quite simple: accuracy of the text.

Text from OCR is ‘data’ not ‘information’

The quality of the OCR text from an image depends on multiple factors, including the quality of the image, the capabilities of the OCR software, and how the OCR software was configured. Even with the crispest image and sophisticated software, the OCR text isn’t perfect and is filled with page artifacts and errors.

For an example, I found a book that went through DP, was available in Project Gutenberg, and was also available from the three providers above. I selected a random page in the project and compared the text output from each. The book was Railway construction by William Hemingway Mills; page 20. Links to the book from the different sources:

  1. Distributed Proofreaders (eBook 50696 at Project Gutenberg)
  2. Google Books
  3. OpenLibrary
  4. HathiTrust (the edition was digitized by Google Books)

Using the DP version, specifically the text of the page after it finished the P3 proofreading round and before it was formatted, I did a diff against the ones provided by Google Books and TIA, ignoring any changes in whitespace. I’ve highlighted the differences below; the first version given (lines prefaced by <) is the DP version, and the second (lines prefaced by >) is the Google Books or TIA version.

The Google Books text was taken from their ePub version, the HTML tags stripped out, and the newlines reinserted for easier comparison.
$ diff -w railroad_dp.txt railroad_google.txt
4c4
< (b) At any intermediate siding connection upon a line
---
> (6) At any intermediate siding connection upon a line
31c31
< Every signal arm to be so weighted as to fly to and remain
---
> Every signal arm to be so weighted as to lly to and remain

The TIA text was taken from their plain text version. Note the cruft on the page at the end — yes, that’s really present in their text:
$ diff -w railroad_dp.txt railroad_tia.txt
4,5c4,5
< (b) At any intermediate siding connection upon a line
< worked under the train staff and ticket system, or under the
---
> (b) At any intermediate siding cormection upon a line
> worked u/nder the train staff and ticket system, or under the
11c11
< system: Sidings, if any, being locked as in (b).
---
> system : Sidings, if any, being locked as in (6).
15c15
< one arm on one side of a post, to be made to apply--the first, or
---
> one arm on one side of a post, to be made to apply — the first, or
41c41
< run over by trains of other companies using a different system
---
> run over by trains of other companies usiing a different system
50a51,58
>
>
> .â– J ML^
>
>
>
> *•**
>

Something important to note is that there are no cases in this page where the Google Books or TIA versions found errors in the DP version. At least for this one page in this one book, DP provides the most accurate text.2
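If you would like to repeat this kind of comparison on another page, the steps can be roughly sketched in Python. This is only an illustration, not the commands I ran: the function names (strip_tags, differing_lines) are made up, and the regular expression is a crude assumption about how the HTML tags get stripped out of the ePub text. Like diff -w, it ignores all whitespace when deciding whether two lines match.

```python
import difflib
import re

# Crude tag matcher: anything between "<" and ">" (an assumption, not a real HTML parser).
TAG_RE = re.compile(r"<[^>]+>")


def strip_tags(html):
    """Remove HTML-looking tags, keeping the text between them."""
    return TAG_RE.sub("", html)


def squash(line):
    """Drop all whitespace so lines compare the way `diff -w` compares them."""
    return "".join(line.split())


def differing_lines(reference, candidate):
    """Return (reference_lines, candidate_lines) pairs that differ, ignoring whitespace."""
    ref = reference.splitlines()
    cand = candidate.splitlines()
    matcher = difflib.SequenceMatcher(
        a=[squash(line) for line in ref],
        b=[squash(line) for line in cand],
        autojunk=False,
    )
    changes = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            # Report the original lines, not the whitespace-free comparison keys.
            changes.append((ref[i1:i2], cand[j1:j2]))
    return changes
```

Calling differing_lines(dp_text, strip_tags(google_html)) returns one pair per changed hunk, with the DP lines first and the OCR lines second, which corresponds to the < and > lines in the diff output above.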

Google Books cares about the accuracy of the text only as much as it can index and bring up a book based on that book’s indexed contents. They don’t care if pages have ancillary characters or incorrect words as long as, taken as a whole, the book is indexable. I presume TIA cares more about having valid text, but they don’t appear to have the resources to improve them.

Errors like the ones above are fairly minor and mostly just annoying for the average reader. However, consider such errors in a scientific book or journal where the accuracy of the numbers is very important.

OCR-only text is just a bunch of data, but without accuracy it’s not really information. In fact, in some subtle cases it could be misinformation.

DP provides more accurate text, but it does so at the cost of speed. A book can take from days to years to go through the whole process at DP and be published at Project Gutenberg.

Improving DP’s relevance

Currently, every text that goes through DP ends up in Project Gutenberg as an eBook. The eBooks are far superior to the ones produced by automated systems and are a delight to read. There will always be a need for these.

However, there are small things we can do at DP to become more relevant in today’s digital ecosystem.

Closing the loop with image providers

It’s sad that while DP sources many of its images from Google Books or TIA, those providers continue to offer sub-par text and eBooks for download well after DP has uploaded finished eBooks to Project Gutenberg.

DP should close the loop with TIA and Google Books to provide them with updated eBooks and page texts. Projects at DP already identify where the images were sourced from, so it would be straightforward to send the providers the updated text in an automated way. I can see providers like Google Books being particularly interested in accurate page texts to refine their OCR algorithms and improve their search index. Both TIA and Google Books could use the accurate page texts to update the underlying text in their image PDFs, allowing accurate PDF searching and accessibility (eg: screen readers).

Partnering with the image providers in this manner is the right thing to do for the world at large and a potential source for more volunteers, developers, and perhaps even funding.

Not everything needs to become an eBook

Not everything needs to be a beautiful, hand-crafted eBook. Some printed materials, like journal articles and legal briefs, would benefit most from simply having accurate text — something DP excels at.

If DP were to expand its mission to encompass the accurate preservation of all public-domain printed materials, with the end product varying depending on the needs of the item, it could increase the rate at which accurate public domain texts are produced. Such materials would only go through the proofreading rounds, skipping the formatting rounds and post-processing step that are the biggest bottleneck. This could result in the accurate text being available within mere days.

With this in place, I can see DP partnering with folks like JSTOR or CourtListener to proofread their public domain materials. Such partnerships would be good publicity and a valuable source of new volunteers. Because this would still be limited to public domain material, Project Gutenberg could accept these not-eBooks as a final product if it chose.

Potentially more than relevant

By increasing DP’s scope, partnering with image providers, and leveraging its strengths, DP can remain more than relevant in the age of Google Books, but it’s going to take some realignment of mission and buy-in from the community to get there.

 

1 Google Books is digitizing much more than just public domain books, but let’s focus on the public domain books in this discussion; there are more than enough of them out there.

2 The astute among you will point out that all I’ve done is compare the resulting text from each source, not the text to the actual image, which is required to determine accuracy. You are correct, all I have shown is how precise the texts are to one another. I have great confidence in the accuracy of the DP version compared to the image, but I leave proof of that as an exercise for the reader. And if you enjoy that kind of work, have I got a great site for you to volunteer with!

My new car: old hybrid with new batteries

If you own a hybrid vehicle that is 8 years old or older, replacing the batteries could make it run like new or better than new.

In 2009 I purchased a gently-used 2004 Honda Civic Hybrid with manual transmission from some good friends. At the time it didn’t have a whole lot of miles on it and still doesn’t: despite being almost 12 years old, Eiffel1 only has around 82,000 miles on it. I’ve been an urban-dweller in the heart of Seattle for the past 5 years and use the car infrequently, mostly on weekends to go out running or hiking.

Honda Civic Hybrids use the electric motor as an assist to the small, efficient gasoline engine. During acceleration the electric motor kicks in and provides more power to get the car moving. This is particularly noticeable when going up Seattle hills and getting the car moving from a stop in first gear. The battery is recharged during deceleration or when the car thinks the engine can spare the power. It gets around 41 MPG on average, including both in-town and highway driving.

About two years ago it was clear that the hybrid battery was losing its ability to retain a charge. There wasn’t as much oomph going up the hills as there used to be. It was annoying but not extreme and didn’t significantly impact the fuel efficiency of the car. Honda warranties their batteries for 8 years or 80k miles and it was past the 8-year mark by the time I noticed it.

Last summer I made one of the biggest mistakes a hybrid car owner can make: I let the car sit for months without driving it.2 When we started driving it again, the Integrated Motor Assist (IMA) light and the check engine light would periodically come on. Then they came on and stayed on. Throughout all of this, the overall fuel efficiency didn’t really decrease; it stayed in the low 40s or upper 30s MPG. The act of driving the car, however, was miserable: there was no power going up hills, and getting the car going from a stop on a hill was painful.

A Honda dealership will happily sell and install a new hybrid battery to the tune of $3500. As if that weren’t expensive enough for a car with a Blue Book value of around $4k, most of the batteries they sell you have refurbished cells. I then found Bumblebee Batteries, who will sell you a battery with brand new cells for $2100 including a 3-year warranty. They’re based out of Portland, so we combined a trip down there to see friends with buying, and installing, a new hybrid battery. They usually ship the battery to you, so I called ahead to make sure we could pick it up and install it in their parking lot, and they said sure.

We did a trial run of removing the battery in our garage before we left to ensure we had all the necessary tools, and to give ourselves confidence that we could do it. We drove down to their location in a light industrial area, bought the battery, and installed it in about 30 minutes. They were gracious enough to let us use their empty garage to do the work! They take the old battery, replace the cells with new ones, recycle the old ones, and resell it.

The car drives like it’s brand new. In fact, it’s better than when I got it. It has power going up hills! A drive up Snoqualmie Pass doesn’t involve downshifting into 3rd and puttering along at 45 MPH. Interestingly, the car still gets about the same gas mileage. The little gasoline engine is the main workhorse of the car with or without the assist, but the electric motor is what makes the car enjoyable to drive.

If you have an older hybrid that is in great condition but drives horribly because of the battery, I encourage you to consider getting a new battery rather than buying a new car. I also strongly recommend the good folks at Bumblebee Batteries as a source of that battery!

1 Eiffel is named after the band Eiffel 65 from their hit song Blue (Da Ba Dee). Sadly, the car isn’t as blue as the song might convey.

2 I’ve since learned that letting a NiMH hybrid battery sit for several months and then getting in the car and driving it is a Very Bad Thing. The battery cells discharge fully while sitting for months, but the car will only charge the battery until one cell is fully charged leaving most of the cells well below a full charge. The car isn’t capable of ever fully charging all cells and you’re down to driving on dribbles.
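To make that footnote a little more concrete, here is a toy Python sketch of the behavior as I understand it. It assumes the cells are wired in series, that every cell gains the same amount of charge, and that charging stops as soon as the fullest cell hits 100%. The function name and the percentages are invented for illustration; this is not how Honda’s battery controller is actually implemented.

```python
def charge_series_pack(cell_levels):
    """Charge a series pack until the fullest cell reaches 100%.

    cell_levels: per-cell charge as whole percentages (hypothetical numbers).
    Every cell receives the same added charge, so the gaps between cells remain.
    """
    headroom = 100 - max(cell_levels)  # charge the fullest cell can still accept
    return [level + headroom for level in cell_levels]


# Made-up pack that drifted apart while sitting for months.
drifted_pack = [35, 10, 5, 0]
after_charging = charge_series_pack(drifted_pack)
```

In this toy model the pack looks "full" to the car once the first cell tops out, even though the remaining cells are still sitting well below 100%.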

Cutting the sabbatical short

Six months ago today I left for my sabbatical (which EMC HR insists on calling an unimaginative “leave of absence”) with the intent of being gone for a full year. Then while I was on vacation in Europe, Dell announced they were buying EMC, which put a kink into my plan (and not in a good way).

The deal is expected to close sometime between May and October, and yes that’s a huge window. From the terms of my leave of absence I have a pretty strong incentive to either be back in the office or to quit before that happens.

Today I went into the office and had a day full of meetings with various managers and coworkers to get a sense of how things are going and where I would fit in should I come back. The meetings were quite useful and well worth the visit.

We’re still working out the details on when and where exactly, but it looks like I’ll probably be going back to EMC Isilon sometime in April. Hope they’re ready for me by then!

T-Mobile is your friend traveling abroad; in the US, not so much

Two years ago I switched from AT&T to T-Mobile and in that time I’ve done a fair bit of traveling in the US and abroad. If you travel abroad and aren’t on T-Mobile, you might want to reconsider.

Traveling abroad

If you’re traveling abroad, particularly to anywhere in Europe, T-Mobile is awesome. If you’re on a T-Mobile US plan, you likely get free unlimited data roaming out of the country. It’s limited to 3G speeds, but it’s vastly better than no data whatsoever. In the past 18 months I’ve traveled to Australia, Spain, Canada, Germany, Czech Republic, and Austria and in every country I’ve had unlimited data access and excellent coverage.

As someone who has traveled internationally a decent amount in the past 15 years, let me tell you that traveling with data access drastically changes how you travel for the better. Instead of researching and printing out maps before you get there, you can bring up Google Maps and get directions in situ. Previously, I would never have gotten on a bus in a foreign city for fear of not knowing where to get off. With data, however, you can bring up Google Maps1 and catch a bus (or trolley or tram or U-bahn) without pre-planning. Want food recommendations? Yelp away. Can’t wait to share that photo with friends? Post it right after you take it.

With data readily available on your mobile device2, you aren’t locked into staying places that have WiFi. Or hanging out at coffee shops with free WiFi just to get that email out to a loved one or look up dinner options. Other providers offer data plans overseas, but they aren’t unlimited and have to be set up ahead of time — and you have to pay extra for them. Then, when you’re using them, you’re rationing your data usage like you were living a decade ago. With T-Mobile it’s all automatic and included in your usual T-Mobile monthly charge. I’ve found this to significantly reduce travel friction.

In addition to unlimited data, you also have unlimited texting with T-Mobile overseas. Need to text a traveling companion to coordinate when and where to meet? Text away without worrying about getting charged for each one.

Keep in mind that you still have to pay for voice calls when traveling, so if you expect to make a lot of phone calls that aren’t over WiFi, unlimited data won’t help you much. Still, the per-minute rates are reasonable for the one-off calls to your hotel to confirm or change a reservation.

With T-Mobile’s Simple Choice plans, as long as you have a supported phone, you can enroll in one of their plans on a month-to-month basis with no long-term commitments. Depending on when and where you’re traveling it may be cost effective (in dollars and/or stress) to get a T-Mobile SIM card just for the duration of your travel.

Travel in the US

Unlike traveling abroad, traveling in the US with T-Mobile is a mixed bag. T-Mobile’s headquarters are in Bellevue, just outside of Seattle, so their coverage in and around the Seattle metro area is excellent. They have a strong presence elsewhere in the US, primarily centered in large metro areas. If your US travel has you most often in big cities you’re likely good to go with high-speed data.

In some states in the US heartland (also known as the fly-over states) there is no T-Mobile service whatsoever and you’re roaming. This was particularly apparent when driving from St. Paul, Minnesota to Seattle, Washington last year.

T-Mobile provides data roaming in their plans, but it isn’t unlimited. In fact, the amount of data you have varies depending on your plan and after you hit it, you’re cut off, not throttled — something we found out the hard way in the aforementioned cross-country drive. This was quite a shock after having traveled abroad prior with unlimited data. So while we almost always had voice coverage on the trip, we had either no or rationed data. If you are on T-Mobile, I recommend calling them and asking how much US roaming data is included in your plan before heading out for that cross-country road trip.

If you expect to do a lot of traveling across the US, other providers like AT&T or Verizon might have better coverage. With AT&T’s contract-less plans, it might even be cost effective to get an AT&T SIM card depending on your needs.

1 While Google Maps is great, it isn’t necessarily the best for public transit in every city. We used Google Maps to take buses and trains in Sydney without a problem, but it didn’t know anything about Melbourne’s public transit. There was, however, an app from a local transit authority which got us everywhere we needed to go on the trams. For trains in Europe, the Deutsche Bahn mobile app is better than Google Maps even outside Germany.

2 Overseas data does not include tethering. Or, more accurately, it doesn’t allow port 80 access over tethering in Australia. I set up an SSH SOCKS proxy over a different port and got web traffic to work for me over a tether.

Matching $10k TAMS Scholarship Forever Fund donations

TAMS was the best thing that has ever happened to me and I want to help make it possible for other Texas high school students to have that same experience. Between now and midnight December 30th 2015, I will match any donation made to the TAMS Alumni Association Scholarship Forever Fund up to the first $10,000.

No need to tell me about it, the friendly folks at the TAMS Alumni Association can let me know the totals — just make a donation on the website. And, of course, it’s total gravy that donations are tax deductible.

Interested in joining me in matching funds either in series or parallel? I’m happy to coordinate, just let me know.

Why the Scholarship Forever Fund? Because I’m taking the strategic, long-term view. Building an endowment to fund scholarships enables students of all economic backgrounds to attend TAMS as inexpensively as possible year after year. (On the tactical front I am also fully-funding one scholarship a year starting next year, but that’s for a later post.)

We have 3 weeks, let’s get to it!

Casey Peel, TAMS class of 1997

TAMS: the best thing that has ever happened to me

TAMS, the Texas Academy of Mathematics and Science, is an advanced placement program at the University of North Texas where I finished my junior and senior years of high school while simultaneously acquiring 72 hours of college credit. It’s a relatively small program that accepts ~187 students a year from Texas high schools. All 374 students1 live in the same dorm on the UNT campus and take roughly the same classes, including honors biology, chemistry, physics, pre-cal, and calculus among others.

Your first thought is probably “that must have given you a leg up academically”. And while you are correct, that’s not the most important thing that I acquired from the program. The most important thing was the close circle of friends that have changed my life for the better.

I grew up and went to high school in Littlefield, a small farming community northwest of Lubbock in the Texas panhandle. My sophomore class had about 100 people in it — people I had known all throughout my academic career. I wasn’t the smartest person in my class (although if asked at the time, I might have arrogantly said I was) but I was hands-down the nerdiest. Like many small towns in Texas, the focus of the school was on football, not on academics, and I was no jock. While I had friends, I didn’t really feel like I belonged.

TAMS changed all of that. I went from having 99 classmates who were unlike me to 373 classmates who were very like me. I went from feeling like one of the smartest people in school to feeling like the dumbest. I learned some very much needed social skills and was challenged academically like never before. It was exhilarating, overwhelming, and the best thing that has ever happened to me in my entire life.

I made a core group of friends that I cherish to this day: Janice, Matt, Jenny, John, Shaun, Jodi, Stona, and Karen. We’ve helped each other through rough times, celebrated great times, and stayed in touch despite the years and being far-flung across the US and the globe. Somehow, we even manage to have mini-reunions and got all 9 of us together in 2013:

[Photo: all nine of us together in 2013]

Without TAMS and these friends, my life would be far different and far, far worse. And while AP classes and the like can help with the academic leg up, they can’t replace the camaraderie and community that a program like TAMS provides.

Many states have programs like TAMS; check out the Wikipedia entry for the National Consortium for Specialized Secondary Schools of Mathematics, Science and Technology for many of them. If you know a bright, socially-awkward high school freshman or sophomore interested in STEM fields, encourage them to seek out similar opportunities — it might just change their life.

1 187 students/year in a 2-year program = 374 students enrolled in a year.

Casey’s 2015 Mix CD

Once again, it’s time for my anachronistic tradition of making a mix CD (which is just one step away from the 80s mix tapes of old). This marks my 5th year of doing it (see the mix cd tag for the others). Thanks to the power of Google, if you subscribe to the Google Music service you can listen to it directly.

This year’s songs are a mix of both newly discovered and rediscovered. The stress leading up to my sabbatical played a role in this year’s selection, as did the relaxation of being on it.

  1. Your Song (instrumental) – Craig Armstrong
  2. I Just Want To Dance With You – George Strait
  3. Girl In A Country Song – Maddie & Tae
  4. A Night Like This – Caro Emerald
  5. Stickshifts and Safetybelts – Cake
  6. The Motown Song – Rod Stewart
  7. Morning Train – Sheena Easton
  8. My Give A Damn’s Busted – Jo Dee Messina
  9. Take This Job and Shove It – Johnny Paycheck
  10. Tears Dry On Their Own – Amy Winehouse
  11. Rhythm of My Heart – Rod Stewart
  12. Miniature Disasters – KT Tunstall
  13. Georgia on My Mind – Michael Bublé
  14. Quando, Quando, Quando – Michael Bublé
  15. If You Go Away (Ne Me Quitte Pas) – Caro Emerald
  16. Silent Sea – KT Tunstall
  17. Touch Me With Your Love (instrumental) – Beth Orton