Garbage in – default out?

We were freshly into production with this system that had a lot of moving parts. The whole system consisted of several big monolithic applications and half a dozen newer microservice-style services integrated into a “living organism”.

A user was not getting the results she expected. We investigated the problem and found out that a service we were calling was not returning some of the data we were interested in. In particular, data older than three months was not seen by us. A support person for the service said: ‘you should use wide search if you want to see the older items’. But wait – we were using wide search, always had been!

I double-checked the code and sure enough, it did say widesearch=”true”. Strange. So I blew the dust off the API spec and found this: widesearch=”1|0”. Yikes, the value should be a number! And I had coded this, totally my bad. I guess I’ve grown too used to using true and false for boolean values in XML. We were a month into production when we found out we were not using wide search. Never had been!  

Why did we not get an error when we called the service with bad parameters? 1 and 0 look like numbers to me, but the service had been happily accepting the string “true” all along. I decided to run a little experiment against the service in the test environment:

<request key="..." widesearch=”anything goes?” />
<response>OK ...valid search results...</response>. 

Okay, so it seems there is no validation of the parameter whatsoever. The code probably reads something like this:

if("1".equals(request.attr("widesearch")) ...

Why did we not find this bug during testing? This integration point was hammered a lot because during the project there were a lot of version updates on both sides of the API fence. Before going into production, the integration was in use for at least half a year in the testing environment. Somehow we just managed to not hit the three-month limit. For us ‘widesearch’ was just a mandatory constant we had to set in the API, not a thing that would be interesting to test.

Now, imagine that the service actually checked that the parameters in the API call have correct data types. We would have found our bug on day one of testing. Actually earlier – the first time we tried to call the test service. Silently choosing a default when the input is garbage does a big disservice to making the whole system robust.

When computers talk to computers it is better to be strict in the protocol. Problems get flushed out way quicker.

Advertisement is great but … always RTFM

Back in the good (?!?) old days of C++ and WinAPI … I was reviewing this bit of code. There was a Windows API call to some registry manipulation function. The next line was an identical call to the exact same function with the exact same parameters and a comment /*this must be here*/. Whaaat?

A quick peek to the MSDN documentation and it was immediately obvious: “oh, he’s disregarding the out-parameter and the return value”. The developer had not read the documentation and therefore did not understand how the API worked.

This was the day I realized not all developers read the documentation of the libraries and tools they use. Sometimes people just fiddle with it until it works… or at least seems to work. Since then I’ve encountered lots of examples of hack-it-until-it-obeys gone wrong. Sometimes you see very strange and unnecessarily complex log4j or log4net configuration files due to the writer not understanding the logger hierarchy and how appenders relate to it.

When I use a tool that is new to me, I like to try to get my mental model aligned with the tool developers. Usually you can get the big picture by reading just the introductory chapters of the manual.

Libraries can also hide nasty surprises buried deep in the documentation. Like this bit from the jQuery css()-method:

“When a number is passed as the value, jQuery will convert it to a string and add px to the end of that string.”

Except, it is not really buried very deep. It is the third paragraph in the method description. I do not tend to read API documentation front to back like a book but anytime I use a method I have not used before, I like to see what the API developers had to say about it.

Browsing the documentation can also give you positive surprises. Fast forward to the land of C# in 2015. I was reviewing some ASP.NET MVC code together with the team member who wrote it. It looked alright to me. The code was using System.Web.Mvc.OutputCacheAttribute.

[OutputCache(Duration = 8*3600)] /* 8 hours */
public ActionResult GetCoolStuff() { … }

I said: “Looks OK to me… … … BTW, have you used this cache thing before? Have you read the API docs?”
“err… no.”
“Well, let’s go see if we find anything useful.”

No need to wade through all the methods & properties, ‘remarks’ is where the important stuff is. In this case we found this sentence: “To avoid code duplication, you can set a cache profile in the Web.config file instead of setting cache values individually in pages.” Hey, if we use that we can change the cache invalidation time on the fly.

[OutputCache(CacheProfile = "CoolStuffOutputCache")]
public ActionResult GetCoolStuff() { … }

It took us all of 2 minutes to browse to MSDN and read the remarks section of the OutputCache class. We gained the ability to change the cache settings on the fly. The original version would have needed a recompile if we wanted to change the cache invalidation time.

Known your tools. Stackoverflow is awesome but to really know your way around you must Read The Fine Manual.