Friday, November 18, 2011

How many bugs can you find?

Working in C++ you are given an "immutable string" class, which we'll call "IMS", that you're forced to work with for some kind of bureaucratic BS reasons that you probably disagree with. You don't know much about it other than it's a class written in C++ which stores a string and doesn't allow it to be modified. Management has heard, and ignored, your suggestion to just use a "const std::string," because that sounds complicated and they're management, so just do it. A coworker approaches and explains that they've figured out how to do string equality comparisons with the new IMS class, and shows you the code they drafted up to show how it works. See how many bugs you can find.

IMS istr("hello");
if(istr.c_str() == "hello")
   std::cout << "it worked!" << endl;

Note that when run, "it worked!" gets printed out, and note that the "IMS::c_str()" function returns a char*. See if you can answer these questions:

  1. Why won't this type of string comparison work?
  2. What exactly happens when the comparison in the if statement is executed?
  3. Why does "it worked!" get printed out?
  4. What bug can you conclude is in the IMS class?
The questions are sorted in order of difficulty. Detailed answers follow, so stop here if you want to try to figure them out on your own..

Questions 1 and 2 are closely related. What happens in the istr.c_str() == "hello" boils down to a simple pointer comparison. In C++ when pointers are compared with the == operator, they are just compared numerically. There is no pointer dereferencing or anything like that, the pointers are just compared as numbers. So the only way this comparison can evaluate to true is if both pointers point to the exact same memory location; it has nothing to do with comparing strings. To do a string comparison on a pair of c strings, you need to dereference the pointers and step through the strings comparing the characters along the way. A function like strcmp() or strncmp() would be a good solution.

Question 3 is tricky because it's not specified C++ behavior that is causing it; it's a compiler optimization. When the compiler encounters the string literal "hello" on the first line, it stores it somewhere in the text segment of the program's memory space. When it encounters it again on line 2, the compiler is smart enough to identify that it has already stored that string literal, so instead of duplicating it, it just uses a pointer to the first one. This is possible because the text segment is read-only, so neither pointer can be used to modify the string which would mess it up for the other pointer.  The comparison passes because it's comparing two pointers which are pointing to the exact same memory location in the text segment. Knowing this, we can solve question 4.

The IMS class is supposed to be an "immutable" string, but let's look at what's going on in this short code segment. An IMS is instantiated with a char* pointing to the string "hello".  On the next line when we call c_str() on that IMS object, we get back the exact same pointer value pointing to the exact same "hello" string. Apparently the IMS class is just using whatever memory the given char* is pointing to to store it's string internally. The problem is, the IMS class has no idea where that char* came from, what memory it's pointing at, who else has access to it, and most importantly, it has no idea if it will be modified. For all it knows, the char* could be pointing to a string being temporarily stored on the stack, rendering the "immutuable" string anything but.

4 comments:

  1. The string most certainly doesn't get stored in .text, that's for code (instructions) only, not data. It's stored in the .rdata section (Windows) aka the .rodata section (ELF) which is for read-only data.

    ReplyDelete
  2. hey rhomboid, I probably shouldn't have glossed over that point so quickly but this behavior is actually implementation defined. The compiler can pick where it wants to put the string literals and the text segment and read-only data segment are both valid choices.

    ReplyDelete
  3. Well yes, it is implementation defined, but saying it goes in the text segment is just plain wrong. As the wiki page says (that you linked to) the text segment is for executable instructions. Strings are not executable instructions. They don't go there.

    ReplyDelete
  4. Devil's advocate: You're making a few assumptions that just glancing at the code can't be verified.

    If management is *really* devious/inept, then c_str() might not actually behave as std::string's function. c_str() might stand for "changeable string" which returns a mutable version of the immutable string (e.g., an "MS" object). This MS object might then overload the == operator, thus causing the comparison to behave as expected (i.e., not just comparing addresses).

    ReplyDelete