Look out behind you, lady, it’s the Blob!
I was going to write about our second requirements session for the Carbon Copy Whiteboard. But I wanted to do something a little less structured. Requirements aren’t that important anyway. </sarcasm>
I’ll note that I write this post at a time when I really just don’t feel like writing a post. But I’m sticking with my Atwood blog diet of posting even if you don’t have anything to say or don’t feel like it.
Anyway, the Blob is better known as the Binary Large OBject. Its use is obvious: storing images, movies, PDFs, or any other kind of binary data. But of what possible use can CLOBs be? A CLOB is a Character Large OBject.
(this plus the title of the post should give away why this picture is here)
At work, CLOBs are really called BLOBs. I spent most of this past week dealing with blobbing and testing a blobber class. After this experience, I still don’t understand why I did what I did.
You see, at work, data access is segregated. Brown v. Board Of Education hasn’t taken hold yet. To retrieve data, you use stored procedures given to you by the database people. To create, update, or delete, you have to submit what we’ll call a “Modification Request.”
An MR is basically a character blob. The first x bytes have the name of the request. The next x bytes might have some header information. We need an example because this practice is still quite bizarre in my eyes.
Let’s say you have a Modification Request named “EQUIPMENT”. The first, say, 20 bytes of the MR will be reserved for the name of the request. EQUIPMENT doesn’t quite fill it out, so the extra space will be left blank. Yep. Blank.
Next might come a one-byte code that indicates what you’re doing. Let’s say C means Create, U means Update, and D means Delete. Imaginative, no? It is never this straightforward at work, though.
Next, three different segments might be defined. The first x bytes are dedicated to the Create Equipment case. We might reserve the first 256 bytes for description, the next 16 bytes for the serial number, and so forth. Since equipment might have sub-types, this gets even uglier.
Ditto for the Update and Delete cases. The point is that you can only do one at a time: you can only do a Create, Update, or a Delete. The other two unused areas are left blank. Yep. Blank.
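To make the layout concrete, here is a rough Java sketch of how such a record could be assembled. It is not the real API from work: the class name, the helper methods, and the idea that every segment is exactly description plus serial number are all assumptions made for illustration, and it treats one character as one byte.

```java
import java.util.Arrays;

// Hypothetical layout: field widths and segment order are assumptions for illustration.
final class EquipmentRequestLayout {
    static final int NAME_WIDTH = 20;          // request name, space-padded
    static final int DESCRIPTION_WIDTH = 256;  // widths taken from the example above
    static final int SERIAL_WIDTH = 16;
    static final int SEGMENT_WIDTH = DESCRIPTION_WIDTH + SERIAL_WIDTH;

    // Left-justify a value in a fixed-width field, padding with spaces.
    static String pad(String value, int width) {
        if (value.length() > width) {
            throw new IllegalArgumentException("value too long for field: " + value);
        }
        StringBuilder sb = new StringBuilder(value);
        while (sb.length() < width) {
            sb.append(' ');
        }
        return sb.toString();
    }

    // An unused segment: nothing but spaces.
    static String blank(int width) {
        char[] spaces = new char[width];
        Arrays.fill(spaces, ' ');
        return new String(spaces);
    }

    // Build a Create request: only the Create segment carries data.
    static String create(String description, String serial) {
        return pad("EQUIPMENT", NAME_WIDTH)
                + "C"
                + pad(description, DESCRIPTION_WIDTH) + pad(serial, SERIAL_WIDTH) // Create segment
                + blank(SEGMENT_WIDTH)   // Update segment, unused
                + blank(SEGMENT_WIDTH);  // Delete segment, unused
    }
}
```

With these made-up widths, the whole record is 837 characters long, and 544 of them are the two unused segments, before you even count the padding inside the fields that are used.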
An MR is then submitted, which means it goes to some database somewhere and is executed by a single stored procedure or a small group of them. The code for this is oddly and dangerously OO: request.submit(). I was shocked. Why isn’t there a RequestSubmitter.submit(request)? </sarcasm>
Anyway, I am not privy to the details, but I imagine some non-trivial database code is required to parse this pile of goo to get the work done.
Why would you ever do it this way?
I dare not question this out loud at work, lest someone with a thin skin overhear me and take offense. A separate but equally troubling practice is to store some data as raw XML in the database. I have heard the justification for this, though: apparently, changing the tables directly is a big pain.
The only reason I can remotely think of is that the indirection of the stored procedure wasn’t enough, that things are so turbulent over there in data that constantly revising stored procedures was too time consuming.
So instead of a plethora of stored procedures that were constantly changing, let’s pass blobs around that account for every possible thing those mid-tier guys might need to do.
So instead of using stored procedures to create, update, and delete, you have to use a blobbing API to create this huge pile of character goo, two-thirds of which is all blank. But how do you write a unit test for this?
My tech lead showed me one person who actually counted characters and asserted that certain substrings were what they were supposed to be. Last week, I opted to use the API’s toXML() method and chop up the tree with DOM. Time-consuming, yes, but absolutely necessary in my eyes because the Modification Request was so complicated that testing all possibilities would’ve resulted in over 200 test cases.
I used orthogonal arrays to cut this down by an order of magnitude, but still, writing XML tree-walking code just to unit test something doesn’t feel right to me.
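For flavor, the tree-walking looks roughly like the sketch below. The real toXML() output is not shown here, so the helper class and whatever element names a test passes in are invented; only the javax.xml and org.w3c.dom calls are the standard JDK ones.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical test helper; the element names a test looks up are assumptions.
final class RequestXmlAssertions {
    // Parse an XML string into a DOM tree, wrapping checked exceptions for test use.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse request XML", e);
        }
    }

    // Trimmed text of the first element with the given tag, or null if absent.
    static String textOf(Document doc, String tag) {
        NodeList nodes = doc.getElementsByTagName(tag);
        return nodes.getLength() == 0 ? null : nodes.item(0).getTextContent().trim();
    }
}
```

A test can then assert on something like textOf(doc, "serial") instead of counting characters into the blob, which at least survives a field width changing.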
At the very least, this gave me a chance to exercise some design freedom. The suggested design was a single blobber to handle all three cases, but my eye couldn’t stop twitching as I started to write this. I gave in to myself and broke each case into its own blobber: CreateEquipmentBlobber, ModifyEquipmentBlobber, DeleteEquipmentBlobber. They all extend from an EquipmentBlobber abstract class.
Then, I have an EquipmentBlobberFactory to choose the right one to give the client.
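Stripped down to the bones, the shape is something like this. The class names are the ones above; the Action enum, the actionCode() method, and the forAction() factory method are placeholders I am assuming for the sketch, not the real signatures.

```java
// Hypothetical action enum; the real request codes are not this simple.
enum Action { CREATE, MODIFY, DELETE }

abstract class EquipmentBlobber {
    // One-character code written into the request (C/U/D in the example earlier).
    abstract char actionCode();
}

final class CreateEquipmentBlobber extends EquipmentBlobber {
    @Override char actionCode() { return 'C'; }
}

final class ModifyEquipmentBlobber extends EquipmentBlobber {
    @Override char actionCode() { return 'U'; }
}

final class DeleteEquipmentBlobber extends EquipmentBlobber {
    @Override char actionCode() { return 'D'; }
}

final class EquipmentBlobberFactory {
    // Clients only ever see the abstract type; the concrete class stays hidden.
    static EquipmentBlobber forAction(Action action) {
        switch (action) {
            case CREATE: return new CreateEquipmentBlobber();
            case MODIFY: return new ModifyEquipmentBlobber();
            case DELETE: return new DeleteEquipmentBlobber();
            default: throw new IllegalArgumentException("unknown action: " + action);
        }
    }
}
```

The point of the factory is that a client asks for a blobber by intent and never names a concrete subclass.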
Also, from a larger perspective, the fact that storage and retrieval are segregated gives a prime chance to introduce the Repository into this codebase as a useful abstraction. I have a Repository that delegates its retrieval to the data access object that represents the stored procedure and its storage to a Modification Request Factory.
This is so nice. Now all data access, retrieval or storage, is centralized. Who’d’ve thought?
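In sketch form, the Repository is little more than two delegations. Everything named here (Equipment's fields, EquipmentDao, ModificationRequestFactory, the method names) is a stand-in I am assuming; only request.submit() comes from the real code.

```java
// Hypothetical domain object with just enough state for the sketch.
final class Equipment {
    final String serial;
    final String description;

    Equipment(String serial, String description) {
        this.serial = serial;
        this.description = description;
    }
}

// Read side: wraps the stored procedure handed over by the database people.
interface EquipmentDao {
    Equipment findBySerial(String serial);
}

// Write side: a Modification Request knows how to submit itself.
interface ModificationRequest {
    void submit();
}

interface ModificationRequestFactory {
    ModificationRequest createFor(Equipment equipment);
}

// One place for callers to go, whether they are reading or writing.
final class EquipmentRepository {
    private final EquipmentDao dao;
    private final ModificationRequestFactory requests;

    EquipmentRepository(EquipmentDao dao, ModificationRequestFactory requests) {
        this.dao = dao;
        this.requests = requests;
    }

    Equipment find(String serial) {
        return dao.findBySerial(serial);       // retrieval goes to the DAO
    }

    void add(Equipment equipment) {
        requests.createFor(equipment).submit(); // storage goes through an MR
    }
}
```

Callers no longer need to know that reads and writes travel by completely different routes.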
The way it is currently done is to access the DAO directly from either a Service or the Application Layer. The Modification Request is usually accessed from the Application Layer only. Again, I ask you why, why are these two highly related concepts segregated?
Of course, this implies that my Repository has to live as high as the Application Layer to avoid having Services or Domain Objects referencing each other, which is a no-no in SOA land.
Finally, the last design challenge was how to handle the main createBlob() method in the EquipmentBlobber. You see, the ModifyEquipmentBlobber takes in two Equipment objects: one that is the old object, and one that is the new object. This way, you can find out what changed and only send that to the database.
OK, let me get this straight. We’re so concerned with efficiency that we don’t want to send unnecessary updates to storage, but we’re perfectly fine sending thousands of blank spaces?!?
On second thought, I don’t think I want to get that straight.
Anyway, the dilemma is that two classes should have a createBlob(Equipment) method, and one should have a createBlob(Equipment, Equipment) method. Yet, I don’t want clients downcasting. That destroys the purpose of hiding the type through the EquipmentBlobberFactory.
My first lame solution was to use one createBlob(List) and enforce ordering on the elements of the list, i.e. if you’re creating or deleting, List[0] had to point to Equipment. If you were modifying, List[0] points to the old Equipment, and List[1] points to the new Equipment.
That sucks. What I ended up doing is to create an EquipmentHolder interface. The implementers are either SingleEquipmentHolder or ModifyEquipmentHolder. The client, in this case, is coupled to the specific implementation of the holder. They have to be — after all, the client knows whether they are modifying or creating or deleting.
This gives you createBlob(EquipmentHolder). There is a downcast, though, in the sub-classes. But, that’s the breaks.
Oh wait, there was one other very very very familiar design problem I faced. How do you get the Equipment out of EquipmentHolder?
I almost wrote a getter. Then I stopped myself. I used Double Dispatch. Blobber.createBlob(EquipmentHolder) calls EquipmentHolder.createBlobWith(this), which in turn calls Blobber.createUsing(this.old, this.new) or Blobber.createUsing(this.equipment), depending on the EquipmentHolder.
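Here is a cut-down sketch of that call chain. The method names follow the ones above, but the return type, the placeholder strings standing in for real blobs, and the choice to throw on a mismatched holder are assumptions of mine. The fields are called oldEquipment and newEquipment because new is a reserved word in Java.

```java
final class Equipment {
    final String serial;
    Equipment(String serial) { this.serial = serial; }
}

interface EquipmentHolder {
    // Second dispatch: the holder decides which createUsing overload to call.
    String createBlobWith(EquipmentBlobber blobber);
}

abstract class EquipmentBlobber {
    // First dispatch: clients call this and never see what the holder contains.
    final String createBlob(EquipmentHolder holder) {
        return holder.createBlobWith(this);
    }

    // Each blobber overrides only the shape it supports (an assumption of this sketch).
    String createUsing(Equipment equipment) {
        throw new UnsupportedOperationException("single Equipment not supported");
    }

    String createUsing(Equipment oldEquipment, Equipment newEquipment) {
        throw new UnsupportedOperationException("old/new pair not supported");
    }
}

final class SingleEquipmentHolder implements EquipmentHolder {
    private final Equipment equipment;
    SingleEquipmentHolder(Equipment equipment) { this.equipment = equipment; }

    @Override public String createBlobWith(EquipmentBlobber blobber) {
        return blobber.createUsing(equipment);
    }
}

final class ModifyEquipmentHolder implements EquipmentHolder {
    private final Equipment oldEquipment;
    private final Equipment newEquipment;

    ModifyEquipmentHolder(Equipment oldEquipment, Equipment newEquipment) {
        this.oldEquipment = oldEquipment;
        this.newEquipment = newEquipment;
    }

    @Override public String createBlobWith(EquipmentBlobber blobber) {
        return blobber.createUsing(oldEquipment, newEquipment);
    }
}

final class CreateEquipmentBlobber extends EquipmentBlobber {
    // Placeholder output; a real blobber would build the fixed-width record.
    @Override String createUsing(Equipment equipment) {
        return "C:" + equipment.serial;
    }
}

final class ModifyEquipmentBlobber extends EquipmentBlobber {
    @Override String createUsing(Equipment oldEquipment, Equipment newEquipment) {
        return "U:" + oldEquipment.serial + "->" + newEquipment.serial;
    }
}
```

Nothing ever asks the holder for its Equipment. The holder hands its contents to the blobber, so there is no getter, and in this version a wrong pairing fails loudly instead of being silently cast.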
Oh yes, the code review with my tech lead is going to be pretty entertaining. Want to buy a front row seat?