31.5.06. Third Time, Slow Time, Inventing, Revisiting, and Fighting

A couple of months ago, we started work on a major rewrite of one of our core sites. What started as a small and specialized e-commerce service a few years ago has grown mightily over those same years. The architecture, however, has hit a limit - especially when compared to the large push for what we want to do (and can do) next. The current site works great and handles its load and duties very well. But there are small annoyances internally with some things, and there is some growth and market potential that it cannot satisfy.

The first version of the site’s primary work was done in less than a month. The main focus of that version was the visitor’s experience. The original HTML mockups drove a lot of the site, and the implementation was primarily done in Zope 2, through the web, using templates, scripts, and SQL Methods. It was a nice enough system for the time - we were very fast with it, we could tweak elements in each other’s offices, etc. It ended with an all-night debug-and-deploy session, fraught with all sorts of strange problems and angst and fun. But it worked. We continued to work on it over the next few weeks, of course, filling in some major holes that couldn’t be filled in initial development and responding to site usage.

The biggest problem with that implementation was the administration side. Since we were the only users of the administration screens, we punted on that issue. There was no form validation. All of the database CRUD statements were entered manually. Zope’s SQL Methods made this a bit easier, but their main optimization is for read-queries. Any new field required numerous code updates all over the place. The whole business was a little messy back then - requiring us to print out certain reports and deliver them to our providers on time-constrained schedules.

The site was never empty, but it got perilously close a few times during the lifetime of version 1. But there was enough interest to keep it going. With eyes on expanding and getting new providers, with providing new services (including physical goods), and with running the business side of things better, we went into version 2.

As I saw all of the HTML forms and database statements I’d have to change just to accommodate the expanding and changing data requirements, I started looking at options for easing that pain. I ultimately rolled my own system focused primarily on the common CRUD operations. There was no object-relational mapping solution at the time that I liked. A problem, at that time, was that we had many queries in place that would not work in the object-relational tools available.

What I really wanted was an architecture that:

  1. Helped me get important / core business and administration logic out of the ZODB. We wanted to have as much of the core software on disk and under source control as possible.
  2. Provided dynamic form and SQL generation for common tasks (primarily in the administration section). I wanted this so that adding new fields to the database schema required only one or two changes.
  3. Still allowed us to use Zope’s SQL Methods for generating complex queries.
  4. Could be introduced in a way that the public application might not even be aware of: instead of a folder full of Python scripts in the ZODB, there’d be a single persistent instance of a class that had many of those scripts as methods.

The resulting framework was essentially a service layer composed of very rough Table Data Gateways. You couldn’t load an object out and do a jeff.hair = 'bleached' and have it save. The get/create/save statements basically took a dictionary (hash table) of values to save and would flush it out to the database, after harvesting some information. It was expected that the gateways would have the data prepared before passing it off to the lowest level interface, handle_op, which would perform the requested operation.
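
To make that shape concrete, here is a minimal sketch of a dictionary-based Table Data Gateway. It is not the framework's actual code: only the get/create/save operations and the name handle_op come from the description above. The TableGateway class, its prepare method, the people table, and the use of sqlite3 are all assumptions made so the example can run on its own.

```python
import sqlite3


class TableGateway:
    """Rough Table Data Gateway: one instance per table, dictionaries in and out.

    Hypothetical sketch. Table and column names come from the gateway
    definition, never from user input, so they are interpolated directly.
    """

    def __init__(self, conn, table, columns, key="id"):
        self.conn = conn
        self.table = table
        self.columns = list(columns)  # the one place a new field is declared
        self.key = key

    def prepare(self, values):
        # Keep only known columns so stray form fields never reach the SQL.
        return {k: v for k, v in values.items() if k in self.columns}

    def handle_op(self, op, values, key_value=None):
        """Lowest-level interface: build and run the SQL for one operation."""
        cur = self.conn.cursor()
        if op == "create":
            names = ", ".join(values)
            marks = ", ".join("?" for _ in values)
            cur.execute(
                f"INSERT INTO {self.table} ({names}) VALUES ({marks})",
                list(values.values()),
            )
            return cur.lastrowid
        if op == "save":
            # Only the values passed in are updated; no hand-written UPDATE.
            sets = ", ".join(f"{name} = ?" for name in values)
            cur.execute(
                f"UPDATE {self.table} SET {sets} WHERE {self.key} = ?",
                list(values.values()) + [key_value],
            )
            return key_value
        raise ValueError(f"unknown operation: {op}")

    def create(self, values):
        return self.handle_op("create", self.prepare(values))

    def save(self, key_value, values):
        return self.handle_op("save", self.prepare(values), key_value)

    def get(self, key_value):
        names = [self.key] + self.columns
        row = self.conn.execute(
            f"SELECT {', '.join(names)} FROM {self.table} WHERE {self.key} = ?",
            (key_value,),
        ).fetchone()
        return None if row is None else dict(zip(names, row))


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT, hair TEXT)")
people = TableGateway(conn, "people", ["name", "hair"])

jeff_id = people.create({"name": "Jeff", "hair": "brown", "submit": "ignored"})
people.save(jeff_id, {"hair": "bleached"})  # a dictionary, not jeff.hair = ...
jeff = people.get(jeff_id)
```

Because the column list lives in one place, adding a field in a design like this means touching the schema and the gateway definition, which is roughly the "one or two changes" goal from the list above.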

When dealing with fairly set data, such as a form, this didn’t really matter. I cared about loading and binding the data to the widgets when a page was loaded, and then parsing/validating/converting that data on save. For that, this system worked like a champ. For some other pieces of business logic, it did OK. It wasn’t great, but it was better than what we had before - especially because it didn’t require specialized update SQL to be written for those situations where just a couple of values needed to be changed.

That site has served us fine, and now the offerings on the site are quite full (many many pages). The content providers have access to their items and reports, so we no longer have to run around town delivering reports and lists.

Both versions were deployed, in full, and then continuously massaged over time. There was no ‘beta’. Some features sneaked in after the deployment deadlines. Some specialized parts of the system were overhauled a couple of times to deal with scaling issues - ones that we could never have predicted.

We’ve kept laundry lists over the years of the things we wanted to do, but couldn’t under the first two designs. Some of these came from dealing with some hacks pulled together for a couple of special customers, some came from wishful thinking, some came from complicated set-ups required to make certain rare items available for purchase in the existing system. In a Getting Real sense, this was OK, but not great. It limited our ability to start working with larger and more specialized providers.

Architectures have continued to improve over the past couple of years as well. When Zope 3, version 3.1 specifically, came out, we were able to do a major rewrite of a content system for one of our oldest customers. We were even able to pick up additional related customers as a result. We’ve delivered other solutions on top of Zope 3 and the ZODB (Zope Object Database) that have been both impressive and fun to work on, although sometimes Zope 3 can still cause me to go off in a huge screaming match.

I had hoped (and still wish) that for this particular site’s version 3 implementation we could ditch the relational database completely and just use the ZODB. Instead of classic Zope ZODB usage, where scripts and templates fill up the database, Zope 3 makes it much easier to keep that stuff OUT, and then easier to keep real data in. This is one of the nicer parts about working with the Zope 3 / ZODB stack: Python is just Python is just Python. There’s no translation layer and no tricks trying to fake inheritance. There’s no real worry about translating statements into a query language - Python is the query language (with tools like the Catalog providing application level indexes for larger queries). I had an application that I was working on in Zope 3 that I thought would be a good testbed for writing as a Rails app - until I realized that Active Record’s inheritance model could never match my object model, at least as of Rails 1.0. I also played a little bit with Python’s TurboGears and SQLObject stacks, and they had me feeling pissed and frustrated within seconds. No offense to their authors, but they just did NOT work for me.

But the decision was made that for this site, we would continue with using a relational database. Well, I knew the strengths and limitations of our current systems. And I admit, I was envious of Ruby on Rails. In fact, I even suggested that we ditch Zope for this project and use Rails instead! But it was feared that there was no time on the schedule to learn Ruby. We did, however, agree on wanting a real object-relational system, or at least something more object oriented than what we used for Version 2. Version 3 has richer business logic requirements, and I’m happiest when that code can look as clean and natural as possible. I also wanted it to be easy to define new pages and views and to have to go through less guess-and-hope work than Zope 3 (as of 3.2) typically requires.

So we’ve birthed yet another in-house framework because there’s little out there that’s satisfactory. The one thing that is satisfactory is SQLAlchemy. I built a base storage framework on top of that which allowed us to use SQLAlchemy fairly transparently from within Zope. Fields and properties now let us define and use relationships fairly transparently. Items get bound to their context, are easily traversable, and can even masquerade as Zope containers without interfering with SQLAlchemy’s own system of managing work. We also birthed a base web framework that provides some useful base classes for constructing the kinds of pages and sub-views that we need with ease. These base classes corral some core Zope 3 features and some other in-house features together to ease our web development.

It’s all pretty cool, for what it is. But it’s going so slowly that I can’t help but wonder if we would have been better off working in Rails. I still haven’t found anything in the Python world that I’d leave Zope for, but Zope 3 continues to jump between being insanely cool and powerful and flexible, and being incredibly frustrating and aggravating. That I’ve been able to pull many of the tricks I’ve already been able to pull is a testament to the better parts of its design.

Even with all of my tricks and base classes, it can sometimes still take an entire morning just to get what seems like a simple page together and rendering. On the other hand, once you get pieces in place they’re pretty sturdy. There’s low likelihood of accidentally overriding a critical method name or a view/page that applies to a different context or interface layer.

But with this development going so seemingly slowly, I start to wonder why it feels that way. I really was trying to give myself a system where I could enjoy the benefits of Zope 3’s Component Architecture with an intelligent database / object-relational mapper while also enjoying the benefits of Getting Real. “Getting Real” isn’t anything really new to me (or many people). It just helps give you the OK to say “no”. Martin Fowler’s book, Refactoring, made me realize that bad smells are OK now, so long as they’re cleaned up later. In the interest of getting something done, it’s hard to go for purity. Or in the case of Python’s “import this”:

Special cases aren’t special enough to break the rules.
Although practicality beats purity.

So, what is the problem with me then?

I think it comes down to a very special case: the major rewrite. This is not a new application for us, nor is it a moderate upgrade or maintenance release. We’ve acknowledged that the existing architecture has issues when it comes to certain growth paths. To this point, we’ve also been good in saying “no” to those growth paths. But business situations and opportunities have changed.

One problem with the rewrite is the knowledge that it’s already been done. “We already did this three years ago, why can’t you show me it working in the new system already?” The senses of worry, fear, and excitement all change. This is further exacerbated by the fact that these kinds of rewrites are often architectural, perhaps deeply architectural. All of those agile decisions made in earlier versions no longer apply, since many of those decisions may have led to the limitations you’re trying to overcome. Those decisions were right at the time, and have been right for the past few years. But now you realize that being able to do more promotions, discount options, and special offers is among the things you really want to do with the cart. You also need to handle more delivery options instead of the two you’ve used for the past two years. You fight and push and think and tango with the cart, just focusing on shipping, with coupons in the back of the mind. Part of you thinks “I just want to add a shipping column / attribute here and be done with it, and we’ll expand it later…. Oh wait… This IS later… Oh yeah, we need to support more options… Oh yeah, this is one of the growth areas we’re focusing on. Crap!”

It’s an interesting struggle now, those two sides in my mind. In practical terms, I just want to add an extra column or attribute and be done. But I know in a week it’ll already be pushed to its limits. Better to apologize for lack of screens and work this the hell out now. Nothing fancy - keep the interfaces and collaborators simple, and expand on them if needed when more data presents itself.
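
As a rough illustration of that middle path (not code from this project; every name below is made up for the example), the cart can hold a small shipping collaborator with a single cost method instead of a bare shipping column. New delivery options then become new small classes rather than changes to the cart itself.

```python
from dataclasses import dataclass, field
from decimal import Decimal
from typing import Optional, Protocol


class ShippingOption(Protocol):
    """The whole interface the cart needs from a shipping collaborator."""

    name: str

    def cost(self, subtotal: Decimal) -> Decimal: ...


@dataclass
class FlatRate:
    name: str
    rate: Decimal

    def cost(self, subtotal: Decimal) -> Decimal:
        return self.rate


@dataclass
class FreeOver:
    """Flat rate that drops to zero once the subtotal reaches a threshold."""

    name: str
    rate: Decimal
    threshold: Decimal

    def cost(self, subtotal: Decimal) -> Decimal:
        return Decimal("0") if subtotal >= self.threshold else self.rate


@dataclass
class Cart:
    prices: list = field(default_factory=list)
    shipping: Optional[ShippingOption] = None

    def subtotal(self) -> Decimal:
        return sum(self.prices, Decimal("0"))

    def total(self) -> Decimal:
        # The cart only asks "what does shipping cost?", never how.
        return self.subtotal() + self.shipping.cost(self.subtotal())


cart = Cart(
    [Decimal("20.00"), Decimal("35.00")],
    FlatRate("ground", Decimal("5.00")),
)
flat_total = cart.total()

cart.shipping = FreeOver("promo", Decimal("5.00"), Decimal("50.00"))
promo_total = cart.total()
```

The interface stays at one method until real data argues for more, which is the "nothing fancy" part.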