16.10.08. Leaps and Pains (or - changing development/deployment and scm tools to more closely realize the component architecture dream)

A year or more ago, I was really struggling with zc.buildout, a Python-based tool for building out "repeatable" deployments. Buildout makes setuptools actually usable, particularly for development and deployment of web apps, although there are many other uses.

Buildout keeps everything local, allowing one app to use version 3.4.2 of one package while another app can use 3.5.2. But more than just being an 'egg' / Python package manager, it can do other tasks as well - local builds of tools (from libxml to MySQL and more), again allowing one app to build and use MySQL 5.0.x and another app to use 5.1.x; or just allowing an app to be installed onto a new box and get everything it needs, from web server to RDBMS to Memcached and beyond. We don't use all of these features (yet), but it's a nice dream.

Already it's very nice to be able to make a git clone of a customer app, run buildout, and then start it up. Buildout will put setuptools to work to ensure that proper versions of dependent components are installed (and, quite nicely, it's very easy to share both a download cache and a collection of 'installed eggs' - multiple versions living side by side, with individual buildouts picking the one they desire).
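
As a rough illustration of what that looks like, here's the sort of minimal buildout.cfg I mean - the package versions and shared-cache paths are made up for the example, but the option names are the real ones:

    [buildout]
    parts = app
    versions = versions
    # Shared across every buildout on the box; each buildout just picks
    # the eggs it wants out of these directories.
    eggs-directory = /home/shared/eggs
    download-cache = /home/shared/downloads

    [versions]
    # This app's pins; another app's buildout can pin different versions.
    SQLAlchemy = 0.4.8
    zc.table = 0.7.0

    [app]
    recipe = zc.recipe.egg
    eggs =
        SQLAlchemy
        zc.table

With something like this checked in alongside the code, the clone-and-build cycle described above is really all there is to it.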

But it was not easy to get to this golden land. Prior to using Buildout, we'd check our code out of our CVS repository. Our customer apps were just another Python package, nothing special (not an application, and - more importantly - not packaged up in 'distutils' style). As we started to make more and more reusable parts, we had to do a lot of checkouts; and so I wrote a tool to help automate this checkout process. It would also check out other third party code from public Subversion repositories; all because it was easier to check out a particular tag of 'SQLAlchemy' or 'zc.table' than to try to install them into a classic-style Zope 3 'instance home'.

But it was getting harder and harder to keep up with other packages. We couldn't follow dependencies in this way, for one thing; and it required some deep knowledge of some public SVN repository layouts in order to get particular revision numbers or tags.

'Buildout' promised to change all of that, and offer us the chance to use real, honest-to-goodness distributed Python packages/eggs. But getting there was so very hard with deadlines beating you down.

I took a lot of my frustration out on both Setuptools (which is so goddamn woefully incomplete) and Buildout. But the fault was really in ourselves... at least, in a way. As mentioned above, it was easier to just check out 'mypackage' into $INSTANCE_HOME/lib/python/mypackage than to figure out the install options for distutils/setuptools. As such, NONE of our code was in the Python 'distutils' style. We put some new packages into that style, but would still just check out a sub-path explicitly with CVS, just as we were doing with public SVN code.

A big part of what made it so difficult was that we had hung onto CVS for perhaps too long. Doing massive file and directory restructuring with CVS is too painful to contemplate, but moving to Subversion never seemed worth the effort, and so we stuck with CVS. Still, I knew I'd have to restructure the code someday.

Fortunately, Git arrived. Well, it had been there for a while; but it was maturing and quite fascinating, and it offered us a chance to leapfrog over SVN and into proper source code management. Git is an amazing tool (perhaps it seems all the more amazing after being chained to CVS for so long), and it gave me the opportunity to really restructure our code, including ripping apart single top-level packages into multiple namespaced packages (i.e., instead of 'example' being the root node with 'core' and 'kickass' subpackages, I could split that into 'example.core' and 'example.kickass' as separate packages and Git repositories while keeping full histories).
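
For anyone who hasn't done the namespace-package dance: each split-out repository ends up with a src/example/core/ tree, where src/example/__init__.py does nothing but declare the namespace, plus a setup.py along these lines (the 'example.core' name comes from the made-up example above; the version and layout details are just illustrative, not our actual files):

    # setup.py for the 'example.core' distribution (sketch).
    # src/example/__init__.py contains only the namespace declaration:
    #     __import__('pkg_resources').declare_namespace(__name__)
    from setuptools import setup, find_packages

    setup(
        name='example.core',
        version='0.1',
        packages=find_packages('src'),
        package_dir={'': 'src'},
        namespace_packages=['example'],
        install_requires=['setuptools'],
        zip_safe=False,
    )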

For a while, I used Git with its cvsimport and cvsexportcommit tools to clean up some of our wayward branches in CVS, while starting to play with Buildout. I was still struggling to get a Zope 3 site up and running using our frameworks. And here... well, the fault was partly in ourselves for having to go through fire to get our code into acceptable 'distutils' style packages, which made learning Buildout all the harder. But the available documentation (comprehensive, but in long doctest-style documents) for some of the Zope 3 related recipes was very difficult to follow. Hell - just knowing which recipes to use was difficult!

But after many months of frustrated half-attempts, often beaten down by other pressures, I opened a few different tabs for different core Buildout recipes in my browser and furiously fought through them all... And boom! Got something working!

Unfortunately it was one of those processes where by the time I got out of the tunnel, I had no idea how exactly I had made it through. One of my big complaints as I was struggling was the lack of additional information, stories of struggle and triumph, etc. And there I was - unable to share much myself! I can't even remember when I was able to break through. It's been quite a few months. Just a couple of weeks ago we deployed our last major old customer on this new setup; and we can't imagine working any other way now.

'Git' and 'Buildout' have both been incredibly empowering. What was hardest for us was that it was very difficult to make the move in small steps. Once we started having proper distutils-style packages in Git, they couldn't be cloned into an instance home as a basic Python package (i.e., we couldn't do the equivalent of cvs checkout -d mypackage Packages/mypackage/src/mypackage and get just that subdirectory). And we couldn't easily make distributions of our core packages and use them in a classic Zope 3 style instance home (I did come up with a solution that used virtualenv to mix and match the two worlds, but I don't think it was ever put to use in production).

So it was a long and hard road, but the payoffs were nearly immediate: we could start using more community components (and there are some terrific components/packages available for Zope 3); we could more easily use other Python packages as well (no need for some custom trick to install ezPyCrypto, or to be surprised when we deploy onto a new server and realize that we forgot some common packages). Moving customers to new server boxes was much easier, particularly for the smaller customers. And we can update customer apps to new versions with greater confidence than before, when we might just 'cvs up' from a high-level directory and hope everything updated OK (and who knows what versions would actually come out the other end). Now a customer deployment is a single Git repository - everything else is supplied as fully packaged distributions. It's now very hard to 'break the build', as all of the components that are NOT specific to that customer have to come from a software release, which requires a very explicit action.


1.10.08. Giddy-up 401, File Uploads, and Safari

I've recently been doing some work to support ZODB 3.8 BlobFiles in our Zope 3 based sites and applications. Doing this brought me back around to some behavior I've seen in the past and probably learned to ignore: uploading a large file from Safari using a basic HTML form (with proper encoding type, POST, etc) seems to take inexplicably long. Even worse - once behind Apache, you might not get the expected response, if any. You might get a 'timed out' response and be left unsure whether the app server got everything and will finish the request/response cycle on its own.

It turns out that Safari does not eagerly send authentication information along with each request when logged in with Basic Auth. When it does send credentials preemptively, it seems to do so only within a very short time window.

So say you're logged in to your application with basic auth (for better or worse). The normal pattern is that when encountering an unauthenticated situation, Zope will challenge with a 401 status code and the WWW-Authenticate header (or something like that - I'm away from all specs right now). If you're not logged in, then you get the basic auth dialog box and send along the credentials. If you are "logged in", then the browser doesn't ask for credentials again, but just sends them along.

The downside is that this causes a request to be repeated. And if you just tried uploading a 5 MB file, then that whole file has to be re-submitted to make the full request.
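
In other words, the wire-level exchange looks roughly like this (URL and realm made up) - note that the body goes over the wire twice:

    POST /upload HTTP/1.1                <- the whole 5 MB body, no credentials
    ...

    HTTP/1.1 401 Unauthorized
    WWW-Authenticate: Basic realm="example"

    POST /upload HTTP/1.1                <- the whole 5 MB body, again
    Authorization: Basic <base64 of username:password>
    ...

    HTTP/1.1 200 OK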

It's the right thing to do with the basic (ha!) information at hand - trying to post all of that data in a single request. But Safari should recognize that it's submitting to the same server (if not the same URL!) and should automatically include the auth headers. Safari seems to do this, but only for requests within very short windows.

Firefox, on the other hand, seems to have a much longer window in which it will send the credentials along automatically on the first request, instead of waiting for the challenge.

I don't know how other browsers do it. I'm not sure what the spec says, if anything. Glancing at O'Reilly's "HTTP: The Definitive Guide" didn't give any indication of whether it's better for the client to assume it should send the authentication tokens along with each request back to the same server, or whether it's better to hold off on constantly sending that info unless challenged.

Most of the time this doesn't really seem to matter - it's not something end users typically notice, as it goes by fast enough to rarely be seen. Of course there are other ways of getting credentials (cookies, sessions, subdomain/remote IP mapping, etc) which we often use on the main public-facing side of our sites. But for content management and admin, Basic Auth is such an easy fallback, especially in a framework like Zope (1, 2, or 3), which has long had a strong security story and would automatically handle this crap for you way back in the days of Bobo (1996, if not earlier).

It's just an annoyance. Glad I nailed it down to this: uploading large files with Safari (I think IE is, or was, similar) to basic-auth protected sites can often time out because the browser posts once, gets the 401 Unauthorized challenge, and does the post again - this time with the auth info.

Solutions:

  • don't use basic auth for sites that expect moderate to heavy uploading via forms.
  • recommend and/or use browsers that more often send the auth token along in the first request (see the sketch after this list).
  • provide better interfaces for uploading files, with better communication with the uploader about status, and perhaps a better interface into the destination web app. Fortunately there appear to be some free and open solutions out there already.
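
For scripted uploads, at least, you can be the well-behaved client yourself and send the credentials preemptively, so the server never needs to challenge and the body only goes over the wire once. A rough Python 2 / urllib2 sketch - the URL, credentials, and filename are all made up:

    import base64
    import urllib2

    url = 'http://cms.example.com/upload'            # illustrative URL
    credentials = base64.b64encode('editor:secret')  # illustrative credentials

    body = open('bigfile.pdf', 'rb').read()
    request = urllib2.Request(url, data=body)
    # A real browser form post would be multipart/form-data; the point here
    # is just sending the Authorization header up front, instead of waiting
    # for the 401 challenge and re-posting the whole body.
    request.add_header('Authorization', 'Basic ' + credentials)

    response = urllib2.urlopen(request)
    print response.code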

Wheee!
