16.10.08. Leaps and Pains (or - changing development/deployment and SCM tools to more closely realize the component architecture dream)

A year or more ago, I was really struggling with zc.buildout, a Python-based tool for building out "repeatable" deployments. Buildout makes setuptools actually usable, particularly for development and deployment of web apps, although there are many other uses.

Buildout keeps everything local, allowing one app to use version 3.4.2 of one package while another app can use 3.5.2. But more than just being an 'egg' / Python package manager, it can do other tasks as well - local builds of tools (from libxml to MySQL and more), again allowing one app to build and use MySQL 5.0.x and another app to use 5.1.x; or just allowing an app to be installed onto a new box and get everything it needs, from web server to RDBMS to Memcached and beyond. We don't use all of these features (yet), but it's a nice dream.

Already it's very nice to be able to make a git clone of a customer app, run buildout, and then start it up. Buildout will put setuptools to work to ensure that proper versions of dependent components are installed (and, quite nicely, it's very easy to share both a download cache and a collection of 'installed eggs' - multiple versions living side by side, with individual buildouts picking the one they desire).
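To give a feel for it, a customer buildout boils down to a short config file. A stripped-down sketch (the part names, package names, paths, and version pins here are made up for illustration, not lifted from a real deployment):

[buildout]
parts = app
# the shared caches can also be set once in ~/.buildout/default.cfg, so that
# every buildout on the box reuses the same downloads and installed eggs
eggs-directory = /home/shared/eggs
download-cache = /home/shared/downloads
versions = versions

[versions]
# each buildout pins its own versions; another app's buildout can pin others
SQLAlchemy = 0.4.6
zc.table = 0.7.0

[app]
recipe = zc.recipe.egg
eggs = mycustomerapp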

But it was not easy to get to this golden land. Prior to using Buildout, we'd check our code out of our CVS repository. Our customer apps were just another Python package, nothing special (not an application, and - more importantly - not packaged up in 'distutils' style). As we started to make more and more reusable parts, we had to do a lot of checkouts, and so I wrote a tool to help automate this checkout process. It would also check out other third-party code from public Subversion repositories - all because it was easier to check out a particular tag of 'SQLAlchemy' or 'zc.table' than to try to install them into a classic-style Zope 3 'instance home'.

But it was getting harder and harder to keep up with other packages. We couldn't follow dependencies this way, for one thing; and it required deep knowledge of particular public SVN repository layouts in order to get specific revision numbers or tags.

'Buildout' promised to change all of that, and offer us the chance to use real, honest-to-goodness distributed Python packages/eggs. But getting there was so very hard with deadlines beating us down.

I took a lot of my frustration out on both Setuptools (which is so goddamn woefully incomplete) and Buildout. But the fault was really in ourselves... at least, in a way. As mentioned above, it was easier to just check out 'mypackage' into $INSTANCE_HOME/lib/python/mypackage than to figure out the install options for distutils/setuptools. As such, NONE of our code was in the Python 'distutils' style. We put some new packages into that style, but would still just check out a sub-path explicitly with CVS, just as we were doing with public SVN code.

A big part of the problem, and what made it so difficult, was that we had hung onto CVS for perhaps too long. Doing massive file and directory restructuring with CVS is too painful to contemplate. But moving to Subversion never seemed worth the effort, and so we held on to CVS. Still, I knew I'd have to restructure the code someday.

Fortunately, Git arrived. Well, it had been there for a while; but it was maturing, it was quite fascinating, and it offered us a chance to leapfrog over SVN and into proper source code management. Git is an amazing tool (perhaps made to seem all the more so by our having been chained to CVS for so long), and it gave me the opportunity to really restructure our code, including ripping apart single top-level packages into multiple namespaced packages (i.e., instead of 'example' being the root node with 'core' and 'kickass' subpackages, I could split that into 'example.core' and 'example.kickass' as separate packages and Git repositories while keeping full histories).
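Getting those pieces into honest distutils/setuptools shape mostly meant giving each one a small setup.py. A minimal sketch for one of the namespaced packages (using the made-up 'example.core' from above rather than one of our real packages):

# setup.py for a hypothetical 'example.core' namespace package - a sketch,
# not one of our actual setup files
from setuptools import setup, find_packages

setup(
    name='example.core',
    version='0.1',
    packages=find_packages('src'),
    package_dir={'': 'src'},
    # lets example.core and example.kickass share the 'example' namespace;
    # src/example/__init__.py also needs the usual
    # pkg_resources declare_namespace() line
    namespace_packages=['example'],
    include_package_data=True,
    zip_safe=False,
    install_requires=['setuptools'],
)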

For a while, I used Git with its cvsimport and cvsexportcommit tools to clean up some of our wayward branches in CVS, while starting to play with Buildout. I was still struggling to get a Zope 3 site up and running using our frameworks. And here... well, the fault was partly in ourselves for having to go through fire to get our code into acceptable 'distutils'-style packages, which made learning Buildout all the harder. But the available documentation (comprehensive, but in long doctest-style documents) for some of the Zope 3 related recipes was very difficult to follow. Hell - just knowing which recipes to use was difficult!

But after many months of frustrated half-attempts, often beaten down by other pressures, I opened browser tabs for each of the core Buildout recipes and furiously fought through them all... And boom! Got something working!

Unfortunately it was one of those processes where by the time I got out of the tunnel, I had no idea how exactly I had made it through. One of my big complaints as I was struggling was the lack of additional information, stories of struggle and triumph, etc. And there I was - unable to share much myself! I can't even remember when I was able to break through. It's been quite a few months. Just a couple of weeks ago we deployed our last major old customer on this new setup; and we can't imagine working any other way now.

'Git' and 'Buildout' have both been incredibly empowering. The hardest part, for us, was that the move was very difficult to make in small steps. Once we started having proper distutils-style packages in Git, they couldn't be cloned into an instance home as a basic Python package (i.e., we couldn't do the equivalent of cvs checkout -d mypackage Packages/mypackage/src/mypackage and get just that subdirectory). And we couldn't easily make distributions of our core packages and use them in a classic Zope 3 style instance home (I did come up with a solution that used virtualenv to mix and match the two worlds, but I don't think it was ever put to use in production).

So it was a long and hard road, but the payoffs were nearly immediate: we could start using more community components (and there are some terrific components/packages available for Zope 3); we could more easily use other Python packages as well (no need for some custom trick to install ezPyCrypto, and no more surprises when we deploy onto a new server and realize that we forgot some common packages). Moving customers to new server boxes became much easier, particularly for the smaller customers. And we can update customer apps to new versions with greater confidence than before, when we might just 'cvs up' from a high-level directory and hope everything updated OK (and who knows what versions would actually come out the other end). Now a customer deployment is a single Git repository - everything else is supplied as fully packaged distributions. It's now very hard to 'break the build', as all of the components that are NOT specific to that customer have to come from a software release, which requires a very explicit action.


24.4.07. Python's Make Rake and Bake, another and again

Ian Bicking wrote a post recently titled “Python’s Makefile”. He advocates using / re-using distutils… er… setuptools. (I can’t keep them straight - they’ve both become absolute nightmares in my opinion). He then goes off about entry points, separate setup.cfg files, and other things that still go way over my head. The example he shows is convoluted, and I’m ultimately not entirely sure what he’s really advocating (besides the idea - which isn’t bad - of using the near-standard setup.py file/system instead of re-inventing).

But he mentions, earlier:

Because really people are talking about something more like rake — something where you can put together a bunch of code management tools. These aren’t commands provided by the code, these are commands used on the code.

We do have the infrastructure for this in Python, but no one is really using it. So I’m writing this to suggest people use it more: the setup.py file. So where in another environment someone does rake COMMAND, we can do python setup.py COMMAND.

For me, having an easy way to say bla bla COMMAND isn’t as important as having a good system for automating common tasks that I and/or my colleagues do frequently. As we started to depend on more and more code from internal and external repositories, due to our increased re-use when building on Zope 3, I really needed to automate checkouts and exports. Not everything was neatly packaged as an egg, or the released egg didn’t have a bugfix applied, and I still don’t understand how to make eggs work well with Zope 3 in a manner that I’m comfortable with.

I was initially excited about zc.buildout as a way to automate the monotonous but important tasks that revolve around setting up both deployment and development environments. But I didn’t like how zc.buildout specified its tasks/commands in INI format. It was relatively easy to write new ‘recipes’, so I wrote some recipes to do Subversion and CVS checkouts/exports.
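A recipe is just a class that buildout instantiates with its configuration and then asks to install() or update(), so these stayed small. A rough sketch of the Subversion one (simplified, with illustrative option names rather than whatever I actually used, and with no error handling):

import os
import subprocess

class SVNCheckout(object):
    """Buildout recipe sketch: check a Subversion URL out into parts/."""

    def __init__(self, buildout, name, options):
        self.name = name
        self.options = options
        options['location'] = options.get(
            'location',
            os.path.join(buildout['buildout']['parts-directory'], name))

    def install(self):
        location = self.options['location']
        subprocess.call(['svn', 'checkout', self.options['url'], location])
        # returning the created path lets buildout clean it up on uninstall
        return [location]

    # simply re-running the checkout is good enough when the options change
    update = install

The recipe's own setup.py then exposes the class as a 'zc.buildout' entry point so that buildout can find it by name.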

But the INI format just pissed me off. It didn't fit my needs, basically: I needed more conditional control, more code control. And managing complex sets of parameters required making new top-level sections instead of nesting. Before long I was staring at a very long and very narrow file. And in the end, it was building Zope in a way that wouldn't work for us. So I abandoned it.

I briefly looked at some tools that let you write these task files in "pure" Python. Of these, SCons appeared to be the closest thing in Python to Rake, which uses Ruby. But SCons seemed far more focused on general compilation issues (compiling C, Java, etc.), and that's not a problem that ever crosses my path.

I just wanted something like Rake. What I've liked about every Rakefile I've seen is that it's quite readable. Rake makes common file/path commands readily available as Ruby methods, classes, and objects. Rake takes advantage of Ruby's syntax, particularly blocks (and optional parentheses), in a way that makes it not seem like, well, Ruby. It looks like something makefile-ish, something shell-scripting-ish, etc. That's what I wanted - but, of course, in Python.

So I came up with a system. It's not yet released to the world - far from finished, and there are many competing ideas out there that I don't feel like competing with - but it's already proven to be very useful internally. Generally, it's been used to automate what I mentioned above: retrieving software from multiple repositories, both Subversion and CVS, and placing it in the proper directories. In particular, we try to stick with certain revisions for third-party dependencies, and I got tired of trying to capture this information in READMEs and other files that we could refer to when installing certain configurations. It's even been useful for downloading such software and applying internal patches:

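# (Command, Subversion, task, log, and path below all come from the tool's
#  own namespace - see the note about "import *" further down)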
patch = Command('patch')

@task('mysqldbda')
def mysqldbda():
    """ Installs mysqldbda from subversion and applies patch """
    svn = Subversion('svn://svn.zope.org/repos/main')
    svn.co('mysqldbda/tags/mysqldbda-1.0.0', target='mysqldbda')

    # patch mysqldbda
    log.info("patching mysqldbda")
    patchfile = path('fixes/mysqlda.1-5-07.patch')
    if patchfile.exists():
        print patch.read('-p1', '-i', patchfile)

@task('formencode')
def formencode():
    svn = Subversion('http://svn.colorstudy.com/FormEncode')
    svn.co('tags/0.6/formencode')

task('install', ['mysqldbda', 'formencode'])

It’s also been useful for tasks like getting MochiKit and generating all sorts of packed versions. A lot of what makes this possible is the path.py module, which provides a more object-oriented interface over os, os.path, and other Python file utilities.

ROCKFILEPATH = globals().get('ROCKFILEPATH', path('.'))
MOCHIKIT_LIB = ROCKFILEPATH/'libs'/'mochikit'
MOCHIKIT_DL = ROCKFILEPATH/'mochikit_dl'
MOCHIKIT_SRC = MOCHIKIT_DL/'MochiKit'
SCRATCH = MOCHIKIT_LIB/'_scratch.js'
mochikit = namespace('mochikit')

@mochikit.task('get')
def getmochikit():
    if MOCHIKIT_DL.exists() and bool(MOCHIKIT_DL.listdir()):
        return
    svn = Subversion('http://svn.mochikit.com/mochikit')
    svn.co('trunk', target=MOCHIKIT_DL)

@mochikit.task('clearmochilib')
def clearmochilib():
    for jscript in MOCHIKIT_LIB.files('*.js'):
        jscript.remove()

@mochikit.task('make-noexport')
def makenoexport():
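    # NOEXPORT is a string.Template defined elsewhere (not shown in this excerpt)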
    info = Subversion().info(MOCHIKIT_DL)
    src = NOEXPORT.safe_substitute(**info)
    file(MOCHIKIT_LIB/'NoExport.js','w').write(src)

@mochikit.task('build', ['get', 'clearmochilib', 'make-noexport'])
def mochi_install():
    for source in MOCHIKIT_SRC.files('*.js'):
        log.info('copy %s -> %s' % (source, MOCHIKIT_LIB))
        source.copy(MOCHIKIT_LIB)

# Javascript Packing tools (JSPack not shown - essentially it's a wrapper
# around combining and piping Javascript through Dojo's custom_rhino.jar
# to use its compression system)
def packmodules(sourcedir, modules, target):
    mods = [ (sourcedir/mod) for mod in modules ]
    log.info('Packing %s modules', path(target).name)
    JSPack(mods, target).run()

    if SCRATCH.exists():
        SCRATCH.remove()

def jsmin(sources, target):
    packmodules(MOCHIKIT_LIB, sources, MOCHIKIT_LIB/'min'/target)

@mochikit.task('minimize')
def mochiMinimize():
    """
    Generates packed versions of most individual MochiKit files, while
    combining a few core ones together.
    """
    mindir = MOCHIKIT_LIB/'min'
    for jscript in mindir.files('*.js'):
        jscript.remove()
    jsmin(['NoExport.js', 'Base.js', 'Iter.js', 'DOM.js'], 'base-iter-dom.js')
    jsmin(['Style.js', 'Signal.js'], 'style-signal.js')
    jsmin(['Async.js'], 'async.js')
    jsmin(['Color.js'], 'color.js')
    # ...

mochikit.task('install', ['build', 'minimize']).comment('INSTALL!')

I don’t think this falls under the jurisdiction of setup.py (distutils/setuptools). Nor would I want to specify these as zc.buildout recipes and have a separate configuration file to then name all of the files and directories. And, being Python, I don’t really have to deal with compilation steps so I don’t need wrappers around gcc and friends. I’m not (yet) specifying how to build large deployment scenarios. I just need to automate some development tasks, and I need to be able to write them easily. I want to write them in Python, but I want to ensure that they don’t accidentally get imported into normal projects (hence, the files above don’t have a .py extension). And as this is a specialized task, I’ll allow myself to get away with Python shortcuts that I would never touch in normal development, such as import *. In fact, it’s the import * that gives me a lot of the common commands/tools, such as the classes for interacting with Subversion and CVS, managing working directories, etc.
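Loading those extension-less files is the only mildly unusual bit; roughly, it boils down to something like this (a simplified sketch, not the real loader - the function name and details are illustrative):

from path import path   # Jason Orendorff's path.py

def load_taskfile(filename):
    """Run a task file (no .py extension) with ROCKFILEPATH pre-set in its globals."""
    namespace = {'ROCKFILEPATH': path(filename).abspath().dirname()}
    # the task file itself pulls in the common tools (task, Subversion,
    # log, ...) with its own "import *" at the top
    execfile(filename, namespace)    # Python 2: run the file with that dict as its globals
    return namespace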

This really stemmed from reading this article by Martin Fowler about people wanting, with the advent of JRuby, to replace ant with Rake. In the post, Martin states:

The thing with build scripts is that you need both declarative and procedural qualities. The heart of a build file is defining tasks and the dependencies between them. This is the declarative part, and is where tools like ant and make excel. The trouble is that as builds get more complex these structures aren’t enough. You begin to need conditional logic; in particular you need the ability to define your own abstractions. (See my rake article for examples.)

Rake’s strength is that it gives you both of these. It provides a simple declarative syntax to define tasks and dependencies, but because this syntax is an internal DomainSpecificLanguage, you can seamlessly weave in the full power of Ruby.

At that point, I decided that this was the way to go: use Python decorators to wrap 'task' functions. The wrapper maintains dependency links, comments, and other things of interest to the internal system, and it allows the task name to be independent of the function name, giving easier-to-type task names for use from the file system. But the 'task' function is plain Python. Or, as some of the examples above show, task can be called without the @ symbol that makes it a decorator. Multiple callable actions can be added to a task, potentially allowing for a more 'declarative' style:

mochikit.task('minimize').using_action(
  JSMinMap(
    {'style-signal.js': ['Style.js', 'Signal.js']},
    {'async.js': ['Async.js']},
  ))

Useful, I imagine, for very common patterns. Er. “Recipes”. In any case, it’s a very useful kind of tool. Beats setup.py, INI, or XML-based automation languages any day.
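For the curious, the heart of that decorator machinery is small. A stripped-down sketch (leaving out namespaces, logging, and the using_action hooks - this isn't the real implementation) looks roughly like:

TASKS = {}

class Task(object):
    def __init__(self, name, deps=()):
        self.name = name
        self.deps = list(deps)
        self.actions = []
        TASKS[name] = self

    def __call__(self, func):
        # used as a decorator: @task('build') registers func as an action
        self.actions.append(func)
        return func

    def comment(self, text):
        self.doc = text
        return self

def task(name, deps=()):
    # a plain call like task('install', ['build', 'minimize']) just declares
    # the task and its dependencies without attaching a function
    if name in TASKS:
        TASKS[name].deps.extend(deps)
        return TASKS[name]
    return Task(name, deps)

def run(name, _done=None):
    """Run a task's dependencies (depth-first), then its own actions."""
    done = set() if _done is None else _done
    if name in done:
        return
    done.add(name)
    t = TASKS[name]
    for dep in t.deps:
        run(dep, done)
    for action in t.actions:
        action()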
