6.9.11. HTTP Basic Auth problems that affect Zope (2 and 3) in Safari 5.1, Mac OS X Lion

One very frustrating experience that I encountered after upgrading to OS X Lion (10.7) was that Safari 5.1, as included in Lion, would constantly pop up basic-authentication dialog boxes on our CMS sites, which are based on the Zope Toolkit (kind of between Zope 3.4 and Bluebream). Just about every page in our admin UI would do this. To get around this, I switched to Firefox for interacting with our admin screens, but still used Safari as my primary browser (better OS X citizen, takes advantage of Lion features which I enjoy, bookmark syncing, etc). This led to problems, as Firefox could get pushed way back in the usage stack and would be paged out, and it did NOT like to wake up after long periods of inactivity.

A couple of days ago I decided to take another look at the WebKit project, as I was certain that I was not the only person having this issue. And I found a recently closed bug relating to it: WebKit bug 66354, whose resolution has been in the WebKit nightly builds since at least September 2, 2011.

Apparently this only affects OS X Lion, as it has to do with low-level CFNetwork changes in Lion. The bug doesn't occur with basic auth per se - I was able to use other systems behind basic auth just fine. It occurs when there are redirects combined with Basic Auth, which our CMS uses a fair bit in its admin screens for basic navigation links.

As of OS X 10.7.1 and its Safari (Version 5.1 (7534.48.3)) this is broken. If you use OS X Lion and Safari and encounter HTTP Basic Auth problems, I'd recommend switching to the nightly builds.

 


16.10.08. Leaps and Pains (or - changing development/deployment and scm tools to more closely realize the component architecture dream)

A year or more ago, I was really struggling with zc.buildout, a Python based tool for building out "repeatable" deployments. Buildout makes setuptools actually usable, particularly for development and deployment of web apps, although there are many other uses.

Buildout keeps everything local, allowing one app to use version 3.4.2 of one package while another app can use 3.5.2. But more than just being an 'egg' / Python package manager, it can do other tasks as well - local builds of tools (from libxml to MySQL and more), again allowing one app to build and use MySQL 5.0.x and another app to use 5.1.x; or just allowing an app to be installed onto a new box and get everything it needs, from web server to RDBMS to Memcached and beyond. We don't use all of these features (yet), but it's a nice dream.

Already it's very nice to be able to make a git clone of a customer app, run buildout, and then start it up. Buildout will put setuptools to work to ensure that proper versions of dependent components are installed (and, quite nicely, it's very easy to share both a download cache and a collection of 'installed eggs' - multiple versions living side by side, with individual buildouts picking the one they desire).
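For flavor, a minimal buildout.cfg for such a checkout might look something like this (the egg name here is hypothetical, not our actual config; zc.recipe.egg is the stock recipe for installing eggs):

```ini
[buildout]
; Develop the checked-out package in place and build one part.
develop = .
parts = app

[app]
; Hypothetical application egg; buildout resolves its dependencies
; via setuptools and installs matching versions locally.
recipe = zc.recipe.egg
eggs = mycustomerapp
```

Running bin/buildout against a file like this is the whole "clone, buildout, start it up" cycle.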

But it was not easy to get to this golden land. Prior to using Buildout, we'd check our code out of our CVS repository. Our customer apps were just another Python package, nothing special (not an application, and - more importantly - not packaged up in 'distutils' style). As we started to make more and more reusable parts, we had to do a lot of checkouts; and so I wrote a tool to help automate this checkout process. It would also check out other third party code from public Subversion repositories; all because it was easier to check out a particular tag of 'SQLAlchemy' or 'zc.table' than to try to install them into a classic-style Zope 3 'instance home'.

But it was getting harder and harder to keep up with other packages. We couldn't follow dependencies in this way, for one thing; and it required some deep knowledge of some public SVN repository layouts in order to get particular revision numbers or tags.

'Buildout' promised to change all of that, and offer us the chance to use real, honest-to-goodness distributed Python packages/eggs. But getting there was so very hard when there are deadlines beating you down.

I took a lot of my frustration out on both Setuptools (which is so goddamn woefully incomplete) and Buildout. But the fault was really in ourselves... at least, in a way. As mentioned above, it was easier to just checkout 'mypackage' into $INSTANCE_HOME/lib/python/mypackage than to figure out the install options for distutils/setuptools. As such, NONE of our code was in the Python 'distutils' style. We put some new packages into that style, but would still just check out a sub-path explicitly with CVS just like we were doing with public SVN code.

Part of the big problem that we had which made it so difficult was that we had hung onto CVS for, perhaps, too long. And doing massive file and directory restructuring with CVS is too painful to contemplate. But moving to Subversion never seemed worth the effort, and so we held on to CVS. But I knew I'd have to restructure the code someday.

Fortunately, Git arrived. Well, it had been there for a while; but it was maturing and quite fascinating and it offered us a chance to leapfrog over SVN and into proper source code management. Git is an amazing tool (perhaps made more so by being chained to CVS for so long), and it provided me with the opportunities to really restructure our code, including ripping apart single top-level packages into multiple namespaced packages (ie - instead of 'example' being the root node with 'core' and 'kickass' subpackages, I could split that into 'example.core' and 'example.kickass' as separate packages and Git repositories while keeping full histories).

For a while, I used Git with its cvsimport and cvsexportcommit tools to clean up some of our wayward branches in CVS, while starting to play with Buildout. I was still struggling to get a Zope 3 site up and running using our frameworks. And here... well, the fault was partly in ourselves for having to go through fire to get our code into acceptable 'distutils' style packages, which made learning Buildout all the harder. But the available documentation (comprehensive, but in long doctest style documents) for some of the Zope 3 related recipes was very difficult to follow. Hell - just knowing which recipes to use was difficult!

But after many months of frustrated half-attempts, often beaten down by other pressures, I opened a few different tabs for different core Buildout recipes in my browser and furiously fought through them all... And boom! Got something working!

Unfortunately it was one of those processes where by the time I got out of the tunnel, I had no idea how exactly I had made it through. One of my big complaints as I was struggling was the lack of additional information, stories of struggle and triumph, etc. And there I was - unable to share much myself! I can't even remember when I was able to break through. It's been quite a few months. Just a couple of weeks ago we deployed our last major old customer on this new setup; and we can't imagine working any other way now.

'Git' and 'Buildout' have both been incredibly empowering. The hardest part, for us, was that it was very difficult to make the move in small steps. Once we started having proper distutils style packages in Git, they couldn't be cloned into an instance home as a basic Python package (ie, we couldn't do the equivalent of cvs checkout -d mypackage Packages/mypackage/src/mypackage and get just that subdirectory). And we couldn't easily make distributions of our core packages and use them in a classic Zope 3 style instance home (I did come up with a solution that used virtualenv to mix and match the two worlds, but I don't think it was ever put to use in production).

So it was a long and hard road, but the payoffs were nearly immediate: we could start using more community components (and there are some terrific components/packages available for Zope 3); we could more easily use other Python packages as well (no need to have some custom trick to install ezPyCrypto, or be surprised when we deploy onto a new server and realize that we forgot some common packages). Moving customers to new server boxes was much easier, particularly for the smaller customers. And we can update customer apps to new versions with greater confidence than before when we might just try to 'cvs up' from a high location and hope everything updated OK (and who knows what versions would actually come out the other end). Now a customer deployment is a single Git package - everything else is supplied as fully packaged distributions. It's now very hard to 'break the build' as all of the components that are NOT specific to that customer have to come from a software release, which requires a very explicit action.


1.10.08. Giddy-up 401, File Uploads, and Safari

I've recently been doing some work to support ZODB 3.8 BlobFiles in our Zope 3 based sites and applications. Doing this brought me back around to seeing some behavior I've seen in the past and probably learned to ignore: uploading a large file from Safari using a basic HTML form (with proper encoding type, POST, etc) seems to take inexplicably long. Even worse - once behind Apache, you might not get an expected response, if any. You might get a 'timed out' response, unsure if the app server has everything and will finish the request/response cycle on its own.

It turns out that Safari does not eagerly send authentication information along with each request when logged in with Basic Auth. When it does, it seems to have a very short time window.

So say you're logged in to your application with basic auth (for better or worse). The normal pattern is that when encountering an unauthenticated situation, Zope will challenge with a 401 status code and the WWW-Authenticate header (or something like that - I'm away from all specs right now). If you're not logged in, then you get the basic auth dialog box and send along the credentials. If you are "logged in", then the browser doesn't ask for credentials again, but just sends them along.
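Sketched as a minimal WSGI-style handler (hypothetical realm name; this is not Zope's actual machinery), the challenge side of that dance looks roughly like this:

```python
def require_basic_auth(environ, start_response):
    # If no credentials came with the request, challenge; the browser
    # then repeats the request with an Authorization header attached.
    if 'HTTP_AUTHORIZATION' not in environ:
        start_response('401 Unauthorized',
                       [('WWW-Authenticate', 'Basic realm="Admin"')])
        return ['Unauthorized']
    # Credentials present: carry on with the real response.
    start_response('200 OK', [('Content-type', 'text/plain')])
    return ['Welcome back']
```

The key point is that the challenge happens *after* the client has already sent the full request body.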

The downside is that this causes a request to be repeated. And if you just tried uploading a 5 MB file, then that whole file has to be re-submitted to make the full request.

It's the right thing to do with the basic (ha!) information at hand - trying to post all of that data in a single request. But Safari should recognize that it's submitting to the same server (if not the same URL!) and should automatically include the auth headers. Safari seems to do this, but only on requests in very short windows.

Firefox, on the other hand, seems to have a much longer window in which it will send the credentials along automatically on the first request, instead of waiting for the challenge.

I don't know how other browsers do it. I'm not sure what the spec says, if anything. Glancing at O'Reilly's "HTTP - The Definitive Guide" didn't seem to give any indication of whether it's better for the client to assume that it should send the authentication tokens along with each request back to the same server, or if it's better for the client to hold off on constantly sending that info along unless challenged.

Most of the time this doesn't really seem to matter - it's not something end users typically see as it goes by fast enough to rarely be noticed. Of course there are other ways of getting credentials (cookies, sessions, subdomain/remote ip mapping, etc) which we often use on the main public-facing side of our sites. But for content management and admin, Basic Auth is such an easy fallback, especially in a framework like Zope (1, 2, or 3) which has long had a strong security story and would automatically handle this crap for you way back in the days of Bobo (1996, if not earlier).

It's just an annoyance. Glad I nailed it down to this: uploading large files with Safari (I think IE is, or was, similar) to basic-auth protected sites often can time out because the browser posts once, gets the 401-Unauthorized challenge, and does the post again - this time with the auth info.

Solutions:

  • don't use basic auth for sites that expect moderate to heavy uploading via forms.
  • recommend and/or use browsers that send the auth token along more often in the first request.
  • provide better interfaces for uploading files; providing better communication with the uploader about status, and perhaps having a better interface into the destination web app. Fortunately there appear to be some free and open solutions out there already.
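For scripted uploads, one way to sidestep the repeat entirely is to send the credentials preemptively instead of waiting for the challenge. A modern-Python sketch (the URL and credentials are placeholders):

```python
import base64
import urllib.request

def preauthed_request(url, data, username, password):
    # Attach the Basic credentials up front so the server never has to
    # challenge, and the (possibly large) body is only transmitted once.
    token = base64.b64encode(('%s:%s' % (username, password)).encode()).decode()
    request = urllib.request.Request(url, data=data)
    request.add_header('Authorization', 'Basic %s' % token)
    return request
```

Browsers do the equivalent when they choose to cache and replay credentials - which is exactly the behavior Safari is stingy about.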

Wheee!


12.10.07. Catching Up

These periods between posts keep getting longer, don’t they?

I’ve got nothing earth-shattering to talk about. Work’s been very busy, and we continue to be served well by Zope 3. I’m still royally confused by things like setuptools and eggs, mostly in regards to how they work in a Zope 3 world when you’ve already got long entrenched ways of doing software. I could not get a good answer from anyone I asked (in fact, I often got wildly competing opinions). So I’m sticking with our internal make-rake-like-ish toolkit which is primarily helpful for automating checkouts from internal and external repositories. I did have some success with zc.buildout, but I don’t yet foresee a time when I can use it to deploy whole sites/applications. I can barely see a time when I can use it on anything but small projects that are relatively stand-alone. There’s just a big gap between The Way Things Have Been Done and The Way That It Seems That Maybe Things Should Be Done In The Future.

Of course, neither setuptools nor zc.buildout seem to have “proper” releases. zc.buildout is in an endless 1.0 beta (beta-30 at this point), and setuptools is at 0.6c7. Does that mean that it’s not even at release 0.6 quality yet? None of this instills confidence in this hurried developer.

The big problem is the legacy code, which is in CVS. Some of it is being extracted out into individual packages that have the proper ‘setup.py’, ‘buildout.cfg’, etc. Finally. But I have no idea how to apply it to the bigger picture, and I’ve found very little written words that target our situation.

The biggest downside of being so busy with customer related work is that it’s very difficult to keep up with discussions, conversations, plans, etc. And I’m sure that my frustrations with lack of documentation, seemingly unfinished releases, and so on, are really the fruit of other hurried developers. I admire them for at least releasing something. It’s more than I’ve done in a long time. It’s more than I see myself being able to do for quite some time.

Anyways, the revolving door of Javascript toolkits keeps turning. I’m now deeply enamored with jQuery. “Write less, do more”. I like it. I like that it doesn’t trample all over Javascript, and thus plays well with others (especially others that play well with others, like MochiKit). MochiKit is just so big… I think I might make a stab at writing, at least for internal use, a lightweight version that brings many of its best concepts out without overlapping jQuery’s functionality. MochiKit brings many wonderful Python-ic functions and tools to the Javascript table that make general development much easier.

I’m also deeply enamored with zc.resourcelibrary, which is a Zope 3 add-on that makes it much easier to manage javascript and CSS resources and their relations to each other. Among other things, it helps save resources when they’re not needed. For example:

if len(rendered_boxes) <= 3:
    return self.just_render_the_damn_boxes(rendered_boxes)
else:
    zc.resourcelibrary.need('fancy.scrolling.library')
    return self.render_the_advanced_widget(rendered_boxes)

I’ve also adjusted my coding style, returning to the underscore_separated_words style instead of the camelCasedWords style, at least for functions, attributes, and methods. This is closer in style to PEP 8 (the main style guide for Python code). The Zope style guide differs on this point, using camelCased instead. And PEP 8 does say that it’s OK, if not downright preferred, to stay true to the style around you.

But one thing I learned from looking through Rails code was that the underscore_style was easier to read, since the underscore acts like a space. And I’ve become a big fan of writing code that communicates intent; that reads like a story (somewhat). Extract Method is your friend. I’ve grown very distrustful of excessive nesting, or of having very long bodies inside of a ‘for’ or ‘if’ block.
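A tiny illustration of what I mean, with hypothetical names - a guard-ish structure plus extracted helpers instead of one deeply nested block:

```python
def render_summary(items, limit=3):
    # The top-level function reads like a story: filter, then pick a format.
    visible = [item for item in items if item.get('published')]
    if len(visible) <= limit:
        return format_plain_list(visible)
    return format_paged_list(visible, limit)

def format_plain_list(items):
    return ', '.join(item['title'] for item in items)

def format_paged_list(items, limit):
    shown = format_plain_list(items[:limit])
    return '%s (and %d more)' % (shown, len(items) - limit)
```

Each extracted method is short enough to read at a glance, and the underscore names do the documenting.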

That’s about it. Hell of an update, huh? Well, work’s really started to become work, and is quite enjoyable. I’ve got a good flow going and don’t feel I have as much need (nor place) to be an advocate or crank. As I’ve mentioned before, we’ve gotten incredible levels of code re-use by building our internal libraries and applications on top of Zope 3, and we’ve been able to grow them so much that they’re really the first level of framework. It was such a struggle to do this in Zope 2, but in Zope 3 it does fall (fairly) neatly into place. Nothing else in the Python web-framework-whatsit world comes close.

The only toolkit that’s even better? SQLAlchemy. It’s pretty much the only way I’ll interact with RDBMS systems in Python from this point on. And I don’t mean I’ll be writing every RDBMS interaction as an object-relational mapping. SQLAlchemy is great because it provides a good connection / pooling infrastructure; a good Pythonic query building infrastructure; and then a good ORM infrastructure that is capable of complex queries and mappings (as well as some pretty stone-simple ones).


4.5.07. ABC may be easy as 123, but it can't beat zope.interface

I guess the deadline may have come and gone for getting in PEPs for Python 3000. Guido’s already written up a PEP Parade.

Of particular interest to me has been the appearance of PEPs for Abstract Base Classes (PEP 3119) and the more exhaustive PEP 3124 which covers “Overloading, Generic Functions, Interfaces, and Adaptation.”

Both of these aim to provide ways of saying “this is file-ish”, “this is string-ish,” without requiring subclassing from a concrete “built-in” type/class. But I think they both fall short a little bit, while zope.interface (from the Zope 3 family) provides the best solution.

PEP 3119 (Abstract Base Classes) has a section covering comparisons to alternative techniques, and it specifically mentions “For now, I’ll leave it to proponents of Interfaces to explain why Interfaces are better.” So this is my brief attempt at explaining why.

A quote from PEP 3119 that I particularly like is “Like all other things in Python, these promises are in the nature of a gentlemen’s agreement…” The Interfaces as specified and used in Zope 3 and some other systems are the same way. They are not “bondage and discipline” Interfaces. They are not the ultra-rigid Eiffel contracts, nor are they the rigid and limited Interfaces as used by Java. They are basically a specification, and they can be used (as mentioned in PEP 3119) to provide additional metadata about a specification. There are some simple tools in zope.interface.verify to check an implementation against a specification, but those are often used in test suites; they’re not enforced hard by any system. The agreement might be “I need a seekable file”, which might mean it expects the methods/messages ‘read’, ‘seek’, and ‘tell’. If you only provide ‘read’ and ‘seek’, then it’s your fault for not living up to the agreement. That’s no different than the Python of today. What Interfaces and Abstract Base Classes aim to provide is a better clarification of what’s expected. Sometimes “file-like” in Python (today) means it just needs a ‘read’ method. Sometimes it means the full suite of file methods (read, readlines, seek, tell). Same thing with sequences: sometimes it just means “something iterable”. Other times it means “support append and extend and pop”.
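To make that concrete in plain Python (a hypothetical consumer; nothing Zope-specific): here the entire “agreement” is read, seek, and tell, and anything honoring those three works, file or not:

```python
def peek(seekable, nbytes):
    # The informal "seekable file" agreement: read, seek, tell.
    # Anything providing those three methods works; no subclassing of
    # a concrete file type is required.
    start = seekable.tell()
    data = seekable.read(nbytes)
    seekable.seek(start)
    return data
```

An in-memory buffer, a socket wrapper, or a real file all satisfy this consumer equally well; the interface machinery just gives the agreement a name and a checkable shape.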

Another side benefit of Interfaces as specification is that they provide a common language for, well, specifications. Many PEPs propose some sort of API, especially informational PEPs like WSGI (PEP 333) or API for Cryptographic Hash Functions (PEP 247). I’ll use PEP 247 as an example for my attempt at explaining why Zope 3’s Interfaces are Better.

A problem with Abstract Base Classes is this: they’re limited to classes. Even when PEP 3119 mentions Interfaces, it does so like this:

“Interfaces” in this context refers to a set of proposals for additional metadata elements attached to a class which are not part of the regular class hierarchy…

It then goes on to mention that such specifications (in some proposals and implementations) may be mutable; and then says that’s a problem since classes are shared state and one could mutate/violate intent. That’s a separate discussion that I’m not going to go into here.

What is important is this severely limited focus on classes. zope.interface works on objects as well, and not just normal ‘instances of a class’ object, but on classes themselves, and also modules.

There are two important verbs in zope.interface: implements and provides. provides is the most important one - it means that this Object, whatever that object may be, provides the specified interface directly.

implements is often used in class definitions. It means “instances of this class will provide the specified interface”. It can also be thought of in terms of Factories and/or Adaptation - “calling this object will give you something that provides the desired interface.”

“What does that matter?” you might ask. Well, there are all sorts of ways to compose objects in Python. A module is an object. It has members. A class is an object. An instance of a class is, of course, an object. Functions and methods are also objects in Python, but for the most part what we care about here are Modules, Classes, and Instances.

Because when it comes down to actual usage in code, it doesn’t particularly matter what an object is. In PEP 3124, the author (Phillip J Eby) shows the following interface:

class IStack(Interface):
    @abstract
    def push(self, ob):
        """Push 'ob' onto the stack"""

    @abstract
    def pop(self):
        """Pop a value and return it"""

Ignore the @abstract decorators, as they’re artifacts of the rest of his PEP and/or related to PEP 3119. What is important is the use of self.

“self” is an artifact of implementation that is invisible in use. Sure, you can write a Stack implementation like this (note: I’m going to use zope.interface terminology and style from here on out):

import zope.interface

class Stack(object):
    zope.interface.implements(IStack)

    def __init__(self):
        self._stack = []

    def push(self, ob):
        self._stack.append(ob)

    def pop(self):
        return self._stack.pop()

But when it’s being used, it’s used like this:

def do_something_with_a_stack(stack):
    stack.push(1)
    stack.push(2)
    # ...
    top = stack.pop()

stack_instance = Stack()
IStack.providedBy(stack_instance)
# True
IStack.providedBy(Stack)
# False

do_something_with_a_stack(stack_instance)
# works fine
do_something_with_a_stack(Stack)
# raises an exception because `Stack.push(1)` is passing `1` 
# to `self`.. unbound method, bla bla bla.

Notice that there is no ‘self’ reference visibly used when dealing with the IStack implementation. This is an extremely important detail. What are some other ways that we may provide the IStack interface?

One way is to do it with class methods and properties, effectively making a singleton. (This isn’t a good way to do it, and is just here as an example).

import zope.interface

class StackedClass(object):
    zope.interface.classProvides(IStack)

    _stack = []

    @classmethod
    def push(class_, ob):
        class_._stack.append(ob)

    @classmethod
    def pop(class_):
        return class_._stack.pop()

IStack.providedBy(StackedClass)
# True

do_something_with_a_stack(StackedClass)
# this time it works, because `StackedClass.push(1)` is a class method,
# and is passing `StackedClass` to the `class_` parameter, and `1` 
# to `ob`.

Another variation of the above is using Static Methods:

import zope.interface

class StaticStack(object):
    zope.interface.classProvides(IStack)

    _stack = []

    @staticmethod
    def push(ob):
        StaticStack._stack.append(ob)

    @staticmethod
    def pop():
        return StaticStack._stack.pop()

Again, StaticStack.push(1) and StaticStack.pop() work fine. Now let’s try a third way - in a module! Let’s call this module mstack (file - mstack.py):

import zope.interface

zope.interface.moduleProvides(IStack)

_stack = []

def push(ob):
    _stack.append(ob)

def pop():
    return _stack.pop()

Then in other code:

import mstack

IStack.providedBy(mstack)
# True
mstack.push(1)
mstack.push(2)

print mstack.pop()
# 2

So whether we’re dealing with the instance in the first example (stack_instance), the classes in the second two examples (StackedClass and StaticStack), or the module in the last example (mstack), they’re all objects that live up to the IStack agreement. So having self in the Interface is pointless. self is a binding detail.

Jim Fulton, the main author of zope.interface, taught me this a long time ago. Because in Zope 2, you could also make an IStack implementation using a Folder and a pair of Python scripts. Well, those Python scripts (as used in Zope 2 “through-the-web” development) have at least 4 binding arguments. Instead of ‘self’, the initial arguments are context, container, script, traverse_subpath. Just like self is automatically taken care of by the class-instance binding machinery, the four Zope Python Script binding arguments are automatically taken care of by Zope 2’s internal machinery. You never pass those arguments in directly, you just use it like push(ob) and pop().

So there it is - many ways to provide this simple “Stack” Interface. And I believe that both PEP 3119 and PEP 3124 are short-sighted in focusing on the class-instance relationship exclusively (or so it appears).

And since many objects, particularly instances, are mutable, one could compose an IStack implementation on the fly.

class Prototype(object):
    """ Can be anything... """

pstack = Prototype()
pstack._stack = []

def pstack_push(ob):
    pstack._stack.append(ob)

def pstack_pop():
    return pstack._stack.pop()

pstack.push = pstack_push
pstack.pop = pstack_pop

# Now we can say that this particular instance provides the IStack
# interface directly - has no impact on the `Prototype` class
zope.interface.directlyProvides(pstack, IStack)

pstack.push(1)
pstack.push(2)
print pstack.pop()
# 2

# We can remove support as well
del pstack.push
zope.interface.noLongerProvides(pstack, IStack)

Examples of dynamically constructed objects in the real world: a network services client, particularly one that’s in an overwrought distributed object system (CORBA, SOAP, and other things that make you cry in the night). Dynamic local ‘stub’ objects may be created at run time, but those could still be said to provide a certain interface.

So now let’s look at whether it matters that you’re dealing with a class or not:

from zope.interface import implementer

@implementer(IStack)
def PStack():
    pstack = Prototype()
    pstack._stack = []

    def pstack_push(ob):
        pstack._stack.append(ob)

    def pstack_pop():
        return pstack._stack.pop()

    pstack.push = pstack_push
    pstack.pop = pstack_pop
    zope.interface.directlyProvides(pstack, IStack)

    return pstack

@implementer(IStack)
def StackFactory():
    # Returns a new `Stack` instance from the earlier example
    return Stack()

import mstack
import random

@implementer(IStack)
def RandomStatic():
    # chooses between the two class based versions and module
    return random.choice([StackedClass, StaticStack, mstack])

All three are factories that will return an object providing an IStack implementation - exactly like the Stack class in the first example, which also claimed implements(IStack): when the class is instantiated / called, a new object is made that provides the IStack interface. In Python, another thing that doesn’t really matter is whether something is a class or a function. All of the following lines of code yield a result that is the same to the consumer. The internal details of what is returned may vary, but the IStack interface works on all of them:

Stack()         # class
PStack()        # 'Prototype' dynamically constructed object
StackFactory()  # Wrapper around basic class
RandomStatic()  # Chooses one of the class/static method implementations.

And whether we’re looking at the class implementation, or any of the factory based implementations, the result should be the same:

IStack.implementedBy(Stack) # class
# True
IStack.providedBy(Stack)
# False
IStack.providedBy(Stack())
# True

IStack.implementedBy(PStack)    # Factory
# True
IStack.providedBy(PStack)
# False
IStack.providedBy(PStack())
# True

No matter which method of instantiation is used, they should all pass the verifyObject check, which verifies that all of the specified members are provided and that the method/function signatures match the specification:

from functools import partial
from zope.interface.verify import verifyObject

verify_stack = partial(verifyObject, IStack)

all(map(verify_stack, [Stack(), PStack(), StackFactory(), RandomStatic()]))
# True

Now the class-based options will fail the implementedBy check, because it’s the class itself that provides the implementation, not an instance of it as with Stack:

IStack.implementedBy(StackedClass)
# False
IStack.providedBy(StackedClass)
# True
IStack.providedBy(StackedClass())
# False

“OK”, you might say, “but still, why does it matter? Why might we really care about whether these abstract specifications work only with classes? It seems smaller, simpler.”

The main advantage is that a specification should (generally) make no assumptions about implementation. If the specification, aka the “gentlemen’s agreement”, is generally met, it shouldn’t matter whether it’s provided by a class, an instance, a module, an extension module, or some dynamically constructed object. The specification language should be the same.

Going back to PEP 247, the “cryptographic hash API”: there is a specification in that module about what the ‘module’ must provide, and for what the hash objects must provide. Consider also the WSGI spec, the DB-API specs, and all of the other formal and informal specs that are floating around just in the PEPs. Using zope.interface, those specifications can be spelled out in the same fashion. WSGI just cares about a particular function name signature. It can be provided by a single function in a simple module, or as a method from an object put together by a large system like the full Zope 3 application framework and server. It just wants a callable. This is a little bit ugly in zope.interface… but in reality, actually, I think it works. Here’s how it could be specified:

class IWSGIApplication(Interface):
    def __call__(environ, start_response):
        """ Document the function """
    # and/or use tagged values to set additional metadata

This just means that a WSGIApplication must be a callable object taking environ and start_response arguments. A callable object may be a function (taken from PEP 333):

def simple_app(environ, start_response):
    """Simplest possible application object"""
    status = '200 OK'
    response_headers = [('Content-type','text/plain')]
    start_response(status, response_headers)
    return ['Hello world!\n']

Or a class (the class itself is the callable here; calling it invokes __init__). Maybe the WSGI spec might also state that the result “should be iterable (support __iter__)”. Maybe that’s loosely enforced, but the following example shows how the class can make separate declarations about what the class directly provides, and what its instances implement. Instead of using any decorators or magic-ish “class decorators” (the implements, classProvides calls above), we’ll make the declarations for both AppClass and simple_app in the same manner, which matches the style in PEP 3124.

class AppClass(object):
    def __init__(self, environ, start_response):
        self.environ = environ
        self.start = start_response

    def __iter__(self):
        status = '200 OK'
        response_headers = [('Content-type','text/plain')]
        self.start(status, response_headers)
        yield "Hello world!\n"

from zope.interface import directlyProvides, classImplements

# Both 'simple_app' and 'AppClass' are callable with the same arguments,
# so they both *provide* the IWSGIApplication interface

directlyProvides(simple_app, IWSGIApplication)
directlyProvides(AppClass, IWSGIApplication)

# And we can state that AppClass instances are iterable by supporting
# some phantom IIterable interface
classImplements(AppClass, IIterable)
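Putting the pieces above together as one self-contained, runnable sketch (this assumes zope.interface is installed; IIterable is a stand-in interface declared here, not something shipped with the library):

```python
from zope.interface import Interface, directlyProvides, classImplements

class IWSGIApplication(Interface):
    def __call__(environ, start_response):
        """A WSGI application callable."""

class IIterable(Interface):
    def __iter__():
        """Supports iteration."""

def simple_app(environ, start_response):
    start_response('200 OK', [('Content-type', 'text/plain')])
    return ['Hello world!\n']

class AppClass(object):
    def __init__(self, environ, start_response):
        self.environ = environ
        self.start = start_response

    def __iter__(self):
        self.start('200 OK', [('Content-type', 'text/plain')])
        yield "Hello world!\n"

# Both the function and the class object themselves are callable with
# (environ, start_response), so both *provide* IWSGIApplication...
directlyProvides(simple_app, IWSGIApplication)
directlyProvides(AppClass, IWSGIApplication)

# ...while AppClass *instances* are the things that are iterable
classImplements(AppClass, IIterable)

print(IWSGIApplication.providedBy(simple_app))    # True
print(IWSGIApplication.providedBy(AppClass))      # True
print(IIterable.providedBy(AppClass(None, None))) # True
```

The point is that the same two declaration calls work identically for a plain function and for a class, with no decorator syntax required.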

What are the benefits of this, beyond just having a common way of spelling specifications? Instead of, or in addition to, abstract base classes, the core Python libraries could include all of these specs, even if they don’t provide any concrete implementation. Then I could have a unit test in my code that uses verifyClass or verifyObject to ensure I stay in line with the specification.

from zope.interface.verify import verifyClass

def test_verifySpec(self):
    verifyClass(ICryptoHash, MyHashClass)

Then, if the specification changes in a new version of Python or in a new version of someone else’s library or framework, I can be notified.

Or if the specification undergoes a big change, a new spec could be written, such as IWSGI2Application. Then by process of adaptation (not covered in this post) or interface querying, a WSGI server could respond appropriately to implementations of the earlier spec:

if IWSGI2Application.providedBy(app):
    # Yay! We don't have to do anything extra!
    # ... do wsgi 2 work
elif IWSGIApplication.providedBy(app):
    # We have to set up the old `start_response` object
    # ... do wsgi 1 work
else:
    raise UnsupportedOrUndeclaredImplementation(app)

Adaptation could provide a means of doing the above (still not going into the details… trying not to!):

@implementer(IWSGI2Application)
@adapts(IWSGIApplication)
def wsgi1_to_wsgi2(app):
    return wsgi2wrapper(app)

# And then, replacing the `if, else` above:
wsgi_app = IWSGI2Application(app, None)
if wsgi_app is None:
    raise UnsupportedOrUndeclaredImplementation(app)
# ... do wsgi2 work

When you have both specification and adaptation, then you can write your code against the spec. In the above example, the main code does IWSGI2Application(app, None) which means “for the object app, give me an object that provides IWSGI2Application, or None if there is no means of providing that interface.”

If app provides that interface directly, then app is returned directly. Otherwise an adaptation registry is found, and it’s queried for a callable object (an adapter) that will take ‘app’ as its argument and return an object that provides IWSGI2Application.

Another example: knowing that Python 3000 is going to change a lot of core specifications and implementations, such as the attributes for functions (func_code, func_defaults, etc). If an IPy2Function interface were made (and zope.interface or something like it were added to Python 2.x), then code that works with function object internals could program against its preferred spec by adding a line of code:

func = IPy2Function(func)
if my_sniffer(func.func_code):
    raise Unsafe(func)

On Python 2, you’d get the regular function straight through. In Python 3000 / 3.0, an adapter would translate __code__ into func_code, for example. I don’t expect this to happen in reality, but it’s an example of how migration paths could be made between two major software versions, allowing code to run in both.
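A toy version of what such an adapter might do, with the interface machinery stripped out (IPy2Function itself is hypothetical; this just maps the old attribute names onto the new ones):

```python
class Py2FunctionAdapter(object):
    """Illustrative adapter exposing Python 2 style func_*
    attribute names on top of a Python 3 function object."""

    def __init__(self, func):
        self._func = func

    @property
    def func_code(self):
        # Python 3 renamed func_code to __code__
        return self._func.__code__

    @property
    def func_defaults(self):
        # ...and func_defaults to __defaults__
        return self._func.__defaults__

def greet(name, greeting="hello"):
    return "%s, %s" % (greeting, name)

func = Py2FunctionAdapter(greet)
print(func.func_code is greet.__code__)  # True
print(func.func_defaults)                # ('hello',)
```

Code written against the old attribute names keeps working, while the adapter quietly forwards to the renamed internals.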

By taking advantage of this system, my company has seen more re-use with Zope 3 than at any time in our company history. And because (most of) Zope 3 is programmed against specification, we’ve been able to plug in or completely make over the whole system by providing alternative implementations of core specs. This is very hard to do in native Zope 2 (the CMF, on which Plone is based, was probably the first Zope system that started these concepts, which Plone and others were able to take advantage of by providing new tools that matched the provided spec).

At the heart of it, again, is the gentlemen’s agreement, but brought out in full: it doesn’t matter who you are or where you came from (ie, it doesn’t matter what classes are in your family tree or if you are a simple module), as long as you get the job done. There’s a simple contract, and as long as the contract is fulfilled, then everybody is happy.

But if the gentlemen involved can only come from the class system, then there’s still a nasty aristocracy that excludes a large chunk of the populace, all of whom can potentially fulfill the contract. Let’s not cause an uprising, OK?

Labels: , , ,

17.4.07. Reuse and non use

We’ve been using Zope 3 in earnest for just over a year and a half now. I would like to report that in that year and a half our little company has achieved more re-use than at any time in our history. This is real re-use too: libraries of tools and objects that are easily shared among both horizontal and vertical markets, yet customized for each customer as needed. Benefits for one are fairly easily shared with all.

In the Zope 2 days, we tried hard to achieve this. But we were constantly having to re-invent the kind of architecture that I believe really makes this work: adaptation, which also brings dynamic view binding, dynamic UI generation (ie - registering a ‘tab’ for a particular object / interface and having it show up in the UI as necessary), etc. We had to spend a lot of time making the frameworks that would let us make frameworks.

“Frameworks for making frameworks?” - you heard right. Let’s face it: most web work is custom development. Sometimes custom development is best served by tools like Ruby on Rails or Pylons, or even by plain old PHP. But sometimes you know you’re going to have at least five customers all needing variations on the same thing in the coming months; and potentially more after that. You’re going to need to at least make a library or two.

See, Model-View-Controller isn’t just about “separating business logic from presentation”. It’s about separating it in a way that you can take business objects and logic (the ‘model’ layer; or models and services) and put more than one view on them. And by “more than one view”, I don’t mean “more than one template.” I mean putting wholly different user interfaces on it. I mean being able to take a base library and override a few select options (or many select options) as they appeal to a customer.

We tried to achieve this on some of our Zope 2 products, but it was hard to extract frameworks. We did OK, but I think the most re-use we ever got was about three or four customers on one toolkit. That was over a three or four year span. We re-used patterns and snippets quite often, but it took a lot of work to extract an e-commerce toolkit from a particular customer’s site, and more work still to make it adaptable and workable for different customer requirements.

In the year and a half since using Zope 3 full time, we’ve had double that - and with far greater results. It’s not an easy system to just start using from scratch, but it can be quite worth it.

Being back at work on some legacy Zope 2 projects has made me all the more appreciative.

By the way: for a simpler Zope 3 development experience, check out Grok.

Labels:

10.2.07. SOAP. Web Services. Flash. Waaa.

As a result of a peculiar chain of events, we’ve been stuck with a compiled Flash file requiring data off of a web service. The old database is gone. The old server is gone. We have been unable to get much information about the data required. We don’t have the Flash source. All we have is a single ColdFusion Component that is used as a web service via WSDL and SOAP.

I’ve never used ColdFusion. Fortunately, Allaire, er, Macromedia, er, Adobe makes demos available. ColdFusion even comes with a developer version that can live beyond the typical 30 days. So using that, I was able to re-generate the WSDL. I didn’t understand it, but I had it. Then I installed a demo of Flash Professional. I made a quick document and put some web service links in it by stumbling around, and watched the HTTP/SOAP communication between Flash and Apache/ColdFusion.

I tried looking at the specs for SOAP, and recoiled. Then I naively thought I could emulate the response as a Zope Page Template emitting XML, looping over the little bit of data that the Flash client wanted. But then I realized that the SOAP style in use was RPC style, and the ‘method’ was buried in the posted XML, so I couldn’t even easily answer the incoming requests.
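To make that concrete, here is roughly what an RPC-style request looks like (the envelope below is a made-up example, not the actual ColdFusion traffic): the “method” is just the first child element inside the SOAP Body, so there is no way to dispatch on it without parsing the posted XML first.

```python
import xml.etree.ElementTree as ET

# A made-up RPC-style SOAP request. The method name (getItems) is an
# element buried inside the Body - nothing in the URL or headers
# tells you which "method" is being called.
soap_request = """<?xml version="1.0"?>
<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
  <soap:Body>
    <getItems xmlns="http://example.com/catalog">
      <category>books</category>
    </getItems>
  </soap:Body>
</soap:Envelope>"""

root = ET.fromstring(soap_request)
body = root.find('{http://schemas.xmlsoap.org/soap/envelope/}Body')
method = body[0]  # the first child of Body is the RPC "method"
# ElementTree prefixes tags with {namespace}; strip it off
print(method.tag.split('}')[-1])  # getItems
```

A simple template emitting XML can fake a response, but it cannot do this kind of request dispatch, which is why a real parsing toolkit became unavoidable.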

Since there’s no standard SOAP toolkit available for Python, Zope 3 only holds a skeleton off which one can hang a SOAP handling REQUEST/RESPONSE pair. I’m too dumb, or perhaps just too busy, to wrestle with XML parsing myself, and I quickly realized that I needed some kind of toolkit. The data that I saw in the WSDL and SOAP communications appeared far too complicated for me to read and write on my own. Again I tried going to the specs, and again I ran rapidly away as I couldn’t even figure out how the WSDL connected to the SOAP requests and responses.

Now, obviously this stuff must work out for somebody. When we first approached this problem, the advice that we got was along the lines “just put up an XML gateway.” Um, what? I then realized that the tools were probably doing everything and the developers with whom we were communicating never gave much thought to what was going on. Data format? There is no data format - just put up an XML gateway!

I can’t fault anyone for this, per se. We all get abstracted away from something. Many web developers plug along happily without knowing too many details about HTTP. Even when one knows quite a bit about the more advanced or nuanced details of HTTP, one’s not likely to know much about what’s going on at the TCP/IP layer.

While stumbling around Flash Professional, I was a wee bit impressed with how quickly I could make a web service connector, wire it to a table widget, write a quick button-click handler, and could see the results in the table. The ColdFusion Component had its WSDL generated by merely adding ?wsdl to the end of the URL. Since both tools are maintained by Macromedia, er, Adobe, the SOAP messages actually seem to work together.

But my gods, it sucks.

I have been so spoiled by HTML and Python, and by Javascript, CSS, Zope, etc. There’s no end to the value of the web browser’s “View Source” command. With Python, I can read the source of almost any library and figure out its innards if I can’t get answers from the documentation. CSS and JavaScript are equally open to public viewing. One can easily send plain text as results of HTTP calls - I think that if we had designed the system we’re wresting with right now, we would have served some kind of delimited text that any tool could read - even human eyes. This week I realized just how hard it is to make a simple SOAP request by hand, leaving me to use a TCP watcher/proxy to test requests and responses.

Using ZSI from the Python Web Services project, I was able to mash together a basic SOAP Publisher for Zope 3. It doesn’t deal with multipart messages, WS-Addressing or whatever the hell that is, “document/literal”, or even Request argument processing. Well, that’s made available on the Parsed Soap instance from ZSI that’s put on the request. The handlers - the remote procedures / “controllers” / “views” / whatever you want to call them - still have to do quite a bit of work themselves, at least with the Parsed Soap message. This is because, as far as I can tell, one can’t do much without using explicitly built custom types, generated from tools like wsdl2python. Ahhh, this takes me back to all of the crap I had to deal with in ILU/CORBA - baroque generated code that one couldn’t live without, nor was likely to understand.

Even after all of that, I had to do a lot of ZSI typecode wrangling in said generated code before I even began to get results like what was generated from ColdFusion. And these are simple calls, really - no request arguments, and the returns are just lists of dictionaries. I stand amazed at just how much work is involved with getting these little details right, even with a toolkit.

Even with that toolkit’s aid, the status line in Firefox would read “Waiting for data from…” This was puzzling, as I thought I had finally mangled a seemingly proper Map and Array typecode / values together from the available tools. I started to worry - is there something at the HTTP level that SOAP is doing differently? Does it want chunked results? Is it not closing the connections? I started crawling deep in the networking guts of Zope and Twisted, trying different transfer encodings, verifying that the content-length header matched the content length in the response … nothing.

Then, on a whim, I decided to try dimensioning my array. ZSI’s array, by default, doesn’t seem to handle specifying some kind of variable length in the array type. So it was generating xsd:anyType[]. The ColdFusion results had xsd:anyType[n] with n being the length of the array. I finally got this working by dynamically altering half of a tuple on a typecode that descended from the strange Result type/typecode that I was building. I’m not even sure if this is thread safe, come to think of it. But by finally getting that dimension in there, Flash started to respond again.

Ugh. Remember, The S Stands For Simple.

That kind of development - closed, compiled, heavily dependent on tools like Flash that work on few desktops and cost a good chunk of money - is just so foreign to me. I’m no die-hard open source zealot: I love a lot of the commercial tools and applications for Mac OS X, and would rather use it than any of the “free desktops”. But when it comes to web development, I’m used to dynamic languages and source code that is often readily available or shareable without the need of heavy tools. This is (usually) code that I can read, which can teach me new techniques by real implementation and not vague “hello world” examples followed by nightmare specs. I’m used to the layers below that being either so lightweight or established that I don’t have to think about them much. I’m grateful that I still seldom have to think in XML. I’m glad that I have next to none of the endless Java acronyms memorized. Sure, I guess there’s a place for them all, but I’m happy that I’ve been able to make a living for the past ten years without all of that. And I’ll be happier still when this particular problem is behind us.

Labels: , , , , ,