3.2.13. Coming Around to Sublime Text

So, quite late to the party, I'm starting to come around to liking Sublime Text 2 after many months of using BBEdit 10 as my primary development editor. I still like BBEdit, but Sublime Text is starting to pull away.

One of the things I like about BBEdit is that it doesn't have a runaway set of extensions and packages. It just works pretty damn well out of the box. BBEdit also has a real manual. And a real preferences UI. BBEdit is fast, handles large files well, and is very Mac native.

I decided to take another look at Sublime Text recently. One reason is that my previous script for integrating PyFlakes into BBEdit broke in BBEdit 10.5, and I was starting to lose time to little mistakes that PyFlakes can find before I restart a dev server. I had a "PyFlakes on Save" command in TextMate that I was really starting to miss. I knew there was something similar for Sublime Text and wanted to check it out. Enter SublimeLinter.
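To be clear about the kind of mistake I mean: PyFlakes does purely static checking, so it catches things like unused imports and misspelled names without ever running the code. A made-up example of the sort of thing it flags:

import sys   # PyFlakes: 'sys' imported but unused
import os

def release_path(releases_dir, package):
    # PyFlakes: undefined name 'pacakge' (a typo for 'package')
    return os.path.join(releases_dir, pacakge)

Catching that before the dev server restarts, instead of after, adds up over a day.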

I'm not a fan of keeping track of lots of custom little packages and learning them. I have important work to do, and one of the reasons I stayed away from Sublime Text was my worry that I would spend more time tinkering with it to make it useful than I would spend doing my work. However, on re-evaluating Sublime Text, the following built-in functionality is really killer and has caused me to seriously consider a full switch:

  • Knowledge of directories in 'quick open / open anything'. Neither TextMate nor BBEdit seemed to do this. I can have large Python customer or internal projects with many files named 'interfaces.py' or 'base.py' or 'configure.zcml'. Doing a quick open to get to one of those files in other editors hasn't worked out too well. In Sublime Text, I can start typing the path and hit '/' to start matching against that subdirectory.
  • Open Anything rocks. Being able to not just jump to a particular file quickly, but to then navigate right into its classes/methods/etc, all from the keyboard, is again pretty damn amazing. And fast!
  • Multiple Selection. Being able to quickly rename a variable or method and also change all of its other uses in one fell swoop is pretty neat. I've generally solved this with "use selection for find / use selection for replace", but Sublime Text's option seems even better.
  • Split Windows. TextMate 1 never offered this. BBEdit can split a window on the same file, but I don't think it can show different files in a single split window. In general, this problem is solved with multiple windows, but I've missed being able to split a window across different files the way I could do so easily in Emacs and Vim. It's nice to have a fairly modern, fairly Mac-native editor that does this. It's especially nice in the full screen and 'distraction free' modes that I like to hit from time to time.

Something that was bothering me back when I used TextMate, and Emacs (and sometimes Vim) before that, was how out of sync my home and work setups could get. I do very little coding at home any more, and any time I would open up my editor there, all of the little settings and tweaks that I had set up at work wouldn't be there. BBEdit 10 solved this with native support for storing preferences, snippets, and scripts in Dropbox. I found some tips and tricks for storing Packages and Settings for Sublime Text, and they seem to work.
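For reference, the trick boils down to moving Sublime Text's support folders into Dropbox and symlinking them back into place. Here's a rough sketch of the idea in Python, assuming the default Sublime Text 2 paths on OS X (back things up first, and adjust the paths to taste):

import os
import shutil

home = os.path.expanduser('~')
sublime = os.path.join(home, 'Library/Application Support/Sublime Text 2')
dropbox = os.path.join(home, 'Dropbox/Sublime Text 2')

if not os.path.isdir(dropbox):
    os.makedirs(dropbox)

for name in ('Packages', 'Installed Packages', 'Pristine Packages'):
    src = os.path.join(sublime, name)
    dst = os.path.join(dropbox, name)
    if os.path.islink(src):
        continue                  # this machine already points at Dropbox
    if not os.path.exists(dst):
        shutil.move(src, dst)     # first machine: move the real folder into Dropbox
    elif os.path.isdir(src):
        shutil.rmtree(src)        # later machines: drop the local copy
    os.symlink(dst, src)          # point Sublime Text at the Dropbox copy

Once every machine points at the Dropbox copy, installed packages and settings follow you around for free.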


29.3.11. Needlessly adding complexity in tests in order to hide their complexity

Some tweets from DHH on testing:

I respect the guys behind it and I'm all for experimentation, but the proliferation of rSpec and Cucumber makes me sad. (source)

RSpec offends me aesthetically with no discernible benefit for its added complexity over test/unit. (source)

Cucumber makes no sense to me unless you have clients reading the tests. Why would you build a test-specific parser for English? (source)

The important thing is of course that we get people testing, so tools shouldn't matter too much. But the added complexity still upsets me. (source)

I agree, and I'm glad that these kinds of tests have never really caught fire in the Python world. There are implementations of the RSpec and Cucumber ideas, but they don't seem to be as fully embraced. In my opinion, the dark side of testing in the Python world is the abuse of doc tests, thinking that they make both good test cases and good documentation, when in fact they're neither. There are good use cases for doc tests, but I think they've been horribly abused by certain sects within the Python world. However, even when they've been horribly abused, the people writing them seem to go after pretty broad code coverage, and they don't waste a lot of time trying to be cleverly concise (in fact, the verbosity of these large doc tests is what makes them so awful when they're also treated as documentation).

One of my main issues with RSpec and Cucumber as I've seen them in the wild is that there seem to be very few tests, and the tests that do exist aren't terribly useful. They seem to be repeats of the classic "baby's first use case", which is "user logs in". Maybe their usage outside of the open-source world is different, but the few projects I've seen which use them have so few test cases that I'm always left going "that's it? you think you're tested?"

And as David points out, the complexity going on behind the scenes to make the tests read that way just seems silly. Granted, 'Unit Tests' aren't always that easy to read, but they offer a finer-grained picture of API interactions. And if you want clarity, just add some comments. Take cucumber-esque lines like "verify that the file is encoded in UTF-8" and "now the file is encoded as latin-1" and put them as comments above the test/assert/verify statements that prove them.
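Here's a rough sketch of what I mean, using plain unittest; the write_file and read_encoding helpers are just stand-ins for whatever a real project would use:

import codecs
import os
import tempfile
import unittest

def write_file(path, text, encoding):
    # stand-in helper: write text to disk in the given encoding
    with codecs.open(path, 'w', encoding) as f:
        f.write(text)

def read_encoding(path):
    # stand-in helper: naive encoding sniff - try UTF-8, fall back to latin-1
    raw = open(path, 'rb').read()
    try:
        raw.decode('utf-8')
        return 'utf-8'
    except UnicodeDecodeError:
        return 'latin-1'

class ReencodeTests(unittest.TestCase):

    def test_reencode_to_latin1(self):
        path = os.path.join(tempfile.mkdtemp(), 'doc.txt')

        # verify that the file is encoded in UTF-8
        write_file(path, u'r\xe9sum\xe9', 'utf-8')
        self.assertEqual(read_encoding(path), 'utf-8')

        # now the file is encoded as latin-1
        write_file(path, u'r\xe9sum\xe9', 'latin-1')
        self.assertEqual(read_encoding(path), 'latin-1')

if __name__ == '__main__':
    unittest.main()

The English sentences are still right there for a human to read, but nobody had to write a parser for them.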

RSpec and Cucumber feel like the kinds of fiddly things that get in the way of doing real work - you can spend a lot of time writing all the back-end support to get a single test to read like an English haiku. Or you can spend that time writing a good battery of tests that actually gets good coverage of the system.


3.3.10. Python Buildutils for local release management

Earlier today, I saw a question on Twitter asking "is there some setuptools extensions for uploading a sdist to a server over scp?" I responded with a "yes - we use buildutils' publish command for that."

Buildutils is a collection of useful extensions to Distutils. It hasn't been touched in a while, but it mostly works. It adds some commands like "stats", "pyflakes" (useful, but does not work with recent versions of PyFlakes), and "publish", which allows you to upload a release via SCP or SFTP.

The "publish" command is an extension that must be configured explicitly, so we do it in our 'setup.cfg'. We have a setup.cfg template that we use for all of our packages that uploads releases into one big directory. It looks like this:

; Enable the buildutils publish command
[global]
command_packages = buildutils.publish_command

; Set the destination of the publish command
[publish]
dist_dest = scp://@internalserver/path/to/releases/

; In one command, generate a distribution and checksum,
; upload them, and then clean out the build-related turds.
[aliases]
mkrelease = sdist checksum publish clean --all
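With that in place, cutting and publishing a release from a package checkout is just a matter of running 'python setup.py mkrelease' (assuming Buildutils is importable by that Python), which expands to the 'sdist checksum publish clean --all' chain above.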

We also use zc.buildout in-house, and I've been using that to ensure that I have Buildutils and setuptools-git (which uses the files tracked by Git to generate the manifest of files to include in the distribution package). In our buildout configs, I usually have a 'devtools' section that looks like this:

[devtools]
recipe = zc.recipe.egg:scripts
interpreter = py
eggs =
    current.package [test]
    buildutils
    setuptools-git

With the above, I get a 'bin/pbu' script that has the Buildutils and setuptools-git extensions installed. 'pbu' is a convenience command-line tool from Buildutils that is basically shorthand for 'python setup.py'. Using Buildout like this, I just ensure that the tools I need to generate and publish a distribution are available, no matter what machine I'm using. It's not needed, but I just find it convenient, particularly when other developers in our company need to generate a release from their machines and may not have remembered to install something like setuptools-git.


1.10.08. Giddy-up 401, File Uploads, and Safari

I've recently been doing some work to support ZODB 3.8 BlobFiles in our Zope 3 based sites and applications. Doing this brought me back around to some behavior I've seen in the past and had probably learned to ignore: uploading a large file from Safari using a basic HTML form (with the proper encoding type, POST, etc.) seems to take inexplicably long. Even worse - once behind Apache, you might not get the expected response, if any. You might get a 'timed out' response, leaving you unsure whether the app server received everything and will finish the request/response cycle on its own.

It turns out that Safari does not eagerly send authentication information along with each request when logged in with Basic Auth. When it does, it seems to do so only within a very short time window.

So say you're logged in to your application with basic auth (for better or worse). The normal pattern is that when encountering an unauthenticated situation, Zope will challenge with a 401 status code and the WWW-Authenticate header (or something like that - I'm away from all specs right now). If you're not logged in, then you get the basic auth dialog box and send along the credentials. If you are "logged in", then the browser doesn't ask for credentials again, but just sends them along.

The downside is that this causes a request to be repeated. And if you just tried uploading a 5 MB file, then that whole file has to be re-submitted to make the full request.

It's the right thing to do with the basic (ha!) information at hand - trying to post all of that data in a single request. But Safari should recognize that it's submitting to the same server (if not the same URL!) and automatically include the auth headers. Safari does seem to do this, but only for requests within very short windows.

Firefox, on the other hand, seems to have a much longer window in which it will send the credentials along automatically on the first request, instead of waiting for the challenge.
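If you want to see the difference, here's a little sketch using Python 2's urllib2 (the URL, filename, and credentials are made up). The first flow waits for the 401 challenge, so the large body crosses the wire twice; the second sends the Authorization header preemptively, so the upload happens only once:

import base64
import urllib2

url = 'http://internalserver/upload'
body = open('bigfile.blob', 'rb').read()

# Flow 1: wait for the challenge. HTTPBasicAuthHandler only adds the
# credentials after the server answers 401, so the POST is sent twice.
password_mgr = urllib2.HTTPPasswordMgrWithDefaultRealm()
password_mgr.add_password(None, url, 'user', 'secret')
opener = urllib2.build_opener(urllib2.HTTPBasicAuthHandler(password_mgr))
opener.open(url, body)

# Flow 2: send the Authorization header preemptively on the first
# request, so the body only goes over the wire once.
auth = 'Basic ' + base64.b64encode('user:secret')
request = urllib2.Request(url, body, {'Authorization': auth})
urllib2.urlopen(request)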

I don't know how other browsers do it. I'm not sure what the spec says, if anything. Glancing at O'Reilly's "HTTP - The Definitive Guide" didn't seem to give any indication of whether it's better for the client to assume that it should send the authentication tokens along with each request back to the same server, or if it's better for the client to hold off on constantly sending that info along unless challenged.

Most of the time this doesn't really seem to matter - it's not something end users typically see as it goes by fast enough to rarely be noticed. Of course there are other ways of getting credentials (cookies, sessions, subdomain/remote ip mapping, etc) which we often use on the main public-facing side of our sites. But for content management and admin, Basic Auth is such an easy fallback, especially in a framework like Zope (1, 2, or 3) which has long had a strong security story and would automatically handle this crap for you way back in the days of Bobo (1996, if not earlier).

It's just an annoyance. Glad I nailed it down to this: uploading large files with Safari (I think IE is, or was, similar) to basic-auth protected sites can often time out because the browser posts once, gets the 401 Unauthorized challenge, and does the post again - this time with the auth info.

Solutions:

  • don't use basic auth for sites that expect moderate to heavy uploading via forms.
  • recommend and/or use browsers that send the auth token along more often in the first request.
  • provide better interfaces for uploading files, with better communication to the uploader about status, and perhaps a better interface into the destination web app. Fortunately, there appear to be some free and open solutions out there already.

Wheee!


10.6.08. Looking to a Snow Leopard Winter.. er... Summer.

I’m a bit excited about Mac OS X “Snow Leopard”. Few user-visible changes, with a focus on fine-tuning and giving developers better access to capabilities of modern hardware. It appears that Apple’s experience in making a lightweight Mac OS X “Core OS” for the iPhone will also drive this release.

One of my favorite operating system releases was OS/2 “Warp” (OS/2 3.0). OS/2 2.0 was a fascinating creature - completely divorced from Microsoft, it delivered an aggressively object-oriented runtime built on SOM (a desktop implementation of some of CORBA 1.x, I believe). It was radically different from Windows 3.x. Its hardware requirements were a bit high for the times, but it was a solid OS.

What impressed me about OS/2 3.0 “Warp” was that its system requirements were in some cases significantly LESS than OS/2 2.0’s, while performing better. I don’t know of any major user-visible adjustments (this was before operating system releases became the giant dog’n’pony shows that have been expected since both Windows 95 and Mac OS X).

I think that even though desktop and laptop hardware continue to get better, the rapid growth rates seen between 1995 and 2005 are slowing down. Now the pressure is on connectivity, portability, and storage storage storage for all of those mp3s and movies and photos. I think both Windows XP and Vista, along with Mac OS X 10.4 and particularly 10.5 have been a bit cavalier about their usage / expectation of resource availability without doing a good job of cleaning up afterwards. Removing a ‘TemporaryFiles’ folder used by Apple’s “Soundtrack Pro” program gave me back 25 GB of disk space. 25GB! I expect that when doing lossless audio work, I’m going to leave a lot of turds behind. But not that many. That’s an accumulation over only a few months. Now some of that may have been due to crashes brought on by the instability in Mac OS X 10.5.2’s audio subsystem (particularly in relation to some USB audio devices). But still - 25 gig! Over the course of just a couple of months!

I think that Apple is in a good place to do this. Good housekeeping is required - otherwise you end up with situations like Mac OS ‘Classic’ or even Windows Vista, where there is so much old baggage, so many bad hacks and outdated mentalities all in play, that it becomes difficult to move the platform forward. Some companies and developers have always been mindful of this, electing to keep their products lean and fast, always (see Ableton Live - hands down, the most impressive audio application out there). Other companies don’t support that philosophy, for whatever reason - backwards compatibility, rush to market, a combination of the two, etc.

This far into the Mac OS X life cycle, there aren’t many new dog’n’pony features to add. The APIs have stabilized, the developer tools offer more than they ever have (Interface Builder 3 is a terrific update), the Finder and Spotlight are actually fast and usable; applications and utilities from both inside and outside of Apple are going to really shine on Mac OS X 10.5 with all that it offers to developers. A new age of PDAs is upon us, whether it’s a device like an iPhone, an ultra-mobile Asus Eee-PC style portable, or even the Macbook Air: secondary and tertiary devices are really taking off.

I think that an underlying part of the ‘Snow Leopard’ plan is to allow such devices, made by Apple (naturally), to proliferate. When it was announced that the iPhone was built on Mac OS X, I was surprised - Mac OS X has been a pretty wasteful OS, or at least one that consumes more resources than you realize (often for caching, interestingly enough). A standard install is full of crap that may be useful but often just takes up space. How many gigabytes of printer drivers now? Taking the fine-tuning and resource management ideas from the iPhone variation of OS X into the main system is what I think will allow Apple to finally make the Eee PC style portable that everyone wanted the Macbook Air to be.

I’m putting my money on some kind of small device, priced around $600-$800, coming out at or around the same time as Snow Leopard. Combined with Mobile Me and Snow Leopard Server’s increasingly Exchange-like feature set (but better priced and more understandable for small organizations), the ubiquitous-data-access capability is there.

Today’s full-featured laptops (MacBooks, Inspirons, whatever) are their own entities; my aging iBook gets used rarely as I just don’t have as much data or software set up on it, and it’s sometimes too big of a pain to keep in sync.

The XO and Eee-PCs (or whatever they’re called) are also separate from the rest of one’s life; useful as a fun or educational toy, or as a geek’s favorite gadget to see what they can get running on such a little device. Most of the other developments I’ve seen in this area have centered around “how cheap and how small can we make a laptop/portable that will run (Linux/Windows XP)”. But outside of education, if this is the only focus being given, then these companies are going to be making nothing more than the next round of casual gadgets that get tossed or buried after a few months - especially if a key factor of what made Palm devices so popular (for a while) is completely neglected.

The Macbook Air is deliberately designed as a complementary computer, using the master’s optical drive even. While sexy, I think the Macbook Air misses the mark on a few items. But I think it’s an indication of things to come - laptops deliberately designed to complement your main machine. Smaller devices, from the Palm to the iPhone, have done this. And they’ll also be designed to work with your (or your company’s) data, which the Blackberry has done (and the iPhone will do when its new ‘enterprise’ support rolls out). Getting this onto other devices, without being constrained to an enterprisey system like Notes or Outlook, is where things really appear to be headed. It’s certainly something that I’d like to have. And the more I look at Snow Leopard, the more I believe that Apple is sneaking ahead of the crowd into delivering this into the hands of consumers. They’re skating to where the puck is going to be.

Granted, Windows “Live Mesh” looks to be heading in the same direction. But after Vista, Microsoft needs to rein in the Windows kernel and distribution. Windows Server 2008 and some of what has been leaked (or speculated) about “Windows 7” seem to indicate that Microsoft is aware of this. And how could they not be? But I think that even with their vast resources, Microsoft has a long way to go to catch up - even though it appears that they’ve been playing in this area (tablet computing, ultra-mobile PCs) for a while. A deep cleansing of the Windows core is desperately needed. And then a deep re-implementation of the UI may be needed as well.

Apple had a terrific luxury (and great idea) with the iPhone. While sharing the same kernel and many of the same APIs as the desktop (and server) Mac OS X, it has an entirely new UI that is dedicated to its intended use. Windows CE, on the other hand, tried to bring the Windows 95 look and feel to tiny devices, and now I’m really not so sure it was a good idea. It allowed Microsoft to punt on some usability and design issues by falling back on the way things work on the desktop. I still see this, even in some of the newest and fanciest “iPhone killers”: some of these have a very fancy launcher app; some even have a very fancy phone and contact app that spins around in 3D and responds to gestures. But then, suddenly, you’re in the tiny-font, tiny-scrollbar, pixelated, stylus-driven world of the interior. It’s like going into a grand building like The Plaza (back when it was a hotel, at least) and finding the inside full of grey linoleum floors, flickering fluorescent lights, and cinderblock walls reminiscent of an old hospital or elementary school. Quite the let-down (a lot of courthouses are like this, actually).

I also think Apple was smart NOT to have an SDK at the launch of the iPhone. I bet they would have liked one, but I think the iPhone had to launch when it did, and perhaps not-quite-everything was ready yet. If one looks back at the classic Macintosh and Palm devices and operating systems, you see systems that pulled off very clever hacks to fit within the price and size constraints of the time. The Lisa was much more than a $10,000 Macintosh - it had many features, from power management to an OpenDoc style multi-tasking, document-based UI. But to offer those features, it was priced well out of reach. The Macintosh squeezed as much as it could into a 128K RAM machine, and the compromises they had to make in order for that to work would end up haunting the company until its near-death. The Palm, too, took the ideas of the Newton and other tablet devices and stripped them down to a size and price point approachable by the masses. And like Apple, the design decisions that were made to make that work have crippled the Palm OS so much that even Palm now sells half of its devices with Windows CE (or whatever CE is called these days). Those compromises are bad enough to deal with on your own - but when you also have to support third party developers and then provide some degree of backwards compatibility, they can just kill you.

By taking the time to put the SDK into beta, to polish up the OS and its APIs, I think Apple will avoid a repeat of that story. Instead of having to support every little exposed compromise that may have been made to get the iPhones out the door last June, Apple could tidy them up. By using a beta period for the SDK and next major release of the software, Apple can respond to feedback and make changes and adjustments before they become permanent.


2.12.07. Distributed VCS's are the Great Enablers (or: don't fear the repo)

The more I play with the new breed of VCS tools, the more I appreciate them. The older generations (CVS, SVN) look increasingly archaic, supporting a computing and development model that seems unsustainable. Yet most of us lived with those tools, or something similar, for most of our development-focused lives.

When I speak of the new breed, the two standouts (to me) are Git and Mercurial. There are some other interesting ones, particularly Darcs, but Git and Mercurial seem to have the most steam and seem fairly grounded and stable. Between those two, I still find myself preferring Git. I’ve had some nasty webs to untangle and Git has provided me with the best resources to untangle them.

Those webs are actually all related to CVS and some messed up trunks and branches. Some of the code lives on in CVS, but thanks to Git, sorting out the mess and/or bringing in a huge amount of new work (done outside of version control, because no one likes branching in CVS and everyone is afraid of ‘breaking the build’) was far less traumatic than usual.

One of those messes could have been avoided had we been using Git as a company (which is planned). One of the great things these tools provide is the ability to easily do speculative development. Branching and merging is so easy. And most of those branches are private. One big problem we have with CVS is what to name a branch: how to make the name unique, informative, and communicative to others. And then we have to tag its beginnings, its breaking off points, its merge points, etc, just in case something goes wrong (or even right, in the case of multiple merges). All of those tags end up in the big cloud: long, stuffy, confusing names that outlive their usefulness. It’s one thing to deal with all of this for an important branch that everyone agrees is important. It’s another to go through all of this just for a couple of days or weeks of personal work. So no one does it. And big chunks of work are just done dangerously - nothing checked in for days at a time. And what if that big chunk of work turned out to be a failed experiment? Maybe there are a couple of good ideas in that work, and it might be worth referring to later, so maybe now one makes a branch and does a single gigantic check-in, just so that there’s a record somewhere. But now, one can’t easily untangle a couple of good ideas from the majority of failed-experiment code. “Oh!” they’ll say in the future, “I had that problem solved! It’s just all tangled up in the soft-link-experimental-branch in one big check in and I didn’t have the time to sort it out!”

I speak from personal experience on that last one. I’m still kicking myself over that scenario. The whole problem turned out to be bigger than expected, and now there’s just a big blob of crap, sitting in the CVS repository somewhere.

With a distributed VCS, I could have branched the moment it looked like the problem was getting bigger than expected. Then I could keep committing in small chunks to my personal branch until I realized the experiment had failed. With smaller check-ins, navigating the history to cherry-pick the couple of good, usable ideas would have been much easier, even if everything else was discarded. I wouldn’t have to worry about ‘breaking the build’ or about finding a good name for my branch, since no one else would ever have to see it. I could manage it all myself.

This is the speculative development benefit that alone makes these tools great. It’s so easy to branch, MERGE, rebase, etc. And it can all be done without impacting anyone else.

One thing that I often hear when I start advocating distributed VCS’s is “well, I like having a central repository that I can always get to” or “that is always backed up” or “that is the known master copy.” There’s nothing inherent in distributed VCS’s that prevents you from having that. You can totally have a model similar to SVN/CVS with regard to a central repository, with a mixture of read-only and read/write access. But unlike CVS (or SVN), what you publish out of that repository is basically the same thing that you have in a local clone. No repository is technically more special than any other; policy is what makes one the master. You can say “all of our company’s main code is on server X under path /pub/scm/…”.

And unlike CVS (or SVN), really wild development can be done totally away from that central collection. A small team can share repositories amongst themselves, and then one person can push the changes in to the central place. Or the team may publish their repository at a new location for someone else to review and integrate. Since they all stem from the same source, comparisons and merges should all still work, even though the repositories are separate.

Imagine this in a company that has hired a new developer. Perhaps during their first three months (a typical probationary period), they do not get write access to the core repositories. With a distributed VCS, they can clone the project(s) they’re assigned to, do their work, and then publish their results by telling their supervisor “hey, look at my changes, you can read them here …”, where here may be an HTTP URL or just a file system path. The supervisor can then conduct code reviews on the new guy’s work and make suggestions or push in changes of his own. When the new developer’s code is approved, the supervisor or some other senior developer is responsible for doing the merge. It’s all still tracked, all under version control, but the source is protected from any new-guy mistakes, and the new guy doesn’t have to feel pressure about committing changes to a large code base which he doesn’t yet fully grasp.

But perhaps the most killer feature of these tools is how easy it is to put anything under revision management. I sometimes have scripts that I start writing to do a small job, typically some kind of data transformation. Sometimes those scripts get changed a lot over the course of some small project, which is typically OK: they’re only going to be used once, right?

This past week, I found myself having to track down one such set of scripts again, because some files had gotten overwritten with new files based on WAY old formats of the data. Basically I needed to find my old transformations and run them again. Fortunately, I still had the scripts. But they didn’t work 100%, and as I looked at the code I remembered one small difference that 5% of the old old files had. Well, I didn’t remember the difference itself, I just remembered that they had a minor difference and that I had adjusted the script appropriately to finish up that final small set of files. But now, I didn’t have the version of the script that worked against the other 95%. When I did the work initially, it was done in such a short time that I was probably just using my editor’s UNDO/REDO buffer to move between the differences when needed.

Now if I had just gone in to the directory with the scripts and done a git init; git add .; git commit sequence, I would probably have the minor differences right there. But I didn’t know such tools were available at the time. So now I had to rewrite things. This time, I put the scripts and data files under git’s control so that I had easy reference to the before and after stages of the data files, just in case this scenario ever happened again.

I didn’t have to think of a good place to put these things in our CVS repo. I just made the repository for myself and worried about where to put it for future access later. With CVS/SVN, you have to think about this up front. And when it’s just a personal little project or a personal couple of scripts, it hardly seems worth it, even if you may want some kind of history.

Actually, that is the killer feature! By making everything local, you can just do it: make a repository, make a branch, make a radical change, take a chance! If it’s worth sharing, you can think about how to do that when the time is right. With the forced-central/always-on repository structure of CVS and SVN, you have to think about those things ahead of time: where to import this code, what should I name this branch so it doesn’t interfere with others, how can I save this very experimental work safely so I can come back to it later without impacting others, is this work big enough to merit the headaches of maintaining a branch, can I commit this change and not break the build….?

As such, those systems punish speculation. I notice this behavior in myself and in my colleagues: it’s preferred to just work for two weeks on something critical with no backup solution, no ability to share, no ability to backtrack, etc., than it is to deal with CVS. I once lost three days’ worth of work due to working like this - and it was on a project that no one else was working on or depending on! I was just doing a lot of work simultaneously and never felt comfortable committing it to CVS. And then one day, I accidentally wiped out a parent directory and lost everything.

Now, with a distributed VCS, I could have been committing and committing and could still have lost everything anyway, since the local repository lives right there: but I could have made my own “central” repository on my development machine or on the network, to which I could push from time to time. I would have lost a lot less.

There are so many good reasons to try one of these new tools out. But I think the most important one comes down to this: just get it out of your head. Just commit the changes. Just start a local repository. Don’t create undue stress and open loops in your head about what, where, or when to import or commit something. Don’t start making copies of ‘index.html’ as ‘index1.html’, ‘index2.html’, ‘index1-older.html’, ‘old/index.html’, ‘older/index.html’ and hope that you’ll remember their relationships to each other in the future. Just do your work, commit the changes, get that stress out of your head. Share the changes when you’re ready.

It’s a much better way of working, even if it’s only for yourself.
