- Thursday, June 12, 2008
DVCS Myths
My last post on distributed version control systems generated some interesting discussion, both in the comments here and elsewhere on the Web. A number of the responses were interesting and thought-provoking, while others were so full of FUD and misinformation that I couldn't help but wonder if they were serious. I'll admit that I was surprised by some of the negative backlash against DVCS. I have explained it to many former users of centralized systems, and it simply never struck me as a very controversial technology. I don't want to just completely ignore the criticism, however. This post is an attempt to respond directly to some of the more common criticisms, and hopefully convince some of the skeptics that even if DVCS isn't the solution for you, at least it won't set your computer on fire.
DVCS Myth #1: You must change your workflow to adopt DVCS
Many descriptions of DVCS focus on the new and interesting workflows it enables. Indeed, this is a key feature of distributed version control, but it has a tendency to give the implication that DVCS is only useful if you really need to change your workflow.
This is entirely untrue. DVCS is flexible, and can be implemented in some very interesting and unique ways. However, it can also act just like your centralized system, and its advantages are no less significant.
At our company, for example, we switched from Subversion to Mercurial without changing our model at all, at least initially. We kept the same branch structure, used the same server, and did things in generally the same way. As our team has grown and diversified, our needs have as well, so we've leveraged some of the strengths of the DVCS model to match our workflow. The key is that DVCS works with your desired workflow rather than dictating it. If your desired workflow is similar to or identical to the "central server" model, that's a perfectly acceptable use case for applying DVCS.
DVCS Myth #2: Workflows enabled by DVCS are less natural than the centralized workflow
For long-time users of centralized systems, this is an understandable belief. Indeed, the workflow mandated by a centralized system may in some cases be the most natural. In these cases, DVCS offers the best implementation of the centralized workflow I've found. It's in cases where the centralized model is not the most natural workflow, however, that the unique properties of DVCS really shine.
As a specific example, DVCS has enabled me to manage changes to my home directory much more naturally than in a centralized system. I keep the contents of my home directory (dot files, elisp, etc.) under version control. I was using Subversion prior to discovering DVCS. With Subversion, I ran the server on my home development workstation, which I left powered on during the day so it was accessible from work (forcing me to pay otherwise unnecessary power costs). In addition, I paid $5 per month to my ISP for a static IP (dynamic DNS was unfortunately not an option due to the NAT configuration of my fiber-to-home service).
Despite these costs, the workflow in this setup was extremely unnatural. When I would make an update to the repository on the bus, I would have to leave the files in a modified state. Upon arriving at work, I would then have to open the laptop, connect to the network, then either make a bulk checkin with all of the changes or manually partition the modified files into the proper groups for changesets.
If, on the other hand, I made changes on my work computer and wanted to check them in while my home server was down (because of a network outage, or simply because I forgot to turn it on in the morning), I would have to manually generate patches from the repository (again, forcing myself to later reassemble them into logical changesets). Of course, accessing source control over the Internet is never ideal from a performance perspective, even when the server is always up. This is particularly true when using a strained corporate connection to talk to a server on an upload limited consumer line.
This was an annoying process, to say the least, and while it was a huge improvement over manually copying my home directory around, it left much room for improvement.
With DVCS, all of the annoyances of the previous model are gone. I can make commits from the bus without network access, and these commits are properly organized into the appropriate changesets as opposed to a giant single patch. I can easily pull these changes into my work computer's repository when I get to work, or I can leave the laptop in the bag and merge them another time. The changes I make on my work computer, meanwhile, need to make it back to my home machine. However, with my DVCS-powered workflow I now keep my machine turned off during the day (making DVCS the green SCM choice). I have also canceled my static IP service, saving myself $5 a month. In the absence of direct access to my home repository, I use a variety of mechanisms for sharing changes. Most commonly, I transfer changesets to my home machine via my laptop's repository. In other cases, I will export a handful of changesets and transfer them with a USB thumb drive or via email. In general, I use the most convenient option available, though I have used all three in various situations.
Regardless of where the work happens or how it's transferred, merging the changesets is simple with DVCS because, well, DVCS is designed to make merging changesets simple. It's simple no matter where the changesets originated, in part because DVCS uses unique hashes to identify changesets. Of course, it also tracks the parent revision of each changeset, so it can determine cases where a merge isn't necessary at all. This, unsurprisingly, is the most common case given that I'm the only user in this scenario.
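To make the hash-and-parent idea concrete, here is a minimal Python sketch. It is not how Mercurial or any other real DVCS stores history. The `Changeset` class and the `make_changeset`, `ancestors` and `needs_merge` functions are hypothetical names invented for this illustration, and hashing the parent IDs together with the description and content is an assumed, simplified ID scheme.

```python
import hashlib
from dataclasses import dataclass
from typing import Dict, Sequence, Set, Tuple


@dataclass(frozen=True)
class Changeset:
    """A toy changeset (hypothetical): an ID plus the IDs of its parents."""
    node: str
    parents: Tuple[str, ...]
    description: str


def make_changeset(store: Dict[str, Changeset], parents: Sequence[str],
                   description: str, content: str) -> Changeset:
    # Assumed ID scheme: hash the parent IDs, description and content, so
    # the ID does not depend on which machine created the changeset.
    digest = hashlib.sha1()
    for parent in sorted(parents):
        digest.update(parent.encode("utf-8"))
    digest.update(description.encode("utf-8"))
    digest.update(content.encode("utf-8"))
    changeset = Changeset(digest.hexdigest(), tuple(parents), description)
    store[changeset.node] = changeset
    return changeset


def ancestors(store: Dict[str, Changeset], node: str) -> Set[str]:
    """Return the node itself plus everything reachable through parents."""
    seen: Set[str] = set()
    stack = [node]
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        stack.extend(store[current].parents)
    return seen


def needs_merge(store: Dict[str, Changeset], local_head: str,
                incoming_head: str) -> bool:
    """A merge is needed only if neither head is an ancestor of the other."""
    if local_head in ancestors(store, incoming_head):
        return False  # incoming history simply extends local history
    if incoming_head in ancestors(store, local_head):
        return False  # local history already contains the incoming work
    return True


if __name__ == "__main__":
    store: Dict[str, Changeset] = {}
    root = make_changeset(store, [], "initial import", "dotfiles v1")
    laptop = make_changeset(store, [root.node], "tweak .emacs on the bus", "v2")
    work = make_changeset(store, [root.node], "add alias at work", "v3")
    print(needs_merge(store, root.node, laptop.node))  # False: straight-line history
    print(needs_merge(store, laptop.node, work.node))  # True: two heads diverged
```

In the single-user case described above, most transfers look like the first call: one machine's history is a strict extension of the other's, so there is nothing to merge.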
One thing you may have noticed in the workflow description is that it's a bit ambiguous which of my computers is "the server". Previously, it was my home machine, but why? It could just as easily have been any other machine ... in fact, I probably would have been better off running the centralized server on the laptop, though that doesn't seem quite right either. The fact is, this is a workflow where "the server" is naturally ambiguous. There is no real value in designating my home machine (or any other) in this role. Thus, the centralized model for version controlling my home directory simply isn't a natural fit. The DVCS model, on the other hand, easily and naturally supports my desired workflow. There are no "hacks" required to make this work cleanly.
As an added bonus, I get free offsite backups of my home directory repository. This leads to our next myth.
DVCS Myth #3: DVCS users don't believe in backups
The idea that DVCS users don't believe in backups is surprisingly pervasive, perhaps because of the passive attitude DVCS advocates tend to have about server outages. At our company, we have the same attitude, but we also make very frequent backups of our centralized repository. Using DVCS may theoretically reduce the need for backups, but by no means does it eliminate it.
So, why make backups of a source control server with so many backups? It is improbable that many servers will suffer catastrophic hardware failures simultaneously, but it is not impossible. A more likely scenario might be a particularly nasty computer virus that sinks its teeth into an entire network of vulnerable machines. In any case, the probability of any or all of your backups becoming suddenly unavailable is really not the point. The bottom line is that using independent clones as canonical backups (as opposed to temporary stopgaps) is a suboptimal strategy.
Security, for example, should be considered. If you are using authorization rules to control access to specific portions of your repository, canonicalizing an arbitrary clone of the repository effectively renders those rules useless. While this would rarely be a matter of practical concern in a controlled corporate environment, it is nonetheless possible. It is worth noting that in an environment where a backup process is infeasible (for financial, political or other reasons), backing up hashes of the repository files and their revisions for post-backup verification could provide a mitigation.
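As a rough illustration of the hash-based mitigation, the following Python sketch records a SHA-256 digest for every file in a directory tree and later reports any file that no longer matches. The function names `build_manifest` and `find_mismatches` are hypothetical, and the sketch assumes you are hashing an exported snapshot of the tracked files at a known revision rather than a DVCS's internal storage files.

```python
import hashlib
from pathlib import Path
from typing import Dict, List


def build_manifest(root: Path) -> Dict[str, str]:
    """Map each file's path (relative to root) to its SHA-256 digest."""
    manifest: Dict[str, str] = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            manifest[path.relative_to(root).as_posix()] = digest
    return manifest


def find_mismatches(root: Path, manifest: Dict[str, str]) -> List[str]:
    """List files that are changed, missing, or absent from the manifest."""
    current = build_manifest(root)
    changed_or_missing = [name for name in manifest
                          if current.get(name) != manifest[name]]
    unexpected = [name for name in current if name not in manifest]
    return sorted(changed_or_missing + unexpected)
```

The manifest is small enough to store almost anywhere. Before promoting a clone to canonical status, you would export the same revision from it and check that `find_mismatches` returns an empty list.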
The key win of DVCS for backups, then, is that you don't really need to invest in a "hot" backup. When the server inevitably goes down, DVCS will buy you time. Lots of time. You'll essentially be running at full productivity (or very nearly so) while you rebuild your server from backup. When changesets created during the server downtime are pushed back to the restored server, the freshly restored authorization rules will be reapplied and you'll be back on track.
DVCS Myth #4: Authentication and authorization don't exist with DVCS
I touched on this a bit in the previous myth, but it's worth emphasizing. Authentication and authorization absolutely do exist in the DVCS model. They only apply where you choose to apply them, however.
Our company has a canonical source control server which applies both authentication and authorization rules. The authentication rules are specified via the Apache server configuration which provides network access to the repositories. In fact, we leveraged the exact same Apache authentication configuration we had used for our Subversion installation (this configuration allows us to leverage the user database in the company's Windows domain). For authorization, we use the more flexible options offered by Mercurial's ACL configuration. In the simplest case, we have developer-specific copies of mainline development branches which can be pulled by anyone, but only pushed (written) to by the developer who owns it. Grouping users and splitting access based on subpaths of a single repository are nearly as simple.
Because the authorization rules are applied when changesets are pushed, developers working on local repositories are not denied any flexibility until they attempt to push their changes. A user without push access to a given repository can still commit changes against a local clone of it; they just can't push them directly. Thus, they would have to convince a developer who does have such permission that their changes are worthy of inclusion. Despite the flexibility, the effectiveness of the authentication and authorization rules is not compromised.
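The idea of checking authorization only at push time can be sketched in a few lines of Python. This is an illustration of the concept, not Mercurial's ACL implementation or its configuration format. The `rejected_paths` function, the prefix-based rule table, and the deny-by-default behavior are all assumptions made for the example.

```python
from typing import Dict, Iterable, List, Set


def rejected_paths(user: str, changed_paths: Iterable[str],
                   allow: Dict[str, Set[str]]) -> List[str]:
    """Return the changed paths this user may not push (empty = accept).

    `allow` maps a path prefix to the set of users allowed to push
    changes under it. The longest matching prefix decides.
    """
    rejected: List[str] = []
    for path in changed_paths:
        matches = [prefix for prefix in allow if path.startswith(prefix)]
        if not matches:
            rejected.append(path)  # assumption: no matching rule means deny
            continue
        most_specific = max(matches, key=len)
        if user not in allow[most_specific]:
            rejected.append(path)
    return rejected


if __name__ == "__main__":
    # Hypothetical rules: everyone listed may push anywhere, but only
    # alice may push changes under release/.
    rules = {"": {"alice", "bob"}, "release/": {"alice"}}
    incoming = ["src/main.c", "release/notes.txt"]
    print(rejected_paths("alice", incoming, rules))  # []
    print(rejected_paths("bob", incoming, rules))    # ['release/notes.txt']
```

Nothing in this check runs when a developer commits locally. It only matters at the moment changesets arrive at the shared repository, which is why local flexibility is unaffected.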
DVCS Myth #5: DVCS can be used in corporate environments, but its advantages are mostly geared towards open source projects
DVCS is indeed quite popular for open source projects, and the reasons are fairly obvious. When many disconnected developers are working on the same project, the workflow flexibility provided by DVCS becomes increasingly important. It also provides a clean mechanism for remote users without commit access to the primary repository to create new functionality within the code base.
A user working on a new experimental feature, for example, can perform the work on their local clone of the repository. Within this local copy, they can commit changesets and integrate changes from other users. Importantly, as the mainline codebase evolves, they can also merge their changes with updated upstream code in a clean and organized way. In a centralized system, they would be forced to maintain their changes as a set of patches, manually rebasing them when they pull down new changes in the upstream repository. When their experimental feature is complete, they can easily export the new work and send a compact package to the appropriate maintainer.
The workflow flexibility DVCS offers is particularly valuable in an open source project with multiple maintainers. The maintainers in this scenario would be responsible for integrating contributions from the community for the modules they are responsible for. The contributions would mostly come in the form of structured patches exported by the DVCS client. The process of integrating these patches is easier and more organized with DVCS. Additionally, merging responsibilities can easily be split amongst several maintainers for patches that are accepted.
In corporations, on the other hand, it is rare to find groups collaborating by sharing patches. Thus, at a glance the flexibility offered by the DVCS model might seem to be overkill. In some cases, this is true; I doubt any company or organization needs every feature provided by DVCS. However, having the capability to restructure your workflow is extremely valuable, even if you don't need it yet. And the parts of DVCS you don't use certainly don't cost you anything.
Perhaps you want to prevent a group of developers from committing changes to your mainline repository until they are reviewed. Using a centralized system, the developers from this group must submit patches for review and integration with the main code base. Their commit logs are thus lost, overwritten by the reviewing developer who applies the patch. DVCS makes this significantly easier. Those responsible for reviewing the changes simply pull reviewable changesets directly from the developer, or they can pull them via a developer-specific branch on the server. If the changes pass review, they can push them along to the main repository (only they would have access to do this). This group might also collect their changes together into their own shared repository, enabling a variety of changes to be tested together.
These sorts of scenarios can be especially useful when collaborating on a single project with an external development organization. I have seen attempts to use centralized version control systems in these scenarios fail miserably. Corporate centralized servers are rarely designed to be exposed on public networks, so naturally administrators shy away from enabling remote access, except via VPN. When no shared central server is available, the inevitable result is a hackish process. In the best case this might be a process based on exchanging patches from known parents, but more commonly it involves trading full copies of the source code with all revision history lost, then performing painful manual merges. With DVCS, you simply send the whole repository once, then share the exported changesets by whatever transfer mechanism is most convenient (server access is just a bonus). Merges can again be performed by either development organization, which is especially convenient.
DVCS Myth #6: Having a server with perfect uptime invalidates the advantages of DVCS
Even if you are not on a plane, you may very well be on the bus, or at home, or at a coffee shop, or in a hotel room, or on vacation. Just because your server has perfect uptime doesn't mean you're always in position to access it.
I have personally needed source control repository access in every single one of these places (including the plane), and I cannot rely on high speed Internet access in most of them. A couple of weeks ago, while riding in on the bus, I got a call about a bug that needed fixing. I was only 10 minutes from my destination, but that was enough time to crack open my laptop and run a bisect session to uncover the changeset which introduced the bug. Upon arriving at work, I knew exactly what the problem was and I was able to fix it immediately. With more time, I could have prepared the fixed changeset on the bus, ready to transfer as soon as I arrived. DVCS not only gives you access to your repository everywhere, it offers the most performant experience possible in all of these scenarios.
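For readers who haven't used a bisect command, the reason ten minutes can be enough is that bisecting is a binary search over history. The Python sketch below shows the idea under simplifying assumptions: history is linear, the first revision is known good, and the last is known bad. The `find_first_bad` function and the `is_bad` callback are hypothetical names, and real tools also handle branching history, which this sketch does not.

```python
from typing import Callable, Sequence, Tuple


def find_first_bad(history: Sequence[str],
                   is_bad: Callable[[str], bool]) -> Tuple[str, int]:
    """Binary-search a linear history (oldest first) for the first bad revision.

    Assumes history[0] is good and history[-1] is bad. Returns the
    culprit and the number of revisions that had to be tested.
    """
    good, bad = 0, len(history) - 1
    tests = 0
    while bad - good > 1:
        mid = (good + bad) // 2
        tests += 1
        if is_bad(history[mid]):
            bad = mid   # the bug was introduced at or before mid
        else:
            good = mid  # the bug was introduced after mid
    return history[bad], tests


if __name__ == "__main__":
    # Hypothetical history of 1000 revisions with a bug introduced in r742.
    history = [f"r{i}" for i in range(1000)]
    culprit, tests = find_first_bad(history, lambda rev: int(rev[1:]) >= 742)
    print(culprit)      # r742
    print(tests <= 10)  # True: about log2(1000) revisions tested
```

Each test requires checking out a different revision, which is a local operation with DVCS. That is what makes a session like this practical without a network connection.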
The performance aspect is a relevant point when speaking about uptime. As far as I'm concerned, if my "annotate" command takes 10 seconds to run, that counts as 9.5 seconds of downtime, because with DVCS it's virtually instantaneous. A slow responding centralized server can easily cost you as much productivity as an occasionally inaccessible one in the long run.
Hiccups can be mitigated to some extent by spending a great deal of money on your source control server, the hardware between it and your workstations, and appropriately qualified staff to make it all work together. However, I've rarely seen companies willing to make the required investment (never in my own experience, and quite rarely in others). Even in those that do, it's still only a mitigation (a server running on good hardware still inevitably gets sluggish under load), and it only lasts so long. As your repository grows and your team expands, the scaling pressure increases. DVCS, on the other hand, grows with your company, and always provides optimal performance. Thus, you don't need to spend an extra $25,000 on your server hardware "for future hires".
DVCS Myth #7: DVCS encourages chaos in your development process
This seems to be the issue that, more than any other, causes the anti-DVCS crowd to load up the FUD cannons. I've not seen any evidence to suggest that DVCS encourages degradation in teams. In fact, I have seen the opposite effect. Because DVCS can be shaped to the natural workflows of your team, when implemented properly it enables teams to work more smoothly, with less communication overhead. With that said, the fear, uncertainty and doubt surrounding this issue isn't going away, so it is only fair to address what seems to be such a "hot button" issue.
For starters, let's define chaos. I think it's important to understand that "chaos" is a moving target. A happy Subversion user might see the flexibility offered in the DVCS model as potentially chaotic. Meanwhile, there are many happy Visual SourceSafe users (really, there are -- I've met some, they are nice people despite this) who find the idea of a non-exclusive locking source control system to be the very definition of chaos.
Most people who have been writing software for a long period of time in a team environment accept that having multiple developers edit the same file at the same time is not only an acceptable form of chaos, but a very necessary one. It may not be intuitive (it sounds a lot like chaos until you've realized its importance), but it's almost universally recognized within mature development organizations that the cost of merging file changes when multiple developers edit the same file is far outweighed by the massive productivity cost of a strict locking system.
So at least on this single issue, the DVCS proponents and detractors agree with each other wholeheartedly, or at least the vast majority of them do. It is not a huge step, then, to imagine other scenarios which might seem chaotic on the surface, but in fact enable huge gains in productivity.
We can acknowledge, having established that chaos can be valuable, that DVCS allows for chaos. All systems allow for some form of it. It is up to your team to determine the appropriate level of chaos that is permitted, and to enforce the process. This is true no matter what system or process you are introducing. If a particular developer working under a centralized source control system never checks in their work, that's a process failure, not a technology problem.
Many common situations in centralized systems lead to chaos as well. To me, the fact that a user cannot check in a set of changes until they've merged in everyone else's work on the same branch is chaos. This makes it far too easy to lose work because of a "merge gone wrong". I have seen developers switch workstations to resolve merge conflicts on more than one occasion. Being disallowed from checking in broken changes that you don't wish to share with others also leads to chaos. Developers wishing to add this layer of control with a centralized system today are forced either to do it manually (by making a copy of the in-progress repository or the relevant patches in case they want to back out) or to adopt a local DVCS.
The rapidly increasing popularity of running DVCS locally on top of centralized repositories really speaks to the need for the flexibility it offers. If you ask around, you'll find a number of different reasons why a given developer might have adopted this strategy. Nearly all of them are good arguments for DVCS in general. Some may want to version changes at a more granular level before sharing their changesets. Some might want a layered mechanism for transferring partial changesets between different environments. Others might value the ability to seamlessly create private branches for managing a particular single user workflow.
Indeed, DVCS provides significant benefits when used by a sole developer on top of a centralized server. But when enabled for an entire organization, it becomes even more powerful. For starters, all users of the system instantly gain access to the valuable features of DVCS. Even developers that don't take advantage of the more advanced DVCS features will instantly benefit from a speed improvement. More importantly, the workflow flexibility enjoyed by the individual user now extends to the entire team.
Having a source control system that supports your workflow and enables people to work together optimally is very likely to lead to less chaos in your company or organization.
DVCS Myth #8: All DVCS proponents think centralized version control systems are useless pieces of garbage, and that you're insane for using them
I think this perception is common, and triggers a defense mechanism that in many cases gets in the way of having a rational discussion of DVCS. First of all, most DVCS users used a centralized version control system before switching over. And most of them didn't choose to use diff & patch in lieu of that centralized system (with one rather notable exception).
I personally have several years of experience with CVS, Perforce and Subversion. I have actually had generally positive experiences with all of those tools, and I'd take any of them over a diff & patch based version control strategy. However, part of the reason for my being able to co-exist peacefully with these tools is that I bent my development processes to fit the limitations of the tools. Subversion's sub-par branching, for example, was annoying but not crippling because I avoided having lots of branches, instead choosing to unnaturally manipulate process (or even release dates). Perforce won't let you blink without server access, so I wrote a layer of proprietary code on top of p4 to manually reattribute files and generate scripts to eventually notify the server of opened-and-or-changed-but-the-server-doesn't-know-about-it files (yeah, and DVCS is chaotic). Everyone I worked with either had their own hacky solution to this problem, or they stopped getting work done when they didn't have server access.
As a generally content user of these centralized systems, I was curious enough about DVCS to read the occasional article touting it, but it never really hit me that it could make such a significant impact on my own development process, or the process at our company. It's difficult to see just how broken particular workflows are until they're fixed. As I began to better understand the advantages of DVCS, I started to become more aware of the annoying hacks that I was employing in an attempt to get work done under a centralized system.
DVCS Myth #9: DVCS is hard to learn
Before becoming a DVCS user, I definitely had this perception. DVCS can seem very intimidating. Typical explanations of DVCS are littered with complex workflow descriptions that are rarely familiar or intuitive to users indoctrinated in a centralized source control system mindset. This often makes DVCS seem overly complex or even irrelevant to one's needs.
To a degree, DVCS is difficult to learn. A system that allows for a great deal of flexibility is naturally more difficult to learn than a system with limited capability. However, in the context of a particular need one needs to solve, DVCS is quite easy to learn. If, for example, you decide to replace Subversion with Mercurial and continue using the same trunk / branch model, there is very little to learn in order to make the switch.
Thus, DVCS itself is not "hard to learn". It can be quite challenging, however, to determine the best possible workflow for change management at your company. Because DVCS expands your options in this area, it's easy to mistake it as "difficult". Conceptually, DVCS is really quite simple. It's the optimized application of DVCS that is challenging. If you're intimidated by it, start by using it to imitate your existing workflow, then look for gaps in the efficiency or flexibility of your workflow. Chances are, DVCS will be able to solve them.
DVCS Myth #10: DVCS is hard to use
Once a particular DVCS workflow has been established, the difficulty of day-to-day usage of the system is very similar to centralized systems with equally complex workflows. Many DVCS implementations include more granular commands than are offered by centralized systems, but it's usually simple to emulate them. Following are a few examples of common Subversion commands and their equivalent in Mercurial.
Operation                                                  Subversion                             Mercurial
Commit changes to remote server                            svn ci                                 hg ci && hg push
Get changes from remote server                             svn up                                 hg pull -u
Show change log                                            svn log                                hg log
Annotate a revision                                        svn blame                              hg annotate
Show status of changed files                               svn status                             hg status
Show changes in current files                              svn diff                               hg diff
Print a file's contents at a particular revision           svn cat -r 55                          hg cat -r 55
Cherry pick a single revision from branches/main to trunk  svn merge -r720 ../../branches/main    hg transplant -s ../../branches/main f587e
Merge all unmerged revisions from branches/main to trunk   svn log | grep -i merging              hg pull ../branches/main
                                                           ...
                                                           svn merge -r640:646 ../branches/main
                                                           svn merge -r681:682 ../branches/main
                                                           svn merge -r689:662 ../branches/main
                                                           svn merge -r667:669 ../branches/main
                                                           svn merge -r676:719 ../branches/main
                                                           svn merge -r725:730 ../branches/main
                                                           svn merge -r734:HEAD ../branches/main

To learn this basic set of commands given a background with a centralized system and a similar or identical workflow would take only a couple of minutes. Fortunately, you'll buy back those minutes and many, many more each time you run these commands. It can be a bit startling at first to adjust to all of your VCS commands running so fast, but you'll cope, I promise. And if you're a Subversion or CVS user, you can stop scheduling "branch days" on your calendar.
DVCS Myth #11: DVCS is a fad
At some point, it became acceptable to discount the value of all new technology with a reference to some unrelated technological flop. DVCS is the new Betamax, apparently, simply by virtue of the fact that it's new and different. Despite these inane comparisons, the question itself is worth pondering.
For a technology to be a fad, there needs to be some initial period of excitement and adoption, followed by a relatively rapid dilution of interest after this initial period. Technologies that end up in the "fad" category tend to be those that can drum up excitement with marketable promises, but either fail to deliver on that promise or miss a key element required to reach a "tipping point". Most technologies that we associate with the "fad" term were interesting enough to justify at least some initial excitement at one point in their history. Laserdisc was a failure, historically speaking, but putting video content on an optical disc and enabling interactive features doesn't seem like such a bad idea these days.
DVCS certainly meets the criteria for fad potential, at least at this point in its history. It has a strong and growing base of highly passionate users and evangelists. It's also a relatively new technology, despite having a few years of success stories in its wake. So, will DVCS continue to accelerate? Let's look at some of the "fad factors" as they apply to DVCS.
We might decide that Laserdisc was a failure because of a poor technical implementation. That is, putting video content on an optical disc was a good idea, but the discs were too large or the quality was too low to back it up. So, does DVCS have the same problem? I think there was some legitimacy to the "good idea, bad implementation" complaint as recently as a couple of years ago. There were several DVCS tools to choose from at that time, but each had significant quirks. In the meantime, however, the quality of the DVCS experience has increased dramatically. Excellent newcomers like git and Mercurial have burst onto the scene, while quirks have steadily been disappearing from their competition.
From a technological implementation perspective, the state of DVCS implementations is strong, and getting stronger. Having used Mercurial for over a year now (well before their 1.0 release), I'm amazed by how trouble free it has been. As the repository has grown and the complexity of our source control usage has increased, Mercurial has continued to be as fast and pleasant to use as it was on day 1. Perhaps it's just our good luck, but it has also been less painful to administer than any source control system I've managed in the past.
A bigger concern when evaluating whether or not something has fad potential is marketing a product that the market doesn't exist for. This is sometimes because the product is "ahead of its time", but more often it's because the benefits were oversold. The Segway comes to mind, although I'm not entirely sure they ever had the initial adoption to justify the "fad" label.
With DVCS, this argument is a bit more challenging to evaluate, because it involves some speculation. I know from personal experience that DVCS offers unique capabilities that at least some segment of the market needs. However, even if I'd been doing this for a lifetime, it would be a pretty microscopic sample size. Perhaps a better way of looking at it is to understand what you lose by moving to DVCS. It's very, very difficult to imagine a company or organization that can't benefit from at least one aspect of DVCS. Any organization that allows employees to work from home, as a simple example, would benefit from improved productivity with DVCS. But what do they lose?
Obviously the answer to this question depends on the specific scenarios, but even assuming that you want to keep your centralized workflow I don't see much downside. You end up checking in merged changesets more often in DVCS, though every modern DVCS system has a way of making this happen nearly as seamlessly as in a traditional centralized system. The difference between the two is basically a safety tradeoff (that is, the ability to commit your changes before merging). And the safety tradeoff is optional in many cases, for brave users who wish to merge remote changes with uncommitted files. Of course, there are other advantages to decoupling commit and merge which are not realized in this case.
What about cost? Most DVCS software implementations are available free of charge. Compared with best of breed commercial implementations of centralized systems, this can save quite a bit of money right off the bat. Perhaps more significantly, the DVCS model minimizes the amount of money you need to spend on your server hardware. All operations that don't involve sharing changesets are done on local clones of the repository, so the server has far less work to do. Thus, your shared repository will happily run on less-than-stellar hardware without impacting most SCM use cases. Server administration is essentially identical to a centralized server, so you won't find hidden costs there either. The only relevant cost consideration, in fact, is the cost of the initial migration.
While there are no guarantees that DVCS will break through into the mainstream, it's difficult to find many compelling arguments against it. For all its pros, there just aren't many cons. The limitations that do exist today can be eliminated with modifications to the technology implementations as opposed to the idea itself. There is no doubt in my mind that centralized systems will continue to exist for some time (CVS is still quite popular, and it's been years since there was a legitimate case for starting a project on it). However, it is inevitable that centralized systems will start to gain more and more DVCS functionality.
It's easy to imagine these DVCS / centralized system hybrids eventually becoming quite popular, in fact. They might operate in full "DVCS mode" for the majority of operations, but automatically consult the server for files larger than a certain threshold, or for inspecting changesets that are several years old. Or perhaps they will be able to enforce certain aspects of policy to ease the fears of those who remain fearful of "DVCS chaos" but desire the productivity boost it provides.
DVCS Myth #12: DVCS is the perfect solution in all cases
Having spent a fair amount of time talking about the benefits of DVCS, it's only fair to spend some time talking about cases where it might not be the optimal solution, at least in its current forms.
If your company or organization has a single centralized repository with hundreds of thousands of files or millions of revisions, it may be infeasible to store the entire repository on each client. As we discussed in our last myth, this doesn't necessarily disqualify the DVCS concept, but current DVCS implementations do not yet have features to optimize this scenario. Not all companies or organizations keep the entirety of their source code in a single repository, but it's certainly not uncommon. That said, there is no reason that future DVCS implementations (perhaps in hybrid form) shouldn't excel in this scenario.
Even in these "massive repository" cases, it is sometimes possible to restructure the repository into a collection of smaller repositories (see OpenJDK). This allows users to work optimally with a full repository clone in the area or areas of the system that are relevant to them. A downside is that changesets cannot span repositories, so this is not always ideal. In any case, this scenario is relevant in a very limited number of cases (if you don't work at a very large company, it probably doesn't apply to you). Looking forward, it would take an army of developers several years to develop that much source code, and in a few years it will no longer be prohibitive to store repositories of this size on each client. If this problem doesn't affect you today, it probably never will.
- Wednesday, May 14, 2008
A DVCS Story
Seattle, 1990
Bob leaned back, stretched his arms and took in the view from his window. It was a lazy Sunday, and he'd just finished reading his messages on his favorite BBS. An aging Amiga decorated with pink stickers courtesy of Bob's daughter sat at his feet. His sleek new home phone was perched prominently on the corner of the desk, the cord snaking its way to the wall by way of a splitter shared with his modem.
As Bob surveyed the scene, his phone started to ring. Pausing a moment before answering, he smiled, enjoying the harmonic ring of the new phone that he had lovingly customized earlier that morning.
"Hey Bob, it's Alice!" Alice, a co-worker, was one of Bob's closest friends. They had both graduated from the computer science program at the same school, managing to stay in touch over the years, usually by trading "war stories". Recently, Alice had joined Bob's group from another company. To the chagrin of Bob and his co-workers, she was brought in directly as a lead programmer. "Bah, it's just a title", Bob was fond of saying. He hid his resentment well when interacting with Alice socially.
"Bob, my life just changed forever." Bob's eyebrow slowly began to rise. His mind raced through the possibilities. Had she been promoted again? Was she moving on to another opportunity? I wonder if I'll make lead if she does! Just as his moment of imagination was about to turn into an uncomfortable pause, Bob snapped out of it.
"Alice, my goodness, I've never heard you so excited."
"Okay, you know I'm not normally ahead of the technology curve," Alice continued. "But today, I just got a cellular telephone!"
Bob's eyes narrowed. A cellular phone? He'd recently seen a story about them on the evening news, and thought they were the most ridiculous thing he'd ever seen. Who on earth needs to make phone calls in the middle of the street?, Bob had thought to himself when he saw the story. Bob responded forcefully. "A cellular phone? Those crazy looking things with the huge antennas? Alice, you've got to be kidding me." Bob admiringly moved his hand across the smooth plastic shell of his phone's handset.
"I'm serious! I know they're a bit different from what you're used to, but to be honest I kind of like the styling. I didn't realize how much fun it could be to be on the cutting edge of technology. You should have seen the looks I got in the park down the street this morning!"
Bob rolled his eyes and interrupted. "Look, I'm sure it's fun to be mistaken for an FBI agent, but you can't possibly think this is rational. When the heck would you need to make a phone call in the middle of the street? I have two phones at home, and another at work. What else could I possibly need?"
"I can see where you're coming from. I'm sure that to you it doesn't sound that different from your home phone. I mean, you pick up the phone, dial the numbers, and it connects to the other side. Nothing groundbreaking there. But it's so freeing to be able to talk anywhere. I can completely understand how this might seem like a novelty. To be honest, listening to myself explain it to you, it doesn't really sound that compelling. But imagine the possibilities! I'm always connected. I can have meetings in my car, and I can leave the office for a long lunch on a sunny day without worrying about missing out on anything important."
Bob chuckled, making sure he drew it out long enough to suggest that he'd heard enough. As the final note of his chuckle diminished, Bob tried to finish the conversation.
"Okay, Alice, I guess I can sort of see how that might be nice. But learning a whole new speed dial system, and working with a phone I don't understand just doesn't seem worth it for those few situations where it might be useful. I'm glad you're happy with your purchase, in any case."
Alice shook her head slowly, and put in a last word before hanging up. "You'll see, Bob, just wait."
Seattle, Present Day
Bob was crouched beneath his desk, reaching for the mini-USB cable attached to the back of his computer. He plugged the other end into his battery-drained Blackberry, and its bright welcome screen came to life. Come on, come on he urged. After a brief delay, the phone's LCD dimmed and the network connected. A high pitched bell ring alerted him to a new message, and Bob quickly pressed the "Read" button. She remembered!, Bob thought to himself excitedly. His daughter had sent the text message announcing her SAT results, just as promised. I knew it!, thought Bob, proudly inspecting the score. She's smarter than her old man after all.
As Bob turned to his computer, he heard the familiar sound of his favorite classical melody. He had only last week taken an MP3 of the song and made a custom ring tone, something that impressed his daughter far more than his last promotion. Who's calling me during lunch?, he thought. Leaning back, Bob peered under the lip of his desk, eyeing the blazing white LCD screen of his phone, now perched atop his computer chassis. Recognizing the name on caller ID, he immediately grabbed the phone and answered.
It had been several months since Bob last spoke to Alice. The time between their phone conversations had grown progressively longer of late, and Bob was happy to hear her voice. It had been five years since Alice left for Silicon Valley to run the software group at a small search startup. Having worked his way up to middle management, Bob was always excited to hear from someone who still had their finger on the pulse of technology.
Alice got right to the point. "Bob, I am about to change your life." Bob sensed excitement in her voice. In the two decades they had known each other, Alice had embraced the role of technology evangelist, and Bob that of technology skeptic. They each enjoyed their roles, and Bob could tell immediately that Alice couldn't wait to tell him about the next big thing. He gathered up as much feigned skepticism as he could in a feeble attempt to mask his genuine curiosity, and offered a response.
"Alright, what is it this time?"
"DVCS. Distributed Version Control Systems", Alice responded. "We just migrated our entire source control system to Mercurial. I think in the first week using it we've already gained 100 hours of productivity. The developers love it. I've made it my mission to tell everyone I know."
Bob pursed his lips. She's got to be kidding, he thought, this is what she's so excited about?
Bob announced his skepticism. "Alice, Alice, Alice. I've been around a long time, and it's not quite so easy these days to put one past me. What are you really so excited about?" Alice laughed long and hard.
"Oh you old curmudgeon, here we go again. Look, we've gone through this before, and I am going to convince you that distributed version control is serious stuff. Why are you so skeptical already? What have you heard about it?"
"To be honest", he started, "I first started hearing about distributed version control when we hired on a new developer who'd been working on an open source project. He was almost as excited as you about it. He had some trouble explaining what was so great about it, so he sent the entire team a link to a video where the presenter bashed our current version control system for an hour. The guy who manages our VCS server was really offended. So as far as I can tell, distributed version control is only relevant in open source projects run by opinionated bullies who see diff & patch as a perfectly acceptable source control system. I'm an old corporate soul -- I haven't used patch since college, Alice, and I don't miss it."
"Ah, I know the video you're talking about," Alice said. "Forget about that. I'm going to make the DVCS case for you, right here, right now. Pretend you've never heard of it."
"You've got your work cut out for you. To be honest with you, nobody has ever really bothered to tell me what problem this thing solves for me."
"That's always the first question!", Alice said. "I've actually practiced this speech on a few other ex-colleagues, and everybody asks that question. Unfortunately, that's the wrong question to be asking. Instead of asking what problem it solves, you should be asking what new possibilities it offers. That's been the real win for us."
Bob leaned as far back as his chair would allow, and propped his feet up on his desk. He had been through this many times before, and he knew he was in for a long ride. He gave Alice her opening: "Alright, I'll admit to being slightly intrigued. But we're really happy with our current VCS, everybody here knows how to use it, and it handles everything we need without any issues."
Alice smiled widely. She normally had to work much harder than this to get Bob on the hook. She knew that beneath the curmudgeon, he still had a passion for technology. It was her job to make sure that he didn't lose that, and she embraced it proudly.
"Okay, I'm going to ask you a very important question, and I expect an honest answer. Has your source control server ever had any downtime?"
Bob thought for a moment. "I suppose, sure, but nothing more than the usual. I mean, our server is up all the time, really. We plan all our maintenance for the weekends, so other than the occasional hiccup ..."
Before he could finish his sentence, Alice interrupted. "Yes! The hiccups! Those 15 minutes of downtime because of the urgent security patch, the 10 minutes of slowness when two machines are pulling down copies of the repository! They happen, and you shrug them off because it's just the way it is. But do you know how much productivity you lose when somebody loses their train of thought because the server isn't available for their 'annotate' command?"
"Look, I get what you're saying. But really, these hiccups aren't very common", Bob retorted. "I mean, even if they were, you have to compare them against the cost of having everybody switch over to a whole new system. The last time we did that was a total nightmare. One of our developers even quit over it! And that didn't involve explaining to everyone this crazy new distributed source control model. I'd have a revolution on my hands."
Alice responded reassuringly. "I completely understand. I had my own reservations until very recently. One of our developers had been trying to get us to adopt a distributed source control system for a few months. She liked DVCS so much that she had found some way to use it on her local system and still interoperate with our centralized system. I was skeptical for the same reasons as you, though, Bob. I convinced myself that the server was more reliable than it was, and I tried to forget that our VPN would sometimes be down all weekend, forcing developers to come into work to make fixes. Like you, I didn't really think it was worth it."
Bob interjected impatiently. "OK, so what changed your mind?"
"Last week our system administrator told us they were going to be rebooting the server during lunch time. This wasn't a big deal at the time. Everybody had advance warning, and they made all of their checkins before lunch, just in case something went wrong. When we returned, we found our source control system in pieces, its parts splayed all over the desk. The admin had installed a few security patches, and the server wasn't booting. He said he thought it might be bad memory, but he wasn't really sure."
Alice's horror story had reminded Bob of a similar situation a year ago when something remarkably similar happened at his own company. They expected the server to be repaired in 30 minutes, but it dragged to 60, 90, then 120 minutes. By 3:30 PM most of the office had cleared out, leaving a few frustrated developers emailing files to each other as they raced towards the important deadline which was now in jeopardy.
"Bob, are you still there?" Bob was growing increasingly nervous as he enumerated in his mind the many things that can go wrong with a server.
"Yes, yes, sorry, you just reminded me of something. Please, go on."
"Okay, so the admin couldn't give us an ETA on the repair. We were feeling really helpless, and called a team meeting to decide on a protocol for getting work done while the server was down. Fortunately, one of our team members had a plan. You remember that developer I told you about earlier who was using the DVCS on her own system?" Alice was talking excitedly now, and didn't bother to wait for Bob's answer. "So, she immediately took control of the meeting, and laid out the plan. She had created a complete copy of the source system just before it went down, you know, just in case of emergency, and it was checked into the DVCS repository on her system. She gave the group some brief instruction on how to copy the repository, and in less than thirty minutes our entire team was back to work with their own copies of the source."
"Wait a second," Bob interrupted. "You're telling me that you temporarily switched your entire source control system over in a half hour in the middle of an emergency? Forgive me for being skeptical, but I'm not buying it. For starters, what happens to your revision history?"
"When she took a backup of the centralized system, she had done it via a script that preserved the entire revision history. She said that migrating from centralized systems was common, and writing the script was a breeze since most of the 'heavy lifting' was built into Mercurial."
Bob made no attempt to mask his skepticism. "Okay, I get that, and it's all very impressive, really. But what about all the work you did while the server was down? How did you get it back to the main repository once it was revived?"
"That's the best part. We didn't." Alice heard a squeak on the other end of the phone, and knowing that Bob was about to ask her if she was crazy, continued without hesitation. "The admin kept working on the server throughout the day. During this time, after everybody had their servers up ..."
Bob cut her off. "Wait, what? Servers? I thought they just checked out the source code from their co-worker's machine? Where did all of these servers come from?"
"That's exactly the difference between DVCS and a centralized system. You don't check out the source code from the server, you clone the repository from the server to your local machine. Once you've made the copy, you're a server too."
Bob was finding this DVCS concept more and more ridiculous as Alice went on. "Wait, wait, wait. Last I checked you had a team of 20 developers."
"We're up to 25 now", Alice corrected.
"And you're telling me that in response to a temporary server outage, you created 25 separate source control servers, and that's somehow a good thing? How on earth does anybody know what state the source code is in? Please tell me I'm missing something. I'm starting to believe you've gone mad."
"I know, it sounds like chaos. And frankly, it could be if we let it. That's where process comes in. In this case, everybody initially cloned the repository from the same server. So, we designated that as the official server where everybody shared their changes during the downtime."
"Aha!" Bob was sure that by the end of this conversation, he would bring Alice back down to earth. "So you're taking this confusing and complicated distributed system, and making it act just like a centralized system! What happens when that server goes down? You're no better off!"
Alice let out a slightly annoyed chuckle. "Not exactly. If the server we've designated as our central server goes down, which it surely will at some point, so what? That's yet another wonderful feature of a distributed system. We've got backups of the code all over the place, without even trying. Every developer's server contains a backup copy of most or all of the code from the shared repository. So, if the main server goes down, we can designate another server as the central server temporarily, or we can not worry about it, since developers can make checkins to their own servers, only sharing changes when needed."
"Wait a second!", Bob interrupted. "If everybody is making checkins to separate servers, how do you get those changes back together in the so-called central server once it's back up?"
"That, my skeptical friend, is called a merge."
"Oh, great," Bob murmured, "you mean like merging branches?"
"That's exactly what I mean. In fact, repositories and branches are pretty much the same thing in Mercurial. Whether or not there is a central server, everybody commits changes to their own servers. Period. That's the only place you can commit to. When you want to combine your changesets with those from your repository, you need to merge them."
Bob groaned. "Yuck! Branches are such a huge pain. And you're seriously suggesting that making people merge branches every single time they want to share changes is a good thing? I'm serious, Alice, if you need some help, I know some really good people."
"You actually have a point there. We used to think branches were a pain too. In fact, we only kept a single branch aside from our main repository because merging branches was such a pain. This is another huge difference between distributed and centralized systems. Because you merge branches all the time, distributed version control systems make it incredibly easy. Easy to create branches, easy to share them. Flexibility, Bob, that's what it's all about."
"I don't see what's so flexible about chaos. All this complexity, just to compensate for a few minor blips of downtime? I'm not buying it."
"Yes, initially it was just a response to the downtime." Alice sensed that she was losing Bob, a phenomenon she was quite familiar with. She softened her tone. "On the day we started using Mercurial, it was going to be a temporary thing. My thought was the same as yours, that it was handy as a temporary crutch, but too complex to keep around. However, our temporary outage ended up being not-so-temporary."
"What a nightmare," Bob said. As Alice spoke, he had been working hard trying to convince himself that this was an incredibly uncommon scenario. Geez, I guess I'd better confirm that we're backing our server up daily, Bob thought. And I wonder if I can get budget for a backup source control server?
Bob had neglected to mention that he had only a year ago spent $25,000 on licenses for his company's source control software. While he was proud to now be using the same source control system as big companies like Google and Microsoft, it left him with very little budget for hardware. As a result, their source server was running on an underpowered machine, and it was not uncommon to hear complaints about its sluggishness. Bob had also recently received the bill to renew his yearly contract for support and upgrades. That was going to take another $5,000 out of his budget, at the cost of some much needed upgrades to his developers' machines.
Alice continued. "It turned out that our source server's RAID card had died, and we were going to be down for three more days before the part could be delivered. It was in these three days that we discovered the great value of DVCS, not before. One group of our developers, for example, had previously been trying to do 'buddy builds', where they share their changes with each other before committing to the main repository. They had initially tried to coordinate this work on a separate branch, but the pain of merging so frequently was killing them, not to mention the fact that they often forgot to make their changes on the branch. Then they started emailing each other patches, but this, too, was prohibitively cumbersome. To get around this, they ended up sharing their source directories with each other directly, and emailing the names of files that needed to be copied over. Changes got lost all the time; it was a total mess."
Alice was talking quickly now, hoping to stop Bob from interrupting. "The afternoon we started using Mercurial, however, all this stopped. Each developer would finish their work, commit the relevant changes to their own servers, and then make the changesets available for their colleagues to pull and merge into their repositories."
Alice had said the magic word. "Ack! More merging!"
"Yes! More merging! I'm making it sound too complicated though. You only have to actually merge files if the changes you're pulling overlap with your own changes. When you do inevitably have to merge, though, you do so after you've checked in your own changes. Thus, even if the merge goes awry, you never lose your work because of it, something that is extremely important to us. Regardless, merging is so simple and fast that it doesn't even matter."
"Well, my source control system supports branches," Bob replied, "why couldn't we do this?"
"Many of the new and interesting workflows enabled by DVCS are possible in centralized systems, but are simply too much of a pain in practice to have any chance of adoption. Plus, they can grow organically. You don't need to explicitly decide to start working on a branch for buddy builds, you can just make some checkins and choose where to send them. You're always working on a branch, in effect, since your local repository is a branch. Not to mention the fact that you don't need to ask a server administrator for permission to create a branch."
"Okay, so you can do buddy builds", protested Bob, "and if the server hiccups it's not that big of a deal. But so what? Those things aren't very useful for us. People around here just come in and do their work. Even if these benefits are as profound as you suggest, I really don't think it's worth training my entire team on some complicated new source control system."
Alice hadn't expected to sway Bob in only a single conversation, but she was nonetheless growing a bit frustrated with her progress. She had used the same sales pitch on others to great effect, but there was something missing. It just wasn't having the same effect on Bob. She continued with her defense. "The things I've mentioned so far are 'big deals'. Having a server down even for only a minute or two is a huge deal, in fact. Developer time is expensive, and knocking a developer out of the zone because of a server hiccup is, in my mind, totally and completely unacceptable. Heck, even a bit of server latency can ruin a developer's ability to stay in the zone."
"One thing I think you're forgetting is that we're using the best centralized system money can buy," Bob said. "I'm sure our server is significantly more speedy and reliable than your free, open source centralized server was. Everybody here is really happy with it. I've even heard them bragging about it to friends at other companies."
"Okay," said Alice. "I'll play by your rules, even though it's unfair. Let's assume that your server is up 100% of the time, never any downtime, never a hiccup, never a bit of slowness. Oh, and everybody has constant access to it. DVCS still offers advantages beyond simply not needing live server access to get work done."
"Oh? Like what?", queried Bob, now feeling proud of his progress towards the goal of dragging Alice back into reality.
"Like no more 'check-in races', for starters. It's a common complaint every place I've worked. The first person to commit their changes to the central server avoids having to be the one who merges changes. Thus, the next committer is forced to perform the merge, whether or not they are the best person to merge the changes. I've actually seen people switch workstations for ten minutes so the changes could be merged by the proper resource. How ridiculous is that? With DVCS, because changes are committed locally, nobody is ever denied the ability to do so because of someone else's changes. And once they are committed, anyone can pull the changes together and merge them. If you had two developers racing towards a deadline, another less busy developer could volunteer to do all of the change merging."
"Yeah, well," Bob smirked, "I guess our needs aren't quite as sophisticated as yours. Don't get me wrong, some of the stuff you're talking about sounds interesting, and hey, if we weren't using the best centralized system that money can buy, I might be more interested. Really, whatever flaws there are in our current system, we've made the necessary adjustments. People here are happy. We're doing great work, the team gets along well. I just don't see the case for DVCS."
Alice realized that there was no more she could do. She bit her lip, resisting the urge to respond defensively. Planting the seed, she realized, was the best she could hope for with Bob. She leaned back in her chair, shook her head softly, and ended the conversation. "You'll see, Bob, just wait."
- Wednesday, October 03, 2007
BCrypt.net - Strong Password Hashing for .NET and Mono
Using raw hash functions to authenticate passwords is as naive as using unsalted hash functions. Don’t.
Thomas Ptacek
BCrypt.net is an implementation of OpenBSD's Blowfish-based password hashing code, described in "A Future-Adaptable Password Scheme" by Niels Provos and David Mazières. It is a direct port of jBCrypt by Damien Miller, and is thus released under the same BSD-style license. The code is fully managed and should work with any little-endian CLI implementation -- it has been tested with Microsoft .NET and Mono.
Why BCrypt?
Most popular password storage schemes are based on fast hashing algorithms such as MD5 and SHA-1. BCrypt is a computationally expensive adaptive hashing scheme which utilizes the Blowfish block cipher. It is ideally suited for password storage, as its slow initialization time severely limits the effectiveness of brute force password cracking attempts. How much overhead it adds is configurable (that's the adaptive part), so the computational resources required to test a password candidate can grow along with advancements in hardware capabilities.
Usage
Using BCrypt in your code is very simple:
    // Pass a logRounds parameter to GenerateSalt to explicitly specify the
    // amount of resources required to check the password. The work factor
    // increases exponentially, so each increment is twice as much work. If
    // omitted, a default of 10 is used.
    string hashed = BCrypt.HashPassword(password, BCrypt.GenerateSalt(12));
    // Check the password.
    bool matches = BCrypt.CheckPassword(candidate, hashed);
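To make the "each increment is twice as much work" point concrete, here is a small sketch of the arithmetic only. It is written in Python for brevity and does not call BCrypt.net; the function name relative_work is made up for illustration, and the baseline of 10 is the default mentioned above.

```python
def relative_work(log_rounds, baseline=10):
    """Return how many times more hashing work log_rounds costs than baseline.

    The work factor is an exponent of two, so the ratio between two
    settings is 2 raised to the difference between them.
    (Hypothetical helper for illustration; not part of BCrypt.net.)
    """
    return 2 ** (log_rounds - baseline)


if __name__ == "__main__":
    # Compare a few settings against the default of 10.
    for log_rounds in (10, 11, 12, 14):
        print(f"logRounds={log_rounds}: {relative_work(log_rounds)}x the default cost")
```

Under these assumptions, the GenerateSalt(12) call above asks for four times the work of the default, and the same multiplier applies to every guess an attacker has to test.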
The source code is available via the links below. You can download the packaged version, which includes an NUnit-based test suite, or download the source directly via BCrypt.cs.
- Tuesday, September 11, 2007
A Better .NET Regular Expression Tester
Because the only other online tool I could find for testing .NET regular expressions was slow and covered with ads, I decided to write a simple AJAX regular expression tester. It's certainly not fancy, but it works for me.
- Saturday, March 24, 2007
Emulating Vista's User Directory Structure on XP
Perhaps the best new "feature" of Windows Vista is the Unix-inspired re-organization of user home directories. That is, instead of placing user-specific files under the abhorrent Documents and Settings directory, each user's folder resides under the command-line-friendly Users directory in the root system drive. Even better, your music is no longer considered a document. It's hard to argue against the fact that C:\Users\Derek\Music is cleaner than C:\Documents and Settings\Derek\My Documents\My Music (or in some cases, the 8.3 friendly C:\DOCUME~1\Derek\MYDOCU~1\MYMUSI~1, ack!).
I use Vista on a couple of my computers, and I've grown to appreciate the new folder structure. Gone are the days of hacking together your own home directory structures just to try and cope with the unfriendly madness (who hasn't created a folder or two under C:\ just to make them typable?). Unfortunately, I still use several machines running XP, so it's hard to rely on this sane structure -- leading to grotesque hacks such as functions called ghetto-dosify in my .emacs file (I wish I were kidding).
There are some additional advantages to utilizing the built-in home directory structure rather than creating your own. For example, the Run dialog resolves path references relative to your home directory. So, if you type Start -> Run -> ., your home directory will open. Sub-folders can be accessed just as easily -- typing Start -> Run -> dev\blog opens my blog source tree. Also, many tools (backup, indexing, etc.) will only consider content in your actual profile directory (the one that Windows knows about).
Another nice benefit is that the compact path structure is Cygwin-friendly, so you can safely set your HOME environment variable (also used by Emacs, of course). Thus, ~/Desktop in Cygwin matches ~\Desktop in Powershell. I keep my HOME directory synced across many machines (Vista, XP, and Linux) using Subversion, and it's nice being able to keep everything in the same place.
Fortunately, we can make some relatively simple tweaks to our XP systems to make the structure more closely resemble that of our Vista and Unix friends.
Junction, Junction, What's your Function?
Chances are you've been running your operating system for a while, and you've got a bunch of clutter under your Documents and Settings folder. More importantly, you probably have a bunch of applications with stored absolute paths under that folder, not to mention all of those common dialog pointers. So obviously, just renaming the Documents and Settings folder is going to cause more problems than it'll solve.
If XP had symlinks (another feature added in Vista), of course, we'd have an easy solution. We could simply type ln -s DOCUME~1 Users, and we'd be off to a good start. We could update environment variables to use our new path structure, and we wouldn't break the old path references. Fortunately, XP does support directory symlinks ... well, sort of. XP's default file system, NTFS, supports something called a "junction point". For our needs, it'll do.
While junctions are supported by NTFS, there is no built-in tool for creating them. Enter Junction, a simple command line tool for creating (and viewing targets of) NTFS junction points. With the tool located in your PATH, run the following command sequence to create the link:
    C:\Documents and Settings\Derek>cd \
    C:\>junction Users "Documents and Settings"
    Junction v1.04 - Windows junction creator and reparse point viewer
    Copyright (C) 2000-2005 Mark Russinovich
    Systems Internals - http://www.sysinternals.com
    Created: C:\Users
    Targetted at: C:\Documents and Settings
Special Folders
At this point, we can access our home directory under the Users path, but unfortunately Windows has no idea that we've actually intended to change our home directory. For example, starting a fresh command line session still shows that old, annoying Documents and Settings path.
To fix this, we need to dig into the registry. Under HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows NT\CurrentVersion\ProfileList, there should be a number of keys in the form S-x-x-xx-.... Browse through the keys until you find the one with a ProfileImagePath value ending with your user name (e.g. %SystemDrive%\Documents and Settings\Derek). Change this value to use the new path (e.g. %SystemDrive%\Users\Derek). You will need to restart Windows for it to pick up the change.
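To make the registry edit concrete, here is a minimal Python sketch of the string change being made to ProfileImagePath. It does not touch the registry at all; the function name to_users_path is made up for illustration, and it assumes the stored value uses exactly the "Documents and Settings" casing shown above.

```python
def to_users_path(profile_image_path: str) -> str:
    """Rewrite a ProfileImagePath value so it goes through the Users junction.

    Hypothetical helper for illustration only: it transforms the string and
    never reads or writes the Windows registry.
    """
    old = "\\Documents and Settings\\"
    new = "\\Users\\"
    # Replace only the first occurrence, which is the profile root.
    return profile_image_path.replace(old, new, 1)


if __name__ == "__main__":
    print(to_users_path(r"%SystemDrive%\Documents and Settings\Derek"))
```

The value you type into the registry editor is simply the result of this substitution; everything after the user name stays the same.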
After rebooting, you should default to the new Users path when starting a fresh console session. And because it's actually a junction point, all of your existing files should be right there with you. This is a nice start, but there are still some annoyances: My Documents should be Documents, My Music (and its similarly named friends) should also drop the silly "My" prefix, and ascend one directory (unless you think music is a document, of course).
Again, junction points are exactly what we need. Use the following command sequence to set them up:
C:\Users\Derek>junction Documents "My Documents"
...
C:\Users\Derek>for %f in (Music Pictures Videos) do junction %f "My Documents\My %f"
...
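If the for loop is hard to read, the following Python sketch spells out the individual junction commands it expands to. The function name junction_commands is an invention for this example; it only builds the command strings and does not run anything.

```python
def junction_commands(home_names, documents="My Documents"):
    """Return the junction command lines used above, one per link.

    Illustrative only: this builds strings, it does not create junctions.
    """
    # First the top-level Documents link...
    commands = [f'junction Documents "{documents}"']
    # ...then one link per "My <name>" folder nested under My Documents.
    for name in home_names:
        commands.append(f'junction {name} "{documents}\\My {name}"')
    return commands


if __name__ == "__main__":
    for command in junction_commands(["Music", "Pictures", "Videos"]):
        print(command)
```

Each printed line corresponds to one invocation of the junction tool from within your home directory.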
You might also want to hide the My Documents folder, so you don't see two references to the same folder when browsing your home directory. You could just make it hidden, but if you're like me you have hidden folder display enabled. In this case, we just need to make it hidden and give it the system attribute using the following command: attrib +S +H "My Documents".
Our paths are now much prettier, but we do get stuck with the ugly default folder icon. Fortunately, you can right click on the folder whose icon you wish to update, click Properties, and change the icon in the Customize tab (you must do this before the next step, where they are assigned as special folders). I use the normal icons in shell32.dll for Documents and the like. For my root home directory (Derek), I use user-home.ico (attached to this post) from the Tango Desktop Project. The icon was conveniently translated to ICO format by Ben Brown. You might receive a warning about "enabling task folders"; this is safe to ignore.
As with the Users junction point, we need to tell Windows about our new paths. For this, we'll use a Microsoft tool called Tweak UI. Install the tool, start it (Start -> Run -> tweakui), and select the Special Folders item under My Computer in the tree control. Here you can select each path for which we created a junction, and assign the new path.
Common Dialogs
We've solved our ugly path problem, but overall usability still leaves something to be desired. For example, our common dialogs all have direct pointers to our Documents folder, but it's still rather painful to get to our root home directory to, say, open a music file.
Fortunately, Tweak UI comes in handy here as well. Under Common Dialogs, select Places Bar, and choose Custom Places Bar. Here, you can define up to five paths that you will have easy access to using the common file dialog used by most Windows applications.
Here's the new dialog in action:

Another nice convenience is having a home directory link in your start menu, as below (simply drag the link there to create the shortcut):

Attachments
- Saturday, February 24, 2007
Screencast: Formatting a CSS File with Emacs
When I wrote The Case for Emacs, the main point I was attempting to convey was that Emacs is an amazingly effective editor even without the customizations it's so famous for. Right out of the box, you can do some pretty incredible things with its broad set of built-in commands. And if a simple Emacs configuration is good enough for Donald Knuth, it's good enough for us, right?
The other day I came across a CSS file that was in need of some formatting tweaks, so I slapped together a quick macro and fixed it up. Nothing special, but it got me thinking -- why not take this example to an extreme, creating the ugliest CSS file of all time, and create a screencast of cleaning the file up using a bare bones Emacs configuration? Well, here we are.
The CSS file you see getting the Emacs treatment in the screencast is not real. I intentionally created about the ugliest file I could, butchering indentation, casing, structure, mixing tabs and spaces ... you name it (the CSS file is attached, if you're especially curious).
I'm not necessarily suggesting that Emacs is the best tool for the specific task we're performing here, but it's a pretty broadly understood file format, so I thought it would be an interesting way to demonstrate some of Emacs' core functionality. If your favorite CSS editor has a magical "auto-format" button, by all means use it -- it's better than this strategy to be sure, but it's a lot less flexible!
The Emacs instance you see in the screencast is extremely close to a stock distribution. I made the following modifications, in the interest of making it easier to see what's going on:
- Disabled the tool bar and menu bar.
- Loaded mwe-log-commands.el, for demonstrating the keys pressed.
- Enabled the downcase-region command, normally disabled.
- Loaded a simple CSS mode for font locking.
- Changed the default font to a narrower version, Consolas.
On with the show.
Here's a breakdown of how we attacked the file.
- Converted tabs to spaces with untabify. This is a good first step when encountering a file as hideous as this one.
- Normalized all spacing by compressing multiple spaces and newlines to a single space, using a regular expression replacement.
- Added newlines after all open braces ({) using a macro. Also used just-one-space before the brace to make the spacing consistent.
- Added newlines after all semi-colons, again using a macro. Also compressed space in front of the semi-colons, and added a space after the colon delimiting the property from the value.
- Used another macro for adding newlines before and after the close braces. I added the extra spaces before using delete-blank-lines because the spacing varied based on whether or not a trailing semi-colon was present.
- Killed a few empty blocks, using backward-paragraph to quickly navigate blocks.
- Executed another macro to make the property name case consistent (made them all lower-case).
- Sorted properties by name within each block using a macro to regionize the block and call sort-lines.
- Compressed expanded forms of margin and padding specification to a single line, using a multi-line regular expression replacement with a reference to a captured group in the replacement string.
- Performed a few simple manual cleanups, and updated the messaging.
Attachments
- Friday, January 26, 2007
Emacs Hack #3: Compile Emacs from CVS on Windows
In previous hacks, we learned how to install and configure stable binary builds of Emacs. While the stable version (currently 21) is the best version to run for most users, you may be brave or curious enough to try one of the newest pre-release versions, 22 or 23. Because there are no official binary builds of Emacs beyond version 21, you will either need to install it from an unofficial source, or compile it yourself. This hack covers the latter.
Get the Source
The first step in compiling your own version of Emacs is getting the source. It is sometimes possible to obtain gzipped archives of the Emacs source at a given point, but it's typically more convenient to grab the source from CVS (this also ensures that you're using the most up-to-date version of the given branch).
If you don't already have CVS installed, fetch a recent version of cvs.exe from its distribution site, and place it somewhere in your path.
Next up, we pay a visit to Savannah, the GNU development site. The Emacs project page contains details on how to obtain the sources. At the time of this writing, the following command will download the HEAD version of the code (Emacs 22).
cvs -z3 -d:pserver:anonymous@cvs.savannah.gnu.org:/sources/emacs co emacs
If you wish to build a version of Emacs other than 22, you will need to pass a tag name to your checkout command (or a subsequent update command). For example, adding -r EMACS_21_3 would check out the code for the Emacs 21.3 release (the stable release at the time of this writing). For a sneak preview of (the very unstable) Emacs 23, use -r emacs-unicode-2.
After checking out the sources, cd to the nt directory. As a safety measure, issue a cvs up -kb command here to make sure that all files in the directory have proper line endings. At this point, the source tree is ready to be built.
Prerequisites
The Emacs build process requires a handful of tools that probably don't already exist on your system. These include GNU versions of tools with Windows equivalents (cp, rm), as well as tools that are often unique to a GNU system (makeinfo). If you want your build to support images, you will also need a variety of libraries for rendering image formats. All of these tools are available from GnuWin32, a project which provides native Windows binaries of GNU tools.
The packages you will need to install are listed below. In each case, I suggest installing the latest "setup" package, which will run an installer and place the binaries in a consistent location.
- CoreUtils: This package contains a variety of tools, and is required in order to build Emacs. Specifically, it contains cp and rm.
- TexInfo: This package contains makeinfo, which generates Info documentation from the texinfo sources in CVS. While technically optional, I strongly recommend installing it.
If you wish to build Emacs with image support (optional), you will also need the image libraries from the GnuWin32 Packages page.
At the time of this writing, the Xpm library is missing a required header file (simx.h). You can either get it from the source package, or download it from the attachments in this post. This file should be placed in the GnuWin32 include directory.
Compiling with MinGW (GCC)
The simplest way to build Emacs is using MinGW, a collection of freely available tools for building native Windows binaries. The package includes the GNU C compiler, a port of make, and various header files (including those from the Win32 API, such as windows.h).
The first step, of course, is to obtain the MinGW distribution if you don't already have it installed. The current version at the time of this writing is 5.1.3, and is downloadable at SourceForge.net (via the project page). Install to the location of your choosing, being sure to select the following components:
- MinGW base tools
- MinGW Make
After installing MinGW, add its bin directory to your path using a normal Windows Command Prompt session (set PATH=%PATH%;C:\MinGW\bin). Next, run configure.bat from the nt directory as follows to build a Makefile:
configure.bat --no-debug --with-gcc
If you installed the libraries for image support, you will also need to pass the appropriate include path to the configuration script. If you installed to a directory with a space (the default), use the DOS name of the directory as demonstrated below:
configure.bat --cflags -IC:\Progra~1\GnuWin32\include --no-debug --with-gcc
If all goes well, you'll get a message telling you to run gmake to build Emacs. We're not quite ready for that yet, however. Because we got our source from CVS, we need to perform a "bootstrap" build. This creates a bootstrap Emacs binary to build autoloads and byte compile the Elisp (.el) files in the distribution. Begin the build process as below (mingw32-make is the MinGW name for gmake):
mingw32-make bootstrap
After the bootstrap completes, you're ready to build the source code. You'll also want to build the info files (make sure makeinfo.exe is in your path) before installing. Run the following commands in sequence to finish compiling and install your Emacs build.
mingw32-make info
mingw32-make
mingw32-make install
By default, Emacs will be installed "in place". If you prefer to install it somewhere else, run configure.bat with a --prefix <dir> argument pointing to your preferred installation directory.
Compiling with MSVC
The most popular C/C++ compiler for Windows is MSVC, Microsoft's Visual C++ compiler. Traditionally, this compiler was only available commercially, when purchased as part of Microsoft's Visual Studio development product. In recent years, however, Microsoft has made simple versions of the compiler available free of charge. Fortunately, the simple versions suffice for building Emacs.
At the time of this writing, the VC8 compiler (part of Visual Studio 2005) is the most recent version available. Unfortunately, Emacs cannot currently be built with this version of the compiler without modifying the source. The only other freely available version of the MSVC compiler was made available as part of the Visual C++ Toolkit 2003. Microsoft no longer distributes this compiler (it was replaced by Visual C++ 2005 Express Edition), but if you already have it installed, or happen to have downloaded it previously (the file name is VCToolkitSetup.exe), you can proceed to build Emacs using this compiler. If you have a commercial version of Visual Studio 2003 installed, the steps should be very similar.
The Visual C++ Toolkit installation is very basic. In order to build Emacs, you need a variety of header files in addition to those packaged with the toolkit. To obtain these, you will need to download and install a (pre-Vista) version of Microsoft's Platform SDK. Because you only need a base set of components, the Web Install method is probably optimal. The following screen shot shows the components you need to have selected:

Next, add cl.exe to your PATH in a normal Windows Command Prompt window. The easiest way to do this is to run the vcvars32.bat script packaged with the download, e.g.:
"%ProgramFiles%\Microsoft Visual C++ Toolkit 2003\vcvars32.bat"
Next, you need to add the Platform SDK paths to your environment. For example:
set SDKROOT=%ProgramFiles%\Microsoft Platform SDK for Windows Server 2003 R2
set INCLUDE=%INCLUDE%;%SDKROOT%\Include
set LIB=%LIB%;%SDKROOT%\Lib
If you are building with image support, you'll also need to add the GnuWin32 paths:
set INCLUDE=%INCLUDE%;%ProgramFiles%\GnuWin32\include
set LIB=%LIB%;%ProgramFiles%\GnuWin32\lib
Normally we'd be done at this point, but we're still missing a couple of pieces. The first is setargv.obj. Fortunately it's only missing in binary form -- it can be generated from the Platform SDK's CRT source directory. To build the required file and place it in the Platform SDK's lib directory, run the following command (entered as a single line):
cl /c /D_CRTBLD /I"%SDKROOT%\src\crt" /Fo"%SDKROOT%\lib\setargv.obj" "%SDKROOT%\src\crt\setargv.c"
We're also missing a few essential binaries the build process expects: nmake.exe, rc.exe, and lib.exe. All of these files are packaged with commercial versions of Visual Studio, but you'll need to find them elsewhere if you're using the Toolkit. The first two can be obtained free of charge by installing the .NET Framework SDK. Make sure you add its bin directory to your path after installing. lib.exe is simply an alias for link /lib, so a simple batch file (lib.bat) will suffice. A sample is attached to this post -- place it somewhere in your PATH.
Finally, we're ready to build Emacs using MSVC. As with the MinGW build, the process begins with running configure.bat:
configure.bat --no-debug --with-msvc
Things are pretty straightforward from here. As with the MinGW build steps, we need to start with a bootstrap build, after which we can build the complete distribution. Refer to the MinGW steps for details:
nmake bootstrap
nmake info
nmake
nmake install
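Since the MinGW and MSVC builds differ only in the name of the make tool, the sequence can be summarized in a few lines of Python. This is just a sketch to show the ordering; build_steps is a made-up helper, and it only produces the command strings rather than running them.

```python
def build_steps(make_tool):
    """Return the build commands in the order described in this post.

    make_tool is "mingw32-make" for the MinGW build or "nmake" for MSVC.
    Illustrative only: nothing is executed.
    """
    # bootstrap must come first when building from a CVS checkout;
    # the empty target is the plain default build.
    targets = ["bootstrap", "info", "", "install"]
    return [f"{make_tool} {target}".strip() for target in targets]


if __name__ == "__main__":
    for step in build_steps("nmake"):
        print(step)
```

Passing "mingw32-make" instead yields the same four steps used in the MinGW section.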
Optimized Builds
One of the advantages of compiling from source yourself is that you can optimize builds for your platform. For example, if you're using a modern AMD or Intel processor with SSE extensions, you can enable the compiler to generate optimized code for your processor. To build an optimized version of Emacs, you will need to pass arguments to the compiler by way of configure.bat's --cflags argument. To build a version of Emacs optimized for an SSE2-capable Athlon or Pentium 4 processor with MSVC, for example, you would run the script as follows:
configure.bat --no-debug --with-msvc --cflags /O2 --cflags /G7 --cflags /arch:SSE2
For a similarly optimized build with MinGW (GCC), use:
configure.bat --no-debug --with-gcc --cflags -msse2 --cflags -O3
Optimized builds will typically be incompatible with older processors that do not support the selected extensions (in most cases, they will crash at runtime). If you are creating an Emacs build to share, minimize optimizations that require specific processor features such as SSE.
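Note that each compiler flag gets its own --cflags argument in the commands above. The Python sketch below assembles such a command line from a list of flags; configure_command is a hypothetical helper written for this example, and it uses only the configure.bat options that appear in this post.

```python
def configure_command(compiler, cflags=(), debug=False):
    """Build a configure.bat command line like the ones shown above.

    compiler is "gcc" or "msvc". Each entry in cflags is passed through
    its own --cflags argument. Illustrative only: nothing is executed.
    """
    if compiler not in ("gcc", "msvc"):
        raise ValueError("compiler must be 'gcc' or 'msvc'")
    args = ["configure.bat"]
    if not debug:
        args.append("--no-debug")
    args.append(f"--with-{compiler}")
    for flag in cflags:
        # One --cflags per flag, matching the examples in the post.
        args.extend(["--cflags", flag])
    return " ".join(args)


if __name__ == "__main__":
    print(configure_command("msvc", ["/O2", "/G7", "/arch:SSE2"]))
    print(configure_command("gcc", ["-msse2", "-O3"]))
```

For a build you intend to share, calling it with an empty flag list gives the plain, unoptimized configuration.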
Final Steps
If you built Emacs with image support, you need to copy the runtime DLLs to Emacs' bin path (or your system path, if compiling only for a single system). You can copy them manually, or script it as below (specific file names may vary slightly if versions change; as written, the leading echo only prints each copy command, so remove it to actually copy the files):
for %f in (giflib4 jpeg62 libpng13 libtiff3 xpm4 zlib1) do echo copy "%ProgramFiles%\GnuWin32\bin\%f.dll" ..\bin
Attachments
- Friday, January 05, 2007
Emacs Hack #2: Manage Emacs Instances with gnuserv
If you followed Hack 1, you can now launch Emacs and open files from within the interface. On Windows (and in some other environments), you can also drag files to Emacs from your window manager.
But what if you want to send a file to Emacs from a console session, or via a shortcut? You could invoke Emacs using runemacs (or create a shortcut to it), but every time you do that you end up with a whole extra instance of Emacs -- a unique operating system process which is independent from any previous instance(s).
In some cases, this might be what you want. In the rare case of an Emacs crash, for example, you would not lose any active buffers in the other instances (note that Emacs' auto-save facility would probably mitigate the damage here). Most of the time, however, you'd rather open the file in an existing instance of Emacs. This has a number of advantages:
- Memory is conserved by sharing a single process.
- Shortened start-up time since Emacs is already loaded.
- All open buffers in the shared instance can be quickly accessed from any Emacs frame.
- The buffer list can be used to see every open file across frames, and to perform actions on them.
- Dynamic abbreviations (covered in a future hack) can be sourced from a larger set of files.
- Contention issues (multiple processes accessing the same file) are avoided.
Fortunately, there is a small client / server program called gnuserv which enables us to do just this.
Installing
Installing gnuserv consists of two parts -- some platform-specific binaries (gnuserv, gnudoit, gnuclient and, on Windows, gnuclientw), and an Emacs Lisp file which we'll load into Emacs. The Lisp code will spawn the gnuserv process after Emacs has started, listening for received commands on gnuserv's standard output stream.
On Windows, the first step is to unpack the ZIP file containing the gnuserv binaries (on Unix and derivatives, simply install the gnuserv package for your distribution). The Windows port is available at Guy Gascoigne - Piggford's site, or attached to this post if the site is unavailable. Extract the binaries to a directory in your %PATH% (see Hack 1 for some tips on setting up your path), e.g. C:\Program Files\gnuserv.
Next, we need to install the Emacs Lisp portion of gnuserv. This involves placing the gnuserv.el file (attached to this post) somewhere in Emacs' load path. The easiest way to do this is to copy the file to the site-lisp directory, typically located under your base Emacs installation directory (Windows) or at /usr/local/share/emacs/site-lisp (Unix and its derivatives). Files placed here are automatically available to Emacs.
Configuring
We've installed all of the pieces necessary for gnuserv, but nothing has really changed in terms of Emacs' behavior. Even though we've added gnuserv.el to its load path, we still need to instruct Emacs to load the file. We'll do this by adding some initialization code to our .emacs file (see Hack 1).
From within Emacs, type C-x C-f (that's the control key plus x, followed by the control key plus f -- you can keep the control key pressed the whole time). This invokes Emacs' find-file function, which prompts you for a file path to open using the minibuffer (the bottom line in the frame). Emacs will present you with a suggested path. Type ~/.emacs, press Tab (Emacs will remove the suggested path as part of its completion algorithm), and then press Enter to open the file.
If you're a Windows user, this path might look strange. The tilde (~) character is simply a handy shortcut for the folder represented by your %HOME% environment variable, and the forward slash is Emacs' preferred path separator (that is, C:/Users is preferred to C:\Users). The path syntax for the home directory is not only convenient to type, it also makes your Emacs configuration more portable. For example, ~/Desktop always maps to your Windows desktop, regardless of your login name or other machine-specific details.
At this point, you should be staring at a blank buffer. Add the following lines to your .emacs file:
(require 'gnuserv)
(gnuserv-start)
(setq gnuserv-frame (selected-frame))
This might look somewhat strange -- it's actually just some simple Elisp which tells Emacs to load and start gnuserv. The first line calls the built-in require function, which tells Emacs to load gnuserv if it hasn't already done so (this causes Emacs to go find gnuserv.el in our load path). The next line calls a function in gnuserv.el, which starts (or restarts) the gnuserv process. The last line is an optional customization which tells Emacs that it should open new files from the clients of gnuserv in its startup frame (if you find that you prefer to have multiple Emacs frames, comment this line out by prefixing it with a ; character, or delete it entirely).
Emacs reads its .emacs file and evaluates these lines each time it starts up. You can either restart Emacs now, or press M-x (M stands for meta, which is typically Alt), type eval-buffer, and press Enter (this tells Emacs to evaluate the contents of the current buffer without requiring a restart).
Launching Emacs
At this point, you should have an Emacs instance running with gnuserv started. So far, so good. Now it's time to use the client programs which came with the gnuserv distribution.
gnuclient and gnuclientw
The gnuclient program can be used to open a file in a running instance of Emacs. For example, typing gnuclient README from a console session will open a new buffer for the file README (even if it doesn't yet exist). If Emacs is not currently running, it will automatically be started (at which point it will evaluate its .emacs file, loading gnuserv).
You may notice that when a file is opened using gnuclient, a message is displayed in Emacs' minibuffer: "When done with a buffer, type C-x #.". You may also have noticed that when you invoke gnuclient, it does not immediately exit (that is, you can't continue using the console). What's happening is that the gnuclient process is waiting for you to finish working on the file -- when you press C-x # from within Emacs, the gnuclient process exits. This is convenient in many cases -- for example, when editing a Subversion commit message. If the process returned immediately, Subversion would not be able to read the message you typed in to Emacs because there is no longer any connection between the editor and the launcher.
In other cases, however, you just want to open a file in Emacs and continue using your console session immediately. In this case, you can use gnuclientw (Windows only -- on other platforms, use gnuclient -q). This is a variation of gnuclient which returns immediately. It's also compiled as a Windows application, so it's ideal for creating Windows shortcuts (gnuclient opens an unneeded console window in these cases). I typically add a shortcut to gnuclientw in %USERPROFILE%\SendTo and %USERPROFILE%\Desktop. In practice, I nearly always use gnuclientw (with or without arguments) to start Emacs on Windows (because it's in your path, it can also be executed from the Run dialog). If you find yourself running gnuclientw a lot when Emacs is already open, try adding a -x option to "top" the Emacs frame (make it visible if it's hidden).
gnudoit
The last (and in this case, the least) client program is gnudoit, which can be used to evaluate an Elisp form. For example, gnudoit (list-buffers) will open a window within your Emacs frame listing the currently open buffers. This command is deprecated because the same behavior can be achieved using gnuclient -- on Windows, use gnuclient -e (list-buffers). On other platforms, type gnuclient -batch -eval (list-buffers). In practice, you probably won't be running these forms very often.
Attachments
- Saturday, December 09, 2006
Emacs Hack #1: Install Emacs on Windows
Installing Emacs on most platforms is a common and well supported operation. On Linux, for example, it's typically installed via the package management system for the particular distribution you've chosen. This hack covers installing Emacs on Windows, where it's a bit more challenging.
There are some shortcuts to getting Emacs installed and running on Windows, but we're going to walk through the steps from scratch. There are a number of "decision points" throughout the installation. When we reach those, I'll give a specific recommendation as well as provide you with the alternate options.
Which Emacs?
The first decision greets us before we even embark on the installation process. There are two primary flavors of Emacs -- GNU Emacs, and XEmacs. The former is the "original" Emacs, and the latter was branched from an earlier version of GNU Emacs by a company called Lucid. If you're interested in the history, Wikipedia has a nice summary.
I've used both versions extensively and had similarly positive experiences with each. I'm going to recommend installing GNU Emacs, however, since in my experience it seems to be the more popular version. Because it's more common, it's more likely that additional packages and snippets that you adopt after the installation will be more compatible. Future posts in this "hacks" series will also assume that you're running a flavor of GNU Emacs.
OK, so we've decided to install GNU Emacs. However, we're not quite finished yet. There are several versions and packagings of Emacs (yes, even when we limit our target platform to Windows), and so we've got yet another decision to make. Currently, the "stable" version which is distributed by GNU officially is version 21.x. The next version of Emacs, 22.x (and the version after that, 23.x), can be had by compiling your own version from CVS (there are also unofficial binary builds).
I'm going to walk you through the steps of installing the official, stable distribution of Emacs (21.x). If a newer version is distributed by the time you end up reading this, the steps are likely to be identical. Many of these steps will also be relevant for unofficial distributions, so it's a good place to start even if that's where you end up.
Before we bother to even find the official binary distribution of Emacs, though, we need to make sure our Windows environment is set up properly.
Home is Where the Files Are
As you use Emacs, it will occasionally need to read and write files to your hard drive to keep track of various settings and customizations (we'll cover many of these in this "hacks" series). The primary customizations are placed in a file called .emacs (the . preceding the name causes the file to be "hidden" on Unix file systems). On Unix and Linux systems, this file is placed in the "home" directory, a user-specific location denoted by the HOME environment variable (e.g. /users/derek).
Windows does not set the HOME environment variable by default, so Emacs assumes that your "home" directory is C:\. This is generally not a good place for user-specific settings. On most Windows systems, you need administrative privileges to write to the root directory. Also, multiple users on a given Windows system would likely want their own configurations. Lastly, many Windows backup and migration tools only save the contents of user-specific directories.
Fortunately, it's a simple matter to tell Emacs where our real "home" directory is -- we simply need to set the HOME environment variable. I strongly suggest using the value of the USERPROFILE variable as the basis for setting HOME. This variable is always available and is properly set to the real "home" directory of the current user.
We can temporarily set environment variables from the console using the set command, but we need to make this particular variable persistent. If you're using Windows Vista, you can set persistent environment variables using the setx command (you can also do this on Windows XP if you install the necessary support tools). To set HOME from a normal user console session using setx, type:
setx HOME "%USERPROFILE%"
If you don't have the setx command, you can set this variable using a graphical tool. Choose System from the Control Panel menu, click the Advanced tab, and press the Environment Variables button. In the User variables section, add a new environment variable with the name HOME and the value set to your user profile directory (e.g. C:\Documents and Settings\Derek). You can type echo %USERPROFILE% from a command prompt to get the correct value.
To test the value, start a new console session, and type echo %HOME%. On Windows XP, for example, you should see something like this:
Microsoft Windows XP [Version 5.1.2600]
(C) Copyright 1985-2001 Microsoft Corp.

C:\Documents and Settings\Derek>echo %HOME%
C:\Documents and Settings\Derek
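To summarize the rule described in this section, here is a small Python sketch of how the "home" directory is chosen. It models only what this post describes (HOME if set, otherwise C:\) and is not taken from Emacs' source; the function names emacs_home and recommended_home are invented for the example.

```python
def emacs_home(env):
    """Return the directory Emacs treats as home, as described in this post.

    env is a dict of environment variables. Assumption: when HOME is unset
    or empty, Emacs on Windows falls back to C:\\ as the post states.
    """
    return env.get("HOME") or "C:\\"


def recommended_home(env):
    """Return the value this post suggests for HOME: the user profile path."""
    return env.get("USERPROFILE")


if __name__ == "__main__":
    before = {"USERPROFILE": r"C:\Documents and Settings\Derek"}
    print(emacs_home(before))
    # After running setx HOME "%USERPROFILE%" and opening a new console:
    after = dict(before, HOME=recommended_home(before))
    print(emacs_home(after))
```

The first call shows the unhelpful default, and the second shows the result once HOME has been set from USERPROFILE.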
Getting the Distribution
With that out of the way, we can download Emacs. Begin by visiting the official Windows distribution site. You'll notice a number of different files here:
- emacs-21.x-barebin-i386.tar.gz: This contains Emacs binaries without any precompiled Lisp files. You could install Emacs with this file, but it wouldn't be very useful (you would have to add / compile the Lisp files yourself to make Emacs useful).
- emacs-21.x-bin-i386.tar.gz: This is a full installation of Emacs with precompiled Lisp files. This is a good choice if you just want to install Emacs and use it.
- emacs-21.x-fullbin-i386.tar.gz: This is functionally the same as the previous item, but includes the full Lisp source in addition to the compiled versions. This is the best choice if you want to be able to see the actual code behind much of Emacs. If you have the space, I recommend installing this version.
- emacs-21.x-leim.tar.gz: This is the "Library of Emacs Input Methods" package, which is used for entering non-ASCII characters. You can probably skip this for now.
- emacs-21.x-lisp.tar.gz: This file contains the Lisp source for the Emacs distribution. If you installed the regular binary version, you could use this to turn it into the "fullbin" version.
- emacs-21.x-undumped-i386.tar.gz: This version contains a special executable you can use to rebuild Emacs after changing built-in files. You almost certainly don't need this.
- fns-21.x.x.el: This file contains the load history for built-in libraries. This file comes with the normal distributions, so you can safely ignore it.
The installation process is the same whether you install "bin" or "fullbin", so download whichever option makes more sense for you.
Unpacking
Because the files are in gzipped GNU Tar format, you cannot extract them using the built-in compression tool in Windows. If you already own a commercial compression tool such as WinZip or WinRAR, you can easily extract the binaries using one of those tools. If you don't have one of these tools, I recommend installing 7-Zip, which has similar capabilities and is completely free.
The root of the binary distribution contains a single directory, emacs-21.x. Because there are no user-specific files in the binary distribution, it makes sense to install it in a shared location. If you have administrative privileges, I suggest extracting it to your "Program Files" directory (type echo %ProgramFiles% from a command prompt to get the full path). Users with no administrative privileges whatsoever can install Emacs in their HOME directory instead.
If you're using 7-Zip, open it and navigate to the downloaded file. If you're using Windows Vista or are running as a limited user account, you will need to start 7-Zip using the "Run As" feature of your particular flavor of Windows. If you don't, it won't have access to write to the shared directory. Once you've opened the downloaded file using 7-Zip, double-click [Content], select the emacs-21.x directory, and click Extract. Enter the full path to your "Program Files" directory, and click "OK".
You can now close 7-Zip, navigate to your "Program Files" directory and confirm that the extraction was successful.
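If you would rather script the extraction than click through 7-Zip, the same step can be sketched in a few lines of Python using the standard tarfile module. This is only an illustration, not part of the Emacs distribution: the function name extract_archive is my own, and the archive and destination paths in the usage comment are placeholders you would replace with your own. The same permission caveat applies, so writing to "Program Files" still requires sufficient privileges.

```python
import tarfile
from pathlib import Path


def extract_archive(archive_path, destination):
    """Extract a gzipped tar archive and return its top-level entry names.

    extract_archive is a hypothetical helper written for this example.
    """
    destination = Path(destination)
    destination.mkdir(parents=True, exist_ok=True)
    root = destination.resolve()

    with tarfile.open(archive_path, "r:gz") as archive:
        members = archive.getmembers()
        # Refuse any entry that would be written outside the destination.
        for member in members:
            target = (destination / member.name).resolve()
            if target != root and root not in target.parents:
                raise ValueError(f"unsafe path in archive: {member.name}")
        archive.extractall(destination)

    # Collect the first path component of every entry.
    top_level = set()
    for member in members:
        parts = Path(member.name).parts
        if parts:
            top_level.add(parts[0])
    return sorted(top_level)


# Example call (placeholder paths, adjust to your download and target):
# extract_archive("emacs-21.x-fullbin-i386.tar.gz", r"C:\Program Files")
```

The returned list gives a quick way to confirm that the archive unpacked into the single directory you expected.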
Installing
You now have a usable version of Emacs. If you execute runemacs.exe from the bin directory, Emacs will open and be fully functional. This isn't the most convenient way of starting Emacs, however. Thus, there is one last installation step.
In the bin directory containing the runemacs.exe file, there is a file called addpm.exe. This file "installs" Emacs by adding a start menu program group and adding a few registry entries in HKEY_LOCAL_MACHINE\SOFTWARE\GNU\Emacs. Simply double-click the file, click "OK", and Emacs will be fully installed. Now you can open Emacs from the "Gnu Emacs" program group in your start menu.
Now What?
Now that Emacs is installed, feel free to start using it. If you're accustomed to Windows editors, chances are it will feel a bit strange at first. Perhaps the most useful thing you can do at this point is to invoke the Emacs Tutorial (click the Help menu item to access it). This will cover some of the basic Emacs commands and concepts. If you complete the entire tutorial, you should start to feel comfortable enough in Emacs to do some basic editing. Practice using the Emacs commands as much as you can afford to. You'll learn hundreds of them over time, but the commands covered in the tutorial are some of the most important ones.
- Sunday, December 03, 2006
The Case for Emacs
When well-seasoned programmers get together to speak of the "good old days", there are a few persistent topics. There are always at least a couple of archaic programming languages ... depending on the number of reminiscing participants, there can even be enough to inspire a game of "I had it worse than you did!". Inevitably, this is followed by some form of reference to ancient multi-user operating systems, mainframes, or -- although increasingly rare -- systems involving punched cards. More often than not, this piece of the conversation will lead to discussions of operating ancient editors via terminals -- with ed nearly always starting that conversation (and earning several grimaces at its first mention).
If you've heard enough of these discussions, as I have, you've surely observed that the tone of these discussions suggests that we are now in a golden age of development tools ... I've been writing software, and hearing these discussions just long enough to start to wonder if these discussions were any different back then (it's all relative, isn't it?). Me? Well, I started this game a bit late relative to most of the "good old days" participants I've observed. They had Pascal, I had VB5 -- they had ed, I had ... well, VB5 ... they had libraries, I had BBSes, they had printed manuals, I had CD documentation ... the list goes on.
The Beginning
I stumbled into programming during high school (I'm excluding the 10 print "derek", 20 goto 10 fun that so entertained me in grade school). I was responsible for inputting data penned in by customers entering contests, signing up for mailing lists, etc. into a database. Every time I re-entered the same postal code three times in a row I could feel my brain shrinking.
Within the friendly environment that was VBA (in Microsoft Access '95), I managed to script a form into filling in a few fields for me when it encountered one of the most popular postal codes. I was hooked -- entering the addresses had been transformed from mind-numbing boringness to a creative exercise in trying to save as many keystrokes as possible.
My initial fondness for VBA (and Microsoft's influence on programming curricula in the greater Puget Sound area) led me to a Visual Basic programming course at a local community college. I enjoyed it, and it eventually led me to my first programming job (which really was a web design job, but nobody seemed to mind -- perhaps because I was still in high school, or perhaps because I was making $8/hour).
In these early years of my career (before it was officially my career, I should add), there was never a distinction between language, framework or development environment. My language was Visual Basic, I called Visual Basic functions, and I did all of this using a product called Visual Basic ... the explosion of Active Server Pages brought Visual InterDev along for the ride, and C++ tasks were handled by Visual C++. There was always a "right" tool for the job.
A Hot Cup of Java
After a couple of years of making ASPaghetti, I started toying with a language called Java. I use the word "toying" quite intentionally, because at the time my best description of it would have been "language used to make cool buttons on Web pages". I had been quite ignorant of the growing Java juggernaut, but our company had recently bought some Oracle licenses (it was the Dot-Com era, why not?) and it seemed to be the language of choice for folks in that camp.
I quickly moved from "toying" to "using". For a Visual Basic programmer, Java was like heaven -- actually, Java was heaven. After two weeks of using it, I was ready to have the "good old days" conversation about old Visual Basic. Goodbye class modules, hello beans!
One thing that was very different in the Java world, and it was new to me, was choice in environment. In the Microsoft world (and this is still true today), there was a supported development environment released with the language ... in Java, the tutorials were telling me to set a CLASSPATH environment variable (bad advice, by the way -- -cp is much less painful, but I digress) and use the editor of my choice. Using Notepad was fine for my "First Cup of Java", but I knew I'd be needing auto-completion and a "debug toolbar" before long.
My quest for the perfect Java IDE never really finished -- I started with Oracle JDeveloper (basically an early version of JBuilder), and later dabbled with VisualAge, Webgain, Netbeans, JBuilder itself, and just about everything else that came around ... I never fell in love with any of them, but most were a pretty nice step up from the development tools I had worked with previously. They all had the basic features I was looking for -- syntax highlighting, intelligent completion, click-to-run, debugging ... none of them stood out, but they were all Good Enough[tm].
The Dark Side
There is a dark side of the programming world. Some are never exposed to it ... some have only had enough exposure to be scared by it. The dark side consists of a wide network of programmers, about 1 to 2 for every 10, who throw away the rules. They run strange operating systems, they spend free time on weekends experimenting with programming languages you've never heard of (Hask-what? Eif-who?) ... and they don't use the same tools that you do, to say the least.
I remember the first time I witnessed a member of this "dark side" in action. I had wandered into the cubicle of one of our recent hires, a long-haired "Unix guy" who was in the midst of a rather furious coding session in a very strange looking editor (Vim, for the curious). Watching him operate his editor was truly an experience. He was working at a pace I'd never seen before, his cursor flying around the buffer like a pinball. I was like a child seeing a magician for the first time (that hat was empty!).
I rudely interrupted his session -- in
