- Wednesday, June 10, 2015
Five Years of Google Closure
It's been five years since I first began using Google Closure in a professional setting, and nearly as long since I wrote my introduction to it here. Over that time, we (Seattle-based startup Appature, now IMS Health) used Google Closure to build a very large web application for digital marketing.
Motivations
At the time we began using it, Google Closure had been public for around 6 months. While it wasn't gaining much traction, it promised to solve a number of difficult problems we were experiencing. As with most web applications of the era, ours was built using a small base library (prototype.js, though jQuery had conquered the world for newer applications).
A key challenge when building a large application around a small library is the need to constantly add supplemental libraries to fill in missing functionality. Add a date picker, a drag & drop library, and a rich text editor, and you've already added hundreds of kilobytes to your application payload. Our worst-case pages had nearly 700 kilobytes of JavaScript spread across several files (in an era where we still supported IE6, I should add). Concatenation and minification were of little help. Though we may have only been using a small percentage of some of the libraries we were consuming, we had no reliable way to eliminate the unused parts.
Worse, these additional libraries were typically authored by a completely different team, individual or organization, making code quality and even implementation patterns (such as event handling) highly variable. Absent standards for UI elements, each new library added additional costs for users as well. For example, our pre-Google Closure application's WYSIWYG editor had its own system for dialog boxes, whose look and function were completely different from those found elsewhere in the application.
Adoption
At the time of our initial investigation, Closure was not widely used outside of Google, and documentation (particularly conceptual documentation) was scant. However, given its usage on very high profile Google projects (Gmail, Docs, etc), we dove in and have used Google Closure ever since.
The risk was worth taking, as Google Closure offered a compelling solution to our most significant challenges. The massive Closure Library -- hundreds of thousands of lines of good quality JavaScript code -- matched well to our needs, and was designed to be extended. For example, over the years we've added extensive functionality to the WYSIWYG editor via plugins, which look and feel like seamless extensions of the base editor.
More importantly, the Closure Compiler has allowed us to turn the massive size of the Closure Library into a huge asset. We compile our script with advanced optimizations, which eliminates dead code automatically, meaning we only pay for exactly the code we are using. Closure Compiler also supports splitting code into modules, which again is done automatically (like most non-Google users, we use plovr). We have a base module for common code and separate modules for major areas of functionality (email, SMS, customer search, settings, and many more).
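For a sense of what that looks like, here is a rough sketch of a plovr configuration with a common base module plus per-feature modules. The module names and paths are hypothetical, not our actual setup; consult the plovr documentation for the authoritative options.

// hypothetical plovr config: a shared base module plus feature modules
{
  "id": "app",
  "paths": ["js"],
  "mode": "ADVANCED",
  "modules": {
    "base":  {"inputs": "js/base_init.js",  "deps": []},
    "email": {"inputs": "js/email_init.js", "deps": ["base"]},
    "sms":   {"inputs": "js/sms_init.js",   "deps": ["base"]}
  },
  "module-output-path": "build/module_%s.js"
}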
Our consumption of Closure Library has grown significantly over time. Shortly after our initial adoption, we extended our localization support and found support for locale-aware parsing of dates, numbers and currency right in the box (a brief sketch follows the list below). Below is a list of classes and namespaces we've consumed from Closure Library. This only accounts for elements of the library we've directly referenced from our own application code:
- goog.dom
- goog.dom.forms
- goog.array
- goog.ui.Component
- goog.asserts
- goog.style
- goog.string
- goog.events.EventType
- goog.events.Event
- goog.events.EventHandler
- goog.dom.TagName
- goog.object
- goog.async.Deferred
- goog.events
- goog.structs.Map
- goog.dom.classes
- goog.dom.DomHelper
- goog.ui.Dialog.EventType
- goog.ui.Dialog.ButtonSet
- goog.json
- goog.ui.Zippy
- goog.net.XhrIo
- goog.Timer
- goog.ui.Dialog
- goog.string.format
- goog.ui.Popup
- goog.ui.LabelInput
- goog.dispose
- goog.events.InputHandler
- goog.userAgent
- goog.dom.dataset
- goog.debug.Logger
- goog.ui.registry
- goog.iter
- goog.ui.Component.EventType
- goog.testing.jsunit
- goog.dom.query
- goog.dom.NodeType
- goog.ui.PopupBase.EventType
- goog.ui.MenuItem
- goog.events.EventTarget
- goog.date.Date
- goog.async.Throttle
- goog.async.DeferredList
- goog.ui.Menu
- goog.ui.Control
- goog.structs.Set
- goog.net.IframeIo
- goog.fx.DragDropGroup
- goog.fx.DragDrop
- goog.functions
- goog.ui.editor.AbstractDialog.EventType
- goog.ui.PopupMenu
- goog.math
- goog.fx.easing
- goog.events.BrowserEvent
- goog.editor.plugins.AbstractDialogPlugin
- goog.Uri
- goog.uri.utils
- goog.ui.editor.ToolbarFactory
- goog.ui.editor.AbstractDialog
- goog.ui.ZippyEvent
- goog.ui.Dialog.DefaultButtonKeys
- goog.ui.DatePicker.Events
- goog.positioning.Corner
- goog.math.Coordinate
- goog.fx.AbstractDragDrop.EventType
- goog.dom.ViewportSizeMonitor
- goog.date.DateRange
- goog.Disposable
- goog.ui.editor.ToolbarController
- goog.ui.editor.DefaultToolbar
- goog.ui.editor.AbstractDialog.Builder
- goog.ui.decorate
- goog.ui.ac.Remote
- goog.ui.Tooltip
- goog.ui.TabBar
- goog.ui.Tab
- goog.ui.SubMenu
- goog.ui.MenuSeparator
- goog.ui.IdGenerator
- goog.ui.Dialog.Event
- goog.positioning
- goog.math.Size
- goog.i18n.DateTimeParse
- goog.fx.DragListGroup
- goog.fx.Animation.EventType
- goog.format.EmailAddress
- goog.events.KeyCodes
- goog.editor.plugins.LinkBubble
- goog.editor.plugins.AbstractBubblePlugin
- goog.editor.Field
- goog.date.Interval
- goog.History
- goog.window
- goog.ui.editor.TabPane
- goog.ui.ac.AutoComplete.EventType
- goog.ui.ac.AutoComplete
- goog.ui.ToolbarSeparator
- goog.ui.PopupColorPicker
- goog.ui.MenuButton
- goog.ui.ColorPicker.EventType
- goog.ui.AnimatedZippy
- goog.structs.Queue
- goog.positioning.AnchoredPosition
- goog.net.XhrManager
- goog.math.Box
- goog.locale
- goog.iter.Iterator
- goog.i18n.NumberFormatSymbols
- goog.i18n.NumberFormat.Format
- goog.i18n.NumberFormat
- goog.graphics
- goog.fx.dom.SlideFrom
- goog.fx.dom.Slide
- goog.fx.dom.FadeOut
- goog.fx.dom
- goog.fx.AnimationQueue
- goog.fx
- goog.events.KeyHandler
- goog.events.InputHandler.EventType
- goog.editor.range
- goog.editor.plugins.UndoRedo
- goog.editor.plugins.RemoveFormatting
- goog.editor.plugins.ListTabHandler
- goog.editor.plugins.LinkDialogPlugin
- goog.editor.plugins.EnterHandler
- goog.editor.plugins.BasicTextFormatter
- goog.editor.Plugin
- goog.editor.Link
- goog.dom.Range
- goog.debug.Logger.Level
- goog.async.Delay
- goog.ui.emoji.SpriteInfo
- goog.ui.emoji.EmojiPicker
- goog.ui.editor.messages
- goog.ui.ac.Renderer
- goog.ui.ac.InputHandler
- goog.ui.ac.ArrayMatcher
- goog.ui.ac
- goog.ui.Zippy.Events
- goog.ui.ToolbarMenuButtonRenderer
- goog.ui.Toolbar
- goog.ui.TabRenderer
- goog.ui.TabBarRenderer
- goog.ui.Prompt
- goog.ui.PopupBase
- goog.ui.ItemEvent
- goog.ui.InputDatePicker
- goog.ui.DatePickerEvent
- goog.ui.DatePicker
- goog.ui.Component.Error
- goog.ui.ColorPicker
- goog.ui.ColorPalette
- goog.ui.ColorMenuButton
- goog.ui.Checkbox
- goog.testing.mockmatchers
- goog.testing.asserts
- goog.testing.StrictMock
- goog.testing.MockClock
- goog.testing.MockClassFactory
- goog.structs
- goog.string.linkify
- goog.positioning.Overflow
- goog.net.XhrLite
- goog.net.IframeLoadMonitor
- goog.net.EventType
- goog.net.Cookies
- goog.math.Rect
- goog.i18n.currency
- goog.i18n.NumberFormat.CurrencyStyle
- goog.i18n.DateTimeSymbols
- goog.i18n.DateTimeFormat
- goog.graphics.Path
- goog.fx.dom.Scroll
- goog.fx.dom.PredefinedEffect
- goog.fx.DragListDirection
- goog.fx.DragDropItem
- goog.events.KeyHandler.EventType
- goog.events.KeyEvent
- goog.editor.plugins.TableEditor
- goog.editor.plugins.SpacesTabHandler
- goog.editor.node
- goog.editor.Table
- goog.editor.Command
- goog.editor.BrowserFeature
- goog.dom.iframe
- goog.dom.BrowserFeature
- goog.debug.LogManager
- goog.debug.ErrorHandler
- goog.debug.Console
- goog.date.UtcDateTime
- goog.date.DateTime
- goog.date
- goog.color
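To give a flavor of those i18n pieces, below is a minimal sketch of locale-aware formatting and parsing using goog.i18n. The patterns and values are illustrative only:

goog.require('goog.i18n.NumberFormat');
goog.require('goog.i18n.DateTimeParse');

// Format a number as currency using the compiled-in locale data.
var nf = new goog.i18n.NumberFormat(goog.i18n.NumberFormat.Format.CURRENCY);
var price = nf.format(1234.5);  // e.g. "$1,234.50" in an en_US build

// Parse a date string against an explicit pattern.
var parser = new goog.i18n.DateTimeParse('MM/dd/yyyy');
var date = new Date(0);
parser.parse('06/10/2015', date);  // mutates date; returns characters consumed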
As you would expect given its usage within Google, both Closure Library and Closure Compiler have done an excellent job of staying up-to-date. As HTML5 has blossomed, and browsers have evolved, Google Closure has been quick to respond, adding to and evolving its existing functionality.
Closure Today
If the measure of success for a JavaScript framework is popularity, Google Closure has failed miserably. It's old, unsexy, and has missed its turn on the hype train. Google spends no effort on PR for the library, and seems content with it quietly powering the majority of its flagship applications. The lack of hype has mattered very little to us, though, as Closure has been a very good solution to our problems over the past five years.
That's not to say that adoption doesn't matter. The absence of a rich community around Google Closure has been disappointing. Only a single book on Google Closure has been published, and the surrounding tools have been slow to evolve. Contrast this with Angular, which has achieved off-the-charts hype and has developed a massive community despite minimal usage in production Google applications. Many books and countless blog posts have been authored explaining various patterns and concepts, making it much easier to get started.
Despite the lack of popularity, a number of companies have successfully used Google Closure for their production applications. Medium, Yelp, CloudKick (acquired by Rackspace), Cue (acquired by Apple), and IMS Health (my company) all use (or have used) Google Closure to power their production applications. And, importantly, the majority of Google's flagship applications continue to be powered by Google Closure.
Looking Ahead
So, what about the next five years? I expect that we'll be using Google Closure in some form over that time period, though our usage will likely evolve. One area where we're currently venturing outside of Closure Library is for our UI components, which have not been as quick to evolve as other parts of the library. Specifically, we're using React, whose component architecture has fit in very nicely with our existing Google Closure code. With a bit of effort, React components can also take advantage of Closure Compiler.
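As a rough sketch of that combination (assuming a React externs file is supplied to the compiler; the component and element names are illustrative, not our production code):

goog.provide('my.ui.Hello');

// A React (0.13-era) component living alongside Closure-managed code.
// Given React externs, this survives advanced optimizations; property
// names like "name" are renamed consistently across the compiled bundle.
my.ui.Hello = React.createClass({
  render: function() {
    return React.createElement('div', null, 'Hello, ' + this.props.name);
  }
});

React.render(
    React.createElement(my.ui.Hello, {name: 'Closure'}),
    goog.dom.getElement('root'));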
ClojureScript
One of the more innovative projects in the JavaScript landscape is ClojureScript, a version of Clojure which compiles to JavaScript. Far more than just an alternate syntax for JavaScript, ClojureScript also makes consuming Google Closure incredibly easy (that it rhymes is purely a confusing coincidence).
ClojureScript bundles Closure Library and seamlessly integrates Closure Compiler, requiring no extra tools to gain all of Google Closure's benefits. It is also capable of solving one of Google Closure's biggest annoyances -- its verbosity when used directly from JavaScript. Consider the following example from my original introduction, compared to its ClojureScript equivalent:
Closure Library in JavaScript:

goog.provide("my.app");
goog.require("goog.dom");
goog.require("goog.dom.classes");

var d1 = goog.dom.getElement("decorateme-1");
var d2 = goog.dom.getElement("decorateme-2");
goog.dom.classes.toggle(d1, "aaa");
goog.dom.classes.remove(d1, "bbb");
goog.dom.classes.add(d2, "bbb");
Closure Library in ClojureScript:

(ns my.app
  (:require [goog.dom :as dom]
            [goog.dom.classes :as c]))

(let [d1 (dom/getElement "decorateme-1")
      d2 (dom/getElement "decorateme-2")]
  (c/toggle d1 "aaa")
  (c/remove d1 "bbb")
  (c/add d2 "bbb"))
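Reaching the compiler's advanced mode from ClojureScript is a matter of build options; a minimal sketch (the output path is illustrative):

;; illustrative ClojureScript build options, e.g. for cljsbuild or cljs.build.api
{:output-to     "resources/public/js/app.js"
 :optimizations :advanced}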
The ClojureScript community over the past couple of years has been a great source of innovation for application models using React's immediate mode rendering (read this post for a taste). Libraries like Om and Reagent have demonstrated the simplifying effect immutable data brings to UI development, and re-frame has provided a pattern for extending beyond the rendering layer.
Figwheel deserves special mention, as it enables a unique interactive coding environment. Using Figwheel to build reloadable UIs in the style promoted by the libraries above, you can update code live while in the middle of a user workflow. The concept is best presented visually; see this brief video by the author, Bruce Hauman. Having built a couple of UI applications in this style, I find that the traditional browser refresh workflow now simply feels slow.
Google Closure is a first-class citizen in the ClojureScript universe. Consider the recent support for Closure Compiler modules, and the many libraries which provide idiomatic wrappers around the robust functionality of Closure Library. ClojureScript promises to be a well supported platform for building rich web applications with Google Closure for many years to come.
- Wednesday, June 30, 2010
Introducing Closure Snippets
I just posted a first version of Closure Snippets, an unofficial collection of snippets for Google's Closure Library. As demonstrated in my recent overview of Closure Tools, Closure Library has a tremendous amount of value, but it comes at the cost of brevity.
Fortunately, we have plenty of tools for making things easier and faster to write. One of my favorites is YASnippet, an Emacs extension which offers TextMate-inspired support for snippets. This forms the basis for Closure Snippets. The snippets in the package are designed to run when js2-mode is active.
I put together a quick screencast to show Closure Snippets in action. As in my previous Emacs screencast, I've used a slightly modified but mostly standard Emacs configuration:
Want to try it for yourself? Grab the source, and check out the README for configuration instructions.
- Sunday, June 27, 2010
An Introduction to Google Closure
Late last year, Google released Closure Tools, a trio of open source technologies aimed at developers writing large scale JavaScript applications. Included in the initial release were Closure Compiler, a sophisticated JavaScript compiler, Closure Library, a massive library of JavaScript code designed for use with the Compiler, and Closure Templates, a templating system implemented in both JavaScript and Java.
The sales pitch was certainly compelling. Closure had been used internally at Google since 2005, with contributions from over 400 engineers. This was the technology that powered Gmail, and several other high profile Google applications (Docs, Reader, Calendar, Maps and Wave amongst them). Google pitched it as their "standard library" for JavaScript-based applications. It would have been reasonable to expect the hype meter to be off the charts.
Despite this, the response was rather tepid. The announcement spread broadly, but the standard response seemed to be "so what?". Not much has changed in the six-plus months since its release, either. In the past 30 days, there have been a grand total of three questions tagged with google-closure on Stack Overflow. During the same time period, there have been 3,161 questions about jQuery.
So, what's the problem? Is it the confusing name (a closure is a JavaScript language feature)? "Anti-hype" due to the fact that it was released by Google? Exhaustion from the rapid pace of new JavaScript library releases? A lack of conceptual documentation? Or, is it just too complex to actually use?
I think all of these may be factors to some degree, but the real reason is much simpler: it's simply not the right tool for a lot of applications. That's not intended as a criticism of Closure Tools, it's just that it was designed with far different goals than other popular libraries like, say, jQuery and Prototype.
For small applications, blogs, design galleries, or static content sites which just need some simple form validation, Closure is probably overkill. It is a perfectly suitable solution, as the Closure Compiler enables an efficient footprint for applications of any size. However, the learning curve is quite steep, there are far fewer examples, and you need to find a way to integrate the compiler into your workflow. With a library like jQuery, you just add a script reference (if using a CDN, you don't even need to upload it to your server) and start coding away.
So aside from large Web applications from Google, who is Closure for? Having studied Closure exhaustively for the past few weeks, I would suggest that it is an excellent option to consider for any "medium-plus" sized application, regardless of the back-end "technology stack". These applications will almost always involve multiple developers, and are likely to contain at least 100 kilobytes of non-library source code. If there is no build system in place to combine scripts, an average page in an application of this size probably references 5 to 10 external scripts, if not more.
If your application has reached this level of complexity (or you project it to), this is a good starting point where the benefits of Closure start to become significant. The impact of Closure Compiler on your code's execution speed and size (and, if your scripts aren't being combined, HTTP request overhead) will be significant, and you're likely to be in position to benefit from a large portion of Closure Library as well. As your application grows, so do the benefits. Let's take a closer look at the two primary components of Closure Tools (we'll not be covering Closure Templates).
Closure Compiler
The Closure Compiler is arguably the flagship component of Closure Tools. The Compiler supports two primary optimization modes: "minification" (whitespace-only or simple) and "advanced compilation".
A JavaScript minifier is nothing new. There have been high quality minifiers available for several years. A JavaScript compiler isn't even necessarily new, as compilers from other languages to JavaScript have also been floating around for some time. What is unique about Closure Compiler is that it is a JavaScript-to-JavaScript compiler, capable of performing optimizations to JavaScript code that were previously unheard of.
Closure Compiler can be used as a standalone component against an existing JavaScript codebase, or in concert with Closure Library, which has been explicitly designed for maximum optimization when used with the compiler. It's easier to understand the power of these advanced compilation features by way of a simple example. Consider the following script:
function unused() {
  alert('i am only ever called by a function that is uncalled');
}

function unused2() {
  unused();
  alert('i am never called');
}

function hello(name) {
  alert('Hello, ' + name + '!');
}

var hi = hello;
hi('friend');
After running this through the compiler in advanced mode (which you can easily try for yourself), we're left with the following:
alert("Hello, friend!");
The result speaks for itself. The compiler eliminated dead code, inlined functions, optimized string concatenation, and left us with a significantly smaller but functionally identical result.
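If you'd rather run the compiler locally than use the web service, the standalone jar takes the same modes via flags; something like the following (file names are illustrative):

java -jar compiler.jar \
    --compilation_level ADVANCED_OPTIMIZATIONS \
    --js hello.js \
    --js_output_file hello.min.js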
Hey, where'd my functions go?
It might seem strange or arbitrary that in the examples above, the unused functions simply disappeared in the output. The compiler considers this removal perfectly safe and reasonable because it assumes that you have provided it with the entire source code for your application. If you want one of your functions to be available outside of the context of your core JavaScript code (say, from a script element in an HTML page), you need to explicitly tell the compiler to export it. See the compiler documentation for details.

This isn't to say that there won't be any function calls in your final compiled output if you don't export anything. Closure Compiler will only inline code when it considers it appropriate. When not inlining, functions will typically be renamed. If the compiler chose not to inline our hello function, for example, the output would look something like the following:

function a(b){alert("Hello, "+b+"!")}a("friend");
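For reference, the export mentioned above is typically a one-liner; a minimal sketch using either Closure Library's helper or plain bracket notation (quoted property access is never renamed by the compiler):

goog.exportSymbol('hello', hello);  // preserve the global name "hello"
// - or -
window['hello'] = hello;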
I'm supposed to debug that?
If your reaction to the above code is one of terror, you are probably not alone. Anyone who has written large systems in JavaScript knows that debugging code is an inevitable part of the process. The significant optimizations provided by Closure Compiler might make your code smaller, but they also make it extremely difficult to map back to the original code. If you experience a runtime failure that only happens in the compiled code, what do you do?
Fortunately, Closure Compiler is capable of creating a source map, which enables areas of the compiled code to be definitively traced back to the original source from which it was compiled. Even better, you don't have to work with the source map file itself, as Google provides Closure Inspector, a Firebug extension (yes, Firefox only for now) which integrates into your standard debugging experience.
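Producing the map is just another compiler flag; a sketch (output names are illustrative):

java -jar compiler.jar \
    --js app.js \
    --js_output_file app.min.js \
    --create_source_map app.min.js.map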
Is that it?
While Closure Compiler is an excellent tool for optimizing code size and execution speed, that's not the only value it provides. It also supports a huge number of JSDoc annotations which enable it to help you find bugs in your code (these also allow the compiler to even better optimize your code). For example, we could redefine our hello function from above with a type annotation as follows:

/** @param {string} name */
function hello(name) {
  alert('Hello, ' + name + '!');
}

Here, we have told the compiler that we expect a single parameter of type string. Now, let's add some problematic calls to the function:

hello();
hello(3.14);
Now, when we compile this code, the compiler issues the following warnings.
JSC_WRONG_ARGUMENT_COUNT: Function hello: called with 0 argument(s). Function requires at least 1 argument(s) and no more than 1 argument(s). at line 5 character 5
hello();
^
JSC_TYPE_MISMATCH: actual parameter 1 of hello does not match formal parameter
found   : number
required: string
at line 6 character 6
With large programs in particular, these warnings can help you quickly isolate hard to find bugs that would otherwise only be visible at runtime. The only downsides are the need to add the JSDoc tags to your code comments (though these are also quite useful as documentation when reading the code), and the need to "cast" from time to time, as below:
// Explicitly "cast" a {Node} to an {Element}
var element = /** @type {Element} */ (node);
goog.dom.setTextContent(element, 'success!');
Closure Library
Closure Library is a massive library of JavaScript code optimized for use with Closure Compiler. In theory, pairing it with the Compiler is optional, and in fact this is an incredibly important feature in development environments, as it makes the compilation step skippable and dramatically speeds some stages of development. In production, however, use of the compiler is not optional, as the uncompiled library code is very large, and the strategy used to include the files is not designed to be efficient.
So just how big is Closure Library? The following chart provides some indication. This is the uncompressed, non-minified JavaScript on-disk file size of a few popular libraries (excluding test code), as reported by du. The file sizes include comments and indentation, so it is not meant to be indicative of the actual number of "lines of code". All libraries shown are the current stable versions at the time of this writing:

[Chart: uncompressed on-disk file sizes of Closure Library and other popular JavaScript libraries]
So yes, the library is indeed massive. But is it any good? For starters, there are no "versions" of Closure Library. The code is simply updated periodically in a public Subversion repository, which is fed from Google's internal source control system. That doesn't mean that it's "prerelease" (or "beta") quality; it simply means that Google is confident enough in the quality of the code that the latest release is always considered to be (in their own words) "very stable". Many parts of Closure Library are actively in production at Google, and it comes with a massive suite of unit tests. At the end of the day, it's up to you and/or your organization to determine if it is "stable" enough for your needs. Having spent a large amount of time in various modules of the Closure Library code recently, my personal opinion is that it is uniformly well designed, well documented, and very well written.
So what's in there, taking up all of that space? A surprising amount is in comments, believe it or not. Closure Library is exhaustively documented, and most of the documentation is inline. This makes browsing the code a great strategy for learning more about Closure. As far as the code itself is concerned, some portion of the library is dedicated to things that would be broadly useful in any JavaScript project, including those that do not target Web browsers (array, crypt, date, math, string, and more). A bigger portion of the code is exclusively designed for JavaScript applications targeting browsers (dom, editor, fx, history, style, ui, useragent, and more). A good starting point for understanding what's available in Closure Library is the demos folder (index), where you'll find examples for things as basic as event handling, and as exotic as a popup "emoji" picker. Lastly, Closure Library also includes selected third party code. For example, Dojo's Query implementation is exposed as goog.dom.query.

Closure Library code is modular, with related code organized into "namespaces". This organization is well suited for large scale projects with large teams. Another advantage of this namespacing strategy is that Closure Library is extremely unlikely to interfere with existing libraries, as all of the library code is organized under a single global symbol (goog). Note that this does not necessarily mean that other libraries will not impact the behavior of Closure Library. Existing libraries that pollute the global namespace unpredictably, or modify the prototypes of built-in JavaScript types, could cause Closure Library code to behave undesirably (as it would with any other third party library). To put it more succinctly, Closure Library plays well with others, provided that they too play well with others.

The downside to the namespacing strategy is that your code ends up being more verbose. There are many ways to control for this. You might decide that your application won't need to interoperate with other libraries, and can safely add aliases for commonly used functions in the global namespace. For example, if goog.dom.getElement is too much typing for your taste, you can selectively trade off namespacing by simply adding a global alias as $:

goog.exportSymbol('$', goog.dom.getElement);
There are, of course, risks to modifying the global namespace in this fashion, so it's undoubtedly a good thing that the Closure Library designers left the decision to us.
Closure Library vs. jQuery, Prototype
So if we don't modify the global namespace, what does Closure Library code actually look like? This is probably easier to understand by example. Following are ten simple code examples comparing the syntax of Closure Library to two other popular libraries: jQuery and Prototype. I created these examples by looking through a few codebases built on top of the libraries, choosing the most popular examples which corresponded to functionality common to all three. My intent is not to make one library look better than the other, but rather to compare the style and syntax. The examples are functionally equivalent.
Comparison #1: Get a single element by ID
jQuery:
var title = $("#title").get(0);
// - or -
var title = $("#title")[0];

Prototype:
var title = $("title");

Closure Library:
var title = goog.dom.getElement("title");
Comparison #2: Get an array of elements by ID
jQuery:
var elements = $("#title,#description,#footer");

Prototype:
var elements = $("title", "description", "footer");

Closure Library:
var elements = goog.array.map(
    ["title", "description", "footer"],
    goog.dom.getElement);
Comparison #3: Find an element's closest ancestor by element name
jQuery:
var list = $("#list\\.item").closest("ul")[0];

Prototype:
var list = $("list.item").up("ul");

Closure Library:
var list = goog.dom.getAncestorByTagNameAndClass(
    goog.dom.getElement("list.item"), goog.dom.TagName.UL);
Comparison #4: Hide all paragraphs which are immediate descendants of div elements
jQuery$("div > p").hide();
Prototype$$("div > p").invoke("hide");
Closure Librarygoog.array.forEach(goog.dom.query("div > p", null), function(e) { goog.style.showElement(e, false); });
Comparison #5: Get an element's text content
jQuery:
var text = $("#decorateme-1").text();

Prototype:
var text = $("decorateme-1").innerHTML.stripTags();

Closure Library:
var text = goog.dom.getTextContent(goog.dom.getElement("decorateme-1"));
Comparison #6: Retrieve the cumulative offset of an element
jQuery:
var offset = $("#positioning").offset();
console.log("left: " + offset.left + "px top: " + offset.top + "px");

Prototype:
var offset = $("positioning").positionedOffset();
console.log("left: " + offset.left + "px top: " + offset.top + "px");

Closure Library:
var rect = goog.style.getPageOffset(goog.dom.getElement("positioning"));
console.log("left: " + rect.x + "px top: " + rect.y + "px");
Comparison #7: Create a DOM element and append to the current document body
jQuery$("<div class=\"created\" id=\"createdDiv\">content</div>").appendTo("body");
Prototype$(document.body) .insert(new Element("div", { "class": "created", id: "createdDiv" }) .update("content"));
Closure Librarygoog.dom.appendChild( document.body, goog.dom.createDom("div", { className: "created", id: "createdDiv" }, "content"));
Comparison #8: Add, remove and toggle CSS classes
jQuery:
var d1 = $("#decorateme-1");
var d2 = $("#decorateme-2");
d1.toggleClass("aaa").removeClass("bbb");
d2.addClass("bbb");

Prototype:
var d1 = $("decorateme-1");
var d2 = $("decorateme-2");
d1.toggleClassName("aaa").removeClassName("bbb");
d2.addClassName("bbb");

Closure Library:
var d1 = goog.dom.getElement("decorateme-1");
var d2 = goog.dom.getElement("decorateme-2");
goog.dom.classes.toggle(d1, "aaa");
goog.dom.classes.remove(d1, "bbb");
goog.dom.classes.add(d2, "bbb");
Comparison #9: Create a manual hover effect using mouse events
jQuery$("h1:first") .mouseover(function() { $(this).css("color", "red"); }) .mouseout(function() { $(this).css("color", ""); });
Prototypefunction setColor(e, color) { e.element().setStyle({ color: color }); } var h1 = $$("h1")[0]; h1.observe("mouseover", setColor.bindAsEventListener(this, "red")); h1.observe("mouseout", setColor.bindAsEventListener(this, ""));
Closure Libraryfunction setColor(color, e) { e.target.style.color = color; } var h1 = goog.dom.getElementsByTagNameAndClass("h1")[0]; goog.events.listen(h1, goog.events.EventType.MOUSEOVER, goog.partial(setColor, "red")); goog.events.listen(h1, goog.events.EventType.MOUSEOUT, goog.partial(setColor, ""));
Comparison #10: Update the contents of a DOM element using XmlHTTPRequest response content
jQuery:
$.get("hello.txt", null, function(data) {
  $("#ajaxResponse").html(data);
});

Prototype:
new Ajax.Updater("ajaxResponse", "hello.txt", { method: "get" });

Closure Library:
goog.net.XhrIo.send("hello.txt", function(e) {
  goog.dom.getElement('ajaxResponse').innerHTML = e.target.getResponseText();
});
In general, the Closure Library examples are more verbose than the equivalents written using jQuery and Prototype. After being run through the compiler, they will be more compact at runtime, but there is some extra visual overhead to deal with when reading and writing Closure Library code. Of course, as discussed above, Closure Compiler helps a great deal when writing this code, as it can catch bugs before the code ever runs in the browser.
All of these libraries are quite easy to work with for such trivial examples as those above. The real benefits of Closure are found in larger codebases. If you're working on such a codebase, or are about to, I strongly suggest giving Closure Tools a look.
- Thursday, June 12, 2008
DVCS Myths
My last post on distributed version control systems generated some interesting discussion, both in the comments here and elsewhere on the Web. A number of the responses were interesting and thought provoking, while others were so full of FUD and misinformation I couldn't help but wonder if they were serious. I'll admit that I was surprised by some of the negative backlash against DVCS. I have explained it to many former users of centralized systems, and it simply never struck me as a very controversial technology. I don't want to just completely ignore the criticism, however. This post is an attempt to respond directly to some of the more common criticisms, and hopefully convince some of the skeptics that even if DVCS isn't the solution for you, at least it won't start your computer on fire.
DVCS Myth #1: You must change your workflow to adopt DVCS
Many descriptions of DVCS focus on the new and interesting workflows it enables. Indeed, this is a key feature of distributed version control, but it has a tendency to imply that DVCS is only useful if you really need to change your workflow.
This is entirely untrue. DVCS is flexible, and can be implemented in some very interesting and unique ways. However, it can also act just like your centralized system, and its advantages are no less significant.
At our company, for example, we switched from Subversion to Mercurial without changing our model at all, at least initially. We kept the same branch structure, used the same server, and did things in generally the same way. As our team has grown and diversified, our needs have as well, so we've leveraged some of the strengths of the DVCS model to match our workflow. The key is that DVCS works with your desired workflow rather than dictating it. If your desired workflow is similar to or identical to the "central server" model, that's a perfectly acceptable use case for applying DVCS.
DVCS Myth #2: Workflows enabled by DVCS are less natural than the centralized workflow
For long-time users of centralized systems, this is an understandable belief. Indeed, the workflow mandated by a centralized system may in some cases be the most natural. In these cases, DVCS offers the best implementation of the centralized workflow I've found. It's in cases where the centralized model is not the most natural workflow, however, that the unique properties of DVCS really shine.
As a specific example, DVCS has enabled me to manage changes to my home directory much more naturally than in a centralized system. I keep the contents of my home directory (dot files, elisp, etc.) under version control. I was using Subversion prior to discovering DVCS. With Subversion, I ran the server on my home development workstation, which I left powered on during the day so it was accessible from work (forcing me to pay otherwise unnecessary power costs). In addition, I paid $5 per month to my ISP for a static IP (dynamic DNS was unfortunately not an option due to the NAT configuration of my fiber-to-home service).
Despite these costs, the workflow in this setup was extremely unnatural. When I would make an update to the repository on the bus, I would have to leave the files in a modified state. Upon arriving at work, I would then have to open the laptop, connect to the network, then either make a bulk checkin with all of the changes or manually partition the modified files into the proper groups for changesets.
If, on the other hand, I made changes on my work computer and wanted to check them in while my home server was down (because of a network outage, or simply because I forgot to turn it on in the morning), I would have to manually generate patches from the repository (again, forcing myself to later reassemble them into logical changesets). Of course, accessing source control over the Internet is never ideal from a performance perspective, even when the server is always up. This is particularly true when using a strained corporate connection to talk to a server on an upload limited consumer line.
This was an annoying process, to say the least, and while it was a huge improvement over manually copying my home directory around, it left much room for improvement.
With DVCS, all of the annoyances of the previous model are gone. I can make commits from the bus without network access, and these commits are properly organized into the appropriate changesets as opposed to a giant single patch. I can easily pull these changes into my work computer's repository when I get to work, or I can leave the laptop in the bag and merge them another time. The changes I make on my work computer, meanwhile, need to make it back to my home machine. However, with my DVCS-powered workflow I now keep my machine turned off during the day (making DVCS the green SCM choice). I have also canceled my static IP service, saving myself $5 a month. In the absence of direct access to my home repository, I use a variety of mechanisms for sharing changes. Most commonly, I transfer changesets to my home machine via my laptop's repository. In other cases, I will export a handful of changesets and transfer them with a USB thumb drive or via email. In general, I use the most convenient option available, though I have used all three in various situations.
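For the curious, the mechanics behind those transfer options are just a few commands; a sketch (the revision numbers and paths are illustrative):

hg export -r 1234 > fix.patch       # write one changeset as a patch (email-friendly)
hg import fix.patch                 # apply it in another clone

hg bundle changes.hg ~/laptop-repo  # bundle changesets the other repository lacks
hg unbundle changes.hg              # pull the bundled changesets into a repository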
Regardless of where the work happens or how it's transferred, merging the changesets is simple with DVCS because, well, DVCS is designed to make merging changesets simple. It's simple no matter where the changesets originated, in part because DVCS uses unique hashes to identify changesets. Of course, it also tracks the parent revision of each changeset, so it can determine cases where a merge isn't necessary at all. This, unsurprisingly, is the most common case, given that I'm the only user in this scenario.
One thing you may have noticed in the workflow description is that it's a bit ambiguous which of my computers is "the server". Previously, it was my home machine, but why? It could just as easily have been any other machine ... in fact, I probably would have been better off running the centralized server on the laptop, though that doesn't seem quite right either. The fact is, this is a workflow where "the server" is naturally ambiguous. There is no real value in designating my home machine (or any other) in this role. Thus, the centralized model for version controlling my home directory simply isn't a natural fit. The DVCS model, on the other hand, easily and naturally supports my desired workflow. There are no "hacks" required to make this work cleanly.
As an added bonus, I get free offsite backups of my home directory repository. This leads to our next myth.
DVCS Myth #3: DVCS users don't believe in backups
The idea that DVCS users don't believe in backups is surprisingly pervasive, perhaps because of the passive attitude DVCS advocates tend to have about server outages. At our company, we have the same attitude, but we also make very frequent backups of our centralized repository. Using DVCS may theoretically reduce the need for backups, but by no means does it eliminate it.
So, why make backups of a source control server when every clone is effectively a backup already? It is improbable that many servers will suffer catastrophic hardware failures simultaneously, but it is not impossible. A more likely scenario might be a particularly nasty computer virus that sinks its teeth into an entire network of vulnerable machines. In any case, the probability of any or all of your backups becoming suddenly unavailable is really not the point. The bottom line is that using independent clones as canonical backups (as opposed to temporary stopgaps) is a suboptimal strategy.
Security, for example, should be considered. If you are using authorization rules to control access to specific portions of your repository, canonicalizing an arbitrary clone of the repository effectively renders those rules useless. While this would rarely be a matter of practical concern in a controlled corporate environment, it is nonetheless possible. It is worth noting that in an environment where a backup process is infeasible (for financial, political or other reasons), backing up hashes of the repository files and their revisions for post-backup verification could provide a mitigation.
The key win of DVCS for backups, then, is that you don't really need to invest in a "hot" backup. When the server inevitably goes down, DVCS will buy you time. Lots of time. You'll essentially be running at full productivity (or very nearly so) while you rebuild your server from backup. When changesets created during the server downtime are pushed back to the restored server, the freshly restored authorization rules will be reapplied and you'll be back on track.
DVCS Myth #4: Authentication and authorization don't exist with DVCS
I touched on this a bit in the previous myth, but it's worth emphasizing. Authentication and authorization absolutely do exist in the DVCS model. They only apply where you choose to apply them, however.
Our company has a canonical source control server which applies both authentication and authorization rules. The authentication rules are specified via the Apache server configuration which provides network access to the repositories. In fact, we leveraged the exact same Apache authentication configuration we had used for our Subversion installation (this configuration allows us to leverage the user database in the company's Windows domain). For authorization, we use the more flexible options offered by Mercurial's ACL configuration. In the simplest case, we have developer-specific copies of mainline development branches which can be pulled by anyone, but only pushed (written) to by the developer who owns it. Grouping users and splitting access based on subpaths of a single repository are nearly as simple.
Because the authorization rules are applied when changesets are pushed, developers working on local repositories are not denied any flexibility until they attempt to push their changes. A user in this scenario can still commit changes against that repository; they just can't push them directly. Thus, they would have to convince a developer who does have such permission that their changes are worthy of inclusion. Despite the flexibility, the effectiveness of the authentication and authorization rules is not compromised.
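As a sketch, the server-side hgrc plumbing for subpath rules looks roughly like the following. The user and path names are hypothetical; see hg help acl for the authoritative syntax:

[hooks]
; the acl extension enforces its rules via a pretxnchangegroup hook
pretxnchangegroup.acl = python:hgext.acl.hook

[acl]
; only enforce for changesets arriving over the network, not local commits
sources = serve

[acl.deny]
; hypothetical: keep everyone but alice out of her developer subpath
dev/alice/** = bob, carol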
DVCS Myth #5: DVCS can be used in corporate environments, but its advantages are mostly geared towards open source projects
DVCS is indeed quite popular for open source projects, and the reasons are fairly obvious. When many disconnected developers are working on the same project, the workflow flexibility provided by DVCS becomes increasingly important. It also provides a clean mechanism for remote users without commit access to the primary repository to create new functionality within the code base.
A user working on a new experimental feature, for example, can perform the work on their local clone of the repository. Within this local copy, they can commit changesets and integrate changes from other users. Importantly, as the mainline codebase evolves, they can also merge their changes with updated upstream code in a clean and organized way. In a centralized system, they would be forced to maintain their changes as a set of patches, manually rebasing them when they pull down new changes in the upstream repository. When their experimental feature is complete, they can easily export the new work and send a compact package to the appropriate maintainer.
The workflow flexibility DVCS offers is particularly valuable in an open source project with multiple maintainers. The maintainers in this scenario would be responsible for integrating contributions from the community for the modules they are responsible for. The contributions would mostly come in the form of structured patches exported by the DVCS client. The process of integrating these patches is easier and more organized with DVCS. Additionally, merging responsibilities can easily be split amongst several maintainers for patches that are accepted.
In corporations, on the other hand, it is rare to find groups collaborating by sharing patches. Thus, at a glance the flexibility offered by the DVCS model might seem to be overkill. In some cases this is true; I doubt any company or organization needs every feature provided by DVCS. However, having the capability to restructure your workflow is extremely valuable, even if you don't need it yet. And the parts of DVCS you don't use certainly don't cost you anything.
Perhaps you want to prevent a group of developers from committing changes to your mainline repository until they are reviewed. Using a centralized system, the developers from this group must submit patches for review and integration with the main code base. Their commit logs are thus lost, overwritten by the reviewing developer who applies the patch. DVCS makes this significantly easier. Those responsible for reviewing the changes simply pull reviewable changesets directly from the developer, or they can pull them via a developer-specific branch on the server. If the changes pass review, they can push them along to the main repository (only they would have access to do this). This group might also collect their changes together into their own shared repository, enabling a variety of changes to be tested together.
These sorts of scenarios can be especially useful when collaborating on a single project with an external development organization. I have seen attempts to use centralized version control systems in these scenarios fail miserably. Corporate centralized servers are rarely designed to be exposed on public networks, so naturally administrators shy away from enabling remote access, except via VPN. When no shared central server is available, the inevitable result is a hackish process. In the best case this might be a process based on exchanging patches from known parents, but more commonly it involves trading full copies of the source code with all revision history lost, then performing painful manual merges. With DVCS, you simply send the whole repository once, then share the exported changesets by whatever transfer mechanism is most convenient (server access is just a bonus). Merges can again be performed by either development organization, which is especially convenient.
DVCS Myth #6: Having a server with perfect uptime invalidates the advantages of DVCS
Even if you are not on a plane, you may very well be on the bus, or at home, or at a coffee shop, or in a hotel room, or on vacation. Just because your server has perfect uptime doesn't mean you're always in position to access it.
I have personally needed source control repository access in every single one of these places (including the plane), and I cannot rely on high speed Internet access in most of them. A couple of weeks ago I got a call about a bug that needed fixing while riding in on the bus. I was only 10 minutes from my destination, but that was enough time to crack open my laptop and run a bisect session to uncover the changeset which introduced the bug. Upon arriving at work, I knew exactly what the problem was and I was able to fix it immediately. With more time, I could have prepared the fixed changeset on the bus, ready to transfer as soon as I arrived. DVCS not only gives you access to your repository everywhere, it offers the most performant experience possible in all of these scenarios.
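That bus-ride session is just Mercurial's built-in bisect loop; a sketch (the revision number is illustrative):

hg bisect --reset        # start a fresh bisect session
hg bisect --bad          # the current revision exhibits the bug
hg bisect --good 1200    # a revision known to be good
# Mercurial updates to a midpoint revision; build and test it, then mark it
hg bisect --good         # (or --bad) and repeat until the guilty changeset is reported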
The performance aspect is a relevant point when speaking about uptime. As far as I'm concerned, if my "annotate" command takes 10 seconds to run, that counts as 9.5 seconds of downtime, because with DVCS it's virtually instantaneous. A slow responding centralized server can easily cost you as much productivity as an occasionally inaccessible one in the long run.
Hiccups can be mitigated to some extent by spending a great deal of money on your source control server, the hardware between it and your workstations, and appropriately qualified staff to make it all work together. However, I've rarely seen companies willing to make the required investment (never in my own experience, and quite rarely in others). Even in those that do, it's still only a mitigation (a server running on good hardware still inevitably gets sluggish under load), and it only lasts so long. As your repository grows and your team expands, the scaling pressure increases. DVCS, on the other hand, grows with your company, and always provides optimal performance. Thus, you don't need to spend an extra $25,000 on your server hardware "for future hires".
DVCS Myth #7: DVCS encourages chaos in your development process
This seems to be the issue that, more than any other, causes the anti-DVCS crowd to load up the FUD cannons. I've not seen any evidence to suggest that DVCS encourages degradation in teams. In fact, I have seen the opposite effect. Because DVCS can be shaped to the natural workflows of your team, when implemented properly it enables teams to work more smoothly, with less communication overhead. With that said, the fear, uncertainty and doubt surrounding this issue isn't going away, so it is only fair to address what seems to be such a "hot button" issue.
For starters, let's define chaos. I think it's important to understand that "chaos" is a moving target. A happy Subversion user might see the flexibility offered in the DVCS model as potentially chaotic. Meanwhile, there are many happy Visual SourceSafe users (really, there are -- I've met some, they are nice people despite this) who find the idea of a non-exclusive locking source control system to be the very definition of chaos.
Most people who have been writing software for a long period of time in a team environment accept that having multiple developers edit the same file at the same time is not only an acceptable form of chaos, but a very necessary one. It may not be intuitive (it sounds a lot like chaos until you've realized its importance), but it's almost universally recognized within mature development organizations that the cost of merging changes when multiple developers edit the same file is far outweighed by the massive productivity cost of a strict locking system.
So at least on this single issue, the DVCS proponents and detractors agree with each other wholeheartedly, or at least the vast majority of them do. It is not a huge step, then, to imagine other scenarios which might seem chaotic on the surface, but in fact enable huge gains in productivity.
We can acknowledge, having established that chaos can be valuable, that DVCS allows for chaos. All systems allow for some form of it. It is up to your team to determine the appropriate level of chaos that is permitted, and to enforce the process. This is true no matter what system or process you are introducing. If a particular developer working under a centralized source control system never checks in their work, that's a process failure, not a technology problem.
Many common situations in centralized systems lead to chaos as well. To me, the fact that a user cannot check in a set of changes until they've merged in everyone else's work on the same branch is chaos. This makes it far too easy to lose work because of a "merge gone wrong". I have seen developers switch workstations to resolve merge conflicts on more than one occasion. Being disallowed from checking in broken changes that you don't wish to share with others also leads to chaos. Developers wishing to add this layer of control with a centralized system today are forced to either do it manually (by making a copy of the in-progress repository or the relevant patches in case you want to back out) or to adopt a local DVCS.
The rapidly increasing popularity of running DVCS locally on top of centralized repositories really speaks to the need for the flexibility it offers. If you ask around, you'll find a number of different reasons why a given developer might have adopted this strategy. Nearly all of them are good arguments for DVCS in general. Some may want to version changes at a more granular level before sharing their changesets. Some might want a layered mechanism for transferring partial changesets between different environments. Others might value the ability to seamlessly create private branches for managing a particular single user workflow.
Indeed, DVCS provides significant benefits when used by a sole developer on top of a centralized server. But when enabled for an entire organization, it becomes even more powerful. For starters, all users of the system instantly gain access to the valuable features of DVCS. Even developers that don't take advantage of the more advanced DVCS features will instantly benefit from a speed improvement. More importantly, the workflow flexibility enjoyed by the individual user now extends to the entire team.
Having a source control system that supports your workflow and enables people to work together optimally is very likely to lead to less chaos in your company or organization.
DVCS Myth #8: All DVCS proponents think centralized version control systems are useless pieces of garbage, and that you're insane for using them
I think this perception is common, and triggers a defense mechanism that in many cases gets in the way of having a rational discussion of DVCS. First of all, most DVCS users used a centralized version control system before switching over. And most of them didn't choose to use diff & patch in lieu of that centralized system (with one rather notable exception).
I personally have several years of experience with CVS, Perforce and Subversion. I have actually had generally positive experiences with all of those tools, and I'd take any of them over a diff & patch based version control strategy. However, part of the reason for my being able to co-exist peacefully with these tools is that I bent my development processes to fit the limitations of the tools. Subversion's sub-par branching, for example, was annoying but not crippling because I avoided having lots of branches, instead choosing to unnaturally manipulate process (or even release dates). Perforce won't let you blink without server access, so I wrote a layer of proprietary code on top of p4 to manually reattribute files and generate scripts to eventually notify the server of opened-and-or-changed-but-the-server-doesn't-know-about-it files (yeah, and DVCS is chaotic). Everyone I worked with either had their own hacky solution to this problem, or they stopped getting work done when they didn't have server access.
As a generally content user of these centralized systems, I was curious enough about DVCS to read the occasional article touting it, but it never really hit me that it could make such a significant impact on my own development process, or the process at our company. It's difficult to see just how broken particular workflows are until they're fixed. As I began to better understand the advantages of DVCS, I started to become more aware of the annoying hacks that I was employing in an attempt to get work done under a centralized system.
DVCS Myth #9: DVCS is hard to learn
Before becoming a DVCS user, I definitely had this perception. DVCS can seem very intimidating. Typical explanations of DVCS are littered with complex workflow descriptions that are rarely familiar or intuitive to users indoctrinated in a centralized source control system mindset. This often makes DVCS seem overly complex or even irrelevant to one's needs.
To a degree, DVCS is difficult to learn. A system that allows for a great deal of flexibility is naturally more difficult to learn than a system with limited capability. However, in the context of a particular problem one needs to solve, DVCS is quite easy to learn. If, for example, you decide to replace Subversion with Mercurial and continue using the same trunk / branch model, there is very little to learn in order to make the switch.
Thus, DVCS itself is not "hard to learn". It can be quite challenging, however, to determine the best possible workflow for change management at your company. Because DVCS expands your options in this area, it's easy to mistake it as "difficult". Conceptually, DVCS is really quite simple. It's the optimized application of DVCS that is challenging. If you're intimidated by it, start by using it to imitate your existing workflow, then look for gaps in the efficiency or flexibility of your workflow. Chances are, DVCS will be able to solve them.
DVCS Myth #10: DVCS is hard to use
Once a particular DVCS workflow has been established, the difficulty of day-to-day usage of the system is very similar to centralized systems with equally complex workflows. Many DVCS implementations include more granular commands than are offered by centralized systems, but it's usually simple to emulate them. Following are a few examples of common Subversion commands and their equivalent in Mercurial.
Commit changes to remote server
    Subversion: svn ci
    Mercurial:  hg ci && hg push

Get changes from remote server
    Subversion: svn up
    Mercurial:  hg pull -u

Show change log
    Subversion: svn log
    Mercurial:  hg log

Annotate a revision
    Subversion: svn blame
    Mercurial:  hg annotate

Show status of changed files
    Subversion: svn status
    Mercurial:  hg status

Show changes in current files
    Subversion: svn diff
    Mercurial:  hg diff

Print a file's contents at a particular revision
    Subversion: svn cat -r 55
    Mercurial:  hg cat -r 55

Cherry pick a single revision from branches/main to trunk
    Subversion: svn merge -r720 ../../branches/main
    Mercurial:  hg transplant -s ../../branches/main f587e

Merge all unmerged revisions from branches/main to trunk
    Subversion: svn log | grep -i merging
                ...
                svn merge -r640:646 ../branches/main
                svn merge -r681:682 ../branches/main
                svn merge -r689:662 ../branches/main
                svn merge -r667:669 ../branches/main
                svn merge -r676:719 ../branches/main
                svn merge -r725:730 ../branches/main
                svn merge -r734:HEAD ../branches/main
    Mercurial:  hg pull ../branches/main

Given a background with a centralized system and a similar or identical workflow, learning this basic set of commands would take only a couple of minutes. Fortunately, you'll buy back those minutes and many, many more each time you run these commands. It can be a bit startling at first to adjust to all of your VCS commands running so fast, but you'll cope, I promise. And if you're a Subversion or CVS user, you can stop scheduling "branch days" on your calendar.
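And since a branch in Mercurial can be nothing more than another clone, the "branch day" ceremony reduces to a sketch like this (the paths are hypothetical):

# create a "branch" by cloning your local repository
hg clone project project-feature
cd project-feature
# ... hack and commit with hg ci as usual ...

# when the work is done, merge it back into the mainline
cd ../project
hg pull ../project-feature
hg merge
hg ci -m "Merge feature work"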
DVCS Myth #11: DVCS is a fad
At some point, it became acceptable to discount the value of all new technology with a reference to some unrelated technological flop. DVCS is the new Betamax, apparently, simply by virtue of the fact that it's new and different. Despite these inane comparisons, the question itself is worth pondering.
For a technology to be a fad, there needs to be some initial period of excitement and adoption, followed by a relatively rapid dilution of interest after this initial period. Technologies that end up in the "fad" category tend to be those that can drum up excitement with marketable promises, but either fail to deliver on that promise or miss a key element required to reach a "tipping point". Most technologies that we associate with the "fad" term were interesting enough to justify at least some initial excitement at one point in their history. Laserdisc was a failure, historically speaking, but putting video content on an optical disk and enabling interactive features doesn't seem like such a bad idea these days.
DVCS certainly meets the criteria for fad potential, at least at this point in its history. It has a strong and growing base of highly passionate users and evangelists. It's also a relatively new technology, despite having a few years of success stories in its wake. So, will DVCS continue to accelerate? Let's look at some of the "fad factors" as they apply to DVCS.
We might decide that Laserdisc was a failure because of a poor technical implementation. That is, putting video content on an optical disc was a good idea, but the discs were too large or the quality was too low to back it up. So, does DVCS have the same problem? I think there was some legitimacy to the "good idea, bad implementation" complaint as recently as a couple of years ago. There were several DVCS tools to choose from at that time, but each had significant quirks. In the meantime, however, the quality of the DVCS experience has increased dramatically. Excellent newcomers like git and Mercurial have burst onto the scene, while quirks have steadily been disappearing from their competition.
From a technological implementation perspective, the state of DVCS implementations is strong, and getting stronger. Having used Mercurial for over a year now (since well before its 1.0 release), I'm amazed by how trouble-free it has been. As the repository has grown and the complexity of our source control usage has increased, Mercurial has continued to be as fast and pleasant to use as it was on day 1. Perhaps it's just our good luck, but it has also been less painful to administer than any source control system I've managed in the past.
A bigger concern when evaluating whether or not something has fad potential is a product being marketed to a market that doesn't exist. This is sometimes because the product is "ahead of its time", but more often it's because the benefits were oversold. The Segway comes to mind, although I'm not entirely sure it ever had the initial adoption to justify the "fad" label.
With DVCS, this argument is a bit more challenging to evaluate, because it involves some speculation. I know from personal experience that DVCS offers unique capabilities that at least some segment of the market needs. However, even if I'd been doing this for a lifetime, mine would still be a pretty microscopic sample size. Perhaps a better way of looking at it is to understand what you lose by moving to DVCS. It's very, very difficult to imagine a company or organization that can't benefit from at least one aspect of DVCS. Any organization that allows employees to work from home, as a simple example, would see improved productivity with DVCS. But what do they lose?
Obviously the answer to this question depends on the specific scenario, but even assuming that you want to keep your centralized workflow, I don't see much downside. You end up checking in merged changesets more often in DVCS, though every modern DVCS has a way of making this happen nearly as seamlessly as in a traditional centralized system. The difference between the two is basically a safety tradeoff (that is, the ability to commit your changes before merging). And even that tradeoff is optional in many cases, for brave users who wish to merge remote changes directly into uncommitted files. Of course, there are other advantages to decoupling commit and merge which are not realized in this case.
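To make that tradeoff concrete, here is a sketch of the two styles in Mercurial, assuming you have local edits in progress:

# decoupled style: commit first, then merge, so a botched merge
# can never destroy your work
hg ci -m "My changes"
hg pull
hg merge
hg ci -m "Merge with upstream"

# centralized-style shortcut for the brave: fold remote changes into
# uncommitted files, much like "svn up" (works so long as the incoming
# changesets don't create a new head)
hg pull -u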
What about cost? Most DVCS implementations are available free of charge. Compared with best-of-breed commercial implementations of centralized systems, this can save quite a bit of money right off the bat. Perhaps more significantly, the DVCS model minimizes the amount of money you need to spend on server hardware. All operations that don't involve sharing changesets are done on local clones of the repository, so the server has far less work to do. Thus, your shared repository will happily run on less-than-stellar hardware without impacting most SCM use cases. Server administration is essentially identical to that of a centralized server, so you won't find hidden costs there either. The only relevant cost consideration, in fact, is the cost of the initial migration.
While there are no guarantees that DVCS will break through into the mainstream, it's difficult to find many compelling arguments against it. For all its pros, there just aren't many cons. The limitations that do exist today can be eliminated with modifications to the implementations, as opposed to the idea itself. There is no doubt in my mind that centralized systems will continue to exist for some time (CVS is still quite popular, and it's been years since there was a legitimate case for starting a project on it). However, it is inevitable that centralized systems will start to gain more and more DVCS functionality.
It's easy to imagine these DVCS / centralized system hybrids eventually becoming quite popular, in fact. They might operate in full "DVCS mode" for the majority of operations, but automatically consult the server for files larger than a certain threshold, or for inspecting changesets that are several years old. Or perhaps they will be able to enforce certain aspects of policy, reassuring those who remain fearful of "DVCS chaos" but desire the productivity boost it provides.
DVCS Myth #12: DVCS is the perfect solution in all cases
Having spent a fair amount of time talking about the benefits of DVCS, it's only fair to spend some time talking about cases where it might not be the optimal solution, at least in its current forms.
If your company or organization has a single centralized repository with hundreds of thousands of files or millions of revisions, it may be infeasible to store the entire repository on each client. As we discussed in our last myth, this doesn't necessarily disqualify the DVCS concept, but current DVCS implementations do not yet have features to optimize this scenario. Not all companies or organizations keep the entirety of their source code in a single repository, but it's certainly not uncommon. That said, there is no reason that future DVCS implementations (perhaps in hybrid form) shouldn't excel in this scenario.
Even in these "massive repository" cases, it is sometimes possible to restructure the repository into a collection of smaller repositories (see OpenJDK). This allows users to work with a full repository clone of just the area or areas of the system that are relevant to them. A downside is that changesets cannot span repositories, so this is not always ideal. In any case, this problem affects a very limited number of organizations (if you don't work at a very large company, it probably doesn't apply to you). Looking forward, it would take an army of developers several years to produce that much source code, and within a few years it will no longer be prohibitive to store repositories of this size on each client. If this problem doesn't affect you today, it probably never will.
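For the curious, day-to-day work against a collection of smaller repositories might look something like this sketch (the URLs are hypothetical; OpenJDK splits its tree along broadly similar lines):

# clone only the components you actually work on, not the whole tree
hg clone http://hg.example.com/jdk/hotspot
hg clone http://hg.example.com/jdk/langtools

# each clone is a complete, independent repository; because a changeset
# cannot span repositories, a cross-component change means one commit
# per repository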
- Wednesday, May 14, 2008
A DVCS Story
Seattle, 1990
Bob leaned back, stretched his arms and took in the view from his window. It was a lazy Sunday, and he'd just finished reading his messages on his favorite BBS. An aging Amiga decorated with pink stickers courtesy of Bob's daughter sat at his feet. His sleek new home phone was perched prominently on the corner of the desk, the cord snaking its way to the wall by way of a splitter shared with his modem.
As Bob surveyed the scene, his phone started to ring. Pausing a moment, he smiled, enjoying the harmonic ring of the new phone, which he had lovingly customized earlier that morning, before answering.
"Hey Bob, it's Alice!" Alice, a co-worker, was one of Bob's closest friends. They had both graduated from the computer science program at the same school, managing to stay in touch over the years, usually by trading "war stories". Recently, Alice had joined Bob's group from another company. To the chagrin of Bob and his co-workers, she was brought in directly as a lead programmer. "Bah, it's just a title", Bob was fond of saying. He hid his resentment well when interacting with Alice socially.
"Bob, my life just changed forever." Bob's eyebrow slowly began to raise. His mind raced through the possibilities. Had she been promoted again? Was she moving on to another opportunity? I wonder if I'll make lead if she does!. On the verge of Bob's moment of imagination turning into an uncomfortable pause, Bob snapped out of it.
"Alice, my goodness, I've never heard you so excited."
"Okay, you know I'm not normally ahead of the technology curve," Alice continued. "But today, I just got a cellular telephone!".
Bob's eyes narrowed. A cellular phone? He'd recently seen a story about them on the evening news, and thought they were the most ridiculous thing he'd ever seen. Who on earth needs to make phone calls in the middle of the street?, Bob had thought to himself when he saw the story. Bob responded forcefully. "A cellular phone? Those crazy looking things with the huge antennas? Alice, you've got to be kidding me." Bob admiringly moved his hand across the smooth plastic shell of his phone's handset.
"I'm serious! I know they're a bit different from what you're used to, but to be honest I kind of like the styling. I didn't realize how much fun it could be to be on the cutting edge of technology. You should have seen the looks I got in the park down the street this morning!"
Bob rolled his eyes and interrupted. "Look, I'm sure it's fun to be mistaken for an FBI agent, but you can't possibly think this is rational. When the heck would you need to make a phone call in the middle of the street? I have two phones at home, and another at work. What else could I possibly need?"
"I can see where you're coming from. I'm sure that to you it doesn't sound that different from your home phone. I mean, you pick up the phone, dial the numbers, and it connects to the other side. Nothing groundbreaking there. But it's so freeing to be able to talk anywhere. I can completely understand how this might seem like a novelty. To be honest, listening to myself explain it to you, it doesn't really sound that compelling. But imagine the possibilities! I'm always connected. I can have meetings in my car, and I can leave the office for a long lunch on a sunny day without worrying about missing out on anything important."
Bob chuckled, making sure he drew it out long enough to suggest that he'd heard enough. As the final note of his chuckle diminished, Bob tried to finish the conversation.
"Okay, Alice, I guess I can sort of see how that might be nice. But learning a whole new speed dial system, and working with a phone I don't understand just doesn't seem worth it for those few situations where it might be useful. I'm glad your happy with your purchase, in any case."
Alice shook her head slowly, and put in a last word before hanging up. "You'll see, Bob, just wait."
Seattle, Present Day
Bob was crouched beneath his desk, reaching for the mini-USB cable attached to the back of his computer. He plugged the other end into his battery-drained Blackberry, and its bright welcome screen came to life. Come on, come on, he urged. After a brief delay, the phone's LCD dimmed and the network connected. A high pitched bell ring alerted him to a new message, and Bob quickly pressed the "Read" button. She remembered! Bob thought to himself excitedly. His daughter had sent the text message announcing her SAT results, just as promised. I knew it! thought Bob, proudly inspecting the score. She's smarter than her old man after all.
As Bob turned to his computer, he heard the familiar sound of his favorite classical melody. He had only last week taken an MP3 of the song and made a custom ring tone, something that impressed his daughter far more than his last promotion. Who's calling me during lunch?, he thought. Leaning back, Bob peered under the lip of his desk, eyeing the blazing white LCD screen of his phone, now perched atop his computer chassis. Recognizing the name on caller ID, he immediately grabbed the phone and answered.
It had been several months since Bob last spoke to Alice. The time between their phone conversations had grown progressively longer of late, and Bob was happy to hear her voice. It had been five years since Alice left for Silicon Valley to run the software group at a small search startup. Having moved his way up to middle management, he was always excited to hear from someone who still had their finger on the pulse of technology.
Alice got right to the point. "Bob, I am about to change your life." Bob sensed excitement in her voice. In the two decades they had known each other, Alice had embraced the role of technology evangelist, and Bob that of technology skeptic. They each enjoyed their roles, and Bob could tell immediately that Alice couldn't wait to tell him about the next big thing. He gathered up as much feigned skepticism as he could in a feeble attempt to mask his genuine curiosity, and offered a response.
"Alright, what is it this time?"
"DVCS. Distributed Version Control Systems", Alice responded. "We just migrated our entire source control system to Mercurial. I think in the first week using it we've already gained 100 hours of productivity. The developers love it. I've made it my mission to tell everyone I know."
Bob pursed his lips. She's got to be kidding, he thought, this is what she's so excited about?
Bob announced his skepticism. "Alice, Alice, Alice. I've been around a long time, and it's not quite so easy these days to put one past me. What are you really so excited about?" Alice laughed long and hard.
"Oh you old curmudgeon, here we go again. Look, we've gone through this before, and I am going to convince you that distributed version control is serious stuff. Why are you so skeptical already? What have you heard about it?"
"To be honest", he started, "I first started hearing about distributed version control when we hired on a new developer who'd been working on an open source project. He was almost as excited as you about it. He had some trouble explaining what was so great about it, so he sent the entire team a link to a video where the presenter bashed our current version control system for an hour. The guy who manages our VCS server was really offended. So as far as I can tell, distributed version control is only relevant in open source projects run by opinionated bullies who see diff & patch as a perfectly acceptable source control system. I'm an old corporate soul -- I haven't used patch since college, Alice, and I don't miss it."
"Ah, I know the video you're talking about," Alice said. "Forget about that. I'm going to make the DVCS case for you, right here, right now. Pretend you've never heard of it."
"You've got your work cut out for you. To be honest with you, nobody has ever really bothered to tell me what problem this thing solves for me."
"That's always the first question!", Alice said. "I've actually practiced this speech on a few other ex-colleagues, and everybody asks that question. Unfortunately, that's the wrong question to be asking. Instead of asking what problem it solves, you should be asking what new possibilities it offers. That's been the real win for us."
Bob leaned as far back as his chair would allow, and propped his feet up on his desk. He had been through this many times before, and he knew he was in for a long ride. He gave Alice her opening: "Alright, I'll admit to being slightly intrigued. But we're really happy with our current VCS, everybody here knows how to use it, and it handles everything we need without any issues."
Alice smiled widely. She normally had to work much harder than this to get Bob on the hook. She knew that beneath the curmudgeon, he still had a passion for technology. It was her job to make sure that he didn't lose that, and she embraced it proudly.
"Okay, I'm going to ask you a very important question, and I expect an honest answer. Has your source control server ever had any downtime?"
Bob thought for a moment. "I suppose, sure, but nothing more than the usual. I mean, our server is up all the time, really. We plan all our maintenance for the weekends, so other than the occasional hiccup ..."
Before he could finish his sentence, Alice interrupted. "Yes! The hiccups! Those 15 minutes of downtime because of the urgent security patch, the 10 minutes of slowness when two machines are pulling down copies of the repository! They happen, and you shrug them off because it's just the way it is. But do you know how much productivity you lose when somebody loses their train of thought because the server isn't available for their 'annotate' command?"
"Look, I get what you're saying. But really, these hiccups aren't very common", Bob retorted. "I mean, even if they were, you have to compare them against the cost of having everybody switch over to a whole new system. The last time we did that was a total nightmare. One of our developers even quit over it! And that didn't involve explaining to everyone this crazy new distributed source control model. I'd have a revolution on my hands."
Alice responded reassuringly. "I completely understand. I had my own reservations until very recently. One of our developers had been trying to get us to adopt a distributed source control system for a few months. She liked DVCS so much that she had found some way to use it on her local system and still interoperate with our centralized system. I was skeptical for the same reasons as you, though, Bob. I convinced myself that the server was more reliable than it was, and I tried to forget that our VPN would sometimes be down all weekend, forcing developers to come into work to make fixes. Like you, I didn't really think it was worth it."
Bob interjected impatiently. "OK, so what changed your mind?"
"Last week our system administrator told us they were going to be rebooting the server during lunch time. This wasn't a big deal at the time. Everybody had advance warning, and they made all of their checkins before lunch, just in case something went wrong. When we returned, we found our source control system in pieces, its parts splayed all over the desk. The admin had installed a few security patches, and the server wasn't booting. He said he thought it might be bad memory, but he wasn't really sure."
Alice's horror story reminded Bob of something remarkably similar that had happened at his own company a year ago. They expected the server to be repaired in 30 minutes, but it dragged to 60, 90, then 120 minutes. By 3:30 PM most of the office had cleared out, leaving a few frustrated developers emailing files to each other as they raced toward an important deadline that was now in jeopardy.
"Bob, are you still there?" Bob was growing increasingly nervous as he enumerated in his mind the many things that can go wrong with a server.
"Yes, yes, sorry, you just reminded me of something. Please, go on."
"Okay, so the admin couldn't give us an ETA on the repair. We were feeling really helpless, and called a team meeting to decide on a protocol for getting work done while the server was down. Fortunately, one of our team members had a plan. You remember that developer I told you about earlier who was using the DVCS on her own system?" Alice was talking excitedly now, and didn't bother to wait for Bob's answer. "So, she immediately took control of the meeting, and laid out the plan. She had created a complete copy of the source system just before it went down, you know, just in case of emergency, and it was checked into the DVCS repository on her system. She gave the group some brief instruction on how to copy the repository, and in less than thirty minutes our entire team was back to work with their own copies of the source."
"Wait a second," Bob interrupted. "You're telling me that you temporarily switched your entire source control system over in a half hour in the middle of an emergency? Forgive me for being skeptical, but I'm not buying it. For starters, what happens to your revision history?"
"When she took a backup of the centralized system, she had done it via a script that preserved the entire revision history. She said that migrating from centralized systems was common, and writing the script was a breeze since most of the 'heavy lifting' was built into Mercurial."
Bob made no attempt to mask his skepticism. "Okay, I get that, and it's all very impressive, really. But what about all the work you did while the server was down? How did you get it back to the main repository once it was revived?"
"That's the best part. We didn't." Alice heard a squeak on the other end of the phone, and knowing that Bob was about to ask her if she was crazy, continued without hesitation. "The admin kept working on the server throughout the day. During this time, after everybody had their servers up ..."
Bob cut her off. "Wait, what? Servers? I thought they just checked out the source code from their co-worker's machine? Where did all of these servers come from?"
"That's exactly the difference between DVCS and a centralized system. You don't check out the source code from the server, you clone the repository from the server to your local machine. Once you've made the copy, you're a server too."
Bob was finding this DVCS concept more and more ridiculous as Alice went on. "Wait, wait, wait. Last I checked you had a team of 20 developers."
"We're up to 25 now", Alice corrected.
"And you're telling me that in response to a temporary server outage, you created 25 separate source control servers, and that's somehow a good thing? How on earth does anybody know what state the source code is in? Please tell me I'm missing something. I'm starting to believe you've gone mad."
"I know, it sounds like chaos. And frankly, it could be if we let it. That's where process comes in. In this case, everybody initially cloned the repository from the same server. So, we designated that as the official server where everybody shared their changes during the downtime."
"Aha!" Bob was sure that by the end of this conversation, he'd would bring Alice back down to earth. "So you're taking this confusing and complicated distributed system, and making it act just like a centralized system! What happens when that server goes down? You're no better off!"
Alice let out a slightly annoyed chuckle. "Not exactly. If the server we've designated as our central server goes down, which it surely will at some point, so what? That's yet another wonderful feature of a distributed system. We've got backups of the code all over the place, without even trying. Every developer's server contains a backup copy of most or all of the code from the shared repository. So, if the main server goes down, we can designate another server as the central server temporarily, or we can not worry about it, since developers can make checkins to their own servers, only sharing changes when needed."
"Wait a second!", Bob interrupted. "If everybody is making checkins to separate servers, how do you get those changes back together in the so-called central server once it's back up?"
"That, my skeptical friend, is called a merge."
"Oh, great," Bob murmered, "you mean like merging branches?"
"That's exactly what I mean. In fact, repositories and branches are pretty much the same thing in Mercurial. Whether or not there is a central server, everybody commits changes to their own servers. Period. That's the only place you can commit to. When you want to combine your changesets with those from your repository, you need to merge them."
Bob groaned. "Yuck! Branches are such a huge pain. And you're seriously suggesting that making people merge branches every single time they want to share changes is a good thing? I'm serious, Alice, if you need some help, I know some really good people."
"You actually have a point there. We used to think branches were a pain too. In fact, we only kept a single branch aside from our main repository because merging branches was such a pain. This is another huge difference between distributed and centralized systems. Because you merge branches all the time, distributed version control systems make it incredibly easy. Easy to create branches, easy to share them. Flexibility, Bob, that's what it's all about."
"I don't see what's so flexible about chaos. All this complexity, just to compensate for a few minor blips of downtime? I'm not buying it."
"Yes, initially it was just a response to the downtime." Alice sensed that she was losing Bob, a phenomenon she was quite familiar with. She softened her tone. "On the day we started using Mercurial, it was going to be a temporary thing. My thought was the same as yours, that it was handy as a temporary crutch, but too complex to keep around. However, our temporary outage ended up being not-so-temporary."
"What a nightmare." Bob said. As Alice spoke, he had been working hard trying to convince himself that this was an incredibly uncommon scenario. Geez, I guess I'd better confirm that we're backing our server up daily, Bob thought. And I wonder if I can get budget for a backup source control server?
Bob had neglected to mention that he had only a year ago spent $25,000 on licenses for his company's source control software. While he was proud to now be using the same source control system as big companies like Google and Microsoft, it left him with very little budget for hardware. As a result, their source server was running on an underpowered machine, and it was not uncommon to hear complaints about its sluggishness. Bob had also recently received the bill to renew his yearly contract for support and upgrades. That was going to take another $5,000 out of his budget, at the cost of some much needed upgrades to his developers' machines.
Alice continued. "It turned out that our source server's RAID card had died, and we were going to be down for three more days before the part could be delivered. It was in these three days that we discovered the great value of DVCS, not before. One group of our developers, for example, had previously been trying to do 'buddy builds', where they share their changes with each other before committing to the main repository. They had initially tried to coordinate this work on a separate branch, but the pain of merging so frequently was killing them, not to mention the fact that they often forgot to make their changes on the branch. Then they started emailing each other patches, but this, too, was prohibitively cumbersome. To get around that, they ended up sharing their source directories with each other directly, and emailing the names of files that needed to be copied over. Changes got lost all the time; it was a total mess."
Alice was talking quickly now, hoping to stop Bob from interrupting. "The afternoon we started using Mercurial, however, all this stopped. Each developer would finish their work, commit the relevant changes to their own servers, and then make the changesets available for their colleagues to pull and merge into their repositories."
Alice had said the magic word. "Ack! More merging!"
"Yes! More merging! I'm making it sound too complicated though. You only have to actually merge files if the changes you're pulling overlap with your own changes. When you do inevitably have to merge, though, you do so after you've checked in your own changes. Thus, even if the merge goes awry, you never lose your work because of it, something that is extremely important to us. Regardless, merging is so simple and fast that it doesn't even matter."
"Well, my source control system supports branches," Bob replied, "why couldn't we do this?"
"Many of the new and interesting workflows enabled by DVCS are possible in centralized systems, but are simply too much of a pain in practice to have any chance of adoption. Plus, they can grow organically. You don't need to explicitly decide to start working on a branch for buddy builds, you can just make some checkins and choose where to send them. You're always working on a branch, in effect, since your local repository is a branch. Not to mention the fact that you don't need to ask a server administrator for permission to create a branch."
"Okay, so you can do buddy builds", protested Bob, "and if the server hiccups it's not that big of a deal. But so what? Those things aren't very useful for us. People around here just come in and do their work. Even if these benefits are as profound as you suggest, I really don't think it's worth training my entire team on some complicated new source control system."
Alice hadn't expected to sway Bob in only a single conversation, but she was nonetheless growing a bit frustrated with her progress. She had used the same sales pitch on others to great effect, but something was missing; it just wasn't landing with Bob. She continued with her defense. "The things I've mentioned so far are 'big deals'. Having a server down even for only a minute or two is a huge deal, in fact. Developer time is expensive, and knocking a developer out of the zone because of a server hiccup is, in my mind, totally and completely unacceptable. Heck, even a bit of server latency can ruin a developer's ability to stay in the zone."
"One thing I think you're forgetting is that we're using the best centralized system money can buy," Bob said. "I'm sure our server is significantly more speedy and reliable than your free, open source centralized server was. Everybody here is really happy with it. I've even heard them bragging about it to friends at other companies."
"Okay," said Alice. "I'll play by your rules, even though it's unfair. Let's assume that your server is up 100% of the time, never any downtime, never a hiccup, never a bit of slowness. Oh, and everybody has constant access to it. DVCS still offers advantages beyond simply not needing live server access to get work done."
"Oh? Like what?", queried Bob, now feeling proud of his progress towards the goal of dragging Alice back into reality.
"Like no more 'check-in races', for starters. It's a common complaint every place I've worked. The first person to commit their changes to the central server avoids having to be the one who merges changes. Thus, the next committer is forced to perform the merge, whethero r not they are the best person to merge the changes. I've actually seen people switch workstations for ten minutes so the changes could be merged by the proper resource. How ridiculous is that? With DVCS, because changes are committed locally, nobody is ever denied the ability to do so because of someone else's changes. And once they are committed, anyone can pull the changes together and merge them. If you had two developers racing towards a deadline, another less busy developer could volunteer to do all of the change merging."
"Yeah, well," Bob smirked, "I guess our needs aren't quite as sophisticated as yours. Don't get me wrong, some of the stuff you're talking about sounds interesting, and hey, if we weren't using the best centralized system that money can buy, I might be more interested. Really, whatever flaws there are in our current system, we've made the necessary adjustments. People here are happy. We're doing great work, the team gets along well. I just don't see the case for DVCS."
Alice realized that there was no more she could do. She bit her lip, resisting the urge to respond defensively. Planting the seed, she realized, was the best she could hope for with Bob. She leaned back in her chair, shook her head softly, and ended the conversation. "You'll see, Bob, just wait."