Why I Went From Python to Go (and not node.js)
People often ask me why I decided to write the bulk of my new code in Go, a language I started programming in November of 2011 while attending Hacker School. At that time, concurrency was a very hot topic in Hacker School, and we were all trying out different ways of writing concurrent code. A bunch of us pitched in and helped out with Brubeck, a framework for concurrent programming in Python, which is probably the least awkward way of doing concurrency in a Python web application that I’ve found. But let’s rewind a little bit, because the context is important for understanding why I chose Go and why it appeals to me. Ultimately, your experiences may be very, very different.
The first time I ever wrote any concurrent code was doing some multithreaded Java stuff while reading through Killer Game Programming in Java, but I didn’t take it very far, opting instead to toy around with ActionScript, because I wanted people to be able to see my work and Java applets make me deeply ashamed. Some time later, I picked up on the ChucK programming language and started writing drum machines for the Novation Launchpad, which is a very fun concurrency problem to try to solve correctly, and I was very impressed with ChucK’s concurrency model. But these things weren’t within the context of web programming, which is what I do professionally. For web programming, my progression was more along the lines of ASP.NET -> PHP -> Python.
Being a Python programmer, I had seen the light. All other languages were for some reason inferior, and as a Python programmer, I was a member of an elite cabal of superhuman ultranerds, smarter than those childish Rails/JavaScript/PHP/whatever developers that couldn’t write a bubble sort or comprehend even basic algorithmic complexity, but more in touch with reality than the grey-bearded wizards of Lisp/Haskell/whatever that sat in their caves/towers/whatever solving contrived, nonexistent problems for people that don’t exist, or those insane Erlang programmers who are content writing Sumerian cuneiform all day long. Of course, I knew everything, so I’d just do concurrency in Python, because obviously that’s a good idea, right?
Well, not exactly. As a Django developer, there wasn’t a straightforward and obvious way to just do things in the background on a page request. People suggested I try Celery, but I didn’t like that option at all. A distributed task queue? What? I just want to do something in the background without making the user wait; I don’t need some super comprehensive ultimate computing machine. The whole notion that I would need to set up and configure one of these supported brokers made my spidey sense tingle. I don’t want to set up another daemon, I just want to send some email in the background!
So I looked at gevent. Wait, greenlets? Libevent vs libev? Monkey patching? Wait, monkey patching? Seriously? Isn’t that the purview of those amateurs over in Rubyland? I’m a Python programmer; I don’t monkey patch. I’m above that. But look at those numbers! And all those Python programmers that are much better than me say it’s really good, so I’ll give it a shake.
But it didn’t appeal to me. Even this very first example seemed somehow cumbersome:
import gevent
from gevent import socket
urls = ['www.google.com', 'www.example.com', 'www.python.org']
jobs = [gevent.spawn(socket.gethostbyname, url) for url in urls]
gevent.joinall(jobs, timeout=2)
[job.value for job in jobs]
Output:
['74.125.79.106', '208.77.188.166', '82.94.164.162']
Something in my gut said that using a plain-old, library-level function to spawn a new background process was unnatural. I could understand the code just fine on the very first read, but to me, the control flow wasn’t explicit enough. It was clear…ish. But it didn’t strike me as being a significantly better system than an event loop, and the idea that I’d be littering my codebase with all these little named functions because of the weak lambda function support in Python just made me annoyed. And what’s in that socket library? What about other libraries? You mean I have to patch those ones, too? Monkey patching all of your I/O to implement a control flow feature at the library level is, in my opinion, a canary in the coal mine signaling that something is deeply troubled in the mythical land of Python.
So, to reiterate: concurrency is possible in Python, but it is an awkward affair that often feels bolted on, fractures the experienced Python programmers into different factions (Twisted is better! No, Tornado is better! Wait, you’re both dumb, use Brubeck!), and confuses the inexperienced Python programmers. To say that concurrency is possible in Python, but it is not idiomatic in Python, would be an understatement.
At this point, you may be saying “just use node.js; it doesn’t have these problems”. There’s a lot to like about node.js. JavaScript’s anonymous function support is, quite frankly, much better than Python’s. I also deeply appreciate that the event loop in node.js is explicit. That is a good design choice. I also appreciate that all the libraries are written with this concurrency model in mind; there’s no question of whether an existing library is compatible or not, because the answer is always yes. Sometimes there is value in starting over.
But the notion of putting everything in a single thread seems wrong to me. Callback-passing as a concurrency strategy makes your code an ugly tangled soup. The cluster module? What? Come on. I’m happy that the notion of concurrency is put front and center, but now parallelism feels bolted on (if you’re raising your eyebrow at the distinction, Rob Pike’s talk Concurrency is not Parallelism is recommended). It’s not like the whole multi-core processor thing is cutting edge; multi-core processors have been around for years. That I have to go out of my way to utilize them seems like an error. And how do I know that work will be distributed to all cores in a reasonably even fashion, even if I do figure out how to get out of the single-threaded mindset? These types of problems are interesting problems, but in the context of wanting to write an application, they are a distraction.
Now. Around the time that I was attending Hacker School, Pat Crosby (former CTO of OkCupid, all-around smart person) wrote an application called Langalot, which is a multi-language dictionary as a web app that translates words into other languages on the server while you type them. It is a very, very fast application. I saw in the FAQ there was a question that simply said “Why is it fast”, and the answer is “It’s written in Go”, and it was the first time I had ever actually seen a usable application that was written in Go. I asked Pat how he felt about it, and he spoke very favorably of the language, and said that he was so pleased with it that he decided to build his new startup, StatHat, entirely in Go. I tend to at least try the things that are being used by people in the field that are more experienced than I am, especially if they are people that I know. Pat’s endorsement was, to me, a strong enough push to get me to try A Tour of Go, and I’ve never looked back.
I’ll end with these two examples. The first is a node.js application that uses the cluster module, and the second is the equivalent in Go, as it appears in this gist posted by Andrew Gerrand:
var cluster = require('cluster');
var http = require('http');
var numCPUs = require('os').cpus().length;

if (cluster.isMaster) {
  // Fork one worker per CPU.
  for (var i = 0; i < numCPUs; i++) {
    cluster.fork();
  }

  // The 'exit' event passes a Worker object; its pid lives on worker.process.
  cluster.on('exit', function(worker, code, signal) {
    console.log('worker ' + worker.process.pid + ' died');
  });
} else {
  // Workers can share any TCP connection.
  // In this case it is an HTTP server.
  http.createServer(function(req, res) {
    res.writeHead(200);
    res.end("hello world\n");
  }).listen(8000);
}
and the same thing in Go…
package main

import (
	"net/http"
)

func main() {
	handler := func(w http.ResponseWriter, r *http.Request) {
		w.Write([]byte("hello world\n"))
	}
	http.ListenAndServe(":8000", http.HandlerFunc(handler))
}