In Part 1, we looked at the code of two implementations of an authentication web service: one written in Java, and one written in Go. It’s time to beat up on them a little bit.
For generating load, I decided to use ApacheBench in a number of different scenarios:
- 1,000,000 requests, concurrency = 1
- 1,000,000 requests, concurrency = 2
- 1,000,000 requests, concurrency = 5
- 1,000,000 requests, concurrency = 20
- 1,000,000 requests, concurrency = 50
Each request hits the /authenticate endpoint with the same credentials. Before each test suite is run, the service under test is sent 10,000 consecutive queries to allow it to warm up.
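For reference, a single scenario can be reproduced with an invocation along these lines (the port, the Authorization header value, and the keep-alive flag are illustrative assumptions, not taken from the article):

```sh
# 1,000,000 requests at concurrency 20 against the /authenticate endpoint.
# The header value, host, and port are placeholders.
ab -k -n 1000000 -c 20 \
   -H "Authorization: Basic YXBpa2V5X3ZhbHVl" \
   http://service-host:8080/authenticate
```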
The service under test and the load generator run on separate but identical machines:
- 48GB RAM
- 12-core Intel Xeon 3GHz CPU with HT
- Ubuntu 10.04.3 LTS
The machines are connected via a 10Gb link (which was not fully saturated during any of the tests).
Services Under Test
We’ll test five different service configurations:
- Java 1.7 (8GB max heap) service configured with a Dropwizard default max HTTP threadpool of 254
- Go 1.0.2 service (GOMAXPROCS=1)
- Go 1.0.2 service (GOMAXPROCS=2)
- Go 1.0.2 service (GOMAXPROCS=12)
- Go 1.0.2 service (GOMAXPROCS=24)
Go by default will only utilize one CPU unless you specify a different value for GOMAXPROCS. Most of the time, this is actually not a huge deal, as goroutines will yield control to the scheduler when performing IO operations, using a select statement, sending on a channel, or explicitly yielding using runtime.Gosched(). Since the JVM runtime will automatically distribute thread workload over the available CPUs, it’s reasonable to give the Go service the same capability.
To make this comparison fair, we must set the Java service’s threadpool to at least the level of concurrency so that requests do not start queueing up. The Go http dispatcher creates a goroutine for each incoming request, and due to the yielding nature of goroutines, this allows us to achieve concurrency with any number of CPUs being utilized.
For each test, we measured average latency and throughput. Minimum latency was not very interesting, as it was < 1 ms across all test scenarios.
When all requests are serial (C=1) the average latency is very low on both services (about 1ms). It gets interesting, of course, when you start increasing the number of concurrent requests. It’s here that we can see that the default Go configuration to use one processor starts to introduce a lot of latency as C approaches 50.
In this graph, you can also see the marginal benefit of HyperThreading in the average latencies between Go (GOMAXPROCS=12) and Go (GOMAXPROCS=24).
Wrapping It Up
While performance is not everything, it’s usually something. The Java service has the upper hand in terms of latency and throughput for highly concurrent workloads for this particular service. Go is still relatively young, too, and I think we can expect to see incremental improvement out of the compiler as well as the runtime/GC; it offers a neat way of modeling concurrency which in my opinion has a lot of promise.
I was somewhat expecting this outcome, but at the same time I was pleasantly surprised at the ease of writing Go web services.
Your previous article encouraged me to give Go a shot. Thanks for the article!
You’ve got an axis labeling problem there.
What happens to these numbers when you change the Go code to use more than 1 database connection? (or, if the Java code’s auth.yml is changed to use only 1 connection)? Also, I didn’t see the SQL schema used in the git repo.
Hello! The golang sql package manages a connection pool for you, according to the docs.
Label your y-axes
Try using the tip version; you may see a 50% or greater speedup in some spots.
Both articles are fantastic for Java programmers trying to evaluate/switch to Go.
The concurrency is very low. It would be interesting to see the results with, for example, 10k concurrent connections (which is still low).
I forked your code at https://github.com/patrick-higgins/go-and-java and added some tweaks to get identically sized HTTP responses. I had to use httperf to do so; see the comments in the bench.sh script for the details. Also, I prepared the SQL statement, as profiling showed a lot of resources went into preparing the statement each time.
Also, the JSON was not identical: the timestamp fields in Go included nanoseconds, making them much longer than Java’s. I have also changed this.
Finally, I added some SQL scripts so that the database could be created easily for anyone else who wanted to try this.
Thanks for the Dropwizard example. I hadn’t seen it before, and I need something like it in the coming weeks. It looks like exactly what I have been looking for.
With these changes, I see Go outperform Java on my system, though I didn’t do extensive testing.
Thanks very much Patrick. I did not profile either application, but it looks like we could get some speed improvements very readily.
Also, I’m glad that you’ve learned about Dropwizard. I worked at Yammer previously and it was an invaluable part of our infrastructure there, as it also is at Boundary.
I don’t know if you’re aware but there’s a discussion going on about this on the mailing list:
The prevailing opinion seems to be that the problem is with the db driver library used in your Go program.
I downloaded Patrick’s fork and got 4300 r/s with GOMAXPROCS=1 with the Go version, out of the box.
Recompiling and running with GOMAXPROCS=100, I got 4400 r/s.
Running mvn deploy and then auth-1.0.jar on JDK 1.7 on my box similarly peaked out at 3100 r/s.
It’s worth noting, though, that I was using Patrick’s httperf bench.sh modification, and it appears that httperf was CPU-bound in both cases, with the kernel and postgres taking about half a core between them.
Using wrk, by contrast, spun main (the Go program) up to 2.5 cores and 6000 r/s. Java under wrk lit all cores for a time, then hit `Exception in thread "async-log-appender-0" java.lang.OutOfMemoryError: Java heap space` `Caused by: ! org.postgresql.util.PSQLException: FATAL: sorry, too many clients already`
A bit more investigation and tuning later, using 10 clients (`wrk -c 10 -r 100000 -t 4 -H 'Authorization: basic apikey_value' http://localhost:8080/authenticate`), Java was spinning out 10k r/s while Go was limited to 6k r/s. It’s worth noting some more details, however: Go only ever spun up two postgres forks, whereas Java spun up 10, so there’s scope for optimization there. Go used 8MB of RAM, whereas Java was sitting on 130MB after a few runs. The Java version, cranking out 10k r/s, was maxing the kernel out on one core, so that’s probably approaching the practical limit for single-machine tests.
I suspect putting a load balancer in front of a couple of instances of the Go program would allow you to totally smash the Java performance, given that Go lit 2 cores at 6k r/s and Java lit 8 at 10k r/s. The memory usage tradeoff is significant: the JVM sitting at 130MB and Go at 8MB. Clearly everyone needs to draw their own conclusions on this, for their own purposes. The JVM solution is carrying a stats server, a ton of other tooling, and so on. The Go system has some (pprof was included), but it’s limited by comparison. Arguably gdb and so on can actually be of real use in the Go case, but that’s also an exercise for the reader.
Interesting any which way you look at it. There are a lot of other interesting side effects (that need working out) in both programs as evidenced by this simple testing. Postgres also needs some tuning if you really want to slam this with anything remotely resembling a real world scale test.
My tests were done on OS X 10.8 (12B19) on a 2.3GHz i7 with 8GB of DDR3 at 1333MHz, an Intel 510 SSD, and a totally untuned postgres. The machine is a MacBook from whatever year that is.
I don’t get it; this seems to show that Java blows away Go in terms of lower latency and higher throughput.
I’m sure an optimized assembly program would then also blow away Java. This performance evaluation is one of many factors, and based on other comments there are also some obvious flaws.
Maybe. But that’s sort of the whole purpose of this entire post.
I finally got back to this while stuck at home sick.
I did fix a bug in Go’s database/sql, but even before that the numbers in this article were quite far off, especially given the hardware.
After a few more tweaks, this was brought up to 16k rps for Go, compared to 10k rps for Java on the same machine (the Java speedup being due to post-article tweaks by readers), and just on a MacBook Pro (essentially a 10x speedup for Go on 1/3 to 1/6 the equivalent of the original hardware).
Pretty soon, the responsible thing will be to edit the article and put a disclaimer at the top stating: “These results and conclusions have been purportedly invalidated by third party readers, and require reconsideration”, while perhaps planning to write a followup article. Without at least the disclaimer, readers even a few years from now may make misinformed decisions due to lack of counter-example (and not every reader reads the community comments).
I could discuss the problems of using benchmarks for language evaluation (subsequent developments have flip-flopped the relative results), as well as benchmarking languages while in the process of learning them (it’s effectively impossible to do them equal justice without equal experience in both, up to a point), but I digress… This does bring up an important point, however: benchmarks could be an effective measure of the programmer capability curve relative to expertise in a given language, as long as an expert-provided equivalent benchmark were provided for comparison. I’d be interested to see the aggregated efficiency-vs-experience gap of various languages, as well as efficiency-vs-general-experience results (experience programming in general, rather than just experience with the language being used).
Java has the fastest VM. Take that. We have a Java cache server in production which is 2x faster than the facebook memcached fork (~560K requests per second on a 12-core Xeon), with better scalability and comparable latency percentiles (99% < 0.6ms). It depends on who wrote the code.