Benchmarking
Some quick tests of Ruby's speed and concurrency.
Inspired by Jesse Noller's efforts in Python, I thought I'd benchmark Ruby to compare implementations (original flavour vs. JRuby) and models of concurrency. What's slow, what's fast, and what's the best way to exploit the multi-core machines that are everywhere these days?
Several caveats apply:
- This is not a scientific or rigorous study in any way. The computing tasks are fairly arbitrary and were chosen to illustrate extreme, landmark cases: what if I used IO a lot? What if I was doing mostly number crunching?
- Numbers of iterations and so forth were largely selected to allow tests to consume a non-trivial amount of computation, while completing in a reasonable time. Due to running in a shared computing environment, and the effects of garbage collection and the JIT, there's bound to be a certain amount of noise in the results. Collecting variances would be useful, although more work.
- Trying to get more speed via concurrency out of a scripting language like Ruby is arguably missing the point. If you want speed, write it in something else. Conversely, if easy mechanisms for concurrency are available, why not use them if your program model lends itself to it?
- This was a quick and dirty job, done in an afternoon. Doubtless it could be improved.
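The variances mentioned in the caveats could be collected with something like the following. This is only a sketch, not the code used for these benchmarks: the helper names time_replicates and mean_and_stddev are made up for illustration, and the timed block is an arbitrary stand-in task.

```ruby
require 'benchmark'

# Hypothetical helper: time the given block `replicates` times and
# return the individual wall-clock timings in seconds.
def time_replicates(replicates)
  Array.new(replicates) { Benchmark.realtime { yield } }
end

# Hypothetical helper: mean and sample standard deviation of the timings.
def mean_and_stddev(samples)
  n = samples.size
  mean = samples.inject(0.0) { |acc, x| acc + x } / n
  return [mean, 0.0] if n < 2
  variance = samples.inject(0.0) { |acc, x| acc + (x - mean)**2 } / (n - 1)
  [mean, Math.sqrt(variance)]
end

# Arbitrary stand-in task, repeated 100 times as in the benchmarks.
timings = time_replicates(100) { (1..1000).inject(0) { |sum, i| sum + i } }
mean, stddev = mean_and_stddev(timings)
printf("mean %.6f s, stddev %.6f s\n", mean, stddev)
```

Reporting the standard deviation next to each mean would show which differences in the tables are real and which are just noise.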
The benchmarks
Five computing tasks of differing nature were run on three different Ruby environments, using three different models of concurrent execution and four different extents of concurrency. Each combination was timed and averaged over 100 replicates. One by one:
Tasks: To examine how different computing tasks performed in different implementations of Ruby and under different concurrency models, 5 different tasks were called:
- empty: essentially a function call that does nothing and returns immediately.
- fibo100: a number-heavy operation, calculating the 100th Fibonacci number.
- sumprimes1000: likewise number-heavy, find all primes under 1000 and calculate their sum.
- readwrite: randomly seek 300 positions in a file, and write them out to /dev/null.
- readurl: fetch a page from a website. Your choice of website may be "interesting". The latency in site response will affect results greatly. And your program might get recognised as a 'bot ...
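The CPU-bound tasks above might look roughly like this. This is a reconstruction from the descriptions, not the original benchmark code: the method names, the iterative Fibonacci and the sieve are all my assumptions, and the two IO tasks are left out because they depend on a local file and a website.

```ruby
# empty: does nothing, so timing it measures pure call overhead.
def empty
  nil
end

# fibo: the n-th Fibonacci number, computed iteratively
# (with fibo(0) = 0 and fibo(1) = 1).
def fibo(n)
  a, b = 0, 1
  n.times { a, b = b, a + b }
  a
end

# sum_primes: sum of all primes strictly below `limit`,
# found with a sieve of Eratosthenes.
def sum_primes(limit)
  return 0 if limit < 3
  sieve = Array.new(limit, true)
  sieve[0] = sieve[1] = false
  (2..Math.sqrt(limit).floor).each do |i|
    next unless sieve[i]
    # Cross out every multiple of i, starting at i squared.
    (i * i).step(limit - 1, i) { |j| sieve[j] = false }
  end
  (0...limit).select { |i| sieve[i] }.inject(0) { |sum, p| sum + p }
end

puts fibo(100)
puts sum_primes(1000)
```

A naive recursive Fibonacci would never finish for n = 100, which is why the sketch iterates. The original may well have done something different.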
Ruby implementations: I used:
- Ruby (original flavour) 1.8.7. 1.9 was not available.
- JRuby 1.8
- JRuby 1.8, run with the --fast flag. Seriously. It has a --fast flag.
Execution models: So as to test Ruby's ability at concurrency, each test was run with a certain number of iterations (1, 2, 4 and 8), implemented in several different concurrency models.
- sequential: good old single path programming - if something has to be done 8 times, it just loops 8 times in a row over it.
- threaded: launch a thread for each iteration, wait for all to complete.
- processes: launch a separate process for each iteration, wait for all to complete.
Some other models were considered and dropped, see below.
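The three execution models can be sketched as below. Again, this is an illustration under assumptions rather than the original harness: the run_* method names and the toy task are invented, and the processes variant relies on fork, so it only works where fork is available (not on JRuby).

```ruby
require 'benchmark'

# sequential: call the task n times, one after another.
def run_sequential(n, &task)
  n.times { task.call }
end

# threaded: start one thread per iteration, then wait for all of them.
def run_threaded(n, &task)
  Array.new(n) { Thread.new(&task) }.each { |t| t.join }
end

# processes: fork one child per iteration, then wait for all of them.
# Each child runs the task and exits.
def run_processes(n, &task)
  pids = Array.new(n) { fork(&task) }
  pids.each { |pid| Process.wait(pid) }
end

# Toy task standing in for the real workloads.
task = proc { 10_000.times { |i| i * i } }

[1, 2, 4, 8].each do |n|
  %w[sequential threaded processes].each do |model|
    elapsed = Benchmark.realtime { send("run_#{model}", n, &task) }
    printf("%-10s n=%d %.4f s\n", model, n, elapsed)
  end
end
```

Benchmark.realtime measures wall-clock time, which is what matters when comparing concurrency models. CPU time would hide any gain from overlapping work.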
Results
Tests were carried out on a 3.2 GHz Xeon processor with 1 GB of RAM, running Fedora Core 6:
Ruby 1.8.7
========================================================
Task       Conc              1       2       4       8
--------------------------------------------------------
empty      sequential    0.003   0.003   0.005   0.007
           threaded      0.011   0.160   0.026   0.044
           processes     2.170   3.542   6.264  12.510
fibonacci  sequential    0.045   0.069   0.111   0.198
           threaded      0.066   0.084   0.138   0.255
           processes     3.098   5.259  10.162  22.592
sumprimes  sequential    0.336   0.469   0.993   1.473
           threaded      0.340   0.499   0.821   1.541
           processes     2.378   8.247  18.282  25.351
readwrite  sequential    5.378   8.006  13.459  24.360
           threaded      5.635   8.248  13.875  25.011
           processes     6.156   9.804  15.973  29.408
readurl    sequential    0.809   1.606   2.092   3.770
           threaded      0.607   1.120   1.203   2.311
           processes     3.635   5.459  12.176  18.220
========================================================
JRuby 1.8
=========================================================
Task       Conc              1       2       4       8
---------------------------------------------------------
empty      sequential    0.007   0.005   0.005   0.004
           threaded      0.071   0.064   0.084   0.151
fibonacci  sequential    0.019   0.023   0.038   0.064
           threaded      0.088   0.076   0.114   0.203
sumprimes  sequential    0.160   0.236   0.383   0.672
           threaded      0.189   0.267   0.426   0.759
readwrite  sequential    9.870  15.055  25.024  45.123
           threaded      8.830  13.715  22.636  40.873
readurl    sequential    0.963   1.270   2.107   3.220
           threaded      0.706   0.728   0.898   1.578
=========================================================
JRuby 1.8 --fast
========================================================
Task       Conc              1       2       4       8
--------------------------------------------------------
empty      sequential    0.009   0.004   0.003   0.004
           threaded      0.069   0.063   0.085   0.150
fibonacci  sequential    0.020   0.022   0.040   0.062
           threaded      0.072   0.074   0.113   0.202
sumprimes  sequential    0.127   0.209   0.305   0.552
           threaded      0.174   0.243   0.379   0.683
readwrite  sequential    9.979  15.037  25.069  45.231
           threaded      8.868  13.692  22.620  40.749
readurl    sequential    0.892   1.226   2.118   3.142
           threaded      0.677   0.718   0.781   1.260
========================================================
Discussion
As regards concurrency:
- Results for the "empty" task probably largely reflect the cost of overhead - setting up and administering threads and processes.
- Ruby seems to be in the same situation as Python: threads don't win you any speed gain for CPU-bound computations. They share the same process time, and the overhead of setting threads up and switching between them leads to a small but appreciable speed loss.
- Process setup is expensive.
- For IO-bound operations, the models of concurrency perform on a more equal footing. However, this may just be due to the longer test times (i.e. the overhead pales next to the task time). More investigation is needed.
- For high latency IO (i.e. fetching webpages), threads start to win out. This makes sense, as the downloading of a webpage can be a particularly slow task and concurrent execution would be particularly efficient.
- Originally, I had intended to test fibers and actors. However, actors are based on fibers - so should perform the same - and fibers are only available in Ruby 1.9. Given that fibers will probably perform like green threads (i.e. threads implemented in software), they will likely perform slightly worse than the sequential model in all cases.
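Why threads win for high-latency IO can be shown with a toy example. Here sleep stands in for waiting on a slow website, which is an assumption made so the sketch runs without a network. The 0.05 second latency and the method names are likewise made up.

```ruby
require 'benchmark'

# Hypothetical latency of one "page fetch", in seconds.
LATENCY = 0.05

# Stand-in for fetching a page: the thread just waits, as it would
# while blocked on a slow network response.
def fake_fetch
  sleep(LATENCY)
end

# Fetch n pages one after another: the waits add up.
def fetch_sequential(n)
  n.times { fake_fetch }
end

# Fetch n pages in n threads: the waits overlap.
def fetch_threaded(n)
  Array.new(n) { Thread.new { fake_fetch } }.each { |t| t.join }
end

sequential = Benchmark.realtime { fetch_sequential(4) }
threaded   = Benchmark.realtime { fetch_threaded(4) }
printf("sequential %.3f s, threaded %.3f s\n", sequential, threaded)
```

While one thread is blocked waiting, the interpreter can run another, so the total time approaches that of the slowest single fetch rather than the sum of all of them. A CPU-bound task gives the interpreter no such idle time to reuse.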
As regards JRuby:
- JRuby (and Java) don't implement fork due to the limitations of the JVM. So processes are out.
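- JRuby is sometimes faster and sometimes slower than Ruby 1.8.7. IO was slow (sequential readwrite took nearly twice as long), perhaps due to security checks carried out by the JVM. Number-crunching was surprisingly fast (sequential fibonacci and sumprimes took roughly half the time). Overall, JRuby is ballpark-ish about the same as Ruby.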
- Threads bring a big win to JRuby when doing IO.
- There are a few anomalous results where a small number of iterations is faster than a larger number and so on. This may be the effect of Java optimizing a frequently called function.
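If the JIT is behind those anomalies, one common way to reduce the effect is to run the task a few times untimed before measuring. The sketch below is an assumption about how that could be done, not something these benchmarks did. The helper name and the warm-up count are arbitrary.

```ruby
require 'benchmark'

# Hypothetical helper: run the block `warmup` times without timing it,
# then return the mean wall-clock time of `replicates` timed runs.
def timed_after_warmup(warmup, replicates)
  warmup.times { yield }
  total = 0.0
  replicates.times { total += Benchmark.realtime { yield } }
  total / replicates
end

# Arbitrary stand-in task and warm-up count.
mean = timed_after_warmup(20, 100) { (1..1000).inject(0) { |sum, i| sum + i } }
printf("mean after warm-up: %.6f s\n", mean)
```

Whether 20 warm-up runs is enough for the JVM to compile a given method is not something I have checked. It would need tuning per task.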

