Benchmarking

Some quick tests of Ruby's speed and concurrency.

Inspired by Jesse Noller's efforts in Python, I thought I'd benchmark Ruby, so as to compare implementations of Ruby (original flavour vs. JRuby) and models of concurrency. What's slow? What's fast? What's the best way to exploit the multi-core machines that are everywhere these days?

Several caveats apply:

  • This is not a scientific or rigorous study in any way. The various computing tasks are fairly arbitrary - what if I used IO a lot? what if I was doing mostly number crunching? - to illustrate extreme, landmark cases.
  • Numbers of iterations and so forth were largely selected to allow tests to consume a non-trivial amount of computation, while completing in a reasonable time. Due to running in a shared computing environment, and the effects of garbage collection and the JIT, there's bound to be a certain amount of noise in the results. Collecting variances would be useful, although more work.
  • Trying to get more speed via concurrency out of a scripting language like Ruby is arguably missing the point. If you want speed, write it in something else. Conversely, if easy mechanisms for concurrency are available, why not use them if your program model lends itself to it?
  • This was a quick and dirty job, done in an afternoon. Doubtless it could be improved.

The benchmarks

Five computing tasks of different natures were run on three different Ruby environments, using three different models of concurrent execution and four different levels of concurrency. Each combination was timed and averaged over 100 replicates. One by one:

Tasks: To examine how different computing tasks performed in different implementations of Ruby and under different concurrency models, five different tasks were used:

  • empty: essentially a function call that does nothing and returns immediately.
  • fibo100: a number-heavy operation, calculating the 100th Fibonacci number.
  • sumprimes1000: likewise, find all primes under 1000 and calculate their sum.
  • readwrite: randomly seek 300 positions in a file, and write them out to /dev/null.
  • readurl: fetch a page from a website. Your choice of website may be "interesting". The latency in site response will affect results greatly. And your program might get recognised as a 'bot ...
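
For illustration, the CPU-bound tasks might look something like the sketch below. This is not the original benchmark code: the method names (empty_task, fibo, sum_primes) and the choice of an iterative Fibonacci and trial-division primality test are assumptions. The two IO tasks are left out, since they depend on a local file and a live website.

```ruby
# Hypothetical task definitions; names and algorithms are assumptions,
# not the code that produced the results on this page.

# "empty": a call that does nothing and returns immediately.
def empty_task
  nil
end

# "fibo100": n-th Fibonacci number, computed iteratively
# (fibo(0) = 0, fibo(1) = 1).
def fibo(n)
  a, b = 0, 1
  n.times { a, b = b, a + b }
  a
end

# "sumprimes1000": sum of all primes strictly below limit,
# using simple trial division up to the square root.
def sum_primes(limit)
  primes = (2...limit).select do |n|
    (2..Math.sqrt(n).floor).none? { |d| (n % d).zero? }
  end
  primes.inject(0) { |sum, p| sum + p }
end

puts fibo(100)
puts sum_primes(1000)
```

A naive doubly recursive Fibonacci would never finish for n = 100, which is why the sketch iterates.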

Ruby implementations: I used:

  • Ruby (original flavour) 1.8.7. 1.9 was not available.
  • JRuby 1.8
  • JRuby 1.8, run with the --fast flag. Seriously. It has a --fast flag.

Execution models: So as to test Ruby's ability at concurrency, each test was run with a certain number of iterations (1, 2, 4 and 8), implemented in several different concurrency models.

  • sequential: good old single path programming - if something has to be done 8 times, it just loops 8 times in a row over it.
  • threaded: launch a thread for each iteration, wait for all to complete.
  • processes: launch a separate process for each iteration, wait for all to complete.
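
The three models above can be sketched roughly as follows. The helper names (run_sequential, run_threaded, run_processes) are hypothetical and this is only one plausible way to write them, not the code used for the tests. The process version relies on Kernel#fork, so it will not run on JRuby or on Windows.

```ruby
# Hypothetical helpers, one per execution model. Each runs the given
# task n times and returns once every run has completed.

# sequential: loop over the task n times in a row.
def run_sequential(n, &task)
  Array.new(n) { task.call }
end

# threaded: start one thread per iteration, then wait for all of them.
# Thread#value joins the thread and returns the block's result.
def run_threaded(n, &task)
  threads = Array.new(n) { Thread.new(&task) }
  threads.map(&:value)
end

# processes: fork one child per iteration, then wait for all of them.
# Returns true/false per child depending on its exit status.
def run_processes(n, &task)
  pids = Array.new(n) do
    fork do
      task.call
      exit!(0) # leave the child without running the parent's at_exit hooks
    end
  end
  pids.map { |pid| Process.wait2(pid).last.success? }
end

p run_sequential(2) { 1 + 1 }
p run_threaded(2) { 1 + 1 }
p run_processes(2) { 1 + 1 }
```

Note that the forked children cannot hand a Ruby value back directly; only their exit status is collected here.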

Some other models were considered and dropped, see below.
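
As for the timing itself, averaging each configuration over a number of replicates could be done along these lines, using Ruby's standard Benchmark module. The names mean_time and sequential_row and the stand-in task are assumptions for the sake of the example; only the 100 replicates and the 1, 2, 4 and 8 iteration counts come from the description above.

```ruby
require 'benchmark'

REPLICATES = 100                 # replicate count stated in the text
CONCURRENCY_LEVELS = [1, 2, 4, 8]

# Mean wall-clock time, in seconds, over `replicates` runs of the block.
def mean_time(replicates = REPLICATES)
  total = 0.0
  replicates.times { total += Benchmark.realtime { yield } }
  total / replicates
end

# One table row: the mean time at each concurrency level, here for the
# sequential model only (the task is simply called n times in a row).
def sequential_row(replicates = REPLICATES, &task)
  CONCURRENCY_LEVELS.map do |n|
    mean_time(replicates) { n.times { task.call } }
  end
end

# Stand-in task and a reduced replicate count, to keep the demo quick.
row = sequential_row(10) { 1000.times { |i| i * i } }
puts row.map { |t| format('%8.3f', t) }.join
```

Benchmark.realtime measures elapsed wall-clock time, so noise from other users of a shared machine ends up in the averages.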

Results

Tests were carried out on a 3.2 GHz Xeon processor with 1 GB of RAM, running Fedora Core 6:

Ruby 1.8.7
========================================================
Task        Conc           1       2       4         8
--------------------------------------------------------
empty       sequential   0.003   0.003    0.005    0.007
            threaded     0.011   0.160    0.026    0.044
            processes    2.170   3.542    6.264   12.510
fibonacci   sequential   0.045   0.069    0.111    0.198
            threaded     0.066   0.084    0.138    0.255
            processes    3.098   5.259   10.162   22.592
sumprimes   sequential   0.336   0.469    0.993    1.473
            threaded     0.340   0.499    0.821    1.541
            processes    2.378   8.247   18.282   25.351
readwrite   sequential   5.378   8.006   13.459   24.360
            threaded     5.635   8.248   13.875   25.011
            processes    6.156   9.804   15.973   29.408
readurl     sequential   0.809   1.606    2.092    3.770
            threaded     0.607   1.120    1.203    2.311
            processes    3.635   5.459   12.176   18.220
========================================================

JRuby 1.8
=========================================================
Task        Conc           1        2        4        8
---------------------------------------------------------
empty       sequential   0.007    0.005    0.005    0.004
            threaded     0.071    0.064    0.084    0.151
fibonacci   sequential   0.019    0.023    0.038    0.064
            threaded     0.088    0.076    0.114    0.203
sumprimes   sequential   0.160    0.236    0.383    0.672
            threaded     0.189    0.267    0.426    0.759
readwrite   sequential   9.870   15.055   25.024   45.123
            threaded     8.830   13.715   22.636   40.873
readurl     sequential   0.963    1.270    2.107    3.220
            threaded     0.706    0.728    0.898    1.578
=========================================================

JRuby 1.8 --fast
========================================================
Task        Conc           1       2        4        8
--------------------------------------------------------
empty       sequential   0.009   0.004    0.003    0.004
            threaded     0.069   0.063    0.085    0.150
fibonacci   sequential   0.020   0.022    0.040    0.062
            threaded     0.072   0.074    0.113    0.202
sumprimes   sequential   0.127   0.209    0.305    0.552
            threaded     0.174   0.243    0.379    0.683
readwrite   sequential   9.979  15.037   25.069   45.231
            threaded     8.868  13.692   22.620   40.749
readurl     sequential   0.892   1.226    2.118    3.142
            threaded     0.677   0.718    0.781    1.260
========================================================

Discussion

As regards concurrency:

  • Results for the "empty" task probably largely reflect the cost of overhead - setting up and administering threads and processes.
  • Ruby seems to be in the same situation as Python: threads don't win you any speed gain for CPU-bound computations. They share the same process time, and the overhead of setting threads up and switching between them leads to a small but appreciable speed loss.
  • Process setup is expensive.
  • For IO-bound operations, the models of concurrency perform on a more equal footing. However, this may just be due to the longer test times (i.e. the overhead pales next to the task time). More investigation is needed.
  • For high latency IO (i.e. fetching webpages), threads start to win out. This makes sense, as the downloading of a webpage can be a particularly slow task and concurrent execution would be particularly efficient.
  • Originally, I had intended to test fibers and actors. However, actors are based on fibers - so should perform the same - and fibers are only available in Ruby 1.9. Given that fibers will probably perform like green threads (i.e. threads implemented in software), they will likely perform slightly worse than the sequential model in all cases.
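
The point about high-latency IO can be illustrated without touching a real website. In the sketch below, sleep stands in for waiting on a slow server; the 0.1 second latency, the request count and the name fake_fetch are all made up for the example. While one thread sleeps the others get to run, so the waits overlap instead of adding up.

```ruby
require 'benchmark'

LATENCY  = 0.1 # assumed per-request delay in seconds, purely illustrative
REQUESTS = 4

# Stand-in for fetching a page: the thread just waits, as it would on a
# slow network response, and other threads can run in the meantime.
def fake_fetch
  sleep LATENCY
  :ok
end

sequential = Benchmark.realtime { REQUESTS.times { fake_fetch } }

threaded = Benchmark.realtime do
  Array.new(REQUESTS) { Thread.new { fake_fetch } }.each(&:join)
end

puts format('sequential %.3f s, threaded %.3f s', sequential, threaded)
```

The sequential run has to wait out every delay in turn, while the threaded run should take roughly one delay plus thread overhead.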

As regards JRuby:

  • JRuby (and Java) don't implement fork due to the limitations of the JVM. So processes are out.
  • JRuby is sometimes faster and sometimes slower. IO was slow, perhaps due to security checks carried out by the JVM. Number-crunching was surprisingly fast. Overall, JRuby is ballpark-ish about the same as Ruby.
  • Threads bring a big win to JRuby when doing IO.
  • There are a few anomalous results where a larger number of iterations is faster than a smaller number, and so on. This may be the effect of Java optimizing a frequently called function.