This article is moving to
http://raid6.com.au/~onlyjob/posts/arena
Please update your links - it will disappear from here some time later.
Perl, Python, Ruby, PHP, C, C++, Lua, tcl, javascript and Java benchmark/comparison.
Understanding the differences between programming languages is crucial. If the wrong language is chosen for a project, it will take a lot of time and effort to change course and re-implement the project, or part of it, in a different language. Typically this means years of effort, misery and dissatisfaction for everyone: yourself, your colleagues, your clients and your systems administrator(s). Needless to say, it can be dangerous for the business.
Knowing how languages differ from each other is the key to making the right decisions. Environments may have different demands - for example, which language is the best choice for a VPS with limited RAM? Sometimes questions like this are not easy to answer, considering the many false beliefs and rumors so common among developers.
This testing is designed to demonstrate the differences between popular programming languages.
I hope you find the results of this little research interesting.
Method
The test code grows a text string by appending another string in a loop until it reaches 4 MiB. Each iteration substitutes some text. Every time the string becomes 256 KiB larger, the program prints the number of seconds passed since the beginning of the test. The application's output is piped to a script that captures memory usage (using memstat) for every line printed.
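To make the procedure concrete, here is a minimal Python 3 sketch of the loop described above, scaled down so that it finishes quickly. It is an illustration only and not one of the benchmarked programs (those are listed in the source code section); the function name `run_benchmark` and its parameters are made up for this sketch, and the extra 1000 iterations the real tests add at the end are left out.

```python
import re
import time


def run_benchmark(target_kib=4096, report_kib=256, chunk="abcdefghefghefgh"):
    """Grow a string chunk by chunk, substituting text on every pass.

    Returns a list of (elapsed_seconds, size_kib) pairs, one per report step.
    """
    pattern = re.compile("efgh")
    iterations = target_kib * 1024 // len(chunk)
    report_bytes = report_kib * 1024
    start = time.time()
    grown = ""
    samples = []
    for i in range(1, iterations + 1):
        grown += chunk
        # The substitution rescans the whole string on every pass.
        grown = pattern.sub("____", grown)
        size = len(chunk) * i
        if size % report_bytes == 0:
            samples.append((int(time.time() - start), size // 1024))
    return samples


# Scaled-down run: 32 KiB target, one report every 8 KiB.
for elapsed, size_kib in run_benchmark(target_kib=32, report_kib=8):
    print(f"{elapsed}sec\t\t{size_kib}kb")
```

Calling `run_benchmark()` with its defaults would correspond to the full 4 MiB run with a report every 256 KiB.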
String manipulation is core functionality in all languages, so this allows the languages to be compared fairly. Processing large strings stresses memory reasonably well, which exposes differences in the languages' efficiency.
Because the test case is very simple, it is easy to implement in different languages in a similar way. Obviously the code itself should not be considered practical, because its only purpose is to create some computational load for measurement. Code samples are available for review. All implementations are reasonably accurate and straightforward. Again, a similar amount of work done in a similar way should be considered fair for comparison.
String processing has been chosen for numerous reasons.
Most applications don't do long calculations. For serious math, the core functionality of any language is not good enough. Using 3rd party math libraries would make the comparison unfair, and comparing libraries would be meaningless anyway because we want to compare languages, not math libraries.
Moreover, integer calculations are not a good subject to test because integer size may differ. Accuracy of floating point calculations is affected by default precision and may be even more hardware-dependent. String processing is essential in every application because strings are just data. More data means more stress for garbage collection etc. Processing of large strings is easy to compare because all the languages in this testing will be doing the same amount of work.
Essentially, string processing is very common - XML(-RPC), HTML, logs, messages, GUI - all of these process strings at a low level, even when the details are hidden from the developer behind an API. String processing is not accelerated by hardware. When processing a large string, languages do many memory (re)allocations and, if necessary, copy data in memory. The efficiency of such processes is the subject of this testing because it shows well enough how the languages differ.
Tests run long enough to compare performance and memory usage rather than the time needed for runtime startup. That's why running each test once is good enough for comparison. During experiments I ran every test many times and noticed only a little deviation between results. I considered those deviations to be negligibly small (statistically insignificant), so the final comparison is made from just one execution of the test case for every language, without gathering results of multiple runs and comparing their averages. Remember that precise numbers are not too important in this test because the relative differences manifested very well.
During the test I compared speed, memory usage, and performance degradation as the processed data grows. When an application struggles with more data, its processing speed suffers, which is an important characteristic to understand.
Only core language functionality has been used for testing.
Originally I wanted to compare only mainstream cross-platform interpreted languages - namely PHP, Perl5, Python, Ruby and Java (Sun's and OpenJDK). Then curiosity made me include C, C++, Javascript ("spidermonkey", Mozilla), Javascript ("V8", Chrome), tcl, Lua and Java GCJ.
Whilst it is interesting to compare languages to each other, the Javascripts, tcl and Lua fall outside the scope, so I will not compare their features.
Technically C and C++ do not belong here because they are very different from interpreted languages by nature; however, their results are important as a baseline to match against.
All tests have been conducted on an Intel Core2 Duo T7500 @ 2.20 GHz CPU; 2 GB RAM; OS: Debian GNU/Linux, kernel 2.6.32 i686.
During tests there was always enough free memory to fully accommodate the running test without swapping, and no resource-hungry applications were running. However, more accurate results could be gathered if the X server and most other processes were stopped for the period of testing. The difference between running the same test with or without swap, or with higher priority, was negligibly small, if any. During tests CPU power management was disabled so both CPU cores were running at maximum speed.
Defaults have been used for all languages but PHP. By default PHP restricts maximum memory usage and maximum execution time. In order to complete the test those parameters had to be changed in the PHP runtime configuration.
Compilation time needed for C, C++ and Java wasn't counted in this testing.
This comparison consists of three parts:
Part 1: Speed.
Part 2: Memory usage.
Part 3: Language features.
October 2011 update: Python v3 added to comparison.
Speed
Execution speed is obviously important for understanding a language. I would say that if you're not considering performance at all, you simply don't care about your application. However, performance alone is not the most important characteristic, so other aspects should be taken into consideration as well.
String size (KiB), time in seconds | Perl5 | PHP | Ruby | Python | C++ (g++) | C (gcc) | Javascript (V8) | Javascript (sm) | Python3 | tcl | Lua | Java (openJDK) | Java (Sun) | Java (gcj) |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
256 | 2 | 6 | 7 | 7 | 7 | 2 | 3 | 30 | 17 | 33 | 49 | 39 | 38 | 451 |
512 | 7 | 23 | 29 | 32 | 26 | 8 | 21 | 131 | 81 | 141 | 203 | 162 | 157 | 1783 |
768 | 16 | 54 | 75 | 78 | 60 | 19 | 51 | 300 | 201 | 324 | 480 | 381 | 371 | 3937 |
1024 | 27 | 96 | 141 | 144 | 107 | 34 | 91 | 535 | 373 | 583 | 886 | 711 | 696 | 6952 |
1280 | 43 | 153 | 225 | 232 | 167 | 53 | 144 | 842 | 598 | 921 | 1423 | 1161 | 1145 | 10744 |
1536 | 62 | 227 | 328 | 342 | 242 | 76 | 208 | 1220 | 877 | 1334 | 2090 | 1751 | 1739 | 15372 |
1792 | 84 | 318 | 452 | 476 | 329 | 104 | 283 | 1672 | 1211 | 1823 | 2886 | 2489 | 2478 | 20819 |
2048 | 109 | 424 | 597 | 634 | 431 | 136 | 370 | 2203 | 1598 | 2387 | 3856 | 3370 | 3358 | 27132 |
2304 | 139 | 549 | 758 | 815 | 546 | 173 | 469 | 2799 | 2039 | 3030 | 4963 | 4453 | 4448 | 34302 |
2560 | 171 | 691 | 941 | 1019 | 675 | 214 | 578 | 3463 | 2533 | 3753 | 6198 | 5710 | 5719 | 42330 |
2816 | 206 | 849 | 1143 | 1248 | 817 | 259 | 700 | 4198 | 3070 | 4553 | 7568 | 7146 | 7186 | 51118 |
3072 | 245 | 1022 | 1366 | 1497 | 972 | 309 | 834 | 4997 | 3659 | 5422 | 9084 | 8852 | 8983 | 60779 |
3328 | 288 | 1211 | 1607 | 1771 | 1142 | 363 | 979 | 5875 | 4300 | 6378 | 10759 | 10784 | 10916 | 71275 |
3584 | 334 | 1414 | 1869 | 2064 | 1324 | 423 | 1136 | 6825 | 4992 | 7409 | 12594 | 12696 | 12867 | 82619 |
3840 | 384 | 1634 | 2150 | 2381 | 1522 | 487 | 1304 | 7848 | 5729 | 8503 | 14564 | 14861 | 15053 | 94686 |
4096 | 437 | 1869 | 2455 | 2720 | 1731 | 555 | 1484 | 8928 | 6534 | 9680 | 16674 | 17262 | 17426 | 107887 |
String size (KiB), time in h:mm:ss | Perl5 | PHP | Ruby | Python | C++ (g++) | C (gcc) | Javascript (V8) | Javascript (sm) | Python3 | tcl | Lua | Java (openJDK) | Java (Sun) | Java (gcj) |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
256 | 0:00:02 | 0:00:06 | 0:00:07 | 0:00:07 | 0:00:07 | 0:00:02 | 0:00:03 | 0:00:30 | 0:00:17 | 0:00:33 | 0:00:49 | 0:00:39 | 0:00:38 | 0:07:31 |
512 | 0:00:07 | 0:00:23 | 0:00:29 | 0:00:32 | 0:00:26 | 0:00:08 | 0:00:21 | 0:02:11 | 0:01:21 | 0:02:21 | 0:03:23 | 0:02:42 | 0:02:37 | 0:29:43 |
768 | 0:00:16 | 0:00:54 | 0:01:15 | 0:01:18 | 0:01:00 | 0:00:19 | 0:00:51 | 0:05:00 | 0:03:21 | 0:05:24 | 0:08:00 | 0:06:21 | 0:06:11 | 1:05:37 |
1024 | 0:00:27 | 0:01:36 | 0:02:21 | 0:02:24 | 0:01:47 | 0:00:34 | 0:01:31 | 0:08:55 | 0:06:13 | 0:09:43 | 0:14:46 | 0:11:51 | 0:11:36 | 1:55:52 |
1280 | 0:00:43 | 0:02:33 | 0:03:45 | 0:03:52 | 0:02:47 | 0:00:53 | 0:02:24 | 0:14:02 | 0:09:58 | 0:15:21 | 0:23:43 | 0:19:21 | 0:19:05 | 2:59:04 |
1536 | 0:01:02 | 0:03:47 | 0:05:28 | 0:05:42 | 0:04:02 | 0:01:16 | 0:03:28 | 0:20:20 | 0:14:37 | 0:22:14 | 0:34:50 | 0:29:11 | 0:28:59 | 4:16:12 |
1792 | 0:01:24 | 0:05:18 | 0:07:32 | 0:07:56 | 0:05:29 | 0:01:44 | 0:04:43 | 0:27:52 | 0:20:11 | 0:30:23 | 0:48:06 | 0:41:29 | 0:41:18 | 5:46:59 |
2048 | 0:01:49 | 0:07:04 | 0:09:57 | 0:10:34 | 0:07:11 | 0:02:16 | 0:06:10 | 0:36:43 | 0:26:38 | 0:39:47 | 1:04:16 | 0:56:10 | 0:55:58 | 7:32:12 |
2304 | 0:02:19 | 0:09:09 | 0:12:38 | 0:13:35 | 0:09:06 | 0:02:53 | 0:07:49 | 0:46:39 | 0:33:59 | 0:50:30 | 1:22:43 | 1:14:13 | 1:14:08 | 9:31:42 |
2560 | 0:02:51 | 0:11:31 | 0:15:41 | 0:16:59 | 0:11:15 | 0:03:34 | 0:09:38 | 0:57:43 | 0:42:13 | 1:02:33 | 1:43:18 | 1:35:10 | 1:35:19 | 11:45:30 |
2816 | 0:03:26 | 0:14:09 | 0:19:03 | 0:20:48 | 0:13:37 | 0:04:19 | 0:11:40 | 1:09:58 | 0:51:10 | 1:15:53 | 2:06:08 | 1:59:06 | 1:59:46 | 14:11:58 |
3072 | 0:04:05 | 0:17:02 | 0:22:46 | 0:24:57 | 0:16:12 | 0:05:09 | 0:13:54 | 1:23:17 | 1:00:59 | 1:30:22 | 2:31:24 | 2:27:32 | 2:29:43 | 16:52:59 |
3328 | 0:04:48 | 0:20:11 | 0:26:47 | 0:29:31 | 0:19:02 | 0:06:03 | 0:16:19 | 1:37:55 | 1:11:40 | 1:46:18 | 2:59:19 | 2:59:44 | 3:01:56 | 19:47:55 |
3584 | 0:05:34 | 0:23:34 | 0:31:09 | 0:34:24 | 0:22:04 | 0:07:03 | 0:18:56 | 1:53:45 | 1:23:12 | 2:03:29 | 3:29:54 | 3:31:36 | 3:34:27 | 22:56:59 |
3840 | 0:06:24 | 0:27:14 | 0:35:50 | 0:39:41 | 0:25:22 | 0:08:07 | 0:21:44 | 2:10:48 | 1:35:29 | 2:21:43 | 4:02:44 | 4:07:41 | 4:10:53 | 26:18:06 |
4096 | 0:07:17 | 0:31:09 | 0:40:55 | 0:45:20 | 0:28:51 | 0:09:15 | 0:24:44 | 2:28:48 | 1:48:54 | 2:41:20 | 4:37:54 | 4:47:42 | 4:50:26 | 29:58:07 |
Speed graph
Speed tests fall into 4 categories:
Slowest: Java gcj (native executable)
Slow: Java (openJDK); Java (Sun); Lua
Not-so-fast: tcl; Javascript (spidermonkey)
Fastest: Python; Ruby; PHP; C++; Javascript V8; C; Perl5
As you can see from the performance graph, processing slows down as the test string grows. The more a curve bends upwards, the more performance degrades. The graph reveals that the performance of Java and Lua degrades dramatically.
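One way to put a number on how much a curve bends is to look at the time spent on each individual 256 KiB step rather than at the running total. Below is a minimal Python sketch that does this for two columns copied from the seconds table above; the helper name `step_times` is an assumption made for illustration.

```python
# Cumulative seconds at each 256 KiB step, copied from the speed table.
cumulative = {
    "Perl5": [2, 7, 16, 27, 43, 62, 84, 109, 139, 171,
              206, 245, 288, 334, 384, 437],
    "Java (openJDK)": [39, 162, 381, 711, 1161, 1751, 2489, 3370, 4453,
                       5710, 7146, 8852, 10784, 12696, 14861, 17262],
}


def step_times(totals):
    """Seconds spent on each individual step, derived from running totals."""
    return [totals[0]] + [later - earlier
                          for earlier, later in zip(totals, totals[1:])]


for name, totals in cumulative.items():
    steps = step_times(totals)
    print(f"{name}: first step {steps[0]}s, last step {steps[-1]}s")
```

For these two columns the first step takes 2 seconds for Perl5 and 39 seconds for OpenJDK Java, while the last step takes 53 and 2401 seconds respectively, so the step time grows roughly 26-fold for Perl5 and roughly 62-fold for Java.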
All tested languages are good at manipulating small strings, but as the processed data grows the difference manifests itself.
The Slow group [Java, Lua] suffers from severe performance degradation.
There is almost no difference in performance between OpenJDK Java and Sun Java. Lua's performance is very close to Java's.
Initially the GCJ Java interpreter crashed during the test; however, GCJ can compile Java code to an executable file, which completed the test, albeit awfully slowly. Here and below, unqualified "Java" means only mainstream Sun/OpenJDK Java.
Let's have a closer look at Fastest group:
Python, Ruby and PHP are slightly slower than C++. This is not a surprise because those languages are optimised well enough.
Javascript V8 completed test slightly faster than C++.
This group of languages shows an average slowdown, while the performance of C and Perl5 is almost a flat line on the graph, indicating very little degradation. At the scale of this graph, C and Perl5 process an increasing amount of data at (almost) constant speed.
Unexpected result: somehow Perl5 managed to finish faster than C. This came as a surprise which I found difficult to explain. Probably Perl does fewer memory reallocations to accommodate string growth.
I haven't done serious coding in C since 1995, but the implementation is quite simple and straightforward, so the test result stands.
Perl5 is a clear winner, needing just a little more than 7 minutes to finish the test, against Java with the worst result of nearly 5 hours to do the same. (The worst result of all, GCJ Java at almost 30 hours, isn't worth comparing against.)
Perl5 is not only superior in performance but also shows very little slowdown on larger data. This is as close to C (compiled to machine code) as it can be for a scripting language. Absolutely amazing!
Interesting to note that with "use strict;" Perl completed the same test ~6 seconds quicker.
In the table below Perl5 has been taken as 1 and the other languages' performance is measured in Perls, so you can see how many times slower a particular language is compared to Perl5 in this test. Because of performance degradation it would be incorrect to say something like "This is twice as fast as That". Some languages' performance degrades faster than others', so at the beginning of this test Java is about 20 times slower than Perl5 and at the end Java is about 40 times slower (for the same amount of data).
Clearly this is an important characteristic - size matters! This corresponds with observations of some Java applications which behave well under little load and degrade rapidly as the load increases.
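The relative figures are simply each language's elapsed time divided by Perl5's elapsed time at the same string size. A short Python sketch of that arithmetic, using the Perl5 and Java (openJDK) columns from the seconds table; the function name `relative_to` is an assumption for illustration:

```python
# Cumulative seconds per 256 KiB step, copied from the seconds table.
perl5 = [2, 7, 16, 27, 43, 62, 84, 109, 139, 171,
         206, 245, 288, 334, 384, 437]
java_openjdk = [39, 162, 381, 711, 1161, 1751, 2489, 3370, 4453,
                5710, 7146, 8852, 10784, 12696, 14861, 17262]


def relative_to(baseline, times):
    """How many times slower `times` is than `baseline`, step by step."""
    return [round(t / b, 2) for t, b in zip(times, baseline)]


ratios = relative_to(perl5, java_openjdk)
print("at 256 KiB: ", ratios[0])    # 39 / 2
print("at 4096 KiB:", ratios[-1])   # 17262 / 437
```

The first and last ratios come out as 19.5 and 39.5, which is where the "about 20 times" and "about 40 times" figures come from.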
String size (KiB), time relative to Perl5 | Perl5 | PHP | Ruby | Python | C++ (g++) | C (gcc) | Javascript (V8) | Javascript (sm) | Python3 | tcl | Lua | Java (openJDK) | Java (Sun) | Java (gcj) |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
256 | 1 | 3.00 | 3.50 | 3.50 | 3.50 | 1.00 | 1.50 | 15.00 | 8.50 | 16.50 | 24.50 | 19.50 | 19.00 | 225.50 |
512 | 1 | 3.29 | 4.14 | 4.57 | 3.71 | 1.14 | 3.00 | 18.71 | 11.57 | 20.14 | 29.00 | 23.14 | 22.43 | 254.71 |
768 | 1 | 3.38 | 4.69 | 4.88 | 3.75 | 1.19 | 3.19 | 18.75 | 12.56 | 20.25 | 30.00 | 23.81 | 23.19 | 246.06 |
1024 | 1 | 3.56 | 5.22 | 5.33 | 3.96 | 1.26 | 3.37 | 19.81 | 13.81 | 21.59 | 32.81 | 26.33 | 25.78 | 257.48 |
1280 | 1 | 3.56 | 5.23 | 5.40 | 3.88 | 1.23 | 3.35 | 19.58 | 13.91 | 21.42 | 33.09 | 27.00 | 26.63 | 249.86 |
1536 | 1 | 3.66 | 5.29 | 5.52 | 3.90 | 1.23 | 3.35 | 19.68 | 14.15 | 21.52 | 33.71 | 28.24 | 28.05 | 247.94 |
1792 | 1 | 3.79 | 5.38 | 5.67 | 3.92 | 1.24 | 3.37 | 19.90 | 14.42 | 21.70 | 34.36 | 29.63 | 29.50 | 247.85 |
2048 | 1 | 3.89 | 5.48 | 5.82 | 3.95 | 1.25 | 3.39 | 20.21 | 14.66 | 21.90 | 35.38 | 30.92 | 30.81 | 248.92 |
2304 | 1 | 3.95 | 5.45 | 5.86 | 3.93 | 1.24 | 3.37 | 20.14 | 14.67 | 21.80 | 35.71 | 32.04 | 32.00 | 246.78 |
2560 | 1 | 4.04 | 5.50 | 5.96 | 3.95 | 1.25 | 3.38 | 20.25 | 14.81 | 21.95 | 36.25 | 33.39 | 33.44 | 247.54 |
2816 | 1 | 4.12 | 5.55 | 6.06 | 3.97 | 1.26 | 3.40 | 20.38 | 14.90 | 22.10 | 36.74 | 34.69 | 34.88 | 248.15 |
3072 | 1 | 4.17 | 5.58 | 6.11 | 3.97 | 1.26 | 3.40 | 20.40 | 14.93 | 22.13 | 37.08 | 36.13 | 36.67 | 248.08 |
3328 | 1 | 4.20 | 5.58 | 6.15 | 3.97 | 1.26 | 3.40 | 20.40 | 14.93 | 22.15 | 37.36 | 37.44 | 37.90 | 247.48 |
3584 | 1 | 4.23 | 5.60 | 6.18 | 3.96 | 1.27 | 3.40 | 20.43 | 14.95 | 22.18 | 37.71 | 38.01 | 38.52 | 247.36 |
3840 | 1 | 4.26 | 5.60 | 6.20 | 3.96 | 1.27 | 3.40 | 20.44 | 14.92 | 22.14 | 37.93 | 38.70 | 39.20 | 246.58 |
4096 | 1 | 4.28 | 5.62 | 6.22 | 3.96 | 1.27 | 3.40 | 20.43 | 14.95 | 22.15 | 38.16 | 39.50 | 39.88 | 246.88 |
Average: | 1 | 3.84 | 5.21 | 5.59 | 3.89 | 1.23 | 3.23 | 19.66 | 13.92 | 21.35 | 34.36 | 31.16 | 31.12 | 247.32 |
Memory usage
During testing, memory usage was captured at every completed step.
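The capture itself was done with memstat. As a rough illustration of the idea only, the Python sketch below samples a process's resident set size from `/proc/<pid>/status` on Linux each time the benchmark prints a line. This is a swapped-in technique, not the script used for the article: the resident set size is not necessarily the figure memstat reports, and the function names and the sample status text are made up for the example.

```python
import re


def parse_vmrss_kib(status_text):
    """Extract the resident set size in KiB from /proc/<pid>/status text."""
    match = re.search(r"^VmRSS:\s+(\d+)\s+kB", status_text, re.MULTILINE)
    return int(match.group(1)) if match else None


def read_rss_kib(pid):
    """Read the current resident set size of a running process (Linux only)."""
    with open(f"/proc/{pid}/status") as handle:
        return parse_vmrss_kib(handle.read())


def sample_per_line(lines, pid):
    """Yield (line, rss_kib) for every line the benchmark prints."""
    for line in lines:
        yield line.rstrip("\n"), read_rss_kib(pid)


# Parsing demonstrated on made-up status text rather than a live process.
sample = "Name:\tperl\nVmPeak:\t    9000 kB\nVmRSS:\t    4776 kB\n"
print(parse_vmrss_kib(sample))
```

In a real run, `sample_per_line(sys.stdin, pid)` would be fed the benchmark's piped output together with the benchmark's process id.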
String size (KiB), memory usage | C (gcc) | C++ (G++) | Perl5 | Python | Python3 | Ruby | Lua | tcl | PHP | Javascript (sm) | Javascript (V8) | Java (gcj) | Java (OpenJDK) | Java (Sun) |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1,668 | 2,932 | 4,776 | 5,352 | 10,328 | 11,040 | 2,416 | 1,236 | 36,752 | 7,720 | 39,272 | 49,156 | 724,832 | 658,560 |
256 | 1,928 | 3,444 | 5,052 | 6,384 | 13,404 | 9,620 | 3,960 | 13,696 | 38,040 | 50,664 | 47,236 | 68,320 | 725,852 | 661,056 |
512 | 2,184 | 3,956 | 5,308 | 5,876 | 16,476 | 11,672 | 5,404 | 14,720 | 39,064 | 29,672 | 47,636 | 76,200 | 725,852 | 661,056 |
768 | 2,440 | 3,956 | 5,564 | 7,676 | 19,548 | 7,328 | 6,428 | 18,052 | 40,088 | 16,872 | 49,404 | 84,392 | 725,852 | 661,056 |
1024 | 2,696 | 4,980 | 5,820 | 6,388 | 14,420 | 12,704 | 7,820 | 14,716 | 41,112 | 53,224 | 46,540 | 92,584 | 725,852 | 661,056 |
1280 | 2,952 | 4,980 | 6,076 | 9,212 | 15,444 | 8,604 | 6,104 | 15,228 | 42,136 | 44,520 | 47,044 | 110,072 | 725,852 | 661,056 |
1536 | 3,208 | 4,980 | 6,332 | 6,900 | 16,468 | 11,164 | 10,572 | 18,816 | 43,160 | 21,480 | 50,124 | 118,264 | 725,852 | 662,080 |
1792 | 3,464 | 4,980 | 6,588 | 7,156 | 17,492 | 8,856 | 11,812 | 16,252 | 44,184 | 38,376 | 51,916 | 126,976 | 725,852 | 662,080 |
2048 | 3,720 | 7,028 | 6,844 | 11,516 | 18,516 | 13,724 | 10,908 | 16,764 | 45,208 | 51,176 | 47,540 | 126,976 | 725,852 | 662,080 |
2304 | 3,976 | 7,028 | 7,100 | 7,668 | 19,540 | 12,700 | 6,644 | 17,276 | 46,232 | 38,376 | 46,252 | 161,824 | 725,852 | 662,080 |
2560 | 4,232 | 7,028 | 7,356 | 7,924 | 20,564 | 11,160 | 15,592 | 22,912 | 41,876 | 41,960 | 44,452 | 161,824 | 725,852 | 662,080 |
2816 | 4,488 | 7,028 | 7,612 | 8,180 | 21,588 | 14,748 | 16,848 | 18,300 | 42,388 | 79,336 | 50,612 | 161,824 | 725,852 | 662,080 |
3072 | 4,744 | 7,028 | 7,868 | 8,436 | 22,612 | 15,772 | 15,716 | 18,812 | 49,304 | 73,704 | 51,636 | 161,824 | 725,852 | 662,080 |
3328 | 5,000 | 7,028 | 8,124 | 8,692 | 23,636 | 16,796 | 19,492 | 19,324 | 50,328 | 39,400 | 55,996 | 170,536 | 725,852 | 662,080 |
3584 | 5,256 | 7,028 | 8,380 | 12,536 | 24,660 | 17,820 | 17,072 | 19,840 | 43,924 | 27,624 | 46,500 | 170,536 | 725,852 | 662,080 |
3840 | 5,512 | 7,028 | 8,636 | 9,204 | 25,684 | 18,844 | 23,276 | 20,348 | 44,436 | 29,160 | 58,556 | 170,536 | 725,852 | 662,080 |
4096 | 5,768 | 11,124 | 8,892 | 9,460 | 26,708 | 15,768 | 20,200 | 20,860 | 44,948 | 96,232 | 59,836 | 170,536 | 725,852 | 662,080 |
Memory usage graph - "mainstream" Java is not shown because of its constantly high usage.
Results fall into five categories:
Highest: Java OpenJDK, Java Sun
High: Java GCJ
Medium: Javascript V8, Javascript sm., PHP
Low: tcl, Lua, Ruby
Lowest: Python, Perl5, C++, C
Highest group - mainstream Java pre-allocates a fairly big chunk of memory (a certain percentage) by default and does memory management inside this chunk. During this test its memory usage didn't change and was constantly high - so it is not present on the graph: if included, it makes all other results appear as flat lines well below.
To capture internal memory usage I introduced print statements to the Java code to show internal memory usage as the string grows. (It doesn't affect performance.) Unfortunately the printed numbers show no correspondence with string growth. This shows that Java garbage collection works completely independently of application code. The output numbers appeared to be random, sometimes as high as 95% of pre-allocated memory. Even though internal memory usage did not correspond to the string size, it seems that sometimes Java uses nearly all of its memory before garbage collection (GC) releases some of it.
Java memory management appears to be extremely ineffective, which seems to be the primary cause of poor performance. I leave further investigation with specific Java-monitoring tools to those who might find it interesting. Java professionals may also try to improve the results by fine-tuning miscellaneous GC parameters.
High group - Java GCJ compiled to a native executable. Thanks to this special feature GCJ Java demonstrated predictable behaviour, with memory allocation growing together with the data processed. Compared with the other non-Java runtimes, its memory utilisation is huge.
Medium group: Javascript demonstrates more or less consistent growth in memory usage as the data grows. PHP shows very little growth, but its heavy runtime uses a lot of memory from the very beginning. Despite its initial requirements, PHP uses memory pretty wisely. High memory usage upon startup is not necessarily a bad thing: if meant for continuous execution it may be OK to pre-load common libraries. However, this may limit PHP usage on a VPS, i.e. when available memory is limited.
Let's have a closer look at the Low and Lowest groups:
The Lua and tcl runtimes are tiny, but their memory management is not very effective. Ruby used more memory than Python. Python utilises memory almost as well as Perl5 - perhaps their runtimes are almost the same size. Once again Perl5 performed amazingly well, demonstrating behaviour very similar to C - the best among scripting languages. As expected, C++ memory usage is roughly between C and Perl5.
As we did in the speed test, let's take Perl5 as 1 and see how the other languages' memory usage compares at every step and on average.
String size (KiB), memory relative to Perl5 | C (gcc) | C++ (G++) | Perl5 | Python | Python3 | Ruby | Lua | tcl | PHP | Javascript (sm) | Javascript (V8) | Java (gcj) | Java (OpenJDK) | Java (Sun) |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 0.35 | 0.61 | 1 | 1.12 | 2.16 | 2.31 | 0.51 | 0.23 | 7.70 | 1.62 | 8.22 | 10.29 | 151.77 | 137.89 |
256 | 0.38 | 0.68 | 1 | 1.26 | 2.65 | 1.90 | 0.78 | 2.15 | 7.53 | 10.03 | 9.35 | 13.52 | 143.68 | 130.85 |
512 | 0.41 | 0.75 | 1 | 1.11 | 3.10 | 2.20 | 1.02 | 2.51 | 7.36 | 5.59 | 8.97 | 14.36 | 136.75 | 124.54 |
768 | 0.44 | 0.71 | 1 | 1.38 | 3.51 | 1.32 | 1.16 | 2.35 | 7.20 | 3.03 | 8.88 | 15.17 | 130.46 | 118.81 |
1024 | 0.46 | 0.86 | 1 | 1.10 | 2.48 | 2.18 | 1.34 | 2.30 | 7.06 | 9.15 | 8.00 | 15.91 | 124.72 | 113.58 |
1280 | 0.49 | 0.82 | 1 | 1.52 | 2.54 | 1.42 | 1.00 | 1.65 | 6.93 | 7.33 | 7.74 | 18.12 | 119.46 | 108.80 |
1536 | 0.51 | 0.79 | 1 | 1.09 | 2.60 | 1.76 | 1.67 | 2.73 | 6.82 | 3.39 | 7.92 | 18.68 | 114.63 | 104.56 |
1792 | 0.53 | 0.76 | 1 | 1.09 | 2.66 | 1.34 | 1.79 | 2.27 | 6.71 | 5.83 | 7.88 | 19.27 | 110.18 | 100.50 |
2048 | 0.54 | 1.03 | 1 | 1.68 | 2.71 | 2.01 | 1.59 | 1.46 | 6.61 | 7.48 | 6.95 | 18.55 | 106.06 | 96.74 |
2304 | 0.56 | 0.99 | 1 | 1.08 | 2.75 | 1.79 | 0.94 | 2.25 | 6.51 | 5.41 | 6.51 | 22.79 | 102.23 | 93.25 |
2560 | 0.58 | 0.96 | 1 | 1.08 | 2.80 | 1.52 | 2.12 | 2.89 | 5.69 | 5.70 | 6.04 | 22.00 | 98.67 | 90.01 |
2816 | 0.59 | 0.92 | 1 | 1.07 | 2.84 | 1.94 | 2.21 | 2.24 | 5.57 | 10.42 | 6.65 | 21.26 | 95.36 | 86.98 |
3072 | 0.60 | 0.89 | 1 | 1.07 | 2.87 | 2.00 | 2.00 | 2.23 | 6.27 | 9.37 | 6.56 | 20.57 | 92.25 | 84.15 |
3328 | 0.62 | 0.87 | 1 | 1.07 | 2.91 | 2.07 | 2.40 | 2.22 | 6.19 | 4.85 | 6.89 | 20.99 | 89.35 | 81.50 |
3584 | 0.63 | 0.84 | 1 | 1.50 | 2.94 | 2.13 | 2.04 | 1.58 | 5.24 | 3.30 | 5.55 | 20.35 | 86.62 | 79.01 |
3840 | 0.64 | 0.81 | 1 | 1.07 | 2.97 | 2.18 | 2.70 | 2.21 | 5.15 | 3.38 | 6.78 | 19.75 | 84.05 | 76.67 |
4096 | 0.65 | 1.25 | 1 | 1.06 | 3.00 | 1.77 | 2.27 | 2.21 | 5.05 | 10.82 | 6.73 | 19.18 | 81.63 | 74.46 |
Average: | 0.53 | 0.85 | 1 | 1.20 | 2.79 | 1.87 | 1.62 | 2.09 | 6.45 | 6.28 | 7.39 | 18.28 | 109.87 | 100.13 |
The environment where applications run may have certain memory limits. This is true not only for popular Virtual Private Servers (VPS), where the amount of RAM can sometimes be as little as 128 MB for the OS and all applications/services, but also for embedded devices and heavily loaded servers.
A good understanding of memory utilisation is as important a consideration as speed.
Read more after code section below.
Source codes and test results
C; Result: C gcc (Debian 4.4.4-1) 4.4.4
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <time.h>

    int main(){
      setbuf(stdout,NULL); //disable output buffering
      char *str=malloc(8+1); //+1 for the terminating NUL
      strcpy(str,"abcdefgh");
      str=realloc(str,strlen(str)+8+1);
      strcat(str,"efghefgh"); //sprintf(str,"%s%s",str,"efghefgh");
      int imax=1024/strlen(str)*1024*4;
      printf("%s","exec.tm.sec\tstr.length\n"); //fflush(stdout);
      time_t starttime=time(NULL);
      char *gstr=malloc(1);
      gstr[0]='\0'; //start from an empty string so strcat has a terminator to find
      int i=0;
      char *pos;
      int lngth;
      int str_len=strlen(str);
      while(i++ < imax+1000){
        lngth=strlen(str)*i;
        gstr=realloc(gstr,lngth+str_len);
        strcat(gstr,str); //sprintf(gstr,"%s%s",gstr,str);
        pos=gstr;
        while((pos=strstr(pos,"efgh"))){
          memcpy(pos,"____",4);
        }
        if(lngth % (1024*256)==0){
          printf("%dsec\t\t%dkb\n",(int)(time(NULL)-starttime),lngth/1024); //fflush(stdout);
        }
      }
      //printf("%s\n",gstr);
      return 0;
    }
C++ (source); Result: C++ g++ (Debian 4.4.3-7) 4.4.3
    #include <iostream>
    #include <string>
    #include <time.h>
    using namespace std;

    int main () {
      string str = "abcdefgh";
      str += "efghefgh";
      int imax = 1024 /str.length() * 1024 *4;
      time_t currentTime = time(NULL);
      cout << "exec.tm.sec\tstr.length" << endl;
      string find= "efgh";
      string replace ="____";
      string gstr;
      int i=0;
      int length;
      // int end=0;
      // size_t end=0;
      while(i++ < imax +1000){
        gstr += str;
        gstr = gstr;
        size_t start, sizeSearch=find.size(), end=0;
        while((start=gstr.find(find,end))!=string::npos){
          end=start+sizeSearch;
          gstr.replace(start,sizeSearch,replace);
        }
        length = str.length()*i;
        if((length%(1024 * 256))==0){
          cout << time(NULL) - currentTime << "sec\t\t" << length/1024 << "kb" << endl;
        }
      }
      // cout << gstr << endl;
      return 0;
    }
Javascript (source); Results: Javascript (Spidermonkey - Mozilla) 1.8.0 pre-release 1 2007-10-03, Javascript (V8 - Chrome)
    #!/usr/local/bin/js
    var str = "abcdefgh"+"efghefgh";
    var imax = 1024 / str.length * 1024 * 4;
    var time = new Date();
    print("exec.tm.sec\tstr.length");
    var gstr = "";
    var i=0;
    var lngth;
    while (i++ < imax+1000) {
      gstr += str;
      gstr = gstr.replace(/efgh/g, "____");
      lngth=str.length*i;
      if ((lngth % (1024*256)) == 0) {
        var curdate=new Date();
        print(parseInt(((curdate.getTime()-time.getTime())/1000))+"sec\t\t"+lngth/1024+"kb");
      }
    }
Java (source); Results: Java (OpenJDK) "1.6.0_18", Java (Sun) "1.6.0_16", Java (gcj) (Debian 4.4.3-1) 4.4.3
    public class java_test {
      public static final void main(String[] args) throws Exception {
        String str = "abcdefgh"+"efghefgh";
        int imax = 1024 / str.length() * 1024 * 4;
        long time = System.currentTimeMillis();
        System.out.println("exec.tm.sec\tstr.length\tallocated memory:free memory:memory used");
        Runtime runtime = Runtime.getRuntime();
        System.out.println("0\t\t0\t\t"+runtime.totalMemory()/1024 +":"+ runtime.freeMemory()/1024+":"+(runtime.totalMemory()-runtime.freeMemory())/1024);
        String gstr = "";
        int i=0;
        int lngth;
        while (i++ < imax+1000) {
          gstr += str;
          gstr = gstr.replaceAll("efgh", "____");
          lngth=str.length()*i;
          if ((lngth % (1024*256)) == 0) {
            System.out.println(((System.currentTimeMillis()-time)/1000)+"sec\t\t"+lngth/1024+"kb\t\t"+runtime.totalMemory()/1024+":"+runtime.freeMemory()/1024+":"+(runtime.totalMemory()-runtime.freeMemory())/1024);
          }
        }
      }
    }
Perl5 (source); Result: This is perl, v5.10.1 (*) built for i486-linux-gnu-thread-multi
    #!/usr/bin/perl
    $|=1; #disable output buffering, this is necessary for proper output through pipe
    my $str='abcdefgh'.'efghefgh';
    my $imax=1024/length($str)*1024*4; # 4mb
    my $starttime=time();
    print "exec.tm.sec\tstr.length\n";
    my $gstr='';
    my $i=0;
    while($i++ < $imax+1000){ #adding 1000 iterations to delay exit. This will allow to capture memory usage on last step
      $gstr.=$str;
      $gstr=~s/efgh/____/g;
      my $lngth=length($str)*$i;
      ## my $lngth=length($gstr); # Perhaps that would be a slower way
      print time()-$starttime,"sec\t\t",$lngth/1024,"kb\n" unless $lngth % (1024*256); #print out every 256kb
    }
PHP (source); Result: PHP 5.3.1-5 with Suhosin-Patch (cgi-fcgi) (built: Feb 22 2010 17:38:41)
    <?php
    $str="abcdefgh"."efghefgh";
    $imax=1024/strlen($str)*1024*4; # 4mb
    $starttime=time();
    print("exec.tm.sec\tstr.length\n");
    $gstr='';
    $i=0;
    while($i++ < $imax+1000){
      $gstr.=$str;
      $gstr=preg_replace('/efgh/','____',$gstr);
      $lngth=strlen($str)*$i;
      if($lngth % (1024*256)==0){
        print (time()-$starttime."sec\t\t".($lngth/1024)."kb\n");
      }
    }
    ?>
Python (source); Result: Python 2.5.5
    #!/usr/bin/python -u
    import re
    import time
    import sys

    str='abcdefgh'+'efghefgh'
    imax=1024/len(str)*1024*4 # 4mb
    starttime=time.time();
    print "exec.tm.sec\tstr.length"
    sys.stdout.flush()
    gstr=''
    i=0
    while (i < imax+1000):
        i=i+1
        gstr+=str
        gstr=re.sub('efgh','____',gstr)
        lngth=len(str)*i
        if(lngth % (1024*256) == 0):
            print int(time.time()-starttime),"sec\t\t",(lngth/1024),"kb"
            sys.stdout.flush()
Python3 (source); Result: Python 3.1.3
    #!/usr/bin/python3 -u
    import re
    import time
    import sys

    str='abcdefgh'+'efghefgh'
    imax=1024//len(str)*1024*4 # 4mb
    starttime=time.time();
    print("exec.tm.sec\tstr.length")
    sys.stdout.flush()
    gstr=''
    i=0
    while (i < imax+1000):
        i=i+1
        gstr+=str
        gstr=re.sub('efgh','____',gstr)
        lngth=len(str)*i
        if(lngth % (1024*256) == 0):
            print(int(time.time()-starttime),"sec\t\t",(lngth//1024),"kb")
            sys.stdout.flush()
Ruby (source); Result: ruby 1.8.7 (2010-01-10 patchlevel 249) [i486-linux]
    #!/usr/bin/ruby
    $stdout.sync=true;
    str='abcdefgh'+'efghefgh';
    imax=1024/str.length*1024*4; # 4mb
    starttime=Time.new;
    print("exec.tm.sec\tstr.length\n");
    gstr='';
    i=0;
    while i < imax+1000
      i=i+1;
      gstr+=str;
      gstr=gstr.gsub(/efgh/, "____")
      lngth=str.length*i;
      if(lngth % (1024*256)==0)
        print(((Time.new-starttime).ceil).to_s+"sec\t\t",(lngth/1024).to_s,"kb\n");
      end
    end
    #puts gstr;
Lua (source): Result: Lua 5.1.4
    #!/usr/bin/lua
    io.stdout:setvbuf "no"; -- io.flush();
    str='abcdefgh'..'efghefgh';
    imax=1024/string.len(str)*1024*4; -- 4mb
    starttime=os.time();
    print "exec.tm.sec\tstr.length";
    gstr='';
    i=0;
    while i < imax+1000 do
      i=i+1;
      gstr=gstr..str;
      gstr=string.gsub(gstr,"efgh","____");
      lngth=string.len(str)*i;
      if(math.mod(lngth,1024*256)==0) then
        print(os.time()-starttime.."sec\t\t"..(lngth/1024).."kb");
      end
    end
tcl (source): Result: tcl 8.4.19
    #!/usr/bin/tclsh
    set str "abcdefgh"
    append str "efghefgh"
    set imax [expr {1024/[string length $str]*1024*4}]
    set starttime [clock clicks -milliseconds]
    puts "exec.tm.sec\tstr.length";
    set gstr ""
    set i 0
    while {$i<[expr {$imax+1000}]} {
        incr i
        append gstr $str;
        regsub -all {efgh} $gstr ____ gstr
        set lngth [expr {[string length $str]*$i}]
        if {[expr {$lngth % (1024*256)}] == 0} {
            puts "[expr int([expr [clock clicks -milliseconds] - $starttime] / 1000)]sec\t\t[expr {$lngth/1024}]kb"
        }
    }
    exit
June 2011 Update: One bright Java developer felt that I was bashing Java, so he decided to optimise the Java test. Initially I was sceptical about it because two other Java programmers had failed to do so.
As you may have already noted from the source code, for high level languages I use a regular expression to substitute a substring on each iteration.
However, when I decided to include C and C++ in the test case, the regex was replaced with the traditional "moving window" technique, where the search for the substring starts from the position calculated on the previous step instead of scanning the whole growing string every time.
This approach was chosen because regular expressions are not part of the core functionality of C/C++, and also because for low level languages this seems to be a natural way to do substitution.
Unfortunately this affected the fairness of the comparison. (Perhaps all tests should have been using indexed substitutions.)
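To illustrate the difference between the two approaches, here is a small Python sketch that compares a full regex rescan on every iteration with a "moving window" that only searches the newly appended tail. It follows the technique as described in the paragraphs above and is not a transcription of the C or C++ listing; the function names are assumptions, and the windowed version assumes that everything before the window has already been substituted.

```python
import re

CHUNK = "abcdefghefghefgh"


def append_and_rescan(text, chunk):
    """Append, then substitute over the whole string (regex approach)."""
    return re.sub("efgh", "____", text + chunk)


def append_and_patch(buffer, chunk, find=b"efgh", replace=b"____"):
    """Append to a bytearray and substitute only in the new tail.

    A new match can only begin in the last len(find) - 1 old bytes or in
    the appended chunk, so the search resumes from there.
    """
    start = max(0, len(buffer) - (len(find) - 1))
    buffer += chunk  # bytearray is extended in place
    pos = buffer.find(find, start)
    while pos != -1:
        buffer[pos:pos + len(find)] = replace
        pos = buffer.find(find, pos + len(replace))


text = ""
buffer = bytearray()
for _ in range(1000):
    text = append_and_rescan(text, CHUNK)
    append_and_patch(buffer, CHUNK.encode("ascii"))

# Both approaches should produce the same string.
print(buffer.decode("ascii") == text)
```

Both versions end up with the same string, but the first one looks at every character of the growing string on every pass, while the second one only looks at the part that has just been added.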
The fact that the C++ example uses "moving window" substitution instead of a regular expression allows the Java code to be rewritten as in the following example:
    public class java_test_optm {
      public static final void main(String[] args) throws Exception {
        String str = "abcdefgh"+"efghefgh";
        int imax = 1024 / str.length() * 1024 * 4;
        long time = System.currentTimeMillis();
        System.out.println("exec.tm.sec\tstr.length\tallocated memory:free memory:memory used");
        Runtime runtime = Runtime.getRuntime();
        System.out.println("0\t\t0\t\t"+runtime.totalMemory()/1024 +":"+ runtime.freeMemory()/1024+":"+(runtime.totalMemory()-runtime.freeMemory())/1024);
        final StringBuilder gstr = new StringBuilder();
        int i=0;
        int lngth;
        while (i++ < imax+1000) {
          gstr.append(str);
          int startIndx = gstr.indexOf("efgh");
          while(startIndx != -1){
            gstr.replace(startIndx, startIndx + 4, "____");
            startIndx = gstr.indexOf("efgh", startIndx + 4);
          }
          lngth=str.length()*i;
          if ((lngth % (1024*256)) == 0) {
            System.out.println(((System.currentTimeMillis()-time)/1000)+"sec\t\t"+lngth/1024+"kb\t\t"+runtime.totalMemory()/1024+":"+runtime.freeMemory()/1024+":"+(runtime.totalMemory()-runtime.freeMemory())/1024);
          }
        }
      }
    }
    /*
    exec.tm.sec  str.length  allocated memory:free memory:memory used
    0            0           32320:32103:216
    2sec         256kb       32320:29420:2899
    9sec         512kb       32320:29033:3286
    21sec        768kb       32320:28250:4069
    38sec        1024kb      32320:26692:5627
    59sec        1280kb      32320:23612:8707
    85sec        1536kb      32320:22116:10203
    116sec       1792kb      32320:23647:8672
    153sec       2048kb      32320:22101:10218
    194sec       2304kb      32000:14067:17932
    240sec       2560kb      32000:12571:19428
    292sec       2816kb      32192:14283:17908
    348sec       3072kb      32192:12713:19478
    410sec       3328kb      32064:14356:17707
    477sec       3584kb      32064:12827:19236
    549sec       3840kb      32128:14615:17512
    626sec       4096kb      32128:13095:19032
    */
Surprisingly, this took the stress off garbage collection, allowing Java to finish the test in just 626 seconds. (Thanks Brian Bason!)
However, IMHO this rather proves that Java is ineffective and overcomplicated: despite all the expertise and effort required to optimise the Java test case, Perl code modified to use moving window substitution completed the test in less than 2 seconds - more than 300 times faster than Java.
Once again, to achieve reasonable performance Java requires a low-level approach which is not only labour intensive but also can't compete with the speed of other languages.
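For readers who want to try the "moving window" idea without Java or Perl at hand, here is a minimal Python sketch. It is not one of the benchmark programs: the function name `grow_and_substitute` and its parameters are made up for illustration, and a `bytearray` is assumed as the mutable buffer (playing the role of Java's StringBuilder). Like the Java listing, it rescans the buffer from the start after every append.

```python
def grow_and_substitute(chunk, target, repl, iterations):
    """Append `chunk` `iterations` times, overwriting each `target` with `repl` in place."""
    if not target or len(target) != len(repl):
        raise ValueError("target must be non-empty and the same length as repl")
    buf = bytearray()                        # mutable buffer, no new string per substitution
    for _ in range(iterations):
        buf += chunk
        pos = buf.find(target)               # rescan from the start, as the Java listing does
        while pos != -1:
            buf[pos:pos + len(target)] = repl            # same length, so the buffer does not grow
            pos = buf.find(target, pos + len(target))    # move the window past the replacement
    return buf


if __name__ == "__main__":
    result = grow_and_substitute(b"abcdefghefghefgh", b"efgh", b"____", 4)
    print(len(result), bytes(result[:16]))
```

The equal-length restriction is a simplification of this sketch; it keeps every replacement a plain overwrite, which is what the "efgh" to "____" substitution in the test does anyway.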
Language features
Sometimes comfort and speed of development may outweigh performance and memory usage. For example, it is understandable if a higher-level language is chosen over C in order to benefit from automatic memory management. In this section I'm going to briefly scratch the surface of comparing language features.
Whilst it's quite a philosophical statement, language features play an important role in development.
Let's see how easily we can parse an integer value from a text string in popular languages. This task only looks straightforward. In fact there are plenty of caveats.
In Java we could do something like
    //Java
    int val;
    val = Integer.parseInt("10000000000");
But there are problems. The example above will not only fail to parse the correct value, but will actually crash the entire application because of an unhandled exception. Gotchas like this may bite you when you do not expect it. In this Java example
    //Java
    val = Integer.parseInt("-10"); //this will work
    val = Integer.parseInt("+10"); //but not this - silly!
parsing an integer from "+10" crashes the application. To emulate this behaviour in PHP or Perl we have to explicitly create a point of failure:
    $val=intval($str) or die("it didn't work");
In Java pretty much any call that does something can be a failure point unless enclosed within ugly try-catch statements. So to avoid a crash we have to wrap 'dangerous' operations like this:
    //Java
    try {
        val = Integer.parseInt(str);
    } catch (NumberFormatException nx) {
        //it didn't work, do something about it here
    }
In fact try-catch is a fancy syntax for if-else. A similar operation in PHP will not crash, but we can wrap it with if-else to make sure the number was parsed successfully.
    #PHP
    if($val=intval($str)){ # please note this has "zero case" caveat: in PHP and Perl 0 = 'false'
        print $val;        # so nothing will be printed if input string is '0' (zero)
    }
Python and Ruby use fatal behaviour similar to Java's. Is that good? Perhaps sometimes. However in many cases returning something is better than nothing. The application may not do exactly what's expected, but that may be considered better than a crash. Perhaps you want your application to keep running despite a minor error instead of terminating. Maybe a particular part of the application is not important enough to try-catch absolutely everything. I've seen many examples of this in web applications, when a seemingly innocent operation is in fact a fatal failure point leading to an application crash. Several times I had to troubleshoot Java and Python web apps made by different teams, in different companies, at different times, but all of them used to crash on string transformations because of uncaught/unhandled exceptions when an unexpected character came from the database. Needless to say this caused a great deal of frustration for users of those applications. You may argue that the developers who created those applications were incompetent. Could be. However the development approach enforced by the necessity of catching all possible exceptions is troublesome, difficult and slow. It clutters the code by generating 'noise' and imposes a routine not strictly related to the application's logic. I think the forgiving nature of Perl is a better match for Test Driven Development: the developer is not distracted by try-catch and can therefore concentrate on making code better, creating more tests, checking input values etc.
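As a rough illustration of "returning something instead of crashing" in a language with fatal parsing, the Python sketch below wraps `int()` in a helper. The name `parse_int` and its `default` parameter are hypothetical, not part of any library. Using `None` as the failure marker also sidesteps the "zero case" caveat mentioned above, because the caller tests for `None` rather than for truthiness.

```python
def parse_int(text, default=None):
    """Return int(text), or `default` when text is not a valid integer."""
    try:
        return int(text)
    except ValueError:          # only the parse failure is swallowed, nothing else
        return default


if __name__ == "__main__":
    for raw in ("0", "+10", "asdasd"):
        value = parse_int(raw)
        if value is not None:   # an explicit None check keeps a parsed 0 usable
            print(raw, "->", value)
        else:
            print(raw, "-> no value parsed")
```

Whether a silent default is wiser than an exception depends on the application; the sketch only shows that the forgiving behaviour costs a few lines to build where it is wanted.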
String (str) | Java Integer.parseInt(str) or Integer.valueOf(str) | PHP intval($str) | Python int(str) | Ruby str.to_i | Ruby Integer(str) | Perl int($str) | C++ istringstream buffer(str); double val; buffer >> val; | C++ istringstream buffer(str); int val; buffer >> val; | C++ double val=atoi(str) | C++ int val=atoi(str) |
---|---|---|---|---|---|---|---|---|---|---|
" 1111" | exception | OK | OK | OK | OK | OK | OK | OK | OK | OK |
"10.0" | exception | OK | exception | OK | exception | OK | OK | OK | OK | OK |
"10000000000" | exception | incorrect: 2147483647 | OK | OK | OK | OK | OK (1e+10) | incorrect: 134520252 | incorrect: 2.14748e+09 | incorrect: 2147483647 |
"2e+2" | exception | incorrect: 2 | exception | incorrect: 2 | exception | OK | OK (200) | incorrect: 2 | incorrect: 2 | incorrect: 2 |
"-10" | OK | OK | OK | OK | OK | OK | OK | OK | OK | OK |
"+10" | exception | OK | OK | OK | OK | OK | OK | OK | OK | OK |
"asdasd" | exception | 0 | exception | 0 | exception | 0 | 0 | incorrect: 134520248 | 0 | 0 |
"0.0" | exception | incorrect: No value parsed | exception | OK | exception | OK | OK | OK | OK | OK |
"00" | OK | incorrect: No value parsed | OK | OK | OK | OK | OK | OK | OK | OK |
"2+3" | exception | 2 | exception | 2 | exception | 2 | 2 | 2 | 2 | 2 |
Note: 2e+2 = 2*10^2 = 200
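The Python column of the table can be re-checked with a short script. This is a sketch, assuming a Python 3 interpreter (results may differ on other versions); the names `SAMPLES`, `try_int` and `python_column` are invented for the example.

```python
# The same input strings as in the table above.
SAMPLES = [" 1111", "10.0", "10000000000", "2e+2", "-10",
           "+10", "asdasd", "0.0", "00", "2+3"]


def try_int(text):
    """Return the parsed integer, or the string "exception" if int() refuses the input."""
    try:
        return int(text)
    except ValueError:
        return "exception"


def python_column(samples=SAMPLES):
    """Map every sample string to its outcome, one table cell per string."""
    return {text: try_int(text) for text in samples}


if __name__ == "__main__":
    for text, outcome in python_column().items():
        print(repr(text).ljust(16), outcome)
```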
Java has the largest number of exceptions to handle. Of course you may handle them all as one but, as demonstrated in this example, a usable value can be parsed in most cases, so if you want to do a good job you have to do it yourself, for every case. Java is the only language which couldn't extract a value from "+10".
Python is slightly smarter at recognising numbers in strings.
Ruby has two different methods to do the job - it is confusing which one is better.
PHP silently parses incorrect values.
The complexity and power of C++ are vividly manifested in this example: you can choose from 4 different ways to parse a value from a string, but as soon as you know which one of them is right, the results are nearly perfect.
Since the return value has to be a number, it returns 0 for non-numeric strings, so 0 has to be treated as a special case to somehow determine whether it was an error or an actual value.
Perl demonstrated a perfect result. At first glance it looks almost the same as C++: it returns 0 from a non-numeric string. However with the standard
    use warnings;
a non-fatal warning will be issued: "Argument "asdasd" isn't numeric in int at ./tst.pl line 8." This warning can be converted to a fatal one with
    use warnings FATAL=>'numeric';
Now we have an exception to catch, as in the following example:
    #!/usr/bin/perl
    {
        use warnings FATAL=>'numeric';
        my $str="asdasd";
        my $num=eval {int $str};
        if(defined $num){
            print "we got it - it's $num";
        }else{
            print "error: $@";
            # with "use English;" the line above could look like:
            # print "error: $EVAL_ERROR";
        }
    }
There are some important things to note:
- Fatal exception is enabled by developer's decision
- Only for particular problem;
- Only for particular block, so exception scope is strictly defined
- Only core language functionality used
- It works perfectly, including "zero case" and "2+3"
- It provides human-readable explanation of failure
- It extracts all usable values
- With minimal effort
A variant using the Try::Tiny module:
    #!/usr/bin/perl
    use Try::Tiny;
    use warnings FATAL=>'numeric';
    my $str="asdasd";
    my $num = try {
        int $str;
    } catch {
        die "error: $_";
    };
    print q{we got it - it's },$num;
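The idea of a warning that becomes fatal only inside a chosen block can be approximated in Python with the standard `warnings` module. The sketch below is an analogy, not an equivalent of the Perl pragma: `lenient_int` and `strict_int` are hypothetical helpers, and the warning text merely imitates Perl's. Note that `warnings.catch_warnings()` temporarily changes process-wide filter state, so it is not safe to rely on across threads.

```python
import warnings


def lenient_int(text):
    """Return int(text); for non-numeric input warn and return 0 (forgiving behaviour)."""
    try:
        return int(text)
    except ValueError:
        warnings.warn(f"Argument {text!r} isn't numeric", RuntimeWarning, stacklevel=2)
        return 0


def strict_int(text):
    """Same parsing, but inside this block the warning is escalated to an exception."""
    with warnings.catch_warnings():                     # filter changes end with the block
        warnings.simplefilter("error", RuntimeWarning)  # only this warning category becomes fatal
        return lenient_int(text)


if __name__ == "__main__":
    try:
        print("we got it - it's", strict_int("asdasd"))
    except RuntimeWarning as problem:
        print("error:", problem)
```

As in the Perl version, the escalation is the developer's decision, applies to one category of problem, and is limited to one block.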
Some links below might be interesting in order to compare languages' syntax:
Compare structure of Perl, Ruby, Python, Java and PHP
Wikipedia: Exception handling syntax
Notes (per language)
PHP
PHP is not a universal language. Perhaps it may be considered for web development only.
Another problem with PHP is the administration needed to configure the runtime for different applications. Some PHP applications have different expectations regarding the notorious "Magic quotes" runtime parameter. Read more in Wikipedia: Magic quotes criticism.
The runtime is fast but not very compact. PHP has a reputation as a lightweight and fast language. The first happens to be false (PHP memory usage is quite big compared with Python, Ruby and Perl5), but it is a close second after Perl5 in performance.
In some situations PHP functions cannot be trusted, as demonstrated in the "parsing integer from string" example.
Ruby
Ruby is a universal but relatively young language. Its availability on different platforms is still limited, and its history of introducing backward-incompatible changes makes development and maintenance unnecessarily complicated. Performance and memory usage of Ruby and Python are close to each other. While Ruby is slightly faster, Python utilises memory better.
Python
Python is a ripe and universal language. It stood up well in this test. However, Python interprets spaces and tabs. This particular 'feature' looks unnecessary and silly, especially after so much has been said about the importance of separating presentation from logic. Presentation is logic in Python. Python enforces a certain way of formatting code in the rudest way I can imagine. Unless it makes your eyes bleed you may find peace in Python, especially after Java. Its "whitespace as constraint" could make reading/writing code harder. To my understanding the only explanation for such a strange feature is that you can literally see the code flow pretty much the way the interpreter sees it. I doubt that good coding style can be effectively enforced - readable code formatting can easily be achieved in other languages by following best practice guidelines.
In a way Python uses a military dress code - all applications should wear the same uniform.
How can this make a programming task easier? I believe the more freedom a programming language gives you, the better.
"There is no programming language - no matter how structured - that will prevent programmers from making bad programs."
-- Larry Flon
Read more about Python's white spacing in The hard edges of Python.
Perl5
Perl5 demonstrated amazing performance and memory usage, far ahead of all other languages tested. It proved to be the most optimised, ripe and stable language. Some people believe it to be the most advanced programming language in the world; at the very least it is clearly a very good choice.
- Perl proved to be an extremely effective, highly optimised language.
- Perl has a massive library of reusable code.
- Perl is mature: it's 23 years old; (Perl5 is 17 years old).
- Perl is very portable.
- Perl is elegant and flexible.
Some of the myths about Perl:
- Myth: Perl is UNIX shell on steroids.
- This is really an insult to Perl, which is much more than this. In 2010 Perl is a very mature and universal language with perhaps the largest library of reusable code available. In Perl you can write GUI applications, web applications, system daemons etc. It is possible to pack a Perl application, runtime and libraries into a Windows executable and distribute it as a single .EXE file. Perl's object-oriented features and flexibility go far beyond perhaps any other language. Learn more about Modern Perl (presentation).
- Myth: Perl is "write once - read never"
- Perl is often falsely accused of lacking readability. I confess - sometimes I have problems reading my own poor handwriting in notes I took weeks ago. However it has nothing to do with the language I use. With a certain discipline you can develop clear, understandable and maintainable code in any language. It's all a matter of learning good habits like commenting the code (especially if you're not the only developer) or choosing meaningful long names for variables etc. It comes with experience. You can't blame a programming language for the lack of clarity in your code, just as you can't blame a natural language for its inappropriate use. If your Perl code is not beautiful you're doing it wrong - there is another, nice way.
Most people that complain about syntax have none or very little experience in Perl -- YAPC::EU::2009 - How Opera Software uses Perl presentation.
Perl is truly a language of freedom. It gives amazing power and has features that do not exist in other languages. Those powers can be used to create nice, tidy, clean and yet effective and concise code. Of course the same powers can be used to write obfuscated code but, again, this is not a language problem because it is also possible with other languages. This is best explained by the creator of Perl himself (emphasis added):
Let me state my beliefs about this in the strongest possible way. The very fact that it's possible to write messy programs in Perl is also what makes it possible to write programs that are cleaner in Perl than they could ever be in a language that attempts to enforce cleanliness. The potential for greater good goes right along with the potential for greater evil. A little baby has little potential for good or evil, at least in the short term. A President of the United States has tremendous potential for both good and evil.
I do not believe it is wrong to aspire to greatness, if greatness is properly defined. Greatness does not imply goodness. The President is not intrinsically "gooder" than a baby. He merely has more options for exercising creativity, for good or for ill.
True greatness is measured by how much freedom you give to others, not by how much you can coerce others to do what you want.
Larry Wall http://www.wall.org/~larry/pm.html
Reasons for using Perl are summarised in Why Perl?
Java
Just like Perl, Java is a subject of numerous myths misrepresenting its real position.
Despite commercial popularity there are multiple problems with the language:
* Poor memory management (garbage collection):
IMHO Java suffers from a garbage collection problem. If you don't allocate objects and maybe use only static methods, Java can be quite fast. But when you start creating huge amounts of objects (like required when working with Java's String class) its memory use and performance are getting worse and worse.
In theory GCs should be at least as fast as manual memory management or reference counting (which Python uses). Instead of wasting time for memory management while the program is working, it defers the memory management until the program is idle or it runs out of memory. Unfortunately on today's systems, memory is extremely slow and CPU cycles are cheap, and this is why the GC theory does not work. The Java VM constantly trashes the cache because it does not re-use memory fast enough. Instead it takes new (usually uncached) memory for new objects und defers freeing the unused memory of old objects (that are in the cache). This is probably the worst thing that you can do to the cache. A good VM would try to re-use memory as soon as possible, to increase the chances that it is still in cache (like Python's refcounter). Java does the opposite.
To make things worse, the VM seems to lack any coordination with the kernel. When the system is running out of RAM and needs to swap, the logical action for the VM would be to start the garbage collector. It doesn't however, and instead it starts allocating the new memory, forcing the kernel to move the old (unused) memory into the swap space! And when the VM finally decides to start the GC it will go through all the unused memory that is now in the swap, causing it be reloaded and possibly moving more frequently used memory back in the swap, only to re-load it again later. How much worse can it get?
-- Java has a GC problem, posted 10 Feb 2003 at 16:55 UTC by tjansen
Historically Java was successful partly because developers found it attractive compared to C due to "automatic" memory management. This turned out to be Java's greatest weakness. In C memory has to be managed by the developer, in contrast to Java where memory is usually managed by the systems administrator. In numerous papers explaining sophisticated garbage collection you may find dozens(!) of parameters for memory tuning. And trust me, because Java developers usually cannot predict an application's behaviour under load, the only reliable way to configure memory management for a particular application is to test, change parameter(s) and test again and again. Sometimes it helps. But the defaults are often not good enough, and it's too easy to make a mistake. Apart from configuring Java's "automatic" memory usage, developers can do very little. Java applications are handicapped by default.
* Verbosity:
Consider the following HTTP POST example:
Java:
    import java.net.URL;
    import java.net.HttpURLConnection;
    import java.io.DataOutputStream;
    import java.io.InputStream;
    import java.io.InputStreamReader;
    import java.io.BufferedReader;
    public class java_post {
        public static void main (String args[]) throws Exception {
            System.out.println(
                executePost("http://www.smh.com.au/execute_search.html", "text=fluoride")
            );
        }
        public static String executePost(String targetURL, String urlParameters){
            URL url;
            HttpURLConnection connection = null;
            try {
                //Create connection
                url = new URL(targetURL);
                connection = (HttpURLConnection)url.openConnection();
                connection.setRequestMethod("POST");
                connection.setRequestProperty("Content-Type", "application/x-www-form-urlencoded");
                connection.setRequestProperty("Content-Length", "" + Integer.toString(urlParameters.getBytes().length));
                connection.setRequestProperty("Content-Language", "en-US");
                connection.setUseCaches (false);
                connection.setDoInput(true);
                connection.setDoOutput(true);
                //Send request
                DataOutputStream wr = new DataOutputStream ( connection.getOutputStream ());
                wr.writeBytes (urlParameters);
                wr.flush ();
                wr.close ();
                //Get Response
                InputStream is = connection.getInputStream();
                BufferedReader rd = new BufferedReader(new InputStreamReader(is));
                String line;
                StringBuffer response = new StringBuffer();
                while((line = rd.readLine()) != null) {
                    response.append(line);
                    response.append('\r');
                }
                rd.close();
                return response.toString();
            } catch (Exception e) {
                e.printStackTrace();
                return null;
            } finally {
                if(connection != null) {
                    connection.disconnect();
                }
            }
        }
    }
Perl:
    #!/usr/bin/perl
    use LWP::UserAgent;
    my $ua = LWP::UserAgent->new;
    my $res=$ua->post(
        'http://www.smh.com.au/execute_search.html',
        {
            text=>'fluoride',
        }
    );
    print $res->is_success ? $res->content : $res->status_line;
Ruby:
    #!/usr/bin/ruby
    require "uri"
    require "net/http"
    x = Net::HTTP.post_form(URI.parse('http://www.smh.com.au/execute_search.html'),
        {
            'text' => 'fluoride',
        }
    )
    puts x.body
Python:
    #!/usr/bin/python -u
    import urllib, urllib2
    data = urllib.urlencode({
        'text' : 'fluoride',
    })
    req = urllib2.Request('http://www.smh.com.au/execute_search.html', data)
    response = urllib2.urlopen(req)
    print response.read()
PHP:
    <?php
    $postdata = http_build_query(
        array(
            'text' => 'fluoride',
        )
    );
    $opts = array('http' => array(
            'method' => 'POST',
            'header' => 'Content-type: application/x-www-form-urlencoded',
            'content' => $postdata
        )
    );
    $context = stream_context_create($opts);
    print file_get_contents('http://www.smh.com.au/execute_search.html', false, $context);
    ?>
June 2011 Update: Greg McLaghlan made a good point: "I think the Java verbosity example is a little misleading. If we compare it to the Perl example, you are loading Perl module which handles the http post whereas in the Java example you actually code that. It could be argued that that part of the code could have been packaged up and loaded just like the Perl module. It's a minor point I guess."
Yes, this is true, but I first chose the task and only then found out that the standard Java distribution does not come with HTTP POST helper methods.
Other languages have tools to help with a similar task in their standard distribution.
I believe it would be incorrect to involve 3rd party libraries in the comparison, however here you may find a Java example of HTTP POST using Apache libraries. It is 33 lines long (no empty lines) - about 40% shorter than the original Java example but nowhere near as compact as the other languages: the 2nd longest HTTP POST code is PHP - only 14 lines.
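The Python listing in this comparison targets Python 2 (`urllib2`). For readers on Python 3, a roughly equivalent sketch using only the standard library might look like the following. The helper names `build_post` and `post` are invented for this example, and the URL is simply the one used in the listings above; it is not known whether it still answers.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def build_post(url, fields):
    """Prepare (but do not send) a form-encoded POST request."""
    data = urlencode(fields).encode("ascii")     # e.g. b"text=fluoride"
    return Request(url, data=data, method="POST")


def post(url, fields, timeout=10):
    """Send the request and return the response body as text."""
    with urlopen(build_post(url, fields), timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


# Usage (performs a real network request, so it is left commented out):
# print(post("http://www.smh.com.au/execute_search.html", {"text": "fluoride"}))
```

Splitting the request construction from the sending is a choice of this sketch; it makes the part that needs no network easy to inspect.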
This is how one person expressed his frustration with Java verbosity in his blog:
Whenever I write code in Java I feel like I'm filling out endless forms in triplicate.
"Ok, sir, I'll just need your type signature here, here, and ... here. Now will this be everything, or..."
"Well, I might need to raise an exception."
The compiler purses its lips. "An exception? Hmmm... let's see.... Yes, I think we can do that... I have the form over here... Yes, here it is. Now I need you to list all the exceptions you expect to raise here. Oh, wait, you have other classes? We'll have to file an amendment to them. Just put the type signature here, here, ... yes, copy that list of exceptions....
And one of the comments from the above blog's discussion (there are some other comments worth reading):
I think the problem with Java is not it's verbosity, but as someone else said "infrastructure framework". I have to go through so many classes, through so much leaps and bounds, to do anything.
I need a factory, to create a manager. Then I factor anther factory, to create a stream, then assign that stream to the manager. Afterwords, I give the manager to a dispatcher.
Then there is an uncaught exception and I have to sift through 50 lines of junk to actually find out what went wrong.
Verbosity is bad because code is read more times than it is written, therefore verbosity increases the effort needed to maintain code.
Java verbosity hurts both maintenance and development.
Usually Java developers claim Java code is easier to develop/maintain. I failed to discover any particular Java language feature to support that claim. Java makes developers do a lot of work even for the simplest tasks.
Java's bad performance and memory usage are not compensated by any particular language feature(s).
It is far behind other languages in both performance and memory usage/management.
The time needed to tweak and test memory management, together with the maintenance and troubleshooting effort, is horrifying.
Personal experience:
The results of this testing are consistent with my personal experience.
Over the years I was involved in several projects where all Java applications demonstrated miserable performance while having tremendous system requirements.
Once, on a public-facing web site, I found a problematic Java servlet more than 1000 lines long. Unable to fix it, I couldn't think of a better solution than to rewrite it from scratch in a different language.
In several days I produced a ~200 line Perl application running up to 10 times faster than the original Java application. Numerous bugs were fixed in the process, and the new version was easier to debug and had some improvements and new features.
I can't recall a single Java application server which doesn't degrade. Apparently they all leak memory, so sooner or later they have to be restarted. (I'd like to believe there are exceptions somewhere.)
As a matter of fact, restarting Java application servers is common practice in the industry, however it appears that only Java really needs it. It seems unnecessary for stable software like the Apache web server, which can run for years without a restart. I rarely let a busy Java application run longer than a week, while web-facing Java application servers are restarted nightly.
Another example vividly demonstrates the problems with Java's memory management: once I found that a particular web-facing Java application could handle no more than 24 simultaneous requests. (You may suspect it was running on an old/virtualized server, but it was really a relatively up-to-date machine: 8 x Intel(R) Xeon(R) CPU L5420 @ 2.50GHz/RAM 6 GiB/CentOS 5.5 GNU/Linux system.) After days of tweaking and testing we found that capacity could be increased (doubled) by allocating more memory, but this negatively affected response time. Too little memory is not enough; too much and garbage collection chokes.
A ridiculous solution was found: to farm Java application servers on the very same hardware, to give each just enough memory and to restrict the maximum number of simultaneous connections per backend on the load balancer. Needless to mention, this "solution" cost a great deal of effort - to set up, test, tweak memory parameters, test again etc.
Later the developers managed to optimise the application a little, but two or more Java application servers per physical server still work better than one.
Because of the history of degradation, each Java application server in the farm runs no longer than 24 hours - they are all restarted overnight in a round-robin manner. (Believe me, it's much better than waking up at 3:00 just to do the monkey's job of restarting another Java application server which stopped responding.) That much effort is needed only to ensure the system's normal functioning.
Remarkably, this service is hosted on 8 HP ProLiant G6 servers, each with two quad-core Intel(R) Xeon(R) CPUs - 64 CPUs (cores) in total, and 72 GiB of RAM. With a database of only 1 GB the whole system can respond to merely ~180 simultaneous HTTP requests (less than 3 visitors per CPU core and only 2.5 connections per GiB of RAM) - a tremendous waste of resources.
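The ratios can be checked from the figures quoted above (8 servers with two quad-core CPUs each, 72 GiB of RAM, about 180 simultaneous requests). A quick sketch of the arithmetic, with variable names chosen only for this example and assuming the 72 GiB is the total across all servers:

```python
servers = 8
cores = servers * 2 * 4              # two quad-core CPUs per server
ram_gib = 72                         # assumed to be the total across all servers
simultaneous_requests = 180

requests_per_core = simultaneous_requests / cores     # 180 / 64
requests_per_gib = simultaneous_requests / ram_gib    # 180 / 72
gib_per_request = ram_gib / simultaneous_requests     # 72 / 180

print(f"cores in total:        {cores}")
print(f"requests per core:     {requests_per_core:.2f}")
print(f"requests per GiB RAM:  {requests_per_gib:.2f}")
print(f"GiB RAM per request:   {gib_per_request:.2f}")
```

That is 64 cores, a little under 3 requests per core, and 2.5 requests per GiB of RAM, which is the same as roughly 0.4 GiB of RAM per request.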
I remember several cases when a new Java application release reduced backend capacity (surprisingly, the release/QA team wasn't aware), so during peak hours servers were collapsing, unable to sustain the load, because the load balancer was configured to allow more connections to the backends than they could handle. Sometimes allowing just 20 fewer requests makes a difference.
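The per-backend connection cap described above is normally a load balancer setting, and the actual configuration used is not shown here. Purely to illustrate the mechanism, this Python sketch models such a cap with a semaphore; the class `BackendLimiter` and its methods are hypothetical.

```python
import threading


class BackendLimiter:
    """Allow at most `max_connections` requests in flight for one backend."""

    def __init__(self, max_connections):
        self._slots = threading.BoundedSemaphore(max_connections)

    def try_acquire(self):
        """Take a slot if one is free; return False instead of queueing the request."""
        return self._slots.acquire(blocking=False)

    def release(self):
        """Give the slot back once the backend has answered."""
        self._slots.release()


if __name__ == "__main__":
    limiter = BackendLimiter(max_connections=2)
    print([limiter.try_acquire() for _ in range(3)])   # the third request is refused
```

Refusing (or routing elsewhere) the request that exceeds the cap is what protects the backend; setting the cap above what the backend can sustain brings back the collapse described above.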
Another interesting problem was discovered when about 2500 MiB was allocated to the JVM on the x86 platform: Resin (a Java application server) was crashing under load, sometimes every hour if the load was high enough. Apparently this was caused by a lack of addressable memory space, not for the application, which had its 2500 MiB pre-allocated, but for the Java runtime itself, which on some occasions tried to allocate memory for its internal needs and failed.
Java - summary
As you may see from this research, in all three categories Java behaves extremely badly, like no other language.
Java applications cannot match a fraction of other languages' performance.
Java applications are truly the most expensive in development and administration.
Java needs more system resources, i.e. more memory and more processing power. Usually more servers and therefore more electricity are needed, i.e. Java is not environment-friendly.
Fragile Java application servers need to be restarted periodically.
Unnecessary sophistication creates more points of failure, so the availability of Java web applications is usually not very impressive.
Making high-quality Java code and running it in a well-optimised environment requires tremendous effort and experience. Even then performance and capacity will be a fraction of those of a similar system implemented in a different language. By simply using a different programming language the same result can be achieved with less effort in development, debugging, maintenance and administration. Fortunately there is a good choice of mature languages to use. Nowadays, in 2011, there is nothing you can do in Java that cannot be done in other languages.
No matter which other *mainstream* language you choose, your applications and experience will benefit from switching.
Even if your Java skills are profoundly good, your only excuse to use Java is personal convenience. Lack of experience with other languages should be a motivation to learn rather than an excuse for using Java. Everyone will benefit from better applications written in other language(s).
Java is a disaster. A disease. It is rooted so deeply in the industry that it is hard to escape while ignorant architects keep pushing it. Java is a trap for system architects and managers who know no other languages. Typically they do not understand Java's weaknesses and tend to overuse it because that's "the only tool" for the job. Those people should learn. The blind belief that Java is universal and good for any job simply can't be more wrong. Java is not suitable for *anything*.
When starting a new project you can hardly consider seriously writing it in Lua or tcl. However those languages beat Java in speed/RAM usage. Saying that Java is as suitable for a job as tcl/Lua would be a compliment to Java. The gap between Java and other languages is so huge that it would be a good idea to avoid Java whenever possible, regardless of how familiar you are with the language.
More information about Java problems and weaknesses can be found in excellent Sean Kelly's videos:
Recovery from Addiction
Better Web App Development
Java Quotes:
- "If Java had true garbage collection, most programs would delete themselves upon execution."
- -- Robert Sewell
- "Complexity kills. It sucks the life out of developers, it makes products difficult to plan, build and test, it introduces security challenges, and it causes end-user and administrator frustration."
- -- Ray Ozzie
- Java is the SUV of programming tools. A project done in Java will cost 5 times as much, take twice as long, and be harder to maintain than a project done in a scripting language such as PHP or Perl. ... But the programmers and managers using Java will feel good about themselves because they are using a tool that, in theory, has a lot of power for handling problems of tremendous complexity. Just like the suburbanite who drives his SUV to the 7-11 on a paved road but feels good because in theory he could climb a 45-degree dirt slope.
- -- Greenspun, Philip
- Java: write once, run away!
- -- Cinap Lenrek
- Java is like a variant of the game of Tetris in which none of the pieces can fill gaps created by the other pieces, so all you can do is pile them up endlessly.
- -- Steve Yegge (2007, Code's Worst Enemy)
- JAVA truly is the great equalizing software. It has reduced all computers to mediocrity and buggyness.
- -- NASA's J-Track web site
- Using Java for serious jobs is like trying to take the skin off a rice pudding wearing boxing gloves.
- -- Tel Hudson
Conclusion
To pick the right tool for a job it is important to understand how programming languages compare to each other. A tricky decision is easier to make if you consider the right things while avoiding the irrelevant ones.
There are some things irrelevant to a good decision:
- Your favourite language at the moment.
- You may be very good and comfortable with the language you already know, but this is not a good enough excuse for not considering alternatives. Learning is important.
- Language creator(s) personality.
- It simply doesn't matter if you like them or not or even who they are.
- Your expectations regarding language features.
- It always takes time to get used to new things, especially if they are quite different.
- Speed of learning.
- Some languages have a short *startup* learning curve. However in reality it is more like "A minute to learn, a lifetime to master". This idea is best explained by Peter Norvig in his Teach Yourself Programming in Ten Years essay.
There are some things to avoid:
- Considering one single language feature alone.
- Considering only one language feature, like speed or memory usage, will inevitably lead to wrong decision.
- Narrow purpose languages.
- Specialised languages like PHP may be good for web development only. When you need to do something different or simply extend the task's scope, a language for particular use only may not be good enough.
- Non-portable languages.
- Cross-platform portability matters. Too many people locked-in, stuck with windows-only technologies with only little hope of escaping.
- Non-free license.
- Non-free licenses comes with risks and restrictions.
There are some valuable things to consider:
- Availability of reusable code
- Even the best language in the world worth little without good free libraries.
- Free license.
- Freedom is very important, even if you don't fully understand why.
- Universal languages.
- Universal languages like Perl5 are generally good for pretty much any task. Universal languages are more powerful by definition which makes your skills universal.
- Well-portable languages
- Some time later software may be ported to different platform or operating system. Portability guarantees choice. Choice is good.
- "Feels good" feeling
- Essentially, your feelings towards a language are the ultimate measure of how good it is for you. For example, not everyone can be comfortable with Python, but if you're OK with it you can tell from how comfortable it feels. Coding is fun if you like the language. Fun helps to make better programs.
FAQ.
- You deliberately made this test tough for Java! Java is not optimised for strings.
- The key words here are "not optimised". (Apparently the test was tough only for Java.) OK, if Java is not optimised for strings, please let me know what exactly Java is optimised for.
- You deliberately chose string manipulation to show Java's weakness.
- Not quite... As I explained in the beginning, I believe strings are a good test subject for comparison. I did expect that Java wouldn't be the winner, but I certainly didn't expect such miserable performance. Initially there was no Java in this testing - it was added later.
- It's no surprise that Java is slow.
- Even if you already knew it's slow, did you know about performance degradation and garbage collection problems? Did you know HOW slow it is? Honestly?
- Java is so slow because strings are immutable in Java.
- Immutable strings are not unique to Java. For example, strings are also immutable in Python. Python performed very well in this testing.
- Java's internal string representation in memory is UTF-16, so Java has to do more work compared to a single-byte representation.
- This may be the case for other languages as well. However, this does not explain why Java's performance is so much worse. If it affects Java's test results, it may be one of those differences I'm trying to emphasise. Please note that only Latin characters were used in this test. Other languages support Unicode as well. The test case is based on defaults, so no encoding was explicitly chosen and UTF support was neither explicitly disabled nor enabled.
- What's wrong with Java?
- Well, everything. :( Read the gory details above. In short, Java's biggest problems are inefficient garbage collection and verbosity. Unfortunately those problems are not compensated for by any language features. Java's language features look poor compared to other languages. Java development and maintenance require a great deal of effort.
- You shouldn't write real code like this.
- True, but that's test code, remember? It's made slow deliberately, to produce computational load for comparison. The job could be done a hundred times faster if optimised. Pretty much any artificial test would be quite different from reality. However, even if the test code doesn't look like a real application, it clearly reflects problems that manifest in real applications.
- Java works for some companies.
- We may disagree on the definition of "works". Sometimes the definition is quite loose - I was once told that for a production web site a 2% rate of request timeouts is acceptable for business. (Yes, it was a web-facing Java application, of course.) I believe any number of timeouts for a public-facing web site is intolerable. If Java is not expected to perform well, we may have a double standards problem. If you look at companies that successfully maintain sophisticated Java services, you may find that most of them are big companies with virtually unlimited resources. If you can have as many servers as you want, as much staff as you want and as much time as you want, you can make anything work, but at what cost? Big companies may have the luxury of being inefficient. Java may work for you if your survival doesn't depend on your effectiveness.
- Why do you devote so much attention to Java and Perl and so little to Python and Ruby?
- I work in an environment where Java dominates. At the same time, Java and Perl are the most misunderstood languages around. In the minds of many developers, managers and system architects Java stands unjustifiably high while Perl is usually treated badly. Because the industry in general is so set in these opinions, I believe some explanation is necessary. There are not as many myths regarding Python and Ruby, and their features are not so controversial. Perhaps if I were more competent with Python and Ruby I would have more to add.
- Java is good, I know how to make great applications with Java.
- Great, you must be very talented, because an ordinary developer needs great effort and experience to overcome the numerous problems of developing in Java. If you have to be a genius to create good and reliable Java applications, it's simply too difficult for mere mortals, i.e. for most developers. (The author of this article considers Java too difficult for himself.) Unfortunately Java's problems, like garbage collection, exist even for well-written programs. Those problems aside, Java requires considerably greater effort than some other languages to achieve the same result. You may have better productivity with a different language.
- My Java application works well.
- Probably it barely does anything, or it is not loaded enough to show performance degradation. That's typical when no more than a few people use the application at the same time or when the application is extremely simple.
- Should we choose Java for our new project?
- By all means, if:
- you want to sabotage the project
- you want it to be as expensive as possible
- speed of development doesn't matter
- product quality doesn't matter
- developers refuse to learn
- you didn't read/understand this article.
- What about .NET?
- .NET (dot net) is not very portable, so it doesn't satisfy the criteria for choosing a language. Because it has so much to do with Windows and Microsoft, I see no reason to consider .NET regardless of its features or performance. Quoting Oktal: "I think Microsoft named .Net so it wouldn't show up in a Unix directory listing." .NET's license is not free, which raises an ethical issue as well. There is no reason whatsoever to work with a non-free language. As a matter of fact, its proprietary nature is a strong argument against .NET.
- You've just started another flame war.
- No, I have not. The results of the testing speak for themselves, even without examples from my personal experience. I have no intention of softening Java's embarrassing performance to make Java users feel better. If your favourite language wasn't the best in this testing, perhaps you may benefit from learning something else, and this article aims to encourage such learning. Learning, if done right, leads to better decisions. We need better decisions because the industry will benefit from them. Sadly, too many people who were taught Java at university know too little about other languages to make good decisions. From my experience I know that Java professionals sometimes take the results of this testing personally. That is good, because it is natural to feel outrage on learning how poor their programming language is compared to others. It is good because this outrage may encourage learning, which eventually helps to create better applications.
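As a small illustration of the FAQ answer about immutable strings: the sketch below is not part of the benchmark code. It only shows, in Python, what "immutable" means in practice. The function name grow and the sample values are made up for this example.

```python
# Illustration only (not the benchmark code): Python strings are immutable.
# "Growing" a string yields a new value; existing references never change.

def grow(chunk: str, target_len: int) -> str:
    """Append `chunk` repeatedly until the result is at least target_len long."""
    s = ""
    while len(s) < target_len:
        # s is rebound to the concatenated result; no other reference
        # to the previous value can observe a change.
        s += chunk
    return s

original = "abc"
alias = original        # second reference to the same string
original += "def"       # original now refers to a different string value

print(alias)            # the old value is untouched
print(original)

try:
    alias[0] = "X"      # in-place modification is rejected
except TypeError as exc:
    print("TypeError:", exc)

print(len(grow("0123456789", 1000)))
```

Appending in a loop still works on immutable strings; the interpreter simply produces a new value at each step, which is the same kind of work the test asks of every language.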
Credits
I'm indebted to my patient colleagues who kindly provided important feedback and criticism for this research.
I'm grateful to my family - numerous times they had to go out without me when they couldn't separate me from the computer.
I'm obliged to my manager, who tolerated discussions related to this research and somehow partially inspired it.
Lastly, I'm thankful to Cityrail for providing reasonable comfort, which makes it possible to work on trains while travelling to and from the city.
Links
- Test Driven Development presentation
- Languages:
- Software used
If you found this essay interesting please donate below to support the author.
Comments are moved to
http://raid6.com.au/~onlyjob/posts/arena/#comments
Please update your links.