I agree with the previous poster, christantillo: the number of nodes a solution needs may not be a perfect test on its own. But I also find it reasonable to go with what Paul Graham points out in his own comments: it is a good measure.
Having said that, I must say that the Arc solution is quite readable, while some of the other solutions are harder to understand. This is a very subjective statement (and probably says more about my incompetence in some languages than about my competence in others), but if enough people share this opinion, it may be a useful observation.
I do have some comments on the actual task, though: doesn't the challenge play to a particular strength of Arc? And isn't there some dependence on the available libraries? If a language lacks a powerful library for the domain in which the benchmark takes place, it will naturally look bad.
I could come up with a task that would be ugly in Arc but plays to the strength of a particular library in a different language. I could then replicate that library in Arc first (ignoring how long and complicated that might be) and only afterwards count the length of the Arc solution that uses the library.
But then I am not sure if this would still be a fair comparison.
[Edit: After trying to write a word in a non-English language (German, in my case), I could, for example, suggest an extended challenge that accepts the answer in any language. (Actually, some solutions in this challenge already work "correctly" in this sense.) The Arc solution would then indeed require me to write a complicated library first, and it would no longer be competitive length-wise.]
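To make the node-count measure from the top of this comment a bit more concrete, here is a minimal sketch in Python. It is only my own illustration, not how the challenge entries were actually counted: `count_nodes` is a name I made up, and it counts every node Python's `ast` module produces (including context and operator nodes), so the absolute numbers are not comparable to the counts quoted for Arc. It only shows that "size" here means tree size, not characters or lines.

```python
import ast


def count_nodes(source: str) -> int:
    """Count every node in the Python syntax tree of `source`.

    Assumption: all ast nodes count equally, including context
    nodes such as Load/Store and operator nodes such as Add.
    """
    tree = ast.parse(source)
    return sum(1 for _ in ast.walk(tree))


# Two ways to write the same computation; the second builds a larger tree.
short = "total = sum(xs)"
longer = "total = 0\nfor x in xs:\n    total = total + x"

print(count_nodes(short), count_nodes(longer))
```

Note that the shorter version wins only because `sum` already exists, which is exactly the library dependence I am worried about above.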