Somewhat off-topic, but...
Out of curiosity, I played around with those examples a bit.
I had to slightly modify all of them, first to get Perl and Ruby to run
at all, and then to make sure all of them produce tabs instead of spaces.
Btw: The Perl version adds an extra tab at the end of each line.
perl -anF'\t|\n' -e'$n=@F-1 if \!$n;for(0..$n){push@{$$m[$_]},$F[$_]} END{print map{join"\t",@$_,"\n"}@$m}'
python -c "import sys; print('\n'.join('\t'.join(c) for c in zip(*(l.split() for l in sys.stdin.readlines() if l.strip()))))"
ruby -e 'puts readlines.map(&:split).transpose.map(){|x|x*"\t"}'
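About that extra tab: in the Perl one-liner, "\n" is handed to join as one more list element, so a tab separator lands in front of it. A small Python sketch of the same effect (the sample row is made up for illustration):

```python
# Hypothetical sample row, only for illustration.
row = ["a", "b", "c"]

# What the Perl code effectively does: the newline is joined as one more
# list element, so a tab separator ends up in front of it.
with_extra_tab = "\t".join(row + ["\n"])

# Joining the fields first and appending the newline afterwards avoids that.
without_extra_tab = "\t".join(row) + "\n"

print(repr(with_extra_tab))     # 'a\tb\tc\t\n'
print(repr(without_extra_tab))  # 'a\tb\tc\n'
```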
The results (timings at the end) show, as expected, that this kind of
comparison is utterly meaningless. There are simply too many factors out
of your control that can influence the result. Between Perl and Python,
the executables that happen to be installed on my box give pretty much
the opposite result from what you reported. Maybe my Python is in better
shape than yours, because it gets more exercise...
I'm actually a bit surprised by the bad performance of Perl. One of
the reasons may be the suboptimal algorithm, which explicitly loops
through the data in the interpreter. The other two use a functional
approach, where the heavy lifting is handled in C. Ruby has a slower
startup time, but its operating performance is much closer to Python.
I also didn't expect that much of a speed-up with Python 3 over 2.
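As a rough illustration of the difference between looping in the interpreter and letting a builtin do the work, here is a sketch in Python. The function names and the 200 x 200 test matrix are made up for this example; it is not the code behind my timings, and the numbers it prints will vary from box to box.

```python
import timeit

def transpose_loop(rows):
    """Transpose with explicit loops, all executed by the interpreter."""
    cols = []
    for row in rows:
        for i, field in enumerate(row):
            if i == len(cols):
                cols.append([])
            cols[i].append(field)
    return cols

def transpose_zip(rows):
    """Transpose with zip(), which iterates in C under CPython."""
    return [list(col) for col in zip(*rows)]

# Made-up test data: a 200 x 200 matrix of short strings.
rows = [[str(r * 200 + c) for c in range(200)] for r in range(200)]

# Both approaches must agree before timing them makes any sense.
assert transpose_loop(rows) == transpose_zip(rows)

for func in (transpose_loop, transpose_zip):
    seconds = timeit.timeit(lambda: func(rows), number=20)
    print("%-15s %.4f s" % (func.__name__, seconds))
```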
The Python version is easy to understand, once you know that the
builtin function zip(), called with the rows unpacked as zip(*rows),
is equivalent to Ruby's transpose. The rest is IO and string
manipulation. You also may not be familiar with generator
expressions. Very powerful stuff!
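Spelled out over a few lines, the Python one-liner looks roughly like this. The function name transpose_lines and the sample input are only there for illustration; the one-liner reads from sys.stdin instead.

```python
import io

def transpose_lines(lines):
    # Generator expression: lines are split lazily, blank lines are skipped.
    rows = (line.split() for line in lines if line.strip())
    # zip(*rows) unpacks the rows as arguments and yields one tuple per column.
    columns = zip(*rows)
    # Join the fields of each column with tabs, and the columns with newlines.
    return "\n".join("\t".join(col) for col in columns)

# Made-up sample input standing in for sys.stdin.
sample = io.StringIO(u"1\t2\t3\n4\t5\t6\n")
print(transpose_lines(sample))
```

Note that zip() stops at the shortest row, so ragged input gets silently truncated.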
Timings in seconds:

150 KB (160 x 160 matrix)
  0.2   perl 5
  0.1   python 2.7
  0.02  python 3.4
  0.4   ruby

6 MB (1000 x 1000 matrix)
  1.02  perl 5
  0.36  python 2.7
  0.22  python 3.4
  0.48  ruby 2.1

24 MB (2000 x 2000 matrix)
  4.11  perl 5
  1.74  python 2.7
  1.13  python 3.4
  2.41  ruby 2.1

Cheers
-schorsch
On 2016-04-13 19:01, Christopher Rush wrote:
For a trivial example, transposing a matrix in a tab-delimited data
file: which of these is the most easily understood, reproducible, and
fastest? Also note, I couldn't find a working Ruby example...
perl -anF'\t|\n' -e'$n=@F-1if!$n;for(0..$n){push@{$$m[$_]},$F[$_]} END{print map{join"\t",@$_,"\n"}@$m}'
^^ this takes a quarter of a second on 148K file, and doesn't look
particularly clean but I could probably figure it out by researching
the documentation
python -c "import sys; print('\n'.join(' '.join(c) for c in zip(*(l.split() for l in sys.stdin.readlines() if l.strip()))))"
^^ this takes half a second on 148K file but looks a bit baffling to me
awk '{for (f=1;f<=NF;f++) col[f] = col[f]":"$f} END {for (f=1;f<=NF;f++) print col[f]}' | tr ':' ' '
^^ this takes 2 seconds on 148K file which isn't very good, but
probably the easiest to interpret by eye, in my opinion
ruby -e 'puts readlines.map(&:split).transpose.map{|x|x*" "}'
^^ I couldn't make this or any other Ruby examples I found online
work, which might mean I have a basic misunderstanding of how to type
this in a single-line workflow on the terminal. And all the examples I
could find look like a black box to me because apparently there are
functions built in to do this task.
echo '' >tmp1; cat m.txt |while read l ; do paste tmp1 <(echo $l | tr -s ' ' \\n)>tmp2; cp tmp2 tmp1; done
^^ this series of commands is basically disqualified because it takes
far too long
Taken from these threads:
parsing - An efficient way to transpose a file in Bash - Stack Overflow
matrix - Transpose in perl - Stack Overflow
--
Georg Mischler -- simulations developer -- schorsch at schorsch com
+schorsch.com+ -- lighting design tools -- http://www.schorsch.com/