On the vast majority of systems, Python takes substantially longer to begin running (I believe because more files are loaded at startup). On my machine, interpreter startup accounts for about half the total wall clock time, 30% of the user time, and most of the system time.
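To check how much of the difference is plain interpreter startup on a given machine, one option is to time each interpreter running an empty program. The following is a minimal sketch, not a rigorous benchmark: the helper name `startup_seconds` and the choice of five runs are assumptions made for illustration, and the Perl measurement only runs if a `perl` binary is found on the PATH.

```python
import shutil
import subprocess
import sys
import time


def startup_seconds(cmd, runs=5):
    """Return the best wall-clock time (in seconds) over `runs` launches of cmd.

    `cmd` should be a command that starts an interpreter and exits at once,
    so the elapsed time is dominated by startup cost.
    """
    best = float("inf")
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run(
            cmd,
            check=True,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        best = min(best, time.perf_counter() - start)
    return best


if __name__ == "__main__":
    # Python running an empty program.
    print("python startup:", startup_seconds([sys.executable, "-c", "pass"]))
    # Perl running an empty program, only if perl is installed.
    if shutil.which("perl"):
        print("perl startup:  ", startup_seconds(["perl", "-e", "1"]))
```

Taking the best of several runs reduces noise from disk caching; the absolute numbers will vary between systems.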
The Python script spends most of its time on for line in fh; the Perl script spends most of its time in if($_ eq "1000000").

Now, I know that Perl and Python have some expected differences. For instance, in Perl, I open the filehandle using a subprocess to the UNIX gzip command; in Python, I use the gzip library.

What can I do to speed up the Python implementation of this script (even if I never reach the Perl performance)? Perhaps the gzip module in Python is slow (or perhaps I'm using it in a bad way); is there a better solution?

EDIT #1: Here's what the read_million.py line-by-line profiling looks like.

I have now also tried the subprocess Python module, as per Strauser and others.
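The two ways of opening the file mentioned here (the gzip library versus a pipe from the UNIX gzip command) can be compared side by side. This is a minimal sketch in Python 3, not the original read_million.py: the function names, the sample file name, and the `gzip -dc` invocation are assumptions chosen for illustration, and the subprocess variant requires a `gzip` binary on the PATH.

```python
import gzip
import shutil
import subprocess
import time

TARGET = "1000000"


def make_sample(path, n=1_000_000):
    """Write the numbers 1..n, one per line, to a gzipped text file."""
    with gzip.open(path, "wt") as fh:
        fh.writelines(f"{i}\n" for i in range(1, n + 1))


def find_with_gzip_module(path, target=TARGET):
    """Scan the file with the gzip module; return the matching line number."""
    with gzip.open(path, "rt") as fh:
        for lineno, line in enumerate(fh, 1):
            if line.rstrip("\n") == target:
                return lineno
    return None


def find_with_subprocess(path, target=TARGET):
    """Scan the output of an external `gzip -dc` process read through a pipe."""
    proc = subprocess.Popen(
        ["gzip", "-dc", path], stdout=subprocess.PIPE, text=True
    )
    try:
        for lineno, line in enumerate(proc.stdout, 1):
            if line.rstrip("\n") == target:
                return lineno
        return None
    finally:
        # Close our end of the pipe and reap the child, even on early return.
        proc.stdout.close()
        proc.wait()


def timed(func, *args):
    """Return (result, elapsed_seconds) for a single call to func."""
    start = time.perf_counter()
    result = func(*args)
    return result, time.perf_counter() - start


if __name__ == "__main__":
    import os
    import tempfile

    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "sample.txt.gz")  # hypothetical file name
        make_sample(path)
        print("gzip module:    ", timed(find_with_gzip_module, path))
        if shutil.which("gzip"):
            print("gzip subprocess:", timed(find_with_subprocess, path))
```

With the subprocess variant, decompression runs in a separate process, so it can overlap with the line loop in the Python process. Whether that closes the gap with Perl depends on the Python version and the machine, so it is worth timing both on the actual data.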
PYTHON VS PERL CODE
I have a gzipped data file containing a million lines:

$ zcat million_ | head

My Perl script which processes this file is as follows:

# read_
print "This is the millionth line: Perl\n"

The Python script prints:

print "This is the millionth line: Python"

For whatever reason, the Python script takes almost 8x longer:

$ time perl read_
$ time python read_million.py

I tried profiling both scripts, but there really isn't much code to profile.