Python isn’t always slower than C

I use Python for its libraries. Django and friends make building powerful websites very simple, but I’ve never considered Python operationally fast. It doesn’t need to be; as long as I can generate and return a page within 300ms of the request I’m happy, and it’s often done much quicker than that.

It’s Fast Enough™. That’s true of most modern server-side languages.

But yesterday a Unix.SE text-processing question popped up. The problem was fairly simple. Read a file of numbered, variable-length lines of DNA sequences:

1    ATTGACTAGCATGCTAGCTAGCTGACGATGCGA
2    GCTGACTGACTAGCTAGCATCGACTG
3    TAGCTGCTAGCTGCTGACTGACTAGCTAGC

And write the DNA part of each line to a file named after the leading number, with a .seq extension.

All the usual suspects (awk, sed and a bash loop) were already there making trouble, so I decided to add some non-conventional implementations and a benchmark. The hypothesis was that when you’re chunking through thousands or millions of lines and making just as many write operations, it helps to stick to one environment and fork out less. Amongst my implementations was one for C and one for Python.

The Python version is compact and self-explanatory. Open the file, iterate the lines, split the line on whitespace and write accordingly.

with open("infile", "r") as f:
    for line in f:
        id, dna = line.split()
        with open(id + ".seq", "w") as fw:
            fw.write(dna)

…While in C you have to be a lot more exhaustive. strtok is uncharacteristically useful, but you need to tell the system which memory you’re reading into and then munge it around just to concatenate the extension onto the filename.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

FILE *fp;
FILE *fpout;

char line[100];
char *id;
char *dna;
char *fnout;

int main(void) {
    fp = fopen("infile", "r");
    while (fgets(line, 100, fp) != NULL) {
        /* Split the line into the number and the sequence.
           dna keeps the trailing newline that fgets read. */
        id = strtok(line, "\t");
        dna = strtok(NULL, "\t");

        /* Room for the id, ".seq" and the terminating NUL. */
        fnout = malloc(strlen(id) + 5);
        strcpy(fnout, id);
        strcat(fnout, ".seq");

        fpout = fopen(fnout, "w");
        fprintf(fpout, "%s", dna);
        fclose(fpout);
        free(fnout);
    }
    fclose(fp);
    return 0;
}
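
A benchmark like this needs a big input file in the format shown earlier. Here is a minimal sketch for generating one. The function name, the tab separator (chosen to match the C version's strtok call), the sequence lengths and the fixed seed are all my assumptions, not the file used for the numbers in this post. Lines are kept short so they fit the 100-character buffer in the C code.

```python
import random


def write_test_input(path, n_lines=100_000, min_len=20, max_len=40, seed=0):
    """Write n_lines of '<number><TAB><sequence>' records to path.

    Hypothetical helper: lengths and seed are illustrative defaults.
    """
    rng = random.Random(seed)  # fixed seed so repeated runs produce the same file
    with open(path, "w") as f:
        for i in range(1, n_lines + 1):
            length = rng.randint(min_len, max_len)
            dna = "".join(rng.choices("ACGT", k=length))
            f.write(f"{i}\t{dna}\n")
```

Calling write_test_input("infile") would produce a file that both listings above can read.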

So is Python or C faster?

C, obviously. Over a 100,000-line input file, C was 1.3x faster than Python…

But that’s CPython, which is what I get if I run python on Ubuntu. It isn’t the only Python runtime available. Amongst the contenders, PyPy is probably the fastest. It’s a highly optimised reimplementation of almost everything in the Python specs. For most people this means it’s a drop-in alternative. It’s also pretty simple to get the latest copy on Ubuntu:

sudo add-apt-repository ppa:pypy/ppa
sudo apt-get update
sudo apt-get install pypy

pypy my-python-file.py
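
If you want to be sure which runtime is actually executing a script, the standard library can tell you. A small sketch (describe_runtime is a name I made up for illustration):

```python
import platform
import sys


def describe_runtime():
    """Return a short string naming the interpreter running this script."""
    impl = platform.python_implementation()  # e.g. "CPython" or "PyPy"
    version = ".".join(str(part) for part in sys.version_info[:3])
    return f"{impl} {version}"


print(describe_runtime())
```

Run it once with python and once with pypy and the first word of the output should differ.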

Back to the benchmark…

PyPy is 1.03x faster than my C

Clearly not that much faster, but the code is also a lot simpler. We have Python running at the same speed as C. That in itself is fairly amazing.

I did go on to write a nice C++ option that was slightly faster again and wasn’t too bad on the eye, but it’s still a lot more involved than the Python is. I know I’m obviously biased toward Python, but it’s well worth reaching for as a first choice, especially for simple, scrappy text-processing jobs like this.

If you already have Python code that you’re considering switching to C (modules or full-on), give PyPy a shot first. Worst thing that’ll happen is it won’t work or it’s still not fast enough.
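
One rough way to find out is to time the same script under both interpreters. This is a sketch, not the harness behind the numbers above: best_time and compare are hypothetical helpers, and it assumes python3 and pypy are both on your PATH. It measures whole-process wall-clock time, so interpreter start-up and PyPy's JIT warm-up are included.

```python
import subprocess
import time


def best_time(cmd, repeats=3):
    """Run cmd `repeats` times; return the fastest wall-clock time in seconds."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        subprocess.run(cmd, check=True)  # raises if the command fails
        timings.append(time.perf_counter() - start)
    return min(timings)


def compare(script, interpreters=("python3", "pypy"), repeats=3):
    """Return {interpreter: best time in seconds} for running script."""
    return {exe: best_time([exe, script], repeats) for exe in interpreters}
```

Something like compare("my-python-file.py") then gives you one number per interpreter to look at. Taking the best of a few runs smooths over disk-cache noise, which matters for a job that is mostly IO.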

But C gets the last laugh if you have time and expertise

My C implementation uses the default 4K buffering in both directions while Python and C++ are using 8K buffers. If you increase the read buffer to 8KB and remove the write buffer, C takes a convincing lead. Sincere thanks to Julian for optimising the C version.
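
The same two knobs exist in Python through the buffering argument to open(). The sketch below is not the optimised C version, and I make no claim that it is faster than the original Python; it only shows where the buffer sizes are set. split_sequences and its out_dir parameter are names I've invented, and binary mode is used because Python only allows buffering=0 there.

```python
import os


def split_sequences(infile="infile", out_dir=".", read_buffer=8192):
    """Write the sequence from each numbered line to <number>.seq in out_dir."""
    # An integer > 1 sets the size of the read buffer in bytes.
    with open(infile, "rb", buffering=read_buffer) as f:
        for line in f:
            seq_id, dna = line.split()  # bytes.split() splits on whitespace
            name = os.path.join(out_dir, seq_id.decode("ascii") + ".seq")
            # buffering=0 disables the write buffer (binary mode only).
            with open(name, "wb", buffering=0) as fw:
                fw.write(dna)
```

Each output file gets exactly one write, so there is little for a write buffer to do, which is presumably why dropping it helps in C.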

So C is faster still, but only if you have the knowhow (or put in the effort to find the problem during profiling) and then know how to fix it… And all the time associated with that. Python (and C++ to a large extent) are furiously easy languages to convey what you mean in, without wasting mental cycles on hyperoptimising silly things like IO.