Index ¦ Archives ¦ Atom ¦ RSS

Generating call graphs for understanding and refactoring python code

I have always been a fan of visualizations as I believe firmly that it is easier to crunch visual information than anything else. Visualizations are especially helpful for finding out patterns in data that are not expected and for patterns that are difficult to express textually in a concise manner.

The beast

A couple of weeks back I had the task of refactoring a very badly written piece of python code which violated tons of programming guidelines because of rampant usage of global variables, horrendous variable names and functions whose body extended to a couple of hundred lines of code!

As a first step to taming the beast, I removed all globals used and replaced with explicit arguments to function calls.

After this I had to split the code into multiple files so that different coherent parts maintained by respective teams/persons would go into seperate files. The file I was looking at had 5000+ lines of code and 100+ functions calling each other happily.

The approach

I was trying to figure out how to partition the file when I was struck by an interesting idea. The idea was to generate a call graph by statically analyzing the python code and then look at the generated graph to figure out which functions I could group in a given file split.

Python has a built-in module called parser that can construct an syntax tree by parsing a given piece of python code.

ast = parser.suite(python_code)
ast_list = parser.ast2list(ast)

The syntax tree generated for the python function

def a():
    c()

is

['stmt',
  ['compound_stmt',
   ['funcdef',
    ['NAME', 'def'],
    ['NAME', 'a'],
    ['parameters', ['LPAR', '('], ['RPAR', ')']],
    ['COLON', ':'],
    ['suite',
     ['NEWLINE', ''],
     ['INDENT', ''],
     ['stmt',
      ['simple_stmt',
       ['small_stmt',
        ['expr_stmt',
         ['testlist',
          ['test',
           ['or_test',
            ['and_test',
             ['not_test',
              ['comparison',
               ['expr',
                ['xor_expr',
                 ['and_expr',
                  ['shift_expr',
                   ['arith_expr',
                    ['term',
                     ['factor',
                      ['power',
                       ['atom', ['NAME', 'c']],
                       ['trailer',
                        ['LPAR', '('],
                        ['RPAR', ')']]]]]]]]]]]]]]]]]],
       ['NEWLINE', '']]]]]]]

I wrote a script to get the parse tree for the code I had to refactor. For every function definition, I found the name of the defined function and the functions it was calling (I removed references to built-in functions from the accounting). From the resulting call graph, I generated a graph definition in the dot language used by Graphviz. When this “dot” file is given as input to Graphviz, the final graph image file is generated.

You can download the script from here: construct_call_graph.py.

Usage: (test.py is the input file containing python code and test.dot is the graph definition file for Graphviz

cat test.py | ./construct_call_graph.py > test.dot

The output

call graph for real life code
file

I was able to identify orphaned functions that we not being used and functions that were being used by a lot of other functions (utility functions) which I moved to a seperate file. The shaded regions are fairly isolated from the rest of the program and represent a set of functions doing a common task. I moved the two regions to their own files.

Here is the output file for a smaller code file. feed crawler call
graph

callg1.py

def b(): pass

def d(): pass

def c():
    b()
    d()

def a():
    c()
    b()

a()
digraph G {
rankdir=LR
a -> c;
a -> b;
c -> b;
c -> d;
b;
d;
}

callg1 call graph

callg2.py this code has a cycle in the call graph. b() -> c() -> b() -> …

def b(): c()

def c(): b()

def a():
    c()
    b()

a()
digraph G {
rankdir=LR
a -> c;
a -> b;
c -> b;
b -> c;
}

callg2 call graph

callg3.py This code has an orphaned function, b().

def b(): pass

def c(): pass

def a():
    c()

a()
digraph G {
rankdir=LR
a -> c;
c;
b;
}

callg3 call graph

Try your code here

I set up the call graph generator here so you can try it yourself. Before submitting a piece of python code, make sure it does not have syntax errors.

To generate the graph image from “dot” specification file, you will need to install Graphviz. If you have a graph defined in test.dot, the following command will generate a png file for it.

cat test.dot | dot -Tpng > test.png

Links

There is python module out there called “pycallgraph” that builds call graphs by collecting information at run-time. I tried it but it proved too slow for the program I applied it to. I could have filtered to reduce the number of functions it hooked into but was too lazy to try this. You can follow these links to learn more about it.

© Prashanth Ellina. Built using Pelican. Theme by Giulio Fidente on github.