( to ) ? be : ! be;
sidebar left sidebar right

Generating call graphs for understanding and refactoring python code (2,650 views)

I have always been a fan of visualizations as I believe firmly that it is easier to crunch visual information than anything else. Visualizations are especially helpful for finding out patterns in data that are not expected and for patterns that are difficult to express textually in a concise manner.

The beast

A couple of weeks back I had the task of refactoring a very badly written piece of python code which violated tons of programming guidelines because of rampant usage of global variables, horrendous variable names and functions whose body extended to a couple of hundred lines of code!

As a first step to taming the beast, I removed all globals used and replaced with explicit arguments to function calls.

After this I had to split the code into multiple files so that different coherent parts maintained by respective teams/persons would go into seperate files. The file I was looking at had 5000+ lines of code and 100+ functions calling each other happily.

The approach

I was trying to figure out how to partition the file when I was struck by an interesting idea. The idea was to generate a call graph by statically analyzing the python code and then look at the generated graph to figure out which functions I could group in a given file split.

Python has a built-in module called parser that can construct an syntax tree by parsing a given piece of python code.

ast = parser.suite(python_code)
ast_list = parser.ast2list(ast)

The syntax tree generated for the python function

def a():
    c()

is

['stmt',
  ['compound_stmt',
   ['funcdef',
    ['NAME', 'def'],
    ['NAME', 'a'],
    ['parameters', ['LPAR', '('], ['RPAR', ')']],
    ['COLON', ':'],
    ['suite',
     ['NEWLINE', ''],
     ['INDENT', ''],
     ['stmt',
      ['simple_stmt',
       ['small_stmt',
        ['expr_stmt',
         ['testlist',
          ['test',
           ['or_test',
            ['and_test',
             ['not_test',
              ['comparison',
               ['expr',
                ['xor_expr',
                 ['and_expr',
                  ['shift_expr',
                   ['arith_expr',
                    ['term',
                     ['factor',
                      ['power',
                       ['atom', ['NAME', 'c']],
                       ['trailer',
                        ['LPAR', '('],
                        ['RPAR', ')']]]]]]]]]]]]]]]]]],
       ['NEWLINE', '']]]]]]]

I wrote a script to get the parse tree for the code I had to refactor. For every function definition, I found the name of the defined function and the functions it was calling (I removed references to built-in functions from the accounting). From the resulting call graph, I generated a graph definition in the dot language used by Graphviz. When this “dot” file is given as input to Graphviz, the final graph image file is generated.

You can download the script from here: construct_call_graph.py.

Usage: (test.py is the input file containing python code and test.dot is the graph definition file for Graphviz

cat test.py | ./construct_call_graph.py > test.dot

The output

call graph for real life code file

I was able to identify orphaned functions that we not being used and functions that were being used by a lot of other functions (utility functions) which I moved to a seperate file. The shaded regions are fairly isolated from the rest of the program and represent a set of functions doing a common task. I moved the two regions to their own files.

Here is the output file for a smaller code file.
feed crawler call graph

callg1.py

            def b(): pass
 
            def d(): pass
 
            def c():
                b()
                d()
 
            def a():
                c()
                b()
 
            a()
            digraph G {
            rankdir=LR
            a -> c;
            a -> b;
            c -> b;
            c -> d;
            b;
            d;
            }

callg1 call graph

callg2.py
this code has a cycle in the call graph. b() -> c() -> b() -> …

            def b(): c()
 
            def c(): b()
 
            def a():
                c()
                b()
 
            a()
            digraph G {
            rankdir=LR
            a -> c;
            a -> b;
            c -> b;
            b -> c;
            }

callg2 call graph

callg3.py
This code has an orphaned function, b().

            def b(): pass
 
            def c(): pass
 
            def a():
                c()
 
            a()
            digraph G {
            rankdir=LR
            a -> c;
            c;
            b;
            }

callg3 call graph

Try your code here

I set up the call graph generator here so you can try it yourself. Before submitting a piece of python code, make sure it does not have syntax errors.

To generate the graph image from “dot” specification file, you will need to install Graphviz. If you have a graph defined in test.dot, the following command will generate a png file for it.

cat test.dot | dot -Tpng > test.png

Links

There is python module out there called “pycallgraph” that builds call graphs by collecting information at run-time. I tried it but it proved too slow for the program I applied it to. I could have filtered to reduce the number of functions it hooked into but was too lazy to try this. You can follow these links to learn more about it.

subscribe to feed

Tags: , , , , , ,

"Generating call graphs for understanding and refactoring python code" was published on November 14th, 2007 and is listed in python.

Follow comments via the RSS Feed | Leave a comment | Trackback URL

Comments on "Generating call graphs for understanding and refactoring python code": 7 Comments

  1. Hugo wrote,

    Wow! That’s sweet… I’ll give it a whirl when I have a big graph to generate. pycallgraph’s speed (slow) made me think it is most useful to use it selectively on particular unit tests - but then I’d have to have good unit tests, which I don’t always have.

    How tough is it to extend this kind of thing to multiple modules? And what will it not work on, eval statements, obviously. How about use of “getattr” and similar? Basically any time a function is passed as a string, the call will be missed. Right?

  2. prashanthellina wrote,

    Although I have not tried yet, extending the script to handle modules and classes should be straight-forward. Like you said, evals, execs and functions being passed around are difficult to handle. Do keep me posted if you get somewhere along these lines. Thanks!

  3. vikraman wrote,

    This is a good way to split a big python script into smaller ones and including them in one “master script”. But i’m not sure whether such a division will be correct “functionally” because you might have to group functions that perform entirely different things in one file.
    plus, There may not be an increase in speed in doing such a thing. However from the point of view of a programmer its pretty neat.

  4. prashanthellina wrote,

    I find it hard to imagine that “entirely different” functions will be calling each other. an example? I believe that functions that call each other will be doing some common task. Also, increase in speed is not a necessary goal of refactoring. In fact in python when you split a function into smaller functions, you incur a performace hit (albeit very small). The goal is to make the code more maintainable.

  5. Rick wrote,

    Thanks for this - great idea.

    I was wondering if you would be willing to send me the script (and any dependencies) you use on this site to generate the graph.
    I have an entirely different kind of data that I would like to graph in a similar way.

    Thanks again,
    Rick

  6. prashanthellina wrote,

    Rick, the code is available already for download. You can find the link in the blog post.

  7. Computer Forum wrote,

    Amazing article! Detailed and very interested. I am going to recommend this blog to my friends.

Leave Your Comment

Subscribe without commenting

101,220 views

Prashanth Ellina is powered by WordPress

No Complaints Shifter Series Theme by Buzzdroid.com
Computers blogarama - the blog directory Blog Flux Directory Blog Directory & Search engine Computer Blogs - Blog Catalog Blog Directory Computers blogs Bloggeries Blog Directory blog directory Computers Blog Blog Search, Blog Directory p Listed in LS Blogs the Blog Directory and Blog Search Engine Blog Review Blog search - categorized blog directory Link With Us - Web Directory Find Blogs in the Blog
Directory Blog Directory