Tuesday, April 29, 2014

My code review checklist

I am not a fan of checklists (especially for code reviews). Code review checklists start small and then slowly become large and unwieldy. After some time the checklist becomes a bottleneck, and instead of improving the effectiveness of your process, these lengthy checklists start reducing it.

However, there are situations where I used checklists and they worked very well, for example a customer release checklist. There are many small things you need to do before sending a new release to a customer, and it's easy to miss a few critical steps. A release checklist worked very well.

I was not sure why typical organizational checklists did not work well in cases like code reviews. Some time back I read Atul Gawande's book 'The Checklist Manifesto'. It triggered my interest in checklists. As a first step, I extracted a code review checklist from my code review training content. I have used this 'mental' checklist for many years. It has worked well for me across different programming languages (C/C++, Java, Python, C#, JavaScript) and technologies.

Here is my code review checklist.



PS> Based on my experience, Atul Gawande's book, and information from the internet, I have now prepared a 4-hour hands-on session on creating and improving checklists. Contact me if you are interested.

Tuesday, April 08, 2014

Simple Code analysis with TC Tool - Analyzing code duplication

There are many code duplication tools available (open source and commercial), such as CPD (Copy Paste Detector) or Simian. However, CDD (Code Duplication Detector) in TCToolkit has some unique advantages. In a previous blog post I explained why I wrote CDD.
  • It uses the excellent Pygments library for parsing the source code. Hence all the languages supported by Pygments are supported by CDD for duplication checks.
  • It is reasonably fast. Over the last few weeks I spent some time optimizing it for speed.
    For example, on my Dell laptop it detected 164 duplicates in 1445 files of the Tomcat source code in 45 seconds.
  • It can output duplications in multiple formats. 
    • In simple text format.
    • In HTML format with 'syntax highlighted' duplicate text fragments
    • It can also add C++/Java-style '//' comments to the original source code.
  • It creates a matrix visualization of the duplication to help identify duplication patterns. See the example below for the Tomcat source (org/apache/coyote/http11) directory.


Here is the command line that I used for the Tomcat code analysis:
cdd.py -l java -o javadups.htm
To see all the available options:
cdd.py --help
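Token-based duplicate detection, as described in the list above, compares token streams rather than raw text, so differences in whitespace or comments do not hide a copy. Below is a minimal Python sketch of that idea, not CDD's actual implementation: a crude regex tokenizer stands in for Pygments, every run of `min_tokens` consecutive tokens is used as a dictionary key, and any run found at more than one location is reported. The function names and the `min_tokens` parameter are hypothetical. The sketch also counts shared runs per file pair, which is the kind of data a duplication matrix displays.

```python
import re
from collections import defaultdict
from itertools import combinations

# Hypothetical stand-in for Pygments: whitespace and '//' comments are skipped;
# identifiers, numbers and single punctuation characters become tokens.
TOKEN_RE = re.compile(r"//[^\n]*|\s+|([A-Za-z_]\w*|\d+|\S)")


def tokenize(source):
    """Return the significant tokens of source (comments and whitespace dropped)."""
    return [m.group(1) for m in TOKEN_RE.finditer(source) if m.group(1)]


def find_duplicates(files, min_tokens=10):
    """Map each run of min_tokens tokens seen more than once to its locations.

    files: dict of file name -> source text.
    Returns {token_tuple: [(file name, start token index), ...]}.
    """
    locations = defaultdict(list)
    for name, source in files.items():
        tokens = tokenize(source)
        for i in range(len(tokens) - min_tokens + 1):
            locations[tuple(tokens[i:i + min_tokens])].append((name, i))
    return {window: locs for window, locs in locations.items() if len(locs) > 1}


def duplication_matrix(duplicates):
    """Count shared duplicate windows for each pair of different files."""
    matrix = defaultdict(int)
    for locs in duplicates.values():
        names = sorted({name for name, _ in locs})
        for a, b in combinations(names, 2):  # duplicates within one file are not counted
            matrix[(a, b)] += 1
    return dict(matrix)


if __name__ == "__main__":
    files = {
        "A.java": "int sum(int[] xs) { int s = 0; for (int x : xs) { s += x; } return s; }",
        "B.java": "int total(int[] xs) { int s = 0; for (int x : xs) { s += x; } return s; }",
        "C.java": "void hello() { System.out.println(\"hi\"); }",
    }
    print(duplication_matrix(find_duplicates(files, min_tokens=10)))
    # {('A.java', 'B.java'): 21}
```

The two Java functions differ only in their names, so 21 overlapping 10-token windows are shared. A production tool would merge overlapping windows into one duplicate block and use a rolling hash rather than storing every window; this sketch skips both for brevity.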
There are a few other simple code visualization tools in TCToolkit, like TTC (Token Tag Cloud) and CCOM (Class Co-occurrence Matrix). I will explain their usage in later posts.
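A token tag cloud generally sizes each token by how often it appears in the code, so the dominant vocabulary of a code base stands out at a glance. Here is a minimal sketch of that sizing step, assuming a simple linear scale between a minimum and a maximum font size; `tag_cloud_sizes` and its parameters are hypothetical and not TTC's actual code, and the rendering step is left out.

```python
import re
from collections import Counter


def tag_cloud_sizes(source, top_n=5, min_size=10, max_size=40):
    """Return {token: font size} for the top_n most frequent identifier tokens.

    Hypothetical sketch: sizes scale linearly with frequency between
    min_size (least frequent kept token) and max_size (most frequent).
    """
    tokens = re.findall(r"[A-Za-z_]\w*", source)
    counts = Counter(tokens).most_common(top_n)
    if not counts:
        return {}
    most, least = counts[0][1], counts[-1][1]
    span = (most - least) or 1  # avoid dividing by zero when all counts are equal
    return {tok: min_size + (n - least) * (max_size - min_size) / span for tok, n in counts}


if __name__ == "__main__":
    print(tag_cloud_sizes("foo bar foo baz foo bar", top_n=3))
    # {'foo': 40.0, 'bar': 25.0, 'baz': 10.0}
```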

Give it a try and tell me your opinion.

Saturday, March 22, 2014

TCToolkit Update (Version 0.6.x)

When I consulted for companies on improving their source code (refactoring it, improving performance, detecting design bottlenecks, detecting problematic files, etc.), I needed a way to quickly analyze a code base. However, there were not many tools available that gave me quick insight into the code. Commercial tools like Coverity, Klocwork, Lattix, etc. are expensive. Before I could use them, I had to convince my client to license them, and that was difficult. Hence, about 2 years back, I wrote a few Python scripts to help me quickly analyze a code base. Later I open sourced these Python scripts as 'TCToolkit'.

Recently I have done significant refactoring and updates to these scripts and also added some new scripts. I have also moved the TCToolkit code to Bitbucket (https://bitbucket.org/nitinbhide/tctoolkit).

The important updates are listed below:

  1. Improved the performance of CDD (Code Duplication Detector). On my Dell laptop, the Subversion C code base (around 450 files) can now be analyzed for duplication in about 90 seconds.
  2. I now use the d3js library for generating the visualizations. The Token Tag Cloud (TTC) now uses d3js to generate the tag cloud, and CDD uses d3js to display the 'duplication matrix'.
  3. A new script, 'CCOM' (Class Co-occurrence Matrix), has been added. This script analyzes the code base and finds out which classes are used together. It displays this information in matrix form.

    For example, if class A has class B as a member variable, or a member function of class A takes class B as a parameter, then classes A and B are treated as co-occurring. If a function takes two parameters, objects of class B and class C, then classes B and C are treated as 'co-occurring'.
    If classes are co-occurring, then chances are there is some dependency between their functionality, and hence changes in one MAY impact the other.
  4. smjstreemap.py: This script generates a treemap visualization from the metrics produced by the excellent freeware code metrics tool SourceMonitor. It also uses d3js for displaying the treemap.
Give it a try on your code base and see what kind of insights you get about your project.
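The co-occurrence rules in item 3 can be sketched in Python as follows. To keep the example self-contained it does not parse source code; instead it assumes a hypothetical, already-extracted model (`ClassInfo`, `Function`) listing each class's member variable types and each function's parameter types. It illustrates the rules only and is not CCOM's actual implementation.

```python
from collections import Counter
from dataclasses import dataclass, field
from itertools import combinations


@dataclass
class Function:
    """Hypothetical model of a function: its name and parameter class names."""
    name: str
    param_types: list


@dataclass
class ClassInfo:
    """Hypothetical model of a class: member variable types and methods."""
    name: str
    member_types: list = field(default_factory=list)
    methods: list = field(default_factory=list)  # list of Function


def co_occurrence(classes, free_functions=()):
    """Count how often each unordered pair of classes co-occurs."""
    counts = Counter()

    def add(a, b):
        if a != b:  # a class trivially co-occurs with itself; skip that
            counts[tuple(sorted((a, b)))] += 1

    for cls in classes:
        for member in cls.member_types:
            add(cls.name, member)  # A has a member variable of class B
        for method in cls.methods:
            for param in method.param_types:
                add(cls.name, param)  # a member function of A takes a B
    all_functions = list(free_functions) + [m for c in classes for m in c.methods]
    for func in all_functions:
        for a, b in combinations(sorted(set(func.param_types)), 2):
            add(a, b)  # B and C are parameters of the same function
    return counts


def as_matrix(counts):
    """Turn pair counts into (sorted class names, symmetric count matrix)."""
    names = sorted({name for pair in counts for name in pair})
    rows = [[counts.get(tuple(sorted((r, c))), 0) for c in names] for r in names]
    return names, rows


if __name__ == "__main__":
    a = ClassInfo("A", member_types=["B"], methods=[Function("f", ["B", "C"])])
    g = Function("g", ["B", "C"])
    names, rows = as_matrix(co_occurrence([a], free_functions=[g]))
    print(names)  # ['A', 'B', 'C']
    for row in rows:
        print(row)  # [0, 2, 1] / [2, 0, 2] / [1, 2, 0]
```

Counting pairs rather than just recording them means a larger cell hints at a stronger coupling between two classes, which is where a change in one is most likely to affect the other.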