Tuesday, April 08, 2014

Simple Code analysis with TC Tool - Analyzing code duplication

There are many code duplication tools available (opensource and commercial) like CPD - Copy Paste Detector or Simian. However CDD (Code Duplication Detector) in TCToolkit has some unique advantages.In a previous blog post I have explained why I wrote CDD
  • It uses excellent Pygments library for parsing the source code. Hence all the languages supported by Pygments are supported by CDD for duplication check.
  • It is reasonably Fast.  Last few weeks I spent some time optimizing it for speed.
    For example, on my Dell laptop it detected 164 duplicates in 1445 files of Tomcat source code in 45 seconds.
  • It can output duplications in multiple formats. 
    • In simple text format.
    • In HTML format with 'syntax highlighted' duplicate text fragments
    • It can also add Cpp/Java style '// code comments' in the original source code.
  • It create a matrix visualization of duplication to identify any duplication patterns. See the example below for Tomcat source (org/apache/coyote/http11) directory.


Here is the command line that I used for tomcat code analysis
cdd.py -l java -o javadups.htm
To see all the options available
cdd.py --help
There are few other simple code visualization tools in TCToolkit like TTC (Token Tag Cloud) or CCOM (Class Concurrence Matrix). I will explain their usage in later posts.

Give it a try and tell me your opinion.

Saturday, March 22, 2014

TCToolkit Update (Version 0.6.x)

When I consulted to companies on improving their source code (for refactoring it, improving the performance, detecting the design bottlenecks, detecting problematic files etc), I needed a way to quickly analyze a code base. However, there were not many tools available which gave me a quick insight on code. Commercial tools like Coverity, KlocWorks, Lattix etc are expensive. Because i could use it, I had to convince my client to 'license' it and that was difficult. Hence about 2 years back I wrote few python scripts to quickly help me analyze a codebase. Later I open sourced these python scripts a 'TCToolkit'.  

Recently I have done significant refactoring and updates to these scripts and also added some new scripts. Also I have moved the TCToolkit code to Bitbucket. (https://bitbucket.org/nitinbhide/tctoolkit ). 

Important updates are listed below

  1. Improved the performance of CDD (Code Duplication Detector). On my Dell laptop, subversion C code base (around 450 files) can now be analyzed for duplication in about 90 seconds.
  2. Now I use d3js library for generating the visualizations. Token tag cloud (TCC) now uses d3js for generating the tag cloud. CDD uses d3js for displaying the 'duplication matrix'.
  3. A new script 'CCOM' (Class Co-occurrence matrix) is added. This script analyzes the code base and finds out which classes are used together. It displays this information in matrix form.

    For example, class A has class B as member variable, or member function of class A uses class B as parameter then class A and B are treated as occuring. If a function takes two parameters objects class B and class C, then class B and C are treated as 'co-occurring'.
    If classes are co-occurring, then chances are there is some dependency between their functionality and hence changes in one MAY impact other.
  4. smjstreemap.py : This script generates a treemap visualization from the excellent freeware code metrics tool SourceMonitor. It also uses d3js for displaying the treemap.
 Give it a try on on your code base and see what kind of insights you get about your project.

Tuesday, February 25, 2014

Brian W. Kernighan on debugging

Just discovered this quote of Brian W. Kernighan on Debugging.
Debugging is twice as hard as writing the code in the first place. Therefore, if you write the code as cleverly as possible, you are, by definition, not smart enough to debug it
If you are not sure what is means, here is a stackoverflow discussion about it.
http://stackoverflow.com/questions/1103299/help-me-understand-this-brian-kernighan-quote

And an article explaining the quote by Alfred Thomson
http://blogs.msdn.com/b/alfredth/archive/2009/01/08/are-you-smart-enough-to-debug-your-own-code.aspx

It takes some time to understand this quote. Its a kind of zen Kōan