Issue 24

Securing Opensource Code via Static Analysis (I)

Raghudeep Kannavara
Security Researcher, Software and Services Group
@Intel USA


Static code analysis (SCA) is the analysis of computer programs performed without actually executing them, usually by means of an automated tool. SCA has become an integral part of the software development life cycle and one of the first steps taken to detect and eliminate programming errors early in development. Although SCA tools are routinely used in proprietary software development environments to ensure software quality, applying such tools to the vast expanse of opensource code presents a forbidding, albeit interesting, challenge, especially when opensource code finds its way into commercial software. There have been recent efforts in this direction. In this paper, we address the challenge to some extent by applying static analysis to a popular opensource project, the Linux kernel, and discussing the results. Based on our analysis, we propose an alternate workflow that can be adopted when incorporating opensource software into a commercial software development process, and we discuss the benefits and challenges of adopting it.

Keywords: Static Code Analysis; Opensource; Software Testing; Software Development Life Cycle

Although SCA is not a new technology, it has lately been receiving a lot of attention and is rapidly being adopted as an integral activity in the software development life cycle to improve the quality, reliability and security of software. Most proprietary software development establishments have a dedicated team of software validation professionals whose responsibilities include running various static analysis tools on the software under development, whether in an agile environment or in a traditional waterfall model. SCA tools can analyze code written in several programming languages and frameworks, including but not limited to Java, .NET and C/C++. Whether the goal is to develop application software or mission-critical embedded firmware, static analysis tools have proved to be one of the first steps towards identifying and eliminating software bugs and are critical to the overall success of a software project.

That said, licensing expensive static analysis tools and applying them exhaustively is typically driven by commercial interests, security concerns, or both, in software development efforts that leave little or no room for failure. What remains is an expansive, unexplored void: the endless realms of opensource code, which is usually assumed to be well reviewed because it has been used and reused in countless applications by numerous academic and commercial institutions across the globe. "Expansive" indicates that new source code is continually being added; "unexplored" indicates the absence of a dedicated, unbiased entity that statically analyzes every line of opensourced code. Of late, SCA companies have taken the initiative in this direction. Even so, the rate at which new versions of opensource software are released or updated presents a barrier to such efforts. Although many would attribute such an effort to paranoia, financial meltdown and shrinking IT budgets have evidently led to opensource software such as Linux, Apache, MySQL and OpenSSL entering commercial applications.

Figure 1. Usual software development process segment incorporating static analysis

The conventional workflow, in which the SCA tool's output is included in the formal software review package, is shown in figure 1. In most software development and testing institutions, the software product as a whole is subjected to dynamic analysis, such as fuzzing or black box testing, and newly developed code is subjected to stringent static analysis and review. Opensource code incorporated in the software, however, is usually not subjected to the same stringent static analysis and review as newly developed proprietary code, as depicted in figure 1. Usually, a binary of the necessary opensource software is simply incorporated in the complete software package. This may rest on the assumption that opensource software is more secure and less buggy than closed source software: because the source is freely available, many people will look for security flaws and other bugs in it in a way that isn't going to happen in the commercial world. Although this assumption has withstood the test of time and probably cannot be proven wrong, it is still an interesting exercise to apply static analysis tools to opensource code and analyze the results. There have been previous efforts to statically analyze opensource code using commercial or opensource SCA tools; in this paper, we seek to verify whether certain code issues (or bugs) can be detected early on by SCA. Further, we propose an alternate workflow that can be adopted when incorporating opensource code into a commercial software development process, and we discuss the benefits, challenges and possible trade-offs of adopting it. Such an effort will be of interest to both the software engineering community and the opensource community in general. For our experiment, we pick Klocwork Insight as our tool of choice, since it is one of the industry leaders in SCA, and a popular opensource project, the Linux kernel, as the code to analyze. We then run Klocwork Insight against the Linux kernel code and discuss the results of our analysis. Klocwork serves as a representative SCA tool and Linux as a representative opensource project; the analysis can be extended to other SCA tools and other opensource projects.

Paper organization

The rest of the paper is organized as follows. Section 2 provides the background to choosing the Linux kernel code for analysis and to SCA using Klocwork Insight. Section 3 discusses the results of SCA. Section 4 presents an alternate workflow that can be followed when incorporating opensource software into a commercial software development process, along with the benefits and challenges of following it. Finally, Section 5 concludes by summarizing important observations.


A. Linux kernel

The Linux kernel is an operating system kernel used by the Linux family of Unix-like operating systems. It is one of the most prominent examples of free and open source software and hence a natural choice for our analysis. The Linux kernel is released under the GNU General Public License version 2 (GPLv2) and is developed by contributors worldwide. Day-to-day development discussions take place on the Linux kernel mailing list. The Linux kernel has received contributions from thousands of programmers. Many Linux distributions have been released based upon the Linux kernel [8]. We chose the Linux kernel version, released in February 2010, for our analysis.

B. Static Code Analysis

The unique benefit of static analysis is its ability to scan complete codebases to identify logic and security bugs. It is much more comprehensive in reach than black-box systems testing. However, some issues are runtime dependent and can only be found by actually executing code, so static analysis cannot stand alone. A representative effort-benefit curve for using an SCA tool is shown in figure 2. As more checks are enforced, the fraction of errors detected increases, along with the effort required to enforce these checks. Typical compilers such as gcc incorporate static analysis to a certain extent in the form of warnings or errors reported during compilation. These generally require the least effort and likewise report the fewest bugs. Formal verification of software, for example using model checking, is on the other hand a much more complex approach to automating SCA: it requires greater effort but can detect more bugs, and bugs of higher complexity. Although formal verification is not extensively adopted in the software industry, owing to the law of diminishing returns, several operating systems have been formally verified, such as NICTA's Secure Embedded L4 microkernel [6].

In our analysis, we use the Klocwork Insight tool to perform SCA. Klocwork Insight is an SCA tool used to identify quality and security issues in C, C++, Java and C# code. The product includes numerous desktop plug-ins for developers, an architecture analysis tool, and metrics and reporting. It is supported on both MS Windows and Linux based platforms [3]. SCA tool checks typically include unused declarations, type inconsistencies, use before definition, unreachable code, ignored return values, execution paths with no return, likely infinite loops, and fall-through cases. Checking is configurable and can be customized to select which classes of errors are reported. Furthermore, the user can create custom checks to find specific conditions in a specific codebase. Usually, the usage rules configuration file of an SCA tool can be edited to capture specific issues in the source code. For example, the Klocwork usage rules configuration file can be updated with checker rules that flag the use of APIs from the Microsoft Security Development Lifecycle (SDL) list of banned APIs (banned.h). This flexibility allows more powerful or project-specific checks to be incorporated in the SCA results. The more effort is put into tuning the SCA tool to the codebase and the native compiler, the better the checking results. In general, most SCA tools are designed to be flexible and allow programmers or quality analysts to select appropriate points on the effort-benefit curve for particular projects. As different checks are turned on, the number of bugs that can be detected increases dramatically, but so can the number of false positives. False positives can be reduced by editing the checker rules in the configuration files, turning on the required checkers and turning off those that are not needed for the project.
An SCA tool can be made to better understand the semantics of a given function or method by configuring it to recognize any special keywords used by the native compiler and by adding more information to the tool's knowledge base. Thus, users can define new rules and associated checks to extend the SCA tool's checking or to enforce application-specific properties. In general, the automated build process incorporating SCA can be split into two stages. The first stage generates a "build specification" file as part of the compile and link process. The second stage runs the static analysis tool on that output, i.e., the "build specification" file, to generate static analysis reports. Klocwork presents the results of static analysis in a browser window (e.g., Internet Explorer or Mozilla Firefox), whose user interface can be customized to a certain extent.


