Why GitHub Code Scanning is awesome

2020-06-24

Secure code is important. Writing secure code is hard. As developers, we all know this. Many developers rely on the OWASP Top 10, a list of the ten most critical web application security risks to keep in mind when writing software. But of course, there are more than ten security risks in the world. Keeping up with everything that happens in security and applying it to all the code you write is a noble goal, but a very hard one.

Fortunately, security researchers focus on finding vulnerabilities in code and ways to avoid them. What if you had an army of security researchers validating your code before it even gets merged to master? What if those researchers worked 24/7, reviewed all your code in a matter of minutes, and were constantly learning new ways to better secure it?

That’s what GitHub Code Scanning is all about. In this blog post I want to introduce you to Code Scanning and explain why I think it’s awesome.

Introducing GitHub Code Scanning

If you have worked with databases a bit, you can probably guess what the following SQL query does:

SELECT FirstName, LastName
FROM People
WHERE Country = 'Netherlands'

SQL (Structured Query Language) is the language we use to talk to relational databases. The query above returns the first and last names of every person in the People table whose country is the Netherlands. Now what if I asked you to write a query that counts all if statements in your JavaScript application, or one that finds async methods in C# that return void? I would have absolutely no idea where to start. Fortunately, some really smart people have solved this problem.

GitHub Code Scanning lets you run a set of queries against your code that find security, style and quality issues, and then review the results.

Code Scanning behind the scenes

GitHub Code Scanning is powered by CodeQL, a ‘semantic code analysis engine’. In practice, this means that CodeQL transforms your code into a database and lets you query it with a query language that understands the structure of your code.
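To make the “code as a database” idea a little more concrete, here is a minimal Python sketch that is only an analogy, not how CodeQL works internally. It uses the standard `ast` and `sqlite3` modules to load the syntax tree of some Python code into a table, and then counts if statements with plain SQL. The table layout and the helper name `build_code_db` are made up for illustration.

```python
import ast
import sqlite3


def build_code_db(source: str) -> sqlite3.Connection:
    """Load every syntax-tree node of `source` into an in-memory SQLite table."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE nodes (kind TEXT, line INTEGER)")
    for node in ast.walk(ast.parse(source)):
        # Some nodes (the module itself, argument lists, contexts) have no line
        # number, so store NULL for them.
        db.execute(
            "INSERT INTO nodes VALUES (?, ?)",
            (type(node).__name__, getattr(node, "lineno", None)),
        )
    return db


sample = """
def grade(score):
    if score > 90:
        return "A"
    elif score > 75:
        return "B"
    return "C"
"""

db = build_code_db(sample)
# "Count all if statements" becomes an ordinary SQL query over the table.
(count,) = db.execute("SELECT COUNT(*) FROM nodes WHERE kind = 'If'").fetchone()
print(count)  # 2, because Python represents `elif` as a nested If node
```

A table like this can answer simple structural questions, but it has no notion of types, call targets or data flow. Those are exactly the things a semantic engine such as CodeQL adds on top.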

Here is a very simple CodeQL query that returns all if statements in your Java code base:

import java

from IfStmt i
select i, "hello if"

A list of every if statement is probably not very interesting on its own. But queries are easy to extend. Take this example:

import csharp

from BlockStmt b
where b.getNumberOfStmts() = 0
select b, "This is an empty block."

This query returns all empty blocks in your C# code. It’s then up to you to decide whether to remove them, or whether an empty block points to a misplaced { or }.
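Python has no literally empty block, because its grammar requires at least one statement; the closest equivalent is a block that contains nothing but `pass`. As a rough analogy to the C# query (not a translation of it), this sketch uses Python’s `ast` module to flag such blocks. The helper name `find_pass_only_blocks` is hypothetical.

```python
import ast


def find_pass_only_blocks(source: str) -> list[tuple[str, int]]:
    """Return (node type, line number) for each block whose body is only `pass`."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        body = getattr(node, "body", None)
        # Statement blocks store their body as a list; Lambda/IfExp bodies are
        # single expressions, so the isinstance check skips them.
        # Only `body` is inspected: `else:` branches (orelse) are ignored here.
        if isinstance(body, list) and body and all(isinstance(s, ast.Pass) for s in body):
            # The Module node has no line number, so fall back to 0.
            findings.append((type(node).__name__, getattr(node, "lineno", 0)))
    return findings


sample = """
def on_event(event):
    pass

def checked(x):
    if x:
        pass
    return x
"""

print(find_pass_only_blocks(sample))  # [('FunctionDef', 2), ('If', 6)]
```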

CodeQL queries can be extremely powerful. Because CodeQL understands your code, it can figure out whether a value that you write into your HTML comes from user input (a potential cross-site scripting vulnerability), or whether the locks in a multithreaded application are acquired in a different order in different places, which can lead to deadlocks.
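The idea behind the HTML example is taint tracking: mark values that come from user input as tainted, follow them through the program, and report when one reaches a dangerous sink such as HTML output. CodeQL does this on a real data-flow graph of your code. The Python sketch below is a deliberately tiny, flow-insensitive toy that models a program as a list of hypothetical “target is computed from inputs” pairs; all variable names are invented for illustration.

```python
def tainted_sinks(
    flows: list[tuple[str, list[str]]], sources: set[str], sinks: set[str]
) -> set[str]:
    """Return the sinks that can receive data from any source.

    `flows` holds (target, inputs) pairs meaning "target is computed from inputs".
    """
    tainted = set(sources)
    changed = True
    # Keep propagating until no new variable becomes tainted (a fixed point).
    while changed:
        changed = False
        for target, inputs in flows:
            if target not in tainted and any(name in tainted for name in inputs):
                tainted.add(target)
                changed = True
    return sinks & tainted


# Hypothetical program: a query-string value flows through two variables into HTML.
flows = [
    ("name", ["query_param"]),
    ("greeting", ["name", "template"]),
    ("page_title", ["site_name"]),
]
print(tainted_sinks(flows, sources={"query_param"}, sinks={"greeting", "page_title"}))
# {'greeting'}
```

Because this toy ignores statement order and sanitizers, it over-approximates; a real analysis also has to recognize when a value has been escaped before it reaches the sink.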

Do I need to learn CodeQL?

Does this mean you need to become an expert in CodeQL before you can find security issues in your code? Fortunately, no! GitHub, the home of CodeQL, is all about open source. That’s why GitHub has open sourced a large library of CodeQL queries that you can run against your code.

Even better, because the CodeQL queries are open source, anyone can add queries that find new issues, and you automatically benefit from them. The queries are maintained by the GitHub Security Lab, but any security researcher in the world can open a pull request for a new vulnerability they have found. According to the Insights page of the repository on GitHub, 35 authors submitted over 200 pull requests last month.

Do you have 35 security researchers in your team?

Show me how it works!

GitHub Code Scanning is currently in Beta. You can join the waitlist here: https://github.com/features/security/advanced-security/signup

To test Code Scanning, I forked eShopOnContainers, a sample application from Microsoft that contains C# and JavaScript code. You can find my fork here: https://github.com/wouterdekortorg/eShopOnContainers

If the Code Scanning beta is enabled for your organization, you will see a new option on the Security page.

The Set up code scanning option adds a new GitHub Actions workflow to your project. GitHub Actions is the CI/CD system built into GitHub. The default workflow looks like this:

name: "Code scanning - action"

on:
  push:
  pull_request:
  schedule:
    - cron: '0 23 * * 3'

jobs:
  CodeQL-Build:

    runs-on: ubuntu-latest

    steps:
    - name: Checkout repository
      uses: actions/checkout@v2
      with:
        # We must fetch at least the immediate parents so that if this is
        # a pull request then we can checkout the head.
        fetch-depth: 2

    # If this run was triggered by a pull request event, then checkout
    # the head of the pull request instead of the merge commit.
    - run: git checkout HEAD^2
      if: ${{ github.event_name == 'pull_request' }}
      
    # Initializes the CodeQL tools for scanning.
    - name: Initialize CodeQL
      uses: github/codeql-action/init@v1
      # Override language selection by uncommenting this and choosing your languages
      # with:
      #   languages: go, javascript, csharp, python, cpp, java

    # Autobuild attempts to build any compiled languages  (C/C++, C#, or Java).
    # If this step fails, then you should remove it and run the build manually (see below)
    - name: Autobuild
      uses: github/codeql-action/autobuild@v1

    # ℹ️ Command-line programs to run using the OS shell.
    # 📚 https://git.io/JvXDl

    # ✏️ If the Autobuild fails above, remove it and uncomment the following three lines
    #    and modify them (or add more) to build your code if your project
    #    uses a compiled language

    #- run: |
    #   make bootstrap
    #   make release

    - name: Perform CodeQL Analysis
      uses: github/codeql-action/analyze@v1

Once you commit the new workflow file, the workflow starts automatically.

The workflow initializes CodeQL, compiles your code if any of the languages it detects in your repository require it, and then runs the default set of queries against your code.

In eShopOnContainers, CodeQL found 13 issues. Several are related to hard-coded credentials in the code. Others are in jQuery, because the application uses an older version of jQuery with some known issues.
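To illustrate what a hard-coded-credentials check looks for, here is a heuristic sketch in Python that flags string literals assigned to variables whose names suggest a secret. It is not how CodeQL’s C# query works; the name pattern and the helper `find_hardcoded_secrets` are assumptions made for this example.

```python
import ast
import re

# Assumed naming pattern for variables that probably hold secrets.
SECRET_NAME = re.compile(r"pass(word|wd)?|secret|api_?key|token", re.IGNORECASE)


def find_hardcoded_secrets(source: str) -> list[tuple[str, int]]:
    """Return (variable name, line) for string literals assigned to secret-looking names."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        # Only simple `name = "literal"` assignments are considered here.
        if not (isinstance(node, ast.Assign) and isinstance(node.value, ast.Constant)):
            continue
        if not isinstance(node.value.value, str):
            continue
        for target in node.targets:
            if isinstance(target, ast.Name) and SECRET_NAME.search(target.id):
                findings.append((target.id, node.lineno))
    return findings


# The source is only parsed, never executed, so `os` does not need to be imported.
sample = '''
db_password = "hunter2"
password = os.environ["DB_PASSWORD"]
api_key = "sk-test-123"
timeout = "30"
'''
print(find_hardcoded_secrets(sample))  # [('db_password', 2), ('api_key', 4)]
```

A name-based heuristic like this produces false positives (for example `password_prompt = "Enter password:"` would be flagged) as well as false negatives, which is one reason a semantic engine that understands where a value is actually used does a better job.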

From now on, Code Scanning will scan the application on each push and pull request and report any new issues.

You will probably want to add a configuration file that excludes issues in external libraries like jQuery. You can also specify which query suites to run, and use VS Code to write and tweak your own queries.
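The configuration file itself is YAML in your repository, and CodeQL configuration files support a `paths-ignore` key for excluding directories such as third-party libraries. Conceptually, excluding paths just filters results by file location, as in this Python sketch. It is only an illustration: the alerts, file paths and pattern are hypothetical, and Python’s glob rules (where `*` also matches `/`) differ from the glob syntax CodeQL uses.

```python
from fnmatch import fnmatchcase

# Hypothetical alerts: (rule name, file path).
alerts = [
    ("xss", "web/wwwroot/lib/jquery/jquery.js"),
    ("hardcoded-credentials", "services/identity/Startup.cs"),
    ("unsafe-plugin", "web/wwwroot/lib/jquery/jquery.min.js"),
]

# Hypothetical ignore list, playing the role of `paths-ignore`.
IGNORE_PATTERNS = ["*/lib/jquery/*"]


def keep_alert(path: str, patterns: list[str]) -> bool:
    """Keep an alert unless its file path matches one of the ignore patterns."""
    return not any(fnmatchcase(path, pattern) for pattern in patterns)


remaining = [alert for alert in alerts if keep_alert(alert[1], IGNORE_PATTERNS)]
print(remaining)  # [('hardcoded-credentials', 'services/identity/Startup.cs')]
```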

I’ll discuss those options in future blog posts, but for now: join the beta and enable Code Scanning for your repo!

Questions? Comments? Please reach out!