About incremental analysis
Full CodeQL scans on every pull request can be slow, especially in large codebases. If you run the CodeQL CLI in your own CI/CD system, incremental analysis gives you two ways to speed things up:
- Diff-informed analysis reports only alerts in lines you added or changed, so queries run faster and results are more relevant.
- Overlay analysis reuses a cached database from your default branch instead of building one from scratch, cutting database creation and query evaluation time dramatically.
You can use these features independently or together. For most teams analyzing pull requests in established codebases, we recommend using both: overlay analysis for fast database creation and query evaluation, and diff-informed analysis for focused, relevant results.
If you use code scanning default setup or the codeql-action on GitHub, incremental analysis is already handled automatically. This article is for teams running the CodeQL CLI directly in their own CI/CD infrastructure.
Prerequisites
Before setting up incremental analysis, make sure you meet the following requirements:
- CodeQL CLI bundle version: 2.21.0 or later for diff-informed analysis; 2.23.8 or later for overlay analysis (some languages require later versions; see Minimum CLI bundle versions)
- Source root must be inside a Git repository
- Git version 2.38.0 or later (required for overlay analysis, specifically for the --format option used by git ls-files)
- All files of interest must be tracked by Git (not listed in .gitignore)
- The Git index must accurately reflect the source tree being analyzed
- Build mode: Overlay analysis supports only build-mode: none (traced builds are not supported). Go is an exception: it works with overlay analysis even though it does not explicitly support build-mode: none.
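Because overlay analysis depends on git ls-files --format, a CI script may want to confirm the Git version before choosing overlay mode. The following Python sketch assumes the usual "git version X.Y.Z ..." output; parse_git_version and git_supports_overlay are hypothetical helper names, not part of CodeQL.

```python
import re
import subprocess
from typing import Optional, Tuple

# Minimum Git version for overlay analysis (needed for `git ls-files --format`).
MIN_GIT_FOR_OVERLAY = (2, 38, 0)


def parse_git_version(output: str) -> Optional[Tuple[int, int, int]]:
    """Extract (major, minor, patch) from `git --version` output, or None."""
    # Matches e.g. "git version 2.39.3 (Apple Git-146)" or "git version 2.43.0.windows.1".
    match = re.search(r"git version (\d+)\.(\d+)(?:\.(\d+))?", output)
    if match is None:
        return None
    major, minor, patch = match.groups()
    return int(major), int(minor), int(patch or 0)


def git_supports_overlay() -> bool:
    """Hypothetical helper: True if the local Git is new enough for overlay analysis."""
    output = subprocess.run(
        ["git", "--version"], capture_output=True, text=True, check=True
    ).stdout
    version = parse_git_version(output)
    return version is not None and version >= MIN_GIT_FOR_OVERLAY
```

Tuple comparison handles the version ordering, so (2, 37, 1) sorts below (2, 38, 0) without any string parsing tricks.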
Choosing an approach
| Scenario | Diff-informed | Overlay |
|---|---|---|
| Default branch push | No (not a PR) | overlay-base mode |
| PR analysis (first time, no cache) | Yes | No (run full analysis) |
| PR analysis (with cached base) | Yes | overlay mode |
| Non-PR, non-default branch | No | No |
For complete working examples in various CI systems, see the sample CodeQL pipeline configurations repository.
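The table above maps naturally onto a small decision function in a CI driver script. This is a sketch under assumptions: how you detect a pull request, a default-branch push, or a cached base depends on your CI system, and IncrementalPlan and choose_plan are hypothetical names. The incremental_mode method builds the value for the optional --sarif-run-property=incrementalMode=... flag described later in this article.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class IncrementalPlan:
    diff_informed: bool
    overlay_mode: Optional[str]  # "overlay-base", "overlay", or None (no overlay)

    def incremental_mode(self) -> Optional[str]:
        """Value for --sarif-run-property=incrementalMode=..., or None to omit it."""
        modes = []
        if self.overlay_mode == "overlay":
            modes.append("overlay")
        if self.diff_informed:
            modes.append("diff-informed")
        return ",".join(modes) or None


def choose_plan(is_pull_request: bool, is_default_branch_push: bool,
                has_cached_base: bool) -> IncrementalPlan:
    """Map a CI event onto the 'Choosing an approach' table (hypothetical helper)."""
    if is_default_branch_push:
        # Build and cache a base database for future pull requests.
        return IncrementalPlan(diff_informed=False, overlay_mode="overlay-base")
    if is_pull_request:
        # Diff-informed analysis always applies to PRs; overlay only with a cached base.
        return IncrementalPlan(diff_informed=True,
                               overlay_mode="overlay" if has_cached_base else None)
    # Any other branch: run a normal full analysis.
    return IncrementalPlan(diff_informed=False, overlay_mode=None)
```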
Diff-informed analysis
Diff-informed analysis is an optimization for pull request analysis. Instead of reporting all alerts found in the codebase, it reports only alerts in lines that were added or modified in the pull request diff.
Step 1: Identify the diff ranges
You need the added or modified line ranges from the pull request diff. The input can come from any source (git diff, your CI platform's API, or another mechanism).
For each changed file, produce a list of ranges with the following structure:
- path: Absolute file path (always use forward slashes)
- startLine: 1-based, inclusive start line
- endLine: 1-based, inclusive end line
For example, given this unified diff (generated by git diff):
```diff
--- a/src/utils.ts
+++ b/src/utils.ts
@@ -2,7 +2,6 @@ import { helper } from './helper';
 
 function existing() {
   const x = 1;
-  const unused = 2;
   return x;
 }
 
@@ -14,6 +13,8 @@ function validate(input: string) {
 function process(input: string) {
   // validate
   if (!input) return;
+  const sanitized = input.trim();
+  console.log(sanitized);
   return input;
 }
 
@@ -23,5 +24,5 @@ function format(value: number) {
 
 function render(data: object) {
   const output = JSON.stringify(data);
-  return output;
+  return `<div>${output}</div>`;
 }
```
The resulting diff ranges for src/utils.ts would be:
- ["/path/to/repo/src/utils.ts", 16, 17] (the two inserted lines in the second hunk)
- ["/path/to/repo/src/utils.ts", 27, 27] (the modified line in the third hunk)
The first hunk contains only a deletion, so it produces no range. Note that ranges use the "to" (new file) line numbers, not the "from" (old file) numbers.
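The line-number bookkeeping described above can be sketched in Python. This is a minimal sketch, not the codeql-action implementation: ranges_from_patch is a hypothetical helper that takes one file's unified diff text plus the absolute, forward-slash path you want in the ranges, and it assumes standard "@@ -a,b +c,d @@" hunk headers.

```python
import re
from typing import List, Optional, Tuple

# Matches a hunk header such as "@@ -14,6 +13,8 @@ ..."; the line counts are optional.
HUNK_HEADER = re.compile(r"^@@ -\d+(?:,\d+)? \+(\d+)(?:,\d+)? @@")

DiffRange = Tuple[str, int, int]  # (absolute path, startLine, endLine)


def ranges_from_patch(abs_path: str, patch: str) -> List[DiffRange]:
    """Turn one file's unified diff into ranges of added/modified lines.

    Line numbers refer to the new ("to") version of the file. Deleted lines
    do not advance the new-file counter, so a deletion-only hunk yields no range.
    """
    ranges: List[DiffRange] = []
    in_hunk = False
    new_line = 0                      # new-file line number of the next line read
    run_start: Optional[int] = None   # first line of the current run of "+" lines

    def close_run() -> None:
        nonlocal run_start
        if run_start is not None:
            ranges.append((abs_path, run_start, new_line - 1))
            run_start = None

    for line in patch.splitlines():
        header = HUNK_HEADER.match(line)
        if header:
            close_run()
            new_line = int(header.group(1))
            in_hunk = True
        elif not in_hunk:
            continue  # "diff --git", "index", "---", "+++" file headers
        elif line.startswith("+"):
            if run_start is None:
                run_start = new_line
            new_line += 1
        else:
            close_run()
            if line.startswith(" ") or line == "":
                new_line += 1  # unchanged context line
            # "-" lines and "\ No newline at end of file" do not advance new_line
    close_run()
    return ranges
```

Blank context lines in a unified diff consist of a single space. Some tools strip that space, which is why an empty line is also counted as context here; dropping those lines entirely would shift every later line number.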
Special cases:
- Binary files or very large diffs (no patch content available): Use the sentinel range {path, startLine: 0, endLine: 0} to indicate "entire file."
- Renamed files with no content changes: Return an empty array (no ranges).
- Truncated diffs: If your diff source is incomplete for large pull requests (for example, an API that limits the number of changed files), you should skip diff-informed analysis and run full analysis for that run.
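The special cases above can be layered on top of whatever patch parser you use. In this sketch the per-file tuple (absolute path, patch text or None, renamed-without-changes flag) and the truncated flag are assumptions about what your diff source provides, and ranges_for_file and collect_diff_ranges are hypothetical names. Returning None signals that the caller should fall back to full analysis.

```python
from typing import Callable, List, Optional, Tuple

DiffRange = Tuple[str, int, int]                # (absolute path, startLine, endLine)
PatchParser = Callable[[str, str], List[DiffRange]]
ChangedFile = Tuple[str, Optional[str], bool]   # (abs path, patch or None, renamed_only)


def ranges_for_file(abs_path: str, patch: Optional[str], renamed_only: bool,
                    parse_patch: PatchParser) -> List[DiffRange]:
    """Apply the special cases for a single changed file."""
    if renamed_only:
        return []                       # rename with no content changes: no ranges
    if patch is None:
        return [(abs_path, 0, 0)]       # binary or too large: sentinel for whole file
    return parse_patch(abs_path, patch)


def collect_diff_ranges(files: List[ChangedFile], truncated: bool,
                        parse_patch: PatchParser) -> Optional[List[DiffRange]]:
    """All diff ranges, or None when diff-informed analysis should be skipped."""
    if truncated:
        return None                     # incomplete diff: run full analysis instead
    ranges: List[DiffRange] = []
    for abs_path, patch, renamed_only in files:
        ranges.extend(ranges_for_file(abs_path, patch, renamed_only, parse_patch))
    return ranges
```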
For a reference implementation of diff parsing, see getDiffRanges() in the codeql-action source code.
Step 2: Create a data extension pack
Create a temporary directory containing two files. This extension pack feeds into the restrictAlertsTo extensible predicate defined in the CodeQL standard library.
qlpack.yml:
```yaml
name: my-ci/pr-diff-range
version: 0.0.0
library: true
extensionTargets:
  codeql/util: '*' # Target the codeql/util pack where restrictAlertsTo is defined
dataExtensions:
  - pr-diff-range.yml
```
pr-diff-range.yml:
```yaml
extensions:
  - addsTo:
      pack: codeql/util
      extensible: restrictAlertsTo
      checkPresence: false # Don't error if the predicate doesn't exist in older CLI versions
    data:
      # Each row: [filePath, startLine, endLine]
      - ["/path/to/repo/src/utils.ts", 16, 17]
      - ["/path/to/repo/src/utils.ts", 27, 27]
```
Each data row is [filePath, startLine, endLine], with 1-based line numbers. A row whose startLine and endLine are both 0 matches the whole file.
Important
If the diff has zero added or modified lines (for example, only deletions), you must still provide a non-empty data extension with the sentinel entry ["", 0, 0]. An empty data section would leave the restrictAlertsTo predicate inactive, so all alerts would be produced, which is the opposite of the desired behavior.
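A CI script can generate both files for each run. This is a minimal Python sketch, assuming the ranges are already absolute, forward-slash paths; write_diff_range_pack is a hypothetical helper, and the YAML text mirrors the two files shown above. It writes each row with json.dumps because a JSON array is also a valid YAML flow sequence, which avoids hand-quoting file paths.

```python
import json
import tempfile
from pathlib import Path
from typing import List, Sequence, Tuple

DiffRange = Tuple[str, int, int]  # (absolute path, startLine, endLine)

QLPACK_YML = """\
name: my-ci/pr-diff-range
version: 0.0.0
library: true
extensionTargets:
  codeql/util: '*'
dataExtensions:
  - pr-diff-range.yml
"""


def write_diff_range_pack(ranges: Sequence[DiffRange]) -> Path:
    """Write qlpack.yml and pr-diff-range.yml to a new temp dir and return its path."""
    # An empty data section would leave restrictAlertsTo inactive, so use the
    # sentinel row when the diff has no added or modified lines.
    rows: List[DiffRange] = list(ranges) or [("", 0, 0)]
    data = "".join(f"      - {json.dumps(list(row))}\n" for row in rows)
    extension_yml = (
        "extensions:\n"
        "  - addsTo:\n"
        "      pack: codeql/util\n"
        "      extensible: restrictAlertsTo\n"
        "      checkPresence: false\n"
        "    data:\n" + data
    )
    pack_dir = Path(tempfile.mkdtemp(prefix="pr-diff-range-"))
    (pack_dir / "qlpack.yml").write_text(QLPACK_YML, encoding="utf-8")
    (pack_dir / "pr-diff-range.yml").write_text(extension_yml, encoding="utf-8")
    return pack_dir
```

The returned directory is what you would pass as PATH_TO_EXTENSION_PACK to --additional-packs.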
Step 3: Pass the extension pack to the CodeQL CLI
When running queries, add the following flags to codeql database run-queries:
```shell
codeql database run-queries \
  --additional-packs=PATH_TO_EXTENSION_PACK \
  --extension-packs=my-ci/pr-diff-range \
  PATH_TO_DATABASE \
  QUERIES
```
- --additional-packs tells CodeQL where to find the pack on disk. For more information, see Running queries in the database.
- --extension-packs tells CodeQL to load the named extension pack.
Step 4: Exclude diagnostic queries
When using diff-informed analysis, you should exclude queries tagged with exclude-from-incremental. These are diagnostic queries (for example, metrics or code coverage) that do not produce alerts, so they provide no value in an incremental context but still consume resources.
You can add this to your code scanning configuration file:
```yaml
query-filters:
  - exclude:
      tags: exclude-from-incremental
```
Alternatively, create a query suite file (.qls) that excludes those queries:
```yaml
- description: Pull request queries for Java
- import: codeql-suites/java-code-scanning.qls
  from: codeql/java-queries
- exclude:
    tags contain: exclude-from-incremental
```
For more information, see Workflow configuration options for code scanning.
Step 5: Filter the SARIF output
After CodeQL generates the SARIF file, you must filter the output on the CI side to remove results whose locations fall outside the diff ranges.
For each result in the SARIF, check whether any of its locations or relatedLocations intersect with a diff range for that file. A location intersects a range when range.startLine <= location.endLine and location.startLine <= range.endLine. The special case range.startLine == range.endLine == 0 matches any location in the file. Make sure SARIF artifact locations are resolved to the same absolute path format used in the diff ranges before comparing.
The restrictAlertsTo predicate permits but does not guarantee that queries omit out-of-range alerts, so CI-side filtering is required for stable results.
For a reference implementation of SARIF filtering, see filterAlertsByDiffRange() in the codeql-action source code.
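The filtering rule above can be sketched in Python over the parsed SARIF JSON. This is a sketch under assumptions, not filterAlertsByDiffRange itself: it assumes each artifactLocation.uri is a percent-encoded path relative to the source root (as with the %SRCROOT% base), that source_root is an absolute, forward-slash path, and that the ranges use the same format as the extension pack. filter_sarif is a hypothetical name.

```python
import posixpath
from typing import Any, Dict, List, Tuple
from urllib.parse import unquote

DiffRange = Tuple[str, int, int]  # (absolute path, startLine, endLine)


def _location_matches(phys: Dict[str, Any], source_root: str,
                      ranges: List[DiffRange]) -> bool:
    uri = phys.get("artifactLocation", {}).get("uri")
    region = phys.get("region", {})
    start = region.get("startLine")
    if uri is None or start is None:
        return False
    end = region.get("endLine", start)  # SARIF: endLine defaults to startLine
    # Assumption: uri is relative to the source root (not an absolute file:// URI).
    path = posixpath.normpath(posixpath.join(source_root, unquote(uri)))
    for r_path, r_start, r_end in ranges:
        if r_path != path:
            continue
        if r_start == 0 and r_end == 0:
            return True                 # whole-file sentinel
        if r_start <= end and start <= r_end:
            return True                 # line ranges intersect
    return False


def filter_sarif(sarif: Dict[str, Any], source_root: str,
                 ranges: List[DiffRange]) -> Dict[str, Any]:
    """Drop results none of whose locations or relatedLocations touch a diff range."""
    for run in sarif.get("runs", []):
        if "results" not in run:
            continue
        kept = []
        for result in run["results"]:
            locs = result.get("locations", []) + result.get("relatedLocations", [])
            if any(_location_matches(loc.get("physicalLocation", {}), source_root, ranges)
                   for loc in locs):
                kept.append(result)
        run["results"] = kept
    return sarif
```

With the sentinel ["", 0, 0] as the only range, no path matches, so every result is dropped, which is the expected outcome for a diff with no added or modified lines.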
Summary of CLI flags for diff-informed analysis
| CLI command | Flag | Purpose |
|---|---|---|
| codeql database init | --codescanning-config=FILE | Code scanning configuration file (for query filter) |
| codeql database run-queries | --additional-packs=DIR | Location of the extension pack |
| codeql database run-queries | --extension-packs=my-ci/pr-diff-range | Name of the extension pack to load |
| codeql database interpret-results | --sarif-run-property=incrementalMode=diff-informed | (Optional) Tag SARIF with diff-informed metadata |
Overlay analysis
Overlay analysis speeds up CodeQL database creation and query evaluation for pull requests by building on top of a pre-existing "base" database:
- On the default branch: Build an "overlay-base" database (a full database with cached intermediate results). This can be any long-lived branch that pull requests target.
- On pull requests: Download the cached overlay-base database, then create a lightweight "overlay" database that only processes the changed files.
Overlay-base mode (default branch)
Run overlay-base mode on your default or long-lived target branch after each merge to create and cache a base database.
1. Initialize the database with --overlay-base
```shell
codeql database init \
  --overlay-base \
  --db-cluster \
  PATH_TO_DATABASE \
  --source-root=PATH_TO_SOURCE \
  --language=LANGUAGE
```
The --overlay-base flag tells CodeQL to build a database that can serve as a base for future overlay analysis.
2. Build and extract as normal
Run any build steps and extraction as you normally would for your project.
3. Record file OIDs
After extraction completes, record the Git object IDs (OIDs) of all tracked files under the source root. Run this command from your source root directory (PATH_TO_SOURCE). This snapshot is used later to determine which files changed.
```shell
cd PATH_TO_SOURCE && git ls-files --recurse-submodules --format='%(objectname)_%(path)'
```
Parse this output into a JSON map of { "relative/path": "git-oid" } and store it alongside the database. The output includes files in Git submodules, which overlay analysis needs to accurately track all file changes between the base and the overlay.
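One way to build that map is sketched below in Python. It runs the documented command and assumes Git's default path quoting (core.quotePath), under which paths with unusual characters are printed in double quotes with C-style and octal escapes. decode_git_path, parse_file_oids, and record_file_oids are hypothetical helper names, and the output file name is up to you.

```python
import json
import subprocess
from pathlib import Path
from typing import Dict

# C-style escapes Git may use inside quoted paths, mapped to byte values.
_ESCAPES = {"a": 7, "b": 8, "t": 9, "n": 10, "v": 11, "f": 12, "r": 13,
            '"': 34, "\\": 92}


def decode_git_path(path: str) -> str:
    """Undo Git's quoting of unusual paths, e.g. "caf\\303\\251.ts" -> café.ts."""
    if not (len(path) >= 2 and path.startswith('"') and path.endswith('"')):
        return path
    inner, out, i = path[1:-1], bytearray(), 0
    while i < len(inner):
        if inner[i] == "\\":
            nxt = inner[i + 1]
            if nxt in "01234567":
                out.append(int(inner[i + 1:i + 4], 8))  # three-digit octal byte
                i += 4
            else:
                out.append(_ESCAPES[nxt])
                i += 2
        else:
            out.extend(inner[i].encode("utf-8"))
            i += 1
    return out.decode("utf-8", errors="surrogateescape")


def parse_file_oids(ls_files_output: str) -> Dict[str, str]:
    """Parse '%(objectname)_%(path)' lines into {relative path: OID}."""
    oids: Dict[str, str] = {}
    for line in ls_files_output.splitlines():
        if not line:
            continue
        # OIDs are hex and never contain "_", so split at the first underscore.
        oid, path = line.split("_", 1)
        oids[decode_git_path(path)] = oid
    return oids


def record_file_oids(source_root: Path, output_file: Path) -> Dict[str, str]:
    """Run the git ls-files command above from source_root and save the map as JSON."""
    result = subprocess.run(
        ["git", "ls-files", "--recurse-submodules", "--format=%(objectname)_%(path)"],
        cwd=source_root, capture_output=True, text=True, check=True,
    )
    oids = parse_file_oids(result.stdout)
    output_file.write_text(json.dumps(oids, indent=2, sort_keys=True), encoding="utf-8")
    return oids
```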
4. Run queries and preserve the cache
When running queries on an overlay-base database, do not pass --expect-discarded-cache. The cached intermediate results are what make pull request builds fast; discarding them would force full re-evaluation on every PR.
5. Clean up and cache the database
After analysis, clean up the database using the overlay cleanup level:
```shell
codeql database cleanup PATH_TO_DATABASE --cache-cleanup=overlay
```
The overlay cleanup level preserves more cached data than the default clear level. Overlay mode reuses this cached data for efficient query evaluation on pull requests, so discarding it would eliminate the performance benefit.
Then store the database (including the OIDs file) in your caching system for later retrieval by pull request builds.
Overlay mode (pull requests)
Run overlay mode on pull request builds to create a lightweight database on top of the cached base. If no compatible overlay-base database is available in the cache (for example, on the first run or after a CodeQL CLI version upgrade), skip --overlay-changes and run a normal full analysis instead. Cache keys should include at least the CodeQL CLI version and language set to avoid incompatible base databases.
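As an illustration of the cache-key advice, the Python sketch below derives a key from the CLI version and the sorted set of languages. The key layout, the prefix, and the optional base_commit suffix are assumptions, not a CodeQL convention: appending the commit lets each default-branch run save a new entry, and a pull request build could then restore the most recent one by prefix if your cache system supports prefix matching. overlay_base_cache_key is a hypothetical name.

```python
from typing import Iterable, Optional


def overlay_base_cache_key(cli_version: str, languages: Iterable[str],
                           base_commit: Optional[str] = None,
                           prefix: str = "codeql-overlay-base") -> str:
    """Cache key that changes whenever the CLI version or language set changes."""
    # Sort and de-duplicate so ["python", "java"] and ["java", "python"] agree.
    key = f"{prefix}-{cli_version}-{'+'.join(sorted(set(languages)))}"
    # Optional suffix: save one entry per base commit, restore by the prefix above.
    return f"{key}-{base_commit}" if base_commit else key
```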
1. Download the cached overlay-base database
Retrieve the most recent overlay-base database from your cache. The database should include the OIDs file recorded during overlay-base mode.
2. Compute changed files
Compare the OIDs recorded in the base database with the current Git state. Run this command from the same source root directory (PATH_TO_SOURCE) used during overlay-base mode:
```shell
cd PATH_TO_SOURCE && git ls-files --recurse-submodules --format='%(objectname)_%(path)'
```
Compare the two maps to find files that were added, removed, or modified (different OID). Write the result as a JSON file:
```json
{
  "changes": ["src/modified-file.ts", "src/new-file.ts", "src/deleted-file.ts"]
}
```
The file paths must be relative to the source root.
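The comparison can be sketched in a few lines of Python, assuming both OID maps use paths relative to the same source root (as produced by the git ls-files command). changed_files and write_overlay_changes are hypothetical names; the output matches the JSON shape shown above.

```python
import json
from pathlib import Path
from typing import Dict, List


def changed_files(base_oids: Dict[str, str], current_oids: Dict[str, str]) -> List[str]:
    """Relative paths that were added, removed, or modified since the base."""
    all_paths = base_oids.keys() | current_oids.keys()
    # A path missing from one map yields None there, so added and removed files
    # compare unequal, just like files whose OID changed.
    return sorted(p for p in all_paths if base_oids.get(p) != current_oids.get(p))


def write_overlay_changes(base_oids_file: Path, current_oids: Dict[str, str],
                          output_file: Path) -> List[str]:
    """Write the {"changes": [...]} file to pass to --overlay-changes."""
    base_oids = json.loads(base_oids_file.read_text(encoding="utf-8"))
    changes = changed_files(base_oids, current_oids)
    output_file.write_text(json.dumps({"changes": changes}, indent=2), encoding="utf-8")
    return changes
```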
3. Initialize the database with --overlay-changes
Run codeql database init against the restored overlay-base database directory. PATH_TO_DATABASE must point to the restored cached overlay-base database, not a new empty directory, because the command extends the existing base for the pull request analysis.
```shell
codeql database init \
  --overlay-changes=PATH_TO_OVERLAY_CHANGES_JSON \
  --db-cluster \
  PATH_TO_DATABASE \
  --source-root=PATH_TO_SOURCE \
  --language=LANGUAGE
```
Important
In overlay mode, do not pass --overwrite or --force-overwrite. You are building on top of the existing cached base database, not replacing it.
4. Build, extract, and run queries as normal
Proceed with build, extraction, and query execution as normal. You can add the --sarif-run-property flag to your existing codeql database interpret-results command to tag the SARIF output with overlay metadata:
```shell
codeql database interpret-results \
  --format=sarif-latest \
  --output=results.sarif \
  --sarif-run-property=incrementalMode=overlay \
  PATH_TO_DATABASE \
  QUERIES_OR_SUITES
```
If both overlay and diff-informed analysis are active, use incrementalMode=overlay,diff-informed.
Alerts from incremental analysis appear in the pull request's code scanning results the same way as alerts from full scans. Any overlay-base database will work regardless of age, but fresher bases produce faster and more accurate results.
As with diff-informed analysis, exclude queries tagged exclude-from-incremental when using overlay mode. For details, see Step 4: Exclude diagnostic queries.
Summary of CLI flags for overlay analysis
| CLI command | Flag | Mode | Purpose |
|---|---|---|---|
| codeql database init | --codescanning-config=FILE | overlay | Code scanning configuration file (for query filter) |
| codeql database init | --overlay-base | overlay-base | Build a base database for future overlay use |
| codeql database init | --overlay-changes=FILE | overlay | Build overlay database using only changed files |
| codeql database init | (no --overwrite) | overlay | Don't overwrite the cached base database |
| codeql database run-queries | (no --expect-discarded-cache) | overlay-base | Preserve cached intermediate results |
| codeql database cleanup | --cache-cleanup=overlay | overlay-base | Use overlay-specific cleanup level |
| codeql database interpret-results | --sarif-run-property=incrementalMode=overlay | overlay | Tag SARIF with overlay metadata |
Minimum CLI bundle versions
The base minimum version for overlay analysis is 2.23.8. Some languages require higher minimum versions:
| Language | Minimum CodeQL CLI bundle version |
|---|---|
| C/C++ | 2.25.0 |
| C# | 2.24.1 |
| Go | 2.24.2 |
| Java | 2.23.8 |
| JavaScript | 2.23.9 |
| Python | 2.23.9 |
| Ruby | 2.23.9 |
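A pipeline can check this table before attempting overlay mode. The Python sketch below encodes the minimums above; it assumes plain X.Y.Z version strings and uses CodeQL language identifiers (cpp, csharp, and so on) as keys, treating any language not in the table as unsupported. supports_overlay is a hypothetical name.

```python
from typing import Tuple

BASE_OVERLAY_MINIMUM = "2.23.8"
# Per-language minimum CodeQL CLI bundle versions for overlay analysis.
OVERLAY_MINIMUMS = {
    "cpp": "2.25.0", "csharp": "2.24.1", "go": "2.24.2", "java": "2.23.8",
    "javascript": "2.23.9", "python": "2.23.9", "ruby": "2.23.9",
}


def _version(v: str) -> Tuple[int, ...]:
    """Turn "2.23.9" into (2, 23, 9) so versions compare numerically."""
    return tuple(int(part) for part in v.split("."))


def supports_overlay(cli_version: str, language: str) -> bool:
    """True if this CLI bundle version meets the overlay minimum for the language."""
    minimum = OVERLAY_MINIMUMS.get(language)
    if minimum is None:
        return False  # language not in the table: assume overlay is unsupported
    required = max(_version(minimum), _version(BASE_OVERLAY_MINIMUM))
    return _version(cli_version) >= required
```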