codedelta.app

CodeDelta

Source code change metrics, AI generation detection, and agent scan.

Version: 1.8.0 Beta (Build 21)
Released: 6 June 2026
Guide: v1.8.0 · 6 June 2026
Platforms: macOS · Linux · Windows
Contents
01 What CodeDelta Measures
02 Metrics Reference
03 GUI Guide
04 Command Line Reference
05 AI Audit (Code Scan + Agent Scan)
06 Agent Scan (AIS)
07 CI/CD Integration
08 Accuracy & Methodology
09 Licensing
10 File Certification
11 File Integrity
12 Versioning & File Control
01

What CodeDelta Measures and How

CodeDelta is a command-line source code metrics tool with a browser-based GUI. It compares two snapshots of a source project — an old version and a new version — and measures exactly what changed, how much, and how complex. It is a professional-grade source code analysis tool for macOS, Linux, and Windows.

Core Principle

If your source files are constantly being changed, the source base is not stable and is not ready for release. CodeDelta makes this visible. Every build, every sprint, every release — run CodeDelta and watch the churn numbers. A healthy codebase shows decreasing churn as a release approaches. A codebase that remains "hot and active" near release is a risk.

Key Features

The Difference Between SLOC and LLOC

SLOC (Source Lines of Code) is a physical count — the number of non-empty, non-comment lines in the file. Two statements on one line = 1 SLOC.

LLOC (Logical Lines of Code) is a semicolon count — the number of logical statements, excluding those inside comments or string literals. One statement spread over three lines = 1 LLOC. This is invariant to formatting and is the preferred metric for measuring real code change.

int i=0; float j=0;  // This is 2 LLOC, 1 SLOC

cout << "Testing"
     << " Hello"
     << " there.";   // This is 1 LLOC, 3 SLOC
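The distinction can be sketched in a few lines of Python. This is an illustrative counter for C-like sources only, not CodeDelta's lexer; the function names `strip_c_like` and `count_sloc_lloc` are invented for this example, and it ignores edge cases such as raw strings and line continuations.

```python
def strip_c_like(src: str) -> str:
    """Drop comments and empty out string/char literals, keeping newlines."""
    out = []
    i, n = 0, len(src)
    while i < n:
        ch = src[i]
        nxt = src[i + 1] if i + 1 < n else ""
        if ch == "/" and nxt == "/":
            # End-of-line comment: skip up to (not including) the newline.
            while i < n and src[i] != "\n":
                i += 1
        elif ch == "/" and nxt == "*":
            # Block comment: skip it but preserve the line structure.
            end = src.find("*/", i + 2)
            end = n if end == -1 else end + 2
            out.append("\n" * src.count("\n", i, end))
            i = end
        elif ch in "\"'":
            # String or character literal: keep the quotes, drop the content.
            quote = ch
            out.append(quote)
            i += 1
            while i < n and src[i] != quote:
                i += 2 if src[i] == "\\" else 1  # skip escaped characters
            out.append(quote)
            i += 1
        else:
            out.append(ch)
            i += 1
    return "".join(out)


def count_sloc_lloc(src: str) -> tuple[int, int]:
    """Return (SLOC, LLOC): non-empty code lines and semicolons in code."""
    code = strip_c_like(src)
    sloc = sum(1 for line in code.splitlines() if line.strip())
    lloc = code.count(";")
    return sloc, lloc


print(count_sloc_lloc("int i=0; float j=0;  // two statements"))  # (1, 2)
```

Because semicolons inside comments and literals are removed before counting, a line such as `char c = ';';` contributes one logical line, not two.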

What Gets Measured

CodeDelta counts all source files recursively under the base directory, maps each file to a language by extension, and counts metrics per file. For comparisons, files are matched by relative path. A file present in the new project but not the old is Added (N). A file in the old but not the new is Deleted (D). A file present in both is either Changed (C) or Unchanged (U).
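The matching rule described above can be sketched as follows. This is a simplified illustration, not CodeDelta's implementation: it compares raw file bytes, skips the extension-to-language mapping and directory exclusions, and the names `snapshot` and `classify_files` are hypothetical.

```python
from pathlib import Path


def snapshot(root) -> dict[str, bytes]:
    """Map each file's path relative to root onto its content."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): p.read_bytes()
        for p in root.rglob("*")
        if p.is_file()
    }


def classify_files(old_root, new_root) -> dict[str, str]:
    """Assign N, D, C or U to every relative path seen in either snapshot."""
    old, new = snapshot(old_root), snapshot(new_root)
    status = {}
    for rel in sorted(old.keys() | new.keys()):
        if rel not in old:
            status[rel] = "N"   # only in the new snapshot: added
        elif rel not in new:
            status[rel] = "D"   # only in the old snapshot: deleted
        elif old[rel] != new[rel]:
            status[rel] = "C"   # in both, content differs
        else:
            status[rel] = "U"   # in both, content identical
    return status
```

Matching on relative path is what makes a moved file look like one deletion plus one addition unless a separate rename check intervenes.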

Supported Languages

CodeDelta supports over 30 languages out of the box. Full two-pass LLOC (Logical Lines of Code) is computed for C/C++, C#, Java, JavaScript/TypeScript, Python, PHP, SQL, Ruby, Perl, and Shell. For Ada, Assembly, CSS, Fortran, IDL, PowerBuilder, PL/SQL, VHDL, VB, XML, and Windows Batch, SLOC and LLOC are equal (no separate logical-line lexer). HTML, ASP, and JSP use the HTML lexer (SLOC only).

See the complete extension-to-language reference in section 03 (GUI Guide → Extension Overrides) for which file extensions auto-detect to which language. Custom mappings can be added without recompiling via the Extension Overrides field.

02

Metrics Reference

CodeDelta computes the source code metrics most commonly used in software-engineering analysis: SLOC, LLOC, comment counts, file counts, and per-language breakdowns. The tables below list the full set.

Size Metrics

| Code | Name | Description |
|---|---|---|
| LOC | Lines of Code | Total lines including whitespace and comments. A count of newlines. |
| SLOC | Source Lines of Code | Non-empty, non-comment lines. Physical line count of actual source. |
| LLOC | Logical Lines of Code | Semicolon count excluding those in comments or string literals. Statement count. |
| PLOC | Preprocessor Directive LOC | Lines beginning with # (#include, #define, #ifdef etc.). C/C++ and C# only. |
| COM_LOC | Comment Lines of Code | Total comment lines = J_COM + C_COM + EOL_COM. |
| J_COM | Java-style Comments | /** ... */ block comment count. |
| C_COM | C-style Comments | /* ... */ block comment count. |
| EOL_COM | End-of-Line Comments | // to end-of-line comment count. |
| BYTES | File Size | File size in bytes from the filesystem. |
| NFILE | Number of Files | Total number of source files in the project. |

Change Metrics (per file and project)

| Code | Name | Description |
|---|---|---|
| CHG_SLOC | Changed SLOC | Source lines that exist in both files but differ. |
| DEL_SLOC | Deleted SLOC | Source lines in the old file not present in the new. |
| ADD_SLOC | Added SLOC | Source lines in the new file not present in the old. |
| CRN_SLOC | Churn SLOC | CHG_SLOC + DEL_SLOC + ADD_SLOC. Total change volume. |
| CHG_LLOC | Changed LLOC | Logical lines that exist in both but differ. The primary metric. |
| DEL_LLOC | Deleted LLOC | Logical lines in old not in new. |
| ADD_LLOC | Added LLOC | Logical lines in new not in old. |
| CRN_LLOC | Churn LLOC | CHG_LLOC + DEL_LLOC + ADD_LLOC. |

File Churn Metrics (project level only)

| Code | Name | Description |
|---|---|---|
| CHG_FILE | Changed Files | Number of files present in both snapshots that changed. |
| DEL_FILE | Deleted Files | Number of files in the old snapshot not in the new. |
| ADD_FILE | Added Files | Number of files in the new snapshot not in the old. |
| CRN_FILE | Churn Files | CHG_FILE + DEL_FILE + ADD_FILE. |

File Status Codes

| Status | Meaning |
|---|---|
| C | Changed — file exists in both snapshots and content differs |
| N | New — file exists only in the new snapshot (added) |
| D | Deleted — file exists only in the old snapshot (removed) |
| U | Unchanged — file exists in both and content is identical |
| O | Old — CSV only: old metrics row for a changed file |
| X | Diff — CSV only: arithmetic difference row for a changed file |

Two-Pass LLOC Algorithm

LLOC is computed in two passes. Pass 1 strips all comments and string literals, then counts semicolons. Pass 2 identifies function boundaries using brace counting and signature detection, allowing CodeDelta to attribute LLOC to specific functions. The two-pass approach resolves the common ambiguity where a single logical statement spans multiple physical lines.

When LLOC diverges significantly from SLOC (SLOC/LLOC ratio > 3.0 or < 0.3), CodeDelta flags the file with an LLOC warning in the report. This usually indicates unusual formatting: very long chained expressions, or highly condensed single-line code.

03

GUI Guide

Start the GUI server from the Terminal:

python3 /Applications/CodeDelta/codedelta_server.py

Then open http://localhost:7654 in your browser. The GUI runs entirely locally — no internet connection required, no data leaves your machine.

Analysis Mode Selector

The Run Analysis tab is split into two columns. Pick whichever fits the work you're doing:

Compare Projects (left column) — requires Old and New directories

| Mode | What it does |
|---|---|
| Code Churn | Standard SLOC/LLOC churn metrics between two snapshots. The classic CodeDelta function. |
| Code Churn + AI Audit | Churn metrics PLUS AI Audit on the new snapshot. Single pass, three reports. |

Single Project (right column) — requires only one directory

| Mode | What it does |
|---|---|
| Single Project Code Churn | Per-file SLOC, LLOC, comment counts on one snapshot. No comparison. |
| AI Audit | Runs BOTH AI Code Scan and Agent Scan, producing TWO reports side by side. Use this when you want a complete AI assessment. |
| AI Code Scan | Just the GSS + MLS → AIC analysis. Produces codedelta_ai_code_scan.html with an inline source viewer that highlights flagged lines. |
| AI Agent Scan | Just the AIS (Agent Initiation Signature) analysis. Produces codedelta_agent_scan.html for security review. |

Run Analysis Fields

| Field | Description |
|---|---|
| Old Project | Path to the previous version of the source code. |
| New Project | Path to the current version of the source code. |
| Exclude Directories | Comma-separated directory names to skip (e.g. vendor,tests,node_modules). New in v1.4.0. |
| Extension Overrides | Map non-standard extensions to existing language parsers. Comma-separated ext=lang pairs (e.g. h2=cpp,inc=php,ksh=sh). New in v1.4.2. |
| HTML Report | Output path for the HTML report. |
| Database | Path to the SQLite database file. Accumulates all runs for trend analysis. Defaults: ~/Library/Application Support/CodeDelta/codedelta.db on macOS, ~/.local/share/CodeDelta/codedelta.db on Linux, %APPDATA%\CodeDelta\codedelta.db on Windows. Survives reinstalls. |
| Project Name | A label for this project, shown in reports and the History tab. |
| Old Label | A label for the old snapshot (e.g. v1.0, sprint-13). |
| New Label | A label for the new snapshot (e.g. v1.1, sprint-14). |
| Snapshot Date | The date the source was extracted (YYYY-MM-DD). Stored in the DB and shown in reports. New in v1.4.0. |
| Metric Set | Optional: select a named metric set to filter which columns appear in the report and CSV. New in v1.4.0. |

Metric Sets

A Metric Set is a named subset of metric columns. When active, only the selected metrics appear in the HTML report table and CSV output. This is useful for management reports that only need CHG_LLOC and ADD_LLOC, without the full detail of every metric.

To create a Metric Set:

  1. Click Manage Sets next to the Metric Set dropdown.
  2. Enter a name for the set (no spaces — underscores are fine).
  3. Tick the metric codes you want to include.
  4. Click Save Set.

The set is saved in the database and appears in the dropdown for future runs. To edit an existing set, click Edit beside it — it loads into the editor. To delete, click Delete.

To run without a metric set filter, leave the dropdown on — All metrics (no filter) —.

Available metric codes for sets: LOC, SLOC, LLOC, PLOC, COM_LOC, J_COM, C_COM, EOL_COM, Bytes, CHG_SLOC, DEL_SLOC, ADD_SLOC, CRN_SLOC, CHG_LLOC, DEL_LLOC, ADD_LLOC, CRN_LLOC, CHG_COM, DEL_COM, ADD_COM, CRN_COM, CHG_FILE, DEL_FILE, ADD_FILE, CRN_FILE

Excluded Directories

Enter a comma-separated list of directory names to exclude from both the old and new scans. For example: vendor,tests,third_party,generated. The names are matched against each path component — you do not need to provide full paths.

Built-in excluded directories (always skipped): .git, .svn, .hg, node_modules, __pycache__, vendor, dist, build, Debug, Release

Excluded directories are stored in the database run record and shown in the report header, so you always know what was and wasn't included in a historical run.
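Component-wise matching can be sketched as below. This is an assumption-laden illustration rather than CodeDelta's code: the helper names `parse_exclude` and `is_excluded` are invented, and the sketch checks only the directory components of a relative path, not the file name itself.

```python
from pathlib import PurePosixPath

# Built-in exclusions as listed in this guide.
BUILT_IN_EXCLUDES = {
    ".git", ".svn", ".hg", "node_modules", "__pycache__",
    "vendor", "dist", "build", "Debug", "Release",
}


def parse_exclude(spec: str) -> set[str]:
    """Turn 'vendor, tests,generated' into a set of directory names."""
    return {name.strip() for name in spec.split(",") if name.strip()}


def is_excluded(rel_path: str, extra=frozenset()) -> bool:
    """True if any directory component of rel_path is an excluded name."""
    names = BUILT_IN_EXCLUDES | set(extra)
    directories = PurePosixPath(rel_path).parent.parts
    return any(part in names for part in directories)


print(is_excluded("src/vendor/lib/util.c"))                        # True
print(is_excluded("tests/unit/test_a.py", parse_exclude("tests")))  # True
print(is_excluded("src/main.c"))                                   # False
```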

Extension Overrides

Some organisations use non-standard file extensions — for example .h2 for C++ headers, .inc for PHP include files, or .ksh for shell scripts. By default CodeDelta ignores files with unrecognised extensions. Extension overrides let you map any extension to an existing language parser without recompiling.

Enter a comma-separated list of ext=lang pairs in the Extension Overrides field:

h2=cpp,inc=php,ksh=sh

The target language can be any known extension (cpp, py, java, sh etc.) or a language name. Unknown target languages are silently ignored. The override takes effect before directory scanning, so all file matching uses the updated map.
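A minimal sketch of how such a list could be parsed, assuming a small hypothetical extension map (`KNOWN_EXTENSIONS`) and an invented helper name (`apply_overrides`). It handles only targets given as known extensions, not language names, and drops unknown targets silently as described above.

```python
# Hypothetical subset of the built-in extension-to-language map.
KNOWN_EXTENSIONS = {
    "c": "C / C++", "cpp": "C / C++", "py": "Python",
    "java": "Java", "php": "PHP", "sh": "Shell",
}


def apply_overrides(spec: str, ext_map: dict[str, str]) -> dict[str, str]:
    """Return a copy of ext_map extended with ext=lang pairs from spec."""
    result = dict(ext_map)
    for pair in spec.split(","):
        ext, sep, target = pair.strip().partition("=")
        ext = ext.strip().lstrip(".").lower()
        target = target.strip().lower()
        # Skip malformed pairs and unknown targets without raising.
        if sep and ext and target in ext_map:
            result[ext] = ext_map[target]
    return result


mapping = apply_overrides("h2=cpp,inc=php,ksh=sh,foo=cobol", KNOWN_EXTENSIONS)
print(mapping["h2"], mapping["inc"], mapping["ksh"])  # C / C++ PHP Shell
print("foo" in mapping)                               # False
```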

Complete extension reference

| Language | Auto-detected extensions |
|---|---|
| C / C++ | c h cpp hpp cc cxx |
| C# | cs |
| Java | java |
| JavaScript / TypeScript | js jsx ts tsx |
| Python | py pyw |
| PHP | php |
| Ruby | rb |
| Perl | pl pm |
| Shell | sh ash bash bsh csh tcsh tsh zsh |
| SQL | sql |
| SQL (Oracle/PL-SQL) | pls pks pkb |
| Visual Basic | vb bas cls frm |
| VBScript | vbs |
| Windows Batch | bat cmd |
| Ada | ada adb ads |
| Fortran | f f90 f95 for |
| Assembly | asm s |
| VHDL | vhd vhdl |
| HTML | html htm htp |
| ASP | asp aspx |
| JSP | jsp |
| CSS | css |
| XML | xml xsd xsl xslt wsml |
| IDL | idl |
| Symbian MMP | mmp |
| PowerBuilder | srd srf srs sru srw |
| Smalltalk | st |
| Eiffel | e |
| Lisp | lisp lsp cl scm el |
| μCode | uc |
| Text | txt tsv cvs install readme |

When mapping a custom extension via the override field, the target side of the ext=lang pair can be any of the codes above (e.g. cpp, py, java, sh, f90). Only the destination must already be known to CodeDelta; unknown destinations are silently ignored.

Results Tiles

After a run, results appear as coloured tiles in four groups:

History Tab

Shows all previous runs stored in the selected database. Columns include run date, snapshot date, project, old/new labels, file counts, SLOC/LLOC totals, and churn metrics. Click any row to open its HTML report.

Trend Tab

Charts CRN_SLOC and CRN_LLOC over time for a selected project. A declining trend indicates the codebase stabilising toward release. An increasing trend near a release deadline is a quality risk.

04

Command Line Reference

Basic Usage

codedelta <old-dir> <new-dir> [options]

Full Options

| Flag | Description |
|---|---|
| -o, --output <file> | HTML report output path (default: codedelta_report.html) |
| --csv <file> | Also write a CSV report |
| --xml <file> | Also write a structured XML report. Contains all metrics including PLOC. |
| -d, --db <file> | SQLite database path (default: codedelta.db) |
| --project <name> | Project name for the database and report |
| --old-label <label> | Label for the old snapshot |
| --new-label <label> | Label for the new snapshot |
| --note <text> | Freeform note stored in the database run record |
| --exclude <dirs> | Comma-separated directory names to skip (e.g. vendor,tests). New in v1.4.0. |
| --snapshot-date <date> | Date source was extracted, YYYY-MM-DD. New in v1.4.0. |
| --metric-set <codes> | Comma-separated metric codes to include in report/CSV. New in v1.4.0. |
| --ext <pairs> | Map non-standard extensions to language parsers. Comma-separated ext=lang pairs, e.g. h2=cpp,inc=php. New in v1.4.2. |
| --threshold-churn <n> | Exit with code 2 if CRN_LLOC exceeds n (for CI gates) |
| -v, --verbose | Verbose output |
| -q, --quiet | Suppress all stdout output |
| -V, --version | Print version and release notes |
| -h, --help | Print usage |

Exit Codes

| Code | Meaning |
|---|---|
| 0 | Success |
| 1 | Error (invalid arguments, missing directory, etc.) |
| 2 | Churn threshold exceeded (--threshold-churn) |

CSV Output Format

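The CSV has one header row and one or more data rows per file. A changed file produces three rows: its C row, an O row carrying the old metrics, and an X row carrying the arithmetic difference (see File Status Codes in section 02).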

The CSV output format is compatible with standard spreadsheet tools. Unchanged and deleted files have one row only. The file ends with a TOTAL row and a NFILE summary row.

If a Metric Set is active via --metric-set, only the specified columns appear in the CSV header and data rows.

Example: Nightly Build Integration

#!/bin/bash
codedelta /builds/project-v1.1 /builds/project-v1.2 \
  --project "MyApp" \
  --old-label "v1.1" \
  --new-label "v1.2" \
  --snapshot-date "$(date +%Y-%m-%d)" \
  --exclude "vendor,node_modules,generated" \
  --db /var/lib/codedelta/myapp.db \
  -o /var/www/reports/myapp-latest.html \
  --csv /var/www/reports/myapp-latest.csv \
  --threshold-churn 5000 \
  -q

if [ $? -eq 2 ]; then
  echo "ALERT: churn exceeds threshold — build is hot"
  exit 1
fi

Example: Metric Set for Management Report

codedelta /src/old /src/new \
  --metric-set "SLOC,LLOC,CHG_LLOC,ADD_LLOC,CRN_LLOC,CHG_FILE,CRN_FILE" \
  -o management_report.html \
  --csv management_report.csv
05

AI Code Scan & AI Audit

About CodeDelta's AI Detection Features — Please Read

CodeDelta is built on 20 years of expertise in code-churn measurement across huge legacy codebases deployed globally. Its core purpose is to measure changed, added, and deleted SLOC and LLOC across massive, multi-language projects — and that is where its proven value lies.

It seemed natural to extend this into the emerging area of AI-generated-code and AI-agent detection. We researched this seriously, using both a heuristic approach — GSS (Generation Signature Scoring) — and a more sophisticated machine-learning approach — MLS (Machine Learning Scoring) — the latter trained on datasets of known AI- and human-written code.

Our honest finding: AI-generated code can be detected only unreliably. It is possible to identify signals and properties that suggest code was AI-generated, but no technique we assessed does so dependably. In particular, if a developer instructs an AI to write code in the style of a human, every detection method we tried or uncovered proved ultimately unreliable. The differences that remain detectable are largely cosmetic — commenting and formatting habits — which are easily changed and are not proof of authorship.

For that reason, the AI Scan is a pointer toward code that may warrant a closer look — nothing more. It is not a determination of authorship, and it carries a meaningful false-positive rate on code unlike its reference data. Treat every flag as a prompt to review, never as evidence.

We also added a churn metric, Replacement Churn, on the observation that AI tools often delete and replace whole blocks rather than editing in place. This too is only a pointer: it reflects a workflow pattern, not authorship, and as AI tooling shifts toward in-line editing the signal is likely to weaken. It is a useful hint, not a measure of AI code.

The one component that detects something concrete is the Agent Scan, which identifies calls to known AI-agent frameworks and SDKs — because those are real, named artifacts in the code rather than statistical guesses.

In short: the AI element is provided free of charge, as an experimental extra, offered honestly for what it is — a set of pointers, not verdicts. Because it is included at no additional cost and makes no claim to reliable or definitive AI detection, it should not be regarded as a paid feature or relied upon as one. We genuinely welcome feedback on whether the AI features are useful to you and how they might be improved.

By contrast, CodeDelta's churn engine — the measurement of SLOC and LLOC change across versions — is the product's mature, dependable core, refined over two decades and trusted on large-scale legacy systems worldwide. The churn engine is the product you are purchasing; the AI features are a complimentary (no-cost) addition to it.

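CodeDelta's AI features consist of two complementary scans: the AI Code Scan (GSS and MLS, combined into AIC), described in this section, and the Agent Scan (AIS), described in section 06.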

The AI Audit mode in the GUI runs BOTH scans in a single pass, producing two separate HTML reports (codedelta_ai_code_scan.html and codedelta_agent_scan.html).

How GSS Works

Each file receives a Generation Signature Score (GSS) from 0 to 100. The score is the weighted sum of several signals. A high score indicates the file exhibits many of the structural patterns associated with AI-generated code. A low score does not prove the code is human-written — it means no unusual patterns were detected.

| Signal | Weight | What it detects |
|---|---|---|
| Structural uniformity | 40 | Many repeated function bodies in one file. Strong signal — humans parameterise repeated logic, AI/generators produce N copies. New in v1.5.0. |
| Docblock coverage | 40 | Unusually uniform documentation — every function has a structured docblock |
| Guard clause density | 35 | High ratio of defensive checks (null checks, boundary guards) to total code |
| Formulaic exceptions | 30 | Exception messages that follow predictable patterns ("Invalid X: must be Y") |
| Audit/notify coupling | 25 | Logging calls systematically paired with operations |
| Annotation saturation | 20 | High density of decorators, attributes, or annotations |
| Zero inline comments | 10 | No // inline comments despite high docblock coverage |
| High add velocity | 15 | Large newly-added files with no history of incremental change |
| LLOC warning | 15 | SLOC/LLOC ratio anomaly suggesting unusual formatting |

Risk Ratings

| Rating | Score | Meaning |
|---|---|---|
| HIGH | > sensitivity threshold | File exhibits strong AI generation signals |
| ELEVATED | threshold × 0.6 – threshold | Several signals present — warrants review |
| NORMAL | < threshold × 0.6 | No unusual patterns detected |
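The banding above reduces to two comparisons. The sketch below is illustrative only: the function name `gss_rating` is invented, and treating both ends of the ELEVATED range as inclusive is an assumption, since the table does not say which band a score exactly on a boundary falls into.

```python
def gss_rating(score: float, threshold: float) -> str:
    """Map a GSS score onto HIGH / ELEVATED / NORMAL for a given threshold."""
    if score > threshold:
        return "HIGH"
    if score >= threshold * 0.6:   # assumed inclusive lower bound
        return "ELEVATED"
    return "NORMAL"


# With a hypothetical threshold of 50 the ELEVATED band starts at 30.
for score in (72, 50, 30, 12):
    print(score, gss_rating(score, threshold=50))
```

Lowering the threshold widens both the HIGH and ELEVATED bands at once, because the ELEVATED floor is derived from the same value.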

Sensitivity Setting

The sensitivity slider in the GUI controls the threshold for HIGH classification:

Validation Results

GSS was validated against two corpora. A set of 26 human-written C++ files produced a mean score of 0.0 with no HIGH ratings. A set of 12 AI-generated files across multiple languages produced a mean score of 48.9 with 2 HIGH and 3 ELEVATED ratings. Zero false positives were observed on the human corpus.

Important Caveats

GSS measures patterns, not intent. A highly disciplined human developer who writes uniform docblocks and systematic null checks will score higher than average. The score is a signal for review, not a determination of authorship. It is most useful when scores are unexpectedly high for files in a codebase where the baseline is known.

ML Score (MLS) and AI Confidence (AIC)

When the ML plug-in is installed, the AI Code Scan report shows two additional columns alongside GSS: MLS (ML Score) and AIC (AI Confidence).

Language coverage

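| Language | GSS | MLS | Notes |
|---|---|---|---|
| Python | | ✓ (83%) | Original ML model |
| C / C++ | | ✓ (99%) | |
| Java | | ✓ (99%) | |
| C# | | ✓ (97.5%) | New in v1.7.0 |
| JavaScript / TypeScript | | | ML model planned for v1.8 |
| Go | | | ML model planned for v1.8 |
| Other languages | | | GSS-only scoring |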

When MLS is unavailable for a language, the AIC column shows no score and risk classification uses GSS alone. All other functionality is unaffected.

Risk Bands

Files are placed into risk bands based on the AIC score (or GSS where MLS is not available):

| Band | AIC | Meaning |
|---|---|---|
| HIGH | 50+ | Strong AI-generation signals. Review the file. |
| ELEVATED | 30–49 | Several signals present. May warrant review depending on context. |
| NORMAL | <30 | No unusual patterns detected. |

AI Detection Config

The AI Detection Settings card on the Run Analysis page (visible in any AI Audit or AI Code Scan mode) opens a configuration dialog where you can fine-tune detection per language. The card's status line shows "X/Y signals active · N custom patterns" — a quick view of how much of the default scoring is in effect.

Language tabs

The dialog has six tabs: C/C++, Java, Python, C#, JavaScript, Go. The tab bar is sticky — it stays visible as you scroll through long signal lists.

Built-in signal toggles

Each language has a set of built-in signals. You can toggle individual signals on or off. This matters because some signals are context-dependent:

Signal configuration is saved to ml_config.json in the CodeDelta directory and takes effect on the next analysis run.

Custom patterns

Add patterns specific to your codebase. Four pattern types:

Each custom pattern can be set to Override — when the pattern fires, the file is always rated HIGH regardless of the ML score. Use this for definitive markers in your codebase.

Custom patterns apply across all AI Code Scan and AI Audit runs until removed. Saved to ml_config.json.

06

Agent Scan — Agent Initiation Signature (AIS)

The Agent Scan analyses source files for code that initiates AI agents at runtime: code that imports AI SDKs, orchestrates agents, executes dynamic code, or constructs prompts from user input. This is entirely distinct from the AI Code Scan: Agent Scan detects code that runs AI agents, not code that was written by AI.

What Agent Scan is: A pattern-matching detector that looks for specific import statements, function calls, and code proximity patterns associated with AI agent invocation. It operates entirely on source text — no execution, no runtime analysis.

What Agent Scan is not: A security scanner, a vulnerability detector, or a complete AI usage audit. It cannot detect obfuscated AI calls, indirect agent invocation, or AI use through HTTP APIs without recognised SDK imports.

How It Works

Agent Scan reads each source file and looks for four categories of pattern using regular expression matching. Each pattern category carries a weight. Files whose total weight exceeds the configured threshold are flagged.

The scan is entirely static — it reads source text only and never executes any code. It works on files even when dependencies are not installed.

AIS Signal Categories

| Category | Weight | What it detects | How it detects it |
|---|---|---|---|
| AI SDK imports | 25 | Import statements for 20+ AI frameworks and APIs | Matches import openai, from anthropic import, import langchain and similar patterns. Full list: openai, anthropic, langchain, autogen, crewai, llama-index, huggingface, together, cohere, mistral, groq, replicate, google.cloud.aiplatform, boto3 (bedrock), transformers, litellm, llamafile, ollama, semantic-kernel, haystack. |
| Agent orchestration | 40 | Function calls and class instantiations that invoke agent behaviour | Matches patterns like Agent(, AgentExecutor(, .run_agent(, chain.invoke(, pipeline(, TaskRunner(. These indicate the code is not just importing AI libraries but actively invoking agent workflows. |
| Dynamic code execution | 50 | Calls that execute code at runtime from strings or external sources | Matches eval(, exec(, subprocess with shell=True, os.system(, and dynamic __import__. These are dangerous in any context and especially so near AI calls. |
| Prompt injection vectors | 35 | User-controlled data being formatted directly into AI prompts | Matches patterns where variables (typically named user_input, request, query, message) are interpolated directly into strings that are then passed to AI API calls. A genuine prompt injection vector — not just any f-string. |
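The weighted matching can be sketched roughly as follows. The regular expressions here are simplified stand-ins chosen for illustration, not CodeDelta's actual patterns; the sketch covers three of the four categories, counts each category at most once per file (an assumption), and takes the flagging threshold as a parameter because its default is not stated in this guide.

```python
import re

# (weight, pattern) per category; weights are from the table above.
CATEGORIES = {
    "sdk_import": (25, re.compile(
        r"^\s*(?:import|from)\s+(?:openai|anthropic|langchain)\b", re.M)),
    "orchestration": (40, re.compile(
        r"\b(?:Agent|AgentExecutor)\(|\.run_agent\(|chain\.invoke\(")),
    "dynamic_exec": (50, re.compile(
        r"\b(?:eval|exec)\(|os\.system\(|shell\s*=\s*True")),
}


def ais_score(source: str) -> int:
    """Sum the weights of every category whose pattern occurs in source."""
    return sum(weight for weight, pattern in CATEGORIES.values()
               if pattern.search(source))


def is_flagged(source: str, threshold: int) -> bool:
    """A file is flagged when its total weight exceeds the threshold."""
    return ais_score(source) > threshold


sample = "import openai\nresult = eval(reply)\n"
print(ais_score(sample))  # 75
```

Because the scan works on source text alone, the same function runs on files whose dependencies are not installed.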

The Rogue Agent Pattern

The most serious finding is the Rogue Agent pattern: a dynamic code execution call (eval, exec, or subprocess) that appears within 10 lines of an AI API call. This pattern indicates code where an AI model's output may directly influence what gets executed on the host machine — a significant and specific security risk.

A file flagged for the Rogue Agent pattern has its AIS rating escalated to HIGH regardless of its total score.

Example of Rogue Agent pattern:

response = client.chat.completions.create(...)   # AI API call
code_to_run = response.choices[0].message.content
exec(code_to_run)                                 # within 10 lines — ROGUE AGENT
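A proximity check of this kind could look like the sketch below. The two regular expressions and the name `rogue_agent_hits` are assumptions made for illustration; CodeDelta's real patterns are broader, and whether the window is measured before, after, or on both sides of the AI call is assumed here to be both.

```python
import re

# Simplified stand-ins for "AI API call" and "dynamic execution" patterns.
AI_CALL = re.compile(r"\.chat\.completions\.create\(|\.messages\.create\(")
DYNAMIC_EXEC = re.compile(r"\b(?:eval|exec)\(|\bsubprocess\.")


def rogue_agent_hits(source: str, window: int = 10) -> list[tuple[int, int]]:
    """Return (ai_line, exec_line) pairs that lie within `window` lines."""
    lines = source.splitlines()
    ai_lines = [n for n, text in enumerate(lines, 1) if AI_CALL.search(text)]
    exec_lines = [n for n, text in enumerate(lines, 1)
                  if DYNAMIC_EXEC.search(text)]
    return [(a, e) for a in ai_lines for e in exec_lines
            if abs(a - e) <= window]


sample = (
    "response = client.chat.completions.create(model=m, messages=msgs)\n"
    "code_to_run = response.choices[0].message.content\n"
    "exec(code_to_run)\n"
)
print(rogue_agent_hits(sample))  # [(1, 3)]
```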

CRITICAL Risk Rating

A file that scores HIGH on both GSS (appears AI-generated) and AIS (initiates AI agents) receives a CRITICAL rating. This is the highest-risk scenario: code that nobody may have fully reviewed or understood, which autonomously initiates AI agents at runtime.

Source Viewer

The Agent Scan report includes an inline source viewer. Click View Source on any flagged file to see it with dangerous lines highlighted:

SDK Inventory

When AI SDK imports are found, the Agent Scan report includes an SDK Inventory — a list of each recognised AI framework detected and the number of files importing it. This gives a quick overview of which AI dependencies are present across the codebase.

Honest Limitations

What Agent Scan cannot detect:

  • HTTP-based AI calls without SDK imports — if code calls an AI API directly via requests.post("https://api.openai.com/...") without importing the openai SDK, Agent Scan will not flag it.
  • Aliased or renamed imports — import openai as ai_lib may not match all detection patterns depending on subsequent usage.
  • Obfuscated or dynamically constructed calls — if the AI SDK name is assembled at runtime from strings, Agent Scan cannot see it.
  • Indirect AI invocation through helper libraries — a custom internal library that wraps AI calls will not be recognised unless its imports match the known SDK list.
  • New or obscure AI SDKs — the SDK list covers the 20 most common frameworks as of May 2026. Newer or less common SDKs will not be detected unless added to the list.
  • Intent — Agent Scan cannot distinguish between a security researcher intentionally studying prompt injection and a developer accidentally introducing one. It flags patterns, not intent.
  • Safety of the AI usage — a file importing an AI SDK and using it responsibly will score the same as one using it dangerously. The AIS score indicates that AI agent code is present, not how safely it is used.

False Positives

Agent Scan will produce false positives in some situations:

The source viewer is the most effective tool for investigating potential false positives — it shows exactly which lines triggered each flag.

Languages Supported

Agent Scan analyses all source files regardless of language but SDK import detection patterns are primarily calibrated for Python, JavaScript, and TypeScript — where AI agent frameworks are most commonly used. C++, Java, and C# projects that call AI APIs via HTTP without a recognised SDK will have lower AIS scores but may still be flagged for dynamic execution patterns.

07

CI/CD & CMVC Integration

CodeDelta integrates with any CI/CD system or version control system that can export source files to a local directory. It does not interface directly with any CMVC system — it requires plain directory copies of your source.

Churn Threshold Gate

Use --threshold-churn <n> to fail the build if CRN_LLOC exceeds a threshold. Exit code 2 means the threshold was exceeded.
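The gate is simple arithmetic on the churn metric. The following sketch restates it in Python with invented function names (`churn`, `gate_exit_code`); it mirrors the documented behaviour (exit code 2 when CRN_LLOC exceeds n) and is not taken from CodeDelta's source.

```python
def churn(changed: int, deleted: int, added: int) -> int:
    """CRN = CHG + DEL + ADD, the same formula for SLOC, LLOC and files."""
    return changed + deleted + added


def gate_exit_code(crn_lloc: int, threshold: int) -> int:
    """Return 2 when churn exceeds the threshold, otherwise 0."""
    return 2 if crn_lloc > threshold else 0


crn = churn(changed=120, deleted=30, added=50)
print(crn)                        # 200
print(gate_exit_code(crn, 150))   # 2
print(gate_exit_code(crn, 200))   # 0
```

Note that a churn value exactly equal to the threshold passes, since the gate triggers only when the value exceeds n.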

Jenkins Example

stage('Code Metrics') {
  steps {
    sh """
      codedelta ${WORKSPACE}/old ${WORKSPACE}/new \
        --project "${JOB_NAME}" \
        --old-label "${OLD_TAG}" \
        --new-label "${NEW_TAG}" \
        --snapshot-date "${BUILD_DATE}" \
        --exclude "vendor,tests" \
        -d ${WORKSPACE}/codedelta.db \
        -o ${WORKSPACE}/report.html \
        --threshold-churn 10000 -q
    """
  }
}

GitHub Actions Example

- name: Run CodeDelta
  run: |
    codedelta ./old ./new \
      --project "${{ github.repository }}" \
      --old-label "${{ github.event.before }}" \
      --new-label "${{ github.sha }}" \
      --snapshot-date "$(date +%Y-%m-%d)" \
      --exclude "vendor,node_modules" \
      -o report.html --csv report.csv -q

Database Access

The SQLite database (codedelta.db) accumulates every run. You can query it directly:

sqlite3 codedelta.db "
  SELECT r.run_at, rt.crn_lloc, rt.total_sloc
  FROM run r JOIN run_totals rt ON r.id = rt.run_id
  WHERE r.project_id = (SELECT id FROM project WHERE name = 'MyApp')
  ORDER BY r.run_at DESC LIMIT 10;"

Key DB Tables

| Table | Contents |
|---|---|
| project | Project names and creation dates |
| run | Each run: old/new dirs, labels, snapshot_date, exclude_dirs, note |
| run_totals | Project-level totals for every metric per run |
| file_metrics | Per-file metrics for every file in every run |
| metric_set | Named metric sets |
| metric_set_item | Metric codes belonging to each set |
08

Accuracy & Methodology

The Myers Algorithm

CodeDelta uses the Myers diff algorithm for file comparison, which finds the shortest edit script between two files. This is more accurate than simple line-by-line comparison for files that have been significantly restructured.
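For readers unfamiliar with the algorithm, here is a compact sketch of the greedy forward search from Myers' O(ND) method. It returns only the length of the shortest edit script (insertions plus deletions), not the script itself, and is a textbook illustration rather than CodeDelta's implementation; the name `shortest_edit_length` is invented.

```python
def shortest_edit_length(a, b) -> int:
    """Minimum number of insertions + deletions turning sequence a into b."""
    n, m = len(a), len(b)
    # furthest[k] = furthest x reached so far on diagonal k (k = x - y).
    furthest = {1: 0}
    for d in range(n + m + 1):
        for k in range(-d, d + 1, 2):
            # Step down (insertion) or right (deletion), whichever
            # starts from the diagonal that has reached further.
            if k == -d or (k != d and furthest[k - 1] < furthest[k + 1]):
                x = furthest[k + 1]
            else:
                x = furthest[k - 1] + 1
            y = x - k
            # Follow the "snake": matching elements cost nothing.
            while x < n and y < m and a[x] == b[y]:
                x += 1
                y += 1
            furthest[k] = x
            if x >= n and y >= m:
                return d
    return n + m


print(shortest_edit_length("ABCABBA", "CBABAC"))  # 5
```

The inputs can be any sequences, so the same routine applies to lists of physical lines or lists of logical statements.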

Jaccard Similarity

Before running the full Myers diff, CodeDelta uses Jaccard similarity to detect file renames and large-scale moves. A file that appears deleted from the old snapshot and added in the new, but with high token similarity, is treated as a rename rather than a delete+add. This prevents inflated churn numbers from routine refactoring.
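Jaccard similarity is the size of the intersection of two sets divided by the size of their union. The sketch below applies it to token sets; the tokenizer, the helper names, and the 0.8 cut-off are all assumptions for illustration, as the guide does not state how CodeDelta tokenises or what similarity it requires.

```python
import re


def token_set(text: str) -> set[str]:
    """Split source text into identifiers, numbers and punctuation."""
    return set(re.findall(r"[A-Za-z_]\w*|\d+|[^\w\s]", text))


def jaccard(a: str, b: str) -> float:
    """|A ∩ B| / |A ∪ B| over the token sets of two files."""
    ta, tb = token_set(a), token_set(b)
    if not ta and not tb:
        return 1.0   # two empty files are treated as identical
    return len(ta & tb) / len(ta | tb)


def looks_like_rename(deleted_src: str, added_src: str,
                      threshold: float = 0.8) -> bool:
    """Hypothetical rename test: similarity at or above the threshold."""
    return jaccard(deleted_src, added_src) >= threshold


print(jaccard("a b c", "b c d"))  # 0.5
```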

LLOC vs GNU diff

LLOC comparison works by tokenising each file's semicolon-terminated statements and running Myers on the resulting token stream rather than the line stream. This means a purely cosmetic reformatting — changing indentation, line breaks, or spacing within a statement — does not register as a change. Only actual logical changes increment CHG_LLOC.
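The effect of comparing statements rather than lines can be shown with a small sketch. It assumes comments and literals have already been stripped, uses a deliberately crude tokenizer, and the name `logical_statements` is invented for this example.

```python
import re


def logical_statements(code: str) -> list[tuple[str, ...]]:
    """Split code on semicolons and reduce each statement to its tokens."""
    statements = []
    for chunk in code.split(";"):
        tokens = tuple(re.findall(r"\w+|[^\w\s]", chunk))
        if tokens:   # ignore empty trailing chunks
            statements.append(tokens)
    return statements


original = "total = price * qty;\nprint_total(total);\n"
reformatted = "total=price*qty; print_total(\n    total );"
edited = "total = price * qty + tax;\nprint_total(total);\n"

print(logical_statements(original) == logical_statements(reformatted))  # True
print(logical_statements(original) == logical_statements(edited))       # False
```

Since whitespace and line breaks never reach the comparison, only the edited statement would register as a change when the two statement lists are diffed.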

PLOC Definition

PLOC counts lines beginning with # (after optional whitespace) in C/C++ and C# files. This includes all preprocessor directives: #include, #define, #ifdef, #ifndef, #endif, #pragma, #undef, #if, #else, #elif, #error, #warning. PLOC is included within SLOC and LOC — it is not subtracted, it is identified separately for clarity.
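As a rough illustration, the definition amounts to one multi-line regular expression. The function name `count_ploc` is invented, and the sketch does not skip directives that sit inside block comments.

```python
import re

# A directive line starts with '#', optionally preceded by spaces or tabs.
DIRECTIVE = re.compile(r"^[ \t]*#", re.MULTILINE)


def count_ploc(source: str) -> int:
    """Count lines whose first non-blank character is '#'."""
    return len(DIRECTIVE.findall(source))


sample = (
    "#include <stdio.h>\n"
    "  #define N 10\n"
    "int x = N; // # not a directive\n"
    "#endif\n"
)
print(count_ploc(sample))  # 3
```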

Comment Counting

J_COM counts /** ... */ Java-style docblock comments. C_COM counts /* ... */ C-style block comments. EOL_COM counts // to-end-of-line comments. A multi-line block comment counts as one comment regardless of how many lines it spans. COM_LOC = J_COM + C_COM + EOL_COM.

Character Literal Handling

Single-quoted character literals are correctly excluded from LLOC counting. A semicolon inside a character literal — such as char c = ';' — is not counted as a logical statement. This applies to C/C++, C#, Java, and JavaScript. Escape sequences ('\\', '\n', '\0') are also handled correctly. This was a known limitation corrected in v1.4.1.

09

Licensing

CodeDelta requires a valid licence to run. Three licence types are available:

| Type | Duration | Use |
|---|---|---|
| Time-locked evaluation | 30 days | Free for evaluation |
| Node-locked | Perpetual | Single machine, following purchase |
| Floating | Perpetual | Organisation-wide, network validated |

Installing a Licence

Set the environment variable before running:

export CODEDELTA_LICENSE="CD-TIMED-20270503-XXXXXXXXXXXXXXXXX"
./codedelta old/ new/

Or store it in a file in your home directory:

echo "CD-TIMED-20270503-XXXXXXXXXXXXXXXXX" > ~/.codedelta_license

Contact

For licensing, volume pricing, or academic licences: admin@stackbun.com

10

File Certification — certify.py

certify.py is a standalone tool that answers the question every AI-assisted development team needs answered: "Is this the correct, tested version of every file?"

It works by recording the SHA256 fingerprint of every source file at the exact moment your tests pass. If any file changes after that — whether by accident, by an AI session, or by copying the wrong version — certify.py tells you immediately.
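The underlying idea can be sketched with the standard library alone. This is not certify.py's code or data format: the function names `fingerprint` and `changed_since` and the in-memory dictionary used as a manifest are assumptions made for illustration.

```python
import hashlib
from pathlib import Path


def fingerprint(root) -> dict[str, str]:
    """Map each file's relative path to the SHA256 hex digest of its bytes."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def changed_since(certified: dict[str, str], root) -> list[str]:
    """List files that were modified, added or removed since certification."""
    current = fingerprint(root)
    return sorted(
        path for path in certified.keys() | current.keys()
        if certified.get(path) != current.get(path)
    )
```

A typical flow would call `fingerprint` once the tests pass, store the result, and later call `changed_since` with that stored mapping; an empty list means every file still matches its certified content.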

certify.py requires Python 3.8 or later and has no other dependencies. Drop it anywhere and run it directly. It is not part of the CodeDelta distribution.

Quick Start

# After your tests pass, certify your files:
python3 certify.py certify --build 1 --src . --tests "pytest"

# Check status at any time:
python3 certify.py safe --src .

Commands

Supported Test Frameworks

certify.py works with any test framework. Pass the test command with --tests, or use --tests auto to auto-detect:

python3 certify.py certify --build 2 --tests "pytest"
python3 certify.py certify --build 2 --tests "npm test"
python3 certify.py certify --build 2 --tests "go test ./..."
python3 certify.py certify --build 2 --tests "cargo test"

AI Session Tracking

After every AI coding session, tag the files that were produced:

python3 certify.py ai-session \
  --files src/new_feature.py src/config.json \
  --label "Claude session — added payment module"

Tagged files are marked ⚡ in all certification reports. If a tagged file changes after certification without being re-tested, the safe command reports HIGH RISK.
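
The rule in the last sentence can be expressed as a small decision function. Only the HIGH RISK label comes from this guide; the function name risk_status and the other two labels ("OK" and "CHANGED") are hypothetical placeholders, since the statuses reported for untagged files are not listed here.

```python
def risk_status(certified_hash: str, current_hash: str, ai_tagged: bool) -> str:
    """Classify one file by comparing its current hash to its certified hash."""
    if current_hash == certified_hash:
        return "OK"            # hypothetical label: unchanged since certification
    if ai_tagged:
        return "HIGH RISK"     # tagged file changed without being re-tested
    return "CHANGED"           # hypothetical label: untagged file changed

print(risk_status("abc", "abc", ai_tagged=True))   # OK
print(risk_status("abc", "def", ai_tagged=True))   # HIGH RISK
print(risk_status("abc", "def", ai_tagged=False))  # CHANGED
```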

How Certification Carries Forward

Files that have not changed since their last certification automatically carry their certification forward. Only changed files need re-certifying at each build. The history command shows the full audit trail — which build each file was first certified in, and whether it has changed since.
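
One way to picture carry-forward is as a merge between the previous build's records and the current file hashes. The sketch below is an assumption about how such a merge could work, not certify.py's actual data model; the function name carry_forward and the record fields sha256 and certified_in are hypothetical.

```python
def carry_forward(previous: dict, current_hashes: dict, build: int) -> dict:
    """Build this build's certification records from the previous ones.

    previous:       file name -> {"sha256": str, "certified_in": int}
    current_hashes: file name -> SHA256 hex digest of the file as it is now
    """
    records = {}
    for name, digest in current_hashes.items():
        old = previous.get(name)
        if old is not None and old["sha256"] == digest:
            records[name] = dict(old)  # unchanged: keep the earlier certification
        else:
            records[name] = {"sha256": digest, "certified_in": build}
    return records

previous = {
    "a.py": {"sha256": "aaa", "certified_in": 1},
    "b.py": {"sha256": "bbb", "certified_in": 1},
}
current = {"a.py": "aaa", "b.py": "ccc", "c.py": "ddd"}
for name, record in carry_forward(previous, current, build=2).items():
    print(name, record["certified_in"])
# a.py 1
# b.py 2
# c.py 2
```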

CI/CD Integration

# GitHub Actions — block deployment of uncertified code
- name: Verify file certification
  run: python3 certify.py verify --src . --strict

11

File Integrity — verify_files.py

verify_files.py is included in the CodeDelta distribution. It verifies that every source file matches its embedded FILE-ID — a SHA256 checksum of the file's own content, stored as a comment in the file itself.

Every file in the CodeDelta source distribution contains a line like:

// FILE-ID: 9b32c6f257db9c1cc28d1cb178dce47fa0ed06027468b56434f4993e01b48113

This ID is the SHA256 of the file content excluding that line. If the file has been modified since the ID was set, the computed hash will not match the stored ID.
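
The check can be sketched in Python as follows. This is an illustration rather than verify_files.py itself: the function name check_file_id is hypothetical, and the exact bytes the real tool hashes (text encoding, line endings, whether the FILE-ID line's own newline is dropped) are assumptions, so this sketch is not guaranteed to reproduce the IDs in the distribution.

```python
import hashlib
import re

# Matches a FILE-ID comment in either // or # comment style.
FILE_ID_LINE = re.compile(r"^\s*(?://|#)\s*FILE-ID:\s*([0-9a-f]{64})\s*$")

def check_file_id(text: str):
    """Return (stored_id, computed_id) for a file's text content."""
    stored = None
    body = []
    for line in text.splitlines(keepends=True):
        match = FILE_ID_LINE.match(line)
        if match and stored is None:
            stored = match.group(1)   # remember the ID, leave the line out
        else:
            body.append(line)
    computed = hashlib.sha256("".join(body).encode("utf-8")).hexdigest()
    return stored, computed

# Stamp a small file, verify it, then tamper with it.
body = "int main() { return 0; }\n"
file_id = hashlib.sha256(body.encode("utf-8")).hexdigest()
stamped = f"// FILE-ID: {file_id}\n{body}"

stored, computed = check_file_id(stamped)
print(stored == computed)  # True

stored, computed = check_file_id(stamped + "// tampered\n")
print(stored == computed)  # False
```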

Usage

# Verify all files in the current directory
python3 verify_files.py verify .

# Add or update FILE-IDs
python3 verify_files.py add .

Output

  ✓ src/report_html.cpp
  ✓ src/diff_engine.cpp
  ✗ MISMATCH: src/codedelta.cpp
      stored:   212d145ba32f7b43f7cf9516143b6c06...
      computed: 7f3a91d8e4c2b1a9f6d3e8c7b4a2d1f5...

Result: 38 OK, 1 FAILED, 2 skipped

A MISMATCH means the file has been modified since the FILE-ID was last set. Either the modification was intentional (update the FILE-ID with verify_files.py add) or the file has been corrupted or substituted.
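
Scripts that capture this output can read the totals from the Result line. The sketch below assumes the summary always has the shape shown in the sample output; the function name parse_summary is hypothetical, and the guide does not say whether verify_files.py also signals failure through its exit status.

```python
import re

# Assumption: the summary line always looks like the sample shown above.
SUMMARY = re.compile(r"Result:\s*(\d+) OK,\s*(\d+) FAILED,\s*(\d+) skipped")

def parse_summary(output: str):
    """Extract the OK / FAILED / skipped counts, or None if no summary is found."""
    match = SUMMARY.search(output)
    if match is None:
        return None
    ok, failed, skipped = (int(g) for g in match.groups())
    return {"ok": ok, "failed": failed, "skipped": skipped}

summary = parse_summary("Result: 38 OK, 1 FAILED, 2 skipped")
print(summary)  # {'ok': 38, 'failed': 1, 'skipped': 2}
```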

12

Versioning & File Control

CodeDelta uses a three-layer versioning system so that every build is uniquely identifiable.

The three layers

Layer               Where                                             When it changes
Master semver       ~/code-delta/VERSION (one-line text file)         Manually, on major / minor / patch bumps
C++ binary version  ~/code-delta/src/version.h                        Manually, when the binary is rebuilt with significant changes
Per-file build      Header block at the top of each Python component  Automatically, every time the file is deployed

The per-file header block

# FILE-ID: <64-char SHA-256 hash>
# ─── Component versioning ─────────────────────────────────
# CODEDELTA_COMPONENT: ai_ml_plugin
# CODEDELTA_BUILD: 5
# CODEDELTA_BUILD_DATE: 2026-05-28
# CODEDELTA_BUILD_TIME: 19:41
# ──────────────────────────────────────────────────────────

Every tagged Python file carries this block. The build number increments by one on every deploy. Date and time are stamped from the system clock. The FILE-ID is a SHA-256 of the file body (excluding the FILE-ID line itself) — it changes whenever the file content changes.
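
For tooling that needs to read these fields, a header block of this shape can be parsed with a short regular expression. The sketch below is illustrative only: the function name parse_header and the 20-line search window are assumptions, and the sample FILE-ID is the example value shown in the File Integrity section.

```python
import re

# Matches "# FILE-ID: ..." and "# CODEDELTA_<NAME>: ..." header lines.
HEADER_FIELD = re.compile(r"^#\s*(FILE-ID|CODEDELTA_[A-Z_]+):\s*(.+?)\s*$")

def parse_header(text: str, max_lines: int = 20) -> dict:
    """Collect header fields from the first max_lines lines of a file."""
    fields = {}
    for line in text.splitlines()[:max_lines]:
        match = HEADER_FIELD.match(line)
        if match:
            fields[match.group(1)] = match.group(2)
    return fields

sample = """# FILE-ID: 9b32c6f257db9c1cc28d1cb178dce47fa0ed06027468b56434f4993e01b48113
# ─── Component versioning ─────────────────────────────────
# CODEDELTA_COMPONENT: ai_ml_plugin
# CODEDELTA_BUILD: 5
# CODEDELTA_BUILD_DATE: 2026-05-28
# CODEDELTA_BUILD_TIME: 19:41
# ──────────────────────────────────────────────────────────
import os
"""
header = parse_header(sample)
print(header["CODEDELTA_COMPONENT"], header["CODEDELTA_BUILD"])  # ai_ml_plugin 5
```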

The /version endpoint

The GUI server exposes http://localhost:7654/version as a JSON endpoint. It returns the master version, the C++ binary's version and build, and an array of every Python component file with its current build number, date, time, and short FILE-ID. Bug reports should include this JSON so the developer knows exactly which build of every component is on the user's machine.
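
A small Python helper can capture this JSON for a bug report. The sketch assumes the GUI server is running locally on the port shown above; the function names fetch_version and format_for_bug_report are hypothetical, and no field names are assumed because the response schema is not listed here.

```python
import json
import urllib.request

def fetch_version(url: str = "http://localhost:7654/version",
                  timeout: float = 5.0):
    """Fetch and decode the JSON document served by the /version endpoint."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))

def format_for_bug_report(payload) -> str:
    """Pretty-print the payload so it can be pasted into a bug report."""
    return json.dumps(payload, indent=2, sort_keys=True)
```

With the GUI server running, print(format_for_bug_report(fetch_version())) writes the full version report to the terminal.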

The About box (beta only)

In beta builds, the About box has a "Components" section listing every Python file's build details. In release builds, this section is hidden — end users see only the master version, release date, and licence info.

The deploy.sh utility

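The deploy.sh script in the source repo handles the mechanical work of installing a patched file into the runtime location. A full deploy backs up the existing file and bumps its build version; the script also supports a dry run, rolling back to the most recent backup, and listing its routing table.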

Common commands:

deploy.sh <filename>          # full deploy with backup + version bump
deploy.sh --dry-run <file>    # see what would happen, no changes
deploy.sh --revert <file>     # roll back to most recent backup
deploy.sh --list              # show routing table
deploy.sh --help              # help screen

For full details see CDVC.md in the source repo root.

Note: deploy.sh is a Unix shell script for macOS and Linux. Windows users patching files should copy the patched file manually into the CodeDelta installation directory; the auto-versioning runs server-side and is platform-independent.