Source code change metrics, AI generation detection, and agent scan.
scan command runs the full analysis (AI Audit, Agent Scan, and churn measurement) with no graphical interface, for terminals, cron jobs, and CI pipelines.
python3 codedelta_server.py scan ./src
--mode selection: run any single analysis or combination: churn, ai, agent, both, or ai_audit (everything). Churn-only skips all ML for speed.
--csv and --json (previously HTML and database only): for dashboards, spreadsheets, and pipelines.
--fail-on-critical and --fail-on-ai N set a non-zero exit code so a scheduled job or build can react automatically.
ReActAgent, MsgHub, sequential_pipeline, register_tool_function).
exec in Java, C++, or JavaScript no longer triggers a false positive. The summary now counts genuine agent activity rather than combined AI-written risk.
.zip) and Linux (.tar.gz) downloads are now complete self-contained bundles (engine, ML, GUI, and licence verifier together), matching the Windows installer.
CodeDelta is a command-line source code metrics tool with a browser-based GUI. It compares two snapshots of a source project, an old version and a new version, and measures exactly what changed, how much, and how complex. It is a professional-grade source code analysis tool for macOS, Linux, and Windows.
If your source files are constantly being changed, the source base is not stable and is not ready for release. CodeDelta makes this visible. Every build, every sprint, every release — run CodeDelta and watch the churn numbers. A healthy codebase shows decreasing churn as a release approaches. A codebase that remains "hot and active" near release is a risk.
SLOC (Source Lines of Code) is a physical count — the number of non-empty, non-comment lines in the file. Two statements on one line = 1 SLOC.
LLOC (Logical Lines of Code) is a semicolon count — the number of logical statements, excluding those inside comments or string literals. One statement spread over three lines = 1 LLOC. This is invariant to formatting and is the preferred metric for measuring real code change.
int i=0; float j=0; // This is 2 LLOC, 1 SLOC
cout << "Testing"
<< " Hello"
<< " there."; // This is 1 LLOC, 3 SLOC
CodeDelta counts all source files recursively under the base directory, maps each file to a language by extension, and counts metrics per file. For comparisons, files are matched by relative path. A file present in the new project but not the old is Added (N). A file in the old but not the new is Deleted (D). A file present in both is either Changed (C) or Unchanged (U).
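The matching rule above can be sketched in a few lines of Python. This is an illustration only, not CodeDelta's implementation: the function name classify_files is hypothetical, the sketch assumes "changed" means the bytes differ, and it does no filtering by extension or language.

```python
import filecmp
from pathlib import Path


def classify_files(old_root: str, new_root: str) -> dict:
    """Map each relative path to N (added), D (deleted), C (changed) or U (unchanged)."""
    def relative_files(root: str) -> set:
        base = Path(root)
        # Walk recursively and key every file by its path relative to the root.
        return {p.relative_to(base).as_posix() for p in base.rglob("*") if p.is_file()}

    old_files = relative_files(old_root)
    new_files = relative_files(new_root)
    status = {}
    for rel in sorted(old_files | new_files):
        if rel not in old_files:
            status[rel] = "N"
        elif rel not in new_files:
            status[rel] = "D"
        else:
            # Assumption: a file counts as changed when its bytes differ.
            same = filecmp.cmp(Path(old_root) / rel, Path(new_root) / rel, shallow=False)
            status[rel] = "U" if same else "C"
    return status
```

Keying on the relative path is what makes a moved file show up as one D and one N unless rename detection pairs them up.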
CodeDelta supports over 30 languages out of the box. Full two-pass LLOC (Logical Lines of Code) is computed for C/C++, C#, Java, JavaScript/TypeScript, Python, PHP, SQL, Ruby, Perl, and Shell. For Ada, Assembly, CSS, Fortran, IDL, PowerBuilder, PL/SQL, VHDL, VB, XML, and Windows Batch, SLOC and LLOC are equal (no separate logical-line lexer). HTML, ASP, and JSP use the HTML lexer (SLOC only).
See the complete extension-to-language reference in section 03 (GUI Guide → Extension Overrides) for which file extensions auto-detect to which language. Custom mappings can be added without recompiling via the Extension Overrides field.
CodeDelta computes the source code metrics most commonly used in software-engineering analysis: SLOC, LLOC, comment counts, file counts, and per-language breakdowns. See the Metrics Reference tables below for the full set.
| Code | Name | Description |
|---|---|---|
| LOC | Lines of Code | Total lines including whitespace and comments. A count of newlines. |
| SLOC | Source Lines of Code | Non-empty, non-comment lines. Physical line count of actual source. |
| LLOC | Logical Lines of Code | Semicolon count excluding those in comments or string literals. Statement count. |
| PLOC | Preprocessor Directive LOC | Lines beginning with # (#include, #define, #ifdef etc.). C/C++ and C# only. |
| COM_LOC | Comment Lines of Code | Total comment lines = J_COM + C_COM + EOL_COM. |
| J_COM | Java-style Comments | /** ... */ block comment count. |
| C_COM | C-style Comments | /* ... */ block comment count. |
| EOL_COM | End-of-Line Comments | // to end-of-line comment count. |
| BYTES | File Size | File size in bytes from the filesystem. |
| NFILE | Number of Files | Total number of source files in the project. |
| Code | Name | Description |
|---|---|---|
| CHG_SLOC | Changed SLOC | Source lines that exist in both files but differ. |
| DEL_SLOC | Deleted SLOC | Source lines in the old file not present in the new. |
| ADD_SLOC | Added SLOC | Source lines in the new file not present in the old. |
| CRN_SLOC | Churn SLOC | CHG_SLOC + DEL_SLOC + ADD_SLOC. Total change volume. |
| CHG_LLOC | Changed LLOC | Logical lines that exist in both but differ. The primary metric. |
| DEL_LLOC | Deleted LLOC | Logical lines in old not in new. |
| ADD_LLOC | Added LLOC | Logical lines in new not in old. |
| CRN_LLOC | Churn LLOC | CHG_LLOC + DEL_LLOC + ADD_LLOC. |
| Code | Name | Description |
|---|---|---|
| CHG_FILE | Changed Files | Number of files present in both snapshots that changed. |
| DEL_FILE | Deleted Files | Number of files in the old snapshot not in the new. |
| ADD_FILE | Added Files | Number of files in the new snapshot not in the old. |
| CRN_FILE | Churn Files | CHG_FILE + DEL_FILE + ADD_FILE. |
| Status | Meaning |
|---|---|
| C | Changed — file exists in both snapshots and content differs |
| N | New — file exists only in the new snapshot (added) |
| D | Deleted — file exists only in the old snapshot (removed) |
| U | Unchanged — file exists in both and content is identical |
| O | Old — CSV only: old metrics row for a changed file |
| X | Diff — CSV only: arithmetic difference row for a changed file |
LLOC is computed in two passes. Pass 1 strips all comments and string literals, then counts semicolons. Pass 2 identifies function boundaries using brace counting and signature detection, allowing CodeDelta to attribute LLOC to specific functions. The two-pass approach resolves the common ambiguity where a single logical statement spans multiple physical lines.
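Pass 1 can be illustrated with a small scanner for C-style source. This is a minimal sketch under stated assumptions, not CodeDelta's lexer: the name count_lloc is hypothetical, only // and /* */ comments and quoted literals are handled, and function attribution (pass 2) is left out.

```python
def count_lloc(source: str) -> int:
    """Count semicolons outside comments, string literals and character literals."""
    count = 0
    i = 0
    n = len(source)
    while i < n:
        two = source[i:i + 2]
        ch = source[i]
        if two == "//":
            # End-of-line comment: skip to the newline (or to EOF).
            end = source.find("\n", i)
            i = n if end == -1 else end
        elif two == "/*":
            # Block comment: skip past the closing */ (or to EOF if unterminated).
            end = source.find("*/", i + 2)
            i = n if end == -1 else end + 2
        elif ch in "\"'":
            # String or character literal: skip to the matching quote,
            # stepping over backslash escapes such as \' or \".
            quote = ch
            i += 1
            while i < n and source[i] != quote:
                i += 2 if source[i] == "\\" else 1
            i += 1
        else:
            if ch == ";":
                count += 1
            i += 1
    return count
```

Applied to the two examples earlier in this guide, the sketch gives 2 for the `int i=0; float j=0;` line and 1 for the three-line `cout` statement.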
When LLOC diverges significantly from SLOC (ratio > 3.0 or < 0.3), CodeDelta flags the file with an LLOC warning in the report. This usually indicates unusual formatting — very long chained expressions, or highly condensed single-line code.
Start the GUI server from the Terminal:
python3 /Applications/CodeDelta/codedelta_server.py
Then open http://localhost:7654 in your browser. The GUI runs entirely locally — no internet connection required, no data leaves your machine.
The Run Analysis tab is split into two columns. Pick whichever fits the work you're doing:
| Mode | What it does |
|---|---|
| Code Churn | Standard SLOC/LLOC churn metrics between two snapshots. The classic CodeDelta function. |
| Code Churn + AI Audit | Churn metrics PLUS AI Audit on the new snapshot. Single pass, three reports. |
| Mode | What it does |
|---|---|
| Single Project Code Churn | Per-file SLOC, LLOC, comment counts on one snapshot. No comparison. |
| AI Audit | Runs BOTH AI Code Scan and Agent Scan, producing TWO reports side by side. Use this when you want a complete AI assessment. |
| AI Code Scan | Just the GSS + MLS → AIC analysis. Produces codedelta_ai_code_scan.html with an inline source viewer that highlights flagged lines. |
| AI Agent Scan | Just the AIS (Agent Initiation Signature) analysis. Produces codedelta_agent_scan.html for security review. |
| Field | Description |
|---|---|
| Old Project | Path to the previous version of the source code. |
| New Project | Path to the current version of the source code. |
| Exclude Directories | Comma-separated directory names to skip (e.g. vendor,tests,node_modules). New in v1.4.0. |
| Extension Overrides | Map non-standard extensions to existing language parsers. Comma-separated ext=lang pairs (e.g. h2=cpp,inc=php,ksh=sh). New in v1.4.2. |
| HTML Report | Output path for the HTML report. |
| Database | Path to the SQLite database file. Accumulates all runs for trend analysis. Defaults: ~/Library/Application Support/CodeDelta/codedelta.db on macOS, ~/.local/share/CodeDelta/codedelta.db on Linux, %APPDATA%\CodeDelta\codedelta.db on Windows. Survives reinstalls. |
| Project Name | A label for this project, shown in reports and the History tab. |
| Old Label | A label for the old snapshot (e.g. v1.0, sprint-13). |
| New Label | A label for the new snapshot (e.g. v1.1, sprint-14). |
| Snapshot Date | The date the source was extracted (YYYY-MM-DD). Stored in the DB and shown in reports. New in v1.4.0. |
| Metric Set | Optional: select a named metric set to filter which columns appear in the report and CSV. New in v1.4.0. |
A Metric Set is a named subset of metric columns. When active, only the selected metrics appear in the HTML report table and CSV output. This is useful for management reports that only need CHG_LLOC and ADD_LLOC, without the full detail of every metric.
To create a Metric Set:
The set is saved in the database and appears in the dropdown for future runs. To edit an existing set, click Edit beside it — it loads into the editor. To delete, click Delete.
To run without a metric set filter, leave the dropdown on — All metrics (no filter) —.
Available metric codes for sets: LOC, SLOC, LLOC, PLOC, COM_LOC, J_COM, C_COM, EOL_COM, Bytes, CHG_SLOC, DEL_SLOC, ADD_SLOC, CRN_SLOC, CHG_LLOC, DEL_LLOC, ADD_LLOC, CRN_LLOC, CHG_COM, DEL_COM, ADD_COM, CRN_COM, CHG_FILE, DEL_FILE, ADD_FILE, CRN_FILE
Enter a comma-separated list of directory names to exclude from both the old and new scans. For example: vendor,tests,third_party,generated. The names are matched against each path component — you do not need to provide full paths.
Built-in excluded directories (always skipped): .git, .svn, .hg, node_modules, __pycache__, vendor, dist, build, Debug, Release
Excluded directories are stored in the database run record and shown in the report header, so you always know what was and wasn't included in a historical run.
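Component-wise matching of excluded names might look like the sketch below. The helper names parse_excludes and is_excluded are hypothetical, and the comparison is assumed to be case-sensitive. The built-in set is copied from the list above.

```python
from pathlib import PurePosixPath

BUILTIN_EXCLUDES = {".git", ".svn", ".hg", "node_modules", "__pycache__",
                    "vendor", "dist", "build", "Debug", "Release"}


def parse_excludes(field: str) -> set:
    """Turn 'vendor, tests,generated' into a set of directory names."""
    return {name.strip() for name in field.split(",") if name.strip()}


def is_excluded(relative_path: str, user_excludes: set) -> bool:
    """True if any directory component of the path is an excluded name."""
    directories = PurePosixPath(relative_path).parts[:-1]  # drop the file name
    excluded = BUILTIN_EXCLUDES | user_excludes
    return any(part in excluded for part in directories)
```

Because only directory components are compared, a file named tests.cpp is not skipped by an exclusion of tests.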
Some organisations use non-standard file extensions — for example .h2 for C++ headers, .inc for PHP include files, or .ksh for shell scripts. By default CodeDelta ignores files with unrecognised extensions. Extension overrides let you map any extension to an existing language parser without recompiling.
Enter a comma-separated list of ext=lang pairs in the Extension Overrides field:
h2=cpp,inc=php,ksh=sh
The target language can be any known extension (cpp, py, java, sh etc.) or a language name. Unknown target languages are silently ignored. The override takes effect before directory scanning, so all file matching uses the updated map.
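As a rough illustration of how such a field could be applied to an extension map, consider the sketch below. The function apply_overrides, the sample base map, and its language names are hypothetical. Lower-casing and stripping a leading dot are assumptions made for the example.

```python
def apply_overrides(field: str, ext_map: dict) -> dict:
    """Return a copy of ext_map extended with 'ext=lang' pairs from field."""
    updated = dict(ext_map)
    known_languages = set(ext_map.values())
    for pair in field.split(","):
        ext, sep, target = pair.strip().partition("=")
        ext = ext.strip().lstrip(".").lower()
        target = target.strip().lower()
        if not sep or not ext or not target:
            continue  # malformed pair, nothing to map
        if target in ext_map:
            updated[ext] = ext_map[target]   # target given as a known extension
        elif target in known_languages:
            updated[ext] = target            # target given as a language name
        # Unknown targets are skipped silently, as described above.
    return updated


# Hypothetical base map: extension -> language name.
base = {"cpp": "c++", "php": "php", "sh": "shell"}
overridden = apply_overrides("h2=cpp,inc=php,ksh=sh", base)
```

Building the updated map before any directory walk mirrors the statement that the override takes effect before scanning.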
| Language | Auto-detected extensions |
|---|---|
| C / C++ | c h cpp hpp cc cxx |
| C# | cs |
| Java | java |
| JavaScript / TypeScript | js jsx ts tsx |
| Python | py pyw |
| PHP | php |
| Ruby | rb |
| Perl | pl pm |
| Shell | sh ash bash bsh csh tcsh tsh zsh |
| SQL | sql |
| SQL (Oracle/PL-SQL) | pls pks pkb |
| Visual Basic | vb bas cls frm |
| VBScript | vbs |
| Windows Batch | bat cmd |
| Ada | ada adb ads |
| Fortran | f f90 f95 for |
| Assembly | asm s |
| VHDL | vhd vhdl |
| HTML | html htm htp |
| ASP | asp aspx |
| JSP | jsp |
| CSS | css |
| XML | xml xsd xsl xslt wsml |
| IDL | idl |
| Symbian MMP | mmp |
| PowerBuilder | srd srf srs sru srw |
| Smalltalk | st |
| Eiffel | e |
| Lisp | lisp lsp cl scm el |
| μCode | uc |
| Text | txt tsv cvs install readme |
When mapping a custom extension via the override field, the target side of the ext=lang pair can be any of the codes above (e.g. cpp, py, java, sh, f90). The destination must already be known to CodeDelta; unknown destinations are silently ignored.
After a run, results appear as coloured tiles in four groups:
Shows all previous runs stored in the selected database. Columns include run date, snapshot date, project, old/new labels, file counts, SLOC/LLOC totals, and churn metrics. Click any row to open its HTML report.
Charts CRN_SLOC and CRN_LLOC over time for a selected project. A declining trend indicates the codebase stabilising toward release. An increasing trend near a release deadline is a quality risk.
codedelta <old-dir> <new-dir> [options]
| Flag | Description |
|---|---|
| -o, --output <file> | HTML report output path (default: codedelta_report.html) |
| --csv <file> | Also write a CSV report |
| --xml <file> | Also write a structured XML report. Contains all metrics including PLOC. |
| -d, --db <file> | SQLite database path (default: codedelta.db) |
| --project <name> | Project name for the database and report |
| --old-label <label> | Label for the old snapshot |
| --new-label <label> | Label for the new snapshot |
| --note <text> | Freeform note stored in the database run record |
| --exclude <dirs> | Comma-separated directory names to skip (e.g. vendor,tests). New in v1.4.0. |
| --snapshot-date <date> | Date source was extracted, YYYY-MM-DD. New in v1.4.0. |
| --metric-set <codes> | Comma-separated metric codes to include in report/CSV. New in v1.4.0. |
| --ext <pairs> | Map non-standard extensions to language parsers. Comma-separated ext=lang pairs, e.g. h2=cpp,inc=php. New in v1.4.2. |
| --threshold-churn <n> | Exit with code 2 if CRN_LLOC exceeds n (for CI gates) |
| -v, --verbose | Verbose output |
| -q, --quiet | Suppress all stdout output |
| -V, --version | Print version and release notes |
| -h, --help | Print usage |
| Code | Meaning |
|---|---|
| 0 | Success |
| 1 | Error (invalid arguments, missing directory, etc.) |
| 2 | Churn threshold exceeded (--threshold-churn) |
The CSV has one header row and one or more data rows per file. For changed files, three rows are output, identified by the status codes C, O (the old metrics), and X (the arithmetic difference).
The CSV output format is compatible with standard spreadsheet tools. Unchanged and deleted files have one row only. The file ends with a TOTAL row and a NFILE summary row.
If a Metric Set is active via --metric-set, only the specified columns appear in the CSV header and data rows.
#!/bin/bash
codedelta /builds/project-v1.1 /builds/project-v1.2 \
--project "MyApp" \
--old-label "v1.1" \
--new-label "v1.2" \
--snapshot-date "$(date +%Y-%m-%d)" \
--exclude "vendor,node_modules,generated" \
--db /var/lib/codedelta/myapp.db \
-o /var/www/reports/myapp-latest.html \
--csv /var/www/reports/myapp-latest.csv \
--threshold-churn 5000 \
-q
if [ $? -eq 2 ]; then
echo "ALERT: churn exceeds threshold — build is hot"
exit 1
fi
codedelta /src/old /src/new \
--metric-set "SLOC,LLOC,CHG_LLOC,ADD_LLOC,CRN_LLOC,CHG_FILE,CRN_FILE" \
-o management_report.html \
--csv management_report.csv
CodeDelta is built on 20 years of expertise in code-churn measurement across huge legacy codebases deployed globally. Its core purpose is to measure changed, added, and deleted SLOC and LLOC across massive, multi-language projects — and that is where its proven value lies.
It seemed natural to extend this into the emerging area of AI-generated-code and AI-agent detection. We researched this seriously, using both a heuristic approach — GSS (Generation Signature Scoring) — and a more sophisticated machine-learning approach — MLS (Machine Learning Scoring) — the latter trained on datasets of known AI- and human-written code.
Our honest finding: AI-generated code can be detected only unreliably. It is possible to identify signals and properties that suggest code was AI-generated, but no technique we assessed does so dependably. In particular, if a developer instructs an AI to write code in the style of a human, every detection method we tried or uncovered proved ultimately unreliable. The differences that remain detectable are largely cosmetic — commenting and formatting habits — which are easily changed and are not proof of authorship.
For that reason, the AI Scan is a pointer toward code that may warrant a closer look — nothing more. It is not a determination of authorship, and it carries a meaningful false-positive rate on code unlike its reference data. Treat every flag as a prompt to review, never as evidence.
We also added a churn metric, Replacement Churn, on the observation that AI tools often delete and replace whole blocks rather than editing in place. This too is only a pointer: it reflects a workflow pattern, not authorship, and as AI tooling shifts toward in-line editing the signal is likely to weaken. It is a useful hint, not a measure of AI code.
The one component that detects something concrete is the Agent Scan, which identifies calls to known AI-agent frameworks and SDKs — because those are real, named artifacts in the code rather than statistical guesses.
In short: the AI element is provided free of charge, as an experimental extra, offered honestly for what it is — a set of pointers, not verdicts. Because it is included at no additional cost and makes no claim to reliable or definitive AI detection, it should not be regarded as a paid feature or relied upon as one. We genuinely welcome feedback on whether the AI features are useful to you and how they might be improved.
By contrast, CodeDelta's churn engine — the measurement of SLOC and LLOC change across versions — is the product's mature, dependable core, refined over two decades and trusted on large-scale legacy systems worldwide. The churn engine is the product you are purchasing; the AI features are a complimentary (no-cost) addition to it.
CodeDelta detects AI-generated code via two complementary scans:
The AI Audit mode in the GUI runs BOTH scans in a single pass, producing two separate HTML reports (codedelta_ai_code_scan.html and codedelta_agent_scan.html).
Each file receives a Generation Signature Score (GSS) from 0 to 100. The score is the weighted sum of several signals. A high score indicates the file exhibits many of the structural patterns associated with AI-generated code. A low score does not prove the code is human-written — it means no unusual patterns were detected.
| Signal | Weight | What it detects |
|---|---|---|
| Structural uniformity | 40 | Many repeated function bodies in one file. Strong signal — humans parameterise repeated logic, AI/generators produce N copies. New in v1.5.0. |
| Docblock coverage | 40 | Unusually uniform documentation — every function has a structured docblock |
| Guard clause density | 35 | High ratio of defensive checks (null checks, boundary guards) to total code |
| Formulaic exceptions | 30 | Exception messages that follow predictable patterns ("Invalid X: must be Y") |
| Audit/notify coupling | 25 | Logging calls systematically paired with operations |
| Annotation saturation | 20 | High density of decorators, attributes, or annotations |
| Zero inline comments | 10 | No // inline comments despite high docblock coverage |
| High add velocity | 15 | Large newly-added files with no history of incremental change |
| LLOC warning | 15 | SLOC/LLOC ratio anomaly suggesting unusual formatting |
| Rating | Score | Meaning |
|---|---|---|
| HIGH | > sensitivity threshold | File exhibits strong AI generation signals |
| ELEVATED | threshold × 0.6 – threshold | Several signals present — warrants review |
| NORMAL | < threshold × 0.6 | No unusual patterns detected |
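The banding in the table can be written as a small function. The name gss_rating is hypothetical, and the threshold of 50 in the usage lines is only an example value, not a documented default.

```python
def gss_rating(score: float, threshold: float) -> str:
    """Classify a GSS score against the sensitivity threshold."""
    if score > threshold:
        return "HIGH"
    if score >= threshold * 0.6:
        return "ELEVATED"   # from threshold x 0.6 up to and including the threshold
    return "NORMAL"


# Example with an assumed threshold of 50.
ratings = [gss_rating(s, 50) for s in (72, 50, 35, 12)]
```

Lowering the threshold widens both the HIGH and ELEVATED bands, because the ELEVATED floor is derived from it.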
The sensitivity slider in the GUI controls the threshold for HIGH classification:
GSS was validated against two corpora. A set of 26 human-written C++ files produced a mean score of 0.0 with no HIGH ratings. A set of 12 AI-generated files across multiple languages produced a mean score of 48.9 with 2 HIGH and 3 ELEVATED ratings. Zero false positives were observed on the human corpus.
GSS measures patterns, not intent. A highly disciplined human developer who writes uniform docblocks and systematic null checks will score higher than average. The score is a signal for review, not a determination of authorship. It is most useful when scores are unexpectedly high for files in a codebase where the baseline is known.
When the ML plug-in is installed, the AI Code Scan report shows two additional columns alongside GSS:
| Language | GSS | MLS | Notes |
|---|---|---|---|
| Python | ✓ | ✓ (83%) | Original ML model |
| C / C++ | ✓ | ✓ (99%) | |
| Java | ✓ | ✓ (99%) | |
| C# | ✓ | ✓ (97.5%) | New in v1.7.0 |
| JavaScript / TypeScript | ✓ | — | ML model planned for v1.8 |
| Go | ✓ | — | ML model planned for v1.8 |
| Other languages | ✓ | — | GSS-only scoring |
When MLS is unavailable for a language, the AIC column shows — and risk classification uses GSS alone. All other functionality is unaffected.
Files are placed into risk bands based on the AIC score (or GSS where MLS is not available):
| Band | AIC | Meaning |
|---|---|---|
| HIGH | 50+ | Strong evidence of AI generation. Review the file. |
| ELEVATED | 30–49 | Several signals present. May warrant review depending on context. |
| NORMAL | <30 | No unusual patterns detected. |
The AI Detection Settings card on the Run Analysis page (visible in any AI Audit or AI Code Scan mode) opens a configuration dialog where you can fine-tune detection per language. The card's status line shows "X/Y signals active · N custom patterns" — a quick view of how much of the default scoring is in effect.
The dialog has six tabs: C/C++, Java, Python, C#, JavaScript, Go. The tab bar is sticky — it stays visible as you scroll through long signal lists.
Each language has a set of built-in signals. You can toggle individual signals on or off. This matters because some signals are context-dependent:
Signal configuration is saved to ml_config.json in the CodeDelta directory and takes effect on the next analysis run.
Add patterns specific to your codebase. Four pattern types:
# AI-generated.
(?i)auto.?generated.
/generated/ or _ai.py.
Each custom pattern can be set to Override: when the pattern fires, the file is always rated HIGH regardless of the ML score. Use this for definitive markers in your codebase.
Custom patterns apply across all AI Code Scan and AI Audit runs until removed. Saved to ml_config.json.
The Agent Scan analyses source files for code that initiates AI agents at runtime: code that imports AI SDKs, orchestrates agents, executes dynamic code, or constructs prompts from user input. This is entirely distinct from the AI Code Scan: Agent Scan detects code that runs AI agents, not code that was written by AI.
Agent Scan reads each source file and looks for four categories of pattern using regular expression matching. Each pattern category carries a weight. Files whose total weight exceeds the configured threshold are flagged.
The scan is entirely static — it reads source text only and never executes any code. It works on files even when dependencies are not installed.
| Category | Weight | What it detects | How it detects it |
|---|---|---|---|
| AI SDK imports | 25 | Import statements for 20+ AI frameworks and APIs | Matches import openai, from anthropic import, import langchain and similar patterns. Full list: openai, anthropic, langchain, autogen, crewai, llama-index, huggingface, together, cohere, mistral, groq, replicate, google.cloud.aiplatform, boto3 (bedrock), transformers, litellm, llamafile, ollama, semantic-kernel, haystack. |
| Agent orchestration | 40 | Function calls and class instantiations that invoke agent behaviour | Matches patterns like Agent(, AgentExecutor(, .run_agent(, chain.invoke(, pipeline(, TaskRunner(. These indicate the code is not just importing AI libraries but actively invoking agent workflows. |
| Dynamic code execution | 50 | Calls that execute code at runtime from strings or external sources | Matches eval(, exec(, subprocess with shell=True, os.system(, and dynamic __import__. These are dangerous in any context and especially so near AI calls. |
| Prompt injection vectors | 35 | User-controlled data being formatted directly into AI prompts | Matches patterns where variables (typically named user_input, request, query, message) are interpolated directly into strings that are then passed to AI API calls. A genuine prompt injection vector — not just any f-string. |
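A weighted regex scan of this kind could be sketched as follows. The weights come from the table. Everything else is an assumption made for illustration: the pattern subset, the exact regular expressions (the prompt-injection one in particular is far cruder than the behaviour described), the name agent_score, and the rule that a category adds its weight once per file.

```python
import re

# A few patterns per category, taken from the table above. The real pattern
# set is larger and its exact regular expressions are not documented here.
CATEGORIES = {
    "sdk_import": (25, re.compile(r"^\s*(?:import|from)\s+(?:openai|anthropic|langchain)\b", re.M)),
    "orchestration": (40, re.compile(r"\b(?:Agent|AgentExecutor)\(|\.run_agent\(|chain\.invoke\(")),
    "dynamic_exec": (50, re.compile(r"\b(?:eval|exec)\(|\bos\.system\(|shell\s*=\s*True")),
    "prompt_injection": (35, re.compile(r"f[\"'][^\"'\n]*\{(?:user_input|request|query|message)\}")),
}


def agent_score(source: str) -> tuple:
    """Return (total weight, matched category names) for one file's text."""
    matched = [name for name, (_, rx) in CATEGORIES.items() if rx.search(source)]
    # Assumption: a category adds its weight once, however many lines match.
    total = sum(CATEGORIES[name][0] for name in matched)
    return total, matched
```

The scan only reads text, so it works on a file whose imports cannot be resolved on the scanning machine.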
The most serious finding is the Rogue Agent pattern: a dynamic code execution call (eval, exec, or subprocess) that appears within 10 lines of an AI API call. This pattern indicates code where an AI model's output may directly influence what gets executed on the host machine — a significant and specific security risk.
A file flagged for the Rogue Agent pattern has its AIS rating escalated to HIGH regardless of its total score.
Example of Rogue Agent pattern:
response = client.chat.completions.create(...) # AI API call
code_to_run = response.choices[0].message.content
exec(code_to_run) # within 10 lines — ROGUE AGENT
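The proximity rule can be sketched as a line-window check. The two regular expressions and the name find_rogue_agent are illustrative assumptions, and the sketch treats "within 10 lines" as applying in either direction.

```python
import re

# Illustrative patterns only: two common SDK call shapes and the dynamic
# execution calls named in the text.
AI_CALL = re.compile(r"\.chat\.completions\.create\(|\.messages\.create\(")
DYNAMIC_EXEC = re.compile(r"\b(?:eval|exec)\(|\bsubprocess\.|\bos\.system\(")


def find_rogue_agent(source: str, window: int = 10) -> list:
    """Return (ai_line, exec_line) pairs, 1-based, that lie within `window` lines."""
    lines = source.splitlines()
    ai_lines = [n for n, text in enumerate(lines, 1) if AI_CALL.search(text)]
    exec_lines = [n for n, text in enumerate(lines, 1) if DYNAMIC_EXEC.search(text)]
    return [(a, e) for a in ai_lines for e in exec_lines if abs(a - e) <= window]
```

Run on the three-line example above, the sketch pairs the API call on line 1 with the exec on line 3.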
A file that scores HIGH on both GSS (appears AI-generated) and AIS (initiates AI agents) receives a CRITICAL rating. This is the highest-risk scenario: code that nobody may have fully reviewed or understood, which autonomously initiates AI agents at runtime.
The Agent Scan report includes an inline source viewer. Click View Source on any flagged file to see it with dangerous lines highlighted:
When AI SDK imports are found, the Agent Scan report includes an SDK Inventory — a list of each recognised AI framework detected and the number of files importing it. This gives a quick overview of which AI dependencies are present across the codebase.
What Agent Scan cannot detect:
requests.post("https://api.openai.com/...") without importing the openai SDK, Agent Scan will not flag it.
import openai as ai_lib may not match all detection patterns depending on subsequent usage.
Agent Scan will produce false positives in some situations:
tests and test directories.
The source viewer is the most effective tool for investigating potential false positives: it shows exactly which lines triggered each flag.
Agent Scan analyses all source files regardless of language, but SDK import detection patterns are primarily calibrated for Python, JavaScript, and TypeScript, where AI agent frameworks are most commonly used. C++, Java, and C# projects that call AI APIs via HTTP without a recognised SDK will have lower AIS scores but may still be flagged for dynamic execution patterns.
CodeDelta integrates with any CI/CD system or version control system that can export source files to a local directory. It does not interface directly with any CMVC system — it requires plain directory copies of your source.
Use --threshold-churn <n> to fail the build if CRN_LLOC exceeds a threshold. Exit code 2 means the threshold was exceeded.
stage('Code Metrics') {
steps {
sh """
codedelta ${WORKSPACE}/old ${WORKSPACE}/new \
--project "${JOB_NAME}" \
--old-label "${OLD_TAG}" \
--new-label "${NEW_TAG}" \
--snapshot-date "${BUILD_DATE}" \
--exclude "vendor,tests" \
-d ${WORKSPACE}/codedelta.db \
-o ${WORKSPACE}/report.html \
--threshold-churn 10000 -q
"""
}
}
- name: Run CodeDelta
run: |
codedelta ./old ./new \
--project "${{ github.repository }}" \
--old-label "${{ github.event.before }}" \
--new-label "${{ github.sha }}" \
--snapshot-date "$(date +%Y-%m-%d)" \
--exclude "vendor,node_modules" \
-o report.html --csv report.csv -q
The SQLite database (codedelta.db) accumulates every run. You can query it directly:
sqlite3 codedelta.db "
SELECT r.run_at, rt.crn_lloc, rt.total_sloc
FROM run r JOIN run_totals rt ON r.id = rt.run_id
WHERE r.project_id = (SELECT id FROM project WHERE name = 'MyApp')
ORDER BY r.run_at DESC LIMIT 10;"
| Table | Contents |
|---|---|
| project | Project names and creation dates |
| run | Each run: old/new dirs, labels, snapshot_date, exclude_dirs, note |
| run_totals | Project-level totals for every metric per run |
| file_metrics | Per-file metrics for every file in every run |
| metric_set | Named metric sets |
| metric_set_item | Metric codes belonging to each set |
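The same trend query can be run from Python with the standard sqlite3 module. The sketch uses only the table and column names that appear in the query above. The function name recent_churn is hypothetical, and the project name is passed as a bound parameter rather than spliced into the SQL.

```python
import sqlite3


def recent_churn(conn: sqlite3.Connection, project: str, limit: int = 10) -> list:
    """Return (run_at, crn_lloc, total_sloc) for the latest runs of a project."""
    query = """
        SELECT r.run_at, rt.crn_lloc, rt.total_sloc
        FROM run r
        JOIN run_totals rt ON r.id = rt.run_id
        JOIN project p ON p.id = r.project_id
        WHERE p.name = ?
        ORDER BY r.run_at DESC
        LIMIT ?
    """
    return conn.execute(query, (project, limit)).fetchall()
```

A typical call would open the database with `sqlite3.connect("codedelta.db")` and pass the connection in. A falling crn_lloc across the returned rows corresponds to the stabilising trend described in the History section.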
CodeDelta uses the Myers diff algorithm for file comparison, which finds the shortest edit script between two files. This is more accurate than simple line-by-line comparison for files that have been significantly restructured.
Before running the full Myers diff, CodeDelta uses Jaccard similarity to detect file renames and large-scale moves. A file that appears as deleted from the old snapshot and as added in the new, but with high token similarity between the two, is treated as a rename rather than a delete+add. This prevents inflated churn numbers from routine refactoring.
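Jaccard similarity is the size of the intersection of two sets divided by the size of their union. A rename check built on it could look like the sketch below. The tokeniser (runs of word characters), the 0.8 threshold, and the names jaccard and detect_renames are assumptions for illustration. The source does not state CodeDelta's actual values.

```python
import re


def jaccard(a: str, b: str) -> float:
    """Jaccard similarity of the two texts' token sets: |A & B| / |A | B|."""
    tokens_a = set(re.findall(r"\w+", a))
    tokens_b = set(re.findall(r"\w+", b))
    if not tokens_a and not tokens_b:
        return 1.0
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


def detect_renames(deleted: dict, added: dict, threshold: float = 0.8) -> dict:
    """Pair each deleted path with its most similar added path above the threshold."""
    renames = {}
    unclaimed = dict(added)
    for old_path, old_text in deleted.items():
        best = max(unclaimed, key=lambda p: jaccard(old_text, unclaimed[p]), default=None)
        if best is not None and jaccard(old_text, unclaimed[best]) >= threshold:
            renames[old_path] = best
            del unclaimed[best]  # each added file can absorb one rename only
    return renames
```

Both arguments map a relative path to file text. Paths paired this way would be diffed as one file instead of counting as full deletions and additions.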
LLOC comparison works by tokenising each file's semicolon-terminated statements and running Myers on the resulting token stream rather than the line stream. This means a purely cosmetic reformatting — changing indentation, line breaks, or spacing within a statement — does not register as a change. Only actual logical changes increment CHG_LLOC.
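The effect of comparing statements rather than lines can be demonstrated with Python's difflib. Note that difflib.SequenceMatcher is not the Myers algorithm. It stands in here only to show the idea. The splitting is naive (it assumes comments and literals have already been stripped), and the way a replaced block is divided into changed, deleted, and added statements is an assumption.

```python
import difflib


def statements(source: str) -> list:
    """Split on ';' and collapse whitespace so layout changes vanish."""
    # Naive: assumes comments and string literals were already stripped.
    return [" ".join(part.split()) for part in source.split(";") if part.strip()]


def lloc_changes(old: str, new: str) -> dict:
    """Count changed, deleted and added statements between two sources."""
    a, b = statements(old), statements(new)
    counts = {"CHG_LLOC": 0, "DEL_LLOC": 0, "ADD_LLOC": 0}
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "replace":
            # Assumption: pair statements one-to-one as changes, rest is add/delete.
            paired = min(i2 - i1, j2 - j1)
            counts["CHG_LLOC"] += paired
            counts["DEL_LLOC"] += (i2 - i1) - paired
            counts["ADD_LLOC"] += (j2 - j1) - paired
        elif tag == "delete":
            counts["DEL_LLOC"] += i2 - i1
        elif tag == "insert":
            counts["ADD_LLOC"] += j2 - j1
    return counts
```

Re-indenting or re-wrapping a statement leaves every count at zero, while editing one statement and appending another yields one changed and one added.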
PLOC counts lines beginning with # (after optional whitespace) in C/C++ and C# files. This includes all preprocessor directives: #include, #define, #ifdef, #ifndef, #endif, #pragma, #undef, #if, #else, #elif, #error, #warning. PLOC is included within SLOC and LOC — it is not subtracted, it is identified separately for clarity.
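The rule "begins with # after optional whitespace" maps directly onto a multiline regular expression. This is a minimal sketch with a hypothetical function name. Unlike a real lexer, it does not skip # characters that start a line inside a block comment.

```python
import re

# '#' as the first non-blank character of a line.
DIRECTIVE = re.compile(r"^[ \t]*#", re.MULTILINE)


def count_ploc(source: str) -> int:
    """Count lines whose first non-blank character is '#'."""
    return len(DIRECTIVE.findall(source))
```

A # that appears later in a line, for example inside a trailing comment, is not counted.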
J_COM counts /** ... */ Java-style docblock comments. C_COM counts /* ... */ C-style block comments. EOL_COM counts // to-end-of-line comments. A multi-line block comment counts as one comment regardless of how many lines it spans. COM_LOC = J_COM + C_COM + EOL_COM.
Single-quoted character literals are correctly excluded from LLOC counting. A semicolon inside a character literal — such as char c = ';' — is not counted as a logical statement. This applies to C/C++, C#, Java, and JavaScript. Escape sequences ('\\', '\n', '\0') are also handled correctly. This was a known limitation corrected in v1.4.1.
CodeDelta requires a valid licence to run. Three licence types are available:
| Type | Duration | Use |
|---|---|---|
| Time-locked evaluation | 30 days | Free for evaluation |
| Node-locked | Perpetual | Single machine, following purchase |
| Floating | Perpetual | Organisation-wide, network validated |
Set the environment variable before running:
export CODEDELTA_LICENSE="CD-TIMED-20270503-XXXXXXXXXXXXXXXXX"
./codedelta old/ new/
Or store it in a file in your home directory:
echo "CD-TIMED-20270503-XXXXXXXXXXXXXXXXX" > ~/.codedelta_license
For licensing, volume pricing, or academic licences: admin@stackbun.com
certify.py is a standalone tool that answers the question every AI-assisted development team needs answered: "Is this the correct, tested version of every file?"
It works by recording the SHA256 fingerprint of every source file at the exact moment your tests pass. If any file changes after that — whether by accident, by an AI session, or by copying the wrong version — certify.py tells you immediately.
certify.py requires Python 3.8 or later and has no other dependencies. Drop it anywhere and run it directly. It is not part of the CodeDelta distribution.
# After your tests pass, certify your files:
python3 certify.py certify --build 1 --src . --tests "pytest"
# Check status at any time:
python3 certify.py safe --src .
certify.py works with any test framework. Pass the test command with --tests, or use --tests auto to auto-detect:
python3 certify.py certify --build 2 --tests "pytest"
python3 certify.py certify --build 2 --tests "npm test"
python3 certify.py certify --build 2 --tests "go test ./..."
python3 certify.py certify --build 2 --tests "cargo test"
After every AI coding session, tag the files that were produced:
python3 certify.py ai-session \
--files src/new_feature.py src/config.json \
--label "Claude session — added payment module"
Tagged files are marked ⚡ in all certification reports. If a tagged file changes after certification without being re-tested, the safe command reports HIGH RISK.
Files that have not changed since their last certification automatically carry their certification forward. Only changed files need re-certifying at each build. The history command shows the full audit trail — which build each file was first certified in, and whether it has changed since.
# GitHub Actions — block deployment of uncertified code
- name: Verify file certification
run: python3 certify.py verify --src . --strict
verify_files.py is included in the CodeDelta distribution. It verifies that every source file matches its embedded FILE-ID — a SHA256 checksum of the file's own content, stored as a comment in the file itself.
Every file in the CodeDelta source distribution contains a line like:
// FILE-ID: 9b32c6f257db9c1cc28d1cb178dce47fa0ed06027468b56434f4993e01b48113
This ID is the SHA256 of the file content excluding that line. If the file has been modified since the ID was set, the computed hash will not match the stored ID.
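The check can be sketched with hashlib. This is an illustration, not the code of verify_files.py: the function names are hypothetical, and the sketch assumes the FILE-ID line is removed together with its line ending before hashing, with no other normalisation.

```python
import hashlib
import re

# One whole line containing "FILE-ID: <64 hex digits>", including its newline.
FILE_ID_LINE = re.compile(rb"^.*FILE-ID: ([0-9a-f]{64}).*(?:\r?\n|$)", re.MULTILINE)


def split_file_id(data: bytes):
    """Return (stored_id, body) where body is the content without the FILE-ID line."""
    match = FILE_ID_LINE.search(data)
    if match is None:
        return None, data
    body = data[:match.start()] + data[match.end():]
    return match.group(1).decode("ascii"), body


def verify(data: bytes) -> bool:
    """True when the stored FILE-ID equals the SHA256 of the remaining content."""
    stored, body = split_file_id(data)
    return stored is not None and hashlib.sha256(body).hexdigest() == stored
```

Excluding the line itself is what allows the hash to be stored inside the file it describes.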
# Verify all files in the current directory
python3 verify_files.py verify .
# Add or update FILE-IDs
python3 verify_files.py add .
✓ src/report_html.cpp
✓ src/diff_engine.cpp
✗ MISMATCH: src/codedelta.cpp
stored: 212d145ba32f7b43f7cf9516143b6c06...
computed: 7f3a91d8e4c2b1a9f6d3e8c7b4a2d1f5...
Result: 38 OK, 1 FAILED, 2 skipped
A MISMATCH means the file has been modified since the FILE-ID was last set. Either the modification was intentional (update the FILE-ID with verify_files.py add) or the file has been corrupted or substituted.
CodeDelta uses a three-layer versioning system so that every build is uniquely identifiable.
| Layer | Where | When it changes |
|---|---|---|
| Master semver | ~/code-delta/VERSION (one-line text file) | Manually, on major / minor / patch bumps |
| C++ binary version | ~/code-delta/src/version.h | Manually, when the binary is rebuilt with significant changes |
| Per-file build | Header block at the top of each Python component | Automatically, every time the file is deployed |
# FILE-ID: <64-char SHA-256 hash>
# ─── Component versioning ─────────────────────────────────
# CODEDELTA_COMPONENT: ai_ml_plugin
# CODEDELTA_BUILD: 5
# CODEDELTA_BUILD_DATE: 2026-05-28
# CODEDELTA_BUILD_TIME: 19:41
# ──────────────────────────────────────────────────────────
Every tagged Python file carries this block. The build number increments by one on every deploy. Date and time are stamped from the system clock. The FILE-ID is a SHA-256 of the file body (excluding the FILE-ID line itself) — it changes whenever the file content changes.
The GUI server exposes http://localhost:7654/version as a JSON endpoint. It returns the master version, the C++ binary's version and build, and an array of every Python component file with its current build number, date, time, and short FILE-ID. Bug reports should include this JSON so the developer knows exactly which build of every component is on the user's machine.
In beta builds, the About box has a "Components" section listing every Python file's build details. In release builds, this section is hidden — end users see only the master version, release date, and licence info.
The deploy.sh script in the source repo handles the mechanical work of installing a patched file into the runtime location. It:
~/Downloads/ (handles the macOS (N) suffix automatically).
~/.codedelta-backups/.
Common commands:
deploy.sh <filename> # full deploy with backup + version bump
deploy.sh --dry-run <file> # see what would happen, no changes
deploy.sh --revert <file> # roll back to most recent backup
deploy.sh --list # show routing table
deploy.sh --help # help screen
For full details see CDVC.md in the source repo root.
Note: deploy.sh is a Unix shell script for macOS and Linux. Windows users patching files should copy the patched file manually into the CodeDelta installation directory; the auto-versioning runs server-side and is platform-independent.