A comprehensive guide to how CodeDelta works, what its reports mean, and how to use it from the GUI, the command line, and in automated jobs. If a question isn't answered here, it's a candidate to be added — this FAQ is meant to be the first place to look.
CodeDelta compares two versions of a source-code tree — an old version and a new version — and measures what changed between them. It produces metrics (how much code was added, deleted, and changed), HTML reports, and optional CSV/XML exports, and it can record results over time in a database for trend analysis. It also includes AI-assisted analysis of the code.
You need two folders: the old version of your project and the new version.
GUI: start the server and open it in your browser:
python3 codedelta_server.py
then go to http://localhost:7654, point it at your two folders, and run.
Command line:
codedelta ./project_v1 ./project_v2 -o report.html
This writes report.html (plus two companion files — see "What reports do I get?").
No — the GUI is a front-end wrapper around the same engine. Anything the GUI does,
the command-line codedelta binary does underneath. Use whichever you prefer; the
results are identical.
CodeDelta measures change from old to new. "Added" means present in new but not old; "deleted" means present in old but not new; "changed" means present in both but modified. Getting them the right way round matters — swapping them inverts adds and deletes.
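To make the direction of comparison concrete, here is a minimal sketch of the added/deleted/changed rule applied at file level. The function name `classify_files` and the dictionary inputs are illustrative assumptions, and the sketch compares raw content only; it is not CodeDelta's actual comparison logic.

```python
def classify_files(old_files, new_files):
    """Classify relative paths as added, deleted, changed, or unchanged.

    old_files / new_files map a relative path to that file's content.
    Hypothetical helper for illustration; compares raw content only.
    """
    result = {}
    for path in sorted(set(old_files) | set(new_files)):
        if path not in old_files:
            result[path] = "added"        # present in new but not old
        elif path not in new_files:
            result[path] = "deleted"      # present in old but not new
        elif old_files[path] != new_files[path]:
            result[path] = "changed"      # present in both but modified
        else:
            result[path] = "unchanged"
    return result


old = {"a.c": "int x;", "b.c": "int y;", "c.c": "int z;"}
new = {"a.c": "int x;", "b.c": "int y = 1;", "d.c": "int w;"}

forward = classify_files(old, new)   # c.c is deleted, d.c is added
swapped = classify_files(new, old)   # arguments reversed: c.c added, d.c deleted
```

Reversing the two arguments flips every "added" into "deleted" and vice versa, which is the inversion the paragraph above warns about.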
A single run with -o report.html produces three HTML files:
- report.html — the main detailed report (per-file metrics)
- report_overview.html — a PM-friendly summary
- report_diff.html — a side-by-side diff viewer
Optionally you can also export:
- --csv <file> — per-file metrics as CSV (for spreadsheets)
- --xml <file> — XML export (EPM-compatible format)
Wherever you point -o. If you don't specify, the default is
codedelta_report.html in the current folder. The two companion files
(_overview, _diff) are always written next to the main report.
All progress and result messages go to stderr; stdout is reserved for
machine-readable output (--help, --version, and future JSON modes). This means
you can pipe stdout cleanly in scripts without progress noise mixed in.
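A script that wraps CodeDelta can rely on this split by capturing the two streams separately. The sketch below uses Python's standard `subprocess` module; the helper name `run_split` is an assumption, and the child command is a stand-in that writes one line to each stream so the example runs without CodeDelta installed.

```python
import subprocess
import sys


def run_split(cmd):
    """Run cmd and return (exit_code, stdout_text, stderr_text)."""
    proc = subprocess.run(cmd, capture_output=True, text=True)
    return proc.returncode, proc.stdout, proc.stderr


# Stand-in child process, not CodeDelta itself: it prints a
# machine-readable line to stdout and a progress line to stderr.
# On a machine with CodeDelta installed, try ["codedelta", "--version"].
demo_cmd = [
    sys.executable,
    "-c",
    "import sys; print('machine-readable'); print('progress...', file=sys.stderr)",
]

code, out, err = run_split(demo_cmd)
```

Here `out` holds only what the child wrote to stdout, so it can be parsed without filtering out progress messages.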
CodeDelta classifies every line into one of four states (the C/A/D/U model inherited from EPM):
- A (Added): new lines not in the old version
- D (Deleted): lines removed from the old version
- C (Changed): lines that were modified
- U (Unchanged): lines identical in both (when all change metrics are zero, the file's status is Unchanged)
Each is measured across three line types, which appear as the suffix of each column name:
- SLOC: Source Lines Of Code (physical lines, comments stripped)
- LLOC: Logical Lines Of Code (statements; e.g. semicolon-delimited in C-like languages)
- COM: Comment lines
So ADD_SLOC = source lines added, DEL_LLOC = logical lines deleted,
CHG_COM = comment lines changed, and so on.
CRN is churn — a combined measure of logical-line change (added + deleted +
changed logical lines). It's the single best "how much real change happened here"
number, which is why it's the metric the CI gate (--threshold-churn) watches.
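As a worked example of the definition above, churn can be computed from per-file logical-line counts. This is a sketch: the column names `ADD_LLOC`, `DEL_LLOC` and `CHG_LLOC` follow the naming pattern described earlier, and the row values are made up for illustration.

```python
def churn(add_lloc, del_lloc, chg_lloc):
    """Churn for one file: added + deleted + changed logical lines."""
    return add_lloc + del_lloc + chg_lloc


def total_churn(rows):
    """Sum churn over per-file rows keyed by metric column name."""
    return sum(churn(r["ADD_LLOC"], r["DEL_LLOC"], r["CHG_LLOC"]) for r in rows)


def exceeds_threshold(total, threshold):
    """The gate trips only when churn is strictly greater than the threshold."""
    return total > threshold


# Illustrative per-file metrics (invented values).
rows = [
    {"ADD_LLOC": 10, "DEL_LLOC": 4, "CHG_LLOC": 6},   # churn 20
    {"ADD_LLOC": 0, "DEL_LLOC": 25, "CHG_LLOC": 0},   # churn 25
]

total = total_churn(rows)   # 45
```

Note that a file deleted outright still contributes its deleted lines to churn, so large removals move the number as much as large additions.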
SLOC counts physical source lines (after removing comments and blank lines). LLOC
counts logical statements. One physical line can hold several statements
(a=1; b=2; is one SLOC, two LLOC), and one statement can span several physical
lines (the reverse). For languages without statement delimiters (e.g. HTML, plain
text), LLOC is defined to equal SLOC.
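The difference is easiest to see with a toy counter. The sketch below is a deliberately simplified assumption, not CodeDelta's parser: it treats only `//` as a comment marker, counts every semicolon as one statement, and ignores block comments, string literals, and constructs such as `for (;;)` that a real counter has to handle.

```python
def count_sloc_lloc(source):
    """Return (sloc, lloc) for C-like source using a naive rule set.

    SLOC: lines that still contain text after stripping // comments.
    LLOC: number of semicolons on those lines.
    """
    sloc = 0
    lloc = 0
    for line in source.splitlines():
        code = line.split("//", 1)[0].strip()   # drop trailing // comment
        if not code:
            continue                            # blank or comment-only line
        sloc += 1
        lloc += code.count(";")
    return sloc, lloc


two_statements_one_line = "a=1; b=2;"
one_statement_three_lines = "total = add(\n    first,\n    second);"
with_comments = "// setup\n\nx = 1; // init\n"

r1 = count_sloc_lloc(two_statements_one_line)     # (1, 2)
r2 = count_sloc_lloc(one_statement_three_lines)   # (3, 1)
r3 = count_sloc_lloc(with_comments)               # (1, 1)
```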
HTML has no logical-line concept (no statement delimiters like semicolons), so by definition LLOC = SLOC for HTML. This is intentional and matches EPM behaviour.
For changed files, CodeDelta reports the new file's line count (EPM convention) — the current state is what you usually care about. Deleted files are counted from the old version (the new doesn't exist).
Any non-comment, non-whitespace difference flags a file as Changed. Re-saving with
different line endings, reformatting, or a one-character edit all count. Use the
diff viewer (report_diff.html) to see exactly what differs.
Yes — --metric-set takes a comma-separated list of metric codes. With no
--metric-set, all metrics are reported.
CodeDelta recognises a wide range by file extension, including: C/C++ (.c .h .cpp
.hpp .cc .cxx), C# (.cs), Java (.java), JavaScript/TypeScript (.js .jsx .ts
.tsx .mts .cts), Python (.py .pyw), PHP (.php), HTML (.html .htm), CSS,
XML, SQL/PL-SQL (.sql .pls .pks .pkb), Perl (.pl .pm), Visual Basic (.vb .bas
.cls .frm .vbs), Ada (.ada .adb .ads), VHDL (.vhdl .vhd), assembler (.asm
.s), Fortran (.f .f90 .f95 .for), Ruby (.rb), shell (.sh), batch (.bat),
ASP/JSP (.asp .aspx .jsp), PowerBuilder (.srd .srf .srs .sru .srw), IDL, and
plain text/readme files.
Yes — --ext lets you map extra extensions to a language for that run. For example,
to treat .inc files as C:
codedelta old new --ext "inc=c" -o report.html
(Check --help for the exact syntax in your build.)
Comment changes (COM metrics) are tracked apart from code (SLOC/LLOC) so that documentation churn doesn't inflate your code-change figures — and so you can see documentation effort on its own.
Pass -d <file> (or --db) to record each run into a SQLite database. It
appends — every run adds a new snapshot, it never overwrites previous data. Over
many runs this builds a longitudinal history you can chart (churn over time, growth,
etc.). The default database is codedelta.db.
It's a standard SQLite file. You can open it with any SQLite tool:
sqlite3 codedelta.db
…then run SQL queries, or use a GUI like DB Browser for SQLite. Because it's plain SQLite, you can also read it from Python, Excel (via ODBC), or any language with a SQLite driver.
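From Python, the standard `sqlite3` module is enough to start exploring the file. Since the table layout is not described here, this sketch makes no assumption about it and simply lists whatever tables the file contains; the helper name `list_tables` is illustrative. It opens the file read-only so that inspecting the history cannot modify it.

```python
import pathlib
import sqlite3


def list_tables(db_path):
    """Return the table names in a SQLite file, in alphabetical order."""
    # mode=ro opens read-only and fails if the file does not exist,
    # instead of silently creating an empty database.
    uri = pathlib.Path(db_path).resolve().as_uri() + "?mode=ro"
    conn = sqlite3.connect(uri, uri=True)
    try:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    finally:
        conn.close()
    return [name for (name,) in rows]
```

Calling `list_tables("codedelta.db")` gives a starting point; from there, `PRAGMA table_info(<table>)` in the `sqlite3` shell shows each table's columns.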
No. The database is append-only by design — each run adds a snapshot. The history
is the point; it's what powers trend analysis. (If you ever want a fresh history,
point -d at a new filename or delete the old DB file deliberately.)
Generally no — it can grow large and changes every run. Keep it on disk (it's needed for trends), but exclude it from git. Back it up separately if the history matters to you.
Use the metadata flags, which are recorded in the DB and the overview report:
- --project <name> — the project name (defaults to the new directory's name)
- --old-label <str> / --new-label <str> — version labels (e.g. v1.0, a git
hash, a date)
- --note <str> — free text (e.g. "nightly cron run")
codedelta <old_dir> <new_dir> [options]
  -o, --output <file>     Main HTML report (default codedelta_report.html)
  --csv <file>            Also write per-file CSV
  --xml <file>            Also write XML (EPM-compatible)
  -d, --db <file>         SQLite DB for trending (appends)
  --project <name>        Project name
  --old-label <str>       Label for old version
  --new-label <str>       Label for new version
  --note <str>            Free-text note
  --exclude <dirs>        Directories to skip
  --ext <map>             Add file-extension → language mappings
  --metric-set <list>     Report only these metrics
  --snapshot-date <d>     Override the recorded snapshot date
  --threshold-churn N     Exit code 3 if total churn exceeds N (CI gating)
  -v, --verbose           Print every file processed
  -q, --quiet             Errors only (for cron)
  -h, --help              Show help
  -V, --version           Show version
Use --quiet (errors only) and point the DB and report at fixed paths. Example:
codedelta /path/yesterday /path/today \
--project Parky \
--old-label "$(date -d yesterday +%Y-%m-%d)" \
--new-label "$(date +%Y-%m-%d)" \
--note 'nightly cron' \
-d /var/log/codedelta/parky.db \
-o /var/log/codedelta/latest.html \
--quiet
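Save that command as a script and schedule the script from your crontab (crontab -e).
Pasting it straight into the crontab will not work as written: a crontab entry must
fit on one line, and cron treats each unescaped % as a newline, so the date formats
would need \%. Because --quiet prints only errors, a clean run produces no output,
which is ideal for cron.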
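If the nightly job is driven from Python rather than a shell script, the same command can be assembled as an argument list. This is a sketch under stated assumptions: the helper name `nightly_args` is hypothetical, the paths and project name are the ones from the example above, and only flags documented in this FAQ are used. Computing the dates in Python also avoids depending on the GNU-specific `date -d yesterday`.

```python
import datetime


def nightly_args(old_dir, new_dir, today, project, db_path, report_path):
    """Build the codedelta argument list for a nightly run.

    Labels are ISO dates (YYYY-MM-DD): yesterday for old, today for new.
    """
    yesterday = today - datetime.timedelta(days=1)
    return [
        "codedelta", old_dir, new_dir,
        "--project", project,
        "--old-label", yesterday.isoformat(),
        "--new-label", today.isoformat(),
        "--note", "nightly cron",
        "-d", db_path,
        "-o", report_path,
        "--quiet",
    ]


args = nightly_args(
    "/path/yesterday",
    "/path/today",
    datetime.date(2024, 3, 1),          # fixed date so the example is repeatable
    "Parky",
    "/var/log/codedelta/parky.db",
    "/var/log/codedelta/latest.html",
)
```

In a real job, pass `datetime.date.today()` and hand the list to `subprocess.run(args)`; passing a list avoids shell quoting issues with the labels and note.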
Yes — that's what --threshold-churn N is for. If total churn (CRN_LLOC) exceeds
N, CodeDelta exits with code 3. Wire that into CI:
codedelta old new --threshold-churn 5000 -o report.html || { rc=$?; [ "$rc" -eq 3 ] && echo "Too much churn!"; exit "$rc"; }
Your CI system can treat exit 3 as a failed gate.
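A CI wrapper usually wants to tell "gate tripped" apart from "the tool failed". The sketch below maps exit codes to a status string. Only exit code 3 is documented here; treating 0 as success follows the usual process convention, and lumping every other code into a generic "error" is an assumption. The demo child process is a stand-in that simply exits with code 3.

```python
import subprocess
import sys

CHURN_GATE_EXIT = 3  # documented: --threshold-churn exceeded


def gate_result(exit_code):
    """Translate a process exit code into a CI status string."""
    if exit_code == 0:
        return "pass"
    if exit_code == CHURN_GATE_EXIT:
        return "churn-gate-failed"
    return "error"  # assumption: any other code is a generic failure


def run_gate(cmd):
    """Run cmd and return its gate status."""
    return gate_result(subprocess.run(cmd).returncode)


# Stand-in for a codedelta run whose churn exceeded the threshold.
demo_cmd = [sys.executable, "-c", "import sys; sys.exit(3)"]
status = run_gate(demo_cmd)
```

Keeping the mapping in one small function makes it easy to report a churn failure differently from a crash or a bad path.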
Exit code 3: --threshold-churn exceeded.
Use --exclude with the directories to skip:
codedelta old new --exclude "node_modules,build,dist" -o report.html
CodeDelta includes AI-assisted analysis that examines the source and produces its own report (the AI Code Scan), separate from the change metrics. It can flag characteristics of the code using trained models. The detection settings (including per-language configuration) are available in the GUI's AI Detection settings.
See the Privacy section below. CodeDelta is designed to run locally.
This is a fair question of any AI-detection model, and CodeDelta addresses it directly with what we call header-blind models. The detector is deliberately built not to rely on signals a person could trivially fake.
The weakness we found and closed: when the Java model was first trained, its single strongest signal — roughly 28% of the model's entire decision — was whether a file carried an author/copyright header comment. We measured the effect, and a rich header pushes a file's AI-probability down (in one test, from 70.9% toward 63.8%) — a tidy header makes code look more “human” to the model. That is gameable: a team that mandates header comments as a coding standard would unintentionally make its AI-generated code look human-written, defeating detection through a formatting rule that has nothing to do with who actually wrote the code.
Rather than patch over the signal at scoring time, the model is retrained from scratch with the header feature removed entirely. Forced to ignore headers, it learns to detect AI from structural characteristics instead — code patterns, naming, spacing consistency, token entropy, annotation density — properties that reflect how the code is genuinely written and are far harder to fake.
We validate each header-blind model with a header sweep: take real files, force the header signal across its full range, and re-score. For the blind model the AI-probability does not move at all when the header changes — the lever is no longer connected — while it still cleanly separates AI code from human code on a large independent test corpus, with a low false-positive rate on human code. The trade-off is a small reduction in raw accuracy in exchange for detection that holds up when someone is actively trying to hide AI authorship. The approach is live for C++ and Java, where the header signal was most dominant, and is being extended to other languages where that signal is significant.
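The header sweep can be pictured as follows. This is a toy sketch, not CodeDelta's models or validation code: the feature names `header_richness` and `token_entropy`, the two scoring functions, and their coefficients are all invented for illustration. The idea is to force the header feature through a range of values for one file and measure how far the score moves.

```python
def header_sweep(score, features, header_key, header_values):
    """Re-score one file with the header feature forced to each value.

    Returns the spread (max - min) of the scores. A spread of 0.0 means
    the header has no influence on this model for this file.
    """
    scores = [score({**features, header_key: v}) for v in header_values]
    return max(scores) - min(scores)


# Toy stand-in models (hypothetical, for illustration only).
def header_sensitive(f):
    # A richer header pulls the AI-probability down.
    return 0.7 - 0.07 * f["header_richness"]


def header_blind(f):
    # Never reads the header feature at all.
    return 0.2 + 0.5 * f["token_entropy"]


sample = {"header_richness": 0.0, "token_entropy": 0.8}
sweep_values = [0.0, 0.25, 0.5, 0.75, 1.0]

sensitive_spread = header_sweep(header_sensitive, sample, "header_richness", sweep_values)
blind_spread = header_sweep(header_blind, sample, "header_richness", sweep_values)
```

For the header-sensitive toy model the spread is about 0.07, while for the blind one it is exactly zero, which is the "lever is no longer connected" property the sweep is meant to confirm.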
No. CodeDelta runs locally and analyses your code on your own machine. It does not upload your source anywhere.
CodeDelta verifies a signed licence file (codedelta.lic) offline — no licence
server, no internet needed. See the separate Licensing Guide for installing and
activating your licence, and for troubleshooting licence messages.
See the Licensing Guide's troubleshooting section — in short, place codedelta.lic
in ~/.codedelta/. (Development builds don't require a licence; release/customer
builds do.)
The path you gave doesn't exist or isn't a directory. Check the path; use absolute paths if unsure.
None of the files matched a known language. Either the folders are empty/wrong, or
your files use extensions CodeDelta doesn't map by default — use --ext to add
them (see Languages above).
You passed both. Pick one.
Make sure the server is running (python3 codedelta_server.py) and that nothing
else is using the port. If a previous server is stuck, stop it first:
pkill -f codedelta_server
then start it again.
Open the diff viewer (report_diff.html) to see line-by-line what CodeDelta
detected. If it still looks wrong, note the file and the metric and report it — the
diff view usually explains the discrepancy (e.g. reformatting counted as change,
comment vs code classification).