CodeDelta — FAQ

A comprehensive guide to how CodeDelta works, what its reports mean, and how to use it from the GUI, the command line, and in automated jobs. If a question isn't answered here, it's a candidate to be added — this FAQ is meant to be the first place to look.

GETTING STARTED

What does CodeDelta actually do?

CodeDelta compares two versions of a source-code tree — an old version and a new version — and measures what changed between them. It produces metrics (how much code was added, deleted, and changed), HTML reports, and optional CSV/XML exports, and it can record results over time in a database for trend analysis. It also includes AI-assisted analysis of the code.

How do I run my first comparison?

You need two folders: the old version of your project and the new version.

GUI: start the server and open it in your browser:

python3 codedelta_server.py

then go to http://localhost:7654, point it at your two folders, and run.

Command line:

codedelta ./project_v1 ./project_v2 -o report.html

This writes report.html (plus two companion files — see "What reports do I get?").

Is the GUI different from the command-line tool?

No — the GUI is a front-end wrapper around the same engine. Anything the GUI does, the command-line codedelta binary does underneath. Use whichever you prefer; the results are identical.

What's the difference between the "old" and "new" directory?

CodeDelta measures change from old to new. "Added" means present in new but not old; "deleted" means present in old but not new; "changed" means present in both but modified. Getting them the right way round matters — swapping them inverts adds and deletes.
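To make the direction rule concrete, here is a minimal Python sketch that classifies files by comparing two in-memory trees. The function name classify_files and the dict-based tree representation are assumptions made for this example. It is not CodeDelta's implementation, and it compares whole files rather than individual lines.

```python
def classify_files(old_tree, new_tree):
    """Classify each path as added, deleted, changed, or unchanged.

    old_tree and new_tree map relative paths to file contents.
    Hypothetical helper, for illustration only.
    """
    result = {}
    for path in sorted(set(old_tree) | set(new_tree)):
        if path not in old_tree:
            result[path] = "added"      # present in new but not old
        elif path not in new_tree:
            result[path] = "deleted"    # present in old but not new
        elif old_tree[path] != new_tree[path]:
            result[path] = "changed"    # present in both but modified
        else:
            result[path] = "unchanged"
    return result


old = {"a.c": "int x;", "b.c": "int y;", "c.c": "int z;"}
new = {"a.c": "int x;", "b.c": "int y = 1;", "d.c": "int w;"}

forward = classify_files(old, new)   # old -> new, the intended direction
swapped = classify_files(new, old)   # directories given the wrong way round
print(forward)
print(swapped)
```

In the forward run c.c is deleted and d.c is added; in the swapped run those two labels trade places, while b.c stays "changed" either way.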

REPORTS & OUTPUT

What reports do I get?

A single run with -o report.html produces three HTML files:
- report.html: the main detailed report (per-file metrics)
- report_overview.html: a PM-friendly summary
- report_diff.html: a side-by-side diff viewer

Optionally you can also export:
- --csv <file>: per-file metrics as CSV (for spreadsheets)
- --xml <file>: XML export (EPM-compatible format)

Where do the reports go?

Wherever you point -o. If you don't specify, the default is codedelta_report.html in the current folder. The two companion files (_overview, _diff) are always written next to the main report.
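If a script needs to find the companion files after a run, the naming rule described above can be sketched in Python as follows. The helper name companion_reports is hypothetical, and the sketch assumes the _overview and _diff parts are inserted just before the file extension, as in the report.html example.

```python
from pathlib import Path


def companion_reports(main_report="codedelta_report.html"):
    """Return the expected overview and diff paths next to the main report.

    Assumes the naming pattern <stem>_overview<ext> and <stem>_diff<ext>.
    """
    main = Path(main_report)
    return [
        main.with_name(f"{main.stem}_{part}{main.suffix}")
        for part in ("overview", "diff")
    ]


for path in companion_reports("report.html"):
    print(path)
```

Calling companion_reports() with no argument applies the same rule to the default report name.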

Why does progress text appear on screen, but nothing shows up in the output I piped or redirected?

All progress and result messages go to stderr; stdout is reserved for machine-readable output (--help, --version, and future JSON modes). This means you can pipe stdout cleanly in scripts without progress noise mixed in.
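For scripts, this means the two streams can be captured separately. The Python sketch below shows one way to do that with subprocess. The helper name run_split is an assumption, and the example runs a stand-in child process rather than codedelta itself, so it works without the tool installed.

```python
import subprocess
import sys


def run_split(command):
    """Run a command and return (exit_code, stdout_text, stderr_text)."""
    done = subprocess.run(command, capture_output=True, text=True)
    return done.returncode, done.stdout, done.stderr


# Stand-in child process that mimics the split described above:
# one line on stdout, one progress line on stderr.
stand_in = [
    sys.executable,
    "-c",
    "import sys; print('machine-readable'); print('progress', file=sys.stderr)",
]

code, out, err = run_split(stand_in)
print("stdout:", out.strip())
print("stderr:", err.strip())
```

To use it with the real tool, pass a codedelta command list such as ["codedelta", "--version"] instead of stand_in.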

UNDERSTANDING THE METRICS

What do the metric names mean?

CodeDelta classifies every line into one of four states (the C/A/D/U model inherited from EPM):
- A (Added): new lines not in the old version
- D (Deleted): lines removed from the old version
- C (Changed): lines that were modified
- U (Unchanged): lines identical in both versions

When all change metrics are zero, the file's status is Unchanged.

Each is measured across three line types, giving the column suffixes you see:
- SLOC: Source Lines Of Code (physical lines, with comments and blank lines removed)
- LLOC: Logical Lines Of Code (statements; e.g. semicolon-delimited in C-like languages)
- COM: Comment lines

So ADD_SLOC = source lines added, DEL_LLOC = logical lines deleted, CHG_COM = comment lines changed, and so on.

What is CRN_LLOC / "churn"?

CRN is churn — a combined measure of logical-line change (added + deleted + changed logical lines). It's the single best "how much real change happened here" number, which is why it's the metric the CI gate (--threshold-churn) watches.
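As a worked example of the arithmetic, the sketch below sums churn from per-file metrics and applies a threshold. The column names ADD_LLOC and CHG_LLOC are inferred from the naming pattern used elsewhere in this FAQ, the function names are hypothetical, and the numbers are invented for illustration.

```python
def churn_lloc(metrics):
    """Churn for one file: added + deleted + changed logical lines."""
    return metrics["ADD_LLOC"] + metrics["DEL_LLOC"] + metrics["CHG_LLOC"]


def exceeds_threshold(per_file_metrics, limit):
    """True when total churn across all files is strictly greater than limit."""
    total = sum(churn_lloc(m) for m in per_file_metrics)
    return total > limit


# Invented numbers for two files, purely for illustration.
files = [
    {"ADD_LLOC": 40, "DEL_LLOC": 10, "CHG_LLOC": 5},    # churn 55
    {"ADD_LLOC": 0, "DEL_LLOC": 20, "CHG_LLOC": 25},    # churn 45
]

total_churn = sum(churn_lloc(m) for m in files)
print(total_churn)                     # 100
print(exceeds_threshold(files, 100))   # False: 100 does not exceed 100
print(exceeds_threshold(files, 99))    # True
```

The strict comparison mirrors the wording "exceeds N": a total exactly equal to the threshold does not trip the gate in this sketch.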

What's the difference between SLOC and LLOC, and why don't they match?

SLOC counts physical source lines (after removing comments and blank lines). LLOC counts logical statements. One physical line can hold several statements (a=1; b=2; is one SLOC but two LLOC), and one statement can span several physical lines (several SLOC but one LLOC). For languages without statement delimiters (e.g. HTML, plain text), LLOC is defined to equal SLOC.
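The two cases above can be reproduced with a deliberately naive counter for C-like code. This is a simplified sketch, not CodeDelta's parser: the function name is hypothetical, it only strips // comments, and it ignores block comments, string literals, and constructs such as for-loop headers.

```python
def count_sloc_lloc(source):
    """Return (sloc, lloc) for a C-like snippet using naive rules.

    SLOC: non-blank lines after removing // comments.
    LLOC: number of semicolons on those lines.
    """
    sloc = 0
    lloc = 0
    for line in source.splitlines():
        code = line.split("//", 1)[0].strip()   # drop any trailing // comment
        if not code:
            continue                            # blank or comment-only line
        sloc += 1
        lloc += code.count(";")
    return sloc, lloc


one_line = "a=1; b=2;"
two_lines = "total = first +\n        second;"

print(count_sloc_lloc(one_line))    # (1, 2)
print(count_sloc_lloc(two_lines))   # (2, 1)
```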

Why does an HTML file show LLOC equal to SLOC?

HTML has no logical-line concept (no statement delimiters like semicolons), so by definition LLOC = SLOC for HTML. This is intentional and matches EPM behaviour.

Why is a "Changed" file's line count based on the new version, not the old?

For changed files, CodeDelta reports the new file's line count (the EPM convention), since the current state is what you usually care about. Deleted files are counted from the old version, because no new version exists.

Why did a file I barely touched show as "Changed"?

Any non-comment, non-whitespace difference flags a file as Changed. Re-saving with different line endings, reformatting, or a one-character edit all count. Use the diff viewer (report_diff.html) to see exactly what differs.

Can I report only some metrics?

Yes — --metric-set takes a comma-separated list of metric codes. With no --metric-set, all metrics are reported.

LANGUAGES

Which languages does CodeDelta understand?

CodeDelta recognises a wide range by file extension, including: C/C++ (.c .h .cpp .hpp .cc .cxx), C# (.cs), Java (.java), JavaScript/TypeScript (.js .jsx .ts .tsx .mts .cts), Python (.py .pyw), PHP (.php), HTML (.html .htm), CSS, XML, SQL/PL-SQL (.sql .pls .pks .pkb), Perl (.pl .pm), Visual Basic (.vb .bas .cls .frm .vbs), Ada (.ada .adb .ads), VHDL (.vhdl .vhd), assembler (.asm .s), Fortran (.f .f90 .f95 .for), Ruby (.rb), shell (.sh), batch (.bat), ASP/JSP (.asp .aspx .jsp), PowerBuilder (.srd .srf .srs .sru .srw), IDL, and plain text/readme files.

A file type I use isn't being counted. Can I add it?

Yes — --ext lets you map extra extensions to a language for that run. For example, to treat .inc files as C:

codedelta old new --ext "inc=c" -o report.html

(Check --help for the exact syntax in your build.)

Why are comments counted separately?

Comment changes (COM metrics) are tracked apart from code (SLOC/LLOC) so that documentation churn doesn't inflate your code-change figures — and so you can see documentation effort on its own.

THE DATABASE & TRENDING

How does the database work?

Pass -d <file> (or --db) to record each run into a SQLite database. It appends: every run adds a new snapshot and never overwrites previous data. Over many runs this builds a longitudinal history you can chart (churn over time, growth, etc.). The default database is codedelta.db.

How do I access the database directly?

It's a standard SQLite file. You can open it with any SQLite tool:

sqlite3 codedelta.db

…then run SQL queries, or use a GUI like DB Browser for SQLite. Because it's plain SQLite, you can also read it from Python, Excel (via ODBC), or any language with a SQLite driver.
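As a starting point for reading the file from Python, the sketch below lists the tables it contains using the standard sqlite3 module. It makes no assumptions about CodeDelta's schema. The helper name list_tables is hypothetical, and the database is opened read-only so that exploring it cannot alter your history.

```python
import sqlite3
from contextlib import closing
from pathlib import Path


def list_tables(db_path="codedelta.db"):
    """Return the table names in a SQLite file, sorted alphabetically.

    Opens the file read-only (mode=ro), so a missing file raises
    sqlite3.OperationalError instead of creating an empty database.
    """
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    with closing(sqlite3.connect(uri, uri=True)) as conn:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    return [name for (name,) in rows]
```

Call list_tables("codedelta.db") to see what is there, then inspect any table's columns with PRAGMA table_info(<table>) before writing queries against it.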

Will running CodeDelta again wipe my history?

No. The database is append-only by design — each run adds a snapshot. The history is the point; it's what powers trend analysis. (If you ever want a fresh history, point -d at a new filename or delete the old DB file deliberately.)

Should I commit the database to version control?

Generally no — it can grow large and changes every run. Keep it on disk (it's needed for trends), but exclude it from git. Back it up separately if the history matters to you.

How do I label runs so I can tell them apart later?

Use the metadata flags, which are recorded in the DB and the overview report:
- --project <name>: the project name (defaults to the new directory's name)
- --old-label <str> / --new-label <str>: version labels (e.g. v1.0, a git hash, a date)
- --note <str>: free text (e.g. "nightly cron run")

COMMAND LINE & AUTOMATION

What are the main command-line options?

codedelta <old_dir> <new_dir> [options]

  -o, --output <file>      Main HTML report (default codedelta_report.html)
      --csv <file>         Also write per-file CSV
      --xml <file>         Also write XML (EPM-compatible)
  -d, --db <file>          SQLite DB for trending (appends)
      --project <name>     Project name
      --old-label <str>    Label for old version
      --new-label <str>    Label for new version
      --note <str>         Free-text note
      --exclude <dirs>     Directories to skip
      --ext <map>          Add file-extension → language mappings
      --metric-set <list>  Report only these metrics
      --snapshot-date <d>  Override the recorded snapshot date
      --threshold-churn N  Exit code 3 if total churn exceeds N (CI gating)
  -v, --verbose            Print every file processed
  -q, --quiet              Errors only (for cron)
  -h, --help               Show help
  -V, --version            Show version

How do I run CodeDelta from cron (scheduled nightly runs)?

Use --quiet (errors only) and point the DB and report at fixed paths. Example:

codedelta /path/yesterday /path/today \
  --project Parky \
  --old-label "$(date -d yesterday +%Y-%m-%d)" \
  --new-label "$(date +%Y-%m-%d)" \
  --note 'nightly cron' \
  -d /var/log/codedelta/parky.db \
  -o /var/log/codedelta/latest.html \
  --quiet

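Save the command as a shell script and call that script from your crontab (crontab -e) with a schedule. Pasting it directly into the crontab does not work as written: cron does not support backslash line continuations, and it treats an unescaped % in a command as a newline, which breaks the date formats. Because --quiet prints only errors, a clean run produces no output, which is ideal for cron.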

Can I make a build fail when there's too much change?

Yes — that's what --threshold-churn N is for. If total churn (CRN_LLOC) exceeds N, CodeDelta exits with code 3. Wire that into CI:

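# Keep CodeDelta's exit code so the CI step still fails; only exit 3 means the churn gate tripped.
codedelta old new --threshold-churn 5000 -o report.html || { rc=$?; [ "$rc" -eq 3 ] && echo "Too much churn!" >&2; exit "$rc"; }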

Your CI system can treat exit 3 as a failed gate.

What do the exit codes mean?

0 — success
1 — generic failure (I/O error, missing directory, report write failed)
2 — invalid arguments
3 — --threshold-churn exceeded
4 — no valid licence (release builds)
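If your automation is written in Python, the exit codes listed above can be turned into readable messages with a small wrapper. This is a sketch under assumptions: the names describe_exit and run_gate are hypothetical, and run_gate expects the codedelta binary to be on your PATH.

```python
import subprocess
import sys

# Meanings copied from the exit-code list above.
EXIT_MEANINGS = {
    0: "success",
    1: "generic failure (I/O error, missing directory, report write failed)",
    2: "invalid arguments",
    3: "--threshold-churn exceeded",
    4: "no valid licence (release builds)",
}


def describe_exit(code):
    """Translate a CodeDelta exit code into a short message."""
    return EXIT_MEANINGS.get(code, f"unknown exit code {code}")


def run_gate(old_dir, new_dir, max_churn, report="report.html"):
    """Run codedelta with a churn threshold and return its exit code."""
    command = [
        "codedelta", old_dir, new_dir,
        "--threshold-churn", str(max_churn),
        "-o", report,
    ]
    completed = subprocess.run(command)
    print(describe_exit(completed.returncode), file=sys.stderr)
    return completed.returncode
```

A CI step could end with sys.exit(run_gate("old", "new", 5000)), which passes CodeDelta's own exit code through to the CI system unchanged.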

How do I exclude folders (e.g. node_modules, build output)?

Use --exclude with the directories to skip:

codedelta old new --exclude "node_modules,build,dist" -o report.html

AI ANALYSIS

What is the AI analysis / AI Code Scan?

CodeDelta includes AI-assisted analysis that examines the source and produces its own report (the AI Code Scan), separate from the change metrics. It can flag characteristics of the code using trained models. The detection settings (including per-language configuration) are available in the GUI's AI Detection settings.

Does my code get sent anywhere for the AI analysis?

See the Privacy section below. CodeDelta is designed to run locally.

Can the AI detection be gamed?

This is a fair question of any AI-detection model, and CodeDelta addresses it directly with what we call header-blind models. The detector is deliberately built not to rely on signals a person could trivially fake.

The weakness we found and closed: when the Java model was first trained, its single strongest signal — roughly 28% of the model's entire decision — was whether a file carried an author/copyright header comment. We measured the effect, and a rich header pushes a file's AI-probability down (in one test, from 70.9% toward 63.8%) — a tidy header makes code look more “human” to the model. That is gameable: a team that mandates header comments as a coding standard would unintentionally make its AI-generated code look human-written, defeating detection through a formatting rule that has nothing to do with who actually wrote the code.

Rather than patching over the signal at scoring time, we retrain the model from scratch with the header feature removed entirely. Forced to ignore headers, it learns to detect AI from structural characteristics instead: code patterns, naming, spacing consistency, token entropy, and annotation density. These properties reflect how the code is genuinely written and are far harder to fake.

We validate each header-blind model with a header sweep: take real files, force the header signal across its full range, and re-score. For the blind model the AI-probability does not move at all when the header changes — the lever is no longer connected — while it still cleanly separates AI code from human code on a large independent test corpus, with a low false-positive rate on human code. The trade-off is a small reduction in raw accuracy in exchange for detection that holds up when someone is actively trying to hide AI authorship. The approach is live for C++ and Java, where the header signal was most dominant, and is being extended to other languages where that signal is significant.
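The idea of a header sweep can be illustrated with a short Python sketch. Everything in it is hypothetical: the two scoring functions are toy stand-ins with invented weights, not CodeDelta's models, and the feature names header_richness and token_entropy are chosen only for this example.

```python
def header_sweep(score, features, header_key, values):
    """Re-score one file with the header feature forced to each value.

    Returns the spread (max - min) of the resulting AI-probabilities.
    A spread of zero means the header has no influence on the score.
    """
    scores = [score({**features, header_key: value}) for value in values]
    return max(scores) - min(scores)


# Toy stand-ins with invented weights, NOT real models.
def header_aware_score(features):
    return 0.6 - 0.1 * features["header_richness"]   # richer header, lower score


def header_blind_score(features):
    return 0.2 + 0.5 * features["token_entropy"]     # never reads the header


sample = {"header_richness": 0.0, "token_entropy": 0.8}
sweep_values = [0.0, 0.5, 1.0]

aware_spread = header_sweep(header_aware_score, sample, "header_richness", sweep_values)
blind_spread = header_sweep(header_blind_score, sample, "header_richness", sweep_values)
print(aware_spread > 0)      # True: the header moves the score
print(blind_spread == 0.0)   # True: the lever is not connected
```

A real sweep would run over many files and a trained model; the sketch only shows the shape of the check, namely that forcing the header feature across its range leaves a blind model's score unchanged.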

PRIVACY & LICENSING

Does my source code leave my machine?

No. CodeDelta runs locally and analyses your code on your own machine. It does not upload your source anywhere.

How does licensing work?

CodeDelta verifies a signed licence file (codedelta.lic) offline — no licence server, no internet needed. See the separate Licensing Guide for installing and activating your licence, and for troubleshooting licence messages.

The tool says it can't find a licence. What do I do?

See the Licensing Guide's troubleshooting section — in short, place codedelta.lic in ~/.codedelta/. (Development builds don't require a licence; release/customer builds do.)

TROUBLESHOOTING (USAGE)

"Error: old/new directory not found"

The path you gave doesn't exist or isn't a directory. Check the path; use absolute paths if unsure.

"Error: no recognised source files found. Check extensions."

None of the files matched a known language. Either the folders are empty/wrong, or your files use extensions CodeDelta doesn't map by default — use --ext to add them (see Languages above).

"--verbose and --quiet are mutually exclusive"

You passed both. Pick one.

The GUI won't open at localhost:7654

Make sure the server is running (python3 codedelta_server.py) and that nothing else is using the port. If a previous server is stuck, stop it first:

pkill -f codedelta_server

then start it again.

Numbers look off / a metric seems wrong

Open the diff viewer (report_diff.html) to see line-by-line what CodeDelta detected. If it still looks wrong, note the file and the metric and report it — the diff view usually explains the discrepancy (e.g. reformatting counted as change, comment vs code classification).