Training custom models on your own data

Most customers should not train their own model. Our shared bundles already cover parked-vehicle, damage, and forest-bay policies, and you benefit from the labelled data every other partner contributes back. Start there.

But sometimes you need a customer-specific detector: a private asset class, a competitor lockup, a one-off forensic check. The same pipeline that trains our shared bundles can train yours, and the resulting bundle is dispatched and monitored for drift just like any other.

Pipeline at a glance

text

  raw photos
       │
       ▼
  verify_ai_ml_datasets  ──┐
       │                   │
       ▼                   │
  verify_ai_ml_samples     │   (COCO-style JSONB annotations,
       │                   │    split = train / validation / test)
       ▼                   │
  ml/ Python training    ◄──┘
   (RT-DETRv2-S detector,
    multi-format export)
       │
       ▼
  verify_ai_ml_models  ────► verify_ai_model_bundles
                                       │
                                       ▼
                                ModelManager (SDK)
                                  auto-download

Everything except the SDK download happens server-side.

1. Build the dataset

A dataset is a row in verify_ai_ml_datasets plus N rows in verify_ai_ml_samples. Each sample stores its image storage path, the compliance ground truth, and a COCO-style annotation blob:

sql

insert into verify_ai_ml_datasets (policy_id, name, status)
values ('pol_acme_parking', 'Acme parking v1', 'collecting')
returning id;
-- → '7b1c…' (UUID)
 
insert into verify_ai_ml_samples
  (dataset_id, image_path, is_compliant, annotations, split, review_status)
values
  ('7b1c…', 'datasets/acme/0001.jpg',
   true,
   '{"objects":[{"class":"acme_scooter","bbox":[12,8,640,920]}]}'::jsonb,
   'train', 'approved');
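
Before inserting samples in bulk, it can help to sanity-check each annotation blob. The sketch below is a minimal validator, assuming the shape shown in the insert above: an objects array whose entries carry a class string and a four-number bbox. The helper name validate_annotations is hypothetical, and the check deliberately does not assume a bbox convention (x/y/width/height versus corner coordinates), because this page does not state one.

```python
import json


def _is_number(value) -> bool:
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def validate_annotations(blob: str) -> list[str]:
    """Return a list of problems found in a COCO-style annotation blob.

    Hypothetical helper: the expected shape is inferred from the example
    row above, not from a published schema.
    """
    try:
        data = json.loads(blob)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg}"]

    objects = data.get("objects") if isinstance(data, dict) else None
    if not isinstance(objects, list):
        return ["top-level 'objects' array is missing"]

    problems: list[str] = []
    for i, obj in enumerate(objects):
        if not isinstance(obj, dict):
            problems.append(f"objects[{i}]: not an object")
            continue
        label = obj.get("class")
        if not isinstance(label, str) or not label:
            problems.append(f"objects[{i}]: 'class' must be a non-empty string")
        bbox = obj.get("bbox")
        if not (isinstance(bbox, list) and len(bbox) == 4
                and all(_is_number(v) for v in bbox)):
            problems.append(f"objects[{i}]: 'bbox' must be a list of four numbers")
        elif any(v < 0 for v in bbox):
            problems.append(f"objects[{i}]: 'bbox' has a negative value")
    return problems
```

An empty list means the blob passed these structural checks; it says nothing about whether the labels themselves are correct.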

verify_ai_ml_datasets.status accepts 'collecting' | 'ready' | 'training' | 'archived'. Flip from collecting to ready once labelling is done. Split values match what the training loop expects:

| Split      | Used for                                          |
| ---------- | ------------------------------------------------- |
| train      | Gradient updates.                                 |
| validation | Early-stopping, hyperparameter tuning.            |
| test       | Held-out final metrics reported on the model row. |

Synthetic / augmented samples don't get their own split. They live in the same dataset with is_augmented = true and augmentation_type populated; the training loop filters them out of the test slice. The generator under ml/data/ (a diffusion-based augmenter) is in preview and is not wired up in CI yet.
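
The augmented-sample rule can be illustrated with a small sketch. It assumes samples have already been loaded as dictionaries keyed by the column names used above (split, is_augmented); the function name group_by_split is hypothetical and this is not the actual training-loop code.

```python
VALID_SPLITS = ("train", "validation", "test")


def group_by_split(samples: list[dict]) -> dict[str, list[dict]]:
    """Group sample rows by split, keeping augmented rows out of 'test'."""
    grouped: dict[str, list[dict]] = {name: [] for name in VALID_SPLITS}
    for sample in samples:
        split = sample["split"]
        if split not in grouped:
            raise ValueError(f"unknown split: {split!r}")
        # Held-out metrics should only be computed on real photos.
        if split == "test" and sample.get("is_augmented", False):
            continue
        grouped[split].append(sample)
    return grouped
```

Augmented rows still contribute to train and validation here; only the test slice is restricted.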

2. Train

Training runs in ml/ (Python). RT-DETRv2-S is the default detector; the same Makefile downloads images, converts to COCO, trains, exports to CoreML / TFLite / ONNX, and runs the gold-eval comparison:

bash

cd ml
make all DATASET_ID=7b1c… POLICY_ID=pol_acme_parking
# Or run the stages individually:
#   make download convert train export validate eval

The training step (make train) shells out to python training/train_rtdetr.py with training/config.yaml. Each async job writes a row to verify_ai_ml_jobs (status, progress, logs, metrics) and, on success, creates a verify_ai_ml_models row with:

base_architecture (rt-detrv2-s)
dataset_id → the training dataset.
coreml_path / tflite_path / onnx_path plus matching sha256_coreml / sha256_tflite / sha256_onnx.
ontology_version / schema_version — the contract this model was trained against.
Metrics: accuracy, precision_score, recall, f1_score, map50, map50_95, gemini_agreement, inference_time_ms.
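
For reference, precision_score, recall, and f1_score relate to each other through the standard definitions sketched below. This is a generic illustration from true-positive, false-positive, and false-negative counts; how the pipeline matches detections to ground truth (for example, which IoU threshold it uses) is not covered on this page, and the function name is hypothetical.

```python
def detection_scores(tp: int, fp: int, fn: int) -> dict[str, float]:
    """Standard precision / recall / F1 from detection counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"precision_score": precision, "recall": recall, "f1_score": f1}
```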

3. Bundle

A model on its own can't be shipped — devices need the policy AST and the ontology version too. That packaging is what verify_ai_model_bundles is for. In practice make deploy runs ml/deploy/register_bundle.py, which uploads each exported artifact to the verify-ai-models storage bucket, computes per-artifact SHA-256s, loads the policy AST from ml/config/policy_asts/<policy_id>.json, and inserts a row equivalent to:

sql

insert into verify_ai_model_bundles
  (policy_id, bundle_version, ontology_version, schema_version,
   policy_ast, artifacts, is_active,
   training_dataset_id, git_sha, gold_eval_results)
values (
  'pol_acme_parking',
  (select coalesce(max(bundle_version), 0) + 1
     from verify_ai_model_bundles
     where policy_id = 'pol_acme_parking'),
  'v3',
  '1.0.0',
  '{}'::jsonb,  -- placeholder: policy AST loaded from ml/config/policy_asts/pol_acme_parking.json
  jsonb_build_array(
    jsonb_build_object(
      'role', 'detector',
      'architecture', 'rt-detrv2-s',
      'version', 1,
      'format', 'coreml',
      'storage_path', 'pol_acme_parking/v1/model.mlpackage.zip',
      'size_bytes',  18204113,
      'sha256',      'a1b2…'
    ),
    jsonb_build_object(
      'role', 'detector',
      'architecture', 'rt-detrv2-s',
      'version', 1,
      'format', 'tflite',
      'storage_path', 'pol_acme_parking/v1/model.tflite',
      'size_bytes',  16801004,
      'sha256',      'c3d4…'
    )
  ),
  false,        -- not active yet, see promotion gate below
  '7b1c…'::uuid,
  '<training repo commit>',
  '{}'::jsonb   -- placeholder: mAP / per-class precision JSON
);

policy_ast lives on the bundle row itself — it is not read from verify_ai_policies (which only carries the VLM-side config: provider, model, system prompt, etc.). is_active = false keeps the bundle out of /api/v1/models/latest results until we explicitly promote it.
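
To make the artifacts array concrete, here is a minimal sketch of how one entry could be assembled from a local export. It is not the contents of ml/deploy/register_bundle.py; build_artifact_entry is a hypothetical helper, and the keys simply mirror the jsonb objects in the insert above.

```python
import hashlib
from pathlib import Path


def build_artifact_entry(local_file: Path, fmt: str, storage_path: str,
                         architecture: str = "rt-detrv2-s",
                         version: int = 1) -> dict:
    """Describe one exported model file as a bundle artifact entry."""
    digest = hashlib.sha256()
    with local_file.open("rb") as fh:
        # Hash in 1 MiB chunks so large exports are not read into memory at once.
        for chunk in iter(lambda: fh.read(1024 * 1024), b""):
            digest.update(chunk)
    return {
        "role": "detector",
        "architecture": architecture,
        "version": version,
        "format": fmt,
        "storage_path": storage_path,
        "size_bytes": local_file.stat().st_size,
        "sha256": digest.hexdigest(),
    }
```

Hashing the file that is actually uploaded, rather than a rebuilt copy, keeps the recorded sha256 meaningful as an integrity check.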

4. Promotion gate

Before flipping is_active = true, the convention is to capture a reference snapshot for the bundle in verify_ai_model_reference_stats. Drift detection compares live stats against that snapshot later. See Detecting model drift for the snapshot shape.

Promotion script (paraphrased):

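// Capture the baseline first; this inserts a row with snapshot_type='promotion'.
await captureReferenceSnapshot(bundleId);

// Only then make the bundle visible to devices.
const { error } = await supabase
  .from("verify_ai_model_bundles")
  .update({ is_active: true })
  .eq("id", bundleId);
if (error) throw error;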

The gate is a convention, not a DB constraint. If you skip the snapshot, the bundle still goes live, but the hourly drift cron records live stats with drift_score = null for it, so you get no drift signal until a reference is captured.
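
The effect of a missing reference can be sketched with the usual Population Stability Index (PSI) formula. This is an illustration only: it assumes the drift score is a PSI over binned distributions, and the binning, the eps smoothing, and the function name are assumptions rather than the cron's actual implementation.

```python
import math
from typing import Optional, Sequence


def drift_score(reference: Optional[Sequence[float]],
                live: Sequence[float],
                eps: float = 1e-6) -> Optional[float]:
    """PSI between two binned distributions, or None without a reference."""
    if reference is None:
        # No promotion snapshot was captured, so there is nothing to compare to.
        return None
    if len(reference) != len(live):
        raise ValueError("reference and live must have the same number of bins")
    score = 0.0
    for ref, cur in zip(reference, live):
        ref, cur = max(ref, eps), max(cur, eps)  # avoid log(0) on empty bins
        score += (cur - ref) * math.log(cur / ref)
    return score
```

Identical distributions score 0.0, and the score grows as the live distribution moves away from the reference.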

5. Dispatch

Once the bundle is active, devices pick it up on their next ModelManager.checkForUpdates() call. To roll it out gradually, combine with dispatch targeting:

sql

update verify_ai_model_bundles
   set bundle_tier = 'canary',
       weight      = 10,
       targeting   = '{"customer_ids":["cus_acme"]}'::jsonb
 where id = '...';

That limits the new bundle to 10% of Acme devices. Leave it there for a few days while the drift PSI stabilises, then issue a follow-up update that promotes it to production at weight = 100.
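
One common way to implement a weighted rollout like this is deterministic bucketing: hash each device ID into one of 100 buckets and serve the canary to buckets below the weight. The sketch below illustrates that idea under stated assumptions; the dispatcher's real selection logic is not described on this page, and in_canary and its parameters are hypothetical.

```python
import hashlib


def in_canary(device_id: str, customer_id: str, weight: int,
              customer_ids: list[str]) -> bool:
    """Decide whether a device falls inside a weighted canary slice."""
    if customer_id not in customer_ids:
        return False  # targeting excludes this customer entirely
    digest = hashlib.sha256(device_id.encode("utf-8")).digest()
    # Map the hash to a stable bucket in [0, 100).
    bucket = int.from_bytes(digest[:8], "big") % 100
    return bucket < weight
```

Because the bucket depends only on the device ID, a device keeps the same answer across calls, and raising the weight only ever adds devices to the slice.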

Lineage and provenance

Every bundle carries enough metadata to reproduce or audit it (the lineage columns were added to verify_ai_model_bundles in 20260620_model_dispatch_targeting.sql):

verify_ai_model_bundles.git_sha — the training repo commit.
verify_ai_model_bundles.training_dataset_id — FK into verify_ai_ml_datasets, so the exact sample rows are reachable.
verify_ai_model_bundles.gold_eval_results — held-out metrics jsonb.
verify_ai_model_bundles.artifacts[].sha256 — per-artifact integrity.
verify_ai_ml_models keeps the matching per-format metrics (map50, map50_95, inference_time_ms, etc.) and storage paths.

Auditors get a single chain from "this device decision" back to the dataset rows that trained the model. Don't break that chain by hand-editing rows; use the migrations.

What's next

Detecting model drift — what happens after promotion.
Model dispatch and targeting — how to canary your new bundle to a slice of users first.
