Troubleshooting
InferGuard is deliberately strict. Many "failures" are evidence-quality gates doing their job.
validate-completed --strict exits non-zero
Symptom
inferguard validate-completed: status=synthetic_only ...
# exit code 1 with --strict
What it means
Strict mode returns success only for live_complete. Synthetic bundles, incomplete live runs, missing contracts, and not-publishable runs return non-zero.
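The rule above can be sketched as a mapping from validator status to exit code. This is an illustration only, not InferGuard's implementation: the function name strict_exit_code is hypothetical, and the non-strict branch assumes that runs without --strict exit zero regardless of status.

```python
# Hypothetical sketch of the strict-mode rule described above.
# Under --strict, only `live_complete` counts as success.

def strict_exit_code(status: str, strict: bool) -> int:
    """Return a process exit code: 0 for success, 1 for failure."""
    if not strict:
        # Assumption: without --strict the command reports the status
        # but does not fail the shell step.
        return 0
    return 0 if status == "live_complete" else 1
```

Under this reading, synthetic_only and live_incomplete both map to exit code 1 in strict mode, which matches the symptom shown above.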
Fix
- For smoke tests, run without --strict or append || true in documentation examples.
- For publication, add the missing request, launch, engine metrics, GPU metrics, and contract artifacts listed in validation_report.md.
Synthetic run is not publishable live evidence
simulate-gpu intentionally stamps artifacts with synthetic_gpu_mimic. That is useful for testing bundle rendering, but the validator will classify it as synthetic_only unless real live evidence is present and the synthetic markers are removed from the publication path.
No successful request-profile rows
Symptom
reason=no_successful_request_profile_rows
status=live_incomplete
claim_status=not_proven
Fix
- Check that --endpoint points to /v1/chat/completions.
- Check the model name sent with --model.
- Inspect request_profile/requests_profile.jsonl for HTTP status and error_type.
- Increase --timeout-seconds for long prefill workloads.
- Start with examples/02-profile-real-endpoint.md and the local serve-mimic flow before testing a real GPU endpoint.
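When inspecting the request-profile JSONL by hand is tedious, a short script can group rows by outcome. The sketch below is a minimal example, not part of InferGuard. The error_type field is named on this page, but http_status is an assumed field name and summarize_rows is a hypothetical helper, so adjust both to the keys actually present in your file.

```python
import json
from collections import Counter
from pathlib import Path


def summarize_rows(jsonl_path):
    """Count rows by (HTTP status, error_type) in a JSONL file."""
    counts = Counter()
    with Path(jsonl_path).open(encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # tolerate blank lines between rows
            row = json.loads(line)
            # `http_status` is an ASSUMED key; `error_type` is named in the docs.
            key = (row.get("http_status"), row.get("error_type"))
            counts[key] += 1
    return counts
```

Calling summarize_rows("request_profile/requests_profile.jsonl") and printing counts.most_common() shows at a glance whether every row failed the same way, for example with a wrong path or an unknown model name.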
Healthcheck failed or timed out
Symptom
claim_id=launch_healthcheck
reason=launch_healthcheck_not_successful
Fix
- If the engine is already running, use launch-engine --external-launch --endpoint-url ....
- Increase --healthcheck-timeout-seconds for large model load or CUDA graph capture.
- Confirm the endpoint accepts the model id in launch/command.json.
- Inspect launch/stdout.log and launch/stderr.log.
Engine metrics timeline is empty
Symptom
reason=no_live_engine_metric_sample
Fix
- Confirm the serving engine exposes Prometheus metrics.
- For vLLM/SGLang launches, pass the engine flags needed to enable metrics.
- Use the exact metrics URL with collect-metrics --engine-metrics-url, usually http://host:port/metrics.
- Keep collection running long enough to overlap real requests.
DCGM GPU metrics are missing
Symptom
reason=missing_required_dcgm_metrics
Fix
- Start DCGM exporter on the GPU host.
- Confirm DCGM_FI_DEV_GPU_UTIL and DCGM_FI_DEV_FB_USED are present in the scrape.
- Pass the exporter URL to collect-metrics --dcgm-metrics-url.
- On Slurm, make sure the exporter is reachable from the job network namespace.
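To confirm the two required metrics before starting a run, you can check a saved scrape yourself. The sketch below assumes the exporter returns standard Prometheus text exposition format. The helper name missing_dcgm_metrics is hypothetical, and the check only looks for sample lines, not for plausible values.

```python
REQUIRED_DCGM_METRICS = ("DCGM_FI_DEV_GPU_UTIL", "DCGM_FI_DEV_FB_USED")


def missing_dcgm_metrics(scrape_text, required=REQUIRED_DCGM_METRICS):
    """Return the required metric names that have no sample line in the scrape."""
    seen = set()
    for line in scrape_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        # A sample line looks like `name{labels} value` or `name value`.
        parts = line.split("{", 1)[0].split()
        if parts:
            seen.add(parts[0])
    return [name for name in required if name not in seen]
```

Feed it the body fetched from the exporter URL (for example with urllib.request.urlopen). An empty list means both metrics have at least one sample. Running the fetch from inside the Slurm job also verifies reachability from the job's network namespace.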
Slurm timeout or preemption interrupted the run
v0.7.1 registers JSONL streams, partial-result producers, and launched engine processes with shared signal cleanup. If Slurm sends SIGTERM, InferGuard should flush partial rows and write partial_results.json where supported.
Fix
- Look for partial_results.json in the command output directory.
- Increase Slurm wall time for first model-load runs.
- Split launch, request profile, and metrics collection into smaller scheduler steps if the allocation is tight.
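The flush-on-SIGTERM behavior described in this section follows a common pattern, sketched below for readers who wrap InferGuard in their own scripts. This is not InferGuard's code: the class PartialResultWriter, the function install_sigterm_flush, and the JSON layout are all hypothetical, and the real partial_results.json schema may differ.

```python
import json
import signal
import sys
from pathlib import Path


class PartialResultWriter:
    """Collects rows in memory and writes them to partial_results.json."""

    def __init__(self, output_dir):
        self.path = Path(output_dir) / "partial_results.json"
        self.rows = []

    def add(self, row):
        self.rows.append(row)

    def flush(self, reason):
        # Hypothetical layout; the real file may use different keys.
        payload = {"reason": reason, "row_count": len(self.rows), "rows": self.rows}
        self.path.write_text(json.dumps(payload, indent=2), encoding="utf-8")


def install_sigterm_flush(writer):
    """Flush partial rows when the scheduler sends SIGTERM, then exit."""

    def handle(signum, frame):
        writer.flush(reason="sigterm")
        sys.exit(128 + signum)  # conventional exit code for termination by signal

    # Must be called from the main thread of the main interpreter.
    signal.signal(signal.SIGTERM, handle)
```

Slurm typically sends SIGTERM some seconds before SIGKILL, so the handler has to finish quickly. Writing one small JSON file fits that window, while re-running requests does not.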
OOM during model launch
Common causes
- Model weights do not fit in single-node HBM.
- --max-model-len or concurrency creates too much KV pressure.
- GPU memory utilization is too aggressive for the engine.
Fix
- Check hardware coverage before attempting DSv4-Pro on H100 single-node.
- Lower --max-model-len, concurrency, or --gpu-memory-utilization for vLLM.
- Use H200/B200/B300 templates for DSv4-Pro single-node, or wait for validated GB200/GB300 external lanes.
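To see why lowering --max-model-len or concurrency relieves KV pressure, a back-of-the-envelope estimate helps. The sketch below uses the textbook formula for standard multi-head or grouped-query attention. The function name and the model shape in the example are hypothetical, and engines with paged allocation or compressed KV layouts will use less than this worst case.

```python
def kv_cache_bytes(num_layers, num_kv_heads, head_dim, max_model_len,
                   concurrency, bytes_per_element=2):
    """Worst-case KV cache size for standard (non-compressed) attention.

    Each token stores one key vector and one value vector per layer,
    hence the leading factor of 2.
    """
    per_token = 2 * num_layers * num_kv_heads * head_dim * bytes_per_element
    return per_token * max_model_len * concurrency


# Hypothetical model shape: 32 layers, 8 KV heads, head_dim 128, 16-bit cache.
full = kv_cache_bytes(32, 8, 128, max_model_len=32768, concurrency=16)
half = kv_cache_bytes(32, 8, 128, max_model_len=16384, concurrency=16)
print(full / 2**30, half / 2**30)  # 64.0 32.0 (GiB)
```

The estimate is linear in both sequence length and concurrency, so halving either one halves the worst-case cache, on top of whatever the model weights already occupy.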
request-profile token counts are estimated
If the endpoint does not return OpenAI usage fields, InferGuard estimates prompt and completion tokens. The row remains useful for latency, status, and failure analysis, but token-count fields will be inferred instead of measured.
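The measured-versus-inferred distinction can be sketched as follows. The usage keys prompt_tokens and completion_tokens are the standard OpenAI response fields. Everything else is an assumption: the function name, the source label, and especially the fallback of roughly four characters per token, which is a generic rule of thumb and not InferGuard's estimator.

```python
def token_counts(response_body, prompt_text, completion_text):
    """Prefer measured usage fields; otherwise fall back to a rough estimate."""
    usage = response_body.get("usage") or {}
    prompt = usage.get("prompt_tokens")
    completion = usage.get("completion_tokens")
    if prompt is not None and completion is not None:
        return {"prompt_tokens": prompt, "completion_tokens": completion,
                "source": "measured"}

    def estimate(text):
        # ASSUMPTION: ~4 characters per token; real tokenizers vary widely.
        return max(1, len(text) // 4) if text else 0

    return {"prompt_tokens": estimate(prompt_text),
            "completion_tokens": estimate(completion_text),
            "source": "estimated"}
```

Keeping a source label next to the counts lets downstream analysis exclude estimated rows from throughput claims while still using them for latency and failure analysis.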
Endpoint URL rejected
InferGuard rejects endpoint URLs with userinfo, query strings, or fragments so secrets do not end up in artifacts.
Use:
http://host:8000/v1/chat/completions
Do not use:
http://token@host:8000/v1/chat/completions?api_key=...
Pass credentials with --api-key when supported.
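To pre-check a URL before handing it to InferGuard, the three rejected components can be detected with Python's standard urllib.parse. This is a sketch of the rule as stated above, not InferGuard's validator. The helper name endpoint_url_problems is hypothetical, and the real check may be stricter.

```python
from urllib.parse import urlsplit


def endpoint_url_problems(url):
    """List which rejected components (userinfo, query, fragment) a URL contains."""
    parts = urlsplit(url)
    problems = []
    if parts.username is not None or parts.password is not None:
        problems.append("userinfo")
    if parts.query:
        problems.append("query")
    if parts.fragment:
        problems.append("fragment")
    return problems
```

For the two URLs above, the accepted form yields an empty list, and the rejected form yields both userinfo and query.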
diagnose-bottleneck says not enough evidence
That is expected when the job lacks request rows, engine metrics, GPU metrics, or validation context. Run the earlier pipeline stages first:
inferguard request-profile ...
inferguard collect-metrics ...
inferguard validate-completed --results-root ...
inferguard diagnose-bottleneck --job-dir ...