Observability Configuration
SkillMeat exposes three observability pillars: OpenTelemetry (OTel) distributed tracing, structured JSON logging with tenant context propagation, and a Prometheus-compatible metrics endpoint.
All configuration is done through environment variables. OTel tracing is disabled by default to keep the out-of-the-box experience zero-friction; structured JSON logging and the /metrics endpoint are active without further configuration.
OpenTelemetry Tracing
Environment Variables
All OTel settings use the SKILLMEAT_OTEL_ prefix.
| Variable | Default | Description |
|---|---|---|
| SKILLMEAT_OTEL_ENABLED | false | Master switch. Set to true to start the OTel SDK and export spans. |
| SKILLMEAT_OTEL_EXPORTER | otlp_http | Exporter backend. Choices: otlp_http, otlp_grpc, stdout. |
| SKILLMEAT_OTEL_ENDPOINT | http://localhost:4318 | Collector endpoint URL. Applies when exporter is otlp_http or otlp_grpc. |
| SKILLMEAT_OTEL_SAMPLE_RATE | 1.0 | Sampling ratio (0.0–1.0). 1.0 samples every trace; 0.0 samples none. |
| SKILLMEAT_OTEL_SERVICE_NAME | skillmeat | Service name reported in OTel resource attributes. |
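The variables and defaults in the table can be summarised as a small settings loader. This is a minimal sketch, not SkillMeat's actual code: the OTelSettings class and load_otel_settings function are hypothetical names, and the assumption that only the literal string true (case-insensitive) enables tracing is mine.

```python
import os
from dataclasses import dataclass

_VALID_EXPORTERS = {"otlp_http", "otlp_grpc", "stdout"}


@dataclass(frozen=True)
class OTelSettings:
    """Hypothetical container mirroring the documented SKILLMEAT_OTEL_* variables."""
    enabled: bool = False
    exporter: str = "otlp_http"
    endpoint: str = "http://localhost:4318"
    sample_rate: float = 1.0
    service_name: str = "skillmeat"


def load_otel_settings(env=None):
    """Read SKILLMEAT_OTEL_* values from a mapping, applying the documented defaults."""
    env = os.environ if env is None else env
    prefix = "SKILLMEAT_OTEL_"
    # Assumption: only "true" (any letter case) switches tracing on.
    enabled = env.get(prefix + "ENABLED", "false").strip().lower() == "true"
    exporter = env.get(prefix + "EXPORTER", "otlp_http")
    if exporter not in _VALID_EXPORTERS:
        raise ValueError(f"unknown exporter: {exporter!r}")
    sample_rate = float(env.get(prefix + "SAMPLE_RATE", "1.0"))
    if not 0.0 <= sample_rate <= 1.0:
        raise ValueError("sample rate must be between 0.0 and 1.0")
    return OTelSettings(
        enabled=enabled,
        exporter=exporter,
        endpoint=env.get(prefix + "ENDPOINT", "http://localhost:4318"),
        sample_rate=sample_rate,
        service_name=env.get(prefix + "SERVICE_NAME", "skillmeat"),
    )
```

Calling load_otel_settings({}) yields the defaults from the table; passing a dict instead of relying on os.environ keeps the sketch easy to try out in isolation.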
Exporter Options
otlp_http
Sends spans over HTTP/1.1 to an OTLP-compatible collector. Default port 4318.
SKILLMEAT_OTEL_ENABLED=true
SKILLMEAT_OTEL_EXPORTER=otlp_http
SKILLMEAT_OTEL_ENDPOINT=http://otel-collector:4318
Compatible with: OpenTelemetry Collector, Jaeger (OTLP receiver), Grafana Tempo, Honeycomb, Datadog OTLP ingest.
otlp_grpc
Sends spans over gRPC. Default port 4317. Falls back to otlp_http if the grpc package is not installed.
SKILLMEAT_OTEL_ENABLED=true
SKILLMEAT_OTEL_EXPORTER=otlp_grpc
SKILLMEAT_OTEL_ENDPOINT=http://otel-collector:4317
Note
The endpoint should not include a path suffix for gRPC (e.g. use http://host:4317, not http://host:4317/v1/traces).
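A quick pre-flight check can catch the two mistakes described above: a path suffix on a gRPC endpoint, and a port that does not match the exporter's default. The helper below is a hypothetical sketch using only the standard library; SkillMeat is not known to ship such a function, and a non-default port is only a hint, since collectors can legitimately listen elsewhere.

```python
from urllib.parse import urlsplit

# Default collector ports as documented for each exporter.
_DEFAULT_PORTS = {"otlp_http": 4318, "otlp_grpc": 4317}


def check_otlp_endpoint(exporter, endpoint):
    """Return a list of human-readable hints for an exporter/endpoint pair."""
    hints = []
    parts = urlsplit(endpoint)
    # gRPC endpoints are host:port only, e.g. http://host:4317
    if exporter == "otlp_grpc" and parts.path not in ("", "/"):
        hints.append(f"gRPC endpoint should not carry a path: {parts.path!r}")
    default_port = _DEFAULT_PORTS.get(exporter)
    if default_port is not None and parts.port not in (None, default_port):
        hints.append(
            f"port {parts.port} differs from the default {default_port} for {exporter}"
        )
    return hints
```

For example, check_otlp_endpoint("otlp_grpc", "http://host:4317/v1/traces") returns one hint about the path, while the same call without the suffix returns an empty list.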
Sampling Rate
The SKILLMEAT_OTEL_SAMPLE_RATE variable controls a TraceIdRatioBased sampler:
| Value | Effect |
|---|---|
| 1.0 | Sample every request (development default) |
| 0.1 | Sample 10 % of requests (light production load) |
| 0.01 | Sample 1 % of requests (high-traffic production) |
| 0.0 | Sample nothing (disable tracing without disabling SDK) |
For production deployments start with 0.1 and adjust based on your ingest budget.
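To build intuition for what a trace-ID ratio sampler does, the sketch below reimplements the general idea in plain Python: a trace is kept when the low 64 bits of its ID fall below rate × 2^64, so the decision is deterministic per trace ID. This illustrates the technique only; it is not the OTel SDK's or SkillMeat's code, and should_sample and observed_ratio are hypothetical names.

```python
import random

TRACE_ID_LIMIT = (1 << 64) - 1  # mask selecting the low 64 bits of a trace ID


def should_sample(trace_id: int, rate: float) -> bool:
    """Keep a trace when the low 64 bits of its ID fall below rate * 2**64."""
    if not 0.0 <= rate <= 1.0:
        raise ValueError("rate must be in [0.0, 1.0]")
    bound = round(rate * (TRACE_ID_LIMIT + 1))
    return (trace_id & TRACE_ID_LIMIT) < bound


def observed_ratio(rate: float, n: int = 100_000, seed: int = 0) -> float:
    """Fraction of n random 128-bit trace IDs that the sampler keeps."""
    rng = random.Random(seed)
    kept = sum(should_sample(rng.getrandbits(128), rate) for _ in range(n))
    return kept / n
```

Because the decision depends only on the trace ID, the same trace always gets the same verdict at a given rate, and observed_ratio(0.1) comes out close to 0.1 over a large number of random IDs.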
Startup Log Messages
When SkillMeat starts with OTel enabled, it logs the following line at INFO level:
OTel SDK initialised — service=skillmeat exporter=otlp_http endpoint=http://otel-collector:4318 sample_rate=1.00
If OTel is disabled, the corresponding message is emitted at DEBUG level, so it appears only when SKILLMEAT_LOG_LEVEL is set to DEBUG.
If the OTel packages are not installed but SKILLMEAT_OTEL_ENABLED=true:
Install the optional OTel dependencies with:
Structured Logging
SkillMeat emits structured JSON logs by default. Each log line includes context fields that help correlate entries across a distributed system.
Tenant Context Propagation
In the enterprise edition, log entries automatically include tenant_id and org_id fields drawn from the incoming request's auth context:
{
  "timestamp": "2026-05-19T10:23:41.123Z",
  "level": "INFO",
  "message": "Artifact cache refreshed",
  "tenant_id": "tenant-acme",
  "org_id": "org-eng",
  "artifact_id": "skill:canvas-design",
  "duration_ms": 42
}
These fields are injected by ObservabilityMiddleware and propagated through Python's contextvars, so they appear on every log line within the request span, including lines logged from nested service calls.
In the local (single-tenant) edition, tenant_id and org_id are omitted.
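The propagation pattern described here can be sketched with the standard library alone: context variables hold the tenant fields for the current request, a logging filter copies them onto every record, and a formatter omits them when unset. All names below (tenant_id_var, TenantContextFilter, JsonFormatter, handle_request) are hypothetical stand-ins and not SkillMeat's ObservabilityMiddleware implementation.

```python
import contextvars
import json
import logging

# Hypothetical context variables; SkillMeat's real names may differ.
tenant_id_var = contextvars.ContextVar("tenant_id", default=None)
org_id_var = contextvars.ContextVar("org_id", default=None)


class TenantContextFilter(logging.Filter):
    """Copy the current request's tenant context onto each log record."""

    def filter(self, record):
        record.tenant_id = tenant_id_var.get()
        record.org_id = org_id_var.get()
        return True


class JsonFormatter(logging.Formatter):
    """Render a record as one JSON object, omitting unset tenant fields."""

    def format(self, record):
        entry = {"level": record.levelname, "message": record.getMessage()}
        for key in ("tenant_id", "org_id"):
            value = getattr(record, key, None)
            if value is not None:
                entry[key] = value
        return json.dumps(entry)


def build_logger(stream):
    """Create a logger that writes JSON lines with tenant context to stream."""
    logger = logging.getLogger("skillmeat.example")
    logger.handlers.clear()
    logger.setLevel(logging.INFO)
    logger.propagate = False
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    handler.addFilter(TenantContextFilter())
    logger.addHandler(handler)
    return logger


def handle_request(logger, tenant_id, org_id):
    """Stand-in for middleware: set context for one request, then reset it."""
    tenant_token = tenant_id_var.set(tenant_id)
    org_token = org_id_var.set(org_id)
    try:
        # Any nested call that logs here sees the same tenant context.
        logger.info("Artifact cache refreshed")
    finally:
        org_id_var.reset(org_token)
        tenant_id_var.reset(tenant_token)
```

Resetting the tokens in a finally block keeps one request's tenant from leaking into the next. Lines logged outside handle_request carry no tenant fields, which matches the local-edition behaviour described above.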
Log Format
| Variable | Default | Description |
|---|---|---|
| SKILLMEAT_LOG_LEVEL | INFO | Log level: DEBUG, INFO, WARNING, ERROR, CRITICAL. |
| SKILLMEAT_LOG_FORMAT | json | Log format: json (structured) or text (human-readable). |
Set SKILLMEAT_LOG_FORMAT=text for local development to get readable console output.
Prometheus Metrics
SkillMeat exposes a Prometheus-compatible metrics endpoint at /metrics. No additional configuration is required — the endpoint is always available when the API server is running.
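One way to confirm the endpoint is serving SkillMeat metrics is to fetch it and list the metric families declared in the # TYPE lines of the Prometheus text format. The sketch below is an illustration with hypothetical helper names; the base URL http://localhost:8080 is an assumption, so substitute the address of your API server.

```python
import urllib.request


def metric_families(exposition_text, prefix="skillmeat_"):
    """Map metric family names to types, read from '# TYPE <name> <type>' lines."""
    families = {}
    for line in exposition_text.splitlines():
        parts = line.split()
        if (
            len(parts) == 4
            and parts[0] == "#"
            and parts[1] == "TYPE"
            and parts[2].startswith(prefix)
        ):
            families[parts[2]] = parts[3]
    return families


def fetch_metric_families(base_url="http://localhost:8080"):
    """Fetch /metrics from the API server and return the SkillMeat families."""
    # base_url is an assumption; point it at your own deployment.
    with urllib.request.urlopen(base_url + "/metrics", timeout=5) as resp:
        return metric_families(resp.read().decode("utf-8"))
```

The parsing step is kept separate from the HTTP call so it can be exercised against saved /metrics output without a running server.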
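Scrape Configuration
Add SkillMeat to your Prometheus scrape_configs. A minimal prometheus.yml that scrapes skillmeat-api:8080 is included in the Docker Compose example later on this page.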
Metrics Catalog
Marketplace Metrics
| Metric | Type | Labels | Description |
|---|---|---|---|
| skillmeat_marketplace_installs_total | Counter | broker, listing_id, status | Total marketplace bundle downloads (installs). status is success or failure. |
| skillmeat_marketplace_operation_duration_seconds | Histogram | broker, operation | Duration of marketplace operations (listings fetch, publish, etc.) in seconds. |
| skillmeat_marketplace_listings_total | Gauge | broker, type | Total marketplace listings available. |
| skillmeat_marketplace_publishes_total | Counter | broker, status | Total publish submissions. |
| skillmeat_marketplace_search_total | Counter | broker | Total marketplace searches. |
| skillmeat_marketplace_errors_total | Counter | broker, operation, error_type | Total marketplace errors by type. |
| skillmeat_marketplace_scan_duration_seconds | Histogram | source_id | Duration of marketplace repository scans. |
| skillmeat_marketplace_scan_artifacts_total | Counter | source_id, artifact_type | Artifacts detected during scans. |
| skillmeat_marketplace_import_total | Counter | source_id, status | Artifact imports from marketplace sources. |
| skillmeat_marketplace_scan_errors_total | Counter | source_id, error_type | Errors during marketplace scans. |
Service-Layer Metrics
| Metric | Type | Labels | Description |
|---|---|---|---|
| skillmeat_service_operation_total | Counter | service, operation, status, tenant_id | Total service operations by outcome (success/error). tenant_id label populated in enterprise edition only; empty string in local edition. |
| skillmeat_service_operation_duration_seconds | Histogram | service, operation, tenant_id | Duration of service operations in seconds. Covers ArtifactDeploymentService, BundleService, and MarketplaceService. |
Cardinality guard
The tenant_id label on service metrics is bounded by the number of active tenants in your enterprise deployment. Do not add tenant_id to DB-layer or marketplace metrics — those are higher-cardinality paths.
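The reason for the guard is multiplicative: the worst-case number of time series for a metric is the product of the distinct values each label can take. The arithmetic below is a sketch with made-up numbers (3 services, 10 operations, 2 statuses, 50 tenants); max_series is a hypothetical helper, not part of SkillMeat.

```python
from math import prod


def max_series(label_cardinalities):
    """Upper bound on series for one metric: product of distinct values per label."""
    return prod(label_cardinalities.values())


# All numbers below are illustrative assumptions, not measurements.
without_tenant = max_series({"service": 3, "operation": 10, "status": 2})
with_tenant = max_series(
    {"service": 3, "operation": 10, "status": 2, "tenant_id": 50}
)

print(without_tenant, with_tenant)
```

Under these assumptions the tenant label multiplies the series count by the number of tenants (60 versus 3000), and a histogram multiplies it again by its bucket count, which is why the label is best kept off metrics whose other labels are already high-cardinality.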
Database-Layer Metrics
| Metric | Type | Labels | Description |
|---|---|---|---|
| skillmeat_db_query_duration_seconds | Histogram | repository, operation | Duration of individual database queries. Currently instrumented on MarketplaceSourceRepository and MarketplaceCatalogRepository hot-path methods (get, list, create, update, delete, bulk). |
Artifact & Cache Metrics
| Metric | Type | Labels | Description |
|---|---|---|---|
| skillmeat_artifacts_total | Gauge | artifact_type, scope | Total artifacts in the local collection. |
| skillmeat_cache_refresh_duration_seconds | Histogram | scope | Duration of cache refresh operations. |
| skillmeat_cache_refresh_total | Counter | scope, status | Total cache refresh operations. |
Clone & Scan Metrics
| Metric | Type | Labels | Description |
|---|---|---|---|
| skillmeat_clone_duration_seconds | Histogram | strategy | Duration of git clone operations. |
| skillmeat_extraction_duration_seconds | Histogram | artifact_count_bucket | Duration of manifest extraction. |
| skillmeat_scan_total_duration_seconds | Histogram | strategy, status | Total scan duration including all sub-operations. |
Bundle Metrics
| Metric | Type | Labels | Description |
|---|---|---|---|
| skillmeat_bundle_exports_total | Counter | status, format | Total bundle exports. |
| skillmeat_bundle_imports_total | Counter | status, strategy, format | Total bundle imports. |
| skillmeat_bundle_operation_duration_seconds | Histogram | operation | Duration of bundle operations. |
Example: Docker Compose with OTLP Collector
The following docker-compose.override.yml wires SkillMeat to an OpenTelemetry Collector and a Prometheus instance for local observability testing.
services:
  skillmeat-api:
    environment:
      SKILLMEAT_OTEL_ENABLED: "true"
      SKILLMEAT_OTEL_EXPORTER: "otlp_http"
      SKILLMEAT_OTEL_ENDPOINT: "http://otel-collector:4318"
      SKILLMEAT_OTEL_SAMPLE_RATE: "1.0"
      SKILLMEAT_OTEL_SERVICE_NAME: "skillmeat"
      SKILLMEAT_LOG_FORMAT: "json"
  otel-collector:
    image: otel/opentelemetry-collector-contrib:latest
    command: ["--config=/etc/otelcol/config.yaml"]
    volumes:
      - ./monitoring/otel-collector.yaml:/etc/otelcol/config.yaml:ro
    ports:
      - "4318:4318"   # OTLP HTTP receiver
      - "4317:4317"   # OTLP gRPC receiver
      - "8889:8889"   # Prometheus metrics exporter (for collector self-metrics)
  prometheus:
    image: prom/prometheus:latest
    volumes:
      - ./monitoring/prometheus.yml:/etc/prometheus/prometheus.yml:ro
    ports:
      - "9090:9090"
A minimal monitoring/otel-collector.yaml for local use:
receivers:
  otlp:
    protocols:
      http:
        endpoint: 0.0.0.0:4318
      grpc:
        endpoint: 0.0.0.0:4317
exporters:
  debug:
    verbosity: detailed
service:
  pipelines:
    traces:
      receivers: [otlp]
      exporters: [debug]
A minimal monitoring/prometheus.yml:
global:
  scrape_interval: 15s
scrape_configs:
  - job_name: skillmeat
    static_configs:
      - targets:
          - skillmeat-api:8080
Tip
Use ./compose.sh --profile local up -d to start the local dev stack. Add the OTel collector and Prometheus services to docker-compose.override.yml to layer on observability without modifying the base compose files.
Grafana Dashboard
A starter Grafana dashboard JSON for SkillMeat metrics is included at monitoring/grafana/dashboards/skillmeat-overview.json. Import it via Dashboards → Import in your Grafana instance after configuring a Prometheus data source pointing to your SkillMeat metrics endpoint.