Observability Configuration

SkillMeat exposes three observability pillars: OpenTelemetry (OTel) distributed tracing, structured JSON logging with tenant context propagation, and a Prometheus-compatible metrics endpoint.

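All configuration is done through environment variables. OpenTelemetry tracing is disabled by default to keep the out-of-the-box experience zero-friction; structured JSON logging and the /metrics endpoint are active without any configuration.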


OpenTelemetry Tracing

Environment Variables

All OTel settings use the SKILLMEAT_OTEL_ prefix.

| Variable | Default | Description |
|---|---|---|
| SKILLMEAT_OTEL_ENABLED | false | Master switch. Set to true to start the OTel SDK and export spans. |
| SKILLMEAT_OTEL_EXPORTER | otlp_http | Exporter backend. Choices: otlp_http, otlp_grpc, stdout. |
| SKILLMEAT_OTEL_ENDPOINT | http://localhost:4318 | Collector endpoint URL. Applies when exporter is otlp_http or otlp_grpc. |
| SKILLMEAT_OTEL_SAMPLE_RATE | 1.0 | Sampling ratio (0.0–1.0). 1.0 samples every trace; 0.0 samples none. |
| SKILLMEAT_OTEL_SERVICE_NAME | skillmeat | Service name reported in OTel resource attributes. |
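
As an illustration of how these variables and their defaults fit together, the following is a minimal sketch of a settings loader. It is not SkillMeat's actual implementation: the OtelSettings class, the load_otel_settings function, and the rule that only the literal string "true" enables tracing are all assumptions made for this example.

```python
import os
from dataclasses import dataclass

VALID_EXPORTERS = ("otlp_http", "otlp_grpc", "stdout")


@dataclass(frozen=True)
class OtelSettings:
    """Hypothetical container; the defaults mirror the documented ones."""
    enabled: bool = False
    exporter: str = "otlp_http"
    endpoint: str = "http://localhost:4318"
    sample_rate: float = 1.0
    service_name: str = "skillmeat"


def load_otel_settings(env=None) -> OtelSettings:
    """Build settings from a mapping of environment variables."""
    env = os.environ if env is None else env
    prefix = "SKILLMEAT_OTEL_"
    # Assumption: only "true" (case-insensitive) switches tracing on.
    enabled = env.get(prefix + "ENABLED", "false").strip().lower() == "true"
    exporter = env.get(prefix + "EXPORTER", "otlp_http").strip()
    if exporter not in VALID_EXPORTERS:
        raise ValueError(f"unknown exporter {exporter!r}")
    sample_rate = float(env.get(prefix + "SAMPLE_RATE", "1.0"))
    if not 0.0 <= sample_rate <= 1.0:
        raise ValueError("sample rate must be between 0.0 and 1.0")
    return OtelSettings(
        enabled=enabled,
        exporter=exporter,
        endpoint=env.get(prefix + "ENDPOINT", "http://localhost:4318"),
        sample_rate=sample_rate,
        service_name=env.get(prefix + "SERVICE_NAME", "skillmeat"),
    )
```

Calling load_otel_settings({}) returns the documented defaults, which makes the "disabled unless explicitly enabled" behaviour easy to verify in a unit test.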

Exporter Options

otlp_http (default): Sends spans over HTTP/1.1 to an OTLP-compatible collector. Default port 4318.

SKILLMEAT_OTEL_ENABLED=true
SKILLMEAT_OTEL_EXPORTER=otlp_http
SKILLMEAT_OTEL_ENDPOINT=http://otel-collector:4318

Compatible with: OpenTelemetry Collector, Jaeger (OTLP receiver), Grafana Tempo, Honeycomb, Datadog OTLP ingest.

otlp_grpc: Sends spans over gRPC. Default port 4317. Falls back to otlp_http if the grpc package is not installed.

SKILLMEAT_OTEL_ENABLED=true
SKILLMEAT_OTEL_EXPORTER=otlp_grpc
SKILLMEAT_OTEL_ENDPOINT=http://otel-collector:4317

Note

The endpoint should not include a path suffix for gRPC (e.g. use http://host:4317, not http://host:4317/v1/traces).

stdout: Prints span JSON to standard output. Useful for local debugging and CI pipelines.

SKILLMEAT_OTEL_ENABLED=true
SKILLMEAT_OTEL_EXPORTER=stdout

SKILLMEAT_OTEL_ENDPOINT is not needed when using stdout.
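
The exporter rules described in this section (three valid choices, a fallback from otlp_grpc to otlp_http when gRPC support is missing, and no endpoint for stdout) can be sketched as a small decision function. The function names and the has_grpc parameter are hypothetical; how SkillMeat detects the grpc package internally is not documented here.

```python
VALID_EXPORTERS = ("otlp_http", "otlp_grpc", "stdout")


def resolve_exporter(requested: str, has_grpc: bool) -> str:
    """Return the exporter that would effectively be used.

    `has_grpc` says whether the grpc package is importable; it is passed
    in explicitly so the decision stays easy to test.
    """
    if requested not in VALID_EXPORTERS:
        raise ValueError(f"unknown exporter {requested!r}")
    if requested == "otlp_grpc" and not has_grpc:
        # Documented behaviour: fall back to OTLP over HTTP.
        return "otlp_http"
    return requested


def needs_endpoint(exporter: str) -> bool:
    """Only the two OTLP exporters use SKILLMEAT_OTEL_ENDPOINT."""
    return exporter in ("otlp_http", "otlp_grpc")
```

In a real process, has_grpc could be computed with importlib.util.find_spec("grpc") is not None. If such a fallback happens, an endpoint that still points at the gRPC port would presumably need to be changed to the HTTP port as well.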

Sampling Rate

The SKILLMEAT_OTEL_SAMPLE_RATE variable controls a TraceIdRatioBased sampler:

| Value | Effect |
|---|---|
| 1.0 | Sample every request (development default) |
| 0.1 | Sample 10% of requests (light production load) |
| 0.01 | Sample 1% of requests (high-traffic production) |
| 0.0 | Sample nothing (disable tracing without disabling SDK) |

For production deployments, start with 0.1 and adjust based on your ingest budget.
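
To make the effect of the ratio concrete, here is a standard-library sketch of how a trace-ID-ratio sampler typically decides. It follows the general approach used by ratio-based samplers (compare the low 64 bits of the trace ID against a bound derived from the rate); it is an illustration with hypothetical function names, not the code SkillMeat or the OTel SDK runs.

```python
TRACE_ID_LIMIT = (1 << 64) - 1  # mask selecting the low 64 bits of a trace ID


def bound_for_rate(rate: float) -> int:
    """Map a ratio in [0.0, 1.0] to an exclusive upper bound on trace IDs."""
    if not 0.0 <= rate <= 1.0:
        raise ValueError("rate must be between 0.0 and 1.0")
    return round(rate * (TRACE_ID_LIMIT + 1))


def should_sample(trace_id: int, rate: float) -> bool:
    """Sample when the low 64 bits of the trace ID fall below the bound."""
    return (trace_id & TRACE_ID_LIMIT) < bound_for_rate(rate)
```

Because the decision depends only on the trace ID and the rate, every service that sees the same trace ID with the same rate reaches the same decision, which keeps traces complete rather than partially sampled.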

Startup Log Messages

When OTel is enabled, SkillMeat logs the following line at INFO level on startup:

OTel SDK initialised — service=skillmeat exporter=otlp_http endpoint=http://otel-collector:4318 sample_rate=1.00

If OTel is disabled:

OTel SDK disabled (SKILLMEAT_OTEL_ENABLED=false). No spans will be exported.

(This message is emitted at DEBUG level, so it appears only when SKILLMEAT_LOG_LEVEL is set to DEBUG.)

If the OTel packages are not installed but SKILLMEAT_OTEL_ENABLED=true:

OTel SDK packages not installed but SKILLMEAT_OTEL_ENABLED=true. ...

Install the optional OTel dependencies with:

pip install "skillmeat[otel]"
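
To check ahead of time whether the optional packages are importable in a given environment, a small probe like the following can help. It is a sketch: module_available and otel_packages_installed are hypothetical helpers, and the assumption that the otel extra provides the opentelemetry.sdk module is not confirmed by this page.

```python
import importlib.util


def module_available(name: str) -> bool:
    """Return True if `name` can be located without fully importing it."""
    try:
        return importlib.util.find_spec(name) is not None
    except ModuleNotFoundError:
        # Raised for dotted names when a parent package is missing.
        return False


def otel_packages_installed() -> bool:
    # Assumption: the optional extra installs the OpenTelemetry SDK package.
    return module_available("opentelemetry.sdk")
```

Running otel_packages_installed() in the same virtual environment as the API server shows whether enabling SKILLMEAT_OTEL_ENABLED is likely to trigger the warning above.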

Structured Logging

SkillMeat emits structured JSON logs by default. Each log line includes context fields that help correlate entries across a distributed system.

Tenant Context Propagation

In the enterprise edition, log entries automatically include tenant_id and org_id fields drawn from the incoming request's auth context:

{
  "timestamp": "2026-05-19T10:23:41.123Z",
  "level": "INFO",
  "message": "Artifact cache refreshed",
  "tenant_id": "tenant-acme",
  "org_id": "org-eng",
  "artifact_id": "skill:canvas-design",
  "duration_ms": 42
}

These fields are injected by ObservabilityMiddleware and propagated through Python's contextvars, so they appear on every log line emitted while the request is being handled, including lines from nested service calls.

In the local (single-tenant) edition, tenant_id and org_id are omitted.
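
The propagation mechanism described above can be sketched with the standard library alone. This is an illustration of the contextvars pattern, not the actual ObservabilityMiddleware: the variable names, the filter and formatter classes, and the handle_request stand-in are all hypothetical.

```python
import contextvars
import io
import json
import logging

# Hypothetical context variables; a middleware would set them per request.
tenant_id_var = contextvars.ContextVar("tenant_id", default=None)
org_id_var = contextvars.ContextVar("org_id", default=None)


class TenantContextFilter(logging.Filter):
    """Copy the current tenant context onto every log record."""

    def filter(self, record):
        record.tenant_id = tenant_id_var.get()
        record.org_id = org_id_var.get()
        return True


class JsonFormatter(logging.Formatter):
    """Emit a JSON object; tenant fields are omitted when unset."""

    def format(self, record):
        payload = {"level": record.levelname, "message": record.getMessage()}
        for key in ("tenant_id", "org_id"):
            value = getattr(record, key, None)
            if value is not None:
                payload[key] = value
        return json.dumps(payload)


def build_logger(stream):
    logger = logging.getLogger("skillmeat.sketch")
    logger.handlers.clear()
    logger.setLevel(logging.INFO)
    logger.propagate = False
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    handler.addFilter(TenantContextFilter())
    logger.addHandler(handler)
    return logger


def handle_request(logger, tenant_id, org_id):
    """Stand-in for middleware: set context, do the work, then reset."""
    tenant_token = tenant_id_var.set(tenant_id)
    org_token = org_id_var.set(org_id)
    try:
        logger.info("Artifact cache refreshed")  # any nested call logs with context
    finally:
        tenant_id_var.reset(tenant_token)
        org_id_var.reset(org_token)
```

Because the values live in context variables rather than function arguments, code deep inside a service call does not need to know about tenants at all; the filter attaches the fields at logging time, and resetting the tokens prevents one request's tenant from leaking into the next.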

Log Format

| Variable | Default | Description |
|---|---|---|
| SKILLMEAT_LOG_LEVEL | INFO | Log level: DEBUG, INFO, WARNING, ERROR, CRITICAL. |
| SKILLMEAT_LOG_FORMAT | json | Log format: json (structured) or text (human-readable). |

Set SKILLMEAT_LOG_FORMAT=text for local development to get readable console output.


Prometheus Metrics

SkillMeat exposes a Prometheus-compatible metrics endpoint at /metrics. No additional configuration is required — the endpoint is always available when the API server is running.

Scrape Configuration

Add SkillMeat to your Prometheus scrape_configs:

scrape_configs:
  - job_name: skillmeat
    static_configs:
      - targets:
          - skillmeat:8080
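
Before pointing Prometheus at the endpoint, it can be useful to confirm that /metrics actually serves SkillMeat series. The sketch below fetches the exposition text and lists the metric names that start with skillmeat_. The URL, the helper names, and the label values in SAMPLE are assumptions for illustration only.

```python
import re
import urllib.request


def fetch_metrics(url="http://localhost:8080/metrics"):
    """Download the exposition text (the URL is an assumed local default)."""
    with urllib.request.urlopen(url, timeout=5) as resp:
        return resp.read().decode("utf-8")


def skillmeat_metric_names(exposition_text: str) -> set:
    """Collect sample names beginning with 'skillmeat_'."""
    names = set()
    for line in exposition_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        # A sample name ends at the first "{" (labels) or whitespace (value).
        name = re.split(r"[{\s]", line, maxsplit=1)[0]
        if name.startswith("skillmeat_"):
            names.add(name)
    return names


# Made-up sample in the Prometheus text format, for demonstration only.
SAMPLE = """\
# HELP skillmeat_cache_refresh_total Total cache refresh operations.
# TYPE skillmeat_cache_refresh_total counter
skillmeat_cache_refresh_total{scope="user",status="success"} 3.0
skillmeat_artifacts_total{artifact_type="skill",scope="user"} 12.0
process_cpu_seconds_total 1.5
"""
```

Note that histograms appear under suffixed sample names (_bucket, _sum, _count), so a histogram such as skillmeat_db_query_duration_seconds would show up as several entries in the returned set.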

Metrics Catalog

Marketplace Metrics

| Metric | Type | Labels | Description |
|---|---|---|---|
| skillmeat_marketplace_installs_total | Counter | broker, listing_id, status | Total marketplace bundle downloads (installs). status is success or failure. |
| skillmeat_marketplace_operation_duration_seconds | Histogram | broker, operation | Duration of marketplace operations (listings fetch, publish, etc.) in seconds. |
| skillmeat_marketplace_listings_total | Gauge | broker, type | Total marketplace listings available. |
| skillmeat_marketplace_publishes_total | Counter | broker, status | Total publish submissions. |
| skillmeat_marketplace_search_total | Counter | broker | Total marketplace searches. |
| skillmeat_marketplace_errors_total | Counter | broker, operation, error_type | Total marketplace errors by type. |
| skillmeat_marketplace_scan_duration_seconds | Histogram | source_id | Duration of marketplace repository scans. |
| skillmeat_marketplace_scan_artifacts_total | Counter | source_id, artifact_type | Artifacts detected during scans. |
| skillmeat_marketplace_import_total | Counter | source_id, status | Artifact imports from marketplace sources. |
| skillmeat_marketplace_scan_errors_total | Counter | source_id, error_type | Errors during marketplace scans. |

Service-Layer Metrics

| Metric | Type | Labels | Description |
|---|---|---|---|
| skillmeat_service_operation_total | Counter | service, operation, status, tenant_id | Total service operations by outcome (success/error). tenant_id label populated in enterprise edition only; empty string in local edition. |
| skillmeat_service_operation_duration_seconds | Histogram | service, operation, tenant_id | Duration of service operations in seconds. Covers ArtifactDeploymentService, BundleService, and MarketplaceService. |

Cardinality guard

The tenant_id label on service metrics is bounded by the number of active tenants in your enterprise deployment. Do not add tenant_id to DB-layer or marketplace metrics — those are higher-cardinality paths.
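
The labelling scheme for the two service-layer metrics can be illustrated with an in-memory stand-in. This sketch does not use a Prometheus client library and is not SkillMeat's instrumentation code; track_operation, the two dictionaries, and the "export" operation name are hypothetical. It only shows how the outcome and the tenant label (empty string when no tenant is present) end up on each observation.

```python
import time
from collections import defaultdict
from contextlib import contextmanager

# In-memory stand-ins for the counter and the histogram.
operation_total = defaultdict(int)
operation_durations = defaultdict(list)


@contextmanager
def track_operation(service, operation, tenant_id=None):
    """Record one service operation with its outcome and duration."""
    tenant_label = tenant_id or ""  # empty string when there is no tenant
    start = time.perf_counter()
    status = "success"
    try:
        yield
    except Exception:
        status = "error"
        raise
    finally:
        elapsed = time.perf_counter() - start
        operation_total[(service, operation, status, tenant_label)] += 1
        operation_durations[(service, operation, tenant_label)].append(elapsed)
```

A caller would wrap the operation body, for example with track_operation("BundleService", "export", tenant_id="tenant-acme"): followed by the work itself. Keeping the label set fixed in one helper also makes it harder to introduce an unbounded label by accident.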

Database-Layer Metrics

| Metric | Type | Labels | Description |
|---|---|---|---|
| skillmeat_db_query_duration_seconds | Histogram | repository, operation | Duration of individual database queries. Currently instrumented on MarketplaceSourceRepository and MarketplaceCatalogRepository hot-path methods (get, list, create, update, delete, bulk). |

Artifact & Cache Metrics

| Metric | Type | Labels | Description |
|---|---|---|---|
| skillmeat_artifacts_total | Gauge | artifact_type, scope | Total artifacts in the local collection. |
| skillmeat_cache_refresh_duration_seconds | Histogram | scope | Duration of cache refresh operations. |
| skillmeat_cache_refresh_total | Counter | scope, status | Total cache refresh operations. |

Clone & Scan Metrics

| Metric | Type | Labels | Description |
|---|---|---|---|
| skillmeat_clone_duration_seconds | Histogram | strategy | Duration of git clone operations. |
| skillmeat_extraction_duration_seconds | Histogram | artifact_count_bucket | Duration of manifest extraction. |
| skillmeat_scan_total_duration_seconds | Histogram | strategy, status | Total scan duration including all sub-operations. |

Bundle Metrics

| Metric | Type | Labels | Description |
|---|---|---|---|
| skillmeat_bundle_exports_total | Counter | status, format | Total bundle exports. |
| skillmeat_bundle_imports_total | Counter | status, strategy, format | Total bundle imports. |
| skillmeat_bundle_operation_duration_seconds | Histogram | operation | Duration of bundle operations. |

Example: Docker Compose with OTLP Collector

The following docker-compose.override.yml wires SkillMeat to an OpenTelemetry Collector and a Prometheus instance for local observability testing.

services:
  skillmeat-api:
    environment:
      SKILLMEAT_OTEL_ENABLED: "true"
      SKILLMEAT_OTEL_EXPORTER: "otlp_http"
      SKILLMEAT_OTEL_ENDPOINT: "http://otel-collector:4318"
      SKILLMEAT_OTEL_SAMPLE_RATE: "1.0"
      SKILLMEAT_OTEL_SERVICE_NAME: "skillmeat"
      SKILLMEAT_LOG_FORMAT: "json"

  otel-collector:
    image: otel/opentelemetry-collector-contrib:latest
    command: ["--config=/etc/otelcol/config.yaml"]
    volumes:
      - ./monitoring/otel-collector.yaml:/etc/otelcol/config.yaml:ro
    ports:
      - "4318:4318"   # OTLP HTTP receiver
      - "4317:4317"   # OTLP gRPC receiver
      - "8889:8889"   # Prometheus exporter port (served only if a prometheus exporter is added to the collector config)

  prometheus:
    image: prom/prometheus:latest
    volumes:
      - ./monitoring/prometheus.yml:/etc/prometheus/prometheus.yml:ro
    ports:
      - "9090:9090"

A minimal monitoring/otel-collector.yaml for local use:

receivers:
  otlp:
    protocols:
      http:
        endpoint: 0.0.0.0:4318
      grpc:
        endpoint: 0.0.0.0:4317

exporters:
  debug:
    verbosity: detailed

service:
  pipelines:
    traces:
      receivers: [otlp]
      exporters: [debug]

A minimal monitoring/prometheus.yml:

global:
  scrape_interval: 15s

scrape_configs:
  - job_name: skillmeat
    static_configs:
      - targets:
          - skillmeat-api:8080
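
Once the stack is running, one way to confirm that Prometheus is scraping SkillMeat is to query its targets API and look at the health of the skillmeat job. The sketch below assumes Prometheus is reachable on localhost:9090, as published in the compose file above; the helper names are hypothetical.

```python
import json
import urllib.request


def fetch_targets(prometheus_url="http://localhost:9090"):
    """Fetch the target list from the Prometheus HTTP API."""
    with urllib.request.urlopen(f"{prometheus_url}/api/v1/targets", timeout=5) as resp:
        return json.load(resp)


def job_health(targets_payload: dict, job: str) -> list:
    """Return the health value of every active target belonging to `job`."""
    active = targets_payload.get("data", {}).get("activeTargets", [])
    return [
        target.get("health", "unknown")
        for target in active
        if target.get("labels", {}).get("job") == job
    ]
```

A result of ["up"] from job_health(fetch_targets(), "skillmeat") suggests the scrape is working; an empty list usually means the job name or target address in prometheus.yml does not match the running service.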

Tip

Use ./compose.sh --profile local up -d to start the local dev stack. Add the OTel collector and Prometheus services to docker-compose.override.yml to layer on observability without modifying the base compose files.


Grafana Dashboard

A starter Grafana dashboard JSON for SkillMeat metrics is included at monitoring/grafana/dashboards/skillmeat-overview.json. Import it via Dashboards → Import in your Grafana instance after configuring a Prometheus data source that points to the Prometheus server scraping SkillMeat's /metrics endpoint.