RFC 0002: HCL-driven Policy Engine for repo-guardian
Status: Accepted
Author: Donald Gifford
Date: 2026-03-14
Summary
Replace repo-guardian's hardcoded file rules, special-cased custom properties checker, and environment-variable-only configuration with a unified HCL-driven policy engine. Rules, reconcilers, ignore lists, and org-level settings are all declared in a configuration file, making it possible to add new compliance checks, file-change-triggered reconciliation, and per-repo overrides without code changes.
Problem Statement
repo-guardian currently has two separate code paths that are conceptually the same thing at different maturity levels:
- File rules (CODEOWNERS, dependabot, renovate) -- hardcoded Go structs in internal/rules/registry.go. Adding a new rule means writing Go code, recompiling, and redeploying. Rules are binary: the file exists or it doesn't. There is no post-merge reconciliation.
- Custom properties checker (internal/checker/properties.go) -- a completely separate code path that also checks for a file (catalog-info.yaml), but additionally reads the file's contents and reconciles derived state (GitHub custom properties). This is special-cased rather than being a capability any rule can have.
Both follow the same pattern: ensure a file exists in a repo; if not, PR it; optionally, when the file exists, read it and reconcile something.
Additional pain points:
- No ignore/exclusion mechanism. There is no way to say "skip this rule for these repos" or "ignore this repo entirely" without code changes. At scale (10k+ repos), there will always be exceptions.
- No push-event feedback loop. When repo-guardian creates a PR to add catalog-info.yaml and the PR is merged, the app has no way to detect the change until the next scheduler cycle (default 168h). This is because the app doesn't subscribe to push events.
- Env vars don't scale. Configuration is entirely via environment variables. This works for a handful of settings but breaks down for structured data like rule definitions, per-repo overrides, field mappings, and multi-org support. You cannot express "for repos matching org/infra-*, skip the renovate rule" in an env var.
- No content validation. Rules only check if a file exists. There is no way to verify a file's contents are correct -- e.g., CODEOWNERS has an actual owner (not a placeholder), or dependabot.yml includes the github-actions ecosystem.
- No multi-org support path. The current architecture assumes a single GitHub org. As the tool grows, it needs a way to scope rules and configuration per-org.
Proposed Solution
Introduce an HCL configuration file (guardian.hcl) that declaratively
defines:
- Rules -- typed compliance checks (file presence, content validation, repo settings, branch protection)
- Reconcilers -- optional post-change behaviors attached to rules (e.g., read a file and sync derived state somewhere)
- Ignore lists -- repos to skip globally or per-rule, with glob support
- Push event handling -- which file changes trigger re-evaluation
- Operational settings -- worker count, queue size, schedule interval, dry-run, etc. (with env var overrides)
Why HCL
| Feature | HCL | YAML | Env Vars |
|---|---|---|---|
| Typed blocks | Yes | No (untyped maps) | No |
| Expressions/functions | Yes | No | No |
| Comments | Yes | Yes | No |
| References between blocks | Yes | No | No |
| Conditionals | Yes | No | No |
| Validation DSL | Via hcldec | External | Manual |
| Familiarity | Terraform users | Universal | Universal |
| Nested structure | Natural | Indentation-based | Flat |
HCL is chosen over YAML because it provides typed blocks, expressions, references between blocks, and composition primitives that map naturally to policy configuration. It's the same format the team uses with Terraform, so the learning curve is minimal for anyone who already writes Terraform.
Design
Configuration File: guardian.hcl
```hcl
# Operational settings -- all have built-in defaults, all can be
# overridden by environment variables.
guardian {
  dry_run           = false
  schedule_interval = "168h"
  worker_count      = 5
  queue_size        = 1000
  log_level         = "info"
}

# Global ignore list -- these repos are skipped for ALL rules.
ignore {
  repos = [
    "myorg/legacy-monolith",
    "myorg/archived-*",
    "myorg/terraform-*",
  ]
}

# --- File presence and content rules ---
rule "file" "codeowners" {
  check    = "contains" # exists | contains | exact
  paths    = [".github/CODEOWNERS", "CODEOWNERS", "docs/CODEOWNERS"]
  target   = ".github/CODEOWNERS"
  template = "codeowners.tmpl"

  assertion {
    pattern = "^[^#].*@"
    message = "CODEOWNERS must contain at least one owner"
  }

  assertion {
    not_pattern = "@org/CHANGEME"
    message     = "CODEOWNERS still contains the placeholder team"
  }

  pr {
    search_terms = ["codeowners", "CODEOWNERS"]
  }

  ignore {
    repos = ["myorg/special-case"]
  }
}

rule "file" "dependabot" {
  check    = "contains"
  paths    = [".github/dependabot.yml", ".github/dependabot.yaml"]
  target   = ".github/dependabot.yml"
  template = "dependabot.tmpl"

  assertion {
    yaml_path = "updates[*].package-ecosystem"
    contains  = "github-actions"
    message   = "dependabot must include github-actions ecosystem"
  }

  pr {
    search_terms = ["dependabot"]
  }
}

rule "file" "renovate" {
  enabled = false
  check   = "exists"
  paths = [
    "renovate.json",
    "renovate.json5",
    ".renovaterc",
    ".renovaterc.json",
    ".github/renovate.json",
    ".github/renovate.json5",
  ]
  target   = "renovate.json"
  template = "renovate.tmpl"

  pr {
    search_terms = ["renovate"]
  }
}

rule "file" "catalog_info" {
  check    = "exists"
  paths    = ["catalog-info.yaml", "catalog-info.yml"]
  target   = "catalog-info.yaml"
  template = "catalog-info.tmpl"

  pr {
    search_terms = ["catalog-info"]
  }

  # When the file exists or is modified, reconcile custom properties
  reconcile "custom_properties" {
    mode  = "api" # or "github-action"
    watch = true  # trigger on push events

    owner        = "spec.owner"
    component    = "metadata.name"
    jira_project = "metadata.annotations.jira/project-key"
    jira_label   = "metadata.annotations.jira/label"

    defaults {
      owner     = "Unclassified"
      component = "Unclassified"
    }
  }
}

rule "file" "ci_lint" {
  check    = "exact"
  paths    = [".github/workflows/lint.yml"]
  target   = ".github/workflows/lint.yml"
  template = "lint-workflow.tmpl"

  pr {
    search_terms = ["lint workflow"]
  }

  reconcile "workflow_sync" {
    watch = true
  }
}

# --- Repository setting rules ---
rule "setting" "vulnerability_alerts" {
  property  = "vulnerability_alerts_enabled"
  expected  = true
  remediate = true
}

rule "setting" "default_branch" {
  property = "default_branch"
  expected = "main"
}

# --- Branch protection rules ---
rule "branch_protection" "main" {
  branch                = "main"
  require_pr            = true
  required_approvals    = 1
  dismiss_stale_reviews = true
  require_status_checks = ["ci/lint", "ci/test"]

  reconcile "branch_protection" {
    watch = true
  }
}
```
Config File Loading
The loader supports both single-file and directory loading. The GUARDIAN_CONFIG
env var (default: /etc/repo-guardian/guardian.hcl) can point to either:
- A single .hcl file
- A directory containing one or more .hcl files
When pointing to a directory, all .hcl files are loaded and merged. The
hashicorp/hcl/v2 library can combine several parsed files into one body
(hcl.MergeFiles), so no custom merge logic is needed. This allows splitting
large configurations into logical files (e.g., rules-files.hcl,
rules-settings.hcl, ignores.hcl).
Architecture Overview
```text
guardian.hcl (file or directory)
          |
          v
+-------------------+
| HCL Config Loader |  (internal/policy)
| - Parse & validate|
| - Build rule set  |
| - Merge env vars  |
+-------------------+
          |
          v
+-------------------+     +-------------------+
| Rule Engine       |     | Reconciler        |
| (replaces current |     | Registry          |
|  FileRule registry|     |                   |
|  + properties     |     |  - custom_props   |
|  checker)         |     |  - label_sync     |
+-------------------+     |  - branch_protect |
          |               |  - workflow_sync  |
          v               +-------------------+
+-------------------+           |
| Checker Engine    |<----------+
| (refactored)      |
+-------------------+
          |
          v
+-------------------+
| Webhook Handler   |
|  + push event     |
|    handler        |
+-------------------+
```
Rule Types
Rules use the two-label HCL block syntax: rule "<type>" "<name>". The type
determines which schema validates the block, and the name is the unique
identifier for logging, metrics, and ignore lists.
| Rule Type | Purpose | Check Modes |
|---|---|---|
| file | Ensure a file exists with correct content | exists, contains, exact |
| setting | Ensure a repo setting has expected value | boolean match, string match |
| branch_protection | Ensure branch protection rules are set | structured comparison |
File Rule Check Modes
| Mode | Behavior | Use Case |
|---|---|---|
| exists | File is present, content ignored | Renovate config, catalog-info.yaml |
| contains | File is present and passes assertions | CODEOWNERS has a real owner, dependabot includes ecosystems |
| exact | File must match template (semantic diff for YAML, byte diff for plaintext) | CI workflows, standardized configs |
Content Assertions
Assertions are typed by which field is used:
- pattern / not_pattern -- regex matching for unstructured files (CODEOWNERS, Makefile, etc.)
- yaml_path + contains / equals -- YAML-aware path matching, parser inferred from file extension
- json_path (future) -- JSON-aware path matching
For exact mode, YAML files use semantic comparison (ignore
comment/whitespace differences) rather than byte-level diff.
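A minimal sketch of how pattern / not_pattern assertions might be evaluated, using only Go's regexp package. The Assertion struct and CheckAssertions function are hypothetical names for illustration. One detail matters for the CODEOWNERS pattern in the example config: Go anchors ^ to the start of the whole text unless the m flag is set, so the sketch compiles every pattern in multi-line mode (an assumption about the intended semantics).

```go
package main

import (
	"fmt"
	"regexp"
)

// Assertion mirrors the pattern / not_pattern / message fields of an
// assertion block. The struct itself is hypothetical.
type Assertion struct {
	Pattern    string // content must match this regex
	NotPattern string // content must NOT match this regex
	Message    string // reported when the assertion fails
}

// CheckAssertions returns the messages of all failed assertions.
// Patterns are compiled with (?m) so ^ and $ anchor at line boundaries.
func CheckAssertions(content string, assertions []Assertion) ([]string, error) {
	var failures []string
	for _, a := range assertions {
		if a.Pattern != "" {
			re, err := regexp.Compile("(?m)" + a.Pattern)
			if err != nil {
				return nil, fmt.Errorf("pattern %q: %w", a.Pattern, err)
			}
			if !re.MatchString(content) {
				failures = append(failures, a.Message)
			}
		}
		if a.NotPattern != "" {
			re, err := regexp.Compile("(?m)" + a.NotPattern)
			if err != nil {
				return nil, fmt.Errorf("not_pattern %q: %w", a.NotPattern, err)
			}
			if re.MatchString(content) {
				failures = append(failures, a.Message)
			}
		}
	}
	return failures, nil
}

func main() {
	assertions := []Assertion{
		{Pattern: `^[^#].*@`, Message: "CODEOWNERS must contain at least one owner"},
		{NotPattern: `@org/CHANGEME`, Message: "CODEOWNERS still contains the placeholder team"},
	}
	failures, err := CheckAssertions("# owners\n* @org/CHANGEME\n", assertions)
	if err != nil {
		fmt.Println("invalid assertion:", err)
		return
	}
	for _, f := range failures {
		fmt.Println("FAIL:", f)
	}
}
```

Compiling patterns once at config load time, rather than per file as here, would surface invalid regexes early and avoid repeated work; the per-call compile keeps the sketch short.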
Reconciler Types
Reconcilers are optional behaviors attached to rules. They run when a rule's
target file exists (or is modified, if watch = true). Each reconciler type
has a fixed schema validated at config load time.
| Reconciler Type | Purpose | Fixed Schema Fields |
|---|---|---|
| custom_properties | Read file, extract values, sync to GitHub custom properties | mode, owner, component, jira_project, jira_label, defaults |
| label_sync | Read labels from file, sync to GitHub repo labels | TBD |
| branch_protection | Read settings from file, apply branch protection | TBD |
| workflow_sync | Ensure workflow file matches template | TBD |
The reconciler interface in Go:
```go
type Reconciler interface {
	// Name returns the reconciler identifier (e.g., "custom_properties").
	Name() string

	// Reconcile is called when the rule's file exists in the repo.
	// content is the file's raw content.
	Reconcile(ctx context.Context, client github.Client,
		owner, repo, defaultBranch, content string) error
}
```
Adding a new reconciler type requires:
- Go code for the reconciler implementation
- HCL schema definition for the reconciler's fixed fields
- No changes to the rule engine or config loader
Ignore Lists
Two levels:
- Global -- ignore { repos = [...] } at the top level skips repos for all rules
- Per-rule -- ignore { repos = [...] } inside a rule block skips that specific rule for those repos
Patterns support glob matching (e.g., myorg/terraform-*).
Evaluation order:
- Global ignore list checked first (cheapest)
- Per-rule ignore list checked before rule evaluation
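The two-level check can be sketched with path.Match from the standard library, whose * wildcard matches any run of characters except "/". That suits org/name patterns such as myorg/terraform-*. The function names are hypothetical, and whether the project uses path.Match or a richer glob library is not stated in this RFC.

```go
package main

import (
	"fmt"
	"path"
)

// matchesAny reports whether repo ("org/name") matches any glob pattern.
func matchesAny(repo string, patterns []string) (bool, error) {
	for _, p := range patterns {
		ok, err := path.Match(p, repo)
		if err != nil {
			return false, fmt.Errorf("bad ignore pattern %q: %w", p, err)
		}
		if ok {
			return true, nil
		}
	}
	return false, nil
}

// shouldSkip applies the evaluation order: the global list is checked
// first, then the per-rule list.
func shouldSkip(repo string, global, perRule []string) (bool, error) {
	if skip, err := matchesAny(repo, global); err != nil || skip {
		return skip, err
	}
	return matchesAny(repo, perRule)
}

func main() {
	global := []string{"myorg/legacy-monolith", "myorg/archived-*", "myorg/terraform-*"}
	perRule := []string{"myorg/special-case"}

	for _, repo := range []string{"myorg/terraform-aws", "myorg/special-case", "myorg/api"} {
		skip, err := shouldSkip(repo, global, perRule)
		if err != nil {
			fmt.Println("error:", err)
			return
		}
		fmt.Printf("%s skip=%v\n", repo, skip)
	}
}
```

Validating every pattern once at config load time (path.Match returns path.ErrBadPattern for malformed globs) would keep bad patterns from surfacing mid-scan.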
Push Event Handling
When a rule has reconcile { watch = true }, repo-guardian handles push
events for the rule's file paths.
The push handler:
- Checks the push is to the default branch
- Scans commits[].added and commits[].modified for watched file paths
- Enqueues the repo if a match is found
This closes the feedback loop where repo-guardian creates a
catalog-info.yaml PR, and after merge, immediately reconciles the custom
properties rather than waiting for the weekly scheduler.
The GitHub App must subscribe to the push event (one-time manual config).
No new permissions are required.
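The three handler steps can be sketched as a pure function over the push payload, which keeps the filter free of API calls. The struct decodes only the payload fields the steps need (ref, repository.default_branch, commits[].added, commits[].modified); the names pushEvent and ShouldEnqueue are hypothetical, and signature verification and queueing are out of scope here.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// pushEvent holds only the push payload fields this sketch needs.
type pushEvent struct {
	Ref        string `json:"ref"`
	Repository struct {
		FullName      string `json:"full_name"`
		DefaultBranch string `json:"default_branch"`
	} `json:"repository"`
	Commits []struct {
		Added    []string `json:"added"`
		Modified []string `json:"modified"`
	} `json:"commits"`
}

// ShouldEnqueue reports whether a push to the default branch added or
// modified any watched path. watched is the set of paths from rules whose
// reconcile block has watch = true.
func ShouldEnqueue(payload []byte, watched map[string]bool) (bool, error) {
	var ev pushEvent
	if err := json.Unmarshal(payload, &ev); err != nil {
		return false, err
	}
	// Step 1: only pushes to the default branch are relevant.
	if ev.Ref != "refs/heads/"+ev.Repository.DefaultBranch {
		return false, nil
	}
	// Step 2: scan added and modified files in every commit.
	for _, c := range ev.Commits {
		for _, files := range [][]string{c.Added, c.Modified} {
			for _, f := range files {
				if watched[f] {
					// Step 3: the caller enqueues the repo.
					return true, nil
				}
			}
		}
	}
	return false, nil
}

func main() {
	payload := []byte(`{
		"ref": "refs/heads/main",
		"repository": {"full_name": "myorg/api", "default_branch": "main"},
		"commits": [{"added": ["catalog-info.yaml"], "modified": []}]
	}`)
	ok, err := ShouldEnqueue(payload, map[string]bool{"catalog-info.yaml": true})
	fmt.Println("enqueue:", ok, "error:", err)
}
```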
Config Loading and Env Var Interaction
The HCL config file is optional. When absent, the system falls back to the current env-var-only behavior, preserving full backward compatibility.
When present, the loading order is:
- Built-in defaults (same as today)
- guardian.hcl overrides defaults (all settings including operational)
- Environment variables override guardian.hcl
This means env vars always win, which is important for Kubernetes deployments where secrets (webhook secret, private key) should never be in the HCL file.
The config file path is set via GUARDIAN_CONFIG env var (default:
/etc/repo-guardian/guardian.hcl).
Migration from Current System
The refactoring is designed to be non-breaking:
- The existing DefaultRules in internal/rules/registry.go become the built-in defaults when no HCL config is present
- The existing env vars continue to work unchanged
- The custom properties checker logic moves into a custom_properties reconciler but the behavior is identical
- The CUSTOM_PROPERTIES_MODE env var continues to work, configuring the built-in catalog_info rule's reconciler
A deployment can migrate incrementally: start using guardian.hcl for new
rules and overrides while keeping existing env vars for operational settings.
Helm Chart Changes
Add support for mounting guardian.hcl via ConfigMap:
```yaml
# values.yaml
policy:
  # -- Inline HCL policy configuration
  config: ""
  # -- Use an existing ConfigMap for the policy file
  existingConfigMap: ""
```
Both inline and external ConfigMap are supported. When policy.config is
set, the chart creates a ConfigMap with the HCL content and mounts it at
/etc/repo-guardian/guardian.hcl. When policy.existingConfigMap is set,
the chart mounts that ConfigMap instead.
Alternatives Considered
1. YAML Configuration File
YAML is more universally known, but it lacks HCL's expressiveness for this use case. Policy rules with nested blocks, optional reconcilers, per-rule overrides, and glob patterns are awkward in YAML. YAML supports comments, but it has no typed blocks, expressions, or references between blocks. Since the team already uses HCL for Terraform, the learning curve is minimal.
2. CUE
CUE provides stronger validation and typing than HCL, but it's less mainstream, has a steeper learning curve, and the Go ecosystem support is less mature. It would be overkill for a configuration file that's essentially a list of rules with optional blocks.
3. Keep Env Vars, Add a JSON/YAML Rules File
Split the problem: keep env vars for operational config, add a separate rules file for compliance checks. This avoids the HCL dependency but means two configuration mechanisms that don't compose well. Per-rule overrides and ignore lists would still be awkward in YAML.
4. Go Plugin System (No Config File)
Keep rules in Go code but make them more pluggable via interfaces. This maintains type safety but doesn't solve the "add a rule without recompiling" problem. It also doesn't address ignore lists or per-repo overrides.
Implementation Phases
Phase 1: HCL Config Parser and Policy Types
Define the Go types that represent the HCL schema (PolicyConfig, Rule,
Reconciler, IgnoreList). Implement the HCL parser using
hashicorp/hcl/v2. Write a config loader that supports single file and
directory loading, and merges HCL settings with env var overrides.
Deliverable: internal/policy package that can parse guardian.hcl and
produce a typed config struct. Comprehensive tests for parsing, validation,
env var override behavior, and directory loading.
Phase 2: Rule Engine Refactor
Replace the hardcoded FileRule registry with a generic rule engine that
reads from the parsed policy config. The checker engine accepts rules from
either the policy config or the built-in defaults (when no config file is
present). Support all three check modes (exists, contains, exact).
Deliverable: Existing file rules (CODEOWNERS, dependabot, renovate) work identically whether configured via HCL or built-in defaults. Content assertions work for both regex and YAML path matching. All existing tests pass.
Phase 3: Reconciler Interface and Custom Properties Migration
Define the Reconciler interface. Migrate the existing custom properties
checker into a custom_properties reconciler with a fixed schema. The
reconciler is attached to the catalog_info rule and runs when
catalog-info.yaml exists.
Deliverable: Custom properties behavior is identical but driven by the
reconciler interface. The CUSTOM_PROPERTIES_MODE env var continues to work
as a shorthand for configuring the built-in reconciler.
Phase 4: Ignore Lists
Implement global and per-rule ignore lists with glob matching. The checker
engine checks ignore lists before evaluating rules. Ignore lists are
configured in guardian.hcl.
Deliverable: Repos and rules can be excluded via HCL config. Tests cover exact match, glob patterns, global vs per-rule scoping.
Phase 5: Push Event Handler
Add push event handling to the webhook handler. When a rule has
reconcile { watch = true }, pushes to the default branch that modify the
rule's watched files trigger re-evaluation.
Deliverable: Merging a catalog-info.yaml PR triggers an immediate
custom properties rescan. Metric: repos_checked_total{trigger="push"}.
Phase 6: Additional Rule Types
Implement rule "setting" and rule "branch_protection" types. Define
their HCL schemas, checker logic, and reconciler implementations
(label_sync, branch_protection, workflow_sync).
Deliverable: Setting and branch protection rules can be declared in
guardian.hcl and evaluated by the checker engine.
Phase 7: Helm Chart and Documentation
Update the Helm chart to support guardian.hcl via inline values or
external ConfigMap. Update documentation, migration guide, and examples.
Deliverable: Users can deploy with HCL config via Helm values. README and chart values document the new configuration model.
Risks and Mitigations
| Risk | Impact | Likelihood | Mitigation |
|---|---|---|---|
| HCL dependency adds complexity | Medium | Low | hashicorp/hcl/v2 is a single, stable, well-maintained dependency. It's the same library Terraform uses. |
| Breaking existing deployments | High | Low | HCL config is optional. Existing env-var-only deployments work unchanged. Migration is incremental. |
| Push event volume overwhelms the system | Medium | Medium | Filter in the webhook handler (no API calls). Work queue drops jobs when full. Can remove push event subscription as a kill switch. |
| Over-engineering for current scale | Low | Medium | Phase the implementation. Each phase delivers standalone value. |
| Config file drift between environments | Medium | Medium | Env vars override HCL, so secrets and per-env settings stay in Kubernetes Secrets/ConfigMaps. The HCL file is the same across envs. |
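The "work queue drops jobs when full" mitigation in the table above is commonly implemented in Go as a non-blocking send on a buffered channel. The sketch below shows that pattern; tryEnqueue is a hypothetical name, and whether the existing queue already works this way is not stated in this RFC.

```go
package main

import "fmt"

// tryEnqueue adds repo to the queue without blocking. It returns false
// when the queue is full, in which case the job is dropped and the next
// scheduler cycle picks the repo up instead.
func tryEnqueue(queue chan<- string, repo string) bool {
	select {
	case queue <- repo:
		return true
	default:
		return false
	}
}

func main() {
	// queue_size = 2 for demonstration; the example config uses 1000.
	queue := make(chan string, 2)
	for _, repo := range []string{"myorg/a", "myorg/b", "myorg/c"} {
		if !tryEnqueue(queue, repo) {
			fmt.Println("queue full, dropped:", repo)
		}
	}
	fmt.Println("queued:", len(queue))
}
```

Because the webhook handler never blocks on a full queue, a burst of push events cannot stall request handling; a counter for dropped jobs would make the drop rate visible.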
Resolved Questions
- Single file vs directory loading: Support both from the start. A single file or a directory of .hcl files. GUARDIAN_CONFIG can point to either.
- Operational settings location: All settings (including operational ones like worker_count, schedule_interval) are declared in the HCL file with defaults. Environment variables override any HCL setting.
- Reconciler mapping style: Fixed schema per reconciler type. Each reconciler defines its own typed fields (e.g., owner, component, jira_project for custom_properties). Adding new fields requires Go code changes, but validation happens at config load time.
- Reconciler types: Four concrete types planned: custom_properties, label_sync, branch_protection, workflow_sync. The interface is justified.
- Content validation: Yes, three check modes: exists (presence only), contains (presence + content assertions), exact (must match template). Assertions support regex (pattern / not_pattern) for unstructured files and YAML path matching (yaml_path + contains / equals) for structured files. Parser is inferred from file extension.
- Rule types: Design the schema for multiple rule types from the start: rule "file", rule "setting", rule "branch_protection". The two-label HCL block syntax (rule "<type>" "<name>") naturally accommodates this.
- Config delivery in Kubernetes: Both inline Helm values (policy.config) and external ConfigMap (policy.existingConfigMap).
Success Criteria
- Adding a new file presence rule requires only a guardian.hcl change -- no Go code, no recompile, and no new image build (just a ConfigMap update and rollout)
- Content assertions catch files that exist but are incorrect (placeholder owners, missing ecosystems)
- Ignoring a repo entirely, or skipping one rule for specific repos, is a one-line config change
- catalog-info.yaml changes are detected within seconds of merge (push event handler) rather than up to 168h (scheduler)
- Existing deployments with env-var-only configuration continue to work unchanged
- The custom_properties reconciler produces identical behavior to the current special-cased checker
- Setting and branch protection rules can be declared and enforced via HCL
References
- RFC-0001: Repo Compliance App
- DESIGN-0001: Custom Properties from Backstage
- hashicorp/hcl v2
- GitHub Push Event Payload
- Current rule registry: internal/rules/registry.go
- Current custom properties checker: internal/checker/properties.go
- Current webhook handler: internal/webhook/handler.go