# Kubernetes NetworkPolicy Baseline Pack

> Opinionated baseline for locking down traffic inside a Kubernetes cluster. Default-deny patterns, namespace isolation, egress control, CNI-specific notes (Calico, Cilium, AWS VPC CNI), plus ready-to-apply YAML. Production-tested. MIT licensed.

**Assumptions:** you have a CNI that enforces `NetworkPolicy` (stock Kubernetes NetworkPolicy works with Calico, Cilium, Antrea, Weave; AWS VPC CNI needs `aws-network-policy-agent`). Without an enforcing CNI this is decorative.

---

## 0. Pack contents

| Section | YAML in file |
|---|---|
| 1. Starting position | Assess cluster state |
| 2. Default-deny per namespace | `default-deny.yaml` |
| 3. Allow DNS | `allow-dns.yaml` |
| 4. Tier-based allow (web → api → db) | `tier-flows.yaml` |
| 5. Kube-system exceptions | `kube-system.yaml` |
| 6. Ingress controller traffic | `ingress-controller.yaml` |
| 7. Egress to managed services | `egress-managed.yaml` |
| 8. Istio/Linkerd co-existence | notes + YAML |
| 9. Cilium-specific enhancements | `cilium-clusterwide.yaml` |
| 10. Rollout strategy | Staged enforcement |

---

## 1. Assess before you enforce

Before applying default-deny anywhere, run:

```bash
# What's running, and where?
kubectl get pods -A -o wide
# If Cilium: capture actual flows via the Hubble CLI (after `cilium hubble port-forward`)
hubble observe --since 1h --output json > /tmp/flows.json
# If Calico: review the policies already in place
kubectl get globalnetworkpolicies -o yaml
# On any CNI: list existing NetworkPolicies before adding more
kubectl get networkpolicies -A
```

Map the traffic you observe to *intended* traffic. Everything else is either legitimate-but-undocumented (fix your docs) or actually unwanted.

---

## 2. Default-deny per namespace

The first rule for every namespace: deny ingress **and** egress unless explicitly allowed.

```yaml
# default-deny.yaml  — apply per namespace
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all
  namespace: app-prod
spec:
  podSelector: {}
  policyTypes:
    - Ingress
    - Egress
```

Apply this to every namespace you manage. `kube-system`, `kube-public`, `kube-node-lease` are special — see Section 5.
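Default-deny also blocks pod-to-pod traffic *inside* the namespace. If workloads in a namespace should talk to each other freely, pair it with an intra-namespace allow — a sketch, reusing the `app-prod` namespace from above:

```yaml
# allow-intra-namespace.yaml — optional companion to default-deny
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-intra-namespace
  namespace: app-prod
spec:
  podSelector: {}
  policyTypes: [Ingress]
  ingress:
    - from:
        - podSelector: {}   # any pod in this namespace
```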

---

## 3. Allow DNS

If you deny egress and forget DNS, nothing works. Allow CoreDNS traffic explicitly:

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-dns-egress
  namespace: app-prod
spec:
  podSelector: {}
  policyTypes: [Egress]
  egress:
    - to:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: kube-system
          podSelector:
            matchLabels:
              k8s-app: kube-dns
      ports:
        - protocol: UDP
          port: 53
        - protocol: TCP
          port: 53
```

If you use NodeLocal DNSCache, the pattern changes — queries go to the node-local cache address (169.254.20.10 by default, running in the host network namespace), so you need the CNI's IP or node selectors (Cilium: `toEntities: host` / `toNodes`; Calico: `nets`).
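A minimal Cilium sketch for that case, assuming NodeLocal DNSCache runs with `hostNetwork: true` so Cilium classifies its traffic as the `host` entity — verify this against your deployment before relying on it:

```yaml
apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: allow-nodelocal-dns
  namespace: app-prod
spec:
  endpointSelector: {}
  egress:
    - toEntities:
        - host          # NodeLocal DNSCache in the host netns
      toPorts:
        - ports:
            - port: "53"
              protocol: ANY
```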

---

## 4. Tier-based allow (web → api → db)

Classic 3-tier pattern. Label pods with `tier: web|api|db`. Then:

```yaml
# web tier: ingress from the ingress controller only
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: web-ingress
  namespace: app-prod
spec:
  podSelector:
    matchLabels:
      tier: web
  policyTypes: [Ingress]
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: ingress-system
          podSelector:
            matchLabels:
              app.kubernetes.io/name: ingress-nginx
      ports:
        - protocol: TCP
          port: 8080
---
# api tier: ingress from web tier only
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: api-ingress
  namespace: app-prod
spec:
  podSelector:
    matchLabels:
      tier: api
  policyTypes: [Ingress]
  ingress:
    - from:
        - podSelector:
            matchLabels:
              tier: web
      ports:
        - protocol: TCP
          port: 8080
---
# db tier: ingress from api only, nothing else
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: db-ingress
  namespace: app-prod
spec:
  podSelector:
    matchLabels:
      tier: db
  policyTypes: [Ingress]
  ingress:
    - from:
        - podSelector:
            matchLabels:
              tier: api
      ports:
        - protocol: TCP
          port: 5432
```

Egress equivalents: web can egress to api; api can egress to db + DNS + your internal endpoints; db can egress to DNS + backup targets only.
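One of those egress equivalents sketched out — api to db (DNS is already covered by `allow-dns-egress` from Section 3):

```yaml
# api tier: egress to db tier only
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: api-egress
  namespace: app-prod
spec:
  podSelector:
    matchLabels:
      tier: api
  policyTypes: [Egress]
  egress:
    - to:
        - podSelector:
            matchLabels:
              tier: db
      ports:
        - protocol: TCP
          port: 5432
```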

---

## 5. Kube-system exceptions

`kube-system` usually needs broader reach (metrics-server scraping kubelet, cluster-dns from everywhere, autoscaler talking to cloud APIs). Do **not** apply default-deny here unless you really understand every component.

If you want to restrict kube-system, a safer approach: allow intra-namespace, allow egress to the cluster API endpoint and cloud APIs, deny everything else.

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: kube-system-intra-plus
  namespace: kube-system
spec:
  podSelector: {}
  policyTypes: [Ingress, Egress]
  ingress:
    - {}   # empty rule = allow from ALL sources, including outside the cluster
  egress:
    - {}   # allow all egress — narrow after profiling
```

Better: profile kube-system for a week first, then write specific rules.
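As an example of the kind of specific rule that profiling produces — CoreDNS must accept queries from every namespace. A sketch, assuming the standard `k8s-app: kube-dns` labels:

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: coredns-ingress
  namespace: kube-system
spec:
  podSelector:
    matchLabels:
      k8s-app: kube-dns
  policyTypes: [Ingress]
  ingress:
    - from:
        - namespaceSelector: {}   # queries arrive from every namespace
      ports:
        - protocol: UDP
          port: 53
        - protocol: TCP
          port: 53
```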

---

## 6. Ingress controller traffic

The ingress controller (nginx, Traefik, etc.) needs:
- **Ingress** from outside the cluster (from a LoadBalancer / NodePort / hostPort)
- **Egress** to every workload namespace it fronts

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: ingress-controller
  namespace: ingress-system
spec:
  podSelector:
    matchLabels:
      app.kubernetes.io/name: ingress-nginx
  policyTypes: [Ingress, Egress]
  ingress:
    - from: []     # any source (it's a public ingress point)
      ports:
        - protocol: TCP
          port: 80
        - protocol: TCP
          port: 443
  egress:
    - to:
        - namespaceSelector: {}   # to any namespace
      ports:
        - protocol: TCP
          port: 8080
```

If your ingress is internal-only, narrow the ingress clause to your VPC CIDR with an `ipBlock` selector (supported in stock NetworkPolicy). Note that the source IP a policy sees depends on your LoadBalancer's `externalTrafficPolicy` and the CNI's SNAT behaviour — test before relying on it.
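A sketch of that narrowing using stock NetworkPolicy's `ipBlock` — the `10.0.0.0/16` range is a placeholder for your actual VPC CIDR:

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: ingress-controller-internal
  namespace: ingress-system
spec:
  podSelector:
    matchLabels:
      app.kubernetes.io/name: ingress-nginx
  policyTypes: [Ingress]
  ingress:
    - from:
        - ipBlock:
            cidr: 10.0.0.0/16   # placeholder: your VPC CIDR
      ports:
        - protocol: TCP
          port: 80
        - protocol: TCP
          port: 443
```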

---

## 7. Egress to managed services

Cloud-managed DBs, KMS, Secret Manager, object storage — these sit at static or near-static IPs. Stock NetworkPolicy can target CIDRs via `ipBlock`, but not DNS names, so you have two options:

- **CIDR-based egress** — stock `ipBlock`, Calico `GlobalNetworkPolicy`, or Cilium `CiliumNetworkPolicy` with `toCIDR`.
- **FQDN-based egress** — Cilium `toFQDNs` or Calico Enterprise. Preferred, because IPs change silently over time.
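The CIDR option works with stock NetworkPolicy. A sketch, where `192.168.100.0/24` is a placeholder for the subnet your managed DB endpoint resolves into:

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: api-to-managed-db
  namespace: app-prod
spec:
  podSelector:
    matchLabels:
      tier: api
  policyTypes: [Egress]
  egress:
    - to:
        - ipBlock:
            cidr: 192.168.100.0/24   # placeholder: managed-DB subnet
      ports:
        - protocol: TCP
          port: 5432
```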

Cilium example:

```yaml
apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: api-to-managed
  namespace: app-prod
spec:
  endpointSelector:
    matchLabels:
      tier: api
  egress:
    - toFQDNs:
        - matchPattern: "*.s3.eu-west-1.amazonaws.com"
        - matchName: "secretsmanager.eu-west-1.amazonaws.com"
      toPorts:
        - ports:
            - port: "443"
              protocol: TCP
```
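`toFQDNs` only works when Cilium's DNS proxy observes the pods' lookups, which requires a DNS egress rule with an L7 `rules: dns` section. A companion-policy sketch for the same `tier: api` pods:

```yaml
apiVersion: cilium.io/v2
kind: CiliumNetworkPolicy
metadata:
  name: api-dns-visibility
  namespace: app-prod
spec:
  endpointSelector:
    matchLabels:
      tier: api
  egress:
    - toEndpoints:
        - matchLabels:
            k8s:io.kubernetes.pod.namespace: kube-system
            k8s-app: kube-dns
      toPorts:
        - ports:
            - port: "53"
              protocol: ANY
          rules:
            dns:
              - matchPattern: "*"   # route lookups through the DNS proxy
```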

---

## 8. Service mesh co-existence

Istio / Linkerd enforce their own L7 policies on top. NetworkPolicy at L3/L4 is still valuable — it stops traffic before it ever reaches the mesh sidecar. Defence-in-depth.

- Keep NetworkPolicy for coarse isolation (namespace, tier).
- Use the mesh for L7 authz (JWT claims, path, method).
- Make sure sidecar injection traffic (istio-agent phoning home to istiod) is allowed.

For Istio's ambient mode, pod-to-pod traffic is tunnelled through ztunnel (HBONE on port 15008); NetworkPolicy still applies at L3/L4, but your policies must permit that tunnel traffic.
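For the sidecar model, the control-plane flow mentioned above can be allowed like this — a sketch assuming istiod runs in `istio-system` with its default labels and xDS port 15012:

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-istiod-egress
  namespace: app-prod
spec:
  podSelector: {}
  policyTypes: [Egress]
  egress:
    - to:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: istio-system
          podSelector:
            matchLabels:
              app: istiod
      ports:
        - protocol: TCP
          port: 15012   # xDS / sidecar bootstrap
```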

---

## 9. Cilium-specific enhancements

If on Cilium, you get more expressive policy. Enable these once stable:

- **CiliumClusterwideNetworkPolicy** — rules that apply cluster-wide (e.g., deny egress to `169.254.169.254` except from specific metadata-needing pods).
- **toFQDNs** with regex — lock down external egress without chasing IP churn.
- **Layer-7 HTTP rules** — for mesh-like method/path control without a sidecar.
- **Identity-based policy** — relies on Cilium identities, not just labels.

Clusterwide example — block metadata service abuse:

```yaml
apiVersion: cilium.io/v2
kind: CiliumClusterwideNetworkPolicy
metadata:
  name: deny-imds-default
spec:
  endpointSelector:
    matchExpressions:
      - key: allow-imds     # label convention you choose, not a Cilium built-in
        operator: NotIn
        values: ["true"]
  egressDeny:
    - toCIDR:
        - 169.254.169.254/32
```

Cilium deny rules take precedence over allow rules, so a separate allow policy cannot punch a hole through this deny. Instead, exempt the pods that genuinely need IMDS by labelling them — here via the `NotIn` expression on the hypothetical `allow-imds: "true"` label.

---

## 10. Rollout strategy

1. **Log-only mode first.** Calico: `action: Log` rules in tiered policies. Cilium: policy audit mode (`policy-audit-mode: "true"`). Capture for a week.
2. **Review the logs.** Every denied flow that is legitimate becomes an allow rule.
3. **Start with the least-critical namespace.** Apply default-deny + the allow rules you derived. Watch for 48h.
4. **Roll out to remaining namespaces** one per day.
5. **Apply cluster-wide policies last.**

Rollback plan: `kubectl delete networkpolicy` per namespace. Have this scripted.

---

## 11. Gotchas

- **`kubernetes.io/metadata.name` label:** auto-populated on namespaces since 1.22. Don't rely on it on older clusters.
- **HostNetwork pods:** NetworkPolicy does not apply to pods with `hostNetwork: true`. Keep them to minimum and pin them down with node-level firewalling.
- **NodePort traffic:** source IP handling differs by CNI. Ingress from LoadBalancer may appear as the node IP. Test.
- **DNS + NodeLocal DNSCache:** see Section 3.
- **Rolling updates:** replacement pods inherit the same labels, so selectors keep matching — but verify policy behaviour during an actual rolling update before trusting it.
- **Cross-cluster traffic (multi-cluster):** NetworkPolicy doesn't apply across clusters natively; use mesh or external CNI features.
- **External IPs / ExternalName services:** NetworkPolicy doesn't apply to these. Use CIDR / FQDN egress rules instead.

---

## 12. Observability

- **Hubble (Cilium):** best-in-class flow visibility. `hubble observe --verdict DENIED --since 5m`
- **Calico Flow Logs:** require Calico Enterprise, but worth it for regulated environments.
- **VPC Flow Logs (cloud):** catch anything that escapes pod-level policy.
- **NetworkPolicy dry-run audit:** Calico + Cilium both support non-enforcing mode.

---

## 13. Compliance mapping

| Framework | Control | How this helps |
|---|---|---|
| CIS Kubernetes | 5.3.2 | Minimise use of the default namespace + deny-by-default |
| NIST 800-53 | SC-7 | Boundary protection between workloads |
| PCI-DSS v4 | 1.2, 1.3 | Segmentation within the CDE |
| ISO 27001 | A.8.22 | Segregation in networks |
| Zero Trust (NIST 800-207) | Core tenets | Per-workload policy enforcement |

---

## Attribution

Built by **Hak** at **VantagePoint Networks**. Based on real production K8s clusters on EKS, GKE, AKS, and on-prem Calico. MIT licensed — fork, customise, ship.
