# GCP Landing Zone Reference

> Working reference for a Google Cloud Platform enterprise-scale landing zone. Aligned with Google's Cloud Foundation Toolkit + security best practices. Resource hierarchy, network topology, IAM boundaries, logging, policy guardrails, and a rollout order that has survived audit. MIT licensed.

**Use case:** you are a net-new GCP adopter or consolidating from scattered projects. You want a foundation that won't have to be rebuilt in 18 months.

---

## 0. Reference snapshot

| Layer | Choice | Rationale |
|---|---|---|
| Hierarchy | Organization → Folders (env + business unit) → Projects | Inherited IAM + policy |
| Network | Shared VPC per environment | Central governance, decentralised consumption |
| Connectivity hub | HA VPN or Cloud Interconnect + Cloud Router | Dual-tunnel is the minimum resilient pattern |
| Identity | Cloud Identity + Google Workspace OR federated from external IdP | Federate unless you need GCP-native users |
| Logging | Org-level aggregated sink → centralised logging project | Tamper-resistant, cross-project |
| Policy | Organization Policies + custom constraints | Guardrails before guardrails are needed |
| IaC | Terraform with the Cloud Foundation Toolkit modules | Official reference, testable |

---

## 1. Resource hierarchy

```
organization (example.com)
├── folder: bootstrap
│   └── project: prj-b-bootstrap      (seed state, CI/CD for IaC)
├── folder: common
│   ├── project: prj-c-billing        (BigQuery billing export)
│   ├── project: prj-c-logging        (aggregated log sinks)
│   ├── project: prj-c-monitoring     (Cloud Monitoring scope)
│   ├── project: prj-c-secrets        (Secret Manager)
│   └── project: prj-c-dns            (Cloud DNS)
├── folder: network
│   ├── project: prj-n-shared-base-prod    (Shared VPC host, base tier)
│   ├── project: prj-n-shared-base-nonprod
│   ├── project: prj-n-shared-restricted-prod (VPC-SC perimeter)
│   └── project: prj-n-shared-restricted-nonprod
├── folder: security
│   ├── project: prj-sec-secrets
│   └── project: prj-sec-scc          (Security Command Center hub)
├── folder: prod
│   ├── folder: bu-retail
│   │   ├── project: prj-p-retail-app
│   │   └── project: prj-p-retail-data
│   └── folder: bu-finance
├── folder: nonprod
│   └── (same pattern as prod)
└── folder: dev
    └── (sandbox per engineer / team)
```

**Naming convention:** `prj-<env>-<bu>-<service>` keeps the console searchable and billing reports legible; shared projects omit `<bu>`. `env` is `b` (bootstrap), `c` (common), `n` (network), `sec` (security), `p` (prod), `np` (nonprod), `d` (dev).
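A minimal sketch of how the convention could be enforced in a pre-commit hook or CI check. The helper name `build_project_id` is hypothetical; the length and character rules are the standard GCP project ID constraints (6 to 30 characters, lowercase letters, digits and hyphens, starting with a letter, not ending with a hyphen).

```python
import re
from typing import Optional

# Environment codes from the convention above. "sec" is included because the
# hierarchy diagram uses it (prj-sec-scc).
ENV_CODES = {"b", "c", "n", "sec", "p", "np", "d"}

# GCP project ID rules: 6-30 chars, lowercase letters / digits / hyphens,
# must start with a letter and must not end with a hyphen.
PROJECT_ID_RE = re.compile(r"[a-z][a-z0-9-]{4,28}[a-z0-9]")


def build_project_id(env: str, service: str, bu: Optional[str] = None) -> str:
    """Assemble prj-<env>[-<bu>]-<service> and validate the result."""
    if env not in ENV_CODES:
        raise ValueError(f"unknown env code: {env!r}")
    parts = ["prj", env] + ([bu] if bu else []) + [service]
    project_id = "-".join(parts)
    if not PROJECT_ID_RE.fullmatch(project_id):
        raise ValueError(f"not a valid GCP project ID: {project_id!r}")
    return project_id


print(build_project_id("p", "app", bu="retail"))
print(build_project_id("c", "logging"))
```

Failing fast on an unknown `env` code or an over-long ID is cheaper than discovering it when the project create call is rejected.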

---

## 2. Network topology

### 2.1 Shared VPC model

The central network team owns two Shared VPC host projects per environment: **base** (general workloads) and **restricted** (VPC Service Controls perimeter for sensitive data). Workload projects are *service projects* attached to them.

```
[on-prem / other cloud]
        │
  HA VPN × 2 tunnels    ← Cloud Interconnect upgrade path
        │
   Cloud Router (BGP)
        │
   Shared VPC host-prod (base)   ──────  Shared VPC host-prod (restricted)
        │                                      │
   subnets per region                     subnets for sensitive data services
        │
   service projects
   (workloads)
```

### 2.2 Subnet plan

Per region, per tier:

| Subnet | CIDR | Purpose |
|---|---|---|
| `sn-p-base-<region>-gke` | `/20` | GKE nodes, with `/14` pod secondary + `/20` service secondary |
| `sn-p-base-<region>-app` | `/22` | Compute Engine app tier |
| `sn-p-base-<region>-data` | `/24` | Private Service Connect endpoints for Cloud SQL etc. |
| `sn-p-base-<region>-mgmt` | `/26` | Bastion, jump hosts |
| `sn-p-restricted-<region>-*` | `/22` | VPC-SC-perimeter subnets |

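Use one RFC 1918 supernet per region to keep routing and peering straightforward (`10.128.0.0/16` for europe-west2, `10.129.0.0/16` for us-central1, etc.). A `/14` pod range is larger than a `/16`, so allocate GKE secondary ranges from a separate block. Never overlap with on-prem.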
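The subnet plan can be checked with plain arithmetic before any Terraform is written. The sketch below uses Python's standard `ipaddress` module; the function names `carve` and `find_overlaps` and the short subnet labels are illustrative, not part of any toolkit.

```python
import ipaddress
from typing import Dict, Iterable, List, Tuple


def carve(supernet: str, plan: Iterable[Tuple[str, int]]) -> Dict[str, ipaddress.IPv4Network]:
    """Allocate subnets from a regional supernet, largest first, with no gaps wasted on alignment."""
    parent = ipaddress.ip_network(supernet)
    cursor = int(parent.network_address)
    allocated = {}
    for name, prefix in sorted(plan, key=lambda item: item[1]):
        size = 2 ** (32 - prefix)
        cursor = -(-cursor // size) * size  # round up to the subnet's alignment
        candidate = ipaddress.ip_network((cursor, prefix))
        if not candidate.subnet_of(parent):
            raise ValueError(f"{name} (/{prefix}) does not fit in {parent}")
        allocated[name] = candidate
        cursor += size
    return allocated


def find_overlaps(allocated: Dict[str, ipaddress.IPv4Network], on_prem_cidrs: Iterable[str]) -> List[str]:
    """Return the names of allocated subnets that collide with any on-prem range."""
    on_prem = [ipaddress.ip_network(cidr) for cidr in on_prem_cidrs]
    return [name for name, net in allocated.items() if any(net.overlaps(o) for o in on_prem)]


base_plan = [("gke", 20), ("app", 22), ("data", 24), ("mgmt", 26)]
subnets = carve("10.128.0.0/16", base_plan)
for name, net in subnets.items():
    print(f"{name:5s} {net}")
```

Asking `carve` for a `/14` inside a `/16` raises `ValueError`, which is the quickest way to see why the pod secondary range needs its own address block.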

### 2.3 Firewall baseline

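Delete the `default` network and its pre-populated allow rules (or block its creation with `constraints/compute.skipDefaultNetworkCreation`). Apply hierarchical firewall policies at the organisation level: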

- **Deny** ingress from the internet to RFC 1918 ranges (defence-in-depth)
- **Allow** health checks from Google LB ranges (`35.191.0.0/16`, `130.211.0.0/22`)
- **Allow** IAP TCP forwarding (`35.235.240.0/20`) to TCP 22 for managed SSH access
- **Deny** egress to known bad / mining pool ranges (via Threat Intel address group)
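- **Allow** egress to the Private Google Access VIPs (`199.36.153.8/30` for `private.googleapis.com`; `199.36.153.4/30` for `restricted.googleapis.com` in the restricted tier)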

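Policies are evaluated top-down: organisation, then folder (environment), then VPC firewall rules. A lower level only gets a say where the level above has no matching rule or delegates with `goto_next`, so workload-level VPC rules cannot override an organisation-level allow or deny.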
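The evaluation order is easier to reason about with a toy model. The sketch below is a deliberately simplified ingress-only simulation: it matches on source CIDR alone (no ports, protocols, targets or tags), and the `Rule` class, rule sets and priorities are invented for illustration.

```python
import ipaddress
from dataclasses import dataclass
from typing import List


@dataclass
class Rule:
    priority: int  # lower number is evaluated first
    src: str       # source CIDR
    action: str    # "allow", "deny" or "goto_next"


def evaluate(src_ip: str, layers: List[List[Rule]]) -> str:
    """Walk org policy, folder policy, then VPC rules; first terminal match wins."""
    ip = ipaddress.ip_address(src_ip)
    for rules in layers:
        for rule in sorted(rules, key=lambda r: r.priority):
            if ip in ipaddress.ip_network(rule.src):
                if rule.action == "goto_next":
                    break  # delegate the decision to the next layer down
                return rule.action
    return "deny"  # implied deny for ingress when nothing matched


ORG = [
    Rule(100, "35.235.240.0/20", "allow"),   # IAP
    Rule(200, "10.0.0.0/8", "goto_next"),    # internal traffic: let lower levels decide
    Rule(1000, "0.0.0.0/0", "deny"),         # everything else
]
FOLDER = [Rule(100, "10.128.0.0/16", "allow")]
VPC = [Rule(1000, "0.0.0.0/0", "allow")]     # an over-broad workload rule

for ip in ("35.235.240.5", "10.128.1.1", "8.8.8.8"):
    print(ip, evaluate(ip, [ORG, FOLDER, VPC]))
```

Note that the over-broad VPC rule never sees internet traffic, because the organisation policy has already denied it.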

### 2.4 Private Google Access + Private Service Connect

- Enable Private Google Access on every subnet.
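- Add a Cloud DNS private zone or response policy that maps `*.googleapis.com` to `private.googleapis.com` (`199.36.153.8/30`), mirror it in on-prem DNS, and advertise the range from Cloud Router so on-prem traffic over VPN / Interconnect resolves and routes privately.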
- Use Private Service Connect for consuming Google APIs and published services.
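For reference, the record set behind that DNS mapping is small enough to generate. This sketch only builds the records as tuples; it does not call Cloud DNS, and the `record_set` helper is hypothetical. The two VIP ranges are the published ones for `private.googleapis.com` and `restricted.googleapis.com`.

```python
import ipaddress
from typing import List, Tuple

GOOGLE_API_VIPS = {
    "private.googleapis.com": "199.36.153.8/30",
    "restricted.googleapis.com": "199.36.153.4/30",  # use inside a VPC-SC perimeter
}


def record_set(vip_name: str) -> List[Tuple[str, str, str]]:
    """Return (name, type, value) records: a wildcard CNAME plus one A record per VIP address."""
    cidr = GOOGLE_API_VIPS[vip_name]
    records = [("*.googleapis.com.", "CNAME", f"{vip_name}.")]
    # Iterating a network yields every address in it, all four for a /30.
    records += [(f"{vip_name}.", "A", str(ip)) for ip in ipaddress.ip_network(cidr)]
    return records


for record in record_set("private.googleapis.com"):
    print(*record)
```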

---

## 3. IAM boundaries

### 3.1 Service-account strategy

- **One service account per workload role.** Not per person, not per project.
- Use Workload Identity Federation for CI/CD — stop minting JSON keys.
- For GKE: Workload Identity, not node-level service accounts.
- Rotate any JSON keys that must exist (90 days max) and alert before they reach that age.
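A sketch of the key-age check, assuming you already have each key's `validAfterTime` (the creation timestamp on a service account key resource). The 14-day warning window and the status labels are assumptions; pick whatever lead time your rotation process needs.

```python
from datetime import datetime, timedelta, timezone

MAX_KEY_AGE = timedelta(days=90)   # from the policy above
WARN_BEFORE = timedelta(days=14)   # assumed lead time for the alert


def classify_key(valid_after: str, now: datetime) -> str:
    """Return 'ok', 'rotate-soon' or 'expired' for a key created at valid_after (RFC 3339, UTC)."""
    # fromisoformat() only accepts a trailing 'Z' from Python 3.11, so normalise it.
    created = datetime.fromisoformat(valid_after.replace("Z", "+00:00"))
    age = now - created
    if age >= MAX_KEY_AGE:
        return "expired"
    if age >= MAX_KEY_AGE - WARN_BEFORE:
        return "rotate-soon"
    return "ok"


now = datetime(2024, 6, 1, tzinfo=timezone.utc)
for stamp in ("2024-05-01T00:00:00Z", "2024-03-10T00:00:00Z", "2024-01-01T00:00:00Z"):
    print(stamp, classify_key(stamp, now))
```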

### 3.2 Human access

- All humans federated from the corporate IdP into Cloud Identity.
- Group-based role assignment — no direct user bindings.
- Privileged groups guarded by IAM Conditions (e.g., time-bounded, region-locked).
- Just-in-time elevation via Privileged Access Manager or equivalent.
- No `Owner` or `Editor` basic (formerly primitive) roles in production. Period.
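Two of these rules are mechanical enough to lint. The sketch below scans an IAM policy in the JSON shape that `gcloud projects get-iam-policy --format=json` returns (`bindings` with `role` and `members`); the function name and finding messages are illustrative.

```python
from typing import Dict, List

BANNED_ROLES = {"roles/owner", "roles/editor"}


def audit_policy(policy: Dict) -> List[str]:
    """Flag basic-role grants and direct user bindings in a production IAM policy."""
    findings = []
    for binding in policy.get("bindings", []):
        role = binding["role"]
        for member in binding.get("members", []):
            if role in BANNED_ROLES:
                findings.append(f"basic role {role} granted to {member}")
            if member.startswith("user:"):
                findings.append(f"direct user binding {member} on {role}")
    return findings


sample = {
    "bindings": [
        {"role": "roles/owner", "members": ["user:alice@example.com"]},
        {"role": "roles/viewer", "members": ["group:devs@example.com"]},
    ]
}
for finding in audit_policy(sample):
    print(finding)
```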

### 3.3 Recommended predefined roles

| Persona | Roles |
|---|---|
| Platform engineer | `roles/viewer` org-wide, `roles/compute.networkAdmin` on network folder |
| Developer | `roles/editor` in their dev project only, `roles/viewer` in prod for debugging |
| SRE on-call | `roles/monitoring.editor`, `roles/logging.viewer`, JIT `roles/compute.instanceAdmin.v1` |
| Security auditor | `roles/iam.securityReviewer`, `roles/securitycenter.adminViewer` |
| Billing admin | `roles/billing.admin` at billing-account level, no project-level access |

---

## 4. Organization Policies (guardrails)

Enable these at org level unless there is a stated exception:

- `constraints/iam.disableServiceAccountKeyCreation`: forces Workload Identity Federation.
- `constraints/iam.disableServiceAccountKeyUpload`: no imported keys.
- `constraints/compute.requireShieldedVm`: secure boot + vTPM.
- `constraints/compute.restrictSharedVpcSubnetworks`: only approved subnets.
- `constraints/compute.vmExternalIpAccess`: deny external IPs by default.
- `constraints/sql.restrictPublicIp`: Cloud SQL private IP only.
- `constraints/storage.publicAccessPrevention`: enforces public access prevention on every bucket.
- `constraints/storage.uniformBucketLevelAccess`: kill ACL drift.
- `constraints/gcp.resourceLocations`: restrict to EU/UK or your approved regions.
- `constraints/iam.allowedPolicyMemberDomains`: only principals from your own Cloud Identity customer ID(s).
- `constraints/gcp.restrictServiceUsage`: allow-list which APIs can be used (`constraints/serviceuser.services` only supports a short deny list).

---

## 5. Logging + monitoring

### 5.1 Aggregated sinks

Create these as aggregated sinks on the organization, with destinations in `prj-c-logging`:

- **Audit log sink** (admin activity + data access + system events) → BigQuery dataset with 1-year retention + archive to a GCS bucket with a locked retention policy (Bucket Lock).
- **Access Transparency sink** → same BigQuery + GCS.
- **VPC Flow Logs sink** per network host project.
- **Firewall logs sink** per network host project.

Sinks **include children** so every workload project's logs flow in automatically.
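A sink has exactly one destination, so "BigQuery + GCS" in practice means two sinks with the same filter. The sketch below builds sink definitions as plain dicts using the `LogSink` field names (`destination`, `filter`, `includeChildren`); it makes no API calls. The organisation ID, dataset name, bucket name and the `sink_spec` helper are hypothetical, and the filter is an illustrative substring match on audit log names.

```python
from typing import Dict, Optional

ORG_ID = "123456789012"            # hypothetical organisation ID
LOGGING_PROJECT = "prj-c-logging"  # destination project from the hierarchy above

AUDIT_FILTER = 'logName:"cloudaudit.googleapis.com"'


def sink_spec(name: str, log_filter: str, *,
              dataset: Optional[str] = None, bucket: Optional[str] = None) -> Dict:
    """Build one aggregated sink definition; exactly one destination must be given."""
    if (dataset is None) == (bucket is None):
        raise ValueError("give exactly one of dataset or bucket")
    if dataset is not None:
        destination = f"bigquery.googleapis.com/projects/{LOGGING_PROJECT}/datasets/{dataset}"
    else:
        destination = f"storage.googleapis.com/{bucket}"
    return {
        "parent": f"organizations/{ORG_ID}",  # where the sink is created, not a LogSink field
        "name": name,
        "destination": destination,
        "filter": log_filter,
        "includeChildren": True,  # pull in logs from every folder and project underneath
    }


sinks = [
    sink_spec("sk-audit-bq", AUDIT_FILTER, dataset="audit_logs"),
    sink_spec("sk-audit-gcs", AUDIT_FILTER, bucket="bkt-c-logging-audit-archive"),
]
for sink in sinks:
    print(sink["name"], "->", sink["destination"])
```

Remember that each sink's writer identity still needs write access on its destination before any logs arrive.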

### 5.2 Retention

| Log | Hot | Cold |
|---|---|---|
| Admin activity | 1 year BigQuery | 7 years GCS with Bucket Lock |
| Data access | 90 days BigQuery | 1 year GCS |
| Access Transparency | 1 year BigQuery | 7 years GCS |
| VPC Flow | 30 days BigQuery | 1 year GCS |
| Application | 30 days Cloud Logging | Your call |

Tune per regulatory bar (PCI = 1 year minimum, with 3 months immediately available; HIPAA = 6 years).
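One way to keep the table honest is to encode it and test it against the regulatory minimums quoted above. This sketch assumes the cold figure is total retention measured from ingestion and treats a year as 365 days; the dictionary keys and the `meets` helper are illustrative.

```python
from typing import Dict, Tuple

# (hot days, cold days) per log type, copied from the retention table.
RETENTION_DAYS: Dict[str, Tuple[int, int]] = {
    "admin_activity": (365, 7 * 365),
    "data_access": (90, 365),
    "access_transparency": (365, 7 * 365),
    "vpc_flow": (30, 365),
}

# (minimum total days, minimum immediately-available days)
PCI = (365, 90)
HIPAA = (6 * 365, 0)


def meets(log: str, requirement: Tuple[int, int]) -> bool:
    """True if the log's hot tier and overall retention satisfy the requirement."""
    hot, cold = RETENTION_DAYS[log]
    min_total, min_hot = requirement
    return hot >= min_hot and max(hot, cold) >= min_total


for log in RETENTION_DAYS:
    print(f"{log:20s} PCI={meets(log, PCI)} HIPAA={meets(log, HIPAA)}")
```

With the defaults in the table, VPC Flow logs fall short of the PCI hot-tier bar and data access logs fall short of six years, which is exactly the kind of gap the "tune per regulatory bar" note is about.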

### 5.3 Alerting

Security Command Center Premium feeding into your SIEM. Minimum alerts:

- Privilege escalation (role grant to Owner / Editor outside approved break-glass)
- New project creation outside expected folders
- Public bucket / public dataset / public Cloud Run service
- Service account key creation
- VPN / Interconnect tunnel down
- Suspicious Workload Identity federation binding

---

## 6. Security baselines

### 6.1 VPC Service Controls

Create a restricted perimeter around sensitive services: BigQuery, Cloud Storage, Cloud SQL, KMS, Secret Manager, Dataflow. Ingress rules for your CI/CD identity; egress rules only to your own projects. Dry-run for 2 weeks before enforcing.

### 6.2 Key management

- CMEK for every data-at-rest service that supports it (Cloud Storage, BigQuery, Cloud SQL, Persistent Disk, Pub/Sub).
- Host KMS key rings in a dedicated project (`prj-c-secrets` in the hierarchy above, or a separate KMS-only project), never in the workload projects that consume the keys.
- Rotation: 90 days for keys protecting high-sensitivity data, 365 otherwise.
- HSM-backed for PCI / regulated workloads.
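The rotation and protection rules translate directly into key settings. Cloud KMS expresses the rotation period as a duration string in seconds (for example `"7776000s"`), and protection levels include `SOFTWARE` and `HSM`. The `key_settings` helper and its two boolean inputs are assumptions made for this sketch.

```python
from typing import Dict

SECONDS_PER_DAY = 86_400


def key_settings(high_sensitivity: bool, regulated: bool) -> Dict[str, str]:
    """Map the policy above onto a rotation period and protection level."""
    days = 90 if high_sensitivity else 365
    return {
        "rotation_period": f"{days * SECONDS_PER_DAY}s",
        "protection_level": "HSM" if regulated else "SOFTWARE",
    }


print(key_settings(high_sensitivity=True, regulated=True))
print(key_settings(high_sensitivity=False, regulated=False))
```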

### 6.3 Secret Manager

All application secrets live in Secret Manager and are read at runtime using the workload's own identity. No secrets in code or in env files checked into the repo. Access is logged to the audit sink.

### 6.4 Binary Authorization (for GKE)

Require signed container images. Signers = your build pipeline. Break-glass project for emergencies.

---

## 7. IaC + build pipeline

- Terraform + Google Cloud Foundation Toolkit modules.
- Remote state in a GCS bucket in the bootstrap project, with object versioning enabled; the `gcs` backend handles state locking.
- CI/CD via Cloud Build or GitHub Actions using Workload Identity Federation (no keys).
- Plan stage on every PR; apply stage only on merge to main + approval.
- Separate state files per folder / project to bound blast radius.

---

## 8. Billing + cost management

- Enable BigQuery billing export on day one.
- Budgets per project + email + Pub/Sub alerts at 50%/80%/100%.
- Use labels consistently: `env`, `bu`, `app`, `cost-centre`.
- Monthly report to department leads.
- Committed use discounts reviewed quarterly.
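A small sketch of the two checks most worth automating here: which budget thresholds a project has crossed, and which required labels a resource is missing. The function names are illustrative; thresholds are expressed as fractions, the same way budget threshold rules are (1.0 = 100%).

```python
from typing import Dict, List

THRESHOLDS = (0.5, 0.8, 1.0)
REQUIRED_LABELS = ("env", "bu", "app", "cost-centre")


def crossed_thresholds(spend: float, budget: float) -> List[float]:
    """Return every alert threshold the current spend has reached."""
    return [t for t in THRESHOLDS if spend >= t * budget]


def missing_labels(labels: Dict[str, str]) -> List[str]:
    """Return required label keys that are absent or empty."""
    return [key for key in REQUIRED_LABELS if not labels.get(key)]


print(crossed_thresholds(spend=850, budget=1000))
print(missing_labels({"env": "p", "bu": "retail"}))
```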

---

## 9. Rollout order

Do not try to stand this all up at once. Recommended order:

1. Bootstrap project + Terraform state + CI/CD with Workload Identity Federation.
2. Resource hierarchy (org, folders).
3. Org-level policies (guardrails).
4. Common projects (billing, logging, monitoring, secrets, DNS).
5. Logging sinks + aggregated exports.
6. Network host projects + HA VPN to on-prem.
7. First set of workload projects attached as service projects.
8. Security Command Center + SIEM connector.
9. First production workload migration.
10. VPC Service Controls perimeter (dry-run first).
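If you reorder the steps for your own programme, it helps to check the new order against the dependencies between them. The sketch below encodes the list above plus a set of dependencies; the step identifiers and the `DEPENDS_ON` edges are assumptions inferred from the ordering, not something the toolkit prescribes.

```python
from typing import Dict, List, Tuple

ROLLOUT = [
    "bootstrap", "hierarchy", "org_policies", "common_projects", "logging_sinks",
    "network", "service_projects", "scc_siem", "prod_migration", "vpc_sc",
]

# step -> steps that must already be done (assumed, adjust to your environment)
DEPENDS_ON: Dict[str, List[str]] = {
    "hierarchy": ["bootstrap"],
    "org_policies": ["hierarchy"],
    "common_projects": ["hierarchy"],
    "logging_sinks": ["common_projects"],
    "network": ["hierarchy"],
    "service_projects": ["network"],
    "scc_siem": ["logging_sinks"],
    "prod_migration": ["service_projects", "scc_siem"],
    "vpc_sc": ["prod_migration"],
}


def violations(order: List[str], deps: Dict[str, List[str]]) -> List[Tuple[str, str]]:
    """Return (step, prerequisite) pairs where the prerequisite is scheduled later or missing."""
    position = {step: i for i, step in enumerate(order)}
    bad = []
    for step, prereqs in deps.items():
        for prereq in prereqs:
            if prereq not in position or step not in position or position[prereq] > position[step]:
                bad.append((step, prereq))
    return bad


print(violations(ROLLOUT, DEPENDS_ON))
```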

---

## 10. Attribution

Built by **Hak** at **VantagePoint Networks**. Based on Google Cloud Foundation Toolkit, CIS GCP Benchmark, and real multi-BU GCP migrations. MIT licensed — fork, customise, ship.
