Infrastructure as Code: Provisioning with Terraform on Hetzner Cloud
How I transformed the architecture designed in the first article into reproducible code with Terraform. From initial setup to complete deployment with automatic Ansible inventory generation.
Context: From Design to Implementation
In the first article I documented the multi-tenant network architecture I designed.
Now I show how I implemented it using Terraform as Infrastructure as Code (IaC).
The goal was to have completely reproducible infrastructure: destroy and recreate everything in ~20 minutes
with a single terraform apply command.
The complete code is available on GitHub: magefleet/terraform-init-infra
What We'll Implement
- 4 private networks (Management, Shared, Business, Enterprise) configurable via variables
- Bastion host with NAT gateway and WireGuard VPN auto-configured via cloud-init
- Automatic generation of internal SSH keys for inter-server communication
- Reusable modules for Rancher, Vault, ArgoCD, Ecommerce
- Automatic Ansible inventory generation from Terraform state
- Outputs ready for testing and debugging
1. Initial Project Setup
1.1 Directory Structure
I organized the Terraform project with a modular structure that clearly separates responsibilities:
terraform-init-infra/
├── main.tf # Main orchestration
├── provider.tf # Hetzner provider configuration
├── networks.tf # Multi-tenant network definitions
├── variables.tf # Configurable variables
├── outputs.tf # Outputs for testing and debugging
├── terraform.tfvars # Variable values (GITIGNORED!)
├── terraform.tfvars.example # Template for terraform.tfvars
├── modules/
│ ├── rancher/ # Rancher management cluster module
│ │ ├── main.tf
│ │ ├── variables.tf
│ │ └── outputs.tf
│ ├── vault/ # HashiCorp Vault module
│ ├── argocd/ # ArgoCD GitOps module
│ ├── ecommerce/ # Magento/Ecommerce module
│ └── ansible/ # Module for inventory generation
└── templates/
├── userdata_bastion.tpl # Cloud-init for bastion host
└── userdata_cluster_node.tpl # Cloud-init for cluster nodes
Structure rationale: Each module is independent and reusable.
I can disable individual components (e.g., Rancher) via boolean variables
without modifying the code (var.enable_rancher).
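For reference, these enable flags are plain boolean variables; a minimal sketch of such a declaration (the default value here is illustrative):
# Sketch (not a repository reference)
variable "enable_rancher" {
  type        = bool
  description = "Create the Rancher management cluster"
  default     = true
}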
1.2 Hetzner Provider Configuration
The provider configuration is minimal. In the provider.tf file:
# Reference: provider.tf lines 1-3
provider "hcloud" {
token = var.hcloud_token
}
The API token is passed via variable instead of being hardcoded. In main.tf
I defined the required providers with locked versions for reproducibility:
# Reference: main.tf lines 1-16
terraform {
required_providers {
hcloud = {
source = "hetznercloud/hcloud"
version = "~> 1.52.0" # Lock version for stability
}
tls = {
source = "hashicorp/tls"
version = "~> 4.0" # For SSH key generation
}
random = {
source = "hashicorp/random"
version = "~> 3.6" # For key rotation generation
}
}
}
Best Practice: Always lock provider versions in production.
The ~> operator pins everything to the left of its last specified component: ~> 1.52.0 allows patch releases (1.52.1, 1.52.2) but blocks minor and major updates that could introduce breaking changes, while ~> 4.0 allows any 4.x release but not 5.0.
1.3 Getting the Hetzner API Token
To obtain the Hetzner Cloud API token:
- Log in to Hetzner Cloud Console
- Select your project
- Go to Security → API Tokens
- Generate a new token with Read & Write permissions
- Copy the token (it will be shown only once!)
Save the token in terraform.tfvars:
hcloud_token = "your-api-token-here"
ssh_key_name = "your-ssh-key-name"
hcloud_ssh_private_key_path = "~/.ssh/id_rsa"
Security note: The terraform.tfvars file contains secrets and MUST be
in .gitignore. I included a terraform.tfvars.example with placeholders
to document required variables.
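As a reference, a minimal .gitignore for this project could look like the following (the entries beyond terraform.tfvars cover files generated later in this article):
# .gitignore (minimal sketch)
terraform.tfvars
*.tfstate
*.tfstate.backup
.terraform/
internal_key.pem
wireguard_client_config.conf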
2. Multi-Tenant Network Implementation
2.1 Management Network (10.0.0.0/16)
The management network is the heart of the infrastructure. In the networks.tf file I implemented it like this:
# Reference: networks.tf lines 21-29
resource "hcloud_network" "private_net" {
name = "service-magefleet-network"
ip_range = var.management_network_cidr # Default: 10.0.0.0/16
labels = {
purpose = "management"
tier = "infrastructure"
}
}
I made the CIDR configurable via variable to allow customization without modifying code:
# Reference: variables.tf lines 189-199
variable "management_network_cidr" {
type = string
description = "CIDR block for management network"
default = "10.0.0.0/16"
}
variable "management_subnet_cidr" {
type = string
description = "CIDR block for management subnet"
default = "10.0.0.0/24"
}
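If you expect these CIDRs to be customized often, a validation block can catch typos at plan time; a sketch of how the variable could be extended (not how it appears in the repository):
# Sketch: CIDR validation
variable "management_network_cidr" {
  type        = string
  description = "CIDR block for management network"
  default     = "10.0.0.0/16"

  validation {
    condition     = can(cidrhost(var.management_network_cidr, 0))
    error_message = "management_network_cidr must be a valid IPv4 CIDR block."
  }
}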
The subnet is created with an explicit dependency on the network:
# Reference: networks.tf lines 31-38
resource "hcloud_network_subnet" "private_subnet" {
network_id = hcloud_network.private_net.id
type = "cloud"
network_zone = "eu-central"
ip_range = var.management_subnet_cidr
depends_on = [hcloud_network.private_net]
}
Note on network_zone: Hetzner requires specifying the network zone. "eu-central" covers the Nuremberg (nbg1), Falkenstein (fsn1) and Helsinki (hel1) datacenters.
2.2 NAT Gateway Route
The route for the NAT gateway is a critical component. It tells all VMs in the network to use the bastion as gateway to the internet:
# Reference: networks.tf lines 41-50
resource "hcloud_network_route" "to_extra_net" {
network_id = hcloud_network.private_net.id
destination = "0.0.0.0/0" # Default route
gateway = one(hcloud_server.bastion.network[*]).ip # Bastion IP
depends_on = [
hcloud_server.bastion,
hcloud_network_subnet.private_subnet
]
}
Technical explanation: The one() function takes the bastion's network attachments (exposed as a set) and returns the single element, failing if there are zero or more than one, so misconfigurations surface immediately. The depends_on ensures the bastion exists before the route is created.
2.3 Customer Networks (Shared, Business, Enterprise)
Customer networks follow the same pattern but with count/for_each conditionals to enable them only when necessary:
# Reference: networks.tf lines 58-77
resource "hcloud_network" "customers_shared" {
count = var.enable_shared_customers ? 1 : 0 # Conditional creation
name = "customers-shared-network"
ip_range = var.customers_shared_network_cidr # Default: 10.10.0.0/16
labels = {
purpose = "customers"
tier = "standard"
}
}
resource "hcloud_network_subnet" "customers_shared_workers" {
count = var.enable_shared_customers ? 1 : 0
network_id = hcloud_network.customers_shared[0].id
type = "cloud"
network_zone = "eu-central"
ip_range = var.customers_shared_subnet_cidr # Default: 10.10.0.0/24
}
Why count instead of for_each? For optional single resources I use count.
For dynamic multiple instances (e.g. business customers) I use for_each.
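To make the difference concrete, here is how the two forms are addressed elsewhere in the configuration:
# With count, the optional resource becomes a single-element list and is
# referenced by index (hence the [0] used above):
#   hcloud_network.customers_shared[0].id
# With for_each, instances are keyed by the map key, e.g. for the
# "client-acme" entry shown in the example below:
#   hcloud_network_subnet.business_customer_subnet["client-acme"].ip_range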
Business Customers with Dynamic Subnets
For business customers, each customer gets a dedicated /24 subnet. This requires a dynamic approach:
# Reference: networks.tf lines 111-120
resource "hcloud_network_subnet" "business_customer_subnet" {
for_each = var.enable_business_customers ? var.business_customers : {}
network_id = hcloud_network.customers_business[0].id
type = "cloud"
network_zone = "eu-central"
ip_range = "${var.customers_business_subnet_base}.${each.value.subnet_id}.0/24"
depends_on = [hcloud_network.customers_business]
}
The business_customers variable is a map with validation:
# Reference: variables.tf lines 266-295
variable "business_customers" {
type = map(object({
subnet_id = number # 1-254, used as: 10.20.{subnet_id}.0/24
server_type = string # e.g. "cpx31", "cpx41"
location = string # e.g. "nbg1", "fsn1"
}))
description = "Map of business tier customers with dedicated nodes"
default = {}
validation {
condition = alltrue([
for k, v in var.business_customers : v.subnet_id >= 1 && v.subnet_id <= 254
])
error_message = "subnet_id must be between 1 and 254"
}
}
# Usage example in terraform.tfvars:
# business_customers = {
# "client-acme" = {
# subnet_id = 1 # Creates 10.20.1.0/24
# server_type = "cpx41"
# location = "nbg1"
# }
# "client-beta" = {
# subnet_id = 2 # Creates 10.20.2.0/24
# server_type = "cpx41"
# location = "nbg1"
# }
# }
Validation block: Terraform automatically validates that subnet_id
is between 1 and 254 before applying. This prevents configuration errors.
3. Bastion Host with Auto-Configuration
3.1 Automatic SSH Key Generation
A feature I implemented is automatic generation of internal SSH keys for bastion → private VMs communication. This eliminates the need to manually manage keys:
# Reference: main.tf lines 19-24
resource "tls_private_key" "internal_ssh_key" {
algorithm = "RSA"
rsa_bits = 4096 # RSA 4096-bit for security
}
Terraform generates the key pair; I then inject the private key into the bastion and the public key into the private VMs. Everything is automated, with no manual key handling.
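For the other half of the exchange, here is a minimal sketch of how the public key can be handed to the cluster-node cloud-init template (the variable name inside the template is an assumption, not taken from the repository):
# Sketch: inside a cluster-node server resource
user_data = templatefile("${path.module}/templates/userdata_cluster_node.tpl", {
  internal_ssh_public_key = tls_private_key.internal_ssh_key.public_key_openssh
})
# and inside userdata_cluster_node.tpl:
#   echo "${internal_ssh_public_key}" >> /root/.ssh/authorized_keys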
3.2 Bastion Host Resource
The bastion is the most complex component because it must perform multiple functions: NAT gateway, SSH jump host, WireGuard VPN server, internal DNS server. Here's how I configured it:
# Reference: main.tf lines 31-63
resource "hcloud_server" "bastion" {
name = "bastion-host-wireguard"
server_type = "cpx11" # 2 vCPU, 2GB RAM (sufficient)
image = "ubuntu-22.04"
location = "nbg1" # Nuremberg
ssh_keys = [var.ssh_key_name] # SSH key for initial access
# Attach to management network with static IP
network {
network_id = hcloud_network.private_net.id
ip = var.bastion_private_ip # Default: 10.0.0.2
}
# Cloud-init user data for automatic configuration
user_data = templatefile("${path.module}/templates/userdata_bastion.tpl", {
private_network_ip_range = hcloud_network_subnet.private_subnet.ip_range
internal_ssh_private_key = tls_private_key.internal_ssh_key.private_key_pem
bastion_private_ip = var.bastion_private_ip
vault_cluster_ip = var.vault_private_ip
rancher_cluster_ip = var.rancher_private_ip
npm_admin_email = var.npm_admin_email
npm_admin_password = var.npm_admin_password
})
depends_on = [
hcloud_network_subnet.private_subnet
]
}
Note on user_data: Hetzner supports cloud-init. The template
userdata_bastion.tpl is populated with variables from Terraform
and executed at first VM boot.
3.3 Cloud-Init Template for NAT Configuration
The userdata_bastion.tpl template configures the bastion as NAT gateway.
Here are the most important parts:
# Reference: templates/userdata_bastion.tpl lines 4-12
#!/bin/bash -x
set -e # Exit on error
# NAT Configuration
apt update && apt upgrade -y
cat > /etc/networkd-dispatcher/routable.d/10-eth0-post-up << 'EOF_NAT_SCRIPT'
#!/bin/bash
echo 1 > /proc/sys/net/ipv4/ip_forward
iptables -t nat -A POSTROUTING -s '${private_network_ip_range}' -o eth0 -j MASQUERADE
EOF_NAT_SCRIPT
chmod +x /etc/networkd-dispatcher/routable.d/10-eth0-post-up
Technical explanation:
- ip_forward = 1: Enables IP packet forwarding between interfaces
- POSTROUTING MASQUERADE: Applies SNAT for traffic from 10.0.0.0/16 to the internet
- The script is executed automatically when the eth0 interface becomes routable
Internal DNS Server with dnsmasq
I configured dnsmasq on the bastion to provide internal DNS resolution.
This allows using names like vault.internal instead of IPs:
# Reference: templates/userdata_bastion.tpl lines 21-45
cat > /etc/dnsmasq.d/internal.conf << 'EOF_DNSMASQ'
# Internal services
address=/vault.internal/${vault_cluster_ip}
address=/rancher.internal/${rancher_cluster_ip}
# DNS forwarding for external queries
server=8.8.8.8
server=1.1.1.1
# Listen on localhost and private interface
listen-address=127.0.0.1,${bastion_private_ip}
bind-interfaces
domain=internal.local
expand-hosts
cache-size=1000
EOF_DNSMASQ
systemctl enable dnsmasq
systemctl restart dnsmasq
Why dnsmasq? It's lightweight, simple to configure, and perfect for private networks.
Private VMs can now use the bastion as DNS server (/etc/resolv.conf points to 10.0.0.2).
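On the consumer side, a private VM only needs its resolver pointed at the bastion. A sketch assuming systemd-resolved on Ubuntu 22.04 (the repository's node template may do this differently):
# Illustrative snippet for userdata_cluster_node.tpl
mkdir -p /etc/systemd/resolved.conf.d
cat > /etc/systemd/resolved.conf.d/internal.conf << 'EOF'
[Resolve]
DNS=10.0.0.2
Domains=~internal
EOF
systemctl restart systemd-resolved
# Quick check from the node: dig +short vault.internal @10.0.0.2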
4. Reusable Modules
4.1 Terraform Module Pattern
Each infrastructure component (Rancher, Vault, ArgoCD) is implemented as a module.
The pattern is always the same: main.tf, variables.tf, outputs.tf.
Example of the Rancher module in the main main.tf:
# Reference: main.tf lines 65-76
module "rancher" {
count = var.enable_rancher ? 1 : 0 # Conditional instantiation
source = "./modules/rancher"
bastion_id = hcloud_server.bastion.id
bastion_private_ip = var.bastion_private_ip
rancher_private_ip = var.rancher_private_ip
internal_ssh_public_key = tls_private_key.internal_ssh_key.public_key_openssh
location = var.location
network_id = hcloud_network.private_net.id
ssh_key_name = var.ssh_key_name
}
Advantages of the modular approach:
- I can reuse modules in other projects
- Each module is independently testable
- Disabling a component is as simple as setting enable_rancher = false
- Each component's logic is isolated from the rest
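For completeness, the module's outputs follow the same minimal pattern; a sketch of modules/rancher/outputs.tf, consistent with how module.rancher[0].private_ip is consumed later (the actual file may expose more values):
# Sketch: modules/rancher/outputs.tf
output "private_ip" {
  description = "Private IP of the Rancher node inside the management network"
  value       = var.rancher_private_ip
}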
4.2 Ansible Module for Inventory Generation
An interesting aspect is the ansible module that automatically generates
Ansible inventory from Terraform state:
# Reference: main.tf lines 134-163
module "ansible" {
source = "./modules/ansible"
bastion_id = hcloud_server.bastion.id
bastion_private_ip = var.bastion_private_ip
bastion_public_ip = hcloud_server.bastion.ipv4_address
ecommerce_cluster_private_ip = try(module.ecommerce[0].private_ip, var.ecommerce_private_ip)
argocd_cluster_private_ip = try(module.argocd[0].private_ip, var.argocd_private_ip)
rancher_cluster_private_ip = try(module.rancher[0].private_ip, var.rancher_private_ip)
vault_cluster_private_ip = var.vault_private_ip
internal_ssh_private_pem = tls_private_key.internal_ssh_key.private_key_pem
# Pass enable flags
enable_rancher = var.enable_rancher
enable_vault = var.enable_vault
enable_ecommerce = var.enable_ecommerce
enable_argocd = var.enable_argocd
# Other parameters...
}
try() function: I use try() to handle cases where a module is not enabled: if module.rancher[0] doesn't exist, the expression falls back to var.rancher_private_ip.
The module generates files like inventory.ini and ansible.cfg ready for immediate use.
This eliminates the need to maintain inventory manually.
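Under the hood this boils down to rendering templates into local files; a minimal sketch of the idea using the hashicorp/local provider (file and template names here are illustrative, not the repository's exact ones):
# Sketch: modules/ansible/main.tf
resource "local_file" "inventory" {
  filename        = "${path.root}/inventory.ini"
  file_permission = "0644"
  content = templatefile("${path.module}/templates/inventory.ini.tpl", {
    bastion_public_ip  = var.bastion_public_ip
    rancher_private_ip = var.rancher_cluster_private_ip
    enable_rancher     = var.enable_rancher
  })
}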
5. Outputs and Testing
5.1 Useful Outputs
I configured outputs that provide all necessary information for testing and debugging:
# Reference: outputs.tf lines 1-21
output "bastion_public_ip" {
value = hcloud_server.bastion.ipv4_address
description = "Bastion host public IP"
}
output "internal_ssh_private_key" {
description = "Internal SSH private key for Ansible"
value = tls_private_key.internal_ssh_key.private_key_pem
sensitive = true # Not shown in logs
}
To see sensitive outputs: terraform output -raw internal_ssh_private_key
5.2 WireGuard Configuration Auto-Fetch
I implemented a null_resource that automatically retrieves the WireGuard configuration generated by the bastion:
# Reference: outputs.tf lines 24-57
resource "null_resource" "fetch_wireguard_client_config" {
depends_on = [hcloud_server.bastion]
triggers = {
always_run = timestamp() # Force run every time
}
# Wait for cloud-init to finish on bastion
provisioner "remote-exec" {
inline = ["cloud-init status --wait > /dev/null"]
connection {
type = "ssh"
user = "root"
host = hcloud_server.bastion.ipv4_address
private_key = file(var.hcloud_ssh_private_key_path)
timeout = "5m"
}
}
# Retrieve WireGuard configuration file
provisioner "local-exec" {
command = <<-EOT
ssh -o StrictHostKeyChecking=no \
-o IdentityFile=${var.hcloud_ssh_private_key_path} \
root@${hcloud_server.bastion.ipv4_address} \
'cat /root/wg_client.conf' > wireguard_client_config.conf
EOT
}
}
After terraform apply, you'll find the wireguard_client_config.conf file
in the current directory, ready to import into your WireGuard client.
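A typical way to use it from a Linux workstation (assuming wireguard-tools is installed and the generated config routes the private ranges):
sudo cp wireguard_client_config.conf /etc/wireguard/wg0.conf
sudo wg-quick up wg0
ping -c 3 10.0.0.2   # the bastion's private IP, reached through the tunnel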
5.3 Testing Instructions Output
I created an output with step-by-step instructions for testing the infrastructure:
# After terraform apply, display instructions:
terraform output testing_instructions > testing.txt
cat testing.txt
The output includes ready commands for:
- SSH into bastion
- SSH from bastion → private VMs using internal key
- Test internet connectivity from private VMs (NAT gateway)
- Setup WireGuard VPN
- Access Rancher UI
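Mechanically, this is just a formatted string output; a sketch of how such an output can be assembled (the real wording and steps in the repository differ):
# Sketch: outputs.tf
output "testing_instructions" {
  description = "Step-by-step commands to verify the deployment"
  value       = <<-EOT
    1. SSH into the bastion:     ssh root@${hcloud_server.bastion.ipv4_address}
    2. Jump to a private VM:     ssh -i /root/.ssh/id_rsa_internal root@${var.rancher_private_ip}
    3. Verify NAT egress there:  curl -s https://ifconfig.me
  EOT
}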
6. State Management and Best Practices
6.1 Remote State with Terraform Cloud
In production, Terraform state should NOT be local. I configured remote state on Terraform Cloud (or S3 for AWS projects):
# Add to main.tf for remote state
terraform {
backend "remote" {
organization = "magefleet"
workspaces {
name = "production"
}
}
}
Why remote state?
- Avoids conflicts when multiple people work on the same project
- Automatic state locking
- Automatic state backup
- State history for rollback
6.2 Workspaces for Multiple Environments
To manage dev/staging/production, I used Terraform workspaces:
# Create workspace for staging
terraform workspace new staging
terraform workspace select staging
terraform apply -var-file="staging.tfvars"
# Switch to production
terraform workspace select production
terraform apply -var-file="production.tfvars"
Each workspace has its own state file, allowing separate infrastructures with the same code.
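The workspace name is also available inside the configuration as terraform.workspace, which helps keep resource names unambiguous across environments; a small sketch (not how the repository names things today):
# Sketch: environment-aware naming
locals {
  env_suffix = terraform.workspace == "default" ? "" : "-${terraform.workspace}"
}
# e.g. in the bastion resource:
#   name = "bastion-host-wireguard${local.env_suffix}"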
6.3 tfvars Files per Environment
I organized variables per environment:
# production.tfvars
environment = "production"
management_network_cidr = "10.0.0.0/16"
enable_rancher = true
enable_vault = true
# staging.tfvars
environment = "staging"
management_network_cidr = "10.100.0.0/16" # Different CIDR
enable_rancher = true
enable_vault = false # Cost savings in staging
7. Complete Workflow
7.1 Initial Deploy
# 1. Clone repository
git clone https://github.com/ramingo/magefleet.git
cd magefleet/terraform-init-infra
# 2. Copy template and configure variables
cp terraform.tfvars.example terraform.tfvars
# Edit terraform.tfvars with your values
# 3. Initialize Terraform
terraform init
# 4. Validate configuration
terraform validate
# 5. Plan (dry-run)
terraform plan -out=tfplan
# 6. Review the plan
# IMPORTANT: Read carefully what will be created!
# 7. Apply
terraform apply tfplan
# 8. Save outputs
terraform output testing_instructions > TESTING.md
terraform output -raw internal_ssh_private_key > internal_key.pem
chmod 600 internal_key.pem
7.2 Subsequent Changes
To modify existing infrastructure:
# 1. Modify terraform.tfvars or .tf files
# Example: Add a business customer
# In terraform.tfvars:
business_customers = {
"client-new" = {
subnet_id = 3
server_type = "cpx41"
location = "nbg1"
}
}
# 2. Plan to see what will change
terraform plan
# 3. Apply only if plan is correct
terraform apply
7.3 Destroy (Warning!)
# To destroy ALL infrastructure
terraform destroy
# To destroy specific resources
terraform destroy -target=module.ecommerce
⚠️ Warning: terraform destroy deletes EVERYTHING. In production,
always use specific targets and back up the state first.
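Backing up the state before any destructive operation is a one-liner with the CLI, and a targeted destroy can be previewed first:
# Snapshot the current state locally
terraform state pull > "state-backup-$(date +%Y%m%d-%H%M%S).tfstate"
# Preview exactly what a targeted destroy would remove
terraform plan -destroy -target=module.ecommerce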
8. Common Troubleshooting
8.1 "Error creating network: network overlaps"
Problem: Terraform fails with network overlap error.
Cause: You already have a network with overlapping CIDR.
Solution:
# 1. List existing networks
hcloud network list
# 2. Change CIDR in terraform.tfvars
management_network_cidr = "10.50.0.0/16" # Use different range
# 3. Or delete existing network manually
hcloud network delete OLD_NETWORK_ID
8.2 "Cloud-init did not finish in time"
Problem: The null_resource that retrieves WireGuard config times out.
Cause: Cloud-init on bastion is still configuring the system.
Solution:
# Verify cloud-init status manually
ssh root@BASTION_IP 'cloud-init status'
# Desired output: "status: done"
# If "status: running", wait and re-run:
terraform apply
8.3 "Private VMs can't reach internet"
Problem: Private VMs can't apt update or ping 8.8.8.8.
Debugging steps:
# 1. SSH into bastion
ssh root@BASTION_IP
# 2. Verify IP forwarding
cat /proc/sys/net/ipv4/ip_forward # Must be 1
# 3. Verify iptables NAT rules
iptables -t nat -L -n -v | grep MASQUERADE
# 4. SSH into a private VM from bastion
ssh -i /root/.ssh/id_rsa_internal root@10.0.0.4
# 5. From private VM, verify route
ip route show | grep default # Must point to 10.0.0.2
# 6. Verify DNS
cat /etc/resolv.conf # Should have nameserver 10.0.0.2 or 8.8.8.8
Conclusions
The Terraform implementation allowed me to transform a complex architectural design into reproducible code. The key decisions were:
- Modular structure: Each component is a reusable module with enable flags
- Cloud-init for auto-configuration: Bastion configures itself automatically at boot
- Automatic SSH key generation: Eliminates manual management and improves security
- Terraform-Ansible integration: Inventory automatically generated from state
- Conditional multiple networks: Activate only necessary networks to reduce costs
In the next article I'll show how I use Ansible to configure services on this infrastructure: Rancher, Vault, ArgoCD, and Kubernetes cluster deployment.
Resources
- GitHub Repository with complete code
- Terraform Hetzner Provider Documentation
- Hetzner Cloud Networks Documentation
- Terraform Official Tutorials
Previous article: Multi-Tenant Cloud Infrastructure Architecture
Next article: Bastion Host Setup and Security Hardening