Deploying to Amazon EKS
Complete guide to deploying self-hosted Spacelift Flows on Amazon EKS using OpenTofu and Helm.
This guide provides a way to quickly get Spacelift Flows up and running on an Elastic Kubernetes Service (EKS) cluster. The infrastructure is deployed using OpenTofu, and the application services are deployed using Helm charts.
Currently, agents have to run outside the Kubernetes cluster, e.g., on an EC2 Auto Scaling Group. We will soon provide a fully Kubernetes-native solution.
Overview
This deployment creates a complete Spacelift Flows instance with the following components:
- EKS Auto Mode cluster for container orchestration
- RDS Aurora PostgreSQL for the database
- S3 bucket for object storage
- KMS encryption for data at rest
- ACM certificates for SSL/TLS
- Agent pool deployed via Terraform module
The following services will be deployed as Kubernetes pods using the Helm charts:
- The server.
- The worker.
- The gateway.
The server hosts the Spacelift Flows HTTP API and serves the embedded frontend assets. The server is exposed to the outside world through an Application Load Balancer for HTTP traffic, including the OAuth and MCP endpoints required for external integrations.
The worker is the component that handles recurring tasks and asynchronous jobs.
The gateway is a service that hosts the WebSocket server for agents and routes JavaScript evaluations to the right agents/runtimes.
The agent pool is deployed as an EC2 ASG and consists of an agent service. Agent services handle requests from the gateway and distribute execution commands.
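Each of the three in-cluster services runs as its own Kubernetes Deployment. As a quick orientation sketch (assuming the Helm release name spacelift-flows and the namespace used later in this guide), you can list them once everything is installed:

```sh
# One Deployment per component; names assume the "spacelift-flows" release.
kubectl get deployments -n spacelift-flows
# NAME                      READY   UP-TO-DATE   AVAILABLE
# spacelift-flows-server    1/1     1            1
# spacelift-flows-worker    1/1     1            1
# spacelift-flows-gateway   1/1     1            1
```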
Requirements
Before starting, ensure you have:
AWS Prerequisites
- AWS CLI configured with appropriate permissions
- Access to an AWS account with the following service limits:
  - EKS clusters: At least 1 available
  - RDS Aurora clusters: At least 1 available
  - VPC: At least 1 available (or use existing)
  - NAT Gateways: At least 1 available per AZ
  - Elastic IPs: At least 1 available per NAT Gateway
Tools Required
- OpenTofu >= 1.6.0 (or Terraform >= 1.5.0)
- kubectl for Kubernetes management
- Helm >= 3.0 for application deployment
- AWS CLI >= 2.0
Domain Requirements
- A registered domain name with DNS management access
- Ability to create DNS records for certificate validation
Optional Requirements
- SMTP server for email notifications (recommended). You can enable Amazon SES by setting the Terraform variable enable_ses = true.
- Anthropic API key for AI features (recommended)
- OpenTelemetry collector endpoint for observability
Deploy Infrastructure
The infrastructure deployment uses a modular approach with OpenTofu to provision all necessary AWS resources.
1. Prepare the Environment
First, ensure your AWS CLI is configured with the appropriate credentials and region:
```sh
# Verify AWS CLI configuration
aws sts get-caller-identity

# Set your preferred region and export as environment variable
export TF_VAR_aws_region=us-west-2
```

2. Create Working Directory
Create a new directory for your infrastructure deployment:

```sh
mkdir spacelift-flows-infra
cd spacelift-flows-infra
```

3. Create Infrastructure Configuration
Create a main.tf file that references the Spacelift Flows infrastructure module:
```hcl
terraform {
  required_version = ">= 1.5.0"

  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 6.0"
    }
    random = {
      source  = "hashicorp/random"
      version = "~> 3.4"
    }
  }
}

provider "aws" {
  region = var.aws_region
}

module "spacelift_flows" {
  source = "github.com/spacelift-io/terraform-aws-eks-spacelift-flows-selfhosted?ref=v0.1.0"

  # Required variables
  app_domain        = var.app_domain
  organization_name = var.organization_name
  admin_email       = var.admin_email
  aws_region        = var.aws_region
  license_token     = var.license_token
  anthropic_api_key = var.anthropic_api_key # optional
  expose_gateway    = true

  # Email configuration (choose one of the following options):

  # Option 1: Dev mode - emails logged to server logs (testing only)
  # Useful for initial setup, but not suitable for production
  # email_dev_enabled = true

  # Option 2: Custom SMTP server
  # smtp_host         = "smtp.example.com"
  # smtp_port         = 587
  # smtp_username     = ""
  # smtp_password     = ""
  # smtp_from_address = "noreply@yourcompany.com"

  # Option 3: Amazon SES (recommended for AWS deployments)
  # Requires domain verification and production access request (see Verify Deployment section)
  # enable_ses = true

  # Optional variables
  k8s_namespace = var.k8s_namespace
}

# Uncomment to deploy the agent pool when the Spacelift Flows backend services are deployed.
# module "spacelift_flows_agent_pool" {
#   source = "github.com/spacelift-io/terraform-aws-spacelift-flows-agentpool-ec2?ref=v0.2.0"
#
#   agent_pool_id    = module.spacelift_flows.agent_pool_id
#   agent_pool_token = module.spacelift_flows.agent_pool_token
#   backend_endpoint = "https://${var.app_domain}"
#   gateway_endpoint = "https://gateway.${var.app_domain}"
#   agent_image_tag  = var.spacelift_flows_image_tag
#
#   reuse_vpc_id         = module.spacelift_flows.vpc_id
#   reuse_vpc_subnet_ids = module.spacelift_flows.vpc_private_subnet_ids
#   aws_region           = var.aws_region
#   min_size             = 1
#   desired_capacity     = 5
#   max_size             = 10
# }
```

See more examples with different configurations in the GitHub repository.
If you want to reuse the RDS cluster from your Spacelift Self-Hosted installation, create a Postgres database:

```sql
CREATE DATABASE flows;
```

We also suggest creating a separate user, as sketched below.
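As a sketch of that suggestion, assuming you can reach the cluster endpoint with psql (the role name, password, and connection details below are placeholders to replace with your own):

```sh
# Hypothetical role name/password; substitute your own admin credentials and endpoint.
psql "postgres://ADMIN_USER:ADMIN_PASSWORD@RDS_ENDPOINT:5432/flows" <<'SQL'
CREATE USER flows_user WITH PASSWORD 'change-me';
GRANT ALL PRIVILEGES ON DATABASE flows TO flows_user;
-- On PostgreSQL 15+, the user may also need access to the public schema:
GRANT ALL ON SCHEMA public TO flows_user;
SQL
```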
Then build a connection URL like this:

```hcl
database_connection_url = format(
  "postgres://%s:%s@%s:5432/flows",
  module.spacelift.rds_username,
  urlencode(module.spacelift.rds_password),
  module.spacelift.rds_cluster_endpoint
)
```

4. Create Variables File
Create a variables.tf file with variable definitions:
```hcl
# Required variables
variable "app_domain" {
  description = "The domain name for the Spacelift Flows instance"
  type        = string
}

variable "organization_name" {
  description = "Name of the organization"
  type        = string
}

variable "admin_email" {
  description = "Email address for the admin user"
  type        = string
}

variable "aws_region" {
  description = "AWS region for deployment"
  type        = string
}

variable "license_token" {
  description = "The JWT token for using Spacelift Flows. Only required for generating the kubernetes_secrets output."
  type        = string
  sensitive   = true
}

variable "k8s_namespace" {
  type = string
}

variable "spacelift_flows_image_tag" {
  type = string
}

variable "anthropic_api_key" {
  description = "Anthropic API key for AI features"
  type        = string
  default     = ""
}
```

5. Create Outputs File
Create an outputs.tf file to expose important values:
output "config_secret_manifest" { description = "Outputs manifests that are needed to configure a secret for the Flows app." value = module.spacelift_flows.config_secret_manifest sensitive = true}
output "ingress_manifest" { description = "Outputs manifests that are needed to configure aws ingress." value = module.spacelift_flows.ingress_manifest}
output "shell" { value = module.spacelift_flows.shell}6. Set Environment Variables
Configure your deployment using environment variables. Start with the minimum required variables:
Choose a domain you control and can create DNS records for. Subdomains work well (e.g., flows.yourcompany.com).
```sh
# Required configuration
export TF_VAR_app_domain="flows.yourcompany.com"
export TF_VAR_organization_name="Your Organization"
export TF_VAR_admin_email="admin@yourcompany.com"
export TF_VAR_license_token=""
export TF_VAR_k8s_namespace="spacelift-flows"
export TF_VAR_spacelift_flows_image_tag="0.3.0"

# Optional configuration
export TF_VAR_anthropic_api_key="" # optional
```

Flows includes a powerful and helpful AI assistant that can help you quickly build and debug flows. The assistant requires an Anthropic API key, which you can get by signing up at https://console.anthropic.com/. While Flows will work without it, we strongly advise against skipping this.
7. Initialize OpenTofu
Initialize the working directory and download required providers:
```sh
tofu init
```

8. Review the Deployment Plan
Generate and review the execution plan:
```sh
tofu plan
```

9. Deploy the Infrastructure
Apply the configuration to create the infrastructure:
```sh
tofu apply
```

When prompted, type yes to confirm the deployment.
10. Verify Infrastructure Deployment
Once the apply completes, export the variables that the next steps rely on. For convenience, we expose a shell output in tofu that you can source directly:
```sh
# Source in your shell all the required env vars to continue the installation process
$(tofu output -raw shell)
```

11. Configure kubectl Access
Configure kubectl to access your new EKS cluster:
```sh
# Update kubeconfig
aws eks update-kubeconfig --region $TF_VAR_aws_region --name $EKS_CLUSTER_NAME

# Create a namespace
kubectl create namespace $TF_VAR_k8s_namespace
```

12. Validate the certificates
You can skip this step if you provided an already-issued certificate via the cert_arn variable.
After the infrastructure is deployed, validate that the ACM certificates are properly issued and configured:
```sh
# Get the certificate ARN from OpenTofu outputs
CERT_ARN=$(aws acm list-certificates --region $TF_VAR_aws_region --query "CertificateSummaryList[?DomainName=='$TF_VAR_app_domain'].CertificateArn" --output text)

# Check certificate status
aws acm describe-certificate --certificate-arn $CERT_ARN --region $TF_VAR_aws_region --query "Certificate.Status" --output text
```

The certificate status should show ISSUED. If it shows PENDING_VALIDATION, you need to validate the certificate by creating the required DNS records:

```sh
# Get DNS validation records
aws acm describe-certificate --certificate-arn $CERT_ARN --region $TF_VAR_aws_region --query "Certificate.DomainValidationOptions[*].ResourceRecord" --output table
```

Create the CNAME records shown in the output in your DNS provider. The certificate will automatically be issued once DNS validation is complete (usually within a few minutes).
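If your zone happens to be hosted in Route53, here is a hedged sketch of creating one validation record from the CLI; the hosted zone ID, record name, and record value are placeholders you copy from the table above:

```sh
aws route53 change-resource-record-sets \
  --hosted-zone-id "$HOSTED_ZONE_ID" \
  --change-batch '{
    "Changes": [{
      "Action": "UPSERT",
      "ResourceRecordSet": {
        "Name": "_example-validation.flows.yourcompany.com.",
        "Type": "CNAME",
        "TTL": 300,
        "ResourceRecords": [{"Value": "_example-target.acm-validations.aws."}]
      }
    }]
  }'
```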
You can monitor the validation status:

```sh
# Check validation status
aws acm describe-certificate --certificate-arn $CERT_ARN --region $TF_VAR_aws_region --query "Certificate.DomainValidationOptions[*].ValidationStatus" --output text
```

Once all domains show SUCCESS, the certificate is ready to use.
Infrastructure Deployment Complete
At this point, you have successfully deployed:
- ✅ VPC with public and private subnets
- ✅ EKS Auto Mode cluster ready for workloads
- ✅ RDS Aurora PostgreSQL database cluster
- ✅ S3 bucket for object storage with encryption
- ✅ KMS key for encryption at rest
- ✅ IAM roles and policies for service access
The next step is to deploy the application services using the generated Kubernetes manifests.
Deploy Application Services
1. Apply Configuration Secret
Create the main configuration secret:
```sh
tofu output -raw config_secret_manifest | kubectl apply -f -
```

2. Apply Ingress Configuration
Create an AWS ingress class:
```sh
tofu output -raw ingress_manifest | kubectl apply -f -
```

3. Deploy Core Services
Create a Helm values file for the Spacelift Flows installation:
```sh
cat > flows-values.yaml <<EOF
appDomain: $TF_VAR_app_domain

global:
  image:
    tag: $TF_VAR_spacelift_flows_image_tag

ingress:
  className: "spacelift-flows"
  exposeGateway: true
  annotations:
    alb.ingress.kubernetes.io/healthcheck-port: "8080"
    alb.ingress.kubernetes.io/healthcheck-path: "/health"
    alb.ingress.kubernetes.io/scheme: internet-facing
EOF
```

Install the main Spacelift Flows services using the Helm chart:
```sh
helm repo add spacelift https://downloads.spacelift.io/helm
helm repo update

helm upgrade spacelift-flows spacelift/spacelift-flows --install -f flows-values.yaml --namespace $TF_VAR_k8s_namespace
```

Monitor deployment progress:

```sh
kubectl get pods -n $TF_VAR_k8s_namespace --watch
```

Wait for the load balancer to be provisioned:

```sh
kubectl get ingress -n $TF_VAR_k8s_namespace --watch
```

4. Update DNS Records
Once the ingress has an external IP/hostname, create DNS records:
```sh
# Get the load balancer hostname
LB_HOSTNAME=$(kubectl get ingress spacelift-flows -n $TF_VAR_k8s_namespace -o jsonpath='{.status.loadBalancer.ingress[0].hostname}')

echo "Create the following DNS records:"
echo "$TF_VAR_app_domain CNAME $LB_HOSTNAME"
echo "*.endpoints.$TF_VAR_app_domain CNAME $LB_HOSTNAME"
echo "oauth.$TF_VAR_app_domain CNAME $LB_HOSTNAME"
echo "mcp.$TF_VAR_app_domain CNAME $LB_HOSTNAME"
echo "gateway.$TF_VAR_app_domain CNAME $LB_HOSTNAME"
```

Deploy Agent Pool
1. Uncomment the spacelift_flows_agent_pool module
Now that all backend services are running, uncomment the spacelift_flows_agent_pool module.
2. Run Tofu Apply

```sh
tofu apply
```

Verify Deployment
Configure SES (if using SES)
If you enabled Amazon SES for email delivery (enable_ses = true), you must verify your domain in SES and either request production access or verify individual email addresses as identities before emails can be sent.
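As a quick check of where the domain identity stands, a sketch using the SES v2 CLI (assuming the identity matches your app domain):

```sh
# "true" means the identity is verified for sending.
aws sesv2 get-email-identity \
  --email-identity "$TF_VAR_app_domain" \
  --region "$TF_VAR_aws_region" \
  --query "VerifiedForSendingStatus"
```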
1. Access the Web Interface
Navigate to your domain in a web browser:
```
https://$SPACELIFT_FLOWS_DOMAIN
```

You should see the Spacelift Flows login page.
2. Login
On first access, you'll be prompted to provide your email to log in. If you have correctly configured your SMTP credentials, you should receive an email with a login link. Alternatively, if you have enabled email dev mode, you will find the login link in the server logs, as shown below.
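A quick way to fish the link out of the server logs in dev mode; the grep pattern is a guess, so adjust it to the actual log line:

```sh
kubectl logs -n $TF_VAR_k8s_namespace -l app.kubernetes.io/component=server | grep -i login
```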
3. Test Agent Pool
- Log into the web interface
- Navigate to the sample flows in your project and verify that they work
4. Health Checks
Verify all services are healthy:
```sh
kubectl get pods -n $TF_VAR_k8s_namespace
```

All pods should show STATUS: Running and READY: 1/1.
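If you prefer a scriptable check, you can also block until every Deployment in the namespace reports available:

```sh
kubectl wait deployment --all \
  --for=condition=Available \
  -n $TF_VAR_k8s_namespace --timeout=300s
```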
Configuration Options
Scaling the Deployment
EKS Node Scaling
EKS Auto Mode handles node scaling automatically based on pod resource requests.
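If you want to observe this while pods are being scheduled, watch the node list:

```sh
# New nodes should appear as pending pods request capacity.
kubectl get nodes --watch
```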
Resource Allocation
Update resource requests and limits in values files:
```yaml
# For application services
applicationServices:
  resources:
    requests:
      memory: "512Mi"
      cpu: "500m"
    limits:
      memory: "1Gi"
      cpu: "1000m"
```

Troubleshooting
Common Issues
Pods Stuck in Pending
Check node capacity and resource requests:
```sh
kubectl describe nodes
kubectl get pods -n $TF_VAR_k8s_namespace -o wide
```

EKS Auto Mode should automatically provision nodes, but may take 5-10 minutes.
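If a pod stays Pending longer than that, its scheduling events usually explain why; the pod name below is a placeholder:

```sh
# The Events section at the bottom of the output shows scheduling failures.
kubectl describe pod <pod-name> -n $TF_VAR_k8s_namespace
```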
Agent Connection Problems
Check agent logs in CloudWatch Logs.
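For example, from the CLI. The log group name depends on how the agent pool module configures logging, so treat the prefix and name below as placeholders:

```sh
# Find candidate log groups, then tail the one used by the agents.
aws logs describe-log-groups --region $TF_VAR_aws_region --log-group-name-prefix "/spacelift"
aws logs tail "<agent-log-group>" --follow --region $TF_VAR_aws_region
```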
Logs and Debugging
Application Logs

```sh
# Server logs
kubectl logs -n $TF_VAR_k8s_namespace -l app.kubernetes.io/component=server -f

# Worker logs
kubectl logs -n $TF_VAR_k8s_namespace -l app.kubernetes.io/component=worker -f

# Gateway logs
kubectl logs -n $TF_VAR_k8s_namespace -l app.kubernetes.io/component=gateway -f
```

Infrastructure Logs
```sh
# Check EKS cluster status
aws eks describe-cluster --name $EKS_CLUSTER_NAME --region $TF_VAR_aws_region
```

Maintenance
Updates
Application Updates
Update image tags in Helm values and upgrade:
```sh
helm upgrade spacelift-flows spacelift/spacelift-flows --namespace $TF_VAR_k8s_namespace --values flows-values.yaml
```

Infrastructure Updates
Update OpenTofu configuration and apply:
```sh
tofu plan
tofu apply
```

Re-Apply Configuration Secret
Update the main configuration secret:
```sh
tofu output -raw config_secret_manifest | kubectl apply -f -
```

Re-Apply Ingress Configuration
Update the AWS ingress class:
```sh
tofu output -raw ingress_manifest | kubectl apply -f -
```

Force Restart Application Pods
Restart application pods without changing the deployment configuration. This is useful when you need to pick up configuration changes from Secrets, or force a fresh start of the application:
```sh
# Restart server pods
kubectl rollout restart -n spacelift-flows deployment spacelift-flows-server

# Restart worker pods
kubectl rollout restart -n spacelift-flows deployment spacelift-flows-worker

# Restart gateway pods
kubectl rollout restart -n spacelift-flows deployment spacelift-flows-gateway

# Monitor the server rollout status
kubectl rollout status -n spacelift-flows deployment spacelift-flows-server
```

Cleanup
This will permanently destroy all data. Ensure you have backups if needed.
Remove Application Services

```sh
helm uninstall spacelift-flows --namespace $TF_VAR_k8s_namespace
kubectl delete namespace $TF_VAR_k8s_namespace
```

Destroy Infrastructure
Ensure that you have disabled RDS delete protection and S3 bucket retain on destroy:
```hcl
s3_retain_on_destroy          = false
rds_delete_protection_enabled = false
```

```sh
tofu destroy
```

Manual Cleanup
Some resources may require manual cleanup:
- ACM certificates (if DNS validation records weren’t removed)
- Route53 DNS records created manually
Next Steps
Configure traces with the OpenTelemetry operator.