Deploying to Amazon EKS
Complete guide to deploying self-hosted Spacelift Flows on Amazon EKS using OpenTofu and Helm.
This guide provides a way to quickly get Spacelift Flows up and running on an Elastic Kubernetes Service (EKS) cluster. The infrastructure is deployed using OpenTofu, and the application services are deployed using Helm charts.
Currently, agents have to run outside the Kubernetes cluster, e.g., on an EC2 Auto Scaling Group. We will soon provide a fully Kubernetes-native solution.
Overview
This deployment creates a complete Spacelift Flows instance with the following components:
- EKS Auto Mode cluster for container orchestration
- RDS Aurora PostgreSQL for the database
- S3 bucket for object storage
- KMS encryption for data at rest
- ACM certificates for SSL/TLS
- Agent pool deployed via Terraform module
The following services will be deployed as Kubernetes pods using the Helm charts:
- The server.
- The worker.
- The gateway.
The server hosts the Spacelift Flows HTTP API and serves the embedded frontend assets. The server is exposed to the outside world through an Application Load Balancer for HTTP traffic, including the OAuth and MCP endpoints required for external integrations.
The worker is the component that handles recurring tasks and asynchronous jobs.
The gateway is a service that hosts the WebSocket server for agents and routes JavaScript evaluations to the right agents/runtimes.
The agent pool is deployed as an EC2 ASG and consists of an agent service. Agent services handle requests from the gateway and distribute execution commands.
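Each of the three in-cluster services runs as its own Kubernetes Deployment. As a quick orientation sketch (assuming the Helm release name spacelift-flows and the namespace used later in this guide), you can list them once everything is installed:

```sh
# One Deployment per component; names assume the "spacelift-flows" release.
kubectl get deployments -n spacelift-flows
# NAME                      READY   UP-TO-DATE   AVAILABLE
# spacelift-flows-server    1/1     1            1
# spacelift-flows-worker    1/1     1            1
# spacelift-flows-gateway   1/1     1            1
```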
Requirements
Before starting, ensure you have:
AWS Prerequisites
- AWS CLI configured with appropriate permissions
- Access to an AWS account with the following service limits:
  - EKS clusters: At least 1 available
  - RDS Aurora clusters: At least 1 available
  - VPC: At least 1 available (or use existing)
  - NAT Gateways: At least 1 available per AZ
  - Elastic IPs: At least 1 available per NAT Gateway
Tools Required
- OpenTofu >= 1.6.0 (or Terraform >= 1.5.0)
- kubectl for Kubernetes management
- Helm >= 3.0 for application deployment
- AWS CLI >= 2.0
Domain Requirements
- A registered domain name with DNS management access
- Ability to create DNS records for certificate validation
Optional Requirements
- SMTP server for email notifications (recommended). You can enable Amazon SES by setting the Terraform variable enable_ses = true.
- Anthropic API key for AI features (recommended)
- OpenTelemetry collector endpoint for observability
Deploy Infrastructure
The infrastructure deployment uses a modular approach with OpenTofu to provision all necessary AWS resources.
1. Prepare the Environment
First, ensure your AWS CLI is configured with the appropriate credentials and region:
```sh
# Verify AWS CLI configuration
aws sts get-caller-identity

# Set your preferred region and export as environment variable
export TF_VAR_aws_region=us-west-2
```

2. Create Working Directory
Create a new directory for your infrastructure deployment:

```sh
mkdir spacelift-flows-infra
cd spacelift-flows-infra
```

3. Create Infrastructure Configuration
Create a main.tf file that references the Spacelift Flows infrastructure module:
```hcl
terraform {
  required_version = ">= 1.5.0"

  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 6.0"
    }
    random = {
      source  = "hashicorp/random"
      version = "~> 3.4"
    }
  }
}

provider "aws" {
  region = var.aws_region
}

module "spacelift_flows" {
  source = "github.com/spacelift-io/terraform-aws-eks-spacelift-flows-selfhosted?ref=v0.1.0"

  # Required variables
  app_domain        = var.app_domain
  organization_name = var.organization_name
  admin_email       = var.admin_email
  aws_region        = var.aws_region
  license_token     = var.license_token
  anthropic_api_key = var.anthropic_api_key # optional
  expose_gateway    = true

  # Email configuration (choose one of the following options):

  # Option 1: Dev mode - emails logged to server logs (testing only)
  # Useful for initial setup, but not suitable for production
  # email_dev_enabled = true

  # Option 2: Custom SMTP server
  # smtp_host         = "smtp.example.com"
  # smtp_port         = 587
  # smtp_username     = ""
  # smtp_password     = ""
  # smtp_from_address = "noreply@yourcompany.com"

  # Option 3: Amazon SES (recommended for AWS deployments)
  # Requires domain verification and production access request (see Verify Deployment section)
  # enable_ses = true

  # Optional variables
  k8s_namespace = var.k8s_namespace
}

# Uncomment to deploy the agent pool when the Spacelift Flows backend services are deployed.
# module "spacelift_flows_agent_pool" {
#   source = "github.com/spacelift-io/terraform-aws-spacelift-flows-agentpool-ec2?ref=v0.2.0"
#
#   agent_pool_id    = module.spacelift_flows.agent_pool_id
#   agent_pool_token = module.spacelift_flows.agent_pool_token
#   backend_endpoint = "https://${var.app_domain}"
#   gateway_endpoint = "https://gateway.${var.app_domain}"
#   agent_image_tag  = var.spacelift_flows_image_tag
#
#   reuse_vpc_id         = module.spacelift_flows.vpc_id
#   reuse_vpc_subnet_ids = module.spacelift_flows.vpc_private_subnet_ids
#   aws_region           = var.aws_region
#   min_size             = 1
#   desired_capacity     = 5
#   max_size             = 10
# }
```

See more examples with different configurations in the GitHub repository.
If you want to reuse the RDS cluster from your Spacelift Self-Hosted installation, create a Postgres database:

```sql
CREATE DATABASE flows;
```

We also suggest creating a separate user, as sketched below.
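As a sketch of that suggestion, assuming you can reach the cluster endpoint with psql (the role name, password, and connection details below are placeholders to replace with your own):

```sh
# Hypothetical role name/password; substitute your own admin credentials and endpoint.
psql "postgres://ADMIN_USER:ADMIN_PASSWORD@RDS_ENDPOINT:5432/flows" <<'SQL'
CREATE USER flows_user WITH PASSWORD 'change-me';
GRANT ALL PRIVILEGES ON DATABASE flows TO flows_user;
-- On PostgreSQL 15+, the user may also need access to the public schema:
GRANT ALL ON SCHEMA public TO flows_user;
SQL
```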
Then build a connection URL like this:

```hcl
database_connection_url = format(
  "postgres://%s:%s@%s:5432/flows",
  module.spacelift.rds_username,
  urlencode(module.spacelift.rds_password),
  module.spacelift.rds_cluster_endpoint
)
```

4. Create Variables File
Create a variables.tf file with variable definitions:
```hcl
# Required variables
variable "app_domain" {
  description = "The domain name for the Spacelift Flows instance"
  type        = string
}

variable "organization_name" {
  description = "Name of the organization"
  type        = string
}

variable "admin_email" {
  description = "Email address for the admin user"
  type        = string
}

variable "aws_region" {
  description = "AWS region for deployment"
  type        = string
}

variable "license_token" {
  description = "The JWT token for using Spacelift Flows. Only required for generating the kubernetes_secrets output."
  type        = string
  sensitive   = true
}

variable "k8s_namespace" {
  type = string
}

variable "spacelift_flows_image_tag" {
  type = string
}

variable "anthropic_api_key" {
  description = "Anthropic API key for AI features"
  type        = string
  default     = ""
}
```

5. Create Outputs File
Create an outputs.tf file to expose important values:
output "config_secret_manifest" { description = "Outputs manifests that are needed to configure a secret for the Flows app." value = module.spacelift_flows.config_secret_manifest sensitive = true}
output "ingress_manifest" { description = "Outputs manifests that are needed to configure aws ingress." value = module.spacelift_flows.ingress_manifest}
output "shell" { value = module.spacelift_flows.shell}6. Set Environment Variables
Configure your deployment using environment variables. Start with the minimum required variables:
Choose a domain you control and can create DNS records for. Subdomains work well (e.g., flows.yourcompany.com).
```sh
# Required configuration
export TF_VAR_app_domain="flows.yourcompany.com"
export TF_VAR_organization_name="Your Organization"
export TF_VAR_admin_email="admin@yourcompany.com"
export TF_VAR_license_token=""
export TF_VAR_k8s_namespace="spacelift-flows"
export TF_VAR_spacelift_flows_image_tag="0.3.0"

# Optional configuration
export TF_VAR_anthropic_api_key="" # optional
```

Flows includes a powerful and helpful AI assistant that can help you quickly build and debug flows. The assistant requires an Anthropic API key, which you can get by signing up at https://console.anthropic.com/. While Flows will work without it, we strongly advise against skipping this.
7. Initialize OpenTofu
Initialize the working directory and download required providers:
```sh
tofu init
```

8. Review the Deployment Plan
Generate and review the execution plan:
```sh
tofu plan
```

9. Deploy the Infrastructure
Apply the configuration to create the infrastructure:
```sh
tofu apply
```

When prompted, type yes to confirm the deployment.
10. Verify Infrastructure Deployment
Once the apply completes, export the variables that the next steps rely on. For convenience, we expose a shell output in tofu that you can source directly:
```sh
# Source in your shell all the required env vars to continue the installation process
$(tofu output -raw shell)
```

11. Configure kubectl Access
Configure kubectl to access your new EKS cluster:
```sh
# Update kubeconfig
aws eks update-kubeconfig --region $TF_VAR_aws_region --name $EKS_CLUSTER_NAME

# Create a namespace
kubectl create namespace $TF_VAR_k8s_namespace
```

12. Validate the certificates
You can skip this step if you provided an already-issued certificate via the cert_arn variable.
After the infrastructure is deployed, validate that the ACM certificates are properly issued and configured:
```sh
# Get the certificate ARN from OpenTofu outputs
CERT_ARN=$(aws acm list-certificates --region $TF_VAR_aws_region --query "CertificateSummaryList[?DomainName=='$TF_VAR_app_domain'].CertificateArn" --output text)

# Check certificate status
aws acm describe-certificate --certificate-arn $CERT_ARN --region $TF_VAR_aws_region --query "Certificate.Status" --output text
```

The certificate status should show ISSUED. If it shows PENDING_VALIDATION, you need to validate the certificate by creating the required DNS records:

```sh
# Get DNS validation records
aws acm describe-certificate --certificate-arn $CERT_ARN --region $TF_VAR_aws_region --query "Certificate.DomainValidationOptions[*].ResourceRecord" --output table
```

Create the CNAME records shown in the output in your DNS provider. The certificate will automatically be issued once DNS validation is complete (usually within a few minutes).
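If your zone happens to be hosted in Route53, here is a hedged sketch of creating one validation record from the CLI; the hosted zone ID, record name, and record value are placeholders you copy from the table above:

```sh
aws route53 change-resource-record-sets \
  --hosted-zone-id "$HOSTED_ZONE_ID" \
  --change-batch '{
    "Changes": [{
      "Action": "UPSERT",
      "ResourceRecordSet": {
        "Name": "_example-validation.flows.yourcompany.com.",
        "Type": "CNAME",
        "TTL": 300,
        "ResourceRecords": [{"Value": "_example-target.acm-validations.aws."}]
      }
    }]
  }'
```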
You can monitor the validation status:

```sh
# Check validation status
aws acm describe-certificate --certificate-arn $CERT_ARN --region $TF_VAR_aws_region --query "Certificate.DomainValidationOptions[*].ValidationStatus" --output text
```

Once all domains show SUCCESS, the certificate is ready to use.
Infrastructure Deployment Complete
At this point, you have successfully deployed:
- ✅ VPC with public and private subnets
- ✅ EKS Auto Mode cluster ready for workloads
- ✅ RDS Aurora PostgreSQL database cluster
- ✅ S3 bucket for object storage with encryption
- ✅ KMS key for encryption at rest
- ✅ IAM roles and policies for service access
The next step is to deploy the application services using the generated Kubernetes manifests.
Deploy Application Services
1. Apply Configuration Secret
Create the main configuration secret:
```sh
tofu output -raw config_secret_manifest | kubectl apply -f -
```

2. Apply Ingress Configuration
Create an AWS ingress class:
```sh
tofu output -raw ingress_manifest | kubectl apply -f -
```

3. Deploy Core Services
Create a Helm values file for the Spacelift Flows installation:
```sh
cat > flows-values.yaml <<EOF
appDomain: $TF_VAR_app_domain

global:
  image:
    tag: $TF_VAR_spacelift_flows_image_tag

ingress:
  className: "spacelift-flows"
  exposeGateway: true
  annotations:
    alb.ingress.kubernetes.io/healthcheck-port: "8080"
    alb.ingress.kubernetes.io/healthcheck-path: "/health"
    alb.ingress.kubernetes.io/scheme: internet-facing
EOF
```

Install the main Spacelift Flows services using the Helm chart:
```sh
helm repo add spacelift https://downloads.spacelift.io/helm
helm repo update

helm upgrade spacelift-flows spacelift/spacelift-flows --install -f flows-values.yaml --namespace $TF_VAR_k8s_namespace
```

Monitor deployment progress:

```sh
kubectl get pods -n $TF_VAR_k8s_namespace --watch
```

Wait for the load balancer to be provisioned:

```sh
kubectl get ingress -n $TF_VAR_k8s_namespace --watch
```

4. Update DNS Records
Once the ingress has an external IP/hostname, create DNS records:
```sh
# Get the load balancer hostname
LB_HOSTNAME=$(kubectl get ingress spacelift-flows -n $TF_VAR_k8s_namespace -o jsonpath='{.status.loadBalancer.ingress[0].hostname}')

echo "Create the following DNS records:"
echo "$TF_VAR_app_domain CNAME $LB_HOSTNAME"
echo "*.endpoints.$TF_VAR_app_domain CNAME $LB_HOSTNAME"
echo "oauth.$TF_VAR_app_domain CNAME $LB_HOSTNAME"
echo "mcp.$TF_VAR_app_domain CNAME $LB_HOSTNAME"
echo "gateway.$TF_VAR_app_domain CNAME $LB_HOSTNAME"
```

Deploy Agent Pool
1. Uncomment the spacelift_flows_agent_pool module
Now that all backend services are running, uncomment the spacelift_flows_agent_pool module.
2. Run Tofu Apply

```sh
tofu apply
```

Verify Deployment
Configure SES (if using SES)
If you enabled Amazon SES for email delivery (enable_ses = true), you must verify your domain in SES and either request production access or verify individual email addresses as identities before emails can be sent.
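As a quick check of where the domain identity stands, a sketch using the SES v2 CLI (assuming the identity matches your app domain):

```sh
# "true" means the identity is verified for sending.
aws sesv2 get-email-identity \
  --email-identity "$TF_VAR_app_domain" \
  --region "$TF_VAR_aws_region" \
  --query "VerifiedForSendingStatus"
```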
1. Access the Web Interface
Navigate to your domain in a web browser:
```
https://$SPACELIFT_FLOWS_DOMAIN
```

You should see the Spacelift Flows login page.
2. Login
On first access, you'll be prompted to provide your email to log in. If you have correctly configured your SMTP credentials, you should receive an email with a login link. Alternatively, if you have enabled email dev mode, you will find the login link in the server logs, as shown below.
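A quick way to fish the link out of the server logs in dev mode; the grep pattern is a guess, so adjust it to the actual log line:

```sh
kubectl logs -n $TF_VAR_k8s_namespace -l app.kubernetes.io/component=server | grep -i login
```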
3. Test Agent Pool
- Log into the web interface
- Navigate to the sample flows in your project and verify that they work
4. Health Checks
Verify all services are healthy:
```sh
kubectl get pods -n $TF_VAR_k8s_namespace
```

All pods should show STATUS: Running and READY: 1/1.
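If you prefer a scriptable check, you can also block until every Deployment in the namespace reports available:

```sh
kubectl wait deployment --all \
  --for=condition=Available \
  -n $TF_VAR_k8s_namespace --timeout=300s
```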
Configuration Options
Scaling the Deployment
EKS Node Scaling
EKS Auto Mode handles node scaling automatically based on pod resource requests.
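If you want to observe this while pods are being scheduled, watch the node list:

```sh
# New nodes should appear as pending pods request capacity.
kubectl get nodes --watch
```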
Resource Allocation
Update resource requests and limits in values files:
```yaml
# For application services
applicationServices:
  resources:
    requests:
      memory: "512Mi"
      cpu: "500m"
    limits:
      memory: "1Gi"
      cpu: "1000m"
```

Troubleshooting
Common Issues
Pods Stuck in Pending
Check node capacity and resource requests:
```sh
kubectl describe nodes
kubectl get pods -n $TF_VAR_k8s_namespace -o wide
```

EKS Auto Mode should automatically provision nodes, but may take 5-10 minutes.
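If a pod stays Pending longer than that, its scheduling events usually explain why; the pod name below is a placeholder:

```sh
# The Events section at the bottom of the output shows scheduling failures.
kubectl describe pod <pod-name> -n $TF_VAR_k8s_namespace
```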
Agent Connection Problems
Check agent logs in CloudWatch Logs.
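For example, from the CLI. The log group name depends on how the agent pool module configures logging, so treat the prefix and name below as placeholders:

```sh
# Find candidate log groups, then tail the one used by the agents.
aws logs describe-log-groups --region $TF_VAR_aws_region --log-group-name-prefix "/spacelift"
aws logs tail "<agent-log-group>" --follow --region $TF_VAR_aws_region
```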
Logs and Debugging
Application Logs

```sh
# Server logs
kubectl logs -n $TF_VAR_k8s_namespace -l app.kubernetes.io/component=server -f

# Worker logs
kubectl logs -n $TF_VAR_k8s_namespace -l app.kubernetes.io/component=worker -f

# Gateway logs
kubectl logs -n $TF_VAR_k8s_namespace -l app.kubernetes.io/component=gateway -f
```

Infrastructure Logs
```sh
# Check EKS cluster status
aws eks describe-cluster --name $EKS_CLUSTER_NAME --region $TF_VAR_aws_region
```

Maintenance
Updates
Application Updates
Update image tags in Helm values and upgrade:
```sh
helm upgrade spacelift-flows spacelift/spacelift-flows --namespace $TF_VAR_k8s_namespace --values flows-values.yaml
```

Infrastructure Updates
Update OpenTofu configuration and apply:
```sh
tofu plan
tofu apply
```

Re-Apply Configuration Secret
Update the main configuration secret:
```sh
tofu output -raw config_secret_manifest | kubectl apply -f -
```

Re-Apply Ingress Configuration
Update the AWS ingress class:
```sh
tofu output -raw ingress_manifest | kubectl apply -f -
```

Force Restart Application Pods
Restart application pods without changing the deployment configuration. This is useful when you need to pick up configuration changes from Secrets, or force a fresh start of the application:
```sh
# Restart server pods
kubectl rollout restart -n spacelift-flows deployment spacelift-flows-server

# Restart worker pods
kubectl rollout restart -n spacelift-flows deployment spacelift-flows-worker

# Restart gateway pods
kubectl rollout restart -n spacelift-flows deployment spacelift-flows-gateway

# Monitor the server rollout status
kubectl rollout status -n spacelift-flows deployment spacelift-flows-server
```

Cleanup
This will permanently destroy all data. Ensure you have backups if needed.
Remove Application Services

```sh
helm uninstall spacelift-flows --namespace $TF_VAR_k8s_namespace
kubectl delete namespace $TF_VAR_k8s_namespace
```

Destroy Infrastructure
Ensure that you have disabled RDS delete protection and S3 bucket retain on destroy:
```hcl
s3_retain_on_destroy          = false
rds_delete_protection_enabled = false
```

```sh
tofu destroy
```

Manual Cleanup
Some resources may require manual cleanup:
- ACM certificates (if DNS validation records weren’t removed)
- Route53 DNS records created manually
Next Steps
Configure traces with the OpenTelemetry operator.