
Shardlyn Architecture

Overview

Shardlyn is a cloud-native, BYO-cloud control plane for deploying and managing containerized workloads across multiple cloud providers. It follows a declarative, pull-based architecture where lightweight agents on your servers report state and receive desired state from the Shardlyn control plane. Deploy game servers, web applications, databases, and more with a unified management experience.

System Architecture

Components

Control Plane (Managed)

The control plane is fully managed by Shardlyn and handles:

  • API Server: REST API for the dashboard and external integrations
  • Authentication: JWT-based auth with MFA support and GitHub OAuth
  • Authorization: RBAC with organization-level isolation
  • Reconciler: Computes desired state and distributes it to agents
  • Provisioner: Provisions cloud infrastructure via Terraform (AWS, GCP, Hetzner, OCI)
  • Planner: Resource sizing and bin-packing algorithms
  • WebSocket Hub: Real-time log streaming and interactive console
  • Billing: Subscription management via Stripe
  • Alerting: Rule-based alerts with email notifications
  • DNS Management: Cloudflare DNS integration for custom domains
  • SSH CA: Certificate authority for secure server access
  • Backup Manager: Scheduled backups to S3-compatible storage
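The Planner entry mentions bin-packing but does not name an algorithm. As one illustration, here is first-fit decreasing placement in Go. It assumes placement considers memory only; the `Node` and `Request` types and `PlaceFFD` are hypothetical, and Shardlyn's planner may use a different algorithm or more resource dimensions.

```go
package main

import (
	"fmt"
	"sort"
)

// Node is a hypothetical view of a server's free memory.
type Node struct {
	Name    string
	FreeMiB int
}

// Request is a hypothetical instance memory request.
type Request struct {
	Name   string
	MemMiB int
}

// PlaceFFD assigns each request to the first node with enough free memory,
// considering the largest requests first (first-fit decreasing).
// It returns request name -> node name; requests that fit nowhere are omitted.
func PlaceFFD(nodes []Node, reqs []Request) map[string]string {
	free := make([]int, len(nodes))
	for i, n := range nodes {
		free[i] = n.FreeMiB
	}
	// Sort a copy so the caller's slice is left untouched.
	sorted := append([]Request(nil), reqs...)
	sort.SliceStable(sorted, func(i, j int) bool { return sorted[i].MemMiB > sorted[j].MemMiB })

	placement := make(map[string]string)
	for _, r := range sorted {
		for i := range nodes {
			if free[i] >= r.MemMiB {
				free[i] -= r.MemMiB
				placement[r.Name] = nodes[i].Name
				break
			}
		}
	}
	return placement
}

func main() {
	nodes := []Node{{"node-a", 4096}, {"node-b", 2048}}
	reqs := []Request{{"web", 1024}, {"db", 3072}, {"game", 2048}}
	fmt.Println(PlaceFFD(nodes, reqs))
}
```

Placing the largest requests first tends to leave fewer unusable fragments than placing them in arrival order.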

Agent (Runs on Your Nodes)

The agent is a lightweight Go binary that runs on each of your servers:

  • Registers with the control plane using a one-time bootstrap token
  • Reports heartbeat with resource usage and container states
  • Receives desired state from the control plane
  • Applies changes via Docker API (create, start, stop, remove)
  • Syncs Git repositories to container volumes (for Git Deploy)
  • Exposes Prometheus metrics for observability

Data Flow

State Machines

Instance States

Node States

Security Model

Authentication

Shardlyn supports multiple authentication methods:

  • Email/Password: With bcrypt hashing and optional MFA (TOTP)
  • GitHub OAuth: Sign in with your GitHub account
  • API Tokens: For programmatic access and CI/CD integrations
  • SSH Certificates: Signed by Shardlyn's CA for secure server access

Agent Authentication

1. You create a node registration token in the dashboard
2. The agent is installed on your server with the token
3. Agent calls the control plane with the token (one-time use)
4. Control plane returns a persistent auth token
5. Agent uses the auth token for all subsequent communication

RBAC Model

Resource        Admin   User
Users           CRUD    R (self)
Nodes           CRUD    R
Workloads       CRUD    CRUD
Instances       CRUD    CRUD
Provisioning    CRUD    R
Organizations   CRUD    R
Billing         CRUD    R
Audit Logs      R       -
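The table can be encoded as a lookup from role and resource to the allowed CRUD letters. This sketch assumes two roles named as in the table, interprets "R (self)" as "users may read only their own record", and leaves out organization-level isolation; `Allowed` and the resource keys are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

type Role string

const (
	Admin Role = "admin"
	User  Role = "user"
)

// permissions mirrors the RBAC table: role -> resource -> allowed actions,
// written as letters from "CRUD". A missing entry means no access.
var permissions = map[Role]map[string]string{
	Admin: {
		"users": "CRUD", "nodes": "CRUD", "workloads": "CRUD", "instances": "CRUD",
		"provisioning": "CRUD", "organizations": "CRUD", "billing": "CRUD", "audit_logs": "R",
	},
	User: {
		"users": "R", "nodes": "R", "workloads": "CRUD", "instances": "CRUD",
		"provisioning": "R", "organizations": "R", "billing": "R",
	},
}

// Allowed reports whether role may perform action ('C', 'R', 'U' or 'D')
// on resource. isSelf says whether the target user record is the caller's own.
func Allowed(role Role, resource string, action byte, isSelf bool) bool {
	actions, ok := permissions[role][resource]
	if !ok || strings.IndexByte(actions, action) < 0 {
		return false
	}
	// Users may read only their own user record ("R (self)" in the table).
	if role == User && resource == "users" && !isSelf {
		return false
	}
	return true
}

func main() {
	fmt.Println(Allowed(User, "workloads", 'C', false)) // true
	fmt.Println(Allowed(User, "nodes", 'D', false))     // false
	fmt.Println(Allowed(User, "users", 'R', true))      // true
}
```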

Key Design Decisions

Pull-Based Communication

Agents poll the control plane through periodic heartbeats instead of the control plane pushing to agents.

Why this matters:

  • Simpler networking — no inbound ports needed on your servers
  • Works behind NAT and firewalls
  • Agents can be offline without affecting other nodes
  • Natural rate limiting: control plane load is bounded by the number of agents and the heartbeat interval

Stateless Control Plane

All state lives in PostgreSQL, so control plane instances keep no local state and can be scaled horizontally.

Idempotent Operations

All agent operations are idempotent. Creating an already-running container is a no-op. This enables safe retries, crash recovery, and multiple reconciliation loops.
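Idempotency usually means "ensure" operations that check current state before acting. This sketch uses a hypothetical in-memory `fakeRuntime` in place of the Docker API. With real Docker, an agent could get the same effect by inspecting the container first, or by treating the Engine API's 304 response from starting an already-started container as success.

```go
package main

import "fmt"

// fakeRuntime is a hypothetical stand-in for the Docker API:
// container name -> "running" or "stopped".
type fakeRuntime struct {
	containers map[string]string
	creates    int // counts real create calls, to show retries do not repeat them
}

// EnsureRunning brings the named container to "running". Calling it any number
// of times leaves the runtime in the same state as calling it once.
func (r *fakeRuntime) EnsureRunning(name string) {
	state, exists := r.containers[name]
	if !exists {
		r.containers[name] = "stopped" // create
		r.creates++
		state = "stopped"
	}
	if state != "running" {
		r.containers[name] = "running" // start
	}
}

// EnsureRemoved deletes the container if present; removing a missing one is a no-op.
func (r *fakeRuntime) EnsureRemoved(name string) {
	delete(r.containers, name)
}

func main() {
	rt := &fakeRuntime{containers: map[string]string{}}
	for i := 0; i < 3; i++ {
		rt.EnsureRunning("web") // e.g. retried after an agent restart
	}
	fmt.Println(rt.containers["web"], rt.creates) // running 1
	rt.EnsureRemoved("web")
	rt.EnsureRemoved("web")         // second removal is a no-op
	fmt.Println(len(rt.containers)) // 0
}
```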

Declarative Configuration

Users declare desired state (workload spec), and the system converges to it. No imperative "start this container" commands.

Performance

Setting              Default          Notes
Heartbeat interval   10s              Lower = faster updates, more load
PostgreSQL pool      10 connections   Configurable per deployment
Prometheus scrape    15s              Configurable

Built for teams that want control of their own infrastructure.