Shardlyn Threat Model

Overview

This document describes the security architecture, potential threats, and mitigations in the Shardlyn platform. It follows the STRIDE methodology for threat classification.

Examples in this document often use game server workloads because they are a common high-risk/Internet-exposed use case, but the same model applies to web apps, databases, and other workloads managed by Shardlyn.

System Boundaries

┌─────────────────────────────────────────────────────────────────────────────┐
│                              TRUST BOUNDARY 1                               │
│                           (Public Internet)                                 │
│                                                                             │
│   ┌─────────────┐          ┌─────────────┐          ┌─────────────┐        │
│   │   Admin     │          │   Player    │          │  Attacker   │        │
│   │   Browser   │          │   Client    │          │             │        │
│   └──────┬──────┘          └──────┬──────┘          └──────┬──────┘        │
└──────────┼───────────────────────┼───────────────────────┼─────────────────┘
           │ HTTPS                 │ Game Protocol         │ Various
           │                       │                       │
┌──────────┼───────────────────────┼───────────────────────┼─────────────────┐
│          ▼                       │                       ▼                 │
│   ┌─────────────┐                │              ┌─────────────┐            │
│   │   Web UI    │                │              │  Firewall   │            │
│   │   (React)   │                │              │  (iptables) │            │
│   └──────┬──────┘                │              └─────────────┘            │
│          │                       │                                         │
│          │ REST API              │                                         │
│          ▼                       │                                         │
│   ┌─────────────────────────────────────────────────────────────────┐      │
│   │                    TRUST BOUNDARY 2                              │      │
│   │                  (Control Plane DMZ)                            │      │
│   │                                                                  │      │
│   │   ┌─────────────────────────────────────────────────────┐       │      │
│   │   │              Control Plane                           │       │      │
│   │   │   ┌─────────┐  ┌─────────┐  ┌─────────┐            │       │      │
│   │   │   │   API   │  │  Auth   │  │Reconcile│            │       │      │
│   │   │   │ Handler │  │  (JWT)  │  │  Loop   │            │       │      │
│   │   │   └─────────┘  └─────────┘  └─────────┘            │       │      │
│   │   └─────────────────────┬───────────────────────────────┘       │      │
│   │                         │                                        │      │
│   └─────────────────────────┼────────────────────────────────────────┘      │
│                             │                                               │
│   ┌─────────────────────────┼────────────────────────────────────────┐      │
│   │        TRUST BOUNDARY 3 │ (Database Zone)                        │      │
│   │                         ▼                                        │      │
│   │              ┌─────────────────────┐                            │      │
│   │              │     PostgreSQL      │                            │      │
│   │              │   (credentials,     │                            │      │
│   │              │    state, specs)    │                            │      │
│   │              └─────────────────────┘                            │      │
│   └──────────────────────────────────────────────────────────────────┘      │
│                                                                             │
│   ┌─────────────────────────────────────────────────────────────────┐      │
│   │                    TRUST BOUNDARY 4                              │      │
│   │                   (Agent Network)                               │      │
│   │                                                                  │      │
│   │   ┌─────────────┐     ┌─────────────┐     ┌─────────────┐      │      │
│   │   │   Agent 1   │     │   Agent 2   │     │   Agent N   │      │      │
│   │   │   ┌─────┐   │     │   ┌─────┐   │     │   ┌─────┐   │      │      │
│   │   │   │Docker│◄─┼─────┼───│Game │───┼─────┼──►│     │   │      │      │
│   │   │   │     ├───┼─────┼───►Port │◄──┼─────┼───┤     │   │      │      │
│   │   │   └─────┘   │     │   └─────┘   │     │   └─────┘   │      │      │
│   │   └──────┬──────┘     └──────┬──────┘     └──────┬──────┘      │      │
│   │          │                   │                   │              │      │
│   └──────────┼───────────────────┼───────────────────┼──────────────┘      │
│              │ Heartbeat         │                   │ Player               │
│              │ (HTTPS)           │                   │ Connections          │
│              ▼                   │                   │                      │
│       Control Plane ◄────────────┘                   │                      │
│                                                      │                      │
└──────────────────────────────────────────────────────┼──────────────────────┘

                                               Game Traffic


                                               Players (Internet)

Assets

Critical Assets

| Asset | Description | Confidentiality | Integrity | Availability |
|---|---|---|---|---|
| User Credentials | Passwords, API tokens | HIGH | HIGH | MEDIUM |
| JWT Signing Key | Signs all auth tokens | HIGH | HIGH | HIGH |
| Agent Auth Tokens | Authenticate agents | HIGH | HIGH | HIGH |
| Database | All platform state | HIGH | HIGH | HIGH |
| Workload Data | Game worlds, app uploads, databases, configs | MEDIUM | HIGH | HIGH |
| TF State | Cloud credentials | HIGH | HIGH | MEDIUM |

Secondary Assets

| Asset | Description | Confidentiality | Integrity | Availability |
|---|---|---|---|---|
| Workload Specs | Container definitions | LOW | MEDIUM | MEDIUM |
| Metrics Data | Monitoring info | LOW | LOW | LOW |
| Audit Logs | Action history | MEDIUM | HIGH | MEDIUM |

Threat Analysis (STRIDE)

Spoofing

T1: User Authentication Bypass

Threat: Attacker impersonates a legitimate user by guessing, reusing, or hijacking that user's credentials or session.

Vectors:

  • Brute force password guessing
  • Credential stuffing from leaked databases
  • Session hijacking

Mitigations:

  • [x] Bcrypt password hashing (cost=10)
  • [x] JWT with expiration (1 hour)
  • [x] Rate limiting on login (configurable)
  • [x] Account lockout after failed attempts (configurable)
  • [ ] Multi-factor authentication (future)

Residual Risk: MEDIUM
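
The account lockout mitigation above can be sketched as follows. This is a minimal in-memory illustration, not Shardlyn's implementation: the `LoginGuard` type, the threshold of 3 failures, and the 15-minute window are all assumptions chosen for the example, and a real deployment would persist this state and read the limits from configuration.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// LoginGuard (hypothetical) tracks consecutive failed logins per account and
// locks the account for a fixed window once a threshold is reached.
type LoginGuard struct {
	mu          sync.Mutex
	maxFailures int
	lockFor     time.Duration
	failures    map[string]int
	lockedUntil map[string]time.Time
}

func NewLoginGuard(maxFailures int, lockFor time.Duration) *LoginGuard {
	return &LoginGuard{
		maxFailures: maxFailures,
		lockFor:     lockFor,
		failures:    make(map[string]int),
		lockedUntil: make(map[string]time.Time),
	}
}

// Allowed reports whether a login attempt for user may proceed at time now.
func (g *LoginGuard) Allowed(user string, now time.Time) bool {
	g.mu.Lock()
	defer g.mu.Unlock()
	return !now.Before(g.lockedUntil[user])
}

// RecordFailure counts a failed attempt and starts a lockout at the threshold.
func (g *LoginGuard) RecordFailure(user string, now time.Time) {
	g.mu.Lock()
	defer g.mu.Unlock()
	g.failures[user]++
	if g.failures[user] >= g.maxFailures {
		g.lockedUntil[user] = now.Add(g.lockFor)
		g.failures[user] = 0
	}
}

// RecordSuccess clears the failure counter after a successful login.
func (g *LoginGuard) RecordSuccess(user string) {
	g.mu.Lock()
	defer g.mu.Unlock()
	delete(g.failures, user)
}

func main() {
	g := NewLoginGuard(3, 15*time.Minute) // assumed limits, for illustration
	now := time.Now()
	for i := 0; i < 3; i++ {
		g.RecordFailure("alice", now)
	}
	fmt.Println(g.Allowed("alice", now))                     // false: locked out
	fmt.Println(g.Allowed("alice", now.Add(16*time.Minute))) // true: window elapsed
}
```

Keying the lockout on the account rather than the source IP slows credential stuffing from distributed sources, at the cost of letting an attacker lock out a known username; that trade-off is one reason the residual risk stays at MEDIUM until MFA is available.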


T2: Agent Impersonation

Threat: Attacker registers rogue agent to receive workload specs or report false states.

Vectors:

  • Stolen registration token
  • Man-in-the-middle during registration
  • Leaked agent auth token

Mitigations:

  • [x] One-time registration tokens (invalidated on first use)
  • [x] Token expiration (default 24h)
  • [x] Unique agent auth token per agent
  • [x] TLS required for agent communication (configurable)
  • [ ] Agent certificate pinning (future)

Residual Risk: MEDIUM
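
One way to combine the one-time and expiring properties of registration tokens is sketched below. The `TokenStore` type and its in-memory map are hypothetical; storing only a SHA-256 digest of each token is an assumption that mirrors the "secrets stored as hashes where possible" control, not a confirmed detail of Shardlyn's registration flow.

```go
package main

import (
	"crypto/rand"
	"crypto/sha256"
	"encoding/hex"
	"errors"
	"fmt"
	"sync"
	"time"
)

var ErrInvalidToken = errors.New("registration token invalid, expired, or already used")

// TokenStore (hypothetical) keeps only SHA-256 digests of issued tokens, so a
// leaked store does not reveal usable tokens.
type TokenStore struct {
	mu      sync.Mutex
	expires map[[32]byte]time.Time
}

func NewTokenStore() *TokenStore {
	return &TokenStore{expires: make(map[[32]byte]time.Time)}
}

// Issue returns a fresh random token that is valid for ttl starting at now.
func (s *TokenStore) Issue(now time.Time, ttl time.Duration) (string, error) {
	raw := make([]byte, 32)
	if _, err := rand.Read(raw); err != nil {
		return "", err
	}
	token := hex.EncodeToString(raw)
	s.mu.Lock()
	defer s.mu.Unlock()
	s.expires[sha256.Sum256([]byte(token))] = now.Add(ttl)
	return token, nil
}

// Redeem consumes the token. The entry is deleted under the lock before the
// expiry check, so a token can never be redeemed twice, even concurrently.
func (s *TokenStore) Redeem(token string, now time.Time) error {
	key := sha256.Sum256([]byte(token))
	s.mu.Lock()
	defer s.mu.Unlock()
	exp, ok := s.expires[key]
	if !ok {
		return ErrInvalidToken
	}
	delete(s.expires, key)
	if now.After(exp) {
		return ErrInvalidToken
	}
	return nil
}

func main() {
	store := NewTokenStore()
	now := time.Now()
	tok, err := store.Issue(now, 24*time.Hour) // 24h matches the documented default
	if err != nil {
		panic(err)
	}
	fmt.Println(store.Redeem(tok, now)) // <nil>: first use succeeds
	fmt.Println(store.Redeem(tok, now)) // error: token already consumed
}
```

Returning the same error for unknown, expired, and reused tokens avoids telling a caller which of the three conditions applied.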


Tampering

T3: Workload Spec Injection

Threat: Attacker modifies workload spec to run malicious containers.

Vectors:

  • SQL injection in spec storage
  • Malicious spec via API
  • Spec modification in transit

Mitigations:

  • [x] Parameterized SQL queries (pgx)
  • [x] JSON Schema validation of specs
  • [x] RBAC on workload creation
  • [ ] Image allowlist (future)
  • [ ] Spec signing (future)

Residual Risk: MEDIUM


T4: Container Escape

Threat: Malicious container escapes isolation to compromise host.

Vectors:

  • Kernel exploits
  • Docker socket access
  • Privileged containers
  • Host path mounts

Mitigations:

  • [x] No privileged containers by default
  • [x] No host network mode
  • [x] Volume mounts restricted to data directory
  • [ ] Seccomp profiles (future)
  • [ ] AppArmor/SELinux (future)
  • [ ] User namespaces (future)

Residual Risk: MEDIUM-HIGH (Docker inherent risk)
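
The three implemented checks above (no privileged containers, no host networking, mounts confined to the data directory) could be enforced at spec-admission time roughly as follows. The `Spec` struct, its field names, and the `/var/lib/shardlyn/data` path are hypothetical stand-ins; the sketch also rejects privileged mode outright, whereas the platform only disables it by default.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// Spec is a hypothetical, trimmed-down workload spec.
type Spec struct {
	Image       string
	Privileged  bool
	NetworkMode string
	HostMounts  []string // host-side paths of bind mounts
}

// ValidateSpec returns every policy violation found in spec.
// It does not resolve symlinks; a real check would also need to do that.
func ValidateSpec(spec Spec, dataDir string) []string {
	var problems []string
	if spec.Privileged {
		problems = append(problems, "privileged containers are not allowed")
	}
	if spec.NetworkMode == "host" {
		problems = append(problems, "host network mode is not allowed")
	}
	root := filepath.Clean(dataDir)
	for _, m := range spec.HostMounts {
		// Clean collapses ".." segments so "<data>/../etc" cannot slip through.
		p := filepath.Clean(m)
		if p != root && !strings.HasPrefix(p, root+string(filepath.Separator)) {
			problems = append(problems, fmt.Sprintf("mount %q is outside %s", m, root))
		}
	}
	return problems
}

func main() {
	const dataDir = "/var/lib/shardlyn/data" // assumed location
	bad := Spec{
		Image:       "example/game:latest",
		Privileged:  true,
		NetworkMode: "host",
		HostMounts:  []string{"/var/run/docker.sock", dataDir + "/../../etc"},
	}
	for _, p := range ValidateSpec(bad, dataDir) {
		fmt.Println(p)
	}
}
```

Comparing against `root` plus a trailing separator matters: a plain prefix check would accept a sibling directory such as `/var/lib/shardlyn/data-evil`.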


T5: Database Tampering

Threat: Attacker modifies database directly to escalate privileges or corrupt state.

Vectors:

  • SQL injection
  • Direct database access
  • Backup restoration of old data

Mitigations:

  • [x] Parameterized queries throughout
  • [x] Database network isolation (docker network)
  • [x] Strong database password
  • [ ] Database encryption at rest (production)
  • [ ] Regular integrity checks (future)

Residual Risk: LOW


Repudiation

T6: Action Denial

Threat: User denies performing destructive action (e.g., deleting instances).

Vectors:

  • Shared accounts
  • Session hijacking
  • Insider threats

Mitigations:

  • [x] Audit logging with user ID, timestamp, action
  • [x] Correlation IDs for request tracing
  • [ ] Immutable audit log storage (future)
  • [ ] Audit log integrity verification (future)

Residual Risk: LOW
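
The planned audit log integrity verification could take the form of a hash chain, where each entry commits to the one before it. The sketch below is one possible design under that assumption, not the planned implementation; the `Entry` fields follow the logged attributes listed above (user ID, timestamp, action, correlation ID), but their names are illustrative.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// Entry is a hypothetical audit record. Hash covers the entry's fields plus
// the previous entry's hash, which chains the log together.
type Entry struct {
	UserID        string
	Action        string
	Timestamp     string // RFC 3339
	CorrelationID string
	PrevHash      string
	Hash          string
}

func hashEntry(e Entry) string {
	// %q quotes each field so field boundaries stay unambiguous.
	sum := sha256.Sum256([]byte(fmt.Sprintf("%q|%q|%q|%q|%q",
		e.UserID, e.Action, e.Timestamp, e.CorrelationID, e.PrevHash)))
	return hex.EncodeToString(sum[:])
}

// Append links e to the tail of chain and returns the extended chain.
func Append(chain []Entry, e Entry) []Entry {
	e.PrevHash = ""
	if len(chain) > 0 {
		e.PrevHash = chain[len(chain)-1].Hash
	}
	e.Hash = hashEntry(e)
	return append(chain, e)
}

// Verify returns the index of the first entry that fails verification,
// or -1 if the whole chain is intact.
func Verify(chain []Entry) int {
	prev := ""
	for i, e := range chain {
		if e.PrevHash != prev || e.Hash != hashEntry(e) {
			return i
		}
		prev = e.Hash
	}
	return -1
}

func main() {
	var chain []Entry
	chain = Append(chain, Entry{UserID: "u1", Action: "instance.create", Timestamp: "2024-01-01T10:00:00Z", CorrelationID: "req-1"})
	chain = Append(chain, Entry{UserID: "u1", Action: "instance.delete", Timestamp: "2024-01-01T10:05:00Z", CorrelationID: "req-2"})
	fmt.Println(Verify(chain)) // -1: intact

	chain[0].UserID = "u2" // rewrite history
	fmt.Println(Verify(chain)) // 0: first entry no longer matches its hash
}
```

A chain like this is only tamper-evident if the latest hash is also recorded somewhere the attacker cannot write, since anyone with full write access to the log could recompute every hash.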


Information Disclosure

T7: Credential Exposure

Threat: Sensitive credentials leaked through logs, APIs, or storage.

Vectors:

  • Credentials in error messages
  • Debug logging of secrets
  • Insecure storage

Mitigations:

  • [x] Password hashes excluded from API responses (json:"-")
  • [x] TF state excluded from API responses
  • [x] Structured logging (no credential interpolation)
  • [x] Secrets stored as hashes where possible
  • [ ] Secret scanning in CI (future)

Residual Risk: LOW
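
The `json:"-"` mitigation relies on a standard `encoding/json` struct tag: a field tagged this way is skipped by both `json.Marshal` and `json.Unmarshal`. A minimal illustration, using a hypothetical `User` model whose field names are assumptions:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// User is a hypothetical model; the field names are illustrative.
type User struct {
	ID           string `json:"id"`
	Email        string `json:"email"`
	PasswordHash string `json:"-"` // never marshaled into API responses
}

// marshalUser renders u the way an API response would.
func marshalUser(u User) (string, error) {
	b, err := json.Marshal(u)
	return string(b), err
}

func main() {
	out, err := marshalUser(User{
		ID:           "u1",
		Email:        "a@example.com",
		PasswordHash: "$2a$10$placeholder", // placeholder, not a real hash
	})
	if err != nil {
		panic(err)
	}
	fmt.Println(out) // {"id":"u1","email":"a@example.com"}
}
```

Because the tag also applies on decode, a request body cannot set `PasswordHash` directly. The tag does not protect other output paths, such as `fmt.Printf("%+v", user)` in a log line, which is why structured logging is listed as a separate control.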


T8: Workload Spec Leakage

Threat: Workload specs whose environment variables contain secrets are exposed.

Vectors:

  • Unauthorized API access
  • Log exposure
  • Database dump

Mitigations:

  • [x] RBAC on workload access
  • [ ] Environment variable encryption at rest (future)
  • [ ] Secret management integration (Vault) (future)

Residual Risk: MEDIUM


T9: Network Sniffing

Threat: Attacker captures sensitive data from network traffic.

Vectors:

  • Unencrypted control plane traffic
  • Unencrypted agent heartbeats
  • Man-in-the-middle attacks

Mitigations:

  • [x] TLS enforcement in production (configurable)
  • [ ] Certificate validation (TODO)
  • [x] Sensitive data not sent in query parameters

Residual Risk: HIGH (without TLS)


Denial of Service

T10: API Exhaustion

Threat: Attacker overwhelms control plane with requests.

Vectors:

  • Login brute force
  • Instance creation spam
  • WebSocket connection flood

Mitigations:

  • [x] Rate limiting per IP/user (configurable)
  • [ ] Request size limits
  • [x] Database connection pooling (prevents exhaustion)
  • [x] WebSocket connection limits (configurable)

Residual Risk: MEDIUM
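
The controls summary describes rate limiting as a token bucket per IP/user. A minimal sketch of that technique follows; the `Limiter` type, its parameters, and the in-memory map are assumptions for illustration, and a production limiter would also evict idle buckets so the map cannot grow without bound.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

type bucket struct {
	tokens float64
	last   time.Time
}

// Limiter (hypothetical) is a per-key token bucket: each key (an IP or user
// ID) may burst up to `burst` requests and refills at `rate` tokens per second.
type Limiter struct {
	mu      sync.Mutex
	rate    float64
	burst   float64
	buckets map[string]*bucket
}

func NewLimiter(rate, burst float64) *Limiter {
	return &Limiter{rate: rate, burst: burst, buckets: make(map[string]*bucket)}
}

// Allow spends one token for key at time now and reports whether the
// request may proceed.
func (l *Limiter) Allow(key string, now time.Time) bool {
	l.mu.Lock()
	defer l.mu.Unlock()
	b, ok := l.buckets[key]
	if !ok {
		b = &bucket{tokens: l.burst, last: now}
		l.buckets[key] = b
	}
	// Refill for the time elapsed since the last request, capped at burst.
	if elapsed := now.Sub(b.last).Seconds(); elapsed > 0 {
		b.tokens += elapsed * l.rate
		if b.tokens > l.burst {
			b.tokens = l.burst
		}
		b.last = now
	}
	if b.tokens < 1 {
		return false
	}
	b.tokens--
	return true
}

func main() {
	lim := NewLimiter(1, 2) // assumed limits: 1 request/second, burst of 2
	now := time.Now()
	fmt.Println(lim.Allow("203.0.113.7", now))                  // true
	fmt.Println(lim.Allow("203.0.113.7", now))                  // true
	fmt.Println(lim.Allow("203.0.113.7", now))                  // false: burst spent
	fmt.Println(lim.Allow("203.0.113.7", now.Add(time.Second))) // true: one token refilled
}
```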


T11: Resource Exhaustion

Threat: Malicious workload consumes all node resources.

Vectors:

  • CPU bomb
  • Memory exhaustion
  • Disk fill

Mitigations:

  • [x] Resource limits enforced (CPU, memory)
  • [x] Volume size limits in spec
  • [ ] Per-user resource quotas (future)
  • [ ] Automatic workload eviction (future)

Residual Risk: MEDIUM


T12: Agent Starvation

Threat: Control plane is unavailable, so agents cannot receive desired state.

Vectors:

  • Control plane crash
  • Network partition
  • Database failure

Mitigations:

  • [x] Agents continue running existing containers
  • [x] Idempotent operations (safe retries)
  • [ ] Control plane HA (future)
  • [ ] Agent local caching of last known state (future)

Residual Risk: MEDIUM


Elevation of Privilege

T13: RBAC Bypass

Threat: User gains admin privileges without authorization.

Vectors:

  • JWT manipulation
  • Role escalation bugs
  • Horizontal privilege escalation

Mitigations:

  • [x] Role stored in JWT, validated server-side
  • [x] RBAC checks on all admin endpoints
  • [x] Users can view only their own profile (unless admin)
  • [ ] Regular RBAC audit (operational)

Residual Risk: LOW
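
Storing the role in the JWT is only safe if the server verifies the signature before reading any claim. The sketch below shows that ordering using the standard library alone. HS256, the `Claims` field names, and the demo key are assumptions, since this document does not state which signing algorithm or claim layout Shardlyn uses.

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/base64"
	"encoding/json"
	"errors"
	"fmt"
	"strings"
)

// Claims is a hypothetical claim set.
type Claims struct {
	Sub  string `json:"sub"`
	Role string `json:"role"`
	Exp  int64  `json:"exp"` // Unix seconds
}

var enc = base64.RawURLEncoding

func sign(input string, key []byte) []byte {
	mac := hmac.New(sha256.New, key)
	mac.Write([]byte(input))
	return mac.Sum(nil)
}

// Issue builds a compact HS256 token for c.
func Issue(c Claims, key []byte) (string, error) {
	payload, err := json.Marshal(c)
	if err != nil {
		return "", err
	}
	input := enc.EncodeToString([]byte(`{"alg":"HS256","typ":"JWT"}`)) + "." + enc.EncodeToString(payload)
	return input + "." + enc.EncodeToString(sign(input, key)), nil
}

// Verify pins the algorithm, checks the signature, and only then decodes the
// claims and checks expiry. nowUnix is passed in to keep the check deterministic.
func Verify(token string, key []byte, nowUnix int64) (Claims, error) {
	parts := strings.Split(token, ".")
	if len(parts) != 3 {
		return Claims{}, errors.New("malformed token")
	}
	var hdr struct {
		Alg string `json:"alg"`
	}
	hdrJSON, err := enc.DecodeString(parts[0])
	if err != nil || json.Unmarshal(hdrJSON, &hdr) != nil || hdr.Alg != "HS256" {
		return Claims{}, errors.New("unexpected algorithm")
	}
	sig, err := enc.DecodeString(parts[2])
	if err != nil || !hmac.Equal(sig, sign(parts[0]+"."+parts[1], key)) {
		return Claims{}, errors.New("bad signature")
	}
	var c Claims
	payload, err := enc.DecodeString(parts[1])
	if err != nil || json.Unmarshal(payload, &c) != nil {
		return Claims{}, errors.New("malformed claims")
	}
	if nowUnix >= c.Exp {
		return Claims{}, errors.New("token expired")
	}
	return c, nil
}

func main() {
	key := []byte("demo-signing-key") // hypothetical; never hard-code a real key
	tok, err := Issue(Claims{Sub: "u1", Role: "user", Exp: 2000}, key)
	if err != nil {
		panic(err)
	}
	c, err := Verify(tok, key, 1000)
	fmt.Println(c.Role, err) // user <nil>

	// JWT manipulation: swap in an "admin" payload but keep the old signature.
	forged, _ := Issue(Claims{Sub: "u1", Role: "admin", Exp: 2000}, []byte("attacker-key"))
	parts := strings.Split(tok, ".")
	parts[1] = strings.Split(forged, ".")[1]
	_, err = Verify(strings.Join(parts, "."), key, 1000)
	fmt.Println(err) // bad signature
}
```

Rejecting any header whose `alg` is not the expected value closes the classic `"alg":"none"` downgrade, and `hmac.Equal` compares signatures in constant time.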


T14: Agent Privilege Escalation

Threat: Compromised agent gains control plane access.

Vectors:

  • Agent credential reuse
  • Control plane API exposure to agents

Mitigations:

  • [x] Agents have separate auth mechanism (X-Agent-Token)
  • [x] Agents cannot access user management APIs
  • [x] Agent tokens scoped to specific agent ID
  • [ ] Network segmentation (agents in separate VLAN)

Residual Risk: MEDIUM
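
Scoping an agent token to a specific agent ID means the control plane must check the token against the agent it claims to come from, not merely against the set of valid tokens. A minimal sketch of that check as HTTP middleware follows. `X-Agent-Token` comes from the mitigation list above; the `X-Agent-ID` header, the `AgentRegistry` type, and the `/heartbeat` path are hypothetical, and the map is not safe for concurrent writes.

```go
package main

import (
	"crypto/sha256"
	"crypto/subtle"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// AgentRegistry (hypothetical) maps agent IDs to the SHA-256 digest of the
// token issued to that agent.
type AgentRegistry map[string][32]byte

func (r AgentRegistry) Register(agentID, token string) {
	r[agentID] = sha256.Sum256([]byte(token))
}

// Authenticate reports whether token is the one issued to agentID.
// The comparison runs even for unknown agents and is constant-time.
func (r AgentRegistry) Authenticate(agentID, token string) bool {
	want, ok := r[agentID]
	got := sha256.Sum256([]byte(token))
	return subtle.ConstantTimeCompare(want[:], got[:]) == 1 && ok
}

// RequireAgent rejects requests whose X-Agent-Token does not belong to the
// agent named in the (assumed) X-Agent-ID header.
func RequireAgent(reg AgentRegistry, next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, req *http.Request) {
		id := req.Header.Get("X-Agent-ID")
		if !reg.Authenticate(id, req.Header.Get("X-Agent-Token")) {
			http.Error(w, "unauthorized", http.StatusUnauthorized)
			return
		}
		next.ServeHTTP(w, req)
	})
}

func main() {
	reg := AgentRegistry{}
	reg.Register("agent-1", "token-one")
	reg.Register("agent-2", "token-two")
	h := RequireAgent(reg, http.HandlerFunc(func(w http.ResponseWriter, _ *http.Request) {
		fmt.Fprint(w, "ok")
	}))

	// A token stolen from agent-1 is presented on behalf of agent-2.
	req := httptest.NewRequest(http.MethodPost, "/heartbeat", nil)
	req.Header.Set("X-Agent-ID", "agent-2")
	req.Header.Set("X-Agent-Token", "token-one")
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, req)
	fmt.Println(rec.Code) // 401
}
```

Mounting this middleware only on agent routes, with user and admin routes behind the separate JWT path, is what keeps a stolen agent token from reaching user management APIs.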


Security Controls Summary

Implemented

| Control | Description | Threats Mitigated |
|---|---|---|
| Password Hashing | bcrypt cost=10 | T1 |
| JWT Authentication | Signed, expiring tokens (15m access) | T1, T13 |
| One-Time Tokens | Agent registration tokens | T2 |
| Parameterized Queries | SQL injection prevention | T3, T5 |
| JSON Schema Validation | Spec format enforcement | T3 |
| RBAC | Role-based access control | T3, T7, T13 |
| Audit Logging | Action tracking | T6 |
| Field Exclusion | Secrets excluded from JSON | T7 |
| Resource Limits | Container CPU/memory caps | T11 |
| Connection Pooling | DB connection management | T10 |
| Rate Limiting | Token bucket per IP/user | T1, T10 |
| Account Lockout | Login lockout after failed attempts | T1 |
| TLS Enforcement | Configurable HTTPS requirement | T2, T9 |
| WebSocket Connection Limits | Cap concurrent WS sessions | T10 |

Planned (TODO)

| Control | Priority | Threats Mitigated |
|---|---|---|
| Image Allowlist | MEDIUM | T3 |
| Secret Management | MEDIUM | T8 |
| Seccomp Profiles | LOW | T4 |
| Control Plane HA | LOW | T12 |

Not Planned (Accepted Risk)

| Control | Reason |
|---|---|
| Container Signing | Complexity vs. risk for MVP |
| Network Policies | Kubernetes-only feature |
| Hardware Security Modules | Cost prohibitive for target users |

Attack Scenarios

Scenario 1: Malicious Admin

Attacker: Insider with admin access

Goal: Exfiltrate workload data or disrupt service

Attack Path:

  1. Create malicious workload with data exfiltration script
  2. Deploy to node with valuable workload data
  3. Container mounts data volume, exfiltrates via network

Mitigations:

  • Audit logging tracks who created workload
  • Volume mounts restricted to shardlyn data directory
  • Network monitoring can detect unusual egress

Scenario 2: Compromised Agent

Attacker: External with agent node access

Goal: Pivot to control plane or other agents

Attack Path:

  1. Compromise agent node (e.g., via vulnerable public workload such as a game server)
  2. Extract agent auth token from filesystem
  3. Attempt to use token for broader access

Mitigations:

  • Agent token scoped to single agent
  • Cannot access admin APIs
  • Control plane validates agent ID matches token

Scenario 3: Credential Stuffing

Attacker: External with leaked credential database

Goal: Gain user/admin access

Attack Path:

  1. Obtain leaked email/password combinations
  2. Automated login attempts against Shardlyn
  3. Successful login with reused credentials

Mitigations:

  • Rate limiting on login endpoint (implemented, see T1)
  • Account lockout after failures (implemented, see T1)
  • Breach detection notifications (TODO)

Security Recommendations

For Operators

  1. Enable TLS for all production deployments
  2. Rotate secrets (JWT key, database password) regularly
  3. Monitor audit logs for suspicious activity
  4. Keep components updated for security patches
  5. Segment the network between control plane and agents
  6. Encrypt backups of the database and TF state

For Users

  1. Use strong, unique passwords
  2. Don't share accounts
  3. Review workload specs before deployment
  4. Monitor resource usage for anomalies
  5. Report suspicious activity to admins

Incident Response

Detection

  • Monitor shardlyn_http_requests_total{status=~"4.."} for client errors, including auth failures (401/403)
  • Alert on shardlyn_instances_by_state{state="error"} spikes
  • Review audit logs for unusual patterns

Containment

  1. Revoke compromised user/agent tokens
  2. Isolate affected nodes (network/firewall)
  3. Stop suspicious instances

Recovery

  1. Rotate all secrets (JWT key, passwords)
  2. Regenerate agent tokens
  3. Restore from known-good backup if needed
  4. Review audit logs for full scope

Post-Incident

  1. Document timeline and actions
  2. Update threat model with new vectors
  3. Implement additional controls as needed
