Shardlyn Threat Model

Overview

This document describes the security architecture, potential threats, and mitigations in the Shardlyn platform. It follows the STRIDE methodology for threat classification.

Examples in this document often use game server workloads because they are a common high-risk/Internet-exposed use case, but the same model applies to web apps, databases, and other workloads managed by Shardlyn.

Related documents:

Architecture: System design and component overview
API Reference: Authentication and authorization endpoints
Deployment Guide: Production hardening steps

System Boundaries

┌─────────────────────────────────────────────────────────────────────────────┐
│                              TRUST BOUNDARY 1                               │
│                           (Public Internet)                                 │
│                                                                             │
│   ┌─────────────┐          ┌─────────────┐          ┌─────────────┐        │
│   │   Admin     │          │   Player    │          │  Attacker   │        │
│   │   Browser   │          │   Client    │          │             │        │
│   └──────┬──────┘          └──────┬──────┘          └──────┬──────┘        │
└──────────┼───────────────────────┼───────────────────────┼─────────────────┘
           │ HTTPS                 │ Game Protocol         │ Various
           │                       │                       │
┌──────────┼───────────────────────┼───────────────────────┼─────────────────┐
│          ▼                       │                       ▼                 │
│   ┌─────────────┐                │              ┌─────────────┐            │
│   │   Web UI    │                │              │  Firewall   │            │
│   │   (React)   │                │              │  (iptables) │            │
│   └──────┬──────┘                │              └─────────────┘            │
│          │                       │                                         │
│          │ REST API              │                                         │
│          ▼                       │                                         │
│   ┌─────────────────────────────────────────────────────────────────┐      │
│   │                    TRUST BOUNDARY 2                              │      │
│   │                  (Control Plane DMZ)                            │      │
│   │                                                                  │      │
│   │   ┌─────────────────────────────────────────────────────┐       │      │
│   │   │              Control Plane                           │       │      │
│   │   │   ┌─────────┐  ┌─────────┐  ┌─────────┐            │       │      │
│   │   │   │   API   │  │  Auth   │  │Reconcile│            │       │      │
│   │   │   │ Handler │  │  (JWT)  │  │  Loop   │            │       │      │
│   │   │   └─────────┘  └─────────┘  └─────────┘            │       │      │
│   │   └─────────────────────┬───────────────────────────────┘       │      │
│   │                         │                                        │      │
│   └─────────────────────────┼────────────────────────────────────────┘      │
│                             │                                               │
│   ┌─────────────────────────┼────────────────────────────────────────┐      │
│   │        TRUST BOUNDARY 3 │ (Database Zone)                        │      │
│   │                         ▼                                        │      │
│   │              ┌─────────────────────┐                            │      │
│   │              │     PostgreSQL      │                            │      │
│   │              │   (credentials,     │                            │      │
│   │              │    state, specs)    │                            │      │
│   │              └─────────────────────┘                            │      │
│   └──────────────────────────────────────────────────────────────────┘      │
│                                                                             │
│   ┌─────────────────────────────────────────────────────────────────┐      │
│   │                    TRUST BOUNDARY 4                              │      │
│   │                   (Agent Network)                               │      │
│   │                                                                  │      │
│   │   ┌─────────────┐     ┌─────────────┐     ┌─────────────┐      │      │
│   │   │   Agent 1   │     │   Agent 2   │     │   Agent N   │      │      │
│   │   │   ┌─────┐   │     │   ┌─────┐   │     │   ┌─────┐   │      │      │
│   │   │   │Docker│◄─┼─────┼───│Game │───┼─────┼──►│     │   │      │      │
│   │   │   │     ├───┼─────┼───►Port │◄──┼─────┼───┤     │   │      │      │
│   │   │   └─────┘   │     │   └─────┘   │     │   └─────┘   │      │      │
│   │   └──────┬──────┘     └──────┬──────┘     └──────┬──────┘      │      │
│   │          │                   │                   │              │      │
│   └──────────┼───────────────────┼───────────────────┼──────────────┘      │
│              │ Heartbeat         │                   │ Player               │
│              │ (HTTPS)           │                   │ Connections          │
│              ▼                   │                   │                      │
│       Control Plane ◄────────────┘                   │                      │
│                                                      │                      │
└──────────────────────────────────────────────────────┼──────────────────────┘
                                                       │
                                               Game Traffic
                                                       │
                                                       ▼
                                               Players (Internet)

Assets

Critical Assets

Asset	Description	Confidentiality	Integrity	Availability
User Credentials	Passwords, API tokens	HIGH	HIGH	MEDIUM
JWT Signing Key	Signs all auth tokens	HIGH	HIGH	HIGH
Agent Auth Tokens	Authenticate agents	HIGH	HIGH	HIGH
Database	All platform state	HIGH	HIGH	HIGH
Workload Data	Game worlds, app uploads, databases, configs	MEDIUM	HIGH	HIGH
TF State	Terraform state; may contain cloud credentials	HIGH	HIGH	MEDIUM

Secondary Assets

Asset	Description	Confidentiality	Integrity	Availability
Workload Specs	Container definitions	LOW	MEDIUM	MEDIUM
Metrics Data	Monitoring info	LOW	LOW	LOW
Audit Logs	Action history	MEDIUM	HIGH	MEDIUM

Threat Analysis (STRIDE)

Spoofing

T1: User Authentication Bypass

Threat: Attacker impersonates legitimate user without credentials.

Vectors:

Brute force password guessing
Credential stuffing from leaked databases
Session hijacking

Mitigations:

[x] Bcrypt password hashing (cost=10)
[x] JWT with expiration (1 hour)
[x] Rate limiting on login (configurable)
[x] Account lockout after failed attempts (configurable)
[ ] Multi-factor authentication (future)

Residual Risk: MEDIUM
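
The account lockout mitigation can be sketched as follows. This is a minimal in-memory illustration, not Shardlyn's implementation: the type `lockoutTracker`, its methods, and the threshold and duration values are all assumptions made for the example.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// lockoutTracker counts consecutive failed logins per account.
// maxFailures and lockFor are illustrative settings, not Shardlyn defaults.
type lockoutTracker struct {
	mu          sync.Mutex
	maxFailures int
	lockFor     time.Duration
	failures    map[string]int
	lockedUntil map[string]time.Time
}

func newLockoutTracker(maxFailures int, lockFor time.Duration) *lockoutTracker {
	return &lockoutTracker{
		maxFailures: maxFailures,
		lockFor:     lockFor,
		failures:    make(map[string]int),
		lockedUntil: make(map[string]time.Time),
	}
}

// Locked reports whether the account is currently locked out.
func (l *lockoutTracker) Locked(account string) bool {
	l.mu.Lock()
	defer l.mu.Unlock()
	return time.Now().Before(l.lockedUntil[account])
}

// RecordFailure counts a failed attempt and starts a lockout at the threshold.
func (l *lockoutTracker) RecordFailure(account string) {
	l.mu.Lock()
	defer l.mu.Unlock()
	l.failures[account]++
	if l.failures[account] >= l.maxFailures {
		l.lockedUntil[account] = time.Now().Add(l.lockFor)
		delete(l.failures, account)
	}
}

// RecordSuccess resets the counter after a successful login.
func (l *lockoutTracker) RecordSuccess(account string) {
	l.mu.Lock()
	defer l.mu.Unlock()
	delete(l.failures, account)
}

func main() {
	tracker := newLockoutTracker(3, 15*time.Minute)
	for i := 0; i < 3; i++ {
		tracker.RecordFailure("alice")
	}
	fmt.Println("alice locked:", tracker.Locked("alice"))
	fmt.Println("bob locked:", tracker.Locked("bob"))
}
```

A login handler would call `Locked` before checking the password and `RecordFailure` or `RecordSuccess` afterwards. Note that lockout lets an attacker lock out a known account on purpose, which is one reason to pair it with per-IP rate limiting.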

T2: Agent Impersonation

Threat: Attacker registers rogue agent to receive workload specs or report false states.

Vectors:

Stolen registration token
Man-in-the-middle during registration
Leaked agent auth token

Mitigations:

[x] One-time registration tokens (invalidated after first use)
[x] Token expiration (default 24h)
[x] Unique agent auth token per agent
[x] TLS required for agent communication (configurable)
[ ] Agent certificate pinning (future)

Residual Risk: MEDIUM
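
One way to get single-use, expiring registration tokens is sketched below. It assumes an in-memory store and hypothetical names (`tokenStore`, `Issue`, `Redeem`); a real deployment would more likely keep the digests in the database. Only SHA-256 digests of tokens are stored, in line with the "secrets stored as hashes where possible" mitigation under T7.

```go
package main

import (
	"crypto/rand"
	"crypto/sha256"
	"encoding/hex"
	"errors"
	"fmt"
	"sync"
	"time"
)

var errInvalidToken = errors.New("registration token invalid, expired, or already used")

// tokenStore keeps only SHA-256 digests of issued tokens (hypothetical storage).
type tokenStore struct {
	mu      sync.Mutex
	expires map[[32]byte]time.Time
}

func newTokenStore() *tokenStore {
	return &tokenStore{expires: make(map[[32]byte]time.Time)}
}

// Issue returns a random 256-bit token that is valid for ttl.
func (s *tokenStore) Issue(ttl time.Duration) (string, error) {
	raw := make([]byte, 32)
	if _, err := rand.Read(raw); err != nil {
		return "", err
	}
	token := hex.EncodeToString(raw)
	s.mu.Lock()
	defer s.mu.Unlock()
	s.expires[sha256.Sum256([]byte(token))] = time.Now().Add(ttl)
	return token, nil
}

// Redeem consumes the token. Lookup and delete happen under one lock, so two
// concurrent redemptions of the same token cannot both succeed.
func (s *tokenStore) Redeem(token string) error {
	digest := sha256.Sum256([]byte(token))
	s.mu.Lock()
	defer s.mu.Unlock()
	exp, ok := s.expires[digest]
	if !ok {
		return errInvalidToken
	}
	delete(s.expires, digest)
	if time.Now().After(exp) {
		return errInvalidToken
	}
	return nil
}

func main() {
	store := newTokenStore()
	token, err := store.Issue(24 * time.Hour) // 24h matches the documented default
	if err != nil {
		panic(err)
	}
	fmt.Println("first use:", store.Redeem(token))
	fmt.Println("second use:", store.Redeem(token))
}
```

The same error is returned for unknown, expired, and reused tokens so that a caller cannot tell which case applied.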

Tampering

T3: Workload Spec Injection

Threat: Attacker modifies workload spec to run malicious containers.

Vectors:

SQL injection in spec storage
Malicious spec via API
Spec modification in transit

Mitigations:

[x] Parameterized SQL queries (pgx)
[x] JSON Schema validation of specs
[x] RBAC on workload creation
[ ] Image allowlist (future)
[ ] Spec signing (future)

Residual Risk: MEDIUM

T4: Container Escape

Threat: Malicious container escapes isolation to compromise host.

Vectors:

Kernel exploits
Docker socket access
Privileged containers
Host path mounts

Mitigations:

[x] No privileged containers by default
[x] No host network mode
[x] Volume mounts restricted to data directory
[ ] Seccomp profiles (future)
[ ] AppArmor/SELinux (future)
[ ] User namespaces (future)

Residual Risk: MEDIUM-HIGH (containers share the host kernel, a risk inherent to Docker)
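
The three implemented checks (no privileged mode, no host networking, mounts confined to the data directory) could be enforced at spec validation time along these lines. The `containerSpec` type, its fields, and the `/var/lib/shardlyn` path are hypothetical stand-ins for illustration; the actual spec format is not described in this document.

```go
package main

import (
	"errors"
	"fmt"
	"path/filepath"
	"strings"
)

// containerSpec is a hypothetical, simplified stand-in for a workload spec.
type containerSpec struct {
	Privileged  bool
	NetworkMode string
	HostMounts  []string // host-side source paths of bind mounts
}

// validateSpec rejects privileged mode, host networking, and any bind mount
// whose host path is not inside dataDir.
func validateSpec(spec containerSpec, dataDir string) error {
	if spec.Privileged {
		return errors.New("privileged containers are not allowed")
	}
	if spec.NetworkMode == "host" {
		return errors.New("host network mode is not allowed")
	}
	root := filepath.Clean(dataDir)
	for _, m := range spec.HostMounts {
		if !filepath.IsAbs(m) {
			return fmt.Errorf("mount %q must be an absolute path", m)
		}
		// Rel yields a path starting with ".." when m lies outside root,
		// which also catches sibling prefixes such as root + "-evil".
		rel, err := filepath.Rel(root, filepath.Clean(m))
		if err != nil || rel == ".." || strings.HasPrefix(rel, ".."+string(filepath.Separator)) {
			return fmt.Errorf("mount %q is outside %q", m, root)
		}
	}
	return nil
}

func main() {
	const dataDir = "/var/lib/shardlyn" // hypothetical data directory
	specs := []containerSpec{
		{HostMounts: []string{"/var/lib/shardlyn/world1"}},
		{HostMounts: []string{"/var/lib/shardlyn/../../../etc"}},
		{HostMounts: []string{"/var/run/docker.sock"}},
		{Privileged: true},
	}
	for _, s := range specs {
		fmt.Println(validateSpec(s, dataDir))
	}
}
```

This check is purely lexical. It does not resolve symlinks, so a symlink inside the data directory that points elsewhere would still pass; resolving paths with `filepath.EvalSymlinks` on the agent before mounting would be one way to close that gap.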

T5: Database Tampering

Threat: Attacker modifies database directly to escalate privileges or corrupt state.

Vectors:

SQL injection
Direct database access
Backup restoration of old data

Mitigations:

[x] Parameterized queries throughout
[x] Database network isolation (docker network)
[x] Strong database password
[ ] Database encryption at rest (production)
[ ] Regular integrity checks (future)

Residual Risk: LOW

Repudiation

T6: Action Denial

Threat: User denies performing destructive action (e.g., deleting instances).

Vectors:

Shared accounts
Session hijacking
Insider threats

Mitigations:

[x] Audit logging with user ID, timestamp, action
[x] Correlation IDs for request tracing
[ ] Immutable audit log storage (future)
[ ] Audit log integrity verification (future)

Residual Risk: LOW
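
As an illustration of what an audit record with user ID, timestamp, action, and correlation ID might look like, the sketch below emits one JSON line per action. The `auditEntry` type and its field names are assumptions for the example, not Shardlyn's actual log schema.

```go
package main

import (
	"crypto/rand"
	"encoding/hex"
	"encoding/json"
	"fmt"
	"time"
)

// auditEntry is a hypothetical record shape; field names are illustrative.
type auditEntry struct {
	Time          time.Time `json:"time"`
	UserID        string    `json:"user_id"`
	Action        string    `json:"action"`
	Target        string    `json:"target"`
	CorrelationID string    `json:"correlation_id"`
}

// newCorrelationID returns 16 random bytes as 32 hex characters. The ID is
// generated once per incoming request and attached to every log line for it.
func newCorrelationID() (string, error) {
	raw := make([]byte, 16)
	if _, err := rand.Read(raw); err != nil {
		return "", err
	}
	return hex.EncodeToString(raw), nil
}

// formatAudit renders the entry as a single JSON line.
func formatAudit(e auditEntry) (string, error) {
	line, err := json.Marshal(e)
	return string(line), err
}

func main() {
	id, err := newCorrelationID()
	if err != nil {
		panic(err)
	}
	line, err := formatAudit(auditEntry{
		Time:          time.Now().UTC(),
		UserID:        "u1",
		Action:        "instance.delete",
		Target:        "inst-42",
		CorrelationID: id,
	})
	if err != nil {
		panic(err)
	}
	fmt.Println(line)
}
```

Because the user ID comes from the authenticated session rather than from the request body, a record like this ties a destructive action to an account, though not to a person when accounts are shared.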

Information Disclosure

T7: Credential Exposure

Threat: Sensitive credentials leaked through logs, APIs, or storage.

Vectors:

Credentials in error messages
Debug logging of secrets
Insecure storage

Mitigations:

[x] Password hashes excluded from API responses (json:"-")
[x] TF state excluded from API responses
[x] Structured logging (no credential interpolation)
[x] Secrets stored as hashes where possible
[ ] Secret scanning in CI (future)

Residual Risk: LOW
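
The `json:"-"` struct tag mentioned above tells Go's `encoding/json` to skip a field entirely. A minimal demonstration, using a hypothetical `user` type:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// user is a hypothetical model; only the `json:"-"` tag comes from the text above.
type user struct {
	ID           string `json:"id"`
	Email        string `json:"email"`
	PasswordHash string `json:"-"` // skipped by encoding/json in both directions
}

// renderUser serializes a user the way an API handler would.
func renderUser(u user) (string, error) {
	b, err := json.Marshal(u)
	return string(b), err
}

func main() {
	out, err := renderUser(user{ID: "u1", Email: "a@example.com", PasswordHash: "not-a-real-hash"})
	if err != nil {
		panic(err)
	}
	fmt.Println(out) // {"id":"u1","email":"a@example.com"}
}
```

The tag only covers JSON encoding and decoding. Formatting the struct with `%+v` in a log statement would still print the hash, which is why the structured logging mitigation is listed separately.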

T8: Workload Spec Leakage

Threat: Workload specs with environment variables (containing secrets) exposed.

Vectors:

Unauthorized API access
Log exposure
Database dump

Mitigations:

[x] RBAC on workload access
[ ] Environment variable encryption at rest (future)
[ ] Secret management integration (Vault) (future)

Residual Risk: MEDIUM

T9: Network Sniffing

Threat: Attacker captures sensitive data from network traffic.

Vectors:

Unencrypted control plane traffic
Unencrypted agent heartbeats
Man-in-the-middle attacks

Mitigations:

[x] TLS enforcement in production (configurable)
[ ] Certificate validation (TODO)
[x] Sensitive data not sent in query parameters

Residual Risk: HIGH (without TLS)

Denial of Service

T10: API Exhaustion

Threat: Attacker overwhelms control plane with requests.

Vectors:

Login brute force
Instance creation spam
WebSocket connection flood

Mitigations:

[x] Rate limiting per IP/user (configurable)
[ ] Request size limits
[x] Database connection pooling (prevents exhaustion)
[x] WebSocket connection limits (configurable)

Residual Risk: MEDIUM
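
The controls summary describes rate limiting as a token bucket per IP/user. A minimal sketch of that technique follows; the `limiter` type and the rate and burst values are assumptions, and Shardlyn's real limiter may differ in structure and configuration.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// bucket holds the remaining tokens for one key and when it was last refilled.
type bucket struct {
	tokens float64
	last   time.Time
}

// limiter is a per-key token bucket (key = client IP or user ID).
type limiter struct {
	mu      sync.Mutex
	rate    float64 // tokens added per second
	burst   float64 // bucket capacity
	buckets map[string]*bucket
}

func newLimiter(rate, burst float64) *limiter {
	return &limiter{rate: rate, burst: burst, buckets: make(map[string]*bucket)}
}

// Allow spends one token for key and reports whether the request may proceed.
func (l *limiter) Allow(key string) bool {
	l.mu.Lock()
	defer l.mu.Unlock()
	now := time.Now()
	b, ok := l.buckets[key]
	if !ok {
		b = &bucket{tokens: l.burst, last: now}
		l.buckets[key] = b
	}
	// Refill in proportion to elapsed time, capped at the bucket capacity.
	b.tokens += now.Sub(b.last).Seconds() * l.rate
	if b.tokens > l.burst {
		b.tokens = l.burst
	}
	b.last = now
	if b.tokens < 1 {
		return false
	}
	b.tokens--
	return true
}

func main() {
	lim := newLimiter(1, 5) // illustrative: 1 request/second sustained, bursts of 5
	allowed := 0
	for i := 0; i < 10; i++ {
		if lim.Allow("203.0.113.7") {
			allowed++
		}
	}
	fmt.Println("allowed:", allowed)
}
```

This sketch never evicts idle buckets, so the map itself could be grown by an attacker who rotates source addresses; a production limiter would expire entries or bound the map size.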

T11: Resource Exhaustion

Threat: Malicious workload consumes all node resources.

Vectors:

CPU bomb
Memory exhaustion
Disk fill

Mitigations:

[x] Resource limits enforced (CPU, memory)
[x] Volume size limits in spec
[ ] Per-user resource quotas (future)
[ ] Automatic workload eviction (future)

Residual Risk: MEDIUM

T12: Agent Starvation

Threat: Control plane unavailable, agents cannot receive desired state.

Vectors:

Control plane crash
Network partition
Database failure

Mitigations:

[x] Agents continue running existing containers
[x] Idempotent operations (safe retries)
[ ] Control plane HA (future)
[ ] Agent local caching of last known state (future)

Residual Risk: MEDIUM
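
Idempotent operations are what make retries safe after a partition or crash. The sketch below shows the general idea with a hypothetical `plan` function that diffs desired against running containers: once the state has converged, running it again produces no work. It is an illustration of the principle, not the agent's actual reconcile logic.

```go
package main

import (
	"fmt"
	"sort"
)

// plan compares desired and running container names and returns what to
// start and what to stop. It has no side effects.
func plan(desired, running map[string]bool) (start, stop []string) {
	for name, want := range desired {
		if want && !running[name] {
			start = append(start, name)
		}
	}
	for name, up := range running {
		if up && !desired[name] {
			stop = append(stop, name)
		}
	}
	sort.Strings(start)
	sort.Strings(stop)
	return start, stop
}

// apply mutates running according to a plan; a real agent would call the
// container runtime here instead of updating a map.
func apply(running map[string]bool, start, stop []string) {
	for _, name := range start {
		running[name] = true
	}
	for _, name := range stop {
		delete(running, name)
	}
}

func main() {
	desired := map[string]bool{"web": true, "db": true}
	running := map[string]bool{"db": true, "old-game": true}

	start, stop := plan(desired, running)
	fmt.Println(start, stop) // [web] [old-game]
	apply(running, start, stop)

	// A retry after success (or after a lost response) plans no further work.
	start, stop = plan(desired, running)
	fmt.Println(len(start), len(stop)) // 0 0
}
```

In this model, an agent that cannot reach the control plane simply does not obtain a new desired state, so no plan is computed and existing containers keep running.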

Elevation of Privilege

T13: RBAC Bypass

Threat: User gains admin privileges without authorization.

Vectors:

JWT manipulation
Role escalation bugs
Horizontal privilege escalation

Mitigations:

[x] Role stored in JWT, validated server-side
[x] RBAC checks on all admin endpoints
[x] Users can view only their own profile (unless admin)
[ ] Regular RBAC audit (operational)

Residual Risk: LOW
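
Storing the role in the JWT is only safe if the server verifies the signature, algorithm, and expiry before reading any claim. The sketch below shows those checks using only the standard library. It assumes HS256 and a hypothetical `claims` shape; this document does not state which algorithm or library Shardlyn uses, and production code would normally rely on a maintained JWT library instead.

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/base64"
	"encoding/json"
	"errors"
	"fmt"
	"strings"
	"time"
)

// claims is a hypothetical payload; the claim names are assumptions.
type claims struct {
	Sub  string `json:"sub"`
	Role string `json:"role"`
	Exp  int64  `json:"exp"` // Unix seconds
}

var b64 = base64.RawURLEncoding

// sign returns the base64url HMAC-SHA256 of the signing input.
func sign(input string, key []byte) string {
	mac := hmac.New(sha256.New, key)
	mac.Write([]byte(input))
	return b64.EncodeToString(mac.Sum(nil))
}

// issue builds an HS256 token for the given claims.
func issue(c claims, key []byte) (string, error) {
	payload, err := json.Marshal(c)
	if err != nil {
		return "", err
	}
	input := b64.EncodeToString([]byte(`{"alg":"HS256","typ":"JWT"}`)) + "." + b64.EncodeToString(payload)
	return input + "." + sign(input, key), nil
}

// verify checks algorithm, signature, and expiry before trusting any claim.
func verify(token string, key []byte) (claims, error) {
	parts := strings.Split(token, ".")
	if len(parts) != 3 {
		return claims{}, errors.New("malformed token")
	}
	hdrJSON, err := b64.DecodeString(parts[0])
	if err != nil {
		return claims{}, errors.New("malformed header")
	}
	var hdr struct {
		Alg string `json:"alg"`
	}
	// Pin the algorithm: tokens declaring "none" or anything else are rejected.
	if json.Unmarshal(hdrJSON, &hdr) != nil || hdr.Alg != "HS256" {
		return claims{}, errors.New("unexpected algorithm")
	}
	want := sign(parts[0]+"."+parts[1], key)
	if !hmac.Equal([]byte(want), []byte(parts[2])) {
		return claims{}, errors.New("bad signature")
	}
	var c claims
	payload, err := b64.DecodeString(parts[1])
	if err != nil || json.Unmarshal(payload, &c) != nil {
		return claims{}, errors.New("malformed payload")
	}
	if time.Now().Unix() >= c.Exp {
		return claims{}, errors.New("token expired")
	}
	return c, nil
}

func main() {
	key := []byte("demo-key-do-not-use")
	tok, err := issue(claims{Sub: "u1", Role: "user", Exp: time.Now().Add(time.Hour).Unix()}, key)
	if err != nil {
		panic(err)
	}
	c, err := verify(tok, key)
	fmt.Println(c.Role, err)
	_, err = verify(tok, []byte("wrong-key"))
	fmt.Println(err)
}
```

A role carried in the token stays valid until the token expires, so a demoted admin keeps admin rights for at most one token lifetime unless the server also checks the role against the database.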

T14: Agent Privilege Escalation

Threat: Compromised agent gains control plane access.

Vectors:

Agent credential reuse
Control plane API exposure to agents

Mitigations:

[x] Agents have separate auth mechanism (X-Agent-Token)
[x] Agents cannot access user management APIs
[x] Agent tokens scoped to specific agent ID
[ ] Network segmentation (agents in separate VLAN)

Residual Risk: MEDIUM
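
Scoping a token to one agent ID means the control plane checks the presented `X-Agent-Token` value against the token issued to the agent named in the request, not against the set of all valid tokens. A minimal sketch, with a hypothetical `agentTokens` store that keeps SHA-256 digests:

```go
package main

import (
	"crypto/sha256"
	"crypto/subtle"
	"fmt"
)

// agentTokens maps an agent ID to the SHA-256 digest of that agent's token.
// The type and its methods are illustrative, not Shardlyn's actual storage.
type agentTokens map[string][32]byte

// add records the digest of the token issued to agentID.
func (a agentTokens) add(agentID, token string) {
	a[agentID] = sha256.Sum256([]byte(token))
}

// authorize reports whether presented is the token issued to agentID.
// A valid token for a different agent is rejected.
func (a agentTokens) authorize(agentID, presented string) bool {
	want, known := a[agentID]
	got := sha256.Sum256([]byte(presented))
	return known && subtle.ConstantTimeCompare(want[:], got[:]) == 1
}

func main() {
	tokens := agentTokens{}
	tokens.add("agent-1", "token-one")
	tokens.add("agent-2", "token-two")

	fmt.Println(tokens.authorize("agent-1", "token-one")) // true
	fmt.Println(tokens.authorize("agent-2", "token-one")) // false
	fmt.Println(tokens.authorize("agent-9", "token-one")) // false
}
```

With this check in place, a token stolen from one node lets the attacker act only as that node's agent, which is the containment property Scenario 2 relies on.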

Security Controls Summary

Implemented

Control	Description	Threats Mitigated
Password Hashing	bcrypt cost=10	T1
JWT Authentication	Signed, expiring tokens (15m access)	T1, T13
One-Time Tokens	Agent registration tokens	T2
Parameterized Queries	SQL injection prevention	T3, T5
JSON Schema Validation	Spec format enforcement	T3
RBAC	Role-based access control	T3, T8, T13
Audit Logging	Action tracking	T6
Field Exclusion	Secrets excluded from JSON	T7
Resource Limits	Container CPU/memory caps	T11
Connection Pooling	DB connection management	T10
Rate Limiting	Token bucket per IP/user	T1, T10
Account Lockout	Login lockout after failed attempts	T1
TLS Enforcement	Configurable HTTPS requirement	T2, T9
WebSocket Connection Limits	Cap concurrent WS sessions	T10

Planned (TODO)

Control	Priority	Threats Mitigated
Image Allowlist	MEDIUM	T3
Secret Management	MEDIUM	T8
Seccomp Profiles	LOW	T4
Control Plane HA	LOW	T12

Not Planned (Accepted Risk)

Control	Reason
Container Signing	Complexity vs. risk for MVP
Network Policies	Kubernetes-only feature
Hardware Security Modules	Cost prohibitive for target users

Attack Scenarios

Scenario 1: Malicious Admin

Attacker: Insider with admin access
Goal: Exfiltrate workload data or disrupt service

Attack Path:

Create malicious workload with data exfiltration script
Deploy to node with valuable workload data
Container mounts data volume, exfiltrates via network

Mitigations:

Audit logging tracks who created workload
Volume mounts restricted to shardlyn data directory
Network monitoring can detect unusual egress

Scenario 2: Compromised Agent

Attacker: External with agent node access
Goal: Pivot to control plane or other agents

Attack Path:

Compromise agent node (e.g., via vulnerable public workload such as a game server)
Extract agent auth token from filesystem
Attempt to use token for broader access

Mitigations:

Agent token scoped to single agent
Cannot access admin APIs
Control plane validates agent ID matches token

Scenario 3: Credential Stuffing

Attacker: External with leaked credential database
Goal: Gain user/admin access

Attack Path:

Obtain leaked email/password combinations
Automated login attempts against Shardlyn
Successful login with reused credentials

Mitigations:

Rate limiting on login endpoint (implemented, see T1)
Account lockout after failures (implemented, see T1)
Breach detection notifications (TODO)

Security Recommendations

For Operators

Enable TLS for all production deployments
Rotate secrets (JWT key, database password) regularly
Monitor audit logs for suspicious activity
Keep components updated for security patches
Segment the network between control plane and agents
Encrypt backups of the database and TF state

For Users

Use strong, unique passwords
Don't share accounts
Review workload specs before deployment
Monitor resource usage for anomalies
Report suspicious activity to admins

Incident Response

Detection

Monitor shardlyn_http_requests_total{status=~"4.."} for spikes in client errors, in particular 401 and 403 responses, which indicate auth failures
Alert on shardlyn_instances_by_state{state="error"} spikes
Review audit logs for unusual patterns

Containment

Revoke compromised user/agent tokens
Isolate affected nodes (network/firewall)
Stop suspicious instances

Recovery

Rotate all secrets (JWT key, passwords)
Regenerate agent tokens
Restore from known-good backup if needed
Review audit logs for full scope

Post-Incident

Document timeline and actions
Update threat model with new vectors
Implement additional controls as needed
