Architecture Overview

This page describes the logical and physical cluster architecture, storage design, observability, and the GitOps deployment pipeline that powers DataHub.local.


General

High-level view of the cluster: how nodes, networking, and storage components relate to each other.

graph TB
    classDef internet fill:#1A237E,color:#fff,stroke:#3F51B5,stroke-width:2px
    classDef control fill:#1565C0,color:#fff,stroke:#42A5F5,stroke-width:2px
    classDef armWorker fill:#2E7D32,color:#fff,stroke:#66BB6A,stroke-width:2px
    classDef amdWorker fill:#4527A0,color:#fff,stroke:#9575CD,stroke-width:2px
    classDef network fill:#00695C,color:#fff,stroke:#26C6DA,stroke-width:2px
    classDef storage fill:#E65100,color:#fff,stroke:#FFA726,stroke-width:2px

    subgraph Internet
        DNS["🌐 External DNS"]:::internet
        GHCR["📦 GitHub Container Registry"]:::internet
    end

    subgraph "Home Network"
        Router["🏠 Home Router"]:::network

        subgraph "K3s Cluster"
            subgraph "Control Plane"
                CP["🎛️ orpi-0\nOrangePi 4 LTS"]:::control
            end

            subgraph "ARM64 Workers"
                W1["🟠 orpi-1\nOrangePi 5B 16G"]:::armWorker
                W2["🟠 orpi-2\nOrangePi 5B 16G"]:::armWorker
                W3["🟠 orpi-3\nOrangePi 5B 16G"]:::armWorker
            end

            subgraph "AMD64 Workers"
                W4["🔵 amd-1\nCHUWI UBox 32G"]:::amdWorker
                W5["💾 nas\nCWWK X86-P5 N305"]:::amdWorker
                W6["💻 laptop\nLenovo Legion WSL2"]:::amdWorker
            end

            subgraph "Networking"
                Traefik["🔀 Traefik\nIngress"]:::network
                Tailscale["🔒 Tailscale\nVPN"]:::network
            end

            subgraph "Storage"
                Longhorn["💾 Longhorn\nDistributed Block"]:::storage
                Garage["🪣 Garage\nS3 Object Store"]:::storage
                NFS["📁 NFS\nNAS Shares"]:::storage
            end
        end
    end

    Router --> Traefik
    Traefik --> CP
    CP --> W1 & W2 & W3 & W4 & W5 & W6
    Longhorn --> W1 & W2 & W3
    Garage --> W1 & W2 & W3
    NFS --> W5
    DNS --> Router

Physical Layout

The cluster is a heterogeneous mix of ARM64 and AMD64 machines: an OrangePi SBC for the control plane, OrangePi 5B boards as the initial workers, and x86 mini-PCs that have progressively taken over compute-heavy work from them thanks to better performance per euro.

graph TB
    classDef control fill:#1565C0,color:#fff,stroke:#42A5F5,stroke-width:2px
    classDef armNode fill:#2E7D32,color:#fff,stroke:#66BB6A,stroke-width:2px
    classDef amdNode fill:#4527A0,color:#fff,stroke:#9575CD,stroke-width:2px
    classDef netNode fill:#00695C,color:#fff,stroke:#26C6DA,stroke-width:2px

    UPS["⚡ CyberPower UPS\nPower Protection"]:::netNode
    SW["🌐 HORACO 2.5GbE\nManaged Switch"]:::netNode

    CP["🎛️ datahublocal-orpi-0\nOrangePi 4 LTS · 4 GB\nARM64 RK3399 · Control Plane"]:::control

    W1["🟠 datahublocal-orpi-1\nOrangePi 5B · 16 GB\nARM64 RK3588\nWorker · Longhorn · Garage"]:::armNode
    W2["🟠 datahublocal-orpi-2\nOrangePi 5B · 16 GB\nARM64 RK3588\nWorker · Longhorn · Garage"]:::armNode
    W3["🟠 datahublocal-orpi-3\nOrangePi 5B · 16 GB\nARM64 RK3588\nWorker · Longhorn · Garage"]:::armNode

    W4["🔵 datahublocal-amd-1\nCHUWI UBox · 32 GB\nAMD Ryzen 5 6600H\nHeavy Compute Worker"]:::amdNode
    W5["💾 datahublocal-nas\nCWWK X86-P5 N305 · 16 GB\n3×1 TB HDD RAID + 128 GB NVMe\nNAS Worker · Garage S3 · NFS"]:::amdNode
    W6["💻 datahublocal-legion-laptop\nLenovo Legion · AMD64\nWSL2 · Dev Worker"]:::amdNode

    UPS --> SW
    SW --> CP
    SW --> W1
    SW --> W2
    SW --> W3
    SW --> W4
    SW --> W5
    SW --> W6

Cluster Nodes

| Node | Hardware | Role | CPU Arch | OS |
|---|---|---|---|---|
| datahublocal-orpi-0 | OrangePi 4 LTS (4 GB) | Control Plane | ARM64 (RK3399) | Debian 13 |
| datahublocal-orpi-1 | OrangePi 5B (16 GB) | Worker | ARM64 (RK3588) | Debian 13 |
| datahublocal-orpi-2 | OrangePi 5B (16 GB) | Worker | ARM64 (RK3588) | Debian 13 |
| datahublocal-orpi-3 | OrangePi 5B (16 GB) | Worker | ARM64 (RK3588) | Debian 13 |
| datahublocal-amd-1 | CHUWI UBox (AMD 6600H, 32 GB) | Worker | AMD64 | Debian 13 |
| datahublocal-nas | CWWK X86-P5 (N305, 16 GB, RAID 3×1 TB + 128 GB NVMe) | NAS + Worker | AMD64 | Debian 12 |
| datahublocal-legion-laptop | Lenovo Legion laptop | Dev + Worker | AMD64 (WSL2) | Ubuntu 24.04 |

Kubernetes distribution: K3s v1.36 (lightweight, production-ready)
Container runtime: containerd 2.2

Architecture evolution: The cluster started ARM-only with OrangePi 5B boards, but over time x86 mini-PCs proved significantly more cost-effective for compute-heavy workloads (Spark, Trino). The ARM nodes remain useful for lightweight services and multi-arch testing; see Lessons Learned for more context.
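
One way to express this split in scheduling terms is a `nodeSelector` on the well-known `kubernetes.io/arch` node label. The sketch below is illustrative only: the worker list mirrors the table above, but the helper names and the idea of pinning Spark and Trino by label are assumptions, not the cluster's confirmed configuration.

```python
# Worker nodes and their architectures, taken from the Cluster Nodes table.
WORKER_ARCH = {
    "datahublocal-orpi-1": "arm64",
    "datahublocal-orpi-2": "arm64",
    "datahublocal-orpi-3": "arm64",
    "datahublocal-amd-1": "amd64",
    "datahublocal-nas": "amd64",
    "datahublocal-legion-laptop": "amd64",
}

# Assumption: these are the workloads treated as compute-heavy.
HEAVY_WORKLOADS = {"spark", "trino"}


def node_selector(workload: str) -> dict[str, str]:
    """Return a pod nodeSelector: heavy workloads go to amd64, others are unconstrained."""
    if workload.lower() in HEAVY_WORKLOADS:
        return {"kubernetes.io/arch": "amd64"}
    return {}


def eligible_nodes(workload: str) -> list[str]:
    """List the workers that satisfy the selector for a given workload."""
    arch = node_selector(workload).get("kubernetes.io/arch")
    return sorted(n for n, a in WORKER_ARCH.items() if arch is None or a == arch)


print(eligible_nodes("trino"))
```

Lightweight services get an empty selector here, so the scheduler remains free to place them on the ARM boards.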


Namespace Layout

Services are organized into namespaces by concern, each managed as a separate ArgoCD Application:

| Namespace | Contents |
|---|---|
| kube-system | K3s system components, Traefik, Longhorn CSI, cert-manager, ExternalDNS, reloader, reflector, snapshot-controller, NVIDIA plugin |
| automation | ArgoCD, n8n, Velero, Kopia backup agents |
| data | Airflow, Trino, Superset, Redpanda, Nessie, Spark, PostgreSQL, Garage, Ollama, Open WebUI, Valkey |
| monitoring | Prometheus, AlertManager, Grafana, Loki, Promtail, Robusta, Speedtest exporter |
| security | cert-manager, Dex (OIDC), OAuth2-proxy, Tailscale |
| media | CommaFeed (RSS), MusicGrabber |
| other | Homepage dashboard, ConvertX, IT Tools, Mazanoke, Omni Tools, Stirling PDF |
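
The layout above can be captured as a plain lookup table, which is handy for quick sanity checks such as spotting a service listed under more than one namespace. This is a minimal sketch: the data is copied from the table, while the `namespaces_for` helper is a hypothetical name introduced for illustration.

```python
# Namespace -> services, copied from the table above.
NAMESPACES = {
    "kube-system": ["K3s system components", "Traefik", "Longhorn CSI", "cert-manager",
                    "ExternalDNS", "reloader", "reflector", "snapshot-controller",
                    "NVIDIA plugin"],
    "automation": ["ArgoCD", "n8n", "Velero", "Kopia backup agents"],
    "data": ["Airflow", "Trino", "Superset", "Redpanda", "Nessie", "Spark",
             "PostgreSQL", "Garage", "Ollama", "Open WebUI", "Valkey"],
    "monitoring": ["Prometheus", "AlertManager", "Grafana", "Loki", "Promtail",
                   "Robusta", "Speedtest exporter"],
    "security": ["cert-manager", "Dex (OIDC)", "OAuth2-proxy", "Tailscale"],
    "media": ["CommaFeed (RSS)", "MusicGrabber"],
    "other": ["Homepage dashboard", "ConvertX", "IT Tools", "Mazanoke",
              "Omni Tools", "Stirling PDF"],
}


def namespaces_for(service: str) -> list[str]:
    """Return every namespace whose listing contains the service (case-insensitive)."""
    wanted = service.lower()
    return sorted(
        ns for ns, services in NAMESPACES.items()
        if any(s.lower() == wanted for s in services)
    )


print(namespaces_for("Trino"))
print(namespaces_for("cert-manager"))
```

Note that cert-manager appears under both kube-system and security in the table, so the lookup returns two namespaces for it.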

Networking & Access

| Method | Use Case |
|---|---|
| Traefik IngressRoutes | Internal HTTP/HTTPS routing for all web UIs (Grafana, Superset, ArgoCD, etc.) |
| Tailscale VPN | Secure remote access from anywhere, with no port forwarding needed |
| ExternalDNS | Automatically manages DNS records for exposed services |
| cert-manager | Automatic TLS certificates via Let's Encrypt |
| OAuth2-proxy + Dex | SSO authentication for all web services using OIDC |
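
To make the routing row concrete, here is a rough sketch of what a Traefik `IngressRoute` for one web UI might look like, built as a Python dict and printed as JSON. The hostname, service name, port, secret name, and the `websecure` entry point are all placeholders assumed for illustration, and the `traefik.io/v1alpha1` API group applies to recent Traefik releases (older ones use `traefik.containo.us/v1alpha1`).

```python
import json


def ingress_route(name: str, namespace: str, host: str,
                  service: str, port: int, tls_secret: str) -> dict:
    """Build a minimal Traefik IngressRoute manifest as a dict."""
    return {
        "apiVersion": "traefik.io/v1alpha1",
        "kind": "IngressRoute",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            # Assumption: the HTTPS entry point is called "websecure".
            "entryPoints": ["websecure"],
            "routes": [{
                "match": f"Host(`{host}`)",
                "kind": "Rule",
                "services": [{"name": service, "port": port}],
            }],
            # Assumption: cert-manager has stored the certificate in this Secret.
            "tls": {"secretName": tls_secret},
        },
    }


# Hypothetical values, not taken from the actual cluster.
route = ingress_route("grafana", "monitoring", "grafana.example.internal",
                      "grafana", 80, "grafana-tls")
print(json.dumps(route, indent=2))
```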

Storage

Storage is split by access pattern and cost profile: fast NVMe for latency-sensitive workloads, spinning HDD RAID for bulk data that doesn't need IOPS.

| Tier | Technology | Backing hardware | Use case |
|---|---|---|---|
| High-performance block | Longhorn | NVMe SSDs on OrangePi 5B nodes | Stateful apps that need low latency: PostgreSQL, Redpanda, Valkey; replicated across nodes for HA |
| High-capacity object (S3) | Garage | NVMe SSD on CWWK NAS | Data lake (Iceberg tables, Spark outputs, Loki logs, backups); fast random reads for analytics workloads without the cost of full NVMe replicas across every node |
| Bulk shared filesystem | NFS (CWWK NAS RAID) | 3×1 TB HDD RAID | Large sequential files: media library, raw data exports, archives; big, cheap, and latency-tolerant |

Design principle: Put the right data on the right storage. Longhorn replicas on NVMe make PostgreSQL snappy; Garage on a single NVMe NAS node is fast enough for object storage at a fraction of the cost; HDD RAID covers everything that just needs capacity.
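
The design principle can be read as a small decision rule. The sketch below encodes it under simplifying assumptions: the two boolean inputs and the `pick_tier` name are hypothetical, and real placement decisions would weigh capacity and replication as well.

```python
from enum import Enum


class Tier(Enum):
    BLOCK = "Longhorn (replicated block on NVMe)"
    OBJECT = "Garage (S3 object store)"
    BULK = "NFS (HDD RAID)"


def pick_tier(*, needs_low_latency: bool, uses_s3_api: bool) -> Tier:
    """Map a workload's access pattern to one of the three storage tiers."""
    if needs_low_latency:
        return Tier.BLOCK   # e.g. PostgreSQL, Redpanda, Valkey
    if uses_s3_api:
        return Tier.OBJECT  # e.g. Iceberg tables, Loki logs, backups
    return Tier.BULK        # e.g. media library, archives


print(pick_tier(needs_low_latency=True, uses_s3_api=False).value)
print(pick_tier(needs_low_latency=False, uses_s3_api=True).value)
print(pick_tier(needs_low_latency=False, uses_s3_api=False).value)
```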


Observability

All services expose Prometheus metrics, and Prometheus discovers their scrape targets through ServiceMonitor resources. Logs are shipped by Promtail to Loki. Alerts flow from Prometheus through AlertManager to the notification channels.

flowchart LR
    classDef source fill:#1565C0,color:#fff,stroke:#42A5F5,stroke-width:2px
    classDef collector fill:#2E7D32,color:#fff,stroke:#66BB6A,stroke-width:2px
    classDef store fill:#4527A0,color:#fff,stroke:#9575CD,stroke-width:2px
    classDef alerting fill:#B71C1C,color:#fff,stroke:#EF5350,stroke-width:2px
    classDef viz fill:#00695C,color:#fff,stroke:#26C6DA,stroke-width:2px
    classDef notify fill:#E65100,color:#fff,stroke:#FFA726,stroke-width:2px

    Pods["📦 All Pods\n(metrics + logs)"]:::source
    NodeExporter["📊 Node Exporter\n(per node)"]:::collector
    Promtail["📋 Promtail\n(log shipper)"]:::collector
    Prometheus["🔥 Prometheus\n(metrics)"]:::store
    Loki["🗃️ Loki\n(logs)"]:::store
    AlertManager["🚨 AlertManager"]:::alerting
    Grafana["📈 Grafana\n(dashboards)"]:::viz
    Robusta["🤖 Robusta\n(K8s monitoring)"]:::alerting
    Notifications["📲 Slack / Email"]:::notify

    Pods --> Promtail
    Pods --> NodeExporter
    NodeExporter --> Prometheus
    Promtail --> Loki
    Prometheus --> AlertManager
    Prometheus --> Grafana
    Loki --> Grafana
    AlertManager --> Robusta
    Robusta -->|"enriched alerts"| Notifications
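
For the metrics path, a ServiceMonitor is the Prometheus Operator resource that tells Prometheus which Services to scrape. The following sketch builds a minimal one as a Python dict. The label key, port name, path, and interval are illustrative assumptions, not values taken from this cluster.

```python
import json


def service_monitor(name: str, namespace: str, app_label: str,
                    port_name: str = "metrics", interval: str = "30s") -> dict:
    """Build a minimal ServiceMonitor manifest selecting Services by app label."""
    return {
        "apiVersion": "monitoring.coreos.com/v1",
        "kind": "ServiceMonitor",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            # Assumption: Services carry the common app.kubernetes.io/name label.
            "selector": {"matchLabels": {"app.kubernetes.io/name": app_label}},
            # Assumption: the Service exposes a named port serving /metrics.
            "endpoints": [{"port": port_name, "path": "/metrics", "interval": interval}],
        },
    }


monitor = service_monitor("trino", "data", "trino")
print(json.dumps(monitor, indent=2))
```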

GitOps

Everything in the cluster is managed as code. Services are never deployed with a manual kubectl apply; all changes flow through Git.

flowchart LR
    classDef dev fill:#1565C0,color:#fff,stroke:#42A5F5,stroke-width:2px
    classDef bootstrap fill:#2E7D32,color:#fff,stroke:#66BB6A,stroke-width:2px
    classDef secrets fill:#B71C1C,color:#fff,stroke:#EF5350,stroke-width:2px
    classDef core fill:#4527A0,color:#fff,stroke:#9575CD,stroke-width:2px
    classDef workflow fill:#00695C,color:#fff,stroke:#26C6DA,stroke-width:2px
    classDef cluster fill:#E65100,color:#fff,stroke:#FFA726,stroke-width:2px

    Dev["👨‍💻 Developer / Me\n(git push)"]:::dev

    subgraph "Bootstrap Layer"
        Ansible["datahub-local-bootstrap\n(Ansible)\n• Install K3s\n• Configure OS\n• Deploy ArgoCD"]:::bootstrap
    end

    subgraph "Secrets Layer"
        Secrets["datahub-local-secrets\n(Private)\n• Encrypted secrets\n• API keys\n• Credentials"]:::secrets
    end

    subgraph "Core Layer"
        Core["datahub-local-core\n(Helmfile)\n• ApplicationSets\n• All services\n• 7 namespaces"]:::core
    end

    subgraph "Workflow Layer"
        Workflows["datahub-local-workflows\n• n8n flows\n• Airflow DAGs\n• SQLMesh models"]:::workflow
    end

    subgraph "K3s Cluster"
        ArgoCD["🔄 ArgoCD\n(GitOps controller)"]:::cluster
        Services["🚀 Running Services\n(Pods, Deployments,\nStatefulSets)"]:::cluster
    end

    Dev -->|"1. provision"| Ansible
    Ansible -->|"2. installs"| ArgoCD
    Secrets -->|"3. sync"| ArgoCD
    Core -->|"4. sync"| ArgoCD
    Workflows -->|"5. sync"| ArgoCD
    ArgoCD -->|"reconciles"| Services

Deployment Flow

  1. Bootstrap: Ansible provisions OS on bare metal, installs K3s, and deploys ArgoCD as the first application.
  2. Secrets: ArgoCD syncs datahub-local-secrets (private repo) to deploy encrypted secrets into the security namespace.
  3. Core: ArgoCD syncs datahub-local-core, which contains Helmfile-based ApplicationSets that expand into one Application per namespace × service.
  4. Workflows: n8n flow JSONs, Airflow DAG Python files, and SQLMesh models are synced from datahub-local-workflows.
  5. Reconciliation: ArgoCD continuously watches all repos and auto-syncs on any commit to HEAD.
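
The reconciliation step corresponds to an ArgoCD Application that tracks `HEAD` with automated sync enabled. Below is a minimal sketch of such a manifest as a Python dict. The repository URL, path, project name, and the `prune`/`selfHeal` settings are assumptions for illustration; the `automation` namespace for ArgoCD follows the namespace layout described earlier.

```python
import json


def argocd_application(name: str, repo_url: str, path: str, namespace: str) -> dict:
    """Build a minimal ArgoCD Application that auto-syncs from HEAD."""
    return {
        "apiVersion": "argoproj.io/v1alpha1",
        "kind": "Application",
        # ArgoCD itself lives in the automation namespace per the layout above.
        "metadata": {"name": name, "namespace": "automation"},
        "spec": {
            "project": "default",  # assumption: the default AppProject is used
            "source": {"repoURL": repo_url, "path": path, "targetRevision": "HEAD"},
            "destination": {
                "server": "https://kubernetes.default.svc",  # the in-cluster API server
                "namespace": namespace,
            },
            # Assumption: pruning and self-healing are both turned on.
            "syncPolicy": {"automated": {"prune": True, "selfHeal": True}},
        },
    }


# Hypothetical repository URL and path.
app = argocd_application("grafana", "https://example.com/datahub-local-core.git",
                         "monitoring/grafana", "monitoring")
print(json.dumps(app, indent=2))
```

In the setup described above, manifests like this would be generated by the ApplicationSets rather than written by hand.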