Roadmap
What's coming next for DataHub.local. Priorities are ordered by status: done, in progress, near-term, and medium-term.
Completed
| Item | Details |
|---|---|
| Homelab hardware & physical setup | Mixed ARM64/AMD64 cluster in a MicroATX case; CyberPower UPS; HORACO 2.5GbE managed switch |
| K3s Kubernetes cluster | 7-node cluster with GitOps, Longhorn, Traefik, cert-manager, Tailscale |
| Core services (GitOps) | ArgoCD ApplicationSet pipeline; encrypted secrets; Velero + Kopia backups; SSO via Dex |
| Data Lakehouse Infra | Trino + Project Nessie + Garage S3 + CloudNative PostgreSQL + Apache Spark |
| Streaming infrastructure | Redpanda 3-broker cluster (Kafka-compatible) |
| Local AI inference | Ollama + Open WebUI + VUI voice interface |
| Workflow automation | n8n with AI nodes |
| Active n8n Workflows | LinkedIn Professional Visibility workflow; AI Diagram Generation workflow |
| Observability stack | Prometheus + Grafana + Loki + Robusta + AlertManager |
| Open source publishing | spark-apps-helm, garage-helm, servarr |
In Progress
AI-Powered Personal Workflows (n8n + LLMs)
Goal: Automate repetitive personal tasks using local LLMs connected via n8n workflows.
| Workflow | Description | Status |
|---|---|---|
| Content post updater | Pull trending topics from Commafeed + Google Trends MCP, then generate updated versions of posts to share | Building |
Data Lakehouse: SQLMesh + Iceberg
Goal: Replace ad-hoc Airflow ETL scripts with a proper transformation layer using SQLMesh.
SQLMesh brings software engineering practices to SQL transformations: version control, automated testing, CI/CD for data models, and incremental processing.
flowchart LR
classDef source fill:#1565C0,color:#fff,stroke:#42A5F5,stroke-width:2px
classDef layer fill:#4527A0,color:#fff,stroke:#9575CD,stroke-width:2px
classDef catalog fill:#00695C,color:#fff,stroke:#26C6DA,stroke-width:2px
classDef viz fill:#E65100,color:#fff,stroke:#FFA726,stroke-width:2px
Sources["External Sources\n(APIs, GDrive, etc.)"]:::source
Raw["Raw Layer\n(Garage S3 / Iceberg)"]:::layer
Staging["Staging Layer\n(Iceberg)"]:::layer
Marts["Mart Layer\n(Iceberg)"]:::layer
Superset["Superset\nDashboards"]:::viz
Nessie["Project Nessie\n(catalog)"]:::catalog
Sources -->|"Airflow ingest"| Raw
Raw -->|"SQLMesh staging"| Staging
Staging -->|"SQLMesh transform"| Marts
Marts -->|"Trino SQL"| Superset
Nessie -.->|"versions"| Raw & Staging & Marts
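As a rough illustration of the flow in the diagram, the sketch below models the raw, staging, and mart layers as plain Python functions over in-memory rows. It is not SQLMesh code: in the planned pipeline each step would be a SQLMesh model over Iceberg tables. The row shape (`id`, `amount`, `day`) and the function names are assumptions made up for the example. The `since` argument hints at the idea behind incremental processing, where only rows newer than the last processed day are transformed.

```python
from collections import defaultdict
from datetime import date

# Hypothetical raw rows as an ingest job might land them (strings, duplicates).
RAW_ROWS = [
    {"id": "1", "amount": "10.50", "day": "2024-01-01"},
    {"id": "1", "amount": "10.50", "day": "2024-01-01"},  # duplicate delivery
    {"id": "2", "amount": "4.00", "day": "2024-01-02"},
    {"id": "3", "amount": "7.25", "day": "2024-01-02"},
]


def to_staging(raw_rows, since=None):
    """Staging step: cast types, drop duplicate ids, keep only rows after `since`."""
    seen = set()
    staged = []
    for row in raw_rows:
        day = date.fromisoformat(row["day"])
        if since is not None and day <= since:
            continue  # incremental run: this day was handled earlier
        if row["id"] in seen:
            continue
        seen.add(row["id"])
        staged.append({"id": int(row["id"]), "amount": float(row["amount"]), "day": day})
    return staged


def to_mart(staged_rows):
    """Mart step: aggregate staged rows into a total per day."""
    totals = defaultdict(float)
    for row in staged_rows:
        totals[row["day"]] += row["amount"]
    return dict(totals)


if __name__ == "__main__":
    print(to_mart(to_staging(RAW_ROWS)))                            # full rebuild
    print(to_mart(to_staging(RAW_ROWS, since=date(2024, 1, 1))))    # incremental run
```

Keeping each layer a pure function of the previous one is what makes the steps testable in isolation, which is the property the transformation layer is meant to provide.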
Near-Term
Move to Claude: AI-Assisted Development
Goal: Adopt Claude as the primary AI assistant across the entire project: documentation, coding, repo management, and operations.
- Use Claude Code for all coding tasks and doc updates in every repository
- Initialise every datahub-local repo with a CLAUDE.md and project-specific skills
- Create reusable Claude skills for common tasks (deploy, lint, review, update docs)
- Use Claude for ongoing documentation maintenance so the docs stay in sync with the cluster
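One way the repository initialisation step could be scripted is sketched below. It assumes all repositories are checked out under a single parent directory, and the template text is a hypothetical placeholder rather than the project's actual CLAUDE.md content. Skills are left out because their layout is not described here.

```python
from pathlib import Path

# Hypothetical starter content; a real file would describe each repo's conventions.
TEMPLATE = """# {name}

## Project overview
TODO: describe what this repository deploys.

## Common commands
TODO: list the lint, test, and deploy commands.
"""


def init_claude_md(repos_root: Path) -> list[Path]:
    """Create a CLAUDE.md in every git checkout under repos_root that lacks one."""
    created = []
    for repo in sorted(repos_root.iterdir()):
        if not (repo.is_dir() and (repo / ".git").exists()):
            continue  # skip plain files and directories that are not repositories
        target = repo / "CLAUDE.md"
        if target.exists():
            continue  # never overwrite a hand-written file
        target.write_text(TEMPLATE.format(name=repo.name), encoding="utf-8")
        created.append(target)
    return created
```

Calling `init_claude_md(Path("path/to/checkouts"))` returns the list of files it created, so a second run over the same directory returns an empty list.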
Invoice Service: Personal Spending Intelligence
Goal: Build a real-time pipeline that automatically ingests supermarket invoices and turns them into actionable spending insights.
How it works:
- Ingestion: fetch invoice emails from Spanish supermarkets (Mercadona, Lidl, etc.) or extract receipts from Google Photos via OCR
- Storage: parse and store structured line-item data in the Iceberg data lake
- Transformation: SQLMesh models aggregate spend by category, product, and store over time
- Notifications: send a weekly digest via n8n with highlights like:
  - Current month spend by category
  - Products whose price has risen the most
  - Shopping pattern changes vs. previous months
flowchart LR
classDef source fill:#1565C0,color:#fff,stroke:#42A5F5,stroke-width:2px
classDef process fill:#2E7D32,color:#fff,stroke:#66BB6A,stroke-width:2px
classDef store fill:#4527A0,color:#fff,stroke:#9575CD,stroke-width:2px
classDef notify fill:#E65100,color:#fff,stroke:#FFA726,stroke-width:2px
Email["Email / Google Photos"]:::source
Airflow["Airflow"]:::process
Iceberg["Iceberg\n(Garage S3)"]:::store
Marts["Spend Marts"]:::store
n8n["n8n"]:::process
Notify["Slack / Notification"]:::notify
Email -->|"fetch + OCR"| Airflow
Airflow -->|"structured rows"| Iceberg
Iceberg -->|"SQLMesh models"| Marts
Marts -->|"Trino query"| n8n
n8n -->|"weekly digest"| Notify
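The transformation and notification steps above can be sketched in a few lines of Python. This is only an illustration of the aggregation logic: the planned implementation uses SQLMesh models queried through Trino, and the row fields, sample values, and the EUR currency label below are all assumptions invented for the example.

```python
from collections import defaultdict

# Hypothetical parsed line items; real rows would come from invoice emails or OCR.
LINE_ITEMS = [
    {"month": "2024-01", "store": "Mercadona", "category": "dairy",
     "product": "milk 1L", "qty": 6, "unit_price": 0.90},
    {"month": "2024-01", "store": "Lidl", "category": "produce",
     "product": "apples 1kg", "qty": 2, "unit_price": 2.00},
    {"month": "2024-02", "store": "Mercadona", "category": "dairy",
     "product": "milk 1L", "qty": 6, "unit_price": 0.99},
    {"month": "2024-02", "store": "Lidl", "category": "produce",
     "product": "apples 1kg", "qty": 1, "unit_price": 2.10},
]


def spend_by_category(items, month):
    """Total spend per category for one month, rounded to cents."""
    totals = defaultdict(float)
    for item in items:
        if item["month"] == month:
            totals[item["category"]] += item["qty"] * item["unit_price"]
    return {category: round(total, 2) for category, total in totals.items()}


def average_prices(items, month):
    """Average unit price per product for one month."""
    prices = defaultdict(list)
    for item in items:
        if item["month"] == month:
            prices[item["product"]].append(item["unit_price"])
    return {product: sum(values) / len(values) for product, values in prices.items()}


def price_rises(items, prev_month, month):
    """Percentage unit-price change per product, largest rise first."""
    before = average_prices(items, prev_month)
    after = average_prices(items, month)
    changes = {
        product: round((after[product] - before[product]) / before[product] * 100, 1)
        for product in after
        if product in before  # only products bought in both months are comparable
    }
    return sorted(changes.items(), key=lambda pair: pair[1], reverse=True)


def build_digest(items, prev_month, month):
    """Render the digest text that a notification step could send."""
    lines = [f"Spending digest for {month}"]
    for category, total in sorted(spend_by_category(items, month).items()):
        lines.append(f"- {category}: {total:.2f} EUR")
    for product, pct in price_rises(items, prev_month, month)[:3]:
        lines.append(f"- {product}: {pct:+.1f}% vs {prev_month}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_digest(LINE_ITEMS, "2024-01", "2024-02"))
```

Comparing average unit prices rather than total spend keeps a price rise distinguishable from simply buying more of a product.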
Medium-Term
AI Agents: Sympozium
Goal: Autonomous LLM-powered SRE agents that monitor, diagnose, and remediate cluster issues without human intervention.
flowchart LR
classDef alerting fill:#B71C1C,color:#fff,stroke:#EF5350,stroke-width:2px
classDef agent fill:#4527A0,color:#fff,stroke:#9575CD,stroke-width:2px
classDef data fill:#1565C0,color:#fff,stroke:#42A5F5,stroke-width:2px
classDef notify fill:#E65100,color:#fff,stroke:#FFA726,stroke-width:2px
Alert["AlertManager"]:::alerting
Robusta["Robusta"]:::alerting
Agent["AI Agent\n(Sympozium)"]:::agent
Loki["Loki"]:::data
Prometheus["Prometheus"]:::data
K8s["Kubernetes API"]:::data
n8n["n8n"]:::data
Notify["Slack / Dashboard"]:::notify
Alert --> Robusta
Robusta -->|"enriched alert"| Agent
Agent -->|"query logs"| Loki
Agent -->|"query metrics"| Prometheus
Agent -->|"kubectl exec"| K8s
Agent -->|"run playbook"| n8n
Agent -->|"report"| Notify
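The loop in the diagram (receive an enriched alert, gather evidence, choose a playbook, report) might look roughly like the sketch below. It does not use Sympozium's actual API, which is not described here. The `Alert` fields, the playbook names, and the keyword rule standing in for the LLM's reasoning are all hypothetical, and the log and metric fetchers are passed in as plain callables so that Loki and Prometheus clients could be swapped in later.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Alert:
    name: str
    namespace: str
    pod: str


@dataclass
class Diagnosis:
    alert: Alert
    evidence: dict = field(default_factory=dict)
    action: str = "escalate"


def triage(
    alert: Alert,
    fetch_logs: Callable[[Alert], list],
    fetch_metrics: Callable[[Alert], dict],
    playbooks: set,
) -> Diagnosis:
    """Gather evidence for an alert and pick a remediation from an allow-list."""
    evidence = {"logs": fetch_logs(alert), "metrics": fetch_metrics(alert)}
    # Toy decision rule; in the planned design an LLM would reason over the evidence.
    if any("OOMKilled" in line for line in evidence["logs"]):
        action = "raise_memory_limit"
    elif evidence["metrics"].get("restarts", 0) > 5:
        action = "restart_workload"
    else:
        action = "escalate"
    if action not in playbooks:
        action = "escalate"  # only run remediations that have a known playbook
    return Diagnosis(alert=alert, evidence=evidence, action=action)


def report(diagnosis: Diagnosis) -> str:
    """One-line summary suitable for a chat notification."""
    a = diagnosis.alert
    return f"[{a.name}] {a.namespace}/{a.pod}: {diagnosis.action}"


if __name__ == "__main__":
    alert = Alert(name="KubePodCrashLooping", namespace="data", pod="trino-worker-0")
    result = triage(
        alert,
        fetch_logs=lambda a: ["container terminated: OOMKilled"],  # stubbed log query
        fetch_metrics=lambda a: {"restarts": 7},                   # stubbed metric query
        playbooks={"raise_memory_limit", "restart_workload"},
    )
    print(report(result))
```

Restricting actions to a set of named playbooks is a design choice of this sketch: anything the decision step proposes outside that set falls back to escalation.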
MCP Servers for AI Tooling
Goal: Expose cluster capabilities as Model Context Protocol (MCP) servers so AI assistants can interact with the homelab directly.
| Server | Exposes |
|---|---|
| mcp-prometheus | Query metrics, inspect alerts, get service health |
| mcp-loki | Search logs, tail pod output, find errors |
| mcp-kubernetes | List / describe / restart workloads safely |
| mcp-trino | Run SQL queries against the data lakehouse |
| mcp-nessie | Browse Iceberg catalog, list tables, inspect schemas |
| mcp-garage | List buckets / objects, check storage usage |
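To make the idea concrete, the sketch below shows the shape of the two core MCP tool requests, `tools/list` and `tools/call`, as JSON-RPC 2.0 messages handled by a small dispatcher. It uses only the standard library rather than an MCP SDK and skips the initialisation handshake and transport. The `query_metrics` tool and its stubbed handler are hypothetical stand-ins for what a server like mcp-prometheus might expose.

```python
import json


def _query_metrics(arguments: dict) -> str:
    # Stub: a real server would forward the query to Prometheus.
    return f"stub result for query: {arguments['query']}"


# Hypothetical tool registry: name -> description, input JSON Schema, handler.
TOOLS = {
    "query_metrics": {
        "description": "Run a PromQL query (stubbed in this sketch).",
        "inputSchema": {
            "type": "object",
            "properties": {"query": {"type": "string"}},
            "required": ["query"],
        },
        "handler": _query_metrics,
    },
}


def _error(request_id, code: int, message: str) -> str:
    return json.dumps(
        {"jsonrpc": "2.0", "id": request_id, "error": {"code": code, "message": message}}
    )


def handle(request_json: str) -> str:
    """Dispatch one JSON-RPC request and return the JSON-RPC response."""
    request = json.loads(request_json)
    request_id = request.get("id")
    method = request.get("method")
    params = request.get("params", {})

    if method == "tools/list":
        result = {
            "tools": [
                {"name": name, "description": tool["description"],
                 "inputSchema": tool["inputSchema"]}
                for name, tool in TOOLS.items()
            ]
        }
    elif method == "tools/call":
        tool = TOOLS.get(params.get("name"))
        if tool is None:
            return _error(request_id, -32602, "Unknown tool")
        text = tool["handler"](params.get("arguments", {}))
        result = {"content": [{"type": "text", "text": text}]}
    else:
        return _error(request_id, -32601, "Method not found")

    return json.dumps({"jsonrpc": "2.0", "id": request_id, "result": result})


if __name__ == "__main__":
    call = {"jsonrpc": "2.0", "id": 1, "method": "tools/call",
            "params": {"name": "query_metrics", "arguments": {"query": "up"}}}
    print(handle(json.dumps(call)))
```

Each server in the table would register its own tools in the same way, so an assistant can discover capabilities with `tools/list` before calling them.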