DataHub.local

Enterprise-grade data platform running on homelab Kubernetes. Built for learning, experimentation, and real-world results.

  • Cluster Nodes: 7
  • Running Services: 25+
  • CPU Cores: 44
  • Total RAM: 100 GB
  • Total Storage: ~4 TB

What is DataHub.local?

DataHub.local is a personal homelab project that runs a complete, enterprise-grade data platform on home hardware. Think of it as a personal Snowflake/Databricks, running on a Kubernetes cluster made of OrangePi boards, a NAS, and a laptop.

It is simultaneously a portfolio project, a learning environment, and a real working platform used daily for data workflows, media, AI inference, and home automation.


Goals

  • Infrastructure as Code

    Deploy and maintain a production-grade Kubernetes cluster using GitOps, Ansible, and Helm — fully reproducible from a git repository.

  • Data Platform

    Design and run a scalable data lakehouse: ingestion → streaming → storage → transformation → visualization.

  • AI & Automation

    Self-host LLMs for inference, build AI-powered workflows in n8n, and run autonomous agents for SRE/DevOps tasks.

  • Security & Observability

    Full-stack monitoring (metrics, logs, alerts), zero-trust networking with Tailscale, and OIDC-based SSO for all services.

  • Portfolio & Learning

    Every component here is a hands-on experiment — from kernel-level GPIO fan control to Iceberg table formats and LLM pipelines.


About the Author

Alvaro Santos Andres

Data & AI engineer building production-grade platforms on open-source tech. DataHub.local is a real, running homelab — from Kubernetes on ARM64 to LLM pipelines and GitOps, built and improved in the open.