Open to opportunities

Hi, I'm Erick Rubio.
I turn raw data into clarity.

Data Engineer & Analyst specializing in building robust pipelines, designing data models, and crafting insightful reports that drive real business decisions.


Passionate about making data work smarter.

I'm a data professional based in Miami who thrives at the intersection of engineering and analytics. From designing scalable database architectures to building end-to-end ETL pipelines, I love turning messy, scattered data into structured, actionable insights.

My approach combines strong technical fundamentals with a keen eye for business context — because the best data solution is one that actually gets used.

10+ Years Experience
20+ Projects Delivered
10+ Tools Mastered

My technical toolkit.

🗄️ Databases

Designing, optimizing, and maintaining relational and non-relational databases for performance at scale.

PostgreSQL MySQL MongoDB Redis
🔀 Data Pipelines

Building reliable ETL/ELT workflows that move and transform data from source to insight.

Apache Airflow dbt Spark Kafka
📊 Reporting & BI

Creating dashboards and reports that tell compelling stories and empower stakeholders.

Tableau Power BI Looker Metabase
🧩 Data Modeling

Crafting star schemas, dimensional models, and data vaults that keep things clean and queryable.

Star Schema Snowflake Data Vault
🐍 Programming

Writing clean, efficient code for data manipulation, automation, and analysis.

Python SQL Bash JavaScript
☁️ Cloud & DevOps

Deploying and managing data infrastructure in the cloud with CI/CD best practices.

AWS GCP Docker Terraform

Things I've built.

📡 Real-Time Sales Pipeline

Kafka-based streaming pipeline processing 50K+ events/sec, feeding a live executive dashboard with sub-second latency.

Kafka Spark Redshift Tableau
🏗️ Data Warehouse Migration

Led migration from on-prem SQL Server to Snowflake, reducing query times by 80% and cutting infrastructure costs.

Snowflake dbt Airflow Python
🔍 Customer Churn Analytics

End-to-end analytics platform identifying churn signals, enabling the retention team to reduce churn by 15%.

Python PostgreSQL Looker Scikit-learn

Automated Reporting System

Built a self-serve reporting engine generating 200+ weekly reports, saving 40 hours/month of manual work.

Python Airflow BigQuery Data Studio

Thoughts on data, engineering & more.

Why Your Data Pipeline Keeps Breaking (And How to Fix It)

Most pipeline failures come down to the same handful of antipatterns. Here's a practical guide to building pipelines that actually survive production.

Star Schema vs. Data Vault: When to Use What

A practical comparison of two popular data modeling approaches, with real-world examples of when each one shines.

From Analyst to Engineer: My Transition Story

How I made the leap from writing SQL queries in spreadsheets to designing distributed data systems — and what I wish I knew earlier.

Why Your Data Pipeline Keeps Breaking (And How to Fix It)

Erick Rubio Jan 15, 2026 8 min read

If you've spent any time in data engineering, you know the feeling: you wake up to a flood of Slack alerts because your overnight pipeline decided to take the night off. It's frustrating, but it's also preventable.

The Usual Suspects

After years of debugging pipelines across different companies and tech stacks, I've noticed the same patterns showing up again and again. Most failures boil down to a few key antipatterns that are easy to fall into and, fortunately, not too hard to fix.

The first culprit is schema drift. Upstream sources change their schema without warning — a renamed column, a new nullable field, a changed data type. If your pipeline assumes a fixed schema, it will break.

Building Resilient Pipelines

The fix isn't just better monitoring (though that helps). It's about building pipelines that expect things to go wrong. Schema validation at ingestion, idempotent transformations, and proper dead-letter queues can turn a 3 AM emergency into a morning task.

The best pipeline isn't the one that never fails — it's the one that fails gracefully and tells you exactly what happened.
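Concretely, schema validation with a dead-letter path can be as small as this — a minimal sketch with made-up column names, but it captures the idea of quarantining bad records instead of failing the whole batch:

# Hypothetical expected schema for an orders feed -- column names are illustrative.
EXPECTED_SCHEMA = {"order_id": int, "customer_id": int, "amount": float}

def validate(record: dict) -> bool:
    """Check that a record has exactly the expected columns with the expected types."""
    if set(record) != set(EXPECTED_SCHEMA):
        return False
    return all(isinstance(record[col], typ) for col, typ in EXPECTED_SCHEMA.items())

def ingest(batch: list[dict]) -> tuple[list[dict], list[dict]]:
    """Split a batch into valid records and dead-lettered records instead of failing outright."""
    good, dead_letter = [], []
    for record in batch:
        (good if validate(record) else dead_letter).append(record)
    return good, dead_letter  # dead-lettered records get logged and replayed once the schema question is resolved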

Start by adding contract tests between your pipeline stages. Tools like great_expectations or dbt tests make this straightforward. Then implement proper retry logic with exponential backoff — not infinite retries that hammer your source systems.
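For the retry side, a small helper like this is usually enough — again a sketch rather than production code, and the parameter values are arbitrary:

import random
import time

def with_retries(fn, max_attempts=5, base_delay=1.0, max_delay=60.0):
    """Call fn(), retrying on failure with exponential backoff plus jitter."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except Exception:
            if attempt == max_attempts:
                raise  # give up after a bounded number of attempts -- no infinite retries
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            time.sleep(delay + random.uniform(0, delay))  # jitter so retries don't hit the source in lockstep

Wrapping each flaky external call in something like with_retries keeps transient failures from becoming full pipeline failures, while the attempt cap keeps you from hammering the source system.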

The Bigger Picture

Ultimately, pipeline reliability is a team sport. It requires good communication with upstream data producers, clear SLAs, and a culture where data quality is everyone's responsibility. The technical solutions are important, but the organizational ones matter just as much.

Star Schema vs. Data Vault: When to Use What

Erick Rubio Dec 3, 2025 6 min read

Choosing a data modeling approach is one of those decisions that shapes your entire analytics stack. Two of the most popular approaches — Star Schema and Data Vault — each have passionate advocates. But the truth is, neither is universally "better."

Star Schema: The Classic

Star schemas are intuitive, performant for BI queries, and well-understood by analysts. If your primary goal is enabling fast, ad-hoc reporting, a well-designed star schema is hard to beat. Fact tables surrounded by denormalized dimension tables create a structure that tools like Tableau and Power BI love.
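To make that concrete, here's a tiny star schema sketched in SQLite — the table and column names are invented, but the shape (one fact table, one join per dimension, then aggregate) is exactly the kind of query BI tools generate:

import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_date    (date_key INTEGER PRIMARY KEY, full_date TEXT, month TEXT);
CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, name TEXT, category TEXT);
CREATE TABLE fact_sales  (
    date_key    INTEGER REFERENCES dim_date(date_key),
    product_key INTEGER REFERENCES dim_product(product_key),
    quantity    INTEGER,
    revenue     REAL
);
""")

# A typical ad-hoc BI question: revenue by month and category.
query = """
SELECT d.month, p.category, SUM(f.revenue) AS revenue
FROM fact_sales f
JOIN dim_date d    ON f.date_key = d.date_key
JOIN dim_product p ON f.product_key = p.product_key
GROUP BY d.month, p.category;
"""
print(conn.execute(query).fetchall())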

Data Vault: The Flexible

Data Vault shines when you need auditability, historical tracking, and the ability to integrate diverse sources over time. Its hub-link-satellite structure separates business keys from descriptive attributes, making it resilient to source changes. It's more complex upfront, but pays dividends in large, evolving data ecosystems.
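A minimal sketch of that hub-link-satellite shape, again with invented names: business keys live in the hub, descriptive attributes are versioned in the satellite, and relationships sit in the link.

import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hub: just the business key plus load metadata
CREATE TABLE hub_customer (
    customer_hk   TEXT PRIMARY KEY,   -- hash of the business key
    customer_id   TEXT,
    load_date     TEXT,
    record_source TEXT
);
-- Satellite: descriptive attributes, versioned over time
CREATE TABLE sat_customer_details (
    customer_hk TEXT REFERENCES hub_customer(customer_hk),
    load_date   TEXT,
    name TEXT, email TEXT, segment TEXT,
    PRIMARY KEY (customer_hk, load_date)
);
-- Link: a relationship between hubs, e.g. customer placed order
CREATE TABLE link_customer_order (
    link_hk       TEXT PRIMARY KEY,
    customer_hk   TEXT REFERENCES hub_customer(customer_hk),
    order_hk      TEXT,
    load_date     TEXT,
    record_source TEXT
);
""")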

My Recommendation

In practice, I often use both: Data Vault for the raw/integration layer, then transform into star schemas for the presentation layer. This gives you the best of both worlds — flexibility in storage, simplicity in consumption.

From Analyst to Engineer: My Transition Story

Erick Rubio Nov 18, 2025 10 min read

Three years ago, I was a data analyst writing SQL in Google Sheets add-ons and building pivot tables. Today, I'm designing distributed data systems. Here's how that journey unfolded — and the lessons that might help you if you're considering a similar path.

The Turning Point

It started with frustration. I kept running into the same problem: the data I needed for my analyses was either missing, wrong, or formatted differently every time. Instead of complaining, I started learning how the data got there in the first place.

Learning the Stack

I began with Python — specifically Pandas for data manipulation. Then I discovered Airflow and the concept of orchestrated workflows. Suddenly, the manual processes I'd been doing could run themselves. That was the "aha" moment that hooked me on engineering.
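For anyone curious what that looked like, here's the flavor of a first Airflow DAG — a bare-bones sketch using Airflow 2.x's PythonOperator, with made-up task names and schedule:

from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

def extract():
    ...  # pull the report data (placeholder)

def transform():
    ...  # clean and aggregate it (placeholder)

with DAG(
    dag_id="weekly_report",          # hypothetical pipeline name
    start_date=datetime(2025, 1, 1),
    schedule="@weekly",              # 'schedule_interval' on older Airflow 2.x releases
    catchup=False,
) as dag:
    extract_task = PythonOperator(task_id="extract", python_callable=extract)
    transform_task = PythonOperator(task_id="transform", python_callable=transform)
    extract_task >> transform_task   # run transform only after extract succeeds

The dependency arrow (extract_task >> transform_task) is the whole "orchestrated workflow" idea in one line: tasks declare what they depend on, and the scheduler handles the rest.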

From there, I dove into databases (beyond just querying them), version control, Docker, and eventually cloud services. Each skill compounded on the last. The analyst background turned out to be a superpower — I understood what questions the data needed to answer.

Advice for the Journey

If you're an analyst thinking about the engineering side: start by automating something you do manually. That first script that saves you an hour a week will teach you more than any course. And don't feel like you have to abandon analytics — the best data engineers are the ones who understand the end user.

Let's work together.

Whether you have a project in mind, want to collaborate, or just want to chat about data — I'd love to hear from you.