If you've spent any time in data engineering, you know the feeling: you wake up to a flood of Slack alerts because your overnight pipeline decided to take the night off. It's frustrating, but it's also preventable.
The Usual Suspects
After years of debugging pipelines across different companies and tech stacks, I've noticed the same patterns showing up again and again. Most failures boil down to a few key antipatterns that are easy to fall into and, fortunately, not too hard to fix.
The most common culprit is schema drift. Upstream sources change their schema without warning: a renamed column, a new nullable field, a changed data type. If your pipeline assumes a fixed schema, it will break.
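The cheapest defense is to check the incoming shape before anything else touches the data. Here's a minimal sketch assuming a pandas ingestion step; the `EXPECTED_SCHEMA` contents are hypothetical, but the idea is simply "fail loudly at the edge instead of three stages downstream":

```python
import pandas as pd

# Hypothetical expected schema for an ingested "orders" feed.
EXPECTED_SCHEMA = {
    "order_id": "int64",
    "customer_id": "int64",
    "amount": "float64",
    "created_at": "datetime64[ns]",
}

def check_schema(df: pd.DataFrame) -> None:
    """Fail loudly (and early) if the upstream schema has drifted."""
    missing = set(EXPECTED_SCHEMA) - set(df.columns)
    unexpected = set(df.columns) - set(EXPECTED_SCHEMA)
    mismatched = {
        col: (str(df[col].dtype), expected)
        for col, expected in EXPECTED_SCHEMA.items()
        if col in df.columns and str(df[col].dtype) != expected
    }
    if missing or unexpected or mismatched:
        raise ValueError(
            f"Schema drift detected: missing={missing}, "
            f"unexpected={unexpected}, type_mismatches={mismatched}"
        )
```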
Building Resilient Pipelines
The fix isn't just better monitoring (though that helps). It's about building pipelines that expect things to go wrong. Schema validation at ingestion, idempotent transformations, and proper dead-letter queues can turn a 3 AM emergency into a morning task.
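To make those three ideas concrete, here's a rough sketch of a batch loop that validates each record, writes good rows through an idempotent keyed upsert, and routes bad rows to a dead-letter queue with enough context to replay them later. The `sink.upsert` and `dead_letter_sink.write` interfaces are assumptions standing in for whatever warehouse and queue you actually use:

```python
import json
from datetime import datetime, timezone

def process_batch(records, validate, transform, sink, dead_letter_sink):
    """Validate, transform, and load a batch without letting one bad record
    take down the whole run."""
    for record in records:
        try:
            validate(record)  # e.g. the schema check at ingestion
            row = transform(record)
            # Idempotent write: a keyed upsert means re-running the batch
            # after a partial failure doesn't create duplicates.
            sink.upsert(key=row["order_id"], row=row)
        except Exception as exc:
            # Dead-letter queue: keep the payload and the error so the
            # failure is debuggable in the morning, not at 3 AM.
            dead_letter_sink.write({
                "payload": json.dumps(record, default=str),
                "error": repr(exc),
                "failed_at": datetime.now(timezone.utc).isoformat(),
            })
```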
The best pipeline isn't the one that never fails — it's the one that fails gracefully and tells you exactly what happened.
Start by adding contract tests between your pipeline stages. Tools like great_expectations or dbt tests make this straightforward. Then implement proper retry logic with exponential backoff — not infinite retries that hammer your source systems.
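For the retry side, a capped exponential backoff with jitter is usually enough. This is a minimal sketch, not tied to any particular library; `fetch` stands in for whatever call hits your source system:

```python
import random
import time

def fetch_with_backoff(fetch, max_attempts=5, base_delay=1.0, max_delay=60.0):
    """Retry a flaky source call with capped exponential backoff and jitter."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fetch()
        except (ConnectionError, TimeoutError):
            if attempt == max_attempts:
                raise  # out of retries: surface the failure to alerting
            # Double the delay each attempt, cap it, and add jitter so a
            # fleet of workers doesn't hammer the source in lockstep.
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            time.sleep(delay + random.uniform(0, delay))
```

The bounded attempt count is the important part: retries should buy you time to recover from transient blips, not mask a source that is genuinely down.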
The Bigger Picture
Ultimately, pipeline reliability is a team sport. It requires good communication with upstream data producers, clear SLAs, and a culture where data quality is everyone's responsibility. The technical solutions are important, but the organizational ones matter just as much.