Database & Data Engineering Consulting

Pragmatic consulting for PostgreSQL operations, data modeling, and data engineering infrastructure.

I help startups and engineering teams set up, scale, and improve operations for their databases and data platforms.

Whether you are deploying new database infrastructure or optimizing existing operations at scale, I provide pragmatic engineering support.

Here are the two areas I focus on:

1. PostgreSQL Operations & Data Modeling

I have worked on database internals and production operations. I help teams get the most out of Postgres without adding unnecessary operational complexity.

  • Operations & Scale: Designing and bootstrapping new PostgreSQL deployments, scaling connection pooling, and establishing high availability patterns.
  • Diagnostics & Optimization: Isolating performance bottlenecks, latency spikes, and lock contention using query execution tracing and active session monitoring.
  • Data Modeling & Schema Design: Designing schemas for high throughput, planning table partitioning strategies, and choosing index layouts that match your query patterns.
  • Change Data Capture (CDC) & Security: Building reliable CDC pipelines (Debezium/logical replication) and auditing complex database access control lists (ACLs).

2. Data Engineering & Analytics Infrastructure

I have built and scaled query engines and data lake platforms processing petabytes of data. I help teams optimize their data platform architecture and query performance.

  • Query Engines at Scale: Setting up and tuning query engines like Apache Spark and Presto/Trino, along with table formats like Apache Iceberg, for high-throughput analytics.
  • Data Pipeline Operations: Designing robust ETL/ELT pipelines and streamlining existing scripts to reduce execution times and compute costs.
  • Data Infrastructure Observability: Building end-to-end monitoring, logging, and metrics across platforms (Spark, YARN, Trino, HDFS) to ensure reliability.
  • Data Platform Architecture: Designing storage layouts, partitioning schemes, and choosing catalog tools for scalable, cost-efficient data lakes.

How I Work

I prefer focused, high-impact engagements to open-ended arrangements:

  1. Infrastructure Design & Setup: Partnering with your team to design, bootstrap, and deploy new Postgres setups, CDC pipelines, or data lake architectures (Iceberg/Trino).
  2. Operations & Performance Audits: Conducting a one-to-two-week review of your database metrics, schema designs, or ETL execution profiles and delivering an optimization roadmap.
  3. Technical Advisory: Collaborating with your team as a fractional expert on data architecture, scalability reviews, and query engine configuration.

Technical Background

I have spent my career building, scaling, and contributing to database engines and data platforms:

  • Yugabyte: Implemented transactional streaming CDC, OpenTelemetry tracing, and Active Session History.
  • LinkedIn: Led big data infrastructure observability initiatives and consulted with startups on database scaling and ETL optimization.
  • StarTree: Worked on real-time analytics infrastructure, query performance, and scaling Apache Pinot for high-throughput, low-latency workloads.
  • Qubole: Led the SQL teams for the Hive and Presto services, founded the data engineering team, and scaled usage.
  • Yahoo: Built and scaled Perl/C++ query engines processing petabytes of daily data.

Contact

If you have a problem you would like to discuss, email me at [email protected] or find me on LinkedIn.