I help startups and engineering teams set up, scale, and improve operations for their databases and data platforms.
Whether you are deploying new database infrastructure or optimizing existing operations at scale, I provide pragmatic engineering support.
Here are the two areas I focus on:
1. PostgreSQL Operations & Data Modeling
I have worked on database internals and production operations. I help teams get the most out of Postgres without adding unnecessary operational complexity.
- Operations & Scale: Designing and bootstrapping new PostgreSQL deployments, scaling connection pooling, and establishing high availability patterns.
- Diagnostics & Optimization: Isolating performance bottlenecks, latency spikes, and lock contention using query execution tracing and active session monitoring.
- Data Modeling & Schema Design: Designing database schemas for high throughput, structuring table partition strategies, and designing optimized index layouts.
- Change Data Capture (CDC) & Security: Building reliable CDC pipelines (Debezium/logical replication) and auditing complex database access control lists (ACLs).
2. Data Engineering & Analytics Infrastructure
I have built and scaled query engines and data lake platforms processing petabytes of data. I help teams optimize their data platform architecture and query performance.
- Query Engines at Scale: Setting up and tuning query engines like Apache Spark and Presto/Trino, along with Apache Iceberg tables, for high-throughput analytics.
- Data Pipeline Operations: Designing robust ETL/ELT pipelines and streamlining existing scripts to reduce execution times and compute costs.
- Data Infrastructure Observability: Building end-to-end monitoring, logging, and metrics across platforms (Spark, YARN, Trino, HDFS) to ensure reliability.
- Data Platform Architecture: Designing storage layouts, partitioning schemes, and choosing catalog tools for scalable, cost-efficient data lakes.
How I Work
I prefer focused, high-impact engagements over open-ended arrangements:
- Infrastructure Design & Setup: Partnering with your team to design, bootstrap, and deploy new Postgres setups, CDC pipelines, or data lake architectures (Iceberg/Trino).
- Operations & Performance Audits: Conducting a 1-2 week review of your database metrics, schema designs, or ETL execution profiles to provide an optimization roadmap.
- Technical Advisory: Collaborating with your team as a fractional expert on data architecture, scalability reviews, and query engine configuration.
Technical Background
I have spent my career building, scaling, and contributing to database engines and data platforms:
- Yugabyte: Implemented transactional streaming CDC, OpenTelemetry tracing, and Active Session History.
- LinkedIn: Led big data infrastructure observability initiatives and consulted with startups on database scaling and ETL optimization.
- StarTree: Worked on real-time analytics infrastructure, query performance, and scaling Apache Pinot for high-throughput, low-latency workloads.
- Qubole: Led the SQL teams for the Hive and Presto services, founded the data engineering team, and scaled usage.
- Yahoo: Built and scaled Perl/C++ query engines processing petabytes of data daily.
Contact
If you have a problem you would like to discuss, email me at [email protected] or find me on LinkedIn.