Resume

Summary

Detail-oriented Data Engineer with 3+ years of experience designing scalable ETL and streaming pipelines on AWS. Strong in modular system design, automated CI/CD, and observability. Proven ability to transform business needs into resilient, production-ready data solutions.

Technical Skills

Languages: Java, Python, Scala, SQL
Cloud & Infra: AWS (Lambda, EMR, Redshift, Glue, S3, EventBridge, CloudFormation), Terraform
Data & ETL: Apache Spark, Kinesis, DynamoDB, Athena, Matillion
CI/CD & Testing: GitHub Actions, Maven, JUnit, Mockito, Guice DI
Monitoring: CloudWatch, Datadog, PagerDuty

Professional Experience

Blackline Safety — Data Engineer

June 2023 – Present · Edmonton, AB

Built reusable Matillion + Lambda component that parses a custom SQL DSL and orchestrates data movement across S3, Kinesis, and Redshift, improving feed onboarding by 70%.
Refactored serverless transformation engine using Maven and Guice DI; introduced SQS/DynamoDB DLQs and automated CI/CD via CloudFormation.
Migrated EMR CDC jobs to Spark Structured Streaming with optimized cluster configs — reducing latency by 30% and cost by 20%.
Automated Glue Catalog + Athena table creation; deployed Redshift Serverless with cross-cluster sharing for self-service analytics.
Developed SFTP-based reporting platform with query validation, Datadog alerting, and EventBridge scheduling with DLQ replay logic.
Created GitHub Actions pipeline to sync customer config with DynamoDB using checksum validation, ensuring integrity.
Led schema migrations and backfills (e.g. EXO8 gas, gamma data); built validation logic to resolve data mismatches.
Managed production on-call rotation and authored runbooks and observability dashboards in Datadog and CloudWatch.

Scotiabank — Data Engineer

June 2023 – Present

Developed scalable PII detection and anonymization framework using Spark (Scala + Python) across Parquet, ORC, JSON, HBase, and other data sources.
Optimized Spark workloads via Spark UI and execution plans, achieving 70% performance gains on 500+ production tables.
Deployed framework post-UAT and integrated Ranger policies for secure PII masking in production systems.
Assisted HDP-to-CDP migration by debugging issues and fixing platform compatibility problems.
Automated ingestion pipelines with Python and Bash, and built unit-tested delivery workflows.
Documented pipelines for audits and collaborated with compliance on data privacy initiatives.

Scotiabank — Data Analyst Intern

Jan 2023 – May 2023

Automated 5+ manual business processes using Python and VBA.
Created interactive Power BI dashboards to support data-driven leadership decisions.

American Express — Lead Analyst, Business Intelligence

Sep 2018 – Sep 2020

Refactored Hive SQL queries to cut runtime by 50%; migrated logic to PySpark, increasing efficiency by 60%.
Optimized jobs using broadcast joins and Scala UDFs, and automated reports using Python and Bash.
Mentored junior analysts and supported transition to Spark-based workloads.

Mu‑Sigma Business Solutions — Trainee Associate, Data Analytics

Nov 2016 – Sep 2018

Built HiveQL pipelines and SQL scripts to analyze clickstream behavior and KPIs.
Created reusable data extractors to automate Excel and Power BI reporting processes.
Maintained data dictionaries, technical docs, and logic maps for ETL flows.

Education

Lambton College, Toronto, CA — Cloud Computing for Big Data (GPA: 3.67/4.0, President Award recipient), Sep 2021 – May 2023
National Institute of Technology, India — Bachelor of Technology in Computer Science, May 2012 – May 2016

Pulkit Kapoor