Summary

Detail-oriented Data Engineer with 3+ years of experience designing scalable ETL and streaming pipelines on AWS. Strong in modular system design, automated CI/CD, and observability. Proven ability to transform business needs into resilient, production-ready data solutions.

Technical Skills

  • Languages: Java, Python, Scala, SQL
  • Cloud & Infra: AWS (Lambda, EMR, Redshift, Glue, S3, EventBridge, CloudFormation), Terraform
  • Data & ETL: Apache Spark, Kinesis, DynamoDB, Athena, Matillion
  • CI/CD & Testing: GitHub Actions, Maven, JUnit, Mockito, Guice DI
  • Monitoring: CloudWatch, Datadog, PagerDuty

Professional Experience

Blackline Safety — Data Engineer

June 2023 – Present · Edmonton, AB

  • Built reusable Matillion + Lambda component that parses a custom SQL DSL and orchestrates data movement via S3, Kinesis, and Redshift, improving feed onboarding by 70%.
  • Refactored serverless transformation engine using Maven and Guice DI; introduced SQS/DynamoDB DLQs and automated CI/CD via CloudFormation.
  • Migrated EMR CDC jobs to Spark Structured Streaming with optimized cluster configs, reducing latency by 30% and cost by 20%.
  • Automated Glue Catalog + Athena table creation; deployed Redshift Serverless with cross-cluster sharing for self-service analytics.
  • Developed SFTP-based reporting platform with query validation, Datadog alerting, and EventBridge scheduling with DLQ replay logic.
  • Created GitHub Actions pipeline to sync customer config with DynamoDB, using checksum validation to ensure data integrity.
  • Led schema migrations and backfills (e.g., EXO8 gas and gamma data); built validation logic to resolve data mismatches.
  • Managed production on-call rotation and authored runbooks and observability dashboards in Datadog and CloudWatch.

Scotiabank — Data Engineer

June 2023 – Present

  • Developed scalable PII detection and anonymization framework using Spark (Scala + Python) across Parquet, ORC, JSON, HBase, and other data sources.
  • Optimized Spark workloads via Spark UI and execution plans, achieving 70% performance gains on 500+ production tables.
  • Deployed framework post-UAT and integrated Ranger policies for secure PII masking in production systems.
  • Assisted the HDP-to-CDP migration by debugging issues and implementing platform compatibility fixes.
  • Automated ingestion pipelines with Python and Bash, and built unit-tested delivery workflows.
  • Documented pipelines for audits and collaborated with compliance on data privacy initiatives.

Scotiabank — Data Analyst Intern

Jan 2023 – May 2023

  • Automated 5+ manual business processes using Python and VBA.
  • Created interactive Power BI dashboards to support data-driven leadership decisions.

American Express — Lead Analyst, Business Intelligence

Sep 2018 – Sep 2020

  • Refactored Hive SQL queries to reduce runtime by 50%; migrated logic to PySpark, increasing efficiency by 60%.
  • Optimized jobs using broadcast joins and Scala UDFs, and automated reports using Python and Bash.
  • Mentored junior analysts and supported transition to Spark-based workloads.

Mu‑Sigma Business Solutions — Trainee Associate, Data Analytics

Nov 2016 – Sep 2018

  • Built HiveQL pipelines and SQL scripts to analyze clickstream behavior and KPIs.
  • Created reusable data extractors to automate Excel and Power BI reporting processes.
  • Maintained data dictionaries, technical docs, and logic maps for ETL flows.

Education

  • Lambton College, Toronto, CA — Cloud Computing for Big Data (GPA: 3.67/4.0, President Award recipient), Sep 2021 – May 2023
  • National Institute of Technology, India — Bachelor of Technology in Computer Science, May 2012 – May 2016