Resume
Summary
Detail-oriented Data Engineer with 3+ years of experience designing scalable ETL and streaming pipelines on AWS. Strong in modular system design, automated CI/CD, and observability. Proven ability to transform business needs into resilient, production-ready data solutions.
Technical Skills
- Languages: Java, Python, Scala, SQL
- Cloud & Infra: AWS (Lambda, EMR, Redshift, Glue, S3, EventBridge, CloudFormation), Terraform
- Data & ETL: Apache Spark, Kinesis, DynamoDB, Athena, Matillion
- CI/CD & Testing: GitHub Actions, Maven, JUnit, Mockito, Guice DI
- Monitoring: CloudWatch, Datadog, PagerDuty
Professional Experience
Blackline Safety — Data Engineer
June 2023 – Present · Edmonton, AB
- Built reusable Matillion + Lambda component that parses a custom SQL DSL and orchestrates data movement via S3, Kinesis, and Redshift, improving feed onboarding by 70%.
- Refactored serverless transformation engine using Maven and Guice DI; introduced SQS/DynamoDB DLQs and automated CI/CD via CloudFormation.
- Migrated EMR CDC jobs to Spark Structured Streaming with optimized cluster configs, reducing latency by 30% and cost by 20%.
- Automated Glue Catalog + Athena table creation; deployed Redshift Serverless with cross-cluster sharing for self-service analytics.
- Developed SFTP-based reporting platform with query validation, Datadog alerting, and EventBridge scheduling with DLQ replay logic.
- Created GitHub Actions pipeline to sync customer config with DynamoDB using checksum validation, ensuring integrity.
- Led schema migrations and backfills (e.g. EXO8 gas, gamma data); built validation logic to resolve data mismatches.
- Managed production on-call rotation and authored runbooks and observability dashboards in Datadog and CloudWatch.
Scotiabank — Data Engineer
June 2023 – Present
- Developed scalable PII detection and anonymization framework using Spark (Scala + Python) across Parquet, ORC, JSON, HBase, and other data sources.
- Optimized Spark workloads via Spark UI and execution plans, achieving 70% performance gains on 500+ production tables.
- Deployed framework post-UAT and integrated Ranger policies for secure PII masking in production systems.
- Assisted with HDP-to-CDP migration by debugging issues and applying platform compatibility fixes.
- Automated ingestion pipelines with Python and Bash, and built unit-tested delivery workflows.
- Documented pipelines for audits and collaborated with compliance on data privacy initiatives.
Scotiabank — Data Analyst Intern
Jan 2023 – May 2023
- Automated 5+ manual business processes using Python and VBA.
- Created interactive Power BI dashboards for data-driven leadership decisions.
American Express — Lead Analyst, Business Intelligence
Sep 2018 – Sep 2020
- Refactored Hive SQL queries to improve runtime by 50%; migrated logic to PySpark, increasing efficiency by 60%.
- Optimized jobs using broadcast joins and Scala UDFs, and automated reports using Python and Bash.
- Mentored junior analysts and supported transition to Spark-based workloads.
Mu‑Sigma Business Solutions — Trainee Associate, Data Analytics
Nov 2016 – Sep 2018
- Built HiveQL pipelines and SQL scripts to analyze clickstream behavior and KPIs.
- Created reusable data extractors to automate Excel/Power BI reporting processes.
- Maintained data dictionaries, technical docs, and logic maps for ETL flows.
Education
- Lambton College, Toronto, ON — Cloud Computing for Big Data (GPA: 3.67/4.0, President Award recipient), Sep 2021 – May 2023
- National Institute of Technology, India — Bachelor of Technology in Computer Science, May 2012 – May 2016