Obed is a Data Engineer with 8+ years of experience, especially well-versed in tools and technologies such as Python, SQL, Apache Airflow, Kafka, Snowflake, cloud platforms, databases, JavaScript, Go, Pandas/NumPy, and Docker. He is also adept with project management tools such as JIRA, Confluence, Airtable, Smartsheet, and Asana.
> Lead engineer for an Airflow ETL pipeline consolidating 21M daily records from 600+ Snowflake tables into a single table; the unified data model improves query efficiency and reduces time-to-insight for product analytics.
> Design a MySQL-to-Snowflake data pipeline that reduced latency from 3 hours to 15 minutes, including design discussions with the team, a successful POC, and a project plan for migrating to the new pipeline.
> Lead a project to standardize event analytics schemas (Avro), coordinating with multiple product engineering teams, analysts, and data engineers while implementing changes across JavaScript/PHP/Java/Python/Go repositories, performing extensive validation, and executing large-scale production releases.
> Develop Airflow data pipelines that ingest data from PagerDuty, New Relic, AWS S3, Funnel.IO, and other sources into Snowflake, with automated data testing (Great Expectations), while keeping stakeholders informed.
> Contribute to team culture by drafting onboarding steps and team-member milestones, establishing a protocol for post-mortem documents, publishing a blog post on Medium, and mentoring junior engineers.
> Design and implement a Python ETL workflow framework (inspired by Airflow/Luigi) for file ingestion, configurable via SQL, with features such as dynamic operator importing, developer tooling, and automated documentation.
> Develop an end-to-end client solution that improves data quality and shortens report delivery time by extending an in-house distributed Python worker framework to automate manual processes, reducing total processing time by 10x.
> Establish a year-long plan for, and lead, the Python 2.7-to-3.6 migration of 1k+ ETL ingestion workflows and 300+ custom analytics processes, including major library upgrades (e.g., pandas, NumPy, paramiko, xlrd).
> Improve runtime performance of legacy Python ingestion workflows by 33%, reduce generic workers' memory usage by 25%, and restructure database flows to accommodate a ~20% increase in ingested data.