Position: Data Engineer
Location: Remote
Role Duration: 9 Months (Possible Extension)
The Team: The Data Platform team in our Company's Animal Health IT (MAHI-IT) designs and implements end-to-end data solutions to support customer-facing applications in animal traceability, monitoring, well-being, and more.
We seek a Data Engineer to help the team set up, maintain, optimize, and scale data pipelines from multiple sources and across different functional teams in a cloud environment.
Responsibilities:
Assist in developing best practices for deploying, monitoring, and scaling data pipelines in the cloud
Identify requirements for ingestion, transformation, and storage of data
Design and implement optimal and scalable data pipelines
Use cloud tools to integrate data from multiple sources into the data lake, and design and implement ways to expose it
Identify opportunities for automation and optimization of data pipelines, and for redesigning the data architecture and infrastructure for greater scalability and optimal delivery
Implement the cloud and data infrastructure required to extract, transform, and load data from multiple sources
Identify required security and governance procedures to keep the data safe in a cloud environment
Assist in developing and executing testing plans to help with QA efforts.
Requirements/Qualifications:
Bachelor's degree in Data Engineering, Computer Science, or related field.
Experience designing and implementing data engineering pipelines.
Advanced knowledge of Python and PySpark.
Working knowledge of one or more SQL dialects.
3+ years of hands-on experience with developing data warehouse solutions and data products.
1+ year of hands-on experience developing a distributed data processing platform with Hadoop, Hive, Spark, Airflow, Kafka, etc.
3+ years of hands-on experience in modeling and designing data schemas.
Advanced experience with programming languages such as Python, PySpark, and Scala.
Knowledge of scripting languages such as Perl and Shell.
Experience working with, processing, and loading large data sets.
Experience with cloud tools for ingesting and processing data.
Preferred Experience And Skills:
Experience with AWS big data tools and platforms: S3, EMR, EKS, Lambda, etc.
Experience with data ingestion and transformation tools such as StreamSets and Databricks
Experience working with DevOps teams
Experience with container technologies such as Docker and Kubernetes
Experience with data warehousing tools like Snowflake and Redshift