Data Engineer

Software Engineering, Data Science
Shanghai, China
Posted on Wednesday, May 15, 2024
Responsibilities
1. Design, develop, and maintain highly scalable, reliable, and efficient data processing systems with a strong emphasis on code quality and performance.
2. Collaborate closely with data analysts, software developers, and business stakeholders to deeply understand data requirements and architect robust solutions to address their needs.
3. Focus on the development and maintenance of ETL pipelines, ensuring seamless extraction, transformation, and loading of data from diverse sources into our data warehouse, which is built on the Databricks platform.
4. Spearhead the development and maintenance of real-time data processing systems utilizing cutting-edge big data technologies such as Spark Streaming and Kafka.
5. Establish and enforce rigorous data quality and validation checks to uphold the accuracy and consistency of our data assets.
6. Act as a point of contact for troubleshooting and resolving complex data processing issues, collaborating with cross-functional teams as necessary to ensure timely resolution.
7. Proactively monitor and optimize data processing systems to uphold peak performance, scalability, and reliability standards, leveraging advanced AWS operational knowledge.
8. Utilize AWS services such as EC2, S3, and Glue, together with Databricks, to architect, deploy, and manage data processing infrastructure in the cloud.
9. Implement robust security measures and access controls to safeguard sensitive data assets within the AWS environment.
10. Stay abreast of the latest advancements in AWS technologies and best practices, incorporating new tools and services to continually improve our data processing capabilities.
Requirements
1. Bachelor’s or Master’s degree in Computer Science or a related field.
2. Minimum of 5 years of hands-on experience as a Data Engineer, demonstrating a proven track record of designing and implementing sophisticated data processing systems.
3. Good understanding of the Databricks platform and Delta Lake.
4. Familiarity with data job scheduling tools such as Dagster.
5. Proficiency in one or more programming languages such as Scala, Java, or Python.
6. Deep expertise in big data technologies including Apache Spark for ETL processing and optimization.
7. Proficient in utilizing BI tools such as Metabase for data visualization and analysis.
8. Advanced understanding of data modeling, data quality, and data governance best practices.
9. Outstanding communication and collaboration skills, with the ability to effectively engage with diverse stakeholders across the organization.
10. Extensive experience in AWS operational management, including deployment, configuration, and optimization of data processing infrastructure within the AWS cloud environment.
11. Strong understanding of AWS services such as EC2, S3, Glue and EMR, with the ability to architect scalable and resilient data solutions leveraging these services.
12. Proficiency in AWS security best practices, with experience implementing robust security measures and access controls to protect sensitive data assets.
13. Hands-on experience with automation and DevOps tools such as Terraform for infrastructure as code and automation purposes.
14. Ability to read and write in English.