Apache Iceberg Data Lead Job at Mphasis, Jersey City, NJ

  • Mphasis
  • Jersey City, NJ

Job Description

Job Summary:

We are seeking a highly skilled and experienced Apache Iceberg Data Lead to design, implement, and manage our data lake infrastructure. You will be responsible for building a scalable and efficient data lake using Apache Iceberg, ensuring data reliability, performance, and accessibility for downstream analytics and reporting. You will work closely with our Flink streaming application developers and data scientists to build a robust data platform.

Responsibilities:

  • Data Lake Architecture and Design:
      • Design and implement a scalable and robust data lake architecture using Apache Iceberg.
      • Define data lake best practices, including data partitioning, clustering, and versioning.
      • Develop and maintain data lake schemas and metadata.
      • Integrate Apache Iceberg with other data lake components (e.g., storage systems, compute engines).
  • Iceberg Implementation and Management:
      • Implement and manage Apache Iceberg tables for both raw source data and processed Flink output.
      • Optimize Iceberg performance for various query patterns.
      • Ensure data quality and consistency within the data lake.
      • Manage Iceberg table evolution and schema changes.
      • Implement data retention and archival policies.
  • Integration with Flink and Other Data Systems:
      • Design and implement seamless integration between Apache Flink and Apache Iceberg for data ingestion and storage.
      • Work with Flink developers to ensure efficient data writes to Iceberg tables.
      • Integrate Iceberg with other data processing and analytics tools (e.g., Spark, Presto, Trino).
      • Work with message queues such as Kafka to ingest data into Iceberg.
  • Performance and Optimization:
      • Monitor and optimize data lake performance.
      • Troubleshoot and resolve data lake performance and stability issues.
      • Conduct performance testing and benchmarking.
  • Data Governance and Security:
      • Implement data governance policies within the data lake.
      • Ensure data security and access control.
      • Implement data lineage and audit trails.
  • Technical Leadership:
      • Provide technical leadership and guidance on Apache Iceberg and data lake best practices.
      • Mentor junior engineers and contribute to knowledge sharing.
      • Stay up to date with the latest developments in Apache Iceberg and data lake technologies.

Qualifications:

  • Required:
      • Bachelor's or Master's degree in Computer Science, Engineering, or a related field.
      • 7+ years of experience in data engineering or data warehousing.
      • 3+ years of hands-on experience with Apache Iceberg.
      • Strong understanding of data lake architectures and best practices.
      • Proficiency in SQL and experience with data processing frameworks (e.g., Spark, Flink).
      • Experience with cloud storage systems (e.g., AWS S3, Azure Blob Storage, Google Cloud Storage).
      • Experience with message queues such as Kafka.
      • Strong problem-solving and analytical skills.
      • Excellent communication and collaboration skills.
  • Preferred:
      • Experience with other data lake technologies (e.g., Apache Hudi, Delta Lake).
      • Experience with metadata management tools.
      • Experience with data governance and security tools.
      • Experience with containerization and orchestration technologies (Docker, Kubernetes).
      • Contributions to open-source projects.
