Big Data Engineer with Talend (Hadoop, Cloudera, AWS, Scala, Spark, Devops)
Consultancy - London
Competitive Salary + bonus + benefits + package
Talend, Hadoop, Cloudera, AWS, Scala, Spark, Devops, BI, ETL.
Salt are proud to announce their solid partnership with one of the world's leading consultancies. They require a solid Big Data Engineer with Talend (Hadoop, Cloudera, AWS, Scala, Spark, DevOps) to join their London offices and bring a fresh perspective to their data transition to the cloud.
The ideal Big Data Engineer with Talend (Hadoop, Cloudera, AWS, Scala, Spark, DevOps) candidate will come from a solid Big Data background; AWS experience is an absolute requirement. The opportunity to learn Talend will be on offer, as well as some of the most creative, complex and stimulating projects for some very big-name clients. You will be involved in a huge transformation and have autonomy in decision making, fantastic impact within a well-known company, and extensive training and development in the most cutting-edge technologies.
Overview of Background:
The Data Engineer will typically come from a strong IT development or coding background in Financial Services and have a proven track record of responsibility for everything from ingestion through to supporting user self-service consumption of data within a Logical Lake based on Spark/HDFS. For technological context, the Big Data Framework will be hosted on AWS and based on Spark and Cloudera, implementing both Permanent and Transitory Clusters, S3, Lambda, Talend, Sentry, HDFS etc.
Note that the platform is driven by metadata, and all data processing and definition must be reflected in the Metadata repository. Metadata is used to manage Data Lifecycle, Data Status (Raw / In-Progress / Trusted etc.), Data Source and lineage across the Tiers of the "Lake", Ingestion, Refinement, Semantic, Sandbox Population, workflows, Data Classifications (e.g. Security and Personal), Content access policies for CRUD, Security and Privacy, Ownership / Stewardship, History / Auditing, Searchability / Discoverability.
- Data Engineers will follow best practice (set by the Lead DE) across the platform and will provide mentoring and skills transfer to other, less experienced Data Engineers.
- Work with Platform Engineers to define additional capabilities required for the overall Data Lake framework, and peer review other Data Engineers' designs and developed artefacts.
- Developing technical specifications consistent with the architecture, based on business analysis (performed by the DE) and functional analysis
- Generating/building the software/configuration as per the technical specification
- Capturing and enhancing metadata, and using metadata to generate the Data Catalogue (and the mechanism used to expose that to the user community)
- Definition of Security Policies for the data implemented, and the implementation of those policies in Sentry, providing Authorisation controls at the required granularity.
- Experience in large-scale distributed processing architectures, e.g. enterprise caching, low-latency data-driven platforms (Enterprise Data Warehouse preferably)
- Expertise in fault-tolerant systems in AWS, including clustering and multi-AZ deployments.
- Working knowledge of setting up and running Big Data clusters (e.g. Hadoop, Spark, Impala) supported by various databases and object stores (S3) on AWS
- Understanding of configuration management and DevOps technologies (e.g. GitLab / Jenkins / Nexus)
- BI/Data Prep/Visualisation/ETL tools: Zeppelin, SpotFire, Tableau, StreamSets for large-scale dashboards, analysis and reporting
- AWS / S3 / Lambda / Transient Clusters
- Cloudera Hadoop Distribution (CDH 5)
- Implementing data ingestion/presentation/semantic data layers as guided by BA/Data Governance/Conceptual Data models.
- Uses the data ingestion/data transformation frameworks developed by the Lead DE and, where necessary, assists in the development of new frameworks or framework variants.
- Has a domain understanding of both the existing data and the use cases, and is therefore responsible for normalising data within the lake to facilitate usability.
- Skills in data modelling (both structured and unstructured data)
- Skills in metadata repositories and technologies (Hive / Cloudera Navigator)
- Skills in data acquisition (landing, ingestion and metadata) of various data types including Salesforce, XML and Relational data
- Skills in data manipulation: Java, Scala, Python executing within a Spark environment, orchestrated by Oozie/Talend
- Experience with HDFS / Hive / AWS / Sentry Policies / Oozie / ZooKeeper
- Experience with Big Data self-describing formats (JSON/Parquet/ORC etc.) and when to use them
- Experience with Spark clusters, both permanent (Cloudera) and transitory (Altus/EMR)
- Data presentation via Sentry/Impala to visualisation technologies, e.g. Talend / SpotFire.
- Familiar with building secure software using modern security principles