Big Data/PySpark Engineer in McLean at Vaco

Date Posted: 11/15/2019

Job Description

Notes from the call with the manager:

  • Risk Management Group
  • This is a Data Analytics team
  • The way the data is structured is a bit different than what the team is equipped to handle
  • Millions and millions of records, thousands of columns wide
  • Big Data Functions

Day to Day:

  • Interact with large data sets
  • Perform in-memory computation
  • Establish the different systems the team needs
  • Review other people's code and optimize it

Top skills will be:

  • PySpark
  • AWS - EMR clusters, EC2

Job Description:

Basic Qualifications:

  • At least 1 year of experience with Apache Spark coding and a good understanding of optimizing Spark for memory and performance
  • At least 1 year of experience with PySpark
  • At least 1 year of experience setting up EMR clusters on AWS
  • At least 2 years of professional experience with data engineering and tools like Hadoop, HDFS, Hive, etc.
  • At least 2 years of experience writing good-quality software in languages like Java, Python, Scala, etc.

Preferred Qualifications:

  • Expert knowledge of Apache Spark internals
  • Experience with AWS Services like S3, Lambda and EMR
  • 2+ years of experience in Python, Java, or Scala
  • 2+ years of experience with Unix/Linux systems, with scripting experience in Shell, Perl, or Python
  • 2+ years of experience building data pipelines
  • 2+ years of automated deployment and CI/CD experience with tools like Jenkins
  • At least 1 year of Cloud (AWS, Azure, Google) development experience
  • Experience with Streaming and/or NoSQL implementation (Mongo, Cassandra, etc.) a plus

Job Requirements

PySpark, AWS EMR clusters, EC2, Big Data