Data Engineer
Plano, TX (On-Site)
Job Description:
Job Title: Data Engineer
Location: Richmond, VA; McLean, VA; or Plano, TX (On-Site)
Key Responsibilities:
- Create, maintain, and optimize ETL/ELT pipelines to ingest, process, and manage data from various sources using Python, Apache Spark, and AWS services.
- Design data models, build data structures, and implement data storage solutions that ensure data integrity, consistency, and security.
- Tune data processing workflows for performance, scalability, and cost efficiency on distributed systems using Spark and AWS.
- Work with cross-functional teams (e.g., data science, product, analytics) to understand data requirements and support business needs.
- Document data workflows, processes, and solutions for transparency and reproducibility.
- Implement data quality checks, error handling, and recovery processes. Ensure compliance with data governance and security protocols.
Key Qualifications:
- Proficiency in Python for data processing, scripting, and automation.
- Experience with Spark for data transformation, distributed processing, and ETL workflows.
- Hands-on experience with core AWS services such as S3, Lambda, Glue, EMR, Redshift, and RDS. Knowledge of IAM and of CloudFormation and/or Terraform for infrastructure management is a plus.
- Strong understanding of SQL, data warehousing, and database design principles.
- Familiarity with data modeling, schema design, and query optimization.
Other Skills:
- Experience with version control (Git) and CI/CD practices.
- Strong problem-solving skills and ability to work in an Agile environment.
- Excellent communication skills and ability to work with non-technical stakeholders.
Preferred Qualifications:
- Familiarity with workflow orchestration tools such as Airflow.
- Experience with data streaming technologies (e.g., Kafka, Kinesis).
Key Skills:
- Data engineering, AWS, Airflow