5000 Córdoba
Job Description
We are looking for a highly motivated, energetic professional to join a highly specialized team at the forefront of building a large-scale data processing & analytics platform. The ideal candidate should be willing to work in a fast-paced, dynamic environment where technical abilities will be challenged on a day-to-day basis. In this role, you will drive design, development, and production support, building high-quality, scalable big data platforms and applications.
Responsibilities
You will be responsible for analyzing large data sets to glean actionable insights, designing classifiers and ranking algorithms, performing ad-hoc statistical analyses, presenting the results of those analyses to the team and leadership, and creating metrics to measure the success of the product. Candidates must be able to independently design, code, and test major features, as well as work jointly with other team members to deliver complex changes.
Responsibilities include:
Design and build scalable data processing infrastructure.
Build modular components for large-volume data movement and management.
Work with architects to implement the data pipeline.
Key Qualifications
- Strong working knowledge of data mining algorithms, including classifiers, clustering algorithms, and anomaly detection techniques
- Practical understanding of linear algebra & statistics, and a passion for building machine learning models & systems
- 5+ years of experience in Java programming (design & architecture, algorithms)
- 2+ years of experience in building high throughput data pipelines using Big Data technologies.
- Exposure to large-scale (petabyte) data processing
- Extensive experience with MapReduce and HDFS
- Experience with Scala, Kafka, Flume, and Oozie is a plus
- Strong problem-solving and analytical ability
- Ability and comfort working independently and making key decisions on projects
- Leadership, critical thinking, and excellent interpersonal, written, and verbal communication skills
- Hands-on experience with NLP, mining of structured, semi-structured, and unstructured data
- Intuitive understanding of machine learning algorithms, supervised and unsupervised modeling techniques
- Experience working with large, real-world data: big, messy, incomplete, full of errors
- Experience with Apache Hadoop, Spark, Solr/Lucene, Cassandra and related technologies
- Working knowledge of SQL, Hive, Pig, and other query languages
- Experience with machine learning tools and libraries such as Scikit-learn, R, Spark, TensorFlow
- Intuition about algorithm and system performance and throughput
- Experience ranking entities or attributes is a plus
- Experience with multimodal learning applications is a plus
- Architecture and system/pipeline layout experience is a plus
- Deep learning, computer vision, topic modeling, and graph algorithms are pluses
- Attention to detail, data accuracy, and quality of output
- Ability to effectively function in a fast-paced environment with shifting priorities and simultaneous projects
- Ability to take full ownership of a project, including its evaluation, deployment, and operation in production, is a must