Data Engineer, Web & Social Scraping

Department: Engineering

Location: Remote, USA

About The Role

We’re looking for a passionate Data Engineer / Scraping Expert to develop sophisticated social media and web data mining robots.

You will be responsible for writing crawlers and scraping large volumes of unstructured data – video, images, audio, text, and more – from public databases, websites, and social platforms using Python.

Applicants for this role will own our team’s data pipeline – deciding which data to scrape, parse and store – and help drive the success of our platform overall. As a key part of a fast-growing startup, you will have an active leadership role and work hand-in-hand with the team in the creation, documentation and implementation of scraping methodologies and tools. To that end, applicants should enjoy taking ownership while closely collaborating with others, enjoy iterating and pivoting quickly, and excel at building tools that are easy to understand and extensible. 

The ideal candidate is adept in Python, Selenium, JavaScript, Headless Chrome, Linux, and AWS; is comfortable working in an agile, test-driven environment; and is experienced in continuous-delivery processes. Knowledge of digital advertising operations is nice to have.

Team & Technology

OpenBrand detects, labels, and indexes marketing data (logos, products, pricing, etc.) in natural video. We bring established Big Data practices and scale to new-age media like TikTok, Twitch, and Hulu, empowering agencies to automate menial attribution work and generate meaningful insight.

We adapt Scrum and Lean approaches depending on the project. Above all, we’re looking for a self-driven Engineer who enjoys getting their hands dirty and picking up new techniques, models, and APIs on the fly. Applicants must also be excellent problem solvers who know how to detect and evade anti-scraping / bot technologies.

If you’re the type of person who comes to work every day expecting to learn, contribute, teach, take ownership and have fun, and if you’re excited to work with massive amounts of data and the resulting challenges (storage, processing, etc.), then we think you’ll fit right in.

Responsibilities

  • Connect to public databases to ingest data, and execute one-off data imports.

  • Create new data ingestion and processing tooling to eliminate manual processes and inefficient or repetitive work, and to address quality issues.

  • Make thoughtful judgments on data quality to clean data sources for import.

  • Use third-party APIs and web scraping tools to source data at scale.

  • Work with the team to scale and embed techniques and help with data ingestion projects.

  • Demonstrate common sense in applying business logic to ontological/schema decisions.

  • Extract data from a variety of relational databases, and manipulate and explore it using quantitative, statistical and visualization tools.

  • Develop and implement standards for clean code that maintain modularity, clarity, and portability.

  • Actively contribute ideas for product improvements and solutions to technology challenges, including features and performance considerations.

  • Contribute to the software release process.

  • Demonstrate passion for continued learning by staying abreast of new technology and trends.

Qualifications

  • 3+ years of professional web/software development experience.

  • 3+ years of experience in Python.

  • 2+ years of database experience.

  • 2+ years of JavaScript experience.

  • 2+ years of Selenium experience.

  • Knowledge of the interaction between browser and web server, including DNS, HTTP, SSL, and GET vs. POST.

  • ETL and Pentaho experience.

  • SQL experience.

  • pandas, SciPy, and NumPy experience.

  • Command of software engineering principles, frameworks and technologies.

  • Experience prioritizing and performing multiple tasks in time-critical situations.

  • Comfort working within a fast-paced, dynamic and distributed environment.

  • Attention to detail, strong organizational skills and excellent follow-through.

  • Adept problem-solving ability, judgment and resourcefulness.

  • Strong written and verbal communication skills.

  • Ability to communicate cross-functionally.

  • Intellectual curiosity, self-motivation, independence, and team-building skills.

Submit your resume for this position