
This post is about our last Hadoop migration, but from a different angle. Instead of describing only its technical aspects (don’t worry, I will go over those as well), I will focus on the human perspective: how and why we decided to move from (attention, spoiler alert!!!) a commercial to a community solution.

Background

But first, some context: Outbrain is the world’s leading content discovery platform. We serve over 400 billion content recommendations every month to over 1 billion users across the world. To support such a large scale, we have a backend system…


Background

2020 was a unique year, in which most industries around the world were forced to work remotely for a long period due to the COVID-19 pandemic. Besides the health and employment concerns, staying at home for so long is not natural for humans, as we are communicative beings by nature. This caused many challenges, one of the biggest being how to maintain the social fabric. …


When we were considering migrating our data delivery pipeline from batches of hourly files to a real-time streaming system, the reasons we had in mind were the obvious ones: to reduce the latency of our processing system and to make the data available in real time. But as soon as we started to work on it, we realized that there were quite a few additional good reasons to embark on this complex project.

Overview

Our data delivery pipeline is designed to deliver data from various services into our Hadoop ecosystem. There it is processed using Spark and Hive in order…


One of the biggest challenges facing infrastructure groups is how to build TRUST with their customers. If achieved, this trust will lead to a better working environment where colleagues have good relationships and trust each other.

And like any relationship, one of the ways to gain someone’s trust is by being transparent. Transparency shows that you have nothing to hide, and it allows the free flow of information between sides.

In our engineering context, we are talking about transparency of the microservices owned by development teams and of the resources that the infrastructure groups provide them.

Providing transparency for the resources’…

Avi Avraham

Director of Data Engineering @ Outbrain
