Challenge

Data from Adobe Campaign, Kickdynamic, Dynmark and other data sources was difficult to access.

In September 2013, Profusion’s data science team responded by designing a new analytical database (ADB) that merged data from multiple siloed data sources into one unified data warehouse.

The database enabled data scientists to query all of these data sources using SQL. Data analysts were able to use the new infrastructure to produce detailed marketing reports in one hour instead of eight. This marked a radical shift for the business, enabling quick insights for our teams of marketing consultants and campaign managers.
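As a rough illustration of what a unified warehouse makes possible, the sketch below loads stand-ins for two separate sources into one database and answers a question with a single SQL query. The table names, columns and rows are hypothetical and are not taken from the actual ADB. SQLite is used only so that the example runs anywhere.

```python
import sqlite3

# Hypothetical report: number of sends per campaign and channel,
# combining two sources that would otherwise live in separate silos.
REPORT_QUERY = """
SELECT c.name, s.channel, COUNT(*) AS sends
FROM (
    SELECT campaign_id, 'email' AS channel FROM email_sends
    UNION ALL
    SELECT campaign_id, 'sms' AS channel FROM sms_sends
) AS s
JOIN campaigns AS c ON c.campaign_id = s.campaign_id
GROUP BY c.name, s.channel
ORDER BY c.name, s.channel
"""


def build_demo_warehouse():
    """Create an in-memory database holding the hypothetical source tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE campaigns (campaign_id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE email_sends (campaign_id INTEGER, recipient TEXT);
        CREATE TABLE sms_sends (campaign_id INTEGER, recipient TEXT);
    """)
    conn.executemany("INSERT INTO campaigns VALUES (?, ?)",
                     [(1, "Spring sale"), (2, "Welcome")])
    conn.executemany("INSERT INTO email_sends VALUES (?, ?)",
                     [(1, "a"), (1, "b"), (2, "a")])
    conn.executemany("INSERT INTO sms_sends VALUES (?, ?)",
                     [(1, "a"), (2, "c"), (2, "d")])
    return conn


def run_report(conn):
    """Return (campaign name, channel, sends) rows from the unified tables."""
    return conn.execute(REPORT_QUERY).fetchall()


if __name__ == "__main__":
    for row in run_report(build_demo_warehouse()):
        print(row)
```

Once every source sits behind the same SQL interface, a cross-channel report like this is one query rather than several exports stitched together by hand.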

Eucalyptus backend

We are moving towards a world where we must deal with increasingly large datasets for the work that we do. We use Hadoop, Spark, Zeppelin and Hue to provide a powerful and unified analytics environment for big data.

Hadoop

Hadoop is a data infrastructure designed for parallelised computation. The advantage of a Hadoop cluster over a server running a traditional SQL-based data warehouse is that we can grow and scale it organically by adding one Hadoop node at a time.

Spark

Hadoop allows us to use parallelised computation. However, writing parallelised algorithms is complex, challenging and very time-consuming. Spark hides much of that complexity and makes it easy for data scientists to harness machine learning libraries optimised for parallelised computation. It provides a unified framework where querying and analysis can happen at the same time.
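To give a feel for the bookkeeping that a framework like Spark takes off a data scientist's hands, here is a minimal sketch of the split, map and reduce steps written by hand in plain Python. It is not Spark's API and it uses threads on one machine rather than a cluster. The event records and function names are hypothetical.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def split_into_partitions(records, n_partitions):
    """Deal the records round-robin into n_partitions lists."""
    return [records[i::n_partitions] for i in range(n_partitions)]


def count_opens(partition):
    """Map step: count 'open' events per campaign within one partition."""
    counts = Counter()
    for campaign, event in partition:
        if event == "open":
            counts[campaign] += 1
    return counts


def parallel_open_counts(records, n_partitions=3):
    """Run the map step on each partition, then merge the partial counts."""
    partitions = split_into_partitions(records, n_partitions)
    with ThreadPoolExecutor(max_workers=n_partitions) as pool:
        partial_counts = list(pool.map(count_opens, partitions))
    # Reduce step: add the per-partition counts together.
    total = Counter()
    for partial in partial_counts:
        total.update(partial)
    return dict(total)


if __name__ == "__main__":
    events = [("A", "open"), ("A", "click"), ("B", "open"),
              ("A", "open"), ("B", "send")]
    print(parallel_open_counts(events))
```

Even this toy version has to decide how to partition the data, where to run each piece and how to merge the results. On a real cluster the same concerns extend to node failures and data movement, which is the complexity the paragraph above refers to.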

Notebook environment: Zeppelin and Hue

Zeppelin and Hue provide a user-friendly notebook environment where Spark and Hive code can be written and shared with colleagues. Here it is much quicker and easier to explore data and develop code than it is at the command line.

Eucalyptus semantic layer

Currently it takes three to four months to train new data scientists to use and query the analytical database. Within Adobe Campaign, the data table structure is very complex. Adobe Campaign has been designed for users to log and store data efficiently, but not to understand and query it easily.

Our data science team has designed new clean and clear data tables known as the semantic layer. This allows newly recruited data analysts and data scientists to get to grips with the data at speed and produce valuable insights soon after they join. They need relatively little training to do so.
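One common way to build such a layer is to put readable SQL views on top of the raw tables. The sketch below assumes that approach purely for illustration. The raw table names, column names and status codes are invented for the example and do not reflect the real Adobe Campaign schema or the team's actual design.

```python
import sqlite3


def build_semantic_layer():
    """Create hypothetical raw tables plus a readable view on top of them."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        -- Hypothetical raw tables with terse, storage-oriented names.
        CREATE TABLE rcp (i_id INTEGER PRIMARY KEY, s_eml TEXT);
        CREATE TABLE dlv_log (i_rcp_id INTEGER, i_dlv_id INTEGER, i_status INTEGER);

        INSERT INTO rcp VALUES (1, 'ann@example.com'), (2, 'bob@example.com');
        INSERT INTO dlv_log VALUES (1, 10, 1), (2, 10, 2), (1, 11, 1);

        -- Semantic layer: plain names and decoded status values.
        CREATE VIEW email_deliveries AS
        SELECT r.s_eml    AS email_address,
               l.i_dlv_id AS delivery_id,
               CASE l.i_status
                   WHEN 1 THEN 'sent'
                   WHEN 2 THEN 'failed'
                   ELSE 'unknown'
               END        AS status
        FROM dlv_log AS l
        JOIN rcp AS r ON r.i_id = l.i_rcp_id;
    """)
    return conn


def delivery_statuses(conn):
    """Query the view without needing to know the raw table layout."""
    return conn.execute(
        "SELECT email_address, status FROM email_deliveries "
        "ORDER BY email_address, delivery_id"
    ).fetchall()


if __name__ == "__main__":
    for row in delivery_statuses(build_semantic_layer()):
        print(row)
```

A new analyst only needs to learn `email_deliveries` and its three plainly named columns. The joins and code lookups are written once, by the people who already understand the raw tables.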

Eucalyptus dashboard

The clean data tables feed a dashboard built for marketers, campaign managers, email developers and third-party stakeholders. Anyone can explore and ask questions of our marketing data without the help of a data analyst or data scientist. Key stakeholders can explore and interpret the data in real time to inform commercially led decisions around time-critical client or customer needs.

Join our network

If you are a charity or business and would like to find out more, please get in touch.

Contact us