Founded in 2018 by Mathieu Gallet and Arthur Perticoz, Majelan launched its podcast application in June 2019. The application provides free access to more than 18 million podcasts from 50 countries. The startup also produces its own audio content, available by subscription.

Thomas Fillon, Data Director at Majelan:
"We analyze data to better understand our users, customize their experience with recommendations, and improve the product and our own content. Our data architecture is structured around Apache Kafka and runs in the cloud (AWS and GCP). At first, we used Elasticsearch as well as our own servers to perform the analysis. As the number of users grew and the platform was used more heavily, we started hitting the limits of these services in terms of data storage and analysis. We turned to AWS for a solution that matched our ambitions."

Issues:

  • Analyze large volumes of data on all collected statistics
  • A reliable and inexpensive data storage solution (datalake)
  • Keeping the Apache Kafka solution in-house

Majelan collects granular data on how content is listened to, with the aim of making recommendations, improving content and gaining a better understanding of where a user stops listening to a program.

"We had set up our own data pipeline, built around Apache Kafka and designed to comply with GDPR. The data was stored in Amazon Elasticsearch Service," says Thomas Fillon. "Soon, due to our growth, we reached the limits of what Elasticsearch could provide." Since the app's launch, more than 200 million data events have been collected.

Majelan's need was twofold: to set up a datalake, i.e. to store the data in flat files in order to have a reliable storage solution, and to analyze that data. For the analysis, Majelan needed a high-performance solution capable of processing a large volume of data. After sharing its technical needs with both AWS and Google Cloud Platform (GCP), the startup chose AWS, "for the proposed solution and transparency on how it would be implemented".
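As an illustration of what "flat files" in a datalake can look like, here is a minimal sketch of one common layout, not Majelan's actual one. It assumes events arrive as Python dicts carrying a hypothetical UNIX-timestamp field `ts`; events are grouped by UTC day into Hive-style prefixes (`dt=YYYY-MM-DD/`) and each group is encoded as gzip-compressed JSON Lines. The `events/` root, the field names and the file naming are all illustrative assumptions.

```python
import gzip
import json
from collections import defaultdict
from datetime import datetime, timezone


def partition_prefix(event: dict, root: str = "events") -> str:
    """Return a Hive-style prefix such as 'events/dt=2020-01-15/' for one event.

    Assumes event["ts"] is a UNIX timestamp in seconds (hypothetical field name).
    """
    day = datetime.fromtimestamp(event["ts"], tz=timezone.utc).strftime("%Y-%m-%d")
    return f"{root}/dt={day}/"


def to_flat_files(events: list, batch_id: str, root: str = "events") -> dict:
    """Group events by UTC day and encode each group as gzip-compressed JSON Lines.

    Returns a mapping of object key -> compressed bytes, ready to be uploaded.
    """
    lines_by_prefix = defaultdict(list)
    for event in events:
        # One compact JSON document per line (the JSON Lines convention).
        lines_by_prefix[partition_prefix(event, root)].append(
            json.dumps(event, separators=(",", ":"))
        )
    return {
        f"{prefix}part-{batch_id}.json.gz": gzip.compress(
            ("\n".join(lines) + "\n").encode("utf-8")
        )
        for prefix, lines in lines_by_prefix.items()
    }
```

Each key/body pair could then be uploaded with boto3, for example `s3.put_object(Bucket=bucket, Key=key, Body=body)`, where `bucket` is a placeholder. Putting the date in the object key is what later allows a query engine to skip whole days of data.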

The solution is Amazon Athena, which runs SQL queries on large volumes of data. "It is accessible because queries are written in SQL rather than in a more 'complex' language such as Python. It’s an efficiency gain. For example, since our Growth Hacking manager has mastered SQL, we gave him access to Athena, which allows him to write queries directly without going through the Data team," says Thomas Fillon. "Athena is also more flexible at adapting to our Kafka-based architecture. AWS offered us a way to connect our Apache Kafka solution to Athena without pushing their own streaming service (Amazon Kinesis) or forcing us to go solely through AWS services."
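For readers who want to see what querying Athena programmatically involves, here is a minimal sketch built on the boto3 Athena client calls `start_query_execution`, `get_query_execution` and `get_query_results`. The function takes the client as a parameter (normally `boto3.client("athena")`) so it can be exercised without AWS credentials. The database name and S3 output location shown in the usage note are hypothetical.

```python
import time


def run_athena_query(client, sql: str, database: str, output_location: str,
                     poll_seconds: float = 1.0) -> list:
    """Submit a query, wait until it finishes, and return rows as lists of strings.

    `client` is assumed to be a boto3 Athena client, e.g. boto3.client("athena").
    """
    execution_id = client.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_location},
    )["QueryExecutionId"]

    # Poll until Athena reports a terminal state.
    while True:
        status = client.get_query_execution(QueryExecutionId=execution_id)
        state = status["QueryExecution"]["Status"]["State"]
        if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
            break
        time.sleep(poll_seconds)
    if state != "SUCCEEDED":
        raise RuntimeError(f"Athena query {execution_id} ended in state {state}")

    # Only the first page of results; larger result sets need NextToken pagination.
    result = client.get_query_results(QueryExecutionId=execution_id)
    # Cells without a VarCharValue (e.g. NULLs) are returned as empty strings.
    return [[cell.get("VarCharValue", "") for cell in row["Data"]]
            for row in result["ResultSet"]["Rows"]]
```

Usage might look like `run_athena_query(boto3.client("athena"), "SELECT count(*) FROM listening_events", "analytics", "s3://example-bucket/athena-results/")`. For a `SELECT`, the first returned row holds the column names, and `get_query_results` returns at most 1,000 rows per call.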

Having statistics on the platform is a key tool for Majelan (the number of users connected at a given moment, the most used content, trends, popularity, etc.). "Elasticsearch allowed us to index the contents of our database. It is very helpful for getting live data on the platform’s usage and retrieving short-term statistics. To begin with, we could easily aggregate data over a week or a month. But after 3 months, the operation could take 10-15 minutes or longer, as our server's RAM became the limiting factor. With Athena, queries take just a few seconds and are run directly in the cloud."
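The kind of aggregation that became slow on Elasticsearch, such as counting listens per content over weeks or months, fits in a single SQL statement on Athena. The sketch below builds one such query in Python. The table name, the `dt`, `ts` and `content_id` columns, and the choice of a top-N-per-week ranking are illustrative assumptions, not Majelan's schema; `date_trunc`, `from_unixtime` and `row_number()` are standard functions in Athena's Presto/Trino-based SQL dialect.

```python
from datetime import date


def weekly_top_content_sql(table: str, start: date, end: date, top_n: int = 10) -> str:
    """Build an Athena SQL query returning the top `top_n` contents per week.

    Assumes (hypothetically) a table partitioned by a string column `dt`
    ('YYYY-MM-DD') with columns `ts` (UNIX seconds) and `content_id`.
    Only pass trusted table names: they are interpolated into the SQL text.
    """
    if start > end:
        raise ValueError("start must not be after end")
    if not isinstance(top_n, int) or top_n < 1:
        raise ValueError("top_n must be a positive integer")
    return f"""WITH weekly AS (
    SELECT date_trunc('week', from_unixtime(ts)) AS week,
           content_id,
           count(*) AS listens
    FROM {table}
    -- Filtering on the partition column lets Athena skip other days' files.
    WHERE dt BETWEEN '{start.isoformat()}' AND '{end.isoformat()}'
    GROUP BY 1, 2
),
ranked AS (
    SELECT week, content_id, listens,
           row_number() OVER (PARTITION BY week ORDER BY listens DESC) AS rnk
    FROM weekly
)
SELECT week, content_id, listens
FROM ranked
WHERE rnk <= {top_n}
ORDER BY week, rnk"""
```

Because the date range is expressed on the partition column, a three-month query only reads three months of files; the aggregation itself runs on Athena's side rather than in the memory of a single server.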

Moreover, in a classic SQL database, "if you run a query on 200 million events, you will generally put the system under strain and won’t get the expected performance. Athena makes speed possible while keeping the simplicity of the SQL language."

The startup also has better control over its costs: "If we don't run calculations or data processing, we pay nothing for Athena and only the relatively low cost of storage," notes Thomas Fillon.

In addition to Athena, Majelan uses two basic services:

Firstly, Amazon S3, where 30 GB of compressed files are stored. "If our workloads grow large one day, we’ll be able to retrieve data fairly easily and cheaply compared to other storage solutions," says Thomas Fillon.

Secondly, AWS Glue, which links Amazon S3 and Amazon Athena "using indexing robots that crawl through S3 files and provide Athena with a structured view, so that the service does not have to read 200 million events with each request." "This is important," says Thomas Fillon, "because Athena's cost is linked to the amount of data read to perform the query. Costs can be limited by writing intelligent queries that don’t read all the data, for example by partitioning files by date, user or type of use."
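The link between partitioning and cost can be made concrete with a back-of-the-envelope estimate. Because Athena bills per query on the data it scans, a query restricted to a few date partitions costs a fraction of a full scan. The sketch below assumes a price of 5 USD per TB scanned and a 10 MB per-query minimum, with scanned bytes rounded up to the next MB (figures to verify against current AWS pricing for your region), uses binary units for simplicity, and works on made-up partition sizes. It is arithmetic only, not a reproduction of Majelan's bill.

```python
import math
from typing import Dict, Optional, Set

MB = 1024 ** 2  # binary units, a simplifying assumption
TB = 1024 ** 4


def estimate_query_cost(partition_bytes: Dict[str, int],
                        selected: Optional[Set[str]] = None,
                        usd_per_tb: float = 5.0,
                        min_billed_mb: int = 10) -> float:
    """Estimate the Athena charge for one query from the bytes it scans.

    partition_bytes: stored (compressed) size of each partition, keyed by name.
    selected: partitions kept by the query's WHERE clause; None means a full scan.
    Billing model (assumption, check current AWS pricing): bytes scanned are
    rounded up to the next MB with a per-query minimum, then priced per TB.
    """
    scanned = sum(size for name, size in partition_bytes.items()
                  if selected is None or name in selected)
    billed_mb = max(min_billed_mb, math.ceil(scanned / MB))
    return billed_mb * MB / TB * usd_per_tb
```

For example, with 31 hypothetical daily partitions of 100 MB each, a query that keeps a single day is billed for 100 MB, while a full-month scan is billed for 3,100 MB, i.e. 31 times more.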

Thomas Fillon appreciates AWS's support in implementing this solution. "A Solutions Architect came to our premises to help us link Kafka and the datalake (S3), to set up the automatic jobs managed by Glue, etc. For two days, we programmed, tested and implemented the solution in our pre-production environment. Within a week, we released it to our production environment."
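On the ingestion side, one common pattern for moving events from Kafka to S3 is to buffer records and write them out in batches, so that S3 receives a modest number of larger files instead of one object per event. The sketch below shows only that batching policy (flush on record count or on age). The consumer loop, the `flush_fn` that would write to S3 and the thresholds are hypothetical, and the article does not say which connector was set up for Majelan.

```python
import time
from typing import Callable, List, Optional


class BatchingSink:
    """Buffer records from a stream consumer and flush them in batches.

    Flushes when `max_records` is reached or when the oldest buffered record is
    at least `max_age_s` seconds old. `flush_fn` receives the list of records;
    in a real pipeline it might write one compressed object to S3 (hypothetical).
    """

    def __init__(self, flush_fn: Callable[[List], None], max_records: int = 10_000,
                 max_age_s: float = 300.0, clock: Callable[[], float] = time.monotonic):
        self.flush_fn = flush_fn
        self.max_records = max_records
        self.max_age_s = max_age_s
        self.clock = clock
        self.buffer: List = []
        self.first_at: Optional[float] = None  # time the oldest buffered record arrived

    def add(self, record) -> None:
        if not self.buffer:
            self.first_at = self.clock()
        self.buffer.append(record)
        if len(self.buffer) >= self.max_records:
            self.flush()
        else:
            self.flush_if_stale()

    def flush_if_stale(self) -> None:
        """Call periodically (e.g. after an empty poll) so quiet topics still flush."""
        if self.first_at is not None and self.clock() - self.first_at >= self.max_age_s:
            self.flush()

    def flush(self) -> None:
        if self.buffer:
            self.flush_fn(self.buffer)
            self.buffer = []
            self.first_at = None
```

In a real consumer, Kafka offsets would typically be committed only after `flush_fn` succeeds, so that a crash replays unflushed records instead of losing them (at-least-once delivery).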

This was an advantage because "we prefer our developers to spend time improving the app or setting up services to improve the user experience. Having someone who knows how to configure these services has saved us valuable time," says the Data Director.

"We don't see AWS as a service provider, but rather as a partner with whom we share many of our challenges. I think that in order to move forward, it is important for a startup to have a relationship of trust with stakeholders such as AWS. We used their feedback to implement our 2020 strategy, which we really wanted to define based on reliable data obtained through Athena. We learned a lot from them. This type of collaboration is really important."

Based on this experience, Majelan has since migrated a large part of its architecture to AWS to make it secure and easy to maintain.

Benefits:

  • More time for analysis thanks to reduced query time, down from 10-15 minutes to a few seconds
  • Inexpensive and secure data storage
  • One week to put the Athena solution into production with the support of a Solutions Architect