Building an LLM pipeline to extract insights from DataCamp’s learning videos
Founded in 2013, DataCamp aims to make data science education accessible to all. They offer an online learning platform covering data science, engineering, and AI, serving over 10 million users and more than 2,500 companies globally. DataCamp requested our assistance in assembling a product team to accelerate specific roadmap goals, including the development of an LLM pipeline to automatically extract insights from the webinars on their learning platform.

Challenge
DataCamp was looking for a temporary product team to architect and build an AI solution to unlock insights from the video data on their learning platform (500+ webinars).
Solution
We set out on a 6-month mission to engineer an LLM pipeline that extracts the narrative from videos, processes it, and automatically generates insightful summaries per video, integrated into the DataCamp platform.
Approach
We developed an AI pipeline that chains several models in sequence: Pyannote to extract speakers and timestamps from the videos, Whisper-1 for transcription, and GPT-4o for summarization.
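To make the hand-off between the stages concrete, here is a minimal sketch of how speaker turns and transcript segments could be joined before summarization. It is an illustration under stated assumptions, not the code built for DataCamp: the names Turn, Segment, label_segments and build_summary_prompt are hypothetical, the inputs stand in for what a diarization model and a transcription model would return, and matching by largest time overlap is just one common way to attribute text to a speaker.

```python
from dataclasses import dataclass


@dataclass
class Turn:
    """A diarization turn: who spoke between start and end (seconds)."""
    speaker: str
    start: float
    end: float


@dataclass
class Segment:
    """A transcribed chunk of speech with its timestamps (seconds)."""
    text: str
    start: float
    end: float


def overlap(a_start, a_end, b_start, b_end):
    """Length in seconds of the intersection of two time intervals."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def label_segments(segments, turns):
    """Attach to each segment the speaker whose turn overlaps it the most.

    Segments that overlap no turn at all are labeled "UNKNOWN".
    Returns a list of (speaker, segment) pairs in the original order.
    """
    labeled = []
    for seg in segments:
        best_speaker, best_overlap = "UNKNOWN", 0.0
        for turn in turns:
            shared = overlap(seg.start, seg.end, turn.start, turn.end)
            if shared > best_overlap:
                best_speaker, best_overlap = turn.speaker, shared
        labeled.append((best_speaker, seg))
    return labeled


def build_summary_prompt(labeled):
    """Render the labeled transcript as text a summarization model could read."""
    lines = [f"[{seg.start:.1f}s] {speaker}: {seg.text}" for speaker, seg in labeled]
    header = "Summarize the key insights of this webinar transcript:"
    return header + "\n" + "\n".join(lines)


# Hypothetical stand-ins for diarization and transcription output.
turns = [Turn("SPEAKER_00", 0.0, 5.0), Turn("SPEAKER_01", 5.0, 12.0)]
segments = [
    Segment("Welcome to the webinar.", 0.5, 4.0),
    Segment("Thanks for having me.", 4.8, 7.0),
]
print(build_summary_prompt(label_segments(segments, turns)))
```

In a setup like this, the resulting prompt would be what gets sent to the summarization model. Keeping the merge step as plain functions makes it easy to test without calling any model.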

We wish the whole DataCamp team the very best of luck on their rapid growth journey, educating people all around the world on data science and AI. We are proud to have made a significant contribution to the buildout of their platform, and we’ll be happy to keep supporting them in every way possible.

Let's build. Together!
Are you looking for an entrepreneurial product development partner? Don't hesitate to schedule a virtual coffee.

