Making Kubernetes ready for Spark workloads

by Andrea Tortorella, Nunzio Iaccarino

Cloud English

At Xriba we need to analyze and transform large volumes of accounting data to improve our machine learning models and to offer our customers a fast and reliable overview of their company KPIs.

Spark is one of the best technologies for high-volume data processing, optimizing both time and resource utilization, but managing a Spark cluster is not an easy task.

We will describe our journey from Dataproc, Google's Spark offering, to a self-managed deployment on Kubernetes, which helped us keep costs under control and bring our deployment strategies in line with the rest of our company's software.

Andrea Tortorella
Tech Lead, Xriba Ltd.

I'm a polyglot programmer with 10 years of experience. I've worked in different fields, ranging from constrained IoT devices and mobile development to data pipelines and distributed systems. I'm currently focusing on solving problems in microservice architectures and cloud infrastructure.

Nunzio Iaccarino
Python developer, Xriba

I have several years of experience in software development and am always curious about new technologies and new strategies for writing maintainable and faster code. I have worked with several programming languages across different areas of development, from mobile apps to back-end and front-end development. I have recently taken on a new challenge by joining a machine learning development team and discovering the world of data science.