Presented by:

Aimilios Tsouvelekakis

from Frontiers Media SA

Aimilios Tsouvelekakis has several years of experience in DevOps, cloud and software engineering. He currently works at Frontiers Media as a software engineer. He is passionate about learning new things, experimenting with new technologies and contributing to the open source community.

  • Passionate about open source; has contributed to and maintained several projects over the years in different languages.
  • Has spent years building solutions and applications, mainly in data engineering and the web.
  • In recent years, has focused more on developing libraries and frameworks.
  • Active in open source projects.
  • Previously a Senior Software Engineer at @Frontiers; now a Database Ecosystem Engineer at @CrateDB.

Data is one of any organization's most valuable assets. A modern solution for managing it is the datalake: a centralized repository designed to store, process, and secure large amounts of structured, semi-structured, and unstructured data.

As the datalake grows, complexity builds up and the data engineer faces many challenges:

  • Diverse data sources spanning SQL databases, events, S3/Blob files, SFTP, and web crawling.
  • Different environments or versions for the datalake.
  • Data models and transformations.
  • Data type validations and correctness.
  • Visibility and monitoring.
  • Orchestration.

Many teams end up with large, complex notebooks, scripts and pipelines, and little to no centralized definition of data models, storages and environments; this eventually slows down development.

We faced this issue ourselves: more than a hundred models, terabytes of data, different storage types, several development environments, and complex QA and validation.

We decided that this problem called for a more software-engineering-oriented approach: building our own custom solution.

We built a bespoke framework to create models and data pipelines and to handle the complexity of storages and instances, along with many of the challenges listed above.
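The talk does not show the framework's API, so the following is only a minimal, hypothetical sketch of the general idea of centralized definitions: each data model is declared once with its expected column types, its storage location is resolved per environment from a single mapping, and rows can be checked against the declared types. The names `Model`, `STORAGE_ROOTS`, `location`, `validate` and the `s3://example-datalake-*` paths are illustrative assumptions, not the authors' library. It uses plain Python rather than Polars so it stays self-contained.

```python
from dataclasses import dataclass

# Hypothetical: one central mapping from environment name to storage root.
STORAGE_ROOTS = {
    "dev": "s3://example-datalake-dev",
    "prod": "s3://example-datalake-prod",
}


@dataclass(frozen=True)
class Model:
    """A centrally defined (hypothetical) data model: name, column types, partition."""

    name: str
    schema: dict[str, type]
    partition: str = ""

    def location(self, env: str) -> str:
        # Resolve the storage path for an environment from the central mapping,
        # so pipelines never hard-code per-environment paths.
        root = STORAGE_ROOTS[env]
        suffix = f"/{self.partition}" if self.partition else ""
        return f"{root}/{self.name}{suffix}"

    def validate(self, rows: list[dict]) -> list[str]:
        # Collect every type or missing-column error instead of stopping at the first.
        errors = []
        for i, row in enumerate(rows):
            missing = self.schema.keys() - row.keys()
            if missing:
                errors.append(f"row {i}: missing {sorted(missing)}")
            for col, expected in self.schema.items():
                if col in row and not isinstance(row[col], expected):
                    errors.append(
                        f"row {i}: {col} expected {expected.__name__}, "
                        f"got {type(row[col]).__name__}"
                    )
        return errors


articles = Model("articles", {"id": int, "title": str}, partition="year=2024")
print(articles.location("dev"))
print(articles.validate([{"id": 1, "title": "a"}, {"id": "2", "title": "b"}, {"id": 3}]))
```

Keeping the environment mapping and the schemas in one place is the point of the design: switching a pipeline from `dev` to `prod` changes only the argument passed to `location`, and type checks come from the same definition every pipeline uses.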

The solution became one of the cornerstones of the data team.

We rewrote some pieces from scratch as an open source Python library built on Polars, the popular, modern dataframe library. Join us as we dig into data problems, datalake challenges and some of the code we wrote to solve them.

Date:
2024 June 21 - 17:00
Duration:
45 min
Room:
Sala Canillas
Conference:
OpenSouthCode 2024
Language:
English
Track:
Difficulty:
Medium

Happening at the same time:

  1. Crazy Labs: I can hack your house… and you know it!! (BoquerónSEC Meetup)
     Start Time: 2024 June 21 16:00
     Room: Sala Riogordo 1

  2. Crazy Labs: I can hack your house… and you know it!! (BoquerónSEC Meetup)
     Start Time: 2024 June 21 16:00
     Room: Sala Riogordo 2

  3. WordPress Málaga - Setting Up and Configuring WordPress with DDEV
     Start Time: 2024 June 21 17:00
     Room: Sala Mollina

  4. Collaborative Learning Techniques
     Start Time: 2024 June 21 17:00
     Room: Sala Fuengirola

  5. Where does your Ansible code come from?
     Start Time: 2024 June 21 17:00
     Room: Sala Riogordo 3

  6. 2D Video Games with Python Arcade
     Start Time: 2024 June 21 17:00
     Room: Sala Benalmádena