2024
This project transformed a monolithic Python workflow into a modular, reproducible data science pipeline. I redesigned the architecture into structured components for data preprocessing, model training, and model selection, with clear separation of responsibilities and improved scalability. I implemented data versioning with DVC to track datasets and artifacts across experiments, keeping workflows consistent and traceable. Using Dagger (Go), I orchestrated automated pipeline execution, managing each stage from preprocessing through model validation and artifact generation in a containerized environment. Integrating the pipeline with GitHub Actions for CI/CD made every run reproducible, documented, and validated by automated tests. The result was reliable experiment tracking, streamlined collaboration, and efficient deployment of production-ready models.
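The staged layout described above could be expressed as a DVC pipeline file. This is a minimal sketch, not the project's actual configuration: the script paths (`src/preprocess.py`, `src/train.py`, `src/select.py`), data directories, and stage names are all hypothetical placeholders.

```yaml
stages:
  preprocess:
    cmd: python src/preprocess.py data/raw data/processed
    deps:
      - src/preprocess.py
      - data/raw
    outs:
      - data/processed
  train:
    cmd: python src/train.py data/processed models/model.pkl
    deps:
      - src/train.py
      - data/processed
    outs:
      - models/model.pkl
  select:
    cmd: python src/select.py models/model.pkl metrics/selection.json
    deps:
      - src/select.py
      - models/model.pkl
    metrics:
      - metrics/selection.json
```

With a file like this, `dvc repro` re-runs only the stages whose dependencies changed, and `dvc push` uploads the tracked datasets and model artifacts to remote storage, so each Git commit pins an exact, reproducible pipeline state.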