Software Engineering Intern
Join our team and surround yourself with highly motivated and skilled coworkers to build cutting-edge solutions for prestigious clients around the globe.
Topic 1: Design of a Schema-Resilient Data Ingestion Architecture for MASS Analytics Using Apache NiFi and Apache Iceberg
Description:
MASS Analytics products ingest data from external client systems such as Snowflake and process it through analytics models and Always-On Analytics (AOA) workflows.
Frequent schema changes in source systems can break ingestion, modeling, and automation pipelines, causing downtime and manual intervention.
This project aims to design and implement a robust data ingestion and storage architecture using Apache NiFi and Apache Iceberg to detect, control, and manage schema evolution.
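To make the goal concrete, here is a minimal sketch of the kind of schema-drift check such an ingestion layer would perform before data lands in Iceberg. All names and types are illustrative, not part of any MASS Analytics codebase; the classification mirrors the fact that Iceberg can apply additive column changes safely, while drops and type changes typically need review.

```python
def detect_drift(expected: dict, observed: dict) -> dict:
    """Classify schema changes between the expected table schema and the
    schema observed on an incoming batch (column name -> type name)."""
    added = {c: t for c, t in observed.items() if c not in expected}
    removed = {c: t for c, t in expected.items() if c not in observed}
    retyped = {
        c: (expected[c], observed[c])
        for c in expected.keys() & observed.keys()
        if expected[c] != observed[c]
    }
    return {"added": added, "removed": removed, "retyped": retyped}


def is_safe(drift: dict) -> bool:
    # Additive changes can be applied automatically (e.g. via an
    # ALTER TABLE ... ADD COLUMN on the Iceberg table); column drops and
    # type changes are flagged for manual review instead.
    return not drift["removed"] and not drift["retyped"]


expected = {"order_id": "long", "amount": "double"}
observed = {"order_id": "long", "amount": "double", "channel": "string"}
drift = detect_drift(expected, observed)
```

In a full pipeline, a check like this would sit in a NiFi processor between the Snowflake source and the Iceberg sink, routing unsafe batches to a quarantine path instead of breaking downstream models.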
Key attributes / Main competencies:
· Java and Python programming
· Relational databases and SQL
· Data modeling and schema management
· Data pipeline design and integration
· Problem-solving and analytical skills
· Software engineering principles
Learning Outcomes:
· Understand challenges of schema evolution in large-scale analytics platforms
· Design resilient data pipelines decoupled from source system changes
· Implement controlled schema evolution using modern Lakehouse technologies
· Evaluate pipeline stability and performance under schema variability
Topic 2: Design of an Intelligent Orchestration Framework for MASS Analytics' Always-On Analytics (AOA) Workflows Using MCP and Large Language Models
Description:
The project exposes AOA components as MCP tools and uses an LLM to dynamically plan, execute, and monitor end-to-end workflows. The solution handles failures, conditional steps, and component dependencies through policy-driven decision making. Built-in guardrails ensure secure, explainable, and auditable execution suitable for enterprise environments. The outcome is a more resilient, adaptive, and maintainable AOA pipeline orchestration framework.
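The guardrail idea can be sketched in a few lines: before an LLM-planned step runs, the framework checks it against a policy and writes an audit record either way. The tool names, policy, and dispatch stub below are hypothetical, assumed only for illustration.

```python
import time

ALLOWED_TOOLS = {"run_model", "refresh_data"}  # illustrative policy allowlist
AUDIT_LOG = []


def execute_tool(name: str, args: dict) -> dict:
    """Check the policy, record an audit entry, then run or refuse the tool."""
    allowed = name in ALLOWED_TOOLS
    AUDIT_LOG.append(
        {"tool": name, "args": args, "allowed": allowed, "ts": time.time()}
    )
    if not allowed:
        # The denial (with reason) goes back to the LLM planner, which can
        # replan instead of silently failing.
        return {"status": "denied", "reason": f"tool {name!r} not in policy"}
    # A real framework would dispatch this call to the MCP tool; stubbed here.
    return {"status": "ok", "tool": name}


ok = execute_tool("run_model", {"model": "mmm_v2"})
denied = execute_tool("drop_table", {"table": "sales"})
```

Because every decision, allowed or denied, lands in the audit log, execution stays explainable and reviewable, which is the property the description asks of enterprise deployments.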
Key attributes / Main competencies:
· Large Language Models and AI-assisted systems
· Distributed systems and workflow orchestration
· API-based system integration and MCP concepts
· Software architecture and modular design
Learning Outcomes:
· Understanding of LLM-based orchestration and decision-making systems
· Ability to design and integrate distributed workflow components
· Application of policy-driven control and guardrails in AI systems
· Analysis and handling of failures in automated pipelines
· Evaluation of system resilience, explainability, and maintainability
Topic 3: Design and Implementation of an Always-On Analytics (AOA) Application for the Databricks Marketplace that Continuously Runs Cost-Efficient Analytics Pipelines
Description:
The project focuses on incremental data processing to refresh models automatically while minimizing compute usage.
It includes monitoring mechanisms to track data quality, model stability, and performance over time.
The application generates actionable insights and prioritized recommendations ready to drive the “next dollar” of value.
The solution is built as a scalable, reusable, and marketplace-ready Databricks app.
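The core of incremental processing is a watermark: each refresh handles only rows newer than the last processed timestamp, then advances the watermark. A minimal pure-Python sketch of that pattern follows; the row shape and field names are illustrative assumptions, and a real implementation would use Spark/Delta change feeds on Databricks rather than in-memory lists.

```python
def incremental_refresh(rows: list, last_watermark: int) -> tuple:
    """Process only rows newer than the stored watermark, then advance it.

    Returns the new batch and the updated watermark; if no new rows
    arrived, the watermark is left unchanged.
    """
    new_rows = [r for r in rows if r["updated_at"] > last_watermark]
    new_watermark = max(
        (r["updated_at"] for r in new_rows), default=last_watermark
    )
    return new_rows, new_watermark


rows = [
    {"id": 1, "updated_at": 100},
    {"id": 2, "updated_at": 205},
    {"id": 3, "updated_at": 310},
]
batch, wm = incremental_refresh(rows, last_watermark=200)
```

Skipping already-processed rows is what keeps compute usage proportional to new data rather than total data, which is the cost-efficiency goal the topic describes.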
Key attributes / Main competencies:
· Incremental and batch data processing
· Machine learning model lifecycle management
· Data quality monitoring and validation
· Performance analysis and system monitoring
· Distributed computing with Databricks and Spark
· Scalable application design
Learning Outcomes:
· Understanding incremental data processing strategies to optimize compute usage
· Ability to automate model refresh and evaluation pipelines
· Application of data quality and model stability monitoring techniques
· Design of scalable and reusable analytics applications
· Generation of data-driven insights and business recommendations
About MASS Analytics
We specialize in Marketing Mix Modeling (MMM) and Media Effectiveness Measurement. We offer our clients a comprehensive MMM software suite backed by a wide range of managed-services solutions to help identify sales drivers, measure MROI, and optimize marketing budgets.