Data Warehouse Migration to Cloud-Native Architecture on AWS

Pixel Federation, a leading Slovak mobile gaming company, successfully migrated their on-premise data warehouse infrastructure to a modern, cloud-native architecture on AWS, achieving significant gains in both performance and operational efficiency. The project transformed their legacy Hadoop and Spark setup into a scalable, cost-optimized solution built on Amazon EKS, Apache Spark, Apache Airflow, and Amazon S3. The migration delivered a 60% performance improvement while substantially reducing operational overhead and maintenance burden.

Customer Background & Challenge

Pixel Federation is a prominent Slovak mobile gaming company developing and operating multiple successful titles with millions of active users worldwide. Like all modern gaming companies, Pixel Federation relies heavily on data analytics to understand player behavior, optimize game mechanics, improve user retention, and drive business decisions.

The Legacy Infrastructure

Prior to the migration, Pixel Federation operated an on-premise data warehouse built on Hadoop and Apache Spark. While this infrastructure had initially served the company well, it had evolved into a significant operational burden with several critical limitations:

  • Limited Automation: The original setup required substantial manual intervention for routine operations, updates, and maintenance. In-place updates were virtually impossible. Hardware replacement was the only viable upgrade path, requiring procurement of new servers and construction of a parallel cluster before any migration could occur.
  • Scaling Constraints: As game popularity grew, the on-premise infrastructure struggled to scale. Adding capacity required physical hardware procurement, installation and configuration: a process measured in weeks or months.
  • High Maintenance Overhead: Considerable engineering time was consumed by physical infrastructure maintenance, software updates, hardware failure triage, and system availability management, diverting resources from strategic work.
  • Cost Inefficiency: The on-premise model demanded significant upfront capital investment, ongoing maintenance, and data-centre costs. Resources frequently sat idle during low-demand periods, representing poor capital utilization.
  • Licensing Complexity: Managing software licences across the Hadoop ecosystem added a further layer of cost and operational complexity.

Strategic Objectives

Pixel Federation recognized that modernizing their data infrastructure was essential to sustain growth and competitive advantage. The primary migration objectives were:

  • Reduce operational complexity and maintenance burden
  • Improve scalability to handle growing data volumes and analytical workloads
  • Optimize costs through more efficient resource utilization
  • Enhance performance for faster analytical insights
  • Ensure long-term maintainability with modern, well-supported technologies
  • Leverage existing cloud infrastructure to create a unified operational environment

Solution Architecture

The decision to migrate to AWS was straightforward: Pixel Federation had already been running all their game backends on AWS for several years. This existing relationship provided familiarity with AWS services, established networking infrastructure, and operational expertise directly applicable to the data warehouse migration.

Core Technology Stack

  • Amazon EKS: Container orchestration platform. AWS-managed control plane reduces ops overhead while maintaining Kubernetes portability and ecosystem benefits.
  • Apache Spark: Primary data processing engine. Retained for continuity with existing pipelines; team expertise minimized retraining whilst gaining cloud-native deployment patterns.
  • Apache Airflow: Workflow orchestration and scheduling. Robust, programmable platform for managing complex data pipeline dependencies and execution schedules.
  • Amazon S3 Data Lake: Foundation for data storage. Virtually unlimited scalability, 99.999999999% durability, and intelligent tiering for cost-optimized access patterns.
  • LARA Platform: Labyrinth Labs' proprietary Terraform/EKS/GitOps framework. Accelerated deployment from weeks to days, ensuring consistency, repeatability, and IaC best practices across all environments.
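
The orchestration layer's core job is resolving task dependencies before anything runs. As a rough stdlib-only sketch of the DAG idea Airflow is built around (the task names are hypothetical, not Pixel Federation's actual pipelines):

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline: raw game events land in S3, are transformed by
# Spark, then loaded into analytics tables. Airflow resolves the same
# kind of dependency graph before scheduling each task.
dag = {
    "extract_events": set(),                 # no upstream dependencies
    "spark_transform": {"extract_events"},   # runs after extraction
    "load_warehouse": {"spark_transform"},   # runs after transformation
    "refresh_dashboards": {"load_warehouse"},
}

def run_order(graph):
    """Return one valid execution order for the task graph."""
    return list(TopologicalSorter(graph).static_order())
```

In a real Airflow DAG the same dependencies would be declared with operators and `>>` chaining; the scheduling principle is identical.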

Cost Optimization Strategy

Cost optimization was a primary consideration from the outset. The team implemented several strategies to maximize cost efficiency:

  • Spot Instances: The majority of Spark analytical compute workloads run on AWS EC2 Spot Instances, offering up to 60% savings vs on-demand. Since analytical jobs tolerate interruptions, Spot provides an ideal cost-performance balance.
  • Right-Sizing: Cloud infrastructure enables precise matching of compute resources to actual workload requirements, eliminating fixed capacity for peak loads.
  • Storage Tiering: S3 Intelligent Tiering automatically moves data between access tiers based on usage patterns, optimising storage costs without manual intervention.
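
Running analytical jobs on Spot only works if interruptions are survivable. A minimal illustration of the checkpoint-and-retry pattern such jobs rely on; the job and its interruption here are simulated, not actual Spark behaviour:

```python
def run_with_retries(job, max_attempts=3):
    """Re-run an interruptible job, resuming from its last checkpoint.

    `job` takes the previous checkpoint (or None) and either returns a
    result or raises InterruptedError carrying a new checkpoint, which
    stands in for a Spot reclaim mid-run.
    """
    checkpoint = None
    for _ in range(max_attempts):
        try:
            return job(checkpoint)
        except InterruptedError as exc:
            checkpoint = exc.args[0]  # carry forward partial progress
    raise RuntimeError("job did not finish within the retry budget")

def flaky_sum(checkpoint):
    """Sum 1..10, 'interrupted' once halfway through the first attempt."""
    start, total = checkpoint or (1, 0)
    for i in range(start, 11):
        total += i
        if i == 5 and start == 1:  # simulate one Spot interruption
            raise InterruptedError((i + 1, total))
    return total
```

The second attempt resumes from the carried checkpoint rather than recomputing from scratch, which is what makes interruption-tolerant workloads a good fit for Spot pricing.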

Following the initial migration, Pixel Federation continued to refine the platform through several significant enhancements, each delivering further improvements in performance, cost, and operational efficiency.

60%: Performance Improvement
20–40%: Better Price-Performance
0 Brokers: Cutting Streaming Infrastructure Cost

Optimization & Enhancements


[Streaming] Migration from AWS MSK to WarpStream

The initial architecture used AWS Managed Streaming for Apache Kafka [MSK] for real-time data streams from gaming applications. The team identified a significant optimization opportunity by migrating to WarpStream, a Kafka-compatible streaming platform built on object storage.

  • Reduced Complexity: WarpStream eliminates broker and EBS management entirely, dramatically simplifying streaming operations.
  • Cost Efficiency: By leveraging S3 instead of EBS volumes, streaming infrastructure costs dropped materially. S3 VPC endpoints eliminated cross-AZ data transfer charges. Data is written directly to S3 in one AZ/account and read from another without incurring inter-AZ traffic costs.
  • Seamless Integration: Kafka API compatibility meant existing producers and consumers required minimal code changes.
  • Cloud-Native Scalability: WarpStream's architecture scales naturally with cloud-native patterns in a way that brokered Kafka cannot match.

Full implementation details are available in WarpStream's published case study: warpstream.com/blog/pixelfederation-powers-mobile-analytics-platform-with-warpstream
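
In practice, Kafka API compatibility means clients mostly need new connection settings rather than new code. A hypothetical before/after of producer configuration (the endpoints are placeholders, not real hosts):

```python
# Hypothetical producer settings. Because WarpStream speaks the Kafka
# protocol, moving off MSK is largely a matter of repointing the
# bootstrap servers; topics, serializers, and client code stay put.
msk_config = {
    "bootstrap.servers": "b-1.msk.example.internal:9092",  # placeholder host
    "acks": "all",
    "compression.type": "lz4",
}

warpstream_config = {
    **msk_config,
    "bootstrap.servers": "warpstream-agent.internal:9092",  # placeholder host
}

def changed_keys(old, new):
    """Keys whose values differ between two client configurations."""
    return {k for k in old if old.get(k) != new.get(k)}
```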

[Compute] Graviton ARM Migration

One of the most impactful optimisations was migrating all compute instances to AWS Graviton [ARM] processors, delivering superior price-performance compared to traditional x86 instances.

  • Cost Savings: Graviton instances typically provide 20–40% better price-performance ratios.
  • Energy Efficiency: ARM architecture reduces environmental impact through improved energy efficiency.
  • Smooth Migration: Pixel Federation's analytical workloads, primarily Java [Spark] and Python [Zeppelin], proved highly portable and straightforward to migrate. This contrasted with their gaming applications, which use PHP with architecture-dependent extensions and could not be easily migrated, highlighting the importance of architecture-agnostic code design.

[Performance] Data Layer Caching

Intelligent caching at the data layer was implemented to address frequently accessed data patterns:

  • Frequently accessed datasets served from cache rather than S3, dramatically reducing query latency
  • Reduced S3 API calls and data transfer costs
  • Faster query results for interactive analytics and dashboards
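
A minimal sketch of the cache-in-front-of-S3 idea, using a stdlib LRU cache and a stand-in for the object-store read; function names are illustrative, not the team's implementation:

```python
from functools import lru_cache

FETCHES = {"count": 0}  # tracks hits against the (simulated) backing store

@lru_cache(maxsize=256)
def get_dataset(key: str) -> bytes:
    """Return a dataset, caching hot keys so repeat reads skip S3."""
    return _fetch_from_s3(key)

def _fetch_from_s3(key: str) -> bytes:
    """Stand-in for a real S3 GetObject call."""
    FETCHES["count"] += 1  # each call represents a paid S3 GET
    return f"contents of {key}".encode()
```

Repeated reads of a hot key hit the cache, so the backing store is called once; that is the mechanism behind both the latency win and the reduced S3 API spend.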

[Infrastructure] Custom Kubernetes Scheduler

A custom Kubernetes scheduler was implemented to improve node utilization and reduce waste, addressing the suboptimal pod placement that can leave nodes underutilized while new nodes spin up unnecessarily.

  • Better Bin Packing: More efficient placement of pods on existing nodes before scaling out.
  • Reduced Node Churn: Fewer node additions and removals, reducing Spot Instance interruption exposure.
  • Cost Optimisation: Better utilisation of existing capacity before incurring costs for additional nodes.
  • Improved Stability: Reduced cluster scaling activity led to more predictable performance characteristics.
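
Bin packing here means placing pods onto existing nodes before provisioning new ones. A toy first-fit-decreasing sketch of that behaviour; real schedulers weigh CPU, memory, affinity, and disruption budgets, not a single dimension:

```python
def first_fit_decreasing(pods, node_capacity):
    """Pack pod resource requests onto as few nodes as possible.

    Largest requests are placed first, each on the first existing node
    with room; a new node is "provisioned" only when nothing fits.
    Units are arbitrary (e.g. millicores).
    """
    nodes = []  # remaining free capacity per node
    for req in sorted(pods, reverse=True):
        for i, free in enumerate(nodes):
            if free >= req:
                nodes[i] = free - req
                break
        else:
            nodes.append(node_capacity - req)  # scale out: add a node
    return len(nodes)
```

Tighter packing means fewer scale-out events, which is exactly the reduced node churn and Spot-interruption exposure described above.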

Results & Business Impact

The headline 60% performance improvement resulted from multiple converging factors: more powerful compute instances available on demand, better data locality with the S3 data lake, Spark configurations optimized for cloud deployment, and efficient caching of frequently accessed data.

Operational Efficiency

The cloud-native architecture dramatically reduced the time and effort required to maintain the data warehouse infrastructure. Tasks that previously required manual intervention are now automated or handled by AWS-managed services:

  • No hardware maintenance or replacement cycles
  • Automated software updates for managed services
  • Self-healing infrastructure through Kubernetes
  • Simplified capacity planning and scaling

This operational improvement freed the engineering team to focus on higher-value activities: developing new analytical capabilities, optimizing data pipelines, and supporting data science initiatives.

Cost Optimization

While specific figures are confidential, the combination of Spot Instances, Graviton processors, WarpStream and improved resource utilization delivered substantial cost savings compared to both the previous on-premise infrastructure and a naive lift-and-shift cloud migration approach.

Ongoing Challenges

  • Cost Observability & Attribution: Accurately attributing costs to specific users, teams, queries, and projects in a shared, dynamically allocated environment requires sophisticated tagging, monitoring, and cost allocation strategies. Some challenges stem from inherent platform constraints. For example, attributing exact S3 API call volumes to individual queries is extremely difficult. The team continues to refine their approach through enhanced tagging, custom allocation tools, AWS Cost Explorer / CUR integration, and internal chargeback models.
  • Continuous Optimization: The flexibility of the cloud environment means optimization is an ongoing discipline, not a one-time event. The team maintains a backlog of potential improvements and regularly evaluates new AWS services.
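
A simplified sketch of tag-based chargeback, assuming a hypothetical export of (cost, tags) rows; real attribution pulls from the Cost and Usage Report and is considerably messier:

```python
from collections import defaultdict

def allocate_costs(usage_records, untagged_team="shared"):
    """Roll tagged usage up into a per-team chargeback summary.

    `usage_records` is a hypothetical, simplified export: (cost, tags)
    pairs. Rows missing a `team` tag land in a shared bucket, which is
    the attribution gap described above.
    """
    totals = defaultdict(float)
    for cost, tags in usage_records:
        totals[tags.get("team", untagged_team)] += cost
    return dict(totals)
```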

Long-Term Maintainability

Perhaps the most significant but hardest-to-quantify benefit is improved long-term maintainability:

  • Standard Technologies: Widely adopted open-source and AWS-managed services ensure long-term support and rich community resources.
  • Easier Updates: Containerization and infrastructure-as-code make updates more predictable and less risky.
  • Talent Attraction: Modern technology stacks make it easier to recruit and retain skilled engineers.
Branislav Bernát
CIO
Pixel Federation

Labyrinth Labs' expertise in cutting-edge technologies helped us take our innovations and progress to another level.

Lessons Learned

  • Leverage Existing Infrastructure: Building on an existing AWS presence avoided multi-cloud complexity and leveraged established expertise and networking, accelerating time-to-value.
  • Iterate and Optimize: The initial migration established a solid foundation; subsequent optimizations [WarpStream, Graviton, caching, custom scheduling] delivered additional compounding benefits. Treat migration as a journey, not an event.
  • Balance Familiarity & Innovation: Retaining familiar technologies [Spark, Airflow] minimized disruption while adopting cloud-native deployment patterns provided modernization benefits.
  • Architecture-Agnostic Code: The ability to migrate to Graviton ARM with minimal effort demonstrated the value of platform-agnostic code that can adopt new technologies as they emerge.
  • Plan Cost Observability Early: Implementing comprehensive cost tracking and attribution from the start is far easier than retrofitting it later.
  • Use Platform Management Tools: Scalable, modular platform management software such as LARA significantly streamlines infrastructure orchestration, improves operational consistency, and reduces complexity at scale.

Best Practices for Cloud-Native Data Warehouses

Based on this engagement, organizations planning similar migrations should consider:

  • Start with a solid foundation: invest in proper architecture, networking, security, and IaC design upfront
  • Implement cost controls early: build tagging strategies, monitoring, and cost allocation mechanisms from day one
  • Use managed services strategically: balance reduced operational burden against cost and control trade-offs
  • Plan for Spot Instance interruptions: design workloads to handle interruptions gracefully
  • Embrace containerization: containers provide portability, resource efficiency, and operational benefits
  • Automate everything: IaC, CI/CD pipelines, and automated testing reduce errors and accelerate deployments
  • Plan for sustainable maintenance: decisions made today have recurring consequences; architect for the long term

Future Considerations

Pixel Federation continues to evaluate opportunities for further optimization:

  • Enhanced Cost Attribution: More sophisticated cost allocation and chargeback mechanisms to provide per-team, per-project, and per-query visibility.
  • Advanced Analytics: Exploring Amazon Athena for ad-hoc queries, AWS Glue for data cataloguing, and Amazon Redshift for specific data warehouse use cases.
  • ML Integration: Investigating tighter integration of machine learning workflows with the data lake, potentially leveraging Amazon SageMaker.
  • Multi-Region: Evaluating multi-region deployment patterns for improved latency and data residency compliance as the business grows globally.

  • Sustainability Focus: Continuing to optimize for energy efficiency via ARM processors and AWS sustainability tooling.
