STP302: Using Amazon EMR to build a Spark ecosystem at Opendoor

AWS re:Invent 2019 - A podcast by AWS

In this session, you learn how Opendoor, an online home-buying and selling service, manages medium-sized real estate data using Spark. The session covers Opendoor's journey from an in-house data processing solution on Kubernetes, through benchmarking against providers such as Amazon EMR and Databricks, to migrating to Amazon EMR for a balance of cost and performance. As a cost-conscious startup with heavy data processing needs, Opendoor had to balance cost, performance, availability, and developer experience when designing its ETL and machine learning platform. This session focuses on how Opendoor optimized its data workflow by migrating Spark workloads from Kubernetes to Amazon EMR.