Maximum Availability Architecture
Oracle University Podcast - A podcast by Oracle Corporation - Tuesdays
Join Lois Houston and Nikita Abraham, along with Alex Bouchereau, as they talk about Oracle Maximum Availability Architecture, which provides architecture, configuration, and lifecycle best practices for Oracle Databases.
Oracle MyLearn: https://mylearn.oracle.com/
Oracle University Learning Community: https://education.oracle.com/ou-community
LinkedIn: https://www.linkedin.com/showcase/oracle-university/
Twitter: https://twitter.com/Oracle_Edu
Special thanks to Arijit Ghosh, David Wright, Ranbir Singh, and the OU Studio Team for helping us create this episode.
--------------------------------------------------------
Episode Transcript:
00;00;00;00 - 00;00;39;11 Welcome to the Oracle University Podcast, the first stop on your cloud journey. During this series of informative podcasts, we'll bring you foundational training on the most popular Oracle technologies. Let's get started. Hello and welcome to the Oracle University Podcast. I'm Nikita Abraham, Principal Technical Editor with Oracle University, and I'm joined by Lois Houston, Director of Product Innovation and Go to Market Programs.
00;00;39;18 - 00;01;12;09 Hi, everyone. Last week, we discussed Oracle's Maximum Security Architecture, and today, we're moving on to Oracle Cloud Infrastructure's Maximum Availability Architecture. To take us through this, we're once again joined by Oracle Database Specialist Alex Bouchereau. Welcome, Alex. We're so happy you're becoming a regular on our podcast. So, to start, what is OCI Maximum Availability Architecture? Now, before we actually jump into the specifics, it's important to understand the problem we're trying to address.
00;01;12;11 - 00;01;38;01 And that is database downtime and data protection. We don't want any data loss, and the impact of both of these types of occurrences can be significant. Now, $350K per hour as the average cost of downtime and 87 hours as the average amount of downtime per year are pretty significant. So, it's a very, very common occurrence.
It's $10 million for a single outage, depending on how critical the application is.
00;01;38;03 - 00;02;02;28 And 91% of companies have experienced unplanned data center outages, which means this occurs fairly often. So, what can we do about this? How do we address the problem of data loss? It's important to understand some terminology first. So, we'll start with high availability. High availability provides redundant components to go ahead and ensure your service is uninterrupted in case of a hardware failure.
00;02;03;01 - 00;02;24;24 So, if one server goes down, the other servers will be up. Ideally, you'll have a cluster to go ahead and provide that level of redundancy. And then we talk about scalability. Depending upon the workload, you want to ensure that you still have your performance. So, as your application becomes more popular and more end users go ahead and join it, the workload increases.
00;02;24;26 - 00;02;42;28 So, you want to ensure that the performance is not impacted at all. So, if we want to go ahead and minimize the time of our planned maintenance, which happens a lot more often than unplanned outages, we need to do so in a rolling fashion. And that's where rolling upgrades, rolling patches, and all these types of features come into play.
00;02;42;29 - 00;03;10;20 Okay, so just to recap, the key terms you spoke about were high availability, which is if one server goes down, others will be up, scalability, which is even if the workload increases, performance isn’t impacted, and rolling updates, which is managing planned updates seamlessly with no downtime. Great. What's next? Disaster recovery. So, we move from high availability to disaster recovery, protecting us from a complete site outage.
00;03;10;27 - 00;03;35;02 So, if the site goes down entirely, we want to have a redundant site to be able to fail over to. That's where disaster recovery comes into play. And then how do we measure downtime and data loss?
So, we do so with Recovery Point Objectives, or RPOs, measuring data loss, and Recovery Time Objectives, or RTOs, measuring our downtime.
00;03;35;05 - 00;04;00;22 Alex, when you say measure downtime, how do we actually do that? Well, we use a technique called chaos engineering. Essentially, it's an art form at the end of the day because it's constantly evolving and changing over time. We're proactively breaking things in the system and we're testing how our failover, our resiliency, our switchovers, and everything else works under the covers with all our different features.
00;04;00;23 - 00;04;21;28 A lot of components can suffer an outage, right? We have networks and servers, storage, and all these different components can fail. But there's also human error. Someone can delete a table. You could delete a bunch of rows. So, they can make a mistake on the system as well. That occurs very often. Data corruption and then, of course, power failures.
00;04;22;00 - 00;04;45;03 Godzilla could attack and take out the entire data center. Godzilla! Ha! And you want to be able to go ahead and have disaster recovery in place. And then there's all kinds of maintenance activities that happen with application updates. You might want to reorganize the data without changing the application and make small, little optimizations. And these can all happen in isolation or in combination with each other.
00;04;45;05 - 00;05;19;03 And so chaos engineers take all this into consideration and build out the use cases to go ahead and test the system. Do we have some best practices in place for this, then? Oracle Maximum Availability Architecture, MAA, is Oracle's best practice blueprint based on proven Oracle high availability technologies, end-to-end validation, expert recommendations, and customer experiences. The key goal of MAA is to achieve optimal high availability, data protection, and disaster recovery for Oracle customers at the lowest cost and complexity.
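To make the RPO and RTO measures mentioned above a little more concrete, here is a minimal Python sketch of how the outcome of one failover drill might be scored. It is an illustration only, not an Oracle tool: the `DrillResult` class, the helper functions, the timestamps, and the one-minute and five-minute objectives are all hypothetical. The only figures taken from the episode are the $350K average hourly cost and the 87 hours of average yearly downtime.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Average downtime cost per hour quoted in the episode (USD).
COST_PER_HOUR_USD = 350_000


@dataclass
class DrillResult:
    """Timestamps recorded during one hypothetical failover drill."""
    last_recoverable_commit: datetime  # newest transaction that survived
    failure_time: datetime             # moment the fault was injected
    service_restored: datetime         # moment the application was usable again

    @property
    def data_loss(self) -> timedelta:
        # Work done after the last recoverable commit is lost (compare to RPO).
        return self.failure_time - self.last_recoverable_commit

    @property
    def downtime(self) -> timedelta:
        # Time the service was unavailable (compare to RTO).
        return self.service_restored - self.failure_time


def meets_objectives(result: DrillResult, rpo: timedelta, rto: timedelta) -> bool:
    """True only if both the data loss and the downtime are within target."""
    return result.data_loss <= rpo and result.downtime <= rto


def downtime_cost(downtime: timedelta, cost_per_hour: float = COST_PER_HOUR_USD) -> float:
    """Estimated cost of an outage of the given length."""
    return downtime.total_seconds() / 3600 * cost_per_hour


# Hypothetical drill: made-up timestamps and objectives.
drill = DrillResult(
    last_recoverable_commit=datetime(2024, 1, 1, 12, 0, 0),
    failure_time=datetime(2024, 1, 1, 12, 0, 30),
    service_restored=datetime(2024, 1, 1, 12, 6, 30),
)
rpo = timedelta(minutes=1)
rto = timedelta(minutes=5)

print("data loss:", drill.data_loss)
print("downtime:", drill.downtime)
print("objectives met:", meets_objectives(drill, rpo, rto))
print("cost of 87 hours of downtime:", downtime_cost(timedelta(hours=87)))
```

In this made-up drill the 30 seconds of lost work is within the RPO, but the six minutes of downtime exceeds the RTO, so the drill fails. The last line applies the episode's averages: 87 hours at $350K per hour comes to roughly $30 million a year.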
00;05;19;05 - 00;05;54;07 MAA consists of reference architectures for various buckets of HA service-level agreements, configuration practices, and HA lifecycle operational best practices, and is applicable for non-engineered systems, engineered systems, non-cloud, and cloud deployments. Availability of data and applications is an important element of every IT strategy. At Oracle, we've used our decades of enterprise experience to develop an all-encompassing framework that we call Oracle MAA, for Maximum Availability Architecture.
00;05;54;07 - 00;06;20;21 And how was Oracle's Maximum Availability Architecture developed? Oracle MAA starts with customer insights and expert recommendations. These have been collected from our huge pool of customers and community of database architects, software engineers, and database strategists. Over the years, this has helped the Oracle MAA development team gain a deep and complete understanding of various kinds of events that can affect availability.
00;06;20;24 - 00;06;48;11 Through this, they have developed an array of availability reference architectures. These reference architectures acknowledge that not all data or applications require the same protection and that there are real tradeoffs in terms of cost and effort that should be considered. Whatever your availability goals may be for a database or related applications, Oracle has the product functionality and guidance to ensure you can make the right decision with full knowledge of the tradeoffs in terms of downtime, data loss, and costs.
00;06;48;11 - 00;07;04;01 These reference architectures use a wide array of our HA features, configurations, and operational practices.
00;07;04;03 - 00;07;29;04 Want to get the inside scoop on Oracle University? Head on over to the all-new Oracle University Learning Community. Attend exclusive events. Read up on the latest news. Get firsthand access to new products and stay up-to-date with upcoming certification opportunities.
If you're already an Oracle MyLearn user, go to mylearn.oracle.com to join the community. You will need to log in first. If you've not yet accessed Oracle MyLearn, visit mylearn.oracle.com and create an account to get started.
00;07;29;04 - 00;07;57;19 Join the community today. Welcome back. Alex, you were telling us about how Oracle MAA or Maximum Availability Architecture has reference architectures that use a series of high availability features and configurations. But, how do these help our customers? They help our end customers achieve primarily four goals.
00;07;57;22 - 00;08;29;29 Number one, data protection, reducing data loss through Flashback and absolute data protection through Zero Data Loss Recovery Appliance. Number two, active replication, which allows customers to connect their applications to replicated sites in an active-active HA solution through Active Data Guard and GoldenGate. Number three, scale out, which allows customers the ability to scale compute nodes linearly through RAC, ASM, and Sharding.
00;08;30;01 - 00;08;58;19 Four, continuous availability. This allows transparent failovers of services across sites distributed locally or remotely, through AC (Application Continuity) and GDS (Global Data Services). These features and solutions allow customers to mitigate not only planned events, such as software upgrades, data schema changes, and patching, but also unplanned events, such as hardware failures and software crashes due to bugs. Finally, customers have various deployment choices on which we can deploy these HA solutions.
00;08;58;22 - 00;09;25;02 The insights, recommendations, reference architectures, features, configurations, best practices, and deployment choices combine to form a holistic blueprint, which allows customers to successfully achieve their high availability goals. What are the different technologies that come into play here? Well, we'll start with RAC.
So, RAC is a clustering technology spread through different nodes across the different servers, so you don't have a single point of failure.
00;09;25;05 - 00;09;46;13 From a scalability standpoint and performance standpoint, you get a lot of benefit associated with that. You can add a new node whenever you want to without experiencing any downtime. So, you have that flexibility at this point. And if any type of outage occurs, all the committed transactions are going to be protected and we'll go ahead and we'll move that session over to a new service.
00;09;46;15 - 00;10;07;27 So, from that point, we want to go ahead and also protect our in-flight transactions. So, when it comes to in-flight transactions, how are we going to protect those in addition to the RAC nodes? Well, we can go ahead and do so with another piece of technology that's built into RAC, and that's the Transparent Application Continuity feature. So, this feature is going to expand the capabilities of RAC.
00;10;08;03 - 00;10;28;18 It's a feature of RAC to go ahead and protect our in-flight transactions so our application doesn't experience those transactions failing and coming back up to the application layer, or even up to the end users. We want to capture those. We want to replay them. So that's what Application Continuity does. It allows us to go in and do that.
00;10;28;21 - 00;10;51;03 It supports a whole bunch of different technologies, such as Java, .NET, and PHP. You don't have to make any changes to the application. All you have to do is use the correct driver and have the connection string appropriately configured and everything else is happening in the database. What about for disaster recovery? Active Data Guard is the Oracle solution for disaster recovery.
00;10;51;05 - 00;11;29;08 It eliminates a single point of failure by providing one or more synchronized physical replicas of the production database.
It uses Oracle-aware replication to efficiently use network bandwidth and provide unique levels of data protection. It provides data availability with fast manual or automatic failover to a standby should a primary fail, and fast switchover to a standby for the purpose of minimizing planned downtime as well. An Active Data Guard standby is open read-only while it is being synchronized, providing advanced features for data protection, availability, and production offload.
00;11;29;08 - 00;11;50;23 We have different database services, right? We have our Oracle Database Cloud Service, we have Exadata Cloud Service, and we have Autonomous Database. Do they all have varying technologies built into them? All of them are database-aware architectures at the end of the day. With the Oracle Database Cloud Service, you have the choice of single instance, or you can go ahead and choose RAC as well.
00;11;50;25 - 00;12;23;25 You can use quick migration via Zero Downtime Migration, or ZDM for short. We have automated backups built in, and you can set up cross-regional or cross-availability-domain DR with Active Data Guard through our control plane. And we build on that with Exadata Cloud Service by going ahead and changing the foundation to Exadata, bringing all the rich benefits of performance, scalability, and optimizations for the Oracle Database, and all the different HA and DR technologies that run within it, to the cloud.
00;12;23;27 - 00;12;50;22 It's very easy to go ahead and move from Exadata on-premises to Exadata Cloud Service. And you have choices. You can do the public cloud, or you can do Cloud@Customer, or ExaCC, as we call it, to go ahead and run Exadata Cloud Service within your own data center. And building on top of that, we have Autonomous, which also builds on top of that Exadata infrastructure.
00;12;50;25 - 00;13;19;12 And we have two flavors of that. We have shared and we have dedicated, depending upon your requirements.
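Going back to the capture-and-replay idea behind Transparent Application Continuity described earlier in the conversation, the following Python sketch shows the general concept in miniature. It is a toy model under stated assumptions, not Oracle's implementation and not a real driver API: `ToyNode`, `ReplayingSession`, `NodeDown`, and the statement strings are all hypothetical, and the sketch leaves out the safety checks a real implementation needs.

```python
class NodeDown(Exception):
    """Raised by the toy node when it fails in the middle of a request."""


class ToyNode:
    """Stand-in for one database node (hypothetical, not a real driver)."""

    def __init__(self, name, fail_after=None):
        self.name = name
        self.fail_after = fail_after  # calls served before the node fails
        self.calls_served = 0
        self.pending = []    # uncommitted, in-flight work
        self.committed = []  # work made durable by commit()

    def execute(self, statement):
        if self.fail_after is not None and self.calls_served >= self.fail_after:
            self.pending.clear()  # in-flight work is lost with the node
            raise NodeDown(self.name)
        self.calls_served += 1
        self.pending.append(statement)

    def commit(self):
        self.committed.extend(self.pending)
        self.pending.clear()


class ReplayingSession:
    """Captures each call since the last commit and replays it after a failure."""

    def __init__(self, nodes):
        self.nodes = list(nodes)
        self.current = 0
        self.history = []  # calls captured since the last commit

    def execute(self, statement):
        self.history.append(statement)
        try:
            self.nodes[self.current].execute(statement)
        except NodeDown:
            self._failover_and_replay()

    def _failover_and_replay(self):
        # Simplification: only one failover is handled per call.
        self.current += 1
        if self.current >= len(self.nodes):
            raise RuntimeError("no surviving node to replay on")
        for statement in self.history:
            self.nodes[self.current].execute(statement)

    def commit(self):
        self.nodes[self.current].commit()
        self.history.clear()


# Hypothetical request: node-a fails on the third call, node-b takes over.
node_a = ToyNode("node-a", fail_after=2)
node_b = ToyNode("node-b")
session = ReplayingSession([node_a, node_b])

for stmt in ["INSERT order 1", "INSERT order 2", "UPDATE stock"]:
    session.execute(stmt)  # the caller never sees the NodeDown error
session.commit()

print("committed on", node_b.name, ":", node_b.committed)
```

The point of the sketch is that the calling loop never handles the failure itself: the session hides the outage by re-running the uncommitted calls on the surviving node, which is the behavior the episode describes as capturing and replaying in-flight transactions.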
And is all of this managed by Oracle? Now, at this point, everything's managed by Oracle and things like Data Guard can be configured. We call it Autonomous Data Guard in the Autonomous Database. With a simple two clicks, you can set up cross-regional or cross-availability-domain DR. And then everything is built, of course, from a highly available multitenant RAC infrastructure.
00;13;19;15 - 00;13;48;02 So, it's using all the other technologies and optimizations that we've been talking about. Thanks, Alex, for listing out the different offerings we have. I think we can wind up for today. Any final thoughts? So high availability, disaster recovery, absolute requirements. Everybody should have it. Everybody should think of it ahead of time. We have different blueprints, different tiers of our MAA architecture that map to different RTO and RPO requirements depending upon your needs.
00;13;48;04 - 00;14;12;01 And those may change over time. And finally, the business continuity we can provide with MAA is for both planned maintenance and unplanned outage events. So, it's for both. And that's a critical part to this as well. Thank you, Alex, for spending this time with us. That's it for this episode. Next week, we'll talk about managing Oracle Database with REST APIs and ADB built-in tools.
00;14;12;04 - 00;16;57;28 Until then, this is Nikita Abraham and Lois Houston signing off. That's all for this episode of the Oracle University Podcast. If you enjoyed listening, please click Subscribe to get all the latest episodes. We'd also love it if you would take a moment to rate and review us on your podcast app. See you again on the next episode of the Oracle University Podcast.