In my previous post, "Big, white, puffy clouds can still evaporate," I mentioned the Google Gmail outage from February and pointed out the implied resiliency (or lack thereof) of cloud-based platforms. Rich Miller over at DataCenterKnowledge.com just posted an interview with Urs Holzle from Google entitled "How Google Routes Around Outages." The title of this post is a quote from the referenced article, and it seems rather fitting.
On the one hand, I was a bit surprised to read Holzle admit a preference for human involvement in remediating certain types of outages. It seems Google has automated controls to handle single-system and small-scale failures (a whole equipment rack, etc.) within a single data center, but larger-scale outages (such as a whole data center) are mitigated manually. On the other hand, I can understand the desire to have explicit human oversight; large-scale failures tend to have unpredictable cascading effects that are hard to account for with automated systems. In fact, Holzle mentions that the very systems designed to handle larger-scale outages tend to be the same systems responsible for the outages; the February Gmail outage is an explicit example.
Having said that, I still believe clouds need to be as self-sustaining as possible (preferably in automated fashion) despite the current limits of technology and engineering. That implied resiliency and redundancy is one of the core value propositions of clouds. Without it, clouds get devalued into not much more than a traditional managed hosting platform/service.
My suggestion to organizations considering a purchase from a cloud-based vendor is to hold a due-diligence discovery session on the cloud architecture that vendor has engineered. You want to see something akin to a fairly detailed Visio diagram depicting multiple datacenter/regional locations, with the systems in each location and the roles they provide to the overall cloud platform and offered services. You want to see redundancy in systems and roles, with distribution and backups across disparate regions. Look for points of failure: point to any given node and ask how the entire cloud handles its failure without a service interruption. Especially point to an entire datacenter/region and ask what happens if that entire area of the cloud system goes down due to a large-scale power outage, natural disaster, etc. Also be wary of how cloud entrance points could be affected during an outage. It is great if the overall service naturally shifts processing to a secondary region when the entire first region experiences an outage, but it is not so great if you have to explicitly go back and reconfigure your organization to point to the secondary region. Even though the processing survived, your point of entrance changed, and thus you still had an outage. You definitely want to ensure all points of presence/entrance are fault tolerant.
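To make the "point at a region and ask what happens" exercise concrete, here is a minimal Python sketch. Everything in it is an assumption for illustration: the role names, the region names, and the inventory itself are hypothetical stand-ins for whatever the vendor's diagram actually shows. It simply lists which roles would disappear if an entire region went dark.

```python
from collections import defaultdict

# Hypothetical inventory: (role, region) pairs read off a vendor's diagram.
# None of these names come from a real vendor.
DEPLOYMENTS = [
    ("entry-point", "us-east"),
    ("entry-point", "eu-west"),
    ("processing", "us-east"),
    ("processing", "eu-west"),
    ("log-storage", "us-east"),
]


def roles_lost_per_region(deployments):
    """Map each region to the roles that vanish if that whole region fails."""
    regions_by_role = defaultdict(set)
    for role, region in deployments:
        regions_by_role[role].add(region)

    all_regions = {region for _, region in deployments}
    lost = {}
    for region in sorted(all_regions):
        # A role is lost when the failed region is the only place it runs.
        lost[region] = sorted(
            role
            for role, regions in regions_by_role.items()
            if regions == {region}
        )
    return lost


if __name__ == "__main__":
    for region, roles in roles_lost_per_region(DEPLOYMENTS).items():
        print(region, "->", roles or "no roles lost")
```

In this made-up inventory, losing the first region takes the log-storage role with it, which is exactly the kind of gap worth raising with the vendor. Note that the sketch only checks whether a role exists in more than one region; it says nothing about whether your entrance point fails over automatically, so that question still has to be asked separately.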
Overall, going with a cloud vendor still requires you to think about redundancy, fault tolerance, business continuity, etc. And it might require a little extra effort up front in order to ferret out the appropriate details from your cloud vendor during the initial evaluation. Fortunately, once you are satisfied with the resiliency offered by the vendor's cloud architecture, you can then focus your attention elsewhere--the vendor is the one that has to deal with the implementation and ongoing maintenance of the necessary architecture redundancy. And that is one headache that is definitely nice to outsource.
Until next time,
- Jeff
Changing A Tire While Going 60mph



