
Are your applications choking in the cloud?
We previously talked about the three dimensions of speed: infrastructure, fabric architecture and global reach, and how you’re only as fast as the slowest point of your network. In this post, we’ll explore choking hazards inside public clouds, different types of workloads and the overcrowding effect.
Workload Example #1: A large number of “micro” applications
Imagine you have infrastructure in the public cloud and you are working with a number of small machines running stateless applications. As an example, when you ask a question of a cloud application like Siri or Alexa, a fairly tiny stateless service takes a data clip (your question) and searches a vast database of results and indices to find your answer.
With the number of such services that are available at any given time, there are probably millions of these “micro” applications running. If Siri slows down or does not respond immediately, this would be considered an “outage.” Therefore, assistant applications like Siri need to be provided with lots of resources so they don’t choke.
Workload Example #2: A small number of “chatty” applications
Let’s say you have one big application with thousands of microservices running inside it. The infrastructure is now required to provide communication and storage resources for all of those microservices. As a result, the communication and data paths get crowded.
For the cloud, both of these examples produce a large number of communications that can totally overwhelm the process unless a proper messaging/communications framework is in place.
The Overcrowding Effect
Public clouds were not designed to handle the challenges these workload examples represent. Some public cloud providers try to solve the problem by enlarging their clouds. The crux of the problem, however, is that these clouds were not originally built to meet the growing need for the compute and storage sides to communicate quickly and congestion-free.
Consider a hard drive whose specifications quote a delay of one millisecond. What happens when thousands of different clients on hundreds of servers all try to use that same drive? The results are disastrous. Requests queue up behind one another, and delays climb steeply as all these applications step on each other. This is the overcrowding effect, and it causes delays of hundreds of milliseconds or even seconds.
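To get a feel for how a one-millisecond device can end up with delays of seconds, here is a rough sketch in Python. It assumes the shared drive behaves like a textbook single-server (M/M/1) queue, which is a simplification we are introducing for illustration and not a measurement of any real cloud. The function name and the utilization values are likewise illustrative.

```python
def mm1_response_time_ms(service_time_ms: float, utilization: float) -> float:
    """Mean time a request spends waiting plus being served.

    Assumption: an M/M/1 queue (random arrivals, one server), where
    mean response time = service time / (1 - utilization).
    """
    if not 0.0 <= utilization < 1.0:
        raise ValueError("utilization must be in the range [0, 1)")
    return service_time_ms / (1.0 - utilization)


if __name__ == "__main__":
    # A drive rated at 1 ms, shared by more and more clients.
    for utilization in (0.50, 0.90, 0.99, 0.999):
        delay = mm1_response_time_ms(1.0, utilization)
        print(f"{utilization:.1%} busy -> about {delay:.0f} ms per request")
```

Under this simple model, the same drive answers in about 2 ms when it is half busy, roughly 100 ms at 99% utilization and roughly a full second at 99.9%. Real storage systems are more complicated, but the shape of the curve is the point: the last few percent of load cost far more than the first ninety.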
What is your experience? Are all your applications running without delay or are they choking in the public cloud? Comment below or send us an email at info@kodiakdata.com.