TrackVia’s Data Center Infrastructure Evolution: Part 2 of 3

TrackVia’s Evolution in our Data Center Infrastructure:
A Middle Ground: The Hybrid Cloud

In this series of articles we’re walking through some of the key evolutionary stages of our data center infrastructure, identifying the key decision points, the options considered, and their pros and cons. We started the series with a look back at getting started in a managed service model; today we’ll cover our hybrid cloud model: some traditional managed services, some cloud-based services.

The hybrid cloud model is popular today, as companies reevaluate their legacy colocation and managed service operations while being bombarded by cloud hype. We conducted just such a reevaluation for TrackVia and identified two areas where the hybrid cloud made sense for us. How does this thought process compare to yours, if you’re going through or have gone through a similar evaluation? Let me know in the comments section below!

We identified three key elements in our hybrid cloud strategy:

  • The core product stack, which resembles many multi-tier Web application architectures: a presentation tier, an application logic tier, and a data store;
  • Ancillary services within the overall architecture, handling queuing operations, delivering notifications, and other “out of band” functions; and
  • Disaster recovery needs.

Let’s look at each of them in turn.

The core product stack: not a great fit for cloud instances… yet

The core of our current enterprise product reflects many decisions common to startups: rapid development of features, some one-off customizations for early key customers, and an occasional (perhaps frequent) short-changing of a long-term scalable architecture in favor of the must-have feature to close today’s sale. In our case, what has resulted is a product that meets our customers’ needs very well, but does so at a higher cost than we’d like: it is inefficient in its use of computational resources.

Moving this product to a pure cloud-based solution was not an option for us (we tried, and it performed very poorly). A key difference between traditional dedicated servers and “cloud servers” or “instances” is that the latter run in shared environments: a single cloud server is one virtual server among perhaps many, all running atop a virtual machine layer on a shared physical server. Our enterprise product puts enough demands on dedicated servers that the addition of this extra layer in the form of a virtual machine resulted in unacceptable performance. We could have paid a higher per-hour price for more performant cloud instances, but that made for a less compelling financial argument to migrate.

In short, we found that the core of our current enterprise product requires dedicated servers.

Ancillary services: a great way to get started with hybrid cloud

Alongside our core product are several key services that add value to the overall product: they deliver the notifications and scheduled reports that customers have configured, manage the documents and attachments that customers associate with their records, and perform several other ancillary functions.

We’ve found these ancillary services to be a great way to leverage lower-cost cloud services, and many of them are now operating within an Amazon Web Services (AWS) environment. By migrating these services into AWS, we’ve gained additional redundancy and lower costs — a win for our hybrid cloud strategy.
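To illustrate why "out of band" work like this separates cleanly from the core stack, here is a minimal sketch of the pattern. It is not our production code: the in-process queue stands in for whatever hosted queue a real deployment would use, and the names handle_request, notification_worker, and run_demo are hypothetical.

```python
import queue
import threading


def handle_request(record_id, outbox):
    """Core request path: do the user-facing work, then hand off the slow part."""
    # Enqueue the follow-up work instead of doing it inline, so the user
    # is not kept waiting on notification delivery.
    outbox.put({"type": "notification", "record_id": record_id})
    return f"saved record {record_id}"


def notification_worker(outbox, delivered):
    """Ancillary service: drains the queue independently of the core product."""
    while True:
        job = outbox.get()
        if job is None:  # sentinel value: no more work, shut down
            break
        # Stand-in for actually sending an email or other notification.
        delivered.append(job["record_id"])


def run_demo():
    outbox = queue.Queue()  # assumption: stands in for a hosted queue service
    delivered = []
    worker = threading.Thread(target=notification_worker, args=(outbox, delivered))
    worker.start()
    responses = [handle_request(i, outbox) for i in (1, 2, 3)]
    outbox.put(None)  # tell the worker to stop after the queued jobs
    worker.join()
    return responses, delivered
```

Because the worker only needs the queue, it can run on different (and cheaper) infrastructure than the servers answering user requests.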

Disaster Recovery: cost efficiencies in our hybrid cloud

Early in 2013 we stood up a full, standby data center operation for Disaster Recovery (DR) needs. Once we had implemented the necessary redundancy within one primary data center, the next step in providing highly available services to our customers was to protect against a full data center failure.

The first decision typically faced when deciding to build out a multi data center strategy is between an “active-standby” model and an “active-active” model. In an active-standby model, one data center handles all normal operations, and you have a standby data center that is only used when the primary fails. In an active-active model, both data centers are handling customer traffic simultaneously, and you’ve built into your architecture the necessary logic to coordinate between the two.
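The difference between the two models can be sketched as a routing decision. This is a simplified illustration only; the DataCenter class and both function names are hypothetical, and real failover involves DNS, data replication, and health checking that are not shown here.

```python
from dataclasses import dataclass


@dataclass
class DataCenter:
    name: str
    healthy: bool


def route_active_standby(primary, standby):
    """All traffic goes to the primary; the standby is used only if it fails."""
    if primary.healthy:
        return [primary]
    if standby.healthy:
        return [standby]
    return []  # both sites are down


def route_active_active(sites):
    """Every healthy site serves traffic at the same time."""
    return [site for site in sites if site.healthy]
```

The routing itself is the easy part of active-active. The hard part is the coordination logic between sites that are both accepting writes, which this sketch deliberately leaves out.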

Remember the architecture tradeoffs I talked about earlier? We chose an active-standby model because challenges in our current architecture made active-active difficult enough that the tradeoffs didn’t make sense. Even so, the active-standby model lets us capture some strong cost efficiencies in our hybrid-cloud DR site. That site is in the Rackspace Cloud and uses RackConnect to combine some dedicated servers with some on-demand cloud servers.

With this hybrid-cloud approach to disaster recovery, we constantly replicate live data from our primary data center to the dedicated servers at Rackspace. In the event of a data center failover, we activate many additional cloud servers to operate a fully populated, fully capable failover data center. As a result, we minimize the cost of our failover data center, paying for the dedicated servers every day but paying for the cloud servers only when we need them. These cost savings are a key element of our hybrid cloud strategy.
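The arithmetic behind that savings can be sketched in a few lines. Every number below is made up for illustration; none of them are our actual server counts or Rackspace's prices, and the function names are hypothetical.

```python
def monthly_cost_all_dedicated(dedicated_monthly, n_servers):
    """Standby site where every server is dedicated and paid for all month."""
    return dedicated_monthly * n_servers


def monthly_cost_hybrid(dedicated_monthly, n_dedicated,
                        cloud_hourly, n_cloud, failover_hours):
    """Standby site with a few always-on dedicated servers (for example, the
    replication targets) plus cloud servers billed only while a failover is
    in progress."""
    always_on = dedicated_monthly * n_dedicated
    on_demand = cloud_hourly * n_cloud * failover_hours
    return always_on + on_demand


# Hypothetical figures: 10 servers in total, $500 per dedicated server per
# month, $0.50 per cloud server per hour, and one 24-hour failover in the month.
all_dedicated = monthly_cost_all_dedicated(500.0, 10)
hybrid = monthly_cost_hybrid(500.0, 2, 0.50, 8, 24)
print(all_dedicated, hybrid)
```

With these assumed inputs the hybrid site costs 1096.0 against 5000.0 for the all-dedicated one, and in a month with no failover at all the on-demand term drops to zero.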

I recently wrote an article about this very topic in the Rackspace Blog — it has a few additional details about our hybrid strategy.

Conclusions

Our hybrid cloud solutions have worked very well for us, providing high availability and performance to our customer base and excellent cost efficiencies internally. Every product will have different architectural demands, but for us the key areas where a hybrid cloud solution is attractive are:

  • Disaster Recovery needs, particularly in an active-standby model; and
  • Ancillary services that handle queuing and other non-realtime work that isn’t directly tied to user interactions.

In the next article we will discuss our exciting new product, TrackVia Express, and the pure-cloud model that we’ve arrived at.