
Customer viewpoint: Using Boundary to troubleshoot a new distributed system

Posted on June 18th, 2013

Guest post from Colin Rand, Director of Engineering, Social Products at ExactTarget. Read more from Colin on his blog, Dataerous. The original post is here.

I looked at the Boundary dashboard for one of our production application servers and was shocked to see a staging database in the list of nodes receiving traffic! For the past several weeks we had been finalizing our new data intake pipeline, with the most recent effort focused on monitoring and alerting. We had installed several different tools to look at system stats and custom application usage, one of which was Boundary. The system we were upgrading was meant to break our main read/write path dependency by placing a high-throughput proxy buffer between our product data store and our intake systems.

Like most modern services, the install was trivial; we just added a recipe to our Chef server and poof, our nodes began to show up immediately. The UI is decent, good enough for some tough situations so far, but the real-time data, with only a 1-second lag, is incredible.

When we went to deploy our pipeline into production, Boundary was invaluable. It turned what could have been a 3-4 day process of troubleshooting some gnarly configuration issues into something that took only a few hours.

Here are two bugs we solved with Boundary:

1. Before we finalized the pipeline, we had deployed a proof of concept / alpha version into the production environment to see how it would hold up. Our plan was to build the new pipeline, divert the traffic and drain the old pipeline. Once empty, it would be shut down. However, when we turned on the new feed, only some of the messages were coming into the pipeline. Our engineers immediately began to hunker down to see where we were dropping messages. While they were digging through logs and configs, I flipped open Boundary and looked at the farthest component upstream in our system, an HAProxy load balancer.

I saw that traffic was being sent to both the old and new pipeline because Chef had pulled the new servers into the existing application pool. We had given them the same Chef role name! In case you are not familiar with Chef, suffice it to say this would have taken a long time to determine rather than the 2 minutes it took me to detect. Removing the old nodes from HAProxy then completed the task and we were again on our way.
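The role collision is easy to reproduce in miniature. The sketch below is not Chef or HAProxy code; it assumes a hypothetical inventory of nodes tagged with role names and a pool builder that, like a role-based search, selects every node carrying a given role. All node and role names are made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    name: str
    role: str


def build_pool(nodes, role):
    """Return the names of all nodes carrying `role`, as a role-based search would."""
    return sorted(n.name for n in nodes if n.role == role)


# Hypothetical inventory: the new pipeline servers reuse the old role name.
shared_role = [
    Node("old-intake-1", "intake"),
    Node("old-intake-2", "intake"),
    Node("new-intake-1", "intake"),  # intended for the new pipeline only
    Node("new-intake-2", "intake"),
]

# Same servers, but the new pipeline gets its own (hypothetical) role name.
distinct_roles = [
    Node("old-intake-1", "intake"),
    Node("old-intake-2", "intake"),
    Node("new-intake-1", "intake_v2"),
    Node("new-intake-2", "intake_v2"),
]

mixed_pool = build_pool(shared_role, "intake")
clean_pool = build_pool(distinct_roles, "intake_v2")
print(mixed_pool)  # old and new servers end up in one pool
print(clean_pool)  # only the new servers
```

With a shared role name, nothing in the pool definition itself looks wrong, which is why the traffic view was the faster way to spot the problem.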

2. Later we had all the data flowing through our intake system, but somehow the data was not reaching our customer-facing product data store. We began the arduous process of looking for where in the system we were dropping messages. In a distributed system, combing through logs, understanding errors and checking configs can be quite time consuming.

Again, I checked out Boundary and this time noticed that our production system was sending traffic to our staging database. Errggggh, that’s not good. We noticed that one process that had recently been completed (but had never run in the test pipeline) had been deployed with the staging db configuration. The fix was quick, five minutes, but the find was quicker. I can only imagine how long this would have taken to figure out, since there were no errors being generated and our application monitoring stats were properly incrementing!
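The kind of check that caught this can be sketched as a comparison of observed flows against each host's environment. This is a hedged illustration, not Boundary's API: the flow tuples, the host names, and the `ENVIRONMENT` mapping are all hypothetical.

```python
# Hypothetical mapping of hosts to environments.
ENVIRONMENT = {
    "prod-app-1": "production",
    "prod-db-1": "production",
    "staging-db-1": "staging",
}


def cross_environment_flows(flows, environment=ENVIRONMENT):
    """Return (source, destination) pairs whose endpoints sit in different environments."""
    suspicious = []
    for src, dst in flows:
        src_env = environment.get(src)
        dst_env = environment.get(dst)
        # Skip hosts we know nothing about rather than guess.
        if src_env is None or dst_env is None:
            continue
        if src_env != dst_env:
            suspicious.append((src, dst))
    return suspicious


observed = [
    ("prod-app-1", "prod-db-1"),
    ("prod-app-1", "staging-db-1"),  # the misconfigured process
]
print(cross_environment_flows(observed))
```

Because the misconfigured process raised no errors and its counters kept incrementing, a topology-level check like this is one of the few signals that would have flagged it.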

Ok, enough cheerleading. Boundary is a great platform but only one part of our tool suite that we use to understand what our production systems are doing. Now, if I could only have NewRelic reach that 1 second frequency we’d really be cooking…

 


Coming Soon: Active Flows

Posted on June 17th, 2013

We’re working on some great new features here at Boundary, one of which is an Active Flows metric for tracking the number of in-use connections or interactions between hosts.  Here’s a teaser of an updated Streams view with its supporting traffic table:

[Image: Active Flows Streams view]
[Image: Active Flows traffic table]

What’s this for?

The Active Flows metric counts the number of active flows (those sending packets or data) per time period. If the protocol being counted is connection-oriented (like TCP), this metric will tell you the total number of active connections. Likewise, if the protocol is message-oriented (like UDP), it will tell you how many clients are interacting with a datagram service.
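As a rough illustration of that definition, the following sketch counts the distinct flows that sent at least one packet in each time bucket. It is an assumption about how such a metric could be computed, not Boundary's implementation; the packet record layout and the `active_flows_per_period` name are hypothetical.

```python
from collections import defaultdict


def active_flows_per_period(packets, period_seconds=1):
    """Count distinct flows that sent at least one packet in each time bucket.

    Each packet is (timestamp_seconds, protocol, src, src_port, dst, dst_port).
    """
    flows_by_bucket = defaultdict(set)
    for ts, proto, src, sport, dst, dport in packets:
        bucket = int(ts // period_seconds)
        # A set keeps one entry per flow, however many packets it sent.
        flows_by_bucket[bucket].add((proto, src, sport, dst, dport))
    return {bucket: len(flows) for bucket, flows in sorted(flows_by_bucket.items())}


packets = [
    (0.1, "tcp", "10.0.0.1", 40000, "10.0.0.9", 80),
    (0.4, "tcp", "10.0.0.1", 40000, "10.0.0.9", 80),  # same flow, same bucket
    (0.7, "udp", "10.0.0.2", 5353, "10.0.0.9", 53),
    (1.2, "tcp", "10.0.0.1", 40000, "10.0.0.9", 80),
]
counts = active_flows_per_period(packets)
print(counts)
```

The same counting works for TCP and UDP here because the flow key does not depend on any connection state.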

Why would this number be higher than I think it should be?

You may have some misbehaving applications or clients that are flapping or setting up new connections in a tight loop. We treat the normal connection setup and teardown as the beginning of a flow, and do not wait for the connection to be established before reporting on it.
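To see why a reconnect loop inflates the number, consider a client that opens a new connection from a fresh ephemeral port on every attempt. In the hypothetical sketch below, each attempt counts as a flow as soon as its setup is seen, whether or not the connection is ever established. The tuple layout, addresses and port numbers are illustrative assumptions.

```python
def count_flows(connection_attempts):
    """Count distinct flows, keyed by (src, src_port, dst, dst_port)."""
    return len(set(connection_attempts))


# A well-behaved client reuses one long-lived connection.
steady = [("10.0.0.1", 40000, "10.0.0.9", 80)] * 50

# A flapping client reconnects 50 times, each from a new ephemeral port.
flapping = [("10.0.0.1", 40000 + i, "10.0.0.9", 80) for i in range(50)]

print(count_flows(steady), count_flows(flapping))
```

Both clients made 50 attempts against the same service, but the flapping one shows up as 50 flows rather than one.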

Can I set alerts / reports / dashboards on active flows?

Getting active flows in the streaming graph is the first step.  We’ll be adding them to all the other pieces shortly.

We’re currently beta testing this feature with select customers. If you’re interested in checking it out, please let me know: michael@boundary.com


Surviving an Aggressive DDoS Attack

Posted on June 17th, 2013

DNSimple shares how Boundary “saved our bacon” during a sudden disruption in service.

We love it when our customers tell us how our solution helped them solve a problem, and this week we got a note from Darrin Eden, senior software engineer and operations specialist with our long-time customer, DNSimple, a hosted DNS provider with around 8,000 customers:

“Boundary saved our bacon yesterday. DNSimple is under a DDoS-style attack and without Boundary I’m pretty sure we would be flying blind.”

We called Darrin to get a little more insight into what happened during the attack and how Boundary helped. Apparently, distributed denial of service (DDoS) attacks are nothing new for DNSimple, and Boundary has helped in the past to identify when these attacks are occurring. “In DNS land if you are a managed provider, you are often the target of malicious attacks,” Eden explains. The attack typically has a distinct pattern in the Boundary dashboard, which Eden can quickly identify.

“We knew how to react to this style of attack because we had seen it before, but this time was different,” he recalls. “Boundary showed a different pattern and in fact, our software actually failed, and all of our customers went offline.” This was the first time the company had suffered such an outage that affected customers.

Saving customers hours of downtime

Eden and his teammates quickly began researching the cause of the issue and were able to start working on a fix almost immediately. Without Boundary, it could have taken 10 times as long to discover the source of the issue and where to focus resolution efforts, he adds. “The time from the start of the event to the initial resolution was an hour, and then it took a few more hours to come to complete resolution.”

Naturally, the three-person company was quickly flooded with customer emails, Twitter messages and calls as the disruption began. Yet because of the granular information that Eden was getting from Boundary, he could provide Twitter updates every 10 or 20 minutes so customers knew what was going on and how the restore was progressing. That level of detailed insight was critical, he says.

“Without Boundary, there’s a chance we could have been dead as a company.  If a web hosting company is down for hours and is not providing regular updates to their customers, that usually results in a mass exodus of customers.”

Twitter updates proved invaluable

Since Boundary allowed the company to restore normal operations to its customers in around an hour, and during that time provided useful updates, customers didn’t panic and jump ship. In fact, he says, since the attack occurred a little over a week ago, DNSimple has actually seen an uptick in new sign-ups.

“I am sure we lost some customers, but in the end our communications on Twitter was actually really good publicity for us,” he says.

While there’s no foolproof way to prevent denial of service attacks, this type of attack won’t happen again since DNSimple now knows the pattern and its software has been modified accordingly to respond without failing. More importantly, Eden and team gained confidence that in the event of any future significant outage, they’ll be able to find the root cause quickly and keep customers apprised of the situation.

“When something is looking wonky in our metrics, we always look at Boundary first,” Eden says.

 

 

 


Joe Panettieri Talks Cloud IaaS and its Many Players

Posted on June 14th, 2013


Joe Panettieri is executive VP and editorial director of Nine Lives Media, a division of Penton Media. He oversees The VAR Guy, Talkin’ Cloud, and MSPmentor.net, the leading online communities for VARs, CSPs and MSPs. Follow Joe on Twitter: @joepanettieri.

Boundary: How has the cloud IaaS market changed in the last 12 months in terms of both demand and vendor positioning?

Panettieri: We have reached a tipping point in the channel for cloud. Even though we are a decade into this journey, thanks to SaaS and public cloud companies, integration with the channel is really just beginning. The large cloud providers are now acknowledging that their partner ecosystems need to push beyond ISVs to VARs, integrators and MSPs. Smaller startups and nimble IaaS providers are starting up cloud partner programs from the very beginning. It’s not just an agent model with a one-time fee but a true model for generating recurring revenues, and to become the partner of record and really own the customer relationship.

B: How does this shift help end customers?

Panettieri: It’s a great thing. CIOs have discovered that they had multiple cloud services running without their approval. CMOs and many others are activating services and billing the employer every month, and if you roll it all up it’s a big expense for the organization. So how do you regain control? CIOs and other corporate leaders are reaching out to their channel partners and asking them to bring some order to their cloud strategy. They’re looking for a cloud broker who recommends and gives a seal of approval to cloud providers and who can be the one throat to choke. This is giving customers and CIOs clarity in terms of consolidating cloud providers and getting expenses under management.

B: Let’s talk about the big players. It seems like Microsoft Azure doesn’t get much respect. How do you rate the big ones?

Panettieri: To start with, Amazon (AWS) is an interesting beast because so many channel-focused software companies are now going to the cloud and are launching their services in the Amazon cloud. Early on, Microsoft focused Azure on ISVs and corporations rather than channel partners and VARs. The channel play was initially focused on Hosted Exchange and SharePoint. In the last six months, however, we are seeing Microsoft connect the dots between Azure and Office 365. Channel partners are launching Office 365 extensions in the cloud. So beyond reselling Office 365 they layer on additional management and maintenance services, and they can make last-minute changes for customers. A company called 365Command makes tools for managing Office 365 on Azure. Quosal is another company and they focus on customer approval software to help VARs close business more quickly, in Azure. Google Compute Engine (GCE) is still in its early days. From my last check, which was about a year ago, there are about 6,000 Google Apps resellers. I think we’ll see the trend when they begin to offer GCE services to customers. Finally, the Rackspace channel is committed as well. Rackspace began a partner advisory council in 2012 that has partners with a range of expertise, and the company has an added interest in OpenStack, an open source cloud platform. If I were a cloud reseller I would be watching Rackspace closely with regard to OpenStack.

B: What are the gaps right now?

Panettieri: The biggest gap is localization. If I am working with a big cloud provider, can they work well with smaller VARs and their smaller customers? We are seeing upstart providers hire seasoned channel execs who know how to work with local partners. ViaWest hired Rackspace execs to get their channel program going. Tier 3 has a channel program where they are trying to work on a local level with VARs and MSPs and are gaining some momentum there.

B: How do you see the segmentation of the IaaS market in the coming 12 months, and what will this bring to customers?

Panettieri: The cloud market will begin to resemble the retail market here in the United States. We will see big horizontal, massive players like Target that compete on price and offer great inventory. The large cloud providers are pretty similar and they will compete on price, but back to the retail model, there will be boutiques in various verticals such as healthcare. More and more cloud providers will focus on vertical expertise.

B: Due to the cloud, are we now experiencing a renaissance for the sector of IT management and monitoring technologies?

Panettieri: A lot of the management and monitoring tools grew up on the corporate IT side to manage servers and apps inside the network. Then MSPs began to offer that service. But now we are shifting again, to cloud-based management tools. There are three areas of applications: first, for managing SaaS apps on third-party cloud services; second, for managing internal systems hosted in the cloud; and third, for monitoring third-party public cloud services. These tools will alert you when AWS begins to slow down, for instance. There’s a lot of funding right now in that last area. As well, there will be some consolidation and acquisition of all of these tools. Avanade, a large MSP and Microsoft consulting firm, just acquired a cloud monitoring and management company.

 


Hangover 3, the Tech Version

Posted on June 14th, 2013

The official start to summer is almost here, which can only mean one thing: it’s time to beat the heat and hit the movies. After reviewing the lineup of new releases, we got to thinking that a few of the leading storylines seemed akin to the drama playing out in big tech today.

Epic: Microsoft’s Ultimate Battle… For survival

This lush animated movie with adorable talking slugs masquerades as the ultimate fight between good and evil. Will the smart and chatty heroine Mary Katherine (Microsoft) prevail against the evil spider Queen Tara (Google, Apple, the entire US press force…) to save her world?

After Earth: will Google be there?

Will and Jaden Smith star in this film about a futuristic Earth full of darkness and danger: Google Glass to the rescue? Rest assured, Google will come up with something cool or weird to battle the demons still alive on this barren planet, in about 1,000 years.


Owen Wilson and Vince Vaughn, using Web video chat for the first time as the Google interns.

The Internship: Hiring Them (Marissa Mayer) was A Brilliant Mistake

Yahoo’s still fresh CEO Marissa Mayer is somewhat like Vince Vaughn and Owen Wilson in this spoof-like film about a Google internship unwittingly thrust upon two “old” and out of touch sales reps. People want to hate them/her, but they can’t. Will they/she have the most profound impact on the company since its founding?

Now You See Me

This film portrays a band of big-stage entertainers practicing the fine art of illusion while stealing millions of dollars, kind of like Apple’s latest tax evasion strategies overseas. Will the deceivers own up to their acts, or will they continue to be the heroes of the young, beautiful and gullible?

Fast & Furious 6

Vin Diesel’s latest action-packed thriller is like Twitter: the company’s moving so quickly, teenagers are running screaming from Facebook to jump into this once geeky, social media race car.

Star Trek (Tumblr): Into Darkness

Star Trek’s latest film finds the crew of the Enterprise facing “an unstoppable force of terror from within their own organization, that has detonated the fleet and everything it stands for, leaving our world in a state of crisis.” Oh Tumblr, as you peer into the dwindling remnants of your bank account, Yahoo has stepped in to save you, but which way is salvation? Or do both paths lead to unfathomable darkness in the end?

Facebook: Hangover 3?

Facebook, we still love you, but really, it’s time to stop these antics. First, your IPO: enough said. Then, the teenagers who made you famous are now leaving you, for Twitter! And now, more disappointing news on the earnings front? Like moviegoers everywhere, investors can only pray there’s not a Hangover 4.


Leo Capri as “Gatsby”

The Great Gatsby: Dream on, Larry

Much like the charming, enigmatic Jay Gatsby, Larry Ellison is on top of his game. He still seems to think that he can win the world by buying it. So far, this self-serving, hedonistic strategy has worked reasonably well, but let’s get real. The world is changing, Larry, and you are not changing with it. People think you’re old school, inefficient, overpriced and inherently selfish. Eventually, you and your lavish, overwrought software company are going to crash right into the San Francisco Bay. But for now, live hard and enjoy the ride. Why not?

Iron Man 3: Free Michael Dell!

Dear Michael: Hang tight, man, you’re nearly free of those irritating, overbearing, power-hungry investors. Once your chains to the public markets have been broken (it’s only a matter of time), you can at last do whatever you wish and believe is best to win again in this crappy PC market! Who cares that your Q1 really sucked? You’ve got gumption, and that is what matters for the Iron Man: “With his back against the wall, Stark (Michael) is left to survive by his own devices, relying on his ingenuity and instincts to protect those closest to him. As he fights his way back, Stark discovers the answer to the question that has secretly haunted him: does the man make the suit or does the suit make the man?”

 

 

 

 
