
Robin Bloor talks Big Data and Evolution of Performance Monitoring

Posted on May 22nd, 2013


Robin Bloor

We chatted with Robin Bloor (@robinbloor), co-founder and chief analyst of The Bloor Group. He has more than 30 years of experience in data and information management and is the author of several books, including “The Electronic B@zaar: From the Silk Road to the eRoad.”

What is the presiding model for enterprise application architecture today?

I don’t think there is any model emerging or any consensus yet, because we are in a time of major change. The main thing that has happened is the introduction of RESTful interfaces. While this trend has been fairly efficient, my conviction is that we are moving toward an event-driven architecture and real-time operations. Real-time can be a bad word, but some things would be very good for organizations if they happened faster. The new stuff is around getting a lot of event data in hand before the transaction occurs. The things that precede a financial transaction are being captured, and it then becomes possible to leverage that information to provide better service to customers or upsell them. That is a customer perspective, but this can apply to every business process. You can better direct a process if you know the direction it’s headed in, by getting intelligence fast enough to respond in real time.


 



What is the difference between transactions and events when it comes to monitoring?

Events are pretty much anything that happens that yields useful data. If someone sends a positive or negative tweet, you can either promote or suppress it. People today talk a lot about the instrumentation of the world, or the Internet of Things, such as a car updating the driver about its status all the time so that the driver knows if a failure is about to happen or if the car needs an oil change. You can extend this concept to all of transportation, utilities, and the movement of oil, and there is going to be an explosion of that kind of data. Embedded processes are growing at an exponential rate. This is going to be the next revolution and the source of all the Big Data volumes. It’s already happening to a certain extent through the analysis of log files. The end goal for the organization is to leverage the data, perhaps to improve customer service and be very competitive, or to boost sales. It depends on what the organization sees as the opportunity, but the opportunity is there.

Where is the IT industry today in terms of putting Big Data practices to use?

There is a mixed story about that. Most of the technology around Big Data is very recent. It’s only been in the last year or two that Hadoop has generated a lot of chatter on the Web, for instance. We are still in the first wave of experimentation and innovation, which is why there is so much diversification in database products and management products. What happens next is that the products will become more standardized, and we will have Moore’s Law for at least the next eight years. The hardware will continue to accelerate, and software will backfill. We haven’t really had efficient software until now, but the new applications take advantage of parallel operations, which is pulling down latencies, and that is a positive change. It’s still very expensive, however, to organize and analyze a petabyte of information. As the cost of storage falls low enough, we will see the advent of the Big Market. This will be where everyone is conversant with Big Data technologies and they are no longer causing a lot of problems for users. This isn’t happening yet.

So what’s the timeframe for that in your prediction?

It’ll probably be another six years before we transition from pioneering technology to the mainstreaming of the technology. Then it will be another six years after that before Big Data is common to everyone and companies all over the place are gathering lots of data and doing meaningful things with it. The big problem is that Big Data is not a fixed target. Today, hundreds of terabytes is considered Big Data. But in six years, large companies will be processing exabytes of data, and it won’t be easy to do.

How are Big Data and performance management intersecting?

I think that is bound to happen: APM solutions will need to process huge volumes of data. When trying to handle these high volumes, companies are trying to manage points in the flow. Data in motion is the trend, so management software must be able to perform at high capacity and respond to data events in flight. We’re not really looking at discrete events anymore but at what is happening in the data flow, and that is much more onerous. It requires smarter software.

Will today’s “smarter” startups soon make the big players, like BMC, irrelevant?

The large companies are not innovative. CA, Cisco, BMC, HP and IBM don’t have the corporate environment to make internal startups work well and they really don’t want to do it. About 90% of startups fail and lose everything in the first year, so they’re waiting until the startups fight their way out of the swamp. And then they’ll just buy one. Even if the big companies pay top dollar for something very small, they can multiply the revenues quickly because they have a large and established sales force. So they don’t need to innovate. On the other hand, some of these startups will hold their own, such as Splunk, which is now a public, billion-dollar company.

 


The DNA of APM: Event to Incident Flow

Posted on May 21st, 2013

By Larry Dragich


Larry Dragich

This article is the corollary to “The Anatomy of APM,” which outlines four foundational elements of a successful APM strategy: Top Down Monitoring, Bottom Up Monitoring, Reporting, and Incident Management. Here I provide a deeper context on how the event-to-incident flow is structured.

It is the correlation of events and the amalgamation of metrics that bring value to the business by way of dashboards and trending reports, and it is the way the business interprets the accuracy of those metrics that determines the success of the implementation. If an event occurs and no one sees it, believes it, or takes action on it, APM’s value can be severely diminished and you run the risk of owning “shelfware.”

Overall, as events are detected and consumed by the system, automation is the lifeblood of an APM solution, ensuring that the pulse of the incident flow is a steady one. The goal here is to show a conceptual view of how events flow through the environment and eventually become incidents. At a high level, the Trouble Ticket Interface (TTI) will correlate the events into alerts, and alerts into incidents, which then become tickets, enabling the Operations team to begin working toward resolution.

Figure: Dragich Event Flow

The event flow moves from the outside in, and then from the center to the right. 

Here’s how it works:

  • The outside blue circles represent the monitoring toolsets that collect information directly from the infrastructure and the critical applications.
  • The inner green (teal) circles represent the toolsets the Enterprise Systems Management (ESM) team manages, and are where most of the critical application thresholds are set.
  • The dark brown circles are logical connection points depicting how the events are collected as they flow through the system. Once the events hit this connection point, they go to three output queues.
  • The red circles on the right are the Incident Output queues for each event after it has been tracked and correlated.

The transformation from event to incident is the critical junction where APM and ITIL come together to provide tangible value back to the business. If you take only one thing away from this picture, it should be the importance of managing the strategic intent of the output queues, because this is the key to managing action, going red to green, and trending.
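To make the flow concrete, here is a minimal sketch in Python of the event-to-alert-to-incident roll-up described above. It is an illustration under stated assumptions, not the TTI itself: the `Event` fields, the 60-second correlation window, the "one incident per affected source" rule, and the `INC-` ticket numbering are all hypothetical choices, and the three output queues are not modeled.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """A raw monitoring event (all field names are hypothetical)."""
    source: str       # component that raised the event
    check: str        # threshold or probe that fired
    timestamp: float  # seconds since an arbitrary start


def correlate_events(events, window=60.0):
    """Fold repeated events for the same (source, check) into one alert
    when each arrives within `window` seconds of the previous one."""
    alerts = []
    open_alert = {}  # (source, check) -> index of the latest alert
    for ev in sorted(events, key=lambda e: e.timestamp):
        key = (ev.source, ev.check)
        idx = open_alert.get(key)
        if idx is not None and ev.timestamp - alerts[idx]["last_seen"] <= window:
            alerts[idx]["count"] += 1
            alerts[idx]["last_seen"] = ev.timestamp
        else:
            open_alert[key] = len(alerts)
            alerts.append({
                "source": ev.source,
                "check": ev.check,
                "count": 1,
                "first_seen": ev.timestamp,
                "last_seen": ev.timestamp,
            })
    return alerts


def alerts_to_incidents(alerts):
    """Roll alerts up into one incident (ticket) per affected source."""
    by_source = defaultdict(list)
    for alert in alerts:
        by_source[alert["source"]].append(alert)
    return [
        {"ticket_id": f"INC-{n:04d}", "source": source, "alerts": grouped}
        for n, (source, grouped) in enumerate(by_source.items(), start=1)
    ]


# Example data: five raw events from two hypothetical hosts.
events = [
    Event("web01", "cpu", 0.0),
    Event("web01", "cpu", 30.0),
    Event("web01", "disk", 40.0),
    Event("db01", "latency", 50.0),
    Event("web01", "cpu", 200.0),
]
alerts = correlate_events(events)
incidents = alerts_to_incidents(alerts)
for incident in incidents:
    print(incident["ticket_id"], incident["source"], len(incident["alerts"]))
```

In this example the five raw events collapse into four alerts and two tickets, so the Operations team sees one ticket per affected host rather than every individual event. Where to draw the correlation window and what to group incidents by are exactly the kinds of choices the article argues will determine success.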

Conclusion

It is not necessarily the number of features, or the technical stamina of each monitoring tool to process large volumes of data, that will make an APM implementation successful. It is the choices you make in putting the tools together to manage the event-to-incident flow that determine your success. Timeliness and accuracy in this area will help you gain credibility and confidence with each of the constituents and business partners you support.

Related Links:

If you have deeper questions about APM and are looking to connect with thought leaders and creative thinkers in the APM technology space, join the Application Performance Management (APM) Strategies Group on LinkedIn.

 


Cloud Environments: Dynamic, Sometimes Turbulent

Posted on May 20th, 2013


Cloud computing has become pervasive. And while some cloud environments are simple enough for casual consumers to use, the cloud has not done away with IT, as it was once widely expected to do. Instead, the cloud has taken center stage among the challenges facing IT operations and DevOps teams.

Cloud environments are dynamic (i.e., varied and highly fluid). Though people still speak of “the cloud,” there is a wide range of clouds built around differing platforms and supported languages. These clouds may be public, private, or hybrid, each with its own distinct constraints. Clouds may host applications, data, or supporting services such as infrastructure. Even control systems for managing cloud environments can be hosted in the cloud.

Everything as a Service

Strictly speaking, cloud computing in and of itself is not a technology model. It is a business model, offering capabilities for rent rather than purchase. Call it Everything-as-a-Service (EaaS). This EaaS business capability is itself built on top of two technologies, the Web and virtualization.

Virtualized environments are inherently fluid and dynamic, even when locally hosted. Another layer of fluidity is added by the Web, which poses problems of network behavior, including latency issues, that are not under the full control of the user.

The infrastructure layer of the environment is particularly challenging from an operational and DevOps perspective, because new-generation applications interact with their environments in much more complex ways than older applications did. IT operations must thus be able to manage cloud infrastructure with monitoring tools ranging from Cacti and Graphite to Nagios.

The Challenge of Managing EaaS

Finally, while the cloud itself is not a technology model, the cloud adds layers of dynamism – and resulting complexity challenges – that IT operations and DevOps teams must work with. These teams do not “own” the environments in which their applications, data, or other operations swim.

Just to enrich the challenge, many of these teams are themselves part of EaaS: They are serving their end-user customers from the cloud. Those end users don’t want to hear about cloud complexities and challenges. They just want their services to run smoothly from the sky.

All of this means that managing cloud environments places heavy demands on IT operations and DevOps teams. Because the environments are dynamic, their behavior needs to be tracked in real time. Instant notification of IT events is required, so that issues can be dealt with before they turn into problems. Control systems must work smoothly in environments provided by AWS, VMware, Joyent, Rackspace, and other cloud vendors.

Legacy IT control systems struggle to handle these fast-moving, complex, and external environments. What IT needs are control systems that are adapted to the current-era management stack and that provide real-time context about complex environments without interfering with them. The good news is that a new generation of IT operations tools that meets these standards is now available, and these tools are capable of handling even turbulence in the clouds.

Image Source: Flickr


Dennis Callaghan on the Changing World of IT Monitoring

Posted on May 17th, 2013


Dennis Callaghan

Dennis Callaghan is a senior analyst on 451 Research’s Infrastructure Computing for the Enterprise (ICE) team, leading the firm’s coverage of application and Internet performance management, service-level monitoring and management, and IT asset and service management. Follow Dennis on Twitter at @DennisCallaghan.

What are the biggest challenges for small and large companies today in the area of IT operations management?

From the research we do, whether it’s about product trends or talking to customers, two things stand out. First, the nature of distributed IT environments means that you need a good discovery plan in place to find out what infrastructure underpins the applications, and where that infrastructure is located. Second, whether the application is running in traditional environments or in the cloud, you need to diagnose the performance issue and its impact on end users, meaning what they are actually experiencing.

What performance metrics are companies most interested in tracking these days?

Response time is a key measure. How long do you have to wait to get a response or result from doing a transaction or a task? It’s always important to look at backend metrics like CPU and I/O but mostly, you need to understand how users are affected. And if you’re running in the cloud, you really need to understand latency.
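As a rough illustration of what tracking response time can look like in practice, the Python sketch below times a task repeatedly and summarizes the samples by median and 95th percentile. It is only a sketch: `fake_transaction` is a hypothetical stand-in for a real user transaction, and the choice of 50 samples and of the 95th percentile is an assumption rather than a recommendation from the interview.

```python
import statistics
import time


def time_call(task, *args, **kwargs):
    """Run `task` once and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = task(*args, **kwargs)
    return result, time.perf_counter() - start


def summarize(samples):
    """Summarize response times (in seconds) by median and 95th percentile."""
    cuts = statistics.quantiles(samples, n=20)  # 19 cut points at 5% steps
    return {"median": statistics.median(samples), "p95": cuts[18]}


def fake_transaction():
    """Hypothetical workload standing in for a user transaction."""
    return sum(range(10_000))


samples = [time_call(fake_transaction)[1] for _ in range(50)]
summary = summarize(samples)
print(f"median={summary['median']:.6f}s p95={summary['p95']:.6f}s")
```

Reporting a high percentile alongside the median matters because the slowest responses are the ones users notice, and an average alone can hide them.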

With the cloud and distributed environments, have these metrics changed or increased in number?

The metrics themselves haven’t changed; it’s more that the infrastructure is different and the environment is more complex. A decade or so ago, companies were primarily looking at the app server, the Java server. Now they are also looking at network service levels, database performance, the storage array, and the web server, and they also need to get some decent end-user metrics at the browser level. So there are a lot of different areas that people are looking at to get a more complete view of app performance, and when there’s an issue, they have to be able to triage it and figure out how best to remedy it. The reason behind this drive for a more comprehensive view is partly the demand for instant results. As an example, Google search now serves up results when you start typing a word into the search box. Yet the larger issue is that you have so many different systems interfacing with each other today. A few weeks ago, an IT end user told me that he has several different systems that are exchanging process flows and he has no way to monitor the transaction performance between those systems, just within them. That is a challenge a lot of people are dealing with.

How are the tools evolving to meet these needs, and do you see a continuing need for many tools, a.k.a. a best-of-breed environment?

There are always going to be new issues for products to solve, and vendors tend to specialize in covering different layers of the infrastructure, so having five or six tools is very common. Companies also don’t want to trust their environment to just one vendor. People often want an integrated console to see all this information in one place, but still, using multiple consoles is pretty common. There have been standardization drives to allow performance management tools from different vendors to work together, but they haven’t really gone anywhere. WSDM (Web Services Distributed Management) hasn’t been heard from in almost seven years now.

Are large companies switching away from their legacy APM and monitoring tools?

On a grand scale, no, not really. It does happen when they are launching development projects with new technologies such as Ruby on Rails and PHP. But they are not throwing out systems from CA, BMC, HP or IBM, at least not yet. These one-off projects and proofs of concept with some of the newer vendors can certainly lead to that transition down the road.

Why not? Aren’t these tools poorly suited for new, distributed environments?

It has to do with the investment they’ve made in the legacy systems, but it also has a lot to do with the back-end applications that the tools are monitoring. Believe it or not, mainframes are still very common, especially in the Fortune 500. And the incumbent technologies are the best ones to monitor those environments. Yet the startups are beating incumbents at new-technology clients that are running modern, distributed environments. For instance, AppDynamics (a Boundary partner) is the main management tool for Hotels.com and also has Netflix as a customer. So that’s how the market is being segmented right now.

What area of innovation is hot right now in the space of application and network monitoring?

We are very high on the SaaS model. The new vendors are all SaaS-based. It is just so much easier to get up and running and maintain the software that way. Boundary does a lot of big data handling that is offloaded to the cloud, which is another wonderful benefit for a customer. Lots of VC dollars are being poured into this space right now and we are only at the tip of the iceberg. The Big 4 have all introduced new SaaS offerings because they have to in order to compete, but they are going to be playing catch-up for a while.


Boundary helps Netradar launch mobile performance app

Posted on May 16th, 2013

Netradar is a global mobile network measurement and analysis service and smartphone app designed by researchers at Aalto University in Finland. Netradar delivers data on mobile performance to users through a smartphone app (Android, iOS, Windows Phone, Symbian and MeeGo). The app has more than 30,000 installs to date. Users can see an aggregated map of network throughput across different carriers, along with various statistics about devices and networks. Arttu Tervo, a developer with Netradar, rebuilt the server architecture in early 2013, primarily using Amazon Web Services, to prepare for growth in users and features.

How Boundary Is Helping

Network planning and potential cost reduction: Tervo deployed Boundary to help make design decisions as he was rewriting Netradar. “Using Boundary helped me detect potential issues in the system where there would be over-use of the network,” he observes. “Seeing that ahead of time was very important so that we can optimize our system and save on our service costs.” He estimates that Boundary helped cut the internal data transfer needed to run the platform by 50 percent.

Development agility: Boundary is also helping the Netradar team develop rapidly by providing a simple way to test new features for performance readiness in the staging environment. “With the help of automated Capistrano deployment events to Boundary, we can see the effect of even the smallest change on network usage immediately,” Tervo remarks. “It hasn’t been this easy in my previous projects.”

“The most important benefit of Boundary to Netradar so far is the ability to detect possible performance issues before we make any changes in the production system. As we grow and scale our service around the globe, we will benefit from Boundary’s trending views showing how much data we transferred over our customer base, which will help us plan better for our AWS usage.” — Arttu Tervo, developer with Netradar

 
