
The Future of IT Operations with Automation and Real-Time Insights – with Troy Felix of BigPanda

This interview analysis is sponsored by BigPanda and was written, edited, and published in alignment with our Emerj sponsored content guidelines. Learn more about our thought leadership and content creation services on our Emerj Media Services page.

Modern IT operations are inundated with alerts from various monitoring tools, leading to alert fatigue among IT professionals. The constant barrage of notifications can desensitize teams, causing them to overlook critical issues. 

A study by researchers at CSIRO in Sydney, Australia, published in ACM Computing Surveys, highlights that alert fatigue is a significant challenge in Security Operations Centers (SOCs), often resulting in delayed responses to genuine threats. A Trend Micro survey cited in the same study found that 51% of SOC teams feel overwhelmed by alert volume, with analysts spending over 25% of their time handling false positives.

Additionally, research published in Computer Networks, a leading peer-reviewed journal by Elsevier known for rigorous research in networking and IT systems, indicates that excessive alerts, many of which are non-actionable, can overwhelm administrators, leading to decreased efficiency and increased risk of missing vital incidents.

According to the study, even a typical production system in cloud monitoring environments can generate between 10,000 and 20,000 metrics and several hundred distinct alerts. Compounding the problem, alerts are often labeled only as “critical” or “warning,” regardless of actual severity. Under such haphazard labeling, a trivial time-sync issue might carry the same urgency as a Denial of Service attack, making effective triage nearly impossible.

In more positive developments, research from the Distributed and Operating Systems Group, Technische Universität Berlin, Berlin, Germany, highlights that AI for IT Operations (AIOps) is emerging as a solution to the above challenges by leveraging machine learning and big data analytics to automate and enhance IT operations. AIOps platforms can process and analyze large volumes of data in real-time, enabling proactive identification of issues, predictive maintenance, and automated incident responses.

Emerj Editorial Director Matthew DeMello recently sat down with Troy Felix, Regional Vice President of Sales Engineering at BigPanda, to discuss how AIOps transforms IT operations by making teams more efficient and proactive.

Their conversation highlights AIOps’ role in reducing alert noise, empowering staff, and improving IT efficiency and productivity. This article examines two critical insights for IT leaders from their conversation:

  • Streamlining alert management with AI: Enabling IT teams to quickly analyze data, identify root causes, and automate remediation, reducing manual effort and improving efficiency.
  • Improving IT efficiency with AIOps: Implementing AIOps to shift IT teams from reactive to proactive, reducing alert noise, minimizing tickets, and empowering entry-level staff to handle complex tasks.


Guest: Troy Felix, RVP of Sales Engineering, BigPanda

Expertise: Sales, Virtualization, Technical Support

Brief Recognition: Troy is a seasoned professional in sales engineering.

Before his role at BigPanda, Troy held several positions at LogicMonitor, including Senior Manager of Sales Engineering for Managed Service Providers and Manager of Sales Engineering for the West region. Troy completed his Bachelor of Science degree in Biology at UC Santa Barbara in 1999.

Streamlining Alert Management with AI

Troy opens the podcast by explaining the traditional challenge IT teams face in managing and processing vast amounts of alert data. The volume of this data is continually growing, and it is challenging to distinguish what is meaningful (“signal”) from what is irrelevant (“noise”). While observability tools are crucial for capturing the data, the next critical step is determining what is essential.

Traditionally, IT teams would spend hours identifying and correlating issues to pinpoint the root cause. AI, however, can analyze vast amounts of data, utilizing historical context, enriched data, and information from systems like the CMDB (Configuration Management Database) or change data. It can make intelligent predictions about the root cause of the issue and determine which specific team or individuals should address it.
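The correlation step described above can be illustrated with a minimal Python sketch. It is not BigPanda's method: the `CMDB` mapping, the host and team names, the five-minute window, and the "earliest alert points at the likely owner" heuristic are all assumptions made for illustration, standing in for the learned correlation a real AIOps platform would apply.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Alert:
    host: str
    check: str
    timestamp: int  # seconds, relative to an arbitrary start


# Hypothetical CMDB extract: host -> (service, owning team).
CMDB = {
    "db-01": ("payments", "database-team"),
    "web-01": ("payments", "web-team"),
    "web-02": ("payments", "web-team"),
    "cache-01": ("search", "platform-team"),
}


def correlate(alerts, window_seconds=300):
    """Group alerts per service; a gap longer than the window starts a new incident."""
    by_service = defaultdict(list)
    for alert in sorted(alerts, key=lambda a: a.timestamp):
        service, _team = CMDB.get(alert.host, ("unknown", "unassigned"))
        by_service[service].append(alert)

    incidents = []
    for service, items in by_service.items():
        current = [items[0]]
        for alert in items[1:]:
            if alert.timestamp - current[-1].timestamp <= window_seconds:
                current.append(alert)
            else:
                incidents.append((service, current))
                current = [alert]
        incidents.append((service, current))
    return incidents


def probable_owner(incident_alerts):
    """Naive heuristic (an assumption): the earliest alert's team is the likely owner."""
    first = min(incident_alerts, key=lambda a: a.timestamp)
    return CMDB.get(first.host, ("unknown", "unassigned"))[1]


alerts = [
    Alert("db-01", "disk_latency", 100),
    Alert("web-01", "http_5xx", 130),
    Alert("web-02", "http_5xx", 145),
    Alert("cache-01", "evictions", 2000),
]
incidents = correlate(alerts)
for service, items in incidents:
    print(service, len(items), probable_owner(items))
```

In this toy data, four raw alerts collapse into two incidents, and the payments incident is routed to the database team because its earliest alert came from the database host.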

Troy tells the Emerj executive podcast audience that AI has eliminated the need for mass collaboration calls. Instead, it can identify the exact issue and the responsible party. In more advanced AI-driven operations, the platform can even automate remediation by invoking pre-established playbooks or workflows to fix the problem. Together, these capabilities have unlocked significant efficiencies for IT operations teams.
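Playbook-driven remediation of the kind Troy describes can be pictured as a lookup from a diagnosed root cause to a predefined action. The sketch below is purely illustrative: the root-cause labels, the `PLAYBOOKS` table, and the two placeholder actions are hypothetical, and the functions only return a description of what they would do rather than touching any real system.

```python
def restart_service(target):
    # Placeholder action: a real playbook would call an orchestration tool here.
    return f"restarted {target}"


def clear_disk_space(target):
    # Placeholder action: describes the step instead of performing it.
    return f"cleared temp files on {target}"


# Hypothetical mapping from diagnosed root cause to a pre-established playbook.
PLAYBOOKS = {
    "service_down": restart_service,
    "disk_full": clear_disk_space,
}


def remediate(root_cause, target):
    """Run the matching playbook, or fall back to human escalation."""
    playbook = PLAYBOOKS.get(root_cause)
    if playbook is None:
        return f"no playbook for {root_cause}; escalate to a human"
    return playbook(target)


print(remediate("disk_full", "db-01"))
print(remediate("memory_leak", "web-01"))
```

Keeping an explicit fallback for unknown root causes reflects a common design choice: automation handles the well-understood cases, while anything unrecognized still reaches a person.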

Troy highlights that heavily regulated industries like financial services and banking see the most success with AIOps due to strict compliance, penalties for downtime, and tight SLAs. Healthcare organizations also benefit as AIOps ensures uninterrupted operation for life-saving applications like EMRs and EHRs. 

Additionally, he mentions that managed service providers, who deliver IT services to numerous enterprises, are rapidly adopting AIOps to scale their operations efficiently and manage the complexities of providing services across a broad client base.

Improving IT Efficiency with AIOps

Troy further explains that implementing an AIOps platform isn’t just about installing the technology and then feeding it alerts. It’s a comprehensive approach that also involves changing how an organization works internally, especially within IT teams. It’s about anticipating alerts, responding effectively, and dealing with operational events in a new, more efficient way.

He emphasizes the importance of aligning people, processes, and technology. Organizations need to transform their approach to handling IT operations by establishing KPIs like ‘Mean Time to Repair’ and benchmarking them to track improvement. In this area, Troy notes that AI can drive down what he calls “noise”-related alerts.

He goes on to assert that AI can decrease alert noise by up to 95%, helping reduce the overall number of tickets submitted to IT service management systems like ServiceNow.
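To make the benchmarking idea concrete, here is a small worked example of the two measures mentioned above, noise reduction and Mean Time to Repair. All of the numbers are invented for illustration and are not BigPanda results; the function names are likewise assumptions.

```python
def noise_reduction(raw_alerts, incidents):
    """Fraction of raw alerts that no longer reach the team as separate items."""
    return 1 - incidents / raw_alerts


def mean_time_to_repair(durations_minutes):
    """Average repair time across a set of resolved incidents."""
    return sum(durations_minutes) / len(durations_minutes)


# Illustrative figures only: 10,000 raw alerts consolidated into 500 incidents.
reduction = noise_reduction(raw_alerts=10_000, incidents=500)

# Illustrative repair times (minutes) before and after a hypothetical rollout.
baseline_mttr = mean_time_to_repair([120, 90, 150])
current_mttr = mean_time_to_repair([60, 45, 75])
mttr_improvement = 1 - current_mttr / baseline_mttr

print(f"noise reduction: {reduction:.0%}")
print(f"MTTR: {baseline_mttr:.0f} min -> {current_mttr:.0f} min ({mttr_improvement:.0%} better)")
```

Recording the baseline before any rollout is what makes the later comparison meaningful, which is the point of benchmarking the KPI first.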

He highlights a significant shift from a reactive, or “firefighting,” approach to IT problems to a proactive one. With AI, the system can detect patterns in event data and warn IT teams of potential issues before they escalate. For example, AI can predict when a series of events might lead to a significant issue, such as a critical application downtime, and alert the team to address it before it becomes a larger problem.
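A very simple stand-in for this kind of early warning is a sliding-window burst detector: raise a warning the first time the number of related events within a recent window reaches a threshold. The window length, the threshold, and the function name below are assumptions, and a real AIOps platform would rely on learned patterns rather than a fixed count.

```python
from collections import deque


def early_warnings(event_times, window_seconds=600, threshold=3):
    """Return the timestamps at which the trailing-window event count first reaches the threshold."""
    recent = deque()
    warnings = []
    armed = True  # warn once per burst, then re-arm when the count drops again
    for t in sorted(event_times):
        recent.append(t)
        # Drop events that have fallen out of the trailing window.
        while t - recent[0] > window_seconds:
            recent.popleft()
        if len(recent) >= threshold:
            if armed:
                warnings.append(t)
                armed = False
        else:
            armed = True
    return warnings


# Four events close together, then two isolated events much later.
warning_times = early_warnings([0, 100, 200, 250, 5000, 5100])
print(warning_times)
```

With these inputs the detector warns once, at the third event of the initial burst, and stays quiet for the two later events because they never reach the threshold.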

He also states that AIOps is a journey that begins with collecting all relevant data into the system, enabling AI to learn and make intelligent correlations. This data can include everything from configuration management databases to change data and even spreadsheets still used in some enterprises. The more data the AI has, the smarter and more effective it becomes at understanding complex IT environments and predicting issues before they occur.

In the end, Troy explains the broad benefits of implementing an AIOps platform, highlighting how it positively impacts every level of an organization. 

  • For CIOs: They benefit because the AIOps platform helps meet business objectives, such as cost reduction and improved uptime.
  • For IT Leaders: Managers benefit from having high-performing teams that are not overworked, enabling them to do productive and impactful work without burning out.
  • For Frontline Workers: Those in the trenches, handling triage and problem resolution, are more effective and impactful because they have more information at their disposal. AI enables entry-level staff (L1) to handle tasks that previously required higher-level expertise (L2, L3), providing detailed insights, root cause analysis, and specific guidance for remediation.

Troy also highlights the mental and operational benefits of AIOps, including reducing noise in alerts, which prevents unnecessary work and burnout. For example, false positives — or alerts that are not real problems — can be highly demotivating, especially when they disrupt workers at inconvenient times. AI helps reduce false positives, making the workflow healthier and more productive: 

“Perhaps most importantly, your senior architects and engineers are no longer stuck in constant firefighting mode. Instead, they’re free to focus on business-facing initiatives that drive real value and move the organization forward — wherever those priorities lie.”

Troy Felix, RVP of Sales Engineering, BigPanda
