The Fundamentals of Enterprise Data Fabric – Unlocking the Value in Enterprise Data – with Daniel Hernandez of IBM

This article has been sponsored by IBM, and was written, edited, and published in alignment with Emerj’s sponsored content guidelines.

It’s a problem that almost all enterprise AI projects face: accessing and governing data. Today’s AI project teams must overcome data silos, deal with copies and permutations of data, and face other challenges that complicate AI project goals and objectives.

Companies often copy their data in order to gather it in a single place, but that’s expensive and can lead to data security and compliance issues during the life cycle of the data. However, reasons still exist to consolidate that data. A data fabric can be an architectural alternative that many companies can pull from their toolbox and put to use in allowing them to:

Access the data in place
Manage the life cycle of the data
Use automation to move the data

In this second episode of a three-part series, we examine a long-term strategic way to think about data strategy and explore some use-cases that IBM is implementing today. Our guest this week is Daniel Hernandez. He is the General Manager of the IBM Data and AI business unit, which is home to Watson, Cloud Pak for Data, Planning Analytics + Cognos, OpenPages, Db2, Netezza, DataStage, and Informix. Daniel oversees the unit’s P&L, operations, strategy, products, support, services, marketing, and sales.

IBM provides integrated solutions and products that can leverage its data, information technology, and expertise across a broad ecosystem of partners and alliances. IBM’s hybrid cloud platform and AI technology and service capabilities support their clients’ digital transformations and draw from the experience and expertise of its globally recognized research organization.

IBM generated $73.6 billion in revenue for the year ended December 31, 2020, according to its 10-K.

We cover three distinct topics in this 40-minute interview:

The data fabric concept and how IBM deploys this approach
How to bring the data fabric concept to life within your enterprise
How to communicate to leadership the value of data infrastructure and a data fabric

Listen to the full episode or skim our interview takeaways and the full transcript below:

Guest: Daniel Hernandez – General Manager, IBM Data and AI

Expertise: Leads IBM’s Data and AI business unit, which is dedicated to helping clients become data-driven in everything they do. Oversees its P&L, operations, strategy, products, support, services, marketing, and sales.

Brief Recognition: Before becoming IBM’s General Manager of Data and AI in April 2020, Daniel held various positions of increasing responsibility since joining IBM in 2007. Earlier in his career, he was a director of engineering at GlobeRanger, and a programmer at EXE Technologies.

Key Insights

The data fabric is an architectural alternative to data consolidation. The concept of the data fabric is guided by three ideas: accessing the data in place, managing the entire lifecycle of the data in a distributed way, and using automation to offer a convenient means to move the data.
AI goes beyond just automation. The C-suite often sees AI as automation and efficiencies, but AI goes beyond that. Built on a data fabric, AI can provide new ways to report and to find connections and similarities in the data, improving BI and unlocking new capabilities.

Full Interview Transcript

Daniel Faggella: So Daniel, I’m glad we’re able to have you with us here on the show and today we’re going to be diving deep on the topic of data fabric. Certainly a newer term for many of the listeners here in our audience. I wonder if you could start us off with how you define the term data fabric for business listeners?

Daniel Hernandez: So, let’s talk about the problem.

Faggella: Sure.

Daniel Hernandez: If you are trying to put data to work, whether it’s for data science so that you could train a model to do a prediction (it could be customer segmentation, trying to have an attractive promotion and matchmaking those to the customer segments that you’ve analyzed) or just doing business analytics, trying to understand critical insights that help you understand the performance of your business. Today, the majority of our customers are copying data from wherever it originates and consolidating it into a single place. The problem with that is it’s expensive. It’s proliferating data that then causes all sorts of nasty ripple effects: data quality issues that, if you remediate them in one place, are hard to remediate in another. You have governance and compliance problems, not to mention security issues, because every single copy of this thing needs to be protected. In some cases, especially if it’s PII, it can’t be in certain locations in order to deal with compliance obligations.

Daniel Hernandez: And so there’s just a whole host of problems that are triggered by copying data and attempting to consolidate it into a single place. So typically when we talk about consolidating, we’re saying things like put it in a cloud data warehouse, put it in a data lakehouse, and then the whole game gets even more perverse and complicated when you copy data into a data lake only to then consolidate data into a data warehouse, only to then extract it and put it inside of a sandbox for data science. So, the copy-and-consolidate answer to the putting-data-to-work problem has a host of problems associated with it that the data fabric is attempting to solve. So, against that problem, what the data fabric says is there’s a better way to put data to work. Number one, access data, if you can, in the place that it exists without copying it.

Daniel Hernandez: Number two, manage the entirety of the life cycle of the data in a distributed way while essentially managing the governance and policies of it. And then number three, if there’s good reason actually to copy data and consolidate it into, say, an authoritative source, like a data fabric, make it easy through automation to onboard that data. So the data fabric is an architectural alternative to the consolidate answer that most companies have had in their toolbox for putting data to work. And it’s just guided by those three ideas: accessing data in place, managing the entire life cycle of the data in a distributed way, or in a distributed way with central governance, and then using automation to offer convenient means to move the data if you want.
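
To make the first two ideas concrete, here is a minimal Python sketch of reading data where it lives while applying one central policy. It is an illustration under stated assumptions, not IBM's implementation: the `Source` class, the `read_in_place` function, the role names, and the sample records are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Row = Dict[str, str]


@dataclass
class Source:
    """A system that keeps its own data; `fetch` reads it where it lives."""
    name: str
    fetch: Callable[[], List[Row]]


# Central governance: one policy, applied to every source.
# Hypothetical rule: only the "steward" role may see columns tagged as PII.
PII_COLUMNS = {"email"}


def read_in_place(source: Source, role: str) -> List[Row]:
    """Query the source directly and mask PII for unprivileged roles."""
    rows = source.fetch()  # nothing is copied into a second store
    if role == "steward":
        return rows
    return [
        {key: ("***" if key in PII_COLUMNS else value) for key, value in row.items()}
        for row in rows
    ]


# Two hypothetical systems of record that are never consolidated.
crm = Source("crm", lambda: [{"customer": "Acme", "email": "ops@acme.example"}])
billing = Source("billing", lambda: [{"customer": "Acme", "email": "ap@acme.example"}])

analyst_view = [read_in_place(s, "analyst") for s in (crm, billing)]
steward_view = [read_in_place(s, "steward") for s in (crm, billing)]
```

The design point is that the masking rule lives in one place and follows the data to every source, rather than being re-implemented for each copy.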

Faggella: Got it. Okay. And you mind if we poke into this just to make some analogies, make this click for people. So, probably everybody listening in, in an enterprise environment can resonate with this. And even people who are just thinking about tech and their very personal lives, we’ve all had an excel sheet that we sent around that had sales numbers or customer service info or whatever. And then everybody was excited about it, brainstormed about it, talked about it. We came together and realized that Steve deleted the third tab. Jacob’s been updating the numbers every week. And so his stuff is totally different than everybody else’s.

Faggella: And this is what poured a lot of people over into the Google drives of the world or whatever else for those kind of tools, clearly that problem of let’s splinter it out and use it wherever we need to use it in whatever dark corners that problem, multiple files being shared with multiple people, updated in new formats, just gets astronomically larger in the enterprise. When we’re taking all of our fraud data and we’re duplicating it and moving into an AWS instance, and then we’re forming it when it’s already up there, we’re doing new stuff with it again, multiple places to store things. Am I on the right page about kind of the extrapolation of what could be a very personal tech problem?

Daniel Hernandez: Yeah. A hundred percent. I mean, but look that personal problem that you’re describing, I expect that you’re admitting you’ve been guilty of it.

Faggella: Oh yeah.

Daniel Hernandez: I have to admit that I’ve been guilty of it too. And so has everybody in any business, really. Because it’s convenient, it’s easier to do and there have not been good alternatives to deal with the core issue.

Faggella: Yeah. Yeah. But of course an Excel doc is not going to cost anybody very much, but if we’re copying gigantic corpora of documents or files or what have you, it seems like this not only becomes like, well, I guess it’s convenient like, geez, it’s also expensive. And the inconveniences are at a way greater scale. So this is the problem we’re trying to solve here with the fabric.

Daniel Hernandez: I want to challenge you a little bit.

Faggella: Go for it.

Daniel Hernandez: On the one hand, the unit cost to store the Excel file, especially if you already have the subscription, right? You could argue it’s marginal cost zero. And one of the saving graces of Excel is you can’t put a lot in it because at some point it just breaks, right?

Faggella: Yes.

Daniel Hernandez: You can’t download a data warehouse into it. The non-obvious cost would be a data breach. Imagine: in the vast majority of analytics, one way or another, you’re going to touch customer data. What would it cost you if you had hundreds of thousands, maybe millions of rows in that Excel sheet that had personal information on your customers, that you do analytics on, and because there was no governance or data protection on it, it got disclosed to the world? Like, what is that cost?

Faggella: Yeah.

Daniel Hernandez: So, we focus a lot on the incremental cost of the thing that we’re doing analytics on, not the non-obvious costs that, whenever bad things happen, the enterprise ultimately has to pay for.

Faggella: Yeah.

Daniel Hernandez: Right.

Faggella: Yeah.

Daniel Hernandez: One way or another.

Faggella: Yeah. Well, and we see time and time again and I’m not saying it’s a wrong move or like a salesy move, but we see on the vendor side, often the exact quantification of, oh, how much does it save us in the cost of storing this data or with this workflow? We see actually a lot of arguments on what we call kind of a plausible risk reduction argument. So for document search and discovery, it might not be like, okay, I can tell you how many microseconds it’s going to save your mortgage pro or person to find this file or find this file. It’s more like, well, how much does it run you when these two numbers don’t line up and you give somebody a mortgage and now you’re on the hook for it, right? It’s more like a plausible risk reduction argument. And here it’s obviously similar.

Faggella: What if we are sharing data in this common way that has some massive underlying risk that we’re just not addressing because we’re treating it like we do all our other data problems? What could we run into if that continues to be our norm? So, like, the argument’s pretty compelling. I mean, it’s not like a mind stretch to say, okay, there’re things that are better done this way, but clearly the folks that bring this idea to life probably do so in particular use cases. So maybe we can move into what we can do. So we get it conceptually; what are some examples of what we could do with a data fabric that maybe we couldn’t do with previous approaches? Maybe you’ve got some great customer examples we can talk about.

Daniel Hernandez: Let’s talk about customer 360.

Faggella: Awesome.

Daniel Hernandez: Everybody wants to do it.

Faggella: Yes they do.

Daniel Hernandez: Right? Who doesn’t, for anybody that’s in business. Actually, even if you’re a public agency, you’re still serving customers. You’re just not necessarily getting revenue nor do you have a for-profit notion, but in general, knowing your customer tends to be a good thing. I think that’s hardly disputable. Consider the way that we’ve gone about knowing your customer through 360 programs. Since I’ve been in data and AI, we’ve had 360 programs anchored on technology like master data management, various methods for doing it. And so long as I’m going to be in data and AI, there’s going to be 360 motives for customer, and there’ll be new techniques. Take the largest networking company in the world, with 9 million customers that they serve: to consolidate the information and standardize that information for a single entity took them two months.

Daniel Hernandez: There’s 9 million customers that they serve, and they’re growing. It took two months per customer to standardize and drive consensus on a single entity, and more to link them and establish relationships between them. So that’s the before. The after is that within two days, they’re able to standardize a single entity. Now, truthfully, it was a bunch of process that was getting in the way and a complete lack of automation of that process. That was the core issue, but the data silo problem was the root cause of it. All right. The multiple steps and consensus making across data stewards that span multiple lines of business was kind of the underlying problem. Because they were able to access data in place, use metadata to analyze the data quality of that, and standardize that information through data virtualization, with built-in governance that allows them to actually control who has access to this, they saved multiple weeks of work. Now the effects of that are only beginning to be realized. So for them it was a cost savings thing.

Faggella: Yep.

Daniel Hernandez: But now it’s starting to have really interesting implications for how quickly they’re recognizing revenue for their customers, how well they know their customers, how much better they know their customers, so that they could have targeted promotions and campaigns against not just a customer demographic in aggregate, but to a specific customer. So I think the benefits in terms of client satisfaction, revenue yield are to come. The initial reason to get started with the data fabric was I just need to reduce the cost to standardize a single entity…

Faggella: Yep.

Daniel Hernandez: Of nine million from months to this.

Faggella: Got it, got it.

Daniel Hernandez: Good story. I got another one.

Faggella: All right?

Daniel Hernandez: So, largest telecommunication company. In general, the thing that they’re most sensitive to is data privacy, not just meeting their compliance obligations, but doing right by their customer and offering even better data protections than what are required by regulation. Their issue was they didn’t understand where PII was. So they had a compliance governance motive in their particular case, not a cost one. Trying to understand where PII was simply was an issue that they didn’t have methods to solve. So they used our data catalog. They indexed metadata in place, ran data analytics on the metadata, and they were able to establish heat maps of where this personally identifiable information was, and now they’re remediating it. So the risk profile to them is reduced, and the cost to reduce that risk is significantly reduced. And now they’re moving on to self-service analytics, because now that they know where the information is, not just PII, they can better service their data science and business analytics needs as kind of the secondary use case. So, it’s a wonderful, like, what’s-next story with that customer at least.
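
The metadata-only approach described here can be sketched in a few lines of Python. This is a rough illustration, not the IBM data catalog: the `catalog` entries, the column-name heuristics in `PII_PATTERN`, and the `pii_heat_map` function are all assumptions made for the example, and a real catalog would rely on much richer classifiers than name matching.

```python
import re
from collections import Counter

# Hypothetical catalog metadata: (system, table, column). No row data is read.
catalog = [
    ("data_lake", "subscribers", "email_address"),
    ("data_lake", "subscribers", "plan_code"),
    ("data_lake", "call_logs", "phone_number"),
    ("billing_db", "invoices", "amount"),
    ("billing_db", "accounts", "ssn"),
]

# Assumed naming heuristics for columns that probably hold PII.
PII_PATTERN = re.compile(r"email|phone|ssn|birth|address", re.IGNORECASE)


def pii_heat_map(columns):
    """Count PII-looking columns per (system, table) using names only."""
    heat = Counter()
    for system, table, column in columns:
        if PII_PATTERN.search(column):
            heat[(system, table)] += 1
    return heat


heat = pii_heat_map(catalog)
for (system, table), count in heat.most_common():
    print(f"{system}.{table}: {count} PII column(s)")
```

Because only column metadata is scanned, the inventory itself does not create another copy of the sensitive data it is trying to locate.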

Faggella: Got it. And can we go into a little bit of nuance on these? I’ve got some questions as we fly through.

Daniel Hernandez: Go for it.

Faggella: So, good places to start here. When you mentioned networking company, I actually assumed you’re talking about telecom in some way, we don’t have to name names, but how should we grasp the nature of the business for the people tuned in? Then I’ve got a few questions about that first example.

Daniel Hernandez: They power a lot of the networking equipment in enterprises and in telco.

Faggella: Okay. Got it. So, in that particular example, again, we often find the case that, as you’d mentioned, once we’ve got all the data accessible for the 360 customer shebang... And as you’re aware, and as you stated frankly, every industry wants that. Telecom wants that, healthcare wants that, financial services wants that. Everybody wants to know everything they can about this particular customer, particularly in legacy enterprises where those silos are a horrendous issue.

Faggella: In their particular case, it sounds as though you’ve talked about it kind of like unifying an entity. In other words, the examples of where I’ve seen this come up are, let’s say, invoices. We get an invoice from, I mean, IBM themselves probably sends invoices out in a lot of permutations. It’s probably not always like International Business Machines Inc., right? Sometimes it’s like IBM India Ltd., blah, blah, blah. And we need to understand. So invoices are one place where this happens. Sometimes it’s like contacts and contact emails. Who do they belong to? What company should they be appended to? What kind of data are we talking about when you’re talking about this consolidation, just to make this concrete?

Daniel Hernandez: Well, you gave really good examples, right? Like, take an entity like IBM. IBM has subsidiaries in virtually every single country we do business with, and we’re not alone. I mean, it’s a pretty common thing. If you’re going to do business and serve a particular country, typically you would have, or often you will have, an entity inside of that type of organization. So what is IBM is kind of a really interesting question to unravel, and what’s the revenue basis, and how important is that customer to you as you serve them? Right. So, in this particular case, they had commissions, they had what is the purchasing power of that customer, considering that the more they buy from us, the more discounts they get, given that their purchasing is distributed across their subsidiaries. Right?

Daniel Hernandez: So when we recognize certain revenues is also dependent on the identity information, at least in their particular case. And so consolidating, reconciling, standardizing, not just the individual entities and matching them, but also the hierarchy of these things, so that you could do things like, hey, this company spent a million dollars. Now they’re going to earn a 40% discount rate for every dollar they spend above that, and that spend was distributed across their [inaudible 00:16:28]. These are just examples of how they’re using this solution to now drive business outcomes. And they’re much better for it, for sure.
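
The hierarchy roll-up in this example can be worked through with a short Python sketch. The entity names, purchase amounts, and the `parent_of` mapping are hypothetical; the $1 million threshold and 40% rate are the figures quoted above, and reading them as "a 40% discount on spend above the threshold" is one interpretation, not a confirmed pricing rule.

```python
from collections import defaultdict

# Hypothetical legal entities mapped to their resolved parent company.
parent_of = {
    "Acme Inc.": "Acme",
    "Acme India Ltd.": "Acme",
    "Acme GmbH": "Acme",
    "Globex LLC": "Globex",
}

# Hypothetical purchases recorded against individual subsidiaries (USD).
purchases = [
    ("Acme Inc.", 600_000),
    ("Acme India Ltd.", 300_000),
    ("Acme GmbH", 200_000),
    ("Globex LLC", 400_000),
]

THRESHOLD = 1_000_000   # spend level quoted in the interview
DISCOUNT_PERCENT = 40   # rate quoted in the interview


def rolled_up_spend(purchases, parent_of):
    """Sum subsidiary purchases under each resolved parent entity."""
    totals = defaultdict(int)
    for entity, amount in purchases:
        totals[parent_of[entity]] += amount
    return dict(totals)


def discount_earned(total):
    """Discount on the portion of spend above the threshold (assumed rule)."""
    return max(0, total - THRESHOLD) * DISCOUNT_PERCENT // 100


totals = rolled_up_spend(purchases, parent_of)
discounts = {parent: discount_earned(total) for parent, total in totals.items()}
```

In this made-up data no single Acme subsidiary crosses the threshold on its own; the discount only appears once the three entities are resolved to one parent, which is why the hierarchy matters as much as the individual matches.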

Faggella: Got it. So invoices, recognizing revenue, potentially the contacts associated with it. And again, this will help to say, not just okay, we made X revenue, but what customers were responsible for what, who’s paid, what’s been due. And again, what are subsidiaries of larger companies, so that we understand the value of a single client, because it’s useful to know who our whales are, whatever the case may be. And you can imagine, you said nine million customers, doing that at scale is a big deal. And so being able to get to the bottom of that data is helpful. You also mentioned something that really rings a bell for me, in terms of just hearing a million stories of vendors’ journeys into selling to enterprises, as well as enterprise journeys adopting artificial intelligence. Very rarely is the creation of new capabilities to unlock new business models and revenue opportunities the early step towards AI, particularly in legacy enterprises, but in enterprises of any type, anywhere in the Western world even. It’s just not move number one. It’s risk reduction or efficiency.

Faggella: The C-suite often sees AI as, and I don’t think you and I do, so it’s almost analogous to automation just like AI equals automation. Now, not every C-suite person, right? We’ve got plenty of exceptions, but if you’re going to roll the dice and you’re going to say, what do they associate with AI? It’s going to be some kind of efficiencies most of the time. This is my experience with too many vendors, but you had mentioned that once this initial application comes to bear, we can make this certain process efficient. Now, all of a sudden, there’s new ways to report, there’s new ways to maybe find connections and similarities and say, who are other customers that have purchased at this kind of a speed and rate, who have propensity to maybe buy these additional services or who are the customers that have like a higher churn likelihood when these other factors come into place?

Faggella: Well, now we can actually run that stuff. Now we can unlock not just more BI. We could do boring BI if we have a better data fabric, presumably, right? Like not even special, just boring BI, but it matters so much. We can even do that, but now we can start to unlock new capabilities. Do you often find that this is the journey? Like, we do something for what we assume AI is about, a use case, and then the capabilities bloom as we start to realize like, oh, we can open this stuff up now?

Daniel Hernandez: I think your initial assertion, which is when people think of AI, they think of the application of that to yield efficiencies primarily through automation, I think that is mostly true. But if you want to run a marathon, you probably need to run your first mile. One of the best ways to get skill, to develop your own bona fides and confidence that this stuff works, is targeting problems that are, I won’t say relatively easy to solve, but where, given the alternatives, there is no other game in town, and that, if you solved them, would yield benefits. Usually the cost equation is where most people go because it’s quantifiable. I spend X; if I could reduce that by 50%, I bring that capital back and I could redeploy it to more productive use. So I don’t think there’s anything wrong with it, but I do think about the advancement of that to, how do you serve your customers better? How do you know your customer better, so you could serve them better?

Daniel Hernandez: Typically, when we talk about, like, the non-cost equation, we talk about marketing to your customers better, delivering better promotions, more targeted promotions. What’s missing in that entire discourse is how is that going to make your customer better? And when you’re serving them, how do you make them feel better about doing business with you? I think that’s equally important, not just yielding more revenue to the business based on more effective sales and marketing to the end customer that you serve.

Faggella: Big time. Yeah. And I think it’s the responsibility a lot of the time of the vendors and service providers to help open the aperture of what AI could mean. Because I think it often is not always, not always, but often is associated with kind of like reduction of monotony, driving of efficiencies. If you just take a C-suite person, you just grab one out of the hopper, just one of them, somebody with a director title they’re going to associate with that stuff. And I think it often lies in the purview of the service provider to expand that definition like you said, say, hey, how can this unlock X, Y, Z, et cetera. And clearly this data foundation, this is going to be a lot of the early work in these transformation projects.

Faggella: We look across every industry, and something like making our data more accessible, having features that matter for future AI use cases, and having governance structures in place so we’re not putting ourselves at risk is just a ubiquitous part of the journey for almost everybody involved. Do you see this kind of data fabric dynamic and the related work around it as, for many enterprises, almost being like a first step towards wherever they want to go with AI? Because I know so many firms are so early stage, even in just this stuff.

Daniel Hernandez: Well, it’s kind of interesting. So, most of our customers approached us over the last five years around, hey, I got this big idea, AI is going to be the magic answer, need your tools, right? Like, that’s step one, give me the tools, so I could equip myself to apply this to big problems. It could be taking out cost, serving customers better, whatever.

Faggella: Yep. Yep.

Daniel Hernandez: Critical issue like phase zero was there aren’t a lot of skills inside of the typical enterprise to wield these tools, the way that they needed to, to solve the problem. So we started making investments, right? Like, there’s stuff we were doing in a tool to make it more accessible, but there was also a skill that we were making available on our dime to help our customers. So, I introduced something called the data elite team, which are a bunch of wizards that are data scientists that allow a customer to bring a problem, bring their data, bring like the people that have skin in the game to work with us to actually apply data science, using various techniques of artificial intelligence, typically machine learning to see if there was something there, just to see if that experiment would yield positive results.

Daniel Hernandez: What we realized was, hey, look, there’s not enough. This core data problem is the critical bottleneck, right? Setting aside skills, setting aside tools, which we invested in, which the industry’s invested in, I would say largely in open source, to do something about, this core data bottleneck is the bottleneck. And unless we figure out a way to solve it, like, your ability to use artificial intelligence to build your field of dreams will be largely limited. So, I won’t say we stumbled on data fabric as an antidote to the AI problem. I guess I will say that we did stumble.

Faggella: Yeah.

Daniel Hernandez: It’s like, hey look, we better figure out this problem. And the current tech, it isn’t to build a better data world. Yes, that has a place, right? Like if you’re using a hundred SaaS apps that are siloed and all of them happen to run on one or more public clouds, consolidating data into a cloud data warehouse makes sense.

Daniel Hernandez: And so having the best data warehouse makes sense, but we knew even then we could make the best data warehouse for the next 100 years to come, and the core problem still would not be addressed. And so, that kind of is the reason why we reasoned through this problem. We believe in the data fabric, which is supported by tech, but is an idea, an architectural strategy, which is independent of technology. And we certainly have convinced ourselves, working with our customers, that it is the best hope all of us have as an industry to apply AI. Because unless you solve this core data problem, you won’t be able to do it.

Faggella: Yeah. Yeah. And well, like you mentioned, you’re working on getting past the other problems. There’s the culture and there’s the skill problems as well, and you had mentioned having this elite team that would go in, help people kind of see the broader capabilities of AI, leverage some of your own high-powered talent to be able to bring those things to bear, come up with the good ideas, maybe build a little bit of excitement so that people are willing to do an R and D type project, which often AI is like. But even if you had all those things and you had them in spades, if you were stuck in silos, you’d be trapped as all heck trying to do anything meaningful that could make its way into deployment or out of a sandbox somewhere. So, interesting to see that this, again, as you had said, technology agnostic kind of approach is almost like what you bumbled into as where we need to go enterprise wise to make AI come to life.

Faggella: So, now we can move into how to start moving in this direction. As you mentioned, there’s different ways to do it. I mean, there’s different kinds of technologies that are storing various and sundry data, depending on what industry you’re in, what country you’re in, what have you, what does it look like from an executive’s perspective? Like one of our listeners to think through and move towards a data fabric, a source of truth for the most important core data that we have.

Daniel Hernandez: Start small. Similar to the data science game, where it was pick a problem, make sure that we have the adequate data and that there’s people with skin in the game in your business that will work with us. We’re advising our customers for data fabric to do the same, right? The intergalactic vision of the data fabric is that the 75% of the dark data in your enterprise will all be accessible, will all have data protection, so people that have privilege can access it and people that don’t, can’t, and you’re able to manage the entirety of the life cycle. So, for instance, if there’s data that you are spending money to manage and it confers zero value to you, then you’re deleting it. Like, that’s the outcome we all want to drive. There’s no chance you’re going to be able to realize it if you set up an agenda with that scope as the initial starting point.

Faggella: Yeah.

Daniel Hernandez: So you’ve definitely got to start small. Just go back to the two examples I gave you. The customer 360 problem, defined in tactical terms, was: it takes me two months to master a single entity, and there’s a whole bunch of business pain that I am incurring because I haven’t figured out a way to solve the problem. Their experiment was, can you reduce it by 50 percent, just give it to me in a month. What we’re able to prove is we could do it in days, and that’s even with some manual steps throughout the process. Our goal is instantaneous. So starting small, whether it’s the customer 360 mastering the single entity or, in the case of the telco, it was, I don’t know where PII is. And they had special sensitivity to their data lake in their own private cloud because they were building models and training models and inserting those models in high stakes decision making and business critical scenarios.

Daniel Hernandez: And so, like, they didn’t want to inventory the entirety of their IT landscape. It was that data lake, which was under a couple petabytes large. They just wanted to inventory and analyze where PII was. So those are two small examples with just very targeted initial use cases. And then to make it easy for our customers, I’m saying, hey, look, I’m going to make an investment in you. I’m going to bring a data elite team that can help you stand up the components of a data fabric targeted at that problem. And if we solve the problem, I want to get paid. But then the benefit to you is, yes, you get the answer to that particular question or the solution to that particular problem. But I’ve laid down a foundation, the underlying components of a data fabric, which can graduate to one that accesses all your data, protects it all, and gives you convenient means to move it all.

Faggella: Yeah. So I’ve one other short question as we wrap, but I want to poke into exactly where you’re going here. Again, the idea of starting small, but thinking big something we’ve heard on a number of different occasions.

Daniel Hernandez: Everywhere.

Faggella: Yep. Clearly the way to play the game. And you’re saying, look, just like for an AI project, we want to think the same way about this data fabric concept. Totally makes sense. And there is this interesting tension, and I got to get a little bit of your perspective on this because we think about this constantly. When it comes to bringing AI to life, in terms of return on investment, there is a tendency to perpetuate the plug and play perspective. So we think about it like a continuum where on the one side you have the plug and play perspective. On the other side, you have a paradigm shift. Plug and play is like, tell leadership, tell stakeholders.

Faggella: Yeah. It’s just like IT, we’re just going to plug it in, we’ll get this result for you. That’s one set of expectations we can have. The other set of expectations, like you said, is the intergalactic vision, right? Of all the new ways we can unlock and bring to bear the value of data and automatically delete things we need to and have features be appended to stuff automatically in ways that are instantly searchable. That’s too intimidating. It’s too much work for now. It’s also too much of a conceptual shift for now. So it’s about explaining this first project to stakeholders who have skin in the game, as you said, the people cutting the checks, the people involved in the projects, involved in the outcomes, and finding the right place on that continuum. Because to your point, if you slide all the way to the intergalactic vision, it’s just too much.

Faggella: It’s too much work. And honestly, it’s too much of a conceptual shift. But if you make them think it’s just IT, then you’re not going to get what you just said, what you ended with, which is, hey, you’re going to wrap this project up with the beginnings of a new way of doing things. What we call a capability ROI. It’s not even necessarily the immediate efficiencies you get. It’s a capability ROI. So if you don’t at least talk about that paradigm shift, then you won’t get any of that capability ROI, which is sort of the juice you’re articulating. How do you think about explaining and talking to enterprise leaders in a way that’s not intimidating, but also steers clear of that plug-and-play model that doesn’t let them learn and doesn’t let them see that longer-term investment? I’d love your take on this, Daniel.

Daniel Hernandez: Well, you and I are kind of running a simulation of the kind of conversations I’m having with clients now, right? It is driven by some pain point.

Faggella: Sure, yeah.

Daniel Hernandez: Oh, I’m spending $30 million on my traditional data warehouse. I’ve replicated that on my public cloud, I’m spending another $30 million, and then I’m subsetting that for my data science and I’m spending another $30 million. I’ve got $100 million of total spend on just core data management, and that’s not complete, right? Like, this is not even talking about the transactional stores supporting their mission-critical workloads. And yet I’ve got everybody complaining that it takes too long to onboard data. The data quality is suspect. Like, please help. Right? So, the conversations always start with a critical pain point…

Faggella: Yeah.

Daniel Hernandez: That a client is suffering from. And then the discussion is, would the idea of a data fabric help? Because if it would, then let’s explore an experiment to get you started, to see if an acute and very limited issue could be solved. And in solving that, we could begin to replicate more capability, deeply integrated, that then graduates to this data fabric.

Daniel Hernandez: So, in many ways, to be honest, the conversation here is the conversation with our customers. And I think we’re better for it. It’s forcing them to just think about, hey, look, actually, what I’m challenging them to do is to say, if you continue as you are, or let’s even do a thought experiment: let’s imagine you’re 10X more efficient in what you’re doing. Are you going to be that much more successful? I think the answer, unequivocally, when these people are thinking about the nature of the problem and the techniques they’re using to solve it and the progress so far that they’ve made, despite enormous spend, I think they’re willing to say, yeah, no, it’s not.

Daniel Hernandez: And so I better think differently. I better start to experiment with solving the problem a different way. And I just give them a low cost, low risk way to just start that process.

Faggella: Yeah. Okay. I like where you’re headed here. So what you’re saying is it starts with the pain point, as any good engagement will, right? When we start with, man, we want to start using some AI around here, those are the most questionable conversations. But if it’s, hey, here’s a pain point I’m dealing with, we need the right tools and approach, what we’re saying is, if you can look under the hood at what the cause of this doggone issue is and see that the undergirding infra could really be why we’ve got pain here, if you can make them say, yeah, you know what, this stuff is actually kind of what’s holding up the process, now you can open up a capability conversation, and it’s not you imposing it. It’s them agreeing to it. It’s them saying, you know what, it is our capabilities that are ultimately bounding us here. And that lets that conversation roll out without it being an intimidating paradigm shift, a galactic vision, as you said.

Daniel Hernandez: A galactic vision with a galactic project plan is probably not a good combination.

Faggella: Yeah.

Daniel Hernandez: But a pretty ambitious vision with a practical way to realize it is kind of what we’re arguing here.

Faggella: Yep.

Daniel Hernandez: Just to talk about the data fabric in isolation is not particularly helpful.

Faggella: No.

Daniel Hernandez: Because the data fabric is in support of something, right?

Faggella: Yes, yes, yes.

Daniel Hernandez: To what end are you trying to use the data fabric?

Faggella: Yep.

Daniel Hernandez: It exists to supply information with data protection built in so that you could do something. If you’ve got a data science campaign within your business and you’re building, say, models for something like, I don’t know, customer onboarding for, say, a loan, you had better make sure that those models that are doing things like determining credit scoring are trustworthy and compliant. And ultimately you begin to unravel that a little bit and you say, okay, how can I do that? Yes, the data science tooling, including tools like mine, Watson Studio, made available through Cloud Pak for Data, is part of the solution. But so is the underlying data on which those models were initially built, so they can be back-tested against new information, potentially retrained, and new models that are better performing graduated. Like, the data fabric should exist to service applications of use like that. Just installing one with no link to consumption is, like, extremely wasteful too.

Faggella: Of course. Yeah. You can’t get budget for it anyway. Right?

Daniel Hernandez: I don’t know. In this free interest environment.

Faggella: Maybe. I normally see checks get cut when we can address a specific problem. If we say this is building AI maturity, this will help unlock future capability, I just don’t see checks get cut for that. I see maturity get built, for better or for worse. I see maturity get built, whether it’s culture, skills, or infra like we’re talking about. I see maturity in an organization get built as it pertains to, and as it follows from, the spearhead of a particular project whose ROI can be accounted for in a relatively near term. And what you’re advocating for is, look, that’s how we think about fabric as well.

Faggella: So, Daniel, closing question. I think about my own audience: a lot of them are going to be in the very early stages of bringing their data to life to make AI a reality. Data fabric will be one of the many ideas that they potentially integrate into their plan. If you were to talk to enterprise leaders who are beginning this journey towards a data fabric and towards sort of unlocking their data infra, getting out of the silos and getting early use cases rolling, are there any other bits of parting advice you’d give to them to make those initiatives successful?

Daniel Hernandez: I would link it to kind of a use case around use. Let me organize them. In business analytics, everything from classical BI to predictive analytics is going to generate the need for data that may or may not be easy to consume. Insomuch as the data is not easy to get because you can’t self-service it, that is a good sign that a data fabric ought to be married to it. I talked a little bit about the data science scenarios, where you’re building models, you’re putting them into business-critical scenarios, and you need to be able to trust that those models perform as advertised, continue to perform as advertised, and that ultimately is going to create a need for you to make sure that the stuff that it was trained on is good. And that the stuff that it will continuously be trained on is current, contemporary, reflective of the reality of the business.

Daniel Hernandez: So that’s another good example. And then in terms of protection, there are a lot of data privacy and compliance obligations that are difficult to solve with current methods of data governance and compliance, and a data fabric offers ways to help you meet those obligations cost-effectively. And with benefits that have nothing to do with compliance at all. Like, if I go back to the telco scenario.

Faggella: Yeah. Yeah.

Daniel Hernandez: If I understand where my PII data is, I know where my data is. I know what it is. And because of that, I can actually put it to work. So data discovery scenarios, model-building scenarios, are secondary benefits of dealing with things like privacy and compliance. But anchoring your data fabric ambitions against one of those three critical areas, data science, business analytics, and compliance and data privacy, is generally where I encourage our customers to start, because it’ll force you to confront the data problem and realize, hey, your methods for solving that problem today are not adequate. If you want to do something about it, you’re going to have to take an alternative approach.

Faggella: Cool. Okay. So you’ve got three lenses that’ll often lead back to a fabric and make it a valid choice. Hopefully this is useful advice for the folks in the enterprise who not only have to spot these opportunities for where to start this journey, but also communicate its value upstairs. Lots and lots of meat in this episode. Daniel, I know that’s all we have for time, but I sincerely appreciate you sharing your expertise here. Thank you so much for joining us on the show.

Daniel Hernandez: Thank you, Dan.