By Dan Woods.
One of the key lessons that emerged from the last 25 years of BI is that successful companies treat the data they provide internally and externally as a product. When data is a product, people know what it means, how it was created, and how to put it to use.
The original data warehouse set out the promise to create a single version of the truth, which was very much a data product in that it was usually painstakingly researched and carefully designed.
But because such a data product evolved so slowly, for various reasons, the next era focused on freedom and self-service, which put data in more hands but let go of the research and design that created the data product. Access to data created benefits, but everyone designed and crafted the data to their own needs. The result was too often data chaos, leading to data brawls about whose data was correct.
The current generation of BI attempts to restore the clarity of the original data warehouse while preserving open access and self-service. Frank Bien, CEO of Looker, calls this the era of the data platform.
Recently, I had the opportunity to speak with Bien about this transformation and the three eras of BI he described at Looker’s Join conference last year. What interested me is Bien’s vision for how and why a data platform creates value. In an interview this year, I explored questions such as: What is a data platform? What is the difference between a data platform and what has come before? How do we know if we have a data platform? What are the signs we don’t have one? What will it do for us? And perhaps most importantly, why should we want one?
In my view, companies that understand how to create a data product that is both clearly defined and supportive of self-service will more rapidly unlock the value contained in their data.
Here’s how Bien views the history of BI and his thoughts on how to build a data platform.
The Three Eras of BI: Monolithic, Self-Service, and the Data Platform
Bien outlined his three eras of BI evolution as follows:
Monolithic Stack Era
First there were the monolithic stacks. Back at the dawn of time (or at least it seems that way given how quickly technology has changed over the past few decades), BI analytics were entirely dependent on SQL databases.
The first generation of SQL databases was slow and expensive, and building on them required a lengthy research and design process. Their prime advantage, however, was that after all that effort, you knew the answer you got was the correct one. You could trust the view of the world the database gave you, because it presented a single version of the truth.
However, this analytic accuracy came at a high cost: in addition to the time and expense, the database was extremely cumbersome to change. Data warehouses had to be optimized for performance using precomputation that populated information cubes designed to speed processing for a set of anticipated questions. Any change to the framework meant that getting the answers you needed could be delayed, either by slow processing or by the need to redesign the cubes. The stack was also hard for non-experts to use and to plug together. The technology had to be highly curated. In essence, the stack offered industrialized answers to high-value questions. You knew you could get the answer if you tried hard enough, but the process could be long, costly, or both. Remember, compared to what came before, this was a victory.
Data breadlines were common in the monolithic era: analysts who wanted to ask and answer questions had to wait in line, because the self-service tools of the monolithic era, such as reporting environments, had few degrees of freedom. Only questions that had been planned for could be answered. If you got help answering a question, you often had to wait again as soon as you needed a new answer, which led to frustration.
According to Bien, the monolithic era gave users an aversion to asking too much of the database. “You never wanted to ask the database big questions, like analytic questions, because it would bring it to its knees,” he said. “There was a lengthy extraction process and everything was top down and preplanned. There was no room for spontaneity.”
Self-Service Era
The plodding nature of the industrialized stack left a lot of people frustrated. For too many people, the inefficiency of the stack meant that many of their questions were left unanswered. As a result, a variety of vendors created products that either side-stepped or enhanced the industrialized data pipeline. These products were self-service tools that allowed more people, including non-experts, to explore and analyze data. Yet while this greater access was a benefit, the self-service world solved some, but not all, of the problems of the data warehouse platform. Each product addressed only certain facets of a data warehouse platform; none offered a comprehensive solution. Some focused solely on visualizations, others on visualizations of larger data troves, others on simple data transformations. Even more problematically, self-service still required a lot of plumbing to be done to the data. As a result, companies' data breadlines may have become shorter, but they did not go away.
The new problem in the self-service era was data brawls, which occurred when people used so many different products and data sets that it was difficult for a coherent analytic view of the world to emerge. The prime benefit was that self-service reduced the time lag and, especially with the advent of the cloud, improved access to data compared to the stack era. Yet the overriding consequence of self-service tools was that people did whatever they could to get the answers they wanted, but no one knew what the data meant.
Again, amid the breadlines and the data brawls, immense value was created. More data was in the hands of more people, who could do far more than they could with simple spreadsheets alone.
“The self-service models presupposed that you had the data and that you knew what it meant, even though that wasn’t true,” Bien said. “It was a completely siloed approach. Everyone was doing their own thing with no common understanding of what the data meant. It was the wild west. It was totally ungoverned and it still is. That’s the problem.”
But in my experience, it was in this era that the light of the data platform started to shine. Certain dashboards became well-understood data products and supported the use of data in important processes. Certain people became trusted, so that the data they produced could be relied on. The value of a data product could emerge, but not as a designed outcome of how work was done. It happened in an impromptu way that was not predictable.
Data Platform Era
The data platform is the third era of Bien’s vision. A data platform is created when the benefits of the stack and self-service combine to allow users to integrate, visualize, explore, and deliver analytics without the drawbacks of the previous two waves. The data platform is built on the back of the previous technology: databases are now extremely fast, which allows the definition of the single version of the truth to be moved out of the database and into a modeling layer. The goals of access and speed from the self-service second wave remain primary, but the starting point is a much wider data product that can be updated far faster. By adding self-service to a fast-moving data product, it is possible to get high-quality, well-understood answers while avoiding both breadlines and data chaos.
The data product is a key requirement of the data platform. Like the canonical model of the monolithic era, the data product creates consistent definitions of all the key terms and concepts that define a business. This prevents data brawls. But unlike the monolithic models, the data product is easier to define and update, and as a result it is much wider and deeper.
The data platform sits on top of the rest of the data architecture and provides common governance and metrics, while also enabling the self-service BI tools of the second wave. Perhaps the biggest benefit is a dramatically reduced need for data extraction and preparation before people can get the answers they want. The ideal state is that the data product is defined in a modeling layer that can keep up with the needs of the business. The data product still requires a design process and product management, but because the model is wide and deep, users can find the answers they need by exploring it. When the model needs to be extended, it takes hours or days, not months, which avoids breadlines.
“With the data platform, you don’t have to get all these tools and piece them back together. You put the data into a data lake and then allow users to work directly from that so you get very rapid data prep and prototyping,” Bien said.
The speed of modern databases is key to making a data platform work. Essentially what happens is this:
- A simplified form of the data is stored in the database, reducing the amount of ETL done on the way in.
- The type of data prep usually associated with creating data objects to support specific types of analysis is done in the modeling layer. As the user traverses the model, the data is extracted and transformed by the data platform and delivered to the users.
- For this to happen without pre-computation, the data platform must generate queries to support navigation of the data. These queries must execute quickly to enable a pleasing user experience. Fortunately, the best modern databases are up to the task.
“The data platform moves beyond data extraction – the ETL framework,” he said. “Literally, the physical reality of the problem is changed. In the old world, you had to physically move the data to query it, move it again to model, and then move it again to use the self-service BI tools. In the new world, the physical step is just when you throw it in the data lake. Modeling and governance are all virtual. So now, if you want to change the question or you decide the way you modeled something was wrong and you want to remodel it, you’re not physically moving it. This makes everything easier.”
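To make the mechanism described in the list and quote above more concrete, here is a minimal Python sketch of a modeling layer, offered as an illustration rather than as how Looker or any other product actually works. It assumes a hypothetical model format (a dictionary mapping dimension and measure names to SQL expressions), hypothetical functions `generate_sql` and `explore`, and a made-up `orders` table. SQLite's in-memory database stands in for a fast analytic database. The raw data is loaded once, every query is generated from the shared definitions at request time with no precomputed cube, and remodeling is an edit to the model rather than a physical move of the data.

```python
import sqlite3

# Hypothetical model: each dimension and measure is a SQL expression over a raw
# table. Definitions live in one place, so every query uses the same meaning.
MODEL = {
    "table": "orders",
    "dimensions": {
        "region": "region",
        "order_month": "substr(order_date, 1, 7)",
    },
    "measures": {
        "order_count": "COUNT(*)",
        "revenue": "SUM(quantity * unit_price)",
    },
}


def generate_sql(model, dimensions, measures):
    """Translate a user's selection into SQL at query time (no precomputed cube)."""
    select_parts = [f"{model['dimensions'][d]} AS {d}" for d in dimensions]
    select_parts += [f"{model['measures'][m]} AS {m}" for m in measures]
    sql = f"SELECT {', '.join(select_parts)} FROM {model['table']}"
    if dimensions:
        group_cols = ", ".join(model["dimensions"][d] for d in dimensions)
        sql += f" GROUP BY {group_cols} ORDER BY {group_cols}"
    return sql


def explore(conn, model, dimensions, measures):
    """Run the generated query directly against the raw table."""
    return conn.execute(generate_sql(model, dimensions, measures)).fetchall()


# The only physical step: lightly transformed raw data is loaded once.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (order_date TEXT, region TEXT, quantity INTEGER, unit_price REAL)"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?)",
    [
        ("2017-01-15", "East", 2, 10.0),
        ("2017-01-20", "West", 1, 25.0),
        ("2017-02-03", "East", 3, 10.0),
    ],
)

print(explore(conn, MODEL, ["region"], ["order_count", "revenue"]))
# [('East', 2, 50.0), ('West', 1, 25.0)]

# Remodeling is an edit to the model, not a data move: add a measure and
# query it immediately against the same raw table.
MODEL["measures"]["avg_order_value"] = "SUM(quantity * unit_price) * 1.0 / COUNT(*)"
print(explore(conn, MODEL, ["order_month"], ["avg_order_value"]))
# [('2017-01', 22.5), ('2017-02', 30.0)]
```

Because every user's query is built from the same `MODEL` definitions, two people asking for `revenue` by `region` get the same answer. A production modeling layer would also validate field names, handle joins across tables, and enforce governance rules, none of which this sketch attempts.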
The result is that users can leverage data in new and powerful ways. They can ask questions, rely on consistent metrics, and build new applications through APIs based on those insights. In effect, users can build a data halo around an enterprise application that describes what needs to be done for specific use cases. The bottleneck for a data platform becomes the work needed to define the data product and implement a model, not the technology work needed to implement a supporting database. The key is to have the right number of people, with the right level of skill, doing the modeling.
These possibilities are why Bien is so optimistic about the outlook for data platforms. He pointed out that no self-service BI tool company ever hit the billion-dollar mark in revenue. He thinks data platform technology will get there because it offers users more capabilities. “It’s like we solved the issue of hunger for the data world,” Bien said. “And so now we can move up to higher level needs. We can go farther up the value chain and do higher level solutions. That’s the real value of the data platform.”
In my view, what will happen is that the race to find value in data will become the focus. Data, not BI technology, will be the star. The data product will expand as new kernels of value are found.
How can you tell which era you’re in? I asked Bien what signs show whether you’re living in a third-wave world or in one of the previous analytic eras. In his view, certain capabilities show that you’re living in a data platform world.
What are the signs you’re in the data platform stage?
Here are some of the signs to show you’ve advanced to the data platform era.
- You can easily add new data when it arrives, and allow users to explore it and incorporate its meaning into the analytic framework you already have.
- You have an analytic platform that allows users to explore data as they see fit and offers them an unlimited view of the answers.
- There is a coherent view of the data, meaning that users who ask the same questions get the same answers.
- Intermediation is manageable, and users do not experience data breadlines.
- If users need something new, they can get answers in a reasonable amount of time, without waiting in line every time they need something changed.
- Users can build applications that go up the value chain, geared for specific user communities. For example, account managers can suddenly access data about how customers are using a product, offering entirely new insights.
- Data is no longer fenced in by BI tools. It can show up in applications, email, Slack, or wherever people are working, rather than requiring a context switch.
- And last, but certainly not least, the data team transforms from a gatekeeper to a curator. They are creating and managing data as a product that provides value to users rather than being the bottleneck through which all data requests must proceed.
What are the signs you’re not yet in the data platform stage?
- You still don’t have broad agreement on which metrics track performance, which leads to data brawls, data chaos, and people being trusted instead of data.
- If you can’t add data easily.
- If there isn’t a cohesive platform for all your data and users are relying on individual self-service tools.
- If you can only do single queries against a small set of tables, you’re not in a data platform stage. You need to have a wide and coherent view of data.
- If you have silos of data that don’t allow users to ask broad questions.
- If your BI tools trap your data, keeping it separate from the tools people are using every day to do their work. BI is not a separate application in the data platform.
- If your data person is the gatekeeper and the only person who can provide answers about data. That’s the antithesis of the spirit behind the data platform. Even with self-service tools, users have to ask the data person to extract the data so it can be put into the tool. But in a platform world, the data team is a curator rather than someone who answers specific questions.
To Bien, the whole point of the data platform is to free knowledge and make the entire company wiser through data-driven decisions. “With the data platform, you’re really able to build a corpus of knowledge that gets better over time as people contribute more and more information. Data becomes a product that is designed to provide value.”