Data Democratization: The Dream and the Disaster
What is data democratization?
Data democratization has become a business buzzword over the last few years, but what does it mean? The basic idea is to give non-technical users access to company data. Instead of requesting a report or analysis from an internal data group, employees of all technical abilities can access the data and perform their own analysis.
Why do this? The goal is to address a few common business challenges. First, data democratization aims to eliminate data silos within a business. When data flows freely across departments, there are fewer 'territory wars' over who owns which dataset and how it can be used. Second, it tackles long turnaround times. When a company has a centralized data group, that group's to-do list tends to grow faster than the group can work through it. Delivery times start slipping. That report you asked for yesterday? Two months minimum until you see it. If everyone can work with the data, though, you're no longer waiting on a small group of analysts. Third, this shift is about getting relevant results. After all, who knows your department better than you do? With data democratization, the people asking the questions are the ones finding the answers. That is the dream of data democratization: fast, relevant results that you don't have to fight for.
So where can this go wrong? At its most dismal, data democratization is beyond frustrating! Opening the floodgates to data can mean massive duplication of effort across the organization. Let's walk through an example. DatDemo, Inc. has recently launched a company-wide data democratization initiative.
DatDemo's Marketing and Shipping departments both want to know how many orders are coming in. Thanks to data democratization, they don't have to ask for this report: they just build it. Immediately, the company has two reports trying to provide the same number. There's a problem beyond duplicated effort, though: the reports don't match. The Marketing department's report shows nearly 37% more orders each month than the Shipping department's report. Which report is right? Well, whichever was produced fastest and distributed furthest. In this scenario, truth works like an actual democracy: it gets voted on.
Executives don't want two different numbers, so they bring in that original group of analysts, who produce a third report that doesn't match either of the first two. After investigation, it turns out that Marketing was counting unique product IDs for its order count, while Shipping was counting unique tracking numbers. The order table lists accessories under their own product IDs, so Marketing overcounted orders by counting accessories separately. Meanwhile, Shipping undercounted orders because DatDemo ships items from multiple orders together under a single tracking number. Each department's assumptions were reasonable but ultimately incorrect.
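To make the mismatch concrete, here is a minimal Python sketch over a small, entirely hypothetical order-line table. The `OrderLine` fields, the IDs, and the helper names are invented for illustration and are not DatDemo's actual schema. It assumes Marketing's "unique product IDs" means counting each distinct product within each order, and that every order has its own order ID, which is the count the department reports should have agreed on.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OrderLine:
    # Hypothetical schema: one row per product within an order.
    order_id: str
    product_id: str
    tracking_number: str


# Hypothetical sample data, not real DatDemo figures.
ORDER_LINES = [
    OrderLine("O1", "P-PHONE", "T1"),
    OrderLine("O1", "P-CASE", "T1"),     # accessory listed under its own product ID
    OrderLine("O2", "P-LAPTOP", "T2"),
    OrderLine("O2", "P-BAG", "T2"),      # accessory listed under its own product ID
    OrderLine("O3", "P-PHONE", "T3"),
    OrderLine("O4", "P-CHARGER", "T3"),  # O4 ships with O3 under one tracking number
    OrderLine("O5", "P-TABLET", "T4"),
]


def marketing_count(lines):
    """Count each distinct product in each order; accessories inflate the total."""
    return len({(line.order_id, line.product_id) for line in lines})


def shipping_count(lines):
    """Count distinct tracking numbers; combined shipments deflate the total."""
    return len({line.tracking_number for line in lines})


def true_order_count(lines):
    """Count distinct order IDs, assuming each order has exactly one ID."""
    return len({line.order_id for line in lines})


if __name__ == "__main__":
    print("Marketing:", marketing_count(ORDER_LINES))  # 7
    print("Shipping: ", shipping_count(ORDER_LINES))   # 4
    print("Actual:   ", true_order_count(ORDER_LINES))  # 5
```

Both departmental numbers come from the same table and each query looks sensible on its own; the disagreement comes entirely from which column each team treated as "an order."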
Dividing the difference
Why did it all go so wrong? The scenario above occurs when companies leap into an open-data concept without first building a data governance policy, cleaning the data, preparing a data catalog, and establishing a centralized visualization solution. A data governance policy dictates which users can access each dataset, as well as how analyses are distributed and acted upon. Often, the centralized data group becomes a data review department, responsible both for maintaining the data and for reviewing analyses to ensure the data is not misinterpreted. Cleaning and documenting the data is also critical to your success. Behind the scenes, database fields are often named in misleading ways that can lead your new report developers astray. A centralized dictionary that defines each field will help avoid those mistakes. Lastly, a centralized visualization platform such as Tableau or Power BI will help reduce redundant work, provided end users are trained to search it before building new analyses.
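As a rough illustration of what a data dictionary entry might capture, here is a minimal Python sketch. The `FieldEntry` structure, the `describe` helper, and the table and field names are all hypothetical, loosely modeled on the order table in the DatDemo example; a real catalog would more likely live in a dedicated tool than in code. The idea it shows is that each field carries a plain-language definition plus a caution about common misuse, and undocumented fields are flagged rather than guessed at.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldEntry:
    # Hypothetical structure for one data dictionary entry.
    table: str
    field: str
    description: str
    caution: str = ""


# Hypothetical entries for the order table in the example above.
DATA_DICTIONARY = {
    ("orders", "order_id"): FieldEntry(
        "orders", "order_id",
        "One value per customer order. Count distinct values to get order volume."),
    ("orders", "product_id"): FieldEntry(
        "orders", "product_id",
        "Product on this order line; accessories have their own IDs.",
        caution="Not an order count: one order can contain several product IDs."),
    ("orders", "tracking_number"): FieldEntry(
        "orders", "tracking_number",
        "Carrier tracking number for a shipment.",
        caution="Not an order count: several orders can share one shipment."),
}


def describe(table: str, field: str) -> str:
    """Return a readable definition, or raise if the field is undocumented."""
    entry = DATA_DICTIONARY.get((table, field))
    if entry is None:
        raise KeyError(
            f"{table}.{field} is not in the data dictionary; "
            "check with the data team before using it")
    text = f"{table}.{field}: {entry.description}"
    if entry.caution:
        text += f" CAUTION: {entry.caution}"
    return text


if __name__ == "__main__":
    print(describe("orders", "tracking_number"))
```

Raising an error for missing fields, rather than returning an empty string, is one way to nudge report builders toward the central data group before they rely on a column nobody has defined.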
Before taking the leap into a data democracy, make sure the shift will address your company's current issues. If you decide to go forward, put in the work: establish a data governance policy, clean and document your data sources, and invest in a modern visualization solution.