The Long Tail of Data Democratization
Long tail of data
The Pareto principle, or its well-known long-tail statistical counterpart as coined by Chris Anderson, is not new in the economics of supply and demand. But how about applying these ideas from the perspective of data democratization instead?
As with the demand curve, changing business dynamics often push analysts away from a relatively small number of key datasets or popular monthly reporting data cubes and towards the huge number of niche, analyst-created datasets at the tail end of the curve. But as in supply-demand dynamics, the tail withers when demand cannot catch up with this new supply. In the data context, what matters is not just the myriad of data but also the data consumers gravitating towards it. Unless data consumers are offered access to this massive expansion of data supply, and actually need it, this variety of data has no meaning, and it is unlikely that the true shape of the tail can be accurately revealed and assessed.
Characteristics
The head, or leading end, can be represented by the existing reporting data cubes found in monthly dashboards or reports; the tail is the data created by domain analysts, often residing on their local machines. In short, the characteristics of data at the head and tail can be summarized in the following chart, largely derived from Heidorn (2008):
Driving demand down the tail
Anderson's six themes of the long tail age, together with his three critical supporting demand drivers, can similarly be superimposed onto the data long tail scenario.
1 Democratizing data creation tools
Democratizing the tools of production includes driving down the cost of production, from giving employees the capacity (time) to create, to making access to and use of analysis and BI tools more ubiquitous. Inevitably, the number of producers tends to increase, and with it comes the emergence of analytics champions.
2 Democratizing data distribution
Here, the priority is driving down the cost of consumption, essentially fattening the tail. Cloud-based data analysis and visualization tools such as Data Studio, which enable effortless sharing, allow everyone to eventually become a distributor of their own analytics pieces, driving consumption and increasing the area under the curve.
3 Connecting data supply and demand using filters as drivers
Where filters and the mapping of data supply to demand are concerned, an effective knowledge-based repository, metadata or a data dictionary, and analytics driver activities help introduce data consumers to these newly available datasets, eventually lowering the search cost and driving demand down the tail.
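To make the idea of a filter concrete, here is a minimal sketch of a searchable data dictionary. Everything in it is an assumption made for illustration: the `DatasetEntry` fields, the `search` helper, and the sample entries are hypothetical and do not describe any particular catalog tool.

```python
from dataclasses import dataclass, field


@dataclass
class DatasetEntry:
    """One hypothetical data dictionary record describing a dataset."""
    name: str
    owner: str
    description: str
    tags: set[str] = field(default_factory=set)


def search(catalog: list[DatasetEntry], keyword: str) -> list[DatasetEntry]:
    """Return entries whose name, description, or tags mention the keyword."""
    needle = keyword.lower()
    return [
        entry
        for entry in catalog
        if needle in entry.name.lower()
        or needle in entry.description.lower()
        or needle in {tag.lower() for tag in entry.tags}
    ]


# Illustrative entries only: one "head" data cube and two "tail" datasets.
catalog = [
    DatasetEntry("monthly_sales_cube", "BI team",
                 "Official monthly sales reporting cube", {"sales", "reporting"}),
    DatasetEntry("churn_survey_2020", "analyst A",
                 "Ad hoc survey of churned customers", {"churn", "survey"}),
    DatasetEntry("store_footfall_sample", "analyst B",
                 "Footfall counts used in a churn study", {"retail"}),
]

matches = search(catalog, "churn")
print([entry.name for entry in matches])
```

The point of the sketch is only that a consumer who searches by topic can discover analyst-created datasets they did not know existed, which is what lowers the search cost.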
4 Flattening of the data demand curves
This is made possible by the expanded variety of data and by the data dictionaries used to sift through it. People often do not know what is possible or what they want until they know what is available. The leading and trailing ends of the curve still exist, but the curve is largely flattened at this stage.
5 Collective importance of data at the tail
Although these datasets might largely remain standalone pieces, collectively the importance of these smaller, seemingly disparate datasets could potentially compete with that of the key data cubes where deriving insights is concerned.
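The collective weight of the tail can be illustrated with a small calculation. The sketch below assumes, purely for illustration, that dataset usage follows a Zipf-like curve in which the dataset at rank r receives usage proportional to 1/r; the functions `zipf_usage` and `tail_share`, the count of 1,000 datasets, and the head size of 20 are all hypothetical choices, not measurements.

```python
def zipf_usage(n_datasets: int, exponent: float = 1.0) -> list[float]:
    """Hypothetical usage weights: the dataset at rank r gets 1 / r**exponent."""
    return [1.0 / (rank ** exponent) for rank in range(1, n_datasets + 1)]


def tail_share(usage: list[float], head_size: int) -> float:
    """Fraction of total usage that falls outside the top `head_size` datasets."""
    ranked = sorted(usage, reverse=True)
    return sum(ranked[head_size:]) / sum(ranked)


# Assumed scenario: 20 key data cubes at the head, 980 niche datasets in the tail.
usage = zipf_usage(1000)
share = tail_share(usage, head_size=20)
print(f"Share of usage in the tail: {share:.0%}")
```

Under these assumptions the tail accounts for roughly half of all usage, even though no single tail dataset comes close to a key data cube. A steeper curve (a larger `exponent`) shrinks that share, which is one way to picture a tail that has withered.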
6 Revelation of the natural shape of data demand
Once bottlenecks and scarcity are eliminated, the data demand curve is likely to be less exponential and more diverse than we would initially have thought.
Summary
Data democratization sounds like an unwieldy journey, but when broken down, it can be likened to harvesting potential long tail profit by driving down scarcity: lowering the cost of data creation, lowering the cost of distribution, curating materials, making all that is available accessible, and presenting it in a communication-friendly manner to potential data consumers.
References and Idea:
1. Anderson, C. (2006). The long tail: Why the future of business is selling less of more. Hachette Books.
2. Heidorn, P. B. (2008). Shedding light on the dark data in the long tail of science. Library Trends, 57(2), 280–299.
3. Bronchiosaurus-in-chart idea taken from Maga Cabral’s The Long Tail Economics, https://www.slideshare.net/mgcabral/the-long-tail-economics
The idea of putting this in writing originated from a request for a team vision statement recommendation in September 2020, and partly from having been tasked with driving a similar initiative at a previous employer. Naturally, my proposal was “Data democratization for everyone”, a quick modification of the Human Rights Campaign’s “Equality for everyone”.