By Mangala Namasivayam, Senior Programme Officer, ARROW
One year has passed since the adoption of the 2030 Agenda for Sustainable Development and the Sustainable Development Goals (SDGs). The development world has time and again acknowledged the importance of data if we are realistic about translating the 17 aspirational ‘Global Goals’ and 169 targets from paper to practice and properly implementing and monitoring them on a global scale.
A key lesson learnt from the Millennium Development Goals (MDGs) was that a lack of reliable data can and will undermine governments’ ability to set goals, optimize investment decisions, and measure progress. In line with that, the importance of data as well as the inadequacies of current data sources were brought to the forefront and discussed at great length during the negotiation stages of the goals and the indicator framework. Whilst the solutions for overcoming the data challenge in measuring, monitoring and holding governments accountable are complex and convoluted by their very nature, harnessing the potential of the data revolution requires careful analysis and handling.
The loudest rhetoric around the 2030 Agenda has been the much-publicized ‘Leaving No One Behind’. In reality, doing so requires better data to monitor and implement the goals: data disaggregated by geographic location and by sex and gender, and more specifically by population group, such as children, women of reproductive age, youth, older people, ethnic and religious minorities, and other vulnerable groups that are usually the ones left furthest behind. In monitoring the 2030 Agenda, there is a need to emphasize trends, whether progressive or regressive, and to monitor policies and policy changes, not just outcomes. The statistical analysis should complement, not substitute for, qualitative assessments.[i]
The data that is generated also needs to be easily accessible to and useable by decision-makers, civil society groups, the private sector and citizens. However, the pertinent question at this point is: are we on track to maximize the potential of the data revolution, or “data deluge”, and what are some of the challenges ahead? How does this ‘revolution’ address the added data gap that continues to exist in gender data?
There are many limitations of data on development, resulting from the culture of the development sector, which usually does not prioritise investment in extensive data systems and in which evidence-informed policy making is not traditionally cultivated. However, data on women and girls suffer from a more systemic issue that is rooted in the intrinsic biases in measurement and attention, resulting in bad data as well as a situation where there is no data on critical dimensions of their lives. This brings us to the question of which is worse: no data or bad data?
A lack of data is commonly found in aspects of the lives of women and girls that are not highly valued by society. These include unpaid work in home production, time spent fetching fuel and carrying water, housework, childcare and eldercare, all activities carried out mostly by women and girls. What is not counted is not valued and what is not valued is often not counted. It is a vicious cycle that compounds the systemic inequalities that women and girls are subjected to. These activities are part of a ‘care economy’ that is undervalued and therefore not counted in official statistics.[ii]
Bad data, on the other hand, is data that systematically misrepresents reality, particularly in ways that make women appear to be more dependent and less productive than they actually are. It results from traditional gender role stereotypes where the man is the producer and provider while the woman is the reproducer and caretaker.[iii] Bad data introduces systematic biases in data collection, reinforces stereotypes and hampers the ability to influence gender-specific policies, track progress and demand accountability.
Big data is an umbrella term referring to the large amounts of digital data continually generated by the global population. A large share of this output is “data exhaust”, or records generated passively as a by-product of everyday interactions with digital products or services.[iv] It has to be understood that big data is more a process than an object, a verb more than a noun, because it is not so much about the size of the data as it is about the analytical process that characterises the data as big.
Today, anything from mobile call logs, online searches, mobile-banking transactions, online user-generated content such as blog posts, Facebook posts and tweets, satellite images and a range of other online activities can be turned into actionable information using computational techniques. These can unveil trends and patterns within and between extremely large socioeconomic datasets.
Big data for development is a concept that refers to the identification of sources of big data relevant to the policy and planning of development programmes. Sources of big data for development are those which can be analysed to gain insight into human well-being and development. They are usually digitally generated, passively produced (a by-product of interactions with digital services), automatically collected, geographically or temporally trackable, and continuously analysed (in real time).[v]
The amount of available digital data at the global level grew from 150 exabytes in 2005 to 1200 exabytes in 2010. It is projected to increase by 40% annually in the next few years, which is about 40 times the much-debated growth of the world’s population. This rate of growth means that the stock of digital data is expected to increase 44 times between 2007 and 2020, doubling every 20 months.[vi]
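The relationship between these growth figures can be checked with a little arithmetic. The following is a minimal Python sketch and is not part of the cited source: the function names are illustrative, and it assumes steady compound growth, which real data volumes need not follow.

```python
import math


def doubling_time_months(annual_growth_rate):
    """Months needed to double at a constant annual growth rate (0.40 = 40%)."""
    return 12 * math.log(2) / math.log(1 + annual_growth_rate)


def implied_annual_rate(multiple, years):
    """Constant annual growth rate that yields `multiple`-fold growth over `years`."""
    return multiple ** (1 / years) - 1


# 150 -> 1200 exabytes between 2005 and 2010 (figures quoted above)
rate_2005_2010 = implied_annual_rate(1200 / 150, 5)
# 44-fold growth between 2007 and 2020 (figure quoted above)
rate_2007_2020 = implied_annual_rate(44, 2020 - 2007)

for label, rate in [
    ("2005-2010 observed", rate_2005_2010),
    ("40% annual projection", 0.40),
    ("2007-2020 projection", rate_2007_2020),
]:
    print(f"{label}: {rate:.1%} per year, doubling every "
          f"{doubling_time_months(rate):.0f} months")
```

Under this assumption, the eightfold rise from 2005 to 2010 corresponds to a doubling roughly every 20 months, while a 40% annual rate implies a longer doubling time of about 25 months. The quoted figures are therefore best read as rounded estimates rather than as a single consistent growth rate.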
At the most general level, properly analysed, these new data can provide snapshots of the well-being of populations at high frequency, high degrees of granularity, and from a wide range of angles, narrowing both time and knowledge gaps. Real-time awareness of the status of a population and real-time feedback on the effectiveness of policy actions should in turn lead to a more agile and adaptive approach to international development, and ultimately, to greater resilience and better outcomes. Big data for Development is about turning imperfect, complex, often unstructured data into actionable information.[vii]
The appeal and promise of big data lie mainly in the fact that it is produced at a much more disaggregated level (e.g. individual instead of country level), allowing portions of populations that were previously neglected, such as women and girls, to be considered in decision-making. Big data can help deepen knowledge of women and girls’ individual preferences and the collective behaviour that results from these preferences. However, applying big data analytics to guide development work faces several challenges.
From a technological and analytical perspective, a huge challenge in using big data for development lies in both analysis and interpretation. The issue of selection bias is very real and continues to be the most significant barrier to the use of big data for official gender statistics. At this point, it has to be understood that big data is not whole data (Boyd and Crawford 2012)[viii]. For example, the digital divide (in something as simple as mobile phone ownership between men and women in many developing countries) could greatly alter representation in data analysis. Another challenge is that of spurious correlations, a situation in which two variables have no direct causal connection but are incorrectly assumed to be connected as a result of either coincidence or the presence of a third hidden factor, something that becomes more likely as the amount of data used increases[ix]. Correlations are potentially useful because of their predictive power, but correlation does not imply causation, which means it gives very little insight into why two things are related.
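The digital divide point can be made concrete with a small worked example. The Python sketch below uses entirely hypothetical numbers (a population that is half women, with 50% of women and 80% of men owning a mobile phone). It shows how women end up under-represented in a phone-derived dataset, and how a simple post-stratification weight restores the group balance. That weight is only one possible correction, and it assumes the true population shares are known from a traditional source such as a census.

```python
# Hypothetical figures for illustration only; not taken from any survey.
population_share = {"women": 0.50, "men": 0.50}   # true shares, e.g. from a census
phone_ownership = {"women": 0.50, "men": 0.80}    # assumed ownership rates

# Share of each group among phone owners, i.e. in the "big data" sample.
owners = {g: population_share[g] * phone_ownership[g] for g in population_share}
total_owners = sum(owners.values())
sample_share = {g: owners[g] / total_owners for g in owners}

# Post-stratification weight: how much each sampled person must count
# for the sample to match the true population shares.
weights = {g: population_share[g] / sample_share[g] for g in sample_share}
weighted_share = {g: sample_share[g] * weights[g] for g in sample_share}

for g in population_share:
    print(f"{g}: population {population_share[g]:.0%}, "
          f"raw sample {sample_share[g]:.1%}, "
          f"weight {weights[g]:.2f}, reweighted {weighted_share[g]:.0%}")
```

With these assumed rates, women make up about 38% of the phone-derived sample despite being half of the population. Reweighting corrects the group shares, but it cannot recover the experiences of women without phones if these differ from those of women with phones.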
While it is tempting to get carried away with all that big data offers for development and public good, it is also pertinent that we stay aware of the ethical trade-off that this might entail. It is a difficult question and the lines often get blurred when the debate is between private rights and public good. When, and to what degree, does public good outweigh the individual’s right to privacy and the security of their information? Privacy is a fundamental human right and an important pillar of democracy, and is a persistent concern when using big data for development. The right to agree or refuse is often not available to the public in this context, and this becomes problematic as people unknowingly consent to the collection and use of their data without understanding how it may be used.
These concerns are closely related to the extent of the legal mandate needed by data collectors to collect, analyse and disseminate data. They also extend to the acquisition and use of data from third parties, whose methods and means might not always be transparent. Are data brokers, e-commerce providers and other companies legally required to be transparent about the data they collect and retain?
We need to acknowledge that advances in big data hold the promise of exceptional new demographic knowledge for practitioners in development. However, big data is not the one-stop solution to all our data gaps and needs. We need to be aware that being able to count on plenty of data does not mean that we have the right data, as data bias could lead to conclusions that are inaccurate or, worse, misleading. Therefore, big data needs to be used in tandem with traditional data sources, adding new layers of detail while traditional sources are used to validate it[x].
We also have the social responsibility to balance the use of big data for public good with respect for privacy and individual rights. Both data literacy and transparency are needed to drive and enhance social discourse as we continue to debate the desired balance between data collectors and their transparency on the one hand, and consumers and their understanding of the implications of their digital interactions and activities on the other.
Civil society groups are crucial but currently underrepresented in debates about privacy and the rights of technology users. Civil society has a responsibility to build critical awareness of the ways big data is being used by corporations, governments and other actors to sort, categorize and intervene in low-income and middle-income countries.[xi] As civil society, it is pertinent for us to create platforms and engage actively in conversations about the potential of data and its challenges. As we build momentum towards the follow-up and review processes of the SDGs, we need to identify ways to work with big data as well as critique it when necessary.
Making good use of big data will require continuing existing collaborations and establishing new ones among various stakeholders, including data scientists and practitioners, leveraging their strengths to understand the technical possibilities as well as the context within which insights can be practically implemented.