Struggling with Open? Data

A colleague of mine pointed me today to an interesting resource for mobile-related statistics. The Mobile and Development Intelligence website hosts several datasets on the mobile industry in the developing world and beyond. According to Ken Banks’ blog, it was built by the GSMA team in partnership with ThoughtWorks and PwC, with the Omidyar Network as investor.
The about page states that «MDI is an Open Data portal for the developing world mobile industry. We believe that open access to high quality data…»

So far, so good.

I then tried to take a peek at the data, and this is what I found: a sign-in/register page.

MDI login page

No, I’m sorry, but whatever you have behind this, it’s not open data.

The terms and conditions are not very open either. The licence section states that «GSMA grants You a non-exclusive, non-transferable, non-assignable licence to use and/or to access the Web Site and Data therein.» So what if I want to republish the data? Say I combine some of it with data from other sources, mash it up, and want to republish the end result as open data. Houston, I have a problem!
The section on «restrictions and permissions» is also worth a read.

Honestly, it’s disappointing that we are still seeing these things in 2012, especially coming from such a smart set of partners. I hope this gets fixed sooner rather than later.

Note: I then decided to register to investigate further and, yes, I could download the data in CSV format.

A more general issue

Once I got to the data, I realized that, according to the sources listed there, some of it did not come from MDI itself but from well-known sources such as the World Bank, the IMF and others. In fact, some of the datasets looked familiar, so I decided to compare the data shown at MDI with (supposedly) the same data as offered by some of those sources, where I can actually get it as open data.

Let’s take the rural population dataset as an example (people living in rural areas, as defined by national statistical offices):

MDI Rural Population

WB Rural Population

The first screenshot above shows the MDI data, while the second shows the WB data. Can you spot the discrepancies? It’s quite easy. The differences are not big, but they are there.
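The comparison above was done by eye on the screenshots. It could also be done programmatically. Below is a minimal Python sketch, assuming both datasets have already been exported and reshaped into a long format with hypothetical `country`, `year` and `rural_population` columns. The values are placeholders, not the actual MDI or World Bank figures.

```python
import csv
import io

# Hypothetical long-format exports; column names and values are
# placeholders, NOT the real MDI or World Bank figures.
MDI_CSV = """country,year,rural_population
Country A,2009,1000
Country A,2010,1020
Country B,2010,500
"""

WB_CSV = """country,year,rural_population
Country A,2009,1000
Country A,2010,1015
Country B,2010,500
"""


def load(text):
    """Parse a CSV string into a {(country, year): value} mapping."""
    reader = csv.DictReader(io.StringIO(text))
    return {
        (row["country"], int(row["year"])): float(row["rural_population"])
        for row in reader
    }


def compare(left, right, rel_tol=0.0):
    """Return (diffs, only_left, only_right).

    diffs lists keys present in both datasets whose values differ by more
    than rel_tol relative to the right-hand value; the other two lists hold
    keys found in only one of the datasets.
    """
    diffs = []
    for key in sorted(left.keys() & right.keys()):
        a, b = left[key], right[key]
        if abs(a - b) > rel_tol * abs(b):
            diffs.append((key, a, b))
    only_left = sorted(left.keys() - right.keys())
    only_right = sorted(right.keys() - left.keys())
    return diffs, only_left, only_right


if __name__ == "__main__":
    diffs, only_mdi, only_wb = compare(load(MDI_CSV), load(WB_CSV))
    for (country, year), mdi, wb in diffs:
        print(f"{country} {year}: MDI={mdi:g} WB={wb:g} delta={mdi - wb:+g}")
    print("only in MDI:", only_mdi)
    print("only in WB:", only_wb)
```

With the default `rel_tol=0.0` any difference at all is reported. Raising it (e.g. to `0.01`) would hide small rounding differences and show only larger ones.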

MDI lists World Bank World Development Indicators & GDF as its data sources, while the WB lists the World Development Indicators. If I trace these back, I start finding further sources: the UN, etc.

What’s the issue here? On the one hand, there’s no direct reference to the data source (ideally a URI) that would let me check whether the data presented to me actually matches the source. On the other, it doesn’t look like raw data to me but rather like a combination of sources, merged in ways I cannot find out about. As another example, the Bank’s total population dataset lists the following data sources: (1) United Nations Population Division, World Population Prospects; (2) United Nations Statistical Division, Population and Vital Statistics Report (various years); (3) census reports and other statistical publications from national statistical offices; (4) Eurostat: Demographic Statistics; (5) Secretariat of the Pacific Community: Statistics and Demography Programme; and (6) U.S. Census Bureau: International Database.
Again, there are no direct links to sources, just general pointers to organizations, and no mention of how the data has been combined.
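To make the two missing pieces concrete (a direct source URI and a note on how sources were combined), here is a minimal Python sketch of what per-value provenance could look like. The `Source`/`Observation` classes, the `missing_provenance` check and the example.org URIs are all hypothetical illustrations, not anything MDI or the World Bank actually publishes.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class Source:
    name: str
    uri: str  # hypothetical: a direct, resolvable link to the exact dataset used


@dataclass
class Observation:
    country: str
    year: int
    value: float
    sources: List[Source] = field(default_factory=list)
    method: str = ""  # hypothetical: how multiple sources were combined


def missing_provenance(observations):
    """Flag observations that cannot be traced back to a source URI,
    or that merge several sources without documenting how."""
    problems = []
    for obs in observations:
        if not obs.sources or any(not s.uri for s in obs.sources):
            problems.append((obs, "no source URI"))
        elif len(obs.sources) > 1 and not obs.method:
            problems.append((obs, "sources combined without documented method"))
    return problems


if __name__ == "__main__":
    a = Source("Source A", "https://example.org/dataset-a")  # placeholder URI
    b = Source("Source B", "https://example.org/dataset-b")  # placeholder URI
    data = [
        Observation("Country A", 2010, 1.0, [a]),
        Observation("Country B", 2010, 2.0),
        Observation("Country C", 2010, 3.0, [a, b]),
    ]
    for obs, reason in missing_provenance(data):
        print(obs.country, obs.year, "->", reason)
```

A record like this is the kind of thing that would let a reader follow each figure back to its origin instead of to an organization name.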

I don’t want to go into much detail about these issues in this post, but I wanted to note that nowadays, with transparency and accountability discussions all over the place and concerns about data manipulation coming up every other day, it wouldn’t hurt to think seriously about these issues and sort them out as soon as possible.
