Although the term 'dark data' has a slightly sinister ring to it, there is nothing illegal or immoral about it. The term simply refers to information that organisations have gathered and stored, perhaps in the course of other activities such as business communication, but are not currently using. 'Unstructured data' is probably a less emotive term, and one that also hints at the challenges presented by information of this type.
Estimates vary as to the proportion of information held by organisations that falls into this category, but it is generally put at more than half of everything gathered and, by some estimates, as much as ninety per cent.
An obvious example is old email traffic, which may contain substantial and usable information about customers and competitors. Website visitor records, accounts, telephone data and even GPS records are also potential resources that are rarely put to good commercial use. There are probably many others yet to be discovered.
In the past, these underused sources were often seen as simply too cumbersome to analyse. Such data sets would necessarily be incomplete, and any nuggets of valuable information would probably be buried in very large quantities of useless verbiage. Starting new research from scratch might therefore have made better commercial sense.
Modern techniques of data analysis may be changing the economics of unstructured data. Mining for gold may involve processing a great deal of unwanted material, but if the technology can cope, the process can still be worthwhile; it seems that the same holds true for data mining.