Access Control
Introducing access control means restricting who can see and use a given dataset. You are therefore no longer publishing Open Data. Instead, you are either publishing Shared Data, with steps in place to authenticate and authorise suitable users, or negotiating bespoke one-to-one contracts with data reusers before providing access to the data.
A common access control measure is the creation of login credentials, such as a username and password. When creating their account, a given user will likely be asked to accept specific terms and conditions relating to how they will access and use the data.
When might access control measures be valuable?
- The value of a dataset lies in the sensitive, detailed data itself, and attempts to mitigate the risk of identifying individuals or locations would significantly reduce or remove its value to potential reusers.
Underlying assumptions
By introducing access control measures, you are restricting the number of individuals able to directly access and use a given dataset. You are therefore reducing the probability of potentially sensitive data being accessed by a wider audience.
With a small number of users, it becomes easier to monitor how data reusers are using the data, and as such, it is easier to confirm that the data is being used in ways that follow the terms and conditions or licensing.
Key considerations and risks
While it is often tempting to restrict access to a given dataset, data publishers must consider whether the perceived benefits outweigh the overhead of developing and running an access control system. They should also ask whether there really are significant risks in making the data either publicly available under the same terms and conditions, or openly available for all to reuse.
Determining who should be given access
You will need to ensure you have clarified internally, and communicated externally, which data reusers should be able to access the data.
Access could be granted to specific groups, e.g. academic researchers or local authorities. This may suggest that it is enough to check whether individuals work for an appropriate organisation. However, there will always be edge cases to consider: not all academic researchers have a university affiliation, and many companies deliver outsourced services to local authorities.
Access could be granted to individuals or organisations working on a specific use-case, such as the adoption of electric vehicles or retrofitting social housing. Again, there are challenges here, as assessing whether a specific use-case is relevant is inherently subjective.
Clarify internally and communicate externally who makes the final decision, and how users excluded by your existing access policy can call for a review of this decision.
Authorising access
If many individuals are likely to want access to your dataset, there will be overhead in setting up appropriate systems to verify data reusers. A single person or team could be responsible for reviewing and verifying incoming requests, or this could be supplemented by a system that automatically recognises and accepts certain email domains. Either way, it will need to be clear who is accountable for authorisation decisions.
You will also need to consider how much information to ask of data reusers when they create an account, and how this information will be stored internally. You might ask for organisation name and address, and official registration numbers, but consider whether this is needed or whether a simple email address will suffice. If you believe your data is truly sensitive, consider how you might introduce a two-factor authentication system.
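The idea of automatically accepting certain email domains, with everything else passed to a person or team, could be sketched as follows. This is a minimal illustration only: the function name `triage_request` and the domains in `AUTO_APPROVED_DOMAINS` are hypothetical placeholders, and the sketch assumes the email address has already been confirmed as belonging to the applicant (for example via a confirmation link).

```python
# Hypothetical allow-list of domains; these are placeholders, not recommendations.
AUTO_APPROVED_DOMAINS = {"example.ac.uk", "example.gov.uk"}


def triage_request(email: str) -> str:
    """Return 'auto-approve' or 'manual-review' for an access request."""
    # Split on the last "@" so the domain is everything after it.
    _, separator, domain = email.strip().rpartition("@")
    if not separator or not domain:
        # Not a usable address: leave the decision to a human reviewer.
        return "manual-review"
    # Exact, case-insensitive match only; subdomains are NOT accepted.
    if domain.lower() in AUTO_APPROVED_DOMAINS:
        return "auto-approve"
    return "manual-review"
```

Anything that does not match exactly falls back to manual review, which keeps the edge cases described above (independent researchers, outsourced service providers) in front of whoever is accountable for the decision.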
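One common form of two-factor authentication is a time-based one-time password (TOTP, as described in RFC 6238), where the user enters a short code from an authenticator app in addition to their password. The sketch below shows the mechanism using only the Python standard library. The function names `totp` and `verify_totp` are illustrative assumptions, and a real deployment would more likely rely on a maintained library and would also handle clock drift, rate limiting and secure storage of the shared secret.

```python
import base64
import hashlib
import hmac
import struct
import time
from typing import Optional


def totp(secret_b32: str, at: Optional[float] = None,
         step: int = 30, digits: int = 6) -> str:
    """Compute a TOTP code (HMAC-SHA1) for a base32-encoded shared secret."""
    key = base64.b32decode(secret_b32, casefold=True)
    now = time.time() if at is None else at
    # Number of whole time steps since the Unix epoch.
    counter = int(now // step)
    digest = hmac.new(key, struct.pack(">Q", counter), hashlib.sha1).digest()
    # Dynamic truncation: the low 4 bits of the last byte choose an offset.
    offset = digest[-1] & 0x0F
    value = struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(value % 10 ** digits).zfill(digits)


def verify_totp(secret_b32: str, submitted: str,
                at: Optional[float] = None) -> bool:
    """Check a submitted code against the current time step only."""
    expected = totp(secret_b32, at=at)
    # Constant-time comparison avoids leaking how many digits matched.
    return hmac.compare_digest(expected.encode(), submitted.encode())
```

The shared secret would be generated once per user at enrolment and stored alongside, but separately protected from, their other account details.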
Terms & Conditions and licensing
You will need to create a specific set of terms and conditions for users to sign up to, which expressly tell reusers how they are able to use the dataset. When drafting these, be mindful that it is easy to accidentally restrict legitimate and intended uses.
Transparency
It will be important to proactively state what data is contained within an access-controlled dataset. This will reduce the number of people who go through the process of gaining access only to find that the data is not useful to them.
It will also be important to proactively state what terms and conditions and licensing constraints are placed upon the data, so data reusers can identify if this is a valuable source of information for them.
You cannot remove risks entirely
Data cannot be truly contained. Even if there are only a small number of individuals and organisations who have access, it is possible that individuals could share either the data or their log-in details with others, or that those with hostile intentions could find passwords. You might reduce the latter risk by encouraging the use of strong passwords.
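Encouraging strong passwords could be supported by a simple check at account creation. The sketch below is one possible approach, favouring minimum length and a block-list of common passwords over character-composition rules. The function name `password_problems`, the 12-character minimum and the tiny `COMMON_PASSWORDS` set are all assumptions for illustration; a real block-list would be far larger.

```python
from typing import List

# Placeholder block-list; a real one would contain many thousands of entries.
COMMON_PASSWORDS = {"password", "123456", "qwerty", "letmein"}


def password_problems(password: str, username: str = "",
                      min_length: int = 12) -> List[str]:
    """Return a list of human-readable problems; an empty list means acceptable."""
    problems = []
    if len(password) < min_length:
        problems.append(f"shorter than {min_length} characters")
    if password.lower() in COMMON_PASSWORDS:
        problems.append("appears on the common-password list")
    if username and username.lower() in password.lower():
        problems.append("contains the username")
    return problems
```

Returning the list of problems, rather than a simple pass or fail, lets the sign-up form tell the user what to change.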
Even with strict terms and conditions in place, you cannot prevent others from using the data in ways you do not anticipate. However, you may be able to take legal action if you become aware of this.