Once a month we discuss a particular research method that may be of interest to people working in health economics. We’ll consider widely used key methodologies, as well as more novel approaches. Our reviews are not designed to be comprehensive but provide an introduction to the method, its underlying principles, some applied examples, and where to find out more. If you’d like to write a post for this series, get in touch. This month’s method is mobile data collection with a cloud server.
Principles
This month we depart from the usual format of this feature to bring you more of a tutorial-based outline of the method. Surveys are an important data collection tool for social scientists. Among the most commonly used surveys for health economists are the British Household Panel Survey and Health Survey for England. But we also ask patients in trials and follow-up to complete well-being and quality of life surveys like the EQ-5D. Sometimes, when there’s funding, we’ll even do our own large-scale household survey. Many studies use paper-based methods of data collection when doing these surveys, but this is more expensive and more error-prone than its digital equivalent.
Implementation
Rather than printing out survey forms then having to transcribe the results back into digital format, the surveys can be completed on a tablet or smartphone, uploaded to a central server, automatically error checked, and then saved in a ready to use format. However, setting up a system to do digital data collection can seem daunting and difficult. Most people do not know how to run a server or what software to use. In this post, we will describe the process of setting up a server using a cloud server provider and using open source survey software. Where other tutorials exist on individual steps we link to them, otherwise, the information is provided here.
Software
OpenDataKit (ODK) is a family of programs designed for the collection of data on Android devices using a flexible system for designing survey forms. The programs include ODK Collect, which is installed on phones or tablets to complete forms; ODK Aggregate, which collates responses on a server; and ODK Briefcase, which works on a computer and ‘pulls’ data from the server for use on the computer.
Using ODK, mobile data collection can be implemented using a cloud server in the following 7 steps.
1. Set up a cloud server
For this tutorial we’ll be using Digital Ocean, a cloud server provider that is easy to use, permits you to deploy as many ‘droplets’ (i.e. cloud servers) as you want, and enables you to specify your hardware requirements. ODK Aggregate will run on Amazon Web Services and Google App Engine as well and is, in fact, easier to deploy on these platforms. However, we’ve chosen to go with Digital Ocean to make sure we control where our data are being stored – a key issue to ensure we’re compliant with EU data protection regulations, especially the GDPR. Digital Ocean met all our needs for information security.
You will need an account with Digital Ocean with a credit card to pay for services. The server we use costs around $15 a month to run (or 2 cents an hour), but can be ‘destroyed’ when not in use and then rebooted when needed by storing a ‘snapshot’ before you destroy it. Once you are logged in, create a new droplet with Ubuntu 16.04.4 x64. For our purposes, 2 GB of memory and 2 vCPUs will be sufficient. You will receive an email with the root password of the droplet and IP address. More info can be found here.
To log into the server from a Windows computer, you can download and run Putty. From Linux based or Mac OS X you can use the ssh command in the terminal. We recommend you follow this tutorial to perform the initial server set up, i.e. creating a user and password.
ODK Aggregate requires Tomcat to run, so we will install that. We then provide an optional step to allow access to the server only over https: (i.e. encrypted internet connections). This provides an extra layer of security. However, ODK also features RSA-key encryption of forms when transmitting to the server, so https: can be avoided if required. You will require a registered domain name to use https:.
2. Install Tomcat 8
Once you’re logged into the server and using your new user (it’s better to avoid using the root account when possible), run the following code:
sudo apt-get update
sudo apt-get install default-jdk
sudo groupadd tomcat
sudo useradd -s /bin/false -g tomcat -d /opt/tomcat tomcat
cd /tmp
In the wget command that follows, you may want to replace the URL with a link to a more recent version of Tomcat:
wget http://mirror.ox.ac.uk/sites/rsync.apache.org/tomcat/tomcat-8/v8.5.27/bin/apache-tomcat-8.5.27.tar.gz -O apache-tomcat-8.5.27.tar.gz
sudo mkdir /opt/tomcat
sudo tar xzvf apache-tomcat-8*tar.gz -C /opt/tomcat --strip-components=1
cd /opt/tomcat
sudo chgrp -R tomcat /opt/tomcat
sudo chmod -R g+r conf
sudo chmod g+x conf
sudo chown -R tomcat webapps/ work/ temp/ logs/
Now we’re going to open up a file and edit the text using the nano text editor:
sudo nano /etc/systemd/system/tomcat.service
Then copy and paste the following:
[Unit]
Description=Apache Tomcat Web Application Container
After=network.target

[Service]
Type=forking
Environment=JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk-amd64/jre
Environment=CATALINA_PID=/opt/tomcat/temp/tomcat.pid
Environment=CATALINA_HOME=/opt/tomcat
Environment=CATALINA_BASE=/opt/tomcat
Environment='CATALINA_OPTS=-Xms512M -Xmx1024M -server -XX:+UseParallelGC'
Environment='JAVA_OPTS=-Djava.awt.headless=true -Djava.security.egd=file:/dev/./urandom'
ExecStart=/opt/tomcat/bin/startup.sh
ExecStop=/opt/tomcat/bin/shutdown.sh
User=tomcat
Group=tomcat
UMask=0007
RestartSec=10
Restart=always

[Install]
WantedBy=multi-user.target
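Then save and close (Ctrl+X, then Y, then Enter). Next, reload systemd, start Tomcat and check its status, open port 8080 in the firewall, and enable Tomcat to start at boot: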
sudo systemctl daemon-reload
sudo systemctl start tomcat
sudo systemctl status tomcat
sudo ufw allow 8080
sudo systemctl enable tomcat
Again, we’re going to open a text file, this time to change the username and password to log in:
sudo nano /opt/tomcat/conf/tomcat-users.xml
Now we need to set the username and password. Here we’ve used the username ‘admin’; the password should be changed to something strong and memorable. The username and password are changed in this block:
<tomcat-users . . .>
    <user username="admin" password="password" roles="manager-gui,admin-gui"/>
</tomcat-users>
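Now, we need to comment out the same block of text in two different files by wrapping it in <!-- and -->. This block restricts access to the Tomcat manager apps to connections coming from the server itself. The block of text is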
<Valve className="org.apache.catalina.valves.RemoteAddrValve" allow="127\.\d+\.\d+\.\d+|::1|0:0:0:0:0:0:0:1" />
and the two files can be accessed respectively with
sudo nano /opt/tomcat/webapps/manager/META-INF/context.xml
sudo nano /opt/tomcat/webapps/host-manager/META-INF/context.xml
then restart Tomcat
sudo systemctl restart tomcat
3. Install an SSL certificate (optional)
Skip this step if you don’t want to use an SSL certificate. If you do, you will need a domain name (e.g. www.aheblog.com), and you will need to point that domain name to the IP address of your droplet. Follow these instructions to do that. It is possible to self-sign an SSL certificate and so not need a domain name. However, this will not work with ODK Collect as the certificate will not be trusted by Android. SSL certificates are issued by trusted authorities, a service for which they typically charge. However, Let’s Encrypt does it for free. To use it we need to install and use certbot:
sudo apt-get update
sudo apt-get install software-properties-common
sudo add-apt-repository ppa:certbot/certbot
sudo apt-get update
sudo apt-get install certbot
Now, use certbot to get the certificates for your domain:
sudo certbot certonly
Answer the questions you are prompted with. When asked how you would like to authenticate, choose the ‘standalone’ option.
Now, we need to convert the certificate files into the Java Key Store format so that it can be used by Tomcat. We will need to do this as the root user (replace the domain name and passwords as appropriate):
su - root
cd /etc/letsencrypt/live/www.domainnamehere.com
openssl pkcs12 -export -in fullchain.pem -inkey privkey.pem -out fullchain_and_key.p12 -name tomcat
keytool -importkeystore -deststorepass PASSWORD -destkeypass PASSWORD -destkeystore MyDSKeyStore.jks -srckeystore fullchain_and_key.p12 -srcstoretype PKCS12 -srcstorepass PASSWORD -alias tomcat
mkdir /opt/tomcat/ssl
cp MyDSKeyStore.jks /opt/tomcat/ssl/
We can then switch back to our user account
su - USER
Open the following file
sudo nano /opt/tomcat/conf/server.xml
and find the following comment in the document:
<!-- Define a SSL Coyote HTTP/1.1 Connector on port 8443 -->
Replace the connector definition beneath it with the following, making sure to input the correct password:
<Connector port="8443" protocol="org.apache.coyote.http11.Http11NioProtocol"
    maxThreads="150" SSLEnabled="true" scheme="https" secure="true"
    clientAuth="false" sslProtocol="TLS"
    keystoreFile="/opt/tomcat/ssl/MyDSKeyStore.jks"
    keystoreType="JKS" keystorePass="PASSWORD"/>
Exit the file (Ctrl+X, Y, and Enter) and restart the service
sudo systemctl restart tomcat
Now, we want to block unsecured (HTTP) connections and allow only encrypted (HTTPS) connections:
sudo ufw delete allow 8080
sudo ufw allow 8443
If you use a Let’s Encrypt certificate, it will need to be renewed every 90 days. Renewal itself is an automatic process, but converting the renewed certificate to JKS and saving it into the Tomcat directory is not. You can either do this manually each time or write a Bash script to do it.
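As a rough illustration of such a script, the sketch below repeats the conversion commands from this step and then restarts Tomcat. The function names, the DRY_RUN switch, and the default values of DOMAIN and KEYSTORE_PASS are our own placeholders, and the script assumes certbot’s default layout under /etc/letsencrypt/live/. Note that passwords given on the command line can be seen by other users on the server, so treat this as a starting point rather than a finished tool.

```shell
#!/usr/bin/env bash
# Sketch: rebuild the Tomcat keystore after a Let's Encrypt renewal.
# DOMAIN, KEYSTORE_PASS, and the function names are placeholders (assumptions).

DOMAIN="${DOMAIN:-www.domainnamehere.com}"
KEYSTORE_PASS="${KEYSTORE_PASS:-PASSWORD}"
LIVE_DIR="${LIVE_DIR:-/etc/letsencrypt/live/$DOMAIN}"   # assumed certbot default
SSL_DIR="${SSL_DIR:-/opt/tomcat/ssl}"

# Run a command, or only print it when DRY_RUN=1 (useful for checking the script).
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Convert the renewed PEM files to JKS, copy them to Tomcat, and restart it.
refresh_keystore() {
  local p12="$LIVE_DIR/fullchain_and_key.p12"
  local jks="$LIVE_DIR/MyDSKeyStore.jks"

  # Bundle certificate chain and private key into a PKCS12 file.
  run openssl pkcs12 -export -in "$LIVE_DIR/fullchain.pem" \
    -inkey "$LIVE_DIR/privkey.pem" -out "$p12" -name tomcat \
    -passout "pass:$KEYSTORE_PASS" &&
  # Remove the old keystore so the import does not ask about overwriting.
  run rm -f "$jks" &&
  # Import the PKCS12 bundle into a fresh Java Key Store.
  run keytool -importkeystore -noprompt \
    -deststorepass "$KEYSTORE_PASS" -destkeypass "$KEYSTORE_PASS" \
    -destkeystore "$jks" -srckeystore "$p12" -srcstoretype PKCS12 \
    -srcstorepass "$KEYSTORE_PASS" -alias tomcat &&
  run cp "$jks" "$SSL_DIR/" &&
  run systemctl restart tomcat
}
```

Calling refresh_keystore as root after each renewal performs the conversion, while DRY_RUN=1 refresh_keystore only prints the commands so you can check them first.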
4. Install ODK Aggregate
First, we’ll install the database software that will hold the collected data. We’ll use PostgreSQL, although MySQL can also be used. The following code will install the right packages:
sudo apt-get update
sudo apt-get install postgresql-9.6
Now we are going to install ODK Aggregate. This is straightforward, as the ODK Aggregate installer provides a walkthrough. It is important to select the correct options during this process, which should be obvious from the above tutorial. You may need to replace the link in the first command of the following code if the download does not work.
sudo wget https://opendatakit.org/download/4456 -O /tmp/linux-x64-installer.run
sudo chmod +x /tmp/linux-x64-installer.run
sudo /tmp/linux-x64-installer.run
Then follow the instructions. Now, we are going to connect the database to ODK Aggregate:
sudo -u postgres psql
\cd '/ODK/ODK Aggregate'
\i create_db_and_user.sql
\q
Now copy the ODK installation to the Tomcat directory
sudo cp /ODK/ODK\ Aggregate/ODKAggregate.war /opt/tomcat/webapps
And the final step is to restart Tomcat
sudo systemctl restart tomcat
To test whether the installation has been successful go to (replacing the URL as appropriate):
https://www.domainnamehere.com:8443/ODKAggregate
or if you do not have a domain name use:
http://<IP address>:8080/ODKAggregate
Use this URL for accessing ODK Aggregate.
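If you prefer to check from the command line before opening a browser, something like the following sketch may help. The function names are our own, and the scheme, host, and port are whatever applies to your installation.

```shell
# Sketch: check from any machine that ODK Aggregate is answering.
# Function names are illustrative, not part of ODK.

# Build the Aggregate URL from scheme, host, and port.
aggregate_url() {
  printf '%s://%s:%s/ODKAggregate\n' "$1" "$2" "$3"
}

# Print the HTTP status code returned by the URL (000 if it cannot be reached).
aggregate_status() {
  curl -s -L -o /dev/null -w '%{http_code}' --max-time 10 "$1"
  echo
}
```

For example, aggregate_status "$(aggregate_url https www.domainnamehere.com 8443)" prints the status code the server returns. A code of 200 suggests that Tomcat is serving the application, while 000 usually points to a firewall, DNS, or Tomcat problem.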
To log on, use the ‘admin’ account and the password ‘aggregate’. Once you are logged in you can change the admin account password and create new user accounts as required.
5. Programme a survey
Surveys can be complex, with logical structures that skip questions for certain responses, different types of response like numbers, text, or multiple choice, and sometimes a need for signatures or images. Multiple languages are often needed. All of this is possible in ODK. To programme your survey into a format that ODK can use, you can use XLSform, a standard for authoring forms in Excel. The website has a good tutorial. XLSform is very flexible, so it is worth learning as much of it as possible. A few key things and tips to note:
- Skip and logical sequences are managed with the ‘Relevant’ column;
- If you want to be able to skip big groups of questions, you should use ‘groups’;
- If the same question is to be asked multiple (but an unknown number of) times you should use ‘Repeats’. Note that when you download the data the multiple responses from repeat type questions will be grouped in their own .csv files rather than with the rest of the survey;
- Encryption of a survey is managed by putting an RSA public key in the Settings tab;
- There are many different question types, and it is worth learning them;
- You can use the response from one question as text in another question, for example, someone’s name – using ${} syntax to refer to questions as with the ‘Relevant’ column;
- You can automatically collect default data like start and end time to check there’s been no cheating by data collectors.
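For the encryption tip above, an RSA key pair can be generated with OpenSSL. The following is a minimal sketch: the function name and file names are placeholders of our own, and it assumes the public key is pasted, without its BEGIN and END lines, into the public_key column of the form’s settings sheet.

```shell
# Sketch: create an RSA key pair for encrypted ODK forms.
# The function name and file names are placeholders, not ODK conventions.
make_odk_keypair() {
  dir="${1:-.}"

  # 2048-bit private key; keep this file off the server and off the devices.
  openssl genpkey -algorithm RSA -pkeyopt rsa_keygen_bits:2048 \
    -out "$dir/odk_private.pem" 2>/dev/null || return 1

  # Matching public key in PEM format.
  openssl rsa -in "$dir/odk_private.pem" -pubout \
    -out "$dir/odk_public.pem" 2>/dev/null || return 1

  # Print the base64 body on one line, without the BEGIN/END marker lines.
  grep -v '^-----' "$dir/odk_public.pem" | tr -d '\n'
  echo
}
```

Running make_odk_keypair ~/odk-keys (with the folder already created) writes both files there and prints the string to paste into the form. The private key is only needed later, on the computer where the data are decrypted.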
Once your form is complete, you can convert it here; the converter will notify you of any errors. Once you have an .xml file ready to go, you can upload it in the ‘Form Management’ section of the ODK Aggregate interface.
6. Use ODK Collect
ODK Collect can be downloaded from the Google Play store on any Android device. It is easy to use and a number of training guides exist. You will firstly need to link the app to the server by going to General Settings -> Server and inputting the URL that directs to your ODK Aggregate interface. The form you uploaded to your server can be downloaded with ‘Get Blank Form’ in the main menu and then data collected with it by selecting ‘Fill Blank Form’. Swiping left or right moves between questions. You can save at any point and come back to the survey at a later time. There are options you may also want to consider such as ‘Auto-finalising’, which means that once a survey is complete it is no longer accessible on the device, and ‘Auto send’ which will automatically send the data to the server when the form is finalised and there’s an internet connection. What you choose depends on your information security requirements.
7. Download the data
Submissions to the ODK Aggregate server need to be downloaded to a computer in order to be decrypted and the data processed and analysed. Exporting files from the server requires a number of pieces of software on the computer to which it is being downloaded:
- Java 8. Update to the latest version of Java or install it if it is not already on the computer. Java can be downloaded here.
- Unlimited Strength JCE Policy Files. These files are necessary if you are using encrypted forms and can be downloaded from here. To install the files, extract the contents of the compressed file, and copy and paste the files in the folder to the following location [java-home]/lib/security/policy/unlimited.
- ODK Briefcase. This can be downloaded from the OpenDataKit website.
When you first launch ODK Briefcase the first thing you must do is choose a storage location where it will save all data downloads. Once you have done this go to the ‘Pull’ tab to download the surveys. Click ‘Connect…’ to input the URL of the ODK Aggregate instance. Select which forms you will need to download data for and click ‘Pull’ in the lower right-hand side of the window.
To download data submissions go to the ‘Export’ tab. The available forms are listed in this window. Next to the form for which you want to download submissions, input a storage location for the data. In the next row, select the location of the private RSA key if you are using encrypted forms. The key must be in .pem format. Many programs will generate .pem keys, but some provide the keys as strings of text that will be saved as text files. If the text begins with ‘-----BEGIN RSA PRIVATE KEY-----’, then simply change the file extension to .pem.
Applications
Now you should be good to go. Potential applications are extensive. The ODK system has been employed in evaluative studies (of performance-based financing, for example), to conduct discrete choice experiments, or for more general surveillance of health service use and outcomes.
There are ways of extending these tools further, such as collecting GPS and location and map data through Open Map Kit, which links to Open Street Map. There are also private companies who use OpenDataKit-based products to offer data collection and management services like SurveyCTO. However, we have found a key part of complying with data protection rules involves knowing exactly where data will be stored and having complete control over accessing it, which many services cannot offer. The flexibility of managing your own server permits more control, for example, you can write scripts to check for data submissions and to process them and upload them to another server or you can host other data collection tools when you need. Many universities or institutions may provide these services ‘in-house’, but if they do not support the software it can be difficult using company servers. A cloud server provider gives us an alternative solution that can be up and running in an hour.
Credit