Method of the month: Mobile data collection with a cloud server

Once a month we discuss a particular research method that may be of interest to people working in health economics. We’ll consider widely used key methodologies, as well as more novel approaches. Our reviews are not designed to be comprehensive but provide an introduction to the method, its underlying principles, some applied examples, and where to find out more. If you’d like to write a post for this series, get in touch. This month’s method is mobile data collection with a cloud server.

Principles

This month we depart from the usual format of this feature to bring you more of a tutorial-based outline of the method. Surveys are an important data collection tool for social scientists. Among the surveys most commonly used by health economists are the British Household Panel Survey and the Health Survey for England. But we also ask patients in trials and at follow-up to complete well-being and quality of life surveys like the EQ-5D. Sometimes, when there’s funding, we’ll even do our own large-scale household survey. Many studies use paper-based methods of data collection for these surveys, but this is more expensive and more error-prone than its digital equivalent.

Implementation

Rather than printing out survey forms and then having to transcribe the results back into digital format, the surveys can be completed on a tablet or smartphone, uploaded to a central server, automatically error checked, and then saved in a ready-to-use format. However, setting up a system to do digital data collection can seem daunting and difficult. Most people do not know how to run a server or what software to use. In this post, we will describe the process of setting up a server using a cloud server provider and open source survey software. Where other tutorials exist on individual steps we link to them; otherwise, the information is provided here.

Software

OpenDataKit (ODK) is a family of programs designed for the collection of data on Android devices using a flexible system for designing survey forms. The programs include ODK Collect, which is installed on phones or tablets to complete forms; ODK Aggregate, which collates responses on a server; and ODK Briefcase, which works on a computer and ‘pulls’ data from the server for use on the computer.

Using ODK, mobile data collection can be implemented using a cloud server in the following 7 steps.

1. Set up a cloud server

For this tutorial we’ll be using Digital Ocean, a cloud server provider that is easy to use, permits you to deploy as many ‘droplets’ (i.e. cloud servers) as you want, and enables you to specify your hardware requirements. ODK Aggregate will run on Amazon Web Services and Google App Engine as well and is, in fact, easier to deploy on these platforms. However, we’ve chosen to go with Digital Ocean to make sure we control where our data are being stored – a key issue to ensure we’re compliant with EU data protection regulations, especially the GDPR. Digital Ocean met all our needs for information security.

You will need an account with Digital Ocean with a credit card to pay for services. The server we use costs around $15 a month to run (or 2 cents an hour), but can be ‘destroyed’ when not in use and then rebooted when needed by storing a ‘snapshot’ before you destroy it. Once you are logged in, create a new droplet with Ubuntu 16.04.4 x64. For our purposes, 2 GB of memory and 2 vCPUs will be sufficient. You will receive an email with the root password of the droplet and IP address. More info can be found here.

To log into the server from a Windows computer, you can download and run Putty. From Linux based or Mac OS X you can use the ssh command in the terminal. We recommend you follow this tutorial to perform the initial server set up, i.e. creating a user and password.

ODK Aggregate requires Tomcat to run, so we will install that. We then provide an optional step to allow access to the server only over https: (i.e. encrypted internet connections). This provides an extra layer of security. However, ODK also features RSA-key encryption of forms when transmitting to the server, so https: can be avoided if required. You will require a registered domain name to use https:.

2. Install Tomcat 8

Once you’re logged into the server and using your new user (it’s better to avoid using the root account when possible), run the following code:

sudo apt-get update
sudo apt-get install default-jdk
sudo groupadd tomcat
sudo useradd -s /bin/false -g tomcat -d /opt/tomcat tomcat
cd /tmp

In the next line, you may want to replace the link with one to a more recent version of Tomcat (and change the file name after -O to match):

wget http://mirror.ox.ac.uk/sites/rsync.apache.org/tomcat/tomcat-8/v8.5.27/bin/apache-tomcat-8.5.27.tar.gz -O apache-tomcat-8.5.27.tar.gz
sudo mkdir /opt/tomcat
sudo tar xzvf apache-tomcat-8*tar.gz -C /opt/tomcat --strip-components=1
cd /opt/tomcat
sudo chgrp -R tomcat /opt/tomcat
sudo chmod -R g+r conf
sudo chmod g+x conf
sudo chown -R tomcat webapps/ work/ temp/ logs/
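
If you do swap in a newer Tomcat release, one way to keep the download link and the saved file name in step is to build both from a single version number. The following is only a sketch: the helper names are ours, and the Apache archive address is an assumption that should be checked against the Tomcat download page (the mirror used above may serve a different set of versions).

```shell
#!/usr/bin/env bash
# Sketch: derive the Tomcat download URL and file name from one version string,
# so the two cannot drift apart when you move to a newer release.
# ASSUMPTION: the Apache archive layout used below; verify before relying on it.

tomcat_file() {
  # $1 = full version number, e.g. 8.5.27
  printf 'apache-tomcat-%s.tar.gz\n' "$1"
}

tomcat_url() {
  # The major version ("8" in 8.5.27) selects the tomcat-8 directory.
  local version="$1"
  local major="${version%%.*}"
  printf 'https://archive.apache.org/dist/tomcat/tomcat-%s/v%s/bin/%s\n' \
    "$major" "$version" "$(tomcat_file "$version")"
}

# Usage (run from /tmp, in place of the wget line):
#   wget "$(tomcat_url 8.5.27)" -O "$(tomcat_file 8.5.27)"
```

The tar command in the steps above uses a wildcard, so it will unpack whichever 8.x archive has been downloaded.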

Now we’re going to open up a file and edit the text using the nano text editor:

sudo nano /etc/systemd/system/tomcat.service

Then copy and paste the following:

[Unit]
Description=Apache Tomcat Web Application Container
After=network.target

[Service]
Type=forking

Environment=JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk-amd64/jre
Environment=CATALINA_PID=/opt/tomcat/temp/tomcat.pid
Environment=CATALINA_HOME=/opt/tomcat
Environment=CATALINA_BASE=/opt/tomcat
Environment='CATALINA_OPTS=-Xms512M -Xmx1024M -server -XX:+UseParallelGC'
Environment='JAVA_OPTS=-Djava.awt.headless=true -Djava.security.egd=file:/dev/./urandom'

ExecStart=/opt/tomcat/bin/startup.sh
ExecStop=/opt/tomcat/bin/shutdown.sh

User=tomcat
Group=tomcat
UMask=0007
RestartSec=10
Restart=always

[Install]
WantedBy=multi-user.target

Then save and close (Ctrl+x then Y then Enter). Going on,

sudo systemctl daemon-reload
sudo systemctl start tomcat
sudo systemctl status tomcat
sudo ufw allow 8080
sudo systemctl enable tomcat

Again, we’re going to open a text file, this time to change the username and password to log in:

sudo nano /opt/tomcat/conf/tomcat-users.xml

Now we need to set the username and password. Here we’ve used the username ‘admin’; the password should be changed to something strong and memorable. The username and password are changed in this block:

<tomcat-users . . .>
 <user username="admin" password="password" roles="manager-gui,admin-gui"/>
</tomcat-users>
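
If you would rather keep the password in a password manager than memorise it, a random one can be generated on the server itself. This is a minimal sketch using standard tools; the function name is ours and the length is an arbitrary choice.

```shell
#!/usr/bin/env bash
# Sketch: print a random 36-character hexadecimal password.
# ASSUMPTION: /dev/urandom is available, as it is on Ubuntu.

random_password() {
  # Read 18 random bytes, print them as hex, and strip od's spacing.
  head -c 18 /dev/urandom | od -An -tx1 | tr -d ' \n'
  echo
}

# Usage:
#   random_password
```

Paste the result in place of "password" in the block above.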

Now, we need to comment out the same block of text in two different files. The block of text is

 <Valve className="org.apache.catalina.valves.RemoteAddrValve"
 allow="127\.\d+\.\d+\.\d+|::1|0:0:0:0:0:0:0:1" />

and the two files can be accessed respectively with

sudo nano /opt/tomcat/webapps/manager/META-INF/context.xml
sudo nano /opt/tomcat/webapps/host-manager/META-INF/context.xml

then restart Tomcat

sudo systemctl restart tomcat
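
If you prefer not to edit the two files by hand, the commenting out can be scripted. The sketch below assumes the Valve element is laid out over two lines exactly as shown above, which is how it appeared in our files; the function name is ours, and a backup copy of each file is kept with a .bak extension.

```shell
#!/usr/bin/env bash
# Sketch: wrap the RemoteAddrValve element of a context.xml file in an XML
# comment. ASSUMPTION: the element spans two lines, opening with "<Valve" on
# the first and closing with "/>" on the second, as in the block shown above.

comment_out_valve() {
  # $1 = path to a context.xml file; a backup is left beside it as <file>.bak
  local file="$1"
  # Do nothing if the element has already been commented out by this function.
  if grep -q '<!-- <Valve className="org.apache.catalina.valves.RemoteAddrValve"' "$file"; then
    return 0
  fi
  sed -i.bak '/<Valve className="org.apache.catalina.valves.RemoteAddrValve"/,/\/>/{
s/<Valve/<!-- <Valve/
s#/>#/> -->#
}' "$file"
}

# Usage, from a script run with sudo:
#   comment_out_valve /opt/tomcat/webapps/manager/META-INF/context.xml
#   comment_out_valve /opt/tomcat/webapps/host-manager/META-INF/context.xml
```

Tomcat still needs to be restarted afterwards for the change to take effect.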

3. Install an SSL certificate (optional)

Skip this step if you don’t want to use an SSL certificate. If you do, you will need a domain name (e.g. www.aheblog.com), and you will need to point that domain name to the IP address of your droplet. Follow these instructions to do that. It is possible to self-sign an SSL certificate and so not need a domain name, however, this will not work with ODK Collect as the certificate will not be trusted by Android. SSL certificates are issued by trusted authorities, a service for which they typically charge. However, Let’s Encrypt does it for free. To use it we need to install and use certbot:

sudo apt-get update
sudo apt-get install software-properties-common
sudo add-apt-repository ppa:certbot/certbot
sudo apt-get update
sudo apt-get install certbot

Now, use certbot to get the certificates for your domain:

sudo certbot certonly

Answer the questions as prompted, choosing the ‘standalone’ option.

Now, we need to convert the certificate files into the Java Key Store format so that it can be used by Tomcat. We will need to do this as the root user (replace the domain name and passwords as appropriate):

su - root
cd /etc/letsencrypt/live/www.domainnamehere.com

openssl pkcs12 -export -in fullchain.pem -inkey privkey.pem -out fullchain_and_key.p12 -name tomcat
keytool -importkeystore -deststorepass PASSWORD -destkeypass PASSWORD -destkeystore MyDSKeyStore.jks -srckeystore fullchain_and_key.p12 -srcstoretype PKCS12 -srcstorepass PASSWORD -alias tomcat

mkdir /opt/tomcat/ssl
cp MyDSKeyStore.jks /opt/tomcat/ssl/

We can then switch back to our user account

su - USER

Open the following file

sudo nano /opt/tomcat/conf/server.xml

and find the line in the document that reads

<!-- Define a SSL Coyote HTTP/1.1 Connector on port 8443 -->

Replace the connector text below it with the following, making sure to input the correct password:

<Connector port="8443" protocol="org.apache.coyote.http11.Http11NioProtocol"
maxThreads="150" SSLEnabled="true" scheme="https" secure="true"
clientAuth="false" sslProtocol="TLS" keystoreFile="/opt/tomcat/ssl/MyDSKeyStore.jks" 
keystoreType="JKS" keystorePass="PASSWORD"/>

Exit the file (Ctrl+X, Y, and Enter) and restart the service

sudo systemctl restart tomcat

Now, we want to block unsecured (HTTP) connections and allow only encrypted (HTTPS) connections:

sudo ufw delete allow 8080
sudo ufw allow 8443

If you use an SSL certificate, it will need to be renewed every 90 days. Renewal is an automatic process; however, converting to JKS and saving into the Tomcat directory is not. You can either do it manually each time or write a Bash script to do it.
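
A Bash script for this might look something like the sketch below, which repeats the conversion commands from this step. It is meant to be run as root. The function names and the DRY_RUN switch are ours, and it assumes that certbot keeps the current certificate files in /etc/letsencrypt/live/ under your domain name and that Tomcat reads the keystore from /opt/tomcat/ssl as set up above.

```shell
#!/usr/bin/env bash
# Sketch of a renewal helper, to be run as root (for example from cron).
# ASSUMPTIONS: certificate files live in /etc/letsencrypt/live/<domain>/ and
# Tomcat reads /opt/tomcat/ssl/MyDSKeyStore.jks, as configured in this step.

run() {
  # With DRY_RUN=1 each command is only printed (passwords included, so keep
  # the output private); otherwise it is executed.
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

renew_keystore() {
  # $1 = domain name, $2 = keystore password
  local domain="$1" password="$2"
  local live="/etc/letsencrypt/live/$domain"
  local jks="/opt/tomcat/ssl/MyDSKeyStore.jks"

  run certbot renew || return 1
  run openssl pkcs12 -export -in "$live/fullchain.pem" -inkey "$live/privkey.pem" \
    -out "$live/fullchain_and_key.p12" -name tomcat -passout "pass:$password" || return 1
  # Build the new keystore next to the old one, then swap it into place.
  run rm -f "$jks.new" || return 1
  run keytool -importkeystore -deststorepass "$password" -destkeypass "$password" \
    -destkeystore "$jks.new" -srckeystore "$live/fullchain_and_key.p12" \
    -srcstoretype PKCS12 -srcstorepass "$password" -alias tomcat || return 1
  run mv -f "$jks.new" "$jks" || return 1
  # Tomcat only reads the keystore at start-up.
  run systemctl restart tomcat
}

# Usage: preview the commands first, then run them for real.
#   DRY_RUN=1 renew_keystore www.domainnamehere.com PASSWORD
#   renew_keystore www.domainnamehere.com PASSWORD
```

Note that certbot only replaces a certificate when it is close to expiry, so on most runs the script will simply rebuild the keystore from the existing files.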

4. Install ODK Aggregate

First, we’ll install the database software that will hold the collected data. We’ll use PostgreSQL (you can use MySQL instead). The following code will install the right packages:

sudo apt-get update 
sudo apt-get install postgresql-9.6

Now we are going to install ODK Aggregate. This is straightforward as the ODK Aggregate installer provides a walkthrough. It is important to select the correct options during this process, which should follow from the steps above. You may need to replace the link in the first command below if the download does not work.

sudo wget https://opendatakit.org/download/4456 -O /tmp/linux-x64-installer.run
sudo chmod +x /tmp/linux-x64-installer.run
sudo /tmp/linux-x64-installer.run

then follow the instructions. Now, we are going to connect the database to ODK Aggregate:

sudo -u postgres psql 
\cd '/ODK/ODK Aggregate'
\i create_db_and_user.sql
\q

Now copy the ODK installation to the Tomcat directory

sudo cp /ODK/ODK\ Aggregate/ODKAggregate.war /opt/tomcat/webapps

And the final step is to restart Tomcat

sudo systemctl restart tomcat

To test whether the installation has been successful go to (replacing the URL as appropriate):

https://www.domainnamehere.com:8443/ODKAggregate

or, if you skipped step 3 and do not have a domain name, use:

http://<IP address>:8080/ODKAggregate

Use this URL for accessing ODK Aggregate.
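
To check from the command line that the server is answering, something like the following sketch can be used. It assumes plain HTTP is served on port 8080 (the port opened in step 2) and HTTPS on port 8443 (the port opened in step 3); the function name is ours.

```shell
#!/usr/bin/env bash
# Sketch: build the ODK Aggregate address for a host.
# ASSUMPTION: HTTP is on port 8080 (step 2) and HTTPS on port 8443 (step 3).

aggregate_url() {
  # $1 = domain name or IP address; $2 = "https" (default) or "http"
  local host="$1" scheme="${2:-https}"
  case "$scheme" in
    https) printf 'https://%s:8443/ODKAggregate\n' "$host" ;;
    http)  printf 'http://%s:8080/ODKAggregate\n' "$host" ;;
    *)     echo "unknown scheme: $scheme" >&2; return 1 ;;
  esac
}

# Usage: print only the HTTP status code returned by the server.
#   curl -s -o /dev/null -w '%{http_code}\n' "$(aggregate_url www.domainnamehere.com)"
```

A status code in the 200s or 300s suggests the application has been deployed. A refused connection usually points to the firewall or to Tomcat not running.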

To log on, use the ‘admin’ account and the password ‘aggregate’. Once you are logged in you can change the admin account password and create new user accounts as required.

5. Programme a survey

Surveys can be complex, with logical structures that skip questions for certain responses. They can require different types of responses like numbers, text, or multiple choice, and can need signatures or images. Multiple languages are often needed. All of this is possible in ODK. To programme your survey into a format that ODK can use, you can use XLSForm, a standard for authoring forms in Excel. The website has a good tutorial. It is worth learning as much as possible about the standard as it is very flexible. A few key things and tips to note:

  • Skip and logical sequences are managed with the ‘Relevant’ column;
  • If you want to be able to skip big groups of questions, you should use ‘groups’;
  • If the same question is to be asked multiple (but an unknown number of) times you should use ‘Repeats’. Note that when you download the data the multiple responses from repeat type questions will be grouped in their own .csv files rather than with the rest of the survey;
  • Encryption of a survey is managed by putting an RSA public key in the Settings tab;
  • There are many different question types, so learn them;
  • You can use the response from one question as text in another question, for example, someone’s name – using ${} syntax to refer to questions as with the ‘Relevant’ column;
  • You can automatically collect default data like start and end time to check there’s been no cheating by data collectors.

Once your form is complete, you can convert it here; the converter will notify you of any errors. Once you have a .xml file ready to go, you can upload it in the ‘Form Management’ section of the ODK Aggregate interface.
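
To make the tips above more concrete, here is a small made-up form showing the ‘relevant’ column, a group, a repeat, the ${} syntax, and the automatic start and end times. It is written out as CSV by a short script only so that it can be shown as text: in practice these rows go in the ‘survey’ and ‘choices’ sheets of the Excel workbook. All of the question names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: write an illustrative XLSForm layout as two CSV files.
# ASSUMPTION: every question name below is invented for the example. A real
# form is an Excel workbook with 'survey' and 'choices' sheets.

write_example_form() {
  # $1 = directory in which to write survey.csv and choices.csv
  local dir="$1"
  cat > "$dir/survey.csv" <<'EOF'
type,name,label,relevant
start,start,,
end,end,,
text,respondent_name,What is your name?,
select_one yes_no,has_children,Do you have any children?,
begin group,children,Children,${has_children} = 'yes'
begin repeat,child,Child,
text,child_name,What is the child's name?,
integer,child_age,How old is ${child_name}?,
end repeat,,,
end group,,,
EOF
  cat > "$dir/choices.csv" <<'EOF'
list_name,name,label
yes_no,yes,Yes
yes_no,no,No
EOF
}

# Usage:
#   write_example_form .
```

Here the whole ‘children’ group is skipped unless the respondent answers yes, and the questions inside the repeat are asked once for each child.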

6. Use ODK Collect

ODK Collect can be downloaded from the Google Play store on any Android device. It is easy to use and a number of training guides exist. You will firstly need to link the app to the server by going to General Settings -> Server and inputting the URL that directs to your ODK Aggregate interface. The form you uploaded to your server can be downloaded with ‘Get Blank Form’ in the main menu and then data collected with it by selecting ‘Fill Blank Form’. Swiping left or right moves between questions. You can save at any point and come back to the survey at a later time. There are options you may also want to consider such as ‘Auto-finalising’, which means that once a survey is complete it is no longer accessible on the device, and ‘Auto send’ which will automatically send the data to the server when the form is finalised and there’s an internet connection. What you choose depends on your information security requirements.

7. Download the data

Submissions to the ODK Aggregate server need to be downloaded to a computer in order to be decrypted and the data processed and analysed. Exporting files from the server requires a number of pieces of software on the computer to which it is being downloaded:

  • Java 8. Update to the latest version of Java or install it if it is not already on the computer. Java can be downloaded here.
  • Unlimited Strength JCE Policy Files. These files are necessary if you are using encrypted forms and can be downloaded from here. To install the files, extract the contents of the compressed file, and copy and paste the files in the folder to the following location [java-home]/lib/security/policy/unlimited.
  • ODK Briefcase. This can be downloaded from the OpenDataKit website.

When you first launch ODK Briefcase the first thing you must do is choose a storage location where it will save all data downloads. Once you have done this go to the ‘Pull’ tab to download the surveys. Click ‘Connect…’ to input the URL of the ODK Aggregate instance. Select which forms you will need to download data for and click ‘Pull’ in the lower right-hand side of the window.

To download data submissions go to the ‘Export’ tab. The available forms are listed in this window. Next to the form for which you want to download submissions, input a storage location for the data. In the next row, if you are using encrypted forms, select the location of the private RSA key, which must be in .pem format. Many programs will generate .pem keys, but some provide the keys as strings of text that will be saved as text files. If the text begins with ‘-----BEGIN RSA PRIVATE KEY-----’, then simply change the file type to .pem.
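
The check on the key file can also be done from a terminal. The sketch below looks for the standard header, -----BEGIN RSA PRIVATE KEY-----, on the first line of the file and, if it is there, makes a copy with a .pem extension. The function name is ours.

```shell
#!/usr/bin/env bash
# Sketch: copy a private key saved as a text file to a .pem file, but only if
# its first line is the RSA private key header.

as_pem() {
  # $1 = path to the key file; prints the path of the .pem file on success
  local src="$1" dir base
  case "$(head -n 1 "$src")" in
    "-----BEGIN RSA PRIVATE KEY-----"*) ;;
    *) echo "not an RSA private key in PEM text form: $src" >&2; return 1 ;;
  esac
  # Already named .pem: nothing to copy.
  case "$src" in
    *.pem) printf '%s\n' "$src"; return 0 ;;
  esac
  dir=$(dirname "$src")
  base=$(basename "$src")
  cp "$src" "$dir/${base%.*}.pem" || return 1
  printf '%s\n' "$dir/${base%.*}.pem"
}

# Usage:
#   as_pem private_key.txt
```

The resulting .pem file is the one to select in the ‘Export’ tab.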

Applications

Now you should be good to go. Potential applications are extensive. The ODK system has been employed in evaluative studies (of performance-based financing, for example), to conduct discrete choice experiments, or for more general surveillance of health service use and outcomes.

There are ways of extending these tools further, such as collecting GPS, location, and map data through Open Map Kit, which links to Open Street Map. There are also private companies, like SurveyCTO, that use OpenDataKit-based products to offer data collection and management services. However, we have found that a key part of complying with data protection rules involves knowing exactly where data will be stored and having complete control over access to it, which many services cannot offer. Managing your own server permits more control. For example, you can write scripts to check for data submissions, process them, and upload them to another server, or you can host other data collection tools when you need to. Many universities or institutions may provide these services ‘in-house’, but if they do not support the software it can be difficult to use their servers. A cloud server provider gives us an alternative solution that can be up and running in an hour.

Credit

Author

  • Health economics, statistics, and health services research at the University of Warwick. Also like rock climbing and making noise on the guitar.
