Skip to main content

Open edX Complete Backup Solution


Learn how to create a complete backup solution for your Open edX installation. This detailed step-by-step how-to guide covers backing up MySQL and MongoDB, organizing backup data into a single date-stamped tarball zip file, plus how to setup a cron job and how to copy your backups to an AWS S3 storage bucket.

Summary

The official Open edX documentation takes a laissez faire approach to many aspects of administration and support, including for example, how to properly backup and restore course and user data. This article attempts to fill that void. Implementing an effective backup solution for Open edX requires proficiency in a number of technologies, which is fine if you’re part of a full IT team at a major university, but can this can otherwise be a real obstacle to competently supporting your Open edX platform.
Open edX stores course data, including media uploads such as images and mp4 video files in MongoDB. To do this, MongoDB’s core functionality is extended with a technology called GridFS that provides an infinitely scalable file system for all course-related data. For student data Open edX takes a more relational approach with MySQL, noting however that the Open edX platform relies on several databases (see right-hand diagram). These are excellent architectural choices and both technologies are best-of-breed and getting better all the time. Nonetheless, having two entirely different persisting strategies under the hood really complicates simple IT management responsibilities like data backups.
We’re going to setup an automated daily backup procedure that backs up the complete contents of the MongoDB course database (including file/document, media and image uploads) and each individual MySQL database that contains learner user data. We’ll create a Bash script that combines these files into a single date-stamped linux tarball and then pushes this to an AWS S3 bucket for long-term remote storage.

Assumptions

  • Your Open edX instance is running from an AWS account
  • Your AWS EC2 instance is running on an Ubuntu 16.04 LTS server built from the Amazon Linux AMI
  • You have SSH access to your EC2 instance and sudo capability
  • You have permissions to create AWS IAM users and S3 resources
  • Your Open edX instance is substantially based on the guidelines published here: Native Open edX Ubuntu 16.04 64 bit Installation

1. Create credentials for AWS CLI

AWS provides an integrated security management system called Identity and Access Management (IAM) that we are going to use in combination with AWS’ Command Line Interface (CLI) to give us a way to copy files from our local Ubuntu file system to an S3 bucket in our AWS account. If you’re new to either topic then it behooves you to follow these two links to learn the basics of both topics.
In this step, we’ll create a new IAM user with full access to S3 resources.
In the next step we’ll permanently store our “Access Key ID” and “Secret key ID” values in the AWS CLI configuration file on our Ubuntu server instance so that we’ll be able to seamlessly make calls to AWS S3 from the command line, and importantly, from a Bash script.

2. Install AWS CLI (Command Line Interface)

Next, lets install AWS’ command line tools. The version of Ubuntu that you should be using has Pip preloaded. We’ll use this to install AWS CLI.
$ pip install awscli --upgrade --user
after installing run the AWS “configure” program to permanently store your IAM user credentials
$ aws configure
The “configure” program adds a hidden directory to your home folder named .aws that contains a file named “config” and another named “credentials”; both of which you can edit. So just fyi, you can change your configuration values in the future either by re-running aws configure or by editing these two files.
If you need to troubleshoot, or heck, if you just want to know more then you can read AWS’ CLI installation instructions.

3. Create An AWS S3 Bucket

Amazon S3 is object storage built to store and retrieve any amount of data from anywhere – web sites and mobile apps, corporate applications, and data from IoT sensors or devices. It is designed to deliver 99.999999999% durability, and stores data for millions of applications used by market leaders in every industry. We’ll use S3 to archive our backup files. In my view S3 is a very stripped down file system that has been highly optimized for its specific use case.
Creating an S3 bucket is straightforward, and AWS’ step-by-step instructions are very good. There are no special configuration requirements for our use case as a permanent file archive, so feel free to create your bucket however you like. You’ll reference your new S3 bucket in the next step.

4. Test AWS C3 sync command

Let’s do a simple test to ensure that a) AWS CLI is correctly installed and configured and b) your AWS S3 bucket is accessible from the command line. We’ll create a small test file in your home folder, then call the AWS CLI “sync” command to copy it to your bucket.
cd ~
echo "hello world" > test.file
aws s3 sync test.file s3://[THE NAME OF YOUR BUCKET]
If successful then you’ll find a copy of your file inside your AWS S3 bucket. Otherwise, please leave comments describing your particular situation so that I can continue to improve this how-to article.

5. Test connectivity to MySQL

In this step we want to verify that you can connect to your MySQL server from the command line. If you are running an Open edX instance based on the Native Open edX Ubuntu 16.04 64 bit Installation then your MySQL server is running on your EC2 instance and is therefore accessible as “localhost”, which incidentally is the default host value. Additionally, the password value for the root MySQL account will not have been set unless you yourself did this afterward. If this is all true then you should be able to connect to MySQL with the following command
mysql -u root
If successful you’ll see a screen substantially like the following. Type ‘exit’ and enter to logout of MySQL. Please leave me a comment if you have trouble logging in so that I can improve this article.

6. Test Connectivity to MongoDB

To connect to MongoDB you’ll need your admin password. The third step of the Open edX native build installation script creates a plain text file in your home folder named my-passwords.yml that contains a list of all unique password values that were automagically created for you during the installation. Your MongoDB admin password is located in this file on approximately row 39 and identified as MONGO_ADMIN_PASSWORD.
Next, we’ll create a small Bash script to test whether we can connect to MongoDB, as follows
#!/bin/bash
for db in edxapp cs_comment_service_development; do
    echo "Dumping Mongo db ${db}..."
    mongodump -u admin -p'ADD YOUR PASSWORD HERE' -h localhost --authenticationDatabase admin -d ${db} --out mongo-dump
done
Save this in a file named ~/mongo-test.sh, then run the following commands to execute your test
sudo chmod 755 ~/mongo-test.sh  #This makes the file executable
bash ~/mongo-test.sh   #execute the Bash script
If successful you’ll see output approximately like the following

7. Create Bash script

In this step we’ll bring everything together into a single Bash script that does the following
  • Creates a temporary working folder
  • Backs up schemas and data for each database that is part of the Open edX platform. Importantly, we want to exclude system database in order to avoid data restore problems
  • Backs up MongoDB course data and Discussion Forum data
  • Combines both sets of data into a tarball
  • Copies the tarball to our AWS S3 bucket
  • Prunes our AWS S3 bucket of old backup files
  • Cleans up and removes our temporary working folder
The complete script is available here on Github, or if you’re in a hurry just copy/paste the snippet below.
Credit for the substance of this script goes to Net Batchelder.
#!/bin/bash
#---------------------------------------------------------
# usage:      backup MySQL and MongoDB data stores
#             combine into a single tarball, store in "backups" folders in user directory
#
# reference:  https://github.com/edx/edx-documentation/blob/master/en_us/install_operations/source/platform_releases/ginkgo.rst
#---------------------------------------------------------

BACKUPS_DIRECTORY="/home/ubuntu/backups/"
WORKING_DIRECTORY="/home/ubuntu/backup-tmp/"

if [ ! -d "$WORKING_DIRECTORY" ]; then
    mkdir "$WORKING_DIRECTORY"
    echo "created backup working folder ${WORKING_DIRECTORY}"
fi

if [ -f "$WORKING_DIRECTORY/*" ]; then
  sudo rm -r "$WORKING_DIRECTORY/*"
fi

cd "$WORKING_DIRECTORY"

#Backup MySQL databases
MYSQL_CONN="-uroot"
echo "Reading MySQL database names..."
mysql ${MYSQL_CONN} -ANe "SELECT schema_name FROM information_schema.schemata WHERE schema_name NOT IN ('mysql','information_schema','performance_schema')" > /tmp/db.txt
DBS="--databases $(cat /tmp/db.txt)"
NOW="$(date +%Y%m%dT%H%M%S)"
SQL_FILE="mysql-data-${NOW}.sql"
echo "Dumping MySQL structures..."
mysqldump ${MYSQL_CONN} --add-drop-database --no-data ${DBS} > ${SQL_FILE}
echo "Dumping MySQL data..."
# If there is table data you don't need, add --ignore-table=tablename
mysqldump ${MYSQL_CONN} --no-create-info ${DBS} >> ${SQL_FILE}
echo "Done backing up MySQL"

#Backup Mongo
for db in edxapp cs_comment_service_development; do
    echo "Dumping Mongo db ${db}..."
    mongodump -u admin -p'YOUR PASSWORD HERE' -h localhost --authenticationDatabase admin -d ${db} --out mongo-dump-${NOW}
done
echo "Done backing up MongoDB"

#Check to see if a backups/ folder exists. if not, create it.
if [ ! -d "$BACKUPS_DIRECTORY" ]; then
    mkdir "$BACKUPS_DIRECTORY"
    echo "created backups folder ${BACKUPS_DIRECTORY}"
fi

#Tarball all of our backup files
tar -czf /home/ubuntu/backups/openedx-data-${NOW}.tgz ${SQL_FILE} mongo-dump-${NOW}
sudo chown ubuntu /home/ubuntu/backups/openedx-data-${NOW}.tgz
sudo chgrp ubuntu /home/ubuntu/backups/openedx-data-${NOW}.tgz
echo "Created tarball of backup data openedx-data-${NOW}.tgz"

#Prune the Backups/ folder by eliminating all but the 30 most recent tarball files
if [ -d "$BACKUPS_DIRECTORY" ]; then
  cd "$BACKUPS_DIRECTORY"
  ls -1tr | head -n -15 | xargs -d '\n' rm -f --
fi

#Remove the working folder
echo "Cleaning up"
sudo rm -r "$WORKING_DIRECTORY"

echo "Sync backup to AWS S3 backup folder"
aws s3 sync /home/ubuntu/backups s3://[ADD YOUR BUCKET NAME]
echo "Done!"

8. Create a Cron job

Lets setup a cron job that executes our new backup script once a day at say, 6:00am GMT.
crontab -e
Then add this row to your cron table
# m h  dom mon dow   command
0 6 * * * sudo ./edx.backup.sh > edx.backup.out
And now you’re set! Your Open edX site is getting completely backed up every day and stored offset in an AWS S3 bucket.

Popular posts from this blog

How to read or extract text data from passport using python utility.

Hi ,  Lets get start with some utility which can be really helpful in extracting the text data from passport documents which can be images, pdf.  So instead of jumping to code directly lets understand the MRZ, & how it works basically. MRZ Parser :                 A machine-readable passport (MRP) is a machine-readable travel document (MRTD) with the data on the identity page encoded in optical character recognition format Most travel passports worldwide are MRPs.  It can have 2 lines or 3 lines of machine-readable data. This method allows to process MRZ written in accordance with ICAO Document 9303 (endorsed by the International Organization for Standardization and the International Electrotechnical Commission as ISO/IEC 7501-1)). Some applications will need to be able to scan such data of someway, so one of the easiest methods is to recognize it from an image file. I 'll show you how to retrieve the MRZ infor...

How to generate class diagrams pictures in a Django/Open-edX project from console

A class diagram in the Unified Modeling Language ( UML ) is a type of static structure diagram that describes the structure of a system by showing the system’s classes, their attributes, operations (or methods), and the relationships among objects. https://github.com/django-extensions/django-extensions Step 1:   Install django extensions Command:  pip install django-extensions Step 2:  Add to installed apps INSTALLED_APPS = ( ... 'django_extensions' , ... ) Step 3:  Install diagrams generators You have to choose between two diagram generators: Graphviz or Dotplus before using the command or you will get: python manage.py graph_models -a -o myapp_models.png Note:  I prefer to use   pydotplus   as it easier to install than Graphviz and its dependencies so we use   pip install pydotplus . Command:  pip install pydotplus Step 4:  Generate diagrams Now we have everything installed...

How to Remove course from Open-edX

Go to vagrant  => 1. In the edx-platform directory:  - cd /edx/app/edxapp/edx-platform 2. Run the following Django management command:   - sudo -u www-data /edx/bin/python.edxapp /edx/bin/manage.edxapp lms dump_course_ids --settings aws    - sudo -u www-data /edx/bin/python.edxapp /edx/bin/manage.edxapp lms dump_course_ids --settings=devstack 3. Find the course ID which you'd like to delete in the resulting list of course IDs. 4. Copy the course ID into the following command and run it:  - sudo -u www-data /edx/bin/python.edxapp /edx/bin/manage.edxapp cms delete_course <COURSE_ID> --settings aws  -   sudo -u www-data /edx/bin/python.edxapp /edx/bin/manage.edxapp cms delete_course <COURSE_ID> --settings=devstack  - You'll be asked to verify the deletion . To verify the deletion, run the command from step 2 above and ensure that the course ID is not in the list. Help reference : ...