CS-195, Spring 2015
Preliminary Proposal: Automated Backup Regime
John Mason Neal
In today's rapidly changing environment, people expect a reliable and intuitive way to back up their data. This proposal offers a solution that meets that expectation while taking full advantage of the cloud.
A backup solution with the power of rsync and the capabilities of Google Drive would be a powerful tool. A user could specify directories to be monitored and kept backed up, while having their backup images compressed and saved in one or more cloud-based storage services (AWS, Google Drive, Dropbox, Box, etc.) for redundancy. The user would have the option of encrypting their data to secure the packaged backups. A user could even schedule the program to take a full disk image snapshot of their machine every other week, just in case.
When the time comes to restore a previous backup, a user should be able to browse their backups and choose which package to unpack and restore on their local machine. All of this would be accomplished through a user-friendly front end, with the option of using a provided CLI.
The first challenge to be aware of is cross-platform compatibility; not everyone will have Ruby, Python, or cloud development APIs installed on their machine. Packaging dependencies such as rsync into the application will need to be considered.
This automated backup regime will allow users to seamlessly make encrypted and compressed backups of their (Unix) system via dependencies wrapped and packaged with this application. This system will leverage the flexibility and scalability of cloud services like AWS, Google Drive, Dropbox, and more. Users will also have the ability to restore previous backups made by this application with a click of a button. These features paint a preliminary picture of this application's use cases.
Scope & Design
The audience for this program is individuals looking for a platform to host their data backups securely. During initial setup, users will need to set up a password that will be used to encrypt and decrypt the backups being generated. After acquiring the application and performing any preliminary setup, users should be able to run the application and be presented with a blank window containing a button to schedule a backup. Once the user clicks the button, they will be directed to a page where they give more details about the type of backup they want to run (system-wide or directory-specific) and how many backups they would like to keep before the oldest are deleted. After choosing a backup type, the user will be prompted to choose where the backup should be stored. By default, the application will store the backup on the user's local machine. However, on this screen the user may click a “+” icon and choose from a list of supported cloud-based storage solutions to which their backed-up data may be sent. Note that the user may specify more than one destination, and not every remote destination is necessarily “cloud based”; the user could also specify a “USER@IP” address that would host their backups.
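The choices collected on these screens could be held in a small settings record. The following is a minimal Python sketch of that idea, not a committed design: the class name `BackupJob`, its field names, the `"local"` marker, and the example paths and hosts are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List

LOCAL = "local"  # hypothetical marker for the default, on-machine destination


@dataclass
class BackupJob:
    """Hypothetical record of what the user picked on the scheduling screens."""
    kind: str            # "system" (system-wide) or "directory" (directory-specific)
    sources: List[str]   # paths to back up
    keep: int            # how many backups to retain before the oldest are deleted
    destinations: List[str] = field(default_factory=lambda: [LOCAL])

    def __post_init__(self) -> None:
        if self.kind not in ("system", "directory"):
            raise ValueError(f"unknown backup kind: {self.kind!r}")
        if self.keep < 1:
            raise ValueError("keep must be at least 1")
        if not self.destinations:
            raise ValueError("at least one destination is required")

    def add_destination(self, target: str) -> None:
        """Add a cloud provider name or a USER@IP host, ignoring duplicates."""
        if target not in self.destinations:
            self.destinations.append(target)


job = BackupJob(kind="directory", sources=["/home/alice/docs"], keep=5)
job.add_destination("google-drive")          # picked via the "+" icon
job.add_destination("alice@192.168.1.20")    # a non-cloud USER@IP host
print(job.destinations)
```

Keeping the destinations in one list makes "more than one place" the normal case rather than a special one.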
Ultimately, the user will have full control over what is backed up and how. As stated before, the default backup location will be the user's machine, but if at least one other destination is specified, that setting can be toggled off. Furthermore, the user will have the choice to encrypt their backups via Duplicity (http://duplicity.nongnu.org/). Duplicity will act as the driving force of this backup regime; it will create backups, encrypt them, and, in the case of backups stored on the local network, send them. The Duplicity project is actively maintained, free to use, and open source.
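As a rough illustration of how the application might drive Duplicity, the Python sketch below only assembles the command line and does not run it. The helper names, the `backups` remote directory, and the example paths are assumptions; the `full`/`incremental` actions and the `--no-encryption` option are part of Duplicity's CLI, though exact behavior may vary by version.

```python
from typing import List


def remote_url(user_at_host: str, remote_dir: str = "backups") -> str:
    """Turn a USER@IP destination into an sftp:// target URL.

    The remote directory name is a hypothetical default.
    """
    return f"sftp://{user_at_host}/{remote_dir}"


def duplicity_backup_argv(source: str, target_url: str,
                          full: bool = False, encrypt: bool = True) -> List[str]:
    """Build (but do not execute) a Duplicity backup command."""
    argv = ["duplicity", "full" if full else "incremental"]
    if not encrypt:
        # User opted out of encryption for this backup.
        argv.append("--no-encryption")
    argv += [source, target_url]
    return argv


argv = duplicity_backup_argv("/home/alice/docs", remote_url("alice@192.168.1.20"))
print(" ".join(argv))
```

Returning an argument list instead of a shell string means the same builder can be handed to a process launcher without quoting problems.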
Finally, there is the process of data recovery. Whatever the case may be (e.g., misplacing one or two files, or having the original machine destroyed in a fiery explosion), users should be able to re-acquire the backup regime application and navigate to the “restore” tab, where they can choose which cloud-based storage to recover from. If they only need to restore from a storage device on the local network, they can provide its “USER@IP”. By default, the application will search a particular directory on the host for a list of packages to restore from. Once the user has chosen a storage provider and linked their account as required by that provider, they will be able to choose from a list of available packaged backups.
This application will place an emphasis on a lightweight and user-friendly environment. It should simply be expected to work. Users should expect a high degree of reliability in the use of this application. Redundancy will be achieved with the use of cloud-based applications, as well as user-specified points of storage (local network storage). Users should expect to be able to access their backups anywhere, but will need their password in order to decrypt and thus unpack them. Ordinarily, users will need this backup regime application in order to decrypt any backups made previously. Users who are familiar with Duplicity can instead download their backups, install Duplicity, and restore via the Duplicity command line interface. Lastly, this application will be limited to Unix-like operating systems (Linux and its derivatives, plus Mac OS X).
This application will be written predominantly in Bash, with the aid of Ruby and Python for cloud-based API access. How to package dependencies such as Duplicity, Ruby, and/or Python will be decided at a later date. In the worst case, this backup regime's functionality will be limited to those who run Linux and are willing to install the necessary dependencies on their own.
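The recovery flow described above could map onto Duplicity's CLI roughly as follows. This Python sketch only builds the commands; the function names and example URLs are hypothetical, and the `collection-status` and `restore` actions and the `--file-to-restore` and `--time` options are assumed to behave as in the Duplicity releases current at the time of writing.

```python
from typing import List, Optional


def collection_status_argv(source_url: str) -> List[str]:
    """Command that asks Duplicity which backups exist at a location."""
    return ["duplicity", "collection-status", source_url]


def restore_argv(source_url: str, restore_dir: str,
                 file_to_restore: Optional[str] = None,
                 time: Optional[str] = None) -> List[str]:
    """Build (but do not execute) a Duplicity restore command."""
    argv = ["duplicity", "restore"]
    if file_to_restore:
        # Restore one path (relative to the backup root) instead of everything.
        argv += ["--file-to-restore", file_to_restore]
    if time:
        # Restore the state as of an earlier point, e.g. "3D" for three days ago.
        argv += ["--time", time]
    argv += [source_url, restore_dir]
    return argv


print(" ".join(collection_status_argv("sftp://alice@192.168.1.20/backups")))
print(" ".join(restore_argv("sftp://alice@192.168.1.20/backups",
                            "/home/alice/restored",
                            file_to_restore="docs/thesis.tex")))
```

The first command would feed the list of available packages shown to the user; the second covers both the "one or two misplaced files" case and a full restore.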
Milestones & Expectations
Milestone One: March 2 - The Foundation
- Lay the foundation for Duplicity to begin executing backups (without the need of encryption)
- The application should provide the basics of dynamic backup management:
- When a backup is made, it should check to see if any others are obsolete based on hard-coded constraints
- The application should be able to provide minimal API support to cloud based services (Google Drive & AWS)
- Should be able to retrieve and unpack specified packages (again using hard-coded, Duplicity CLI commands)
- Being the foundational base-line for the rest of the project, the code should be optimized and made as abstract as possible
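The obsolescence check in the list above can be sketched in a few lines of Python. This assumes a hypothetical naming scheme in which each backup's file name embeds a sortable date, and a hard-coded limit of three backups; neither comes from the proposal itself.

```python
from typing import Iterable, List

KEEP_LAST = 3  # hard-coded constraint; the value is a hypothetical example


def obsolete_backups(names: Iterable[str], keep: int = KEEP_LAST) -> List[str]:
    """Return the backups that fall outside the newest `keep` entries.

    Relies on names such as backup-YYYYMMDD.tar.gz sorting chronologically.
    """
    newest_first = sorted(names, reverse=True)
    return newest_first[keep:]


existing = [
    "backup-20150301.tar.gz",
    "backup-20150215.tar.gz",
    "backup-20150222.tar.gz",
    "backup-20150208.tar.gz",
]
print(obsolete_backups(existing))
```

Running this check right after each new backup is made keeps the stored set bounded. Duplicity also provides a `remove-all-but-n-full` action that might serve the same purpose for its own backup chains.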
Duplicity relies on private/public keys created by GPG. This will not work in the event that a user misplaces a key or loses this program and needs to restore files manually.
OpenSSL will be used instead to create a symmetric key derived from a password the user chooses. If packages ever need to be downloaded and decrypted manually, decryption will still be possible.
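One way the password-based approach could look is sketched below in Python, which wraps the `openssl enc` command. The cipher choice, the function names, and the `BACKUP_PASSPHRASE` environment variable are assumptions made for illustration; `-pbkdf2` in particular requires a newer OpenSSL release (1.1.1 or later) and would have to be dropped on older installations.

```python
import os
import subprocess
from typing import List

PASS_ENV = "BACKUP_PASSPHRASE"  # hypothetical variable name


def openssl_argv(src: str, dst: str, decrypt: bool = False) -> List[str]:
    """Build an `openssl enc` command for password-based symmetric encryption."""
    argv = ["openssl", "enc", "-aes-256-cbc", "-salt", "-pbkdf2"]
    if decrypt:
        argv.append("-d")
    # Read the password from the environment so it does not show up in `ps`.
    argv += ["-in", src, "-out", dst, "-pass", f"env:{PASS_ENV}"]
    return argv


def run_openssl(src: str, dst: str, passphrase: str, decrypt: bool = False) -> None:
    """Run the command; raises CalledProcessError if openssl reports failure."""
    env = dict(os.environ, **{PASS_ENV: passphrase})
    subprocess.run(openssl_argv(src, dst, decrypt), env=env, check=True)


print(" ".join(openssl_argv("backup.tar.gz", "backup.tar.gz.enc")))
```

Because the result is an ordinary OpenSSL-encrypted file, a user who has only the password and a stock `openssl` binary could run the decrypt form of the same command by hand.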
Milestone Two: March 20 - Encryption
At this stage the application should be mostly finished on the backend and be ready for a GUI overhaul.
- The application should provide support for encryption and decryption of packages via Duplicity.
- Duplicity has been removed in favor of OpenSSL-based encryption.
- The application should be able to unpack and restore encrypted backups to their specific directories with ease
- The application should have a logically sound backend that is ready to be integrated with a GUI
- Being the foundational base-line for the rest of the project, the code should be optimized and made as abstract as possible. A complete refactor of the code has been completed for this purpose.
- The application should provide more support for API services (Dropbox & Box)
The implementation of the scheduling feature has been pushed back due to the refactoring of the codebase. OpenSSL-based encryption/decryption will be a much more convenient tool for the user than Duplicity would have been.
Milestone Three: April 8 - The Frontend
At this stage the application should come packed with a shiny frontend that is described in more detail in the scope & design.
- The frontend should be light but intuitive
- There is no CSS yet because other development work took priority.
- Users should be able to see a list of remote and local backups on the landing window of the application, with options to restore or delete any of them
- Users can see only local backups
- Further details will come from the scope & design
Implementing the browser-esque GUI is greatly simplified by using Ruby's Sinatra mini web framework. The user may browse their system's file structure and mark the files/directories that they wish to back up. Suggested perks include showing the last time a file/directory was backed up, the ability to name backups, and the ability to see what was backed up (perhaps a rendering of the history log taken at backup time).
A backend server daemon still needs to be implemented. This will require a "main.exe"-type launcher file that starts a server so that the user is properly routed to the views of this app.
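The data behind such a file-browsing view might be assembled as in the sketch below. It is written in Python for illustration only and is not the Sinatra implementation; the function name, the shape of each entry, and the `last_backup` lookup table (path to date string) are all hypothetical.

```python
import os
from typing import Dict, List


def list_entries(directory: str, last_backup: Dict[str, str]) -> List[dict]:
    """Describe each item in `directory` for display in a file-browser view.

    `last_backup` maps full paths to the date they were last backed up;
    paths that were never backed up get None.
    """
    entries = []
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        entries.append({
            "name": name,
            "is_dir": os.path.isdir(path),      # directories can be browsed into
            "last_backup": last_backup.get(path),
        })
    return entries


for entry in list_entries(".", {}):
    print(entry)
```

A view template could then render one row per entry, with a checkbox for marking it and the last-backup date beside it.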
Milestone Four: April 22 - Polish & Deployment
At this stage the application should be production-ready. Depending on the state of the project by this date, cross-platform compatibility may also be implemented in the package.
- The application should be installable as a package (e.g., via apt for Ubuntu users) so that it resides in /usr/bin
During the course of this project, assistants will become more familiar with the process of software development. Understanding how to work as a member of a team is critical, and assistants will be expected to take on their own portion of responsibility. By bringing a positive attitude toward the project, assistants will not only benefit themselves, but their project manager as well.
Throughout the course of the project, assistants will need to budget time for weekly development "sprints", or milestones. These sprints will be kept on a tight timetable due to the shortness of the semester. Time management and flexibility are expected of every assistant.
- Ability to learn quickly
- Ability to contribute as a member of a team
- Ability to be flexible with time
- Knowledge of Agile development