DuraCloud is a hosted service and open technology developed by DuraSpace that makes it easy for organizations and end users to use cloud services. It is a cloud-based service that leverages existing cloud infrastructure to enable durability and access to digital content. The service is particularly focused on providing preservation support and access services for academic libraries, academic research centers, and other cultural heritage organizations. DuraCloud builds on pure storage from expert storage providers by overlaying the access functionality and preservation support tools that are essential to ensuring long-term access and ease of use. DuraCloud offers cloud storage and replication of content across multiple providers, via one web-accessible interface.
Once digital content is stored in the cloud, compute services are the key to unlocking its value. DuraCloud provides services that enable digital preservation, data access, transformation, and data sharing. DuraCloud offers customers an elastic capacity with a "pay as you go" approach. It is appropriate for individuals, single institutions, or for multiple organizations that want to make use of cross-institutional infrastructure. DuraCloud has undergone several rounds of pilot testing and will be released as a service hosted by the DuraSpace not-for-profit organization in the summer of 2011.
DuraCloud currently offers services that support storage, preservation, and media access. In particular, DuraCloud makes it easy to copy your content onto several different cloud storage providers with one click and also keep that content synchronized with the primary cloud store. DuraCloud also offers a service that allows you to check the health of your content stored in DuraCloud. Through DuraCloud, you can also stream audio and video files, serve images, and easily transform images from one file format to another. Best of all, DuraCloud services are easy to configure through the web interface. The full list of services can be found here.
DuraCloud has been designed to support replication and backup activities, preservation and archiving, repository backup, and multimedia access. DuraCloud also acts as a mediation layer between you and cloud storage providers, therefore eliminating the risk of vendor lock-in. The full list of solutions can be found here.
DuraCloud provides multiple levels of security, including an instance firewall, encrypted transmissions, application authentication, and storage provider access control. The instance firewall protects each DuraCloud instance by blocking all access except via the standard HTTP and HTTPS ports. Data transmission to and from DuraCloud is via HTTPS encrypted requests and responses that can only be read by the intended recipient. The DuraCloud application requires users accessing their DuraCloud instance via either the web or the REST API interfaces to authenticate with credentials. Users of a DuraCloud instance may have various roles with associated permission levels. Once a user is logged in, content stored within DuraCloud can be designated either “open” or “closed” at the space level. Content stored in a “closed” space can only be accessed by a user who has authenticated to a DuraCloud instance, while content in an “open” space can be accessed without authentication. Access to the underlying storage providers used by a DuraCloud instance is restricted to DuraCloud applications. This ensures that all actions involving content must occur through DuraCloud.
DuraCloud is an independent cloud-based service focused on preservation and access services that are complementary to DSpace and Fedora. DuraCloud is integrated with DSpace and Fedora repositories to provide replication of local content to the cloud. In the future, DSpace and Fedora will most likely be offered as a hosted service within the DuraCloud platform.
The DuraCloud service allows a customer to transfer local content to the DuraCloud application and choose whether to store that content with one or several cloud storage providers, all in several easy steps. Once your content is stored in DuraCloud, you can choose to replicate all or a portion of it to another storage provider, check the integrity of the content, stream the content, serve the content, or transform the content. All of these activities can be done with a few simple clicks from the web interface.
There are no requirements for how your content must be structured for ingest into DuraCloud. In terms of content, DuraCloud is essentially a blob store. You can upload any bitstream, in any format. DuraCloud is also capable of storing any type of package (e.g., AIP, ZIP, TAR). And since there are no requirements, you can easily transfer data to DuraCloud yourself. There are three options for uploading content to DuraCloud: via the web interface, the client-side synchronization utility, or the REST API.
DuraCloud does not require any specific metadata schema as it is not a repository system. Through the DuraCloud web interface or REST API, you can choose to add as many different name/value pairs of metadata as you need, on a content item or DuraCloud space basis. You can also tag your content stored in DuraCloud in the same way.
DuraCloud gives you the flexibility of choosing whether you want the content you store in DuraCloud open or closed, or some mixture of both. You have the ability to choose, on a space basis, what content you would like public versus the content that is private. Content is always accessible to DuraCloud administrators via the web interface.
DuraCloud is a service that is built on open source software and has the support of an open source community. If you choose to close your DuraCloud account, you have the ability to download all of your content beforehand. If the DuraCloud service is no longer available, you have the option of continuing to run the service on your own infrastructure. You will also have the option to request your individual cloud storage provider credentials.
With DuraCloud, you can make multiple copies of your content and store those copies in multiple locations under multiple administrations. You can also use DuraCloud to synchronize all of your copies with the primary copy. Through the web interface, all of your content is web accessible and can be viewed and downloaded at any time. DuraCloud also provides an integrity checking service that allows you to compare your primary and secondary copies of content with the original manifest for the content. Further services planned to enhance preservation support include format identification, provenance auditing, and automatic repair of secondary copies of content.
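The manifest comparison described above can be illustrated locally. The following is a minimal sketch, not the DuraCloud integrity checking service itself: it assumes the manifest is available as a mapping from relative file path to an MD5 checksum, and the function names `file_md5` and `verify_against_manifest` are hypothetical.

```python
import hashlib
from pathlib import Path


def file_md5(path, block_size=1024 * 1024):
    """Return the hex MD5 digest of a file, read in fixed-size blocks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for block in iter(lambda: handle.read(block_size), b""):
            digest.update(block)
    return digest.hexdigest()


def verify_against_manifest(manifest, root):
    """Compare files under root with a {relative_path: expected_md5} manifest.

    Assumption: the manifest layout is illustrative, not DuraCloud's format.
    Returns a dict mapping each manifest entry to "ok", "mismatch", or "missing".
    """
    report = {}
    for relative_path, expected in manifest.items():
        candidate = Path(root) / relative_path
        if not candidate.is_file():
            report[relative_path] = "missing"
        elif file_md5(candidate) == expected.lower():
            report[relative_path] = "ok"
        else:
            report[relative_path] = "mismatch"
    return report
```

Running the same check against each downloaded copy (primary and secondary) shows which copies still agree with the original manifest.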
Almost no resources or special skills are required to support a solution implemented with DuraCloud. You will need to designate an administrator to manage your DuraCloud account and you may need some technical assistance to transfer your content from your local system to DuraCloud (depending on how your local content is stored).
The cloud should make it easier to carry out support activities that are difficult to provision and manage internally. It alleviates the pressure of managing and upgrading internal hardware and simplifies the process of forecasting server and storage requirements.
You, the customer, own and manage your own account and content. You are not handing it over to DuraCloud, so you can do what you want with it at any time. Further, the DuraCloud software is open source, so if you ever decide to run the whole stack/application on your own, you can! Your DuraCloud account is integrated with multiple cloud providers, which lowers the risk of vendor lock-in. If one provider goes out of business, the DuraCloud team will assist you in moving your content to another provider.
DuraCloud is one low-level component of an overall preservation strategy. It does not address fine-grained policy and access control considerations. It can be used to house entire collections of confidential data, and/or support a system which provides granular controls, but it does not do so itself. DuraCloud does support basic authentication, and you can make spaces within DuraCloud dark or light.
DuraCloud can store any "bundle of bits"; however, it does not provide its own primitives for encryption. Due to the remote nature of many DuraCloud use cases, it is not possible for DuraCloud to maintain encryption on an end-to-end basis.
When the DuraCloud media streaming service is deployed, content items are transferred to Amazon CloudFront "edge locations" that are part of the CloudFront network, and are then streamed from there. For a more thorough description of Amazon CloudFront, please see: http://aws.amazon.com/cloudfront/.
One feature of DuraCloud is the ability to associate metadata (name/value pairs) as well as tags (keywords) with both individual content items and entire spaces. The creation of custom metadata is a user-driven activity. Any metadata that the user adds to a DuraCloud item or space will be preserved.
With regard to JPEG 2000 images, the format is an image coding standard that supports storing an extensive variety of metadata within the image file itself (see http://www.jpeg.org/jpeg2000/metadata.html).
As is the case with all content in DuraCloud, the bits that comprise any given file are kept and preserved. If there is metadata stored in a separate file or another form external to the actual JPEG 2000 image file, and this metadata is not provided to DuraCloud, then DuraCloud will be unable to preserve it.
Yes, DuraCloud provides a synchronization service that allows you to keep your content in sync between cloud storage providers. You can begin the synchronization process when you begin the initial transfer of your content to DuraCloud via the Duplicate on Ingest service. Any changes you make to the content being stored in the primary storage location will then propagate to the secondary storage provider. The Duplicate on Demand service allows you to make an exact copy of content you've already stored in your primary storage provider and move that copy to your secondary store. To learn more about these services, please click here.
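To make the idea of keeping a secondary store in step with the primary concrete, here is a minimal sketch of the comparison such a process needs. It is an illustration only and not how the DuraCloud services are implemented: it assumes each store can be listed as a mapping from content ID to checksum, it assumes that removals should also propagate, and the name `plan_sync` is hypothetical.

```python
def plan_sync(primary, secondary):
    """Work out how to make a secondary store match the primary.

    Both arguments map content IDs to checksums (an assumed representation).
    Returns (to_copy, to_delete): IDs that are new or changed in the primary,
    and IDs that exist only in the secondary.
    """
    to_copy = sorted(
        content_id
        for content_id, checksum in primary.items()
        if secondary.get(content_id) != checksum
    )
    to_delete = sorted(
        content_id for content_id in secondary if content_id not in primary
    )
    return to_copy, to_delete
```

For example, `plan_sync({"a": "1", "b": "2"}, {"a": "1", "b": "9", "d": "4"})` reports that `b` needs to be copied again and that `d` exists only in the secondary store.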
Currently it is not possible to index PDFs or perform a keyword search on them in DuraCloud. If this is something that would be highly useful or required at your organization, please contact us and explain how you would expect this feature to work within DuraCloud and what other configuration options this capability would include.
The Duplicate on Demand service allows you to make an exact copy of content you've already stored in your primary storage provider and move that copy to your secondary store. To learn more about this service, please click here.
Several things can cause upload errors. The most probable are failures due to network issues such as HTTP glitches and dropped connections. DuraCloud has a 5 GB file limit, so larger files cannot be stored. We recommend uploading files no larger than 1 GB. If you have files larger than that and you would like to add them to DuraCloud, you can use the chunker tool utility (https://wiki.duraspace.org/display/duracloud/DuraCloud+Chunker+Tool) to split a single file into smaller files. The sync tool (https://wiki.duraspace.org/display/duracloud/DuraCloud+Sync+Tool) already includes the chunker feature; all you need to do is run the sync tool with the -m parameter set.
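For readers who want to see what splitting a file involves, the following is a minimal sketch of the general technique. It is not the DuraCloud Chunker Tool and does not reproduce its naming scheme or options; the function name `split_file` and the `.chunk-NNNN` suffix are assumptions made for illustration.

```python
from pathlib import Path


def split_file(source, chunk_size, out_dir):
    """Split source into numbered chunk files of at most chunk_size bytes.

    Chunk names use a hypothetical "<name>.chunk-NNNN" pattern so that sorting
    the names restores the original order. Returns the chunk paths in order.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    source = Path(source)
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    chunks = []
    with open(source, "rb") as handle:
        index = 0
        while True:
            data = handle.read(chunk_size)  # holds one whole chunk in memory
            if not data:
                break
            chunk_path = out_dir / f"{source.name}.chunk-{index:04d}"
            chunk_path.write_bytes(data)
            chunks.append(chunk_path)
            index += 1
    return chunks
```

Because each chunk is read into memory in one piece, a production script would copy very large chunks in smaller blocks instead.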
Combining chunked files is a planned feature for the retrieval tool (https://jira.duraspace.org/browse/DURACLOUD-82). In the meantime, the chunked files can be combined on Linux/Unix systems by using the cat command, as in:
cat file1 file2 file3 > file4
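On systems where cat is not available, the same concatenation can be done with a short script. This is a minimal sketch; the function name `join_chunks` is hypothetical, and the caller is assumed to pass the chunk files in their original order.

```python
import shutil


def join_chunks(chunk_paths, destination):
    """Concatenate chunk files, in the order given, into destination."""
    with open(destination, "wb") as out:
        for chunk_path in chunk_paths:
            with open(chunk_path, "rb") as chunk:
                # Stream each chunk so large files never sit fully in memory.
                shutil.copyfileobj(chunk, out)
    return destination
```

Calling `join_chunks(["file1", "file2", "file3"], "file4")` is equivalent to the cat command above.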
"Stitching" of content chunks is a recognized gap and is on the DuraCloud roadmap.
If the content you would like to upload to DuraCloud is stored in a local directory structure (one or several directories), you can configure the sync tool to run on the directories you specify. It will run until it has uploaded all of the content in those directories to the DuraCloud space named in your initial configuration settings. You can also use the DuraCloud REST API methods with custom written scripts to upload content in a bulk fashion to DuraCloud (https://wiki.duraspace.org/display/duracloud/DuraCloud+REST+API).
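A custom bulk-upload script typically walks a directory, derives a content ID for each file, and issues one HTTP request per file. The sketch below shows that shape only. The URL layout (`base_url/space_id/content_id`), the use of PUT, and the helper names `content_ids` and `build_put_request` are assumptions for illustration; consult the REST API documentation linked above for the actual endpoints and headers. The request is built but deliberately not sent.

```python
import base64
import os
import urllib.parse
import urllib.request


def content_ids(directory):
    """Yield (file_path, content_id) for every file under directory.

    The content ID is the path relative to directory, with forward slashes.
    """
    for folder, _subdirs, files in os.walk(directory):
        for name in sorted(files):
            path = os.path.join(folder, name)
            relative = os.path.relpath(path, directory)
            yield path, relative.replace(os.sep, "/")


def build_put_request(base_url, space_id, content_id, data, username, password):
    """Build (but do not send) an authenticated PUT request for one item.

    Assumption: items live at <base_url>/<space_id>/<content_id> and the
    server accepts HTTP basic authentication.
    """
    url = "{}/{}/{}".format(
        base_url.rstrip("/"),
        urllib.parse.quote(space_id, safe=""),
        urllib.parse.quote(content_id, safe="/"),
    )
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    request = urllib.request.Request(url, data=data, method="PUT")
    request.add_header("Authorization", "Basic " + token)
    return request
```

A script would loop over `content_ids(directory)`, read each file, and pass the resulting request to `urllib.request.urlopen`, retrying on network failures.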
The DuraCloud software will build and run on Windows XP using Tomcat 6.0.29 and Maven 2.2.1. If you are working in this or a similar configuration, two recommended environment settings are:
Note that not all DuraCloud services will run in a Windows environment. Refer to the documentation about building DuraCloud from source for more information about building and running the DuraCloud software.