FAQ
Below are frequently asked questions about DuraCloud, divided into two sections: general and technical.
General Questions
What is DuraCloud?
DuraCloud is a hosted service and open technology developed by DuraSpace that makes it easy for organizations and end users to use cloud services. It is a cloud-based service that leverages existing cloud infrastructure to enable durability and access to digital content. The service is particularly focused on providing preservation support and access services for academic libraries, academic research centers, and other cultural heritage organizations. DuraCloud builds on pure storage from expert storage providers by overlaying the access functionality and preservation support tools that are essential to ensuring long-term access and ease of use. DuraCloud offers cloud storage and replication of content across multiple providers, via one web-accessible interface.
Once digital content is stored in the cloud, compute services are the key to unlocking its value. DuraCloud provides services that enable digital preservation, data access, transformation, and data sharing. DuraCloud offers customers an elastic capacity with a "pay as you go" approach. It is appropriate for individuals, single institutions, or for multiple organizations that want to make use of cross-institutional infrastructure. DuraCloud has undergone several rounds of pilot testing and will be released as a service hosted by the DuraSpace not-for-profit organization in the summer of 2011.
What services are currently available in DuraCloud?
DuraCloud currently offers services that support storage, preservation, and media access. In particular, DuraCloud makes it easy to copy your content onto several different cloud storage providers with one click and also keep that content synchronized with the primary cloud store. DuraCloud also offers a service that allows you to check the health of your content stored in DuraCloud. Through DuraCloud, you can also stream audio and video files, serve images, and easily transform images from one file format to another. Best of all, DuraCloud services are easy to configure through the web interface. The full list of services can be found here.
What solutions does DuraCloud provide? What use cases has DuraCloud been designed to support?
DuraCloud has been designed to support replication and backup activities, preservation and archiving, repository backup, and multimedia access. DuraCloud also acts as a mediation layer between you and cloud storage providers, therefore eliminating the risk of vendor lock-in. The full list of solutions can be found here.
What security is available in DuraCloud?
DuraCloud provides multiple levels of security, including an instance firewall, encrypted transmissions, application authentication, and storage provider access control. The instance firewall protects each DuraCloud instance by blocking all access except via the standard HTTP and HTTPS ports. Data transmission to and from DuraCloud is via HTTPS encrypted requests and responses that can only be read by the intended recipient. The DuraCloud application requires users accessing their DuraCloud instance via either the web or the REST API interfaces to authenticate with credentials. Users of a DuraCloud instance may have various roles with associated permission levels. Content stored within DuraCloud can be designated either “open” or “closed” at the space level. Content stored in a “closed” space can only be accessed by a user who has authenticated to a DuraCloud instance, while content in an “open” space does not require authentication to be accessed. Access to the underlying storage providers used by a DuraCloud instance is restricted to DuraCloud applications only. This ensures that all actions involving content must occur through DuraCloud.
What is DuraCloud’s privacy policy?
Coming soon.
How does DuraCloud relate to the existing suite of DuraSpace open technologies?
DuraCloud is an independent cloud-based service focused on preservation and access services that are complementary to DSpace and Fedora. DuraCloud is integrated with DSpace and Fedora repositories to provide replication of local content to the cloud. In the future, DSpace and Fedora will most likely be offered as a hosted service within the DuraCloud platform.
How does DuraCloud work?
The DuraCloud service allows a customer to transfer local content to the DuraCloud application and choose whether to store that content with one or several cloud storage providers, all in several easy steps. Once your content is stored in DuraCloud, you can choose to replicate all or a portion of it to another storage provider, check the integrity of the content, stream the content, serve the content, or transform the content. All of these activities can be done with a few simple clicks from the web interface.
How must data be structured for ingest? And who does it?
There are no requirements for how your content must be structured for ingest into DuraCloud. In terms of content, DuraCloud is essentially a blob store. You can upload any bitstream, in any format. DuraCloud is also capable of storing any type of package (e.g., AIP, ZIP, TAR). And since there are no requirements, you can easily transfer data to DuraCloud yourself. There are three options for uploading content to DuraCloud: the web interface, the client-side synchronization utility, or the REST API.
Are there specific metadata schema requirements?
DuraCloud does not require any specific metadata schema as it is not a repository system. Through the DuraCloud web interface or REST API, you can choose to add as many different name/value pairs of metadata as you need, on a content item or DuraCloud space basis. You can also tag your content stored in DuraCloud in the same way.
How is geographical distribution accomplished?
DuraCloud provides the option to store your content with several cloud storage providers who each maintain physical storage facilities in various locations throughout the United States.
What are the costs and how are they determined?
DuraCloud costs are detailed here.
Is this an open, closed, or dim archive or do we have the option?
DuraCloud gives you the flexibility of choosing whether you want the content you store in DuraCloud open or closed, or some mixture of both. You have the ability to choose, on a space basis, what content you would like public versus the content that is private. Content is always accessible to DuraCloud administrators via the web interface.
What is the exit strategy?
DuraCloud is a service that is built on open source software and has the support of an open source community. If you choose to close your DuraCloud account, you have the ability to download all of your content beforehand. If the DuraCloud service is no longer available, you have the option of continuing to run the service on your own infrastructure. You will also have the option to request your individual cloud storage provider credentials.
What preservation activities does DuraCloud support?
With DuraCloud, you can make multiple copies of your content and store those copies in multiple locations under multiple administrations. You can also use DuraCloud to synchronize all of your copies with the primary copy. Through the web interface, all of your content is web accessible and can be viewed and downloaded at any time. DuraCloud also provides an integrity checking service that allows you to compare your primary and secondary copies of content with the original manifest for the content. Further services planned to enhance preservation support include format identification, provenance auditing, and automatic repair of secondary copies of content.
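The integrity check described above amounts to comparing a checksum of each stored copy against the value recorded in a manifest. A minimal sketch of that idea in Python follows. It assumes MD5 checksums and a manifest held as a simple mapping from content ID to checksum; both are assumptions made for illustration, and this is not DuraCloud's own implementation.

```python
import hashlib


def md5_hex(data: bytes) -> str:
    """Return the hex MD5 digest of a byte string."""
    return hashlib.md5(data).hexdigest()


def find_mismatches(manifest: dict, copy: dict) -> list:
    """Return sorted content IDs whose copy is missing or differs from the manifest.

    manifest: content ID -> expected checksum (hypothetical format)
    copy:     content ID -> bytes of the stored copy
    """
    bad = []
    for content_id, expected in manifest.items():
        data = copy.get(content_id)
        # A copy fails the check if it is absent or its checksum differs.
        if data is None or md5_hex(data) != expected:
            bad.append(content_id)
    return sorted(bad)
```

Running the same comparison against both the primary and the secondary copy shows which one, if either, has drifted from the manifest.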
What preservation standards does DuraCloud support?
Although not an official "standard," the general approach DuraCloud has taken is to provide the ability to rebuild your DuraCloud instance from just the content itself.
What resources and skills are required to support a solution implemented with DuraCloud?
Almost no resources or special skills are required to support a solution implemented with DuraCloud. You will need to designate an administrator to manage your DuraCloud account and you may need some technical assistance to transfer your content from your local system to DuraCloud (depending on how your local content is stored).
What infrastructure does DuraCloud rely on?
DuraCloud relies on public cloud storage and compute as well as private cloud storage.
How can the cloud environment impact digital preservation activities?
Ideally, the cloud will make it easier to carry out support activities that are difficult to provision and manage internally. The cloud will alleviate the pressure of managing and upgrading internal hardware and simplify the process of forecasting server and storage requirements.
Assuming content is stored in DuraCloud today, what mechanisms are in place to ensure that content can be retrieved in 50 years?
You, the customer, own and manage your own account and content. You are not handing it over to DuraCloud, so you can do what you want with it at any time. Further, the DuraCloud software is open source, so if you ever decide to run the whole stack/application on your own, you can! Your DuraCloud account is integrated with multiple cloud providers, which lowers the risk of vendor lock-in. If one provider goes out of business, the DuraCloud team will help you move your content to another provider.
How does DuraCloud address the following concern: Confidential Data?
DuraCloud is one low-level component of an overall preservation strategy. It does not address fine-grained policy and access control considerations. It can be used to house entire collections of confidential data, and/or support a system which provides granular controls, but it does not do so itself. DuraCloud does support basic authentication, and you can make spaces within DuraCloud dark or light.
How does DuraCloud address the following concern: Encrypted Data?
DuraCloud can store any "bundle of bits"; however, it does not provide its own primitives for encryption. Due to the remote nature of many DuraCloud use cases, it is not possible for DuraCloud to maintain encryption on an end-to-end basis.
How does DuraCloud address the following concern: Legal Compliance?
Content access and copyright for content stored in DuraCloud is controlled and managed by the user/account holder.
Technical Questions
What is DurAdmin coded in?
The DurAdmin UI itself is constructed largely of JSPs that use jQuery for AJAX and JavaScript support. Like most of the other components of DuraCloud, the underlying logic is primarily Java.
How does Amazon CloudFront work?
When the DuraCloud media streaming service is deployed, content items are transferred to Amazon CloudFront "edge locations" that are part of the CloudFront network, and are then streamed from there. For a more thorough description of Amazon CloudFront, please see: http://aws.amazon.com/cloudfront/.
Is JPEG2000 metadata kept/preserved after being transferred to DuraCloud?
One feature of DuraCloud is the ability to associate metadata (name/value pairs) as well as tags (keywords) with both individual content items and entire spaces. The creation of custom metadata is a user-driven activity. Any metadata that the user adds to a DuraCloud item or space will be preserved.
Regarding JPEG2000 images, the format itself is an image coding standard that supports storing an extensive variety of metadata within the image file itself (see http://www.jpeg.org/jpeg2000/metadata.html).
As is the case with all content in DuraCloud, the bits that comprise any given file are kept and preserved. If there is metadata stored in a separate file or another form external to the actual JPEG2000 image file, and this metadata is not provided to DuraCloud, then DuraCloud will naturally be unable to preserve it.
Is it possible to synchronize different storage providers in DuraCloud (i.e. sync primary store content with secondary store)?
Yes, DuraCloud provides a synchronization service that allows you to keep your content in sync between cloud storage providers. You can begin the synchronization process when you begin the initial transfer of your content to DuraCloud via the Duplicate on Ingest service. Any changes you make to the content being stored in the primary storage location will then propagate to the secondary storage provider. The Duplicate on Demand service allows you to make an exact copy of content you've already stored in your primary storage provider and move that copy to your secondary store. To learn more about these services, please click here.
Is it possible to index PDFs and then perform a keyword search?
Currently it is not possible to index PDFs or perform a keyword search on them in DuraCloud. If this capability would be highly useful or required at your organization, please contact us and explain how you would expect the feature to work within DuraCloud and what other configuration options it should include.
In the Content Items area, how many content items need to be loaded before the Previous and Next buttons work?
201 content items must be added to one DuraCloud space before the Previous and Next buttons will work.
Is there a way to replicate objects from Amazon to Rackspace after the objects are uploaded to an Amazon space?
The Duplicate on Demand service allows you to make an exact copy of content you've already stored in your primary storage provider and move that copy to your secondary store. To learn more about this service, please click here.
What could cause issues when uploading large files to DuraCloud?
Several things can cause upload errors. The most probable are failures due to network issues, such as HTTP glitches and dropped connections. DuraCloud has a 5GB file limit, so files larger than that cannot be stored. We recommend uploading files no larger than 1GB. If you have larger files that you would like to add to DuraCloud, you can use the chunker tool (https://wiki.duraspace.org/display/duracloud/DuraCloud+Chunker+Tool) to split a single file into smaller files. The sync tool (https://wiki.duraspace.org/display/duracloud/DuraCloud+Sync+Tool) already includes the chunker feature; all you need to do is run the sync tool with the -m parameter set.
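To illustrate what splitting a large file involves, here is a minimal Python sketch that writes fixed-size pieces of a file to numbered chunk files. It is not the DuraCloud chunker tool. The function name, the ".chunkNNNN" naming scheme, and the default sizes are assumptions chosen for the example.

```python
from pathlib import Path


def split_file(path, chunk_size=1024 ** 3, out_dir=None, buffer_size=1024 * 1024):
    """Split `path` into pieces of at most `chunk_size` bytes.

    Returns the list of chunk paths in order. The ".chunkNNNN" suffix is a
    hypothetical naming scheme, not the one used by the DuraCloud tools.
    """
    src = Path(path)
    dest = Path(out_dir) if out_dir else src.parent
    dest.mkdir(parents=True, exist_ok=True)
    chunks = []
    with src.open("rb") as f:
        index = 0
        while True:
            # Read in small buffers so a 1GB chunk is never held in memory.
            data = f.read(min(buffer_size, chunk_size))
            if not data:
                break
            chunk_path = dest / f"{src.name}.chunk{index:04d}"
            written = 0
            with chunk_path.open("wb") as out:
                while data:
                    out.write(data)
                    written += len(data)
                    remaining = chunk_size - written
                    if remaining <= 0:
                        break
                    data = f.read(min(buffer_size, remaining))
            chunks.append(chunk_path)
            index += 1
    return chunks
```

Zero-padded chunk numbers keep the pieces in the right order when they are listed alphabetically, which matters when they are later rejoined.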
How do I combine chunked files?
Combining chunked files is a planned feature for the retrieval tool (https://jira.duraspace.org/browse/DURACLOUD-82). In the meantime, the chunked files can be combined on Linux/Unix systems by using the cat command, as in:
cat file1 file2 file3 > file4
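Where cat is not available, the same concatenation can be scripted. The following Python sketch joins chunk files in the order given. The function name is hypothetical, and the caller is assumed to pass the chunks in their original order.

```python
import shutil


def join_files(chunk_paths, output_path):
    """Concatenate the files in `chunk_paths`, in order, into `output_path`."""
    with open(output_path, "wb") as out:
        for chunk in chunk_paths:
            with open(chunk, "rb") as part:
                # Stream each chunk so large files are not loaded into memory.
                shutil.copyfileobj(part, out)
    return output_path
```

Calling join_files(["file1", "file2", "file3"], "file4") mirrors the cat command shown above.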
"Stitching" of content chunks is a recognized gap and is on the DuraCloud roadmap.
Is there a quicker method to upload content to DuraCloud other than file by file via DurAdmin?
If the content you would like to upload is stored in a local directory structure (or several directories), you can configure the sync tool to run on the directories you specify. It will run until it has uploaded all of that content from your local machine to the DuraCloud space named in your initial configuration settings. You can also use the DuraCloud REST API with custom scripts to upload content in bulk (https://wiki.duraspace.org/display/duracloud/DuraCloud+REST+API).
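A custom bulk-upload script typically walks a local directory and issues one upload per file. The Python sketch below shows only that walking and naming logic. The actual HTTP call is left to a caller-supplied upload function, because the request details should be taken from the REST API documentation linked above. The function names, and the choice of the relative path as the content ID, are assumptions made for illustration.

```python
from pathlib import Path


def plan_uploads(root):
    """Yield (content_id, file_path) pairs for every file under `root`.

    The content ID is the file's path relative to `root`, using forward
    slashes. This naming choice is an assumption, not a DuraCloud rule.
    """
    base = Path(root)
    for path in sorted(base.rglob("*")):
        if path.is_file():
            yield path.relative_to(base).as_posix(), path


def bulk_upload(root, upload):
    """Call `upload(content_id, file_path)` for each file; return failed IDs."""
    failed = []
    for content_id, path in plan_uploads(root):
        try:
            upload(content_id, path)
        except Exception:
            # Record the failure and keep going so one bad file
            # does not stop the whole batch.
            failed.append(content_id)
    return failed
```

Returning the list of failed content IDs makes it straightforward to retry only the files that did not go through.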
When an upload fails, do remnants persist in DuraCloud?
No. To the best of the DuraCloud team's knowledge, no remnants of a failed upload persist in DuraCloud.
Will the DuraCloud open source software run on a Windows machine?
The DuraCloud software will build and run on Windows XP using Tomcat 6.0.29 and Maven 2.2.1. If you are working in this or a similar configuration, two recommended environment settings are:
CATALINA_OPTS="-server -Xmx512m"
MAVEN_OPTS="-Xmx1024m -XX:MaxPermSize=1024m"
Note that not all DuraCloud services will run in a Windows environment. Refer to the documentation about building DuraCloud from source for more information about building and running the DuraCloud software.
Other than the image conversion service, are there other file format migration services being discussed for use with DuraCloud?
Although there are no other conversion services on the immediate horizon, format identification is currently being pursued. If you have a particular suggestion or service in mind, please contact us.




