Configuring Storage Backend

By default, Cordra uses the local file system for storing digital objects. However, Cordra can be configured to use alternate storage backend systems. It is mandatory to use an alternate storage backend system when Cordra is deployed as a distributed system.

There are a few storage technologies that Cordra can use for its storage. Cordra includes storage modules, which translate Cordra storage requirements into what each of the storage technologies natively offer.

To configure a storage module, other than for the default file system based storage, add a storage section to the Cordra config.json file. For example:

"storage" : {
    "module" : "module-name-goes-here",
    "options" : {

    }
}

Storage Modules

The following storage modules are included within the Cordra distribution:

Filesystem Storage

If no storage module is configured in config.json, the Cordra will store most information from digital objects in a local BerkeleyDB database and the payloads from those digital objects in a directory on the local filesystem. The identifiers of the payloads are hashed, and the hashes are used in the storage directory to ensure that payloads are spread out evenly among the storage sub-directories.

This storage module is only applicable for a single instance deployment scenario.

In the unusual case of needing explicit configuration (for example, with Multiple Storages, described below), use:

Module Name:

bdbje

Module Options:

Option name

Description

cordraDataDir

Filesystem path to the Cordra data directory; generally this will be automatically populated.

System Memory Storage

Module Name:

memory

Module Options: none

The memory storage module uses the system memory; as such the digital objects will be erased once the Cordra process configured to use the memory module stops. Cordra configured with memory module is useful for testing purposes. There are no options required to be specified for this storage module.

This storage module is only applicable for a single instance deployment scenario.

The storage section of the config.json file looks like this:

"storage" : {
    "module" : "memory"
}

MongoDB Storage

Module Name:

mongodb

Module Options:

Option name

Description

connectionUri

MongoDB-style connection URI

maxTimeMs

“maxTimeMs” value used for MongoDB operations, which gives a time limit for processing; default is 30 seconds. Generally this should not need to be set.

maxTimeMsLongRunning

“maxTimeMs” value used for long-running MongoDB operations used in reindexing; default is 10 days. Generally this should not need to be set.

databaseName

Database name; defaults to “cordra”

collectionName

Collection name; defaults to “cordra”

gridFsBucketName

GridFS bucket name (for payload storage); defaults to “fs”

useLegacyIdField

(Advanced option.) Boolean if set to false object ids will use the standard MongoDB property “_id” rather that a custom “id” property and index; defaults to true. Setting to the non-default false may have benefits for storage size or performance, but this cannot be changed for an existing storage.

The MongoDB module will store objects in an MongoDB storage system. The connectionUri is a standard MongoDB-style connection string. If no connectionUri is configured, the default URI of localhost:27017 will be used.

When connecting to MongoDB using TLS, additional configuration may be required. See Enabling TLS for details.

Amazon S3 Storage

Module Name:

s3

Module Options:

Option name

Description

bucketName

(required) Name of bucket to use for storage.

region

(required) AWS region for bucket. (e.g., us-west-2)

accessKey

AWS access key for user with access to this bucket

secretKey

AWS secret key for user with access to this bucket

s3KeyPrefix

Prefix to use for keys on S3 objects.

endpoint

For setting a non-standard service endpoint.

signerOverride

Signature algorithm to be used by the client (possibly useful for non-standard service endpoints).

If accessKey and secretKey are omitted, credentials will be picked up from the environment as described here.

The S3 module stores Cordra objects in an Amazon S3 bucket. In order to use this module, you will need to create the bucket on AWS and create a user with full access to that bucket. An example is below:

"storage" : {
    "module" : "s3",
    "options" : {
        "accessKey" : "XXXXXXXXXXXXXX",
        "secretKey"    : "XXXXXXXXXXXXXX",
        "bucketName"  : "my-bucket-name.example.org",
        "s3KeyPrefix": "testing1234",
        "region": "us-east-1"
    }
}

Multiple Storages

Module Name:

multi

Module Options:

Option name

Description

storageChooser

(required) Fully-qualified class name of an implementation of net.cnri.cordra.storage.multi.StorageChooser.

storageChooserOptions

(optional) A JSON object passed to the constructor of the StorageChooser.

storageMap

(required) A map from String names of storages to storage configurations (with “module” and “options” properties).

This module allows multiplexing among several storage implementations. A custom implementation of net.cnri.cordra.storage.multi.StorageChooser can be provided to determine which storage is accessed for each call.

This module can be used if different types of digital objects are to be managed in different storage systems.

For a standard single-instance Cordra deployment, a JAR file containing the class can be placed in the lib subdirectory of the Cordra data directory, along with any dependency JARs (the cordra-core and Gson dependencies will be provided automatically). If Cordra is deployed in a separate servlet container, the JAR file should be deployed in the servlet container or in Cordra’s own WEB-INF/lib directory.

The StorageChooser can make use of a special feature of the REST API: any call can take a query parameter “requestContext”, which encodes a JSON object. That user-supplied context is made available to the methods of the StorageChooser.

Custom Storage

Module Name:

custom

It is possible to create a custom storage module which implements the Java interface. net.cnri.cordra.storage.CordraStorage. In addition to "module": "custom", there should be a sibling property of "module", "className", which should be set to the fully-qualified name of the Java class implementing CordraStorage.

If the class has a constructor which takes a com.google.gson.JsonObject, the "options" from the configuration will be passed to that constructor to instantiate the class. If the "options" does not already have a property "cordraDataDir", that property will be populated with the filesystem path of the Cordra data directory. If there is no constructor taking a JsonObject, a default constructor (taking no arguments) will be called.

For a standard single-instance Cordra deployment, a JAR file containing the class can be placed in the lib subdirectory of the Cordra data directory, along with any dependency JARs (the cordra-core and Gson dependencies will be provided automatically). If Cordra is deployed in a separate servlet container, the JAR file should be deployed in the servlet container or in Cordra’s own WEB-INF/lib directory.