By default, Cordra uses the local file system for storing digital objects. However, Cordra can be configured to use alternate storage backends, and an alternate backend is required when Cordra is deployed as a distributed system.
Cordra can use several storage technologies. For each one, Cordra includes a storage module that translates Cordra's storage requirements into the operations that technology natively offers.
To configure a storage module other than the default file-system-based storage, add a storage section to the Cordra config.json file. For example:
"storage" : {
    "module" : "module-name-goes-here",
    "options" : {
    }
}
The following storage modules are included within the Cordra distribution:
If no storage module is configured in config.json, Cordra stores most information from digital objects in a local BerkeleyDB database and the payloads of those digital objects in a directory on the local file system.
Payload identifiers are hashed, and the hashes determine where each payload is placed within the storage directory, so that payloads are spread evenly among its sub-directories.
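The exact hashing scheme is not specified here. As a rough illustration only, the following Java sketch assumes SHA-256 over the object id and payload name, and a hypothetical two-level layout that uses the first two bytes of the hash as sub-directory names. PayloadPathSketch and its methods are illustrative names, not Cordra classes.

```java
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical illustration: SHA-256 and the two-level directory layout
// are assumptions, not Cordra's documented on-disk format.
final class PayloadPathSketch {

    /** Lowercase hex SHA-256 of the UTF-8 bytes of s. */
    static String sha256Hex(String s) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            byte[] digest = md.digest(s.getBytes(StandardCharsets.UTF_8));
            StringBuilder sb = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                sb.append(String.format("%02x", b));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform is required to support SHA-256.
            throw new IllegalStateException(e);
        }
    }

    /** Maps an object id and payload name to baseDir/xx/yy/<full hash>. */
    static Path payloadPath(Path baseDir, String objectId, String payloadName) {
        // A NUL separator keeps ("a", "bc") and ("ab", "c") from colliding.
        String hash = sha256Hex(objectId + "\u0000" + payloadName);
        // The first two hex-byte pairs pick the sub-directories (256 x 256 buckets).
        return baseDir.resolve(hash.substring(0, 2))
                      .resolve(hash.substring(2, 4))
                      .resolve(hash);
    }

    public static void main(String[] args) {
        System.out.println(payloadPath(Paths.get("data", "payloads"), "test/abc", "file1"));
    }
}
```

Because a cryptographic hash spreads its outputs uniformly, each of the 65,536 leaf directories in this layout receives roughly the same share of payloads, which keeps individual directories small.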
This storage module is only applicable for a single instance deployment scenario.
In the unusual case of needing explicit configuration (for example, within the multi module described below), use the following module:
bdbje
Module Options:
Option name | Description |
---|---|
cordraDataDir | Filesystem path to the Cordra data directory; generally this will be automatically populated. |
memory
Module Options: none
The memory storage module keeps digital objects in system memory, so they are erased when the Cordra process using it stops. This makes it useful for testing. There are no options for this storage module.
This storage module is only applicable for a single instance deployment scenario.
The storage section of the config.json file looks like this:
"storage" : {
    "module" : "memory"
}
mongodb
Module Options:
Option name | Description |
---|---|
connectionUri | MongoDB-style connection URI |
maxTimeMs | “maxTimeMs” value used for MongoDB operations, which gives a time limit for processing; default is 30 seconds. Generally this should not need to be set. |
maxTimeMsLongRunning | “maxTimeMs” value used for long-running MongoDB operations used in reindexing; default is 10 days. Generally this should not need to be set. |
databaseName | Database name; defaults to “cordra” |
collectionName | Collection name; defaults to “cordra” |
gridFsBucketName | GridFS bucket name (for payload storage); defaults to “fs” |
useLegacyIdField | (Advanced option.) Boolean; if set to false, object ids will use the standard MongoDB property “_id” rather than a custom “id” property and index; defaults to true. Setting it to the non-default false may have benefits for storage size or performance, but this cannot be changed for an existing storage. |
The MongoDB module stores objects in a MongoDB database. The connectionUri is a standard MongoDB-style connection string. If no connectionUri is configured, the default URI localhost:27017 is used.
When connecting to MongoDB using TLS, additional configuration may be required. See Enabling TLS for details.
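To make the defaults in the table concrete, here is a hypothetical Java sketch that resolves a mongodb options object into effective settings. A Map<String, String> stands in for the parsed JSON, the class name MongoOptionsSketch is invented, and writing the default as mongodb://localhost:27017 is an assumption about how localhost:27017 is expressed as a connection string.

```java
import java.time.Duration;
import java.util.Map;

// Hypothetical sketch of the documented "mongodb" option defaults.
// Not Cordra code: Map<String, String> stands in for the JSON "options".
final class MongoOptionsSketch {
    final String connectionUri;
    final String databaseName;
    final String collectionName;
    final String gridFsBucketName;
    final Duration maxTime;
    final Duration maxTimeLongRunning;
    final boolean useLegacyIdField;

    MongoOptionsSketch(Map<String, String> options) {
        // Assumed URI form of the documented localhost:27017 default.
        this.connectionUri = options.getOrDefault("connectionUri", "mongodb://localhost:27017");
        this.databaseName = options.getOrDefault("databaseName", "cordra");
        this.collectionName = options.getOrDefault("collectionName", "cordra");
        this.gridFsBucketName = options.getOrDefault("gridFsBucketName", "fs");
        this.maxTime = millisOrDefault(options, "maxTimeMs", Duration.ofSeconds(30));
        this.maxTimeLongRunning = millisOrDefault(options, "maxTimeMsLongRunning", Duration.ofDays(10));
        this.useLegacyIdField = Boolean.parseBoolean(options.getOrDefault("useLegacyIdField", "true"));
    }

    private static Duration millisOrDefault(Map<String, String> options, String key, Duration dflt) {
        String value = options.get(key);
        return value == null ? dflt : Duration.ofMillis(Long.parseLong(value));
    }

    /** Property holding the object id: custom "id" in legacy mode, else MongoDB's "_id". */
    String idField() {
        return useLegacyIdField ? "id" : "_id";
    }

    public static void main(String[] args) {
        MongoOptionsSketch defaults = new MongoOptionsSketch(Map.of());
        System.out.println(defaults.connectionUri + " " + defaults.databaseName + " " + defaults.idField());
    }
}
```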
s3
Module Options:
Option name | Description |
---|---|
bucketName | (required) Name of the bucket to use for storage. |
region | (required) AWS region for the bucket (e.g., us-west-2). |
accessKey | AWS access key for a user with access to this bucket. |
secretKey | AWS secret key for a user with access to this bucket. |
s3KeyPrefix | Prefix to use for keys on S3 objects. |
endpoint | For setting a non-standard service endpoint. |
signerOverride | Signature algorithm to be used by the client (possibly useful for non-standard service endpoints). |
If accessKey and secretKey are omitted, credentials are picked up from the environment, as described in the AWS SDK documentation.
The S3 module stores Cordra objects in an Amazon S3 bucket. To use this module, create the bucket on AWS and a user with full access to that bucket, then configure the module as in this example:
"storage" : {
    "module" : "s3",
    "options" : {
        "accessKey" : "XXXXXXXXXXXXXX",
        "secretKey" : "XXXXXXXXXXXXXX",
        "bucketName" : "my-bucket-name.example.org",
        "s3KeyPrefix" : "testing1234",
        "region" : "us-east-1"
    }
}
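A hypothetical pre-flight check for an s3 options object, written in Java. It enforces the two documented required options and reports where credentials would come from. Rejecting an accessKey without a secretKey (or the reverse) is an assumption of this sketch, not documented Cordra behavior, and S3OptionsCheckSketch is an invented name.

```java
import java.util.Map;

// Hypothetical validation of an "s3" options object; not part of Cordra.
final class S3OptionsCheckSketch {

    enum CredentialSource { STATIC_KEYS, ENVIRONMENT }

    static CredentialSource check(Map<String, String> options) {
        // bucketName and region are documented as required.
        for (String required : new String[] {"bucketName", "region"}) {
            String value = options.get(required);
            if (value == null || value.isBlank()) {
                throw new IllegalArgumentException("missing required option: " + required);
            }
        }
        boolean hasAccess = options.containsKey("accessKey");
        boolean hasSecret = options.containsKey("secretKey");
        // Assumption: a lone key is treated as a configuration error.
        if (hasAccess != hasSecret) {
            throw new IllegalArgumentException("accessKey and secretKey must be given together");
        }
        // With both keys omitted, credentials come from the environment.
        return hasAccess ? CredentialSource.STATIC_KEYS : CredentialSource.ENVIRONMENT;
    }

    public static void main(String[] args) {
        // prints ENVIRONMENT
        System.out.println(check(Map.of("bucketName", "my-bucket-name.example.org", "region", "us-east-1")));
    }
}
```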
multi
Module Options:
Option name | Description |
---|---|
storageChooser | (required) Fully-qualified class name of an implementation of net.cnri.cordra.storage.multi.StorageChooser. |
storageChooserOptions | (optional) A JSON object passed to the constructor of the StorageChooser. |
storageMap | (required) A map from String names of storages to storage configurations (with “module” and “options” properties). |
This module multiplexes among several storage implementations. The required custom implementation of net.cnri.cordra.storage.multi.StorageChooser determines which storage is accessed for each call.
Use this module when different types of digital objects are to be managed in different storage systems.
For a standard single-instance Cordra deployment, a JAR file containing the class can be placed in the lib subdirectory of the Cordra data directory, along with any dependency JARs (the cordra-core and Gson dependencies will be provided automatically). If Cordra is deployed in a separate servlet container, the JAR file should be deployed in the servlet container or in Cordra’s own WEB-INF/lib directory.
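The methods of net.cnri.cordra.storage.multi.StorageChooser are not listed here, so the Java sketch below deliberately does not implement that interface. It only illustrates the kind of routing decision a chooser makes: mapping an object type to a key of storageMap, with a fallback storage. All names (TypeRoutingChooserSketch, s3Storage, mongoStorage) are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical routing logic only; it is NOT the real StorageChooser API.
final class TypeRoutingChooserSketch {
    private final Map<String, String> storageByType; // object type -> storageMap key
    private final String defaultStorage;             // storageMap key for all other types

    TypeRoutingChooserSketch(Map<String, String> storageByType, String defaultStorage) {
        this.storageByType = new HashMap<>(storageByType);
        this.defaultStorage = defaultStorage;
    }

    /** Returns the storageMap key that should handle objects of this type. */
    String chooseStorageName(String objectType) {
        return storageByType.getOrDefault(objectType, defaultStorage);
    }

    public static void main(String[] args) {
        TypeRoutingChooserSketch chooser =
                new TypeRoutingChooserSketch(Map.of("Document", "s3Storage"), "mongoStorage");
        System.out.println(chooser.chooseStorageName("Document")); // prints s3Storage
        System.out.println(chooser.chooseStorageName("User"));     // prints mongoStorage
    }
}
```

Keeping the decision as a pure lookup from type to storage name makes the routing easy to test in isolation; a real chooser would then hand the call to the storage configured under that name in storageMap.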
The StorageChooser can make use of a special feature of the REST API: any call can take a query parameter “requestContext”, which encodes a JSON object. That user-supplied context is made available to the methods of the StorageChooser.
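Because the requestContext value is JSON carried in a URL, it has to be URL-encoded. A small Java sketch using the standard URLEncoder (Java 10 or later); the base URL, object id, and the storage field inside the context are hypothetical, and what the context means is entirely up to your StorageChooser.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Appends a URL-encoded JSON "requestContext" query parameter to a request URL.
final class RequestContextUrlSketch {

    static String withRequestContext(String url, String contextJson) {
        String separator = url.contains("?") ? "&" : "?";
        // Encode so braces, quotes and colons are safe inside the query string.
        return url + separator + "requestContext="
                + URLEncoder.encode(contextJson, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String context = "{\"storage\":\"archive\"}"; // hypothetical context field
        // prints https://localhost:8443/objects/test/abc?requestContext=%7B%22storage%22%3A%22archive%22%7D
        System.out.println(withRequestContext("https://localhost:8443/objects/test/abc", context));
    }
}
```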
custom
It is possible to create a custom storage module by implementing the Java interface net.cnri.cordra.storage.CordraStorage. In addition to "module": "custom", the storage section should have a "className" property, a sibling of "module", set to the fully-qualified name of the Java class implementing CordraStorage.
If the class has a constructor that takes a com.google.gson.JsonObject, the "options" from the configuration are passed to that constructor to instantiate the class. If the "options" object does not already have a "cordraDataDir" property, that property is populated with the filesystem path of the Cordra data directory. If there is no constructor taking a JsonObject, the default (no-argument) constructor is called.
For a standard single-instance Cordra deployment, a JAR file containing the class can be placed in the lib subdirectory of the Cordra data directory, along with any dependency JARs (the cordra-core and Gson dependencies will be provided automatically). If Cordra is deployed in a separate servlet container, the JAR file should be deployed in the servlet container or in Cordra’s own WEB-INF/lib directory.
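The instantiation rule described above can be sketched with Java reflection. To keep the example runnable without Gson, a Map<String, String> stands in for com.google.gson.JsonObject; StorageLoaderSketch and its two demo storage classes are hypothetical and do not implement CordraStorage.

```java
import java.lang.reflect.Constructor;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the documented loading rule; Map stands in for JsonObject.
final class StorageLoaderSketch {

    /** Resolves the configured "className" and instantiates it. */
    static Object instantiate(String className, Map<String, String> options, String cordraDataDir)
            throws ReflectiveOperationException {
        return instantiate(Class.forName(className), options, cordraDataDir);
    }

    static Object instantiate(Class<?> cls, Map<String, String> options, String cordraDataDir)
            throws ReflectiveOperationException {
        // Copy so the caller's configuration is left unchanged.
        Map<String, String> opts = new HashMap<>(options);
        // Populate cordraDataDir only if the configuration did not supply one.
        opts.putIfAbsent("cordraDataDir", cordraDataDir);
        Constructor<?> withOptions;
        try {
            withOptions = cls.getConstructor(Map.class);
        } catch (NoSuchMethodException e) {
            // No options constructor: fall back to the public no-argument constructor.
            return cls.getConstructor().newInstance();
        }
        return withOptions.newInstance(opts);
    }

    /** Hypothetical storage class that accepts the options. */
    public static final class OptionsAwareStorage {
        public final Map<String, String> options;
        public OptionsAwareStorage(Map<String, String> options) { this.options = options; }
    }

    /** Hypothetical storage class with only a no-argument constructor. */
    public static final class PlainStorage {
        public PlainStorage() { }
    }

    public static void main(String[] args) throws ReflectiveOperationException {
        Object s = instantiate(OptionsAwareStorage.class, Map.of("foo", "bar"), "/opt/cordra/data");
        System.out.println(((OptionsAwareStorage) s).options);
        System.out.println(instantiate(PlainStorage.class, Map.of(), "/opt/cordra/data").getClass().getSimpleName());
    }
}
```

Looking up the options constructor in its own try block keeps a failure inside that constructor from being mistaken for "no such constructor" and silently falling back to the no-argument one.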