What is Cordra?
Cordra is highly configurable software for managing digital objects with resolvable identifiers at scale.
Cordra is aimed at software developers who build scalable infrastructures for managing information as digital objects. It saves substantial development time because it ships with the functions developers typically need, from user authentication and access control to information validation, enrichment, storage, and indexing. Cordra provides these functions at scale, and they can be tailored through configuration alone.
And Cordra is open source.
Globally Accessible Digital Objects
Managing NoSQL information as digital objects is now easy. A digital object is an information structure that has an associated unique resolvable identifier, a type, and access control information. Types prescribe the schemas and rules to apply during the various stages of a digital object's lifecycle.
Cordra manages JSON records and payloads as typed digital objects. Using the type information and related configuration, Cordra determines how to validate a digital object, how to enrich it, which operations to invoke on it, and when and to whom to grant access to it.
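To make the idea of a typed digital object concrete, here is a minimal Python sketch. The `DigitalObject` class, the `TYPES` registry, and `validate` are hypothetical names for illustration only, not Cordra's API. Cordra type schemas are written in JSON Schema; this sketch replaces that with a simple required-fields check.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class DigitalObject:
    """Hypothetical typed digital object: identifier, type, ACLs, content."""
    id: str
    type: str
    content: dict[str, Any]
    readers: list[str] = field(default_factory=list)
    writers: list[str] = field(default_factory=list)


# Hypothetical type registry: required field names and their Python types.
# A simplified stand-in for the JSON Schema a real type would declare.
TYPES: dict[str, dict[str, type]] = {
    "Document": {"title": str, "pages": int},
}


def validate(obj: DigitalObject) -> list[str]:
    """Return validation errors for obj; an empty list means it is valid."""
    schema = TYPES.get(obj.type)
    if schema is None:
        return [f"unknown type: {obj.type}"]
    errors = []
    for name, expected in schema.items():
        if name not in obj.content:
            errors.append(f"missing field: {name}")
        elif not isinstance(obj.content[name], expected):
            errors.append(f"field {name} should be {expected.__name__}")
    return errors
```

For example, `validate(DigitalObject("test/2", "Document", {"title": "Report"}))` reports that `pages` is missing, because the type, not the object, decides what is required.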
Users do not need to know where Cordra instances are located to access digital objects or invoke operations on them. They can resolve identifiers with the resolution protocol, via native clients or proxy services such as hdl.handle.net or doi.org, to find where the digital objects are. That said, global access can be disabled through configuration changes.
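Resolution through a proxy can be sketched with Python's standard library. This assumes the public Handle proxy REST API, which serves handle records as JSON under `/api/handles/` on both hdl.handle.net and doi.org. The handle `20.500.12345/abc` and the record contents are hypothetical, and which value types (here `URL`) a Cordra-registered handle carries depends on how that Cordra instance is configured.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

# Proxy hosts named in the text; both expose the Handle proxy REST API.
PROXIES = {"handle": "https://hdl.handle.net", "doi": "https://doi.org"}


def resolution_url(handle: str, proxy: str = "handle") -> str:
    """Build the proxy URL that returns the handle record as JSON."""
    return f"{PROXIES[proxy]}/api/handles/{quote(handle, safe='/')}"


def urls_from_record(record: dict) -> list[str]:
    """Pick out the values of URL-typed entries in a handle record."""
    return [
        v["data"]["value"]
        for v in record.get("values", [])
        if v.get("type") == "URL"
    ]


def resolve(handle: str, proxy: str = "handle") -> list[str]:
    """Fetch a handle record over the network and return its URLs."""
    with urlopen(resolution_url(handle, proxy), timeout=10) as resp:
        return urls_from_record(json.load(resp))
```

The caller never needs to know which Cordra instance holds the object; the location comes back from the resolver.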
Is Cordra a Database or an Indexer?
It is neither. Cordra orchestrates the two and provides a unified interface to both, at scale.
Cordra ensures data integrity across storage and indexing services using synchronized, distributed locking mechanisms. Out of the box, it supports MongoDB and Amazon S3 for storage and Elasticsearch and Solr for indexing.
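The coordination idea can be shown with a toy Python sketch that updates a storage map and an index map under one lock per object, so the index never reflects a half-applied write. The `Orchestrator` class is hypothetical, and an in-process `threading.Lock` stands in for the distributed locks the text describes; it does not reproduce Cordra's locking implementation.

```python
import threading
from collections import defaultdict


class Orchestrator:
    """Toy coordinator that keeps a storage map and an index map in step.

    A per-identifier threading.Lock stands in for a distributed lock;
    separate processes or hosts would need a lock service they all share.
    """

    def __init__(self):
        self.storage = {}  # stand-in for the storage service
        self.index = {}    # stand-in for the indexing service
        self._locks = defaultdict(threading.Lock)
        self._guard = threading.Lock()  # serializes creation of per-id locks

    def _lock_for(self, obj_id):
        with self._guard:
            return self._locks[obj_id]

    def update(self, obj_id, changes):
        # Read-modify-write on storage, then index that same version, while
        # holding the object's lock so no concurrent writer can interleave.
        with self._lock_for(obj_id):
            current = dict(self.storage.get(obj_id, {}))
            current.update(changes)
            self.storage[obj_id] = current
            self.index[obj_id] = dict(current)
```

Twenty concurrent updates to the same object each add their own field, and with the lock none of them is lost and the index matches storage.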
Information Lifecycle Management
What information is stored, what is indexed, and what is returned to users are all controllable.
Beyond validating incoming information against the schemas defined by types, Cordra provides hooks that take specific actions at various stages of the information lifecycle. These hooks give you control over identifier generation, schema validation, de-duplication, dynamic information updates at write time or read time, access control, what gets stored, and what gets indexed.
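The sketch below illustrates the general shape of per-type lifecycle hooks in Python. The stage names (`generate_id`, `before_validation`, `for_indexing`, `on_read`), the `test/` identifier prefix, and the `internalNote` field are hypothetical and are not Cordra's hook API. The point is that one type's hooks can enrich content at write time, keep fields out of the index, and filter what readers see.

```python
import uuid
from typing import Any

Obj = dict[str, Any]


class TypeHooks:
    """Default (no-op) hooks; a type overrides only the stages it needs."""

    def generate_id(self, obj: Obj) -> str:
        return f"test/{uuid.uuid4()}"  # hypothetical identifier prefix

    def before_validation(self, obj: Obj) -> Obj:
        return obj

    def for_indexing(self, obj: Obj) -> Obj:
        return obj

    def on_read(self, obj: Obj) -> Obj:
        return obj


class DocumentHooks(TypeHooks):
    def before_validation(self, obj: Obj) -> Obj:
        # Write-time enrichment: normalize the title before validation.
        return {**obj, "title": obj["title"].strip()}

    def for_indexing(self, obj: Obj) -> Obj:
        # Keep an internal note out of the index.
        return {k: v for k, v in obj.items() if k != "internalNote"}

    def on_read(self, obj: Obj) -> Obj:
        # Read-time filtering: hide the internal note from callers.
        return {k: v for k, v in obj.items() if k != "internalNote"}


def create(hooks: TypeHooks, obj: Obj, storage: dict, index: dict) -> str:
    obj = hooks.before_validation(obj)
    if "title" not in obj:  # stand-in for schema validation
        raise ValueError("title is required")
    obj_id = hooks.generate_id(obj)
    storage[obj_id] = obj                    # what gets stored
    index[obj_id] = hooks.for_indexing(obj)  # what gets indexed
    return obj_id


def read(hooks: TypeHooks, obj_id: str, storage: dict) -> Obj:
    return hooks.on_read(storage[obj_id])
```

Storage keeps the full object, while the index and read paths each see their own hook-shaped view.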
What if you have complex processing logic that falls outside the regular lifecycle, but you do not want to maintain another software process?
Cordra can be extended by adding custom operations that can read from or write to digital objects. These operations take advantage of the built-in locking mechanisms to avoid data integrity issues.
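A custom operation that runs under the target object's lock might look like the following Python sketch. The `operation` decorator, the `invoke` function, and the `incrementCounter` operation are hypothetical names, and the in-process lock stands in for Cordra's distributed locking.

```python
import threading
from collections import defaultdict
from typing import Any, Callable

storage: dict[str, dict[str, Any]] = {}  # stand-in for stored objects
_locks = defaultdict(threading.Lock)      # one lock per object id
_locks_guard = threading.Lock()
OPERATIONS: dict[str, Callable[[dict, dict], Any]] = {}


def operation(name: str):
    """Register a function as a named custom operation (hypothetical API)."""
    def register(fn):
        OPERATIONS[name] = fn
        return fn
    return register


def invoke(obj_id: str, op_name: str, params: dict) -> Any:
    """Run an operation on one object while holding that object's lock."""
    with _locks_guard:
        lock = _locks[obj_id]
    with lock:
        return OPERATIONS[op_name](storage[obj_id], params)


@operation("incrementCounter")
def increment_counter(obj: dict, params: dict) -> int:
    # Read-modify-write is safe because invoke() holds the object's lock.
    obj["count"] = obj.get("count", 0) + params.get("by", 1)
    return obj["count"]
```

Because the lock is taken by `invoke` rather than by each operation, operation authors get integrity without writing any locking code.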
Authentication and PKI
PKI-based user authentication is now simple. When public keys are associated with digital objects, Cordra can issue PKI challenges.
Password-based authentication is also available where useful. Cordra stores only salted hashes of passwords, never the passwords themselves.
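Salted password hashing can be sketched with Python's standard `hashlib` module. The text does not say which algorithm Cordra uses, so PBKDF2-HMAC-SHA256 with a random 16-byte salt and the iteration count below are assumptions for illustration.

```python
import hashlib
import hmac
import os

# Assumed parameters; not taken from Cordra.
ITERATIONS = 600_000


def hash_password(password: str) -> tuple[bytes, bytes]:
    """Return (salt, digest); store these instead of the password."""
    salt = os.urandom(16)  # a fresh random salt per password
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, digest


def verify_password(password: str, salt: bytes, digest: bytes) -> bool:
    """Recompute with the stored salt and compare in constant time."""
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return hmac.compare_digest(candidate, digest)
```

The per-password salt means two users with the same password get different stored digests, and `hmac.compare_digest` avoids leaking timing information during comparison.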
Linking and Hashing
Linking between digital objects is straightforward in Cordra. Cordra uses schema definitions to allow or deny links to other objects for each type of digital object, and graph-style linking is supported. Cordra can also compute hashes of digital objects and store them in the objects, so links can be based on hashes. Verifying object immutability through hashed links is natively supported.
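Hash-based links can be sketched in Python as follows: hash a canonical JSON encoding of the target, store that hash in the link, and recompute it later to check that the target is unchanged. The canonicalization (sorted keys, compact separators) and the `sha256` link field are assumptions; Cordra's own hashing scheme may differ.

```python
import hashlib
import json


def object_hash(content: dict) -> str:
    """SHA-256 over a canonical JSON encoding (assumed canonicalization)."""
    canonical = json.dumps(content, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def make_link(target_id: str, target_content: dict) -> dict:
    """A link that pins both the target's identifier and its content hash."""
    return {"id": target_id, "sha256": object_hash(target_content)}


def link_is_intact(link: dict, current_content: dict) -> bool:
    """True only if the target still has exactly the content that was linked."""
    return link["sha256"] == object_hash(current_content)
```

Sorting keys makes the hash independent of key order, so only a real change to the content breaks the link.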
Consensus algorithms are not included. You can implement them if needed.
Access Control
Readers and writers can simply be declared in a digital object, or in its type, or system-wide. Readers and writers are themselves represented as digital objects or as groups of such digital objects.
If you need more control than declarations allow, lifecycle hooks can allow or deny access to digital objects based on request context beyond the user identifier.
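The layering of declarations and hooks can be sketched in Python. Everything here is hypothetical: the `Acl` class, the `GROUPS` table, the rule that the most specific declaration present (object, then type, then system-wide) wins, the rule that writers may also read, and the example hook that checks an `ip` value in the request context.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Acl:
    """Hypothetical reader/writer declaration at one level."""
    readers: list = field(default_factory=list)
    writers: list = field(default_factory=list)


# Hypothetical group object: a group identifier mapped to member users.
GROUPS = {"group/editors": {"user/alice", "user/bob"}}


def members(principals):
    """Expand group identifiers into member user identifiers (one level)."""
    expanded = set()
    for p in principals:
        expanded |= GROUPS.get(p, {p})
    return expanded


def is_allowed(user: str, action: str, obj_acl: Optional[Acl], type_acl: Optional[Acl],
               system_acl: Acl, context: Optional[dict] = None,
               hook: Optional[Callable[[str, dict], bool]] = None) -> bool:
    # Assumption: the most specific declaration that exists is used.
    acl = next(a for a in (obj_acl, type_acl, system_acl) if a is not None)
    # Assumption: writers are implicitly readers.
    principals = acl.writers if action == "write" else acl.readers + acl.writers
    allowed = user in members(principals)
    # A hook may further deny access using request context beyond the user id.
    if allowed and hook is not None:
        allowed = hook(user, context or {})
    return allowed
```

Declarations answer "who", while the hook can add conditions such as network location that a static list cannot express.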
APIs and Clients
Cordra provides REST, DOIP (Digital Object Interface Protocol), and IRP (Identifier/Resolution Protocol) interfaces. REST lowers the barrier to entry, DOIP operations provide extensibility and interoperability, and IRP enables identifier resolution.
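A REST call can be prepared with Python's standard library as below. The base URL is hypothetical, and the paths (`/objects/?type=...`, `/objects/<id>`, `/search?query=...`) and HTTP Basic authentication reflect our understanding of Cordra's REST API; check them against the REST API documentation for your Cordra version before relying on them.

```python
import base64
import json
from urllib.parse import quote, urlencode
from urllib.request import Request

BASE = "https://cordra.example.org"  # hypothetical Cordra instance


def _basic_auth(username: str, password: str) -> dict:
    token = base64.b64encode(f"{username}:{password}".encode()).decode()
    return {"Authorization": f"Basic {token}"}


def create_request(obj_type: str, content: dict, username: str, password: str) -> Request:
    """Build a request that creates an object of the given type (assumed path)."""
    url = f"{BASE}/objects/?{urlencode({'type': obj_type})}"
    headers = {"Content-Type": "application/json", **_basic_auth(username, password)}
    return Request(url, data=json.dumps(content).encode(), headers=headers, method="POST")


def get_request(obj_id: str) -> Request:
    """Build a request that retrieves one object by identifier (assumed path)."""
    return Request(f"{BASE}/objects/{quote(obj_id, safe='/')}", method="GET")


def search_request(query: str) -> Request:
    """Build a search request (assumed path and parameter name)."""
    return Request(f"{BASE}/search?{urlencode({'query': query})}", method="GET")
```

The requests are only constructed here; passing one to `urllib.request.urlopen` would send it.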
Reliability and Performance
Cordra is built to be a reliable distributed system.
Most data infrastructures need separate backend services for storage and indexing, because each service is optimized for its own purpose. Such infrastructures must therefore keep system state consistent between the storage and indexing services. In small-scale projects, in-memory locks can be enough to preserve data integrity; large-scale infrastructures require robust internal machinery that can withstand partial system and network failures.
Cordra ensures data integrity across storage and indexing services, at scale, using synchronized, distributed locking mechanisms.
Cordra is built to scale horizontally. This means multiple Cordra instances can work with the same set of backend (storage and indexing) services, and user requests can be spread and load balanced across those instances.
To enable this, Cordra automatically handles a variety of scenarios internally. It 1) provides concurrency controls so that updates to the same digital object can be requested on different Cordra instances in parallel; 2) keeps backend services consistent even when a subset of them temporarily fails; and 3) does not drop requests even if some Cordra instances crash.
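Point 2) can be illustrated with a generic pattern, sketched in Python: treat storage as the source of truth, queue identifiers whose index update failed, and replay them from storage once the index is back. The `FlakyIndex` and `Writer` classes are hypothetical, and this is not a description of Cordra's internal mechanism; in a real deployment the pending queue itself would need to be durable.

```python
class FlakyIndex:
    """Test double: an index service that fails while `down` is True."""

    def __init__(self):
        self.docs = {}
        self.down = False

    def put(self, obj_id, doc):
        if self.down:
            raise ConnectionError("index unavailable")
        self.docs[obj_id] = doc


class Writer:
    """Storage is the source of truth; failed index writes are replayed later."""

    def __init__(self, index):
        self.storage = {}
        self.index = index
        self.pending = set()  # ids whose index entry may be stale

    def save(self, obj_id, doc):
        self.storage[obj_id] = doc
        try:
            self.index.put(obj_id, doc)
        except ConnectionError:
            self.pending.add(obj_id)  # remember to reindex this object

    def reindex_pending(self):
        for obj_id in sorted(self.pending):
            try:
                # Always reindex the latest stored version, not a queued copy.
                self.index.put(obj_id, self.storage[obj_id])
            except ConnectionError:
                return  # still down; keep the remaining ids queued
            self.pending.discard(obj_id)
```

Replaying from storage rather than from a queued copy means repeated updates during an outage collapse into a single, current index write.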
Cordra provides built-in support for the MongoDB and Amazon S3 storage services, and for the Apache Solr and Elasticsearch indexing services. For prototypes and quick experiments, Cordra can also run standalone on a desktop, using the local file system or system memory for storage and embedded Apache Lucene for indexing.
Support for other storage and indexing services can be added.
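The text does not describe Cordra's extension interface, so the Python sketch below only shows the general shape of a pluggable backend: an abstract `StorageBackend` with `get`, `put`, and `delete`, plus an in-memory implementation like the one used for desktop prototypes. All names are hypothetical.

```python
from abc import ABC, abstractmethod
from typing import Optional


class StorageBackend(ABC):
    """Hypothetical interface a new storage service would implement."""

    @abstractmethod
    def get(self, obj_id: str) -> Optional[dict]: ...

    @abstractmethod
    def put(self, obj_id: str, obj: dict) -> None: ...

    @abstractmethod
    def delete(self, obj_id: str) -> None: ...


class MemoryStorage(StorageBackend):
    """In-memory backend, in the spirit of a desktop prototype setup."""

    def __init__(self):
        self._data = {}

    def get(self, obj_id):
        obj = self._data.get(obj_id)
        return dict(obj) if obj is not None else None  # return a copy

    def put(self, obj_id, obj):
        self._data[obj_id] = dict(obj)

    def delete(self, obj_id):
        self._data.pop(obj_id, None)


# Hypothetical registry mapping a configured name to a backend class.
BACKENDS = {"memory": MemoryStorage}


def make_storage(name: str) -> StorageBackend:
    return BACKENDS[name]()
```

Selecting the backend by name keeps the choice in configuration, so adding a service means registering one more class.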
Ensuring durability, and independence from region-wide outages, requires cross-region and possibly cross-cloud replication. However, replicating storage and indexing data independently of each other leads to data integrity issues, so a holistic, infrastructure-level replication solution is needed.
Cordra solves this with clusters. A cluster in this context is a configured collection of Cordra instances and backend services that can exist independently of other clusters. Cordra replicates managed digital objects across clusters on different continents and/or clouds, and it can be programmed to provide varying levels of data consistency guarantees.
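Tunable consistency across clusters can be sketched in Python as a write that counts acknowledgments: with `required_acks` equal to 1 the write succeeds if any cluster accepts it, and with `required_acks` equal to the number of clusters every cluster must accept it. The `Cluster` and `Replicator` classes, the per-write sequence numbers, and the retry backlog are assumptions for illustration, not Cordra's replication protocol. A single sequence counter also assumes one writer, whereas real multi-cluster writes need globally ordered versions.

```python
from dataclasses import dataclass, field


@dataclass
class Cluster:
    """Hypothetical stand-in for one cluster (Cordra instances plus backends)."""
    name: str
    reachable: bool = True
    objects: dict = field(default_factory=dict)  # obj_id -> (seq, content)

    def apply(self, obj_id, seq, content):
        if not self.reachable:
            raise ConnectionError(f"{self.name} unreachable")
        current = self.objects.get(obj_id)
        # Ignore stale updates so a replayed backlog never overwrites newer data.
        if current is None or seq > current[0]:
            self.objects[obj_id] = (seq, content)


@dataclass
class Replicator:
    clusters: list
    required_acks: int  # the tunable consistency level: 1 .. len(clusters)
    seq: int = 0
    backlog: list = field(default_factory=list)  # (cluster, obj_id, seq, content)

    def write(self, obj_id, content):
        """Apply to every reachable cluster; report whether enough acknowledged."""
        self.seq += 1
        acks = 0
        for cluster in self.clusters:
            try:
                cluster.apply(obj_id, self.seq, content)
                acks += 1
            except ConnectionError:
                self.backlog.append((cluster, obj_id, self.seq, content))
        return acks >= self.required_acks

    def drain(self):
        """Retry queued updates; keep those whose cluster is still unreachable."""
        remaining = []
        for cluster, obj_id, seq, content in self.backlog:
            try:
                cluster.apply(obj_id, seq, content)
            except ConnectionError:
                remaining.append((cluster, obj_id, seq, content))
        self.backlog = remaining
```

Raising `required_acks` trades write availability for stronger guarantees: a write that needs every cluster fails while any cluster is unreachable, though reachable clusters still receive it.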
How to Scale?
Run multiple Cordra instances against the same storage and indexing services, and place a load balancer of your choice in front of those instances.
Repeat this setup across clouds or geographical regions if needed; Cordra then carries out the replication automatically.
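Because every instance shares the same backend services, any instance can serve any request, so even plain round-robin works. The Python sketch below is a hypothetical stand-in for a real load balancer, and the instance URLs are made up.

```python
import itertools


class RoundRobin:
    """Minimal round-robin balancer over instance URLs, skipping unhealthy ones."""

    def __init__(self, instances):
        self.instances = list(instances)
        self.healthy = set(self.instances)
        self._cycle = itertools.cycle(self.instances)

    def next_instance(self):
        # Try each instance at most once per call.
        for _ in range(len(self.instances)):
            candidate = next(self._cycle)
            if candidate in self.healthy:
                return candidate
        raise RuntimeError("no healthy instances")
```

Removing a crashed instance from `healthy` simply routes its share of traffic to the others, which is safe only because no instance holds data the others lack.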
Admin Interface and Tools
Most Cordra features are configuration-driven, and Cordra provides an admin interface for making configuration changes. Cordra also dynamically generates a web user interface, based on type information, for creating, viewing, updating, and administering digital objects of any type.
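Generating a form from type information amounts to walking the type's schema. This Python sketch derives flat form-field descriptors from a JSON Schema object; the descriptor format and the widget names are hypothetical and do not reflect how Cordra's web interface is built.

```python
def form_fields(schema: dict, prefix: str = "") -> list[dict]:
    """Derive flat form-field descriptors from a JSON Schema object (sketch)."""
    widgets = {"string": "text", "integer": "number", "number": "number", "boolean": "checkbox"}
    required = set(schema.get("required", []))
    fields = []
    for name, prop in schema.get("properties", {}).items():
        path = f"{prefix}{name}"
        if prop.get("type") == "object":
            # Nested objects become dotted paths, e.g. "author.name".
            fields.extend(form_fields(prop, prefix=f"{path}."))
        elif prop.get("type") == "array":
            fields.append({"path": path, "widget": "list", "required": name in required})
        else:
            # An enum becomes a drop-down; unknown types fall back to text.
            widget = "select" if "enum" in prop else widgets.get(prop.get("type"), "text")
            fields.append({"path": path, "widget": widget, "required": name in required})
    return fields
```

Since the form is derived from the schema, changing a type's schema changes its editing interface without any front-end code.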