Resource Identifiers
- Type: technical
- Labels: protocol, identifiers
- Created: December 7, 2025
This proposal is open for feedback.
Abstract
This OEP explores identifier schemes for resources in the Open Science Archive protocol. Rather than prescribing a specific format, it establishes the requirements that any identifier scheme must satisfy, surveys existing approaches, and proposes one candidate (Structured Resource Names) for community feedback.
Motivation
Every resource in OSA—Records, Depositions, Vocabularies, Schemas, Validators—needs an identifier. The choice of identifier scheme has far-reaching consequences for the protocol's usability, longevity, and interoperability.
Scientific data archives present unique challenges:
- Longevity: Identifiers may be cited in papers for decades
- Federation: Multiple independent nodes must avoid collisions
- Machine use: Software needs to parse, route, and validate identifiers
- Human use: Developers and researchers need to debug and discuss identifiers
Getting this wrong is costly. Changing identifier schemes after deployment breaks existing references.
Requirements
Any identifier scheme for OSA should satisfy the following properties:
Must Have
Globally unique: Two resources must never share an identifier, even across independent nodes operated by different organizations.
Resolvable: Given an identifier, there must be a defined mechanism to retrieve the resource or its metadata.
Stable: Once assigned, an identifier must continue to refer to the same resource. Identifiers should not be reassigned or recycled.
Should Have
Human-readable: Developers should be able to understand what an identifier refers to without dereferencing it. At minimum, identifiers should be pronounceable and not excessively long.
Type-aware: The identifier should indicate what kind of resource it refers to (Record, Vocabulary, etc.), enabling validation and routing without network calls.
Version-aware: For versioned resources, the identifier should support pinning to a specific version.
Decentralized minting: Nodes should be able to create identifiers without coordinating with a central authority.
Nice to Have
Persistent across migrations: If an organization changes its domain or infrastructure, existing identifiers should remain valid.
Content-addressable: Identifiers could be derived from content hashes, enabling integrity verification and deduplication.
Compatible with existing standards: Alignment with URN, DID, DOI, or other established schemes reduces implementation burden and improves interoperability.
Existing Approaches
URLs
https://archive.example.org/records/abc123
Pros: Universal, familiar, directly resolvable, existing tooling.
Cons: Conflates identity with location. When domains change, URLs break. No built-in versioning or typing.
Used by: Most web APIs, many data repositories.
DOIs (Digital Object Identifiers)
doi:10.1234/abc.5678
Pros: Designed for persistence, widely adopted in academia, resolver infrastructure exists (doi.org), citable in papers.
Cons: Opaque (no type or origin information), requires registration with a DOI agency (cost, bureaucracy), resolution depends on the centralized Handle System.
Used by: Academic publishing, Zenodo, Figshare, DataCite.
URNs (Uniform Resource Names)
urn:isbn:978-3-16-148410-0
urn:ietf:rfc:3986
Pros: IETF standard (RFC 8141), separates naming from resolution, extensible namespace system.
Cons: No universal resolution mechanism (each namespace defines its own), requires IANA registration for formal namespaces.
Used by: ISBN, IETF RFCs, various domain-specific schemes.
DIDs (Decentralized Identifiers)
did:web:example.org
did:plc:abc123xyz
Pros: W3C standard, designed for decentralization, supports cryptographic verification, multiple "methods" for different tradeoffs.
Cons: Designed for entities (people, organizations) not resources, verbose, emerging ecosystem.
Used by: AT Protocol (Bluesky), identity wallets, Verifiable Credentials.
ARNs (Amazon Resource Names)
arn:aws:dynamodb:us-east-1:123456789012:table/ExampleTable
Pros: Proven at scale, encodes region/account/service/resource hierarchy, enables policy-based access control.
Cons: AWS-specific, complex syntax, assumes single operator (Amazon).
Used by: All AWS services.
UUIDs
550e8400-e29b-41d4-a716-446655440000
Pros: Trivial to generate, collisions are vanishingly unlikely for random (v4) UUIDs, no coordination required.
Cons: Opaque, no context about resource type or origin, not human-friendly, not directly resolvable.
Used by: Databases, internal systems, anywhere uniqueness matters more than readability.
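As a rough illustration of decentralized minting, the sketch below generates a v4 UUID locally and estimates the chance of a collision. The function names are hypothetical and not part of OSA. The estimate uses the standard birthday-bound approximation and the fact that a v4 UUID carries 122 random bits, so uniqueness is probabilistic rather than absolute.

```python
import uuid


def mint_local_id() -> str:
    """Mint an identifier locally, with no coordination between nodes."""
    return str(uuid.uuid4())


def collision_probability(n: int) -> float:
    """Approximate chance that n random v4 UUIDs contain a duplicate.

    Birthday-bound approximation n^2 / (2 * 2^122); a v4 UUID has
    122 random bits (6 of the 128 bits are fixed by version/variant).
    """
    return n * n / (2 * 2**122)


if __name__ == "__main__":
    print(mint_local_id())
    # Estimated collision chance after minting one billion identifiers.
    print(collision_probability(10**9))
```

Even at a billion identifiers the estimated collision probability stays far below anything of practical concern, which is why UUIDs score well on uniqueness and decentralization while still failing the readability and resolvability requirements.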
Content Identifiers (CIDs)
bafybeigdyrzt5sfp7udm7hu76uh7y26nf3efuylqabf3oca...
Pros: Derived from content hash, self-verifying, enables deduplication, immutable by design.
Cons: Long, not human-readable, requires the content to generate the identifier, any content change yields a new identifier.
Used by: IPFS, Filecoin, content-addressed storage systems.
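The self-verifying property can be sketched with a plain SHA-256 digest. This is a simplification for illustration only: real CIDs wrap the digest in multihash, multicodec, and multibase encodings, and the `sha256-` prefix and function names below are assumptions, not an OSA or IPFS format.

```python
import hashlib


def content_id(data: bytes) -> str:
    """Derive a hypothetical content-based identifier from raw bytes."""
    return "sha256-" + hashlib.sha256(data).hexdigest()


def verify(identifier: str, data: bytes) -> bool:
    """Check that the data still matches the identifier minted from it."""
    return content_id(data) == identifier


if __name__ == "__main__":
    original = b"sample_id,reads\nS1,1200\n"
    edited = b"sample_id,reads\nS1,1201\n"

    cid = content_id(original)
    print(cid)
    print(verify(cid, original))  # same bytes, so the check passes
    print(verify(cid, edited))    # one changed digit, so the check fails
```

The same sketch shows both sides of the tradeoff: identical content always maps to the same identifier (deduplication), but correcting a single value in a dataset produces an unrelated identifier.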
Analysis
| Scheme | Unique | Resolvable | Stable | Readable | Typed | Versioned | Decentralized |
|---|---|---|---|---|---|---|---|
| URLs | ✓ | ✓ | ✗ | ✓ | ✗ | ✗ | ✓ |
| DOIs | ✓ | ✓ | ✓ | ✗ | ✗ | ✗ | ✗ |
| URNs | ✓ | ◐ | ✓ | ◐ | ◐ | ◐ | ◐ |
| DIDs | ✓ | ✓ | ✓ | ✗ | ✗ | ✗ | ✓ |
| ARNs | ✓ | ✓ | ✓ | ◐ | ✓ | ✗ | ✗ |
| UUIDs | ✓ | ✗ | ✓ | ✗ | ✗ | ✗ | ✓ |
| CIDs | ✓ | ◐ | ✓ | ✗ | ✗ | ✗ | ✓ |
✓ = yes, ✗ = no, ◐ = partial/depends
No existing scheme fully satisfies our requirements. This suggests either:
- Extending an existing scheme (likely URN or DID)
- Defining a new scheme purpose-built for OSA
Candidate: Structured Resource Names (SRNs)
One option is to define a URN-based scheme that embeds the properties we need. We propose this as a starting point for discussion, not as a final specification.
Format
urn:osa:{node-id}:{type}:{local-id}[@{version}][#{fragment}]
Components
urn:osa: — Fixed prefix indicating an OSA identifier.
{node-id} — The originating node. Options:
- DNS hostname (e.g., data.imperial.ac.uk): simple, enables direct resolution, but breaks if the domain changes
- DID (e.g., did:web:data.imperial.ac.uk): more persistent, adds complexity
- Opaque ID with registry lookup: most persistent, requires central infrastructure
{type} — Resource type: rec, dep, vocab, schema, val, tool.
{local-id} — Node-assigned identifier, opaque to clients.
@{version} — Optional version suffix for immutable snapshots.
#{fragment} — Optional fragment for sub-resources (e.g., vocabulary attributes).
Examples
urn:osa:data.imperial.ac.uk:rec:xyz789@v1
urn:osa:archive.embl.org:vocab:rnaseq@v2.1#mapped-reads-percent
urn:osa:did:web:data.imperial.ac.uk:dep:abc123
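To make the format concrete, here is a minimal parsing sketch. The `SRN` class and `parse_srn` function are hypothetical names, and the sketch assumes that type and local-id never contain `:`, and that local-id contains neither `@` nor `#`; none of this is settled by the proposal. Splitting from the right is a deliberate choice: it lets a node-id that itself contains colons (such as an embedded DID) parse unambiguously.

```python
from dataclasses import dataclass
from typing import Optional

PREFIX = "urn:osa:"
# Resource types listed in the Components section.
KNOWN_TYPES = {"rec", "dep", "vocab", "schema", "val", "tool"}


@dataclass(frozen=True)
class SRN:
    node_id: str
    resource_type: str
    local_id: str
    version: Optional[str] = None
    fragment: Optional[str] = None


def parse_srn(text: str) -> SRN:
    """Parse urn:osa:{node-id}:{type}:{local-id}[@{version}][#{fragment}]."""
    if not text.startswith(PREFIX):
        raise ValueError(f"not an OSA identifier: {text!r}")
    body = text[len(PREFIX):]

    # The fragment is everything after the first '#'.
    body, _, fragment = body.partition("#")

    # The version is everything after the last '@' (assumed absent elsewhere).
    version = None
    if "@" in body:
        body, version = body.rsplit("@", 1)
        if not version:
            raise ValueError(f"empty version in {text!r}")

    # Split from the right so a node-id containing ':' stays in one piece.
    parts = body.rsplit(":", 2)
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected node-id:type:local-id in {text!r}")
    node_id, resource_type, local_id = parts
    if resource_type not in KNOWN_TYPES:
        raise ValueError(f"unknown resource type {resource_type!r}")

    return SRN(node_id, resource_type, local_id, version, fragment or None)


if __name__ == "__main__":
    for example in (
        "urn:osa:data.imperial.ac.uk:rec:xyz789@v1",
        "urn:osa:archive.embl.org:vocab:rnaseq@v2.1#mapped-reads-percent",
        "urn:osa:did:web:data.imperial.ac.uk:dep:abc123",
    ):
        print(parse_srn(example))
```

Because the type is recovered by string handling alone, a client could validate or route an identifier without any network call. The third example only parses cleanly because of the right-to-left split, which suggests that the final specification would need to state explicitly which components may contain colons.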
Open Questions
Node identity: Should node-id be a DNS hostname, a DID, or something else? DNS is simple but fragile. DIDs add persistence but also complexity.
DID integration: If nodes have DIDs (via did:web or similar), should the SRN embed the full DID or just the hostname with an implied DID?
Registration: Should urn:osa be registered with IANA? This adds legitimacy but also bureaucracy.
Versioning syntax: Is @v1 the right format? Alternatives: /v1, ?version=1, separate field.
Migration: How should identifiers survive domain changes? Options include redirect protocols, DID-based persistence, or accepting breakage as rare.
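For the DID integration question, it may help to see how closely a hostname and a did:web identifier correspond. The sketch below follows the did:web method's published mapping from a DID to the URL of its DID document (colons become path separators, a port's colon is percent-encoded, and a bare host resolves under `/.well-known/`). The function names are hypothetical, and nothing here is an OSA-defined behavior.

```python
from urllib.parse import unquote

DID_WEB_PREFIX = "did:web:"


def did_web_for_host(host: str) -> str:
    """Build the did:web identifier implied by a hostname (optionally host:port)."""
    # did:web requires the port separator to be percent-encoded.
    return DID_WEB_PREFIX + host.replace(":", "%3A")


def did_web_document_url(did: str) -> str:
    """Map a did:web identifier to the HTTPS URL of its DID document."""
    if not did.startswith(DID_WEB_PREFIX):
        raise ValueError(f"not a did:web identifier: {did!r}")
    segments = [unquote(s) for s in did[len(DID_WEB_PREFIX):].split(":")]
    host, path = segments[0], segments[1:]
    if path:
        # Extra segments become a path; the document sits at its end.
        return f"https://{host}/{'/'.join(path)}/did.json"
    # A bare host resolves under the well-known directory.
    return f"https://{host}/.well-known/did.json"


if __name__ == "__main__":
    did = did_web_for_host("data.imperial.ac.uk")
    print(did)
    print(did_web_document_url(did))
```

Since the mapping is mechanical in both directions, embedding only the hostname and treating the did:web form as implied would lose no information. It would not, however, add persistence by itself, because did:web remains tied to the domain.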
Alternative: DID-Native Approach
Rather than inventing SRNs, we could use DIDs directly:
did:osa:data.imperial.ac.uk:rec:xyz789
This would require defining a did:osa method specifying:
- Identifier format
- Resolution process
- CRUD operations on DID Documents
Pros: Aligns with W3C standard, potential interop with Verifiable Credentials, existing DID tooling.
Cons: DIDs are designed for entities not resources, would be non-standard usage, more complex resolution.
Alternative: Minimal Approach
Use simple URLs with conventions:
https://data.imperial.ac.uk/osa/records/xyz789/v1
Rely on HTTP redirects for persistence. Accept that URLs may break.
Pros: Simplest to implement, no new concepts, universal tooling.
Cons: Fragile, no type information, conflates identity with location.
Next Steps
This OEP seeks feedback on:
- Requirements: Are the requirements complete and correctly prioritized?
- Existing schemes: Are there schemes we should consider that aren't listed?
- SRN proposal: Is this a reasonable starting point, or should we pursue a different direction?
- Node identity: What should node-id be? DNS hostname, DID, or hybrid?
- Migration: How important is surviving domain changes? What tradeoffs are acceptable?
Based on community input, a follow-up OEP will specify the chosen scheme in detail.