Files
What are Files in Open Register?
In Open Register, Files are binary data attachments that can be associated with objects. They extend the system beyond structured data to include documents, images, videos, and other file types that are essential for many applications.
Files in Open Register are:
- Securely stored and managed
- Associated with specific objects
- Versioned alongside their parent objects
- Accessible through a consistent API
- Integrated with Nextcloud's file management capabilities
Attaching Files to Objects
Files can be attached to objects in several ways:
- Integrated Uploads: Files can be uploaded directly within object POST/PUT operations using multipart/form-data, base64-encoded content, or URL references
- Schema-defined file properties: When a schema includes properties of type 'file', these are automatically handled during object creation or updates
- Direct API attachment: Files can be added to an object after creation using the file attachment API endpoints
- Base64 encoded content: Files can be included in object data as base64-encoded strings
- URL references: External files can be referenced by URL and will be downloaded and stored locally
Integrated File Uploads
OpenRegister supports integrated file uploads directly within object POST/PUT operations, providing a unified approach to handling structured data (objects) and unstructured data (files) together.
Upload Methods
1. Multipart/Form-Data Upload (Recommended)
Use Case: Uploading files from web forms or file inputs
Authentication Note: ⚠️ Multipart file uploads require session-based authentication or API tokens. HTTP Basic Authentication is not supported for multipart uploads due to Nextcloud security policies. For API testing with Basic Auth, use base64-encoded files instead (see method 2 below).
Example:
POST /index.php/apps/openregister/api/registers/documents/schemas/document/objects
Content-Type: multipart/form-data
title=Annual Report 2024
attachment=@report.pdf
thumbnail=@cover.jpg
JavaScript Example (with session cookies):
const formData = new FormData();
formData.append('title', 'Annual Report 2024');
formData.append('attachment', fileInput.files[0]);
formData.append('thumbnail', thumbnailInput.files[0]);
fetch('/index.php/apps/openregister/api/registers/documents/schemas/document/objects', {
method: 'POST',
body: formData,
credentials: 'include' // Important: includes session cookies
})
.then(response => response.json())
.then(data => console.log('Created:', data));
Why this is recommended:
- ✅ Most efficient: No encoding overhead, files transferred directly
- ✅ Preserves metadata: Original filename and MIME type are maintained
- ✅ No guessing: Extension and filename are exactly as uploaded
- ✅ Best file quality: No conversion or inference errors
- ✅ Low memory footprint: Can stream directly from disk to disk
- ✅ Fastest method: Direct transfer without intermediate conversions
Authentication Methods for Multipart Uploads:
- ✅ Session cookies (recommended for web applications)
- ✅ Nextcloud App Passwords (for external applications)
- ✅ OAuth2 tokens (for third-party integrations)
- ❌ HTTP Basic Auth (not supported due to Nextcloud security policies)
2. Base64-Encoded Files (Recommended for API Testing)
Use Case: Embedding files in JSON payloads, API integrations, testing with HTTP Basic Auth
Authentication: Works with all authentication methods including HTTP Basic Auth.
Data URI Format:
{
"title": "Screenshot",
"image": "data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAAUA..."
}
Plain Base64 Format:
{
"title": "Document",
"attachment": "JVBERi0xLjQKJeLjz9MKMyAwIG9iago8PC9MZW5ndGggMj..."
}
Note: Base64 encoding increases file size by approximately 33% and original filenames are lost. Use only for small files (< 100 KB) or when multipart is not possible.
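As a rough client-side sketch of the data URI format and the size overhead mentioned above, the helpers below wrap raw bytes in a data URI and compute the encoded length. The names `to_data_uri` and `encoded_size` are hypothetical and are not part of OpenRegister.

```python
import base64


def to_data_uri(content: bytes, mime_type: str) -> str:
    """Wrap raw bytes in a data URI so the MIME type travels with the payload."""
    encoded = base64.b64encode(content).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


def encoded_size(raw_size: int) -> int:
    """Length of the base64 text for raw_size bytes (4 characters per 3-byte group, padded)."""
    return 4 * ((raw_size + 2) // 3)


# Build a JSON-ready object body with an embedded file (illustrative values).
payload = {"title": "Screenshot", "image": to_data_uri(b"hi", "image/png")}
print(payload["image"])    # data:image/png;base64,aGk=
print(encoded_size(300))   # 400, i.e. one third larger than the 300 raw bytes
```

Using the data URI form rather than plain base64 keeps the MIME type explicit, which is the one piece of metadata this method can still carry.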
3. URL References
Use Case: Referencing remote files, importing from external sources
Example:
{
"title": "External Document",
"attachment": "https://example.com/files/document.pdf",
"logo": "https://cdn.example.com/images/logo.png"
}
Note: URL references are slower as the server must download the file from the external URL. Use only for trusted sources or migration scenarios.
4. Mixed Upload Methods
You can combine all three methods in a single request:
POST /index.php/apps/openregister/api/registers/documents/schemas/document/objects
Content-Type: multipart/form-data
title=Complete Package
mainDocument=@contract.pdf
signature=data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAAUA...
reference=https://example.com/terms.pdf
Array of Files
Files can be uploaded as arrays:
Schema:
{
"properties": {
"attachments": {
"type": "array",
"items": {
"type": "file"
}
}
}
}
Upload:
{
"title": "Multi-File Document",
"attachments": [
"data:application/pdf;base64,JVBERi0xLjQKJeL...",
"https://example.com/file2.pdf",
"data:image/png;base64,iVBORw0KGgo..."
]
}
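Because one array can mix data URIs, URLs, and plain base64, a client may want to check each value before sending it. The following is a minimal sketch of such a check; `classify_file_value` is a hypothetical helper, and the rules OpenRegister applies on the server may differ.

```python
import base64
import binascii


def classify_file_value(value: str) -> str:
    """Guess which upload method a string value represents."""
    if not value:
        return "unknown"
    if value.startswith("data:"):
        return "data-uri"
    if value.startswith(("http://", "https://")):
        return "url"
    try:
        # validate=True rejects characters outside the base64 alphabet
        base64.b64decode(value, validate=True)
    except (binascii.Error, ValueError):
        return "unknown"
    return "base64"


attachments = [
    "data:application/pdf;base64,JVBERi0xLjQK",
    "https://example.com/file2.pdf",
    "JVBERi0xLjQK",
]
print([classify_file_value(item) for item in attachments])
```

Short plain words can happen to be valid base64, so this check is a heuristic; the data URI form removes the ambiguity.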
Update Operations
File properties work the same way with PUT/PATCH operations:
PUT /index.php/apps/openregister/api/registers/documents/schemas/document/objects/abc-123
Content-Type: multipart/form-data
title=Updated Document
attachment=@new-version.pdf
Note: Updating a file property replaces the previous file.
Error Handling
Invalid MIME Type
{
"error": "File at attachment has invalid type 'application/zip'. Allowed types: application/pdf, application/msword"
}
File Too Large
{
"error": "File at attachment exceeds maximum size (10485760 bytes). File size: 15728640 bytes"
}
Upload Error
{
"error": "Failed to read uploaded file for field 'attachment'"
}
URL Download Failure
{
"error": "Unable to fetch file from URL: https://example.com/missing.pdf"
}
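A client can run the same kind of checks before uploading to avoid a round trip. The sketch below reproduces the MIME type and size messages shown above; `check_file` is a hypothetical helper, and the order of checks is an assumption rather than the server's documented behaviour.

```python
from typing import Optional, Sequence


def check_file(field: str, mime_type: str, size: int,
               allowed_types: Sequence[str], max_size: Optional[int]) -> Optional[str]:
    """Return an error message in the documented style, or None if the file passes."""
    if allowed_types and mime_type not in allowed_types:
        return (f"File at {field} has invalid type '{mime_type}'. "
                f"Allowed types: {', '.join(allowed_types)}")
    if max_size is not None and size > max_size:
        return (f"File at {field} exceeds maximum size ({max_size} bytes). "
                f"File size: {size} bytes")
    return None


allowed = ["application/pdf", "application/msword"]
print(check_file("attachment", "application/zip", 2048, allowed, 10485760))
print(check_file("attachment", "application/pdf", 15728640, allowed, 10485760))
```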
Backward Compatibility
✅ Existing file endpoints remain unchanged:
POST /api/objects/{register}/{schema}/{id}/files
GET /api/objects/{register}/{schema}/{id}/files
DELETE /api/objects/{register}/{schema}/{id}/files/{fileId}
Both approaches work and can be used interchangeably.
Performance Comparison
| Method | Speed | File Size | Metadata | Use Case |
|---|---|---|---|---|
| Multipart | Fastest | Original | Preserved | ✅ Recommended for all uploads |
| Base64 | Medium | +33% larger | Lost | ⚠️ Small files only (< 100 KB) |
| URL | Slowest | Original | Preserved | 🐌 External imports only |
Best Practices
- ✅ ALWAYS use Multipart for user uploads
  - Users expect filenames to be preserved
  - Prevents confusion about generic filenames
- ⚠️ Base64 only for APIs
  - When API client doesn't support multipart
  - Document that filenames will be lost
  - Always use data URI format with MIME type
- 🐌 URLs only for trusted sources
  - Use timeout limits (max 30 seconds)
  - Validate content-length headers upfront
  - Implement retry logic
- 📝 Document your choice
  - If using base64 or URL, explain why
  - Make users aware of trade-offs
- 🧪 Test performance
  - Measure upload times in production
  - Monitor failure rates for URL downloads
File Metadata and Tagging
Each file attachment includes rich metadata:
- Basic properties (name, size, type, extension)
- Creation and modification timestamps
- Access and download URLs
- Checksum for integrity verification
- Custom tags for categorization
Tagging System
Files can be tagged with both simple labels and key-value pairs:
- Tags with a colon (':') are treated as key-value pairs and can be used for advanced filtering and organization
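The colon convention can be illustrated with a small sketch that separates simple labels from key-value tags. `split_tags` is a hypothetical helper; splitting on the first colon only is an assumption.

```python
from typing import Dict, List, Tuple


def split_tags(tags: List[str]) -> Tuple[List[str], Dict[str, str]]:
    """Separate plain labels from 'key:value' tags (split on the first colon)."""
    labels: List[str] = []
    pairs: Dict[str, str] = {}
    for tag in tags:
        key, sep, value = tag.partition(":")
        if sep:
            pairs[key.strip()] = value.strip()
        else:
            labels.append(tag)
    return labels, pairs


labels, pairs = split_tags(["invoice", "department:finance", "year:2024"])
print(labels)  # ['invoice']
print(pairs)   # {'department': 'finance', 'year': '2024'}
```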
Version Control
The system maintains file versions by:
- Tracking file modifications with timestamps
- Preserving checksums to detect changes
- Integrating with the object audit trail system
- Supporting file restoration from previous versions
Security and Access Control
File attachments inherit the security model of their parent objects:
- Files are stored in Nextcloud with appropriate permissions
- Share links can be generated for controlled external access
- Access is managed through the OpenRegister user and group system
- Files are associated with the OpenRegister application user for consistent permissions
File Operations
The system supports the following operations on file attachments:
- Retrieving Files
- Updating Files
- Deleting Files
File Preview and Rendering
The system leverages Nextcloud's preview capabilities for supported file types:
- Images are displayed as thumbnails
- PDFs can be previewed in-browser
- Office documents can be viewed with compatible apps
- Preview URLs are generated for easy embedding
Integration with Object Lifecycle
File attachments are fully integrated with the object lifecycle:
- When objects are created, their file folders are automatically provisioned
- When objects are updated, file references are maintained
- When objects are deleted, associated files can be optionally preserved or removed
- File operations are recorded in the object's audit trail
Technical Implementation
The file attachment system is implemented through two main service classes:
- FileService: Handles low-level file operations, folder management, and Nextcloud integration
- ObjectService: Provides high-level methods for attaching, retrieving, and managing files in the context of objects
These services work together to provide a seamless file management experience within the OpenRegister application.
File Structure
| Field | Type | Description |
|---|---|---|
| id | integer | Unique identifier of the file in Nextcloud |
| uuid | string | Unique identifier for the file |
| filename | string | Name of the file |
| downloadUrl | string <uri> | Direct download URL for the file |
| shareUrl | string <uri> | URL to access the file via share link |
| accessUrl | string <uri> | URL to access the file |
| extension | string | File extension |
| checksum | string | ETag hash for file versioning |
| source | integer | Source identifier |
| userId | string | ID of the user who owns the file |
| base64 | string | Base64 encoded content of the file |
| filePath | string | Full path to the file in Nextcloud |
| created | string <date-time> | ISO 8601 timestamp when file was first shared |
| updated | string <date-time> | ISO 8601 timestamp of last modification |
{
"id": 123,
"uuid": "123e4567-e89b-12d3-a456-426614174000",
"filename": "profile.jpg",
"extension": "jpg",
"checksum": "abc123",
"source": 1,
"userId": "user-12345",
"base64": "base64encodedstring",
"filePath": "/files/profile.jpg",
"created": "2023-02-15T14:30:00Z",
"updated": "2023-05-20T10:15:00Z"
}
How Files are Stored
Open Register provides flexible storage options for files:
1. Nextcloud Storage
By default, files are stored in Nextcloud's file system, leveraging its robust file management capabilities, including:
- Access control
- Versioning
- Encryption
- Collaborative editing
2. External Storage
For larger deployments or specialized needs, files can be stored in:
- Object storage systems (S3, MinIO)
- Content delivery networks
- Specialized document management systems
3. Database Storage
Small files can be stored directly in the database for simplicity and performance.
File Features
1. Versioning
Files maintain version history, allowing you to:
- Track changes over time
- Revert to previous versions
- Compare different versions
2. Access Control
Files inherit access control from their parent objects, ensuring consistent security:
- Users who can access an object can access its files
- Additional file-specific permissions can be applied
- Permissions can be audited
3. Metadata
Files support rich metadata to provide context and improve searchability:
- Standard metadata (creation date, size, type)
- Custom metadata specific to your application
- Extracted metadata (e.g., EXIF data from images)
4. Preview Generation
Open Register can generate previews for common file types:
- Thumbnails for images
- PDF previews
- Document previews
5. Content Extraction
For supported file types, content can be extracted for indexing and search:
- Text extraction from documents
- OCR for scanned documents and images
- Metadata extraction
OpenRegister now includes enhanced text extraction with entity tracking (GDPR), language detection, and language level assessment. See Enhanced Text Extraction & GDPR Entity Tracking for details.
Asynchronous Processing: Text extraction happens in the background after file upload, ensuring:
- Fast uploads: Your file uploads complete instantly without waiting
- Non-blocking: Users don't experience delays during file operations
- Reliable: Background jobs automatically handle retries for failed extractions
- Resource-efficient: Processing happens when resources are available
Text Extraction Options:
OpenRegister supports two text extraction engines:
- LLPhant (Default) - PHP-based extraction:
  - ✓ Native support: TXT, MD, HTML, JSON, XML, CSV
  - ○ Library support: PDF, DOCX, DOC, XLSX, XLS (requires PhpOffice, PdfParser)
  - ⚠️ Limited: PPTX, ODT, RTF
  - ✗ No support: Image files (JPG, PNG, GIF, WebP)
  - Best for: Privacy-conscious environments, regular documents
  - Cost: Free (included)
- Dolphin AI - Advanced AI-powered extraction:
  - ✓ All document formats with superior quality
  - ✓ OCR for scanned documents and images (JPG, PNG, GIF, WebP)
  - ✓ Advanced table extraction
  - ✓ Formula recognition
  - ✓ Multi-language OCR
  - Best for: Complex documents, scanned materials, images with text
  - Cost: API subscription required
Extraction Scope Options:
- None: Text extraction disabled
- All files: Extract from all uploaded files
- Files in folders: Extract only from files in specific folders
- Files attached to objects: Extract only from files linked to objects (recommended)
Typical Processing Times:
- Text files: < 1 second
- PDFs (LLPhant): 2-10 seconds
- PDFs (Dolphin): 3-15 seconds
- Large documents or OCR: 10-60 seconds
- Images with OCR (Dolphin): 5-20 seconds
You can configure text extraction in Settings → File Configuration. Check extraction status in the file's metadata after upload.
Technical Implementation
Background Job Processing:
Text extraction uses Nextcloud's background job system for reliable, async processing:
- File Upload - User uploads a file
- Job Queuing - 'FileChangeListener' automatically queues 'FileTextExtractionJob'
- Job Execution - Background job system processes the file when resources are available
- Text Extraction - Selected extractor (LLPhant or Dolphin) processes the file
- Chunking - Text is automatically split into chunks with overlap (1000 chars per chunk, 200 char overlap)
- Storage - Extracted text and chunks stored in 'FileText' entity for reuse
- Completion - Status updated to 'completed' or 'failed'
Note: Text extraction is now fully independent of SOLR. Chunks are generated during extraction and stored in the database, making them reusable for SOLR indexing, vector embeddings, AI processing, or any other service that needs chunked text.
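The chunking step can be sketched as a sliding window. This is a minimal illustration of fixed-size chunks with overlap, using the documented 1000/200 character values as defaults; `chunk_text` is a hypothetical function, and the real extractor may split on different boundaries (for example words or sentences).

```python
from typing import Dict, List, Union

Chunk = Dict[str, Union[str, int]]


def chunk_text(text: str, size: int = 1000, overlap: int = 200) -> List[Chunk]:
    """Split text into chunks of `size` characters, each sharing `overlap` characters with the previous one."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("size must be positive and overlap must be in [0, size)")
    chunks: List[Chunk] = []
    step = size - overlap  # how far the window advances each time
    start = 0
    while start < len(text):
        end = min(start + size, len(text))
        chunks.append({"text": text[start:end], "start": start, "end": end})
        if end == len(text):
            break  # the last window reached the end of the text
        start += step
    return chunks


sample = "".join(chr(97 + i % 26) for i in range(2500))
for chunk in chunk_text(sample):
    print(chunk["start"], chunk["end"])
```

For a 2500-character input this yields windows starting at 0, 800, and 1600, so consecutive chunks share 200 characters. Storing the start and end offsets alongside the text is what lets other services map a chunk back to its position in the full extraction.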
File Type Compatibility Matrix:
LLPhant Support:
- ✓ Native (TXT, MD, HTML, JSON, XML, CSV) - Perfect quality, very fast
- ○ Library (PDF, DOCX, DOC, XLSX, XLS) - Good quality, medium speed
- ⚠️ Limited (PPTX, ODT, RTF) - Basic text only, use Dolphin for better results
- ✗ No Support (JPG, PNG, GIF, WebP) - Requires Dolphin with OCR
Dolphin AI Support:
- ✓ All formats with superior quality
- ✓ OCR for scanned documents and images
- ✓ Table extraction with structure preserved
- ✓ Formula recognition (LaTeX format)
- ✓ Multi-language support
- ✓ Layout understanding (multi-column, etc.)
OCR-Specific Use Cases (Dolphin only):
- Document Digitization - Scanning paper archives into searchable text
- Receipt Processing - Photo receipts from mobile devices
- Screenshot Analysis - Extract text from application screenshots
- Infographic Text - Extract text from images with embedded text
- Historical Documents - Digitize old scanned materials
Quality Requirements for OCR:
- Minimum: 150 DPI resolution
- Recommended: 300+ DPI
- Clear, high-contrast images
- Minimal blur or distortion
- Properly oriented (not rotated)
Extraction Configuration Options:
Configure in Settings → File Configuration:
- Text Extractor Selection:
  - LLPhant (default) - Local, free, privacy-friendly
  - Dolphin - Advanced AI, requires API key
- Extraction Scope:
  - None - Disabled
  - All files - Every uploaded file
  - Files in folders - Specific folders only
  - Files attached to objects - Only object attachments (recommended)
- Extraction Mode:
  - Background (default) - Async via background jobs
  - Immediate - Synchronous during upload (slower)
  - Manual - Triggered by admin action only
- Enabled File Types:
  - Select which file extensions to process
  - Different for LLPhant vs Dolphin
  - Enable OCR formats (images) only if using Dolphin
Integration Tests:
The file text extraction system includes comprehensive integration tests:
# Run file extraction tests
vendor/bin/phpunit tests/Integration/FileTextExtractionIntegrationTest.php
# Test cases covered:
# - File upload queues background job
# - Background job execution completes
# - Text extraction end-to-end with content verification
# - Multiple file format support (TXT, MD, JSON)
# - Extraction metadata recording (status, method, timestamps)
Monitoring Extraction:
Check extraction status via logs:
# Watch extraction progress
docker logs -f nextcloud-container | grep FileTextExtractionJob
# Check for errors
docker logs nextcloud-container | grep 'extraction failed'
# View extraction statistics
# Settings → File Configuration → Statistics section
Files Management Page
The Files page provides a centralized view of all files tracked in the text extraction system.
Accessing the Files Page:
Navigate to Files in the main menu to view all files with their extraction status.
Features:
- File List Table:
  - File name and path
  - File type and size
  - Extraction status (Pending, Processing, Completed, Failed)
  - Number of text chunks created
  - Last extraction timestamp
- Status Indicators:
  - 🟠 Pending: File discovered but not yet extracted
  - 🔵 Processing: Extraction in progress
  - 🟢 Completed: Successfully extracted
  - 🔴 Failed: Extraction error occurred
- File Actions:
  - Retry: Re-extract failed files
  - View Error: See detailed error message for failed extractions
- Pagination:
  - Browse through large file lists (50 files per page)
  - Navigate between pages
- Refresh:
  - Update the list to see latest extraction status
Use Cases:
- Monitor extraction progress across all files
- Identify and retry failed extractions
- View error details for troubleshooting
- Verify which files have been processed
Core File Extraction API:
OpenRegister provides dedicated API endpoints for file text extraction (moved from settings to core functionality):
- GET /api/files - List all tracked files with extraction status
- GET /api/files/{id} - Get single file extraction information
- POST /api/files/{id}/extract - Extract text from specific file
- POST /api/files/extract - Extract all pending files (batch processing)
- POST /api/files/retry-failed - Retry all failed extractions
- GET /api/files/stats - Get extraction statistics
Smart Re-Extraction:
The system automatically detects when files need re-extraction by comparing:
- File modification time ('mtime' from Nextcloud's 'oc_filecache')
- Last extraction time ('extractedAt' from 'oc_openregister_file_texts')
If 'mtime > extractedAt', the file is re-extracted to ensure content is up-to-date.
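The comparison can be written as a one-line predicate. In this sketch `needs_reextraction` is a hypothetical name, and treating a file with no recorded extraction time as needing extraction is an assumption.

```python
from datetime import datetime, timezone
from typing import Optional


def needs_reextraction(mtime: datetime, extracted_at: Optional[datetime]) -> bool:
    """True when the file changed after its last extraction (or was never extracted)."""
    if extracted_at is None:
        return True  # assumption: never-extracted files are always due
    return mtime > extracted_at


last_run = datetime(2024, 1, 15, 10, 0, tzinfo=timezone.utc)
edited = datetime(2024, 1, 16, 9, 30, tzinfo=timezone.utc)
print(needs_reextraction(edited, last_run))    # True: modified after extraction
print(needs_reextraction(last_run, last_run))  # False: nothing changed since
```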
File Tracking Table:
Extracted text and metadata are stored in 'oc_openregister_file_texts' with:
- 'file_id' - Links to Nextcloud's 'oc_filecache' table
- 'extraction_status' - pending, processing, completed, failed
- 'extractedAt' - Timestamp of last extraction
- 'text_content' - Full extracted text
- 'text_length' - Character count
- 'chunked' - Whether text has been chunked
- 'chunk_count' - Number of chunks created
- 'chunks_json' - JSON array of text chunks with offsets (new in v0.2.7)
- 'extraction_method' - LLPhant or Dolphin
- Plus SOLR indexing and vectorization tracking
Chunking Details: Each chunk in 'chunks_json' contains the chunk text, start offset, and end offset. This allows for precise text retrieval and consistent chunking across all services.
Working with Files
Uploading Files
Files can be uploaded and attached to objects:
POST /api/objects/{id}/files
Content-Type: multipart/form-data
file: [binary data]
metadata: {"author": "Legal Department", "securityLevel": "confidential"}
Retrieving Files
You can download a file:
GET /api/files/{id}
Or get file metadata:
GET /api/files/{id}/metadata
Listing Files for an Object
You can retrieve all files associated with an object:
GET /api/objects/{register}/{schema}/{id}/files
Pagination
The listing endpoint supports standard pagination parameters:
| Parameter | Default | Minimum | Maximum | Notes |
|---|---|---|---|---|
| _page | 1 | 1 | none | Page number |
| _limit | 30 | 1 | none | Values below 1 are clamped to 1; no upper ceiling, so _limit=5000 is honoured |
Per-object attachment counts are the natural bound for this endpoint, so there is no artificial cap on _limit. Call sites that want a fixed page size should set _limit explicitly.
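The defaults and clamping rules in the table can be sketched as follows. `parse_pagination` and `page_slice` are hypothetical helpers; clamping `_page` to 1 and falling back to the default on non-numeric input are assumptions, since the table only documents clamping for `_limit`.

```python
from typing import List, Mapping, Tuple


def parse_pagination(query: Mapping[str, str]) -> Tuple[int, int]:
    """Return (page, limit) using defaults 1 and 30, with a minimum of 1 and no upper cap."""
    def to_int(raw, default: int) -> int:
        try:
            return int(raw)
        except (TypeError, ValueError):
            return default  # assumption: missing or invalid values use the default

    page = max(1, to_int(query.get("_page"), 1))
    limit = max(1, to_int(query.get("_limit"), 30))
    return page, limit


def page_slice(items: List, page: int, limit: int) -> List:
    """Select the items that belong to the requested page."""
    start = (page - 1) * limit
    return items[start:start + limit]


print(parse_pagination({}))                  # (1, 30)
print(parse_pagination({"_limit": "0"}))     # (1, 1)
print(parse_pagination({"_limit": "5000"}))  # (1, 5000)
```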
Lock-aware response (authenticated callers)
When the request is made by an authenticated user, each file entry in the response carries two additional fields that expose Nextcloud lock state:
{
"id": 42,
"title": "report.pdf",
"locked": true,
"lock": {
"type": "user",
"scope": "exclusive",
"owner": "alice",
"createdAt": "2026-04-21T10:00:00+00:00",
"expiresAt": "2026-04-21T10:30:00+00:00"
}
}
| Field | Description |
|---|---|
| locked | true if the file has an active Nextcloud lock, false otherwise |
| lock.type | Lock type: one of "user", "app", "token" |
| lock.scope | WebDAV lock scope: one of "exclusive", "shared" |
| lock.owner | User id (for user/token types) or app id (for app type) |
| lock.createdAt | ISO 8601 timestamp of when the lock was acquired |
| lock.expiresAt | ISO 8601 timestamp of expiry, or null if the lock has no timeout |
When Nextcloud's lock provider is not available (the files_lock app is disabled), locked is false and lock is omitted.
Anonymous callers
When the request has no authenticated session (e.g. a public object fetched without credentials), both locked and lock are omitted from every entry. This prevents anonymous callers from observing who holds a lock or what apps are editing the file.
Locked-file resilience
A single Nextcloud-locked file does not fail the whole listing. If any file raises a LockedException during formatting, the endpoint returns HTTP 200 with a minimal envelope for that entry instead of failing the whole request:
{
"id": 42,
"title": "report.pdf",
"locked": true,
"lock": { "type": "user", "scope": "exclusive", "owner": "alice", "createdAt": "...", "expiresAt": null },
"error": "locked"
}
For anonymous callers, the stub carries only {id, title, error: "locked"} — no locked, no lock. A structured info-level log line is emitted server-side for each locked file encountered, so operators can still observe lock contention.
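On the client side, the three shapes described above (full lock info, no lock, and fields omitted for anonymous callers) can be handled in one place. The sketch below is a hypothetical consumer helper, `lock_summary`, and is not part of the OpenRegister API.

```python
from typing import Any, Dict


def lock_summary(entry: Dict[str, Any]) -> str:
    """Describe the lock state of one file entry from the listing response."""
    if "locked" not in entry:
        return "lock state hidden"  # anonymous callers get neither locked nor lock
    if not entry["locked"]:
        return "unlocked"
    lock = entry.get("lock") or {}
    owner = lock.get("owner", "unknown")
    expires = lock.get("expiresAt")
    until = f"until {expires}" if expires else "with no expiry"
    return f"locked by {owner} {until}"


entry = {
    "id": 42,
    "title": "report.pdf",
    "locked": True,
    "lock": {"type": "user", "scope": "exclusive", "owner": "alice",
             "createdAt": "2026-04-21T10:00:00+00:00",
             "expiresAt": "2026-04-21T10:30:00+00:00"},
}
print(lock_summary(entry))
print(lock_summary({"id": 42, "title": "report.pdf", "error": "locked"}))
```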
Updating Files
Files can be updated in two ways:
1. Update File Content
Upload a new version of the file:
PUT /api/objects/{register}/{schema}/{objectId}/files/{fileId}
Content-Type: application/json
{
"content": "[base64 encoded content or raw content]",
"tags": ["tag1", "tag2"]
}
2. Update Metadata Only
Update only the file metadata (tags) without changing content:
PUT /api/objects/{register}/{schema}/{objectId}/files/{fileId}
Content-Type: application/json
{
"tags": ["updated-tag1", "updated-tag2"]
}
Note: The 'content' parameter is optional. If omitted, only the metadata will be updated without modifying the file content itself.
Deleting Files
Files can be deleted when no longer needed:
DELETE /api/files/{id}
File Relationships
Files have important relationships with other core concepts:
Files and Objects
- Files are attached to objects
- An object can have multiple files
- Files inherit permissions from their parent object
- Files are versioned alongside their parent object
Files and Schemas
- Schemas can define expectations for file attachments
- File validation can be specified in schemas (allowed types, max size)
- Schemas can define required file attachments
Files and Registers
- Registers can be configured with different file storage options
- File storage policies can be defined at the register level
- Registers can have quotas for file storage
Use Cases
1. Document Management
Attach important documents to business objects:
- Contracts to customer records
- Invoices to order records
- Specifications to product records
2. Media Management
Store and manage media assets:
- Product images
- Marketing materials
- Training videos
3. Evidence Collection
Maintain evidence for regulatory or legal purposes:
- Compliance documentation
- Audit evidence
- Legal case files
4. Technical Documentation
Manage technical documents:
- User manuals
- Technical specifications
- Installation guides
Advanced File Features
1. Auto-Share Configuration
File properties can be configured to automatically share uploaded files publicly. This is useful for assets that need to be accessible without authentication, such as product images or public documents.
Configuration via UI
When editing a schema in the OpenRegister UI:
- Select a property with type 'file' or 'array' with items type 'file'
- In the property actions menu, expand the 'File Configuration' section
- Check the 'Auto-Share Files' checkbox
- Save the schema
Files uploaded to this property will now be automatically publicly shared.
Configuration via API
In your schema definition, add the 'autoPublish' option to file properties:
{
"properties": {
"productImage": {
"type": "file",
"autoPublish": true,
"allowedTypes": ["image/jpeg", "image/png"],
"maxSize": 5242880
}
}
}
When 'autoPublish' is set to 'true', files uploaded to this property will automatically:
- Create a public share link
- Set the 'published' timestamp
- Generate a public 'accessUrl' and 'downloadUrl'
Important: Property-Level vs Schema-Level autoPublish
⚠️ Don't confuse these two different 'autoPublish' settings:
1. Property-Level autoPublish (this section):
{
"properties": {
"productImage": {
"type": "file",
"autoPublish": true // ← Controls if FILES are published
}
}
}
Controls whether files uploaded to this specific property are automatically shared publicly.
2. Schema-Level autoPublish (different setting):
{
"configuration": {
"autoPublish": true // ← Controls if OBJECTS are published
}
}
Controls whether the object entity itself is published (has nothing to do with file sharing).
These are completely separate settings with different purposes. Setting one does NOT affect the other.
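To make the distinction concrete, the sketch below reads each setting from its own location in a schema dictionary. Both function names are hypothetical, and treating a missing key as false is an assumption.

```python
from typing import Any, Dict


def files_auto_published(schema: Dict[str, Any], property_name: str) -> bool:
    """Property-level setting: are files uploaded to this property shared publicly?"""
    prop = schema.get("properties", {}).get(property_name, {})
    return prop.get("autoPublish", False) is True


def objects_auto_published(schema: Dict[str, Any]) -> bool:
    """Schema-level setting: are the objects themselves published?"""
    return schema.get("configuration", {}).get("autoPublish", False) is True


schema = {
    "properties": {"productImage": {"type": "file", "autoPublish": True}},
    "configuration": {},
}
print(files_auto_published(schema, "productImage"))  # True
print(objects_auto_published(schema))                # False: the settings are independent
```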
Example Response
{
"id": "12345",
"title": "Product A",
"productImage": {
"id": 789,
"title": "product-a.jpg",
"accessUrl": "https://your-domain.com/index.php/s/AbCdEfG123",
"downloadUrl": "https://your-domain.com/index.php/s/AbCdEfG123/download",
"published": "2024-01-15T10:30:00+00:00",
"size": 245678,
"type": "image/jpeg"
}
}
2. Authenticated File Access
Files that are not publicly shared still have 'accessUrl' and 'downloadUrl' properties, but these URLs require authentication. This allows frontend applications to:
- Display file previews for logged-in users
- Provide download links that work within authenticated sessions
- Maintain security while offering convenient access
Authenticated URLs
Non-shared files return URLs with the following format:
- Access URL: /index.php/core/preview?fileId={fileId}&x=1920&y=1080&a=1
- Download URL: /index.php/apps/openregister/api/files/{fileId}/download
These URLs require the user to be authenticated to Nextcloud.
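A frontend that only knows a file id can rebuild both URLs from the patterns above. This is a minimal sketch; `authenticated_urls` is a hypothetical helper, and the base URL is a placeholder.

```python
from typing import Dict
from urllib.parse import urlencode


def authenticated_urls(base_url: str, file_id: int) -> Dict[str, str]:
    """Build the preview (access) and download URLs for a non-shared file."""
    base = base_url.rstrip("/")
    # Preview parameters copied from the documented URL pattern
    query = urlencode({"fileId": file_id, "x": 1920, "y": 1080, "a": 1})
    return {
        "accessUrl": f"{base}/index.php/core/preview?{query}",
        "downloadUrl": f"{base}/index.php/apps/openregister/api/files/{file_id}/download",
    }


urls = authenticated_urls("https://your-domain.com", 456)
print(urls["accessUrl"])
print(urls["downloadUrl"])
```

Requests to these URLs still need the user's Nextcloud session or credentials; building the URL does not grant access.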
Example Response (Non-Shared File)
{
"attachment": {
"id": 456,
"title": "confidential-report.pdf",
"accessUrl": "https://your-domain.com/index.php/core/preview?fileId=456&x=1920&y=1080&a=1",
"downloadUrl": "https://your-domain.com/index.php/apps/openregister/api/files/456/download",
"published": null,
"size": 1234567,
"type": "application/pdf"
}
}
3. Logo/Image Metadata from File Properties
When a schema is configured to extract metadata fields like 'image' or 'logo' from file properties, the system automatically extracts the public share URL (or authenticated URL if not shared) and stores it in the object metadata.
Configuration
{
"properties": {
"logo": {
"type": "file",
"allowedTypes": ["image/png", "image/jpeg"],
"autoPublish": true
}
},
"configuration": {
"objectImageField": "logo"
}
}
Result
The object's '@self.image' field will contain the share URL:
{
"id": "12345",
"title": "Company A",
"logo": {
"id": 789,
"accessUrl": "https://your-domain.com/index.php/s/XyZ789",
"type": "image/png"
},
"@self": {
"name": "Company A",
"image": "https://your-domain.com/index.php/s/XyZ789"
}
}
This makes it easy to display company logos, product images, or other visual metadata in listings and search results.
4. File Deletion via API
Files can be deleted by setting the file property to 'null' (for single file properties) or an empty array (for array file properties).
Single File Deletion
PUT /api/objects/{register}/{schema}/{id}
Content-Type: application/json
{
"title": "Updated Title",
"attachment": null
}
This will:
- Delete the file from Nextcloud storage
- Remove the file record from the database
- Set the 'attachment' property to 'null' in the object data
File Array Deletion
PUT /api/objects/{register}/{schema}/{id}
Content-Type: application/json
{
"title": "Updated Gallery",
"images": []
}
This will:
- Delete all files in the array from Nextcloud storage
- Remove all file records from the database
- Set the 'images' property to an empty array in the object data
Use Cases
- Privacy Compliance: Remove sensitive files upon user request
- Storage Management: Clean up unused files
- Data Lifecycle: Remove temporary or expired files
- Error Correction: Remove incorrectly uploaded files
Executable File Blocking
OpenRegister automatically blocks executable files from being uploaded for security reasons. This prevents malicious code execution and protects your Nextcloud instance.
What is Blocked
Blocked File Types
Windows Executables
- .exe, .bat, .cmd, .com, .msi, .scr
- .vbs, .vbe, .js, .jse, .wsf, .wsh
- .ps1 (PowerShell), .dll
Unix/Linux Executables
- .sh, .bash, .csh, .ksh, .zsh
- .run, .bin, .app
- .deb, .rpm (package files)
Scripts & Code
- .php, .phtml, .php3, .php4, .php5, .phps, .phar
- .py, .pyc, .pyo, .pyw (Python)
- .pl, .pm, .cgi (Perl)
- .rb, .rbw (Ruby)
- .jar, .war, .ear, .class (Java)
Containers & Packages
- .appimage, .snap, .flatpak
- .dmg, .pkg, .command (macOS)
- .apk (Android)
Binary Formats
- .elf, .out, .o, .so, .dylib
Detection Methods
OpenRegister uses multiple layers of detection:
1. File Extension Check
Checks the file extension against a blacklist of dangerous extensions.
2. Magic Bytes Detection
Checks the first bytes of the file content for executable signatures:
- MZ - Windows PE/EXE files
- \x7FELF - Linux/Unix ELF executables
- #!/bin/sh - Shell scripts
- #!/bin/bash - Bash scripts
- <?php - PHP scripts
- \xCA\xFE\xBA\xBE - Java class files
3. MIME Type Validation
Blocks dangerous MIME types:
- application/x-executable
- application/x-dosexec
- application/x-msdownload
- application/x-sh
- application/x-php
- text/x-shellscript
- And more...
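The three layers can be sketched as a single check that reports which layer matched. This is an illustration only: `executable_reason` is a hypothetical function, the extension set is a small subset of the lists above, and matching signatures only at the very start of the content is a simplifying assumption about how OpenRegister inspects the file.

```python
import os
from typing import Optional

# Small illustrative subsets of the documented lists
BLOCKED_EXTENSIONS = {".exe", ".bat", ".dll", ".sh", ".php", ".py", ".jar"}
MAGIC_PREFIXES = {
    b"MZ": "Windows PE/EXE",
    b"\x7fELF": "ELF executable",
    b"#!/bin/sh": "shell script",
    b"#!/bin/bash": "Bash script",
    b"<?php": "PHP script",
    b"\xca\xfe\xba\xbe": "Java class file",
}
BLOCKED_MIME_TYPES = {
    "application/x-executable", "application/x-dosexec", "application/x-msdownload",
    "application/x-sh", "application/x-php", "text/x-shellscript",
}


def executable_reason(filename: str, head: bytes, mime_type: str) -> Optional[str]:
    """Return why a file looks executable, or None if no layer matches."""
    extension = os.path.splitext(filename)[1].lower()
    if extension in BLOCKED_EXTENSIONS:
        return f"extension {extension}"
    for magic, label in MAGIC_PREFIXES.items():
        if head.startswith(magic):
            return f"magic bytes: {label}"
    if mime_type in BLOCKED_MIME_TYPES:
        return f"MIME type {mime_type}"
    return None


print(executable_reason("document.txt", b"MZ\x90\x00", "text/plain"))
print(executable_reason("report.pdf", b"%PDF-1.4", "application/pdf"))
```

Checking content as well as the name is what catches a renamed executable: the first call above is flagged by its magic bytes even though the extension is harmless.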
Default Behavior: Blocked
By default, ALL executable files are blocked.
POST /api/registers/docs/schemas/document/objects
{
"title": "My Document",
"attachment": "script.sh" // ❌ BLOCKED!
}
Response:
{
"error": "File at attachment is an executable file (.sh). Executable files are blocked for security reasons. Allowed formats: documents, images, archives, data files."
}
Explicit Allow (Not Recommended)
If you absolutely need to allow executables (e.g., for a software repository), you can set allowExecutables: true in your schema:
{
"properties": {
"softwarePackage": {
"type": "file",
"allowExecutables": true, // ⚠️ DANGEROUS!
"allowedTypes": ["application/x-deb"] // Still enforce MIME type
}
}
}
⚠️ WARNING: Only use allowExecutables: true if:
- You absolutely trust the source of files
- Users are administrators only
- You have other security measures in place (virus scanning, sandboxing)
- You understand the security risks
Examples
✅ Safe Files (Allowed by Default)
# Documents
curl -X POST '/api/registers/docs/schemas/document/objects' \
-F 'title=Report' \
-F 'attachment=@report.pdf' # ✅ OK
# Images
curl -X POST '/api/registers/docs/schemas/document/objects' \
-F 'title=Photo' \
-F 'image=@photo.jpg' # ✅ OK
# Archives
curl -X POST '/api/registers/docs/schemas/document/objects' \
-F 'title=Data' \
-F 'data=@archive.zip' # ✅ OK (ZIPs are allowed unless they're JARs)
❌ Blocked Files (Default)
# Windows executable
curl -X POST '/api/registers/docs/schemas/document/objects' \
-F 'title=Software' \
-F 'file=@program.exe' # ❌ BLOCKED
# Shell script
curl -X POST '/api/registers/docs/schemas/document/objects' \
-F 'title=Script' \
-F 'file=@setup.sh' # ❌ BLOCKED
# PHP script
curl -X POST '/api/registers/docs/schemas/document/objects' \
-F 'title=Code' \
-F 'file=@index.php' # ❌ BLOCKED
🎭 Bypassing Attempts (Also Blocked!)
OpenRegister detects renamed executables:
# Renamed EXE to TXT - Still blocked by magic bytes!
mv malware.exe document.txt
curl -X POST '/api/.../' -F 'file=@document.txt' # ❌ BLOCKED (magic bytes: MZ)
# PHP file renamed to JPG - Still blocked!
mv shell.php image.jpg
curl -X POST '/api/.../' -F 'file=@image.jpg' # ❌ BLOCKED (detects <?php)
Schema Configuration
Basic File Upload (Executables Blocked)
{
"slug": "document",
"properties": {
"title": {
"type": "string"
},
"attachment": {
"type": "file",
"allowedTypes": ["application/pdf", "application/msword"],
"maxSize": 10485760 // 10MB
// allowExecutables defaults to false
}
}
}
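To make the effect of these property options concrete, here is a sketch of how a file could be checked against them. It assumes the semantics described in this section (allowedTypes as a MIME allowlist, maxSize in bytes, allowExecutables defaulting to false). The function validateFileAgainstProperty is hypothetical and not part of the OpenRegister API.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical validator: returns a list of problems, empty when the file passes.
 */
function validateFileAgainstProperty(
    array $property,
    string $mimeType,
    int $sizeInBytes,
    bool $isExecutable
): array {
    $errors = [];

    // An empty or missing allowedTypes list is treated as "no MIME restriction" (assumption).
    $allowedTypes = $property['allowedTypes'] ?? [];
    if ($allowedTypes !== [] && !in_array($mimeType, $allowedTypes, true)) {
        $errors[] = "MIME type {$mimeType} is not in allowedTypes";
    }

    if (isset($property['maxSize']) && $sizeInBytes > $property['maxSize']) {
        $errors[] = "File size {$sizeInBytes} exceeds maxSize {$property['maxSize']}";
    }

    // allowExecutables defaults to false when the key is absent.
    if ($isExecutable && !($property['allowExecutables'] ?? false)) {
        $errors[] = 'Executable files are blocked';
    }

    return $errors;
}

$attachment = [
    'type'         => 'file',
    'allowedTypes' => ['application/pdf', 'application/msword'],
    'maxSize'      => 10 * 1024 * 1024, // 10485760 bytes
];

print_r(validateFileAgainstProperty($attachment, 'application/pdf', 2048, false));
```

Returning every problem at once, rather than stopping at the first, lets an API response list all issues for a rejected upload.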
Security Recommendations
1. Keep Executables Blocked (Default)
DO:
- ✅ Use the default behavior (block executables)
- ✅ Only allow documents, images, archives
- ✅ Combine with virus scanning (ClamAV)
DON'T:
- ❌ Set allowExecutables: true unless absolutely necessary
- ❌ Allow untrusted users to upload files to executable-allowed schemas
- ❌ Assume file extensions are safe
2. Layer Your Security
Even with executable blocking, use additional security measures such as virus scanning (for example ClamAV) and sandboxing.
3. Monitor and Log
All blocked uploads are logged:
# Check logs for blocked attempts
docker logs master-nextcloud-1 | grep "Executable file upload blocked"
Performance Impact
Minimal!
- Extension check: < 0.1ms
- Magic bytes check: < 1ms (only checks first 1KB)
- MIME type check: < 0.1ms
Total overhead: ~1-2ms per file upload
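The magic bytes check stays cheap because only the start of the file is read. A sketch of reading just the first 1KB follows. The helper name readFileHeader is hypothetical, and the 1024-byte default mirrors the figure above.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: reads at most $length bytes from the start of a file.
 */
function readFileHeader(string $path, int $length = 1024): string
{
    $handle = fopen($path, 'rb');
    if ($handle === false) {
        throw new RuntimeException("Cannot open {$path}");
    }

    try {
        $header = fread($handle, $length);
    } finally {
        fclose($handle); // Always release the handle, even if fread() throws.
    }

    return $header === false ? '' : $header;
}

// Demonstration with a temporary file larger than the header window.
$path = tempnam(sys_get_temp_dir(), 'hdr');
file_put_contents($path, 'MZ' . str_repeat("\0", 5000));
$header = readFileHeader($path);
echo strlen($header), "\n"; // 1024
unlink($path);
```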
Frequently Asked Questions
Q: Can I upload ZIP files?
A: ✅ Yes! ZIP files (.zip) are allowed by default. Only executable ZIPs like JARs are blocked.
Q: What about JavaScript files (.js)?
A: ❌ Blocked by default (can be executed in browsers). Use JSON or TXT for data.
Q: Can I upload Python notebooks (.ipynb)?
A: ✅ Yes! .ipynb is JSON format, not an executable. Allowed by default.
Q: What if I need to share code files?
A: Use:
- Text files (.txt) with code inside
- Archives (.zip or .tar.gz) containing code
- Git repositories
- Dedicated code hosting (GitHub, GitLab)
Q: Does this protect against all malware?
A: No! This blocks known executable formats only. Malicious documents (PDFs with exploits, Office macros) still need virus scanning, for example with ClamAV.
Best Practices
- Define File Types: Establish clear guidelines for what file types are allowed
- Set Size Limits: Define appropriate size limits for different file types
- Use Metadata: Add relevant metadata to improve searchability and context
- Consider Storage: Choose appropriate storage backends based on file types and access patterns
- Implement Retention Policies: Define how long files should be kept
- Plan for Backup: Ensure files are included in backup strategies
- Consider Performance: Optimize file storage for your access patterns
- Use Auto-Publish Wisely: Only enable property-level 'autoPublish' for files that should be publicly accessible. Remember: property 'autoPublish' (file sharing) is different from schema 'autoPublish' (object publishing)
- Document File Deletion: Maintain audit trails when files are deleted for compliance
- Handle Authentication: Use authenticated URLs for sensitive files
- Keep Executables Blocked: Use the default executable blocking behavior unless absolutely necessary
- Layer Security: Combine executable blocking with virus scanning for complete protection
Conclusion
Files in Open Register bridge the gap between structured data and unstructured content, providing a comprehensive solution for managing all types of information in your application. With advanced features like auto-sharing, authenticated access, metadata extraction, and flexible deletion options, Open Register creates a unified system where all your data—structured and unstructured—works together seamlessly.
Technical Architecture
This section provides detailed visualizations of the file handling system's architecture and data flow.
File Upload and Processing Flow
File Property Processing Pipeline
Text Extraction Process
File Storage Architecture
File Type Compatibility Matrix
File Text Extraction Settings
File Chunking for Solr
Note: As of v0.2.7, chunking happens during text extraction, not during Solr indexing. Chunks are stored in the database and reused.
Performance Characteristics
File Upload Performance:
Small files (<1MB): ~100-200ms
Medium files (1-10MB): ~500ms-2s
Large files (10-100MB): ~2-10s
Very large files (>100MB): ~10-60s
Text Extraction Performance:
LLPhant:
- TXT/MD/HTML: <1s (instant)
- PDF (10 pages): 2-5s (library parsing)
- DOCX: 3-8s (library parsing)
- Images: N/A (not supported)
Dolphin AI:
- TXT/MD/HTML: 1-2s (API latency)
- PDF (10 pages): 5-10s (AI processing)
- DOCX: 4-8s (AI processing)
- Images (OCR): 5-15s (OCR + AI)
Chunking and Indexing:
Text chunking: <100ms for 100KB text (now part of extraction)
Solr indexing: ~50-200ms per document (reads pre-chunked data)
Batch indexing: ~500ms for 100 chunks (faster with pre-chunked data)
Note: Since v0.2.7, chunking is performed once during text extraction and the chunks are stored in the database. This makes Solr indexing faster and allows the same chunks to be reused for vector embeddings, AI processing, or any other service that needs chunked text.
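The chunking strategy itself is not described here. As an illustration only, the following sketch assumes word-based chunks with a small overlap. The function chunkText and its default sizes are hypothetical and may differ from what OpenRegister does.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical chunker: splits text into chunks of at most $chunkSize words,
 * repeating the last $overlap words of each chunk at the start of the next.
 */
function chunkText(string $text, int $chunkSize = 200, int $overlap = 20): array
{
    if ($chunkSize < 1 || $overlap < 0 || $overlap >= $chunkSize) {
        throw new InvalidArgumentException('Require chunkSize >= 1 and 0 <= overlap < chunkSize');
    }

    // preg_split() returns false on invalid UTF-8; fall back to an empty list.
    $words = preg_split('/\s+/u', trim($text), -1, PREG_SPLIT_NO_EMPTY) ?: [];
    $step = $chunkSize - $overlap;
    $chunks = [];

    for ($start = 0; $start < count($words); $start += $step) {
        $chunks[] = implode(' ', array_slice($words, $start, $chunkSize));
        if ($start + $chunkSize >= count($words)) {
            break; // The current chunk already reaches the end of the text.
        }
    }

    return $chunks;
}

// Chunk once, then hand the same array to indexing, embeddings, and so on.
$chunks = chunkText('a b c d e f g', 3, 1);
print_r($chunks); // three chunks: "a b c", "c d e", "e f g"
```

Computing the chunks once and storing them means every later consumer sees identical chunk boundaries.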
Code Examples
Processing File Upload
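use OCA\OpenRegister\Service\FileService;
// $fileService is assumed to be an injected FileService instance and
// $object the ObjectEntity the file is attached to.
// Create file from base64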
$fileMetadata = $fileService->createFile(
objectEntity: $object,
fileData: [
'content' => 'data:image/jpeg;base64,/9j/4AAQ...',
'tags' => ['profile', 'avatar']
]
);
// Create file from URL
$fileMetadata = $fileService->createFile(
objectEntity: $object,
fileData: [
'url' => 'https://example.com/document.pdf',
'tags' => ['imported', 'external']
]
);
// Access file metadata
$fileId = $fileMetadata['id'];
$shareLinkUrl = $fileMetadata['accessUrl'];
$downloadUrl = $fileMetadata['downloadUrl'];
Text Extraction
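use OCA\OpenRegister\Service\FileTextExtractionService;
// $extractionService is assumed to be an injected FileTextExtractionService and
// $fileTextMapper the mapper that stores extracted text records.
// Extract text from file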
$extractionService->extractText($fileId);
// Get extraction status
$fileText = $fileTextMapper->findByFileId($fileId);
$status = $fileText->getExtractionStatus(); // 'pending', 'processing', 'completed', 'failed'
$text = $fileText->getTextContent();
// Manually trigger extraction
$extractionService->queueExtraction($fileId);
Searching File Content
// Search across file content in Solr
$results = $solrService->searchFiles([
'_search' => 'contract terms',
'mime_type' => 'application/pdf',
'_limit' => 20
]);
// Access chunk results
foreach ($results['hits'] as $hit) {
$fileId = $hit['file_id'];
$chunkIndex = $hit['chunk_index'];
$text = $hit['chunk_text'];
$highlighted = $hit['highlighted_text'];
}
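Because results are returned per chunk, a single file can appear several times. The sketch below groups hits by file and orders them by chunk index. It relies only on the file_id and chunk_index keys shown above. The function groupHitsByFile and the sample data are hypothetical.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: groups chunk-level hits by file_id and sorts each
 * file's hits by chunk_index.
 */
function groupHitsByFile(array $hits): array
{
    $grouped = [];
    foreach ($hits as $hit) {
        $grouped[$hit['file_id']][] = $hit;
    }

    foreach ($grouped as &$fileHits) {
        usort($fileHits, fn (array $a, array $b): int => $a['chunk_index'] <=> $b['chunk_index']);
    }
    unset($fileHits); // Break the reference left over from the loop.

    return $grouped;
}

// Made-up sample in the shape of $results['hits'].
$sampleHits = [
    ['file_id' => 12, 'chunk_index' => 3, 'chunk_text' => 'payment terms'],
    ['file_id' => 7,  'chunk_index' => 1, 'chunk_text' => 'contract scope'],
    ['file_id' => 12, 'chunk_index' => 0, 'chunk_text' => 'contract title'],
];

foreach (groupHitsByFile($sampleHits) as $fileId => $fileHits) {
    echo $fileId, ': ', count($fileHits), " matching chunk(s)\n";
}
```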
Testing
# Run file handling tests
vendor/bin/phpunit tests/Service/FileServiceTest.php
# Test text extraction
vendor/bin/phpunit tests/Service/FileTextExtractionServiceTest.php
# Test specific scenarios
vendor/bin/phpunit --filter testBase64FileUpload
vendor/bin/phpunit --filter testTextExtraction
vendor/bin/phpunit --filter testFileChunking
# Integration tests
vendor/bin/phpunit tests/Integration/FileIntegrationTest.php
Test Coverage:
- File upload (base64, URL, file object)
- File property processing
- Text extraction (LLPhant, Dolphin)
- Chunking and Solr indexing
- File deletion
- Share link generation
- Auto-tagging