Vectorization Architecture Refactoring
Date: 2024-12-15
Status: ✅ Completed
Summary
The vectorization infrastructure has been refactored to consolidate all vector-related operations under a single namespace, with VectorizationService acting as the public API facade.
Changes
1. Namespace Consolidation
The vectorization services are organized as follows:
lib/Service/
├── VectorizationService.php (Public API - Facade)
└── Vectorization/
    ├── VectorEmbeddingService.php (Internal - Handler)
    └── Strategies/
        ├── VectorizationStrategyInterface.php
        ├── ObjectVectorizationStrategy.php
        └── FileVectorizationStrategy.php
Before:
use OCA\OpenRegister\Service\VectorEmbeddingService;
use OCA\OpenRegister\Service\VectorizationService;
use OCA\OpenRegister\Service\Vectorization\ObjectVectorizationStrategy;
After:
use OCA\OpenRegister\Service\VectorizationService;
use OCA\OpenRegister\Service\Vectorization\VectorEmbeddingService;
use OCA\OpenRegister\Service\Vectorization\Strategies\ObjectVectorizationStrategy;
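The Strategies/ directory suggests a strategy pattern: one class per entity type behind a shared interface. The following is a minimal sketch of that idea, not the actual contract. The method names getEntityType() and extractText() and the array-shaped entities are assumptions made for illustration; the real VectorizationStrategyInterface may declare different methods.

```php
<?php
declare(strict_types=1);

// Hypothetical contract: the real VectorizationStrategyInterface may differ.
interface VectorizationStrategyInterface
{
    public function getEntityType(): string;

    public function extractText(array $entity): string;
}

final class ObjectVectorizationStrategy implements VectorizationStrategyInterface
{
    public function getEntityType(): string
    {
        return 'object';
    }

    public function extractText(array $entity): string
    {
        // Flatten scalar fields into "key: value" lines; nested values are skipped.
        $lines = [];
        foreach ($entity as $key => $value) {
            if (is_scalar($value)) {
                $lines[] = $key . ': ' . $value;
            }
        }

        return implode("\n", $lines);
    }
}

final class FileVectorizationStrategy implements VectorizationStrategyInterface
{
    public function getEntityType(): string
    {
        return 'file';
    }

    public function extractText(array $entity): string
    {
        // Assumes the file text has already been extracted into a 'content' key.
        return (string) ($entity['content'] ?? '');
    }
}

// Index the strategies by entity type, as a registry might do.
$strategies = [];
foreach ([new ObjectVectorizationStrategy(), new FileVectorizationStrategy()] as $strategy) {
    $strategies[$strategy->getEntityType()] = $strategy;
}

$text = $strategies['object']->extractText(['title' => 'Invoice', 'year' => 2024, 'tags' => ['a']]);
echo $text, PHP_EOL;
```

Keeping entity-specific logic in strategies means a new entity type can be supported by adding a class rather than editing the embedding handler.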
2. Public API Facade
VectorizationService is now the single entry point for all vector operations:
- Other services should call VectorizationService methods
- VectorEmbeddingService is an internal implementation detail
- All public methods from VectorEmbeddingService are exposed via VectorizationService
Example:
// Before (❌ DON'T)
$vectorEmbeddingService->semanticSearch($query, $limit);
// After (✅ DO)
$vectorizationService->semanticSearch($query, $limit);
3. Facade Methods
VectorizationService exposes these public methods:
- vectorizeBatch() - Batch vectorization of entities
- registerStrategy() - Register entity strategies
- generateEmbedding() - Generate single embedding
- semanticSearch() - Perform semantic search
- hybridSearch() - Perform hybrid search (SOLR + vectors)
- getVectorStats() - Get vector statistics
- testEmbedding() - Test embedding configuration
- checkEmbeddingModelMismatch() - Check model consistency
- clearAllEmbeddings() - Clear all embeddings
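How a facade method forwards to the internal handler can be sketched as follows. This is an illustration under stated assumptions, not the real implementation: both classes are stubs defined inline, the semanticSearch() signature and the result shape (id and score keys) are guesses, and the search results are hard-coded placeholders.

```php
<?php
declare(strict_types=1);

// Stub for the internal handler. The real class lives in
// OCA\OpenRegister\Service\Vectorization and its signatures may differ.
final class VectorEmbeddingService
{
    public function semanticSearch(string $query, int $limit = 10): array
    {
        // Placeholder data: a real handler would embed the query and rank stored vectors.
        $hits = [
            ['id' => 'doc-1', 'score' => 0.92],
            ['id' => 'doc-2', 'score' => 0.81],
            ['id' => 'doc-3', 'score' => 0.77],
        ];

        return array_slice($hits, 0, $limit);
    }
}

// Stub for the facade: consumers depend on this class only.
final class VectorizationService
{
    public function __construct(private VectorEmbeddingService $embeddingService)
    {
    }

    public function semanticSearch(string $query, int $limit = 10): array
    {
        // Pure delegation: the facade adds no logic of its own here.
        return $this->embeddingService->semanticSearch($query, $limit);
    }
}

$vectorizationService = new VectorizationService(new VectorEmbeddingService());
$results = $vectorizationService->semanticSearch('invoice 2024', 2);
echo count($results), PHP_EOL;
```

Because the facade method mirrors the handler's signature, callers can switch from one class to the other without changing their arguments.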
Benefits
1. Single Entry Point
All vector operations go through one service, which gives consumers a clear API boundary.
2. Encapsulation
VectorEmbeddingService is a private implementation detail, so it can be swapped or refactored without affecting consumers.
3. Better Organization
All vectorization code lives in one namespace, which makes it easier to navigate.
4. Future-Proof
New embedding providers or a different storage backend can be added without breaking the public API.
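The encapsulation and future-proofing benefits can be illustrated with a small sketch in which the provider behind the handler changes while consumer code stays the same. Everything here is hypothetical: EmbeddingProviderInterface, both toy providers, and the constructor wiring are invented for this example and do not describe how OpenRegister actually manages providers.

```php
<?php
declare(strict_types=1);

// Hypothetical provider contract, invented for this sketch.
interface EmbeddingProviderInterface
{
    public function embed(string $text): array;
}

// Toy providers that return one-element vectors without calling any model.
final class LengthProvider implements EmbeddingProviderInterface
{
    public function embed(string $text): array
    {
        return [(float) strlen($text)];
    }
}

final class WordCountProvider implements EmbeddingProviderInterface
{
    public function embed(string $text): array
    {
        return [(float) str_word_count($text)];
    }
}

// Stand-in for the internal handler: the only class that knows about providers.
final class VectorEmbeddingService
{
    public function __construct(private EmbeddingProviderInterface $provider)
    {
    }

    public function generateEmbedding(string $text): array
    {
        return $this->provider->embed($text);
    }
}

// Stand-in for the facade: its public surface does not mention providers.
final class VectorizationService
{
    public function __construct(private VectorEmbeddingService $embeddingService)
    {
    }

    public function generateEmbedding(string $text): array
    {
        return $this->embeddingService->generateEmbedding($text);
    }
}

// Consumer code: identical regardless of which provider is wired in.
function embedTitle(VectorizationService $service): array
{
    return $service->generateEmbedding('hello vector world');
}

$byLength = embedTitle(new VectorizationService(new VectorEmbeddingService(new LengthProvider())));
$byWords = embedTitle(new VectorizationService(new VectorEmbeddingService(new WordCountProvider())));
```

Only the wiring at the bottom changes between the two calls; embedTitle() never sees the provider type.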
Migration Guide
For Service Developers
If your service uses VectorEmbeddingService, update it:
- Update imports:
// Old
use OCA\OpenRegister\Service\VectorEmbeddingService;
// New
use OCA\OpenRegister\Service\VectorizationService;
- Update constructor:
// Old
public function __construct(
    private VectorEmbeddingService $vectorService
) {}
// New
public function __construct(
    private VectorizationService $vectorizationService
) {}
- Update method calls: All methods remain the same, just use the new service instance.
For Controller Developers
Same as above - update imports and use VectorizationService instead of VectorEmbeddingService.
For Test Developers
Update test files to use new namespaces:
use OCA\OpenRegister\Service\VectorizationService;
use OCA\OpenRegister\Service\Vectorization\VectorEmbeddingService;
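After migration, a consumer can be unit tested by substituting a test double for the facade. The sketch below uses a hand-rolled fake so it runs without a test framework; in a PHPUnit test, createMock(VectorizationService::class) would normally play the same role. DocumentFinder is a hypothetical consumer, the facade is a stub, and the approach assumes the real VectorizationService is not declared final.

```php
<?php
declare(strict_types=1);

// Stub for the facade. Assumption: the real class is not final, so it can be extended.
class VectorizationService
{
    public function semanticSearch(string $query, int $limit = 10): array
    {
        throw new RuntimeException('Real search backend is not available in unit tests');
    }
}

// Hypothetical consumer that depends only on the facade.
final class DocumentFinder
{
    public function __construct(
        private VectorizationService $vectorizationService
    ) {
    }

    public function findIds(string $query, int $limit = 5): array
    {
        $hits = $this->vectorizationService->semanticSearch($query, $limit);

        return array_column($hits, 'id');
    }
}

// Hand-rolled fake: records calls and returns a canned result.
$fake = new class extends VectorizationService {
    public array $calls = [];

    public function semanticSearch(string $query, int $limit = 10): array
    {
        $this->calls[] = [$query, $limit];

        return [['id' => 'doc-7', 'score' => 0.5]];
    }
};

$finder = new DocumentFinder($fake);
$ids = $finder->findIds('budget', 3);
```

The test only needs to know the facade's public methods, so it is unaffected by later changes to VectorEmbeddingService.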
Architecture Diagram
┌─────────────────────────────────────────────────────┐
│  Controllers / Services                             │
│  (ChatService, SettingsController, etc.)            │
└─────────────────┬───────────────────────────────────┘
                  │
                  │ Call public API
                  ▼
┌─────────────────────────────────────────────────────┐
│  VectorizationService (Public Facade)               │
│  - vectorizeBatch()                                 │
│  - semanticSearch()                                 │
│  - hybridSearch()                                   │
│  - getVectorStats()                                 │
│  - testEmbedding()                                  │
└─────────────────┬───────────────────────────────────┘
                  │
                  │ Delegates to
                  ▼
┌─────────────────────────────────────────────────────┐
│  VectorEmbeddingService (Internal Handler)          │
│  - generateEmbedding()                              │
│  - storeVector()                                    │
│  - semanticSearch()                                 │
│  - LLM provider management                          │
└─────────────────┬───────────────────────────────────┘
                  │
                  │ Uses
                  ▼
┌─────────────────────────────────────────────────────┐
│  Strategies (Entity-specific logic)                 │
│  - ObjectVectorizationStrategy                      │
│  - FileVectorizationStrategy                        │
└─────────────────────────────────────────────────────┘
Files Changed
Services
- lib/Service/Vectorization/VectorizationService.php - Moved and updated
- lib/Service/Vectorization/VectorEmbeddingService.php - Moved
- lib/Service/Vectorization/Strategies/*.php - Moved
Controllers
- lib/Controller/SettingsController.php - Updated to use VectorizationService
- lib/Controller/Settings/LlmSettingsController.php - Updated
- lib/Controller/Settings/VectorSettingsController.php - Updated
- lib/Controller/SolrController.php - Updated
- lib/Controller/FileSearchController.php - Updated
- lib/Controller/FileExtractionController.php - Updated
Other Services
- lib/Service/ChatService.php - Updated to use VectorizationService
- lib/AppInfo/Application.php - Updated dependency injection
Testing
All existing tests should continue to work with updated imports. No behavioral changes were made.
To verify:
composer test:unit
composer phpqa