Build semantic search corpora from your documents
Turn any PDF into a deployable RAG corpus package with no terminal, no Python, and no manual file management. On-device machine learning generates embeddings entirely on your Mac. Built for developers and content curators who create knowledge-first apps.
A complete pipeline from raw PDF to production-ready corpus, built on Apple's native frameworks.
Built for people who work with knowledge — and the developers who build for them.
No Python, no heavy dependencies. Native Mac performance throughout.
| Stage | Technology |
| --- | --- |
| PDF extraction | PDFKit |
| Embeddings | Core ML · MiniLM-L6 |
| Vector search | Accelerate vDSP |
| Persistence | SwiftData |
| Export signing | CryptoKit |
| Platform | macOS 14+ |
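To illustrate the vector search row, here is a minimal sketch of cosine-similarity ranking over chunk embeddings. It is not CorpusKit Studio's actual code: the `Chunk` type and the `dot`, `cosineSimilarity`, and `topMatches` functions are hypothetical names chosen for this example, and it assumes embeddings are plain `[Float]` arrays. The dot product uses `vDSP.dot` where Accelerate is available and falls back to a plain loop elsewhere.

```swift
import Foundation
#if canImport(Accelerate)
import Accelerate
#endif

/// Dot product of two equal-length vectors.
/// Uses vDSP on Apple platforms, a plain loop otherwise.
func dot(_ a: [Float], _ b: [Float]) -> Float {
    precondition(a.count == b.count, "vectors must have the same length")
    #if canImport(Accelerate)
    return vDSP.dot(a, b)
    #else
    var sum: Float = 0
    for (x, y) in zip(a, b) { sum += x * y }
    return sum
    #endif
}

/// Cosine similarity in [-1, 1]; returns 0 if either vector has zero magnitude.
func cosineSimilarity(_ a: [Float], _ b: [Float]) -> Float {
    let denominator = (dot(a, a) * dot(b, b)).squareRoot()
    return denominator > 0 ? dot(a, b) / denominator : 0
}

/// Hypothetical record for one extracted text chunk and its embedding.
struct Chunk {
    let text: String
    let embedding: [Float]
}

/// Returns the `k` chunks most similar to the query embedding, best match first.
func topMatches(query: [Float], in chunks: [Chunk], k: Int) -> [(chunk: Chunk, score: Float)] {
    let scored = chunks.map { (chunk: $0, score: cosineSimilarity(query, $0.embedding)) }
    return Array(scored.sorted { $0.score > $1.score }.prefix(k))
}

// Toy 3-dimensional vectors stand in for real model embeddings.
let sampleChunks = [
    Chunk(text: "Installing the app", embedding: [1, 0, 0]),
    Chunk(text: "Exporting a corpus", embedding: [0, 1, 0]),
    Chunk(text: "Setting up the app", embedding: [0.9, 0.1, 0]),
]
for match in topMatches(query: [1, 0, 0], in: sampleChunks, k: 2) {
    print(match.chunk.text, match.score)
}
```

A brute-force scan like this is linear in the number of chunks, which is often adequate for a corpus built from a handful of documents. Whether the app normalizes embeddings in advance or uses a different ranking strategy is not stated here.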
Summary: CorpusKit Studio is a privacy-first application. All processing happens on your device, no data is transmitted to any server, and you have complete control over your documents and data.
CorpusKit Studio does not collect, transmit, or store any of the following:
All data processed by CorpusKit Studio is stored locally on your Mac at:
~/Documents/CorpusKitStudio/
This includes PDF documents you import, extracted text chunks, Core ML embeddings, highlights and annotations, and exported corpus bundles.
The app uses an on-device machine learning model (MiniLM via Core ML) to generate text embeddings. This processing happens entirely on your Mac — no data is sent to external servers.
CorpusKit Studio does not integrate with any third-party services, analytics platforms, or advertising networks.
CorpusKit Studio requests the following macOS permissions:
These permissions are required for core functionality and are managed by macOS's security system.
Since all data is stored locally, you have complete control. Delete the app and the CorpusKitStudio folder to remove all data. Use the app's export feature to create portable corpus bundles at any time.
Questions about this privacy policy: support@robroy.online