🧠

CorpusKit Studio

Build semantic search corpora from your documents

Universal app — one purchase for iPhone, iPad & Mac

Turn any PDF or EPUB into a deployable RAG corpus package: no terminal, no Python, no manual file management. A machine learning model generates the embeddings entirely on-device. Built for developers and content curators building knowledge-first apps.

Everything you need. Nothing you don't.

A complete pipeline from raw PDF or EPUB to production-ready corpus, built on Apple's native frameworks.

📄

PDF & EPUB Import

Drag in any PDF or EPUB. CorpusKit Studio extracts text using Apple's native frameworks — no third-party tools or Python required.

🧩

Smart Text Chunking

Configurable chunk size and overlap in pure Swift. Preview your chunks before embedding to dial in the right settings for your content.
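As an illustration of what chunk size and overlap mean, here is a minimal sketch of a sliding-window chunker in plain Swift. It is not CorpusKit Studio's implementation: the function name `chunkWords`, its parameter names, and the choice to measure size in whitespace-separated words (rather than characters or tokens) are all assumptions made for the example.

```swift
import Foundation

/// Splits text into overlapping chunks of whitespace-separated words.
/// Hypothetical helper: `chunkSize` and `overlap` are both counted in words,
/// which is an assumption, not the app's documented unit.
func chunkWords(_ text: String, chunkSize: Int, overlap: Int) -> [String] {
    precondition(chunkSize > 0, "chunkSize must be positive")
    precondition(overlap >= 0 && overlap < chunkSize, "overlap must be in 0..<chunkSize")

    // Split on any whitespace, including newlines; empty pieces are dropped.
    let words = text.split(whereSeparator: { $0.isWhitespace })
    guard !words.isEmpty else { return [] }

    let step = chunkSize - overlap  // how far the window advances each time
    var chunks: [String] = []
    var start = 0
    while start < words.count {
        let end = min(start + chunkSize, words.count)
        chunks.append(words[start..<end].joined(separator: " "))
        if end == words.count { break }  // this window reached the end of the text
        start += step
    }
    return chunks
}
```

With a chunk size of 4 and an overlap of 2, each chunk repeats the last two words of the previous one, so a sentence that straddles a boundary still appears whole in at least one chunk. Larger overlap improves recall at the cost of more chunks to embed.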

⚡️

On-Device Embeddings

MiniLM-L6-v2 runs via Core ML on your device's Neural Engine — fast, private, and shared across the CorpusKit ecosystem.

🔍

Live Retrieval Testing

Query your corpus in real time before you ship. See ranked results with cosine similarity scores to validate your chunking strategy.
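To make the ranking step concrete, the sketch below scores chunk embeddings against a query embedding with cosine similarity and returns the best matches first. It uses a plain Swift loop for portability rather than Accelerate, and the names `cosineSimilarity`, `rankChunks`, and `topK` are hypothetical, not part of any CorpusKit API.

```swift
import Foundation

/// Cosine similarity between two equal-length vectors.
/// Returns 0 when either vector has zero length (norm), to avoid dividing by zero.
func cosineSimilarity(_ a: [Float], _ b: [Float]) -> Float {
    precondition(a.count == b.count, "vectors must have the same dimension")
    var dot: Float = 0
    var normA: Float = 0
    var normB: Float = 0
    for i in 0..<a.count {
        dot += a[i] * b[i]
        normA += a[i] * a[i]
        normB += b[i] * b[i]
    }
    let denominator = normA.squareRoot() * normB.squareRoot()
    return denominator == 0 ? 0 : dot / denominator
}

/// Scores every chunk against the query embedding and returns the `topK`
/// highest-scoring chunks, best match first. Hypothetical helper for illustration.
func rankChunks(
    query: [Float],
    chunks: [(text: String, embedding: [Float])],
    topK: Int
) -> [(text: String, score: Float)] {
    let scored = chunks.map { (text: $0.text, score: cosineSimilarity(query, $0.embedding)) }
    return Array(scored.sorted { $0.score > $1.score }.prefix(max(0, topK)))
}
```

Scores fall between -1 and 1, with values near 1 indicating that a chunk points in nearly the same direction as the query. If the top results for a representative query score low or return fragments, that is usually a hint to revisit chunk size and overlap.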

✏️

Highlight & Rate

Read your source document inside the app. Highlight passages and rate their importance — curator signals travel with the exported corpus.

📦

Signed .corpus Export

Export a signed bundle consumable by any CorpusKit iOS app. Includes chunks, embeddings, metadata, and your curation data.

Who it's for

Built for people who work with knowledge — and the developers who build for them.

Developers

Building RAG / AI Apps

Stop wrangling Python pipelines to create embeddings. CorpusKit Studio is a native app that produces corpus bundles your iOS app can consume directly — no backend required.

Content Curators

Researchers & Knowledge Workers

Highlight the passages that matter, rate their importance, and export a corpus that reflects your expertise — not just raw text extraction.

Organizations

Private Document Collections

Process sensitive documents entirely on-device. No data leaves your device during embedding. Distribute corpora to your team without a cloud intermediary.

🔒

All Local

Every document, chunk, and embedding stays in the app's Documents folder — visible via Finder on Mac (~/Documents/CorpusKitStudio/) or the Files app on iPhone and iPad. Nothing is sent to external servers.

🧠

On-Device ML

MiniLM embeddings run via Core ML on your Neural Engine. No API calls, no cloud inference, no data leaving your device.

🗝

No Account Required

Download and use immediately. No sign-up, no email, no subscription. Sensitive settings are stored in the Keychain.

📦

App Sandbox

The app operates within Apple's security sandbox. File access is managed by the system — only files you explicitly open are accessible.

Built on Apple's Frameworks

No Python, no heavy dependencies. Native performance throughout.

Document import	PDFKit (PDF) · EPUB parser
Embeddings	Core ML · MiniLM-L6-v2
Vector search	Accelerate vDSP
Persistence	SwiftData
Export signing	CryptoKit
Platform	iOS 26+ · macOS 15.6+

Requirements

iPhone, iPad, or Mac
iOS 26 or later · macOS 15.6 or later
Available on the App Store
No internet connection required

Privacy Policy · Effective Date: December 2024 · Last Updated: December 2024

Summary: CorpusKit Studio is a privacy-first application. All processing happens on your device, no data is transmitted to any server, and you have complete control over your documents and data.

What We Collect

CorpusKit Studio does not collect, transmit, or store any of the following:

Personal identification information
User account data or contact information
Usage analytics or crash reports
Device identifiers

Local Data Storage

All data processed by CorpusKit Studio is stored locally in the app's Documents folder:

~/Documents/CorpusKitStudio/ on Mac; on iPhone and iPad, the same folder is visible in the Files app under On My iPhone/iPad.

This includes PDF and EPUB documents you import, extracted text chunks, Core ML embeddings, highlights and annotations, and exported corpus bundles.

Machine Learning

The app uses an on-device machine learning model (MiniLM via Core ML) to generate text embeddings. This processing happens entirely on-device — no data is sent to external servers.

Third-Party Services

CorpusKit Studio does not integrate with any third-party services, analytics platforms, or advertising networks.

File Access Permissions

CorpusKit Studio requests the following permissions:

File Access — to read PDF and EPUB files you select for import
Export Location — to save exported corpus bundles to a folder you choose (Downloads or another folder on Mac; the Files app on iPhone/iPad)

These permissions are required for core functionality and are managed by the operating system's security sandbox.

Your Rights

Since all data is stored locally, you have complete control. Delete the app and the CorpusKitStudio folder to remove all data. Use the app's export feature to create portable corpus bundles at any time.

Contact

Questions about this privacy policy: support@robroy.online

Open Source Components

MiniLM (via Hugging Face) — Apache 2.0 License
Apple Frameworks (SwiftUI, PDFKit, Core ML, Accelerate) — Apple EULA