Projects & uploads
How to organise documents into projects, which formats are supported, and how to update or remove files safely.
What a project is (and isn't)
A project is a private container for a set of related documents. Two things hang off projects:
- Scope. Searches and agents can be restricted to one or more projects, so a Sales agent only ever reads the bid library and never accidentally pulls from HR docs.
- Access. Members are added to projects with a role (owner / editor / viewer). A non-admin member sees nothing outside their projects.
Projects are not a folder hierarchy — there's no nesting. If you find yourself wanting sub-projects, that's usually a sign two separate projects are the right answer.
How to name them
Good project names describe what's inside, not where or when:
- Good: Bid library, Trust & security, Engineering ADRs, Acme Corp account, HR handbook.
- Worse: PDFs, Q3 2024, OneDrive, Stuff.
Supported file types
| Type | What gets extracted | Citation metadata |
|---|---|---|
| PDF | Text via pdftotext; OCR via Tesseract for scanned PDFs | Page number |
| DOCX | Paragraphs and headings | Heading section |
| XLSX | Every cell, sheet by sheet | Sheet name + row index |
| PPTX | Slide titles and body text | Slide number |
| CSV / TSV | Every row | Row index |
| TXT / MD | Paragraphs | — |
| PNG / JPG / WEBP | OCR text via Tesseract | — |
Default size limit per file is 64 MB (configurable per-tenant on Business / Enterprise plans).
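A pre-upload check against the table and size limit above can be sketched as follows. This is a minimal illustration, not part of the product: the function name `check_upload`, the constant names, and the choice of 1 MB = 1024 × 1024 bytes are assumptions, and the extension set only mirrors the types listed in the table.

```python
from pathlib import Path

# Extensions mirror the supported-types table; the set name is illustrative.
SUPPORTED_EXTENSIONS = {
    ".pdf", ".docx", ".xlsx", ".pptx", ".csv", ".tsv",
    ".txt", ".md", ".png", ".jpg", ".webp",
}

# Default 64 MB limit; assumes 1 MB = 1024 * 1024 bytes.
DEFAULT_MAX_BYTES = 64 * 1024 * 1024


def check_upload(filename: str, size_bytes: int,
                 max_bytes: int = DEFAULT_MAX_BYTES) -> list:
    """Return a list of problems; an empty list means the file looks uploadable."""
    problems = []
    ext = Path(filename).suffix.lower()  # case-insensitive match on the extension
    if ext not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported file type: {ext or '(none)'}")
    if size_bytes > max_bytes:
        problems.append(f"file is {size_bytes} bytes, limit is {max_bytes}")
    return problems
```

Tenants with a raised limit would pass their own `max_bytes`; the server remains the authority on what is actually accepted.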
About OCR
Scanned PDFs and standalone images run through Tesseract OCR. The default language pack is English + French; ask support if you need more (German, Spanish, Polish, Russian, Ukrainian, and most European languages are installable on shared hosting in a few minutes).
OCR is slower than plain text extraction — a 50-page scanned PDF can take 1-3 minutes versus 10 seconds for a text PDF of the same size. Once processed, search performance is identical.
Uploading
Open the Upload tab in the sidebar. You can:
- Drag & drop one file or many onto the drop zone.
- Pick a file via the system dialog.
- Pick one or more destination projects first — the file gets linked to all of them. A document can live in multiple projects (e.g. a company-wide policy document appearing in both "HR handbook" and "Compliance evidence").
Each upload becomes a row in the Jobs view. Status progresses queued → extracting → embedding → completed. A 10-page PDF typically takes 5-10 seconds end-to-end. Failed jobs stay visible so you can read the error and retry.
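The status progression above can be modelled as a small state machine. This is a sketch for reasoning about job states, for example when polling; the `JobStatus` enum, the `FAILED` member name, and `next_status` are hypothetical and not a documented API.

```python
from enum import Enum
from typing import Optional


class JobStatus(Enum):
    QUEUED = "queued"
    EXTRACTING = "extracting"
    EMBEDDING = "embedding"
    COMPLETED = "completed"
    FAILED = "failed"  # assumed name for the failed state


# Happy-path order as described in the text.
PIPELINE = [
    JobStatus.QUEUED,
    JobStatus.EXTRACTING,
    JobStatus.EMBEDDING,
    JobStatus.COMPLETED,
]


def next_status(current: JobStatus) -> Optional[JobStatus]:
    """Return the following happy-path status, or None if the job is terminal."""
    if current in (JobStatus.COMPLETED, JobStatus.FAILED):
        return None
    return PIPELINE[PIPELINE.index(current) + 1]
```

A job can presumably fail from any non-terminal stage, which is why `FAILED` sits outside the ordered pipeline rather than at a fixed position in it.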
Updating a document
Knowledge deliberately has no in-place "replace this file" button: with silent replacement, versions can drift and citations get confusing. The recommended pattern is:
- Upload the new version with a name that distinguishes it (employee_handbook_v3.pdf rather than overwriting employee_handbook.pdf).
- Wait for it to finish processing.
- Delete the old version from the Library view. The deletion cascades to its chunks and embeddings, so the old text never appears in future search results.
If two versions of the same document are in the same project, searches may pull passages from either — exactly the confusion we're trying to avoid. Keep one canonical version live at a time unless you're deliberately doing a comparison.
Deleting documents
Library tab → tick the row → Delete selected. Or open one document → trash icon. Deletion is immediate and cascades:
- Document row removed.
- All its chunks and embeddings removed.
- The on-disk file in private/uploads/<uuid>/ removed.
- Any thread that cited it keeps the citation reference in its history, but clicking the citation will say "source no longer available."
Deletion is not undoable. There's no trash bin / 30-day recovery window — bytes are gone the moment the worker processes the delete. If you need archive-then-delete, download the file first from the document detail view.
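To illustrate what a cascading delete means, here is a minimal sketch using SQLite foreign keys with `ON DELETE CASCADE`. The table and column names are hypothetical, and the product's actual storage engine and schema are not described here; the sketch only shows the general behaviour where removing a document row also removes its chunks and their embeddings.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys (and therefore cascades) only when this is on.
conn.execute("PRAGMA foreign_keys = ON")

# Hypothetical schema: documents -> chunks -> embeddings.
conn.executescript("""
    CREATE TABLE documents (
        id INTEGER PRIMARY KEY,
        name TEXT NOT NULL
    );
    CREATE TABLE chunks (
        id INTEGER PRIMARY KEY,
        document_id INTEGER NOT NULL REFERENCES documents(id) ON DELETE CASCADE,
        text TEXT NOT NULL
    );
    CREATE TABLE embeddings (
        chunk_id INTEGER PRIMARY KEY REFERENCES chunks(id) ON DELETE CASCADE,
        vector BLOB NOT NULL
    );
""")

conn.execute("INSERT INTO documents (id, name) VALUES (1, 'employee_handbook_v2.pdf')")
conn.execute("INSERT INTO chunks (id, document_id, text) VALUES (10, 1, 'Old leave policy')")
conn.execute("INSERT INTO embeddings (chunk_id, vector) VALUES (10, ?)", (b"\x00\x01",))

# Deleting the document row removes its chunks, which in turn removes embeddings.
conn.execute("DELETE FROM documents WHERE id = 1")

remaining_chunks = conn.execute("SELECT COUNT(*) FROM chunks").fetchone()[0]
remaining_embeddings = conn.execute("SELECT COUNT(*) FROM embeddings").fetchone()[0]
```

After the delete, both counts are zero, which is why old text cannot resurface in later searches.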
Moving a document between projects
Library tab → open a document → Edit projects button → tick / untick. The document itself stays put; only its project memberships change. Useful for documents that outgrow their original project ("this used to be HR-only, now Compliance needs it too").
What admins see vs members
- Admins see every project, every document, every job in the tenant.
- Members see only projects they're a member of and the documents inside those projects. Searches scoped to "all my projects" stop at this boundary.
This is enforced at the database level — not just hidden in the UI — so it's safe to expose the workspace to people who shouldn't see everything.
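One way to picture database-level enforcement is a query that can only return documents reachable through the caller's project memberships. The sketch below uses SQLite with a hypothetical schema (`project_members`, `project_documents`, and the `visible_documents` helper are all assumptions); the product may use a different mechanism, such as row-level security policies.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: documents link to projects many-to-many,
# and users are members of projects with a role.
conn.executescript("""
    CREATE TABLE projects (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE documents (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE project_documents (project_id INTEGER, document_id INTEGER);
    CREATE TABLE project_members (project_id INTEGER, user_id INTEGER, role TEXT);

    INSERT INTO projects VALUES (1, 'HR handbook'), (2, 'Compliance evidence'), (3, 'Bid library');
    INSERT INTO documents VALUES (1, 'policy.pdf'), (2, 'bid_template.docx');
    -- policy.pdf lives in two projects; bid_template.docx in one.
    INSERT INTO project_documents VALUES (1, 1), (2, 1), (3, 2);
    INSERT INTO project_members VALUES (1, 7, 'viewer'), (2, 7, 'viewer'), (3, 8, 'editor');
""")


def visible_documents(conn, user_id, is_admin=False):
    """Return document names the user may see, filtered inside the query itself."""
    if is_admin:
        rows = conn.execute("SELECT name FROM documents ORDER BY name")
    else:
        rows = conn.execute(
            """
            SELECT DISTINCT d.name
            FROM documents d
            JOIN project_documents pd ON pd.document_id = d.id
            JOIN project_members pm ON pm.project_id = pd.project_id
            WHERE pm.user_id = ?
            ORDER BY d.name
            """,
            (user_id,),
        )
    return [row[0] for row in rows]
```

Because the membership join is part of the query, a member with no matching rows gets an empty result rather than a hidden-but-fetchable list.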