How to Upload a ZIP File as a Knowledge Base Source
ZIP uploads let you bulk-import dozens of text files (Markdown docs, CSVs, JSON configs, HTML pages) in a single step. Appalix unpacks the archive, reads every supported file, and embeds the content so your bot can answer from all of it as soon as indexing finishes.
What you'll need
- An Appalix account on the Pro plan or above
- A .zip archive containing your text files (max 50 MB of uncompressed text content)
- A configured bot with at least one source slot available
1. Prepare your ZIP file
Package the files you want indexed into a single .zip archive. Subdirectory structure is fine — Appalix walks the entire archive recursively.
Keep total uncompressed size under 50 MB. The compressed ZIP itself can be much smaller — only the expanded text content counts toward the limit.
Tip — what to put inside
- Export docs from Confluence or Notion as HTML / Markdown and ZIP them
- ZIP a folder of CSVs (product catalogue, FAQ pairs, pricing tables)
- Bundle static help-centre pages exported from your CMS
- Combine multiple JSON knowledge files into one archive
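If you are packaging a local folder, a short script can build the archive and check the size limit in one pass. The following is a minimal sketch, not an Appalix tool: the function name `build_zip` is hypothetical, and it assumes "50 MB" means 50 * 1024 * 1024 bytes and that extensions are matched case-insensitively.

```python
import zipfile
from pathlib import Path

# Extensions this guide lists as indexed; other files are left out of the ZIP.
SUPPORTED = {".txt", ".md", ".csv", ".json", ".xml", ".html", ".htm"}
LIMIT_BYTES = 50 * 1024 * 1024  # assumption: 50 MB interpreted as 50 MiB


def build_zip(source_dir: str, zip_path: str) -> int:
    """Zip the supported files under source_dir; return total uncompressed bytes."""
    root = Path(source_dir)
    files = [
        p for p in sorted(root.rglob("*"))
        if p.is_file() and p.suffix.lower() in SUPPORTED
    ]
    total = sum(p.stat().st_size for p in files)
    if total > LIMIT_BYTES:
        raise ValueError(f"{total} bytes uncompressed exceeds the 50 MB limit")

    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for p in files:
            # Preserve the relative folder path inside the archive.
            zf.write(p, arcname=p.relative_to(root).as_posix())
    return total
```

Calling `build_zip("help-centre-export", "kb.zip")` would write `kb.zip` containing only the supported files and return their combined uncompressed size in bytes.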
2. Add a new source
In the Appalix dashboard, go to Sources and click Add source. Select the PDF / Word / ZIP tile — this is the same tile used for PDFs, Word docs, and PowerPoints.
3. Upload the ZIP and submit
Click Choose file, select your .zip, and wait for the Done indicator. Enter a name for the source, then click Add & index source.
The file uploads directly to secure cloud storage rather than through a Vercel serverless function, so it is not subject to Vercel's 4.5 MB request body limit and even large archives upload reliably.
4. Verify the source is ready
Return to the Sources list. Once ingestion finishes, the source will show a green Ready badge and a chunk count. Each readable file inside the ZIP becomes one or more chunks your bot can retrieve.
What Appalix reads from your ZIP
Appalix only extracts files with these extensions. Everything else is silently skipped — no errors, no partial reads.
| Extension | Format | How it's indexed |
|---|---|---|
| .txt | Plain text | Read as-is. Great for FAQs, policies, and notes. |
| .md | Markdown | Read as plain text. Headers, lists, and code blocks are preserved as text. |
| .csv | CSV spreadsheet | Each row becomes searchable text. Column headers are included. |
| .json | JSON data | The full JSON string is indexed. Ideal for structured knowledge dumps. |
| .xml | XML | Raw XML text is indexed. Tag names and values are both searchable. |
| .html / .htm | HTML | Full HTML source is indexed, including tag content and attributes. |
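Because unsupported files are skipped without any error, it can be useful to preview which entries in an existing ZIP would be indexed before you upload it. This sketch applies the extension list from the table above; `audit_zip` is a hypothetical helper, and the case-insensitive match is an assumption rather than documented Appalix behaviour.

```python
import zipfile
from pathlib import PurePosixPath

# Extensions taken from the table in this guide.
SUPPORTED = {".txt", ".md", ".csv", ".json", ".xml", ".html", ".htm"}


def audit_zip(zip_file):
    """Split archive entries into (indexed, skipped) lists by file extension.

    zip_file may be a path or a binary file-like object.
    """
    indexed, skipped = [], []
    with zipfile.ZipFile(zip_file) as zf:
        for info in zf.infolist():
            if info.is_dir():
                continue  # directory entries carry no content
            ext = PurePosixPath(info.filename).suffix.lower()
            (indexed if ext in SUPPORTED else skipped).append(info.filename)
    return indexed, skipped
```

Anything that lands in the `skipped` list would need to be converted to a supported format or uploaded as its own source.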
What Appalix skips
Files with any other extension are ignored entirely. They are never executed, stored, or sent to the AI model. This applies to:
| Category | Extensions | Why it's skipped |
|---|---|---|
| Executables & scripts | .exe, .dll, .bat, .sh, .py, .js, .php | Never run. Skipped silently for security. |
| Images & media | .jpg, .png, .gif, .mp4, .mp3, .pdf | Binary files that cannot be read as plain text. Upload PDFs directly as their own source instead. |
| Office documents | .docx, .xlsx, .pptx | Upload these directly as their own source type for full parsing. |
| Archives inside archives | Nested .zip, .tar, .gz | Nested archives are not unpacked. Only files in the outer ZIP (including its subfolders) are processed. |
Security & safety
- Executables are never run — files are decoded as plain text strings only, never executed in any environment.
- Zip bomb protection — if the total uncompressed text content exceeds 50 MB, ingestion stops immediately with an error.
- No binary processing — only whitelisted text extensions are read. Unknown types are skipped without error.
- Isolated processing — ingestion runs in a sandboxed API service, separate from your bot's runtime environment.
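To illustrate how a size guard of this kind can work, the sketch below counts bytes as they are decompressed and aborts once a running total passes the limit, instead of trusting the sizes declared in the ZIP headers. This is an illustration of the general technique only, not Appalix's actual ingestion code; the function name, the chunk size, the UTF-8 decoding, and the 50 * 1024 * 1024 byte limit are all assumptions.

```python
import zipfile
from pathlib import PurePosixPath

SUPPORTED = {".txt", ".md", ".csv", ".json", ".xml", ".html", ".htm"}


def read_text_entries(zip_file, limit_bytes=50 * 1024 * 1024, chunk_size=64 * 1024):
    """Return {filename: text} for whitelisted entries.

    Raises ValueError as soon as the decompressed total exceeds limit_bytes.
    """
    texts = {}
    total = 0
    with zipfile.ZipFile(zip_file) as zf:
        for info in zf.infolist():
            ext = PurePosixPath(info.filename).suffix.lower()
            if info.is_dir() or ext not in SUPPORTED:
                continue  # skipped entries are never opened or decoded
            chunks = []
            with zf.open(info) as fh:
                # Read in bounded chunks so a zip bomb cannot exhaust memory
                # before the limit check fires.
                while chunk := fh.read(chunk_size):
                    total += len(chunk)
                    if total > limit_bytes:
                        raise ValueError("uncompressed text content exceeds limit")
                    chunks.append(chunk)
            # Content is only ever decoded to a string, never executed.
            texts[info.filename] = b"".join(chunks).decode("utf-8", errors="replace")
    return texts
```

Checking the running total inside the read loop is the key design choice: a malicious archive can declare a small size in its headers while expanding to far more data.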
Frequently asked questions
Can I include PDFs or Word docs inside the ZIP?
Not yet. Those formats require a separate parsing pipeline. Upload PDF, Word (.docx), or Excel (.xlsx) files directly as their own sources. Inside a ZIP, only plain-text formats are indexed.
Does folder structure inside the ZIP matter?
No. Appalix flattens the archive and processes every matching file regardless of which subfolder it lives in. The folder path is shown as a section header in the indexed content so you can trace where each chunk came from.
What if some files inside are empty?
Empty files are skipped automatically — only files with non-whitespace content are indexed.
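The two answers above (folder paths become section headers, and empty files are dropped) can be pictured with a small sketch. The exact header format Appalix uses is not documented here, so the `## path` style below is purely an assumption, as is the function name `flatten_with_headers`.

```python
def flatten_with_headers(files: dict[str, str]) -> str:
    """Join file contents into one text, each preceded by its archive path.

    files maps an archive path (e.g. "guides/setup.md") to its decoded text.
    """
    sections = []
    for path in sorted(files):
        text = files[path]
        if not text.strip():
            continue  # empty or whitespace-only files are skipped
        # Hypothetical header format: the path on its own line.
        sections.append(f"## {path}\n{text.strip()}")
    return "\n\n".join(sections)
```

For example, passing `{"guides/setup.md": "Install it.\n", "empty.txt": "  \n"}` yields a single section headed `## guides/setup.md`, with `empty.txt` left out.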
Can I re-upload a ZIP to update the knowledge base?
Yes. Delete the old source and add a new one with the updated ZIP, or use the Resync button on the source row if you replace the file at the same storage path.
Is there a limit on the number of files inside the ZIP?
There is no file count limit, only the 50 MB total uncompressed text content limit. A ZIP with 500 tiny .txt files will work fine as long as the combined text stays under 50 MB.
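For an archive you have already built, the sizes recorded in the ZIP itself give a quick estimate of where you stand against the limit. This is a rough pre-upload check under the same assumptions as the rest of this guide's examples (hypothetical function name, case-insensitive extension match); it reads the declared sizes only and does not decompress anything.

```python
import zipfile
from pathlib import PurePosixPath

SUPPORTED = {".txt", ".md", ".csv", ".json", ".xml", ".html", ".htm"}


def indexable_size(zip_file):
    """Return (file_count, total_uncompressed_bytes) for supported entries."""
    count = 0
    total = 0
    with zipfile.ZipFile(zip_file) as zf:
        for info in zf.infolist():
            ext = PurePosixPath(info.filename).suffix.lower()
            if not info.is_dir() and ext in SUPPORTED:
                count += 1
                total += info.file_size  # uncompressed size declared in the archive
    return count, total
```

A ZIP with 500 small .txt files would report a count of 500 and a total well under the limit, which matches the answer above: the byte total matters, not the number of files.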