Cavepedia v2
- An RBAC-managed cave information system with document processing, semantic search, and an AI chat interface. Email paul@bigcavemaps.com for access. Only about 60% vibe coded.
- URL: https://chat.caving.dev
Architecture
                                    +------------------+
                                    |      Auth0       |
                                    +--------+---------+
                                             |
                                             v
+------------------+              +----------+----------+
|                  |  WebSocket   |                     |
|     Browser      +------------->+  web/ (Next.js)     |
|                  |              |  - CopilotKit UI    |
+------------------+              |  - Auth0 SSO        |
                                  +----------+----------+
                                             |
                                             v
                                  +----------+----------+
                                  |     web/agent/      |
                                  |     (LangGraph)     |
                                  |  - Google Gemini    |
                                  +----------+----------+
                                             |
                                             v
                                  +----------+----------+
                                  |        mcp/         |
                                  |  (FastMCP Server)   |
                                  |  - Semantic search  |
                                  +----------+----------+
                                             |
                        +--------------------+--------------------+
                        |                                         |
                        v                                         v
             +----------+----------+                   +----------+----------+
             |     PostgreSQL      |                   |       Cohere        |
             |     (pgvector)      |                   |    (Embeddings)     |
             |  - embeddings       |                   +---------------------+
             |  - metadata         |
             |  - batches          |
             +----------+----------+
                        ^
                        |
             +----------+----------+
             |       poller/       |
             | (Document Pipeline) |
             |  - PDF splitting    |
             |  - OCR (Claude)     |
             |  - Embeddings       |
             +----------+----------+
                        |
          +-------------+-------------+
          |             |             |
          v             v             v
   +------+------+ +----+----+ +------+------+
   | S3: import  | |S3: files| | S3: pages   |
   +-------------+ +---------+ +-------------+
Components
| Component | Description | Tech Stack |
|---|---|---|
| web/ | Frontend application with chat UI | Next.js, CopilotKit, Auth0 |
| web/agent/ | AI agent for answering cave questions | LangGraph, Google Gemini |
| mcp/ | MCP server exposing semantic search tools | FastMCP, Cohere |
| poller/ | Document ingestion and processing pipeline | Python, Claude API, Cohere |
Data Flow
- Document Ingestion (poller)
  - PDFs uploaded to s3://cavepediav2-import
  - Poller moves to s3://cavepediav2-files, splits into pages
  - Pages stored in s3://cavepediav2-pages
  - Claude extracts text via OCR
  - Cohere generates embeddings
  - Stored in PostgreSQL with pgvector
- Search & Chat (mcp + agent)
  - User asks question via web UI
  - Agent calls MCP tools for semantic search
  - MCP queries pgvector for relevant documents
  - Agent synthesizes response with citations
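
To make the semantic search step concrete, here is a minimal in-memory sketch of ranking pages by embedding similarity. It is not the code in mcp/: in the real system the ranking is done by pgvector inside PostgreSQL, and the embeddings come from Cohere. The use of cosine similarity, the function names, the page keys, and the two-dimensional toy vectors are all assumptions made for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_embedding, pages, k=3):
    """Return the keys of the k pages most similar to the query, best first.

    `pages` maps a page key to its embedding. Both the keys and the
    vectors used below are hypothetical stand-ins for rows in PostgreSQL.
    """
    scored = [
        (cosine_similarity(query_embedding, embedding), key)
        for key, embedding in pages.items()
    ]
    scored.sort(reverse=True)
    return [key for _, key in scored[:k]]


# Toy data: real embeddings have hundreds of dimensions or more.
pages = {
    "a.pdf/page-1": [1.0, 0.0],
    "a.pdf/page-2": [0.0, 1.0],
    "b.pdf/page-1": [0.7, 0.7],
}
results = top_k([1.0, 0.1], pages, k=2)
```

With these toy vectors, `results` lists the page pointing in nearly the same direction as the query first. The agent would then receive the matching pages and cite them in its answer.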
Getting Started
See individual component READMEs:
- web/README.md - Frontend and agent setup
- poller/README.md - Document processing pipeline
Environment Variables
Each component requires its own environment variables. See the respective READMEs for details.
Never commit .env files - they are gitignored.
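
Since each component reads its configuration from the environment, a startup check that fails fast on missing values can save debugging time. The sketch below is only an illustration: the helper `missing_env` and the variable names `DATABASE_URL` and `COHERE_API_KEY` are hypothetical, so check each component's README for the names it actually requires.

```python
import os


def missing_env(required, environ=None):
    """Return the names in `required` that are unset or empty.

    `environ` defaults to the process environment; passing a dict
    makes the check easy to exercise without touching real settings.
    """
    env = os.environ if environ is None else environ
    return [name for name in required if not env.get(name)]


# Hypothetical variable names, for illustration only.
example_env = {"DATABASE_URL": "postgresql://localhost/cavepedia", "COHERE_API_KEY": ""}
missing = missing_env(["DATABASE_URL", "COHERE_API_KEY"], example_env)
```

A component could call this once at startup and exit with a clear message if the returned list is not empty.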
CI/CD
Gitea Actions workflows build and push Docker images on changes to main:
| Workflow | Trigger Path | Image |
|---|---|---|
| build-push-web | web/** (excluding agent) | cavepediav2-web:latest |
| build-push-agent | web/agent/** | cavepediav2-agent:latest |
| build-push-poller | poller/** | cavepediav2-poller:latest |
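
The trigger rules in the table can be read as a simple path-prefix mapping. The sketch below restates them in Python to show which workflows a given set of changed files would start. It is not how Gitea Actions evaluates path filters, and the function name and example file paths are hypothetical.

```python
def workflows_for(changed_paths):
    """Return the sorted names of workflows triggered by the changed paths."""
    triggered = set()
    for path in changed_paths:
        # Check web/agent/ first: it lives under web/ but has its own workflow.
        if path.startswith("web/agent/"):
            triggered.add("build-push-agent")
        elif path.startswith("web/"):
            triggered.add("build-push-web")
        elif path.startswith("poller/"):
            triggered.add("build-push-poller")
    return sorted(triggered)


# Hypothetical file paths, for illustration only.
both = workflows_for(["web/agent/main.py", "web/app/page.tsx"])
none = workflows_for(["README.md"])
```

Here `both` contains the agent and web workflows, while `none` is empty because a change outside the listed paths builds nothing.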
License
MIT