readme
This commit is contained in:
111
README.md
Normal file
111
README.md
Normal file
@@ -0,0 +1,111 @@
|
||||
# Cavepedia v2
|
||||
|
||||
A RBAC managed cave information system with document processing, semantic search, and an AI chat interface. Email paul@bigcavemaps.com for access. Only about 60% vibe coded.
|
||||
|
||||
## Architecture
|
||||
|
||||
```
|
||||
+------------------+
|
||||
| Auth0 |
|
||||
+--------+---------+
|
||||
|
|
||||
v
|
||||
+------------------+ +----------+----------+
|
||||
| | WebSocket | |
|
||||
| Browser +------------->+ web/ (Next.js) |
|
||||
| | | - CopilotKit UI |
|
||||
+------------------+ | - Auth0 SSO |
|
||||
+----------+----------+
|
||||
|
|
||||
v
|
||||
+----------+----------+
|
||||
| web/agent/ |
|
||||
| (LangGraph) |
|
||||
| - Google Gemini |
|
||||
+----------+----------+
|
||||
|
|
||||
v
|
||||
+----------+----------+
|
||||
| mcp/ |
|
||||
| (FastMCP Server) |
|
||||
| - Semantic search |
|
||||
+----------+----------+
|
||||
|
|
||||
+--------------------+--------------------+
|
||||
| |
|
||||
v v
|
||||
+----------+----------+ +----------+----------+
|
||||
| PostgreSQL | | Cohere |
|
||||
| (pgvector) | | (Embeddings) |
|
||||
| - embeddings | +---------------------+
|
||||
| - metadata |
|
||||
| - batches |
|
||||
+----------+----------+
|
||||
^
|
||||
|
|
||||
+----------+----------+
|
||||
| poller/ |
|
||||
| (Document Pipeline)|
|
||||
| - PDF splitting |
|
||||
| - OCR (Claude) |
|
||||
| - Embeddings |
|
||||
+----------+----------+
|
||||
|
|
||||
+-------------+-------------+
|
||||
| | |
|
||||
v v v
|
||||
+------+------+ +----+----+ +------+------+
|
||||
| S3: import | | S3: files| | S3: pages |
|
||||
+-------------+ +----------+ +-------------+
|
||||
```
|
||||
|
||||
## Components
|
||||
|
||||
| Component | Description | Tech Stack |
|
||||
|-----------|-------------|------------|
|
||||
| **web/** | Frontend application with chat UI | Next.js, CopilotKit, Auth0 |
|
||||
| **web/agent/** | AI agent for answering cave questions | LangGraph, Google Gemini |
|
||||
| **mcp/** | MCP server exposing semantic search tools | FastMCP, Cohere |
|
||||
| **poller/** | Document ingestion and processing pipeline | Python, Claude API, Cohere |
|
||||
|
||||
## Data Flow
|
||||
|
||||
1. **Document Ingestion** (poller)
|
||||
- PDFs uploaded to `s3://cavepediav2-import`
|
||||
- Poller moves to `s3://cavepediav2-files`, splits into pages
|
||||
- Pages stored in `s3://cavepediav2-pages`
|
||||
- Claude extracts text via OCR
|
||||
- Cohere generates embeddings
|
||||
- Stored in PostgreSQL with pgvector
|
||||
|
||||
2. **Search & Chat** (mcp + agent)
|
||||
- User asks question via web UI
|
||||
- Agent calls MCP tools for semantic search
|
||||
- MCP queries pgvector for relevant documents
|
||||
- Agent synthesizes response with citations
|
||||
|
||||
## Getting Started
|
||||
|
||||
See individual component READMEs:
|
||||
- [web/README.md](web/README.md) - Frontend and agent setup
|
||||
- [poller/README.md](poller/README.md) - Document processing pipeline
|
||||
|
||||
## Environment Variables
|
||||
|
||||
Each component requires its own environment variables. See the respective READMEs for details.
|
||||
|
||||
**Never commit `.env` files** - they are gitignored.
|
||||
|
||||
## CI/CD
|
||||
|
||||
Gitea Actions workflows build and push Docker images on changes to `main`:
|
||||
|
||||
| Workflow | Trigger Path | Image |
|
||||
|----------|--------------|-------|
|
||||
| build-push-web | `web/**` (excluding agent) | `cavepediav2-web:latest` |
|
||||
| build-push-agent | `web/agent/**` | `cavepediav2-agent:latest` |
|
||||
| build-push-poller | `poller/**` | `cavepediav2-poller:latest` |
|
||||
|
||||
## License
|
||||
|
||||
MIT
|
||||
Referens i nytt ärende
Block a user