poller docker
@@ -12,3 +12,47 @@ Every 5 minutes, this polls for new documents as follows:
   * A temporary public S3 file link is generated using a presigned S3 URL.
5. Checks the `embeddings` table for any rows that have been OCR'd but do not yet have embeddings, then generates embeddings with Cohere.
   * No batching is used with Cohere.
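
Step 5 can be pictured as a loop that sends one embedding request per pending row. The following is a minimal sketch, not the poller's actual code: the row shape `(row_id, ocr_text)`, the function name `embed_pending`, and the `embed_one` callable (a stand-in for whatever Cohere client call the poller makes) are all assumptions for illustration.

```python
from typing import Callable, Iterable, Sequence

# Hypothetical row shape: (row_id, ocr_text). The real `embeddings`
# table schema is not shown here, so this is an assumption.
Row = tuple[int, str]


def embed_pending(
    rows: Iterable[Row],
    embed_one: Callable[[str], Sequence[float]],
) -> dict[int, Sequence[float]]:
    """Embed each pending row with its own request (no batching)."""
    results: dict[int, Sequence[float]] = {}
    for row_id, text in rows:
        # One call per row: simple, but one round trip per document.
        results[row_id] = embed_one(text)
    return results
```

Passing the embedding call in as a parameter keeps the loop testable without network access. A batched variant would instead group several texts into a single request.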

## Environment Variables

| Variable | Required | Default | Description |
|----------|----------|---------|-------------|
| `COHERE_API_KEY` | Yes | - | Cohere API key for embeddings |
| `S3_ACCESS_KEY` | Yes | - | S3/MinIO access key |
| `S3_SECRET_KEY` | Yes | - | S3/MinIO secret key |
| `DB_PASSWORD` | Yes | - | PostgreSQL password |
| `ANTHROPIC_API_KEY` | Yes | - | Claude API key for OCR |
| `DB_HOST` | No | localhost | PostgreSQL host |
| `DB_PORT` | No | 5432 | PostgreSQL port |
| `DB_NAME` | No | cavepediav2_db | PostgreSQL database name |
| `DB_USER` | No | cavepediav2_user | PostgreSQL username |
| `S3_ENDPOINT` | No | https://s3.bigcavemaps.com | S3 endpoint URL |
| `S3_REGION` | No | eu | S3 region |
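
To make the required/default split concrete, here is one way the variables in the table above could be read and validated at startup. This is a sketch under assumptions: `load_config`, `REQUIRED`, and `DEFAULTS` are hypothetical names, and the poller's `main.py` may load its configuration differently.

```python
import os
from typing import Mapping

# Variable names and defaults copied from the table above.
REQUIRED = (
    "COHERE_API_KEY",
    "S3_ACCESS_KEY",
    "S3_SECRET_KEY",
    "DB_PASSWORD",
    "ANTHROPIC_API_KEY",
)
DEFAULTS = {
    "DB_HOST": "localhost",
    "DB_PORT": "5432",
    "DB_NAME": "cavepediav2_db",
    "DB_USER": "cavepediav2_user",
    "S3_ENDPOINT": "https://s3.bigcavemaps.com",
    "S3_REGION": "eu",
}


def load_config(env: Mapping[str, str] = os.environ) -> dict[str, str]:
    """Hypothetical helper: check required variables, then apply defaults."""
    # Treat unset and empty values alike as missing.
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise RuntimeError("Missing required variables: " + ", ".join(missing))
    config = {name: env[name] for name in REQUIRED}
    for name, default in DEFAULTS.items():
        config[name] = env.get(name, default)
    return config
```

Failing fast on a missing key surfaces a misconfigured container at startup instead of partway through a polling cycle.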

## Development

```bash
# Create .env file with required variables
cp .env.example .env

# Install dependencies
uv sync

# Run
python main.py
```

## Deployment

The poller is automatically built and pushed to `git.seaturtle.pw/cavepedia/cavepediav2-poller:latest` on push to main.

```bash
docker run \
  -e COHERE_API_KEY="xxx" \
  -e S3_ACCESS_KEY="xxx" \
  -e S3_SECRET_KEY="xxx" \
  -e DB_PASSWORD="xxx" \
  -e DB_HOST="postgres" \
  -e ANTHROPIC_API_KEY="xxx" \
  git.seaturtle.pw/cavepedia/cavepediav2-poller:latest
```