- go 1.16+

## 3.x milestone

### Main pipeline

1. Use Wasabi S3 as the source of truth. Any new documents are uploaded to Wasabi at `s3://pew-cavepedia-data/00_files/`.
1. Once a day, the `main.py` script runs, which:
   1. Pulls additions or deletions from `s3://pew-cavepedia-data/00_files/` to `/bigdata/archive/cavepedia/pew-cavepedia-data/00_files/`.
   1. Validates that `metadata.py` contains data for any new folders.
   1. Runs `00-01.py`.
   1. Runs `01-02.py`.
   1. Pushes additions or deletions to `s3://pew-cavepedia-data/{01_ocr,01_pages,02_json,02_text}`.
   1. At this point, all newly added data should be OCR'd and processed.
1. Once a day, the cavepedia application (which must run on the same host) checks for any updates:
   1. Pulls additions or deletions from `/bigdata/archive/cavepedia/pew-cavepedia-data/00_files/`.
   1. If there are changes, deletes the local index and reindexes all documents.
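
The "pull additions or deletions" and "validate metadata for new folders" steps both reduce to comparing two file listings. The sketch below shows that comparison only; it is not the actual `main.py`. It assumes each listing is a hypothetical `{key: checksum}` dict (keys relative to `00_files/`, checksums such as S3 ETags) and that `metadata.py` exposes a dict keyed by top-level folder name. Running `00-01.py`/`01-02.py` and the actual transfers are left out; in practice the transfer itself would more likely be delegated to an existing tool such as `aws s3 sync --delete` or `rclone sync`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SyncPlan:
    """Keys (paths relative to 00_files/) to download and to delete locally."""
    to_fetch: frozenset[str]
    to_delete: frozenset[str]

    @property
    def has_changes(self) -> bool:
        # A reindex (delete local index, rebuild) is only needed when True.
        return bool(self.to_fetch or self.to_delete)


def plan_mirror(source: dict[str, str], local: dict[str, str]) -> SyncPlan:
    """Plan a one-way mirror from two {key: checksum} listings.

    New or modified keys in the source are fetched; keys that no longer
    exist in the source are deleted from the local copy.
    """
    to_fetch = frozenset(k for k, c in source.items() if local.get(k) != c)
    to_delete = frozenset(k for k in local if k not in source)
    return SyncPlan(to_fetch, to_delete)


def top_level_folders(keys) -> set[str]:
    """'some-folder/file.pdf' -> 'some-folder'; files at the root are ignored."""
    return {k.split("/", 1)[0] for k in keys if "/" in k}


def missing_metadata(source: dict[str, str], local: dict[str, str], metadata: dict) -> list[str]:
    """New folders (present in the source, absent locally) with no metadata entry."""
    new_folders = top_level_folders(source) - top_level_folders(local)
    return sorted(f for f in new_folders if f not in metadata)
```

A non-empty result from `missing_metadata` would stop the run before any OCR work starts, and the application side would use `has_changes` to decide whether to rebuild its index.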

### Offline export

1. `./launch.sh release [tenant]` creates a local `release` directory for offline use:
   1. Pulls the files for the given tenant from `/bigdata/archive/cavepedia/pew-cavepedia-data/00_files/` to `./00_files/`.
   1. Indexes all of the tenant's documents.
   1. Saves the index.
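
The file-selection step of the release can be sketched as a filtered copy. This is an illustration only, not what `launch.sh` does: it assumes a hypothetical `folder_tenants` mapping from each top-level folder to the tenants that own it (for example, derived from `metadata.py`), and it does not cover the indexing or index-saving steps.

```python
import shutil
from pathlib import Path

ARCHIVE = Path("/bigdata/archive/cavepedia/pew-cavepedia-data/00_files")


def tenant_files(archive: Path, folder_tenants: dict[str, list[str]], tenant: str) -> list[Path]:
    """All files under the top-level folders whose tenant list includes `tenant`."""
    selected: list[Path] = []
    for folder, tenants in folder_tenants.items():
        root = archive / folder
        if tenant in tenants and root.is_dir():
            selected.extend(p for p in root.rglob("*") if p.is_file())
    return sorted(selected)


def export_tenant(archive: Path, dest: Path, folder_tenants: dict[str, list[str]], tenant: str) -> int:
    """Copy a tenant's files into `dest`, keeping their folder layout; return the count."""
    copied = 0
    for src in tenant_files(archive, folder_tenants, tenant):
        target = dest / src.relative_to(archive)
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, target)  # copy2 also preserves file timestamps
        copied += 1
    return copied
```

For example, `export_tenant(ARCHIVE, Path("00_files"), folder_tenants, "vpi")` would populate `./00_files/` before the indexer runs over it.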

### Multi-tenant

1. Change the URL to include a `/{tenant}/` path segment just after the host, for example `https://trog.bigcavemaps.com/public/search` or `https://trog.bigcavemaps.com/vpi/search`.
1. During document indexing, each document is assigned a list of tenants. During search, only documents owned by the requesting tenant are returned.
1. Each tenant path has its own password, stored in hashed form in an environment variable, e.g. `VPI_PASSWORD=asdf`, `PUBLIC_PASSWORD=`.
1. The `public` tenant contains all documents that are allowed to be public.
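
The tenant rules above amount to three small checks: extract the tenant from the URL, verify the password, and filter search results. The sketch below illustrates them in Python; the real cavepedia application may implement them quite differently. The README says passwords are stored "in a hash format" without naming the algorithm, so hex SHA-256 here is purely an assumption (a slow password hash such as bcrypt or Argon2 would be stronger if the format is still open). The `Document` type, the function names, and treating an empty value such as `PUBLIC_PASSWORD=` as "no password required" are likewise hypothetical.

```python
import hashlib
import hmac
import os
from dataclasses import dataclass
from typing import Mapping, Optional
from urllib.parse import urlsplit


@dataclass(frozen=True)
class Document:
    title: str
    tenants: tuple  # tenants that own this document, e.g. ("public", "vpi")


def tenant_from_url(url: str) -> Optional[str]:
    """Return the first path segment, e.g. 'vpi' for https://host/vpi/search."""
    segments = [s for s in urlsplit(url).path.split("/") if s]
    return segments[0] if segments else None


def password_ok(tenant: str, supplied: str, env: Mapping[str, str] = os.environ) -> bool:
    """Check a supplied password against the tenant's hashed env var.

    ASSUMPTIONS: the variable is named <TENANT>_PASSWORD, holds a hex
    SHA-256 digest, and an empty value means the tenant is open.
    """
    expected = env.get(f"{tenant.upper()}_PASSWORD")
    if expected is None:
        return False  # no variable configured: treat as an unknown tenant
    if expected == "":
        return True  # e.g. PUBLIC_PASSWORD= (assumed to mean no password)
    digest = hashlib.sha256(supplied.encode("utf-8")).hexdigest()
    # Constant-time comparison of the two hex digests.
    return hmac.compare_digest(digest.encode("ascii"), expected.strip().lower().encode("utf-8"))


def visible_documents(docs, tenant: str) -> list:
    """Search-time filter: keep only documents owned by the tenant."""
    return [d for d in docs if tenant in d.tenants]
```

Keeping the tenant list on each document means one shared index can serve every tenant; the filter runs at query time rather than maintaining a separate index per tenant.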

## Add more documents