From dab94013f1c8f896b3e0950fc352051e0635913e Mon Sep 17 00:00:00 2001
From: Paul Walko
Date: Wed, 22 Feb 2023 12:15:54 -0500
Subject: [PATCH] 3.x milestone

---
 README.md | 26 ++++++++++++++++++++++++++
 1 file changed, 26 insertions(+)

diff --git a/README.md b/README.md
index 9051ad2..c69fd32 100644
--- a/README.md
+++ b/README.md
@@ -4,6 +4,32 @@
 - go 1.16+
+## 3.x milestone
+### Main pipeline
+1. Use Wasabi S3 as the source of truth. Any new docs are uploaded to Wasabi at `s3://pew-cavepedia-data/00_files/`
+1. Once a day, the `main.py` script runs, which:
+    1. Pulls additions or deletions from `s3://pew-cavepedia-data/00_files/` to `/bigdata/archive/cavepedia/pew-cavepedia-data/00_files/`
+    1. Validates that `metadata.py` contains data for any new folders.
+    1. Runs `00-01.py`
+    1. Runs `01-02.py`
+    1. Pushes additions or deletions to `s3://pew-cavepedia-data/{01_ocr,01_pages,02_json,02_text}`
+1. At this point all newly indexed data should be OCR'd and processed.
+1. Once a day, the cavepedia application (must be running on the same host) checks for any updates:
+    1. Pulls additions or deletions from `/bigdata/archive/cavepedia/pew-cavepedia-data/00_files/`
+    1. If there are changes, deletes the local index and reindexes all documents
+
+### Offline export
+1. `./launch.sh release [tenant]` creates a local `release` directory for offline usage:
+    1. Pulls files for the respective tenant from `/bigdata/archive/cavepedia/pew-cavepedia-data/00_files/` to `./00_files/`
+    1. Indexes all tenant documents
+    1. Saves index
+
+### Multi-tenant
+1. Change the URL to have a `/{tenant}/` path segment just after the host, for example `https://trog.bigcavemaps.com/public/search` or `https://trog.bigcavemaps.com/vpi/search`
+    1. During document indexing, each document has a list of tenants. During search, only documents owned by the given tenant are returned.
+    1. Each path has its own password, specified in a hash format as env vars, e.g. `VPI_PASSWORD=asdf`, `PUBLIC_PASSWORD=`.
+    1. The `public` tenant contains all documents that are allowed to be public.
+
 ## Add more documents