split pdf

2025-05-26 08:54:13 -04:00
parent e0c6eef76d
commit aeae900cae
4 changed files with 67 additions and 16 deletions

@@ -3,5 +3,6 @@
https://min.io/docs/minio/linux/developers/python/API.html#presigned-get-object-bucket-name-object-name-expires-timedelta-days-7-response-headers-none-request-date-none-version-id-none-extra-query-params-none
## TODO
- if pages > 100 -> chunk to cavepedia-v2-scratch -> collect content
- cavepedia-v2 ->
- split pdfs -> chunk and write to cavepedia-v2-pages ->
- cohere embedding limits TODO
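
A minimal sketch of the chunking step in the TODO above, assuming the 100-page threshold from the first item is also the chunk size. The function name `chunk_page_ranges` and its parameters are hypothetical and do not come from the repository; actually reading and writing PDF pages would need a PDF library, which is left out here.

```python
from typing import Iterator, Tuple


def chunk_page_ranges(
    total_pages: int, chunk_size: int = 100
) -> Iterator[Tuple[int, int]]:
    """Yield (start, end) page ranges covering a document.

    Ranges are zero-based and end-exclusive, so they can be passed
    directly to range() or used as slice bounds. The default of 100
    is an assumption taken from the "pages > 100" TODO item.
    """
    if total_pages < 0:
        raise ValueError("total_pages must be non-negative")
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    # Step through the document one chunk at a time; the last chunk
    # is clipped so it never runs past the final page.
    for start in range(0, total_pages, chunk_size):
        yield start, min(start + chunk_size, total_pages)


if __name__ == "__main__":
    # A 250-page PDF splits into two full chunks and one partial chunk.
    for start, end in chunk_page_ranges(250):
        print(f"pages {start}..{end - 1}")
```

A document of exactly 100 pages yields a single range, which matches the TODO's condition that only PDFs with more than 100 pages need splitting.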