I think Gemini Flash 1.5 is the best closed-source model for this. Very cheap, particularly compared to GPT-4o mini, which is priced the same as GPT-4 for image input tokens. Performance and speed are excellent. I convert each PDF page to an image and send one request per page to Flash (asynchronously). The prompt asks for Markdown output with specific formatting guidelines. For my application (mainly PDF slideshows with little text), the output is better than any of the dedicated tools I tested, particularly for equations and tables.
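A minimal sketch of that per-page flow, assuming the google-generativeai SDK and pdf2image (which needs poppler); the model name, prompt, and key below are placeholders, not the exact setup described above:

```python
# Sketch: render each PDF page to an image, then send one async Gemini Flash
# request per page asking for Markdown output.
import asyncio

import google.generativeai as genai
from pdf2image import convert_from_path

genai.configure(api_key="YOUR_API_KEY")  # placeholder
model = genai.GenerativeModel("gemini-1.5-flash")

PROMPT = (
    "Transcribe this slide to Markdown. Use $...$ for equations, "
    "pipe tables for tabular data, and preserve heading levels."
)  # illustrative formatting guidelines, not the original prompt

async def transcribe_page(image) -> str:
    # generate_content_async takes a list of parts (text + PIL image).
    response = await model.generate_content_async([PROMPT, image])
    return response.text

async def transcribe_pdf(path: str) -> list[str]:
    pages = convert_from_path(path, dpi=200)  # one PIL image per page
    return await asyncio.gather(*(transcribe_page(p) for p in pages))

# markdown_pages = asyncio.run(transcribe_pdf("slides.pdf"))
```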
> I convert each pdf page to an image and send one request per page to Flash
Why convert? Flash 1.5 accepts whole PDFs just fine. Sending the whole PDF will also improve the model's response accuracy.
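For comparison, a sketch of the whole-PDF approach with the same SDK; inline bytes suit small files, while larger ones would normally go through the File API (file name and prompt are illustrative):

```python
# Sketch: send the whole PDF in a single request instead of per-page images.
import pathlib

import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")  # placeholder
model = genai.GenerativeModel("gemini-1.5-flash")

pdf_bytes = pathlib.Path("slides.pdf").read_bytes()  # hypothetical input file
response = model.generate_content([
    {"mime_type": "application/pdf", "data": pdf_bytes},
    "Transcribe this document to Markdown, one section per page.",
])
print(response.text)
```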
Context: I have found Flash 1.5 excellent and stable for this kind of use case. Even at a non-EA price point it's incredibly cheap, especially when utilizing Batch Prediction Jobs (50% discount!).
In my experience, at this point all the flagship multi-modal LLMs provide about the same accuracy. I see very little, if any, drift in output between them, especially if you have your prompts dialed in.
For the Gemini Flash 1.5 model, GCP pricing[0] treats each PDF page as an image, so you're looking at a price per image ($0.00002) plus the character count ($0.00001875 per 1k characters) of the base64 string encoding of the entire PDF and the context you provide.
10-page PDF ($0.0002) + ~3,000 characters of context/base64 ($0.00005625) = $0.00025625
Cut that in half if you utilize Batch Prediction Jobs[1], and even at scale you're looking at a rounding error in costs.
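As a back-of-the-envelope helper, the arithmetic above (rates taken from the quoted pricing, batch discount as a flag) looks like:

```python
# Rough cost estimate using the rates quoted above: $0.00002 per PDF page
# (priced as an image) and $0.00001875 per 1k characters of input.
def estimate_cost(pages: int, characters: int, batch: bool = False) -> float:
    cost = pages * 0.00002 + (characters / 1000) * 0.00001875
    return cost / 2 if batch else cost  # Batch Prediction Jobs are ~50% off

print(estimate_cost(10, 3_000))              # 0.00025625, matching the example
print(estimate_cost(10, 3_000, batch=True))  # ~0.000128
```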
For ongoing accuracy tracking I take a fixed proportion of the generations (say 1%, or 10 PDFs for every 1,000) and run them through an evaluation[2] workflow. Depending on how/what you're extracting from the PDFs the eval method will change, but I find that for "unstructured to structured" use cases a fulfillment evaluation is a fair test.
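The sampling step itself is trivial; a sketch, where `run_fulfillment_eval` is just a placeholder for whatever evaluation workflow you plug in (not a real library call):

```python
# Spot-check a fixed proportion of generations for ongoing accuracy tracking.
import random

def sample_for_eval(generations: list[dict], proportion: float = 0.01) -> list[dict]:
    k = max(1, int(len(generations) * proportion))
    return random.sample(generations, k)

def run_fulfillment_eval(record: dict) -> float:
    # Placeholder: plug in your eval (e.g. a model-based fulfillment metric).
    raise NotImplementedError

# scores = [run_fulfillment_eval(r) for r in sample_for_eval(all_generations)]
```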
Essentially what you're already doing, with one more step :) Get PDF > convert (read: rebuild) to TIFF > convert back to PDF.
In my case all documents to be sent to the LLM (PDFs/images/emails/etc.) are already staged in a file repository as part of a standard storage process. This entails every document being converted into a TIFF (read: rebuilt cleanly) for storage, and then into a PDF upon export. This ensures that all docs are well-formed and don't carry over whatever went into originally creating them. I've found that any number of "PDF" documents are not actually PDFs, while others try to enforce some "protection" that the LLM doesn't handle well.
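One way to sketch that round-trip, using pdf2image to rasterize and Pillow to re-export (file names and DPI are illustrative, not the actual pipeline):

```python
# PDF -> TIFF -> PDF: rasterize every page, store a multi-frame TIFF, then
# re-export an image-only PDF that carries none of the original file's baggage.
from pdf2image import convert_from_path
from PIL import Image, ImageSequence

def rebuild_pdf(src_pdf: str, tiff_path: str, out_pdf: str, dpi: int = 300) -> None:
    pages = convert_from_path(src_pdf, dpi=dpi)  # one PIL image per page
    pages[0].save(
        tiff_path,
        save_all=True,
        append_images=pages[1:],
        compression="tiff_lzw",
    )
    with Image.open(tiff_path) as tiff:
        frames = [frame.convert("RGB") for frame in ImageSequence.Iterator(tiff)]
    frames[0].save(out_pdf, save_all=True, append_images=frames[1:])

# rebuild_pdf("incoming.pdf", "staged.tiff", "export.pdf")
```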
Cheap for now. One day, once market shares balance out, cloud spend will increase. Local LLMs may be worth prioritizing for code that will still be running multiple subscription cycles into the future.
Edit: oh, you wrote best closed-source model, whoops.