Patense.local

100% private, local document analysis with LLMs.

Tech Stack:

  • vLLM
  • Next.js
  • TypeScript
  • React
  • tRPC
  • Prisma

Features

  • Inventive Feature Extraction
  • Deep Reference Search

AI Deep Search

A lot of what patent attorneys do is search documents for various features. The problem is that this takes time, and people get tired or distracted. An LLM can read WAY faster (hundreds to thousands of words per second).

So why not have the LLM search the documents? It's the same idea as in Patense.ai: enter a feature, break all the references into chunks (to improve the accuracy of the LLM's responses), and ask the LLM whether each chunk discloses or suggests the feature (the standard for getting a patent).
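Here's a minimal sketch of that chunk-and-ask loop in TypeScript, assuming vLLM is serving its OpenAI-compatible API on localhost:8000; the chunk size, model name, and prompt wording are illustrative, not the exact ones used in the app.

```ts
// Split a reference into fixed-size word chunks so each prompt stays small
// and the model's answer is grounded in a short passage.
function chunkText(text: string, wordsPerChunk = 300): string[] {
  const words = text.split(/\s+/);
  const chunks: string[] = [];
  for (let i = 0; i < words.length; i += wordsPerChunk) {
    chunks.push(words.slice(i, i + wordsPerChunk).join(" "));
  }
  return chunks;
}

// Ask the local model whether a single chunk discloses or suggests the feature.
// Assumes vLLM's OpenAI-compatible server is running on localhost:8000.
async function checkChunk(feature: string, chunk: string): Promise<boolean> {
  const res = await fetch("http://localhost:8000/v1/chat/completions", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      model: "meta-llama/Meta-Llama-3-8B-Instruct", // illustrative model name
      messages: [
        {
          role: "user",
          content:
            `Feature: ${feature}\n\nPassage: ${chunk}\n\n` +
            "Does this passage disclose or suggest the feature? Answer YES or NO.",
        },
      ],
      max_tokens: 5,
      temperature: 0,
    }),
  });
  const data = await res.json();
  return data.choices[0].message.content.trim().toUpperCase().startsWith("YES");
}
```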

Using this technique with a 3090, I was able to extract relevant paragraphs from ~60 pages of patents in 40 seconds.

This 'reads' all the documents and identifies any sections that may be relevant to your potential inventive feature. Since LLMs hallucinate, I also include the source text with each result for easy verification.
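One way to carry that source text through to the results is to return it with every hit; this shape is illustrative, not the app's actual schema.

```ts
// Each hit keeps the verbatim chunk so the attorney can check the model's
// claim against the actual reference text instead of trusting it blindly.
interface DeepSearchHit {
  referenceName: string; // which patent or publication the chunk came from
  feature: string;       // the inventive feature being searched for
  sourceText: string;    // the exact chunk the model flagged as relevant
}
```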

Inventive Feature Extraction

This uses a similar technique, but runs it over your own Specification.

This extracts every possible feature so you can get an overview of potential strategies and easily search the references (your own disclosure defines the possible features you can use to distinguish your invention from the prior art).
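A sketch of what that extraction step could look like, reusing the same local vLLM endpoint as the deep search; the prompt wording and function name are illustrative.

```ts
// Ask the model to list every distinct feature a specification chunk supports.
// Assumes the same vLLM OpenAI-compatible server on localhost:8000.
async function extractFeatures(specChunk: string): Promise<string[]> {
  const res = await fetch("http://localhost:8000/v1/chat/completions", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      model: "meta-llama/Meta-Llama-3-8B-Instruct", // illustrative model name
      messages: [
        {
          role: "user",
          content:
            "List every distinct technical feature disclosed in the following " +
            `specification passage, one per line:\n\n${specChunk}`,
        },
      ],
      temperature: 0,
    }),
  });
  const data = await res.json();
  return data.choices[0].message.content
    .split("\n")
    .map((line: string) => line.trim())
    .filter((line: string) => line.length > 0);
}
```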

The Tech

Originally this used a naive queue; then I learned about vLLM and continuous batching. The short version: a graphics card has space for the model and space for the prompts being processed. If there's space left over, you can stuff in more prompts. When those prompts finish, space opens up. Continuous batching is the process of automatically refilling the graphics card with new prompts as old ones finish and free memory. This led to massive (roughly 8x) speed gains.
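The practical upshot on the client side is that you can fire every chunk request at once and let vLLM's scheduler keep the GPU full. A sketch, reusing the hypothetical checkChunk helper from the deep-search example above:

```ts
// Fire all chunk requests concurrently; vLLM's continuous batching packs them
// onto the GPU and backfills with waiting prompts as earlier ones finish.
async function searchReferences(feature: string, chunks: string[]) {
  const results = await Promise.all(
    chunks.map(async (chunk) => ({
      sourceText: chunk,
      relevant: await checkChunk(feature, chunk),
    })),
  );
  return results.filter((r) => r.relevant);
}
```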

The original project, Patense.ai, used batched concurrent API calls to ChatGPT. I chose not to run models locally because (1) the target user likely doesn't have the compute or the know-how to run them locally, and (2) parallel API calls were way faster.

But with continuous batching, local LLMs have a ~0.5-1.0 second response time (at least with an 8B model), which makes them much faster than OpenAI at my current price tier.

I think the entire process of obtaining a patent can be automated (or at least, seriously augmented) with LLMs. I'm building out tools for the various macro actions needed.