Patense.ai

Tech Stack:
Next.js
TypeScript
React
tRPC
Stripe
AWS Lambda
OpenAI

Patense.ai is a patent prosecution tool to help lawyers analyze office actions.

Patent 101

If someone already published your thing, you can't patent it. If you find a feature or combination of features that no one else has published, you can patent it (if it's not obvious).

During the application process, patent examiners find relevant references. Patent lawyers then either convince the examiner the references don't say the same thing or amend the claims to differentiate them from the references.

Tool Overview

Patense.ai extracts every possible inventive feature from a specification (the initial document filed that reserves your place in time), searches the references for each feature, and uses GPTs to analyze whether the feature is disclosed or not. This essentially creates a map of every possible amendment, saving hours of attorney time.

V1 - Naive

V1 executed a single walk through the specification. 1-2 page chunks were sent one at a time to GPT-3.5 along with a running list of features. This was slow and didn't scale to 100+ page specifications, even after upgrading to premium Vercel hosting (5-minute serverless functions) and then moving to AWS serverless (15-minute max runtime).
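A rough sketch of why this approach is slow, not the actual V1 code. The `extract` parameter is a hypothetical stand-in for the GPT-3.5 call, and deduplicating by exact string match is an assumption made to keep the example small.

```typescript
// Hypothetical stand-in for one GPT-3.5 call: given a chunk and the features
// found so far, return the features found in this chunk.
type ExtractWithContext = (chunk: string, known: string[]) => Promise<string[]>;

async function extractSequentially(
  chunks: string[],
  extract: ExtractWithContext
): Promise<string[]> {
  let features: string[] = [];
  for (const chunk of chunks) {
    // Each call needs the running list, so it cannot start until the
    // previous call has finished. Total time grows linearly with chunk count.
    const found = await extract(chunk, features);
    // Assumption: exact-match dedup; a real list would need fuzzier merging.
    features = Array.from(new Set(features.concat(found)));
  }
  return features;
}
```

Because every call depends on the one before it, nothing here can run in parallel, which is what pushes long specifications past serverless time limits.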

V2 - O(log(n))

V2 split the entire specification into short 1-2 page chunks and sent all chunks in parallel to separate GPT calls, each asked to extract every inventive feature. The resulting lists of features were then recursively combined two at a time with more GPT calls. Because every merge in a round runs in parallel, the number of consolidation rounds (and so, roughly, the wall-clock time) is O(log(n)): if we doubled our input length from 100 to 200 pages, we're only adding one additional cycle of consolidation (s/o merge sort for the inspiration). The total number of GPT calls still grows linearly with page count. I still ran into OpenAI API rate limits, but those can be managed with timeouts and money since I was well within the runtime limitations on AWS.
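The parallel extract and pairwise merge can be sketched as follows. This is an illustration under assumptions, not the production code: `extract` and `merge` are hypothetical stand-ins for the GPT calls, and the recursion is written as a loop so the round count is easy to see.

```typescript
// Hypothetical stand-ins for the two kinds of GPT calls.
type Extract = (chunk: string) => Promise<string[]>;
type Merge = (a: string[], b: string[]) => Promise<string[]>;

async function extractAndConsolidate(
  chunks: string[],
  extract: Extract,
  merge: Merge
): Promise<{ features: string[]; rounds: number }> {
  // Round 0: every chunk is processed at the same time.
  let lists = await Promise.all(chunks.map((chunk) => extract(chunk)));
  let rounds = 0;

  // Each round halves the number of lists, so there are about log2(n) rounds.
  while (lists.length > 1) {
    const pending: Promise<string[]>[] = [];
    for (let i = 0; i < lists.length; i += 2) {
      pending.push(
        i + 1 < lists.length
          ? merge(lists[i], lists[i + 1]) // combine a pair
          : Promise.resolve(lists[i]) // odd list out carries over unchanged
      );
    }
    lists = await Promise.all(pending);
    rounds++;
  }
  return { features: lists.length > 0 ? lists[0] : [], rounds };
}
```

With 8 chunks this takes 3 merge rounds, and with 16 chunks it takes 4, which is the "one additional cycle" when the input doubles.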

Vector Databases

Once all the possible inventive features are extracted, the cited references are split into small chunks, converted to vectors (embeddings), and stored. This lets you quickly get relevant sections of text based on a query. The query is embedded with the same model, which places it in the same vector space as the references. From here, the closest vectors of text are likely the most relevant.
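To make "closest vectors" concrete, here is a minimal sketch of a nearest-neighbor lookup. It assumes cosine similarity as the distance measure and a brute-force scan; the write-up doesn't say which metric or vector store was used, and the `EmbeddedChunk` type and function names are illustrative only.

```typescript
interface EmbeddedChunk {
  text: string;
  vector: number[]; // embedding of `text`, produced elsewhere
}

// Cosine similarity: 1 means same direction, 0 means unrelated.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Return the k chunks whose vectors are most similar to the query vector.
function topK(query: number[], chunks: EmbeddedChunk[], k: number): EmbeddedChunk[] {
  return chunks
    .map((chunk) => ({ chunk, score: cosineSimilarity(query, chunk.vector) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k)
    .map((scored) => scored.chunk);
}
```

A real vector DB replaces the brute-force scan with an index, but the ranking idea is the same.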

I used a 'parent document retriever' which is a little more sophisticated. It splits the text into chunks, then splits those chunks into even smaller chunks. The smaller chunks are queried for the search but it returns the larger chunks, providing more context to the model.
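The idea behind a parent document retriever can be sketched without any library. This is not LangChain's implementation: the fixed-size character splitting, the `score` callback (standing in for vector similarity), and all names are assumptions for illustration.

```typescript
interface ChildChunk {
  text: string;
  parentIndex: number; // which larger chunk this piece came from
}

interface ParentIndex {
  parents: string[];
  children: ChildChunk[];
}

// Naive fixed-size splitter (assumption: real splitters respect sentence
// or paragraph boundaries).
function splitBySize(text: string, size: number): string[] {
  const out: string[] = [];
  for (let i = 0; i < text.length; i += size) out.push(text.slice(i, i + size));
  return out;
}

function buildIndex(text: string, parentSize: number, childSize: number): ParentIndex {
  const parents = splitBySize(text, parentSize);
  const children: ChildChunk[] = [];
  parents.forEach((parent, parentIndex) => {
    for (const piece of splitBySize(parent, childSize)) {
      children.push({ text: piece, parentIndex });
    }
  });
  return { parents, children };
}

// Search over the small chunks, but return the larger chunks they belong to.
function retrieveParents(
  index: ParentIndex,
  score: (childText: string) => number,
  k: number
): string[] {
  const ranked = index.children.slice().sort((a, b) => score(b.text) - score(a.text));
  const picked: number[] = [];
  for (const child of ranked) {
    if (picked.length >= k) break;
    if (picked.indexOf(child.parentIndex) === -1) picked.push(child.parentIndex);
  }
  return picked.map((i) => index.parents[i]);
}
```

Small chunks make the match precise, while returning the parent gives the model enough surrounding text to judge whether a feature is really disclosed.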

We get the most relevant chunks of references, pass them to GPT with each feature, and ask if the feature is disclosed by the text. Those responses are stored in a report so the user can easily tell which features make for good amendments and which don't.

Using a vector DB like this was fast, but ultimately it wasn't good enough at retrieving the relevant text for each feature. There's no guarantee that the vector query will return the same passages of a document that the examiner relied on.

V3 - $$$

Vector DBs aren't viable, so what is?

What's more precise than a vector search? Sending the whole text in as context. But we can't do that for every feature; that would be insanely expensive and wasteful.

But we don't need to search for EVERY feature, we just need to search for the best one. How do we determine that? We let the user decide.

Now we essentially pass in a feature and use LLM calls to ask "Does this page disclose this feature?". We take all the references a user uploads, split them into single pages, and pass everything to an LLM in a few hundred simultaneous calls. With some prompt engineering and regexes, we turn the output into something we can manage.
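A minimal sketch of the page-by-page search and the regex parsing. The prompt wording, the `DISCLOSED:` / `QUOTE:` answer format, and the `ask` function (a stand-in for the LLM call) are all assumptions for illustration; the real prompts and patterns aren't shown in this write-up.

```typescript
interface PageVerdict {
  page: number; // 1-based page number
  disclosed: boolean;
  quote: string | null;
}

// Hypothetical prompt that pins the model to a format a regex can parse.
function buildPrompt(feature: string, pageText: string): string {
  return [
    "Does this page disclose this feature?",
    "Feature: " + feature,
    "Page: " + pageText,
    'Answer exactly as "DISCLOSED: yes" or "DISCLOSED: no",',
    'then optionally "QUOTE: <supporting text>".',
  ].join("\n");
}

// Returns null when the model ignored the requested format.
function parseVerdict(page: number, raw: string): PageVerdict | null {
  const verdict = /DISCLOSED:\s*(yes|no)\b/i.exec(raw);
  if (!verdict) return null;
  const quote = /QUOTE:\s*"?([^"\n]+)"?/i.exec(raw);
  return {
    page,
    disclosed: verdict[1].toLowerCase() === "yes",
    quote: quote ? quote[1].trim() : null,
  };
}

async function searchPages(
  feature: string,
  pages: string[],
  ask: (prompt: string) => Promise<string>
): Promise<PageVerdict[]> {
  // One call per page, all started at once.
  const raw = await Promise.all(pages.map((p) => ask(buildPrompt(feature, p))));
  return raw
    .map((text, i) => parseVerdict(i + 1, text))
    .filter((v): v is PageVerdict => v !== null); // drop unparseable answers
}
```

Dropping unparseable answers silently is a simplification; retrying those pages would be a reasonable alternative.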

App Structure

The user uploads a PDF of the specification and all the cited references. Documents are stored using UploadThing, a developer-friendly wrapper on AWS S3. From there, some LangChain functions verify the PDFs have proper recognized text before proceeding.

Payments are handled with Stripe, using a subscription model with an allotted 2 million monthly tokens, and then additional tokens billed at $10/million. Authentication is handled with Clerk. Queries take a while and run on an AWS Lambda function with a 15-minute max runtime to handle the ~300 GPT calls and inevitable rate-limiting (until I give OpenAI all my money).
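The overage arithmetic works out like this. The sketch assumes additional tokens are billed pro rata per token rather than rounded up to whole millions, which the pricing above doesn't specify; the constant and function names are illustrative.

```typescript
const INCLUDED_TOKENS = 2_000_000; // monthly allotment in the subscription
const USD_PER_MILLION_OVERAGE = 10; // price of additional tokens

// Assumption: overage is prorated per token, not rounded up to a full million.
function overageUsd(tokensUsed: number): number {
  const extra = Math.max(0, tokensUsed - INCLUDED_TOKENS);
  return (extra / 1_000_000) * USD_PER_MILLION_OVERAGE;
}
```

For example, a month with 3.5 million tokens is 1.5 million over the allotment, so `overageUsd(3_500_000)` returns 15, while any usage at or under 2 million returns 0.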

Users can let AI extract every possible inventive element for them, or just use it as a high-precision search function. Outputs are shaped with prompt engineering, parsed with regexes, then displayed for the user.