pdf-renamer

What it is

A small local AI tool that reads a PDF and suggests a meaningful filename. handbuch_final_v2.pdf becomes something like Bauknecht_Waschmaschine_WA-Eco-8_Bedienungsanleitung.pdf. Runs entirely on your own machine, no cloud API, no data leaves the host.

Repo: github.com/jemimakilab/pdf-renamer.

How it came about

In a call with someone who’d just started with vibe coding. We were talking about other things, and along the way I noticed they were missing a specific tool: hundreds of PDFs on the drive, all named useless things like Scan_2024-03-12_001.pdf. An hour later, a first working version was on GitHub.

That’s the whole Layer-8 point in one example. The tech wasn’t the problem, that part took an hour. The problem was listening to what was actually missing, instead of showing off what the tools can do.

How it works

Drop a PDF in via drag-and-drop, or point it at a folder (batch mode)
A local LLM via Ollama reads the first pages
It extracts metadata and proposes a structured filename
You review the suggestion, correct fields if needed, then apply the rename
In batch mode it runs through hundreds of files at once, and you only go through the review table once

For scanned PDFs without a text layer, it falls back in cascade: pypdf first, then pdfplumber, finally Tesseract OCR with German and English language packs.

Tech stack

Python as the main language
Ollama for local LLMs (default qwen2.5:32b-instruct for good quality, qwen2.5:7b-instruct or llama3.1:8b for lighter hardware)
JSON schema constraint via Ollama’s structured output, no parsing roulette with malformed LLM answers
Docker + Docker Compose for deployment
Tesseract for OCR fallback on scans
MIT-licensed on GitHub

Configurable naming schemas

In config/schemas.yaml you can define your own templates, for example a schema for invoices, one for product manuals, one for contracts. At runtime you pick which schema applies. The LLM is guided by the schema to extract specific fields like brand, type, model number, document type.

Privacy note

Everything runs on-premise. No cloud, no API key, no data leaves the machine. The only network call goes to your own Ollama on the same host or inside the same Docker network. For people processing contracts, invoices, or other sensitive documents, this is the only clean way. Doing this with the ChatGPT or Claude API means shipping full PDF contents to a third party, often without thinking twice.

What I took from it

The speed of vibe coding isn’t the impressive part of this project. The impressive part is that the tool actually fits, because I understood the real task, not a generic version of it. The person I built it for didn’t need “a PDF tool”, they needed “a tool that fits their filing structure”. That’s the difference between an AI tool that annoys and one that actually gets used.