BioFinderLM V2.5: From On-Demand Search to a Weekly Email Digest
This is the third post in the BioFinderLM series. If you’re new here, start with the project overview, then see v1 and v2.
What Changed, and Why
After running v2 for a few weeks, two pain points emerged:
1. I was still running it manually. Every Monday morning I’d open a terminal, run the script, wait 10 to 15 minutes, and check the results. This is exactly the kind of task that should run itself.
2. Classification was wasteful at the tail end. After DPR ranking, the top papers are genuinely relevant, but by paper #200 you’re deep into noise. The LLM was dutifully classifying papers as “Low” confidence, spending API tokens to confirm what the DPR score already suggested. I needed an early stopping mechanism.
v2.5 addresses both by turning BioFinderLM into a scheduled job that emails me a weekly digest of new high-confidence papers.
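To make the digest step concrete, here is a minimal sketch of how a weekly email with both a plain-text and an HTML body could be assembled using only Python's standard `email` package. It assumes each paper has already been classified. The `Paper` record, its fields, and `build_digest` are hypothetical names for illustration, not BioFinderLM's actual code.

```python
from dataclasses import dataclass
from email.message import EmailMessage
from html import escape


@dataclass
class Paper:
    # Hypothetical record; field names are illustrative, not BioFinderLM's schema.
    title: str
    url: str
    confidence: str  # e.g. "High", "Medium", or "Low"


def build_digest(papers, sender, recipient):
    """Build a multipart/alternative email listing only High-confidence papers."""
    high = [p for p in papers if p.confidence == "High"]

    # Plain-text body: one numbered entry per paper, or "(none)" if empty.
    lines = [f"{i}. {p.title}\n   {p.url}" for i, p in enumerate(high, start=1)]
    text = "New high-confidence papers this week:\n\n" + "\n".join(lines or ["(none)"])

    # HTML body: the same papers as a list of links, with titles and URLs escaped.
    items = "".join(
        f'<li><a href="{escape(p.url)}">{escape(p.title)}</a></li>' for p in high
    )
    html = f"<h2>New high-confidence papers this week</h2><ol>{items}</ol>"

    msg = EmailMessage()
    msg["Subject"] = f"BioFinderLM weekly digest ({len(high)} new)"
    msg["From"] = sender
    msg["To"] = recipient
    msg.set_content(text)                      # text/plain part
    msg.add_alternative(html, subtype="html")  # adds text/html; message becomes multipart/alternative
    return msg
```

Mail clients that render HTML show the linked list, and the rest fall back to the plain-text part. Sending could then go through the standard library's `smtplib`, e.g. `smtplib.SMTP(host).send_message(msg)`, where the SMTP host and credentials are deployment-specific assumptions.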
Workflow
Figure: BioFinderLM v2.5 workflow, with the weekly cron trigger at the top and the email digest at the end.
Key Improvements Over v2
| Feature | v2.0 | v2.5 |
|---|---|---|
| Execution | Manual | Cron-schedulable |
| Classification | Always full run | Adaptive early stopping |
| Email digest | ❌ | ✅ Weekly HTML / plain-text |
| DPR ranking | Articles only | Articles + Preprints |
Adaptive Classification: Early Stopping
The core idea is simple: if the LLM has classified the last several papers (sorted by DPR score) and none of them are “High” confidence, continuing has diminishing returns. In practice, this noticeably cuts classification time on unfocused queries, where only a small slice of the retrieved literature is relevant, while leaving highly active fields largely unaffected, since high-confidence papers keep appearing throughout the ranking.
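The stopping rule can be sketched as a loop over papers already sorted by DPR score, highest first, that gives up after a run of non-High labels. Here `classify` stands in for the LLM call, and the function name and the `patience` parameter (including its default of 20) are hypothetical, since the post does not say how many papers "several" is.

```python
from collections.abc import Callable, Iterable


def classify_with_early_stopping(
    papers: Iterable[str],
    classify: Callable[[str], str],
    patience: int = 20,  # hypothetical window size; "several" in the text
) -> list[tuple[str, str]]:
    """Classify papers in DPR order; stop after `patience` consecutive non-High labels."""
    results = []
    misses = 0  # consecutive papers without a "High" label
    for paper in papers:
        label = classify(paper)  # one LLM call per paper
        results.append((paper, label))
        if label == "High":
            misses = 0  # a hit resets the counter
        else:
            misses += 1
            if misses >= patience:
                break  # remaining papers rank lower; skip their LLM calls
    return results
```

Tracking consecutive misses is equivalent to checking that none of the last `patience` labels is "High", but it needs only a counter instead of a window buffer. In an active field the counter keeps resetting, so the loop runs to the end of the ranking, which matches the behavior described above.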