BioFinderLM V2.5: From On-Demand Search to a Weekly Email Digest

This is the third post in the BioFinderLM series. If you’re new here, start with the project overview, then see v1 and v2.

What Changed, and Why

After running v2 for a few weeks, two pain points emerged:

I was still running it manually. Every Monday morning I’d open a terminal, run the script, wait 10 to 15 minutes, and check the results. This is exactly the kind of task that should run itself.

Classification was wasteful at the tail end. After DPR ranking, the top papers are genuinely relevant, but by paper #200 you’re deep into noise. The LLM was dutifully classifying papers as “Low” confidence, spending API tokens to confirm what the DPR score already suggested. I needed an early stopping mechanism.

v2.5 addresses both. BioFinderLM now runs as a scheduled job that emails me a weekly digest of new high-confidence papers, and classification stops early once the DPR ranking stops producing high-confidence hits.
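A weekly HTML / plain-text digest maps naturally onto a multipart/alternative email. The sketch below uses only Python's standard `email` package. The paper dict keys (`title`, `url`, `confidence`) and the `build_digest` name are hypothetical, and it assumes that filtering out papers already reported in earlier weeks happens upstream.

```python
from email.message import EmailMessage
from html import escape


def build_digest(papers, sender, recipient):
    """Build a multipart/alternative email listing high-confidence papers.

    `papers` is assumed to be a list of dicts with hypothetical keys
    'title', 'url', and 'confidence' (e.g. "High" or "Low").
    """
    high = [p for p in papers if p["confidence"] == "High"]
    empty_note = "No new high-confidence papers this week."

    msg = EmailMessage()
    msg["Subject"] = f"BioFinderLM weekly digest ({len(high)} high-confidence)"
    msg["From"] = sender
    msg["To"] = recipient

    # Plain-text part: one bullet per paper, URL on the next line.
    lines = [f"- {p['title']}\n  {p['url']}" for p in high]
    msg.set_content("\n".join(lines) or empty_note)

    # HTML alternative: the same papers as a list of links (escaped for safety).
    items = "".join(
        f'<li><a href="{escape(p["url"])}">{escape(p["title"])}</a></li>'
        for p in high
    )
    html = f"<ul>{items}</ul>" if high else f"<p>{empty_note}</p>"
    msg.add_alternative(html, subtype="html")
    return msg
```

Sending it could be as simple as `smtplib.SMTP("localhost").send_message(msg)`, assuming a local mail relay. Clients that render HTML show the list of links; the rest fall back to the plain-text part.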

Workflow

Figure: BioFinderLM v2.5 workflow. Weekly cron on top, email digest at the tail.
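For the cron side, a hypothetical entry point might look like the sketch below: a crontab line fires it every Monday morning, and the script derives a seven-day search window from the current date. The script path, the 08:00 run time, and the `weekly_window` helper are illustrative assumptions, not BioFinderLM's actual code.

```python
import datetime as dt

# Hypothetical crontab entry (runs at 08:00 every Monday):
#   0 8 * * 1  /usr/bin/python3 /path/to/run_weekly.py


def weekly_window(today: dt.date, days: int = 7) -> tuple[dt.date, dt.date]:
    """Return a half-open [start, end) date range covering the previous `days` days."""
    start = today - dt.timedelta(days=days)
    return start, today


def main(today=None):
    today = today or dt.date.today()
    start, end = weekly_window(today)
    # In a full run, search, DPR ranking, classification, and the email step
    # would go here; this sketch only reports the window it would search.
    print(f"Searching papers published from {start} up to (not including) {end}")
    return start, end


if __name__ == "__main__":
    main()
```

A half-open window keeps consecutive weekly runs from overlapping or leaving a gap, as long as the job fires on schedule.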

Key Improvements Over v2

| Feature | v2.0 | v2.5 |
|---|---|---|
| Execution | Manual | Cron-schedulable |
| Classification | Always full run | Adaptive early stopping |
| Email digest | None | Weekly HTML / plain-text |
| DPR ranking | Articles only | Articles + Preprints |
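Folding preprints into DPR ranking mostly means scoring them on the same scale as articles so they can share one ranked list. A minimal sketch, assuming each paper already has a DPR passage embedding and the query has a DPR question embedding; the `(paper_id, embedding)` pair layout and the `rank_by_dpr` name are hypothetical. DPR's relevance score is the inner product of the two vectors.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("embedding dimensions differ")
    return sum(a * b for a, b in zip(u, v))


def rank_by_dpr(query_vec, articles, preprints):
    """Score articles and preprints against one query embedding and merge them.

    Each item is a hypothetical (paper_id, embedding) pair. Returns
    (paper_id, source, score) tuples sorted by descending DPR score.
    """
    scored = [
        (dot(query_vec, emb), pid, source)
        for source, items in (("article", articles), ("preprint", preprints))
        for pid, emb in items
    ]
    # Higher inner product means more relevant; one list for both sources.
    scored.sort(key=lambda t: t[0], reverse=True)
    return [(pid, source, score) for score, pid, source in scored]
```

Keeping the source label on each result lets the digest mark preprints separately without a second ranking pass.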

Adaptive Classification: Early Stopping

The core idea is simple: if the LLM has classified the last several papers (sorted by DPR score) and none of them came back as “High” confidence, continuing has diminishing returns, so classification stops there. In practice, this noticeably cuts classification time on unfocused queries, where only a small slice of the ranked list is relevant, while leaving highly active fields mostly unaffected, since high-confidence papers keep appearing throughout the ranking.
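Here is a minimal sketch of that stopping rule. It assumes papers arrive already sorted by DPR score and that a hypothetical `classify` callable wraps one LLM call and returns a label such as “High” or “Low”. The post doesn't pin down how many papers count as “the last several”, so `patience` is a hypothetical parameter with an illustrative default.

```python
from typing import Callable, Iterable


def classify_with_early_stop(
    ranked_papers: Iterable[str],
    classify: Callable[[str], str],
    patience: int = 20,  # hypothetical window size, not from the post
) -> list[tuple[str, str]]:
    """Classify papers in DPR order, stopping once `patience` consecutive
    papers have come back without a "High" label."""
    results = []
    since_last_high = 0
    for paper in ranked_papers:
        label = classify(paper)  # one LLM call per paper
        results.append((paper, label))
        if label == "High":
            since_last_high = 0  # a hit resets the window
        else:
            since_last_high += 1
            if since_last_high >= patience:
                break  # the tail is likely noise; stop spending tokens
    return results
```

Counting papers since the last “High” hit, rather than checking a fixed tail, means a single late high-confidence paper resets the window, which is why active fields are barely affected.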
