BioFinderLM: An AI Literature Discovery Engine for Biological Research
The Problem
If you work in biology, you know the feeling. You open PubMed with a specific question, say, “what computational tools exist for integrating spatial transcriptomics across tissues?”, and you get back 400+ results. Two hours later you’ve skimmed a mountain of abstracts to find maybe ten papers that are actually relevant. In fast-moving subfields (spatial transcriptomics, single-cell genomics, multi-omics integration), new preprints appear daily and the signal-to-noise ratio of keyword search is painful.
BioFinderLM is my attempt at fixing this for myself. What started as a weekend script in early 2025 has gone through three iterations, each one solving a real limitation I ran into with the previous version.
The Three Versions at a Glance
| | v1.0 | v2.0 | v2.5 |
|---|---|---|---|
| Data source | PubMed + PMC | Europe PMC | Europe PMC |
| Semantic ranking | ❌ | ✅ DPR embeddings | ✅ DPR embeddings |
| Adaptive early stopping | ❌ | ❌ | ✅ |
| Automation | ❌ Manual | ❌ Manual | ✅ Cron + email |
Each version is covered in more detail in its own post:
- BioFinderLM v1: natural-language query, Boolean generation, PubMed/PMC retrieval, LLM relevance classification.
- BioFinderLM v2: switches to Europe PMC for broader coverage and adds Dense Passage Retrieval (DPR) ranking via Gemini embeddings, so the most semantically relevant papers are classified first.
- BioFinderLM v2.5: adds adaptive early stopping and a cron-scheduled weekly email digest of new high-confidence papers, so the tool runs on its own.
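To make the first two steps of that pipeline concrete, here is a minimal sketch, not the actual BioFinderLM code. It assumes the natural-language question has already been reduced to groups of synonyms (how those groups are produced is left out), and `build_boolean_query` and `build_search_url` are hypothetical helper names. The endpoint is Europe PMC's public REST search service; the sketch only builds the request URL and does not send it.

```python
from urllib.parse import urlencode

# Public Europe PMC REST search endpoint.
EUROPE_PMC_SEARCH = "https://www.ebi.ac.uk/europepmc/webservices/rest/search"


def build_boolean_query(concept_groups):
    """OR the synonyms inside each concept, then AND the concepts together.

    `concept_groups` is a list of lists of strings (an assumed input shape).
    """
    clauses = []
    for synonyms in concept_groups:
        quoted = [f'"{term}"' for term in synonyms]
        clauses.append("(" + " OR ".join(quoted) + ")")
    return " AND ".join(clauses)


def build_search_url(query, page_size=100):
    """Return a Europe PMC search URL that asks for JSON 'lite' records."""
    params = {
        "query": query,
        "format": "json",
        "resultType": "lite",
        "pageSize": page_size,
    }
    return f"{EUROPE_PMC_SEARCH}?{urlencode(params)}"


example_query = build_boolean_query(
    [["spatial transcriptomics"], ["integration", "alignment"]]
)
# example_query is:
# ("spatial transcriptomics") AND ("integration" OR "alignment")
example_url = build_search_url(example_query, page_size=50)
```

Keeping query construction separate from the HTTP call makes it easy to inspect the generated Boolean string before spending any retrieval or classification budget on it.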
What I Learned Across the Versions
Start with the data source. The single highest-leverage change across the project was switching from PubMed/PMC to Europe PMC in v2. A ranker can only order the papers that retrieval returns, so broader coverage mattered more than better ranking.
Ranking before classification is worth it. Sorting by DPR similarity before sending papers to the LLM classifier added a few seconds of latency but saved real API cost and made results feel better ordered.
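The idea of ranking first and classifying second can be sketched as follows. This is an illustration under stated assumptions, not the tool's real code: embeddings are plain lists of floats from whatever model you use, `is_relevant` stands in for the LLM classifier call, and the stopping rule (stop after `patience` consecutive irrelevant papers) is my own placeholder for "adaptive early stopping", since the rule itself is not spelled out here.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as unrelated to everything
    return dot / (norm_a * norm_b)


def rank_by_similarity(query_vec, papers):
    """Sort papers from most to least similar to the query embedding."""
    return sorted(
        papers,
        key=lambda p: cosine_similarity(query_vec, p["embedding"]),
        reverse=True,
    )


def classify_ranked(ranked_papers, is_relevant, patience=3):
    """Classify in rank order; stop after `patience` consecutive misses.

    Returns the relevant papers and the number of classifier calls made.
    The patience rule is a hypothetical stopping criterion.
    """
    relevant, misses, calls = [], 0, 0
    for paper in ranked_papers:
        calls += 1
        if is_relevant(paper):
            relevant.append(paper)
            misses = 0
        else:
            misses += 1
            if misses >= patience:
                break
    return relevant, calls
```

Because the most similar papers are classified first, a run of misses is a reasonable signal that the remaining tail is unlikely to contain much, which is where the savings in classifier calls would come from.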
Weekly automation turned it into a habit. Once v2.5 was emailing a digest every week, BioFinderLM went from an on-demand utility to something I actually use regularly.
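For a sense of what the digest step involves, here is a minimal sketch of selecting papers for one weekly email. It is an assumption-laden illustration: the record fields (`title`, `published`, `confidence`), the 0.8 threshold, and the function names are all hypothetical, and scheduling and sending the email are left out.

```python
from datetime import date, timedelta


def select_for_digest(papers, today, min_confidence=0.8, window_days=7):
    """Keep papers published in the last `window_days` days whose
    classifier confidence is at least `min_confidence`, best first."""
    cutoff = today - timedelta(days=window_days)
    fresh = [
        p
        for p in papers
        if cutoff < p["published"] <= today
        and p["confidence"] >= min_confidence
    ]
    return sorted(fresh, key=lambda p: p["confidence"], reverse=True)


def format_digest(papers):
    """Render the selected papers as a plain-text email body."""
    lines = [f"{len(papers)} new high-confidence paper(s) this week:"]
    for p in papers:
        lines.append(f"- {p['title']} (confidence {p['confidence']:.2f})")
    return "\n".join(lines)
```

Passing `today` in explicitly, rather than calling `date.today()` inside the function, keeps the selection logic easy to test and to re-run for a past week.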