Usage Examples#
The Three-Step Workflow#
Review Buddy provides a simple, production-ready workflow for systematic literature reviews:
1. Fetch Metadata - Search databases and collect paper information
2. Filter Abstracts - Exclude unwanted papers (optional but recommended)
3. Download PDFs - Automatically retrieve full-text papers
Step 1: Fetch Paper Metadata#
Configure Your Search#
Edit 01_fetch_metadata.py to customize your search:
from pathlib import Path

# ============= CONFIGURATION =============
# Option 1: Inline query (simple)
QUERY = "machine learning AND healthcare"
# Option 2: From text file (recommended for complex queries)
# Keep only one QUERY line active; if both are, this one overrides Option 1.
QUERY = Path("query.txt").read_text(encoding="utf-8").strip()
YEAR_FROM = 2020  # Starting year
MAX_RESULTS_PER_SOURCE = 50  # Per database (use 999999 for unlimited)
OUTPUT_DIR = Path("results")
Create query.txt (Recommended)#
For complex queries, create a query.txt file with multi-line formatting:
(
"machine learning" OR "deep learning" OR "artificial intelligence"
)
AND
(
healthcare OR medical OR clinical
)
NOT
(
review OR "systematic review"
)
See docs/QUERY_SYNTAX.md for more examples.
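A multi-line query like the one above is easy to break with a stray parenthesis or quote. The following is a minimal sketch of a pre-flight check you could run before the search; `check_query` is a hypothetical helper written for this example, not part of Review Buddy, and it only checks balance, not whether a database accepts the query.

```python
def check_query(text: str) -> list:
    """Return a list of problems; an empty list means parentheses and quotes balance."""
    problems = []
    depth = 0
    in_quote = False
    for ch in text:
        if ch == '"':
            in_quote = not in_quote
        elif not in_quote:
            # Parentheses inside a quoted phrase are treated as literal text
            if ch == "(":
                depth += 1
            elif ch == ")":
                depth -= 1
                if depth < 0:
                    problems.append("closing parenthesis without a matching opener")
                    depth = 0
    if depth > 0:
        problems.append(f"{depth} unclosed parenthesis group(s)")
    if in_quote:
        problems.append("unterminated double quote")
    return problems


query = '''(
"machine learning" OR "deep learning" OR "artificial intelligence"
)
AND
(
healthcare OR medical OR clinical
)
NOT
(
review OR "systematic review"
)'''

print(check_query(query))            # the query.txt example from this page
print(check_query('(EEG OR "MEG'))   # deliberately broken input
```

To check a real file, pass `Path("query.txt").read_text(encoding="utf-8")` to the function instead of the inline string.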
Run the Search#
python 01_fetch_metadata.py
Output:
================================================================================
REVIEW BUDDY - FETCH PAPER METADATA
================================================================================
✓ Available sources: Scopus, PubMed, arXiv, Google Scholar, IEEE Xplore
Search query: machine learning AND healthcare
Year from: 2020
Max results per source: 50
================================================================================
SEARCHING...
================================================================================
Searching Scopus...
Found 45 papers from Scopus
Searching PubMed...
Found 38 papers from PubMed
Searching arXiv...
Found 22 papers from arXiv
Searching Google Scholar...
Found 50 papers from Google Scholar
Searching IEEE Xplore...
Found 31 papers from IEEE Xplore
================================================================================
FOUND 142 UNIQUE PAPERS
================================================================================
Papers by source:
Scopus: 45
PubMed: 38
arXiv: 22
Google Scholar: 50
IEEE: 31
================================================================================
GENERATING OUTPUT FILES...
================================================================================
✓ BibTeX: results/references.bib
✓ RIS: results/references.ris
✓ CSV: results/papers.csv
Next step: Run 02_abstract_filter.py to filter papers by abstract
Or skip filtering and run 03_download_papers.py to download PDFs
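In the sample output the per-source counts add up to 186, while 142 unique papers are reported, so records found in several databases are merged. This page does not describe how Review Buddy does that; the sketch below shows one common approach (match on DOI, fall back to a normalized title) purely as an illustration. The function names and the lowercase `title`, `doi`, and `sources` keys are assumptions made for this example.

```python
import re


def dedup_key(paper: dict) -> str:
    """Prefer the DOI; fall back to a lowercased, punctuation-free title."""
    doi = (paper.get("doi") or "").strip().lower()
    if doi:
        return "doi:" + doi
    title = re.sub(r"[^a-z0-9]+", " ", (paper.get("title") or "").lower()).strip()
    return "title:" + title


def merge_unique(papers: list) -> list:
    """Keep the first record per key and accumulate the sources that returned it."""
    merged = {}
    for paper in papers:
        key = dedup_key(paper)
        if key in merged:
            for source in paper["sources"]:
                if source not in merged[key]["sources"]:
                    merged[key]["sources"].append(source)
        else:
            merged[key] = {**paper, "sources": list(paper["sources"])}
    return list(merged.values())


# Made-up records for illustration only
hits = [
    {"title": "Machine Learning in Healthcare", "doi": "10.1234/jmai.2020.001", "sources": ["Scopus"]},
    {"title": "Machine learning in healthcare.", "doi": "10.1234/JMAI.2020.001", "sources": ["PubMed"]},
    {"title": "An arXiv-only preprint", "doi": "", "sources": ["arXiv"]},
]

unique = merge_unique(hits)
print(len(unique), unique[0]["sources"])
```

Title-based matching can merge distinct papers with identical titles, so a real pipeline would usually also compare year or first author.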
Query Syntax Quick Reference#
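| Syntax | Example | Meaning |
|---|---|---|
| AND | machine learning AND healthcare | Both terms required |
| OR | healthcare OR medical | Either term |
| NOT | cognition NOT animal | Exclude term |
| "..." | "deep learning" | Exact phrase |
| ( ) | (EEG OR MEG OR iEEG) AND cognition | Grouping |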
More examples: See docs/QUERY_SYNTAX.md
Step 2: Filter Abstracts (Optional)#
Option A: Keyword-Based Filtering#
Edit 02_abstract_filter.py to configure your filters:
# ============================================================================
# CONFIGURATION - CUSTOMIZE YOUR FILTERS HERE
# ============================================================================
# Enable/disable filters
FILTERS_ENABLED = {
    'no_abstract': True,    # Remove papers without abstracts
    'non_english': True,    # Remove non-English papers
    'epilepsy': True,       # Remove epilepsy papers (example)
    'bci': True,            # Remove BCI papers (example)
    'non_human': True,      # Remove animal studies
    'non_empirical': True,  # Remove review papers
}
# Define keyword filters
KEYWORD_FILTERS = {
    'epilepsy': [
        'epileptic spike', 'epileptic spikes', 'seizure spike',
        'epileptiform', 'interictal spike'
    ],
    'bci': [
        'brain-computer interface', 'brain computer interface',
        'brain-machine interface', 'bci', 'bmi'
    ],
    'non_human': [
        'rat', 'rats', 'mouse', 'mice', 'murine', 'rodent',
        'monkey', 'primate', 'in vitro', 'animal model'
    ],
    'non_empirical': [
        'systematic review', 'meta-analysis', 'literature review',
        'scoping review', 'review article'
    ],
    # Add your own custom filters:
    # 'custom_filter_name': [
    #     'keyword1', 'keyword2', 'keyword3'
    # ],
}
Run the filter:
python 02_abstract_filter.py
Output:
======================================================================
ABSTRACT-BASED PAPER FILTERING
======================================================================
Loaded 142 papers
Filters to apply: no_abstract, non_english, epilepsy, bci, non_human, non_empirical
======================================================================
FILTERING SUMMARY
======================================================================
Initial papers: 142
Papers kept: 89
Papers filtered out: 53
Retention rate: 62.7%
Breakdown by filter:
- no_abstract : 8 papers
- non_english : 3 papers
- epilepsy : 12 papers
- bci : 7 papers
- non_human : 18 papers
- non_empirical : 5 papers
✓ Filtered results: results/references_filtered.bib
✓ Filtered papers: results/papers_filtered.csv
✓ Filtered out papers: results/filtered_out/
Next step: Run 03_download_papers.py to download PDFs
Option B: AI-Powered Filtering (New!)#
For more intelligent filtering, use the AI-powered option with Ollama:
Prerequisites:
Install Ollama: https://ollama.ai
Pull a model: ollama pull llama3.1:8b
Configure in .env: OLLAMA_MODEL=llama3.1:8b
Edit 02_abstract_filter_AI.py:
# ============================================================================
# CONFIGURATION - CUSTOMIZE YOUR FILTERS HERE
# ============================================================================
# AI Model Configuration
AI_CONFIG = {
    'model': 'llama3.1:8b',
    'temperature': 0.1,
    'confidence_threshold': 0.5,  # Min confidence to filter (0.0-1.0)
}
# Filter Definitions (use natural language!)
FILTERS_CONFIG = {
    'epilepsy': {
        'enabled': True,
        'prompt': "Does this paper focus primarily on epileptic spikes or seizure detection?",
        'description': "Papers about epilepsy-related spike detection"
    },
    'bci': {
        'enabled': True,
        'prompt': "Is this paper about brain-computer interfaces (BCI) or brain-machine interfaces (BMI)?",
        'description': "Papers about BCI/BMI systems"
    },
    'non_human': {
        'enabled': True,
        'prompt': "Is this paper based on animal studies, in-vitro experiments, or computational models only (not human subjects)?",
        'description': "Non-human or in-vitro studies"
    },
    # Add your own filters with natural language prompts!
}
Run the AI filter:
python 02_abstract_filter_AI.py
Output:
======================================================================
AI-POWERED ABSTRACT FILTERING (LOCAL OLLAMA)
======================================================================
Model: llama3.1:8b
Confidence threshold: 0.5
======================================================================
AI FILTERING SUMMARY
======================================================================
Initial papers: 142
Papers kept: 94
Papers filtered out: 43
Manual review needed: 5
Retention rate: 66.2%
Ollama Usage:
- Model calls made: 142
- Cache hits: 0
- Cache hit rate: 0.0%
- Model used: llama3.1:8b
✓ Filtered results: results/references_filtered_ai.bib
✓ Manual review: results/manual_review_ai.csv
✓ Decision log: results/ai_filtering_log_*.json
Comparing Strategies: See docs/FILTER_WORKFLOW_EXAMPLE.md for a complete example.
Step 3: Download PDFs#
The downloader automatically uses filtered results if available:
results/references_filtered.bib (if you ran step 2)
results/references.bib (if you skipped filtering)
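The selection rule above can be sketched in a few lines. This is an illustration of the described behavior, not the downloader's actual code; `pick_input_bib` is a hypothetical name, and the demo runs in a temporary directory so it does not touch your real `results/` folder.

```python
import tempfile
from pathlib import Path


def pick_input_bib(results_dir: Path) -> Path:
    """Prefer the filtered bibliography; fall back to the unfiltered one."""
    for name in ("references_filtered.bib", "references.bib"):
        candidate = results_dir / name
        if candidate.exists():
            return candidate
    raise FileNotFoundError(f"No bibliography found in {results_dir}; run step 1 first")


# Demonstrate both cases in a throwaway directory
with tempfile.TemporaryDirectory() as tmp:
    results = Path(tmp)
    (results / "references.bib").touch()
    skipped_filtering = pick_input_bib(results).name
    (results / "references_filtered.bib").touch()
    ran_filtering = pick_input_bib(results).name

print(skipped_filtering)
print(ran_filtering)
```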
Run the Downloader#
python 03_download_papers.py
Optional: Enable Sci-Hub (use responsibly per your local laws):
Edit 03_download_papers.py:
USE_SCIHUB = True # Enable Sci-Hub as fallback
Output:
================================================================================
REVIEW BUDDY - DOWNLOAD PAPERS
================================================================================
Input file: results/references_filtered.bib
Output directory: results/pdfs
Sci-Hub enabled: False
================================================================================
STARTING DOWNLOAD...
================================================================================
Downloading PDFs: 100%|██████████████████████| 89/89 [05:23<00:00, 3.63s/paper]
================================================================================
DOWNLOAD COMPLETE!
================================================================================
Downloaded: 67 PDFs
Location: results/pdfs
Log file: results/pdfs/download.log
Success rate: 75.3%
Download Strategies#
The downloader tries 10+ methods automatically:
Direct PDF link - Publisher's direct URL
arXiv - Preprint server (95%+ success)
bioRxiv/medRxiv - Biology/medicine preprints
Unpaywall API - Open access aggregator
Crossref - DOI-based full-text links
PubMed Central (US) - Free full-text articles
PubMed Central (Europe) - European mirror
Publisher patterns - MDPI, Frontiers, Nature, IEEE, ScienceDirect, Springer, PLOS
ResearchGate/Academia - Academic social networks
HTML scraping - Extract PDF links from paper pages
Sci-Hub - Optional fallback (if enabled)
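The ordered list above amounts to a fallback chain: try each method in turn and stop at the first one that returns a PDF. The sketch below illustrates that pattern with three stand-in strategies that perform no network access. All function names are hypothetical, and the real downloader's internals may differ; only the `USE_SCIHUB` flag comes from this page.

```python
from typing import Optional

USE_SCIHUB = False  # mirrors the flag in 03_download_papers.py


def from_direct_link(paper: dict) -> Optional[bytes]:
    return None  # stand-in: pretend the publisher link failed


def from_arxiv(paper: dict) -> Optional[bytes]:
    # Stand-in: succeed only when the record carries an arXiv ID
    return b"%PDF-1.7 placeholder" if paper.get("arxiv_id") else None


def from_scihub(paper: dict) -> Optional[bytes]:
    return b"%PDF-1.4 placeholder"  # stand-in: always succeeds


def build_chain(use_scihub: bool) -> list:
    chain = [("direct", from_direct_link), ("arxiv", from_arxiv)]
    if use_scihub:
        chain.append(("scihub", from_scihub))  # optional last resort
    return chain


def download(paper: dict, chain: list) -> Optional[str]:
    """Return the name of the first strategy that yields PDF bytes, else None."""
    for name, strategy in chain:
        try:
            data = strategy(paper)
        except Exception:
            continue  # one failing method should not stop the others
        # Real PDF files start with the bytes "%PDF"
        if data and data.startswith(b"%PDF"):
            return name
    return None


print(download({"arxiv_id": "2101.00001"}, build_chain(USE_SCIHUB)))
print(download({"doi": "10.1234/jmai.2020.001"}, build_chain(USE_SCIHUB)))
print(download({"doi": "10.1234/jmai.2020.001"}, build_chain(True)))
```

Checking the leading bytes guards against a common failure mode in which a server returns an HTML login page instead of the PDF.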
Expected Success Rates:
arXiv papers: 95%+
bioRxiv/medRxiv: 95%+
Open access publishers: 80-90%
Overall (without Sci-Hub): 50-70%
Overall (with Sci-Hub): 70-90%
For details: docs/DOWNLOADER_GUIDE.md
Complete Workflow Example#
Scenario: Systematic Review on "EEG and Cognitive Assessment"#
Goal: Find papers on EEG-based cognitive assessment, excluding animal studies, reviews, and BCI research.
1. Configure Search#
Create query.txt:
(
EEG OR "event-related potential" OR electroencephalography
)
AND
(
"cognitive assessment" OR "cognitive function" OR "cognitive performance"
)
Edit 01_fetch_metadata.py:
QUERY = Path("query.txt").read_text(encoding="utf-8").strip()
YEAR_FROM = 2018
MAX_RESULTS_PER_SOURCE = 100
2. Run Search#
python 01_fetch_metadata.py
Result: 287 papers from 5 sources → results/references.bib
3. Configure Filtering#
Edit 02_abstract_filter.py:
FILTERS_ENABLED = {
    'no_abstract': True,
    'non_english': True,
    'bci': True,            # Exclude BCI papers
    'non_human': True,      # Exclude animal studies
    'non_empirical': True,  # Exclude reviews
}
KEYWORD_FILTERS = {
    'bci': [
        'brain-computer interface', 'brain computer interface',
        'brain-machine interface', 'bci', 'bmi', 'neural interface'
    ],
    'non_human': [
        'rat', 'rats', 'mouse', 'mice', 'rodent', 'primate',
        'animal model', 'in vitro', 'in-vitro'
    ],
    'non_empirical': [
        'systematic review', 'meta-analysis', 'literature review',
        'review article', 'scoping review'
    ],
}
4. Run Filtering#
python 02_abstract_filter.py
Result: 287 → 156 papers → results/references_filtered.bib
Breakdown:
No abstract: 12 papers
Non-English: 8 papers
BCI: 34 papers
Non-human: 61 papers
Non-empirical: 16 papers
5. Review Filtered Papers (Optional)#
Check what was removed:
# View BCI papers that were filtered
head results/filtered_out/bci.csv
# View animal studies that were removed
head results/filtered_out/non_human.csv
If you find false positives, refine your keywords and re-run step 4.
6. Download PDFs#
python 03_download_papers.py
Result: 156 papers → 117 PDFs (75% success) → results/pdfs/
7. Final Output#
results/
├── papers.csv                 # Original 287 papers
├── references.bib             # Original bibliography
├── papers_filtered.csv        # ✓ 156 filtered papers
├── references_filtered.bib    # ✓ Bibliography for 156 papers
├── filtered_out/              # Papers removed by each filter
│   ├── no_abstract.csv
│   ├── non_english.csv
│   ├── bci.csv
│   ├── non_human.csv
│   └── non_empirical.csv
└── pdfs/                      # ✓ 117 downloaded PDFs
    ├── paper1.pdf
    ├── paper2.pdf
    ├── ...
    └── download.log
Advanced Techniques#
Custom Filter Examples#
Exclude Pediatric Studies#
FILTERS_ENABLED = {
    'pediatric': True,
    # ... other filters
}
KEYWORD_FILTERS = {
    'pediatric': [
        'children', 'child', 'pediatric', 'paediatric',
        'infant', 'toddler', 'adolescent', 'school-age'
    ],
    # ... other filters
}
Exclude fMRI-Only Studies#
KEYWORD_FILTERS = {
    'fmri_only': [
        'fMRI only', 'exclusively fMRI', 'solely fMRI',
        # Be careful with broad terms like 'fMRI' alone
        # as they'll catch combined EEG+fMRI studies
    ],
}
Keep Only Clinical Trials#
KEYWORD_FILTERS = {
    # Use negative filters to exclude everything except trials
    'non_clinical': [
        'simulation', 'computational model', 'theoretical',
        'case report', 'review', 'survey'
    ],
}
Batch Processing Multiple Queries#
Create a script to process multiple related queries:
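import shutil
import subprocess
import sys
from pathlib import Path

queries = [
    "EEG AND attention",
    "EEG AND memory",
    "EEG AND executive function",
]

for i, query in enumerate(queries, start=1):
    print(f"\n{'=' * 60}")
    print(f"Processing query {i}: {query}")
    print("=" * 60)

    # Assumes 01_fetch_metadata.py uses Option 2 (QUERY read from query.txt),
    # which is more robust than rewriting the script's source text.
    # Note: this overwrites any existing query.txt.
    Path("query.txt").write_text(query, encoding="utf-8")

    # Run the search with the current Python interpreter
    subprocess.run([sys.executable, "01_fetch_metadata.py"], check=True)

    # Copy the output aside so the next query does not overwrite it
    shutil.copytree("results", f"results_query_{i}", dirs_exist_ok=True)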
Monitoring Download Progress#
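from pathlib import Path
import time

pdf_dir = Path("results/pdfs")

# Watch downloads in real time; press Ctrl+C to stop
try:
    while True:
        pdfs = list(pdf_dir.glob("*.pdf"))
        print(f"\rDownloaded: {len(pdfs)} PDFs", end="", flush=True)
        time.sleep(2)
except KeyboardInterrupt:
    print()  # finish the status line cleanly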
Parsing Download Logs#
from pathlib import Path

log_file = Path("results/pdfs/download.log")

if log_file.exists():
    lines = log_file.read_text(encoding="utf-8").splitlines()

    # Count log lines by outcome marker
    success_lines = [line for line in lines if "SUCCESS:" in line]
    failed_lines = [line for line in lines if "FAILED:" in line]

    print(f"Successfully downloaded: {len(success_lines)}")
    print(f"Failed downloads: {len(failed_lines)}")
Output Formats#
BibTeX (.bib)#
Standard format for LaTeX and most reference managers:
@article{Smith2020,
title = {Machine Learning in Healthcare},
author = {Smith, John and Doe, Jane},
journal = {Journal of Medical AI},
year = {2020},
volume = {15},
number = {3},
pages = {123-145},
doi = {10.1234/jmai.2020.001},
pmid = {12345678},
url = {https://example.com/paper}
}
RIS (.ris)#
Format for EndNote, Mendeley, Zotero:
TY - JOUR
TI - Machine Learning in Healthcare
AU - Smith, John
AU - Doe, Jane
JO - Journal of Medical AI
PY - 2020
VL - 15
IS - 3
SP - 123-145
DO - 10.1234/jmai.2020.001
UR - https://example.com/paper
AB - This paper presents a novel approach to...
ER -
CSV (.csv)#
For data analysis and spreadsheets:
| Title | Authors | Year | Journal | DOI | PMID | Citations | Sources |
|---|---|---|---|---|---|---|---|
| Machine Learning… | Smith; Doe | 2020 | J Med AI | 10.1234… | 12345678 | 45 | Scopus, PubMed |
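Because the CSV is plain tabular data, the standard library is enough for quick summaries. The sketch below counts papers per year and per source. It assumes the header names match the columns shown above and that the Sources cell is a comma-separated list; both are assumptions to verify against your own `results/papers.csv`. The sample rows are made up for illustration.

```python
import csv
import io
from collections import Counter

# Illustrative stand-in for results/papers.csv
SAMPLE_CSV = (
    "Title,Authors,Year,Journal,DOI,PMID,Citations,Sources\n"
    'Machine Learning in Healthcare,Smith; Doe,2020,J Med AI,'
    '10.1234/jmai.2020.001,12345678,45,"Scopus, PubMed"\n'
    "A Second Illustrative Paper,Roe,2021,J Med AI,,,3,arXiv\n"
)


def summarize(csv_file):
    """Count papers per publication year and per source database."""
    by_year = Counter()
    by_source = Counter()
    for row in csv.DictReader(csv_file):
        by_year[row["Year"]] += 1
        for source in row["Sources"].split(","):
            if source.strip():
                by_source[source.strip()] += 1
    return by_year, by_source


by_year, by_source = summarize(io.StringIO(SAMPLE_CSV))
print(dict(by_year))
print(dict(by_source))
```

For a real run, open the file with `open("results/papers.csv", newline="", encoding="utf-8")` and pass the handle to `summarize`.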
Tips & Best Practices#
Maximize Paper Discovery#
Use multiple sources - Each database has different coverage
Start with broad queries - Filter afterwards programmatically
Check year ranges - Recent papers may not be indexed everywhere
Use query.txt - Better for complex, multi-line queries
Save intermediate results - Keep original and filtered versions
Optimize Filtering#
Start conservative - Use specific keywords first
Review filtered-out papers - Check for false positives
Iterate - Refine filters based on review
Use word boundaries - Prevents substring matches
Test AI filtering - Compare with keyword approach
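The word-boundary tip matters most for short keywords such as 'rat', which is a substring of 'rate'. The sketch below shows the difference using Python's `re` module. `matches_keyword` is a hypothetical helper for illustration and is not necessarily how 02_abstract_filter.py matches keywords.

```python
import re


def matches_keyword(text: str, keyword: str, whole_word: bool = True) -> bool:
    """Case-insensitive keyword search, optionally anchored at word boundaries."""
    pattern = re.escape(keyword)
    if whole_word:
        pattern = r"\b" + pattern + r"\b"
    return re.search(pattern, text, flags=re.IGNORECASE) is not None


abstract = "We report the heart rate and accuracy of 40 adult participants."

print(matches_keyword(abstract, "rat", whole_word=False))  # substring hit inside "rate"
print(matches_keyword(abstract, "rat"))                    # no whole-word match
print(matches_keyword("Experiments in rats and mice", "rats"))
```

Word boundaries do not help with ambiguous abbreviations: 'bmi' still matches "BMI" used for body mass index, so reviewing the filtered-out files remains necessary.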
Improve Download Success#
Configure Unpaywall email - Significantly increases success rate
Enable PubMed sources - Better access to biomedical papers
Check institutional access - Your institution's subscriptions may cover papers that open sources do not
Monitor logs - Understand why specific downloads fail
Consider Sci-Hub - Only if legal in your jurisdiction
Query Construction#
✓ Good:
"(machine learning OR deep learning) AND (healthcare OR medical)"
✗ Too narrow:
'"machine learning for medical diagnosis in pediatric cardiology"'
✓ Boolean logic:
"(EEG OR MEG OR iEEG) AND cognition NOT animal"
✗ Too complex for some sources:
'(((A OR B) AND (C OR D)) NOT (E OR F)) AND ((G AND H) OR I)'
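One rough way to spot queries that may be too complex is to measure how deeply the parentheses nest. The sketch below does that for the examples above. The `MAX_DEPTH` limit of 2 is an arbitrary assumption for illustration, since this page does not state what each database actually accepts.

```python
def max_nesting_depth(query: str) -> int:
    """Return the deepest level of parenthesis nesting in the query."""
    depth = deepest = 0
    for ch in query:
        if ch == "(":
            depth += 1
            deepest = max(deepest, depth)
        elif ch == ")":
            depth -= 1
    return deepest


MAX_DEPTH = 2  # hypothetical threshold, not a documented limit

examples = [
    "(machine learning OR deep learning) AND (healthcare OR medical)",
    "(EEG OR MEG OR iEEG) AND cognition NOT animal",
    "(((A OR B) AND (C OR D)) NOT (E OR F)) AND ((G AND H) OR I)",
]

for q in examples:
    d = max_nesting_depth(q)
    verdict = "consider simplifying" if d > MAX_DEPTH else "ok"
    print(f"depth {d}: {verdict}")
```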
Expected Success Rates#
Metadata Collection: ~95% (depends on API availability)
PDF Downloads:
Open access papers: 95%+
arXiv preprints: 98%+
bioRxiv/medRxiv: 95%+
PubMed papers: 60-70%
IEEE papers: 50-60%
Overall (no Sci-Hub): 50-70%
Overall (with Sci-Hub): 70-90%
Troubleshooting Common Issues#
No Papers Found#
Problem: Search returns 0 papers
Solutions:
Simplify query: "machine learning" instead of complex boolean
Check year range (some databases lag 1-2 years)
Verify API keys are working
Try single source: Look for errors in specific searchers
Filtering Too Aggressive#
Problem: Too many papers filtered out
Solutions:
Review filtered_out/*.csv files
Make keywords more specific
Disable aggressive filters temporarily
Use AI filtering for nuanced decisions
Low Download Success Rate#
Problem: < 50% PDFs downloaded
Solutions:
Add UNPAYWALL_EMAIL to .env
Prefer PubMed-heavy sources (better full-text access)
Enable Sci-Hub if appropriate
Check institutional VPN/access
Review download.log for specific failure reasons
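To confirm the first item quickly, you can check whether `.env` actually defines a non-empty `UNPAYWALL_EMAIL`. The sketch below parses simple KEY=VALUE lines by hand. `read_env` and `unpaywall_email_configured` are hypothetical helpers, and the demo writes to a temporary file rather than your real `.env`.

```python
import tempfile
from pathlib import Path


def read_env(path: Path) -> dict:
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values = {}
    for raw in path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip("\"'")
    return values


def unpaywall_email_configured(env: dict) -> bool:
    # A very loose sanity check: the value must at least contain "@"
    return "@" in env.get("UNPAYWALL_EMAIL", "")


with tempfile.TemporaryDirectory() as tmp:
    env_file = Path(tmp) / ".env"
    env_file.write_text("# settings\nOLLAMA_MODEL=llama3.1:8b\nUNPAYWALL_EMAIL=\n", encoding="utf-8")
    before = unpaywall_email_configured(read_env(env_file))
    env_file.write_text("UNPAYWALL_EMAIL=you@example.org\n", encoding="utf-8")
    after = unpaywall_email_configured(read_env(env_file))

print(before, after)
```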
Import/Path Errors#
Problem: ModuleNotFoundError or import errors
Solutions:
Run from project root: cd /path/to/review_buddy
Check src/__init__.py exists
Don't rename project folders
Use: python 01_fetch_metadata.py (not python3, not from subdirs)
Next Steps#
Review docs/QUERY_SYNTAX.md for advanced query patterns
Check docs/FILTER_WORKFLOW_EXAMPLE.md for complete filtering example
See docs/DOWNLOADER_GUIDE.md for download troubleshooting
Explore Additional Tools for complementary resources
Use LitMaps for citation network discovery