Skip to content

This repo contains the code for extracting structured data from CONSORT flow diagrams in PDF files reporting randomized trials.

Notifications You must be signed in to change notification settings

EPPI-Centre/flowchart-data-extraction

Repository files navigation

flowchart-data-extraction

This repo contains the code for extracting structured data from CONSORT flow diagrams in PDF files reporting randomized trials.

Contents

Setup

Create environment

conda create -n flow python==3.11.11 -y
conda activate flow

Install

git clone https://github.com/EPPI-Centre/flowchart-data-extraction.git
cd flowchart-data-extraction
pip install -e .

Assign OpenAI API Key

In the root of this repo, you must create a file called .env. In this file you will register your OpenAI API key as so:

OPENAI_API_KEY=COPY_AND_PASTE_YOUR_API_KEY_HERE

Quickstart

Extract Figures From PDF

Windows (Powershell):

$Env:OUTPUT_IMAGE_FORMAT = "PNG"
marker --output_dir OUTPUT_DIR INPUT_DIR

Mac/Linux:

export OUTPUT_IMAGE_FORMAT="PNG"
marker --output_dir OUTPUT_DIR INPUT_DIR

Extract CONSORT From Images Dir

python classify_images_as_flowchart.py

Parse CONSORT From Images Dir

python parse_flowchart_images.py

Figure extraction tools to test

About

This repo contains the code for extracting structured data from CONSORT flow diagrams in PDF files reporting randomized trials.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published