🐙 FastAPI Journal Automation uses Gemini and Groq LLaMA to auto-create and enrich journal articles via CORE API, with Pydantic validation and JSON output.

aligoraya202/FastAPI-Journal-Automation-with-Generative-And-AI-Compound-AI-System

FastAPI Journal Automation with a Generative Compound AI System for Journals

[Hero image: Journal AI Hero]

A FastAPI-based automation system for creating, enriching, and managing journal articles. It integrates Google Gemini, Groq’s LLaMA, and the CORE API to deliver a seamless pipeline — from metadata input to fully structured, AI-generated journal outputs.


Overview

  • This repository hosts a compact, production-ready automation system for journals. It uses a compound AI approach to orchestrate multiple AI agents, each handling a slice of the pipeline: from metadata intake and enrichment to writing, formatting, and publication readiness.
  • The system is built on FastAPI, a lean Python framework that provides clean RESTful APIs, strong typing, and fast performance.
  • The AI stack blends Google Gemini for semantic understanding, Groq’s LLaMA for generation, and the CORE API for retrieval-augmented generation and knowledge sourcing.
  • Outputs can be structured JSON, LaTeX-ready content, Markdown manuscripts, and publication-ready PDFs. The design supports RAG workflows to pull relevant literature and ensure factual grounding.

Key Concepts

  • Compound AI System: A coordinated set of AI agents that work in sequence or parallel to complete a task. Each agent focuses on a subtask — metadata normalization, outline generation, content drafting, citation management, and QA.
  • Agent Orchestration: A lightweight director pattern that assigns tasks to specialized agents. It ensures data provenance, versioning, and auditable changes through the pipeline (a minimal sketch follows this list).
  • Retrieval-Augmented Generation (RAG): The system pulls external sources (via CORE) to ground generated content in credible literature. This reduces hallucinations and improves accuracy.
  • Metadata-First Pipeline: The process starts with metadata input (title, keywords, authors, abstracts) and grows into a fully structured article with sections, figures, tables, references, and appendices.
  • Output Flexibility: The pipeline can emit JSON schemas, LaTeX sources, Markdown manuscripts, and export-ready PDFs, enabling seamless integration with publishers and repositories.
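
To make the director pattern concrete, here is a minimal Python sketch. The Director class and the two toy agents are hypothetical illustrations of the idea, not the repository's actual interfaces (those live under app/services/):

    from dataclasses import dataclass, field
    from typing import Callable

    @dataclass
    class Director:
        """Runs specialized agents in order and records provenance per step."""
        agents: list[tuple[str, Callable[[dict], dict]]]
        trace: list[dict] = field(default_factory=list)

        def execute(self, state: dict) -> dict:
            for name, run in self.agents:
                state = run(dict(state))  # each agent returns an updated copy
                self.trace.append({"agent": name, "keys": sorted(state)})
            return state

    # Hypothetical agents, for illustration only.
    normalize = lambda s: {**s, "title": s["title"].strip()}
    outline = lambda s: {**s, "outline": ["Introduction", "Methods", "Results"]}

    director = Director(agents=[("metadata", normalize), ("outline", outline)])
    result = director.execute({"title": "  A Novel Approach to Journal Automation  "})

The trace list is what makes the changes auditable: every step records which agent ran and what state keys it produced.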

Features

  • End-to-end journal automation: intake, enrichment, drafting, formatting, and export.
  • AI collaboration: Gemini for understanding, Groq LLaMA for generation, CORE for sources.
  • Structured outputs: JSON payloads, LaTeX sources, Markdown manuscripts, and PDF exports.
  • LaTeX-ready pipelines: ready-to-compile templates and bibliography management.
  • RAG-enabled literature integration: pull citations and snippets from trusted sources.
  • Fine-grained control: adjustable generation temperature, max tokens, and agent budgets.
  • Versioned outputs: deterministic seeds and content versioning for reproducibility.
  • Authentication-friendly API: token-based access suitable for internal tooling.
  • Extensible architecture: plug in new AI agents or data sources with minimal changes.
  • Local and cloud-ready deployments: Docker, Kubernetes, or raw Python environments.

Tech Stack

  • Language: Python
  • Web framework: FastAPI
  • AI components: Google Gemini, Groq LLaMA
  • Knowledge source: CORE API
  • Data formats: JSON, Markdown, LaTeX, PDF
  • Deployment: Docker, optional Kubernetes
  • Testing: pytest and HTTPX
  • Documentation: Markdown, with ready LaTeX templates

Architecture & Data Flow

  • Ingest: Users submit metadata and preferences via a REST API.
  • Validate: FastAPI models validate input, normalize terms, and resolve author details.
  • Enrich: The system calls Gemini to interpret intent, extract key questions, and shape the article outline.
  • Generate: LLaMA-based generators draft sections, figures, and captions under policy controls.
  • Source: CORE fetches relevant literature; results are ranked and injected into the draft through a RAG loop.
  • Assemble: The pipeline composes a complete manuscript, with bibliographic references, figures, and appendices.
  • Output: The final artifacts are serialized to JSON, LaTeX, Markdown, and PDF.
  • Audit: Every step is logged, with data lineage preserved for reproducibility and compliance.

Data Models & Metadata

  • Article: id, title, authors, affiliations, abstract, keywords, metadata version, publication date, language (a Pydantic sketch of these models follows this list).
  • Section: heading level, title, content, citations.
  • Figures & Tables: captions, references, image/table payloads, sources.
  • Citations: id, DOI, URL, retrieved_at, source_agency.
  • Enrichment: notes, reliability_score, confidence, used_sources.
  • AIRun: id, model, seed, prompts, tokens_used, duration, status.
  • Provenance: input_version, output_version, data_pipeline_trace.
  • Users & Permissions: roles, tokens, access scopes.
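
A minimal Pydantic sketch of three of these models. Field names follow the list above; the defaults and which fields are optional are assumptions, not the repository's exact schemas:

    from datetime import datetime
    from pydantic import BaseModel

    class Citation(BaseModel):
        id: str
        doi: str | None = None
        url: str | None = None
        retrieved_at: datetime | None = None
        source_agency: str | None = None

    class Section(BaseModel):
        heading_level: int
        title: str
        content: str
        citations: list[Citation] = []

    class Article(BaseModel):
        id: str
        title: str
        authors: list[str]
        abstract: str
        keywords: list[str]
        metadata_version: str
        language: str = "en"
        sections: list[Section] = []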

Getting Started

Prerequisites

  • Python 3.11+ installed locally or in your environment.
  • Virtual environment support (venv) or conda.
  • Basic familiarity with REST and command line.

Installation

  • Create a working directory and set up a virtual environment.
  • Install dependencies from requirements.txt or via pip install -r requirements.txt.
  • Prepare a .env file with API keys and endpoints for Gemini, LLaMA (Groq), and CORE.

Environment and Secrets

  • GOOGLE_GEMINI_API_KEY: your Gemini API key
  • GROQ_LLAMA_ENDPOINT: address of the Groq LLaMA service
  • CORE_API_KEY: your CORE API key
  • DATABASE_URL: connection string for metadata and artifacts storage
  • APP_SECRET_KEY: used for session and token signing

An example .env snippet (placeholder values; a loader sketch follows):

    GOOGLE_GEMINI_API_KEY=your_key_here
    GROQ_LLAMA_ENDPOINT=http://localhost:8001
    CORE_API_KEY=your_core_key
    DATABASE_URL=sqlite:///journal.db
    APP_SECRET_KEY=change-me
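
One way to load these values at startup is a small fail-fast helper. This sketch uses os.getenv and is an assumption about the app's configuration wiring, not its actual code:

    import os

    def load_settings() -> dict:
        """Read required secrets from the environment, failing fast if any is missing."""
        required = ["GOOGLE_GEMINI_API_KEY", "GROQ_LLAMA_ENDPOINT",
                    "CORE_API_KEY", "DATABASE_URL", "APP_SECRET_KEY"]
        missing = [name for name in required if not os.getenv(name)]
        if missing:
            raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
        return {name: os.environ[name] for name in required}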

Project Structure

  • app/
    • main.py: FastAPI application and router setup
    • models.py: Pydantic models for request/response schemas
    • schemas.py: database schemas and serialization helpers
    • services/
      • ai.py: orchestrates Gemini and LLaMA calls
      • core.py: pulls literature from CORE
      • drafting.py: content assembly and formatting
      • export.py: JSON, LaTeX, Markdown, and PDF generation
    • routers/
      • articles.py: CRUD and generation endpoints
    • templates/
      • latex/ and markdown/ templates for outputs
  • tests/
    • test_endpoints.py: API contract tests
    • test_ai_flows.py: AI agent flow tests
  • docs/
    • architecture.md
    • api_reference.md
    • usage_examples.md
  • scripts/
    • install-fastapi-journal.sh: installer script to bootstrap dependencies
  • docker/
    • Dockerfile
    • docker-compose.yml

Usage & API Endpoints

  • POST /articles/generate
    • Creates a new article draft from provided metadata (a route sketch follows this list)
    • Request body includes: title, authors, abstract, keywords, metadata_version, preferred_language, target_journal
    • Response includes: article_id, status, generation_summary
  • GET /articles/{article_id}
    • Retrieve a complete article draft with sections and citations
  • PATCH /articles/{article_id}
    • Update metadata or enrichment notes
  • POST /articles/{article_id}/export
    • Trigger export to JSON, LaTeX, Markdown, or PDF
  • POST /ai/feedback
    • Provide feedback to improve future generations
  • DELETE /articles/{article_id}
    • Remove drafts and associated artifacts
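
A minimal route sketch for the generate endpoint, using the request and response fields listed above. The model names, defaults, and the "queued" status value are illustrative assumptions rather than the repository's exact code:

    from uuid import uuid4
    from fastapi import APIRouter
    from pydantic import BaseModel

    router = APIRouter(prefix="/articles")

    class GenerateRequest(BaseModel):
        title: str
        authors: list[str]
        abstract: str
        keywords: list[str]
        metadata_version: str = "1.0"
        preferred_language: str = "en"
        target_journal: str | None = None

    class GenerateResponse(BaseModel):
        article_id: str
        status: str
        generation_summary: str

    @router.post("/generate", response_model=GenerateResponse)
    async def generate_article(req: GenerateRequest) -> GenerateResponse:
        # In the real service this would fan out to Gemini, LLaMA, and CORE.
        return GenerateResponse(article_id=str(uuid4()), status="queued",
                                generation_summary=f"Draft scheduled for '{req.title}'")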

Examples

  • Submitting metadata
    • Title: "A Novel Approach to Journal Automation with AI"
    • Authors: ["Alex Doe", "Jamie Lee"]
    • Abstract: "This study explores automated generation of journal content using a compound AI system."
    • Keywords: ["journal automation", "AI", "NLP", "RAG", "LaTeX"]
  • Generating content
    • Call POST /articles/generate with the above payload
    • The system routes tasks to Gemini for intent capture, LLaMA for drafting, and CORE for sources (a client-side sketch follows this list)
  • Exporting to LaTeX
    • POST /articles/{article_id}/export with format=latex
    • Returns a .tex file ready for compilation
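
The same flow driven from a client with HTTPX (already part of the tech stack). The base URL assumes a local uvicorn instance on port 8000, and passing format as a query parameter is an assumption about the export endpoint:

    import httpx

    payload = {
        "title": "A Novel Approach to Journal Automation with AI",
        "authors": ["Alex Doe", "Jamie Lee"],
        "abstract": "This study explores automated generation of journal "
                    "content using a compound AI system.",
        "keywords": ["journal automation", "AI", "NLP", "RAG", "LaTeX"],
    }

    with httpx.Client(base_url="http://localhost:8000") as client:
        created = client.post("/articles/generate", json=payload).json()
        client.post(f"/articles/{created['article_id']}/export",
                    params={"format": "latex"})  # returns a .tex artifact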

AI Integrations

  • Google Gemini
    • Role: Semantic interpretation, task planning, and prompt shaping
    • Usage: Used in the early stage to understand intent and generate an outline
  • Groq LLaMA
    • Role: High-quality text generation for sections, captions, and summaries
    • Usage: Generates draft content with controlled prompts and seeds to ensure determinism
  • CORE API
    • Role: Literature retrieval and grounding
    • Usage: Pulls citations, snippets, and context to ground the manuscript
  • RAG Loop
    • Pulls sources, ranks them by credibility, and injects them into the draft (a simplified sketch follows this list)
    • Ensures citations align with the article content
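
A simplified sketch of that loop. It targets CORE's public v3 search endpoint (api.core.ac.uk/v3/search/works); the DOI-first ranking heuristic and the prompt-injection format are illustrative assumptions, not the project's actual strategy:

    import os
    import httpx

    def fetch_sources(query: str, limit: int = 5) -> list[dict]:
        """Query CORE for relevant works and keep a compact citation payload."""
        resp = httpx.get(
            "https://api.core.ac.uk/v3/search/works",
            params={"q": query, "limit": limit},
            headers={"Authorization": f"Bearer {os.environ['CORE_API_KEY']}"},
        )
        resp.raise_for_status()
        works = resp.json().get("results", [])
        # Prefer results that carry a DOI so citations stay resolvable.
        works.sort(key=lambda w: w.get("doi") is None)
        return [{"title": w.get("title"), "doi": w.get("doi")} for w in works]

    def ground_prompt(section_prompt: str, sources: list[dict]) -> str:
        """Inject retrieved sources into the generation prompt (illustrative format)."""
        refs = "\n".join(f"- {s['title']} (doi: {s['doi']})" for s in sources)
        return f"{section_prompt}\n\nGround your claims in these sources:\n{refs}"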

Output Formats

  • JSON: Structured payload with all sections, citations, and metadata (a dispatcher sketch follows this list)
  • Markdown: Human-readable manuscript with headers, lists, and embedded citations
  • LaTeX: Clean, publisher-ready source with bibliographies
  • PDF: Ready-to-publish PDF generated from LaTeX or Markdown pipelines
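
A small dispatcher sketch for routing between these formats. The serializer functions here are stubs with assumed names (the LaTeX and PDF paths are omitted), not the export.py implementation:

    import json

    def to_json(article: dict) -> bytes:
        return json.dumps(article, indent=2).encode()

    def to_markdown(article: dict) -> bytes:
        lines = [f"# {article['title']}", ""]
        lines += [f"## {s['title']}\n\n{s['content']}"
                  for s in article.get("sections", [])]
        return "\n".join(lines).encode()

    EXPORTERS = {"json": to_json, "markdown": to_markdown}  # latex/pdf omitted

    def export(article: dict, fmt: str) -> bytes:
        try:
            return EXPORTERS[fmt](article)
        except KeyError:
            raise ValueError(f"Unsupported format: {fmt}") from None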

LaTeX & Publication Readiness

  • The LaTeX templates are designed to be drop-in replacements for common journal formats
  • Bibliography management relies on BibTeX or Biber, depending on the template
  • Figures and tables are positioned to meet typical submission guidelines
  • The system can embed DOI links and cross-references to related works

Deployment & DevOps

  • Local development: a simple Python environment with uvicorn (see the entry-point sketch after this list)
  • Docker: A ready-to-run container image with all dependencies
  • Kubernetes: Small deployment manifests that scale the AI worker pool
  • CI: GitHub Actions workflows for linting, tests, and build of artifacts
  • Observability: Basic logging, plus optional OpenTelemetry instrumentation for traces
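
For local development, a common pattern is a small entry-point script that starts uvicorn programmatically. The app.main:app import path matches the project structure above; the script itself is a sketch, not necessarily present in the repo:

    # run.py -- local development entry point (a sketch)
    import uvicorn

    if __name__ == "__main__":
        uvicorn.run("app.main:app", host="127.0.0.1", port=8000, reload=True)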

Security & Privacy

  • The API uses token-based authentication for protected routes
  • Data is stored with versioning to enable audit trails
  • All AI prompts are designed to minimize the leakage of sensitive content
  • Access to external services (Gemini, LLaMA, CORE) is controlled by API keys and endpoints

Testing & Quality

  • End-to-end tests simulate article creation from metadata to export
  • Unit tests cover AI orchestration, formatting, and export logic
  • Mock services stand in for Gemini, LLaMA, and CORE during tests (a test sketch follows this list)
  • Continuous integration runs tests on each pull request
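
A contract-test sketch in the spirit of tests/test_endpoints.py, using FastAPI's TestClient (which wraps HTTPX). That app.main exposes a FastAPI instance named app is an assumption based on the project structure above:

    from fastapi.testclient import TestClient
    from app.main import app  # assumes the FastAPI instance is named `app`

    client = TestClient(app)

    def test_generate_returns_article_id():
        payload = {"title": "Test", "authors": ["A. Author"],
                   "abstract": "An abstract.", "keywords": ["testing"]}
        resp = client.post("/articles/generate", json=payload)
        assert resp.status_code == 200
        assert "article_id" in resp.json()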

Troubleshooting

  • Common issues: missing environment variables, incorrect API keys, or network restrictions
  • Check the logs for AIRun entries to see seeds, prompts, and durations
  • If CORE returns empty results, verify CORE_API_KEY and network access
  • If LaTeX export fails, confirm LaTeX toolchain (pdflatex/xelatex) availability
  • For Docker users, ensure proper volumes and environment variable passthrough

Roadmap

  • Expand multi-language support for metadata and drafts
  • Add more publishers with templates for journal formats
  • Improve citation disambiguation and DOI resolution
  • Enhance accessibility features and screen reader support
  • Introduce a plugin system to integrate other AI providers

Contributing

  • We welcome contributions. Start by forking the repository and creating a feature branch.
  • Follow the code style used in the project: type hints, clear variable names, and small functions.
  • Add or update tests for any new feature.
  • Run tests locally before opening a pull request.
  • Documentation updates are welcome for new endpoints, workflows, or templates.

Licensing

  • This project is licensed under the MIT License. See the LICENSE file for details.

Acknowledgments

  • Special thanks to the teams behind Google Gemini, Groq LLaMA, and the CORE API for enabling research and practical tooling in journal automation.
  • Gratitude to the open-source community for sharing best practices in FastAPI, AI workflows, and document preparation.

Notes on Usage and Best Practices

  • Start with a small metadata package to validate the flow. Gradually scale to longer manuscripts.
  • Use the enrichment notes to guide the AI in tone, structure, and audience.
  • Keep generation seeds consistent when you want repeatable outputs across environments.
  • Leverage CORE to fetch diverse sources while maintaining a credible citation strategy.
  • Rotate API keys regularly, and monitor usage so you can catch drift in AI behavior early.

Glossary

  • AI Agent: A module that performs a dedicated task in the pipeline.
  • RAG: Retrieval-Augmented Generation, a method that augments generation with external sources.
  • LaTeX: A high-quality typesetting system commonly used for scientific documents.
  • CORE: An API providing access to a wide literature corpus for grounding content.
  • Gemini: Google’s AI model used for semantic understanding and task framing.
  • LLaMA: Meta’s family of open language models, served here through Groq’s inference platform.

Techniques and Best Practices

  • Keep prompts concise but expressive. The more precise you are, the better the result.
  • Define a clear outline before drafting long sections.
  • Predefine a citation strategy to ensure consistency across the manuscript.
  • Maintain a clean data model that tracks provenance and edits.

Visuals and Design

  • Hero image showcases the fusion of AI and scholarly writing.
  • Architecture diagram highlights the flow between ingestion, AI processing, RAG integration, and output.
  • The design favors readability, with a calm color palette and accessible typography.

Appendix: API Reference Highlights

  • POST /articles/generate
    • Input: title, authors, abstract, keywords, metadata_version, language, target_journal
    • Output: article_id, status, summary
  • GET /articles/{article_id}
    • Output: full article with sections, figures, references
  • POST /articles/{article_id}/export
    • Input: format (json|latex|markdown|pdf)
    • Output: export artifact or download link
  • POST /ai/feedback
    • Input: article_id, feedback_type, notes
    • Output: acknowledgement and next steps

Appendix: Environment Variables (Expanded)

  • GOOGLE_GEMINI_API_KEY: Gemini service key
  • GROQ_LLAMA_ENDPOINT: LLaMA server address
  • CORE_API_KEY: CORE service key
  • DATABASE_URL: local or cloud database connection
  • APP_SECRET_KEY: session and token signing
  • LOG_LEVEL: debug|info|warning|error
  • ENABLE_HEDGING: true|false, controls safety hedges in generation
  • OUTPUT_FORMATS: json, latex, markdown, pdf (comma-separated)

Appendix: Sample Commands

  • Create and activate a virtual environment: python -m venv .venv && source .venv/bin/activate
  • Install dependencies: pip install -r requirements.txt
  • Copy the example environment file and fill in your keys (e.g., cp .env.example .env, if the example file uses that name)
  • Start the API: uvicorn app.main:app --reload
  • Explore the endpoints through the interactive docs at http://localhost:8000/docs

Final Note

  • This repository embraces a practical, step-by-step approach to automating journal article production with a compound AI system. It blends semantic understanding, high-quality generation, and credible literature grounding to produce publication-ready manuscripts.
