A FastAPI-based automation system for creating, enriching, and managing journal articles. It integrates Google Gemini, Groq’s LLaMA, and the CORE API to deliver a seamless pipeline — from metadata input to fully structured, AI-generated journal outputs.
Table of Contents
- Overview
- Key Concepts
- Features
- Tech Stack
- Architecture & Data Flow
- Data Models & Metadata
- Getting Started
- Usage & API Endpoints
- AI Integrations
- Output Formats
- Deployment & DevOps
- Testing & Quality
- Security & Privacy
- Troubleshooting
- Roadmap
- Contributing
- Licensing
- Acknowledgments
- Releases
Overview
- This repository hosts a compact, production-ready automation system for journals. It uses a compound AI approach to orchestrate multiple AI agents, each handling a slice of the pipeline: from metadata intake and enrichment to writing, formatting, and publication readiness.
- The system is built on FastAPI, a lean Python web framework that provides clean RESTful APIs, validation driven by type hints, and high performance.
- The AI stack blends Google Gemini for semantic understanding, Groq’s LLaMA for generation, and the CORE API for retrieval-augmented generation and knowledge sourcing.
- Outputs can be structured JSON, LaTeX-ready content, Markdown manuscripts, and publication-ready PDFs. The design supports RAG workflows to pull relevant literature and ensure factual grounding.
Key Concepts
- Compound AI System: A coordinated set of AI agents that work in sequence or parallel to complete a task. Each agent focuses on a subtask — metadata normalization, outline generation, content drafting, citation management, and QA.
- Agent Orchestration: A lightweight director pattern that assigns tasks to specialized agents (see the sketch after this list). It ensures data provenance, versioning, and auditable changes through the pipeline.
- Retrieval-Augmented Generation (RAG): The system pulls external sources (via CORE) to ground generated content in credible literature. This reduces hallucinations and improves accuracy.
- Metadata-First Pipeline: The process starts with metadata input (title, keywords, authors, abstracts) and grows into a fully structured article with sections, figures, tables, references, and appendices.
- Output Flexibility: The pipeline can emit JSON schemas, LaTeX sources, Markdown manuscripts, and export-ready PDFs, enabling seamless integration with publishers and repositories.
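The director pattern described above is framework-agnostic. As a minimal sketch (illustrative names only, not the repository's actual classes), an orchestrator can run agents in order while recording provenance:

```python
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical illustration of the director pattern; the repository's
# real agent classes may differ.
PipelineState = dict  # accumulates metadata, outline, draft, citations, ...

@dataclass
class Director:
    agents: list[tuple[str, Callable[[PipelineState], PipelineState]]]
    trace: list[str] = field(default_factory=list)

    def run(self, state: PipelineState) -> PipelineState:
        for name, agent in self.agents:
            state = agent(state)     # each agent handles one subtask
            self.trace.append(name)  # provenance: which agent ran, in order
        return state

def normalize_metadata(state: PipelineState) -> PipelineState:
    state["title"] = state["title"].strip().title()
    return state

director = Director(agents=[("metadata", normalize_metadata)])
print(director.run({"title": "  journal automation  "}))
```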
Features
- End-to-end journal automation: intake, enrichment, drafting, formatting, and export.
- AI collaboration: Gemini for understanding, Groq LLaMA for generation, CORE for sources.
- Structured outputs: JSON payloads, LaTeX sources, Markdown manuscripts, and PDF exports.
- LaTeX-ready pipelines: ready-to-compile templates and bibliography management.
- RAG-enabled literature integration: pull citations and snippets from trusted sources.
- Fine-grained control: adjustable generation temperature, max tokens, and agent budgets.
- Versioned outputs: deterministic seeds and content versioning for reproducibility.
- Authentication-friendly API: token-based access suitable for internal tooling.
- Extensible architecture: plug in new AI agents or data sources with minimal changes.
- Local and cloud-ready deployments: Docker, Kubernetes, or raw Python environments.
Tech Stack
- Language: Python
- Web framework: FastAPI
- AI components: Google Gemini, Groq LLaMA
- Knowledge source: CORE API
- Data formats: JSON, Markdown, LaTeX, PDF
- Deployment: Docker, optional Kubernetes
- Testing: pytest and HTTPX
- Documentation: Markdown, with ready LaTeX templates
Architecture & Data Flow
- Ingest: Users submit metadata and preferences via a REST API.
- Validate: Pydantic request models validate input, normalize terms, and resolve author details.
- Enrich: The system calls Gemini to interpret intent, extract key questions, and shape the article outline.
- Generate: LLaMA-based generators draft sections, figures, and captions under policy controls.
- Source: CORE fetches relevant literature; results are ranked and injected into the draft through a RAG loop.
- Assemble: The pipeline composes a complete manuscript, with bibliographic references, figures, and appendices.
- Output: The final artifacts are serialized to JSON, LaTeX, Markdown, and PDF.
- Audit: Every step is logged, with data lineage preserved for reproducibility and compliance.
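In condensed form, the stages chain together roughly like this; stub functions stand in for the real Gemini, LLaMA, and CORE calls, and the names are illustrative:

```python
import json
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("pipeline")

# Stub stages mirroring the flow above; the real implementations call
# Gemini (enrich), Groq LLaMA (generate), and CORE (source).
def validate(article: dict) -> dict:
    assert article["title"], "title is required"
    return article

def enrich(article: dict) -> dict:
    article["outline"] = ["Introduction", "Methods", "Results"]
    return article

def generate(article: dict) -> dict:
    article["sections"] = {h: f"Draft text for {h}." for h in article["outline"]}
    return article

def run_pipeline(article: dict) -> str:
    for stage in (validate, enrich, generate):
        article = stage(article)
        log.info("stage %s complete", stage.__name__)  # audit: one log entry per step
    return json.dumps(article, indent=2)               # output: serialized JSON artifact

print(run_pipeline({"title": "A Novel Approach to Journal Automation"}))
```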
Data Models & Metadata
- Article: id, title, authors, affiliations, abstract, keywords, metadata version, publication date, language.
- Section: heading level, title, content, citations.
- Figures & Tables: captions, references, image/table payloads, sources.
- Citations: id, DOI, URL, retrieved_at, source_agency.
- Enrichment: notes, reliability_score, confidence, used_sources.
- AIRun: id, model, seed, prompts, tokens_used, duration, status.
- Provenance: input_version, output_version, data_pipeline_trace.
- Users & Permissions: roles, tokens, access scopes.
Getting Started: Prerequisites
- Python 3.11+ installed locally or in your environment.
- Virtual environment support (venv) or conda.
- Basic familiarity with REST and command line.
Installation
- Create a working directory and set up a virtual environment.
- Install dependencies with pip install -r requirements.txt.
- Prepare a .env file with API keys and endpoints for Gemini, LLaMA (Groq), and CORE.
Environment and Secrets
- GOOGLE_GEMINI_API_KEY: your Gemini API key
- GROQ_LLAMA_ENDPOINT: address of the Groq LLaMA service
- CORE_API_KEY: your CORE API key
- DATABASE_URL: connection string for metadata and artifacts storage
- APP_SECRET_KEY: used for session and token signing
An example .env snippet:

```
GOOGLE_GEMINI_API_KEY=your_key_here
GROQ_LLAMA_ENDPOINT=http://localhost:8001
CORE_API_KEY=your_core_key
DATABASE_URL=sqlite:///journal.db
APP_SECRET_KEY=change-me
```
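One way to load these values at startup is with pydantic-settings; this is a sketch under the assumption that the project uses Pydantic v2 (the package itself is not confirmed by this README):

```python
from pydantic_settings import BaseSettings, SettingsConfigDict

class Settings(BaseSettings):
    # Field names map to the environment variables above (case-insensitive).
    google_gemini_api_key: str
    groq_llama_endpoint: str = "http://localhost:8001"
    core_api_key: str
    database_url: str = "sqlite:///journal.db"
    app_secret_key: str

    model_config = SettingsConfigDict(env_file=".env")

settings = Settings()  # reads .env; raises a validation error on missing keys
```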
Project Structure
```
app/
  main.py                 # FastAPI application and router setup
  models.py               # Pydantic models for request/response schemas
  schemas.py              # database schemas and serialization helpers
  services/
    ai.py                 # orchestrates Gemini and LLaMA calls
    core.py               # pulls literature from CORE
    drafting.py           # content assembly and formatting
    export.py             # JSON, LaTeX, Markdown, and PDF generation
  routers/
    articles.py           # CRUD and generation endpoints
  templates/              # latex/ and markdown/ templates for outputs
tests/
  test_endpoints.py       # API contract tests
  test_ai_flows.py        # AI agent flow tests
docs/
  architecture.md
  api_reference.md
  usage_examples.md
scripts/
  install-fastapi-journal.sh  # installer script to bootstrap dependencies (see Releases)
docker/
  Dockerfile
  docker-compose.yml
```
Getting Started: Quick Start Guide
- Clone the repo: git clone https://github.com/aligoraya202/FastAPI-Journal-Automation-with-Generative-And-AI-Compound-AI-System.git
- Create a virtual environment and activate it: python -m venv venv && source venv/bin/activate
- Install dependencies: pip install -r requirements.txt
- Copy and fill the environment file: cp .env.example .env
- Start the API server: uvicorn app.main:app --reload --port 8000
- Access the API docs at: http://localhost:8000/docs
Usage & API Endpoints
- POST /articles/generate
  - Creates a new article draft from provided metadata
  - Request body includes: title, authors, abstract, keywords, metadata_version, preferred_language, target_journal
  - Response includes: article_id, status, generation_summary
- GET /articles/{article_id}
  - Retrieves a complete article draft with sections and citations
- PATCH /articles/{article_id}
  - Updates metadata or enrichment notes
- POST /articles/{article_id}/export
  - Triggers export to JSON, LaTeX, Markdown, or PDF
- POST /ai/feedback
  - Submits feedback to improve future generations
- DELETE /articles/{article_id}
  - Removes drafts and associated artifacts
Examples
- Submitting metadata
  - Title: "A Novel Approach to Journal Automation with AI"
  - Authors: ["Alex Doe", "Jamie Lee"]
  - Abstract: "This study explores automated generation of journal content using a compound AI system."
  - Keywords: ["journal automation", "AI", "NLP", "RAG", "LaTeX"]
- Generating content
  - Call POST /articles/generate with the above payload
  - The system routes tasks to Gemini for intent capture, LLaMA for drafting, and CORE for sources
- Exporting to LaTeX
  - POST /articles/{article_id}/export with format=latex
  - Returns a .tex file ready for compilation
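Put together as a script, the flow above might look like the following httpx sketch. The endpoint paths come from this README; the base URL and the way the export format is passed (query parameter vs. JSON body) are assumptions:

```python
import httpx

BASE_URL = "http://localhost:8000"  # assumes a local dev server

payload = {
    "title": "A Novel Approach to Journal Automation with AI",
    "authors": ["Alex Doe", "Jamie Lee"],
    "abstract": ("This study explores automated generation of journal "
                 "content using a compound AI system."),
    "keywords": ["journal automation", "AI", "NLP", "RAG", "LaTeX"],
}

with httpx.Client(base_url=BASE_URL, timeout=60.0) as client:
    # Kick off generation: Gemini plans, LLaMA drafts, CORE grounds.
    created = client.post("/articles/generate", json=payload)
    created.raise_for_status()
    article_id = created.json()["article_id"]

    # Export the finished draft as LaTeX (format passed as a query
    # parameter here; the real API may expect it in the body instead).
    exported = client.post(f"/articles/{article_id}/export",
                           params={"format": "latex"})
    exported.raise_for_status()
    print(exported.text)
```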
AI Integrations
- Google Gemini
  - Role: Semantic interpretation, task planning, and prompt shaping
  - Usage: Used in the early stage to understand intent and generate an outline
- Groq LLaMA
  - Role: High-quality text generation for sections, captions, and summaries
  - Usage: Generates draft content with controlled prompts and seeds to ensure determinism
- CORE API
  - Role: Literature retrieval and grounding
  - Usage: Pulls citations, snippets, and context to ground the manuscript
- RAG Loop
  - Pulls sources, ranks them by credibility, and injects them into the draft
  - Ensures citations align with the article content
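The ranking-and-injection step can be pictured with a toy sketch; the credibility scoring here is purely illustrative, not the system's actual heuristic:

```python
from dataclasses import dataclass

@dataclass
class SourceDoc:
    title: str
    snippet: str
    citation_count: int  # stand-in for a credibility signal

def rank_sources(query: str, docs: list[SourceDoc], top_k: int = 3) -> list[SourceDoc]:
    # Toy credibility ranking: keyword overlap weighted by citation count.
    terms = set(query.lower().split())
    def score(d: SourceDoc) -> float:
        overlap = len(terms & set(d.snippet.lower().split()))
        return overlap * (1 + d.citation_count / 100)
    return sorted(docs, key=score, reverse=True)[:top_k]

def inject(draft: str, sources: list[SourceDoc]) -> str:
    # Append grounded references; a real loop would interleave inline citations.
    refs = "\n".join(f"[{i + 1}] {s.title}" for i, s in enumerate(sources))
    return f"{draft}\n\nReferences:\n{refs}"

docs = [SourceDoc("RAG in practice", "retrieval augmented generation for journals", 120),
        SourceDoc("Unrelated work", "fluid dynamics of pipes", 300)]
print(inject("Draft paragraph on journal automation.",
             rank_sources("retrieval augmented journal generation", docs)))
```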
Output Formats
- JSON: Structured payload with all sections, citations, and metadata
- Markdown: Human-readable manuscript with headers, lists, and embedded citations
- LaTeX: Clean, publisher-ready source with bibliographies
- PDF: Ready-to-publish PDF generated from LaTeX or Markdown pipelines
LaTeX & Publication Readiness
- The LaTeX templates are designed to be drop-in replacements for common journal formats
- Bibliography management relies on BibTeX or Biber, depending on the template
- Figures and tables are positioned to meet typical submission guidelines
- The system can embed DOI links and cross-references to related works
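To make the export step concrete, here is a sketch of filling a LaTeX template from Python using string.Template; the repository's actual templating mechanism is not specified in this README and may well be something else, such as Jinja2:

```python
from string import Template

# $title, $authors, and $body are placeholders in an illustrative template.
LATEX_TEMPLATE = Template(r"""\documentclass{article}
\title{$title}
\author{$authors}
\begin{document}
\maketitle
$body
\end{document}
""")

tex = LATEX_TEMPLATE.substitute(
    title="A Novel Approach to Journal Automation with AI",
    authors="Alex Doe \\and Jamie Lee",
    body="Draft content goes here.",
)
print(tex)  # ready to hand to pdflatex or xelatex
```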
Deployment & DevOps
- Local development: Simple Python environment with uvicorn
- Docker: A ready-to-run container image with all dependencies
- Kubernetes: Small deployment manifests that scale the AI worker pool
- CI: GitHub Actions workflows for linting, tests, and build of artifacts
- Observability: Basic logging, plus optional OpenTelemetry instrumentation for traces
Security & Privacy
- The API uses token-based authentication for protected routes
- Data is stored with versioning to enable audit trails
- All AI prompts are designed to minimize the leakage of sensitive content
- Access to external services (Gemini, LLaMA, CORE) is controlled by API keys and endpoints
Testing & Quality
- End-to-end tests simulate article creation from metadata to export
- Unit tests cover AI orchestration, formatting, and export logic
- Mock services stand in for Gemini, LLaMA, and CORE during tests
- Continuous integration runs tests on each pull request
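A mocked-service test might look like the following self-contained sketch using FastAPI's TestClient; the repository's real tests would import app.main and override its AI/CORE dependencies the same way:

```python
from fastapi import Depends, FastAPI
from fastapi.testclient import TestClient

# Self-contained stand-in app; names are illustrative.
app = FastAPI()

def ai_service() -> dict:
    return {"draft": "real model output"}  # would call Gemini/LLaMA in production

@app.post("/articles/generate")
def generate(ai: dict = Depends(ai_service)) -> dict:
    return {"article_id": "a1", "status": "drafted", "summary": ai["draft"]}

def test_generate_uses_mocked_ai():
    # Swap the AI dependency for a deterministic mock, as described above.
    app.dependency_overrides[ai_service] = lambda: {"draft": "mocked output"}
    client = TestClient(app)
    response = client.post("/articles/generate")
    assert response.status_code == 200
    assert response.json()["summary"] == "mocked output"
```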
Troubleshooting
- Common issues: missing environment variables, incorrect API keys, or network restrictions
- Check the logs for AIRun entries to see seeds, prompts, and durations
- If CORE returns empty results, verify CORE_API_KEY and network access
- If LaTeX export fails, confirm LaTeX toolchain (pdflatex/xelatex) availability
- For Docker users, ensure proper volumes and environment variable passthrough
Roadmap
- Expand multi-language support for metadata and drafts
- Add templates for more publishers and journal formats
- Improve citation disambiguation and DOI resolution
- Enhance accessibility features and screen reader support
- Introduce a plugin system to integrate other AI providers
Contributing
- We welcome contributions. Start by forking the repository and creating a feature branch.
- Follow the code style used in the project: type hints, clear variable names, and small functions.
- Add or update tests for any new feature.
- Run tests locally before opening a pull request.
- Documentation updates are welcome for new endpoints, workflows, or templates.
Licensing
- This project is licensed under the MIT License. See the LICENSE file for details.
Acknowledgments
- Special thanks to the teams behind Google Gemini, Groq LLaMA, and the CORE API for enabling research and practical tooling in journal automation.
- Gratitude to the open-source community for sharing best practices in FastAPI, AI workflows, and document preparation.
Releases
- For the latest installers and release artifacts, visit the Releases page: https://github.com/aligoraya202/FastAPI-Journal-Automation-with-Generative-And-AI-Compound-AI-System/releases
- From the Releases page, download install-fastapi-journal.sh and execute it to bootstrap the system. The installer sets up dependencies, config, and sample data to help you get started quickly.
Notes on Usage and Best Practices
- Start with a small metadata package to validate the flow. Gradually scale to longer manuscripts.
- Use the enrichment notes to guide the AI in tone, structure, and audience.
- Keep generation seeds consistent when you want repeatable outputs across environments.
- Leverage CORE to fetch diverse sources while maintaining a credible citation strategy.
- Rotate API keys regularly, and monitor usage to catch drift in AI behavior early.
Glossary
- AI Agent: A module that performs a dedicated task in the pipeline.
- RAG: Retrieval-Augmented Generation, a method that augments generation with external sources.
- LaTeX: A high-quality typesetting system commonly used for scientific documents.
- CORE: An API providing access to a wide literature corpus for grounding content.
- Gemini: Google’s AI model used for semantic understanding and task framing.
- LLaMA: Meta’s family of open language models, served in this project through Groq’s inference platform.
Techniques and Best Practices
- Keep prompts concise but expressive. The more precise you are, the better the result.
- Define a clear outline before drafting long sections.
- Predefine a citation strategy to ensure consistency across the manuscript.
- Maintain a clean data model that tracks provenance and edits.
Visuals and Design
- Hero image showcases the fusion of AI and scholarly writing.
- Architecture diagram highlights the flow between ingestion, AI processing, RAG integration, and output.
- The design favors readability, with a calm color palette and accessible typography.
Appendix: API Reference Highlights
- POST /articles/generate
  - Input: title, authors, abstract, keywords, metadata_version, language, target_journal
  - Output: article_id, status, summary
- GET /articles/{article_id}
  - Output: full article with sections, figures, references
- POST /articles/{article_id}/export
  - Input: format (json|latex|markdown|pdf)
  - Output: export artifact or download link
- POST /ai/feedback
  - Input: article_id, feedback_type, notes
  - Output: acknowledgement and next steps
Appendix: Environment Variables (Expanded)
- GOOGLE_GEMINI_API_KEY: Gemini service key
- GROQ_LLAMA_ENDPOINT: LLaMA server address
- CORE_API_KEY: CORE service key
- DATABASE_URL: local or cloud database connection
- APP_SECRET_KEY: session and token signing
- LOG_LEVEL: debug|info|warning|error
- ENABLE_HEDGING: true|false, controls safety hedges in generation
- OUTPUT_FORMATS: json, latex, markdown, pdf (comma-separated)
Appendix: Sample Commands
- Create a virtual environment and activate it
- Install dependencies from requirements
- Copy example environment file and fill in keys
- Start the API with uvicorn
- Use the API docs for endpoint exploration
Final Note
- This repository embraces a practical, step-by-step approach to automating journal article production with a compound AI system. It blends semantic understanding, high-quality generation, and credible literature grounding to produce publication-ready manuscripts.