A robust, asynchronous web scraper for Isranews.org with a modern GUI and CLI support.
isranews-scraper is a high-performance web scraping tool designed specifically for isranews.org. It allows users to extract news articles from various categories efficiently using asynchronous operations.
Key features include:
- Asynchronous Scraping: Built with `asyncio` and `playwright` for maximum speed and concurrency.
- Dual Interface: Offers both a Command Line Interface (CLI) for automation and a Graphical User Interface (GUI) for ease of use.
- Multi-Format Export: Save data in CSV, Excel, JSON, or TXT formats.
- Smart Filtering: Filter news by date and automatically merge new data with existing files.
- Robust Error Handling: Handles network issues and encoding errors gracefully.
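The concurrency model behind the first feature can be sketched in plain `asyncio`. This is a hypothetical illustration, not the project's actual code: the real scraper drives Playwright browser pages, but the pattern of bounding in-flight work with a semaphore (mirroring the `--max-threads` option) is the same.

```python
import asyncio

# Assumption: a limit of 5 mirrors the documented --max-threads default.
MAX_CONCURRENT = 5

async def scrape_page(page_number: int, sem: asyncio.Semaphore) -> str:
    async with sem:  # at most MAX_CONCURRENT pages are scraped at once
        await asyncio.sleep(0)  # stand-in for browser navigation and parsing
        return f"page-{page_number}"

async def scrape_range(start: int, end: int) -> list[str]:
    sem = asyncio.Semaphore(MAX_CONCURRENT)
    tasks = [scrape_page(n, sem) for n in range(start, end + 1)]
    # gather() preserves submission order, so results line up with page numbers
    return await asyncio.gather(*tasks)

results = asyncio.run(scrape_range(1, 5))
```

Because every page is an awaitable task rather than a blocking call, slow pages do not stall the rest of the batch.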
This project is built using robust Python libraries to ensure reliability and performance.
To get a local copy up and running, follow these simple steps.
- Python 3.8 or higher
- pip
- Clone the repo: `git clone https://github.com/naravid19/isranews-scraper.git`
- Install Python packages: `pip install -r requirements.txt`
- Install Playwright browsers: `python -m playwright install`
For a user-friendly experience, run the GUI application:
`python isranews_scraper_gui.py`

- Select Categories: Choose one or more news categories from the list.
- Set Range: Define the start and end pages to scrape.
- Filter: Optionally set a date to filter news items.
- Export: Choose your desired output format and filename.
- Start: Click the "Start Scraping" button.
For automation or server environments, use the CLI:
`python isranews_scraper.py -c "ศูนย์ข่าวสืบสวน" -s 1 -e 5 -o investigative_news`

Arguments:
- `-c`, `--categories`: Category name or index (comma-separated). Use "all" for everything.
- `-s`, `--start`: Start page number (default: 1).
- `-e`, `--end`: End page number (0 for all).
- `-o`, `--output`: Output filename (without extension).
- `-f`, `--format`: Output format (`csv`, `excel`, `json`, `txt`).
- `-d`, `--date`: Filter date (YYYY-MM-DD).
- `--max-threads`: Maximum concurrent pages (default: 5).
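The argument surface above maps naturally onto `argparse`. The sketch below is an illustrative approximation of that interface, not the project's actual parser; details such as defaults for `--output` may differ in `isranews_scraper.py`.

```python
import argparse

def build_parser() -> argparse.ArgumentParser:
    # Hypothetical reconstruction of the documented CLI options.
    p = argparse.ArgumentParser(prog="isranews_scraper.py")
    p.add_argument("-c", "--categories", default="all",
                   help='Category name or index, comma-separated; "all" for everything')
    p.add_argument("-s", "--start", type=int, default=1, help="Start page number")
    p.add_argument("-e", "--end", type=int, default=0, help="End page (0 for all)")
    p.add_argument("-o", "--output", help="Output filename (without extension)")
    p.add_argument("-f", "--format", choices=["csv", "excel", "json", "txt"],
                   default="csv", help="Output format")
    p.add_argument("-d", "--date", help="Filter date (YYYY-MM-DD)")
    p.add_argument("--max-threads", type=int, default=5,
                   help="Maximum concurrent pages")
    return p

args = build_parser().parse_args(["-c", "all", "-s", "1", "-e", "5", "-o", "news"])
```

Options not supplied on the command line fall back to their defaults, e.g. `args.max_threads` is 5 here.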
- Migrated to Asynchronous Architecture (`asyncio` + `playwright`)
- Modern Dark-Themed GUI (`PyQt6`)
- Multi-format Export Support
- Automatic Data Merging
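"Automatic Data Merging" means newly scraped items are combined with an existing export without creating duplicates. The sketch below shows one plausible approach; keying on the article URL is an assumption, and the project's actual merge logic may use a different field or strategy.

```python
def merge_items(existing: list[dict], new: list[dict], key: str = "url") -> list[dict]:
    """Append only items whose key is not already in the existing data."""
    seen = {item[key] for item in existing}  # URLs already on disk (assumed key)
    merged = list(existing)
    for item in new:
        if item[key] not in seen:
            merged.append(item)
            seen.add(item[key])
    return merged

old = [{"url": "https://isranews.org/a", "title": "A"}]
fresh = [{"url": "https://isranews.org/a", "title": "A"},
         {"url": "https://isranews.org/b", "title": "B"}]
merged = merge_items(old, fresh)  # only the previously unseen article is appended
```

Re-running a scrape over an overlapping page range therefore grows the file only by genuinely new articles.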
- Add support for downloading article images/attachments
- Implement scheduled scraping (Cron/Task Scheduler integration)
- REST API for remote triggering
See the open issues for a full list of proposed features (and known issues).
Contributions are what make the open source community such an amazing place to learn, inspire, and create. Any contributions you make are greatly appreciated.
If you have a suggestion that would make this better, please fork the repo and create a pull request. You can also simply open an issue with the tag "enhancement". Don't forget to give the project a star! Thanks again!
- Fork the Project
- Create your Feature Branch (`git checkout -b feature/AmazingFeature`)
- Commit your Changes (`git commit -m 'Add some AmazingFeature'`)
- Push to the Branch (`git push origin feature/AmazingFeature`)
- Open a Pull Request
Distributed under the MIT License. See LICENSE for more information.
Naravid - @naravid19
Project Link: https://github.com/naravid19/isranews-scraper