A comprehensive Python toolkit for scraping, analyzing, and extracting insights from job postings on LinkedIn.
- Job ID Extraction: Find relevant job postings based on custom keywords and trigger words
- Job Details Retrieval: Extract comprehensive information from LinkedIn job postings
- Skill Analysis: Automatically identify required skills in job descriptions
- Experience Requirements: Extract years of experience requirements
- Relevant Section Extraction: Focus on the most important parts of job descriptions
- Text Processing: NLP-based analysis of job postings
- Blacklisting: Prevent duplicate job postings across multiple searches
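The trigger-word and blacklist features above can be pictured as a simple filter over candidate postings. This is a minimal sketch, not the toolkit's actual implementation: `filter_jobs`, the `(job_id, title)` input shape, and the substring matching rule are all assumptions made for illustration.

```python
def filter_jobs(candidates, trigger_words, blacklist):
    """Keep IDs whose title contains a trigger word and that are not blacklisted.

    Hypothetical helper: `candidates` is assumed to be an iterable of
    (job_id, title) pairs; the real toolkit may structure this differently.
    """
    kept = []
    seen = set(blacklist)
    for job_id, title in candidates:
        if job_id in seen:
            continue  # already returned by an earlier search, or a duplicate
        # Pad with spaces so a trigger word with a trailing space can match
        # at the end of a title as well.
        padded = f" {title.lower()} "
        if any(word.lower() in padded for word in trigger_words):
            kept.append(job_id)
            seen.add(job_id)
    return kept


candidates = [
    ("101", "Senior Data Scientist"),
    ("102", "Frontend Developer"),
    ("103", "ML Engineer"),
    ("104", "Machine Learning Researcher"),
]
print(filter_jobs(candidates, ["data", "ml ", "machine learning"], blacklist={"104"}))
# ['101', '103']
```

Plain substring matching is easy to reason about but can over-match short trigger words, so the list may need tuning for your search.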
- Clone this repository:
git clone https://github.com/yourusername/linkedin-job-scraper.git
cd linkedin-job-scraper
- Create a virtual environment and install the required dependencies:
python -m venv .env
# On Windows
.env\Scripts\activate
# On Linux/Mac
source .env/bin/activate
pip install -r requirements.txt
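# Import only the functions used below instead of a wildcard import
from functions import (
    get_job_ids,
    fetch_job_details,
    job_info_extractor,
    save_jobs,
    blacklist_job_ids,
)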
# 1. Define your search parameters
trigger_words = ['data', 'machine learning', 'ai', 'ml ', 'statist', 'artificial intelligence', 'python']
keyword = 'Data science'
skills = ['python', 'sql', 'machine learning', 'pandas', 'numpy', 'tensorflow', 'pytorch']
# 2. Get job IDs matching your criteria
job_ids = get_job_ids(trigger_words, keyword, search_count=350, headers=None,
                      internship=False, blacklist=True)
# 3. Fetch detailed job information
data = fetch_job_details(job_ids, headers=None)
# 4. Extract relevant information from job descriptions
data_info = job_info_extractor(data, skills=skills)
# 5. Save the results to CSV with today's date in the filename
save_jobs(data_info, keyword, date=True)
# 6. Add the current job IDs to blacklist for future searches
blacklist_job_ids(keyword, rows=None, cleanup=True, date=True)

The default location is set to the Netherlands (geoid=102890719). To change the location, update the geoid parameter.
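This README does not show where the geoid value is set, so the following is only a rough sketch of how a location ID typically ends up in a search query string. The helper `build_search_query` and the query parameter names `keywords`, `geoId`, and `start` are assumptions for illustration; check the source of `get_job_ids` for the names it actually uses.

```python
from urllib.parse import urlencode

DEFAULT_GEOID = "102890719"  # the Netherlands, as stated above


def build_search_query(keyword, geoid=DEFAULT_GEOID, start=0):
    """Build a URL query string for a job search (hypothetical helper).

    The parameter names below are assumed, not taken from the toolkit.
    """
    params = {"keywords": keyword, "geoId": geoid, "start": start}
    return urlencode(params)


print(build_search_query("Data science"))
# keywords=Data+science&geoId=102890719&start=0

# Passing a different location ID (placeholder value, not a real location)
print(build_search_query("Data science", geoid="123456789"))
# keywords=Data+science&geoId=123456789&start=0
```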
You can provide custom headers for the HTTP requests, which may reduce the chance of requests being blocked or rate limited:
headers = {
    'User-Agent': 'Your User Agent String',
    'Accept': '*/*'
    # Add other headers as needed
}
job_ids = get_job_ids(trigger_words, keyword, headers=headers)

Instead of providing a predefined list of skills, you can automatically extract common skills from job descriptions:
data_info = job_info_extractor(data, skills='extract')

- get_job_ids: Scrapes LinkedIn job search results to find job IDs matching your criteria.
- fetch_job_details: Fetches detailed information for each job ID, including job title, company name, location, and full description.
- job_info_extractor: Extracts relevant information from job descriptions, including skills and years of experience.
- save_jobs: Saves the job data to a CSV file, optionally with the date in the filename.
- Job loading: Loads previously saved job data from a CSV file.
- blacklist_job_ids: Manages a blacklist of job IDs to avoid duplicates in future searches.
- Automatic skill extraction: Uses NLP techniques to automatically identify common skills mentioned in job postings.
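To make the skill and experience extraction more concrete, here is a minimal pattern-matching sketch. It is not the code behind `job_info_extractor`. The helpers `extract_years_of_experience` and `find_skills`, the regular expression, and the choice to report the smallest number of years mentioned are all assumptions for illustration.

```python
import re

# Matches phrases such as "3 years", "3+ years", "3-5 years", "2 yrs".
EXPERIENCE_PATTERN = re.compile(
    r"(\d{1,2})\s*(?:\+|(?:-|to)\s*\d{1,2})?\s*(?:years?|yrs?)",
    re.IGNORECASE,
)


def extract_years_of_experience(description):
    """Return the smallest number of years mentioned, or None if none is found."""
    matches = EXPERIENCE_PATTERN.findall(description)
    return min(int(m) for m in matches) if matches else None


def find_skills(description, skills):
    """Return the skills from `skills` that appear as whole terms in the text."""
    text = description.lower()
    found = []
    for skill in skills:
        # Require a non-alphanumeric character (or the text boundary) on both
        # sides, so 'sql' does not match inside 'nosql'.
        pattern = r"(?<![a-z0-9])" + re.escape(skill.lower()) + r"(?![a-z0-9])"
        if re.search(pattern, text):
            found.append(skill)
    return found


description = (
    "We require 3+ years of experience with Python and SQL; "
    "5 years preferred. Pandas is a plus."
)
print(extract_years_of_experience(description))  # 3
print(find_skills(description, ["python", "sql", "pandas", "tensorflow"]))
# ['python', 'sql', 'pandas']
```

Taking the minimum treats the lowest figure as the entry requirement. A posting that mentions years in another context (for example, company age) would need a stricter pattern.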
- LinkedIn may change their API or HTML structure, requiring updates to the scraping functions
- Excessive scraping may lead to IP blocks - use responsibly with delays between requests
- The skill extraction is based on pattern matching and may require customization for specific domains
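One way to space out requests, as the notes above recommend, is to sleep for a random interval between calls. This is a minimal sketch under stated assumptions: `fetch_with_delay` is a hypothetical wrapper, not part of the toolkit, and `fetch` stands in for whatever function retrieves a single job posting.

```python
import random
import time


def fetch_with_delay(job_ids, fetch, min_delay=1.0, max_delay=3.0, sleep=time.sleep):
    """Call fetch(job_id) for each ID, pausing a random interval between calls.

    Hypothetical helper; `sleep` is injectable so the pause can be
    replaced in tests.
    """
    results = {}
    for index, job_id in enumerate(job_ids):
        if index > 0:
            # Randomized pause so requests are not sent at a fixed rhythm
            sleep(random.uniform(min_delay, max_delay))
        results[job_id] = fetch(job_id)
    return results


# Stand-in for a real request function, so the example runs offline
def fake_fetch(job_id):
    return {"job_id": job_id, "title": f"Job {job_id}"}


details = fetch_with_delay(["101", "102"], fake_fetch, min_delay=0.0, max_delay=0.0)
print(details["102"]["title"])  # Job 102
```

The delay bounds here are illustrative; suitable values depend on how many postings you fetch and how the site responds.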
This project is licensed under the GNU License - see the LICENSE file for details.
This tool is for educational purposes only. Web scraping may violate the terms of service of websites. Use responsibly and at your own risk.