This is a simple yet robust Python script for cleaning CSV files. It removes duplicate rows, replaces missing values with empty strings, and saves the cleaned data to a new file. It also includes basic error handling for missing files and corrupted input.
Features
- Removes duplicate rows
- Replaces missing values (NaN) with empty strings
- Handles missing or empty input files gracefully
- Reports how many duplicate rows were removed
- Easy to use via command-line arguments
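The two cleaning steps listed above can be sketched with pandas as follows. This is a minimal illustration, not the script's actual source: the function name `clean_dataframe` and its return shape are assumptions made for this example.

```python
import pandas as pd


def clean_dataframe(df):
    """Hypothetical helper: return (cleaned DataFrame, number of duplicate rows removed)."""
    # Drop rows that are exact duplicates of an earlier row.
    deduped = df.drop_duplicates()
    removed = len(df) - len(deduped)
    # Replace missing values (NaN/None) with empty strings.
    cleaned = deduped.fillna("")
    return cleaned, removed


# Example with text columns: one duplicate row and one missing value.
sample = pd.DataFrame({"name": ["a", "a", "b"], "city": ["x", "x", None]})
cleaned, removed = clean_dataframe(sample)
print(f"Removed {removed} duplicate row(s)")
```

Counting duplicates as the difference in row counts before and after `drop_duplicates` is one simple way to produce the reported number.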
Requirements
- Python 3.x
- pandas library
Install dependencies with:
pip install pandas
Usage
Run the script from the command line:
python clean_csv.py input.csv output.csv
Arguments
- input.csv: The path to the original CSV file you want to clean.
- output.csv: The path where the cleaned CSV file should be saved.
Error Handling
- If the input file does not exist, the script will notify you.
- If the input file is empty or corrupted, an appropriate error message will be shown.
- If the script is run with incorrect arguments, it will display usage instructions.
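The error cases above might be handled along these lines. This is a sketch under stated assumptions: the function name `load_csv`, the `USAGE` text, and the exact message wording are hypothetical and may differ from what the script prints. The exceptions themselves (`FileNotFoundError`, `pd.errors.EmptyDataError`, `pd.errors.ParserError`) are the ones `pd.read_csv` raises for missing, empty, and malformed files.

```python
import pandas as pd

# Hypothetical usage text; the real script's wording may differ.
USAGE = "Usage: python clean_csv.py input.csv output.csv"


def load_csv(argv):
    """Return (DataFrame, None) on success or (None, error message) on failure.

    argv is the argument list without the script name, e.g. sys.argv[1:].
    """
    # Exactly two arguments are expected: input path and output path.
    if len(argv) != 2:
        return None, USAGE
    input_path, _output_path = argv
    try:
        return pd.read_csv(input_path), None
    except FileNotFoundError:
        return None, f"Error: input file not found: {input_path}"
    except pd.errors.EmptyDataError:
        return None, f"Error: input file is empty: {input_path}"
    except pd.errors.ParserError:
        return None, f"Error: input file could not be parsed: {input_path}"
```

A caller could pass `sys.argv[1:]`, print the message, and exit with a non-zero status whenever the second element is not `None`.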
License
This project is open source and free to use, with no license restrictions.