CNNs for Text Classification

Mentors 👨‍🏫:

Overview 🕵️:

This project applies machine learning, specifically natural language processing (NLP), to text classification. Students implement and train a one-dimensional (1D) convolutional neural network (CNN) that takes a piece of text as input and outputs a class label for it.
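To make the idea of a 1D CNN over text concrete, here is a minimal, framework-free sketch of a single forward pass: each filter slides over windows of consecutive token embeddings, the strongest response per filter is kept (max-over-time pooling), and a sigmoid turns the result into a probability. The function name, argument layout, and toy numbers are assumptions made for illustration; the project notebooks may build the model differently.

```python
import math

def conv1d_text_classifier(embedded, filters, biases, out_weights, out_bias):
    """Forward pass: 1D convolution -> ReLU -> max-over-time pooling -> sigmoid.

    embedded:    list of token vectors, shape [seq_len][emb_dim]
    filters:     list of filters, each of shape [kernel_size][emb_dim]
    biases:      one bias per filter
    out_weights: one weight per filter for the final dense layer
    Assumes seq_len >= kernel_size for every filter (pad the text otherwise).
    """
    pooled = []
    for filt, bias in zip(filters, biases):
        k = len(filt)
        activations = []
        # Slide the filter over every window of k consecutive tokens.
        for start in range(len(embedded) - k + 1):
            total = bias
            for offset in range(k):
                token_vec = embedded[start + offset]
                total += sum(w * x for w, x in zip(filt[offset], token_vec))
            activations.append(max(0.0, total))  # ReLU
        # Max-over-time pooling keeps the strongest response per filter.
        pooled.append(max(activations))
    logit = out_bias + sum(w * p for w, p in zip(out_weights, pooled))
    return 1.0 / (1.0 + math.exp(-logit))  # probability of the positive class

# Toy input: three tokens with 2-dimensional embeddings, one filter of size 2.
embedded = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
filters = [[[1.0, 0.0], [0.0, 1.0]]]
prob = conv1d_text_classifier(embedded, filters, [0.0], [1.0], 0.0)
```

In a trained model the filter weights are learned from data, and the token vectors typically come from an embedding layer or from pretrained word vectors.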

Datasets 📊:

Toxic Comments

Identify hate speech by classifying comments as toxic or not toxic.

Disclaimer: This dataset contains text that may be considered profane, vulgar, or offensive.

IMDB Movie Reviews

Classify the sentiment (negative or positive) of movie reviews.

Students are encouraged to try out other datasets of interest and to extend the task to multi-class text classification.

Bonus 🏆:

As a bonus task, students can write a script that uses the trained model to classify user input in real-time.
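One possible shape for such a script is a simple read-classify-print loop. The sketch below is an assumption about how it could be organised, not the project's own script: `toy_toxicity_score` is a hypothetical stand-in for the trained model, and the 0.5 threshold is an arbitrary illustrative choice.

```python
def toy_toxicity_score(text):
    """Hypothetical stand-in for the trained model: fraction of flagged words."""
    flagged = {"stupid", "idiot", "hate"}
    words = text.lower().split()
    if not words:
        return 0.0
    return sum(word.strip(".,!?") in flagged for word in words) / len(words)

def classify(text, score_fn, threshold=0.5):
    """Turn a score in [0, 1] into a label plus the score itself."""
    score = score_fn(text)
    label = "toxic" if score >= threshold else "not toxic"
    return label, score

def run_interactive(score_fn, read=input, write=print):
    """Classify each line the user types until a blank line or end of input."""
    while True:
        try:
            text = read("Enter text (blank line to quit): ")
        except EOFError:
            break
        if not text.strip():
            break
        label, score = classify(text, score_fn)
        write(f"{label} (score={score:.2f})")
```

Calling `run_interactive(toy_toxicity_score)` starts the loop. To use a real model, replace the stand-in with a function that tokenizes the text and returns the model's predicted probability.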

Requirements 📝:

Laptop/PC
Access to stable Internet
Google Colaboratory (Colab)

Setup ⚙️:

Install all the necessary libraries by running the following command in your Jupyter Notebook:

!pip install -r requirements.txt

Download the GloVe word embeddings

!wget http://nlp.stanford.edu/data/glove.6B.zip

Unzip GloVe

!unzip -q glove.6B.zip
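Once unzipped, the GloVe files are plain text with one word per line followed by its vector components, separated by spaces. A minimal sketch of reading such a file into a dictionary is shown below; the helper names `parse_glove` and `load_glove` are made up for this example, and the notebooks may load the vectors differently.

```python
def parse_glove(lines):
    """Map each word to its vector, given lines of the form 'word v1 v2 ... vn'."""
    embeddings = {}
    for line in lines:
        parts = line.rstrip().split(" ")
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        embeddings[parts[0]] = [float(value) for value in parts[1:]]
    return embeddings

def load_glove(path):
    """Read a GloVe text file from disk and parse it."""
    with open(path, encoding="utf-8") as handle:
        return parse_glove(handle)

# Tiny in-memory sample in the same format, so the parser can be tried
# without the full download.
sample = ["the 0.1 -0.2 0.3", "cat 1.0 2.0 3.0", ""]
vectors = parse_glove(sample)
```

With the real files, a call such as `load_glove("glove.6B.100d.txt")` would be the expected usage, assuming that file name is present in the unzipped archive.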

References

@InProceedings{maas-EtAl:2011:ACL-HLT2011,
  author    = {Maas, Andrew L.  and  Daly, Raymond E.  and  Pham, Peter T.  and  Huang, Dan  and  Ng, Andrew Y.  and  Potts, Christopher},
  title     = {Learning Word Vectors for Sentiment Analysis},
  booktitle = {Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies},
  month     = {June},
  year      = {2011},
  address   = {Portland, Oregon, USA},
  publisher = {Association for Computational Linguistics},
  pages     = {142--150},
  url       = {http://www.aclweb.org/anthology/P11-1015}
}

@article{buerle2019net2vis,
  title={Net2Vis -- A Visual Grammar for Automatically Generating Publication-Ready CNN Architecture Visualizations},
  author={Alex Bäuerle and Christian van Onzenoodt and Timo Ropinski},
  year={2019},
  eprint={1902.04394},
  archivePrefix={arXiv},
  primaryClass={cs.LG}
}

