This project applies machine learning, specifically natural language processing (NLP), to text classification. Students will implement and train a one-dimensional (1D) convolutional neural network (CNN) that takes a piece of text as input and outputs a class label for it.
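To make the architecture concrete, here is a minimal NumPy sketch of the forward pass of a 1D CNN text classifier: embedding lookup, a 1D convolution sliding over word positions, ReLU, global max pooling, and a single sigmoid output unit. The layer choices, the function name `cnn1d_forward`, and the use of plain NumPy instead of a deep-learning framework are assumptions for illustration. The project may use a different framework and layer stack, and training (backpropagation) is not shown.

```python
import numpy as np


def cnn1d_forward(token_ids, embedding, kernels, conv_bias, w_out, b_out):
    """Forward pass of a minimal 1D CNN binary text classifier (illustrative sketch).

    token_ids: (seq_len,) integer ids of one padded text
    embedding: (vocab_size, embed_dim) word-vector lookup table
    kernels:   (num_filters, kernel_size, embed_dim) convolution filters
    conv_bias: (num_filters,) one bias per filter
    w_out:     (num_filters,) weights of the single output unit
    b_out:     scalar bias of the output unit
    Returns the predicted probability of the positive class.
    """
    x = embedding[np.asarray(token_ids)]              # (seq_len, embed_dim)
    kernel_size = kernels.shape[1]
    seq_len = x.shape[0]
    if seq_len < kernel_size:
        raise ValueError("sequence must be at least as long as the kernel")
    # Slide each filter along the word axis ("valid" padding, stride 1).
    windows = np.stack([x[t:t + kernel_size] for t in range(seq_len - kernel_size + 1)])
    feature_maps = np.einsum("tkd,fkd->tf", windows, kernels) + conv_bias  # (positions, filters)
    feature_maps = np.maximum(feature_maps, 0.0)       # ReLU
    pooled = feature_maps.max(axis=0)                  # global max pooling over positions
    logit = pooled @ w_out + b_out                     # single dense output unit
    return float(1.0 / (1.0 + np.exp(-logit)))         # sigmoid -> probability


# Example with random (untrained) weights, only to show the shapes involved.
rng = np.random.default_rng(0)
vocab_size, embed_dim, num_filters, kernel_size = 20, 8, 4, 3
prob = cnn1d_forward(
    token_ids=[5, 2, 17, 9, 0, 0],                     # 0 = padding id (assumed)
    embedding=rng.normal(size=(vocab_size, embed_dim)),
    kernels=rng.normal(size=(num_filters, kernel_size, embed_dim)),
    conv_bias=np.zeros(num_filters),
    w_out=rng.normal(size=num_filters),
    b_out=0.0,
)
print(f"P(positive) = {prob:.3f}")
```

Global max pooling keeps, for each filter, only its strongest response anywhere in the text, so the classifier sees whether a local n-gram pattern occurred regardless of its position or of the text length.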
Identify hate speech by classifying comments as toxic or not toxic.
Disclaimer: This dataset contains text that may be considered profane, vulgar, or offensive.
Classify the sentiment (negative or positive) of movie reviews.
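Both tasks above are binary. One common setup, assumed here rather than taken from the project, is a single sigmoid output trained with binary cross-entropy, with the predicted probability thresholded at 0.5 at inference time. The label names in `TASK_LABELS` and the default threshold are illustrative.

```python
import numpy as np

# Hypothetical label names for the two binary tasks; index 0 = negative class, 1 = positive class.
TASK_LABELS = {
    "toxicity": ("not toxic", "toxic"),
    "sentiment": ("negative", "positive"),
}


def binary_cross_entropy(y_true, p_pred, eps=1e-7):
    """Mean binary cross-entropy between 0/1 labels and predicted probabilities."""
    y = np.asarray(y_true, dtype=float)
    p = np.clip(np.asarray(p_pred, dtype=float), eps, 1.0 - eps)  # avoid log(0)
    return float(-np.mean(y * np.log(p) + (1.0 - y) * np.log(1.0 - p)))


def to_label(prob, task, threshold=0.5):
    """Map a sigmoid output to a human-readable label for the given task."""
    negative, positive = TASK_LABELS[task]
    return positive if prob >= threshold else negative


print(to_label(0.83, "toxicity"))   # toxic
print(to_label(0.12, "sentiment"))  # negative
```

For imbalanced data such as toxic comments, where the toxic class is typically the minority, the 0.5 threshold is not necessarily the best choice; it can be tuned on a validation set.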
Students are encouraged to try out other datasets of interest and to extend the task to multi-class text classification.
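For multi-class classification, a common change (an assumption, not something the project prescribes) is to replace the single sigmoid unit with one output unit per class, apply softmax, and train with categorical cross-entropy; the predicted class is the argmax. The three class names in the example are hypothetical.

```python
import numpy as np


def softmax(logits):
    """Row-wise softmax; subtracting the max keeps exp() from overflowing."""
    z = np.asarray(logits, dtype=float)
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def categorical_cross_entropy(y_true, probs, eps=1e-7):
    """Mean cross-entropy for integer class ids y_true (n,) and probabilities probs (n, C)."""
    y = np.asarray(y_true, dtype=int)
    p = np.clip(np.asarray(probs, dtype=float), eps, 1.0)
    return float(-np.mean(np.log(p[np.arange(len(y)), y])))


def predict_class(logits, class_names):
    """Return the name of the highest-scoring class."""
    return class_names[int(np.argmax(logits))]


# Hypothetical three-class example (e.g. negative / neutral / positive).
classes = ["negative", "neutral", "positive"]
logits = np.array([0.2, 1.5, -0.3])
print(softmax(logits).round(3))
print(predict_class(logits, classes))  # neutral
```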
As a bonus task, students can write a script that uses the trained model to classify user input in real time.
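A real-time classifier mostly needs the same preprocessing as training (tokenize, map words to ids, truncate and pad) plus an input loop. The sketch below assumes a trained model is wrapped in a hypothetical callable `predict_proba(ids) -> float`; the tokenizer, the reserved ids `PAD_ID`/`OOV_ID`, and all function names are illustrative and must be replaced with whatever the training pipeline actually used.

```python
import re
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical reserved ids; they must match whatever the training pipeline used.
PAD_ID, OOV_ID = 0, 1


def tokenize(text: str) -> List[str]:
    """Lowercase and keep runs of letters, digits and apostrophes (a simplistic tokenizer)."""
    return re.findall(r"[a-z0-9']+", text.lower())


def encode(text: str, vocab: Dict[str, int], max_len: int) -> List[int]:
    """Map tokens to ids, truncate to max_len, then right-pad with PAD_ID."""
    ids = [vocab.get(tok, OOV_ID) for tok in tokenize(text)][:max_len]
    return ids + [PAD_ID] * (max_len - len(ids))


def classify(
    text: str,
    predict_proba: Callable[[List[int]], float],
    vocab: Dict[str, int],
    max_len: int,
    labels: Sequence[str] = ("negative", "positive"),
    threshold: float = 0.5,
) -> Tuple[str, float]:
    """Encode one text, score it with the model, and return (label, probability)."""
    prob = float(predict_proba(encode(text, vocab, max_len)))
    return (labels[1] if prob >= threshold else labels[0]), prob


def run_repl(predict_proba, vocab, max_len, read=input, write=print):
    """Classify lines typed by the user until an empty line or end of input."""
    while True:
        try:
            text = read("Enter text (empty line to quit): ")
        except EOFError:
            break
        if not text.strip():
            break
        label, prob = classify(text, predict_proba, vocab, max_len)
        write(f"{label} (p={prob:.3f})")
```

Calling `run_repl(my_model_fn, word_index, max_len=200)` would start the loop, where `my_model_fn` and `word_index` are hypothetical stand-ins for a wrapper around the trained model and the vocabulary built during training. Passing `read` and `write` as parameters keeps the loop testable without a keyboard.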
- Laptop/PC
- A stable Internet connection
- Google Colaboratory (Colab)
- Install the necessary libraries by running the following command in a Jupyter Notebook cell:
!pip install -r requirements.txt
- Download the pretrained GloVe word vectors:
!wget http://nlp.stanford.edu/data/glove.6B.zip
- Unzip the GloVe archive:
!unzip -q glove.6B.zip
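Each unzipped GloVe file is plain text with one word per line followed by its vector components, separated by spaces. A common next step, sketched below under the assumption that the model uses an integer word index starting at 1 (like the one produced during tokenization), is to load the vectors into a dictionary and copy them into an embedding matrix whose row i belongs to word id i. The function names are hypothetical, and the file name in the usage note assumes the 100-dimensional file from the archive is used.

```python
import numpy as np


def parse_glove_lines(lines):
    """Parse GloVe text lines ("word v1 v2 ... vd") into a {word: vector} dict."""
    embeddings = {}
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        word, *values = line.rstrip().split(" ")
        embeddings[word] = np.asarray(values, dtype="float32")
    return embeddings


def load_glove(path):
    """Read one GloVe text file, e.g. glove.6B.100d.txt."""
    with open(path, encoding="utf-8") as f:
        return parse_glove_lines(f)


def build_embedding_matrix(word_index, embeddings, embed_dim):
    """Row i holds the GloVe vector for the word with id i; missing words stay all-zero.

    word_index maps words to ids starting at 1, so row 0 can serve as padding.
    """
    matrix = np.zeros((max(word_index.values()) + 1, embed_dim), dtype="float32")
    hits = 0
    for word, i in word_index.items():
        vector = embeddings.get(word)
        if vector is not None:
            matrix[i] = vector
            hits += 1
    return matrix, hits
```

For example, `matrix, hits = build_embedding_matrix(word_index, load_glove("glove.6B.100d.txt"), 100)` gives a matrix that can initialize the CNN's embedding layer; `hits` shows how much of the vocabulary GloVe covers. Whether to freeze or fine-tune that layer is a design choice left to the student.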
@InProceedings{maas-EtAl:2011:ACL-HLT2011,
author = {Maas, Andrew L. and Daly, Raymond E. and Pham, Peter T. and Huang, Dan and Ng, Andrew Y. and Potts, Christopher},
title = {Learning Word Vectors for Sentiment Analysis},
booktitle = {Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies},
month = {June},
year = {2011},
address = {Portland, Oregon, USA},
publisher = {Association for Computational Linguistics},
pages = {142--150},
url = {http://www.aclweb.org/anthology/P11-1015}
}
@article{buerle2019net2vis,
title = {Net2Vis -- A Visual Grammar for Automatically Generating Publication-Ready CNN Architecture Visualizations},
author = {Bäuerle, Alex and van Onzenoodt, Christian and Ropinski, Timo},
year = {2019},
eprint = {1902.04394},
archivePrefix = {arXiv},
primaryClass = {cs.LG}
}