Toxic Comment Classification System using Deep Learning

Over the past decade, social networking and social media have grown rapidly. Today, people use these platforms to express themselves, share their opinions and discuss them with others. Differences of opinion naturally lead to debates, but these debates often turn ugly and escalate into online fights in which one side resorts to offensive language, known as toxic comments. Toxic comments may be threatening, obscene, insulting or motivated by identity-based hatred, and they expose users to online abuse and harassment. Detecting them automatically remains a significant challenge for researchers and developers.

To tackle this problem, we have developed a Toxic Comment Classification System using Deep Learning. The system detects and classifies toxic comments in text messages during chat. It helps people refrain from using negative or profane language when interacting with others and promotes healthy conversation among users.

The system comprises one module: User.
The user must first register to access the system and can then log in with their credentials. After logging in, the user selects another user to chat with and types a message. The system then checks whether the comment is toxic and, if it is, highlights the text using JavaScript. It also checks the sentence for toxic words by comparing each word against a pre-defined word list, and automatically suggests non-toxic synonyms for any toxic word it finds. Once all these checks are complete, the system posts the message to the chat.
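The word-list check and synonym suggestion described above can be sketched in Python as follows. This is a minimal illustration, not the project's code: the word list `TOXIC_WORDS` and the synonym table `SYNONYMS` are hypothetical stand-ins. In the described system the synonyms come from WordNet (for example via NLTK's `wordnet.synsets(word)` and each synset's `lemmas()`). The function returns character spans so that a JavaScript front end could highlight the flagged words.

```python
import re

# Hypothetical stand-ins: the project's real word list and its WordNet-based
# synonym lookup are not published, so both are assumptions here.
TOXIC_WORDS = {"idiot", "stupid", "dumb"}
SYNONYMS = {
    "idiot": ["fool", "dunce", "simpleton"],
    "stupid": ["foolish", "dumb", "unwise"],
    "dumb": ["silly", "stupid", "mute"],
}


def check_message(text):
    """Return (flagged, suggestions) for one chat message.

    flagged: list of (word, start, end) character spans to highlight.
    suggestions: toxic word -> synonyms that are not themselves toxic.
    """
    flagged = []
    suggestions = {}
    # Simple word tokenizer: runs of letters and apostrophes.
    for match in re.finditer(r"[A-Za-z']+", text):
        word = match.group().lower()
        if word in TOXIC_WORDS:
            flagged.append((match.group(), match.start(), match.end()))
            # Drop synonyms that are on the toxic list as well.
            suggestions[word] = [
                s for s in SYNONYMS.get(word, []) if s not in TOXIC_WORDS
            ]
    return flagged, suggestions
```

For example, `check_message("You are so Stupid, really.")` flags `"Stupid"` at characters 11 to 17 and suggests `["foolish", "unwise"]`. Filtering the suggestions against the same word list matters because a word's synonyms can themselves be offensive, as `dumb` is for `stupid` in this toy table.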

The front end is built with HTML, CSS and JavaScript, and the back end uses an MSSQL database. The back-end language is Python, and the web framework is Django. The dataset used is from Kaggle.
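As an illustration of how Django might be connected to an MSSQL database, the following `settings.py` fragment assumes the third-party `mssql-django` backend (engine name `"mssql"`) and Microsoft's ODBC Driver 17 for SQL Server; the project description does not say which backend or driver it uses. The database name, credentials and host are hypothetical placeholders.

```python
# settings.py fragment (sketch). Assumes `pip install mssql-django` and an
# installed "ODBC Driver 17 for SQL Server". All values are hypothetical.
DATABASES = {
    "default": {
        "ENGINE": "mssql",           # backend provided by mssql-django
        "NAME": "toxic_chat",        # hypothetical database name
        "USER": "chat_app",          # hypothetical login
        "PASSWORD": "change-me",     # load from the environment in practice
        "HOST": "localhost",
        "PORT": "1433",              # SQL Server's default TCP port
        "OPTIONS": {"driver": "ODBC Driver 17 for SQL Server"},
    }
}
```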

The model used in this system is an LSTM (Long Short-Term Memory) network, a type of recurrent neural network (RNN). Compared with a traditional RNN, an LSTM retains information over much longer sequences because its gated cell state mitigates the vanishing-gradient problem. Like other neural networks, an LSTM can be stacked into multiple hidden layers. Within each cell, an input gate, a forget gate and an output gate decide which information to keep in the cell state and which to discard as the text sequence is processed step by step.
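To make the gating concrete, the following is a from-scratch sketch of a single LSTM layer in plain Python, written for illustration only; the description does not name the framework the project uses. The forget gate `f` scales the old cell state, the input gate `i` scales the new candidate `g`, and the output gate `o` controls how much of the cell state is exposed as the hidden state. The sizes, the random initialization and the helper names (`init_params`, `run_lstm`) are assumptions.

```python
import math
import random

GATES = ("forget", "input", "output", "candidate")


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def matvec(m, v):
    """Multiply matrix m (list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def lstm_step(x, h_prev, c_prev, params):
    """One LSTM time step. params maps each gate name to (W, U, b)."""
    def gate(name, act):
        W, U, b = params[name]
        pre = [a + r + bias
               for a, r, bias in zip(matvec(W, x), matvec(U, h_prev), b)]
        return [act(z) for z in pre]

    f = gate("forget", sigmoid)        # how much old cell state to keep
    i = gate("input", sigmoid)         # how much new candidate to write
    o = gate("output", sigmoid)        # how much cell state to expose
    g = gate("candidate", math.tanh)   # candidate information
    c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c_prev, i, g)]
    h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]
    return h, c


def init_params(input_size, hidden_size, seed=0):
    """Small random weights and zero biases for every gate (hypothetical)."""
    rng = random.Random(seed)

    def mat(rows, cols):
        return [[rng.uniform(-0.1, 0.1) for _ in range(cols)]
                for _ in range(rows)]

    return {name: (mat(hidden_size, input_size),
                   mat(hidden_size, hidden_size),
                   [0.0] * hidden_size)
            for name in GATES}


def run_lstm(sequence, hidden_size, params):
    """Feed a sequence of input vectors through the cell; return final (h, c)."""
    h = [0.0] * hidden_size
    c = [0.0] * hidden_size
    for x in sequence:
        h, c = lstm_step(x, h, c, params)
    return h, c
```

In a classifier, each input vector `x` would typically be a word embedding and the final hidden state `h` would feed an output layer; that part is omitted here. Setting the forget-gate bias high and the input-gate bias very low makes the cell copy its previous state almost unchanged, which is the mechanism that lets an LSTM carry information across long sequences.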

The libraries used are NLTK, Profanity and WordNet. NLTK (Natural Language Toolkit) provides Natural Language Processing (NLP) tools; using NLP, text classifiers can automatically analyze text and assign a set of pre-defined tags or categories based on its content. Profanity is a fast, robust Python library for detecting profanity and offensive language. WordNet, available through NLTK, is a large lexical database of English nouns, verbs, adjectives and adverbs, grouped into sets of synonyms (synsets).
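Because a single comment can be both obscene and insulting at once, assigning "a set of pre-defined tags" is a multi-label problem: the model produces an independent probability for each category (for example, one sigmoid output per label rather than a single softmax), and every category that meets a threshold is assigned. The sketch below assumes such per-label probabilities are already available; the label names and the 0.5 threshold are hypothetical choices based on the categories listed earlier, not the project's confirmed configuration.

```python
# Hypothetical label names mirroring the categories described above.
LABELS = ("threat", "obscene", "insult", "identity_hate")


def assign_tags(probabilities, threshold=0.5):
    """Return every label whose predicted probability meets the threshold.

    A comment may receive several tags at once, or none at all.
    """
    if len(probabilities) != len(LABELS):
        raise ValueError("expected one probability per label")
    return [label for label, p in zip(LABELS, probabilities) if p >= threshold]


def is_toxic(probabilities, threshold=0.5):
    """A comment counts as toxic if it receives at least one tag."""
    return bool(assign_tags(probabilities, threshold))
```

For instance, `assign_tags([0.1, 0.8, 0.6, 0.2])` returns `["obscene", "insult"]`. Lowering the threshold catches more toxic comments at the cost of more false alarms; the right value depends on the validation data.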

Advantages

The system will detect and classify toxic comments or texts.
It will help promote healthy conversations.
It will encourage people to refrain from using profanities and negative language.
It can help reduce cyberbullying.
The system is easy and efficient to use.
