---
title: Build a text classification app
linkTitle: Text classification
keywords: nlp, natural language processing, sentiment analysis, python, nltk, scikit-learn, text classification
description: Learn how to build and run a text classification application using Python, NLTK, scikit-learn, and Docker.
summary: |
This guide details how to containerize text classification models using
Docker.
tags: [ai]
languages: [python]
aliases:
- /guides/use-case/nlp/text-classification/
params:
time: 20 minutes
---
## Overview
In this guide, you'll learn how to create and run a text classification
application. You'll build the application using Python with scikit-learn and the
Natural Language Toolkit (NLTK). Then you'll set up the environment and run the
application using Docker.
The application analyzes the sentiment of a user's input text using NLTK's
SentimentIntensityAnalyzer. The user enters text, which the application processes
to determine its sentiment, classifying it as either positive or negative. It
also displays the accuracy and a detailed classification report of its sentiment
analysis model based on a predefined dataset.
## Prerequisites
- You have installed the latest version of [Docker Desktop](/get-started/get-docker.md). Docker adds new features regularly and some parts of this guide may work only with the latest version of Docker Desktop.
- You have a [Git client](https://git-scm.com/downloads). The examples in this section use a command-line based Git client, but you can use any client.
## Get the sample application
1. Open a terminal, and clone the sample application's repository using the
following command.
```console
$ git clone https://github.com/harsh4870/Docker-NLP.git
```
2. Verify that you cloned the repository.
You should see the following files in your `Docker-NLP` directory.
```text
01_sentiment_analysis.py
02_name_entity_recognition.py
03_text_classification.py
04_text_summarization.py
05_language_translation.py
entrypoint.sh
requirements.txt
Dockerfile
README.md
```
## Explore the application code
The source code for the text classification application is in the `Docker-NLP/03_text_classification.py` file. Open `03_text_classification.py` in a text or code editor to explore its contents in the following steps.
1. Import the required libraries.
```python
import nltk
from nltk.sentiment import SentimentIntensityAnalyzer
from sklearn.metrics import accuracy_score, classification_report
from sklearn.model_selection import train_test_split
import ssl
```
- `nltk`: A popular Python library for natural language processing (NLP).
- `SentimentIntensityAnalyzer`: A component of `nltk` for sentiment analysis.
- `accuracy_score`, `classification_report`: Functions from scikit-learn for
evaluating the model.
- `train_test_split`: Function from scikit-learn to split datasets into
training and testing sets.
- `ssl`: Used for handling SSL certificate issues which might occur while
downloading data for `nltk`.
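To get a feel for what `accuracy_score` computes before seeing it in the application, here is a minimal, hypothetical sketch of the same metric written by hand: the fraction of predictions that match the true labels. The `accuracy` helper and the sample label lists are illustrative, not part of the sample application.

```python
# Hypothetical helper illustrating the metric behind sklearn's accuracy_score:
# the fraction of predictions that equal the true labels.
def accuracy(y_true, y_pred):
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)

y_true = [0, 1, 2, 0, 1, 2]  # example true labels
y_pred = [0, 1, 2, 0, 0, 2]  # example predictions with one mistake

print(accuracy(y_true, y_pred))  # 5 of 6 predictions are correct
```

In the application itself, scikit-learn's `accuracy_score` does this for you, and `classification_report` extends it with per-class precision, recall, and F1 scores.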
2. Handle SSL certificate verification.
```python
try:
_create_unverified_https_context = ssl._create_unverified_context
except AttributeError:
pass
else:
ssl._create_default_https_context = _create_unverified_https_context
```
This block is a workaround for environments where downloading data through
NLTK fails due to SSL certificate verification issues. It tells Python to
skip SSL certificate verification for HTTPS requests.
3. Download NLTK resources.
```python
nltk.download('vader_lexicon')
```
The `vader_lexicon` is a lexicon used by the `SentimentIntensityAnalyzer` for
sentiment analysis.
4. Define text for testing and corresponding labels.
```python
texts = [...]
labels = [0, 1, 2, 0, 1, 2]
```
This section defines a small dataset of texts and their corresponding labels (0 for positive, 1 for negative, and 2 for spam).
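To see how `train_test_split` divides a dataset like this, here is a minimal sketch with illustrative stand-in texts (the actual strings in `03_text_classification.py` differ). With `test_size=2`, two examples are held out for evaluation and four are used for training.

```python
from sklearn.model_selection import train_test_split

# Illustrative stand-ins for the dataset in 03_text_classification.py
texts = ["great product", "terrible service", "win a free prize",
         "really enjoyed it", "worst purchase ever", "claim your cash now"]
labels = [0, 1, 2, 0, 1, 2]  # 0 = positive, 1 = negative, 2 = spam

# Hold out 2 of the 6 examples for testing; random_state makes the split reproducible.
X_train, X_test, y_train, y_test = train_test_split(
    texts, labels, test_size=2, random_state=42)

print(len(X_train), len(X_test))  # 4 training examples, 2 test examples
```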