---
title: Build a text recognition app
linkTitle: Text classification
keywords: nlp, natural language processing, sentiment analysis, python, nltk, scikit-learn, text classification
description: Learn how to build and run a text recognition application using Python, NLTK, scikit-learn, and Docker.
summary: |
  This guide details how to containerize text classification models using
  Docker.
tags: [ai]
languages: [python]
aliases:
  - /guides/use-case/nlp/text-classification/
params:
  time: 20 minutes
---

## Overview

In this guide, you'll learn how to create and run a text recognition
application. You'll build the application using Python with scikit-learn and the
Natural Language Toolkit (NLTK). Then you'll set up the environment and run the
application using Docker.

The application analyzes the sentiment of text that a user enters, using NLTK's
SentimentIntensityAnalyzer to classify it as either positive or negative. It
also displays the accuracy and a detailed classification report for its
sentiment analysis model, evaluated on a predefined dataset.

## Prerequisites

- You have installed the latest version of [Docker Desktop](/get-started/get-docker.md). Docker adds new features regularly, and some parts of this guide may work only with the latest version of Docker Desktop.
- You have a [Git client](https://git-scm.com/downloads). The examples in this section use a command-line Git client, but you can use any client.

## Get the sample application

1. Open a terminal, and clone the sample application's repository using the
   following command.

   ```console
   $ git clone https://github.com/harsh4870/Docker-NLP.git
   ```

2. Verify that you cloned the repository.

   You should see the following files in your `Docker-NLP` directory.

   ```text
   01_sentiment_analysis.py
   02_name_entity_recognition.py
   03_text_classification.py
   04_text_summarization.py
   05_language_translation.py
   entrypoint.sh
   requirements.txt
   Dockerfile
   README.md
   ```

## Explore the application code

The source code for the text classification application is in the `Docker-NLP/03_text_classification.py` file. Open `03_text_classification.py` in a text or code editor to explore its contents in the following steps.

1. Import the required libraries.

   ```python
   import nltk
   from nltk.sentiment import SentimentIntensityAnalyzer
   from sklearn.metrics import accuracy_score, classification_report
   from sklearn.model_selection import train_test_split
   import ssl
   ```

   - `nltk`: A popular Python library for natural language processing (NLP).
   - `SentimentIntensityAnalyzer`: NLTK's VADER-based sentiment analyzer.
   - `accuracy_score`, `classification_report`: Functions from scikit-learn for
     evaluating the model.
   - `train_test_split`: Function from scikit-learn to split datasets into
     training and testing sets.
   - `ssl`: Used to work around SSL certificate issues that might occur while
     downloading data for `nltk`.
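
   To make the roles of the scikit-learn functions concrete, here is a minimal,
   stdlib-only sketch of the two ideas they implement: holding out part of a
   labeled dataset for testing, and scoring predictions by accuracy. The
   `holdout_split` and `accuracy` helpers are hypothetical illustrations, not
   scikit-learn's implementation; in the application itself you call
   `train_test_split` and `accuracy_score` directly.

   ```python
   import math
   import random


   def holdout_split(texts, labels, test_size=0.25, seed=0):
       """Shuffle paired data and hold out a fraction for testing.

       Hypothetical helper illustrating the idea behind train_test_split.
       Returns (texts_train, texts_test, labels_train, labels_test).
       """
       if len(texts) != len(labels):
           raise ValueError("texts and labels must have the same length")
       indices = list(range(len(texts)))
       random.Random(seed).shuffle(indices)  # same seed, same split
       n_test = math.ceil(len(texts) * test_size)  # round the test set size up
       test_idx, train_idx = indices[:n_test], indices[n_test:]
       return (
           [texts[i] for i in train_idx],
           [texts[i] for i in test_idx],
           [labels[i] for i in train_idx],
           [labels[i] for i in test_idx],
       )


   def accuracy(y_true, y_pred):
       """Fraction of predictions that match the true labels."""
       if len(y_true) != len(y_pred) or not y_true:
           raise ValueError("need two non-empty sequences of equal length")
       correct = sum(t == p for t, p in zip(y_true, y_pred))
       return correct / len(y_true)


   print(accuracy([0, 1, 2, 0], [0, 1, 1, 0]))  # 3 of 4 correct: 0.75
   ```

   `classification_report` goes further than this sketch: it reports
   precision, recall, F1-score, and support for each label, which matters when
   a dataset has more than two classes.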

2. Handle SSL certificate verification.

   ```python
   try:
       _create_unverified_https_context = ssl._create_unverified_context
   except AttributeError:
       pass
   else:
       ssl._create_default_https_context = _create_unverified_https_context
   ```

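   This block is a workaround for environments where downloading data through
   NLTK fails because of SSL certificate verification errors. If the running
   Python provides `ssl._create_unverified_context`, the block makes it the
   default HTTPS context factory, so HTTPS requests that rely on the default
   context, including NLTK's downloads, skip certificate verification. Because
   this removes protection against man-in-the-middle attacks for the rest of
   the program, use it only where the download fails otherwise.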
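
   If you'd rather not leave certificate verification off for the rest of the
   program, one option is to scope the workaround to the download itself. The
   following is a sketch, assuming a Python version that provides
   `ssl._create_unverified_context` (the same attribute the original block
   checks for). `unverified_https` is a hypothetical helper, not part of NLTK
   or the sample application, and like the original block it relies on the
   private `ssl._create_default_https_context` attribute.

   ```python
   import ssl
   from contextlib import contextmanager


   @contextmanager
   def unverified_https():
       """Temporarily skip HTTPS certificate verification, then restore it.

       Hypothetical helper: swaps the private ssl._create_default_https_context
       factory only for the duration of the `with` block.
       """
       original = ssl._create_default_https_context
       ssl._create_default_https_context = ssl._create_unverified_context
       try:
           yield
       finally:
           # Restore the original factory even if the body raises
           ssl._create_default_https_context = original


   # In the application, usage would look like:
   # with unverified_https():
   #     nltk.download('vader_lexicon')
   ```

   Restoring the original factory in a `finally` clause keeps later HTTPS
   requests verified even if the download raises an exception.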

3. Download NLTK resources.

   ```python
   nltk.download('vader_lexicon')
   ```

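   `vader_lexicon` is the lexicon for VADER (Valence Aware Dictionary and
   sEntiment Reasoner), a list of words and other tokens, such as emoticons,
   rated by sentiment intensity. The `SentimentIntensityAnalyzer` relies on it
   to score text.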
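
   To illustrate why the analyzer depends on this lexicon, here is a
   deliberately simplified, stdlib-only sketch of lexicon-based scoring: sum
   the valence of each known word, then squash the total into the range -1 to
   1. The three-word `TOY_LEXICON` and its valences are made up for
   illustration, and the sketch ignores the rules VADER applies for negation,
   intensifiers, capitalization, and punctuation. The `normalize` step follows
   the x / sqrt(x * x + alpha) form with alpha = 15 that VADER uses for its
   compound score, but treat `toy_compound` as an approximation of the idea,
   not a replacement for `SentimentIntensityAnalyzer`.

   ```python
   import math
   import re

   # Made-up valences for illustration only; not values from vader_lexicon
   TOY_LEXICON = {"love": 3.0, "great": 3.0, "terrible": -3.0}


   def normalize(score, alpha=15):
       """Map an unbounded valence sum into [-1, 1]."""
       norm = score / math.sqrt(score * score + alpha)
       return max(-1.0, min(1.0, norm))


   def toy_compound(text, lexicon=TOY_LEXICON):
       """Sum the valences of known words and normalize the total."""
       words = re.findall(r"[a-z']+", text.lower())
       total = sum(lexicon.get(word, 0.0) for word in words)
       return normalize(total)


   print(toy_compound("I love this, it is great"))  # about 0.84
   print(toy_compound("What a terrible day"))       # about -0.61
   ```

   Without the lexicon there is nothing to look words up in, which is why the
   download has to happen before the analyzer can score any text.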

4. Define sample texts and their corresponding labels.

   ```python
   texts = [...]
   labels = [0, 1, 2, 0, 1, 2]
   ```

   This section defines a small dataset of texts and their corresponding labels (0 for positive, 1 for negative, and 2 for spam).
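
   A sentiment score measures polarity, not whether a text is spam, so a
   mapping that turns a compound score into positive or negative labels never
   predicts label 2. The sketch below makes that explicit. The hypothetical
   `compound_to_label` helper and its threshold of 0 are assumptions for
   illustration, not necessarily how `03_text_classification.py` assigns
   labels, and the compound scores are invented.

   ```python
   # Label scheme from the dataset above
   POSITIVE, NEGATIVE, SPAM = 0, 1, 2


   def compound_to_label(compound, threshold=0.0):
       """Hypothetical mapping: scores at or above the threshold count as positive."""
       return POSITIVE if compound >= threshold else NEGATIVE


   labels = [0, 1, 2, 0, 1, 2]
   # Invented compound scores, one per text, for illustration only
   compound_scores = [0.8, -0.6, 0.4, 0.5, -0.2, -0.1]
   predictions = [compound_to_label(c) for c in compound_scores]

   print(predictions)  # [0, 1, 0, 0, 1, 1]
   correct = sum(p == y for p, y in zip(predictions, labels))
   print(f"{correct}/{len(labels)} correct")  # 4/6: both spam examples are missed
   ```

   In a classification report, a mapping like this shows up as zero recall for
   label 2.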
