Building a Text Recognition Application with Docker

---
title: Build a text recognition app
linkTitle: Text classification
keywords: nlp, natural language processing, sentiment analysis, python, nltk, scikit-learn, text classification
description: Learn how to build and run a text recognition application using Python, NLTK, scikit-learn, and Docker.
summary: |
  This guide details how to containerize text classification models using Docker.
tags: [ai]
languages: [python]
aliases:
  - /guides/use-case/nlp/text-classification/
params:
  time: 20 minutes
---

## Overview

In this guide, you'll learn how to create and run a text recognition application. You'll build the application using Python with scikit-learn and the Natural Language Toolkit (NLTK). Then you'll set up the environment and run the application using Docker.

The application analyzes the sentiment of a user's input text using NLTK's SentimentIntensityAnalyzer. It lets the user input text, which is then processed to determine its sentiment, classifying it as either positive or negative. Also, it displays the accuracy and a detailed classification report of its sentiment analysis model based on a predefined dataset.

## Prerequisites

- You have installed the latest version of [Docker Desktop](/get-started/get-docker.md). Docker adds new features regularly and some parts of this guide may work only with the latest version of Docker Desktop.
- You have a [Git client](https://git-scm.com/downloads). The examples in this section use a command-line based Git client, but you can use any client.

## Get the sample application

1. Open a terminal, and clone the sample application's repository using the following command.

   ```console
   $ git clone https://github.com/harsh4870/Docker-NLP.git
   ```

2. Verify that you cloned the repository.

   You should see the following files in your `Docker-NLP` directory.

   ```text
   01_sentiment_analysis.py
   02_name_entity_recognition.py
   03_text_classification.py
   04_text_summarization.py
   05_language_translation.py
   entrypoint.sh
   requirements.txt
   Dockerfile
   README.md
   ```

## Explore the application code

The source code for the text classification application is in the `Docker-NLP/03_text_classification.py` file. Open `03_text_classification.py` in a text or code editor to explore its contents in the following steps.

1. Import the required libraries.

   ```python
   import nltk
   from nltk.sentiment import SentimentIntensityAnalyzer
   from sklearn.metrics import accuracy_score, classification_report
   from sklearn.model_selection import train_test_split
   import ssl
   ```

   - `nltk`: A popular Python library for natural language processing (NLP).
   - `SentimentIntensityAnalyzer`: A component of `nltk` for sentiment analysis.
   - `accuracy_score`, `classification_report`: Functions from scikit-learn for evaluating the model.
   - `train_test_split`: Function from scikit-learn to split datasets into training and testing sets.
   - `ssl`: Used for handling SSL certificate issues which might occur while downloading data for `nltk`.

2. Handle SSL certificate verification.

   ```python
   try:
       _create_unverified_https_context = ssl._create_unverified_context
   except AttributeError:
       pass
   else:
       ssl._create_default_https_context = _create_unverified_https_context
   ```

   This block is a workaround for certain environments where downloading data through NLTK might fail due to SSL certificate verification issues. It's telling Python to ignore SSL certificate verification for HTTPS requests.

3. Download NLTK resources.

   ```python
   nltk.download('vader_lexicon')
   ```

   The `vader_lexicon` is a lexicon used by the `SentimentIntensityAnalyzer` for sentiment analysis.

4. Define text for testing and corresponding labels.

   ```python
   texts = [...]
   labels = [0, 1, 2, 0, 1, 2]
   ```

   This section defines a small dataset of texts and their corresponding labels (0 for positive, 1 for negative, and 2 for spam).
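
To make the shape of that dataset concrete, the following is a minimal sketch of how parallel `texts` and `labels` lists pair up, and of what an accuracy figure measures. The sample sentences, the `LABEL_NAMES` mapping, the `accuracy` helper, and the predictions are all hypothetical stand-ins for illustration. The guide elides the real texts, and the sample application reports accuracy with scikit-learn's `accuracy_score` rather than a hand-written function.

```python
from collections import Counter

# Hypothetical sentences: the real `texts` list is elided in the guide.
texts = [
    "I love this product",
    "This is the worst purchase I have made",
    "Win a free prize, click now",
    "Great service and friendly staff",
    "I am very disappointed",
    "Claim your reward today",
]
labels = [0, 1, 2, 0, 1, 2]

# Label meanings as described in the guide.
LABEL_NAMES = {0: "positive", 1: "negative", 2: "spam"}

# Parallel lists only make sense when every text has exactly one label.
assert len(texts) == len(labels)

for text, label in zip(texts, labels):
    print(f"{LABEL_NAMES[label]:>8}: {text}")

# Count how many examples each class has.
class_counts = Counter(LABEL_NAMES[label] for label in labels)
print(class_counts)


def accuracy(y_true, y_pred):
    """Return the fraction of predictions that exactly match the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    matches = sum(t == p for t, p in zip(y_true, y_pred))
    return matches / len(y_true)


# Made-up predictions with one mistake (the fifth item), for illustration only.
hypothetical_predictions = [0, 1, 2, 0, 0, 2]
score = accuracy(labels, hypothetical_predictions)
print(f"accuracy: {score:.2f}")
```

With only six examples, a single wrong prediction moves the accuracy by roughly 17 percentage points, so scores on a dataset this small are best read as a smoke test rather than a measure of model quality.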

This guide walks you through creating and running a text recognition application using Python, scikit-learn, and NLTK, containerized with Docker. It details setting up the environment, obtaining the sample application, exploring the application code, and handling SSL certificate verification for downloading NLTK resources. The application uses NLTK's SentimentIntensityAnalyzer to classify the sentiment of user input as positive or negative.
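
The SSL workaround in the application code disables certificate verification for every later HTTPS request made through Python's default context. If that's broader than you want, one possible alternative is to limit the change to the download step and restore the default afterward. The following is a sketch under that assumption: `unverified_https_default` is a hypothetical helper that isn't part of the sample repository, and it relies on the same private `ssl` attributes as the original workaround.

```python
import contextlib
import ssl


@contextlib.contextmanager
def unverified_https_default():
    """Temporarily replace the default HTTPS context factory with an unverified one.

    Hypothetical helper for illustration; not part of the sample application.
    """
    original_factory = ssl._create_default_https_context
    ssl._create_default_https_context = ssl._create_unverified_context
    try:
        yield
    finally:
        # Restore certificate verification even if the body raised an exception.
        ssl._create_default_https_context = original_factory


with unverified_https_default():
    # A download such as nltk.download('vader_lexicon') would go here.
    inside_context = ssl._create_default_https_context()
    print("verification disabled:", inside_context.verify_mode == ssl.CERT_NONE)

# Outside the block, the original factory is back in place.
print("factory restored:", ssl._create_default_https_context is not ssl._create_unverified_context)
```

Skipping certificate verification removes protection against tampered downloads, so where possible it's preferable to fix the certificate store in the environment and keep verification on.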