   - `ssl`: Used for handling SSL certificate issues which might occur while
     downloading data for `nltk`.

2. Handle SSL certificate verification.

   ```python
   try:
       _create_unverified_https_context = ssl._create_unverified_context
   except AttributeError:
       pass
   else:
       ssl._create_default_https_context = _create_unverified_https_context
   ```

   This block is a workaround for environments where NLTK's data download
   fails due to SSL certificate verification issues. It tells Python to skip
   SSL certificate verification for HTTPS requests.

3. Download NLTK resources.

   ```python
   nltk.download('vader_lexicon')
   ```

   The `vader_lexicon` is a lexicon used by the `SentimentIntensityAnalyzer` for
   sentiment analysis.
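   `polarity_scores` returns a dictionary of four scores; the `compound`
   value, which this application thresholds on, is normalized to the range
   -1 (most negative) to +1 (most positive). A sketch of that shape, with
   invented numbers rather than real VADER output:

   ```python
   # Illustrative shape of the dict returned by polarity_scores();
   # the numeric values are invented, not computed by VADER.
   example_scores = {"neg": 0.0, "neu": 0.38, "pos": 0.62, "compound": 0.76}

   # The compound score is the value the application thresholds on.
   is_positive = example_scores["compound"] > 0.2
   ```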

4. Define text for testing and corresponding labels.

   ```python
   texts = [...]
   labels = [0, 1, 2, 0, 1, 2]
   ```

   This section defines a small dataset of texts and their corresponding labels (0 for positive, 1 for negative, and 2 for spam).
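   The actual `texts` list is elided above; as a purely hypothetical
   stand-in, a dataset matching those labels could look like this (the
   sentences are invented, not taken from the sample app):

   ```python
   # Hypothetical examples only; the sample app ships its own sentences.
   texts = [
       "I love this product, it works perfectly!",         # 0: positive
       "This was a terrible purchase, a total waste.",     # 1: negative
       "WINNER!! Claim your FREE prize now, click here!",  # 2: spam
       "Great service and friendly staff.",                # 0: positive
       "Awful quality, it broke after one day.",           # 1: negative
       "Limited offer!!! Send your bank details today.",   # 2: spam
   ]
   labels = [0, 1, 2, 0, 1, 2]
   ```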

5. Split the data into training and test sets.

   ```python
   X_train, X_test, y_train, y_test = train_test_split(texts, labels, test_size=0.2, random_state=42)
   ```

   This part splits the dataset into training and testing sets, reserving 20%
   of the data for testing. Because this application relies on a pre-trained
   model, it never actually trains on the training split.
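   To make the 80/20 split concrete, here is a minimal pure-Python sketch of
   a shuffle-and-slice split. This is not sklearn's actual implementation
   (which also handles arrays, stratification, and edge cases), just the core
   idea:

   ```python
   import random

   def toy_train_test_split(X, y, test_size=0.2, random_state=42):
       # Shuffle indices deterministically, then slice off the test portion.
       idx = list(range(len(X)))
       random.Random(random_state).shuffle(idx)
       n_test = max(1, round(len(X) * test_size))
       test_idx, train_idx = idx[:n_test], idx[n_test:]
       return ([X[i] for i in train_idx], [X[i] for i in test_idx],
               [y[i] for i in train_idx], [y[i] for i in test_idx])

   texts = ["t0", "t1", "t2", "t3", "t4", "t5"]
   labels = [0, 1, 2, 0, 1, 2]
   X_train, X_test, y_train, y_test = toy_train_test_split(texts, labels)
   print(len(X_train), len(X_test))  # → 5 1
   ```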

6. Set up sentiment analysis.

   ```python
   sia = SentimentIntensityAnalyzer()
   ```

   This code initializes the `SentimentIntensityAnalyzer` to analyze the
   sentiment of text.

7. Generate predictions and classifications for the test data.

   ```python
   vader_predictions = [sia.polarity_scores(text)["compound"] for text in X_test]
   threshold = 0.2
   vader_classifications = [0 if score > threshold else 1 for score in vader_predictions]
   ```

   This part generates a compound sentiment score for each text in the test
   set and classifies it as positive (0) when the score exceeds the threshold,
   otherwise negative (1). Note that this binary rule never predicts the spam
   label (2), which is why `zero_division` handling matters in the next step.
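   The thresholding rule can be exercised on its own with hand-picked
   compound scores (illustrative numbers, not real VADER output):

   ```python
   # Scores above the threshold become positive (0), the rest negative (1).
   # The score values below are invented for illustration.
   threshold = 0.2
   vader_predictions = [0.84, -0.55, 0.05, 0.31]
   vader_classifications = [0 if score > threshold else 1 for score in vader_predictions]
   print(vader_classifications)  # → [0, 1, 1, 0]
   ```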

8. Evaluate the model.

   ```python
   accuracy = accuracy_score(y_test, vader_classifications)
   report_vader = classification_report(y_test, vader_classifications, zero_division='warn')
   ```

   This part calculates the accuracy and classification report for the predictions.
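   Accuracy is simply the fraction of predictions that match the true labels.
   A hand-rolled equivalent of `accuracy_score`, run on illustrative labels
   rather than the app's actual split:

   ```python
   # Invented labels for illustration; 3 of the 4 predictions match.
   y_true = [0, 1, 2, 0]
   y_pred = [0, 1, 1, 0]
   accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
   print(f"Accuracy: {accuracy:.2f}")  # → Accuracy: 0.75
   ```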

9. Specify the main execution block.

   ```python
   if __name__ == "__main__":
   ```

   This Python idiom ensures that the following code block runs only if this
   script is the main program. It provides flexibility, allowing the script to
   function both as a standalone program and as an imported module.

10. Create an infinite loop for continuous input.

    ```python
        while True:
            input_text = input("Enter the text for classification (type 'exit' to end): ")

            if input_text.lower() == 'exit':
                print("Exiting...")
                break
    ```

    This `while` loop runs indefinitely until it's explicitly broken. It lets
    the user continuously enter text for classification until they type
    `exit`.

11. Analyze the text.

    ```python
            input_text_score = sia.polarity_scores(input_text)["compound"]
            input_text_classification = 0 if input_text_score > threshold else 1
    ```

    This computes the compound sentiment score for the entered text and
    applies the same threshold rule used for the test set: positive (0) if
    the score is above the threshold, otherwise negative (1).

12. Print the VADER Classification Report and the sentiment analysis.

    ```python
            print(f"Accuracy: {accuracy:.2f}")
            print("\nVADER Classification Report:")
            print(report_vader)

            print(f"\nTest Text (Positive): '{input_text}'")
            print(f"Predicted Sentiment: {'Positive' if input_text_classification == 0 else 'Negative'}")
    ```

13. Create `requirements.txt`. The sample application already contains the
    `requirements.txt` file to specify the necessary packages that the
    application imports. Open `requirements.txt` in a code or text editor to
    explore its contents.

    ```text
    # 01 sentiment_analysis
    nltk==3.6.5

    ...

    # 03 text_classification
    scikit-learn==1.3.2

    ...
    ```
