- `ssl`: Used for handling SSL certificate issues which might occur while
downloading data for `nltk`.
2. Handle SSL certificate verification.
```python
try:
    _create_unverified_https_context = ssl._create_unverified_context
except AttributeError:
    pass
else:
    ssl._create_default_https_context = _create_unverified_https_context
```
This block is a workaround for environments where downloading data through
NLTK fails due to SSL certificate verification errors: it tells Python to skip
certificate verification for HTTPS requests. The `try`/`except` keeps the
script working on Python builds that don't expose
`ssl._create_unverified_context`.
3. Download NLTK resources.
```python
nltk.download('vader_lexicon')
```
The `vader_lexicon` is a lexicon used by the `SentimentIntensityAnalyzer` for
sentiment analysis.
4. Define text for testing and corresponding labels.
```python
texts = [...]
labels = [0, 1, 2, 0, 1, 2]
```
This section defines a small dataset of texts and their corresponding labels (0 for positive, 1 for negative, and 2 for spam).
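The texts themselves are elided above; a hypothetical dataset of the same shape (the strings below are illustrative, not the sample app's actual texts) might look like:

```python
# Illustrative only -- the sample app ships its own six texts.
texts = [
    "I love this product, it works great!",          # positive
    "This is terrible, a complete waste of money.",  # negative
    "WIN A FREE PRIZE!!! Click here now!!!",         # spam
    "Fantastic service, highly recommended.",        # positive
    "Awful experience, would not buy again.",        # negative
    "Limited offer! Act now to claim your reward!",  # spam
]
labels = [0, 1, 2, 0, 1, 2]  # 0 = positive, 1 = negative, 2 = spam
```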
5. Split the data into training and test sets.
```python
X_train, X_test, y_train, y_test = train_test_split(texts, labels, test_size=0.2, random_state=42)
```
This part splits the dataset into training and testing sets, reserving 20% of
the data for testing. Because this application relies on the pre-trained VADER
model, the training set is never actually used to fit a model.
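As a quick sketch of what `train_test_split` does with these proportions (toy data, not the app's texts):

```python
from sklearn.model_selection import train_test_split

# Ten toy samples with alternating labels.
data = [f"sample {i}" for i in range(10)]
labels = [i % 2 for i in range(10)]

# test_size=0.2 reserves 2 of the 10 samples for testing;
# random_state=42 makes the shuffle reproducible.
X_train, X_test, y_train, y_test = train_test_split(
    data, labels, test_size=0.2, random_state=42
)
print(len(X_train), len(X_test))  # 8 2
```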
6. Set up sentiment analysis.
```python
sia = SentimentIntensityAnalyzer()
```
This code initializes the `SentimentIntensityAnalyzer` to analyze the
sentiment of text.
7. Generate predictions and classifications for the test data.
```python
vader_predictions = [sia.polarity_scores(text)["compound"] for text in X_test]
threshold = 0.2
vader_classifications = [0 if score > threshold else 1 for score in vader_predictions]
```
This part computes a compound sentiment score (from -1, most negative, to 1,
most positive) for each text in the test set, then classifies each text as
positive (0) or negative (1) depending on whether its score exceeds the
threshold.
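The thresholding step is a plain list comprehension; with some hypothetical compound scores standing in for VADER output:

```python
# Hypothetical compound scores standing in for VADER output.
vader_predictions = [0.85, -0.60, 0.05, 0.45]
threshold = 0.2

# Scores above the threshold are labeled positive (0), the rest negative (1).
vader_classifications = [0 if score > threshold else 1 for score in vader_predictions]
print(vader_classifications)  # [0, 1, 1, 0]
```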
8. Evaluate the model.
```python
accuracy = accuracy_score(y_test, vader_classifications)
report_vader = classification_report(y_test, vader_classifications, zero_division='warn')
```
This part calculates the accuracy and classification report for the predictions.
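A minimal sketch of these two metrics on toy labels (not the app's data):

```python
from sklearn.metrics import accuracy_score, classification_report

y_true = [0, 1, 1, 0]
y_pred = [0, 1, 0, 0]

# 3 of 4 predictions match the true labels.
accuracy = accuracy_score(y_true, y_pred)
print(accuracy)  # 0.75

# zero_division controls the reported value when a class gets no predictions.
print(classification_report(y_true, y_pred, zero_division=0))
```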
9. Specify the main execution block.
```python
if __name__ == "__main__":
```
This Python idiom ensures that the following code block runs only if this
script is the main program. It provides flexibility, allowing the script to
function both as a standalone program and as an imported module.
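A minimal illustration of the idiom:

```python
def main():
    # Runs only when the file is executed directly, not when it's imported.
    print("Running as a standalone script")

if __name__ == "__main__":
    main()
```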
10. Create an infinite loop for continuous input.
```python
while True:
    input_text = input("Enter the text for classification (type 'exit' to end): ")
    if input_text.lower() == 'exit':
        print("Exiting...")
        break
```
This while loop runs indefinitely until it's explicitly broken. It lets the
user continuously enter text for sentiment classification until they decide to
exit.
11. Analyze the text.
```python
input_text_score = sia.polarity_scores(input_text)["compound"]
input_text_classification = 0 if input_text_score > threshold else 1
```
This computes the compound score for the user's input and applies the same
threshold used for the test set.
12. Print the VADER Classification Report and the sentiment analysis.
```python
print(f"Accuracy: {accuracy:.2f}")
print("\nVADER Classification Report:")
print(report_vader)
print(f"\nInput Text: '{input_text}'")
print(f"Predicted Sentiment: {'Positive' if input_text_classification == 0 else 'Negative'}")
```
13. Create `requirements.txt`. The sample application already contains the
`requirements.txt` file to specify the necessary packages that the
application imports. Open `requirements.txt` in a code or text editor to
explore its contents.
```text
# 01 sentiment_analysis
nltk==3.6.5
...
# 03 text_classification
scikit-learn==1.3.2
...
```