Twitter Sentiment Analysis in Python — Full Tutorial
Step-by-step Python tutorial for Twitter sentiment analysis. Compare TextBlob, VADER, and transformer models on real tweets. Code, costs, and pitfalls covered.

Most "Twitter sentiment analysis" tutorials online were written before the free Twitter API died, and they have not aged well. They tell you to install tweepy, point you at endpoints that now cost real money, and lean on nltk.sentiment examples that quietly mislabel half of modern Twitter copy because they were trained on movie reviews. The good news is that doing this properly in 2026 is not hard — you need one API key, three Python libraries, and about 50 lines of code per pass.
This tutorial walks through a complete Twitter sentiment-analysis pipeline in Python: pull recent tweets via the GetXAPI Twitter API (cheaper than the official X API, no developer-account approval), score them with three different methods (TextBlob, VADER, and a transformer model), compare accuracy on real tweets, and visualize the results. Every section has runnable code, and the full pipeline costs about $0.50 to score 10,000 tweets end-to-end.
TL;DR — the 3-step pipeline
Every Twitter sentiment-analysis workflow has the same three steps. The hard part is picking the right model for your accuracy/cost tradeoff.
- Fetch tweets matching a query (brand name, hashtag, keyword) via the Twitter API.
- Score each tweet as positive, negative, or neutral using a sentiment model.
- Aggregate and visualize the scores — counts, time series, score distribution.
The methods covered below trade off speed and accuracy:
| Method | Speed | Accuracy on Twitter | Cost | Best for |
|---|---|---|---|---|
| TextBlob | ⚡⚡⚡ Very fast | ⭐⭐ OK | Free | Quick prototyping, dashboards where speed matters |
| VADER | ⚡⚡⚡ Very fast | ⭐⭐⭐ Good | Free | Default for social-media text (handles emojis, slang) |
| RoBERTa (Hugging Face) | ⚡ Slower | ⭐⭐⭐⭐⭐ Best | Free model, compute cost | Production brand monitoring, research |
Skip to whichever section matches your need — the code samples are independent.
Step 1 — Fetch tweets via the Twitter API
The fastest path to a Twitter API key in 2026 is to skip the official X developer console (multi-day approval queue, OAuth 1.0a, four-credential setup) and use a pay-per-use third-party API. GetXAPI gives you a Bearer token in 30 seconds, $0.10 in free credits at signup (about 2,000 tweets), and a single REST endpoint for tweet search.
Install the only dependency you need for fetching:
```bash
pip install requests
```
Then this is the entire data-fetch script:
```python
import json
from typing import Dict, List

import requests

GETXAPI_BASE = "https://api.getxapi.com"
GETXAPI_TOKEN = "YOUR_GETXAPI_TOKEN"  # get one at getxapi.com
HEADERS = {"Authorization": f"Bearer {GETXAPI_TOKEN}"}


def search_tweets(query: str, max_tweets: int = 200) -> List[Dict]:
    """
    Fetch tweets matching `query`. Handles pagination via the cursor.
    Returns a list of tweet dicts with id, text, author, and created_at.
    """
    tweets: List[Dict] = []
    cursor = None
    while len(tweets) < max_tweets:
        params = {"query": query, "queryType": "Latest"}
        if cursor:
            params["cursor"] = cursor
        r = requests.get(
            f"{GETXAPI_BASE}/twitter/tweet/advanced_search",
            headers=HEADERS,
            params=params,
            timeout=15,
        )
        r.raise_for_status()
        data = r.json()
        batch = data.get("tweets", [])
        if not batch:
            break  # no more results
        tweets.extend(batch)
        cursor = data.get("next_cursor")
        if not cursor:
            break  # last page reached
    return tweets[:max_tweets]


if __name__ == "__main__":
    results = search_tweets("ChatGPT lang:en -is:retweet", max_tweets=500)
    print(f"Fetched {len(results)} tweets")
    if results:  # avoid an IndexError when the query matches nothing
        print(json.dumps(results[0], indent=2))
```
A few things worth knowing:
- One call returns roughly 20 tweets at $0.001 per call, so 500 tweets costs $0.025. A full 10,000-tweet sample is around $0.50.
- `lang:en -is:retweet` is a standard advanced-search filter combination that removes retweets and forces English. Add operators like `min_faves:50` or `from:elonmusk` for more targeted samples. See the Twitter Search API guide for the full operator list.
- Pagination is cursor-based. Each response includes a `next_cursor` you pass back on the next request until it stops returning one.
- No OAuth. The Bearer header is the whole auth flow.
If you would rather see how to handle retries, async fetching with httpx, and other production concerns, the Python Twitter API tutorial covers those patterns in depth.
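As a rough illustration of the retry pattern mentioned above, here is a minimal sketch of exponential backoff around any fetch callable. `with_backoff` and its parameters are hypothetical names, not part of GetXAPI or `requests`. In real use you would likely catch `requests.RequestException` rather than the generic `Exception` used here.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def with_backoff(
    fn: Callable[[], T],
    attempts: int = 4,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn(); on failure wait base_delay * 2**n seconds and try again.

    Hypothetical helper for illustration. Re-raises the last error once
    `attempts` calls have failed.
    """
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of retries: surface the original error
            sleep(base_delay * 2 ** attempt)
    raise ValueError("attempts must be at least 1")
```

A call would look like `with_backoff(lambda: search_tweets("ChatGPT lang:en -is:retweet"))`. The `sleep` parameter exists only so the waiting can be swapped out in tests.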
Step 2a — Score with TextBlob (the easy one)
TextBlob is the friendliest sentiment library in Python. It returns a polarity score between -1 (negative) and +1 (positive). Accuracy on Twitter is mediocre because TextBlob's lexicon was built mostly from product reviews, but it is fine for quick prototypes and dashboards where the user does not need surgical precision.
```bash
pip install textblob
python -m textblob.download_corpora
```

```python
from textblob import TextBlob


def textblob_sentiment(text: str) -> dict:
    blob = TextBlob(text)
    polarity = blob.sentiment.polarity          # -1.0 to 1.0
    subjectivity = blob.sentiment.subjectivity  # 0.0 to 1.0
    if polarity > 0.1:
        label = "positive"
    elif polarity < -0.1:
        label = "negative"
    else:
        label = "neutral"
    return {"polarity": polarity, "subjectivity": subjectivity, "label": label}


# Apply to your fetched tweets
for tweet in results[:5]:
    s = textblob_sentiment(tweet["text"])
    print(f"{s['label']:8s} ({s['polarity']:+.2f}) {tweet['text'][:90]}")
```
TextBlob's main weaknesses on Twitter:
- Ignores emojis entirely. A tweet of just 🔥🔥🔥 scores 0.
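- Negation handling is shallow. Simple cases like "not bad" are flipped, but negation that sits further from the sentiment word, or is implied by slang, is often missed.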
- Slang and abbreviations are mostly unscored.
If those misses matter for your use case, jump to VADER.
Start building with GetXAPI
$0.05 per 1,000 tweets. $0.10 free credits. No credit card required.
Step 2b — Score with VADER (the smart one for social media)
VADER (Valence Aware Dictionary and sEntiment Reasoner) was built specifically for social-media text. It handles emojis, slang, intensifiers ("really good" scores higher than "good"), and negations correctly. It is still rule-based and still free, but it is dramatically better on Twitter than TextBlob.
```bash
pip install vaderSentiment
```

```python
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

vader = SentimentIntensityAnalyzer()


def vader_sentiment(text: str) -> dict:
    scores = vader.polarity_scores(text)
    # scores = {"neg": 0.0, "neu": 0.5, "pos": 0.5, "compound": 0.7}
    compound = scores["compound"]  # -1 to 1, the standard summary score
    if compound >= 0.05:
        label = "positive"
    elif compound <= -0.05:
        label = "negative"
    else:
        label = "neutral"
    return {**scores, "label": label}


for tweet in results[:5]:
    s = vader_sentiment(tweet["text"])
    print(f"{s['label']:8s} ({s['compound']:+.2f}) {tweet['text'][:90]}")
```
What makes VADER different in practice:
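- Emojis count. VADER translates each emoji into its text description and scores those words, so emojis can move the score. The result follows the description rather than current Twitter usage, so spot-check emojis such as 🔥, 💀, and 😭 on your own data.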
- Intensifiers work. "ABSOLUTELY AMAZING" scores higher than "amazing".
- Negation flips correctly. "not bad" scores positive.
- Speed. Around 30,000 tweets per second on a single CPU core — same order as TextBlob.
For most brand-monitoring or general sentiment dashboards, VADER is the right default. It will only let you down on heavy sarcasm, irony, and complex multi-clause sentences — which brings us to the transformer model.
Step 2c — Score with a transformer (the accurate one)
The best Twitter sentiment model on Hugging Face today is `cardiffnlp/twitter-roberta-base-sentiment-latest`, a RoBERTa model pretrained on roughly 124M tweets and fine-tuned for sentiment on the TweetEval benchmark. It catches sarcasm and context that VADER misses, at the cost of being 100–1,000× slower (still tractable: roughly 10–50 tweets per second on a typical CPU, 1,000+ per second on a consumer GPU).
```bash
pip install transformers torch
```

```python
from transformers import pipeline

# Loads about 500MB on first run; cached locally after.
classifier = pipeline(
    "sentiment-analysis",
    model="cardiffnlp/twitter-roberta-base-sentiment-latest",
)


def roberta_sentiment(texts: list[str]) -> list[dict]:
    """Batch-score for efficiency. Each result has 'label' and 'score'."""
    # Named `predictions` so it is not confused with the fetched `results`.
    predictions = classifier(texts, truncation=True, max_length=128)
    # predictions: [{"label": "positive", "score": 0.93}, ...]
    return predictions


# Batch the tweets for speed
texts = [t["text"] for t in results[:50]]
scores = roberta_sentiment(texts)
for tweet, s in zip(results[:5], scores[:5]):
    print(f"{s['label']:8s} ({s['score']:.2f}) {tweet['text'][:90]}")
```
When the transformer earns its compute cost:
- Sarcasm. "Oh great, another Twitter outage" — VADER scores positive ("great"), RoBERTa scores negative.
- Irony. "Just love spending three hours on hold" — same story.
- Context-dependent words. "sick" can mean "ill" or "amazing" depending on the surrounding text.
- Subtle tonal shifts between polite and passive-aggressive.
If you are building a research project, a brand intelligence tool, or anything where misclassifying 10% of tweets matters, use RoBERTa. If you are building a real-time dashboard with thousands of tweets per minute, VADER is the practical choice.
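To decide between the two on your own data, it can help to score the same tweets with both models and look at where they disagree. The sketch below is one way to do that with plain label lists. `compare_labels` is a hypothetical helper and the labels are made up. Note that it measures agreement between models, not accuracy; measuring accuracy needs a hand-labelled sample.

```python
from collections import Counter
from typing import Dict, List, Tuple


def compare_labels(a: List[str], b: List[str]) -> Tuple[float, Dict[Tuple[str, str], int]]:
    """Return (agreement rate, counts of each (label_a, label_b) pair)."""
    if len(a) != len(b):
        raise ValueError("label lists must be the same length")
    if not a:
        return 0.0, {}
    pairs = Counter(zip(a, b))
    agree = sum(n for (x, y), n in pairs.items() if x == y)
    return agree / len(a), dict(pairs)


# Made-up labels for five tweets, purely to show the output shape.
vader_labels = ["positive", "positive", "neutral", "negative", "positive"]
roberta_labels = ["positive", "negative", "neutral", "negative", "negative"]

rate, pairs = compare_labels(vader_labels, roberta_labels)
print(f"Agreement: {rate:.0%}")
# The disagreement pairs point at the tweets worth reading by hand.
for (v, r), n in sorted(pairs.items()):
    if v != r:
        print(f"VADER={v}, RoBERTa={r}: {n}")
```

If most disagreements turn out to be sarcasm that VADER reads as positive, that is a reasonable signal the transformer is worth its compute for your domain.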
Step 3 — Aggregate, visualize, decide
A list of {label, score, tweet} rows is not insight. The next layer is aggregation: counts by sentiment class, sentiment over time, and the most-positive / most-negative example tweets.
```python
from datetime import datetime

import matplotlib.pyplot as plt
import pandas as pd

# Build a DataFrame combining tweets + VADER scores
rows = []
for tweet in results:
    s = vader_sentiment(tweet["text"])
    rows.append({
        "id": tweet["id"],
        "created_at": datetime.fromisoformat(
            tweet["created_at"].replace("Z", "+00:00")
        ),
        "text": tweet["text"],
        "compound": s["compound"],
        "label": s["label"],
    })
df = pd.DataFrame(rows)

# Sentiment distribution
counts = df["label"].value_counts()
print(counts)

# Hourly time series
df["hour"] = df["created_at"].dt.floor("h")
hourly = df.groupby(["hour", "label"]).size().unstack(fill_value=0)
ax = hourly.plot(kind="area", stacked=True, alpha=0.7, figsize=(10, 5))
ax.set_title("Sentiment Over Time")
ax.set_xlabel("Hour")
ax.set_ylabel("Tweet count")
plt.tight_layout()
plt.savefig("sentiment_time_series.png", dpi=120)

# Top 5 most positive and most negative tweets
print("\nTop 5 positive:")
print(df.nlargest(5, "compound")[["compound", "text"]])
print("\nTop 5 negative:")
print(df.nsmallest(5, "compound")[["compound", "text"]])
```
This is the minimum analytical surface for a useful dashboard. From here you would typically layer on top:
- Filter by author / verified-only to weight influencer voices.
- Geo grouping if your fetched tweets included location metadata.
- Topic clustering with `BERTopic` or `sentence-transformers` to break the data into themes before measuring sentiment per theme.
- Anomaly alerts when negative-tweet velocity spikes by some threshold.
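The anomaly-alert idea in the last item can be sketched with the standard library alone. This is one possible rule, not the only one: flag an hour when its negative-tweet count exceeds the mean of the previous few hours by `k` standard deviations. The function name, the window size, and the sample counts are all assumptions for illustration.

```python
from statistics import mean, pstdev
from typing import List


def negative_spikes(counts: List[int], window: int = 6, k: float = 3.0) -> List[int]:
    """Return indexes where a count exceeds mean + k * stdev of the previous `window` values."""
    flagged = []
    for i in range(window, len(counts)):
        history = counts[i - window:i]
        baseline = mean(history)
        spread = pstdev(history)
        # Floor the spread at 1 so a perfectly flat history does not
        # flag every tiny increase.
        threshold = baseline + k * max(spread, 1.0)
        if counts[i] > threshold:
            flagged.append(i)
    return flagged


# Made-up hourly counts of negative tweets; index 8 is the spike.
hourly_negative = [4, 5, 3, 6, 4, 5, 5, 4, 30, 6]
print(negative_spikes(hourly_negative))
```

With the hourly DataFrame from the code above, the input would be something like the `negative` column of `hourly` converted to a list.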
Real example — brand sentiment for an app launch
Here is the same pipeline put to work on a realistic question: "What was the public sentiment around the ChatGPT 5 launch in the first 48 hours?"
```python
# 1. Fetch a focused sample
tweets = search_tweets(
    "ChatGPT 5 lang:en -is:retweet",
    max_tweets=2000,
)

# 2. Score with RoBERTa (worth the compute for a one-time analysis)
texts = [t["text"] for t in tweets]
scores = roberta_sentiment(texts)

# 3. Combine
df = pd.DataFrame([
    {
        "created_at": datetime.fromisoformat(
            t["created_at"].replace("Z", "+00:00")
        ),
        "text": t["text"],
        "author": t.get("author", {}).get("userName"),
        "favorite_count": t.get("favorite_count", 0),
        "label": s["label"],
        "confidence": s["score"],
    }
    for t, s in zip(tweets, scores)
])

# 4. Headline numbers
total = len(df)
pct_pos = (df["label"] == "positive").mean() * 100
pct_neg = (df["label"] == "negative").mean() * 100
print(f"{total} tweets · {pct_pos:.1f}% positive · {pct_neg:.1f}% negative")

# 5. Weight by engagement (likes act as amplification)
df["weighted"] = df["favorite_count"].clip(lower=1)
weighted_pos = (
    df.loc[df["label"] == "positive", "weighted"].sum() / df["weighted"].sum() * 100
)
print(f"Engagement-weighted positivity: {weighted_pos:.1f}%")
```
The engagement-weighted score usually tells a different story than the raw count. A small number of very-popular negative tweets can dominate the conversation even when the average tweet is neutral.
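A tiny worked example makes the gap concrete. The sketch below recomputes both numbers on four made-up tweets using plain Python; `positivity` is a hypothetical helper, and the like counts are invented to exaggerate the effect.

```python
from typing import Dict, List


def positivity(rows: List[Dict], weighted: bool = False) -> float:
    """Percent of (optionally like-weighted) tweets labelled positive."""
    def weight(row: Dict) -> int:
        # Mirror the clip(lower=1) above: every tweet counts at least once.
        return max(row["favorite_count"], 1) if weighted else 1

    total = sum(weight(r) for r in rows)
    positive = sum(weight(r) for r in rows if r["label"] == "positive")
    return 100 * positive / total if total else 0.0


# Made-up sample: three quiet positive tweets, one viral negative one.
sample = [
    {"label": "positive", "favorite_count": 2},
    {"label": "positive", "favorite_count": 0},
    {"label": "positive", "favorite_count": 3},
    {"label": "negative", "favorite_count": 995},
]
print(f"Raw positivity: {positivity(sample):.1f}%")
print(f"Weighted positivity: {positivity(sample, weighted=True):.1f}%")
```

Three of the four tweets are positive, so the raw figure is 75%, but the single viral negative tweet holds 995 of the 1,001 weighted votes and drags the weighted figure below 1%.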
Common pitfalls (read this before you ship)
A few mistakes show up over and over in Twitter sentiment-analysis projects:
- Sample bias from search syntax. A query for `"AcmeCorp"` will under-index complaints (people often spell the brand wrong when they are angry). Use OR-broadened queries (`"AcmeCorp" OR "Acme Corp" OR "@acme"`) and check the recall.
- Recency bias from capped fetches. If your fetch caps at 1,000 tweets and the topic has 10,000 mentions, you are sampling only the most recent slice. Either sample randomly across the time window or scale up your fetch budget.
- Bot tweets contaminate the signal. Filter out accounts with zero followers and zero following (often spam), or filter for `verified:true` if you want only signal from "real people". The Twitter API rate limits guide covers how to scale a fetch without tripping platform caps.
- Multilingual tweets break English-only models. Set `lang:en` in your search query, or use a multilingual model like `cardiffnlp/twitter-xlm-roberta-base-sentiment` instead.
- Sarcasm fools VADER consistently. If sarcasm is common in your domain (politics, gaming, tech), pay the transformer compute cost.
- Time-zone bugs in the time series. Tweets are timestamped in UTC. Convert to the user's local time before plotting hourly trends or "8am vs 6pm" patterns will be wrong.
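For the first pitfall, building the OR-broadened query by hand gets error-prone once a brand has more than a couple of spellings. A small helper like the one below can assemble it. `or_query` is a hypothetical name, and wrapping the alternatives in parentheses is an assumption about how the search endpoint groups operators, so check it against a known query first.

```python
from typing import Iterable


def or_query(variants: Iterable[str], filters: str = "lang:en -is:retweet") -> str:
    """Join brand spellings into one OR-broadened search query."""
    # Drop blanks and exact duplicates while keeping the original order.
    unique = list(dict.fromkeys(v.strip() for v in variants if v.strip()))
    if not unique:
        raise ValueError("need at least one variant")
    quoted = " OR ".join(f'"{v}"' for v in unique)
    return f"({quoted}) {filters}".strip()


print(or_query(["AcmeCorp", "Acme Corp", "@acme", "AcmeCorp"]))
```

The resulting string can be passed straight to `search_tweets` as the `query` argument.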
How much does this cost end to end?
A complete analysis of 10,000 tweets, run once on a developer laptop, costs roughly:
| Line item | Cost |
|---|---|
| Fetch 10,000 tweets via GetXAPI (~500 calls × $0.001) | $0.50 |
| TextBlob / VADER scoring (CPU, no extra cost) | $0.00 |
| RoBERTa scoring on CPU (~10 minutes of compute) | ~$0.00 (your laptop) |
| RoBERTa scoring on a GPU instance (under 1 minute) | ~$0.01 |
| Total per 10K-tweet pass | ~$0.50 |
For comparison, doing the same fetch on the official X API would cost about $50–$100 at the same volume, because the X API charges $0.005–$0.01 per post read versus GetXAPI's $0.001 per call (~20 tweets per call). Our Twitter API pricing comparison page breaks down the per-tweet economics in detail.
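The arithmetic behind those figures is simple enough to check yourself. The sketch below uses only the prices quoted in this article (roughly 20 tweets per $0.001 call for GetXAPI, $0.005 to $0.01 per post read for the X API); the function names are hypothetical, and actual prices may differ from what is quoted here.

```python
import math


def getxapi_cost(n_tweets: int, tweets_per_call: int = 20, price_per_call: float = 0.001) -> float:
    """Estimated fetch cost in dollars when billing is per call."""
    calls = math.ceil(n_tweets / tweets_per_call)  # a partial page still costs a call
    return calls * price_per_call


def per_post_cost(n_tweets: int, price_per_post: float) -> float:
    """Estimated fetch cost in dollars when billing is per post read."""
    return n_tweets * price_per_post


n = 10_000
print(f"GetXAPI: ${getxapi_cost(n):.2f}")
print(f"X API: ${per_post_cost(n, 0.005):.2f} to ${per_post_cost(n, 0.01):.2f}")
```

Because billing is per call, the real cost depends on how many tweets each page actually returns; the 20-per-call default is the rough figure from earlier, not a guarantee.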
Pulling the whole pipeline together
Here is the full end-to-end script combining everything above. Run it once with your GetXAPI token and you have a working sentiment-analysis tool.
```python
from datetime import datetime

import pandas as pd
import requests
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

GETXAPI_TOKEN = "YOUR_GETXAPI_TOKEN"
HEADERS = {"Authorization": f"Bearer {GETXAPI_TOKEN}"}
vader = SentimentIntensityAnalyzer()


def search_tweets(query, max_tweets=500):
    tweets, cursor = [], None
    while len(tweets) < max_tweets:
        params = {"query": query, "queryType": "Latest"}
        if cursor:
            params["cursor"] = cursor
        r = requests.get(
            "https://api.getxapi.com/twitter/tweet/advanced_search",
            headers=HEADERS, params=params, timeout=15,
        )
        r.raise_for_status()
        d = r.json()
        batch = d.get("tweets", [])
        if not batch:
            break
        tweets.extend(batch)
        cursor = d.get("next_cursor")
        if not cursor:
            break
    return tweets[:max_tweets]


def score(text):
    s = vader.polarity_scores(text)
    c = s["compound"]
    label = "positive" if c >= 0.05 else "negative" if c <= -0.05 else "neutral"
    return c, label


def analyze(query, max_tweets=1000):
    tweets = search_tweets(query, max_tweets)
    rows = []
    for t in tweets:
        c, label = score(t["text"])
        rows.append({
            "created_at": datetime.fromisoformat(t["created_at"].replace("Z", "+00:00")),
            "text": t["text"],
            "compound": c,
            "label": label,
        })
    df = pd.DataFrame(rows)
    summary = df["label"].value_counts(normalize=True).round(3)
    return df, summary


if __name__ == "__main__":
    df, summary = analyze("ChatGPT lang:en -is:retweet", max_tweets=1000)
    print(summary)
    print(f"\nMost positive tweet:\n{df.loc[df['compound'].idxmax(), 'text']}")
    print(f"\nMost negative tweet:\n{df.loc[df['compound'].idxmin(), 'text']}")
```
Run it, swap the query for whatever brand or topic you want to monitor, and you have a complete Twitter sentiment dashboard in roughly 70 lines of code.
Where to go from here
You now have a working Twitter sentiment-analysis pipeline. The natural next steps depend on what you are building:
- Brand monitoring dashboards — drop the pipeline behind a cron job, ship results to a database, build a dashboard. The Twitter Monitoring use-cases page covers 14 patterns that build on this same sentiment foundation.
- Research projects — switch to the RoBERTa model, expand the time window, and start clustering by topic before measuring sentiment per cluster.
- Production at scale — read the Twitter scraping best practices for handling retries, deduplication, and rate-limit-aware fetches at million-tweet volume.
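Deduplication matters as soon as the pipeline runs on a schedule, because consecutive fetches of the "Latest" results overlap. A minimal sketch, assuming each tweet dict carries an `id` as described in the fetch step; `new_tweets_only` is a hypothetical helper, and in a real job the seen-ID set would live in your database rather than in memory.

```python
from typing import Dict, Iterable, List, Set


def new_tweets_only(batch: Iterable[Dict], seen_ids: Set[str]) -> List[Dict]:
    """Return tweets whose id has not been seen, and record their ids."""
    fresh = []
    for tweet in batch:
        tweet_id = str(tweet["id"])
        if tweet_id not in seen_ids:
            seen_ids.add(tweet_id)
            fresh.append(tweet)
    return fresh


seen: Set[str] = set()  # in a real job, load and save this from your database
run_1 = [{"id": "1", "text": "a"}, {"id": "2", "text": "b"}]
run_2 = [{"id": "2", "text": "b"}, {"id": "3", "text": "c"}]
print(len(new_tweets_only(run_1, seen)), len(new_tweets_only(run_2, seen)))
```

Only the fresh tweets from each run need to be scored and stored, which also keeps the hourly counts from being inflated by repeats.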
Or just start with the script: paste it into a file, drop your GetXAPI token at the top (get one with $0.10 in free credits at getxapi.com), and run it against any query you want to understand. Sentiment analysis is one of those tools that earns its place the moment you have it.
Frequently Asked Questions
`cardiffnlp/twitter-roberta-base-sentiment-latest` on Hugging Face. It is a RoBERTa model pretrained on ~124M tweets and fine-tuned for sentiment on the TweetEval benchmark, and it consistently outperforms TextBlob and VADER on Twitter-specific benchmarks. The tradeoff is compute cost: it is 100–1,000× slower than VADER. For real-time dashboards stick with VADER, for offline accuracy use RoBERTa.
For sentiment analysis specifically, the official X API has the same fundamental data (recent tweets matching a search), but at roughly 100 to 200 times the per-tweet cost: $0.005–$0.01 per post read versus $0.001 per call returning roughly 20 tweets via GetXAPI. The official API also requires multi-day developer-account approval and a four-credential OAuth setup. Unless you specifically need filtered streams or PowerTrack-tier firehose access, a third-party API is the more practical choice for sentiment work. See our [Twitter API alternatives](/twitter-api-alternatives) writeup for the full landscape.
Sarcasm detection is its own active research area, and even RoBERTa misses obvious sarcasm sometimes. Two practical options: (1) Use a sarcasm-specific model like `helinivan/english-sarcasm-detector` as a second-pass filter, then re-score positive tweets that the sarcasm detector flags. (2) Filter your input to high-confidence-only RoBERTa predictions (`score > 0.8`) and accept that sarcasm-heavy tweets will fall into the dropped middle.
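Option (2) can be sketched in a few lines. The predictions below are made up but follow the `{"label", "score"}` shape the pipeline returns; `split_by_confidence` is a hypothetical helper, and 0.8 is the threshold suggested above rather than a tuned value.

```python
from typing import Dict, List, Tuple


def split_by_confidence(
    predictions: List[Dict], threshold: float = 0.8
) -> Tuple[List[Dict], List[Dict]]:
    """Split pipeline outputs into (kept, dropped) around a score threshold."""
    kept = [p for p in predictions if p["score"] > threshold]
    dropped = [p for p in predictions if p["score"] <= threshold]
    return kept, dropped


# Made-up predictions in the same shape the pipeline returns.
preds = [
    {"label": "positive", "score": 0.93},
    {"label": "negative", "score": 0.55},
    {"label": "neutral", "score": 0.81},
]
kept, dropped = split_by_confidence(preds)
print(f"kept {len(kept)}, dropped {len(dropped)}")
```

It is worth reporting the size of the dropped group alongside the headline numbers, since a large dropped share means the summary covers only part of the conversation.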
A polling job that runs the analysis script every 15 minutes against your tracked queries, stores results in a database, and flags anomalies. The compute cost is negligible and the API cost is $0.50 per 10,000 tweets — usually under $5/month per brand you monitor. See our [Twitter Monitoring tools and use cases](/twitter-api-usecases) page for the broader monitoring landscape.
Only if you are scoring with RoBERTa at high volume (10,000+ tweets per pass). TextBlob and VADER are CPU-only and process tens of thousands of tweets per second. RoBERTa on CPU runs about 10–50 tweets per second depending on your machine — fine for batches of a few hundred, slow for batches of 100,000. A consumer GPU (RTX 3060 or better) pushes RoBERTa to 1,000+ tweets per second.
The sentiment libraries (TextBlob, VADER, transformers) are all free. The only cost is fetching the tweets. GetXAPI gives $0.10 in free credits at signup with no credit card, which covers about 2,000 tweets — enough to run a meaningful pilot. After that it is $0.05 per 1,000 tweets.
Replace the model with a multilingual sentiment model. `cardiffnlp/twitter-xlm-roberta-base-sentiment` is an XLM-RoBERTa model from the same group, fine-tuned for sentiment on tweets in 8 languages including Spanish, French, German, Portuguese, Italian, and Arabic. Drop it into the same `pipeline()` code and remove the `lang:en` filter from your search query.
Yes. GetXAPI's `tweet/advanced_search` endpoint covers historical tweets by date range — use the `since:` and `until:` operators in your query string, e.g. `"AcmeCorp" since:2025-01-01 until:2025-03-01`. The advanced-search operator reference has the full list.
Filter at the search-query level (`min_faves:1` removes most zero-engagement spam) or at the analysis level (drop accounts where `follower_count == 0 AND following_count > 1000`, a common bot signature). For higher confidence, set `verified:true` in the query to only score tweets from blue-check accounts — smaller sample but cleaner signal.
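The analysis-level filter might look like the sketch below. The field names `follower_count` and `following_count` are taken from the rule above and are assumptions about the response shape, so check them against an actual tweet payload before relying on this.

```python
from typing import Dict, List


def looks_like_bot(author: Dict) -> bool:
    """Heuristic from the text: no followers but following more than 1,000 accounts."""
    # follower_count / following_count are assumed field names.
    followers = author.get("follower_count", 0)
    following = author.get("following_count", 0)
    return followers == 0 and following > 1000


def drop_probable_bots(tweets: List[Dict]) -> List[Dict]:
    """Keep only tweets whose author does not match the bot heuristic."""
    return [t for t in tweets if not looks_like_bot(t.get("author", {}))]


# Made-up tweets to show the filter in action.
sample_tweets = [
    {"text": "love it", "author": {"follower_count": 120, "following_count": 80}},
    {"text": "FREE CRYPTO", "author": {"follower_count": 0, "following_count": 4000}},
]
print(len(drop_probable_bots(sample_tweets)))
```

A heuristic this blunt will also drop some genuine new accounts, so it is worth eyeballing a sample of what it removes.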
Yes for public tweets. The legality question usually comes up around storing tweets or republishing them — which is governed by Twitter / X's Terms of Service and your local data-protection law (GDPR, CCPA, etc.). The analysis itself — reading public tweets and computing a score — is standard practice and uncontroversial. Just do not republish the raw tweet text without attribution.