How To Scrape YouTube Comments?

Priyanka Dave
Published in Analytics Vidhya
2 min read · Mar 27, 2021


Web scraping is the process of importing information from a website into local files saved on your computer. Later on, you can use that information for analysis purposes.

Here, we will see how we can scrape YouTube comments and generate a CSV file from the scraped data.

1. Prerequisites:

  • You will require a Google API key (you can refer to this link to generate an API key)
  • Put the key inside a .env file
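
For reference, a minimal .env file could look like the line below. The key name GOOGLE_API_KEY matches the os.getenv call used later in this post; the value shown is just a placeholder.

```
GOOGLE_API_KEY=your_api_key_here
```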

2. Import required packages:

from googleapiclient.discovery import build
from urllib.parse import urlparse, parse_qs
import pandas as pd
import os
import time
from dotenv import load_dotenv
load_dotenv()

3. Function to get the video ID from a URL:

def get_video_id(url):
    u_pars = urlparse(url)
    quer_v = parse_qs(u_pars.query).get('v')
    if quer_v:
        return quer_v[0]
    pth = u_pars.path.split('/')
    if pth:
        return pth[-1]
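
As a quick sanity check, the function handles both the standard watch URL and the shortened youtu.be form. A minimal sketch (the video ID below is made up for illustration):

```python
from urllib.parse import urlparse, parse_qs

def get_video_id(url):
    # Prefer the ?v= query parameter; fall back to the last path segment
    u_pars = urlparse(url)
    quer_v = parse_qs(u_pars.query).get('v')
    if quer_v:
        return quer_v[0]
    pth = u_pars.path.split('/')
    if pth:
        return pth[-1]

# Both URL styles resolve to the same ID
print(get_video_id("https://www.youtube.com/watch?v=abc123"))  # abc123
print(get_video_id("https://youtu.be/abc123"))                 # abc123
```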

4. Function to scrape comments:

def video_comments(video_id, api_key):
    resource = build('youtube', 'v3', developerKey=api_key)
    try:
        request = resource.commentThreads().list(
            part="snippet,replies",
            videoId=video_id,
            maxResults=5,
            order='time')

        # Execute the request
        response = request.execute()
        dfa = []
        while response:
            for item in response['items']:
                item_info = item["snippet"]
                topLevelComment = item_info["topLevelComment"]
                comment_info = topLevelComment["snippet"]
                dfa.append({
                    'comment_by': comment_info["authorDisplayName"],
                    'comment_text': comment_info["textDisplay"],
                    'comment_date': comment_info["publishedAt"],
                    'likes_count': comment_info["likeCount"],
                })
            # Fetch the next page of comments, if any
            if 'nextPageToken' in response:
                response = resource.commentThreads().list(
                    part='snippet,replies',
                    videoId=video_id,
                    maxResults=100,  # get 100 comments per page
                    pageToken=response['nextPageToken']
                ).execute()
            else:
                break

        df = pd.DataFrame(dfa, columns=['comment_by', 'comment_text', 'comment_date', 'likes_count'])
        path = "Data/" + video_id + ".csv"
        full_path = os.path.abspath(path)  # get the full path
        df.to_csv(path)
        return full_path
    except Exception:
        return False
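
To make the field mapping concrete, here is a minimal sketch that applies the same extraction logic to a hand-built dictionary shaped like one commentThreads item. The values are made up for illustration; a real response item comes from the API call above.

```python
def extract_comment(item):
    # Same nesting the scraper walks: snippet -> topLevelComment -> snippet
    comment_info = item["snippet"]["topLevelComment"]["snippet"]
    return {
        'comment_by': comment_info["authorDisplayName"],
        'comment_text': comment_info["textDisplay"],
        'comment_date': comment_info["publishedAt"],
        'likes_count': comment_info["likeCount"],
    }

# Hand-built sample item (illustrative values, not real API output)
sample_item = {
    "snippet": {
        "topLevelComment": {
            "snippet": {
                "authorDisplayName": "Alice",
                "textDisplay": "Great video!",
                "publishedAt": "2021-03-27T10:00:00Z",
                "likeCount": 3,
            }
        }
    }
}

print(extract_comment(sample_item)['comment_by'])  # Alice
```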

5. Function calls:

api_key = os.getenv('GOOGLE_API_KEY')
url = "https://youtubeurl"
video_id = get_video_id(url)
video_comments(video_id, api_key)

6. Check output:

  • Inside the Data directory, you will see a .csv file of the scraped comments.
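
Once the file exists, you can load it straight back into pandas for analysis. A small round-trip sketch, with made-up rows standing in for real scraped comments:

```python
import os
import pandas as pd

os.makedirs("Data", exist_ok=True)

# Made-up rows standing in for scraped comments
df = pd.DataFrame([
    {'comment_by': 'Alice', 'comment_text': 'Great video!',
     'comment_date': '2021-03-27T10:00:00Z', 'likes_count': 3},
    {'comment_by': 'Bob', 'comment_text': 'Thanks for sharing.',
     'comment_date': '2021-03-27T11:00:00Z', 'likes_count': 1},
])
path = "Data/sample.csv"
df.to_csv(path, index=False)

loaded = pd.read_csv(path)
print(len(loaded))                   # 2
print(int(loaded['likes_count'].sum()))  # 4
```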

You can download the source code for the Python backend and React front end from my GitHub account here.

You can check out my other blog on how to use this data to explore NLP techniques here.

Thank you.
