How To Scrape YouTube Comments?

Priyanka Dave
Published in Analytics Vidhya
2 min read · Mar 27, 2021


Web scraping is the process of importing information from a website into local files saved on your computer. Later on, you can use that information for analysis purposes.

Here, we will see how we can scrape YouTube comments and generate a CSV file from the scraped data.

1. Prerequisites:

  • You will require a Google API key (you can refer to this link to generate an API key)
  • Put the key inside a .env file
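
For reference, a minimal .env file could look like the line below. The key name GOOGLE_API_KEY matches the os.getenv call used later in this post; the value shown is just a placeholder.

```
GOOGLE_API_KEY=your_api_key_here
```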

2. Import required packages:

from googleapiclient.discovery import build
from urllib.parse import urlparse, parse_qs
import pandas as pd
import os
import time
from dotenv import load_dotenv
load_dotenv()

3. Function to get the video ID from a URL:

def get_video_id(url):
    u_pars = urlparse(url)
    quer_v = parse_qs(u_pars.query).get('v')
    if quer_v:
        return quer_v[0]
    pth = u_pars.path.split('/')
    if pth:
        return pth[-1]
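
As a quick sanity check, the function handles both the standard watch URL and the shortened youtu.be form. A minimal sketch (the video ID below is made up for illustration):

```python
from urllib.parse import urlparse, parse_qs

def get_video_id(url):
    # Prefer the ?v= query parameter; fall back to the last path segment
    u_pars = urlparse(url)
    quer_v = parse_qs(u_pars.query).get('v')
    if quer_v:
        return quer_v[0]
    pth = u_pars.path.split('/')
    if pth:
        return pth[-1]

# Both URL styles resolve to the same ID
print(get_video_id("https://www.youtube.com/watch?v=abc123"))  # abc123
print(get_video_id("https://youtu.be/abc123"))                 # abc123
```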

4. Function to scrape comments:

def video_comments(video_id, api_key):
    resource = build('youtube', 'v3', developerKey=api_key)
    try:
        request = resource.commentThreads().list(
            part="snippet,replies",
            videoId=video_id,
            maxResults=5,
            order='time')

        # Execute the request
        response = request.execute()
        dfa = []
        while response:
            for item in response['items']:
                item_info = item["snippet"]
                topLevelComment = item_info["topLevelComment"]
                comment_info = topLevelComment["snippet"]
                dfa.append({
                    'comment_by': comment_info["authorDisplayName"],
                    'comment_text': comment_info["textDisplay"],
                    'comment_date': comment_info["publishedAt"],
                    'likes_count': comment_info["likeCount"],
                })
            # Fetch the next page of comments, if any
            if 'nextPageToken' in response:
                response = resource.commentThreads().list(
                    part='snippet,replies',
                    videoId=video_id,
                    maxResults=100,  # get 100 comments per page
                    pageToken=response['nextPageToken']
                ).execute()
            else:
                break

        df = pd.DataFrame(dfa, columns=['comment_by', 'comment_text', 'comment_date', 'likes_count'])
        path = "Data/" + video_id + ".csv"
        full_path = os.path.abspath(path)  # get the full path
        df.to_csv(path)
        return full_path
    except Exception:
        return False
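
To make the field mapping concrete, here is a minimal sketch that applies the same extraction logic to a hand-built dictionary shaped like one commentThreads item. The values are made up for illustration; a real response item comes from the API call above.

```python
def extract_comment(item):
    # Same nesting the scraper walks: snippet -> topLevelComment -> snippet
    comment_info = item["snippet"]["topLevelComment"]["snippet"]
    return {
        'comment_by': comment_info["authorDisplayName"],
        'comment_text': comment_info["textDisplay"],
        'comment_date': comment_info["publishedAt"],
        'likes_count': comment_info["likeCount"],
    }

# Hand-built sample item (illustrative values, not real API output)
sample_item = {
    "snippet": {
        "topLevelComment": {
            "snippet": {
                "authorDisplayName": "Alice",
                "textDisplay": "Great video!",
                "publishedAt": "2021-03-27T10:00:00Z",
                "likeCount": 3,
            }
        }
    }
}

print(extract_comment(sample_item)['comment_by'])  # Alice
```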

5. Function calls:

api_key = os.getenv('GOOGLE_API_KEY')
url = "https://youtubeurl"
video_id = get_video_id(url)
video_comments(video_id, api_key)

6. Check output:

  • Inside the Data directory, you will see a .csv file of the scraped comments.
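
Once the file exists, you can load it straight back into pandas for analysis. A small round-trip sketch, with made-up rows standing in for real scraped comments:

```python
import os
import pandas as pd

os.makedirs("Data", exist_ok=True)

# Made-up rows standing in for scraped comments
df = pd.DataFrame([
    {'comment_by': 'Alice', 'comment_text': 'Great video!',
     'comment_date': '2021-03-27T10:00:00Z', 'likes_count': 3},
    {'comment_by': 'Bob', 'comment_text': 'Thanks for sharing.',
     'comment_date': '2021-03-27T11:00:00Z', 'likes_count': 1},
])
path = "Data/sample.csv"
df.to_csv(path, index=False)

loaded = pd.read_csv(path)
print(len(loaded))                   # 2
print(int(loaded['likes_count'].sum()))  # 4
```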

You can download the source code for the Python backend and React front end from my GitHub account here.

You can check out my other blog on how to use this data to explore NLP techniques here.

Thank you.
