LinkedIn is the world’s largest professional network with over 850 million members. With so many detailed profiles, LinkedIn is a goldmine for data scraping. But is scraping LinkedIn data allowed and ethical? What are the limits?
And how can you scrape LinkedIn profiles, jobs, companies, groups, and more? This comprehensive guide covers everything you need to know about LinkedIn data scraping.
What is LinkedIn Data Scraping?
LinkedIn data scraping refers to the automated extraction of data from LinkedIn profiles, companies, jobs, groups, and other sections of the site. It involves using scraping tools and scripts to extract large volumes of targeted information from LinkedIn.
The scraped LinkedIn data may include member profile information, company details, job postings, group discussions, and more. This extracted data can then be structured and analyzed to gain business intelligence.
What Can You Scrape From LinkedIn?
The possibilities are nearly endless when it comes to scraping data from LinkedIn. Here are some of the most common and useful types of data that can be scraped:
LinkedIn Profiles
Profile data including name, job title, education, skills, endorsements, connections, etc. Extremely useful for sales prospecting and recruiting.
LinkedIn Jobs
Details on job postings including description, responsibilities, qualifications, salary, etc. Great for competitive analysis and recruitment.
LinkedIn Groups
Group members, discussions, and posts. Valuable for understanding interests and engagement.
LinkedIn Company Pages
Company details, employees, jobs, updates, etc. Useful for sales prospecting and market research.
LinkedIn Search Results
Scraping search results for keywords can uncover target companies and contacts.
The list goes on. In short, LinkedIn is a goldmine of professional data if you know how to extract it. Now let’s look at how to scrape this data legally and ethically.
Why Scrape Data from LinkedIn?
Here are some of the key reasons businesses scrape data from LinkedIn:
- Lead generation – Scrape member profile data to build targeted lead lists for sales and marketing.
- Competitive analysis – Analyze data about your competitors’ employees, products, jobs etc.
- Recruitment – Scrape and compile candidate profiles from LinkedIn Jobs to identify potential hires.
- Market research – Gather data on your industry, target audience, trends etc. to inform business decisions.
- Sales intelligence – Identify key decision makers and generate sales leads from company and profile data.
- Job market analysis – Understand hiring demand and trends by scraping LinkedIn job listings.
- LinkedIn monitoring – Track brand mentions, competitor activity and other LinkedIn data.
Is Scraping LinkedIn Data Legal and Ethical?
Before scraping any website, it’s important to understand the legal and ethical implications. LinkedIn’s User Agreement prohibits scraping their site via automated means. However, they don’t seem to go after individuals scraping small amounts of data for research or personal use.
That said, scraping large amounts of LinkedIn data to sell or for commercial use is unethical and risky. You could have your account banned or face legal issues. When in doubt, reach out to LinkedIn to see if they will grant you access to their data for your specific use case.
Overall, tread carefully when scraping LinkedIn. Make sure you understand and follow their guidelines. Scraping a limited amount of public profile data for non-commercial purposes seems to be tolerated. But beyond that, you’re on shaky ground both legally and ethically.
LinkedIn Scraping Tools and Services
Rather than building your own LinkedIn scraper from scratch, you may want to consider using an existing tool or service. Here are some of the top options:
- Octoparse – Web scraping platform with specific templates for LinkedIn including profiles, companies, jobs, groups, and more. Free trial available.
- ParseHub – Visual web scraper where you can record scraping LinkedIn pages. Free plan available with 100 extractions per month.
- ScrapeHero – Scraper API and cloud platform that handles proxies, browsers, CAPTCHAs, and more. Pre-built scrapers available for LinkedIn.
- Import.io – General web scraping tool you can use to extract data from LinkedIn. Free trial then pricing starts at $299/month.
- ScrapeStorm – Proxy API and web scraping service starting at $30/month. Helpful for avoiding LinkedIn blocks.
- SerpApi – APIs for scraping Google, LinkedIn, Twitter, and more. LinkedIn scraper is $30/month.
The benefit of using a service is they handle the difficulties of scraping like proxies, CAPTCHAs, and managing blocks. Just be sure to pick one that fits your budget and scraping needs.
Scraping LinkedIn with Python and Selenium
For complete flexibility, you may want to scrape LinkedIn profiles and data using Python scripts with Selenium. Here are the key steps:
- Install dependencies – selenium, beautifulsoup4, and pandas (the time module ships with Python and needs no install)
- Create a LinkedIn account – Make a separate account just for scraping to avoid issues.
- Get cookies – Manually sign in and use Selenium to get your LinkedIn auth cookies.
- Create Selenium web driver – Launch headless Chrome or Firefox to load pages.
- Search LinkedIn – Enter search URLs with filters to target specific data.
- Extract data – Use BeautifulSoup to parse page source and extract needed data.
- Store in pandas – Save scraped LinkedIn data into a pandas data frame for analysis.
- Add proxies – Rotate proxies using a service like ScrapeStorm to avoid blocks.
- Slow it down – Ensure proper delays between page loads to mimic human behavior.
This methodology allows you to scrape various LinkedIn data points like profiles, job listings, company pages, groups, and more. Just be wary of their scraping limits.
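The proxy-rotation step in the list above can be sketched as a simple round-robin pool. This is only an illustration, not the method any particular proxy service uses: the proxy addresses are placeholders from a documentation-only IP range, and `next_proxy` and `chrome_proxy_argument` are hypothetical helper names.

```python
from itertools import cycle

# Placeholder proxy endpoints (192.0.2.0/24 is reserved for documentation).
# Replace these with the addresses your proxy provider gives you.
PROXIES = [
    "http://192.0.2.10:8080",
    "http://192.0.2.11:8080",
    "http://192.0.2.12:8080",
]

_proxy_pool = cycle(PROXIES)


def next_proxy():
    """Return the next proxy in round-robin order."""
    return next(_proxy_pool)


def chrome_proxy_argument(proxy_url):
    """Build the Chrome command-line flag that routes traffic through a proxy."""
    return f"--proxy-server={proxy_url}"
```

With Selenium, the returned flag would typically be passed to `add_argument` on a `webdriver.ChromeOptions()` object before creating each new driver.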
Overcoming LinkedIn Bot Detection and Blocks
Since scraping violates LinkedIn’s ToS, they use various bot detection and blocking systems to stop scrapers including:
- reCAPTCHA tests after viewing too many profiles or jobs
- IP blocks if they detect suspicious activity from your IP address
- CAPTCHA and phone verification on login if you use an automation tool like Selenium
Here are some techniques to help avoid blocks:
- Use proxies – Rotate different residential proxy IP addresses and mimic human behavior.
- Bypass CAPTCHAs – Services like 2Captcha can solve CAPTCHAs automatically.
- Slow it down – Put proper delays between page views and vary wait times.
- Don’t login – Scrape public info that doesn’t require logging in like job postings.
- Use services – Tools like Octoparse have methods in place to prevent blocks.
The key is to scrape LinkedIn sparingly like a real human would. Rapid scraping and clicking will get your account blocked quickly.
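The "slow it down" advice can be sketched as a small helper that waits a random amount of time between page views. The 2 to 6 second default range is an assumption chosen for illustration, not a value LinkedIn publishes, and `human_delay` is a hypothetical name.

```python
import random
import time


def human_delay(min_seconds=2.0, max_seconds=6.0, sleep=time.sleep):
    """Sleep for a random duration in [min_seconds, max_seconds] and return it.

    The `sleep` parameter is injectable so the helper can be tested
    without actually waiting.
    """
    if min_seconds < 0 or max_seconds < min_seconds:
        raise ValueError("need 0 <= min_seconds <= max_seconds")
    duration = random.uniform(min_seconds, max_seconds)
    sleep(duration)
    return duration
```

Calling `human_delay()` after each page load keeps the interval between requests from being perfectly regular, which a fixed `time.sleep(1)` would not.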
Step-by-Step Guide to Scraping LinkedIn Profiles
Now that we’ve covered the basics, let’s walk through a detailed guide to scraping profile info from LinkedIn.
1. Install Python Libraries
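We’ll use BeautifulSoup, Selenium, and pandas, so install them:
pip install beautifulsoup4 selenium pandas
Selenium 4.6 and later can download a matching browser driver automatically through Selenium Manager. On older versions, install the driver for your browser (such as ChromeDriver) yourself.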
2. Create a LinkedIn Account
Sign up for a LinkedIn account dedicated to scraping. Avoid using your personal account.
3. Search LinkedIn for Profiles
Go to LinkedIn and perform a search for people. For example, search for “Site Reliability Engineer” and set the location to “San Francisco Bay Area”.
4. Scroll Through Results
Scroll down slowly on the results page so all the profiles load. LinkedIn loads them dynamically via AJAX.
5. Copy URL
Copy the search URL from your browser’s address bar. It should look like:
https://www.linkedin.com/search/results/people/?keywords=Site%20Reliability%20Engineer&location=San%20Francisco%20Bay%20Area
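Rather than copying the URL by hand each time, you could build it in code. The sketch below uses only the standard library and reproduces the example URL; the `keywords` and `location` parameter names are taken from that URL, and whether LinkedIn keeps honoring them is an assumption. `people_search_url` is a hypothetical helper name.

```python
from urllib.parse import quote, urlencode

SEARCH_BASE = "https://www.linkedin.com/search/results/people/"


def people_search_url(keywords, location):
    """Build a people-search URL, encoding spaces as %20."""
    query = urlencode(
        {"keywords": keywords, "location": location},
        quote_via=quote,  # quote() emits %20 for spaces; the default emits +
    )
    return f"{SEARCH_BASE}?{query}"


url = people_search_url("Site Reliability Engineer", "San Francisco Bay Area")
print(url)
```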
6. Launch Selenium
Launch a Chrome browser via Selenium and input the URL:
from selenium import webdriver
driver = webdriver.Chrome()
driver.get("https://www.linkedin.com/search/results/people/?keywords=Site%20Reliability%20Engineer&location=San%20Francisco%20Bay%20Area")
7. Scroll with Selenium
Scroll down the page slowly to dynamically load all the profiles:
import time

# Scroll down the page to trigger the AJAX load
scroll_pause_time = 1
# Get the current scroll height
last_height = driver.execute_script("return document.body.scrollHeight")
while True:
    # Scroll down to the bottom
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
    # Wait for the page to load
    time.sleep(scroll_pause_time)
    # Compare the new scroll height with the last one
    new_height = driver.execute_script("return document.body.scrollHeight")
    if new_height == last_height:
        break
    last_height = new_height
This will scroll the page and trigger the loading of additional profiles.
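One weakness of an open-ended `while True` loop is that it never exits on a page that keeps growing. A possible variant, sketched below, caps the number of scrolls. The `max_rounds` default of 30 is an arbitrary assumption, and `scroll_to_bottom` is a hypothetical helper that accepts any object exposing Selenium's `execute_script` method.

```python
import time


def scroll_to_bottom(driver, pause=1.0, max_rounds=30, sleep=time.sleep):
    """Scroll until the page height stops growing or max_rounds is reached.

    `driver` is any object with Selenium's execute_script method.
    Returns the number of scrolls performed.
    """
    last_height = driver.execute_script("return document.body.scrollHeight")
    for rounds in range(1, max_rounds + 1):
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        sleep(pause)  # give the page time to load more results
        new_height = driver.execute_script("return document.body.scrollHeight")
        if new_height == last_height:
            return rounds
        last_height = new_height
    return max_rounds
```

Passing `sleep` as a parameter makes it easy to swap in a randomized delay or to test the function without waiting.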
8. Parse Page Source
Once you’ve scrolled to the bottom, you can parse the page source to extract info on each profile:
from bs4 import BeautifulSoup
soup = BeautifulSoup(driver.page_source, 'html.parser')
profiles = soup.find_all('div', class_='entity-result__item')
9. Extract Profile Data
Now extract key data points from each profile element like name, job title, and profile URL:
profile_urls = []
for profile in profiles:
    name = profile.find('h3', class_='entity-result__title').text.strip()
    job_title = profile.find('h4').text.strip()
    url = "https://www.linkedin.com" + profile.find('a')['href']
    profile_urls.append(url)  # keep the link for step 10
    print(name, job_title, url)
This will print out the name, job title, and profile URL for each person.
10. Visit Each Profile
To get additional info like experience and education, loop through and visit each profile:
for url in profile_urls:  # list of profile URLs gathered in step 9
    driver.get(url)
    # Extract info from page
    ...
Just add delays so you don’t overload LinkedIn with requests. And that’s the basic process for scraping LinkedIn profiles! You can collect links to thousands of profiles based on your search criteria.
Scrape LinkedIn Public Profiles
LinkedIn member profiles are a goldmine of information. Here is how to extract profile data from LinkedIn:
- Get target profile URL such as https://www.linkedin.com/in/alec-friedman
- Use Selenium to navigate to profile URL
- Extract elements using selectors. The find_element_by_* helpers were removed in Selenium 4, so use find_element with By (from selenium.webdriver.common.by import By):
- Name:
driver.find_element(By.CLASS_NAME, "text-heading-xlarge").text
- Job Title & Company:
driver.find_element(By.CLASS_NAME, "text-body-medium").text
- Location:
driver.find_element(By.CLASS_NAME, "text-body-small").text
- About text:
driver.find_element(By.CLASS_NAME, "break-words").text
- Skills:
driver.find_elements(By.CLASS_NAME, "pvs-entity__value-item")
- Education etc.
- Store extracted profile data into structured format like CSV
Repeat this process across any number of profiles by looping through a list of profile URLs.
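The "store into CSV" step might look like the sketch below, which uses the standard library's `csv` module. The column names in `FIELDS` and the sample rows are made up for illustration; adjust them to whatever fields you actually extract.

```python
import csv
import io

FIELDS = ["name", "headline", "location", "url"]  # assumed column names


def profiles_to_csv(profiles, out):
    """Write a list of profile dicts to a file-like object as CSV.

    Missing keys become empty cells; keys outside FIELDS are ignored.
    """
    writer = csv.DictWriter(out, fieldnames=FIELDS, restval="", extrasaction="ignore")
    writer.writeheader()
    writer.writerows(profiles)


# Made-up sample rows for illustration only.
sample = [
    {"name": "Jane Doe", "headline": "SRE at Example Corp",
     "url": "https://www.linkedin.com/in/example-1"},
    {"name": "John Roe", "location": "San Francisco Bay Area",
     "url": "https://www.linkedin.com/in/example-2"},
]
buffer = io.StringIO()
profiles_to_csv(sample, buffer)
print(buffer.getvalue())
```

To write to disk instead of memory, pass a file opened with `open("profiles.csv", "w", newline="", encoding="utf-8")`.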
Scraping LinkedIn Job Postings
In addition to profiles, you can scrape data from LinkedIn job postings. Here are the key steps:
- Search for jobs on LinkedIn based on keywords and filters.
- Scroll through the results page to load all job listings.
- Parse the page source to find all li tags with class result-card.
- Extract data from each job element like title, company, date posted, location, and link.
- Visit each job link to get the full description and other details.
- Save job info to a CSV or database.
As long as you avoid spamming their servers, you can build up a nice dataset of job listings in your desired categories and locations.
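The parsing step (finding every li tag with class result-card) can be sketched without extra dependencies. Note the swap: this example uses the standard library's `html.parser` instead of BeautifulSoup so that it runs on a bare Python install. The `result-card` class name comes from the steps above and may not match LinkedIn's current markup, and `ResultCardParser` is a hypothetical name.

```python
from html.parser import HTMLParser


class ResultCardParser(HTMLParser):
    """Collect the text of every <li> whose class list contains 'result-card'."""

    def __init__(self):
        super().__init__()
        self.cards = []    # finished cards, one string each
        self._depth = 0    # greater than 0 while inside a matching <li>
        self._chunks = []  # text fragments of the card being read

    def handle_starttag(self, tag, attrs):
        if tag != "li":
            return
        classes = (dict(attrs).get("class") or "").split()
        if self._depth:
            self._depth += 1  # nested <li> inside a card
        elif "result-card" in classes:
            self._depth = 1
            self._chunks = []

    def handle_endtag(self, tag):
        if tag == "li" and self._depth:
            self._depth -= 1
            if self._depth == 0:
                self.cards.append(" ".join(self._chunks))

    def handle_data(self, data):
        if self._depth and data.strip():
            self._chunks.append(data.strip())
```

Feed it the page source with `parser = ResultCardParser(); parser.feed(html)` and read the results from `parser.cards`. With BeautifulSoup installed, `soup.find_all('li', class_='result-card')` covers the same step in one line.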
Scraping Company Pages
Target any company’s LinkedIn page to extract useful business data:
- Go to company URL e.g. https://www.linkedin.com/company/apple
- Extract key elements. The find_element_by_* helpers were removed in Selenium 4, so use find_element with By (from selenium.webdriver.common.by import By):
- Company name:
driver.find_element(By.CLASS_NAME, "org-top-card-summary__title").text
- Industry:
driver.find_element(By.CLASS_NAME, "org-top-card-summary__industry").text
- Employee count:
driver.find_element(By.CLASS_NAME, "org-about-company-module__employees").text
- Description:
driver.find_element(By.CLASS_NAME, "org-about-us-organization-description__text").text
- Website URL:
driver.find_element(By.CLASS_NAME, "org-about-us-company-module__website").text
- Company statistics
- Save company data to CSV/database
Repeat for any number of companies by iterating through a company URL list.
Scrape LinkedIn Groups
LinkedIn groups contain valuable discussions. Scrape groups with this approach:
- Go to group URL e.g. https://www.linkedin.com/groups/82221
- Scroll to load more discussions and posts
- Extract post title, description, author etc. by class name
- Navigate to discussion pages and extract comments
- Save group messages and comments into CSV/Excel/DB
Customize element selectors for different types of groups.
Scrape Google for LinkedIn Profiles
Find LinkedIn profiles on Google using these steps:
- Construct search query e.g.
site:linkedin.com AND "silicon valley" AND "python developer"
- Google search URL:
https://www.google.com/search?q=site%3Alinkedin.com+AND+%22silicon+valley%22+AND+%22python+developer%22
- Extract profile name and job title from Google search results
- Extract LinkedIn profile URL from the search result link
- Scrape full profile data using LinkedIn URL
Leverage boolean search operators for focused profile searches on Google indexed LinkedIn pages.
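The query-building and link-filtering steps can be sketched with the standard library. Both helper names are hypothetical, and the rule that profile pages live under `/in/` is inferred from the example profile URL in this guide rather than from any LinkedIn specification.

```python
from urllib.parse import urlencode, urlparse


def google_search_url(*terms, site="linkedin.com"):
    """Build a Google search URL restricted to one site, with each term quoted."""
    parts = [f"site:{site}"] + [f'"{term}"' for term in terms]
    return "https://www.google.com/search?" + urlencode({"q": " AND ".join(parts)})


def is_linkedin_profile(url):
    """Return True for links that look like linkedin.com/in/<handle> pages."""
    parsed = urlparse(url)
    host = parsed.netloc.lower()
    on_linkedin = host == "linkedin.com" or host.endswith(".linkedin.com")
    return on_linkedin and parsed.path.startswith("/in/")


print(google_search_url("silicon valley", "python developer"))
```

`urlencode` percent-encodes the quotes and the colon, so the resulting URL is safe to pass to `driver.get` or any HTTP client.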
Current LinkedIn Scraping Limits
LinkedIn employs various limits and bot detection systems to block scrapers and spammers. Here are the approximate limits as of October 2023:
| LinkedIn Data | Approx. Limit |
|---|---|
| Profile Views per Hour | 300 |
| Profile Views per Day | 1000 |
| Job Search Views per Hour | Around 300 |
| Job Search Views per Day | Around 1000 |
| Messages Sent per Day | 300 |
| Connection Requests per Day | 300 |
| Connection Requests per Hour | Varies, under 100 |
As you can see, it’s easy to hit limits when scraping things like profiles and job listings. So you need to employ delays, proxies, and randomness in your scripts. Always scrape LinkedIn gently to maintain access.
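One way to stay under hourly and daily caps is to track request timestamps and refuse new requests once a cap is reached. The sketch below defaults to the approximate profile-view figures from the table (300 per hour, 1000 per day). Those numbers are this guide's estimates, not official values, and `RequestBudget` is a hypothetical class name.

```python
import time
from collections import deque


class RequestBudget:
    """Track request timestamps and enforce hourly and daily caps."""

    def __init__(self, per_hour=300, per_day=1000, clock=time.monotonic):
        self.per_hour = per_hour
        self.per_day = per_day
        self.clock = clock
        self._stamps = deque()  # timestamps of requests in the last 24 hours

    def try_acquire(self):
        """Record a request and return True, or return False if a cap is hit."""
        now = self.clock()
        # Drop timestamps older than one day.
        while self._stamps and now - self._stamps[0] >= 86400:
            self._stamps.popleft()
        last_hour = sum(1 for t in self._stamps if now - t < 3600)
        if len(self._stamps) >= self.per_day or last_hour >= self.per_hour:
            return False
        self._stamps.append(now)
        return True
```

Call `budget.try_acquire()` before each page load and pause the scraper whenever it returns False. The injectable `clock` exists only so the class can be tested with simulated time.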
Is LinkedIn Scraping Worth the Risk?
Although possible, scraping LinkedIn does carry significant risks including:
- Account termination and bans – They can blacklist your IP, cookies, and account.
- Legal action – Technically violates their ToS so lawsuits are possible.
- Wasted time – Even if you avoid blocks, limits make scraping tedious.
- Poor data quality – Profiles and job posts often have sparse info.
- Ethical issues – Should you scrape private business contact data?
For most purposes, the risks and downsides outweigh the potential benefits. Only scrape LinkedIn if you have explicit permission and really need the type of data it provides. Proceed with caution!
Conclusion
Scraping LinkedIn data is doable but requires careful precautions to avoid blocks. Rotating proxies, proper delays, using services, and relying more on public data can help you scrape profiles, jobs, companies, and groups without excess friction.
However, the legal and ethical standing of scraping LinkedIn remains questionable at best. Tread very carefully and consider if you can get the data you need through other means before resorting to scraping their private platform.
Key Takeaways
- Check the legal and ethical standing before web scraping any site.
- LinkedIn actively blocks scrapers, so expect limits and use proxies and delays.
- Scrape only the minimum data needed, and focus on public info when possible.
- Use tools like Selenium, BeautifulSoup, and Pandas to extract LinkedIn data in Python scripts.
- Adding delays, using proxies, and spreading out requests can help avoid blocks.
- Scraping large amounts of private LinkedIn data for commercial use is very risky.
- Think carefully about whether LinkedIn scraping is absolutely necessary vs. finding alternate data sources.