
How to Scrape Media Kits for SEO

Using Scrapebox to find Media Kits for Editorial Link-Building projects.

Changelog:

  • 01/18/2025 - Original Post Published.

  • 10/09/2025 - Updated post to premium and added a special video below the paywall, “Link Insertion Research Walkthrough,” which shows more of my process for finding good links with real traffic.

What's up, guys! Ivan here, and in this video I’ll show you how I set up a scrape for Media Kits, which I use for link building. I recently misplaced my Media Kit URL file and decided to redo the scrape. Let’s dive into the process so you can use it for building backlinks at your leisure.


Tools and Setup

Scraping Server and Proxies

  • Scraping Server:
    I use a dedicated local scraping server, with Scrapebox installed.

  • Proxy Service:
    For scraping URLs from Yahoo, I currently use Lightning Proxies.
    (2025 update: I now use datacenter proxies instead of IPv6, because IPv6 proxies no longer work when scraping Yahoo.)
    Pricing starts at $10/day.
    If you plan to scan or pull data from the resulting URLs, you’ll need a cheap datacenter proxy solution like WebShare.
    Lightning Proxies’ datacenter products would probably work as well, but as of this post I haven’t tested them.

Why Yahoo Over Google or Bing?

  • Yahoo’s Advantages:
    Yahoo offers better topical URL results compared to Bing and Google.

  • Proxies for Yahoo:
    Unlimited bandwidth proxies perform well for short scraping sessions.


Step-by-Step Guide

Purchasing and Configuring Proxies

  1. Choose a Plan: I opted for Lightning Proxies’ $10/day plan.

  2. Payment Tip: In the Stripe checkout, click the alternate option for PayPal.
    PayPal often resolves card-rejection issues.

  3. Whitelist Your IP:
    Add your IP to the proxy provider's safe list for secure access.

Setting Up ScrapeBox

  1. Adjust Settings:

    • Set connections to 7 threads (about a quarter of my bandwidth capacity).

    • Timeout settings should match your bandwidth.

  2. Import Permutations:

    • Use city-based permutations like the top 5,000 U.S. cities.

    • Combine permutations with “Media Kit” keywords.
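The keyword merge above can be sketched in Python. The city list and footprint variants below are illustrative stand-ins, not the full top-5,000-cities file or my exact footprints:

```python
from itertools import product

# Stand-ins: the real run merges the top-5,000 U.S. city list.
cities = ["Austin", "Denver", "Portland"]
footprints = ['"media kit"', '"advertise with us"']  # assumed footprint variants

# One search term per footprint/city pair, e.g. '"media kit" Austin',
# mirroring ScrapeBox's keyword-merge behavior.
keywords = [f"{fp} {city}" for fp, city in product(footprints, cities)]
```

With the real city file loaded from disk, the same two-line merge produces the full permutation set to paste into the harvester.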

Scraping Process

  1. Search Configuration:

    • Target Yahoo with a depth of 50 results per search term.

    • Automate the scraping process using the ScrapeBox Automator plugin.

  2. URL Filtering:

    • Remove duplicates and irrelevant entries.

    • Sort URLs to focus on viable domains.
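The dedupe step is roughly a trim-to-root plus duplicate-domain removal; a minimal sketch with illustrative URLs:

```python
# Sketch: collapse a scraped URL list to unique hostnames, similar to
# ScrapeBox's "Trim to Root" + "Remove Duplicate Domains" operations.
from urllib.parse import urlparse

scraped = [
    "https://example.com/media-kit",
    "https://example.com/about",          # same domain, dropped as duplicate
    "http://blog.example.org/advertise",
]

seen, domains = set(), []
for url in scraped:
    host = urlparse(url).netloc.lower()
    if host and host not in seen:
        seen.add(host)
        domains.append(host)
```

Sorting `domains` afterwards makes it easy to eyeball clusters of related sites before the deeper checks.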


Analyzing Results

URL Scrubbing and Exporting

  • Use ScrapeBox's tools to scrub:

    • URLs containing undesired keywords.

    • Hostnames that are IP addresses.
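Both scrub passes can be expressed as one filter; the blocklist below is an illustrative assumption, not my full scrub list:

```python
# Sketch: drop URLs containing undesired keywords, and hostnames that
# are bare IP addresses. Blocklist terms are assumptions.
import re
from urllib.parse import urlparse

urls = [
    "https://example.com/media-kit",
    "https://pinterest.com/pin/123",      # blocked keyword
    "http://93.184.216.34/index.html",    # IP-address hostname
]
blocked = ("pinterest", "facebook", "youtube")
ip_host = re.compile(r"^\d{1,3}(\.\d{1,3}){3}(:\d+)?$")

def keep(url):
    host = urlparse(url).netloc
    if ip_host.match(host):
        return False
    return not any(word in url.lower() for word in blocked)

clean = [u for u in urls if keep(u)]
```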

Page Scanner

  • Custom Footprint:
    Look for “Media Kit” in the page content, often located in footers.

  • Export filtered URLs for further analysis.
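The Page Scanner footprint amounts to searching fetched HTML for the phrase; a stand-in sketch (the footprint list is an assumption, and fetching is omitted):

```python
# Sketch: a stand-in for ScrapeBox's Page Scanner custom footprint,
# checking page HTML for "media kit" wording (often in the footer).
FOOTPRINTS = ("media kit", "advertise with us", "sponsored post")  # assumed

def has_media_kit(html: str) -> bool:
    text = html.lower()
    return any(fp in text for fp in FOOTPRINTS)

sample = '<footer><a href="/media-kit">Media Kit</a> | Contact</footer>'
```

In practice you would fetch each surviving URL through your datacenter proxies and keep only the hosts where the check returns True.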


Post-Scraping Workflow

Checking Domain Authority

  1. Use a Tool like Open PageRank: Identify high-value domains.

  2. Analyze Metrics:

    • Keyword rankings.

    • Referring domains.
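Open PageRank exposes a bulk REST API; the sketch below only builds the request, and the endpoint, `API-OPR` header, and `page_rank_decimal` field are from my reading of the service's docs, so verify them before relying on this:

```python
# Sketch: building a bulk Open PageRank lookup. Endpoint, header name,
# and response field are assumptions to verify against current docs.
from urllib.parse import urlencode

API_KEY = "YOUR_OPR_KEY"  # placeholder
BASE = "https://openpagerank.com/api/v1.0/getPageRank"

def build_request(domains):
    query = urlencode([("domains[]", d) for d in domains])
    return f"{BASE}?{query}", {"API-OPR": API_KEY}

url, headers = build_request(["example.com", "example.org"])
# The JSON response should carry a score per domain (e.g. a
# "page_rank_decimal" value); keep the high scorers for manual review.
```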

Example: Media Kit Review

  • Look for Media Kits that mention:

    • Sponsored content.

    • Editorial coverage details.

    • Pricing for link insertions.

Outreach

  1. Draft an Email Inquiry:

    • Subject: Editorial Content Request.

    • Key Questions:

      • Is it tagged as sponsored?

      • Is it a do-follow link?

      • Is it permanent?

  2. Pricing Benchmark:

    • Good links often range from $100 to $350, depending on metrics.

    • It’s also possible to find linking opportunities for $5 to $25, usually on small blogs where the owners don’t know how to value their assets.
      These are phenomenal linking opportunities for inner pages.
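The inquiry email described above can be drafted with Python's stdlib; the addresses are placeholders and the wording is a sketch, not my exact template:

```python
# Sketch: composing the editorial inquiry email. Addresses are placeholders.
from email.message import EmailMessage

msg = EmailMessage()
msg["Subject"] = "Editorial Content Request"
msg["From"] = "you@example.com"
msg["To"] = "editor@example.com"
msg.set_content(
    "Hi,\n\n"
    "I found your media kit and I'm interested in a link insertion.\n"
    "A few quick questions:\n"
    "  1. Is the placement tagged as sponsored?\n"
    "  2. Is the link do-follow?\n"
    "  3. Is it permanent?\n\n"
    "Thanks!"
)
```

From here the message can be sent via `smtplib` or pasted into whatever outreach tool you already use.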


Final Thoughts

Scraping for Media Kits is a powerful way to identify link-building opportunities. Use proxies and automation tools to streamline the process. Don’t forget to validate domain authority and contact site owners directly for collaboration.

Resources


Members-Only Section
for Premium Subscribers

This section contains page-specific public case studies, where I go into more depth on locating viable pages for legitimate projects.
It also contains updates to the original post.
