
How to Setup a ScrapeBox Automator Script for B2B Lists

Learn how to create a self-contained ScrapeBox Automator process for efficient web scraping. Configure the Automator and streamline your workflow with this step-by-step guide.

This article will guide you through setting up a ScrapeBox Automator process for B2B Email List scraping. This method allows you to create a self-contained, portable system for automating your B2B Email List efforts using ScrapeBox. This article was automatically generated from the longform transcript of the video on this page, so it may contain errors.
Be sure to reference the video above for the most accurate steps. The video was recorded in 2021, but I STILL use this exact Automator script today.

If you want to bypass all this trouble and have me do some locational niche scraping for you with my infrastructure, contact me here:

Contact Ivan




Why Automate List Scraping with ScrapeBox?

ScrapeBox is a powerful tool for harvesting data from the web, and its Automator plugin allows you to chain together a series of actions to create automated workflows. By automating your list scraping process, you can:

  • Save Time: Automate repetitive tasks like harvesting URLs, checking for live links, and guessing contact pages.

  • Increase Efficiency: Run multiple scrapes simultaneously on different servers.

  • Scale Your Efforts: Easily expand your B2B List pool by adding more keywords and locations.

  • Maintain Organization: Keep all your files and processes organized within a single folder.

Prerequisites

Before you begin, make sure you have the following:

  • ScrapeBox: The core software for web scraping.

  • Automator Plugin: A premium ScrapeBox plugin for creating automated workflows.

  • Proxies: Essential for avoiding IP blocks during large scrapes.
    Using my recommended proxy links helps support new content.
    My recommended services: IPv6 residential and datacenter proxies

  • Keyword and Location Lists: Lists of keywords and locations relevant to your target audience.

  • A Batch File for Space Removal: (Optional but recommended) To improve the accuracy of contact page guesses; a minimal sketch follows below.
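
The transcript never shows the batch file itself, so here is a minimal sketch of what it might look like (an assumed implementation, not my original script). It lives in the Permutations folder, reads Contact Keyword Export.txt, strips the space that ScrapeBox's keyword merge inserts between each root URL and contact-page slug, and writes Updated-Contact Keyword Export.txt (both file names appear in the folder layout later in this guide):

    @echo off
    rem RemoveSpaces.bat - assumed implementation, not the author's original.
    rem %~dp0 is the folder this .bat lives in, so the paths stay portable.
    setlocal EnableDelayedExpansion
    set "IN=%~dp0Contact Keyword Export.txt"
    set "OUT=%~dp0Updated-Contact Keyword Export.txt"
    if exist "%OUT%" del "%OUT%"
    for /f "usebackq delims=" %%L in ("%IN%") do (
        set "line=%%L"
        rem Strip every space so each merged guess becomes a well-formed URL.
        rem (Caveat: URLs containing "!" would need extra care under delayed expansion.)
        set "line=!line: =!"
        >>"%OUT%" echo(!line!
    )
    endlocal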

Steps to Set Up the ScrapeBox Automator Process

The process involves setting up a folder with the necessary files and then configuring the Automator to run the steps in sequence. Here's a breakdown:

1. Create a Dedicated Folder

  • Create a new folder on your desktop or in a preferred location.

  • Name it descriptively (e.g., "Ivan's ScrapeBox B2B Lead Automator").

  • This folder will house all the necessary files and subfolders.

2. Organize Subfolders and Files

Inside your main folder, create the following subfolders:

  • Resources:

    • Top 5000 US Cities.txt (or your preferred location list)

    • Niches.txt (your list of target industries or niches)

    • Phone Number Rejects.txt (patterns to filter out unwanted numbers)

    • Permutations Term Blacklist.txt (terms to exclude from URLs, e.g., .gov, .edu, wiki)

    • E-commerce Footprints.txt (if targeting e-commerce)

    • 1300 Local Business Keywords.txt (if targeting Local Businesses)

    • Canadian Cities.txt (if targeting Canadian Businesses)

    • Netherlands Cities.txt (if targeting Businesses in the Netherlands)

    • Australian Cities.txt (if targeting Australian Businesses)

  • Proxies:

    • URL Proxies.txt (residential proxies for URL harvesting)

    • Data Center Proxies.txt (data center proxies for other tasks)

    • Mobile Proxies.txt (if using mobile proxies)

  • Permutations:

    • Niche Terms.txt (a copy of your niches list)

    • Contact Keyword Export.txt (permutations of contact page guesses)

    • Keyword Export.txt (permutations of keywords and locations)

    • Contact Pages.txt (common contact page slugs, e.g., /contact, /about-us)

    • Updated-Contact Keyword Export.txt (output file after removing spaces)

    • RemoveSpaces.bat (your batch file for removing spaces)

  • Tier 1:

    • Tier 1 URL.txt (initial harvested URLs)

    • Tier 1 URL Alive.txt (results of the alive check)

    • Tier 1 URL Root.txt (trimmed to root domains)

    • Tier 1 URL Contact Pages Alive.txt (results of contact page guesses)

    • Tier 1 URL and Root and Contact Pages Alive.txt (final combined URL list)
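
If you'd rather not create each folder and file by hand, a short batch script can scaffold the layout above. This is a convenience sketch rather than part of the original video, and the file list is abbreviated; extend it to every file you plan to use:

    @echo off
    rem Scaffold the main folder (name from the example in step 1; adjust to taste).
    set "BASE=%USERPROFILE%\Desktop\Ivan's ScrapeBox B2B Lead Automator"
    for %%D in (Resources Proxies Permutations "Tier 1") do mkdir "%BASE%\%%~D"
    rem Create empty placeholders so every Automator path resolves on first run.
    type nul > "%BASE%\Resources\Top 5000 US Cities.txt"
    type nul > "%BASE%\Resources\Permutations Term Blacklist.txt"
    type nul > "%BASE%\Proxies\URL Proxies.txt"
    type nul > "%BASE%\Proxies\Data Center Proxies.txt"
    type nul > "%BASE%\Permutations\Niche Terms.txt"
    type nul > "%BASE%\Permutations\Contact Pages.txt"
    echo Done. Fill in the Resources and Permutations files before running.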

3. Create Shortcuts

Create shortcuts on your desktop to the following files for easy access:

  • Permutations/Niche Terms.txt

  • Tier 1/Tier 1 URL and Root and Contact Pages Alive.txt

  • Resources/Permutations Term Blacklist.txt

  • Resources/Temp.txt (for custom location lists)
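
The quickest way is to right-click each file and choose Send to > Desktop (create shortcut). If you prefer to script it, symbolic links via mklink are one option; this is my suggestion rather than something from the video, and mklink needs an elevated Command Prompt or Windows Developer Mode:

    @echo off
    rem Desktop links to the frequently edited files (run from an elevated prompt).
    set "BASE=%USERPROFILE%\Desktop\Ivan's ScrapeBox B2B Lead Automator"
    mklink "%USERPROFILE%\Desktop\Niche Terms.txt" "%BASE%\Permutations\Niche Terms.txt"
    mklink "%USERPROFILE%\Desktop\Final URL List.txt" "%BASE%\Tier 1\Tier 1 URL and Root and Contact Pages Alive.txt"
    mklink "%USERPROFILE%\Desktop\Term Blacklist.txt" "%BASE%\Resources\Permutations Term Blacklist.txt"
    mklink "%USERPROFILE%\Desktop\Temp.txt" "%BASE%\Resources\Temp.txt"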

4. Configure the Automator File

  • Open ScrapeBox and go to the Automator plugin.

  • Load the provided Automator file (or create a new one).

  • Use the %desktop% placeholder to dynamically reference your folder path.
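
The point of %desktop% is portability: ScrapeBox expands the placeholder at run time, so the same Automator file keeps working when you copy the folder to another machine, instead of hard-coding one user's desktop path. As a rough illustration (the placeholder belongs to ScrapeBox, not to Windows):

    @echo off
    rem The Automator path   %desktop%/YourFolderName/Proxies/URL Proxies.txt
    rem expands for the current user to something like the path below.
    echo %USERPROFILE%\Desktop\YourFolderName\Proxies\URL Proxies.txt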

Here's a breakdown of the Automator steps and how to configure them:

  1. Load Proxies:

    • Select "Load from file" and navigate to Proxies/URL Proxies.txt.

    • Replace the desktop path with %desktop%/YourFolderName/Proxies/URL Proxies.txt.

  2. Import Keyword List:

    • Select "Import from file" and navigate to Resources/Top 5000 US Cities.txt (or your chosen city list).

    • Replace the path with %desktop%/YourFolderName/Resources/Top 5000 US Cities.txt.

  3. Merge Keywords:

    • Select "Merge keywords" and navigate to Permutations/Niche Terms.txt.

    • Replace the path with %desktop%/YourFolderName/Permutations/Niche Terms.txt.

  4. Export Keyword List:

    • Select "Export to file" and navigate to Permutations/Keyword Export.txt.

    • Replace the path with %desktop%/YourFolderName/Permutations/Keyword Export.txt.

    • Change the number of results (e.g., 30) to control the depth of the scrape.

  5. Harvest URLs:

    • Select "Start harvester" and choose your preferred search engines (e.g., Yahoo).

    • Load the keyword list from Permutations/Keyword Export.txt.

    • Replace the path with %desktop%/YourFolderName/Permutations/Keyword Export.txt.

  6. Remove Duplicate URLs:

    • Select "Remove/Filter" -> "Remove duplicate URLs".

  7. Remove URLs with Extensions:

    • Select "Remove/Filter" -> "Remove URLs containing entries from list".

  8. Export URLs:

    • Select "Export URL list" -> "Export as text file" and save to Tier 1/Tier 1 URL.txt.

    • Replace the path with %desktop%/YourFolderName/Tier 1/Tier 1 URL.txt.

  9. Load Data Center Proxies:

    • Select "Load from file" and navigate to Proxies/Data Center Proxies.txt.

    • Replace the path with %desktop%/YourFolderName/Proxies/Data Center Proxies.txt.

  10. Check Alive (Tier 1 URLs):

    • Select "Check alive" and import the URLs from Tier 1/Tier 1 URL.txt.

    • Replace the path with %desktop%/YourFolderName/Tier 1/Tier 1 URL.txt.

  11. Export Alive URLs:

    • Select "Export URL list" -> "Export alive URLs" and save to Tier 1/Tier 1 URL Alive.txt.

    • Replace the path with %desktop%/YourFolderName/Tier 1/Tier 1 URL Alive.txt.

  12. Trim to Root:

    • Select "Trim to root".

  13. Remove Duplicate URLs (Root):

    • Select "Remove/Filter" -> "Remove duplicate URLs".

  14. Export Root URLs:

    • Select "Export URL list" -> "Export as text file" and save to Tier 1/Tier 1 URL Root.txt.

    • Replace the path with %desktop%/YourFolderName/Tier 1/Tier 1 URL Root.txt.

  15. Merge Contact Page Guesses:

    • Select "Merge keywords" and load from Permutations/Contact Pages.txt.

    • Replace the path with %desktop%/YourFolderName/Permutations/Contact Pages.txt.

    • Select "Merge with" and load from Tier 1/Tier 1 URL Root.txt.

    • Replace the path with %desktop%/YourFolderName/Tier 1/Tier 1 URL Root.txt.

  16. Export Contact Page Guesses:

    • Select "Export keyword list" -> "Export to file" and save to Permutations/Contact Keyword Export.txt.

    • Replace the path with %desktop%/YourFolderName/Permutations/Contact Keyword Export.txt.

  17. Execute External Program (Remove Spaces):

    • Select the Automator command for running an external program and point it at Permutations/RemoveSpaces.bat.

    • Replace the path with %desktop%/YourFolderName/Permutations/RemoveSpaces.bat.

    • This strips the spaces left by the merge in step 15 and writes Permutations/Updated-Contact Keyword Export.txt for the next step (a sketch of the batch file appears under Prerequisites).

  18. Check Alive (Contact Pages):

    • Select "Check alive" and load from Permutations/Updated-Contact Keyword Export.txt.

    • Replace the path with %desktop%/YourFolderName/Permutations/Updated-Contact Keyword Export.txt.

  19. Export Alive Contact Pages:

    • Select "Export URL list" -> "Export alive URLs" and save to Tier 1/Tier 1 URL Contact Pages Alive.txt.

    • Replace the path with %desktop%/YourFolderName/Tier 1/Tier 1 URL Contact Pages Alive.txt.

  20. Combine URLs:

    • Select "Clear harvester grid".

    • Select "Import URL list" -> "Import and add URLs into harvester" and load from Tier 1/Tier 1 URL Alive.txt.

    • Replace the path with %desktop%/YourFolderName/Tier 1/Tier 1 URL Alive.txt.

    • Repeat "Import and add URLs into harvester" for Tier 1/Tier 1 URL Root.txt and Tier 1/Tier 1 URL Contact Pages Alive.txt, replacing the paths accordingly.

  21. Remove Duplicate URLs (Combined):

    • Select "Remove/Filter" -> "Remove duplicate URLs".

  22. Export Final URL List:

    • Select "Export URL list" -> "Export as text file" and save to Tier 1/Tier 1 URL and Root and Contact Pages Alive.txt.

    • Replace the path with %desktop%/YourFolderName/Tier 1/Tier 1 URL and Root and Contact Pages Alive.txt.

  23. Save the Automator File:

    • Save the configured Automator file within your main folder.
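
To make the two "Merge keywords" steps (3 and 15) concrete, here is a rough batch stand-in for what the merge produces: every line of one list paired with every line of the other, joined by a single space. That joining space is exactly why step 15's output needs the RemoveSpaces pass in step 17 before the guesses can be alive-checked. Paths assume the script runs from the main folder:

    @echo off
    rem Rough stand-in for ScrapeBox's "Merge keywords": cross-join two lists.
    rem Shown with step 3's inputs; step 15 works the same way with
    rem Contact Pages.txt and Tier 1 URL Root.txt.
    set "A=Permutations\Niche Terms.txt"
    set "B=Resources\Top 5000 US Cities.txt"
    set "OUT=Permutations\Keyword Export.txt"
    if exist "%OUT%" del "%OUT%"
    for /f "usebackq delims=" %%N in ("%A%") do (
        for /f "usebackq delims=" %%C in ("%B%") do (
            >>"%OUT%" echo(%%N %%C
        )
    )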

5. Running the Automator Process

  1. Add Keywords: Add your keywords to the Permutations/Niche Terms.txt file.

  2. Run the Automator: Open ScrapeBox, go to the Automator, load your saved file, and click "Run".
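
Before a full run, it is worth sanity-checking the query volume: the merged list is niche terms multiplied by cities, and each query is harvested to the depth set in step 4 (e.g., 30 results). A quick line count, sketched below, shows what you are about to ask of your proxies:

    @echo off
    rem Count the input lists; merged queries = product of the two counts.
    rem find /c /v "" counts every line of the redirected file.
    echo Niche terms:
    find /c /v "" < "Permutations\Niche Terms.txt"
    echo Cities:
    find /c /v "" < "Resources\Top 5000 US Cities.txt"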

Tips and Considerations

  • Testing: Run small test scrapes with a few keywords and locations to ensure everything is working correctly.

  • Customization: Adjust the depth of the scrape, city lists, and other parameters based on your needs.

  • Error Handling: Be prepared to troubleshoot issues like stuck threads or incorrect file paths (a pre-flight path check is sketched below).
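
Incorrect file paths are the easiest failure to prevent. A small pre-flight check like this sketch (file list abbreviated; extend it to every path your Automator file references) catches typos before you burn proxy bandwidth:

    @echo off
    rem Pre-flight check: confirm the files the Automator references exist.
    set "BASE=%USERPROFILE%\Desktop\YourFolderName"
    for %%F in (
        "Proxies\URL Proxies.txt"
        "Proxies\Data Center Proxies.txt"
        "Resources\Top 5000 US Cities.txt"
        "Permutations\Niche Terms.txt"
        "Permutations\Contact Pages.txt"
        "Permutations\RemoveSpaces.bat"
    ) do if not exist "%BASE%\%%~F" echo MISSING: %BASE%\%%~F
    pause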

Potential Enhancements

  • Email Notifications: Add a step to the Automator to send email notifications upon completion or for specific events.

  • Keyword Filtering: Incorporate keyword filtering to refine your results (see the sketch after this list).

  • Deeper Scrapes: Experiment with deeper scrapes and alternative methods like trimming to root for potentially higher yields.

  • Metadata Scraping: Consider scraping metadata before email scraping for better filtering options.
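
For the keyword-filtering idea, findstr can do a first pass outside ScrapeBox. The sketch below keeps only harvested URLs that contain a term from a whitelist file; the whitelist file name is hypothetical (one literal term per line):

    @echo off
    rem /i case-insensitive, /l literal strings, /g: read search terms from a file.
    rem "Keyword Whitelist.txt" is an example file, not part of the original setup.
    findstr /i /l /g:"Resources\Keyword Whitelist.txt" "Tier 1\Tier 1 URL.txt" > "Tier 1\Tier 1 URL Filtered.txt"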

Conclusion

By following these steps, you can create a powerful and efficient ScrapeBox Automator process for B2B Email List collection. This method lets you streamline your workflow, save time, and scale your lead generation efforts. Remember to customize the process to fit your specific needs, and to comply with the laws that apply to web scraping in your jurisdiction.

Again, if you want to bypass all this trouble and have me do some scraping for you with my infrastructure, contact me here:

Contact Ivan


