Note: You MUST follow the appropriate copyright laws that apply to you when using content, images and videos from the internet. You are responsible for your own content use.
The following topics are covered in this guide.
- Overview of the article downloader
- How to use the article downloader to scrape articles
- Advanced usage:
– How to filter content
– How to remove unwanted paragraphs in articles
– How to save only titles
– How to scrape custom data using XPath
Overview of the article downloader
You can use the article downloader to scrape content off the web.
You can scrape from 2 sources:
- Google search results
- Custom sites (via Bing site results or Google)
The article downloader can be found on the main application menu.
How to use the article downloader to scrape articles
1. Type in the keywords you want to scrape content for.
2. Check your custom sources and Google region settings.
– Each custom site will return ~50 URLs per keyword. Each custom site can be set to scrape from Bing or Google.
– You can also import your own list of URLs via the clipboard or a text file. Just right-click on the grid for a menu.
6. Click on the [Export] button to save the results to your hard drive.
Advanced: How to filter content
You can filter the downloaded articles to show only those that contain certain words, meet a minimum word count, and so on.
1. Click on [Show Content Filter] to open up the filter window.
3. Once you are happy with the filtering, click on [Export].
The article downloader will only save articles that are visible on the content grid.
Advanced: How to remove unwanted paragraphs in articles
You can filter out unwanted paragraphs within articles. Click on [Enable Paragraph mode].
Each article is broken up, and each paragraph is placed on its own row.
You can filter out unwanted paragraphs now.
- Set the word count filter to more than 15 words. This removes a lot of spammy headers etc.
- Set the content filter to only include paragraphs that contain your keyword, so that mostly relevant content is left behind.
- Set the content filter to remove any paragraphs that include competitor names etc.
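The three filters above can be sketched in a few lines of Python. This is only an illustration of the filtering logic, not the application's own code: the function names `split_paragraphs` and `filter_paragraphs`, the blank-line paragraph split, and the sample text (including the competitor name "AcmeBrew") are all assumptions made for the example.

```python
def split_paragraphs(article):
    """Split an article into paragraphs on blank lines (assumed convention)."""
    return [p.strip() for p in article.split("\n\n") if p.strip()]


def filter_paragraphs(paragraphs, min_words=15, keyword=None, blocked=()):
    """Keep paragraphs with more than min_words words that mention the
    keyword (if given) and contain none of the blocked terms."""
    kept = []
    for para in paragraphs:
        text = para.lower()
        if len(para.split()) <= min_words:
            continue  # drops short headers and spammy one-liners
        if keyword and keyword.lower() not in text:
            continue  # drops paragraphs that never mention the keyword
        if any(term.lower() in text for term in blocked):
            continue  # drops paragraphs that mention a blocked name
        kept.append(para)
    return kept


# Hypothetical sample article: a header, a relevant paragraph,
# a spammy line, and a paragraph naming a made-up competitor.
article = (
    "Best Coffee Makers\n\n"
    "A good coffee maker heats water to a stable temperature and spreads it "
    "evenly over the grounds, which matters more than most of the extra "
    "features on the box.\n\n"
    "Click here to subscribe!\n\n"
    "AcmeBrew sells a coffee maker that we did not test, but it is listed "
    "here because readers asked about it several times in the comments."
)

result = filter_paragraphs(
    split_paragraphs(article),
    min_words=15,
    keyword="coffee maker",
    blocked=["AcmeBrew"],
)
for para in result:
    print(para)
```

Only the second paragraph survives here: the header and the subscribe line fail the word count, and the last paragraph is removed because it names the blocked competitor.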
Below I set the word count filter. Notice how a lot of the headers and spammy lines are removed. (Compare to the above screenshot).
Click on the [Finish] button to return to the normal article view.
Advanced: How to save only titles
Right-click anywhere on the content grid.
The [Copy Titles] command copies the titles to the clipboard.
Advanced: How to scrape custom data using XPath
The article downloader uses XPath to find and extract content.
You can double-click on the XPath cell to manually edit the XPath.
I have set the XPath for one article to only scrape "//p" tags.
You can also set the XPath for selected rows by right-clicking on your selection.
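To see what a "//p" expression selects, here is a minimal sketch using Python's standard `xml.etree.ElementTree` module. It is a stand-in for illustration only: the guide does not say which XPath engine the article downloader uses, the sample page is made up, and ElementTree only accepts well-formed markup and a subset of XPath (the relative form `.//p` is used because ElementTree rejects absolute paths on an element).

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed sample page. Real pages are usually not valid XML
# and would need a lenient HTML parser instead of ElementTree.
page = """<html><body>
<h1>Sample headline</h1>
<div class="nav">Home | About</div>
<p>First paragraph of the article.</p>
<div><p>Second paragraph, nested in a div.</p></div>
</body></html>"""

root = ET.fromstring(page)

# ".//p" matches every <p> element at any depth, like "//p" in full XPath.
paragraphs = ["".join(p.itertext()).strip() for p in root.findall(".//p")]

for text in paragraphs:
    print(text)
```

The headline and the navigation div are skipped because they are not `<p>` elements, which is why a "//p" expression is a quick way to keep body text and drop page furniture.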
Scrape HTML content
You can scrape HTML content too. Just enable HTML first.
Here is an example of how you can scrape meta keywords etc.
To set the XPath, double-click on the cell to edit it manually, or right-click to set it for all rows.
Try setting the XPath to //meta[@name="keywords"]
You can then select all grid results and copy them using the keyboard or via the available right click menu.
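The meta keywords expression can be tried outside the application with a short Python sketch. As before, this uses the standard `xml.etree.ElementTree` module on a made-up, well-formed page and is an assumption-laden illustration, not the tool's own implementation; the attribute predicate `[@name='keywords']` is part of the XPath subset ElementTree supports.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed sample page with two meta tags.
page = """<html><head>
<title>Sample page</title>
<meta name="description" content="A sample page." />
<meta name="keywords" content="coffee, espresso, grinder" />
</head><body><p>Body text.</p></body></html>"""

root = ET.fromstring(page)

# Select the <meta> element whose name attribute equals "keywords".
meta = root.find(".//meta[@name='keywords']")

# The keywords live in the content attribute as a comma-separated string.
keywords = []
if meta is not None:
    keywords = [k.strip() for k in meta.get("content", "").split(",") if k.strip()]

print(keywords)
```

Note that the expression selects the whole `<meta>` element, so the keywords themselves have to be read from its `content` attribute. That is consistent with the guide's advice to enable HTML first.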