Article downloader guide

Note: You MUST follow the appropriate copyright laws that apply to you when using content, images and videos from the internet. You are responsible for your own content use.

Purpose

The following topics are covered in this guide.

  • Overview of the article downloader
  • How to use the article downloader to scrape articles
  • Advanced usage:
    – How to filter content
    – How to remove unwanted paragraphs in articles
    – How to save only titles
    – How to scrape custom data using XPath

Overview of the article downloader

You can use the article downloader to scrape content off the web.

You can scrape from 2 sources:

  1. Google search results
  2. Custom sites (via Bing or Google site results)

The article downloader can be found on the main application menu.

(Screenshot: main menu, Scrape Content)

How to use the article downloader to scrape articles

1. Type in the keywords you want to scrape content for.

2. Check your custom sources and Google region settings.

– Each custom site returns about 50 URLs per keyword. Each custom site can be set to scrape from Bing or Google.

– Each Google search returns about 100 URLs per keyword. You can select a specific Google region to get language-specific results.


– Set Google to “[None]” to disable it.
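You can combine the per-keyword figures above to get a rough idea of how many URLs a run will produce. The Python sketch below uses a hypothetical helper, estimate_url_count, which is not part of the article downloader. It assumes the counts simply add up across sources. Because the ~50 and ~100 figures are approximate, treat the result as a ballpark only.

```python
# Hypothetical helper, not part of the article downloader.
# The per-source figures are the approximate ones quoted in this guide.
URLS_PER_CUSTOM_SITE = 50     # ~50 URLs per keyword for each custom site
URLS_PER_GOOGLE_SEARCH = 100  # ~100 URLs per keyword when Google is enabled


def estimate_url_count(keywords: int, custom_sites: int, google_enabled: bool) -> int:
    """Rough URL count for one run, assuming sources simply add up."""
    per_keyword = custom_sites * URLS_PER_CUSTOM_SITE
    if google_enabled:  # Google set to a region rather than "[None]"
        per_keyword += URLS_PER_GOOGLE_SEARCH
    return keywords * per_keyword


print(estimate_url_count(keywords=3, custom_sites=2, google_enabled=True))  # 600
```

For example, 3 keywords with 2 custom sites and Google enabled gives roughly 3 × (2 × 50 + 100) = 600 URLs. With Google set to “[None]”, it drops to about 300.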

3. Click [Scrape Urls] to begin.

4. The scraped URLs appear in the URL grid.

– You can also import your own list of URLs from the clipboard or a text file. Just right-click the grid to open the menu.
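The guide doesn't specify the import format. If the text file holds one URL per line (a common convention, assumed here), you can run a small clean-up pass before importing. It drops blank lines, stray text and duplicates. parse_url_list is a hypothetical helper written for this example, not part of the tool.

```python
from urllib.parse import urlparse


def parse_url_list(text: str) -> list[str]:
    """Return unique http(s) URLs from one-URL-per-line text, keeping the original order."""
    seen = set()
    urls = []
    for line in text.splitlines():
        candidate = line.strip()
        parsed = urlparse(candidate)
        # Skip blank lines, notes and anything that isn't an absolute http(s) URL.
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            continue
        if candidate not in seen:  # drop exact duplicates
            seen.add(candidate)
            urls.append(candidate)
    return urls


sample = """
https://example.com/article-1
https://example.com/article-1
not a url
http://example.org/post?id=2
"""
print(parse_url_list(sample))
# ['https://example.com/article-1', 'http://example.org/post?id=2']
```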


5. Start scraping content by clicking [Download Article]. The results appear in the large content box below.

 

6. Click on the [Export] button to save the results to your hard drive.

 

Advanced: How to filter content

You can filter the downloaded articles so that only those that contain certain words, meet a minimum word count and so on are shown.

1. Click on [Show Content Filter] to open up the filter window.

 

2. Use the filter panel to set advanced filtering conditions that remove unwanted content.

3. Once you are happy with the filtering, just click [Export].

The article downloader will only save articles that are visible on the content grid.
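Conceptually, the article filter is a test applied to every downloaded article, and only the articles that pass are exported. The Python sketch below models two of the conditions mentioned above: required words and a minimum word count. The function names and the whole-word, case-insensitive matching are assumptions for illustration, since this guide doesn't document the filter panel's exact rules.

```python
import re


def contains_word(text: str, word: str) -> bool:
    """Whole-word, case-insensitive match (an assumed matching rule)."""
    return re.search(r"\b" + re.escape(word) + r"\b", text, re.IGNORECASE) is not None


def filter_articles(articles, required_words=(), min_words=0):
    """Keep articles that contain every required word and have at least min_words words."""
    kept = []
    for article in articles:
        text = article["text"]
        if len(text.split()) < min_words:  # whitespace-separated word count
            continue
        if all(contains_word(text, w) for w in required_words):
            kept.append(article)
    return kept


articles = [
    {"title": "Garden tips", "text": "Water your tomato plants early in the morning."},
    {"title": "Short note", "text": "Tomato news."},
    {"title": "Cars", "text": "A long article about engines, gearboxes and other car parts."},
]

# Only the articles that pass the filter would be "visible" and therefore exported.
visible = filter_articles(articles, required_words=["tomato"], min_words=5)
print([a["title"] for a in visible])  # ['Garden tips']
```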

Advanced: How to remove unwanted paragraphs in articles

You can filter out unwanted paragraphs within articles. To do this, click [Enable Paragraph mode].

Each article is split up so that each paragraph is placed on its own row.

You can now filter out unwanted paragraphs.

Some recommendations:

  • Set the word count filter to more than 15 words. This removes many spammy headers and similar short lines.
  • Set the content filter to include only paragraphs that contain your keyword. This can leave behind only the relevant content.
  • Set the content filter to remove any paragraphs that mention competitor names and so on.
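Paragraph mode plus the three recommendations can be modelled as two steps: split each article into paragraphs, then keep only the paragraphs that pass every filter. This is a minimal sketch that assumes paragraphs are separated by blank lines and words are separated by whitespace. The tool's own splitting and matching rules may differ, and “AcmeSeeds” is a made-up competitor name.

```python
import re


def split_paragraphs(article_text: str) -> list[str]:
    """Split an article on blank lines (an assumed rule) and drop empty pieces."""
    return [p.strip() for p in re.split(r"\n\s*\n", article_text) if p.strip()]


def contains_word(text: str, word: str) -> bool:
    """Whole-word, case-insensitive match."""
    return re.search(r"\b" + re.escape(word) + r"\b", text, re.IGNORECASE) is not None


def keep_paragraph(paragraph: str, more_than: int = 15, keyword: str = "", excluded=()) -> bool:
    if len(paragraph.split()) <= more_than:  # word count filter: more than 15 words
        return False
    if keyword and not contains_word(paragraph, keyword):  # must include your keyword
        return False
    if any(contains_word(paragraph, name) for name in excluded):  # no competitor names
        return False
    return True


article = """Best Tomato Tips

Tomato plants need at least six hours of direct sunlight every day, so choose the sunniest spot in your garden.

Water deeply once or twice a week rather than a little every day, because deep watering encourages strong roots.

Many growers buy their tomato seedlings from AcmeSeeds, which ships young plants to most regions every spring season."""

kept = [p for p in split_paragraphs(article)
        if keep_paragraph(p, keyword="tomato", excluded=["AcmeSeeds"])]
for p in kept:
    print(p)
```

Only the sunlight paragraph survives. The header is too short, the watering paragraph lacks the keyword, and the last paragraph mentions the excluded name. The watering paragraph also shows why the keyword filter only “potentially” leaves relevant content: useful paragraphs that never repeat the keyword are dropped too.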

In the screenshot below, I have set the word count filter. Notice how many of the headers and spammy lines are removed (compare with the screenshot above).

Click the [Finish] button to return to the normal article view.

Advanced: How to save only titles

Right-click anywhere on the content grid.

The [Copy Titles] command copies the titles to the clipboard.

Advanced: How to scrape custom data using XPath

The article downloader uses XPath to find and extract content.

You can double-click the XPath cell to edit the XPath manually.

I have set the XPath for one article to scrape only “//p” tags (paragraphs).
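To see what “//p” selects, here is a small Python sketch that uses the standard library's xml.etree.ElementTree. ElementTree supports only a subset of XPath and writes //p as the relative path .//p. It also needs well-formed markup, so real-world pages usually call for a lenient HTML parser such as lxml.html. The sample page is made up.

```python
import xml.etree.ElementTree as ET

page = """<html><body>
<h1>Site header</h1>
<div class="nav">Home | About</div>
<p>First paragraph of the article.</p>
<p>Second paragraph with <b>bold</b> text.</p>
</body></html>"""

root = ET.fromstring(page)  # requires well-formed (XHTML-style) markup
# ".//p" is ElementTree's spelling of the XPath "//p": every <p> at any depth.
# itertext() joins the text of nested tags such as <b>.
paragraphs = ["".join(p.itertext()).strip() for p in root.findall(".//p")]
print(paragraphs)
# ['First paragraph of the article.', 'Second paragraph with bold text.']
```

The header and navigation text are skipped because they are not inside <p> elements. That is why //p often works well for pulling out just the article body.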

You can also set the XPath for several selected rows at once by right-clicking your selection.

Scrape HTML content

You can scrape HTML content too. Just enable HTML first.

Here is an example of how you can scrape meta keywords and similar data.

(Screenshot: meta keywords scraping)

 

To set the XPath, double-click the cell to edit it manually. You can also right-click to set it for all rows.

Try setting the XPath to //meta[@name="keywords"]
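HTML mode matters here because a <meta> tag has no text content; the keywords live in its content attribute. The Python sketch below models the two modes with a hypothetical extract function. Its behaviour is an assumption about how the modes work, not the tool's code. Text mode returns the element's text, which is empty for <meta>. HTML mode returns the element's markup with the attribute intact. As in the previous sketch, ElementTree writes the XPath as .//meta[@name='keywords'] and needs well-formed markup.

```python
import xml.etree.ElementTree as ET


def extract(page: str, path: str, html_mode: bool) -> list[str]:
    """Hypothetical model of text vs. HTML scraping; not the article downloader's own code."""
    root = ET.fromstring(page)
    results = []
    for elem in root.findall(path):
        if html_mode:
            # Serialize the matched element itself; strip() drops any trailing whitespace.
            results.append(ET.tostring(elem, encoding="unicode").strip())
        else:
            # Text only: for an empty element like <meta> this is "".
            results.append("".join(elem.itertext()).strip())
    return results


page = """<html><head>
<meta name="description" content="A guide to growing tomatoes" />
<meta name="keywords" content="tomatoes, gardening, vegetables" />
</head><body><p>Body text.</p></body></html>"""

path = ".//meta[@name='keywords']"  # ElementTree form of //meta[@name="keywords"]
print(extract(page, path, html_mode=False))  # ['']
print(extract(page, path, html_mode=True))   # the <meta> tag, content attribute included
```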

You can then select all the grid results and copy them using the keyboard or the right-click menu.

 

admin has written 31 articles

14 thoughts on “Article downloader guide”

    1. admin says:

      Look up the section “Advanced: How to remove unwanted paragraphs in articles” on this page.

      Basically you want to use the “Show Content Filter” button.

      A simple filtering box will pop up, and you will have a whole bunch of filtering options.
      http://help.satinblue.com.au/wp-content/uploads/2014/10/article-downloader-filter-editor.png

      Since I’m not sure what you have tried, if you tell me exactly where you got stuck (with a screenshot),
      I can help unstick you from there and provide more specific guidance.

  1. Fabian says:

    I am trying to download the articles from EzineArticles, but I receive the message: “The remote server returned an error: 403 (Forbidden) @ezinearticles”

  2. Olak says:

    Hi, is there any way of limiting the number of articles returned per keyword when using Google search?
    And is there any way of knowing the search engine ranking order of the returned pages? Thanks
