https://www.fivefilters.org/2021/how-to-turn-a-webpage-into-an-rss-feed-pt2/

How to turn a webpage into an RSS feed using Feed Creator - Part 2

April 3, 2021 / April 2, 2021

In part 1 we showed you how to turn a webpage into an RSS feed using our Feed Creator application and its simple selector mode. In this post we'll show you how to use advanced mode and CSS selectors to include additional item information such as the publication date, featured image, and summary text.

[Image: CSS selectors used to target elements on a web page]

If you're new to Feed Creator, we recommend you start by reading part 1 and then continue here.

What's a CSS selector?

CSS is a standardised web technology primarily used for styling web page elements. As part of its specification, it includes selectors to target HTML elements to be styled. Feed Creator does not concern itself with the styling aspect of CSS, but does accept CSS selectors to help extract elements to be used in the feeds it produces.

Generating a feed from a webpage using CSS selectors

In this post we're going to show you how to create a feed using CSS selectors, step by step. We'll use Reuters Investigates as our source page, but the technique can be applied to any site.

Short on time? If you'd rather have us create a feed for you, please submit a custom feed request.

What you'll need

1. Some basic knowledge of HTML and CSS selectors
2. The webpage address (URL) of the source page you want to create a feed from
3. Our Feed Creator application (we offer a free, hosted service to get started, no signup required - if you find it useful, it's also available for self-hosting or as a premium, hosted service)
4.
Your browser's developer tools to inspect the source page's HTML (we'll use Firefox's developer tools in this guide, but Chrome will be very similar)

Overview

These are the basic steps we'll be following:

1. Find appropriate selectors for the main item blocks
2. Find appropriate selectors for individual item elements (e.g. title, date, image, summary)
3. Enter the selectors in Feed Creator to generate the feed

Step 1: Make source page and Feed Creator easily accessible

We'll be switching between the source page and Feed Creator in the steps below, so we recommend you open them in two tabs (or have the windows side-by-side).

Tab 1: Reuters Investigates - reuters.com/investigates/
Tab 2: Feed Creator - createfeed.fivefilters.org

[Image: Reuters Investigates and Feed Creator in two separate tabs.]

Step 2: [Source page] Identify the items that should be used in the feed

In this example we're using the Reuters Investigates page, and the areas we've marked in red rectangles contain the items of interest.

[Image: The items we want to turn into a feed from the Reuters Investigates site.]

Step 3: [Feed Creator] Enter the source page URL and choose Advanced Selectors

Now switch to the Feed Creator tab and enter the Reuters Investigates URL in the field labeled 'Enter web page URL': https://www.reuters.com/investigates/

Below it, choose 'Advanced Selectors'.

Step 4: [Source page] Create selector for the desired items

To create a usable selector, we'll want to inspect the desired items and identify the main elements in the underlying HTML. So let's jump back to our source page. Move your cursor over one of the items, right-click, and choose 'Inspect Element' in Firefox ('Inspect' in Chrome).

[Image: Firefox context menu showing the 'Inspect Element (Q)' menu item.]

You'll now see the item's underlying HTML markup. What we're looking for is an HTML element for a single item.
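To make the idea of "an HTML element for a single item" concrete, here is a minimal Python sketch. It is not part of Feed Creator and says nothing about how Feed Creator works internally. It only counts how many elements a simple `tag` or `tag.class` selector matches in a small HTML fragment. The fragment is made up for illustration (its class names are borrowed from the Reuters page discussed in this guide), and `SelectorCounter` and `count_matches` are hypothetical helper names.

```python
from html.parser import HTMLParser


class SelectorCounter(HTMLParser):
    """Counts start tags matching a simple 'tag' or 'tag.class' selector.

    Illustrative only: real CSS selectors support far more than this.
    """

    def __init__(self, selector):
        super().__init__()
        tag, _, cls = selector.partition(".")
        self.tag = tag
        self.cls = cls  # empty string means "any class"
        self.count = 0

    def handle_starttag(self, tag, attrs):
        # A class attribute may hold several space-separated class names.
        classes = (dict(attrs).get("class") or "").split()
        if tag == self.tag and (not self.cls or self.cls in classes):
            self.count += 1


def count_matches(html, selector):
    """Return the number of elements in html matched by selector."""
    parser = SelectorCounter(selector)
    parser.feed(html)
    parser.close()
    return parser.count


# Made-up markup: one container element wrapping three item elements.
SAMPLE_HTML = """
<div class="section-articles">
  <article class="section-article-container">
    <div class="section-article"><h2>First story</h2></div>
  </article>
  <article class="section-article-container">
    <div class="section-article"><h2>Second story</h2></div>
  </article>
  <article class="section-article-container">
    <div class="section-article"><h2>Third story</h2></div>
  </article>
</div>
"""

if __name__ == "__main__":
    for selector in (
        "div.section-articles",               # the container: 1 match
        "article.section-article-container",  # one per item: 3 matches
    ):
        print(selector, "->", count_matches(SAMPLE_HTML, selector))
```

In this made-up fragment the container selector matches once, while the item selector matches once per story, which is the behaviour an item selector should have.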
Later, we will use additional selectors to target title, summary, image and date elements within each selected item.

[Image: Firefox's inspector showing the underlying HTML]

A common mistake is to identify an element that contains all the items and to create a selector for that. For example, the parent element of the highlighted <article> element in the image above is such an element, so targeting it with div.section-articles would select only a single element. That's not what Feed Creator expects as the item selector (unless there's only ever a single item on the page).

We have a number of options here for choosing a suitable CSS selector:

* article to select all <article> elements on the page
* article.section-article-container to select all <article> elements with a class attribute containing "section-article-container"
* div.section-article to select all <div>
elements with a class attribute containing "section-article"

JavaScript-generated elements

At the moment Feed Creator only works with HTML elements that are returned by the server in its initial response. Some sites rely on JavaScript to construct elements and sometimes pull in the desired items via additional requests after the page has loaded in your browser. When you inspect elements using your browser's developer tools, as we're doing here, you're seeing the final result after JavaScript execution. This might not be what Feed Creator sees when it processes the page. The easiest way to make sure you're not using attributes that Feed Creator cannot see is to disable JavaScript in your browser temporarily, reload the source page, and then inspect elements using your browser's developer tools.

Step 5: [Source page] Ensure selector targets all desired items

We want the selector we choose to match all the elements we want, and nothing more. An easy way to test this is to enter the selectors, one by one, into the Search HTML field in developer tools (CTRL+F in Chrome to bring up the search field). Both Firefox and Chrome will show you how many elements are selected by the selector and will allow you to move through them by hitting Enter.

[Image: Using Firefox's Search HTML field to find all HTML elements matching the CSS selector 'article.section-article-container']

The HTML search field in developer tools is not exclusively for CSS selectors, so when entering 'article', Firefox will also find instances of the text 'article' wherever it appears in the HTML. To avoid this, change the input to something that more closely resembles a CSS selector, such as by adding 'html' before the selector: 'html article'. This will find all <article> elements within the root element, essentially the same CSS selector as just 'article'.

Another option is to open the console in developer tools with CTRL+Shift+K (CTRL+Shift+J in Chrome) and enter your CSS selector in a call to $$(), for example: $$('article'). You will then see a list of selected elements which you can hover over to highlight on the page or click into to view in the element inspector panel.

All three selectors listed in the previous step match the content we want on the page, so we could go with any one of them. When deciding which selector to use, we like to consider the likelihood of a selector matching more than we want in the future, or a completely different set of items in the case of a site redesign. That's more likely to happen with article (for example, an element