A self-hosted API to fetch and mix entries from Atom and RSS feeds (returns Atom, RSS, or JSON)
FeedMixer
FeedMixer is a little web service (Python3/WSGI) which takes a list of feed URLs and combines them into a single (Atom, RSS, or JSON) feed. Useful for personal news aggregators, "planet"-like websites, etc.
FeedMixer exposes three endpoints:
When sent a GET request they return an Atom, an RSS 2.0, or a JSON feed, respectively. The query string of the GET request can contain these fields:
The provided feedmixer_wsgi.py application uses a session that caches HTTP responses, so repeated requests for the same set of feeds can usually be answered quickly.
The FeedMixer object can be passed a custom requests.session object for making HTTP requests, which lets you customize how those requests are made.
FeedMixer does not (yet?) do any resource restriction itself:
To protect your installation, either configure a front-end HTTP proxy to enforce the restrictions you need (Nginx is a good choice), use suitable WSGI middleware, or both.
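As a rough illustration of the WSGI middleware option, the sketch below wraps any WSGI application and rejects requests that ask for too many feeds. It is not part of FeedMixer: the class name LimitFeedsMiddleware, the max_feeds parameter, and the choice of a 400 response are all assumptions made for this example. It relies only on the f query field used throughout this README.

```python
from urllib.parse import parse_qs


class LimitFeedsMiddleware:
    """Hypothetical WSGI middleware: cap the number of `f` fields per request."""

    def __init__(self, app, max_feeds=10):
        self.app = app              # the wrapped WSGI application
        self.max_feeds = max_feeds  # assumed limit, tune to taste

    def __call__(self, environ, start_response):
        # parse_qs collects repeated fields into a list: {"f": [url1, url2, ...]}
        query = parse_qs(environ.get("QUERY_STRING", ""))
        if len(query.get("f", [])) > self.max_feeds:
            body = b"Too many feeds requested\n"
            start_response("400 Bad Request", [
                ("Content-Type", "text/plain"),
                ("Content-Length", str(len(body))),
            ])
            return [body]
        # Within the limit: hand the request to the wrapped application.
        return self.app(environ, start_response)
```

In a custom WSGI module you would wrap the application object before handing it to the server, for example application = LimitFeedsMiddleware(application, max_feeds=10).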
$ git clone https://github.com/cristoper/feedmixer.git
$ cd feedmixer
$ pipenv sync
The project consists of three modules:
- feedmixer.py: contains the core logic.
- feedmixer_api.py: contains the Falcon-based API. Call wsgi_app() to get a WSGI-compliant object to host.
- feedmixer_wsgi.py: contains an actual WSGI application which can be used as-is or as a starting point to create your own custom FeedMixer service.

The feedmixer_wsgi module instantiates the feedmixer WSGI object (with sensible defaults and a rotating logfile) as both api and application (default names used by common WSGI servers). To start the service with gunicorn, for example, clone the repository and in the root directory run:
$ pipenv sync
$ pipenv run pip3 install gunicorn
$ pipenv run gunicorn feedmixer_wsgi
Note that the top-level install directory must be writable by the server running the app, because it creates the logfiles ('fm.log' and 'fm.log.1') there.
As an example, assuming an instance of the FeedMixer app is running on the localhost on port 8000, let's fetch the newest entry each from the following Atom and RSS feeds:
The constructed URL to GET is:
http://localhost:8000/atom?f=https://catswhisker.xyz/shaarli/?do=atom&f=https://hnrss.org/newest&n=1
Entering it into a browser will return an Atom feed with two entries. To GET it from a client programmatically, remember to URL-encode the f fields:
$ curl 'localhost:8000/atom?f=https%3A%2F%2Fcatswhisker.xyz%2Fshaarli%2F%3Fdo%3Datom&f=https%3A%2F%2Fhnrss.org%2Fnewest&n=1'
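The same encoding can be done from Python with the standard library. This is a minimal sketch, assuming the service is running on localhost port 8000 as in the example above; the helper name build_feedmixer_url is made up for illustration and is not part of FeedMixer.

```python
from urllib.parse import urlencode


def build_feedmixer_url(base, feeds, n=None):
    """Return a FeedMixer query URL with every f field URL-encoded."""
    # One ("f", url) pair per feed, so the field is repeated in the query string.
    params = [("f", feed) for feed in feeds]
    if n is not None:
        params.append(("n", n))
    return base + "?" + urlencode(params)


url = build_feedmixer_url(
    "http://localhost:8000/atom",
    ["https://catswhisker.xyz/shaarli/?do=atom", "https://hnrss.org/newest"],
    n=1,
)
print(url)
```

The printed URL has the same query string as the curl command above.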
HTTPie is a nice command-line HTTP client that makes testing RESTful services more pleasant:
$ pip3 install httpie
$ http localhost:8000/json f==http://hnrss.org/newest f==http://catswhisker.xyz/atom.xml n==1
You should see some JSONFeed output (since we are requesting from the /json endpoint):
HTTP/1.1 200 OK
Connection: close
Date: Thu, 23 Jan 2020 03:53:45 GMT
Server: gunicorn/20.0.4
content-length: 1296
content-type: application/json

{
  "version": "https://jsonfeed.org/version/1",
  "title": "FeedMixer feed",
  "home_page_url": "http://localhost:8000/json?f=http%3A%2F%2Fhnrss.org%2Fnewest&f=https%3A%2F%2Fcatswhisker.xyz%2Fatom.xml&n=1",
  "description": "json feed created by FeedMixer.",
  "items": [
    {
      "title": "Kyrsten Sinema, the Only Anti-Net Neutrality Dem, Linked to Comcast Super Pac",
      "content_html": "<p>Article URL: <a href=\"https://prospect.org/politics/kyrsten-sinema-anti-net-neutrality-super-pac-comcast-lobbyist/\">https://prospect.org/politics/kyrsten-sinema-anti-net-neutrality-super-pac-comcast-lobbyist/</a></p>\n<p>Comments URL: <a href=\"https://news.ycombinator.com/item?id=22124592\">https://news.ycombinator.com/item?id=22124592</a></p>\n<p>Points: 1</p>\n<p># Comments: 0</p>",
      "url": "https://prospect.org/politics/kyrsten-sinema-anti-net-neutrality-super-pac-comcast-lobbyist/",
      "id": "https://news.ycombinator.com/item?id=22124592",
      "author": {
        "name": "joeyespo"
      },
      "date_published": "2020-01-23T03:32:19Z",
      "date_modified": "2020-01-23T03:32:19Z"
    },
    {
      "title": "FO Roundup December 2019",
      "content_html": "I've started knitting again.",
      "url": "http://catswhisker.xyz/log/2019/12/3/fo_december/",
      "id": "tag:catswhisker.xyz,2019-12-04:/log/2019/12/3/fo_december/",
      "author": {
        "name": "A. Cynic",
        "url": "http://catswhisker.xyz/about/"
      },
      "date_published": "2019-12-04T04:48:59Z",
      "date_modified": "2019-12-04T04:48:59Z"
    }
  ]
}
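A client consuming this output might pull out just the fields it needs. The following is only a sketch: it uses the items, title, url, and date_published keys visible in the response above, the function name summarize_items is hypothetical, and the sample data is a trimmed copy of that response rather than a live fetch.

```python
import json


def summarize_items(feed_text):
    """Return (date_published, title, url) tuples, newest first."""
    feed = json.loads(feed_text)
    rows = [
        (item.get("date_published", ""), item.get("title", ""), item.get("url", ""))
        for item in feed.get("items", [])
    ]
    # Timestamps in the same UTC ISO 8601 format sort correctly as plain strings.
    return sorted(rows, reverse=True)


# Trimmed copy of the example response, used here instead of a live request.
sample = json.dumps({
    "version": "https://jsonfeed.org/version/1",
    "title": "FeedMixer feed",
    "items": [
        {
            "title": "FO Roundup December 2019",
            "url": "http://catswhisker.xyz/log/2019/12/3/fo_december/",
            "date_published": "2019-12-04T04:48:59Z",
        },
        {
            "title": "Kyrsten Sinema, the Only Anti-Net Neutrality Dem, Linked to Comcast Super Pac",
            "url": "https://prospect.org/politics/kyrsten-sinema-anti-net-neutrality-super-pac-comcast-lobbyist/",
            "date_published": "2020-01-23T03:32:19Z",
        },
    ],
})

for date, title, url in summarize_items(sample):
    print(date, title, url)
```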
Deploy FeedMixer using any WSGI-compliant server (uwsgi, gunicorn, mod_wsgi, ...). For a production deployment, put an asynchronous HTTP proxy (like Nginx) in front of FeedMixer to protect it from too many connections and from slow ones (as well as to provide SSL termination, additional caching, authorization, etc., as required).
Refer to the documentation of the server of your choice.
For notes on deploying behind Apache, see apache.rst (in the HTML docs: apache.html).
An alternative to using a virtualenv for both building and deploying is to run FeedMixer in a Docker container. The included Dockerfile will produce an image which runs FeedMixer using gunicorn.
Build the image from the feedmixer directory:
$ docker build . -t feedmixer
Run it in the foreground:
$ docker run -p 8000:8000 feedmixer
Now from another terminal you should be able to connect to FeedMixer on localhost port 8000 just as in the example above.
Using the provided feedmixer_wsgi.py application, information and errors are logged to the file fm.log in the directory the application is started from (automatically rotated, keeping a single old log called fm.log.1).
Any errors encountered in fetching and parsing remote feeds are reported in a custom HTTP header called X-fm-errors.
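A client can check for that header after each request. The sketch below uses only the standard library and treats the header value as an opaque string, since its exact format is not described here; the function names feed_errors and fetch are assumptions for this example.

```python
from urllib.request import urlopen

ERRORS_HEADER = "X-fm-errors"


def feed_errors(headers):
    """Return the raw X-fm-errors value from a headers object, or None if absent."""
    # HTTP response header objects from urllib look names up case-insensitively.
    return headers.get(ERRORS_HEADER)


def fetch(url):
    """GET url and return (body_bytes, errors_header_or_None)."""
    with urlopen(url) as resp:
        return resp.read(), feed_errors(resp.headers)
```

For example, body, errors = fetch("http://localhost:8000/json?f=...") followed by a check of whether errors is None would tell you whether any remote feed failed, assuming a FeedMixer instance is running at that address.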
First install as per instructions above.
Other than this README, the documentation is in the docstrings. To build a pretty version (HTML) using Sphinx:
$ pipenv run pip install -r doc/requirements.txt
$ cd doc
$ pipenv run make html
$ x-www-browser _build/html/index.html
Tests are in the test directory and Python will find and run them with:
$ pipenv run python3 -m unittest
To check types using mypy:
$ MYPYPATH=stub/ mypy --ignore-missing-imports -p feedmixer
Not everything is stubbed out, but this can be useful for catching bugs after changing feedparser.py
Feel free to open an issue on GitHub for help: https://github.com/cristoper/feedmixer/issues
If this package was useful to you, please consider supporting my work on this and other open-source projects by making a small (like a tip) one-time donation: donate via PayPal
If you're looking to contract a Python developer, I might be able to help. Contact me at chris.burkhardt@orangenoiseproduction.com
The project is licensed under the WTFPL license, without warranty of any kind.