
While building this out, I focused on getting a functional system out first, with minimal effort spent on refactoring and design. SOLID principles, a background scheduler, Docker Compose files, and the like are probably the first things to revisit. Nonetheless, please do leave your comments and feel free to reach out to me here or through GitHub.
Now you simply need to execute the main script with the following command, which runs it once immediately and also schedules it for once every day at the same time:

    docker exec -d rss-python python main.py

Et voilà! You have an RSS parser ready, running every day and updating the DB. Obviously, a lot of upgrades and enhancements are still possible.
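A couple of optional ways to keep an eye on it, using nothing beyond standard Docker commands (rss-python is the container name used above):

    # Run attached instead of detached to watch the print output from main.py
    docker exec -it rss-python python main.py

    # Or, after launching it detached, confirm the scheduler process is alive inside the container
    docker top rss-python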

1. Imports: the following lines import the required modules and objects:

    from data_parser import get_soup, parse_record, store_tags
    from db_connect import get_connection, get_max_records, execute_query
    from apscheduler.schedulers.blocking import BlockingScheduler

2. We'll also define some global variables (I know, not a recommended practice, but given the time constraint it was a trade-off we all have to take):

    # Query to find the max record processed so far
    get_max_query = 'SELECT COALESCE(max(itunes_episode),0) FROM tasteofindia.posts'
    # Query template to insert values into any table
    query_string = 'INSERT INTO tasteofindia.

The parsing step builds the records with a list comprehension over a map operation on the records and returns them. If there are new records, parse_record() is first invoked for each one, and the record is then persisted.

persist_taste_of_india_record(conn, data): tries to persist each component of a post separately (based on the entities defined):

    def persist_taste_of_india_record(conn, data):
        persist_record(conn, data, 'posts')
        persist_record(conn, data, 'itunes_data')
        for media in data:
            persist_record(conn, media, 'media')
        conn.commit()
        return True

conn.commit() is necessary, else the changes in the database aren't permanent and will be lost once the session expires.

persist_record(conn, data, tb_name): executes the insert query based on the object type:

    def persist_record(conn, data, tb_name):
        query_param = tuple(list(map(lambda k: data[k], col_list)))
        execute_query(conn, query_strings, query_param)
        return

query_param simply stores the values in column order in a tuple; execute_query() finally inserts the data into the database.

Executing and scheduling it: the script is completed by invoking the begin function and scheduling it for once-a-day execution:

    if __name__ == '__main__':
        feed_url = ''
        db_credentials = 'connection.json'
        print('Main Script Running.')
        begin(feed_url, db_credentials)
        scheduler = BlockingScheduler()
        # Pass the same arguments to each scheduled run
        scheduler.add_job(begin, 'interval', hours=24, args=[feed_url, db_credentials])
        try:
            scheduler.start()
        except Exception as e:
            print('Stopping Schedule!!')
        print('Main Script Exiting!!')

I'm using a blocking scheduler here, so the Python thread is always alive. If you ever want to stop the execution, the try/except block is there to exit cleanly.
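db_connect.py itself isn't shown in this excerpt, so purely as a sketch: get_connection and execute_query could be as thin as the psycopg2 wrappers below. The keys assumed inside connection.json (host, port, dbname, user, password) are my guess, not the article's actual file.

    import json
    import psycopg2

    def get_connection(credentials_path):
        # Load connection details from e.g. connection.json (key names are assumptions)
        with open(credentials_path) as f:
            creds = json.load(f)
        return psycopg2.connect(
            host=creds['host'],
            port=creds.get('port', 5432),
            dbname=creds['dbname'],
            user=creds['user'],
            password=creds['password'],
        )

    def execute_query(conn, query, params=None):
        # Run one parameterised statement; committing is left to the caller,
        # which matches the explicit conn.commit() in persist_taste_of_india_record()
        with conn.cursor() as cur:
            cur.execute(query, params)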

With that approach, let's get you started with the implementation and build the script that connects all the pieces together. Also, since no environment specifications were given, I decided to build all of this using two Docker containers, one for Python and one for Postgres.
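The post doesn't show how the two containers were created, so here is one possible way to stand them up without Compose. The rss-python name matches the docker exec command used earlier; everything else (network name, image tags, the bind mount, the password) is a placeholder of mine, not the author's setup.

    # Shared network so the Python container can reach Postgres by hostname
    docker network create rss-net

    # Postgres container (password is a placeholder)
    docker run -d --name rss-postgres --network rss-net \
        -e POSTGRES_PASSWORD=postgres postgres:11

    # Python container, kept alive so scripts can be run inside it with docker exec
    docker run -d --name rss-python --network rss-net \
        -v "$(pwd)":/app -w /app python:3.7 sleep infinity

    # Install the libraries discussed in the decisions below
    docker exec rss-python pip install beautifulsoup4 lxml psycopg2-binary apscheduler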

All other decisions were pretty straightforward, given the time constraint of one day, which left 3–4 hours for this work (assuming 8–9 hours go to the day job). #3 above basically meant I had to implement the RSS parser on my own.

1. Obviously, I decided to use Python 3.7, with end-of-life support coming for Python 2.7.
2. The restriction on RSS parser libraries basically expected me to write my own parser. Since it's ultimately XML-based content, I decided to use the ever-reliable BeautifulSoup library (a minimal fetch-and-parse sketch follows below).
3. The relational database I chose was Postgres. No particular reason, except ease of use, familiarity, and obviously its insanely popular open-source support.
4. The psycopg2 library to talk to Postgres (I don't think it needs an explanation, but I didn't want to use wrappers like SQLAlchemy).
5. The apscheduler library to schedule the task once a day (a simple library to start with; Airflow or Azkaban would have been overkill).
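To make the BeautifulSoup choice concrete, here is a minimal sketch of fetching a feed and pulling a few fields out of each item. get_soup is named after the import used in the main script, but this body, the parse_items helper, and the field names are illustrative assumptions, not the actual data_parser code.

    from urllib.request import urlopen
    from bs4 import BeautifulSoup

    def get_soup(feed_url):
        # Fetch the raw RSS XML and parse it with BeautifulSoup's XML parser (needs lxml)
        with urlopen(feed_url) as response:
            return BeautifulSoup(response.read(), 'xml')

    def _text(item, tag):
        # Hypothetical helper: text of a child tag if present, else None
        node = item.find(tag)
        return node.get_text(strip=True) if node else None

    def parse_items(soup):
        # Each <item> element in the feed becomes a small dict of fields
        return [
            {
                'title': _text(item, 'title'),
                'link': _text(item, 'link'),
                'published': _text(item, 'pubDate'),
            }
            for item in soup.find_all('item')
        ]

    if __name__ == '__main__':
        soup = get_soup('https://example.com/feed.xml')  # placeholder URL
        for record in parse_items(soup):
            print(record)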
