Skip to main content

Building a Simple Web App With Bottle, SQLAlchemy, and the Twitter API

Project Setup

First, Namespaces are one honking great idea so let’s do our work in a virtual environment. Using Anaconda I create it like so:
$ virtualenv -p <path-to-python-to-use> ~/virtualenvs/pytip
Create a production and a test database in Postgres:
$ psql
psql (9.6.5, server 9.6.2)
Type "help" for help.

# create database pytip;
CREATE DATABASE
# create database pytip_test;
CREATE DATABASE
We’ll need credentials to connect to the the database and the Twitter API (create a new appfirst). As per best practice configuration should be stored in the environment, not the code. Put the following env variables at the end of ~/virtualenvs/pytip/bin/activate, the script that handles activation / deactivation of your virtual environment, making sure to update the variables for your environment:
export DATABASE_URL='postgres://postgres:password@localhost:5432/pytip'
# twitter
export CONSUMER_KEY='xyz'
export CONSUMER_SECRET='xyz'
export ACCESS_TOKEN='xyz'
export ACCESS_SECRET='xyz'
# if deploying it set this to 'heroku'
export APP_LOCATION=local
In the deactivate function of the same script, I unset them so we keep things out of the shell scope when deactivating (leaving) the virtual environment:
unset DATABASE_URL
unset CONSUMER_KEY
unset CONSUMER_SECRET
unset ACCESS_TOKEN
unset ACCESS_SECRET
unset APP_LOCATION
Now is a good time to activate the virtual environment:
$ source ~/virtualenvs/pytip/bin/activate
Clone the repo and, with the virtual environment enabled, install the requirements:
$ git clone https://github.com/pybites/pytip && cd pytip
$ pip install -r requirements.txt
Next, we import the collection of tweets with:
$ python tasks/import_tweets.py
Then, verify that the tables were created and the tweets were added:
$ psql

\c pytip

pytip=# \dt
          List of relations
 Schema |   Name   | Type  |  Owner
--------+----------+-------+----------
 public | hashtags | table | postgres
 public | tips     | table | postgres
(2 rows)

pytip=# select count(*) from tips;
 count
-------
   222
(1 row)

pytip=# select count(*) from hashtags;
 count
-------
    27
(1 row)

pytip=# \q
Now let’s run the tests:
$ pytest
========================== test session starts ==========================
platform darwin -- Python 3.6.2, pytest-3.2.3, py-1.4.34, pluggy-0.4.0
rootdir: realpython/pytip, inifile:
collected 5 items

tests/test_tasks.py .
tests/test_tips.py ....

========================== 5 passed in 0.61 seconds ==========================
And lastly run the Bottle app with:
$ python app.py
Browse to http://localhost:8080 and voilà: you should see the tips sorted descending on popularity. Clicking on a hashtag link at the left, or using the search box, you can easily filter them. Here we see the pandas tips for example:
The design I made with MUI - a lightweight CSS framework that follows Google’s Material Design guidelines.

Implementation Details

The DB and SQLAlchemy

I used SQLAlchemy to interface with the DB to prevent having to write a lot of (redundant) SQL.
In tips/models.py, we define our models - Hashtag and Tip - that SQLAlchemy will map to DB tables:
from sqlalchemy import Column, Sequence, Integer, String, DateTime
from sqlalchemy.ext.declarative import declarative_base

Base = declarative_base()


class Hashtag(Base):
    __tablename__ = 'hashtags'
    id = Column(Integer, Sequence('id_seq'), primary_key=True)
    name = Column(String(20))
    count = Column(Integer)

    def __repr__(self):
        return "<Hashtag('%s', '%d')>" % (self.name, self.count)


class Tip(Base):
    __tablename__ = 'tips'
    id = Column(Integer, Sequence('id_seq'), primary_key=True)
    tweetid = Column(String(22))
    text = Column(String(300))
    created = Column(DateTime)
    likes = Column(Integer)
    retweets = Column(Integer)

    def __repr__(self):
        return "<Tip('%d', '%s')>" % (self.id, self.text)
In tips/db.py, we import these models, and now it’s easy to work with the DB, for example to interface with the Hashtag model:
def get_hashtags():
    return session.query(Hashtag).order_by(Hashtag.name.asc()).all()
And:
def add_hashtags(hashtags_cnt):
    for tag, count in hashtags_cnt.items():
        session.add(Hashtag(name=tag, count=count))
    session.commit()

Query the Twitter API

We need to retrieve the data from Twitter. For that, I created tasks/import_tweets.py. I packaged this under tasks because it should be run in a daily cronjob to look for new tips and update stats (number of likes and retweets) on existing tweets. For the sake of simplicity I have the tables recreated daily. If we start to rely on FK relations with other tables we should definitely choose update statements over delete+add.
We used this script in the Project Setup. Let’s see what it does in more detail.
First, we create an API session object which we pass to tweepy.Cursor. This feature of the API is really nice: it deals with pagination, iterating through the timeline. For the amount of tips - 222 at the time I write this - it’s really fast. The exclude_replies=True and include_rts=False arguments are convenient because we only want Daily Python Tip’s own tweets (not re-tweets).
Extracting hashtags from the tips requires very little code.
First, I defined a regex for a tag:
TAG = re.compile(r'#([a-z0-9]{3,})')
Then, I used findall to get all tags.
I passed them to collections.Counter which returns a dict like object with the tags as keys, and counts as values, ordered in descending order by values (most common). I excluded the too common python tag which would skew the results.
def get_hashtag_counter(tips):
    blob = ' '.join(t.text.lower() for t in tips)
    cnt = Counter(TAG.findall(blob))

    if EXCLUDE_PYTHON_HASHTAG:
        cnt.pop('python', None)

    return cnt
Finally, the import_* functions in tasks/import_tweets.py do the actual import of the tweets and hashtags, calling add_* DB methods of the tips directory/package.

Make a Simple web app with Bottle

With this pre-work done, making a web app is surprisingly easy (or not so surprising if you used Flask before).
First of all meet Bottle:
Bottle is a fast, simple and lightweight WSGI micro web-framework for Python. It is distributed as a single file module and has no dependencies other than the Python Standard Library.
Nice. The resulting web app comprises of < 30 LOC and can be found in app.py.
For this simple app, a single method with an optional tag argument is all it takes. Similar to Flask, the routing is handled with decorators. If called with a tag it filters the tips on tag, else it shows them all. The view decorator defines the template to use. Like Flask (and Django) we return a dict for use in the template.
@route('/')
@route('/<tag>')
@view('index')
def index(tag=None):
    tag = tag or request.query.get('tag') or None
    tags = get_hashtags()
    tips = get_tips(tag)

    return {'search_tag': tag or '',
            'tags': tags,
            'tips': tips}
As per documentation, to work with static files, you add this snippet at the top, after the imports:
@route('/static/<filename:path>')
def send_static(filename):
    return static_file(filename, root='static')
Finally, we want to make sure we only run in debug mode on localhost, hence the APP_LOCATION env variable we defined in Project Setup:
if os.environ.get('APP_LOCATION') == 'heroku':
    run(host="0.0.0.0", port=int(os.environ.get("PORT", 5000)))
else:
    run(host='localhost', port=8080, debug=True, reloader=True)

Bottle Templates

Bottle comes with a fast, powerful and easy to learn built-in template engine called SimpleTemplate.
In the views subdirectory I defined a header.tplindex.tpl, and footer.tpl. For the tag cloud, I used some simple inline CSS increasing tag size by count, see header.tpl:
% for tag in tags:
  <a style="font-size: {{ tag.count/10 + 1 }}em;" href="/{{ tag.name }}">#{{ tag.name }}</a>&nbsp;&nbsp;
% end
In index.tpl we loop over the tips:
% for tip in tips:
  <div class='tip'>
    <pre>{{ !tip.text }}</pre>
    <div class="mui--text-dark-secondary"><strong>{{ tip.likes }}</strong> Likes / <strong>{{ tip.retweets }}</strong> RTs / {{ tip.created }} / <a href="https://twitter.com/python_tip/status/{{ tip.tweetid }}" target="_blank">Share</a></div>
  </div>
% end
If you are familiar with Flask and Jinja2 this should look very familiar. Embedding Python is even easier, with less typing—(% ... vs {% ... %}).
All css, images (and JS if we’d use it) go into the static subfolder.
And that’s all there is to making a basic web app with Bottle. Once you have the data layer properly defined it’s pretty straightforward.

Add tests with pytest

Now let’s make this project a bit more robust by adding some tests. Testing the DB required a bit more digging into the pytest framework, but I ended up using the pytest.fixturedecorator to set up and tear down a database with some test tweets.
Instead of calling the Twitter API, I used some static data provided in tweets.json. And, rather than using the live DB, in tips/db.py, I check if pytest is the caller (sys.argv[0]). If so, I use the test DB. I probably will refactor this, because Bottle supports working with config files.
The hashtag part was easier to test (test_get_hashtag_counter) because I could just add some hashtags to a multiline string. No fixtures needed.

Code quality matters - Better Code Hub

Better Code Hub guides you in writing, well, better code. Before writing the tests the project scored a 7:
BCH screenshot #1
Not bad, but we can do better:
  1. I bumped it to a 9 by making the code more modular, taking the DB logic out of the app.py (web app), putting it in the tips folder/ package (refactorings 1 and 2)
  2. Then with the tests in place the project scored a 10:
BCH screenshot #2

Popular posts from this blog

How to read or extract text data from passport using python utility.

Hi ,  Lets get start with some utility which can be really helpful in extracting the text data from passport documents which can be images, pdf.  So instead of jumping to code directly lets understand the MRZ, & how it works basically. MRZ Parser :                 A machine-readable passport (MRP) is a machine-readable travel document (MRTD) with the data on the identity page encoded in optical character recognition format Most travel passports worldwide are MRPs.  It can have 2 lines or 3 lines of machine-readable data. This method allows to process MRZ written in accordance with ICAO Document 9303 (endorsed by the International Organization for Standardization and the International Electrotechnical Commission as ISO/IEC 7501-1)). Some applications will need to be able to scan such data of someway, so one of the easiest methods is to recognize it from an image file. I 'll show you how to retrieve the MRZ information from a picture of a passport using the PassportE

How to generate class diagrams pictures in a Django/Open-edX project from console

A class diagram in the Unified Modeling Language ( UML ) is a type of static structure diagram that describes the structure of a system by showing the system’s classes, their attributes, operations (or methods), and the relationships among objects. https://github.com/django-extensions/django-extensions Step 1:   Install django extensions Command:  pip install django-extensions Step 2:  Add to installed apps INSTALLED_APPS = ( ... 'django_extensions' , ... ) Step 3:  Install diagrams generators You have to choose between two diagram generators: Graphviz or Dotplus before using the command or you will get: python manage.py graph_models -a -o myapp_models.png Note:  I prefer to use   pydotplus   as it easier to install than Graphviz and its dependencies so we use   pip install pydotplus . Command:  pip install pydotplus Step 4:  Generate diagrams Now we have everything installed and ready to generate diagrams using the comm

How to Remove course from Open-edX

Go to vagrant  => 1. In the edx-platform directory:  - cd /edx/app/edxapp/edx-platform 2. Run the following Django management command:   - sudo -u www-data /edx/bin/python.edxapp /edx/bin/manage.edxapp lms dump_course_ids --settings aws    - sudo -u www-data /edx/bin/python.edxapp /edx/bin/manage.edxapp lms dump_course_ids --settings=devstack 3. Find the course ID which you'd like to delete in the resulting list of course IDs. 4. Copy the course ID into the following command and run it:  - sudo -u www-data /edx/bin/python.edxapp /edx/bin/manage.edxapp cms delete_course <COURSE_ID> --settings aws  -   sudo -u www-data /edx/bin/python.edxapp /edx/bin/manage.edxapp cms delete_course <COURSE_ID> --settings=devstack  - You'll be asked to verify the deletion . To verify the deletion, run the command from step 2 above and ensure that the course ID is not in the list. Help reference : https://openedx.atlassian.net/wiki/spa