Using OpenShift for my MongoDB homework
I've been quite a fan of Red Hat's PaaS, OpenShift, since it came out last year, but I have been somewhat distracted by various MOOCs and OpenStack-related activities since then. Last week I was at EuroPython 2013 in Florence, so I took the opportunity to attend Steve Citron-Pousty's OpenShift workshop.
Eager to play with OpenShift some more, I wondered whether I could use it, because I can, in some of the MOOC courses I've been doing. They may not be the best use cases for OpenShift: some are just command-line invocations of Python/MongoDB (10gen's MongoDB M101P course), others are examples of TDD/BDD with Rails (edX's excellent SaaS course), and the best suited is the crowdsourcing web site to be built for Coursera's Startup Engineering course.
We'll see what I have time for ... the summer is going to be hot ... too many MOOCs!
So as a first try I thought I'd see if I could do my week3 homework of 10gen's M101P "MongoDB for (Python) Developers" course.
I'll step through the mechanics of creating the environment for this homework, but of course I won't show my solution to the homework itself. The goal is to show that OpenShift can provide a usable environment, in this case Python/MongoDB, not to write an actual web application, which is what OpenShift is really for.
I'll assume that you've already created your free OpenShift account here and you've already installed the rhc command-line tool.
Create the application
So let's create a new OpenShift application to allow us to use Python and MongoDB and cd to the newly created directory
rhc app create M101Pweek3 python-2.6 mongodb-2.2
Verify the ssh/scp information for this app so that we can copy files across to the OpenShift application we just created.
rhc app show
Although we can simply connect using the "rhc app ssh" command, we also want to be able to scp files to our application, so we need to know which ssh URL to use. Look at the line of the form
Verify that you can connect with ssh using the login shown by the "rhc app show" command.
For example if the above was the output, then
ssh -i ~/.ssh/id_rsa email@example.com
should let you log in over ssh.
Copy the course handouts to the application
Download the week3 course handouts provided for the 10gen homework here, for example into a local directory named week3. Then copy them to the application using the command
scp -r -i ~/.ssh/id_rsa week3/ firstname.lastname@example.org:app-root/data/
HW3.1 - Import the sample data
Now we can log in to the application environment, a Linux CGroup, and do the homework.
Let's log in now and import the data. For this we use a command line that is a bit more complicated than usual, as we need to pass the OpenShift environment variables that give access to our application's private MongoDB instance.
rhc app ssh
Then import the data:
mongoimport --headerline --type json \
--host $OPENSHIFT_MONGODB_DB_HOST \
--port $OPENSHIFT_MONGODB_DB_PORT \
--username $OPENSHIFT_MONGODB_DB_USERNAME \
--password $OPENSHIFT_MONGODB_DB_PASSWORD \
-d $OPENSHIFT_APP_NAME -c students \
NOTE: It seems that in the OpenShift environment you must use the application name as the database name (rather than 'school'). Although data can be imported into school I could not access that database from a Python script. Using $OPENSHIFT_APP_NAME solved this.
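As a rough sketch of the same idea in Python, the import command can be assembled from the OPENSHIFT_* variables so that the database name always follows the application name. The helper name build_mongoimport_args, the sample values, and the students.json file name are assumptions made for illustration. The --headerline flag is left out here because, as far as I know, it only applies to CSV and TSV input.

```python
def build_mongoimport_args(env, json_file, collection="students"):
    """Build a mongoimport argument list from OpenShift-style variables.

    `env` is any mapping (os.environ on the gear, a plain dict when testing).
    `json_file` is a hypothetical path to the data file to import.
    """
    return [
        "mongoimport",
        "--type", "json",
        "--host", env["OPENSHIFT_MONGODB_DB_HOST"],
        "--port", str(env["OPENSHIFT_MONGODB_DB_PORT"]),
        "--username", env["OPENSHIFT_MONGODB_DB_USERNAME"],
        "--password", env["OPENSHIFT_MONGODB_DB_PASSWORD"],
        # Use the application name as the database name, not 'school'.
        "-d", env["OPENSHIFT_APP_NAME"],
        "-c", collection,
        "--file", json_file,
    ]


# Made-up values standing in for what the gear would provide.
sample_env = {
    "OPENSHIFT_MONGODB_DB_HOST": "127.0.0.1",
    "OPENSHIFT_MONGODB_DB_PORT": "27017",
    "OPENSHIFT_MONGODB_DB_USERNAME": "admin",
    "OPENSHIFT_MONGODB_DB_PASSWORD": "secret",
    "OPENSHIFT_APP_NAME": "m101pweek3",
}

args = build_mongoimport_args(sample_env, "students.json")
print(" ".join(args))
```

On the gear itself you would pass os.environ instead of sample_env and hand the resulting list to subprocess.call.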
Then check that you have 200 entries as expected. Note that typing mongo runs a bash function defined in your environment rather than /usr/bin/mongo directly (when I tried calling mongo from a script it did not work, presumably because the script did not pick up that function):
MongoDB shell version: 2.2.3
connecting to: 127.7.153.2:27017/admin
Welcome to the MongoDB shell.
For interactive help, type "help".
For more comprehensive documentation, see
Questions? Try the support group
> use m101pweek3
switched to db m101pweek3
> show collections
I found a useful page about using MongoDB under OpenShift here.
HW3.1 - Writing the Python script
I first took the Python script I wrote last week for HW2.2 and updated this to work in the OpenShift environment.
I had some trouble at this step, partly because the PyMongo module provided is an older version (2.3), as we get Python 2.6 by default. This only seemed to require using a Connection object rather than MongoClient.
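MongoClient was introduced in PyMongo 2.4, so a 2.3 install only offers Connection. One way to keep the same script working on both is to pick whichever class the installed driver has. This is only a sketch: pick_client_class and the two fake driver classes are made up for illustration, and on the gear you would pass the real pymongo module.

```python
def pick_client_class(driver):
    """Return driver.MongoClient if it exists, else the older driver.Connection."""
    client = getattr(driver, "MongoClient", None)
    if client is not None:
        return client
    return driver.Connection


# Stand-ins for an old (2.3-style) and a newer driver, so the sketch
# runs without pymongo installed.
class FakeOldDriver(object):
    Connection = "Connection"


class FakeNewDriver(object):
    Connection = "Connection"
    MongoClient = "MongoClient"


print(pick_client_class(FakeOldDriver))
print(pick_client_class(FakeNewDriver))
```

With the real driver this would be used as client_cls = pick_client_class(pymongo) followed by conn = client_cls(host, port), since both classes accept a host and a port.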
The other problem was that I could not access my school database. I could connect, I could authenticate (I took advice from this example script here) but attempts to use the database failed with authentication errors.
Importing into the db 'm101pweek3', i.e. our application name (as given by the OPENSHIFT_APP_NAME environment variable), solved this problem.
#!/usr/bin/env python

from pymongo import *
import sys
import os

if not (os.getenv("OPENSHIFT_MONGODB_DB_HOST") and
        os.getenv("OPENSHIFT_MONGODB_DB_PORT") and
        os.getenv("OPENSHIFT_MONGODB_DB_USERNAME") and
        os.getenv("OPENSHIFT_MONGODB_DB_PASSWORD")):
    print "Missing OPENSHIFT_MONGODB_DB_* env variables"
    sys.exit(1)

host = os.getenv("OPENSHIFT_MONGODB_DB_HOST")
port = int(os.getenv("OPENSHIFT_MONGODB_DB_PORT"))
user = os.getenv("OPENSHIFT_MONGODB_DB_USERNAME")
passwd = os.getenv("OPENSHIFT_MONGODB_DB_PASSWORD")

#db = MongoClient().school
conn = Connection(host, port)
db = conn[os.getenv("OPENSHIFT_APP_NAME")]
#db = conn['school']
db.authenticate(user,passwd)

print "DB 'school' has collections: " + str(db.collection_names())

collection=db.students

print
print "find_one:" + str(collection.find_one())
print
print "Number of documents in 'school.students' => count:" + \
    str(collection.count())
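The environment check at the top of the script only says that something is missing. A small variation, sketched below, reports exactly which variables are absent and also covers OPENSHIFT_APP_NAME, which the script relies on for the database name. The names REQUIRED_VARS, missing_vars and load_settings are invented for this example, and the sample values are made up.

```python
REQUIRED_VARS = (
    "OPENSHIFT_MONGODB_DB_HOST",
    "OPENSHIFT_MONGODB_DB_PORT",
    "OPENSHIFT_MONGODB_DB_USERNAME",
    "OPENSHIFT_MONGODB_DB_PASSWORD",
    "OPENSHIFT_APP_NAME",
)


def missing_vars(env, required=REQUIRED_VARS):
    """Return the required variable names that are unset or empty in env."""
    return [name for name in required if not env.get(name)]


def load_settings(env):
    """Collect connection settings, failing with the list of missing names."""
    missing = missing_vars(env)
    if missing:
        raise KeyError("Missing env variables: " + ", ".join(missing))
    return {
        "host": env["OPENSHIFT_MONGODB_DB_HOST"],
        "port": int(env["OPENSHIFT_MONGODB_DB_PORT"]),
        "user": env["OPENSHIFT_MONGODB_DB_USERNAME"],
        "passwd": env["OPENSHIFT_MONGODB_DB_PASSWORD"],
        # The database name follows the application name.
        "dbname": env["OPENSHIFT_APP_NAME"],
    }


# Made-up values; on the gear you would pass os.environ instead.
example_env = {
    "OPENSHIFT_MONGODB_DB_HOST": "127.0.0.1",
    "OPENSHIFT_MONGODB_DB_PORT": "27017",
    "OPENSHIFT_MONGODB_DB_USERNAME": "admin",
    "OPENSHIFT_MONGODB_DB_PASSWORD": "secret",
    "OPENSHIFT_APP_NAME": "m101pweek3",
}

settings = load_settings(example_env)
print(settings["dbname"])
print(missing_vars({"OPENSHIFT_APP_NAME": "m101pweek3"}))
```

Passing a mapping rather than reading os.environ directly keeps the check easy to try out away from the gear.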
Now you're on your own to do the actual homework!
This first attempt took me longer than I expected, as there are a few details to take care of in the OpenShift environment. Nevertheless, I was pleased that I was able to import the problem data set and write a basic Python script to access that data.
I will try again later, and I will probably choose Python 2.7 - it seems this is now available in OpenShift.