Sunday, July 07, 2013

Using OpenShift for my MongoDB homework


I've been quite a fan of Red Hat's PaaS, OpenShift, since it came out last year, but I have been somewhat distracted by various MOOCs and OpenStack-related activities over the past year.  Last week I was at EuroPython 2013 in Florence, so I took the opportunity to attend Steve Citron-Pousty's OpenShift workshop.

Eager to play with OpenShift some more, I wondered whether I could use it - because I can - in some of the MOOCs I've been taking.  They may not be the best use cases for OpenShift: some involve only command-line use of Python and MongoDB (10gen's MongoDB M101P course), others are TDD/BDD Rails exercises (edX's excellent SaaS course), and the best suited is the crowdsourcing web site to be built for Coursera's Startup Engineering course.

We'll see what I have time for ... the summer is going to be hot ... too many MOOCs!

As a first try, I thought I'd see whether I could do my week 3 homework for 10gen's M101P "MongoDB for (Python) Developers" course.

I'll step through the mechanics of creating the environment for this homework, but of course I won't show my solution to the homework itself.  The goal is to show that OpenShift can provide a usable environment, in this case Python/MongoDB, not to write an actual web application, which is what OpenShift is really for.

I'll assume that you've already created your free OpenShift account here and you've already installed the rhc command-line tool.

Create the application

So let's create a new OpenShift application that gives us Python and MongoDB, then cd to the newly created directory:

    rhc app create M101Pweek3 python-2.6 mongodb-2.2
    cd m101pweek3


Verify the ssh/scp information for this app so that we can copy files across to the OpenShift application we just created.

    rhc app show


Although we can simply connect with the "rhc app ssh" command, we also want to be able to scp files to our application, so we need to know which ssh URL to use.  Look for the line of the form

    SSH: 47545664@m101pweek3-yourdomain.rhcloud.com


Verify that you can connect with ssh using the login shown by the "rhc app show" command.
For example, if the above was the output, then

    ssh -i ~/.ssh/id_rsa 47545664@m101pweek3-yourdomain.rhcloud.com

should let you log in over ssh.
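
If you would rather script this step than copy the login by hand, a minimal Python sketch could pull it out of the "rhc app show" output. It assumes only that the output contains a line of the form "SSH: user@host" as shown above; the function names ssh_login and scp_target are made up for this example.

```python
import re


def ssh_login(app_show_output):
    """Return the user@host login from a line of the form 'SSH: user@host'."""
    match = re.search(r"^\s*SSH:\s*(\S+@\S+)\s*$", app_show_output, re.MULTILINE)
    if match is None:
        raise ValueError("no 'SSH:' line found in the rhc output")
    return match.group(1)


def scp_target(login, remote_dir="app-root/data/"):
    """Build the scp destination for the application's data directory."""
    return "%s:%s" % (login, remote_dir)
```

Feeding the captured output to ssh_login and the result to scp_target gives the destination string to pass to scp.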


Copy the course handouts to the application


Download the week3 course handouts, provided for the 10gen homework here, to a local directory (week3, for example).  Then copy them to the application using the command

    scp -r -i ~/.ssh/id_rsa week3/ 47545664@m101pweek3-yourdomain.rhcloud.com:app-root/data/


HW3.1 - Import the sample data


Now we can log into the application environment, a Linux CGroup, and do the homework.
Let's log in now and import the data.  For this we use a command line that is a bit more complicated than usual, as we need to specify the OpenShift environment variables used to access our application's private MongoDB instance.

First log in:
    rhc app ssh

Then import the data:
    mongoimport --headerline --type json \
        --host $OPENSHIFT_MONGODB_DB_HOST \
        --port $OPENSHIFT_MONGODB_DB_PORT \
        --username $OPENSHIFT_MONGODB_DB_USERNAME \
        --password $OPENSHIFT_MONGODB_DB_PASSWORD \
        -d $OPENSHIFT_APP_NAME -c students \
        < ./app-root/data/week3/students.e7ed0a289cbe.js


NOTE: It seems that in the OpenShift environment you must use the application name as the database name (rather than 'school').  Although data can be imported into school I could not access that database from a Python script.  Using $OPENSHIFT_APP_NAME solved this.
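
To make it clearer how the environment variables map onto the mongoimport options, here is a small Python sketch that builds the same argument list. It is only an illustration: mongoimport_args is a made-up helper name, the environment is passed in as a plain dictionary, and the sketch leaves out --headerline, which mongoimport documents for CSV and TSV input.

```python
def mongoimport_args(env, collection, db=None):
    """Build a mongoimport argument list from OpenShift-style variables.

    env is a mapping such as os.environ. When db is not given, the
    application name is used as the database name, as in the note above.
    """
    return [
        "mongoimport",
        "--type", "json",
        "--host", env["OPENSHIFT_MONGODB_DB_HOST"],
        "--port", env["OPENSHIFT_MONGODB_DB_PORT"],
        "--username", env["OPENSHIFT_MONGODB_DB_USERNAME"],
        "--password", env["OPENSHIFT_MONGODB_DB_PASSWORD"],
        "-d", db or env["OPENSHIFT_APP_NAME"],
        "-c", collection,
    ]
```

The list can be handed to subprocess.call together with stdin set to the opened handout file, which plays the role of the "<" redirection in the shell command.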

Then check that you have 200 entries as expected.  Note that typing mongo runs a bash function defined in your environment rather than /usr/bin/mongo directly (when I tried calling mongo from a script it did not work, due to some missing magic):


  $ mongo 
  MongoDB shell version: 2.2.3
  connecting to: 127.7.153.2:27017/admin
  Welcome to the MongoDB shell.
  For interactive help, type "help".
  For more comprehensive documentation, see
      http://docs.mongodb.org/
  Questions? Try the support group
      http://groups.google.com/group/mongodb-user
  > use m101pweek3
  switched to db m101pweek3
  > show collections
  students
  system.indexes
  > db.students.count()
  200

I found a useful page about using MongoDB under OpenShift here.


HW3.1 - Writing the Python script

I first took the Python script I wrote last week for HW2.2 and updated this to work in the OpenShift environment.

I had some trouble at this step, partly because the PyMongo module provided is an older version (2.3), as we are given Python 2.6 by default.  This only seemed to require using a Connection object rather than MongoClient.
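
For a script that should run against both older and newer PyMongo releases, one option is to pick the client class at run time. MongoClient was added in PyMongo 2.4 and Connection was removed in PyMongo 3.0, so a sketch like the following (client_class is a made-up helper name, not part of PyMongo) falls back to Connection only when MongoClient is missing.

```python
def client_class(pymongo_module):
    """Return MongoClient when the module provides it, else Connection."""
    cls = getattr(pymongo_module, "MongoClient", None)
    if cls is None:
        # Older PyMongo releases (such as 2.3) only ship Connection.
        cls = pymongo_module.Connection
    return cls


# Typical use, assuming pymongo is installed:
#   import pymongo
#   conn = client_class(pymongo)(host, port)
```

Passing the module in as an argument keeps the helper easy to try out without a MongoDB server at hand.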

The other problem was that I could not access my school database.  I could connect and I could authenticate (I took advice from this example script here), but attempts to use the database failed with authentication errors.

Importing into db 'm101pweek3', i.e. our application name (as per the OPENSHIFT_APP_NAME environment variable), solved this problem.

#!/usr/bin/env python
# Python 2 script: connect to the application's private MongoDB instance.

import os
import sys

# PyMongo 2.3 provides Connection but not MongoClient.
from pymongo import Connection

# All of these must be set to reach and select the database.
required = ("OPENSHIFT_MONGODB_DB_HOST",
            "OPENSHIFT_MONGODB_DB_PORT",
            "OPENSHIFT_MONGODB_DB_USERNAME",
            "OPENSHIFT_MONGODB_DB_PASSWORD",
            "OPENSHIFT_APP_NAME")
if not all(os.getenv(name) for name in required):
    print "Missing OPENSHIFT_* env variables"
    sys.exit(1)

host = os.getenv("OPENSHIFT_MONGODB_DB_HOST")
port = int(os.getenv("OPENSHIFT_MONGODB_DB_PORT"))
user = os.getenv("OPENSHIFT_MONGODB_DB_USERNAME")
passwd = os.getenv("OPENSHIFT_MONGODB_DB_PASSWORD")
dbname = os.getenv("OPENSHIFT_APP_NAME")

conn = Connection(host, port)

# Use the database named after the application; using conn['school']
# failed with authentication errors.
db = conn[dbname]
db.authenticate(user, passwd)

print "DB '%s' has collections: %s" % (dbname, db.collection_names())

collection = db.students

print
print "find_one: " + str(collection.find_one())
print
print "Number of documents in '%s.students' => count: %d" % (
    dbname, collection.count())
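
One possible refinement of the check at the top of the script is to report exactly which variables are missing. The sketch below is an assumption about how that could look rather than part of my homework script: mongo_settings is a made-up helper name, it takes the environment as a mapping so it can be tried without OpenShift, and it avoids print so that it runs unchanged on Python 2.6 and Python 3.

```python
import os

REQUIRED_VARS = (
    "OPENSHIFT_MONGODB_DB_HOST",
    "OPENSHIFT_MONGODB_DB_PORT",
    "OPENSHIFT_MONGODB_DB_USERNAME",
    "OPENSHIFT_MONGODB_DB_PASSWORD",
    "OPENSHIFT_APP_NAME",
)


def mongo_settings(env=None):
    """Collect the MongoDB connection settings from the environment.

    Raises ValueError naming every variable that is unset or empty.
    """
    if env is None:
        env = os.environ
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise ValueError("missing environment variables: " + ", ".join(missing))
    return {
        "host": env["OPENSHIFT_MONGODB_DB_HOST"],
        "port": int(env["OPENSHIFT_MONGODB_DB_PORT"]),
        "user": env["OPENSHIFT_MONGODB_DB_USERNAME"],
        "password": env["OPENSHIFT_MONGODB_DB_PASSWORD"],
        # The application name doubles as the database name.
        "database": env["OPENSHIFT_APP_NAME"],
    }
```

The returned dictionary carries the same five values the script reads one by one, with the port already converted to an integer.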

Now you're on your own to do the actual homework!



Conclusion


Well, this first attempt took me longer than I expected, as there are a few details to take care of in the OpenShift environment.  Nevertheless, I was pleased that I was able to import the problem data set and write a basic Python script to access that data.

I will try again later, and I will probably choose Python 2.7, which now seems to be available in OpenShift.


2 comments:

Unknown said...

You can make the DB, but I think you then need to authenticate with the username and password against the new DB.

http://api.mongodb.org/python/current/api/pymongo/database.html

Call the authenticate() method on your db and that should do it.

In your setup.py in the root of your git repo you can find a way to specify a version for pymongo.

mjbright said...

Thanks Steve, I'll have to play some more with that.
