DjangoCon US 2018 – Anatomy of Open edX – a modern online learning platforms… by Nate Aune



(“California Sun Instrumental Mix”) – Why talk about Open edX at DjangoCon? Well there’s three main reasons. Number one: it’s a Django app. Number two: it’s a highly
scalable app that’s serving over 35 million users. And number three: it’s open
source which means it’s an ideal project to study and to learn from. So my name is Nate Aune. In a former life, I was a
software engineer, using Python almost exclusively for most
of my professional career. And nowadays, I mostly
wear a business hat. And I run Appsembler which
is a small 20-person startup whose mission is to close the
skills gap by making education more accessible and scalable. And no surprise Open edX plays
a key role in how we do that. Okay, so. Here’s what we’re gonna cover today. What we’re not gonna
talk about is Open edX from an end user perspective, so the student, instructor,
author perspective. There simply wasn’t enough
time to do both that and to talk about the technology. So my hope for you today is
that by the end of this talk, you will want to go
take an Open edX course, you’ll consider Open edX
for your organization, and if you’re really curious, you’ll want to go download the code and start learning for
yourself how it’s put together. Okay, so let’s get started. So, there’s been a lot of
buzz in the last few years about MOOCs, also known as
Massive Open Online Courses. And some of the services
you may have heard of, like Coursera, Udacity, Khan Academy. These are all examples of MOOC platforms which deliver online courses for free. How many people here have taken a MOOC? Wow, we got a lot of MOOC lovers
in the room, that’s great. So edX is in the same camp as
these other MOOC platforms, but unlike Coursera and Udacity, edX is a nonprofit foundation
that has open sourced the code that powers edX.org. So we’ll talk more about the
open source project in a second but first, why would you
be interested in edX? Well, if you like to learn,
then edX is a treasure trove. As of yesterday, there were
2,400 high-quality courses from top universities and
companies that you can take completely free of charge. That’s pretty cool. How many folks here have
taken an edX course? Okay. A fair number of you. So of those 2,400, 600
of them, nearly a quarter of all the courses are
computer science courses. And of those 600 computer science
courses, almost 60 of them, nearly 10%, are courses about Python. And these courses are offered
by 53 top universities including MIT, Harvard, and UC Berkeley. And it’s not just universities who are offering these courses;
there are also 62 organizations, including companies such as Amazon,
IBM, Red Hat, Microsoft, and MongoDB, as well as associations
like the Linux Foundation, W3C, IEEE, even Amnesty International. So who’s taking these courses? Well, it’s not just college students, it’s lifelong learners. So these are people who are
going through a career change, they’re learning new skills, or maybe just taking a course for fun. And today, edX has attracted
more than 17 million students from every country in the world. So who’s gonna go home tonight
and enroll in an edX course? Alright (laughs). Okay, so now hopefully
you know what edX is, if you didn’t already know about it. So what is Open edX? So the Open edX platform is an
open source web-based system for creating, delivering,
and analyzing online courses at massive scale. So as I mentioned earlier,
the entire codebase is Django, with a small sprinkling of Ruby
thrown in for good measure. So, what are people doing with Open edX? Well there are literally
hundreds of Open edX sites offering training on
every conceivable subject. But I’m just gonna show a couple examples which I think are relevant
for this audience. So, as an open source
company, MongoDB has placed a high value on fostering and
giving back to the community. And early on, they realized
that a great way to do this is by making expert
MongoDB training easily and freely available to
anyone anywhere in the world. So they launched MongoDB
University back in 2012, and today they have 18 free
courses from introductory level to mastery level, and over
one million learners have registered on the site, a site
which is powered by Open edX. And taking a page out of the
MongoDB playbook, Redis Labs, also an open source company,
launched the Redis University earlier this year, and
they’ve already had thousands of learners enroll in these free courses. So you might be asking, well what’s in it for these companies? Why give away courses instead
of charging money for them? Well, if you make an open source product, you often don’t know
who’s using your software. So, by offering training for
free, it’s a way to find out who these people are and see
if they might be interested in other services that you offer. So one of the really unique
aspects of the Redis University site is the focus on learning by doing. And to make this super easy,
they’re using our virtual labs tool to embed labs right within
the course, so this means that each learner has
their own personal sandbox in which to do these exercises. So all the learner has to
do is click on this button that says click to launch
your Python workbench lab, and within seconds, they see
a URL for their own personal cloud-based development
environment, which looks like this. So there’s a file browser,
there’s a code editor, there’s a Linux Shell,
everything the student needs to do the exercises. They don’t need to download
or install any software onto their computer
because this environment is completely accessible using
nothing but a web browser. And the actual environment is
running in a Docker container on a server somewhere. So the ability to embed
these labs is one example of how extensible the
Open edX platform is. Another example is this
assessment in which the student input is evaluated against
a Python expression. So here we can see the
student is being asked to enter two numbers
that should sum to 10. A very trivial example, but
the way that you author this would be going into Studio which is this very powerful
course authoring tool. Here we see the course outline
showing all the sections of the course that can be
edited, and if you’re logged in as a course author you
can click on this section that says Custom Python-Evaluated Input. And you’ll see something like
this which is a markup editor, and it looks like we’re editing
HTML, but lo and behold, right up there in that
red box, what’s that? Some Python code. So you can actually embed
Python scripts right within the assessment editor
that evaluate the student’s input against a Python function. And Open edX has a plugin architecture called XBlocks which
lets you create virtually any kind of assessment imaginable. Alright, so, looking at
any software product, you wanna see how it’s growing. Who’s adopting it, and at what pace. And it’s no different
with open source projects like Open edX. So what do you look for
in an open source project? The one thing I look at is
whether there’s a healthy community of participants. These don’t necessarily have
to be developers, in fact, it’s better if they’re end users. So looking at edX, the mailing list has almost 4,000 members and
around 50-120 posts every month which has diminished in the
last year because of the rising popularity of Slack, which
now has close to 3,000 members and 150-300 weekly active users. There might even be people still chatting in the IRC channel. So the membership of users on
the Open edX Slack community has grown consistently
over the last 2 1/2 years. Another thing you can look at
is how many contributors there are, and are those numbers going up. So according to GitHub, the number of individual
contributors is around 400 and the repo has been
forked over 2,000 times. And you can see that it’s
kinda flattening out, and one reason I suspect
that the number of commits per month has flattened
is that the codebase is getting more mature,
and the rate of change is not as rapid as it
was in the earlier days. And the innovation is also
happening in other repos: as they’re breaking up
this monolithic codebase, they’re splitting it out into
microservices, so a lot of the innovative components of Open
edX are actually being developed in different repos, while the core platform is
not seeing as many changes. So looking at this growth
over the last four years, you can see some pretty
incredible growth especially in the last year as the number
of sites and courses offered on Open edX have just shot
up, and in this graph, the sites are red and
the courses are blue. So to date, they’ve identified
over 1,500 Open edX sites and over 18,000 courses are offered. And of course, this
doesn’t count the sites that they don’t know about. You might be asking,
well it’s open source, how do they know how many
people are using the software? Well, edX very cleverly
included a logo in the footer of every Open edX site that’s
being pulled from an S3 bucket that they manage, and then
they have a script that crawls all of these sites to see
how many courses they have, and that’s how they were able
to generate these statistics. And they’ve been running this
script since 2014 and they just keep finding new ones
popping up all over the place. So, with over 17 million
learners on edX.org and 18 million on Open edX, there’s over 35 million
on the Open edX platform. And last year was a really exciting time. It was the first time that the number of Open edX sites and learners
eclipsed that of edX.org. And that number is just
expected to keep growing as more organizations are
adopting the platform. I think I heard that the
national government in China has something like 17
million just in itself. So what are some of these organizations? Well, some of the biggest
companies are using Open edX. And some of the fastest
growing technology companies have adopted Open edX. These are just a few of them. And nine of the 39 MOOC sites
on Class Central are built on Open edX, which is pretty
incredible considering that no other platform even comes close. We’ve also seen national
platforms choosing Open edX. There are platforms from at
least 10 national governments. And because it’s open source, anyone can contribute translation files, to translate it into
their native language. So to date, there’s been
translation work done in 73 different languages
including over 97% completed strings for French, Spanish, and Chinese. So where are all these
contributions coming from? Well, one of the great
things about open source is the pace of innovation,
and Open edX has attracted code contributions from many
companies and individuals. And here you can see some of
the top contributors include Stanford, Google, Microsoft, MIT. My company Appsembler is also
contributing enhancements to the native mobile
apps and Figures, which is a lightweight reporting app
that sits on top of Open edX. So looking over the last three years, the number of contributing
organizations continues to grow, with over 50 who have signed
the contributors agreement. And similar to the Python PEP
process, whereby community members can propose improvements
to the Python language, Open edX has adopted something similar with the OEP, or Open
edX Proposal Process. So there are 23 OEPs
that have been adopted and many more that are under review. So what has been said
about the Python community I think can be said for Open edX: you come for the code, you
stay for the community. And I think this is one of those aspects of an open source project
that should not be underestimated. We all know the Python and
Django community is wonderfully welcoming, diverse, and inclusive. This is a photo from the hack
day at the Open edX conference at Stanford a few years ago,
and there have been five annual conferences so far; the
last one was in Montreal and the next one will be right
here in San Diego in March. So who here is curious to
take a look at Open edX for their company or organization? We got a few people, alright. So, what is Open edX made of? Most of us in the room are developers, how many developers do we have? Raise your hand. Great, how many of you are
Python/Django developers? I would expect most of you. Alright, so I mentioned
earlier that Open edX is a Python application using
the Django web framework, but it has a lot of code
from other languages as well. So before we look at the breakdown
of languages, does anyone wanna guess how many Djangos
will fit in the edX platform? So Django is roughly
228,000 lines of code. How many Djangos would
fit in the edX platform? Anyone wanna guess?
– Ten?
– Ten.
– One.
– Six.
– Six.
– Two.
– Three.
– Three, who said three? Alright, you’re pretty close. It’s 1.87. So the edX platform has
around 427,000 lines of code. And that slide is actually
old because there are now over one million lines of code just in the edX platform repository. But if you look at the
number of code lines and you subtract the number
of comments and blank lines, then it’s roughly the same as that slide. So it’s predominately Python code, 65%, and the next highest language
is JavaScript at 26%. Alright, so you might be asking yourself, how does an application scale
to serve that many users? Well, if you’re like me, you
like to take things apart and figure out how they’re constructed. What are these parts and
how do they fit together? So where do you start with a
system as massive as Open edX? Well, in the early days,
this was around 2013, there wasn’t much documentation
on Open edX, it was still, you know, it was a baby of an
open source project, and this was the only known diagram
describing the architecture. And it was really intimidating, you know, you look at this thing,
you think, over-engineered? Yes, no, maybe? But one thing to remember
when you’re doing things on a small scale, you can
make certain design decisions that fall apart when you’re
operating at a large scale. So here’s a good example. Once edX started collecting all
this learner analytics data, they needed a way to analyze it to figure out how learners learn. And as you can imagine, if
you have 17 million learners, you’re generating
terabytes of data every day. And this is a treasure trove
if you’re a researcher, right? So how do you process all this data and glean meaningful insights out of it? So edX built what they
call the edX Analytics Pipeline, which takes student data from MySQL, course data from
MongoDB, and event data from the tracking
logs, it aggregates them all and it pushes them up to
Hadoop for processing, and then finally the results are pushed
back into a MySQL database. And once all the results are
in MySQL, another Django app called edX Insights
visualizes all this data in lots of pretty graphs showing student enrollment,
engagement, performance. And here we’re looking at
the number of video views per unit in a course, and
this one is the attention span on an individual video. So if this is a four-minute
video, you can see that people’s attention kind
of wavers, not surprisingly, we all have pretty short
attention spans these days. So to give you a sense of how
massive this Open edX suite of components is, that Insights component that we were just looking at is
just one of many components, it’s that red box in the
upper right hand corner. So this diagram shows a slightly
less intimidating and more colorful interpretation of
the Open edX architecture. So it’s been divided into
different layers of a stack. So we have the learner and
educator facing components along the top. This is the LMS, the
Learning Management System. The Studio which is the
course authoring environment. Insights, which was that analytics service. And then we have the backend services, so this is like the Forums, Student Notes, a backend queuing service. And then we have the async processing, so this is, you know, Celery workers. And then on the bottom we
have all the storage layers, so MySQL, Mongo, S3,
Elasticsearch, Memcached. Another way of looking at the
architecture is to drill down a level deeper and separate
components into four areas. So we have the tools and
clients across the top, so these are like the mobile
apps, the API manager, UX Toolkits. And then this bluish-green area, this is the edX platform codebase. Over here, we have independently
deployed applications which you can think of
as like microservices. So these kinda sit alongside edX and expose an API that edX can talk to. And then you have the persistence systems like on the last slide. We don’t have time to go into the details of each one and how it’s used,
but you can see that it’s a pretty complex system
with a lot of moving parts. Yet another way of looking at
the components is by audience. So we have educators,
learners, and business. And I like to call this
one the Mickey Mouse taming the monolith.
(laughing) Alright, so how the heck
do you deploy this thing? Well, once you’ve dug
into the codebase a bit and understand these pieces,
then you might start thinking like an implementer, like
how would I go about putting this thing into production for
my company or organization? Well, this setup is probably
overkill for a small Open edX site, this is what you’d be
looking at if you were building a site like edX.org with
scalability and high availability. So, you’ll notice that the
app servers are running in two different availability zones. So if there’s a problem in
the US-EAST-1B zone, the traffic will still be routed to US-EAST-1C,
and your site won’t go down. So obviously setting all this up manually would be a huge pain. So we leverage a few open source tools to make deployments
reliable and repeatable. So building the platform
takes place in two phases. The first phase is
infrastructure provisioning, and the second one is
service configuration. And the provisioning phase
stands up all the required resources and tags them
with the role identifiers, so that the configuration
tool can come in and do the provisioning and configuration
of all the services. So we use a tool from
HashiCorp called Terraform. How many people here use Terraform? Okay, we got a few people out here. And then for configuring the services, we use the recommended Ansible
which is also what edX uses, which is written in Python, by the way. And then we built this tool called Ax which is the Appsembler edX tool. It’s sort of a wrapper
around Terraform and Ansible to save keystrokes and ensure consistency. So basically, one of
our engineers got tired of typing everything, so he
made this command-line tool. So I’m gonna show a
simple example of using Ax with the helpers Terraform
and Ansible to provision server resources and deploy
the Open edX application. So, this is an example directory layout for one of our customers, JFrog. And they’re on the Pro tier,
and the hosting provider, here called the platform, is Google Cloud. It could be AWS, DigitalOcean, even Azure. Terraform is cloud provider
agnostic, so you can use most of the same code to
deploy to different providers. We’ve also specified the
Google Compute Engine project, JFrog Academy, that’s right here. And then we’re passing
in the Google credentials using a JSON file that’s
stored securely in a shared Keybase directory, and
then we’re using Ansible Vault to store the secrets that
Ansible needs, also in Keybase. And lastly, we’re
specifying here to use the Ansible playbook called ficus-pro.yml. And I’ll walk through this file in a sec. But first, what do we mean
when we say the Pro tier? Well, we offer our customers
three different tiers depending on how much traffic they’re expecting. So Basic is a single server, Pro is application and database servers, and then Enterprise is like a
multiple app servers with load balancing and master-slave
replication for the database. So for this example, we’re
just looking at the Pro. So this ficus-pro.yml file
might be a little bit hard for those of you in the back to read. But this is a generic Ansible playbook for all customers who are on the Pro plan. And writing the Ficus Release of Open edX. So I’m not gonna walk through
this entire file, but you can see there are multiple
plays within this playbook, each which can execute multiple roles. So there’s like Configure
Mongo play at the top that installs the MongoDB
server and some others services like Elasticsearch and Memcached. And then there’s a Configure
stateless edxapp instance play that configures a
stateless edX app server. So that’s a generic
playbook, but how do you override the defaults with
customer-specific values? Well, edX uses a file called
server-vars.yml to do that. So this is the
server-vars.yml file specific to the Open edX site
for our customer, JFrog. And you can see that
we’re overriding the site, so the name of the site, JFrog Academy. We’re also overriding the
backup structure-y definition and the theme and the URL. So all of that can be
configured in this YAML file that sort of extends
that generic playbook. And then lastly we have
the inventory file. And the inventory file
tells Ansible which servers to connect to and run these playbooks. So obviously, you would
create this inventory file after Terraform has finished
provisioning the servers, ’cause you don’t necessarily
know the IP addresses before you run Terraform. Alright, so that was kind of
an overview of that process. So Terraform has a three step process: initialize, plan, and apply. So this is the command that we used to run the first step, initialize, which initializes the
Terraform working directory. And we’re passing some
parameters like the customer, the environment, the tier, the plan. You notice that we’ve also
passed in this Terraform plugins directory. So Terraform is extensible, so you can build your own plugins for it. And these parameters are sort of optional because we’ve defined
them in our ax.yml file, so I’m just showing them kind
of for educational purposes. Okay, so we then run the plan command. And this is sort of like a dry run. So it doesn’t actually
provision any servers for you, but it shows you what it’s gonna do, and that way you can audit
this plan and make sure it’s gonna do things correctly
before you execute it for real. And then we run the apply command to actually provision the resources. And I should mention here that Terraform is a very powerful tool, and it can obliterate
your production servers and all of your customer
data if you’re not careful. So before running Terraform, you should have a good
understanding of how it works, how it tracks the state
of your infrastructure, and how the plan and apply commands work. So lastly, we run the Ansible
using Ax as the wrapper to provision the application
onto the newly created servers that Terraform just spun up for us. And when this command is finished, we should have a fully
production Open edX site running specific to our customer, JFrog. Simple, right? Okay, so if you just
wanna get Open edX running on your laptop, and you don’t
want to mess with Terraform and Ansible, edX has provided
a local dev environment called DevStack that can be built
easily using Docker Compose. So I know, I know, just when
you think, I can’t possibly throw any more technology
at you, now there’s Docker. But Docker Compose takes a
lot of that complexity away because it automagically
creates a working development environment for you. So how does it work? Well there’s a provided makefile,
and you run these series of commands to spin up an entire
multi-node Open edX stack, with each service running
in a separate container. So it’s like a production environment, but without all the costs of
spinning up servers on AWS. So you run these commands,
go make a coffee, when you come back, you’ll have
the entire platform running on your laptop, which is pretty cool. And if you’re intimidated
by the command line, Docker for Mac ships with
this GUI called Kitematic which you can use to inspect
all the running containers. So you can see in this red box over here, we have eight different
containers running a variety of services on different ports. So we have Elasticsearch,
Mongo, MySQL, Memcached, plus all the edX services
like LMS, Studio, Ecommerce. A lot of stuff. So who’s gonna go home tonight and try to get DevStack
running on their machine? Alright, got a few takers,
few brave souls (laughs). Alright, so I wanna wrap
up my talk by inviting you to come and explore this
deep sea that is Open edX. And like many open source projects, there’s a whole ecosystem,
not just the core platform. So what we’ve been talking about today is primarily this white circle
over here, which is the core. This is governed by edX, it’s an AGPL-licensed piece of software. But there are hundreds of plugins
and extensions available for Open edX that are being
developed by people like you. And there’s also a growing
community of companies that are adopting Open edX for their
own use, as well as vendors who are providing commercial
products based on Open edX. And it’s still a young project,
Open edX just celebrated its five-year birthday this year. So there are a lot of
opportunities to get involved. So, if you haven’t been
scared away by the complexity of Open edX, and you want to
explore its labyrinth of code, I recommend joining the Slack community, it’s the first URL up there. There’s also a mailing
list, and of course, you can get all the code
and playbooks on GitHub. And this talk would not have
been possible without the contributions of Feanil, Regis,
Nimisha, and John Mark, so thank you all for letting me
reuse your excellent materials. And if you’re interested
in trying out Open edX and you don’t want to invest the time in figuring all this stuff out, my company Appsembler makes
an all-in-one SaaS offering of Open edX called Tahoe. And Tahoe lets you build
a branded Open edX site in a matter of minutes. So come find me after the
talk if you want to learn more or you can just go to that URL. Oh, and if you’re into stickers, I have these cool laptop
stickers to give away. And with that, I want to thank
you all for your attention, being a great audience, and I
hope this piqued your interest in taking a closer look at Open edX. Thank you.
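As a rough illustration of the custom Python-evaluated input described earlier in the talk (the learner enters two numbers that should sum to 10), a check function might look like the minimal sketch below. The function name `check_sum`, its `(expect, ans)` signature, and the assumption that the answers arrive as a list of strings are illustrative choices, not taken from the slides.

```python
def check_sum(expect, ans):
    """Return True if the two submitted values are integers that sum to `expect`."""
    try:
        # Exactly two answers are expected; anything else raises ValueError.
        first, second = (int(value) for value in ans)
    except (TypeError, ValueError):
        # Non-numeric input, or not exactly two answers: mark as incorrect.
        return False
    return first + second == int(expect)


if __name__ == "__main__":
    print(check_sum("10", ["3", "7"]))    # two numbers that sum to 10
    print(check_sum("10", ["4", "5"]))    # two numbers that do not
    print(check_sum("10", ["ten", "0"]))  # input that is not a number
```

Returning `False` on malformed input, rather than raising, keeps a typo in the learner's answer from surfacing as a server error.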
(clapping)
– [Audience Member] Thank you for that very nice talk, I’m a little curious about, so Open edX came out of edX?
– Correct.
– And how long has edX been around?
– I think edX started in 2011, 2012, and then they open sourced the code in 2013.
– [Audience Member] I saw in the architecture diagram MongoDB and MySQL, why both?
– Great question, yeah,
I forgot to mention that. So edX looked at the
different data store options and at the time they were using
Amazon, so they chose MySQL ’cause RDS was available,
Postgres wasn’t yet available on Amazon, so that was
the reason for MySQL. The reason for Mongo is
that they found that MongoDB was a better storage
layer for course data. So rather than using
a relational database, it’s more like an object store. – [Audience Member]
Hi, is Ax open sourced, and if not, when will it be open sourced?
– (laughs) I knew someone was gonna ask that question. We’ve been trying to open
source Ax for like the last year and it’s an internal-only
tool at this point. But we have every intention
of open sourcing it.
– [Audience Member] Have you come across any Open edX courses on building an Open edX course?
– Ha-ha.
(laughter)
– Another great question. I actually gave a tutorial at
the last Open edX conference in Montreal, and I delivered
the training on Open edX. So it was an Open edX
course to teach people how to develop Open edX (laughs).
– [Audience Member] Are there any options currently for being able to install it on Kubernetes?
– Great question. Yes, there is a team in France, actually. We call them FUN,
France Université Numérique, and they have got Open
edX deployed on OpenShift which is Kubernetes-based, of course. It’s still a pretty nascent
project, I think they’re still kinda working out the kinks,
but it looks really promising.
– [Audience Member] So I was wondering, you’ve got the Ansible scripts and you’ve got Docker containers.
– Mm-hmm.
– And I’m guessing that there are things going on in the Dockerfile that are also going on in the Ansible scripts, how do you keep those in sync, or do you try to?
– Yeah, you’ve touched on a very big point of contention within the community. There’s a movement to make everything Dockerfile-based, but edX has a huge investment in Ansible, so currently the DevStack that I showed kinda uses both. They have Dockerfiles to kind of bootstrap the environment, and then Ansible kinda comes in and does its thing. There is an unofficial project called openedx-docker which uses pure Dockerfiles and it doesn’t use any Ansible. And I found that it’s actually
kind of a breath of fresh air to use that because it’s
very easy to understand, whereas Ansible is super
powerful but it also, you can get kind of mired
in the complexity of it.
– [Audience Member] I was wondering, for this project and I guess projects of this ilk, and maybe this is too complicated a question, what the process is like of starting closed source, having funding, being a growing business that then gets to a point where it can transition to releasing the whole project as open source?
– Let’s see if I understand the question, you mean like how edX started out as closed and then they later open sourced it, is that what you mean?
– [Audience Member] Yeah,
and also how much that was the plan and if that’s the
plan, how a business even, like, pitches that to people to fund them, we’re totally gonna make
what we’re building free and easy for other people to edit? How do you get people to
fund you doing that (laughs)? – Yeah, well I think it
helps that edX is a nonprofit organization, so they don’t
have a profit motive in the way that like a VC backed startup, you know, would have a hard time making that case. edX also has backing from MIT and Harvard, so they can kind of, you
know, get by for a while without having to make a lot of revenue. But I think it’s an interesting
question around how, what’s the trade-off, right? If you’re making an open source product, and your competitors can take that product and then compete with
you, is that a good thing? I guess you have to
differentiate in other ways.
– First, let’s thank Nate again for the fascinating talk.
(clapping)
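The three-step Terraform process described in the talk (initialize, plan, apply) could be scripted roughly as below. This is only a sketch under stated assumptions: it calls the stock `terraform` CLI rather than the internal Ax wrapper, and the helper names, the `tfplan` file name, and the `dry_run` switch are made up for illustration. By default it only builds the commands and runs nothing.

```python
import subprocess
from pathlib import Path


def terraform_commands(plan_file="tfplan"):
    """Build the argv lists for the init, plan, and apply steps, in order."""
    return [
        ["terraform", "init"],                       # set up the working directory
        ["terraform", "plan", f"-out={plan_file}"],  # dry run, saved to a file for review
        ["terraform", "apply", plan_file],           # apply exactly the saved plan
    ]


def run_workflow(workdir, dry_run=True):
    """Run (or, by default, just list) the three steps inside `workdir`."""
    commands = terraform_commands()
    if not dry_run:
        for argv in commands:
            # check=True stops the workflow at the first failing step.
            subprocess.run(argv, cwd=Path(workdir), check=True)
    return commands


if __name__ == "__main__":
    # Hypothetical usage: list the commands without touching any infrastructure.
    for argv in run_workflow("."):
        print(" ".join(argv))
```

Saving the plan to a file and applying that same file means the apply step does what was reviewed. Given the warning in the talk about how destructive Terraform can be, in practice you would likely stop after the plan step and inspect it before applying.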
