Allura - an Open Source MongoDB Based Document Oriented SourceForge
DESCRIPTION
MongoSF 2011 talk on Allura, the new platform for SourceForge that we released under an Apache license.
TRANSCRIPT
SourceForge | Slashdot | ThinkGeek | Ohloh | freshmeatGeeknet, page 1
Allura – an Open Source MongoDB Based Document Oriented SourceForge
Rick Copeland (@rick446)
I am not Mark Ramm (sorry)
Allura (SF.net “beta” devtools)
Rewrite developer tools with new architecture
Wiki, Tracker, Discussions, Git, Hg, SVN, with more to come
Single MongoDB replica set
Release early & often
Allura Scaling
SourceForge.net currently handles ~4M pageviews per day
Allura will eventually handle 10% (with lots of writing)
“Consume” currently handles 3M+ pageviews/day on one shard (read-mostly)
Allura can handle ~48k pageviews / day / shard
Add shards & optimize queries as we migrate projects to sf.net
Most data is project-specific; sharding by project is straightforward
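As a back-of-the-envelope check of the figures on this slide, here is a minimal sketch of the shard estimate. It assumes the 10% target applies to the ~4M daily pageviews and that capacity grows linearly with the number of shards; both are simplifications, and `shards_needed` is an illustrative name, not part of Allura.

```python
import math

def shards_needed(total_pageviews_per_day, allura_percent, per_shard_capacity):
    """Estimate how many shards cover Allura's share of daily pageviews.

    Assumes linear scaling across shards (a simplification).
    """
    target = total_pageviews_per_day * allura_percent / 100
    return math.ceil(target / per_shard_capacity)

# Figures quoted on the slide: ~4M pageviews/day, a 10% share, ~48k/day/shard.
estimate = shards_needed(4_000_000, 10, 48_000)
```

With the quoted figures, the target is 400k pageviews per day, so the estimate rounds up to 9 shards.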
System Architecture
Web-facing App Server
Task Daemon
SMTP Server
FUSE Filesystem (repository hosting)
Ming – an “Object-Document Mapper?”
Your data has a schema
• Your database can define and enforce it
• It can live in your application (as with MongoDB)
• Nice to have the schema defined in one place in the code
Sometimes you need a “migration”
• Changing the structure/meaning of fields
• Adding indexes, particularly unique indexes
• Sometimes lazy, sometimes eager
“Unit of work:” Queuing up all your updates can be handy
Python dicts are nice; objects are nicer
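The lazy versus eager distinction can be sketched in plain Python. This is not Ming's migration API; the `version` field, the `author` to `authors` change, and all function names are assumptions made up for illustration.

```python
CURRENT_VERSION = 2

def migrate_v1_to_v2(doc):
    """Hypothetical change: v1 stored one 'author', v2 stores a list of 'authors'."""
    doc = dict(doc)  # work on a copy, leave the stored document alone
    doc["authors"] = [doc.pop("author")]
    doc["version"] = 2
    return doc

# Maps a schema version to the function that upgrades it by one step.
MIGRATIONS = {1: migrate_v1_to_v2}

def load(doc):
    """Lazy migration: upgrade an old document at the moment it is read."""
    while doc.get("version", 1) < CURRENT_VERSION:
        doc = MIGRATIONS[doc.get("version", 1)](doc)
    return doc

def migrate_all(docs):
    """Eager migration: upgrade every stored document up front."""
    return [load(doc) for doc in docs]
```

The lazy path pays the cost per read and leaves untouched documents in the old shape; the eager path pays once and lets later code assume the new shape.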
Ming Concepts
Inspired by SQLAlchemy
Group of collection objects with schemas defined
Group of classes to which you map your collections
Use collection-level operations for performance
Use class-level operations for abstraction
Convenience methods for loading/saving objects and ensuring indexes are created
Migrations
Unit of Work – great for web applications
MIM – “Mongo in Memory” nice for unit tests
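To make the unit-of-work idea concrete, here is a minimal sketch in plain Python. It is not Ming's session implementation; the `UnitOfWork` class and the dict standing in for a collection are assumptions for illustration only.

```python
class UnitOfWork:
    """Queues saves and deletes, then applies them together on flush()."""

    def __init__(self, store):
        self.store = store      # dict of _id -> document, stands in for a collection
        self._pending = {}      # _id -> latest version of the document to write
        self._deleted = set()   # _ids to remove

    def save(self, doc):
        self._deleted.discard(doc["_id"])
        self._pending[doc["_id"]] = dict(doc)

    def delete(self, doc_id):
        self._pending.pop(doc_id, None)
        self._deleted.add(doc_id)

    def flush(self):
        """Apply all queued changes and return how many writes were issued."""
        writes = len(self._pending) + len(self._deleted)
        self.store.update(self._pending)
        for doc_id in self._deleted:
            self.store.pop(doc_id, None)
        self.clear()
        return writes

    def clear(self):
        """Drop queued changes without writing, e.g. when a request fails."""
        self._pending.clear()
        self._deleted.clear()
```

In a web application the pattern is usually one flush at the end of a successful request and one clear on error, which is why it fits that setting well. Repeated saves of the same document collapse into a single write.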
Ming Example
    from ming import collection, schema, Field
    from ming.orm import (mapper, Mapper, RelationProperty,
                          ForeignIdProperty)

    # `session` (document level) and `ormsession` (ORM level) are assumed
    # to be configured elsewhere.
    WikiDoc = collection(
        'wiki_page', session,
        Field('_id', schema.ObjectId()),
        Field('title', str, index=True),
        Field('text', str))
    CommentDoc = collection(
        'comment', session,
        Field('_id', schema.ObjectId()),
        Field('page_id', schema.ObjectId(), index=True),
        Field('text', str))

    class WikiPage(object): pass
    class Comment(object): pass

    # Relation targets are named by mapped class ('Comment', 'WikiPage').
    ormsession.mapper(WikiPage, WikiDoc, properties=dict(
        comments=RelationProperty('Comment')))
    ormsession.mapper(Comment, CommentDoc, properties=dict(
        page_id=ForeignIdProperty('WikiPage'),
        page=RelationProperty('WikiPage')))

    Mapper.compile_all()
Allura Artifacts
Artifacts include tickets, wiki pages, discussions, comments, merge requests, etc.
On artifact change, a session extension:
• Queues a Solr index operation (for full text search support)
• Scans the artifact text for references to other artifacts
• Updates statistics on objects created/modified/deleted
[Diagram: Artifact, with VersionedArtifact, Snapshot, and Message beneath it]
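The three actions taken on artifact change could be sketched as a single hook, shown here in plain Python. This is not Allura's session extension; the `ArtifactHooks` class, the in-memory index queue, and the `[reference]` link syntax matched by the regular expression are all assumptions for illustration.

```python
import re

# Hypothetical link syntax: anything in square brackets counts as a reference.
REFERENCE = re.compile(r"\[([^\]\[]+)\]")

class ArtifactHooks:
    """Collects the side effects triggered when an artifact changes."""

    def __init__(self):
        self.index_queue = []   # pending search-index operations
        self.references = {}    # artifact id -> ids it mentions
        self.stats = {"created": 0, "modified": 0, "deleted": 0}

    def after_change(self, artifact_id, text, change):
        """`change` is one of 'created', 'modified', 'deleted'."""
        # 1. Queue an index operation for full-text search.
        op = "delete" if change == "deleted" else "add"
        self.index_queue.append((op, artifact_id))
        # 2. Scan the text for references to other artifacts.
        if change == "deleted":
            self.references.pop(artifact_id, None)
        else:
            self.references[artifact_id] = REFERENCE.findall(text)
        # 3. Update the statistics.
        self.stats[change] += 1
```

Queuing the index operation rather than indexing inline keeps the web request fast; a background worker can drain the queue later.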
Allura Threaded Discussions
    MessageDoc = collection(
        'message', project_doc_session,
        Field('_id', str, if_missing=h.gen_message_id),
        Field('slug', str, if_missing=h.nonce),
        Field('full_slug', str),
        Field('parent_id', str),
        …)
_id – use an email Message-ID compatible key
slug – threaded path of random 4-digit hex numbers prefixed by the parent's slug (e.g. dead/beef/f00d, dead/beef, dead)
full_slug – slug interspersed with ISO-formatted message datetime
Easy queries for hierarchical data
Find all descendants of a message – slug prefix search “dead/.*”
Sort messages by thread, then by date – full_slug sort
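The slug scheme can be sketched in plain Python over an in-memory list. The exact `full_slug` layout below (a timestamp, a colon, then the hex part, per path component) is an assumption, as are the helper names; against MongoDB the descendant lookup would be an anchored regular-expression query on `slug` and the ordering a sort on `full_slug`.

```python
import re
import secrets

def nonce():
    """Random 4-digit hex string, e.g. 'beef'."""
    return secrets.token_hex(2)

def make_message(parent, text, when, part=None):
    """Build a message dict whose slugs extend the parent's slugs."""
    part = part or nonce()
    stamp = when.strftime("%Y%m%d%H%M%S")
    if parent is None:
        slug, full_slug = part, f"{stamp}:{part}"
    else:
        slug = f"{parent['slug']}/{part}"
        full_slug = f"{parent['full_slug']}/{stamp}:{part}"
    return {"slug": slug, "full_slug": full_slug, "text": text}

def descendants(messages, root):
    """All replies below `root`: an anchored prefix match on slug."""
    pattern = re.compile("^" + re.escape(root["slug"]) + "/")
    return [m for m in messages if pattern.match(m["slug"])]

def thread_order(messages):
    """Thread order, with siblings sorted by date: a plain sort on full_slug."""
    return sorted(messages, key=lambda m: m["full_slug"])
```

Because each reply's `full_slug` starts with its parent's, a parent always sorts immediately before its subtree, and siblings fall in timestamp order without any recursion.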
MonQ: Async Queueing in MongoDB
    states = ('ready', 'busy', 'error', 'complete')
    result_types = ('keep', 'forget')

    MonQTaskDoc = collection(
        'monq_task', main_doc_session,
        Field('_id', schema.ObjectId()),
        Field('state', schema.OneOf(*states)),
        Field('result_type', schema.OneOf(*result_types)),
        Field('time_queue', datetime),
        Field('time_start', datetime),
        Field('time_stop', datetime),
        # dotted path to function
        Field('task_name', str),
        Field('process', str),  # worker process name: "locks" the task
        Field('context', dict(
            project_id=schema.ObjectId(),
            app_config_id=schema.ObjectId(),
            user_id=schema.ObjectId())),
        Field('args', list),
        Field('kwargs', {None: None}),
        Field('result', None, if_missing=None))
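How the `state` and `process` fields let a worker "lock" a task can be sketched with an in-memory stand-in. This is not MonQ's code: `InMemoryTaskQueue` is hypothetical, a function registry replaces the dotted-path lookup, and the `threading.Lock` only stands in for what would typically be a single atomic find-and-modify in MongoDB.

```python
import threading
from datetime import datetime, timezone

class InMemoryTaskQueue:
    """Toy task queue following the ready -> busy -> complete/error states."""

    def __init__(self):
        self._tasks = []
        self._lock = threading.Lock()  # stand-in for an atomic find-and-modify

    def post(self, task_name, *args, **kwargs):
        task = dict(state="ready", task_name=task_name, process=None,
                    args=list(args), kwargs=kwargs, result=None,
                    time_queue=datetime.now(timezone.utc),
                    time_start=None, time_stop=None)
        with self._lock:
            self._tasks.append(task)
        return task

    def claim(self, process):
        """Atomically mark the oldest ready task busy and record its owner."""
        with self._lock:
            for task in self._tasks:
                if task["state"] == "ready":
                    task.update(state="busy", process=process,
                                time_start=datetime.now(timezone.utc))
                    return task
        return None

    def run(self, task, registry):
        """Execute a claimed task; `registry` maps task names to callables."""
        try:
            task["result"] = registry[task["task_name"]](*task["args"], **task["kwargs"])
            task["state"] = "complete"
        except Exception as exc:
            task["result"] = repr(exc)
            task["state"] = "error"
        task["time_stop"] = datetime.now(timezone.utc)
        return task
```

The key property is that checking for `ready` and switching to `busy` happen as one step, so two workers can never claim the same task.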
Repository Cache Objects
On commit to a repo (Hg, SVN, or Git)
• Build commit graph in MongoDB for new commits
• Build auxiliary structures:
  • tree structure, including all trees in a commit & last commit to modify
  • linear commit runs (useful for generating history)
  • commit difference summary (must be computed in Hg and Git)
• Note references to other artifacts and commits
Repo browser uses cached structure to serve pages
[Diagram: Commit, Tree, Trees, CommitRun, LastCommit, DiffInfo]
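One way to read "linear commit runs" is as maximal chains of commits with no branching or merging inside them. The sketch below computes such runs from a parent map in plain Python; it is an interpretation for illustration, not Allura's CommitRun implementation, and it assumes every parent id also appears as a key.

```python
from collections import defaultdict

def commit_runs(parents):
    """Split a commit graph into linear runs, each listed newest first.

    `parents` maps a commit id to the list of its parent ids.
    """
    children = defaultdict(list)
    for commit_id, parent_ids in parents.items():
        for parent_id in parent_ids:
            children[parent_id].append(commit_id)

    def continues_into_parent(commit_id):
        # A run continues only through a single parent that has a single child.
        parent_ids = parents[commit_id]
        return len(parent_ids) == 1 and len(children[parent_ids[0]]) == 1

    runs = []
    for commit_id in parents:
        kids = children[commit_id]
        if len(kids) == 1 and continues_into_parent(kids[0]):
            continue  # this commit is absorbed into its child's run
        run = [commit_id]
        while continues_into_parent(run[-1]):
            run.append(parents[run[-1]][0])
        runs.append(run)
    return runs
```

Storing one document per run instead of one per commit means a history page for a long linear stretch can be served from a few documents rather than by chasing parent pointers commit by commit.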
Repository Cache Lessons Learned
Using MongoDB to represent graph structures (commit graph, commit trees) requires careful query planning. Pointer-chasing is no fun!
Sometimes Ming validation and ORM overhead can be prohibitively expensive – time to drop down a layer.
Benchmarking and profiling are your friends, as are queries like {'_id': {'$in': […]}} for returning multiple objects.
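The benefit of an `$in` query over pointer-chasing is that a known list of ids costs one round trip instead of one per document. The sketch below shows the shape of such a batch load; `FakeCollection` is a hypothetical stand-in that only understands `{'_id': {'$in': [...]}}` and counts round trips, so the example runs without a database.

```python
class FakeCollection:
    """In-memory stand-in for a collection; counts queries issued."""

    def __init__(self, docs):
        self._docs = {doc["_id"]: doc for doc in docs}
        self.queries = 0

    def find(self, query):
        self.queries += 1
        wanted = query["_id"]["$in"]
        return [self._docs[i] for i in wanted if i in self._docs]

def load_many(coll, ids):
    """Fetch several documents in one query, returned in the requested order."""
    ids = list(ids)
    by_id = {doc["_id"]: doc for doc in coll.find({"_id": {"$in": ids}})}
    return [by_id[i] for i in ids if i in by_id]
```

A real server does not promise to return `$in` matches in the order of the id list, which is why the helper reorders the results itself.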
Authorization: ProjectRole Objects
    ProjectRoleDoc = collection(
        'project_role', main_doc_session,
        Field('_id', schema.ObjectId()),
        Field('user_id', schema.ObjectId(), index=True),
        Field('project_id', schema.ObjectId(), index=True),
        Field('name', str),
        Field('roles', [schema.ObjectId()]),
        Index('user_id', 'project_id', 'name', unique=True))

    class ProjectRole(object): pass

    main_orm_session.mapper(ProjectRole, ProjectRoleDoc, properties=dict(
        user_id=ForeignIdProperty('User'),
        project_id=ForeignIdProperty('Project'),
        user=RelationProperty('User'),
        project=RelationProperty('Project')))
Authorization: ProjectRole Objects
Roles can be named roles (“Groups”) or user proxies. Roles inherit all permissions of the roles they can “act as”.
User membership in a group is stored on the user proxy object (the list of roles for which the user has permission)
Authorization checks all of a user's roles transitively. If any of those roles has the required permission, access is granted.
Hierarchical role structures are supported, but not exposed in the UI.
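The transitive check can be sketched in plain Python. The data layout below (a dict of role documents keyed by id, and an ACL mapping each permission to the role ids that hold it) is an assumption for illustration; it is not Allura's security module.

```python
def reachable_roles(roles, start_id):
    """All role ids a role can 'act as', including itself."""
    seen = set()
    stack = [start_id]
    while stack:
        role_id = stack.pop()
        if role_id in seen:
            continue  # guards against cycles in the role graph
        seen.add(role_id)
        stack.extend(roles[role_id]["roles"])
    return seen

def has_access(roles, acl, user_role_id, permission):
    """Grant access if any transitively held role carries the permission."""
    allowed = acl.get(permission, set())
    return any(role_id in allowed
               for role_id in reachable_roles(roles, user_role_id))
```

For example, a user proxy whose `roles` list contains a Developer role, which in turn can act as Member, picks up every permission granted to either named role.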
Flyway Migrations
Ming supports “lazy migrations” from one schema version to another automatically
Sometimes you want to explicitly version your DB
Flyway allows you to define various versions of your schema with pre- and post-conditions for running an “up” migration and a “down” migration
With multiple tools with interdependencies and a platform under it all, we thought we needed it
We didn’t, but it’s there and it works…
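An explicitly versioned migration with "up" and "down" directions and pre- and post-conditions might look like the following plain-Python sketch. It does not use Flyway's actual API; the `AddSlugField` migration, the `run` helper, and the dict standing in for a database are all hypothetical.

```python
class AddSlugField:
    """Hypothetical migration between schema versions 1 and 2."""
    from_version, to_version = 1, 2

    def up(self, docs):
        for doc in docs:
            doc.setdefault("slug", doc["title"].lower().replace(" ", "-"))

    def down(self, docs):
        for doc in docs:
            doc.pop("slug", None)

def run(migration, db, direction):
    """Apply `migration` to `db` ({'version': int, 'docs': list}) in one direction."""
    if direction not in ("up", "down"):
        raise ValueError(f"unknown direction: {direction}")
    start, end = migration.from_version, migration.to_version
    if direction == "down":
        start, end = end, start
    # Precondition: the database must be at the version this step starts from.
    if db["version"] != start:
        raise RuntimeError(f"expected version {start}, found {db['version']}")
    getattr(migration, direction)(db["docs"])
    # Postcondition: record the version the database is now at.
    db["version"] = end
    return db
```

The precondition check is what makes interdependent migrations safe to order: a step simply refuses to run until the versions it depends on are in place.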
What We Liked
Performance, performance, performance – easily handle 90% of SF.net traffic from 1 DB server, 4 web servers
Schemaless server allows fast schema evolution in development, making many migrations unnecessary
Replication is easy, making scalability and backups easy
• Keep a “backup slave” running
• Kill backup slave, copy off database, bring back up the slave
• Automatic re-sync with master
Query Language
• You mean I can have performance without map-reduce?
GridFS
Pitfalls
Too-large documents
• Store less per document
• Return only a few fields
Ignoring indexing
• Watch your server log; bad queries show up there
Too much denormalization
• Try to use an index if all you need is a backref
Ignoring your data’s schema
Using many databases when one will do
Using too many queries
Open Source
Ming: http://sf.net/projects/merciless/
MIT License
Allura: http://sf.net/p/allura/
Apache License
Future Work
mongos
New Allura Tools
Migrating legacy SF.net projects to Allura
Stats all in MongoDB rather than Hadoop?
Better APIs to access your project data
Rick Copeland (@rick446)