Content rating behind the firewall
April 2011
Presented for SIKM by David Thomas, Deloitte
Copyright © 2009 Deloitte Development LLC. All rights reserved.
Dave Thomas is a Product Manager for the U.S. Intranet at Deloitte. He previously worked in client service at Deloitte Consulting in their SharePoint Practice and as a Project Manager for Global Consulting Knowledge Management (GCKM) on their Portal: the Knowledge Exchange (KX).
Background
• KX is available globally and attracts over 20,000 unique visitors per month
• KX is built on a heavily customized version of SharePoint 2003
• Users typically come a few times a month to retrieve documents, utilize communities and complete other knowledge-related tasks
• Content rating has been common on the internet for some time, but there seem to be few examples of successful rating systems behind the firewall. Internal usage of ratings at our organization has historically been <5% on the two platforms where it was deployed
• Stan Garfield posed the question “Has anyone had positive experiences with 1-5 star content rating mechanisms inside a firewall?” in January 2010. Here is a selection of responses (thank you to SIKM members):
– “I think that 5-star rating systems are ideal for apples to apples comparisons. Most knowledge objects (and of course people) cannot be compared in this manner”
– “I think there is an added complication in that inside the firewall it might also be important to know who is doing the rating. The CEO's rating might carry a little more weight than the janitor's”
– “With process documents, I like the idea of ratings because the purpose of the document is clear. For other documents, the case is muddier.”
– “rating a book or toaster is very different from rating a specific piece of content. Products purchased on Amazon tend to have more common use cases”
– On a deployed CR system: ”At first, pretty much no one rated. We suspect this was for several of the reasons that you pose in your document but also because it was unclear what they were being asked to rate - the quality of the writing, whether or not they agreed with the author, whether or not they thought highly of the author, or whether they liked the quality of the document. In an effort to encourage participation, the sponsors clarified the intent of the ratings”
• In March 2010, a project was initiated to provide a content rating system integrated into the Portal. The perceived value was that stronger-rated content could be easily identified and promoted accordingly; later, weaker content could be removed/archived earlier (a separate project)
• Our Portal runs on SharePoint 2003, so this project entailed custom work (no 3rd-party webparts were available). The rating system integrated with:
– Published content pages, giving the ability to rate
– The search user interface, allowing retrieval based on rating
• The first part of the project was research on various options for rating. We determined that most rating systems fell into one of three buckets:
• ‘favorites and flags’
• ‘this or that’
• ‘rating schemes’
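As an illustration only (the class and method names are hypothetical; the actual Portal implementation was custom SharePoint 2003 work, not this code), the three buckets could be modelled roughly as:

```python
from dataclasses import dataclass, field

# Minimal sketches of the three rating-design buckets (hypothetical names).

@dataclass
class FavoritesAndFlags:                      # Type 1: single, usually positive value
    favorited_by: set = field(default_factory=set)
    def rate(self, user_id: str) -> None:
        self.favorited_by.add(user_id)        # one action per user, no scale
    def score(self) -> int:
        return len(self.favorited_by)

@dataclass
class ThisOrThat:                             # Type 2: binary up/down, like/dislike
    ups: int = 0
    downs: int = 0
    def rate(self, thumbs_up: bool) -> None:
        if thumbs_up:
            self.ups += 1
        else:
            self.downs += 1
    def score(self) -> int:
        return self.ups - self.downs

@dataclass
class RatingScheme:                           # Type 3: traditional 1-X scale (1-5 here)
    counts: dict = field(default_factory=lambda: {s: 0 for s in range(1, 6)})
    def rate(self, stars: int) -> None:
        self.counts[stars] += 1
    def score(self):                          # average rating, the headline number
        n = sum(self.counts.values())
        return sum(s * c for s, c in self.counts.items()) / n if n else None
```

Note that only Type 3 yields an average score with any granularity, which is one of the drivers behind the eventual design choice described later.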
Type 1: ‘Favorites and Flags’
Design: Single-value rating scheme, usually positive
Type 2: ‘This or That’
Design: Positive/negative value: Yes/No, Like/Dislike, Up/Down
Type 3: Rating Schemes
Design: Traditional 1-X rating scheme (1-5 and 1-10 are common)
Some of the concerns identified pre-deployment
• ‘Inside the firewall’ is not ‘outside the firewall’; user behavior might be different
• Scale could be an issue (will there be enough people rating enough content to be meaningful?)
• There is a lag between the time a knowledge asset is accessed and the time a rating can fairly be made. The user may also no longer be logged into the repository at the time the rating could be applied
• When someone watches a short video, they can watch and rate quickly in most cases, as the rating mechanism is often easy and convenient. They have consumed the media asset and are positioned to make a judgment on it. The value of a document is not known until after it has been downloaded and read, and that can take time.
• We could experience cultural resistance when trying to implement content rating:
– Anonymity concerns
– A lack of desire to rate content as poor will likely be evident
– People are not used to rating content inside the firewall
Typical ratings distributions
Outside the firewall, ‘J-curves’ generally exist. The authors of Building Web Reputation Systems did research on ratings across various Yahoo! sites:
• “Eight of these graphs have what is known to reputation system aficionados as J-curves, where the far right point (5 stars) has the very highest count, 4 stars the next, and 1 star a little more than the rest.”
• “A J-curve is considered less-than-ideal for several reasons: the average aggregate scores all clump together between 4.5 to 4.7 and therefore they all display as 4 or 5 stars and are not so useful for visually sorting between options. Also, this sort of curve begs the question: why use a 5-point scale at all? Wouldn't you get the same effect with a simpler thumbs-up/down scale, or maybe even just a super-simple favorite pattern?”
• “If a user sees an object that isn't rated, but they like, they may also rate and/or review, usually giving 5 stars - otherwise why bother - so that others may share in their discovery. People don't think that mediocre objects are worth the bother of seeking out and creating internet ratings”
• “There is one ratings curve not shown here, the U-curve, where 1 and 5 stars are disproportionately selected”
• Product- or service-based sites with either a) tightly knit communities, b) incentivization, or c) huge user groups can also generate U-curves (Amazon.com is often cited as an example)
Typical ratings distributions (cont’d)
• One of the groups evaluated (custom autos) generated a ‘W-curve’. This actually represented a preferred distribution for our deployment, and we later speculated on whether we would achieve it.
• “The biggest difference is most likely that Autos Custom users were rating each other's content. The other sites had users evaluating static, unchanging or feed-based content in which they don't have a vested interest”
• “Looking more closely at how Autos Custom ratings worked and the content was being evaluated showed why 1-stars were given out so often: users were providing feedback to other users in order to get them to change their behavior. Specifically, you would get one star if you 1) didn't upload a picture of your ride, or 2) uploaded a dealer stock photo of your ride”
• “The 5-star ratings were reserved for the best-of-the-best. Two through four stars were actually used to evaluate quality and completeness of the car's profile. Unlike all the sites graphed here, the 5-star scale truly represented a broad sentiment and people worked to improve their scores.”
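As a rough illustration of the J-, U-, and W-shapes discussed above, a 1-5 star histogram could be labelled with a simple heuristic. The thresholds below are our own illustrative guesses, not definitions from the book:

```python
def curve_shape(counts):
    """Label a 1-5 star histogram as J-, U-, or W-shaped.

    counts[0]..counts[4] are the tallies for 1..5 stars; the thresholds
    here are illustrative guesses, not published definitions.
    """
    c1, c2, c3, c4, c5 = counts
    low_end_high = c1 >= 0.5 * c5          # 1-star pile comparable to 5-star pile
    mid_bump = c3 > c2 and c3 > c4         # local peak at 3 stars
    if low_end_high and mid_bump:
        return "W"                         # high at both ends plus a middle bump
    if low_end_high and c1 > c2:
        return "U"                         # both extremes disproportionately chosen
    if c5 >= c4 and c4 > (c2 + c3) / 2 and c1 > c2:
        return "J"                         # 5 highest, 4 next, slight lift at 1
    return "other"
```

A real analysis would fit the histogram more carefully; this only sketches how the three shapes differ.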
Deciding on a content rating design
Based on the W-distribution example, we asked some questions to determine whether a 1-5 rating scheme would work and whether we could get the desired W-curve.

Q: Do users update and try to improve their content over time?
A: No; once they contribute, that’s it.

Q: Would users rate other people’s content negatively? Would they expect a change to be made to the underlying deliverable?
A: We don’t expect many users to rate content negatively. If users did, it could be because the content didn’t meet their needs for a given situation, not necessarily because it is poor content.

Q: Do users have a vested interest in rating something (what’s in it for them)?
A: It is not clear what the value proposition of rating is. Rating benefits the broader population, but there is no immediate incentive for the rater.

Q: Is there a tight-knit group who will populate content ratings, or a sense of ‘I should rate this content’?
A: No. We do have a reasonably large number of users, but they have limited time.

Q: Is our organization culturally disposed toward positive ratings only?
A: In my experience, yes.

Ultimately, we decided to custom-develop a 1-5 rating scheme (Type 3). Other identified drivers also drove this decision.
Deploying a rating system
• The business drivers for implementing a 1-5 rating scheme:
– A simple, familiar model to rate published content
– Granularity: the ability to get an average score and promote/remove content as needed
– Alignment with SharePoint 2010 (our future platform) reduced disruption for users when we moved
• Resource constraints meant deferral of some functionality to future releases. For the first release:
– A single classification of knowledge asset: published content (other types would follow later)
– Rating occurred on the content record only
– No mechanism for comments (even though they often go hand in hand with ratings): concern about moderation team requirements, some risk aversion, and worry that comments would either not be used (people not comfortable) or perhaps be inappropriate in some cases
– We didn’t give explicit guidance on what each of the ratings meant; we just used the ‘1. Not recommended – 5. Highly recommended’ nomenclature. A suggestion to provide explicit descriptions for each rating level was not pursued.
• Marketing and promotion at the launch of rating meant that rating activity was essentially incentivized for the user. This had an impact on usage, as you will see.
Content rating data
Rating events / unique pieces of content rated

[Charts: monthly bars, Jun-10 through Feb-11]
Unique pieces of content rated: 1896, 1746, 423, 501, 550, 1491, 1114, 367, 281
Rating events: 2662, 2227, 465, 550, 601, 1687, 1210, 706, 299

Commentary
• An average of 836 unique pieces of content (about 2.5% of content available) was rated each month.
• Additional rating capabilities were deployed for qualifications in October/November, which potentially raised awareness around rating in general.
• KX has seasonality effects in user visits.
Content conversion

[Chart: monthly bars, Jun-10 through Feb-11]
Ratings as a % of page views: 3.20%, 2.67%, 0.56%, 0.59%, 0.65%, 1.95%, 1.72%, 0.77%, 0.32%

Commentary
• This graph normalizes the absolute number of ratings to the page views for each item. The long-term average is around 1-1.5%.
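The conversion metric is simply ratings expressed as a percentage of page views for the period; a trivial sketch (the function name is our own):

```python
def conversion_rate(ratings: int, page_views: int) -> float:
    """Ratings applied as a percentage of page views in a period."""
    return 100.0 * ratings / page_views if page_views else 0.0

# The long-term average of ~1-1.5% corresponds to roughly
# 10-15 ratings per 1,000 page views.
```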
Ratings per user / monthly new users

[Charts: monthly bars, Jun-10 through Feb-11]
Ratings per user: 2.82, 2.99, 1.95, 2.28, 2.55, 3.01, 3.28, 2.87, 1.42
New raters per month: 944, 591, 143, 116, 134, 386, 198, 109, 87

Commentary
• An average of around 2.75 ratings was applied by each user.
• An average of around 270 new raters joined each month, although heavily skewed by the first two months.
• There are repeat raters using the system. The current new-rater run rate is around 100 users a month.
Average score and rating distribution

[Charts: monthly bars, FY11 Jun through FY11 Feb]
% of ratings that were 4 or 5 stars: 72.3%, 74.7%, 73.4%, 61.6%, 60.7%, 72.9%, 72.0%, 88.8%, 76.6%
Average rating: 3.99, 3.99, 3.98, 3.97, 3.95, 3.94, 3.94, 3.99, 3.99

Commentary
• There is some fluctuation in the 4 and 5 ratings, but the long-term average is 73%.
• The average rating is extremely steady and has been from month 1.
Cumulative table of results

Category                    | 1 month | 6 months | All data (10 months)
Total ratings               | 3102    | 9361     | 10796
Unique content items rated  | 2268    | 5960     | 6498
Unique raters               | 1020    | 2504     | 2843
Average rating              | 3.99    | 3.99     | 3.99
% ratings 4 or 5            | 70%     | 72.1%    | 73.5%
2 or more ratings applied   | 1.5%    | 3.7%     | 4.2%
5 or more ratings applied   | 0.3%    | 1%       | 1.3%

[Chart: content rating distribution (all data), 1-5 stars, y-axis 0-40%; per-star values not recoverable]
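As a sketch (the record shape and function name are our own, not the Portal's), the headline numbers in the cumulative table can be derived from raw rating events like so:

```python
from collections import Counter

def summarize(events):
    """events: list of (user_id, content_id, stars) tuples (hypothetical shape).

    Returns aggregates like those in the cumulative results table.
    """
    stars = [s for _, _, s in events]
    per_item = Counter(c for _, c, _ in events)   # ratings per content item
    n = len(stars)
    return {
        "total_ratings": n,
        "unique_content_items": len(per_item),
        "unique_raters": len({u for u, _, _ in events}),
        "average_rating": round(sum(stars) / n, 2) if n else None,
        "pct_4_or_5": round(100.0 * sum(1 for s in stars if s >= 4) / n, 1) if n else None,
        # share of rated items that attracted 2+ ratings
        "pct_items_2_plus": round(
            100.0 * sum(1 for v in per_item.values() if v >= 2) / len(per_item), 1
        ) if per_item else None,
    }
```

Note how few items attract multiple ratings in the real data (4.2% with 2+, 1.3% with 5+), which limits how much an average score can be trusted for any single item.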
What did we learn from the experience?
What did we learn from the experience? Can content rating happen behind the firewall effectively?
• You can build a simple custom content rating system and it will get some meaningful use: over the last ten months, 10,000+ rating events following a fairly typical J-curve distribution, with ~70% of ratings a 4 or a 5.
• There are no real benchmarks for ‘success’. The project set a target that 1-3% of viewed content would be rated. Note that if 1-2% of our monthly unique visitors rate content, that equates to 2,500 total ratings a year. (On YouTube, rating by 0.1-0.5% of viewers is common; does sign-in-to-rate have an impact?)
• The value of knowledge assets can be situational: “one man's trash is another man's treasure”. Without a comments system, it is difficult to understand why something is rated a certain score.
• We feel that our users are predisposed to rate a lot of content 3-4. They get the concept of best in class. We have firm-wide methodologies that are broadly used; users would equate those with best in class / 5 stars.
• We experienced excessive rating and, of course, self-rating.
• There is still some level of fear that if users rate something a 1, the document author will find out, to the extent that we addressed this in the FAQs for the system.
• Incentivization had an impact. We require more data to see exactly how much.
What else could be done in the future?
• Identification and promotion of high-quality content in a meaningful way. There are some challenges given the sheer volume of content, the distribution of our business, and the interests of individual users.
• A model for removal/archival of ‘low-quality content’. Business rule definition is still outstanding here. The basic idea: if there is compelling evidence (multiple ratings), look to possibly retire the content earlier.
• Additional marketing, promotions, and sponsorship. Incentivization will drive a temporary increase in rating activity, but is not sustainable.
• Demo rating as part of onboarding materials for new hires; set an expectation to rate assets that are used. This should be part of a broader initiative about the value of a knowledge-sharing culture.
• Authored or contributed content appears on your profile; add rating of that content as well.
• Deployment of a DVD-by-mail-style rating model (‘Blockbuster’): essentially, communications asking the user to take the proactive step of rating a consumed asset. Note: this was discussed but de-scoped, as the only options at the time were heavily manual.
Closing thoughts and Q&A
• We still feel there is value in content rating. We experienced both technical and organizational challenges implementing it behind the firewall, but we learned from the experience, and at least the capability is there for users to use.
• Rating is built into common platforms that we use for document management, collaboration, and knowledge sharing. In theory, if rating is pervasive and users see it all the time, they *may* use it more.
• User behavior was generally consistent:
– Generally, more ratings were applied by junior- to mid-level users (time, familiarity with the Portal)
– It was common for users to rate items a ‘4’ or a ‘5’
– We saw long-term usage of around 1% of page views, with around 2% of the total content in the content store rated in any one month
– There was a strong impact when the process was incentivized (not that surprising)
– There is concern about the time it would take to get meaningful ratings on a lot of the content on the Portal
– Outliers are likely ‘power raters’, although they might not be self-directed!
YouTube’s position on rating drove a change for them:
“Seems like when it comes to ratings it's pretty much all or nothing. Great videos prompt action; anything less prompts indifference. Thus, the ratings system is primarily being used as a seal of approval, not as an editorial indicator of what the community thinks about a video. Rating a video joins favoriting and sharing as a way to tell the world that this is something you love.” (YouTube blog, 22 September 2009)
Questions? Feel free to contact [email protected]