Discovering Business Information from the Social Web.
Sebastián A. Ríos
Department of Industrial Engineering
[email protected]
wi.dii.uchile.cl
Thursday, January 6, 2011
Outline
• Motivation
• Social Web Sites
• Problems and Solutions
• Applications and Examples
Motivation
SOY DIGITAL 2010: Report on the digital consumption experience in Chile.
ENGAGEMENT AND DECISION STIMULUS
Based on the value people assign to the interface elements that help them make purchase decisions, "user comments" hold first place, while third place goes to "descriptions by a blogger or independent outlet". This shows clearly that integrating information that does not come from the store is a relevant factor in creating the conditions for a positive consumer experience.
The factors that influence a purchase decision
[Chart 07: 77.5% of people value other users' comments as a reference for approval when making a purchase decision. Values shown: 77.5%, 71.7%, 59.9%, 51.7%, 43.4%, 41.1%, 24.9%, 24.1%. Factors shown: user comments (77.5%, first place), description by a blogger or independent outlet (third place), images, user votes, ranking of most demanded products, the site's own description, videos, comparison with other similar products. The pairing of the remaining values and factors is not recoverable from the transcript.]
Even more so when 58.1% of people say they comment on social networks and 47.9% recommend sites, videos or applications to their friends, which points to a growing trend toward participation and the building of relationships of influence, and highlights the active role people are taking in consumption.
Your consumers are already ahead of you (and in real time)
Users are more agile and proactive than brands assume; their relationship with products plays out second by second and lasts far longer than the traditional 30 seconds of an ad. So most of the technology products and electronics available in traditional online stores have probably already been reviewed, described, rated and commented on by some person or independent outlet, which implies that current interfaces for supporting purchase decisions are highly inefficient, since people have to ...
Source [2]
Chilean users research on the Internet before buying [1].
"With 77.5% of mentions, other users' comments about the product and comparing it on other sites are the determining factors in completing a purchase, even for purchases made in traditional retail."
Social Web Sites
What is a Social Web Site?
• A website where people can share ideas, comments, experiences, feelings, projects, etc.
Basically...
Where are these conversations?
Social Platforms
• Blogging
• Photo Sharing
• Micro Blogging
• RSS
• Widgets
• Social Networking
• Chat Rooms
• Message Boards
• Podcasts
• Video Sharing
• Active worldwide users
• Pieces of content shared per month
• Local businesses with active Facebook pages
[The figures on this slide are illegible in the transcript.]
Source [3]
Professional Social Network
• Contains profiles of Fortune 500 executives and leading entrepreneurs
• Average individual salary on LinkedIn [figure illegible in the transcript]
Source [3]
[Slide text only partly recoverable from the transcript.]
... as Encyclopedia Britannica
If you were paid $1 for every article posted on [...], you would earn [figure illegible] PER HOUR
Source [3]
• Registered users
• Updates per month
• Users following a brand
• Brand recommendations per month
[The figures on this slide are illegible in the transcript.]
Source [3]
Problems and Solutions
... and ... how do we take advantage of the social web?
How do we obtain people's preferences from the Social Web?
Surveys and Statistics
• Surveys run on social web sites
• Then derive statistics and other valuable information from them.
... let's look at the star ratings!
Web Reputation
This product rarely fails
Good image
The products are good
Customer service is excellent
Good Karma!
... and the preferences?
... star ratings always tend to be positive. Average on YouTube: 4.8 out of 5 in the old interface [4]
... studies show that it is difficult to obtain good-quality data from online surveys
Problems!
How do we obtain people's preferences from the social web?
How do we use this information to improve our products and services?
What is the best way to do this?
The best option is to extract the information we need directly from the comments!
Easy to say, but hard to do!
How can it be done?
By applying Web Intelligence and Social Network Analysis techniques [9]
• Web Text Mining
• Web Structure Mining
• Web Usage Mining
+
• Social Network Analysis
Challenge: easy to say, but hard to do
Advantages
1. It extracts what users really think
2. The survey can be as large as desired.
3. Longitudinal studies are possible with practically the same development effort
Among others...
The downside...
• Privacy
• Attacks
Preferences
• Blogging
• Photo Sharing
• Micro Blogging
• RSS
• Widgets
• Social Networking
• Chat Rooms
• Message Boards
• Podcasts
• Video Sharing
1. Information can be obtained from many of these platforms
2. Data can be cross-referenced across platforms
Applications and Examples
Analysis of Complaints
Objective
To find valuable knowledge about the products and services of retail companies, using the complaints people post on social web sites
Specific Objectives
1. Find families or types of complaints
2. Find trends in the complaints
3. How did people's complaints evolve? (Do the problems persist, and how often do they occur?)
4. Identify the social networks affected by certain specific complaints
5. Connect and look for implicit relationships in disconnected networks
6. Try to measure the impact of the complaints by analyzing these social networks
... what is left out?
Corrective Measures
Core value!
Preserve the anonymity of people and companies
How do people complain today?
• Direct e-mail to the companies
• E-mail to regulatory agencies
• Posting on the companies' own portals
• Posting on complaint sites
... regardless of whether the complaints are resolved or not!
Data Used
(A portion of the) complaints about four retailers between 2006 and 2010
Obtained from a complaints forum
Retail:           A     B     C     D
No. of comments:  310   709   242   1125
1. Find families or types of complaints
Two alternatives:
A. Use Text Mining: the traditional approach, which builds word vectors and then clusters them and indexes by keywords
B. Use Topic/Concept Models: the advanced approach, which uses probability distributions over the text to generate topics/concepts automatically
Why A does not work
Retail:             A      B      C      D
No. of comments:    310    709    242    1125
Word vector size:   6271   12199  6931   15934
What does this mean?
The pages of a web site are represented as a vector of words with some frequency. Later, these vectors are used by a clustering or classification algorithm in order to find interesting patterns.
[Figure: a web page is converted by TF-IDF into a word vector. Example terms: Home, School, Engineering, University, Student, Course, Lecture, Department, Alcohol. Example weights: 0.0002604, 0.0002234, 0.0001225, 0.0000023, 0.0234545, 0.0002667, 0.1093356, 0.0000580. The result has thousands of components (experimentally about 5,000 per web page), which leaves the analyst unable to interpret it.]
Topic Models

... text mining. This way, we can discover threatening topics and perform a specific analysis on each one.

3.1 Basic Notation

Let us introduce some concepts. In the following, let $V$ be a vector of words that defines the vocabulary to be used. We refer to a word $w$ as the basic unit of discrete data, indexed by $\{1, \ldots, |V|\}$. A post message is a sequence of $S$ words defined by $\mathbf{w} = (w_1, \ldots, w_S)$, where $w_s$ represents the $s$-th word in the message. Finally, a corpus is defined by a collection of $|P|$ post messages denoted by $C = (\mathbf{w}_1, \ldots, \mathbf{w}_{|P|})$.

A vectorial representation of the posts corpus is given by TF-IDF $= (m_{ij})$, $i \in \{1, \ldots, |V|\}$ and $j \in \{1, \ldots, |P|\}$, where $m_{ij}$ is the weight that expresses how important word $i$ is in document $j$ relative to the other words. The $m_{ij}$ weights considered in this research are defined as an improvement of the tf-idf term [26] (term frequency times inverse document frequency), defined by

$$ m_{ij} = f_{ij}\,\bigl(1 + sw(i)\bigr) \times \log\left(\frac{|C|}{n_i}\right) \qquad (1) $$

where $f_{ij}$ is the frequency of the $i$-th word in the $j$-th document, $sw(i)$ is a relevance factor associated with word $i$ in a set of words, and $n_i$ is the number of documents containing word $i$. In this case, $sw(i) = w_i^{post} / |P|$, where $w_i^{post}$ is the frequency of word $i$ over all documents and $|P|$ is the total number of posts. The tf-idf term is a weighted representation of the importance of a given word in a document that belongs to a collection of documents. The term frequency (TF) indicates the weight of each word in a document, while the inverse document frequency (IDF) states whether the word is frequent or uncommon across the collection, setting a lower or higher weight respectively.
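
Equation 1 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: posts are taken to be already tokenized, $|C|$ is read as the number of posts $|P|$, and $w_i^{post}$ is read as the total number of occurrences of word $i$ in the corpus. The function name `modified_tfidf` and the toy corpus are hypothetical, not taken from the paper.

```python
import math
from collections import Counter


def modified_tfidf(posts):
    """Return {(word, j): m_ij} for a list of tokenized posts (Equation 1)."""
    n_posts = len(posts)                         # |P|, also used as |C| (assumption)
    counts = [Counter(post) for post in posts]   # f_ij for each post j
    total = Counter()                            # w_i^post: occurrences over all posts
    doc_freq = Counter()                         # n_i: posts that contain word i
    for c in counts:
        total.update(c)
        doc_freq.update(c.keys())
    weights = {}
    for j, c in enumerate(counts):
        for word, f_ij in c.items():
            sw = total[word] / n_posts           # relevance factor sw(i)
            idf = math.log(n_posts / doc_freq[word])
            weights[(word, j)] = f_ij * (1 + sw) * idf
    return weights


# Toy corpus, invented for illustration only.
posts = [
    ["cobro", "tarjeta", "cobro"],
    ["garantia", "tarjeta"],
    ["garantia", "falla"],
]
weights = modified_tfidf(posts)
```

A word that occurs in every post gets weight zero, because the logarithm vanishes; a word repeated inside a single post, such as "cobro" above, gets the largest weight.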

3.2 Topic Modeling

A topic model can be considered a probabilistic model that relates documents and words through variables that represent the main topics inferred from the text itself. In this context, a document can be considered a mixture of topics, represented by probability distributions that can generate the words in a document given these topics. Inferring the latent variables, or topics, is the key component of this model, whose main objective is to learn from text data the distribution of the underlying topics in a given corpus of text documents.

A main topic model is Latent Dirichlet Allocation (LDA) [3, 4, 9]. LDA is a Bayesian model in which the latent topics of documents are inferred from probability distributions estimated over the training dataset. The key idea of LDA is that every topic is modeled as a probability distribution over the set of words represented by the vocabulary ($w \in V$), and every document as a probability distribution over a set of topics ($\mathcal{T}$). These distributions are sampled from Dirichlet distributions.

As described by [4], the latent Dirichlet allocation model can be represented as a probabilistic generative process described by the following sequence of events:

1. Choose $S \sim \mathrm{Poisson}(\xi)$, which represents the number of words in a given message.
2. Choose $\theta \sim \mathrm{Dir}(\alpha)$.
3. For each $w_s \in \mathbf{w}$:
(a) Choose a topic $z_s \sim \mathrm{Multinomial}(\theta)$.
(b) Choose a word $w_s$ from $p(w_s \mid z_s, \beta)$, a multinomial probability conditioned on the topic $z_s$.

where the final set of topics $\mathcal{T}$ is built from the top $k$ topics $z_s$ of $n$ words, for which $k$ and $n$ must be defined a priori in the experimental setup.
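
The generative steps above can be sketched with the Python standard library alone. This is a toy simulation of the process, not an inference algorithm and not the authors' implementation: the vocabulary, the per-topic word probabilities `beta`, and the helper names are all invented for illustration.

```python
import math
import random


def sample_poisson(rng, lam):
    """Draw from Poisson(lam) with Knuth's method (adequate for small lam)."""
    limit, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


def sample_dirichlet(rng, alpha):
    """Draw from Dir(alpha) by normalizing independent Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def generate_message(rng, alpha, beta, vocab, xi):
    """Follow steps 1-3: beta[t][v] is the probability of word v under topic t."""
    n_words = sample_poisson(rng, xi)                    # step 1
    theta = sample_dirichlet(rng, alpha)                 # step 2
    topic_ids = list(range(len(alpha)))
    message = []
    for _ in range(n_words):                             # step 3
        z = rng.choices(topic_ids, weights=theta)[0]     # (a) topic for this word
        w = rng.choices(vocab, weights=beta[z])[0]       # (b) word given the topic
        message.append((z, w))
    return theta, message


# Hypothetical two-topic setup: topic 0 is about debt, topic 1 about warranties.
vocab = ["deuda", "pago", "garantia", "tecnico"]
beta = [[0.5, 0.5, 0.0, 0.0],
        [0.0, 0.0, 0.5, 0.5]]
theta, message = generate_message(random.Random(7), [0.5, 0.5], beta, vocab, xi=8)
```

Each generated word carries the topic that produced it, which is exactly the latent information that LDA inference has to recover when only the words are observed.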

For LDA, given the smoothing parameters $\beta$ and $\alpha$ and a topic mixture $\theta$, the idea is to determine the joint probability of generating, from a set of topics $\mathcal{T}$, a message composed of a set of $S$ words $\mathbf{w} = (w_1, \ldots, w_S)$:

$$ p(\theta, \mathbf{z}, \mathbf{w} \mid \alpha, \beta) = p(\theta \mid \alpha) \prod_{s=1}^{S} p(z_s \mid \theta)\, p(w_s \mid z_s, \beta) \qquad (2) $$

where $p(z_s \mid \theta)$ can be represented by the random variable $\theta_i$, such that topic $z_s$ is present in document $i$ ($z_s^i = 1$). A final expression can be deduced by integrating equation 2 over the random variable $\theta$ and summing over topics $z \in \mathcal{T}$. Given this, the marginal distribution of a message can be defined as follows:

$$ p(\mathbf{w} \mid \alpha, \beta) = \int p(\theta \mid \alpha) \left( \prod_{s=1}^{S} \sum_{z_s \in \mathcal{T}} p(z_s \mid \theta)\, p(w_s \mid z_s, \beta) \right) d\theta \qquad (3) $$

The final goal of LDA is to estimate the distributions described above in order to build a generative model for a given corpus of messages. Several methods have been developed for making inference over these probability distributions, such as variational expectation-maximization [4], a variational discrete approximation of equation 3 used empirically by [28], and a Gibbs sampling Markov chain Monte Carlo model [8], which has been efficiently implemented and applied by [19].

3.3 Network Configuration

To build the social network, the members' interaction must be taken into consideration. In general, a member's activity is tracked through their participation in the forum, and participation occurs when a member posts in the community. Because the activity of the VCoI is described by members' participation, the network is configured as follows: nodes are the VCoI members, and arcs represent interaction between them. How to link the members and how to measure their interactions to complete the network is our main concern.

In this work we describe two network representations of a VCoI, according to the following reply schemas of members:

1. Creator-oriented Network: When a member creates a thread, every reply is related to him/her.
2. Last Reply-oriented Network: Every reply in a thread is a response to the last post.

Figure 5 presents these two approaches to converting the forum into a network. In figure 5, arcs represent members' replies and nodes represent the users who made the posts. In our first approach, the weight of arcs will be a ...
LDA Method
The purpose of these models is to estimate a probability distribution.
In this way, sets of terms that occur together with high probability are found.
These are called Topics/Concepts.
LDA Results on the Complaints
Topic 1: Excessive interest. Improper charges. Listing in DICOM.
cuenta 0.031379, deuda 0.022933, XXXX 0.020842, pagar 0.020600, mes 0.018268, pago 0.016739, meses 0.012235, dicom 0.006282, fecha 0.006202, cobranza 0.006121, ... 40 more terms ...
Topic 2: Problems with the Warranty and the Extended Warranty
servicio 0.029304, tecnico 0.026330, garantia 0.026220, producto 0.024568, XXXX 0.023247, compre 0.018071, tienda 0.017741, cambio 0.015318, equipo 0.012675, problema 0.010363, falla 0.007830, ... 39 more terms ...
2. Find trends in the complaints
• 20 topics were obtained for the data of Retail A, B, C and D combined
Retail:             A      B      C      D
No. of comments:    310    709    242    1125
Word vector size:   6271   12199  6931   15934
Topics:             17     17     11     16
Intersection?
• COMPLAINTS ABOUT HIGH INTEREST CAUSED BY NON-PAYMENT OF THE CARD / LISTING IN DICOM.
• COMPLAINTS ABOUT ERRORS IN PUBLISHED ADVERTISING: THE DISCOUNT IS NOT APPLIED, OR THE DISCOUNT OR PROMOTION DOES NOT EXIST.
• COMPLAINTS ABOUT MISTREATMENT BY SECURITY GUARDS
3. How did people's complaints evolve? [6]
Fuzzy Clustering
Based on Nakanishi's rule for fuzzy composition [Nakanishi'93]
Let $R(U, V)$ and $S(V, W)$ be two fuzzy relations that share a common set $V$
Let $\mu_R(x, y)$ and $\mu_S(y, z)$ be their respective membership functions
Then:
$$ \mu_{R \circ S}(x, z) = \bigvee_{y} \{ \mu_R(x, y) \wedge \mu_S(y, z) \} $$
where $\bigvee$ is the bounded sum, $\mathrm{Sum}(a, b) = \min(1, a + b)$,
and $\wedge$ is the algebraic product, $\mathrm{prod}(a, b) = a\,b$
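
The composition rule above can be sketched as a small function. This is a minimal illustration that assumes both relations are given as nested lists of membership degrees in [0, 1]; the name `compose` and the example matrices are hypothetical. Since all terms are non-negative, applying the bounded sum repeatedly over $y$ is the same as clipping the total at 1.

```python
def compose(R, S):
    """Fuzzy composition with bounded sum (join) and algebraic product (meet).

    R is a |U| x |V| matrix and S is a |V| x |W| matrix of membership degrees.
    """
    inner, cols = len(S), len(S[0])
    return [
        [min(1.0, sum(row[y] * S[y][z] for y in range(inner))) for z in range(cols)]
        for row in R
    ]


# Invented membership degrees, for illustration only.
R = [[0.5, 0.5],
     [1.0, 0.0]]
S = [[0.5, 1.0],
     [0.5, 1.0]]
RS = compose(R, S)   # membership of each (x, z) pair in the composed relation
```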
Example 1
Example: Retail D, from 2006 to December 2010
TOPIC 1: COMPLAINT ABOUT POOR IMAGE QUALITY ON LCD TVS
TOPIC 2: COMPLAINT ABOUT DAMAGED BEDS/MATTRESSES
TOPIC 3: NO ANSWER FROM THE TELEPHONE SERVICE
This is also a good indicator of a complaint's impact over time
Example 2
TOPIC 3: NO ANSWER FROM THE TELEPHONE SERVICE
TOPIC 4: POOR CUSTOMER SERVICE OVER THE TELEPHONE.
TOPIC 6: COMPLAINT ABOUT SEIZURE OF ASSETS
TOPIC 7: COMPLAINT ABOUT FRAUD WITH RETAIL CARDS / LOST CARDS.
4. Identify the social networks affected by certain specific complaints
Topic Networks, by Complaint
TOPIC 3: NO ANSWER FROM THE TELEPHONE SERVICE
TOPIC 4: POOR CUSTOMER SERVICE OVER THE TELEPHONE.
TOPIC 7: COMPLAINT ABOUT FRAUD WITH RETAIL CARDS / LOST CARDS.
... and the social interactions
Initial complaint
Opinions/Comments

... will be the VCoP members, and arcs will represent interaction between them. How to link the members and how to measure their interactions to complete the network is our main concern.

Figure 1: A typical forum structure; the circles are the users who posted.

In this work we describe three network representations of a VCoP, according to the following reply schemas of members:

1. Creator-oriented Network: When a member creates a thread, every reply is related to him/her. This representation is the least dense network (density is measured in terms of the number of arcs that the network has).
2. Last Reply-oriented Network: Every reply in a thread is a response to the last post. This representation has a medium density.
3. All Previous-oriented Network: Every reply in a thread is a response to all posts that are already in that thread. This representation is the densest network.

Figure 2 presents these three approaches to representing the forum as a network. In figure 2, arcs represent members' replies and nodes represent the members who made the posts. In a traditional approach, the weight of an arc is a simple counter of how many times a given member replies to another one.
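
The three reply schemas can be sketched as follows, using the simple reply counter as the arc weight. This is an illustrative reading of the text, not the authors' Algorithm 3.1: a thread is assumed to be a list of author names in posting order, self-replies are skipped (an assumption made here), and the name `build_arcs` and the sample thread are hypothetical.

```python
from collections import Counter


def build_arcs(thread, mode):
    """Count replies as weighted arcs (replier, target) for one thread.

    thread: author names in posting order; thread[0] is the thread creator.
    mode: "creator", "last_reply" or "all_previous".
    """
    arcs = Counter()
    for pos in range(1, len(thread)):
        replier = thread[pos]
        if mode == "creator":
            targets = [thread[0]]          # every reply points at the creator
        elif mode == "last_reply":
            targets = [thread[pos - 1]]    # reply points at the previous post
        elif mode == "all_previous":
            targets = thread[:pos]         # reply points at every earlier post
        else:
            raise ValueError(f"unknown mode: {mode}")
        for target in targets:
            if target != replier:          # assumption: ignore self-replies
                arcs[(replier, target)] += 1
    return arcs


# Hypothetical thread: ana opens it, then ben, cara and ben again reply.
thread = ["ana", "ben", "cara", "ben"]
creator_net = build_arcs(thread, "creator")
last_reply_net = build_arcs(thread, "last_reply")
all_previous_net = build_arcs(thread, "all_previous")
```

On this thread the three schemas produce 2, 3 and 4 distinct arcs respectively, which matches the ordering from least dense to densest described in the text.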

In order to consider members' replies according to the community purpose (for any of these configurations), and to filter noisy posts, both concept-based and topic-based message reduction are performed.

Figure 2: Three different network models to represent a given forum interaction.

3.6. Concept-based & Topic-based Network Filtering

Previous work (Ríos et al., 2009) provides a method to evaluate the accomplishment of community goals. In this work we use this approach to classify the members' posts according to the VCoP's goals. These goals are defined as a set of terms, which are composed of a set of keywords or statements in natural language.

The idea is to compare two members' posts with the measure $d_m$ of Equation 7 (described as a Euclidean distance, although the formula has the form of a cosine similarity, so larger values indicate more similar posts). If $d_m$ reaches a certain threshold $\theta$, an interaction is considered between them. We support the idea that this will help us to avoid irrelevant interactions. For example, in a VCoP with $k$ goals (or topics), let $P_j$ be a post of user $j$ that is a reply to a post of user $i$ ($P_i$). The distance between them is calculated with Equation 7.

$$ d_m(P_i, P_j) = \frac{\sum_k g_{ik}\, g_{jk}}{\sqrt{\sum_k g_{ik}^2 \, \sum_k g_{jk}^2}} \qquad (7) $$

where $g_{ik}$ is the score of topic $k$ in the post of user $i$. It is clear that the distance exists only if $P_j$ is a reply to $P_i$. After that, the weight of arc $a_{i,j}$ is calculated according to equation 8.

$$ a_{i,j} = \sum_{i,j:\ d_m(P_i, P_j) \ge \theta} d_m(P_i, P_j) \qquad (8) $$

We used this weight in all three configurations previously described (Creator-oriented, Last Reply-oriented & All Previous-oriented). Afterwards, we applied HITS (Kleinberg, 1999) to find the key members in the different network configurations.
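
Equations 7 and 8 can be sketched together. This is a minimal illustration under stated assumptions: each reply is given as a tuple holding the two users and the topic-score vectors of their posts, the threshold value is arbitrary, and the names `topic_similarity` and `arc_weights` are hypothetical rather than taken from the paper.

```python
import math


def topic_similarity(g_i, g_j):
    """Equation 7: normalized dot product of two topic-score vectors."""
    dot = sum(a * b for a, b in zip(g_i, g_j))
    norm = math.sqrt(sum(a * a for a in g_i) * sum(b * b for b in g_j))
    return dot / norm if norm else 0.0


def arc_weights(replies, theta):
    """Equation 8: sum d_m over the replies whose d_m reaches the threshold.

    replies: tuples (user_i, user_j, g_i, g_j) where P_j is a reply to P_i.
    """
    weights = {}
    for user_i, user_j, g_i, g_j in replies:
        d = topic_similarity(g_i, g_j)
        if d >= theta:                      # drop replies that are off-topic
            key = (user_i, user_j)
            weights[key] = weights.get(key, 0.0) + d
    return weights


# Invented topic scores for three replies over two topics.
replies = [
    ("ana", "ben", [0.9, 0.1], [0.8, 0.2]),   # on-topic reply, kept
    ("ana", "ben", [1.0, 0.0], [0.0, 1.0]),   # unrelated reply, filtered out
    ("ben", "cara", [0.5, 0.5], [0.5, 0.5]),  # identical topic mix, kept
]
weights = arc_weights(replies, theta=0.5)
```

Compared with the plain reply counter, the filtered weight ignores replies whose topic mix has little in common with the post they answer.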

3.7. Network Construction

Algorithm 3.1 presents the pseudo-code showing how the graph $G_c = (N, A)$ is built using the Creator-oriented network, and algorithm 3.3 presents how it ...
Example 1
Retail D
TOPIC 3: NO ANSWER FROM THE TELEPHONE SERVICE
The impact of the complaints can be visualized easily
5. Connect and look for implicit relationships in disconnected networks
Retail D
TOPIC 3: NO ANSWER FROM THE TELEPHONE SERVICE
An implicit relationship exists
Merging Disconnected Networks
Retail D
TOPIC 3: NO ANSWER FROM THE TELEPHONE SERVICE
New network: formed by merging two groups
6. Measure the impact of the comments by analyzing these social networks
• Once the network has been obtained, it is possible to:
• Compute the nodes with the greatest impact on the global network, using algorithms such as HITS or PageRank
• Obtain the density of the network
• Compute in-degree and out-degree (incoming and outgoing arcs per node)
• Among others
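
The measures listed above can be sketched in plain Python on a small directed network. This is an illustrative sketch only: the HITS routine is a basic power iteration with a fixed number of rounds, density is taken as distinct arcs over possible ordered pairs, and the arc list and function names are invented for the example.

```python
import math
from collections import Counter


def hits(arcs, iterations=50):
    """Basic HITS power iteration over a list of (source, target) arcs."""
    nodes = sorted({n for arc in arcs for n in arc})
    hub = dict.fromkeys(nodes, 1.0)
    auth = dict.fromkeys(nodes, 1.0)
    for _ in range(iterations):
        # Authority: sum of hub scores of the nodes pointing at n.
        auth = {n: sum(hub[u] for u, v in arcs if v == n) for n in nodes}
        scale = math.sqrt(sum(a * a for a in auth.values())) or 1.0
        auth = {n: a / scale for n, a in auth.items()}
        # Hub: sum of authority scores of the nodes that n points at.
        hub = {n: sum(auth[v] for u, v in arcs if u == n) for n in nodes}
        scale = math.sqrt(sum(h * h for h in hub.values())) or 1.0
        hub = {n: h / scale for n, h in hub.items()}
    return hub, auth


def density(arcs):
    """Distinct arcs divided by the number of possible ordered node pairs."""
    nodes = {n for arc in arcs for n in arc}
    possible = len(nodes) * (len(nodes) - 1)
    return len(set(arcs)) / possible if possible else 0.0


# Hypothetical reply network: three members reply to ana, ana replies to ben.
arcs = [("ben", "ana"), ("cara", "ana"), ("dan", "ana"), ("ana", "ben")]
hub, auth = hits(arcs)
in_degree = Counter(target for _, target in arcs)
out_degree = Counter(source for source, _ in arcs)
key_member = max(auth, key=auth.get)
```

In this toy network the member who receives the most replies ends up with the highest authority score, which is the kind of node the slide describes as having the greatest impact.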
Conclusion
• From Social Web Sites it is possible to find information that can be used to improve the internal processes for providing goods, services, etc.
References
1) "Usuarios chilenos investigan en internet antes de comprar", news report, Cooperativa.cl, 10 Jan. 2010
2) "Reporte Soy Digital 2010", AyerViernes, www.facebook.com/ayerviernes
3) "Social Media", Veronica Peng
4) "Building Web Reputation Systems", Google Tech Talk, Randy Farmer, 1 Jul. 2010
5) "Chapter 9: Social Web Mining", Sebastián A. Ríos & Felipe Aguilera, book chapter in Advanced Techniques on Web Intelligence, Springer Verlag, 2010, 284 pp.
6) "Virtual Communities of Practice's Purpose Evolution Analysis Using a Concept-Based Mining Approach", Sebastián A. Ríos, Felipe Aguilera and Luis A. Guerrero, Knowledge-Based Intelligent Information and Engineering Systems - Part II, Lecture Notes in Computer Science, 2009, vol. 5712, pp. 480-489