Essec\Faculty\Model\Contribution {#2233
#_index: "academ_contributions"
#_id: "12431"
#_source: array:26 [
"id" => "12431"
"slug" => "12431-completely-unsupervised-opinion-mining-in-online-professional-groups-discussions"
"yearMonth" => "2021-07"
"year" => "2021"
"title" => "Completely Unsupervised Opinion Mining in online (professional groups’) discussions"
"description" => "MANSOURI, J., CAVARRETTA, F., SWAILEH, W. et KOTZINOS, D. (2021). Completely Unsupervised Opinion Mining in online (professional groups’) discussions. Dans: Network 2021 (Sunbelt & Netsci joint conference). Indiana University.
"
"authors" => array:4 [
0 => array:3 [
"name" => "CAVARRETTA Fabrice"
"bid" => "B00119673"
"slug" => "cavarretta-fabrice"
]
1 => array:1 [
"name" => "MANSOURI Jafar"
]
2 => array:1 [
"name" => "SWAILEH Wassim"
]
3 => array:1 [
"name" => "KOTZINOS Dimitris"
]
]
"ouvrage" => "Network 2021 (Sunbelt & Netsci joint conference)"
"keywords" => []
"updatedAt" => "2022-06-03 14:59:48"
"publicationUrl" => null
"publicationInfo" => array:3 [
"pages" => ""
"volume" => ""
"number" => ""
]
"type" => array:2 [
"fr" => "Communications dans une conférence"
"en" => "Presentations at an Academic or Professional conference"
]
"support_type" => array:2 [
"fr" => null
"en" => null
]
"countries" => array:2 [
"fr" => null
"en" => null
]
"abstract" => array:2 [
"fr" => ""
"en" => """
The explosion of online discussions in different types of social media provides us with a large corpus of continuous text\n
exchanges over a variety of different topics. Trying to automatically extract and mine those opinions brings up two distinct\n
but highly related problems: (i) the need to identify relevant posts (i.e., to classify posts as relevant or not to the subject of\n
interest) and (ii) the need to extract opinions from those posts and subsequently reclassify them into different classes in order to assess,\n
e.g., the importance of the different subjects/opinions. Many works rely on supervised classification methods [1], which\n
means that an already labeled dataset has been provided to the method and used to train a Machine Learning classifier.\n
These methods suffer from inherent bias, i.e., the quality of the classification can be biased by the labeling. For various types of\n
studies, this prohibits the use of supervised methods. In this paper, we propose the completely unsupervised extraction of\n
opinions of a specific professional group based on the posts on the social media platform Twitter. We want to focus on\n
opinions related to the professional activity of the group, here entrepreneurs, but the methodology\n
described can be applied to any professional group. We used the Tweepy API [4] to collect tweets in the English language\n
and we defined the groups of interest based on the self-descriptions of the users on their profiles (self-labeled as\n
“entrepreneurs”). We collected about 47M tweets from about 24K users/entrepreneurs and around 53M tweets from 38K\n
users/general public (with the requirement not to have the above keywords on their profile), dating from September 2020\n
and earlier. The public set plays the role of a control group, representing the topics of the general discussions.\n
The proposed method eliminates the need for a pre-labeled training set for classifying tweets as relevant or not and allows us\n
to work in an unsupervised manner and avoid bias. We rely on the fact that specific words or combinations of words\n
can usually be used to discriminate between two sets of texts when they appear frequently in one set of texts and infrequently in\n
the other set of texts. So, for each set of tweets for the entrepreneurs (ENT) and public (PUB), we find words and\n
two-word combinations in tweets and their frequencies. Here, frequency means how many users in the ENT set and,\n
respectively, in the PUB set have used a given word or two-word combination in their tweets. For each user, each word\n
or combination is counted only once. Additionally, for each set, we calculate weights:
"""
]
"authors_fields" => array:2 [
"fr" => "Management"
"en" => "Management"
]
"indexedAt" => "2025-04-12T11:21:40.000Z"
"docTitle" => "Completely Unsupervised Opinion Mining in online (professional groups’) discussions"
"docSurtitle" => "Communications dans une conférence"
"authorNames" => "<a href="/cv/cavarretta-fabrice">CAVARRETTA Fabrice</a>, MANSOURI Jafar, SWAILEH Wassim, KOTZINOS Dimitris
"
"docDescription" => "<span class="document-property-authors">CAVARRETTA Fabrice, MANSOURI Jafar, SWAILEH Wassim, KOTZINOS Dimitris</span><br><span class="document-property-authors_fields">Management</span> | <span class="document-property-year">2021</span>
"
"keywordList" => ""
"docPreview" => "<b>Completely Unsupervised Opinion Mining in online (professional groups’) discussions</b><br><span>2021-07 | Communications dans une conférence </span>
"
"docType" => "research"
"publicationLink" => "<a href="#" target="_blank">Completely Unsupervised Opinion Mining in online (professional groups’) discussions</a>
"
]
+lang: "fr"
+"_type": "_doc"
+"_score": 8.631021
+"parent": null
}
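
The user-level frequency counting described in the "abstract" field of this record can be illustrated with a minimal sketch. It is not the authors' implementation: the tokenizer, the function names (`tokenize`, `user_frequencies`), and the toy data are hypothetical, and "combination of two words" is assumed here to mean an unordered pair of distinct words occurring in the same tweet. The weights mentioned at the end of the abstract are not defined in this record, so they are not computed.

```python
import re
from collections import Counter
from itertools import combinations


def tokenize(tweet):
    # Hypothetical tokenizer: lowercase alphanumeric runs (assumption).
    return re.findall(r"[a-z0-9']+", tweet.lower())


def user_frequencies(tweets_by_user):
    """Count, for each word and two-word combination, how many users used it.

    Each word or combination is counted at most once per user, as the
    abstract describes. Pairs are unordered and taken within one tweet
    (an assumption, since the record does not define them precisely).
    """
    freq = Counter()
    for tweets in tweets_by_user.values():
        seen = set()  # everything this user has used at least once
        for tweet in tweets:
            words = sorted(set(tokenize(tweet)))
            seen.update(words)
            seen.update(combinations(words, 2))
        freq.update(seen)  # +1 per user for each item seen
    return freq


# Toy data standing in for the ENT and PUB sets (invented for illustration).
ent = {
    "u1": ["Pitching my startup today", "startup life"],
    "u2": ["My startup raised funding"],
}
pub = {
    "u3": ["Watching the game today"],
    "u4": ["Coffee today"],
}

ent_freq = user_frequencies(ent)
pub_freq = user_frequencies(pub)
```

In this toy data, "startup" is used by both ENT users and by no PUB user, which is the kind of contrast the abstract relies on to separate the two sets.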