Text Analytics
Introduction
The main idea is to count words, find correlated words or
- Convert all words to a single case, either lowercase or uppercase
- Remove punctuation
Beware that case and punctuation sometimes carry meaning, so decide whether to strip them based on the specific problem.
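The two normalization steps above can be sketched in base R. The sample sentence and variable names are made up for illustration. Note how stripping punctuation turns "It's" into "its", which is the kind of information loss the caveat warns about.

```r
# Hypothetical sample text, for illustration only
raw <- "Hello, World! It's a GREAT day."

# Step 1: convert everything to lowercase
lowered <- tolower(raw)

# Step 2: remove punctuation using the POSIX [:punct:] character class
cleaned <- gsub("[[:punct:]]", "", lowered)

print(cleaned)
```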
Words like argue and argued, which are different grammatical forms of the same word, can be represented by a common stem, argu.
There are several ways to find the stem, such as looking words up in a database or applying a rule-based algorithm.
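As a rough illustration of the rule-based approach, the sketch below uses the Porter stemmer from the SnowballC package. The choice of SnowballC is an assumption here, not necessarily the tool the course uses, and the word list is made up.

```r
library(SnowballC)  # assumed to be installed; provides wordStem()

# Different grammatical forms of the same word (illustrative list)
words <- c("argue", "argued", "argues", "arguing")

# Apply the rule-based Porter stemming algorithm to each word
stems <- wordStem(words, language = "porter")

print(data.frame(word = words, stem = stems))
```

Each form should reduce to the same stem, argu, so later counts treat them as one word.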
Our course focuses on using Syuzhet for sentiment analysis. Sentiment is the emotion or attitude expressed by a person or an organization. Sentiment analysis is often used in social media, marketing and advertising to understand how people feel about something.
Textual analysis in R
First, install the required packages (those recommended for the course):
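A minimal installation sketch follows. Only syuzhet is named in these notes; tm and SnowballC are assumptions added for the cleaning and stemming steps, so adjust the list to whatever the course actually recommends.

```r
# Assumed package set; only syuzhet is named in the notes
pkgs <- c("tm", "SnowballC", "syuzhet")

# Install only the packages that are not already present
to_install <- setdiff(pkgs, rownames(installed.packages()))
if (length(to_install) > 0) {
  install.packages(to_install)
}
```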
Clean up the text:
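One possible cleaning pipeline, sketched with the tm package (an assumption; the course may use a different package). The two sample documents are invented, and the order of the steps is a common convention rather than a requirement.

```r
library(tm)  # assumed to be installed

# Hypothetical sample documents
docs <- c("The Quick brown fox, jumped over 2 lazy dogs!",
          "Text analytics is FUN; cleaning text is not.")

# Wrap the character vector in a corpus object
corpus <- VCorpus(VectorSource(docs))

# Apply the cleaning steps one at a time
corpus <- tm_map(corpus, content_transformer(tolower))       # lowercase
corpus <- tm_map(corpus, removePunctuation)                  # drop punctuation
corpus <- tm_map(corpus, removeNumbers)                      # drop digits
corpus <- tm_map(corpus, removeWords, stopwords("english"))  # drop stop words
corpus <- tm_map(corpus, stripWhitespace)                    # collapse spaces

# Pull the cleaned text back out as a character vector
cleaned <- sapply(corpus, as.character)
print(cleaned)
```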
Word frequency analysis
Calculate word frequency:
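A minimal base R sketch of a word count, assuming the text has already been cleaned. The sample documents and variable names are illustrative.

```r
# Hypothetical, already-cleaned text (lowercase, no punctuation)
docs <- c("the cat sat on the mat", "the dog sat on the log")

# Split each document on whitespace and pool all tokens
tokens <- unlist(strsplit(docs, "\\s+"))

# Count occurrences of each distinct word, most frequent first
freq <- sort(table(tokens), decreasing = TRUE)

print(head(freq, 3))
```

Here "the" appears four times and tops the table, which shows why stop words are often removed before counting.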
Visualization:
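A simple way to visualize word counts is a bar chart drawn with base R graphics. The counts below are made-up values standing in for the output of a frequency table.

```r
# Hypothetical word counts, e.g. the result of a frequency table
freq <- c(the = 4, sat = 2, on = 2, cat = 1, dog = 1)

# Draw one bar per word; barplot() returns the bar midpoints
bar_mid <- barplot(freq,
                   las = 2,                        # rotate axis labels
                   col = "steelblue",
                   main = "Most frequent words",
                   ylab = "Count")
```

A word cloud is a common alternative for larger vocabularies, for example via the wordcloud package, if the course uses it.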
Sentiment analysis:
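A short sketch of sentence-level sentiment scoring with the syuzhet package. The sample text is invented, and the functions shown are a small selection chosen for illustration.

```r
library(syuzhet)  # assumed to be installed

# Hypothetical sample text
text <- "I love this course. The homework was terrible. The final lecture was wonderful!"

# Split the text into sentences
sentences <- get_sentences(text)

# Score each sentence with the default syuzhet lexicon
# (positive values suggest positive sentiment, negative values the opposite)
scores <- get_sentiment(sentences, method = "syuzhet")

# Count NRC emotion categories (anger, joy, trust, ...) per sentence
emotions <- get_nrc_sentiment(sentences)

print(data.frame(sentence = sentences, score = scores))
print(emotions)
```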
The following are the four main methods of the Syuzhet package: