DATA ANALYSIS

Visualizing Google Analytics Data with R for SEO Purposes

UTKU DEMİRHAN

07 Aug, 2017

Understanding SEO success through data is becoming more important every day. Although Google Analytics (GA from now on) offers some exploratory graphs, data analysts often turn to other tools to visualize GA data and get better insight into SEO performance. One reason is that GA's built-in visualizations are only a sneak peek when it comes to reporting: you can build some advanced graphs in GA, but you cannot export those reports. In this case, R packages like ggplot2 and plotly come in handy.

R and GA Data for Beginners

Before we start using the ggplot2 and plotly packages, I want to suggest this content to help you get started with R and access GA data via the API. If you don't have an Analytics property, don't worry: you can use this link to access the demo account provided by Google, so you don't need a website of your own to gather data. Even if you do have a website, you may not want to start with its data, because without enough traffic you may not get sufficient information from it, and that can keep you from testing your ideas. With these two links you can start doing your own analysis.

A Simple Visual with ggplot

Of course, the best way to understand improvements in search rankings is to look at organic search traffic. I used the organicSearches metric directly from the Dimensions & Metrics Explorer. This metric gives slightly different counts than segmenting Sessions by Organic Search inside GA, but for the sake of visualization it is just fine for now. In this tutorial, we will use a content-based website. First of all, we can create a simple graph with ggplot in R. Here is the code for it and the graph:

library(ggplot2)

# Scatter plot of daily organic searches over time
g <- ggplot(ga_data, aes(date, organicSearches)) + geom_point()
g

To make the graph nicer, we can:
- Rescale the y axis by dividing the values by 100,000.
- Put a title on top and center it.
- Add some transparency so crowded areas are easier to read.
- Change the color of the dots.

Here is the plot of organic searches for our content website again:

g <- ggplot(ga_data, aes(date, organicSearches / 100000)) +
  # Semi-transparent points so dense regions stay readable
  geom_point(color = "midnightblue", alpha = 0.5) +
  ggtitle("Organic Search Increase from 2014 to 2017") +
  labs(x = "Date", y = "Organic Searches (in 100,000s)") +
  theme(
    # Bold, centered title; green axis titles
    plot.title = element_text(size = 15, face = "bold", hjust = 0.5, family = "Arial"),
    axis.title.x = element_text(family = "Arial", color = "forestgreen", vjust = -0.35),
    axis.title.y = element_text(family = "Arial", color = "forestgreen", vjust = 0.35)
  )
g

The next thing we are going to do is check yearly SEO success. We want to see whether our content-based website performed better over the years. We are going to use plotly for this.

SEO Performance Year by Year

First of all, let me show you the first six rows of our data set.

> head(ga_data)
        date organicSearches
1 2013-06-18            9523
2 2013-06-19            9368
3 2013-06-20            8615
4 2013-06-21            8803
5 2013-06-22            7876
6 2013-06-23            6662

Here you can see that we have dates and organic search counts. Now we need to group the dates and sum the organic search counts by year. For that we need the lubridate package, co-authored by Hadley Wickham.

> library(lubridate)

With lubridate we can extract the year part of our dates with year(ga_data$date). Then, with this line of code,

> sum(ga_data[year(ga_data$date) == 2014,]$organicSearches)

we can sum the organic searches for 2014, and likewise for the other years.

years <- c("2014", "2015", "2016", "2017")
# Total organic searches for each year
counts <- c(sum(ga_data[year(ga_data$date) == 2014, ]$organicSearches),
            sum(ga_data[year(ga_data$date) == 2015, ]$organicSearches),
            sum(ga_data[year(ga_data$date) == 2016, ]$organicSearches),
            sum(ga_data[year(ga_data$date) == 2017, ]$organicSearches))

growth <- data.frame(years, counts)

> growth
  years    counts
1  2014  22510804
2  2015  83490723
3  2016 106399551
4  2017  75052130
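
As an alternative to writing one sum() call per year, the grouping can be done in a single step. The sketch below is one possible approach, not the method used above: it uses base R's format() and aggregate() instead of lubridate, and the small ga_sample data frame is made up purely for illustration (it only mimics the date and organicSearches columns of ga_data).

```r
# Made-up sample with the same columns as ga_data (illustration only)
ga_sample <- data.frame(
  date = as.Date(c("2014-12-30", "2014-12-31", "2015-01-01", "2015-01-02")),
  organicSearches = c(100, 200, 300, 400)
)

# Extract the year from each date as text, e.g. "2014"
# (base R stand-in for lubridate's year())
ga_sample$year <- format(ga_sample$date, "%Y")

# Sum organic searches within each year
yearly <- aggregate(organicSearches ~ year, data = ga_sample, FUN = sum)
yearly
```

Applied to the real ga_data, the same two steps would presumably also return a row for the partial year 2013, since the data starts in June 2013; the growth table above leaves that year out.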

With the growth data frame built from years and counts, we can plot the yearly totals with plotly:

library(plotly)

# Bar chart of yearly organic search totals
p <- plot_ly(growth, x = ~years, y = ~counts, type = 'bar', name = 'Sessions')
p


Unlike ggplot, plotly doesn't require any editing on top of the basic graph: it automatically labels the y axis in millions, and the chart is interactive.
Here we can see that this content website grew to roughly 3.7 times its 2014 total in 2015 (from about 22.5M to 83.5M). Growth continued in 2016 with about 23M more. As of August 2017, it looks like the growth will continue and the year will end at around 125M, roughly 20M more than the previous year (or more?).
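
The year-over-year changes can be checked directly from the yearly totals. This is a small sketch using the numbers printed in the growth table above; the column names abs_change and pct_change are hypothetical, chosen only for illustration.

```r
# Yearly totals as printed in the growth table
growth <- data.frame(
  years  = c("2014", "2015", "2016", "2017"),
  counts = c(22510804, 83490723, 106399551, 75052130)
)

# Absolute change versus the previous year (NA for the first year)
growth$abs_change <- c(NA, diff(growth$counts))

# Percentage change relative to the previous year's total, rounded to one decimal
previous <- c(NA, head(growth$counts, -1))
growth$pct_change <- round(100 * growth$abs_change / previous, 1)

growth
```

The 2017 row comes out negative only because the year is incomplete at the time of writing, so it should not be read as a decline.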

Monthly Organic Traffic

We might also wonder about monthly traffic. The question here is whether we will see seasonality in the data over the years. From that, we may be able to comment on the missing part of 2017, which is still in the future. We can also see the development of the website on a monthly basis.
Getting the monthly data easily requires a little coding skill. You can take the monthly totals for every year, again using lubridate:

> month(ga_data$date)

gives the month of the year.
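
One way to get the monthly totals for every year is sketched below. It is a minimal example under stated assumptions, not necessarily the code behind the chart that follows: it uses base R's format() in place of lubridate's year() and month(), the helper name monthly_totals is hypothetical, and ga_sample is a made-up data frame with one search per day so that the totals are easy to verify. The plotting step assumes plotly is installed and is skipped otherwise.

```r
# Sum organic searches by year and month.
# Expects a data frame with a Date column `date` and a numeric column `organicSearches`.
monthly_totals <- function(data) {
  data$year  <- format(data$date, "%Y")  # e.g. "2016"
  data$month <- format(data$date, "%m")  # e.g. "02"
  aggregate(organicSearches ~ year + month, data = data, FUN = sum)
}

# Made-up sample: one organic search per day,
# so each monthly total equals the number of days in that month
ga_sample <- data.frame(
  date = seq(as.Date("2016-01-01"), as.Date("2017-02-28"), by = "day")
)
ga_sample$organicSearches <- 1

monthly <- monthly_totals(ga_sample)
head(monthly)

# Grouped bar chart: months on the x axis, one bar color per year
if (requireNamespace("plotly", quietly = TRUE)) {
  p <- plotly::plot_ly(monthly, x = ~month, y = ~organicSearches,
                       color = ~year, type = "bar")
  p
}
```

Putting the year in the color argument places the same month of different years side by side, which is what makes a seasonal pattern easy to spot if there is one.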
After visualizing the monthly traffic data:

I don't see any seasonality in the monthly data. This is most probably because the website grew very quickly in 2014 and 2015. In 2016 and 2017 there are some increases and decreases, but it is hard to tell.

For the missing part of 2017, we can perhaps say that there will be some increase in August. It looks like traffic will stay around the 10M line in the upcoming months.
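
As a rough consistency check, this 10M-per-month reading lines up with the 125M year-end figure mentioned earlier. The sketch below assumes five remaining months at a flat 10M each; both the flat rate and the number of remaining months are simplifying assumptions for illustration, not a forecast model.

```r
so_far_2017      <- 75052130   # 2017 total so far, from the growth table
monthly_estimate <- 10e6       # assumption: flat monthly level read off the chart
months_remaining <- 5          # assumption: roughly August through December
total_2016       <- 106399551  # 2016 total, from the growth table

# Naive year-end projection: total so far plus a flat amount per remaining month
projected_2017 <- so_far_2017 + months_remaining * monthly_estimate
projected_2017

# Projected gain over the 2016 total
projected_2017 - total_2016
```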

Finally, it is clear that this website has grown over the months and that the growth is still going on in 2017.
It is also possible to get a sneak peek of the monthly data in GA: you can compare two years on a monthly basis with the Organic Searches segment turned on. But it is not possible to get a bar chart comparing years and months in the same graph.

This is why tools like RStudio and packages like ggplot2 and plotly are very useful for GA data visualization.