Connecting to PostgreSQL in R (Google Cloud)
In some cases, you want to make sure you are updating your data tables in R from one online database source. Or you want to overwrite or add new data in your table from R to your database. In those cases, you want to make sure you connected to your database first so that you can communicate.
Here, we are trying to connect to our database by PostgreSQL from R.
First thing first! You need to include libraries that you might want to use for connection:
library(odbc)
library(RPostgreSQL)
library(getPass)
library(DBI)
Then you can start connection:
pgdrv <- dbDriver(drvName = "PostgreSQL")
db <-DBI::dbConnect(pgdrv,dbname="database.name",host="host.name",
port=5432, user = 'postgres',password = getPass("Enter the password:"))
If you want to skip the part of asking for password and just enter directly to the database, you can use this one:
pgdrv <- dbDriver(drvName = "PostgreSQL")
db <-DBI::dbConnect(pgdrv,dbname="database.name",host="host.name",
port=5432,user = 'postgres',password = 'your.password')
Now your connection is set! Start communicating.
You can get the tables from your data base / just observe the tables:
DBI::dbGetQuery(db,"SELECT * FROM text_table")
You can overwrite the current table in database:
DBI::dbWriteTable(db, "text_table", new_table,
row.names = pkgconfig::get_config("RSQLite::row.names.table", FALSE),
overwrite = TRUE, append=FALSE)
You can add/append data table in database:
DBI::dbWriteTable(db, "text_table", new_df,
row.names = pkgconfig::get_config("RSQLite::row.names.table", FALSE),
overwrite = FALSE, append=TRUE)
And finally, most importantly, disconnecting from the database when you finish working with:
DBI::dbDisconnect(db)
As you can see it is more about the flags that you use in function for different purposes. So, it is better to read about the library and the variables in function so that you get to know how to use them for your purposes.
I have 2 sources for this case RSQLite and DBI.
Creating PDF file out of multiple plots in R
Sometimes you are generating multiple graphs for analyzing and you want to keep them in one single file.
For this you need to use the following function for collect all the plots and merge them in one single PDF file.
Multiple plot function # ggplot objects can be passed in …, or to plotlist (as a list of ggplot objects) - cols: Number of columns in layout - layout: A matrix specifying the layout. If present, 'cols' is ignored. # If the layout is something like matrix(c(1,2,3,3), nrow=2, byrow=TRUE), then plot 1 will go in the upper left, 2 will go in the upper right, and 3 will go all the way across the bottom. #
multiplot <- function(…, plotlist=NULL, file, cols=1, layout=NULL) { library(grid) # Make a list from the … arguments and plotlist plots <- c(list(…), plotlist) numPlots = length(plots) # If layout is NULL, then use 'cols' to determine layout if (is.null(layout)) { # Make the panel # ncol: Number of columns of plots # nrow: Number of rows needed, calculated from # of cols layout <- matrix(seq(1, cols * ceiling(numPlots/cols)), ncol = cols, nrow = ceiling(numPlots/cols)) } if (numPlots==1) { print(plots[[1]]) } else { # Set up the page grid.newpage() pushViewport(viewport(layout = grid.layout(nrow(layout), ncol(layout)))) # Make each plot, in the correct location for (i in 1:numPlots) { # Get the i,j matrix positions of the regions that contain this subplot matchidx <- as.data.frame(which(layout == i, arr.ind = TRUE)) print(plots[[i]], vp = viewport(layout.pos.row = matchidx$row, layout.pos.col = matchidx$col)) } } }
Now, you are probably using a loop for generating each plot. By using the function above you can make it very easy to gather the plots and put them in one single PDF file. Look at the following example:
pdf("total_plots.pdf")
for(data in mydata){
models <- lm(data$column1 ~ data$column2 , data = data)
plot <- scatter.smooth(data$column2 , data$column1, col=data$column1)
multiplot(plot)
}
dev.off()
There you can get your final result in the “total_plots.pdf” file.
Nest in R
Ever wonder how you can gather and attach multiple columns to one metric. Well, nest makes that happen for you. With Nest, you can create a data frame containing all the nested variables.
Below, you can see an example of using nest.
#creating a nest
nested_data <- mydata %>%
arrange(column1, column2, desc(date_column)) %>%
group_by(column1, column2) %>%
nest()
A quick intro to different methods of ML for prediction in R
- Supervised Learning: Here we have a target that we want to predict it from a given set of predictors.
- Unsupervised Learning: Here we are using this for the clustering population in different groups.
- Reinforcement Learning: Here we want to train the machine to make specific decisions.
There are different ML algorithms as well! Such as Linear Regression, Logistic Regression, Decision Tree, SVM, Naive Bayes, kNN, K-Means, Random Forest and etc..
Linear Regression:
Linear regression is used to predict the value of an outcome variable Y based on one or more input predictor variables X. The aim of linear regression is to model a continuous variable Y as a mathematical function of one or more X variable(s), so that we can use this regression model to predict the Y when only the X is known.
The mathematical equation is as follows:
Y = β1 + β2X + ϵ
-where, β1 is the intercept and β2 is the slope. Collectively, they are called regression coefficients. ϵ is the error term, the part of Y the regression model is unable to explain.

Graphical Analysis
Before doing anything, it is better to analyze and understand the variables. So, we will use some plots to figure them out. Below, you can find some of the good plots that would help you in that case:
- Scatter plot: Visualize the linear relationship between the predictor and response
- Box plot: To spot any outlier observations in the variable. Having outliers in your predictor can drastically affect the predictions as they can easily affect the direction/slope of the line of best fit.
- Density plot: To see the distribution of the predictor variable. Ideally, a close to a normal distribution (a bell-shaped curve), without being skewed to the left or right is preferred. Let us see how to make each one of them.
There are a lot of other ways to visualize your data and make a plot and understand the relationship between them. One way that gives you potentially a nice plot, is the “ggplot”. I will show you some examples of using it later.