You want to start learning about Data Science but the information you find online makes you feel like:
This tutorial will guide you to install and set up R and Rstudio in different platforms. Remember to always install R first and then Rstudio to avoid issues. Follow the steps below and in case of questions don’t hesitate on using the #R channel in Slack.
NOTE: R is the programming languange and Rstudio is an integrated development enviroment to interact with it.
First download R from its official repository and select Linux/Mac/Windows according to your system. For the case of Windows computers click on the base version. Check that the version you are downloading is higher than 3.5. Don’t move to the next step until R is sucessfully installed.
Rstudio can be found in the download section of the official website.
Both R and Rstudio are available through “Softwareshoppen”. To find R click on R under the search bar and scroll down, it should be placed quite on top:
Rstudio is easily accessible by using the search bar.
A lot of software is already available at Computerome and can be loaded as modules. R and Rstudio are already installed in the system but some dependencies are required. Both can be loaded by running the following commands on a terminal:
module load tools usage_script/1.0
module load intel/compiler/64
module load intel/redist/2019_update2
module load R/3.5.0
module load rstudio/1.1.463
These commands can be added to the profile configuration by editing the ~/.bash_profile
file in the home directory and adding the previous lines to the end of the file. If the file does not exist it can be created. For more information on how to do edit this file check this answer. After editing the file, restarting the conection with Computerome should load the new profile. Rstudio should be then available by typing rstudio
in the terminal.
If you are familiar with virtual enviroments you might prefer having your personalised enviroment. This can be done by loading the Anaconda module and installing the enviroment in a specific location (change user
by your username):
module load tools
module load anaconda3/4.4.0
conda create --prefix /home/people/user/prog_env anaconda r=3.5.1 r-base rstudio
After installation process the enviroment can be loaded by using:
module load anaconda3/4.4.0; source activate /home/people/user/prog_env
To make it easier to load the same command can be added as an alias to the ~/.bash_profile
. This would allow to just type lprogram (or any prefered name) to load the enviroment:
alias lprogram='module load anaconda3/4.4.0; source activate /home/people/user/prog_env'
Open Rstudio and admire your new portal to the future:
Check in the left panel that the R version you installed is correctly loaded. To familiarise yourself with Rstudio check the following video:
Don’t get bored and start programming with a nice hacker vibe…
Rstudio is highly customizable. Go to Tools > Global Options...
and click in the Appearance
tab. Here you can select better aesthetics by choosing one of the many themes. It is also recommended to go to the Code
tab, then Display
and mark the [ ] Highlight R functions calls
option.
R is a open-source project with a worldwide community of developers that contribute with very useful libraries. Now it is the time to install and load the one of the main libraries that we will use through the workshops. We will start by installing the tidyverse
library by copying the following line in the Console
panel after the >
:
install.packages("tidyverse")
Hit Enter and the download and installation process should start. When finished, load the library by executing:
library("tidyverse")
## Registered S3 methods overwritten by 'ggplot2':
## method from
## [.quosures rlang
## c.quosures rlang
## print.quosures rlang
## -- Attaching packages ------------------------------------------------------------------------------------------- tidyverse 1.2.1 --
## v ggplot2 3.1.1 v purrr 0.3.2
## v tibble 2.1.3 v dplyr 0.8.1
## v tidyr 0.8.3 v stringr 1.4.0
## v readr 1.3.1 v forcats 0.4.0
## -- Conflicts ---------------------------------------------------------------------------------------------- tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
Some other libraries that might be useful to install are:
install.packages("openxlsx", "psych")
The main purpose of the following workshops will be to build the conceptual framework that would help you to learn and analyse data with the help of “cheat sheets” that sumarise most of the important information and functions. For the case of Rstudio you can check the following:
For more detailed explanations of R and Rstudio or simply another take of the steps described below please check the following guides.