R programming for grad school

Akila Wijerathna-Yapa
Aug 18, 2020

R is a free and powerful programming language for statistical computing and data visualization. If you are not a programmer and your postgraduate degree suddenly requires you to learn data analysis and statistics, you have probably felt totally paralyzed and overwhelmed at first. That is O.K. Your discipline may be biology, life sciences, humanities, or commerce, and anyone who has to learn something unfamiliar may feel that way.

Let’s see how to get started with R programming. These are my favorite steps for learning it.

I. Download R and RStudio

R is a programming language for statistical computing, while RStudio is a free, open-source IDE (integrated development environment) that uses the R language to develop statistical programs. R can be used without RStudio, but RStudio cannot be used without R.

To make life easier, I recommend installing R first, and then RStudio.

  1. R can be downloaded and installed on Windows, macOS, and Linux from the Comprehensive R Archive Network (CRAN) webpage (http://cran.r-project.org/).
  2. After installing R, install RStudio, available at http://www.rstudio.com/products/RStudio/.

R program logo and RStudio logo

RStudio is a four-pane workspace for 1) creating a file containing an R script, 2) typing R commands, 3) viewing command histories, 4) viewing plots, and more.

  1. Top-left panel: Source/Code editor, where you create and open files containing R scripts. The R script is where you keep a record of your work. Create one with File –> New File –> R Script.
  2. Bottom-left panel: R console for typing R commands
  3. Top-right panel:
  • Environment tab (called Workspace in older RStudio versions): shows the list of R objects you created during your R session
  • History tab: shows the history of all previous commands
  4. Bottom-right panel:
  • Files tab: shows the files in your working directory
  • Plots tab: shows the history of plots you created. From this tab, you can export a plot to a PDF or image file
  • Packages tab: shows the external R packages available on your system. If a package is checked, it is loaded in R.

RStudio four panels

-1- Console

  • Keeps a record of commands issued
  • Error messages appear here
  • Can type code here directly for quick feedback

-2- New

  • Create a new R script

-3- Editor

  • Where to write code for re-use
  • Best place to write code

-4- Run Button

  • Allows you to select and run chunks of code from the Editor
  • To use: highlight code in the Editor, then select “Run” (or press Ctrl + Enter; Cmd + Enter on macOS)

-5- Environment

  • See all variables or data sets you have saved
  • Click on a variable to get more information about it

-6- Plots

  • History of any plots created
  • Type plot(1:10) to see a basic plot

-7- Help

  • More information about any function you would like to use

-8- Packages

  • List of additional sets of functions to bring into workspace
  • Where to select, install, or update packages
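
To get a feel for how these panels respond, here is a minimal sketch of a first Console session. The object name `scores` and its values are made up for illustration.

```r
# Create an object; in RStudio it shows up in the Environment tab.
scores <- c(72, 85, 90, 64)

# Compute the average of the four values.
mean(scores)

# List the objects that exist in the current session.
ls()

# Draw a basic plot; in RStudio it appears in the Plots tab.
plot(1:10)

# Run interactively to open the help page for mean() in the Help tab:
# ?mean
```

Typing these lines in the Editor and running them with the Run button gives the same result as typing them directly into the Console.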

II. Learn how to import data to R (now you may feel yeah!)

Set your working directory first
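
Setting the working directory can be sketched as follows. The folder name `my_analysis` is hypothetical and is created inside a temporary directory so the example runs anywhere; in practice you would point `setwd()` at your own project folder.

```r
# Show where R is currently looking for files.
getwd()

# Hypothetical project folder, created in a temporary location for this example.
project_dir <- file.path(tempdir(), "my_analysis")
dir.create(project_dir, showWarnings = FALSE)

# Switch to the project folder; setwd() returns the previous directory.
old_dir <- setwd(project_dir)
getwd()

# List the files R can now see without a full path (empty for a new folder).
list.files()

# Go back to where we started.
setwd(old_dir)
```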

  • Base R functions for importing data: read.table(), read.delim(), read.csv(), read.csv2()
  • readr functions for reading txt|csv files (readr is a tidyverse package, also available through RStudio's Import Dataset menu): read_delim(), read_tsv(), read_csv(), read_csv2()
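
As a rough illustration of the functions listed above, the sketch below writes a tiny made-up CSV file and reads it back. The file name, column names, and values are invented for the example, and the readr part only runs if that package is installed.

```r
# Write a small example file so the sketch is self-contained.
csv_path <- file.path(tempdir(), "samples.csv")
writeLines(c(
  "sample,group,height_cm",
  "s1,control,12.5",
  "s2,treated,15.1",
  "s3,treated,14.8"
), csv_path)

# Base R: read a comma-separated file into a data frame.
plants <- read.csv(csv_path, stringsAsFactors = FALSE)
str(plants)
head(plants)

# Base R: the same data as a tab-separated file, read with read.delim().
tsv_path <- file.path(tempdir(), "samples.tsv")
write.table(plants, tsv_path, sep = "\t", row.names = FALSE, quote = FALSE)
plants_tsv <- read.delim(tsv_path, stringsAsFactors = FALSE)

# readr: read_csv() returns a tibble; skipped if readr is not installed.
if (requireNamespace("readr", quietly = TRUE)) {
  plants_tbl <- readr::read_csv(csv_path, col_types = readr::cols())
  print(plants_tbl)
}
```

read.csv2() and read_csv2() work the same way but expect semicolon-separated files with a comma as the decimal mark.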

This is a great guide from Dr. Alboukadel Kassambara, who generously shares his knowledge and experience on his website:

http://www.sthda.com/english/wiki/importing-data-into-r

III. Learn data types in R

First, get to know the common variable types and data types that you’ll be working with in R. Many errors come from using the wrong variable or data type.

R has many different data structures. We can put data classes together in many different ways:

  • vector
  • matrix
  • array
  • factor
  • data frame
  • list

When manipulating data objects, keep in mind:

  • Data objects have different dimensions
  • Data objects can be homogeneous (one data type) or heterogeneous (mixed data types)
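
The structures above can be sketched in a few lines. All object names and values here are made up for illustration; the last part shows one common way a wrong data type sneaks in.

```r
v  <- c(1.5, 2.0, 3.5)                             # vector: one dimension, one type
m  <- matrix(1:6, nrow = 2)                        # matrix: 2 rows x 3 columns, one type
a  <- array(1:24, dim = c(2, 3, 4))                # array: three dimensions, one type
f  <- factor(c("control", "treated", "control"))   # factor: categorical values
df <- data.frame(id = 1:3, group = f, value = v)   # data frame: columns may differ in type
l  <- list(numbers = v, label = "run 1", table = df)  # list: can hold anything

# Inspect type and dimensions.
class(v)
dim(m)
dim(a)
levels(f)
str(df)

# A common source of errors: one quoted value turns the whole vector into text.
mixed <- c(1, "2", 3)
class(mixed)          # "character"
# sum(mixed) would fail because the values are text, not numbers.
sum(as.numeric(mixed))  # convert first, then the sum works
```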

IV. Learn basic R first! Just a few tutorials!

There are now great free online courses and YouTube videos you can use to get familiar with R. If possible, I would also recommend attending a workshop (e.g. DataBazaar, Data Carpentry). Here are a few resources.

V. Now jump to the famous five

The tidyverse is an opinionated collection of R packages designed for data science. All packages share an underlying design philosophy, grammar, and data structures.

Install the complete tidyverse with:

install.packages("tidyverse")

R Tidyverse is a set of packages

Learn the tidyverse: See how the tidyverse makes data science faster, easier and more fun with “R for Data Science”. Read it online, buy the book or try another resource from the community.

Free ebook to learn the tidyverse: https://r4ds.had.co.nz
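
To give a taste of the shared grammar, here is a small sketch using two tidyverse packages, dplyr and ggplot2, on the built-in mtcars dataset. The choice of dataset, the hp > 100 cut-off, and the variable names are arbitrary choices for the example; each part is skipped if the package is not installed.

```r
if (requireNamespace("dplyr", quietly = TRUE)) {
  library(dplyr)

  # Keep cars with more than 100 horsepower, then summarise by cylinder count.
  mpg_by_cyl <- mtcars %>%
    filter(hp > 100) %>%
    group_by(cyl) %>%
    summarise(mean_mpg = mean(mpg), n = n())

  print(mpg_by_cyl)
}

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)

  # Scatter plot of car weight against fuel economy.
  p <- ggplot(mtcars, aes(x = wt, y = mpg)) +
    geom_point()

  print(p)
}
```

Each step in the pipe takes a data frame and returns a data frame, which is what lets the verbs chain together.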

VI. Exploratory data analysis
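
As a starting point, a first look at a dataset might go something like the sketch below. It uses the built-in iris dataset and base R only; which summaries and plots are useful will depend on your own data.

```r
# Load a built-in example dataset.
data(iris)

# Structure: column names, types, and a preview of the values.
str(iris)

# First few rows.
head(iris)

# Summary statistics for every column.
summary(iris)

# How many observations per group, and are there missing values?
table(iris$Species)
colSums(is.na(iris))

# Mean of one measurement per group.
species_means <- aggregate(Sepal.Length ~ Species, data = iris, FUN = mean)
species_means

# Quick plots: the distribution of one variable, then a comparison across groups.
hist(iris$Sepal.Length)
boxplot(Sepal.Length ~ Species, data = iris)
```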

The rest of R programming is mostly Googling and finding useful resources. RStudio also includes links to helpful cheat sheets on a few important topics for quick reference; you can find them under the Help menu.

Join #TidyTuesday, a weekly podcast and community activity brought to you by the R4DS Online Learning Community. Their goal is to help R learners learn in real-world contexts.

Finally, when you have tried every way you can think of to solve your problem and still cannot find an answer, the next step is to ask the online community for help.

I have found the Biostars bioinformatics community to be really supportive, helpful, and friendly, and not arrogant like people on some other online forums. Just make a reproducible example (using the dput() function), explain your experimental background and what you intend to do, and ask.
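
Here is a minimal sketch of how dput() helps with a reproducible example. The data is a small slice of the built-in mtcars dataset, standing in for your own; the object names are made up.

```r
# Take a small piece of the data that still shows the problem.
problem_data <- head(mtcars[, c("mpg", "cyl", "hp")], 5)

# dput() prints R code that recreates the object exactly.
# Copy this output into your question.
dput(problem_data)

# What the person helping you does: paste the output back into R.
# Here that step is simulated by capturing the text and evaluating it.
dput_text <- paste(capture.output(dput(problem_data)), collapse = "\n")
rebuilt <- eval(parse(text = dput_text))

# The rebuilt object matches the original.
all.equal(problem_data, rebuilt)

# It also helps to include your R version and loaded packages.
sessionInfo()
```

Because the output of dput() is plain text, it survives being pasted into a forum post, unlike a screenshot or a printed table.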

Also, the Data to Viz website is a great source for exploring which graphs you want and how to plot them.

Typical R workflow

Science is often plagued by reproducibility issues, so it is important to address some of those problems during the data management and analysis stage of the research life cycle. By using R and dynamic document generation with RMarkdown and RStudio, we can generate reports as PDF, Word, or HTML documents. This allows the end user to see not only your final figures but also the pipeline of your data analysis (your R scripts and your notes, written as # comments). Below is a good resource for learning RMarkdown.
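
To make the idea concrete, the sketch below writes a very small R Markdown file and renders it to HTML. The file name, title, and contents are invented for the example. Rendering needs the rmarkdown package and pandoc (which comes bundled with RStudio), so that step is skipped when they are not available.

```r
# Three backticks, built here so the example file can contain a code chunk.
fence <- strrep("`", 3)

# A minimal R Markdown document: a header, some notes, and one code chunk.
rmd_path <- file.path(tempdir(), "report.Rmd")
writeLines(c(
  "---",
  "title: \"Example analysis\"",
  "output: html_document",
  "---",
  "",
  "Notes about the analysis go here as plain text.",
  "",
  paste0(fence, "{r}"),
  "summary(cars)",
  "plot(cars)",
  fence
), rmd_path)

# Render the document; the report shows the notes, the code, and its output.
if (requireNamespace("rmarkdown", quietly = TRUE) && rmarkdown::pandoc_available()) {
  out_file <- rmarkdown::render(rmd_path, quiet = TRUE)
  print(out_file)
}
```

Changing the output line to word_document or pdf_document produces the other formats; PDF output also needs a LaTeX installation. In RStudio, the Knit button does the same job as the render call.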

Good luck with your data analysis in R.

And we must be grateful to the R and RStudio community for making these awesome tools free of charge! This is what really matters to grad students :)
