Import data into R
R can import a number of different file types for data wrangling or analyses. The following is a go-to guide for importing .csv, Excel, SPSS, .txt, and .dat files. Occassionally, additional packages may be require to import a specific type of file.
Csv files
read.csv()
read.csv()
is a base R function that can import .csv files so there is no need to load a package. The code snippet below will import “file_name.csv” and assign it to an object called data.
# Read csv files
data <- read_csv("file_name.csv")
read_csv()
read_csv()
is from the readr package. The distinction between read.csv()
and read_csv()
is that the latter will read a .csv file as a tibble. Tibbles are data frames that have been modified to work well with other tidyverse functions. In fact, readr is part of the tidyverse package and the following code will work when either the tidyverse or readr package is loaded. In my experience, the only drawback to using read_csv()
is that in can take much longer to read in large .csv files. So the size of the file you are working with and whether or not you need it as a tibble may help you decide which function to use. One thing about the read_csv()
is that it may produce messages if the column types aren’t specified before hand. This can get annoying if you have lots of columns in your file. To avoid the message add the col_types = cols()
argument.
# Read csv files
install.packages("readr")
library(readr)
# Read .csv file
data <- read_csv("file_name.csv")
# Read .csv files without Parsed column specification output
data <- read_csv("file_name.csv", col_types = cols())
fread()
The fread()
function from the data.table package is a great way to import large .csv files into R. In my experience this function reads in data much faster than read_csv() or read.csv(). The only drawback to its use is that the data are imported as a data.table object.
data <- fread("file_name.csv",)
Excel files
read_excel()
To read Excel files, use the readxl package and the read_excel()
function. This approach will also let you specify what sheet to load, otherwise it defaults to the first sheet.
# Read excel files
install.packages("readxl")
library("readxl")
# Read .xlsx file
read_excel("file_name.xlsx")
read.xlsx()
Another route is to use openxlsx package. I haven’t used this package much for importing data, but have used it to write multiple data frames to sheets in an excel file. If you run into any java related errors when importing Excel files, this would be the one I’d try.
install.packages("openxlsx")
library(openxlsx) #for outputting tables, avoid java errors
# Read .xlsx file
read.xlsx("file_name.xlsx", sheet = 1)
SAS & SPSS
read_sas()
To load SAS, SPSS, or Stata files use the haven package. The haven package is also part of the tidyverse so the following function works when loading either the tidyverse or haven package.
# Read commercial statistical analysis software files
install.packages("haven")
library(haven)
# Read
data <- read_sas("file_name.sas7bdat")
read_sav()
# Read .sav file
data <- read_sav("file_name.sav")
Txt files
read.delim()
To load .txt files, the base R function read.delim()
does the job. Depending on the file you are working with, you may need to specify whether there is a header and specific separator. The header option will need to be set to TRUE in cases where the first row contains column names. As for the separator, its often a tab ("\t").
# Read .txt files
data <- read.delim("file_name.txt", header = TRUE, sep = "\t")
Dat files
read.table()
Every now and then I run into .dat files. To load these in R, I tend to use the read.table()
function from the base R utils package.
# Read .txt files
data <- read.table("file_name.dat", header = TRUE)