
Introduction to bulkreadr
Ezekiel Ogundepo and Ernest Fokoué
Source: vignettes/bulkreadr.Rmd
About the package
bulkreadr is an R package designed to simplify and
streamline the process of reading and processing large volumes of data.
With a collection of functions tailored for bulk data operations, the
package allows users to efficiently read multiple sheets from Microsoft
Excel/Google Sheets workbooks and multiple CSV files from a directory.
It returns the data as organized data frames, making it convenient for
further analysis and manipulation. Whether dealing with extensive data
sets or batch processing tasks, bulkreadr lets users handle data in
bulk, saving time and effort in data preparation workflows.
Installation
You can install the bulkreadr package from CRAN with:
install.packages("bulkreadr")
or the development version from GitHub with:
# install devtools first if it is not already available
if (!requireNamespace("devtools", quietly = TRUE)) {
  install.packages("devtools")
}
devtools::install_github("gbganalyst/bulkreadr")
How to load the package
Now that you have installed the bulkreadr package, you can
load it with:
library(bulkreadr)
Functions in the bulkreadr package
This section provides a concise overview of the different functions
available in the bulkreadr package for importing bulk data
in R.
Note
For most functions in this package, the examples use data files that ship with
bulkreadr, which can be located with the system.file() function. To use your own data from a local directory, make sure you supply the correct file path before calling any function provided by the bulkreadr package.
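As a small illustration of the note above, the sketch below shows how system.file() behaves and how a path to your own data can be checked before use. It uses only base R; the stats package stands in for any installed package, and the file name my_data.csv is a hypothetical placeholder.

```r
# Locate the installation directory of a package.
# "stats" is used only because it is always available with base R.
pkg_dir <- system.file(package = "stats")

# system.file() returns an empty string when nothing matches,
# so it is worth checking the result before passing it to a reader.
missing_path <- system.file("extdata", "no-such-file.xlsx", package = "stats")
nzchar(missing_path)

# For your own data, build the path explicitly and confirm it exists.
# The location and file name below are hypothetical.
my_path <- file.path(tempdir(), "my_data.csv")
write.csv(head(mtcars), my_path, row.names = FALSE)
file.exists(my_path)
```

Checking nzchar() on the result of system.file(), or file.exists() on your own path, gives a clearer failure than an error raised deep inside a reading function.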
read_excel_workbook()
read_excel_workbook() reads the data from every sheet
of an Excel workbook and returns a single appended data frame.
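For intuition, the read-and-append pattern can be sketched with the readxl package. This is a minimal sketch, not bulkreadr's actual implementation; it assumes that readxl and bulkreadr are installed and that every sheet in the workbook has the same columns.

```r
# Sketch of "read every sheet, then append" (assumes readxl is installed).
library(readxl)

# Example workbook shipped with bulkreadr (assumes bulkreadr is installed).
path <- system.file("extdata", "Diamonds.xlsx", package = "bulkreadr")

# List the sheet names, then read each sheet into its own data frame.
sheets <- excel_sheets(path)
per_sheet <- lapply(sheets, function(s) as.data.frame(read_excel(path, sheet = s)))

# Stack the per-sheet data frames; this requires identical column names.
combined <- do.call(rbind, per_sheet)
dim(combined)
```

The package function wraps these steps in a single call, as the example that follows shows.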
# path to the xls/xlsx file.
path <- system.file("extdata", "Diamonds.xlsx", package = "bulkreadr")
# read the sheets
read_excel_workbook(path = path)
#> # A tibble: 260 × 9
#>   carat color clarity depth table price     x     y     z
#>   <dbl> <chr> <chr>   <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1  2    I     SI1      65.9    60 13764  7.8   7.73  5.12
#> 2  0.7  H     SI1      65.2    58  2048  5.49  5.55  3.6 
#> 3  1.51 E     SI1      58.4    70 11102  7.55  7.39  4.36
#> 4  0.7  D     SI2      65.5    57  1806  5.56  5.43  3.6 
#> 5  0.35 F     VVS1     54.6    59  1011  4.85  4.79  2.63
#> # ℹ 255 more rows
read_excel_files_from_dir()
read_excel_files_from_dir() reads all Excel workbooks in
a given directory (for example "~/data") and returns an appended
data frame.
# path to the directory containing the xls/xlsx files.
directory <- system.file("xlsxfolder", package = "bulkreadr")
# import the workbooks
read_excel_files_from_dir(dir_path = directory)
#> # A tibble: 260 × 10
#>   cut   carat color clarity depth table price     x     y     z
#>   <chr> <dbl> <chr> <chr>   <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 Fair   2    I     SI1      65.9    60 13764  7.8   7.73  5.12
#> 2 Fair   0.7  H     SI1      65.2    58  2048  5.49  5.55  3.6 
#> 3 Fair   1.51 E     SI1      58.4    70 11102  7.55  7.39  4.36
#> 4 Fair   0.7  D     SI2      65.5    57  1806  5.56  5.43  3.6 
#> 5 Fair   0.35 F     VVS1     54.6    59  1011  4.85  4.79  2.63
#> # ℹ 255 more rows
read_csv_files_from_dir()
read_csv_files_from_dir() reads all CSV files from a given
directory (for example "~/data") and returns an appended data frame. The
rows of the resulting data frame follow the order of the CSV files in the
directory.
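To make the ordering behaviour concrete, here is a base R sketch of the same idea. It is an illustration rather than bulkreadr's implementation, and it assumes that "the order of the files" means the alphabetical order returned by list.files(). The directory, file names, and data are hypothetical.

```r
# Create a throwaway directory with two small CSV files (hypothetical data).
csv_dir <- file.path(tempdir(), "csv_demo")
dir.create(csv_dir, showWarnings = FALSE)
write.csv(data.frame(id = 1:2, cut = "Fair"),
          file.path(csv_dir, "a_fair.csv"), row.names = FALSE)
write.csv(data.frame(id = 3:4, cut = "Good"),
          file.path(csv_dir, "b_good.csv"), row.names = FALSE)

# list.files() returns paths sorted alphabetically, which fixes the row order.
files <- list.files(csv_dir, pattern = "\\.csv$", full.names = TRUE)

# Read each file and stack the results in file order.
combined <- do.call(rbind, lapply(files, read.csv, stringsAsFactors = FALSE))
combined$cut
```

Rows from a_fair.csv come before rows from b_good.csv because the files are read in sorted order and appended in that same order.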
# path to the directory containing the CSV files.
directory <- system.file("csvfolder", package = "bulkreadr")
# import the csv files
read_csv_files_from_dir(dir_path = directory)
#> # A tibble: 260 × 10
#>   cut   carat color clarity depth table price     x     y     z
#>   <chr> <dbl> <chr> <chr>   <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 Fair   2    I     SI1      65.9    60 13764  7.8   7.73  5.12
#> 2 Fair   0.7  H     SI1      65.2    58  2048  5.49  5.55  3.6 
#> 3 Fair   1.51 E     SI1      58.4    70 11102  7.55  7.39  4.36
#> 4 Fair   0.7  D     SI2      65.5    57  1806  5.56  5.43  3.6 
#> 5 Fair   0.35 F     VVS1     54.6    59  1011  4.85  4.79  2.63
#> # ℹ 255 more rows
read_gsheets()
The read_gsheets() function imports data from multiple
sheets in a Google Sheets spreadsheet and appends the resulting
data frames from each sheet into a single data frame. This
makes it easy to combine data spread across several sheets into one
dataset.
# Google Sheet ID or the link to the sheet
sheet_id <- "1izO0mHu3L9AMySQUXGDn9GPs1n-VwGFSEoAKGhqVQh0"
# read all the sheets
read_gsheets(ss = sheet_id)
#> # A tibble: 260 × 9
#>   carat color clarity depth table price     x     y     z
#>   <dbl> <chr> <chr>   <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1  2    I     SI1      65.9    60 13764  7.8   7.73  5.12
#> 2  0.7  H     SI1      65.2    58  2048  5.49  5.55  3.6 
#> 3  1.51 E     SI1      58.4    70 11102  7.55  7.39  4.36
#> 4  0.7  D     SI2      65.5    57  1806  5.56  5.43  3.6 
#> 5  0.35 F     VVS1     54.6    59  1011  4.85  4.79  2.63
#> # ℹ 255 more rows