Reads parquet and delta files from local or cloud storage

Usage

readparquetR(
  pathtoread,
  where = "",
  partition = NULL,
  collist = "",
  sample = F,
  samplesizecount = 3,
  add_part_names = F,
  format = "parquet",
  filelocation = "local",
  containerconnection = NULL,
  bucket = NULL
)

Arguments

pathtoread

Path to read from: a local path or a cloud location (Azure or S3).

where

Filter condition, given as a string, applied to the table that is read, e.g. where="column=='A'".

partition

Pattern(s) selecting which partition files to read, e.g. partition=c('2017','2018').

collist

Names of the specific columns to read, e.g. collist=c("mpg","cyl").

sample

Set sample=T to read only a few sample rows instead of the whole table.

samplesizecount

Number of sample rows to return when sample=T. Default is 3.

add_part_names

For partitioned data, set this to T to add the partition names as columns.

filelocation

One of "local", "azure" or "s3".

containerconnection

Container connection, required when filelocation="azure". See the Azure Helper Document.

bucket

Bucket name, required when filelocation="s3".

Examples

# write mtcars to a temporary parquet file
# (fileext avoids the space that paste() would insert before ".parquet")
temp <- tempfile(fileext = ".parquet")
arrow::write_parquet(mtcars, temp)
head(readparquetR(pathtoread = temp), 10)
#>      mpg cyl  disp  hp drat    wt  qsec vs am gear carb
#>  1: 21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4
#>  2: 21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4
#>  3: 22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1
#>  4: 21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1
#>  5: 18.7   8 360.0 175 3.15 3.440 17.02  0  0    3    2
#>  6: 18.1   6 225.0 105 2.76 3.460 20.22  1  0    3    1
#>  7: 14.3   8 360.0 245 3.21 3.570 15.84  0  0    3    4
#>  8: 24.4   4 146.7  62 3.69 3.190 20.00  1  0    4    2
#>  9: 22.8   4 140.8  95 3.92 3.150 22.90  1  0    4    2
#> 10: 19.2   6 167.6 123 3.92 3.440 18.30  1  0    4    4
  # read sample rows only, not the whole table
  readparquetR(pathtoread = temp, sample = TRUE)
#>    mpg cyl disp  hp drat   wt  qsec vs am gear carb
#> 1:  21   6  160 110  3.9 2.62 16.46  0  1    4    4
  # select columns and apply a where condition
  readparquetR(pathtoread = temp,
               collist = c("mpg", "cyl", "vs"),
               format = "parquet",
               where = "cyl==4 & vs!='0'")
#>      mpg cyl vs
#>  1: 22.8   4  1
#>  2: 24.4   4  1
#>  3: 22.8   4  1
#>  4: 32.4   4  1
#>  5: 30.4   4  1
#>  6: 33.9   4  1
#>  7: 21.5   4  1
#>  8: 27.3   4  1
#>  9: 30.4   4  1
#> 10: 21.4   4  1
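
The partition and add_part_names arguments are described above but not exercised in the examples. The following is a minimal sketch of how they might be used, not a verified example. The sales data frame, its year column and the partdir folder are hypothetical names introduced for illustration, and it is an assumption that readparquetR matches the partition patterns against hive-style folder names such as year=2017. The call is guarded so the setup runs even when readparquetR is not loaded, and no output is shown because it was not run against the package.

```r
library(arrow)

# Hypothetical yearly data; the `year` column and its values are illustrative only
sales <- data.frame(
  year   = rep(c(2017L, 2018L, 2019L), each = 2),
  amount = c(10, 20, 30, 40, 50, 60)
)

# Write one sub-folder per year: year=2017/, year=2018/, year=2019/
partdir <- tempfile()
arrow::write_dataset(sales, partdir, format = "parquet", partitioning = "year")
part_folders <- list.files(partdir)
print(part_folders)

# Assumed usage: read only the 2017 and 2018 partitions and
# add the partition name as a column (behaviour not verified here)
if (exists("readparquetR")) {
  res <- readparquetR(pathtoread = partdir,
                      partition = c("2017", "2018"),
                      add_part_names = TRUE)
  print(res)
}
```

If the patterns are matched differently in your version of the package, adjust the partition values to whatever appears in the folder names listed by list.files(partdir).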