Read Parquet or Delta files from a local directory, AWS S3, or Azure Blob Storage
Source: R/readparquetR.R
Reads Parquet or Delta files from local storage or from the cloud (AWS S3 or Azure Blob Storage).
Usage
readparquetR(
pathtoread,
where = "",
partition = NULL,
collist = "",
sample = F,
samplesizecount = 3,
add_part_names = F,
format = "parquet",
filelocation = "local",
containerconnection = NULL,
bucket = NULL
)
Arguments
- pathtoread
Path to read from, either local or in cloud storage.
- where
Filter condition applied to the table as it is read, e.g. where="column='A'".
- partition
Pattern(s) selecting which partition files to read, e.g. partition=c('2017','2018').
- collist
Names of the specific columns to read, e.g. collist=c("mpg","cyl").
- sample
Set sample=T to return only sample rows instead of reading the whole table.
- samplesizecount
Sample size used when sample=T. Defaults to 3 rows.
- add_part_names
Set to T when the data is partitioned to add the partition names as a column.
- filelocation
"local" or "azure" or "s3"
- containerconnection
Connection name, required when filelocation="azure". See the Azure Helper Document.
- bucket
Bucket name, required when filelocation="s3".
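To make the partition and add_part_names arguments more concrete, here is a minimal sketch of what a partitioned Parquet layout looks like on disk. It uses only the arrow package, not readparquetR, and it assumes a Hive-style layout (directories named like cyl=4); how readparquetR matches its partition patterns internally is not shown here and may differ.

```r
library(arrow)

# Write mtcars as a Hive-style partitioned dataset: one directory per cyl value.
# (Illustrative layout only; not taken from readparquetR itself.)
dir <- tempfile()
write_dataset(mtcars, dir, format = "parquet", partitioning = "cyl")

# Each partition is a directory whose name carries the partition value.
parts <- sort(list.files(dir))
print(parts)

# Select only the files whose path matches a partition pattern.
files <- list.files(dir, pattern = "\\.parquet$", recursive = TRUE, full.names = TRUE)
keep <- files[grepl("cyl=4", files, fixed = TRUE)]

# The partition column lives in the directory name, not inside the file.
one <- read_parquet(keep[1])
has_cyl <- "cyl" %in% names(one)
print(has_cyl)

# Recover the partition value from the path and add it back as a column.
one$cyl <- sub("^cyl=", "", basename(dirname(keep[1])))
```

Because the partition value is stored in the path rather than in the file, a reader has to add it back explicitly, which is presumably the role of add_part_names=T.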
Examples
temp <- tempfile()
arrow::write_parquet(mtcars, paste(temp,".parquet"))
head(readparquetR(pathtoread=paste(temp,".parquet")),10)
#> mpg cyl disp hp drat wt qsec vs am gear carb
#> 1: 21.0 6 160.0 110 3.90 2.620 16.46 0 1 4 4
#> 2: 21.0 6 160.0 110 3.90 2.875 17.02 0 1 4 4
#> 3: 22.8 4 108.0 93 3.85 2.320 18.61 1 1 4 1
#> 4: 21.4 6 258.0 110 3.08 3.215 19.44 1 0 3 1
#> 5: 18.7 8 360.0 175 3.15 3.440 17.02 0 0 3 2
#> 6: 18.1 6 225.0 105 2.76 3.460 20.22 1 0 3 1
#> 7: 14.3 8 360.0 245 3.21 3.570 15.84 0 0 3 4
#> 8: 24.4 4 146.7 62 3.69 3.190 20.00 1 0 4 2
#> 9: 22.8 4 140.8 95 3.92 3.150 22.90 1 0 4 2
#> 10: 19.2 6 167.6 123 3.92 3.440 18.30 1 0 4 4
# read sample rows
readparquetR(pathtoread=paste(temp,".parquet"), sample=T)
#> mpg cyl disp hp drat wt qsec vs am gear carb
#> 1: 21 6 160 110 3.9 2.62 16.46 0 1 4 4
# select columns and apply a where condition
readparquetR(pathtoread=paste(temp,".parquet"),
collist = c("mpg","cyl","vs"),
format="parquet",
where="cyl==4 & vs!='0'")
#> mpg cyl vs
#> 1: 22.8 4 1
#> 2: 24.4 4 1
#> 3: 22.8 4 1
#> 4: 32.4 4 1
#> 5: 30.4 4 1
#> 6: 33.9 4 1
#> 7: 21.5 4 1
#> 8: 27.3 4 1
#> 9: 30.4 4 1
#> 10: 21.4 4 1
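For comparison, the column selection and where condition in the last example can be approximated with arrow and base R alone. This is a sketch of the equivalent result under the assumption that the filter is an ordinary row condition; it is not a description of how readparquetR is implemented.

```r
library(arrow)

# Write mtcars to a temporary Parquet file.
path <- tempfile(fileext = ".parquet")
write_parquet(mtcars, path)

# Read only the three columns of interest.
df <- read_parquet(path, col_select = c("mpg", "cyl", "vs"))

# Apply the same condition as where="cyl==4 & vs!='0'" after reading.
res <- df[df$cyl == 4 & df$vs != 0, ]
print(nrow(res))
```

Reading only the needed columns keeps memory use down, since Parquet is a columnar format and unselected columns are never loaded.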