vignettes/read-FBM-from-file.Rmd
In this vignette, you will learn how to read a Filebacked Big Matrix (FBM) from a text file. Package {bigreadr} is required.
library(bigstatsr)
library(bigreadr)
## LONG CSV
df <- datasets::mtcars
csv <- fwrite2(df[rep(seq_len(nrow(df)), 500000), ],
               tempfile(fileext = ".csv"),
               row.names = TRUE)
format(file.size(csv), big.mark = ",")
## [1] "948,944,463"
nlines(csv)
## [1] 1.6e+07
(first_rows <- fread2(csv, nrows = 5))
## V1 mpg cyl disp hp drat wt qsec vs am gear carb
## 1 Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
## 2 Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
## 3 Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
## 4 Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
## 5 Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2
sapply(first_rows, typeof)
## V1 mpg cyl disp hp drat
## "character" "double" "integer" "integer" "integer" "double"
## wt qsec vs am gear carb
## "double" "double" "integer" "integer" "integer" "integer"
ncol(first_rows)
## [1] 12
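What you can see from these first rows: the first column (V1) holds the row names as character strings, and the other 11 columns are numeric (integer or double), so columns 2 to 12 can be stored in a numeric FBM.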
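The column positions to read can also be derived from the column types instead of being read off by eye. The following is a minimal sketch in base R; `first_rows` is rebuilt here from `mtcars` as a stand-in for the five rows read from the CSV above, and `num_cols` is a hypothetical name introduced for illustration.

```r
# Stand-in for the first rows of the CSV: one character column (V1)
# holding the row names, followed by the numeric columns of mtcars.
first_rows <- data.frame(V1 = rownames(mtcars)[1:5], mtcars[1:5, ],
                         stringsAsFactors = FALSE)

# Positions of the columns that are numeric (integer or double)
num_cols <- unname(which(sapply(first_rows, is.numeric)))
num_cols
```

The resulting positions could then be passed as the `select` argument when reading the file.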
Read all numeric columns in an FBM
(test <- big_read(csv, select = 2:12))
## A Filebacked Big Matrix of type 'double' with 16000000 rows and 11 columns.
rbind(csv, test$backingfile)
## [,1]
## csv "C:\\Users\\au639593\\AppData\\Local\\Temp\\RtmpeuJKQN\\file30485001141c.csv"
## "C:\\Users\\au639593\\AppData\\Local\\Temp\\RtmpeuJKQN\\file30485001141c.bk"
attr(test, "fbm_names")
## [1] "mpg" "cyl" "disp" "hp" "drat" "wt" "qsec" "vs" "am" "gear"
## [11] "carb"
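Once the FBM exists, its values can be accessed with matrix-like indexing, and the column names stored in the `fbm_names` attribute can be reattached to anything pulled into memory. The sketch below assumes a small 32-row stand-in file rather than the large CSV above; `csv_small`, `fbm` and `mat` are hypothetical names used only for illustration.

```r
library(bigstatsr)
library(bigreadr)

# Small stand-in file, written the same way as the large CSV above
csv_small <- fwrite2(datasets::mtcars, tempfile(fileext = ".csv"),
                     row.names = TRUE)

# Skip the first (character) column and read the 11 numeric ones
fbm <- big_read(csv_small, select = 2:12)

dim(fbm)        # number of rows and columns
fbm[1:3, 1:4]   # a small block, returned as a standard R matrix

# Load everything in memory (only sensible for small data)
# and restore the column names kept by big_read()
mat <- fbm[]
colnames(mat) <- attr(fbm, "fbm_names")
head(mat)
```

For the full-size file, it is probably better to access only blocks of rows or columns at a time rather than calling `fbm[]`.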
Read other non-numeric data afterwards
head(fread2(csv, select = 1))
## V1
## 1 Mazda RX4
## 2 Mazda RX4 Wag
## 3 Datsun 710
## 4 Hornet 4 Drive
## 5 Hornet Sportabout
## 6 Valiant
## Get the filter data
filter <- fread2(csv, select = "cyl")[[1]] == 4
## Read only rows corresponding to 'filter'
(test2 <- big_read(csv, select = 2:12, filter = filter,
                   backingfile = tempfile()))
## A Filebacked Big Matrix of type 'double' with 5500000 rows and 11 columns.
test2$is_saved
## [1] TRUE
(rds <- test2$rds)
## [1] "C:\\Users\\au639593\\AppData\\Local\\Temp\\RtmpeuJKQN\\file3048541b5413.rds"
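The filter used above is a logical vector with one element per data row, so the number of `TRUE` values should match the number of rows of the resulting FBM. A minimal sketch of this check, assuming a small 32-row stand-in file (`csv_small`, `keep` and `fbm4` are hypothetical names):

```r
library(bigstatsr)
library(bigreadr)

# Small stand-in file, written the same way as the large CSV above
csv_small <- fwrite2(datasets::mtcars, tempfile(fileext = ".csv"),
                     row.names = TRUE)

# One logical value per data row: TRUE for 4-cylinder cars
keep <- fread2(csv_small, select = "cyl")[[1]] == 4
sum(keep)

# Read only the rows for which 'keep' is TRUE
fbm4 <- big_read(csv_small, select = 2:12, filter = keep,
                 backingfile = tempfile())

nrow(fbm4) == sum(keep)   # as many rows as TRUE values in the filter
all(fbm4[, 2] == 4)       # column 2 of the FBM is 'cyl'
```

Reading the filter column separately costs one extra pass over the file, but it avoids loading the other columns for rows that will be discarded.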
You need to read from the text file only once. To get the FBM object in another R session, just use big_attach():
(test3 <- big_attach(rds))
## A Filebacked Big Matrix of type 'double' with 5500000 rows and 11 columns.
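To illustrate the round trip, the sketch below writes a small FBM, attaches it again from its .rds file, and checks that both objects point to the same data. It runs in a single session as a stand-in for a second one, and uses a 32-row file; `csv_small`, `fbm` and `fbm2` are hypothetical names.

```r
library(bigstatsr)
library(bigreadr)

# Small stand-in file, written the same way as the large CSV above
csv_small <- fwrite2(datasets::mtcars, tempfile(fileext = ".csv"),
                     row.names = TRUE)
fbm <- big_read(csv_small, select = 2:12, backingfile = tempfile())

# Both files are needed to attach the FBM later: the .rds and the .bk
file.exists(fbm$rds, fbm$backingfile)

# In a new session, only the path to the .rds file would be required
fbm2 <- big_attach(fbm$rds)

identical(fbm2$backingfile, fbm$backingfile)  # same data on disk
all.equal(fbm2[], fbm[])                      # same values
```

Since the vignette writes to temporary files, a real workflow would presumably pass a permanent path as `backingfile` so that the .bk and .rds files survive the end of the session.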