You can use the {bigreadr} package to convert a CSV file to an SQLite database without ever loading the whole CSV in memory.

You can use the following function:

csv2sqlite <- function(csv, every_nlines, table_name,
                       dbname = sub("\\.csv$", ".sqlite", csv),
                       ...) {

  # Prepare reading
  con <- RSQLite::dbConnect(RSQLite::SQLite(), dbname)
  init <- TRUE
  fill_sqlite <- function(df) {

    if (init) {
      RSQLite::dbCreateTable(con, table_name, df)
      init <<- FALSE
    }

    RSQLite::dbAppendTable(con, table_name, df)
    NULL
  }

  # Read and fill by parts
  bigreadr::big_fread1(csv, every_nlines,
                       .transform = fill_sqlite,
                       .combine = unlist,
                       ... = ...)

  # Returns
  con
}

Function bigreadr::big_fread1() first splits the CSV into smaller CSV files, then reads these files as data frames and transforms them, and finally combines the results.

Here, the transformation appends each data frame to the SQLite database (and creates the table the first time). Moreover, each call returns NULL so that nothing accumulates in memory.
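To see the .transform/.combine pattern in isolation, here is a minimal sketch that only counts the rows of each chunk (the file path and chunk size are placeholders, not from the use case below):

# Each chunk is read as a data frame, transformed to its row count,
# and the counts are combined into one vector with unlist()
counts <- bigreadr::big_fread1("some_big_file.csv", every_nlines = 1e5,
                               .transform = nrow,
                               .combine = unlist)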

Use case

For example, with this function, I was able to convert a 9 GB CSV file in 40 minutes, using less than 2 GB of memory.

con <- csv2sqlite(csv, every_nlines = 1e6, table_name = "sirene",
                  encoding = "Latin-1")
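Since the connection to the database is returned, you can then query the result directly with standard DBI/RSQLite functions (the COUNT query is just an illustration; adapt it to your columns):

# Check how many rows were imported, then close the connection
RSQLite::dbGetQuery(con, "SELECT COUNT(*) FROM sirene")
RSQLite::dbDisconnect(con)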