
How to save a table to a CSV file partition by partition in DolphinDB?

hritz220 edited in Thu, 26 Jan 2023

A distributed (DFS) partitioned table in DolphinDB holds more than 100 GB of data. I now want to export these data to a CSV file, and I know the table can be loaded and then saved as text:

db = database("dfs://db1")
t = select * from db.loadTable("tb1")
t.saveText("/data/tb1.csv")

However, memory is limited, so it is impossible to load and export the full table at once. Is there a way to export the data partition by partition, keeping memory usage within the available range, with the results finally combined into a single CSV file?

1 Reply
commented on Thu, 26 Jan 2023

The saveText function of DolphinDB supports appending data as long as the parameter append=true is set. Therefore, the partitions can be loaded into memory one at a time and appended to the same CSV file. The following example queries a database partitioned by month and runs the load and save steps concurrently with pipeline:

v = 2015.01M..2016.12M
def queryData(m){
    // load one month of data from the partitioned table
    return select * from loadTable("dfs://db1", "tb1") where TradingTime between datetime(date(m)) : datetime(date(m+1))
}
def saveData(tb){
    // append=true so each partition is appended to the same CSV file
    tb.saveText("/hdd/hdd0/data/gtatest.csv", ',', true)
}
// queryData{m} for each month feeds saveData through the pipeline
pipeline(each(partial{queryData}, v), saveData)
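Outside DolphinDB, the same load-one-chunk-then-append pattern can be sketched in Python with the standard csv module. This is a minimal illustration of the technique, not the DolphinDB API: load_partition, the month list, and the column names are hypothetical placeholders for the real per-partition query.

```python
import csv
import os

def load_partition(month):
    # Hypothetical stand-in for querying one partition from the database;
    # here each "partition" is just a small list of rows built in memory.
    return [{"month": month, "price": 100 + i} for i in range(3)]

def append_partition_to_csv(path, rows):
    # Write the header only when the file is first created; afterwards append,
    # mirroring saveText(..., append=true) in the DolphinDB script above.
    write_header = not os.path.exists(path)
    with open(path, "a", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["month", "price"])
        if write_header:
            writer.writeheader()
        writer.writerows(rows)

months = ["2015-01", "2015-02", "2015-03"]
out = "export.csv"
for m in months:
    # Only one partition is held in memory at any moment.
    append_partition_to_csv(out, load_partition(m))
```

After the loop, export.csv contains a single header line followed by the rows of every partition in order, so memory usage is bounded by the size of the largest partition rather than the whole table.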