Tuesday, October 27, 2009

CSV files again


The alpha release of Matrex 2.0 solves the problem I reported about importing large CSV files.

Previous versions of Matrex already used a virtual table to show imported CSV files; that is, the table (grid) fetches from memory only the rows it needs to display.
The next step was simple: in version 2.0 the file is no longer loaded into memory at all; only the rows being displayed are read from the file.
As a result, the memory used to import a CSV file has decreased dramatically compared to previous versions of Matrex.
To avoid losing performance, the new version keeps a cache of 2000 rows from the file (the 2000 rows around the last row loaded from the file), so scrolling the table up and down stays fluid.
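To make the idea of a row cache around the last loaded row more concrete, here is a minimal sketch in Python. It is not Matrex's actual code: the class name WindowedCsvRows, its methods and the byte-offset index are assumptions made for illustration, and it assumes a UTF-8 file with one record per line (no line breaks inside quoted fields).

```python
import csv


class WindowedCsvRows:
    """Serve CSV rows by index while keeping only a window of rows in memory.

    Hypothetical illustration, not the Matrex implementation.
    """

    def __init__(self, path, window=2000):
        self.path = path
        self.window = window
        self.offsets = self._index_row_offsets()
        self.cache_start = 0
        self.cache = []      # parsed rows for indexes cache_start .. cache_start + len(cache) - 1
        self.file_reads = 0  # how many times a window was read from the file

    def _index_row_offsets(self):
        # Remember the byte offset where each line starts, so that any row
        # can be reached later with a single seek instead of a full scan.
        offsets = []
        position = 0
        with open(self.path, "rb") as handle:
            for line in handle:
                offsets.append(position)
                position += len(line)
        return offsets

    def __len__(self):
        return len(self.offsets)

    def row(self, index):
        if not 0 <= index < len(self.offsets):
            raise IndexError(index)
        in_cache = self.cache_start <= index < self.cache_start + len(self.cache)
        if not in_cache:
            self._load_window_around(index)
        return self.cache[index - self.cache_start]

    def _load_window_around(self, index):
        # Center the window on the requested row, clipped to the file bounds.
        start = max(0, index - self.window // 2)
        stop = min(len(self.offsets), start + self.window)
        with open(self.path, "rb") as handle:
            handle.seek(self.offsets[start])
            lines = [handle.readline().decode("utf-8") for _ in range(stop - start)]
        self.cache = list(csv.reader(lines))
        self.cache_start = start
        self.file_reads += 1
```

A table widget would call row(index) only for the rows it is about to paint. As long as the user scrolls inside the cached window, no file access happens; a request outside the window replaces the cache with the rows around the new position.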
The following picture shows the 3 levels of the CSV file import: file, memory cache, table.




If the table is scrolled up and down a lot, many rows can still be loaded into memory and released immediately afterwards; to keep these released rows from piling up in memory, Matrex explicitly calls the garbage collector every 50000 rows loaded.
With these changes it was possible to import data from the 22 MB CSV file for which the memory problem was reported, running Matrex without any special memory option.
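
The periodic garbage collection mentioned above can be sketched as a simple counter. This is only an illustration in Python, where gc.collect() stands in for whatever call Matrex really makes; the class name RowLoadCounter and its method are hypothetical.

```python
import gc


class RowLoadCounter:
    """Trigger a garbage collection after a given number of rows has been loaded.

    Hypothetical illustration, not the Matrex implementation.
    """

    def __init__(self, threshold=50_000):
        self.threshold = threshold
        self.loaded_since_collect = 0
        self.collections = 0  # number of explicit collections so far

    def rows_loaded(self, count):
        # Call this each time a batch of rows has been read from the file.
        self.loaded_since_collect += count
        if self.loaded_since_collect >= self.threshold:
            gc.collect()  # explicit collection of the discarded rows
            self.collections += 1
            self.loaded_since_collect = 0
```

With a cache of 2000 rows, a counter like this would run a collection once every 25 cache reloads, so the cost of the explicit call is spread over a lot of scrolling.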

This fix will be part of version 2.0, but I also back-ported it to version 1.3.8, which will be published in a few days.
