Friday, July 30, 2010

Rowscope: view large files

I had a problem at the office: a server application generated a large log file every day (between 100 MB and 1 GB), and when something went wrong with the application, viewing that log file was very difficult.
I tried several file viewers, but none of them worked well for me. Some did not show entire rows, some did not show all rows, and some were slow when searching.
So I went back to Notepad, which generally worked very well but used too many resources: once the file was over 300 to 400 MB, it took all of the server's CPU and memory.

So I decided to write my own file viewer, in Scala, partly because I wanted to try the language. The new viewer, called Rowscope, had to be lightweight and fast.
To achieve this, I used a trick. Instead of opening the file and then searching for strings in it, the user first enters the search string; when he then opens the file, he sees only the rows that match the search.
He can then look at what surrounds one or more of the found rows.
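The post does not say how Rowscope fetches the rows around a match, so the following is only one plausible sketch: assuming the search step remembered each match's 1-based line number, the viewer can re-read the input and keep just a small window around that line. The object `Around`, the function `around` and its `radius` parameter are hypothetical names chosen for illustration.

```scala
object Around {
  // Returns the rows from (target - radius) to (target + radius), paired with
  // their 1-based line numbers. Only the rows inside the window are kept.
  def around(lines: Iterator[String], target: Int, radius: Int): Vector[(Int, String)] = {
    val first = math.max(1, target - radius)
    val last = target + radius
    lines
      .zipWithIndex
      .map { case (text, i) => (i + 1, text) } // switch to 1-based line numbers
      .dropWhile { case (n, _) => n < first }  // skip rows before the window
      .takeWhile { case (n, _) => n <= last }  // stop reading after the window
      .toVector
  }

  def main(args: Array[String]): Unit = {
    val rows = Iterator("a", "b", "c", "d", "e", "f")
    around(rows, target = 3, radius = 1).foreach { case (n, t) => println(s"$n: $t") }
  }
}
```

With a real file, the iterator would come from `scala.io.Source.fromFile(path).getLines()`, closing the `Source` afterwards. Because `takeWhile` stops early, rows after the window are never read.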

This way, if the search string is specific enough, the viewer shows only a few rows and therefore uses a limited amount of memory. It still reads every row, of course, but rows that do not match the search string are discarded immediately and then garbage collected.
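A minimal sketch of the search-first idea, assuming a plain substring match (the post does not say which matching rules Rowscope uses). The file is read lazily line by line, and only matching rows, tagged with their line numbers, are collected; `SearchFirst`, `search` and `searchFile` are hypothetical names.

```scala
import scala.io.Source

object SearchFirst {
  // Keeps only rows containing `query`, tagged with their 1-based line number.
  // Non-matching rows are never stored: once the iterator moves past them they
  // become unreachable and can be garbage collected.
  def search(lines: Iterator[String], query: String): Vector[(Int, String)] =
    lines
      .zipWithIndex
      .collect { case (text, i) if text.contains(query) => (i + 1, text) }
      .toVector

  // Reads a file lazily, one line at a time, and closes it when done.
  def searchFile(path: String, query: String): Vector[(Int, String)] = {
    val source = Source.fromFile(path)
    try search(source.getLines(), query)
    finally source.close()
  }

  def main(args: Array[String]): Unit = {
    val log = Iterator("INFO start", "ERROR disk full", "INFO retry", "ERROR disk full again")
    search(log, "ERROR").foreach { case (n, t) => println(s"$n: $t") }
  }
}
```

Memory use is proportional to the number of matches, not to the file size. Note that an empty query matches every row, which is why some cap on the number of displayed rows is needed in that case.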

To keep the GUI from blocking, Rowscope does the real work in a pipeline of threads (Scala actors), and rows are passed to the GUI thread only when they need to be displayed.
With the right settings for each actor (how many rows to read from the file before sending them to the next actor, and how many rows to send from one actor to the next in a single batch), the application uses a reasonable amount of CPU and memory and stays responsive to user input.
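The following is a sketch of such a batched pipeline, not Rowscope's actual code. The `scala.actors` library the post mentions has since been removed from the Scala standard library, so this version swaps in plain threads connected by bounded `java.util.concurrent.ArrayBlockingQueue`s. The names `Pipeline`, `run`, `readBatch`, `sendBatch` and `display` are hypothetical; the two batch sizes stand in for the per-actor settings described above.

```scala
import java.util.concurrent.ArrayBlockingQueue
import scala.collection.mutable.ArrayBuffer

object Pipeline {
  // An empty batch marks the end of the stream (real batches are never empty).
  private val End: Vector[String] = Vector.empty

  def run(lines: Iterator[String], query: String, readBatch: Int, sendBatch: Int)(
      display: Vector[String] => Unit): Unit = {
    require(readBatch > 0 && sendBatch > 0, "batch sizes must be positive")
    // Bounded queues: a fast stage blocks when the next stage falls behind,
    // which limits how many rows are held in memory at once.
    val toFilter = new ArrayBlockingQueue[Vector[String]](4)
    val toGui = new ArrayBlockingQueue[Vector[String]](4)

    // Stage 1: read rows from the input, `readBatch` rows at a time.
    val reader = new Thread(() => {
      lines.grouped(readBatch).foreach(batch => toFilter.put(batch.toVector))
      toFilter.put(End)
    })

    // Stage 2: drop non-matching rows and forward matches `sendBatch` at a time.
    val filter = new Thread(() => {
      val pending = ArrayBuffer.empty[String]
      var batch = toFilter.take()
      while (batch.nonEmpty) {
        pending ++= batch.filter(_.contains(query))
        while (pending.size >= sendBatch) {
          toGui.put(pending.take(sendBatch).toVector)
          pending.remove(0, sendBatch)
        }
        batch = toFilter.take()
      }
      if (pending.nonEmpty) toGui.put(pending.toVector) // flush the last partial batch
      toGui.put(End)
    })

    reader.start()
    filter.start()
    // Stage 3: the calling thread stands in for the GUI thread here.
    var rows = toGui.take()
    while (rows.nonEmpty) {
      display(rows)
      rows = toGui.take()
    }
    reader.join()
    filter.join()
  }

  def main(args: Array[String]): Unit = {
    val log = Iterator.tabulate(10)(i => if (i % 3 == 0) s"ERROR $i" else s"INFO $i")
    run(log, "ERROR", readBatch = 4, sendBatch = 2)(rows => println(rows.mkString(", ")))
  }
}
```

Larger batches mean fewer hand-offs between threads but more rows held at once; the queue capacity bounds how far one stage can run ahead of the next. In a Swing GUI, the `display` callback would typically hand rows to the event thread with `SwingUtilities.invokeLater` instead of running on the calling thread.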

I was a little scared when I started to use it at work. In theory it should work, but would I use it just because I wrote it, or because it really works?
I can say that I now use it without even thinking about it, like all the other tools in my daily job. The search-first, view-later approach works.
If I don't know what to search for, I start with an empty search string, just to get an idea of what the file contains; Rowscope then shows only the first 1000 lines of the file. From those I pick a good search string and start working.

I'm preparing a new version of Rowscope. If there is a feature you miss or you have a good idea, please let me know.