Tuesday, October 27, 2009

CSV files again


The alpha release of Matrex 2.0 solves the problem I reported about importing large CSV files.

Previous versions of Matrex already used a virtual table to show imported CSV files: the table (grid) loads from memory only the rows that it needs to display.
The next step was simple: in version 2.0 the file is not even loaded into memory; only the rows that are displayed are actually read from the file.
In this way the memory used to import the CSV file decreased dramatically compared to the previous versions of Matrex.
To avoid performance losses with this new version, Matrex keeps a cache of 2000 rows from the file (the 2000 rows around the last row loaded from the file); in this way scrolling the table up and down is still fluid.
The following picture shows the 3 levels of the CSV file import: file, memory cache, table.




If the table is scrolled up and down a lot, many rows can still be loaded in memory and released immediately afterwards; to keep this from wasting memory, Matrex explicitly calls the garbage collector every 50000 rows loaded.
In this way it was possible to import data from the 22 MByte CSV file for which the memory problem was reported, running Matrex without any special memory option.
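
As a rough illustration of the three levels (file, memory cache, table), here is a minimal sketch of a row cache that keeps a window of rows around the last requested row and asks for a garbage collection every so many loaded rows. It is not the Matrex code: the RowSource interface, the class names and the way the window is re-centred on a cache miss are assumptions made for the example, and fakeSource only stands in for a real CSV file. Note that System.gc() is only a request that the JVM may ignore.

```java
import java.util.ArrayList;
import java.util.List;

class RowWindowCacheSketch {

    /** Hypothetical abstraction over the CSV file: reads rows [from, to). */
    public interface RowSource {
        int rowCount();
        List<String[]> readRows(int from, int to);
    }

    /** Keeps a window of rows around the last row requested by the table. */
    public static class RowWindowCache {
        private final RowSource source;
        private final int windowSize;   // 2000 in the post
        private final int gcInterval;   // 50000 in the post
        private List<String[]> window = new ArrayList<>();
        private int windowStart = 0;
        private long loadedSinceGc = 0;
        private int reloads = 0;

        public RowWindowCache(RowSource source, int windowSize, int gcInterval) {
            this.source = source;
            this.windowSize = windowSize;
            this.gcInterval = gcInterval;
        }

        /** Called by the virtual table for each row it has to display. */
        public String[] getRow(int index) {
            if (index < 0 || index >= source.rowCount()) {
                throw new IndexOutOfBoundsException("row " + index);
            }
            boolean cached = index >= windowStart && index < windowStart + window.size();
            if (!cached) {
                reloadAround(index);
            }
            return window.get(index - windowStart);
        }

        /** Number of times the window had to be read again from the source. */
        public int reloads() {
            return reloads;
        }

        private void reloadAround(int index) {
            int from = Math.max(0, index - windowSize / 2);
            int to = Math.min(source.rowCount(), from + windowSize);
            window = source.readRows(from, to);
            windowStart = from;
            reloads++;
            loadedSinceGc += window.size();
            if (loadedSinceGc >= gcInterval) {
                System.gc();            // only a hint to the JVM
                loadedSinceGc = 0;
            }
        }
    }

    /** In-memory stand-in for a CSV file, used only to exercise the cache. */
    public static RowSource fakeSource(final int rows) {
        return new RowSource() {
            public int rowCount() {
                return rows;
            }
            public List<String[]> readRows(int from, int to) {
                List<String[]> result = new ArrayList<>();
                for (int i = from; i < to; i++) {
                    result.add(new String[] { "row" + i, String.valueOf(i * 2) });
                }
                return result;
            }
        };
    }

    public static void main(String[] args) {
        RowWindowCache cache = new RowWindowCache(fakeSource(200000), 2000, 50000);
        System.out.println(cache.getRow(5000)[0]);
        cache.getRow(5500);             // served from the cached window
        System.out.println("reloads: " + cache.reloads());
    }
}
```

Scrolling inside the cached window costs no file access at all; only a jump outside it triggers a new read.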

This fix will be part of version 2.0, but I also back-ported it to version 1.3.8, which will be published in a few days.

Wednesday, October 14, 2009

Matrex 2.0 alpha

I published an alpha (unstable) version of the new Matrex 2.0, which adds the possibility to use Matrex as a client/server system.
You can download it from here.
To test it as a client/server system:
  • Install this version of Matrex. It is a generic version, so remember that the first time you start it, it will download the graphical library SWT. Only when you start it a second time will it really start.
  • Install the Matrex Server. The setup file is in the Matrex directory, called matrex_server_2_0.jar.
  • Execute rmiregistry. It is the RMI registry server (Matrex Desktop and Server use RMI to communicate). It is part of the Java Runtime Environment (JRE).
  • In the Matrex Server directory, execute matrex_server.bat (Windows) or matrex_server.sh (Linux, MacOSX...) to start the server. Check that there are no errors.
  • In the Matrex directory start Matrex.
  • Follow this to let Matrex open a server project.
You can log in as guest (password guest).
If you want to log in as a different user, you need to change the config/accounts.xml file in the Matrex Server directory, adding an account element with the userid and password, always setting the encrypted attribute to false.

To become a final release, Matrex 2.0 needs the following changes:
  • Fix potential issues when a single project in one server is opened concurrently by several users.
  • Some operations, like adding functions or function expressions (expression parser), cause the addition or update of several items in the project. Therefore these operations must be done atomically, possibly using some kind of transaction.
  • Check that the resources allocated to a client in the server are cleaned up correctly when the client disconnects.
If you find problems with this alpha version, please add a comment to this article.

Monday, October 05, 2009

Large CSV files

A bug submitted last month showed that Matrex needs a lot of memory when importing large CSV files.
The CSV file mentioned in the bug has a size of 22 MBytes.
The file contains around 200000 lines. Each line has 14 fields and is around 100 characters long.

To import this file, the memory used by the Matrex process increases by 300 MBytes.
The CSV file is large, but that does not justify so much RAM to handle it.

I checked the code that imports CSV files in Matrex; nothing is wrong.
Matrex uses the Java CSV library to read CSV files, which works fine.

The file is loaded in memory row after row.
I checked how much memory is used for each loaded row. This is not easy in Java, since nothing similar to a sizeof function exists in the standard libraries. But I found the Javabi library, which is able to measure the total amount of memory used by a Java object.
Each row with its fields is handled as an array of strings, which uses around 800 bytes, 8 times the original row's size.
This is because:
  • Java strings use Unicode, which means that they use 2 bytes for each character
  • Strings use additional memory for their fields and their fields alignments
800 bytes * 200000 rows = ~160 MBytes.
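
The figure of about 800 bytes per row can be cross-checked with a back-of-the-envelope calculation. The sketch below is only an estimate under assumptions that are not in the measurement above: a 32-bit HotSpot JVM with 8-byte object headers, 12-byte array headers, 4-byte references, 8-byte alignment, the String layout used by the Java versions of that time (a reference to a char array plus three int fields), and about 7 characters per field. A different JVM would give different numbers.

```java
class RowMemoryEstimate {

    // All sizes below are assumptions about a 32-bit HotSpot JVM, not measurements.
    static final int OBJECT_HEADER = 8;
    static final int ARRAY_HEADER = 12;
    static final int REFERENCE = 4;
    // Assumed String fields: char[] reference + offset + count + hash, 4 bytes each.
    static final int STRING_FIELDS = 16;

    /** Rounds a size up to the next multiple of 8 bytes. */
    public static long align8(long bytes) {
        return (bytes + 7) / 8 * 8;
    }

    /** Estimated size of one String holding the given number of characters. */
    public static long stringBytes(int chars) {
        long stringObject = align8(OBJECT_HEADER + STRING_FIELDS);
        long charArray = align8(ARRAY_HEADER + 2L * chars);  // 2 bytes per character
        return stringObject + charArray;
    }

    /** Estimated size of one row stored as an array of field strings. */
    public static long rowBytes(int fields, int charsPerField) {
        long referenceArray = align8(ARRAY_HEADER + (long) REFERENCE * fields);
        return referenceArray + fields * stringBytes(charsPerField);
    }

    public static void main(String[] args) {
        long perRow = rowBytes(14, 7);
        System.out.println("bytes per row: " + perRow);
        System.out.println("MBytes for 200000 rows: " + perRow * 200000 / 1000000.0);
    }
}
```

With these assumptions each 7-character field costs 56 bytes and a 14-field row costs 856 bytes, which is in the same range as the measured value.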

As far as I understood, the rest of the memory used to import the file is allocated to the intermediate strings that the CSV reader uses to read the file, and that remain allocated until the garbage collector frees them.

There are some solutions that could be applied to reduce the memory use:
  1. Avoid loading all rows of the CSV file in memory: in other words, make the import editor extract the displayed lines directly from the file, and let it extract the lines only when they are actually displayed.
    I'm not sure about the effectiveness of this solution, because in general the user wants to import to a matrix an entire column of the file. Therefore the file, sooner or later, has to be read entirely.
    Another problem with this solution is that the Java CSV library, as far as I understood, does not allow counting the rows without reading them, nor jumping between rows without reading all the intermediate rows.

  2. Read fewer fields: at the very start of the import process, give the user the possibility to discard some fields, so that they are not loaded in the import dialog.
    This can work, but I am not sure that it can dramatically reduce the amount of memory used to load the file.

  3. Optimize the reading process so that it uses less memory: this means looking for an alternative to the Java CSV library that uses less memory to read the files (for example one based on CharSequence objects, which use less memory). There are alternative libraries, for example opencsv and the Ostermiller utilities. They need to be tested to see whether they use less memory than the Java CSV library.
I will try to apply these solutions and explain, in one of the next articles, what has been done.
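
One possible way around the "cannot jump between rows" limitation mentioned in solution 1 is to scan the file once, remember the byte offset at which each line starts, and then seek directly to any line. The sketch below uses only the standard library; the class name is hypothetical and this is not necessarily what Matrex does. It assumes UTF-8 text and one record per line, so a quoted CSV field containing a line break would confuse it.

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.RandomAccessFile;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

class LineOffsetIndexSketch {
    private final Path file;
    private final List<Long> offsets = new ArrayList<>();  // start offset of each line
    private long fileLength = 0;

    /** Scans the file once, recording where each line starts. */
    public LineOffsetIndexSketch(Path file) {
        this.file = file;
        try (InputStream in = new BufferedInputStream(Files.newInputStream(file))) {
            long position = 0;
            boolean atLineStart = true;
            int b;
            while ((b = in.read()) != -1) {
                if (atLineStart) {
                    offsets.add(position);
                    atLineStart = false;
                }
                if (b == '\n') {
                    atLineStart = true;
                }
                position++;
            }
            fileLength = position;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Known after the single scan, without keeping any row in memory. */
    public int lineCount() {
        return offsets.size();
    }

    /** Reads one line by seeking to its start, skipping all the others. */
    public String readLine(int index) {
        long start = offsets.get(index);
        long end = index + 1 < offsets.size() ? offsets.get(index + 1) : fileLength;
        byte[] bytes = new byte[(int) (end - start)];
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "r")) {
            raf.seek(start);
            raf.readFully(bytes);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        String line = new String(bytes, StandardCharsets.UTF_8);
        int length = line.length();
        while (length > 0 && (line.charAt(length - 1) == '\n' || line.charAt(length - 1) == '\r')) {
            length--;  // drop the line terminator
        }
        return line.substring(0, length);
    }

    /** Writes a tiny temporary file and reads one of its lines back. */
    public static String demo(int index) {
        try {
            Path tmp = Files.createTempFile("rows", ".csv");
            try {
                Files.write(tmp, "a,1\nb,2\nc,3\n".getBytes(StandardCharsets.UTF_8));
                return new LineOffsetIndexSketch(tmp).readLine(index);
            } finally {
                Files.delete(tmp);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo(1));
    }
}
```

The index costs one number per line, far less than the rows themselves, and each extracted line can then be handed to a CSV parser.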

Thursday, September 17, 2009

Matrex 2.0 is not just a specification anymore

After more than two months of work, in the last few days I finally got Matrex to open a project in a Matrex server.
This is how it works:

In Matrex click on the menu File->Connect. The following dialog appears:


In this case the server is on my PC, the same one in which I run Matrex, so I write localhost as server address.
Clicking on Names populates the Matrex Server combo box with the list of servers available on the PC with the given address.
In my case there is only one server, default.
I press OK. The login dialog appears:


Guest is the default user, with password guest. It is the user that is available by default in the server, if it has not been configured.
I write user id and password and press OK.
The remote machine tab for the localhost server appears, beside the local machine:


In the machine menu I click on Open Project. The Open Remote Project dialog appears:


Unlike with local projects, here it is only possible to choose from a list of projects. In fact, on the server side the projects are all under the same directory, projects.
I select the example project popcolorado and the project opens:



So far I have checked that it is possible to open matrix and function viewers and editors.
In the next few days I will check all the project's functionalities.
As usual, the sources for the latest version of Matrex are in the Matrex Subversion repository.

As soon as I have a version that is tested enough, I'll publish it as a pre-alpha.

Monday, August 24, 2009

Matrex 1.3.7

I released version 1.3.7, which fixes the bug reported as a comment to the Matrex 1.3.6 blog entry, and that I entered in the Matrex bug tracker:

Matrex seems to forget the Project settings (threading, etc.) after a restart;

This was caused by Matrex not being able to overwrite the project file.
The files are the following (in alphabetic order):

matrex_1_3_7_generic.jar for any platform
matrex_1_3_7_linux_gtk.jar for Linux
matrex_1_3_7_macosx_32.jar for MacOSX with 32 bits Java (Java 5)
matrex_1_3_7_macosx_64.jar for MacOSX with 64 bits Java (Java 6)
matrex_1_3_7_win32.jar for Windows

The bug has also been fixed in the code of the next release, 2.0.

Saturday, August 08, 2009

Client/Server: technical view

As mentioned in the previous article, I'm changing Matrex from a pure standalone desktop application to an application that can work either standalone or in a client/server architecture.
To support the client/server architecture I use the RMI protocol.

This means that the calculation engine, the one that calculates the functions and therefore generates the content of matrices, presentations, charts, will be both in the desktop application and in the server.
For this reason, the GUI has to use the objects involved in the calculation (projects, matrices, functions...) in the same way, whether they are on the client side or on the server side.
To do this, the original calculation objects (projects, matrices, functions...) are wrapped in two different new categories of objects: Local (client) and Server:



Both wrappers share the same remote interface (which extends RMI's Remote interface).

The reason I use wrappers instead of the original objects is that all the methods of an RMI business object must declare RemoteException.
RemoteException is needed to detect when the server is down or when there are connection problems, so I would never do without it.
On the other hand, it becomes annoying to catch it every time some code calls a method of a business object, so I want to do it only when it is strictly needed.
So I use the wrappers only in the GUI, where it is needed. Instead the calculation engine uses the original objects.
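
To make the idea concrete, here is a minimal sketch of the pattern. It is not the Matrex source: the names RemoteMatrix, LocalMatrix and ServerMatrix and the simplified Matrix class are hypothetical, and a real server would also have to bind the exported object in the RMI registry, which is left out here.

```java
import java.rmi.Remote;
import java.rmi.RemoteException;
import java.rmi.server.UnicastRemoteObject;

class WrapperSketch {

    /** Stand-in for an original calculation object: no RemoteException anywhere. */
    public static class Matrix {
        private final double[] values;
        public Matrix(double... values) { this.values = values.clone(); }
        public int size() { return values.length; }
        public double get(int i) { return values[i]; }
    }

    /** Remote interface shared by both wrappers; the only type the GUI sees. */
    public interface RemoteMatrix extends Remote {
        int size() throws RemoteException;
        double get(int i) throws RemoteException;
    }

    /** Local wrapper: a plain object that is never exported through RMI. */
    public static class LocalMatrix implements RemoteMatrix {
        private final Matrix matrix;
        public LocalMatrix(Matrix matrix) { this.matrix = matrix; }
        public int size() { return matrix.size(); }
        public double get(int i) { return matrix.get(i); }
    }

    /** Server wrapper: exported by the UnicastRemoteObject constructor. */
    public static class ServerMatrix extends UnicastRemoteObject implements RemoteMatrix {
        private static final long serialVersionUID = 1L;
        private final transient Matrix matrix;
        public ServerMatrix(Matrix matrix) throws RemoteException {
            super();
            this.matrix = matrix;
        }
        public int size() throws RemoteException { return matrix.size(); }
        public double get(int i) throws RemoteException { return matrix.get(i); }
    }

    /** Engine-side code works on the original class and needs no try/catch. */
    public static double engineSum(Matrix m) {
        double sum = 0;
        for (int i = 0; i < m.size(); i++) {
            sum += m.get(i);
        }
        return sum;
    }

    /** GUI-side code works on the remote interface and handles RemoteException once. */
    public static String guiSumLabel(RemoteMatrix m) {
        try {
            double sum = 0;
            for (int i = 0; i < m.size(); i++) {
                sum += m.get(i);
            }
            return "sum = " + sum;
        } catch (RemoteException e) {
            return "server not reachable";
        }
    }

    public static void main(String[] args) {
        Matrix original = new Matrix(1.0, 2.0, 3.0);
        System.out.println(engineSum(original));
        System.out.println(guiSumLabel(new LocalMatrix(original)));
    }
}
```

The GUI method accepts either wrapper through RemoteMatrix, while the engine method never has to mention RemoteException.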

Now, why Local and Server wrappers? Why not use only Server wrappers, both on the server and on the client side?
There are several reasons:
  • Server machines and projects have slightly different interfaces on the client and on the server side, mainly because projects on the server side can only be saved in a specific directory, while projects on the client side can be saved in any directory of the disk.
  • The server wrappers extend UnicastRemoteObject, local wrappers don't. I don't completely understand how the Java compiler and the RMI compiler handle these objects, so I cannot be sure that they don't have some effect on the application's performance. If such effects are unavoidable for the server business objects, I don't want them on the local objects.
And why did I not use the original classes instead of the Local wrappers? Because I needed a special wrapper for the Matrix class when it is used in the GUI, and only when used in the GUI, called SafeMatrix, which makes the Matrix methods thread safe.
But this means that all the other calculation classes need to have parameters of type SafeMatrix when called by the GUI, and parameters of type Matrix when called by the calculation engine. And this means that I need special wrappers that use SafeMatrix parameters: the Local wrappers.
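
A thread-safe wrapper of this kind could look roughly like the sketch below. How SafeMatrix is really implemented is not described here, so the use of synchronized methods, the add method and the simplified Matrix class are all assumptions made for illustration.

```java
class SafeMatrixSketch {

    /** Stand-in for the engine's Matrix class: not thread safe on its own. */
    public static class Matrix {
        private final double[][] cells;
        public Matrix(int rows, int cols) { cells = new double[rows][cols]; }
        public double get(int row, int col) { return cells[row][col]; }
        public void set(int row, int col, double value) { cells[row][col] = value; }
    }

    /** GUI-side wrapper: every access goes through the wrapper's lock. */
    public static class SafeMatrix {
        private final Matrix matrix;
        public SafeMatrix(Matrix matrix) { this.matrix = matrix; }

        public synchronized double get(int row, int col) {
            return matrix.get(row, col);
        }

        public synchronized void set(int row, int col, double value) {
            matrix.set(row, col, value);
        }

        /** Read-modify-write done under one lock, so no update is lost. */
        public synchronized void add(int row, int col, double delta) {
            matrix.set(row, col, matrix.get(row, col) + delta);
        }
    }

    /** Several threads update the same cell through the wrapper. */
    public static double concurrentAdds(int threads, final int addsPerThread) {
        final SafeMatrix safe = new SafeMatrix(new Matrix(1, 1));
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(new Runnable() {
                public void run() {
                    for (int i = 0; i < addsPerThread; i++) {
                        safe.add(0, 0, 1.0);
                    }
                }
            });
            workers[t].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return safe.get(0, 0);
    }

    public static void main(String[] args) {
        System.out.println(concurrentAdds(4, 10000));
    }
}
```

In this sketch the lock only protects accesses that go through the wrapper; code that holds the bare Matrix is not synchronized with it.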


So, now I'm working on it. It will take some time, because in the GUI all the references to the original objects must be changed to the new remote interfaces.
Which means:
  • remote exceptions to handle.
  • utility functions to convert the original classes to the wrappers.
  • some code duplication.
  • many wrappers to write, especially for the charts, for which there is one class for each chart type.
Also, I expect to reduce the number of methods in the calculation classes to reduce the number of remote calls.

When I have something that more or less works I'll publish it as an alpha version.

Wednesday, July 22, 2009

Working on client/server

I started to work on the client/server version of Matrex, the 2.0.

This version, as explained in the specification, will give the possibility to use Matrex in two ways:
  • standalone, as today
  • connected to one or more Matrex servers.
When Matrex opens a project in a server, all calculations for that project are done in the server: Matrex acts only as a graphical interface.
One would open a project in a server:
  • to use the CPU of the PC running the server instead of the one of his own PC.
  • to share the project with other people. In fact two or more Matrex clients can work on the same project in the same server at the same time, without problems.
Matrex has been written from the start to become one day a client/server system, so the GUI will not change so much: not much more than a new menu to connect to a server (I will publish some pictures as soon as I have a stable version).

The protocol used is RMI, but I will keep the possibility to use different protocols in the future. It could be nice to have a version (based on REST?) that can work on the internet through the firewalls.

Tuesday, June 30, 2009

Matrex 1.3.6

A few days ago the Eclipse Foundation published version 3.5 of SWT.
I was waiting for this because it means that Mac users can finally use Matrex with both Java 5 and Java 6, and with better graphics, because the new version is based on Cocoa.
So, as soon as this new version became available, I published the new version of Matrex, 1.3.6.
Together with the new SWT, 1.3.6 comes with the following changes:
  • There were still some incompatibilities with Java 5, so it was not possible to run Matrex 1.3.5 with Java 5. This problem mainly hit Mac users, who could not use Java 6 because SWT was not working with it. Now I made sure this problem will not show up again.
  • From Matrex 1.3.5 maximizing the main window means maximizing only vertically, leaving the horizontal size of the window the same. On Windows, maximizing the main window could make it disappear from the screen. On Linux, the main window sometimes became unresponsive to the mouse after being maximized. These problems have been fixed.
  • Matrex is now able to read script templates from the script classpath defined in the Files Locations dialog. In this way the script language plugins (groovy, jruby...) can also be installed outside the Matrex directory.
  • Minor bug fixing.
Now the versions are the following:

The choice between 32 and 64 bits depends on the Java runtime used.

Monday, May 25, 2009

About MacOSX

If you are not able to install Matrex 1.3.5 on MacOSX, please check the article about this topic.
You can download a more recent beta version of the SWT library for Cocoa here.
As soon as Eclipse 3.5 Galileo is released, I will update the setup files to use the new SWT library, which will probably solve the problem.

Saturday, May 23, 2009

Matrex 1.3.5 has been released

Matrex 1.3.5 has finally been released.

It should have been called 1.4 for the new features it contains, but when the coding started it was called 1.3.5 and so it remained.
As the notes that come in the setup file report, these are the new features:

GUI:
  • Project diagram.
  • The Plugins dialog has evolved into a more general locations dialog:


    In this way it is possible to add plugins to Matrex without installing anything in its directory.

  • If a toolbar becomes too small to contain all buttons, it displays a menu containing the missing buttons:



  • Progress bar that shows loading of the single items of a project:


  • The matrix editor has a toolbar, to make the GUI uniform in the whole application:


Internal:
  • Configuration files containing the location of the other Matrex files (see locations dialogs). With this Matrex becomes easy to package for specific operating systems.
  • Able to read/write tab separated files (together with the CSV files).

Test:
  • More unit testing (> 280 tests). The idea is to have a set of tests to fire to check the project before each release.
  • GUI unit testing of an editor (Function Editor) with SWTBot.
  • Used Findbugs annotations.

Several bugs have been fixed.

You can download it here.

Thursday, May 07, 2009

Matrex 1.3.5 almost ready.

I am running the last tests for version 1.3.5 of Matrex, which will probably be released in the second half of May.
The changes are many, but surely the most important are:
together with many bug fixes.

Monday, March 30, 2009

Project diagram

Matrex version 1.3.5 will be able to show the project in the form of a diagram.
Here is the diagram for the example project projection, which is included in the Matrex setup:


It shows all items of the project (except timers) and their connections.
Each item is labeled with its type (matrix, function...), each connection is displayed as an arrow.

Compared to the item trees and to the information views, the project diagram has the advantage that it gives a global overview of the project. In one single window it is possible to see all items of the project.

The meaning of the arrows is the following:
  • An arrow from a matrix to a function means that the matrix is input of the function
  • An arrow from a function to a matrix means that the matrix is output (result) of the function
  • An arrow from a matrix to a presentation or chart means that the matrix is used in the presentation or chart.
Since a project can contain many items and therefore many connections among the items, the project diagram can become complex.
To make it easier to understand, it is possible to:
  • Click on an item: the item becomes green and all the connected items and their arrows become blue (in) and red (out).
  • Click on one of the item types in the toolbar: all the items of that type become green.
In the toolbar there is also a button to print the diagram.
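
The arrows described above form a directed graph, and the highlighting on click amounts to looking up the incoming and outgoing neighbours of the clicked item. Here is a minimal sketch of that lookup; it is not the Matrex diagram code, and the class, method and item names are hypothetical.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

class ProjectDiagramSketch {
    private final Map<String, Set<String>> outgoing = new LinkedHashMap<>();
    private final Map<String, Set<String>> incoming = new LinkedHashMap<>();

    /** Adds an arrow, e.g. from an input matrix to a function. */
    public void addArrow(String from, String to) {
        outgoing.computeIfAbsent(from, k -> new LinkedHashSet<>()).add(to);
        incoming.computeIfAbsent(to, k -> new LinkedHashSet<>()).add(from);
    }

    /** Items with an arrow into the clicked item (shown in blue). */
    public Set<String> inputsOf(String item) {
        return incoming.getOrDefault(item, Collections.<String>emptySet());
    }

    /** Items reached by an arrow from the clicked item (shown in red). */
    public Set<String> outputsOf(String item) {
        return outgoing.getOrDefault(item, Collections.<String>emptySet());
    }

    /** Small example project: matrix -> function -> matrix -> chart. */
    public static ProjectDiagramSketch example() {
        ProjectDiagramSketch diagram = new ProjectDiagramSketch();
        diagram.addArrow("matrix: prices", "function: average");
        diagram.addArrow("function: average", "matrix: averages");
        diagram.addArrow("matrix: averages", "chart: trend");
        return diagram;
    }

    public static void main(String[] args) {
        ProjectDiagramSketch diagram = example();
        System.out.println("in:  " + diagram.inputsOf("function: average"));
        System.out.println("out: " + diagram.outputsOf("function: average"));
    }
}
```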

Monday, March 09, 2009

Matrex easier to package

Until now once you installed Matrex in a directory, Matrex used the same directory to:
  • write its log file
  • write the changes to its configuration files
  • write additional templates
  • write scripts used as base of templates
This is natural for a Java application that is installed with a Java installer and does not claim to integrate with the platform on which it is installed.

But what if we want to create installers for specific platforms (Windows setup, MacOSX application bundles, Linux .deb and .rpm files)?
In this case:
  • it is the operating system that defines in which directory or directories Matrex is installed.
  • Matrex cannot write the files it produces (templates, configurations, log...) in the directories in which it is installed, but must use a writable directory, which can be under Documents and Settings in Windows or under the home directory in Linux.
Version 1.3.5 will make this possible with two changes:
  • the new configuration file main.properties, which contains the paths of all other configuration files (which can contain environment variables).
  • the new concept of configuration files that contain the locations of other files.
The configuration files that contain the locations of other files are:
  • templates.cld, which contains the paths of the top directories of directory trees containing functions templates.
  • plugins.cld, which contains the directories and jar files containing additional java classes used as plugins and their dependencies.
  • scripts.cld, which contains the top directories of directory trees containing scripts used by templates.
With the addition of main.properties and the .cld files, the configuration files that change after installation, as well as additional templates, Java classes and scripts, can live in writable directories.
In this way anyone can change these files, add them or delete them without touching the original installation.
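
To give an idea of how paths containing environment variables could be resolved, here is a small sketch based on java.util.Properties. The variable syntax used by Matrex is not specified here, so the ${NAME} form, the property key and the class name are pure assumptions for the example.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Map;
import java.util.Properties;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MainPropertiesSketch {

    // Assumed syntax: ${NAME}. The real main.properties may use a different one.
    private static final Pattern VARIABLE =
            Pattern.compile("\\$\\{([A-Za-z_][A-Za-z0-9_]*)\\}");

    /** Replaces each ${NAME} with its value; unknown variables are left as they are. */
    public static String expand(String value, Map<String, String> environment) {
        Matcher matcher = VARIABLE.matcher(value);
        StringBuffer result = new StringBuffer();
        while (matcher.find()) {
            String replacement = environment.get(matcher.group(1));
            if (replacement == null) {
                replacement = matcher.group(0);
            }
            matcher.appendReplacement(result, Matcher.quoteReplacement(replacement));
        }
        matcher.appendTail(result);
        return result.toString();
    }

    /** Loads the properties text and returns the expanded path stored under key. */
    public static String resolve(String propertiesText, String key,
                                 Map<String, String> environment) {
        Properties properties = new Properties();
        try {
            properties.load(new StringReader(propertiesText));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        String raw = properties.getProperty(key);
        return raw == null ? null : expand(raw, environment);
    }

    public static void main(String[] args) {
        // Hypothetical entry pointing to a .cld file in a writable directory.
        String text = "templates.cld=${HOME}/.matrex/templates.cld\n";
        System.out.println(resolve(text, "templates.cld", System.getenv()));
    }
}
```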

It will then be easy to install Matrex:
  • In Windows in a directory under the Program Files directory, with the additional files and the writable configuration files in a directory under Documents and Settings.
  • In Linux in the standard directories (/usr/bin, /usr/share...), with the additional files and the writable configuration files in a directory (.matrex ?) under the home directory.
  • In MacOSX in a directory under the Applications directory, with the additional files and the writable configuration files in a directory under Users.
I tested this new feature by installing version 1.3.5 under a read-only directory and configuring it so that it keeps all the files that need to be changed under a writable directory.
It works fine.
As soon as possible I will release a beta of this version.

Thursday, February 26, 2009

ToolBarWithMenu component

The SWT ToolBar component has a minor problem: when there is not enough space to show all the buttons contained in the ToolBar, it just shows the first ones.
It is true that the SWT.WRAP property partially fixes the problem by wrapping the buttons onto two or more lines, but this does not work on some platforms (e.g. Linux) where the property is ignored.
This can be a problem, because the user of the application can be completely unaware of the buttons that are not displayed.
I have seen other graphical libraries solve this problem by adding an additional button at the end of the toolbar. This button, when clicked, shows a menu with the missing buttons.
So I adopted the same logic in SWT: I made a class called ToolBarWithMenu, which adds a button at the start of the toolbar. The button has a menu showing the buttons that are not displayed in the toolbar because there is no space. Here is an example in Matrex:


Clicking on one of the menu items has the same effect as clicking on the related toolbar button.
It works as expected also for a button with an attached menu, if the button's selection listener that shows the menu implements the IHasMenu interface.

Matrex on MacOSX

Matrex uses the SWT library for its GUI. SWT uses the native GUI of the platform. The current version of Matrex, 1.3, uses SWT 3.4.
On MacOSX SWT 3.4 is based on Carbon, which is only 32 bits.
So Matrex 1.3 on MacOSX does not work using Java 1.6 (Java 6), which is only 64 bits and therefore not compatible with Carbon.
If you want to use Matrex 1.3 in MacOSX please either:

  • run it with Java 1.5 (Java 5) instead of Java 1.6
  • try the beta version of SWT 3.5 for Cocoa 64 bits. Download the zip file and unzip it in the swt directory under the directory where you installed Matrex.

The following is the description of the SWT 3.4 problem written by the SWT developers:

SWT cannot be used with OS X JRE version 1.6 (Mac OSX only)

OS X JRE version 1.6 assumes that pointers have a size of 64 bits, but SWT's Carbon port only uses 32-bit pointers, so SWT and Eclipse cannot be used with OS X JRE version 1.6. The workaround is to use an earlier supported version of the OS X JRE.

Sunday, February 22, 2009

Updated Matrex adapter to Scilab

The Matrex adapter to Scilab has been updated for Matrex 1.3.5.
The adapter has been tested with Scilab 5.1.

The adapter contains a special function template that calls Scilab functions to calculate Matrex functions.
Input matrices and parameters of the Matrex function are passed to the related Scilab function, which is then calculated. The result of the Scilab function is passed back to Matrex as the output matrices of the Matrex function.

More information can be found in the documentation included in the Scilab adapter setup.

New Matrex adapter to Groovy

I have just released the new Matrex adapter to the Groovy language.
This follows the two adapters to Jython (integrated in Matrex) and to JRuby.
This adapter is good:
  • for fast prototyping and test of function templates
  • to write more compact code than in Java
  • to use features, for example closures, that are not available in Java
Like the other language adapters, it is used in this way:
  1. create a Groovy script with the code for the function template
  2. create the function template definition
  3. use the function template to create functions in Matrex projects
More information can be found in the documentation included in the Groovy adapter setup.

Sunday, January 25, 2009

Subversion

For those interested in the Matrex code, I'm moving Matrex and all its satellite projects to Subversion (SVN).
With CVS I was always adding to the repository a lot of files by mistake and was never able to really remove them.

For more information just go to the Matrex SVN page in Sourceforge.net.

As soon as possible I will also move Internal SQL Library and SWT Web Installer to Subversion.