I hate this. I hate it when I see a callout for a benchmark and can't resist giving it a try. And I hate it even more when I have tons of work to do. But yeah, I love the thrill :)
So here it goes. This morning Markus called out for the GIS Clipping benchmark contest revisited. Well, I somehow felt invited as part of the "other FOSS GIS", so I had to go for it. :)
One thing I noticed at once is that the reported processing times were quite high. I had no proof, but in recent years I have worked (also editing) with many 1.5 GB shapefiles in uDig. And to be honest I also found it was the only software (FOSS or not) that was able to handle them properly without driving one mad.
I wanted to see how the geotools/jgrasstools/uDig Spatial Toolbox would perform. I am not a real benchmark guy, but I can run some modules to get a rough idea of what it takes. So I grabbed the test data and gave it a run.
I also have to say that I cheated a bit, because we didn't have a clipping tool inside the spatial toolbox up to now, since in uDig this would be done with the Axios tools. So I quickly wrote one, the VectorClipper.
The machine I tested this on is a two-year-old Windows 7 64-bit machine with 8 GB of RAM and Java 1.6. It has 4 cores.
The first run was really simple: read in the features, create an index to quickly get the features that might be involved in the clipping, and clip them one by one.
Result using one core: ~201 seconds
Interesting. GRASS takes about 5 minutes, but that is ok (or better, great), since it also does topology. QGIS seems to take about 5 minutes, which looks a bit like bad performance to me, considering that our worst case scenario takes a bit over 3 minutes.
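For readers who want to see the shape of that first run in code, here is a minimal, stdlib-only Java sketch of the index-then-clip idea. It is not the VectorClipper source and does not use JTS or geotools: features and the clip area are plain `java.awt.geom.Rectangle2D` boxes, and `GridIndexClipSketch`, its `CELL` size, and the `insert`/`candidates`/`clip` methods are names made up for this illustration.

```java
import java.awt.geom.Rectangle2D;
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

/** Toy stand-in for the first run: index the features, then clip the candidates one by one. */
class GridIndexClipSketch {
    static final double CELL = 10.0; // hypothetical grid cell size, in map units

    private final List<Rectangle2D> features = new ArrayList<>();
    private final Map<Long, List<Integer>> cells = new HashMap<>();

    private static int cell(double v) {
        return (int) Math.floor(v / CELL);
    }

    private static long key(int cx, int cy) {
        return (((long) cx) << 32) | (cy & 0xffffffffL);
    }

    /** Registers the feature id in every grid cell its envelope touches. */
    void insert(Rectangle2D feature) {
        int id = features.size();
        features.add(feature);
        for (int cx = cell(feature.getMinX()); cx <= cell(feature.getMaxX()); cx++) {
            for (int cy = cell(feature.getMinY()); cy <= cell(feature.getMaxY()); cy++) {
                cells.computeIfAbsent(key(cx, cy), k -> new ArrayList<>()).add(id);
            }
        }
    }

    /** Returns the ids of features that might touch the envelope (sorted, no duplicates). */
    TreeSet<Integer> candidates(Rectangle2D envelope) {
        TreeSet<Integer> ids = new TreeSet<>();
        for (int cx = cell(envelope.getMinX()); cx <= cell(envelope.getMaxX()); cx++) {
            for (int cy = cell(envelope.getMinY()); cy <= cell(envelope.getMaxY()); cy++) {
                ids.addAll(cells.getOrDefault(key(cx, cy), Collections.<Integer>emptyList()));
            }
        }
        return ids;
    }

    /** Clips only the candidates, one by one, and skips those that do not really overlap. */
    List<Rectangle2D> clip(Rectangle2D clipArea) {
        List<Rectangle2D> clipped = new ArrayList<>();
        for (int id : candidates(clipArea)) {
            Rectangle2D feature = features.get(id);
            if (feature.intersects(clipArea)) {
                clipped.add(feature.createIntersection(clipArea));
            }
        }
        return clipped;
    }

    public static void main(String[] args) {
        GridIndexClipSketch index = new GridIndexClipSketch();
        index.insert(new Rectangle2D.Double(0, 0, 5, 5));
        index.insert(new Rectangle2D.Double(20, 20, 5, 5));
        index.insert(new Rectangle2D.Double(8, 8, 4, 4));
        System.out.println(index.clip(new Rectangle2D.Double(3, 3, 7, 7)));
    }
}
```

A real run would use a proper spatial index and full polygon intersection; the point here is only the two-step pattern of a cheap candidate lookup followed by exact clipping.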
Anyways, let's focus on making things better.
The VectorClipper can run in multithreading mode. So I enabled it for my 4 cores.
Result using 4 cores: ~152 seconds
Hey, hey. Nice!
But we can do better than that.
Since the creation of the spatial index was taking quite some time, I decided to drop it and use JTS prepared geometries instead, which would make the intersects query faster.
So without an index but with the fast intersects query I got:
Result using 1 core: ~130 seconds
Result using 4 cores: ~110 seconds
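The reason a prepared geometry helps is that the clip geometry is tested against every single feature, so anything that can be computed about it once is worth caching. The sketch below shows that precompute-once idea at toy scale with the Java standard library only. It is an analogy, not the JTS API: `PreparedClipSketch` and its `exactTests` counter are hypothetical names, and the only thing cached here is the bounding box, whereas JTS prepared geometries cache richer internal structures.

```java
import java.awt.geom.Path2D;
import java.awt.geom.Rectangle2D;

/** Toy "prepared" clip shape: derived data is computed once and reused for every query. */
class PreparedClipSketch {
    private final Path2D shape;
    private final Rectangle2D bounds; // computed once in the constructor
    int exactTests = 0;               // counts how often the expensive test actually ran

    PreparedClipSketch(Path2D shape) {
        this.shape = shape;
        this.bounds = shape.getBounds2D();
    }

    /** Cheap cached-envelope rejection first, exact shape test only when needed. */
    boolean intersects(Rectangle2D featureEnvelope) {
        if (!bounds.intersects(featureEnvelope)) {
            return false;
        }
        exactTests++;
        return shape.intersects(featureEnvelope);
    }

    public static void main(String[] args) {
        // Triangle with corners (0,0), (10,0), (0,10) as the clip shape.
        Path2D.Double triangle = new Path2D.Double();
        triangle.moveTo(0, 0);
        triangle.lineTo(10, 0);
        triangle.lineTo(0, 10);
        triangle.closePath();

        PreparedClipSketch prepared = new PreparedClipSketch(triangle);
        System.out.println(prepared.intersects(new Rectangle2D.Double(20, 20, 1, 1)));
        System.out.println(prepared.intersects(new Rectangle2D.Double(1, 1, 1, 1)));
        System.out.println("exact tests run: " + prepared.exactTests);
    }
}
```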
So I thought: what if I was wrong about the index thing?
So I added the index back...
Result using 4 cores: ~102 seconds
Then I noticed that in the multithreading part I had some lists that got copied instead of just passed as indexes. Once that was fixed I got a nice:
Result using 4 cores: ~82 seconds
Which is where I was very satisfied and stopped my tests.
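As an aside, the list-copying fix is easy to picture with a small sketch. Assuming a fixed thread pool and a shared, read-only feature list, each worker below receives only a `from`/`to` index pair instead of its own copy of the data. `ParallelClipSketch`, `clipParallel`, and `clipRange` are illustrative names, and rectangles stand in for real geometries; this is not the VectorClipper code.

```java
import java.awt.geom.Rectangle2D;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class ParallelClipSketch {

    /** Clips features[from, to) against clipArea, reading the shared list in place. */
    static List<Rectangle2D> clipRange(List<Rectangle2D> features, int from, int to,
                                       Rectangle2D clipArea) {
        List<Rectangle2D> clipped = new ArrayList<>();
        for (int i = from; i < to; i++) {
            Rectangle2D feature = features.get(i);
            if (feature.intersects(clipArea)) {
                clipped.add(feature.createIntersection(clipArea));
            }
        }
        return clipped;
    }

    /** Splits the work into index ranges and clips them on a fixed-size thread pool. */
    static List<Rectangle2D> clipParallel(final List<Rectangle2D> features,
                                          final Rectangle2D clipArea, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            int chunk = Math.max(1, (features.size() + threads - 1) / threads);
            List<Future<List<Rectangle2D>>> parts = new ArrayList<>();
            for (int start = 0; start < features.size(); start += chunk) {
                final int from = start;
                final int to = Math.min(start + chunk, features.size());
                // Only two ints travel to the worker. The slower variant would pass
                // new ArrayList<>(features.subList(from, to)), copying every chunk.
                Callable<List<Rectangle2D>> task =
                        () -> clipRange(features, from, to, clipArea);
                parts.add(pool.submit(task));
            }
            List<Rectangle2D> clipped = new ArrayList<>();
            for (Future<List<Rectangle2D>> part : parts) {
                clipped.addAll(part.get()); // read in submission order, so output order is stable
            }
            return clipped;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException("clipping task failed", e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<Rectangle2D> features = new ArrayList<>();
        for (int i = 0; i < 10; i++) {
            features.add(new Rectangle2D.Double(i * 2, 0, 1, 1));
        }
        Rectangle2D clipArea = new Rectangle2D.Double(0, 0, 9.5, 1);
        System.out.println(clipParallel(features, clipArea, 4).size() + " features clipped");
    }
}
```

Sharing the list is safe here only because the workers never modify it; if the tasks had to write to the features, each would need its own data or some synchronization.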
So the above could be called the geotools/JTS/jgrasstools benchmark, since I was running the module from my development environment.
Running this inside uDig has some more overhead, so it made sense to test that, since that is what the user would get.
Here is the result I got: as you can see from the console output, the processing took about 117 seconds, which is still a good performance, considering the available benchmarks.
I can't take much credit for this; all the work is done by JTS and geotools, so the credit goes there.
I uploaded the jgrasstools libs I used to the site, so they can be used in uDig. If you need help getting started with uDig's spatial toolbox, have a look at this video.
-------------------------------
UPDATE
Just for the fun of it, I ran this on Amazon AWS on a cc2.8xlarge instance with 32 cores. It took about 25 seconds: 2 for reading the data, 7 for writing the result, and most of the rest for creating the index.