v.overlay: add option to remove small areas #7370
The new test contains two tests:
The second test is relatively slow: together they take 10.7 s on my laptop, which I find a bit long given the size of the whole GRASS test suite. Any opinions on the test duration?

Limit it to 2-3 polygons, maybe? We want unit tests here, not integration tests.

OK, the running time is now down to 1.5 s, which I think is acceptable. My statements about the new tests still hold.

There are now both unittest tests and pytest tests, and they test the same things. Should I delete the unittest tests?

Yes, if possible. Alternatively, to fix the failures, rename one of the files. But if both run the same tests, keep only the pytest version.

Does anybody have an idea why the pytest test fails at the very first GRASS command? It succeeds locally; otherwise I would not have added it. Because the failure is not reproducible, I have no idea how to fix it.

Yes, something weird happens as a side effect of calling into the C-based library or tools, which makes the first call (per worker) fail. Subsequent calls then work as normal. The Python side doesn't seem affected. In more than two years I haven't managed to find out what magic is going on. The behavior is even more apparent when the test order is randomized. And on Windows it is worse, in my experience, as more tests can fail.

If you find out why, it would unblock many things.
    # create test data
    @pytest.fixture(scope="class", autouse=True)
Is autouse a pattern we want to use, or do we prefer being explicit about the side effects?
class TestVOverlay:
    """Test v.overlay output against expected output"""

    # create test data
    @pytest.fixture(scope="class", autouse=True)
    def create_testdata(self):
        # set up
        gs.run_command(
            "v.extract",
            input="boundary_county",
            output="boundary_county_extract1",
            where="NAME in ('CURRITUCK')",
        )
        gs.run_command(
            "v.extract",
            input="boundary_county",
            output="boundary_county_extract2",
            where="NAME in ('CAMDEN')",
        )
        # modify extract 1
        gs.run_command(
            "v.buffer",
            input="boundary_county_extract1",
            output="boundary_county_extract1_buffer_out",
            type="area",
            distance=2,
        )
        gs.run_command(
            "v.buffer",
            input="boundary_county_extract1_buffer_out",
            output="boundary_county_extract1_buffer_in",
            type="area",
            distance=-2,
        )

        # run the tests
        yield

        # clean up test data regardless of test success/failure
        gs.run_command(
            "g.remove", type="vector", flags="f", pattern="boundary_county_extract*"
        )
Where is the temporary GRASS session/project created?
With pytest, no data is available by default (data that a low-quality test could otherwise damage). This is unlike gunittest, whose tests assume that a project with certain maps is already available and loaded (and which end up being integration tests because of that).
Thanks for the explanation! Indeed, I assumed that a default GRASS session with the NC data is already active. So how can I use the NC data in pytest for a simple, fast test?
Alternatively, I could extract the test data from the NC data and add them to the GRASS source code, but that seems wrong.

The pytest error trace for Linux additionally mentions that there is no active GRASS session: the GISRC environment variable isn't set. So a good hypothesis is that the data setup fixture fails because no temporary project is created; see the PR review comment.
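To make that hypothesis concrete, here is a minimal sketch of how a test could create its own temporary project and session instead of relying on an already active one. It assumes a recent GRASS version that provides `gs.create_project` and the context-manager form of `gs.setup.init`; the helper name `temporary_grass_session` and the project name `test_project` are made up for illustration and are not part of this PR. GRASS is imported lazily so the module can still be imported where GRASS is not on the Python path.

```python
import os
from contextlib import contextmanager
from pathlib import Path


@contextmanager
def temporary_grass_session(base_dir):
    """Create an empty project under base_dir and yield a session for it.

    Hypothetical helper for illustration, not part of this PR.
    """
    # Lazy import: GRASS is only needed once the helper is actually used.
    import grass.script as gs

    project = Path(base_dir) / "test_project"
    gs.create_project(project)
    # Passing a copy of the environment keeps GISRC and related
    # variables out of the global os.environ of the test process.
    with gs.setup.init(project, env=os.environ.copy()) as session:
        yield session
```

In a class-scoped pytest fixture this could be used as `with temporary_grass_session(tmp_path_factory.mktemp("overlay")) as session: yield session`, with every `gs.run_command(...)` call receiving `env=session.env`. The new project is empty, so the test would have to create or import its own few polygons rather than read `boundary_county`.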
From the updated manual:
When overlaying two vectors with areas, very small areas can occur in the
output. This can happen when e.g. one vector is a slightly modified
version of the other vector (buffered or simplified). These very small
areas can be removed by setting minsize to some value larger than 0.
The value is interpreted as square meters. In order to remove only noise
from slightly mismatching boundaries, the value of minsize should be
small, e.g. in the range 0.0001 to 1.
This is useful not only to remove noise, but also to reduce the size of the output vector in cases where a lot of very small areas are created by the overlay operation.
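As a rough illustration of the size threshold described in the manual text, the following pure-Python sketch computes polygon areas with the shoelace formula and keeps only those at or above a minimum size. It is not the v.overlay implementation: the function names are invented, coordinates are assumed to be in meters, the comparison being inclusive is an assumption, and the sketch simply drops small areas, whereas how the tool itself disposes of them is not stated here.

```python
def polygon_area(ring):
    """Return the absolute area of a simple polygon (shoelace formula).

    ring is a list of (x, y) vertices; the closing vertex is implied.
    """
    total = 0.0
    # Pair each vertex with its successor, wrapping around at the end.
    for (x1, y1), (x2, y2) in zip(ring, ring[1:] + ring[:1]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def drop_small_areas(polygons, minsize):
    """Keep only polygons whose area is at least minsize (square meters)."""
    return [ring for ring in polygons if polygon_area(ring) >= minsize]


# A 100 m x 100 m parcel and a 100 m x 0.001 m sliver, as might result
# from overlaying a polygon with a slightly shifted copy of itself.
parcel = [(0, 0), (100, 0), (100, 100), (0, 100)]
sliver = [(0, 100), (100, 100), (100, 100.001), (0, 100.001)]

kept = drop_small_areas([parcel, sliver], minsize=0.5)
```

With `minsize=0.5` the sliver (about 0.1 square meters) is filtered out while the parcel is kept, which is the kind of noise removal the manual describes.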
The PRs #7333, #7338, #7366, and #7370 belong together.