add regression tests by BurnzZ · Pull Request #52 · scrapinghub/shublang

BurnzZ · 2020-08-17T11:59:28Z

This test need not be merged into master but instead provides the necessary stimuli to see some pain points in shublang's usage.

In particular we can see that:

The current sanitize functionality returns empty strings in its iterable. This presents the need to update it to prune out the empty strings, otherwise it would evaluate our test example as ['', '', '', 'price: $123,823.00', '']
We need to do a double first, since the 1st one transforms [('123823.00',)] into ('123823.00',) and the 2nd one transforms (123823.00,) into 123823.00.
The float functionality needs to be in between the double first since it only works on iterables.
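The three points above can be reproduced in plain Python. This is only a sketch: `prune_empty`, `first` and `to_float` are hypothetical stand-ins written for illustration, not shublang's actual implementations, and the intermediate values are the ones quoted in the list.

```python
def prune_empty(strings):
    """Drop empty strings from a list (hypothetical helper, not shublang's sanitize)."""
    return [s for s in strings if s]


def first(iterable):
    """Return the first element of an iterable (stand-in for the `first` pipe)."""
    return next(iter(iterable))


def to_float(iterable):
    """Convert each element to float; like the `float` pipe, it expects an iterable."""
    return tuple(float(x) for x in iterable)


# Point 1: the sanitized output still contains empty strings until they are pruned.
sanitized = ['', '', '', 'price: $123,823.00', '']
pruned = prune_empty(sanitized)

# Points 2 and 3: two `first` calls with the float conversion in between.
matches = [('123823.00',)]       # value quoted in the description above
inner = first(matches)           # ('123823.00',)
converted = to_float(inner)      # a one-element tuple holding the float
price = first(converted)         # the bare float
```

Laid out this way, each intermediate step only unwraps or converts a container, which is the verbosity the description is pointing at.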

As we can see, we need to jump through a lot of hoops just to properly extract this type of data.

Ideally, we should have a way to extract the data in a very concise manner like: re_search("(\d+\.\d{2})") | first_match
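To make the idea concrete, here is a minimal Python sketch of what such a helper might do. The name `first_match` comes from the proposal above, but its signature and behavior here are assumptions for illustration only; nothing like this is confirmed to exist in shublang. The input string is assumed to have had its thousands separator removed already.

```python
import re


def first_match(pattern, text):
    """Return the first capture group of the first match, or None if nothing matches.

    Hypothetical helper: collapses "search, take first, take first" into one call.
    """
    match = re.search(pattern, text)
    if match is None:
        return None
    # Fall back to the whole match when the pattern has no capture group.
    return match.group(1) if match.groups() else match.group(0)


raw = first_match(r"(\d+\.\d{2})", "price: $123823.00")
price = float(raw) if raw is not None else None

missing = first_match(r"(\d+\.\d{2})", "no price here")
```

Returning a plain string instead of a list of tuples is the design choice that removes the need for the double first.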

renancunha · 2020-08-18T13:06:10Z

I agree that the pipes needed to extract the data in the example above should be simpler and more concise. Given the current grammar, using first twice makes sense from a "logical" point of view, but from the user's perspective it could look weird, or at least verbose.

Another thing I found while exploring this example is that first will fail if re_search returns None. Also, if we try to apply float to an empty value we will get an exception too, because it expects an iterable. But I think that in these cases we can avoid breaking things by evaluating the expressions inside a try/catch (if an exception is thrown we return None), in the same way that the universal parser extractor does.
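A rough Python sketch of that guard is shown below. `safe_apply`, `first` and `to_float` are hypothetical names used only for illustration; this is not shublang's evaluator, nor a description of how the universal parser extractor is implemented.

```python
def safe_apply(func, value, default=None):
    """Apply func to value, returning default instead of raising (illustrative only)."""
    try:
        return func(value)
    except Exception:
        # Any failure in the pipe collapses to the default, as suggested above.
        return default


def first(iterable):
    """Stand-in for the `first` pipe; raises TypeError when given None."""
    return next(iter(iterable))


def to_float(iterable):
    """Stand-in for the `float` pipe; expects an iterable of numeric strings."""
    return tuple(float(x) for x in iterable)


no_match = safe_apply(first, None)          # re_search found nothing
empty_value = safe_apply(to_float, None)    # float applied to an empty value
ok = safe_apply(to_float, ('123823.00',))   # normal case is unaffected
```

One trade-off of swallowing every exception this way is that genuine bugs in an expression also come back as None, so logging the caught exception may be worth considering.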

add regression tests

19dde85

BurnzZ requested review from akshayphilar and renancunha August 17, 2020 11:59

renancunha mentioned this pull request Aug 18, 2020

add product conversion tests #53

Open

BurnzZ wants to merge 1 commit into master from add-complex-tests
