README.md: 144 additions & 1 deletion
@@ -582,7 +582,150 @@ pattern: "^prefix"
pattern: "suffix$"
```

-### 8. LLM Judge Evaluator
### 8. Raises Evaluator (Exception Testing)

Tests that a function raises a specific exception. Similar to `pytest.raises`, this evaluator verifies the exception type and, optionally, matches the exception message against a regex pattern. It is a **case-level only** evaluator.

```yaml
function_name:
  dataset:
    - case:
        input: invalid_value
        raises: ExceptionType # Required: exception type name
        match: "pattern" # Optional: regex pattern for exception message
```

**Important Notes:**

- `raises` is **case-level only**; it cannot be used as a global evaluator
- `match` can only be used together with `raises`
- When `raises` is specified, the test expects an exception and fails if the function returns normally
- Global evaluators (type checks, assertions, etc.) are automatically skipped for exception cases

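The rule that `match` requires `raises` can be pictured as a small validation step over a parsed case. The sketch below is only an illustration: `validate_case` is a hypothetical helper, not part of the framework, and it assumes each case has already been loaded from YAML into a plain `dict`.

```python
def validate_case(case: dict) -> None:
    """Reject a case that uses `match` without `raises` (hypothetical helper)."""
    if "match" in case and "raises" not in case:
        raise ValueError("`match` can only be used together with `raises`")


# A well-formed exception case passes silently.
validate_case({"input": "invalid", "raises": "ValueError", "match": "invalid literal"})

# A case with `match` but no `raises` is rejected.
try:
    validate_case({"input": "invalid", "match": "invalid literal"})
except ValueError as err:
    print(f"rejected: {err}")
```

How the framework itself reports this kind of misconfiguration is not described here, so treat the error type and message as placeholders.
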
**Examples:**

```yaml
# Basic exception testing
calculate_discount:
  evals:
    IsFloat:
      type: float
  dataset:
    - case:
        id: "valid_calculation"
        inputs: [100.0, 20.0]
        expected: 80.0

    - case:
        id: "negative_price"
        inputs: [-100.0, 20.0]
        raises: ValueError
        match: "must be positive" # Checks exception message

    - case:
        id: "invalid_discount"
        inputs: [100.0, 150.0]
        raises: ValueError # Just checks type, not message

# Division by zero
divide:
  evals:
    IsNumber:
      type: "int | float"
  dataset:
    - case:
        inputs: [10, 2]
        expected: 5.0

    - case:
        inputs: [10, 0]
        raises: ZeroDivisionError

# Type validation
parse_age:
  dataset:
    - case:
        input: "25"
        expected: 25

    - case:
        input: "invalid"
        raises: ValueError
        match: "invalid literal"

    - case:
        input: -5
        raises: ValueError
        match: "age must be positive"

# Key errors
get_config_value:
  dataset:
    - case:
        input: "api_key"
        expected: "secret_key_123"

    - case:
        input: "nonexistent_key"
        raises: KeyError
        match: "nonexistent_key"

# Multiple exception types
process_data:
  dataset:
    - case:
        input: { "valid": "data" }
        expected: "processed"

    - case:
        input: null
        raises: TypeError
        match: "NoneType"

    - case:
        input: []
        raises: ValueError
        match: "empty"

    - case:
        input: { "invalid": "format" }
        raises: KeyError

# Index errors
get_element:
  dataset:
    - case:
        inputs: [[1, 2, 3], 1]
        expected: 2

    - case:
        inputs: [[1, 2, 3], 10]
        raises: IndexError
        match: "out of range"
```

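For reference, here is one way the functions under test could be written so that the `calculate_discount` and `parse_age` cases above pass. The README does not show these functions, so the bodies and error messages below are assumptions chosen to line up with the `raises` and `match` values in the YAML.

```python
def calculate_discount(price: float, discount_percent: float) -> float:
    """Apply a percentage discount (hypothetical implementation)."""
    if price <= 0:
        # Matches the pattern "must be positive" in the "negative_price" case.
        raise ValueError("price must be positive")
    if not 0 <= discount_percent <= 100:
        # The "invalid_discount" case checks only the exception type.
        raise ValueError("discount must be between 0 and 100")
    return price - price * discount_percent / 100


def parse_age(value) -> int:
    """Parse an age from a string or number (hypothetical implementation)."""
    # int("invalid") raises ValueError with a message starting "invalid literal".
    age = int(value)
    if age <= 0:
        raise ValueError("age must be positive")
    return age
```
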
**How it works:**

1. When `raises` is present in a case, the framework wraps the function call in a `try`/`except` block
2. If an exception is raised:
   - Checks whether the exception type matches `raises`
   - If `match` is provided, validates the exception message against the regex pattern
   - Skips global evaluators (they would fail on the exception dict)
3. If no exception is raised when `raises` is specified, the test fails
4. If the exception type doesn't match, the test fails and shows the actual vs. expected type

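The steps above can be sketched in plain Python. This is a minimal illustration, not the framework's code: `check_raises` is a hypothetical name, the type check compares class names exactly (whether subclasses are accepted is not stated here), and the message check uses `re.search` as `pytest.raises(match=...)` does.

```python
import re
from typing import Any, Callable, Optional, Sequence, Tuple


def check_raises(
    func: Callable[..., Any],
    args: Sequence[Any],
    raises: str,
    match: Optional[str] = None,
) -> Tuple[bool, str]:
    """Return (passed, message) for one case that declares `raises`."""
    try:
        func(*args)
    except Exception as exc:
        actual = type(exc).__name__
        # Step 4: wrong exception type, report actual vs. expected.
        if actual != raises:
            return False, f"expected {raises}, got {actual}"
        # Step 2: optional regex check against the exception message.
        if match is not None and re.search(match, str(exc)) is None:
            return False, f"message {str(exc)!r} does not match {match!r}"
        return True, f"raised {actual} as expected"
    # Step 3: the function returned normally, so the case fails.
    return False, f"expected {raises}, but the function returned normally"


def divide(a, b):
    return a / b


print(check_raises(divide, [10, 0], "ZeroDivisionError"))
print(check_raises(divide, [10, 2], "ZeroDivisionError"))
```

The first call passes because `divide(10, 0)` raises `ZeroDivisionError`; the second fails because the function returns normally.
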
**Common Exception Types:**

- `ValueError`: Invalid value/argument
- `TypeError`: Wrong type
- `KeyError`: Missing dictionary key
- `IndexError`: List/array index out of range
- `ZeroDivisionError`: Division by zero
- `AttributeError`: Missing attribute
- `FileNotFoundError`: File doesn't exist
- `RuntimeError`: Generic runtime error

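Because `raises` takes an exception type name as a string, a framework has to relate that name to a Python exception in some way. How this project does so is not described here. One possible approach, shown purely as an assumption, is to look the name up in Python's `builtins` module; `resolve_exception` is a hypothetical helper.

```python
import builtins


def resolve_exception(name: str) -> type:
    """Map a name such as "ValueError" to the built-in exception class."""
    candidate = getattr(builtins, name, None)
    # Accept only real exception classes, not other built-ins such as `print`.
    if isinstance(candidate, type) and issubclass(candidate, BaseException):
        return candidate
    raise LookupError(f"{name!r} is not a built-in exception type")


print(resolve_exception("KeyError") is KeyError)
print(issubclass(resolve_exception("FileNotFoundError"), OSError))
```

Resolving to a class rather than comparing names would let an `isinstance` check accept subclasses, for example treating `FileNotFoundError` as an `OSError`. Custom exception classes defined in user code would need a different lookup.
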
### 9. LLM Judge Evaluator
Uses a Language Model to evaluate outputs based on a custom rubric. Ideal for semantic evaluation, quality assessment, and cases where rule-based checking is insufficient.