PairFeatureExtractor.fit_transform() throws exception #5

@delip

I am working through the example in the "highered dataset" notebook, and I'm particularly interested in token-level features. When I run this part of the code:

real = [
    lambda i, j, s1, s2: 1.0,
    lambda i, j, s1, s2: 1.0 if s1[i] == s2[j] else 0.0,
    lambda i, j, s1, s2: 1.0 if s1[i] == s2[j] and len(s1[i]) >= 6 else 0.0,
    lambda i, j, s1, s2: 1.0 if s1[i].isdigit() and s2[j].isdigit() and s1[i] == s2[j] else 0.0,
    lambda i, j, s1, s2: 1.0 if s1[i].isalpha() and s2[j].isalpha() and s1[i] == s2[j] else 0.0,
    lambda i, j, s1, s2: 1.0 if not s1[i].isalpha() and not s2[j].isalpha() else 0.0
]
# Other ideas are:
#  to look up whether words are dictionary words,
#  longest common subsequence,
#  standard edit distance
feature_extractor = PairFeatureExtractor(real=real)
X_extracted = feature_extractor.fit_transform(tokX)

I get the following exception:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-10-04ddbdf4798d> in <module>()
      1 feature_extractor = PairFeatureExtractor(real=real)
----> 2 X_extracted = feature_extractor.fit_transform(tokX)

/home/delip/anaconda2/envs/tensorflow/lib/python2.7/site-packages/pyhacrf/feature_extraction.pyc in fit_transform(self, raw_X, y)
    108             Feature matrix list, for use with estimators or further transformers.
    109         """
--> 110         return self.transform(raw_X)
    111 
    112     def transform(self, raw_X, y=None):

/home/delip/anaconda2/envs/tensorflow/lib/python2.7/site-packages/pyhacrf/feature_extraction.pyc in transform(self, raw_X, y)
    124             Feature matrix list, for use with estimators or further transformers.
    125         """
--> 126         return [self._extract_features(sequence1, sequence2) for sequence1, sequence2 in raw_X]
    127 
    128     def _extract_features(self, sequence1, sequence2):

/home/delip/anaconda2/envs/tensorflow/lib/python2.7/site-packages/pyhacrf/feature_extraction.pyc in _extract_features(self, sequence1, sequence2)
    138 
    139         for k, feature_function in enumerate(self._binary_features):
--> 140             feature_array[..., k] = feature_function(array1, array2)
    141 
    142         if self._sparse_features:

TypeError: <lambda>() takes exactly 4 arguments (2 given)

Any suggestions on what I can do, or what I should be doing differently? I'm executing the Python notebook code as is.
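
One possible workaround, going only by the traceback: `_extract_features` calls each feature function as `feature_function(array1, array2)`, so the installed version appears to expect two-argument functions rather than the notebook's `(i, j, s1, s2)` form. Below is a minimal sketch of an adapter. `adapt_pairwise` is a hypothetical helper, not part of pyhacrf. It assumes that the two arguments hold the tokens of each sequence and that a `(len(s1), len(s2))` array is an acceptable return value; the traceback confirms neither.

```python
import numpy as np


def adapt_pairwise(feature):
    """Wrap a 4-argument (i, j, s1, s2) feature as a 2-argument one.

    Hypothetical helper, not part of pyhacrf. Assumes the two arguments
    hold the tokens of each sequence, in any array shape.
    """
    def wrapped(array1, array2):
        # Flatten both inputs so tokens can be indexed by position.
        s1 = list(np.ravel(array1))
        s2 = list(np.ravel(array2))
        out = np.zeros((len(s1), len(s2)))
        # Evaluate the original feature for every (i, j) token pair.
        for i in range(len(s1)):
            for j in range(len(s2)):
                out[i, j] = feature(i, j, s1, s2)
        return out
    return wrapped


# Two of the notebook's features, adapted to the 2-argument form.
real = [
    lambda i, j, s1, s2: 1.0,
    lambda i, j, s1, s2: 1.0 if s1[i] == s2[j] else 0.0,
]
real_adapted = [adapt_pairwise(f) for f in real]
```

If those assumptions hold, passing `real_adapted` instead of `real` to `PairFeatureExtractor(real=...)` should avoid the `TypeError`. This has not been verified against the installed pyhacrf version.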
