I am working through the example in the "highered dataset" notebook, and I'm particularly interested in token-level features. But when I run this part of the code:
real = [
    lambda i, j, s1, s2: 1.0,
    lambda i, j, s1, s2: 1.0 if s1[i] == s2[j] else 0.0,
    lambda i, j, s1, s2: 1.0 if s1[i] == s2[j] and len(s1[i]) >= 6 else 0.0,
    lambda i, j, s1, s2: 1.0 if s1[i].isdigit() and s2[j].isdigit() and s1[i] == s2[j] else 0.0,
    lambda i, j, s1, s2: 1.0 if s1[i].isalpha() and s2[j].isalpha() and s1[i] == s2[j] else 0.0,
    lambda i, j, s1, s2: 1.0 if not s1[i].isalpha() and not s2[j].isalpha() else 0.0
]
# Other ideas are:
# to look up whether words are dictionary words,
# longest common subsequence,
# standard edit distance
feature_extractor = PairFeatureExtractor(real=real)
X_extracted = feature_extractor.fit_transform(tokX)
I get the following exception:
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-10-04ddbdf4798d> in <module>()
1 feature_extractor = PairFeatureExtractor(real=real)
----> 2 X_extracted = feature_extractor.fit_transform(tokX)
/home/delip/anaconda2/envs/tensorflow/lib/python2.7/site-packages/pyhacrf/feature_extraction.pyc in fit_transform(self, raw_X, y)
108 Feature matrix list, for use with estimators or further transformers.
109 """
--> 110 return self.transform(raw_X)
111
112 def transform(self, raw_X, y=None):
/home/delip/anaconda2/envs/tensorflow/lib/python2.7/site-packages/pyhacrf/feature_extraction.pyc in transform(self, raw_X, y)
124 Feature matrix list, for use with estimators or further transformers.
125 """
--> 126 return [self._extract_features(sequence1, sequence2) for sequence1, sequence2 in raw_X]
127
128 def _extract_features(self, sequence1, sequence2):
/home/delip/anaconda2/envs/tensorflow/lib/python2.7/site-packages/pyhacrf/feature_extraction.pyc in _extract_features(self, sequence1, sequence2)
138
139 for k, feature_function in enumerate(self._binary_features):
--> 140 feature_array[..., k] = feature_function(array1, array2)
141
142 if self._sparse_features:
TypeError: <lambda>() takes exactly 4 arguments (2 given)
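Reading the traceback, line 140 of the installed feature_extraction.py calls each feature function as feature_function(array1, array2), that is, with two arguments, while the notebook's lambdas take four (i, j, s1, s2). So the notebook and the installed pyhacrf may simply disagree about the feature-function signature. Below is a minimal NumPy-only sketch of the same six features rewritten as two-argument functions. It assumes (not confirmed against the library source) that array1 is a column of tokens and array2 a row of tokens, so that the two broadcast into a len(s1) x len(s2) grid. The pairwise helper and the sample tokens are made up for illustration.

```python
import numpy as np


def pairwise(token_feature):
    """Lift a scalar feature f(token1, token2) so it broadcasts over token arrays."""
    return np.vectorize(token_feature, otypes=[float])


# The six notebook features, rewritten to take two tokens instead of (i, j, s1, s2).
real = [
    pairwise(lambda t1, t2: 1.0),
    pairwise(lambda t1, t2: 1.0 if t1 == t2 else 0.0),
    pairwise(lambda t1, t2: 1.0 if t1 == t2 and len(t1) >= 6 else 0.0),
    pairwise(lambda t1, t2: 1.0 if t1.isdigit() and t2.isdigit() and t1 == t2 else 0.0),
    pairwise(lambda t1, t2: 1.0 if t1.isalpha() and t2.isalpha() and t1 == t2 else 0.0),
    pairwise(lambda t1, t2: 1.0 if not t1.isalpha() and not t2.isalpha() else 0.0),
]

# Hypothetical tokenized pair, shaped the way the traceback suggests (assumption):
# a column array for the first sequence and a row array for the second.
tokens1 = ["harvard", "university", "1636"]
tokens2 = ["harvard", "univ", "1636", "!"]
array1 = np.array(tokens1, ndmin=2).T  # shape (3, 1)
array2 = np.array(tokens2, ndmin=2)    # shape (1, 4)

# Mimic the loop shown in the traceback: one grid of values per feature.
features = np.zeros((array1.size, array2.size, len(real)))
for k, feature_function in enumerate(real):
    features[..., k] = feature_function(array1, array2)

print(features.shape)
```

If that shape assumption holds, passing this real list to PairFeatureExtractor(real=real) might get past the TypeError, but I have not verified that against the library.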
Any suggestions on what I should be doing differently? I'm running the Python notebook code as is.