Small refactor of is_valid_function_name#440
j-t-1 wants to merge 4 commits into erocarrera:master
Slightly increases readability.
string.punctuation (the string of ASCII characters that are considered punctuation in the C locale) also adds ";" and "=". If these should not be there, I can change it to something like:
allowed_extra = b"._?@$()<>"
allowed_extra = b"$().<>?@_"
if relax_allowed_characters:
    allowed_extra = b"!\"#$%&'()*+,-./:<>?[\\]^_`{|}~@"
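For reference, the difference between string.punctuation and the hand-written relaxed set can be checked directly. This is only a quick sketch; the variable names are illustrative and not taken from pefile.

```python
import string

# Relaxed character set as written in the diff above.
relaxed = b"!\"#$%&'()*+,-./:<>?[\\]^_`{|}~@"

# string.punctuation is a str; encode it so both sides are bytes.
punctuation = string.punctuation.encode("ascii")

# Characters that string.punctuation has but the relaxed set lacks.
missing = bytes(sorted(set(punctuation) - set(relaxed)))
# Characters in the relaxed set that are not punctuation at all.
unexpected = bytes(sorted(set(relaxed) - set(punctuation)))

print(missing)     # b';='
print(unexpected)  # b''
```

So switching to string.punctuation would change behaviour only for ";" and "=".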
I'd be curious to know where these lists of characters originally came from. I'm not seeing many good references for the allowed set of characters in mangled function names, other than looking at the name mangling functions implemented in compilers and figuring out what set of characters can appear in them based on their code.
I think the first set consists of characters used in mangled / decorated names. The dollar sign, question mark, @ sign and underscore are used by the Microsoft linker.
The less-than and greater-than signs were added empirically in issue #61.
I do not know where the parentheses and period come from; are they decoration from other compilers?
IDA has support for demangling names; @skochinsky would you be able to help us?
Does b"._?@$()<>" correspond to mangled characters in PE files?
We use this in functions parse_export_directory and parse_imports.
@j-t-1
From ida.cfg:
// the following characters are allowed in mangled names.
// they will be substituted with underscore during output if names
// are output in a mangled form.
MangleChars = "$:?([.)]" // watcom
"@$%?" // microsoft
"@$%&"; // borland
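To see how the ida.cfg list lines up with the strict set currently in the PR, here is a rough comparison. It assumes the three literals quoted above are the complete MangleChars value; the dictionary and variable names are made up for illustration.

```python
# Characters quoted from ida.cfg above, one literal per compiler family.
IDA_MANGLE_CHARS = {
    "watcom": b"$:?([.)]",
    "microsoft": b"@$%?",
    "borland": b"@$%&",
}

# Strict extra characters discussed in this PR.
PEFILE_STRICT_EXTRA = b"._?@$()<>"

# Union of all byte values listed by IDA (iterating bytes yields ints).
ida_union = set().union(*IDA_MANGLE_CHARS.values())

only_in_ida = bytes(sorted(ida_union - set(PEFILE_STRICT_EXTRA)))
only_in_pefile = bytes(sorted(set(PEFILE_STRICT_EXTRA) - ida_union))

print(only_in_ida)     # b'%&:[]'
print(only_in_pefile)  # b'<>_'
```

The underscore is an ordinary identifier character, so it is not surprising that IDA does not list it as a mangling character.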
P.S. The PE file format does not impose any limitation on the characters used, and GetProcAddress should in theory work with any null-terminated byte string. So I would suggest not performing any explicit validation, and at most replacing invalid characters on output.
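A minimal sketch of what "replace on output" could look like, assuming names are handled as bytes. The helper name sanitize_for_output and the choice of safe characters are hypothetical; nothing like this exists in pefile as far as this thread shows.

```python
import string

# Hypothetical "safe for display" set: alphanumerics plus the strict extras.
SAFE_BYTES = set((string.ascii_letters + string.digits).encode("ascii")) | set(
    b"$().<>?@_"
)


def sanitize_for_output(name: bytes, replacement: bytes = b"_") -> bytes:
    """Return name with every byte outside SAFE_BYTES replaced.

    The name itself is never rejected; only its printed form changes.
    """
    return b"".join(bytes([c]) if c in SAFE_BYTES else replacement for c in name)


print(sanitize_for_output(b"?foo@@YAXH@Z"))  # b'?foo@@YAXH@Z'
print(sanitize_for_output(b"bad\x01name;"))  # b'bad_name_'
```

This mirrors the ida.cfg comment about substituting an underscore during output, while leaving the parsed name untouched.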
@skochinsky thank you.
I have updated this PR to include more characters, referencing ida.cfg (I hope that is okay).
Changing the code to not perform explicit validation is a useful suggestion.
Lines 2316 to 2318 in 4b3b1e2
This function is used by parse_export_directory (uses the extra characters) and parse_imports (does not use the extra characters):
Lines 5515 to 5517 in 4b3b1e2
Lines 6064 to 6065 in 4b3b1e2
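For readers without the embedded snippets, the two calling modes can be illustrated with a stand-in validator. This is a sketch under assumptions, not the pefile source: only the relax_allowed_characters parameter and the two character sets come from the diff above; the base alphanumeric set and the function name are assumed.

```python
import string

# Assumed base set: ASCII letters and digits.
ALLOWED_BASE = (string.ascii_letters + string.digits).encode("ascii")
STRICT_EXTRA = b"$().<>?@_"
RELAXED_EXTRA = b"!\"#$%&'()*+,-./:<>?[\\]^_`{|}~@"


def is_valid_function_name_sketch(name, relax_allowed_characters=False):
    """Stand-in for the validator discussed here (not pefile's code)."""
    if not isinstance(name, (bytes, bytearray)):
        return False
    extra = RELAXED_EXTRA if relax_allowed_characters else STRICT_EXTRA
    allowed = set(ALLOWED_BASE + extra)
    # Iterating over bytes yields ints, which matches the ints in `allowed`.
    return all(c in allowed for c in name)


# Strict mode (no extra characters requested).
print(is_valid_function_name_sketch(b"a[b]"))  # False
# Relaxed mode (extra characters allowed).
print(is_valid_function_name_sketch(b"a[b]", relax_allowed_characters=True))  # True
```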
pefile is using this for imports that are difficult to parse:
Lines 6091 to 6103 in 4b3b1e2
So pefile could be modified to handle this kind of case differently.
I will keep this PR as is, but this is something to consider, @erocarrera and @nightlark.
Also add docstring.
Slightly increases readability.