Skip to content

Small refactor of is_valid_function_name#440

Open
j-t-1 wants to merge 4 commits intoerocarrera:masterfrom
j-t-1:is_valid_function_name
Open

Small refactor of is_valid_function_name#440
j-t-1 wants to merge 4 commits intoerocarrera:masterfrom
j-t-1:is_valid_function_name

Conversation

@j-t-1
Copy link
Copy Markdown
Collaborator

@j-t-1 j-t-1 commented Nov 24, 2024

Slightly increases readability.

Slightly increases readability.
Slightly increases readability.
@j-t-1
Copy link
Copy Markdown
Collaborator Author

j-t-1 commented Dec 16, 2024

string.punctuation (string of ASCII characters which are considered punctuation characters in the C locale) adds ";" and "=". If these should not be there I can change to something like (string.punctuation - ";=").encode(), and add a reference.

Comment thread pefile.py
Comment on lines -2319 to -2321
allowed_extra = b"._?@$()<>"
allowed_extra = b"$().<>?@_"
if relax_allowed_characters:
allowed_extra = b"!\"#$%&'()*+,-./:<>?[\\]^_`{|}~@"
Copy link
Copy Markdown
Collaborator

@nightlark nightlark Nov 20, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd be curious to know where these lists of characters originally came from. I'm not seeing many good references for the allowed set of characters in mangled function names, other than looking at the name mangling functions implemented in compilers and figuring out what set of characters can appear in them based on their code.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the first set are mangled / decorated characters. The dollar sign, question mark, @ sign and underscore are used by the Microsoft linker.

The less-than sign and greater-than sign were empirically added in issue #61.

I do not know where the parentheses and period are from, are they decoration from other compilers?

IDA has support for demangling names; @skochinsky would you be able to help us?
Doesb"._?@$()<>" correspond to mangled characters in PE files?

We use this in functions parse_export_directory and parse_imports.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@j-t-1
From ida.cfg:

// the following characters are allowed in mangled names.
// they will be substituted with underscore during output if names
// are output in a mangled form.

MangleChars = "$:?([.)]"        // watcom
              "@$%?"            // microsoft
              "@$%&";           // borland

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P.S. the PE file format does not impose any limitation on the characters used, and GetProcAddress should in theory work with any null-terminated byte string. So, I would suggest to not perform any explicit validation, just maybe invalid character replacement on output.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@skochinsky thank you.

I have updated this PR to include more characters, referencing ida.cfg (hope okay).

Changing the code to not perform explicit validation is a useful suggestion.

pefile/pefile.py

Lines 2316 to 2318 in 4b3b1e2

def is_valid_function_name(
s: Union[str, bytes, bytearray], relax_allowed_characters: bool = False
) -> bool:

This function is used by parse_export_directory (uses the extra characters) and parse_imports (does not use the extra characters):

pefile/pefile.py

Lines 5515 to 5517 in 4b3b1e2

if not is_valid_function_name(symbol_name, relax_allowed_characters=True):
export_parsing_loop_completed_normally = False
break

pefile/pefile.py

Lines 6064 to 6065 in 4b3b1e2

if not is_valid_function_name(imp_name):
imp_name = b"*invalid*"

pefile is using this for imports that are difficult to parse:

pefile/pefile.py

Lines 6091 to 6103 in 4b3b1e2

# The file with hashes:
#
# MD5: bfe97192e8107d52dd7b4010d12b2924
# SHA256: 3d22f8b001423cb460811ab4f4789f277b35838d45c62ec0454c877e7c82c7f5
#
# has an invalid table built in a way that it's parseable but contains
# invalid entries that lead pefile to take extremely long amounts of time to
# parse. It also leads to extreme memory consumption.
# To prevent similar cases, if invalid entries are found in the middle of a
# table the parsing will be aborted
#
if imp_ord is None and imp_name is None:
raise PEFormatError("Invalid entries, aborting parsing.")

So pefile could be modified to handle this kind of case differently.

I will keep this PR as is, but something to consider @erocarrera and @nightlark.

Also add docstring.
@nightlark nightlark mentioned this pull request Jan 2, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants