Skip to content
Open
Changes from 2 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
8 changes: 4 additions & 4 deletions pefile.py
Original file line number Diff line number Diff line change
Expand Up @@ -2306,7 +2306,7 @@ def is_valid_dos_filename(s):
# charset we will assume the name is invalid.
# The dot "." character comes from: https://github.com/erocarrera/pefile/pull/346
# All other symbols can be inserted by adding a name with that symbol to a .def file,
# and passing it to link.exe (See export_test.py)
# and passing it to link.exe, see export_test.py.
allowed_function_name = (
string.ascii_lowercase + string.ascii_uppercase + string.digits
).encode()
Expand All @@ -2316,13 +2316,13 @@ def is_valid_dos_filename(s):
def is_valid_function_name(
s: Union[str, bytes, bytearray], relax_allowed_characters: bool = False
) -> bool:
allowed_extra = b"._?@$()<>"
allowed_extra = b"$().<>?@_"
if relax_allowed_characters:
allowed_extra = b"!\"#$%&'()*+,-./:<>?[\\]^_`{|}~@"
Comment on lines -2319 to -2321
Copy link
Copy Markdown
Collaborator

@nightlark nightlark Nov 20, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd be curious to know where these lists of characters originally came from. I'm not seeing many good references for the allowed set of characters in mangled function names, other than looking at the name mangling functions implemented in compilers and figuring out what set of characters can appear in them based on their code.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the first set are mangled / decorated characters. The dollar sign, question mark, @ sign and underscore are used by the Microsoft linker.

The less-than sign and greater-than sign were empirically added in issue #61.

I do not know where the parentheses and period are from, are they decoration from other compilers?

IDA has support for demangling names; @skochinsky would you be able to help us?
Doesb"._?@$()<>" correspond to mangled characters in PE files?

We use this in functions parse_export_directory and parse_imports.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@j-t-1
From ida.cfg:

// the following characters are allowed in mangled names.
// they will be substituted with underscore during output if names
// are output in a mangled form.

MangleChars = "$:?([.)]"        // watcom
              "@$%?"            // microsoft
              "@$%&";           // borland

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P.S. the PE file format does not impose any limitation on the characters used, and GetProcAddress should in theory work with any null-terminated byte string. So, I would suggest to not perform any explicit validation, just maybe invalid character replacement on output.

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@skochinsky thank you.

I have updated this PR to include more characters, referencing ida.cfg (hope okay).

Changing the code to not perform explicit validation is a useful suggestion.

pefile/pefile.py

Lines 2316 to 2318 in 4b3b1e2

def is_valid_function_name(
s: Union[str, bytes, bytearray], relax_allowed_characters: bool = False
) -> bool:

This function is used by parse_export_directory (uses the extra characters) and parse_imports (does not use the extra characters):

pefile/pefile.py

Lines 5515 to 5517 in 4b3b1e2

if not is_valid_function_name(symbol_name, relax_allowed_characters=True):
export_parsing_loop_completed_normally = False
break

pefile/pefile.py

Lines 6064 to 6065 in 4b3b1e2

if not is_valid_function_name(imp_name):
imp_name = b"*invalid*"

pefile is using this for imports that are difficult to parse:

pefile/pefile.py

Lines 6091 to 6103 in 4b3b1e2

# The file with hashes:
#
# MD5: bfe97192e8107d52dd7b4010d12b2924
# SHA256: 3d22f8b001423cb460811ab4f4789f277b35838d45c62ec0454c877e7c82c7f5
#
# has an invalid table built in a way that it's parseable but contains
# invalid entries that lead pefile to take extremely long amounts of time to
# parse. It also leads to extreme memory consumption.
# To prevent similar cases, if invalid entries are found in the middle of a
# table the parsing will be aborted
#
if imp_ord is None and imp_name is None:
raise PEFormatError("Invalid entries, aborting parsing.")

So pefile could be modified to handle this kind of case differently.

I will keep this PR as is, but something to consider @erocarrera and @nightlark.

allowed_extra = string.punctuation.encode()
return (
s is not None
and isinstance(s, (str, bytes, bytearray))
and all((c in allowed_function_name or c in allowed_extra) for c in set(s))
and all(c in (allowed_function_name + allowed_extra) for c in set(s))
)


Expand Down