Implement naming package, new IdentifierIntroduction.qll, unicode funcs. #950
Note that the unicode data came from advanced-security/codeql-qtil#13. I should definitely finish unicode support in qtil, publish, and then use that here. Likely, that should be done before merge, but it is not strictly necessary.

Relevant qtil pull request: advanced-security/codeql-qtil#13
Pull request overview
This PR implements a comprehensive naming validation package for MISRA C++ RULE-5-10-1, which enforces proper identifier formation in C++ code. The implementation introduces a sophisticated identifier tracking system that validates identifiers against multiple constraints including Unicode normalization, reserved names, namespace restrictions, and macro naming conventions.
Key changes:
- Introduces the IdentifierIntroduction abstraction that systematically captures all identifier declarations across various C++ constructs (variables, functions, types, macros, namespaces, templates, etc.)
- Implements Unicode support with UAX#44 compliance checking and NFC normalization validation using extensible predicates with external YAML data
- Adds MISRA C++ RULE-5-10-1 query to detect poorly formed identifiers including underscore violations, lowercase in macros, reserved names, and reserved namespace usage
Reviewed changes
Copilot reviewed 17 out of 18 changed files in this pull request and generated 11 comments.
| File | Description |
|---|---|
| cpp/common/src/codingstandards/cpp/Identifiers.qll | Introduces comprehensive IdentifierIntroduction class hierarchy that systematically tracks all identifier declarations across various C++ constructs |
| cpp/common/src/codingstandards/cpp/Unicode.qll | Implements Unicode property checking (NFC_QC, XID_Start, XID_Continue) and unicode escape sequence handling for identifier validation |
| cpp/common/src/codingstandards/cpp/Macro.qll | Fixes variadic macro parameter extraction to properly exclude ellipsis and empty parameter names |
| cpp/misra/src/rules/RULE-5-10-1/PoorlyFormedIdentifier.ql | Implements the main query that validates identifiers against MISRA C++ RULE-5-10-1 constraints |
| cpp/common/src/codingstandards/cpp/exclusions/cpp/Naming2.qll | Autogenerated metadata for Naming2 package query registration |
| cpp/common/src/codingstandards/cpp/exclusions/cpp/RuleMetadata.qll | Registers Naming2 package in the rule metadata system |
| rule_packages/cpp/Naming2.json | Defines query metadata for RULE-5-10-1 including severity, precision, and tags |
| cpp/misra/test/rules/RULE-5-10-1/test.cpp | Comprehensive test file with 189 lines covering Unicode, normalization, underscores, macros, namespaces, and reserved names |
| cpp/misra/test/rules/RULE-5-10-1/PoorlyFormedIdentifier.expected | Expected query results showing 48 violations across various identifier validation rules |
| cpp/misra/test/rules/RULE-5-10-1/PoorlyFormedIdentifier.qlref | Query reference file for test execution |
| cpp/common/test/library/codingstandards/cpp/identifiers/* | Library test suite with 666 lines testing identifier extraction across all C++ constructs |
| cpp/common/test/includes/standard-library/utility.h | Adds pair and tuple support for structured binding tests |
| cpp/common/src/qlpack.yml | Registers unicode.yml data extension |
| change_notes/2025-08-22-function-like-macro-param-name-bug-fixes.md | Documents bug fixes in function-like macro parameter handling |
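
The Macro.qll row above mentions excluding the ellipsis and empty parameter names when extracting function-like macro parameters. The actual fix lives in the QL library and is not shown on this page; the following is only a string-level C++ sketch of the same idea, and the helper name `macroParameterNames` is hypothetical.

```cpp
#include <string>
#include <vector>

// Hypothetical helper (not part of the PR): extracts the named parameters
// from the head of a function-like macro, e.g. "F(a, b, ...)" -> {"a", "b"}.
std::vector<std::string> macroParameterNames(const std::string &head) {
  std::vector<std::string> names;
  const auto open = head.find('(');
  if (open == std::string::npos) return names;
  const auto close = head.find(')', open);
  if (close == std::string::npos) return names;

  std::string current;
  auto flush = [&]() {
    // Trim surrounding spaces and tabs from the collected text.
    const auto first = current.find_first_not_of(" \t");
    const auto last = current.find_last_not_of(" \t");
    const std::string name =
        first == std::string::npos ? "" : current.substr(first, last - first + 1);
    // Skip empty names (as in "F()") and the variadic ellipsis.
    if (!name.empty() && name != "...") names.push_back(name);
    current.clear();
  };

  for (auto i = open + 1; i < close; ++i) {
    if (head[i] == ',') {
      flush();
    } else {
      current += head[i];
    }
  }
  flush();
  return names;
}
```

With this sketch, `F()` and `F(...)` both yield no parameter names, so neither an empty string nor `...` would be reported as an identifier.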
}

/**
 * An identifier introduced as a template function name or as a parameter of a function-like macro.
Copilot
AI
Dec 11, 2025
The class documentation incorrectly describes this as "An identifier introduced as a template function name or as a parameter of a function-like macro." However, the implementation shows this class handles Macro identifiers (macro names and their parameters), not template functions. The documentation should be corrected to accurately describe that this class handles identifiers introduced by macros (both the macro name itself and any parameters of function-like macros).
Suggested change:
-  * An identifier introduced as a template function name or as a parameter of a function-like macro.
+  * An identifier introduced by a macro, including both the macro name itself and any parameters of function-like macros.
exists(Function func | func = intro.getElement().(FunctionDeclarationEntry).getFunction() |
  isUserDefinedLiteralSuffixNonCompliant(func) and
  message = "User-defined literal suffix '" + ident + "' is malformed."
)
Copilot
AI
Dec 11, 2025
This condition appears unreachable. The query checks if the element is a FunctionDeclarationEntry with a Function that has a malformed user-defined literal suffix, and then tries to use 'ident' in the message. However, for user-defined literal suffixes, the identifier extracted on line 53 via 'intro.unescapeUnicode()' will be the suffix without the 'operator ""' prefix (e.g., '_foo'), not the full function name. This means this branch would never match the conditions in 'isUserDefinedLiteralSuffixNonCompliant' which checks for patterns in the full function name like 'operator""%'. This clause should either be removed as unreachable or the logic should be corrected to properly handle this case.
Suggested change:
- exists(Function func | func = intro.getElement().(FunctionDeclarationEntry).getFunction() |
-   isUserDefinedLiteralSuffixNonCompliant(func) and
-   message = "User-defined literal suffix '" + ident + "' is malformed."
- )
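
The distinction this comment relies on, between the full literal operator name (such as `operator""_foo`) and the bare suffix (`_foo`), can be illustrated with a small C++ sketch. The helper `literalSuffix` is hypothetical and only models the string relationship; it makes no claim about how `isUserDefinedLiteralSuffixNonCompliant` is actually implemented.

```cpp
#include <string>

// Hypothetical helper (not part of the PR): returns the suffix of a literal
// operator name such as `operator""_km` or `operator "" _km`, or an empty
// string if the name is not a literal operator.
std::string literalSuffix(const std::string &functionName) {
  const std::string prefix = "operator";
  if (functionName.compare(0, prefix.size(), prefix) != 0) return "";
  // Locate the empty string literal that follows the `operator` keyword.
  auto pos = functionName.find("\"\"", prefix.size());
  if (pos == std::string::npos) return "";
  // Skip optional spaces between the quotes and the suffix.
  pos = functionName.find_first_not_of(' ', pos + 2);
  return pos == std::string::npos ? "" : functionName.substr(pos);
}
```

A check written against the full name will not match the bare suffix, and vice versa, which is why the two forms need to be kept apart when deciding which string a condition should inspect.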
/**
 * Provides properties of a Unicode code point, where the property is of 'enumeration', 'catalog',
 * or 'string-valued' type, however, the only supported property is `NFC_QC`.
 *
 * For example, `Block` is an enumeration property, `Line_Break` is a catalog property, and
 * `Uppercase_Mapping` is a string-valued property.
 *
 * For boolean properties, see `unicodeHasBooleanProperty`, and for numeric properties, see
 * `unicodeHasNumericProperty`.
 */
extensible predicate unicodeHasProperty(int codePoint, string propertyName, string propertyValue);
Copilot
AI
Dec 11, 2025
The documentation states that this predicate provides properties of type 'enumeration', 'catalog', or 'string-valued', but then says "however, the only supported property is NFC_QC". This is confusing because it first suggests broad support and then limits it. Consider rephrasing to be more direct, such as: "Provides the NFC_QC property value for a Unicode code point. This is the only Unicode property currently supported."
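
For readers unfamiliar with `NFC_QC`: it is the Unicode "NFC quick check" property, which takes the values Yes, No, or Maybe per code point. Below is a minimal C++ sketch of such a lookup. The table is a tiny illustrative subset, the function name `nfcQuickCheck` is hypothetical, and the single-letter value encoding ("Y", "N", "M") is an assumption that may differ from what the PR's YAML data uses.

```cpp
#include <map>
#include <string>

// Hypothetical lookup (not part of the PR). Illustrative subset only: real
// data would come from the Unicode Character Database. Code points that are
// not listed default to "Y" (Yes).
std::string nfcQuickCheck(int codePoint) {
  static const std::map<int, std::string> table = {
      {0x0301, "M"}, // COMBINING ACUTE ACCENT: may compose with a preceding character
      {0x0340, "N"}, // COMBINING GRAVE TONE MARK: does not occur in NFC text
  };
  const auto it = table.find(codePoint);
  return it == table.end() ? "Y" : it->second;
}
```

An identifier containing any code point whose value is not "Y" may not be in NFC and would need closer inspection. The reverse does not hold in general, since a full quick check also considers the ordering of combining marks.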
 * This has to be treated specially. The member predicate `getName()` on a `FriendDecl` returns the
 * string "foo's friend", which is not an identifier in the program.
 *
 * The elements returned by the `getFriend()` member predicate often do not have a correspending
Copilot
AI
Dec 11, 2025
Typo: 'correspending' should be 'corresponding'.
/**
 * @id cpp/misra/poorly-formed-identifier
 * @name RULE-5-10-1: User-defined identifiers shall have an appropriate form
 * @description Identifiers shall not conflict with keywords, reserved name, or otherwise be poorly
Copilot
AI
Dec 11, 2025
Grammar issue: 'reserved name' should be plural 'reserved names' to match the pattern of listing multiple items.
Suggested change:
-  * @description Identifiers shall not conflict with keywords, reserved name, or otherwise be poorly
+  * @description Identifiers shall not conflict with keywords, reserved names, or otherwise be poorly
}

bindingset[s]
predicate hasDoubleUnderscore(string s) { s.matches("%\\_\\_%") }
Copilot
AI
Dec 11, 2025
The pattern used in this predicate relies on backslash escaping that may be unclear. The `matches` pattern "%\\_\\_%" matches two consecutive underscores anywhere in the string. Consider using a more readable pattern like ".*__.*" or adding a comment explaining that this matches identifiers containing double underscores anywhere in the string.
Suggested change:
- predicate hasDoubleUnderscore(string s) { s.matches("%\\_\\_%") }
+ /** Matches identifiers containing two consecutive underscores anywhere in the string. */
+ predicate hasDoubleUnderscore(string s) { s.regexpMatch(".*__.*") }
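
As a point of comparison, here is the same check sketched in C++ in two forms: a plain substring search and a whole-string regex mirroring the suggested `regexpMatch(".*__.*")`. This is an illustration rather than code from the PR, and the name `hasDoubleUnderscoreRegex` is hypothetical. The escaping in the original matters because in a `matches` pattern an unescaped `_` stands for any single character.

```cpp
#include <regex>
#include <string>

// Substring search: the most direct way to express the check.
bool hasDoubleUnderscore(const std::string &s) {
  return s.find("__") != std::string::npos;
}

// Regex form: std::regex_match must match the entire string, hence the
// leading and trailing ".*".
bool hasDoubleUnderscoreRegex(const std::string &s) {
  static const std::regex pattern(".*__.*");
  return std::regex_match(s, pattern);
}
```

Both forms agree on every input; the substring version avoids any question about which characters are wildcards.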
or
intro.isFromMacro() and
not ident.regexpMatch("^[a-zA-Z0-9_]+$") and
message = "Identifier '" + ident + "' contains invalid characters. "
Copilot
AI
Dec 11, 2025
The error message ends with an unnecessary trailing space: "contains invalid characters. " (note the space before the closing quote). This should be removed for consistency with other error messages in this query.
Suggested change:
- message = "Identifier '" + ident + "' contains invalid characters. "
+ message = "Identifier '" + ident + "' contains invalid characters."
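
The character-set condition in the snippet under review can be sketched in C++ as follows. This is an illustration only, and the function name `hasOnlyBasicIdentifierChars` is hypothetical. `std::regex_match` requires the whole string to match, so the `^` and `$` anchors from the QL pattern are not needed here.

```cpp
#include <regex>
#include <string>

// Hypothetical helper (not part of the PR): true if the identifier consists
// solely of ASCII letters, digits, and underscores, and is not empty.
bool hasOnlyBasicIdentifierChars(const std::string &ident) {
  static const std::regex allowed("[a-zA-Z0-9_]+");
  return std::regex_match(ident, allowed);
}
```

Under this sketch, a name like `MACRO$DOLLAR` from the test file fails the check because of the `$`.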
#define macro_ALL_CAPS 49 // NON_COMPLIANT - starts with lowercase
#define MACRO$DOLLAR 54 // NON_COMPLIANT - contains dollar sign
#define FUNCTION_LIKE_MACRO(x) \
  ((x) + 1) // NON_COMPLIANT - lower case argument name
Copilot
AI
Dec 11, 2025
Inconsistent comment: the comment says "NON_COMPLIANT - lower case argument name" but the actual issue is that the macro parameter 'x' violates the rule requiring macros to use only uppercase characters. The comment should more accurately describe the violation.
Suggested change:
-   ((x) + 1) // NON_COMPLIANT - lower case argument name
+   ((x) + 1) // NON_COMPLIANT - macro parameter 'x' is not uppercase
#define FUNCTION_LIKE_MACRO(x) \
  ((x) + 1) // NON_COMPLIANT - lower case argument name
#define FUNCTION_LIKE_MACRO2(X) \
  ((X) + 1) // NON_COMPLIANT - lower case argument name
Copilot
AI
Dec 11, 2025
Inconsistent comment: the comment says "NON_COMPLIANT - lower case argument name" but line 102 shows the parameter is 'X' which is uppercase. Based on the expected results, this line is actually compliant (no error is expected for it). The comment should be corrected or removed.
Suggested change:
-   ((X) + 1) // NON_COMPLIANT - lower case argument name
+   ((X) + 1) // COMPLIANT
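
The difference between the two macros discussed in these comments comes down to whether a macro name or parameter name contains a lowercase letter. A minimal C++ sketch of that test follows; the helper `containsLowercase` is hypothetical and covers only ASCII behaviour in the default "C" locale, which may be narrower than what the query checks.

```cpp
#include <cctype>
#include <string>

// Hypothetical helper (not part of the PR): true if the name contains at
// least one lowercase letter.
bool containsLowercase(const std::string &name) {
  for (unsigned char c : name) {
    if (std::islower(c)) return true;
  }
  return false;
}
```

Applied to the test cases, the parameter `x` and the name `macro_ALL_CAPS` would be flagged, while `X` and `FUNCTION_LIKE_MACRO2` would not.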
  d instanceof ClassTemplateSpecialization
}

private newtype TIndentifierIntroduction =
Copilot
AI
Dec 11, 2025
Typo in the type name: 'TIndentifierIntroduction' should be 'TIdentifierIntroduction' (there is an extra 'n' after the 'I').
Description
Implement naming package.
Change request type
.ql, .qll, .qls or unit tests)
Rules with added or modified queries
RULE 5-10-1
Release change checklist
A change note (development_handbook.md#change-notes) is required for any pull request which modifies:
If you are only adding new rule queries, a change note is not required.
Author: Is a change note required?
🚨🚨🚨
Reviewer: Confirm that the format of shared queries (not the .qll file, the .ql file that imports it) is valid by running them within VS Code.
Reviewer: Confirm that either a change note is not required or the change note is required and has been added.
Query development review checklist
For PRs that add new queries or modify existing queries, the following checklist should be completed by both the author and reviewer:
Author
As a rule of thumb, predicates specific to the query should take no more than 1 minute, and for simple queries be under 10 seconds. If this is not the case, this should be highlighted and agreed in the code review process.
Reviewer
As a rule of thumb, predicates specific to the query should take no more than 1 minute, and for simple queries be under 10 seconds. If this is not the case, this should be highlighted and agreed in the code review process.