[MDSAL-309] V1 codegen generates enum constants that are not valid Java identifiers Created: 08/Feb/18 Updated: 05/Apr/18 Resolved: 23/Mar/18 |
|
| Status: | Resolved |
| Project: | mdsal |
| Component/s: | Binding codegen |
| Affects Version/s: | Nitrogen SR1 |
| Fix Version/s: | Carbon SR4, Oxygen SR1, Fluorine, Nitrogen SR3 |
| Type: | Bug | Priority: | Medium |
| Reporter: | Marek Gradzki | Assignee: | Robert Varga |
| Resolution: | Done | Votes: | 0 |
| Labels: | None | ||
| Remaining Estimate: | Not Specified | ||
| Time Spent: | Not Specified | ||
| Original Estimate: | Not Specified | ||
| Attachments: |
|
||||||||||||||||||||||||||||
| Issue Links: |
|
||||||||||||||||||||||||||||
| Description |
|
If enum value is not valid Java identifier, e.g.: type enumeration { } V1 code generator produces invalid enum constant identifier:
public class Ipv4MulticastSourceAddress
hence compilation fails. |
| Comments |
| Comment by Marek Gradzki [ 08/Feb/18 ] |
|
The issue is related to ietf-routing-types@2017-02-27 so simplest fix would be to transform characters that are not Java IdentifierChars to UNICODE names (Character.getName) + underscore separators. Solution is backward compatible (in YANG->Java direction, not sure if other direction is relevant), but such mapping might produce collisions (e.g. "*" and "ASTERIX").
Solution 2: Replace all underscores in the enum name to pars of underscores (required to prevent collisions as in S1), then replace non-java IdentifierChars to UNICODE names + underscore separators. Such mapping is not backwards compatible, but it's injective (probably 1-1).
|
| Comment by Robert Varga [ 08/Mar/18 ] |
|
RFC7950 specifies enum statement argument as: The "enum" statement, which is a substatement to the "type" statement, MUST be present if the type is "enumeration". It is repeatedly used to specify each assigned name of an enumeration type. It takes as an argument a string that is the assigned name. The string MUST NOT be zero-length and MUST NOT have any leading or trailing whitespace characters (any Unicode character with the "White_Space" property). The use of Unicode control codes SHOULD be avoided. Which means we need to handle any old Unicode string. Generated name has to be a valid Java identifier, as per https://docs.oracle.com/javase/specs/jls/se9/html/jls-3.html#jls-3.8, which specifies it as: A "Java letter" is a character for which the method Character.isJavaIdentifierStart(int) returns true. A "Java letter-or-digit" is a character for which the method Character.isJavaIdentifierPart(int) returns true. The "Java letters" include uppercase and lowercase ASCII Latin letters A-Z (\u0041-\u005a), and a-z (\u0061-\u007a), and, for historical reasons, the ASCII dollar sign ($, or \u0024) and underscore (_, or \u005f). The dollar sign should be used only in mechanically generated source code or, rarely, to access pre-existing names on legacy systems. The underscore may be used in identifiers formed of two or more characters, but it cannot be used as a one-character identifier due to being a keyword. The "Java digits" include the ASCII digits 0-9 (\u0030-\u0039). Letters and digits may be drawn from the entire Unicode character set, which supports most writing scripts in use in the world today, including the large sets for Chinese, Japanese, and Korean. This allows programmers to use identifiers in their programs that are written in their native languages. YANG/Java mapping in this case needs to be at least an injection, so codegen works in a non-conflicting manner. We can work out the reverse assignment based on value (and sanity-check with the same function). I have (inexactly) examined the currently-public corpus of YANG models from github, which shows 16432 unique strings being used, some of which are fun (single-letter, ?, *). The corpus is attached to this issue. |
| Comment by Robert Varga [ 08/Mar/18 ] |
|
Based on this data and the expectation we will be allowed to break the spec in Fluorine, I would propose actually having the mapping function discontinuous, with separate mapping rules based on the contents of the entire string. Extracted corpus shows that '$' is not used anywhere, which makes it a good escape character from JLS perspective (use in generated source code, can be start of an identifier) and observed corpus (it does not show a need to escape it). Hence the rules would be the following:
This would break the users thoroughly, I am afraid As a mitigation, we can perhaps consider enums within the enumeration to decide how to map members and reflect that in the enumeration class name... Perhaps use current mapping schema when the result is known to work and use a trailing '$' when we know the enums are a mess and we are using the above schema... Any thoughts on this? |
| Comment by Marek Gradzki [ 09/Mar/18 ] |
|
Using current mapping schema when result is Java identifier, otherwise the one you propose looks backward compatible to me.
Such mapping is not invertible unless we know which schema was used. AFAIR Enumerations class is not generated if enumeration is defined using typedef, so perhaps we should consider using getter instead of class name. That would not break current users.
But not sure why we can't use yang names of enums instead. They are already present in the generated code.
|
| Comment by Robert Varga [ 09/Mar/18 ] |
|
As it turns out we do not need to mark the enumeration name at all. The only place that needs to reconcile Binding->Yang names (and vice-versa) is the binding codec, which has both the class and EnumTypeDefinition already available – hence pairing the two is a rather trivial matter. It would be possible to reconstruct this even without the help of schema, as the class itself captures and exposes YANG names, except it is not captured in a common contract, so we'd have to resort to reflection. I have filed |
| Comment by Robert Varga [ 09/Mar/18 ] |
|
Fluorine: https://git.opendaylight.org/gerrit/69317
|
| Comment by Robert Varga [ 10/Mar/18 ] |
|
Since the fix is backwards-compatible, it will be applied to all currently-supported streams. |
| Comment by Robert Varga [ 10/Mar/18 ] |
|
Oxygen: https://git.opendaylight.org/gerrit/69334 Nitrogen: https://git.opendaylight.org/gerrit/69336 Carbon: https://git.opendaylight.org/gerrit/69338
|