|
Today Plastic allows binding on the input side and expansion on the output side for collections (JSON lists)
embedded within a given payload. This feature has limitations.
${abc[*]}
First, it does not handle nested arrays without the schema designer breaking up the translation into
multiple levels using multiple input/output schemas. Second, it sort of allows reuse of the asterisk
for sibling arrays (and just hopes that the array lengths are the same). Conceptually there is only a
single asterisk and it represents only a single length.
Side note: the sibling array issue results in either a short output or an error, depending on
which array Plastic encounters first. Suppose array A has size 10 and B has size 20. If Plastic
encounters A first while expanding an output array that uses both A and B, then that output array
will have 10 members. If it encounters B first, then when it gets to the 11th expansion, it will fail to
find an 11th value for A and will error out.
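The side note above can be modeled with a small sketch. This is not Plastic's code; expand_legacy is a hypothetical stand-in that lets whichever array is seen first set the single shared length, which is the behavior described.

```python
def expand_legacy(first, second):
    """Model of the single-asterisk scheme: the first array seen fixes the length."""
    out = []
    for i in range(len(first)):
        if i >= len(second):
            # the other array ran out of values before the shared length was reached
            raise IndexError(f"no value at index {i} for the second array")
        out.append(f"{first[i]}/{second[i]}")
    return out


a = [f"a{i}" for i in range(10)]
b = [f"b{i}" for i in range(20)]

short = expand_legacy(a, b)  # A first: output silently stops at 10 members

try:
    expand_legacy(b, a)      # B first: fails on the 11th expansion (index 10)
    failure = None
except IndexError as err:
    failure = str(err)
```

Either outcome loses information, which is the motivation for giving each array its own length below.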
To handle an arbitrary number of collection sizes, the asterisk needs to represent an arbitrary number
of values. There are two possible means. First, Plastic can generalize the current scheme by having an
asterisk usage per level of nesting. A schema might use this syntax:
${abc[*][*]}
This is simple to understand and can handle arbitrary nesting by adding more "[*]" to match the nesting
level.
Side note: there is an interesting case of how to flatten. If the input payload has an array
within an array so [*][*] is used and the output has only a single array, then Plastic needs
to map two dimensions into one by flattening.
Using only the nesting level and multiple [*] does not fix the sibling array issue, because two sibling
arrays sharing a parent array still might have different lengths.
A new expansion feature can do better however.
An iterator comes into existence when there are one or more [*] as part of an arrayed variable when
the payload is matched against an input schema. Iterators are not created on the output side; there they
are just used to expand arrays. Iterators will have a name that is a derivative of the variable name to
which they were bound.
Most importantly, iterators will be merged together so that mixed outputs, like the sibling
arrays alluded to above, are handled intuitively.
Let's look at examples.
1 - Introducing Iterators
-------------------------
Consider this in the INPUT schema...
${abc[*]}
Consider this on the OUTPUT schema...
${abc[*]}
The new feature would create an iterator with the name 'abc' and the length L (from the payload) when
the payload is bound to the input schema. On the output side, the abc variable will automatically be
using the abc iterator from the input side.
So right now our working definition of an iterator is a name and a range 0..(L-1). Later on when we
cover nested arrays, we'll expand the definition of an iterator to be more complex.
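As a minimal sketch of this working definition (the class name SimpleIterator and its fields are made up for illustration and are not Plastic's types):

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SimpleIterator:
    name: str    # derived from the bound variable name, e.g. "abc"
    length: int  # L, taken from the payload when it is bound to the input schema

    def indices(self):
        # the range 0..(L-1)
        return range(self.length)


abc = SimpleIterator("abc", 3)
abc_indices = list(abc.indices())
```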
2 - Independent Sibling Arrays
------------------------------
Payload...
{
"abc": [ 'aa', 'bb', ... 'zz' ],
"colors": [ "red", "green", "blue" ]
}
Input schema...
{
"abc": [ "${abc[*]}" ],
"colors": [ "${colors[*]}" ]
}
Internal bindings...
abc[0] = aa
abc[1] = bb
...
abc[25] = zz
_[abc] = (26)
colors[0] = red
colors[1] = green
colors[2] = blue
_[colors] = (3)
Output schema...
{
"foo": [ "${abc[*]}" ],
"bar": [ "${colors[*]}" ]
}
Output...
{
"foo": [ 'aa', 'bb', ... 'zz' ],
"bar": [ "red", "green", "blue" ]
}
This example introduces the internal naming for the iterator, _[name], and the notation (n,m,...) as a
way to represent the ranges of the indices. A range is a list of ints, one per dimension. So (26) above
is a list of one number.
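One way to picture the internal bindings is a flat map keyed by the names shown above. The sketch below assumes a plain dict and a hypothetical helper bind_array; the real storage in Plastic may look different.

```python
def bind_array(name, values, bindings):
    """Record each member as name[i] and the iterator range as _[name]."""
    for i, value in enumerate(values):
        bindings[f"{name}[{i}]"] = value
    # one int per dimension; a one-dimensional array gets a tuple of one number
    bindings[f"_[{name}]"] = (len(values),)
    return bindings


bindings = {}
bind_array("colors", ["red", "green", "blue"], bindings)
```

After this call the dict holds colors[0] through colors[2] plus the _[colors] entry with value (3,).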
Output Steps
- Encounter owning collection and push on parent stack with empty iterator
- Encounter array variable
- Verify iterator exists and dimensionality matches else ERROR
- Impose iterator on iterator stack (effectively replaces empty iterator)
- Start iterate, clone, recurse
- Pop parent stack including iterator
4 - Mixing Input Arrays into a Single Output Array
--------------------------------------------------
Payload...
{
"aaa": [ 'aa', 'bb', 'cc' ],
"colors": [ "red", "green", "blue", "purple" ]
}
Input schema...
{
"aaa": [ "${aaa[*]}" ],
"colors": [ "${colors[*]}" ]
}
Internal bindings...
aaa[0] = aa
aaa[1] = bb
aaa[2] = cc
_[aaa] = (3)
colors[0] = red
colors[1] = green
colors[2] = blue
colors[3] = purple
_[colors] = (4)
Output schema...
{
"foobar": [ "${aaa[*]=xx}/${colors[*]=gray}" ]
}
Output...
{
"foobar": [ "aa/red", "bb/green", "cc/blue", "xx/purple" ]
}
The "foobar" array in the output schema is constructed from two arrayed variables "aaa" and
"colors". If the length for aaa is not the same as the length for colors, then this would
result in an error, unless there are defaults present.
Output Steps
- Encounter owning collection and push on parent stack with empty iterator
- Encounter array variables
- Verify iterators exist and dimensionality matches else ERROR
- Impose iterator aaa on iterator stack (effectively replacing empty iterator)
- Impose iterator colors
  - existing iterator, so need to merge
  - for each dimension, take the larger range
  - if a dimension is missing from one iterator, the range from the other iterator wins
  - merged iterator now has two names, aaa and colors
  - no worry yet about imposing up the parent stack because dimension = 1
- Start iterate, clone, recurse
- Pop iterator stack when popping parent stack
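The merge step above can be sketched as follows. The function names merge_ranges and expand_mixed are invented for this illustration, and treating a missing dimension as length 0 is an assumption about what "it wins" means.

```python
from functools import reduce


def merge_ranges(a, b):
    """Per dimension take the larger range; a dimension present on one side only is kept."""
    n = max(len(a), len(b))
    a = tuple(a) + (0,) * (n - len(a))
    b = tuple(b) + (0,) * (n - len(b))
    return tuple(max(x, y) for x, y in zip(a, b))


def expand_mixed(columns, defaults, sep="/"):
    """columns: name -> bound values; defaults: name -> value used to fill a short array."""
    merged = reduce(merge_ranges, [(len(values),) for values in columns.values()])
    out = []
    for i in range(merged[0]):
        parts = []
        for name, values in columns.items():
            if i < len(values):
                parts.append(values[i])
            elif name in defaults:
                parts.append(defaults[name])
            else:
                raise LookupError(f"no value or default for {name}[{i}]")
        out.append(sep.join(parts))
    return out


columns = {"aaa": ["aa", "bb", "cc"], "colors": ["red", "green", "blue", "purple"]}
foobar = expand_mixed(columns, {"aaa": "xx", "colors": "gray"})
```

With the defaults from the output schema this yields the four members shown in the output above; without a default for aaa the fourth expansion raises.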
5 - Multivariable Example
-------------------------
Payload...
{
"deviceName": "CXP-2501",
"interfaceName": "TCP/1/0/24",
"addresses": [
{
"address": "10.10.100.100",
"prefix-length": "24"
}
,
{
"address": "10.10.100.221",
"prefix-length": "24"
}
,
{
"address": "10.10.100.168",
"prefix-length": "24"
}
]
}
Input schema...
{
"deviceName": "${dName}",
"interfaceName": "${iName}",
"addresses": [
{
"address": "${addr[*]}",
"prefix-length": "${pref-len[*]}"
}
]
}
addr[0] = 10.10.100.100
addr[1] = 10.10.100.221
addr[2] = 10.10.100.168
_[addr] = (3)
pref-len[0] = 24
pref-len[1] = 24
pref-len[2] = 24
_[pref-len] = (3)
Output schema...
{
"dev-name": "${dName}",
"interface-name": "${iName}",
"objects": [
{
"subnet": "${addr[*]}/${pref-len[*]}"
}
]
}
There are two iterators named 'addr' and 'pref-len' that are associated with the parent array.
It turns out that they have the same range.
Output Steps
- very similar to the previous example
6 - Multi-dimensional Example (two deep)
----------------------------------------
Payload...
{
"aaa": [
{
"id": "1",
"bbb": [ 'a', 'b', 'c' ]
}
,
{
"id": "2",
"bbb": [ 'd', 'e' ]
}
,
{
"id": "3",
"bbb": [ 'f' ]
}
,
{
"id": "4",
"bbb": [ ]
}
]
}
Input schema...
{
"aaa": [ <------------ parent/owning array
{
"id": "${id[*]}",
"bbb": [ <-------------- child/nested array
"${bbb[^][*]}"
]
}
]
}
Rule: if "id" or "bbb" is missing, then it's an error (that might be resolved through early classifier normalization)
id[0] = 1
id[1] = 2
id[2] = 3
id[3] = 4
_[id] = (4)
bbb[0][0] = a
bbb[0][1] = b
bbb[0][2] = c
bbb[1][0] = d
bbb[1][1] = e
bbb[2][0] = f
_[bbb] = (4,3)
Note: this example shows that you cannot even count on rectangular matrices (i.e., bbb is not 4x3=12)!
Because of this, we may have shortened ranges that must be dealt with. Worse, they can actually
be real holes and can occur at any dimension. So even though there are only 6 entries
for bbb, we have to iterate over all 12 possibilities to be sure to cover everything.
Note that had bbb been given default values, they would have "filled in the holes" so that it is a
squared-off 4x3.
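To make the ragged shape concrete, here is a rough sketch of binding the nested array and counting the holes. The helper bind_nested and the tuple-keyed dict are assumptions for illustration, not Plastic's data structures.

```python
payload = {
    "aaa": [
        {"id": "1", "bbb": ["a", "b", "c"]},
        {"id": "2", "bbb": ["d", "e"]},
        {"id": "3", "bbb": ["f"]},
        {"id": "4", "bbb": []},
    ]
}


def bind_nested(rows, key):
    """Bind rows[i][key][j] under (i, j) and report the range per dimension."""
    values = {}
    widest = 0
    for i, row in enumerate(rows):
        inner = row[key]
        widest = max(widest, len(inner))
        for j, value in enumerate(inner):
            values[(i, j)] = value
    return values, (len(rows), widest)


bbb, bbb_range = bind_nested(payload["aaa"], "bbb")

# every index inside the 4x3 range that has no bound value is a hole
holes = [
    (i, j)
    for i in range(bbb_range[0])
    for j in range(bbb_range[1])
    if (i, j) not in bbb
]
```

Here the range is (4, 3), only 6 of the 12 positions are bound, and the other 6 are holes.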
6A - Flattening
---------------
Output schema...
{
"foo": [
"${id[*]}"
],
"bar": [
"${bbb[*][*]}"
]
}
Output...
{
"foo": [ 1, 2, 3, 4 ],
"bar": [ 'a', 'b', 'c', 'd', 'e', 'f' ]
}
Flattening occurs when the number of asterisks is the same as the number of dimensions on the data.
Rule: silently ignore indices in bbb that don't exist (to allow for holes in data)
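A minimal sketch of this flattening rule, assuming the bound values are kept in a dict keyed by index tuples (an illustrative choice, as is the function name flatten):

```python
from itertools import product


def flatten(values, ranges, default=None):
    """Walk every index in the ranges in row-major order and collect what exists."""
    out = []
    for index in product(*(range(n) for n in ranges)):
        if index in values:
            out.append(values[index])
        elif default is not None:
            out.append(default)  # a default fills the hole
        # otherwise the index is a hole and is silently skipped
    return out


bbb = {(0, 0): "a", (0, 1): "b", (0, 2): "c", (1, 0): "d", (1, 1): "e", (2, 0): "f"}
bar = flatten(bbb, (4, 3))
squared = flatten(bbb, (4, 3), default="-")
```

Without a default the six bound values come out in order; with a default all 12 positions are emitted.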
Output Steps
- Encounter owning collection and push on parent stack with empty iterator
- Encounter array variable called id
- Verify iterator exists and dimensionality matches else ERROR
- Impose iterator id on iterator stack (effectively replacing empty iterator)
- Start iterate, clone, recurse
- Pop iterator stack when popping parent stack
- Encounter bar collection and push on parent stack with empty iterator
- Encounter array variable called bbb
- Verify iterator exists and dimensionality matches else ERROR
- Impose iterator bbb on iterator stack
  - empty iterator is there
  - parent stack is 1 deep (argument to make iterator stack and parent stack be the same)
  - dimensionality is 2
  - all of dimensionality is handled at this depth
- Start iterate, clone, recurse
- Pop iterator stack when popping parent stack
Similar output schema but top level array and using new ^ ...
[
{
"foo": [
"${id[*]}"
],
"bar": [
"${bbb[^][*]}"
]
}
]
Output...
[
{
"foo": [ 1, 2, 3, 4 ],
"bar": [ 'a', 'b', 'c' ]
}
,
{
"foo": [ 1, 2, 3, 4 ],
"bar": [ 'd', 'e' ]
}
,
{
"foo": [ 1, 2, 3, 4 ],
"bar": [ 'f' ]
}
,
{
"foo": [ 1, 2, 3, 4 ],
"bar": [ ]
}
]
New syntax is [^], which means that this dimension is going to come from a parent array. Valid
syntax is now extended to be zero or more [^] followed by one or more [*]. No alternating
of ^ and *.
For inputs, the only acceptable syntax is [^]...[^][*]. The only reason for this restriction is to be
consistent with the output usage. Syntax of [*]...[*] for two or more dimensions would be an
error.
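The syntax rules above are simple enough to check with a regular expression. The sketch below is one possible validator; the function name check_index_suffix and the "input"/"output" side argument are hypothetical.

```python
import re

# zero or more [^] followed by one or more [*], with no alternation
INDEX_SUFFIX = re.compile(r"((?:\[\^\])*)((?:\[\*\])+)")


def check_index_suffix(suffix, side):
    """Return (parent dimensions, star dimensions) or raise ValueError."""
    match = INDEX_SUFFIX.fullmatch(suffix)
    if match is None:
        raise ValueError(f"bad index syntax: {suffix!r}")
    parents = len(match.group(1)) // 3  # each [^] is three characters
    stars = len(match.group(2)) // 3    # each [*] is three characters
    if side == "input" and stars != 1:
        raise ValueError(f"inputs must use [^]...[^][*], got {suffix!r}")
    return parents, stars


input_dims = check_index_suffix("[^][*]", "input")
flatten_dims = check_index_suffix("[*][*]", "output")
```

The same suffix [*][*] that is accepted on the output side is rejected on the input side.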
So if there are multiple dimensions of [*] in the output schema, then some kind of flattening
is happening. It normally would be [^]...[^][*] for conventional usage.
Algorithmic implication is that you don't know how many iterations there are without
recursing completely into your children. If their dimensionality is large enough, it
could reach back and affect your size.
The fixed relationship between parent collection and expansion is just how it is.
In today's Plastic, the logic looks for the immediate parent as the expansion point. If
you didn't like this simplification, say you wanted expansion to occur at a parent's parent,
then you needed to code.
After this implementation, the logic will look up the parent stack to handle each dimension of the
iteration. If there are enough parents, then flattening occurs.
If you don't like these simplifications, then you need to code.
6B - Mixing Input Arrays into a Flattened Output Array
------------------------------------------------------
What if I want to leverage the original arrays?
Desired output...
[
'1/a',
'1/b',
'1/c',
'2/d',
'2/e',
'3/f'
]
Output schema...
[
"${id[*]}/${bbb[*][*]}"
]
This is a flattening because of [*][*]. That dimension wins out so the flattening is 2 dimensional.
The iterators for id and bbb will be merged into a single iterator. The first [*] will have a range of
the larger of id and bbb. The second [*] will have the range of the second dimension from bbb. This
means that the leftmost [*] in both iterators really needs to have the same range to avoid issues.
Note that id=(4) and bbb=(4,3) and there are no entries for bbb[3][*], so the only choice is to error
out or to skip that expansion. If there are default values associated with id and/or bbb, they can
be used to fill in holes. If not, then it should be an error. This is a mixed hole because it takes
values from two variables to create a single output value. Had the output value just been from a single
variable, then this would have been a simple hole and skipped.
Rule: holes in simple output values will be skipped in iterations, unless there is a default value,
but holes in mixed output values result in an error.
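Here is one reading of this rule as a sketch, on made-up data smaller than the example above. The names expand and MixedHoleError, and the convention that a lower-dimensional variable is indexed by the leftmost indices, are assumptions for illustration.

```python
from itertools import product


class MixedHoleError(LookupError):
    """A hole in an output value built from more than one variable."""


def expand(variables, ranges, defaults=None, sep="/"):
    """variables maps name -> (dimension count, {index tuple: value})."""
    defaults = defaults or {}
    mixed = len(variables) > 1
    out = []
    for index in product(*(range(n) for n in ranges)):
        parts, hole = [], False
        for name, (dims, values) in variables.items():
            key = index[:dims]  # a lower-dimensional variable uses the leftmost indices
            if key in values:
                parts.append(values[key])
            elif name in defaults:
                parts.append(defaults[name])
            elif mixed:
                raise MixedHoleError(f"{name}{list(key)} has no value and no default")
            else:
                hole = True  # simple hole: skip this expansion
        if not hole:
            out.append(sep.join(parts))
    return out


ids = (1, {(0,): "1", (1,): "2"})
bbb = (2, {(0, 0): "a", (0, 1): "b", (1, 0): "c"})

simple = expand({"bbb": bbb}, (2, 2))
filled = expand({"id": ids, "bbb": bbb}, (2, 2), defaults={"bbb": "?"})
```

The single-variable case skips the missing bbb[1][1], the mixed case with a default fills it, and the mixed case without a default raises.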
Had the output schema been...
[
"${id[*]}/${bbb[^][*]}"
]
then it would have been an error because there is no parent array to supply the missing dimension.
Rule: using ^ and not having a parent array to supply the missing dimension is an error.
6C - Naive Isomorphic Conversion
--------------------------------
Output schema...
{
"AAA": [
{
"ID": "${id[*]}",
"BBB": [
"${bbb[*][*]}"
]
}
]
}
Output...
{
"AAA": [
{
"ID": "1",
"BBB": [ 'a', 'b', 'c', 'd', 'e', 'f' ]
}
,
{
"ID": "2",
"BBB": [ 'a', 'b', 'c', 'd', 'e', 'f' ]
}
,
{
"ID": "3",
"BBB": [ 'a', 'b', 'c', 'd', 'e', 'f' ]
}
,
{
"ID": "4",
"BBB": [ 'a', 'b', 'c', 'd', 'e', 'f' ]
}
]
}
6D - Isomorphic Conversion (as a sanity check)
----------------------------------------------
Output schema...
{
"AAA": [
{
"ID": "${id[*]}",
"BBB": [
"${bbb[^][*]}"
]
}
]
}
Output...
{
"AAA": [
{
"ID": "1",
"BBB": [ 'a', 'b', 'c' ]
}
,
{
"ID": "2",
"BBB": [ 'd', 'e' ]
}
,
{
"ID": "3",
"BBB": [ 'f' ]
}
,
{
"ID": "4",
"BBB": [ ]
}
]
}
Design Notes
------------
MoArray logic
- will need to create iterator
- dump an array for debug including iterator
- copy iterator from one array to another
- Need way to create missing iterator for legacy morpher logic
Reserve for internal use variable names with
Illegal variable names
- containing: . [ ] * ~ ^ ? {} & | ( ) @
Iterator is special attribute "_[name]" and is an array of int[]
Error if dimensionality of variable is not equal to the dimensionality of the bound values
Iterator (excluding empty iterator) has
- one or more names
- one or more dimensions
- each dimension has a range 0 ... L-1
- merging where larger ranges and bigger dimensions win
- incrementable with a current value
- incrementing only applies to non-parent dimensions
- a concept of "is done", which needs to be checked FIRST to allow for zero length arrays
- "is done" only applies to non-parent dimension
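The iterator properties listed above could be sketched like this. The class RangeIterator and the helper walk are hypothetical names, and the odometer-style increment is just one way to satisfy the list; parent dimensions are pinned and never incremented here.

```python
class RangeIterator:
    """Odometer over the non-parent dimensions; parent dimensions stay pinned."""

    def __init__(self, names, ranges, parent_index=()):
        self.names = set(names)             # one or more names after merging
        self.parent = tuple(parent_index)   # supplied by enclosing arrays via [^]
        self.own = tuple(ranges)[len(self.parent):]
        self.current = [0] * len(self.own)
        # a zero-length own dimension means there is nothing to iterate
        self.done = any(n == 0 for n in self.own)

    def is_done(self):
        return self.done

    def value(self):
        return self.parent + tuple(self.current)

    def increment(self):
        # bump the rightmost own dimension, carrying to the left on overflow
        for d in reversed(range(len(self.current))):
            self.current[d] += 1
            if self.current[d] < self.own[d]:
                return
            self.current[d] = 0
        self.done = True  # every own dimension wrapped around


def walk(iterator):
    seen = []
    while not iterator.is_done():  # checked FIRST so zero-length arrays yield nothing
        seen.append(iterator.value())
        iterator.increment()
    return seen


row_one = walk(RangeIterator(["bbb"], (4, 3), parent_index=(1,)))
nothing = walk(RangeIterator(["empty"], (0,)))
```

With the parent index pinned at 1, only the second dimension of the (4,3) range is walked.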
Algorithm
---------
Recurse through schema
  When you hit an array then push onto parent stack with empty iterator
  Recurse through nearby children and collect iterator specifications (don't drill into arrays)
  For each iterator specification
    if input schema then
      validate iterator syntax
      create iterator
    if output schema then
      validate iterator syntax
      validate the iterator is bound
      validate dimensions of bound values match specification
      if iterator has parent references then
        link all parent references up the iterator inheritance chain
      impose/merge iterator on TOS
  loop if iteration is not done then
    increment iterator
    clone
    walk cloned output
      if hit array, recurse and push
      when resolving arrayed variable, need to look up stack of pushed iterators for resolution
QUESTIONS
All ^^^^^ should be semantically valid?
JW TMO example
|