DeserializeTable and EM character

Posted: Tue Aug 03, 2021 12:08 pm
by Flinx
The DeserializeTable function fails if a string contains the character EM (0x19).

Code:

TestTable={}
TestTable[0]= ByteChr(25)
TestTable$=SerializeTable(TestTable)
TestTable2= DeserializeTable(TestTable$)
With the Inbuilt adapter, it does not fail.
Because the strings for my program come from outside, my question is: must I check for this and possibly other special characters, or is it a bug?
I convert the strings to UTF-8 and check them with ValidateStr(), and I thought this would be enough.

Ralf

Re: DeserializeTable and EM character

Posted: Wed Aug 04, 2021 11:00 am
by Flinx
Meanwhile I have found that the character was the result of a bad Unicode conversion, but I think that shouldn't matter for DeserializeTable() if a string can contain arbitrary data.

Re: DeserializeTable and EM character

Posted: Thu Aug 12, 2021 5:59 pm
by jPV
I'm also having problems with some non-alphanumeric characters when deserializing JSON from web APIs. Many of them use "@" for hyperlinks, ":" for CURIEs, and "-" elsewhere in table keys.

Here are examples:

Code:

; works
json$ = "{\"test\": \"alphabets\"}"
t = DeserializeTable(json$)
DebugPrint(t["test"])

; fails
json$ = "{\"te:st\": \"colon\"}"
t = DeserializeTable(json$)
DebugPrint(t["te:st"])

; fails
json$ = "{\"@test\": \"at\"}"
t = DeserializeTable(json$)
DebugPrint(t["@test"])

; fails
json$ = "{\"te-st\": \"hyphen\"}"
t = DeserializeTable(json$)
DebugPrint(t["te-st"])
Basically these should work as table keys in Hollywood, because this is fine:

Code:

; all work
t = {}
t["test"] = "alphabets"
t["te:st"] = "colon"
t["@test"] = "at"
t["te-st"] = "hyphen"
DebugPrint(t["test"], t["te:st"], t["@test"], t["te-st"]) 
Any workarounds or solutions? I end up with spaghetti code if I try to replace those characters with placeholders, because that messes up the actual data strings too...

Re: DeserializeTable and EM character

Posted: Fri Aug 13, 2021 9:59 pm
by airsoftsoftwair
Flinx's problem definitely sounds like a Hollywood bug. jPV's issue, however, could be considered a feature. Even though it's possible to use those special characters as table indices, it's not really supported by the serialization interface, because the idea is to only (de)serialize items that can be addressed using the "." syntax. It's not possible to use special characters like @ or - with that syntax. I'm not sure, though, whether the serializer should be more tolerant here.
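
To illustrate the distinction, here is a minimal sketch, assuming the usual equivalence of the "." syntax and the bracket syntax for identifier-like keys. The table and key names are only examples, reused from jPV's post.

Code:

t = {}
t.test = "alphabets"     ; identifier-like key, addressable with the "." syntax
t["te-st"] = "hyphen"    ; this key can only be addressed with brackets
DebugPrint(t.test, t["te-st"])
; t.te-st would not address the same item; it would be read as t.te minus st

So the keys that failed in jPV's examples are those that can only be reached through the bracket syntax.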

Re: DeserializeTable and EM character

Posted: Sat Aug 14, 2021 8:52 am
by jPV
I found that Allanon's JSON library does handle these characters, so I'm using it for now (have a deadline for a project soon where I really need this), but I would think that it would be really nice if this could be supported internally too now that we have a built-in function for deserializing anyway. Maybe an option to let it be more tolerant?

Re: DeserializeTable and EM character

Posted: Wed Aug 18, 2021 10:07 pm
by airsoftsoftwair
jPV wrote: Sat Aug 14, 2021 8:52 am
I found that Allanon's JSON library does handle these characters, so I'm using it for now (have a deadline for a project soon where I really need this), but I would think that it would be really nice if this could be supported internally too now that we have a built-in function for deserializing anyway. Maybe an option to let it be more tolerant?

The general problem is that I'm not sure if it's possible to make the deserializer handle *any* JSON because Hollywood also uses some extensions to signal the content of the individual JSON items, e.g. bytecode containing a function or binary data. So there can always be conflicts.

Re: DeserializeTable and EM character

Posted: Sat Nov 06, 2021 7:11 pm
by airsoftsoftwair
All issues should be fixed now.

Code:

- Fix: Removed some restrictions that Hollywood imposed on JSON key names against the specification; all
  characters are valid now in key names except the space character because Hollywood sometimes uses that
  to specify the type of the binary data
- Fix: JSON deserializer didn't recognize the \u escape sequence
- Fix: Some control characters weren't serialized/deserialized correctly when Hollywood was in UTF-8 mode