Opened 16 years ago

Closed 15 years ago

#587 closed Defect (fixed)

Unescaped ampersands in client_state.xml

Reported by: Nicolas
Owned by: davea
Priority: Minor
Milestone: Undetermined
Component: Client - Daemon
Version:
Keywords: xml
Cc:

Description

PrimeGrid uses scripts to generate input files on the fly, so the input file URLs contain question marks and ampersands. The scheduler reply correctly escapes these ampersands as &amp;. But when the client saves the file information to client_state.xml, they get unescaped:

<file_info>
    <name>psp_sr2sieve_2837737_cmd</name>
    <nbytes>59.000000</nbytes>
    <max_nbytes>0.000000</max_nbytes>
    <status>1</status>
    <url>http://www.primegrid.com/download/psp_sr2sieve_workunit.php?from=3029916000&to=3029916500</url>
</file_info>

This makes client_state.xml non-well-formed XML. (Note that even Trac's XML syntax highlighting shows the ampersand in red.)
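As an illustration only (this is not the BOINC client's code), the following Python sketch shows the difference a standard parser sees between the unescaped and escaped forms of the URL above. The helper names `is_well_formed` and `file_info` are hypothetical and exist only for this example.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# URL taken from the <file_info> sample in the description.
URL = ("http://www.primegrid.com/download/psp_sr2sieve_workunit.php"
       "?from=3029916000&to=3029916500")


def is_well_formed(text):
    """Return True if a standard XML parser accepts the text."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


def file_info(url_text):
    """Hypothetical helper: wrap already-serialized URL text in <file_info>."""
    return "<file_info><url>" + url_text + "</url></file_info>"


raw = file_info(URL)              # ampersand written as-is
escaped = file_info(escape(URL))  # ampersand written as &amp;

print("unescaped accepted:", is_well_formed(raw))
print("escaped accepted:  ", is_well_formed(escaped))
# Parsing the escaped form gives back the original URL unchanged.
print("round trip intact: ", ET.fromstring(escaped).find("url").text == URL)
```

Escaping at write time costs nothing on the reading side: any conforming parser turns &amp; back into & when it returns the element text.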

Change History (12)

comment:1 Changed 16 years ago by davea

Resolution: wontfix
Status: new → closed

Does this cause any problems? If not I'm not going to fix it, but will check in the changes if someone else wants to.

comment:2 Changed 16 years ago by Nicolas

Resolution: wontfix
Status: closed → reopened

Yes, with anyone using a real parser for client_state.xml (addons). It's pretty simple: what is getting saved is not XML (not valid XML at least).

I made a mockup for a debt-changing GUI, using HTML and a PHP script to read my current debts (no better sample data than real data). It immediately stopped working when I attached to PrimeGrid, because &to is not a valid XML entity.

I detached PrimeGrid, and it still didn't work, because there were accented characters in a result's stderr (Windows gave a localized error message) in ISO-8859-1, but client_state.xml doesn't have a charset declaration, and the XML specification says the default is UTF-8. That accented character made it invalid UTF-8.
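The encoding point can be reproduced with a small Python sketch. The element name and the accented message below are made-up sample data, not taken from a real client_state.xml, and `parses` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Made-up sample: an accented message stored as ISO-8859-1 bytes.
body = "<stderr_out>Accès refusé</stderr_out>"
latin1_bytes = body.encode("iso-8859-1")


def parses(data):
    """Return True if the parser accepts the byte string."""
    try:
        ET.fromstring(data)
        return True
    except ET.ParseError:
        return False


# No declaration: the parser assumes UTF-8 and hits invalid byte sequences.
no_declaration = latin1_bytes

# Option 1: declare the encoding that was actually used.
with_declaration = b'<?xml version="1.0" encoding="ISO-8859-1"?>' + latin1_bytes

# Option 2: re-encode the text as UTF-8 before writing it.
as_utf8 = latin1_bytes.decode("iso-8859-1").encode("utf-8")

print("no declaration:  ", parses(no_declaration))
print("with declaration:", parses(with_declaration))
print("re-encoded UTF-8:", parses(as_utf8))
```

Either option yields a file that generic XML tools can read. Which one fits BOINC better depends on where the bytes come from, which this sketch does not try to answer.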

That is not XML. That is a format that happens to be based on XML and looks much the same, but has no escaping, needs one tag per line, and has no idea how to handle Unicode. Will my client stop working if I edit client_state.xml and remove all newlines? If yes, BOINC isn't really using XML, because newlines shouldn't matter.

comment:3 in reply to:  2 Changed 15 years ago by Mike O

Replying to Nicolas:

Yes, with anyone using a real parser for client_state.xml (addons). It's pretty simple: what is getting saved is not XML (not valid XML at least).

I made a mockup for a debt-changing GUI, using HTML and a PHP script to read my current debts (no better sample data than real data). It immediately stopped working when I attached to PrimeGrid, because &to is not a valid XML entity.

This is causing problems with Boinc.NET as well, since the XMLReader sees these ampersands as 'entity tags'. This invalidates the document and causes an exception to be thrown. I have a temporary workaround: after the RPC, BEFORE loading the reply into the XMLReader, I replace the '&'(s) with '@'(s). After the LINQ parse, I replace the '@'(s) in the URL with '&'(s). This works OK so far. As long as the ampersands remain ONLY in URLs, this will always work in Boinc.NET --Mike
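A variant of this kind of pre-parse repair, sketched in Python rather than .NET and not taken from Boinc.NET: instead of swapping '&' for '@' and back, escape only the ampersands that do not already begin an entity or character reference. The regular expression and the function name `escape_bare_ampersands` are assumptions made for this example, and the reply string is invented sample data.

```python
import re
import xml.etree.ElementTree as ET

# An ampersand NOT followed by a predefined entity name or a numeric
# character reference is treated as a bare, unescaped ampersand.
BARE_AMP = re.compile(r"&(?!(?:amp|lt|gt|quot|apos|#[0-9]+|#x[0-9A-Fa-f]+);)")


def escape_bare_ampersands(text):
    """Rewrite bare '&' as '&amp;' and leave existing references alone."""
    return BARE_AMP.sub("&amp;", text)


# Invented sample reply mixing an unescaped URL with a correctly escaped name.
reply = ("<gui_url>"
         "<name>a &amp; b</name>"
         "<url>page.php?val1=1&val2=2&val3=3</url>"
         "</gui_url>")

root = ET.fromstring(escape_bare_ampersands(reply))
print(root.find("url").text)
print(root.find("name").text)
```

This avoids having to restore a placeholder character afterwards and does not disturb URLs that legitimately contain '@'. It is still a guess about the sender's intent: text that was meant to contain the literal characters "&amp;" would be read as a single ampersand.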

comment:4 Changed 15 years ago by Nicolas

Mike, do you mean this problem appears in GUI RPC replies as well?

comment:5 in reply to:  4 Changed 15 years ago by Mike O

Replying to Nicolas:

Mike, do you mean this problem appears in GUI RPC replies as well?

When I do an RPC <get_state>, some of the projects are using PHP variable passing in their links: 'page.php?val1=1&val2=2&val3=3'. That's the sort of thing I'm talking about. The XMLReader in .NET can't deal with this and thinks they are entity tags, as I mentioned above. So far I have NOT found any ampersands anywhere else. I'm waiting for a tester to send me their 'dump' from the RPC call so I can check whether there are other PHP or HTML tags that are causing problems. I'm thinking some projects are corrupting the XML in other ways. I'll post what I find. --Mike

comment:6 Changed 15 years ago by davea

Projects specify GUI URLs in XML files on their server; I think the best thing is to demand that these be entity-escaped from the beginning. (it would be messy to repair them in the client, and even then the scheduler RPC reply would still be invalid).

What projects are returning unescaped GUI URLs?

comment:7 Changed 15 years ago by Mike O

I agree. There are other ways to send variables to PHP scripts. As for the projects, the only one I know of for certain is Quake Catcher Network (http://qcn.stanford.edu/qcnalpha/). The one link that is used to show the map has PHP variable passing in it. I'm still trying to locate other projects.

comment:8 Changed 15 years ago by davea

I contacted QCN; should be fixed today. Let me know if there are other instances of this.

comment:9 Changed 15 years ago by Mike O

YOYO@home, in the <description> element:

<gui_url>
    <name>news: 02 May 2009</name>
    <description>-- Stats: OGR & Muon -- BOINC project yoyo@home: Main page News</description>
    <url>http://www.rechenkraft.net/yoyo/all_news.php#115</url>
</gui_url>

I'm still waiting for that email to show up. He's in Austria, so it may be late today. BTW, thanks for the RPC I requested.

comment:10 in reply to:  6 Changed 15 years ago by Nicolas

Replying to davea:

Projects specify GUI URLs in XML files on their server; I think the best thing is to demand that these be entity-escaped from the beginning. (it would be messy to repair them in the client, and even then the scheduler RPC reply would still be invalid).

Project admin enters XML into a file. Server reads the file and sends it to the client. The client sends it to the GUI. The GUI parses it and shows it. And none of the steps ever complains about invalid XML? (in fact most don't even parse what they're passing along)

comment:11 in reply to:  10 Changed 15 years ago by Mike O

Replying to Nicolas:

Replying to davea:

Projects specify GUI URLs in XML files on their server; I think the best thing is to demand that these be entity-escaped from the beginning. (it would be messy to repair them in the client, and even then the scheduler RPC reply would still be invalid).

Project admin enters XML into a file. Server reads the file and sends it to the client. The client sends it to the GUI. The GUI parses it and shows it. And none of the steps ever complains about invalid XML? (in fact most don't even parse what they're passing along)

All I know is, I use LINQ to DataTable querying. In order to do this, I must load the XML into a DataTable using an XMLReader. This is where it chokes on the ampersands. I have tried many ways to parse the XML, and this is by far the fastest.

It's not a big deal, as I have found a simple workaround. As long as the ampersands don't show up anywhere that will ruin the schema of the XML, it's all OK. Filtering out the '&'s from the entire XML before loading it into the XMLReader has solved a major problem. Now it's just a matter of making sure they are converted back where they are needed. Unless this becomes more of an issue, I wouldn't worry too much about it. I would, however, make it clear to project admins the trouble this can cause. Maybe a blurb in the docs somewhere?

Thanks Dave & Nicolas

comment:12 Changed 15 years ago by Nicolas

Resolution: fixed
Status: reopened → closed

Original problem fixed in [18915].
