This repository has been archived by the owner on Apr 6, 2021. It is now read-only.

http_server uploaded file handling #11

Open
DartBot opened this issue Jun 5, 2015 · 9 comments
@DartBot

DartBot commented Jun 5, 2015

Originally opened as dart-lang/sdk#14303

This issue was originally filed by [email protected]


The current http_body implementation decodes and parses HttpBodyFileUpload.content based on the Content-Type header of the part.

However, file server applications need the raw uploaded file data, because they save received files to their file system without modification. I think we need a switch that disables decoding and parsing for all content types and simply returns a List<int> object as HttpBodyFileUpload.content.
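As an illustration of the file server use case, here is a minimal Dart sketch that stores uploaded bytes exactly as received. The helper name `saveUpload` and the path handling are assumptions made for this example; they are not part of http_server.

```dart
import 'dart:io';

/// Hypothetical helper: writes [rawBytes] into [directory] without any
/// decoding or parsing, which is what a file server needs.
Future<File> saveUpload(
    Directory directory, String filename, List<int> rawBytes) {
  // Keep only the last path segment of the client-supplied name.
  final safeName = filename.split(RegExp(r'[/\\]')).last;
  final path = '${directory.path}${Platform.pathSeparator}$safeName';
  return File(path).writeAsBytes(rawBytes);
}

Future<void> main() async {
  final dir = await Directory.systemTemp.createTemp('upload_demo');
  final bytes = [0x7B, 0x22, 0x61, 0x22, 0x3A, 0x31, 0x7D]; // {"a":1}
  final file = await saveUpload(dir, 'data.json', bytes);
  // The stored bytes are identical to the received bytes.
  print(await file.readAsBytes());
  await dir.delete(recursive: true);
}
```

With raw bytes available, the upload's Content-Type no longer matters for storage; parsing becomes a separate, optional step.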

Another solution would be to simply return List<int> for all uploaded files. I am not sure how much this would affect other kinds of applications. However, when a .json file is uploaded from an HTML form, IE sends it as “Content-Type: text/plain”, while Chrome, Firefox and Safari send it as “Content-Type: application/octet-stream” (I don’t know why). In either case, .json files uploaded through an HTML form will never be parsed as JSON.

Regarding filenames with multibyte characters: although Windows uses UTF-8, I think it might be safe to keep LATIN1 decoding. I am not familiar with other file systems. On Windows, we can recover the name using UTF8.decode(LATIN1.encode(part.filename)), assuming that LATIN1.decode(bytes) simply produces a String in which each character has the same value as the corresponding byte of bytes.
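The round trip described above can be sketched as follows. This uses the Dart 2 lowercase constants `utf8` and `latin1` (the thread predates that rename and writes `UTF8` and `LATIN1`); the function name `restoreUtf8Filename` and the fallback behaviour are assumptions for illustration.

```dart
import 'dart:convert';

/// Hypothetical helper: re-interprets a filename that was decoded as
/// Latin-1 although its bytes were sent as UTF-8. If the bytes are not
/// valid UTF-8, the input is returned unchanged.
String restoreUtf8Filename(String latin1Decoded) {
  try {
    // latin1.encode maps each character back to the byte it came from.
    return utf8.decode(latin1.encode(latin1Decoded));
  } on ArgumentError {
    // The string contains characters above 0xFF, so it was not Latin-1.
    return latin1Decoded;
  } on FormatException {
    // The bytes are not well-formed UTF-8.
    return latin1Decoded;
  }
}

void main() {
  final wireBytes = utf8.encode('日本語.txt'); // what the browser sent
  final asParsed = latin1.decode(wireBytes); // what the parser produced
  print(restoreUtf8Filename(asParsed));
}
```

The trick only works because Latin-1 decoding is lossless for every byte value from 0x00 to 0xFF.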

@DartBot

DartBot commented Jun 5, 2015

Comment by madsager


cc @Skabet.
Added Area-IO, Triaged labels.

@DartBot

DartBot commented Jun 5, 2015

Comment by skabet


Set owner to @Skabet.
Removed Area-IO label.
Added Area-Pkg, Library-HttpServer, Accepted labels.

@DartBot

DartBot commented Jun 5, 2015

Comment by skabet


Hi Terry,

You are right, in most cases with file uploads, it's the raw binary one wants to access. What if we do the following:

  1. Always provide the raw List<int> data.
  2. Add a method to the FileUpload class, 'parsedData()', that tries to parse/decode the data depending on the mime type. We can even throw in an optional 'mimeType' argument so one can override the default mime type, e.g. parse as 'text/utf-8' instead of 'application/json'.
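The two points above could look roughly like this. It is only a sketch: the class name `FileUploadSketch`, the supported mime types, and the UTF-8 default are assumptions, and the override example uses 'text/plain' rather than the 'text/utf-8' mentioned above.

```dart
import 'dart:convert';

/// Hypothetical sketch of the proposal: raw bytes are always available,
/// and parsing happens only on request.
class FileUploadSketch {
  FileUploadSketch(this.data, this.defaultMimeType);

  /// The raw uploaded bytes, never modified.
  final List<int> data;

  /// The mime type taken from the part's Content-Type header.
  final String defaultMimeType;

  /// Parses [data] according to [mimeType], or [defaultMimeType] when
  /// no override is given. Unknown types yield the raw bytes.
  dynamic parsedData({String? mimeType}) {
    final type = mimeType ?? defaultMimeType;
    if (type == 'application/json') return jsonDecode(utf8.decode(data));
    if (type.startsWith('text/')) return utf8.decode(data);
    return data;
  }
}

void main() {
  final upload =
      FileUploadSketch(utf8.encode('{"a":1}'), 'application/json');
  print(upload.parsedData()); // parsed as JSON
  print(upload.parsedData(mimeType: 'text/plain')); // same bytes as text
}
```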

Regarding the filename, I think we should run a test and see what the different browsers upload. If we can hit a 90% success rate with some default encoding, that could be the way to go.

@DartBot

DartBot commented Jun 5, 2015

Comment by skabet


I just tried with both Chrome and IE on Windows, and I get the following:

With <meta charset="UTF-8" />:
 - Chrome: as utf8
 - IE: as utf8

Without <meta charset="UTF-8" />:
 - Chrome: multi-bytes replaced with ?
 - IE: as utf8

I think it's fine to use utf8-decoding for filenames.

@DartBot

DartBot commented Jun 5, 2015

This comment was originally written by [email protected]


I confirmed it on my Windows Vista machine using the following HTML:

001 <!DOCTYPE html>
002 <html>
003 <head>
004 <title>file_upload_test</title>
005 <meta http-equiv="content-type" content="text/html; charset=UTF-8">
006 </head>
007 <body>
008 <form action="http://localhost:8080/DumpHttpMultipart"
009 enctype="multipart/form-data"
010 accept-charset="UTF-8"
011 method="POST"> <br>
012 What is your name? <input type="text" name="submitter"> <br>
013 What files are you sending? <input type="file" name="content"> <br>
014 <input type="submit" value="Send File">
015 </form>
016 </body>
017 </html>

If line 005 or 010 exists, Chrome, Firefox and Safari send filenames with multi-byte characters as UTF-8. Otherwise, such filenames are transmitted as Shift_JIS characters (one of the most popular Japanese character encodings). Regardless of whether line 005 or 010 exists, IE sends them as UTF-8.

I agree with using UTF-8 decoding for filenames (the current implementation uses ISO-8859-1 decoding). It’s common to add line 005 for such applications.
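A minimal sketch of UTF-8-first filename decoding, operating on the raw header bytes. The Latin-1 fallback for bytes that are not valid UTF-8 (for example Shift_JIS from a page without a charset declaration) is an assumption added here, not something agreed in this thread, and `decodeFilenameBytes` is a hypothetical name.

```dart
import 'dart:convert';

/// Hypothetical helper: decodes filename bytes as UTF-8, falling back
/// to Latin-1 so that malformed input still yields a lossless String.
String decodeFilenameBytes(List<int> bytes) {
  try {
    return utf8.decode(bytes);
  } on FormatException {
    // Not valid UTF-8; keep a one-byte-per-character representation.
    return latin1.decode(bytes, allowInvalid: true);
  }
}

void main() {
  print(decodeFilenameBytes(utf8.encode('日本語.txt')));
  // 0x93 0xFA is not valid UTF-8, so the fallback path is taken.
  print(decodeFilenameBytes([0x93, 0xFA, 0x2E, 0x74, 0x78, 0x74]).length);
}
```

Keeping the fallback lossless means an application that knows the real encoding can still recover the original bytes with `latin1.encode`.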

@DartBot

DartBot commented Jun 5, 2015

Comment by skabet


Hi

What do you think about the following API?

/**
 * A HTTP content body produced by [HttpBodyHandler] for either [HttpRequest]
 * or [HttpClientResponse].
 */
abstract class HttpBody {
  /**
   * The actual data of the request.
   */
  List<int> get data;

  /**
   * Convert the data using mimeType.
   *
   * If mimeType is left unspecified, the Content-Type header will be used.
   */
  dynamic asMimeType({String mimeType});

  /**
   * Parse the [data] as text.
   *
   * If the headers contains a charset hint, that charset will be used.
   */
  String asText();

  /**
   * Parse the [data] as JSON.
   */
  dynamic asJSON();

  /**
   * Parse the data as either multipart/form-data or
   * application/x-www-form-urlencoded.
   *
   * The Content-Type header will be used to identify the parsing.
   */
  Map asFormPost();
}

/**
 * The [HttpBody] of a [HttpClientResponse] will be of type
 * [HttpClientResponseBody].
 */
abstract class HttpClientResponseBody implements HttpBody, HttpClientResponse {
}

/**
 * The [HttpBody] of a [HttpRequest] will be of type [HttpRequestBody].
 */
abstract class HttpRequestBody implements HttpBody, HttpRequest {
}

/**
 * A [HttpBodyFileUpload] object wraps a file upload, presenting a way for
 * extracting filename, contentType and the data of the uploaded file.
 */
abstract class HttpBodyFileUpload {
  /**
   * The filename of the uploaded file.
   */
  String get filename;

  /**
   * The [ContentType] of the uploaded file.
   */
  ContentType get contentType;

  /**
   * The content of the file.
   */
  List<int> get content;
}
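
To make the intended behaviour of asText() and asJSON() concrete, here is a rough sketch of how they could be built on top of the raw data. The class `BodySketch` is hypothetical, the UTF-8 default for a missing charset is an assumption, and asMimeType() and asFormPost() are left out.

```dart
import 'dart:convert';
import 'dart:io';

/// Hypothetical, partial sketch of the proposed HttpBody accessors.
class BodySketch {
  BodySketch(this.data, this.contentType);

  /// The raw body bytes.
  final List<int> data;

  /// The parsed Content-Type header, if the request carried one.
  final ContentType? contentType;

  /// Decodes [data] using the charset hint from the header, or UTF-8
  /// when there is no usable hint (an assumption of this sketch).
  String asText() {
    final encoding = Encoding.getByName(contentType?.charset) ?? utf8;
    return encoding.decode(data);
  }

  /// Parses the decoded text as JSON, regardless of the Content-Type.
  dynamic asJSON() => jsonDecode(asText());
}

void main() {
  final latin = BodySketch(latin1.encode('café'),
      ContentType.parse('text/plain; charset=iso-8859-1'));
  print(latin.asText());

  // No Content-Type at all: the caller still chooses how to read it.
  final untyped = BodySketch(utf8.encode('{"a":1}'), null);
  print(untyped.asJSON());
}
```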


cc @sethladd.

@DartBot

DartBot commented Jun 5, 2015

Comment by sethladd


Thanks! I like how HttpRequestBody implements HttpRequest now. Also, I like how I can control how I get the body (json, text, etc) because sometimes a content-type is not set on the request.

@DartBot

DartBot commented Jun 5, 2015

This comment was originally written by [email protected]


I think this will give us more flexible POST body data handling.

@DartBot

DartBot commented Jun 5, 2015

Comment by anders-sandholm


Removed Library-HttpServer label.
Added Pkg-HttpServer label.
