From 6e3eef591ab5154de22401edfc65dd69b3da3640 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Nicol=C3=B2=20Ribaudo?=
Date: Wed, 12 Jun 2024 12:02:50 +0300
Subject: [PATCH 1/5]

This patch explicitly defines how to extract `sourceMappingURL` comments from
JavaScript, CSS, and WebAssembly sources. It defines multiple ways to do so:
either by actually parsing the code, or by just going through all the lines of
the program looking for what "looks like" a comment. This lets different
implementations choose what's best for them, depending on whether they are
already parsing the code or not.

To ensure consistent behavior across implementations that choose different
strategies, the specification imposes additional requirements on tools that
append a `sourceMappingURL` comment to the generated code: the comment must be
placed so that all extraction methods yield the same result. This is not an
unreasonable burden: if the program is syntactically valid, simply appending
the comment at the end of the file, optionally followed only by other
tool-injected comments, is enough.

This requirement is lifted if the input code given to the tool is already
"maliciously crafted", since we would otherwise require tools to rewrite that
code (for example, by splitting strings that contain something that looks like
a comment).

I have left the CSS extraction method as a TODO because I first want to check
how you feel about the JS one. It has the following properties:
- It iterates line by line. Implementations can thus optimize it by going
  through the lines _in reverse order_, and then scanning through each line's
  characters from the beginning to the end (which is what a regexp would do).
- It expects multi-line comments to actually be on a single line.
- It returns the last `sourceMappingURL` comment (or comment-like text) found
  in the source.
- It only considers comments after the last piece of code (i.e.
it discards any comment found so far every time it sees some non-comment non-whitespace characters). - It has no requirements about what is _before_ a comment. Adding the comment at the end of the file without first ensuring that there is a newline before it is valid. --- source-map.bs | 244 ++++++++++++++++++++++++++++++++++++++++++-------- 1 file changed, 206 insertions(+), 38 deletions(-) diff --git a/source-map.bs b/source-map.bs index 975a67c..b26a2ca 100644 --- a/source-map.bs +++ b/source-map.bs @@ -24,12 +24,29 @@ spec:html; type:element; text:title text:link -spec:bikeshed-1; type:dfn; for:railroad; text:optional - -spec:fetch; type:dfn; for:/; text:request -spec:fetch; type:dfn; for:/; text:response +spec:fetch; type:dfn; for:/; + text:request + text:response spec:url; type:dfn; for:/; text:url + +spec:infra; type:dfn; + text:list + for:list; text:for each + +
+urlPrefix:https://tc39.es/ecma262/#; type:dfn; spec:ecmascript
+    url:sec-lexical-and-regexp-grammars; text:tokens
+    url:table-line-terminator-code-points; text:line terminator code points
+    url:sec-white-space; text:white space code points
+    url:prod-SingleLineComment; text:single-line comment
+    url:prod-MultiLineComment; text:multi-line comment
+    url:sec-regexpbuiltinexec; text:RegExpBuiltinExec
+
+urlPrefix:https://webassembly.github.io/spec/core/; type:dfn; spec:wasm
+    url:binary/modules.html#binary-customsec; text:custom section
+    url:appendix/embedding.html#embed-module-decode; text:module_decode
 
@@ -59,17 +76,18 @@ spec:url; type:dfn; for:/; text:url
     "status": "archive",
     "title": "Give your eval a name with //@ sourceURL"
   },
+  "ECMA-262": {
+    "href": "https://tc39.es/ecma262/",
+    "id": "ecma262",
+    "publisher": "ECMA",
+    "status": "Standards Track",
+    "title": "ECMAScript® Language Specification"
+  },
   "V2Format": {
     "href": "https://docs.google.com/document/d/1xi12LrcqjqIHTtZzrzZKmQ3lbTv9mKrN076UB-j3UZQ/edit?hl=en_US",
     "publisher": "Google",
     "title": "Source Map Revision 2 Proposal"
   },
-  "WasmCustomSection": {
-    "href": "https://www.w3.org/TR/wasm-core-2/#custom-section",
-    "publisher": "W3C",
-    "status": "Living Standard",
-    "title": "WebAssembly custom section"
-  },
   "WasmNamesBinaryFormat": {
     "href": "https://www.w3.org/TR/wasm-core-2/#names%E2%91%A2",
     "publisher": "W3C",
@@ -339,38 +357,12 @@ to have some conventions for the expected use-case of web server-hosted JavaScri
 There are two suggested ways to link source maps to the output.  The first requires server
 support in order to add an HTTP header and the second requires an annotation in the source.
 
-The HTTP header should supply the source map URL reference as:
- 
-```
-sourcemap: <url>
-```
-
-Note: Previous revisions of this document recommended a header name of `x-sourcemap`.  This
-is now deprecated; `sourcemap` is now expected.
-
-The generated code should include a line at the end of the source, with the following form:
-
-```
-//# sourceMappingURL=<url>
-```
-
-Note: The prefix for this annotation was initially `//@` however this conflicts with Internet
-Explorer's Conditional Compilation and was changed to `//#`. Source map generators must only emit `//#` 
-while source map consumers must accept both `//@` and `//#`.
-
-Note: `//@` is needed for compatibility with some existing legacy source maps.
-
-
-This recommendation works well for JavaScript, but it is expected that other source files will
-have different conventions.  For instance, for CSS `/*# sourceMappingURL=<url> */` is proposed.
-On the WebAssembly side, such a URL is encoded using [[WasmNamesBinaryFormat]], and it's placed as the content of the custom section ([[WasmCustomSection]]) named `sourceMappingURL`.
-
-`<url>` is a URL as defined in [[URL]]; in particular,
+A source map is linked through a URL as defined in [[URL]]; in particular,
 characters outside the set permitted to appear in URIs must be percent-encoded
 and it may be a data URI.  Using a data URI along with [=sourcesContent=] allows
 for a completely self-contained source map.
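A minimal sketch of such a self-contained link in JavaScript. It assumes a runtime with global `TextEncoder` and `btoa` functions, as in modern browsers and Node.js. The helper name `toSourceMapDataURI` and the `application/json` media type are illustrative choices, not requirements of this document:

```javascript
// Hypothetical helper: inline a source map object as a base64 data: URI.
function toSourceMapDataURI(sourceMap) {
  // Encode the JSON text as UTF-8 bytes, then base64-encode those bytes.
  const bytes = new TextEncoder().encode(JSON.stringify(sourceMap));
  let binary = "";
  for (const byte of bytes) binary += String.fromCharCode(byte);
  // Base64 output only uses code points that may appear unescaped in a URL,
  // so no further percent-encoding is needed.
  return `data:application/json;charset=utf-8;base64,${btoa(binary)}`;
}

// Usage: a map that embeds its original source through `sourcesContent`.
const map = {
  version: 3,
  file: "out.js",
  sources: ["in.js"],
  sourcesContent: ["let answer = 42;\n"],
  names: [],
  mappings: "AAAA",
};
const uri = toSourceMapDataURI(map);
console.log(`//# sourceMappingURL=${uri}`);
```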
 
-The HTTP `SourceMap` header has precedence over a source annotation, and if both are present,
+The HTTP `sourcemap` header has precedence over a source annotation, and if both are present,
 the header URL should be used to resolve the source map file.
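The precedence rule can be sketched as follows. The function and parameter names are hypothetical. Resolving a relative URL against a caller-supplied `baseURL` is a simplification of the source origin rules that follow:

```javascript
// Hypothetical helper: pick the source map URL for a generated file.
// `headerValue`: value of the `sourcemap` HTTP header, or null.
// `annotationURL`: URL extracted from the inline annotation, or null.
// `baseURL`: URL that a relative source map URL is resolved against (assumed).
function chooseSourceMapURL(headerValue, annotationURL, baseURL) {
  // The header takes precedence whenever it is present.
  const raw = headerValue ?? annotationURL;
  if (raw == null) return null;
  try {
    return new URL(raw, baseURL).href;
  } catch {
    // This sketch treats an unparsable URL as "no source map".
    return null;
  }
}

console.log(chooseSourceMapURL("app.js.map", "other.map", "https://example.com/js/app.js"));
// → https://example.com/js/app.js.map
```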
 
 Regardless of the method used to retrieve the [=Source Mapping URL=] the same
@@ -394,6 +386,182 @@ When the [=Source Mapping URL=] is not absolute, then it is relative to the gene
 - If the generated code is being evaluated as a string with the `eval()` function or
     via `new Function()`, then the [=source origin=] will be the page's origin.
 
+### Linking through HTTP headers
+
+If a file is served through HTTP(S) with a `sourcemap` header, the value of the header is
+the URL of the linked source map.
+
+```
+sourcemap: <url>
+```
+
+Note: Previous revisions of this document recommended a header name of `x-sourcemap`.  This
+is now deprecated; `sourcemap` is now expected.
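For instance, a consumer holding a Fetch API `Headers` object could read the header as in this sketch. The helper name is hypothetical, and the deprecated `x-sourcemap` header is deliberately not consulted:

```javascript
// Hypothetical helper: the `sourcemap` header value, or null if absent.
// `Headers#get` matches header names case-insensitively.
function sourceMapURLFromHeaders(headers) {
  return headers.get("sourcemap");
}

const headers = new Headers({
  "Content-Type": "text/javascript",
  SourceMap: "/maps/app.js.map",
});
console.log(sourceMapURLFromHeaders(headers)); // → /maps/app.js.map
console.log(sourceMapURLFromHeaders(new Headers())); // → null
```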
+
+### Linking through inline annotations
+
+The generated code should include a comment, or the equivalent construct for its language
+or format, named `sourceMappingURL` that contains the URL of the source map. This
+specification defines what the comment should look like for JavaScript, CSS, and WebAssembly.
+Other languages should follow a similar convention.
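As a rough illustration, the annotations for the two textual formats covered here could be produced like this. The `//#` form for JavaScript and the `/*# ... */` form for CSS follow the conventions this document describes. The helper itself is hypothetical:

```javascript
// Hypothetical helper: the inline annotation for a textual output format.
function sourceMappingURLComment(format, url) {
  switch (format) {
    case "js":
      return `//# sourceMappingURL=${url}`;
    case "css":
      // CSS has no single-line comments, so a block comment is used.
      return `/*# sourceMappingURL=${url} */`;
    default:
      // WebAssembly stores the URL in a custom section instead of a comment.
      throw new Error(`no annotation syntax known for ${format}`);
  }
}

console.log(sourceMappingURLComment("css", "style.css.map"));
// → /*# sourceMappingURL=style.css.map */
```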
+
+For a given language there can be multiple ways of detecting the `sourceMappingURL` comment,
+so that different implementations can choose the one that is least complex for them. The
+generated code unambiguously links to a source map if all the extraction methods return the
+same result.
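The definition amounts to checking that every extraction method agrees. This sketch assumes each method is available as a function that returns a URL string or null. The helper and its parameters are hypothetical. It also assumes that an all-null result does *not* count as linking to a source map:

```javascript
// Hypothetical helper: `extractors` holds one function per extraction method
// for the language at hand; each returns a URL string or null.
function linksUnambiguously(source, extractors) {
  const results = extractors.map((extract) => extract(source));
  // Assumption: "links to a source map" requires a non-null URL.
  if (results.length === 0 || results[0] === null) return false;
  return results.every((result) => result === results[0]);
}
```

With the two JavaScript methods as `extractors`, the example that follows would give `false`: one method returns null and the other returns `foo.js.map`.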
+
+If a tool consumes one or more source files that [=unambiguously links to a source map|unambiguously
+link to a source map=] and it produces an output file that links to a source map, it must do so
+[=unambiguously links to a source map|unambiguously=].
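One way for a JavaScript-emitting tool to meet this requirement is to append the annotation as the last line of its output. This sketch uses a hypothetical helper. It inserts a line break first when needed, because appending directly after a trailing `// ...` comment would turn the annotation into part of that comment, and then neither extraction method would find it:

```javascript
// Hypothetical helper: append a JavaScript `sourceMappingURL` annotation as
// the last line of `code`.
function appendSourceMappingURL(code, url) {
  // The URL is matched as a run of non-whitespace code points, so URLs
  // containing whitespace must be percent-encoded first.
  if (/\s/.test(url)) throw new Error("percent-encode whitespace in the URL first");
  // Start on a fresh line so a trailing `//` comment cannot swallow it.
  const needsNewline = code.length > 0 && !/[\n\r\u2028\u2029]$/.test(code);
  return `${code}${needsNewline ? "\n" : ""}//# sourceMappingURL=${url}`;
}

console.log(appendSourceMappingURL("foo(); // done", "out.js.map"));
// → foo(); // done
//   //# sourceMappingURL=out.js.map
```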
+
+
+The following JavaScript code links to a source map, but it does not do so [=unambiguously links
+to a source map|unambiguously=]:
+
+```js
+let a = `
+//# sourceMappingURL=foo.js.map
+//`;
+```
+
+Extracting a Source Map URL from it [=extract a Source Map URL from JavaScript through
+parsing|through parsing=] gives null, while [=extract a Source Map URL from JavaScript
+without parsing|without parsing=] gives `foo.js.map`.
+
+
+
+#### Extraction methods for JavaScript sources
+
+To extract a Source Map URL from JavaScript through parsing a [=string=] |source|,
+run the following steps:
+
+1. Let |tokens| be the [=list=] of [=tokens=]
+    obtained by parsing |source| according to [[ECMA-262]].
+1. [=For each=] |token| in |tokens|, in reverse order:
+    1. If |token| is not a [=single-line comment=] or a [=multi-line comment=], return null.
+    1. Let |comment| be the content of |token|.
+    1. If [=match a Source Map URL in a comment|matching a Source Map URL in=]
+        |comment| returns a [=string=], return it.
+
+To extract a Source Map URL from JavaScript without parsing a [=string=] |source|,
+run the following steps:
+
+1. Let |lines| be the result of [=strictly split|strictly splitting=] |source| on [=line
+    terminator code points|ECMAScript line terminator code points=].
+1. Let |lastURL| be null.
+1. [=For each=] |line| in |lines|:
+    1. Let |position| be a [=position variable=] for |line|, initially pointing at the start of |line|.
+    1. [=While=] |position| doesn't point past the end of |line|:
+        1. [=Collect a sequence of code points=] that are [=white space code points|ECMAScript
+            white space code points=] from |line| given |position|.
+
+            NOTE: The collected code points are not used, but |position| is still updated.
+        1. If |position| points past the end of |line|, [=break=].
+        1. Let |first| be the [=code point=] of |line| at |position|.
+        1. Increment |position| by 1.
+        1. If |first| is U+002F (/) and |position| does not point past the end of |line|, then:
+            1. Let |second| be the [=code point=] of |line| at |position|.
+            1. Increment |position| by 1.
+            1. If |second| is U+002F (/), then:
+                1. Let |comment| be the [=code point substring=] from |position| to the end of |line|.
+                1. If [=match a Source Map URL in a comment|matching a Source Map URL in=]
+                    |comment| returns a [=string=], set |lastURL| to it.
+                1. [=Break=].
+            1. Else if |second| is U+002A (*), then:
+                1. Let |comment| be the empty [=string=].
+                1. Let |closed| be false.
+                1. [=While=] |closed| is false and |position| + 1 doesn't point past the end of |line|:
+                    1. Let |c1| be the [=code point=] of |line| at |position|.
+                    1. Increment |position| by 1.
+                    1. Let |c2| be the [=code point=] of |line| at |position|.
+                    1. If |c1| is U+002A (*) and |c2| is U+002F (/), then:
+                        1. Set |closed| to true.
+                        1. Increment |position| by 1.
+                        1. If [=match a Source Map URL in a comment|matching a Source Map URL in=]
+                            |comment| returns a [=string=], set |lastURL| to it.
+                    1. Else, append |c1| to |comment|.
+                1. If |closed| is false, [=break=].
+
+                    NOTE: An unclosed multi-line comment extends to the end of |line|, so the rest
+                    of |line| is skipped and |lastURL| is left unchanged.
+            1. Else, set |lastURL| to null.
+        1. Else, set |lastURL| to null.
+
+        Note: We reset |lastURL| to null whenever we find a non-comment code character.
+1. Return |lastURL|.
+
+NOTE: The algorithm above has been designed so that the source lines can be iterated in reverse order,
+returning early after scanning through a line that contains a `sourceMappingURL` comment.
+
+
+Note: The algorithm above is equivalent to the following JavaScript implementation:
+
+```js
+// Splitting at every line start keeps each line terminator at the end of its line.
+const JS_NEWLINE = /^/m;
+
+// This RegExp will always match one of the following:
+// - single-line comments
+// - "single-line" multi-line comments
+// - unclosed multi-line comments
+// - just trailing whitespaces
+// - a run of code characters, or a lone `/` that does not start a comment
+// The loop below differentiates between all these cases.
+const JS_COMMENT =
+  /\s*(?:\/\/(?<single>.*)|\/\*(?<multi>.*?)\*\/|\/\*.*|$|(?<code>[^\/]+|\/))/uym;
+
+const PATTERN = /^[@#]\s*sourceMappingURL=(\S*?)\s*$/;
+
+function extractSourceMapURL(source) {
+  let lastURL = null;
+  for (const line of source.split(JS_NEWLINE)) {
+    JS_COMMENT.lastIndex = 0;
+    while (JS_COMMENT.lastIndex < line.length) {
+      const commentMatch = JS_COMMENT.exec(line).groups;
+      const comment = commentMatch.single ?? commentMatch.multi;
+      if (comment != null) {
+        const match = PATTERN.exec(comment);
+        if (match !== null) lastURL = match[1];
+      } else if (commentMatch.code != null) {
+        lastURL = null;
+      } else {
+        // We found either trailing whitespaces or an unclosed comment,
+        // which extends to the end of the line.
+      }
+    }
+  }
+  return lastURL;
+}
+```
+
+
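The through-parsing method can be sketched in the same style. This version assumes a hypothetical token representation: an array of `{ type, value }` objects in source order. Comments have `type` `"comment"` and a `value` without their `//` or `/* */` delimiters, and whitespace produces no tokens. Adapting a real ECMAScript tokenizer to this shape is not shown:

```javascript
// Same pattern as the "match a Source Map URL in a comment" steps.
const SOURCE_MAPPING_URL = /^[@#]\s*sourceMappingURL=(\S*?)\s*$/;

// Hypothetical token shape: { type: "comment" | "other", value: string }.
function extractThroughParsing(tokens) {
  // Walk the tokens backwards; the first non-comment token ends the search.
  for (let i = tokens.length - 1; i >= 0; i--) {
    const token = tokens[i];
    if (token.type !== "comment") return null;
    const match = SOURCE_MAPPING_URL.exec(token.value);
    if (match !== null) return match[1];
  }
  return null;
}
```

For the ambiguous example earlier in this section, the tokens are `let`, `a`, `=`, the template literal, and `;`. The last token is not a comment, so this returns null, matching the result described for parsing.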
+
+To match a Source Map URL in a comment |comment| (a [=string=]), run the following steps:
+
+1. Let |pattern| be the regular expression `/^[@#]\s*sourceMappingURL=(\S*?)\s*$/`.
+1. Let |match| be ! [=RegExpBuiltinExec=](|pattern|, |comment|).
+1. If |match| is not null, return |match|[1].
+1. Return null.
+
+
+Note: The prefix for this annotation was initially `//@`; however, this conflicted with Internet
+Explorer's Conditional Compilation, so it was changed to `//#`.
+
+Source map generators must only emit `//#` while source map consumers must accept both `//@` and `//#`.
+
+#### Extraction methods for CSS sources
+
+Extracting Source Map URLs from CSS is similar to JavaScript, except that CSS only
+supports `/* ... */`-style comments.
+
+#### Extraction methods for WebAssembly binaries
+
+To extract a Source Map URL from a WebAssembly source given
+a [=byte sequence=] |bytes|, run the following steps:
+
+1. Let |module| be [=module_decode=](|bytes|).
+1. If |module| is error, return null.
+1. [=For each=] [=custom section=] |customSection| of |module|:
+    1. Let |name| be the `name` of |customSection|, [=UTF-8 decode without BOM or fail|decoded as UTF-8=].
+    1. If |name| is "sourceMappingURL", then:
+        1. Let |value| be the `bytes` of |customSection|, [=UTF-8 decode without BOM or fail|decoded as UTF-8=].
+        1. If |value| is failure, return null.
+        1. Return |value|.
+
+Since WebAssembly is not a textual format and it does not support comments, it supports a single unambiguous extraction method.
+The URL is encoded using [[WasmNamesBinaryFormat]], and it's placed as the content of the [=custom section=].
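In a JavaScript host, the WebAssembly steps could be approximated with the WebAssembly JavaScript API, as in this sketch. The helper name is hypothetical. Two assumptions apply. First, `new WebAssembly.Module` also validates the module, so it rejects some inputs that `module_decode` alone would accept. Second, the section payload is decoded as a [[WasmNamesBinaryFormat]] `name`: an unsigned LEB128 byte length followed by UTF-8 bytes, as the encoding note in the last paragraph describes. The steps above instead decode the raw section bytes directly.

```javascript
// Hypothetical helper: return the URL stored in the first `sourceMappingURL`
// custom section of a WebAssembly binary, or null.
function extractWasmSourceMapURL(bytes) {
  let module;
  try {
    module = new WebAssembly.Module(bytes); // Decodes *and* validates.
  } catch {
    return null;
  }
  const [section] = WebAssembly.Module.customSections(module, "sourceMappingURL");
  if (section === undefined) return null;
  const payload = new Uint8Array(section);

  // Assumed payload layout: unsigned LEB128 length, then that many bytes.
  let length = 0;
  let offset = 0;
  for (let shift = 0; ; shift += 7) {
    // A u32 takes at most 5 LEB128 bytes (shifts 0, 7, 14, 21, 28).
    if (offset >= payload.length || shift > 28) return null;
    const byte = payload[offset++];
    length += (byte & 0x7f) * 2 ** shift;
    if ((byte & 0x80) === 0) break;
  }
  if (offset + length > payload.length) return null;

  try {
    // fatal: throw on malformed UTF-8; ignoreBOM: keep a leading U+FEFF
    // instead of stripping it ("without BOM" decoding).
    const decoder = new TextDecoder("utf-8", { fatal: true, ignoreBOM: true });
    return decoder.decode(payload.subarray(offset, offset + length));
  } catch {
    return null;
  }
}
```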
+
 Linking eval'd code to named generated code
 -------------------------------------------

From dadf256bdcd079f72fe2b0ef078a80ce7d53ea21 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Nicol=C3=B2=20Ribaudo?=
Date: Tue, 25 Jun 2024 16:43:23 +0200
Subject: [PATCH 2/5] Add fallback null

---
 source-map.bs | 1 +
 1 file changed, 1 insertion(+)

diff --git a/source-map.bs b/source-map.bs
index b26a2ca..789f03f 100644
--- a/source-map.bs
+++ b/source-map.bs
@@ -442,6 +442,7 @@ run the following steps:
     1. Let |comment| be the content of |token|.
     1. If [=match a Source Map URL in a comment|matching a Source Map URL in=]
         |comment| returns a [=string=], return it.
+1. Return null.
 
 To extract a Source Map URL from JavaScript without parsing a [=string=] |source|,
 run the following steps:

From faaf960ce9b0b18843129c649ea5aa1aa6e117de Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Nicol=C3=B2=20Ribaudo?=
Date: Tue, 25 Jun 2024 16:44:46 +0200
Subject: [PATCH 3/5] Also return null for wasm

---
 source-map.bs | 1 +
 1 file changed, 1 insertion(+)

diff --git a/source-map.bs b/source-map.bs
index 789f03f..493a243 100644
--- a/source-map.bs
+++ b/source-map.bs
@@ -559,6 +559,7 @@ a [=byte sequence=] |bytes|, run the following steps:
         1. Let |value| be the `bytes` of |customSection|, [=UTF-8 decode without BOM or fail|decoded as UTF-8=].
         1. If |value| is failure, return null.
         1. Return |value|.
+1. Return null.
 
 Since WebAssembly is not a textual format and it does not support comments, it supports a single unambiguous extraction method.
 The URL is encoded using [[WasmNamesBinaryFormat]], and it's placed as the content of the [=custom section=].
From f14a7f89064f0c1e5f9afb81a7d08da4de46b4d4 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Nicol=C3=B2=20Ribaudo?=
Date: Tue, 25 Jun 2024 16:50:25 +0200
Subject: [PATCH 4/5] Disallow multiple custom sections for sourceMappingURL

---
 source-map.bs | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/source-map.bs b/source-map.bs
index 493a243..6de0c95 100644
--- a/source-map.bs
+++ b/source-map.bs
@@ -562,7 +562,8 @@ a [=byte sequence=] |bytes|, run the following steps:
 1. Return null.
 
 Since WebAssembly is not a textual format and it does not support comments, it supports a single unambiguous extraction method.
-The URL is encoded using [[WasmNamesBinaryFormat]], and it's placed as the content of the [=custom section=].
+The URL is encoded using [[WasmNamesBinaryFormat]], and it's placed as the content of the [=custom section=]. It is invalid for
+tools that generate WebAssembly code to generate two [=custom section=] with the "sourceMappingURL" name.
 
 Linking eval'd code to named generated code
 -------------------------------------------

From bffb47d5eccfd6b6934746e879e2162ab1d7ac9d Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Nicol=C3=B2=20Ribaudo?=
Date: Tue, 25 Jun 2024 16:52:34 +0200
Subject: [PATCH 5/5] Improve wording

---
 source-map.bs | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/source-map.bs b/source-map.bs
index 6de0c95..93bc6ac 100644
--- a/source-map.bs
+++ b/source-map.bs
@@ -563,7 +563,7 @@ a [=byte sequence=] |bytes|, run the following steps:
 
 Since WebAssembly is not a textual format and it does not support comments, it supports a single unambiguous extraction method.
 The URL is encoded using [[WasmNamesBinaryFormat]], and it's placed as the content of the [=custom section=]. It is invalid for
-tools that generate WebAssembly code to generate two [=custom section=] with the "sourceMappingURL" name.
+tools that generate WebAssembly code to emit two or more [=custom section|custom sections=] named "sourceMappingURL".
 
 Linking eval'd code to named generated code
 -------------------------------------------