Containment via Service Worker

saleh · April 27, 2019, 2:28pm

As we make progress towards <ses-frame> and <jessie-frame>, some questions and long-term considerations are cropping up that are worthy of discussion and feedback.

I want to try to use threaded comments (i.e., reply to specific comments) to avoid creating too many posts.

Guiding Principles

a. Avoid needless rewrite

Pre-parsing a resource consumed by the target is at least inefficient if not problematic.
Yet, when application-specific parsing/rewrites are sound, they need to be done right (e.g., <jessie-frame />).
Rewrites, specifically for inter-resource linking, are (in my opinion) the only potentially sound form of rewrite for statically served content. Doing them server-side (i.e., for dumb clients) is a practice that has far outgrown any historical necessity, soundness, validity, and cost; proper client-side facilities for this are the main canvas on which to make progress in this problem space.

b. Avoid needless intercepts

Declarative CSP should be the only device needed to properly secure content.
Yet, an application can mandate Containment via Server/Worker for anything except acting as a man-in-the-middle tasked with the guesswork (at best) of screening CSP-averse fetches (which we only do to start with, until we no longer need to).
Requests that the application intends to make possible (i.e., CORS) should be both safely intercepted and guarded against rogue interceptions. Historically, such interception happened at the server, network, or OS level. The irony of our approach is that, as it succeeds in intercepting requests for a containment effect that should not be needed, it uncovers the dark side of Service Workers, which may require more attention down the road.
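As a minimal sketch of leaning on declarative CSP, assuming a deny-by-default policy is acceptable for contained content: the directive values below are illustrative rather than a vetted policy, and `serializeCSP` and `withCSP` are hypothetical helpers. A worker that does mandate containment could attach such a policy to the responses it passes through instead of screening each fetch itself.

```javascript
// Hypothetical helper: turn a { directive: [sources] } map into a
// Content-Security-Policy header value.
function serializeCSP(directives) {
  return Object.entries(directives)
    .map(([name, sources]) => [name, ...sources].join(' '))
    .join('; ');
}

// Illustrative deny-by-default policy for contained content: nothing loads
// unless a directive explicitly allows it.
const containedPolicy = serializeCSP({
  'default-src': ["'none'"],
  'script-src': ["'self'"],
  'style-src': ["'self'"],
  'img-src': ["'self'", 'data:'],
  'connect-src': ["'self'"],
});

// Hypothetical helper: copy a non-opaque Response, adding the policy header.
// Constructing a Response with status 0 throws, so callers must pass opaque
// responses through untouched.
function withCSP(response, policy) {
  const headers = new Headers(response.headers);
  headers.set('Content-Security-Policy', policy);
  return new Response(response.body, {
    status: response.status,
    statusText: response.statusText,
    headers,
  });
}
```

In a fetch handler, the wrapped navigation response would be what gets passed to `event.respondWith()`.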

c. Avoid needless obstruction

Work towards an Unobstructed Containment model, where intended restrictions are effected without unintended ones (obstructions).
Limiting the complexity of the containment mechanism is the best way to minimize unintended obstructions especially in cases where the mechanism could become stale, ineffective or unstable.
Clarity of intent for imposed restrictions makes it possible to limit the complexity of the containment mechanism, since common intents can lead either to battle-tested (de facto) implementations or to Web specifications where warranted. At the very least, it makes it more likely that new APIs will be designed with more awareness of, and a more justifiable burden of ensuring, their proper alignment with opt-in restrictions of clear and conceivable intents.


saleh · April 22, 2019, 11:46am

Complex Requests

So far, Phase 1 has been limited to handling simple GET requests. Moving beyond this requires a little more clarity on what the abstract concept of a container really means when we are sandboxing-by-remapping requests against it.

Some considerations that call for this abstraction:

When a GET or non-GET request refers to a sub-domain.
When a non-GET request is made (and whether it has a body).
When a request deals with CORS, credentials, headers, integrity, etc.

All those scenarios are unclear without a well-rounded abstraction of what a container represents.

Edit: For context, you can see where those details come into play in this slightly refactored containers/service-worker.js
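To make the list above concrete, here is a sketch of a classifier a worker could run before deciding whether a request is still a simple GET it knows how to remap. `requestConcerns` is a hypothetical helper covering only a subset of the concerns; it reads fields a Fetch API `Request` exposes (`url`, `method`, `mode`, `credentials`, `integrity`) and approximates "sub-domain" with a hostname suffix match against an assumed `containerHost`.

```javascript
// Hypothetical helper: which "complex request" concerns apply to a request,
// relative to the host that the container stands in for?
function requestConcerns(request, containerHost) {
  const concerns = [];
  const { hostname } = new URL(request.url);
  // Approximation: a strict suffix match on the hostname means sub-domain.
  if (hostname.endsWith(`.${containerHost}`)) concerns.push('sub-domain');
  // Phase 1 only handled GET, so every other method is flagged.
  if (request.method !== 'GET') concerns.push('non-GET');
  if (request.mode === 'cors') concerns.push('cors');
  if (request.credentials === 'include') concerns.push('credentials');
  if (request.integrity) concerns.push('integrity');
  return concerns;
}
```

An empty result would keep a request on the Phase 1 path; anything else needs the container abstraction to say what the remapped request should look like.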

saleh · April 23, 2019, 9:34am

Targeting

A service worker cannot operate in an absolute vacuum. At some point it will need some context from which it can determine how to target a given request.

Service Worker specs for clientId only work after the first navigate request.
Service Worker specs for resultingClientId or reservedClientId are not yet clear.
Service Worker specs for state-related fixtures exclude localStorage but include IndexedDB.
Service Worker specs lack a specific JavaScript contextId (this could be proposed to reflect the target thread, which would be independent of the browsing context).
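One way to give the worker that context is to record, at navigation time, which container a new client belongs to, and then look it up by `clientId` for that client's later requests. This is a sketch only: `ContainerRegistry` and the `containerForUrl` policy callback are hypothetical, and it relies on `FetchEvent.clientId` being empty for navigations while `resultingClientId` names the client being created.

```javascript
// Hypothetical registry mapping service-worker client IDs to containers.
class ContainerRegistry {
  #byClient = new Map();

  // Resolve the container for a FetchEvent-like object
  // ({ clientId, resultingClientId, request: { url, mode } }).
  // `containerForUrl` is a caller-supplied policy callback.
  resolve(event, containerForUrl) {
    if (event.clientId && this.#byClient.has(event.clientId)) {
      return this.#byClient.get(event.clientId);
    }
    const container = containerForUrl(event.request.url);
    // Navigations have an empty clientId; remember the container under the
    // ID of the client the navigation will create, so that client's
    // subresource requests are targeted at the same container.
    if (event.request.mode === 'navigate' && event.resultingClientId) {
      this.#byClient.set(event.resultingClientId, container);
    }
    return container;
  }

  forget(clientId) {
    this.#byClient.delete(clientId);
  }
}
```

Entries would need pruning as clients go away, for example by periodically comparing against `self.clients.matchAll()`.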

~~Should probably recommend renaming …Id to …ID which is the standard for casing in all other acronyms in other web specs (ie …URL)~~ scratch that — W3C TAG - Client-side API Design Principles.

saleh · April 22, 2019, 12:12pm

Re-basing

There are at least two strategies by which an HTML document can be “rebased” against a “container” (ie the rewritten URL).

Pro forma, by adding a <base> tag, which will effectively force all relative resolutions except for modules to resolve accordingly. This approach will likely diverge from intent, at least in some edge cases, and there are no clear indications of how workers will be affected.
Nestling, whereby the container reflects the actual “origin” of the original location and all absolute references to the original origin are rewritten to point to the container. This requires a lot of “accurate” and “efficient” parsing of at least HTML, CSS, and ES (maybe others). This approach will be effective even for workers.

A candidate recommendation here would allow declarative “rebasing” of requests as part of the Service Worker API, avoiding any need for rewrites.
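Both strategies can be sketched in a few lines, with the caveat that these are naive string and URL manipulations rather than the accurate parsing the second strategy really calls for. `injectBase` and `nestURL` are hypothetical helpers, and `containerBase` is assumed to be a URL ending in `/` under which the original origin is mirrored.

```javascript
// "Pro forma": insert a <base> element right after the opening <head> tag.
// Naive: assumes a literal <head> tag exists; otherwise html is unchanged.
function injectBase(html, containerBase) {
  const href = String(containerBase).replace(/&/g, '&amp;').replace(/"/g, '&quot;');
  return html.replace(/<head(\s[^>]*)?>/i, tag => `${tag}<base href="${href}">`);
}

// "Nestling": rewrite one absolute reference to the original origin so that
// it resolves inside the container. References that do not parse as
// absolute URLs (relative or scheme-relative) are returned unchanged.
function nestURL(reference, originalOrigin, containerBase) {
  let url;
  try {
    url = new URL(reference);
  } catch {
    return reference;
  }
  if (url.origin !== originalOrigin) return reference;
  // Prefix "." so the path resolves beneath containerBase, not its root.
  return new URL('.' + url.pathname + url.search + url.hash, containerBase).href;
}
```

A real nestling pass would have to apply something like `nestURL` at every URL-bearing position in HTML, CSS, and ES sources, which is exactly the parsing burden described above.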

saleh · April 23, 2019, 1:09pm

Inlining

This is probably best explained by example:

<iframe src="…" worker-src="…" worker-type="module"></iframe>

<!-- better yet -->

<iframe src="…">
    <script type=worklet src="…"></script>
    <!--
        ServiceWorklet.addModule(this.src, {
            frame: this.matches('iframe > :scope') ? this.parentElement : null,
        });
    -->
</iframe>

<iframe src="…">
    <!--  
        ServiceWorklet.addModule(`vm:${vmModuleIndex}` …); 
    -->
    <script type="worklet">
        import {SandboxWorklet} from '//unpkg.com/ses-frame';

        export default class ServiceWorklet extends SandboxWorklet {
            /** @param {{frame: HTMLIFrameElement | null}} [options] */
            constructor(options) {
                super(...arguments);
            }
        }
    </script>
</iframe>
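Until something like the declarative form above exists, the closest approximation is probably an imperative registration performed before the frame navigates. The sketch below is hypothetical: it reads the proposed `worker-src` and `worker-type` attributes (which no browser implements) and forwards them to the existing `navigator.serviceWorker.register(scriptURL, { scope, type })` call, using the frame document's directory as the scope.

```javascript
// Hypothetical bridge from the proposed worker-src / worker-type attributes
// to today's imperative registration API.
function registerFrameWorker(frame, container = globalThis.navigator?.serviceWorker) {
  const scriptURL = frame.getAttribute('worker-src');
  const src = frame.getAttribute('src');
  if (!scriptURL || !src || !container) return Promise.resolve(null);
  const frameURL = new URL(src, frame.baseURI);
  // The directory containing the frame document becomes the worker's scope.
  const scope = new URL('./', frameURL).href;
  const type = frame.getAttribute('worker-type') === 'module' ? 'module' : 'classic';
  return container.register(new URL(scriptURL, frame.baseURI).href, { scope, type });
}
```

By default a scope cannot sit above the directory of the worker script unless the script is served with a `Service-Worker-Allowed` header, and the frame only comes under the worker's control if it navigates after the worker has activated.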

markm · April 26, 2019, 11:55pm

Hi Saleh, I just wanted to add a note here explaining what the high level goals are for this containment work, and then a high level summary of the approach.

We want to be able to run third-party web content in the browser, in a confined environment, but able to interact with the user. We want to present the third-party code an emulation of the browser environment that is good enough for much legacy code. Web content is expressed in four “languages”: JavaScript, HTML, CSS, and the browser API. Google Caja and Salesforce Locker solve this problem. For JavaScript we all use SES. From these experiences, we know how to tame the browser APIs other than the DOM API with confidence. Due to my own ignorance, I ignore CSS completely in this note.

The browser DOM API, however, has proved too difficult to handle by object intermediation, because the semantics of the DOM are just too bizarre, and the standards are changing too fast to keep up with. Instead, we want to use mechanisms already in the browser (CSP, service workers, web components, the builtin HTML parser) to make it safe to give untrusted third-party JavaScript code direct access to DOM objects within its iframe. We need techniques that will likely stay safe as browser standards change.

The same-origin iframe mechanism already confines that web content graphically: it can only render to the user within the rectangle granted to the iframe, and it can only receive user interface events directed to that rectangle.

This leaves us with the following hard problems:

There are many ways to provoke the DOM to issue network requests, violating confinement.
HTML is a nightmare to parse accurately. We would like to avoid adding a user level parser that would rapidly become stale anyway.
There are many ways to provoke the DOM to evaluate JavaScript code directly, bypassing the confinement mechanisms provided by SES.

Saleh’s approach here, at a high level, is to:

Use Service Workers to intercept all such network accesses, so that they can be remapped and attenuated according to the policy of the confining code.
Use the templates of web components to translate HTML into DOM nodes that are born inert, in particular, without evaluating any contained JavaScript code.
Emulate the appearance that these inert DOM nodes are genuine active DOM nodes. This includes scraping them for all evaluable JavaScript source text and arranging to evaluate it under SES confinement instead.
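A rough sketch of the scraping step, assuming the inert nodes come from a `<template>` element's `content` (or from `DOMParser`): walk the tree and collect the JavaScript source text that would run if the nodes were live. `collectEvaluableSource` is hypothetical and deliberately incomplete; it covers inline `<script>` text, `on*` handler attributes, and `javascript:` URLs in a few URL attributes, but not nested `<template>` contents, `srcdoc`, script `type` filtering, or other evaluation vectors.

```javascript
// Attributes whose javascript: URLs would be evaluated when activated.
// Illustrative, not exhaustive.
const URL_ATTRIBUTES = new Set(['href', 'src', 'action', 'formaction']);
const JS_URL = /^\s*javascript:/i;

// Hypothetical scraper over an inert DOM tree (e.g. template.content).
// Returns one record per piece of evaluable source text found.
function collectEvaluableSource(root) {
  const found = [];
  const visit = node => {
    if (node.nodeType === 1) { // Node.ELEMENT_NODE
      for (const { name, value } of node.attributes) {
        const attr = name.toLowerCase();
        if (attr.startsWith('on')) {
          found.push({ kind: 'handler', element: node, name: attr, source: value });
        } else if (URL_ATTRIBUTES.has(attr) && JS_URL.test(value)) {
          found.push({ kind: 'url', element: node, name: attr, source: value.replace(JS_URL, '') });
        }
      }
      // Inline scripts only (no type filtering here); external scripts are
      // network requests for the service worker to handle instead.
      if (node.localName === 'script' && !node.hasAttribute('src')) {
        found.push({ kind: 'script', element: node, source: node.textContent });
      }
    }
    for (const child of node.childNodes) visit(child);
  };
  visit(root);
  return found;
}
```

Each record's `source` would then be handed to SES for evaluation (for example in a `Compartment`), with handler records bound back to their element; that binding is where most of the "emulate the appearance" work lies.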

Saleh, does the above description sound right to you?

saleh · April 27, 2019, 2:26pm

Absolutely. But while the high-level goals are mutual and clear, the high-level approach here continues to evolve organically. Even as I considered and started to experiment with the idea of <template> (or, simpler yet, just DOMParser), I remain intentionally undecided on how best to effect containment over the DOM, given lessons learned from prior and far more comprehensive work. So, I think that the guiding principles behind my long game can help bring clarity not yet visible in our early efforts.


saleh · July 9, 2019, 8:22pm

FYI: VSCode has since charged full steam toward iframe-based webviews for the web (i.e., sans Electron’s WebView or browser extensions) https://github.com/microsoft/vscode/pull/75546

danfinlay · April 25, 2020, 9:07pm

Hey Saleh,

Was just revisiting this issue and approach. Am I right in understanding that the ServiceWorker is able to intercept/remap all network requests, even ones generated from CSS or JS nested in object properties, and that the challenge was in remapping these requests and enforcing a fresh policy upon them?

If that were the case, I would think even disabling network access for an iFrame would be a significant achievement, and network accesses could be provided via capabilities passed in from the parent frame.
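The deny-by-default idea is simple enough to sketch: the worker refuses every request unless its URL falls under a prefix the parent has granted. `createNetworkPolicy` and the string-prefix grant format are assumptions for illustration; how grants travel from the parent (for example over `postMessage`) is left open.

```javascript
// Hypothetical deny-by-default policy: no network access for the contained
// frame except URL prefixes explicitly granted by the parent.
function createNetworkPolicy(grants = []) {
  // Normalize grants once: "https://api.example.com" becomes
  // "https://api.example.com/", so a host grant cannot match a longer host.
  // Path grants should end in "/" for the same reason.
  const prefixes = grants.map(grant => new URL(grant).href);
  return requestURL => {
    const { href } = new URL(requestURL);
    return prefixes.some(prefix => href.startsWith(prefix)) ? 'allow' : 'deny';
  };
}
```

In a fetch handler, a "deny" could be answered with `event.respondWith(Response.error())` and an "allow" with `fetch(event.request)`, keeping the frame offline except for what the parent hands it.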

danfinlay · April 28, 2020, 1:08am

I did some experimenting along these lines here. I seem to be doing something that Chrome doesn’t like, because it throws an error when the child iframe tries to register a service worker: service workers can only be registered from documents loaded over approved protocols (localhost, https), and apparently not from an iframe defined with its srcdoc property. Any leads on strategies for registering a service worker on a dynamically created iframe would be appreciated. Using the iframe.content.write() method also threw errors in Chrome, as it’s a deprecated method.

saleh · April 28, 2020, 8:53pm

So I’ve been trying to catch up on the specifics.

I did refactor my old code, which you can see live here:

And the code lives here now:

https://github.com/SMotaal/experimental/tree/master/sandbox

I will try to look into this, but honestly it’s been completely off my radar for a year, and I am not sure I will get far enough down the rabbit hole to give you something concrete in a reasonable time.

Hopefully this helps a little!

danfinlay · April 28, 2020, 9:09pm

Absolutely, having your latest work will be very helpful, thanks!

saleh · April 28, 2020, 9:54pm

I realized I had changed the repo to private for a reason I no longer recall, so I sent you an invite as a collaborator.