"Tofu" (Trusted on First Use) Tool

kate_sills · January 3, 2019, 10:35pm

Bradley Farias @bmeck has built an awesome tool called tofu that outputs certain information about JavaScript files, including:

require analysis of the source text.
access analysis of the free variables of the source text.
import analysis of the source text.

Bradley said:

I’ve started work on dumping source text authority usage and aggregating authority usage in https://github.com/bmeck/tofu .

For now I’ve been building separate tools so that upgrades to authority aggregation and trust persistence are not tied to analysis.

It seems like this is leading to the following set of tools:

Authority dumper

A tool that crawls JS source text and dumps a summary of relevant usage of authority. It is not intended to be an enforcement or permissions tool.

Note that various forms of Function are non-trivial to make deniable due to Function.prototype.constructor properties. It is likely this tool will have to maintain a caveat that those properties are censored if we are to check for the ability to obtain import().
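To illustrate why Function is hard to deny, here is a minimal sketch (not taken from tofu; the helper name is hypothetical) showing that any function value leads back to an evaluator through its constructor property, and that async functions expose a separate constructor of their own:

```javascript
// Any function value exposes an evaluator via `.constructor`, so hiding the
// global `Function` binding alone does not deny code evaluation.
// `reachFunctionConstructor` is a hypothetical helper for illustration only.
function reachFunctionConstructor(anyFunction) {
  return anyFunction.constructor;
}

const viaArrow = reachFunctionConstructor(() => {});
const viaPrototype = Function.prototype.constructor;

// Async functions have their own constructor, distinct from `Function`.
const AsyncFunction = reachFunctionConstructor(async () => {});

// Evaluate source text without ever naming the global `Function`.
const evaluated = viaArrow('return 1 + 1')();

console.log(viaArrow === Function);      // true
console.log(viaPrototype === Function);  // true
console.log(AsyncFunction === Function); // false
console.log(evaluated);                  // 2
```

A static dump of free variables would not see any of these routes, which is why the caveat about censoring these properties is needed.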

For now, authority tracking does not seem to need to be cross-module.
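As a rough picture of what a per-file dump could contain, here is a deliberately naive sketch. It uses regular expressions where a real dumper would presumably walk a parsed AST, it ignores free variables entirely, and the function name and output shape are assumptions rather than tofu's actual format:

```javascript
// Naive per-file authority dump. Regexes will miss computed specifiers and
// can be fooled by comments or strings; this only illustrates the output shape.
function dumpAuthority(sourceText) {
  const requirePattern = /\brequire\(\s*(['"])([^'"]+)\1\s*\)/g;
  const staticImportPattern =
    /\bimport\s+(?:[\w*{}\s,$]+\s+from\s+)?(['"])([^'"]+)\1/g;
  const dynamicImportPattern = /\bimport\(\s*(['"])([^'"]+)\1\s*\)/g;

  // Capture group 2 holds the specifier in every pattern.
  const specifiers = (pattern) =>
    [...sourceText.matchAll(pattern)].map((match) => match[2]);

  return {
    requires: specifiers(requirePattern),
    imports: specifiers(staticImportPattern),
    dynamicImports: specifiers(dynamicImportPattern),
  };
}

const sample = `
import fs from 'fs';
const path = require("path");
import('./lazy.js');
`;

const report = dumpAuthority(sample);
console.log(JSON.stringify(report));
```

Because each file is summarized on its own, nothing here needs to follow authority across module boundaries.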

Authority permissions manager

This consumes the output of the dumper and generates a full manifest of authorities being used. This can/should be summarized for end users so they can quickly understand what authorities are in use.

Because crawling an entire production source tree takes a long time (generally 10,000+ files, 5+ minutes on an 8-core mid-range machine), cache behavior becomes a problem if multiple digest algorithms are used in the manifest. For now, limiting the manifest to a single algorithm fixes the problem and keeps re-runs at around 1.5s. We could optimize more, but this seems sufficient for now.

How to share and compose manifests should be investigated, since this would allow for much faster initial auditing. Using the integrity matching scheme, which treats all resources with the same integrity as having the same content, seems sufficient, but I would be interested in what people think about this. Integrity matching is already used to speed up re-runs, under the term “integrity-folding”.
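The integrity-folding idea can be sketched roughly as follows. This is an illustration under assumptions, not the tool's code: the function names, the manifest shape, and the choice of sha256 are all hypothetical, and the only real API used is Node's crypto.createHash:

```javascript
const { createHash } = require('crypto');

// SRI-style integrity string using a single digest algorithm (assumed sha256).
function integrityOf(sourceText, algorithm = 'sha256') {
  const digest = createHash(algorithm).update(sourceText).digest('base64');
  return `${algorithm}-${digest}`;
}

// Analyze each distinct integrity at most once; files with the same integrity
// are treated as having the same content and share one report.
function analyzeAll(files, analyze, cache = new Map()) {
  const manifest = {};
  for (const [path, sourceText] of Object.entries(files)) {
    const integrity = integrityOf(sourceText);
    if (!cache.has(integrity)) {
      cache.set(integrity, analyze(sourceText));
    }
    manifest[path] = { integrity, report: cache.get(integrity) };
  }
  return manifest;
}

// Usage: two of the three files are byte-identical, so analysis runs twice.
let analyzeCalls = 0;
const countingAnalyze = (sourceText) => {
  analyzeCalls += 1;
  return { length: sourceText.length };
};

const manifest = analyzeAll(
  {
    'a.js': 'module.exports = 1;',
    'vendor/a.js': 'module.exports = 1;',
    'b.js': 'module.exports = 2;',
  },
  countingAnalyze
);

console.log(analyzeCalls); // 2
console.log(manifest['a.js'].integrity === manifest['vendor/a.js'].integrity); // true
```

Passing the same cache into a later run, or seeding it from someone else's manifest, is one way the sharing and composing described above might work.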

We need to set up a test fixture suite for these. I made some fixtures work by hard-coding things for the dumper, but having a larger test suite of canary apps would be ideal.

My work assumes the Node core root realm and will merely list the caveats as they stand. It should, however, be applicable to SES-based environments without those caveats.

In this vein, the caveats about import() and require() being obtained through evaluators seem like something that might need to go into the Node core policy work, along with the ability to disable them on a per-source-text basis. This is unfortunate, but seems unavoidable given the requirements of auditing.

Attribution needs to be worked out for this tool since it relates to work from calls on Realms.

bmeck · January 4, 2019, 7:30pm

This tool is meant to run ahead of code evaluation, and the big conceptual pieces are starting to become clearer. I’m still thinking on it, but it looks like there are three distinct kinds of data that this tool should produce and consume.

Authority Usage Reports

Listing all potential resource authority usages. This is the full set of all free variables and dependencies a resource can obtain directly. This can be reused across audits.

Audit Whitelist Files

These are composable whitelists of authority. They allow authority usage prompts to be skipped for anything that is whitelisted. When running TOFU, if a permission is not whitelisted the audit should fail. The UX should allow prompting during audits so that permissions can be granted as they are encountered for the first time. When permissions are removed or remain unchanged across audits there is no need for user intervention.
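A minimal sketch of how composing whitelists and failing an audit might look. The data shapes (a map from resource to a list of authority names) and the function names are assumptions made for illustration, not a format the tool defines:

```javascript
// Merge any number of whitelists of shape { [resource]: string[] }.
function composeWhitelists(...whitelists) {
  const merged = new Map();
  for (const whitelist of whitelists) {
    for (const [resource, authorities] of Object.entries(whitelist)) {
      if (!merged.has(resource)) merged.set(resource, new Set());
      for (const authority of authorities) merged.get(resource).add(authority);
    }
  }
  return merged;
}

// Compare a usage report against the merged whitelist. Anything used but not
// whitelisted is a violation; removed or unchanged permissions pass silently.
function audit(usageReport, whitelist) {
  const violations = [];
  for (const [resource, authorities] of Object.entries(usageReport)) {
    const allowed = whitelist.get(resource) || new Set();
    for (const authority of authorities) {
      if (!allowed.has(authority)) violations.push({ resource, authority });
    }
  }
  return { ok: violations.length === 0, violations };
}

const teamWhitelist = { 'app.js': ['require:fs'] };
const localWhitelist = { 'app.js': ['free:process'] };
const usageReport = { 'app.js': ['require:fs', 'free:process', 'require:net'] };

const result = audit(usageReport, composeWhitelists(teamWhitelist, localWhitelist));
console.log(result.ok);         // false
console.log(result.violations); // [ { resource: 'app.js', authority: 'require:net' } ]
```

An interactive front end could walk the violations list, prompt for each entry, and write accepted entries back into a whitelist file.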

Environment Description

There is a lot of noise in feature detection using free variables. When generating authority usage, dead code elimination can be performed by specifying an environment description of free variables and/or contextual variables. This is a lossy operation, since it removes code, and it is tied to generating authority usage, so it cannot be run separately against an existing authority usage report.

Additionally, some forms of loading code, such as require, need to be specified as contextual but able to obtain dependencies. Currently import and require are both functions, so simply treating them as specifier requests should be sufficient.

Finally, some aliases to the global exist, such as window, root, and self. These provide access to arbitrary free variables and need to be tracked so that authority is expanded appropriately, from a single free variable to all free variables.
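The alias expansion could be sketched like this. The alias list defaults to the three names mentioned above, and the '*' marker for "all free variables" and the function name are hypothetical choices for illustration:

```javascript
// Aliases to the global named in the discussion; an environment description
// could supply a different set.
const DEFAULT_GLOBAL_ALIASES = new Set(['window', 'root', 'self']);

// Hypothetical marker meaning "may reach every free variable".
const ALL_FREE_VARIABLES = '*';

// If any used free variable is an alias to the global, the resource's
// authority widens from the listed names to all free variables.
function expandAuthority(freeVariables, aliases = DEFAULT_GLOBAL_ALIASES) {
  const names = new Set(freeVariables);
  for (const name of names) {
    if (aliases.has(name)) return new Set([ALL_FREE_VARIABLES]);
  }
  return names;
}

const narrow = expandAuthority(['console', 'process']);
const widened = expandAuthority(['console', 'self']);

console.log([...narrow]);  // [ 'console', 'process' ]
console.log([...widened]); // [ '*' ]
```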

If people can think of ways to reduce noise that require more data points, we should probably add them.

bmeck · January 6, 2019, 4:21am

Apparently this kind of tooling is being looked at in other places as well.

kate_sills · January 7, 2019, 7:12pm

Oh, interesting that they’re using an SES-based runtime at work. Do you know where they work, by any chance?