WASM+SQLite for Web Products

This is an exploration of using WASM to help with web product engineering. This document focuses on some of the initial questions that tell us whether we have any hope of getting this idea off the ground at all. A lot more investigation is needed; some of the follow-ups are mentioned below.

Issue #1: What kind of penetration does WASM have anyway?

Actually the answer here is very good.

By vendor/version:

By usage:

Source: https://caniuse.com/wasm

The data shows that the only browsers with any significant share that lack support are IE (which at this point is basically limited to corporate scenarios for backwards compatibility), Opera Mini, which is out of scope, and UC Browser, which is also out of scope. Some additional details on other browsers with no support:

These are very old indeed. It’s fair to say that any browser that has updated in the last 3 years supports WASM.

Issue #2: Does our (FB) code work in this world at all?

Here we limit the exploration to just CQL code rather than the full corpus.

CQL itself doesn’t have to be ported, only the runtime. The CQL runtime is designed to be replaceable and there is a very simple and highly portable version that is included in the OSS drop.

For this test, I installed the Emscripten tool chain following the download instructions (https://emscripten.org/docs/getting_started/downloads.html). The main issues encountered were with prerequisites; there were many:

  • brew install cmake

I ended up not needing pyenv.

I followed the instructions for building hello world and soon had it running. The easiest way to run the WASM is inside a local node, e.g.

% cat hello.c
#include <stdio.h>
int main(int argc, char **argv)
{
  printf("Hello, world\n");
}
% emcc hello.c
% node a.out
Hello, world

Inspired by this early success I decided to go for it. CG/SQL includes two simple-ish programs:

  • The demo program (it creates a table and reads back the rows in several ways)
  • The runtime tests (these verify all the runtime features plus arithmetic, string processing and so forth)

Since we keep the amalgam for SQLite installed in the repo, I thought it would be easier to use that rather than an existing pre-compiled drop of SQLite. That removes any dynamic linking complexities, which we’ll discuss below. It also means any fixes required to make SQLite work out of the box would not be available to us, but I thought it was worth a shot so I pressed on.

This actually went remarkably well. The demo program is the simpler of the two so I did that first. These are the unabridged results:

# normal CQL command to generate the C file
% cql --in demo.sql --cg demo.h demo.c --generate_copy

# this is usually just "cc" but otherwise the same
# using the standard amalgam and the standard cqlrt.c unchanged
% emcc -I .../sqlite-3280000 .../sqlite-3280000/sqlite3-all.c demo.c cqlrt.c demo_client.c -o demo

% node demo
CQL data access demo: creating and reading from a table
result_set: row 0) 1 0 1 0 12 a name
result_set: row 1) 2 0 1 0 14 some name
result_set: row 2) 3 0 1 0 15 yet another name
result_set: row 3) 4 0 1 0 19 some name
result_set: row 4) 5 0 1 0 21 what name
result_set_copy: row 0) 2 0 1 0 14 some name
result_set_copy: row 1) 3 0 1 0 15 yet another name
result_set_copy: row 2) 4 0 1 0 19 some name
result_set_updated: row 0) 1 0 1 0 12 a name
result_set_updated: row 1) 2 0 1 0 14 some name
result_set_updated: row 2) 3 0 1 0 15 yet another name
result_set_updated: row 3) 4 0 1 0 19 some name
result_set_updated: row 4) 5 0 1 0 21 what name
.......+####.... +
.......+####.... +

That was very encouraging. No runtime issues; node ran the WASM on the first go with the standard SQLite amalgam unmodified.

The runtime test was similarly successful. I initially had one test failure, but it was because I forgot to enable mocking in the compilation step. That result, also unabridged:

# preprocessing required because of macros
% cc -E -I test -x c -w run_test.sql >run_test_pre.sql
# normal CQL compilation
% cql --in run_test_pre.sql --cg run_test.h run_test.c --generate_copy --global_proc cql_startup
# same command line as “cc” note -DCQL_RUN_TEST enables mocks
% emcc -DCQL_RUN_TEST -I . -I .../sqlite-3280000 .../sqlite-3280000/sqlite3-all.c run_test.c cqlrt.c run_test_client.c test/cqltest.c result_set_extension.c -o test
% node test
Running C client test
Running cql_row_same client test
Running blob rowset test
Running ref comparison test
Running sparse blob rowset test
Running C client test with huge number of rows
Running cql finalize on error test
Running C one row result set test
Running column fetchers test
Running column encoded fetchers test
Running column encoded cursor fetchers test
Running column encoded out union fetchers test
Running column encoded multi out union fetchers test
Running column encoded turn off fetchers test
Running column some encoded field fetchers test
Running error case rowset test
Running autodrop rowset test (pass 1)
Running autodrop rowset test (pass 2)
Running autodrop rowset test (pass 3)
Running cql_bytebuf_open test
Running cql_bytebuf_alloc_within_bytebuf_exp_growth_cap test
Running cql_bytebuf_alloc_over_bytebuf_exp_growth_cap test
125 tests executed. 125 passed, 0 failed. 0 expectations failed of 2122.
program exited (with status: 0), but EXIT_RUNTIME is not set, so halting execution but not exiting the runtime or preventing further async execution (build with EXIT_RUNTIME=1, if you want a true shutdown)

Likewise in the browser (hosted by Apache).

Not bad for right out the door. So at this point we have some hope:

  • We can build a stock SQLite
  • The CQL-generated code and its runtime run unmodified under WASM, in node and in the browser

Issue #3: How big is it?

The CQL tests themselves are quite small so this binary is a good proxy for the minimum size of SQLite built with no special effort. It’s my understanding that you can get this smaller with the right compile-time options, and it seems our work here is timely: there are a lot of independent efforts on the problems we’ve been thinking about, such as this one: https://github.com/kbumsik/sqlite-wasm. Bumsik Kim has apparently been able to get the compressed size down to around 200k.

But it’s still interesting to see where we start so here’s the data I collected.

% emcc -O2 ...
-rw-r--r-- 1 rmariani staff 93711 May 4 11:04 x.html
-rw-r--r-- 1 rmariani staff 87587 May 4 11:04 x.js
-rwxr-xr-x 1 rmariani staff 1076577 May 4 11:04 x.wasm
% emcc -O3 -Oz ...
-rw-r--r-- 1 rmariani staff 93711 May 4 11:05 x.html
-rw-r--r-- 1 rmariani staff 80277 May 4 11:05 x.js
-rwxr-xr-x 1 rmariani staff 694965 May 4 11:05 x.wasm
% emcc -O3 -Os ...
-rw-r--r-- 1 rmariani staff 93711 May 4 11:06 x.html
-rw-r--r-- 1 rmariani staff 80387 May 4 11:06 x.js
-rwxr-xr-x 1 rmariani staff 702200 May 4 11:06 x.wasm
% emcc -O3
-rw-r--r-- 1 rmariani staff 93711 May 4 11:07 x.html
-rw-r--r-- 1 rmariani staff 80387 May 4 11:07 x.js
-rwxr-xr-x 1 rmariani staff 1141683 May 4 11:07 x.wasm
# back to -Oz for a gzip test...
% emcc -O3 -Oz ...
-rw-r--r-- 1 rmariani staff 93711 May 4 12:21 x.html
-rw-r--r-- 1 rmariani staff 80277 May 4 12:21 x.js
-rwxr-xr-x 1 rmariani staff 694965 May 4 12:21 x.wasm
% gzip x.*
% ls -l x.*
-rw-r--r-- 1 rmariani staff 36959 May 4 12:21 x.html.gz
-rw-r--r-- 1 rmariani staff 20826 May 4 12:21 x.js.gz
-rwxr-xr-x 1 rmariani staff 292024 May 4 12:21 x.wasm.gz

So this is pretty hopeful: with no effort at all the WASM is about 300k compressed, and compression buys a bit better than 2x. This isn’t a non-starter by any means.
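As a quick sanity check on the compression claim, using the -Oz sizes from the listing above:

```python
# Compression ratio from the -Oz build: raw .wasm vs. gzipped .wasm
wasm, wasm_gz = 694965, 292024
ratio = wasm / wasm_gz
print(round(ratio, 2))  # a bit under 2.4x
```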

Issue #4: Can we break these into pieces so we don’t have a giant download?

I spent a good bit of time looking into options for dynamic loading of wasm. It turns out that there are basically three ways you can do this with Emscripten:

  1. You can instruct Emscripten that there are WebAssembly modules to link to by specifying them in the dynamicLibraries array of Emscripten’s generated JavaScript file. It will automatically download and link modules that are specified in this array.

  2. Your C/C++ code can load and link a module at runtime by calling dlopen.

  3. Your JavaScript can fetch and instantiate a module itself (e.g. with WebAssembly.instantiate), supplying the imports the module needs.


The first option looks the simplest but is actually the least useful overall. This is like the “static references to dylibs” kind of approach that you usually get on Windows or iOS. It’s a known set of libraries that link to each other in a known way. This is great for sharing code in an operating system but doesn’t give you delayed load, which is what we need.

The second and third options are actually very similar; it’s just a question of whether or not the linkage goes via JS. In both cases your loaded module will get a set of imported pointers. I think a hybrid approach will work best for us if we are talking about loading CQL queries dynamically.

The CQL code can bind to a lightweight version of cqlrt that simply delegates to proxy functions. The proxy functions are provided in bulk by an initialization call when the module is loaded. Those pointers are in turn provided by an exporter function in the main .wasm file. So by linking just one function from JS import to export, we can effectively export as many as we like. This is important from a maintenance perspective: it means we can easily add and remove runtime helpers without having to rebuild everything.
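The shape of that proxy-table indirection can be sketched as follows (Python standing in for the C function-pointer tables; every name here is hypothetical):

```python
# Hypothetical sketch of the proxy-table idea: the main module exports its
# runtime helpers in bulk as one table, and a dynamically loaded module
# binds to that table once, at initialization time.

# "Main module": owns the real runtime helpers.
def cql_alloc(n):              # stand-in for one runtime helper
    return bytearray(n)

def cql_string_ref(s):         # stand-in for another
    return ("ref", s)

def export_runtime_table():
    # The single exported entry point hands over every helper at once.
    return {"alloc": cql_alloc, "string_ref": cql_string_ref}

# "Loaded module": a lightweight cqlrt that only delegates.
class LightweightCqlrt:
    def init(self, table):     # the one call linked through JS
        self.table = table
    def alloc(self, n):
        return self.table["alloc"](n)
    def string_ref(self, s):
        return self.table["string_ref"](s)

rt = LightweightCqlrt()
rt.init(export_runtime_table())    # wiring one function exports them all
```

Adding or removing a helper then only touches the table, not the JS import/export plumbing.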

Issue #5: Notifications

I’ll only touch on this briefly but given the above it seems clear that we could add “native” notification hooks for writes just like we do in normal code and then use those to create table-changed notifications which could drive “query needs refreshing” in the usual sort of way. This could be invaluable to the UI.
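A sketch of the idea (Python’s sqlite3 standing in for brevity; the wrapper and all names are invented, and real code would hook the writes natively):

```python
# Hypothetical sketch: route writes through a wrapper that turns table
# writes into "query needs refreshing" notifications for the UI.
import sqlite3
from collections import defaultdict

class NotifyingDb:
    def __init__(self, path=":memory:"):
        self.con = sqlite3.connect(path)
        self.listeners = defaultdict(list)   # table name -> callbacks

    def on_table_changed(self, table, callback):
        self.listeners[table].append(callback)

    def write(self, table, sql, args=()):
        # all writes go through here, so we know which table changed
        self.con.execute(sql, args)
        for cb in self.listeners[table]:
            cb(table)    # e.g. the UI re-runs its query here

db = NotifyingDb()
db.con.execute("create table t(x integer)")
refreshed = []
db.on_table_changed("t", lambda table: refreshed.append(table))
db.write("t", "insert into t values (?)", (1,))
```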

Issue #6: Can we use any of the existing persistence solutions?

Q: Can we use the existing Emscripten virtual file system for SQLite?

A: No…

A quick look into this showed that while we could open and close databases with content preserved within a session, there was no persistence between sessions. A little bit more digging easily yields the answer. The built-in file system is called MEMFS. It’s a fake file system built in memory.

Source: https://emscripten.org/docs/api_reference/Filesystem-API.html

Q: What about the alternative file system IDBFS?

A: The nature of Indexed DB File System (IDBFS) is basically this: It’s like using MEMFS but you can periodically invoke the FS.syncfs() function to sync the data to IndexedDB storage. So in principle this is persisted storage. However, in practice you can’t use it.

You might be tempted to write some helpers like this to set up a db in “/data”:

function myAppStartup(callback) {
  FS.mkdir('/data');
  FS.mount(IDBFS, {}, '/data');
  FS.syncfs(true, function (err) {
    // handle callback
  });
}

function myAppShutdown(callback) {
  FS.syncfs(function (err) {
    // handle callback
  });
}

And then maybe add something on a timer, or on incoming messages, to keep things fresh. But things go wrong:

  • The file system is actually reading from MEMFS with periodic sync, so that means ALL the data has to be in memory

This just isn’t going to scale.

Source: IBID

Q: What about the worker file system or the device hooks?

A: The reference again makes it pretty clear that these aren’t viable. The backing store can be different, but the access to the file isn’t really db-aware in any way, so you’ll end up reading far too much and syncing far too much. Mounting a custom device seems a little promising; you could maybe do block i/o. But it seems much less on point than the VFS solution, see below.

Source: IBID

Q: What about using SQLite’s Virtual File System (VFS) as the interface?

A: This seems like the most promising approach. There is an open source “MetaVFS” (See https://github.com/sql-js/sql.js/issues/447#issuecomment-821591827) which seems to be able to plug in a variety of file systems at the SQLite level. These are considerably better because the backing store is then organized in the blocks that SQLite likes to read in a natural way.

A variety of options have been built including one that reads out of the DOM directly which is an interesting one because it’s rather “from scratch” compared to the others.

A test case of working with 1300 rows (not conducted by me, see link above) has these results over 100 runs:

META VFS Published Results (timings for operations such as “Create database” across the various VFS implementations; “unix” is the default in-memory filesystem that SQL.js already uses)

Tested with two builds: one regular build that only permits synchronous VFS, and one with Asyncify that allows either synchronous or asynchronous VFS. The journal settings tested were DELETE (the default) and MEMORY. Note: we use WAL mode. See the appendix and https://www.sqlite.org/pragma.html for more details on these modes.

These results indicate a significant slowdown when using any asynchronous file system, and even more with standard journaling. A clear direction to explore is to extend this experiment with:

  • A comparison to the normal non-WASM native mode as a control case

Importantly, MEMORY mode is not acceptable to us, as it does not provide safe checkpointing across power failures; it only gives us a reference number for performance. WAL can’t be better than MEMORY.

Also of note: none of the tested systems was actually persistent. These are all backed by memory; they only exercise different amounts of glue code.

The base number for asynchronous writes in DELETE mode is troubling at nearly 2.9s.

Source: Discussions following https://github.com/sql-js/sql.js/issues/447#issuecomment-821591827

Issue #7: What about multiple browser windows? Multiple browsers?

The question here is about how the threading model might work when there are many windows doing messaging, which is very normal. The overall strategy, though, is to have a single web worker that serves all of the data to all of the windows. So the remaining question was: does this result in multiple threads accessing the same IndexedDB if there are multiple browser instances?

The answer appears to be a resounding no, at least for Chrome, and therefore probably for everyone. Even when multiple Chrome windows are opened they are in the same browser universe, so they can share a single web worker instance. The separation of universes seems entirely superficial.

Note, as usual, payloads between the worker and UI threads will have to be suitably marshaled to JavaScript objects, but any design would require that we be able to convert (e.g.) the result of CQL stored procedures to JavaScript objects.

In short, multiple windows do not seem to pose any especially onerous challenges.

Issue #8: What about the other ACID requirements we might need?

We’ve talked about persistence a little bit, but really only up to the point of “can we save this data at all?” and “can we read back the data without reading all of it?”

So let’s talk about where we would sit on the ACID properties based on what we already know. There are some considerations but some things look fine.

Atomicity: Transactions are often composed of multiple statements. Atomicity guarantees that each transaction is treated as a single “unit”, which either succeeds completely, or fails completely.

I expect no issues on this front until we get to Durability which I will discuss below in that section. All the usual SQLite safeguards are available to us. Several journaling modes are available that will allow for rollback subject to Durability concerns.

Consistency: Ensures that a transaction can only bring the database from one valid state to another; data written to the database must be valid according to all defined rules.

Again, all the usual provisions are available to us. Unique Keys, Foreign Keys, default values, nullability. SQLite will be free to make indices as it sees fit to enforce constraints.
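For example, the familiar constraint machinery behaves as usual (sketched with Python’s built-in sqlite3 for brevity; the table and column names are invented):

```python
import sqlite3

# Unique key, default value, and nullability all enforced by SQLite itself.
con = sqlite3.connect(":memory:")
con.execute("""
  create table users(
    id integer primary key,
    email text unique not null,          -- unique key
    name text not null default 'anon'    -- default value + nullability
  )""")
con.execute("insert into users(email) values ('a@example.com')")
try:
    con.execute("insert into users(email) values ('a@example.com')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True    # duplicate email rejected by the unique constraint
print(rejected)
```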

Isolation: Ensures that concurrent execution of transactions leaves the database in the same state that would have been obtained if the transactions were executed sequentially.

I don’t expect any problems here as these guarantees do not depend on the backing store. The usual good semantics of SQLite WAL mode (readers can go while writers write) will still be available to us and are likely to be the best choice.

Durability: Guarantees that once a transaction has been committed, it will remain committed even in the case of a system failure.

This is where we will need to do work.

There will be a much looser coupling between when SQLite thinks data has been written to when it has actually been written. The more asynchronous the final system is the more this will be the case. Everywhere this happens it creates an opportunity for data loss.

Now in normal SQLite operation, any of the journaling modes other than OFF results in a safe two-phase commit; however, it’s possible to break this by failing to write journal entries that are expected to be written. If this happens, at worst you can get database corruption, but at minimum we could suffer data loss. This is because we might tell the server that we have successfully written data but then (e.g.) suffer a power loss that causes the write to fail while it is in transit.

It will be important to identify which journaling modes can support a possible async delay in writes and still be able to restore the db to a consistent previous state. This may in fact be only WAL mode, where partial write-aheads can be ignored. WAL also has the fewest fsync operations.

The storage associated with IndexedDB is itself transacted, so a VFS built on it would be standing on fairly firm ground. If necessary, additional information can be provided to the VFS indicating that a checkpoint is happening, to flag the appropriate transactions.

There is no guarantee that fsync operations actually sync all the way to the disk. This is unfortunately normal, but in principle, with the right mechanism, and with WAL mode as probably the safest choice (and our preferred choice anyway), it seems likely that we could find a workable solution here that would be at least as good as IndexedDB itself.

Source: https://en.wikipedia.org/wiki/ACID and see Appendix

Issue #10: Is IndexedDB via VFS fast enough? What is the speed like?

I wanted a simple scenario that could give me a sense of what the performance is like. The standard WASM demo app includes a query timer so I used that. I only did a couple of runs each, so there is some error margin here, but we are only trying to get a sense of the speed.

Data source: https://github.com/rhashimoto/wa-sqlite compiled locally

The base scenario:

PRAGMA locking_mode=EXCLUSIVE;
PRAGMA journal_mode=WAL;

drop table if exists dummy;
create table dummy(id integer);

with nums(n) as (
  select 1
  union all
  select n+1 from nums
  limit 1000000)
insert into dummy
select * from nums;

select count(*) from dummy;
-- for sure can't be optimized
select count(*) from dummy where id % 31 = 0;
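As a cross-check, the same scenario runs unchanged against any local SQLite; here is a minimal equivalent using Python’s built-in sqlite3 module (the PRAGMAs are omitted since journaling and locking modes don’t matter for an in-memory reference run):

```python
import sqlite3

# Same scenario as the test script: build 1M rows with a recursive CTE,
# then run the two counting selects.
con = sqlite3.connect(":memory:")
con.execute("drop table if exists dummy")
con.execute("create table dummy(id integer)")
con.execute("""
  with recursive nums(n) as (
    select 1 union all select n+1 from nums limit 1000000)
  insert into dummy select * from nums""")
total = con.execute("select count(*) from dummy").fetchone()[0]
multiples = con.execute(
    "select count(*) from dummy where id % 31 = 0").fetchone()[0]
print(total, multiples)
```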

Starting from that:

  • The journal mode can be easily changed

PRAGMA locking_mode=EXCLUSIVE;
PRAGMA journal_mode=WAL;
-- make sure we can read only some of the rows
-- without paying the full cost
select id from dummy where id % 31 = 0 limit 200;

Control Cases

The above were run with the same script on a MacBook Pro; in fact, all the tests were run on the same system.

The extra command “.open pers” was added to force persistence for the later cases; “pers” is just the file name of the database. Note that the select test cases are meaningless for the in-memory db since it is empty when freshly opened.


Even at 1,000,000 rows the results are quite speedy. The insert test case takes about 6x longer than just the two selects. There isn’t even that much difference between the in-memory database and the on-disk one.

OK, keeping this in mind, let’s start to look at some of the WASM combinations; there are many.

Default Journaling Mode (DELETE)

These results for delete mode are shown below with various VFS systems installed.

“Default MemFS” is, as the name suggests, the standard MemFS file system, so there is no VFS overhead in this version. This is probably the fastest we can hope to go, so that’s sort of the “WASM control” row. It has no persistence. That gives a little less than 1s for the base case, about 3x slower than the same operation done completely locally.

Things jump very quickly: the same test case done with IndexedDB as the file store jumps to over 78s. This is in some sense our baseline. Let’s call that about 80x slower in round numbers.

Now we want to try to get a sense of where the cost is coming from so we have some additional experiments.

There are 3 runs of the same test case using an in-memory VFS. This takes IndexedDB out of the picture and gives us a sense of what the VFS itself costs. The answer: not too much. In fact, two of the memory VFS runs were faster than the base case (maybe better caching?). Whatever else is true, the overhead of the VFS is not the main source of problems.

Likewise, just adding Asyncify does not torpedo things. MemoryVFS with the Asyncify library (which allows asynchronous file systems) only brings the times to around 1.07s or so, maybe 10% overhead.

So it seems like the costs are coming from IndexedDB itself. This is especially interesting because it casts light on any solution involving IndexedDB.

We have some additional experiments now to control for other factors and to try out the preferred journaling mode.

No Journaling

While we can’t run like this, it is an interesting experiment. Turning off journaling (and hence the ability to rollback etc.) does cut the time significantly.

Journaling is accounting for slightly less than half of the cost.

WAL mode is supposed to be cheaper than DELETE so let’s see how that fares.

WAL Journaling Mode

Here I also wanted to get a sense of the marginal cost of additional rows. I usually do a scale-up experiment as well, to get a sense of the overhead of the test itself, but in this case that overhead is so low it’s irrelevant. There is some non-linearity here that is important. Let’s have a look at the data:

The first row of interest is the main test. At about 76.2s it’s no different from the other modes; WAL isn’t buying us anything material. Now let’s go over the other data carefully, in the order it was gathered, because there are peculiar numbers that need explanation.

Initially the time went from ~1.1s for 10k rows to ~4.6s for 100k rows, an increase of only about 4x for 10x the rows. A second run at 100k rows was only 3.8s, again well less than the 10x growth you might expect. But moving from 100k rows to 1M rows increased the time to 76.2s. That’s about 16.5x growth. So there is lots of non-linearity going on.
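Checking those growth factors against the quoted timings:

```python
# Scale-up factors from the runs above (times in seconds)
t10k, t100k, t1m = 1.1, 4.6, 76.2
first_step = t100k / t10k    # 10x the rows, first step
second_step = t1m / t100k    # 10x the rows, second step
print(round(first_step, 1), round(second_step, 1))
```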

What about the numbers I skipped, the small ones? To get those numbers you first do the bigger run and then go down to the smaller size. So, if you first do 1M rows and THEN do 100k rows, the time drops to 0.2s pretty consistently! And the same thing happens if you first do 100k rows and then do 10k rows. NOTE: I tried to reproduce this speedup the following day and was not able to, so there is more mystery here!

Now remember, in all cases the entire table is dropped, so there is no real ability to do data caching here. And in any case these are insert operations, so caching is kind of out. But it does seem to make a big difference whether or not the database VFS needs to grow to accommodate the new storage. It appears that dropping a table of the same size does not create enough immediately re-usable pages to give any speed win, but dropping a much bigger table does…

This means we might be able to get much better performance if we tweak the VFS to pre-allocate more blocks in IndexedDB rather than let IndexedDB grow dynamically.

Read Performance

Consulting the table for the pure read cases gives us much better looking results:

The first three rows are all the same configuration, so here we can see some likely caching effects. But even if we go with a nominal value of, say, ~2.9s to read 1M rows twice, that’s very strong evidence that it’s the write operations that are costly, not i/o generally. Read operations seem to be about 14x slower than the raw native build.

The last three rows verify that reading only some of the data is still economical. At ~6ms to read 6200 rows we should expect a cost reduction of about 161x. Even taking the best number we see at least that much gain. So partial reads do not seem to cause any kind of problem.
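The expected ~161x comes from how much of the table the limited query has to touch:

```python
# The "limit 200" query stops after 200 qualifying rows; since every
# 31st id qualifies, that is roughly a 200 * 31 = 6,200-row scan.
rows_scanned = 200 * 31
expected_reduction = 1_000_000 / rows_scanned
print(rows_scanned, round(expected_reduction, 1))
```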

Cost To Open

To estimate this cost I did this experiment:

-- 2 selects
select count(*) from dummy where id % 31 = 0;
select count(*) from dummy where id % 31 = 0;

Compared to:

-- 4 selects
select count(*) from dummy where id % 31 = 0;
select count(*) from dummy where id % 31 = 0;
select count(*) from dummy where id % 31 = 0;
select count(*) from dummy where id % 31 = 0;

At 250ms for two selects and 473ms for four selects, I get a marginal cost of 223ms for the two additional reads. This leaves only 27ms to open the database. Not too shabby.
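The arithmetic, spelled out:

```python
# Marginal-cost arithmetic for the open-cost estimate (times in ms)
two_selects, four_selects = 250, 473
marginal_two_reads = four_selects - two_selects   # cost of 2 extra selects
open_cost = two_selects - marginal_two_reads      # remainder: opening the db
print(marginal_two_reads, open_cost)
```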


Now, as I wrote earlier, I’ve only done a few runs of each test case here, sometimes not even that, so these numbers should be taken with a grain of salt. But generally it seems that, with no additional effort, we could expect writes to be about 80x slower and reads about 14x slower than native SQLite.

However, the optimization of managing database growth has huge potential, with the good runs consistently ~19x faster than the standard runs (blended read and write). So that’s certainly something to look into. It would go a long way towards offsetting the high write costs.
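For what it’s worth, the ~19x figure is consistent with the 100k-row WAL runs above (3.8s for a standard run vs. 0.2s after a larger drop), assuming that is its source:

```python
# Presumed source of the ~19x figure: the 100k-row WAL runs (seconds)
standard_run, fast_run = 3.8, 0.2
speedup = standard_run / fast_run
print(round(speedup))
```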

In any case, it’s clear that most of the system is doing fine on the performance side. The performance work should focus on the IndexedDB VFS, which is actually very small and pretty easy to understand.

Appendix: (notes on journal modes from the SQLite docs)

Source: https://sqlite.org/pragma.html#pragma_journal_mode


This pragma queries or sets the journal mode for databases associated with the current database connection.


The DELETE journaling mode is the normal behavior. In the DELETE mode, the rollback journal is deleted at the conclusion of each transaction. Indeed, the delete operation is the action that causes the transaction to commit. (See the document titled Atomic Commit In SQLite for additional detail.)

The TRUNCATE journaling mode commits transactions by truncating the rollback journal to zero-length instead of deleting it. On many systems, truncating a file is much faster than deleting the file since the containing directory does not need to be changed.

The PERSIST journaling mode prevents the rollback journal from being deleted at the end of each transaction. Instead, the header of the journal is overwritten with zeros. This will prevent other database connections from rolling the journal back. The PERSIST journaling mode is useful as an optimization on platforms where deleting or truncating a file is much more expensive than overwriting the first block of a file with zeros. See also: PRAGMA journal_size_limit and SQLITE_DEFAULT_JOURNAL_SIZE_LIMIT.

The MEMORY journaling mode stores the rollback journal in volatile RAM. This saves disk I/O but at the expense of database safety and integrity. If the application using SQLite crashes in the middle of a transaction when the MEMORY journaling mode is set, then the database file will very likely go corrupt.

The WAL journaling mode uses a write-ahead log instead of a rollback journal to implement transactions. The WAL journaling mode is persistent; after being set it stays in effect across multiple database connections and after closing and reopening the database. A database in WAL journaling mode can only be accessed by SQLite version 3.7.0 (2010-07-21) or later.

The OFF journaling mode disables the rollback journal completely. No rollback journal is ever created and hence there is never a rollback journal to delete. The OFF journaling mode disables the atomic commit and rollback capabilities of SQLite. The ROLLBACK command no longer works; it behaves in an undefined way. Applications must avoid using the ROLLBACK command when the journal mode is OFF. If the application crashes in the middle of a transaction when the OFF journaling mode is set, then the database file will very likely go corrupt. Without a journal, there is no way for a statement to unwind partially completed operations following a constraint error. This might also leave the database in a corrupted state. For example, if a duplicate entry causes a CREATE UNIQUE INDEX statement to fail half-way through, it will leave behind a partially created, and hence corrupt, index. Because OFF journaling mode allows the database file to be corrupted using ordinary SQL, it is disabled when SQLITE_DBCONFIG_DEFENSIVE is enabled.

Note that the journal_mode for an in-memory database is either MEMORY or OFF and can not be changed to a different value. An attempt to change the journal_mode of an in-memory database to any setting other than MEMORY or OFF is ignored. Note also that the journal_mode cannot be changed while a transaction is active.

I’m a software engineer at Facebook; I specialize in software performance engineering and programming tools generally. I survived Microsoft from 1988 to 2017.
