A primer on the FlyWire segmentation

In this tutorial you will learn all you need to know about the FlyWire segmentation.

Before we get started, here is a quick primer on terminology:

  1. In FlyWire, the ID of a neuron (e.g. 720575940618780781) is called the “root ID”.

  2. Each root ID is a collection of “supervoxels”. These supervoxels are the atomic, immutable units of the segmentation.

  3. Every time a neuron is edited (i.e. supervoxels are added or removed by merging or splitting), one or more new root IDs are created.

  4. A “materialization” is a snapshot of the segmentation at a given point in time.

If you work in the FlyWire production dataset, you will have to deal with the fact that root IDs are constantly changing as people keep improving the segmentation through proofreading. If you are working with the public release datasets, you will likely stick to root IDs that match one of the available materialization versions. Please find more detailed explanations below.

FlyWire datasets

FlyWire actually has three different datasets/versions:

  1. The “Public release” contains static snapshots of the segmentation, each corresponding to a specific materialization version (see below for an explanation of materializations). For example, the first ever public release was materialization 630. Anyone has access to this dataset after signing up through the FlyWire website.

  2. The “Production” dataset is where people do the actual proofreading/annotation. As such it is ahead of the publicly released snapshots. To get access to the production dataset you have to be approved by one of the community managers.

  3. Last but not least, “Sandbox” is a training ground that has seen minimal proofreading (i.e. it is close to the base segmentation). Anyone has access to this dataset after signing up.

Most functions in fafbseg.flywire accept a dataset parameter. As of fafbseg version 3.0.0 the default dataset is the public one.

>>> from fafbseg import flywire
>>> # Defaults to public dataset
>>> flywire.supervoxels_to_roots(79801523353597754)
array([720575940621675174])
>>> # Specifically query the production dataset
>>> flywire.supervoxels_to_roots(79801523353597754, dataset='production')
array([720575940631274967])

You can change this default by running this at the beginning of each session:

>>> from fafbseg import flywire
>>> flywire.set_default_dataset('production')

See the docstring for set_default_dataset() for details.

Alternatively, you can also set an FLYWIRE_DEFAULT_DATASET environment variable before starting the Python session.

$ export FLYWIRE_DEFAULT_DATASET="public"
$ python

Environment variables can be set permanently too. The details of that depend on your operating system and on which terminal (e.g. bash or zsh) you are using. A quick Google should tell you how it works.

FlyWire root IDs - the details

Under the hood, FlyWire uses the chunkedgraph, an octree-like structure, to manage the segmentation. In brief: “supervoxels” are the atomic units of the segmentation and are grouped into “root IDs”. Or conversely: each root ID is a collection of supervoxels. Any edit to the segmentation is effectively just the addition or removal of supervoxels from that collection.

Like supervoxels, however, root IDs are immutable. Whenever an edit is made, new root IDs are generated to represent the post-edit agglomeration of supervoxels. For example, splitting a neuron generates two new root IDs and invalidates its current root ID. Merging two neurons, on the other hand, invalidates the two old root IDs and generates one new root ID representing the combination of their supervoxels.
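To make this bookkeeping concrete, here is a toy model in plain Python. It is only a sketch under simplifying assumptions: the class ToySegmentation and its small sequential integer IDs are hypothetical, and the real chunkedgraph works very differently under the hood. It does, however, follow the rules described above: a root ID is an immutable set of supervoxels, every merge or split mints new root IDs, and outdated root IDs are retired rather than deleted.

```python
import itertools
from typing import Dict, FrozenSet, Iterable, Set, Tuple


class ToySegmentation:
    """Hypothetical toy model: each root ID maps to an immutable set of supervoxels."""

    def __init__(self) -> None:
        self._ids = itertools.count(1)              # sequential IDs, unlike real root IDs
        self.roots: Dict[int, FrozenSet[int]] = {}  # root ID -> its supervoxels
        self.retired: Set[int] = set()              # outdated root IDs (kept, never deleted)

    def _mint(self, supervoxels: Iterable[int]) -> int:
        root = next(self._ids)
        self.roots[root] = frozenset(supervoxels)
        return root

    def add(self, supervoxels: Iterable[int]) -> int:
        return self._mint(supervoxels)

    def merge(self, a: int, b: int) -> int:
        """Retire both roots and mint one new root holding all their supervoxels."""
        if not (self.is_latest(a) and self.is_latest(b)):
            raise ValueError("can only merge current root IDs")
        new = self._mint(self.roots[a] | self.roots[b])
        self.retired.update({a, b})
        return new

    def split(self, root: int, part: Iterable[int]) -> Tuple[int, int]:
        """Retire `root` and mint two new roots: `part` and everything else."""
        if not self.is_latest(root):
            raise ValueError("can only split a current root ID")
        piece = frozenset(part)
        rest = self.roots[root] - piece
        if not piece or not rest or not piece <= self.roots[root]:
            raise ValueError("a split must yield two non-empty pieces")
        new_a, new_b = self._mint(piece), self._mint(rest)
        self.retired.add(root)
        return new_a, new_b

    def is_latest(self, root: int) -> bool:
        return root in self.roots and root not in self.retired


seg = ToySegmentation()
a = seg.add({10, 11})
b = seg.add({12})
m = seg.merge(a, b)         # root 3 = {10, 11, 12}; roots 1 and 2 are retired
x, y = seg.split(m, {10})   # root 4 = {10}, root 5 = {11, 12}; root 3 is retired
print(seg.is_latest(m), seg.is_latest(x))  # False True
print(seg.roots[m])                        # outdated roots are still retrievable
```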

Importantly, “outdated” root IDs are not deleted: you can still pull up e.g. their meshes in the FlyWire neuroglancer. This is super convenient but comes with a caveat: you can end up with a list of root IDs that never co-existed, which is problematic when querying associated metadata (see the next section).
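Whether a set of root IDs ever co-existed boils down to intersecting their lifetimes. The sketch below assumes we already know, for each root ID, when it was created and when (if ever) it was superseded; the common_window() helper and the lifetimes data are hypothetical. This is the kind of question find_common_time() answers, but it is not that function's implementation.

```python
from datetime import datetime
from typing import Dict, Optional, Tuple

# Hypothetical lifetimes: root ID -> (created, superseded); None means "still current".
Lifetime = Tuple[datetime, Optional[datetime]]


def common_window(lifetimes: Dict[int, Lifetime]) -> Optional[Tuple[datetime, Optional[datetime]]]:
    """Return the half-open window [start, end) in which all roots co-existed, or None."""
    if not lifetimes:
        raise ValueError("need at least one root ID")
    start = max(created for created, _ in lifetimes.values())
    ends = [end for _, end in lifetimes.values() if end is not None]
    end = min(ends) if ends else None   # None: the window is still open
    if end is not None and start >= end:
        return None                     # the roots never co-existed
    return start, end


lifetimes = {
    101: (datetime(2023, 1, 1), datetime(2023, 3, 1)),
    102: (datetime(2023, 2, 1), None),
    103: (datetime(2023, 4, 1), None),
}
print(common_window({r: lifetimes[r] for r in (101, 102)}))  # Feb 1 to Mar 1, 2023
print(common_window(lifetimes))  # None: 101 was superseded before 103 was created
```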

Here are a few fafbseg functions that will help you track root IDs:

  • fafbseg.flywire.locs_to_segments(locs[, ...]): Retrieve FlyWire segment (i.e. root) IDs at given location(s).

  • fafbseg.flywire.locs_to_supervoxels(locs[, ...]): Retrieve FlyWire supervoxel IDs at given location(s).

  • fafbseg.flywire.supervoxels_to_roots(x[, ...]): Get root(s) for given supervoxel(s).

  • fafbseg.flywire.is_latest_root(id[, ...]): Check if root is the current one.

  • fafbseg.flywire.update_ids(id[, stop_layer, ...]): Retrieve the most recent version of given FlyWire (root) ID(s).

  • fafbseg.flywire.find_common_time(root_ids[, ...]): Find a time at which given root IDs co-existed.

  • fafbseg.flywire.find_mat_version(ids[, ...]): Find a materialization version (or live) for given IDs.

Materializations and the CAVE

As established above, root IDs can change over time. So how do we maintain the link between a neuron and its metadata (e.g. its annotations, synapses, etc.) as it evolves? Principally, this is done by associating each annotation with an x/y/z coordinate. That coordinate in turn maps to a supervoxel, and we can then ask which root ID that supervoxel currently belongs to, or belonged to at some point in the past.
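The look-up chain described above (location to supervoxel to root ID at a given time) can be sketched with a couple of dictionaries. This is a toy illustration under stated assumptions: voxel_to_supervoxel, root_history and root_at_location() are hypothetical names, and all IDs, coordinates and dates are made up. The point it shows is that the location-to-supervoxel step is fixed, while the supervoxel-to-root step depends on the time you ask about.

```python
from bisect import bisect_right
from datetime import datetime
from typing import Dict, List, Tuple

# Hypothetical toy data, not real FlyWire IDs or coordinates.
# Location -> supervoxel: fixed, because supervoxels never change.
voxel_to_supervoxel: Dict[Tuple[int, int, int], int] = {(1, 2, 3): 7}

# Supervoxel -> [(time it joined a root, root ID), ...], sorted by time.
root_history: Dict[int, List[Tuple[datetime, int]]] = {
    7: [(datetime(2023, 1, 1), 1),    # part of root 1 initially
        (datetime(2023, 2, 15), 3),   # an edit moved it into root 3
        (datetime(2023, 5, 2), 5)],   # a later edit moved it into root 5
}


def root_at_location(loc: Tuple[int, int, int], when: datetime) -> int:
    """Return the root ID that the supervoxel at `loc` belonged to at `when`."""
    supervoxel = voxel_to_supervoxel[loc]
    entries = root_history[supervoxel]
    # Index of the last entry whose time is <= `when`.
    i = bisect_right([t for t, _ in entries], when) - 1
    if i < 0:
        raise ValueError(f"no root recorded for supervoxel {supervoxel} at {when}")
    return entries[i][1]


print(root_at_location((1, 2, 3), datetime(2023, 3, 1)))  # 3
print(root_at_location((1, 2, 3), datetime(2023, 6, 1)))  # 5
```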

This kind of location-to-root-ID look-up becomes rather expensive for large tables: the (filtered) synapse table, for example, has 130M rows, each with a pre- and a postsynaptic x/y/z coordinate that needs to be mapped to a root ID.

Fortunately, all of this is done for you by CAVE, the Connectome Annotation Versioning Engine. The gist is this: (almost) every night, CAVE looks up the current root IDs for the synaptic connections, the community annotations and the various other tables it stores. These snapshots are called “materializations”. Note that the public dataset only contains a limited set of these materializations.

If we make sure that our root IDs were “alive” at one of the available materialization versions, we can query those tables with very little overhead on our end. Things get tricky if:

  • root IDs are more recent than the latest materialization

  • root IDs only existed briefly in between materializations

  • root IDs never co-existed at any of the materializations
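The three cases above can be checked mechanically: a set of root IDs is usable at a materialization only if every one of them was alive at that version's timestamp. The following sketch picks the most recent such version. The find_version() helper, the version numbers and the lifetimes are all hypothetical; in fafbseg, find_mat_version() addresses this question against the real server, and we make no claim that it works this way internally.

```python
from datetime import datetime
from typing import Dict, Iterable, Optional, Tuple

# Hypothetical lifetimes: root ID -> (created, superseded); None means "still current".
Lifetime = Tuple[datetime, Optional[datetime]]


def alive_at(lifetime: Lifetime, t: datetime) -> bool:
    created, superseded = lifetime
    return created <= t and (superseded is None or t < superseded)


def find_version(root_ids: Iterable[int],
                 lifetimes: Dict[int, Lifetime],
                 versions: Dict[int, datetime]) -> Optional[int]:
    """Return the most recent version at which all root IDs were alive, else None."""
    ids = list(root_ids)
    # Walk versions from newest to oldest and stop at the first match.
    for version, t in sorted(versions.items(), key=lambda kv: kv[1], reverse=True):
        if all(alive_at(lifetimes[r], t) for r in ids):
            return version
    return None


versions = {1: datetime(2023, 1, 10), 2: datetime(2023, 2, 10), 3: datetime(2023, 3, 10)}
lifetimes = {
    101: (datetime(2023, 1, 1), datetime(2023, 2, 20)),
    102: (datetime(2023, 2, 1), None),
    103: (datetime(2023, 3, 15), None),                   # newer than the latest version
    104: (datetime(2023, 1, 12), datetime(2023, 1, 20)),  # lived only between two versions
}
print(find_version([101, 102], lifetimes, versions))  # 2
print(find_version([103], lifetimes, versions))       # None
print(find_version([104], lifetimes, versions))       # None
print(find_version([101, 103], lifetimes, versions))  # None (never co-existed)
```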

fafbseg tries to abstract away many of these complications. In fact, the relevant functions such as get_synapses() accept a materialization parameter that defaults to “auto”, which tries to find a matching materialization version and complains if that isn’t possible.

In practice, the safe bet is to pick a materialization to work with and stick with it for your analyses. If you are working with the public release data, this isn’t much of a problem since you have only very few versions and no “live” data to work with anyway. Use get_materialization_versions() to get a list of available versions.

Let’s explore this a bit:

>>> # Import the flywire module
>>> from fafbseg import flywire

>>> # We will use the public dataset for this tutorial
>>> flywire.set_default_dataset("public")
Default dataset set to "public".
>>> # Check which materializations are available
>>> flywire.get_materialization_versions()
            time_stamp           expires_on  is_merged            datastack  version  valid     status   id
0  2023-03-21 08:10:00  2121-11-10 07:10:00       True  flywire_fafb_public      630   True  AVAILABLE  718

As you can see, at the time of writing there is only a single materialization available for the public release dataset: 630.

This also means that all queries automatically go against that materialization:

>>> # Fetch the root IDs at given x/y/z coordinate(s)
>>> roots = flywire.locs_to_segments([[75350, 60162, 3162]])
>>> roots
array([720575940631680813])
>>> # We can also specify a timestamp matching the materialization version
>>> # which will be useful later when more versions are available
>>> roots = flywire.locs_to_segments([[75350, 60162, 3162]], timestamp='mat_630')
>>> roots
array([720575940631680813])

What if you’re given a list of root IDs and want to check if they are still up-to-date - or match a given materialization version?

>>> # Check if root IDs are outdated (i.e. have more recent edits)
>>> flywire.is_latest_root([720575940625431866, 720575940621835755])
array([ True, False])
>>> # Likewise, we can ask if they were current at a given materialization
>>> flywire.is_latest_root([720575940625431866, 720575940621835755],
...                        timestamp='mat_630')
array([ True, False])

Is there a way to map root IDs back and forth? There is! We can take a root ID, find its constituent supervoxels and then ask which root IDs they belonged to at a given point in time. This is what update_ids() does:

>>> updated = flywire.update_ids(
...     [720575940621835755, 720575940608788840, 720575940628913983]
... )
>>> updated
               old_id              new_id  confidence  changed
0  720575940621835755  720575940636873791        0.99     True
1  720575940608788840  720575940636873791        1.00     True
2  720575940628913983  720575940636873791        0.94     True

In the above example, all old IDs are “ancestors” of the same current root ID. Note that by default update_ids() maps to the most recent version, but it also accepts a timestamp parameter that lets us map to a specific point in time.
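The mapping described above (take a root ID's supervoxels and ask which root IDs they belong to now) can be sketched as a majority vote. This is a simplified illustration, not fafbseg's implementation: update_id() and current_root_of are hypothetical, and defining the confidence as the fraction of supervoxels that landed in the chosen root is our assumption.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def update_id(old_supervoxels: Iterable[int],
              current_root_of: Dict[int, int]) -> Tuple[int, float]:
    """Map an outdated root (given by its supervoxels) to the current root that
    contains most of them; confidence = fraction of supervoxels in that root
    (an assumption made for this sketch)."""
    svs = set(old_supervoxels)
    if not svs:
        raise ValueError("need at least one supervoxel")
    counts = Counter(current_root_of[sv] for sv in svs)
    new_root, n = counts.most_common(1)[0]
    return new_root, round(n / len(svs), 2)


# Hypothetical current state: supervoxel -> current root ID.
current_root_of = {1: 900, 2: 900, 3: 900, 4: 901}
print(update_id({1, 2, 3, 4}, current_root_of))  # (900, 0.75): one supervoxel was split off
print(update_id({1, 2}, current_root_of))        # (900, 1.0)
```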

Want to track how a neuron was edited over time? Easy:

>>> edits = flywire.get_edit_history(720575940625431866)
>>> edits.head()
         after_root_ids                           before_root_ids  is_merge  operation_id             segment                timestamp   user_affiliation  user_id   user_name
2  [720575940625153661]                      [720575940613909190]     False        546853  720575940625431866  2021-08-19 09:10:18.090  Greg Jefferis Lab      957  Varun Sane
3  [720575940626449738]  [720575940617774213, 720575940625153661]      True        546854  720575940625431866  2021-08-19 09:10:36.280  Greg Jefferis Lab      957  Varun Sane
4  [720575940604045489]                      [720575940618706267]     False        546855  720575940625431866  2021-08-19 09:11:20.009  Greg Jefferis Lab      957  Varun Sane
5  [720575940626907179]  [720575940604045489, 720575940626449738]      True        546856  720575940625431866  2021-08-19 09:11:34.230  Greg Jefferis Lab      957  Varun Sane
6  [720575940604045745]  [720575940626907179, 720575940626995629]      True        546857  720575940625431866  2021-08-19 09:12:06.042  Greg Jefferis Lab      957  Varun Sane
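Each row of such an edit history links before_root_ids to after_root_ids, so a neuron's lineage can be recovered by walking those links backwards. The sketch below does this with a breadth-first search over a hypothetical, hand-written edit log (plain dicts that reuse the column names of the table above, with made-up small IDs); the ancestors() helper is illustrative and not part of fafbseg.

```python
from collections import deque
from typing import Dict, List, Set

# Hypothetical edit log with made-up IDs; each operation turns the roots in
# "before_root_ids" into the roots in "after_root_ids".
operations = [
    {"operation_id": 1, "before_root_ids": [10], "after_root_ids": [11, 12], "is_merge": False},
    {"operation_id": 2, "before_root_ids": [12, 20], "after_root_ids": [21], "is_merge": True},
    {"operation_id": 3, "before_root_ids": [21], "after_root_ids": [22, 23], "is_merge": False},
]


def ancestors(root: int, ops: List[dict]) -> Set[int]:
    """Return every root ID that fed, directly or indirectly, into `root`."""
    parents: Dict[int, List[int]] = {}
    for op in ops:
        for child in op["after_root_ids"]:
            parents.setdefault(child, []).extend(op["before_root_ids"])
    seen: Set[int] = set()
    queue = deque([root])
    while queue:  # breadth-first walk back through the edits
        for parent in parents.get(queue.popleft(), []):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen


print(sorted(ancestors(22, operations)))  # [10, 12, 20, 21]
```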

Please see the API documentation for a full list of segmentation-related functions.