A primer on the FlyWire segmentation

In this tutorial you will learn all you need to know about the FlyWire segmentation.

Before we get started, a quick primer on terminology:

In FlyWire, the ID of a neuron (e.g. 720575940618780781) is called the “root ID”.
Each root ID is a collection of “supervoxels”. These supervoxels are the atomic, immutable units of the segmentation.
Every time a neuron is edited (i.e. supervoxels are added or removed by merging or splitting), a new root ID is created.
A “materialization” is a snapshot of the segmentation at a given point in time.

If you work in the FlyWire production dataset, you will have to deal with the fact that root IDs are constantly changing as people keep improving the segmentation through proofreading. If you are working with the public release datasets, you will likely stick to root IDs that match one of the available materialization versions. Please find more detailed explanations below.

FlyWire datasets

FlyWire actually has three different datasets/versions:

The “Public release” contains static snapshots of the segmentation which correspond to specific materialization versions (see below for an explanation of materializations). For example, the first ever public release was materialization 630. Anyone has access to this dataset after signing up through the FlyWire website.
The “Production” dataset is where people do the actual proofreading/annotation. As such it is ahead of the publicly released snapshots. To get access to the production dataset you have to be approved by one of the community managers.
Last but not least, “Sandbox” is a training ground that has seen minimal proofreading (i.e. is close to the base segmentation). Anyone has access to this dataset after signing up.

Most functions in fafbseg.flywire accept a dataset parameter. As of fafbseg version 3.0.0 the default dataset is the public one.

>>> from fafbseg import flywire
>>> # Defaults to public dataset
>>> flywire.supervoxels_to_roots(79801523353597754)
array([720575940621675174])
>>> # Specifically query the production dataset
>>> flywire.supervoxels_to_roots(79801523353597754, dataset='production')
array([720575940631274967])

You can change this default by running this at the beginning of each session:

>>> from fafbseg import flywire
>>> flywire.set_default_dataset('production')

See the docstring for set_default_dataset() for details.

Alternatively, you can set a FLYWIRE_DEFAULT_DATASET environment variable before starting the Python session.

$ export FLYWIRE_DEFAULT_DATASET="public"
$ python

Environment variables can be set permanently too. The details of that depend on your operating system and on which terminal (e.g. bash or zsh) you are using. A quick Google should tell you how it works.
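To make the order of precedence concrete, here is a minimal sketch of how a default could be resolved: an explicit `dataset` argument wins, then the environment variable, then "public". Note that `resolve_dataset` is a hypothetical helper written for illustration only; it is not part of fafbseg and does not claim to mirror its internals.

```python
import os


def resolve_dataset(dataset=None):
    """Hypothetical helper (not part of fafbseg) to pick a dataset name.

    Order of precedence: explicit argument > environment variable > "public".
    """
    if dataset is not None:
        return dataset
    return os.environ.get("FLYWIRE_DEFAULT_DATASET", "public")


# An explicit argument always wins, regardless of the environment
explicit = resolve_dataset("sandbox")

# Without an argument the result depends on your environment
from_environment = resolve_dataset()
```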

FlyWire root IDs - the details

Under the hood FlyWire is using chunkedgraph, an octree-like structure, to manage the segmentation. In brief: “supervoxels” are the atomic unit of the segmentation which are grouped into “root IDs”. Or conversely: each root ID is a collection of supervoxels. Any edit to the segmentation is effectively just the addition or subtraction of supervoxels to that collection.

Like supervoxels, root IDs are immutable though. So whenever edits are made new root IDs are generated which then represent the post-edit agglomeration of supervoxels. For example, splitting a neuron will generate two new root IDs and invalidate its current root ID. Merging two neurons, on the other hand, will invalidate the two old root IDs and generate one new root ID representing the combination of their supervoxels.
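The merge and split behaviour described above can be sketched with a toy model in plain Python. This is not how the chunkedgraph is implemented; the class `ToySegmentation`, its methods and the supervoxel names are all made up for illustration. The point is only that edits never modify an existing root ID but always mint new ones.

```python
from itertools import count


class ToySegmentation:
    """Toy model: root IDs are immutable labels for frozen sets of supervoxels."""

    def __init__(self):
        self._next_id = count(1)
        self.roots = {}       # root ID -> frozenset of supervoxels (never modified)
        self.current = set()  # root IDs that have not been superseded by an edit

    def _new_root(self, supervoxels):
        root_id = next(self._next_id)
        self.roots[root_id] = frozenset(supervoxels)
        self.current.add(root_id)
        return root_id

    def add_neuron(self, supervoxels):
        return self._new_root(supervoxels)

    def merge(self, root_a, root_b):
        """Invalidate two roots and return one new root with all their supervoxels."""
        self.current -= {root_a, root_b}
        return self._new_root(self.roots[root_a] | self.roots[root_b])

    def split(self, root, part):
        """Invalidate one root and return two new roots."""
        part = frozenset(part)
        rest = self.roots[root] - part
        self.current.discard(root)
        return self._new_root(part), self._new_root(rest)


seg = ToySegmentation()
a = seg.add_neuron({"sv1", "sv2"})
b = seg.add_neuron({"sv3"})
merged = seg.merge(a, b)                  # a and b are now outdated
left, right = seg.split(merged, {"sv1"})  # merged is now outdated too
```

After these edits only `left` and `right` are current, yet `seg.roots` still holds the supervoxels of `a`, `b` and `merged`.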

Importantly, “outdated” root IDs are not deleted and you can still pull up e.g. their meshes in the FlyWire neuroglancer. This is super convenient but it comes with a caveat: you can find yourself with a list of root IDs that never co-existed which will be problematic when querying associated meta data (see paragraph below).

Here are a couple of fafbseg functions that will help you track root IDs:

`fafbseg.flywire.locs_to_segments`(locs[, ...])	Retrieve FlyWire segment (i.e. root) IDs at given location(s).
`fafbseg.flywire.locs_to_supervoxels`(locs[, ...])	Retrieve supervoxel IDs at given location(s).
`fafbseg.flywire.supervoxels_to_roots`(x[, ...])	Get root(s) for given supervoxel(s).
`fafbseg.flywire.is_latest_root`(id[, ...])	Check if root is the current one.
`fafbseg.flywire.update_ids`(id[, stop_layer, ...])	Retrieve the most recent version of given FlyWire (root) ID(s).
`fafbseg.flywire.find_common_time`(root_ids[, ...])	Find a time at which given root IDs co-existed.
`fafbseg.flywire.find_mat_version`(ids[, ...])	Find a materialization version (or live) for given IDs.

Materializations and the CAVE

As established above, root IDs can change over time. So how do we maintain the link between a neuron and its meta data (e.g. its annotations, synapses, etc.) as it evolves? Principally this is done by associating each annotation with an x/y/z coordinate. That coordinate in turn maps to a supervoxel and we can then ask which root ID it currently belongs to - or belonged to if we want to go back in time.
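The coordinate to supervoxel to root ID chain can be sketched with a few dictionaries. All data below is invented and the function `root_at` is a hypothetical stand-in, not a fafbseg or CAVE call; times are plain integers to keep things short.

```python
from bisect import bisect_right

# Hypothetical toy data: one annotation anchored at an x/y/z coordinate
annotation = {"label": "example annotation", "xyz": (10, 20, 30)}

# Coordinate -> supervoxel never changes because supervoxels are immutable
supervoxel_at = {(10, 20, 30): "sv1"}

# Supervoxel -> root ID changes with edits: (time of edit, root ID from then on)
root_history = {"sv1": [(0, 100), (5, 101), (9, 102)]}


def root_at(xyz, t):
    """Return the root ID that the supervoxel at `xyz` belonged to at time `t`."""
    history = root_history[supervoxel_at[xyz]]
    times = [start for start, _ in history]
    idx = bisect_right(times, t) - 1
    if idx < 0:
        raise ValueError("no root ID existed at that time")
    return history[idx][1]


root_now = root_at(annotation["xyz"], t=10)   # 102
root_then = root_at(annotation["xyz"], t=6)   # 101
```

The annotation itself never has to be rewritten: only the supervoxel to root ID mapping moves on with each edit.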

This kind of location to root ID look-up becomes rather expensive when working with large tables: the (filtered) synapse table, for example, has 130M rows each with a pre- and a postsynaptic x/y/z coordinate that needs to be mapped to a root ID.

Fortunately, all of this is done for you by CAVE, the *c*onnectome *a*nnotation *v*ersioning *e*ngine. The gist is this: (almost) every night CAVE looks up the current root IDs for the synaptic connections, the community annotations and the various other tables it stores. These snapshots are called “materializations”. Note that the public dataset only contains a limited set of these materializations.

If we make sure that our root IDs were “alive” at one of the available materialization versions, we can query those tables with very little overhead on our end. Things get tricky if:

root IDs are more recent than the latest materialization
root IDs only existed briefly in between materializations
root IDs never co-existed at any of the materializations
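The three tricky cases are easier to see with a toy example. The lifespans, version numbers and the helpers `alive_at` and `matching_versions` below are all invented for illustration; for real data use the fafbseg functions listed earlier, such as find_mat_version().

```python
# Hypothetical lifespans: root ID -> (created, superseded); None = still current
lifespans = {
    "A": (0, 12),
    "B": (4, None),
    "C": (15, None),
    "D": (6, 8),      # existed only briefly in between materializations
    "E": (25, None),  # more recent than the latest materialization
}

# Hypothetical materialization version -> time of the snapshot
materializations = {1: 5, 2: 10, 3: 20}


def alive_at(root, t):
    """True if `root` had been created and not yet superseded at time `t`."""
    created, superseded = lifespans[root]
    return created <= t and (superseded is None or t < superseded)


def matching_versions(roots):
    """Return all materialization versions at which every root was alive."""
    return [
        version
        for version, t in materializations.items()
        if all(alive_at(root, t) for root in roots)
    ]


print(matching_versions(["A", "B"]))  # [1, 2]
print(matching_versions(["A", "C"]))  # [] - never co-existed at a materialization
print(matching_versions(["D"]))       # [] - lived only between versions 1 and 2
print(matching_versions(["E"]))       # [] - created after the latest version
```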

fafbseg tries to abstract away a lot of the complications - in fact the relevant functions such as get_synapses() accept a materialization parameter that defaults to “auto” which will try to find a matching materialization version and complain if that isn’t possible.

In practice, the safe bet is to pick a materialization to work with and stick with it for your analyses. If you are working with the public release data, this isn’t much of a problem since you have only very few versions and no “live” data to work with anyway. Use get_materialization_versions() to get a list of available versions.

Let’s explore this a bit:

>>> # Import the flywire module
>>> from fafbseg import flywire

>>> # We will use the public dataset for this tutorial
>>> flywire.set_default_dataset("public")

Default dataset set to "public".

>>> # Check which materializations are available
>>> flywire.get_materialization_versions()

	time_stamp	expires_on	is_merged	datastack	version	valid	status	id
0	2023-03-21 08:10:00	2121-11-10 07:10:00	True	flywire_fafb_public	630	True	AVAILABLE	718

As you can see, at the time of writing there is only a single materialization available for the public release dataset: 630.

This also means that all queries automatically go against that materialization:

>>> # Fetch the root IDs at given x/y/z coordinate(s)
>>> roots = flywire.locs_to_segments([[75350, 60162, 3162]])
>>> roots

array([720575940631680813])

>>> # We can also specify a timestamp matching the materialization version
>>> # which will be useful later when more versions are available
>>> roots = flywire.locs_to_segments([[75350, 60162, 3162]], timestamp='mat_630')
>>> roots

array([720575940631680813])

What if you’re given a list of root IDs and want to check if they are still up-to-date - or match a given materialization version?

>>> # Check if root IDs are up-to-date (False = has more recent edits)
>>> flywire.is_latest_root([720575940625431866, 720575940621835755])

array([ True, False])

>>> # Likewise, we can ask if they were current at a given materialization
>>> flywire.is_latest_root([720575940625431866, 720575940621835755],
...                        timestamp='mat_630')

array([ True, False])

Is there a way to map root IDs back and forth? There is! We can take a root ID, find its constituent supervoxels and then ask which root IDs they belonged to at a given point in time. This is what update_ids() does:

>>> updated = flywire.update_ids(
...     [720575940621835755, 720575940608788840, 720575940628913983]
... )
>>> updated

	old_id	new_id	confidence	changed
0	720575940621835755	720575940636873791	0.99	True
1	720575940608788840	720575940636873791	1.00	True
2	720575940628913983	720575940636873791	0.94	True

In the above example all old IDs are “ancestors” to the same current root ID. Note that by default, update_ids() will map to the most current version but it also accepts a timestamp parameter which lets us map to a specific point in time.
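To build some intuition for the confidence column, here is a toy version of the supervoxel-based mapping. It assumes that the new ID is the current root holding most of the old root's supervoxels and that confidence is the fraction of supervoxels that ended up there. That is an assumption made for illustration; check the update_ids() docstring for the actual definition. The data and the helper `map_to_current` are invented.

```python
from collections import Counter

# Hypothetical toy data: supervoxels of an outdated root and their current roots
old_root_supervoxels = ["sv1", "sv2", "sv3", "sv4"]
current_root_of = {"sv1": 900, "sv2": 900, "sv3": 900, "sv4": 901}


def map_to_current(supervoxels, current_root_of):
    """Pick the current root holding most supervoxels; confidence = that fraction."""
    counts = Counter(current_root_of[sv] for sv in supervoxels)
    new_root, n = counts.most_common(1)[0]
    return new_root, n / len(supervoxels)


new_root, confidence = map_to_current(old_root_supervoxels, current_root_of)
print(new_root, confidence)  # 900 0.75
```

Here one of the four supervoxels was split off into a different root, so the mapping to root 900 comes with a confidence below 1.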

Want to track how a neuron was edited over time? Easy:

>>> edits = flywire.get_edit_history(720575940625431866)
>>> edits.head()

	after_root_ids	before_root_ids	is_merge	operation_id	segment	timestamp	user_affiliation	user_id	user_name
2	[720575940625153661]	[720575940613909190]	False	546853	720575940625431866	2021-08-19 09:10:18.090	Greg Jefferis Lab	957	Varun Sane
3	[720575940626449738]	[720575940617774213, 720575940625153661]	True	546854	720575940625431866	2021-08-19 09:10:36.280	Greg Jefferis Lab	957	Varun Sane
4	[720575940604045489]	[720575940618706267]	False	546855	720575940625431866	2021-08-19 09:11:20.009	Greg Jefferis Lab	957	Varun Sane
5	[720575940626907179]	[720575940604045489, 720575940626449738]	True	546856	720575940625431866	2021-08-19 09:11:34.230	Greg Jefferis Lab	957	Varun Sane
6	[720575940604045745]	[720575940626907179, 720575940626995629]	True	546857	720575940625431866	2021-08-19 09:12:06.042	Greg Jefferis Lab	957	Varun Sane
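As a small follow-up, you can tally merges against non-merge edits. The sketch below copies the operation_id and is_merge values of the five rows shown above into plain Python so that it runs without a FlyWire connection; treating every non-merge edit as a split is an assumption.

```python
# The five rows shown above, reduced to operation ID and merge flag
edit_rows = [
    {"operation_id": 546853, "is_merge": False},
    {"operation_id": 546854, "is_merge": True},
    {"operation_id": 546855, "is_merge": False},
    {"operation_id": 546856, "is_merge": True},
    {"operation_id": 546857, "is_merge": True},
]

n_merges = sum(row["is_merge"] for row in edit_rows)
n_splits = len(edit_rows) - n_merges  # assumes every non-merge edit is a split

print(n_merges, n_splits)  # 3 2
```

On the actual DataFrame the same count should be available via `edits["is_merge"].sum()`.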

Please see the API documentation for a full list of segmentation-related functions.