A primer on the FlyWire segmentation
In this tutorial you will learn all you need to know about the FlyWire segmentation.
Before we get started, a quick primer on terminology:
In FlyWire, the ID of a neuron (e.g. 720575940618780781) is called the “root ID”.
Each root ID is a collection of “supervoxels”. These supervoxels are the atomic, immutable units of the segmentation.
Every time a neuron is edited (i.e. supervoxels are added or removed by merging or splitting), a new root ID is created.
A “materialization” is a snapshot of the segmentation at a given point in time.
If you work in the FlyWire production dataset, you will have to deal with the fact that root IDs are constantly changing as people keep improving the segmentation through proofreading. If you are working with the public release datasets, you will likely stick to root IDs that match one of the available materialization versions. Please find more detailed explanations below.
FlyWire datasets
FlyWire actually has three different datasets/versions:
The “Public release” contains static snapshots of the segmentation which correspond to specific materialization versions (see below for an explanation of materializations). For example, the first ever public release was materialization 630. Anyone has access to this dataset after signing up through the FlyWire website.
The “Production” dataset is where people do the actual proofreading/annotation. As such it is ahead of the publicly released snapshots. To get access to the production dataset you have to be approved by one of the community managers.
Last but not least, “Sandbox” is a training ground that has seen minimal proofreading (i.e. is close to the base segmentation). Anyone has access to this dataset after signing up.
Most functions in fafbseg.flywire accept a dataset parameter. As of fafbseg version 3.0.0 the default dataset is the public one.
>>> from fafbseg import flywire
>>> # Defaults to public dataset
>>> flywire.supervoxels_to_roots(79801523353597754)
array([720575940621675174])
>>> # Specifically query the production dataset
>>> flywire.supervoxels_to_roots(79801523353597754, dataset='production')
array([720575940631274967])
You can change this default by running this at the beginning of each session:
>>> from fafbseg import flywire
>>> flywire.set_default_dataset('production')
See the docstring for set_default_dataset() for details.
Alternatively, you can also set a FLYWIRE_DEFAULT_DATASET environment variable before starting the Python session.
$ export FLYWIRE_DEFAULT_DATASET="public"
$ python
Environment variables can be set permanently too. The details of that depend on your operating system and on which terminal (e.g. bash or zsh) you are using. A quick Google should tell you how it works.
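If you would rather not touch your shell configuration, the variable can also be set from within Python. This is only a sketch: it assumes that fafbseg reads FLYWIRE_DEFAULT_DATASET when it is first imported, so the assignment has to happen before the import.

```python
import os

# Assumption: the variable is read when fafbseg is first loaded,
# so set it before any `from fafbseg import flywire` line.
os.environ["FLYWIRE_DEFAULT_DATASET"] = "public"
```

If in doubt, calling set_default_dataset() as shown earlier is the more explicit route.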
FlyWire root IDs - the details
Under the hood FlyWire is using chunkedgraph, an octree-like structure, to manage the segmentation. In brief: “supervoxels” are the atomic unit of the segmentation which are grouped into “root IDs”. Or conversely: each root ID is a collection of supervoxels. Any edit to the segmentation is effectively just the addition or subtraction of supervoxels to that collection.
Like supervoxels, root IDs are immutable though. So whenever edits are made new root IDs are generated which then represent the post-edit agglomeration of supervoxels. For example, splitting a neuron will generate two new root IDs and invalidate its current root ID. Merging two neurons, on the other hand, will invalidate the two old root IDs and generate one new root ID representing the combination of their supervoxels.
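To make the bookkeeping concrete, here is a toy model of merges and splits in plain Python. It is not how the chunkedgraph is implemented: the small integer IDs, the string supervoxel names and the helper functions are all made up for illustration.

```python
import itertools

# Toy model only: real root IDs are large integers issued by the chunkedgraph.
_next_root = itertools.count(1)

def new_root(supervoxels):
    """Create an immutable (root ID, supervoxel set) pair with a fresh ID."""
    return next(_next_root), frozenset(supervoxels)

def merge(a, b):
    """Merging invalidates both inputs and yields one new root."""
    return new_root(a[1] | b[1])

def split(root, part):
    """Splitting invalidates the input and yields two new roots."""
    part = frozenset(part)
    return new_root(part), new_root(root[1] - part)

neuron_a = new_root({"sv1", "sv2"})      # root ID 1
neuron_b = new_root({"sv3"})             # root ID 2
merged = merge(neuron_a, neuron_b)       # root ID 3
left, right = split(merged, {"sv1"})     # root IDs 4 and 5
```

Note that neuron_a and neuron_b still exist after the merge. They are simply no longer the current roots of their supervoxels, which mirrors the “outdated” root IDs discussed next.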
Importantly, “outdated” root IDs are not deleted and you can still pull up e.g. their meshes in the FlyWire neuroglancer. This is super convenient but it comes with a caveat: you can find yourself with a list of root IDs that never co-existed which will be problematic when querying associated meta data (see paragraph below).
Here are a couple of fafbseg functions that will help you track root IDs:
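Function | Description |
---|---|
flywire.locs_to_segments() | Retrieve FlyWire segment (i.e. root) IDs at given location(s). |
flywire.locs_to_supervoxels() | Retrieve FlyWire supervoxel IDs at given location(s). |
flywire.supervoxels_to_roots() | Get root(s) for given supervoxel(s). |
flywire.is_latest_root() | Check if root is the current one. |
flywire.update_ids() | Retrieve the most recent version of given FlyWire (root) ID(s). |
flywire.find_common_time() | Find a time at which given root IDs co-existed. |
flywire.find_mat_version() | Find a materialization version (or live) for given IDs. |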
Materializations and the CAVE
As established above, root IDs can change over time. So how do we maintain the link between a neuron and its meta data (e.g. its annotations, synapses, etc.) as it evolves? Principally this is done by associating each annotation with an x/y/z coordinate. That coordinate in turn maps to a supervoxel and we can then ask which root ID it currently belongs to - or belonged to if we want to go back in time.
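The chain of look-ups described above (coordinate, then supervoxel, then root ID at a given time) can be sketched with a couple of dictionaries. All names, IDs and time units below are hypothetical toy data. The real look-ups are performed by the FlyWire servers.

```python
from bisect import bisect_right

# Hypothetical toy data: an annotation is pinned to a coordinate, the
# coordinate to a supervoxel, and the supervoxel has a history of root IDs.
coord_to_supervoxel = {(10, 20, 30): "sv1"}

# For each supervoxel: (time the root became valid, root ID), sorted by time.
root_history = {"sv1": [(0, 100), (5, 101), (9, 102)]}

def root_at(coord, t):
    """Return the root ID that owned the supervoxel at `coord` at time `t`."""
    history = root_history[coord_to_supervoxel[coord]]
    times = [start for start, _ in history]
    idx = bisect_right(times, t) - 1  # last entry that started at or before t
    if idx < 0:
        raise ValueError("No root ID existed at that time")
    return history[idx][1]

current = root_at((10, 20, 30), t=12)  # latest root ID
past = root_at((10, 20, 30), t=5)      # root ID valid at t=5
```

The annotation itself never changes. Only the answer to “which root ID owns this supervoxel at time t?” does.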
This kind of location to root ID look-up becomes rather expensive when working with large tables: the (filtered) synapse table, for example, has 130M rows each with a pre- and a postsynaptic x/y/z coordinate that needs to be mapped to a root ID.
Fortunately, all of this is done for you by CAVE, the *c*onnectome *a*nnotation *v*ersioning *e*ngine. The gist is this: (almost) every night CAVE looks up the current root IDs for the synaptic connections, the community annotations and the various other tables it stores. These snapshots are called “materializations”. Note that the public dataset only contains a limited set of these materializations.
If we make sure that our root IDs were “alive” at one of the available materialization versions, we can query those tables with very little overhead on our end. Things get tricky if:
root IDs are more recent than the latest materialization
root IDs only existed briefly in between materializations
root IDs never co-existed at any of the materializations
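A small toy example may help to see why these cases are tricky. The lifespans, version names and helper functions below are invented for illustration and say nothing about how fafbseg or CAVE implement the check. It also assumes that a root ID stops being valid at the exact moment it is superseded.

```python
# Hypothetical lifespans (created, superseded) in arbitrary time units;
# None means the root ID is still current.
lifespans = {
    "root_a": (0, 12),
    "root_b": (8, None),
    "root_c": (13, None),
}
materializations = {"mat_1": 10, "mat_2": 20}  # version -> snapshot time

def alive_at(root, t):
    """True if `root` had been created and not yet superseded at time `t`."""
    created, superseded = lifespans[root]
    return created <= t and (superseded is None or t < superseded)

def matching_versions(roots):
    """Return the materialization versions at which all roots were alive."""
    return [v for v, t in materializations.items()
            if all(alive_at(r, t) for r in roots)]

print(matching_versions(["root_a", "root_b"]))  # ['mat_1']
print(matching_versions(["root_a", "root_c"]))  # []
```

Here root_a and root_c never co-existed, so no single materialization can serve a query that involves both.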
fafbseg tries to abstract away a lot of the complications - in fact the
relevant functions such as get_synapses() accept a materialization parameter
that defaults to “auto” which will try to find a matching materialization
version and complain if that isn’t possible.
In practice, the safe bet is to pick a materialization to work with and stick
with it for your analyses. If you are working with the public release data, this
isn’t much of a problem since you have only very few versions and no “live” data
to work with anyway. Use get_materialization_versions() to get a list of
available versions.
Let’s explore this a bit:
>>> # Import the flywire module
>>> from fafbseg import flywire
>>> # We will use the public dataset for this tutorial
>>> flywire.set_default_dataset("public")
Default dataset set to "public".
>>> # Check which materializations are available
>>> flywire.get_materialization_versions()
 | time_stamp | expires_on | is_merged | datastack | version | valid | status | id |
---|---|---|---|---|---|---|---|---|
0 | 2023-03-21 08:10:00 | 2121-11-10 07:10:00 | True | flywire_fafb_public | 630 | True | AVAILABLE | 718 |
As you can see, at the time of writing there is only a single materialization
available for the public release dataset: 630.
This also means that all queries automatically go against that materialization:
>>> # Fetch the root IDs at given x/y/z coordinate(s)
>>> roots = flywire.locs_to_segments([[75350, 60162, 3162]])
>>> roots
array([720575940631680813])
>>> # We can also specify a timestamp matching the materialization version
>>> # which will be useful later when more versions are available
>>> roots = flywire.locs_to_segments([[75350, 60162, 3162]], timestamp='mat_630')
>>> roots
array([720575940631680813])
What if you’re given a list of root IDs and want to check if they are still up-to-date - or match a given materialization version?
>>> # Check if root IDs are outdated (i.e. have more recent edits)
>>> flywire.is_latest_root([720575940625431866, 720575940621835755])
array([ True, False])
>>> # Likewise, we can ask if they were current at a given materialization
>>> flywire.is_latest_root([720575940625431866, 720575940621835755],
... timestamp='mat_630')
array([ True, False])
Is there a way to map root IDs back and forth? There is! We can take a root
ID, find its constituent supervoxels and then ask which root IDs they belonged
to at a given point in time. This is what update_ids() does:
>>> updated = flywire.update_ids(
... [720575940621835755, 720575940608788840, 720575940628913983]
... )
>>> updated
 | old_id | new_id | confidence | changed |
---|---|---|---|---|
0 | 720575940621835755 | 720575940636873791 | 0.99 | True |
1 | 720575940608788840 | 720575940636873791 | 1.00 | True |
2 | 720575940628913983 | 720575940636873791 | 0.94 | True |
In the above example all old IDs are “ancestors” to the same current root ID.
Note that by default, update_ids() will map to the most current version but it
also accepts a timestamp parameter which lets us map to a specific point in time.
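The idea of mapping an old root ID forward via its supervoxels can be sketched as a simple vote. This is a toy illustration with made-up IDs, not the code behind update_ids(). In particular, treating “confidence” as the fraction of supervoxels that ended up in the winning root is an assumption made for this sketch.

```python
from collections import Counter

# Hypothetical toy data: supervoxels of an outdated root ID and the root ID
# each supervoxel belongs to now.
old_supervoxels = ["sv1", "sv2", "sv3", "sv4"]
current_root_of = {"sv1": 900, "sv2": 900, "sv3": 900, "sv4": 901}

def update_root(supervoxels, lookup):
    """Pick the current root holding most supervoxels, plus that fraction."""
    votes = Counter(lookup[sv] for sv in supervoxels)
    new_id, n = votes.most_common(1)[0]
    return new_id, n / len(supervoxels)

new_id, confidence = update_root(old_supervoxels, current_root_of)
```

In this toy case three of the four supervoxels now belong to root 900, so the old ID maps to 900 with a score of 0.75. A score below 1 hints that the neuron was split at some point after the old ID was valid.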
Want to track how a neuron was edited over time? Easy:
>>> edits = flywire.get_edit_history(720575940625431866)
>>> edits.head()
 | after_root_ids | before_root_ids | is_merge | operation_id | segment | timestamp | user_affiliation | user_id | user_name |
---|---|---|---|---|---|---|---|---|---|
2 | [720575940625153661] | [720575940613909190] | False | 546853 | 720575940625431866 | 2021-08-19 09:10:18.090 | Greg Jefferis Lab | 957 | Varun Sane |
3 | [720575940626449738] | [720575940617774213, 720575940625153661] | True | 546854 | 720575940625431866 | 2021-08-19 09:10:36.280 | Greg Jefferis Lab | 957 | Varun Sane |
4 | [720575940604045489] | [720575940618706267] | False | 546855 | 720575940625431866 | 2021-08-19 09:11:20.009 | Greg Jefferis Lab | 957 | Varun Sane |
5 | [720575940626907179] | [720575940604045489, 720575940626449738] | True | 546856 | 720575940625431866 | 2021-08-19 09:11:34.230 | Greg Jefferis Lab | 957 | Varun Sane |
6 | [720575940604045745] | [720575940626907179, 720575940626995629] | True | 546857 | 720575940625431866 | 2021-08-19 09:12:06.042 | Greg Jefferis Lab | 957 | Varun Sane |
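Once you have such a table, summarising it is straightforward. The sketch below copies the operation_id and is_merge columns of the five rows shown above into plain Python records by hand, so that it runs without a server connection. With the actual DataFrame the same count should be obtainable via the is_merge column directly.

```python
# The five rows shown above, reduced to two columns and written out by hand.
edits = [
    {"operation_id": 546853, "is_merge": False},
    {"operation_id": 546854, "is_merge": True},
    {"operation_id": 546855, "is_merge": False},
    {"operation_id": 546856, "is_merge": True},
    {"operation_id": 546857, "is_merge": True},
]

# Booleans sum as 0/1, so this counts the merge operations.
n_merges = sum(e["is_merge"] for e in edits)
n_splits = len(edits) - n_merges
```

For these five operations that gives three merges and two splits.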
Please see the API documentation for a full list of segmentation-related functions.