Tutorial 6 - Annotated Data Module

AnnData is the primary data structure used to store imaging data in an efficient, multi-functional format. It is created using the anndata sub-module and can be accessed using trialobj.data. By default, trialobj.data is an annotated data array generated from Suite2p-processed data. For full guidance on AnnData objects, visit: https://anndata.readthedocs.io/en/latest/index.html.

The AnnData object is built around the raw Flu matrix of each trialobj. In keeping with AnnData conventions, the data structure is organized as n observations (obs) × m variables (var), where observations are Suite2p ROIs and variables are imaging frame timepoints.
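As a minimal sketch of this layout, the snippet below uses a small NumPy array as a stand-in for the Flu matrix. The array contents and sizes are hypothetical and are not taken from any trialobj; the point is only that rows index ROIs (obs) and columns index imaging frames (var).

```python
import numpy as np

# Hypothetical stand-in for a Flu matrix: 4 ROIs (obs) x 6 imaging frames (var).
n_rois, n_frames = 4, 6
flu = np.arange(n_rois * n_frames, dtype=float).reshape(n_rois, n_frames)

roi_trace = flu[2, :]        # one row: the full time course of ROI 2
frame_snapshot = flu[:, 3]   # one column: every ROI at imaging frame 3

print(flu.shape)             # (4, 6)
print(roi_trace.shape)       # (6,)
print(frame_snapshot.shape)  # (4,)
```

The same row and column indexing applies to trialobj.data.X, which is shown further down with shape (640, 16368).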

[1]:

import imagingplus as ip


imported imagingplus successfully
        version: 0.2-beta

[2]:

expobj: ip.Experiment = ip.import_obj(pkl_path='/mnt/qnap_share/Data/imagingplus-example/RL109_analysis.pkl')
print(f'Trials in expobj: {expobj.trialIDs}')
trialobj = expobj.load_trial(trialID=expobj.trialIDs[2])


|- Loaded imagingplus.Experiment object (expID: RL109)109_analysis.pkl ...

Trials in expobj: ['t-005', 't-006', 't-013']

|- Loaded TwoPhotonImagingTrial.alloptical experimental trial object ...

[3]:

trialobj.data  # this is the anndata object for this trial

[3]:

Annotated Data of n_obs (# ROIs) × n_vars (# Frames) = 640 × 16368

Storage of Flu data

The raw Flu data is stored in .X.

[4]:

print(trialobj.data.X)

print('shape: ', trialobj.data.X.shape)

[[352.13678  411.9472   280.92416  ... 401.3014   515.2566   541.41565 ]
 [192.22421  395.29306  330.7496   ... 257.25806  285.31506  126.660484]
 [336.64996  539.26746  219.30368  ... 423.15295  433.1515   220.52742 ]
 ...
 [308.56497  303.55536  413.3554   ... 482.61044  386.2576   283.1643  ]
 [133.96815  122.96908   84.63106  ... 109.2256   187.91866  159.50813 ]
 [252.49574  240.2455   273.2785   ... 181.601    229.0061   278.74188 ]]
shape:  (640, 16368)

Processed data is added to trialobj.data as a layer, stored under its own unique key in .layers.

[5]:

trialobj.data.layers

[5]:

Layers with keys:

[6]:

# Let's add dFF processing of the raw calcium signals as a new layer:

from imagingplus.processing.imaging import normalize_dff

dff_arr = normalize_dff(arr=trialobj.data.X, normalize_pct=50)

trialobj.data.add_layer(layer_name='dFF', data=dff_arr)
print(trialobj.data.layers)

Warning:
Cell 16: contains nan
      Mean of the sub-threshold for this cell: nan
Warning:
Cell 410: contains nan
      Mean of the sub-threshold for this cell: nan
Add new dFF layer.
        Layers in object: Layers with keys: dFF
Layers with keys: dFF

[7]:

print(trialobj.data.layers['dFF'])

print('shape: ', trialobj.data.layers['dFF'].shape)

[[  3.0254273   20.524294   -17.809402   ...  17.409626    50.74975
   58.40316   ]
 [-25.098614    54.028458    28.878689   ...   0.24223907  11.17483
  -50.645935  ]
 [ 17.842701    88.76798    -23.2338     ...  48.122658    51.6226
  -22.805437  ]
 ...
 [ 27.989037    25.911108    71.45485    ... 100.18099     60.215004
   17.453144  ]
 [ 14.804955     5.379218   -27.474817   ...  -6.3983517   61.038216
   36.69162   ]
 [ 42.591587    35.67352     54.32821    ...   2.5552905   29.326313
   57.41353   ]]
shape:  (640, 16368)
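To give a feel for what a percentile-based dFF normalization can look like, here is a minimal NumPy sketch. It is not the implementation of normalize_dff: the function name dff_percentile_baseline, the choice of baseline (the mean of samples at or below the given percentile of each trace), and the scaling to percent are all assumptions made for illustration.

```python
import numpy as np

def dff_percentile_baseline(arr, pct=50):
    """Hypothetical dFF (in percent) computed row by row.

    Assumption: the baseline of each trace is the mean of all samples
    at or below the pct-th percentile of that trace.
    """
    arr = np.asarray(arr, dtype=float)
    out = np.empty_like(arr)
    for i, trace in enumerate(arr):
        threshold = np.percentile(trace, pct)
        baseline = trace[trace <= threshold].mean()
        out[i] = (trace - baseline) / baseline * 100.0
    return out

# Two small made-up traces (ROIs x frames)
flu = np.array([[100.0, 100.0, 100.0, 200.0],
                [50.0, 50.0, 75.0, 100.0]])
dff = dff_percentile_baseline(flu, pct=50)

print(dff.shape)  # (2, 4), same shape as the input
print(dff)
```

Because the result has the same ROIs × frames shape as the input, an array like this can be stored as a layer alongside .X.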

The rest of the AnnData object is built to match the dimensions of the original Flu data input.

Observations (Suite2p ROIs metadata and associated processing info)

For instance, the metadata for each Suite2p ROI stored in Suite2p’s stat.npy output is added to trialobj.data under obs and obsm (1-D and multi-dimensional observation annotations, respectively).

[8]:

trialobj.data.obs

[8]:

	ypix	xpix	lam	footprint	mrs	...	radius	aspect_ratio	npix_norm	skew	std
0	[102, 102, 102, 102, 102, 103, 103, 103, 103, ...	[457, 458, 459, 460, 461, 456, 457, 458, 459, ...	[0.0063846777, 0.008958542, 0.011363007, 0.011...	1.0	0.909815	...	3.565604	1.051397	0.649175	3.016955	353.675049
1	[46, 46, 46, 46, 46, 46, 47, 47, 47, 47, 47, 4...	[116, 117, 118, 119, 120, 121, 114, 115, 116, ...	[0.009095913, 0.014569374, 0.01832514, 0.01890...	1.0	0.912076	...	3.538468	1.074428	0.622126	3.784652	422.922577
2	[18, 18, 18, 18, 19, 19, 19, 19, 19, 19, 19, 1...	[202, 203, 204, 205, 200, 201, 202, 203, 204, ...	[0.00545189, 0.006088022, 0.0062021483, 0.0052...	1.0	1.088559	...	4.124215	1.027475	0.919665	3.603348	342.368134
3	[43, 44, 45, 46, 46, 47, 47, 47, 48, 48, 48, 4...	[352, 352, 352, 352, 353, 352, 353, 354, 351, ...	[0.0036495698, 0.0043396214, 0.0031816224, 0.0...	1.0	1.561322	...	8.133019	1.348522	1.325399	3.187822	357.666168
4	[156, 156, 156, 156, 156, 157, 157, 157, 157, ...	[382, 383, 384, 385, 386, 380, 381, 382, 383, ...	[0.013304887, 0.02187323, 0.023734575, 0.01969...	1.0	0.869808	...	3.62042	1.139261	0.554504	2.59998	263.609039
...	...	...	...	...	...	...	...	...	...	...	...
1241	[290, 291, 291, 291, 291, 291, 291, 291, 291, ...	[299, 298, 299, 300, 301, 302, 305, 306, 307, ...	[0.0029507184, 0.0032565512, 0.005071437, 0.00...	2.0	2.336259	...	11.46564	1.251471	2.447931	2.540658	94.641136
1242	[354, 354, 355, 355, 355, 355, 355, 356, 356, ...	[309, 310, 308, 309, 310, 311, 312, 307, 308, ...	[0.00252066, 0.0021455055, 0.007094776, 0.0078...	2.0	2.386548	...	10.831544	1.129317	2.934812	2.229956	79.882561
1246	[15, 15, 16, 16, 16, 17, 17, 17, 17, 17, 17, 1...	[488, 489, 486, 487, 489, 486, 487, 488, 489, ...	[0.010669279, 0.007242187, 0.013514522, 0.0124...	2.0	2.238235	...	13.789857	1.460301	1.636462	4.38241	55.825489
1250	[472, 472, 472, 473, 473, 473, 473, 473, 473, ...	[55, 56, 67, 55, 56, 57, 63, 64, 65, 66, 67, 6...	[0.0023643558, 0.0034383552, 0.0021977199, 0.0...	2.0	2.079421	...	10.794925	1.331867	2.583175	1.233372	64.879417
1251	[342, 342, 342, 343, 343, 343, 343, 343, 344, ...	[128, 129, 130, 126, 127, 128, 129, 130, 124, ...	[0.010067902, 0.007827523, 0.005219734, 0.0070...	2.0	1.777428	...	8.44572	1.100779	1.744658	2.057152	62.520458

640 rows × 15 columns

[9]:

trialobj.data.obsm

[9]:

AxisArrays with keys: ypix, xpix

.obsm includes the ypix and xpix outputs for each Suite2p ROI, which represent the pixel locations of the ROI mask.

[10]:

print('ypix:', trialobj.data.obsm['ypix'][:5], '\n\nxpix: \t', trialobj.data.obsm['xpix'][:5])

ypix: [array([102, 102, 102, 102, 102, 103, 103, 103, 103, 103, 103, 103, 104,
       104, 104, 104, 104, 104, 104, 104, 105, 105, 105, 105, 105, 105,
       105, 105, 106, 106, 106, 106, 106, 106, 106, 106, 107, 107, 107,
       107, 107, 107, 107, 108, 108, 108, 108, 108])
 array([46, 46, 46, 46, 46, 46, 47, 47, 47, 47, 47, 47, 47, 47, 47, 48, 48,
       48, 48, 48, 48, 48, 48, 49, 49, 49, 49, 49, 49, 49, 49, 50, 50, 50,
       50, 50, 50, 50, 51, 51, 51, 51, 51, 52, 52, 52])
 array([18, 18, 18, 18, 19, 19, 19, 19, 19, 19, 19, 19, 20, 20, 20, 20, 20,
       20, 20, 20, 20, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 22, 22, 22,
       22, 22, 22, 22, 22, 22, 22, 23, 23, 23, 23, 23, 23, 23, 23, 24, 24,
       24, 24, 24, 24, 24, 24, 25, 25, 25, 25, 25, 25, 25, 26, 26, 26, 26])
 array([43, 44, 45, 46, 46, 47, 47, 47, 48, 48, 48, 48, 48, 49, 49, 49, 49,
       49, 49, 50, 50, 50, 50, 50, 50, 50, 51, 51, 51, 51, 51, 51, 51, 52,
       52, 52, 52, 52, 52, 52, 52, 52, 52, 52, 52, 53, 53, 53, 53, 53, 53,
       53, 53, 53, 53, 53, 53, 54, 54, 54, 54, 54, 54, 54, 54, 54, 54, 55,
       55, 55, 55, 55, 55, 55, 55, 55, 56, 56, 56, 56, 56, 56, 56, 56, 57,
       57, 57, 57, 57, 57, 57, 58, 58, 58, 58, 58, 59, 59])
 array([156, 156, 156, 156, 156, 157, 157, 157, 157, 157, 157, 157, 157,
       158, 158, 158, 158, 158, 158, 158, 158, 159, 159, 159, 159, 159,
       159, 159, 159, 160, 160, 160, 160, 160, 160, 160, 161, 161, 161,
       161, 161])]

xpix:    [array([457, 458, 459, 460, 461, 456, 457, 458, 459, 460, 461, 462, 455,
       456, 457, 458, 459, 460, 461, 462, 455, 456, 457, 458, 459, 460,
       461, 462, 455, 456, 457, 458, 459, 460, 461, 462, 456, 457, 458,
       459, 460, 461, 462, 457, 458, 459, 460, 461])
 array([116, 117, 118, 119, 120, 121, 114, 115, 116, 117, 118, 119, 120,
       121, 122, 115, 116, 117, 118, 119, 120, 121, 122, 115, 116, 117,
       118, 119, 120, 121, 122, 116, 117, 118, 119, 120, 121, 122, 117,
       118, 119, 120, 121, 118, 119, 120])
 array([202, 203, 204, 205, 200, 201, 202, 203, 204, 205, 206, 207, 200,
       201, 202, 203, 204, 205, 206, 207, 208, 200, 201, 202, 203, 204,
       205, 206, 207, 208, 209, 199, 200, 201, 202, 203, 204, 205, 206,
       207, 208, 200, 201, 202, 203, 204, 205, 206, 207, 200, 201, 202,
       203, 204, 205, 206, 207, 201, 202, 203, 204, 205, 206, 207, 202,
       203, 204, 205])
 array([352, 352, 352, 352, 353, 352, 353, 354, 351, 352, 353, 354, 355,
       350, 351, 352, 353, 354, 355, 350, 351, 352, 353, 354, 355, 356,
       349, 350, 351, 352, 353, 354, 355, 349, 350, 351, 352, 353, 354,
       355, 357, 358, 359, 360, 361, 350, 351, 352, 353, 354, 355, 356,
       357, 358, 359, 360, 361, 352, 353, 354, 355, 356, 357, 358, 359,
       360, 361, 354, 355, 356, 357, 358, 359, 360, 361, 362, 354, 355,
       356, 357, 358, 359, 360, 361, 355, 356, 357, 358, 359, 360, 361,
       357, 358, 359, 360, 361, 358, 359])
 array([382, 383, 384, 385, 386, 380, 381, 382, 383, 384, 385, 386, 387,
       380, 381, 382, 383, 384, 385, 386, 387, 380, 381, 382, 383, 384,
       385, 386, 387, 380, 381, 382, 383, 384, 385, 386, 381, 382, 383,
       384, 385])]
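The per-ROI pixel arrays above have different lengths, one entry per pixel in each mask. As a rough sketch of how such ypix / xpix pairs might be used, the snippet below computes a centroid for each ROI and paints one ROI into a boolean image. The pixel values and the 64 × 64 frame size are made up for illustration and do not come from the trial shown here.

```python
import numpy as np

# Hypothetical ROI masks with different pixel counts (ragged),
# mimicking the structure of obsm['ypix'] and obsm['xpix'].
ypix = [np.array([10, 10, 11, 11]), np.array([40, 41, 42])]
xpix = [np.array([20, 21, 20, 21]), np.array([7, 7, 7])]

frame_shape = (64, 64)  # assumed field-of-view size in pixels (rows, columns)

# Centroid of each ROI as (mean y, mean x)
centroids = [(float(y.mean()), float(x.mean())) for y, x in zip(ypix, xpix)]

# Boolean image marking the pixels that belong to the first ROI
mask = np.zeros(frame_shape, dtype=bool)
mask[ypix[0], xpix[0]] = True

print(centroids)   # [(10.5, 20.5), (41.0, 7.0)]
print(mask.sum())  # 4
```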

Variables (temporal synchronization of paq channels and imaging)

The temporal synchronization data of the experiment, collected in the .paq output, is added to the variable annotations under var. These variables are aligned to the imaging frame clock timings. The total number of variables equals the number of imaging frames in the original Flu data input.

[11]:

trialobj.data.var

[11]:

	frame_clock	x_galvo_uncaging	slm2packio	markpoints2packio	packio2slm	packio2markpoints	pycontrol_rsync	voltage
139577	4.972792	-1.165180	3.329534	0.007827	0.000264	0.000592	0.017035	-0.116807
140252	4.974765	-1.164851	3.325917	0.007827	-0.000394	-0.000394	0.019666	-0.196060
140925	4.971806	-1.165180	3.337755	0.008156	-0.000065	0.000592	0.021310	-0.210858
141595	4.970819	-1.164851	3.331507	0.007827	0.000592	0.000264	0.021310	-0.204939
142267	4.975094	-1.166166	3.340715	0.006841	0.000264	0.001250	0.021639	-0.234206
...	...	...	...	...	...	...	...	...
11136216	4.974436	-1.165837	3.337426	0.006841	-0.000065	0.002565	0.021639	2.423554
11136886	4.975751	-1.164851	3.334138	0.012102	-0.000065	-0.000065	0.017035	2.473211
11137559	4.960953	-1.164851	3.337097	0.008485	0.000264	0.000921	0.019994	2.473211
11138232	4.971477	-1.167153	3.319340	0.006183	-0.000065	-0.000065	0.005525	2.466963
11138904	4.975751	-1.164851	3.329863	0.009471	0.000264	0.000592	0.017693	2.453151

16368 rows × 8 columns
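One way to picture the alignment to the frame clock: each continuously recorded channel is read out at the sample index where an imaging frame was acquired, giving one value per frame. The sketch below assumes this interpretation; the channel values, the frame indices, and the variable names are hypothetical and are not read from a .paq file.

```python
import numpy as np

# Hypothetical continuously sampled channel (e.g. a voltage trace) with 1001 samples
paq_voltage = np.linspace(0.0, 1.0, 1001)

# Hypothetical sample indices at which each imaging frame was acquired
frame_times = np.array([100, 300, 500, 700])

# One value per imaging frame: the channel read out at each frame time
voltage_per_frame = paq_voltage[frame_times]

print(voltage_per_frame.shape)  # (4,), one entry per imaging frame
```

Under this reading, every column of var has exactly as many entries as there are imaging frames, which is why var has 16368 rows for this trial.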

Creating or Modifying AnnData arrays of trialobj

There are a number of helper functions to create anndata arrays or modify existing anndata arrays.

[12]:

# creating new anndata object. This is identical to the base AnnData library.
# the example below is from the Getting Started Tutorial for AnnData:

# any given anndata object is created from constituent data arrays.


# 1) Primary data matrix
import numpy as np
import pandas as pd

n_rois, n_frames = 10, 10000
X = np.random.random((n_rois, n_frames))  # create random data matrix

df = pd.DataFrame(X, columns=range(n_frames), index=np.arange(n_rois, dtype=int).astype(str))
df  # show the dataframe

[12]:

	0	1	2	3	4	...	9995	9996	9997	9998	9999
0	0.088470	0.594978	0.267262	0.521285	0.427390	...	0.628226	0.491950	0.023748	0.910001	0.342909
1	0.421657	0.024940	0.345641	0.285778	0.339881	...	0.282903	0.818589	0.758343	0.068396	0.809684
2	0.642708	0.101187	0.787579	0.822067	0.329221	...	0.674419	0.082625	0.676742	0.711652	0.515747
3	0.741156	0.563763	0.390991	0.809422	0.628270	...	0.746982	0.588162	0.203452	0.662033	0.523288
4	0.266626	0.484640	0.430566	0.882055	0.785261	...	0.655115	0.442506	0.116492	0.861459	0.589859
5	0.695614	0.571977	0.633992	0.706400	0.355071	...	0.156109	0.222790	0.958219	0.484075	0.236766
6	0.313605	0.101705	0.080710	0.854698	0.220697	...	0.482442	0.171771	0.278977	0.321641	0.124504
7	0.700918	0.319251	0.173709	0.844428	0.992370	...	0.493461	0.930643	0.548558	0.948738	0.416265
8	0.670791	0.416993	0.405371	0.213854	0.712764	...	0.250209	0.956986	0.325717	0.696112	0.219828
9	0.064023	0.667027	0.198786	0.437727	0.811632	...	0.449985	0.016948	0.336893	0.156778	0.746549

10 rows × 10000 columns

[13]:

#2) Observations matrix

obs_meta = pd.DataFrame({
    'cell_type': np.random.choice(['exc', 'int'], n_rois),
},
    index=np.arange(n_rois, dtype=int).astype(str),    # these are the same IDs of observations as above!
)
obs_meta

[13]:

	cell_type
0	exc
1	int
2	int
3	int
4	exc
5	exc
6	int
7	exc
8	exc
9	int

[14]:

#3) Variables matrix


var_meta = pd.DataFrame({
    'exp_group': np.random.choice(['A','B', 'C'], n_frames),
},
    index=np.arange(n_frames, dtype=int).astype(str),    # these are the same IDs as the variables (columns) above!
)
var_meta

[14]:

	exp_group
0	C
1	B
2	B
3	A
4	A
...	...
9995	A
9996	B
9997	B
9998	B
9999	C

10000 rows × 1 columns

[15]:

#4) Creating a new anndata attribute for the trialobj

import imagingplus.processing.anndata as ad  # from the processing module, import anndata submodule

trialobj.new_anndata = ad.AnnotatedData(X=df,obs=obs_meta, var=var_meta)

print(trialobj.new_anndata)

Created AnnData object:
        Annotated Data of n_obs (# ROIs) × n_vars (# Frames) = 10 × 10000
Annotated Data of n_obs × n_vars = 10 × 10000
available attributes:
        .X (primary datamatrix)
        .obs (obs metadata):
                |- 'cell_type'
        .var (vars metadata):
                |- 'exp_group'

[16]:

# adding an 'obs' to an existing anndata object

# np.random.random_integers is deprecated; np.random.randint excludes the upper bound,
# so 513 keeps the original inclusive range of 0 to 512
cell_loc_x = np.random.randint(0, 513, n_rois)
cell_loc_y = np.random.randint(0, 513, n_rois)


trialobj.new_anndata.add_obs(obs_name='cell_loc_x', values=cell_loc_x)
trialobj.new_anndata.add_obs(obs_name='cell_loc_y', values=cell_loc_y)

print(trialobj.new_anndata)

Annotated Data of n_obs × n_vars = 10 × 10000
available attributes:
        .X (primary datamatrix)
        .obs (obs metadata):
                |- 'cell_type', 'cell_loc_x', 'cell_loc_y'
        .var (vars metadata):
                |- 'exp_group'

[17]:

# deleting an 'obs' from an existing anndata object
# uses the pop method

trialobj.new_anndata.del_obs('cell_type')
print(trialobj.new_anndata)

Annotated Data of n_obs × n_vars = 10 × 10000
available attributes:
        .X (primary datamatrix)
        .obs (obs metadata):
                |- 'cell_loc_x', 'cell_loc_y'
        .var (vars metadata):
                |- 'exp_group'

Note: adding and deleting a ‘var’ on an existing anndata object can be done in exactly the same manner as demonstrated above for ‘obs’, using the .add_var() and .del_var() methods of the anndata object.
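Since obs and var are both tabular annotations, adding or deleting an entry can be pictured as adding or popping a column of a pandas DataFrame whose length matches the corresponding axis. The sketch below works on a plain DataFrame and does not call the imagingplus helpers; the column name stim_on and its values are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical var annotation table: one row per imaging frame
n_frames = 5
var_meta = pd.DataFrame(
    {'exp_group': ['A', 'B', 'B', 'C', 'A']},
    index=np.arange(n_frames).astype(str),
)

# Adding a variable annotation: one value per frame, so the length must match
stim_on = np.array([0, 0, 1, 1, 0])
if len(stim_on) != len(var_meta):
    raise ValueError('new annotation must have one value per imaging frame')
var_meta['stim_on'] = stim_on

# Deleting a variable annotation: pop removes the column and returns it
removed = var_meta.pop('exp_group')

print(list(var_meta.columns))  # ['stim_on']
print(removed.tolist())        # ['A', 'B', 'B', 'C', 'A']
```

The length check is the part worth keeping in mind: a var annotation needs one value per imaging frame, and an obs annotation needs one value per ROI.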