Tutorial 6 - Annotated Data Module
AnnData is the primary data structure used to store imaging data in an efficient, multi-functional format. It is created using the anndata sub-module and is accessed through trialobj.data. By default, trialobj.data is an annotated data array generated from Suite2p processed data. For full guidance on AnnData objects, visit: https://anndata.readthedocs.io/en/latest/index.html.
The AnnData object is built around the raw Flu matrix of each trialobj. In keeping with AnnData conventions, the data structure is organized as n observations (obs) × m variables (var), where observations are Suite2p ROIs and variables are imaging frame timepoints.
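As a rough illustration of the observations × variables layout, the sketch below uses a plain NumPy array as a stand-in for the Flu matrix. The array sizes and variable names are made up for the example; only the row/column convention comes from the text above.

```python
import numpy as np

# Stand-in for a Flu matrix: rows are ROIs (observations), columns are frames (variables).
n_rois, n_frames = 5, 200  # hypothetical sizes, much smaller than a real trial
rng = np.random.default_rng(0)
X = rng.random((n_rois, n_frames))

roi_trace = X[2]            # full time course of ROI 2, shape (n_frames,)
frame_snapshot = X[:, 50]   # value of every ROI at frame 50, shape (n_rois,)
mean_per_roi = X.mean(axis=1)  # one summary value per observation

print(roi_trace.shape, frame_snapshot.shape, mean_per_roi.shape)
```

Indexing along the first axis selects ROIs and indexing along the second axis selects frames, which is why per-ROI annotations belong in obs and per-frame annotations belong in var.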
[1]:
import imagingplus as ip
imported imagingplus successfully
version: 0.2-beta
[2]:
expobj: ip.Experiment = ip.import_obj(pkl_path='/mnt/qnap_share/Data/imagingplus-example/RL109_analysis.pkl')
print(f'Trials in expobj: {expobj.trialIDs}')
trialobj = expobj.load_trial(trialID=expobj.trialIDs[2])
|- Loaded imagingplus.Experiment object (expID: RL109)109_analysis.pkl ...
Trials in expobj: ['t-005', 't-006', 't-013']
|- Loaded TwoPhotonImagingTrial.alloptical experimental trial object ...
[3]:
trialobj.data # this is the anndata object for this trial
[3]:
Annotated Data of n_obs (# ROIs) × n_vars (# Frames) = 640 × 16368
storage of Flu data
The raw data is stored in .X
[4]:
print(trialobj.data.X)
print('shape: ', trialobj.data.X.shape)
[[352.13678 411.9472 280.92416 ... 401.3014 515.2566 541.41565 ]
[192.22421 395.29306 330.7496 ... 257.25806 285.31506 126.660484]
[336.64996 539.26746 219.30368 ... 423.15295 433.1515 220.52742 ]
...
[308.56497 303.55536 413.3554 ... 482.61044 386.2576 283.1643 ]
[133.96815 122.96908 84.63106 ... 109.2256 187.91866 159.50813 ]
[252.49574 240.2455 273.2785 ... 181.601 229.0061 278.74188 ]]
shape: (640, 16368)
Processed data is added to trialobj.data as a new layer, stored under a unique key in .layers.
[5]:
trialobj.data.layers
[5]:
Layers with keys:
[6]:
# Let's add dFF processing of the raw calcium signals as a new layer:
from imagingplus.processing.imaging import normalize_dff
dff_arr = normalize_dff(arr=trialobj.data.X, normalize_pct=50)
trialobj.data.add_layer(layer_name='dFF', data=dff_arr)
print(trialobj.data.layers)
Warning:
Cell 16: contains nan
Mean of the sub-threshold for this cell: nan
Warning:
Cell 410: contains nan
Mean of the sub-threshold for this cell: nan
Add new dFF layer.
Layers in object: Layers with keys: dFF
Layers with keys: dFF
[7]:
print(trialobj.data.layers['dFF'])
print('shape: ', trialobj.data.layers['dFF'].shape)
[[ 3.0254273 20.524294 -17.809402 ... 17.409626 50.74975
58.40316 ]
[-25.098614 54.028458 28.878689 ... 0.24223907 11.17483
-50.645935 ]
[ 17.842701 88.76798 -23.2338 ... 48.122658 51.6226
-22.805437 ]
...
[ 27.989037 25.911108 71.45485 ... 100.18099 60.215004
17.453144 ]
[ 14.804955 5.379218 -27.474817 ... -6.3983517 61.038216
36.69162 ]
[ 42.591587 35.67352 54.32821 ... 2.5552905 29.326313
57.41353 ]]
shape: (640, 16368)
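To make the dFF layer above more concrete, here is a minimal sketch of one percentile-based dFF normalization. It is not the source of normalize_dff: the helper name dff_percentile_baseline is hypothetical, and the choice of baseline (the mean of each ROI's values below the given percentile) is an assumption inferred from the "Mean of the sub-threshold" warnings printed earlier.

```python
import numpy as np

def dff_percentile_baseline(arr, pct=50):
    """Hypothetical helper: dFF in percent, per ROI (row).

    Assumption: the baseline is the mean of the values that fall below
    the `pct` percentile of that ROI's trace.
    """
    arr = np.asarray(arr, dtype=float)
    out = np.empty_like(arr)
    for i, trace in enumerate(arr):
        threshold = np.percentile(trace, pct)
        baseline = trace[trace < threshold].mean()
        out[i] = (trace - baseline) / baseline * 100
    return out

# Synthetic raw fluorescence: 4 ROIs x 1000 frames
rng = np.random.default_rng(0)
raw = rng.uniform(100, 500, size=(4, 1000))
dff = dff_percentile_baseline(raw, pct=50)
print('shape: ', dff.shape)
```

With this definition, a trace that contains NaN values yields a NaN baseline, which would be consistent with the warnings for cells 16 and 410 above, although the actual cause in normalize_dff may differ.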
The rest of the AnnData object is built according to the dimensions of the original Flu data input.
observations (Suite2p ROIs metadata and associated processing info)
For instance, the metadata for each Suite2p ROI stored in Suite2p’s stat.npy output is added to trialobj.data under obs and obsm (1-D and >1-D observation annotations, respectively).
[8]:
trialobj.data.obs
[8]:
| | ypix | xpix | lam | footprint | mrs | ... | radius | aspect_ratio | npix_norm | skew | std |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | [102, 102, 102, 102, 102, 103, 103, 103, 103, ... | [457, 458, 459, 460, 461, 456, 457, 458, 459, ... | [0.0063846777, 0.008958542, 0.011363007, 0.011... | 1.0 | 0.909815 | ... | 3.565604 | 1.051397 | 0.649175 | 3.016955 | 353.675049 |
| 1 | [46, 46, 46, 46, 46, 46, 47, 47, 47, 47, 47, 4... | [116, 117, 118, 119, 120, 121, 114, 115, 116, ... | [0.009095913, 0.014569374, 0.01832514, 0.01890... | 1.0 | 0.912076 | ... | 3.538468 | 1.074428 | 0.622126 | 3.784652 | 422.922577 |
| 2 | [18, 18, 18, 18, 19, 19, 19, 19, 19, 19, 19, 1... | [202, 203, 204, 205, 200, 201, 202, 203, 204, ... | [0.00545189, 0.006088022, 0.0062021483, 0.0052... | 1.0 | 1.088559 | ... | 4.124215 | 1.027475 | 0.919665 | 3.603348 | 342.368134 |
| 3 | [43, 44, 45, 46, 46, 47, 47, 47, 48, 48, 48, 4... | [352, 352, 352, 352, 353, 352, 353, 354, 351, ... | [0.0036495698, 0.0043396214, 0.0031816224, 0.0... | 1.0 | 1.561322 | ... | 8.133019 | 1.348522 | 1.325399 | 3.187822 | 357.666168 |
| 4 | [156, 156, 156, 156, 156, 157, 157, 157, 157, ... | [382, 383, 384, 385, 386, 380, 381, 382, 383, ... | [0.013304887, 0.02187323, 0.023734575, 0.01969... | 1.0 | 0.869808 | ... | 3.62042 | 1.139261 | 0.554504 | 2.59998 | 263.609039 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 1241 | [290, 291, 291, 291, 291, 291, 291, 291, 291, ... | [299, 298, 299, 300, 301, 302, 305, 306, 307, ... | [0.0029507184, 0.0032565512, 0.005071437, 0.00... | 2.0 | 2.336259 | ... | 11.46564 | 1.251471 | 2.447931 | 2.540658 | 94.641136 |
| 1242 | [354, 354, 355, 355, 355, 355, 355, 356, 356, ... | [309, 310, 308, 309, 310, 311, 312, 307, 308, ... | [0.00252066, 0.0021455055, 0.007094776, 0.0078... | 2.0 | 2.386548 | ... | 10.831544 | 1.129317 | 2.934812 | 2.229956 | 79.882561 |
| 1246 | [15, 15, 16, 16, 16, 17, 17, 17, 17, 17, 17, 1... | [488, 489, 486, 487, 489, 486, 487, 488, 489, ... | [0.010669279, 0.007242187, 0.013514522, 0.0124... | 2.0 | 2.238235 | ... | 13.789857 | 1.460301 | 1.636462 | 4.38241 | 55.825489 |
| 1250 | [472, 472, 472, 473, 473, 473, 473, 473, 473, ... | [55, 56, 67, 55, 56, 57, 63, 64, 65, 66, 67, 6... | [0.0023643558, 0.0034383552, 0.0021977199, 0.0... | 2.0 | 2.079421 | ... | 10.794925 | 1.331867 | 2.583175 | 1.233372 | 64.879417 |
| 1251 | [342, 342, 342, 343, 343, 343, 343, 343, 344, ... | [128, 129, 130, 126, 127, 128, 129, 130, 124, ... | [0.010067902, 0.007827523, 0.005219734, 0.0070... | 2.0 | 1.777428 | ... | 8.44572 | 1.100779 | 1.744658 | 2.057152 | 62.520458 |
640 rows × 15 columns
[9]:
trialobj.data.obsm
[9]:
AxisArrays with keys: ypix, xpix
The .obsm includes the ypix and xpix outputs for each Suite2p ROI, which represent the pixel locations of the ROI mask.
[10]:
print('ypix:', trialobj.data.obsm['ypix'][:5], '\n\nxpix: \t', trialobj.data.obsm['xpix'][:5])
ypix: [array([102, 102, 102, 102, 102, 103, 103, 103, 103, 103, 103, 103, 104,
104, 104, 104, 104, 104, 104, 104, 105, 105, 105, 105, 105, 105,
105, 105, 106, 106, 106, 106, 106, 106, 106, 106, 107, 107, 107,
107, 107, 107, 107, 108, 108, 108, 108, 108])
array([46, 46, 46, 46, 46, 46, 47, 47, 47, 47, 47, 47, 47, 47, 47, 48, 48,
48, 48, 48, 48, 48, 48, 49, 49, 49, 49, 49, 49, 49, 49, 50, 50, 50,
50, 50, 50, 50, 51, 51, 51, 51, 51, 52, 52, 52])
array([18, 18, 18, 18, 19, 19, 19, 19, 19, 19, 19, 19, 20, 20, 20, 20, 20,
20, 20, 20, 20, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 22, 22, 22,
22, 22, 22, 22, 22, 22, 22, 23, 23, 23, 23, 23, 23, 23, 23, 24, 24,
24, 24, 24, 24, 24, 24, 25, 25, 25, 25, 25, 25, 25, 26, 26, 26, 26])
array([43, 44, 45, 46, 46, 47, 47, 47, 48, 48, 48, 48, 48, 49, 49, 49, 49,
49, 49, 50, 50, 50, 50, 50, 50, 50, 51, 51, 51, 51, 51, 51, 51, 52,
52, 52, 52, 52, 52, 52, 52, 52, 52, 52, 52, 53, 53, 53, 53, 53, 53,
53, 53, 53, 53, 53, 53, 54, 54, 54, 54, 54, 54, 54, 54, 54, 54, 55,
55, 55, 55, 55, 55, 55, 55, 55, 56, 56, 56, 56, 56, 56, 56, 56, 57,
57, 57, 57, 57, 57, 57, 58, 58, 58, 58, 58, 59, 59])
array([156, 156, 156, 156, 156, 157, 157, 157, 157, 157, 157, 157, 157,
158, 158, 158, 158, 158, 158, 158, 158, 159, 159, 159, 159, 159,
159, 159, 159, 160, 160, 160, 160, 160, 160, 160, 161, 161, 161,
161, 161])]
xpix: [array([457, 458, 459, 460, 461, 456, 457, 458, 459, 460, 461, 462, 455,
456, 457, 458, 459, 460, 461, 462, 455, 456, 457, 458, 459, 460,
461, 462, 455, 456, 457, 458, 459, 460, 461, 462, 456, 457, 458,
459, 460, 461, 462, 457, 458, 459, 460, 461])
array([116, 117, 118, 119, 120, 121, 114, 115, 116, 117, 118, 119, 120,
121, 122, 115, 116, 117, 118, 119, 120, 121, 122, 115, 116, 117,
118, 119, 120, 121, 122, 116, 117, 118, 119, 120, 121, 122, 117,
118, 119, 120, 121, 118, 119, 120])
array([202, 203, 204, 205, 200, 201, 202, 203, 204, 205, 206, 207, 200,
201, 202, 203, 204, 205, 206, 207, 208, 200, 201, 202, 203, 204,
205, 206, 207, 208, 209, 199, 200, 201, 202, 203, 204, 205, 206,
207, 208, 200, 201, 202, 203, 204, 205, 206, 207, 200, 201, 202,
203, 204, 205, 206, 207, 201, 202, 203, 204, 205, 206, 207, 202,
203, 204, 205])
array([352, 352, 352, 352, 353, 352, 353, 354, 351, 352, 353, 354, 355,
350, 351, 352, 353, 354, 355, 350, 351, 352, 353, 354, 355, 356,
349, 350, 351, 352, 353, 354, 355, 349, 350, 351, 352, 353, 354,
355, 357, 358, 359, 360, 361, 350, 351, 352, 353, 354, 355, 356,
357, 358, 359, 360, 361, 352, 353, 354, 355, 356, 357, 358, 359,
360, 361, 354, 355, 356, 357, 358, 359, 360, 361, 362, 354, 355,
356, 357, 358, 359, 360, 361, 355, 356, 357, 358, 359, 360, 361,
357, 358, 359, 360, 361, 358, 359])
array([382, 383, 384, 385, 386, 380, 381, 382, 383, 384, 385, 386, 387,
380, 381, 382, 383, 384, 385, 386, 387, 380, 381, 382, 383, 384,
385, 386, 387, 380, 381, 382, 383, 384, 385, 386, 381, 382, 383,
384, 385])]
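The ypix and xpix arrays printed above can be turned back into a 2-D mask for plotting or overlap checks. The sketch below is one way to do that with NumPy. The helper roi_mask and the 512 × 512 frame size are assumptions made for the example, and the pixel coordinates are a shortened, made-up subset rather than a full ROI.

```python
import numpy as np

def roi_mask(ypix, xpix, frame_shape=(512, 512)):
    """Hypothetical helper: boolean mask that is True at the ROI's pixels."""
    mask = np.zeros(frame_shape, dtype=bool)
    mask[np.asarray(ypix), np.asarray(xpix)] = True
    return mask

# Shortened example coordinates (rows = y, columns = x)
ypix = np.array([102, 102, 102, 103, 103, 103])
xpix = np.array([457, 458, 459, 456, 457, 458])

mask = roi_mask(ypix, xpix)
centroid = (ypix.mean(), xpix.mean())  # (y, x) centre of the ROI
print('pixels in mask:', mask.sum(), 'centroid (y, x):', centroid)
```

In practice the frame shape should be taken from the imaging metadata of the trial rather than hard-coded.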
variables (temporal synchronization of paq channels and imaging)
The temporal synchronization data of the experiment, collected in the .paq output, is added to the variables annotations under var. These variables are sampled at the imaging frame clock times. The total number of variables equals the number of imaging frames in the original Flu data input.
[11]:
trialobj.data.var
[11]:
| | frame_clock | x_galvo_uncaging | slm2packio | markpoints2packio | packio2slm | packio2markpoints | pycontrol_rsync | voltage |
|---|---|---|---|---|---|---|---|---|
| 139577 | 4.972792 | -1.165180 | 3.329534 | 0.007827 | 0.000264 | 0.000592 | 0.017035 | -0.116807 |
| 140252 | 4.974765 | -1.164851 | 3.325917 | 0.007827 | -0.000394 | -0.000394 | 0.019666 | -0.196060 |
| 140925 | 4.971806 | -1.165180 | 3.337755 | 0.008156 | -0.000065 | 0.000592 | 0.021310 | -0.210858 |
| 141595 | 4.970819 | -1.164851 | 3.331507 | 0.007827 | 0.000592 | 0.000264 | 0.021310 | -0.204939 |
| 142267 | 4.975094 | -1.166166 | 3.340715 | 0.006841 | 0.000264 | 0.001250 | 0.021639 | -0.234206 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 11136216 | 4.974436 | -1.165837 | 3.337426 | 0.006841 | -0.000065 | 0.002565 | 0.021639 | 2.423554 |
| 11136886 | 4.975751 | -1.164851 | 3.334138 | 0.012102 | -0.000065 | -0.000065 | 0.017035 | 2.473211 |
| 11137559 | 4.960953 | -1.164851 | 3.337097 | 0.008485 | 0.000264 | 0.000921 | 0.019994 | 2.473211 |
| 11138232 | 4.971477 | -1.167153 | 3.319340 | 0.006183 | -0.000065 | -0.000065 | 0.005525 | 2.466963 |
| 11138904 | 4.975751 | -1.164851 | 3.329863 | 0.009471 | 0.000264 | 0.000592 | 0.017693 | 2.453151 |
16368 rows × 8 columns
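One way to picture the var table above is as each continuously recorded channel sampled at the moment every imaging frame was acquired. The sketch below illustrates that idea with synthetic signals. It assumes the var index holds the sample number at which each frame was acquired, which the tutorial does not state explicitly; the channel values, sample counts, and frame spacing are all made up.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n_samples = 20_000  # hypothetical length of the continuous recording

# Synthetic stand-ins for two continuously sampled channels
channels = {
    'voltage': rng.normal(size=n_samples),
    'pycontrol_rsync': rng.normal(size=n_samples),
}

# Hypothetical sample index at which each imaging frame was acquired
frame_samples = np.arange(500, n_samples, 670)

# One row per imaging frame, one column per channel
var = pd.DataFrame(
    {name: signal[frame_samples] for name, signal in channels.items()},
    index=frame_samples,
)
print(var.shape)
```

The resulting table has exactly one row per imaging frame, so it lines up with the columns of the Flu matrix.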
Creating or Modifying AnnData arrays of trialobj
There are a number of helper functions to create anndata arrays or modify existing anndata arrays.
[12]:
# creating new anndata object. This is identical to the base AnnData library.
# the example below is from the Getting Started Tutorial for AnnData:
# any given anndata object is created from constituent data arrays.
# 1) Primary data matrix
import numpy as np
import pandas as pd
n_rois, n_frames = 10, 10000
X = np.random.random((n_rois, n_frames)) # create random data matrix
df = pd.DataFrame(X, columns=range(n_frames), index=np.arange(n_rois, dtype=int).astype(str))
df # show the dataframe
[12]:
| | 0 | 1 | 2 | 3 | 4 | ... | 9995 | 9996 | 9997 | 9998 | 9999 |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0.088470 | 0.594978 | 0.267262 | 0.521285 | 0.427390 | ... | 0.628226 | 0.491950 | 0.023748 | 0.910001 | 0.342909 |
| 1 | 0.421657 | 0.024940 | 0.345641 | 0.285778 | 0.339881 | ... | 0.282903 | 0.818589 | 0.758343 | 0.068396 | 0.809684 |
| 2 | 0.642708 | 0.101187 | 0.787579 | 0.822067 | 0.329221 | ... | 0.674419 | 0.082625 | 0.676742 | 0.711652 | 0.515747 |
| 3 | 0.741156 | 0.563763 | 0.390991 | 0.809422 | 0.628270 | ... | 0.746982 | 0.588162 | 0.203452 | 0.662033 | 0.523288 |
| 4 | 0.266626 | 0.484640 | 0.430566 | 0.882055 | 0.785261 | ... | 0.655115 | 0.442506 | 0.116492 | 0.861459 | 0.589859 |
| 5 | 0.695614 | 0.571977 | 0.633992 | 0.706400 | 0.355071 | ... | 0.156109 | 0.222790 | 0.958219 | 0.484075 | 0.236766 |
| 6 | 0.313605 | 0.101705 | 0.080710 | 0.854698 | 0.220697 | ... | 0.482442 | 0.171771 | 0.278977 | 0.321641 | 0.124504 |
| 7 | 0.700918 | 0.319251 | 0.173709 | 0.844428 | 0.992370 | ... | 0.493461 | 0.930643 | 0.548558 | 0.948738 | 0.416265 |
| 8 | 0.670791 | 0.416993 | 0.405371 | 0.213854 | 0.712764 | ... | 0.250209 | 0.956986 | 0.325717 | 0.696112 | 0.219828 |
| 9 | 0.064023 | 0.667027 | 0.198786 | 0.437727 | 0.811632 | ... | 0.449985 | 0.016948 | 0.336893 | 0.156778 | 0.746549 |
10 rows × 10000 columns
[13]:
#2) Observations matrix
obs_meta = pd.DataFrame({
'cell_type': np.random.choice(['exc', 'int'], n_rois),
},
index=np.arange(n_rois, dtype=int).astype(str), # these are the same IDs of observations as above!
)
obs_meta
[13]:
| | cell_type |
|---|---|
| 0 | exc |
| 1 | int |
| 2 | int |
| 3 | int |
| 4 | exc |
| 5 | exc |
| 6 | int |
| 7 | exc |
| 8 | exc |
| 9 | int |
[14]:
#3) Variables matrix
var_meta = pd.DataFrame({
'exp_group': np.random.choice(['A','B', 'C'], n_frames),
},
index=np.arange(n_frames, dtype=int).astype(str), # these are the IDs of the variables (imaging frames)
)
var_meta
[14]:
| | exp_group |
|---|---|
| 0 | C |
| 1 | B |
| 2 | B |
| 3 | A |
| 4 | A |
| ... | ... |
| 9995 | A |
| 9996 | B |
| 9997 | B |
| 9998 | B |
| 9999 | C |
10000 rows × 1 columns
[15]:
#4) Creating a new anndata attribute for the trialobj
import imagingplus.processing.anndata as ad # from the processing module, import anndata submodule
trialobj.new_anndata = ad.AnnotatedData(X=df,obs=obs_meta, var=var_meta)
print(trialobj.new_anndata)
Created AnnData object:
Annotated Data of n_obs (# ROIs) × n_vars (# Frames) = 10 × 10000
Annotated Data of n_obs × n_vars = 10 × 10000
available attributes:
.X (primary datamatrix)
.obs (obs metadata):
|- 'cell_type'
.var (vars metadata):
|- 'exp_group'
[16]:
# adding an 'obs' to an existing anndata object
# np.random.random_integers is deprecated; np.random.randint excludes the
# upper bound, so 513 gives integers in the range 0..512
cell_loc_x = np.random.randint(0, 513, n_rois)
cell_loc_y = np.random.randint(0, 513, n_rois)
trialobj.new_anndata.add_obs(obs_name='cell_loc_x', values=cell_loc_x)
trialobj.new_anndata.add_obs(obs_name='cell_loc_y', values=cell_loc_y)
print(trialobj.new_anndata)
Annotated Data of n_obs × n_vars = 10 × 10000
available attributes:
.X (primary datamatrix)
.obs (obs metadata):
|- 'cell_type', 'cell_loc_x', 'cell_loc_y'
.var (vars metadata):
|- 'exp_group'
[17]:
# deleting an 'obs' from an existing anndata object
# uses the pop method
trialobj.new_anndata.del_obs('cell_type')
print(trialobj.new_anndata)
Annotated Data of n_obs × n_vars = 10 × 10000
available attributes:
.X (primary datamatrix)
.obs (obs metadata):
|- 'cell_loc_x', 'cell_loc_y'
.var (vars metadata):
|- 'exp_group'
Note: adding a ‘var’ to, or deleting a ‘var’ from, an existing anndata object is done in exactly the same manner as demonstrated above for ‘obs’, using the .add_var() and .del_var() methods on an anndata object.
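For readers who want to see what adding and removing a variable annotation amounts to, the sketch below performs the equivalent operations on a plain pandas DataFrame standing in for .var. It does not call .add_var() or .del_var(), whose exact signatures are not shown in this tutorial. The stim_on column and its values are hypothetical.

```python
import numpy as np
import pandas as pd

n_frames = 10000

# Stand-in for the .var table: one row per imaging frame
var = pd.DataFrame(
    {'exp_group': np.random.choice(['A', 'B', 'C'], n_frames)},
    index=np.arange(n_frames, dtype=int).astype(str),
)

# Hypothetical per-frame annotation: True during a stimulation window
stim_on = np.zeros(n_frames, dtype=bool)
stim_on[1000:1100] = True

# A new annotation needs exactly one value per variable (frame)
if len(stim_on) != len(var):
    raise ValueError('annotation length must match the number of frames')
var['stim_on'] = stim_on

# Deleting with pop removes the column and returns it
removed = var.pop('exp_group')
print(list(var.columns))
```

The length check mirrors the constraint that every var annotation must have one value per imaging frame.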
