Tutorial 6 - Annotated Data Module

AnnData is the primary data structure used to store imaging data in an efficient, multi-functional format. It is created using the anndata sub-module and can be accessed using trialobj.data. By default, trialobj.data is an annotated data array generated from Suite2p-processed data. For full guidance on AnnData objects, visit: https://anndata.readthedocs.io/en/latest/index.html.

The AnnData object is built around the raw Flu matrix of each trialobj. In keeping with AnnData conventions, the data structure is organized as n observations (obs) x m variables (var), where observations are Suite2p ROIs and variables are imaging frame timepoints.
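
The obs x var layout can be sketched with plain NumPy and pandas, without the imagingplus or anndata packages. This is only a stand-in to show how the dimensions line up: the sizes, the column names (radius, frame_clock) and all values are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical sizes: 3 ROIs (observations) x 5 imaging frames (variables)
n_rois, n_frames = 3, 5
X = np.zeros((n_rois, n_frames))  # stand-in for the raw Flu matrix

# One row of metadata per ROI (obs) and one row per frame (var)
obs = pd.DataFrame({"radius": [3.5, 3.6, 4.1]},
                   index=[str(i) for i in range(n_rois)])
var = pd.DataFrame({"frame_clock": np.linspace(0.0, 1.0, n_frames)},
                   index=[str(i) for i in range(n_frames)])

# The annotations must match the matrix dimensions: rows <-> obs, columns <-> var
assert X.shape == (len(obs), len(var))

# Look up one ROI by its obs label and pull its full trace across frames
roi_trace = X[obs.index.get_loc("1"), :]
print(roi_trace.shape)
```

Keeping the ROI metadata and the frame metadata in separate tables indexed by the same labels as the matrix axes is what lets the matrix be sliced along either axis while the annotations stay aligned.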

[1]:
import imagingplus as ip

imported imagingplus successfully
        version: 0.2-beta

[2]:
expobj: ip.Experiment = ip.import_obj(pkl_path='/mnt/qnap_share/Data/imagingplus-example/RL109_analysis.pkl')
print(f'Trials in expobj: {expobj.trialIDs}')
trialobj = expobj.load_trial(trialID=expobj.trialIDs[2])

|- Loaded imagingplus.Experiment object (expID: RL109)

Trials in expobj: ['t-005', 't-006', 't-013']

|- Loaded TwoPhotonImagingTrial.alloptical experimental trial object ...

[3]:
trialobj.data  # this is the anndata object for this trial
[3]:
Annotated Data of n_obs (# ROIs) × n_vars (# Frames) = 640 × 16368

storage of Flu data

The raw data is stored in .X

[4]:
print(trialobj.data.X)

print('shape: ', trialobj.data.X.shape)
[[352.13678  411.9472   280.92416  ... 401.3014   515.2566   541.41565 ]
 [192.22421  395.29306  330.7496   ... 257.25806  285.31506  126.660484]
 [336.64996  539.26746  219.30368  ... 423.15295  433.1515   220.52742 ]
 ...
 [308.56497  303.55536  413.3554   ... 482.61044  386.2576   283.1643  ]
 [133.96815  122.96908   84.63106  ... 109.2256   187.91866  159.50813 ]
 [252.49574  240.2455   273.2785   ... 181.601    229.0061   278.74188 ]]
shape:  (640, 16368)

Processed data is added to trialobj.data as a new layer, stored under a unique key in .layers.

[5]:
trialobj.data.layers
[5]:
Layers with keys:
[6]:
# Let's add dFF processing of the raw calcium signals as a new layer:

from imagingplus.processing.imaging import normalize_dff

dff_arr = normalize_dff(arr=trialobj.data.X, normalize_pct=50)

trialobj.data.add_layer(layer_name='dFF', data=dff_arr)
print(trialobj.data.layers)
Warning:
Cell 16: contains nan
      Mean of the sub-threshold for this cell: nan
Warning:
Cell 410: contains nan
      Mean of the sub-threshold for this cell: nan
Add new dFF layer.
        Layers in object: Layers with keys: dFF
Layers with keys: dFF
[7]:
print(trialobj.data.layers['dFF'])

print('shape: ', trialobj.data.layers['dFF'].shape)
[[  3.0254273   20.524294   -17.809402   ...  17.409626    50.74975
   58.40316   ]
 [-25.098614    54.028458    28.878689   ...   0.24223907  11.17483
  -50.645935  ]
 [ 17.842701    88.76798    -23.2338     ...  48.122658    51.6226
  -22.805437  ]
 ...
 [ 27.989037    25.911108    71.45485    ... 100.18099     60.215004
   17.453144  ]
 [ 14.804955     5.379218   -27.474817   ...  -6.3983517   61.038216
   36.69162   ]
 [ 42.591587    35.67352     54.32821    ...   2.5552905   29.326313
   57.41353   ]]
shape:  (640, 16368)
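
As a rough illustration of what a dFF layer holds, the sketch below computes a percent change from baseline for each ROI. It assumes the baseline is the mean of the samples at or below the given percentile of each trace; that is a guess based on the "sub-threshold" wording in the warnings above, and the actual normalize_dff implementation may differ. The function name dff_sketch and the plain dict standing in for .layers are hypothetical.

```python
import numpy as np

def dff_sketch(arr, pct=50):
    """Percent dF/F per row, with baseline = mean of samples <= the pct-th percentile.

    Hypothetical stand-in for illustration, not the imagingplus implementation.
    """
    arr = np.asarray(arr, dtype=float)
    out = np.empty_like(arr)
    for i, trace in enumerate(arr):
        threshold = np.percentile(trace, pct)
        baseline = trace[trace <= threshold].mean()
        out[i] = (trace - baseline) / baseline * 100.0
    return out

# Two made-up ROI traces, four frames each
raw = np.array([[100.0, 100.0, 200.0, 300.0],
                [50.0, 50.0, 50.0, 100.0]])

dff = dff_sketch(raw, pct=50)

# A layer has the same shape as the primary matrix and is stored under its own key
layers = {"dFF": dff}
print(layers["dFF"])
```

A trace containing NaN yields a NaN baseline under this scheme, which is one plausible way the "contains nan" warnings for individual cells could arise.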

The rest of the AnnData object is built according to the dimensions of the original Flu data input.

observations (Suite2p ROIs metadata and associated processing info)

For instance, the metadata for each Suite2p ROI stored in Suite2p’s stat.npy output is added to trialobj.data under obs and obsm (1-D and multi-dimensional observation annotations, respectively).

[8]:
trialobj.data.obs
[8]:
ypix xpix lam footprint mrs ... radius aspect_ratio npix_norm skew std
0 [102, 102, 102, 102, 102, 103, 103, 103, 103, ... [457, 458, 459, 460, 461, 456, 457, 458, 459, ... [0.0063846777, 0.008958542, 0.011363007, 0.011... 1.0 0.909815 ... 3.565604 1.051397 0.649175 3.016955 353.675049
1 [46, 46, 46, 46, 46, 46, 47, 47, 47, 47, 47, 4... [116, 117, 118, 119, 120, 121, 114, 115, 116, ... [0.009095913, 0.014569374, 0.01832514, 0.01890... 1.0 0.912076 ... 3.538468 1.074428 0.622126 3.784652 422.922577
2 [18, 18, 18, 18, 19, 19, 19, 19, 19, 19, 19, 1... [202, 203, 204, 205, 200, 201, 202, 203, 204, ... [0.00545189, 0.006088022, 0.0062021483, 0.0052... 1.0 1.088559 ... 4.124215 1.027475 0.919665 3.603348 342.368134
3 [43, 44, 45, 46, 46, 47, 47, 47, 48, 48, 48, 4... [352, 352, 352, 352, 353, 352, 353, 354, 351, ... [0.0036495698, 0.0043396214, 0.0031816224, 0.0... 1.0 1.561322 ... 8.133019 1.348522 1.325399 3.187822 357.666168
4 [156, 156, 156, 156, 156, 157, 157, 157, 157, ... [382, 383, 384, 385, 386, 380, 381, 382, 383, ... [0.013304887, 0.02187323, 0.023734575, 0.01969... 1.0 0.869808 ... 3.62042 1.139261 0.554504 2.59998 263.609039
... ... ... ... ... ... ... ... ... ... ... ...
1241 [290, 291, 291, 291, 291, 291, 291, 291, 291, ... [299, 298, 299, 300, 301, 302, 305, 306, 307, ... [0.0029507184, 0.0032565512, 0.005071437, 0.00... 2.0 2.336259 ... 11.46564 1.251471 2.447931 2.540658 94.641136
1242 [354, 354, 355, 355, 355, 355, 355, 356, 356, ... [309, 310, 308, 309, 310, 311, 312, 307, 308, ... [0.00252066, 0.0021455055, 0.007094776, 0.0078... 2.0 2.386548 ... 10.831544 1.129317 2.934812 2.229956 79.882561
1246 [15, 15, 16, 16, 16, 17, 17, 17, 17, 17, 17, 1... [488, 489, 486, 487, 489, 486, 487, 488, 489, ... [0.010669279, 0.007242187, 0.013514522, 0.0124... 2.0 2.238235 ... 13.789857 1.460301 1.636462 4.38241 55.825489
1250 [472, 472, 472, 473, 473, 473, 473, 473, 473, ... [55, 56, 67, 55, 56, 57, 63, 64, 65, 66, 67, 6... [0.0023643558, 0.0034383552, 0.0021977199, 0.0... 2.0 2.079421 ... 10.794925 1.331867 2.583175 1.233372 64.879417
1251 [342, 342, 342, 343, 343, 343, 343, 343, 344, ... [128, 129, 130, 126, 127, 128, 129, 130, 124, ... [0.010067902, 0.007827523, 0.005219734, 0.0070... 2.0 1.777428 ... 8.44572 1.100779 1.744658 2.057152 62.520458

640 rows × 15 columns

[9]:
trialobj.data.obsm
[9]:
AxisArrays with keys: ypix, xpix

The .obsm includes the ypix and xpix outputs for each Suite2p ROI, which represent the pixel locations of the ROI mask.

[10]:
print('ypix:', trialobj.data.obsm['ypix'][:5], '\n\nxpix: \t', trialobj.data.obsm['xpix'][:5])
ypix: [array([102, 102, 102, 102, 102, 103, 103, 103, 103, 103, 103, 103, 104,
       104, 104, 104, 104, 104, 104, 104, 105, 105, 105, 105, 105, 105,
       105, 105, 106, 106, 106, 106, 106, 106, 106, 106, 107, 107, 107,
       107, 107, 107, 107, 108, 108, 108, 108, 108])
 array([46, 46, 46, 46, 46, 46, 47, 47, 47, 47, 47, 47, 47, 47, 47, 48, 48,
       48, 48, 48, 48, 48, 48, 49, 49, 49, 49, 49, 49, 49, 49, 50, 50, 50,
       50, 50, 50, 50, 51, 51, 51, 51, 51, 52, 52, 52])
 array([18, 18, 18, 18, 19, 19, 19, 19, 19, 19, 19, 19, 20, 20, 20, 20, 20,
       20, 20, 20, 20, 21, 21, 21, 21, 21, 21, 21, 21, 21, 21, 22, 22, 22,
       22, 22, 22, 22, 22, 22, 22, 23, 23, 23, 23, 23, 23, 23, 23, 24, 24,
       24, 24, 24, 24, 24, 24, 25, 25, 25, 25, 25, 25, 25, 26, 26, 26, 26])
 array([43, 44, 45, 46, 46, 47, 47, 47, 48, 48, 48, 48, 48, 49, 49, 49, 49,
       49, 49, 50, 50, 50, 50, 50, 50, 50, 51, 51, 51, 51, 51, 51, 51, 52,
       52, 52, 52, 52, 52, 52, 52, 52, 52, 52, 52, 53, 53, 53, 53, 53, 53,
       53, 53, 53, 53, 53, 53, 54, 54, 54, 54, 54, 54, 54, 54, 54, 54, 55,
       55, 55, 55, 55, 55, 55, 55, 55, 56, 56, 56, 56, 56, 56, 56, 56, 57,
       57, 57, 57, 57, 57, 57, 58, 58, 58, 58, 58, 59, 59])
 array([156, 156, 156, 156, 156, 157, 157, 157, 157, 157, 157, 157, 157,
       158, 158, 158, 158, 158, 158, 158, 158, 159, 159, 159, 159, 159,
       159, 159, 159, 160, 160, 160, 160, 160, 160, 160, 161, 161, 161,
       161, 161])]

xpix:    [array([457, 458, 459, 460, 461, 456, 457, 458, 459, 460, 461, 462, 455,
       456, 457, 458, 459, 460, 461, 462, 455, 456, 457, 458, 459, 460,
       461, 462, 455, 456, 457, 458, 459, 460, 461, 462, 456, 457, 458,
       459, 460, 461, 462, 457, 458, 459, 460, 461])
 array([116, 117, 118, 119, 120, 121, 114, 115, 116, 117, 118, 119, 120,
       121, 122, 115, 116, 117, 118, 119, 120, 121, 122, 115, 116, 117,
       118, 119, 120, 121, 122, 116, 117, 118, 119, 120, 121, 122, 117,
       118, 119, 120, 121, 118, 119, 120])
 array([202, 203, 204, 205, 200, 201, 202, 203, 204, 205, 206, 207, 200,
       201, 202, 203, 204, 205, 206, 207, 208, 200, 201, 202, 203, 204,
       205, 206, 207, 208, 209, 199, 200, 201, 202, 203, 204, 205, 206,
       207, 208, 200, 201, 202, 203, 204, 205, 206, 207, 200, 201, 202,
       203, 204, 205, 206, 207, 201, 202, 203, 204, 205, 206, 207, 202,
       203, 204, 205])
 array([352, 352, 352, 352, 353, 352, 353, 354, 351, 352, 353, 354, 355,
       350, 351, 352, 353, 354, 355, 350, 351, 352, 353, 354, 355, 356,
       349, 350, 351, 352, 353, 354, 355, 349, 350, 351, 352, 353, 354,
       355, 357, 358, 359, 360, 361, 350, 351, 352, 353, 354, 355, 356,
       357, 358, 359, 360, 361, 352, 353, 354, 355, 356, 357, 358, 359,
       360, 361, 354, 355, 356, 357, 358, 359, 360, 361, 362, 354, 355,
       356, 357, 358, 359, 360, 361, 355, 356, 357, 358, 359, 360, 361,
       357, 358, 359, 360, 361, 358, 359])
 array([382, 383, 384, 385, 386, 380, 381, 382, 383, 384, 385, 386, 387,
       380, 381, 382, 383, 384, 385, 386, 387, 380, 381, 382, 383, 384,
       385, 386, 387, 380, 381, 382, 383, 384, 385, 386, 381, 382, 383,
       384, 385])]
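
Because ypix and xpix are paired pixel coordinates, they can be turned back into a binary ROI mask or summarized as a centroid. The sketch below uses a small made-up ROI and an assumed 8 x 8 field of view; with real data the arrays would come from trialobj.data.obsm and the frame dimensions from the imaging metadata.

```python
import numpy as np

# Made-up pixel coordinates for one small ROI (row index = y, column index = x)
ypix = np.array([2, 2, 3, 3, 3, 4])
xpix = np.array([5, 6, 4, 5, 6, 5])

# Assumed field-of-view size, for illustration only
frame_shape = (8, 8)

# Rebuild the binary mask by setting each (y, x) pair to True
mask = np.zeros(frame_shape, dtype=bool)
mask[ypix, xpix] = True

# Unweighted centroid of the ROI, in (y, x) pixel coordinates
centroid_y, centroid_x = ypix.mean(), xpix.mean()

print(mask.sum(), (centroid_y, centroid_x))
```

The centroid here ignores the per-pixel weights (lam) shown in the obs table; a weighted centroid would use those values as weights in the average.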

variables (temporal synchronization of paq channels and imaging)

The temporal synchronization data of the experiment collected in the .paq output is added to the variable annotations under var. These variables are aligned to the imaging frame clock timings. The total number of variables is the number of imaging frames in the original Flu data input.

[11]:
trialobj.data.var
[11]:
frame_clock x_galvo_uncaging slm2packio markpoints2packio packio2slm packio2markpoints pycontrol_rsync voltage
139577 4.972792 -1.165180 3.329534 0.007827 0.000264 0.000592 0.017035 -0.116807
140252 4.974765 -1.164851 3.325917 0.007827 -0.000394 -0.000394 0.019666 -0.196060
140925 4.971806 -1.165180 3.337755 0.008156 -0.000065 0.000592 0.021310 -0.210858
141595 4.970819 -1.164851 3.331507 0.007827 0.000592 0.000264 0.021310 -0.204939
142267 4.975094 -1.166166 3.340715 0.006841 0.000264 0.001250 0.021639 -0.234206
... ... ... ... ... ... ... ... ...
11136216 4.974436 -1.165837 3.337426 0.006841 -0.000065 0.002565 0.021639 2.423554
11136886 4.975751 -1.164851 3.334138 0.012102 -0.000065 -0.000065 0.017035 2.473211
11137559 4.960953 -1.164851 3.337097 0.008485 0.000264 0.000921 0.019994 2.473211
11138232 4.971477 -1.167153 3.319340 0.006183 -0.000065 -0.000065 0.005525 2.466963
11138904 4.975751 -1.164851 3.329863 0.009471 0.000264 0.000592 0.017693 2.453151

16368 rows × 8 columns
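
Since each row of var corresponds to one imaging frame, a channel in var can be used to pick out frames and slice the data matrix along its second axis. The sketch below shows the idea with plain NumPy and pandas. The index values are taken from the first rows of the table above, but the voltage values, the threshold and the matrix are made up for illustration.

```python
import numpy as np
import pandas as pd

# Frame-aligned annotations: index = paq sample of each frame clock tick
frame_samples = np.array([139577, 140252, 140925, 141595, 142267])
var = pd.DataFrame({"voltage": [-0.12, -0.20, 2.42, 2.47, -0.21]},  # illustrative values
                   index=frame_samples)

# Stand-in data matrix: 2 ROIs x 5 frames
X = np.arange(10, dtype=float).reshape(2, 5)

# Boolean mask over frames where the channel exceeds a hypothetical threshold
high = (var["voltage"] > 1.0).to_numpy()

# Apply the mask along the variables (frames) axis
X_high = X[:, high]
print(X_high)
```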

Creating or Modifying AnnData arrays of trialobj

There are a number of helper functions to create anndata arrays or modify existing anndata arrays.

[12]:
# creating new anndata object. This is identical to the base AnnData library.
# the example below is from the Getting Started Tutorial for AnnData:

# any given anndata object is created from constituent data arrays.


# 1) Primary data matrix
import numpy as np
import pandas as pd

n_rois, n_frames = 10, 10000
X = np.random.random((n_rois, n_frames))  # create random data matrix

df = pd.DataFrame(X, columns=range(n_frames), index=np.arange(n_rois, dtype=int).astype(str))
df  # show the dataframe
[12]:
0 1 2 3 4 ... 9995 9996 9997 9998 9999
0 0.088470 0.594978 0.267262 0.521285 0.427390 ... 0.628226 0.491950 0.023748 0.910001 0.342909
1 0.421657 0.024940 0.345641 0.285778 0.339881 ... 0.282903 0.818589 0.758343 0.068396 0.809684
2 0.642708 0.101187 0.787579 0.822067 0.329221 ... 0.674419 0.082625 0.676742 0.711652 0.515747
3 0.741156 0.563763 0.390991 0.809422 0.628270 ... 0.746982 0.588162 0.203452 0.662033 0.523288
4 0.266626 0.484640 0.430566 0.882055 0.785261 ... 0.655115 0.442506 0.116492 0.861459 0.589859
5 0.695614 0.571977 0.633992 0.706400 0.355071 ... 0.156109 0.222790 0.958219 0.484075 0.236766
6 0.313605 0.101705 0.080710 0.854698 0.220697 ... 0.482442 0.171771 0.278977 0.321641 0.124504
7 0.700918 0.319251 0.173709 0.844428 0.992370 ... 0.493461 0.930643 0.548558 0.948738 0.416265
8 0.670791 0.416993 0.405371 0.213854 0.712764 ... 0.250209 0.956986 0.325717 0.696112 0.219828
9 0.064023 0.667027 0.198786 0.437727 0.811632 ... 0.449985 0.016948 0.336893 0.156778 0.746549

10 rows × 10000 columns

[13]:
#2) Observations matrix

obs_meta = pd.DataFrame({
    'cell_type': np.random.choice(['exc', 'int'], n_rois),
},
    index=np.arange(n_rois, dtype=int).astype(str),    # these are the same IDs of observations as above!
)
obs_meta
[13]:
cell_type
0 exc
1 int
2 int
3 int
4 exc
5 exc
6 int
7 exc
8 exc
9 int
[14]:
#3) Variables matrix


var_meta = pd.DataFrame({
    'exp_group': np.random.choice(['A','B', 'C'], n_frames),
},
    index=np.arange(n_frames, dtype=int).astype(str),    # these are the IDs of the variables (frames), matching the columns of X above
)
var_meta
[14]:
exp_group
0 C
1 B
2 B
3 A
4 A
... ...
9995 A
9996 B
9997 B
9998 B
9999 C

10000 rows × 1 columns

[15]:
#4) Creating a new anndata attribute for the trialobj

import imagingplus.processing.anndata as ad  # from the processing module, import anndata submodule

trialobj.new_anndata = ad.AnnotatedData(X=df,obs=obs_meta, var=var_meta)

print(trialobj.new_anndata)
Created AnnData object:
        Annotated Data of n_obs (# ROIs) × n_vars (# Frames) = 10 × 10000
Annotated Data of n_obs × n_vars = 10 × 10000
available attributes:
        .X (primary datamatrix)
        .obs (obs metadata):
                |- 'cell_type'
        .var (vars metadata):
                |- 'exp_group'
[16]:
# adding an 'obs' to an existing anndata object

# np.random.random_integers is deprecated; np.random.randint excludes the
# upper bound, so 513 keeps the original inclusive range of 0 to 512
cell_loc_x = np.random.randint(0, 513, n_rois)
cell_loc_y = np.random.randint(0, 513, n_rois)

# each new obs needs one value per ROI (n_rois values)
trialobj.new_anndata.add_obs(obs_name='cell_loc_x', values=cell_loc_x)
trialobj.new_anndata.add_obs(obs_name='cell_loc_y', values=cell_loc_y)

print(trialobj.new_anndata)
Annotated Data of n_obs × n_vars = 10 × 10000
available attributes:
        .X (primary datamatrix)
        .obs (obs metadata):
                |- 'cell_type', 'cell_loc_x', 'cell_loc_y'
        .var (vars metadata):
                |- 'exp_group'
[17]:
# deleting an 'obs' from an existing anndata object
# uses the pop method

trialobj.new_anndata.del_obs('cell_type')
print(trialobj.new_anndata)
Annotated Data of n_obs × n_vars = 10 × 10000
available attributes:
        .X (primary datamatrix)
        .obs (obs metadata):
                |- 'cell_loc_x', 'cell_loc_y'
        .var (vars metadata):
                |- 'exp_group'

Note: adding and deleting a ‘var’ in an existing anndata object can be done in exactly the same manner as demonstrated above for ‘obs’, using the .add_var() and .del_var() methods on an anndata object.
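
What adding and deleting a variable annotation amounts to can be sketched with plain pandas, assuming the annotations are held as columns of a pandas DataFrame as in base AnnData. The sketch does not call .add_var() or .del_var(), whose exact signatures are not shown in this tutorial; the column name stim_on and its values are hypothetical.

```python
import numpy as np
import pandas as pd

n_frames = 6
var = pd.DataFrame({"exp_group": ["A", "B", "C", "A", "B", "C"]},
                   index=[str(i) for i in range(n_frames)])

# Adding: a new annotation needs exactly one value per variable (frame)
stim_on = np.array([0, 0, 1, 1, 0, 0], dtype=bool)  # hypothetical annotation
assert len(stim_on) == len(var)
var["stim_on"] = stim_on

# Deleting: pop removes the column and returns it, so it can be kept if needed
removed = var.pop("exp_group")

print(list(var.columns))  # ['stim_on']
```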