-
Notifications
You must be signed in to change notification settings - Fork 9
/
Copy pathpractices.tex
240 lines (189 loc) · 18.8 KB
/
practices.tex
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
% -----------------------------------------------------------------------------
\section{Recommended Best Practices}
\label{sec:bestpractices}
% -----------------------------------------------------------------------------
\subsection{SBOL Versions}
To differentiate between major versions of SBOL, different namespaces are used. For example, SBOL3 has the namespace \url{http://sbols.org/v3#}, while SBOL2 has the namespace \url{http://sbols.org/v2#}. These different versions of SBOL SHOULD NOT be semantically mixed. For example, an SBOL 3.x \sbol{SubComponent} SHOULD NOT refer to an SBOL 2.x \external{ComponentInstance}, and, likewise, an SBOL 2.x \external{ComponentInstance} SHOULD NOT refer to an SBOL 1.x \external{DnaComponent}.
\subsection{Compliant SBOL Objects}
\label{sec:compliant}
Maintaining unique URIs for all SBOL objects can be challenging. To reduce this burden, users of SBOL 3.x are encouraged to follow a few simple rules when constructing the URIs and related properties for SBOL objects. When these rules are followed in constructing an SBOL object, we say that this object is \emph{compliant}. These rules are as follows:
Compliant URIs for \sbol{TopLevel} objects MUST conform to the following pattern:
\begin{quotation}
\refObj{namespace}/\refObj{collection\_structure}/\refObj{displayId}
\end{quotation}
The \refObj{namespace} token MAY further decompose into \refObj{domain}/\refObj{root} tokens. The \refObj{root} and \refObj{collection\_structure} tokens may optionally be omitted; alternatively, they may consist of an arbitrary number of delimiter-separated layers. Note that this pattern means that SBOL-compliant \sbol{URI}s can be automatically decomposed with the aid of a \sbol{Namespace}. SBOL-compliant objects can be easily remapped into new namespaces by changing only the \refObj{namespace}.
Consider, for example, the SBOL-compliant \sbol{URI}:
\begin{quote}``https://synbiohub.org/igem/2017\_distribution/promoters/constitutive/BBa\_J23101''\end{quote}
in \sbol{Namespace} ``https://synbiohub.org/igem/2017\_distribution''.
This \sbol{URI} can be decomposed as follows:
\begin{quote}
namespace: ``https://synbiohub.org/igem/2017\_distribution'' \linebreak
domain: ``https://synbiohub.org'' \linebreak
root: ``igem/2017\_distribution'' \linebreak
collection: ``promoters/constitutive'' \linebreak
displayId: ``BBa\_J23101'' \linebreak
\end{quote}
SBOL-compliant URIs also facilitate auto-construction of child objects with unique \sbol{URI}s.
Child objects of \sbol{TopLevel} objects with compliant \sbol{URI}s MUST conform to the following pattern:\\ ``\refObj{parent\_uri}/\refObj{child\_type}\refObj{child\_type\_counter}'' where the \refObj{parent\_uri} refers to the URI of the parent object, the \refObj{child\_type} refers to the SBOL class of the child object, and \refObj{child\_type\_counter} is a unique index for the child object.
The \refObj{child\_type\_counter} of a new object SHOULD be calculated at time of object creation as 1 + the maximum \refObj{child\_type\_counter} for each \refObj{child\_type} object in the parent (e.g., ``\refObj{parent\_uri}/SequenceAnnotation37'').
Note that numbering is independent for each type, so a \sbol{Component} can have children ``SubComponent37'' and ``Constraint37''.
All examples in this specification use compliant \sbol{URI}s.
\subsection{Versioning SBOL Objects}
SBOL 3.x does not specify an explicit versioning scheme. Rather it is left for experimentation across different tools. This allows version information to be included in the root (e.g., GitHub style: ``igem/HEAD/''), collection structure (e.g., ``promoters/constitutive/2/''), in tool-specific conventions on \sbol{displayId} (e.g., ``BBa\_J23101\_v2'') or in information outside of the \sbol{URI} (e.g., by attaching \external{prov:wasRevisionOf} properties).
\subsection{Annotations: Embedded Objects vs. External References}
When annotating an SBOL document with additional information, there are
two general methods that can be used:
\begin{itemize}
\item Embed the information in the SBOL document using properties outside of the SBOL namespace.
\item Store the information separately and annotate the SBOL document with \sbol{URI}s that point to it.
\end{itemize}
In theory, either method can be used in any case. (Note that a third case not
discussed here is to annotate external objects with links
to SBOL documents, rather than annotating SBOL documents with links to external objects.)
In practice,
embedding large amounts of non-SBOL data into SBOL documents is likely
to cause problems for people and software tools trying to manage and
exchange such documents. Therefore, it is RECOMMENDED that small amounts of information (e.g., design notes or preferred graphical layout) be embedded in the SBOL model, while large amounts of information (e.g., the contents of the scientific publication from which a model was derived or flow cytometry data that characterizes performance) be linked with URIs pointing to external resources. The boundary between ``small'' and ``large'' is left deliberately vague, recognizing that it will likely depend on the particulars of a given SBOL application.
\subsection{Completeness and Validation}
RDF documents containing serialized SBOL objects might or might not be
entirely self-contained. A SBOL document is self-contained or ``complete'' if every SBOL object referred to in the document is contained in the document. It is RECOMMENDED that serializations be complete whenever practical. In order words, when serializing an SBOL object, serialize all of the other objects that it points to, then serialize all of the other objects that these objects point to, etc., until the document is complete.
It is important to note that there is no guarantee that an RDF document
contains valid SBOL. When SBOL objects are read from an RDF document,
the program doing so SHOULD verify that all of the property
values encoded therein have the correct data type (e.g., that the object
pointed to by the \sbol{Sequence} property of a
\sbol{Component} is really a \sbol{Sequence}).
For complete files, this validation can be carried out entirely locally. For files that are not complete, an implementation either needs to have a means of validating those external references (e.g., by
retrieving them from a repository), or it needs to mark them as
unverified and not depend on their correctness.
\subsection{Recommended Ontologies for External Terms}
\label{sec:recomm_ontologies}
External ontologies and controlled vocabularies are an integral part of SBOL. SBOL uses \sbol{URI}s to access existing biological information through these resources.
New SBOL-specific terms are defined only when necessary.
For example, \sbol{Component} \sbolmult{type:C}{type}s, such as DNA or protein, are described using Systems Biology Ontology (SBO) terms. Similarly, the \sbolmult{role:C}{role}s of a DNA or RNA \sbol{Component} are described via Sequence Ontology (SO) terms. Although RECOMMENDED ontologies have been indicated in relevant sections where possible, other resources providing similar terms can also be used. A summary of these external sources can be found in \ref{tbl:preferred_external_resources}.
\begin{table}[htp]
\begin{edtable}{tabular}{p{2cm}p{1.5cm}p{5cm}p{6cm}}
\toprule
\textbf{SBOL Entity} & \textbf{Property} & \textbf{Preferred External Resource}
& \textbf{More Information} \\
\midrule
\textbf{Component} & type & SBO (physical entity branch)& \url{http://www.ebi.ac.uk/sbo/main/}\\
& type & SO (nucleic acid topology)& \url{http://www.sequenceontology.org}\\
& role & SO (\textit{DNA} or \textit{RNA}) & \url{http://www.sequenceontology.org} \\
& role & CHEBI (\textit{small molecule}) & \url{https://www.ebi.ac.uk/chebi/} \\
& role & UniProt (\textit{protein}) & \url{https://www.uniprot.org/} \\
& role & NCIT (\textit{samples}) & \url{https://ncithesaurus.nci.nih.gov/} \\
\textbf{Interaction} & type & SBO (occurring entity branch) &
\url{http://www.ebi.ac.uk/sbo/main/} \\
\textbf{Participation} & role & SBO (participant roles branch) &
\url{http://www.ebi.ac.uk/sbo/main/} \\
\textbf{Model} & language & EDAM & \url{http://bioportal.bioontology.org/ontologies/EDAM} \\
& framework & SBO (modeling framework branch) &
\url{http://www.ebi.ac.uk/sbo/main/} \\
\textbf{om:Measure} & type & SBO (systems description parameters) &
\url{http://www.ebi.ac.uk/sbo/main/} \\
\bottomrule
\end{edtable}
\caption{Preferred external resources from which to draw values for various SBOL properties.}
\label{tbl:preferred_external_resources}
\end{table}
The URIs for ontological terms SHOULD come from identifiers.org. However, it is acceptable to use terms from purl.org as an alternative, for example when RDF tooling requires URIs to be represented as compliant QNames. SBOL software may convert between these forms as required.
\subsection{Annotating Entities with Date \& Time}\label{sec:DateTime}
Entities in an SBOL document can be annotated with creation and modification dates. It is RECOMMENDED that predicates, or properties, from DCMI Metadata Terms SHOULD be used to include date and time information. The \texttt{created} and \texttt{modified} terms SHOULD respectively be used to annotate SBOL entities with creation and modification dates. Date and time values SHOULD be expressed using the XML Schema \texttt{DateTime} datatype~\citep{Biron2004}. For example, ``\texttt{2016-03-16T20:12:00Z}'' specifies that the day is 16 March 2016 and the time is 20:12pm in UTC (Coordinated Universal Time).
\subsection{Annotating Entities with Authorship information}\label{sec:Authorship}
Authorship information should ideally be added to \sbol{TopLevel} entities where possible. It is RECOMMENDED that the \texttt{creator} DCMI Metadata term SHOULD be used to annotate SBOL entities with authorship information using free text. This property can be repeated for each author.
%The example below shows the use of this property for two authors and the values shown are free text \texttt{String} literals.
\subsection{Host Context / Ontologies for Experiments}
\subsubsection{Mixtures via Components}
Any \sbol{Component} can be interpreted as specifying a mixture of the material entity (SBO:0000240) \sbol{Feature}s that it includes. The amount of each such instance included in the mixture SHOULD be specified by attaching a \om{Measure} with a \sbolmult{type:Measure}{type} set to the appropriate SBO term. The SBO terms that are RECOMMENDED as appropriate are members of the Systems Description Parameter (SBO:0000545) branch of SBO. Examples include:
\begin{itemize}
\item SBO:0000540: fraction of an entity pool (e.g., 1/3 CHO cells, 2/3 HEK cells)
\item SBO:0000472: molar concentration of an entity (e.g., 1 mM Arabinose)
\item SBO:0000361: amount of an entity pool (e.g., 200 uL M9 media)
\end{itemize}
Mixtures MAY be defined recursively, as mixtures of mixtures of mixtures, etc.
\subsubsection{Media, Inducers, and Other Reagents}
Each reagent, whether ``atomic'' (e.g., rainbow bead control) or mixture (e.g., M9 media), SHOULD be represented as a \sbol{Component}.
The roles of reagents may vary in context: for example, Arabinose may serve as an inducer or as a media carbon source. As such, role SHOULD be indicated by an NCI Thesaurus (NCIT) term in a \sbolmult{role:F}{role} property of the \sbol{SubComponent}. Examples include:
\begin{itemize}
\item NCIT:C64356: Positive Control
\item NCIT:C48694: Cell
\item NCIT:C85504: Media
\item NCIT:C14419: Strain
\item NCIT:C120268: Inducer
\end{itemize}
For more information on representing cells, strains, plasmids, and genomes, see \ref{bp:cells}
%\todo[inline]{Should we switch the GO term recommendation below to match the NCIT recommendation below?}
\subsubsection{Samples}
A complete specification of a sample SHOULD be a \sbol{Component} that includes at least:
\begin{itemize}
\item A \sbol{SubComponent} instantiating each strain in the sample
\item A \sbol{SubComponent} for the media or buffer
\item A \sbol{SubComponent} for each additional reagent added to the media (e.g., inducers, antibiotics)
\item \om{Measure}s on each of these specifying the amount in the sample
\item \om{Measure}s on the \sbol{Component} for each environmental parameter (e.g., temperature, pH, culturing time)
\end{itemize}
\subsubsection{Other Experimental Parameters}
In order to deal with parameters associated with the context in general but not specific instances, e.g., temperature, pH, total sample volume, the \sbol{hasMeasure} property of \sbol{Identified} can be used. The \sbol{hasMeasure} of a \sbol{Component} provides context-free information (e.g., the pH of M9 media, the GC-content of a GFP coding sequence), while the \sbol{hasMeasure} of a material entity (SBO:0000240) \sbol{Feature} provides a measurement in context (e.g., the dosage of Arabinose in a sample).
Values of these parameters SHOULD be specified by attaching a \om{Measure} with a \sbolmult{type:Measure}{type} set to the appropriate SBO term. The SBO terms that are RECOMMENDED as appropriate are members of the Systems Description Parameter (SBO:0000545) branch of SBO. Examples include:
\begin{itemize}
\item SBO:0000147: thermodynamic temperature (e.g., culturing at 27 C)
\item SBO:0000332: half-life of an exponential decay (e.g., decay rate of a gRNA)
\item SBO:0000304: pH (e.g., pH of M9 media)
\end{itemize}
\subsection{Multicellular System Designs}
SBOL has been used extensively to represent designs in homogeneous systems, where the same design is implemented in every cell. However, in recent years there has been increasing interest in multicellular systems, where biological designs are split across multiple cells to optimize the system behavior and function. Therefore, there is a need to define a set of best practices so that multicellular systems can be captured using SBOL in a standard way.
\subsubsection{Representing Cell Types}
\label{bp:cells}
To represent multicellular systems using SBOL, it is first necessary to represent cells.
When doing so, it is important to be able to capture the following information: (i) taxonomy of the strain used, (ii) interactions occurring within cells of this type, and (iii) components inside the type of cell (e.g. genomes, plasmids).
The approach RECOMMENDED in this section is capable of capturing this information, as shown in the example in \ref{uml:cell_representation}.
It uses a \sbol{Component} to represent a system that contains cells of the given type.
The cells themselves are represented by a \sbol{SubComponent} inside the \sbol{Component}, which is an \sbol{instanceOf}
a \sbol{Component} capturing information about the species and strain of the cell in the design.
This \sbol{Component} has a \sbolmult{type:C}{type} of ``cell'' from the Gene Ontology (GO:0005623), and a \sbolmult{role:C}{role} of ``physical compartment'' (SBO:0000290).
%\todo[inline]{What property are we actually recommending to use for the annotation?}
Taxonomic information is captured by annotating the class instance with a URI for an entry in the NCBI Taxonomy Database.
As usual, other entities besides the cell that are relevant to the design are also captured as \sbol{Feature}s.
When these are contained within the cell, they are captured using a \sbol{Constraint} with restriction \texttt{contains} with the cell as \sbol{subject} and contained object as \sbol{object}.
Interactions which occur in this system are captured using the \sbol{Interaction} and \sbol{Participation} classes.
Interactions which occur within the cell are specified by \sbol{Interaction} classes which contain the \sbol{SubComponent} instance representing the cell as a \sbol{participant} with a \sbolmult{role:P}{role} of ``physical compartment'' (SBO:0000290).
\begin{figure}[htp]
\begin{center}
\includegraphics[width=\textwidth]{uml/cell_representation}
\caption[Repressenting a cell]{This is a proposed approach for capturing cell designs in SBOL. A \sbol{Component} annotated with a URI pointing to an entry in the NCBI Taxonomy Database is used to capture information about the cell's strain/species.
The \sbol{Component} has a type of ``Cell'' from the Gene Ontology (GO), and a role of ``physical compartment''.
Another \sbol{Component} is used to represent a system in which the cell is implemented.
Entities, including the cell, are instantiated as \sbol{SubComponent}s, and processes are captured using the \sbol{Interaction} class.
Processes that are contained within the cell are represented by including the cell as a participant with a role of ``physical compartment''. }
\label{uml:cell_representation}
\end{center}
\end{figure}
\subsubsection{Multiple Cell Types in a Single Design}
The same approach can be extended to represent systems with multiple types of cells.
The multicellular system can be represented as a \sbol{Component} that includes each strain of cell as a \sbol{SubComponent} that is an \sbol{instanceOf} a \sbol{Component} defining its strain.
Interactions and constraints, such as a molecule that both strains interact with, are implemented using \sbol{ComponentReference}s to link to the definitions within each cell system description.
An example is shown in \ref{uml:multiple_cell_representation}.
\begin{figure}[htp]
\begin{center}
\includegraphics[width=\textwidth]{uml/two_cell_representation}
\caption[]{Captured here is a design involving two cells which both interact with the small molecule ``Molecule A''.
Designs for the sender and receiver systems are captured using constraint to show that each of these cells interacts with the Molecule A contained within it.
The overall multicellular system is represented by a \sbol{Component} with a \sbolmult{role:C}{role} of ``functional compartment'', which is an SBO term.
The two systems are included in this multicellular design as \sbol{SubComponent}s, and the fact that Molecule A is shared between systems is indicated with a constraint.}
\label{uml:multiple_cell_representation}
\end{center}
\end{figure}
\subsubsection{Cell Ratios}
The proportion of cell types present in a multicellular system can be captured using \om{Measure} on the representations of cells in the design.
As a best practice, the value of these measure classes is a percentage less than or equal to 100\%, representing the amount of a cell type present in the system compared to all other cell types present.
Therefore, the sum of all these values specified in the system will typically be equal to 100\%, though this may not be the case if the system is not completely defined.
An example is shown in \ref{uml:cell_ratios}.
\begin{figure}[htp]
\begin{center}
\includegraphics[width=\textwidth]{uml/cell_ratios}
\caption[]{Annotating class instances with cellular proportions. Instances of the Measure class are used to capture the percentage of each cell type present in the multicellular system design.
}
\label{uml:cell_ratios}
\end{center}
\end{figure}