Synthetic biology has reached the same inflection point that computer science reached in the 1950s.
The foundational pieces are emerging in the form of standardized DNA parts, packaged for combinatorial assembly using standards such as BioBricks.
But:
- Creating new DNA parts is tedious and time-consuming.
- Constructing systems from sets of DNA parts is an ad hoc, manual process that limits the size, complexity, and capability of the resulting systems.
The time is right for automation of biological design.
We work at the intersection of synthetic biology and computer science. We have introduced and pursued a vision of a toolchain that stretches from high-level languages to cellular implantation of genetic circuits. We have developed tools for high-level design and data representation. Our efforts have also focused on the reproducibility of results, improving device libraries, and high-precision prediction.
We apply lessons learned and techniques from computer science and artificial intelligence to synthetic biology. Engineering practices such as libraries of parts, modularization, standards and interfaces, computer aided design, and AI techniques can advance the capabilities of synthetic biology.
- built the first end-to-end toolchain for synthetic biology design automation including the BioCompiler that outperforms human designers.
- active participant in the Synthetic Biology Open Language (SBOL) that serves as a hub for linking many different synthetic biology resources.
- led the iGEM interlab study that examines characterization and reproducibility of results across hundreds of laboratories.
- developed a calibrated flow cytometry method to measure, compare, and combine biological circuit components, which enabled RTX BBN Technologies' high-precision quantitative prediction software to predict more accurately.
Our research spans the boundary between academia and industry and between data-driven/AI methods and wet-lab investigations.
Areas of work include:
- Calibrated measurement and characterization of genetic devices
- Engineering of high-performance genetic regulation devices
- Representation and sharing of designs, protocols, and data
- Sequence-based detection of pathogens, toxins, and signs of engineering
- Modeling and manipulating the 3D growth and differentiation of cells
- Applied capabilities of multi-organism communities
- Advanced biological detection methods
Synthetic biology is important for diverse applications, including:
- new medical diagnostics and therapies,
- environmental remediation and sensing, and
- chemical production or detection.
Under DARPA’s Bio Reporters for Subterranean Surveillance program we are exploring a method to use naturally occurring fungus to detect buried TNT. By using natural soil fungal webs to propagate engineered bacteria underground and send signals back up to the surface, we plan to create a warning system that glows under ultraviolet light to indicate the presence of buried TNT.
Under DARPA’s Friend or Foe program, scientists at Raytheon Technologies are developing a portable device to detect bacteria and evaluate their potential to cause harm as soon as, or even before, they pose a threat to civilian and military populations. This system will characterize bacteria quickly by examining their behavior, whereas current surveillance techniques don’t work on undiscovered bacterial strains or on bacteria engineered to evade detection.
Right now, there is no technology that can quickly detect engineered microorganisms. For the Intelligence Advanced Research Projects Activity (IARPA), we’re developing a system that adapts proven microfluidics hardware and uses our proven cybersecurity techniques to identify microorganisms based on their DNA sequences.
Related Solutions
FAST-NA
Publications
pySBOL3: SBOL3 for Python Programmers
Abstract
The Synthetic Biology Open Language version 3 (SBOL3) provides a data model for representation of synthetic biology information across multiple scales and throughout the design-build-test-learn workflow. To support practical use of this data model, we have developed pySBOL3, a Python library that allows programmers to create and edit SBOL3 documents. Here we describe this library and key engineering decisions in its design. The resulting implementation is a compact and maintainable core that provides both a familiar, pythonic interface for manipulating SBOL3 objects as well as mechanisms for building additional extensions and representations on this base.
Tyto: A Python Tool Enabling Better Annotation Practices for Synthetic Biology Data-Sharing
Abstract
As synthetic biology becomes increasingly automated and data-driven, tools that help researchers implement FAIR (findable-accessible-interoperable-reusable) data management practices are needed. Crucially, in order to support machine processing and reusability of data, it is important that data artifacts are appropriately annotated with metadata drawn from controlled vocabularies. Unfortunately, standardized annotation practices are difficult for many research groups to adopt, given the set of specialized database science skills usually required to interface with ontologies. In response to this need, Take Your Terms from Ontologies (Tyto) is a lightweight Python tool that supports the use of controlled vocabularies in everyday scripting practice. While Tyto has been developed for synthetic biology applications, its utility may extend to users working in other areas of bioinformatics research as well.
Round Trip: An Automated Pipeline for Experimental Design, Execution, and Analysis
Abstract
Synthetic biology is a complex discipline that involves creating detailed, purpose-built designs from genetic parts. This process is often phrased as a Design-Build-Test-Learn loop, where iterative design improvements can be made, implemented, measured, and analyzed. Automation can potentially improve both the end-to-end duration of the process and the utility of data produced by the process. One of the most important considerations for the development of effective automation and quality data is a rigorous description of implicit knowledge encoded as a formal knowledge representation. The development of knowledge representation for the process poses a number of challenges, including developing effective human–machine interfaces, protecting against and repairing user error, providing flexibility for terminological mismatches, and supporting extensibility to new experimental types. We address these challenges with the DARPA SD2 Round Trip software architecture.
Building an Open Representation for Biological Protocols
Abstract
Laboratory protocols are critical to biological research and development, yet difficult to communicate and reproduce across projects, investigators, and organizations. While many attempts have been made to address this challenge, there is currently no available protocol representation that is unambiguous enough for precise interpretation and automation, yet simultaneously abstract enough to enable reuse and adaptation. The Protocol Activity Markup Language (PAML) is a free and open protocol representation aiming to address this gap, building on a foundation of UML, Autoprotocol, and SBOL RDF. PAML provides a representation both for protocols and for records of their execution and the resulting data, as well as a framework for exporting from PAML for execution by either humans or laboratory automation. PAML is currently implemented in the form of an RDF knowledge representation, specification document, and Python library, can be exported for execution as either a manual "paper protocol" or Autoprotocol, and is being further developed as an open community effort.
Abstract
Computational tools addressing various components of design-build-test-learn loops (DBTL) for the construction of synthetic genetic networks exist, but do not generally cover the entire DBTL loop. This manuscript introduces an end-to-end sequence of tools that together form a DBTL loop called DART (Design Assemble Round Trip). DART provides rational selection and refinement of genetic parts to construct and test a circuit. Computational support for experimental process, metadata management, standardized data collection, and reproducible data analysis is provided via the previously published Round Trip (RT) test-learn loop. The primary focus of this work is on the Design Assemble (DA) part of the tool chain, which improves on previous techniques by screening up to thousands of network topologies for robust performance using a novel robustness score derived from dynamical behavior based on circuit topology only.
Computational Prediction of Synthetic Circuit Function Across Growth Conditions
Abstract
A challenge in the design and construction of synthetic genetic circuits is that they will operate within biological systems that have noisy and changing parameter regimes that are largely unmeasurable. The outcome is that these circuits do not operate within design specifications or have a narrow operational envelope in which they can function. This behavior is often observed as a lack of reproducibility in function from day to day or lab to lab. Moreover, this narrow range of operating conditions does not promote reproducible circuit function in deployments where environmental conditions for the chassis are changing, as environmental changes can affect the parameter space in which the circuit is operating. Here we describe a computational method for assessing the robustness of circuit function across broad parameter regions. Previously designed circuits are assessed by this computational method and then circuit performance is measured across multiple growth conditions in budding yeast. The computational predictions are correlated with experimental findings, suggesting that the approach has predictive value for assessing the robustness of a circuit design.
Highly-Automated, High-Throughput Replication of Yeast-based Logic Circuit Design Assessments
Abstract
We describe an experimental campaign that replicated the performance assessment of logic gates engineered into cells of S. cerevisiae by Gander, et al. Our experimental campaign used a novel high throughput experimentation framework developed under DARPA’s Synergistic Discovery and Design (SD2) program: a remote robotic lab at Strateos executed a parameterized experimental protocol. Using this protocol and robotic execution, we generated two orders of magnitude more flow cytometry data than the original experiments. We discuss our results, which largely, but not completely, agree with the original report, and make some remarks about lessons learned.
Fungal highways enable migration and communication of engineered bacteria in soil
Abstract
The soil microbiome is essential for natural chemical cycles, contains immense biosynthetic capacity, and interfaces with civilization through agriculture, the built environment, and national defense. Understanding and harnessing the soil microbiome is therefore critical on multiple fronts. One potential use of the soil microbiome is quantification of soil chemical state at depth across broad areas. However, inducible circuits that function in soil and delivery of circuits underground is currently impossible – for example, if the typical genetic circuit host E. coli even survives in soil it cannot penetrate into the soil or maintain burdensome plasmids for long. However, the study of natural soil microbiomes has revealed that bacterial migration and long-distance chemical signaling occurs naturally, facilitated by filamentous fungal highways. If these properties could be harnessed, then delivery and sensing underground would be possible.
Engineered yeast genomes accurately assembled from pure and mixed samples
Abstract
Yeast whole genome sequencing (WGS) lacks end-to-end workflows that identify genetic engineering. Here we present Prymetime, a tool that assembles yeast plasmids and chromosomes and annotates genetic engineering sequences. It is a hybrid workflow—it uses short and long reads as inputs to perform separate linear and circular assembly steps. This structure is necessary to accurately resolve genetic engineering sequences in plasmids and the genome. We show this by assembling diverse engineered yeasts, in some cases revealing unintended deletions and integrations. Furthermore, the resulting whole genomes are high quality, although the underlying assembly software does not consistently resolve highly repetitive genome features. Finally, we assemble plasmids and genome integrations from metagenomic sequencing, even with 1 engineered cell in 1000. This work is a blueprint for building WGS workflows and establishes WGS-based identification of yeast genetic engineering.
Curation Principles derived from the Analysis of the SBOL iGEM Data Set
Abstract
As an engineering endeavor, synthetic biology requires effective sharing of genetic design information that can be reused in the construction of new designs. While there are a number of large community repositories of design information, curation of this information has been limited. This in turn limits the ways in which design information can be put to use. The aim of this work was to improve this situation by creating a curated library of parts from the International Genetically Engineered Machines (iGEM) registry data set. To this end, an analysis of the Synthetic Biology Open Language (SBOL) version of the iGEM registry was carried out using four different approaches (simple statistics, SnapGene auto annotation, SYNBICT auto annotation, and expert analysis), the results of which are presented herein. Key challenges encountered include the use of free text, insufficient part provenance, part duplication, lack of part removal, and insufficient continuous curation. On the basis of these analyses, the focus has shifted from the creation of a curated iGEM part library to instead the extraction of a set of lessons, which are presented here. These lessons can be exploited to facilitate the creation and curation of other part libraries using a simpler and less labor intensive process.
Stability and Resilience of Distributed Information Spreading in Aggregate Computing
Abstract
Spreading information through a network of devices is a core activity for most distributed systems. As such, self-stabilizing algorithms implementing information spreading are one of the key building blocks enabling aggregate computing to provide resilient coordination in open complex distributed systems. This paper improves a general spreading block in the aggregate computing literature by making it resilient to network perturbations, establishes its global uniform asymptotic stability and proves that it is ultimately bounded under persistent disturbances. The ultimate bounds depend only on the magnitude of the largest perturbation and the network diameter, and three design parameters trade off competing aspects of performance. For example, as in many dynamical systems, values leading to greater resilience to network perturbations slow convergence and vice versa.
Intent Parser: A Tool for Codification and Sharing of Experimental Design
Abstract
Communicating information about experimental design among a team of collaborators is challenging because different people tend to describe experiments in different ways and with different levels of detail. Sometimes, humans can interpret missing information by making assumptions and drawing inferences from information already provided. Doing so, however, is error-prone and typically requires a high level of interpersonal communication. In this paper, we present a tool that addresses this challenge by providing a simple interface for incremental formal codification of experiment designs. Users interact with a Google Docs word-processing interface with structured tables, backed by assisted linking to machine-readable definitions in a data repository (SynBioHub) and specification of available protocols and requests for execution in the Open Protocol Interface Language (OPIL). The result is an easy-to-use tool for generating machine-readable descriptions of experiment designs with which users in the DARPA SD2 program have collected data from 80,208 samples using a variety of protocols and instruments over the course of 181 experiment runs.
Abstract
Microphysiological organ-on-chip models offer the potential to improve the prediction of drug safety and efficacy through recapitulation of human physiological responses. The importance of including multiple cell types within tissue models has been well documented. However, the study of cell interactions in vitro can be limited by complexity of the tissue model and throughput of current culture systems. Here, we describe the development of a co-culture microvascular model and relevant assays in a high-throughput thermoplastic organ-on-chip platform, PREDICT96. The system consists of 96 arrayed bilayer microfluidic devices containing retinal microvascular endothelial cells and pericytes cultured on opposing sides of a microporous membrane.
Abstract
Drug development suffers from a lack of predictive and human-relevant in vitro models. Organ-on-chip (OOC) technology provides advanced culture capabilities to generate physiologically appropriate, human-based tissue in vitro, therefore providing a route to a predictive in vitro model. However, OOC technologies are often created at the expense of throughput, industry-standard form factors, and compatibility with state-of-the-art data collection tools. Here we present an OOC platform with advanced culture capabilities supporting a variety of human tissue models including liver, vascular, gastrointestinal, and kidney. The platform has 96 devices per industry standard plate and compatibility with contemporary high-throughput data collection tools. Specifically, we demonstrate programmable flow control over two physiologically relevant flow regimes: perfusion flow that enhances hepatic tissue function and high-shear stress flow that aligns endothelial monolayers.
Synthetic Biology Curation Tools (SYNBICT)
Abstract
Much progress has been made in developing tools to generate component-based design representations of biological systems from standard libraries of parts. Most biological designs, however, are still specified at the sequence level. Consequently, there exists a need for a tool that can be used to automatically infer component-based design representations from sequences, particularly in cases when those sequences have minimal levels of annotation. Such a tool would assist computational synthetic biologists in bridging the gap between the outputs of sequence editors and the inputs to more sophisticated design tools, and it would facilitate their development of automated workflows for design curation and quality control. Accordingly, we introduce Synthetic Biology Curation Tools (SYNBICT), a Python tool suite for automation-assisted annotation, curation, and functional inference for genetic designs. We have validated SYNBICT by applying it to genetic designs in the DARPA Synergistic Discovery & Design (SD2) program and the International Genetically Engineered Machines (iGEM) 2018 distribution. Most notably, SYNBICT is more automated and parallelizable than manual design editors, and it can be applied to interpret existing designs instead of only generating new ones.
Synthetic biology open language visual (SBOL visual) version 3.0
Abstract
People who engineer biological organisms often find it useful to draw diagrams in order to communicate both the structure of the nucleic acid sequences that they are engineering and the functional relationships between sequence features and other molecular species. Some typical practices and conventions have begun to emerge for such diagrams. SBOL Visual aims to organize and systematize such conventions in order to produce a coherent language for expressing the structure and function of genetic designs. This document details version 3.0 of SBOL Visual, a new major revision of the standard. The major difference between SBOL Visual 3 and SBOL Visual 2 is that diagrams and glyphs are defined with respect to the SBOL 3 data model rather than the SBOL 2 data model. A byproduct of this change is that the use of dashed undirected lines for subsystem mappings has been removed, pending future determination on how to represent general SBOL 3 constraints; in the interim, this annotation can still be used as an annotation. Finally, deprecated material has been removed from collection of glyphs: the deprecated “insulator” glyph and “macromolecule” alternative glyphs have been removed, as have the deprecated BioPAX alternatives to SBO terms.
Towards collaborative and automated development of resources for data standards in synthetic biology
Abstract
Data standards in synthetic biology are becoming ever more important as the number of tools addressing different needs increases, such as designing genetic circuits and visualizing and storing the designs. The Synthetic Biology Open Language (SBOL) [4] has been developed to provide a mechanism for the electronic exchange and common understanding of these designs and related information. Moreover, SBOL Visual [1] standardizes the representation of genetic circuit designs via well-defined glyphs.
Excel-SBOL Converter: Creating SBOL from Excel Templates and Vice Versa
Abstract
Synthetic biology is bringing together engineers and biologists [10]. Associated with this interdisciplinary movement is the need for reusable tools that supplement the current understanding of genetic sequences. To satisfy this need, Synthetic Biology communities across the world have developed tools and ontologies to help describe their unique semantic annotations [1, 3–9, 13, 14, 17–19, 22]. Shared representations for data and metadata, grounded in well-defined ontology terms, can help reduce confusion when sharing materials between practitioners [20]. The Synthetic Biology Open Language (SBOL) [5] is one of the approaches that has been developed to address this challenge. SBOL provides a standardized format for the electronic exchange of information on the structural and functional aspect of biological designs, supporting use of engineering principles of abstraction, modularity, and standardization in synthetic biology. Many tools have been created that work with SBOL, including the SynBioHub repository software for storing and sharing designs [12].
Data Representation in the DARPA SD2 Program
Abstract
Modern scientific enterprises are often highly complex and multidisciplinary, particularly in areas like synthetic biology where the subject at hand is itself inherently complex and multidisciplinary. Collaboration across many organizations is necessary to efficiently tackle such problems [6, 15], but remains difficult. The challenge is further amplified by automation that increases the pace at which new information can be produced, and particularly so for matters of fundamental research, where concepts and definitions are inherently fluid and may rapidly change as an investigation evolves [7].
Cyberbiosecurity and Public Health in the Age of COVID-19
Abstract
Introduction Cyber biosecurity, the aspect of biosecurity involving the digital representation of biological data, had already been emerging as a matter of public concern even prior to the onset of the COVID-19 pandemic. Key issues of concern include, among others, the privacy of patient data, the security of public health databases, the integrity of diagnostic test data, the integrity of public biological databases, the security implications of automated laboratory systems and the security of proprietary biological engineering advances.
Effect of Monotonic Filtering on Graph Collection Dynamics
Abstract
Distributed data collection is a fundamental task in open systems. In such networks, data is aggregated across a network to produce a single aggregated result at a source device. Though self-stabilizing, algorithms performing data collection can produce large overestimates of aggregates in the transient phase. For example, in [1] we demonstrated that in a line graph, a switch of sources after initial stabilization may produce overestimates that are quadratic in the network diameter. We also proposed monotonic filtering as a strategy for removing such large overestimates. Monotonic filtering prevents the transfer of data from device A to device B unless the distance estimate at A is more than that at B at the previous iteration.
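The filtering rule stated in the last sentence can be sketched as a simple predicate. This is a minimal illustration rather than the paper's implementation: the function names, the dictionary of distance estimates, and the three-device line graph are all hypothetical, and the sketch assumes both distance estimates are taken from the previous iteration.

```python
def may_transfer(prev_distance, sender, receiver):
    """Return True if data may flow from sender to receiver.

    Monotonic filtering as described above: the transfer is allowed only
    when the sender's distance estimate is strictly greater than the
    receiver's. Assumption: both estimates come from the previous iteration.
    """
    return prev_distance[sender] > prev_distance[receiver]


def allowed_senders(prev_distance, neighbors, device):
    """List the neighbors whose data the device would accept this round."""
    return [n for n in neighbors[device] if may_transfer(prev_distance, n, device)]


# Hypothetical line graph 0 - 1 - 2, with the source at device 0.
prev_distance = {0: 0.0, 1: 1.0, 2: 2.0}
neighbors = {0: [1], 1: [0, 2], 2: [1]}

for device in sorted(neighbors):
    print(device, allowed_senders(prev_distance, neighbors, device))
```

In this example data can only move toward the source: device 1 accepts from device 2 but not from device 0, and device 2 accepts from no one.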
Abstract
Many synthetic gene circuits are restricted to single-use applications or require iterative refinement for incorporation into complex systems. One example is the recombinase-based digitizer circuit, which has been used to improve weak or leaky biological signals. Here we present a workflow to quantitatively define digitizer performance and predict responses to different input signals. Using a combination of signal-to-noise ratio (SNR), area under a receiver operating characteristic curve (AUC), and fold change (FC), we evaluate three small-molecule inducible digitizer designs demonstrating FC up to 508x and SNR up to 3.77 dB. To study their behavior further and improve modularity, we develop a mixed phenotypic/mechanistic model capable of predicting digitizer configurations that amplify a synNotch cell-to-cell communication signal (ΔSNR up to 2.8 dB). We hope the metrics and modeling approaches here will facilitate incorporation of these digitizers into other systems while providing an improved workflow for gene circuit characterization.
Synthetic biology open language visual (SBOL Visual) version 2.3
Abstract
People who are engineering biological organisms often find it useful to communicate in diagrams, both about the structure of the nucleic acid sequences that they are engineering and about the functional relationships between sequence features and other molecular species. Some typical practices and conventions have begun to emerge for such diagrams. The Synthetic Biology Open Language Visual (SBOL Visual) has been
Abstract
Reproducibility is a key challenge of synthetic biology, but the foundation of reproducibility is only as solid as the reference materials it is built upon. Here we focus on the reproducibility of fluorescence measurements from bacteria transformed with engineered genetic constructs. This comparative analysis comprises three large interlaboratory studies using flow cytometry and plate readers, identical genetic constructs, and compatible unit calibration protocols. Across all three studies, we find similarly high precision in the calibrants used for plate readers. We also find that fluorescence measurements agree closely across the flow cytometry results and two years of plate reader results, with an average standard deviation of 1.52-fold, while the third year of plate reader results are consistently shifted by more than an order of magnitude, with an average shift of 28.9-fold. Analyzing possible sources of error indicates this shift is due to incorrect preparation of the fluorescein calibrant. These findings suggest that measuring fluorescence from engineered constructs is highly reproducible, but also that there is a critical need for access to quality controlled fluorescent calibrants for plate readers.
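Figures such as "1.52-fold" and "28.9-fold" are multiplicative rather than additive. One common way to obtain such numbers, sketched here as an assumption rather than as the study's actual analysis, is to work in log space: a geometric standard deviation for spread and a simple ratio for shift. The function names and sample values are hypothetical.

```python
import math
import statistics


def geometric_std(values):
    """Geometric standard deviation of positive values, as a fold factor >= 1.

    Computed as exp of the sample standard deviation of the natural logs.
    Assumption: the sample (n - 1) estimator is used.
    """
    logs = [math.log(v) for v in values]
    return math.exp(statistics.stdev(logs))


def fold_shift(a, b):
    """Fold difference between two positive values, always reported as >= 1."""
    return max(a, b) / min(a, b)


# Hypothetical fluorescence readings of one construct from three labs.
readings = [900.0, 1000.0, 1150.0]
print(f"spread: {geometric_std(readings):.2f}-fold")
print(f"shift:  {fold_shift(1000.0, 29000.0):.1f}-fold")
```

A geometric standard deviation of 1.0 means the values are identical; larger values mean wider multiplicative scatter.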
A Lyapunov Analysis of a Most Probable Path Finding Algorithm
Abstract
Distributed information spreading algorithms are important building blocks in Aggregate Computing. We consider a special case, namely for finding a most probable path for message delivery from a set of sources to each device in a network. We formulate a Lyapunov function to prove its regional stability subject to initialization of estimated probabilities to the natural interval 0,1). We also prove that the algorithm converges in a finite time, and is ultimately bounded under persistent measurement errors. We provide tight bounds for convergence time, the ultimate bound, and the time for its attainment.
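To make the problem concrete, the following is a textbook max-product relaxation (in the style of Bellman-Ford), not the algorithm analyzed in the paper. Each device repeatedly sets its estimate to the best product of a link's delivery probability and the neighbor's estimate, while sources hold the value 1. The function name, the graph, the link probabilities, and the choice to initialize non-source devices at 0 are all assumptions made for illustration.

```python
def most_probable_path_estimates(link_prob, sources, rounds):
    """Estimate, per device, the probability of the most probable path from any source.

    link_prob[d] maps each neighbor n of device d to the (hypothetical)
    probability that a message is delivered over the link n -> d.
    Updates are synchronous: every device reads the previous round's estimates.
    """
    estimates = {d: (1.0 if d in sources else 0.0) for d in link_prob}
    for _ in range(rounds):
        updated = {}
        for d, incoming in link_prob.items():
            if d in sources:
                updated[d] = 1.0  # a source reaches itself with certainty
            else:
                updated[d] = max(
                    (p * estimates[n] for n, p in incoming.items()),
                    default=0.0,
                )
        estimates = updated
    return estimates


# Hypothetical three-device network with source "a".
link_prob = {
    "a": {"b": 0.9, "c": 0.4},
    "b": {"a": 0.9, "c": 0.5},
    "c": {"a": 0.4, "b": 0.5},
}
result = most_probable_path_estimates(link_prob, sources={"a"}, rounds=2)
print(result)
```

Here device "c" ends up preferring the two-hop route through "b" (0.9 × 0.5) over the direct link (0.4), which takes one extra round to discover.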
Priority-enabled Load Balancing for Dispersed Computing
Abstract
Opportunistic managed access to local in-network compute resources can improve the performance of distributed applications and reduce the dependence on shared network resources. Instead of backhauling application data to a centralized cloud data center for processing, networked services may be adaptively and continuously dispersed into shared compute resources that are closer to the source of need. While this approach has several benefits, support for mission-aware access to computation is often an afterthought, and is implemented as a brittle extension over traditional load-balancer solutions.
Incomplete Cell Sorting Creates Engineerable Structures with Long-Term Stability
Abstract
Adhesion-mediated cell sorting has long been considered an organizing principle in developmental biology. While most computational models have emphasized the dynamics of segregation to fully sorted structures, cell sorting can also generate a plethora of transient, incompletely sorted states. The timescale of such states in experimental systems is unclear: if they are long-lived, they can be harnessed by development or engineered in synthetic tissues. Here, we use experiments and computational modeling to demonstrate how such structures can be systematically designed by quantitative control of cell composition. By varying the number of highly adhesive and less adhesive cells in multicellular aggregates, we find the cell-type ratio and total cell count control pattern formation, with resulting structures maintained for several days. Our work takes a step toward mapping the design space of self-assembling structures in development and provides guidance to the emerging field of shape engineering with synthetic biology.
Abstract
Laboratory automation now commonly allows high-throughput sample preparation, culturing, and acquisition of microscopy images, but quantitative image analysis is often still a painstaking and subjective process. This is a problem especially significant for work on programmed morphogenesis, where the spatial organization of cells and cell types is of paramount importance. To address the challenges of quantitative analysis for such experiments, we have developed TASBE Image Analytics, a software pipeline for automatically segmenting collections of cells using the fluorescence channels of microscopy images. With TASBE Image Analytics, collections of cells can be grouped into spatially disjoint segments, the movement or development of these segments tracked over time, and rich statistical data output in a standardized format for analysis. Processing is readily configurable, rapid, and produces results that closely match hand annotation by humans for all but the smallest and dimmest segments. TASBE Image Analytics can thus provide the analysis necessary to complete the design-build-test-learn cycle for high-throughput experiments in programmed morphogenesis, as validated by our application of this pipeline to process experiments on shape formation with engineered CHO and HEK293 cells.
Levels of Autonomy in Synthetic Biology Engineering
Abstract
Engineering biological organisms is a complex, challenging, and often slow process. Other engineering domains have addressed such challenges with a combination of standardization and automation, enabling a divide-and-conquer approach to complexity and greatly increasing productivity. For example, standardization and automation allow rapid and predictable translation of prototypes into fielded applications (e.g., “design for manufacturability”), simplify sharing and reuse of work between groups, and enable reliable outsourcing and integration of specialized subsystems. Although this approach has also been part of the vision of synthetic biology, almost since its very inception (Knight & Sussman, 1998), this vision still remains largely unrealized (Carbonell et al, 2019).
Field-based Coordination with the Share Operator
Abstract
Field-based coordination has been proposed as a model for coordinating collective adaptive systems, promoting a view of distributed computations as functions manipulating data structures spread over space and evolving over time, called computational fields. The field calculus is a formal foundation for field computations, providing specific constructs for evolution (time) and neighbor interaction (space), which are handled by separate operators (called rep and nbr, respectively). This approach, however, intrinsically limits the speed of information propagation that can be achieved by their combined use. In this paper, we propose a new field-based coordination operator called share, which captures the space-time nature of field computations in a single operator that declaratively achieves: (i) observation of neighbors’ values; (ii) reduction to a single local value; and (iii) update and converse sharing to neighbors of a local variable.
Robust Estimation of Bacterial Cell Count from Optical Density
Abstract
Optical density (OD) is widely used to estimate the density of cells in liquid culture, but cannot be compared between instruments without a standardized calibration protocol and is challenging to relate to actual cell count. We address this with an interlaboratory study comparing three simple, low-cost, and highly accessible OD calibration protocols across 244 laboratories, applied to eight strains of constitutive GFP-expressing E. coli. Based on our results, we recommend calibrating OD to estimated cell count using serial dilution of silica microspheres, which produces highly precise calibration (95.5% of residuals <1.2-fold), is easily assessed for quality control, also assesses instrument effective linear range, and can be combined with fluorescence calibration to obtain units of Molecules of Equivalent Fluorescein (MEFL) per cell, allowing direct comparison and data fusion with flow cytometry measurements: in our study, fluorescence per cell measurements showed only a 1.07-fold mean difference between plate reader and flow cytometry data.
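As a rough illustration of the calibration idea described above, the following Python sketch fits a single particles-per-OD scale factor to a serial dilution and reports the fold error of each point. It is a minimal sketch under stated assumptions, not the study's analysis code: the function names, the log-space (geometric mean) fit, and all numeric values are hypothetical and chosen only for illustration.

```python
import math


def fit_particles_per_od(counts, ods):
    """Fit one scale factor (particles per OD unit).

    Assumption: OD is proportional to particle count within the linear
    range, so the factor is taken as the geometric mean of count/OD.
    """
    logs = [math.log(c / o) for c, o in zip(counts, ods)]
    return math.exp(sum(logs) / len(logs))


def fold_residuals(counts, ods, scale):
    """Fold error per point: max(pred/actual, actual/pred), always >= 1."""
    folds = []
    for c, o in zip(counts, ods):
        ratio = (scale * o) / c
        folds.append(max(ratio, 1.0 / ratio))
    return folds


# Hypothetical 2-fold serial dilution of microspheres (invented values).
counts = [3.0e8 / 2 ** k for k in range(6)]      # known particle counts
ods = [0.80, 0.41, 0.195, 0.102, 0.049, 0.0255]  # blank-subtracted OD

scale = fit_particles_per_od(counts, ods)
folds = fold_residuals(counts, ods, scale)
# Fraction of points whose residual is below 1.2-fold.
within = sum(f < 1.2 for f in folds) / len(folds)
print(f"particles per OD: {scale:.3g}, fraction <1.2-fold: {within:.2f}")
```

Multiplying a measured OD by `scale` then gives an estimated particle count, and points that fall outside the chosen fold threshold at either end of the dilution series hint at the limits of the instrument's linear range.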
Abstract
Reproducibility is a key challenge of synthetic biology, but the foundation of reproducibility is only as solid as the reference materials it is built upon. Here we focus on the reproducibility of fluorescence measurements from bacteria transformed with engineered genetic constructs. This comparative analysis comprises three large interlaboratory studies using flow cytometry and plate readers, identical genetic constructs, and compatible unit calibration protocols. Across all three studies, we find similarly high precision in the calibrants used for plate readers. We also find that fluorescence measurements agree closely across the flow cytometry results and two years of plate reader results, with an average standard deviation of 1.52-fold, while the third year of plate reader results are consistently shifted by more than an order of magnitude, with an average shift of 28.9-fold. Analyzing possible sources of error indicates this shift is due to incorrect preparation of the fluorescein calibrant.
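Figures such as "1.52-fold" and "28.9-fold" are multiplicative, so they are naturally computed on log-transformed values. The Python sketch below shows one common way to do this: a geometric standard deviation and a ratio of geometric means. Whether the studies computed their statistics exactly this way is an assumption, and the function names and sample values are hypothetical.

```python
import math
import statistics


def geometric_std(values):
    """Fold spread: exp of the sample standard deviation of log(values)."""
    logs = [math.log(v) for v in values]
    return math.exp(statistics.stdev(logs))


def fold_shift(a, b):
    """Ratio of the geometric mean of `a` to the geometric mean of `b`."""
    mean_log_a = statistics.fmean([math.log(v) for v in a])
    mean_log_b = statistics.fmean([math.log(v) for v in b])
    return math.exp(mean_log_a - mean_log_b)


# Hypothetical calibrated fluorescence values for one construct (invented).
lab_values = [1.0e4, 1.5e4, 0.8e4, 1.2e4]
shifted_values = [v * 30.0 for v in lab_values]

print(f"spread: {geometric_std(lab_values):.2f}-fold")
print(f"shift: {fold_shift(shifted_values, lab_values):.1f}-fold")
```

Working in log space means a 2-fold overestimate and a 2-fold underestimate count equally, which matches how fold differences are reported in the abstract.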
The Synthetic Biology Open Language (SBOL) Version 3: Simplified Data Exchange for Bioengineering
Abstract
The Synthetic Biology Open Language (SBOL) is a community-developed data standard that allows knowledge about biological designs to be captured using a machine-tractable, ontology-backed representation that is built using Semantic Web technologies. While early versions of SBOL focused only on the description of DNA-based components and their sub-components, SBOL can now be used to represent knowledge across multiple scales and throughout the entire synthetic biology workflow, from the specification of a single molecule or DNA fragment through to multicellular systems containing multiple interacting genetic circuits. The third major iteration of the SBOL standard, SBOL3, is an effort to streamline and simplify the underlying data model with a focus on real-world applications, based on experience from the deployment of SBOL in a variety of scientific and industrial settings. Here, we introduce the SBOL3 specification both in comparison to previous versions of SBOL and through practical examples of its use. Keywords: synthetic biology, data standards, data exchange, knowledge representation, SBOL
A Lyapunov Analysis of a Most Probable Path Finding Algorithm
Abstract
Distributed information spreading algorithms are important building blocks in Aggregate Computing. We consider a special case, namely finding a most probable path for message delivery from a set of sources to each device in a network. We formulate a Lyapunov function to prove the algorithm's regional stability subject to initialization of estimated probabilities to the natural interval [0,1). We also prove that the algorithm converges in a finite time, and is ultimately bounded under persistent measurement errors. We provide tight bounds for convergence time, the ultimate bound, and the time for its attainment.
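To make the kind of computation being analyzed more concrete, here is a minimal Python sketch of one plausible formulation: each device repeatedly sets its estimate to the best product of a neighbor's estimate and the delivery probability of the link to that neighbor, while sources hold the value 1. The update rule, the synchronous rounds, the function names, and the example network are all assumptions made for illustration; the paper's exact algorithm and its Lyapunov analysis are not reproduced here.

```python
def most_probable_path_round(est, links, sources):
    """One synchronous round of the assumed update rule.

    est:     dict mapping device -> current estimated path probability
    links:   dict mapping device -> {neighbor: link delivery probability}
    sources: set of source devices, pinned to probability 1.0
    """
    new = {}
    for node in est:
        if node in sources:
            new[node] = 1.0
        else:
            candidates = [est[nbr] * p for nbr, p in links.get(node, {}).items()]
            new[node] = max(candidates, default=0.0)
    return new


def run(links, sources, init, rounds):
    """Iterate the update from arbitrary initial estimates."""
    est = dict(init)
    for _ in range(rounds):
        est = most_probable_path_round(est, links, sources)
    return est


# Hypothetical 3-device network with symmetric link probabilities.
links = {
    "s": {"a": 0.9, "b": 0.5},
    "a": {"s": 0.9, "b": 0.8},
    "b": {"a": 0.8, "s": 0.5},
}
# Deliberately wrong initial estimates, all inside [0, 1).
init = {"s": 0.3, "a": 0.7, "b": 0.2}
final = run(links, {"s"}, init, rounds=5)
print(final)
```

In this small example the best route to `b` goes through `a` (0.9 × 0.8) rather than directly from the source (0.5), and the estimates settle on those values after a few rounds despite the incorrect initialization.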
Priority-enabled Load Balancing for Dispersed Computing
Abstract
Opportunistic managed access to local in-network compute resources can improve the performance of distributed applications and reduce the dependence on shared network resources. Instead of backhauling application data to a centralized cloud for processing, networked services may be adaptively and continuously dispersed into shared compute resources that are closer to the source of need. While this approach has several benefits, support for mission-aware access to computation is often an afterthought, and is implemented as a brittle extension over traditional load-balancer solutions. In this work, we investigate the design of two priority-aware resource allocation strategies and two load-balancing dispatching strategies as first-class citizens in an open-source dispersed computing middleware. We present a theoretical analysis of these load-balancing primitives to identify weaknesses and strengths in our design, and recommend future directions. We then prototype two priority-aware allocation algorithms to validate our priority predictions. In initial experiments our prototype shows substantial gains in processing prioritized load. Finally, we make our source code and experimental configurations open source.
Incomplete Cell Sorting Creates Engineerable Structures with Long-Term Stability
Abstract
Adhesion-mediated cell sorting has long been considered an organizing principle in developmental biology. While most computational models have emphasized the dynamics of segregation to fully sorted structures, cell sorting can also generate a plethora of transient, incompletely sorted states. The timescale of such states in experimental systems is unclear: if they are long-lived, they can be harnessed by development or engineered in synthetic tissues. Here, we use experiments and computational modeling to demonstrate how such structures can be systematically designed by quantitative control of cell composition. By varying the number of highly adhesive and less adhesive cells in multicellular aggregates, we find the cell-type ratio and total cell count control pattern formation, with resulting structures maintained for several days. Our work takes a step toward mapping the design space of self-assembling structures in development and provides guidance to the emerging field of shape engineering with synthetic biology.
CMOS Electrochemical Imaging Arrays for the Detection and Classification of Microorganisms
Abstract
Microorganisms account for most of the biodiversity on earth. Yet while there are increasingly powerful tools for studying microbial genetic diversity, there are fewer tools for studying microorganisms in their natural environments. In this paper, we present recent advances in CMOS electrochemical imaging arrays for detecting and classifying microorganisms. These microscale sensing platforms can provide non-optical measurements of cell geometries, behaviors, and metabolic markers. We review integrated electronic sensors appropriate for monitoring microbial growth, and present measurements of single-celled algae using a CMOS sensor array with thousands of active pixels. Integrated electrochemical imaging can contribute to improved medical diagnostics and environmental monitoring, as well as discoveries of new microbial populations.
Levels of Autonomy in Synthetic Biology Engineering
Abstract
Engineering biological organisms is a complex and challenging process that could benefit from a combination of standardization and automation. This Commentary discusses the advantages and challenges of achieving high levels of autonomy in synthetic biology.
Field-based Coordination with the Share Operator
Abstract
Recent work in the area of coordination models and collective adaptive systems promotes a view of distributed computations as functions manipulating computational fields (data structures spread over space and evolving over time), and introduces the field calculus as a formal foundation for field computations. In field calculus, evolution (time) and neighbor interaction (space) are handled by separate functional operators: however, this intrinsically limits the speed of information propagation that can be achieved by their combined use. In this paper, we propose a new field-based coordination operator called share, which captures the space-time nature of field computations in a single operator that declaratively achieves: (i) observation of neighbors’ values; (ii) reduction to a single local value; and (iii) update and converse sharing to neighbors of a local variable. In addition to conceptual economy, use of the share operator also allows many prior field calculus algorithms to be greatly accelerated, which we validate empirically with simulations of a number of frequently used network propagation and collection algorithms.
Capturing Multicellular System Designs Using the Synthetic Biology Open Language
Abstract
Synthetic biology aims to improve the development of biological systems and increase their reproducibility through the use of engineering principles, such as standardization and modularization. It is important that these systems can be represented and shared in a standard way to ensure they are easily understood, reproduced, and utilized by other researchers. The Synthetic Biology Open Language (SBOL) is a data standard for sharing biological designs and information about their implementation and characterization. Thus far, this standard has been used to represent designs in homogeneous systems, where the same design is implemented in every cell. In recent years there has been increasing interest in multicellular systems, where biological designs are split across multiple cells to optimize the system behavior and function. Here we show how the SBOL standard can be used to represent such multicellular systems, and hence how researchers can better share designs with the community.
Improving Collection Dynamics by Monotonic Filtering
Abstract
A key coordination problem in distributed open systems is distributed sensing, as achieved by cooperation and interaction among individual devices. An archetypal operation of distributed sensing is data summarization over a region of space, by which many higher level problems can be addressed, including counting items, measuring space, averaging environmental values, etc. A typical coordination strategy to perform data summarization in a peer-to-peer scenario, where devices can communicate only with a neighborhood, is to progressively accumulate information towards one or more collector devices, though this typically exhibits problems of reactivity and fragility. In this paper, we present a monotonic filtering strategy for improving the dynamics of single-path collection algorithms. The strategy consists of inhibiting communication across devices whose distance towards the collector device is not decreasing. We prove that single-path collection in a line graph results in quadratic overestimates after a source change and that these overestimates disappear with the application of monotonic filtering. These preliminary results suggest that monotonic filtering is likely to improve the dynamics of single-path collection algorithms, by preventing excessive overestimates.
Round-Trip: An Automated Pipeline for Experimental Design, Execution, and Analysis
Abstract
Synthetic biology is a complex discipline that involves creating detailed, purpose-built designs from genetic parts. This process is often phrased as a Design-Build-Test-Learn loop, where iterative design improvements can be made, implemented, measured, and analyzed. Automation can potentially improve both the end-to-end duration of the process and the utility of data produced by the process. One of the most important considerations for the development of effective automation and quality data is a rigorous description of implicit knowledge encoded as a formal knowledge representation. The development of knowledge representation for the process poses a number of challenges, including developing effective human–machine interfaces, protecting against and repairing user error, providing flexibility for terminological mismatches, and supporting extensibility to new experimental types. We address these challenges with the DARPA SD2 Round Trip software architecture.
Intent Parser: a tool for codifying experiment design
Abstract
Communicating information about experimental design among a team of collaborators is challenging because different people tend to describe experiments in different ways and with different levels of detail. Sometimes, humans can interpret missing information by making assumptions and drawing inferences from information already provided. Doing so, however, is error-prone and typically requires a high level of interpersonal communication. In this paper, we present a tool that addresses this challenge by providing a simple interface for incremental formal codification of experiment designs. Users interact with a Google Docs word-processing interface with structured tables, backed by assisted linking to machine-readable definitions in a data repository (SynBioHub) and specification of available protocols and requests for execution in the Open Protocol Interface Language (OPIL). The result is an easy-to-use tool for generating machine-readable descriptions of experiment designs with which users in the DARPA SD2 program have collected data from 80,208 samples using a variety of protocols and instruments over the course of 181 experiment runs.
Collaborative Terminology: SBOL Project Dictionary
Abstract
Sharing information about biological experiments between researchers is often challenging. Reagents, strains, and genetic constructs are often given “shorthand” names that are ambiguous (e.g., “ara” for L-arabinose), differ between researchers (e.g., “L-arab” vs. “Arabinose”), or are unknown outside of a particular group (e.g., “plasmid 37”). Likewise, the particular combinations used in each sample of an experiment are often expressed in variable personal shorthands, often accidentally omitting important details.
Describing engineered biological systems with SBOL3 and ShortBOL2
Abstract
Data standards are essential to exchange information about the engineering of biological systems. The Synthetic Biology Open Language (SBOL) is a community-driven standard that facilitates the exchange of data relating to the design, implementation, testing and refinement of engineered biological systems [4]. Versions 1 and 2 of SBOL have gained widespread adoption, with over 170 developers, 29 SBOL supporting software tools and 42 institutions involved in their development and deployment (as of June 2020). Recently, SBOL was refactored to simplify its data model, resulting in the release of the SBOL3 specification [1].
Synthetic Biology Open Language (SBOL) Version 3.0.0
Abstract
Synthetic biology builds upon genetics, molecular biology, and metabolic engineering by applying engineering principles to the design of biological systems. When designing a synthetic system, synthetic biologists need to exchange information about multiple types of molecules, the intended behavior of the system, and actual experimental measurements. The Synthetic Biology Open Language (SBOL) has been developed as a standard to support the specification and exchange of biological design information in synthetic biology, following an open community process involving both wet bench scientists and dry scientific modelers and software developers, across academia, industry, and other institutions. This document describes SBOL 3.0.0, which condenses and simplifies previous versions of SBOL based on experiences in deployment across a variety of scientific and industrial settings. In particular, SBOL 3.0.0 (1) separates sequence features from part/sub-part relationships, (2) renames Component Definition/Component to Component/Sub-Component, (3) merges Component and Module classes, (4) ensures consistency between data model and ontology terms, (5) extends the means to define and reference Sub-Components, (6) refines requirements on object URIs, (7) enables graph-based serialization, (8) moves Systems Biology Ontology (SBO) for Component types, (9) makes all sequence associations explicit, (10) makes interfaces explicit, (11) generalizes Sequence Constraints into a general structural Constraint class, and (12) expands the set of allowed constraints.
Synthetic Biology Open Language Visual (SBOL Visual) Version 2.2
Abstract
People who are engineering biological organisms often find it useful to communicate in diagrams, both about the structure of the nucleic acid sequences that they are engineering and about the functional relationships between sequence features and other molecular species. Some typical practices and conventions have begun to emerge for such diagrams. The Synthetic Biology Open Language Visual (SBOL Visual) has been developed as a standard for organizing and systematizing such conventions in order to produce a coherent language for expressing the structure and function of genetic designs. This document details version 2.2 of SBOL Visual, which builds on the prior SBOL Visual 2.1 in several ways. First, the grounding of molecular species glyphs is changed from BioPAX to SBO, aligning with the use of SBO terms for interaction glyphs. Second, new glyphs are added for proteins, introns, and polypeptide regions (e.g., protein domains), the prior recommended macromolecule glyph is deprecated in favor of its alternative, and small polygons are introduced as alternative glyphs for simple chemicals.
Synthetic biology open language (SBOL) version 2.3
Abstract
Synthetic biology builds upon the techniques and successes of genetics, molecular biology, and metabolic engineering by applying engineering principles to the design of biological systems. The field still faces substantial challenges, including long development times, high rates of failure, and poor reproducibility. One method to ameliorate these problems is to improve the exchange of information about designed systems between laboratories. The synthetic biology open language (SBOL) has been developed as a standard to support the specification and exchange of biological design information in synthetic biology, filling a need not satisfied by other pre-existing standards. This document details version 2.3.0 of SBOL, which builds upon version 2.2.0 published in last year’s JIB Standards in Systems Biology special issue.
Abstract
Synthetic biology needs to adopt sound scientific and industry‐like standards in order to achieve its ambitious goals of efficient and accurate engineering of biological systems.
Embrace experimentation in biosecurity governance
Abstract
As biological research and its applications rapidly evolve, new attempts at the governance of biology are emerging, challenging traditional assumptions about how science works and who is responsible for governing. However, these governance approaches often are not evaluated, analyzed, or compared. This hinders the building of a cumulative base of experience and opportunities for learning. Consider “biosecurity governance,” a term with no internationally agreed definition, here defined as the processes that influence behavior to prevent or deter misuse of biological science and technology. Changes in technical, social, and political environments, coupled with the emergence of natural diseases such as coronavirus disease 2019 (COVID-19), are testing existing governance processes. This has led some communities to look beyond existing biosecurity models, policies, and procedures. But without systematic analysis and learning across them, it is hard to know what works.
Abstract
Standardizing the visual representation of genetic parts and circuits is essential for unambiguously creating and interpreting genetic designs. To this end, an increasing number of tools are adopting well-defined glyphs from the Synthetic Biology Open Language (SBOL) Visual standard to represent various genetic parts and their relationships. However, the implementation and maintenance of the relationships between biological elements or concepts and their associated glyphs has up to now been left up to tool developers. We address this need with the SBOL Visual 2 Ontology, a machine-accessible resource that provides rules for mapping from genetic parts, molecules, and interactions between them, to agreed SBOL Visual glyphs. This resource, together with a web service, can be used as a library to simplify the development of visualization tools, as a stand-alone resource to computationally search for suitable glyphs, and to help facilitate integration with existing biological ontologies and standards in synthetic biology.
Abstract
The Synthetic Biology Open Language (SBOL) is an emerging synthetic biology data exchange standard, designed primarily for unambiguous and efficient machine-to-machine communication. However, manual editing of SBOL is generally difficult for nontrivial designs. Here, we describe ShortBOL, a lightweight SBOL scripting language that bridges the gap between manual editing, visual design tools, and direct programming. ShortBOL is a shorthand textual language developed to enable users to create SBOL designs quickly and easily, without requiring strong programming skills or visual design tools.
Organizing Genome Editing for the Gigabase Scale
Abstract
Genome-scale engineering holds great potential to impact science, industry, medicine, and society, and recent improvements in DNA synthesis have enabled the manipulation of megabase genomes. However, coordinating and integrating the workflows and large teams necessary for gigabase genome engineering remains a considerable challenge. We examine this issue and recommend a path forward by: 1) adopting and extending existing representations for designs, assembly plans, samples, data, and workflows; 2) developing new technologies for data curation and quality control; 3) conducting fundamental research on genome-scale modeling and design; and 4) developing new legal and contractual infrastructure to facilitate collaboration.
Automated Detection of Yeast Genetic Engineering in Whole Genomes and Metagenomes with Prymetime
Abstract
Yeast genomes can be assembled from sequencing data, but genetic engineering changes often fail to be resolved with accuracy, completeness, and contiguity. Further, searching for engineered sequences in sequence data is currently a manual process. To overcome these challenges, we applied nanopore assembly and short read error correction to create an integrated workflow that achieves accurate whole genome and plasmid sequences of engineered yeasts, automatically annotating synthetic biology parts. We named this workflow Prymetime, "Pipeline for Recombinant Yeast genoMEs That Illuminates Markers of Engineering."
Verification of genetic engineering in yeasts with nanopore whole genome sequencing
Abstract
Yeast genomes can be assembled from sequencing data, but genetic engineering changes often fail to be resolved with accuracy, completeness, and contiguity. Further, searching for engineered sequences in sequence data is currently a manual process. To overcome these challenges, we applied nanopore assembly and short read error correction to create an integrated workflow that achieves accurate whole genome and plasmid sequences of engineered yeasts, automatically annotating synthetic biology parts. We named this workflow Prymetime, "Pipeline for Recombinant Yeast genoMEs That Illuminates Markers of Engineering."
2018
Small molecule-based regulation of gene expression for RNA-delivered circuits in mammalian cells
Abstract
Synthetic mRNA is an attractive vehicle for gene therapies because of its transient nature and improved safety profile over DNA. However, unlike DNA, broadly applicable methods to control expression from mRNA are lacking. Here we describe a platform for small-molecule-based regulation of expression from modified RNA (modRNA) and self-replicating RNA (replicon) delivered to mammalian cells. Specifically, we engineer small-molecule-responsive RNA binding proteins to control expression of proteins from RNA-encoded genetic circuits. Coupled with specific modRNA dosages or engineered elements from a replicon, including a subgenomic promoter library, we demonstrate the capability to externally regulate the timing and level of protein expression. These control mechanisms facilitate the construction of ON, OFF, and two-output switches, with potential therapeutic applications such as inducible cancer immunotherapies. These circuits, along with other synthetic networks that can be developed using these tools, will expand the utility of synthetic mRNA as a therapeutic modality.
2018
Capturing Multicellular System Designs Using the Synthetic Biology Open Language (SBOL)
Abstract
Synthetic biology aims to improve the development of biological systems and increase their reproducibility through the use of engineering principles, such as standardization and modularization. It is important that these systems can be represented and shared in a standard way to ensure they are easily understood, reproduced, and utilized by other researchers. The Synthetic Biology Open Language (SBOL) is a data standard for sharing biological designs and information about their implementation and characterization. Thus far, this standard has been used to represent designs in homogeneous systems, where the same design is implemented in every cell. In recent years there has been increasing interest in multicellular systems, where biological designs are split across multiple cells to optimize the system behavior and function. Here we show how the SBOL standard can be used to represent such multicellular systems, and hence how researchers can better share designs with the community.
2018
Formalizing Sample Transformation Plans
Abstract
Experimental protocols are typically represented in either a natural language that is hard to replicate or compare, or in procedural languages that are difficult to automatically synthesize, detach from a specific experimental design for reuse, or analyze. We introduce a new approach based on techniques from automated planning. We describe how to represent transformation operators that manipulate samples in terms of applying conditions to samples. We define the semantics of this representation. We also present a simplified version of the notation that removes much of the modeling burden required of scientists. The resulting representation supports automated planning, provides sample provenance and metadata tracking at no cost by virtue of a plan’s causal structure, and separates protocol specification from experimental design.
2018
Time to get serious about measurement in synthetic biology
Abstract
For synthetic biology to mature, composition of devices into functional systems must become routine. This requires widespread adoption of comparable and replicable units of measurement. Interlaboratory studies organized through the International Genetically Engineered Machine (iGEM) competition show that fluorescence can be calibrated with simple, low-cost protocols, so fluorescence should no longer be published without units.
2018
Toward Programming 3D Shape Formation in Mammalian Cells
Abstract
Biological cells are remarkably effective at predictable and resilient formation of complex three-dimensional shapes, as aptly demonstrated by most multicellular life on this planet. Not only can intricate shapes be formed with high reliability, but organisms also maintain functional integration of the entire system throughout development, as well as adapting form in response to environmental conditions, damage, and other disruptions. Moreover, these feats of manufacturing are accomplished entirely with reprocessed locally harvested materials.
2018
Abstract
A critical bottleneck for large-scale engineering collaboration in synthetic biology has been the inability to integrate data through successive stages of the design-build-test-learn (DBTL) engineering life-cycle. These workflows generate large volumes of data and physical artifacts (e.g., DNA samples and cell stocks) that are difficult to organize, track, and manage without systematized, automated tool chains.
2018
Specifying Combinatorial Designs with the Synthetic Biology Open Language
Abstract
During the last decade, new technologies have been developed for the combinatorial assembly of genetic parts [8, 9], enabling synthetic biologists to more readily generate libraries of genetic construct variants. These types of combinatorial libraries can play an important role in genetic design by allowing designers to explore the impact of part choice, order, and orientation on construct behavior. In order to support the design of such libraries, new tools and formalisms have been developed to enable the specification, permutation, and sampling of combinatorial genetic design spaces [1, 2]. In turn, these formalisms have given rise to the need for a standard representation of combinatorial genetic designs in order to enable sharing of such designs between tools and laboratories and to simplify human and machine reasoning over them.
2018
Quantification of Bacterial Fluorescence using Independent Calibrants
Abstract
Fluorescent reporters are commonly used to quantify activities or properties of both natural and engineered cells. Fluorescence is still typically reported only in arbitrary or normalized units, however, rather than in units defined using an independent calibrant, which is problematic for scientific reproducibility and even more so when it comes to effective engineering. In this paper, we report an inter-laboratory study showing that simple, low-cost unit calibration protocols can remedy this situation, producing comparable units and dramatic improvements in precision over both arbitrary and normalized units. Participants at 92 institutions around the world measured fluorescence from E. coli transformed with three engineered test plasmids, plus positive and negative controls, using simple, low-cost unit calibration protocols designed for use with a plate reader and/or flow cytometer. In addition to providing comparable units, use of an independent calibrant allows quantitative use of positive and negative controls to identify likely instances of protocol failure. The use of independent calibrants thus allows order of magnitude improvements in precision, narrowing the 95% confidence interval of measurements in our study up to 600-fold compared to normalized units.
2018
XPlan: Experiment Planning for Synthetic Biology
Abstract
We describe preliminary work on XPlan as a system for experiment planning in synthetic biology. In synthetic biology, as in other emerging fields, scientific exploration and engineering design must be interleaved because of uncertainty about the underlying mechanisms. Through its experiment planning, XPlan provides a coordinating linchpin in DARPA’s Synergistic Discovery and Design (SD2) platform to automate scientific discovery, closing the loop between multiple machine learning analysis and biological design tools and wet labs to guide the discovery and design process.
2018
Engineering modular intracellular protein sensor-actuator devices
Abstract
Understanding and reshaping cellular behaviors with synthetic gene networks requires the ability to sense and respond to changes in the intracellular environment. Intracellular proteins are involved in almost all cellular processes, and thus can provide important information about changes in cellular conditions such as infections, mutations, or disease states. Here we report the design of a modular platform for intrabody-based protein sensing-actuation devices with transcriptional output triggered by detection of intracellular proteins in mammalian cells. We demonstrate reporter activation response (fluorescence, apoptotic gene) to proteins involved in hepatitis C virus (HCV) infection, human immunodeficiency virus (HIV) infection, and Huntington’s disease, and show sensor-based interference with HIV-1 downregulation of HLA-I in infected T cells. Our method provides a means to link varying cellular conditions with robust control of cellular behavior for scientific and therapeutic applications.
2018
Synthetic Biology Open Language (SBOL) Version 2.2.0
Abstract
Synthetic biology builds upon the techniques and successes of genetics, molecular biology, and metabolic engineering by applying engineering principles to the design of biological systems. The field still faces substantial challenges, including long development times, high rates of failure, and poor reproducibility. One method to ameliorate these problems would be to improve the exchange of information about designed systems between laboratories. The Synthetic Biology Open Language (SBOL) has been developed as a standard to support the specification and exchange of biological design information in synthetic biology, filling a need not satisfied by other pre-existing standards. This document details version 2.2.0 of SBOL that builds upon version 2.1.0 published in last year’s JIB special issue. In particular, SBOL 2.2.0 includes improved description and validation rules for genetic design provenance, an extension to support combinatorial genetic designs, a new class to add non-SBOL data as attachments, a new class for genetic design implementations, and a description of a methodology to describe the entire design-build-test-learn cycle within the SBOL data model.
2018
Synthetic Biology Open Language Visual (SBOL Visual) Version 2.0
Abstract
People who are engineering biological organisms often find it useful to communicate in diagrams, both about the structure of the nucleic acid sequences that they are engineering and about the functional relationships between sequence features and other molecular species. Some typical practices and conventions have begun to emerge for such diagrams. The Synthetic Biology Open Language Visual (SBOL Visual) has been developed as a standard for organizing and systematizing such conventions in order to produce a coherent language for expressing the structure and function of genetic designs. This document details version 2.0 of SBOL Visual, which builds on the prior SBOL Visual 1.0 standard by expanding diagram syntax to include functional interactions and molecular species, making the relationship between diagrams and the SBOL data model explicit, supporting families of symbol variants, clarifying a number of requirements and best practices, and significantly expanding the collection of diagram glyphs.
2018
Managing Bioengineering Complexity
Abstract
Engineering the behavior of cells by modification of their genetic machinery holds the potential for revolutionary advances in many important application areas, including medical therapies, vaccination, manufacturing of proteins and other organic compounds, and environmental remediation. As capabilities and potential applications grow, the complexity and cross-disciplinary knowledge required to employ them is also growing rapidly. Managing the complexity of biological engineering is thus a problem of increasing importance. The rapid pace of advancement makes it important to have good methods for integration of new knowledge and procedures into organism engineering workflows.
2017
A Visual Language for Protein Design
Abstract
As protein engineering becomes more sophisticated, practitioners increasingly need to share diagrams for communicating protein designs. To this end, we present a draft visual language, Protein Language that describes the high-level architecture of an engineered protein with a few easy-to-draw glyphs, intended to be compatible with other biological diagram languages such as SBOL and SBGN. Protein Language consists of glyphs for representing important features (e.g., globular domains, recognition and localization sequences, sites of covalent modification, cleavage and catalysis), rules for composing these glyphs to represent complex architectures, and rules constraining the scaling and styling of diagrams.
2017
Toward Quantitative Comparison of Fluorescent Protein Expression Levels via Fluorescent Beads
Abstract
Establishing an effective engineering discipline always requires standardized and comparable units of measurement. Such measurements serve as a means of communication between the people and machines interacting with a project, ensure compatibility between components, and allow prediction of the results of design decisions. Regulating gene expression is foundational for organism engineering, and flow cytometry is an excellent means of quantifying large numbers of single cell gene expression measurements. At present, however, flow cytometry data is still often acquired in arbitrary or relative units, without standardizing the measurement by comparison to an independent reference material (i.e., one enabling precise calibration of measurements). Some have proposed standardizing to a biological cultured reference material (e.g., [3]), but fluorescence from such materials varies strongly, unpredictably, and often not in proportion to the samples it is intended to be a reference for, thus resulting in a large degree of uncertainty in measurement.
2017
Biochemical complexity drives log-normal variation in genetic expression
Abstract
Cells exhibit a high degree of variation in levels of gene expression, even within otherwise homogeneous populations. The standard model to describe this variation centers on a gamma distribution driven by stochastic bursts of translation. Stochastic bursting, however, cannot account for the well-established behavior of strong transcriptional repressors. Instead, it can be shown that the very complexity of the biochemical processes involved in gene expression drives an emergent log-normal distribution of expression levels. Emergent log-normal distributions can account for the observed behavior of transcriptional repressors, are still compatible with stochastically constrained distributions, and have important implications for both analysis of gene expression data and the engineering of biological organisms.
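The way a log-normal distribution can emerge from many multiplicative steps can be illustrated with a short simulation. This is a minimal sketch and not the paper's model: the number of steps, the uniform range of each factor, and the function names are illustrative assumptions.

```python
import math
import random
import statistics

def simulate_expression(n_cells=5000, n_steps=30, seed=0):
    """Simulate expression as a product of many independent positive factors.

    Each factor stands in for one biochemical step; the uniform range
    (0.5, 1.5) is an arbitrary illustrative choice.
    """
    rng = random.Random(seed)
    levels = []
    for _ in range(n_cells):
        level = 1.0
        for _ in range(n_steps):
            level *= rng.uniform(0.5, 1.5)
        levels.append(level)
    return levels

def skewness(values):
    """Sample skewness: near 0 for a symmetric distribution."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return sum(((v - mean) / sd) ** 3 for v in values) / len(values)

levels = simulate_expression()
logs = [math.log(v) for v in levels]
print("skewness of raw levels:", round(skewness(levels), 2))
print("skewness of log levels:", round(skewness(logs), 2))
```

Under these assumptions the raw levels are strongly right-skewed while their logarithms are close to symmetric, which is the signature of an approximately log-normal distribution.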
2017
A Standard-Enabled Workflow for Synthetic Biology
Abstract
A synthetic biology workflow is composed of data repositories that provide information about genetic parts, sequence-level design tools to compose these parts into circuits, visualization tools to depict these designs, genetic design tools to select parts to create systems, and modeling and simulation tools to evaluate alternative design choices. Data standards enable the ready exchange of information within such a workflow, allowing repositories and tools to be connected from a diversity of sources. The present paper describes one such workflow that utilizes, among others, the Synthetic Biology Open Language (SBOL) to describe genetic designs, the Systems Biology Markup Language to model these designs, and SBOL Visual to visualize these designs. We describe how a standard-enabled workflow can be used to produce types of design information, including multiple repositories and software tools exchanging information using a variety of data standards. Recently, the ACS Synthetic Biology journal has recommended the use of SBOL in their publications.
2017
Reducing DNA context dependence in bacterial promoters
Abstract
Variation in the DNA sequence upstream of bacterial promoters is known to affect the expression levels of the products they regulate, sometimes dramatically. While neutral synthetic insulator sequences have been found to buffer promoters from upstream DNA context, there are no established methods for designing effective insulator sequences with predictable effects on expression levels. We address this problem with Degenerate Insulation Screening (DIS), a novel method based on a randomized 36-nucleotide insulator library and a simple, high-throughput, flow-cytometry-based screen that randomly samples from a library of 4^36 potential insulated promoters. The results of this screen can then be compared against a reference uninsulated device to select a set of insulated promoters providing a precise level of expression. We verify this method by insulating the constitutive, inducible, and repressible promoters of a four transcriptional-unit inverter (NOT-gate) circuit, finding both that order dependence is largely eliminated by insulation and that circuit performance is also significantly improved, with a 5.8-fold mean improvement in on/off ratio.
2016
Mathematical Foundations of Variation in Gene Expression
Abstract
A key challenge in engineering biological organisms is the high degree of cell-to-cell variation commonly observed in gene expression. The inherently discrete and stochastic nature of the chemical reactions that underlie gene expression has been proposed as an explanation for the highly asymmetric distributions that are frequently observed [1], with bursts of expression leading to a Gamma distribution. While this may explain the behavior of systems with very low expression, it is insufficient to account for the high degree of cell-to-cell variation that is typically still observed even with strong expression (e.g., more than 2-fold standard deviation with a mean of many millions of molecules in [2]). In essence, with strong expression there are typically so many molecules involved that the law of large numbers will generally render the impact of chemical stochasticity largely insignificant.
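The law-of-large-numbers argument can be made concrete with a little arithmetic. As a minimal sketch, assume molecule counts follow a Poisson distribution (an illustrative simplification, not the abstract's model), so that the variance equals the mean; the function name is hypothetical.

```python
import math

def poisson_relative_sd(mean_molecules):
    """Relative standard deviation (sd / mean) of a Poisson count.

    For a Poisson distribution the variance equals the mean,
    so sd / mean = 1 / sqrt(mean).
    """
    return 1.0 / math.sqrt(mean_molecules)

for mean in (10, 1_000, 1_000_000):
    rel = poisson_relative_sd(mean)
    print(f"mean {mean:>9,} molecules: relative sd = {rel:.4f}")
```

Under this assumption, a mean of one million molecules gives a relative standard deviation of 0.1%, far smaller than the more than 2-fold variation cited in the abstract.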
2016
Synthetic Biology Open Language (SBOL) Version 2.1.0
Abstract
Synthetic biology builds upon the techniques and successes of genetics, molecular biology, and metabolic engineering by applying engineering principles to the design of biological systems. The field still faces substantial challenges, including long development times, high rates of failure, and poor reproducibility. One method to ameliorate these problems would be to improve the exchange of information about designed systems between laboratories. The Synthetic Biology Open Language (SBOL) has been developed as a standard to support the specification and exchange of biological design information in synthetic biology, filling a need not satisfied by other pre-existing standards. This document details version 2.1 of SBOL that builds upon version 2.0 published in last year’s JIB special issue. In particular, SBOL 2.1 includes improved rules for what constitutes a valid SBOL document, new role fields to simplify the expression of sequence features and how components are used in context, and new best practices descriptions to improve the exchange of basic sequence topology information and the description of genetic design provenance, as well as miscellaneous other minor improvements.
2016
Managing Bioengineering Complexity with AI Techniques
Abstract
Our capabilities for systematic design and engineering of biological systems are rapidly increasing. Effectively engineering such systems, however, requires the synthesis of a rapidly expanding and changing complex body of knowledge, protocols, and methodologies. Many of the problems in managing this complexity, however, appear susceptible to being addressed by artificial intelligence (AI) techniques, i.e., methods enabling computers to represent, acquire, and employ knowledge. Such methods can be employed to automate physical and informational “routine” work and thus better allow humans to focus their attention on the deeper scientific and engineering issues. This paper examines the potential impact of AI on the engineering of biological organisms through the lens of a typical organism engineering workflow. We identify a number of key opportunities for significant impact, as well as challenges that must be overcome.
2016
Design for Improved Repression in RNA Replicons
Abstract
RNA replicons are an emerging platform for synthetic biology, in which the infective capsid of an RNA virus is replaced with an engineered payload while its self-replication capability is retained [4, 3, 1, 6]. This self-replication capability allows RNA replicons entering a cell to amplify their engineered elements, providing strong expression from a low initial dose without integration into host DNA or propagation to other cells. Replicons thus offer an attractive platform for developing medical applications such as vaccines [2, 3] and stem-cell generation [7], combining both strong expression and relative genetic isolation. Development of RNA replicons to date has focused primarily on derivatives of alphaviruses, a well-characterized family of positive-strand RNA viruses, and most particularly the Sindbis and VEE vectors [4]. Protein expression from RNA replicons can be precisely predicted and controlled [1], and can support standard synthetic circuits such as cascades and toggle switches [6].
2016
Abstract
Research is communicated more effectively and reproducibly when articles depict genetic designs consistently and fully disclose the complete sequences of all reported constructs. ACS Synthetic Biology is now providing authors with updated guidance and piloting a new tool and publication workflow that facilitate compliance with these recommended practices and standards for visual representation and data exchange.
2016
Sharing Structure and Function in Biological Design with SBOL 2.0
Abstract
The Synthetic Biology Open Language (SBOL) is a standard that enables collaborative engineering of biological systems across different institutions and tools. SBOL is developed through careful consideration of recent synthetic biology trends, real use cases, and consensus among leading researchers in the field and members of commercial biotechnology enterprises. We demonstrate and discuss how a set of SBOL-enabled software tools can form an integrated, cross-organizational workflow to recapitulate the design of one of the largest published genetic circuits to date, a 4-input AND sensor. This design encompasses the structural components of the system, such as its DNA, RNA, small molecules, and proteins, as well as the interactions between these components that determine the system’s behavior/function.
2016
IWBDA 2015 (editorial) Jacob Beal
Abstract
The International Workshop on Bio-Design Automation (IWBDA) brings together researchers from the synthetic biology, systems biology, and design automation communities. One of the key challenges of synthetic biology is the sheer complexity of engineering biological systems, with regard to both the nature of biological organisms and the profusion of components, protocols, and methods with which these organisms are engineered. The motivating goal of IWBDA is to address these challenges by fostering cross-disciplinary discussion and collaboration between researchers with backgrounds in biology, computation, and other relevant disciplines. The seventh IWBDA, organized by the nonprofit Bio-Design Automation Consortium (BDAC), was held at the University of Washington in Seattle, Washington on August 19th through 21st, 2015. This special ACS Synthetic Biology issue includes eight papers associated with the work presented at IWBDA, spanning a wide range of different topics and focus areas.
2016
libSBOLj 2.0: A Java Library to Support SBOL 2.0
Abstract
The Synthetic Biology Open Language (SBOL) is an emerging data standard for representing synthetic biology designs. The goal of SBOL is to improve the reproducibility of these designs and their electronic exchange between researchers and/or genetic design automation tools. The latest version of the standard, SBOL 2.0, enables the annotation of a large variety of biological components (e.g., DNA, RNA, proteins, complexes, small molecules, etc.) and their interactions. SBOL 2.0 also allows researchers to organize components into hierarchical modules, to specify their intended functions, and to link modules to models that describe their behavior mathematically. To support the use of SBOL 2.0, we have developed the libSBOLj 2.0 Java library, which provides an easy to use Application Programming Interface (API) for developers, including manipulation of SBOL constructs, serialization to and from an RDF/XML file format, and migration support in the form of conversion from the prior SBOL 1.1 standard to SBOL 2.0. This letter describes the libSBOLj 2.0 library and key engineering decisions involved in its design.
2016
Reproducibility of Fluorescent Expression from Engineered Biological Constructs in E. coli
Abstract
We present results of the first large-scale inter-laboratory study carried out in synthetic biology, as part of the 2014 and 2015 International Genetically Engineered Machine (iGEM) competitions. Participants at 88 institutions around the world measured fluorescence from three engineered constitutive constructs in E. coli. Few participants were able to measure absolute fluorescence, so data was analyzed in terms of ratios. Precision was strongly related to fluorescent strength, ranging from 1.54-fold standard deviation for the ratio between strong promoters to 5.75-fold for the ratio between the strongest and weakest promoter, and while host strain did not affect expression ratios, choice of instrument did. This result shows that high quantitative precision and reproducibility of results is possible, while at the same time indicating areas needing improved laboratory practices.
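A "fold" standard deviation such as those quoted above is commonly computed as a geometric standard deviation, that is, the exponential of the standard deviation of the log-transformed values. The sketch below shows that calculation under this assumption; the function name and the sample ratios are hypothetical and are not data from the study.

```python
import math
import statistics

def geometric_sd(values):
    """Geometric (fold) standard deviation of positive values."""
    logs = [math.log(v) for v in values]
    # Sample standard deviation in log space, mapped back to a fold factor.
    return math.exp(statistics.stdev(logs))

# Hypothetical ratios reported by five laboratories (illustrative only).
ratios = [8.0, 12.0, 10.0, 15.0, 9.0]
print(f"geometric sd: {geometric_sd(ratios):.2f}-fold")
```

A value of 1.0 means no variation at all, and the measure is unchanged if every value is multiplied by the same constant, which is why it suits ratio data.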
2016
Abstract
Multipart and modular DNA part libraries and assembly standards have become common tools in synthetic biology since the publication of the Gibson and Golden Gate assembly methods, yet no multipart modular library exists for use in bacterial systems. Building upon the existing MoClo assembly framework, we have developed a publicly available collection of modular DNA parts and enhanced MoClo protocols to enable rapid one-pot, multipart assembly, combinatorial design, and expression tuning in Escherichia coli. The Cross-disciplinary Integration of Design Automation Research lab (CIDAR) MoClo Library is openly available and contains promoters, ribosomal binding sites, coding sequence, terminators, vectors, and a set of fluorescent control plasmids. Optimized protocols reduce reaction time and cost by >80% from that of previously published protocols.
2015
SBOL Visual: Standard Schematics for Synthetic Genetic Constructs
Abstract
Synthetic Biology Open Language (SBOL) Visual is a graphical standard for genetic engineering. It consists of symbols representing DNA subsequences, including regulatory elements and DNA assembly features. These symbols can be used to draw illustrations for communication and instruction, and as image assets for computer-aided design. SBOL Visual is a community standard, freely available for personal, academic, and commercial use (Creative Commons CC0 license). We provide prototypical symbol images that have been used in scientific publications and software tools.
2015
Cas9 gRNA engineering for selectable genome editing, activation and repression
Abstract
We demonstrate that by altering the length of Cas9-associated guide RNA (gRNA) we were able to control Cas9 nuclease activity and simultaneously perform genome editing and transcriptional regulation with a single Cas9 protein. We exploited these principles to engineer mammalian synthetic circuits with combined transcriptional regulation and kill functions governed by a single multifunctional Cas9 protein.
2015
Design of Biological Circuits Using Signal-to-Noise Ratio
Abstract
Biological computing circuits have a role to play in many synthetic biology applications, such as precision cancer therapy, sensing chemical threats, or control of biosynthesis processes. Actually realizing such circuits effectively, however, has been quite difficult: until recently, neither high-precision prediction nor high-performance component libraries were available. Thus, although many design approaches for selecting components to realize a circuit have been proposed (e.g., [11, 6, 9], to name a few), it has been unclear which, if any, of these approaches was likely to actually be practical for the realization of biological circuits.
2015
Copyright and Licensing of BBF RFCs
Abstract
The BBF RFC process is currently managed by the BioBricks Foundation, and BBF RFC documents are made available as PDF files through DSpace@MIT. Until now, notification of the licensing terms for BBF RFC documents has been indicated on the BioBricks Foundation’s website and on DSpace@MIT, but has not been included on the actual BBF RFC documents. Because PDF files travel freely over the internet, the lack of a licensing notice on the actual BBF RFC documents has led to unnecessary confusion.
2015
Synthetic Biology Open Language (SBOL) Version 2.0.0
Abstract
Synthetic biology builds upon the techniques and successes of genetics, molecular biology, and metabolic engineering by applying engineering principles to the design of biological systems. These principles include standardization, modularity, and design abstraction. The field still faces substantial challenges, including long development times, high rates of failure, and poor reproducibility. A common factor of these challenges is the exchange of information about designed systems between laboratories. When designing a synthetic system, synthetic biologists need to exchange information about multiple types of molecules and their expected behavior in the design. Furthermore, there are often multiple degrees of separation between a specified nucleic acid sequence (e.g., a sequence that encodes an enzyme or transcription factor) and the molecular interactions that a designer intends to result from said sequence (e.g., chemical modification of metabolites or regulation of gene expression), yet these different perspectives need to be connected together in the engineering of biological systems.
2015
Associated abstract at IWBDA'15
Abstract
The initial version of the Synthetic Biology Open Language (SBOL) was designed for the exchange of information about biological designs at the DNA level. As the field of synthetic biology matures, however, there is a clear need to extend SBOL to capture the function of biological designs and their structure beyond annotated DNA sequences [2]. To support the specification of increasingly complex and diverse biological designs, standards need to represent data on both biological structure and function in a modular, hierarchical fashion. These include data on biological interactions, which are especially important for the functional composition of biological components, and meta-data on computational models, which are important for linking biological designs to more detailed descriptions of their behavior in specific biological contexts.
2015
Signal-to-noise ratio measures efficacy of biological computing devices and circuits
Abstract
Engineering biological cells to perform computations has a broad range of important potential applications, including precision medical therapies, biosynthesis process control, and environmental sensing. Implementing predictable and effective computation, however, has been extremely difficult to date, due to a combination of poor composability of available parts and of insufficient characterization of parts and their interactions with the complex environment in which they operate. In this paper, the author argues that this situation can be improved by quantitative signal-to-noise analysis of the relationship between computational abstractions and the variation and uncertainty endemic in biological organisms.
2015
Accurate Predictions of Genetic Circuit Behavior from Part Characterization and Modular Composition
Abstract
A long-standing goal of synthetic biology is to rapidly engineer new regulatory circuits from simpler devices. As circuit complexity grows, it becomes increasingly important to guide design with quantitative models, but previous efforts have been hindered by lack of predictive accuracy. To address this, we developed Empirical Quantitative Incremental Prediction (EQuIP), a new method for accurate prediction of genetic regulatory network behavior from detailed characterizations of their components. In EQuIP, precisely calibrated time-series and dosage-response assays are used to construct hybrid phenotypic/mechanistic models of regulatory processes. This hybrid method ensures that model parameters match observable phenomena using phenotypic formulation where current hypotheses about biological mechanisms do not agree closely with experimental observations. We demonstrate EQuIP’s precision at predicting distributions of cell behaviors for six transcriptional cascades and three feed-forward circuits in mammalian cells. Our cascade predictions have only 1.6-fold mean error over a 261-fold mean range of fluorescence variation, owing primarily to calibrated measurements and piecewise-linear models. Predictions for three feed-forward circuits had a 2.0-fold mean error on a 333-fold mean range, further demonstrating that EQuIP can scale to more complex systems. Such accurate predictions will foster reliable forward engineering of complex biological circuits from libraries of standardized devices.
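One way to read a figure such as "1.6-fold mean error" is as a geometric mean of per-prediction fold errors, where each fold error is the larger of predicted/observed and observed/predicted. This is an assumed definition for illustration, not necessarily the one used in the paper, and the function name and values below are hypothetical.

```python
import math

def mean_fold_error(predicted, observed):
    """Geometric mean of fold errors between paired positive values."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    # abs(log(p / o)) is the log of max(p / o, o / p), so over- and
    # under-prediction by the same factor are penalized equally.
    log_errors = [abs(math.log(p / o)) for p, o in zip(predicted, observed)]
    return math.exp(sum(log_errors) / len(log_errors))

# Hypothetical fluorescence values (illustrative only).
predicted = [1.0e4, 5.0e5, 2.0e6]
observed = [2.0e4, 2.5e5, 2.0e6]
print(f"mean fold error: {mean_fold_error(predicted, observed):.2f}")
```

A perfect prediction scores 1.0 under this definition, and averaging in log space keeps a single large miss from dominating values that span several orders of magnitude.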
2015
Bridging the Gap: A Roadmap to Breaking the Biological Design Barrier
Abstract
This paper presents an analysis of an emerging bottleneck in organism engineering, and paths by which it may be overcome. Recent years have seen the development of a profusion of synthetic biology tools, largely falling into two categories: high-level “design” tools aimed at mapping from organism specifications to nucleic acid sequences implementing those specifications, and low-level “build and test” tools aimed at faster, cheaper, and more reliable fabrication of those sequences and assays of their behavior in engineered biological organisms. Between the two families, however, there is a major gap: we still largely lack the predictive models and component characterization data required to effectively determine which of the many possible candidate sequences considered in the design phase are the most likely to produce useful results when built and tested.
2015
Model-Driven Engineering of Gene Expression from RNA Replicons
Abstract
RNA replicons are an emerging platform for engineering synthetic biological systems. Replicons self-amplify, can provide persistent high-level expression of proteins even from a small initial dose, and, unlike DNA vectors, pose minimal risk of chromosomal integration. However, no quantitative model sufficient for engineering levels of protein expression from such replicon systems currently exists. Here, we aim to enable the engineering of multigene expression from more than one species of replicon by creating a computational model based on our experimental observations of the expression dynamics in single- and multi-replicon systems.
2015
Proposed Data Model for the Next Version of the Synthetic Biology Open Language
Abstract
While the first version of the Synthetic Biology Open Language (SBOL) has been adopted by several academic and commercial genetic design automation (GDA) software tools, it only covers a limited number of the requirements for a standardized exchange format for synthetic biology. In particular, SBOL Version 1.1 is capable of representing DNA components and their hierarchical composition via sequence annotations. This proposal revises SBOL Version 1.1, enabling the representation of a wider range of components with and without sequences, including RNA components, protein components, small molecules, and molecular complexes. It also introduces modules to instantiate groups of components on the basis of their shared function and assert molecular interactions between components. By increasing the range of structural and functional descriptions in SBOL and allowing for their composition, the proposed improvements enable SBOL to represent and facilitate the exchange of a broader class of genetic design.
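The structure described above can be pictured with a small sketch: components of several molecular types, with or without sequences, grouped into modules that also record interactions between them. This is not the SBOL specification or the API of any SBOL library; the class and field names below are hypothetical and chosen only to mirror the abstract's wording.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class ComponentType(Enum):
    """Kinds of component named in the proposal (labels are illustrative)."""
    DNA = "dna"
    RNA = "rna"
    PROTEIN = "protein"
    SMALL_MOLECULE = "small_molecule"
    COMPLEX = "complex"


@dataclass
class Component:
    name: str
    type: ComponentType
    sequence: Optional[str] = None  # components may have no sequence
    subcomponents: List["Component"] = field(default_factory=list)


@dataclass
class Interaction:
    kind: str  # e.g. "repression"; free text in this sketch
    participants: List[Component]


@dataclass
class Module:
    """Groups components by shared function and asserts their interactions."""
    name: str
    components: List[Component] = field(default_factory=list)
    interactions: List[Interaction] = field(default_factory=list)

    def add_interaction(self, kind, *participants):
        # Only components that belong to this module may interact in it.
        for participant in participants:
            if participant not in self.components:
                raise ValueError(f"{participant.name} is not in module {self.name}")
        interaction = Interaction(kind, list(participants))
        self.interactions.append(interaction)
        return interaction


# Usage: a protein without a sequence regulating a DNA component.
repressor = Component("repressor_protein", ComponentType.PROTEIN)
promoter = Component("promoter", ComponentType.DNA, sequence="acgt")  # placeholder sequence
inverter = Module("inverter", [repressor, promoter])
inverter.add_interaction("repression", repressor, promoter)
```

Keeping the sequence optional is what lets one model cover both sequence-bearing parts and entities such as small molecules.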
2014
Precision Design of Expression from RNA Replicons
Abstract
RNA replicons are an emerging platform of increasing interest, particularly for vaccination and therapeutic applications [3]. A replicon is based on a virus, but replaces the infective capsid proteins with engineered “payload” genes [5]. Here we focus on replicons derived from alphavirus, a positive-strand RNA virus, with architecture and lifecycle shown in Figure 1: the replicon RNA begins with a complex of nonstructural proteins (NSPs) that create “viral factories” where it replicates [4]. A subgenomic promoter next induces production of shorter mRNAs containing engineered payload genes, which are translated to produce the proteins encoded by the payload sequences. Finally, both mRNA and proteins are removed by normal processes of dilution and decay.
2014
Abstract
The re-use of previously validated designs is critical to the evolution of synthetic biology from a research discipline to an engineering practice. Here we describe the Synthetic Biology Open Language (SBOL), a proposed data standard for exchanging designs within the synthetic biology community. SBOL represents synthetic biology designs in a community-driven, formalized format for exchange between software tools, research groups and commercial service providers. The SBOL Developers Group has implemented SBOL as an XML/RDF serialization and provides software libraries and specification documentation to help developers implement SBOL in their own software. We describe early successes, including a demonstration of the utility of SBOL for information exchange between several different software tools and repositories from both academic and industrial partners. As a community-driven standard, SBOL will be updated as synthetic biology evolves to provide specific capabilities for different aspects of the synthetic biology workflow.
2013
Functional synthesis of genetic regulatory networks
Abstract
As synthetic biologists improve their ability to engineer complex computations in living organisms, there is increasing interest in using programming languages to assist in the design and composition of biological constructs. In this paper, we argue that there is a natural fit between functional programming and genetic regulatory networks, exploring this connection in depth through the example of BioProto, a piggyback DSL on the Proto general-purpose spatial language. In particular, we present the first formalization of BioProto syntax and semantics, and compare these to the formal syntax and semantics of the parent language Proto. Finally, we examine the pragmatics of implementing BioProto and challenges to proving correctness of BioProto programs.
2013
How can AI help Synthetic Biology?
Abstract
Our primary goal in this talk is to draw the attention of the AI community to a novel and rich application domain, namely Synthetic Biology. Synthetic biology is the systematic design and engineering of biological systems. Synthetic organisms are currently designed at the DNA level, which limits the complexity of the systems. In our talk we will introduce the domain, describe the current workflow used by synthetic biologists, and demonstrate the feasibility of progress in this domain. Problems specific to each AI topic area will be highlighted.
2013
Synthetic Biology Open Language Visual: an ontological use case
Abstract
Synthetic Biology Open Language (SBOL) is a data exchange standard for the specification of forward engineered genetic designs (Galdzicki 2012). SBOL Visual is the graphical counterpart to SBOL, used to represent designs in a human readable manner. The central element in the SBOL data model is the DNA Component, which represents the design of a contiguous piece of DNA. DNA Components have an assigned functional role, generally referred to as ‘part type’ among synthetic biologists and which is analogous to the feature keys in annotated DNA sequences.
2013
Accurate Predictions of Genetic Circuit Behavior from Part Characterization and Modular Composition
Abstract
A long-standing goal of synthetic biology is to rapidly engineer new regulatory circuits from simpler regulatory elements [8, 16, 2, 7]. As the complexity of engineered circuits increases, it becomes increasingly important to utilize quantitative models to guide circuit construction effectively, but previous efforts have been hindered by lack of accuracy in predictions of circuit behavior [13, 10]. To address this shortcoming, we have developed Empirical Quantitative Incremental Prediction (EQuIP), a new method for accurate prediction of genetic regulatory network behavior. EQuIP predictions are based on a composable black-box model derived solely from empirical observations of steady-state and dynamic behavior.
2013
Online Tools for Characterization, Design, and Debugging
Abstract
The engineering of biological systems can be greatly aided by better models, derived from high-quality characterization data, and by better means for designing and debugging new genetic circuits. Web-based tools and repositories have proven a successful approach to distributing such techniques, particularly because the centralization of infrastructure greatly decreases adoption cost for new users. Notable examples include the Parts Registry [8], the RBS calculator [10], GeneDesign [9], GenoCAD [4], BioFab [7], and JBEI ICE [6].
2013
Synthetic Biology Open Language Visual: An Open-Source Graphical Notation for Synthetic Biology
Abstract
The Synthetic Biology Open Language Visual (SBOL Visual) project is an effort toward developing a community-driven open standard for visual representation of genetic designs. Standardized visual notation for communicating designs has proven to be useful in many engineering disciplines. A de facto visual notation does exist in synthetic biology; however, it is incomplete, is often extended ad hoc, and exists as a poorly defined, voluntary, communal convention rather than an explicit standard. Because synthetic biology endeavors often require a multidisciplinary team, a common visual system of communication with well-defined semantics is vital.
2013
Recent Advances in the Synthetic Biology Open Language
Abstract
A significant concern in the synthetic biology community is the difficulty in reproducing results reported in the literature [5]. To address this problem, in 2008, a small group of researchers proposed the development of the Synthetic Biology Open Language (SBOL), an open-source standard for the exchange of genetic designs. In 2011, the first version of the SBOL core data model was released [2]. In 2013, the first version of a standard for visualization of genetic designs expressed in SBOL was also released [6]. Leveraging libSBOLj, a Java-based library for SBOL’s core data model, 18 software tools now support SBOL. While this represents excellent progress, there is still a lot of work to do.
2013
Synthetic Biology Open Language Visual (SBOL Visual), version 1.0.0
Abstract
The Synthetic Biology Open Language Visual (SBOL Visual) project is an effort to create an open-source graphical notation to support the description and specification of genetic designs. SBOL Visual is intended for use by biological engineers in forward engineering projects. It aims to encourage and support model-driven engineering by establishing a common set of symbols.
2012
An End-to-End Workflow for Engineering of Biological Networks from High-Level Specifications
Abstract
We present a workflow for the design and production of biological networks from high-level program specifications. The workflow is based on a sequence of intermediate models that incrementally translate high-level specifications into DNA samples that implement them. We identify algorithms for translating between adjacent models and implement them as a set of software tools, organized into a four-stage toolchain: Specification, Compilation, Part Assignment, and Assembly. The specification stage begins with a Boolean logic computation specified in the Proto programming language. The compilation stage uses a library of network motifs and cellular platforms, also specified in Proto, to transform the program into an optimized Abstract Genetic Regulatory Network (AGRN) that implements the programmed behavior.
2012
Synthetic Biology Open Language (SBOL) Version 1.1.0
Abstract
In this BioBricks Foundation Request for Comments (BBF RFC), we specify the Synthetic Biology Open Language (SBOL) Version 1.1.0 to enable the electronic exchange of information describing DNA components used in synthetic biology. We define: 1. the vocabulary, a set of preferred terms, and 2. the core data model, a common computational representation.
2012
Automated Selection of Synthetic Biology Parts for Genetic Regulatory Networks
Abstract
Raising the level of abstraction for synthetic biology design requires solving several challenging problems, including mapping abstract designs to DNA sequences. In this paper we present the first formalism and algorithms to address this problem. The key steps of this transformation are feature matching, signal matching, and part matching. Feature matching ensures that the mapping satisfies the regulatory relationships in the abstract design. Signal matching ensures that the expression levels of functional units are compatible. Finally, part matching finds a DNA part sequence that can implement the design. Our software tool MatchMaker implements these three steps.
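The three steps can be illustrated with a toy sketch that selects a downstream device for a two-stage cascade. This is not MatchMaker's algorithm or data model: the `Device` fields, the threshold-based compatibility test, and the inventory lookup are all simplifying assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Device:
    """Hypothetical library entry; levels are in arbitrary expression units."""
    name: str
    relation: str     # regulatory relationship it implements, e.g. "represses"
    out_low: float    # output level when the device is "off"
    out_high: float   # output level when the device is "on"
    in_low: float     # highest input level still read as "off"
    in_high: float    # lowest input level read as "on"


def feature_match(device: Device, required_relation: str) -> bool:
    # The device must implement the relationship the abstract design asks for.
    return device.relation == required_relation


def signal_match(upstream: Device, downstream: Device) -> bool:
    # Assumed criterion: upstream output levels must fall on the correct
    # sides of the downstream device's input thresholds.
    return (upstream.out_low <= downstream.in_low
            and upstream.out_high >= downstream.in_high)


def part_match(device: Device, inventory: Dict[str, str]) -> Optional[str]:
    # Look up an identifier for a DNA part that realizes the device.
    return inventory.get(device.name)


def select_downstream(upstream: Device, required_relation: str,
                      library: List[Device],
                      inventory: Dict[str, str]) -> Optional[Tuple[Device, str]]:
    """Return the first library device that passes all three checks, if any."""
    for candidate in library:
        if not feature_match(candidate, required_relation):
            continue
        if not signal_match(upstream, candidate):
            continue
        part_id = part_match(candidate, inventory)
        if part_id is not None:
            return candidate, part_id
    return None
```

Running the checks in this order means the cheapest filter comes first and the part lookup only happens for candidates that are already functionally and quantitatively compatible.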
2012
Abstract
The TASBE (A Tool-Chain to Accelerate Synthetic Biological Engineering) project [2] developed a tool-chain (Figure 1) to design and build synthetic biology systems. These tools convert a circuit description written in a high-level language to an implementation in cells, assembled with laboratory robots. Each tool addresses a different sub-problem. This paper describes each tool and its key results.
2012
Toward Automated Design of Cell State Detectors
Abstract
There is a wide range of applications in which it would be useful to have a small synthetic biology circuit that could reliably classify cell state. For example, in [5], the authors propose cancer therapy based on a circuit that uses miRNA markers to test whether a cell belongs to a particular type of cancer and then kills only those cells. The authors then demonstrate a miRNA classifier that can distinguish between HeLa cells and several other cell lines. This same approach might be applied to therapeutics for many other diseases, as well as for high-precision assays that can monitor the cell-by-cell progress of a disease being studied, and for many other possible applications.
2012
A Method for Fast, High-Precision Characterization of Synthetic Biology Devices
Abstract
Engineering biological systems with predictable behavior is a foundational goal of synthetic biology. To accomplish this, it is important to accurately characterize the behavior of biological devices. Prior characterization efforts, however, have generally not yielded enough high-quality information to enable compositional design. In the TASBE (A Tool-Chain to Accelerate Synthetic Biological Engineering) project we have developed a new characterization technique capable of producing such data. This document describes the techniques we have developed, along with examples of their application, so that the techniques can be accurately used by others.
2011
Bridging Biology and Engineering Together with Spatial Computing
Abstract
Biological systems can often be viewed as spatial computers: space-filling collections of computational devices with strongly localized communication. Applying a continuous-space abstraction allows the behavior of such systems to be modeled or specified in terms of aggregate geometry and information flow. This can simplify both the engineering of biological systems and the application of biological models to the engineering of non-biological systems, as illustrated by examples from synthetic biology and morphogenetic engineering.
2011
Abstract
The field of synthetic biology promises to revolutionize our ability to engineer biological systems, providing important benefits for a variety of applications. Recent advances in DNA synthesis and automated DNA assembly technologies suggest that it is now possible to construct synthetic systems of significant complexity. However, while a variety of novel genetic devices and small engineered gene networks have been successfully demonstrated, the regulatory complexity of synthetic systems that have been reported recently has somewhat plateaued due to a variety of factors, including the complexity of biology itself and the lag in our ability to design and optimize sophisticated biological circuitry.
2011
High-Level Programming Languages for Bio-Molecular Systems
Abstract
In electronic computing, high-level languages hide many of the details, allowing non-experts and sometimes even children to program and create systems. High-level languages for bio-molecular systems aim to achieve a similar level of abstraction, so that a system might be designed on the basis of the behaviors that are desired, rather than the particulars of the genetic code that will be used to implement these behaviors. The drawback to this sort of high-level approach is that it generally means giving up control over some aspects of the system and having decreased efficiency relative to hand-tuned designs.
2011
TASBE: A Tool-Chain to Accelerate Synthetic Biological Engineering
Abstract
There is a pressing need for design automation tools for synthetic biology systems. Compared to electronic circuits, cellular information processing has more complex elementary components and a greater complexity of interactions among components. Moreover, chemical computation within a cell is strongly affected both by other computations simultaneously occurring in the cell and by the cell’s native metabolic processes and its external environment. This complexity implies an engineering work-flow that is currently highly iterative, error-prone, and extremely slow—critical problems that must all be addressed in order to realize the potential of synthetic biology.
2011
Abstract
Synthetic biology is an emerging field in which biologists modify or design the behavior of organisms to engineer systems that perform computation in diverse biological applications. Synthetic biologists design such a complex system by composing basic functional units—e.g., a promoter or a gene—into a regulatory network that exhibits the desired transcriptional behavior. As the desired behavior becomes more sophisticated, the size of the network grows, the complexity of the design becomes an impending concern [2], and its assembly and verification, an arduous task.
2008
Cells Are Plausible Targets for High-Level Spatial Languages
Abstract
High-level languages greatly increase the power of a programmer at the cost of programs that consume more resources than those written at a lower level of abstraction. This inefficiency is a major concern for the programming of biological systems: although advances in synthetic biology are beginning to allow bacteria to be programmed at an “assembly language” level, metabolic and chemical constraints currently place tight limits on the computational resources available. We find, however, that the semantics of the Proto spatial computing language appear to be a good match for engineered genetic regulatory networks, and particularly for describing the spatial differentiation necessary to construct tissues or organs.