SeaMonitor Data Platform

McClintock Lab

Management of diver-collected data has traditionally relied on an ad hoc collection of tools and processes. Such data are often captured on waterproof paper datasheets, transcribed to Excel, and then analyzed in a statistics package. The results are later transmitted to decision makers via printed or electronic reports. Even under ideal circumstances (i.e., with highly trained, disciplined researchers) this workflow is error prone and inefficient.

Below we describe our vision for a software service that would dramatically improve the collection, visualization, and sharing of marine ecological monitoring and assessment data. By managing the entire life cycle of these data in a single open platform, higher quality data could be shared and used in decision-making with less difficulty than with an assortment of general-purpose tools. This is not an exhaustive list of what the platform would do, but a collection of demos and mockups that give a small peek at what high quality, purpose-specific software could achieve.

Data Entry

Example #1, Entering fish transect data

Transcribing data from paper datasheets is a significant time commitment and the first place mistakes appear. A streamlined, easy-to-use interface should make data entry a quick task that can be performed in the field, with or without network connectivity. When data are transcribed soon after collection, handwriting can be more easily discerned, mistakes corrected, and unwritten details recalled.

The system should catch missing-value errors, enforce taxonomic lists, and perform other data validation tasks without frustrating the user. It must also allow for exceptional circumstances like additions to a species list.

The demo below can be used with an example datasheet and will highlight features such as validation of fish size observations based on information from eol.org. The demo and the accompanying example datasheet are based on the fish sampling protocol described in Coral Reef Monitoring Protocol for Assessing Marine Protected Areas by Gabby Ahmadia, Joanne Wilson, and Alison Green.
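As a rough illustration of the kind of validation logic the data entry screen could run, the sketch below checks a transect record for missing values, unknown taxa, and implausible fish lengths, while still allowing a diver to flag an addition to the species list. The field names, species list, and size limits are illustrative assumptions; the maximum lengths stand in for reference data such as eol.org's.

```python
# Hypothetical sketch of row-level validation for fish transect entries.
# Field names, the species list, and the size limits are illustrative only;
# in practice, maximum lengths could be drawn from a source like eol.org.

KNOWN_SPECIES = {
    "Chromis viridis": 10.0,          # plausible maximum total length (cm)
    "Plectropomus leopardus": 120.0,
}

REQUIRED_FIELDS = ("site", "transect", "species", "count", "size_cm")

def validate_record(record, allow_new_species=False):
    """Return a list of human-readable problems; an empty list means the row is clean."""
    problems = []

    # 1. Missing-value check: every required field must be present and non-empty.
    for field in REQUIRED_FIELDS:
        if record.get(field) in (None, ""):
            problems.append(f"missing value for '{field}'")

    # 2. Taxonomic list check, with an escape hatch for genuinely new species.
    species = record.get("species")
    if species and species not in KNOWN_SPECIES and not allow_new_species:
        problems.append(f"'{species}' is not on the project species list")

    # 3. Plausibility check on observed size against a reference maximum length.
    size = record.get("size_cm")
    max_len = KNOWN_SPECIES.get(species)
    if isinstance(size, (int, float)) and max_len and size > max_len:
        problems.append(f"size {size} cm exceeds known maximum of {max_len} cm for {species}")

    return problems

if __name__ == "__main__":
    row = {"site": "Reef 4", "transect": 2, "species": "Chromis viridis",
           "count": 12, "size_cm": 35}
    print(validate_record(row))  # flags the implausible 35 cm observation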

Example #2, Uploading photo quadrats

Cloud storage solutions make it easy to store, organize, and browse large volumes of photo data. Using simple heuristics and the increasing sophistication of camera hardware, it should be possible to organize a day's media in a single step. Many cameras such as GoPro offer wifi APIs for accessing photos that could be used to upload media without disassembling underwater housings or fumbling with memory cards. Additionally, they may have features to flag important photos using metadata that could be used to tag replicates in a transect. SeaMonitor should offer tight integration with these features and distribute innovative hardware plans that improve data collection efficiency. Photo quadrat classification could be performed in the tool, saving images for later reference or classification by multiple observers for increased accuracy or as part of citizen-science projects.
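A minimal sketch of the "organize a day's media in a single step" idea, assuming only that photos carry capture timestamps: images taken close together are grouped into candidate transect replicates. The gap threshold and file names are assumptions; a real implementation could also use camera metadata flags or the wifi APIs mentioned above.

```python
# Hypothetical sketch: group a day's photos into candidate transect replicates
# using only capture timestamps. The 120-second gap threshold is an assumption.

from datetime import datetime, timedelta

def group_photos_by_gap(photos, max_gap=timedelta(seconds=120)):
    """photos: list of (filename, capture_time) tuples, in any order.
    A new group starts whenever the gap between consecutive photos exceeds max_gap."""
    ordered = sorted(photos, key=lambda p: p[1])
    groups = []
    for name, taken in ordered:
        if groups and taken - groups[-1][-1][1] <= max_gap:
            groups[-1].append((name, taken))
        else:
            groups.append([(name, taken)])
    return groups

if __name__ == "__main__":
    t = datetime(2014, 6, 1, 9, 0, 0)
    sample = [
        ("quad_001.jpg", t),
        ("quad_002.jpg", t + timedelta(seconds=40)),
        ("quad_003.jpg", t + timedelta(minutes=15)),    # starts a new replicate
        ("quad_004.jpg", t + timedelta(minutes=15, seconds=30)),
    ]
    for i, group in enumerate(group_photos_by_gap(sample), start=1):
        print(f"replicate {i}: {[name for name, _ in group]}")
```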

Quality Control and Data Analysis

Example #3, Monitoring observer bias and outliers

Quality control is critical to the data collection process. Routines for finding outliers, assessing observer bias, and catching transcription errors would run continually as data are collected. Project PIs could subscribe to email notifications when data quality problems are detected and use built-in tools to resolve inconsistencies.
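As one illustration of such routines, the sketch below flags outlying counts with a simple z-score test and compares per-observer mean counts as a coarse signal of observer bias. The thresholds and data structures are assumptions, not a finished design.

```python
# Hypothetical sketch of two simple QC routines: flagging outlying counts with
# a z-score test and comparing per-observer means to hint at observer bias.

from statistics import mean, pstdev
from collections import defaultdict

def flag_outliers(values, z_threshold=3.0):
    """Return indexes of values more than z_threshold standard deviations from the mean."""
    if len(values) < 2:
        return []
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return []
    return [i for i, v in enumerate(values) if abs(v - mu) / sigma > z_threshold]

def observer_means(observations):
    """observations: list of (observer, count). Returns {observer: mean count},
    a coarse signal of systematic over- or under-counting between divers."""
    by_observer = defaultdict(list)
    for observer, count in observations:
        by_observer[observer].append(count)
    return {obs: mean(counts) for obs, counts in by_observer.items()}

if __name__ == "__main__":
    counts = [12, 9, 11, 10, 95, 13]            # 95 is a likely transcription error
    print("outlier indexes:", flag_outliers(counts, z_threshold=2.0))
    obs = [("diver_a", 12), ("diver_a", 11), ("diver_b", 20), ("diver_b", 19)]
    print("per-observer means:", observer_means(obs))
```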

Example #4, Visualizing data

Data visualization is a key part of the platform. Visualization options should be relevant, beautiful, and fast in order to support free exploration and serendipitous insights. By generating derivative measures such as #/m² and percent cover, abundance and density of taxa can be compared across sampling protocols. More complex analyses can be used to evaluate the similarity of sampling sites, and site and taxonomic groupings can be specified to evaluate marine protected area performance. Nonmetric multidimensional scaling (MDS), for example, has been used in the California Channel Islands to evaluate the similarity of study sites based on subtidal community composition, exposing biogeographic patterns that can help evaluate the effects of MPAs. It is the type of analysis that is complex to set up in analytical tools like R but, with well-standardized data structures, could be made available out of the box in SeaMonitor.
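As a rough illustration, the sketch below computes the two derivative measures mentioned above. The transect dimensions and point counts are assumptions rather than part of any particular protocol; the point is only that counts from different protocols can be reduced to comparable units.

```python
# Hypothetical sketch of two derivative measures: density (#/m²) from a belt
# transect count, and percent cover from photo-quadrat point counts.

def density_per_m2(count, transect_length_m, transect_width_m):
    """Convert a raw count on a belt transect into individuals per square meter."""
    area = transect_length_m * transect_width_m
    return count / area

def percent_cover(category_points, total_points):
    """Percent cover: points landing on a category / total points scored, as a percentage."""
    return 100.0 * category_points / total_points

if __name__ == "__main__":
    # e.g. 34 fish on a 50 m x 5 m belt transect -> 0.136 individuals per m²
    print(round(density_per_m2(34, 50, 5), 3))
    # e.g. 18 of 50 random points on live coral -> 36.0% cover
    print(percent_cover(18, 50))
```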

Data in the example below are from the US National Park Service Kelp Forest Monitoring Program. NPS uses a variant of MarineMap developed by our lab, called Pyrifera, to visualize monitoring data and over 20 years of underwater video transects.

Sharing Data

Sharing data and methodology is essential to the scientific process but is often hindered by technical complications, demands on staff, and concerns over attribution and licensing. All these issues can be addressed by SeaMonitor.

When creating a project, PIs can set embargoes on publishing based on data collection date and filter vulnerable species data from results. Setting clear usage guidelines via an open data license would be a required step when publishing data. ODbL and Creative Commons licenses are now used by many publishers to provide open access to research data, and opportunities to coordinate with publishers on licensing and to promote links to data on SeaMonitor could drive users to the platform and simplify publishing for researchers. PIs would invite collaborators to their project with fine-grained access control, and the system would track all edits by user account, with support for undoing changes. Up-to-date data would be made available in both raw formats and summary tables via download links and an open API, freeing researchers from manually creating data products on request.
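As a sketch of how the open API might apply these rules before releasing data, the example below withholds records collected after an embargo date and strips taxa flagged as vulnerable. The field names, dates, and species list are hypothetical.

```python
# Hypothetical sketch of the filtering an open API endpoint could apply before
# returning project data: records newer than the PI's embargo date are withheld,
# and taxa flagged as vulnerable are removed from the results.

from datetime import date

VULNERABLE_TAXA = {"Hippocampus whitei"}   # illustrative; configured per project

def publishable_records(records, embargo_until):
    """Return only the records that are past embargo and not vulnerable taxa."""
    released = []
    for rec in records:
        if rec["collected_on"] > embargo_until:
            continue                        # still under embargo
        if rec["species"] in VULNERABLE_TAXA:
            continue                        # filtered to protect vulnerable species
        released.append(rec)
    return released

if __name__ == "__main__":
    data = [
        {"species": "Chromis viridis", "collected_on": date(2013, 5, 2), "count": 12},
        {"species": "Hippocampus whitei", "collected_on": date(2013, 5, 2), "count": 1},
        {"species": "Chromis viridis", "collected_on": date(2014, 8, 9), "count": 7},
    ]
    for rec in publishable_records(data, embargo_until=date(2014, 1, 1)):
        print(rec)   # only the first record is released
```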

This looks great. How can we make it a reality?

The McClintock Lab has extensive experience developing complex software services from our work on SeaSketch. With a core set of flexible features, SeaSketch can be configured to support a broad set of use cases, from planning marine sanctuaries in Barbuda and multi-sector marine spatial planning in New Zealand to coordinating data collection efforts within NOAA.

We believe we can combine this experience with our experience visualizing monitoring data to create a similarly flexible tool that can be used by a variety of organizations. In 2005 our principal developer created one of the first Google Maps mashups to visualize PISCO subtidal monitoring data from the California Channel Islands; that idea eventually grew into content for Google Ocean and an application for the National Park Service's Kelp Forest Monitoring Program. The web, database, cloud, and mobile technologies needed to build this vision are not only available but mature and well understood. We are looking for a handful of partners to work with us in designing a flexible framework for an initial set of sampling protocols that can grow over time. If you are interested in participating, please get in touch and share this document with your colleagues.