Serckit Technical Report Draft
Serckit is the web-based infrastructure that serves the SERC office. It is a content management system and publishing platform for SERC-hosted websites and integrates a number of purpose-built web-based tools that support SERC projects. This report describes the state of Serckit as of the fall of 2019 and provides guidance on its future development.
Serckit Design Goals
The history and nature of Serckit are deeply intertwined with the work of the SERC office, so to understand Serckit it is important to understand SERC's work. SERC is a grant-funded office at Carleton College. Founded in 2002, SERC engages in multi-institutional collaborative projects that work to improve science education in a wide variety of ways. SERC's role in a majority of these projects includes facilitating communication and dissemination of project materials. Serckit is the platform through which this work happens.
From this it naturally follows that Serckit's core goal is to support the needs of SERC projects. The scope and direction of these projects have evolved over time (e.g. SERC's original projects were entirely in the geosciences, but that scope has expanded to include other disciplines and projects that address non-disciplinary issues in higher education). As a result the specific foci for Serckit's development have changed over time, responding to the project needs of the day. That said, there are common and enduring needs that Serckit has tried to address that span many of those projects and which often play out at a scale beyond individual projects. These can be framed as three goals:
- Serckit provides an online platform through which projects can develop, publish and disseminate their materials and through which they can coordinate their events and activities (e.g. workshops, webinars). It also serves as an internal project platform for communicating, storing and sharing information, and otherwise managing project work.
- Serckit offers a central location where a community can discover and engage with the work of multiple projects. Almost every education project has an intended audience that is also, simultaneously, being served by other projects. Both the individual projects and individual users are better served when there are connections made between related projects: connections that allow users to move among the resources of multiple projects so that they can discover those that best align with their needs. Serckit needs to facilitate this cross-project exploration while preserving the identity and integrity of the individual project websites. The Teach the Earth portal is a key example of Serckit playing this role for the geoscience education community.
- Serckit reflects and appropriately supports the reality that the projects and the communities they serve are not distinct players. Instead projects emerge from communities and are one mechanism through which a community realizes its aspirations. Serckit functionality needs to reflect community (and not just project) needs. This includes elements like support of community-driven synthesis activities and shared community repositories, as well as Serckit's role in providing administrative tools for NAGT and supporting SERC's collaboration with NAGT around the On the Cutting Edge professional development program.
The development of Serckit (which was called the SERC content management system or simply 'the CMS' up until 2015) started in 2002 when it was recognized that SERC's need for efficient publishing of web materials was going to grow beyond the then-standard practice, which involved authoring HTML with desktop tools (e.g. Macromedia's Dreamweaver) and transferring those files to a web server via FTP. At that time there were a number of new systems that promised to allow creation of web pages directly through a web browser interface. This direction seemed well matched to SERC's need to support a growing cadre of website authors working across the country on a variety of projects.
A system was envisioned that would allow authors to engage in the core web authoring tasks (e.g. entering and formatting text including images and downloadable files, building site navigation, etc.) through an entirely web-based interface. This would simplify the authoring process, reduce the complexity of supporting multiple authors, and allow a number of web best practices (e.g. use of consistent design elements, standardized navigation, creation of underlying HTML that met accessibility requirements) to be baked into the system via templates and authoring tools that didn't give authors the freedom to 'do the wrong thing'.
Simultaneously SERC was involved in several projects where digital library technologies were at the center. It was clear that the ability to author, manage, search and share metadata files (mostly encoded in XML) which described education resources was going to be an important capability for SERC to develop. So we envisioned a system that might combine web content authoring with these digital library capabilities.
A key decision at this point was whether to adopt an existing tool or framework for this system or to build from scratch. Although web-based content management tools are routine today, in 2002 they were few and far between. Drupal 1.0 had just been released the year before and WordPress didn't yet exist. There were a number of complex Content Management Systems (e.g. Vignette) with feature sets aimed at Fortune 500 markets (and price-tags to match) and a small set of nascent open source projects -- such as Drupal and Mambo -- with limited functionality, small user communities, and uncertain futures. None of these tools could fully meet our initial needs (especially with regard to digital library support). It was clear that the capabilities we envisioned could become central to SERC's future and that it would be important to have control over a variety of implementation details that would be critical to our particular use cases. In the absence of a clear long-term winner among the existing tools to which we could hitch our wagon, we embarked on creating a system from scratch. A key tenet in taking this step was that our particular use case would likely remain quite narrow (supporting educators sharing educational materials) compared to the broad aspirations of projects like Drupal. So by focusing on the specific functionalities needed by our projects, rather than making a more general-purpose CMS, we might be able to get the tools we needed with a limited (or at least achievable) development effort.
Coding on the new system commenced in October of 2002. An initial set of authors was given access to start creating content through the system in January of 2003, and the first public pages were published through the system in May of 2003. A set of integrated digital library tools was built into this new system; these are described in a 2005 article in D-Lib Magazine (http://dlib.org/dlib/january05/fox/01fox.html).
Since this initial work Serckit development has continued to be driven by the needs of SERC projects. The ability to author web content still sits as a core functionality of the system with a strong focus on the particular affordances needed to streamline the creation and support the adoption of high-quality STEM educational materials. This is complemented by a growing suite of tools for communication and project coordination, workshop and webinar facilitation as well as a range of purpose-built tools described in more detail elsewhere in this report.
Serckit currently plays host to 150 distinct web projects with 47,000 pages of content distributed among 4700 modules. These are visited by over 5 million visitors each year. The system hosts 25,000 user accounts, contact and demographic information for 70,000 people, 250,000 files and images, and 1200 email lists. Serckit distributes about 100,000 emails and receives about 2500 form submissions each month.
Serckit Development Principles
Over the past 17 years Serckit development has been driven by a consistent development philosophy rooted in the context provided by SERC's work. Each web project SERC engages with has specific needs and a well-defined budget for its website. Serckit's development has been guided by this series of project needs (and so indirectly by the priorities of the project funders) with project budgets providing a clear prioritizing force. At the same time the goals articulated above indicate that Serckit development should consider broader community needs even as it works to meet the requirements of individual projects. As a result we have arrived at a guiding set of development principles:
Prioritize features with cross-project utility
There is significant and intentional overlap in the goals and methods of most SERC projects. As a result the tools and functionality they need from their websites overlap heavily as well. So there are many cases where the development of a new Serckit feature, even if quite specialized, will end up being of utility to projects beyond the one that might initially fund its development. Identifying and prioritizing the development of features with a high potential for re-use by future projects has allowed us to develop a corpus of features well matched to any new project. A typical development trajectory involves new functionality being developed to serve the specific needs of an individual project. If the functionality seems to have potential beyond the originating project we iterate, expanding and evolving the functionality over the course of its use by several projects. Only when utility is proven and the exact needed functionality is ground-truthed across several implementations with several projects do we decide to make the investment to generalize a given feature.
Single Shared System
Beyond just sharing tools across projects, Serckit has, since its inception, followed a single-system model. All SERC-hosted sites sit in shared space. There is a single copy of Serckit code running at any given time. Content is stored in a single database. The dividing line between content from different projects is something managed explicitly within (and by) the system -- rather than by running separate "copies" of Serckit for each project. This means the barrier between projects is semi-permeable, which allows for a flexibility that has been important for many projects. Content can be trivially 'shared' between projects: for example, a collection of teaching materials in one project can include materials developed by another, and portal sites can provide a view into content from across multiple (or all) projects. Likewise an individual's account and profile information can span their participation in multiple projects. SERC projects revolve around the transient and complex inter-relationships between PIs, communities of educators, funding streams and educational priorities. The single-system model means SERC-hosted content can adjust its organization fluidly in response to these changing project realities and also be organized in community-serving portals that transcend individual projects.
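The single-system idea can be made concrete with a small sketch. It uses an in-memory SQLite database rather than Serckit's MySQL schema, and the table names (pages, collection_items), column names and project names are all hypothetical, chosen only for illustration. The point is that project membership is just data in a shared table, so a collection in one project can reference pages owned by another, and a portal is simply a query that ignores project boundaries.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with one shared pages table (hypothetical schema)."""
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE pages (
            page_id INTEGER PRIMARY KEY,
            project TEXT NOT NULL,  -- owning project: the dividing line is just a column
            title   TEXT NOT NULL
        );
        CREATE TABLE collection_items (
            collection_project TEXT NOT NULL,  -- project that displays the collection
            page_id INTEGER NOT NULL REFERENCES pages(page_id)
        );
    """)
    db.executemany("INSERT INTO pages VALUES (?, ?, ?)", [
        (1, "project_a", "Stream table lab"),
        (2, "project_b", "Climate data activity"),
        (3, "project_c", "Field trip guide"),
    ])
    # project_a's collection holds its own page plus one developed by project_b.
    db.executemany("INSERT INTO collection_items VALUES (?, ?)", [
        ("project_a", 1),
        ("project_a", 2),
    ])
    return db


def collection_for(db, project):
    """Return (title, owning project) for every page in a project's collection."""
    rows = db.execute(
        """SELECT p.title, p.project
           FROM collection_items c JOIN pages p ON p.page_id = c.page_id
           WHERE c.collection_project = ?
           ORDER BY p.page_id""",
        (project,),
    )
    return rows.fetchall()


def portal_view(db):
    """A portal is a query over the shared table that spans all projects."""
    return [row[0] for row in db.execute("SELECT title FROM pages ORDER BY page_id")]


db = build_demo_db()
print(collection_for(db, "project_a"))
print(portal_view(db))
```

With separate per-project installations the same collection would require copying or federating content; here it is a single join.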
Serckit Use is Facilitated by Expert SERC Staff
SERC has avoided going the direction of low-cost, low-touch self-serve web hosting environments. There are many sources for this sort of service and our focus has been on providing a service for education projects that value having hands-on support from SERC staff. As a result much of the use of Serckit, especially the configuration and operation of some of its more sophisticated functionality, is primarily by SERC staff. This has a number of important impacts. First, it greatly reduces the need to build polished interfaces that guide novice users toward successful use of complex features. Instead we can rely on SERC experts to facilitate use of the tools; either providing hands-on guidance or simply doing the complex tasks themselves. Second, it allows us to deploy new functionality knowing that the initial users, our own staff, will be able to give us immediate and thoughtful feedback leading to rapid iterative improvement. Finally it means that our administrative tools can be tiered in their level of polish and orientation toward new users. Tools only used by staff can get by with simple, efficient interfaces while outward-facing tools used by broader audiences get more polished, guided interfaces.
Optimize for our Limited Set of Use Cases
Rather than attempting to build a general purpose tool Serckit is intentionally optimized to support the specific set of workflows, work styles and conventions of our specific audience: educators leading education-reform projects and the communities they serve. This means we can save significant effort by not developing the flexibility and functionality needed in a more general purpose tool. More importantly we can optimize interfaces that do matter to our users to match their needs. The words used, the sequence of steps and the output of each tool can closely match the expectations of our users. And because of the close support interaction between users and our support staff we can get continuous feedback on how well we are meeting these expectations.
Make Conservative, Long Game Choices for Underpinning Cyberinfrastructure
Serckit functionality is implemented on top of a traditional web infrastructure stack. There are a myriad of possible choices for each element in this stack with new options continually emerging that promise to magically solve the problems of the past. However, the core functionality we need from this stack would be adequately served by almost any toolset. It is the custom functionality we build on top of the stack that is Serckit's value added: not the speed of the http server or the technical superiority of the database or the buzzword compliance of the framework. So choices in the technical stack lean strongly to technologies that are simple, well-documented and have long track records of doing exactly the sort of work we need for many other people in a bullet-proof manner. Most importantly we choose technologies that are likely to be well supported far into the future. Re-architecting our search infrastructure, or rewriting code to fit a new framework, simply because the external technology we relied on has fallen by the wayside would be a huge distraction. Our infrastructure stack has evolved gradually over the 17 years Serckit has been running; staying in the sweet spot of well supported tools and several steps back from the bleeding edge.
Use the Simplest Stack Possible
A corollary to making conservative infrastructure choices is to make as few of them as possible. Each new tool, framework, abstraction layer or service brings with it the obligation to develop new in-house expertise and keep that expertise up-to-date. Each new element in the infrastructure makes the overall system harder to reason about, troubleshoot, secure and make reliable. So we lean strongly on the side of not adopting new infrastructure elements unless we're comfortable that they are worth this full set of liabilities.
Make Frequent, Small Improvements and Prioritize Operational Reliability
Serckit development has always been done by a very small team (one or two individuals) who also manage server operations. Code evolves in small, easily reversible iterations which are tested in staging environments and then rolled out to production servers. Small changes have isolated impacts (assuming code is well architected), making quality assurance and troubleshooting easier. Larger system changes, when unavoidable, are rolled out to internal (SERC staff) users only (even if that requires a bit more engineering) for full testing and ground truthing before release to all users. When bugs are discovered their full resolution (which means understanding root causes, not just rebooting a server and hoping) becomes the immediate top priority ahead of all other developer work. The end result is that Serckit has been in continuous operation since 2002 without significant unplanned downtime. Typical annual uptime exceeds 99.95%, with the longest outage a 4-hour period in 2013 during which the content was viewable but (intentionally) not editable as we migrated from local servers to AWS hosting.
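To put the uptime figure in perspective, the short calculation below converts an annual uptime percentage into a yearly downtime budget. It is plain arithmetic on the 99.95% figure quoted above, assuming a 365-day year; the function name is illustrative.

```python
HOURS_PER_YEAR = 365 * 24  # 8760 hours, ignoring leap years


def allowed_downtime_hours(uptime_percent, hours=HOURS_PER_YEAR):
    """Hours of downtime per year that are consistent with the given uptime percentage."""
    return hours * (100.0 - uptime_percent) / 100.0


budget = allowed_downtime_hours(99.95)
print(f"{budget:.2f} hours per year")  # 4.38 hours per year
```

In other words, 99.95% uptime corresponds to a little under four and a half hours of total downtime in a year.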
The Development Principles in Action: Serckit's Database
You can see these development principles in action by following the series of decisions around Serckit's core database. One of the earliest decisions in Serckit's development was the adoption of MySQL as the core data store for all information managed in Serckit. In retrospect this was a strong, long-game choice. MySQL was and is still a well supported, extremely widely used tool very well aligned with Serckit's needs. It supports our single-system model (e.g. there is a single master table that holds the metadata for all web pages across all projects hosted in Serckit), and its use for storing data for a wide range of functionality (currently over 250 tables) helps keep our overall stack simple. Our original implementation, where the database lived on a single hard drive on a single Sun server, while a model of simplicity, was a bit lacking in reliability and durability. Over the years the system evolved to run on a RAID array (allowing the system to deal gracefully with hard drive failure) and then a cluster of 3 servers using Percona's XtraDB implementation of MySQL. This provided significant reliability improvements: an entire server could fail or be taken offline for maintenance without disrupting operations. But it brought with it management complexity as SERC staff had to develop and maintain expertise in XtraDB. So while this system operated smoothly with very little attention for several years, we were happy when the AWS Aurora system -- which offers identical functionality and reliability through a service largely managed by Amazon -- became available. Serckit's MySQL database moved to Aurora in December of 2018.
Throughout this trajectory our choice of a widely adopted, mainstream tool meant that there were multiple viable upgrade paths that allowed for simplification of our internal operations and an increase in reliability without a need to rewrite tools (the same tables created in 2002 run happily on the current system) or disrupt users (the transition to Aurora was done without downtime).
Serckit's Place in the Ecosystem of Geoscience Education Technology
There is a wide range of tools and players in the world of geoscience education technology. It is useful to understand the niche Serckit occupies in this landscape in order to think about its current efficacy and future potential. Serckit, reflecting SERC's historical operating model, has been designed to support education projects. In general each of these projects has a small set of leaders, clear goals, a limited lifetime with a fixed timeline and budget, and a desire to have an impact broader than just a single institution. The projects value technology that directly supports their project development needs (supported by Serckit's tightly focused and purpose-refined publishing and data collection tools), that allows a broad reach (supported by the shared platform and discovery tools), and that extends beyond the limited timeline of their project (supported by the shared platform and its emphasis on longevity).
Because it is project leaders who choose to use (and fund the use of) Serckit, they are its most important 'customer'. That said, since almost every project wants to use Serckit to reach a broad audience, Serckit's ability to reach this larger audience (and serve them well) is key. This secondary audience is composed of individual educators looking for resources and ideas they can put to use in their teaching, and individuals working at the program or institution level similarly looking for resources and ideas to support their work.
It's critical to note that Serckit tools have focused largely on providing resources/information/ideas to individuals who then, through other means outside Serckit, actually enact some change in geoscience education. Faculty find materials within Serckit and then are the mediators: selecting and adapting the materials and delivering them to their students in class or through their local LMS. Administrators explore and share models for institutional change through projects hosted on Serckit, but the actual implementation happens on their own campus through their local tools.
This focus on not being the actual tool through which educational change is operationalized has a number of strong benefits. Early in its development it wasn't clear if Serckit could maintain the sort of operational reliability and scalability needed for a system that was on the critical path of delivering educational materials to students across the country. The mediated nature of the changes catalyzed by Serckit also helps ensure the flexibility and longevity of their impact. A faculty member provided with a strong idea for a teaching activity and some editable Word docs can adapt those materials for many settings. The ideas are unlikely to get stale and the handouts can be evolved locally as technologies change. In contrast self-contained online materials designed to be used directly by students are often impractical to modify to best fit local needs and can be fragile in the face of changing technologies. So Serckit development has intentionally focused on functionality that supports sharing ideas, using technologies that have a low chance of sudden obsolescence and staying out of the critical path of delivery.
In many cases Serckit facilitates the use of complicated or cutting-edge technologies by pointing visitors at external sources (desktop tools or 3rd party websites) while providing guidance on how to use those tools effectively. An important open question is whether, given Serckit's demonstrated stability and scalability and the evolution of technology, there are areas (and specifically newer technologies) where Serckit could expand further into delivering ultimate end-user services without risking the benefits it has derived so far from a conservative approach.
Description of Serckit
Below we describe the current state of Serckit: the infrastructure on which it runs and the major elements of its functionality.
Hardware and Network Infrastructure
In December of 2013 Serckit was transitioned from hosting on dedicated hardware in a data center at Carleton College to being fully hosted within Amazon Web Services (AWS). This move was motivated by a desire to make the overall system more robust and reliable. It does this by taking advantage of AWS's capability to inexpensively deploy redundant servers and storage across multiple data centers. The result is a system that is far less vulnerable to the inevitable hardware failures and network connectivity issues than our previous hosting arrangement.
Serckit currently runs in an AWS Virtual Private Cloud networking environment using a standard configuration that leaves most servers in a private network not directly exposed to the internet. Incoming web traffic is funneled through an Application Load Balancer, with administrative access to servers tunneled via SSH through a bastion host. DNS resolution is provided by the AWS Route 53 service. Most incoming requests for static content are directed through CloudFront edge caches.
Serckit runs on a set of 6 generic virtualized servers (t2 EC2 instances) as well as two AWS managed servers that provide back-end caching (ElastiCache) and two others that provide managed database services (Aurora). The virtualized servers make use of network block storage (EBS) for local uses (OS, transient files, Serckit code) but most data (e.g. all user-uploaded files and images, system backups -- approximately 2 terabytes in total) is stored in AWS S3.
A common system image built on top of an Ubuntu long-term support (LTS) release is deployed on all 6 virtual servers. Different servers are assigned different roles. Two act as redundant front-end web servers. Requests are first passed to NGINX, which handles simple requests (e.g. static file delivery or redirects). It proxies more complicated requests back to separate Apache processes running mod_php. Most core Serckit code is written in PHP and executed here. A third virtual server primarily provides search services. It runs the Solr search engine within a Tomcat container. All access to the Solr instance is mediated by PHP code which both pushes data in for indexing and runs queries against Solr to feed various Serckit search functionalities. This instance also runs a legacy digital library tool in the same Tomcat container to provide external sharing of XML metadata records via the OAI-PMH protocol.
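The two directions of Solr traffic described above (pushing documents in for indexing and running queries) can be sketched as follows. Serckit's mediation layer is written in PHP; Python is used here only for illustration. The host name, core name ("pages") and field names are hypothetical, and the sketch only constructs the HTTP requests for Solr's standard /update and /select handlers without sending them.

```python
import json
from urllib.parse import urlencode

# Hypothetical internal address and core name; not taken from the Serckit configuration.
SOLR_BASE = "http://search.internal:8080/solr/pages"


def index_request(docs):
    """Return (url, headers, body) for posting documents to Solr's JSON update handler."""
    url = f"{SOLR_BASE}/update?{urlencode({'commit': 'true'})}"
    headers = {"Content-Type": "application/json"}
    body = json.dumps(docs).encode("utf-8")
    return url, headers, body


def query_url(text, project=None, rows=10):
    """Return the URL for a keyword search, optionally filtered to one project."""
    params = [("q", text), ("rows", rows), ("wt", "json")]
    if project is not None:
        # fq is Solr's filter-query parameter; 'project' is a hypothetical field name.
        params.append(("fq", f"project:{project}"))
    return f"{SOLR_BASE}/select?{urlencode(params)}"


url, headers, body = index_request([{"id": "page-1", "title": "Stream table lab"}])
print(url)
print(query_url("plate tectonics", project="demo"))
```

Keeping all Solr access behind one layer of application code means the search server never needs to be reachable from outside the private network.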
A fourth virtual server acts as a centralized logging repository across the cluster (using syslog-ng) and also runs time-intensive background batch processing (e.g. link checking, generating image thumbnails). The fifth server runs Mailman, an open-source email list package, whose interface we have loosely integrated with the rest of Serckit. Outgoing email from all servers is relayed through the SendGrid mail service. The sixth server is configured as a NAT gateway and SSH-only bastion host following a standard AWS recipe.
The PHP processes (on the two front-end web servers and the batch processing host) use the redundant pair of ElastiCache servers, running memcached, for caching a wide variety of data, strictly for performance enhancement. They also make heavy use of the redundant pair of Aurora managed database servers where most of Serckit's core data is stored in a MySQL database (approximately 86 GB over 250 tables).
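Using the cache "strictly for performance enhancement" implies a cache-aside pattern: the database remains the source of truth, and losing the cache costs speed but never data. The sketch below illustrates that pattern under stated assumptions. DictCache is a hypothetical in-process stand-in for a memcached client, and load_page_title stands in for a MySQL query; neither is Serckit code.

```python
class DictCache:
    """In-process stand-in for a memcached client (hypothetical; get/set only)."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        return self._data.get(key)  # None signals a cache miss

    def set(self, key, value):
        self._data[key] = value

    def flush(self):
        self._data.clear()  # simulates eviction or a cache restart


def cached_fetch(cache, key, load_from_db):
    """Return the cached value for key, falling back to the database on a miss."""
    value = cache.get(key)
    if value is None:
        value = load_from_db(key)  # the database stays the source of truth
        cache.set(key, value)
    return value


db_calls = []


def load_page_title(key):
    """Hypothetical loader standing in for a MySQL query."""
    db_calls.append(key)
    return f"title for {key}"


cache = DictCache()
cached_fetch(cache, "page:42", load_page_title)  # miss: queries the database
cached_fetch(cache, "page:42", load_page_title)  # hit: served from the cache
cache.flush()
cached_fetch(cache, "page:42", load_page_title)  # miss again, same answer
print(len(db_calls))  # 2 database calls for 3 requests
```

Because every cached value can be rebuilt from the database, the cache servers can fail or be restarted without any loss of correctness.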
Network-level security is provided by a standard configuration of AWS VPC that leaves all but two hosts reachable only via the internal private network. Server-level firewalls (ufw) and VPC security groups both enforce blocking of ports other than those explicitly used by applications. Most public access to servers is available only via https through an AWS Application Load Balancer that directs traffic to the two web hosts. The server that hosts Mailman is also on a publicly routable address in order to receive incoming mail on port 25. Administrative access to servers is via ssh (public key authentication only) through a bastion host, with AWS security groups used to restrict ssh access to IPs on Carleton's internal network. Our use of a simple technology stack with few external dependencies minimizes exposure to security holes in 3rd party tools.
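The ingress policy described above can be summarized as a small allow-list: anything not explicitly permitted is blocked. The sketch below models that logic only; it is not the actual security group configuration. The target names are hypothetical labels, the campus range is a placeholder drawn from the reserved documentation block 192.0.2.0/24 rather than Carleton's real address space, and only the ports named in the text are modeled.

```python
import ipaddress

# Placeholder for Carleton's internal address space (reserved documentation range).
CAMPUS_NETWORKS = [ipaddress.ip_network("192.0.2.0/24")]
ANYWHERE = [ipaddress.ip_network("0.0.0.0/0")]

# (target, port) -> networks allowed to connect; anything absent is denied.
INGRESS_RULES = {
    ("load_balancer", 443): ANYWHERE,   # public https traffic
    ("mail_host", 25): ANYWHERE,        # incoming mail for the list server
    ("bastion", 22): CAMPUS_NETWORKS,   # administrative ssh, campus only
}


def is_permitted(target, port, source_ip):
    """Return True only if an explicit rule allows source_ip to reach target:port."""
    allowed = INGRESS_RULES.get((target, port), [])
    addr = ipaddress.ip_address(source_ip)
    return any(addr in net for net in allowed)


print(is_permitted("bastion", 22, "192.0.2.17"))          # campus address
print(is_permitted("bastion", 22, "203.0.113.5"))         # off-campus address
print(is_permitted("load_balancer", 443, "203.0.113.5"))  # public web traffic
```

The default-deny shape is the important part: adding a new public service requires adding an explicit rule, which keeps the exposed surface easy to audit.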
Development Possibilities: While Serckit's existing security position is reasonable, it could benefit from a thorough review against a modern standard such as OWASP ASVS to provide stronger assurance that our current configuration and practices are appropriate. There are some obvious areas for improvement, such as broader logging to support security incident detection and CSRF mitigation. Also, our existing authorization model gives internal SERC staff direct access to all user-submitted data. In some cases (e.g. the collection of sensitive evaluation data via online forms) it would be useful to have an additional layer of security that would further restrict access to this data to only those individuals with a demonstrated need for access.
Serckit infrastructure operations are predicated on the reality that there is very limited staff time available for routine system administration. On a day-to-day basis the server infrastructure is designed to require little to no active attention. System monitoring, via New Relic and internal scripts and checks, is set up to notify staff about exceptional situations: performance degradation due to aggressive crawlers, abnormally high CPU loads, low disk space on temp volumes, etc. Server capacity is designed to typically run at loads of 10%, leaving sufficient headroom to absorb most exceptional traffic situations. Backup is automated with scripts that replicate database and local server storage to S3 on at least a nightly basis. Duplicate copies of the system database are moved (nightly) out of AWS to encrypted storage on a server in the SERC offices. Databases are also backed up using Aurora's internal tools with 35 days' worth of snapshots retained. Application logs from all AWS servers are aggregated (using syslog-ng) to a central EC2 instance, then archived in S3 with duplicates off-site. Database logs are collected and managed internally by Aurora. Updates to systems are largely driven by the need for new functionality (e.g. an upgrade of Solr to make use of new features) or happen when staff time permits strategic overhauls (e.g. a move from a custom MySQL XtraDB cluster to Aurora in late 2018).
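To make the 35-day snapshot window concrete, the sketch below shows how a rolling retention window partitions a set of nightly snapshots. Aurora manages its own snapshot retention internally, so this is not something Serckit runs; the function name and dates are illustrative only.

```python
from datetime import date, timedelta

RETENTION_DAYS = 35  # the retention window quoted in the text


def split_by_retention(snapshot_dates, today, retention_days=RETENTION_DAYS):
    """Partition snapshot dates into (kept, expired) for a rolling retention window."""
    cutoff = today - timedelta(days=retention_days)
    kept = sorted(d for d in snapshot_dates if d > cutoff)
    expired = sorted(d for d in snapshot_dates if d <= cutoff)
    return kept, expired


today = date(2019, 10, 1)
nightly = [today - timedelta(days=i) for i in range(40)]  # 40 consecutive nights
kept, expired = split_by_retention(nightly, today)
print(len(kept), len(expired))  # 35 5
```

A 35-day window means a problem discovered up to five weeks after the fact can still be investigated against a snapshot taken before it occurred.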
Development Possibilities: Serckit would benefit from a regular schedule of OS and application updates. EC2 instance configuration documentation has not been significantly updated since the move to AWS in 2013 and could be revisited with an eye toward making the instance deployment process more streamlined. Disaster recovery processes could be more clearly documented with special consideration given to their reliance on the availability of key SERC staff.
Serckit Development Practices
All code for Serckit, including server configuration files (but with the particular exception of secrets such as API keys and database passwords), is managed with Git using a private remote repository on Bitbucket. Developers work locally on development machines that are set up to match the live server configuration, including recent copies of the production data. Development testing is done on these local machines and the code is then deployed via simple shell scripts that rsync code onto production servers over ssh. Since SERC has historically operated with one, and on occasion two, developers, synchronization issues are minimal and are easily handled via Git (and by virtue of being close enough to yell over to the other developer...). Likewise code architecture and style consistency have been easy to maintain because of the limited staff.
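The deployment step described above (rsync over ssh to each production server) might look roughly like the sketch below. The actual Serckit scripts are shell scripts and are not reproduced here; the host names, paths and choice of rsync options are assumptions for illustration. The sketch only builds and prints the commands, defaulting to rsync's dry-run mode, and leaves execution to the caller.

```python
import shlex

# Hypothetical host names and destination path; not taken from the Serckit configuration.
WEB_HOSTS = ["web1.internal", "web2.internal"]
DEST_DIR = "/var/www/serckit"


def rsync_command(source_dir, host, dest_dir, dry_run=True):
    """Build an rsync-over-ssh command that mirrors source_dir onto host:dest_dir."""
    cmd = ["rsync", "--archive", "--compress", "--exclude", ".git/", "-e", "ssh"]
    if dry_run:
        cmd.append("--dry-run")  # report what would change without copying anything
    # A trailing slash on the source copies the directory's contents, not the directory itself.
    cmd += [source_dir.rstrip("/") + "/", f"{host}:{dest_dir}"]
    return cmd


for host in WEB_HOSTS:
    print(shlex.join(rsync_command("build", host, DEST_DIR)))
```

Because rsync transfers only changed files, small, frequent deployments stay fast, and a dry run offers a cheap preview of exactly which files a release will touch.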
Development Possibilities: We anticipate having at least two developers going forward which points toward additional formalization around some of these processes.
Major Elements of Serckit Functionality
This section outlines key elements of Serckit. It has a particular focus on tools that are distinctive in the system as compared to other content management systems. Each section also includes a Development Possibilities discussion which touches on features not currently in Serckit. These give a sense both of where the edges of functionality lie and of potential directions for future enhancement.
Serckit includes a fairly traditional account infrastructure for managing access. Visitors can create Serckit-specific accounts which are then used to provide access to private elements (editing pages, downloading private files, accessing administrative web interfaces, etc.) based on group membership. The system provides standard mechanisms for password reset and email verification of account ownership. In most cases access to private elements in the system is managed by SERC staff (by controlling users' membership in groups that mediate access) with some specific streamlining for key use cases. For example, visitors submitting new site content have immediate access to edit 'their' pages -- though not access to make them public -- and SERC staff can pre-authorize accounts for access to private areas before the accounts are created by the end user, in order to streamline the setup of areas for new participants.
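The group-based access model, including pre-authorization of accounts that do not yet exist, can be sketched as follows. This is an illustrative model, not Serckit's implementation: the class and method names are hypothetical, and keying pre-authorizations by email address is an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class AccessDirectory:
    """Toy model of group-mediated access (hypothetical; not Serckit's data model)."""
    group_members: dict = field(default_factory=dict)  # group name -> set of account emails
    preauthorized: dict = field(default_factory=dict)  # email -> groups granted in advance

    def preauthorize(self, email, group):
        """Staff grant a group to an address before any account exists for it."""
        self.preauthorized.setdefault(email.lower(), set()).add(group)

    def create_account(self, email):
        """On account creation, apply any grants that were waiting for this address."""
        email = email.lower()
        for group in self.preauthorized.pop(email, set()):
            self.group_members.setdefault(group, set()).add(email)
        return email

    def can_access(self, email, required_group):
        """Access to a private element depends only on membership in its group."""
        return email.lower() in self.group_members.get(required_group, set())


directory = AccessDirectory()
directory.preauthorize("Pat@Example.edu", "workshop_2019")
print(directory.can_access("pat@example.edu", "workshop_2019"))  # False: no account yet
directory.create_account("pat@example.edu")
print(directory.can_access("pat@example.edu", "workshop_2019"))  # True: grant applied
```

Pre-authorization lets staff prepare a private area for a list of invitees in one step, so each participant has access the moment they register.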
Development Possibilities: Two-factor authentication is currently not supported, though it might be advisable, especially for some administrative users. The system has no OAuth support and no integrations with external authentication sources (Google, Facebook, etc.). Since account creation is self-serve there are a large number of unused accounts, some likely created by bots. While these pose no obvious threat, mechanisms could be developed to clean out unverified or inactive accounts. Delegation of access is currently entirely mediated by SERC staff, and in some cases it may be advantageous to have a system that allows project leaders or others to control access for limited sets of resources.
Web Site Authoring
The system allows uploading of arbitrary files which can then be embedded in web pages for download. Files in web-native image formats (jpg, png, gif) are automatically recognized, can be embedded in a variety of responsive display formats, and are automatically rescaled as needed. PowerPoint files can be automatically reshared via SlideShare. Display of equations is supported via MathJax, and iframing or embedding of external resources is allowed through a system that requires approval by staff to screen for security issues. The core editing functionality is described in the user documentation.
Development Possibilities: The system's Quickedit interface, which allows direct editing of pages where they live, has been recently revamped for increased ease of use, especially for first-time users (see the Quickedit introductory video). The more full-function back-end editing interface, typically used by more advanced users, hasn't had this kind of visual reworking.
The underlying WYSIWYG editing widget is drawn from an out-of-date toolkit (Dojo). We anticipate replacing it with a more modern tool (CKEditor 5) which has the potential to support features such as Google Docs-style simultaneous editing. There is currently no integration with external video sites (e.g. cross-posting of videos loaded on the site to project YouTube channels), which has been requested by multiple projects.
In addition to tools for making websites designed for public consumption, the system has the notion of private workspaces. These are modules where groups can create web pages to record their internal work. Workspaces build on top of core site editing features, adding in-page tools for managing the people, files and pages associated with the particular workspace.
Projects use workspaces as an organizing platform for their internal work: minutes from working group meetings, drafts of documents, records of group decisions. Workspaces strike a balance between flexibility (any workspace member can easily create a new blank page and start using it for any purpose) and structure (navigation is created automatically and can be augmented through judicious linking to apply organization to a group's resources). They allow project teams to move away from the chaos of "it's all in email but not everyone was included on that message and where is that attachment we sent last year". Teams also find that the embedded navigational options afforded by working in hypertext allow for a degree of organization that can be lacking in large projects that follow the "toss everything into a shared Google Drive folder" approach. Workspaces have the added benefit of SERC staff taking care to make sure all the right people have access at the right time.
Workspaces are also used heavily to support group work during workshops and other events. In a typical use a private workspace is set up for participants, pre-populated with pages containing the prompts for each breakout group, which are linked to directly from the workshop program. The working groups then take notes and record synthesis directly on their group's page following the in-page prompts. The results are immediately visible to all workshop participants for reference as part of report-outs, or for further development, reframing, or analysis by evaluators post-workshop.
The core content provided by authors is composited into final pages with a system that combines page-level templates with project-specific settings collectively referred to as 'chrome'. The templates are used to enforce consistent structure within the core page, with some pages using a single 'blank slate' template and others being highly scaffolded (e.g. the ActivitySheet template with separate fields for description, context, goals, teaching materials, etc.). The 'chrome' setting for a given page encompasses both the selection of an appropriate header and footer (developed by SERC staff) and the setting of various toggles that control overall look and feel (e.g. selection of CSS style sheets, application of custom CSS, selection of the project color palette, activation of site search boxes, behavior of automatically generated navigation menus, etc.). There is a core set of existing templates and default chrome elements that have been deeply tested for accessibility and cross-browser compatibility, which makes starting up a new fully functioning site very efficient. At the same time the underlying templating structure is completely flexible. This has been used in a number of projects where we needed to match the look and feel of an existing external partner site. In these cases we were able to quickly achieve an identical look and feel largely through judicious copying of the underlying html and css of the external site into a new page template and chrome.
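The division of labor between templates and chrome can be illustrated with a small sketch. It assumes a much simpler model than the real system: the Chrome fields, the two template definitions, and the composite function are hypothetical names chosen for illustration only.

```python
from dataclasses import dataclass, field
from string import Template


@dataclass
class Chrome:
    """Hypothetical project-level settings: header, footer, look-and-feel toggles."""
    header: str
    footer: str
    stylesheets: list = field(default_factory=list)
    show_search_box: bool = False


# A 'blank slate' template and a scaffolded one with separate named fields.
TEMPLATES = {
    "blank": Template("<main>$body</main>"),
    "activity": Template(
        "<main><h1>$title</h1>"
        "<section>$description</section>"
        "<section>$goals</section></main>"
    ),
}


def composite(template_name, fields, chrome):
    """Fill the page template with author content, then wrap it in project chrome."""
    core = TEMPLATES[template_name].substitute(fields)
    links = "".join(f'<link rel="stylesheet" href="{href}">' for href in chrome.stylesheets)
    search = '<form role="search"></form>' if chrome.show_search_box else ""
    return f"<head>{links}</head><body>{chrome.header}{search}{core}{chrome.footer}</body>"
```

Under this model, matching a partner site's look and feel means supplying a different Chrome (and possibly a new template) while the author content passed in as fields stays unchanged.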
Development Possibilities: The exact separation of concerns implemented through the template and chrome structures has evolved in reaction to use over Serckit's life. The current practices work well but there are still some older legacy projects whose structure in this regard doesn't reflect current practices and could use updating.
Making sure the core html generation and page compositing features of Serckit generate code that meets accessibility standards has been a focus since its inception. In general the system defaults to (and encourages authors to) generate reasonably semantic html and avoid practices that cause obvious accessibility issues. The system prompts for alternative text for images and supports captioning of videos. Standard templates include skip links and overall page structures are keyboard navigable.
Development Possibilities: There is a current effort to raise the bar on the level of the accessibility of the system. In addition to a review of the automatically generated html elements, a new system is being developed (with supporting best practices documentation) that will allow projects to do systematic checks of their content for accessibility and track the results. Items like uploaded Word documents likely pose the greatest accessibility challenge and one Serckit itself can't fix. But we hope the new tools will at least help projects understand the scope of the issue, have guidance on how to improve the accessibility of their materials and track the progress of that work.
Copyright and Provenance
Serckit includes a set of tools for tracking provenance and reuse status of all the individual files, images and pages within the system. It relies on reasonable defaults, timely prompts as authors upload materials, and clear documentation to get correct details in the system around the intellectual property status for all content in the system. The resulting information is then available throughout the website through a combination of author bylines and copyright information, specific to the local content, in each page's footer.
Development Possibilities: Serckit obviously can't ensure the provenance and reuse information people choose to enter is correct. In projects where this is a priority, manual checks are needed. The current system doesn't include any mechanism to distinguish information that has been provided by the initial author (who perhaps just blindly accepted a default setting) from information that has been externally vetted. A way to track external vetting of IP status would be useful for projects that have otherwise tracked this information external to the system.
Complementary to the public-facing web pages (and private workspaces) Serckit has a series of web-based back-end management interfaces. These are used largely by SERC staff to control settings for most of Serckit's functionality. This includes tasks such as the creation and editing of modules, managing accounts, developing new headers and footers and creating new search interfaces. A small set of these management interfaces are visible to authors and project leaders (largely around creating web pages). There has also been initial development of some management-focused data displays which automatically plot community members, email list use and site activity over time. These are used by project leaders to understand the activity of their project.
Development Possibilities: There are many instances where expanding access to management interfaces beyond SERC staff, especially to project leaders, would be desirable. We envision a project-level dashboard, building on our existing data displays, that gives project leaders more control over elements of their project within Serckit (e.g. the ability to create new areas of their website) with a strong emphasis on exposing information already within the system that would be useful for project management: web analytics, project participants, broken links, accessibility review status, currency of information, etc. This would require significant new development work as the current management interfaces are very utilitarian and designed for expert users (SERC staff) working with a cross-project perspective. There is also a complementary set of improvements that could be made to the existing management tools that would streamline a variety of tasks that SERC staff perform, based on what we know are common workflows.
Search, Controlled Vocabularies and Other Metadata
Serckit has a range of digital library functionality that has grown out of SERC's early work in the education digital library community. Our controlled vocabulary system allows for the creation of arbitrary hierarchical vocabularies (345 to date) that are then used both for internal project needs and for cross-project categorization of materials. These include broadly used vocabularies such as subject and resource type as well as local implementations of external schemes such as NGSS. Individual pages in the system can be manually tagged with vocabulary terms, and automatic tagging is supported through the definition of cross-vocabulary synonyms. These direct the system to recognize cases where a specific vocabulary term (or combination of terms) has been applied to an item and then automatically apply other vocabulary terms based on known relationships. For example, items tagged with the CLEAN project's internal tag for 'Water Cycle' are automatically also tagged with the more generic 'Hydrology' tag within the cross-project Subject vocabulary. A complementary 'indirect vocabulary' system allows for automated bulk tagging of all the pages within a module or within sets of modules.
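The synonym mechanism described above amounts to rule-based tag expansion. The sketch below is one way to express it, under stated assumptions: the rule format, the vocabulary name 'CLEAN Topics', the second rule, and the choice to re-apply rules until no new terms appear are all illustrative and not taken from Serckit itself. Only the 'Water Cycle' to 'Hydrology' relationship comes from the text.

```python
# Each rule: if ALL trigger terms are present, also apply the implied term.
# Terms are (vocabulary, term) pairs; vocabulary names here are hypothetical.
SYNONYM_RULES = [
    ({("CLEAN Topics", "Water Cycle")}, ("Subject", "Hydrology")),
    ({("Subject", "Hydrology"), ("Resource Type", "Activity")},
     ("Portal Topics", "Water Activities")),
]


def apply_synonyms(tags, rules=SYNONYM_RULES):
    """Return the tag set expanded with every term implied by the rules.

    Rules are re-applied until nothing changes, so a term added by one
    rule can trigger another (an assumption of this sketch).
    """
    expanded = set(tags)
    changed = True
    while changed:
        changed = False
        for triggers, implied in rules:
            if triggers <= expanded and implied not in expanded:
                expanded.add(implied)
                changed = True
    return expanded
```

A page tagged only with the CLEAN 'Water Cycle' term picks up the cross-project 'Hydrology' term; the second rule fires only when the combination of terms it names is present.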
Page level metadata -- including vocabularies, keywords, date information (e.g. for pages describing events) and authorship information -- are directly managed through the normal authoring tools and then exposed as metatags in the pages themselves. These then feed our Solr-based search infrastructure which is kept automatically in sync with site content. Solr holds both this metadata and the textual content of pages, which is then available to search against using standard information retrieval techniques. Similar information is collected and exposed via search for individual files and images uploaded to the system, allowing search to extend across these media.
Serckit's search-building tools then provide a direct interface where SERC staff can define new search interfaces which can be scoped to particular areas of the site or based on any existing controlled vocabulary combination. There are a variety of different ways the search results can be exposed, and the resultant search interfaces can be found throughout Serckit, powering traditional site search (as on the SERC front page) as well as driving over 1000 different specialized search displays. Once defined, a given search interface can be dropped into any page in Serckit through the normal editing tools. These search interfaces can contain the usual full text search capability as well as exposed faceted search based on one or more controlled vocabularies. The system also supports the development of custom search interfaces that expose the search controls through visual interfaces, such as this tool from the CLEAN project that allows discovery of climate education materials based on key NGSS elements. The search infrastructure also participates in other systems such as our review tools and user management interfaces.
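A scoped, faceted search of the kind described above maps naturally onto Solr's standard request parameters. The following sketch builds such a request; q, fq, rows, facet and facet.field are standard Solr parameters, while the field names (module, vocab_subject, vocab_grade) and the function itself are assumptions made for illustration and do not reflect Serckit's actual schema.

```python
from urllib.parse import urlencode


def build_search_params(text, module=None, facet_fields=(), selected=None, rows=10):
    """Build a query string for a scoped, faceted full-text Solr search.

    Field names are hypothetical; the parameter names are standard Solr.
    """
    params = [("q", text or "*:*"), ("rows", str(rows))]
    if module:
        # Scope the search to one area of the site with a filter query.
        params.append(("fq", f'module:"{module}"'))
    for field_name, term in (selected or {}).items():
        # Each facet value the visitor has selected narrows the results.
        params.append(("fq", f'{field_name}:"{term}"'))
    if facet_fields:
        params.append(("facet", "true"))
        params.extend(("facet.field", name) for name in facet_fields)
    return urlencode(params)
```

The returned string would be appended to a Solr select handler URL; an empty text argument falls back to the match-all query so that a search interface can be browsed purely by facet.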
In addition to traditional web pages Serckit also supports the creation and management of digital library records, primarily in Dublin Core format. A set of OAI services allows the harvest and distribution of the metadata for both these digital library records and metadata records describing sets of Serckit web pages. This system is used to share collections with external partners (e.g. CLEAN).
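For readers unfamiliar with the format, the sketch below shows how a simple record might be serialized as unqualified Dublin Core in the oai_dc container commonly used by OAI harvesting services. The two namespace URIs are the standard ones; the function name and the dict-based record shape are assumptions, and a full OAI response would wrap this fragment in additional protocol elements not shown here.

```python
import xml.etree.ElementTree as ET

OAI_DC = "http://www.openarchives.org/OAI/2.0/oai_dc/"
DC = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("oai_dc", OAI_DC)
ET.register_namespace("dc", DC)


def to_oai_dc(record):
    """Serialize a dict of Dublin Core elements as an oai_dc XML fragment.

    Values may be a single string or a list, since Dublin Core
    elements are repeatable.
    """
    root = ET.Element(f"{{{OAI_DC}}}dc")
    for element, values in record.items():
        if isinstance(values, str):
            values = [values]
        for value in values:
            ET.SubElement(root, f"{{{DC}}}{element}").text = value
    return ET.tostring(root, encoding="unicode")
```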
Development Possibilities: Enabling effective discovery across the large collection of materials in Serckit is a core challenge. While the system and its exposure of vocabularies through faceted search is quite effective for the many small project collections, cross-project search is not yet at the level we would aspire to. There are a wide variety of standard optimization processes and modern features in the current version of Solr that offer a clear direction for improved search. The digital library tools, and especially the cataloging tools, have not been used significantly in recent years. It would be useful to rethink the nature and use of these tools as there is increasing interest in portal sites that allow discovery of materials hosted outside Serckit.
Cross-Project Navigation, Portals and Beyond
Serckit has a number of tools that support user discovery of materials that span multiple projects. Portals, such as the Teach the Earth site, provide a community with a small set of pages intentionally designed to provide a common entryway into resources from multiple projects. A core element is the provision of a single search interface across all the materials. This is straightforward to provide with our standard search tools given our single system model. In many cases it's useful to develop portal-specific controlled vocabularies that allow the materials to be organized and searchable along dimensions relevant to the community. This is well supported by Serckit's controlled vocabulary system, with strong use made of automated tagging via synonyms and indirect vocabularies. Complementary to this top-level search across the materials from multiple projects, the local, specialized search interfaces provided within individual projects often leverage the ability to expand their scope across all of Serckit. For example, the InTeGrate project wanted to provide visitors with a collection of sustainability-focused teaching activities. The search interface they provide to this collection not only includes materials developed by and collected from the community over the course of the InTeGrate project, but also sustainability-related activities from other Serckit-hosted projects. This was facilitated by the controlled vocabulary system, which allowed us to quickly identify activities across Serckit that had a sustainability focus and automatically tag them with the InTeGrate-specific topical controlled vocabulary.
Another system that promotes cross-project navigation is Serckit's automated recommender system. It displays a set of 'other pages you might like' in an Amazon-esque style in page footers. The recommendations are based on similarity to the text of the current page as well as associations based on controlled vocabularies such as other activities that use the same teaching method or are tagged with the same topic. The recommendations are intentionally tuned to point visitors to relevant pages that are distant (navigationally) from their current location, promoting leaps across project boundaries.
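The ranking idea described above, relevance combined with a preference for navigationally distant pages, can be sketched as follows. This is not the recommender's actual algorithm: Jaccard overlap stands in for whatever text similarity measure the system uses, navigational distance is approximated from URL paths, and the weighting formula and all names are assumptions.

```python
def jaccard(a, b):
    """Overlap between two sets as a fraction of their union (0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def nav_distance(url_a, url_b):
    """Count the path segments the two URLs do not share as a common prefix."""
    pa, pb = url_a.strip("/").split("/"), url_b.strip("/").split("/")
    shared = 0
    for x, y in zip(pa, pb):
        if x != y:
            break
        shared += 1
    return (len(pa) - shared) + (len(pb) - shared)


def recommend(current, candidates, k=3, distance_weight=0.1):
    """Rank candidates by relevance, nudged toward navigationally distant pages."""
    scored = []
    for page in candidates:
        if page["url"] == current["url"]:
            continue
        relevance = (jaccard(current["words"], page["words"])
                     + jaccard(current["terms"], page["terms"]))
        if relevance == 0:
            continue  # distance alone never justifies a recommendation
        boost = 1 + distance_weight * nav_distance(current["url"], page["url"])
        scored.append((relevance * boost, page["url"]))
    return [url for _, url in sorted(scored, reverse=True)[:k]]
```

With two equally relevant candidates, the one in a different project outranks the sibling page in the same directory, which is the cross-project leap the text describes.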
Development Possibilities: The effectiveness of the existing set of cross-project navigation elements has not been closely examined. These various elements could be instrumented and data collected to understand the degree to which they are used. This would lead naturally to a cycle of revision and the exploration of other navigation modalities based on actual use data.
Clones and Page Sets
Serckit generally implements a model where a given piece of content lives on a single page, at a single url, in a fixed place within the navigational hierarchy of a specific project. However, on several occasions we've extended the system beyond this direct model. Cloning is a feature in Serckit that replicates the content in a module across multiple modules, allowing the same content to appear within, and branded appropriately for, multiple projects. This was a critical enabler in the Pedagogies in Action project where we built custom portals, each with a subset of our core content, designed to integrate navigationally with external websites. Several projects have also needed the ability to present sets of pages in a given sequence, reusing the same page in different sets. This led to the development of the page set feature that serves this purpose. With both these tools the core content is not actually replicated; instead the system handles wrapping the same content in different chrome and navigation depending on the url it is reached from.
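The key property of both features, one stored copy of the content presented under several URLs with different chrome and navigation, can be shown in a few lines. The data structures, URL prefixes, chrome names and function names below are hypothetical and serve only to illustrate the idea.

```python
CONTENT = {"p42": "<p>Core content</p>"}  # a single stored copy per page

# URL prefix -> (chrome name, page ids served under that prefix); all hypothetical.
MOUNTS = {
    "/pedagogies/": ("pedagogies-chrome", ["p42"]),
    "/partner-portal/": ("partner-chrome", ["p42"]),
}


def resolve(url):
    """Map a request URL to (chrome, content) without duplicating the content."""
    for prefix, (chrome, page_ids) in MOUNTS.items():
        if url.startswith(prefix):
            page_id = url[len(prefix):]
            if page_id in page_ids:
                return chrome, CONTENT[page_id]
    raise KeyError(url)


def neighbors(page_set, page_id):
    """Previous and next page ids within an ordered page set (None at the ends)."""
    i = page_set.index(page_id)
    prev_id = page_set[i - 1] if i > 0 else None
    next_id = page_set[i + 1] if i + 1 < len(page_set) else None
    return prev_id, next_id
```

The same page id can sit at different positions in different page sets, so its previous and next links depend on the set through which the visitor arrived.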
Development Possibilities: We have been fairly conservative about using both these features as they significantly complicate Serckit's core logic and also open the possibility of significant user confusion when identical content is found in different contexts.
Serckit's tools for creating and managing the input from online forms are heavily used, with over 5000 forms in the system to date. A web-based point-and-click form authoring interface allows the creation of forms containing standard html form elements as well as some custom elements (e.g. a multi-select that auto-fills with education institutions, and tools for accepting online payment (via Stripe)). Forms are placed within pages via the standard authoring tools and an automatically generated web back-end allows management of the submissions. A mapping system allows SERC staff to set up automatic translations between content submitted through a form and standard Serckit pages. This powers a number of systems where users submit materials (e.g. a description of a teaching activity) via an online form that prompts them for the needed components and the result is automatically translated into a templated web page (which they can then edit further as desired). The form tools are used to support event application and registration, collection of information from the community (teaching materials, program descriptions) as well as to support assessment and evaluation efforts (surveys and feedback forms).
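The form-to-page mapping can be pictured as a simple field translation. In the sketch below the form field names, the mapping table, and the shape of the resulting page record are assumptions; only the ActivitySheet template name and the rule that submitters can edit but not publish their pages come from this report.

```python
# Hypothetical mapping from form field names to template field names.
ACTIVITY_MAPPING = {
    "activity_title": "title",
    "short_description": "description",
    "learning_goals": "goals",
}


def submission_to_page(submission, mapping, template="ActivitySheet"):
    """Translate one form submission into a draft (not yet public) templated page."""
    fields = {page_field: submission.get(form_field, "").strip()
              for form_field, page_field in mapping.items()}
    return {
        "template": template,
        "fields": fields,
        "public": False,                       # staff decide when the page goes live
        "editors": [submission["submitter"]],  # the submitter can keep editing
    }
```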
Student Data Collection
Largely built on top of the form tools are a number of mechanisms to specifically support collection of student assessment data for projects engaging in education research. The system can track courses that will be the source of student data and allows for anonymized tracking of responses from students that come through our online forms, are bulk entered via spreadsheet, or arrive via a system that allows scans of hand-written (or even drawn) responses to be cropped online to indicate a single student response. The responses can be automatically scored (e.g. for multiple-choice questions) or hand scored (via a system that allows sampling and delegation of scoring via a web interface). Results are aggregated across courses to generate project-wide CSV exports that are appropriate for offline analysis. The system includes tracking of IRB-required consent and interfaces for managing large-scale data collection.
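A stripped-down version of the automatic scoring and export path might look like the sketch below. It rests on several assumptions: a salted hash is used as a simple stand-in for whatever anonymization the real system performs, non-consenting students are excluded at export time, and the record fields and function names are invented for the example.

```python
import csv
import hashlib
import io


def anonymize(student_id, salt):
    """Stable pseudonymous key so responses can be linked without exposing identity."""
    return hashlib.sha256(f"{salt}:{student_id}".encode()).hexdigest()[:12]


def score_multiple_choice(answers, key):
    """Count the questions whose answer matches the key."""
    return sum(1 for question, correct in key.items() if answers.get(question) == correct)


def export_csv(responses, key, salt):
    """Aggregate responses from all courses into one CSV string."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["course", "student", "score"])
    for r in responses:
        if not r["consented"]:
            continue  # assumption: only consenting students appear in exports
        writer.writerow([r["course"], anonymize(r["student_id"], salt),
                         score_multiple_choice(r["answers"], key)])
    return out.getvalue()
```

Because the pseudonymous key is stable for a given salt, pre- and post-course responses from the same student can still be paired in offline analysis.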
Serckit also includes a focused set of data collection tools built to support a specific project (EvaluateUR). This system allows multiple institutions to collect evaluation data from students and faculty engaged in undergraduate research. It orchestrates a fixed, scripted set of steps in which participants are automatically prompted via email to respond to a series of surveys over the course of their research process. Institutional representatives have a dashboard where they can monitor progress and export a simple statistical analysis of the cumulative data. The initial grant-funded implementation will move to a fee-for-service model in 2020 and new grants are funding adaptation of this service to new contexts.
Development Possibilities: The main student data collection system requires significant programming effort to configure for new uses and its capabilities reflect the specific needs of the InTeGrate project that originally funded it. Modifying the system so that it is flexible and easy to adapt for a range of new projects would require significant re-writing. The EvaluateUR tools are narrowly-scoped by design and new work largely revolves around building tools to manage the fee-for-service activities, as well as whatever adaptations are driven by the new grants.
Communities and Participant Tracking
Serckit contains a number of systems designed to allow projects to track and manage the individuals who engage with their projects. Parallel to the system of user accounts and security groups that control access to materials, Serckit has a system for tracking profile information about individuals and the communities they belong to. The system automatically extracts and combines information people provide through the projects (e.g. when registering for an event, or submitting an activity) into individual profiles. These are then publicly viewable through profile pages that reflect all the ways an individual has contributed to and interacted with projects hosted by Serckit. Individuals can manage their profile information through the associated account. Projects can obtain aggregate demographic and participation information (including institution information drawn from the Carnegie Classification) which is then of high value for project evaluation. This system centralizes processes like collecting consent for participating in project evaluation. Sets of profiles can be aggregated into 'communities' which the system then uses to streamline communication and facilitate project management (e.g. taking attendance at recurring meetings, populating email lists). This community structure also interacts with (and in most cases is synonymous with) the security groups that are used to manage access to Serckit content and features.
Development Possibilities: The parallel notions of accounts and groups (for security) and profiles and communities (for understanding participation) have slowly been merging both in their use and the underlying code. Ideally these elements would be transparently merged which would greatly simplify the code base. Project access to demographic information is currently very limited (requiring assistance from SERC staff) both due to a lack of appropriate interfaces and the associated privacy concerns which make providing the most appropriate interface a challenge that will take careful consideration.
Serckit's suite of communication tools is central to the work of many of our projects. Email list services are provided by the mailman open source tool, with integration into the rest of Serckit that allows lists to be automatically synchronized with Serckit communities and most user interactions (subscription/unsubscription) to be managed directly through Serckit interfaces, rather than mailman's interfaces. For one-way broadcasts (i.e. newsletters) Serckit has a 'community broadcast' system where messages are authored as standard web pages and can then be automatically restructured to meet the needs of email delivery and bulk delivered to members of one or more communities. This system allows for tracking of delivery and is less likely to generate the false-positive spam classifications that affect traditional email list mechanisms.
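Keeping a list synchronized with a community reduces to comparing two sets of addresses. The sketch below computes the changes needed; the function name and the case-insensitive address comparison are assumptions, and applying the resulting changes to the list server is deliberately left out because it depends on the list software in use.

```python
def plan_list_sync(community_emails, list_subscribers):
    """Return the subscribe/unsubscribe actions that make the list match the community."""
    community = {e.strip().lower() for e in community_emails}
    subscribed = {e.strip().lower() for e in list_subscribers}
    return {
        "subscribe": sorted(community - subscribed),
        "unsubscribe": sorted(subscribed - community),
    }
```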
Serckit also has a message board feature. Individual conversation threads can be activated at the bottom of specific pages (e.g. to facilitate commenting on page content) and full discussion boards can be created and embedded through the standard web interface. Individuals can subscribe to email notifications about new activity in discussions of interest.
Development Possibilities: Serckit currently integrates with the previous generation of mailman (2.4). The 3.0 version of mailman is a re-architected ground-up rewrite and will likely require a complete overhaul of how Serckit integrates with mailman. Given how little of mailman's functionality is currently in use (bounce processing and membership management are all handled outside mailman) and the existing tools that support email delivery for community broadcasts, it may be more efficient to simply replicate the core email list functionality within Serckit rather than adopt and integrate with mailman 3.0. The community broadcast system has, to date, only seen significant use by SERC staff, and interface refinements may be required to meet the needs of the broader audience.
Serckit includes a framework for developing online review systems to support formal peer review of teaching material. Configuration of a new review system involves defining the exact workflow of the review process (assign item to a reviewer, reviewer completes review form, etc.) and development of review instruments (implemented through Serckit's form-building tools). The systems provide both a management interface that gives an overall view of the status of all items under review (leveraging Serckit's search tools) and a simplified view that shows reviewers the items assigned to them with direct access to the relevant review forms. Review outcomes are tracked by the system and can flow automatically into the Serckit pages being reviewed or the catalog records representing external sites under review. Serckit pages can automatically display the results of the reviews they have undergone, and search returns are automatically adjusted to favor materials that have been favorably reviewed. This system has been used to build supporting interfaces for seven different materials review processes with over 9000 individual teaching resources under review. These project-specific review systems range from the relatively simple to complicated multi-stage, multi-reviewer processes used to sort, vet and rate thousands of resources.
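Defining a review workflow is essentially defining a small state machine. The sketch below shows one hypothetical process; the state names, the transition table, and the class and function names are illustrative and do not correspond to any particular Serckit review system.

```python
# Allowed transitions for one hypothetical review process.
WORKFLOW = {
    "submitted": {"assigned"},
    "assigned": {"reviewed"},
    "reviewed": {"accepted", "revision_requested"},
    "revision_requested": {"assigned"},
    "accepted": set(),
}


class ReviewItem:
    """One resource moving through the review workflow."""

    def __init__(self, resource_id):
        self.resource_id = resource_id
        self.state = "submitted"
        self.history = ["submitted"]

    def advance(self, new_state):
        """Move to new_state only if the workflow allows it."""
        if new_state not in WORKFLOW[self.state]:
            raise ValueError(f"cannot go from {self.state} to {new_state}")
        self.state = new_state
        self.history.append(new_state)


def status_summary(items):
    """Counts per state, as a management overview might show."""
    summary = {}
    for item in items:
        summary[item.state] = summary.get(item.state, 0) + 1
    return summary
```

Expressing the workflow as data rather than code is one route toward configuring new review systems without programmer involvement, at the cost of supporting only workflows that fit the table format.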
Development Possibilities: Configuring a new review system is currently a programmer-intensive task, reflecting the significant flexibility in the tools. Now that we have working examples of several review systems it's likely we can identify a core set of functionality that will be of highest interest to future projects. Focusing on this core functionality we could build interfaces that would allow more rapid deployment of new review systems within a more limited envelope of functionality.
Development Possibilities: There is a clear need to better expose the (anonymized) use data to projects in ways that will help their project management and evaluation. While the potential utility of data mining the integrated profile and analytics data is high, it is less clear how to navigate the related privacy implications and data processing needs. Supporting ad-hoc queries of the full dataset would require both a different database approach (likely the use of a highly-denormalized columnar database) and the construction of a complex new interface. Alternatively it may be that we can identify a small set of standardized, high-value analyses that can be presented to projects without exposing the full complexity of the data or opening the door to difficult privacy issues.
NAGT Membership Management
In 2014 SERC took over membership management services for the National Association of Geoscience Teachers. To support this, significant new functionality was added to Serckit to allow for membership management. This includes web-based interfaces for subscription, account management, donation handling and associated communication (e.g. automated email lists based on membership status and role within NAGT). The SERC staff who handle the day-to-day operations of NAGT use this system heavily as they interact with and recruit members.
Development Possibilities: SERC will continue to manage NAGT membership for the foreseeable future and so this system must be maintained and kept up-to-date with NAGT needs. In theory the system could be adapted to offer member services for other similar organizations. However, experience to date with the complexity of matching this system to NAGT needs and the broad availability of commodity member management systems make that path less obviously fruitful.
Ideas for Future Development Directions
The many potential future directions for the evolution of Serckit can be aligned with key themes of the 2019 GEI workshop.
Strengthening Existing Cyberinfrastructure
The stability and long-term availability of the content hosted within Serckit is of key importance to projects and individual users. Projects value the reliability of Serckit while they are actively using it and especially value that content within Serckit continues to be available even after their project has ended. This is critical both because of the (otherwise very difficult to address) need to speak to sustainability as part of their initial grant proposals and because the overall impact of many projects is only actually realized after the grant period is over. Serckit currently hosts many teaching materials whose use in classrooms post-funding is larger than it was during their period of active funding. These materials would not be having the same impact if their wide availability came to an end when the project reached the end of funding. To fulfill this long-term-availability commitment to the community it's important that Serckit be managed, operated and evolved in ways that maximize its long-term stability. There are 4 clear ways in which this can be advanced:
- Serckit should include processes and tools that support the long-term archival tracking, maintenance and updating of existing content. This could include systems to facilitate and track the execution of regular content review over older content. These systems would allow the easy identification of older materials in need of review and track the result of review for all content across the system. These systems could expose currency information to visitors ("this page was developed in 2012, and was last reviewed in December 2016") and support appropriate weeding of the collection. Since many of the resources within Serckit were submitted by individuals (e.g. as part of a workshop) there is also potential in developing a system that automates contacting the original contributors and engaging them in updating the materials they submitted in the past.
- Update and systematize server operations. The long-term stability of Serckit is predicated on the robustness of the underlying server infrastructure. As noted in the Operations section there are a number of steps that can be taken to ensure Serckit's server infrastructure is on the strongest footing possible.
- Ensure Serckit security practices are robust. Although Serckit has not suffered a security incident to date, it's clear that the possibility of security incidents, whether leaks of sensitive data or even cyberblackmail, poses a potentially existential threat to any IT system. So engaging in a review to ensure Serckit's development, configuration and operations follow all current best practices is a clear priority.
- Another unrealized but potential risk to Serckit's long-term stability is over-reliance on the specialized institutional knowledge of key staff. Serckit functionality for end-users and SERC staff has been systematically documented to remove reliance on the expertise of any one individual in how Serckit is best used. But the documentation around how Serckit is configured and operated, and its architecture for future development, is less thoroughly recorded. That knowledge is mostly held by the individuals who have developed and operated the system. There is a clear need to develop systems of written documentation that would allow the smooth hand-off of Serckit operations and development in ways that don't depend on knowledge held by specific individuals.
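The content review tracking proposed in the first item of this list could start from something as small as the sketch below. It is a proposal-level illustration only: the page record fields, the three-year default threshold, and both function names are assumptions, since no such system currently exists in Serckit.

```python
from datetime import date


def needs_review(pages, today, max_age_days=3 * 365):
    """Pages whose last review (or creation, if never reviewed) is too old, oldest first."""
    def last_checked(page):
        return page.get("last_reviewed") or page["created"]

    stale = [p for p in pages if (today - last_checked(p)).days > max_age_days]
    return sorted(stale, key=last_checked)


def currency_note(page):
    """Visitor-facing statement of when a page was developed and last reviewed."""
    note = f"This page was developed in {page['created'].year}"
    if page.get("last_reviewed"):
        note += f", and was last reviewed in {page['last_reviewed']:%B %Y}"
    return note + "."
```

The first function supports the staff-facing work queue, and the second produces the kind of currency statement quoted in the list above.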
Bringing Serckit into Wider Use
While Serckit is widely used as a place to find resources, it still does not serve the full spectrum of geoscience education users that might benefit from it: both end users looking for resources and projects looking for a platform.
We hear repeatedly from users that while they appreciate the breadth of materials hosted in Serckit that same breadth presents a real challenge. It can be difficult to find the best resource for a current need given the size of the collection. Serckit includes a range of features that address this challenge (search interfaces, recommender systems and multiple mechanisms to curate specialized collections) but there is significant room for improvement. The effectiveness of the existing cross-project discovery mechanisms has not been deeply assessed. A discovery improvement cycle driven by ongoing measurement of user discovery behavior, and the effectiveness of Serckit's discovery affordances in responding to that use, could inform iterative refinement of the existing discovery tools and the development of new discovery functionality.
Other barriers to broad use include potential accessibility issues and challenges with use by users on mobile devices. The accessibility situation and potential strategies to address it are mentioned above. The universal design approaches that are the hallmark of sites that fully meet accessibility guidelines would certainly strengthen Serckit's potential to serve all users well. Similarly, we have seen a steady rise in the number of visitors who reach Serckit via a mobile device: currently 30% of all visitors. Serckit page templates, base style sheets and HTML rendering were largely reworked in 2012 to use a responsive design approach. This, along with mobile-friendly navigation menu behavior, sets a reasonable baseline for use of Serckit sites on mobile devices. However, there has not been a systematic review to ensure full functionality of all Serckit features on mobile devices. Doing so would be an important step to fully engaging this growing set of users.
A key barrier to use of Serckit by projects is cost. A significant element of that cost, especially for small projects, is SERC staff time needed to configure and manage project spaces within Serckit. This presents an opportunity for improvement as Serckit's back-end administrative interfaces could be dramatically streamlined by automating many routine tasks that SERC staff currently perform manually within Serckit. From the initial configuration of new project webspace to the monitoring of copyright and accessibility issues across a project, Serckit administration interfaces have room for optimization if that were made a development priority. The result would be more efficient use of SERC staff time and a concomitant reduction of the costs for individual projects.
Additions That Would Enhance Use
There are a myriad of ways that Serckit functionality could be enhanced to more directly and effectively fulfill the needs of the projects it hosts and allow them to be more impactful on the community.
One possibility is an online project dashboard providing centralized information and tools for a given project in one place for use by project leaders. This would include a range of features: from a view of the participants active across different elements of a project, to analytics data about the use of the project website, to centralized access to project management interfaces. While most of these features are present to some degree in the existing system they generally require support from SERC staff to navigate and use effectively. Pulling them together into a centralized, easy-to-use location will empower project leaders to be more self-sufficient, free up SERC staff to focus on tasks that require higher-level expertise, and expose project leaders to existing features they may not otherwise be aware of. This new dashboard would be a key enabler in allowing the wide range of Serckit features, developed over the last decade to support a wide range of individual project needs, to be fully exploited by future projects.
One specific focus that could play out in the dashboard feature, but extends beyond it, is providing more actionable data to individual projects that will support their formative and summative project evaluation efforts. Serckit currently collects a rich set of raw data for active projects. This includes website use data and other participant data (e.g. workshop registrations, contact information collected through materials contribution forms) that when combined can be used to paint a useful picture of the ongoing progress and impact of a project. Currently that data aggregation and analysis process is a complicated and labor-intensive exercise that only the largest projects can afford as part of a significant project evaluation element. While we certainly can't automate a true project evaluation, there are a number of standard data aggregations and summaries that would make sense across a wide range of SERC hosted projects. These projects often have similar evaluation needs around understanding how their materials have been adopted. New Serckit functionality could be developed to generate a number of standardized reports and to streamline the use of some data collection approaches that have proven effective in larger projects. This would provide smaller projects with a stronger set of baseline insights into their project effectiveness.
Another direction in which Serckit could better serve projects and other contributors is to support more formal publication mechanisms. An easy step in this direction would be to develop support for assigning DOIs to materials published through Serckit. This would involve coordination as a member of CrossRef and development of tools and databases to support assigning DOIs and ensuring they always resolve to the correct location for Serckit materials. Individual contributors would also benefit from having easy access to data and reports that characterize their contributions within Serckit, and the degree to which the community is using those contributions, in formats that align with faculty needs for tenure and promotion. These could conceivably be automatically generated from existing authorship metadata and analytics.
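The bookkeeping side of DOI support can be illustrated with a minimal sketch: a local registry that assigns each resource one stable DOI and tracks the current URL that DOI should resolve to, so that a page can move without its DOI changing. The class, the method names, the `serckit.<id>` suffix scheme and the placeholder prefix are all assumptions made for this example. Registering DOIs and updated URLs with CrossRef itself is outside the scope of the sketch.

```python
from typing import Dict


class DoiRegistry:
    """Hypothetical local table linking resources, DOIs and current URLs."""

    def __init__(self, prefix: str) -> None:
        # The prefix would be issued by the registration agency;
        # callers supply a placeholder here.
        self.prefix = prefix
        self._doi_by_resource: Dict[int, str] = {}
        self._url_by_doi: Dict[str, str] = {}

    def assign(self, resource_id: int, url: str) -> str:
        """Assign a DOI to a resource, or return the one it already has."""
        if resource_id in self._doi_by_resource:
            return self._doi_by_resource[resource_id]
        # Assumed suffix scheme: derived from the internal resource id,
        # so it never depends on the page's URL.
        doi = f"{self.prefix}/serckit.{resource_id}"
        self._doi_by_resource[resource_id] = doi
        self._url_by_doi[doi] = url
        return doi

    def move(self, resource_id: int, new_url: str) -> None:
        """Record a new location for a resource; its DOI is unchanged."""
        doi = self._doi_by_resource[resource_id]
        self._url_by_doi[doi] = new_url

    def resolve(self, doi: str) -> str:
        """Return the current URL for a DOI."""
        return self._url_by_doi[doi]


# Usage with a placeholder prefix and example.org URLs.
registry = DoiRegistry("10.5555")
doi = registry.assign(42, "https://example.org/activities/42.html")
registry.move(42, "https://example.org/collection/42.html")
current_url = registry.resolve(doi)
```

Keying the DOI suffix to an internal identifier rather than to the URL is the design choice that lets the DOI outlive site reorganizations.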
More broadly, there are a number of key areas where geoscience education technology is advancing, such as student use of authentic data and the multi-directional growth of visualization tools like 3D modeling and virtual reality. Serckit's role in supporting the use of these new tools remains an open question.
Improvements Specifically Targeted at Diversity, Equity, Inclusion and Geoscience Education Research
The use of Serckit as a tool to support Geoscience Education Research (GER) is steadily expanding. The existing functionality is described above, with the form system used extensively to collect data and the formal student data collection system in use with several projects. As noted above there are clear areas where these tools could be revised to make them of higher utility. There is also ongoing exploration of Serckit as a long-term archive for project and community research data. In many cases Serckit is already a de facto community archive as it holds the data collected via existing tools for multiple projects. It will be important to align any Serckit development in this direction with best practices from the research data archiving community such as the guidelines from DataOne as well as the literature and best practices of the evaluation and social science data community.
There is an open question about what other tools and services the GER community needs and what role Serckit might play in delivering them. The EvaluateUR project, where Serckit has been used to develop a single-purpose data collection and tracking system offered to institutions as a turn-key, for-pay system, suggests one possible direction.