Tuesday, April 15 – Tutorials and Community Meetings |
7:30—17:00 | registration desk open |
7:30—8:30 | continental breakfast Main Hall |
8:30—10:00 |
Led by: Steve Tuecke | tutorial materials
We will provide an overview of Globus Online and the Globus Toolkit, with demonstrations of key functions such as file transfer, dataset sharing, and Globus Connect Personal endpoint setup.
|
10:00—10:30 |
beverage break TCS Foyer |
10:30—12:00 |
Led by: Steve Tuecke, Raj Kettimuthu | tutorial materials
We will walk system and network administrators through the process of using Globus Connect Server to easily configure a campus computing cluster, lab server, or other shared resource for file transfer and sharing.
|
12:00—13:00 |
lunch Main Hall |
13:00—14:30 |
Led by: Steve Tuecke, Raj Kettimuthu | tutorial materials
We will dig deeper into setting up a campus data service with Globus Connect Server, including integration with campus identity systems and filesystem configuration.
|
14:30—15:00 |
beverage break TCS Foyer |
15:00—17:00 |
Led by: Dina Sulakhe, Alex Rodriguez | tutorial materials
We will introduce the Globus Genomics service and demonstrate how it can be used to scale analysis in a typical genomics/bioinformatics laboratory.
|
15:00—17:00 |
Led by: Steve Tuecke, Rachana Ananthakrishnan
We invite campus computing administrators (and all Globus users) to an open roundtable discussion. We would like to gather feedback on Globus usage and solicit input on new feature requests. This will be a unique opportunity to meet and engage with many Globus team members in an informal setting. Bring your toughest questions!
If you're planning to attend, you may want to browse our new feature request forum and select features that are of particular interest to consider in the discussion. We also have a working document for this session.
|
17:00—18:00 |
reception TBD |
|
Wednesday, April 16 |
7:30—17:00 | registration desk open |
8:00—9:00 | continental breakfast Main Hall |
9:00—10:30 |
Ian Foster, Globus co-founder
| slides
| video
Globus is an increasingly critical component of research cyberinfrastructure on campuses large and small. Ian Foster will review notable deployments and provide an update on product directions.
|
10:30—11:00 |
beverage break Foyer |
11:00—12:30 |
- Stories from the Campus Front
Main Hall
-
Daniel Milroy, System Administrator, Research Computing, University of Colorado - Boulder
| slides
Data sharing has become a major challenge for most researchers, since most research collaborations are multi-institutional. To address the challenges that researchers face in large-scale data storage, data management, and data sharing, the University of Colorado Boulder’s research computing group has integrated three different cyberinfrastructure (CI) components:
- The PetaLibrary, a storage service providing high-performance short-term storage as well as long-term archival storage via data migration to a tape tier.
- The CU-Boulder science network, a high-performance ScienceDMZ optimized for large scale data transfers.
- Globus data transfer and sharing for fast access to the data as well as global data sharing.
The PetaLibrary can be accessed across an NSF CC-NIE funded ScienceDMZ with a core network bandwidth of 80Gb/s. In addition to traditional data transfer utilities, the PetaLibrary employs Globus Connect Server for fast, fire-and-forget transfers and for sharing. The storage clusters export their filesystems via Clustered NFS to Data Transfer Nodes, which are aggregated into a Globus Connect Server cluster. Each node has redundant connectivity to the ScienceDMZ: active/active bonded 10Gb NICs in one case, and active/failover 40Gb/dual 10Gb NICs in the other. These hosts enable CU Boulder's researchers to quickly transfer terabytes of data to Personal and Server endpoints and to collaborate with researchers at partner institutions via Globus Sharing. Monitoring is accomplished with a conventional suite of Nagios, Ganglia, and Bro IDS, with access to Globus usage statistics and error reports through Flight Control. In this talk we will present the overall design of this integrated CI for data sharing, as well as several use cases together with networking and Globus usage statistics.
-
David Swanson, Director, Holland Computing Center, University of Nebraska-Lincoln
| slides
The Holland Computing Center (HCC) at the University of Nebraska (NU) provides high performance and high throughput computing resources to researchers from multiple campuses in three cities. Managed resources are located in multiple facilities. This leads to the need for a manageable, effective and convenient mechanism to move data between several centrally managed clusters, as well as collaborative resources distributed both within and beyond the NU system. Recently HCC has implemented several Globus endpoints to fill this need. Configuration options consistent with current data management practices at HCC and remaining open issues will be presented.
-
Michael Padula, Systems Consultant, Cornell CAC
| slides
Using Globus, staff at the Cornell Center for Advanced Computing (CAC) enabled customized, automated, and secure backups of research data generated by the Human Neuroscience Institute (HNI). We combined several tools and services: Globus Connect Personal to enable data access; Globus Connect Server to enable data backups; the Globus command line interface (CLI) to initiate transfers and manage endpoints; and CAC’s archival storage service to store the data. Bash scripting and cron were used to schedule the backups. This automated approach met the Institute’s reliability and security requirements and delivered cost savings by providing a relatively inexpensive second copy of its data. This 15-minute presentation will describe how CAC and Globus staff worked together to overcome the challenge of gaining sufficient access to the necessary data on the HNI file server using Globus Connect Personal. It will also describe how CAC staff manage endpoints with minimal human interaction, including starting and stopping Globus Connect Personal and activating and auto-activating the endpoint on the file server. Finally, we will describe the challenges of engineering a sustainable solution and how we mitigated them by using dedicated credentials, handled securely, for the Globus service and CAC systems.
-
Pamela Gillman, NCAR
| slides
NCAR's Globally Accessible Data Environment, GLADE, provides a unified and consistent data environment for NCAR's HPC resources. Data is globally available to computational, data analysis, and visualization clusters, along with science gateways, data service gateways, and common data transfer interfaces, while high-bandwidth networks provide the underlying infrastructure. GLADE helps streamline the scientific workflow by providing an information-centric workflow environment, and its data transfer services provide data management solutions.
|
12:30—12:35 |
a word from our sponsors Main Hall |
12:35—14:00 |
lunch Main Hall |
14:00—15:30 |
- Globus Experiences
Main Hall
-
Krishna Muriki, User Services Consultant, Lawrence Berkeley National Laboratory
Karen Fernsler, Systems Engineer, Lawrence Berkeley National Laboratory
| slides
Big Science at Berkeley Lab is generating Big Data. Experimental facility upgrades and high-performance instruments are allowing Lawrence Berkeley National Laboratory scientists to perform measurements and analyses that were previously not possible. As a result, the scientific data created by these experiments is growing in both volume and rate, so that it is now less feasible to process it locally, and we are constructing data pipelines to transport and process the data on remote resources. In many cases, scientists use convenient but slow and inefficient methods for moving their data; tools such as GridFTP and services such as Globus provide better performance and capabilities that handle large datasets with ease. This presentation highlights some of our projects, including the Sloan Digital Sky Survey III and the X-ray Diffraction and Microtomography Beamlines at the Advanced Light Source user facility, which benefit from the implementation of Globus endpoints.
We will talk about how we implemented a number of endpoints to support the different access mechanisms needed by project users, such as restricted data access, authenticated and anonymous data access, and read-only access. We will also discuss how we chose to implement a Science Gateway for our Molecular Foundry nanoscience center using the Nice and Easy Web Toolkit (NEWT) developed by the National Energy Research Scientific Computing (NERSC) center. NEWT relies on the Globus Gatekeeper (GRAM) service to implement the job and queue resources that are made available to web portal developers. Maintaining the Globus Gatekeeper service can be a challenge for resource providers, so the HPCS implementation of NEWT avoids the Globus Gatekeeper service and instead relies on GSI-enabled OpenSSH and the GridFTP service. In the second part of this presentation we will explain this NEWT implementation without the GRAM service.
-
Rob Gardner, Senior Fellow, Computation Institute, University of Chicago
| slides
We describe an approach for providing “connective” services to campus IT and research computing organizations that facilitate resource access and sharing for research groups that may collectively utilize campus-based condo clusters, national-scale HPC resources such as XSEDE, cloud-based resources, and distributed high throughput computing (HTC) facilities such as the Open Science Grid (OSG). The approach suggested is to provide a value-added service to a local CI team’s existing service catalog, leveraging the Globus Platform following a few simple principles:
- Leverage the reliable file transfer and data sharing services from Globus which also provides linkage to institutional identity management services (via CILogon and InCommon) as well as group and metadata functions, in a time-sustainable fashion.
- Integrate Globus transfer services with compute sharing capabilities from CI Connect which easily bridges local and shared remote resources into a single user environment via HTCondor and U-Bolt technologies.
- Leverage proven, distributed high-throughput technologies of the OSG that deliver large-scale science computation over shared, federated computing resources in the US.
- Provide wide area computational capabilities by hosting services in a Science DMZ with 100 Gbps network capacity; actively monitor such environments with perfSONAR-PS tools and dashboards.
- Provide these capabilities using platform-as-a-service and software-as-a-service models, which reduce demand for on-site expertise and operational maintenance while taking advantage of rapid developments in the community that are incorporated into the services.
We will demonstrate three deployments already in production: one that provides a "retail" compute and data service for HTC computation on the Open Science Grid; one that extends a campus computing center to the OSG and provides a bridging connector to a second campus through a collaborative sharing agreement; and finally a service that provides a project-based compute and data service gateway for an international-scale project involving 44 U.S. institutions, providing connectivity to bridged campus clusters and grids, dedicated project clusters, and XSEDE computational resources.
-
William Mihalo, Northwestern University
| slides
Approximately 100 terabytes of data were uploaded to our cluster storage system at Northwestern University using Globus. The transfers took place over the course of 4 months from external USB disk drives connected to a Mac workstation, over a 1 Gb/s network connection. Globus Connect Personal was used on the Mac workstation, and Globus Connect Server was run on a dedicated server directly attached to our cluster storage system. This presentation will summarize the results of the transfer and some factors that influenced the transfer speed.
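The abstract gives enough figures for a rough throughput check. The sketch below is a back-of-the-envelope calculation, not part of the talk: it assumes decimal units (1 TB = 10^12 bytes), treats "4 months" as 120 days, and reads the network connection as a nominal 1 Gb/s link. The helper function names are illustrative only.

```python
# Back-of-the-envelope check of the Northwestern transfer figures.
# Assumptions (not stated in the abstract): decimal units (1 TB = 10**12 bytes),
# "4 months" taken as 120 days, and the connection read as a 1 Gb/s link.

SECONDS_PER_DAY = 24 * 3600


def average_rate_mbps(terabytes: float, days: float) -> float:
    """Average end-to-end rate in megabits per second over the whole period."""
    bits = terabytes * 10**12 * 8
    seconds = days * SECONDS_PER_DAY
    return bits / seconds / 10**6


def link_utilization(rate_mbps: float, link_gbps: float) -> float:
    """Fraction of the nominal link capacity used on average."""
    return rate_mbps / (link_gbps * 1000)


def days_at_line_rate(terabytes: float, link_gbps: float) -> float:
    """Days the same volume would take if the link ran at its full nominal rate."""
    bits = terabytes * 10**12 * 8
    return bits / (link_gbps * 10**9) / SECONDS_PER_DAY


rate = average_rate_mbps(100, 120)
print(f"average rate: {rate:.1f} Mb/s")                          # average rate: 77.2 Mb/s
print(f"link utilization: {link_utilization(rate, 1.0):.1%}")    # link utilization: 7.7%
print(f"days at full line rate: {days_at_line_rate(100, 1.0):.1f}")  # days at full line rate: 9.3
```

Because the 120-day average includes any periods when no transfer was running, it is a lower bound on the rate achieved while transfers were active; the gap between roughly 77 Mb/s and the nominal link rate is the kind of difference the factors discussed in the talk would account for.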
-
Matthias Hofmann, Robotics Research Institute, TU Dortmund University, Germany
| slides
The European Globus Community Forum (EGCF) is the organizational body of the Globus community in Europe. Its members are users, administrators, and developers who are applying the Globus Toolkit as their middleware or are interested in doing so. It informs the community about European and international developments in the wider context of Globus software, arranges and coordinates meetings and workshops for users, developers, and administrators, and connects projects and communities within Europe and beyond for exchanging knowledge and sharing experience with Globus software. We will provide a brief overview of the current state of Globus communities in Europe and of past efforts; the software and services maintained by EGCF; and European Globus events. We will also present ideas on how Globus and EGCF can be addressed in the new EU funding program "Horizon 2020" and discuss EU-specific requirements and problems (big data and data privacy).
|
15:30—15:35 |
a word from our sponsors Main Hall |
15:35—16:00 |
beverage break TCS Foyer |
16:00—17:30 |
- Globus as National Cyberinfrastructure
Main Hall
-
Barbara Hallock, NET+ and Campus Bridging Systems Analyst, Indiana University
| slides
Campus Bridging refers to efforts to make cyberinfrastructure appear to researchers as proximal and easy to use as peripherals for a laptop. Campus Bridging strategies include technologies to improve data management, job submission, and sharing of resources, and to lower knowledge barriers to using national cyberinfrastructure. Globus is a vital component of Campus Bridging efforts in the XSEDE project and elsewhere. This presentation details the role Globus plays in campus bridging, highlights the capabilities Globus can provide to campus, regional, and national collaborations, and identifies some of the challenges to Campus Bridging strategies and open questions about Campus Bridging initiatives.
-
Kieron Mottley, Business Manager, New Zealand eScience Infrastructure
| slides
The New Zealand eScience Infrastructure (NeSI) provides state-of-the-art High Performance Computing (HPC) systems, facilities and infrastructure, international linkages, and expertise to researchers in New Zealand (NZ) across the public research sector and in private industry. As part of our long-term strategic planning, NeSI aims to develop a national high-bandwidth data transfer service for moving and synchronizing large amounts of research data across NZ institutional boundaries. There is a need to invest in and establish such a national capability to give NZ researchers the ability to transfer large datasets between their home institutions and compute resources such as NeSI’s HPC facilities. There is also a need to share this data with colleagues in NZ, thus enabling NZ science to participate in a much broader network of national and international collaborations. This will provide NZ researchers with the ability to use national structures or systems to support the development and/or scaling up of local institutional projects to an international level.
To meet these needs, NeSI is developing a national high-bandwidth data transfer service roadmap in collaboration with Globus. This roadmap will allow NeSI to build on data transfer capabilities originally started under another NZ project, BeSTGRID. NeSI must overcome a series of challenges to meet this goal. These challenges include (i) navigating a multi-institutional collaboration framework to establish and maintain the service with partner research institutions and other service providers while obtaining buy-in from potential institutional customers, and (ii) organizational issues with establishing and operating a shared infrastructure, both overall for the business and specific to this transfer service. A national service that provides high-bandwidth data transfers makes good economic sense for local institutions because it facilitates researchers' use of shared resources at other institutions. A common approach to accessing and moving large quantities of data that is simple and intuitive to use will help build demand for these shared resources.
-
Niall Gaffney, Texas Advanced Computing Center, University of Texas - Austin
Dan Stanzione, Acting Director, Texas Advanced Computing Center, University of Texas - Austin
| slides
The Wrangler supercomputer will be the first of a new class of systems designed for data-intensive supercomputing. Featuring an innovative embedded analytics system, Wrangler is projected to deliver I/O bandwidth of up to 1 terabyte per second and an unprecedented transaction rate of up to 200M+ IOPS. The high transaction rate will make Wrangler uniquely suited for emerging classes of Big Data applications. In addition to its analytics capability, Wrangler will feature a large-scale (10PB+) online storage facility geographically replicated between TACC and Indiana University. Wrangler will feature Globus, which will facilitate high-speed transfer to and from other XSEDE systems as well as other campuses and institutions, and will host a number of new Globus services. In addition to Globus, Wrangler will accept data on physical media via its “data docking” service. The project will enable fast, simple data transfers and ingestion; offer flexible software management tools for diverse datasets; provide powerful analysis tools that make full use of the hardware for exploring large-scale data; and provide expertise in transferring, hosting, and analyzing data.
This talk will provide an overview of the Wrangler system and project, and discuss the ways Globus will enhance the user experience with Wrangler. Wrangler will be deployed at the end of 2014 with support from the US National Science Foundation, in a partnership between the Texas Advanced Computing Center, Indiana University, the Globus team at the University of Chicago, Dell, and DSSD.
-
Frédérick Lefebvre, Scientific Computing Analyst, Calcul Québec
| slides
This talk will present the scope of Compute Canada's digital storage infrastructure and how Globus has been deployed by participating sites to facilitate data transfers over a high-speed national network. Ongoing work to unify data transfer services, and the associated challenges, will also be discussed.
Compute Canada is leading the creation of a powerful national HPC platform for research. This national platform integrates High Performance Computing (HPC) resources at six partner consortia across the country to create a dynamic computational resource. Compute Canada integrates high-performance computers, data resources and tools, and academic research facilities around the country.
|
17:45 | TCS Foyer depart for dinner event |
19:00—21:30 | Dinner Event Museum of Science and Industry
|
Thursday, April 17 |
7:30—12:00 | registration desk open |
7:30—8:30 |
continental breakfast Main Hall |
8:30—9:30 |
-
Nancy Cox, Professor and Section Chief, Section of Genetic Medicine, Department of Medicine, University of Chicago
| slides
The pace of data generation through sequencing is astonishing, as are the opportunities for using these data to enhance discovery and push translation. Comparative genomics focused on "the" sequence can, however, be more tolerant of the errors that propagate throughout the generation of sequence data than science that is directly focused on characterizing the consequences of genome variation for human health. We illustrate the value of consensus variant calling by applying a software package that implements several of the most commonly used variant calling packages to sequence data from a range of projects. Particularly in applications to data in which members of families have been sequenced, the enhanced quality of the consensus calls is notable and easy to document.
|
9:30—10:00 |
-
Vasily Trubetskoy, Data Analyst, Department of Human Genetics, University of Chicago
| slides
We will demonstrate the key capabilities of Globus Genomics, drawing on use cases from recent deployments in genomics cores and analysis labs to illustrate the flexibility and scalability of the service.
|
10:00—10:30 |
beverage break Foyer |
10:30—12:00 |
- Globus Genomics Experiences
Main Hall
-
Yuriy Gusev, Senior Bioinformatics Scientist, Innovation Center for Biomedical Informatics, Georgetown University
| slides
Processing and analyzing large volumes of genomic sequencing datasets introduces great challenges, including secure, fast, and reliable data transfers, the availability of scalable computational resources for analysis, and the definition of reproducible and reusable data analysis workflows. Over the past year we have established a collaboration between the Globus Genomics team and the Innovation Center for Biomedical Informatics (ICBI) at Georgetown. As a result of this successful academic collaboration, the service has been prototyped for ICBI to enhance translational research at Georgetown University Medical Center. We have created exome and whole genome analytical workflows and an RNA-seq workflow, and have demonstrated the processing and analysis of raw NGS data at scale for multiple samples.
We set up Globus endpoints at ICBI’s local storage and S3-based storage to provide easy access to the sequence data and to seamlessly transfer the input data into Globus Genomics for analysis. The analysis of whole exome paired-end sequences of size 6.2 GB took ~30 minutes to complete using Globus Genomics. The analysis of a 70 GB whole genome (paired-end) took ~12 hours. The performance of the RNA-seq workflow has been optimized to 5 hours per sample. These times include automated input data transfers from S3 into the Globus Genomics platform and transfer of the results to backup storage on S3. During this preliminary prototype work, we achieved a 20X performance improvement by allowing the simultaneous analysis of 20 samples in parallel. The Globus Genomics platform offers a powerful and efficient tool for the transfer and analysis of next-gen data for clinical and research purposes, especially in cancer research, where whole genome sequencing is quickly becoming a routine clinical practice.
-
Toshio Yoshimatsu, Department of Medicine, Section of Hematology/Oncology, University of Chicago
| slides
The advent of massively parallel sequencing has enabled us to detect causative gene mutations, such as single nucleotide variants (SNVs) and insertions and deletions (Indels), at the whole genome level. The challenge, however, is minimizing the false positive and false negative detection rates when analyzing the enormous volume of genetic variants in the genome data. Another challenge is the handling of big data, where transferring and computing are severe bottlenecks. Here we demonstrate a method that utilizes Globus Online and Globus Genomics to improve computational time as well as the sensitivity and specificity of mutation calls.
-
Rama Raghavan, Bioinformatics Specialist, University of Kansas Medical Center
| slides
Next-generation sequencing (NGS) technologies have forever changed the landscape of biomedical research. However, the speed at which data can be generated has outpaced many institutions' abilities to process and store these data in a cost-effective manner. In particular, many Bioinformatics Cores have spent enormous amounts of time and immense computational power developing the infrastructure needed to accommodate NGS studies. With the development of Globus Genomics, the Bioinformatics Core at the University of Kansas Medical Center and the University of Kansas Cancer Center have taken a hybrid approach to the analysis of NGS data, using both Globus Genomics and a local high performance compute cluster at the University of Kansas. By incorporating Globus Genomics, with its on-demand provisioning of Amazon Web Services, we are able to bridge the gap between hardware availability, fluctuating demand for computing, and the increased use of high-throughput sequencing technologies. We will present examples of how Bioinformatics Cores at medical centers can use Globus Genomics to facilitate the analysis of NGS research projects in a cost- and time-effective manner that benefits the researchers, the Bioinformatics Cores, and the Medical Centers. We hope this system will enable larger translational research studies and thereby faster innovation.
-
Anoop Mayampurath, Senior Research Scientist, Computation Institute, University of Chicago
| slides
Despite significant promise, mass-spectrometry (MS) based proteomics has yet to be integrated into clinical settings. A chief limitation is the substantial informatics requirements of large-scale patient-based clinical proteomics. Traditional pipelines are difficult to implement and expensive to maintain and often fail to provide efficient data processing, annotation, and metadata handling. We are currently developing Globus Proteomics, an analytical framework that will enable the out-sourcing and democratization of computational needs for large-scale proteomic analyses to a proportionate cloud-based platform. The framework will allow users to specify, with just a few clicks, the location of the MS data and the specific analysis workflow that is to be run. With Globus features such as Globus Transfer, Globus Proteomics will then manage the file transfer securely to Amazon cloud storage, followed by deployment of necessary system requirements on Amazon compute cloud instances, and the execution of the analysis workflow specified via the Galaxy framework. All steps are automated with little user interaction. The use of low-priced Amazon spot instances allows analyses to be performed at modest costs and permits experiments with alternative analysis methods, including new methods that combine multiple analysis approaches.
|
12:00—12:05 |
a word from our sponsors Main Hall |
12:05—13:30 |
lunch Main Hall |
13:30—15:00 |
-
Main Hall
Steve Tuecke, Globus co-founder
| slides
Globus employs multiple security mechanisms for dealing with authentication and authorization across distributed cyberinfrastructure. We will explore the technical details underlying these mechanisms and describe Globus functionality for identity management, endpoint activation, federation, etc.
-
Randy Butler, Cybersecurity Director, NCSA
| slides
Identity and Group Management services are the linchpin for securely managing all XSEDE services and resources. Identity and Group Management cover a broad set of functionality; however, the term often refers narrowly to the direct creation and management of user and group identities and associated authentication credentials. There are a number of additional IdM components, including registration, allocations, account management, reporting, and user and application interfaces.
The XSEDE IdM system was developed initially during the TeraGrid days and has been thoughtfully evolved to its current XSEDE design. The XSEDE IdM system, to its credit, has served the effort well for 14 years and continues to provide a high level of reliability and performance. There are, however, reasons to consider a new or modified design in order to take fuller advantage of federated identities and groups, as well as the motivation, drive, and vision of a provider focused on identity and group management and federation services, such as those offered by Globus. This talk will outline the XSEDE IdM system and its new and evolving requirements, and discuss plans for how XSEDE will continue to advance this system.
-
Stu Martin, Globus Engineering Manager, University of Chicago
| slides
We will provide an update on new Globus Toolkit (GT) contributions, updated packaging, and changes to our development, code management, and release approaches. We will also discuss integration of GT components into Globus Connect Server.
|
15:00 |
adjourn |
|