All sessions were accessible online via the Whova platform. Online attendees were able to participate in the live discussion, but sessions were optimized for in-person attendees.
Tuesday, May 10, 2022 Sessions will be held in the Chagall Ballroom | |
---|---|
8:00—17:00 | registration desk open Chagall Foyer |
8:00—9:00 | breakfast Van Gogh |
9:00—10:15 |
Ian Foster, Globus Co-Founder Rachana Ananthakrishnan, Executive Director, Globus slides | video We will review notable events in the evolution of the Globus service over the past year, and provide an update on future product direction and sustainability. |
10:15—10:30 | break Chagall Foyer |
10:30—12:15 |
Ryan Fraser, Manager, Research Infrastructure & Engagement, AARNet
slides | video To make the most of the extremely high-bandwidth network capacity available within (and out of) Australia for the research and education sector, a uniform, parallel and secure data transfer and management service is needed. Too often, Australian researchers have resorted to moving data on portable storage devices because transfers fail. A service that supports data movement by leveraging the network was needed. During 2021/22, AARNet piloted Globus with the research and education sector. Globus, which provides fast, parallel and secure file transfers and data management, proved to be well equipped to support researchers in meeting their transfer and data movement needs. The deployment of Globus across Australia would also standardise transfer endpoints across the research and education sector’s facilities. We will present the early results and learnings from the use of the service, along with future plans for rollout across Australia to support the growing data needs of institutional researchers. Further, we will highlight the opportunities the sector has identified through its use of this service. |
Eliu Huerta, Lead for Translational AI, Argonne National Laboratory
slides | video I will discuss a variety of examples that showcase how AI, scientific data infrastructure and supercomputing may be combined to address computational grand challenges in big data physics experiments. |
Laura Biven, Branch Chief for Integrated Infrastructure and Emerging Technologies, Office of the Director, National Institutes of Health
slides | video Abstract coming soon. |
Forrest Hoffman, Distinguished Computational Earth System Scientist and Group Leader, Computational Earth Sciences, Oak Ridge National Laboratory
slides | video The Earth System Grid Federation (ESGF) is a global peer-to-peer network of data servers that archives and distributes the planet’s largest collection of Earth system model output for thousands of climate and environmental scientists worldwide. After recently providing data from a variety of climate change scenario projections to inform the United Nations Intergovernmental Panel on Climate Change (IPCC) Sixth Assessment Report (AR6), ESGF will be re-architected to improve its reliability, scalability, and usability for the next generation of high resolution global Earth system model outputs from dozens of international modeling centers. Building on Globus technology, ESGF2 will move data as needed among US high performance computing centers and across simulation and analysis platforms to enhance the productivity of Earth system scientists. |
Andrey Shinkarev, Systems Architect, AbbVie
slides | video AbbVie, a rapidly growing global pharmaceutical company, has several research centers in the US and Europe. The company relies increasingly on in silico methods for drug discovery and process development, with heavy use of HPC resources. Compute and storage resources are consolidated in the major processing centers in the US due to ever-increasing requirements for power and cooling. Data is generated at the company’s research centers and needs to be transferred to the computational centers and public cloud solutions for processing. The company’s centers are interconnected by a corporate intranet that leverages high-speed SD-WAN solutions. Numerous firewalls separate the corporate intranet from the Internet and public cloud, and corporate policy allows no incoming connections from the Internet to internal resources. Ever-growing data sets require new ways of transferring data. After extensive testing of in-house data transfer applications and native operating system tools, Globus Connect Personal demonstrated high data transfer rates. Globus Connect Personal also did not require incoming connections from the Internet, which complies with corporate policy. The company has deployed a series of Globus Connect Personal endpoints across the globe, leveraging existing virtual HCI infrastructure to allow data transfers over the corporate high-speed SD-WAN solution. These endpoints provided high data transfer rates and easy access to users’ data, and they reduced data friction for individual users and groups. |
Michael Reich, Director of Software Development, Mesirov Lab, UC San Diego
slides | video The GenePattern platform for integrative genomics has supported the analysis needs of the worldwide genomics community since 2004, with its main cloud-based server providing thousands of analyses per week for a user base of over 80,000. The web-based gateway provides hundreds of analysis methods in an accessible format that can be used without the need for programming and provides extensive capabilities for the creation of workflows and the reproducibility of analyses. In 2017, we released GenePattern Notebook, a Jupyter Notebook-based interface allowing scientists to embed GenePattern-hosted analyses within notebooks and providing additional functionality to support genomics workflows for scientists at all levels of programming expertise. As new genomic technologies such as single-cell RNA-seq and spatial transcriptomics become more prevalent, the size of datasets challenges the architecture of web-based gateway applications. We collaborated with Globus to incorporate Globus data transfer into our GenePattern application to provide robust data transfer of large datasets seamlessly and intuitively within the gateway user interface. We will discuss the science that our Globus integration enables and the challenges of adapting Globus data transfer to a multi-user web application architecture. |
12:15—13:30 | lunch and informal discussion Van Gogh |
13:30—14:30 |
Moderated by: Bob Flynn, Program Manager, Cloud Infrastructure & Platform Services, Internet2 Panelists Sarah Bailey, Collaborative File Sharing Service Lead, UC Berkeley Jim Leous, Team Leader, ITS Emerging Technologies, The Pennsylvania State University Christopher Clements, Director of User Services, San Diego State University Hellen Zziwa, Director of Strategy and Engineering, Technology Partner Services, Harvard University Charles McClary, Manager of Research Storage, Indiana University slides | video The cost of collaborative cloud storage is quickly growing beyond the reach of many institutions used to the halcyon days of free or low-cost unlimited storage from Google, Box and others. Those days are now gone, replaced by careful examination of storage use, drafting of data lifecycle and storage guidelines, as well as the development of strategies for distributing the data to more appropriate platforms where, in some cases, the costs can be more directly borne by the owners of the data. All of this planning is contingent upon being able to move the data from one location to the other both affordably and manageably. Could Globus be the answer to those challenges? The speakers on this panel come from a range of public and private R1 institutions and include storage admins, research technologists, Globus admins and those who must bring them together. At this point the challenges outnumber the solutions, so expect a lively debate around questions disinclined to easy answers. |
14:30—14:40 | break Chagall Foyer |
14:40—15:45 |
Moderated by: Jason Zurawski, Science Engagement Engineer, ESnet, Lawrence Berkeley National Laboratory Panelists Michael Benedetto, CISO, Deputy CIO & Director IT, American Museum of Natural History Jim Leous, Team Leader, ITS Emerging Technologies, The Pennsylvania State University Nathaniel Mendoza, Manager, Networking, Security & Operations, Texas Advanced Computing Center slides | video The ESnet Science DMZ model has been widely recognized as the gold standard in network design to support data intensive science use cases. This model, widely deployed throughout the R&E community, has adapted to changes over the previous 10 years but still features a core set of design principles to ensure friction free data movement, while enabling a high degree of security and flexibility. This panel will introduce several different deployment models, featuring members of the Globus Community, and will touch on issues of performance, security, and onboarding users. Bring your questions to ask the panelists about how they have convinced their leadership of the needs, costs, and outcomes in adopting this approach. slides | video Data mobility is critical to most scientific workflows, and Globus helps to enable this for a large section of the R&E community. Is it possible to determine if your site or facility is performing at the peak capabilities of the software, hardware, and network you have available? The Data Mobility Exhibition provides the means to run simple tests, evaluate expectations and seek assistance for making improvements. |
15:45—16:00 | break Chagall Foyer |
16:00—17:00 |
Kyle Chard, Co-lead, Globus Labs; Research Professor, University of Chicago Ben Galewsky, Senior Research Programmer, National Center for Supercomputing Applications slides The de facto standard for research data management is applying the Software-as-a-Service model to research computing. Over the past two years the Globus Labs team has developed funcX, a service that facilitates function execution across a federated ecosystem of compute endpoints spanning personal laptops through supercomputers. Having proven the model in select research projects, we would like the community's input as we work towards a generally available product. We will provide an overview of the funcX service and solicit feedback from research computing practitioners on design, implementation, deployment and management to ensure the product effectively addresses the majority of use cases. This session is a unique opportunity for you to help shape upcoming product releases and will be optimized for in-person participation. |
17:00—19:00 | reception Van Gogh |
Wednesday, May 11, 2022 Sessions will be held in the Chagall Ballroom | |
---|---|
8:00—17:00 | registration desk open Chagall Foyer |
8:00—9:00 | breakfast Van Gogh |
9:00—16:00 |
Office Hours
Gauguin An opportunity to sit down with members of the Globus team and discuss any questions that you may have for maximizing the value of Globus to your institution. |
9:00—10:45 |
Led by: Globus Product Team
slides | video Globus Connect Server v5 (GCSv5) introduced a new architecture and deployment model for the software that enables access to storage systems via the Globus service. This session is designed to prepare you for migrating your production Globus environment. We will start with an overview of the GCSv5 architecture, installation and configuration. We will review the key changes that system administrators should be aware of and will discuss the process for migrating from prior versions of Globus Connect Server to GCSv5. The GCS development team and support staff will help you develop a migration plan and lay the technical foundation to ensure your migration minimizes user and service disruptions. |
10:45—11:00 |
Justin James, Senior Applications Engineer, iRODS Consortium (RENCI/UNC) slides | video The Integrated Rule Oriented Data System (iRODS) is open source software used worldwide to help organizations manage large and complex data sets. With the iRODS Globus Connector, an iRODS zone appears as another storage device to Globus. Using this connector, organizations can apply policy to their data while also benefiting from the easy authentication and efficient and reliable data transfer provided by Globus. The presentation will start with a brief introduction to the core components of iRODS, the policy engine, the user-defined metadata, and the virtual filesystem. I will talk about the design of the iRODS Globus Connector, how it implements the Globus DSI interface and acts as a translation between the DSI and the iRODS communication protocols. Finally, I will discuss improvements to the connector over the last year. Performance of file transfers has been significantly improved by adding multithreaded support. The connector has also been enhanced to support more hash/checksum protocols, allowing users to specify the desired protocol. Checksums for each file are now stored in iRODS metadata and recalculated only when the file has been modified. Other improvements include status updates for long running checksums, support for partial directory listings, and support for the realpath feature. |
11:00—11:15 | break Van Gogh |
11:15—12:00 |
Led by: Rachana Ananthakrishnan, Executive Director, Globus slides | video This is an introductory session for those planning to use Globus services in their applications. We will describe the foundational elements of the Globus platform and demonstrate how to work with Globus APIs and the Globus Python SDK. This session is intended to provide developers new to Globus with a starting point for working with the various platform services such as Globus Auth and Globus Transfer. |
12:00—12:15 |
Joe Stubbs, Manager, Cloud & Interactive Computing, Texas Advanced Computing Center slides | video Tapis is a cloud-based API framework for research computing, providing users with programmable access to advanced storage and computing cyberinfrastructure. With Tapis, users can automate data management and code execution pipelines on remote HPC, cloud and high-throughput servers, and they can securely share parts or all of their workflows with others in their community. The Tapis development team has recently begun an effort to integrate aspects of the Globus platform, including Globus Auth and Globus Transfer. In this talk, we will give a quick introduction to Tapis and then discuss our approach to integrating Globus functionality into our API. |
12:15—13:15 | lunch Van Gogh |
13:15—13:30 |
Claire Rye, Product Manager Data Services, New Zealand eScience Infrastructure slides | video Aotearoa New Zealand is a small island nation, geographically isolated from the rest of the world, with a national population smaller than that of many major conurbations elsewhere. And yet, Aotearoa also boasts rich ecosystems across everything from fauna and flora, to people and culture, to scientific research goals. At New Zealand eScience Infrastructure (NeSI), our role as a research computing and data platform is to enable research across a broad and inclusive array of research activities. The sheer diversity of needs is driving us to invest in a much wider set of services than ever before. We took a significant step towards this a few years ago by adopting Globus as our de facto national data transfer platform for research. We’ve invested to build and sustain the capability to operate Globus as a national service provider, and to drive adoption widely across the research system, including with our nearest neighbour, Australia. Our goal has been to lower barriers and to normalise expectations of moving demanding volumes of data to enable data-intensive science. We will discuss the changes in data lifecycle management protocols that have been required to keep pace with the significant changes to and diversification of Aotearoa's research landscape, notably the requirements of maintaining Māori data sovereignty in such a connected environment. We will discuss the resultant challenges and how we are addressing them, illustrating our approach and progress using case studies from building the Aotearoa Genomic Data Repository (https://data.agdr.org.nz/) and our projects to automate data transfer and workflows on NeSI’s High Performance Computing (HPC) facilities. |
13:30—15:00 |
Led by: Vas Vasiliadis, Chief Customer Officer, Globus slides | video Managing data at scale—for example, data generated by instruments such as cryo electron microscopes and next-gen sequencers—requires automation. We will describe how the Globus platform facilitates the construction, deployment, execution, and monitoring of automated data management tasks. Attendees will experiment with the Globus Flows service to move data, and share outputs with collaborators. This is a hands-on session intended for system administrators and IT staff that support instrument facilities and projects that deal with large volumes of data. Attendees will benefit most from in-person attendance. |
15:00—15:30 | break Van Gogh |
15:30—17:00 |
Led by: Globus Professional Services Team
slides | video We will present an approach for making data more easily findable, accessible and reusable, using the open source Django Globus Portal Framework that integrates multiple services such as Globus Auth and Globus Search. Attendees will learn how to rapidly deploy a data portal that enables enhanced data discovery and provides a gateway to further analyze and distribute data products. This is a hands-on session intended for research software engineers and those developing customized data management solutions. Attendees will benefit most from in-person attendance. |
17:00 | conference adjourns |
Thursday, May 12, 2022 Sessions will be held in the Van Gogh Room | |
---|---|
8:00—9:00 | continental breakfast Van Gogh |
9:00—12:00 |
The Customer Forum is an opportunity for Globus subscribers to discuss their experiences with the service, to learn about our product development plans, and to provide input on future product directions. Attendance at the Customer Forum is by invitation only. If you would like to represent your institution or community, please contact us for an invitation. |
12:00—13:00 | lunch Van Gogh |