Training Calendar
RCS Live Demo: Optical Character Recognition (OCR)
- 06 OCT 2021 3:00 PM - 4:00 PM
- Baker Library B82 OR Online via Zoom
RCS Statistician/Data Scientist Ista Zahn will present an overview of using Tesseract for Optical Character Recognition (OCR):
Modern text mining and analysis techniques can help you gain insight from unstructured textual data, such as hand-written notes, historical documents, letters, store receipts, and more. If your source material is an image that your computer does not (yet) recognize as text, you will need to convert it to machine-readable text using Optical Character Recognition (OCR). OCR is frequently used on documents created before the digital revolution and in other situations where only a picture of the text is available.
This workshop will introduce you to Tesseract, the most popular open-source OCR program. Whether you prefer to run it from a point-and-click desktop application, by scripting it from the command line, or by using the convenient R or Python APIs, this workshop will show you how to convert your images to text suitable for analysis.
This workshop is offered online via Zoom* and in person with limited seating. Please contact us at research@hbs.edu if you wish to attend in person; slots will be awarded on a first-come, first-served basis.
* HBS affiliates can access Zoom meeting information on our website; non-HBS Harvard affiliates may contact us for meeting details at research@hbs.edu.
RCS Live Demo and Office Hours
- 08 JUN 2021 11:00 AM - 1:00 PM
- Online via Zoom
On the first Tuesday of the month, RCS presents a live demo. Drop in to watch an RCS staff member or guest demonstrate a technique, share a tip, or work through a problem! Please note that this month we are requiring participants to register in advance at the above link.
This month's topic is Large Data in Python and will be presented by guest speaker Mahmood Mohammadi Shad, Associate Director of Research Software Engineering at the FAS Research Computing department:
Python is a versatile and powerful general-purpose programming language, and people with a broad spectrum of skills use it for their research and work. Handling data in Python can be challenging, particularly when working with large datasets and parsing data files efficiently. In this course, we'll go deep into Python best practices for working with datasets, especially tools for handling large datasets. We cover intermediate to advanced topics, from data structures to working efficiently with data files.
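As a rough illustration of the kind of technique this topic covers (not taken from the session itself), the sketch below streams a CSV file one row at a time instead of loading it all into memory. The function name, column names, and sample data are hypothetical, and the in-memory text stands in for a large file on disk.

```python
import csv
import io
from collections import defaultdict


def stream_totals(lines, key_field, value_field):
    """Sum value_field per key_field, holding only one row in memory at a time."""
    totals = defaultdict(float)
    # csv.DictReader consumes its input lazily, so a real file handle
    # passed here is never read into memory as a whole.
    for row in csv.DictReader(lines):
        totals[row[key_field]] += float(row[value_field])
    return dict(totals)


# Hypothetical sample data; in practice this would be open("big_file.csv", newline="")
sample = io.StringIO("region,sales\neast,10.5\nwest,4.0\neast,2.5\n")
totals = stream_totals(sample, "region", "sales")
print(totals)
```

Only the running totals are kept, so memory use grows with the number of distinct keys rather than the number of rows.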
RCS Live Demo and Office Hours
- 04 MAY 2021 11:00 AM - 1:00 PM
- Online via Zoom
On the first Tuesday of the month, RCS presents a live demo. Drop in to watch an RCS staff member or guest demonstrate a technique, share a tip, or work through a problem! Please note that this month we are requiring participants to register in advance at the above link.
This month's topic is Large Data in R and will be presented by Ben Sabath, former Research Software Engineer with FAS Research Computing:
R is a statistical programming language commonly used in many different academic disciplines, including the hard and social sciences. The open-source community has developed numerous packages for the language, enabling users to easily implement statistical methods that would require significant development in other languages. However, R has some performance limitations, especially when working with data that struggles to fit in memory. In this workshop, we will explore techniques (such as streaming and sharding data) and tools (the data.table package) for working with data that approaches or exceeds computer memory limits.
RCS Live Demo and Office Hours
- 06 APR 2021 2:00 PM - 3:00 PM
- Online via Zoom
On the first Tuesday of the month, RCS presents a live demo. Drop in to watch an RCS staff member demonstrate a technique, share a tip, or work through a problem! We encourage attendees to stay after the presentation to ask any and all questions related to research computing. No registration is required.
This month's topic is collaboration on the HBSGrid:
"Computing" can be easy when working alone: you know how to run programs, how to set preferences, where to place files, etc. But what if you're working as part of a project team? Your personal preferences, usually stored in the home folder, might "conflict" with the preferences needed for the team. And does everyone on the team agree on what folders to use, what programs to use, where to store code, and what versions of packages and libraries to use?
Come to an open discussion hosted by RCS on how you have been successful (or not) in working on a project team! All are welcome in this guided and safe dialogue. We hope, through community discussion, to provide a set of recommended practices that we can offer to the wider research community.
RCS Live Demo and Office Hours
- 02 MAR 2021 2:00 PM - 3:00 PM
- Online via Zoom
On the first Tuesday of the month, RCS presents a live demo. Drop in to watch an RCS staff member demonstrate a technique, share a tip, or work through a problem! We encourage attendees to stay after the presentation to ask any and all questions related to research computing. No registration is required.
This month's topic is interactive data visualization with Python:
Learn to quickly and effectively investigate data with interactive visualizations! In this session, we will explore a data set by developing an interactive visualization notebook using Plotly Express and Jupyter Widgets. We will elevate simple visualizations by gradually adding more interactivity to ultimately produce data exploration tools that enable information-rich views of multivariate relationships.
RCS Live Demo and Office Hours
- 02 FEB 2021 2:00 PM - 3:00 PM
- Online via Zoom
On the first Tuesday of the month, RCS presents a live demo. Drop in to watch an RCS staff member demonstrate a technique, share a tip, or work through a problem! We encourage attendees to stay after the presentation to ask any and all questions related to research computing. No registration is required.
This month, Ista Zahn will talk about using the containerization tool Singularity to install and configure software:
Researchers are blessed with amazing statistical and analysis tools, many open-source and available for free. Unfortunately these tools are not always easy to install or configure, and some may not be designed to run on the systems we have on the HBS Grid. This presents challenges for researchers wishing to use cutting-edge tools and models, especially in an academic environment in which reproducibility is expected.
There are several tools that can help you install complicated software stacks in ways that are reproducible and portable; in this session we will look at a popular containerization tool called Singularity. Singularity allows you to install and run the whole Linux userspace inside a container, ensuring that your software always runs in a consistent and reproducible environment. Singularity is especially useful for running software with complex dependencies and software which is unsupported on the host Linux OS. We will start with some practical examples demonstrating how to install and use the torch package in R and the OCRmyPDF tool in Python, and move from there to examples showing how to build your own Singularity containers.
RCS Live Demo and Office Hours
- 08 DEC 2020 2:00 PM - 3:00 PM
- Online via Zoom
On the first Tuesday of the month, RCS presents a live demo. Drop in to watch an RCS staff member demonstrate a technique, share a tip, or work through a problem! (Please note that due to the Thanksgiving and winter breaks, we have postponed this live demo by one week.) We encourage attendees to stay after the presentation to ask any and all questions related to research computing. No registration is required.
This month, Ista Zahn will present a demo on transferring data to and from the HBSGrid:
As the data we work with get bigger and bigger, simple tasks like sharing with collaborators or creating cloud storage archives can get more complicated. In this session we will look at some tools and techniques for moving data sets to and from Cloud and HBSGrid storage. We will also look in some detail at the different storage options on the HBSGrid. These techniques will be useful for anyone working on the HBSGrid, but especially for those working with larger data sets that can take a long time to move around.
RCS Live Demo and Office Hours
- 03 NOV 2020 2:00 PM - 3:00 PM
- Online via Zoom
On the first Tuesday of the month, RCS presents a live demo. Drop in to watch an RCS staff member demonstrate a technique, share a tip, or work through a problem! (Please note that to avoid a conflict with Election Day, we have postponed this live demo by one week.) We encourage attendees to stay after the presentation to ask any and all questions related to research computing. No registration is required.
This month, Victoria Prince will present an introduction to Neural Networks in Python:
As a subset of AI, deep learning lies at the heart of various innovations such as self-driving cars, natural language processing, and image recognition. Artificial Neural Networks (ANNs) have gained much popularity in recent years as primary deep learning tools. In this RCS live demo, we will review the main components of ANNs and demonstrate how networks can be built and trained in Python using the keras library.
Configuring Your NoMachine and HBSGrid Environment
- 06 OCT 2020 12:00 AM - 11:30 PM
From logging in to running applications to working with files and storage, using the HBSGrid is akin to having another desktop or laptop computer: working there can either be a breeze or a major time sink. So why not customize it to make your work more frictionless?
We'll walk through a number of customization and configuration changes one can make to multiple touch points (NoMachine, Gnome, Nautilus, shell) that will save you time and sanity.
We are working on making this session recording more widely accessible here, but in the meantime, members of the HBS community can access the captioned video here.
RCS Live Demo and Office Hours
- 01 SEP 2020 2:00 PM - 3:00 PM
- Online via Zoom
On the first Tuesday of the month, RCS presents a live demo. Drop in to watch an RCS staff member demonstrate a technique, share a tip, or work through a problem! We encourage attendees to stay after the presentation to ask any and all questions related to research computing. No registration is required.
This month, Elizabeth Piette will present an introduction to Topic Modeling:
In this session we will learn some of the basics of topic modeling to uncover meaning in unstructured texts. We will use Python to process text data, construct features, train a model, visualize the results, and assess model performance. For continuity, we will be using the same data set as Christine's prior introduction to natural language processing. Members of the Harvard community can access this data set on Harvard's GitHub here.
RCS Live Demo and Office Hours
- 04 AUG 2020 2:00 PM - 3:00 PM
- Online via Zoom
WATCH Introduction to Natural Language Processing (NLP) Demo
On the first Tuesday of the month, RCS presents a live demo. Drop in to watch an RCS staff member demonstrate a technique, share a tip, or work through a problem! We encourage attendees to stay after the presentation to ask any and all questions related to research computing. No registration is required.
This month, Christine Rivera will present an introduction to Natural Language Processing:
Welcome to Natural Language Processing! In this beginner's demo, we will use Python to walk through some basic NLP steps and demonstrate common techniques for gaining insight into text data. Using Amazon product reviews as our sample data, we will begin with some basic data cleaning, followed by tokenization and the creation of some simple graphs by counting words and tokens. We will then generate a simple word cloud. Finally, we will conduct sentiment analysis using VADER. Members of the Harvard community can access this data set on Harvard's GitHub here.
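The cleaning, tokenization, and word-counting steps described above might look roughly like the following minimal sketch. It uses only the Python standard library and is not the code from the demo; the two reviews, the stopword set, and the `tokenize` helper are hypothetical stand-ins for the real data set and tools.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens (letters and apostrophes)."""
    return re.findall(r"[a-z']+", text.lower())


# Hypothetical sample reviews, not drawn from the actual data set
reviews = [
    "Great product, works great!",
    "Not great. The product broke after a week.",
]

# A tiny illustrative stopword set; real analyses use much longer lists
STOPWORDS = {"the", "a", "after", "not"}

# Count every token across all reviews, skipping stopwords
counts = Counter(
    token
    for review in reviews
    for token in tokenize(review)
    if token not in STOPWORDS
)
print(counts.most_common(3))
```

Counts like these are the raw input for the simple bar charts and word clouds mentioned in the description.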
Software Modules on the HBSGrid
- 21 JUL 2020 1:00 PM - 2:00 PM
- Online via Zoom
Software modules have arrived on the HBSGrid! The modules provide users with flexible access to multiple versions of software (e.g., Python 2.7, Python 3.6, etc.), software in home folders and project folders, and default settings for specific projects. One no longer has to use only the software version that was previously installed on the HBSGrid. Bob Freeman presented this special live demo on how to start using these modules, how these will work in NoMachine, and best practices around their use.