Handshake’s Tech Stack

Scott Ringwelski

February 18, 2021

Handshake co-founder and engineer Scott Ringwelski dives into how our platform's tech stack has evolved over the years.

Background

Handshake’s mission is to ensure that all students, from all backgrounds, have access to meaningful careers. Since I helped found the company back in 2014, I’ve seen the platform grow dramatically year over year. Today, Handshake is used by 7 million students, over 1,100 higher education institutions, and 500,000 companies to connect, build relationships, and find jobs.

Team

With such rapid growth has come a fast pace of continual innovation and change within the company and our technology stack. Although much of our core technology stack has stayed consistent over the years, we’ve adapted as needed—including adding and removing parts of our stack entirely. This is only made possible by a great team and I am incredibly grateful for the amazing engineering team at Handshake.

A small group of our 70+ engineers during ENG Week 2020.

Initial Stack

To set the stage, let’s first outline our initial stack. Handshake started as a Ruby on Rails monolith hosted on Heroku. In the early days the stack was very simple: Heroku, PostgreSQL, Rails (using Unicorn) and background job processing (using Sidekiq). We mostly did server rendering with a sprinkle of CoffeeScript and Knockout.js.

Our Rails Monolith continues to live on to this day, and has delivered on our capacity and performance requirements (with plenty of work!). However, our tech stack today is much broader.

Current Stack

Starting from the bottom going up, here is Handshake’s tech stack as of January 2021!

Infrastructure

Handshake has been cloud-native since the beginning, and we run most of our cloud workloads on Google Cloud Platform. We moved from Heroku to GCP in 2018 to gain more control over our infrastructure, reduce costs, and get access to more of GCP’s product suite.

On GCP we leverage many different infrastructure products such as Kubernetes Engine, Cloud SQL, Memorystore, Pub/Sub and Cloud Load Balancing. Kubernetes Engine in particular is a key part of our web services infrastructure, and we have built in-house tooling for easily managing our Kubernetes workloads. GKE handles the complex work of managing the Kubernetes cluster itself, which allows us to focus on higher-level concerns.

Starting in 2020 we have been bringing our infrastructure under management by Terraform. This has had many benefits for us. Any engineer can now propose changes to our infrastructure by submitting a Pull Request, which gives teams more autonomy and more familiarity with our infrastructure. We also now have a clear audit trail of infrastructure changes, complete with discussions and descriptions of each change. We can also ensure staging-production parity by using the same Terraform configuration in both environments, which proved especially useful during our UK launch.

We use Ambassador for our load balancing. Ambassador is built on top of Envoy Proxy and has enabled us to easily implement features such as an API gateway authentication tier and traffic splitting.
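
To make "traffic splitting" concrete, here is a minimal Python sketch of weighted routing. It is a conceptual illustration only, not how Ambassador or Envoy implement it, and the function name, backend names, and weights are all hypothetical. It hashes a request identifier into a bucket so that the same request always lands on the same backend, while the overall share of traffic follows the configured weights.

```python
import hashlib
from typing import Mapping


def pick_backend(request_id: str, weights: Mapping[str, int]) -> str:
    """Map a request to a backend in proportion to integer weights.

    Hypothetical helper for illustration; a real gateway does this for you.
    """
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive number")

    # Hash the request id into a stable bucket in [0, total).
    digest = hashlib.sha256(request_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % total

    # Walk the cumulative weights until the bucket falls inside a range.
    cumulative = 0
    for backend, weight in weights.items():
        cumulative += weight
        if bucket < cumulative:
            return backend
    raise AssertionError("unreachable: bucket is always below the total")


if __name__ == "__main__":
    split = {"stable": 90, "canary": 10}  # assumed 90/10 split
    print(pick_backend("request-123", split))
```

Because the choice is derived from the request identifier rather than a random draw, retries of the same request stay on the same backend, which can make a canary rollout easier to debug.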

Databases

PostgreSQL is the database of choice at Handshake. Our primary database is a PostgreSQL instance with read replicas to scale out reads. We built and published an open source Ruby Gem named Knockoff for querying our read replicas from our Rails application (this gem predates the feature being built into Rails).
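
The general idea of routing selected queries to a read replica can be sketched in a few lines of Python. This is not the Knockoff gem or its API; the class and method names below are hypothetical, and connections are represented by plain strings to keep the example self-contained. Queries go to the primary by default, and a block of code can opt in to a replica, chosen round-robin.

```python
import itertools
from contextlib import contextmanager


class ConnectionRouter:
    """Hypothetical router: primary by default, replicas on request."""

    def __init__(self, primary, replicas):
        self.primary = primary
        # Cycle through replicas round-robin; fall back to primary if none.
        self._replicas = itertools.cycle(replicas) if replicas else None
        self._overrides = []  # stack, so nested blocks behave sensibly

    @contextmanager
    def on_replica(self):
        target = next(self._replicas) if self._replicas else self.primary
        self._overrides.append(target)
        try:
            yield target
        finally:
            self._overrides.pop()

    def current(self):
        """Connection that a query issued right now would use."""
        return self._overrides[-1] if self._overrides else self.primary


if __name__ == "__main__":
    router = ConnectionRouter("primary", ["replica-1", "replica-2"])
    with router.on_replica():
        print("read goes to:", router.current())
    print("write goes to:", router.current())
```

Making replica reads opt-in keeps writes, and reads that must see their own writes, on the primary, where replication lag cannot cause stale results.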

Over the years we have learned some of the quirks of PostgreSQL and built guardrails to guide engineers. For example, we have schema migration helper libraries for adding a new column with a default, an operation which in earlier versions of PostgreSQL could easily cause downtime when run on larger tables.
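
A common way to avoid that downtime on PostgreSQL versions before 11, where adding a column with a default rewrote the whole table under an exclusive lock, is to split the change into small steps. The Python sketch below only generates the SQL for those steps. It is an illustration of the general pattern and not Handshake's migration helper; the function name, the integer `id` primary key, and the batch size are assumptions.

```python
def safe_add_column_steps(table, column, sql_type, default_sql,
                          max_id, batch_size=10_000):
    """Return SQL statements that add a column with a default in stages.

    Illustration only: identifiers are interpolated without quoting,
    and the table is assumed to have an integer primary key named id.
    """
    steps = [
        # 1. Adding a nullable column without a default is a fast,
        #    metadata-only change.
        f"ALTER TABLE {table} ADD COLUMN {column} {sql_type};",
        # 2. Setting the default afterwards only affects new rows.
        f"ALTER TABLE {table} ALTER COLUMN {column} SET DEFAULT {default_sql};",
    ]
    # 3. Backfill existing rows in small batches to keep locks short.
    for start in range(1, max_id + 1, batch_size):
        end = min(start + batch_size - 1, max_id)
        steps.append(
            f"UPDATE {table} SET {column} = {default_sql} "
            f"WHERE id BETWEEN {start} AND {end} AND {column} IS NULL;"
        )
    return steps


if __name__ == "__main__":
    for statement in safe_add_column_steps(
        "jobs", "remote", "boolean", "false", max_id=25_000
    ):
        print(statement)
```

Each batch would normally run in its own transaction, so no single statement holds locks on a large share of the table.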

Elasticsearch is a key technology at Handshake and has enabled many of our more advanced search and index API endpoints. We use Elastic Cloud to run our clusters and have multiple Elasticsearch clusters to serve different use cases with different computational requirements. For example, we have an Elasticsearch cluster for powering features that track interactions between employers and students. This cluster needs to be able to index at a high rate and serve low-latency query results. Another cluster powers student job searches, where the focus is on relevancy: it combines machine learning with sophisticated query parameters to return relevant search results.

For caching and for running Sidekiq (background jobs) we use Redis.

Data platform

Our data platform has evolved the most dramatically of any area of our stack over the past couple of years. Data is a critical enabler for our business and product and is a large investment area in 2021 and beyond.

Handshake’s data infrastructure and platform run in Google Cloud Platform along with our web services infrastructure. We use Dataproc for running PySpark jobs. These jobs are batch pipelines and range from basic ETL to machine learning to internal BI use cases. For our machine learning jobs we use Spark ML as well as the Google AI Platform product suite.

We use Cloud Composer to run an Apache Airflow instance to schedule these batch pipelines. For some of our ETL we prefer to use Fivetran which handles the complexities of interaction with various vendor APIs.

We also run some real-time pipelines using the Dataflow runner for Apache Beam. Real-time pipelines are used primarily for ingesting events and storing them in our Data Lake and Data Warehouse. We have found Apache Beam to be fantastic for real-time use cases given its performance and scalability characteristics. As an example, one of our highest throughput real-time pipelines ingests analytics events from Segment, which we use to understand how people use our product.

Our Data Warehouse is consolidating onto BigQuery. Our team has found BigQuery to be fast, easy to maintain, and feature-rich. We maintain both raw data tables as well as higher-level cleaned up tables in BigQuery for many different use cases. To enable employees to query the data, we use Looker.

CI/CD

At Handshake we care deeply about the Developer Experience, and CI/CD is an important part of that.

All of our codebases have a test suite, and those test suites are run on every push using Continuous Integration systems such as Buildkite, GitHub Actions, and Cloud Build. We track our CI failure rate and build time metrics closely. We also track and act on slow or brittle tests using a purpose-built internal tool.
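
One simple way to spot brittle tests is to look for tests that both passed and failed on the same commit, since the code did not change between those runs. The Python sketch below shows that heuristic. It is not our internal tool; the function name and the `(test, commit, passed)` record shape are assumptions for illustration.

```python
from collections import defaultdict


def find_flaky_tests(runs):
    """Return sorted names of tests with mixed results on a single commit.

    `runs` is an iterable of (test_name, commit_sha, passed) tuples,
    a hypothetical record shape for CI results.
    """
    outcomes = defaultdict(set)
    for test, commit, passed in runs:
        outcomes[(test, commit)].add(bool(passed))

    # A test is flagged if any one commit saw both a pass and a failure.
    flaky = {
        test for (test, _commit), seen in outcomes.items()
        if seen == {True, False}
    }
    return sorted(flaky)


if __name__ == "__main__":
    sample = [
        ("test_login", "abc123", True),
        ("test_login", "abc123", False),   # same commit, different result
        ("test_search", "abc123", False),
        ("test_search", "def456", True),   # fixed by a later commit
    ]
    print(find_flaky_tests(sample))
```

A test that fails on one commit and passes on a later one is not flagged, because a code change may explain the difference.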

Services at Handshake are all deployed using a standardized process and Continuous Deployment. Our deployment system, aptly named “Deployer,” is built using the open source Shipit-engine.

GitHub sits at the center of many of these systems. We use a Pull Request model for proposing and merging changes.

Observability

We believe that engineers should have ownership of the features that they build and release in production, and great observability is an important requirement for that.

Handshake services are fully monitored using Datadog. We use Datadog’s APM, Custom Metrics, Infrastructure, and Monitors features as well as multiple integrations. These provide a centralized place for engineers to understand how their features and systems are behaving in production and to gain insight into any issues that need to be addressed.

We also use Bugsnag for error tracking and Google Cloud Logging for logging. We run a robust on-call program and PagerDuty is used for alerting and paging the on-call engineers.

As of 2021 we are using Terraform for managing PagerDuty and Datadog resources.

Backend services

The original Rails application that Handshake was started on is still going strong today, and is the gravitational center of our web services. The Handshake monolith serves over 90% of our functionality and is a Ruby on Rails application running the Puma web server and Sidekiq background job processing. New services are often built on Ruby on Rails given our deep internal knowledge of building and deploying Rails applications.

When performance is especially critical, we have recently found success using Golang. Golang’s strong concurrency and performance characteristics have worked especially well for us with relatively simple but high throughput services. It is also delightful to work with as an engineer; builds and test suites take only seconds to run and the language is simple and opinionated. Deploys are also super fast thanks to the small production binary and Docker image, which is great for our Kubernetes infrastructure stack.

Overall we have found Ruby on Rails and Golang to complement each other wonderfully for different use cases.

Mobile

Handshake maintains an iOS and Android application for our student users.

Our iOS codebase is written in Swift. The interface is primarily built using UIKit, with SwiftUI + Combine gaining traction within the team. We also use Realm for our persistence layer. Our Android app is built using Kotlin.

Our mobile applications have a robust test suite with both unit and UI tests that run on CircleCI. We use TestFlight to automatically create internal builds that the team can use to test code changes on a device. We then deploy new releases through an automated pipeline using Fastlane.

Frontend

Our users require an application that feels native, yet can support complex workflows. Additionally, as developers, we need to be equipped with the best tools to effectively deliver these modern experiences. Our frontend stack has evolved considerably over the past six years to meet these needs, and we are proud of the frontend platform that we have in place.

TypeScript is our primary frontend language. Although it can be verbose, it has offered us numerous advantages. First and foremost, it gives our frontend code a significant level of safety and bug protection. We also believe that it will help us scale as our team grows.

Our user interface is built using React and Redux. We chose React because it is lightweight, flexible, and performant, but also because of the wonderful community that supports it. React is also composable, which has enabled us to build a component library that engineers use to easily and consistently build features.

As of 2021 we are serving production API endpoints using GraphQL. GraphQL has enabled each of our frontend clients to fetch and structure data more efficiently. Its structured and typed schema works hand in hand with TypeScript, and the team is excited by the potential benefits of “full stack” type safety with TypeScript and GraphQL combined.

For asset compilation we are leveraging Webpack. When deploying our frontend assets to production we use a Cloud Build pipeline to build the assets and Google Cloud CDN for serving them globally with low latency.

All of these technologies pair well together and have strong communities and robust libraries, which allows our team to focus on delivering value to our users.

Technology change process

In the early stages of Handshake the technology change process was informal. This was not scalable as our team began to grow; in an informal environment, new engineers didn’t know who to propose changes to, or which technologies were approved.

We have now implemented sustainable processes for introducing new technologies. The goal of our formalized process is to enable all engineers to drive change and innovation, while also ensuring we do not have “technology sprawl” across our systems with a long list of languages and systems that engineers need to learn.

We have also clearly documented which technologies are Preferred, Approved, Deprecated or even Rejected. As an example, for iOS development we Prefer Swift and have Deprecated Objective-C. This helps the engineering team stay up to date with technology decisions and helps new engineers onboard.
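
A status list like this is easy to picture as a small lookup table. The Python sketch below is purely illustrative and does not describe how our documentation is actually stored. The four statuses and the Swift and Objective-C entries come from the text above; the registry layout, the function name, and the rule that only Preferred and Approved technologies are fine for new work are assumptions.

```python
from enum import Enum


class Status(Enum):
    PREFERRED = "Preferred"
    APPROVED = "Approved"
    DEPRECATED = "Deprecated"
    REJECTED = "Rejected"


# Hypothetical registry keyed by (area, technology).
REGISTRY = {
    ("ios", "swift"): Status.PREFERRED,
    ("ios", "objective-c"): Status.DEPRECATED,
}


def usable_for_new_work(area, technology):
    """Assumed rule: only Preferred or Approved entries qualify.

    Unlisted technologies return False, meaning a proposal would be
    needed before adopting them.
    """
    status = REGISTRY.get((area.lower(), technology.lower()))
    return status in (Status.PREFERRED, Status.APPROVED)


if __name__ == "__main__":
    print(usable_for_new_work("iOS", "Swift"))
    print(usable_for_new_work("iOS", "Objective-C"))
```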

Conclusion

I hope that this post has been an insightful glimpse into how Handshake runs, and what sort of technologies Handshake Engineers work with every day. As our business and product requirements evolve, so will our tech stack. I am excited to have our technology change process in place to enable all Handshakers to propose and drive changes to our tech stack.
