Recently I've been working on creating a self-hosted version of a closed-source web application. When packaging the application for a self-hosted environment, I've tried to obfuscate some of the original source code to protect our intellectual property (i.e., we don't want others to clone our product the way Microsoft is cloning Notion).
This article will talk about self-hosted apps, how self-hosted apps are typically run, and the steps that can be taken to obfuscate source code.
What is a self-hosted app?
A self-hosted application is an application that end users can host themselves on their own infrastructure. Many companies want this so they can avoid sending data to servers they don't control, which gives them tighter control over their data and helps mitigate data leaks. Many apps offer this option, such as Sentry, Retool, and GitLab. It's usually part of an "Enterprise" level plan.
How do end-users run a self-hosted app?
The self-hosted apps I've seen are all distributed as Docker images that end users can run. Often several containers need to run together, since the self-hosted app relies on multiple services, such as the app's server, an nginx server that serves static files, a message broker like RabbitMQ or Redis, etc.
There will usually be instructions on how to run all the services together. Setup options may include:
- Running the app using docker-compose
- Running the app using Kubernetes
- Running the app with minimal setup on a platform like Heroku or Render
What does it mean to "obfuscate" code?
Obfuscating code means transforming it in a way that makes it difficult to read and reverse engineer. This could mean renaming variables to meaningless names, compiling the code to a different representation (like bytecode), or applying any of several other obfuscation techniques.
Why would you want to obfuscate your code?
You might want to obfuscate your code to prevent others from copying your app features or prevent exposing potential vulnerabilities in your code. For completely open-source apps, like Sentry, this is pointless since all their original source code is readily available to anyone.
Sentry source code: I wrote a post about some of the interesting parts I discovered in the Sentry source code: https://robertcooper.me/post/goodies-from-sentry-source-code. It's also worth reading some of the reasons why Sentry decided to go open source: https://open.sentry.io/benefits/
How to self-host a web app
The basic steps you'll want to follow to create a self-hosted web app are the following:
1. Create and/or identify the docker images needed
You'll have a docker image that you need to create for your app's back-end server. Then you might also have another server used to serve your static front-end code, such as an nginx server. It's possible to have your back-end server serve your front-end code, but a dedicated server like nginx is likely a more performant option.
Apart from your back-end and front-end images, you may also have other images for the other services your app depends on. For example, you could set up a PostgreSQL database in docker that is used for the back-end server's data. In fact, this is what Retool does, although they also give you the option to use an externally hosted database.
If your app is using multiple docker containers, it would be helpful to have a docker-compose.yml file that describes each service and how they are connected with one another.
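As a rough illustration of what such a file describes, here is a minimal TypeScript sketch that models three services and checks that every dependency points at a service that actually exists. The service names, image tags, and credentials are hypothetical placeholders, not taken from any real product. Since JSON is a subset of YAML, the serialized output should be usable as a Compose file, though that is an assumption worth verifying against your Compose version.

```typescript
// Hypothetical services and image tags, for illustration only.
interface ComposeService {
  image: string;
  depends_on?: string[];
  ports?: string[];
  environment?: Record<string, string>;
}

interface ComposeFile {
  services: Record<string, ComposeService>;
}

const compose: ComposeFile = {
  services: {
    // Database used by the back-end server.
    db: {
      image: "postgres:16",
      environment: { POSTGRES_PASSWORD: "change-me" },
    },
    // Back-end server; reaches the database by its service name ("db").
    api: {
      image: "example/app-api:1.0.0",
      depends_on: ["db"],
      environment: {
        DATABASE_URL: "postgres://postgres:change-me@db:5432/postgres",
      },
    },
    // nginx container serving the static front-end files.
    web: {
      image: "example/app-web:1.0.0",
      depends_on: ["api"],
      ports: ["80:80"],
    },
  },
};

// Returns "service -> dependency" for every dependency that is not defined.
function missingDependencies(file: ComposeFile): string[] {
  const names = new Set(Object.keys(file.services));
  const missing: string[] = [];
  for (const [name, service] of Object.entries(file.services)) {
    for (const dep of service.depends_on ?? []) {
      if (!names.has(dep)) missing.push(`${name} -> ${dep}`);
    }
  }
  return missing;
}

// Serialized description that could be written to disk for end users.
const composeJson = JSON.stringify(compose, null, 2);
```

In practice most projects write this file by hand in YAML; the typed version is only meant to make the relationships between services explicit.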
2. Ensure only the required code is included in the docker images and obfuscate source code that should be "protected"
You shouldn't include any files in your docker images that are not required for the app to run. Original source code that you do not wish to show to your users should be obfuscated. Details on how to obfuscate code are included further down in the post.
3. Have the end-user decide where they want to host the app and run the app
The end user might have a preference for where their data is hosted. For example, a company may host all of its applications on Azure and would like to continue hosting on that platform. This is especially true for companies that make most of their internal applications accessible only via a Virtual Private Network (VPN). Once the host has been selected, the Docker containers can be run using the Docker CLI, docker-compose, Kubernetes, or some other mechanism.
Obfuscating front-end code
Personally, I don't think it's worth obfuscating your front-end source code since it makes your code slower to run and increases the file size, which hurts the end-user's experience. I would simply minify source code and remove source maps. Minified source code with no source maps will help obfuscate your code and make it difficult for people to reverse-engineer.
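Most bundlers have a setting to turn source maps off, but a post-build check can catch anything that slips through. The sketch below is one possible approach, not a definitive one: it assumes the build output has already been read into memory as a list of files (the `BuildFile` shape and function name are hypothetical), drops `.map` files, and removes any `sourceMappingURL` comments left in the remaining files.

```typescript
// Hypothetical in-memory representation of a file in the build output.
interface BuildFile {
  path: string;
  contents: string;
}

// Matches JS-style (//# ...) and CSS-style (/*# ... */) source map comments.
const JS_SOURCE_MAP_COMMENT = /^\/\/[#@]\s*sourceMappingURL=.*$/gm;
const CSS_SOURCE_MAP_COMMENT = /\/\*[#@]\s*sourceMappingURL=[^*]*\*\//g;

// Drops .map files and strips source map references from everything else.
function stripSourceMaps(files: BuildFile[]): BuildFile[] {
  return files
    .filter((file) => !file.path.endsWith(".map"))
    .map((file) => ({
      path: file.path,
      contents: file.contents
        .replace(JS_SOURCE_MAP_COMMENT, "")
        .replace(CSS_SOURCE_MAP_COMMENT, "")
        .trimEnd(),
    }));
}

const cleaned = stripSourceMaps([
  { path: "app.min.js", contents: "console.log(1);\n//# sourceMappingURL=app.min.js.map\n" },
  { path: "app.min.js.map", contents: "{}" },
  { path: "app.min.css", contents: "a{color:red}\n/*# sourceMappingURL=app.min.css.map */\n" },
]);
```

Running a step like this just before copying the static files into the nginx image keeps the original, unminified sources out of what ships to users.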
Obfuscating your back-end code
Obfuscating your back-end code is arguably more important than obfuscating your front-end code. Back-end code will likely hold most of your application's business logic, and it is also where the most damage can be done if there is a security vulnerability.
What you can do is compile your back-end server code into bytecode and run your back-end via a binary executable. If your server is a Node.js server, there is a library called pkg that can do this for you. Reverse engineering bytecode isn't impossible, but it's quite difficult and time-consuming from what I've read.
Obfuscating your database schema and migrations
If you're allowing users to self-host your app, it means that they will have control of the database. That means that if they want to look inside the database, they can, so there is no point in hiding or obfuscating code related to your database. This includes the database migration files that are used to set up a database with the correct schema.
That being said, you might still want to obfuscate your database migrations to prevent anyone who isn't authorized to run your self-hosted app from analyzing your database schema. If that's the case, you could compile your migration code to an executable binary that only runs the migrations if a valid license key is provided.
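To make the license-key idea concrete, here is a minimal sketch of one way the gate could work, using Ed25519 signatures from Node's built-in `crypto` module. Everything about the key format is an assumption made for illustration: the `payload.signature` layout, the `LicensePayload` fields, and the function names are hypothetical, and a real setup would keep the private key on the vendor's side and embed only the public key in the binary.

```typescript
import { generateKeyPairSync, sign, verify, KeyObject } from "node:crypto";

// Hypothetical license contents; a real license would likely carry more.
interface LicensePayload {
  customer: string;
  expiresAt: string; // ISO 8601 timestamp
}

// Vendor side: sign the payload and emit "<payload>.<signature>" in base64url.
function issueLicense(payload: LicensePayload, privateKey: KeyObject): string {
  const body = Buffer.from(JSON.stringify(payload), "utf8");
  // Ed25519 has no separate digest step, hence the null algorithm argument.
  const signature = sign(null, body, privateKey);
  return `${body.toString("base64url")}.${signature.toString("base64url")}`;
}

// Customer side: returns the payload if the key is authentic and unexpired.
function checkLicense(
  key: string,
  publicKey: KeyObject,
  now: Date = new Date(),
): LicensePayload | null {
  const [encodedBody, encodedSignature, ...rest] = key.split(".");
  if (!encodedBody || !encodedSignature || rest.length > 0) return null;
  const body = Buffer.from(encodedBody, "base64url");
  const signature = Buffer.from(encodedSignature, "base64url");
  try {
    if (!verify(null, body, publicKey, signature)) return null;
    const payload = JSON.parse(body.toString("utf8")) as LicensePayload;
    const expiry = new Date(payload.expiresAt).getTime();
    return expiry > now.getTime() ? payload : null;
  } catch {
    // Malformed signature or payload: treat as an invalid license.
    return null;
  }
}

// Runs the supplied migration callback only when the license checks out.
function runMigrations(
  key: string,
  publicKey: KeyObject,
  migrate: () => void,
): boolean {
  if (checkLicense(key, publicKey) === null) return false;
  migrate();
  return true;
}

// Demo key pair and license, generated on the fly for this example only.
const { publicKey, privateKey } = generateKeyPairSync("ed25519");
const demoKey = issueLicense(
  { customer: "Example Corp", expiresAt: "2999-01-01T00:00:00Z" },
  privateKey,
);
```

A signature check is used here rather than a shared secret so that nothing embedded in the shipped binary is enough to mint new keys. It does not stop a determined user from patching the check out of the binary, so it raises the effort required rather than making the schema truly secret.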
The reason why you might want to hide your database schema is to prevent people from trying to figure out potential vulnerabilities. It's a mechanism of security through obscurity, which is not something you should solely rely on for the security of your app.
Is it worth going through all this work to obscure your source code? Personally, I'm not sure. I quite like how Sentry is open source and I like the reasons they've listed for adopting an open-source model. However, if your application deals with a lot of sensitive data, such as a banking or healthcare application, perhaps obscuring source code is worth the extra effort.