Note: Most of these blogs are for my personal reference and at a given time, some of those might just be unpolished drafts.

Using gcsfuse to mount GCS bucket

Migration to Cloud

As your applications scale, it becomes tedious, expensive and error-prone to run everything in your own data centers. Modern microservices are built with the philosophy of isolation. Scaling those services up and down should happen at the snap of a finger (enter containerization technologies like Docker replacing traditional VMs): you scale up to serve more traffic and also to improve fault tolerance. But even if you have 20 Docker containers, a data center failure as simple as a switch failure can bring your whole infrastructure to its knees. Hence the adoption of cloud. (There are numerous other reasons as well.)

In the cloud, everything becomes a service: computation is a service, storage is a service, and so on. While migrating traditional applications to the cloud, we need to preserve the integrity of the applications’ storage, compute and other components.

Google Cloud Storage

GCS is the storage ‘cloud service’ provided by Google. Unlike with traditional NFS, you organize your data into buckets (logical entities) and the ‘cloud’ handles all the underlying infrastructure. Your application can read from or write to buckets using standard APIs. For an application that depends heavily on file reads and writes, moving all of that I/O onto an API can quickly become tedious. Also, when you are moving to the cloud from a traditional environment, there is a high chance that numerous applications share common storage and that their subsequent operations depend on it.

GCS as Traditional NFS

What if we could have those storage buckets abstracted as traditional NFS? Applications would continue reading and writing through a familiar abstraction they already know. Well, there’s a tool called gcsfuse which does exactly this.

In this article I will quickly create a sample storage bucket, add some files to it, mount this bucket on my local Linux machine and open those files (read/write). As far as my local machine is concerned, it makes no difference that the files are hosted in some faraway zone.

Creating a GCP project with a GCS bucket

  • Log in to console.cloud.google.com and create a project

  • Create a bucket under the ‘Storage’ tab (that’s Google Cloud Storage) [Note down the bucket id as it will be useful later on]

  • Create some folders/files in it

[Image: gcs-bucket]

Local gcloud setup

gcloud is a CLI-based tool for interacting with Google Cloud services. In practice you often end up using the GUI/web tools, but for the purpose of this demo I am using the gcloud CLI.

  • Install gcloud (see the installation guide)

  • Authenticate

    gcloud auth application-default login

    This launches your default browser to authenticate. On successful authentication credentials are stored at:

    ~/.config/gcloud/application_default_credentials.json

    If you are in a non-GUI environment, you can obtain a proper credentials file elsewhere and store it at this location.
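For the non-GUI case, an alternative to copying the file into place is to point the tools at a key file through the GOOGLE_APPLICATION_CREDENTIALS environment variable, which Application Default Credentials checks before the default path. This is a minimal sketch and not part of the original walkthrough: `use_credentials` is a hypothetical helper name, and the key path is whatever file you have obtained.

```shell
# use_credentials is a hypothetical helper, not a gcloud or gcsfuse command.
# It refuses to export a path that cannot be read, so a typo fails early.
use_credentials() {
  local key_file="$1"
  if [ ! -r "$key_file" ]; then
    echo "credentials file not readable: $key_file" >&2
    return 1
  fi
  # Application Default Credentials looks at this variable first.
  export GOOGLE_APPLICATION_CREDENTIALS="$key_file"
}
```

Usage would look like `use_credentials "$HOME/keys/my-key.json"` (an assumed path), run in the same shell session from which you later start gcsfuse.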

**Installation of gcsfuse**

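# Add the gcsfuse apt repository matching this Ubuntu/Debian release
export GCSFUSE_REPO=gcsfuse-$(lsb_release -c -s)
echo "deb https://packages.cloud.google.com/apt $GCSFUSE_REPO main" | sudo tee /etc/apt/sources.list.d/gcsfuse.list
# Trust Google's package signing key (note: apt-key is deprecated on newer releases)
curl https://packages.cloud.google.com/apt/doc/apt-key.gpg | sudo apt-key add -

# Refresh the package index and install gcsfuse
sudo apt-get update
sudo apt-get install gcsfuse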

**Mounting**

  • Create a folder where you want to mount and cd to it

  • Run gcsfuse --implicit-dirs dentai-bucket1 . (dentai-bucket1 is my bucket id)

  • If your gcloud authentication was successful and bucket id is proper, you should see something like:

Using mount point: /home/sudipbhandari/sudip/vats
Opening GCS connection...
Opening bucket...
Mounting file system...
File system has been successfully mounted.
  • If you run ls at this point, you may not see anything, even if you have some stuff in your bucket. In my case the contents only started to appear once I created the directory locally (mkdir) and cd’d into it. This comes down to how gcsfuse treats directories that exist only as prefixes of object names; see the documentation for --implicit-dirs.

  • Empty ls output

~/sudip/vats » ls
  • Creation of images folder (this actually exists in my bucket with real contents but locally I am just creating the directory and nothing more)
~/sudip/vats » mkdir images
~/sudip/vats » cd images
~/sudip/vats/images » ls
sanjayyriit  yella016

Now gcsfuse has abstracted the remote GCS bucket as if it were a local NFS mount point. Neat!!

  • I can now write to this directory and copy files. These will be synced to my bucket on the cloud.
~/sudip/vats » cp ~/Downloads/startupofyou.mobi .
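The mounting steps above can be wrapped in a small script. This is only a sketch under a few assumptions: `mount_bucket` and `unmount_bucket` are hypothetical helper names, `mountpoint` comes from util-linux, and `fusermount -u` is the usual way to unmount a FUSE file system on Linux.

```shell
# Hypothetical helpers wrapping the manual steps; not part of gcsfuse itself.

# Mount BUCKET at MOUNT_DIR unless something is already mounted there.
mount_bucket() {
  local bucket="$1" mount_dir="$2"
  mkdir -p "$mount_dir"
  if mountpoint -q "$mount_dir" 2>/dev/null; then
    echo "already mounted: $mount_dir"
    return 0
  fi
  # Same flags as the manual command used in this post.
  gcsfuse --implicit-dirs "$bucket" "$mount_dir"
}

# Unmount a gcsfuse mount point (Linux).
unmount_bucket() {
  fusermount -u "$1"
}
```

With the names from this post, that would be `mount_bucket dentai-bucket1 ~/sudip/vats` and later `unmount_bucket ~/sudip/vats`.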

On the cloud

[Image: gcs-bucket-updated]

Performance and other considerations

Writing files to and reading files from GCS has a much higher latency than using a local file system. If you are reading or writing one small file at a time, this may cause you to achieve a low throughput to or from GCS. If you want high throughput, you will need to either use larger files to smooth across latency hiccups or read/write multiple files at a time.
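To illustrate the "multiple files at a time" advice, here is a rough sketch that copies files into a directory with several copies in flight at once, using find and xargs -P. `parallel_copy` is a hypothetical helper name, the default of 8 jobs is an arbitrary assumption, and I have not measured how much it helps on a real bucket.

```shell
# parallel_copy is a hypothetical helper, not a gcsfuse feature.
# Copies every regular file directly under SRC into DST,
# running up to JOBS cp processes concurrently (default 8).
parallel_copy() {
  local src="$1" dst="$2" jobs="${3:-8}"
  mkdir -p "$dst"
  # -print0 / -0 keep file names with spaces intact.
  find "$src" -maxdepth 1 -type f -print0 \
    | xargs -0 -P "$jobs" -I{} cp {} "$dst"/
}
```

For example, `parallel_copy ~/Downloads ~/sudip/vats/downloads 8` would copy into a folder under the mount point, where the destination folder name is made up for illustration.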

More On performance

Written on February 6, 2020