This guide provides information and resources to prepare for the GC Professional Data Engineer certification. It will give you advice based on the experience of other certified data engineers and architects.
Who Is This Certification For?
This certification and the examination to obtain it are designed for data engineers with at least three years of experience using GC technologies for massive information processing, both in real time and in batch. Candidates must also be able to apply best practices in each technology and understand how the technologies connect to form scalable solutions.
This certification is recommended if you are a data engineer with experience executing business intelligence, machine learning, or big data projects with GC and you know the concepts behind technologies like HDFS, Spark, Kafka, Hadoop, Streaming, etc.
What Is This Certification For?
This certification is useful for many reasons, including:
- Demonstrating your GC capabilities to solve real problems concerning massive data processing
- Reinforcing your knowledge of certain aspects behind each technology
- Understanding the path that the industry will follow in the coming years and how to focus your career
- Increasing your recognition in the industry as you search for new job opportunities, given the high demand for professionals in this field and the market value of being certified
For this certification, you only need to take the Professional Data Engineer exam. This exam assesses your ability to design data products, put them into production, and monitor and protect them to ensure scalable, high-performance solutions. It also measures your ability to design machine learning models using the range of options available in GC, including the use of pre-existing models (such as Vision or Speech API).
Your preparation for the exam should look like this:
- Analyze and study the contents of the Pluralsight and GC courses.
- After studying one of the technologies, practice with your own GC account by creating and deleting resources and connecting them with other GC services, so that you discover the steps and configurations involved. Hands-on practice is essential for this exam, as many questions can be understood only through experience.
- If you can, take a practice exam; GC offers one.
- Implement a complete data solution in your GC account, working with both real-time and batch data.
You can take this certification exam directly without taking any others, but do not take the exam if you have never had the opportunity to work with GC. You should also expect to study quite a few topics that you may never have had to work with, but that are still part of the technologies and standards.
Because this certification focuses very heavily on technical details, you should have at least five years of work experience with data solutions (ETL, machine learning, etc.), and two to three years of experience implementing projects with the GC technologies that you’ll be tested on in this exam.
The skills that will be measured during the exam may vary, but can include:
- Understanding business requirements for implementing solutions
- Collecting data in batch, real-time, and near-real-time processes
- Online and batch predictions in machine learning
- Data processing, both at the file and database levels (SQL and NoSQL)
- Creating conversational experiences for users with machine learning
- Information analysis and visualization
- Implementing pre-built machine learning models and custom models
- Applying security, encryption, and management to each part of the process
- Notions of infrastructure, especially in a hybrid environment
- Monitoring every process and data movement
In other words, you must understand, and be able to work across, the entire life cycle of a data project.
The technologies that you must understand and use to successfully complete the exam are:
- Cloud Dataproc
- Apache Beam
- Apache Spark
- Hadoop (and its ecosystem)
- Cloud Pub/Sub
- Data Transfer Service
- Transfer Appliance
- Cloud Networking
- Cloud Bigtable
- Cloud Spanner
- Cloud SQL
- Cloud Storage
- Cloud Datastore
- Cloud Memorystore
- Cloud Dataprep
- ML APIs (such as Vision, Speech)
- AutoML Vision
- AutoML Text
- Other ML technologies, such as Cloud Machine Learning Engine, BigQuery ML, Kubeflow, and Spark ML
- Cloud IAM
- Data Loss Prevention API
- Key management and encryption
As you can see, there are many technologies you’ll need to understand, so it could also be considered a prerequisite to have experience with these technologies and how they connect with each other.
GC has a platform called Qwiklabs that offers guided exercises. These can help you get to know the technologies hands-on, but they are not a substitute for true practical experience.
Compensation and Employment Outlook
The benefits of obtaining this certification include:
- Being able to participate in projects of high technical complexity related to massive data processing
- Recognition from the industry and your colleagues
- Direct benefits from GC, such as being registered in the Google certificates directory where anyone can find you, badges to share with the community, and other recognitions and discounts
- Qualification for better jobs; the certification is valid anywhere in the world
According to Paysa, the average annual salary of someone with the GC Professional Data Engineer certificate is US$152,428, and it can be higher if you have additional experience and certifications.