Chapter 2
Designing for Business Requirements
The Professional Cloud Architect Certification Exam objectives covered in this chapter include the following:
- ✓ 1.1 Designing a solution infrastructure that meets business requirements
- ✓ 1.2 Designing a solution infrastructure that meets technical requirements
- ✓ 1.5 Envisioning future solution improvements
One of the things that distinguishes a cloud architect from a cloud engineer is the architect’s need to work with business requirements. Architects work with colleagues responsible for business strategy, planning, development, and operations. Before architects can begin to design solutions, they need to understand their organization’s cloud vision and the objectives that the business is trying to achieve. These objectives frame the set of solutions that are acceptable to the business.
When it comes to the Professional Cloud Architect exam, business requirements are pieces of information that will help you determine the most appropriate answer to some of the questions. It is important to remember that questions on the exam may have more than one answer that appears correct, but only one answer is the best solution to the problem posed. For example, a question may ask about a small business application that needs to store data in a relational database, and the database will be accessed by only a few users in a single office. You could use the globally scalable relational database Cloud Spanner to solve this problem, but Cloud SQL would be a better fit for the requirements and would cost less. If you come across a question that seems to have more than one correct answer, consider all of the technical and business requirements. You may find that more than one option meets the technical or the business requirements, but only one option meets both.
This chapter reviews several key areas where business requirements are important to understand, including the following:
- Business use cases and product strategy
- Application design and cost considerations
- Systems integration and data management
- Compliance and regulations
- Security
- Success measures
Throughout the chapter, I will reference the three case studies used in the Google Professional Cloud Architect exam.
Business Use Cases and Product Strategy
Business requirements may be high-level, broad objectives, or they may be tightly focused specifications of some aspect of a service. High-level objectives are tied to a strategy, or plan, for achieving some vision or objective. These statements give us clues as to what the cloud solution will look like. In fact, we can often anticipate a number of technical requirements just from statements about business strategy and product offerings.
Let’s look at the three case studies and see what kinds of information can be derived to help formulate a solution.
Dress4Win Strategy
In the Dress4Win case study, the company’s objective is to “help their users organize and manage their personal wardrobes using a web app and a mobile application.” They are trying to achieve this objective by cultivating “an active social network that connects their users with designers and retailers,” which leads to revenue “through advertising, e-commerce, referrals, and a freemium app model.”
From these statements, we can derive several facts about any solution.
- This is a consumer application, so the scale of the overall problem is a function of the number of people using the service, the number of designers and retailers participating, and the number of products available.
- The solution includes web and mobile applications.
- The service acts as an intermediary connecting consumers with designers and retailers.
- The application supports advertising. It is not stated, but ads are commonly provided by a third-party service.
- The application supports e-commerce transactions, including purchases, returns, product catalogs, and referrals of customers to designers and retailers.
- The application supports both a free set of features and a premium set of features.
This list of business requirements helps us start to understand or at least estimate some likely aspects of the technical requirements. Here are some examples of technical implications that should be considered based on the facts listed previously:
- This is a consumer, e-commerce application, so it will probably need to support ACID transactions for purchases, returns, account balances, and so forth.
- Product attributes vary depending on the type of product. Shirts may have attributes such as color, size, and material, while belts have those attributes plus length. A NoSQL database, particularly a document database that supports flexible schemas, may be a good option for the product catalog (see the sketch following this list).
- The mobile app should provide for a good user experience even if connectivity is lost. Data may need to be synchronized between mobile devices and cloud databases.
- Customers, designers, and retailers will have to be authenticated. They will have access to different features of the application, so role-based access controls will be needed.
- On an e-commerce site that accepts payment cards, payment data will need to be protected in compliance with payment card industry standards and government regulations.
- E-commerce sites should be highly available in terms of both front-end and back-end subsystems.
- Wardrobes are seasonal. Dress4Win may need to scale up during peak purchasing periods, such as the start of summer or winter.
- To have low-latency response times in the application, you may want to store static data, such as the product catalog, on a content delivery network (CDN).
- You should understand where customers are located and investigate any regulations that may apply in different regions. For example, if some customers are in the European Union (EU), you should plan for the GDPR requirements for consumer privacy and limitations on storing EU citizen data outside of the EU.
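To make the flexible-schema point concrete, here is a minimal sketch (Python, using the google-cloud-datastore client; the kind and attribute names are hypothetical, and credentials are assumed to be configured) of storing products with differing attributes side by side in Cloud Datastore:

```python
from google.cloud import datastore

client = datastore.Client()

# Two products of different types; each entity carries only the
# attributes that make sense for it, with no fixed schema.
shirt = datastore.Entity(key=client.key("Product"))
shirt.update({"type": "shirt", "color": "blue", "size": "M", "material": "cotton"})

belt = datastore.Entity(key=client.key("Product"))
belt.update({"type": "belt", "color": "brown", "material": "leather", "length": 40})

client.put_multi([shirt, belt])
```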
These are just possible requirements, but it helps to keep them in mind when analyzing a use case. This example shows how many possible requirements can be inferred from just a few statements about the business and product strategy. It also shows that architects need to draw on their knowledge of systems design to anticipate requirements that are not explicitly stated, such as the need to keep application response times low, which in turn may require the use of a CDN.
Mountkirk Games Strategy
Mountkirk Games is a company that makes “online, session-based, multiplayer games for mobile platforms,” and it is currently “building a new game, which they expect to be very popular.” The company is already using cloud computing. In the Mountkirk Games case study, we see an example of a product strategy that is focused on correcting the limitations of the current solution.
The executive statement notes, “Our last successful game did not scale well with our previous cloud provider, resulting in lower user adoption and affecting the game’s reputation. Our investors want more KPIs to evaluate the speed and stability of the game, as well as other metrics that provide deeper insight into usage patterns so that we can adapt the game to target users.”
From these statements, we can derive several facts about any solution.
- Scalability is a top concern. Since scalability is a function of the entire solution, we will need to consider how to scale application processing services, storage systems, and analytic services.
- User experience is a top concern because that affects the game’s reputation. Games are designed for mobile devices, and players should have a good gaming experience in spite of connectivity problems.
- To ensure a good gaming experience, a user’s game should not be disrupted if a Compute Engine instance fails. Load balancing of stateless services and live migration of VMs support this need.
- The company expects the game to be popular, so any solution should scale quickly. The solution will likely include infrastructure-as-code, such as instance templates.
- The game will need to collect performance metrics and store them in a way that allows for KPI reporting. Since the company plans to deploy the game backend on Google Compute Engine, Stackdriver is a good option for monitoring and logging.
- The company will analyze usage patterns. This may require collecting game state data at short intervals. The data will need to be stored in a database that supports low-latency writes and scales readily. This may be a use case for Bigtable (see the sketch following this list), but more information is required to know for sure.
- Analysts will probably need support for exploratory data analysis and analytic processing. BigQuery would be a good option for meeting this requirement.
- Depending on the type of analysis, the company may want to use Cloud Dataproc for Spark and Hadoop-based analysis. If so, they may want to use Cloud Dataflow to preprocess data before loading it for analysis. Additional information is needed to know for sure.
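As an illustration of the kind of low-latency, time-series write that Bigtable supports, here is a minimal sketch (Python, using the google-cloud-bigtable client; the project, instance, table, column family, and row-key design are all hypothetical):

```python
from google.cloud import bigtable

client = bigtable.Client(project="mountkirk-project")  # hypothetical project
instance = client.instance("game-metrics")             # hypothetical instance
table = instance.table("player-events")                # hypothetical table

# A row key combining player ID and timestamp keeps one player's
# events together while spreading writes across the keyspace.
row = table.direct_row(b"player#1234#2019-10-01T12:00:00")
row.set_cell("stats", b"session_length_sec", b"310")   # assumes the "stats" family exists
row.set_cell("stats", b"score", b"4200")
row.commit()
```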
Here again, we see that business and product strategy can help us understand some likely requirements. There is not enough information in the business requirements to make definitive decisions about technical solutions, but they do allow us to start formulating possible solutions.
TerramEarth Strategy
The TerramEarth business problem is summed up in the executive statement: “Our competitive advantage has always been in our manufacturing process, with our ability to build better vehicles for lower cost than our competitors. However, new products with different approaches are constantly being developed, and I’m concerned that we lack the skills to undergo the next wave of transformations in our industry. My goals are to build our skills while addressing immediate market needs through incremental innovations.”
The executive here wants to continue to compete but is concerned that the company does not have the collective skills needed to do so. The objective is to build those skills over time while releasing incremental improvements. Let’s break that down into business processes that can support both of those objectives.
It is not clear what skills are needed to continue to innovate, but any skills involving mechanical engineering and manufacturing processes are outside the scope of cloud solution design. If the skills include analytical skills to understand customer needs and equipment performance better, then they fall within the scope of this effort.
“Releasing incremental improvements” fits well with agile software development and continuous integration/continuous deployment practices. Incremental improvements could be targeted at improving the experience for the 500 dealers or at improving the quality of analysis and predictive modeling.
The business focus in this case is on responding to competitive pressures in a market where new products with different approaches are frequently released. The business strategy statement here does not provide as much information about product requirements, but we can anticipate some possible solution components.
- If agile practices are not in place, they will likely be adopted. Solutions will include support for tools such as code repositories and continuous integration/continuous deployment.
- Services that implement the solution should be monitored to help identify problems, bottlenecks, and other issues. Stackdriver may be used for collecting metrics and logging.
- Analytics skills should be developed and supported with data management and analysis services. A data warehouse may be needed to store large amounts of data for use in data analysis, in which case BigQuery would be a good option. Big data analytics platforms, like Spark and Hadoop, may be used by data scientists. AutoML Tables may be useful for building classification models for predicting parts failures.
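If BigQuery were adopted as the data warehouse, analysts could explore the data with standard SQL. A minimal sketch (Python, using the google-cloud-bigquery client; the dataset and table names are hypothetical) of the kind of exploratory query that could support predicting parts failures:

```python
from google.cloud import bigquery

client = bigquery.Client()

# Hypothetical telemetry table; counts recorded failures by part number.
query = """
    SELECT part_number, COUNT(*) AS failures
    FROM `terramearth.telemetry.part_failures`
    GROUP BY part_number
    ORDER BY failures DESC
    LIMIT 10
"""
for row in client.query(query):  # iterating waits for the job to finish
    print(row.part_number, row.failures)
```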
If this were a real-world use case, architects would gather more information about the specific business objectives. This is a good example of a case study that requires architects to exercise “soft skills,” such as soliciting information from colleagues and describing possible solutions to help identify missed requirements.
Business requirements are a starting point for formulating a technical solution. Architects have to apply their knowledge of systems design to map business requirements into possible technical requirements. After that, they can dig into explicit technical requirements to start to formulate technical solutions.
The key point of this section is that business requirements are not some kind of unnecessary filler in the case studies. They provide the broad context in which a solution will be developed. While they do not provide enough detail to identify solution components on their own, they do help us narrow the set of feasible technical solutions. Business requirements can help you rule out options to an exam question. For example, a data storage solution that distributes data across multiple regions by default may meet all technical requirements, but if a business requirement indicates the data to be stored must be located in a specific country, then the correct answer is the one that lets you limit where the data is stored.
Application Design and Cost Considerations
In addition to specifying business and product strategy, business requirements may state things that you should consider in application design, such as a preference for managed services and the level of tolerance for disruptions in processing. Implicit in business requirements is the need to minimize costs while meeting business objectives.
One measure of costs is total cost of ownership (TCO). TCO is the combination of all expenses related to maintaining a service, which can include the following:
- Software licensing costs
- Cloud computing costs, including infrastructure and managed services
- Cloud storage costs
- Data ingress and egress charges
- Cost of DevOps personnel to develop and maintain the service
- Cost of third-party services used in an application
- Charges against missed service-level agreements
- Network connectivity charges, such as those for a dedicated connection between an on-premises data center and Google Cloud
While you will want to minimize the TCO, you should be careful not to try to minimize the cost of each component separately. For example, you may be able to reduce the cost of DevOps personnel to develop and maintain a service if you increase your spending on managed services. Also, it is generally a good practice to find a feasible technical solution to a problem before trying to optimize that solution for costs.
Some of the ways to reduce costs while meeting application design requirements include managed services, preemptible virtual machines, and data lifecycle management. Google also offers sustained use discounts and reserved VMs, which can help reduce costs.
Managed Services
Managed services are Google Cloud Platform services that do not require users to perform common configuration, monitoring, and maintenance operations. For example, Cloud SQL is a managed relational database service providing MySQL and PostgreSQL databases. Database administration tasks, such as backing up data and patching operating systems, are performed by Google and not by customers using the service. Managed services are good options when:
- Users do not need low-level control over the resources providing the service, such as choosing the operating system to run in a VM.
- Managed services provide a functionality that would be difficult or expensive to implement, such as developing a machine vision application.
- There is little competitive advantage to performing resource management tasks. For example, the competitive advantage that may come from using Apache Spark for analytics stems from the algorithms and analysis methodologies, not from the administration of the Spark cluster.
Architects do not necessarily need to know the details of how managed services work. This is one of their advantages. Architects do not need to develop an expertise in natural language processing to use the Cloud Natural Language API, but they do need to understand what kinds of functions managed services provide. See Table 2.1 for brief descriptions of some of the Google Cloud Platform managed services.
This table is provided to show the breadth of Google Cloud Platform managed services. The services offered change over time. Managed services may be generally available, or they may be in beta. For a list of current services, see the Google Cloud Platform Services Summary at https://cloud.google.com/terms/services.
Table 2.1 Examples of Google Cloud Platform Managed Services
| Service Name | Service Type | Description |
| --- | --- | --- |
| AutoML Tables | AI and machine learning | Machine learning models for structured data |
| Cloud AutoML | AI and machine learning | Trains machine learning models |
| Cloud Inference API | AI and machine learning | Correlations over time-series data |
| Cloud Speech-to-Text | AI and machine learning | Converts speech to text |
| Cloud Text-to-Speech | AI and machine learning | Converts text to speech |
| Natural Language | AI and machine learning | Text analysis, including sentiment analysis and classification |
| Translation | AI and machine learning | Translates text between languages |
| BigQuery | Analytics | Data warehousing and analytics |
| Cloud Datalab | Analytics | Interactive data analysis tool based on Jupyter Notebooks |
| Cloud Dataprep | Analytics | Explore, clean, and prepare structured data |
| Dataproc | Analytics | Managed Hadoop and Spark service |
| Google Data Studio | Analytics | Dashboard and reporting tool |
| Cloud Data Fusion | Data management | Data integration and ETL tool |
| Data Catalog | Data management | Metadata management service |
| Dataflow | Data management | Stream and batch processing |
| Cloud Spanner | Database | Global relational database |
| Cloud SQL | Database | Relational database |
| Cloud Deployment Manager | Development | Infrastructure-as-code service |
| Cloud Source Repositories | Development | Version control and collaboration service |
| Cloud Pub/Sub | Messaging | Messaging service |
| Cloud Composer | Orchestration | Data workflow orchestration service |
| Bigtable | Storage | Wide-column NoSQL database |
| Cloud Data Transfer | Storage | Bulk data transfer service |
| Cloud Memorystore | Storage | Managed cache service using Redis |
| Cloud Storage | Storage | Managed object storage service |
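One advantage of managed services is how little code they require. As an example, here is a minimal sketch (Python, assuming a recent google-cloud-language client and configured credentials) of calling the Natural Language API from Table 2.1 to score the sentiment of a piece of text:

```python
from google.cloud import language_v1

client = language_v1.LanguageServiceClient()
document = language_v1.Document(
    content="The checkout flow was fast and painless.",
    type_=language_v1.Document.Type.PLAIN_TEXT,
)

# Sentiment score ranges from -1.0 (negative) to 1.0 (positive).
response = client.analyze_sentiment(request={"document": document})
print(response.document_sentiment.score)
```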
Preemptible Virtual Machines
One way to minimize computing costs is to use preemptible virtual machines, which are short-lived VMs that cost about 80 percent less than their nonpreemptible counterparts. Here are some things to keep in mind about preemptible VMs when considering business and technical requirements:
- Preemptible VMs may be shut down at any time by Google. They will be shut down after running for 24 hours.
- GCP will signal a VM before shutting it down. The VM will have 30 seconds to stop processes gracefully (see the sketch at the end of this section).
- Preemptible VMs can have local SSDs and GPUs if additional storage and compute resources are needed.
- If you want to replace a preempted VM automatically, you can create a managed instance group for preemptible VMs. If resources are available, the managed instance group will replace the preempted VM.
- Preemptible VMs can be used with some managed services, such as Cloud Dataproc, to reduce the overall cost of the service.
Preemptible VMs are well suited for batch jobs or other workloads that can be disrupted and restarted. They are not suitable for running services that need high availability, such as a database or user-facing service, like a web server.
Preemptible VMs are also not suitable for applications that manage state in memory or on the local SSD. Preemptible VMs cannot be configured to live migrate; when they are shut down, locally persisted data and data in memory are lost. If you want to use preemptible VMs with stateful applications, consider using Cloud Memorystore, a managed Redis service for caching data, or using a database to store state.
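If preemptible VMs are used for batch workloads, the application can watch for the preemption signal and checkpoint its work within the 30-second window. Here is a minimal sketch (Python, intended to run on the VM itself; the checkpoint function is a hypothetical placeholder) that polls the metadata server’s preempted flag:

```python
import time

import requests

METADATA_URL = ("http://metadata.google.internal/computeMetadata/v1/"
                "instance/preempted")


def checkpoint_work():
    """Hypothetical: flush partial results to Cloud Storage or a database."""


while True:
    resp = requests.get(METADATA_URL, headers={"Metadata-Flavor": "Google"})
    if resp.text.strip() == "TRUE":
        checkpoint_work()  # roughly 30 seconds remain before shutdown
        break
    time.sleep(5)
```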
Data Lifecycle Management
When assessing the application design and cost implications of business requirements, consider how to manage storage costs. Storage options lie on a spectrum from short term to archival storage.
- Memorystore is a cache, and data should be kept in cache only if it is likely to be used in the very near future. The purpose of this storage is to reduce the latency of accessing data.
- Databases, like Cloud SQL and Cloud Datastore, store data that needs to be persisted and readily accessed by an application or user. Data should be kept in the database while it may still be queried or updated. When data no longer needs to be queried or updated, it can be exported and stored in object storage.
- In the case of time-series databases, data may be aggregated over larger time spans as time goes on. For example, an application may collect performance metrics every minute. After three days, there is no need to query at the minute level of detail, and data can be aggregated to the hour level. After one month, data can be aggregated to the day level. This incremental aggregation saves space and improves response times for queries that span large time ranges.
- Object storage is often used for unstructured data and backups. Multiregional and regional storage should be used for frequently accessed data. If data is accessed at most once a month, then Nearline storage can be used. When data is not likely to be accessed more than once a year, then Coldline storage should be used.
Consider how to take advantage of Cloud Storage’s lifecycle management features, which allow you to specify actions to perform on objects when specific events occur (see the sketch following this list). The two actions supported are deleting an object and changing its storage class. Multiregional and regional storage objects can be migrated to either Nearline or Coldline storage. Nearline storage can migrate only to Coldline storage. Lifecycle conditions can be based on the following:
- The age of an object
- When it was created
- The object’s storage class
- The number of versions of an object
- Whether or not the object is “live” (objects in nonversioned buckets are “live”; archived versions of objects in versioned buckets are not)
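As a concrete illustration, here is a minimal sketch (Python, using the google-cloud-storage client; the bucket name and age thresholds are hypothetical) that adds two lifecycle rules to a bucket:

```python
from google.cloud import storage

client = storage.Client()
bucket = client.get_bucket("example-archive-bucket")  # hypothetical bucket

# Move objects to Nearline after 30 days, then delete them after 5 years.
bucket.add_lifecycle_set_storage_class_rule("NEARLINE", age=30)
bucket.add_lifecycle_delete_rule(age=1825)
bucket.patch()  # persists the updated lifecycle configuration
```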
It is worth noting that BigQuery has a two-tiered pricing model for data storage. Active data is data in tables that have been modified in the last 90 days. Long-term data is in tables that have not been modified in the last 90 days, and it is billed at a lower rate than active data.
Systems Integration and Data Management
Business requirements can give information that is useful for identifying dependencies between systems and how data will flow through those systems.
Systems Integration Business Requirements
One of an architect’s responsibilities is to ensure that systems work together. Business requirements will not specify technical details about how applications should function together, but they will state what needs to happen to data or what functions need to be available to users. Let’s review examples of systems integration considerations in the case studies.
Dress4Win Systems Integration
In the Dress4Win case study, the description notes that the company generates revenue through referrals, presumably between customers and designers and retailers. To make referrals, a system must analyze customer data as well as designer and retailer profiles. The results of that analysis generate recommended matches.
The matching application will need to integrate data from customers, retailers, and designers. Most likely, this processing will be done in batch since it is not time sensitive. Also, this kind of analysis is best done with analytic platforms such as Cloud Dataproc or BigQuery, so data will need to flow from the e-commerce transaction system to an analytics system. This could require the use of Cloud Dataflow or Cloud Data Fusion, depending on the specific integration and data preprocessing requirements.
Mountkirk Games Systems Integration
Let’s consider how datastores and microservices architectures can influence systems integration.
Online games, like those produced by Mountkirk Games, use more than one type of datastore. Player data, such as the player’s in-game possessions and characteristics, could be stored in a document database like Cloud Datastore, while the time-series data could be stored in Bigtable, and billing information may be kept in a transaction processing relational database. Architects should consider how data will be kept complete and consistent across datastores. For example, if a player purchases a game item, then the application needs to ensure that the item is added to the player’s possession record in the player database and that the charge is authorized by the payment system. If the payment is not authorized, the possession should not be available to the player.
It is likely that Mountkirk Games uses a microservices architecture. Microservices are services that each implement a single function of an application, such as storing player data or recording player actions. An aggregation of microservices implements an application. Microservices make their functions accessible through application programming interfaces (APIs). Depending on security requirements, services may require that calls to their API functions be authenticated. Services that are accessible only to other trusted game services may not require authentication. High-risk services, such as a payment service, may require more security controls than other services. Cloud Endpoints may help manage APIs and help secure and monitor calls to microservices.
TerramEarth Systems Integration
From the description of the current system, we can see that on-board applications communicate with a centralized data collection system. Some of the data is collected in batches when vehicles are serviced, and some is collected as it is generated using cellular network communications. TerramEarth works with 500 dealers, and one of the business requirements is to “support the dealer network with more data on how their customers use their equipment to better position new products and services.”
As part of envisioning future needs, an architect working with TerramEarth should consider how to support more vehicles sending data directly, eventually retiring the batch data load process in favor of real-time or near-real-time data uploads for all vehicles. This would require planning an ingest pipeline that can receive data reliably and consistently and perform any preprocessing necessary.
Services that store and analyze the data will need to scale to support up to tens of millions of vehicles transmitting data. To accomplish this, consider using a Cloud Pub/Sub queue, which decouples senders from receivers and buffers data so that no data is lost if ingestion services cannot keep up with the rate at which new data arrives.
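Here is a minimal sketch of the publishing side (Python, using the google-cloud-pubsub client; the project, topic, and payload are hypothetical):

```python
from google.cloud import pubsub_v1

publisher = pubsub_v1.PublisherClient()
# Hypothetical project and topic names.
topic_path = publisher.topic_path("terramearth-project", "vehicle-telemetry")

# Publish is asynchronous; the returned future resolves to a message ID
# once Cloud Pub/Sub has durably stored the message.
future = publisher.publish(
    topic_path, data=b'{"vehicle_id": "TE-1001", "fuel_pct": 62}'
)
print(future.result())
```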
TerramEarth seeks to improve data sharing with dealers so that they can better understand how customers use their equipment. This will require integrating TerramEarth and dealer systems. The architect should gather additional details to understand how best to share data with dealers. For example, an architect may ask, “What features of the data are significant to dealers?” Also, architects should consider how dealers would access the data. Dealer applications could query TerramEarth APIs for data or TerramEarth could send data directly to dealer data warehouses or other reporting platforms.
Each case study has examples of systems integration requirements. When considering these requirements, keep in mind the structure and volume of data exchanged, the frequency of data exchange, the need for authentication, the reliability of each system, how to prevent data loss in the event of a problem with one of the services, and how to protect services from intentional or unintentional bursts in API requests that could overwhelm the receiving services.
Data Management Business Requirements
In addition to using business requirements to understand which systems need to work together, architects can use those requirements to understand data management business requirements. At a minimum, data management considerations include the following:
- How much data will be collected and stored?
- How long will it be stored?
- What processing will be applied to the data?
- Who will have access to the data?
How Much Data Is Stored?
One of the first questions asked about data management is “How much data are we expecting, and at what rate will we receive it?” Knowing the expected volumes of data will help plan for storage.
If data is being stored in a managed service like Cloud Storage, then Google will manage the provisioning of storage space, but you should still understand the volume of data so that you can accurately estimate storage costs and work within storage limits.
It is important to plan for adequate storage capacity. Those responsible for managing storage will also need to know the rate at which new data arrives and existing data is removed from the system; together, these determine the growth rate of storage capacity.
How Long Is Data Stored?
It is also important to understand how long data will be stored in various storage systems. Data may first arrive in a Cloud Pub/Sub queue and be immediately processed by a Cloud Function that removes the data from the queue, transforms it, and writes it to another storage system. In this case, the data is usually in the queue for only a short period of time. If the ingestion process is down, data may accumulate in the Cloud Pub/Sub topic. Since Cloud Pub/Sub is a managed service, the DevOps team would not have to allocate additional storage to the queue if there is a backlog of data; GCP takes care of that. They will, however, have to consider how long the data should be retained in the queue. For example, there may be little or no business value in data that is more than seven days old. In that case, the team should configure the Cloud Pub/Sub subscription with a seven-day retention period.
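As a sketch of how that retention period might be configured (Python, assuming a recent google-cloud-pubsub client; the project, topic, and subscription names are hypothetical):

```python
import datetime

from google.cloud import pubsub_v1

subscriber = pubsub_v1.SubscriberClient()
topic_path = subscriber.topic_path("example-project", "telemetry")
sub_path = subscriber.subscription_path("example-project", "telemetry-ingest")

# Retain unacknowledged messages for up to seven days.
subscriber.create_subscription(
    request={
        "name": sub_path,
        "topic": topic_path,
        "message_retention_duration": datetime.timedelta(days=7),
    }
)
```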
If data is stored in Cloud Storage, you can take advantage of the service’s lifecycle policies to delete data or change its storage class as needed. For example, if data is rarely accessed after 45 days, it can be moved to Nearline or Coldline storage. If the data is accessed once per month or less, then Nearline storage is an appropriate choice. If it is accessed once a year or less, then Coldline storage is an appropriate choice.
When data is stored in a database, you will have to develop procedures for removing data that is no longer needed. In this case, the data could be backed up to Cloud Storage for archiving, or it could be deleted without keeping a copy. You should consider the trade-offs between the possible benefits of having data available and the cost of storing it. For instance, machine learning models can take advantage of large volumes of data, so it may be advisable to archive data even if you cannot anticipate a use for the data at this time.
What Processing Is Applied to the Data?
There are several things to consider about how data is processed. These include the following:
- Distance between the location of stored data and services that will process the data
- Volume of data that is moved from storage to processing services
- Acceptable latency when reading and writing the data
- Stream or batch processing
- In the case of stream processing, how long to wait for late arriving data
The distance between the location of data and where it is processed is an important consideration. This affects both the time it takes to read and write the data as well as, in some cases, the network costs for transmitting the data. If data will be accessed from multiple geographic regions, consider using multiregional storage when using Cloud Storage. If you are storing data in a relational database, consider replicating data to a read-only replica located in a region closer to where the data will be read. If there is a single write instance of the database, then this approach will not improve the time to ingest or update the data.
Understand if the data will be processed in batches or as a stream. Batch processes tend to tolerate longer latencies, so moving data between regions may not create problems for meeting business requirements around how fast the data needs to be processed. If data is now being processed in batch but it will be processed as a stream in the future, consider using Cloud Dataflow, Google’s managed Apache Beam service. Apache Beam provides a unified processing model for batch and stream processing.
When working with stream processing, you should consider how you will deal with late arriving and missing data. It is a common practice in stream processing to assume that no data older than a specified time will arrive. For example, if a process collects telemetry data from sensors every minute, a process may wait up to five minutes for data. If the data has not arrived by then, the process assumes that it will never arrive.
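In Apache Beam terms, this waiting period is the window’s allowed lateness. Here is a minimal sketch (Python, Apache Beam SDK; the in-memory source is a stand-in for a real streaming source such as Cloud Pub/Sub) of one-minute windows that accept data up to five minutes late:

```python
import apache_beam as beam
from apache_beam import window
from apache_beam.transforms.trigger import AccumulationMode, AfterWatermark

with beam.Pipeline() as pipeline:
    windowed = (
        pipeline
        | beam.Create([("sensor-1", 98.6)])   # stand-in for a streaming source
        | beam.WindowInto(
            window.FixedWindows(60),          # one-minute windows
            trigger=AfterWatermark(),
            accumulation_mode=AccumulationMode.DISCARDING,
            allowed_lateness=300,             # wait five minutes for late data
        )
    )
```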
Business requirements help shape the context for systems integration and data management. They also impose constraints on acceptable solutions. In the context of the exam, business requirements may imply technical requirements that can help you identify the correct answer to a question.
Compliance and Regulation
Businesses and organizations may be subject to one or more regulations. For example, it is likely that Mountkirk Games accepts payment using credit cards and so is subject to financial services regulations governing payment cards. Part of analyzing business requirements is to understand which, if any, regulations require compliance. Regulations have different requirements, depending on their objectives. Some widely recognized regulations include the following:
- Health Insurance Portability and Accountability Act (HIPAA), which addresses the privacy and security of medical information in the United States.
- General Data Protection Regulation (GDPR), which defines privacy protections for persons in and citizens of the European Union.
- The Sarbanes-Oxley (SOX) Act, which regulates business reporting of publicly traded companies to ensure the accuracy and reliability of financial statements in order to mitigate the risk of corporate fraud. This is a U.S. federal regulation.
- Children’s Online Privacy Protection Act (COPPA), a U.S. law that regulates how websites collect personal information from children under the age of 13.
- Payment Card Industry Data Security Standard (PCI DSS), an industry security standard that applies to businesses that accept payment cards. The standard specifies security controls that must be in place to protect cardholders’ data.
It is important to understand which regulations apply to the systems you design. Some regulations apply because the business or organization developing cloud applications operates in a particular jurisdiction. HIPAA applies to healthcare providers in the United States. Companies that operate in the state of California in the United States may also be subject to the California Consumer Privacy Act when it goes into effect in 2020. If a business operates in North America but has customers in Europe, it may be subject to GDPR.
Some regulations apply by virtue of the industry in which the business or organization operates. HIPAA governs healthcare providers and others with access to protected health information. Banks in the United States are subject to the Financial Services Modernization Act, also known as the Gramm-Leach-Bliley Act (GLBA), which specifies privacy protections for consumers’ nonpublic financial information.
Privacy Regulations
Regulations placed on data are often designed to ensure privacy and protect the integrity of data. A large class of regulations governs privacy. HIPAA, GLBA, GDPR, and a host of national laws are designed to limit how personal data is used and to provide individuals with some level of control over their information. More than 40 countries and jurisdictions, including the European Union and Singapore, have privacy regulations. (See https://www.privacypolicies.com/blog/privacy-law-by-country/ for a list of countries and links to additional information.) Industry regulations, like PCI DSS, also include protections for keeping data confidential.
From an architect’s perspective, privacy regulations require that we plan on ways to protect data through its entire lifecycle. This begins when data is collected, for example, when a patient enters some medical information into a doctor’s scheduling application. Protected data should be encrypted before transmitting it to cloud applications and databases. Data should also be encrypted when stored. This is sometimes called encrypting data in motion and data at rest.
Access controls should be in place to ensure that only authenticated and authorized persons and service accounts can access protected data. In some cases, applications may need to log changes to data. In those cases, logs must be tamperproof.
Networks and servers should be protected with firewalls and other measures to limit access to servers that process protected data. With Google Cloud, architects and developers can take advantage of the Cloud Identity-Aware Proxy to verify a user’s identity in the context of a request to a service and determine whether that operation should be allowed.
Security best practices should be used as well. This includes following the principle of least privilege, so users and service accounts have only the permissions that are required to carry out their responsibilities. Also practice defense in depth. That principle assumes any security control may be compromised, so systems and data should be protected with multiple different types of controls.
Data Integrity Regulations
Data integrity regulations are designed to protect against fraud. SOX, for example, requires regulated businesses to have controls on systems that prevent tampering with financial data. In addition, businesses need to be able to demonstrate that they have these controls in place. This can be done with application logs and reports from security systems, such as vulnerability scanners or anti-malware applications.
Depending on regulations, applications that collect or process sensitive data may need to use message digests and digital signing to protect data with tamper-proof protections.
Many of the controls used to protect privacy, such as encryption and blocking mechanisms, like firewalls, are also useful for protecting data integrity.
In addition to stated business requirements, it is a good practice to review compliance and regulations with the business owners of the systems that you design. You will often find that the security protections required by regulations overlap with the controls that are used in general to secure information systems and data.
Security
Information security, also known as infosec and cybersecurity, is a broad topic. In this section, you will focus on understanding high-level security requirements based on business requirements. Chapter 7, “Designing for Security and Legal Compliance,” will go into more detail on cloud security measures.
Business requirements for security tend to fall into three areas: confidentiality, integrity, and availability.
Confidentiality
Confidentiality is about limiting access to data. Only users and service accounts with legitimate business needs should have access to data. Even if regulations do not require keeping some data confidential, it is a good practice to protect confidentiality. Using HTTPS instead of HTTP and encrypting data at rest should be standard practice. Fortunately, for GCP users, Google Cloud provides encryption at rest by default.
When we use default encryption, Google manages the encryption keys. This requires the least work from customers and DevOps teams. If there is a business requirement that the customer and not Google manage the keys, you can design for customer-managed encryption keys using Cloud KMS, or you can use customer-supplied encryption keys. In the former case, keys are kept in the cloud. When using customer-supplied keys, they are stored on-premises.
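For example, here is a minimal sketch (Python, using the google-cloud-storage client; the bucket name, project, and key resource name are hypothetical) of pointing a Cloud Storage bucket at a customer-managed Cloud KMS key:

```python
from google.cloud import storage

client = storage.Client()
bucket = client.get_bucket("example-secret-data")  # hypothetical bucket

# New objects written to the bucket are then encrypted with this
# customer-managed key instead of a Google-managed one.
bucket.default_kms_key_name = (
    "projects/example-project/locations/us/keyRings/example-ring/"
    "cryptoKeys/example-key"
)
bucket.patch()
```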
Protecting servers and networks is also part of ensuring confidentiality. When collecting business requirements, look for requirements for additional measures, for example if a particular hardened operating system must be used. This can limit your choice of computing services. Also determine what kind of authentication is required. Will two-factor authentication be needed? Start thinking about roles and permissions. Will custom IAM roles be required? Determine what kinds and level of audit logging are required.
Integrity
Protecting data integrity is a goal of some of the regulations discussed earlier, but it is a general security requirement in any business application. The basic principle is that only people or service accounts with legitimate business needs should be able to change data, and then only for legitimate business purposes.
Access controls are a primary tool for protecting data integrity. Google Cloud Platform has defined a large number of roles to grant permissions easily according to common business roles. For example, App Engine has roles for administrators, code viewers, deployers, and others. This allows security administrators to assign fine-grained roles to users and service accounts while still maintaining the principle of least privilege.
Server and network security measures also contribute to protecting data integrity.
When collecting and analyzing business requirements, seek to understand the roles that are needed to carry out business operations and which business roles or positions will be assigned those roles. Pay particular attention to who is allowed to view and update data and use separate roles for users who have read-access only.
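As an illustration of separating read-only access, here is a minimal sketch (Python, using the google-cloud-storage client; the bucket and group names are hypothetical) that grants a group the Storage Object Viewer role on a bucket:

```python
from google.cloud import storage

client = storage.Client()
bucket = client.bucket("example-reports")  # hypothetical bucket

policy = bucket.get_iam_policy(requested_policy_version=3)
# Object viewers can read objects but cannot create, change, or delete them.
policy.bindings.append(
    {
        "role": "roles/storage.objectViewer",
        "members": {"group:report-readers@example.com"},
    }
)
bucket.set_iam_policy(policy)
```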
Availability
Availability is a bit different from confidentiality and integrity. Here the goal is to ensure that users have access to a system. Malicious activities, such as distributed denial-of-service (DDoS) attacks, malware infection, and encrypting data without authorization can degrade availability.
During the requirements-gathering phase of a project, consider any unusual availability requirements. With respect to security, the primary focus is on preventing malicious acts. From a reliability perspective, availability is about ensuring redundant systems and failover mechanisms to ensure that services continue to operate in spite of component failures.
Security should be discussed when collecting business requirements. At this stage, it is more important to understand what the business expects in terms of confidentiality, integrity, and availability. We get into technical and implementation details after first understanding the business requirements.
Success Measures
Businesses and other organizations are moving to the cloud because of its value. Businesses can more efficiently develop, deploy, and run applications, especially when they design in ways that take advantage of the cloud. Decision-makers typically want to measure the value of their projects. This enables them to allocate resources to the more beneficial projects while avoiding others that may not prove worthwhile. Two common ways to measure progress and success are key performance indicators and return on investment.
Key Performance Indicators
A KPI is a measurable value of some aspect of the business or its operations that indicates how well the organization is achieving its objectives. A sales department may use the total value of sales in the last week as a KPI, while a DevOps team might use CPU utilization as a KPI for efficient use of compute resources.
Project KPIs
Project managers may use KPIs to measure the progress of a cloud migration project. KPIs in that case may include the volume of data migrated to the cloud and no longer stored on-premises, the number of test cases run each day in the cloud instead of on-premises, or the number of workload hours running in the cloud instead of on-premises.
You can see from these examples that KPIs can be highly specific and tailored to a particular kind of project. Often, you will have to define how you will measure a KPI. For example, a workload hour may be defined based on the wall clock time and the number of CPUs dedicated to a workload.
The definition of a KPI should allow for an obvious way to measure the indicator. The details of the definition should be stated early in the project to help team members understand the business objectives and how they will be measured.
Operations KPI
Line-of-business managers may use KPIs to measure how well operations are running. These KPIs are closely aligned with business objectives. A retailer may use total sales revenue, while a telecommunications company may monitor reduction in customer churn, in other words, customers taking their business to a competitor. A financial institution that makes loans might use the number of applications reviewed as a measure of how well the business is running.
For architects, it is important to know how the business will measure the success of a project or operations. KPIs help us understand what is most important to the business and what drives decision-makers to invest in a project or line of business.
Return on Investment
ROI is a way of measuring the monetary value of an investment. ROI is expressed as a percentage, and it is based on the value of some aspect of the business after an investment when compared to its value before the investment. The return, or increase or loss, after an investment divided by the cost of the investment is the ROI. The formula for ROI is as follows:

ROI = (value after investment − cost of investment) / cost of investment × 100%
For example, if a company invests $100,000 in new equipment and this investment generates a value of $145,000, then the ROI is 45 percent.
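The arithmetic of that example, as a quick check:

```python
investment = 100_000
value_generated = 145_000

roi = (value_generated - investment) / investment
print(f"{roi:.0%}")  # prints "45%"
```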
In cloud migration projects, the investment includes the cost of cloud services, employee and contractor costs, and any third-party service costs. The value of the investment can include the expenses saved by not replacing old equipment or purchasing new equipment, savings due to reduced power consumption in a data center, and new revenue generated by applications and services that scale up in the cloud but were constrained when run on-premises.
Success measures such as KPIs and ROI are a formal way of specifying what the organization values with respect to a project or line of business. As an architect, you should know which success measures are being used so that you can understand how the business measures the value of the systems that you design.
Summary
The first stages of a cloud project should begin with understanding the business use cases and product strategy. This information sets the context for later work on the technical requirements analysis.
One part of business requirements analysis includes application design and cost considerations. Application design considerations include assessing the possible use of managed services and preemptible virtual machines. Data lifecycle management is also a factor in application design.
In addition to business drivers, consider regulations that may apply to your projects. Many regulations are designed to protect individuals’ privacy or to ensure the integrity of data to prevent fraud. Compliance with regulations may require additional security controls or application features that otherwise would not be implemented.
Security business requirements can be framed around three objectives: protecting confidentiality, preserving the integrity of data, and ensuring the availability of services, especially with respect to malicious acts that could disrupt services.
Business and other organizations will often monitor the progress of projects and the efficiency and profitability of lines of business using success measures, such as KPIs and ROI.
Exam Essentials
Study the three case studies: Dress4Win, Mountkirk Games, and TerramEarth. You will have access to the case studies during the exam, but you can save time if you are already familiar with the details of each. Also, think through the implications of the business requirements in order to understand how they constrain technical solution options.
Understand business terms like total cost of ownership (TCO), key performance indicators (KPIs), and return on investment (ROI). You will almost certainly not have to calculate any of these measures, but you should understand what they measure and why they are used by business executives and analysts.
Learn about Google Cloud Platform managed services and for what purposes they are used. These services can be used instead of deploying and managing applications and managing servers, storage, networking, and so forth. If a business requirement includes using managed services or reducing the workload on a DevOps team, you may be able to use one or more of these services to solve problems presented on the exam.
Understand the elements of data management. This includes how much data will be collected and stored, how long it will be stored, what processing will be applied to it, and who will have access to it.
Understand how compliance with regulations can introduce additional business requirements. Businesses and organizations may be subject to one or more regulations. Part of analyzing business requirements is to understand which, if any, regulations require compliance.
Understand the three main aspects of security with respect to business requirements. They are confidentiality, integrity, and availability. Confidentiality focuses on limiting access to data to those who have a legitimate business need for it. Integrity ensures that data is not tampered with or corrupted in ways that could enable fraud or other misuse. Availability addresses the need to prevent malicious acts from disrupting access to services.
Know why decision-makers use success measures. Two well-known success measures are KPIs and ROI. KPIs are measures that are specific to a business operation. ROI is a general measure based on the cost of an investment versus the additional benefit realized because of that investment.
Review Questions
1. In the Dress4Win case study, the volume of data and compute load will grow with respect to what factor?
   A. The number of customers, designers, and retailers
   B. The time the application is running
   C. The type of storage used
   D. Compliance with regulations

2. You have received complaints from customers about long wait times while loading application pages in their browsers, especially pages with several images. Your director has tasked you with reducing latency when accessing and transmitting data to a client device outside the cloud. Which of the following would you use? (Choose two.)
   A. Multiregional storage
   B. Coldline storage
   C. CDN
   D. Cloud Pub/Sub
   E. Cloud Dataflow

3. Mountkirk Games will analyze game players’ usage patterns. This will require collecting time-series data, including game state. What database would be a good option for doing this?
   A. BigQuery
   B. Bigtable
   C. Cloud Spanner
   D. Cloud Storage

4. You have been hired to consult with a new data warehouse team. They are struggling to meet schedules because they repeatedly find problems with data quality and have to write preprocessing scripts to clean the data. What managed service would you recommend for addressing these problems?
   A. Cloud Dataflow
   B. Cloud Dataproc
   C. Cloud Dataprep
   D. Cloud Datastore

5. You have deployed an application that receives data from sensors on TerramEarth equipment. Sometimes more data arrives than can be processed by the current set of Compute Engine instances. Business managers do not want to run additional VMs. What changes could you make to ensure that data is not lost because it cannot be processed as it is sent from the equipment? Assume that business managers want the lowest-cost solution.
   A. Write data to local SSDs on the Compute Engine VMs.
   B. Write data to Cloud Memorystore, and have the application read data from the cache.
   C. Write data from the equipment to a Cloud Pub/Sub queue, and have the application read data from the queue.
   D. Tune the application to run faster.

6. Your company uses Apache Spark for data science applications. Your manager has asked you to investigate running Spark in the cloud. Your manager’s goal is to lower the overall cost of running and managing Spark. What would you recommend?
   A. Run Apache Spark in Compute Engine.
   B. Use Cloud Dataproc with preemptible virtual machines.
   C. Use Cloud Dataflow with preemptible virtual machines.
   D. Use Cloud Memorystore with Apache Spark running in Compute Engine.

7. You are working with a U.S. hospital to extract data from an electronic health record (EHR) system. The hospital has offered to provide business requirements, but there is little information about regulations in the documented business requirements. Which regulation would you look to for guidance on complying with relevant requirements?
   A. GDPR
   B. SOX
   C. HIPAA
   D. PCI DSS

8. What security control can be used to help detect changes to data?
   A. Firewall rules
   B. Message digests
   C. Authentication
   D. Authorization

9. Your company has a data classification scheme for categorizing data as secret, sensitive, private, and public. There are no confidentiality requirements for public data. All other data must be encrypted at rest. Secret data must be encrypted with keys that the company controls. Sensitive and private data can be encrypted with keys managed by a third party. Data will be stored in GCP. What would you recommend in order to meet these requirements while minimizing cost and administrative overhead?
   A. Use Cloud KMS to manage keys for all data.
   B. Use Cloud KMS for secret data and Google default encryption for other data.
   C. Use Google default encryption for all data.
   D. Use a custom encryption algorithm for all data.

10. You manage a service with several databases. The queries to the relational database are increasing in latency. Reducing the amount of data in tables will improve performance and reduce latency. The application administrator has determined that approximately 60 percent of the data in the database is more than 90 days old, has never been queried, and does not need to be in the database. You are required to keep the data for five years in case it is requested by auditors. What would you propose to decrease query latency without increasing costs (or at least while keeping any cost increases to a minimum)?
    A. Horizontally scale the relational database.
    B. Vertically scale the relational database.
    C. Export data more than 90 days old, store it in Cloud Storage Coldline class storage, and delete that data from the relational database.
    D. Export data more than 90 days old, store it in Cloud Storage multiregional class storage, and delete that data from the relational database.

11. Your company is running several custom applications that were written by developers who are no longer with the company. The applications frequently fail. The DevOps team is paged more for these applications than any others. You propose replacing those applications with several managed services in GCP. A manager who is reviewing your cost estimates for using managed services in GCP notes that the cost of the managed services will be more than what they pay for internal servers. What would you recommend as the next step for the manager?
    A. Nothing. The manager is correct: the costs are higher. You should reconsider your recommendation.
    B. Suggest that the manager calculate total cost of ownership, which includes the cost to support the applications as well as infrastructure costs.
    C. Recommend running the custom applications in Compute Engine to lower costs.
    D. Recommend rewriting the applications to improve reliability.

12. A director at Mountkirk Games has asked for your recommendation on how to measure the success of the migration to GCP. The director is particularly interested in customer satisfaction. What KPIs would you recommend?
    A. Average revenue per customer per month
    B. Average time played per customer per week
    C. Average time played per customer per year
    D. Average revenue per customer per year

13. Mountkirk Games is implementing a player analytics system. You have been asked to document requirements for a stream processing system that will ingest and preprocess data before writing it to the database. The preprocessing system will collect data about each player for one minute and then write a summary of statistics about that player to the database. The project manager has provided the list of statistics to calculate and a rule for calculating values for missing data. What other business requirement would you ask of the project manager?
    A. How long to store the data in the database?
    B. What roles and permissions should be in place to control read access to data in the database?
    C. How long to wait for late-arriving data?
    D. A list of managed services that can be used in this project

14. A new data warehouse project is about to start. The data warehouse will collect data from 14 different sources initially, but this will likely grow over the next 6 to 12 months. What managed GCP service would you recommend for managing metadata about the data warehouse sources?
    A. Data Catalog
    B. Cloud Dataprep
    C. Cloud Dataproc
    D. BigQuery

15. You are consulting for a multinational company that is moving its inventory system to GCP. The company wants to use a managed database service, and it requires SQL and strong consistency. The database should be able to scale to global levels. What service would you recommend?
    A. Bigtable
    B. Cloud Spanner
    C. Cloud Datastore
    D. BigQuery

16. TerramEarth has interviewed dealers to better understand their needs regarding data. Dealers would like to have access to the latest data available, and they would like to minimize the amount of data they have to store in their databases and object storage systems. How would you recommend that TerramEarth provide data to its dealers?
    A. Extract dealer data to a CSV file once per night during off-business hours and upload it to a Cloud Storage bucket accessible to the dealer.
    B. Create an API that dealers can use to retrieve specific pieces of data on an as-needed basis.
    C. Create a database dump using the database export tool so that dealers can use the database import tool to load the data into their databases.
    D. Create a user account on the database for each dealer and have them log into the database to run their own queries.

17. Your company has large volumes of unstructured data stored on several network-attached storage systems. The maintenance costs are increasing, and management would like to consider alternatives. What GCP storage system would you recommend?
    A. Cloud SQL
    B. Cloud Storage
    C. Cloud Datastore
    D. Bigtable

18. A customer-facing application is built using a microservices architecture. One of the services does not scale as fast as the service that sends it data. This causes the sending service to wait while the other service processes the data. You would like to change the integration to use asynchronous instead of synchronous calls. What is one way to do this?
    A. Create a Cloud Pub/Sub topic, have the sending service write data to the topic, and have the receiving service read from the topic.
    B. Create a Cloud Storage bucket, have the sending service write data to the bucket, and have the receiving service read from the bucket.
    C. Have the sending service write data to local drives, and have the receiving service read from those drives.
    D. Create a Bigtable database, have the sending service write data to the database, and have the receiving service read from the database.

19. A product manager at TerramEarth would like to use the data that TerramEarth collects to predict when equipment will break down. What managed service would you recommend that TerramEarth consider?
    A. Bigtable
    B. Cloud Dataflow
    C. Cloud AutoML
    D. Cloud Spanner