1 - Choose a data storage approach in Azure

 https://docs.microsoft.com/en-us/learn/modules/choose-storage-approach-in-azure/

 

Data comes in different shapes and sizes, and no single storage solution fits all data.

The key factors to consider in deciding on the optimal storage solution are:

  1. Classify your data as structured, semi-structured, or unstructured
  2. Determine how your data will be used
  3. Determine whether your data requires transactions (Performance)

  

 

Application data can be classified in one of three ways: 

  1. structured, 
  2. semi-structured, 
  3. unstructured.

Structured data

  • Structured data, sometimes referred to as relational data, is data that adheres to a strict schema, so all of the data has the same fields or properties. 
  • The shared schema allows this type of data to be easily searched with query languages such as SQL (Structured Query Language). 
  • This capability makes this data style perfect for applications such as CRM systems, reservations, and inventory management.

 

  • Structured data is often stored in database tables with rows and columns with key columns to indicate how one row in a table relates to data in another row of another table. 
  • For example, data about students and classes can be tied together by a relationship to grades.
  • Structured data is straightforward in that it's easy to enter, query, and analyze. 
  • All of the data follows the same format. 
  • However, forcing a consistent structure also means evolution of the data is more difficult as each record has to be updated to conform to the new structure.
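
As a minimal sketch of the students, classes, and grades idea, the snippet below uses Python's built-in sqlite3 module with an in-memory database. The table names, column names, and sample rows are hypothetical and chosen only for illustration; they are not part of the original module.

```python
import sqlite3

# In-memory database; schema and rows are hypothetical, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE students (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE classes  (id INTEGER PRIMARY KEY, title TEXT NOT NULL);
    -- Key columns tie each grade row to one student and one class.
    CREATE TABLE grades (
        student_id INTEGER NOT NULL REFERENCES students(id),
        class_id   INTEGER NOT NULL REFERENCES classes(id),
        grade      TEXT NOT NULL
    );
    INSERT INTO students VALUES (1, 'Ana'), (2, 'Ben');
    INSERT INTO classes  VALUES (10, 'Algebra'), (20, 'History');
    INSERT INTO grades   VALUES (1, 10, 'A'), (1, 20, 'B'), (2, 10, 'C');
""")

# Because every row follows the same schema, one SQL query can join all three tables.
rows = conn.execute("""
    SELECT s.name, c.title, g.grade
    FROM grades g
    JOIN students s ON s.id = g.student_id
    JOIN classes  c ON c.id = g.class_id
    ORDER BY s.name, c.title
""").fetchall()

for name, title, grade in rows:
    print(name, title, grade)
```

The strict schema is what makes the join straightforward; it is also why adding a new column later means every existing row has to account for it.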

Semi-structured data

  • Semi-structured data is less organized than structured data, and is not stored in a relational format, as the fields do not neatly fit into tables, rows, and columns. 
  • Semi-structured data contains tags that make the organization and hierarchy of the data apparent - for example, key/value pairs. 
  • Semi-structured data is also referred to as non-relational or NoSQL data.
  • The expression and structure of the data in this style is defined by a serialization language.

 

  • For software developers, data serialization languages are important because they can be used to write data stored in memory to a file, send it to another system, and parse it back into memory. 
  • The sender and receiver don’t need to know details about the other system. As long as both use the same serialization language, the data can be understood by both systems.

Common formats

Today, there are three common serialization languages you're likely to encounter:

  • XML, or extensible markup language, was one of the first data languages to receive widespread support. 
  • It's text-based, which makes it easy for both humans and machines to read. 
  • In addition, parsers for it can be found for almost all popular development platforms. XML allows you to express relationships and has standards for schema, transformation, and even displaying on the web.

Here's an example of a person with hobbies expressed in XML.


<Person Age="23">
    <FirstName>John</FirstName>
    <LastName>Doe</LastName>
    <Hobbies>
        <Hobby Type="Sports">Golf</Hobby>
        <Hobby Type="Leisure">Reading</Hobby>
        <Hobby Type="Leisure">Guitar</Hobby>
   </Hobbies>
</Person>


  • XML expresses the shape of the data using tags.
  • These tags come in two forms: elements such as <FirstName> and attributes that can be expressed in text like Age="23".
  • Elements can have child elements to express relationships, such as the <Hobbies> element above, which contains a collection of Hobby elements.
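
The difference between elements and attributes shows up when the document is parsed. The following is a rough sketch using Python's standard xml.etree.ElementTree module on a trimmed copy of the person record; the variable names are illustrative assumptions.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the person record (illustrative).
xml_text = """
<Person Age="23">
    <FirstName>John</FirstName>
    <Hobbies>
        <Hobby Type="Sports">Golf</Hobby>
        <Hobby Type="Leisure">Reading</Hobby>
        <Hobby Type="Leisure">Guitar</Hobby>
    </Hobbies>
</Person>
"""

person = ET.fromstring(xml_text.strip())

# Attributes are read from the element itself and always arrive as text.
age = int(person.attrib["Age"])

# Child elements are looked up by tag name.
first_name = person.findtext("FirstName")

# Iterating over <Hobbies> yields its child <Hobby> elements.
hobbies = [(hobby.get("Type"), hobby.text) for hobby in person.find("Hobbies")]

print(first_name, age, hobbies)
```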

 

 

  • JSON, or JavaScript Object Notation, has a lightweight specification and relies on curly braces to indicate data structure. 
  • Compared to XML, it is less verbose and easier for humans to read. 
  • JSON is frequently used by web services to return data.

Here's the same person expressed in JSON.

{
    "firstName": "John",
    "lastName": "Doe",
    "age": "23",
    "hobbies": [
        { "type": "Sports", "value": "Golf" },
        { "type": "Leisure", "value": "Reading" },
        { "type": "Leisure", "value": "Guitar" }
    ]
}

  • Notice that this format isn't as formal as XML. 
  • It's closer to a key/value pair model than a formal data expression. 
  • As you might guess from the name, JavaScript has built-in support for this format - making it very popular for web development. 
  • Like XML, other languages have parsers you can use to work with this data format. 
  • The downside to JSON is that it tends to be more programmer-oriented, making it harder for non-technical people to read and modify.
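
As an example of the parser support mentioned above, here is a small sketch using Python's standard json module. It parses the person record, filters the hobbies, and writes the data back out as text. The variable names are assumptions made for illustration.

```python
import json

json_text = """
{
    "firstName": "John",
    "lastName": "Doe",
    "age": "23",
    "hobbies": [
        { "type": "Sports", "value": "Golf" },
        { "type": "Leisure", "value": "Reading" },
        { "type": "Leisure", "value": "Guitar" }
    ]
}
"""

# Parsing turns the text into plain dictionaries and lists.
person = json.loads(json_text)

# Nested data is reached with ordinary key and index access.
leisure = [h["value"] for h in person["hobbies"] if h["type"] == "Leisure"]

# "age" was written as a quoted string, so it stays a string until converted.
age = int(person["age"])

# Serializing and parsing again gives back an equal structure (a round trip).
round_trip = json.loads(json.dumps(person))

print(leisure, age, round_trip == person)
```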



  • YAML, or YAML Ain’t Markup Language, is a relatively new data language that’s growing quickly in popularity, in part due to its human-friendliness. 
  • The data structure is defined by line separation and indentation, and reduces the dependency on structural characters like parentheses, commas and brackets.

Here's the same person data expressed in YAML.

firstName: John
lastName: Doe
age: 23
hobbies:
    - type: Sports
      value: Golf
    - type: Leisure
      value: Reading
    - type: Leisure
      value: Guitar

 

  • This format is more readable than JSON and is often used for configuration files that need to be written by people but parsed by programs. 
  • However, YAML is the newest of these data formats and doesn't have as much support in programming languages as JSON and XML.


Unstructured data

  • The organization of unstructured data is ambiguous. 
  • Unstructured data is often delivered in files, such as photos or videos. 
  • The video file itself may have an overall structure and come with semi-structured metadata, but the data that comprises the video itself is unstructured. 
  • Therefore, photos, videos, and other similar files are classified as unstructured data.

Examples of unstructured data include:

  • Media files, such as photos, videos, and audio files
  • Office files, such as Word documents
  • Text files
  • Log files

 

 

 

 

Determine operational needs

  • Once you've identified the kind of data you're dealing with (structured, semi-structured, or unstructured), the next step is to determine how you'll use the data. 
  • For example, as an online retailer you know customers need quick access to product data, and business users need to run complex analytical queries. 
  • As you work through these requirements, taking your data classification into account, you can start to plan your data storage solution.

Operations and latency

What are the main operations you'll be completing on each data type, and what are the performance requirements?

Ask yourself these questions:

  • Will you be doing simple lookups using an ID?
  • Do you need to query the database for one or more fields?
  • How many create, update, and delete operations do you expect?
  • Do you need to run complex analytical queries?
  • How quickly do these operations need to complete?

When deciding what storage solution to use, think about how your data will be used. How often will your data be accessed? Is your data read-only? Does query time matter? The answers to these questions will help you decide on the best storage solution for your data.
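
To make the questions above concrete, the sketch below runs each kind of operation against a small in-memory SQLite table using Python's sqlite3 module. The products table and its rows are hypothetical. The point is only to show how different the access patterns look, not to suggest a particular Azure service.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (id INTEGER PRIMARY KEY, category TEXT, price REAL)")
conn.executemany(
    "INSERT INTO products VALUES (?, ?, ?)",
    [(1, "books", 12.0), (2, "books", 8.0), (3, "games", 40.0)],
)

# Simple lookup using an ID: touches a single row through the primary key.
by_id = conn.execute(
    "SELECT category, price FROM products WHERE id = ?", (2,)
).fetchone()

# Query on a field: may have to examine many rows unless the field is indexed.
books = conn.execute(
    "SELECT id FROM products WHERE category = ? ORDER BY id", ("books",)
).fetchall()

# Update: a small write, typical of day-to-day transactional work.
conn.execute("UPDATE products SET price = ? WHERE id = ?", (9.0, 2))

# Analytical query: aggregates over the whole table.
avg_by_category = conn.execute(
    "SELECT category, AVG(price) FROM products GROUP BY category ORDER BY category"
).fetchall()

print(by_id, books, avg_by_category)
```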



 

Group multiple operations in a transaction

 

  • Applications may need to group a series of data updates together, because a change to one piece of data needs to result in a change to another piece of data. 
  • Transactions enable you to group these updates so that if one event in a series of updates fails, the entire series can be rolled back, or undone.
  • For example, as an online retailer you might use a transaction for the placement of an order and payment verification. The grouping of the related events ensures that you don't reduce your inventory levels until an approved form of payment is received.

 

What is a transaction?

  • A transaction is a logical group of database operations that execute together.
  • Here's the question to ask yourself regarding whether you need to use transactions in your application: Will a change to one piece of data in your dataset impact another? If the answer is yes, then you'll need support for transactions in your database service.
  • Transactions are often defined by a set of four requirements, referred to as ACID guarantees. ACID stands for Atomicity, Consistency, Isolation, and Durability:
  • Atomicity means a transaction executes exactly once and as a single unit: either all of the work is done, or none of it is. Operations within a transaction usually share a common intent and are interdependent.
  • Consistency ensures that the data is consistent both before and after the transaction.
  • Isolation ensures that one transaction is not impacted by another transaction.
  • Durability means that the changes made due to the transaction are permanently saved in the system. Committed data is saved by the system so that even in the event of a failure and system restart, the data is available in its correct state.
  • When a database offers ACID guarantees, these principles are applied to any transactions in a consistent manner.
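
The order and payment example can be sketched with Python's sqlite3 module, where using the connection as a context manager commits on success and rolls back if an exception is raised. The inventory and orders tables, the place_order function, and the payment_approved flag are all hypothetical names introduced for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE inventory (item TEXT PRIMARY KEY, quantity INTEGER NOT NULL)")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, item TEXT NOT NULL)")
conn.execute("INSERT INTO inventory VALUES ('guitar', 5)")
conn.commit()


def place_order(conn, item, payment_approved):
    """Reduce stock and record the order as one all-or-nothing unit."""
    try:
        with conn:  # commits if the block succeeds, rolls back on an exception
            conn.execute(
                "UPDATE inventory SET quantity = quantity - 1 WHERE item = ?", (item,)
            )
            if not payment_approved:
                # Failing here undoes the inventory update above.
                raise ValueError("payment declined")
            conn.execute("INSERT INTO orders (item) VALUES (?)", (item,))
        return True
    except ValueError:
        return False


def quantity(conn, item):
    """Return the current stock level for an item."""
    return conn.execute(
        "SELECT quantity FROM inventory WHERE item = ?", (item,)
    ).fetchone()[0]


declined = place_order(conn, "guitar", payment_approved=False)
stock_after_declined = quantity(conn, "guitar")

approved = place_order(conn, "guitar", payment_approved=True)
stock_after_approved = quantity(conn, "guitar")
order_count = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]

print(declined, stock_after_declined, approved, stock_after_approved, order_count)
```

In this sketch the declined payment leaves the inventory untouched, which is the atomicity guarantee in practice: the inventory update and the order record either both happen or neither does.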

OLTP vs OLAP

  • Transactional databases are often called OLTP (Online Transaction Processing) systems. 
  • OLTP systems commonly support lots of users, have quick response times, and handle large volumes of data. 
  • They are also highly available (meaning they have very minimal downtime), and typically handle small or relatively simple transactions.

 

  • In contrast, OLAP (Online Analytical Processing) systems commonly support fewer users, have longer response times, can be less available, and typically handle large and complex transactions.
  • The terms OLTP and OLAP aren't used as frequently as they used to be, but understanding them makes it easier to categorize the needs of your application.

 

 

Choose a storage solution on Azure

Choosing the correct storage solution can lead to better performance, cost savings, and improved manageability.

READ: https://docs.microsoft.com/en-us/learn/modules/choose-storage-approach-in-azure/5-choose-the-right-azure-service-for-your-data

 
