Tuesday, June 6, 2023

Big Data Analytics … Whaaat?

Whaaat?
To be amazed by something to the point that you can't believe it is real. Not to be confused with the question "what"; it is an expression of disbelief or amazement. – Urban Dictionary (https://www.urbandictionary.com)

I admit to being a little hazy on what “big data analytics” actually means. An article that popped up in a recent Google Scholar Alerts search result helped me out on that front.

TIP: Set up a Google Scholar alert on any topic of interest to you. In my experience, only about three percent of the results are relevant to my specific interests. But it takes just two minutes to browse them for the rare truffles that tantalize my taste buds.

The name of the article … wait for it … is …

Predictive big data analytics for drilling downhole problems: A review

Do not be misled by the title of the article … It is about so much more than downhole drilling. In reading the article, I came to understand …

The goal of big data analytics: To analyze enormous sets of unstructured data in real time to enable nearly instantaneous decisions.

So, here are excerpts from the article. The full text of the article, by the way, is available for the low, low price of nothing at: https://www.sciencedirect.com/science/article/pii/S2352484723007710

///////
EXCERPTS
Energy Reports 9 (2023) 5863–5876
Predictive big data analytics for drilling downhole problems: A review
Aslam Abdullah M., Aseel A., Rithul Roy, Pranav Sunil
Petroleum Engineering Research Group, School of Chemical Engineering, Vellore Institute of Technology, Vellore, Tamil Nadu, India
ABSTRACT
With the recent introduction of data recording sensors in exploration, drilling and production processes, the oil and gas industry has transformed into a massively data-intensive industry. Big data analytics has acquired a great deal of interest from researchers seeking to extract and use all the available information. This paper presents an outline for predictive big data analytics to forecast and analyze downhole problems such as pipe sticking, dogleg and pipe failure depending on several variables. Different methodologies were studied under big data, enabling the identification of the paradigm change in data storage and processing while handling vast, diversified data generated in a short span of time. The evaluated data pattern sets are fed into different established predictive models and risk prediction windows to highlight future irregularities for the prevention of accidents. Finally, game theory is used to evaluate the predictive models and identify the optimal model for the detection of downhole problems.

One of the key phases of the oil and gas industry’s digitization seems to be big data (BD) analytics. The oil and gas exploration and production industries now generate enormous datasets on a daily basis as a result of recent technological advancements.
Big data consists of unstructured (not ordered and text-heavy) and multi-structured data (containing many data formats arising from interactions between humans and machines) (Trifu and Ivan, 2016). The phrase big data (also known as big data analytics or business analytics) identifies the magnitude of the accessible data collection. There are further properties of the data that make it suitable for big data applications. IBM appropriately identifies these features as three V’s. These three V’s refer to volume, variety, and velocity (Pence, 2014). However, recent publications have added two additional V’s to provide a detailed explanation of big data. The other Vs consist of veracity and value (Ishwarappa and Anuradha, 2015).
Big data is not the outcome of a single silver-bullet technology, but rather the highly complementary combination of several technologies and creative concepts (Perrons and Jensen, 2015). Despite the fact that this type of analytics depends on solid data science foundations, there are a number of key considerations for putting these approaches into effect (Kezunovic et al., 2020). The storage and processing of large data sets, as well as the transformation of large data sets into knowledge, are the primary challenges connected with big data. It is often believed that the massive amount of big data means that useful information is hidden and must be unearthed, but analysts cannot simply intuit the data’s value content (Shull, 2013). In any industry, big data analytics can give new perspectives. It may result in the accurate identification or forecasting of new scientific hypotheses, consumer behavior, societal phenomena, weather patterns, and economic situations (Jayalath et al., 2014). Table 1 compares traditional data and big data analytics.
5.2. Big data storage and management
The downhole drilling data obtained is an enormous, unstructured, and complicated data collection that is challenging for conventional data processing technologies to manage (Chen et al., 2014). The information gathered by the sensors is multidimensional, and due to the ever-increasing amount of data being created, quicker and more effective methods of data analysis have become necessary. Along with the necessary infrastructures for storing and managing enormous data, there are also specific tools and methodologies for big data analytics that are essential for making successful judgments at the proper time (Elgendy and Elragal, 2014). There are techniques and tools which can analyze (process, decode, and interpret) the operation status and the change in parameters simultaneously (Kale et al., 2015; Pritchard et al., 2016). The data are gathered by drilling operators or service providers, and downhole data acquired from global geographic drilling operations will continually amass and grow in quantity, ultimately becoming a dataset that exceeds the storage and processing capacity of a single server (Chen et al., 2014). Multiple distributed servers are used to store, transmit, and process the collected data before extracting, transforming, and loading it into different databases for advanced analytics. These data are very large, ranging in size from Terabytes (TBs) to Petabytes (PBs) (Meeker and Hong, 2014). Moreover, large data sets may have considerable variability, hindering the data processing and administration, and varying integrity as a result of data inconsistency, incompleteness, complexity, delay, deceit, assumptions, and horizontal scalability to merge dissimilar information (Chen et al., 2014; Elgendy and Elragal, 2014; Hu et al., 2014). Relational databases, data marts, and data warehouses are classic techniques for storing and retrieving structured data.
Several solutions, such as distributed databases and Massive Parallel Processing (MPP) databases for delivering high query productivity and platform stability, as well as non-relational databases, were utilized for big data. Non-relational databases, such as NoSQL, are designed to store and manage non-relational data. NoSQL aims for massive scalability and a flexible data format that streamlines the creation and deployment of applications. Compared to relational database systems, NoSQL decouples data storage and management. These databases emphasize scalable, high-performance data storage and allow data administration operations to be handled at the application level rather than through database-specific languages (Bakshi, 2012).
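[A quick aside from me, not from the article: here is a rough Python sketch of what that schema-flexible, application-level style of storage can look like. It assumes a local MongoDB instance and the pymongo driver; the collection and field names are my own inventions for illustration.]

from datetime import datetime, timezone
from pymongo import MongoClient

# Connect to a (hypothetical) local MongoDB instance.
client = MongoClient("mongodb://localhost:27017")
readings = client["drilling"]["downhole_sensors"]

# Documents need not share a fixed schema; each reading carries whatever
# fields the downhole tool happens to report.
readings.insert_one({
    "well_id": "WELL-001",               # made-up identifier
    "timestamp": datetime.now(timezone.utc),
    "depth_m": 2750.4,
    "temperature_c": 96.3,
    "pressure_kpa": 41200,
})

# Application-level query: the five most recent high-temperature readings.
for doc in readings.find({"temperature_c": {"$gt": 90}}).sort("timestamp", -1).limit(5):
    print(doc["well_id"], doc["depth_m"], doc["temperature_c"])

[The point of the sketch: the database enforces no table schema, so new sensor fields can be added without migrations, and the filtering logic lives in the application rather than in a database-specific language.]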
After the data are stored, they have to be analyzed using big data tools and techniques. According to He et al. (2011), there are four essential requirements for processing large amounts of data. The first is rapid data processing: because disk and network traffic compete with request performance while data are being processed, it is vital to minimize the time needed to process the data. The second is the speed of query execution: many queries are response-time critical because of high workloads and real-time requests, so the data placement structure must sustain high query processing rates as the number of queries grows quickly. The third is the efficient use of storage capacity for large-scale data collection.
Because disk space is limited, data storage must be carefully managed during processing and storage-related bottlenecks minimized; rapid growth in user activity may also demand storage space and processing capacity that can scale. The fourth and final requirement is the ability to adapt to highly dynamic workload patterns. Massive datasets are processed by a variety of applications and users, for a variety of purposes and in a diverse range of ways, so the system must be highly adaptive to unforeseen processing dynamics rather than tuned to any one of these workload patterns (He et al., 2011).
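[Another aside from me: one everyday way to satisfy the first and third requirements, fast processing without exhausting memory or disk, is to stream the data in chunks rather than loading it all at once. A minimal pandas sketch, assuming a hypothetical downhole_log.csv file with a pressure_kpa column:]

import pandas as pd

# Stream the (hypothetical) log file in manageable chunks instead of
# loading the whole dataset into memory at once.
total_rows = 0
pressure_sum = 0.0

for chunk in pd.read_csv("downhole_log.csv", chunksize=100_000):
    total_rows += len(chunk)
    pressure_sum += chunk["pressure_kpa"].sum()

print("rows processed:", total_rows)
print("mean pressure (kPa):", pressure_sum / total_rows)

[Distributed frameworks apply the same chunk-at-a-time pattern across many machines at once, which is how datasets in the terabyte-to-petabyte range become tractable.]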
5.3.1. Challenges in big data analysis
Data mining is the extraction of relevant information and insights from large datasets using statistical and computational methods. It is an integral part of big data analytics, which entails processing, analyzing, and interpreting large and complex datasets to discover patterns, trends, and insights that can help organizations make informed decisions. Data collection for big data analytics is not error-free, however, and errors introduced while the data are gathered can affect the quality and reliability of the insights derived from them (Amirian et al., 2015). Examples of common data collection errors include:
• Sampling errors: When the sample data used for analysis is not representative of the population as a whole.
• Measurement errors: When the data collected is not accurate or trustworthy.
• Data entry errors: Errors in data entry occur when information is erroneously recorded.
• Processing errors: Errors in processing occur when data is improperly processed.
Data cleansing, data validation, data normalization, and data transformation are some of the methods used by data mining and analytics practitioners to reduce the likelihood of these errors. Data cleansing is the process of identifying and correcting errors and inconsistencies, such as missing values, outliers, and duplicate entries. Data validation involves verifying accuracy and completeness, for example by ensuring that all required fields are populated. Data normalization entails transforming the data into a standard format, such as converting all dates to a common representation. For example, consider a scenario in which a drilling company wishes to improve the precision of its drilling operations by analyzing data collected from downhole sensors. The collected data include measurements of temperature, pressure, and other parameters that help the company determine the characteristics of the drilled rock formation. Nevertheless, errors may occur during data collection, such as faulty sensors or incomplete records due to technical issues. These errors can lead to flawed analysis and to drilling in the wrong location, increasing costs and reducing efficiency (Liu et al., 2022).
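[One more aside from me: to make the cleansing, validation, and normalization steps concrete, here is a small pandas sketch. The column names, values, and thresholds are my own inventions for illustration, not taken from the article.]

import pandas as pd

# Hypothetical raw export from downhole sensors.
raw = pd.DataFrame({
    "timestamp":     ["2023-05-01 08:00", "2023-05-01 08:01", "2023-05-01 08:02",
                      "2023-05-01 08:03", "2023-05-01 08:03"],
    "temperature_c": [95.1, None, 96.4, 97.0, 97.0],
    "pressure_kpa":  [41000, 41250, 9999999, 41400, 41400],  # 9999999 looks like a faulty sensor
})

# Data cleansing: remove duplicate entries and physically implausible readings.
clean = raw.drop_duplicates()
clean = clean[clean["pressure_kpa"] < 100_000]

# Data validation: require the fields the analysis depends on.
clean = clean.dropna(subset=["temperature_c"])

# Data normalization: convert the text timestamps to a standard datetime type.
clean["timestamp"] = pd.to_datetime(clean["timestamp"])

print(clean)

[Two of the five raw rows survive; the duplicate, the sensor spike, and the incomplete reading are all filtered out before any predictive model ever sees them.]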
///////
Google® Better!
Jean Steinhardt served as Librarian, Aramco Americas (https://americas.aramco.com/), Engineering Division, for 13 years. He now heads Jean Steinhardt Consulting LLC, producing the same high-quality research that he performed for Aramco.

Follow Jean’s blog at http://desulf.blogspot.com/ for continuing tips on effective online research.
Email Jean at jstoneheart@gmail.com with questions on research, training, or anything else.
