Difference between thread, task, and MapReduce.
COMPARISONS BETWEEN THREAD, TASK, AND MAP-REDUCE
As previously mentioned in this section, there are several computing categories, such as high-performance computing (HPC) and high-throughput computing (HTC). Various programming methods are employed to take advantage of these computing techniques. One of them delivers large amounts of computing in the form of transactions, which is accomplished by multithreading. High-throughput computing is concerned with multithreaded programming and may now be extended to a distributed environment as well. Similarly, some computational jobs need a lengthy period to complete. The most obvious and frequent method for designing parallel and distributed computing applications and gaining the advantages of high-throughput computing is through task programming.
A task describes a program that may need input files and generate output files as a result of its execution, and applications are a collection of tasks. Tasks are submitted for execution, and their output data is gathered at the conclusion. The way tasks are produced, the sequence in which they are executed, and whether they need to exchange data distinguish the application models that come under the task programming umbrella. Similarly, another important computing category is data-intensive computing, which deals with enormous amounts of data. Several application domains, such as computational research, generate vast amounts of data that must be effectively stored, accessed, indexed, and evaluated. These activities get more difficult as the amount of information accumulates and grows at a faster rate over time.
Data-intensive computing uses MapReduce as a programming model for creating data-intensive applications and deploying them on clouds. A task may be used to indicate what you want to perform, and that task may then be attached to a thread. Threads are used to finish the task by splitting it into pieces and executing them individually in a distributed system.
A thread is a fundamental unit of CPU utilization that consists of a program counter, a stack, and a set of registers. Each thread has its own program counter, stack, and registers, while it shares the code and memory of its process with the other threads of that process. A thread of execution is the smallest sequence of programmed instructions that a scheduler can manage independently. Threads are a built-in feature of your operating system. The Thread class provided by platforms and languages such as .NET or Java offers methods for creating and managing threads.
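As a minimal sketch of what creating and managing threads looks like in practice, the following Python example starts two threads that write to a shared list. Python is used here only for illustration (the article itself names .NET and Java); the function name `count_up` and the worker names are hypothetical.

```python
import threading

def count_up(name, limit, log, lock):
    """Work done by one thread: record (name, i) pairs in a shared list."""
    for i in range(limit):
        with lock:  # guard the shared list while appending
            log.append((name, i))

log = []                 # memory shared by all threads of this process
lock = threading.Lock()  # synchronizes access to the shared list

# Create two threads, each with its own stack and flow of control.
threads = [
    threading.Thread(target=count_up, args=(f"worker-{n}", 3, log, lock))
    for n in range(2)
]
for t in threads:
    t.start()   # hand the thread to the scheduler
for t in threads:
    t.join()    # wait for the thread to finish

print(sorted(log))
```

The order in which the two workers append to the list is decided by the scheduler, which is why the output is sorted before printing.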
A task is anything that you want to be completed; it is a higher-level abstraction on top of threads. It is a collection of software instructions stored in memory. When a program's instructions are loaded into memory, it is referred to as a process or task. A task can tell you whether it has completed and whether the operation has produced a result. By default, a task uses the thread pool, which saves resources because creating threads is costly: a large block of memory has to be allocated and initialized for the thread stack, and system calls need to be made to create and register the native thread with the host OS. When requests are frequent and lightweight, as they are in most server applications, creating a new thread for each request can consume substantial computing resources.
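The idea of a task that runs on a thread pool and can report its completion and result can be sketched as follows. This is an illustration in Python using `concurrent.futures`, not the .NET Task API; the function `square` and the pool size of 2 are assumptions made for the example.

```python
from concurrent.futures import ThreadPoolExecutor

def square(n):
    """A small unit of work submitted as a task."""
    return n * n

# The pool creates a fixed number of threads once and reuses them,
# so five tasks do not cost five thread creations.
with ThreadPoolExecutor(max_workers=2) as pool:
    futures = [pool.submit(square, n) for n in range(5)]
    results = [f.result() for f in futures]  # blocks until each task is done

# Each future can report whether its task has completed.
all_done = all(f.done() for f in futures)

print(results, all_done)
```

Here each `Future` plays the role of the task handle: `done()` answers whether the task has finished and `result()` returns what it produced.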
MapReduce is a framework that allows us to design programs that can process massive volumes of data in parallel on large clusters of commodity hardware in a dependable manner. It is a programming model for distributed computing. The MapReduce method consists of two key tasks: Map and Reduce. Map translates one collection of data into another, in which individual elements are broken down into tuples (key/value pairs). Reduce takes the output of a map as its input and merges those data tuples into a smaller collection of tuples. As the name MapReduce indicates, the Reduce task is always executed after the Map task.
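The two phases can be sketched on a single machine with a word count, the usual introductory example. This is plain Python that only mimics the model; it is not Hadoop or Aneka code, and the names `map_phase`, `shuffle`, and `reduce_phase` as well as the sample input are hypothetical.

```python
from collections import defaultdict

def map_phase(line):
    """Map: turn one input record into (key, value) tuples."""
    return [(word.lower(), 1) for word in line.split()]

def shuffle(pairs):
    """Group all values by key so each key is reduced once."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reduce: merge the values of one key into a single tuple."""
    return key, sum(values)

lines = ["the map step", "the reduce step"]  # sample input records

# Map runs first, over every record.
mapped = [pair for line in lines for pair in map_phase(line)]

# Reduce runs afterwards, once per distinct key.
counts = dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())

print(counts)
```

In a real cluster the map calls run on many nodes in parallel and the grouping step moves data between nodes; the sketch keeps only the data flow.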
Multithreaded programming and multiprocessing technologies, as well as multi-core technology, help attain parallelism on a single computer and can be used to accelerate programs. Currently, all of the most common operating systems support multithreading, regardless of whether the underlying hardware explicitly supports real parallelism or not. If several processors or cores are available, real parallelism can be achieved by using them at the same time; otherwise, multithreading is achieved by interleaving the execution of many threads on the same processing unit. Multithreaded programming enables parallelism within the confines of a single machine. Applications that require a high level of parallelism cannot be handled by traditional multithreaded programming and must rely on distributed infrastructures such as clusters, grids, or clouds. Using these facilities requires developing programs against specific APIs, which may call for considerable changes to existing programs. To overcome this issue, Aneka provides the Thread Programming Model, which extends the multithreaded programming philosophy beyond the bounds of a single node and enables the use of heterogeneous distributed infrastructure.
Programming languages define the abstractions of process and thread in their class libraries to facilitate multithreaded programming. POSIX is a prominent standard for thread operations and thread synchronization that is supported by all Linux/UNIX operating systems and is offered as an extra library for the Windows operating system family. A typical implementation of POSIX threads is provided as a function library in C/C++. New-generation languages, such as Java and C# (.NET), provide a set of abstractions for thread management and synchronization that adheres to object-oriented design as closely as possible.
Similarly, the most natural technique for dividing an application's computation among a group of nodes is task-based programming. The core abstraction of task-based programming is the task, which represents a series of operations that can be isolated and executed as a single unit. A task might be as basic as a shell program or as sophisticated as a piece of code that requires a certain runtime environment to execute. Tasks frequently need input files for execution and generate output files as a result. A task can return a result, whereas there is no direct mechanism to return a result from a thread. It is usually recommended to use tasks rather than threads, since tasks are created on the thread pool, which already contains system-created threads and therefore improves performance. By default, a task runs in the background and cannot run in the foreground. A thread, on the other hand, can run in the background as well as in the foreground. Aneka supports task-based programming and serves as a real example of a framework that facilitates the development and execution of task-based distributed applications.
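The difference in how results come back can be sketched as follows. The example is in Python rather than .NET, so it is only an analogy: a Python daemon thread stands in for a background thread, a `Future` stands in for a task, and the function `add` is hypothetical.

```python
import queue
import threading
from concurrent.futures import ThreadPoolExecutor

def add(a, b):
    return a + b

# Thread: the return value of the target function is discarded,
# so the result has to be passed out through a shared structure.
out = queue.Queue()
t = threading.Thread(target=lambda: out.put(add(2, 3)), daemon=True)
t.start()
t.join()
thread_result = out.get()

# Task: the handle returned by submit() carries the result itself.
with ThreadPoolExecutor(max_workers=1) as pool:
    task_result = pool.submit(add, 2, 3).result()

print(thread_result, task_result)
```

Setting `daemon=True` marks the thread as one that does not keep the program alive on its own, which is the closest Python counterpart to a background thread; leaving it at the default gives the foreground behavior.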
As mentioned earlier, MapReduce is associated with data-intensive applications, which process or generate large amounts of data and may also be compute-intensive. The volumes of data that prompted the concept of data-intensive computation have varied over time. Data-intensive computing began with high-speed WAN applications, but nowadays it is the province of storage clouds, with data sizes reaching terabytes, if not petabytes. Data at this scale is referred to as Big Data and is often semi-structured or unstructured. As a result, traditional techniques based on relational databases are incapable of serving data-intensive applications efficiently. To address such issues, new techniques and storage models have been developed. The major efforts in the area of storage systems have been devoted to the creation of high-performance distributed file systems, storage clouds, and NoSQL-based systems. The most significant advance in support for programming data-intensive applications has been the advent of MapReduce.
Google presented MapReduce as a straightforward technique for processing massive amounts of data based on the definition of two functions, map and reduce, which are applied to the data in a two-phase process. The map step first extracts useful information from the data and stores it as key-value pairs, which are then aggregated in the reduce step. Despite its limitations, this paradigm is effective in a variety of application settings. MapReduce can be implemented in the Apache Hadoop framework using programming languages such as Java, Python, or C++. Equally, as with the thread and task programming models, Aneka provides APIs for designing and implementing MapReduce applications. The .NET framework and C# are preferred for developing task-based applications using Aneka.