Fault Tolerance Scenarios (High Availability Solutions)

October 21, 2022

The first principle of security handed down by our masters, the elders who devoted their lives to systems before us, is to take backups of our existing data. Our elders have gradually retired and handed their place over to us, today's system administrators. The basic philosophy they left us stays in the back of our minds and finds its place in almost every system design. Although we include a backup solution in every project, for us it is no longer a priority security solution. It has simply become a necessity.

The reason is that the measures of security have changed along with the times. In our view, the most obvious problems our masters faced were system crashes, outages and data loss. We, today's system administrators, still face the same troubles, but more advanced problems are waiting for us as well.

The Internet has become almost compulsory today. We meet almost all of our needs through it, including many needs of our homes and even of our children and babies, and along with its advantages it has brought many disadvantages.

While viruses, spam and spyware circulate freely on the Internet and malicious software developers roam around, we have changed our elders' understanding of security. We have put our own solutions, the ones today's conditions require, into the security category, and we have moved our elders' basic security concept, backup, out of that category and started to present it as a separate solution.

Of course, these solutions vary with today's conditions, and we develop our architecture with various designs built on the products in our cluster structures, farms, system rooms and network infrastructure.

None of the technologies we have mentioned (Cluster, Farm, NLB) is a backup solution or a security solution. But they are mandatory for ensuring the continuity and stability of the system.

Cluster Solutions ;

Clustering is the system technology we apply to the critical servers and company applications in our network as our company grows. It is one of the indispensable solutions of enterprise environments. Cluster solutions are supported on application servers such as database servers, messaging servers and file servers. Within the Microsoft Server product family, we can build clusters only with the Enterprise and Datacenter editions.

Cluster structures work on the principle that two or more servers in a cluster group share the application data on a shared disk to which they are all physically connected. Clusters are usually built for failover, and Microsoft Cluster (MSCS) structures support Active-Passive mode.

In other words, our applications run on the master server in the cluster, which is in constant communication with the other servers in the cluster. If one of the nodes becomes inaccessible, whether through a failure or during maintenance, another node in the cluster immediately takes over the task. These structures are designed so that our users neither notice nor are affected by the error.
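The failover behaviour described above can be pictured with a small Python sketch. It is only a toy model under stated assumptions: the `Node` and `ActivePassiveCluster` names are hypothetical, and the `healthy` flag stands in for whatever heartbeat mechanism a real cluster service uses. It is not how MSCS is implemented.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """A cluster member; `healthy` stands in for a real heartbeat check."""
    name: str
    healthy: bool = True


class ActivePassiveCluster:
    """Toy failover model: exactly one node owns the workload at a time."""

    def __init__(self, nodes):
        if len(nodes) < 2:
            raise ValueError("a failover cluster needs at least two nodes")
        self.nodes = list(nodes)
        self.active = self.nodes[0]

    def check_and_failover(self):
        """Keep the active node if it is healthy, else promote a healthy standby."""
        if self.active.healthy:
            return self.active
        for node in self.nodes:
            if node.healthy:
                self.active = node
                return node
        raise RuntimeError("no healthy node left in the cluster")


cluster = ActivePassiveCluster([Node("node1"), Node("node2")])
cluster.nodes[0].healthy = False          # simulate node1 becoming unreachable
print(cluster.check_and_failover().name)  # node2
```

The point of the sketch is that clients only ever talk to whichever node is currently active, so a takeover is invisible to them as long as one healthy node remains.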

Shared disks are mandatory for cluster structures. The shared disk must be attached with SCSI, iSCSI, FC or SAS connection technology. Depending on our network structure and the connection technologies we use, we can extend our structure and solution across different geographical regions.

Network Load Balancing (NLB) Solutions;

NLB is a high availability technology developed for performance as well as fault tolerance. NLB is supported on service-based servers. We can use it on all products in the Microsoft server family: Web Server, Standard Server, Enterprise Server and Datacenter. Terminal Services, web services and streaming media services are the basic services to which we can apply NLB technology.

With NLB technology, the service does not stop when an error occurs, and we can also use it to gain performance. Using the DNS service in the background, the NLB service balances the current load of the NLB servers in the farm according to current performance values, allocating each new incoming request to the server with the lighter load.
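The idea of sending each new request to the least loaded server can be sketched in a few lines of Python. This only illustrates the balancing idea in the paragraph above and does not describe how the Microsoft NLB service is implemented internally. The server names, the `pick_server` and `dispatch` helpers, and the use of active connection counts as the load measure are all assumptions made for the example.

```python
def pick_server(active_connections):
    """Return the name of the server with the fewest active connections."""
    if not active_connections:
        raise ValueError("the farm has no servers")
    return min(active_connections, key=active_connections.get)


def dispatch(active_connections, requests):
    """Assign each request to the least loaded server and record the choice."""
    assignments = []
    for request in requests:
        server = pick_server(active_connections)
        active_connections[server] += 1  # the chosen server now carries more load
        assignments.append((request, server))
    return assignments


# Hypothetical farm: web1 already serves 3 connections, web2 serves 1.
farm = {"web1": 3, "web2": 1}
print(dispatch(farm, ["r1", "r2", "r3"]))
# [('r1', 'web2'), ('r2', 'web2'), ('r3', 'web1')]
```

The first two requests go to web2 until both servers carry three connections; from then on the load is spread evenly.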

When one of the NLB servers in the farm fails, requests are directed to the other NLB server(s). If we built the NLB solution for performance rather than for fault tolerance, our users will not experience a service interruption, but while they run their applications they will feel that the faulty server is missing from the farm. This is because the connections that would have gone to the faulty server are directed to the working servers in order to keep the system running.
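A quick calculation shows why users feel the missing server even though the service stays up. The numbers below are purely hypothetical (a farm of three servers sharing 900 requests per second), and the sketch assumes the load is spread evenly over the healthy servers.

```python
def per_server_load(total_requests_per_second, healthy_servers):
    """Average load each healthy server carries, assuming an even spread."""
    if healthy_servers < 1:
        raise ValueError("at least one healthy server is required")
    return total_requests_per_second / healthy_servers


before = per_server_load(900, 3)  # all three servers healthy
after = per_server_load(900, 2)   # one server has dropped out of the farm
print(before, after)              # 300.0 450.0
print(f"{after / before - 1:.0%} more load per remaining server")
# 50% more load per remaining server
```

If the farm was sized only for performance, that extra share per server is what the users experience as a slowdown.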

The Benefits of Cluster and NLB Technology ;

High Availability : This means that the system has been designed to keep working in any situation. Our applications and services are spread over more than one server so that an error on any single server does not affect the running system.
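The gain from spreading a service over several servers can be estimated with a standard textbook formula: if each server is available a fraction `a` of the time and failures are independent, then at least one of `n` servers is up with probability `1 - (1 - a)^n`. The sketch below assumes independent failures, which is optimistic when servers share power or network equipment; the 99% figure is a made-up input, not a measurement.

```python
def combined_availability(single, copies):
    """Probability that at least one of `copies` independent servers is up."""
    if not 0.0 <= single <= 1.0:
        raise ValueError("availability must be between 0 and 1")
    if copies < 1:
        raise ValueError("at least one server is required")
    return 1 - (1 - single) ** copies


# Hypothetical input: each server alone is available 99% of the time.
print(f"{combined_availability(0.99, 1):.4%}")  # 99.0000%
print(f"{combined_availability(0.99, 2):.4%}")  # 99.9900%
```

The independence assumption is exactly why the shared power and network paths of the servers also need to be made redundant.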

Scalability : We can also call this extensibility. Cluster and NLB structures allow us to grow without changing the existing system and without disrupting the continuity of the running system. While our system is running, we can change components such as the RAM and CPU of any node in the cluster or NLB farm.

Manageability : Managing the nodes in our cluster structures becomes very easy with this technology. If we have built our cluster applications on virtualization, it is much easier to take images of our existing servers and back them up. It also offers advantages such as managing the nodes in the cluster or farm centrally from a single point. You can find detailed technical information about NLB on our portal and in many other resources.

The information we have shared up to this point concerns solutions that are designed and implemented together with Cluster and NLB and that are independent of hardware. In the next part of the article, we will examine the hardware side of our fault tolerance infrastructure, which depends entirely on hardware and is independent of software. That part is about the high availability of our hardware, electrical and network infrastructure, the other well-known side of high availability scenarios.

In our previous IBM DS Storage articles, we mentioned that every piece of our hardware has a backup, and I left a line there as a clue for this article that my careful readers will not have missed. We said: “Our only problem for fault tolerance is a power outage; we have done everything that falls to storage connection technology”.

Let's take a general look at the diagram I have drawn above and examine how the system works and which path it follows to stay up in case of a failure.

Design of Our Electrical Infrastructure :

The products in today's system rooms and datacenters are generally redundant. Many servers and storage units developed for application, database and similar server solutions have a backup of each component. Power units are among these redundant components, and we call servers equipped with them redundant servers.

Although the word redundant literally means unnecessary or excessive, we will see how necessary it is for our system when the design is done properly.

As can be understood from the diagram above, we have two physical servers, each with two separate power units. I named these redundant power units Power 1 and Power 2. The storage located under our servers has a dual controller and two separate power inputs. The controllers are called Controller A and Controller B. If we connect all of these power inputs to a SINGLE UPS, then in a power failure the whole load falls on that UPS, and our time is limited to what the UPS can supply. Of course, when there is a power outage that is expected to last a long time, the systems can be shut down in sequence and made safe. But what happens if the UPS itself fails when we least expect it and cuts off the supply? Answer: our system stops suddenly. It collapses in an instant.

However, if we shape our design around the redundancy our servers already offer, we will see how a supposedly unnecessary unit saves lives.

Suppose there is an error in our electrical system: one UPS stopped momentarily and cut off the supply. Because we made our design properly, we can see that the UPS fault did not cause a system error. In our scenario, the unit named Server UPS 2, which is fed from the main supply and the generator, failed and its output stopped momentarily. But since there is no problem on Server UPS 1, our system continues to work.

As can be seen from the diagram, Server UPS 2 was feeding Power 2 of our Node 2 server, Controller B of our storage and Power 1 of our Node 1 server. When Server UPS 2 fails, our system keeps working as if nothing had happened, because there is no error on Server UPS 1. Since Server UPS 1 feeds Power 1 of our Node 2 server, Controller A of our storage and Power 2 of our Node 1 server, our system is not affected by a possible UPS error.
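The wiring in this scenario can be written down as a small table and checked mechanically. The Python sketch below uses the UPS, server and controller names from the scenario; the `FEEDS` table and the two helper functions are hypothetical names introduced only for this illustration.

```python
# Which power inputs each UPS feeds, as described in the scenario.
FEEDS = {
    "Server UPS 1": {("Node 1", "Power 2"), ("Node 2", "Power 1"),
                     ("Storage", "Controller A")},
    "Server UPS 2": {("Node 1", "Power 1"), ("Node 2", "Power 2"),
                     ("Storage", "Controller B")},
}


def powered_devices(feeds, failed_ups):
    """Devices that still have at least one live power input."""
    return {device
            for ups, inputs in feeds.items() if ups != failed_ups
            for device, _ in inputs}


def survives_any_single_ups_failure(feeds):
    """True if every device stays powered no matter which one UPS fails."""
    all_devices = {device for inputs in feeds.values() for device, _ in inputs}
    return all(powered_devices(feeds, ups) == all_devices for ups in feeds)


print(survives_any_single_ups_failure(FEEDS))  # True

# The same inputs on a SINGLE UPS: its failure takes everything down.
single = {"UPS": set().union(*FEEDS.values())}
print(survives_any_single_ups_failure(single))  # False
```

The check passes only because every device has one input on each UPS, which is the cross-wiring the diagram describes.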

Likewise, if a power unit fails, for example the Power 2 unit on Node 2, or if a power cable fails, our system continues to run from the other unit and cable. The same applies to our storage: if one of the controllers fails, the system is designed to continue working through the other controller.

We have proven the necessity of the Redundant feature with this scenario.

Design of Network Infrastructure :

Our servers (not our endpoints) each have two NIC cards, and if these NIC cards come from the same hardware manufacturer we can combine them into a virtual NIC (team). We covered this in our previous article, BASP Virtual Adapter on IBM X 3650 Server: when an error occurred on one of the NIC cards, work continued over the other.

To describe our scenario starting from the servers in the diagram: on each server we have combined two physical NIC cards from the same manufacturer into a single virtual NIC, and we have connected them separately to two different switches (SW) in our DMZ network.

What we gain from this design is that when DMZ SW 1 fails, our system continues to operate over DMZ SW 2. DMZ SW 1 was connected to physical NIC 1 of our Node 2 server, to NIC 2 of our Node 1 server and to Controller A of our storage.

In case of an error on DMZ SW 1, our Node 2 server continues to operate over NIC 2, and our Node 1 server continues to operate over NIC 1.
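The effect of a switch failure on the teamed NICs can be traced with a short Python sketch. The cabling to DMZ SW 1 is taken from the description above; the links to DMZ SW 2 are inferred from the failover behaviour rather than stated explicitly, so treat them as an assumption. The `TEAM_LINKS` table and `surviving_nics` helper are hypothetical names.

```python
# Which switch each physical NIC of a team is cabled to.
TEAM_LINKS = {
    "Node 1": {"NIC 1": "DMZ SW 2", "NIC 2": "DMZ SW 1"},
    "Node 2": {"NIC 1": "DMZ SW 1", "NIC 2": "DMZ SW 2"},
}


def surviving_nics(team_links, failed_switch):
    """For each server, list the team members that still have a live switch."""
    return {server: sorted(nic for nic, switch in nics.items()
                           if switch != failed_switch)
            for server, nics in team_links.items()}


print(surviving_nics(TEAM_LINKS, "DMZ SW 1"))
# {'Node 1': ['NIC 1'], 'Node 2': ['NIC 2']}
```

Each server keeps one working team member whichever switch fails, so the virtual NIC stays reachable.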

By connecting DMZ SW 1 and DMZ SW 2 to our UPS 1 and UPS 2 power units, we also tolerate a possible fault in the electrical equipment.

Another piece of information I would like to give about our network design concerns how the switches in our network are connected to each other. A switch can be connected to another switch over an access (formerly uplink) line or a trunk line. With a trunk line we back up the ports through which two switches communicate. For example, say SW 1 communicates with SW 2 over port 1. If port 1 fails, or if there is a problem with the connecting cable, the two switches can no longer communicate. For this reason, we back up the port 1 link between SW 1 and SW 2 with another port on each switch, joined as a trunk line. The trunk line monitors the communication between the two switches and keeps it going by stepping in when a port or cable fails.

The most important point in this design is that if our switch cannot recognize a trunk line, that is, if it is not a managed switch, a LOOP is very likely to occur in our network.
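Whether a set of switch-to-switch cables contains a physical loop can be checked before plugging anything in. The Python sketch below is a minimal illustration using the union-find technique; the `has_loop` helper and the switch names are hypothetical. It only detects physical cycles: a managed switch that treats the parallel links as one trunk handles them safely, while an unmanaged switch does not.

```python
def has_loop(links):
    """Return True if the undirected switch links contain a cycle."""
    parent = {}

    def find(switch):
        """Find the representative of the group this switch belongs to."""
        parent.setdefault(switch, switch)
        while parent[switch] != switch:
            parent[switch] = parent[parent[switch]]  # shorten the path
            switch = parent[switch]
        return switch

    for a, b in links:
        root_a, root_b = find(a), find(b)
        if root_a == root_b:
            return True  # a and b were already connected: this cable closes a loop
        parent[root_a] = root_b
    return False


print(has_loop([("SW1", "SW2")]))                  # False
print(has_loop([("SW1", "SW2"), ("SW1", "SW2")]))  # True
```

The second case is the backed-up port pair from the trunk example: two cables between the same two switches are a loop unless the switches know to treat them as one trunk.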

All of these designs, whose purpose is to prevent business interruptions, apply to our system rooms, servers and similar products. As careful readers will have noticed, and judging by real-life deployments, this and similar technologies are not built for our endpoint clients. Instead, in established systems the priority is to keep the user working despite any malfunction on the client. Terminal Server applications, file server applications built with Folder Redirection, Softgrid-style application virtualization software and similar technologies have turned our end users' computers into dumb terminals that hold no data. In systems designed this way, when an error occurs on the user's side, the user continues working on another idle computer. The reason is that the data lives in our system rooms and datacenters, which are designed to keep working through any kind of error.

Fatih KARAALIOGLU
