<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Zamansiz &#187; General Programming</title>
	<atom:link href="http://zamansiz.org/archives/category/general-programming/feed" rel="self" type="application/rss+xml" />
	<link>http://zamansiz.org</link>
	<description>Minds in motion</description>
	<lastBuildDate>Tue, 25 May 2010 12:24:24 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.0.1</generator>
		<item>
		<title>Issues in Real-time System Design</title>
		<link>http://zamansiz.org/archives/166</link>
		<comments>http://zamansiz.org/archives/166#comments</comments>
		<pubDate>Mon, 19 Apr 2010 14:40:48 +0000</pubDate>
		<dc:creator>Zamansiz</dc:creator>
				<category><![CDATA[Embedded Systems]]></category>
		<category><![CDATA[General Programming]]></category>

		<guid isPermaLink="false">http://zamansiz.org/?p=166</guid>
		<description><![CDATA[Designing Realtime systems is a challenging task. Most of the challenge comes from the fact that Realtime systems have to interact with real world entities. These interactions can get fairly complex. A typical Realtime system might be interacting with thousands of such entities at the same time. For example, a telephone switching system routinely handles [...]]]></description>
			<content:encoded><![CDATA[<p>Designing Realtime systems is a challenging task. Most of the challenge comes from the fact that Realtime systems have to interact with real-world entities. These interactions can get fairly complex. A typical Realtime system might be interacting with thousands of such entities at the same time. For example, a telephone switching system routinely handles calls from tens of thousands of subscribers. The system has to connect each call differently. Also, the exact sequence of events in a call might vary a lot.</p>
<p>In the following sections we will be discussing these very issues...</p>
<ul>
<li><a href="#Real%20Time%20Response">Realtime Response</a></li>
<li><a href="#Recovering%20from%20Failures">Recovering from Failures</a></li>
<li><a href="#Working%20with%20Distributed%20Architectures">Working with Distributed Architectures</a></li>
<li><a href="#Asynchronous%20Communication">Asynchronous Communication</a></li>
<li><a href="#Race%20Conditions%20and%20Timing">Race Conditions and Timing</a></li>
</ul>
<h2><a name="Real Time Response">Realtime Response</a></h2>
<p>Realtime systems have to respond to external interactions in a  predetermined amount of time. Successful completion of an operation depends upon the  correct and timely operation of the system. Design the hardware and the software in the system to meet the Realtime requirements. For example, a telephone switching system must feed dial tone to thousands of  subscribers within a recommended limit of one second. To meet these requirements,  the off hook detection mechanism and the software message communication involved  have to work within the limited time budget. The system has to meet these  requirements for all the calls being set up at any given time.</p>
<p>The designers have to focus on the Realtime response requirements very early. During the architecture design phase, the hardware and software engineers work together to select a system architecture that will meet the requirements. This involves deciding the interconnectivity of the processors, link speeds, processor speeds, etc. The main questions to be asked are:</p>
<ul>
<li><strong><span style="color: #000080;">Is the architecture suitable?</span></strong> If message communication involves too many nodes, even mild congestion may prevent the system from meeting the Realtime requirements. Thus a simpler architecture has a better chance of meeting the Realtime requirements.</li>
</ul>
<ul>
<li><span style="color: #000080;"><strong>Are the link speeds adequate?</strong></span> Generally, loading a link beyond 40-50% is a bad idea. Higher link utilization causes queues to build up on different nodes, which introduces variable delays in message communication.</li>
</ul>
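<p>One rough way to see why link loading matters is a single-server queueing model. The sketch below assumes an M/M/1 queue (Poisson arrivals, exponentially distributed transmission times), which is a simplification the text above does not state; the function name <code>mean_delay_ms</code> and the 1 ms transmission time are illustrative assumptions.</p>

```python
def mean_delay_ms(service_time_ms: float, utilization: float) -> float:
    """Mean time a message spends waiting plus being sent on an M/M/1 link."""
    if not (1.0 > utilization >= 0.0):
        raise ValueError("utilization must be at least 0 and below 1")
    return service_time_ms / (1.0 - utilization)


if __name__ == "__main__":
    # Assumed 1 ms transmission time per message (illustrative figure only).
    for rho in (0.1, 0.3, 0.5, 0.7, 0.9):
        print(f"utilization {rho:.0%}: mean delay {mean_delay_ms(1.0, rho):.1f} ms")
```

<p>Under this model the mean delay is twice the transmission time at 50% utilization and about ten times at 90%, so delays grow sharply once the link is loaded much past the halfway mark.</p>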
<ul>
<li><span style="color: #000080;"><strong>Are the processing components powerful enough?</strong></span> A CPU with very high utilization will lead to unpredictable Realtime behavior. It is also possible that the high priority tasks in the system will starve the low priority tasks of CPU time, causing the low priority tasks to misbehave. As with links, keep the peak CPU utilization below 50%.</li>
</ul>
<ul>
<li><span style="color: #000080;"><strong>Is the Operating System suitable?</strong></span> Assign high priority to tasks that are involved in processing Realtime critical events. Consider preemptive scheduling if Realtime requirements are stringent. When choosing the operating system, verify its interrupt latency and scheduling variance.
<ul>
<li>Scheduling variance refers to the predictability of task scheduling times. For example, a telephone switching system is expected to feed dial tone in less than 500 ms. This would typically involve scheduling three to five tasks within the stipulated time. Most operating systems would easily meet these numbers as far as the mean dial tone delay is concerned, but general-purpose operating systems would have a much higher standard deviation in the dial tone delay.</li>
<li>Interrupt Latency refers to the delay with which the operating  system         can handle interrupts and schedule tasks to respond to the  interrupt.         Again, real-time operating systems would have much lower  interrupt         latency.</li>
</ul>
</li>
</ul>
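<p>The difference between mean delay and scheduling variance can be illustrated with a small calculation. The sample values below are invented for illustration and are not measurements of any operating system; the names <code>summarize</code>, <code>misses</code>, <code>rtos_like</code> and <code>general_purpose_like</code> are hypothetical.</p>

```python
import statistics


def summarize(delays_ms):
    """Return (mean, standard deviation, worst case) for a list of delays."""
    return (statistics.mean(delays_ms), statistics.pstdev(delays_ms), max(delays_ms))


def misses(delays_ms, budget_ms):
    """Count samples that exceeded the response-time budget."""
    return sum(1 for d in delays_ms if d > budget_ms)


# Invented dial tone delays in milliseconds, for illustration only.
rtos_like = [200, 205, 195, 200, 210, 190]
general_purpose_like = [100, 120, 110, 90, 130, 650]

if __name__ == "__main__":
    for name, samples in (("rtos_like", rtos_like),
                          ("general_purpose_like", general_purpose_like)):
        mean, stdev, worst = summarize(samples)
        print(name, mean, round(stdev, 1), worst, misses(samples, 500))
```

<p>Both sample sets have the same mean, yet only the second one exceeds a 500 ms budget. This is why the mean alone is not enough to judge whether a system meets a Realtime requirement.</p>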
<h2><a name="Recovering from Failures">Recovering from Failures</a></h2>
<p>Realtime systems must function reliably in the event of failures. These failures can be internal as well as external. The following sections discuss the issues involved in handling these failures.</p>
<h3>Internal Failures</h3>
<p>Internal failures can be due to hardware and software failures in the system. The different types of failures you would typically  expect are:</p>
<ul>
<li><strong><span style="color: #000080;">Software Failures in a Task:</span></strong> Unlike desktop applications, Realtime applications do not have the luxury of popping up a dialog box and exiting on detecting a failure. Design the tasks to safeguard against error conditions. This becomes even more important in a Realtime system because the sequence of events can result in a large number of scenarios, and it may not be possible to test all of them in the laboratory environment. Thus apply defensive checks to recover from error conditions. Also, some software error conditions might lead to a task hitting a processor exception. In such cases, it might sometimes be possible to simply roll back the task to its previously saved state.</li>
</ul>
<ul>
<li><span style="color: #000080;"><strong>Processor Restart:</strong></span> Most Realtime systems are made up of multiple nodes. It is not acceptable to bring down the complete system when a single node fails, so design the software to handle the independent failure of any node. This involves two activities:
<ol>
<li><strong><span style="color: #000080;">Handling Processor Failure: </span></strong> When a processor fails, other processors have to be notified  about the         failure. These processors will then abort any interactions with  the         failed processor node. For example, if a control processor  fails, the         telephone switch clears all calls involving that processor.</li>
<li><span style="color: #000080;"><strong>Recovering Context for the Failed  Processor: </strong></span> When the failed processor comes back up, it will have to recover  all its         lost context from other processors in the system. There is  always a         chance of inconsistencies between different processors in the  system. In         such cases, the system runs audits to resolve any  inconsistencies.         Taking our switch example, once the control processor comes up  it will         recover the status of subscriber ports from other processors. To  avoid         any inconsistencies, the system initiates audits to crosscheck         data-structures on the different control processors.</li>
</ol>
</li>
</ul>
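<p>The audit step described under processor restart can be sketched as a comparison of two status tables. This is a minimal illustration, not how any particular switch implements audits: the table layout, the names <code>audit_ports</code> and <code>resolve</code>, and the policy of treating the peer that stayed up as authoritative are all assumptions.</p>

```python
def audit_ports(recovered, peer):
    """Return ports whose status differs between two processors' tables.

    Each table maps a port id to a status string. A port missing from one
    table is reported with status None on that side.
    """
    mismatches = {}
    for port in sorted(set(recovered) | set(peer)):
        ours, theirs = recovered.get(port), peer.get(port)
        if ours != theirs:
            mismatches[port] = (ours, theirs)
    return mismatches


def resolve(recovered, peer):
    """Repair the recovered table, treating the peer as authoritative (assumed policy)."""
    for port, (_, theirs) in audit_ports(recovered, peer).items():
        if theirs is None:
            del recovered[port]  # peer does not know this port: drop the stale entry
        else:
            recovered[port] = theirs
    return recovered


if __name__ == "__main__":
    recovered = {1: "idle", 2: "busy", 4: "idle"}
    peer = {1: "idle", 2: "idle", 3: "busy"}
    print(audit_ports(recovered, peer))
    print(resolve(recovered, peer))
```

<p>A real audit would also have to cope with the tables changing while the audit runs; the sketch ignores that and compares two snapshots.</p>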
<ul>
<li><strong><span style="color: #000080;">Board Failure: </span></strong>Realtime systems are expected to recover from hardware failures. The system should be able to detect and recover from board failures. When a board fails, the system notifies the operator about it. If the failed board has a spare, the system should also be able to switch in the spare.</li>
</ul>
<ul>
<li><strong><span style="color: #000080;">Link Failure:</span></strong> Most of the  communication in     Realtime systems takes place over links connecting the different  processing     nodes in the system. Again, the system isolates a link failure and  reroutes     messages so that link failure does not disturb the message  communication.</li>
</ul>
<h3>External Failures</h3>
<p>Realtime systems have to perform in the real world. Thus they should recover from failures in the external environment. Different  types of failures that can take place in the environment are:</p>
<ul>
<li><span style="color: #000080;"><strong>Invalid Behavior of External Entities: </strong></span> When a Realtime system interacts with external entities, it should be able to handle all possible failure conditions from these entities. A good example of this is the way a telephone switching system handles calls from subscribers. In this case, the system is interacting with humans, so it should handle all kinds of failures, like:
<ol>
<li>Subscriber goes off hook but does not dial</li>
<li>Toddler playing with the phone!</li>
<li>Subscriber hangs up before completing dialing.</li>
</ol>
</li>
</ul>
<ul>
<li><strong><span style="color: #000080;">Interconnectivity Failure:</span></strong> A Realtime system is often distributed across several locations, with external links connecting these locations. Handling failures of these links is similar to handling internal link failures. The major difference is that such failures might last for an extended duration, and it is often not possible to reroute the messages.</li>
</ul>
<h2><a name="Working with Distributed Architectures">Working with Distributed Architectures</a></h2>
<p>Most Realtime systems involve processing on several different nodes.  The system itself distributes the processing load among several processors.  This introduces several challenges in design:</p>
<ul>
<li><strong><span style="color: #000080;">Maintaining Consistency:</span></strong> Maintaining     data-structure consistency is a challenge when multiple processors  are     involved in feature execution. Consistency is generally maintained  by     running data-structure audits.</li>
<li><strong><span style="color: #000080;">Initializing the System:</span></strong> Initializing a system with multiple processors is far more complicated than bringing up a single machine. In most systems the software release is resident on the OMC (Operations and Maintenance Center). The node that is directly connected to the OMC will initialize first. When this node finishes initialization, it will initiate software downloads for the child nodes directly connected to it. This process goes on in a hierarchical fashion until the complete system is initialized.</li>
<li><strong><span style="color: #000080;">Inter-Processor Interfaces:</span></strong> One of the biggest headaches in Realtime systems is defining and maintaining message interfaces. Defining interfaces is complicated by the different byte ordering and padding rules of different processors. Maintaining interfaces is complicated by backward compatibility issues. For example, if a cellular system changes the air interface protocol for a new breed of phones, it will still have to support interfaces with older phones.</li>
<li><strong><span style="color: #000080;">Load Distribution:</span></strong> When multiple processors and links are involved in message interactions, distributing the load evenly can be a daunting task. If the system has an evenly balanced load, the capacity of the system can be increased by adding more processors. Such systems are said to scale linearly with increasing processing power. But designers often find themselves in a position where a single processor or link becomes a bottleneck. This leads to costly redesign of the features to improve system scalability.</li>
<li><strong><span style="color: #000080;">Centralized Resource Allocation:</span></strong> Distributed systems may be running on multiple processors, but they have to allocate resources from a shared pool. Allocation from the shared pool is typically managed by a single processor. If the system is not designed carefully, this shared resource allocator can become a bottleneck in achieving full system capacity.</li>
</ul>
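<p>The byte ordering and padding problem mentioned under inter-processor interfaces is usually handled by fixing the wire format explicitly instead of relying on each processor's native layout. The sketch below uses Python's <code>struct</code> module; the message layout (message type, port, call id) and the function names are hypothetical.</p>

```python
import struct

# Hypothetical message layout: 2-byte message type, 2-byte port, 4-byte call id.
# "!" selects network (big-endian) byte order with no implicit padding.
CALL_SETUP = struct.Struct("!HHI")


def encode_call_setup(msg_type, port, call_id):
    """Pack the fields into the fixed 8-byte wire format."""
    return CALL_SETUP.pack(msg_type, port, call_id)


def decode_call_setup(data):
    """Unpack an 8-byte message into (msg_type, port, call_id)."""
    return CALL_SETUP.unpack(data)


if __name__ == "__main__":
    wire = encode_call_setup(1, 2, 3)
    print(wire.hex(), decode_call_setup(wire))
```

<p>Because the format string pins down both field order and byte order, a sender and a receiver with different native conventions still agree on the bytes exchanged. Backward compatibility would need more than this, for example a version field, which the sketch leaves out.</p>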
<h2><a name="Asynchronous Communication">Asynchronous Communication</a></h2>
<p>Remote procedure calls (RPC) are used in computer systems to simplify software design. RPC allows a programmer to call procedures on a remote  machine with the same semantics as local procedure calls. RPCs really simplify  the design and development of conventional systems, but they are of very  limited use in Realtime systems. The main reason is that most communication in the  real world is asynchronous in nature, i.e. very few message interactions can  be classified into the query response paradigm that works so well using  RPCs.</p>
<p>Thus most Realtime systems support state machine based design, where multiple messages can be received in a single state. The next state is determined by the contents of the received message. State machines provide a very flexible mechanism to handle asynchronous message interactions, but this flexibility comes with its own complexities. We will be covering state machine design issues in future additions to the Realtime Mantra.</p>
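<p>A state machine of this kind can be sketched as a transition table keyed by the current state and the received message. The states and events below are a hypothetical, much simplified model of one subscriber line, chosen only to show several messages being accepted in the same state.</p>

```python
# Hypothetical states and events for one subscriber line.
TRANSITIONS = {
    ("idle", "off_hook"): "awaiting_digits",
    ("awaiting_digits", "digit"): "collecting_digits",
    ("awaiting_digits", "on_hook"): "idle",
    ("awaiting_digits", "timeout"): "idle",
    ("collecting_digits", "digit"): "collecting_digits",
    ("collecting_digits", "dialing_complete"): "connected",
    ("collecting_digits", "on_hook"): "idle",
    ("connected", "on_hook"): "idle",
}


class LineStateMachine:
    def __init__(self):
        self.state = "idle"

    def handle(self, event):
        """Move to the next state; unexpected events are ignored, not fatal."""
        self.state = TRANSITIONS.get((self.state, event), self.state)
        return self.state


if __name__ == "__main__":
    line = LineStateMachine()
    for event in ("off_hook", "digit", "digit", "dialing_complete", "on_hook"):
        print(event, "->", line.handle(event))
```

<p>Ignoring unexpected events is one possible policy. A production design would more likely log them or trigger recovery, since an unexpected message often signals an inconsistency.</p>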
<h2><a name="Race Conditions and Timing">Race Conditions and Timing</a></h2>
<p>It is said that the three most important things in Realtime system design are timing, timing and timing. A brief look at any protocol will underscore the importance of timing. All the steps in a protocol are described with an exact timing specification for each stage. Most protocols will also specify how the timing should vary with increasing load. Realtime systems deal with timing issues by using timers. Timers are started to monitor the progress of events. If the expected event takes place, the timer is stopped. If the expected event does not take place, the timer will time out and recovery action will be triggered.</p>
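<p>The start, stop and timeout pattern described above can be sketched with a small timer table driven by a logical clock. Using explicit ticks instead of wall-clock time is an assumption made to keep the example deterministic; the class name <code>Timers</code> and the timer names are hypothetical.</p>

```python
class Timers:
    """Timers driven by an explicit logical clock (no real sleeping)."""

    def __init__(self):
        self.now = 0
        self._pending = {}  # timer name -> (expiry time, recovery callback)

    def start(self, name, duration, on_timeout):
        """Begin monitoring an expected event."""
        self._pending[name] = (self.now + duration, on_timeout)

    def stop(self, name):
        """Call when the expected event arrives; returns False if the timer is not running."""
        return self._pending.pop(name, None) is not None

    def advance(self, ticks):
        """Move the clock forward and fire every timer that has expired."""
        self.now += ticks
        expired = [n for n, (t, _) in self._pending.items() if self.now >= t]
        for name in expired:
            _, on_timeout = self._pending.pop(name)
            on_timeout(name)


if __name__ == "__main__":
    timers = Timers()
    timers.start("await_digits", 5, lambda name: print("recovery for", name))
    timers.advance(5)  # the expected event never arrived, so recovery runs
```

<p>Stopping a timer as soon as the expected event arrives matters as much as starting it: a forgotten timer would later trigger recovery for an operation that actually succeeded.</p>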
<p>A race condition occurs when the state of a resource depends on timing factors that are not predictable. This is best explained with an example. Telephone exchanges have two-way trunks that can be used by either of the two exchanges connected by the trunk. The problem is that both ends can allocate the trunk at more or less the same time, resulting in a race condition: the same trunk has been allocated for an incoming and an outgoing call. This race condition can be easily resolved by defining rules on who gets to keep the resource when such a clash occurs. The likelihood of a clash can be reduced by requiring the two exchanges to allocate from opposite ends of the pool. Thus there will be no clashes under low load. Under high load, race conditions will still be hit and will be resolved by the predefined rules.</p>
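<p>The trunk example can be sketched as follows. Each exchange keeps its own view of the shared pool, searches from its own end, and applies a pre-agreed rule when both ends seize the same trunk (a clash often called glare). The class, its methods and the "one side always wins" rule are illustrative assumptions, not a description of any signalling standard.</p>

```python
class TwoWayTrunkPool:
    """One exchange's view of a shared pool of two-way trunks."""

    def __init__(self, trunk_count, from_low_end, wins_clash):
        self.free = set(range(trunk_count))
        self.from_low_end = from_low_end  # search direction for this exchange
        self.wins_clash = wins_clash      # pre-agreed rule for simultaneous seizure

    def seize(self):
        """Pick a free trunk from this exchange's end of the pool."""
        if not self.free:
            return None
        trunk = min(self.free) if self.from_low_end else max(self.free)
        self.free.discard(trunk)
        return trunk

    def remote_seized(self, trunk):
        """The far end seized the trunk; report whether our own call must back off."""
        clash = trunk not in self.free  # we had already taken it ourselves
        self.free.discard(trunk)
        return clash and not self.wins_clash


if __name__ == "__main__":
    a = TwoWayTrunkPool(1, from_low_end=True, wins_clash=True)
    b = TwoWayTrunkPool(1, from_low_end=False, wins_clash=False)
    trunk_a, trunk_b = a.seize(), b.seize()  # both grab the only trunk
    print(a.remote_seized(trunk_b), b.remote_seized(trunk_a))
```

<p>With a single trunk both ends seize it at once and exactly one of them backs off. With a larger pool the two exchanges start at opposite ends and only meet when nearly every trunk is in use.</p>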
<p>A more conservative design would partition the two-way trunk pool into two one-way pools. This would avoid the race condition but would fragment the resource pool.</p>
<p>The main issue here is identifying race conditions. Most race  conditions are not as simple as this one. Some of them are subtle and can only be  identified by careful examination of the design.</p>
<p>Source: <a title="Event Helix" href="http://www.eventhelix.com/RealtimeMantra/IssuesInRealtimeSystemDesign.htm" target="_blank">EventHelix</a></p>
]]></content:encoded>
			<wfw:commentRss>http://zamansiz.org/archives/166/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
