paulengr Posted November 10, 2007

I'm in the rough "thinking" stages about the control network fabric at the plant I work at. Right now, the telephone system (VoIP), security cameras, office PCs, and control network all share the same switches and fiber...the switches, at least, are owned by IT. And they're not the least bit concerned about uptime in the same way that I am. A few minutes of downtime is not really considered a big deal. It seems to take a Cisco switch about 10 minutes to reboot. It also goes all draconian on you if it ever sees too much noise/garbage on a port and may even permanently lock out a port on its own...which makes things a bitch to troubleshoot if you don't have access to the switch.

So...long story short, I'm going to be buying switches some time soon. Managed switches, with redundancy capabilities, IGMP, etc. So far it seems like at least N-Tron, Sixnet, Hirschmann, and probably a few others are out there. Does anyone have any particular preferences, horror/angel stories, etc., good or bad?
Nathan Posted November 10, 2007 (edited)

Paul,

First - I'd HIGHLY RECOMMEND that you try to work things out with IT in this case. Do you have specific requirements to share, either prescribed, actual, or anticipated, in terms of switch capabilities? I know you mentioned managed, redundancy, etc. I find that in most real-world applications KISS is the way to go - the question is how simple a solution can really meet your needs. In many cases users overestimate their hardware needs, or talk big about how crucial uptime is, but aren't realistically considering what it takes to get there. They imagine that throwing a few more bucks at hardware for "redundancy" is going to magically give them six-sigma uptime - it ain't gonna happen! Alternatively, I've set up networks with no onsite support staff that have little downtime. But we're talking unmanaged, non-redundant switches, or at the very most, simple managed switches like Linksys offers. Add complexity, configuration changes, redundancy, and tighter downtime allowances and you'll find yourself in a much more difficult situation. Going from 95% to 99% to 99.9% to 99.99% uptime might as well be exponential in terms of complexity and cost, especially on a larger network.

I work in an all-Cisco environment where we have many different devices and a big budget: routers, layer 2 & 3 switches, lots of VoIP, IDS, PIX firewalls, etc., on several distinct enclaves (separate networks, basically). Our uptime is decent, but probably not as high as what you're looking for. Typically when we go down it's on one of the many router hops upstream that provide services to us. More often than not, it's a crypto issue, as we have several encrypted networks riding on each other over commercial services. Most likely that's not something you'll have to deal with. Downtime also occurs when users or administrators make changes - this becomes tough in a complex environment, as you mentioned. We can configure all kinds of really cool stuff, but it's complicated.

I've had my MCSE since the summer of 2000 and have held several IT jobs that include networking - I consider myself fairly well grounded in the topic. However, I'm basically at the very beginning of the CCNA/CCNP road with a lot to learn about Cisco, IOS, and how we're configured. I mention this because you really want IT taking care of this sort of system - it's not something that you can "figure out" with a little trial and error, and it's certainly not something you can blow them off about because "they don't care about uptime as much as we do". That is, unless you're truly committed to learning an entirely different skillset. Even then, you'll need to be available often to take care of the network when a problem does arise. A complex network will not manage itself.

So, as I see it, your likely options include:
1. Install a simple system that's easy to troubleshoot. This is cheap and easy. You can do this yourself or get IT support. It may not ultimately satisfy your requirements.
2. Have IT provide and support the network layer. This could take compromise and understanding on both ends.

Sorry I don't have any input on other brands. I find it unlikely that uptime and complexity/simplicity will vary much between systems of similar capability to Cisco, especially as dictated by anecdotal horror/angel stories. I know that this isn't what you want to hear.

Edited November 10, 2007 by Nathan
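[Editor's aside: to put rough numbers on those uptime tiers, here is a back-of-the-envelope calculation. A minimal Python sketch; the only assumption is that uptime is measured over a calendar year.]

```python
# Rough downtime budget per year for a few uptime targets.
HOURS_PER_YEAR = 24 * 365

for uptime_pct in (95.0, 99.0, 99.9, 99.99):
    downtime_hours = HOURS_PER_YEAR * (100.0 - uptime_pct) / 100.0
    print(f"{uptime_pct:>6}% uptime -> {downtime_hours:8.2f} h/year "
          f"(about {downtime_hours * 60:.0f} min)")
```

95% works out to roughly 18 days of downtime a year, while 99.99% leaves only about 53 minutes - which is why each additional "nine" costs disproportionately more in hardware, configuration, and support.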
paulengr Posted November 10, 2007

A lot of questions. I've been thinking long and hard about the plant network, and I've already made several changes. One of the critical plant backbones was run on plastic cable ties up the side of a building without any gutters. My first winter at this plant, the melting ice/snow formed icicles on the fiber and shredded it, taking out half the plant in the process. This fiber (and one other) was promptly moved to more hardened locations. The fiber termination points were all cheap NEMA 1 boxes; this was also rectified. The backbone switches (which also provide PoE to the VoIP network and surveillance systems) were just stuck wherever and plugged in. Several of the critical ones were moved onto UPS protection (with battery management). The server stack had nothing at all except some junk UPSs. It's now on UPS protection with an automatic transfer switch and natural gas generator backup. Needless to say, uptime has improved substantially, but there is still lots of room for improvement.

The physical production plant is roughly "Z" shaped. The logical layout is currently a star with 3500 series or 2600 series Cisco switches at the ends of the star. The center of the star (not surprisingly, the server stacks) is 100 Mbps. All of the star legs are 1 Gbps fiber. Unfortunately they ran 62.5 um multimode fiber, not realizing that the plant exceeds the maximum length for the cheaper multimode transceivers (I forget the nomenclature, but we have to run the long-haul transceivers for the long links). The cables are at least 12 fibers each and usually more. Less than 25% of the individual fibers are in use: other than coming out from the center of the star (located across the street), there are only 2 active fibers on almost every run (lots of dark fiber). The only part of the network that is separated is the phone system, which is VLAN'd (the phones are all IP phones except for a 48-line gateway which covers the older stuff). There are a couple of wireless access points, but they're used purely for convenience (tetherless laptops in conference rooms). Everything else is wide open.

In terms of reliability, I installed a natural gas generator on an automatic transfer switch covering the core (server room), complete with a separate (and powered) air conditioner. With the amount of hardware (about a dozen servers, T1 stuff, phone stuff, etc.), the existing pair of UPSs is just enough to give them about 8-10 minutes of backup power, plenty of time for the generator (30-60 seconds startup time), and no real reason to upgrade that stuff further. The outlying switches are located in fairly well protected cabinets. Some of them (in critical areas) have UPS protection, which gives us about 4 hours of battery life. Since the majority of the plant uses VoIP, this is kind of critical to have...something most people don't plan for until they have a couple of power blips and everything goes out, including phone service.

So...in terms of reliability, the biggest problems I've had come down to the fact that Cisco switches are rather finicky at times and tend to lock out ports very quickly (troubleshooting is a bitch when IT is a 9-5 operation and I don't have access to the switch fabric), and they have rather long bootup times (10-20 minutes sometimes before a port activates...sometimes the switch has to be rebooted). The second problem is when a fiber takes a hit and goes down. These are my #1 & #2 problems in terms of outages.
In terms of QoS, the biggest problem is that since the network is wide open and we have no traffic controls, it is fairly easy to "swamp" the system, especially since the backbone is 1 Gbps but the server rack is only 100 Mbps. So when someone in accounting does a big download, it kills the control network. In terms of "my stuff" in the server area, there are 4 machines: a SQL server, a "management" server (generates reports, web-based HMI displays, etc.), and a redundant pair of operator HMI servers (hot backup with the capability to transition the floor machines onto the backup server in about 5-10 seconds). I was able to use the second gigabit Ethernet ports with a crossover cable to reduce the inter-HMI-server traffic, but the main IT switch for the server banks is all 100 Mbps.

There are 6 total legs on the star. Only 4 of them contain parts of the control system (plus the central hub). Physically, we're building a new extension to the emission system and a new substation over the next couple of years that will have duct work conveniently located, so it would be a trivial matter to run one extra fiber and make a ring out of the system - essentially taking the "Z" I was referring to and turning half of it into a triangle. There would be two remaining points of vulnerability. The last leg going to the server stack would end up running both legs of the ring across the same physical fiber. And the second point would be reaching the last corner of the "Z". This last corner is physically located in such a way that I could either run fiber down existing poles and eliminate both cable vulnerabilities, or set up the backup leg of the ring as a wireless point-to-point link (it's line of sight, about 500-600 feet of distance). This would convert everything into a true physical ring...physically about the shape of a kidney bean (a squashed ring).

So...divorcing the control and office networks will solve the QoS and administrative issues. The cost isn't too drastic in doing so; it mostly involves locating a bunch of switches right alongside existing ones. Adding the last fiber run allows the system to be converted to a true ring with no shared fibers, and the interim plan can include running shared fiber over 2 of the paths (out of about 6). I've got a good handle on the physical layout, etc. The problem comes down to switch selection. Overall, I haven't been too impressed with Cisco. They don't exactly have an "industrial" line of products. Hence the reason I'm throwing out the question of which switches anyone has had experience with, good or bad. So far all of my existing switches (I always isolate a PLC, operator panels, remote I/O, etc., from the IT network with at least one separate switch) have been unmanaged. I'm plenty comfortable with managed switches (I end up helping the IT guys diagnose network problems anyway). But Cisco switches have not been what I would consider the ideal fit for industrial systems. So I'm just asking around about other stuff out there. We've done a fair amount of installations with Hirschmann switches, but not their gigabit stuff. We've also obviously got lots of Cisco stuff. There are still some "D-Link" or "Linksys" type home/office grade switches floating around, but I try to eradicate them whenever I find one. The hubs were gone about a year ago, never to be seen again (hopefully).
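[Editor's aside: a quick sketch of why the 100 Mbps server uplink is the likely choke point. This ignores protocol overhead and assumes the transfer runs at line rate; the 2 GB file size is just an illustrative assumption.]

```python
# How long a large office file transfer can hog a shared link,
# ignoring protocol overhead and assuming it runs at line rate.
def transfer_seconds(file_gigabytes: float, link_mbps: float) -> float:
    bits = file_gigabytes * 8e9          # GB -> bits (decimal units)
    return bits / (link_mbps * 1e6)      # Mbps -> bits per second

for link_mbps in (100, 1000):
    t = transfer_seconds(2.0, link_mbps)  # e.g. a 2 GB download
    print(f"2 GB file over a {link_mbps} Mbps link: ~{t:.0f} s of saturation")
```

A single large transfer can pin a shared 100 Mbps uplink for a couple of minutes versus seconds on gigabit, which lines up with the "accounting download kills the control network" symptom; moving the server uplinks to gigabit or giving control traffic its own path removes that choke point.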
Nathan Posted November 10, 2007 (edited)

Ok - so you are at a site and application where you need, and are footing the bill for, enterprise-level stuff. Your first paragraph seemed to take care of the really bad obvious problems - keep attacking the "low hanging fruit"! I think something's wrong with the transfer performance that you're getting, both between nodes on the same switch and between switches, but who knows; you'll have to dig deeper to figure that out. Most of what I'm describing is fairly vendor neutral. I'll ask around about environmentally hardened equipment - I doubt that'd do much to combat the problems you're describing. For example, if dust is taking out your equipment, put it in a clean room. I'd be more wary of the additional administrative complexity from mixing brands.

0. You'll have to figure out the fiber in terms of repeaters, more expensive transceivers, etc.

1. Slow boot-up times, or times when the equipment is establishing spanning trees and so on, don't surprise me. Cisco equipment being finicky when you're configuring it - doesn't surprise me. What's the deal with ports randomly "locking" connections out? I've never heard of that.

1a. I'll ask my Cisco guy about the 10-20 minute bootup time for ports on Tuesday. That makes no sense to me.

2. Running a second set of equipment is an option for you since you have so much dark fiber. However, it introduces more potential points of failure and probably more complexity. I would recommend first looking at a real QoS implementation and making sure all your backbones run at a gig. How many users do you have, and what kind of data requirements? I find it hard to believe that you would exceed that on the HMI "server room" that you described.

3. It doesn't seem right that someone from accounting downloading a file would disrupt the controls network. The back-end capacity of a 100 Mbps switch is enormous - there's almost no way you could saturate that. Your weakest link should be across the star at 1000 Mbps. The fastest that a regular hard drive can stream data is roughly 200 Mbps. A RAID array can do a little better, but any file they're downloading will be finished quickly at that rate. Am I missing something? Is your whole server room connection somehow a shared 100 meg?

3a. I'm guessing that your "2 operator HMI servers" are running some sort of network-intensive thin client with concurrent connections. It would ease a lot of your traffic if those were on the same switch as your thin clients. Granted, you would need gigabit connections from the HMI server to the switch (which you should do anyway - I don't think that'd slow down the switch any).

3b. If 3a isn't an option and you're saturating your most-used gigabit connection, then it's time to add another. You already seem to have extra fiber runs. I think you can connect multiple fiber uplinks between switches.

4. It would be valuable for you to collect data on how saturated your network really is and do data transfer tests.

5. Production should trump IT. If you guys work around the clock and the network brings down your operation, then IT can't be 9-5. It comes down to dollars and cents, and management should understand that. This may mean that you're volunteering yourself to train with IT and support your network.

6. Not a lot you can do about fiber taking a hit. You already hardened the path. The ring's a good step - I think they support a true star topology, but options may be limited in your Z-shaped plant. Fiber doesn't just "quit working".
It takes idiots with a backhoe or something. Policy, training, network diagrams, and communication can save you here. Then make sure you have monitoring software in addition to alerts (like What's Up Gold).

7. Back to QoS. Divorcing the two networks is an option and could eliminate the need for QoS on the controls side. If the second network is as complex as the first, it won't solve anything on the administration side - at least it won't give you any more uptime there. Say they do let you manage it: are you going to know what to do with it? We're talking about troubleshooting when there are problems, not normal operations. If you can effectively do that, then you might as well be working with them on your portion of their network in the first place. They're not withholding access just to be mean. And real QoS would have other benefits besides your control system, namely VoIP.

8. Replacing small switches with quality managed ones is a good idea here.

Edited November 10, 2007 by Nathan
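[Editor's aside: on the "monitoring software in addition to alerts" point, even a very small poller goes a long way before buying a product like What's Up Gold. A minimal sketch in Python, standard library only - the device names, addresses, and ports below are placeholders, and it only checks that a TCP port answers, not that the application behind it is healthy.]

```python
# Minimal "is it up?" poller: tries a TCP connect to each device and logs
# state changes with a timestamp. Addresses and ports are placeholders.
import socket
import time
from datetime import datetime

DEVICES = {
    "HMI server A": ("192.168.10.11", 8080),
    "HMI server B": ("192.168.10.12", 8080),
    "PLC line 1":   ("192.168.20.5", 44818),   # EtherNet/IP port
}

last_state = {}

def is_up(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

while True:
    for name, (host, port) in DEVICES.items():
        up = is_up(host, port)
        if last_state.get(name) != up:
            stamp = datetime.now().isoformat(timespec="seconds")
            print(f"{stamp}  {name} ({host}:{port}) is now "
                  f"{'UP' if up else 'DOWN'}")
            last_state[name] = up
    time.sleep(30)   # poll interval in seconds
```

Run from a PC that can see both ends of each fiber leg, a log like this at least tells you when a link dropped and in what order devices disappeared, which narrows down whether the culprit was a switch, a fiber, or the device itself.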
paulengr Posted November 10, 2007

Already done. I have unmanaged switches anywhere there are operator interfaces ("PanelViews" and such) and networked I/O with static IP addressing. The reason is that IT wouldn't think twice about screwing around with switch configuration and taking production systems down. At least until I start generating IGMP (EtherNet/IP) traffic, this prevents screwups in the managed switch configurations, and resets of them, from taking down production equipment.

We had a bunch of "Cyclostak" switches (I think that was the name), and there were cheap Walmart hubs scattered all over the place when I started. I went through all the cabinets I could find and threw away the hubs, replacing them with additional home runs or 100 Mbps unmanaged switches as needed. The "Cyclostak" switches were replaced with Cisco 2900's or 3500's (depending on whether PoE & SIPS was needed or not).

By way of example of the problems I have now, it took me 5 different ports on a Cisco 2900 series switch before I could find one that would handle DHCP correctly, just so I could get my laptop onto the network to diagnose a PLC problem. I was trying to figure out why I lost connectivity between two PLCs that jointly operate a piece of equipment, which went into shutdown because the network connection was lost. The connection was originally lost because of a power failure (a 1960's-vintage drawout circuit breaker took a hit from a large motor starter with miscoordinated protection downstream...we've had several more problems with the breaker since). After about 20 minutes, the IT guy rebooted the Cisco switch, and everything was fine about 5 minutes later. Needless to say, that's why my confidence in Cisco equipment is 0.0% and I'm looking into alternatives.
Nathan Posted November 11, 2007 (edited)

Paul,

Sounds like, all in all, you're on top of the situation. A few more thoughts:

1. Implementing QoS would go a long way. However, your stuff would probably be in the middle of the pecking order anyway. You'd probably be better off justifying link aggregation, oops, "trunking" ;) of other 100 meg drops there - if not gigabit. Or just coordinate backup times.

2. You might want to consider having Cisco consultants, the CCNP/CCIE type, come check out your network for advice. They may be able to help with: fixing the port garbage thing or port monitoring, the bootup time, and VoIP configuration, which might include QoS. The trick is to get IT to make it their idea. Again, I'll ask my Cisco guy about those topics.

3. In response to #5 - I understand that it's their network. That's where you bring your requirements to them (possibly via management) and allow them to engineer and support a solution for you that meets your production-driven needs, that is, the bottom line of the business. I don't mean "production trumps IT" the way I hear integrators justifying it when they sidestep IT in a moonlighting scenario.

4. Clean-up crew - whoa, that's rough. All you can do there is your best, and monitor, monitor, monitor. Management should understand that their job entails acceptable risk. If that's not the case, then you need better supervision, training, whatever.

5. I think your DHCP troubleshooting example may not be a fault of Cisco and would probably be possible with any comparable device. The part that does worry me is your mention that rebooting it fixed the problem. The example seems like concluding that AB PLCs don't work because integrator Bob's factory is always down. Too many large-scale 24/7 operations run Cisco. In any event, your example supports my point about those 2 options - KISS, or becoming more technical in this area (or, equivalently, leveraging IT). I'm not trying to get high and mighty here - just prove a point. You shouldn't be wasting time plugging into 5 ports to troubleshoot a DHCP issue. With VLANs and broadcast domains and DHCP proxies and stuff, you've got a lot of configuration flexibility that can hose you here. You also should have higher-level end-to-end connectivity monitoring tools that convey the network picture to you immediately. You also could use a static address and not worry about DHCP.

My point is that controls guys who are wizards with physical and electrical systems - and I'm not just picking on you here - commonly presume to understand networking because they've set up really simple systems where the vendors "auto-magically" do the back-end work. They typically seem to think that they understand the system or can easily "figure it out" - fine, more power to them for the initiative. But what kills me is when they slam IT as a legitimate profession, the young guys especially. These same guys do idiotic things on the network end. I can't tell you how many times I've seen IP addresses like 1.2.3.4 with subnet masks that don't make sense, or dangling modems with PC Anywhere, or guys randomly guessing at simple networking problems until it seems to work (firewalls, port forwarding, file sharing, etc.), or policies or automated software to reboot the computer periodically for a software issue. Sure, you can power cycle a hardware device on a problem, but Windows? I'm pretty much done with my off-topic rant.

For your setup - running a separate simple managed switch for controls might be the best option.
Then you could troubleshoot it easily. When I was saying unmanaged versus managed, I was off in my terminology. Managed is always great for things like SNMP, turning ports on/off, MAC filtering, etc. I was really thinking about the layer 2/layer 3 features that you probably don't need on the controls side - that's part of what makes configuration and troubleshooting tough. It sounds like your entire control network just needs to be "one network" - one broadcast domain, one subnet, no routing involved. Simple. It's not like you have 30,000 users all logging on to thin clients at 9 AM. That was why I brought up the "high end" low-end business Linksys switch. You can do VLANs and stuff all through their web-based interface. A little networking background is necessary, but you're already way past there. It's certainly no IOS in terms of flexibility or complexity. An "industrial" version of what I described above is probably what you're looking for. Your best bet may be looking for mil-spec gear - but we usually go with Cisco.

Edited November 11, 2007 by Nathan
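[Editor's aside: on the SNMP point - once the controls switches are managed, even a tiny script can read port status without logging in to the switch. A rough sketch, assuming the third-party pysnmp package is installed, SNMP v2c is enabled on the switch, and a read-only community string exists; the management address, community string, and interface indexes below are all placeholders.]

```python
# Sketch: read ifOperStatus for a few switch ports over SNMP v2c.
# Assumes pysnmp (pip install pysnmp) and SNMP enabled on the switch.
from pysnmp.hlapi import (SnmpEngine, CommunityData, UdpTransportTarget,
                          ContextData, ObjectType, ObjectIdentity, getCmd)

SWITCH = "192.168.10.2"      # placeholder management address
COMMUNITY = "public"         # placeholder read-only community
IF_INDEXES = (1, 2, 3)       # interface indexes to check

for if_index in IF_INDEXES:
    err_ind, err_stat, err_idx, var_binds = next(getCmd(
        SnmpEngine(),
        CommunityData(COMMUNITY, mpModel=1),        # mpModel=1 -> SNMP v2c
        UdpTransportTarget((SWITCH, 161)),
        ContextData(),
        ObjectType(ObjectIdentity("IF-MIB", "ifOperStatus", if_index))))
    if err_ind or err_stat:
        print(f"ifIndex {if_index}: query failed ({err_ind or err_stat})")
    else:
        for var_bind in var_binds:
            # ifOperStatus: 1 = up, 2 = down, 3 = testing (per IF-MIB)
            print(" = ".join(x.prettyPrint() for x in var_bind))
```

The same pattern works for the interface error counters in IF-MIB (ifInErrors/ifOutErrors), which is one way to spot the "noise/garbage on a port" condition before the switch decides to shut the port down on its own.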
Ken Moore Posted November 11, 2007

You guys are over my head, but I did want to comment. We have separate LANs for business and controls, but both use the same fiber runs - two physically separate networks. We have a gigabit backbone with 100 Mb drops to the equipment. The two networks are all Cisco, and I have never experienced the problems you have. We do not have any local IT support, so I have to maintain both on the physical side, but the logical side is all handled remotely. The major difference is that we have a Cisco guru at the enterprise level (I believe he is actually a Cisco employee); once the networks were set up, we've never had an outage from the network.

The BLAN has VLANs and is very heavily managed. The process LAN is lightly managed and does not use VLANs; any device on the network can connect to any other device, regardless of location or switch used. The PCN switches are in locked, dedicated enclosures with UPS power from the process UPS system. The BLAN uses a 4750 core and 48-port 2950's in the field, star topology. The PCN uses a 3750 core and 24-port 2960's in the field, star topology. The servers connect to the cores via gigabit fiber. In both cases the field switches are home runs to the core, and all the devices are home runs to the switches. I ran at least two copper drops to every location, so that I would have at least one spare for laptops, bad ports, etc.

I think you have two major issues: the 100 Mb connection bottleneck you mentioned, and your IT people not having the Cisco stuff set up correctly. I believe enterprise-grade Cisco switches are a reliable and robust product; you just have to configure them correctly. I know it will be tough to get past IT, but you might want to consider bringing in a Cisco specialist to take a look at your setup.
Nathan Posted November 11, 2007

Ken - thank you. You've confirmed the validity of much of what I've said. And you bring up a good point that Cisco equipment can be remotely managed. Your process LAN consisting of a single VLAN also keeps things simple, since it's such a small network in terms of nodes and intercommunication (a single "broadcast domain").