J. NETWARE 4.X MATERIAL

J.1 Installing NetWare 3.12 or 4.0x without a CD-ROM drive (sort of)

For NetWare 4.1, Novell has engineered a workaround to get away from the problem of having to deal with a DOS device that you have just "stomped on" by loading a NetWare driver for your host adapter.

To install a new NetWare 3.12 or 4.0x file server without a CD-ROM drive, you need about 110 MB of disk space on a source file server (one other than the server being installed) and a DOS partition on the target server of at least 6 MB, although INSTALL suggests 15 MB. You also need a CD-ROM drive to copy the distribution CD to that disk space, but after the CD is copied, you don't need the drive anymore. That's the "sort of" part! Note that the 110 MB figure applies only if you install English support only. For other language support, you'll need about 500 MB total.

Create a volume or directory on the source server, then copy the distribution CD to it. Make sure to get all the subdirectories, empty and occupied, and the file attributes. It wouldn't hurt to verify the copy too. If you are using English only, you may save space by deleting all of the contents of other language directories (German, Spanish, ...). Do not worry about the errors generated during installation about the missing files. Tell the install process to just skip the file it is looking for.

Create a 10 MB bootable DOS partition on the target server (standard NetWare install instructions here). Also put an ODI stack and NETX.EXE on the DOS partition.

Login to your source server and map a root drive to the volume or directory with the CD contents on it. Change to that drive and start the installation. Whenever you have to enter the drive letter of the CD-ROM drive, use your mapped drive instead. You can complete most of the installation. There will be a warning about your network drive not being available after you define your LAN drivers. This is OK. You just won't be able to 'Copy On-line Documentation...' nor 'Copy Computer Based Training...' just yet. So skip those 2 steps. Also skip the 'Create DOS/Windows Client...', 'Create OS/2 Client...' and 'Create Upgrade/Migration...'.

Make sure to create your STARTUP.NCF and AUTOEXEC.NCF files.

Exit out of Install, down the server, and make sure the AUTOEXEC.BAT file does NOT start the target server. Reboot the PC. Run the ODI stack and login to your source server again. Map a root drive to the volume or directory with the CD contents on it.

Start the server with the '-NA' option. This will prevent the AUTOEXEC.NCF from running and starting up the LAN drivers again.

Make sure the SYS volume is mounted and LOAD INSTALL.

Choose 'Other Options' so that you can copy the documentation and CBT (Computer Based Training) files from your source server. Make sure to use the proper drive letter.

Exit out of Install, down the server, and make sure the AUTOEXEC.BAT file starts the target server like it is supposed to. Reboot the PC. Success.

You can use any PC logged into the source server with the proper drive mapping to do all of the 'Create... ' diskettes skipped earlier.

[Thanks to John Jobst and S.M.D. for this info]

J.2 Dumping your configuration to an ASCII file

If you issue the command LOAD REGISTER -C at the server console's ':' prompt, your full configuration is dumped to an ASCII file in SYS:SYSTEM. This is a useful undocumented feature of NetWare v4.0x.

[Thanks to J.P. and John Burton for this entry]

J.3 To Upgrade or Migrate to NetWare 4.1

If you UPGRADE from 3.12 to 4.1, rather than MIGRATE, the passwords, as well as trustee rights, are retained. The distinction between an UPGRADE and a MIGRATION is that an UPGRADE installs 4.10 on top of the existing 3.1x or 4.0x system, while a MIGRATION first creates a new 4.10 system and then moves user files and trustee rights onto the new server.

You need to migrate if (1) you want to change your volume/disk structure or block size, (2) you want to hang on to the old 3.12 system to facilitate backing out of the 4.1 upgrade if that proves necessary, or (3) you want to automatically eliminate the obsolete 3.1x or 4.0x system files.

[Thx D.E.H.]

If you have just upgraded and are experiencing problems, search the Novell database at:

http://support.novell.com/search/

You can also look at the "Top 20 systems TIDs" on NETWIRE then select #4- Top Server Issues. It goes into depth about abends, high utilization, memory problems, sbackup, rconsole, etc.

[Thx J.D.L.]

Also, check out "Upgrading to NetWare 4.1 Across a LAN/WAN Using RCONSOLE" in the May 1995 Novell Application Notes.

[Thanks to James Powers for this info]

The biggest problem with MIGRATE is its (lack of) speed. It is mind-numbingly slow. It's neither media nor processor bound, yet it moves only 3 MB/min over FDDI! Migrating as much data as you have would take about 40 hours (i.e. too long). A good tape system could handle it in about 4 hours.
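
The arithmetic behind that estimate can be sketched as follows. The 3 MB/min rate and the 40-hour and 4-hour figures come from the text; the 7,200 MB data size is an assumption chosen only because it matches the 40-hour estimate, and the function name is hypothetical.

```python
def transfer_hours(data_mb: float, rate_mb_per_min: float) -> float:
    """Hours needed to move data_mb at a sustained rate_mb_per_min."""
    if rate_mb_per_min <= 0:
        raise ValueError("rate must be positive")
    return data_mb / rate_mb_per_min / 60.0


MIGRATE_RATE = 3.0        # MB/min, the rate reported over FDDI
ASSUMED_DATA_MB = 7200.0  # assumption: a size that yields the 40-hour estimate

migrate_hours = transfer_hours(ASSUMED_DATA_MB, MIGRATE_RATE)
# Sustained rate a tape system needs to finish the same job in 4 hours
tape_rate = ASSUMED_DATA_MB / (4 * 60)

print(f"MIGRATE needs about {migrate_hours:.0f} hours")
print(f"A 4-hour tape restore needs about {tape_rate:.0f} MB/min")
```

Plug in your own data size and measured rate to decide whether MIGRATE fits into your maintenance window.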

We've all but stopped using MIGRATE for our upgrades for this reason. We let MIGRATE create users, groups, trustee lists, etc., then we use NetCopy from the JRB Utils to copy the rest. It maintains ownerships, space quotas and trustee assignments. It is on netlab2.usu.edu and mirrors and at:

	ftp://netlab2.usu.edu/apps/jrb400a.zip
For Netscapers:
	ftp://netlab2.usu.edu/sys/anonftp/apps/jrb400a.zip

[Thx M.A.]

J.3.1 Upgrading to NetWare 3.12 instead of NetWare 4.1

If you have no need for, and no support to offer on, centralized admin of all servers (that's what NetWare 4.x requires) then consider upgrading to NetWare 3.12 instead. NetWare 3.x is easier to learn and maintain and has a much larger array of third party products that will work with it.

Resist version-itis, particularly when higher does not equate to better (it's different, by rather a lot). Discard thoughts of disk compression, because you will want to get at files willy nilly and there must be space to expand them.

[Thx Joe D.]

J.3.2 Upgrading from NetWare 3.11 to NetWare 3.12

Upgrading from 3.11 to 3.12 involves downing the server, copying the 3.12 SERVER.EXE and the other miscellaneous utilities (INSTALL, VREPAIR, etc.) to the DOS boot area, and bringing the server back online. Then just copy the new SYSTEM and PUBLIC files.

[Thanks to Mark Motley for this info]

J.4 Disable Login Banner

To turn off the red background banner in NW 4.1 when first logging in, use the /nb parameter, i.e. login xxxx /nb

[Thanks to A.J. Sheehan for this info]

J.5 Backing up/Restoring NDS -- multiple methodologies

Here is food for thought which comes out of a couple of meetings with Novell and experienced sites: don't use tape to restore NDS material. Syncing NDS will be horrid and very unlikely to produce positive results. Live backups (i.e. replicating partition information) are the only recommended approach.

[Thx Joe D.]

Follow-up eMail comments were as follows:

From what we have been told by Palindrome's tech support, the Novell TSANDS does not support properly restoring Object-Trustees, thus making it virtually impossible to do a full restore of a crashed server.

The problem is that object IDs are server specific. They are not replicated, and upon restoration via replication new IDs are assigned, resulting in (a) loss of all directory and file trustee assignments, (b) loss of file ownership, (c) if doing bindery logins, the users' mail directories no longer matching their IDs, and (d) if using bindery based printing, queue and print server directories no longer matching their object IDs. These issues are non-trivial.

- J.R.B.

---------

The Red Manuals do not come close to stating the reality of tape restoring/replicating NDS for crashed servers. Thanks for folding things together John.

- Joe D.

---------

Adding to my own comment above, here is a "fair use" (I hope) snippet from the Bullets discussion I referenced yesterday. Realize that this lacks the surrounding explanatory material, and the tenor is instruction to developers about what *they* should do, not what NDS does for them.

"During an NDS backup and restore, entry IDs change. NDS is backed up by name, and therefore if any portion of any objects (sic) stored in NDS is deleted and restored, the entry IDs for the restored objects will be different. NetWare gets around this issue by allowing trustees to be backed up by name. A similar strategy is necessary if entry IDs are used to correlate file server-centric events with an object. ... It is wise to store the object's name and entry ID somewhere, either as a backup or a reference in the database, to ensure the information is accessible even if the entry ID changes."

- Joe D.

---------

At the recent Novell Brainshare conference in Sydney, one of the Novell delegates was talking about the robustness of NDS. One of the points he made was that you never EVER want to run out of disk space on SYS:, otherwise transaction tracking shuts off and as a result NDS stops. Because of this you may want to have all your queues on other volumes, no applications on SYS:, and compression switched OFF, since it can have "unpredictable" effects if there is not enough space to get at an essential file that is compressed.

- Adrian Tritschler

---------

Aren't the trustee rights still stored in the Directory Entry Tables and these are backed up along with the file system not the NDS? Yes, restoring just the NDS will give you no file or trustee associations, but once you restore the server all should be there. I have not had any success restoring 4.02 servers but 4.1 seem to come back fully, groups and individual trustee rights intact and the NDS pointers. I had problems with 4.02 dropping group objects, and holding the NDS together without generating spurious objects.

My test restores were done most recently with Legato 3.1, and ArcServe 5.01g. Most should be able to do a full restore. It's the partials that are a bear...I have yet to be able to restore a subpartition.

- Jerry

P.S. Procedure for restoring Servers:

  First Create Server of Same name and Same internal ID
  Second Restore all Master Partitions, (NDS)
  Third Restore file system.
  Run DSRepair until clean.

---------

Now that there's been a healthy discussion on this topic, it should probably be pointed out that Novell has a Technical Information Document (TID) that describes the proper sequence of events one should follow when restoring a 4.x server with an NDS compliant backup in a couple different situations (single server tree, multiple server tree, etc.).

It is TID 2914422, entitled "Backing Up and Restoring NetWare DS in 4.1". One can get it by visiting their web site and searching the Operating Systems Technical Information Database with the Boolean combo of "nds AND backup", or just entering the TID document ID.

One of the more interesting tidbits in it is: if you are restoring a server that has replicas elsewhere on other server(s), set one of the replicas to be the master before beginning the restore, and remove all volume and server objects to be restored from the tree.

The TID databases that Novell has available for searching via their web site are and have been an invaluable resource to us. It is surprising that there isn't more talk about them here.

- J.B.F.

---------

You are right that providing NDS is back in place (via replication or restoration) *prior* to replacement of file data, then trustee assignments and ownership can be restored via an NDS aware backup system. But I stand by my claim that proper replication does not make NDS backup a non-issue because:

You still need an NDS aware backup system to back up the trustees and file ownership via distinguished name, rather than via object ID as bindery based backup systems do.

You must restore the trustee assignments over all volumes, not just SYS (I'm assuming the reason for restoring is a failure of SYS, or SYS has gotten into a state which is not vrepairable). I believe most backup systems will allow you to restore file and directory trustees without restoring the data.

If file ownership is important to your site (it will be if using volume based quotas, and personally, I favour retaining it because it allows you to identify who created files where), then you also need to restore ownership over all volumes. As far as I'm aware, and I may be wrong here, backup systems do not provide the option to restore ownership without the data. Therefore to restore ownership you will need to restore your entire file system. Given that the standard recommendations are to place as little data as possible on SYS and to keep applications and user data on other volumes, restoration of the entire file system will increase the time required to get your server running again by up to an order of magnitude. Our SYS volumes contain 10-20% of the total space occupied on the server, and it's only that high because of Pmail. So, restoring all data would increase restoration time by a factor of at least 5, which means I would accept the loss of file ownership and simply set the ownership of files and directories to, say, admin on the volumes which were not restored. But if using volume restrictions, there would be little choice but to restore the lot. One change Novell have made in NW 4, and it may be because of the problem of loss of ownership after restoration via replication, is to add a console SET parameter controlling the ability to extend ownerless files, and it defaults to "on". Under 3.x, ownerless files cannot be extended, but fortunately in most cases files are rewritten rather than extended.
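
The restore-time estimate above can be checked with a short sketch. It assumes that restore time is proportional to the amount of data restored, which is a simplification, and the function name is hypothetical. Only the 10-20% SYS share comes from the text.

```python
def full_restore_factor(sys_fraction: float) -> float:
    """How many times longer a full restore takes than a SYS-only restore.

    Assumes restore time is proportional to the amount of data restored.
    """
    if not 0 < sys_fraction <= 1:
        raise ValueError("sys_fraction must be in (0, 1]")
    return 1.0 / sys_fraction


for pct in (10, 20):
    factor = full_restore_factor(pct / 100)
    print(f"SYS holds {pct}% of the data: full restore takes about {factor:.0f}x as long")
```

With SYS at 20% of the occupied space the factor is 5, and at 10% it is 10, which is where the "order of magnitude" comes from.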

If using bindery based logins as I'm sure many educational sites are, then the problem of mail directories being restored with the old IDs as the subdirectory names has to be faced. Either they and the login scripts can be recreated, or particularly if using Pmail, the directories will need to be renamed to match the new IDs, and this will also need to be done in the mac name space if using Pmail for the mac, as Don Hanley from Syracuse University recently pointed out.

And then there's bindery based printing. Both queues and print servers use subdirectories of SYS:SYSTEM based on the object ID and these will no longer match. Given that the queue directory name is stored in a property of the queue, there is more to sorting this out than simply renaming the subdirs of SYS:SYSTEM to match the new IDs. I have not checked out the consequences for NDS based printing but I would guess it would restore correctly.

Clearly, if you are not using bindery emulation restoration problems are diminished, but even for sites using only NDS logins, restoration via replication is not the piece of cake Novell would have you believe, and you are still dependent on an NDS aware backup.

- J.R.B.

--------

>What about the utility that Joe Flowers posted about a couple of months back? Didn't he have a way of backing up and restoring NDS? Does it suffer from this same flaw?

The utility was JCMD which allows you to execute DOS-type commands from the console. It allows you to copy the contents of the NDS directory once DS has been unloaded. This is of some use for backing up NDS in a single server environment, but it doesn't lend itself to being automated because you are working at the server console i.e. unload DS, load jcmd, copy files, exit jcmd, load DS. Maybe Wolfgang Schreiber's RC util would allow the sequence of commands to be executed via a workstation - I have not tried it.

In a multiserver environment, JCMD is more problematic. To avoid potential problems, you must ensure that synchronisation is not taking place when DS is unloaded and the files copied from _NETWARE. When restoring these files into _NETWARE, you have the potential problem that Joe D. has often referred to when restoring from backup: placing info from an old epoch back into the tree. I haven't experimented with this, so I don't know if NDS would figure things out and correctly resynch or not.

We are planning to use JCMD when we upgrade a server in a few weeks. We want to preserve the existing passwords, change the block size and enable suballocation on SYS. The plan is to perform an in-place upgrade which preserves the passwords, copy VOL3 from the RAID array to a spare disk in the server, delete VOL3, use JCMD to take a copy of the NDS files, rename SYS, do a 2nd install of 4.1 setting up a new SYS on what was VOL3 with 64 KB blocks and suballocation, use JCMD to copy the NDS files from the in-place upgrade into _NETWARE on the new SYS, and copy all files from the old SYS. Will this work? Right now I don't know, but it sounds OK in theory and we will do a test run next week. If anyone can point out gotchas I'd like to hear them.

- J.R.B.

---------

Based on experience, I fear I must take issue with some of the dire projections in this thread about restoring NDS.

Our network:

16 4.02 servers (now upgraded to 4.1)
Each server on its own segment & partition, connected by T1 & frame relay
Each partition has a master and at least 2 read/write replicas
SMS-compliant backups of file system only (ie. TSA400.NLM on target servers)
Backup system: Intel Storage Express (1.52AE, based on ArcServe 5.01g)

I had the "opportunity" to restore two of our servers. Here are the steps:

  1.  Use PARTMGR to delete replicas on server to be replaced
  2.  Delete volume objects in NETADMIN
  3.  Down server
  4.  Use PARTMGR to delete server object
  5.  Reinstall server from scratch with same name, same internal net #
  6.  Ensure all patches are loaded, including SMDR and TSA400
  7.  Use PARTMGR to place a r/w replica of the server's partition on it
  8.  Allow replica to sync
  9.  Using SMS, restore files from Storage Express

Result? Trustee assignments were valid. Bindery users logged in without problems. The only downside was print queues: they must be recreated. (Doesn't matter if they're bindery-based or NDS queues; according to Novell, the object ID issue hasn't been resolved with print queues yet, and they must be recreated after a restore. Might as well write that step into your disaster recovery plan.)

What are the keys to success here?

  1.  Valid NDS replication
  2.  SMS-compliant backup of file system

So, I have confirmed faith in NDS replication and restoration. While nothing is infallible, it seems to have worked in these situations. If you only have one 4.x server, you'll have to back up NDS using TSANDS, as well as the file system using TSA400. If you have multiple servers, replicate.

- C.M.

---------

A while ago, running NetWare 4.01, we managed to ruin both of our servers. Each server held replicas of all partitions, but then they were gone. We tried to restore the NDS by means of two tape backups (via SBACKUP.NLM) of the servers' NDS which were taken at roughly the same time, to no avail. The servers didn't synchronize, and several runs of DSREPAIR did more harm than good.

We finally resolved the problem by restoring the "root" server from tape only, thereby getting a synchronized read/write-replica back and restoring the second server from this. We observed all the three problems John mentioned above.

Conclusion is, tape backup is the last resort, applicable to one server and if there is positively no replica available.

- M.C.M.

---------

Procedure for replacing a SYS:-volume carrying drive in a NetWare 4.02 server in a multi-server environment:

0) Preparations:

0.1) Write down sizes of all partitions/volumes (look into install.nlm)

0.2) Create a user-to-bindery id list (eg. via NLIST user /B /D)

0.3) Write down all print queue/ print server/ printer specs (e.g. using nlist). Don't forget LPD-based printers.

0.4) Make list of objects (e.g. users and computers), that use this server as default (e.g. using nlist)

0.5) Copy LAN and device drivers, backup system, patches (esp. NDS) and install.{nlm|hlp|msg} to DOS-accessible location, e.g. c:\server.40

0.6) Locate original NetWare CD and keep it handy. Locate and keep a DOS bootdisk with fdisk/format.

0.7) If paranoid: create list of Trustee assignments via "rights".

0.8) Make list of trustees of Volumes objects.

0.9) Make list of "home directory" properties of all users that point to volumes of the server.

1) Run DSrepair until clean

2) Disable Logins

3) Stop the mail system

4) Backup Data via NDS-aware (SMDR-compliant) system (e.g. SBACKUP)

5) Move all NDS' partitions' replicas to another server

6) Set dstrace on all involved servers until "All processed = YES" for all partitions

7) Down server. Remove server from NDS via partmgr. Remove volume objects using NetAdmin. (needed for step 17) Remove print queues using pconsole. (No need to remove print servers or print queues)

8) Down server and copy all files on the DOS partition to floppy

9) Replace HD (or install parallel, if memory permits), partition via fdisk (see 0.1)

10) Copy files back to the new DOS volume

11) Run server.exe only with disk drivers loaded.

12) Load install.nlm and restore NetWare partition and Volumes (see 0.1)

13) Restore all patches (esp. NDS), the backup system, LAN/device drivers. Make sure nobody logs in and you don't fire up the mail system.

14) Restore SYS: (just for the files)

15) Make backup of startup.ncf/autoexec.ncf. Generate a minimal, net-connecting startup.ncf/autoexec.ncf (i.e. leave out all applications). Make sure nobody logs in and you don't fire up the mail system.

16) Restart server

17) Reinstall NDS via INSTALL.NLM

18) Run partition manager and create needed replicas on new HD (only needed for additional Bindery Contexts)

19) Restore SYS: and all other volumes (for the trustee assignments, volume space restrictions and so forth). (Don't forget to add name space beforehand, if needed)

20) Restore what's left:

20.1) Rename mail directories to new Object IDs using list from 0.2. Update explicit references to Object IDs wherever necessary (e.g. in customized scripts)

20.2) recreate print system using specs from 0.3

20.3) restore "default-server assignments" using list from 0.4 (e.g. via UIMPORT)

20.4) restore trustees of volume objects from list 0.8

20.5) restore "home directory" properties from list 0.9 (e.g. using uimport)

21) Enable logins, fire up mail system and keep fingers crossed

NOTES: - If you have both drives in parallel, you can use a tool like JCMD.NLM for step 14) (but not for Step 19!).
- section J.5 of the FAQ makes good initial reading for this task

As I understand, DSMAINT.NLM of NetWare 4.1 should take care of 0.2, 0.4, 0.5, 0.9 and the corresponding restoration steps. That doesn't help, however, if your drive crashed, because then you cannot use DSMAINT (maybe for replacing the server's references, i.e. 20.5, but this is unclear to me). So a complete backup should perform all of the 0.x steps plus step 8 (Do you?). I wrote some nifty perl scripts to extract the necessary information from nlist output, but this is only a partial solution.
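
Step 20.1 (renaming mail directories to the new object IDs) can be sketched as follows. This is not the Perl mentioned above, just a minimal illustration in Python. It assumes you already have the user-to-ID lists from before and after the rebuild as simple dictionaries; parsing NLIST output is left out because its exact format is not given here. The function name and the sample IDs are hypothetical, and the sketch only prints a rename plan rather than touching the server.

```python
MAIL_ROOT = "SYS:MAIL"  # mail directories are named after the owner's object ID


def mail_rename_plan(old_ids, new_ids):
    """Return (old_dir, new_dir) pairs for users whose object ID changed.

    old_ids and new_ids map user name -> object ID (hex string), as recorded
    before the rebuild (step 0.2) and after NDS is back in place.
    """
    plan = []
    for user in sorted(old_ids):
        new_id = new_ids.get(user)
        if new_id is None:
            continue  # user not present after the restore; handle by hand
        old_id = old_ids[user].upper()
        new_id = new_id.upper()
        if old_id != new_id:
            plan.append((f"{MAIL_ROOT}\\{old_id}", f"{MAIL_ROOT}\\{new_id}"))
    return plan


# Hypothetical sample data; real IDs come from your own lists.
before = {"ALICE": "1a000004", "BOB": "2B000007", "CAROL": "3C00000A"}
after = {"ALICE": "9F000011", "BOB": "2B000007"}

for old_dir, new_dir in mail_rename_plan(before, after):
    print(f"rename {old_dir} -> {new_dir}")
```

Users whose ID did not change are skipped, and users missing from the new list are left for manual attention rather than guessed at.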

There'll have to be adjustments to a single-server environment, namely step 5) should read "backup NDS to tape", step 7) is n/a and step 18) should read "restore NDS from tape", but I have neither tested this nor want to.

- M.C.M.

J.6 Handling NetWare 4.x Page Faults

The following are some tips for dealing with page faults on NetWare v4.10.

Most of the time, the cause of the page fault can be attributed to a particular NLM or driver which may be loaded. As such, it is helpful if you can isolate the offending module.

With NW4.1, there is a DOMAIN.NLM that is installed on the DOS partition, which can be loaded in the STARTUP.NCF. When this NLM is loaded, it will catch page faults which occur and isolate the process that caused it. This makes it a lot easier to isolate the cause of server problems because you can bring the server down when it is convenient, rather than having to kick users off in the middle of the day.

Keep in mind that after a page fault occurs, even though the DOMAIN.NLM has caught it, you should still bring down the server and restart it. Using the DOMAIN.NLM just prevents the server from crashing and you can down the server properly after-hours.

Note: You might notice some abnormal behavior by your server after the page fault occurs, such as high utilization. I would not try to load or unload any modules after a page fault occurs.

A suggestion from Novell tech support is to check the memory settings for the server. In particular:

  Set  Read Fault Notification = On
  Set  Read Fault    Emulation = On
  Set Write Fault Notification = On
  Set Write Fault    Emulation = On

Almost all of the page faults most sites will experience are resolved when the 410PT1.EXE and 410IT4.EXE patches are applied.

[Thanks to Alex Lee for this updated info]

J.7 NetWare 4.x block size, compression and sub-allocation

If you do turn off compression and sub-allocation, make sure that you change the block size, since it defaults to 64K. Also note that NetWare's caching scheme is most efficient with a block size of 64K, hence the need for sub-allocation.

[Thanks to Rick Damiani for this info]
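
To see why a 64K block size calls for sub-allocation, here is a rough sketch of the space one file consumes with and without it. It uses a simplified model: without sub-allocation a file occupies whole blocks, and with it the tail of the file is rounded up to 512-byte units, which is the sub-allocation granularity as I understand it. The function name and the sample file sizes are assumptions for illustration only.

```python
import math

SUBALLOC_UNIT = 512  # bytes; assumed sub-allocation granularity


def allocated_bytes(file_size: int, block_size: int, suballocation: bool) -> int:
    """Disk space consumed by one file under a simplified allocation model."""
    if file_size == 0:
        return 0
    if suballocation:
        # Whole blocks for the bulk of the file, 512-byte units for the tail
        full_blocks, tail = divmod(file_size, block_size)
        tail_bytes = math.ceil(tail / SUBALLOC_UNIT) * SUBALLOC_UNIT
        return full_blocks * block_size + tail_bytes
    # Without sub-allocation every file is rounded up to whole blocks
    return math.ceil(file_size / block_size) * block_size


BLOCK_64K = 64 * 1024
for size in (2048, 70000):
    plain = allocated_bytes(size, BLOCK_64K, suballocation=False)
    sub = allocated_bytes(size, BLOCK_64K, suballocation=True)
    print(f"{size} byte file: {plain} bytes without sub-allocation, {sub} with")
```

A volume full of small files wastes most of each 64K block unless sub-allocation is on, which is why the block size should be reduced if sub-allocation is turned off.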

There is a delay during decompression, at least on a 486DX2/66 server. Once a file is decompressed, it stays that way until a certain amount of time passes, and then it is compressed again. So if you access a compressed file more than once in the delay before recompression, you take the performance hit only once.

This is controllable via SET. You can have the server: (a) decompress the file to disk on the first access, as described above, (b) keep the file compressed on disk unless it is accessed _twice_ within the usual compression period (also controllable via SET), or (c) always keep the file compressed on disk. You can also modify this behaviour for individual files and directories with the FLAG command.

[Thx S.M.D.]
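
The three behaviours (a), (b) and (c) can be compared with a small model that counts how often the server has to decompress one file. This is only a sketch of the behaviour as described above, not of NetWare's actual implementation. The policy names and the function name are hypothetical, and the 7-day period is an assumed value for the delay before recompression.

```python
def count_decompressions(access_days, policy, period=7):
    """Count decompressions for one file that starts out compressed.

    access_days: sorted day numbers on which the file is read.
    policy: 'first'  - store uncompressed after the first access (a),
            'second' - store uncompressed only after two accesses
                       within `period` days (b),
            'never'  - always leave the file compressed on disk (c).
    """
    if policy not in ("first", "second", "never"):
        raise ValueError(f"unknown policy: {policy}")
    decompressions = 0
    uncompressed = False
    last_access = None
    for day in access_days:
        # An uncompressed file untouched for `period` days is recompressed
        if uncompressed and day - last_access >= period:
            uncompressed = False
        if not uncompressed:
            decompressions += 1
            recent = last_access is not None and day - last_access < period
            if policy == "first" or (policy == "second" and recent):
                uncompressed = True
        last_access = day
    return decompressions


days = [0, 1, 2, 10, 11]
for policy in ("first", "second", "never"):
    print(policy, count_decompressions(days, policy))
```

For a file read in bursts, option (a) costs the fewest decompressions but keeps the file uncompressed on disk the longest, while (c) saves the most space and pays on every access.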

If your server is low on RAM, it is unable to compress large files and tells you so at the console.

Note: Once you have created a volume, there is no way to enable/disable compression on the volume without re-creating the volume. You can flag files to not be compressed on a compressed volume or set compression to occur in 10,000 days to work around this problem.

NW 4.1 creates SYS: as compressed by default. This can be a gotcha. If you do not want this, be careful to say so when volume SYS: is being created. Compression works well for user volumes, especially if a lot of user files are infrequently accessed.

[Thx D.H.]

Compression is a CPU-intensive task, and while it's normally scheduled for the middle of the night, what do you do if you can't find any time that's convenient to do it? For example, you may run a 24x7 operation, or you might not want your CPU eaten up while doing backups. Decompression is also a performance drag but if the server is reasonably powerful, your users probably won't notice the difference. PROBABLY... On my NetWare 4.02 server, mostly for my own use, I don't do administration on it very often, and so it goes and compresses NWADMIN and the large number of DLLs that go with it. The next time I go to run NWADMIN, it takes a minute to launch...

Case History: We set up a 4.1 server for one of our clients. After a week or two, he called up and said there was something wrong with his server, because when he went into a file manager and tried to do a directory of some of the stuff he'd put on the server, CPU utilization went way up and the server just started dragging like crazy...yet it wasn't "repeatable". He'd do it again and this time it would work fine. It turned out that he had his file manager set up to read the first little bit of each file to try to determine what was in it. Of course, after a while, files get compressed if they're not used, and so he was forcing the server to go and decompress a whole directory's worth of files just for a dir listing. Once he'd done this to enough directories, they'd all be uncompressed and things would work fine [with zero compression benefit] for a week (the default time before compression) and then it would show up again.

Suballocation is a wonderful thing, too, but if memory serves, it needs more memory on the server to keep track of it. Still, I can't see any reason not to use it, because if you're _that_ tight on memory, the problem is a lack of memory, _not_ suballocation.

[Thx S.M.D.]

J.7.1 Alternative compression products

NetSqueeze, The Lan Support Group, (713) 789-0882, compresses files on NetWare volumes according to rules you set up.

[Thx D.R.]

J.8 Expanding the size of the NetWare SYS volume

DSMAINT.NLM, in the latest version of the NetWare 4.x Directory Services, will allow you to expand the size of volume SYS: and is available at:

ftp://info.umd.edu/inform2/CompRes/H+S/Software/Novell/Netwire_Files/

[Thx R.J.L. & S.R.#2]

J.9 NetWare 4.1 NLM version list

There is a NetWare 4.1 NLM version list at:

http://mft.ucs.ed.ac.uk/novell/techsup/archive/archive.htm  

[Thx G.J.S.]

J.10 NetWare 4.1 patch list

There is a 4.1 patch list at:

http://mft.ucs.ed.ac.uk/novell/techsup/nw410/410pt2.htm

[Thx G.J.S.]

J.11 The Novell Consulting Toolkit

The Novell Consulting Toolkit is an excellent source of info for NetWare 4.x and for other Novell issues. Those who went to Brainshare 95 were given this invaluable source of info. A four CD subscription is now available from NCS for $300. Email ncs_toolkit@novell.com for more info. The N.C.T. is also available online at:

http://www.novell.com/toolkit

[Thx D.B. and M.W.]

J.12 Gaining access to the Admin password on a NetWare 4.x Tree

  a. Bring down a server with a copy of the partition containing the admin
     account, or admin equivalent user.
  b. Enable bindery emulation in the context containing the Admin account
     on the server if not already enabled and restart server.
  c. If bindery emulation was previously enabled see the 'Lost the
     supervisor password' section of the FAQ (H.14)
  d. Log in using bindery emulation, and run SYSCON from an old NetWare
     3.12 server, go to user information, select the administrator's
     account and change the password.
  e. Login using NDS, and use the password you just set in step d.

[Thanks to Brian Weatherill and D.B. for this info]

J.13 NetWare 4.10 SFT-III (System Fault Tolerance III)

For sites that need very high uptime, Novell developed SFT-III. SFT-III is an extension to the fault tolerance that NetWare gives you out of the box. By default each server is equipped with SFT-I (the hot fix). If needed, this can be extended to SFT-II by setting up disk mirroring or duplexing. This level protects against hard disk failures. SFT-III is server mirroring, which protects against server hardware failure.

Because SFT-III is an extension, it means you will have to order it separately. Licences are available in two ways: servers with up to 100 users or for servers with more than 100 users. SFT-III can be installed during the initial installation and an existing NetWare 4.10 server can be upgraded to the SFT-III level.

SFT-III does not support the ability to share the load across the two servers (yet). Plans have been made by Novell to support this as well, but they have indicated this will not be available in the upcoming Green River release. It is planned for the next release after Green River. That version will allow disk clustering like Digital's VAX.

J.13.1 Considerations

Before ordering or installing SFT-III there are some general considerations to weigh. First, try to establish whether your planned configuration will work with SFT-III. Pay special attention to the backup solution, the UPS, the MSL (Mirrored Server Link) card and, if needed, the network management software. Also make sure that the hardware in both machines is identical. This may sound obvious, but it includes the revision numbers. Sites that choose to upgrade an existing NW 4.10 server to SFT-III level and have to order the additional hardware could end up with the same network card (e.g. NE3200) but with a newer revision level. It is a good thing to have the revision levels (and BIOS dates etc.) on both machines identical.

J.13.1.1 Backup Considerations

Making a backup with a SFT-III server is possible, but not straightforward. Keep in mind that a backup device is part of the hardware. Therefore, if the primary server fails (the primary is whichever of the two servers, also referred to as IO_Engines, is currently acting as the server) and your backup device is connected to that server, you will not have a backup device until it is back on line.

The second thing to pay attention to is that the backup unit is connected to an IO_Engine and many backup products address the hardware directly. If that is the case, the backup software has to be loaded in the IO_Engine. Not every product supports this option. Loading the software in the MS_Engine (this is the "server" part that is protected) can result in an error because no backup device is found.

At this time there are only two products that I am aware of that can run on a SFT-III server: Sbackup of Novell (very slow) and ARCserve 5.01g (from Cheyenne). Other products (even ARCserve 6.0) do not run on SFT-III at this moment.

J.13.1.2 UPS Considerations

It is a good idea to provide each IO_Engine with a separate UPS (connected to different fuses). Otherwise the SFT-III server would be shut down if a fuse blows. UPS management software like Powerchute communicates with the UPS through a comm port. This is also an example of software that cannot run in the IO_Engine. At this moment it is recommended to use the Novell UPS monitoring board with the Novell UPS.NLM. This NLM can run in the IO_Engine and needs this board to communicate with the UPS.

J.13.1.3 MSL Considerations

Novell published a list of certified MSLs (as of 11/94) in TID21974 on 09JAN95 (I am unable to locate this anymore). It is a good idea to use a certified MSL card. Novell also recommends assigning the MSL card the highest-priority interrupt available, 10 being ideal. Try to avoid using interrupts 2/9 or 15, if possible. Interrupt 9 cascades to interrupt 2, and NetWare reserves interrupt 15 for lost hardware interrupts.
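
The interrupt advice above can be turned into a small checklist. The following is only an illustrative Python sketch, not a Novell tool. It assumes the standard PC/AT arrangement of two cascaded interrupt controllers, in which interrupts 8-15 enter through interrupt 2 and therefore rank above interrupts 3-7. The function name and the warning texts are made up for this example.

```python
# ASSUMPTION: priority order on a PC/AT with two cascaded 8259 controllers.
# IRQ 8-15 enter through IRQ 2, so they outrank IRQ 3-7.
AT_PRIORITY = [0, 1, 8, 9, 10, 11, 12, 13, 14, 15, 3, 4, 5, 6, 7]

# Interrupts the text above advises against for the MSL card.
DISCOURAGED = {2, 9, 15}


def check_msl_irq(msl_irq, other_adapter_irqs):
    """Return a list of warnings for a planned MSL interrupt (hypothetical helper)."""
    warnings = []
    if msl_irq in DISCOURAGED:
        warnings.append(f"IRQ {msl_irq} should be avoided for the MSL card")
    if msl_irq not in AT_PRIORITY:
        # IRQ 2 is the cascade input, so it has no rank of its own here.
        return warnings
    rank = AT_PRIORITY.index(msl_irq)
    for irq in other_adapter_irqs:
        if irq in AT_PRIORITY and AT_PRIORITY.index(irq) < rank:
            warnings.append(
                f"adapter on IRQ {irq} has a higher priority than the MSL card"
            )
    return warnings


print(check_msl_irq(10, [11, 5]))  # MSL on 10, other adapters on 11 and 5
print(check_msl_irq(5, [10]))      # MSL ranked below another adapter
```

An empty list means the planned setting does not conflict with the recommendations quoted above; it says nothing about whether the card itself is certified.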

Also consult the NSEpro for known problems with the selected MSL card (if you have access to it). That is how I found out that the NMSL card we had selected for our Compaq Proliant was not a good choice. In general you should use a high speed link for the MSL (either fiber or 100Base-Tx). The advantage of using a fiber MSL link is that you can place your secondary server far away, provided it is also connected to the network.

J.13.1.4 Network Management Considerations

Many network management utilities (e.g. Frye) try to communicate with the server hardware and also try to examine server statistics. Because these two are split with SFT-III it is not always possible to use these utilities. A network management program should be capable of communicating with the hardware (LAN adapter, processor speed etc.) in the IO_Engine and getting the statistics (# of connected users, buffers, utilisation etc.) from the MS_Engine. The only utility that I am aware of at this moment that can do this is Novell's ManageWise (release 2.0 and higher).

J.13.2 SFT-III and RAID 5

RAID 5 is a technology like mirroring and duplexing. What you need to keep in mind is the speed difference between these solutions. I have no practical experience, but looking at the technology I would say duplexing works fastest (less CPU overhead). Second would be the RAID 5 solution and slowest would be mirroring (assuming you use the same controller for the mirrored drives; otherwise it is duplexing). These last two options provide no fault tolerance for hardware failures other than hard disk failures.

These solutions can be combined with SFT-III, but personally I would say it is overkill. If a disk fails and you have not implemented any of these techniques, the primary IO_Engine will fail and the secondary will take over. At that moment you can swap the hard drive and bring the IO_Engine up again. Recreating the NetWare segments and mirrored pairs of drives will do the job.

J.13.3 Will SFT-III work on NetWare 3.12 ?

As far as I know SFT-III was only available for NetWare 3.11. Anyone that wants a 3.12 SFT-III server should investigate this. It could be that you can only get a 4.x version (all registered 3.1x SFT-III users were upgraded to the 4.x level by Novell).

J.13.4 Will NetWare Connect work on SFT-III ?

No, NetWare Connect won't run on SFT-III. NetWare Connect uses modems, which are connected to a comm port. The comm port is part of the I/O of a fileserver. That means the modems are connected to the IO_Engine. If the IO_Engine fails, the secondary will take over. Because your modems will be connected to the other IO_Engine these sessions can't be taken over at the time of the switch over. In other words all users logged in by NetWare Connect would lose their connection. This is part of the reason why NetWare Connect won't run on a SFT-III machine.

J.13.5 ARCserve 5.01g and SFT-III configuration

ARCserve requires a special configuration in order to run on a SFT-III machine. Look at Cheyenne's ARCserve release notes section E. Configuring ARCserve to run on NetWare 4.1 SFT III. You will find that in order to run it on a SFT-III machine you will have to load 2 additional files in the IO_Engine that do not come with ARCserve or NW 4.1. The file IODAI40.NLM is a Novell file (can be found on many BBS's and the Internet). The file ARC_SFT3.NLM can be obtained from Cheyenne.

Some time ago there was a report that a NW 4.10 server with all latest patches (Libup8, 410pt3 & landr5) could crash during a remote server backup with ARCserve. Cheyenne has a patch for it. Keep in mind that using ARCserve on SFT-III actually makes a remote server backup.

One other issue. We installed the manager on our server and when we ran the Windows manager it looked fine until we tried to work with the databases. It turned out that this was caused by the way ARCserve defines the location of the program. By default it installs the manager in:

\\SERVERNAME\VOL_NAME\ARCSERVE\MANAGER\arcserve.exe

After changing this to:

F:\ARCSERVE\MANAGER\arcserve.exe

and changing the working directory correspondingly, it all worked great.

J.13.6 TCP/IP and SFT-III configuration

In order to run TCP/IP on a SFT-III server you will have to set up a separate sub-net for your MS_Engine. The IO_Engines must be configured to act as a router. The MS_Engine will act as an end node. Note that both IO_Engines communicate with THE SAME IP address.

		TCP/IP Configuration Example

	      +-------------------------------+
	      |  Mirrored Server - MS Engine  |
	      |     193.67.129.200            |
	      +-------------------------------+
				|
				|
				|            Virtual LAN
	----------------------------------------------
				|
	   (IO Engine-MS Engine | Internal Interface)

	      |------------193.67.129.201-----------|
     +-----------------+                 +-----------------+
     |  IO Engine 1    |                 | IO Engine 2     |
     |  193.67.129.131 |                 | 193.67.129.132  |
     +-----------------+                 +-----------------+
	      |                                   |
	      |             Real Network          |
	------------------------------------------------
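
The addressing plan in the diagram can be checked with a few lines of Python. This is a sketch under one loud assumption: the original gives no subnet mask, so a 26-bit mask (255.255.255.192) is used here only because it happens to separate the example addresses into two subnets. Substitute the mask you actually use.

```python
import ipaddress

# ASSUMPTION: the text gives no subnet mask. A 26-bit mask (255.255.255.192)
# is used only because it separates the example addresses cleanly.
PREFIX = 26

addresses = {
    "MS_Engine": "193.67.129.200",
    "IO_Engine internal interface": "193.67.129.201",
    "IO_Engine 1 (real network)": "193.67.129.131",
    "IO_Engine 2 (real network)": "193.67.129.132",
}


def subnet_of(address, prefix=PREFIX):
    """Return the network that contains the given host address."""
    return ipaddress.ip_interface(f"{address}/{prefix}").network


subnets = {name: subnet_of(addr) for name, addr in addresses.items()}
for name, net in subnets.items():
    print(f"{name:30} {addresses[name]:16} {net}")

virtual_lan = subnets["MS_Engine"]
real_lan = subnets["IO_Engine 1 (real network)"]

# The MS_Engine and the internal interface share the virtual LAN ...
assert subnets["IO_Engine internal interface"] == virtual_lan
# ... both IO_Engines sit on the same real network ...
assert subnets["IO_Engine 2 (real network)"] == real_lan
# ... and the two networks differ, so the IO_Engines have to route between them.
assert virtual_lan != real_lan
```

If the last assertion fails with your own addresses and mask, the MS_Engine is not on a separate sub-net and the configuration described above does not apply as drawn.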

J.13.7 SFT-III Engines swapping

It is not normal for a SFT-III server to regularly switch primary and secondary engines. If this happens, try looking at the file io$log.err in the system directory. It records the problems when the primary engine and the secondary engine swap. Perhaps there is a hint in there. Look carefully at what happened just before the switch over. If that doesn't help you could try running conlog.nlm. Conlog can be loaded in all three engines, and with the option FILE=SYS:\SYSTEM\MSLOG.TXT (etc.) you can specify a different output file for each engine. If the switch over happens again you can have a look at the output file(s) to see if something strange happened.
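
Looking at what happened just before a switch over is easier with a small helper that prints the lines preceding each match. The Python sketch below assumes you have copied a log file to a workstation as plain text. The sample lines and the search keyword are invented for illustration; real conlog or io$log.err output will look different, so pick a keyword that actually appears in your files.

```python
from collections import deque


def lines_before(lines, keyword, context=5):
    """Yield (matching_line, preceding_lines) for each line containing keyword."""
    recent = deque(maxlen=context)  # rolling window of the last `context` lines
    for line in lines:
        if keyword.lower() in line.lower():
            yield line, list(recent)
        recent.append(line)


# Invented sample text; this is NOT real NetWare log output.
sample = [
    "event A",
    "event B",
    "event C",
    "server switch over",
    "event D",
]

for hit, before in lines_before(sample, "switch over", context=2):
    print("match:", hit)
    for line in before:
        print("  before:", line)
```

To run it against a real file, pass the lines of the opened file instead of `sample`.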

J.13.8 MS_Engines produced a different output

"MS_Engine produced different outputs" is a very difficult problem to troubleshoot. It is of the utmost importance to understand the way SFT-III is designed. SFT-III is designed to be hardware fault tolerant and has no added ability to protect itself from a software bug. However, if you have a system which has this problem, only the secondary machine should be affected unless SET parameters are set to halt both machines.

SFT-III Architecture

SFT-III was designed so the "event queue" in each machine, primary and secondary, would receive and then process the same events independently. Every instruction that the processor executes comes from this event queue. For example, if a request is generated for a file from the disk, the request is put on the event queue of the MS_Engine. An identical request is then sent, across the MSL, to the other machine and placed on the event queue of that machine. When the event has been processed, one last consistency check is made, comparing the results of the MS_Engine from each machine. Note here that there is only one MS_Engine that is presented to the user even though each machine has processed the request independently of the other. If the result of each machine is identical, the data is sent to the IO_Engine, packaged into packet form, and sent out the LAN channel. If the results are compared and are not the same, then you get the Abend: MS_Engine Produced Different Outputs.
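
The idea can be pictured with a toy model. The Python sketch below is not how NetWare is implemented; it only mimics the description above: two machines process the same event queue independently and the outputs are compared. All names are invented, and the `stale_memory` argument is an assumption used to stand in for whatever machine-local state a faulty module might accidentally depend on.

```python
def run_engine(events, handler, stale_memory):
    """Process every queued event in order and collect the outputs."""
    return [handler(event, stale_memory) for event in events]


def well_behaved_handler(event, stale_memory):
    # Output depends only on the event, so both machines agree.
    return f"reply to {event}"


def faulty_handler(event, stale_memory):
    # Output leaks machine-local state, so the two machines can disagree.
    return f"reply to {event} (flags={stale_memory})"


def compare_outputs(primary, secondary):
    """Toy version of the final consistency check between the two machines."""
    for index, (a, b) in enumerate(zip(primary, secondary)):
        if a != b:
            return f"MS_Engine produced different outputs at event {index}"
    return "outputs identical"


# The same events are placed on both queues (sent across the MSL).
events = ["read FILE1", "write FILE2", "read FILE3"]

for handler in (well_behaved_handler, faulty_handler):
    primary = run_engine(events, handler, stale_memory=0)
    secondary = run_engine(events, handler, stale_memory=7)
    print(handler.__name__, "->", compare_outputs(primary, secondary))
```

The point of the model is that identical hardware and identical input are not enough: any module whose output depends on something other than the queued events can trip the comparison.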

Troubleshooting (according to Novell):

The most likely cause of the "Different Outputs" Abend is either that one machine has traversed a slightly different code path compared to the other machine, or that the NLM that is running has encountered a variable in the code that has not been initialised. The value of an uninitialised variable is completely random, and therefore increases the likelihood that the MS_Engines are going to produce different results.

The "Different Outputs" problem is NOT a hardware problem, and it is not an MSL, LAN, or DISK problem. Use the following questions and objectives to aid in identifying the NLM that is causing the Abend.

Questions:

  - What modules are being loaded in the MS_Engine?

  - Is there a sequence of events which will cause the server to Abend?

  - Can the Abend be reproduced using this sequence of events?

  - Can a specific NLM be singled out as the cause of the Abend?

Objectives:

  - Stabilise the customer's environment.

  - Modify one system item at a time.

  - Reproduce the problem in a non-production environment.

  - Trace the problem.

  - Correct the Module.

Troubleshooting (based on experience):

This problem can also be caused by external devices. In our configuration this problem was caused by a FDDI hub. One IO_Engine was attached by FDDI to a SAS port and the other IO_Engine was connected by FDDI to a DAS port. Another thing to check is whether the MSL link is functioning correctly. Is the speed of your MSL link equal to or higher than that of the LAN link? (Using a slow MSL link and a fast LAN link is not a good idea.) Also check your interrupt settings. Does your MSL link use a higher priority than the other adapters? Finally, Novell recommends NOT using PCI adapters from different manufacturers. Try to use only one PCI card or none at all if possible.

J.13.9 Additional information

http://netware.novell.com/discover/ssnwsft.htm

http://netware.novell.com/database/docs/wpdb20.htm

http://netware.novell.com/discover/can4reli.htm

Another good source is the Novell manuals. All related SFT instructions are in the normal manuals/dynatext, particularly Chapter 5 of the installation manual and Appendix C of the Supervising the Network manual. If you have access to it, you could have a look at the NSEpro. It contains several documents relating to SFT-III.

J.13.10 Other products of interest: Vinca StandbyServer

How StandbyServer (2.0) for NetWare Works

Vinca uses a second server as an automatic standby in case the primary machine fails. Data from the primary machine is mirrored to the standby machine using standard IPX protocols. StandbyServer 2.0 uses real-time NetWare disk mirroring to keep exact copies of all system data on both the primary and standby machines. Since StandbyServer uses standard IPX connections to transfer data, a dedicated link is not required, but it is recommended. Any standard IPX board and driver combination can be used as the Vinca link. The data can then be routed, bridged or carried over a shared, high-speed backbone. The connection status of the Vinca link and the network link between the two machines is constantly monitored to ensure that the primary server is operating. These multiple checks avoid any inadvertent switchover to the standby machine. If the primary server has failed, the standby machine automatically takes over the role of the primary server using the same server name, login scripts, bindery or NDS and IPX address as the failed server.
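
Why monitoring two links helps can be shown with a tiny truth table. The Python sketch below is only an interpretation of the paragraph above, not Vinca's actual logic: it assumes the standby takes over only when the primary is unreachable over both the Vinca link and the network link. The function name is made up.

```python
def should_switch_over(seen_on_vinca_link, seen_on_network_link):
    """ASSUMED rule: switch only when the primary is unreachable over both links."""
    return not seen_on_vinca_link and not seen_on_network_link


# Print the full truth table for the two monitored links.
for vinca_ok in (True, False):
    for network_ok in (True, False):
        decision = "switch over" if should_switch_over(vinca_ok, network_ok) else "stay"
        print(f"Vinca link ok={vinca_ok!s:5}  network link ok={network_ok!s:5}  -> {decision}")
```

Under this assumption a single failed cable or board does not trigger a switchover, which matches the stated goal of avoiding inadvertent switchovers.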

Vinca StandbyServer for NetWare has autoswitch. It automatically switches between the halted main server and the standby machine. With the new 32-bit clients from Novell or Microsoft, the client connection is maintained, not requiring the user to relogon to the switched server after NetWare reinitializes the disks. Users will experience only a momentary pause while the switchover takes place, and their connection to the server is retained. With older client software, the users simply log back into the server using their same name and password as they did on the failed server. For more information on Vinca, contact:

http://www.vinca.com

J.14 Mirroring

Let's take a moment to categorize. I would like to define the difference between "on-line" and "near line" redundant systems. An "on-line" system requires no user and/or administrative intervention to recover from a fault condition. Conversely, a "near line" system requires user and/or administrative intervention (workstation rebooting etc.).

There are a number of products that will "mirror" servers for you. At the top of the list is:

Novell's SFT III (http://www.novell.com) This I categorize as an "on-line redundant system."

Advantage: Total protection from hardware related abends; Seamless server "switch over" (Your users will not know that the system has had a failure)

Disadvantages: Identical Hardware required; Cannot protect from software abends; NLM's must be specially certified for SFT III.

Next are the products I categorize as "Near Line redundant system."

Lan Integrity (http://www.netint.com)

Advantages: Totally recoverable from hardware and/or software errors. One to many protection. One server can protect many targets. "15 second" recovery time. Backup Solution.

Disadvantages: Users must reboot to reestablish services; Does not protect Printing Services (additional administrative overhead procedure required, duplicate queues); Does not support advanced clients (Microsoft's NDS client, Novell 32-bit Client for 95, Novell 32-bit client for DOS/Windows; support expected at some point); Extra tape drives required (DLT etc.)

Vinca/StandBy 32 (http://www.vinca.com)

Advantages: Identical hardware not required; Full NDS, Bindery and advanced client support; "Autoswitch" feature takes care of swapping in the standby server without intervention from the network administrator.

Disadvantage: They claim it will protect against both hardware and software errors. Since there is no seamless "switch over" in either case, this really means that it can recover reliably from either failure. However, if there are corrupted files you will still have to wait for the automated vrepair to run before the users can use the system.

LanShadow for Horizon (Http://www.horizon.com)

Advantage: "Network mirroring tool that provides fast recovery from server failures and assures constant availability of critical network data. Runs as NLM and configured to mirror entire servers, volumes directories or individual files, including open ones, to a designated backup server or open space on another production server. LANshadow doesn't require a dedicated backup platform physically connected to and configured exactly like a production server, nor does it require any dedicated hardware or tape drive. LANshadow supports all NetWare environments including 4.x; also includes support for Macintosh name file space."

Disadvantage: Server "switch over" not automated. NDS support ???

[Thanks to Colin St Rose for this info]
