PCI Hot Plug User's Guide I/O device edition - for Solaris(TM) Operating System -

Chapter 3 File Devices > 3.1 Replacement of PCI card

3.1.1 Replacement of PCI cards on non-redundant system

Before exchanging PCI cards without using redundancy software such as multipath control, applications using the PCI cards must be stopped.

Follow the procedures below to exchange PCI cards on a non-redundant system.

  1. Stop the machine administration hardware monitoring daemon

    Execute the following command to stop the hardware monitoring daemon of machine administration.

    # /usr/sbin/FJSVmadm/prephp <Return>

    If you use the Fibre Channel Card (PW028FC3*/PW028FC4*/PW028FC5*), also execute the following commands to stop its daemons.

    # /etc/rc0.d/K10ElxRMSrv stop <Return>
    # /etc/rc0.d/K10ElxDiscSrv stop <Return>
  2. Identify the PCI card to be replaced

    Determine the interface name of the path connecting the target PCI card to its I/O devices, and identify the I/O devices connected to that path, as follows.

    1. Read the interface name of the path and the connected I/O device from the WARNING messages output on the console.

      In the example below, glm2 is the interface name of the path connecting the broken PCI card and I/O devices, and sd20 is the disk device connected to glm2.

      When exchanging a PCI card that has two ports, determine in the same manner the physical path name of the path for the other port and the I/O device connected to that path.

      The physical path names of the two ports on the same PCI card are related in the way "/pci@89,4000/scsi@2" and "/pci@89,4000/scsi@2,1" are, so the path for the other port can be derived from the one reported. The I/O device connected to that path resides under the directory of that physical path name.

      :
      WARNING: /pci@89,4000/scsi@2 (glm2):
      invalid intcode=fe00
      :
      WARNING: /pci@89,4000/scsi@2/sd@3,0 (sd20):
      SCSI transport failed: reason 'reset': giving up
      :
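The lookup in the example above can be sketched as a pair of shell helpers. This is a minimal sketch, assuming the console messages have been saved to a file and follow the "WARNING: <physical path> (<instance>):" layout shown above. The function names warning_instances and other_port_path are hypothetical, and the second helper covers only the two-port naming pattern quoted above.

```shell
# warning_instances: read console messages on stdin and print
# "<instance> <physical path>" for each line of the form
#   WARNING: /pci@89,4000/scsi@2 (glm2):
warning_instances() {
    sed -n 's|^[[:space:]]*WARNING: \(/[^ ]*\) (\([^)]*\)):.*$|\2 \1|p'
}

# other_port_path: given the physical path of one port of a two-port card,
# print the path of the other port ("...scsi@2" <-> "...scsi@2,1").
other_port_path() {
    case "$1" in
        *,1) printf '%s\n' "${1%,1}" ;;   # second port: strip the ",1" suffix
        *)   printf '%s\n' "$1,1" ;;      # first port: append ",1"
    esac
}
```

For example, running warning_instances on the saved console messages lists glm2 and sd20 with their physical paths, and other_port_path /pci@89,4000/scsi@2 prints the path of the second port.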

      If you use the Fibre Channel Card (PW028FC3*/PW028FC4*/PW028FC5*), the messages are shown as follows.

      :
      WARNING: lpfc0:INe:Adapter Hardware Error
      :
      WARNING: /pci@89,4000/fibre-channel@2/sd@3,0 (sd20):
      SCSI transport failed: reason 'reset': giving up

      Execute the following command to check the physical path name.

      # grep lpfc /etc/path_to_inst <Return>
      "/pci@89,4000/fibre-channel@2" 0 "lpfc"

      The second field is the instance number of the lpfc instance.
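The path_to_inst lookup can also be done with a small filter that combines the driver name and instance number into the interface name. This is a minimal sketch, assuming each entry has the three fields shown above (quoted physical path, instance number, quoted driver name); the function name driver_instances is hypothetical.

```shell
# driver_instances: read /etc/path_to_inst-style lines on stdin and print
# "<driver><instance> <physical path>" for the driver named in $1.
driver_instances() {
    awk -v drv="$1" '
        { gsub(/"/, "") }                 # drop the quotes around path and driver
        $3 == drv { print $3 $2, $1 }     # e.g. "lpfc0 /pci@89,4000/fibre-channel@2"
    '
}

# Usage on the target system:
#   driver_instances lpfc < /etc/path_to_inst
```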

    2. Determine the logical path name under /dev/dsk corresponding to the disk device (sd20) connected to the target PCI card. In the example below, "c2t3d0" is the logical path corresponding to sd20.
      # ls -l /dev/dsk | grep /pci@89,4000/scsi@2/sd@3,0 <Return>
      lrwxrwxrwx 1 root root 41 Sep 20 22:53 c2t3d0s0 -> ../../devices/pci@89,4000/scsi@2/sd@3,0:a
      lrwxrwxrwx 1 root root 41 Sep 20 22:53 c2t3d0s1 -> ../../devices/pci@89,4000/scsi@2/sd@3,0:b
      lrwxrwxrwx 1 root root 41 Sep 20 22:53 c2t3d0s2 -> ../../devices/pci@89,4000/scsi@2/sd@3,0:c
      lrwxrwxrwx 1 root root 41 Sep 20 22:53 c2t3d0s3 -> ../../devices/pci@89,4000/scsi@2/sd@3,0:d
      lrwxrwxrwx 1 root root 41 Sep 20 22:53 c2t3d0s4 -> ../../devices/pci@89,4000/scsi@2/sd@3,0:e
      lrwxrwxrwx 1 root root 41 Sep 20 22:53 c2t3d0s5 -> ../../devices/pci@89,4000/scsi@2/sd@3,0:f
      lrwxrwxrwx 1 root root 41 Sep 20 22:53 c2t3d0s6 -> ../../devices/pci@89,4000/scsi@2/sd@3,0:g
      lrwxrwxrwx 1 root root 41 Sep 20 22:53 c2t3d0s7 -> ../../devices/pci@89,4000/scsi@2/sd@3,0:h
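Reading the disk name out of the listing can be automated with a short filter. This is a minimal sketch, assuming the command prints each symbolic link on a single line ("name -> target"); the function name logical_disks is hypothetical.

```shell
# logical_disks: read `ls -l /dev/dsk` output on stdin and print the unique
# cXtYdZ disk names whose symbolic link points at the physical path in $1.
logical_disks() {
    awk -v path="$1" '
        NF >= 3 && $(NF-1) == "->" && index($NF, "/devices" path ":") > 0 {
            name = $(NF-2)
            sub(/s[0-9]+$/, "", name)     # strip the slice: c2t3d0s0 -> c2t3d0
            if (!(name in seen)) { seen[name] = 1; print name }
        }
    '
}

# Usage on the target system:
#   ls -l /dev/dsk | logical_disks /pci@89,4000/scsi@2/sd@3,0
```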
  3. Stop applications

    Stop applications with the following operations.

    1. Stop vold.
      # sh /etc/init.d/volmgt stop <Return>
    2. Stop all I/O devices connected to the target PCI card (determined in step 2).

      [ If the device is a disk unit (file system operation) ]

      1. Determine the mount points of the disk from the logical path name (c2t3d0) determined in substep 2 of step 2.
        # mount | grep c2t3d0 <Return>
        /export/home on /dev/dsk/c2t3d0s3 setuid/read/write/largefiles on Mon Sep 30 01:00:51 2002
        /develop/firm on /dev/dsk/c2t3d0s0 setuid/read/write/largefiles on Mon Sep 30 01:00:51 2002
        /develop/drv on /dev/dsk/c2t3d0s1 setuid/read/write/largefiles on Mon Sep 30 01:00:51 2002
        /pub on /dev/dsk/c2t3d0s6 setuid/read/write/largefiles on Mon Sep 30 01:00:50 2002
      2. Stop access to the disk. To check which processes are using the target file system, use the fuser(1M) command as follows.
        # fuser -c /export/home <Return>
        /export/home: 14967c 14571c 14493ctm 14020c 13828tm
        13803c 13575c 13133c 13125tm 13107c 12682ctm 12066tm
        12048c 11971ctm 11952ctm 11937c 11867c 11846c 349m
      3. Unmount the disk.
        # umount /export/home <Return>
        # umount /develop/firm <Return>
        # umount /develop/drv <Return>
        # umount /pub <Return>
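The mount point lookup above can be sketched as a filter that lists every mounted slice of one disk, so that each can be checked with fuser(1M) and then unmounted. This is a minimal sketch, assuming the "<mount point> on <device> ..." layout shown in the mount output above; the function name mounts_on_disk is hypothetical.

```shell
# mounts_on_disk: read `mount` output on stdin and print
# "<mount point> <device>" for every mounted slice of the disk named in $1.
mounts_on_disk() {
    awk -v disk="$1" '
        $2 == "on" && $3 ~ ("^/dev/dsk/" disk "s[0-9]+$") { print $1, $3 }
    '
}

# Usage on the target system (review the fuser output before unmounting):
#   mount | mounts_on_disk c2t3d0 | while read -r mp dev; do
#       fuser -c "$mp"
#   done
```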

      [ If the device is a disk unit (raw access operation) or a tape device ]

      1. Check the access statistics of the target disk (sd20) determined in substep 1 of step 2. Check a tape device in the same way.
        # iostat -xc <Return>
                         extended device statistics                      cpu
        device    r/s  w/s   kr/s   kw/s wait actv  svc_t  %w  %b  us sy wt id
        sd0      59.7  7.5  474.5   45.0  0.0  3.9   58.6   0  41   3  7 23 67
        sd1       0.1  0.3    1.0    2.5  0.0  0.0   16.0   0   0
        sd20      0.0  0.1    0.3    0.7  0.0  0.0   14.7   0   0
        st82      0.0  0.0    0.0    0.0  0.0  0.0    0.0   0   0
        nfs1      0.0  0.0    0.0    0.0  0.0  0.0    0.0   0   0
      2. Stop disk access by applications.

        Access statistics for the target disk, and likewise for a tape device, can be checked as shown below. The example shows statistics collected over one minute.

        # sar -d 60 1 <Return>

        SunOS machine0 5.8 Generic_108528-05 sun4u 10/02/02

        17:56:00   device    %busy   avque   r+w/s  blks/s  avwait  avserv
        17:57:00   nfs1          0     0.0       0       0     0.0     0.0
                   sd0           2     0.3       2      37     0.0   145.5
                   sd0,a         1     0.1       0       5     0.0   301.4
                   sd0,b         0     0.0       0       9     0.0    31.5
                   sd0,c         0     0.0       0       0     0.0     0.0
                   sd0,d         1     0.0       0       4     0.0   126.8
                   sd0,e         0     0.0       0       0     0.0     0.0
                   sd0,f         1     0.1       1       6     0.0   120.9
                   sd0,g         1     0.1       1      14     0.0   111.7
                   :
                   sd20          0     0.0       0       0     0.0     0.0
                   sd20,a        0     0.0       0       0     0.0     0.0
                   sd20,c        0     0.0       0       0     0.0     0.0
                   sd20,g        0     0.0       0       0     0.0     0.0
                   :
                   st82          0     0.0       0       0     0.0     0.0
                   :
      3. Stop (deactivate) applications. See the manuals of each software for details.
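Whether a device still shows activity can be read from the statistics with a small check. This is a minimal sketch, assuming `iostat -x` style output in which the first three fields are the device name, r/s and w/s; the function name device_is_idle is hypothetical, and a device that does not appear in the output is treated as not idle.

```shell
# device_is_idle: read `iostat -x` style output on stdin and succeed only if
# the device named in $1 is listed with r/s (field 2) and w/s (field 3) of 0.
device_is_idle() {
    awk -v dev="$1" '
        $1 == dev { found = 1; if ($2 + 0 != 0 || $3 + 0 != 0) busy = 1 }
        END { exit (found && !busy) ? 0 : 1 }
    '
}

# Usage on the target system:
#   iostat -x | device_is_idle sd20 && echo "sd20 shows no reads or writes"
```

With the iostat figures shown in this step, st82 counts as idle, while sd20 does not because its w/s value is 0.1.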

      [ If the device is a disk unit (swap device) ]

      1. Show a list of swap devices, and check whether the disk with the logical path name (c2t3d0) determined in substep 2 of step 2 is used as a swap device.
        # swap -l <Return>
        swapfile dev swaplo blocks free
        /dev/dsk/c2t3d0s4 32,164 16 788384 683680
      2. Delete the swap device.
        # swap -d /dev/dsk/c2t3d0s4 <Return>
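The check for swap slices on the target disk can be sketched as a filter over the swap list. This is a minimal sketch, assuming the swap file name is the first field of each `swap -l` line as shown above; the function name swap_on_disk is hypothetical.

```shell
# swap_on_disk: read `swap -l` output on stdin and print the swap devices
# that are slices of the disk named in $1.
swap_on_disk() {
    awk -v disk="$1" '$1 ~ ("^/dev/dsk/" disk "s[0-9]+$") { print $1 }'
}

# Usage on the target system:
#   swap -l | swap_on_disk c2t3d0 | while read -r dev; do
#       swap -d "$dev"
#   done
```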
  4. Disconnect the PCI card.

    Disconnect the target PCI card with the following procedure.

    1. Using the inst2comp(1M) command, determine the slot position of the PCI card, called the attachment point identifier ("Ap_Id"), from the interface name (glm2) determined in substep 1 of step 2.

      For details of the inst2comp(1M) command, refer to the "PCI Hot Plug User's Guide." In this example, the "Ap_Id" is "pcipsy3:C0M00-PCI#slot02".

      # /usr/sbin/FJSVmadm/inst2comp glm2 <Return>
      pcipsy3:C0M00-PCI#slot02
    2. Specify the "Ap_Id" determined in substep 1 of this step as a parameter, and confirm that the slot status of the PCI card to disconnect is "connected configured".
      # cfgadm pcipsy3:C0M00-PCI#slot02 <Return>
      Ap_Id Type Receptacle Occupant Condition
      pcipsy3:C0M00-PCI#slot02 mult/hp connected configured ok
    3. Execute the command to disconnect the PCI card, specifying the "Ap_Id" determined in substep 1 of this step, and then confirm that the slot status has changed to "disconnected unconfigured".
      # cfgadm -c disconnect pcipsy3:C0M00-PCI#slot02 <Return>
      # cfgadm pcipsy3:C0M00-PCI#slot02 <Return>
      Ap_Id Type Receptacle Occupant Condition
      pcipsy3:C0M00-PCI#slot02 unknown disconnected unconfigured unknown

      Note:

      If an error occurs during disconnection, the cfgadm command may occasionally fail with the following message. If the cfgadm command fails, execute the command again.

      cfgadm: Component system is busy, try again: disconnect failed
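The retry described in the note can be wrapped in a small helper. This is a minimal sketch: the function name retry_cmd is hypothetical, and the number of attempts and the delay between them are assumptions, since the guide only says to execute the command again.

```shell
# retry_cmd: run the command given after the first two arguments up to
# $1 times, sleeping $2 seconds between attempts; stop at the first success.
retry_cmd() {
    max=$1 delay=$2
    shift 2
    attempt=1
    while :; do
        "$@" && return 0                       # command succeeded
        [ "$attempt" -ge "$max" ] && return 1  # attempts exhausted
        attempt=$((attempt + 1))
        sleep "$delay"
    done
}

# Usage on the target system (3 attempts, 10 seconds apart, both assumed):
#   retry_cmd 3 10 cfgadm -c disconnect 'pcipsy3:C0M00-PCI#slot02'
```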

    4. To make the slot position easy to confirm during the replacement operation, blink the ALARM LED of the Ap_Id displayed in substep 1 of this step.
      # cfgadm -x led=fault,mode=blink pcipsy3:C0M00-PCI#slot02 <Return>
  5. Replace the PCI card

    Replace the PCI card disconnected in step 4 with the replacement card, and connect the cables to the devices. This operation is performed by our customer support.

    When exchanging Fibre Channel cards, the following operations are also required.

    [ for PCI Fibre Channel(PW008FC3U/PW008FC2U/ GP7B8FC1U)]:

    [ for Fibre Channel Card (PW028FC3*/PW028FC4*/PW028FC5*)]:

    When replacing PCI cards in the following configurations, the Fibre Channel switch and the disk array device need to be reconfigured individually. For details, see the documentation of each product.

    To perform the above reconfiguration, the WWPN (a 16-digit hexadecimal number) of the replacement card is needed. The WWPN can be read from the twelve characters shown on a label on the back of the card. These characters are the lower twelve hexadecimal digits of the WWPN. The upper four digits are fixed to 1000.

    For example, if the following label is shown on the back of the card, the WWPN of the replacement card is 10000000c9366037.

    IEEE:0000c9366037
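The rule above can be written as a short helper that builds the WWPN from the label text. This is a minimal sketch, assuming the label has the form "IEEE:" followed by twelve hexadecimal digits as in the example; the function name wwpn_from_label is hypothetical.

```shell
# wwpn_from_label: turn a label such as "IEEE:0000c9366037" into the WWPN
# by prefixing the fixed upper digits 1000 to the twelve label digits.
wwpn_from_label() {
    digits=${1#IEEE:}
    case "$digits" in
        *[!0-9A-Fa-f]*) echo "not hexadecimal: $1" >&2; return 1 ;;
    esac
    if [ "${#digits}" -ne 12 ]; then
        echo "expected 12 hexadecimal digits: $1" >&2
        return 1
    fi
    printf '1000%s\n' "$digits"
}

# Usage:
#   wwpn_from_label IEEE:0000c9366037
```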

    Note:

    When the Affinity configuration is changed on an SN200 series or other Fibre Channel switch, I/O to other devices is affected by the change, which may result in temporary errors.

    I/O to disk array devices recovers normally because of retry processing, but on Fibre Channel tape devices, backup processes may end in errors. Stop backups before changing the Affinity configuration.

  6. Connect the PCI card

    Connect the replaced PCI card using the cfgadm(1M) command with the configure option, or by pushing the button corresponding to the replacement slot position. Note that the push button is only effective in multiuser mode. After the new PCI card is connected, use the cfgadm(1M) command and confirm that the slot status has changed to "connected configured."

    If a large-scale configuration of I/O devices is connected to the PCI card in the target slot, command execution for status confirmation may take time.

    # cfgadm -c configure pcipsy3:C0M00-PCI#slot02 <Return>
    # cfgadm pcipsy3:C0M00-PCI#slot02 <Return>
    Ap_Id Type Receptacle Occupant Condition
    pcipsy3:C0M00-PCI#slot02 mult/hp connected configured ok
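The status confirmation used here and in step 4 can be sketched as a filter over the cfgadm output. This is a minimal sketch, assuming the column order shown above (Ap_Id, Type, Receptacle, Occupant, Condition); the function name slot_state is hypothetical.

```shell
# slot_state: read `cfgadm <Ap_Id>` output on stdin and print
# "<receptacle> <occupant>" for the attachment point named in $1.
slot_state() {
    awk -v ap="$1" '$1 == ap { print $3, $4 }'
}

# Usage on the target system:
#   ap='pcipsy3:C0M00-PCI#slot02'
#   [ "$(cfgadm "$ap" | slot_state "$ap")" = "connected configured" ] ||
#       echo "slot is not yet connected and configured" >&2
```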

    When exchanging Fibre Channel cards, the following operations are also required.

    [ for PCI Fibre Channel(PW008FC3U/PW008FC2U/ GP7B8FC1U)]:

    [ for Fibre Channel Card (PW028FC3*/PW028FC4*/PW028FC5*)]:

    No procedure is necessary. Go to step 7.

  7. Start applications.

    Restart the stopped applications with the following operations.

    1. Start vold
      # sh /etc/init.d/volmgt start <Return>
    2. Resume use of the devices stopped in substep 2 of step 3.

      [ If the device is a disk unit (file system operation) ]

      Mount the unmounted file systems, and resume usage.

      # mount -F ufs /dev/dsk/c2t3d0s3 /export/home <Return>
      # mount -F ufs /dev/dsk/c2t3d0s0 /develop/firm <Return>
      # mount -F ufs /dev/dsk/c2t3d0s1 /develop/drv <Return>
      # mount -F ufs /dev/dsk/c2t3d0s6 /pub <Return>
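If the output of mount was saved to a file before the file systems were unmounted in step 3, the commands above can be generated from it instead of being typed by hand. This is a minimal sketch: the function name remount_commands is hypothetical, and it assumes every slice holds a ufs file system, as in the example.

```shell
# remount_commands: read saved `mount` output on stdin and print one
# "mount -F ufs <device> <mount point>" line per slice of the disk in $1.
# The file system type ufs is an assumption taken from the example.
remount_commands() {
    awk -v disk="$1" '
        $2 == "on" && $3 ~ ("^/dev/dsk/" disk "s[0-9]+$") {
            print "mount -F ufs", $3, $1
        }
    '
}

# Usage on the target system (mounts.before is a hypothetical saved file):
#   remount_commands c2t3d0 < mounts.before
```

The helper only prints the commands, so they can be reviewed before being executed.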

      [ If the device is a disk unit (raw access operation) or a tape device ]

      Restart applications and resume usage.

      Refer to the manual of each application for details.

      [ If the device is a disk unit (swap device) ]

      Add the swap device, and resume usage.

      # /sbin/swapadd -2 <Return>
  8. If you use the Fibre Channel Card (PW028FC3*/PW028FC4*/PW028FC5*), execute the following commands to start its daemons.

    # /etc/rc2.d/S99ElxRMSrv start <Return>
    # /etc/rc2.d/S99ElxDiscSrv start <Return>
  9. Update the hardware configuration information of machine administration and start the hardware monitoring daemon.

    Execute the following command to update the hardware configuration information of machine administration and to restart the hardware monitoring daemon.

    # /usr/sbin/FJSVmadm/postphp <Return>


All Rights Reserved, Copyright (C) FUJITSU LIMITED 2005