Oct 25, 2016
By abernal
Objectives
- Troubleshoot Linux systems by working through a series of steps iteratively until a solution is found
- Check network and file integrity for possible issues
- Resolve problems when there is system boot failure
- Repair and recover corrupted filesystems
- Understand how rescue and recovery media can be used for troubleshooting
Troubleshooting Overview
- Beginner
- Experienced
- Wizard
Basic Techniques
- Characterize the problem
- Reproduce the problem
- Always try the easy things first
- Eliminate possible causes one at a time
- Change only one thing at a time; if it doesn't fix the problem, change it back
- Check the system logs
- /var/log/messages
- /var/log/secure
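As a starting point for the log check, here is a minimal sketch of a helper that shows only the error-like lines near the end of a log file. The function name `recent_errors` and the search pattern are assumptions made for illustration; adjust the pattern to the problem you are chasing.

```shell
# recent_errors FILE [COUNT]
# Print error-like lines found among the last COUNT lines of FILE (default 200).
# The pattern below is an illustrative assumption, not an exhaustive list.
recent_errors() {
    file=$1
    count=${2:-200}
    tail -n "$count" "$file" | grep -i -E 'error|fail|denied'
}
```

For example, `recent_errors /var/log/messages 500` or `recent_errors /var/log/secure`; reading these files usually requires root privileges.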
Things to Check "Networking"
- IP Configuration
- To see whether the interface is up and, if so, whether it is configured, use either of these
- ifconfig
- ip
- Network Driver
- If the interface is not responding, the network driver may not have been loaded properly.
- Check with lsmod if the driver has been loaded as a kernel module
- Check /proc
- such as /proc/interrupts
- Check /sys
- such as /sys/class/net
- Connectivity
- Use ping to check whether other hosts on the network can be reached
- Use traceroute to check the path packets take through the network
- Use the mtr utility to check the path continuously
- Default gateway and routing configuration
- Use route -n and see if the routing table makes sense
- Hostname resolution
- Use dig or host on a hostname and check whether DNS is working properly
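The checks above can be chained: find the default gateway, then test connectivity to it before testing anything further away. The following is a minimal sketch; the helper name `default_gateway` is hypothetical, and it assumes the usual `default via <address> dev <interface>` line printed by `ip route show`.

```shell
# default_gateway
# Read `ip route show` output on stdin and print the default gateway address.
default_gateway() {
    awk '$1 == "default" && $2 == "via" { print $3; exit }'
}

# Possible usage on a live system (not run here):
#   ip addr show                          # is the interface up and configured?
#   gw=$(ip route show | default_gateway)
#   ping -c 3 "$gw"                       # can the gateway be reached?
#   host example.com                      # does name resolution work?
```

If the function prints nothing, no default route is configured, which is worth fixing before looking at DNS.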
Things to Check "File Integrity"
- rpm -V some_package
- Checks a single package
- rpm -Va
- Checks all packages on the system
- debsums options some_package
- Verifies the checksums of the package's installed files
- dpkg -V
- Verifies the installed files of packages against the dpkg database
- sudo aide --check
- Scans the filesystem and compares the result with the last scan. The aide database must be initialized first and kept up to date afterwards
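The output of `rpm -Va` can be long, so it helps to filter it. In each result line, rpm prints a nine-character flag string in which the third character is `5` when the file's digest differs from the package database. The sketch below keeps only those files; the function name `digest_changed` is hypothetical, and it assumes file names without spaces.

```shell
# digest_changed
# Read `rpm -V` / `rpm -Va` output on stdin and print the files whose
# digest (third flag character is "5") no longer matches the package.
digest_changed() {
    awk 'substr($1, 3, 1) == "5" { print $NF }'
}

# Possible usage (not run here):
#   rpm -Va | digest_changed
```

Changed configuration files are expected in this list; changed binaries or libraries deserve a closer look.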
Boot Process Failures
If the system fails to boot, it could be due to one of these causes
- No bootloader screen
- Check GRUB for misconfiguration or a corrupt boot sector
- Reinstalling GRUB may be one option
- Kernel fails to load
- Check for misconfigurations
- Check for corrupt kernel
- Check the parameters specified to run the kernel
- Boot from a rescue image
- Kernel loads but fails to mount the root filesystem
- Misconfiguration in the GRUB boot loader
- Misconfiguration in /etc/fstab
- No support for the root filesystem type, either built into the kernel or as a module in the initramfs (initial RAM disk or filesystem)
- Failure during the init process
- Check the logs of the initialization process
- Check for corrupted filesystems
- Check for errors in startup scripts
- Boot into a lower runlevel such as 3 (no graphics) or 1 (single-user mode)
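When the kernel loads but cannot mount the root filesystem, one quick check is which root device was actually passed on the kernel command line. The sketch below extracts the `root=` parameter; the function name `root_param` is hypothetical.

```shell
# root_param
# Read a kernel command line on stdin and print the value of root=.
root_param() {
    tr ' ' '\n' | sed -n 's/^root=//p'
}

# Possible usage on a running or rescue-booted system (not run here):
#   root_param < /proc/cmdline
```

Compare the printed value with the entry for / in /etc/fstab and with the boot loader configuration; a mismatch points to the misconfigurations listed above.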
Filesystem Corruption and Recovery
- Check the /etc/fstab
- Check for misconfigurations
- Check for filesystem type support
- The filesystem may be of a type the kernel does not support
- Remount /
- sudo mount -o remount,rw /
- Remounts / with write permissions so that /etc/fstab can be edited
- Move on to fsck
- Use fsck to check the filesystems; run it only on filesystems that are unmounted or mounted read-only
- sudo mount -a
- Tries to mount all filesystems; if it does not succeed completely, try to mount the ones with issues manually
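Before running `mount -a`, it can help to scan /etc/fstab for obviously malformed entries. A minimal sketch follows, assuming that a usable entry has at least four fields (device, mount point, type, options; the last two fields default to zero when omitted). The function name `fstab_short_lines` is hypothetical.

```shell
# fstab_short_lines FILE
# Print "line-number: text" for every non-comment, non-blank line of FILE
# that has fewer than four whitespace-separated fields.
fstab_short_lines() {
    awk '/^[[:space:]]*#/ { next }
         NF > 0 && NF < 4 { print NR ": " $0 }' "$1"
}

# Possible usage (not run here):
#   fstab_short_lines /etc/fstab
#   sudo mount -o remount,rw /    # make / writable before editing /etc/fstab
#   sudo mount -a                 # retry mounting everything after the fix
```

This only catches missing fields; a wrong device name or filesystem type still has to be checked by hand.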
Using Rescue / Recovery Media
The rescue image contains these utilities
- Disk maintenance utilities
- Network Utilities
- Miscellaneous Utilities
- Logging files
Common Utilities on Rescue / Recovery Media
- Utilities to
- Create partitions
- Manage RAID devices
- Manage logical volumes
- Create filesystems
- fdisk
- mdadm
- pvcreate
- vgcreate
- lvcreate
- mkfs
- Network utilities
- ifconfig
- route
- traceroute
- mtr
- host
- ftp
- scp
- ssh
- Other commands
- bash
- chroot
- ps
- kill
- vi
- dd
- tar
- cpio
- gzip
- rpm
- mkdir
- ls
- cp
- mv
- rm
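A common way to combine these utilities after booting from rescue media is to mount the installed system's root filesystem and chroot into it. The sketch below only prints the commands so they can be reviewed first; the function name `rescue_chroot_plan`, the device `/dev/sda2` and the mount point `/mnt/sysimage` are hypothetical and must be replaced with the values of the system being repaired.

```shell
# rescue_chroot_plan DEVICE [TARGET]
# Print (do not run) the commands to mount DEVICE on TARGET and chroot into it.
# TARGET defaults to /mnt/sysimage, an assumed mount point.
rescue_chroot_plan() {
    dev=$1
    target=${2:-/mnt/sysimage}
    echo "mkdir -p $target"
    echo "mount $dev $target"
    # Bind the virtual filesystems so tools inside the chroot keep working.
    for fs in proc sys dev; do
        echo "mount --bind /$fs $target/$fs"
    done
    echo "chroot $target /bin/bash"
}
```

For example, `rescue_chroot_plan /dev/sda2` prints six commands to run as root. Once inside the chroot, tools such as rpm and vi operate on the installed system rather than on the rescue image.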