Topic: Debian is crashing!

zavaboy
Paranoid (IV) Inmate

From: f(x)
Insane since: Jun 2004

posted 01-16-2008 20:53

I'm running Debian on a machine that has recently started crashing. I have never been around to see it actually crash, but when I switch my KVM to it, my monitor gets no signal. I did manage to take a picture of one crash that happened about 10 minutes after bootup. I didn't see that one happen either; I just switched my KVM and saw the screen like this. After this last crash, I looked at /var/log/messages to see what was there, but I don't know what to look for. I know it crashed sometime between 10:58 and 11:28, because the log stopped at 10:58 and an IRC bot that runs on the machine got disconnected at 11:28. This is what I see in /var/log/messages before the crash:

quote:
...
Jan 15 20:36:55 localhost gconfd (root-3639): starting (version 2.16.1), pid 3639 user 'root'
Jan 15 20:36:55 localhost gconfd (root-3639): Resolved address "xml:readonly:/etc/gconf/gconf.xml.mandatory" to a read-only configuration source at position 0
Jan 15 20:36:55 localhost gconfd (root-3639): Resolved address "xml:readwrite:/root/.gconf" to a writable configuration source at position 1
Jan 15 20:36:55 localhost gconfd (root-3639): Resolved address "xml:readonly:/etc/gconf/gconf.xml.defaults" to a read-only configuration source at position 2
Jan 15 20:36:55 localhost gconfd (root-3639): Resolved address "xml:readonly:/var/lib/gconf/debian.defaults" to a read-only configuration source at position 3
Jan 15 20:36:55 localhost gconfd (root-3639): Resolved address "xml:readonly:/var/lib/gconf/defaults" to a read-only configuration source at position 4
Jan 15 20:44:55 localhost gconfd (root-3639): SIGHUP received, reloading all databases
Jan 15 20:44:55 localhost gconfd (root-3639): Resolved address "xml:readonly:/etc/gconf/gconf.xml.mandatory" to a read-only configuration source at position 0
Jan 15 20:44:55 localhost gconfd (root-3639): Resolved address "xml:readwrite:/root/.gconf" to a writable configuration source at position 1
Jan 15 20:44:55 localhost gconfd (root-3639): Resolved address "xml:readonly:/etc/gconf/gconf.xml.defaults" to a read-only configuration source at position 2
Jan 15 20:44:55 localhost gconfd (root-3639): Resolved address "xml:readonly:/var/lib/gconf/debian.defaults" to a read-only configuration source at position 3
Jan 15 20:44:55 localhost gconfd (root-3639): Resolved address "xml:readonly:/var/lib/gconf/defaults" to a read-only configuration source at position 4
Jan 15 20:44:55 localhost gconfd (root-3639): GConf server is not in use, shutting down.
Jan 15 20:44:55 localhost gconfd (root-3639): Exiting
Jan 15 20:58:35 localhost -- MARK --
Jan 15 21:18:35 localhost -- MARK --
Jan 15 21:38:36 localhost -- MARK --
Jan 15 21:58:36 localhost -- MARK --
Jan 15 22:18:36 localhost -- MARK --
Jan 15 22:38:36 localhost -- MARK --
Jan 15 22:58:37 localhost -- MARK --
Jan 15 23:18:37 localhost -- MARK --
Jan 15 23:38:37 localhost -- MARK --
Jan 15 23:58:37 localhost -- MARK --
Jan 16 00:18:38 localhost -- MARK --
Jan 16 00:38:38 localhost -- MARK --
Jan 16 00:58:38 localhost -- MARK --
Jan 16 01:18:38 localhost -- MARK --
Jan 16 01:38:39 localhost -- MARK --
Jan 16 01:58:39 localhost -- MARK --
Jan 16 02:18:39 localhost -- MARK --
Jan 16 02:38:39 localhost -- MARK --
Jan 16 02:58:40 localhost -- MARK --
Jan 16 03:18:40 localhost -- MARK --
Jan 16 03:38:40 localhost -- MARK --
Jan 16 03:58:41 localhost -- MARK --
Jan 16 04:18:41 localhost -- MARK --
Jan 16 04:38:41 localhost -- MARK --
Jan 16 04:58:41 localhost -- MARK --
Jan 16 05:18:42 localhost -- MARK --
Jan 16 05:38:42 localhost -- MARK --
Jan 16 05:58:42 localhost -- MARK --
Jan 16 06:18:42 localhost -- MARK --
Jan 16 06:38:43 localhost -- MARK --
Jan 16 06:58:43 localhost -- MARK --
Jan 16 07:18:43 localhost -- MARK --
Jan 16 07:36:25 localhost syslogd 1.4.1#18: restart.
Jan 16 07:58:44 localhost -- MARK --
Jan 16 08:18:44 localhost -- MARK --
Jan 16 08:38:44 localhost -- MARK --
Jan 16 08:58:45 localhost -- MARK --
Jan 16 09:18:45 localhost -- MARK --
Jan 16 09:38:45 localhost -- MARK --
Jan 16 09:58:45 localhost -- MARK --
Jan 16 10:18:46 localhost -- MARK --
Jan 16 10:38:46 localhost -- MARK --
Jan 16 10:58:46 localhost -- MARK --


This information looks similar to what I saw in the log from a few of the other crashes, so I'm assuming it has a major part in the crashes.
So, it looks like it's GConf's fault? Are there other logs I should check?

CPrompt
Maniac (V) Inmate

From: there...no..there.....
Insane since: May 2001

posted 01-17-2008 01:25

well...after my wife saw the picture even she said..."Fatal is never a good sign is it?"

I don't think that gconf is the problem. I think it might be some bad RAM actually.
Have you done anything with the kernel lately? Compiled your own, upgraded or anything at all?

Run this command from the command line. You'll have to switch to root with "su" first, though.

code:
cat /var/log/messages | egrep "signal|restart"



That will spit out every line in messages that has "signal" or "restart" in it.
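
A minimal sketch of the same filter without the extra cat, using grep -E in place of egrep. The function name find_restarts is hypothetical; the default path is the log file discussed above.

```shell
# Hypothetical helper: print log lines that mention a signal or a restart.
# The first argument is the log file; it defaults to /var/log/messages.
find_restarts() {
    grep -E 'signal|restart' "${1:-/var/log/messages}"
}
```

Call it as root with no argument to read /var/log/messages, or pass another log file as the first argument.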

Might just have to try to reinstall the kernel. You should, and I say "should" in quotes, be able to do apt-get install --reinstall linux-image-yournumber. What kernel are you running?
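
To fill in "yournumber", the release string from uname -r can be used. This is only a sketch: it assumes the kernel image package is named linux-image-<release>, and kernel_pkg is a hypothetical helper name. It prints the reinstall command instead of running it.

```shell
# Sketch: build the package name for a kernel release.
# Assumes the image package is named linux-image-<release>.
# With no argument, the release of the running kernel (uname -r) is used.
kernel_pkg() {
    printf 'linux-image-%s\n' "${1:-$(uname -r)}"
}

# Print the command rather than run it, so nothing changes by accident.
echo "apt-get install --reinstall $(kernel_pkg)"
```

Run the printed command as root once the package name looks right.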

Later,

C:\

(Edited by CPrompt on 01-17-2008 01:28)

reisio
Paranoid (IV) Inmate

From: Florida
Insane since: Mar 2005

posted 01-17-2008 02:27

I won't even put gconf on a desktop, let alone a server. It keeps a stranglehold on conf files on purpose; I'd like to smack the genius that thought that was a good idea.

Tyberius Prime
Maniac (V) Mad Scientist with Finglongers

From: Germany
Insane since: Sep 2001

posted 01-17-2008 09:57

yeah, there's nothing interesting in that log file.
Give us /var/log/syslog around the time of the crash, please.

zavaboy
Paranoid (IV) Inmate

From: f(x)
Insane since: Jun 2004

posted 01-17-2008 10:51

Ok, it crashed again... rebooted at 4:05.

CPrompt: Output of command you gave:
(see next post)
I'm running Debian 4.0 r1 Etch on kernel 2.6.18-5-686. I think it was updated by the update manager. Now that I think of it, that could be around when it started crashing, not sure though.

TP: Here's what /var/log/syslog says before the logs from reboot:
(50 lines, I'd rather give too much than too little)
(see next post)

Edit: problem with linkwords, see next post



(Edited by zavaboy on 01-17-2008 10:53)

(Edited by zavaboy on 01-17-2008 10:55)

zavaboy
Paranoid (IV) Inmate

From: f(x)
Insane since: Jun 2004

posted 01-17-2008 10:59

I couldn't disable linkwords when editing... It needs fixed apparently.
Edit: Gah! I disabled linkwords when I posted. Let me try using the code tag.

First stuff: (cat /var/log/messages | egrep "signal|restart")

code:
Jan 15 20:05:35 localhost syslogd 1.4.1#18: restart.
Jan 15 20:17:41 localhost exiting on signal 15
Jan 15 20:18:35 localhost syslogd 1.4.1#18: restart.
Jan 15 20:18:55 localhost papd[2719]: restart (2.0.3)
Jan 16 07:36:25 localhost syslogd 1.4.1#18: restart.
Jan 16 14:07:43 localhost syslogd 1.4.1#18: restart.
Jan 16 14:08:05 localhost papd[2761]: restart (2.0.3)
Jan 17 04:05:26 localhost syslogd 1.4.1#18: restart.
Jan 17 04:05:48 localhost papd[2749]: restart (2.0.3)
Jan 17 04:12:00 localhost syslogd 1.4.1#18: restart.


Second stuff: (/var/log/syslog)

code:
Jan 16 14:17:01 localhost /USR/SBIN/CRON[3527]: (root) CMD (   cd / && run-parts --report /etc/cron.hourly)
Jan 16 14:27:44 localhost -- MARK --
Jan 16 14:33:01 localhost /USR/SBIN/CRON[3918]: (nobody) CMD ([ -x /usr/share/sa-exim/greylistclean ] && /usr/share/sa-exim/greylistclean)
Jan 16 14:33:02 localhost sa-exim[3919]: Removed 0 of 0 greylist tuplets in 0 seconds 
Jan 16 14:33:02 localhost sa-exim[3919]: Removed 0 of 0 greylist directories in 0 seconds 
Jan 16 14:39:01 localhost /USR/SBIN/CRON[4068]: (root) CMD (  [ -d /var/lib/php5 ] && find /var/lib/php5/ -type f -cmin +$(/usr/lib/php5/maxlifetime) -print0 | xargs -r -0 rm)
Jan 16 15:02:01 localhost /USR/SBIN/CRON[4631]: (root) CMD (if [ -x /usr/sbin/pg_maintenance ]; then /usr/sbin/pg_maintenance --analyze >/dev/null; fi)
Jan 16 15:09:01 localhost /USR/SBIN/CRON[4807]: (root) CMD (  [ -d /var/lib/php5 ] && find /var/lib/php5/ -type f -cmin +$(/usr/lib/php5/maxlifetime) -print0 | xargs -r -0 rm)
Jan 16 15:17:01 localhost /USR/SBIN/CRON[5008]: (root) CMD (   cd / && run-parts --report /etc/cron.hourly)
Jan 16 15:27:45 localhost -- MARK --
Jan 16 15:33:01 localhost /USR/SBIN/CRON[5399]: (nobody) CMD ([ -x /usr/share/sa-exim/greylistclean ] && /usr/share/sa-exim/greylistclean)
Jan 16 15:33:02 localhost sa-exim[5400]: Removed 0 of 0 greylist tuplets in 0 seconds 
Jan 16 15:33:02 localhost sa-exim[5400]: Removed 0 of 0 greylist directories in 0 seconds 
Jan 16 15:39:01 localhost /USR/SBIN/CRON[5548]: (root) CMD (  [ -d /var/lib/php5 ] && find /var/lib/php5/ -type f -cmin +$(/usr/lib/php5/maxlifetime) -print0 | xargs -r -0 rm)
Jan 16 16:07:46 localhost -- MARK --
Jan 16 16:09:01 localhost /USR/SBIN/CRON[6284]: (root) CMD (  [ -d /var/lib/php5 ] && find /var/lib/php5/ -type f -cmin +$(/usr/lib/php5/maxlifetime) -print0 | xargs -r -0 rm)
Jan 16 16:17:01 localhost /USR/SBIN/CRON[6486]: (root) CMD (   cd / && run-parts --report /etc/cron.hourly)
Jan 16 16:27:46 localhost -- MARK --
Jan 16 16:33:01 localhost /USR/SBIN/CRON[6877]: (nobody) CMD ([ -x /usr/share/sa-exim/greylistclean ] && /usr/share/sa-exim/greylistclean)
Jan 16 16:33:02 localhost sa-exim[6878]: Removed 0 of 0 greylist tuplets in 0 seconds 
Jan 16 16:33:02 localhost sa-exim[6878]: Removed 0 of 0 greylist directories in 0 seconds 
Jan 16 16:39:01 localhost /USR/SBIN/CRON[7026]: (root) CMD (  [ -d /var/lib/php5 ] && find /var/lib/php5/ -type f -cmin +$(/usr/lib/php5/maxlifetime) -print0 | xargs -r -0 rm)
Jan 16 17:07:46 localhost -- MARK --
Jan 16 17:09:01 localhost /USR/SBIN/CRON[7762]: (root) CMD (  [ -d /var/lib/php5 ] && find /var/lib/php5/ -type f -cmin +$(/usr/lib/php5/maxlifetime) -print0 | xargs -r -0 rm)
Jan 16 17:17:01 localhost /USR/SBIN/CRON[7964]: (root) CMD (   cd / && run-parts --report /etc/cron.hourly)
Jan 16 17:27:47 localhost -- MARK --
Jan 16 17:33:01 localhost /USR/SBIN/CRON[8355]: (nobody) CMD ([ -x /usr/share/sa-exim/greylistclean ] && /usr/share/sa-exim/greylistclean)
Jan 16 17:33:02 localhost sa-exim[8356]: Removed 0 of 0 greylist tuplets in 0 seconds 
Jan 16 17:33:02 localhost sa-exim[8356]: Removed 0 of 0 greylist directories in 0 seconds 
Jan 16 17:39:01 localhost /USR/SBIN/CRON[8504]: (root) CMD (  [ -d /var/lib/php5 ] && find /var/lib/php5/ -type f -cmin +$(/usr/lib/php5/maxlifetime) -print0 | xargs -r -0 rm)
Jan 16 17:42:35 localhost proftpd[8597]: connect from 201.221.146.73 (201.221.146.73)
Jan 16 17:43:03 localhost proftpd[8609]: connect from 201.221.146.73 (201.221.146.73)
Jan 16 17:43:22 localhost proftpd[8618]: connect from 201.221.146.73 (201.221.146.73)
Jan 16 17:43:44 localhost proftpd[8629]: connect from 201.221.146.73 (201.221.146.73)
Jan 16 18:07:47 localhost -- MARK --
Jan 16 18:09:01 localhost /USR/SBIN/CRON[9243]: (root) CMD (  [ -d /var/lib/php5 ] && find /var/lib/php5/ -type f -cmin +$(/usr/lib/php5/maxlifetime) -print0 | xargs -r -0 rm)
Jan 16 18:17:01 localhost /USR/SBIN/CRON[9444]: (root) CMD (   cd / && run-parts --report /etc/cron.hourly)
Jan 16 18:27:47 localhost -- MARK --
Jan 16 18:33:01 localhost /USR/SBIN/CRON[9836]: (nobody) CMD ([ -x /usr/share/sa-exim/greylistclean ] && /usr/share/sa-exim/greylistclean)
Jan 16 18:33:01 localhost sa-exim[9837]: Removed 0 of 0 greylist tuplets in 0 seconds 
Jan 16 18:33:01 localhost sa-exim[9837]: Removed 0 of 0 greylist directories in 0 seconds 
Jan 16 18:39:01 localhost /USR/SBIN/CRON[9985]: (root) CMD (  [ -d /var/lib/php5 ] && find /var/lib/php5/ -type f -cmin +$(/usr/lib/php5/maxlifetime) -print0 | xargs -r -0 rm)
Jan 16 19:07:48 localhost -- MARK --
Jan 16 19:09:01 localhost /USR/SBIN/CRON[10721]: (root) CMD (  [ -d /var/lib/php5 ] && find /var/lib/php5/ -type f -cmin +$(/usr/lib/php5/maxlifetime) -print0 | xargs -r -0 rm)
Jan 16 19:17:01 localhost /USR/SBIN/CRON[10922]: (root) CMD (   cd / && run-parts --report /etc/cron.hourly)
Jan 16 19:27:48 localhost -- MARK --
Jan 16 19:33:01 localhost /USR/SBIN/CRON[11314]: (nobody) CMD ([ -x /usr/share/sa-exim/greylistclean ] && /usr/share/sa-exim/greylistclean)
Jan 16 19:33:01 localhost sa-exim[11315]: Removed 0 of 0 greylist tuplets in 0 seconds 
Jan 16 19:33:01 localhost sa-exim[11315]: Removed 0 of 0 greylist directories in 0 seconds 
Jan 16 19:39:01 localhost /USR/SBIN/CRON[11463]: (root) CMD (  [ -d /var/lib/php5 ] && find /var/lib/php5/ -type f -cmin +$(/usr/lib/php5/maxlifetime) -print0 | xargs -r -0 rm)





(Edited by zavaboy on 01-17-2008 11:00)

Tyberius Prime
Maniac (V) Mad Scientist with Finglongers

From: Germany
Insane since: Sep 2001

posted 01-17-2008 11:07

I'd say your syslog has 'rolled over' - it's all older than when the crash happened.
There should be a compressed syslog there as well - anything around the crash time?
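
A rough sketch of how the live and rotated logs could be searched in one go. It assumes the rotated copies sit next to the live file as syslog.* (some plain, some gzip-compressed); search_syslogs is a hypothetical helper name.

```shell
# Sketch: search syslog and its rotated copies for a pattern.
# gzip -cdf decompresses .gz files and passes plain files through unchanged.
# Each match is prefixed with the name of the file it came from.
search_syslogs() {
    pattern=$1
    dir=${2:-/var/log}
    for f in "$dir"/syslog "$dir"/syslog.*; do
        [ -f "$f" ] || continue
        gzip -cdf "$f" | grep -e "$pattern" | sed "s|^|${f##*/}: |"
    done
}
```

For example, search_syslogs 'kernel:' as root lists kernel messages from every syslog file found in /var/log.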

zavaboy
Paranoid (IV) Inmate

From: f(x)
Insane since: Jun 2004

posted 01-17-2008 11:53

How do you know when the crash happened? I don't. Last I knew it was running was around 17:00, and I came back around 4:00. The syslog has about an 8.5 hour gap between Jan 16 19:39:01 and Jan 17 04:05:26. Look:

code:
...
Jan 16 19:33:01 localhost sa-exim[11315]: Removed 0 of 0 greylist directories in 0 seconds 
Jan 16 19:39:01 localhost /USR/SBIN/CRON[11463]: (root) CMD (  [ -d /var/lib/php5 ] && find /var/lib/php5/ -type f -cmin +$(/usr/lib/php5/maxlifetime) -print0 | xargs -r -0 rm)
Jan 17 04:05:26 localhost syslogd 1.4.1#18: restart.
Jan 17 04:05:26 localhost kernel: klogd 1.4.1#18, log source = /proc/kmsg started.
...
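
A gap like this can also be found mechanically. The sketch below assumes the classic "Mon DD HH:MM:SS" syslog prefix and a log that stays within one non-leap calendar year; find_log_gaps is a hypothetical helper name, and the one-hour default threshold is an arbitrary choice.

```shell
# Sketch: report gaps longer than MIN seconds between consecutive log lines.
# Usage: find_log_gaps [FILE] [MIN]   (defaults: /var/log/syslog, 3600)
# Lines that do not start with a month name (such as "...") are skipped.
find_log_gaps() {
    awk -v min="${2:-3600}" '
    BEGIN {
        # Days elapsed before the first of each month in a non-leap year.
        n = split("Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec", name, " ")
        split("0 31 59 90 120 151 181 212 243 273 304 334", offset, " ")
        for (i = 1; i <= n; i++) days[name[i]] = offset[i]
    }
    ($1 in days) && split($3, t, ":") == 3 {
        now = ((days[$1] + $2) * 24 + t[1]) * 3600 + t[2] * 60 + t[3]
        if (seen && now - prev > min)
            printf "%d s gap: %s -> %s %s %s\n", now - prev, stamp, $1, $2, $3
        prev = now; stamp = $1 " " $2 " " $3; seen = 1
    }' "${1:-/var/log/syslog}"
}
```

On the excerpt above, this reports a single gap of 30385 seconds (roughly 8.4 hours) between Jan 16 19:39:01 and Jan 17 04:05:26.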



Tyberius Prime
Maniac (V) Mad Scientist with Finglongers

From: Germany
Insane since: Sep 2001

posted 01-17-2008 12:21

The crash happened just before the gap, obviously.

But apparently your kernel didn't log any information into syslog either.
Huh.

Sure sounds like a hardware/driver problem though. Anything new on the machine?
Do the kernel panics always look the same (i.e. same error message)?
What happens when you put the machine under load?
Is there any 'exotic' hardware in there? (i.e. anything but board, network)

zavaboy
Paranoid (IV) Inmate

From: f(x)
Insane since: Jun 2004

posted 01-17-2008 12:35

There is no new hardware since I first installed Debian. I would guess some update screwed it up. I have been doing the same things I have been doing for the past few months. I use it for backups (FTP from another server), my IRC bot (using sirc) as mentioned in my first post, and rendering (Bryce 5.5 through Wine, high quality 3300x2550 images). I update whenever it says there are new updates.

Tyberius Prime
Maniac (V) Mad Scientist with Finglongers

From: Germany
Insane since: Sep 2001

posted 01-17-2008 14:04

but you haven't done an upgrade that required you to reboot the machine (ie. new kernel)?

zavaboy
Paranoid (IV) Inmate

From: f(x)
Insane since: Jun 2004

posted 01-18-2008 04:13

No, not yet.

Tyberius Prime
Maniac (V) Mad Scientist with Finglongers

From: Germany
Insane since: Sep 2001

posted 01-18-2008 08:46

Well, if it's been running smoothly the last few months, and now starts failing with kernel panics,
it could be a newly loaded kernel module, or it could be hardware.

Are all the fans spinning?

zavaboy
Paranoid (IV) Inmate

From: f(x)
Insane since: Jun 2004

posted 01-18-2008 14:24

Yes, all the fans are spinning.
Googling, I found this. So, I added "nosmp" to grub and rebooted in hopes that it will fix this issue. I'll let you know if it still crashes. It's been running for at least 3 hours now.
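
One way to confirm that a boot option such as nosmp actually reached the kernel is to look at /proc/cmdline after rebooting. A minimal sketch; has_boot_option is a hypothetical helper name, not a standard tool.

```shell
# Sketch: check whether a kernel boot option is active on the running system.
# /proc/cmdline holds the command line the kernel was booted with.
# The optional second argument names another file to read, for testing.
has_boot_option() {
    tr ' ' '\n' < "${2:-/proc/cmdline}" | grep -qx -- "$1"
}

if has_boot_option nosmp; then
    echo "nosmp is active"
else
    echo "nosmp is not active"
fi
```

The match is on the whole word, so an option like nosmp does not match a longer option that merely starts with the same letters.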

Tyberius Prime
Maniac (V) Mad Scientist with Finglongers

From: Germany
Insane since: Sep 2001

posted 01-18-2008 15:32

are there actually multiple processors in that machine?

zavaboy
Paranoid (IV) Inmate

From: f(x)
Insane since: Jun 2004

posted 01-18-2008 17:04

No. Anyway, it just crashed.
New information:
- It freezes the screen when it crashes. (Sometimes I find it blank.)
- The crash screen as I have in the picture happens DURING bootup.

Should I just reinstall?

Tyberius Prime
Maniac (V) Mad Scientist with Finglongers

From: Germany
Insane since: Sep 2001

posted 01-18-2008 17:13

first off, you should run a memory tester.
Then boot from cd, replace the kernel with a vanilla debian kernel,
before you reinstall the whole system.

CPrompt
Maniac (V) Inmate

From: there...no..there.....
Insane since: May 2001

posted 01-18-2008 17:25

Then you can probably look in /var/log/dmesg to see what went on during boot.

I can't remember in Debian but there might be a /var/log/boot log as well. Not "bootstrap" just boot.

Later,

C:\

zavaboy
Paranoid (IV) Inmate

From: f(x)
Insane since: Jun 2004

posted 01-19-2008 15:53

TP: Tested, memory is good. Can you point me somewhere that clearly tells how to replace the kernel? I never did it before.

CPrompt: It appears that change I made to grub stopped the bootup problems. I rebooted several times since then with no problem. Before, it would crash during bootup every 3 or 4 bootups.

Tyberius Prime
Maniac (V) Mad Scientist with Finglongers

From: Germany
Insane since: Sep 2001

posted 01-19-2008 18:45

This is debian.

apt-get install kernel-package-of-your-choice

zavaboy
Paranoid (IV) Inmate

From: f(x)
Insane since: Jun 2004

posted 01-19-2008 20:23

Bleh... Stupid question?
Anyway, I got it. So, now we wait...

Edit: It's still freezing up... so now what?



(Edited by zavaboy on 01-19-2008 20:55)

Tyberius Prime
Maniac (V) Mad Scientist with Finglongers

From: Germany
Insane since: Sep 2001

posted 01-20-2008 13:40

Now you boot an Ubuntu CD.
If that still locks up, we might be inclined to believe it to
be hardware.


Alternatively, start disabling hardware one by one - starting with the
network card.

Disabling smp... hm, give noacpi as a boot option a try. Maybe it's crashing trying to do
some powersaving.

so long,

->Tyberius Prime

zavaboy
Paranoid (IV) Inmate

From: f(x)
Insane since: Jun 2004

posted 01-23-2008 03:49

I have reasons to believe that my CD drive may have a bad connection somewhere, I don't feel like confirming this right now. (I'm sleepy...)

The CD drive did not read any bootable CDs I gave it the first few times, nor did it seem to speed up very fast. Then it just started working normally again. So I put an Ubuntu CD in and it started up ok. The only thing is that "apport" crashed once the desktop loaded up. It is still running ok after several hours (6 at least) since I loaded it.

So, does the crash of apport tell you anything?

Later, perhaps tomorrow, I will check the CD drive connection and also try booting up Debian again. If Debian still crashes, it's probably not the CD drive. Now that I think of it, I think the connection to the HD is stretched a bit tight, so I suppose I should just check all connections.

zavaboy
Paranoid (IV) Inmate

From: f(x)
Insane since: Jun 2004

posted 01-26-2008 23:46

I guess I should close this up...

I worked with TP over ICQ earlier. First he had me try running "bonnie" to see if it crashes any faster with it. It turns out it did the exact opposite. I ran bonnie and left it doing its thing; when I came back several hours later, bonnie was well finished and the machine was not frozen. I tried rebooting to see if the fix was permanent, but it wasn't: the machine froze after about 15 minutes. Next, he had me try doing fsck on boot, so I did, and it did its thing, and now it hasn't crashed since. I don't know why this worked, but I know it did. Now my Debian machine has been running for a few days straight.
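
The thread doesn't say how the boot-time check was triggered. On a sysvinit-based Debian system, one common approach is to create a /forcefsck flag file before rebooting, which is also what "shutdown -rF now" does. A minimal sketch under that assumption; request_boot_fsck is a hypothetical helper name, and the optional argument only exists so the function can be tried against a scratch directory.

```shell
# Sketch: request a filesystem check on the next boot by creating the
# forcefsck flag file at the top of the given root (default: /).
# Assumes a sysvinit-style boot that checks for /forcefsck; needs root.
request_boot_fsck() {
    root=${1:-/}
    touch "${root%/}/forcefsck"
}
```

After calling request_boot_fsck as root, reboot and let the check run to completion.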

So there it is, in case someone finds this useful.

Edit: Oh, I forgot to say: Thanks TP, for all the help!



(Edited by zavaboy on 01-26-2008 23:47)

CPrompt
Maniac (V) Inmate

From: there...no..there.....
Insane since: May 2001

posted 01-27-2008 01:57

fsck checks and repairs Linux filesystems. So...you might have just had some crud in there clogging up the works (that's pretty technical speak right there, isn't it?)

I wouldn't have thought to actually run fsck. Glad it worked though.

Later,

C:\


