Discussion:
v4.20-rc1: list_del corruption on thinkpad x220
Pavel Machek
2018-11-08 17:58:03 UTC
Hi!

My machine locked hard (thinkpad x220). After reboot, I found this in
syslog:

Sounds like memory corruption..? Does not sound easy to debug.

...otoh, it still looks like an address, so maybe it is "just" a race in
GPU drivers?

Any ideas?
Pavel

Nov 8 18:35:01 duo CRON[28511]: (root) CMD (command -v debian-sa1 > /dev/null && debian-sa1 1 1)
Nov 8 18:42:57 duo kernel: list_del corruption. prev->next should be ffff8801742b8178, but was ffffc9000192fec8
Nov 8 18:42:57 duo kernel: ------------[ cut here ]------------
Nov 8 18:42:57 duo kernel: kernel BUG at /data/fast/l/k/lib/list_debug.c:53!
Nov 8 18:42:57 duo kernel: invalid opcode: 0000 [#1] SMP PTI
Nov 8 18:42:57 duo kernel: CPU: 2 PID: 1082 Comm: i915/signal:1 Not tainted 4.20.0-rc1+ #3
Nov 8 18:42:57 duo kernel: Hardware name: LENOVO 42872WU/42872WU, BIOS 8DET74WW (1.44 ) 03/13/2018
Nov 8 18:42:57 duo kernel: RIP: 0010:__list_del_entry_valid+0x8e/0x90
Nov 8 18:42:57 duo kernel: Code: 66 88 d1 ff 0f 0b 48 89 fe 31 c0 48 c7 c7 90 74 5e 85 e8 53 88 d1 ff 0f 0b 48 89 fe 31 c0 48 c7 c7 c8 74 5e 85 e8 40 88 d1 ff <0f> 0b 55 48 89 d0 48 8b 52 08 48 89 e5 48 39 f2 75 19 48 8b 32 48
Nov 8 18:42:57 duo kernel: RSP: 0000:ffffc9000196be78 EFLAGS: 00210086
Nov 8 18:42:57 duo kernel: RAX: 0000000000000054 RBX: ffff8801742b8178 RCX: 0000000000000000
Nov 8 18:42:57 duo kernel: RDX: 0000000000000000 RSI: ffff88019e2a53d8 RDI: ffff88019e2a53d8
Nov 8 18:42:57 duo kernel: RBP: ffffc9000196be78 R08: ffff880196e2cd10 R09: 0000000000000000
Nov 8 18:42:57 duo kernel: R10: 00000000e7684eb9 R11: 3863656632393101 R12: ffffc9000196bec8
Nov 8 18:42:57 duo kernel: R13: ffff88019707e000 R14: ffff8801742b8080 R15: ffffc9000192fdd0
Nov 8 18:42:57 duo kernel: FS: 0000000000000000(0000) GS:ffff88019e280000(0000) knlGS:0000000000000000
Nov 8 18:42:57 duo kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Nov 8 18:42:57 duo kernel: CR2: 00000000ed2bf000 CR3: 000000000581e001 CR4: 00000000000606a0
Nov 8 18:42:57 duo kernel: Call Trace:
Nov 8 18:42:57 duo kernel: intel_breadcrumbs_signaler+0x162/0x330
Nov 8 18:42:57 duo kernel: kthread+0x116/0x150
Nov 8 18:42:57 duo kernel: ? intel_engine_wakeup+0x40/0x40
Nov 8 18:42:57 duo kernel: ? kthread_park+0x90/0x90
Nov 8 18:42:57 duo kernel: ret_from_fork+0x35/0x40
Nov 8 18:42:57 duo kernel: Modules linked in:
Nov 8 18:42:57 duo kernel: ---[ end trace 2f8da183a56f80f6 ]---
Nov 8 18:42:57 duo kernel: RIP: 0010:__list_del_entry_valid+0x8e/0x90
Nov 8 18:42:57 duo kernel: Code: 66 88 d1 ff 0f 0b 48 89 fe 31 c0 48 c7 c7 90 74 5e 85 e8 53 88 d1 ff 0f 0b 48 89 fe 31 c0 48 c7 c7 c8 74 5e 85 e8 40 88 d1 ff <0f> 0b 55 48 89 d0 48 8b 52 08 48 89 e5 48 39 f2 75 19 48 8b 32 48
Nov 8 18:42:57 duo kernel: RSP: 0000:ffffc9000196be78 EFLAGS: 00210086
Nov 8 18:42:57 duo kernel: RAX: 0000000000000054 RBX: ffff8801742b8178 RCX: 0000000000000000
Nov 8 18:42:57 duo kernel: RDX: 0000000000000000 RSI: ffff88019e2a53d8 RDI: ffff88019e2a53d8
Nov 8 18:42:57 duo kernel: RBP: ffffc9000196be78 R08: ffff880196e2cd10 R09: 0000000000000000
Nov 8 18:42:57 duo kernel: R10: 00000000e7684eb9 R11: 3863656632393101 R12: ffffc9000196bec8
Nov 8 18:42:57 duo kernel: R13: ffff88019707e000 R14: ffff8801742b8080 R15: ffffc9000192fdd0
Nov 8 18:42:57 duo kernel: FS: 0000000000000000(0000) GS:ffff88019e280000(0000) knlGS:0000000000000000
Nov 8 18:42:57 duo kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Nov 8 18:42:57 duo kernel: CR2: 00000000ed2bf000 CR3: 000000000581e001 CR4: 00000000000606a0
--
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html
Joonas Lahtinen
2018-11-21 11:19:54 UTC
+ Chris

Quoting Pavel Machek (2018-11-08 19:58:03)
Post by Pavel Machek
Hi!
My machine locked hard (thinkpad x220). After reboot, I found this in
Sounds like memory corruption..? Does not sound like easy to debug.
Were you doing something GPU intense when you experienced the hard hang?

And if so, have you been able to hit the issue more than once? At this
point it doesn't look like anything we've hit previously, so it would be
great to have some more insight into how we could reproduce it.

There's one similar for nouveau in Bugzilla, but it seems like a genuine
memory corruption (1 bit flipped):

https://bugs.freedesktop.org/show_bug.cgi?id=84880

Any extra information would be of use :)

Regards, Joonas

PS. Could you open a bug in Bugzilla? It'll help to collect the
information in one consolidated place:

https://01.org/linuxgraphics/documentation/how-report-bugs
Pavel Machek
2018-11-21 11:54:49 UTC
Hi!
Post by Joonas Lahtinen
Post by Pavel Machek
My machine locked hard (thinkpad x220). After reboot, I found this in
Sounds like memory corruption..? Does not sound like easy to debug.
Were you doing something GPU intense when you experienced the hard hang?
And if so, have you been able to hit the issue more than once? At this
point it doesn't look like anything we've hit previously, so would be
great to have some more insight into how we could reproduce.
I've seen another crash since then, but I don't think it counts as
"easily reproducible".

I may have been running flightgear at that point. That's fairly GPU intensive.
Post by Joonas Lahtinen
There's one similar for nouveau in Bugzilla, but it seems like a genuine
https://bugs.freedesktop.org/show_bug.cgi?id=84880
Any extra information would be of use :)
Regards, Joonas
PS. Could you open a bug to Bugzilla, it'll help to collect the
https://01.org/linuxgraphics/documentation/how-report-bugs
I prefer email... certainly for bugs that can't be reproduced.

Best regards,
Pavel
Joonas Lahtinen
2018-11-23 08:17:35 UTC
Quoting Pavel Machek (2018-11-21 13:54:49)
Post by Pavel Machek
Hi!
Post by Joonas Lahtinen
Post by Pavel Machek
My machine locked hard (thinkpad x220). After reboot, I found this in
Sounds like memory corruption..? Does not sound like easy to debug.
Were you doing something GPU intense when you experienced the hard hang?
And if so, have you been able to hit the issue more than once? At this
point it doesn't look like anything we've hit previously, so would be
great to have some more insight into how we could reproduce.
I seen another crash since that, but I don't think it counts at
"easily reproducible".
I may have been running flightgear at that point. That's fairly GPU intensive.
Post by Joonas Lahtinen
There's one similar for nouveau in Bugzilla, but it seems like a genuine
https://bugs.freedesktop.org/show_bug.cgi?id=84880
Any extra information would be of use :)
Regards, Joonas
PS. Could you open a bug to Bugzilla, it'll help to collect the
https://01.org/linuxgraphics/documentation/how-report-bugs
I prefer email... certainly for bugs that can't be reproduced.
By adding it to Bugzilla, it may be recognized by somebody else
who is experiencing a similar issue. Internet points are not deducted
for submitting bugs in good faith, even if they get closed as NOTABUG.

It sounds like you've hit the same signature twice, so it may very well
be reproducible. Does flightgear have some demo mode where you could
leave it running a heavy scene overnight?

Were you running a 4.19 kernel previously, a distro one or vanilla? A full
dmesg from a boot would be appreciated (from a kernel where you didn't
experience issues, and from one where you do).

We actually have a well-defined process and personnel to look into
Bugzilla entries, so it'd still be helpful to have this logged in
Bugzilla.

Regards, Joonas
Pavel Machek
2018-11-24 15:23:52 UTC
Hi!
Post by Joonas Lahtinen
Post by Pavel Machek
Post by Joonas Lahtinen
There's one similar for nouveau in Bugzilla, but it seems like a genuine
https://bugs.freedesktop.org/show_bug.cgi?id=84880
Any extra information would be of use :)
Regards, Joonas
PS. Could you open a bug to Bugzilla, it'll help to collect the
https://01.org/linuxgraphics/documentation/how-report-bugs
I prefer email... certainly for bugs that can't be reproduced.
By adding it to the Bugzilla it may be recognized by somebody else
who is experiencing a similar issue. Internet points are not deducted
for submitting bugs in good faith, even if they get closed as
NOTABUG.
Feel free to copy from email to bugzilla :-).
Post by Joonas Lahtinen
It sounds like you've hit the same signature twice, so it may very well
be reproducible. Does flightgear have some demo mode where you could
leave it running a heavy scene overnight?
I'm not sure if it was the same signature twice. I had two lockups, but
IIRC I only investigated one.

Not really a demo mode. I can put the plane on autopilot, but eventually
the gas runs out. (And I guess the window needs to be visible for the test
to be effective.) I tried today, but it did not crash.

Do you have something else I could run to do the testing?
Post by Joonas Lahtinen
Were you running 4.19 kernel previously, distro one or vanilla? A full
dmesg from a boot would be appreciated (from kernel where you didn't
experience issues, and from one where you do).
Recent kernels I'm running are self-compiled.
Post by Joonas Lahtinen
We actually have a well defined process and personnel to look into the
Bugzilla entries, so it'd still be helpful to have this logged to
Bugzilla.
If I can reproduce it, it makes sense to create a Bugzilla entry.

Best regards,
Pavel
Pavel Machek
2018-12-08 11:13:46 UTC
Hi!
Post by Pavel Machek
Post by Joonas Lahtinen
Post by Pavel Machek
Post by Joonas Lahtinen
There's one similar for nouveau in Bugzilla, but it seems like a genuine
https://bugs.freedesktop.org/show_bug.cgi?id=84880
Any extra information would be of use :)
Regards, Joonas
PS. Could you open a bug to Bugzilla, it'll help to collect the
https://01.org/linuxgraphics/documentation/how-report-bugs
I prefer email... certainly for bugs that can't be reproduced.
By adding it to the Bugzilla it may be recognized by somebody else
who is experiencing a similar issue. Internet points are not deducted
for submitting bugs in good faith, even if they get closed as NOTABUG.
Well, your documentation suggests you'll deduct my internet points:

Before filing the bug, please try to reproduce your issue with the
latest kernel. Use the latest drm-tip branch from
http://cgit.freedesktop.org/drm-tip and build as instructed on our
Build Guide.

:-)
Post by Pavel Machek
Feel free to copy from email to bugzilla :-).
Hmm, so it seems it happened again today:

Dec 8 11:45:01 duo CRON[29325]: (root) CMD (command -v debian-sa1 > /dev/null && debian-sa1 1 1)
Dec 8 11:46:42 duo org.mate.panel.applet.MateWeatherAppletFactory[3983]: (mateweather-applet-2:4242): GLib-CRITICAL **: Source ID 14603 was not found when attempting to remove it
Dec 8 11:54:59 duo kernel: list_del corruption. prev->next should be ffff88019283ea28, but was ffff8801411a1c68
Dec 8 11:54:59 duo kernel: ------------[ cut here ]------------
Dec 8 11:54:59 duo kernel: kernel BUG at /data/fast/l/k/lib/list_debug.c:53!
Dec 8 11:54:59 duo kernel: invalid opcode: 0000 [#1] SMP PTI
Dec 8 11:54:59 duo kernel: CPU: 1 PID: 3428 Comm: Xorg Not tainted 4.20.0-rc1+ #4
Dec 8 11:54:59 duo kernel: Hardware name: LENOVO 42872WU/42872WU, BIOS 8DET74WW (1.44 ) 03/13/2018
Dec 8 11:54:59 duo kernel: RIP: 0010:__list_del_entry_valid+0x8e/0x90
Dec 8 11:54:59 duo kernel: Code: 16 88 d1 ff 0f 0b 48 89 fe 31 c0 48 c7 c7 08 75 5e 85 e8 03 88 d1 ff 0f 0b 48 89 fe 31 c0 48 c7 c7 40 75 5e 85 e8 f0 87 d1 ff <0f> 0b 55 48 89 d0 48 8b 52 08 48 89 e5 48 39 f2 75 19 48 8b 32 48
Dec 8 11:54:59 duo kernel: RSP: 0000:ffffc90000223ac0 EFLAGS: 00213282
Dec 8 11:54:59 duo kernel: RAX: 0000000000000054 RBX: ffff880115a07c40 RCX: 0000000000000000
Dec 8 11:54:59 duo kernel: RDX: 0000000000000000 RSI: ffff88019e2653d8 RDI: ffff88019e2653d8
Dec 8 11:54:59 duo kernel: RBP: ffffc90000223ac0 R08: ffff880193a2ad10 R09: 0000000000000000
Dec 8 11:54:59 duo kernel: R10: 00000000008e9088 R11: 2e6e6f6974707501 R12: ffff8801960cb240
Dec 8 11:54:59 duo kernel: R13: ffff88019283e900 R14: ffff880115a07ec0 R15: ffff88019283ea28
Dec 8 11:54:59 duo kernel: FS: 0000000000000000(0000) GS:ffff88019e240000(0063) knlGS:00000000f79c4880
Dec 8 11:54:59 duo kernel: CS: 0010 DS: 002b ES: 002b CR0: 0000000080050033
Dec 8 11:54:59 duo kernel: CR2: 00000000086b0df8 CR3: 00000001939f6004 CR4: 00000000000606a0
Dec 8 11:54:59 duo kernel: Call Trace:
Dec 8 11:54:59 duo kernel: i915_vma_move_to_active+0x1c3/0x510
Dec 8 11:54:59 duo kernel: ? i915_request_await_object+0xf4/0x280
Dec 8 11:54:59 duo kernel: i915_gem_do_execbuffer+0xe2f/0x10a0
Dec 8 11:54:59 duo kernel: ? find_held_lock+0x39/0xb0
Dec 8 11:54:59 duo kernel: ? kvmalloc_node+0x26/0x70
Dec 8 11:54:59 duo kernel: i915_gem_execbuffer2_ioctl+0x1b4/0x360
Dec 8 11:54:59 duo kernel: ? i915_gem_execbuffer_ioctl+0x290/0x290
Dec 8 11:54:59 duo kernel: drm_ioctl_kernel+0xaa/0xf0
Dec 8 11:54:59 duo kernel: drm_ioctl+0x323/0x3d0
Dec 8 11:54:59 duo kernel: ? i915_gem_execbuffer_ioctl+0x290/0x290
Dec 8 11:54:59 duo kernel: ? posix_ktime_get_ts+0xc/0x10
Dec 8 11:54:59 duo kernel: i915_compat_ioctl+0x37/0x40
Dec 8 11:54:59 duo kernel: __ia32_compat_sys_ioctl+0x429/0xe90
Dec 8 11:54:59 duo kernel: ? put_old_timespec32+0x9/0x10
Dec 8 11:54:59 duo kernel: ? __ia32_compat_sys_clock_gettime+0x67/0x90
Dec 8 11:54:59 duo kernel: do_int80_syscall_32+0x50/0x100
Dec 8 11:54:59 duo kernel: entry_INT80_compat+0x7d/0x82
Dec 8 11:54:59 duo kernel: RIP: 0023:0xf7fd5c42
Dec 8 11:54:59 duo kernel: Code: 65 8b 15 04 00 00 00 8b 0e 8b 0c ca 83 f9 ff 75 0c 89 04 24 89 f0 e8 b3 fe ff ff eb 05 8b 46 04 01 c8 83 c4 14 5b 5e c3 cd 80 <c3> 8d b6 00 00 00 00 8d bc 27 00 00 00 00 8b 1c 24 c3 8d b6 00 00
Dec 8 11:54:59 duo kernel: RSP: 002b:00000000fff1a014 EFLAGS: 00203292 ORIG_RAX: 0000000000000036
Dec 8 11:54:59 duo kernel: RAX: ffffffffffffffda RBX: 000000000000000a RCX: 0000000040406469
Dec 8 11:54:59 duo kernel: RDX: 00000000fff1a0bc RSI: 0000000000000000 RDI: 0000000040406469
Dec 8 11:54:59 duo kernel: RBP: 000000000000000a R08: 0000000000000000 R09: 0000000000000000
Dec 8 11:54:59 duo kernel: R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
Dec 8 11:54:59 duo kernel: R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
Dec 8 11:54:59 duo kernel: Modules linked in:
Dec 8 11:54:59 duo kernel: ---[ end trace 0c1e74ccc719c763 ]---
Dec 8 11:54:59 duo kernel: RIP: 0010:__list_del_entry_valid+0x8e/0x90
Dec 8 11:54:59 duo kernel: Code: 16 88 d1 ff 0f 0b 48 89 fe 31 c0 48 c7 c7 08 75 5e 85 e8 03 88 d1 ff 0f 0b 48 89 fe 31 c0 48 c7 c7 40 75 5e 85 e8 f0 87 d1 ff <0f> 0b 55 48 89 d0 48 8b 52 08 48 89 e5 48 39 f2 75 19 48 8b 32 48
Dec 8 11:54:59 duo kernel: RSP: 0000:ffffc90000223ac0 EFLAGS: 00213282
Dec 8 11:54:59 duo kernel: RAX: 0000000000000054 RBX: ffff880115a07c40 RCX: 0000000000000000
Dec 8 11:54:59 duo kernel: RDX: 0000000000000000 RSI: ffff88019e2653d8 RDI: ffff88019e2653d8
Dec 8 11:54:59 duo kernel: RBP: ffffc90000223ac0 R08: ffff880193a2ad10 R09: 0000000000000000
Dec 8 11:54:59 duo kernel: R10: 00000000008e9088 R11: 2e6e6f6974707501 R12: ffff8801960cb240
Dec 8 11:54:59 duo kernel: R13: ffff88019283e900 R14: ffff880115a07ec0 R15: ffff88019283ea28
Dec 8 11:54:59 duo kernel: FS: 0000000000000000(0000) GS:ffff88019e240000(0063) knlGS:00000000f79c4880
Dec 8 11:54:59 duo kernel: CS: 0010 DS: 002b ES: 002b CR0: 0000000080050033
Dec 8 11:54:59 duo kernel: CR2: 00000000086b0df8 CR3: 00000001939f6004 CR4: 00000000000606a0
Dec 8 11:54:59 duo org.mate.panel.applet.WnckletFactory[3983]: wnck-applet: Fatal IO error 11 (Resource temporarily unavailable) on X server :0.
Dec 8 11:54:59 duo org.mate.panel.applet.MateWeatherAppletFactory[3983]: mateweather-applet-2: Fatal IO error 11 (Resource temporarily unavailable) on X server :0.
Dec 8 11:55:00 duo org.mate.panel.applet.CommandAppletFactory[3983]: command-applet: Fatal IO error 11 (Resource temporarily unavailable) on X server :0.
Dec 8 11:55:00 duo org.mate.panel.applet.NotificationAreaAppletFactory[3983]: notification-area-applet: Fatal IO error 11 (Resource temporarily unavailable) on X server :0.
Dec 8 11:55:00 duo org.mate.panel.applet.ClockAppletFactory[3983]: clock-applet: Fatal IO error 11 (Resource temporarily unavailable) on X server :0.
Dec 8 11:55:01 duo CRON[30056]: (root) CMD (command -v debian-sa1 > /dev/null && debian-sa1 1 1)
Dec 8 11:55:02 duo org.mate.panel.applet.InhibitAppletFactory[3983]: mate-inhibit-applet: Fatal IO error 11 (Resource temporarily unavailable) on X server :0.
Dec 8 11:55:09 duo org.a11y.atspi.Registry[4114]: XIO: fatal IO error 11 (Resource temporarily unavailable) on X server ":0"

Do you see a high chance of this being a DRM/Intel issue?
Post by Pavel Machek
Post by Joonas Lahtinen
It sounds like you've hit the same signature twice, so it may very well
be reproducible. Does flightgear have some demo mode where you could
leave it running a heavy scene overnight?
I'm not sure if it was same signature twice. I had two lockups, but
IIRC only investigated one.
So it is twice now.
Post by Pavel Machek
Not really a demo mode. I can put plane on autopilot, but eventually
gas runs out. (And I guess window needs to be visible for test to be
effective.) I tried today, but it did not crash.
Do you have something else I could run to do the testing?
This time I was not really running anything graphics heavy, except for
Chromium playing a YouTube video.

Best regards,
Pavel
Pavel Machek
2018-12-08 11:24:47 UTC
Post by Pavel Machek
Hi!
Post by Joonas Lahtinen
Post by Pavel Machek
Post by Joonas Lahtinen
There's one similar for nouveau in Bugzilla, but it seems like a genuine
https://bugs.freedesktop.org/show_bug.cgi?id=84880
Any extra information would be of use :)
Regards, Joonas
PS. Could you open a bug to Bugzilla, it'll help to collect the
https://01.org/linuxgraphics/documentation/how-report-bugs
I prefer email... certainly for bugs that can't be reproduced.
By adding it to the Bugzilla it may be recognized by somebody else
who is experiencing a similar issue. Internet points are not deducted
for submitting bugs in good faith, even if they get closed as NOTABUG.
Before filing the bug, please try to reproduce your issue with the
latest kernel. Use the latest drm-tip branch from
http://cgit.freedesktop.org/drm-tip and build as instructed on our
Build Guide.
:-)
I'd prefer not to run drm-tip. I'll update to 4.20-rc5+ and see if
it reappears (but it takes a long time to reproduce :-().

If you think it is useful, I can try to update my machine to
linux-next.

Best regards,
Pavel
Pavel Machek
2018-12-09 11:18:56 UTC
Hi!

Another day, another problem... but this one is different from the
previous hang, as the machine survives.

Chromium was running with a YouTube video playing.

[31850.666274] [drm] GPU hangs can indicate a bug anywhere in the entire gfx stack, including userspace.
[31850.666277] [drm] Please file a _new_ bug report on bugs.freedesktop.org against DRI -> DRM/Intel
[31850.666279] [drm] drm/i915 developers can then reassign to the right component if it's not a kernel issue.
[31850.666282] [drm] The gpu crash dump is required to analyze gpu hangs, so please always attach it.
[31850.666285] [drm] GPU crash dump saved to /sys/class/drm/card0/error
[31850.666394] i915 0000:00:02.0: Resetting chip for hang on rcs0
[31850.668474] WARNING: CPU: 0 PID: 13675 at /data/fast/l/k/include/linux/dma-fence.h:503 i915_request_skip+0x71/0x80
[31850.668478] Modules linked in:
[31850.668484] CPU: 0 PID: 13675 Comm: kworker/0:3 Not tainted 4.20.0-rc5+ #5
[31850.668487] Hardware name: LENOVO 42872WU/42872WU, BIOS 8DET74WW (1.44 ) 03/13/2018

Dmesg and /sys/class/drm/card0/error are attached.

Best regards,
Pavel
Joonas Lahtinen
2018-12-10 08:30:08 UTC
Post by Pavel Machek
Hi!
Another day, another problem... but this one is different from the
previous hang, as machine survives.
Please file a bug. It even says so in the splat...

Regards, Joonas
--
Joonas Lahtinen
Open Source Graphics Center
Intel Corporation
Joonas Lahtinen
2018-12-10 08:28:39 UTC
Post by Pavel Machek
Post by Pavel Machek
Hi!
Post by Joonas Lahtinen
Post by Pavel Machek
Post by Joonas Lahtinen
There's one similar for nouveau in Bugzilla, but it seems like a genuine
https://bugs.freedesktop.org/show_bug.cgi?id=84880
Any extra information would be of use :)
Regards, Joonas
PS. Could you open a bug to Bugzilla, it'll help to collect the
https://01.org/linuxgraphics/documentation/how-report-bugs
I prefer email... certainly for bugs that can't be reproduced.
By adding it to the Bugzilla it may be recognized by somebody else
who is experiencing a similar issue. Internet points are not deducted
for submitting bugs in good faith, even if they get closed as NOTABUG.
Before filing the bug, please try to reproduce your issue with the
latest kernel. Use the latest drm-tip branch from
http://cgit.freedesktop.org/drm-tip and build as instructed on our
Build Guide.
:-)
I'd prefer not to run drm-tip. I'll update to 4.20-rc5+ and see if
it reappears (but it takes a long time to reproduce :-().
Whether or not we can reproduce the issue with drm-tip is a very useful
data point for us. If we cannot reproduce it, it'll be possible to bisect
which commit fixed it, and backport that. On the other hand, if it's
still reproducible, we know we're not spending time on something we
already fixed, and the priority gets a bump.
Post by Pavel Machek
If you think it is useful, I can try to update my machine to
linux-next.
linux-next is closer to drm-tip, so it's better. Do you have some
specific reason for not wanting to run drm-tip (but linux-next is still
ok)?

Regards, Joonas