pxa270, nBATT_FAULT and disabling the platform i2c-1 driver

Discussion:

Aleksandr Koltsoff

2009-02-27 07:57:04 UTC

Hello all,

Apologies for the long mail; the issue is somewhat complicated.

First off, the question:
Is it detrimental to continue running the PXA270 with the following
settings?
Initially PMCR == 0x15; then nBATT_FAULT goes low and PMCR reads 0x3d.
(0x15 means: instead of entering deep sleep on the fault signals, raise an
interrupt, i.e. BIDAE+VIDAE+IAS.)
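For reference, the two PMCR values can be decoded with a small helper. This is only a sketch: the bit positions below are an assumption (chosen to be consistent with 0x15 == BIDAE+VIDAE+IAS as stated here, and worth checking against the PXA27x developer's manual), and pmcr_decode is a hypothetical helper, not a kernel function.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* ASSUMED PMCR bit layout; verify against the PXA27x manual. */
#define PMCR_BIDAE (1u << 0) /* nBATT_FAULT -> abort/interrupt enable */
#define PMCR_BIDAS (1u << 1) /* nBATT_FAULT status */
#define PMCR_VIDAE (1u << 2) /* nVDD_FAULT -> abort/interrupt enable */
#define PMCR_VIDAS (1u << 3) /* nVDD_FAULT status */
#define PMCR_IAS   (1u << 4) /* 1 = interrupt, 0 = imprecise data abort */
#define PMCR_INTRS (1u << 5) /* interrupt status */

/* Hypothetical helper: writes the names of the set bits, separated by
 * spaces, into buf (snprintf truncates safely if buf is too small). */
void pmcr_decode(uint32_t pmcr, char *buf, size_t len)
{
    static const struct {
        uint32_t bit;
        const char *name;
    } bits[] = {
        { PMCR_BIDAE, "BIDAE" }, { PMCR_BIDAS, "BIDAS" },
        { PMCR_VIDAE, "VIDAE" }, { PMCR_VIDAS, "VIDAS" },
        { PMCR_IAS,   "IAS"   }, { PMCR_INTRS, "INTRS" },
    };

    assert(buf != NULL && len > 0);
    buf[0] = '\0';
    for (size_t i = 0; i < sizeof bits / sizeof bits[0]; i++) {
        size_t used = strlen(buf);

        if (pmcr & bits[i].bit)
            snprintf(buf + used, len - used, "%s%s",
                     used ? " " : "", bits[i].name);
    }
}
```

Under this assumed layout, 0x15 decodes to "BIDAE VIDAE IAS" and 0x3d to "BIDAE VIDAE VIDAS IAS INTRS", so the status bit that got latched is VIDAS rather than BIDAS.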

The reason why I ask is that the current pxa270 platform code
(mach-pxa/pxa27x.c) always enables the power i2c bus. This causes one of
the following scenarios in our case:

a) nBATT_FAULT occurs. If PMCR is left at its default value, this forces the
PXA270 to enter deep sleep immediately (and we can't wake it up).

b) If we turn the faults into interrupts (via the PMCR settings), the
nBATT_FAULT condition causes an interrupt that triggers the irq handler
for i2c-1 in Linux. Since none of the I2C status registers reflect that
the source was the power monitoring block, this causes a spurious
interrupt message (scream_blue_murder). Also, since the I2C driver
doesn't reset the necessary bits in PMCR, the interrupt stays asserted.
This leads to the irq handler running continuously (100% CPU), which
blocks the system from doing any useful work; even pings are dropped.
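The two scenarios can be summarised as a small decision function. This is a simplified model, not kernel code, under the same assumed PMCR bit layout (BIDAE bit 0, VIDAE bit 2, IAS bit 4); the imprecise-data-abort branch for IAS = 0 reflects the behaviour described later in this thread, and all names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* ASSUMED PMCR bit positions (consistent with 0x15 == BIDAE+VIDAE+IAS). */
#define PMCR_BIDAE (1u << 0) /* nBATT_FAULT handled as abort/interrupt */
#define PMCR_VIDAE (1u << 2) /* nVDD_FAULT handled as abort/interrupt */
#define PMCR_IAS   (1u << 4) /* 1 = interrupt, 0 = imprecise data abort */

enum fault_line { FAULT_NBATT, FAULT_NVDD };
enum fault_response { RESP_DEEP_SLEEP, RESP_DATA_ABORT, RESP_INTERRUPT };

/* Simplified model of how the PXA27x reacts when a fault line goes low,
 * given the PMCR value in force at that moment. */
enum fault_response pmcr_fault_response(uint32_t pmcr, enum fault_line line)
{
    uint32_t enable = (line == FAULT_NBATT) ? PMCR_BIDAE : PMCR_VIDAE;

    assert(line == FAULT_NBATT || line == FAULT_NVDD);
    if (!(pmcr & enable))
        return RESP_DEEP_SLEEP;                 /* scenario a) */
    return (pmcr & PMCR_IAS) ? RESP_INTERRUPT   /* scenario b) */
                             : RESP_DATA_ABORT; /* abort handler needed */
}
```

With PMCR at 0x15 both lines end up in the interrupt branch; clearing only IAS (0x05) would turn the same faults into imprecise data aborts instead.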

We are using a CPU board that includes a PXA270, flash and some other
peripherals. The regulator is a TI TPS65020, which is connected to the
nVDD_FAULT and nBATT_FAULT signals on the PXA270 (as well as to the
V-signals). Unfortunately we cannot change the wiring or the components
on the CPU board (we don't even have the schematics for it).
We're running off a stable power supply (this is not a mobile platform).

We are currently working around the problem as follows:
1) Set PMCR to 0x15 as early as possible (currently in init, but I
plan to move this to U-Boot at some point).
2) Disable the power I2C driver in the platform code. This prevents
the irq handler from ever running, even when the nBATT_FAULT
condition occurs. It also forces me to change pxa27x.c, which I'd
rather not do, but it currently doesn't offer any way of disabling i2c-1
(other than disabling I2C completely, and we need i2c-0). Ideally, we'd
like to use i2c-1 at some point to access the TPS, but we can live
without it for now.
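A rough sketch of the two-step workaround, written as hypothetical board-init code: struct board_dev, board_apply_workaround and the is_power_i2c flag are invented for illustration and are not part of pxa27x.c. The PMCR pointer is passed in so the logic can be exercised against a plain variable; in a real kernel it would be the memory-mapped register.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* BIDAE + VIDAE + IAS: turn both fault lines into interrupts. */
#define PMCR_WORKAROUND 0x15u

/* HYPOTHETICAL stand-in for an entry in the platform device table. */
struct board_dev {
    const char *name;
    int id;
    int is_power_i2c; /* nonzero for the power I2C controller (i2c-1) */
};

/* HYPOTHETICAL board init implementing the two workaround steps.
 * Returns how many devices remain to be registered. */
size_t board_apply_workaround(volatile uint32_t *pmcr,
                              struct board_dev *devs, size_t n)
{
    size_t kept = 0;

    assert(pmcr != NULL && (devs != NULL || n == 0));

    /* Step 1: as early as possible, make faults raise an interrupt
     * instead of forcing deep sleep. */
    *pmcr = PMCR_WORKAROUND;

    /* Step 2: compact the table in place, dropping only the power
     * I2C device so its irq handler is never installed; i2c-0 stays. */
    for (size_t i = 0; i < n; i++)
        if (!devs[i].is_power_i2c)
            devs[kept++] = devs[i];
    return kept;
}
```

Filtering the table, rather than disabling I2C wholesale, mirrors the requirement that i2c-0 keeps working while only i2c-1 goes away.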

As I read the PXA270 manuals, once a fault interrupt occurs the system
should save state (or do whatever else it needs to) and then enter deep
sleep "manually". Since we now never enter deep sleep, I'm wondering
whether dragons will fly out of our noses at some point.

The obvious solution would be to fix the regulator/pull-ups/pull-downs,
but as stated above, we can't do this (and the CPU board manufacturer
isn't very forthcoming in admitting that this is a real problem).

Thanks in advance for any suggestions/info on the issue.

Aleksandr Koltsoff

-------------------------------------------------------------------
List admin: http://lists.arm.linux.org.uk/mailman/listinfo/linux-arm
FAQ: http://www.arm.linux.org.uk/mailinglists/faq.php
Etiquette: http://www.arm.linux.org.uk/mailinglists/etiquette.php

Aleksandr Koltsoff

2009-03-03 08:36:22 UTC

Post by Aleksandr Koltsoff
Is it detrimental to continue running the PXA270 with the following
settings:
Initially PMCR==0x15, then nBATT_FAULT goes down and PMCR is at 0x3d.
(0x15==Instead of entering deep-sleep on the fault signals, cause an
interrupt, aka BIDAE+VIDAE+IAS).

It must've been a bad morning. If you decode 0x3d, you see that it
was actually nVDD_FAULT, not nBATT_FAULT. However, the effects on the
PXA270 are the same, although the wiring from the TPS is somewhat
different (judging by the example schematic in the TPS datasheet).

Post by Aleksandr Koltsoff
1) Set PMCR to 0x15 at the earliest moment (currently in init, but I
plan to move this to uboot at some point).
2) Disable the power i2c driver from the platform driver. This will
disable the irq handler from ever launching, even when the nBATT_FAULT
condition occurs. This also forces me to change the pxa27x.c, which I'd
rather not do, but currently it doesn't offer any way of disabling i2c-1
(other than disabling i2c completely, but we need i2c-0). Ideally, we'd
like to use i2c-1 at some point to access the TPS, but we can live
without it for now.

Post by Jeff
The nBATT_FAULT signal sounds like it might be misapplied here, or there could
be a configuration problem with the power-fail and low-battery sense pins of
the TPS65020. If you don't have access to the hardware and have no way to fix
the problem there, you could be stuffed. The default settings of the TPS65020
should be fine if you are not doing voltage scaling, so there is no need for
power I2C. The low-battery and power-fail interrupts are masked off by default,
so you will not get an nVDD_FAULT assertion when the system enters low
battery. However, if any of the regulators has a problem, that will generate
an nVDD_FAULT signal.

The TPS65020 PWRFAIL# pin should be connected to the PXA's nBATT_FAULT pin,
and the voltage-divider resistors on the 65020 should be chosen so that it
asserts when the input voltage drops below the worst-case low-battery voltage,
usually around 2.8V for a single lithium-ion cell. If the battery voltage has
got this low and you are still running as if all systems are normal, then
you've got problems elsewhere; almost everything else in the system is going
to start falling over below 3.0V. An orderly shutdown should have begun after
the 65020 asserted its LOWBATT# pin. nBATT_FAULT is fatal regardless of the
PMCR settings in the PXA. It causes either immediate entry into deep sleep or
an imprecise data abort, for which you must install a handler. The only
purpose of the handler is to do emergency cleanup (shutting down all
peripheral power, for example) and then drop the processor into sleep mode.
There is no exit from this state other than a hardware reset; it is fatal.
You cannot simply install an interrupt handler and go merrily along
afterwards. (Well, OK, if nBATT_FAULT merely glitched, it should be possible
to recover if an abort handler is provided, but this points to other system
problems.)

With a running system, nBATT_FAULT is the last line of defense, the absolute
last thing that should ever happen as input power drains away. For a
suspended or sleeping system, an asserted nBATT_FAULT prevents the system from
starting, as it indicates there's not enough power available to run normally.
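The PWRFAIL# threshold set by the resistor pair follows the usual divider arithmetic: the sense pin sees V_in * R_bottom / (R_top + R_bottom), so the comparator trips at V_in = V_ref * (R_top + R_bottom) / R_bottom. The sketch below assumes that topology; V_ref is the TPS65020 comparator threshold, which is not given in this thread and has to be taken from the datasheet, and the function names are hypothetical.

```c
#include <assert.h>

/* Assumed topology: r_top from the supply to the sense pin, r_bottom from
 * the pin to ground, comparator threshold vref_v (from the datasheet). */

/* Supply voltage at which the sense pin reaches the threshold. */
double divider_trip_voltage(double vref_v, double r_top, double r_bottom)
{
    assert(vref_v > 0.0 && r_top >= 0.0 && r_bottom > 0.0);
    return vref_v * (r_top + r_bottom) / r_bottom;
}

/* r_top needed for a desired trip voltage, given r_bottom. */
double divider_r_top_for(double vtrip_v, double vref_v, double r_bottom)
{
    assert(vref_v > 0.0 && vtrip_v > vref_v && r_bottom > 0.0);
    return r_bottom * (vtrip_v / vref_v - 1.0);
}
```

For example, with a placeholder V_ref of 1.0 V and R_bottom = 100 kΩ, a 2.8 V trip point needs R_top = 180 kΩ; the real V_ref will shift these values.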

Thanks for the detailed response, Jeff. However, my boards (which are
under constant, variable current loads on the test bench) seem to survive
the condition just fine.

Note that in my case the IAS bit is set in PMCR. This turns the
"imprecise data abort" into an interrupt. When IAS is zero (the
default), the kernel does indeed abort: I get a nice "kernel oops
inside an irq handler", mostly from pxa_serial (the serial port seems to
be the most likely trigger of the problem on the CPU board we're using).

So, once we've set IAS to 1, the fault doesn't cause an abort at all, but
instead signals an interrupt to the PXA core. Since we've disabled the
power-I2C support in the platform code, the interrupt stays asserted
(this is reflected in PMCR, and it cannot be reset, even when I attempt to
do so according to the PXA datasheet), but it never has any effect, since
the IRQ is disabled at the interrupt controller.

So, as long as PMCR is set to 0x15 before the power event, and as long
as we don't enable the platform power-I2C support, the system survives
all of the power events quite nicely. CPU operation doesn't seem to be
affected by this condition (no excess CPU load or interrupts, since the
interrupt is disabled at the irq controller in this case).
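Why masking the IRQ at the controller makes the difference can be shown with a toy model (not kernel code): once the fault latches and nothing can clear its status, a registered handler is re-entered at every opportunity, whereas a masked line is simply ignored. useful_ticks is a hypothetical name, and "ticks" are abstract time slices.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model, not kernel code. A level-triggered fault interrupt latches
 * at tick fault_at and is never cleared (as with the stuck PMCR status).
 * Returns how many of the ticks remain for normal work. */
unsigned useful_ticks(unsigned ticks, unsigned fault_at, bool irq_masked)
{
    bool line_asserted = false;
    unsigned useful = 0;

    for (unsigned t = 0; t < ticks; t++) {
        if (t == fault_at)
            line_asserted = true;   /* fault latches; nothing clears it */
        if (line_asserted && !irq_masked)
            continue;               /* handler re-entered: tick is lost */
        useful++;                   /* masked (or no fault yet): real work */
    }
    assert(useful <= ticks);
    return useful;
}
```

With a fault at tick 10 out of 100, the unmasked case leaves only the 10 ticks before the fault, while the masked case keeps all 100.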

As a side note, I'm assuming that the CPU board supplier uses a
schematic for the TPS very similar to the example in the TPS datasheet
("typical configuration for the Intel Bulverde processor") with respect to
PWRFAIL_SNS and LOWBAT_SNS, since we're running from a single power supply
(we don't use batteries). We managed to locate the two resistors that
divide the Vcc going into these two inputs, and by removing one of them
we can reduce the frequency of the failure event. It still occurs,
however: instead of within 5-15 seconds, it now happens once every week
or two. Either the Vcc input to PWRFAIL_SNS or the two outputs from the
TPS are coupling with something (that's my guess, but I can't really do
anything about it).

Granted, this is far from where we'd like to be. The question remains
whether running the PXA270 in this situation is a sane thing to do; the
datasheet is rather vague and doesn't really say how the CPU behaves in
this case.

ak.

Aleksandr Koltsoff

2009-03-04 08:43:16 UTC

Post by Jeff

OIC, well, that makes all the difference in the world then ;-) nVDD_FAULT is
recoverable if you set up PMCR as below and simply generate an interrupt on
nVDD_FAULT, but you do need to handle the interrupt. It might be better to
simply disable the INT# output from the TPS65020 for now. See my closing
comments below.

Ah, thanks for the clarification. I re-read the schematic for the
recommended TPS connection, and nVDD_FAULT is indeed driven by INT#.
To cut a long story short, masking off INT# wasn't very successful.
Even with MASK (register 0x02) set to zero (other registers at their
defaults), we still get nVDD_FAULT (and since i2c-1 is now enabled to
talk to the TPS, I get the 'spurious irq' loop, but that was to be
expected).

Now this is somewhat odd. I'd expect the TPS to misbehave when the power
event happens, but when I dump the TPS registers after resetting the
PXA270, they all still seem intact, so at least the TPS registers survive.

As for the power source not being beefy enough, I think I can rule that
one out, since I can reproduce the same effect with various lab power
supplies. I'll have to talk with our HW people before drawing final
conclusions, though.

Post by Jeff
The way the TPS65020 is set up, you should never get an nVDD_FAULT in a
running system, even in a low-battery condition, and it has nothing to do
with the resistor network that programs the low-battery and power-fail
settings. It looks like you have a system problem where one of the switching
supplies of the TPS65020 is getting overloaded, or else, since you mention
you are not running off a battery, it is a case of 'wimpy power supply
syndrome'. The TPS65020 should be fed by a 4.2V supply that can deliver a
couple of amps if it has to. Check the leads and the input bypassing. Also
beware that this little bugger is nice for small systems, but it doesn't
have a whole lot of power left over once it has finished supplying VCC_CORE,
VCC_MEM, and some 3.3V or 3.0V for I/O. Sounds to me like it might be
getting overloaded...

From our tests, we've noticed the following things that make the power
event more frequent:
* Using the UARTs. Higher speed == more frequent power events. More
simultaneous UARTs in use == more frequent power events.
* Using SD and USB contributes less to the power events, which is
somewhat counter-intuitive, but they do have a small effect.
* Sometimes it only takes an incoming SSH connection to push the system
over the limit, but I guess this is just the Ethernet chip drawing some
extra juice.

By lowering the operating frequency of the core and bus (using cpufreq),
the power event frequency is also lowered (we didn't change the TPS
settings in step with cpufreq; the TPS always ran with its default
settings, although we did try switching between the FFPWM and PWM/PFM
modes, but that didn't have any effect).
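A first-order explanation of the cpufreq observation is the standard CMOS dynamic-power approximation, where α is the switching activity factor and C_eff the switched capacitance. Because the TPS settings (and hence V) were not changed along with cpufreq, dynamic power scales roughly linearly with f. This ignores leakage and peripheral loads such as the UARTs and the Ethernet chip, so it is consistent with, rather than proof of, the overload theory.

```latex
P_{\mathrm{dyn}} \approx \alpha\, C_{\mathrm{eff}}\, V^{2} f
\qquad\Longrightarrow\qquad
\frac{P_{\mathrm{dyn},2}}{P_{\mathrm{dyn},1}} = \frac{f_2}{f_1}
\quad \text{(with } V \text{ and } \alpha C_{\mathrm{eff}} \text{ held fixed)}
```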

So taken together, it certainly looks like something is drawing too much
power; the odd thing is that the effects persist irrespective of which
power source we use.

Thanks again for the answer, Jeff, it's really appreciated. We've been
trying to track the core issue down for the past few months (on and off),
and short of making our own PXA270 boards (or using the horrible "disable
power i2c bus and set PMCR to 0x15" workaround), I'm running out of options.
And it seems that we're the only Colibri users who have run across the
issue, although I find it somewhat amusing that Toradex also suggests
setting PMCR to 0x15 because "some customers have had issues".

Best regards,

ak.
