Gaze data

6 replies
oleg
oleg's picture
Offline
Joined: Sep 7 2009

In this topic we discuss the gaze data values we need to access from the ET system and ways to retrieve them.

As always, I recommend studying which gaze data values are available from the current systems. The description is in the file posted to "API type": PDF .

And the first suggestion is also here:

adrian wrote:

    The '?' indicates that this value is allowed to be null.

    code:
    public struct GazeData
    {
        public long Time;
        public Eye Eye;
        public Gaze? LeftGaze;
        public Gaze? RightGaze;
    }


    It contains several substructures as defined below:

    public enum Eye
    {
        None,
        Left,
        Right,
        Both,
    }

    public enum Validity
    {
        None,
        Poor,
        Uncertain,
        Good,
    }

    public enum Unit
    {
        None,
        Millimeter,
        Pixel,
        Degree,
    }

    public enum PupilDataType
    {
        None,
        XYDiameters,
        EllipseDiameters,
    }

    public struct Pupil
    {
        public Unit DiameterUnit;
        public PupilDataType DiameterType;
        public double DiameterOne;
        public double DiameterTwo;
    }

    public struct Gaze
    {
        public Pupil GazePupil;
        public Validity GazeValidity;
        public Unit GazePosUnit;
        public double? GazePosOne;
        public double? GazePosTwo;
    }

    The problem is its restriction to 2D. Should we replace the X,Y values with degrees and add a DegreeToXY method in the ITracker interface?

The format is rather similar to what I had in mind. Some notes about it:
- Time is of "long" type. I would suggest using "double", as an integer type does not allow sub-millisecond values (what if the sampling rate is 120 Hz?).
- Validity has four levels. It is a matter of taste, but I would make it three levels, excluding "Poor", as we already have "Uncertain" to mark data that is there but not reliable.
- PupilDataType: extend with "Area", and thus Unit with "Pixel2" and "MM2".
- Remove the "Diameter" prefix in the Pupil structure.

A couple of other suggestions:
- Now we specify "Time" for each sample. At what exact moment should samples be timestamped? At the middle of the video frame? At the time the sample is ready to be delivered to the client application? I suggest using both ("Time" and "FrameTime"); this way we will know the latency between the moment a video frame was captured (thus, the moment of the true eye location) and the moment the sample was sent to the client application.
- Distance and pupil position in the camera field are often useful values too, and other values may be available from a system. But this way we will get a huge structure that will often be used just to get x/y (as Lars noted). So I would think about either some data-subscription mechanism that uses a flexible data structure (how?), or about specifying several basic structures (only coordinates / only pupil / etc.). Please suggest better ways, if such exist... Read Lars's post in the other topic and think about how to make the data format and access satisfactory for all, and how data layering can be described.
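As a minimal sketch, here is how the sample structure might look with these suggestions applied. All names here are assumptions for illustration, not part of the draft; "Time" and "FrameTime" are doubles in milliseconds:

```csharp
// Hypothetical revision of the GazeData struct above with the
// suggestions applied: double timestamps plus a separate FrameTime
// so the delivery latency can be measured.
public struct GazeData
{
    public double Time;      // when the sample was delivered to the client
    public double FrameTime; // when the video frame was captured
    public Eye Eye;
    public Gaze? LeftGaze;
    public Gaze? RightGaze;

    // Latency between frame capture (true eye location) and delivery.
    public double Latency
    {
        get { return Time - FrameTime; }
    }
}
```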

Adrian Voßkühler
Re: Gaze data

- Time as double is better. Right.
- For the Validity field I think we need some input from the manufacturers what their algorithms provide for this value.
- Extending the pupil data with area as Oleg proposed will not serve Diameter and Area in parallel; instead we have to extend the Pupil struct:

code:

public struct Pupil
{
  public Unit DiameterUnit;
  public PupilDataType DiameterType;
  public double DiameterOne;
  public double DiameterTwo;
  public Unit AreaUnit;
  public double Area;
}

Regarding the flexibility:

What if we just provide a method to set the values we would like to retrieve...

Consider the following enumeration

code:

[Flags]
public enum DataTypes
{
    None = 0,
    Time = 1,
    FrameTime = 2,
    PupilXY = 4,
    PupilArea = 8,
    PupilPositionInVideo = 16,
    GazePosX = 32,
    GazePosY = 64,
    Minimal = 97, // Time | GazePosX | GazePosY
    Validity = 128,
    // etc.
}

When configuring the ET device, a method sets the data format like:

code:

ETConfigure(Both, Time|PupilXY|GazePosX|GazePosY, other params);

or to get only the right eye's gaze XY:

code:
ETConfigure(Right, Minimal, other params);

The returned structure could stay the same GazeData, but only the values of the flags set are filled with data from the ET.
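To illustrate the flag mechanics (the DataTypes values combine bitwise, so a consumer can test which fields were actually filled; the variable names here are just placeholders, and "requested" stands for the flag combination passed to ETConfigure):

```csharp
// Sketch of how a client could test which values the tracker was
// configured to deliver, using the DataTypes flags enum above.
DataTypes requested = DataTypes.Time | DataTypes.GazePosX | DataTypes.GazePosY;

bool hasPupil = (requested & DataTypes.PupilXY) != 0;  // not requested here
bool hasGazeX = (requested & DataTypes.GazePosX) != 0; // requested here

// Minimal = 97 is exactly Time | GazePosX | GazePosY,
// so this particular combination equals the Minimal preset.
bool isMinimal = requested == DataTypes.Minimal;
```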

This is somehow similar to the ET_FRM command in the SMI iViewX UDP interface where you can set the values you want to receive via UDP.

----

Another way round would be to define some predefined data levels like BasicXY, XYWithPupil, Complete. Setting this level during configuration would return a different child class of the above Gaze struct. We would have to change the gaze structure into an abstract class, with different implementations depending on the data level, defining what data we would like to receive.
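A sketch of that hierarchy might look like the following. The class names are hypothetical; note that in C# this forces classes rather than structs, since structs cannot be inherited:

```csharp
// Data-level hierarchy sketch. Structs cannot be derived from in C#,
// so the base type becomes an abstract class.
public abstract class GazeSample
{
    public double Time;
}

// Level "BasicXY": coordinates only.
public class BasicXYGaze : GazeSample
{
    public double GazePosX;
    public double GazePosY;
}

// Level "XYWithPupil": adds pupil data on top of the coordinates.
public class XYWithPupilGaze : BasicXYGaze
{
    public double PupilDiameterOne;
    public double PupilDiameterTwo;
}
```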

What do you think is better ?

Adrian

oleg
Re: Gaze data

adrian wrote:

- For the Validity field I think we need some input from the manufacturers what their algorithms provide for this value.

You never know - it depends on the OEM API.

adrian wrote:

What if we just provide a method to set the values we would like to retreive...
...
The returned structure could stay the same GazeData, but only the values of the flags set are filled with data from the ET.

I thought about similar solutions. I can imagine that there could be an alternative way: sending gaze data as an XML object. But I'm afraid this way may be too demanding on resources. I have never sent XML as a parameter of events...

adrian wrote:

Another way round would be to define some predefined data levels like BasicXY, XYWithPupil, Complete. Setting this level during configuration will return a different child class of the above Gaze struct. We have to change the gaze structure into an abstract class, with different implementations depending on the data level defining what data we would like to receive.

This is exactly what Lars was talking about, and I think that we should settle on this (or a similar) solution and develop the idea further. I'm still thinking about serializable XML objects as parameters... Is it possible?
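For what it's worth, .NET's XmlSerializer can round-trip a public type with public members, so passing serialized gaze data around is technically possible. The type below is a stand-in for illustration, not the draft struct:

```csharp
using System.IO;
using System.Xml.Serialization;

// Stand-in sample type: XmlSerializer needs a public type with a
// parameterless constructor and public members.
public class XmlGazeSample
{
    public double Time;
    public double GazePosX;
    public double GazePosY;
}

public static class XmlDemo
{
    public static string Serialize(XmlGazeSample sample)
    {
        var serializer = new XmlSerializer(typeof(XmlGazeSample));
        using (var writer = new StringWriter())
        {
            serializer.Serialize(writer, sample);
            return writer.ToString();
        }
    }
}
```

Whether this is fast enough per sample at 120 Hz and above is exactly the resource concern raised here; binary structs will always be cheaper than serializing XML for every event.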

oleg
Re: Gaze data

One more peculiarity about gaze data standardization.

So far we were talking about X/Y/Pupil/Validity/a few other values, and a possibility to design several layers, like Basic/Extended/Complete... It is easy to notice that this is all about low-level data only - about the sample event. What about the high-level data and events that Lars was talking about? OK, we could add fixation/saccade/blink start/update/end events. But what about even higher layers and data, like gestures, selections, etc.? What about micro-saccades (some ET systems may be fast enough to detect them)?

There is a huge number of values that may be calculated as the outcome of the video-processing procedure. Even if we somehow count all existing events and their values and divide them between layers, more may come in the future.

I see three solutions for this:

1. Make all events and their values "declarable" (like in the SMI iViewX UDP interface, or the SR EyeLink library).
   - pros: allows an unlimited number of events and values
   - cons: naming problem (the same event/value can be named differently in different APIs) and complexity of coding (especially for those who want X/Y only)
2. Make some events and some of their values strict (either by specifying a structure, or a list of names/identifiers), and all others "declarable".
   - pros: still allows an unlimited number of events and values; easy coding for those who want X/Y only
   - cons: the naming problem remains for non-listed names
3. Make the most known events and most of their values strict. Other events/data will be available via a custom OEM API (not standardized).
   - pros: ?
   - cons: ?

The last option seems to be the best, IMHO. But what events/data, other than the sample that we discussed already, should be standardized? Or should we stop with the sample (ETUDE 1.0) and later add standard specifications for the others (ETUDE 1.x)?

Lars Hildebrandt
Re: Gaze data

oleg wrote:

What about high-level data and events, that Lars was talking about. OK, we could add fixation/saccade/blink start/update/end events.

Good comment. There is another contradiction between gaze interaction users and gaze analysis users. Gaze interaction users want to know that there is a fixation as early as possible. If there is a fixation that is 1.5 seconds long, they don't want to react at its end; they want to know as early as possible. And then they want to be updated (on a sample basis) about the progress of the fixation as long as it goes on. A fixation might change its position slightly while it's ongoing. Gaze analysis users, in contrast, are interested only in the total fixation.
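The interaction-oriented event pattern described above could be sketched in the API like this (the event and member names are hypothetical, not part of any draft):

```csharp
using System;

// Hypothetical fixation event data: the interaction use case needs
// start, per-sample update, and end notifications, not just a final summary.
public class FixationEventArgs : EventArgs
{
    public double StartTime; // when the fixation began
    public double Duration;  // elapsed so far, or total duration on FixationEnd
    public double PosX;      // current fixation centre; may drift slightly
    public double PosY;
}

public interface IFixationEvents
{
    event EventHandler<FixationEventArgs> FixationStart;  // as early as possible
    event EventHandler<FixationEventArgs> FixationUpdate; // on a sample basis
    event EventHandler<FixationEventArgs> FixationEnd;    // total fixation, for analysis
}
```

The analysis use case is then covered by listening to FixationEnd only, while interaction clients subscribe to all three.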

oleg wrote:

What about micro-saccades (some ET systems may be fast enough to detect them.

Microsaccades are very special. Those few researchers who publish about microsaccades would never trust a proprietary detection algorithm of an eye tracking manufacturer. Either they would calculate their own based on gaze data x/y, or they would want to make the manufacturer publish the algorithm.

Keep in mind that a standard also means that it's not open in all directions. Keep the focus on your key application(s), and make hard decisions about which data will be exposed. My advice is that it wouldn't make any sense to provide microsaccades via the API. Maybe a matrix of all layers of data vs. user groups would immediately answer your question about which data should be exposed by the API.

Lars Hildebrandt

VP Development
www.alea-technologies.de

oleg
Re: Gaze data

lars wrote:

Microsaccades are very special. Those few researchers who publish about microsaccades would never trust a proprietary detection algorithm of an eye tracking manufacturer.

I expected this kind of reply... I asked just to be sure the standard will not go into this area.

lars wrote:

Keep in mind a standard also means that it's not open in all directions. Keep the focus on your key application(s). And make hard decision which data will be exposed. And my advice is that it wouldn't make any sense to provide microsaccades via API. Maybe a matrix with all layers of data vs. users groups would immediately answer your question which data should be exposed by the API.

I will try to create such a matrix. And I think you were quite right to mention the distinction between gaze interaction users and gaze analysis users.

oleg
Temporary lock

This branch is temporarily locked due to heavy spamming. Please contact the administrator if you wish to post any message here.