Re: [PATCH 2/3] blkio: Add io controller stats like
From: Vivek Goyal
Date: Mon Apr 05 2010 - 10:46:17 EST
On Fri, Apr 02, 2010 at 01:53:30PM -0700, Divyesh Shah wrote:
[..]
> >> +#define GET_STAT_INDEXED(__VAR) \
> >> +uint64_t get_##__VAR##_stat(struct blkio_group *blkg, int type) \
> >> +{ \
> >> + return blkg->stats.__VAR[type]; \
> >> +} \
> >> +
> >> +GET_STAT_INDEXED(io_service_bytes);
> >> +GET_STAT_INDEXED(io_serviced);
> >> +GET_STAT_INDEXED(io_service_time);
> >> +GET_STAT_INDEXED(io_wait_time);
> >> +#undef GET_STAT_INDEXED
> >> +
> >> +#define GET_STAT(__VAR, __CONV) \
> >> +uint64_t get_##__VAR##_stat(struct blkio_group *blkg, int dummy) \
> >> +{ \
> >> + uint64_t data = blkg->stats.__VAR; \
> >> + if (__CONV) \
> >> + data = (uint64_t)jiffies_to_msecs(data) * NSEC_PER_MSEC;\
> >> + return data; \
> >> +}
> >> +
> >> +GET_STAT(time, 1);
> >> +GET_STAT(sectors, 0);
> >> +#ifdef CONFIG_DEBUG_BLK_CGROUP
> >> +GET_STAT(dequeue, 0);
> >> +#endif
> >> +#undef GET_STAT
> >> +
> >
> > I think now so many macros to define different kind of functions is
> > becoming really confusing. How about creating a two dimensional stat
> > array which is indexed by type of io stat like "io_service_time" and
> > then indexed by sub io type that is "READ, WRITE, SYNC" etc. And we
> > can define a single function to read all kind of stats.
> >
> > That way we will not have to define one GET_STAT_INDEXED() and GET_STAT()
> > for each unindexed variable. If there are not multiple functions then,
> > then everywhere we don't have to pass around a function pointer to read
> > the variable stat and code should become much simpler. Example, code
> > follows.
> >
> >
> > enum stat_type {
> > BLKIO_STAT_TIME
> > BLKIO_STAT_SECTORS
> > BLKIO_STAT_WAIT_TIME
> > BLKIO_STAT_SERVICE_TIME
> > }
> >
> > enum stat_sub_type {
> > BLKIO_STAT_READ
> > BLKIO_STAT_WRITE
> > BLKIO_STAT_SYNC
> > }
> >
> > blkio_get_stat_var(enum stat_type var, enum stat_sub_type var_type) {
> >
> > if (var == BLKIO_STAT_TIME)
> > reutrn blkg->stat.time
> >
> > /* Same as above for sectors */
> >
> > return blkg->stat[var][var_type];
> > }
>
> I'm not sure adding this level of complexity (2nd dimension for stats)
> and the numerous if (..) statements (or switch) is warranted
I think we need only two conditions to check. BLKIO_STAT_TIME and
BLKIO_STAT_SECTROS. Rest of these fall into typed category and we
don't have to check for tyeps.
> when they
> only apply for the get_stat() and get_typed_stat() functions. This
> seems like a more complicated simplification :).
I think this really is simplification. This also gets rid of all the
function pointers you are passing around to differentiate between two
types of stats. Also gets rid of special macros to generate one function
each for one kind of stat. These dynamic macros make code hard to read
and understand.
Can you please try this two dimensional array for stats approach once.
I belive your code will be much easier to read.
>
> >
> > Also all these get_* function needs to be static.
> Done
> >
> >> +#define SHOW_FUNCTION_PER_GROUP(__VAR, get_stats, getvar, show_total) \
> >> static int blkiocg_##__VAR##_read(struct cgroup *cgroup, \
> >> - struct cftype *cftype, struct seq_file *m) \
> >> + struct cftype *cftype, struct cgroup_map_cb *cb) \
> >> { \
> >> struct blkio_cgroup *blkcg; \
> >> struct blkio_group *blkg; \
> >> struct hlist_node *n; \
> >> + uint64_t cgroup_total = 0; \
> >> + char disk_id[10]; \
> >> \
> >> if (!cgroup_lock_live_group(cgroup)) \
> >> return -ENODEV; \
> >> @@ -184,19 +295,32 @@ static int blkiocg_##__VAR##_read(struct cgroup *cgroup, \
> >> blkcg = cgroup_to_blkio_cgroup(cgroup); \
> >> rcu_read_lock(); \
> >> hlist_for_each_entry_rcu(blkg, n, &blkcg->blkg_list, blkcg_node) {\
> >> - if (blkg->dev) \
> >> - seq_printf(m, "%u:%u %lu\n", MAJOR(blkg->dev), \
> >> - MINOR(blkg->dev), blkg->__VAR); \
> >> + if (blkg->dev) { \
> >> + spin_lock_irq(&blkg->stats_lock); \
> >> + snprintf(disk_id, 10, "%u:%u", MAJOR(blkg->dev),\
> >> + MINOR(blkg->dev)); \
> >> + cgroup_total += get_stats(blkg, cb, getvar, \
> >> + disk_id); \
> >> + spin_unlock_irq(&blkg->stats_lock); \
> >> + } \
> >> } \
> >> + if (show_total) \
> >> + cb->fill(cb, "Total", cgroup_total); \
> >> rcu_read_unlock(); \
> >> cgroup_unlock(); \
> >> return 0; \
> >> }
> >>
> >> -SHOW_FUNCTION_PER_GROUP(time);
> >> -SHOW_FUNCTION_PER_GROUP(sectors);
> >> +SHOW_FUNCTION_PER_GROUP(time, get_stat, get_time_stat, 0);
> >> +SHOW_FUNCTION_PER_GROUP(sectors, get_stat, get_sectors_stat, 0);
> >> +SHOW_FUNCTION_PER_GROUP(io_service_bytes, get_typed_stat,
> >> + get_io_service_bytes_stat, 1);
> >> +SHOW_FUNCTION_PER_GROUP(io_serviced, get_typed_stat, get_io_serviced_stat, 1);
> >> +SHOW_FUNCTION_PER_GROUP(io_service_time, get_typed_stat,
> >> + get_io_service_time_stat, 1);
> >> +SHOW_FUNCTION_PER_GROUP(io_wait_time, get_typed_stat, get_io_wait_time_stat, 1);
> >> #ifdef CONFIG_DEBUG_BLK_CGROUP
> >> -SHOW_FUNCTION_PER_GROUP(dequeue);
> >> +SHOW_FUNCTION_PER_GROUP(dequeue, get_stat, get_dequeue_stat, 0);
> >> #endif
> >> #undef SHOW_FUNCTION_PER_GROUP
> >>
> >> @@ -204,7 +328,7 @@ SHOW_FUNCTION_PER_GROUP(dequeue);
> >> void blkiocg_update_blkio_group_dequeue_stats(struct blkio_group *blkg,
> >> unsigned long dequeue)
> >> {
> >> - blkg->dequeue += dequeue;
> >> + blkg->stats.dequeue += dequeue;
> >> }
> >> EXPORT_SYMBOL_GPL(blkiocg_update_blkio_group_dequeue_stats);
> >> #endif
> >> @@ -217,16 +341,38 @@ struct cftype blkio_files[] = {
> >> },
> >> {
> >> .name = "time",
> >> - .read_seq_string = blkiocg_time_read,
> >> + .read_map = blkiocg_time_read,
> >> + .write_u64 = blkiocg_reset_write,
> >> },
> >> {
> >> .name = "sectors",
> >> - .read_seq_string = blkiocg_sectors_read,
> >> + .read_map = blkiocg_sectors_read,
> >> + .write_u64 = blkiocg_reset_write,
> >> + },
> >> + {
> >> + .name = "io_service_bytes",
> >> + .read_map = blkiocg_io_service_bytes_read,
> >> + .write_u64 = blkiocg_reset_write,
> >> + },
> >> + {
> >> + .name = "io_serviced",
> >> + .read_map = blkiocg_io_serviced_read,
> >> + .write_u64 = blkiocg_reset_write,
> >> + },
> >> + {
> >> + .name = "io_service_time",
> >> + .read_map = blkiocg_io_service_time_read,
> >> + .write_u64 = blkiocg_reset_write,
> >> + },
> >> + {
> >> + .name = "io_wait_time",
> >> + .read_map = blkiocg_io_wait_time_read,
> >> + .write_u64 = blkiocg_reset_write,
> >> },
> >
> > blkiocg_reset_write() is invoked if a user does some write on any of the
> > above files. This will reset all the stats. So if I do echo x >
> > blkio.time, I will see all other files being reset? I think this is
> > little counter intutive.
> >
> > Either we need to reset only those specific stats (the file being written
> > to) or may be we can provide a separate file "blkio.reset_stats" to reset
> > all the stats?
>
> I added a note in the Documentation file that writing an int to any of
> the stats should reset all. I don't get the point of introducing
> another file for this and would want stats to be cleared all together
> in order to have say io_service_time and io_serviced in sync, i.e.,
> the io_service_time should always be the total service time for the
> IOs accounted for in io_serviced.
>
I agree that reset should mean reset of all stats. It is a matter of
whether to introduce a separate file or write to one file resets all the
other files. Though I find writting to one file resetting the stats of
other file un-intitutive, at the same time don't feel too excited about
introducing a new file just for reset.
Do we really need to introduce reset interface. Changing the ioscheduler to
noop or deadline and then setting it back to cfq should reset the stats for
all the cgroups.
In fact we can probably get rid of per cgroup reset stats interface. We
will then get rid of extra stat spin lock. If somebody wants to reset
the stats, just change the ioscheduler.
This will work until and unless somebody wants to reset the stats of one
cgroup but not the other.
Thanks
Vivek
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/