Discussion:
ChangeLogs in commit messages
(too old to reply)
Gary Benson
2014-08-14 08:32:31 UTC
Permalink
Raw Message
Hi all,

I recently added an extra section to the contributions checklist on
the wiki documenting the conversation on commit messages from when
we switched to git:

https://sourceware.org/gdb/wiki/ContributionChecklist#Properly_formatted_commit_messages

Andreas pointed out that different people are including the ChangeLog
entries in different ways,

1. With paths and with the date-and-author header:

gdb/
2014-07-30 Gary Benson <***@redhat.com>

* btrace.c: Include defs.h.
* common/ptid.c: Include defs.h or server.h as appropriate.
* nat/mips-linux-watch.c: Likewise.

2. With date-and-author headers but no paths:

2014-08-04 Tom Tromey <***@redhat.com>

* gdb.base/sss-bp-on-user-bp-2.exp: Match "to_resume", not
"target_resume".

3. With paths but no date-and-author headers:

gdb/ChangeLog:

* amd64-windows-tdep.c (amd64_windows_frame_decode_insns):
Add debug trace.

4. With no preamble at all:

* chew.c (print_stack_level, main): Cast result of pointer
difference to match format string.

What are people's preferences here? My preference is #1, but I could
live with #3. If we come to some kind of concensus on this I'll
update the wiki to reflect this.

Thanks,
Gary

--
http://gbenson.net/
Joel Brobecker
2014-08-14 12:52:24 UTC
Permalink
Raw Message
> 1. With paths and with the date-and-author header:
>
> gdb/
> 2014-07-30 Gary Benson <***@redhat.com>
>
> * btrace.c: Include defs.h.
> * common/ptid.c: Include defs.h or server.h as appropriate.
> * nat/mips-linux-watch.c: Likewise.
>
> 2. With date-and-author headers but no paths:
>
> 2014-08-04 Tom Tromey <***@redhat.com>
>
> * gdb.base/sss-bp-on-user-bp-2.exp: Match "to_resume", not
> "target_resume".
>
> 3. With paths but no date-and-author headers:
>
> gdb/ChangeLog:
>
> * amd64-windows-tdep.c (amd64_windows_frame_decode_insns):
> Add debug trace.
>
> 4. With no preamble at all:
>
> * chew.c (print_stack_level, main): Cast result of pointer
> difference to match format string.
>
> What are people's preferences here? My preference is #1, but I could
> live with #3. If we come to some kind of concensus on this I'll
> update the wiki to reflect this.

#3, since date and author are often redundant with the commit's
author. And even if not in the same, it's in the ChangeLog entry
that should be checked in as part of the commit. Also, I feel
like having those in the CL is an extra source of potential
issue (eg: if forgot to update the date), and revision logs
cannot be fixed once the commit has been pushed, whereas dates
in ChangeLog entries can.

But I can also live with #1. Not super keen on #2 and #4 in the sense
that it seems important to me to say which ChangeLogs are being
updated, since filenames are relative to those.

I do think we can be a little flexible without much downside,
and therefore have a set of accepted practices.

--
Joel
Gary Benson
2014-08-14 13:15:30 UTC
Permalink
Raw Message
Joel Brobecker wrote:
> > 1. With paths and with the date-and-author header:
> >
> > gdb/
> > 2014-07-30 Gary Benson <***@redhat.com>
> >
> > * btrace.c: Include defs.h.
> > * common/ptid.c: Include defs.h or server.h as appropriate.
> > * nat/mips-linux-watch.c: Likewise.
> >
> > 2. With date-and-author headers but no paths:
> >
> > 2014-08-04 Tom Tromey <***@redhat.com>
> >
> > * gdb.base/sss-bp-on-user-bp-2.exp: Match "to_resume", not
> > "target_resume".
> >
> > 3. With paths but no date-and-author headers:
> >
> > gdb/ChangeLog:
> >
> > * amd64-windows-tdep.c (amd64_windows_frame_decode_insns):
> > Add debug trace.
> >
> > 4. With no preamble at all:
> >
> > * chew.c (print_stack_level, main): Cast result of pointer
> > difference to match format string.
> >
> > What are people's preferences here? My preference is #1, but I
> > could live with #3. If we come to some kind of concensus on this
> > I'll update the wiki to reflect this.
>
> #3, since date and author are often redundant with the commit's
> author. And even if not in the same, it's in the ChangeLog entry
> that should be checked in as part of the commit. Also, I feel like
> having those in the CL is an extra source of potential issue (eg:
> if forgot to update the date), and revision logs cannot be fixed
> once the commit has been pushed, whereas dates in ChangeLog entries
> can.

My concern with omitting the author-and-date is for commits with
multiple authors, and/or multiple commits that have been squashed
with git-rebase (which uses the date and author of the commit that
was "pick"ed rather than that of other commits that were "squash"ed
or "fixup"ed into it. For example:

commit 314c6a3559393741f22fdd9836f83d9f364fbd2a
Author: Tom Tromey <***@redhat.com>
Date: Fri Jun 13 09:22:09 2014 -0600

Make gdbserver CORE_ADDR unsigned

gdbserver defines CORE_ADDR to be signed. This seems erroneous to
me; and furthermore likely to cause problems in common/, as it is
different from gdb's definition.

gdb/gdbserver/
2014-07-24 Tom Tromey <***@redhat.com>
Gary Benson <***@redhat.com>

* server.h (CORE_ADDR): Now unsigned.

This commit came from a half-finished branch Tom started back in
January. I picked it up June or so, updated/tidied it a bit and
submitted it for review. It was pushed on July 24 but the date on
the commit is June 13. I think most people use rebase eventually,
so this kind of thing is pretty common. The multiple authors
things is less common but not unique.

Thanks,
Gary

--
http://gbenson.net/
Joel Brobecker
2014-08-14 13:25:09 UTC
Permalink
Raw Message
> My concern with omitting the author-and-date is for commits with
> multiple authors, and/or multiple commits that have been squashed
> with git-rebase (which uses the date and author of the commit that
> was "pick"ed rather than that of other commits that were "squash"ed
> or "fixup"ed into it. For example:

I had that in mind, but since the author'ing info is in the ChangeLog
file, I don't see any advantage of having to maintain it in the revision
log as well. But this is again an area where I think having multiple
options isn't going to hurt us, so I think we would be fine if some
prefer it this way, while others use the author-less version.

--
Joel
Eli Zaretskii
2014-08-14 15:22:06 UTC
Permalink
Raw Message
> Date: Thu, 14 Aug 2014 14:15:30 +0100
> From: Gary Benson <***@redhat.com>
> Cc: ***@sourceware.org, Andreas Arnez <***@linux.vnet.ibm.com>
>
> My concern with omitting the author-and-date is for commits with
> multiple authors

Doesn't the --author switch provide a solution for that?
Gary Benson
2014-08-15 08:05:10 UTC
Permalink
Raw Message
Eli Zaretskii wrote:
> > Date: Thu, 14 Aug 2014 14:15:30 +0100
> > From: Gary Benson <***@redhat.com>
> > Cc: ***@sourceware.org, Andreas Arnez <***@linux.vnet.ibm.com>
> >
> > My concern with omitting the author-and-date is for commits with
> > multiple authors
>
> Doesn't the --author switch provide a solution for that?

It only allows for two people though, author and committer.
As far as I'm aware anyway! I'm sure there are commits in
there with more than two authors.

Cheers,
Gary

--
http://gbenson.net/
Eli Zaretskii
2014-08-15 08:41:21 UTC
Permalink
Raw Message
> Date: Fri, 15 Aug 2014 09:05:10 +0100
> From: Gary Benson <***@redhat.com>
> Cc: ***@adacore.com, ***@sourceware.org, ***@linux.vnet.ibm.com
>
> Eli Zaretskii wrote:
> > > Date: Thu, 14 Aug 2014 14:15:30 +0100
> > > From: Gary Benson <***@redhat.com>
> > > Cc: ***@sourceware.org, Andreas Arnez <***@linux.vnet.ibm.com>
> > >
> > > My concern with omitting the author-and-date is for commits with
> > > multiple authors
> >
> > Doesn't the --author switch provide a solution for that?
>
> It only allows for two people though, author and committer.

Can't --author be used more than once, or mention more than 1 author
in the same --author switch?
Andreas Schwab
2014-08-15 11:45:20 UTC
Permalink
Raw Message
Eli Zaretskii <***@gnu.org> writes:

> Can't --author be used more than once, or mention more than 1 author
> in the same --author switch?

Git supports only a single author per commit.

Andreas.

--
Andreas Schwab, ***@linux-m68k.org
GPG Key fingerprint = 58CA 54C7 6D53 942B 1756 01D3 44D5 214B 8276 4ED5
"And now for something completely different."
Mike Frysinger
2014-08-14 12:56:57 UTC
Permalink
Raw Message
On Thu 14 Aug 2014 09:32:31 Gary Benson wrote:
> What are people's preferences here? My preference is #1, but I could
> live with #3. If we come to some kind of concensus on this I'll
> update the wiki to reflect this.

(4) -- none of the above. the ChangeLog files contain the same data, so the
commit messages should contain useful details. i.e. the same content you used
when you posted to the mailing list.
-mike
Gary Benson
2014-08-14 13:12:06 UTC
Permalink
Raw Message
Mike Frysinger wrote:
> On Thu 14 Aug 2014 09:32:31 Gary Benson wrote:
> > What are people's preferences here? My preference is #1, but I
> > could live with #3. If we come to some kind of concensus on this
> > I'll update the wiki to reflect this.
>
> (4) -- none of the above. the ChangeLog files contain the same
> data, so the commit messages should contain useful details.
> i.e. the same content you used when you posted to the mailing list.

Do you mean #4 (changelog entries with no path/author-date) or are you
proposing new option #5 (no changelog in the commit message at all)?
#5 would suit me too.

Cheers,
Gary

--
http://gbenson.net/
Joel Brobecker
2014-08-14 13:29:39 UTC
Permalink
Raw Message
> Do you mean #4 (changelog entries with no path/author-date) or are you
> proposing new option #5 (no changelog in the commit message at all)?
> #5 would suit me too.

No, I strongly object to #5. The ChangeLog entries are part of
the email to be sent along with the patch to be reviewed, We had
that discussion many many years ago, and at the time, people
wanted the CL entry in the email, rather than as a diff.

Also, having the CL entry as a diff means maintaining that diff
over time, which is guaranteed to be a source of conflicts.
I know there are tools out there to deal with this specific source
of conflict, but I don't think we should require everyone to set
that up.

And finally, I find the CL entry to be useful to have when I review
revision logs.

--
Joel
Andreas Schwab
2014-08-14 13:39:29 UTC
Permalink
Raw Message
Joel Brobecker <***@adacore.com> writes:

> No, I strongly object to #5. The ChangeLog entries are part of
> the email to be sent along with the patch to be reviewed, We had
> that discussion many many years ago, and at the time, people
> wanted the CL entry in the email, rather than as a diff.

How does this have anything to do with the commit message?

Andreas.

--
Andreas Schwab, SUSE Labs, ***@suse.de
GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE 1748 E4D4 88E3 0EEA B9D7
"And now for something completely different."
Joel Brobecker
2014-08-14 13:48:07 UTC
Permalink
Raw Message
> > No, I strongly object to #5. The ChangeLog entries are part of
> > the email to be sent along with the patch to be reviewed, We had
> > that discussion many many years ago, and at the time, people
> > wanted the CL entry in the email, rather than as a diff.
>
> How does this have anything to do with the commit message?

The email can be sent nearly verbatim using "git send-email".
This is an objective that can be achieved with zero extra work
if you consider the other reasons I gave why I think the CL entry
should be in the revision log.

--
Joel
Andreas Schwab
2014-08-14 13:57:02 UTC
Permalink
Raw Message
Joel Brobecker <***@adacore.com> writes:

>> > No, I strongly object to #5. The ChangeLog entries are part of
>> > the email to be sent along with the patch to be reviewed, We had
>> > that discussion many many years ago, and at the time, people
>> > wanted the CL entry in the email, rather than as a diff.
>>
>> How does this have anything to do with the commit message?
>
> The email can be sent nearly verbatim using "git send-email".

The use of send-email is not a requirement.

> This is an objective that can be achieved with zero extra work
> if you consider the other reasons I gave why I think the CL entry
> should be in the revision log.

That can also be achieved without the duplication in the commit message.

Andreas.

--
Andreas Schwab, SUSE Labs, ***@suse.de
GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE 1748 E4D4 88E3 0EEA B9D7
"And now for something completely different."
Joel Brobecker
2014-08-14 14:22:01 UTC
Permalink
Raw Message
> > This is an objective that can be achieved with zero extra work
> > if you consider the other reasons I gave why I think the CL entry
> > should be in the revision log.
>
> That can also be achieved without the duplication in the commit message.

Again, I do think that having the CL entry in the commit message is
useful, and I disagree with the idea of removing it from there.
Imagine you are scanning the past changes, looking for what might
have caused a behavior change. When using "git log", having the CL
in there often helps.

--
Joel
Andreas Schwab
2014-08-14 14:45:23 UTC
Permalink
Raw Message
Joel Brobecker <***@adacore.com> writes:

> Again, I do think that having the CL entry in the commit message is
> useful, and I disagree with the idea of removing it from there.
> Imagine you are scanning the past changes, looking for what might
> have caused a behavior change. When using "git log", having the CL
> in there often helps.

It's already included. You can use git log -p to show it.

Andreas.

--
Andreas Schwab, SUSE Labs, ***@suse.de
GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE 1748 E4D4 88E3 0EEA B9D7
"And now for something completely different."
Joel Brobecker
2014-08-14 15:01:12 UTC
Permalink
Raw Message
> > Again, I do think that having the CL entry in the commit message is
> > useful, and I disagree with the idea of removing it from there.
> > Imagine you are scanning the past changes, looking for what might
> > have caused a behavior change. When using "git log", having the CL
> > in there often helps.
>
> It's already included. You can use git log -p to show it.

But -p shows the _entire_ patch. This is a lot of clutter just
to get the ChangeLog entries, don't you think?

--
Joel
Andreas Schwab
2014-08-14 15:13:29 UTC
Permalink
Raw Message
Joel Brobecker <***@adacore.com> writes:

>> > Again, I do think that having the CL entry in the commit message is
>> > useful, and I disagree with the idea of removing it from there.
>> > Imagine you are scanning the past changes, looking for what might
>> > have caused a behavior change. When using "git log", having the CL
>> > in there often helps.
>>
>> It's already included. You can use git log -p to show it.
>
> But -p shows the _entire_ patch.

You can limit it to the ChangeLogs.

Andreas.

--
Andreas Schwab, SUSE Labs, ***@suse.de
GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE 1748 E4D4 88E3 0EEA B9D7
"And now for something completely different."
Joel Brobecker
2014-08-14 15:22:42 UTC
Permalink
Raw Message
> >> It's already included. You can use git log -p to show it.
> >
> > But -p shows the _entire_ patch.
>
> You can limit it to the ChangeLogs.

Why am I on the bench spending all this time definding a currently
accepted position while you just attack all my explanations with
one-liners? Since you are trying to get us to change, why do _you_
make a proposal, and answer precisely how to fill the gaps that
we might find with it?

--
Joel
Andreas Schwab
2014-08-14 15:44:02 UTC
Permalink
Raw Message
Joel Brobecker <***@adacore.com> writes:

>> >> It's already included. You can use git log -p to show it.
>> >
>> > But -p shows the _entire_ patch.
>>
>> You can limit it to the ChangeLogs.
>
> Why am I on the bench spending all this time definding a currently
> accepted position while you just attack all my explanations with

Where did I attack anyone?

> one-liners? Since you are trying to get us to change, why do _you_

I don't. You were claiming that the duplication is unavoidable, I only
show you the working alternatives.

Andreas.

--
Andreas Schwab, SUSE Labs, ***@suse.de
GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE 1748 E4D4 88E3 0EEA B9D7
"And now for something completely different."
Joel Brobecker
2014-08-14 15:49:44 UTC
Permalink
Raw Message
> > one-liners? Since you are trying to get us to change, why do _you_
>
> I don't. You were claiming that the duplication is unavoidable, I only
> show you the working alternatives.

The way you do it just absolutely fustrating: 1. I don't say it is
unavoidable, I say it is useful. And 2. you counter it by saying you can,
but you don't show how. If you show me how, I can then tell you whether
I agree with you or not.

--
Joel
Sergio Durigan Junior
2014-08-14 19:04:35 UTC
Permalink
Raw Message
On Thursday, August 14 2014, Joel Brobecker wrote:

> No, I strongly object to #5. The ChangeLog entries are part of
> the email to be sent along with the patch to be reviewed, We had
> that discussion many many years ago, and at the time, people
> wanted the CL entry in the email, rather than as a diff.

On Thursday, August 14 2014, Joel Brobecker wrote:

> Again, I do think that having the CL entry in the commit message is
> useful, and I disagree with the idea of removing it from there.
> Imagine you are scanning the past changes, looking for what might
> have caused a behavior change. When using "git log", having the CL
> in there often helps.

Being the maintainer of GDB for RHEL, and having to deal with many
patches to backport, I second what Joel said.

I prefer CL messages to be included in the commit message (a la option
#1 in Gary's message), and I strongly prefer to have a good explanation
of what the patch does in the message too.

Cheers,

--
Sergio
GPG key ID: 0x65FC5E36
Please send encrypted e-mail if possible
http://sergiodj.net/
Gary Benson
2014-08-15 08:48:19 UTC
Permalink
Raw Message
Joel Brobecker wrote:
> > Do you mean #4 (changelog entries with no path/author-date) or are
> > you proposing new option #5 (no changelog in the commit message at
> > all)? #5 would suit me too.
>
> No, I strongly object to #5. The ChangeLog entries are part of the
> email to be sent along with the patch to be reviewed, We had that
> discussion many many years ago, and at the time, people wanted the
> CL entry in the email, rather than as a diff.

This discussion is starting to blur the lines between "what is in the
commit message" and "what is in the gdb-patches email". Right now,
the two are the same, especially if you use git-send-email, but I
don't think git-send-email convenience is a valid reason to say that
ChangeLog entries need to be in the commit message. If we wanted a
situation where gdb-patches emails have ChangeLog entries but commit
messages do not then we could surely script that.

> And finally, I find the CL entry to be useful to have when I review
> revision logs.

Noted.

My work on GDB mostly involves _generating_ commits, so I have a
natural bias towards removing things that make that difficult (and
ChangeLogs are an easy target). I do recognise, however, that some
people's work also involves _consuming_ commits (for creating release
branches, or for distro integration) so when I hear Joel and Sergio
stating the cases for a) ChangeLog files to exist, and b) ChangeLog
entries to be included in commit messages I have to admit that their
arguments carry more weight than my "everything would be so much more
convenient for me if ChangeLogs went away". So I think we should
stop discussing the removal of ChangeLogs, in this thread anyway.

Andreas Arnez pointed out that Joel previously mentioned generating
the actual ChangeLog files from the commit messages, here:

https://sourceware.org/ml/gdb-patches/2014-01/msg00578.html

This sounds like an interesting idea, but we really would have to
standardize on a particular format, and I think #1 (path and author-
date headers) is the only option that could realistically work.
If we standardize on this now (and put some checks on the server to
weed out bad messages) then come December we'll have four months of
commit messages we can use to check whether we can correctly replicate
the ChangeLog files. And, if we can, we can consider omitting the
files from the repo and generating them as needed for tarballs.

How does this sound?

Cheers,
Gary

--
http://gbenson.net/
Joel Brobecker
2014-08-15 12:11:02 UTC
Permalink
Raw Message
> Andreas Arnez pointed out that Joel previously mentioned generating
> the actual ChangeLog files from the commit messages, here:
>
> https://sourceware.org/ml/gdb-patches/2014-01/msg00578.html
>
> This sounds like an interesting idea, but we really would have to
> standardize on a particular format, and I think #1 (path and author-
> date headers) is the only option that could realistically work.
> If we standardize on this now (and put some checks on the server to
> weed out bad messages) then come December we'll have four months of
> commit messages we can use to check whether we can correctly replicate
> the ChangeLog files. And, if we can, we can consider omitting the
> files from the repo and generating them as needed for tarballs.
>
> How does this sound?

That sounds good to me. I do agree that a server-side "update"
check is needed to verify the CL entry is present in the revision
log and in the correct format.

--
Joel
Gary Benson
2014-08-15 13:09:13 UTC
Permalink
Raw Message
Joel Brobecker wrote:
> > Andreas Arnez pointed out that Joel previously mentioned
> > generating the actual ChangeLog files from the commit messages,
> > here:
> >
> > https://sourceware.org/ml/gdb-patches/2014-01/msg00578.html
> >
> > This sounds like an interesting idea, but we really would have to
> > standardize on a particular format, and I think #1 (path and
> > author- date headers) is the only option that could realistically
> > work. If we standardize on this now (and put some checks on the
> > server to weed out bad messages) then come December we'll have
> > four months of commit messages we can use to check whether we can
> > correctly replicate the ChangeLog files. And, if we can, we can
> > consider omitting the files from the repo and generating them as
> > needed for tarballs.
> >
> > How does this sound?
>
> That sounds good to me. I do agree that a server-side "update" check
> is needed to verify the CL entry is present in the revision log and
> in the correct format.

Does anybody have any experience writing such checks? Or, does
anybody know of any project that already uses such checks? I can
look into doing it myself (it's a pre-receive hook, right?)

Cheers,
Gary

--
http://gbenson.net/
Joel Brobecker
2014-08-15 13:28:16 UTC
Permalink
Raw Message
> Does anybody have any experience writing such checks? Or, does
> anybody know of any project that already uses such checks? I can
> look into doing it myself (it's a pre-receive hook, right?)

I have loads of experience writing git hooks, but none with
this repository's implementation.

I would typically adjust the "update" hook for that, but it looks
like the "pre-receive" hook would also work.

--
Joel
Gary Benson
2014-08-15 15:02:02 UTC
Permalink
Raw Message
Joel Brobecker wrote:
> > Does anybody have any experience writing such checks? Or, does
> > anybody know of any project that already uses such checks? I can
> > look into doing it myself (it's a pre-receive hook, right?)
>
> I have loads of experience writing git hooks, but none with
> this repository's implementation.
>
> I would typically adjust the "update" hook for that, but it looks
> like the "pre-receive" hook would also work.

I've put together a quick pre-receive hook (inlined below). Each
received commit on the "master" branch that touches the "gdb"
subdirectory gets its message checked. The check itself is fairly
cursory: it splits the message using the "YYYY-MM-DD NAME <EMAIL>"
headers, checks each is preceeded by a path starting with "gdb/" and
ending with "/", and checks each is followed by more "NAME <EMAIL>"
lines, blank lines, or lines starting with tab. I don't know how
comprehensive we want to be here as the message should already have
been checked over by the reviewer.

I've never done anything server-side with git before, so there may
well be things I'm missing here. I was mainly experimenting to see
how difficult this all was :)

Cheers,
Gary

--
#!/usr/bin/env python

import re
import subprocess
import sys

GIT = "/usr/bin/git"
SHA1HASH = re.compile(r"^[0-9a-f]{40}$", re.IGNORECASE)

name_mail = r".* <[^@]+@[^>]+>$"
CL_HDR_A = re.compile(r"^\d{4}-\d{2}-\d{2} " + name_mail)
CL_HDR_B = re.compile(r"^\s+" + name_mail)

class MessageChecker:
def __init__(self, rev):
self.rev = rev

def check(self, condition, msg):
if not condition:
print "error: %s: bad commit message" % self.rev
print "error: %s" % msg
sys.exit(1)

def check_message(self, lines):
self.check(len(lines) > 2, "message is too short")
self.check(lines[1] == "", "second line should be blank")
splits = [index
for index in xrange(2, len(lines))
if CL_HDR_A.match(lines[index]) is not None]
self.check(splits, "no ChangeLog entries found")
for start, limit in zip(splits, splits[1:] + [0]):
start -= 1
limit -= 1
self.check_changelog(lines[start:limit])

def check_changelog(self, lines):
self.check(len(lines) > 2, "ChangeLog entry is too short")
path = lines.pop(0)
self.check(path.startswith("gdb/") and path.endswith("/"),
"each ChangeLog entry should be preceeded by its path")
lines.pop(0) # lines[1] has already been checked
while lines and CL_HDR_B.match(lines[0]) is not None:
lines.pop(0)
self.check(lines, "only ChangeLog headers found (no data)")
for line in lines:
self.check(not line or line.startswith("\t"),
"bad ChangeLog line %s" % repr(line))

def check_hash(hash):
assert SHA1HASH.match(hash) is not None

def git(*command):
fp = subprocess.Popen((GIT,) + command, stdout=subprocess.PIPE)
output = fp.communicate()[0]
if fp.returncode:
sys.exit(1)
return output

def git_rev_list(oldrev, newrev):
check_hash(oldrev)
check_hash(newrev)
return git("rev-list", oldrev + ".." + newrev)

def git_diff_tree(rev):
check_hash(rev)
return git("diff-tree", rev)

def git_cat_file_commit(rev):
check_hash(rev)
return git("cat-file", "commit", rev)

def commit_touches_gdb(rev):
lines = git_diff_tree(rev).rstrip().split("\n")
check = lines.pop(0)
assert check == rev
for line in lines:
if line.split()[-1] == "gdb":
return True
return False

def check_commit(rev):
lines = git_cat_file_commit(rev).rstrip().split("\n")
while lines and lines[0]:
lines.pop(0)
assert lines
lines.pop(0)
MessageChecker(rev).check_message(lines)

def process_pack(oldrev, newrev, refname):
if refname == "refs/heads/master":
revs = git_rev_list(oldrev, newrev).rstrip().split("\n")
revs.reverse()
for rev in revs:
if commit_touches_gdb(rev):
check_commit(rev)
print "%s: ok" % rev

def main():
for line in sys.stdin.xreadlines():
bits = line.rstrip().split()
assert len(bits) == 3
process_pack(*bits)

if __name__ == "__main__":
main()
Andreas Arnez
2014-08-15 15:27:33 UTC
Permalink
Raw Message
On Fri, Aug 15 2014, Gary Benson wrote:

> Joel Brobecker wrote:
>> > Does anybody have any experience writing such checks? Or, does
>> > anybody know of any project that already uses such checks? I can
>> > look into doing it myself (it's a pre-receive hook, right?)
>>
>> I have loads of experience writing git hooks, but none with
>> this repository's implementation.
>>
>> I would typically adjust the "update" hook for that, but it looks
>> like the "pre-receive" hook would also work.
>
> I've put together a quick pre-receive hook (inlined below). Each
> received commit on the "master" branch that touches the "gdb"
> subdirectory gets its message checked. The check itself is fairly
> cursory: it splits the message using the "YYYY-MM-DD NAME <EMAIL>"
> headers, checks each is preceeded by a path starting with "gdb/" and
> ending with "/", and checks each is followed by more "NAME <EMAIL>"
> lines, blank lines, or lines starting with tab. I don't know how
> comprehensive we want to be here as the message should already have
> been checked over by the reviewer.

Maybe (for now) we also want to check that the same ChangeLog entries
appear in the patch?

> I've never done anything server-side with git before, so there may
> well be things I'm missing here. I was mainly experimenting to see
> how difficult this all was :)
>
> Cheers,
> Gary
Joel Brobecker
2014-08-15 16:08:29 UTC
Permalink
Raw Message
> I've put together a quick pre-receive hook (inlined below). Each
> received commit on the "master" branch that touches the "gdb"
> subdirectory gets its message checked. The check itself is fairly
> cursory: it splits the message using the "YYYY-MM-DD NAME <EMAIL>"
> headers, checks each is preceeded by a path starting with "gdb/" and
> ending with "/", and checks each is followed by more "NAME <EMAIL>"
> lines, blank lines, or lines starting with tab. I don't know how
> comprehensive we want to be here as the message should already have
> been checked over by the reviewer.
>
> I've never done anything server-side with git before, so there may
> well be things I'm missing here. I was mainly experimenting to see
> how difficult this all was :)

Just looking quickly at the script for the typical mistakes I made
when I first started dabbling into this kind of script:

. The SHA1 is 000[...]0 when creating a new reference
(Eg: creating a new branch). But that's an invalid SHA1

. Same when deleting a reference. the "new rev" is 0.

. I would suggest we do that on all branches except branches
under a specific namespace (thinking "vendor" branches, TBD).

. When creating a new branch, you really don't want to start
checking the entire history again, so you need to find the
branchpoint and use that as the "old rev".

I would like to push again for the format without the date and author
email but the simpler gdb/ChangeLog: instead. The commit already
stores that information (author name and email, committer date),
whereas requiring the user to provide it in the rev log means that
the date needs to be maintained throughout the lifetime of the patch
until that patch gets pushed. Same thing when cherry-picking a patch
from master to a release branch.

--
Joel
Gary Benson
2014-08-18 08:31:11 UTC
Permalink
Raw Message
Joel Brobecker wrote:
> I would like to push again for the format without the date and
> author email but the simpler gdb/ChangeLog: instead. The commit
> already stores that information (author name and email, committer
> date), whereas requiring the user to provide it in the rev log
> means that the date needs to be maintained throughout the
> lifetime of the patch until that patch gets pushed. Same thing
> when cherry-picking a patch from master to a release branch.

I'm happy with this. Do you want the "ChangeLog" in the path?

Cheers,
Gary

--
http://gbenson.net/
Doug Evans
2014-08-18 14:54:04 UTC
Permalink
Raw Message
On Mon, Aug 18, 2014 at 1:31 AM, Gary Benson <***@redhat.com> wrote:
> Joel Brobecker wrote:
>> I would like to push again for the format without the date and
>> author email but the simpler gdb/ChangeLog: instead. The commit
>> already stores that information (author name and email, committer
>> date), whereas requiring the user to provide it in the rev log
>> means that the date needs to be maintained throughout the
>> lifetime of the patch until that patch gets pushed. Same thing
>> when cherry-picking a patch from master to a release branch.
>
> I'm happy with this. Do you want the "ChangeLog" in the path?

Me too.

As for ChangeLog in the path,
Do you mean prepending entries with "gdb/ChangeLog" or
"testsuite/ChangeLog" instead of just "gdb/" or "testsuite/" ?
If so, I would say skip "ChangeLog", keep just, e.g., "testsuite/".
That's the existing convention.
Joel Brobecker
2014-08-18 15:05:10 UTC
Permalink
Raw Message
> > I'm happy with this. Do you want the "ChangeLog" in the path?
>
> Me too.
>
> As for ChangeLog in the path,
> Do you mean prepending entries with "gdb/ChangeLog" or
> "testsuite/ChangeLog" instead of just "gdb/" or "testsuite/" ?
> If so, I would say skip "ChangeLog", keep just, e.g., "testsuite/".
> That's the existing convention.

FWIW, I think that the extra "ChangeLog" bring 2 benefits:
- it makes it clear we're starting a ChangeLog entry:
- it reduces the chances of a false-positive when parsing
the revision log for one or more CL entries. I found this out
while developing a parser that I use in the context of
the gdb release.

Small bonus: When you modify the root ChangeLog file, the entry
ends up looking like this...

| ChangeLog:
|
| * file: description.

... instead of ...

| <nothing?>:
|
| * file: description.

... or maybe ...

| /
|
| * file: description.

--
Joel
Gary Benson
2014-08-18 15:27:24 UTC
Permalink
Raw Message
Joel Brobecker wrote:
> > > I'm happy with this. Do you want the "ChangeLog" in the path?
> >
> > Me too.
> >
> > As for ChangeLog in the path,
> > Do you mean prepending entries with "gdb/ChangeLog" or
> > "testsuite/ChangeLog" instead of just "gdb/" or "testsuite/" ?
> > If so, I would say skip "ChangeLog", keep just, e.g., "testsuite/".
> > That's the existing convention.
>
> FWIW, I think that the extra "ChangeLog" bring 2 benefits:
> - it makes it clear we're starting a ChangeLog entry:
> - it reduces the chances of a false-positive when parsing
> the revision log for one or more CL entries. I found this out
> while developing a parser that I use in the context of
> the gdb release.
>
> Small bonus: When you modify the root ChangeLog file, the entry
> ends up looking like this...
>
> | ChangeLog:
> |
> | * file: description.

I'm happy with this. I'll wait a day or so in case somebody objects,
and if not I'll update the wiki.

Thanks,
Gary

--
http://gbenson.net/
Gary Benson
2014-08-20 12:19:49 UTC
Permalink
Raw Message
Gary Benson wrote:
> Joel Brobecker wrote:
> > > > I'm happy with this. Do you want the "ChangeLog" in the path?
<br />
<b>Fatal error</b>: Maximum execution time of 30 seconds exceeded in <b>/home/httpd/gmane/php/lib.php</b> on line <b>504</b><br />
Andreas Arnez
2014-08-14 16:21:56 UTC
Permalink
Raw Message
On Thu, Aug 14 2014, Gary Benson wrote:

> Mike Frysinger wrote:
>> On Thu 14 Aug 2014 09:32:31 Gary Benson wrote:
>> > What are people's preferences here? My preference is #1, but I
>> > could live with #3. If we come to some kind of concensus on this
>> > I'll update the wiki to reflect this.
>>
>> (4) -- none of the above. the ChangeLog files contain the same
>> data, so the commit messages should contain useful details.
>> i.e. the same content you used when you posted to the mailing list.
>
> Do you mean #4 (changelog entries with no path/author-date) or are you
> proposing new option #5 (no changelog in the commit message at all)?
> #5 would suit me too.

Since Mike obviously meant #5, I tend to agree. At least as long as we
keep ChangeLog files in git, rather than extracting them with some
automatism from the git commit messages, e.g., as suggested here:

https://sourceware.org/ml/gdb-patches/2014-01/msg00578.html

Let's assume we consistently do #5, so no ChangeLog entries in the
commit messages, but just in the ChangeLog files. Then, to see them
again in a "git log", we can still do something like this:

git log -U0 gdb/ChangeLog --since='1 week ago'

On the other hand, it's more difficult to *suppress* the existing
ChangeLogs in a "git log".
Mike Frysinger
2014-08-14 13:23:16 UTC
Permalink
Raw Message
On Thu 14 Aug 2014 08:56:57 Mike Frysinger wrote:
> On Thu 14 Aug 2014 09:32:31 Gary Benson wrote:
> > What are people's preferences here? My preference is #1, but I could
> > live with #3. If we come to some kind of concensus on this I'll
> > update the wiki to reflect this.
>
> (4) -- none of the above. the ChangeLog files contain the same data, so the
> commit messages should contain useful details. i.e. the same content you
> used when you posted to the mailing list.

erp, i thought you only had 3 options. so i mean (5) here :).
-mike
Siva Chandra
2014-08-14 13:36:36 UTC
Permalink
Raw Message
On Thu, Aug 14, 2014 at 1:32 AM, Gary Benson <***@redhat.com> wrote:
> 3. With paths but no date-and-author headers:
>
> gdb/ChangeLog:
>
> * amd64-windows-tdep.c (amd64_windows_frame_decode_insns):
> Add debug trace.

I vote for this. However, IMHO, the suffix of "ChangeLog" in the
directory name is also redundant.
Loading...