Thanks to Doug and to siebertj993 for their rapid and cogent responses! Here’s a comment and two questions in reply.

I found that a RAND Corp report, “Incorporating Student Performance Measures into Teacher Evaluation Systems” (Steele, Hamilton, & Stecher, 2010), is one source that contains just the discussion I was looking for:
from p. 6 [24 in the PDF]: “Assessments that have evidence of validity for one purpose should not be used for another purpose until there is additional validity evidence related to the latter purpose (AERA, APA, & NCME, 1999; Perie et al., 2007).”
from p. 9 [27]: “However, even most commercial tests typically have not been validated for use in evaluations of teachers’ effectiveness.”
from p. 24 [42]: “Nevertheless, as noted earlier, measurement experts often express concerns about attaching high stakes to such diagnostic assessments as the DIBELS and Gates-MacGinitie because the assessments are designed to inform rather than evaluate instruction (see, for example, AERA, APA, & NCME, 1999).”

Again, please let me know if you come across more on this specific issue (no matter the viewpoint or conclusion)!

Question for Doug
What do you think about the feasibility of an online portfolio library in which artifacts of student performance would be available to all stakeholders and continually updated (thus providing multiple measures), with access controlled to protect privacy while still allowing teacher effectiveness to be evaluated over time?

Question for siebertj993
Might “student enrollment numbers, program offerings, festival ratings, and numbers of concerts” be considered attributes of teacher efficacy if the school/district has only one music teacher, as is often the case here in VT?

Jim